[FR] Improve performance of plotting heatmaps #4520

Open
BioTurboNick opened this issue Nov 15, 2022 · 13 comments
Labels
help wanted performance speedups and slowdowns

Comments

@BioTurboNick
Member

BioTurboNick commented Nov 15, 2022

Reported on Slack by Jesse Chan: repeated calls to plot! to produce overlaid heatmaps were very slow.

using Plots

const mat = randn(512, 512)

function f(x, n)
    plot()
    for _ in 1:n
        heatmap!(x)
    end
end

f(mat, 100)

Allocation profiler shows the large majority of allocations occurring in heatmap_edges:
[image: allocation profiler output]

And the large majority of computation time in expand_extrema! (which calls heatmap_edges) and update_clims (my old nemesis):
image

@t-bltg
Member

t-bltg commented Nov 22, 2022

@jlchan, can you check whether Plots#master improves on the current release, or whether it is still unacceptably slow?

@jlchan

jlchan commented Nov 22, 2022

Thanks @t-bltg. Using @BioTurboNick's snippet above with n=100, on Plots v1.36.1, I get

julia> @btime f($mat, 100)
  511.680 ms (290431 allocations: 215.73 MiB)

On Plots#master

julia> @btime f($mat, 100)
  509.745 ms (81638 allocations: 207.71 MiB)

@t-bltg
Member

t-bltg commented Nov 22, 2022

Thanks, that's disappointing.
I'll give it another try later; memory allocation is just insane on my end.

@BioTurboNick
Member Author

Thanks for tackling this! I was going to poke around this week/weekend. Can still if you get stumped.

@jlchan

jlchan commented Nov 22, 2022

I appreciate it! FYI I've found a workaround by adding another custom recipe. However, speeding up repeated plot! calls would certainly help when I teach Julia to students who are coming from Matlab.

@t-bltg
Member

t-bltg commented Nov 22, 2022

Can still if you get stumped.

There is still room!
I'm filling up 64 GB of RAM here and starting to swap with that example.
I don't know why the GC doesn't kick in, though 🤔.

@BioTurboNick
Member Author

BioTurboNick commented Nov 22, 2022

Haha sorry. My system has 48 GB of RAM and caps out at 21 GB consumed by Julia. But you could easily cut the number down, or cut the matrix size by half, too. I just chose a reasonably large one to show the effect, and I didn't pay attention to RAM usage haha.

Manually running GC doesn't reduce the number here.

Just holding on to all the plot series values looks like 19.5 GB of RAM.
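
As a rough sanity check on these figures, here is the arithmetic for the 512×512 `Float64` matrix used in this thread. It assumes each stored series keeps its own copy of the data, which is an assumption on my part and not something verified against the Plots source, and it reads "19.5 GB" as GiB.

```julia
# Size of one 512×512 Float64 matrix.
bytes_per_matrix = 512 * 512 * sizeof(Float64)   # 2_097_152 bytes
mib_per_matrix = bytes_per_matrix / 2^20          # 2.0 MiB

# Assumption: each of 100 heatmap! calls ends up with its own copy of the data.
mib_for_100 = 100 * mib_per_matrix                # 200.0 MiB

# Number of such matrices that fit in 19.5 GiB.
matrices_in_19_5_gib = 19.5 * 2^30 / bytes_per_matrix   # 9984.0

println("one matrix:   ", mib_per_matrix, " MiB")
println("100 matrices: ", mib_for_100, " MiB")
println("19.5 GiB holds about ", round(Int, matrices_in_19_5_gib), " matrices")
```

Under that assumption, 100 copies come to 200 MiB, which is close to the roughly 207 to 216 MiB reported by `@btime` for n = 100, and 19.5 GiB corresponds to about ten thousand stored matrices.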

@t-bltg
Member

t-bltg commented Nov 23, 2022

Let's focus on n = 100 for now (I've edited the examples).

The problem in this loop is that we accumulate series, so when exporting to PNG or displaying the plot, we traverse 100 stored series. That also explains why the GC doesn't kick in, since we explicitly store the series.
This doesn't make sense when using heatmaps, and interferes with the performance analysis here.

I've added the ability to empty the Plot's series in #4543.

So now the example looks like:

using BenchmarkTools, Plots

const mat = randn(512, 512)

f(x, n) = begin
  pl = plot()
  for _ in 1:n
      empty!(pl)  # clear out previous series
      heatmap!(x)
  end
  pl
end

@btime f($mat, 100)  #  597.178 ms (42561 allocations: 206.52 MiB)

It still allocates too much, but we can now focus on Plots internals to speed things up.

@BioTurboNick
Member Author

BioTurboNick commented Nov 23, 2022

The motivation is actually to allow accumulating series here. They don't overlap in his use case; the example is just to emulate generating many series.

image

@t-bltg
Member

t-bltg commented Nov 23, 2022

The motivation is actually to allow accumulating series here

Hmm, @jlchan, it would be better imo to provide a self-contained MWE showing a real-case scenario then.

@BioTurboNick
Member Author

My suspicion is that the problem is generic to the number of series and the size of the data per series.

@jlchan

jlchan commented Nov 23, 2022

Yeah, I observe a similar phenomenon when plotting many series over each other.

An example of where I usually see this is creating a scatter plot from a matrix

using Plots

x = randn(100,1000)
y = randn(100,1000)
z = randn(100,1000)
scatter(x, y, zcolor=z)

This takes longer than scatter(vec(x), vec(y), zcolor=vec(z)), I think because a new plot series is created for each column.

My other use cases are similar: usually I loop over a list containing plot data input. I've been concatenating my plot arrays into one large array to avoid this issue.
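
A minimal sketch of that concatenation workaround, assuming each dataset is a named tuple with `x`, `y`, and `z` arrays of matching size. The helper name `flatten_datasets` and the data layout are hypothetical and chosen only for illustration; the Plots call is left as a comment so the snippet runs with Base alone.

```julia
# Hypothetical helper: flatten a list of (x, y, z) datasets into three
# long vectors, so that a single series is plotted instead of one per
# dataset (or one per matrix column).
function flatten_datasets(datasets)
    xs = reduce(vcat, [vec(d.x) for d in datasets])
    ys = reduce(vcat, [vec(d.y) for d in datasets])
    zs = reduce(vcat, [vec(d.z) for d in datasets])
    return (x = xs, y = ys, z = zs)
end

# Five datasets of 100×10 points each.
datasets = [(x = randn(100, 10), y = randn(100, 10), z = randn(100, 10)) for _ in 1:5]
flat = flatten_datasets(datasets)
println(length(flat.x), " points in one series")

# With Plots loaded, a single call then draws everything as one series:
# scatter(flat.x, flat.y, zcolor = flat.z)
```

This trades per-series attributes (labels, markers) for fewer series, which may or may not be acceptable depending on the plot.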

@jlchan

jlchan commented Nov 23, 2022

I can add an MWE involving triangulations if that's helpful, but as @BioTurboNick said, I don't think the issue is unique to heatmap plots.
