
Large memory allocation due to sparse tensors #30

Open
lepmik opened this issue Sep 26, 2023 · 5 comments · May be fixed by #31
@lepmik
Member

lepmik commented Sep 26, 2023

x = torch.zeros((n_neurons, n_steps + T), device=device, dtype=store_as_dtype)

Any reason why we should not work on sparse tensors?

@JakobSonstebo
Collaborator

With the operations we're doing throughout the simulation, I couldn't find a sparse tensor format that didn't need to be converted back to dense for some of the operations, so everything became much slower. Since the spikes need very low precision, they occupy a small fraction of the total memory (compared to the weights), so I decided the dense representation was worth it. After the simulation completes, the idea has been to leave it to the user to save the results as sparse tensors. If you want to do some post-processing of the spikes, it is handy to have them in dense form before saving; however, if the main usage is to save the results immediately, returning them as a sparse tensor is probably better. What do you think?
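
For reference, a minimal sketch of what that hand-off could look like after the simulation (illustrative only, not the package's actual API): the simulation keeps the dense uint8 spike tensor, and sparsification happens once at save time.

import torch

# Dense spike record from the simulation: shape (n_neurons, n_steps),
# kept at uint8 since entries are just 0/1 (random data as a stand-in).
spikes = torch.randint(0, 2, (100, 10_000), dtype=torch.uint8)

# Sparsify once at save time; the simulation itself only ever sees the
# dense tensor, and .to_dense() recovers it for later post-processing.
torch.save(spikes.to_sparse(), "spikes_sparse.pt")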

@JakobSonstebo
Collaborator

I can try benchmarking the sparse branch now to see how performance is affected. At the beginning of the project I compared this way of storing spikes to just writing them into a dense tensor and found that the latter was significantly faster, but if memory is a problem (even when using torch.uint8), then it might be worth it.
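
Roughly the kind of comparison I have in mind (a sketch with arbitrary sizes, not the actual benchmark code): writing spikes column-by-column into a preallocated dense tensor versus accumulating indices and building one sparse COO tensor at the end.

import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
n_neurons, n_steps, p = 1_000, 10_000, 0.01

# Dense: preallocate once, write each step into its column.
t0 = time.perf_counter()
x = torch.zeros((n_neurons, n_steps), device=device, dtype=torch.uint8)
for t in range(n_steps):
    x[:, t] = (torch.rand(n_neurons, device=device) < p).to(torch.uint8)
if device == "cuda":
    torch.cuda.synchronize()
print("dense :", time.perf_counter() - t0)

# Sparse: collect spike indices per step, build one COO tensor at the end.
t0 = time.perf_counter()
rows, cols = [], []
for t in range(n_steps):
    idx = (torch.rand(n_neurons, device=device) < p).nonzero(as_tuple=True)[0]
    rows.append(idx)
    cols.append(torch.full_like(idx, t))
indices = torch.stack([torch.cat(rows), torch.cat(cols)])
values = torch.ones(indices.shape[1], device=device, dtype=torch.uint8)
x_sparse = torch.sparse_coo_tensor(indices, values, (n_neurons, n_steps))
if device == "cuda":
    torch.cuda.synchronize()
print("sparse:", time.perf_counter() - t0)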

@JakobSonstebo
Collaborator

[Attached plot: performance benchmark]

Here is a plot showing the performance. I suspect the rolling of x might be the thing slowing it down, so maybe there is a faster way of "forgetting" the first column?

@lepmik
Member Author

lepmik commented Oct 11, 2023

Thank you for running the benchmarks!

I think we at least need the option of sparse iteration; for example, running 100 neurons for 1e8 timesteps breaks on an NVIDIA GeForce RTX 3090, and with a small timestep 1e8 steps is not that many. We could introduce a parameter, sparse=True by default?
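
Rough arithmetic for that case, assuming the spike buffer dominates GPU memory:

n_neurons, n_steps = 100, int(1e8)
bytes_uint8 = n_neurons * n_steps      # 1e10 B, roughly 9.3 GiB at torch.uint8
bytes_float32 = 4 * bytes_uint8        # roughly 37 GiB at torch.float32

So even at uint8 the buffer alone approaches 10 GB, and together with the weights and any temporary copy or upcast during the simulation it can exceed the 3090's 24 GB.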

I think you are right that the roll is slow, but I can't think of a faster way of doing it; we could look into implementing one.
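
One possible alternative, sketched here outside the actual simulator code: keep a write pointer into a fixed-size buffer so that "forgetting" the oldest column is just overwriting it, with no torch.roll at all.

import torch

n_neurons, T = 100, 20
buffer = torch.zeros((n_neurons, T), dtype=torch.uint8)
head = 0  # index of the oldest column, i.e. the next one to overwrite

def push(spikes_t):
    # Overwrite the oldest column in place instead of rolling the tensor.
    global head
    buffer[:, head] = spikes_t
    head = (head + 1) % T

def ordered_view():
    # Return the columns in chronological order (this one does copy).
    return torch.cat((buffer[:, head:], buffer[:, :head]), dim=1)

The trade-off is that reads needing chronological order have to account for the offset, but the per-step write becomes a single column assignment.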

@JakobSonstebo
Collaborator

Maybe we could also consider saving to a file during the simulation: every N steps we save the progress to a file and resume from that point. This would limit memory usage and would be faster than sparsifying at every step.
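
Something along these lines, as a sketch (the chunk size, file naming and random spikes are just placeholders):

import torch

n_neurons, chunk_size, n_total_steps = 100, 100_000, 1_000_000
device = "cpu"
chunk = torch.zeros((n_neurons, chunk_size), dtype=torch.uint8, device=device)

for step in range(n_total_steps):
    # Stand-in for the real simulation update at this step.
    spikes_t = (torch.rand(n_neurons, device=device) < 0.01).to(torch.uint8)
    chunk[:, step % chunk_size] = spikes_t
    if (step + 1) % chunk_size == 0:
        # Flush to disk (sparsified once per chunk, not per step) and reuse
        # the same buffer, so memory stays bounded by chunk_size.
        torch.save(chunk.to_sparse(), f"spikes_{step + 1:010d}.pt")
        chunk.zero_()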
