Description
I created a small neural network, using both accelerate and hmatrix to perform the matrix calculations, and trained it for 100 epochs (iterations). Training took several seconds with the accelerate backend, as opposed to a few milliseconds with hmatrix.
With 2 epochs, the debug output from the code was: https://gist.github.com/avarsh/8bdb89f80d3987c9f4aea52c4d7a7149, while with 100 epochs of training the output becomes: https://gist.github.com/avarsh/89de99cd4869649f523db681105a90b7. In the latter output, some phases, such as array-fusion, take an unexpectedly long time. In another test, CPU.run was called on the weights and biases arrays at each step; this gave some improvement, but some phases still took longer than expected, particularly towards the end of training - see the following truncated output: https://gist.github.com/avarsh/cc976140767252f6e3ba81d7efe50323
Steps to reproduce
Run the code provided here: https://gist.github.com/avarsh/286f06133787e64e74574f86f3cf8bf4 (does not call CPU.run on the arrays at each step of training), or here: https://gist.github.com/avarsh/58373df585b6ef64a36e2c4b90c85206 (calls CPU.run).
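For reference, the training loop in the gists drives the epochs with ordinary Haskell recursion over Acc values (as noted in the reply below). A condensed sketch of that shape - the names and the placeholder update here are illustrative, not the actual gist code - looks like this:

```haskell
import qualified Data.Array.Accelerate             as A
import qualified Data.Array.Accelerate.LLVM.Native as CPU

-- Placeholder for one epoch of forward pass, backpropagation and update;
-- the real body (with separate weights and biases) lives in the gists above.
step :: A.Acc (A.Matrix Float) -> A.Acc (A.Matrix Float)
step = A.map (* 0.99)

-- The epoch loop is plain Haskell recursion over Acc terms, so each extra
-- epoch makes the embedded program handed to CPU.run larger.
train :: Int -> A.Acc (A.Matrix Float) -> A.Acc (A.Matrix Float)
train 0 w = w
train n w = train (n - 1) (step w)

main :: IO ()
main = do
  let w0 = A.fromList (A.Z A.:. 4 A.:. 4) [1 ..] :: A.Matrix Float
  print (CPU.run (train 100 (A.use w0)))
```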
Expected behaviour
This program is not expected to take longer than a second to run on this machine; training for ~1000 epochs should complete within a second.
Your environment
Run on an Intel i5-6500 CPU (single-threaded).
Accelerate: 1.3.0.0
Accelerate backend(s): LLVM-Native
GHC: 8.10.2
OS: Arch Linux
Because train is a recursive Haskell function, this amounts to loop unrolling in the Accelerate program; you are just building up a larger and larger embedded program, which the rest of the compiler then has to chew on (for no good reason). What you should do is either:
rewrite this into a loop in the embedded program (using awhile); or
split the loop body into a function you feed to runN (to compile once) and then repeatedly apply it (via Haskell recursion).
Either of those should work around your immediate problem (rough sketches of both are below). Of course I'd still like to improve the performance of the compiler internals, but that will take much more time than it will take you to give it a simpler program to begin with.
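For illustration, here are minimal sketches of both approaches, assuming a single weight matrix and a placeholder step function standing in for the real network update (these names are not from the gists):

```haskell
import qualified Data.Array.Accelerate             as A
import qualified Data.Array.Accelerate.LLVM.Native as CPU

-- Placeholder for one epoch of the network update.
step :: A.Acc (A.Matrix Float) -> A.Acc (A.Matrix Float)
step = A.map (* 0.99)

-- Option 1: keep the loop inside the embedded program with awhile,
-- pairing the weights with an epoch counter.
trainAcc :: Int -> A.Acc (A.Matrix Float) -> A.Acc (A.Matrix Float)
trainAcc epochs w0 =
  A.asnd $ A.awhile
    (\(A.T2 i _) -> A.map (A.< A.constant epochs) i)   -- continue while i < epochs
    (\(A.T2 i w) -> A.T2 (A.map (+ 1) i) (step w))     -- one epoch per iteration
    (A.T2 (A.unit 0) w0 :: A.Acc (A.Scalar Int, A.Matrix Float))

-- Option 2: compile the loop body once with runN, then drive the epochs
-- with ordinary Haskell recursion over plain (already evaluated) arrays.
compiledStep :: A.Matrix Float -> A.Matrix Float
compiledStep = CPU.runN step

trainN :: Int -> A.Matrix Float -> A.Matrix Float
trainN 0 w = w
trainN n w = trainN (n - 1) (compiledStep w)
```

With the first variant the whole training run is a single embedded program, e.g. CPU.run (trainAcc 1000 (A.use w0)); with the second, only step is compiled (once), and the Haskell loop just reapplies the compiled function to concrete arrays, so the program size no longer grows with the number of epochs.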
For reference, here's the -ddump-simpl-stats output for one step of the 100 epoch program (you are asking it to do a lot!):