Description
I created a small neural network, using both accelerate and hmatrix to perform the matrix calculations, and trained it for 100 epochs (iterations). Training took several seconds with the accelerate backend, as opposed to a few milliseconds with hmatrix.
With 2 epochs, the debug output from the code was: https://gist.github.com/avarsh/8bdb89f80d3987c9f4aea52c4d7a7149, while with 100 epochs of training the output becomes: https://gist.github.com/avarsh/89de99cd4869649f523db681105a90b7. In the latter output, some phases, such as array-fusion, take an unexpectedly long time. In another test, CPU.run was called on the weights and biases arrays at each step; this gave some improvement, but some phases still took longer than expected, particularly towards the end of training - see the following truncated output: https://gist.github.com/avarsh/cc976140767252f6e3ba81d7efe50323
Steps to reproduce
Run the code provided here: https://gist.github.com/avarsh/286f06133787e64e74574f86f3cf8bf4 (does not call CPU.run on the arrays at each step of training), or here: https://gist.github.com/avarsh/58373df585b6ef64a36e2c4b90c85206 (calls CPU.run).
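For reference, the training loop in the gists drives the epochs with ordinary Haskell recursion over Acc values (as noted in the reply below). A condensed sketch of that shape - the names and the placeholder update here are illustrative, not the actual gist code - looks like this:

```haskell
import qualified Data.Array.Accelerate             as A
import qualified Data.Array.Accelerate.LLVM.Native as CPU

-- Placeholder for one epoch of forward pass, backpropagation and update;
-- the real body (with separate weights and biases) lives in the gists above.
step :: A.Acc (A.Matrix Float) -> A.Acc (A.Matrix Float)
step = A.map (* 0.99)

-- The epoch loop is plain Haskell recursion over Acc terms, so each extra
-- epoch makes the embedded program handed to CPU.run larger.
train :: Int -> A.Acc (A.Matrix Float) -> A.Acc (A.Matrix Float)
train 0 w = w
train n w = train (n - 1) (step w)

main :: IO ()
main = do
  let w0 = A.fromList (A.Z A.:. 4 A.:. 4) [1 ..] :: A.Matrix Float
  print (CPU.run (train 100 (A.use w0)))
```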
Expected behaviour
This program is not expected to take longer than a second to run on this machine; training for ~1000 epochs should complete within a second.
Your environment
Run on an Intel i5-6500 CPU (single-threaded).
Accelerate: 1.3.0.0
Accelerate backend(s): LLVM-Native
GHC: 8.10.2
OS: Arch Linux
Because train is a recursive Haskell function, this amounts to loop unrolling in the Accelerate program; you are just building up a larger and larger embedded program, which the rest of the compiler then has to chew on (for no good reason). What you should do is either:
rewrite this into a loop in the embedded program (using awhile); or
split the loop body into a function you feed to runN (to compile once) and then repeatedly apply it (via Haskell recursion).
Either of those should work around your immediate problem (rough sketches of both are below). Of course I'd still like to improve the performance of the compiler internals, but that will take much more time than it will take you to give it a simpler program to begin with.
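For illustration, here are minimal sketches of both approaches, assuming a single weight matrix and a placeholder step function standing in for the real network update (these names are not from the gists):

```haskell
import qualified Data.Array.Accelerate             as A
import qualified Data.Array.Accelerate.LLVM.Native as CPU

-- Placeholder for one epoch of the network update.
step :: A.Acc (A.Matrix Float) -> A.Acc (A.Matrix Float)
step = A.map (* 0.99)

-- Option 1: keep the loop inside the embedded program with awhile,
-- pairing the weights with an epoch counter.
trainAcc :: Int -> A.Acc (A.Matrix Float) -> A.Acc (A.Matrix Float)
trainAcc epochs w0 =
  A.asnd $ A.awhile
    (\(A.T2 i _) -> A.map (A.< A.constant epochs) i)   -- continue while i < epochs
    (\(A.T2 i w) -> A.T2 (A.map (+ 1) i) (step w))     -- one epoch per iteration
    (A.T2 (A.unit 0) w0 :: A.Acc (A.Scalar Int, A.Matrix Float))

-- Option 2: compile the loop body once with runN, then drive the epochs
-- with ordinary Haskell recursion over plain (already evaluated) arrays.
compiledStep :: A.Matrix Float -> A.Matrix Float
compiledStep = CPU.runN step

trainN :: Int -> A.Matrix Float -> A.Matrix Float
trainN 0 w = w
trainN n w = trainN (n - 1) (compiledStep w)
```

With the first variant the whole training run is a single embedded program, e.g. CPU.run (trainAcc 1000 (A.use w0)); with the second, only step is compiled (once), and the Haskell loop just reapplies the compiled function to concrete arrays, so the program size no longer grows with the number of epochs.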
For reference, here's the -ddump-simpl-stats output for one step of the 100 epoch program (you are asking it to do a lot!):