Update Comparison with Other Frameworks #2717
Conversation
The errors that occurred on Travis CI are now fixed in the master branch (#2716), so the problem will be solved if you merge the latest master branch.
Sorry to keep you waiting. We are now working on tasks related to v2, which will be released on 30th May, and could not take time to review this. Is it OK to review it after the release of v2?
No problem. Looking forward to the v2 release!
Sorry for making you wait. Now I can work on this PR.
I was wondering if we might want to include another row in the table for acceleration using anything other than cuDNN (such as OpenCL and/or other hardware platform support), but I am not aware of any frameworks with actively developed support.
@jekbradbury Could you let us know if you have any criteria in mind for which frameworks to include in the table? (I should have discussed this before assigning reviewers and starting to review each column.)
I tried to include every actively developed, full-featured deep learning framework that isn't just a wrapper or frontend around another framework. All of them are open source and either used fairly widely by researchers or supported by a major company, with the exception of Darknet, which is prominent in the area of visual object detection and useful if you want to use pure C. BTW, I'd describe non-CUDA GPU support as:
Thank you for your comment on the choice of DL frameworks. It seems reasonable to me. As for non-CUDA GPU support, I am not sure to what extent we should mention unofficial forks; we can expect that there are many forks widening non-CUDA GPU support, and it could be difficult to decide where to draw the line. Do you have any idea?
Could you let us know what the difference is between "full" and "partial" in the CNNs/RNNs rows?
It's not particularly precise, but essentially I wanted to distinguish between frameworks that aim to support all major variants/uses of CNNs/RNNs (it may not be easy to write them in the framework, but it is at least possible) and frameworks that have more limited support. For example, Caffe was not designed with NLP in mind and their RNN support is not flexible or customizable; DyNet is an NLP-focused framework that recently added basic convolution and pooling layers but wouldn't be a good choice if you want to write a complex computer vision model. A few frameworks don't intend to support certain use cases at all (e.g. Thinc is only for NLP and Darknet is only for computer vision).
Also, I don't think we need to mention non-CUDA support at least until AMD officially announces their ports (which will be the first performance-competitive, well-supported deep learning frameworks for non-NVidia GPUs) -- right now they're still in progress on GitHub.
I checked the table for MXNet, Torch7, and Thinc and think it looks fine. I did not see any inaccurate information.
Thank you for the PR! Could you tell me what "Per-batch architectures" means?
That row was in the original version of the comparison table. It means that the framework is capable of building a totally different network structure for each batch; that's essentially the same thing as define-by-run, but it emphasizes what you can do with it.
Oh, sorry, I didn't notice that. OK, now I understand it! Thank you for the kind explanation :)
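As a toy illustration of the per-batch / define-by-run idea discussed above: because the "graph" is just whatever Python control flow actually executes, a different architecture can be traced for every batch. This sketch is my own and uses plain NumPy, not Chainer's API:

```python
import numpy as np

def forward(batch):
    # Define-by-run: the computation performed IS the network definition,
    # so both the depth and the ops applied can differ for each batch.
    h = np.zeros(4)
    for x in batch:          # loop length varies with the batch
        if x > 0:            # even the operation chosen can vary per element
            h = np.tanh(h + x)
        else:
            h = h * 0.5
    return h

# Two batches of different lengths trace two different "architectures".
out_a = forward([1.0, -2.0])
out_b = forward([0.5, 1.5, -1.0, 2.0])
```

In a static-graph framework this variability would require padding, bucketing, or graph-rebuilding machinery; in a define-by-run framework it is just ordinary code.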
Hi, these are my comments for my assignment. Please correct me if anything is incorrect. General:
Theano-based:
Caffe1/2:
Darknet:
I came up with the same question when I was checking PyTorch. Why do you think multiprocessing support in PyTorch is partial?
Thanks for the detailed feedback, and for catching a bunch of mistakes! I'll fix the cells I was wrong about soon. Here are some clarifications:

> What are the differences among "Multi-GPU ~ parallelism", "Multiprocessing", and "Distributed training"?

> In "CPU/GPU backend", "custom" could be misunderstood as meaning that users can use their own custom backend. How about writing "native" instead?

Theano-based:
Caffe1/2:
Darknet:
I described multiprocessing support in PyTorch as partial because it's very difficult (I don't think anyone's made it work yet) to use the torch multiprocessing module to build synchronous multi-GPU training similar to
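For context, synchronous data-parallel training follows roughly this pattern: every replica computes a gradient on its own data shard, the gradients are averaged (an "all-reduce"), and all replicas apply the identical update before the next step. The sketch below is my own pure-Python illustration; threads stand in for the GPU worker processes, and none of this is PyTorch code:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def local_gradient(shard, w):
    # Each "replica" computes a gradient on its own data shard.
    # Toy model: y = w * x with target 0, squared-error loss.
    x = np.asarray(shard)
    return np.mean(2.0 * (w * x) * x)

def sync_step(shards, w, lr=0.1):
    # Synchronous step: wait for ALL replicas to finish, average
    # their gradients, then apply one identical update.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        grads = list(pool.map(lambda s: local_gradient(s, w), shards))
    return w - lr * np.mean(grads)

w = 1.0
w = sync_step([[1.0, 2.0], [3.0, 4.0]], w)  # one synchronous update
```

The hard part being discussed in the thread is doing this barrier-and-average step efficiently across real processes holding GPU tensors, not the arithmetic itself.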
On DL4J:
I called DL4J's RNN support "partial" because it offers three kinds of RNNs (BaseRecurrent and uni- and bidirectional LSTMs) that are not intended to be modified/customized by the user. In order to add another one, a user would have to implement both the forward and backward passes as raw array operations.
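To illustrate what "implement both the forward and backward passes as raw array operations" entails, here is a minimal NumPy sketch of one vanilla RNN step. This is my own illustration of the general technique, not DL4J code:

```python
import numpy as np

def rnn_step_forward(x, h_prev, Wx, Wh, b):
    # Forward pass of one step: h = tanh(x @ Wx + h_prev @ Wh + b)
    h = np.tanh(x @ Wx + h_prev @ Wh + b)
    return h, (x, h_prev, Wx, Wh, h)

def rnn_step_backward(dh, cache):
    # Backward pass written out by hand as raw array operations --
    # the kind of work a user would need to do to add a new RNN variant.
    x, h_prev, Wx, Wh, h = cache
    dz = dh * (1.0 - h * h)          # derivative through tanh
    return (dz @ Wx.T,               # dL/dx
            dz @ Wh.T,               # dL/dh_prev
            x.T @ dz,                # dL/dWx
            h_prev.T @ dz,           # dL/dWh
            dz.sum(axis=0))          # dL/db
```

Frameworks rated "full" spare the user this hand-derivation by differentiating the forward pass automatically.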
@jekbradbury The cuDNN support for RNNs has just a bit more work to finish: deeplearning4j/deeplearning4j#3339 (mainly just lack of bandwidth).

As for the autodiff component, you can find more here: https://github.com/deeplearning4j/nd4j/pull/1750. I am intending ND4J to be the "Chainer/Torch" equivalent; DL4J is likely going to stay higher level, closer to Keras.

As for runtime debugging, yes, it's equivalent to if not better than Python in this department. The JVM supports remote debugging via IntelliJ/Eclipse and your other favorite tools. You have to explicitly expose it on a port, though; then it's just like your local debugger.

The linear stack is just plain wrong, though. http://deeplearning4j.org/compgraph allows anything you want. We will be combining this with the autodiff support to be flexible just like the other frameworks in this case. We will also have a "computation graph" in our autodiff as well; this one will be the traditional computation graph with just raw math ops defined, with optimizations and the like, just like TF/Theano/Torch.

A "graph" plus a "workspace" (http://deeplearning4j.org/workspaces) is the equivalent of a TensorFlow session. This will allow for near GC-free workloads (due to buffer reuse) across a graph.

Scrolling up and seeing some of the other comparisons, I'll also briefly touch on multi-GPU (most folks never get this right). For single-node training we support ParallelWrapper, which is basically a data-parallel implementation that supports the same knobs our Spark implementation does (it makes some assumptions about single-node use and the like, though). Both of these support any arbitrary neural net config.
Adding to @agibsonccc's earlier comment
Depends on your definitions of "partial" and "modified/customized". :)
Thanks for the clarifications, Adam and Alex!
The columns for the frameworks I reviewed (Torch7, mxnet, Thinc) LGTM.
About Knet
@jekbradbury Thank you for updating the table. Also thank you @agibsonccc and @AlexDBlack for your invaluable comments.
About PaddlePaddle
For the CPU/GPU backend row, it could be better to fill something in when a framework implements its own array library but the library is not named (unlike CuPy). For example, "native" or the same name as the framework (as is done in the neon column).
About neon
The reason I listed PaddlePaddle's cuDNN support as "full" is that they wrap everything in cuDNN except the RNN functions, but they implement their own time-fused, cuDNN-like RNN kernels instead (I believe this is because they wrote them before cuDNN RNNs were available). So those are likely to be competitive with cuDNN in performance, which is not the case with most other frameworks' non-cuDNN RNN kernels.
Hey folks, just watching this thread here. Our equivalent for DL4J is a tensor lib called ND4J: http://nd4j.org/backend.html. CPU and GPU are supported. The basic pitch is "hardware as a jar file" rather than compile/link. The C++ internals are at https://github.com/deeplearning4j/libnd4j: one code base for CPU/GPU (mostly shared business logic for tensor primitives).
LGTM for the frameworks I reviewed.
@jekbradbury
Thank you for the fix!
jenkins, test this please.
LGTM except one comment
docs/source/comparison.rst
.. [6] Also available in the `Torch RNN package <https://github.com/Element-Research/rnn>`_
.. [7] Via `Platoon <https://github.com/mila-udem/platoon/>`_
.. [8] `Experimental as of May 2016 <http://deeplearning.net/software/theano/tutorial/using_multi_gpu.html>`_
This table compares Chainer with other actively developed deep learning frameworks. Content is current as of May 2017.
Please change "May" to "July"
test passed
As promised in #2685 I have updated the framework comparison table with (almost?) every actively developed deep learning framework and several new axes of comparison. Let me know if anything seems inaccurate or irrelevant.