This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Added SELU, ELU and Swish as activation functions for Gluon Interface #9111

Closed
wants to merge 67 commits into from


Contributor

@anjishnu anjishnu commented Dec 17, 2017

Description

Added SELU and ELU activation functions, as recommended in thread #8422.
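For readers unfamiliar with the two activations, here is a minimal scalar sketch of their standard definitions in plain Python. It is not the code from this PR, which targets the Gluon API; the names `elu`, `selu`, `SELU_ALPHA`, and `SELU_SCALE` are illustrative assumptions, and the constants are the fixed values from the SELU paper rounded to double precision.

```python
import math

# Fixed constants from the SELU paper, rounded to double precision.
# The names are illustrative; they are not taken from this PR.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    # expm1 computes exp(x) - 1 accurately for x close to zero.
    return x if x > 0 else alpha * math.expm1(x)


def selu(x):
    """SELU: ELU with a fixed alpha, multiplied by a fixed scale."""
    return SELU_SCALE * elu(x, SELU_ALPHA)
```

Both functions saturate for large negative inputs: `elu` approaches `-alpha`, and `selu` approaches `-SELU_SCALE * SELU_ALPHA`.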

Checklist

Essentials

  • Passed code style checking (make lint)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@dongzhuoyao
Contributor

well done!

Kumar and others added 2 commits December 17, 2017 23:00
- Use module API instead of the deprecated model API
- Reduce the number of epochs to 10 (which is sufficient)
- When using GPU, use GPU 0 and not GPU 1. Many machines will have only one GPU
- Use CPU by default. Make it easy for the user to switch to GPU if needed.
@kobenaxie

Why not use 'HybridBlock' ?

@chinakook
Contributor

chinakook commented Dec 18, 2017

How about the Swish (https://arxiv.org/abs/1710.05941) activation function?
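As a rough illustration of the function being proposed, Swish multiplies the input by a sigmoid gate. The sketch below is plain scalar Python, not Gluon code and not part of this PR; the helper names `sigmoid` and `swish` and the `beta` parameter (the generalized form, with `beta=1.0` giving `x * sigmoid(x)`) are assumptions made for the example.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x, beta=1.0):
    """Swish: the input scaled by a sigmoid gate, x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)
```

For large positive inputs the gate approaches 1 and Swish behaves like the identity; for large negative inputs it approaches 0.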

@anjishnu
Contributor Author

anjishnu commented Dec 18, 2017

@kobenaxie I'm not too sure about Blocks vs. HybridBlocks - what would be the difference? If it's just changing the class I inherit from, I can do that right now.

@chinakook That's a great idea! - Once this is merged we can extend it with other activations.

@cjolivier01 I added the license, and the test now seems to be failing because of non-ASCII characters in the author's name in the comment section, even though I've specified UTF-8 encoding. Do I need to remove those as well?

srochel and others added 23 commits December 18, 2017 10:16
Added assertion to validate achieved accuracy.
* Update for MXNet 1.0. PEP8 fixes to code and misc improvements. Tested under Python 2.7 and Python 3.6 on Sagemaker.

* Remove defunct tutorial page

* Remove defunct demo

* Remove duplicate material
* add shared storage in windows

* fix

* lint

* fix

* fix

* fix

* fix process.h
* Updating the Python readme

* Addressing PR comments for /python dir
* Example updates to make it work on Python3, free of lint issues, more clear and easy to use

* Addressing PR comments for CFN-xs example updates
* Update linear-regression.md

Reduce the number of epochs to 20, as validation accuracy doesn't improve any further.
Added assertion to check for achieved accuracy in preparation for tutorial regression.

* fixed location of assertion

moved assertion to correct location
* More details to the windows build process

* Removed the package instructions.
* csr slice, gpu implementation

* update comments

* test already exists

* common impl of csr slice on dim one

* remove unnecessary stream->wait

* add doc

* trigger
* [BugFix][CoreML Converter] Dense layers w/o bias.

The code was initially assuming bias to be true for dense layer. This change fixes that. Also, added a unit test to verify the change.

This is related to issue: #8628

* attr -> attrs.
* usability improvements
support for py3 and mac

Signed-off-by: Rahul <rahulhuilgol@gmail.com>

* more py3 changes

* fix cpickle changes py3

Signed-off-by: Rahul <rahulhuilgol@gmail.com>

* fix cpickle changes py3

Signed-off-by: Rahul <rahulhuilgol@gmail.com>

* fix cpickle changes py3

Signed-off-by: Rahul <rahulhuilgol@gmail.com>

* change float used to index to int

Signed-off-by: Rahul <rahulhuilgol@gmail.com>

* change float used to index to int

Signed-off-by: Rahul <rahulhuilgol@gmail.com>

* change float used to index to int

Signed-off-by: Rahul <rahulhuilgol@gmail.com>

* fix xrange

Signed-off-by: Rahul <rahulhuilgol@gmail.com>

* import compatibility

Signed-off-by: Rahul <rahulhuilgol@gmail.com>

* int numpy deprecation

Signed-off-by: Rahul <rahulhuilgol@gmail.com>

* add note about future

Signed-off-by: Rahul <rahulhuilgol@gmail.com>

* cpu and windows messages

Signed-off-by: Rahul <rahulhuilgol@gmail.com>

* change cpickle to pickle for both py2 and py3

Signed-off-by: Rahul <rahulhuilgol@gmail.com>

* remove six dependency

Signed-off-by: Rahul <rahulhuilgol@gmail.com>
* Usability improvements for some examples

* some more modifications

* formatting fixes

* shape fixed

* comments added

* fix

* fix

* comments addressed

* fix

* num-gpus changed to 1
* Some changes to make dqn compatible with python 3

* Update README

* union of dict_items does not work in python2.7. Change to list
* add zero-grad for rounding ops

* Update test_symbol.py
* changed url references from dmlc to apache/incubator-mxnet

* updated to cuda9 and cudnn7
* warning_autotune

* fix
* Fixed documentation in mnist tutorial
* dimensions of weight and bias were incorrectly reported and the formula for FC layer didn't mention that weight is transposed.

* Added a brief description of broadcasting.

* Added a link to sym.broadcast_to() call that explains broadcasting and also links to numpy broadcasting semantics.
* Included a brief conceptual explanation of broadcasting

* Fixed a grammar error in the description of MLPs.
* Improve usability for the bilstm example

* Remove argparse from infer_sort since it changes existing usage
* Add unittest for float16 min and max

* Add mshadow fix
* Add wikitext-2 data for rnnlm example in gluon.

* Add Wikitext2 for rnnlm.

* Add performance data in WikiText-2.
@anjishnu
Contributor Author

anjishnu commented Dec 25, 2017

This commit is becoming a bit of a mess.
I have only changed the following files:

tests/python/unittest/test_gluon_contrib.py
incubator-mxnet/python/mxnet/gluon/contrib/activations.py
incubator-mxnet/python/mxnet/gluon/contrib/__init__.py

I tried to merge with upstream/master, and all those commits have appeared and are polluting this PR. The Python 2 tests are also failing for a reason that seems completely unrelated to my changes. Does anyone know what might be happening here?

@mseeger
Contributor

mseeger commented Dec 28, 2017

This is all a bit of a mess. Why have so many commits? This should be a single commit. Please squash your commits and use rebase instead of merge.

@piiswrong
Contributor

The original author seems to have disappeared.

Can one of the committers take over?
@yzhliu @szha

@szha szha self-assigned this Jan 31, 2018
@szha szha mentioned this pull request Feb 1, 2018
@szha
Member

szha commented Feb 1, 2018

working on it in the above PR.

@szha szha closed this Feb 1, 2018
@anjishnu
Contributor Author

anjishnu commented Feb 1, 2018

Sorry guys, I got a bit busy with some deadlines. I need more practice with GitHub PRs and with making them clean and concise.
