
CNN using swish

Swish was introduced in October 2017 as an alternative activation function to ReLU. It was found using a combination of exhaustive search and reinforcement learning. In the original paper [1], simply replacing all ReLU activations with swish improved top-1 classification accuracy on ImageNet by 0.9%. Swish is defined as f(x) = x * sigmoid(beta * x), where beta is either a fixed constant or a trainable parameter, so it is very easy to implement: a single line of code is enough in TensorFlow. This repository trains a CNN on MNIST using swish.

Example

x1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME') + B1  # 2-D convolution plus bias
Y1 = x1 * tf.nn.sigmoid(beta1 * x1)  # swish: f(x) = x * sigmoid(beta * x); output is 28x28
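
For a trainable beta (as used in the experiments below), a minimal sketch using the same TF1-style API looks like this; the function name and the initial value of 1.0 are illustrative, not the repository's exact code:

import tensorflow as tf

def swish(x, name="swish"):
    # beta is a trainable scalar; the Swish paper initialises it to 1.0
    beta = tf.Variable(1.0, name=name + "_beta")
    return x * tf.nn.sigmoid(beta * x)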

Results

[Plot: training loss during the initial phase of training]

During the initial phase of training the loss remains, on average, roughly constant. This suggests that swish suffers from poor initialisation, at least when the weights are initialised from a normal distribution with standard deviation 0.1.
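
For reference, a sketch of the initialisation described above (normally distributed weights with standard deviation 0.1); the kernel shape is an assumption for a 5x5 convolution on MNIST, not necessarily the repository's:

W1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))  # assumed 5x5 kernel, 1 input channel, 32 filters
B1 = tf.Variable(tf.zeros([32]))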

[Plot: value of beta1 during training]

We were unable to replicate the results reported in the Swish paper: beta1 did not converge to a value near 1 for us, possibly because we did not train the model for long enough.

[Plot: training loss with He initialisation]

He initialisation does not seem to help with this problem.
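
As a sketch, He initialisation for the same (assumed) kernel shape scales the standard deviation by sqrt(2 / fan_in):

fan_in = 5 * 5 * 1  # kernel height * kernel width * input channels (assumed shape)
W1 = tf.Variable(tf.random_normal([5, 5, 1, 32], stddev=(2.0 / fan_in) ** 0.5))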

[Plot: training loss after switching from SGD to RMSProp]

After switching from SGD to RMSProp we immediately get better results.
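
The change amounts to swapping the optimizer. A sketch with the TF1 API; the learning rates and the loss name cross_entropy are placeholders, not the repository's exact values:

# before: train_step = tf.train.GradientDescentOptimizer(0.003).minimize(cross_entropy)
train_step = tf.train.RMSPropOptimizer(0.001).minimize(cross_entropy)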

Reference

  1. Searching for Activation Functions https://arxiv.org/abs/1710.05941
