
CNN using swish

Swish was introduced in October 2017 as an alternative activation function to ReLU. It was found using a combination of exhaustive search and reinforcement learning. In the original paper [1], simply replacing all ReLU activations with swish improved top-1 classification accuracy on ImageNet by 0.9%. Swish is defined as f(x) = x * sigmoid(beta * x), where beta is either a fixed constant or a trainable parameter, so it is very easy to implement: a single line of code is enough in TensorFlow. This repository trains a CNN on MNIST using swish.

Example

x1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME') + B1  # 2-D convolution plus bias
Y1 = x1 * tf.nn.sigmoid(beta1 * x1)  # swish: f(x) = x * sigmoid(beta * x); output is 28x28
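
For a trainable beta (as used in the experiments below), a minimal sketch using the same TF1-style API looks like this; the function name and the initial value of 1.0 are illustrative, not the repository's exact code:

import tensorflow as tf

def swish(x, name="swish"):
    # beta is a trainable scalar; the Swish paper initialises it to 1.0
    beta = tf.Variable(1.0, name=name + "_beta")
    return x * tf.nn.sigmoid(beta * x)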

Results

[Plot: training loss during the initial phase of training]

During the initial phase of training the loss remains, on average, roughly constant. This suggests that swish suffers from poor initialisation, at least when the weights are initialised from a normal distribution with standard deviation 0.1.
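
For reference, a sketch of the initialisation described above (normally distributed weights with standard deviation 0.1); the kernel shape is an assumption for a 5x5 convolution on MNIST, not necessarily the repository's:

W1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))  # assumed 5x5 kernel, 1 input channel, 32 filters
B1 = tf.Variable(tf.zeros([32]))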

[Plot: value of beta1 during training]

We were unable to replicate the results reported in the Swish paper: beta1 did not converge to a value near 1 for us, possibly because we did not train the model for long enough.

[Plot: training loss with He initialisation]

He initialisation does not seem to help with this problem.
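
As a sketch, He initialisation for the same (assumed) kernel shape scales the standard deviation by sqrt(2 / fan_in):

fan_in = 5 * 5 * 1  # kernel height * kernel width * input channels (assumed shape)
W1 = tf.Variable(tf.random_normal([5, 5, 1, 32], stddev=(2.0 / fan_in) ** 0.5))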

[Plot: training loss after switching from SGD to RMSProp]

After switching from SGD to RMSProp we immediately get better results.
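
The change amounts to swapping the optimizer. A sketch with the TF1 API; the learning rates and the loss name cross_entropy are placeholders, not the repository's exact values:

# before: train_step = tf.train.GradientDescentOptimizer(0.003).minimize(cross_entropy)
train_step = tf.train.RMSPropOptimizer(0.001).minimize(cross_entropy)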

Reference

  1. Searching for Activation Functions https://arxiv.org/abs/1710.05941
