- Usage
- Dependency
- Computational Graph
- Input
- The Model
- Minimizing loss and Improving Accuracy
- Changing weight and bias
- Download softmax.py
- Execute with

```shell
python softmax.py
```

- Visualize with

```shell
tensorboard --logdir /tmp/tboard/softmax
```

You can expect an accuracy of about 91%+ on the test images.
That's It!
- tensorflow 1.4 (and tensorboard for visualization)
- developed and tested with python 3.6
- you need write permission in /tmp (who doesn't?)
softmax.py is an implementation of a linear classifier with a softmax link function. Following is the computational graph.
Following is the input dataset used as an example: the MNIST dataset of handwritten digits. The code uses TensorFlow's built-in modules to download the data into your working directory. Inputs are of dimension 28x28x1 and are flattened before being used in the model.

```python
from tensorflow.examples.tutorials.mnist import input_data as mnist_data

mnist = mnist_data.read_data_sets("data", one_hot=True, reshape=False, validation_size=0)
```
It is a linear classifier model with a softmax link function for multiclass classification:

```python
y = tf.nn.softmax(tf.matmul(xx, w) + b)
```
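As a framework-free sketch of the same equation, here is what the forward pass looks like in NumPy. The shapes come from the text (784 flattened pixels, 10 digit classes); the batch here is random data, not MNIST, and the zero initialization is just a placeholder:

```python
import numpy as np

def softmax(z):
    # subtract the row-wise max before exponentiating for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
xx = rng.random((100, 784))    # a batch of 100 flattened 28x28x1 images
w = np.zeros((784, 10))        # weights, one column per digit class
b = np.zeros(10)               # biases

y = softmax(xx @ w + b)        # each row is a probability distribution over 10 digits
```

With zero weights every row of `y` is uniform (0.1 per class); training moves probability mass toward the correct digit.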
We use the cross-entropy loss function. Note that, due to the one-hot encoding of Y and taking the log, the loss value becomes numerically very small, and minimizing such a loss becomes numerically challenging in implementation. Hence, we scale the loss by multiplying by 1000 (which is 100 images per batch times 10 classes, so the scaled mean equals the cross-entropy summed over the whole batch).

```python
cross_entropy = -tf.reduce_mean(y_ * tf.log(y)) * 1000
```
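To make the scaling concrete, here is a NumPy sketch of the same loss, not the TensorFlow graph itself; the one-hot targets are random and the predictions are deliberately uniform:

```python
import numpy as np

def scaled_cross_entropy(y, y_):
    # mean over all batch*class entries, then scaled by 1000 as in the text
    eps = 1e-12                                  # guard against log(0)
    return -np.mean(y_ * np.log(y + eps)) * 1000

rng = np.random.default_rng(0)
y_ = np.eye(10)[rng.integers(0, 10, size=100)]   # one-hot targets, batch of 100
y = np.full((100, 10), 0.1)                      # uniform predictions

loss = scaled_cross_entropy(y, y_)
# the unscaled mean is ~0.23; multiplying by 1000 (= 100 images x 10 classes)
# gives ~230.26, i.e. 100 * ln(10), the cross-entropy summed over the batch
```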
We utilize a mini-batch technique to calculate the gradient descent. The batch size is 100 images, so at every descent step we use a different set of images (instead of all the images) to calculate the direction of descent. That is why you can see some variance in the loss output during the training phase. In the test phase, however, we do not see such variance. Below are graphs showing how the loss decreases and the accuracy increases in both the training phase (blue) and the test phase (red).
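A minimal NumPy sketch of the mini-batch loop described above, with toy random data standing in for MNIST. It relies on the fact that the gradient of the summed cross-entropy with respect to the logits of a softmax is `p - y`:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.random((1000, 784))                 # toy stand-in for the training images
Y = np.eye(10)[rng.integers(0, 10, 1000)]   # one-hot labels

w, b = np.zeros((784, 10)), np.zeros(10)
learning_rate, batch_size = 0.005, 100

for step in range(100):
    idx = rng.integers(0, len(X), batch_size)   # a different batch at every step
    p = softmax(X[idx] @ w + b)
    g = p - Y[idx]                              # d(cross-entropy) / d(logits)
    w -= learning_rate * X[idx].T @ g / batch_size
    b -= learning_rate * g.mean(axis=0)
```

Because each step sees only 100 images, the per-step loss is a noisy estimate of the full-data loss, which is exactly the variance visible in the training curve.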
And the zoomed-in version:
Learning rate (a hyperparameter of our model) tuning is achieved by running the computational graph with multiple settings. Following is a comparison of train and test performance for various learning rates.
```python
for learning_rate in [0.002, 0.005, 0.008]:
    # re-initialize variables and the session for each learning-rate setting
    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    with tf.name_scope('train'):
        train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
```
Impact of different learning rates on the test dataset:
Note that with the large learning rate (0.008) the loss function takes large steps but often does not move in the direction of decreasing loss (hence the spikes), as expected. The smallest learning rate, 0.002, has a very smooth descent to the minimum. A learning rate of 0.005 seems to be a good compromise between speed and accuracy of minimizing the loss function.
We can observe how our weights and biases change over the multiple steps of the training phase (1000 in this case) as the loss is minimized. Note that most of our weights stay close to zero, since very few pixels actually contribute to forming a digit (most of the pixels form the background).
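The "most weights stay near zero" observation can be illustrated directly: a pixel that is always zero (pure background) contributes nothing to the logits, so its weight receives zero gradient and never moves. A NumPy sketch with toy data, where only 60 of the 784 pixels ever vary and the labels are random (purely illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
active = rng.choice(784, size=60, replace=False)   # the few "digit" pixels
X = np.zeros((500, 784))
X[:, active] = rng.random((500, 60))               # background pixels stay 0
Y = np.eye(10)[rng.integers(0, 10, 500)]

w, b = np.zeros((784, 10)), np.zeros(10)
for step in range(200):
    idx = rng.integers(0, 500, 100)
    p = softmax(X[idx] @ w + b)
    g = p - Y[idx]
    w -= 0.005 * X[idx].T @ g / 100
    b -= 0.005 * g.mean(axis=0)

background = np.setdiff1d(np.arange(784), active)
# w[background] stays exactly zero: an always-zero pixel x contributes
# x * w = 0 regardless of w, so its gradient x.T @ g is 0
```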
Yet another useful view: