You will be implementing networks to recognize handwritten Hiragana symbols, using the Kuzushiji-MNIST dataset (KMNIST for short). The Japanese language changed significantly when Japan reformed its education system in 1868, and the majority of Japanese people today cannot read texts published more than 150 years ago. The dataset contains 10 Hiragana characters with 7000 samples per class.
- Implement a model NetLin which computes a linear function of the pixels in the image, followed by log softmax. Run the code by typing:
python3 kuzu_main.py --net lin
Produce the final accuracy and confusion matrix. Note that the rows of the confusion matrix indicate the target character, while the columns indicate the one chosen by the network. (0="o", 1="ki", 2="su", 3="tsu", 4="na", 5="ha", 6="ma", 7="ya", 8="re", 9="wo").
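A minimal sketch of what NetLin could look like in PyTorch. The class name and the 28x28 KMNIST input size follow the assignment; the exact module structure shown here is an assumption, not the reference solution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetLin(nn.Module):
    """Linear function of the 28x28 input pixels, followed by log softmax."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)  # 784 pixels -> 10 classes

    def forward(self, x):
        x = x.view(x.shape[0], -1)        # flatten each image to a 784-vector
        return F.log_softmax(self.fc(x), dim=1)
```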
- Implement a fully connected 2-layer network NetFull (i.e. one hidden layer, plus the output layer), using tanh at the hidden nodes and log softmax at the output layer. Run the code by typing:
python3 kuzu_main.py --net full
Experiment with different numbers of hidden nodes (multiples of 10) to determine a value that achieves high accuracy (at least 84%) on the test set. Produce the final accuracy and confusion matrix.
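A sketch of one possible NetFull implementation. The hidden-layer size is the value you are asked to tune; the default of 120 here is an arbitrary placeholder, not a recommended setting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetFull(nn.Module):
    """Two-layer fully connected network: tanh hidden layer, log softmax output."""
    def __init__(self, hidden=120):   # hidden size is the hyperparameter to tune
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, hidden)
        self.fc2 = nn.Linear(hidden, 10)

    def forward(self, x):
        x = x.view(x.shape[0], -1)    # flatten each image to a 784-vector
        h = torch.tanh(self.fc1(x))   # tanh at the hidden nodes
        return F.log_softmax(self.fc2(h), dim=1)
```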
- Implement a convolutional network called NetConv, with two convolutional layers plus one fully connected layer, all using the relu activation function, followed by the output layer, using log softmax. You are free to choose for yourself the number and size of the filters, the metaparameter values (learning rate and momentum), and whether to use max pooling or a fully convolutional architecture. Run the code by typing:
python3 kuzu_main.py --net conv
Your network should consistently achieve at least 93% accuracy on the test set after 10 training epochs. Produce the final accuracy and confusion matrix.
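One possible NetConv sketch using max pooling. The filter counts (16 and 32), kernel size (5), and hidden size (128) are illustrative choices left open by the assignment, not required values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetConv(nn.Module):
    """Two conv layers and one fully connected layer (all relu), log softmax output."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, padding=2)
        self.fc1 = nn.Linear(32 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # 28x28 -> 14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # 14x14 -> 7x7
        x = x.view(x.shape[0], -1)                  # flatten feature maps
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)
```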
- Briefly discuss the following points:
- the relative accuracy of the three models,
- the confusion matrix for each model: which characters are most likely to be mistaken for which other characters, and why?
NetLin final accuracy: Test set: Average loss: 1.0102, Accuracy: 6967/10000 (70%)
Confusion Matrix
NetFull final accuracy: Test set: Average loss: 0.4974, Accuracy: 8492/10000 (85%)
Confusion Matrix
NetConv final accuracy: Test set: Average loss: 0.2481, Accuracy: 9387/10000 (94%)
Confusion Matrix
It is clear from the results above that accuracy improves as model complexity increases. NetLin, the simplest of the three models, had the lowest accuracy at around 70%. NetFull, which adds a hidden layer with tanh activation, performed better at 85%. NetConv, which combines convolutional layers with a fully connected layer using relu activations, achieved the highest accuracy of the three at 94%.
Fig 1.4 – Most frequent misclassifications (red = most frequent, orange = 2nd most frequent, yellow = 3rd most frequent)
If we look at NetConv and its three most frequent misclassifications, in descending order from Fig 1.4 above, this model is most likely to mistake:
- は (ha) for す (su)
- き (ki) for ま (ma)
- お (o) for な (na)
All three models misclassify は (ha) as す (su) and き (ki) as ま (ma). However, NetLin and NetFull most often mistake お (o) for や (ya) or は (ha), whereas NetConv mistakes it for な (na).
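The most frequent confusion pairs discussed above can be read off a confusion matrix programmatically. A sketch assuming the matrix is a 10x10 NumPy array with rows as targets and columns as predictions (matching the convention stated in the assignment); the helper name `top_confusions` is hypothetical.

```python
import numpy as np

# Class labels in KMNIST index order, as given in the assignment.
labels = ["o", "ki", "su", "tsu", "na", "ha", "ma", "ya", "re", "wo"]

def top_confusions(conf, k=3):
    """Return the k largest off-diagonal entries as (target, predicted, count)."""
    conf = conf.copy().astype(float)
    np.fill_diagonal(conf, 0)                    # ignore correct predictions
    flat = np.argsort(conf, axis=None)[::-1][:k] # indices of largest confusions
    return [(labels[i // 10], labels[i % 10], int(conf.flat[i])) for i in flat]
```

Running this on each model's confusion matrix gives the ranked misclassification pairs shown in Fig 1.4.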
To understand why some characters may be mistaken for others, we examine three comparisons below. In all three models, は (ha) is often mistaken for す (su), and き (ki) for ま (ma); for both pairs, the circled regions in the comparisons highlight two very similar features shared by the two characters.
For the NetConv model, お (o) is often mistaken for な (na), whereas the other two models, NetLin and NetFull, mistake お (o) for either や (ya) or は (ha). By inspection, the misclassification made by NetConv is more plausible than those made by NetLin and NetFull for this example.