CS385 MNIST Project

Introduction

This repository contains the first project for SJTU CS385, Machine Learning.

The task is to classify the MNIST dataset of handwritten digits, which has 60,000 training samples and 10,000 test samples.

Traditional Methods

The following are the traditional (non-CNN) methods. I implemented all of them myself except the Support Vector Machine, which uses the sklearn.svm package.

Logistic Regression
Linear Discriminant Analysis
Support Vector Machine
Logistic Regression with Ridge Loss
Logistic Regression with Lasso Loss
Kernel-based Logistic Regression with Ridge Loss
Kernel-based Logistic Regression with Lasso Loss
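The methods above share one core: a logistic model trained by gradient descent, optionally with a ridge (L2) or lasso (L1) penalty, and optionally applied to kernel features instead of raw pixels. The following is a minimal sketch of that idea, not the code in this repository. It assumes a binary classifier with labels in {0, 1} and full-batch updates; the names `fit_logistic`, `rbf_kernel`, `cosine_kernel` and the defaults `lam`, `lr`, `epochs`, `gamma` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, penalty=None, lam=0.01, lr=0.1, epochs=200):
    """Binary logistic regression by full-batch gradient descent.

    X: (n, d) features, y: (n,) labels in {0, 1}.
    penalty: None, "ridge" (L2) or "lasso" (L1); lam is its strength.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        grad_w = X.T @ (p - y) / n      # gradient of the mean log loss
        grad_b = np.mean(p - y)
        if penalty == "ridge":
            grad_w += lam * w           # gradient of (lam / 2) * ||w||^2
        elif penalty == "lasso":
            grad_w += lam * np.sign(w)  # subgradient of lam * ||w||_1
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def rbf_kernel(A, B, gamma=0.5):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def cosine_kernel(A, B, eps=1e-12):
    """K[i, j] = cosine similarity between A[i] and B[j]."""
    An = A / (np.linalg.norm(A, axis=1, keepdims=True) + eps)
    Bn = B / (np.linalg.norm(B, axis=1, keepdims=True) + eps)
    return An @ Bn.T

# Toy usage on two Gaussian blobs (synthetic data, not MNIST).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w, b = fit_logistic(X, y, penalty="ridge")
acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print("training accuracy:", acc)
```

One common way to get a kernel-based variant is to pass the kernel matrix `rbf_kernel(X_train, X_train)` as the feature matrix, so each weight belongs to one training sample. Whether the repository does exactly this is an assumption.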

CNN Architecture

The following are the CNN architectures that I have implemented using PyTorch.

VGG11
NaiveNet (2 conv layers and 2 fc layers)
VeryNaiveNet (2 conv layers, followed by global pooling and 1 fc layer)
LinearNet (a single linear layer)
ResNet18
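As an illustration of the smallest convolutional model in the list, here is a rough PyTorch sketch of a network with 2 conv layers, global pooling and 1 fc layer. Only that layer count comes from the description above. The class name `VeryNaiveNetSketch`, the channel widths (16 and 32), the 3 x 3 kernels, the ReLU activations and the use of average pooling are all assumptions, so the actual model in `models` may differ.

```python
import torch
import torch.nn as nn

class VeryNaiveNetSketch(nn.Module):
    """2 conv layers -> global average pooling -> 1 fc layer (hypothetical sizes)."""

    def __init__(self, in_channels=1, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Global pooling collapses each feature map to a single value.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        x = self.features(x)         # (N, 32, H, W)
        x = self.pool(x).flatten(1)  # (N, 32)
        return self.fc(x)            # (N, num_classes)

model = VeryNaiveNetSketch()
logits = model(torch.randn(4, 1, 28, 28))  # a batch of 4 MNIST-sized inputs
print(logits.shape)
```

Because of the global pooling, the fc layer sees one value per feature map, which is what makes the per-class contribution of each map easy to visualize.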

Performance

To save space, I show only the final 10-category results here and omit the per-category accuracy of the traditional methods.

Method	Accuracy	Remark
Logistic Regression	0.8821	Initial
Logistic Regression	0.8836	Ridge loss
Logistic Regression	0.8890	Lasso loss
Logistic Regression	0.8802	Lasso + batch
Kernel Regression	0.8312	Ridge + RBF
Kernel Regression	0.6062	Ridge + COS
Kernel Regression	0.8062	Lasso + RBF
Kernel Regression	0.5687	Lasso + COS
LDA	0.8080	\
SVM	0.8880	Linear
SVM	0.9380	RBF
SVM	0.9250	Poly
SVM	0.832	Sigmoid
LinearNet	0.9149	\
VeryNaiveNet	0.9845	\
NaiveNet	0.9910	\
VGG11	0.9935	\
ResNet18	0.9959	\

Generative Adversarial Network

Feature visualization and Grad-CAM

Structure of VeryNaiveNet with Feature Visualization

The data flow of GradCAM
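The Grad-CAM computation itself is short: average the gradient of the target class score over each feature map to get one weight per channel, take the weighted sum of the feature maps, and keep only the positive part. The NumPy sketch below shows that step alone, assuming the activations and gradients of the chosen conv layer have already been extracted for one image. The function name `grad_cam` and the final max-normalization are illustrative choices, not taken from `gradcam.py`.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one image.

    activations: (K, H, W) feature maps of the chosen conv layer.
    gradients:   (K, H, W) gradient of the target class score
                 with respect to those feature maps.
    """
    weights = gradients.mean(axis=(1, 2))             # (K,) one weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # (H, W) weighted sum of maps
    cam = np.maximum(cam, 0.0)                        # ReLU keeps positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # scale to [0, 1] for display
    return cam

# Usage with random stand-in tensors (real ones come from a forward/backward pass).
rng = np.random.default_rng(0)
acts = rng.random((8, 7, 7))
grads = rng.normal(size=(8, 7, 7))
heatmap = grad_cam(acts, grads)
print(heatmap.shape)
```

The heatmap has the spatial size of the feature maps, so it is usually upsampled to the input resolution before being overlaid on the image.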

Some GradCAM results

Experiments

Kernel logistic regression with different number of samples

The bias term in LDA

The full potential in CNN

Partition each image into blocks (4 x 4 in the report) and randomly shuffle the blocks.
For each pixel, concatenate its x and y coordinates as the image's 2nd and 3rd channels, so that the input images have 3 channels.
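Both input transformations can be sketched in a few lines of NumPy. This is an illustration under assumptions: "4 x 4" is read here as a 4 x 4 grid of blocks (7 x 7 pixels each on a 28 x 28 image), and the coordinates are scaled to [0, 1]. The function names `shuffle_blocks` and `add_coord_channels` are hypothetical and need not match `vis_reshape_img.py`.

```python
import numpy as np

def shuffle_blocks(img, grid=4, rng=None):
    """Split a (H, W) image into grid x grid blocks and permute them randomly."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    bh, bw = h // grid, w // grid  # block size; H and W must be divisible by grid
    # (H, W) -> (grid, bh, grid, bw) -> (grid, grid, bh, bw) -> (grid*grid, bh, bw)
    blocks = img.reshape(grid, bh, grid, bw).swapaxes(1, 2).reshape(grid * grid, bh, bw)
    blocks = blocks[rng.permutation(grid * grid)]
    # Reverse the reshaping to stitch the permuted blocks back into an image.
    return blocks.reshape(grid, grid, bh, bw).swapaxes(1, 2).reshape(h, w)

def add_coord_channels(img):
    """Stack a (H, W) image with its x and y coordinate maps into (3, H, W)."""
    h, w = img.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x_ch = xs / (w - 1)  # column index scaled to [0, 1]
    y_ch = ys / (h - 1)  # row index scaled to [0, 1]
    return np.stack([img, x_ch, y_ch]).astype(np.float32)

# Usage on a random stand-in for one MNIST image.
rng = np.random.default_rng(0)
img = rng.random((28, 28))
shuffled = shuffle_blocks(img, grid=4, rng=rng)
with_coords = add_coord_channels(img)
print(shuffled.shape, with_coords.shape)
```

Shuffling only moves pixels around, so the multiset of pixel values is unchanged, while the coordinate channels give every pixel an explicit position.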
