This repo implements a domain adaptation neural network. The design was largely inspired by the paper *Unsupervised Domain Adaptation by Backpropagation* by Yaroslav Ganin and Victor Lempitsky.
The goal of this repo is to train a network that classifies MNIST samples while using only labelled SVHN samples and unlabelled MNIST samples during training.
```bash
virtualenv -p python3 env
source env/bin/activate
pip install -r requirements.txt
python utils/download_data.py
```
All the configuration variables are in the file `utils/config.py`.
To get a baseline of the performance without domain adaptation, I tested a simple Convolutional Neural Network. The architecture is described in the `cnn` picture.
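For reference, a classifier of this kind could look like the following Keras sketch. The layer types and sizes here are assumptions (the picture is the authoritative description); only the 512-unit Dense layer is confirmed later in this README:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(input_shape=(32, 32, 3), n_classes=10):
    """Illustrative convolutional classifier; the actual filter counts
    and kernel sizes in the repo may differ."""
    return keras.Sequential([
        keras.Input(shape=input_shape),  # SVHN images are 32x32x3
        layers.Conv2D(32, 5, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 5, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),  # features visualized with t-SNE below
        layers.Dense(n_classes, activation="softmax"),
    ])
```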
The network with domain adaptation was designed so that the label-prediction part of the architecture is the same as the previous CNN, for two main reasons:
- to compare the performance of similar networks
- to load pre-trained weights and make training easier

The architecture is described in these pictures: `cnn_grl_model` and `cnn_grl_fe`. A sketch of the key component follows.
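The core of this architecture is the gradient reversal layer (GRL): it acts as the identity in the forward pass and multiplies the gradient by -λ in the backward pass, so the shared feature extractor learns to confuse the domain classifier. Here is a minimal sketch of such a layer in TensorFlow/Keras; the repo's actual implementation and framework may differ:

```python
import tensorflow as tf
from tensorflow.keras import layers

class GradientReversal(layers.Layer):
    """Identity in the forward pass; scales the gradient by -lam
    in the backward pass (Ganin & Lempitsky)."""

    def __init__(self, lam=1.0, **kwargs):
        super().__init__(**kwargs)
        self.lam = lam

    def call(self, x):
        @tf.custom_gradient
        def _reverse(x):
            def grad(dy):
                # Reverse the gradient flowing back into the feature extractor.
                return -self.lam * dy
            return tf.identity(x), grad
        return _reverse(x)
```

The domain classifier branch would then be attached to the shared features through this layer, e.g. `domain_output = domain_head(GradientReversal(lam=1.0)(features))`, where `domain_head` is a hypothetical name for the domain classification sub-network.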
To display the training options:

```bash
python train.py -h
```

To evaluate a trained network:

```bash
python evaluate.py
```
Network | Source (accuracy) | Target (accuracy) |
---|---|---|
CNN | SVHN (0.908) | MNIST (0.601) |
CNN-GRL | SVHN (0.883) | MNIST (0.711) |
CNN | MNIST (0.986) | SVHN (0.230) |
CNN-GRL | MNIST (0.982) | SVHN (0.238) |
It seems that the Gradient Reversal Layer leads to a significant improvement in classifying MNIST when the network is trained on SVHN. However, the GRL does not bring a comparable gain in the opposite direction (MNIST to SVHN).
The next two plots visualize the features built by the different networks: a 2-component t-SNE of the output of the 512-unit Dense layer in each network. I only used 3000 samples to make the t-SNE computation faster.
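For illustration, this computation could be done with scikit-learn as follows. The `features` and `domains` arrays are hypothetical names standing in for the 512-unit Dense activations and the domain of each sample:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(3000, 512))   # stand-in for real Dense-layer activations
domains = rng.integers(0, 2, size=3000)   # 0 = MNIST sample, 1 = SVHN sample

# Project the 512-dimensional features down to 2 components.
embedded = TSNE(n_components=2, random_state=0).fit_transform(features)

for domain, name in [(0, "MNIST"), (1, "SVHN")]:
    mask = domains == domain
    plt.scatter(embedded[mask, 0], embedded[mask, 1], s=4, label=name)
plt.legend()
plt.show()
```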
The goal of the network with the gradient reversal layer is to build features that are independent of the input domain (MNIST or SVHN, for example). Thus, the features built by the CNN-GRL network should be more mixed across domains than the ones built by the classic CNN.
Even if it isn't completely obvious, the features built by the CNN-GRL architecture do seem more mixed than the ones built by the CNN architecture.
The same computation on the whole datasets would probably lead to a clearer result.