This repository walks through the steps of the MixMatch semi-supervised learning algorithm.
- Original paper: "MixMatch: A Holistic Approach to Semi-Supervised Learning" by David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin Raffel.
- Official code provided by Google Research: https://github.com/google-research/mixmatch
This algorithm was originally designed for image classification and usually requires CUDA support. During my internship, together with Yang Wan and Minchenxi Zhou, I used this approach and modified the code for tabular data and a CPU-only environment. I cannot upload the code here because of confidentiality requirements, but I can share my understanding of the algorithm, since that is public content :)
- Split the data into labeled data & unlabeled data
- Data preprocessing (a code sketch follows this list)
- Drop features where over 95% of the values are missing
- Impute missing values: -1 for integer features (discrete values), the column mean for float features (continuous values)
- Construct matrices after dimension separation: reshape each preprocessed row into a matrix
- The final data should be a 4-dimensional tensor:
- sample size
- number of channels (3 for color images, 1 for grayscale or non-image data)
- matrix height
- matrix width
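
A minimal sketch of this preprocessing in pandas/NumPy. The set of discrete columns and the 16×16 target matrix size are illustrative assumptions, not details from the original project:

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, int_cols: set, side: int = 16) -> np.ndarray:
    """Drop sparse features, impute, and reshape rows into a 4-D tensor."""
    # Drop features where over 95% of the values are missing
    df = df.loc[:, df.isna().mean() <= 0.95].copy()

    # Impute: -1 for discrete (int) features, the column mean for continuous (float) features
    for col in df.columns:
        fill = -1 if col in int_cols else df[col].mean()
        df[col] = df[col].fillna(fill)

    # Zero-pad each row to side*side values, then reshape to
    # (sample size, channels, matrix height, matrix width)
    x = df.to_numpy(dtype=np.float32)
    pad = side * side - x.shape[1]
    x = np.pad(x, ((0, 0), (0, max(pad, 0))))[:, : side * side]
    return x.reshape(-1, 1, side, side)
```
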
- Number of augmentations: 1 for labeled data; K (a hyperparameter) for unlabeled data
- For image data
- Strong augmentation: sharpening, adjusting saturation, and adjusting color temperature
- Weak augmentation: translation, rotation, and cropping
- For tabular data
- Random flipping and cropping of the matrices, where cropping substitutes the margins with all 0 (see the sketch below)
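
A sketch of such an augmentation for a single (channels, height, width) sample; the maximum margin width is an assumed hyperparameter:

```python
import numpy as np

def augment(x: np.ndarray, max_margin: int = 2, rng=None) -> np.ndarray:
    """Random flipping plus zero-margin 'cropping' of one (C, H, W) matrix."""
    rng = rng or np.random.default_rng()
    out = x.copy()
    # Random horizontal / vertical flips
    if rng.random() < 0.5:
        out = out[:, :, ::-1]
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Cropping: substitute a randomly sized margin with all 0
    m = int(rng.integers(0, max_margin + 1))
    if m > 0:
        out[:, :m, :] = 0
        out[:, -m:, :] = 0
        out[:, :, :m] = 0
        out[:, :, -m:] = 0
    return out
```
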
- Generate pseudo-labels for the unlabeled data with the model: run all K augmented versions of each sample through it and take the mean of the predictions as the result
- For image data, Wide-ResNet-28 (a 28-layer wide residual network) is commonly used; technically, other classification architectures could work too, depending on the data format
- Wide-ResNet structure
- Minimize entropy by sharpening the averaged prediction toward a one-hot distribution: raise each class probability to the power 1/T (temperature T = 0.5 in the paper) and renormalize, so the largest value becomes noticeably dominant (see the sketch below)
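
A sketch of this label-guessing step. `model_predict` is a stand-in for whatever function maps a batch of inputs to class probabilities; T = 0.5 is the paper's default temperature:

```python
import numpy as np

def sharpen(p: np.ndarray, T: float = 0.5) -> np.ndarray:
    """Entropy minimization: raise probabilities to 1/T and renormalize.

    As T approaches 0, the output approaches a one-hot distribution.
    """
    p = p ** (1.0 / T)
    return p / p.sum(axis=-1, keepdims=True)

def guess_labels(model_predict, u_augmented, T: float = 0.5) -> np.ndarray:
    """Average predictions over the K augmentations (shape (K, batch, ...)), then sharpen."""
    mean_pred = np.mean([model_predict(u_k) for u_k in u_augmented], axis=0)
    return sharpen(mean_pred, T)
```
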
- Concatenate X and U into W, then shuffle W
- X: the augmented labeled dataset and its labels
- U: the augmented unlabeled dataset and its pseudo-labels
- hyperparameter α = 0.75
- λ ~ Beta(α, α)
- λ' = max(λ, 1-λ)
- x' = λ'x₁ + (1-λ')x₂
- p' = λ'p₁ + (1-λ')p₂
- Mix up every element in W (see the sketch below)
- Taking λ' = max(λ, 1-λ) guarantees that (x₁, p₁) keeps the dominant weight in each mixed pair, so the mixed X's still behave like labeled data and the mixed U's like unlabeled data
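
A sketch of the MixUp step. For brevity it draws a single λ per call, while the paper samples one λ per example pair:

```python
import numpy as np

def mixup(x1, p1, x2, p2, alpha: float = 0.75, rng=None):
    """MixMatch-style MixUp: λ' = max(λ, 1-λ) keeps (x1, p1) dominant."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    return lam * x1 + (1.0 - lam) * x2, lam * p1 + (1.0 - lam) * p2
```
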
- Calculate the cross entropy Lx between predictions and labels for the labeled data
- Calculate the squared L2 distance Lu between predictions and pseudo-labels for the unlabeled data
- L = Lx + λuLu
- hyperparameter λu = 100 (a sketch of the combined loss follows)
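
A sketch of the combined loss, assuming the model outputs class probabilities and both labels and pseudo-labels are distributions:

```python
import numpy as np

def mixmatch_loss(pred_x, labels_x, pred_u, targets_u, lambda_u: float = 100.0):
    """L = Lx + λu·Lu: cross entropy on labeled data, squared L2 on unlabeled data."""
    eps = 1e-8  # small constant for numerical stability (an implementation choice)
    # Lx: cross entropy between labels and predictions on the mixed labeled batch
    l_x = -np.mean(np.sum(labels_x * np.log(pred_x + eps), axis=-1))
    # Lu: squared L2 distance between predictions and pseudo-labels
    l_u = np.mean(np.sum((pred_u - targets_u) ** 2, axis=-1))
    return l_x + lambda_u * l_u
```
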
- Train the Wide-ResNet model on the mixed batches with this loss
- Tune the hyperparameters until the model performs well
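
Putting the pieces together, one training iteration could look like the following sketch. `model.predict` and the helper functions are the hypothetical ones defined above; the actual gradient update is framework-specific and omitted:

```python
import numpy as np

def mixmatch_step(model, x_batch, y_batch, u_batch,
                  K: int = 2, T: float = 0.5, alpha: float = 0.75,
                  lambda_u: float = 100.0):
    """One MixMatch iteration over a labeled batch (x, y) and an unlabeled batch u."""
    # 1 augmentation for labeled data, K for unlabeled data
    x_aug = np.stack([augment(x) for x in x_batch])
    u_aug = np.stack([np.stack([augment(u) for u in u_batch]) for _ in range(K)])

    # Pseudo-labels: mean prediction over the K augmentations, then sharpening
    q = guess_labels(model.predict, u_aug, T)

    # Put together X and U as W, then shuffle W
    u_flat = u_aug.reshape(-1, *u_aug.shape[2:])
    w_x = np.concatenate([x_aug, u_flat])
    w_p = np.concatenate([y_batch, np.tile(q, (K, 1))])
    idx = np.random.permutation(len(w_x))

    # Mix up every element in W; the first |X| rows keep representing labeled data
    n = len(x_aug)
    x_mix, p_mix = mixup(w_x[:n], w_p[:n], w_x[idx[:n]], w_p[idx[:n]], alpha)
    u_mix, q_mix = mixup(w_x[n:], w_p[n:], w_x[idx[n:]], w_p[idx[n:]], alpha)

    # The combined loss would drive the gradient update
    return mixmatch_loss(model.predict(x_mix), p_mix,
                         model.predict(u_mix), q_mix, lambda_u)
```
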