
HonglingLei/MixMatch


MixMatch

This repository describes the steps of MixMatch, a semi-supervised learning algorithm.

The algorithm was originally designed for image classification and is usually run on GPUs with CUDA. Together with Yang Wan and Minchenxi Zhou, I adapted this approach and its code to tabular data and a CPU-only environment during my internship. I cannot upload that code because of confidentiality requirements, but since the algorithm itself is public, I can share my understanding of it :)

Data Preparation

  • Split the data into a labeled set and an unlabeled set
  • Data preprocessing
    • Drop features with more than 95% missing values
    • Impute missing values: -1 for integer (discrete) features, the column mean for float (continuous) features
    • Reshape each sample's feature vector into a matrix (separate the feature dimension into length × width)
    • The final data should be a 4-dimensional array with these axes:
      • sample size
      • number of channels (3 for RGB color images, 1 for grayscale images or non-image data)
      • matrix length
      • matrix width
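
A minimal NumPy sketch of the tabular preprocessing above, assuming the raw features arrive as an (N, F) float array with `np.nan` marking missing values. The function name `preprocess`, the `is_integer` mask, and the choice to zero-pad the feature vector up to `height * width` before reshaping are illustrative assumptions, not the code used in the internship project.

```python
import numpy as np


def preprocess(features, is_integer, height, width, missing_threshold=0.95):
    """Turn a tabular feature matrix into a 4-D array of shape (N, 1, height, width).

    features:   float array of shape (N, F), np.nan marks missing values
    is_integer: boolean array of shape (F,), True for discrete (int) features
    (hypothetical interface chosen for this sketch)
    """
    features = np.asarray(features, dtype=float)
    is_integer = np.asarray(is_integer, dtype=bool)

    # Drop features where more than `missing_threshold` of the values are missing.
    missing_rate = np.isnan(features).mean(axis=0)
    keep = missing_rate <= missing_threshold
    features, is_integer = features[:, keep], is_integer[keep]

    # Impute: -1 for discrete features, the column mean for continuous features.
    col_means = np.nanmean(features, axis=0)
    fill = np.where(is_integer, -1.0, col_means)
    features = np.where(np.isnan(features), fill, features)

    # Assumption: zero-pad the feature vector so it fits a height x width matrix.
    n, f = features.shape
    if f > height * width:
        raise ValueError("height * width is smaller than the number of features")
    padded = np.zeros((n, height * width))
    padded[:, :f] = features

    # Final layout: (sample size, channels = 1 for non-image data, length, width).
    return padded.reshape(n, 1, height, width)
```

For example, 50 surviving features could be padded to 64 and reshaped to (N, 1, 8, 8).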

Data Augmentation

  • Number of augmentations: 1 per labeled sample; K (a hyperparameter) per unlabeled sample
  • For image data
    • Strong augmentation: sharpening, adjusting saturation, and adjusting color temperature
    • Weak augmentation: translation, rotation, and cropping
  • For tabular data
    • Random flipping and cropping of the matrices (the cropped-out margins are filled with 0, so the matrix size stays the same)
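
A sketch of the tabular augmentation, assuming each sample is already a (C, H, W) matrix from the preprocessing step. The helper names, the 50% flip probability, and the `max_crop` limit on how many border rows/columns are zeroed are hypothetical choices. Cropping here zeroes the margins instead of shrinking the matrix, so every augmented copy keeps the same shape.

```python
import numpy as np


def augment_matrix(x, rng, max_crop=1):
    """Randomly flip and crop one (C, H, W) sample; cropped margins become 0."""
    # Random horizontal and vertical flips (assumed 50% probability each).
    if rng.random() < 0.5:
        x = x[:, :, ::-1]
    if rng.random() < 0.5:
        x = x[:, ::-1, :]

    # Random crop: zero out up to `max_crop` rows/columns on each side,
    # keeping the overall matrix size unchanged.
    _, h, w = x.shape
    top, bottom, left, right = rng.integers(0, max_crop + 1, size=4)
    out = np.zeros_like(x)
    out[:, top:h - bottom, left:w - right] = x[:, top:h - bottom, left:w - right]
    return out


def augment_batch(batch, k, rng):
    """Return k augmented copies of every sample, shape (k, N, C, H, W)."""
    return np.stack([np.stack([augment_matrix(s, rng) for s in batch]) for _ in range(k)])
```

Calling `augment_batch` with `k=1` for labeled data and `k=K` for unlabeled data matches the augmentation counts listed above.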

Label Guessing

  • Generate pseudo labels for the unlabeled data with the current model: predict on each of the K augmented copies of a sample and take the mean of the predicted class distributions as its guessed label
  • For image data, Wide-ResNet-28 (a 28-layer Wide Residual Network) is commonly used. However, other classification models could work too, depending on the data format
    • [Figure: Wide-ResNet structure]
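
Label guessing can be sketched as averaging class probabilities over the K augmented copies of each unlabeled sample. `predict_proba` stands for any model that returns class probabilities (Wide-ResNet for images); the tiny linear `toy_model` below is a hypothetical stand-in so the example runs without a deep-learning framework.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def guess_labels(predict_proba, augmented_unlabeled):
    """Average the model's class probabilities over the K augmented copies.

    predict_proba:       callable mapping (N, ...) inputs to (N, num_classes) probabilities
    augmented_unlabeled: array of shape (K, N, ...), K augmentations of the same N samples
    """
    probs = np.stack([predict_proba(u) for u in augmented_unlabeled])  # (K, N, classes)
    return probs.mean(axis=0)  # (N, classes)


# Hypothetical stand-in model: a fixed random linear layer on flattened inputs.
rng = np.random.default_rng(0)
weights = rng.normal(size=(9, 3))


def toy_model(x):
    return softmax(x.reshape(len(x), -1) @ weights)


u_aug = rng.normal(size=(2, 5, 1, 3, 3))  # K = 2 augmentations of 5 samples
q = guess_labels(toy_model, u_aug)        # guessed labels, shape (5, 3)
```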

Sharpening Pseudo Labels

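  • Lower the entropy of each guessed label by sharpening it toward a one-hot distribution with a temperature T (T = 0.5 in the MixMatch paper)
    • Sharpen(p, T)_i = p_i^(1/T) / Σ_j p_j^(1/T)
    • As T → 0, the result approaches a one-hot vector on the class with the largest probability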
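
In the MixMatch paper, sharpening is a temperature adjustment rather than a hard argmax: each probability is raised to the power 1/T and the vector is renormalised, which lowers entropy while keeping a soft label. A minimal sketch, assuming probabilities lie along the last axis and using the paper's default T = 0.5:

```python
import numpy as np


def sharpen(p, T=0.5):
    """Temperature sharpening: raise each probability to 1/T, then renormalise."""
    p = np.asarray(p, dtype=float) ** (1.0 / T)
    return p / p.sum(axis=-1, keepdims=True)


# With T = 0.5, [0.6, 0.3, 0.1] becomes [0.36, 0.09, 0.01] / 0.46.
soft = sharpen([0.6, 0.3, 0.1])
# A much smaller T pushes the label very close to one-hot.
near_one_hot = sharpen([0.6, 0.3, 0.1], T=0.05)
```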

Shuffle

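  • Concatenate X and U into W, then shuffle W
    • X: the augmented labeled data and their labels
    • U: the augmented unlabeled data and their (sharpened) pseudo labels
    • hyperparameter α = 0.75 (used in the MixUp step)
      • λ ~ Beta(α, α)
      • λ' = max(λ, 1-λ)
      • x' = λ'x1 + (1-λ')x2
      • p' = λ'p1 + (1-λ')p2
      • (x1, p1) is a sample from X or U; (x2, p2) is the sample at the same position in the shuffled W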
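
The concatenate-and-shuffle step and the single-pair MixUp formulas above, sketched in NumPy. `build_w` and `mixup_pair` are hypothetical helper names; the same permutation is applied to inputs and labels so each sample keeps its (pseudo) label.

```python
import numpy as np


def build_w(x_inputs, x_labels, u_inputs, u_guesses, rng):
    """Concatenate X and U into W, then shuffle W with one shared permutation."""
    w_inputs = np.concatenate([x_inputs, u_inputs])
    w_labels = np.concatenate([x_labels, u_guesses])
    perm = rng.permutation(len(w_inputs))
    return w_inputs[perm], w_labels[perm]


def mixup_pair(x1, p1, x2, p2, alpha, rng):
    """MixUp of two (input, label) pairs using λ' = max(λ, 1 - λ)."""
    lam = rng.beta(alpha, alpha)       # λ ~ Beta(α, α)
    lam = max(lam, 1.0 - lam)          # λ' >= 0.5, so (x1, p1) dominates
    x = lam * x1 + (1.0 - lam) * x2    # x' = λ'x1 + (1-λ')x2
    p = lam * p1 + (1.0 - lam) * p2    # p' = λ'p1 + (1-λ')p2
    return x, p, lam
```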

MixUp

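  • Mix every element of X and U with an element of the shuffled W: X'_i = MixUp(X_i, W_i) and U'_i = MixUp(U_i, W_(i+|X|))
  • Taking λ' = max(λ, 1-λ) guarantees that (x1, p1) keeps a weight of at least 0.5 in the mixed result. In this way, the mixed X samples still mainly represent labeled data and the mixed U samples still mainly represent unlabeled data, so their losses can be computed separately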
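
Putting the pieces together for whole batches, a sketch assuming X and U are NumPy arrays whose first axis indexes samples. It draws one λ' per sample (an assumption; drawing a single λ' for the whole batch is another option), pairs X with the first |X| rows of the shuffled W, and pairs U with the remaining rows. `mix_batches` is a hypothetical name.

```python
import numpy as np


def mix_batches(x_inputs, x_labels, u_inputs, u_guesses, alpha=0.75, rng=None):
    """MixMatch-style MixUp: mix every element of X and U with an element of shuffled W."""
    rng = np.random.default_rng() if rng is None else rng
    n_x = len(x_inputs)

    # X and U stacked in order; W is the same data under a random permutation.
    all_inputs = np.concatenate([x_inputs, u_inputs])
    all_labels = np.concatenate([x_labels, u_guesses])
    perm = rng.permutation(len(all_inputs))
    w_inputs, w_labels = all_inputs[perm], all_labels[perm]

    # One λ' = max(λ, 1 - λ) per sample.
    lam = rng.beta(alpha, alpha, size=len(all_inputs))
    lam = np.maximum(lam, 1.0 - lam)

    def mix(a, b, weights):
        # Broadcast the per-sample weights over all remaining axes.
        weights = weights.reshape(-1, *([1] * (a.ndim - 1)))
        return weights * a + (1.0 - weights) * b

    mixed_inputs = mix(all_inputs, w_inputs, lam)
    mixed_labels = mix(all_labels, w_labels, lam)

    # Rows [0, |X|) form X' (paired with W_i); the rest form U' (paired with W_(i+|X|)).
    x_mixed = (mixed_inputs[:n_x], mixed_labels[:n_x])
    u_mixed = (mixed_inputs[n_x:], mixed_labels[n_x:])
    return x_mixed, u_mixed
```

Because every λ' is at least 0.5, a labeled sample whose label is class 0 still has at least half its mixed label on class 0, which is the property the bullet above relies on.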

Mixed Loss Calculation

  • Lx: cross-entropy between the mixed labels and the model's predictions for the mixed labeled data
  • Lu: squared L2 distance between the model's predictions and the mixed pseudo labels for the mixed unlabeled data
  • L = Lx + λu·Lu
    • hyperparameter λu = 100
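
The combined loss, sketched under the assumption that the model outputs raw logits for the mixed labeled batch and probabilities for the mixed unlabeled batch. Following the MixMatch paper, Lu is the squared L2 distance averaged over samples and classes. The function name and argument layout are illustrative.

```python
import numpy as np


def mixmatch_loss(logits_x, targets_x, probs_u, targets_u, lambda_u=100.0):
    """L = Lx + λu * Lu for the mixed labeled and mixed unlabeled batches.

    logits_x:  (Nx, C) raw model outputs for the mixed labeled data
    targets_x: (Nx, C) mixed (soft) labels
    probs_u:   (Nu, C) predicted probabilities for the mixed unlabeled data
    targets_u: (Nu, C) mixed pseudo labels
    """
    # Soft-target cross-entropy, computed via a stable log-softmax.
    shifted = logits_x - logits_x.max(axis=1, keepdims=True)
    log_probs_x = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss_x = -(targets_x * log_probs_x).sum(axis=1).mean()

    # Squared L2 distance, averaged over samples and classes.
    loss_u = ((probs_u - targets_u) ** 2).mean()

    return loss_x + lambda_u * loss_u
```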

Model Training & Parameter Tuning

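  • Train the model (Wide-ResNet for image data) by minimizing L; label guessing, sharpening, and MixUp are repeated for every training batch
  • Tune the hyperparameters (such as K, α, and λu) until the model performs well on held-out validation data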
