# Robust Mini-Batch Gradient Descent

This repository is a fork and extension of 496-final-project. All work beyond the original project was done solely by me.

## Introduction

Mini-Batch Gradient Descent (MBGD) is a simple yet effective way to train a multiple linear (and thus polynomial) regression model. However, the naive MBGD model with squared loss is very sensitive to outliers, which makes it vulnerable to adversarial contamination of the training set.

We introduced several modifications to the naive MBGD model to make it more robust:

- Huber loss instead of squared loss. The Huber loss reduces the influence of samples far from the fitted line.

- ε-trimming. We assume Huber's ε-contamination model, in which an arbitrary proportion ε ∈ [0, 1) of the true samples is replaced by arbitrary adversarial samples. The ε-trimming step trims the ε · batch_size samples with the largest absolute residuals in every batch, so they do not contribute to the computed gradient and therefore do not affect the weight update.

  The motivation: if the adversarial samples are close to the fitted line, they hardly matter; if they are far from the fitted line, they are likely to have larger absolute residuals than the true samples, and thus to be trimmed.

- Z-score-trimming. Since the true noise follows a normal distribution, the true residuals should also be approximately normally distributed. The Z-score-trimming step iteratively trims the samples whose residual z-scores exceed 2 in absolute value (about 4.6% of samples in an ideal normal distribution) until no such sample remains. Because the weights in the initial rounds are nearly random, we skip trimming and update the weights with all samples in the batch whenever more than half of them would be trimmed. A sketch of both trimming steps is given after this list.

- Kernel preprocessing. Our kernel preprocessor is inspired by the kernels in neural networks. We scan the input space with kernels and combine the samples captured by each kernel, if and only if "enough" samples are captured. The MBGD model is then trained on the output samples of the kernel preprocessor (see the sketch after this list).

  The parameters of the kernel preprocessor are the kernel size, the stride, and the threshold (a proportion of the training sample size). The padding is fixed to equal the kernel size.
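The following is a minimal sketch of how one mini-batch update could combine the Huber loss with ε-trimming and Z-score-trimming. It is illustrative rather than the repository's actual implementation: the function names (`huber_gradient`, `robust_mbgd_step`), the default hyper-parameters, and the choice to apply the half-batch fallback against the samples entering the z-score loop are all assumptions.

```python
import numpy as np


def huber_gradient(residuals, delta=1.0):
    """Per-sample derivative of the Huber loss with respect to the residual."""
    return np.where(np.abs(residuals) <= delta,
                    residuals,                    # quadratic region
                    delta * np.sign(residuals))   # linear region


def robust_mbgd_step(X, y, w, lr=0.01, eps=0.49, delta=1.0, z_threshold=2.0):
    """One mini-batch update: eps-trimming, then iterative z-score trimming,
    then a Huber-loss gradient step over the surviving samples.

    X is assumed to be the design matrix of polynomial features for the batch
    (including the bias column), y the targets, and w the current weights.
    """
    residuals = X @ w - y
    order = np.argsort(np.abs(residuals))
    keep = order[: len(y) - int(eps * len(y))]    # drop the eps * batch_size largest residuals

    # Iteratively trim samples whose residual z-score exceeds the threshold.
    n_entered = len(keep)
    while True:
        r = residuals[keep]
        z = (r - r.mean()) / (r.std() + 1e-12)
        inliers = np.abs(z) <= z_threshold
        if inliers.all():
            break
        keep = keep[inliers]

    # If more than half of the samples would have been trimmed, trim nothing
    # (the weights are still nearly random in the early rounds).
    if len(keep) < n_entered / 2:
        keep = np.arange(len(y))

    grad = X[keep].T @ huber_gradient(residuals[keep], delta) / len(keep)
    return w - lr * grad
```

In practice such a step would be called once per mini-batch inside the usual epoch loop.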
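Below is a similarly hedged sketch of a one-dimensional mean-kernel preprocessor. The function name and the simplified handling of the window boundaries are assumptions made for the example; the repository's preprocessor fixes the padding to the kernel size and is not limited to one input dimension.

```python
import numpy as np


def mean_kernel_preprocess(x, y, kernel_size, stride, threshold):
    """Slide a 1-D window of width `kernel_size` over the input range with the
    given stride; wherever the window captures at least `threshold * len(x)`
    samples, emit their mean as one preprocessed sample."""
    x_out, y_out = [], []
    lo = x.min() - kernel_size                    # simplified stand-in for the padding
    while lo <= x.max() + kernel_size:
        mask = (x >= lo) & (x < lo + kernel_size)
        if mask.sum() >= threshold * len(x):      # only keep "dense enough" windows
            x_out.append(x[mask].mean())
            y_out.append(y[mask].mean())
        lo += stride
    return np.array(x_out), np.array(y_out)
```

The preprocessed samples, rather than the raw (possibly contaminated) samples, would then be fed to the MBGD model.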

## Run Tests

```bash
python3 -m venv venv
source venv/bin/activate
pip3 install wheel
pip3 install -r requirements.txt
python3 -m tests
```

## Test Results

In all of the following tests, degree-5 polynomial models were fitted to samples drawn from degree-9 polynomials with noise ~ 𝒩(0, Var(y)) and ε = 0.49.
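For reference, a dataset of this shape could be generated roughly as follows. This is only a sketch of the random-contamination case; the exact generators used by the tests live in the repository and differ per test.

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 1000, 0.49

true_coeffs = rng.normal(size=10)             # a degree-9 polynomial
x = rng.uniform(-1.0, 1.0, size=n)
y = np.polyval(true_coeffs, x)
y += rng.normal(0.0, y.std(), size=n)         # noise ~ N(0, Var(y))

# Random contamination: replace an eps fraction of the samples with arbitrary points.
idx = rng.choice(n, size=int(eps * n), replace=False)
x[idx] = rng.uniform(-1.0, 1.0, size=len(idx))
y[idx] = rng.uniform(y.min(), y.max(), size=len(idx))
```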

The ε-then-Z-score-trimmed Huber Loss model without preprocessing appears to perform best:

| Test Name | Naive Training | Naive Testing | ε-then-Z-score-trimmed Huber Loss Training | ε-then-Z-score-trimmed Huber Loss Testing |
| --- | --- | --- | --- | --- |
| Random Contamination | (plot) | (plot) | (plot) | (plot) |
| Parallel-line Contamination | (plot) | (plot) | (plot) | (plot) |
| Local Dense Contamination | (plot) | (plot) | (plot) | (plot) |
| Segment Displacement Contamination | (plot) | (plot) | (plot) | (plot) |
**All test results**

| Test Name | Naive Training | Naive Testing | ε-trimmed Huber Loss Training | ε-trimmed Huber Loss Testing | Z-score-trimmed Huber Loss Training | Z-score-trimmed Huber Loss Testing | ε-then-Z-score-trimmed Huber Loss Training | ε-then-Z-score-trimmed Huber Loss Testing | Mean-kernel Preprocessor Training | Mean-kernel Preprocessor Testing | ε-trimmed Huber Loss with Mean-kernel Preprocessor Training | ε-trimmed Huber Loss with Mean-kernel Preprocessor Testing | Z-score-trimmed Huber Loss with Mean-kernel Preprocessor Training | Z-score-trimmed Huber Loss with Mean-kernel Preprocessor Testing | ε-then-Z-score-trimmed Huber Loss with Mean-kernel Preprocessor Training | ε-then-Z-score-trimmed Huber Loss with Mean-kernel Preprocessor Testing |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Random Contamination | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) |
| Parallel-line Contamination | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) |
| Local Dense Contamination | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) |
| Segment Displacement Contamination | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) | (plot) |