Evasion and poisoning attacks are a major concern for machine learning models that operate on high dimensional data such as audio, images, and video.
One of the most insidious aspects of these attacks is that they only require subtle perturbations in the input space to succeed.
In many cases, a successful perturbation can be so small as to be undetectable to a careful human auditor.
However, in this work, we show that perturbations introduced by these attacks can be sanitized by lossy compression.
We show that for image classification, the accuracy gained from sanitizing the attack outweighs the accuracy lost from compression.
We conduct experiments on several images using a variety of codecs and perturbation sizes.
Our results suggest that lossy compression is a powerful strategy to mitigate these attacks.
In addition, we show that learning directly on compressed representations can significantly reduce the memory throughput required for training, thus increasing efficiency with only a modest loss in accuracy.



In recent years, machine learning models have become increasingly prevalent in a wide range of applications, from computer vision and speech recognition to natural language processing and even medical diagnosis.
However, these models are vulnerable to evasion and poisoning attacks.
These attacks are particularly concerning because they can be implemented using subtle perturbations in the input space that are difficult or impossible for a human to detect.

In this paper, we present a new approach to mitigating evasion and poisoning attacks in machine learning models.
Our approach is based on the observation that these attacks often involve adding small perturbations to the input data, which can be effectively sanitized by lossy compression.
Since standardized lossy compression techniques focus on preserving visible features, they can be used to target the perturbation introduced by an attack while preserving the features necessary to achieve high accuracy.
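
To make the intuition concrete, the following is a minimal sketch in which coarse uniform quantization stands in for a lossy codec. It is not one of the codecs evaluated in this paper; the step size, the perturbation amplitude, and the helper names `quantize` and `clip8` are assumptions chosen purely for illustration. It enumerates every 8-bit pixel value, perturbs it by $\pm \epsilon$, and counts how often the quantized value is identical to that of the clean pixel.

```python
def quantize(pixel, step):
    """Coarsely quantize an 8-bit value, reconstructing at the bin midpoint."""
    return (pixel // step) * step + step // 2


def clip8(value):
    """Clamp a value to the valid 8-bit range [0, 255]."""
    return max(0, min(255, value))


step = 32     # hypothetical quantization step (assumption, not from the paper)
epsilon = 2   # hypothetical perturbation amplitude (assumption)

unchanged = 0
total = 0
for pixel in range(256):
    for direction in (-1, +1):
        perturbed = clip8(pixel + direction * epsilon)
        # The perturbation is erased whenever both values land in the same bin.
        if quantize(perturbed, step) == quantize(pixel, step):
            unchanged += 1
        total += 1

fraction_sanitized = unchanged / total
print(f"{unchanged}/{total} perturbed pixels quantize to the clean value")
```

In this toy setting, only pixels within $\epsilon$ of a bin boundary survive quantization with a changed value, so the fraction of sanitized pixels grows as the step size increases relative to $\epsilon$. Real codecs quantize transform coefficients rather than raw pixels, so this should be read as an analogy rather than a model of any particular standard.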

We conduct experiments on an image classification task using a variety of codecs and perturbation sizes to evaluate the effectiveness of our approach.
Our results show that sanitizing the perturbations using lossy compression can significantly improve the accuracy of the model, even when using relatively high levels of compression.
Furthermore, we show that learning directly on compressed representations can significantly reduce the memory throughput required for training, thus increasing efficiency with only a modest loss in accuracy.

The rest of this paper is organized as follows. Section 2 presents an overview of related work, including (1) gradient-based evasion and poisoning attacks, (2) an overview of lossy compression standards and perceptual quality, (3) prior research relating adversarial examples and robust features, and (4) previous approaches to training on lossy encoded data. In Section 3, we describe our approach for using lossy compression to prevent evasion and poisoning attacks. Then, we propose a novel approach for training neural networks directly on lossy encoded data using binary neural networks to preserve quantization. In Section 4, we present experimental results on mitigating attacks on image classifiers and increasing the efficiency of an audio classifier. In Section 5, we provide our recommendations for when and how to leverage lossy compression for more accurate models and discuss future directions to explore.

We consider adversarial examples in the context of two types of attacks: evasion and poisoning. Evasion attacks exploit knowledge of a model that's already been trained. For example, if an attacker wants a malicious email to pass through a spam filter undetected, they might use full or partial knowledge of the behavior of the trained spam filter to find "magic words" that cause an email to be classified as not spam.

These attacks are typically performed using gradient-based methods, where the gradient of the loss function with respect to the input, $\nabla_x \mathcal{L}(x,y,\theta)$, is used to guide the perturbation. Since moving in the \textit{opposite direction} of the gradient reduces the loss, and thereby tends to improve model accuracy, we can create a perturbation by moving \textit{in the direction} of the gradient, i.e., $$x_{\text{adv}} = x_0 + \epsilon \nabla_x \mathcal{L}(x_0,y,\theta).$$
A simple, effective, and widely studied variant of this attack is the fast gradient sign method (FGSM) \cite{goodfellow2014explaining}, where the sign of the gradient is used instead: $$x_{\text{adv}} = x_0 + \epsilon \text{ sign} \left( \nabla_x \mathcal{L}(x_0,y,\theta) \right).$$
This strictly limits the amplitude of the perturbation to $\pm \epsilon$ while maximizing its effect on model predictions. The limitation of the amplitude is what prevents the perturbation from being detected. For example, in Figure \ref{fig:doog}, we show how $\epsilon$ can be chosen to limit the perturbation to two, four, or six of the least significant bits of an 8-bit image.
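
As an illustration of the FGSM update, the sketch below applies it to a toy logistic-regression classifier, for which the input gradient has a closed form. The model weights, the input, the label, and the value of $\epsilon$ are hypothetical and are not taken from our experiments; a practical attack on an image classifier would obtain the gradient by automatic differentiation and would also clip the result to the valid pixel range.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_input_gradient(x, y, w, b):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    # For logistic regression, d(loss)/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return loss, grad


def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(x0, y, w, b, epsilon):
    """x_adv = x0 + epsilon * sign(gradient of the loss w.r.t. the input at x0)."""
    _, grad = loss_and_input_gradient(x0, y, w, b)
    return [xi + epsilon * sign(gi) for xi, gi in zip(x0, grad)]


# Hypothetical toy model and input (assumptions for illustration only).
w = [2.0, -3.0, 1.0]
b = 0.1
x0 = [0.5, 0.2, 0.8]
y = 1
epsilon = 0.05

x_adv = fgsm(x0, y, w, b, epsilon)
loss_clean, _ = loss_and_input_gradient(x0, y, w, b)
loss_adv, _ = loss_and_input_gradient(x_adv, y, w, b)
max_change = max(abs(a - c) for a, c in zip(x_adv, x0))
print(f"loss: {loss_clean:.3f} -> {loss_adv:.3f}, max |x_adv - x0| = {max_change:.3f}")
```

Every coordinate moves by exactly $\epsilon$ (or not at all when the gradient is zero), which is the amplitude bound described above, while the loss on the true label increases.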


In a poisoning attack, the dataset is contaminated, usually with the goal of introducing a backdoor. For example, if an attacker wants to prevent a facial recognition model from working on one or more subjects, they might publicly upload an altered image to a website from which the training dataset is sourced. Recent poisoning attacks such as gradient matching \cite{geiping2020witches} have been shown to be effective on very large datasets like ImageNet. With gradient matching, small, imperceptible perturbations on as little as 0.1\% of the training data are sufficient for a trigger image to be classified as any class the attacker desires. Additionally, it has been shown that gradient matching needs only partial knowledge of the model architecture, and that it transfers between different image classification models. For example, poisoning data by assuming a ResNet20 model still works when a VGG13 model is trained.
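
The following sketch illustrates the kind of objective that gradient matching optimizes, as we understand it from \cite{geiping2020witches}: the attacker seeks perturbations whose summed training gradient is aligned with the gradient that would push the trigger image toward the adversarial label, measured by one minus their cosine similarity. The function names and the toy gradient vectors are hypothetical, and the sketch omits the optimization loop and the amplitude constraint on the perturbations.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gradient_matching_loss(target_grad, poison_grads):
    """1 - cos(target gradient, sum of per-sample poison gradients)."""
    summed = [sum(column) for column in zip(*poison_grads)]
    return 1.0 - cosine_similarity(target_grad, summed)


# Hypothetical flattened parameter gradients (assumptions for illustration).
target_grad = [1.0, 0.0, -1.0]
aligned = gradient_matching_loss(target_grad, [[0.5, 0.0, -0.5], [1.5, 0.0, -1.5]])
opposed = gradient_matching_loss(target_grad, [[-1.0, 0.0, 1.0]])
print(f"aligned poisons: {aligned:.3f}, opposed poisons: {opposed:.3f}")
```

A loss near zero means that training on the poisoned samples moves the parameters in nearly the same direction as training on the mislabeled trigger image would, which is why only a small poisoned fraction can suffice.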

Researchers have demonstrated that adversarial examples, such as those produced by FGSM, can be attributed to the presence of \textit{non-robust features} \cite{ilyas2019adversarial}.

In [1]:
run(`jupyter-nbconvert --to markdown lossy.ipynb --TagRemovePreprocessor.remove_cell_tags='{"remove_cell"}'`);
run(`mv lossy.md lossy.tex`);
run(`pdflatex lossy`);
run(`biber lossy`);
run(`pdflatex lossy`);
