# Deflecting Adversarial Attacks with Pixel Deflection

The code in this repository demonstrates that *Deflecting Adversarial Attacks with Pixel Deflection* (Prakash et al., 2018) is ineffective in the white-box threat model.

With an L-infinity perturbation bounded by 4/255, we generate targeted adversarial examples with a 97% success rate and reduce classifier accuracy to 0%.

See [our note](https://arxiv.org/abs/1804.03286) for more context and details.
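
The attack follows the standard recipe for randomized preprocessing defenses: targeted PGD under the L-infinity constraint, with BPDA to backpropagate through the non-differentiable pixel-deflection step and EOT to average gradients over its randomness. The sketch below illustrates the idea only; `classifier`, `pixel_deflect`, and the hyperparameter defaults are hypothetical stand-ins, not this repository's actual code.

```python
import torch
import torch.nn.functional as F

def targeted_pgd_bpda(classifier, pixel_deflect, x, target,
                      eps=4/255, step=1/255, iters=100, eot_samples=10):
    """Targeted L-infinity PGD with BPDA/EOT over a stochastic preprocessor.

    `classifier` and `pixel_deflect` are assumed to be tensor-in/tensor-out
    callables; this is an illustrative sketch, not the repository's code.
    """
    x_adv = x.clone()
    for _ in range(iters):
        grad = torch.zeros_like(x_adv)
        for _ in range(eot_samples):  # EOT: average over the defense's randomness
            x_in = x_adv.detach().requires_grad_(True)
            # BPDA: use the defense's output on the forward pass, but let
            # gradients flow as if the preprocessing were the identity.
            deflected = x_in + (pixel_deflect(x_in) - x_in).detach()
            loss = F.cross_entropy(classifier(deflected), target)
            grad += torch.autograd.grad(loss, x_in)[0]
        # Step toward the target class, then project back onto the
        # eps-ball around the original input and the valid pixel range.
        x_adv = x_adv - step * grad.sign()
        x_adv = torch.clamp(x_adv, x - eps, x + eps).clamp(0, 1)
    return x_adv
```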

## Citation

```bibtex
@unpublished{cvpr2018breaks,
  author = {Anish Athalye and Nicholas Carlini},
  title = {On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses},
  year = {2018},
  url = {https://arxiv.org/abs/1804.03286},
}
```

## robustml evaluation

Run with:

```bash
python robustml_attack.py --imagenet-path <path>
```
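
For orientation, robustml standardizes this kind of evaluation behind a small interface: the defense is wrapped as a `robustml.model.Model`, the attack as a `robustml.attack.Attack`, and `robustml.evaluate.evaluate` reports the attack success rate. The sketch below shows the typical wiring under that assumption; `DefendedModel`, `BreakAttack`, the input shape, and the stubbed bodies are hypothetical, and `robustml_attack.py` may differ in its details.

```python
import robustml

class DefendedModel(robustml.model.Model):
    """Hypothetical wrapper exposing the pixel-deflection defense to robustml."""
    @property
    def dataset(self):
        return robustml.dataset.ImageNet((224, 224, 3))  # assumed input shape
    @property
    def threat_model(self):
        return robustml.threat_model.Linf(epsilon=4/255)
    def classify(self, x):
        raise NotImplementedError  # defended forward pass elided in this sketch

class BreakAttack(robustml.attack.Attack):
    """Hypothetical adapter running the BPDA/EOT attack for each test point."""
    def __init__(self, model):
        self._model = model
    def run(self, x, y, target):
        # x: one input, y: its label, target: target class (None if untargeted).
        raise NotImplementedError  # attack body elided in this sketch

model = DefendedModel()
attack = BreakAttack(model)
provider = robustml.provider.ImageNet('<imagenet-path>', model.dataset.shape)
success_rate = robustml.evaluate.evaluate(model, attack, provider, start=0, end=100)
print('attack success rate: %g' % success_rate)
```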

## Credits

Thanks to Nicholas Carlini for implementing the break and Dimitris Tsipras for writing the robustml model wrapper.