Audio Adversarial Examples

This is the code corresponding to the paper "Audio Adversarial Examples: Targeted Attacks on Speech-to-Text" by Nicholas Carlini and David Wagner.

To generate adversarial examples for your own files, follow the process below and modify the arguments to attack.py. Ensure that the file is sampled at 16 kHz and uses signed 16-bit integers as the data type. You may want to modify the number of iterations that the attack algorithm is allowed to run.
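
Before running the attack on your own recording, it can help to confirm the format requirement stated above. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_format` is made up for this illustration and is not part of this repository.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_sample_width=2):
    """Return a list of format problems; an empty list means the file looks fine.

    The standard `wave` module only opens PCM WAV files, and 16-bit PCM
    samples in a WAV file are signed, so a 2-byte sample width corresponds
    to the signed 16-bit integers the README asks for.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()  # bytes per sample
        if rate != expected_rate:
            problems.append(
                f"sample rate is {rate} Hz, expected {expected_rate} Hz"
            )
        if width != expected_sample_width:
            problems.append(
                f"sample width is {8 * width} bits, "
                f"expected {8 * expected_sample_width} bits"
            )
    return problems
```

Calling `check_wav_format("sample-000000.wav")` should return an empty list for a suitable file. If it reports a problem, resample or convert the file with an audio tool of your choice before passing it to the attack.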

WARNING: THIS IS NOT THE CODE USED IN THE PAPER. If you just want to get going generating adversarial examples on audio then proceed as described below.

The current master branch points to code which will run on TensorFlow 1.14 and DeepSpeech 0.4.1, an almost-recent version of the dependencies. (Large portions of the code will need to be rewritten to run on DeepSpeech 0.5.1, which uses a new feature extraction pipeline with TensorFlow's C++ implementation. If you feel motivated to do that I would gladly accept a PR.)

However, IF YOU ARE TRYING TO REPRODUCE THE PAPER (or have just decided that you enjoy pain and want to suffer through dependency hell) then you will have to check out commit a8d5f675ac8659072732d3de2152411f07c7aa3a and follow the README from there.

There are two ways to install this project. The first is to just use Docker with a buildfile provided by Tom Doerr. It works. The second is to try and set up everything on your machine directly. This might work, if you happen to have the right versions of things.

Docker Installation (highly recommended)

These docker instructions were kindly provided by Tom Doerr, and are simple to follow if you have Docker set up.

  1. Install Docker. On Ubuntu/Debian/Linux-Mint etc.:
sudo apt-get install
sudo systemctl enable --now docker

Instructions for other platforms:

  1. Download DeepSpeech and build the Docker images:
$ ./

With Nvidia-GPU support:

  1. Install the NVIDIA Container Toolkit. This step will only work on Linux and is only necessary if you want GPU support. As far as I know it's not possible to use a GPU with docker under Windows/Mac. On Ubuntu/Debian/Linux-Mint etc. you can install the toolkit with the following commands:
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L | sudo apt-key add -
curl -s -L$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Instructions for other platforms (CentOS/RHEL):

  1. Start the container using the GPU image we just built:
$ docker run --gpus all -it --mount src=$(pwd),target=/audio_adversarial_examples,type=bind -w /audio_adversarial_examples aae_deepspeech_041_gpu

CPU-only (Skip if already started with Nvidia-GPU support):

  1. Start the container using the CPU image we just built:
$ docker run -it --mount src=$(pwd),target=/audio_adversarial_examples,type=bind -w /audio_adversarial_examples aae_deepspeech_041_cpu

Test Setup

  1. Check that you can classify normal audio correctly:
$ python3 --in sample-000000.wav --restore_path deepspeech-0.4.1-checkpoint/model.v0.4.1
  1. Generate adversarial examples:
$ python3 --in sample-000000.wav --target "this is a test" --out adv.wav --iterations 1000 --restore_path deepspeech-0.4.1-checkpoint/model.v0.4.1
  1. Verify the attack succeeded:
$ python3 --in adv.wav --restore_path deepspeech-0.4.1-checkpoint/model.v0.4.1
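
The transcription check above confirms that the model outputs the target phrase, but it does not say how much the audio changed. As a rough complementary check, the sketch below reports the largest per-sample difference between the original and adversarial files. It uses only the Python standard library; the helper names are invented for this illustration, and the measure is a simple peak difference, not necessarily the distortion metric used in the paper.

```python
import array
import sys
import wave


def read_int16_samples(path):
    """Read a 16-bit PCM WAV file into an array of signed integers."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError(f"{path} is not 16-bit audio")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is stored little-endian
    return samples


def peak_difference(original_path, adversarial_path):
    """Largest absolute per-sample difference over the overlapping samples."""
    original = read_int16_samples(original_path)
    adversarial = read_int16_samples(adversarial_path)
    return max((abs(x - y) for x, y in zip(original, adversarial)), default=0)
```

For example, `peak_difference("sample-000000.wav", "adv.wav")` returns a value between 0 and 65535. Smaller values mean the perturbation is quieter relative to the 16-bit sample range.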

Docker Hub

The docker images are available on Docker Hub.

CPU-Version: tomdoerr/aae_deepspeech_041_cpu

GPU-Version: tomdoerr/aae_deepspeech_041_gpu

Direct Install

These are the original instructions from earlier. They will work, but require manual installs.

  1. Install the dependencies
pip3 install tensorflow-gpu==1.14 progressbar numpy scipy pandas python_speech_features tables attrdict pyxdg
pip3 install $(python3 util/ --decoder)

Download and install

1b. Make sure you have installed git lfs. Otherwise later steps will mysteriously fail.

  1. Clone the Mozilla DeepSpeech repository into a folder called DeepSpeech:
git clone

2b. Checkout the correct version of the code:

(cd DeepSpeech; git checkout tags/v0.4.1)

2c. If you get an error with tflite_convert, comment out Line 21

from tensorflow.contrib.lite.python import tflite_convert
  1. Download the DeepSpeech model
tar -xzf deepspeech-0.4.1-checkpoint.tar.gz
  1. Verify that you have a file deepspeech-0.4.1-checkpoint/ Its MD5 sum should be
  1. Check that you can classify normal audio correctly
python3 --in sample-000000.wav --restore_path deepspeech-0.4.1-checkpoint/model.v0.4.1
  1. Generate adversarial examples
python3 --in sample-000000.wav --target "this is a test" --out adv.wav --iterations 1000 --restore_path deepspeech-0.4.1-checkpoint/model.v0.4.1
  1. Verify the attack succeeded
python3 --in adv.wav --restore_path deepspeech-0.4.1-checkpoint/model.v0.4.1
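
The checkpoint verification step above asks for an MD5 sum. If no `md5sum` command-line tool is available, a minimal Python sketch such as the following can compute it. The function name `md5_of_file` is hypothetical, and the file path and expected hash are left to the caller because they are not given in this copy of the instructions.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks.

    Reading in chunks keeps memory use low for large checkpoint files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the returned string with the expected MD5 sum from the instructions. A mismatch usually means the download was incomplete or, if git lfs is missing, that a pointer file was fetched instead of the real data.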
