
Train the Keyword Transformer (KT) with Differential Privacy

This is the official repository for the paper Differentially Private Adapters for Parameter Efficient Acoustic Modeling, presented at Interspeech 2023. Our code builds on the Keyword Transformer implementation by the Arm lab. Please consider citing both our paper and theirs if you find this work useful.

@article{ho2023differentially,
  title={Differentially Private Adapters for Parameter Efficient Acoustic Modeling},
  author={Ho, Chun-Wei and Yang, Chao-Han Huck and Siniscalchi, Sabato Marco},
  journal={arXiv preprint arXiv:2305.11360},
  year={2023}
}

@inproceedings{berg21_interspeech,
  author={Axel Berg and Mark O’Connor and Miguel Tairum Cruz},
  title={{Keyword Transformer: A Self-Attention Model for Keyword Spotting}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={4249--4253},
  doi={10.21437/Interspeech.2021-1286}
}

Setup

Install dependencies

Set up a new virtual environment:

pip install virtualenv
virtualenv --system-site-packages -p python3 ./venv3
source ./venv3/bin/activate

To install dependencies, run

pip install -r requirements.txt

Tested with TensorFlow 2.4.0rc1 and CUDA 11.

Note: Installing the correct TensorFlow version is important for reproducibility! More recent TensorFlow versions produce small accuracy differences each time the model is evaluated. This may be due to a change in how the random seed generator is implemented, which alters the sampling of the "unknown" keyword class.
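
To catch a mismatched install early, a quick runtime check can help (a minimal sketch; the prefix match reflects our assumption about how the release candidate reports its version string):

import tensorflow as tf

# The note above pins TensorFlow 2.4.0rc1 for reproducibility; release
# candidates typically report themselves as e.g. "2.4.0-rc1", so match the prefix.
assert tf.__version__.startswith("2.4.0"), (
    f"Expected TensorFlow 2.4.0rc1, found {tf.__version__}")
print("TensorFlow", tf.__version__, "| built with CUDA:", tf.test.is_built_with_cuda())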

Data preparation

  1. Download the Multilingual Spoken Words (MLSW) dataset
  2. Create subsets of the MLSW dataset using the following command:
./MLSW/filter_dataset.sh --prefix <MLSW PATH>

Model

The Keyword-Transformer model is defined here. It takes a mel-scale spectrogram as input, which has shape 98 x 40 with the default settings, corresponding to 98 time windows with 40 frequency coefficients each.
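
For intuition on the shape, here is a minimal sketch of producing a 98 x 40 log-mel spectrogram with tf.signal. The framing parameters (1 s of 16 kHz audio, 30 ms windows, 10 ms hop) are our assumption about the defaults; check the model flags in this repository:

import tensorflow as tf

# Assumed framing: 1 s of 16 kHz audio, 30 ms windows (480 samples),
# 10 ms hop (160 samples) -> (16000 - 480) // 160 + 1 = 98 frames.
waveform = tf.random.normal([1, 16000])
stft = tf.signal.stft(waveform, frame_length=480, frame_step=160)
power = tf.abs(stft) ** 2                      # (1, 98, 257); fft_length defaults to 512
mel_matrix = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=40, num_spectrogram_bins=257, sample_rate=16000)
log_mel = tf.math.log(tf.matmul(power, mel_matrix) + 1e-6)
print(log_mel.shape)                           # (1, 98, 40)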

There are four variants of the Keyword-Transformer model (see the shape sketch after this list):

  • Time-domain attention: each time window is treated as a patch, and self-attention is computed between time windows
  • Frequency-domain attention: each frequency bin is treated as a patch, and self-attention is computed between frequencies
  • Combination of both: the signal is fed into both a time- and a frequency-domain transformer, and the outputs are combined
  • Patch-wise attention: similar to the vision transformer, rectangular patches are extracted from the spectrogram, so attention happens in the time and frequency domains simultaneously
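
As a shape-only illustration of how tokens are formed for each variant (the 7 x 8 patch size in the patch-wise case is an illustrative assumption, not necessarily this repository's setting):

import tensorflow as tf

spec = tf.random.normal([1, 98, 40])               # (batch, time, freq) log-mel input

# Time-domain attention: each of the 98 time windows is a token of dim 40.
time_tokens = spec                                  # (1, 98, 40)

# Frequency-domain attention: each of the 40 frequency bins is a token of dim 98.
freq_tokens = tf.transpose(spec, [0, 2, 1])         # (1, 40, 98)

# Patch-wise attention: rectangular 7 x 8 time-frequency patches (illustrative),
# giving a 14 x 5 grid of tokens, each of dim 56.
patches = tf.reshape(spec, [1, 14, 7, 5, 8])        # split both axes into (grid, patch)
patches = tf.transpose(patches, [0, 1, 3, 2, 4])    # group the two grid axes together
patch_tokens = tf.reshape(patches, [1, 14 * 5, 7 * 8])
print(time_tokens.shape, freq_tokens.shape, patch_tokens.shape)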

PATE

The original PATE method is introduced here. It trains several teacher models on disjoint chunks of the sensitive data. A student model is then trained on public data, using pseudo-labels generated by privately aggregating the teachers' predictions.
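
The aggregation step can be sketched as a noisy argmax over teacher votes, following the original PATE paper (the noise scale gamma and the function below are illustrative, not taken from this repository):

import numpy as np

def noisy_aggregate(teacher_preds, num_classes, gamma, rng):
    """Noisy-max label aggregation in the spirit of the original PATE paper.

    teacher_preds: array of shape (num_teachers,) with each teacher's predicted label.
    gamma: inverse noise scale; smaller gamma means more noise and stronger privacy.
    """
    votes = np.bincount(teacher_preds, minlength=num_classes).astype(float)
    votes += rng.laplace(loc=0.0, scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(votes))

rng = np.random.default_rng(0)
teacher_preds = rng.integers(0, 12, size=50)        # 50 teachers, 12 keyword classes
print(noisy_aggregate(teacher_preds, num_classes=12, gamma=0.1, rng=rng))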

We make several modifications to the PATE method:

  • Knowledge Transfer: we use a large pre-trained model to improve performance, assuming that the pre-trained model itself raises no privacy issue.
  • Parameter-Efficient Tuning: we use residual adapters to fine-tune the model in a parameter-efficient way while guaranteeing DP (see the sketch after this list).
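
Here is a minimal sketch of such a residual adapter block (the bottleneck design and its placement are illustrative assumptions; adapter_dim mirrors the default of 192 used by the PATE scripts below):

import tensorflow as tf

class ResidualAdapter(tf.keras.layers.Layer):
    """Bottleneck adapter: down-project, nonlinearity, up-project, skip connection.

    Only the adapter weights are trained, so the privacy budget is spent on a
    small fraction of the model's parameters.
    """

    def __init__(self, adapter_dim=192, **kwargs):
        super().__init__(**kwargs)
        self.adapter_dim = adapter_dim

    def build(self, input_shape):
        hidden_dim = int(input_shape[-1])
        self.down = tf.keras.layers.Dense(self.adapter_dim, activation="gelu")
        # Zero-init the up-projection so the adapter starts as an identity map.
        self.up = tf.keras.layers.Dense(hidden_dim, kernel_initializer="zeros")

    def call(self, x):
        return x + self.up(self.down(x))

# Illustrative usage after a frozen transformer sub-layer:
features = tf.random.normal([2, 98, 192])
print(ResidualAdapter(adapter_dim=192)(features).shape)   # (2, 98, 192)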

Train DPSGD

./scripts/train_dpsgd.sh --prefix <MLSW PATH> [options]
Options:
  --lang <en|fr|de|ru>               Default: en
  --eps <eps>                        Default: 8
  --delta <delta>                    Default: 1e-5
  --clip_norm <clip_norm>            Default: 20 (See the original DPSGD paper)
  --nepochs <nepochs>                Default: 20
  --batch_size <batch_size>          Default: 512
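
Conceptually, each DP-SGD update clips every per-example gradient to clip_norm and adds Gaussian noise before averaging. The sketch below is schematic (flattened gradients, a hypothetical noise_multiplier), not this repository's implementation:

import tensorflow as tf

def dp_sgd_gradients(per_example_grads, clip_norm=20.0, noise_multiplier=1.0):
    """Schematic DP-SGD step: clip each example's gradient, add noise, average.

    per_example_grads: (batch, num_params), one flattened gradient per example.
    clip_norm: corresponds to the --clip_norm option above (default 20).
    noise_multiplier: hypothetical here; in practice it is calibrated to (eps, delta).
    """
    norms = tf.norm(per_example_grads, axis=1, keepdims=True)
    scale = tf.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = per_example_grads * scale                   # per-example L2 clipping
    summed = tf.reduce_sum(clipped, axis=0)
    noise = tf.random.normal(tf.shape(summed), stddev=noise_multiplier * clip_norm)
    batch_size = tf.cast(tf.shape(per_example_grads)[0], tf.float32)
    return (summed + noise) / batch_size                  # noisy average gradient

grads = tf.random.normal([512, 1000])                     # batch of 512 flattened grads
print(dp_sgd_gradients(grads).shape)                      # (1000,)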

Train PATE

For additional options, please refer to train_pate_student_adapter.sh

./scripts/train_pate_teachers_adapter.sh --prefix <MLSW path> [Options]
./scripts/train_pate_student_adapter.sh --prefix <MLSW path> [Options]
Options:
  --lang <en|fr|de|ru>                    Default: en
  --nb_teachers <number of teachers>      Default: 50
  --nepochs <nepochs>                     Default: 80
  --batch_size <batch_size>               Default: 512
  --adapter_dim <adapter dimension>       Default: 192 (Please refer to https://aclanthology.org/2021.emnlp-main.541.pdf)
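
For intuition, PATE training first splits the sensitive data into nb_teachers disjoint shards, one per teacher. A sketch of that partitioning (variable names are hypothetical; the scripts above handle the actual split):

import numpy as np

def partition_for_teachers(num_examples, nb_teachers=50, seed=0):
    """Shuffle example indices and split them into disjoint per-teacher shards."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(num_examples), nb_teachers)

shards = partition_for_teachers(num_examples=100_000, nb_teachers=50)
print(len(shards), len(shards[0]))   # 50 shards of 2000 examples each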

Acknowledgements

The code heavily borrows from the KWS streaming work by Google Research and the Keyword Transformer by the Arm lab. For a more detailed description of the code structure, see the original authors' README.

We also exploit training techniques from DeiT.

We thank the authors for sharing their code. Please consider citing them as well if you use our code.

License

The source files in this repository are released under the Apache 2.0 license.

Some source files are derived from the KWS streaming repository by Google Research. These are also released under the Apache 2.0 license, the text of which can be seen in the LICENSE file in their repository.

Some source files are derived from the Keyword Transformer by the Arm lab. These are also released under the Apache 2.0 license, the text of which can be seen in the LICENSE file in their repository.
