This repository is for the paper DeepFry: Identifying Vocal Fry Using Deep Neural Networks by Bronya R. Chernyak, Talia Ben Simon, Yael Segal, Jeremy Steffman, Eleanor Chodroff, Jennifer S. Cole, Joseph Keshet.
It contains code for predicting creaky voice, as well as pre-trained models.
We provide two pre-trained models:
- DeepFry - from the paper.
- DeepFry - trained on both the Nuclear and Pre-Nuclear datasets described in the paper.
This repository enables you to identify creaky frames in a given audio file; see details below.
conda env create -f environment.yml
- To run this repository, your environment should have Python 3.8.
- You will need PyTorch (1.12.0), but a GPU is not required. Please refer to https://pytorch.org/get-started/locally/ and install the stable PyTorch version most suitable for your environment specifications (OS, CUDA version, etc.).
- Finally, you will need the packages specified in the requirements file:
pip install -r requirements.txt
There are two options to run this repository:
- Run on a directory with wav files without corresponding annotated textgrids.
- Run on a directory with wav files and their corresponding textgrids.
- Remove the following argument from the commands below to run on CPU:
--cuda
- If you get an error regarding the number of workers used in the test dataloader, due to your computer's specs, you can change it by adding the following argument:
--workers num_workers
- To run on a custom dataset, the .wav files (and optionally their corresponding TextGrid files) should be located under a folder named 'test', as follows (see the 'allstar' folder in this repository for an example):
|-- CustomDataDIR
| |-- test
| | |-- file1.wav
| | |-- file1.TextGrid
| | |-- file2.wav
| | |-- file2.TextGrid
and then you can specify the argument --data_dir CustomDataDIR.
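If your recordings currently sit in one flat folder, the layout above can be produced with a few lines of Python. This is a minimal sketch; the helper name arrange_custom_data is hypothetical and not part of this repository:

```python
from pathlib import Path
import shutil

def arrange_custom_data(wav_dir, data_dir):
    """Copy .wav files (and their matching .TextGrid files, when
    present) into the <data_dir>/test layout expected by run.py."""
    test_dir = Path(data_dir) / "test"
    test_dir.mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        shutil.copy(wav, test_dir / wav.name)
        tg = wav.with_suffix(".TextGrid")
        if tg.exists():  # TextGrids are optional
            shutil.copy(tg, test_dir / tg.name)
    return test_dir
```

After arranging the files, pass the top-level folder via --data_dir as described above.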
This option allows you to test the repository. In the 'allstar' folder you will find wav files with their corresponding textgrids, on which we tested our model, as specified in the paper.
Note that the results in the paper were reported over 20 ms frames to allow a proper comparison between methods, while our model was trained on 5 ms frames, so the measures here might differ slightly.
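If you want to compare per-frame predictions at the coarser granularity yourself, one simple way is to pool groups of four 5 ms labels into one 20 ms label. The paper does not specify its exact pooling, so the majority vote below (with ties counted as creak) is only an illustrative assumption:

```python
def pool_frames(labels, factor=4):
    """Pool binary per-frame labels into coarser frames by majority
    vote over each group of `factor` consecutive frames
    (e.g. four 5 ms frames -> one 20 ms frame). Ties count as creak."""
    pooled = []
    for i in range(0, len(labels), factor):
        group = labels[i:i + factor]
        pooled.append(1 if 2 * sum(group) >= len(group) else 0)
    return pooled
```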
python run.py --data_dir allstar --model_name model_path --cuda
python run.py --data_dir allstar --model_name model_path --out_dir out_path --cuda
where model_path is the absolute path to the pre-trained model, and out_path is the path to the directory in which the TextGrids with the model's predictions will be saved.
python run.py --data_dir data_path --model_name model_path --out_dir out_path --custom --cuda
where model_path and out_path are the same as above, and data_path is the absolute path to a directory with wav files in which creak should be identified.
- Annotated silences (optional): should be in the 'Speaker - word' tier, marked as 'sp', or in the 'creak-gold' tier without a mark.
- Annotated creak (optional): should be in the 'creak-gold' tier, marked as 'c'.
You can refer to the TextGrid files in the 'allstar' folder for an example.
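To inspect which stretches of a file are annotated (or predicted) as creak, you can pull the 'c' intervals out of a TextGrid's 'creak-gold' tier. The regex-based sketch below assumes long-format TextGrids and is only an illustration; for real work a dedicated library such as textgrid or praatio is safer:

```python
import re

def creak_intervals(textgrid_text):
    """Return (xmin, xmax) pairs of intervals marked 'c' in the
    'creak-gold' tier of a long-format TextGrid given as a string."""
    # Split the file at tier boundaries and keep only the creak-gold tier.
    tiers = re.split(r'item \[\d+\]:', textgrid_text)
    out = []
    for tier in tiers:
        if '"creak-gold"' not in tier:
            continue
        for m in re.finditer(
            r'intervals \[\d+\]:\s*'
            r'xmin = ([\d.]+)\s*xmax = ([\d.]+)\s*text = "([^"]*)"',
            tier,
        ):
            if m.group(3) == "c":
                out.append((float(m.group(1)), float(m.group(2))))
    return out
```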
python run.py --data_dir data_path --model_name model_path --out_dir out_path --cuda
where model_path and out_path are the same as above, and data_path is the absolute path to a directory with wav files and their corresponding TextGrids in which creak should be identified.