# [IEEE Signal Processing CUP 2022](https://signalprocessingsociety.org/community-involvement/signal-processing-cup)

> Synthetic Speech Attribution

<img src="https://user-images.githubusercontent.com/36858976/153433391-0c47d037-33c9-4942-aec7-3532b97378d1.jpg" width=600>

# Get Code
* Unzip the code folder.
* Set `sp2022-tf` as the current directory.

In [None]:
# delete existing
# !rm -r ../sp2022-tf

# change current directory to <code> directory
%cd ../sp2022-tf

## Important Notes
* All audio files must be in `.wav` format.
* The sample rate must be `16,000` Hz.
* For training, `batch_size` is tuned for `8 x V100`. If a model is trained on a different device, `batch_size` needs to be tuned accordingly using the `--batch` argument.
* `learning_rate` depends on `batch_size`; if `batch_size` is altered, `learning_rate` needs to be retuned accordingly.
* The total number of `epochs` was determined using **Cross-Validation** on the provided training data. If the **Training** data is changed, the total `epochs` needs to be retuned using **Cross-Validation**, setting `--all-data=0` in [train.py](../train.py).
* While training, an **Internet** connection is required to download **ImageNet** weights for the CNN backbones.
* To reproduce the results, it is recommended to run the code on the same **Device Configuration**.
* For inference, `batch_size` is tuned for `8 x V100`. For any other device, `batch_size` may need to be modified. To modify `batch_size`, change the following code in [predict.py](../predict.py):

```py
# CONFIGURE BATCHSIZE
mx_dim = np.sqrt(np.prod(dim))
if mx_dim>=768 or any(i in model_name for i in ['convnext','ECA_NFNetL2']):
    CFG.batch_size = CFG.replicas * 16
elif mx_dim>=640  or any(i in model_name for i in ['EfficientNet','RegNet','ResNetRS50','ResNest50']):
    CFG.batch_size = CFG.replicas * 32
else:
    CFG.batch_size = CFG.replicas * 64
```
* The final output will be saved at `output/result`.
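
The notes above say that `learning_rate` has to follow `batch_size`. One common heuristic for this, which is not confirmed to be what this repository uses, is the linear scaling rule: keep the ratio `learning_rate / batch_size` constant. A minimal sketch, where the helper `scale_learning_rate` and the base learning rate `1e-3` are illustrative assumptions and not values taken from the configs:

```python
def scale_learning_rate(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: keep learning_rate / batch_size constant."""
    if base_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch / base_batch


# Hypothetical numbers: a run tuned at batch 64 is moved to a device
# that only fits batch 16, so the learning rate shrinks by the same factor.
base_lr = 1e-3  # assumption, not read from ./configs
new_lr = scale_learning_rate(base_lr, base_batch=64, new_batch=16)
print(f"suggested learning rate: {new_lr:.6f}")
```

Treat the result as a starting point only; the scaled value may still need tuning on the target device.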

# Direct Prediction
To directly generate predictions on the **eval** data without any **Training**, using the **provided** checkpoints, refer to the [sp2022-infer-gpu](sp2022-infer-gpu.ipynb) notebook.

# 0. Requirements

## Hardware
* GPU: 8 x NVIDIA Tesla V100
* GPU Memory: 8 x 32 GB
* OS: Amazon Linux
* CUDA Version: 11.0
* Driver Version: 450.119.04
* CPU RAM: 128 GiB
* Disk: 2 TB

## Library
Install the necessary dependencies using the following command:

In [None]:
%pip install -r requirements.txt

# 1. Data Preparation
* Step 1: The competition data needs to be in the `./data/` folder. It is mandatory to keep the data in exactly the same format as it was provided.

* Step 2: The external datasets need to be downloaded from the following links and placed in the `./data/` folder:
    1. LJSpeech: [link](https://www.kaggle.com/datasets/showmik50/ljspeech-sr16k-dataset) (~2GB)
    2. VCTK: [link](https://www.kaggle.com/datasets/showmik50/vctk-sr16k-dataset) (~3GB)
    3. LibriSpeech: [link](https://www.kaggle.com/datasets/benimaru069/librispeech-small-dataset) (~15GB)
    4. Synthetic: [link](https://www.kaggle.com/datasets/burns070/aps22-synthetic-dataset) (~5GB)

> **Note:** All the datasets were pre-processed to have the same **sample_rate** = `16k` and **file_format** = `.wav`.
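
If any data is swapped for a custom copy, the format requirements can be checked before training. The sketch below is not part of the repository; it uses only the standard-library `wave` module, which reads PCM `.wav` headers, and the helper name `find_bad_audio` and the list of "other" audio extensions are assumptions made for illustration:

```python
import wave
from pathlib import Path

EXPECTED_RATE = 16_000  # Hz, the sample rate required by this notebook
OTHER_AUDIO = {".mp3", ".flac", ".ogg", ".m4a"}  # assumed formats worth flagging


def find_bad_audio(root, expected_rate=EXPECTED_RATE):
    """Return (path, reason) pairs for audio files that break the requirements."""
    problems = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix in OTHER_AUDIO:
            problems.append((path, "not a .wav file"))
        elif suffix == ".wav":
            try:
                with wave.open(str(path), "rb") as wav:
                    rate = wav.getframerate()
            except (wave.Error, EOFError) as err:
                # Raised for non-PCM or truncated files.
                problems.append((path, f"unreadable: {err}"))
                continue
            if rate != expected_rate:
                problems.append((path, f"sample rate is {rate} Hz"))
    return problems


# Scan the data folder only if it is present in the working directory.
if Path("./data").is_dir():
    for path, reason in find_bad_audio("./data"):
        print(f"{path}: {reason}")
```

Only file headers are read, so the scan stays cheap even on the larger datasets.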


## Data Path Format
The datasets are expected to follow this directory layout:

```shell
├── data
│   ├── aps22-synthetic-dataset
│   ├── librispeech-small-dataset
│   ├── ljspeech-sr16k-dataset
│   ├── vctk-sr16k-dataset
│   ├── spcup_2022_training_part1
│   │   └── spcup_2022_training_part1
│   ├── spcup_2022_unseen
│   │   └── spcup_2022_unseen
│   ├── spcup_2022_eval_part1
│   │   └── spcup_2022_eval_part1
│   ├── spcup_2022_eval_part2
│   │   └── spcup_2022_eval_part2
```

To use a custom directory, `PATHS.json` needs to be modified using the following cell:

In [None]:
import json
paths = {
    "TRAIN_DATA_DIR": "./data/spcup_2022_training_part1/spcup_2022_training_part1/",
    "UNSEEN_DATA_DIR": "./data/spcup_2022_unseen/spcup_2022_unseen/",
    "TEST1_DATA_DIR": "./data/spcup_2022_eval_part1/spcup_2022_eval_part1/",
    "TEST2_DATA_DIR": "./data/spcup_2022_eval_part2/spcup_2022_eval_part2/",
    "LJ_DATA_DIR": "./data/ljspeech-sr16k-dataset/",
    "VCTK_DATA_DIR": "./data/vctk-sr16k-dataset/",
    "LIBRI_DATA_DIR": "./data/librispeech-small-dataset/",
    "SYNTHETIC_DATA_DIR": "./data/aps22-synthetic-dataset/"
}
# write the paths with a context manager so the file is flushed and closed
with open('PATHS.json', 'w') as f:
    json.dump(paths, f, indent=4)

# Need this for inference path
part1_infer_path = paths['TEST1_DATA_DIR']
part2_infer_path = paths['TEST2_DATA_DIR']
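
Because every later step reads its locations from `PATHS.json`, it can help to confirm that each entry points at an existing directory before starting a long training run. A minimal sketch; the helper `missing_data_dirs` is hypothetical and not part of the repository:

```python
import json
from pathlib import Path


def missing_data_dirs(paths_file="PATHS.json"):
    """Return the entries of the paths file whose directory does not exist."""
    with open(paths_file) as f:
        paths = json.load(f)
    return {key: value for key, value in paths.items() if not Path(value).is_dir()}


# Report problems only if PATHS.json has already been written.
if Path("PATHS.json").exists():
    for key, value in missing_data_dirs().items():
        print(f"{key}: directory not found -> {value}")
```

An empty result means all configured directories were found.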

# 2. Supervised Training
Competition and external data, with their associated labels, are used for **Supervised Training**. All external data is considered as **Unknown Algorithm**.
> **Note**: Outputs will be saved in the `./output/supervised` folder.

## Part-1
To train models for the **eval_part1** data, run the following commands:

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part1.yaml\
--output-dir=output/supervised/part1\
--model=EfficientNetB0\
--batch=64\
--epochs=11

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part1.yaml\
--output-dir=output/supervised/part1\
--model=ResNet50D\
--batch=64\
--epochs=9

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part1.yaml\
--output-dir=output/supervised/part1\
--model=ResNetRS50\
--batch=32\
--epochs=13

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part1.yaml\
--output-dir=output/supervised/part1\
--model=ResNest50\
--batch=32\
--epochs=21

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part1.yaml\
--output-dir=output/supervised/part1\
--model=RegNetZD8\
--batch=64\
--epochs=8

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part1.yaml\
--output-dir=output/supervised/part1\
--model=EfficientNetV2S\
--pretrain=imagenet21k\
--batch=32\
--epochs=25
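
The Part-1 cells above differ only in the model name, batch size, epoch count, and the optional `--pretrain` flag. As an alternative to separate cells, the same commands could be generated from a table. This is a sketch, not something shipped with the repository; `PART1_RUNS` and `build_command` are illustrative names, while the flags and values are copied from the cells above:

```python
import shlex

# (model, batch, epochs, pretrain) copied from the Part-1 cells above
PART1_RUNS = [
    ("EfficientNetB0", 64, 11, None),
    ("ResNet50D", 64, 9, None),
    ("ResNetRS50", 32, 13, None),
    ("ResNest50", 32, 21, None),
    ("RegNetZD8", 64, 8, None),
    ("EfficientNetV2S", 32, 25, "imagenet21k"),
]


def build_command(model, batch, epochs, pretrain=None,
                  cfg="./configs/sp22-part1.yaml",
                  output_dir="output/supervised/part1"):
    """Return the train.py invocation as an argument list."""
    args = ["python3", "train.py", "--cfg", cfg,
            f"--output-dir={output_dir}", f"--model={model}"]
    if pretrain:
        args.append(f"--pretrain={pretrain}")
    args += [f"--batch={batch}", f"--epochs={epochs}"]
    return args


# Print the commands for review; nothing is executed here.
for run in PART1_RUNS:
    print(shlex.join(build_command(*run)))
```

To actually launch a run, each argument list can be passed to `subprocess.run(args, check=True)`.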

## Part-2
To train models for the **eval_part2** data, run the following commands:

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part2.yaml\
--output-dir=output/supervised/part2\
--model=ECA_NFNetL2\
--batch=16\
--epochs=12

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part2.yaml\
--output-dir=output/supervised/part2\
--model=convnext_base_in22k\
--batch=32\
--epochs=14

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part2.yaml\
--output-dir=output/supervised/part2\
--model=ResNetRS152\
--batch=32\
--epochs=11

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part2.yaml\
--output-dir=output/supervised/part2\
--model=convnext_large_in22k\
--batch=16\
--epochs=15

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part2.yaml\
--output-dir=output/supervised/part2\
--model=RegNetZD8\
--batch=32\
--epochs=5

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part2.yaml\
--output-dir=output/supervised/part2\
--model=EfficientNetB0\
--batch=64\
--epochs=13

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part2.yaml\
--output-dir=output/supervised/part2\
--model=EfficientNetV2M\
--pretrain=imagenet21k\
--batch=32\
--epochs=15

# 3. Generate Semi-Supervised Labels
Run the following command to generate **Semi-Supervised** labels for both the **eval_part1** and **eval_part2** data.
> **Note**: The semi-supervised labels will be saved at `output/supervised/pseudo/pred.csv`.

In [None]:
!python generate_pseudo.py\
--part1-model-dir=output/supervised/part1\
--part1-infer-path=$part1_infer_path\
--part2-model-dir=output/supervised/part2\
--part2-infer-path=$part2_infer_path\
--output=output/supervised/pseudo/pred.csv
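
The internals of `generate_pseudo.py` are not shown in this notebook. As a rough illustration of how pseudo-labels are commonly derived from several trained models, the sketch below averages per-model class probabilities and takes the most likely class. The function name, the toy probabilities, and the averaging strategy itself are assumptions, and plain lists are used to keep the example dependency-free:

```python
def ensemble_pseudo_labels(model_probs):
    """Average class probabilities over models; return (labels, confidences).

    model_probs: list with one entry per model, each a list of
    per-sample probability rows (n_samples x n_classes).
    """
    n_models = len(model_probs)
    labels, confidences = [], []
    for rows in zip(*model_probs):  # rows: one probability row per model
        mean = [sum(col) / n_models for col in zip(*rows)]
        best = max(range(len(mean)), key=mean.__getitem__)
        labels.append(best)
        confidences.append(mean[best])
    return labels, confidences


# Toy example: 2 models, 3 samples, 3 classes (made-up numbers).
probs_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
probs_b = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]
labels, confidences = ensemble_pseudo_labels([probs_a, probs_b])
print(labels)
```

The confidence values could be used to drop uncertain samples before semi-supervised training, though whether this pipeline applies such a filter is not stated here.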

# 4. Semi-Supervised Training
In this stage, competition and external data are used along with the **eval_part1** and **eval_part2** data. For **eval_data**, the **semi-supervised** labels generated in the previous stage are used.
> **Note**: Outputs will be saved in the `./output/semi-supervised` folder.


## Part-1
To train models for the **eval_part1** data, run the following commands:

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part1.yaml\
--output-dir=output/semi-supervised/part1\
--model=EfficientNetB0\
--batch=64\
--epochs=15\
--pseudo 1\
--pseudo_csv=output/supervised/pseudo/pred.csv

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part1.yaml\
--output-dir=output/semi-supervised/part1\
--model=ResNet50D\
--batch=64\
--epochs=18\
--pseudo 1\
--pseudo_csv=output/supervised/pseudo/pred.csv

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part1.yaml\
--output-dir=output/semi-supervised/part1\
--model=ResNetRS50\
--batch=32\
--epochs=17\
--pseudo 1\
--pseudo_csv=output/supervised/pseudo/pred.csv

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part1.yaml\
--output-dir=output/semi-supervised/part1\
--model=ResNest50\
--batch=32\
--epochs=16\
--pseudo 1\
--pseudo_csv=output/supervised/pseudo/pred.csv

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part1.yaml\
--output-dir=output/semi-supervised/part1\
--model=RegNetZD8\
--batch=64\
--epochs=12\
--pseudo 1\
--pseudo_csv=output/supervised/pseudo/pred.csv

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part1.yaml\
--output-dir=output/semi-supervised/part1\
--model=EfficientNetV2S\
--pretrain=imagenet21k\
--batch=64\
--epochs=7\
--pseudo 1\
--pseudo_csv=output/supervised/pseudo/pred.csv

## Part-2
To train models for the **eval_part2** data, run the following commands:

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part2.yaml\
--output-dir=output/semi-supervised/part2\
--model=ECA_NFNetL2\
--batch=16\
--epochs=11\
--pseudo 1\
--pseudo_csv=output/supervised/pseudo/pred.csv

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part2.yaml\
--output-dir=output/semi-supervised/part2\
--model=convnext_base_in22k\
--batch=16\
--epochs=6\
--pseudo 1\
--pseudo_csv=output/supervised/pseudo/pred.csv

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part2.yaml\
--output-dir=output/semi-supervised/part2\
--model=ResNetRS152\
--batch=32\
--epochs=16\
--pseudo 1\
--pseudo_csv=output/supervised/pseudo/pred.csv

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part2.yaml\
--output-dir=output/semi-supervised/part2\
--model=convnext_large_in22k\
--batch=16\
--epochs=10\
--pseudo 1\
--pseudo_csv=output/supervised/pseudo/pred.csv

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part2.yaml\
--output-dir=output/semi-supervised/part2\
--model=EfficientNetB0\
--batch=32\
--epochs=10\
--pseudo 1\
--pseudo_csv=output/supervised/pseudo/pred.csv

In [None]:
!python3 train.py\
--cfg ./configs/sp22-part2.yaml\
--output-dir=output/semi-supervised/part2\
--model=EfficientNetV2M\
--pretrain=imagenet21k\
--batch=32\
--epochs=25\
--pseudo 1\
--pseudo_csv=output/supervised/pseudo/pred.csv

# 5. Prediction with **Trained** Models
To predict on **eval_data** using the newly trained models, use the following code.
> **Note**: Outputs will be saved at `output/result`


## Part-1
To generate predictions for the **eval_part1** data using the **newly-trained** checkpoints, run the following command:

In [None]:
!python predict.py\
--cfg ./configs/sp22-part1.yaml\
--model-dir=output/semi-supervised/part1\
--infer-path=$part1_infer_path\
--output=output/result/pred_part1.csv

## Part-2
To generate predictions for the **eval_part2** data using the **newly-trained** checkpoints, run the following command:

In [None]:
!python predict.py\
--cfg ./configs/sp22-part2.yaml\
--model-dir=output/semi-supervised/part2\
--infer-path=$part2_infer_path\
--output=output/result/pred_part2.csv

# Output

In [None]:
!tree

In [None]:
!tree output
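
The `tree` command is not installed on every system. A small fallback using only the standard library is sketched below; `list_tree` is a hypothetical helper and its indentation style is an arbitrary choice:

```python
from pathlib import Path


def list_tree(root, max_depth=3):
    """Return indented lines describing files and folders under root."""
    root = Path(root)
    lines = []
    for path in sorted(root.rglob("*")):
        depth = len(path.relative_to(root).parts)
        if depth <= max_depth:
            suffix = "/" if path.is_dir() else ""
            lines.append("    " * (depth - 1) + path.name + suffix)
    return lines


# Show the output folder only if it already exists.
if Path("output").is_dir():
    print("\n".join(list_tree("output")))
```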