Code for the paper *The Methodology for Stereo Image Learning Representation* [pdf] by Changwoon Choi, Seongrae Kim, and Sangwoo Han.
(Stereo image comparison table: GANSynth [1] vs. Ours; images omitted.)
Sample audio clips corresponding to the stereo images above are available at the following Google Drive link.
(Sample Audios)
To run the codebase, you need Anaconda. Once you have Anaconda installed, run the following commands to create and activate a conda environment and install the dependencies.
conda create -n ml2020 python=3.8
conda activate ml2020
pip3 install -r requirements.txt
You can download our dataset at the following Google Drive link.
[Dataset Link]
The compressed file contains raw wav files, along with train.txt and test.txt for the train/test split.
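As a quick illustration, the split files can be read with a few lines of Python. This is a minimal sketch, assuming each file lists one wav filename per line; the `load_split` helper is hypothetical, not part of the repo.

```python
from pathlib import Path

def load_split(root):
    """Read train.txt / test.txt from the dataset root (assumed format:
    one wav filename per line) and return (train_files, test_files)."""
    root = Path(root)
    train = [l.strip() for l in (root / "train.txt").read_text().splitlines() if l.strip()]
    test = [l.strip() for l in (root / "test.txt").read_text().splitlines() if l.strip()]
    return train, test
```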
You can generate preprocessed mel-spectrogram and instantaneous frequency (IF) chunks by simply running prepare_data.py in the model/ directory.
python prepare_data.py
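For reference, the general kind of preprocessing involved here, log-mel magnitudes plus instantaneous frequency derived from the STFT phase (as in GANSynth [1]), can be sketched in plain NumPy. This is an illustrative sketch only: the window size, hop, mel count, and function names below are assumptions, not the parameters prepare_data.py actually uses.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Naive STFT: Hann-windowed frames -> complex spectrogram (frames, n_fft//2+1)."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.stack(frames), axis=-1)

def mel_filterbank(sr, n_fft, n_mels=64):
    """Triangular mel filters mapping FFT bins -> mel bands."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:  # rising edge of the triangle
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:  # falling edge
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

def logmel_and_if(x, sr=16000, n_fft=1024, hop=256, n_mels=64):
    """Return (log-mel magnitude, instantaneous frequency) for waveform x.
    All parameter values here are illustrative assumptions."""
    spec = stft(x, n_fft, hop)
    logmel = np.log(np.abs(spec) @ mel_filterbank(sr, n_fft, n_mels).T + 1e-6)
    phase = np.unwrap(np.angle(spec), axis=0)            # unwrap phase along time
    inst_freq = np.diff(phase, axis=0, prepend=phase[:1])  # frame-to-frame phase delta
    return logmel, inst_freq
```

The IF channel is the frame-to-frame derivative of the unwrapped phase, which is smoother and easier for a generator to model than raw phase.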
You can train the network by running train-MS.py.
(You need to modify the root and data directories in train-MS.py first.)
python train-MS.py
Run the following command from the terminal in the ml2020 folder:
python infer.py --type MS --model MODEL_PATH --sample_num NUM_SAMPLES --save_dir PATH_TO_SAVE
To evaluate your outputs, run the following command:
python test_metrics.py --gen_dir PATH_TO_YOUR_OUTPUTS
[1] Jesse Engel et al., "GANSynth: Adversarial Neural Audio Synthesis," in International Conference on Learning Representations (ICLR), 2019.