# Compound Word Transformer

Authors: [Wen-Yi Hsiao](), [Jen-Yu Liu](), [Yin-Cheng Yeh](), [Yi-Hsuan Yang]()

[**Paper (arXiv)**]() | [**Audio demo (Google Drive)**]()

Official PyTorch implementation of the AAAI 2021 paper "Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs".

We present a new variant of the Transformer that can process multiple consecutive tokens at once at a single time step. The proposed method greatly reduces the length of the resulting sequence and therefore improves training and inference efficiency. We employ it to learn to compose expressive Pop piano music of full-song length (involving up to 10K individual tokens per song). In this repository, we open-source our **AILabs.tw Pop17K** dataset and the code for unconditional generation.
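To give a rough intuition for the sequence-length reduction, here is a toy sketch (not the paper's actual tokenization or vocabulary) of how grouping the per-attribute tokens of one note into a single compound token shortens the sequence:

```python
# Toy illustration of the compound-word idea: instead of one token per
# attribute, all attributes of a note are grouped into a single compound
# token, so the sequence the model sees is much shorter. The attribute
# names below are illustrative, not the paper's exact vocabulary.

def to_compound(flat_tokens, group_size=3):
    """Group every `group_size` consecutive tokens into one compound token."""
    assert len(flat_tokens) % group_size == 0
    return [tuple(flat_tokens[i:i + group_size])
            for i in range(0, len(flat_tokens), group_size)]

# one note = (pitch, duration, velocity) in this toy vocabulary
flat = ["pitch_60", "dur_480", "vel_80",
        "pitch_64", "dur_240", "vel_72"]

compound = to_compound(flat)
print(len(flat), len(compound))  # 6 flat tokens become 2 compound tokens
```

The paper's model additionally predicts each attribute of a compound token with its own output head; this sketch only shows the grouping itself.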

## Dependencies

* python 3.6
* Required packages:
```bash
pip install miditoolkit
pip install torch==1.5.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
pip install --user pytorch-fast-transformers
pip install chorder
```

``chorder`` is our in-house rule-based symbolic chord recognition algorithm, developed by our former intern, [joshuachang2311](https://github.com/joshuachang2311/chorder). He is also a jazz pianist.
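For intuition only, here is a heavily simplified, hypothetical sketch of what rule-based chord recognition can look like (matching sounding pitch classes against chord templates); `chorder`'s actual algorithm and API differ:

```python
# Hypothetical, minimal template-matching chord recognizer, only to
# illustrate the rule-based idea; this is NOT chorder's actual algorithm.

CHORD_TEMPLATES = {
    "maj": {0, 4, 7},   # intervals above the root, in semitones
    "min": {0, 3, 7},
}

def recognize_chord(midi_pitches):
    """Return (root_pitch_class, quality) best matching the pitch classes."""
    pcs = {p % 12 for p in midi_pitches}
    best, best_score = None, float("-inf")
    for root in range(12):
        for quality, template in CHORD_TEMPLATES.items():
            shifted = {(root + i) % 12 for i in template}
            # reward matched pitch classes, penalize unexplained ones
            score = len(pcs & shifted) - len(pcs - shifted)
            if score > best_score:
                best, best_score = (root, quality), score
    return best

print(recognize_chord([60, 64, 67]))  # C major triad -> (0, 'maj')
```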

## Model
In this work, we conduct two scenarios of generation:
* unconditional generation
  * To see the experimental results and the discussion, please refer to [here](./workspace/uncond/Experiments.md).
* conditional generation, lead sheet to full MIDI (ls2midi)
  * [**Work in progress**] The code associated with this part is planned to be open-sourced in the future
  * melody extraction (skyline)
  * objective metrics
  * model

## Dataset
To prepare your own training data, please refer to the [documentation]() for further details.
The full workspace of our dataset **AILabs.tw Pop17K** is available [here](https://drive.google.com/drive/folders/1DY54sxeCcQfVXdGXps5lHwtRe7D_kBRV?usp=sharing).

## Acknowledgement
- PyTorch code for Transformer-XL is modified from [kimiyoung/transformer-xl](https://github.com/kimiyoung/transformer-xl).
- Thanks to [Yu-Hua Chen](https://github.com/ss12f32v)
# Datasets

In this document, we demonstrate our team's standard data processing pipeline. By following the instructions and running the corresponding Python scripts, you can easily generate and customize your own dataset.

<p align="center">
  <img src="../assets/data_proc_diagram.png" width="500">
</p>

## 1. From `audio` to `midi_transcribed`
We collect audio clips of piano performances from YouTube.

* run Google Magenta's [onsets and frames](https://github.com/magenta/magenta/tree/master/magenta/models/onsets_frames_transcription)

## 2. From `midi_transcribed` to `midi_synchronized`
In this step, we use [madmom](https://github.com/CPJKU/madmom) for beat/downbeat tracking. Next, we interpolate 480 ticks between every pair of adjacent beats and map each absolute time to its corresponding tick. Lastly, we infer the tempo changes from the time intervals between adjacent beats. We choose a beat resolution of 480 because it is a common setting in modern DAWs. Note that we do not quantize any timing in this step, so tiny offsets are preserved for future purposes.

* run `synchronizer.py`
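The time-to-tick mapping described above can be sketched as follows; this is a simplified stand-in for `synchronizer.py`, assuming beat times are given in seconds:

```python
# Sketch of mapping absolute note times (seconds) to ticks, given tracked
# beat times: 480 ticks are interpolated between adjacent beats, so a time
# falling between two beats gets a proportional tick offset. Simplified
# stand-in for synchronizer.py, not the repository's actual code.
import bisect

TICKS_PER_BEAT = 480

def time_to_tick(t, beat_times):
    """Map absolute time `t` (seconds) to a tick on the beat grid."""
    i = bisect.bisect_right(beat_times, t) - 1
    i = max(0, min(i, len(beat_times) - 2))   # clamp to a valid beat interval
    span = beat_times[i + 1] - beat_times[i]
    frac = (t - beat_times[i]) / span         # position within the interval
    return round((i + frac) * TICKS_PER_BEAT)

beats = [0.0, 0.5, 1.0, 1.5]                  # a 120 BPM beat grid
print(time_to_tick(0.75, beats))              # halfway through beat 2 -> 720
```

Because the mapping is proportional rather than snapped to a grid, small timing offsets survive this step, as noted above.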

## 3. From `midi_synchronized` to `midi_analyzed`
In this step, we apply our in-house rule-based symbolic melody extraction and chord recognition algorithms to obtain the desired information. Only the code for chord recognition is open-sourced [here](https://github.com/joshuachang2311/chorder).

* run `analyzer.py`

## 4. From `midi_analyzed` to `Corpus`
We quantize everything (duration, velocity, bpm) in this step, and append an EOS (end of sequence) token to the data.

* run `midi2corpus.py`
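The quantization step can be pictured with a small sketch; the bin choices below are illustrative, not the ones `midi2corpus.py` actually uses:

```python
# Sketch of the quantization step: snap continuous values (velocity, bpm,
# duration) onto fixed bins, then append an EOS token. The bins are
# illustrative; the repository's midi2corpus.py defines its own.

def quantize(value, bins):
    """Snap `value` to the closest entry in `bins`."""
    return min(bins, key=lambda b: abs(b - value))

VELOCITY_BINS = list(range(0, 128, 4))     # 0, 4, 8, ..., 124

notes = [{"pitch": 60, "velocity": 63}, {"pitch": 64, "velocity": 90}]
corpus = [{**n, "velocity": quantize(n["velocity"], VELOCITY_BINS)}
          for n in notes]
corpus.append("EOS")                       # mark the end of the sequence

print(corpus)  # velocities snapped to 64 and 88 (ties resolve to the lower bin)
```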

## 5. From `Corpus` to `Representation`
We have 2 kinds of representation - Compound Word (**CP**) and **REMI** - and 2 tasks - unconditional and conditional generation - resulting in 4 combinations. Go to the corresponding folder `task/repre` and run the scripts:

* run `corpus2events.py`: to generate human-readable tokens and re-arrange the data.
* run `events2words.py`: to build the dictionary and renumber the tokens.
* run `compile.py`: to discard disqualified songs that exceed the length limit, reshape the data for Transformer-XL, and generate masks for variable lengths.
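The dictionary-building step can be sketched roughly as follows (event names and details are illustrative, in the spirit of `events2words.py` rather than its actual code):

```python
# Rough sketch of building a token dictionary and renumbering event
# sequences as integer ids; illustrative, not the repository's code.

def build_dictionary(event_seqs):
    """Assign each distinct event a unique integer id, in order of appearance."""
    vocab = {}
    for seq in event_seqs:
        for event in seq:
            if event not in vocab:
                vocab[event] = len(vocab)
    return vocab

def events_to_words(seq, vocab):
    """Renumber one event sequence using the dictionary."""
    return [vocab[e] for e in seq]

songs = [["Bar", "Beat_0", "Note_60", "EOS"],
         ["Bar", "Beat_0", "Note_64", "EOS"]]
vocab = build_dictionary(songs)
print(events_to_words(songs[1], vocab))   # shared events reuse the same ids
```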

---

## AILabs.tw Pop17K dataset

Alternatively, you can refer to [here](https://drive.google.com/drive/folders/1DY54sxeCcQfVXdGXps5lHwtRe7D_kBRV?usp=sharing) to obtain the entire workspace and the pre-processed training data originally used in our paper.