Welcome to MidiTok's documentation!

MidiTok is a Python package for MIDI file tokenization, presented at the ISMIR 2021 LBDs (paper). It converts MIDI files to sequences of tokens ready to be fed to sequential Deep Learning models such as Transformers.
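To make the idea of "MIDI to tokens" concrete, here is a minimal, dependency-free sketch. It does not use MidiTok's API: the ``Note`` class, the ``notes_to_tokens`` helper and the token names are hypothetical, and only loosely resemble the vocabularies of common tokenizations.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """Hypothetical, simplified note; not a MidiTok or MIDIToolkit class."""
    pitch: int     # MIDI pitch, 0-127
    velocity: int  # MIDI velocity, 0-127
    start: int     # onset time, in ticks
    duration: int  # length, in ticks


def notes_to_tokens(notes):
    """Flatten notes into a list of string tokens (illustrative scheme only)."""
    tokens = []
    previous_start = 0
    # Sort by onset, then pitch, so the output is deterministic.
    for note in sorted(notes, key=lambda n: (n.start, n.pitch)):
        shift = note.start - previous_start
        if shift > 0:
            # Time elapsed since the previous note onset.
            tokens.append(f"TimeShift_{shift}")
        tokens.append(f"Pitch_{note.pitch}")
        tokens.append(f"Velocity_{note.velocity}")
        tokens.append(f"Duration_{note.duration}")
        previous_start = note.start
    return tokens


notes = [Note(60, 100, 0, 480), Note(64, 100, 0, 480), Note(67, 90, 480, 240)]
tokens = notes_to_tokens(notes)
print(tokens)
```

A real tokenizer additionally maps each token to an integer id from a fixed vocabulary and quantizes time values, which this sketch leaves out.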

MidiTok features most known MIDI :ref:`tokenizations`, and is built around the idea that they all share common methods. It properly pre-processes MIDI files, and supports :ref:`Byte Pair Encoding (BPE)`. GitHub repository
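For readers unfamiliar with Byte Pair Encoding, the sketch below shows a single merge step on a toy token sequence: the most frequent adjacent pair is replaced by one new token. It is a plain-Python illustration, not MidiTok's implementation; the function names and the ``+``-joined token name are assumptions made for this example.

```python
from collections import Counter


def most_frequent_pair(sequence):
    """Return the most frequent pair of adjacent tokens, or None if there is none."""
    pairs = Counter(zip(sequence, sequence[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(sequence, pair, new_token):
    """Replace every non-overlapping occurrence of `pair` with `new_token`."""
    merged = []
    i = 0
    while i < len(sequence):
        if i + 1 < len(sequence) and (sequence[i], sequence[i + 1]) == pair:
            merged.append(new_token)
            i += 2  # skip both tokens of the merged pair
        else:
            merged.append(sequence[i])
            i += 1
    return merged


# Toy sequence with hypothetical token names.
sequence = [
    "Pitch_60", "Velocity_100",
    "Pitch_64", "Velocity_100",
    "Pitch_60", "Velocity_100",
]
pair = most_frequent_pair(sequence)
merged = merge_pair(sequence, pair, "+".join(pair))
print(pair, merged)
```

Repeating this step until a target vocabulary size is reached is what shortens the token sequences.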

Installation

pip install miditok

MidiTok uses MIDIToolkit and Mido to read and write MIDI files, and BPE is backed by Hugging Face 🤗tokenizers for super fast encoding.

Citation

If you use MidiTok for your research, a citation in your manuscript would be gladly appreciated. ❤️

You can also find BibTeX :ref:`citations` of tokenizations.

@inproceedings{miditok2021,
    title={{MidiTok}: A Python package for {MIDI} file tokenization},
    author={Fradet, Nathan and Briot, Jean-Pierre and Chhel, Fabien and El Fallah Seghrouchni, Amal and Gutowski, Nicolas},
    booktitle={Extended Abstracts for the Late-Breaking Demo Session of the 22nd International Society for Music Information Retrieval Conference},
    year={2021},
    url={https://archives.ismir.net/ismir2021/latebreaking/000005.pdf},
}

Contents

.. toctree::
   midi_tokenizer
   examples
   tokenizations
   bpe
   pytorch_data
   data_augmentation
   utils
   citations