This page details the tokenizations featured by MidiTok. They all inherit from :class:`miditok.MIDITokenizer`; see its documentation to learn how to use the common methods. For each tokenization, the token equivalent of the lead sheet below is shown.

.. autoclass:: miditok.REMI
    :show-inheritance:

REMI+ is an extended version of :ref:`REMI` (Huang and Yang) for general multi-track, multi-signature symbolic music sequences, introduced in `FIGARO (Rütte et al.) <https://arxiv.org/abs/2201.10936>`_. It handles multiple instruments by adding ``Program`` tokens before the ``Pitch`` ones.
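
To make the token ordering concrete, here is a toy sketch in plain Python. It does not use MidiTok and is not its implementation: the ``Note`` tuple, the ``notes_to_tokens`` helper and the exact token strings are hypothetical, and time tokens such as bars and positions are left out for brevity. It only shows the idea of placing a ``Program`` token before each ``Pitch`` token.

```python
from typing import List, Tuple

# Hypothetical note representation: (program, pitch, velocity, duration label).
Note = Tuple[int, int, int, str]


def notes_to_tokens(notes: List[Note], use_programs: bool) -> List[str]:
    """Serialize notes into REMI-style token strings (illustrative names only)."""
    tokens: List[str] = []
    for program, pitch, velocity, duration in notes:
        if use_programs:
            # REMI+ idea: a Program token precedes each Pitch token,
            # so a single sequence can interleave several instruments.
            tokens.append(f"Program_{program}")
        tokens.append(f"Pitch_{pitch}")
        tokens.append(f"Velocity_{velocity}")
        tokens.append(f"Duration_{duration}")
    return tokens


# Two notes: one on a piano (program 0), one on a bass (program 33).
notes = [(0, 60, 100, "1.0"), (33, 36, 90, "2.0")]
remi_plus_like = notes_to_tokens(notes, use_programs=True)
remi_like = notes_to_tokens(notes, use_programs=False)
```

Without the ``Program`` tokens, the two notes would be indistinguishable by instrument once merged into one sequence.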
In previous versions of MidiTok, REMI+ was implemented as a dedicated class. Now that all tokenizers support the additional tokens in a more flexible way, you can get the REMI+ tokenization by using the :ref:`REMI` tokenizer with ``config.use_programs``, ``config.one_token_stream_for_programs`` and ``config.use_time_signatures`` set to ``True``.

.. autoclass:: miditok.MIDILike
    :show-inheritance:

.. autoclass:: miditok.TSD
    :show-inheritance:

.. autoclass:: miditok.Structured
    :show-inheritance:

.. autoclass:: miditok.CPWord
    :show-inheritance:

.. autoclass:: miditok.Octuple
    :show-inheritance:

.. autoclass:: miditok.MuMIDI
    :show-inheritance:

.. autoclass:: miditok.MMM
    :show-inheritance:

You can easily create your own tokenizer and benefit from the MidiTok framework. Create a class inheriting from :class:`miditok.MIDITokenizer`, and override the :py:func:`miditok.MIDITokenizer._add_time_events`, :py:func:`miditok.MIDITokenizer._tokens_to_midi`, :py:func:`miditok.MIDITokenizer._create_vocabulary` and :py:func:`miditok.MIDITokenizer._create_token_types_graph` methods with your tokenization strategy. If needed, you can also override :py:func:`miditok.MIDITokenizer._midi_to_tokens`, :py:func:`miditok.MIDITokenizer._create_track_events` and :py:func:`miditok.MIDITokenizer._create_midi_events`.
If you think other people can benefit from it, feel free to send a pull request on GitHub.