# Melody Generation

+ **AI in Culture and Arts - Tech Crash Course**
+ **Date:** 06.06.2024
+ **Author:** B. Zönnchen

<a href="https://colab.research.google.com/github/aica-wavelab/aica-assignments/blob/main/A4_melody_generation/3_2_melody_representations.ipynb" target="_parent">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In the following, we will create sheet music and sound. For these tasks, ``Python`` requires external programs, which you should install if you are working locally:

1. [MuseScore](https://musescore.org/de) (for generating sheets)
2. [FluidSynth](https://www.fluidsynth.org/) (for generating sound)

If you are working on Google ``Colab``, you can run the following cells to install these applications and clone the course repository:

In [None]:
#@title install dependencies to play sound
%%capture
print('installing fluidsynth...')
!apt-get install fluidsynth > /dev/null
!cp /usr/share/sounds/sf2/FluidR3_GM.sf2 ./font.sf2
print('done!')

In [None]:
#@title install dependencies to show score in music notation
%%capture
print('installing musescore3...')
!apt-get install musescore3 > /dev/null
print('done!')

In [None]:
#@title clone git repository
%%capture
%rm -rf aica-assignments
!git clone https://github.com/aica-wavelab/aica-assignments.git
%cd aica-assignments/A4_melody_generation

Furthermore, for this notebook we need the following ``Python`` packages and modules. Execute the cell to install them:

In [None]:
%pip install music21
%pip install pyfluidsynth

%pip install pandas
%pip install numpy
%pip install matplotlib

In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("3_2_melody_representations.ipynb")

In [None]:
import music21 as m21
from music21.note import Note
from music21.stream import Stream

# import functions
from pianoroll import stream_to_df, has_acceptable_duration, plot_df
from files import load_midi_files
from encoder import PianoRollEncoder, StringToIntEncoder

import zipfile

# 3.2 Representing Melodies

For a melody, we can choose a special kind of representation because we know that there is no verticality (no two notes sound at the same time); that is, the problem becomes one-dimensional, like text! In this section, we talk more about representations, a topic that might sound boring but is more important than you might think!

The ``.zip`` file ``data/deu_folk_songs.zip`` contains MIDI files of melodies.
These files represent our **training data** for our *deep learning* attempts later.

In [None]:
with zipfile.ZipFile('data/deu_folk_songs.zip', 'r') as zip_ref:
    zip_ref.extractall('data/deu_folk_songs/')

Once the archive is unzipped, you can use the ``load_midi_files`` function to load the MIDI files. It returns a list of ``Stream`` objects.
You can limit how many files are loaded using the ``max_files`` argument.

In [None]:
streams_folk_songs = load_midi_files('data/deu_folk_songs/', max_files=10) # load only 10 files
print(f'loaded {len(streams_folk_songs)} files')

In [None]:
streams_folk_songs[0].show('midi')

In [None]:
streams_folk_songs[0].show()

### 3.2.1 Note-Based Representation

The first straightforward representation that also works for polyphonic music is to use three numbers for each ``Note`` where ``Chord``s can be broken down into ``Note``s:

1. ``pitch``: The pitch of the note (in MIDI)
2. ``duration``: The duration of the note in quarter notes
3. ``step``: The time elapsed from the previous note or beginning of the track

We need ``step`` to be able to represent ``Rest``s!

We prepared a function ``stream_to_df`` that transforms a ``Stream`` object into a ``pandas`` ``DataFrame`` following this approach.

In [None]:
# pick a folk song
stream = streams_folk_songs[0]

In [None]:
dataframe = stream_to_df(stream)

In [None]:
dataframe.head()

``pitch`` is the MIDI note that is played over a duration of ``duration``. The note starts at ``start`` and ends at ``end = start + duration``. The step is the time elapsed from the previous note or start of the track. ``Rest``s have no explicit entry. The rows of the ``DataFrame`` are sorted by ``start``.
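To make the relation between these columns concrete, here is a minimal sketch in plain Python. It does not use ``stream_to_df``; the three notes are made up, and it assumes that ``step`` is measured from the start of the previous note. A ``Rest`` then shows up as a gap between the end of one note and the start of the next.

```python
# Hypothetical melody: (pitch, start, duration), times in quarter notes.
notes = [
    (60, 0.0, 1.0),  # C4
    (62, 1.0, 1.0),  # D4
    (64, 3.0, 0.5),  # E4, preceded by a one-quarter rest
]

rows = []
prev_start = 0.0
for pitch, start, duration in sorted(notes, key=lambda n: n[1]):
    rows.append({
        "pitch": pitch,
        "start": start,
        "end": start + duration,
        "duration": duration,
        "step": start - prev_start,  # assumption: measured between note starts
    })
    prev_start = start

# A rest is implicit: it is the gap between one note's end and the next start.
for prev, cur in zip(rows, rows[1:]):
    rest = cur["start"] - prev["end"]
    if rest > 0:
        print(f"rest of {rest} quarters before pitch {cur['pitch']}")
```

In this sketch only the last note follows a rest, because the second note ends at ``2.0`` and the third starts at ``3.0``.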

If the ``DataFrame`` is ordered by ``start``, the columns ``pitch``, ``duration``, and ``step`` already contain all the information we need. To select only these columns, use:

In [None]:
dataframe[['pitch', 'duration', 'step']].head()

The ``dataframe`` can also be displayed like a **piano roll** using ``plot_df``:

In [None]:
plot_df(dataframe)

This representation also works for polyphonic pieces:

In [None]:
# load a polyphonic score from a midi file
minuet_in_G = m21.converter.parse('data/Minuet_in_G.mid')

# convert it into a DataFrame
minuet_df = stream_to_df(minuet_in_G)
minuet_df.head()

In [None]:
plot_df(minuet_df)

### 3.2.2 Piano Roll-Based Representation

So far, we have looked at the **classical notation of Western music**. The **piano roll** is another representation of a score that is often used in digital audio workstations (DAWs). Above, we already saw a visualization of it. Since it is far more regular than classical notation, it is often easier for a computer to analyse.

A *piano roll* is basically a two-dimensional grid where the $x$-axis represents time, which is **discretized into fixed time steps**, and the $y$-axis represents the MIDI note number.
The value of a specific cell in the grid is ``1`` **if and only if** the MIDI note is played at that time.
Otherwise the value is ``0``.
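As an illustration, the following sketch builds such a grid with plain Python lists. The notes and the variable names are made up for this example, and it assumes that every start and duration is a multiple of the chosen time step. It is not how ``plot_df`` or the course encoders work internally.

```python
# Hypothetical notes: (pitch, start, duration), times in quarter notes.
notes = [(60, 0.0, 1.0), (64, 1.0, 0.5), (67, 1.5, 1.5)]
time_step = 0.5  # assumption: every start and duration is a multiple of this

n_steps = round(max(start + dur for _, start, dur in notes) / time_step)
n_pitches = 128  # MIDI note numbers 0..127

# grid[pitch][t] == 1 if and only if the pitch sounds during time step t
grid = [[0] * n_steps for _ in range(n_pitches)]
for pitch, start, dur in notes:
    first = round(start / time_step)
    last = round((start + dur) / time_step)
    for t in range(first, last):
        grid[pitch][t] = 1

# Print the three rows that contain notes, highest pitch first.
for pitch in (67, 64, 60):
    print(pitch, grid[pitch])
```

Each row of the grid belongs to one MIDI note number and each column to one time step, so most of the grid stays ``0`` for a single melody.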

Time is discretized into small but equally large chunks called **time steps**. Because of this discretization, we cannot represent arbitrary durations of a note!
This simplifies the problem but also limits the expressiveness of the musical piece we can represent.
In our case, a time step is specified in units of quarter notes (we stick to the convention of ``music21``).
For example, if ``time_step = 1.0``, one time step lasts one quarter note, and we can only represent durations that are multiples of a quarter note, that is:

```
1/4, 2/4, 3/4, 4/4, 5/4, ...
```

We cannot, for example, represent a duration of ``3/5``!
Therefore, if you are working with a piano-roll-like representation, you might want to filter your (training) data accordingly.
For this purpose, you can use ``has_acceptable_duration(stream, time_step)`` to test whether the ``Stream`` can be represented by a piano roll with a time step equal to ``time_step``.
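The idea behind such a test can be sketched on plain numbers. The helper ``fits_grid`` below is hypothetical and only for illustration: it is not the course's ``has_acceptable_duration`` and does not operate on ``Stream`` objects. It uses ``Fraction`` to avoid floating-point rounding when checking divisibility.

```python
from fractions import Fraction

def fits_grid(values, time_step):
    """Return True if every value is a whole multiple of time_step.

    Hypothetical helper for illustration; it works on plain numbers,
    not on music21 Streams.
    """
    step = Fraction(str(time_step))
    return all(Fraction(str(v)) % step == 0 for v in values)

durations = [0.75, 0.25, 1.0]  # hypothetical durations in quarter notes
print(fits_grid(durations, 0.25))  # True
print(fits_grid(durations, 0.5))   # False, 0.75 is not a multiple of 0.5
```

For a complete check, the start times of the notes would have to lie on the grid as well, which could be tested with the same helper.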

Furthermore, the function ``load_midi_files`` offers a parameter ``time_step`` to filter for ``Score``s that are compatible with a certain ``time_step`` in quarters. Internally, it uses the ``has_acceptable_duration()`` function.

In [None]:
time_step = 0.5 # which is effectively one eighth note
streams_folk_songs = load_midi_files('data/deu_folk_songs/', time_step=time_step, max_files=10) # load only 10 files
print(f'loaded {len(streams_folk_songs)} files that are compatible with a time step of {time_step}')
has_acceptable_duration(streams_folk_songs[0], time_step=time_step)

In the research literature, one can find deep neural networks that generate symbolic scores and are trained on representations based on the piano roll. However, to capture fine-grained dynamics in a performance, these models rely on a tiny ``time_step``, which leads to a massive amount of data. Imagine we used a time step of 0.01 quarters for a piece that is 100 quarters long. Representing this piece would require a sequence of length $100 / 0.01 = 10000$!
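The growth of the sequence length can be checked with a few lines of arithmetic. This is only a sketch; the list of time steps is chosen for illustration, and ``Fraction`` keeps the division exact.

```python
from fractions import Fraction

piece_length = Fraction(100)  # length of the piece in quarter notes
for step in ("1", "0.5", "0.01"):
    n_steps = piece_length / Fraction(step)
    print(f"time_step={step}: {n_steps} time steps")
```

Halving the time step doubles the sequence length, regardless of how many notes the piece actually contains.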

To counteract this problem, such models include special **events** to skip forward in time! These events are categorical, so they also rely on a *one-hot encoding*. If you are interested in this idea, here is one of the first papers discussing this approach: [This Time with Feeling: Learning Expressive Musical Performance](https://arxiv.org/abs/1808.03715)
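A heavily simplified sketch of such an event sequence is shown below. The event names and the notes are assumptions made for this example and do not reproduce the exact vocabulary of the paper. In particular, a real vocabulary would restrict the time shifts to a fixed set of values so that they remain categorical.

```python
# Hypothetical notes: (pitch, start, duration), times in quarter notes.
notes = [(60, 0.0, 1.0), (64, 1.0, 0.5), (67, 8.0, 2.0)]

# Collect (time, order, event name, pitch).
# At equal times, note-offs (order 0) sort before note-ons (order 1).
raw = []
for pitch, start, dur in notes:
    raw.append((start, 1, "NOTE_ON", pitch))
    raw.append((start + dur, 0, "NOTE_OFF", pitch))
raw.sort()

events = []
now = 0.0
for time, _, name, pitch in raw:
    if time > now:
        # A single event skips forward, no matter how long the gap is.
        events.append(f"TIME_SHIFT_{time - now}")
        now = time
    events.append(f"{name}_{pitch}")

print(events)
print(len(events))
```

The long silence before the last note costs a single event here, whereas a piano roll would need one column per time step for the whole gap.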

<div class="alert alert-info">

**Instruction 3.2.1** Generate a ``Stream`` called ``stream`` such that ``has_acceptable_duration`` returns ``False`` for a ``time_step`` equal to ``1.0`` but returns ``True`` for a ``time_step`` equal to ``0.5``.

</div>

In [None]:
stream = Stream()
...

In [None]:
grader.check("q32")

<!-- BEGIN QUESTION -->

<div class="alert alert-info">

**Instruction 3.2.2**: In the following, we are interested in melodies (no ``Chord``s and only one ``Part``). What might be a good representation for melodies that is similarly regular to the piano roll representation but one-dimensional?

</div>

_Type your answer here, replacing this text._

<!-- END QUESTION -->

To convert a ``Stream`` to a one-dimensional piano roll representation, you can use the ``PianoRollEncoder``:

In [None]:
# let us get two example streams
melody1 = streams_folk_songs[0]
melody2 = streams_folk_songs[1]

# Convert the streams to a one-dimensional piano roll representation
time_step = 0.5
streams = [melody1, melody2]
print(f'a time step of {time_step} is acceptable for melody1: {has_acceptable_duration(melody1, time_step=time_step)}')
print(f'a time step of {time_step} is acceptable for melody2: {has_acceptable_duration(melody2, time_step=time_step)}')

piano_roll_encoder = PianoRollEncoder(time_step=time_step)
enc_streams, invalid_streams = piano_roll_encoder.encode_streams(streams)

print(enc_streams[0])
print(enc_streams[1])

<!-- BEGIN QUESTION -->

<div class="alert alert-info">

**Instruction 3.2.3**: Can you figure out the meaning of the different symbols? What does, for example,

```'70', '_', '_', '65', '62', '_', '65', '_', ...```

mean?

</div>

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<div class="alert alert-info">

**Instruction 3.2.4**: Compare the resulting ``enc_streams`` with their respective ``Stream`` by using ``.show()``.

</div>

Now we have a nice list of symbols in a piano-roll-like format. However, as we already explained in *3_1_one_hot_encoding*, it is preferable to convert everything into numbers ranging from $0$ to $m-1$ so that we can, if desired, generate a *one-hot encoding*. We can do this by using the ``StringToIntEncoder``:

In [None]:
encoder = StringToIntEncoder(enc_streams)

melody1_enc = encoder.encode_sequence(enc_streams[0])
print(enc_streams[0])
print(melody1_enc)

In [None]:
melody2_enc = encoder.encode_sequence(enc_streams[1])
print(enc_streams[1])
print(melody2_enc)

In [None]:
melody1_dec = encoder.decode_sequence(melody1_enc)
print(melody1_dec)
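The mapping between symbols and integers used above can be sketched with two dictionaries. This is a minimal illustration under the assumption that each distinct symbol simply receives one integer; the names ``stoi`` and ``itos`` are made up here, and the actual ``StringToIntEncoder`` may assign the numbers differently.

```python
# Hypothetical token sequences in the style of the encoder output above.
sequences = [["70", "_", "_", "65"], ["62", "_", "65", "_"]]

# Build the vocabulary: every distinct token gets an integer in 0..m-1.
vocab = sorted({tok for seq in sequences for tok in seq})
stoi = {tok: i for i, tok in enumerate(vocab)}
itos = {i: tok for tok, i in stoi.items()}

encoded = [stoi[tok] for tok in sequences[0]]
decoded = [itos[i] for i in encoded]  # decoding restores the original tokens

# One-hot vector of the first token: a single 1 at the token's index.
one_hot = [1 if i == encoded[0] else 0 for i in range(len(vocab))]

print(stoi)
print(encoded)
print(one_hot)
```

Because the integers cover exactly the range from $0$ to $m-1$, each one can serve directly as the index of the single ``1`` in a one-hot vector of length $m$.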

<!-- BEGIN QUESTION -->

<div class="alert alert-info">

**Instruction 3.2.5**: The variable ``streams_folk_songs`` contains 10 ``Stream`` objects, which were generated from 10 different MIDI files.

1. Generate a piano roll plot for at least 2 of those ``Stream``s (use ``plot_df``)
2. Generate a list ``piano_rolls`` containing all encoded ``Stream``s using ``PianoRollEncoder``
3. Generate a list ``enc_piano_rolls`` containing all encoded ``Stream``s using ``StringToIntEncoder``

</div>

In [None]:
# (1)
...

In [None]:
# (1)
...

In [None]:
# (2)
piano_roll_encoder = ...
piano_rolls, _ = ...

In [None]:
# (3)
stoi_encoder = ...
enc_piano_rolls = ...
print(enc_piano_rolls)

In [None]:
grader.check("q35")

<!-- END QUESTION -->



---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()