# Melody Generation

+ **AI in Culture and Arts - Tech Crash Course**
+ **Date:** 06.06.2024
+ **Author:** B. Zönnchen

<a href="https://colab.research.google.com/github/aica-wavelab/aica-assignments/blob/main/A4_melody_generation/3_1_one_hot_encoding.ipynb" target="_parent">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [None]:
#@title install dependencies to play sound
%%capture
print('installing fluidsynth...')
!apt-get install fluidsynth > /dev/null
!cp /usr/share/sounds/sf2/FluidR3_GM.sf2 ./font.sf2
print('done!')

In [None]:
#@title install dependencies to show score in music notation
%%capture
print('installing musescore3...')
!apt-get install musescore3 > /dev/null
print('done!')

In [None]:
#@title clone git repository
%%capture
%rm -rf aica-assignments
!git clone https://github.com/aica-wavelab/aica-assignments.git
%cd aica-assignments/A4_melody_generation

For this notebook we need the following ``Python`` packages and modules. Execute the cell to install them:

In [None]:
%pip install pandas
%pip install numpy
%pip install tensorflow

%pip install otter-grader

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf

In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("3_1_one_hot_encoding.ipynb")

# 3. Representations

## 3.1 The One-Hot Encoding

Let us first think about what kind of data we want to use, starting with the general case: some ``Score`` consisting of all kinds of musical objects such as ``Note``s, ``Chord``s, bars, measurements and so on. A ``Score`` is a ``Stream``, that is, a linear sequence of objects, which we call (musical) **events**.

As we saw, *pitch* and *duration* can be represented by floating point numbers. *Pitch* can be given either as a MIDI note number (which does not have to be an integer) or as a frequency. Since a single input of a neural network can be a floating point number, we could decide to use two inputs:

<img src="figs/float_input_ann.png" alt="" height="250">
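
As a minimal sketch of this two-input representation, each note becomes a pair of floats. The melody below is made up for illustration, and measuring the duration in quarter notes is an assumption, not something fixed by the text above.

```python
import numpy as np

# Hypothetical melody: one (MIDI pitch, duration in quarter notes) pair per note.
melody = [(60, 1.0), (62, 0.5), (64, 0.5), (65, 2.0)]

# Each row is one event; the two columns feed the two input nodes.
X_float = np.array(melody, dtype=np.float32)

print(X_float.shape)  # (4, 2)
```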

However, if we think in terms of dissonance and consonance, MIDI notes that are close in value are not necessarily close with respect to consonance. Relations between notes might have a certain usage in a piece that is completely unrelated to their specific numerical value as a MIDI note or frequency. In the case of a simple melody, this is probably not that relevant. However, what if we want to use events such as a bar or a change in measurement? Or what if we want to work with ``noteOn``, ``noteOff`` events? Or what about ``Rest``s, which have no pitch? How are they related to ``Note``s? These events might be categorical and not numerical.
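
The point about consonance can be illustrated with a small sketch. It assumes twelve-tone equal temperament with A4 (MIDI note ``69``) tuned to ``440`` Hz. The helper ``midi_to_frequency`` is a name introduced here for illustration only.

```python
def midi_to_frequency(midi_note: float) -> float:
    """Convert a (possibly fractional) MIDI note number to a frequency in Hz.

    Assumes equal temperament with A4 = MIDI note 69 = 440 Hz.
    """
    return 440.0 * 2.0 ** ((midi_note - 69.0) / 12.0)


c4, c_sharp4, g4 = 60, 61, 67

# C4 and C#4 differ by 1 in MIDI value but form a dissonant minor second.
ratio_semitone = midi_to_frequency(c_sharp4) / midi_to_frequency(c4)

# C4 and G4 differ by 7 in MIDI value but form a consonant perfect fifth,
# whose frequency ratio is close to the simple ratio 3/2.
ratio_fifth = midi_to_frequency(g4) / midi_to_frequency(c4)

print(f"{ratio_semitone:.4f}")  # 1.0595
print(f"{ratio_fifth:.4f}")     # 1.4983
```

The numerically closest neighbour is the more dissonant one, so distance between MIDI values is a poor proxy for musical relatedness.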

The *one-hot encoding* can be useful if we want to deal with categorical data that cannot be ordered. Especially if we deal with sequences, one-hot encoding can work well because it can help the neural network to learn patterns without assuming any inherent ordering in the input. But how does it work?

Imagine a vector of numbers representing the temperature of a metal block. Each number indicates how hot the block is at one position. One-hot means that the heat is concentrated at one specific point. Thus, the vector consists of a single ``1``, and all other values are ``0``.

Suppose we have the categories:

```
Coat
Pants
Shoes
Jacket
```

Then a one-hot encoding would look like this:

```
   Coat      Pants     Shoes     Jacket
    1          0         0         0
    0          1         0         0
    0          0         1         0
    0          0         0         1
```

And the neural network would look like this:

<img src="figs/one_hot_input_ann.png" alt="" height="250">

The input of the network would be vectors like these:

In [None]:
inputs = [[0, 0, 1, 0], [0, 0, 1, 0], [0, 1, 0, 0]]

This means that only one input node fires.
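
The following sketch shows one possible way to produce such vectors with ``numpy``, using the clothing categories from above. The names ``category_to_index`` and ``sequence`` are introduced here for illustration. The trick is that row ``i`` of the identity matrix is exactly the one-hot vector of category ``i``.

```python
import numpy as np

categories = ['Coat', 'Pants', 'Shoes', 'Jacket']

# Assign each category its position in the list as an index.
category_to_index = {name: i for i, name in enumerate(categories)}

sequence = ['Shoes', 'Shoes', 'Pants']
indices = [category_to_index[name] for name in sequence]

# Pick the matching rows of the 4x4 identity matrix.
one_hot = np.eye(len(categories), dtype=int)[indices]
print(one_hot.tolist())  # [[0, 0, 1, 0], [0, 0, 1, 0], [0, 1, 0, 0]]

# Decoding: the position of the single 1 identifies the category.
decoded = [categories[i] for i in one_hot.argmax(axis=1)]
print(decoded)  # ['Shoes', 'Shoes', 'Pants']
```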

One disadvantage of this approach is that the dimensionality of the input can explode. For ``128`` MIDI notes (without considering duration) we would require ``128``-dimensional vectors.
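
To get a feeling for the size, here is a small sketch that one-hot encodes a short, made-up sequence of MIDI pitches. Only the number ``128`` comes from the text. The melody and the variable names are illustrative.

```python
import numpy as np

NUM_MIDI_NOTES = 128

# Hypothetical melody given as integer MIDI pitches.
midi_melody = np.array([60, 62, 64, 65])

# One row per note, one column per possible MIDI pitch.
X_pitch = np.zeros((len(midi_melody), NUM_MIDI_NOTES), dtype=np.float32)
X_pitch[np.arange(len(midi_melody)), midi_melody] = 1.0

print(X_pitch.shape)  # (4, 128)
```

Four notes already occupy ``4 * 128 = 512`` values, of which only four are non-zero.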

The following code generates a ``DataFrame`` of one-hot encoded colors, where each row represents one color.

In [None]:
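# Sample data: one categorical column with repeated values
data = {'Color': ['Red', 'Blue', 'Green', 'Yellow', 'Red', 'Red']}
df = pd.DataFrame(data)

# One-hot encoding: one column per distinct color (sorted alphabetically).
# dtype=int yields 0/1 entries; recent pandas versions default to True/False.
one_hot_encoded_df = pd.get_dummies(df, columns=['Color'], dtype=int)

print(one_hot_encoded_df)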

In ``TensorFlow``, ``tf.one_hot(indices, depth=num_classes)`` does the trick, but here we have to provide a ``numpy`` array of natural numbers ranging from ``0`` to ``depth-1``. This basically means that we have to assign consecutive natural numbers starting from ``0`` to our categories beforehand.

In [None]:
X = np.array([0, 0, 1, 3, 1, 6])

X_one_hot = tf.one_hot(X, depth=7).numpy().astype('float32')
X_one_hot
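
One way to obtain such consecutive numbers is sketched below with ``np.unique``. The event labels are made up for illustration and include a rest, which has no pitch. The sketch only computes the integer ids. Passing them on to ``tf.one_hot`` is indicated in a comment.

```python
import numpy as np

# Hypothetical sequence of categorical events (note names and rests).
events = np.array(['C4', 'rest', 'E4', 'C4', 'rest'])

# vocabulary: sorted distinct events; ids: index of each event in vocabulary.
vocabulary, ids = np.unique(events, return_inverse=True)

print(vocabulary.tolist())  # ['C4', 'E4', 'rest']
print(ids.tolist())         # [0, 2, 1, 0, 2]

# The ids range from 0 to len(vocabulary) - 1, so they could be passed on as
# tf.one_hot(ids, depth=len(vocabulary)).
```

Keeping ``vocabulary`` around allows us to translate indices back into events later via ``vocabulary[ids]``.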

<!-- BEGIN QUESTION -->

<div class="alert alert-info">

**Instruction 3.1.1** Explain what a *one-hot* encoding of the following sequence of **words** might look like.


``I`` ``like`` ``to`` ``eat`` ``and`` ``I`` ``like`` ``to`` ``play``

How many dimensions are required if you only want to encode this sequence?

</div>

_Type your answer here, replacing this text._

<!-- END QUESTION -->

