***
*Project:* Expressive Piano Generation

*Author:* Jingwei Liu (Music department, UC San Diego)
***

# <span style="background-color:darkorange; color:white; padding:2px 6px">Software & ReadMe</span> 


# Expressive Piano Performance Generation (MIDI Format)

## 1. Audio vs. MIDI

- The music scene has shifted towards audio-based composition and production, and symbolic music generation has been marginalized as a result.

- Symbolic music is frequently criticized as stiff and inflexible when it comes to generating listening-based music.

- We argue that, **given the same expressivity on playback, MIDI, as a much more concise representation, has an advantage over raw audio in generative models**.

## 2. Listening-based Data Processing

- **Abandoning the fixed grid.** Use time-shift events and durations measured in milliseconds to generate expressive timing.

- A homogeneous treatment of monophony and polyphony. We claim that no two notes are ever truly simultaneous: whenever a human performer plays two notes, there is a time discrepancy between their onsets, however small. Since there are no simultaneous events, **we can always place the notes in sequential order by their onset times.**

- Not only the notes matter. **The control events in MIDI may play a crucial role in musical expressivity** (e.g., the sustain pedal in piano generation). Listen to "Original.MID" and "no_sustain.mid" for a comparison.

- Mel quantization of auditory features. As in the Mel spectrogram, instead of dividing the feature ranges equally, we divide them into uneven bins that better reflect perception. We use **Weber's law** of just noticeable differences as the theoretical basis for the divisions: the noticeable difference is proportional to the current value.

<img src="division_1.png" style="width:800px">
<caption><center> Figure 1. The categorical distributions for given input features. The divisions obey Weber's law where the perceptual changes are proportional to the values. </center></caption>
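The actual divisions used by the notebook (`ts_division`, `dur_division`, `vel_division`) come from `ut.default_terms()` and are not reproduced here. As a sketch only, bins whose width grows in proportion to their lower edge (Weber's law, $\Delta I = k I$) can be built as below; the ratio `k = 0.1`, the minimum bin width and the 2000 ms range are illustrative assumptions, and `weber_edges` / `to_category` are hypothetical names.

```python
import numpy as np

def weber_edges(lo: float, hi: float, k: float, min_width: float = 1.0) -> np.ndarray:
    """Bin edges whose widths grow in proportion to the value (Weber's law).

    Bin j spans [e_j, e_{j+1}) with width max(k * e_j, min_width); the minimum
    width keeps bins near zero from collapsing to nothing.
    """
    edges = [float(lo)]
    while edges[-1] < hi:
        edges.append(edges[-1] + max(k * edges[-1], min_width))
    return np.array(edges)

def to_category(values, edges):
    """Index of the bin containing each value; out-of-range values go to the end bins."""
    idx = np.searchsorted(edges, values, side="right") - 1
    return np.clip(idx, 0, len(edges) - 2)

edges = weber_edges(0, 2000, k=0.1)   # e.g. note durations in ms
cats = to_category(np.array([5, 50, 500, 1500]), edges)
```

Short values get fine, 1 ms bins, while long values share wide bins, which mirrors the idea that a 10 ms change matters for a 50 ms note but not for a 1500 ms note.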

## 3. Multi-argument Sequential Model

<img src="LSTM-ATT_2.jpg" style="width:800px">
<caption><center> Figure 2. LSTM-Attention cell, a recurrent neural network designed for a multi-input, multi-output generative system. </center></caption>
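Figure 2 shows the actual cell. The per-feature sizes defined in the notebook (`d_xn = 88`, one category per piano key from 21 to 108, plus `d_xd`, `d_xv`, `d_xt`) and the separate output weights (`W_yan_tld`, `W_yad_tld`, `W_yav_tld`, `W_yat_tld`) suggest that a shared recurrent state feeds one categorical output per feature. The sketch below illustrates only that generic multi-head idea with random weights; treating the `d_x*` values as output category counts is an assumption, and this is not the author's LSTM-Attention cell.

```python
import numpy as np

rng = np.random.default_rng(0)
d_c = 350  # size of the shared recurrent state (the notebook's d_c)

# Assumed number of categories per feature (the notebook's d_xn, d_xd, d_xv, d_xt)
sizes = {"note": 88, "duration": 120, "velocity": 47, "time_shift": 105}

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical, randomly initialised heads: one (weight, bias) pair per feature
heads = {name: (rng.normal(0.0, 0.01, size=(k, d_c)), np.zeros(k))
         for name, k in sizes.items()}

def predict(c):
    """Map one shared state c of shape (d_c,) to a categorical distribution per feature."""
    return {name: softmax(W @ c + b) for name, (W, b) in heads.items()}

state = rng.normal(size=d_c)
probs = predict(state)
# Sample one category index per feature from its own distribution
sample = {name: int(rng.choice(len(p), p=p)) for name, p in probs.items()}
```

Keeping one state but separate heads lets the four features be predicted jointly at each step while each keeps its own vocabulary size.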

### Instructions
Please install [Jupyter notebook](https://jupyter.org/install) and run the following cells in order. Generation works in the usual autoregressive way, predicting one note at a time from what came before:

User inputs: 
- Initial note or sequence, numpy array of shape (5, num_notes)
- Number of notes for generation, an integer

The model takes the initial array and generates a continuation of the given length. The final output is a MIDI file ready to be played in any DAW.

**[Important]**: The MIDI file contains not only note events but also control events that specify the sustain pedal status. Please play the file in `Ableton Live` or another quality DAW with a "grand/classical piano" instrument, since the dataset was collected from performances on Yamaha grand pianos. Do not use notation software such as `MuseScore`, which quantizes the input and discards control information.

In [1]:
```python
import numpy as np
import pandas as pd
import py_midicsv as pm
import torch
import utils as ut
```

If any import in this cell fails, make sure the package is installed in the Python environment that runs the notebook. `numpy`, `pandas` and `torch` are widely used Python packages; `py_midicsv` converts between the MIDI and CSV formats and can be [installed from PyPI](https://pypi.org/project/py-midicsv/); `utils` is a Python file of developer-defined functions, included in the same folder as this notebook.
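To see at a glance which of these modules the current environment cannot find, a small check with the standard library's `importlib.util.find_spec` can be run before the import cell. This is a convenience sketch; `missing_modules` is a hypothetical helper, and `utils` is only found when the notebook's folder is on the Python path (normally the case when the notebook runs from that folder).

```python
import importlib.util

# Module names as imported in the notebook (py_midicsv is published on PyPI as "py-midicsv")
REQUIRED = ["numpy", "pandas", "torch", "py_midicsv", "utils"]

def missing_modules(names):
    """Return the modules that cannot be located from the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED)
if missing:
    print("Not importable from this environment:", ", ".join(missing))
else:
    print("All required modules found.")
```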

In [2]:
```python
# Load default values (including the bin divisions ts_division, dur_division, vel_division)
note_tp, p_note, up, down, e_time, time_ratio, ts_division, dur_division, vel_division = ut.default_terms()
# Load the example piece and convert it into the model's index representation
example = np.load('generate_condition.npy')
Example_index = ut.process_input(example, note_tp, p_note, up, down, e_time, time_ratio, ts_division, dur_division, vel_division)

# Model dimensions; these must match the shapes of the trained parameters below
d_an = 100
d_ad = 140
d_av = 50
d_at = 110
d_xn = 88
d_xd = 120
d_xv = 47
d_xt = 105
d_c = 350

# Load the trained parameters and move them to the CPU
parameters = torch.load('parameters.pt', weights_only=True)
for i in range(len(parameters)):
    parameters[i] = parameters[i].detach().to("cpu")
[W_fa, W_fx, b_f, W_ua, W_ux, b_u, W_ca, W_cx, b_c, W_n, b_n, W_d, b_d, W_v, b_v, W_t, b_t,
 K_n, A_n, K_d, A_d, K_v, A_v, K_t, A_t, W_yan_tld, b_yn, W_yad_tld, b_yd,
 W_yav_tld, b_yv, W_yat_tld, b_yt, W_pedal, b_pedal] = parameters
```

### Initial Sequence

Generation can start from a single note or from a melody of arbitrary length (monophonic or polyphonic). All it needs are the 5 values associated with each note. The initial input should be given as **a numpy array of shape (5, num_notes)**, where each column is one note and the columns are ordered by non-decreasing onset time. The 5 defining features are:

- [row 0] Note value (n): a MIDI note number in range $[21,108]$, e.g. 56
- [row 1] Duration_ms (d): duration of the note in milliseconds, e.g. 35
- [row 2] Velocity (v): an integer in range $[0,127]$, the standard MIDI velocity, e.g. 100
- [row 3] Time shift (t): the onset difference in milliseconds between this note and the previous one; $t = 0$ gives perfectly simultaneous notes, e.g. 0
- [row 4] Sustain pedal (p): the status of the sustain pedal, with binary value on/off (1/0), e.g. 1

A note at position $i$ with these example values looks like

$$
\mathbf{I}[:,i] = 
\begin{pmatrix}
56 \\
35 \\
100 \\
0 \\
1
\end{pmatrix}
$$

The input can be a single note or a conditioning melody (a sequence of notes). For convenience, you can generate a random note with `ut.initial_notes("random note", 0, 0, 0)`, or take an initial sequence from the example piece with `ut.initial_notes("melody", n, m, example)`, which cuts the sequence from note $n$ to note $m$.
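Instead of using the helpers, an input array can also be assembled by hand from notes with absolute onset times. The sketch below sorts the notes by onset, turns onsets into time shifts (the first note gets 0, matching how the MIDI export later rebuilds onsets with a cumulative sum) and checks the ranges listed above. `build_input` and its tuple layout `(onset_ms, pitch, duration_ms, velocity, pedal)` are hypothetical conventions, not part of `utils`; presumably the result can be used wherever `Input` is, assuming `ut.process_input` only needs the layout described above.

```python
import numpy as np

def build_input(notes):
    """Build a (5, num_notes) input array from notes given with absolute onsets.

    Each note is a tuple (onset_ms, pitch, duration_ms, velocity, pedal);
    this tuple layout is a hypothetical convention for the sketch.
    """
    ordered = sorted(notes, key=lambda note: note[0])   # non-decreasing onsets
    onsets = np.array([note[0] for note in ordered])
    # Row 3: onset difference from the previous note; the first note gets 0
    shifts = np.diff(onsets, prepend=onsets[0])
    arr = np.array([[note[1] for note in ordered],      # row 0: MIDI note number
                    [note[2] for note in ordered],      # row 1: duration (ms)
                    [note[3] for note in ordered],      # row 2: velocity
                    shifts,                             # row 3: time shift (ms)
                    [note[4] for note in ordered]],     # row 4: sustain pedal (0/1)
                   dtype=int)
    if not np.all((arr[0] >= 21) & (arr[0] <= 108)):
        raise ValueError("note numbers must lie in [21, 108]")
    if not np.all((arr[2] >= 0) & (arr[2] <= 127)):
        raise ValueError("velocities must lie in [0, 127]")
    if not np.all(np.isin(arr[4], [0, 1])):
        raise ValueError("pedal values must be 0 or 1")
    return arr

# Two notes 12 ms apart (a slightly rolled chord), then a third note 688 ms later
my_input = build_input([(12, 60, 400, 80, 1), (0, 48, 500, 90, 1), (700, 64, 300, 70, 0)])
```

Note how the "chord" becomes two sequential notes with a 12 ms time shift, in line with the claim that no two performed notes are truly simultaneous.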

In [35]:
```python
input_type = "random note"  # choose from "random note" and "melody"
n = 0                       # start position for melody
m = 10                      # end position for melody
```

In [36]:
```python
Input = ut.initial_notes(input_type, n, m, example)
```

In [37]:
```python
Input
```

```
array([[44],
       [62],
       [61],
       [ 0],
       [ 1]])
```

In [38]:
```python
# Set the initial memory type for a single note: c = 0 or c = some random value
init_type = "random_init"   # choose from "zero_init" and "random_init"
```

In [39]:
```python
Input_index = ut.process_input(Input, note_tp, p_note, up, down, e_time, time_ratio, ts_division, dur_division, vel_division)
# Initial memory c0, using the input type and initialisation type chosen above
c0 = ut.prev_pass(d_c, d_xn, d_xd, d_xv, d_xt, Example_index, input_type, init_type, m,
                  W_fa, W_fx, b_f, W_ua, W_ux, b_u, W_ca, W_cx, b_c, W_n, b_n, W_d, b_d, W_v, b_v, W_t, b_t)
```

### Generate New Sequence

In [25]:
```python
temperature = 1
```

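This parameter controls the randomness of the generation:
- as temperature $\to \infty$, the sampling distribution approaches a uniform distribution;
- as temperature $\to 0$, it approaches a Dirac delta distribution (always picking the most likely category).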
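How `ut.generate_step` applies `temperature` internally is not shown in this notebook. A common way to obtain the two limits above is to divide the logits by the temperature before the softmax; the sketch below shows that standard technique, with hypothetical function names, and is not necessarily what `utils` does.

```python
import numpy as np

def tempered_probs(logits, temperature):
    """softmax(logits / temperature), computed in a numerically stable way."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                     # shift so the largest exponent is 0
    p = np.exp(z)
    return p / p.sum()

def sample_with_temperature(logits, temperature, rng=None):
    """Draw one category index; temperature <= 0 is treated as the greedy limit."""
    if temperature <= 0:
        return int(np.argmax(logits))
    rng = np.random.default_rng() if rng is None else rng
    p = tempered_probs(logits, temperature)
    return int(rng.choice(len(p), p=p))
```

With logits `[1, 2, 3]`, a temperature of `1e-3` puts essentially all probability on index 2, while a temperature of `1e6` gives roughly 1/3 to each index.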

In [26]:
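```python
# Add an upper edge for the last bin of each division (read as division[k + 1] below)
dur_division = np.append(dur_division, 40395)
vel_division = np.append(vel_division, 110)
ts_division = np.append(ts_division, 35000)
```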
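The generation loop turns each predicted bin index `k` back into a concrete value by drawing an integer uniformly from `[division[k], division[k + 1])`, which is what `np.random.choice(np.arange(lo, hi))` does. Without an appended upper edge, the last bin would have no `division[k + 1]`. A minimal standalone version of that decoding step, with a hypothetical name and toy bin edges:

```python
import numpy as np

def decode_bin(index, edges, rng=None):
    """Pick an integer uniformly from bin `index`, i.e. from [edges[index], edges[index + 1])."""
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = int(edges[index]), int(edges[index + 1])
    if hi <= lo:
        return lo                       # empty bin: fall back to its lower edge
    return int(rng.integers(lo, hi))    # upper bound is exclusive

edges = np.array([0, 10, 20, 40])       # toy duration bins in ms (illustrative only)
edges = np.append(edges, 80)            # extra upper edge closes the last bin
value = decode_bin(3, edges)            # some integer in [40, 80)
```

The `hi <= lo` guard is an addition of this sketch: `np.random.choice` on an empty `np.arange` would raise an error instead.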

In [27]:
```python
num = 1000  # number of notes to generate
```

In [28]:
```python
# Generate the sequence one note at a time
Generation = np.zeros((5, num), dtype=int)
# Start from the last note of the initial input
[x_n, x_d, x_v, x_t, pedal] = Input_index[:, -1]
transit = 0            # time (ms) accumulated since the pedal was last sampled
vel = Input[2, -1]     # current absolute velocity
c_prev = c0
for i in range(num):
    # One model step: new memory c and the next index for each feature
    c, x_n_nxt, x_d_nxt, x_v_nxt, x_t_nxt = ut.generate_step(d_xn, d_xd, d_xv, d_xt, x_n, x_d, x_v, x_t, pedal, c_prev,
                 W_fa, W_fx, b_f, W_ua, W_ux, b_u, W_ca, W_cx, b_c, W_n, b_n, W_d, b_d, W_v, b_v, W_t, b_t,
                 K_n, A_n, K_d, A_d, K_v, A_v, K_t, A_t,
                 W_yan_tld, b_yn, W_yad_tld, b_yd, W_yav_tld, b_yv, W_yat_tld, b_yt, temperature)

    c_prev = c
    [x_n, x_d, x_v, x_t] = [x_n_nxt, x_d_nxt, x_v_nxt, x_t_nxt]

    # Decode the indices into concrete values
    note = x_n_nxt + 21    # index 0 corresponds to MIDI note 21 (A0)
    dur = np.random.choice(np.arange(dur_division[x_d_nxt], dur_division[x_d_nxt + 1]))
    # Velocity is generated as a change relative to the previous note, kept within [20, 120]
    vel_change = np.random.choice(np.arange(vel_division[x_v_nxt], vel_division[x_v_nxt + 1]))
    vel = np.clip(vel + vel_change, 20, 120)
    ts = np.random.choice(np.arange(ts_division[x_t_nxt], ts_division[x_t_nxt + 1]))
    # Re-sample the sustain pedal once more than 20 ms have accumulated
    transit = transit + ts
    if transit > 20:
        y_p = torch.sigmoid(torch.matmul(W_pedal, c) + b_pedal)   # probability of pedal on
        pedal = np.random.choice(2, p=[1 - y_p.item(), y_p.item()])
        transit = 0
    Generation[:, i] = np.array([note, dur, vel, ts, pedal])
```

In [29]:
```python
Generation
```

```
array([[ 73,  69,  64, ...,  44,  52,  56],
       [ 67,  55,  68, ...,  31,  48,  55],
       [ 73,  20,  75, ..., 120, 112, 110],
       [ 60, 126,  35, ...,   8,   6,   4],
       [  1,   1,   1, ...,   1,   1,   1]])
```

In [30]:
```python
Output = np.concatenate((Input, Generation), axis=1)
Output
```

```
array([[ 80,  68,  56, ...,  44,  52,  56],
       [170, 114,  53, ...,  31,  48,  55],
       [ 91,  87,  80, ..., 120, 112, 110],
       [  0,   4,  12, ...,   8,   6,   4],
       [  1,   1,   1, ...,   1,   1,   1]])
```

In [31]:
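```python
Output.shape[1]
```

```
1100
```

That is, the 1000 generated notes plus a 100-note initial input. The printed outputs in this part of the notebook therefore come from a run that started from a 100-note melody, not from the single random note shown above.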

### Write to MIDI file

In [32]:
```python
# Absolute onset time (ms) of every note: running sum of the time shifts
time = np.cumsum(Output[3, :])
rows = []
pedal = 0
for i in range(Output.shape[1]):
    note, dur, vel = Output[0, i], Output[1, i], Output[2, i]
    rows.append({'Time': time[i], 'Type': 'Note_on_c', 'Note': note, 'Velocity': vel})
    rows.append({'Time': time[i] + dur, 'Type': 'Note_off_c', 'Note': note, 'Velocity': 0})
    # Emit a sustain-pedal event only when the pedal status changes.
    # For Control_c rows, 'Note' holds the controller number (64 = sustain)
    # and 'Velocity' holds the controller value (0 = off, 127 = on).
    if Output[4, i] != pedal:
        pedal = Output[4, i]
        if i == 0 or time[i] <= time[i - 1]:
            # First note or zero time shift: no gap to place the event in
            pedal_time = time[i]
        else:
            # Place the change at a random time between the previous and current onsets
            pedal_time = np.random.choice(np.arange(time[i - 1], time[i]))
        rows.append({'Time': pedal_time, 'Type': 'Control_c', 'Note': 64, 'Velocity': pedal * 127})
MIDI_format = pd.DataFrame(rows, columns=['Time', 'Type', 'Note', 'Velocity'])
# 'mergesort' is stable, so events with equal times keep their insertion order
MIDI_format = MIDI_format.sort_values('Time', kind='mergesort').reset_index(drop=True)
MIDI_format
```

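|      | Time   | Type       | Note | Velocity |
|------|--------|------------|------|----------|
| 0    | 0      | Note_on_c  | 80   | 91       |
| 1    | 0      | Control_c  | 64   | 127      |
| 2    | 4      | Note_on_c  | 68   | 87       |
| 3    | 16     | Note_on_c  | 56   | 80       |
| 4    | 25     | Note_on_c  | 44   | 78       |
| ...  | ...    | ...        | ...  | ...      |
| 2212 | 114224 | Note_off_c | 44   | 0        |
| 2213 | 114247 | Note_off_c | 52   | 0        |
| 2214 | 114258 | Note_off_c | 56   | 0        |
| 2215 | 114297 | Note_off_c | 76   | 0        |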


In [33]:
```python
# Write the events in the CSV format read by py_midicsv
with open("midi.csv", "w") as generated_csv:
    # Header: format 0, 1 track, 480 ticks per quarter note
    generated_csv.write("0,0,Header,0,1,480\n")
    generated_csv.write("1,0,Start_track\n")
    # Tempo: 480000 µs per quarter note
    generated_csv.write("1,0,Tempo,480000\n")
    # Channel 0, program 0 (Acoustic Grand Piano in General MIDI)
    generated_csv.write("1,0,Program_c,0,0\n")

    for row in MIDI_format.itertuples(index=False):
        generated_csv.write(f"1,{row.Time},{row.Type},0,{row.Note},{row.Velocity}\n")

    # End the track 480 ticks after the last event
    end_time = MIDI_format['Time'].iloc[-1] + 480
    generated_csv.write(f"1,{end_time},End_track\n")
    generated_csv.write("0,0,End_of_file\n")
```
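The header sets 480 ticks per quarter note and the `Tempo` event sets 480000 µs per quarter note, so one tick lasts 480000 / 480 = 1000 µs = 1 ms. This is why the millisecond times computed above can be written directly as tick times. If you change either value, the times have to be converted; the helpers below (hypothetical names) sketch that conversion.

```python
def tick_length_ms(tempo_us_per_quarter: int, ticks_per_quarter: int) -> float:
    """Duration of one MIDI tick in milliseconds."""
    return tempo_us_per_quarter / ticks_per_quarter / 1000.0

def ms_to_ticks(ms: float, tempo_us_per_quarter: int = 480000, ticks_per_quarter: int = 480) -> int:
    """Convert a time in milliseconds to the nearest whole number of ticks."""
    return round(ms / tick_length_ms(tempo_us_per_quarter, ticks_per_quarter))

# With the notebook's settings, 1 tick = 1 ms, so times pass through unchanged
last_event_ticks = ms_to_ticks(114297)
```

For example, at the common default tempo of 500000 µs per quarter note with the same 480-tick resolution, 1000 ms becomes 960 ticks.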

In [34]:
```python
# Parse the CSV written by the previous cell back into a MIDI object
midi_object = pm.csv_to_midi("midi.csv")

# Save the MIDI file in the same folder as this notebook
with open("generated_lstm.mid", "wb") as output_file:
    midi_writer = pm.FileWriter(output_file)
    midi_writer.write(midi_object)
```

You can now find the generated MIDI file in the same folder as this notebook. Import it into a DAW and play it with a grand piano instrument!