# TBD

In this lab, we introduce PyTorch basics and show how to use two deep music generation models designed by Music X Lab. Specifically:
* **EC$^2$-VAE** for disentangling pitch contour and rhythm in monophonic melodies.

    *(Ruihan Yang et al., Deep Music Analogy Via Latent Representation Disentanglement)*
    

For each model, we cover:

1. The data representation (input and output) of the model.
2. The model itself, in a top-down order: we start from the example code for style transfer, then look into the individual models and modules, treating the lower-level details as black boxes.


In [None]:
import numpy as np
import torch
import pretty_midi as pm
import matplotlib.pyplot as plt
import os

# create the output folder for generated MIDI files (no error if it already exists)
os.makedirs('./demo', exist_ok=True)

## Intro to PyTorch
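
As a quick warm-up, the sketch below walks through the tensor operations this lab relies on: `torch.from_numpy`, casting to float32, moving to a device, adding a batch dimension with `unsqueeze`, and converting back with `.cpu().numpy()`. The (32, 130) shape mirrors the melody matrices used later; the variable names are illustrative only.

```python
import numpy as np
import torch

# A NumPy array defaults to float64; PyTorch models usually expect float32.
arr = np.zeros((32, 130))
t = torch.from_numpy(arr)          # shares memory with `arr`, dtype torch.float64
arr[0, 0] = 1.0
print(t[0, 0].item())              # 1.0, because from_numpy does not copy
print(t.dtype)                     # torch.float64

t32 = t.float()                    # a float32 copy
print(t32.dtype)                   # torch.float32

# Pick a device: CUDA if available, otherwise CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
t32 = t32.to(device)

# Add a leading batch dimension: (32, 130) -> (1, 32, 130).
batched = t32.unsqueeze(0)
print(batched.shape)               # torch.Size([1, 32, 130])

# Remove it again and bring the data back to NumPy (NumPy arrays live on the CPU).
back = batched.squeeze(0).cpu().numpy()
print(back.shape, back.dtype)      # (32, 130) float32
```

Because `torch.from_numpy` shares memory with its source, the edit to `arr` shows up in `t`; `.float()` makes a copy, so later edits to `arr` would not affect `t32`.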

### Question 1: Read through the entire project and understand as much of the code as you can, in a top-down manner. Relate the code to the model diagrams.

## Part One: EC$^2$-VAE

Let's first prepare the trained model: we initialize the model structure and then load its trained parameters.
* The model structure is defined in the class `ec2vae.model.EC2VAE`.
* The model parameters are saved in a `.pt` file.

In [None]:
from ec2vae.model import EC2VAE

In [None]:
# initialize the model
ec2vae_model = EC2VAE.init_model()

# load model parameter
ec2vae_param_path = './ec2vae/model_param/ec2vae-v1.pt'
ec2vae_model.load_model(ec2vae_param_path)

Next, let's prepare some data and manipulate its latent codes. We use an array of length 32 to represent a 2-bar melody, where each time step corresponds to a 16th note: values 0-127 are MIDI pitches, 128 denotes sustain (the previous note continues), and 129 denotes rest.

In [None]:
# x1: "From the new world" melody
x1 = np.array([64, 128, 128, 67, 67, 128, 128, 128, 64, 128, 128, 62, 60, 128, 128, 128,
               62, 128, 128, 64, 67, 128, 128, 64, 62, 128, 128, 128, 129, 129, 129, 129])

# x2: C4, sixteenth notes.
x2 = np.array([60] * 32)
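
The note-array format can also be built from a list of notes. The sketch below uses a hypothetical helper, `notes_to_note_array`, which takes `(pitch, duration)` pairs with durations in 16th-note steps (`None` as the pitch means a rest) and emits an onset token followed by sustain tokens. It is an illustration of the encoding, not part of the `ec2vae` package; the example rebuilds the same "From the new world" melody as `x1`.

```python
import numpy as np

SUSTAIN = 128  # the previous note keeps sounding
REST = 129     # silence

def notes_to_note_array(notes, n_steps=32):
    """Hypothetical helper: build a note array from (pitch, duration) pairs.

    `pitch` is a MIDI number 0-127, or None for a rest; `duration` counts
    16th-note steps.
    """
    steps = []
    for pitch, dur in notes:
        if pitch is None:
            steps += [REST] * dur                     # every rest step is 129
        else:
            steps += [pitch] + [SUSTAIN] * (dur - 1)  # onset, then sustain tokens
    if len(steps) != n_steps:
        raise ValueError(f'expected {n_steps} steps, got {len(steps)}')
    return np.array(steps)

# "From the new world", written as (pitch, duration) pairs.
x1_rebuilt = notes_to_note_array([
    (64, 3), (67, 1), (67, 4), (64, 3), (62, 1), (60, 4),
    (62, 3), (64, 1), (67, 3), (64, 1), (62, 4), (None, 4),
])
print(x1_rebuilt)
```

Likewise, `notes_to_note_array([(60, 1)] * 32)` reproduces `x2`, since every 16th note is a fresh C4 onset with no sustain tokens.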

We need to turn each note array into a sequence of one-hot vectors, i.e., a piano-roll-like matrix of shape (32, 130).

In [None]:
def note_array_to_onehot(note_array):
    pr = np.zeros((len(note_array), 130))
    pr[np.arange(0, len(note_array)), note_array.astype(int)] = 1.
    return pr
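
Going the other way, a one-hot matrix can be turned back into a note array by taking the argmax of each row. The sketch below copies `note_array_to_onehot` so it runs on its own and adds a hypothetical inverse, `onehot_to_note_array`, then checks the round trip on a short made-up melody. The same argmax idea is a common way to turn per-step scores over the 130 tokens into tokens, though that is not a claim about how the EC$^2$-VAE decoder is implemented.

```python
import numpy as np

def note_array_to_onehot(note_array):
    # Same as the cell above: one row per 16th-note step, one column per token.
    pr = np.zeros((len(note_array), 130))
    pr[np.arange(len(note_array)), note_array.astype(int)] = 1.
    return pr

def onehot_to_note_array(pr):
    # Hypothetical inverse: the hot column of each row is the token.
    return pr.argmax(axis=-1)

x = np.array([64, 128, 128, 67, 129, 129] + [60] * 26)
pr = note_array_to_onehot(x)
print(pr.shape)                                     # (32, 130)
print(pr.sum(axis=1).min(), pr.sum(axis=1).max())   # 1.0 1.0: one hot entry per step
print(np.array_equal(onehot_to_note_array(pr), x))  # True
```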

In [None]:
pr1 = note_array_to_onehot(x1)
pr2 = note_array_to_onehot(x2)

In [None]:
plt.imshow(pr1, aspect='auto')
plt.title('Display pr1')
plt.show()

Each melody must then be converted to a PyTorch tensor of dtype float32 and moved to the target device (CUDA or CPU). We also unsqueeze a leading batch dimension, so the shape becomes (1, 32, 130).

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# to pytorch tensor
pr1 = torch.from_numpy(pr1)

# to float32
pr1 = pr1.float()  

# to device (if to cpu, the operation can be omitted.)
pr1 = pr1.to(device)

# unsqueeze the batch dim
pr1 = pr1.unsqueeze(0)


# Convert pr2 in the same way
pr2 = torch.from_numpy(pr2).float().to(device).unsqueeze(0)

In [None]:
print(pr1.size(), pr2.size())
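
The batch dimension presumably exists so that several melodies can be processed at once. The sketch below only demonstrates the tensor shapes with zero-filled stand-ins for `pr1` and `pr2`; whether `ec2vae_model.encoder` accepts a batch larger than 1 is an assumption not confirmed by this notebook.

```python
import torch

# Stand-ins for pr1 / pr2: each is (1, 32, 130) after unsqueeze(0).
a = torch.zeros(1, 32, 130)
b = torch.zeros(1, 32, 130)

# Concatenate along the existing batch dimension...
batch = torch.cat([a, b], dim=0)
print(batch.shape)   # torch.Size([2, 32, 130])

# ...or stack un-batched (32, 130) tensors, which creates the batch dimension.
batch2 = torch.stack([a.squeeze(0), b.squeeze(0)], dim=0)
print(batch2.shape)  # torch.Size([2, 32, 130])

# Split the batch to recover individual items, each keeping a batch dim of 1.
first, second = batch.split(1, dim=0)
print(first.shape)   # torch.Size([1, 32, 130])
```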

Next, define the chords. EC$^2$-VAE uses a 12-dimensional chroma representation for chords. A chord progression is a time series of 32 chroma vectors, one per 16th-note step.

In [None]:
# some useful chords.
amin = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
gmaj = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
fmaj = [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
emin = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
cmaj = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
cmin = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
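
In these chroma vectors, index 0 is C and index 11 is B, so a triad sets three positions at fixed semitone offsets from its root. The sketch below derives the vectors above with a hypothetical helper, `chord_chroma`, which only knows major and minor triads; it is an illustration, not part of the `ec2vae` package.

```python
PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
# Intervals in semitones above the root (triads only in this sketch).
QUALITIES = {'maj': (0, 4, 7), 'min': (0, 3, 7)}

def chord_chroma(root, quality):
    """Hypothetical helper: 12-dim chroma vector for a major or minor triad."""
    r = PITCH_CLASSES.index(root)
    chroma = [0] * 12
    for interval in QUALITIES[quality]:
        chroma[(r + interval) % 12] = 1   # wrap around the octave
    return chroma

print(chord_chroma('A', 'min'))  # [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
print(chord_chroma('G', 'maj'))  # [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
```

Further qualities (e.g. seventh chords) could be added as extra interval tuples in `QUALITIES`.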

In [None]:
# c1: Cmaj - - - | Gmaj - - - ||
c1 = np.array([cmaj] * 16 + [gmaj] * 16)

# c2: Amin - Gmaj - | Fmaj - Emin - ||
c2 = np.array([amin] * 8 + [gmaj] * 8 + [fmaj] * 8 + [emin] * 8)

# no chord
c3 = np.zeros((32, 12))
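
Writing progressions as `[chord] * n` lists is easy to get wrong when the durations do not add up to 2 bars. A hypothetical helper, `expand_progression`, can take `(chroma, n_16ths)` pairs and refuse anything that does not cover exactly 32 steps. This is a convenience sketch, not part of the `ec2vae` package.

```python
import numpy as np

def expand_progression(segments, n_steps=32):
    """Hypothetical helper: repeat each (chroma, n_16ths) pair into an (n_steps, 12) array."""
    rows = []
    for chroma, n in segments:
        rows += [chroma] * n
    if len(rows) != n_steps:
        raise ValueError(f'progression covers {len(rows)} steps, expected {n_steps}')
    return np.array(rows)

amin = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
gmaj = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
fmaj = [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
emin = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]

# Amin - Gmaj - | Fmaj - Emin - ||  (8 sixteenths = half a bar each)
c2_alt = expand_progression([(amin, 8), (gmaj, 8), (fmaj, 8), (emin, 8)])
print(c2_alt.shape)  # (32, 12)

# "No chord" is a single all-zero segment spanning both bars.
c3_alt = expand_progression([([0] * 12, 32)])
```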

In [None]:
c1 = torch.from_numpy(c1).float().to(device).unsqueeze(0)
c2 = torch.from_numpy(c2).float().to(device).unsqueeze(0)
c3 = torch.from_numpy(c3).float().to(device).unsqueeze(0)

Now run the model. Calling the encoder on a melody and its chord returns the pitch latent code $z_p$ and the rhythm latent code $z_r$.
    

In [None]:
# encode melody 1 and chord C-G
zp1, zr1 = ec2vae_model.encoder(pr1, c1)

# encode melody 2 and "no chord"
zp2, zr2 = ec2vae_model.encoder(pr2, c3)

In [None]:
print(zp1.size(), zr1.size(), zp2.size(), zr2.size())

Let's reconstruct `x1` and try two what-if generations. First, we keep `zp1` and its chord `c1` but use the 16th-note rhythm `zr2` from `x2`. Second, we keep `zp1` and `zr1` but decode under the new chord progression `c2`.

In [None]:
pred_recon = ec2vae_model.decoder(zp1, zr1, c1)
pred_new_rhythm = ec2vae_model.decoder(zp1, zr2, c1)
pred_new_chord = ec2vae_model.decoder(zp1, zr1, c2)

Move the outputs back to the CPU and convert them to NumPy arrays.

In [None]:
out_recon = pred_recon.squeeze(0).cpu().numpy()
out_new_rhythm = pred_new_rhythm.squeeze(0).cpu().numpy()
out_new_chord = pred_new_chord.squeeze(0).cpu().numpy()

In [None]:
out_new_rhythm.shape

Now write the generated melodies to MIDI files. The helper `note_array_to_notes` converts a note array into a list of `pretty_midi` Notes.

In [None]:
notes_recon = EC2VAE.note_array_to_notes(out_recon, bpm=120, start=0.)
notes_new_rhythm = EC2VAE.note_array_to_notes(out_new_rhythm, bpm=120, start=0.)
notes_new_chord = EC2VAE.note_array_to_notes(out_new_chord, bpm=120, start=0.)
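
The timing arithmetic behind such a conversion is simple: one step is a 16th note, a quarter of a beat, so at `bpm=120` each step lasts 60 / 120 / 4 = 0.125 s. The sketch below is not the library's `note_array_to_notes`; it is a dependency-free stand-in with a hypothetical name, `note_array_to_events`, that returns `(pitch, onset_sec, offset_sec)` tuples instead of `pretty_midi` Notes. It assumes a sustain token with no sounding note before it is simply ignored.

```python
import numpy as np

SUSTAIN, REST = 128, 129

def note_array_to_events(note_array, bpm=120, start=0.0):
    """Hypothetical sketch: turn a note array into (pitch, onset_sec, offset_sec) tuples."""
    step = 60.0 / bpm / 4          # duration of one 16th note in seconds
    events = []
    pitch, onset = None, None
    for i, token in enumerate(np.asarray(note_array).astype(int)):
        if token == SUSTAIN and pitch is not None:
            continue               # the current note keeps sounding
        if pitch is not None:      # a new onset or a rest ends the current note
            events.append((pitch, start + onset * step, start + i * step))
            pitch = None
        if token < 128:            # MIDI pitch: start a new note
            pitch, onset = int(token), i
    if pitch is not None:          # close a note that lasts until the end
        events.append((pitch, start + onset * step, start + len(note_array) * step))
    return events

print(note_array_to_events([64, 128, 128, 67, 129, 129, 129, 129], bpm=120))
# [(64, 0.0, 0.375), (67, 0.375, 0.5)]
```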

The helper `chord_to_notes` converts a chord sequence into a list of `pretty_midi` Notes.

In [None]:
notes_c1 = EC2VAE.chord_to_notes(c1.squeeze(0).cpu().numpy(), 120, 0)
notes_c2 = EC2VAE.chord_to_notes(c2.squeeze(0).cpu().numpy(), 120, 0)
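
One plausible way to render a chroma sequence as chord notes is to merge runs of identical frames into held chord tones. The sketch below does this with a hypothetical `chord_to_events` that returns `(pitch, onset_sec, offset_sec)` tuples; the voicing octave `base_pitch=48` (C3) is an assumption, and the library's `chord_to_notes` may voice chords differently. All-zero ("no chord") frames produce no notes.

```python
import numpy as np

def chord_to_events(chroma_seq, bpm=120, start=0.0, base_pitch=48):
    """Hypothetical sketch: merge runs of identical chroma frames into held chord tones."""
    step = 60.0 / bpm / 4           # one frame = one 16th note
    chroma_seq = np.asarray(chroma_seq)
    events = []
    i = 0
    while i < len(chroma_seq):
        j = i
        while j + 1 < len(chroma_seq) and np.array_equal(chroma_seq[j + 1], chroma_seq[i]):
            j += 1                  # extend the run of identical frames
        for pc in np.flatnonzero(chroma_seq[i]):  # active pitch classes, ascending
            events.append((base_pitch + int(pc), start + i * step, start + (j + 1) * step))
        i = j + 1
    return events

cmaj = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
gmaj = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
c1 = np.array([cmaj] * 16 + [gmaj] * 16)
# C major (48, 52, 55) held for 2.0 s, then G major (50, 55, 59) until 4.0 s
for event in chord_to_events(c1):
    print(event)
```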

Generate three MIDI files. Note:
1. The original "From the new world" melody should be played with `c1`.
2. The melody transferred to the 16th-note rhythm should also be played with `c1`.
3. The melody transferred to a new chord progression should be played with `c2`.

In [None]:
def generate_midi_with_melody_chord(fn, mel_notes, c_notes):
    """Write a two-track MIDI file: the melody on one track, the chords on the other."""
    midi = pm.PrettyMIDI()
    ins1 = pm.Instrument(0)  # program 0: Acoustic Grand Piano
    ins1.notes = mel_notes
    ins2 = pm.Instrument(0)
    ins2.notes = c_notes
    midi.instruments.append(ins1)
    midi.instruments.append(ins2)
    midi.write(fn)

In [None]:
generate_midi_with_melody_chord('./demo/ec2vae-recon.mid', notes_recon, notes_c1)
generate_midi_with_melody_chord('./demo/ec2vae-new-rhythm.mid', notes_new_rhythm, notes_c1)
generate_midi_with_melody_chord('./demo/ec2vae-new-chord.mid', notes_new_chord, notes_c2)

### Question 2:
1. Write a new melody (possibly with a new chord progression) and try transferring the original melody to the pitch contour of the new melody. Which chord should be used as the condition during encoding, and which during decoding?
2. Keep the same `zp` and `zr` but change the chord, and check how well the chord condition controls the output. (Our model is not expected to perform very well here. Test it yourself!)
3. Consider a longer melody and apply the transfer to it 2 bars at a time.
4. More to explore: sampling from the prior or the posterior. (Hint: to get the posterior distribution, rewrite the output of the encoder function.)
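
For item 4, the sketch below shows the two kinds of sampling with `torch.distributions.Normal`: a standard normal prior, as in a standard VAE, and a diagonal Gaussian posterior sampled with the reparameterization trick. The latent size `z_dim = 128` is an assumption (check `zp1.size()` for the real value), and `mu` / `std` are random stand-ins for what a rewritten encoder would return.

```python
import torch
from torch.distributions import Normal

torch.manual_seed(0)
z_dim = 128  # assumed latent size; check zp1.size() for the real value

# Prior: a standard normal over the latent code.
prior = Normal(torch.zeros(1, z_dim), torch.ones(1, z_dim))
zp_prior = prior.sample()             # shape (1, z_dim)

# Posterior: a diagonal Gaussian whose mean / std would come from the encoder.
# Random stand-ins here; in the lab you would expose them from the encoder.
mu = torch.randn(1, z_dim)
std = torch.rand(1, z_dim) + 0.1      # a standard deviation must be positive
posterior = Normal(mu, std)
zp_post = posterior.rsample()         # reparameterized sample, differentiable w.r.t. mu, std

# The same reparameterization written out by hand: z = mu + std * eps, eps ~ N(0, I).
eps = torch.randn_like(std)
zp_manual = mu + std * eps
```

A sample like `zp_prior` could then replace `zp1` in a decoder call such as `ec2vae_model.decoder(zp_prior, zr1, c1)`, provided its size matches the model's pitch latent.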