In [5]:
!pip install torch pretty_midi matplotlib



In [10]:
# Cell 1: Imports
import os
import torch
import torch.nn as nn
import torch.optim as optim
import random
import numpy as np
from torch.utils.data import Dataset, DataLoader
import pretty_midi
import matplotlib.pyplot as plt


# Assignment 2: Music Generation

## About

### Task 1
We use an LSTM to predict the next note in a melody.
- As an extension, we could also predict rhythm (note durations).
- We may also render our generated samples as note plots so peers can inspect them.


## Task 1: Symbolic Generation


In [12]:
# Load the dataset
import pickle

with open("JSB-Chorales-dataset-master/jsb-chorales-quarter.pkl", "rb") as f:
    data = pickle.load(f, encoding="latin1")

chorales = data["train"]  # You can also access 'valid' and 'test'

print(f"Loaded {len(chorales)} training chorales.")
print("Sample:", chorales[0][:5])


Loaded 229 training chorales.
Sample: [(np.int64(60), np.int64(72), np.int64(79), np.int64(88)), (np.int64(72), np.int64(79), np.int64(88)), (np.int64(67), np.int64(70), np.int64(76), np.int64(84)), (np.int64(69), np.int64(77), np.int64(86)), (np.int64(67), np.int64(70), np.int64(79), np.int64(88))]



### Dataset Context
The JSB Chorales dataset consists of 382 four-part harmonized chorales by J.S. Bach. It is widely used in symbolic music modeling and has been curated to support machine learning tasks. We use the version hosted in the [czhuang/JSB-Chorales-dataset](https://github.com/czhuang/JSB-Chorales-dataset) repository, which contains quarter-note-quantized sequences of chord events encoded as tuples of MIDI pitches. Of the 382 chorales, 229 are in the training split loaded above.

We selected the **soprano voice** to build a monophonic melody model using an LSTM.

### Preprocessing Steps
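- Extract the highest pitch in each chord (the soprano line). The tuples in the sample output above are sorted low to high, so the first pitch is the lowest voice, not the soprano.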
- Remove silences/rests (`-1`)
- Build vocabulary of MIDI pitches
- Tokenize each melody to integer indices for model input
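
The steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's final preprocessing cell: `toy_chorales`, `extract_melody`, and `build_vocab` are hypothetical names, and the toy data stands in for the `chorales` list loaded from the pickle file. It assumes the soprano is the highest pitch of each chord, and it treats both empty chords and `-1` entries as rests, since the exact rest encoding in the file is an assumption here.

```python
# Toy stand-in for the loaded `chorales` list (hypothetical data).
toy_chorales = [
    [(60, 72, 79, 88), (72, 79, 88), (), (67, 70, 76, 84)],
    [(69, 77, 86), (67, 70, 79, 88)],
]

def extract_melody(chorale):
    """Keep the highest pitch of each chord, skipping rests."""
    melody = []
    for chord in chorale:
        # Drop -1 rest markers if present (assumed encoding).
        pitches = [int(p) for p in chord if p >= 0]
        if pitches:  # an empty chord is treated as silence
            melody.append(max(pitches))
    return melody

def build_vocab(melodies):
    """Map each distinct MIDI pitch to an integer index and back."""
    pitches = sorted({p for m in melodies for p in m})
    pitch_to_idx = {p: i for i, p in enumerate(pitches)}
    idx_to_pitch = {i: p for p, i in pitch_to_idx.items()}
    return pitch_to_idx, idx_to_pitch

melodies = [extract_melody(c) for c in toy_chorales]
pitch_to_idx, idx_to_pitch = build_vocab(melodies)

# Tokenize: replace every pitch with its vocabulary index.
tokenized = [[pitch_to_idx[p] for p in m] for m in melodies]
print(melodies)
print(tokenized)
```

Keeping `idx_to_pitch` alongside `pitch_to_idx` makes it easy to turn generated token indices back into MIDI pitches later.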




In [None]:
## Preprocess Dataset

### Modeling Approach
We formulate this as a next-token prediction task: given a sequence of pitches, the model predicts a probability distribution over the next note. This mirrors standard language modeling in NLP, with pitches in place of words.
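
As a rough illustration of this formulation, the sketch below turns one tokenized melody into (context, target) training pairs with a sliding window. The function name `make_training_pairs` and the window length `seq_len` are assumptions made for this example, not settings fixed by the assignment.

```python
def make_training_pairs(tokens, seq_len):
    """Slide a window over a melody: seq_len input tokens, then the next token as target."""
    pairs = []
    for start in range(len(tokens) - seq_len):
        context = tokens[start:start + seq_len]
        target = tokens[start + seq_len]
        pairs.append((context, target))
    return pairs

example = [2, 2, 0, 1, 2]  # a short tokenized melody (hypothetical)
pairs = make_training_pairs(example, seq_len=3)
for context, target in pairs:
    print(context, "->", target)
```

A melody shorter than or equal to `seq_len` yields no pairs, so very short melodies would need padding or a smaller window.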

#### Model Choice: LSTM
We use an LSTM because:
- It can model temporal dependencies
- It handles variable-length sequences
- It's computationally efficient for our small vocabulary

Alternatives include:
- n-gram models: simpler, but limited context
- Transformers: powerful, but more complex to train and tune

We chose the LSTM as a middle ground between expressive power and implementation complexity.
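
One possible shape for such a model in PyTorch is sketched below: an embedding layer, an LSTM, and a linear layer that produces logits over the pitch vocabulary. The class name `MelodyLSTM` and the sizes (`embed_dim=32`, `hidden_dim=64`, a vocabulary of 20 pitches) are placeholder assumptions, and the random batch only checks that the shapes and the loss line up.

```python
import torch
import torch.nn as nn

class MelodyLSTM(nn.Module):
    """Embedding -> LSTM -> linear head over the pitch vocabulary."""

    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64, num_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        x = self.embed(tokens)            # (batch, seq_len, embed_dim)
        out, state = self.lstm(x, state)  # (batch, seq_len, hidden_dim)
        logits = self.head(out)           # (batch, seq_len, vocab_size)
        return logits, state

torch.manual_seed(0)
vocab_size = 20                                  # placeholder vocabulary size
model = MelodyLSTM(vocab_size)

batch = torch.randint(0, vocab_size, (4, 16))    # 4 random "melodies" of 16 tokens
inputs, targets = batch[:, :-1], batch[:, 1:]    # predict token t+1 from tokens up to t

logits, _ = model(inputs)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(logits.shape, loss.item())
```

Returning the LSTM state from `forward` lets a generation loop feed one note at a time while carrying the hidden state forward.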


In [None]:
## Train model

In [None]:
## Generate Samples


In [None]:
## Show note charts


### Evaluation
We evaluate our model using:
- Cross-entropy loss on held-out validation sequences
- Subjective listening: Does the output follow tonal structure? Does it avoid dissonance? Is it musically coherent?

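We also compare against a trivial baseline: sampling notes uniformly from the vocabulary. For a vocabulary of V pitches this baseline has a cross-entropy of ln V, which gives the LSTM's validation loss a concrete number to beat. The LSTM shows clear improvement in musical structure over this baseline.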
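
To make the comparison concrete, here is a small sketch of how cross-entropy and perplexity could be computed for the model and for the uniform baseline. The vocabulary size and the `model_probs` values are made-up numbers for illustration only, not results from this notebook.

```python
import math

def cross_entropy(prob_of_true_next_note):
    """Average negative log-likelihood (in nats) over held-out predictions."""
    total = sum(math.log(p) for p in prob_of_true_next_note)
    return -total / len(prob_of_true_next_note)

vocab_size = 20                      # hypothetical vocabulary size
uniform_loss = math.log(vocab_size)  # uniform sampling assigns 1/V to every note

# Hypothetical probabilities a model assigned to the correct next note.
model_probs = [0.5, 0.25, 0.4, 0.1]
model_loss = cross_entropy(model_probs)

print(f"uniform baseline: {uniform_loss:.3f} nats, perplexity {math.exp(uniform_loss):.1f}")
print(f"model:            {model_loss:.3f} nats, perplexity {math.exp(model_loss):.1f}")
```

Perplexity is the exponential of the cross-entropy, so the uniform baseline's perplexity equals the vocabulary size.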


In [None]:
## Evaluation + analysis

## Task 2: Melody Harmonization
We can reuse the JSB Chorales dataset for this task: given a melody, we generate a harmony for it. A simple starting point is a chord-based baseline harmonization.
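
One very simple rule-based baseline, offered only as a sketch of what a chord-based harmonization could look like: assume the melody is in C major and assign each note the first of the I, IV, or V triads that contains its pitch class. The `TRIADS` table and the `harmonize` function are hypothetical and are not derived from the dataset.

```python
# Candidate triads as pitch classes (0 = C); assumes the melody is in C major.
TRIADS = {
    "C": (0, 4, 7),   # I
    "F": (5, 9, 0),   # IV
    "G": (7, 11, 2),  # V
}

def harmonize(melody):
    """For each MIDI pitch, pick the first triad that contains its pitch class."""
    chords = []
    for pitch in melody:
        pitch_class = pitch % 12
        match = next((name for name, pcs in TRIADS.items() if pitch_class in pcs), None)
        chords.append(match)  # None when the note lies outside all three triads
    return chords

melody = [60, 62, 64, 65, 67, 69, 71, 72]  # ascending C major scale
chords = harmonize(melody)
print(chords)
```

A learned harmonizer would replace the fixed table with chord choices predicted from the chorales themselves, but a rule like this gives a baseline to compare against.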
