# Bach-Style Music Generation

This notebook loads Bach chorale datasets, preprocesses them, trains a Conv1D+LSTM model, and generates new chorales. It now includes:
- Importing Libraries
- Understanding the Dataset
- Data Preprocessing
- Model Building and Training
- Generating Music with Model
- Insights and Future Improvements

## Importing Libraries

In [36]:
# Import Libraries
import pandas as pd
import os
import numpy as np
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Dense, Embedding, LSTM, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Nadam
from music21 import stream, chord

## Understanding the Dataset

In [37]:
# Load one example file for inspection
df = pd.read_csv('/content/drive/MyDrive/Dataset/train/chorale_000.csv')
df.head()

| | note0 | note1 | note2 | note3 |
|---|---|---|---|---|
| 0 | 74 | 70 | 65 | 58 |
| 1 | 74 | 70 | 65 | 58 |
| 2 | 74 | 70 | 65 | 58 |
| 3 | 74 | 70 | 65 | 58 |
| 4 | 75 | 70 | 58 | 55 |
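
To make the numbers in the preview easier to read, the sketch below converts MIDI note numbers to pitch names. The helper `midi_to_name` and the `NOTE_NAMES` table are hypothetical and not part of the notebook. The sketch assumes the common convention in which MIDI 60 is C4, and it treats 0 as silence, as the rest of the notebook does.

```python
# Hypothetical helper (not part of the notebook): MIDI number -> pitch name, sharps only
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_name(midi_number):
    """Return a pitch name such as 'C4' for MIDI 60; 0 is treated as a rest."""
    if midi_number == 0:
        return "rest"
    octave = midi_number // 12 - 1  # convention in which MIDI 60 is C4
    return f"{NOTE_NAMES[midi_number % 12]}{octave}"

first_row = [74, 70, 65, 58]  # first row of the preview above
names = [midi_to_name(n) for n in first_row]
print(names)  # ['D5', 'A#4', 'F4', 'A#3']
```

In this preview the four columns run from the highest pitch to the lowest, which suggests one column per voice, although the file itself does not label them.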


In [38]:
# Dataset directories
data_base_path = '/content/drive/MyDrive/Dataset'
train_dir_path = os.path.join(data_base_path, 'train')
test_dir_path = os.path.join(data_base_path, 'test')
valid_dir_path = os.path.join(data_base_path, 'valid')

# Collect all CSV files
train_files = sorted([os.path.join(train_dir_path, f) for f in os.listdir(train_dir_path) if f.endswith('.csv')])
test_files = sorted([os.path.join(test_dir_path, f) for f in os.listdir(test_dir_path) if f.endswith('.csv')])
valid_files = sorted([os.path.join(valid_dir_path, f) for f in os.listdir(valid_dir_path) if f.endswith('.csv')])

# Load as lists of lists
train_data = [pd.read_csv(f).values.tolist() for f in train_files]
test_data = [pd.read_csv(f).values.tolist() for f in test_files]
valid_data = [pd.read_csv(f).values.tolist() for f in valid_files]

In [39]:
# An example of what to expect: play one training chorale
chorale = train_data[20]

s = stream.Stream()
for row in chorale:
    # each row is one time step; zeros are skipped
    s.append(chord.Chord([n for n in row if n], quarterLength=1))

s.show('midi')

## Data Preprocessing

In [40]:
min_note, max_note = 36, 81
window_size, window_offset, batch_size = 32, 16, 32

def make_xy(chorales):
    """Convert chorales into sliding windows for model training.

    Args:
        chorales (list): List of chorale sequences.
    Returns:
        X, Y: Flattened input/output sequences for next-note prediction.
    """
    # get segments of 33 chords with 16 chords offset between them
    windows = [c[i:i + window_size + 1] for c in chorales for i in range(0, len(c) - window_size, window_offset)]

    data = np.array(windows, dtype=int)
    # if note is 0, keep it, otherwise rescale notes from 36-81 to 1-46
    data = np.where(data==0, 0, data - min_note + 1)
    # make the range 0-46 in total
    data = np.clip(data, 0, max_note - min_note + 1)

    flat = data.reshape(data.shape[0], -1)

    # X is every note except the last one (131 tokens); Y is the same sequence shifted one note ahead (131 tokens)
    return flat[:, :-1], flat[:, 1:]

X_train, Y_train = make_xy(train_data)
X_test, Y_test = make_xy(test_data)
X_valid, Y_valid = make_xy(valid_data)

In [41]:
X_train.shape  # 3111 windows; each is 33 chords x 4 notes = 132 tokens, minus the last token

(3111, 131)

In [42]:
Y_train.shape  # same but shifted by one

(3111, 131)
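
The 131 in both shapes follows from the windowing rule in `make_xy`. Each window holds 33 chords of 4 notes, which flattens to 132 tokens, and dropping one token at either end leaves 131. The sketch below reproduces that arithmetic on a toy chorale. The 64-chord `toy_chorale` is an assumption made for illustration, not data from the dataset.

```python
import numpy as np

window_size, window_offset = 32, 16

# Toy chorale (an assumption for illustration): 64 identical chords of 4 voices
toy_chorale = [[60, 55, 52, 48]] * 64

# Same slicing rule as make_xy: windows of window_size + 1 chords, 16 chords apart
starts = range(0, len(toy_chorale) - window_size, window_offset)
windows = np.array([toy_chorale[i:i + window_size + 1] for i in starts])

flat = windows.reshape(windows.shape[0], -1)  # one row of 33 * 4 tokens per window
X, Y = flat[:, :-1], flat[:, 1:]              # Y is X shifted ahead by one note

print(windows.shape)     # (2, 33, 4)
print(flat.shape)        # (2, 132)
print(X.shape, Y.shape)  # (2, 131) (2, 131)
```

Because the shift is by one note rather than one chord, the model is trained to predict each voice of a chord from the voices that precede it.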

In [43]:
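# Collect every distinct value (MIDI pitches plus 0 for silence) across all splits
train_notes = {note for chorale in train_data for row in chorale for note in row}
test_notes = {note for chorale in test_data for row in chorale for note in row}
valid_notes = {note for chorale in valid_data for row in chorale for note in row}

# Vocabulary size used by the Embedding layer and the output layer
num_notes = len(set.union(train_notes, test_notes, valid_notes))
num_notes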

47
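
The count of 47 matches the token range that the preprocessing produces: token 0 for silence plus tokens 1 to 46 for the pitches 36 to 81. Counting distinct values only gives the right vocabulary size if every pitch in that range occurs in the data. As a cross-check, the size can also be derived from the range itself. The names `num_pitch_tokens` and `vocab_size` below are illustrative and not from the notebook.

```python
min_note, max_note = 36, 81

# Token 0 is silence; tokens 1..46 are the pitches 36..81 after rescaling
num_pitch_tokens = max_note - min_note + 1
vocab_size = num_pitch_tokens + 1

print(vocab_size)  # 47
```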

## Model Building and Training

In [44]:
model = Sequential()

# Embedding layer: learns a small vector representation for each note token
# Raw integer IDs carry no useful geometry; learned embeddings can place related notes
# (e.g., same pitch class or neighbouring octaves) close together in vector space
# The vocabulary is small, so a low dimensionality like 5 is enough
model.add(keras.Input(shape=(None,)))  # variable-length sequences of note tokens
model.add(Embedding(input_dim=num_notes, output_dim=5))
# 1D convs allow us to extract temporal patterns in parallel (unlike RNN layers)
# 1D convs slide 1D kernels / filters over our feature vector to learn temporal patterns
# Padding causal means we cannot look ahead, so we keep causality
model.add(Conv1D(32, kernel_size=2, padding="causal", activation="relu"))  # here 32 filters of size 2
# Batch norms after each conv keep activations well-scaled and consistent across the whole stack of layers
# Counteracts vanishing / exploding gradients, allows for higher stable learning rates and faster training
model.add(BatchNormalization())
# The dilation rate sets the spacing between the positions a kernel looks at
# Kernel size 2 with dilation rate 1 looks at t and t-1
# Kernel size 2 with dilation rate 2 looks at t and t-2
# Kernel size 2 with dilation rate 16 looks at t and t-16
# Stacking increasing dilation rates covers short, medium and longer history efficiently
# The receptive field grows without adding many model parameters
# Together the five conv layers see 32 consecutive tokens (t back to t-31), i.e. 8 chords
model.add(Conv1D(48, kernel_size=2, padding="causal", activation="relu", dilation_rate=2))
model.add(BatchNormalization())
# Also, since we increase dilation rate, each conv layer sees a wider time span
# More kernels / filters allow us to capture more kinds of patterns
# If we don't increase this, we could end up with a bottleneck here
model.add(Conv1D(64, kernel_size=2, padding="causal", activation="relu", dilation_rate=4))
model.add(BatchNormalization())
model.add(Conv1D(96, kernel_size=2, padding="causal", activation="relu", dilation_rate=8))
model.add(BatchNormalization())
model.add(Conv1D(128, kernel_size=2, padding="causal", activation="relu", dilation_rate=16))
model.add(BatchNormalization())
# Just a bit of regularization here so the model does not rely too much on individual features
model.add(Dropout(0.05))
# The conv layers summarize local and mid-range context (the last 32 tokens) into richer features
# Their dilations 1, 2, 4, 8, 16 combine like binary digits, so every offset from 0 to 31 is reachable
# The LSTM then only has to track the longer-range structure of the music
# An LSTM placed first would have to learn local AND long-range patterns, which is slower and harder to optimize
# Convs first is cheaper and parallelizable
model.add(LSTM(256, return_sequences=True))
# Finally a dense layer with softmax turns the LSTM output into a probability for each possible note
model.add(Dense(num_notes, activation='softmax'))

model.summary()

optimizer = Nadam(1e-3)
model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

history = model.fit(X_train, Y_train, epochs=20, batch_size=batch_size, validation_data=(X_valid, Y_valid))

Epoch 1/20
98/98 ━━━━━━━━━━━━━━━━━━━━ 73s 668ms/step - accuracy: 0.3360 - loss: 2.5894 - val_accuracy: 0.0782 - val_loss: 3.6898
Epoch 2/20
98/98 ━━━━━━━━━━━━━━━━━━━━ 65s 665ms/step - accuracy: 0.7651 - loss: 0.9032 - val_accuracy: 0.1439 - val_loss: 3.3544
Epoch 3/20
98/98 ━━━━━━━━━━━━━━━━━━━━ 63s 649ms/step - accuracy: 0.7960 - loss: 0.7284 - val_accuracy: 0.2535 - val_loss: 3.1563
Epoch 4/20
98/98 ━━━━━━━━━━━━━━━━━━━━ 65s 667ms/step - accuracy: 0.8119 - loss: 0.6475 - val_accuracy: 0.2738 - val_loss: 2.8145
Epoch 5/20
98/98 ━━━━━━━━━━━━━━━━━━━━ 64s 656ms/step - accuracy: 0.8206 - loss: 0.6050 - val_accuracy: 0.4243 - val_loss: 1.9822
Epoch 6/20
98/98 ━━━━━━━━━━━━━━━━━━━━ 87s 703ms/step - accuracy: 0.8332 - loss: 0.5565 - val_accuracy: 0.6580 - val_loss: 1.1393
Epoch 7/20
98/98

## Generating Music with Model

In [45]:
def sample_next_note(probs):
    """Sample next token from probability distribution."""
    probabilities = np.asarray(probs, dtype=float)  # probabilities for each note to be the next

    probs_sum = probabilities.sum()  # get the sum for normalization

    # if the sum is zero, negative or not finite, fall back to the most likely note
    if probs_sum <= 0 or not np.isfinite(probs_sum):
        return int(np.argmax(probabilities))

    probabilities /= probs_sum  # otherwise renormalize so the probabilities sum to exactly 1
    return int(np.random.choice(len(probabilities), p=probabilities))  # sample a note index according to its probability


def generate_chorale(model, seed_chords, length):
    """Generate a new chorale using the trained model.

    Args:
        model: Trained next-note model.
        seed_chords: Starting chords as rows of 4 MIDI notes (0 = silence).
        length (int): Number of chords to generate after the seed.
    Returns:
        Array of shape (len(seed_chords) + length, 4) with MIDI notes, seed included.
    """
    token_sequence = np.array(seed_chords, dtype=int)  # starting chords
    token_sequence = np.where(token_sequence == 0, token_sequence, token_sequence - min_note + 1)  # same 0-46 token mapping as in training
    token_sequence = token_sequence.reshape(1, -1)

    # we generate note by note, not chord by chord, so each chord needs 4 predictions
    for _ in range(length * 4):
        next_token_probabilities = model.predict(token_sequence, verbose=0)[0, -1]  # probabilities for the next note
        next_token = sample_next_note(next_token_probabilities)  # sample one note from those probabilities
        token_sequence = np.concatenate([token_sequence, [[next_token]]], axis=1)

    token_sequence = np.where(token_sequence == 0, token_sequence, token_sequence + min_note - 1)   # map tokens back to MIDI (0 and 36-81)

    return token_sequence.reshape(-1, 4)
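
`generate_chorale` depends on the token mapping being the exact inverse of the one used in training. A minimal sketch of that round trip, with the two directions pulled out into hypothetical helpers (`encode_notes` and `decode_tokens` are illustrative names, not functions from the notebook):

```python
import numpy as np

min_note, max_note = 36, 81

def encode_notes(midi):
    """MIDI numbers -> tokens: 0 stays 0 (silence), 36..81 become 1..46."""
    midi = np.asarray(midi, dtype=int)
    tokens = np.where(midi == 0, 0, midi - min_note + 1)
    return np.clip(tokens, 0, max_note - min_note + 1)

def decode_tokens(tokens):
    """Tokens -> MIDI numbers: 0 stays 0 (silence), 1..46 become 36..81."""
    tokens = np.asarray(tokens, dtype=int)
    return np.where(tokens == 0, 0, tokens + min_note - 1)

chord_midi = [74, 70, 65, 0]
chord_tokens = encode_notes(chord_midi)

print(chord_tokens.tolist())                 # [39, 35, 30, 0]
print(decode_tokens(chord_tokens).tolist())  # [74, 70, 65, 0]
```

Keeping both directions next to each other makes it harder for the offset to drift between training and generation.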

In [46]:
# Initial chords (seed)
seed_chords = test_data[2][:8]

chorale = seed_chords
s = stream.Stream()
for row in chorale:
    s.append(chord.Chord([n for n in row if n], quarterLength=1))
s.show('midi')

In [47]:
# Complete actual chorale (ground truth)
chorale = test_data[2]
s = stream.Stream()
for row in chorale:
    s.append(chord.Chord([n for n in row if n], quarterLength=1))
s.show('midi')

In [48]:
# Generate 56 new chords from the first 8 chords (32 notes) of the test chorale
# Results could become more varied by adding temperature or top-p sampling
seed_chords = test_data[2][:8]
new_chorale = generate_chorale(model, seed_chords, 56)

new_chorale

array([[73, 68, 61, 53],
       [73, 68, 61, 53],
       [73, 68, 61, 53],
       [73, 68, 61, 53],
       [69, 66, 61, 54],
       [69, 66, 61, 54],
       [69, 66, 61, 54],
       [69, 66, 61, 54],
       [68, 64, 61, 61],
       [68, 64, 61, 61],
       [68, 64, 61, 61],
       [68, 64, 59, 57],
       [66, 64, 57, 63],
       [66, 64, 57, 63],
       [66, 64, 59, 63],
       [66, 64, 59, 63],
       [66, 62, 61, 59],
       [66, 62, 61, 59],
       [66, 62, 59, 59],
       [66, 62, 59, 59],
       [66, 61, 57, 57],
       [66, 61, 57, 57],
       [66, 61, 57, 55],
       [66, 61, 57, 55],
       [64, 61, 56, 49],
       [64, 61, 56, 49],
       [64, 61, 56, 49],
       [64, 59, 56, 49],
       [62, 59, 56, 57],
       [62, 59, 56, 52],
       [62, 59, 56, 52],
       [62, 59, 56, 52],
       [62, 59, 54, 52],
       [62, 59, 54, 52],
       [64, 59, 56, 52],
       [64, 59, 56, 52],
       [64, 59, 56, 52],
       [64, 59, 56, 52],
       [71, 64, 56, 55],
       [71, 64, 56, 55],


In [50]:
# listen to generated piece
chorale = new_chorale.tolist()
s = stream.Stream()
for row in chorale:
    s.append(chord.Chord([n for n in row if n], quarterLength=1))
s.show('midi')

In [51]:
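def generate_random_chorale(length, rest_probability=0.2, pitch_low=36, pitch_high=81, seed=None):
    """Return `length` random 4-note chords as a baseline; each note is silent (0) with probability rest_probability."""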
    rng = np.random.default_rng(seed)  # random number generator
    random_pitches = rng.integers(pitch_low, pitch_high + 1, size=(length, 4))  # generate random notes

    # some masking to have both silence and random pitches
    rest_mask = rng.random((length, 4)) < float(rest_probability)
    chorale = np.where(rest_mask, 0, random_pitches).astype(int)

    return chorale

In [52]:
# listen to completely random music to compare the quality to what our model generated
chorale = generate_random_chorale(56).tolist()
s = stream.Stream()
for row in chorale:
    s.append(chord.Chord([n for n in row if n], quarterLength=1))
s.show('midi')

In [49]:
# Saving the model
model.save('bach_music_generation.keras')

## Insights
- Conv1D layers help capture local temporal patterns in chorales.
- The LSTM layer adds the longer-range memory needed for harmonic structure.
- The Embedding layer avoids sparse one-hot inputs and helps the model learn relationships between notes.
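
How much "local" context the Conv1D stack covers can be computed directly. Each causal layer with kernel size k and dilation rate d extends the receptive field by (k - 1) * d positions. The sketch below applies that standard formula to the five layers of the model. The first layer does not set a dilation rate, so the Keras default of 1 is assumed.

```python
# (kernel_size, dilation_rate) for the five Conv1D layers of the model
conv_layers = [(2, 1), (2, 2), (2, 4), (2, 8), (2, 16)]

receptive_field = 1  # a single position before any convolution
for kernel_size, dilation_rate in conv_layers:
    receptive_field += (kernel_size - 1) * dilation_rate
    print(f"dilation {dilation_rate:>2}: receptive field = {receptive_field}")

# Tokens are individual notes, so 4 tokens make one chord
print(receptive_field // 4, "chords")  # 8 chords
```

The stack therefore sees 32 note tokens, or 8 full chords, and anything further back has to be carried by the LSTM state.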

## Future Improvements
- Replace Conv1D+LSTM with Transformer-based architecture.
- Add temperature sampling for more creative outputs.
- Train with more metadata (voice separation, durations, key signatures).
- Export outputs as MIDI files automatically.
- Add a web UI for interactive real-time music generation.
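
As a starting point for the temperature idea, the sketch below rescales a probability vector before sampling. It is one common formulation, not code from this notebook. The function name `sample_with_temperature`, the epsilon of 1e-12 and the example probabilities are all assumptions. Temperatures below 1 sharpen the distribution, temperatures above 1 flatten it, and the temperature must be positive.

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Sample an index after rescaling a probability vector by temperature (> 0)."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=float)
    # Work in log space; the small epsilon avoids log(0)
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()  # numerical stability before exponentiating
    scaled = np.exp(logits)
    scaled /= scaled.sum()  # renormalize so the probabilities sum to 1
    return int(rng.choice(len(scaled), p=scaled))

probs = [0.7, 0.2, 0.1]  # made-up distribution for illustration
rng = np.random.default_rng(0)
cold = [sample_with_temperature(probs, 0.1, rng) for _ in range(200)]
hot = [sample_with_temperature(probs, 5.0, rng) for _ in range(200)]

print(np.bincount(cold, minlength=3))  # almost all mass on index 0
print(np.bincount(hot, minlength=3))   # much closer to uniform
```

A function like this could stand in for `sample_next_note` inside `generate_chorale`, with the temperature exposed as an extra argument.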