<a href="https://colab.research.google.com/github/akhan117/Genre-and-Mood-Conditioned-Music-Generator/blob/main/Genre_and_Mood_Conditioned_Music_Generator_for_Github.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

You can just "Run All" if you want to get straight to generation.


# 0. User Settings

### Don't change these if you're a casual creator!

### Dataset Settings
Check this only if you want to make a new dataset for training a new model. Not required, since a prepared dataset is downloaded from my Hugging Face account.

If you want to do this, Lakh MIDI will be downloaded later.

You can choose the number of samples, and the number of tokens per sample.





In [None]:
make_dataset = False #@param {type:"boolean"}
music_samples = 250 #@param {type:"integer"}
music_tokens = 1000 #@param {type:"integer"}



### Training Settings

Check the first box if you want to train a new model. Not required, since pretrained weights are downloaded from my Hugging Face account. If you're doing this, I suggest using an A100.

If you actually want to use this new model for generation, check the second box.

In [None]:
train = False #@param {type:"boolean"}
use_trained_for_generation = False #@param {type:"boolean"}


# 1. Setup

Install Required Packages.

In [None]:
from IPython.display import clear_output

!pip install mido
!pip install --upgrade datasets fsspec # Install/upgrade datasets and fsspec
!pip install miditok
!pip install miditoolkit
!pip install midi2audio
!apt-get install -y fluidsynth
!pip install gradio

clear_output()

Import Required Packages.

In [None]:
import os
import json
import urllib.request
from pathlib import Path
from google.colab import drive

import numpy as np
from tqdm import tqdm

import miditoolkit
from mido import MidiFile
from miditok import REMI, TokenizerConfig, TokSequence

from datasets import load_dataset
from midi2audio import FluidSynth

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

from IPython.display import Audio, display
import ipywidgets

import gradio as gr
import zipfile
from huggingface_hub import hf_hub_download

Enable GPU Access - You'll need GPU access for this notebook.

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


Download MidiCaps from AMAAI Lab's Hugging Face

Download the default training set, an audio synthesizer, the default model weights and the default tokenizer from my Hugging Face.

In [None]:
# Load MidiCaps
ds = load_dataset("amaai-lab/MidiCaps")

# Default training set
tokens_path = hf_hub_download(
    repo_id="QuirkyTurtle/CreativityProject",
    filename="1000_Tokens_25000_songs.npy",
    repo_type="dataset",
)
hf_tokens = np.load(tokens_path)

# Synthesizer
soundfont_path = hf_hub_download(
    repo_id="QuirkyTurtle/CreativityProject",
    filename="FluidR3_GM.sf2",
    repo_type="dataset",
)
fs = FluidSynth(sound_font=soundfont_path)

# Default model weights (trained on 25,000 songs, 1000 tokens each)
weights_path = hf_hub_download(
    repo_id="QuirkyTurtle/CreativityProject",
    filename="music_gen_3.pth",
    repo_type="dataset",
)
# map_location lets the weights load even when no GPU is available
hf_music_gen = torch.load(weights_path, map_location=device)

# Default Tokenizer
tokenizer_path = hf_hub_download(
    repo_id="QuirkyTurtle/CreativityProject",
    filename="remi.json",
    repo_type="dataset",
)
tokenizer = REMI(params=tokenizer_path)



# 2. Data Preprocessing

This downloads and unzips Lakh MIDI, only if you've indicated you want to make your own dataset.

In [None]:
def download_lakh_midi():
  # I hosted Lakh MIDI on my hugging face, not sure it's okay so will
  # probably delete it soon
  print("Downloading Lakh MIDI......")
  zip_path = hf_hub_download(
    repo_id="QuirkyTurtle/CreativityProject",
    filename="lmd_full.zip",
    repo_type="dataset",
  )

  # Unzip to base directory
  print("Unzipping......")
  with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(".")
  print("Done!")


Helper for listing every genre and mood along with its frequency. This was used to sort the genres and moods in the Gradio UI.



In [None]:
def list_genres_and_moods(ds):
  # Dict to also track counts of each
  all_genres = {}
  all_moods = {}

  # Go through entire dataset, get every genre, mood.
  for entry in tqdm(ds["train"]):
    for g in entry["genre"]:
      if g not in all_genres:
          all_genres[g] = 0
      all_genres[g] += 1

    for m in entry["mood"]:
      if m not in all_moods:
          all_moods[m] = 0
      all_moods[m] += 1

  # create tuples with count, sort by count
  all_genres_list = sorted(all_genres.items(), key=lambda item: item[1], reverse=True)
  all_moods_list = sorted(all_moods.items(), key=lambda item: item[1], reverse=True)

  # Isolate the names without their counts
  just_genres = [x[0] for x in all_genres_list]
  just_moods = [x[0] for x in all_moods_list]

  # Get sorted list
  print(just_genres)
  print(just_moods)

  # Genre/Mood : Count
  for g in all_genres_list:
    print(f"{g[0]}: {g[1]}")

  for m in all_moods_list:
    print(f"{m[0]}: {m[1]}")
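
The same tally can be written more compactly with `collections.Counter`. The sketch below is only an illustration on a few made-up entries: `toy_entries` and `tally` are hypothetical names, with `toy_entries` standing in for `ds["train"]`. It is not the code that produced the sorted lists used in the UI.

```python
from collections import Counter

# Hypothetical stand-in for ds["train"]: each entry has "genre" and "mood" lists.
toy_entries = [
  {"genre": ["electronic", "pop"], "mood": ["happy"]},
  {"genre": ["electronic"], "mood": ["happy", "dark"]},
  {"genre": ["rock"], "mood": ["dark"]},
  {"genre": ["electronic"], "mood": ["happy"]},
]

def tally(entries, key):
  """Count how often each label appears under `key`, most frequent first."""
  counts = Counter()
  for entry in entries:
    counts.update(entry[key])
  return counts.most_common()

genre_counts = tally(toy_entries, "genre")
mood_counts = tally(toy_entries, "mood")

# Labels only, in descending order of frequency.
just_genres = [label for label, _ in genre_counts]
just_moods = [label for label, _ in mood_counts]
print(just_genres)
print(just_moods)
```

`Counter.most_common()` returns `(label, count)` pairs already sorted by count, so no separate `sorted` call is needed.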


Creating the tokenizer with the required vocabulary. This has been run once, and does not ever need to be run again. Left here for you to view.

In [None]:
def create_tokenizer():
  # All found genres
  genres = ['80s', '90s', 'alternative', 'ambient', 'blues', 'celtic', 'chillout',
          'classical', 'country', 'dance', 'drumnbass', 'easylistening',
          'electronic', 'electropop', 'experimental', 'folk', 'funk', 'hiphop',
          'house', 'indie', 'instrumentalpop', 'instrumentalrock', 'jazz',
          'jazzfusion', 'latin', 'lounge', 'metal', 'newage', 'orchestral',
          'pop', 'popfolk', 'poprock', 'punkrock', 'reggae', 'rock',
          'soundtrack', 'swing', 'symphonic', 'synthpop', 'techno', 'trance',
          'world']

  # All found moods
  moods = ['action', 'adventure', 'advertising', 'background', 'ballad', 'calm',
          'children', 'christmas', 'commercial', 'cool', 'corporate', 'dark',
          'deep', 'documentary', 'drama', 'dramatic', 'dream', 'emotional',
          'energetic', 'epic', 'film', 'fun', 'funny', 'game', 'happy', 'heavy',
          'holiday', 'inspiring', 'love', 'meditative', 'melodic',
          'motivational', 'movie', 'party', 'positive', 'relaxing', 'retro',
          'romantic', 'sad', 'slow', 'soft', 'soundscape', 'space', 'sport',
          'summer', 'trailer', 'upbeat', 'uplifting']

  # Create token_format
  genres = ["genre_" + g for g in genres]
  moods = ["mood_" + m for m in moods]

  # Done as per MidiTok docs, + padding tokens for empty slots in
  # genre, mood
  TOKENIZER_PARAMS = {
      "pitch_range": (21, 109),
      "beat_res": {(0, 4): 8, (4, 12): 4},
      "num_velocities": 32,
      "special_tokens": genres + moods + ["genre_pad", "mood_pad"],
      "use_chords": True,
      "use_rests": False,
      "use_tempos": True,
      "use_time_signatures": False,
      "use_programs": True,
      "num_tempos": 32,
      "tempo_range": (40, 250),
  }
  config = TokenizerConfig(**TOKENIZER_PARAMS)

  # Make tokenizer. This is identical to the one I load from hf, but this way
  # it could potentially be expanded while saving the old one.
  tokenizer = REMI(config)
  tokenizer.save_params("Midi Tokens/remi_tokenizer")
  return tokenizer

Creating the tokenized data from the MIDI files, if you've chosen to build your own dataset.

In [None]:
def create_tokenized_data(music_tokens, music_samples, tokenizer):
  # make a folder
  Path("Midi Tokens").mkdir(exist_ok=True)

  # Go through the first music_samples songs
  data_save = []
  for i, entry in enumerate(tqdm(ds["train"])):
    # Stop once music_samples songs have been collected
    if i >= music_samples:
      break

    # Read MIDI
    loc = entry['location']
    midi = miditoolkit.MidiFile(loc)

    # Tokenize
    ids = tokenizer(midi).ids[:music_tokens]

    # 12 genre and 12 mood padding slots
    grs = [tokenizer.vocab["genre_pad"]] * 12
    mds = [tokenizer.vocab["mood_pad"]] * 12

    # Replace padding with the song's tags (at most 12 of each)
    for k, g in enumerate(entry['genre'][:12]):
      grs[k] = tokenizer.vocab["genre_" + g]

    # Use a separate loop variable so the dataset entry is not overwritten
    for k, md in enumerate(entry['mood'][:12]):
      mds[k] = tokenizer.vocab["mood_" + md]

    # Compile the complete token list and add it to the dataset
    ids = grs + mds + ids
    data_save.append(ids)

  # Pad all tracks to a length of 24 + music_tokens
  for i, d in enumerate(data_save):
    padding = [0] * ((24 + music_tokens) - len(d))
    data_save[i] = d + padding

  # Save dataset
  data_save = np.array(data_save)
  np.save(f"Midi Tokens/{music_tokens}_Tokens_{music_samples}_songs.npy", data_save)

  return data_save
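
To make the row layout concrete, here is a minimal sketch of how each sample is assembled: 12 genre slots, 12 mood slots, the music tokens, then right-padding. The vocabulary below is made up (the real IDs come from `tokenizer.vocab`), `build_prefix` and `pad_to_length` are hypothetical helper names, and the pad ID of 0 is assumed to match the `[0]` padding used in `create_tokenized_data`.

```python
# Toy vocabulary; the real IDs come from tokenizer.vocab (values here are made up).
toy_vocab = {"PAD_None": 0, "genre_pad": 1, "mood_pad": 2,
             "genre_rock": 3, "genre_pop": 4, "mood_happy": 5}

SLOTS = 12  # slots reserved for genres, and again for moods

def build_prefix(genres, moods, vocab, slots=SLOTS):
  """Return `slots` genre IDs followed by `slots` mood IDs, padded as needed."""
  grs = [vocab["genre_pad"]] * slots
  mds = [vocab["mood_pad"]] * slots
  for k, g in enumerate(genres[:slots]):
    grs[k] = vocab["genre_" + g]
  for k, m in enumerate(moods[:slots]):
    mds[k] = vocab["mood_" + m]
  return grs + mds

def pad_to_length(ids, total_len, pad_id=0):
  """Right-pad a token list with pad_id up to total_len."""
  return ids + [pad_id] * (total_len - len(ids))

prefix = build_prefix(["rock", "pop"], ["happy"], toy_vocab)
music_ids = [10, 11, 12]                         # pretend music tokens
row = pad_to_length(prefix + music_ids, 24 + 8)  # music_tokens = 8 in this toy case
print(len(prefix), len(row))
```

With the default `music_tokens = 1000`, every row has 24 + 1000 = 1024 entries, which matches the model's `seq_len`.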


Only run any of these methods if "make_dataset" is true.

In [None]:
if make_dataset:
  # if you don't have Lakh MIDI, download and unzip
  if not os.path.exists("lmd_full"):
    download_lakh_midi()

  # Create dataset
  tokens = create_tokenized_data(music_tokens, music_samples, tokenizer)

# 3. Model Training

This model is an autoregressive transformer. The encoder layer is made to behave like a decoder through attention masking.

In [None]:
class MusicTransformer(nn.Module):
  # Small model
  def __init__(self, vocab_size, seq_len=1024, dim=768, heads=8, depth=8, dropout=0.1):
    super().__init__()

    # token and positional embeddings
    self.token_embedding = nn.Embedding(vocab_size, dim)
    self.pos_embedding = nn.Parameter(torch.randn(1, seq_len, dim))

    # Encoder
    encoder_layer = nn.TransformerEncoderLayer(
      d_model=dim,
      nhead=heads,
      dropout=dropout,
      batch_first=True
    )
    self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=depth)

    # Normalization to Linear
    self.ln_f = nn.LayerNorm(dim)
    self.to_logits = nn.Linear(dim, vocab_size)

  def forward(self, x):
    # batch size, sequence length
    b, t = x.size()
    # embedding layers
    x_embed = self.token_embedding(x) + self.pos_embedding[:, :t, :]

    device = x.device
    # Used gpt to help me do this - masks the "upper triangle". (Prevents model
    # from seeing future tokens)
    mask = torch.triu(torch.full((t, t), float('-inf'), device=device), diagonal=1)

    # Apply transformer with causal mask
    x_out = self.transformer(x_embed, mask=mask)

    # Final norm + logits
    x_out = self.ln_f(x_out)
    logits = self.to_logits(x_out)

    return logits
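
The effect of the causal mask can be checked by hand. The sketch below is a dependency-free illustration in plain Python, not the PyTorch code path: it builds the same kind of additive mask (0 on and below the diagonal, `-inf` above it), adds it to made-up uniform attention scores, and applies a row-wise softmax. `causal_mask` and `softmax` are hypothetical helper names.

```python
import math

def causal_mask(t):
  """t x t additive mask: 0 on/below the diagonal, -inf above it."""
  return [[0.0 if j <= i else float("-inf") for j in range(t)] for i in range(t)]

def softmax(row):
  """Numerically stable softmax over one row of scores."""
  m = max(row)
  exps = [math.exp(v - m) for v in row]
  s = sum(exps)
  return [e / s for e in exps]

t = 4
scores = [[1.0] * t for _ in range(t)]   # made-up, uniform attention scores
mask = causal_mask(t)
weights = [softmax([s + m for s, m in zip(srow, mrow)])
           for srow, mrow in zip(scores, mask)]

for row in weights:
  print([round(w, 2) for w in row])
```

Each position ends up with zero weight on every later position, which is what lets an encoder stack be trained as a next-token predictor.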

The Dataset class simply provides the token sequence as the input and the same sequence shifted forward by one position as the target.

In [None]:
class MIDITokenDataset(Dataset):
  def __init__(self, token_tensor):
    self.token_tensor = token_tensor

  def __len__(self):
    return len(self.token_tensor)

  def __getitem__(self, idx):
    sequence = self.token_tensor[idx]
    # Input: every token except the last
    x = sequence[:-1]
    # Target: every token except the first (the input shifted by one)
    y = sequence[1:]
    return x, y
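
A tiny worked example of the shift, using made-up token IDs, shows how each input position lines up with the token it should predict:

```python
sequence = [7, 7, 3, 41, 42, 43]   # made-up token IDs

x = sequence[:-1]   # model input: every token except the last
y = sequence[1:]    # target: every token except the first

for pos, (inp, tgt) in enumerate(zip(x, y)):
  print(f"position {pos}: last seen token {inp}, should predict {tgt}")
```

For the 1024-token rows in the default dataset, this gives inputs and targets of length 1023.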

The training loop runs only if you've indicated you wish to train a model from scratch.

In [None]:
if train:
  # use hugging face tokens or not depending on whether the user wants to
  tokens = hf_tokens if not make_dataset else tokens
  tokens = torch.tensor(tokens, dtype=torch.long)

  # Make dataloader
  dataset = MIDITokenDataset(tokens)
  dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

  # Define model
  model = MusicTransformer(tokenizer.vocab_size).to(device)
  optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
  # Technically a classification of which token
  loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)

  # Train loop
  for epoch in range(15):
    total_loss = 0
    model.train()

    # tokens, tokens + 1
    for x, y in dataloader:
      x, y = x.to(device), y.to(device)
      logits = model(x)

      # Output tokens vs ground truth (same song shifted 1 over)
      loss = loss_fn(logits.view(-1, tokenizer.vocab_size), y.view(-1))
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()
      total_loss += loss.item()

    print(f"Epoch {epoch+1} | Loss: {total_loss / len(dataloader):.4f}")

  # Save weights
  Path("Model State Dict").mkdir(exist_ok=True)
  torch.save(model.state_dict(), "Model State Dict/music_gen.pth")
  music_gen = model.state_dict()

# 4. Sequence Generation


The model generates a sequence of the chosen length (`max_length` tokens, including the seed tokens).

In [None]:
def generate_sequence(model, seed_tokens, tokenizer, max_length=1024):
  # Inference mode: disables dropout
  model.eval()
  device = next(model.parameters()).device

  # Current sequence
  generated = seed_tokens[:]
  input_ids = torch.tensor(generated, dtype=torch.long).unsqueeze(0).to(device)

  # size of tokenizer vocabulary
  vocab_size = max(tokenizer.vocab.values()) + 1

  # Generate the amount of tokens required
  for _ in range(max_length - len(seed_tokens)):
    with torch.no_grad():
      # Get output (next token)
      logits = model(input_ids)
      next_token_logits = logits[0, -1, :vocab_size]
      probs = torch.softmax(next_token_logits, dim=-1)
      next_token = torch.multinomial(probs, num_samples=1).item()

    # append to generated sequence
    generated.append(next_token)

    # Most recent tokens
    input_ids = torch.tensor(generated[-1023:], dtype=torch.long).unsqueeze(0).to(device)

  return generated
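
The sampling loop can be tried without a GPU by swapping the transformer for a toy function. The sketch below keeps the same steps (take the most recent tokens, turn logits into probabilities with a softmax, sample one token, append it), but `toy_model`, `sample_sequence`, the vocabulary size, and the window length are all hypothetical choices made for illustration.

```python
import math
import random

def softmax(logits):
  """Numerically stable softmax over a list of logits."""
  m = max(logits)
  exps = [math.exp(v - m) for v in logits]
  total = sum(exps)
  return [e / total for e in exps]

def toy_model(context, vocab_size):
  """Hypothetical stand-in for the transformer: favours (last token + 1)."""
  logits = [0.0] * vocab_size
  logits[(context[-1] + 1) % vocab_size] = 3.0
  return logits

def sample_sequence(seed, vocab_size, max_length, window, rng):
  generated = list(seed)
  while len(generated) < max_length:
    context = generated[-window:]   # only the most recent tokens are fed back in
    probs = softmax(toy_model(context, vocab_size))
    next_token = rng.choices(range(vocab_size), weights=probs, k=1)[0]
    generated.append(next_token)
  return generated

rng = random.Random(0)
out = sample_sequence(seed=[1, 2], vocab_size=5, max_length=10, window=4, rng=rng)
print(out)
```

Because the next token is sampled rather than chosen greedily, repeated calls with the same seed tokens give different tracks unless the random generator is seeded.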

Pass in the seed tokens, run generate_sequence the required number of times, and return the paths to the generated WAV files.

In [None]:
def generate_songs(gs, ms, number_of_songs, tokenizer, model):
  # prepare genre and mood lists
  genres = [""] * 12
  moods = [""] * 12

  # strings to save them as
  g_string = ""

  # properly format for tokenizing
  for i, g in enumerate(gs):
    if g=="":
      genres[i] = "genre_pad"
    else:
      genres[i] = "genre_" + g
      g_string += g + "_"

  m_string = ""
  for i, m in enumerate(ms):
    if m=="":
      moods[i] = "mood_pad"
    else:
      moods[i] = "mood_" + m
      m_string += m + "_"

  # make save_string and complete seeding tokens
  save_string = g_string + m_string
  genre_tokens = [tokenizer.vocab[g] for g in genres]
  mood_tokens = [tokenizer.vocab[m] for m in moods]
  seed = genre_tokens + mood_tokens

  # Make midi and wav folders
  Path("Generated_Midi").mkdir(exist_ok=True)
  Path("Generated_Wav").mkdir(exist_ok=True)

  print("Generating...")
  generated_paths = []

  # Generate the required number of songs
  for track in tqdm(range(1, number_of_songs+1)):
    generated_ids = generate_sequence(model=model, seed_tokens=seed,
                                      tokenizer=tokenizer, max_length=1024)

    # detokenize
    tok_seq = TokSequence(ids=generated_ids)
    score = tokenizer.decode(tok_seq)

    # Save midi
    midi_path = Path("Generated_Midi") / f"{save_string}{track}.mid"
    score.dump_midi(str(midi_path))

    # transform to wav and save
    wav_path = Path("Generated_Wav") / f"{save_string}{track}.wav"
    fs.midi_to_audio(str(midi_path), str(wav_path))
    generated_paths.append(str(wav_path))

  print("Done!")
  return generated_paths
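
The output file names follow directly from the selected genres and moods. This sketch reproduces only the naming logic; `build_save_string` is a hypothetical helper written for illustration, and no model or synthesizer is involved.

```python
from pathlib import Path

def build_save_string(genres, moods):
  """Join the non-empty genre and mood names, each followed by an underscore."""
  parts = [g for g in genres if g != ""] + [m for m in moods if m != ""]
  return "".join(p + "_" for p in parts)

# Selections padded to 12 slots with empty strings, as in gradio_generate.
genres = (["electronic"] + [""] * 12)[:12]
moods = (["happy"] + [""] * 12)[:12]

save_string = build_save_string(genres, moods)
for track in range(1, 3):
  midi_path = Path("Generated_Midi") / f"{save_string}{track}.mid"
  wav_path = Path("Generated_Wav") / f"{save_string}{track}.wav"
  print(midi_path.as_posix(), wav_path.as_posix())
```

Empty slots contribute nothing to the name, so only the chosen tags and the track number appear in it.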

Zip the new wavs for the download option in Gradio.

In [None]:
def zip_generated_wavs(wav_paths, zip_name="generated_tracks.zip"):
  # Take latest tracks, zip them and provide the path
  zip_path = Path("Generated_Zip") / zip_name
  Path("Generated_Zip").mkdir(exist_ok=True)

  with zipfile.ZipFile(zip_path, 'w') as z:
    for wav in wav_paths:
      z.write(wav, arcname=Path(wav).name)

  return str(zip_path)
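
The zipping pattern can be tested in isolation with placeholder files. The sketch below writes two dummy files into a temporary directory (the file names and contents are made up, not real audio) and archives them the same way, using `arcname` so the archive holds bare file names rather than folder paths.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
  tmp = Path(tmp)
  # Create two placeholder "wav" files (dummy bytes, not audio).
  wav_dir = tmp / "Generated_Wav"
  wav_dir.mkdir()
  wav_paths = []
  for i in (1, 2):
    p = wav_dir / f"demo_{i}.wav"
    p.write_bytes(b"not real audio")
    wav_paths.append(str(p))

  # Same pattern as zip_generated_wavs: flat archive, one entry per file name.
  zip_path = tmp / "generated_tracks.zip"
  with zipfile.ZipFile(zip_path, "w") as z:
    for wav in wav_paths:
      z.write(wav, arcname=Path(wav).name)

  # Read the entry names back before the temporary directory is removed.
  with zipfile.ZipFile(zip_path) as z:
    names = z.namelist()

print(names)
```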

This cell builds a Gradio UI for generating songs. The genres and moods are sorted from most frequent to least frequent in Lakh MIDI, so those nearer the start of each list should produce more coherent and relevant conditioning.

Gradio provides a download option for the generated tracks, but the tracks themselves are played in the notebook output, since Gradio cannot handle a dynamic number of songs to play.

In [None]:
# Genres and moods sorted by frequency
genres = ["", 'electronic', 'pop', 'classical', 'rock', 'soundtrack', 'ambient',
          'jazz', 'easylistening', 'instrumentalpop', 'dance', 'experimental',
          'folk', 'orchestral', 'world', 'techno', 'alternative', 'reggae',
          'instrumentalrock', 'trance', 'latin', 'country', 'popfolk', 'metal',
          'poprock', 'house', 'punkrock', 'newage', 'indie', 'synthpop',
          'symphonic', 'swing', 'chillout', 'jazzfusion', 'blues', 'hiphop',
          'lounge', '90s', 'funk', 'drumnbass', '80s', 'celtic', 'electropop']

moods = ["", 'melodic', 'happy', 'relaxing', 'film', 'christmas', 'dark',
         'energetic', 'corporate', 'love', 'meditative', 'motivational',
         'epic', 'space', 'emotional', 'inspiring', 'slow', 'dream', 'action',
         'background', 'positive', 'adventure', 'ballad', 'game', 'drama',
         'romantic', 'documentary', 'soundscape', 'retro', 'uplifting', 'funny',
         'dramatic', 'summer', 'deep', 'advertising', 'upbeat', 'children',
         'fun', 'sad', 'heavy', 'party', 'trailer', 'sport', 'commercial',
         'movie', 'calm', 'holiday', 'soft', 'cool']


# Gradio to choose genres, moods, and download zip
def gradio_generate(genre_list, mood_list, num_tracks):
  # Pad to 12 regardless of whether under or over
  genres = (genre_list + [""] * 12)[:12]
  moods = (mood_list + [""] * 12)[:12]

  # Decide which weights to use based on user preferences
  if use_trained_for_generation:
    music_gen = torch.load("Model State Dict/music_gen.pth", map_location=device)  # or model.state_dict() if just trained
  else:
    music_gen = hf_music_gen

  # make model
  model = MusicTransformer(tokenizer.vocab_size).to(device)
  model.load_state_dict(music_gen)
  model.eval()

  # Get list of tracks
  generated = generate_songs(gs=genres, ms=moods, number_of_songs=num_tracks,
                             tokenizer=tokenizer, model=model)

  # Display tracks for user to play
  for label in generated:
    print(f"{label}")
    display(Audio(filename=label))

  # Return path to zips for gradio
  zip_path = zip_generated_wavs(generated)
  return zip_path

# Take the number of songs, chosen genres, moods, and provide a zip download link
gr.Interface(
  fn=gradio_generate,
  inputs=[
      gr.CheckboxGroup(genres, label="Select Genres (up to 12)"),
      gr.CheckboxGroup(moods, label="Select Moods (up to 12)"),
      gr.Slider(1, 50, step=1, value=2, label="Number of Tracks")
    ],
  outputs=gr.File(label="Download zip of Generated WAVs", type="filepath", file_types=[".zip"]),
  title="Genre + Mood Music Generator",
  allow_flagging="never"
).launch(debug=True)








Generating...


100%|██████████| 2/2 [00:21<00:00, 10.54s/it]

Done!
Generated_Wav/electronic_happy_1.wav





Generated_Wav/electronic_happy_2.wav


In [None]:
# import shutil
# from google.colab import files

# # Small utility to download every wav in the Generated_Wav folder
# shutil.make_archive("Generated_Wav", 'zip', "Generated_Wav")
# files.download("Generated_Wav.zip")