<a href="https://colab.research.google.com/github/comp0161/colab/blob/main/COMP0161_lab2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Generating Music with Deep Learning (Part 2)

In this, the second of three lab sessions, we will use the dataset created last week to train a **deep neural network** to generate music.

Once again we will be making use of [Google's Colab computing environment](https://colab.research.google.com/#). Unlike the previous lab, the code this week is very computationally intensive, so you will want to ensure that the notebook connects to a **GPU-enabled** virtual machine. This should already be configured, but it's worth checking if you find the model training is taking a really long time.

Once again, the code for this lab is in [Python](https://docs.python.org/3/tutorial/index.html). You do not need to know Python to complete the lab, but there will be some optional extra tasks you can try if you are comfortable with Python coding and want to explore further.

# Background

**Machine Learning** (ML) is an umbrella term for a range of techniques that distil structural and behavioural *patterns* from data — patterns that can then be used to understand and make predictions about future data drawn from the same population. The learned patterns are often conceptualised *probabilistically* as the **distribution** of the training data: ie, how likely any given sample of data is within the population as a whole.

**Deep Learning** is a particular subset of ML, making use of large, flexible models known as **neural networks**, which are capable of representing a very broad range of complex behaviours. Neural networks are trained *incrementally*, taking multiple passes through the training data, each time making small changes to the model parameters to make the model outputs a little *less wrong*. This process is called **gradient descent**, where the gradient in question is the slope of wrongness.
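
As a minimal sketch of that idea (a toy with a single parameter, not the training code used later in this lab), the loop below repeatedly nudges a parameter `w` against the slope of a squared-error "wrongness". The function name `gradient_descent`, the target value and the learning rate are all illustrative assumptions.

```python
def gradient_descent(target=3.0, learning_rate=0.1, steps=50):
    """Minimise the 'wrongness' (w - target)**2 by following its slope downhill."""
    w = 0.0  # arbitrary starting guess
    for _ in range(steps):
        gradient = 2.0 * (w - target)   # slope of the loss at the current w
        w -= learning_rate * gradient   # take a small step in the downhill direction
    return w


w_final = gradient_descent()
print(f"learned w = {w_final:.4f}")
```

A real neural network does the same thing with millions of parameters at once, and with the slope computed automatically by the framework.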

Deep Learning can be applied to a huge array of different kinds of problems, and has produced many remarkable successes in recent years, but it is computationally demanding and very heavily dependent on the available training data.


## Language modelling

One area to which Deep Learning has been very successfully applied is that of [Language Modelling](https://en.wikipedia.org/wiki/Language_model): learning the *joint probability distribution* of words in a language such as English. Roughly, the problem boils down to: given some amount of sentence context — say, the previous 10 or 100 words — what is the *next* word likely to be?
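
To make the "what comes next?" question concrete, here is a deliberately crude sketch that estimates next-word probabilities by counting, using only a single word of context. It is an illustration of the problem rather than of Deep Learning: a neural language model replaces the counting with a learned function over a much longer context. The function name and the toy sentence are invented for this example.

```python
from collections import Counter, defaultdict


def next_word_probabilities(words, context_word):
    """Estimate P(next word | previous word) from simple bigram counts."""
    followers = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        followers[prev][nxt] += 1

    counts = followers[context_word]
    total = sum(counts.values())
    # an unseen (or final) context word has no followers to count
    return {word: n / total for word, n in counts.items()} if total else {}


toy_text = "the cat sat on the mat and the cat slept".split()
probs = next_word_probabilities(toy_text, "the")
print(probs)
```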

The *meaning* of a sentence is determined by the choice and arrangement of its words, so knowing the patterns of occurrence reveals *something* about the syntax and semantics of the language. Exactly what and how much information is gained this way remains a matter of heated philosophical debate, but very large language models like [GPT-3](https://en.wikipedia.org/wiki/GPT-3) are certainly able to capture and at least superficially mimic a great deal of the form, content and general *feel* of natural language text.

The language modelling approach is not restricted only to text, but can be applied to any kind of data that can be considered as an ordered sequence of tokens drawn from a defined vocabulary. As we saw in Lab 1, this is a fairly natural representation for **music**.

## Sequence generation

There are many tasks you can perform with a trained language model, but one of the most straightforward and natural is that of generating more text in that language. Starting with some initial **primer** sequence to provide context, an arbitrarily long sequence can be generated by predicting the next word, and then feeding the new augmented sentence including the new word back as context for the next word, in turn. At each point the next word can be predicted either by just choosing the single most likely candidate, or randomly choosing among multiple candidates, with the probability of each choice set according to what the model thinks is the probability of that word occurring next.
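
The generation loop just described can be sketched in a few lines. The "model" here is a hypothetical hand-written table of next-token probabilities that looks only at the last token; a trained network would compute these probabilities from the whole context instead. All names and numbers are assumptions for illustration.

```python
import random

# Hypothetical toy "model": next-token probabilities keyed by the last token only.
TOY_MODEL = {
    "C4": {"E4": 0.6, "G4": 0.3, "C4": 0.1},
    "E4": {"G4": 0.7, "C4": 0.3},
    "G4": {"C4": 0.6, "E4": 0.4},
}


def generate_tokens(primer, count, sample=True, seed=0):
    """Extend `primer` by `count` tokens, feeding each new token back as context."""
    rng = random.Random(seed)
    sequence = list(primer)
    for _ in range(count):
        candidates = TOY_MODEL[sequence[-1]]
        if sample:
            # choose randomly, weighted by the predicted probabilities
            tokens = list(candidates)
            weights = list(candidates.values())
            next_token = rng.choices(tokens, weights=weights, k=1)[0]
        else:
            # greedy: always take the single most likely candidate
            next_token = max(candidates, key=candidates.get)
        sequence.append(next_token)
    return sequence


greedy = generate_tokens(["C4"], 4, sample=False)
sampled = generate_tokens(["C4"], 4, sample=True, seed=1)
print(greedy)
print(sampled)
```

Greedy generation is fully deterministic and tends to fall into loops, which is one reason sampling is usually preferred for creative output.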

This kind of sequence generation has been a staple of social media humour in recent years, usually framed along the lines of:

> We trained an AI on horror movie titles/Taylor Swift lyrics/Donald Trump's tweets and look what it came up with 😂🤣😂🤪!!!

Such sequence generation gags often make use of a kind of recurrent neural network known as **long short-term memory** or [LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory), whose design dates back to the mid 1990s. For our musical experiments here, we're instead going to use a simple version of the more recent [Transformer](https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)) architecture — which also underpins GPT-3.


# Setting up



## Data



Get the default (Bach, Mozart & Handel) corpus generated in Lab 1:

In [None]:
!git clone https://github.com/comp0161/labs_data.git data

If you played with the settings or code to produce a different corpus, you can upload that to train with instead. Use the file browser (the folder icon in the sidebar) to do so. Either overwrite the `data/corpus.txt` file or upload your file with a different name and change the `CORPUS` variable in the configuration section below to match.

If you modified the encoding method used to produce your corpus, then you should also supply updated `text_to_music` and `music_to_text` functions to handle conversion between the text tokens and a Music21 Stream.
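
For reference, the default encoding (as produced by the `music_to_text` function defined later in this notebook) writes each token as one or more pitch names joined by `.`, then `;`, then a duration in quarter notes, with `rest` standing in for silence. The sketch below parses and rebuilds such tokens without Music21; the helper names `parse_token` and `format_token` are hypothetical and are not part of the lab code.

```python
def parse_token(token):
    """Split one corpus token into (list of pitch names, duration in quarter notes)."""
    names, quarters = token.split(';')
    pitches = [] if names == 'rest' else names.split('.')
    return pitches, float(quarters)


def format_token(pitches, quarters):
    """Inverse of parse_token: an empty pitch list becomes a rest."""
    name = '.'.join(pitches) or 'rest'
    return f'{name};{float(quarters):.6g}'


print(parse_token('C3.E3.G4;0.25'))
print(format_token([], 1))
```

If your own encoding differs from this, these are the two directions your replacement functions need to handle consistently.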

## Music handling

We'll use the same packages for music handling as last week: [Music21](https://web.mit.edu/music21/) for music streams and MIDI, [MuseScore](https://musescore.org/en) for notation and [FluidSynth](https://www.fluidsynth.org) for audio rendering.

(As before, we discard a lot of installation messages here; if there are problems, remove the `> /dev/null` or `&> /dev/null` to help diagnose what's going wrong.)

In [None]:
# software for rendering music notation
print('installing musescore')
!yes | add-apt-repository ppa:mscore-ubuntu/mscore3-stable > /dev/null
!apt update &> /dev/null
!apt install musescore3 &> /dev/null
print("done")

# software for rendering MIDI to WAV
print('installing fluidsynth...')
!apt-get install fluidsynth > /dev/null
!cp /usr/share/sounds/sf2/FluidR3_GM.sf2 ./font.sf2
print('done')

# install the music21 package for reading and transforming the MIDI training data
# (an older version seems to be already installed on Colab as of this writing, but
# we want to be up to date)
print('upgrading music21')
%pip install --upgrade music21 > /dev/null
print('done')

## Deep learning

To learn patterns in the data we're going to use a Transformer model - a very slimmed-down version of huge (and hype-laden) language models such as [GPT-2](https://en.wikipedia.org/wiki/GPT-2) and [GPT-3](https://en.wikipedia.org/wiki/GPT-3). There are various implementations of these kinds of models around, but we'll make use of Andrej Karpathy's [minimalist implementation of the GPT family](https://github.com/karpathy/minGPT), which is neat and clear and pretty easy to use. It isn't (yet?) available on PyPI, so we'll install it from GitHub.

<details>
<summary>Note</summary>
<p>There is some risk of breaking changes being introduced to the repo between the time of writing (late Nov 2022) and the actual lab session (expected late Feb 2023) so for safety we checkout the current repo head. Might need to revisit this if useful fixes get committed in the meantime.</p>
</details>

In [None]:
# install Andrej Karpathy's minimal GPT implementation
!git clone https://github.com/karpathy/minGPT.git
%cd minGPT
# checkout a known version for safety
!git checkout 7218bcf --quiet
%pip install -e .

import sys, os, os.path
sys.path.append(os.getcwd())

%cd ..

## Python library imports

In [None]:
# always useful imports
import sys, os, os.path
import copy
import numpy as np
import numpy.random
import json

from google.colab import files

# specialities
import music21 as MU
from IPython.display import Image, Audio

import torch
from torch.utils.data import Dataset
from torch.utils.data.dataloader import DataLoader

# NB: expects the MinGPT package to have been setup (see previous cell)
from mingpt.model import GPT
from mingpt.trainer import Trainer
from mingpt.utils import set_seed, setup_logging, CfgNode



## Configuration

In [None]:
# configuration variables
SEED = 9907
shared_rng = numpy.random.default_rng(seed=SEED)

# dataset configuration
COMPILED_DATA = 'data'
CORPUS = os.path.join(COMPILED_DATA, 'corpus.txt')

# corpus generation  defaults
# we're not actually using most of these today since
# the data is already generated, but we keep them here
# because they appear in the code
MIDI_DATA = 'classical_music_midi'
COMPOSERS = 'bach,mozart,haydn'
SIMPLIFY_LIMIT = 3
SIMPLIFY_MODE = 'low'
STRIP_TIES = True
TOKENS_PER_LINE = 8


# training configuration
BLOCK_SIZE = 32
MODEL_TYPE = 'gpt-nano'
WORK_DIR = './out/nano_music'
RESUME = True
PROGRESS_INTERVAL = 10
SAVE_INTERVAL = 200
LEARNING_RATE = 5e-4

# generator configuration
GENERATE_COUNT = 300
TEMPERATURE = 1.0
SAMPLE = True
TOP_K = 10
EXPLICIT_PRIMER = None
DEFAULT_PRIME_COUNT = 4

# filenames for intermediate data
MUSIC_MID = 'music.mid'
MUSIC_WAV = 'music.wav'

## Display helpers

We'll again use Music21's display features, but also provide a helper to render with FluidSynth.

In [None]:
# helper function for rendering music in the notebook

def fluid_play(music, rate=22050, midi_name=MUSIC_MID, wav_name=MUSIC_WAV):
  """
  Write music to MIDI, then render that to WAV and display inline as Audio.

  Note: if `None` is passed to `midi_name`, Music21 will invent a temp name.
  """
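  # write straight to the requested path; with fp=None Music21 picks a temp
  # file, and either way it returns the path actually written
  midi_name = music.write('mid', fp=midi_name)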
  !fluidsynth -ni font.sf2 $midi_name -F $wav_name -r $rate > /dev/null
  display(Audio(wav_name))


## Data processing


We'll re-use the processing functions from last week's lab. Not all of these are needed this time, because the data corpus is already prepared, but we'll just retain the whole lot for simplicity.

**Important:** if you modified or created your own functions last week and are using a dataset built with those functions, substitute your versions below.

In [None]:
# functions for music processing

def strip_ties ( s, inPlace=True ):
    """
    Strip tied chords and drop non-starting tied notes from within chords.
    NB: operates in place by default.

    Intended for chordified streams, will probably produce weird
    results otherwise.
    """
    if not inPlace:
        s = copy.deepcopy(s)

    s.stripTies(inPlace=True)

    for element in s.flatten():
        if isinstance(element, MU.chord.Chord):
            deletions = []
            for note in element:
                if note.tie is not None:
                    if note.tie.type == 'start': note.tie = None
                    else: deletions.append(note.pitch.nameWithOctave)

            for note in deletions:
                element.remove(note)

    return s


def simplify ( s, limit=SIMPLIFY_LIMIT, mode=SIMPLIFY_MODE, rng=shared_rng, inPlace=True ):
    """
    Drop notes from big chords so they have no more than `limit` notes.
    NB: operates in place by default.

    Intended for chordified streams, will probably produce weird
    results otherwise.

    Drop strategies are pretty dumb. We always keep the highest and lowest notes
    (crudely assumed to be melody and bass respectively). Notes are dropped from
    the remainder according to one of three strategies:

        'low': notes are dropped from low to high (the default)
        'high': notes are dropped from high to low
        'random': notes are dropped randomly

    Latter could actually increase vocab by mapping the same input chord
    to several outputs. Modes can be abbreviated to initial letters.
    """
    if limit < 2: limit = 2

    if not inPlace:
        s = copy.deepcopy(s)

    drop_func = {
                    'r' : lambda d, c: rng.choice(d, c, replace=False),
                    'h' : lambda d, c: d[(len(d)-c):]
                }.get(mode.lower()[0],
                      lambda d, c: d[:(c-len(d))])

    for element in s.flatten():
        if isinstance(element, MU.chord.Chord):
            if len(element) > limit:
                drop_count = len(element) - limit
                drops = [ nn.pitch.nameWithOctave for nn in element ][1:-1]

                if len(drops) > drop_count:
                    drops = drop_func(drops, drop_count)

                for note in drops:
                    element.remove(note)

    return s


def music_to_text ( s ):
    """
    Convert music stream into a list of text tokens defining the
    chords, notes and rests and their durations.

    Intended for chordified streams, will probably produce weird
    results otherwise.
    """
    result = []
    for element in s.flatten():
        name = None
        if isinstance(element, MU.chord.Chord):
            name = '.'.join(n.nameWithOctave for n in element.pitches)
        elif isinstance(element, MU.note.Rest):
            name = 'rest'
        elif isinstance(element, MU.note.Note):
            name = str(element.nameWithOctave)

        if name is not None:
            # convert any stray empty notes or chords into rests
            name = name or 'rest'
            result.append(f'{name};{float(element.duration.quarterLength):.6g}')

    return result


def text_to_music( t ):
    """
    Convert a sequence of text tokens into a music stream.
    """
    result = MU.stream.Stream()

    for element in t:
        notes, quarters = element.split(';')
        duration = MU.duration.Duration(float(quarters))

        if '.' in notes:
            notes = notes.split('.')
            chord = []
            for nn in notes:
                note = MU.note.Note(nn)
                note.duration = duration
                chord.append(note)
            result.append(MU.chord.Chord(chord))
        elif notes == 'rest':
            note = MU.note.Rest()
            note.duration = duration
            result.append(note)
        else:
            note = MU.note.Note(notes)
            note.duration = duration
            result.append(note)

    return result

def tokenise ( file, simplify_limit=SIMPLIFY_LIMIT, simplify_mode=SIMPLIFY_MODE,
               do_strip=STRIP_TIES, rng=shared_rng ):
    """
    Read a MIDI file and convert to text tokens, with
    optional preprocessing.
    """
    raw_stream = MU.converter.parse(file)
    chorded = raw_stream.chordify()

    if do_strip:
        strip_ties(chorded)

    if simplify_limit:
        simplify(chorded, simplify_limit, simplify_mode, rng=rng)

    return music_to_text(chorded)


def build_dataset ( midi_path=MIDI_DATA, composers=COMPOSERS, out_path=None,
                    do_strip=STRIP_TIES, simplify_limit=SIMPLIFY_LIMIT,
                    simplify_mode=SIMPLIFY_MODE, rng=shared_rng ):
    """
    Construct a tokenised data file of optionally simplified and tie-stripped
    music by the specified composers, for use in training a language model.
    """
    if composers:
        comps = composers.split(',')
    else:
        comps = [ ff for ff in os.listdir(midi_path) if os.path.isdir(os.path.join(midi_path, ff)) ]

    if out_path is None:
        out_path = os.path.join(COMPILED_DATA, '_'.join(comps) + '.txt')

    out_dir = os.path.split(out_path)[0]
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir, exist_ok=True)

    tokens = []
    count = 0

    for comp in comps:
        dir = os.path.join(midi_path, comp)
        for filename in os.listdir(dir):
            if filename.lower().endswith('.mid'):
                print(f'reading {filename}')
                tokens.extend(tokenise(os.path.join(dir, filename),
                                       simplify_limit=simplify_limit,
                                       simplify_mode=simplify_mode,
                                       do_strip=do_strip,
                                       rng=rng))
                count += 1

    print(f'loaded {len(tokens)} tokens from {count} files')

    print(f'writing tokens to {out_path}')

    with open(out_path, 'w') as f:
        for off in range(0, len(tokens), TOKENS_PER_LINE):
            print(' '.join(tokens[off:off+TOKENS_PER_LINE]), file=f)





# Learning

In this section we'll first define the classes and functions that we'll use to do the actual model training and music generation. Then we'll finally run them at the end.

## Configuration

MinGPT uses a fairly simple tree structure with attributes on nodes to store configuration details. It provides the useful feature of merging configuration from command line args, which is a bit less relevant here. Nevertheless, we'll stick with it, allowing an optional list of configuration arguments to be supplied to `configure`.

In [None]:
def get_default_config():
    """
    Default configuration for both training and generation
    is basically filled out from the config vars defined earlier.
    """
    config = CfgNode()
    config.system = CfgNode()
    config.system.seed = SEED
    config.system.work_dir = WORK_DIR
    config.system.resume = RESUME
    config.system.reloading = False
    config.system.model_path = os.path.join(config.system.work_dir, 'model.pt')
    config.data = LangDataset.get_default_config()

    config.model = GPT.get_default_config()
    config.model.model_type = MODEL_TYPE

    config.trainer = Trainer.get_default_config()
    config.trainer.learning_rate = LEARNING_RATE
    config.trainer.progress_interval = PROGRESS_INTERVAL
    config.trainer.save_interval = SAVE_INTERVAL

    # use fewer workers to suppress an issue with Colab in 2024
    config.trainer.num_workers = 2

    config.generator = CfgNode()
    config.generator.count = GENERATE_COUNT
    config.generator.temperature = TEMPERATURE
    config.generator.sample = SAMPLE
    config.generator.top_k = TOP_K
    config.generator.explicit_primer = EXPLICIT_PRIMER
    config.generator.prime_offset = 0
    config.generator.prime_count = DEFAULT_PRIME_COUNT

    return config


def configure (args=[]):
    """
    Get configuration for training or generating from a model.

    The starting configuration will be that returned by `get_default_config`
    but if the specified model already exists this will be substituted with
    the config saved for that model, *unless* system.resume is set to False.

    Either way, the resulting config can then be modified by `args`, a list
    of strings of the form:

      --path.to.option=value

    This will explicitly set the specified option to the specified value,
    overwriting the existing one. If the option does not exist, an exception
    will be raised.
    """
    config = get_default_config()

    # we merge from args into the default before checking for reloading
    # in case the args include --system.resume=False
    config.merge_from_args(args)

    # as long as resume hasn't been turned off, load the model if it exists
    reloading = config.system.resume and os.path.exists(config.system.model_path)

    if reloading:
        with open(os.path.join(config.system.work_dir, 'config.json'), 'r') as f:
            loaded = json.load(f)

        # annoyingly, CfgNode doesn't support updating from a nested dict
        # in the same way it does for args, so we need to update the
        # subsidiary nodes explicitly
        if 'system' in loaded: config.system.merge_from_dict(loaded['system'])
        if 'generator' in loaded: config.generator.merge_from_dict(loaded['generator'])
        if 'data' in loaded: config.data.merge_from_dict(loaded['data'])
        if 'model' in loaded: config.model.merge_from_dict(loaded['model'])
        if 'trainer' in loaded: config.trainer.merge_from_dict(loaded['trainer'])

        # note intent to reload parameters
        config.system.reloading = True

        # we still want args to have priority, so merge *again*
        config.merge_from_args(args)

    return config
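
To illustrate the `--path.to.option=value` convention on its own, here is a simplified stand-in that applies such overrides to a plain nested dictionary. It is a sketch of the idea only and is not MinGPT's `CfgNode` implementation; the function name `apply_overrides` and the `demo` dictionary are assumptions made for this example.

```python
from ast import literal_eval


def apply_overrides(config, args):
    """Apply '--path.to.option=value' strings to a nested dict, in place."""
    for arg in args:
        if not arg.startswith('--') or '=' not in arg:
            raise ValueError(f'expected --path.to.option=value, got {arg!r}')
        path, raw = arg[2:].split('=', 1)

        try:
            value = literal_eval(raw)   # numbers, True/False/None, quoted strings
        except (ValueError, SyntaxError):
            value = raw                 # anything else is kept as a plain string

        *parents, leaf = path.split('.')
        node = config
        for key in parents:
            node = node[key]            # KeyError if an intermediate node is missing
        if leaf not in node:
            raise KeyError(f'unknown option: {path}')
        node[leaf] = value
    return config


demo = {'generator': {'count': 300, 'sample': True, 'explicit_primer': None}}
apply_overrides(demo, ['--generator.count=50', '--generator.sample=False'])
print(demo)
```

Interpreting the value as a Python literal is why string options in this notebook are wrapped in an extra pair of quotes inside the argument.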

## LangDataset

A PyTorch `Dataset` subclass, used here with MinGPT, that manages loading the training data and converting to and from PyTorch tensors of integer indices. It is not music-specific: it will load any text file of whitespace-delimited tokens.

In [None]:
class LangDataset(Dataset):
    """
    Loads a single whitespace-delimited text file of tokens,
    manages translation to and from integer indices, and delivers
    blocks in encoded (integer index) form.
    """

    @staticmethod
    def get_default_config():
        config = CfgNode()
        config.block_size = BLOCK_SIZE
        config.source = CORPUS
        return config

    def __init__(self, config, load=True):
        self.config = config
        if load:
            self.load()
        else:
            self.data = []
            self.vocab = []
            self.encoding = {}
            self.decoding = {}
            self.encoded = []

    def load(self):
        print(f'loading from {self.config.source}')
        with open(self.config.source, 'r') as f:
            text = f.read()

        self.data = text.split()
        self.vocab = sorted(list(set(self.data)))
        self.encoding = { w:i for i,w in enumerate(self.vocab) }
        self.decoding = { i:w for i,w in enumerate(self.vocab) }
        self.encoded = self.encode(self.data)

        print(f'data has {len(self.encoded)} words, with vocabulary of {len(self.vocab)}')

    def encode(self, seq):
        """
        Map a sequence of word tokens to numerical indices.
        """
        return [ self.encoding[w] for w in seq ]

    def decode(self, seq):
        """
        Map a sequence of numerical indices to word tokens.
        """
        return [ self.decoding[i] for i in seq ]

    def get_vocab_size(self):
        return len(self.vocab)

    def get_block_size(self):
        return self.config.block_size

    def save_vocab(self, file):
        """
        Save vocabulary (in order) as a simple newline delimited text file.
        """
        with open(file, 'w') as f:
            print('\n'.join(self.vocab), file=f)

    def load_vocab(self, file):
        """
        Load vocabulary from a whitespace delimited text file.
        (This will load a file created by `save_vocab` but is slightly
        more general.)
        """
        with open(file, 'r') as f:
            text = f.read()
        self.vocab = text.split()
        self.encoding = { w:i for i,w in enumerate(self.vocab) }
        self.decoding = { i:w for i,w in enumerate(self.vocab) }

    def __len__(self):
        """
        How big is the dataset in terms of loadable chunks.

        NB: chunks require block size + 1 because label is sequence
        shifted forward by one token.
        """
        return len(self.data) - self.config.block_size - 1

    def __getitem__(self, idx):
        """
        Get a data, label pair as tensors of integer indices.
        """
        chunk = self.encoded[idx:idx + self.config.block_size + 1]
        x = torch.tensor(chunk[:-1], dtype=torch.long)
        y = torch.tensor(chunk[1:], dtype=torch.long)
        return x, y
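
The two core ideas in `LangDataset` (a sorted vocabulary mapped to integer indices, and training pairs where the label is the input shifted forward by one token) can be seen in isolation in the sketch below. It uses plain Python lists rather than tensors, and the helper names and toy tokens are invented for illustration.

```python
def build_vocab(tokens):
    """Sorted unique tokens, plus a token -> index lookup."""
    vocab = sorted(set(tokens))
    encoding = {w: i for i, w in enumerate(vocab)}
    return vocab, encoding


def training_pair(encoded, idx, block_size):
    """Input block and its label: the same block shifted forward by one token."""
    chunk = encoded[idx:idx + block_size + 1]
    return chunk[:-1], chunk[1:]


toy_tokens = 'C4;1 E4;1 G4;1 E4;1 C4;2'.split()
vocab, encoding = build_vocab(toy_tokens)
encoded = [encoding[w] for w in toy_tokens]

x, y = training_pair(encoded, 0, block_size=3)
print(vocab)
print(x, y)
```

At every position in `x`, the model is trained to predict the value at the same position in `y`, so a single block provides `block_size` prediction examples.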

## Learning functions

The functions below handle model configuration, training a model according to the configuration, and then loading a trained model and using it to generate a music sequence.

In [None]:
def setup_model(args=[]):
    """
    Shared initialisation for training and generation.
    Configures, loads data and builds model.
    """
    config = configure(args)
    setup_logging(config)
    set_seed(config.system.seed)

    print('loading data')
    dataset = LangDataset(config.data)

    print('building model')
    config.model.vocab_size = dataset.get_vocab_size()
    config.model.block_size = dataset.get_block_size()
    model = GPT(config.model)

    if config.system.reloading:
        print(f'reloading trained parameters from {config.system.model_path}')
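        # map to CPU so parameters saved from a GPU run still load on a CPU-only runtime
        model.load_state_dict(torch.load(config.system.model_path, map_location='cpu'))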

    return config, dataset, model


def train_model(args=[]):
    """
    Train (or resume training) a model.

    By default, training will keep running until interrupted.
    Set --trainer.max_iters=<number> to curtail this.
    """
    config, dataset, model = setup_model(args)

    print('initialising trainer')
    trainer = Trainer(config.trainer, model, dataset)

    def batch_end_callback(trainer):
        if trainer.iter_num % config.trainer.progress_interval == 0:
            print(f"iter_dt {trainer.iter_dt * 1000:.2f}ms; iter {trainer.iter_num}: train loss {trainer.loss.item():.5f}")

        if trainer.iter_num % config.trainer.save_interval == 0:
            torch.save(model.state_dict(), config.system.model_path)

    trainer.set_callback('on_batch_end', batch_end_callback)

    print('running trainer')
    trainer.run()

def generate(args=[]):
    """
    Generate and return a music sequence using an existing trained model.
    """
    config, dataset, model = setup_model(args)

    # no training here
    model.eval()

    if config.generator.explicit_primer:
        # encode supplied explicit fragment
        # NB: will fail if it includes words outside vocab
        primer = dataset.encode(config.generator.explicit_primer.split())
    else:
        # use some fragment of the training data
        start = config.generator.prime_offset
        end = start + config.generator.prime_count
        primer = dataset.encoded[start:end]

    with torch.no_grad():
        x = torch.tensor(primer, dtype=torch.long)[None, ...]
        y = model.generate(x, config.generator.count,
                           temperature=config.generator.temperature,
                           do_sample=config.generator.sample,
                           top_k=config.generator.top_k)[0]

    print(y)
    output = dataset.decode(y.numpy())
    print(' '.join(output))

    return text_to_music(output)


## Training the model

OK, let's give it a try.

In [None]:
train_model(['--trainer.max_iters=20001', '--trainer.progress_interval=100', '--system.resume=False'])

Check the expected outputs have been created. The directory should contain at least 3 files, the most important one being `model.pt`, which contains the actual trained model parameters.

In [None]:
!ls -l $WORK_DIR

## Generation

Having trained a model, let's generate some music from it.

In [None]:
# using the defaults: prime with the first 4 tokens of the corpus
generated = generate([])
generated.show()
generated.show('midi')

We can also specify an explicit primer in text notation (with the proviso that all the chord+duration elements must actually occur in the training corpus). Here's a simple fragment:

In [None]:
primer = text_to_music('C3.E3.G4;0.25 F3;0.25 G3;0.25 A3;0.25'.split())
primer.show()
primer.show('midi')

And we generate an ongoing sequence from it thus:

In [None]:
generated = generate(['--generator.explicit_primer="C3.E3.G4;0.25 F3;0.25 G3;0.25 A3;0.25"'])
generated.show()
generated.show('midi')

## Render and download your results

In the above examples we don't explicitly write the generated music to files, but we can do so using the `fluid_play` function defined earlier. When you call this function on a music stream, by default the MIDI data is written to a file called `music.mid` and the rendered audio is written to `music.wav`. You can then download both files for future use. (You'll need `music.mid` for next week's lab.)

In [None]:
fluid_play(generated)
files.download('music.mid')
files.download('music.wav')


# Playing around

## Change the generation settings

The sequence generation process has a number of settings that can be tweaked to affect what gets produced. The default configuration is defined in the code above, but the settings can be overridden by the arguments passed to `generate()`, as shown in the cell below.

* `generator.count`: the number of tokens to generate, ie the length of the resulting sequence.
* `generator.sample`: whether to sample randomly from possible next tokens. If this is `False`, then the most likely next token is always chosen.
* `generator.temperature`: how strongly the token choice is affected by the likelihood. Higher temperatures increase the chance of picking less likely tokens, making the output sequence more "random". Only relevant when `generator.sample` is `True`.
* `generator.top_k`: restrict the sampling to this many of the most likely candidates. The higher this number, the more tokens are considered (though their probability of being chosen may be low). Only relevant when `generator.sample` is `True`.
* `generator.explicit_primer`: specify a starting sequence of tokens explicitly as a text string. See the example above. Tokens in the supplied primer must be present in the corpus.
* `generator.prime_offset`: Instead of an explicit primer, start from a sequence of tokens taken directly from the corpus, starting at this offset. Only relevant if `generator.explicit_primer` is `None`.
* `generator.prime_count`: Number of tokens from the corpus to use as primer. Only relevant if `generator.explicit_primer` is `None`.
* `system.seed`: Seed value for the random number generator. Varying this should give rise to different sequences if sampling is used — though they might not be *very* different if the learned model contains highly stereotyped sequences.

Note that any setting you don't explicitly override will default to its most recently specified value. So for example if you've used an explicit primer (as in the cell above) and don't want to use it again, you should explicitly pass `'--generator.explicit_primer=None'` rather than omitting the setting.
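
To see how `temperature`, `top_k` and `sample` interact, here is a small standalone sketch of the general technique, written in plain Python. It is an approximation for illustration and not MinGPT's own code; the function name `sample_next` and the example scores are assumptions.

```python
import math
import random


def sample_next(logits, temperature=1.0, top_k=None, sample=True, rng=random):
    """Pick an index from raw scores using temperature scaling and optional top-k."""
    # higher temperature flattens the scores, lower temperature sharpens them
    scaled = [score / temperature for score in logits]

    # keep only the top_k highest-scoring candidates (all of them if top_k is None)
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    kept = ranked[:top_k] if top_k else ranked

    # softmax over the kept candidates (subtracting the max for numerical stability)
    peak = scaled[kept[0]]
    weights = [math.exp(scaled[i] - peak) for i in kept]
    total = sum(weights)
    probs = [w / total for w in weights]

    if not sample:
        return kept[0], probs       # greedy: the single most likely candidate
    return rng.choices(kept, weights=probs, k=1)[0], probs


logits = [2.0, 1.0, 0.1]
_, cool = sample_next(logits, temperature=0.5, sample=False)
_, hot = sample_next(logits, temperature=2.0, sample=False)
print(cool)
print(hot)
```

Comparing the two printed lists shows the effect of temperature: the lower setting concentrates probability on the top candidate, while the higher one spreads it more evenly.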


In [None]:
generated = generate([
    '--generator.count=300',
    '--generator.temperature=1.0',
    '--generator.sample=True',
    '--generator.top_k=10',
    '--generator.explicit_primer=None',
    '--generator.prime_offset=0',
    '--generator.prime_count=4',
    '--system.seed=9907'
    ])
generated.show('midi')

## Change the model & training settings

As with generation, the training process is controlled by a number of settings, which can be modified by passing arguments to the `train_model` function:

* `model.model_type`: which kind of model to train. All model types use the same underlying structures but vary in size and complexity. There are several types defined, some of them unfathomably huge. Larger models can learn more complex behaviour but are much slower to train and require much more training data to be useful. For our purposes only the following are worth considering, and even `mini` is almost certainly vastly over-specified:
    * `gpt-nano`: 3 layers, 3 heads, 48 embedding dimensions (this is the default)
    * `gpt-micro`: 4 layers, 4 heads, 128 embedding dimensions
    * `gpt-mini`: 6 layers, 6 heads, 192 embedding dimensions

  (Don't worry too much about what "layers", "heads" and "embedding dimensions" are, just think of them as indicators of **bigness**.)
* `data.source`: a corpus file to use as training data. This should be a whitespace-delimited text file in which the tokens are whatever is understood by `text_to_music` and `music_to_text`.
* `trainer.max_iters`: how many training iterations to run. Training for more iterations will take longer but will lead to improved learning (at least, up to a point).
* `trainer.learning_rate`: how much the model parameters are updated each iteration. Larger values might learn faster but may also fail to converge. You should probably keep this fairly small, say < 0.01.
* `trainer.weight_decay`: how much L2 regularisation to apply to the model parameters. Regularisation combats **overfitting** (a tendency to just memorise the training set, leading to poor generalisation) but can lead to **underfitting** (a failure to adequately capture the behaviour of the data).
* `trainer.progress_interval`: how often to print an update message when training. You probably don't care about this.
* `trainer.save_interval`: how often to save the model during training. This can be useful when training is slow, but you probably don't care much about this either.
* `system.seed`: Seed value for the random number generator. Again, you probably don't care about this.
* `system.resume`: whether to initialise the model with the previous trained state.
    
    **IMPORTANT:** when training a new model with a new configuration (ie, pretty much always), you should explicitly set this to `False`, as shown in the code cell below.

Once you've trained a model, you can then go back to generating from it to explore what has been learned.

In [None]:
train_model([
    '--model.model_type="gpt-nano"',
    '--data.source="data/corpus.txt"',
    '--trainer.learning_rate=5e-4',
    '--trainer.weight_decay=0.1',
    '--trainer.max_iters=10001',
    '--trainer.progress_interval=100',
    '--trainer.save_interval=200',
    '--system.seed=9907',
    '--system.resume=False'
    ])

# Discussion

While you're waiting for your model to train, you might want to think & talk about:
* What sort of results do you expect from this process?

Once you've actually been able to generate something:
* Were your expectations correct?
* Are the results in any sense "musical"?
* Are there fragments, or even whole sections, of tunes you recognise?
* Does the style seem consistent?
* Is there evidence of long range structure in the generated music?
* Are any features or qualities obviously missing?

# Further work

In all likelihood, tweaking the model settings and perhaps trying a different data corpus will be more than enough to keep you busy. But if you're feeling excessively motivated, you *could* try replacing the GPT implementation used here with something else entirely — perhaps the LSTM model mentioned above. Doing so will require quite a bit of coding, but the PyTorch framework used here does provide a useful [LSTM layer](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) implementation — a good starting point if you want to get your hands dirty.
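
As a possible starting point, the sketch below wraps PyTorch's `nn.LSTM` in a small next-token model. Everything about it is an assumption made for illustration: the class name `TinyLSTMModel`, the layer sizes, and the choice to return a `(logits, loss)` pair. It has not been wired into the training or generation functions in this notebook, which would need adapting to use it.

```python
import torch
from torch import nn


class TinyLSTMModel(nn.Module):
    """Embedding -> LSTM -> linear head, predicting the next token at each position."""

    def __init__(self, vocab_size, embed_dim=48, hidden_dim=96, num_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, idx, targets=None):
        # idx: (batch, time) tensor of integer token indices
        hidden, _ = self.lstm(self.embed(idx))
        logits = self.head(hidden)              # (batch, time, vocab_size)

        loss = None
        if targets is not None:
            # cross-entropy between each position's prediction and the shifted label
            loss = nn.functional.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        return logits, loss


model = TinyLSTMModel(vocab_size=10)
x = torch.randint(0, 10, (2, 5))    # two random sequences of five tokens
y = torch.randint(0, 10, (2, 5))    # stand-in labels, just to exercise the loss
logits, loss = model(x, y)
print(logits.shape, loss.item())
```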