# Purpose and Justification

When figuring out how to layer music for various purposes, artists and audio producers try to find similar songs that mix together to produce an aesthetically pleasing sound. This search consumes a large portion of music production time at every level, from DJing to an artist creating a full track. The motivation for this project is to create a tool that aids the selection of song segments that can be mixed together in an arrangement that keeps the mixed elements in harmony. With this tool, anyone from a hobbyist to a professional could input a track of interest and get suggestions of portions of other songs that would mix with different portions of the input song. By surfacing mixes that would have been hard to find without aid, such a tool could offer inspiration and a path towards creativity that enhances a music producer's workflow. Machines are able to process large amounts of signal data, and machine learning allows patterns to be matched between signals. With some preprocessing done on existing music, pieces of songs can be matched depending on specific attributes that are defined by music theory.

# Domain Expertise

The domain expertise required for this project spans the artistic creation of music and the science of signal processing. A machine learning approach will take the elements of signal processing and apply them to music theory rules. Some music nomenclature is summarized below as an overview:

**Measure** - a basic unit of musical time containing a fixed number of beats (set by the time signature), played at a specific tempo.

**Tempo** - the speed of a piece of music, typically measured in beats per minute (BPM).

**Time Signature** - indicates how many beats are in a measure and which note value counts as one beat.

**Scale** - a set of notes that are included in the piece. All available notes are in the set {A, A#, B, C, C#, D, D#, E, F, F#, G, G#}.

**Chord** - a combination of notes played together, typically drawn from the scale.

**Key** - defined by the root note of the scale, together with the scale type (such as major or minor). A key exists for each of the twelve notes for every scale.

There are many more terms, but these sum up the basics. Splitting each song into sections and labeling each section with its tempo, scale, and key will give a good basis for deciding which songs to match. More attributes will be added throughout the project's experimentation.
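As one possible way to label a section's key, the sketch below correlates the section's average chroma vector (energy per pitch class) against rotated Krumhansl-Kessler major and minor key profiles, a standard key-finding technique. The profile values, the function names, and the use of `librosa.feature.chroma_cqt` to obtain chroma are assumptions made for illustration, not a settled design.

```python
import numpy as np

# Pitch classes in the order librosa uses for chroma features (C first).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Krumhansl-Kessler key profiles, indexed by semitones above the tonic.
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR_PROFILE = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])


def estimate_key(chroma_mean):
    """Return (tonic, mode) whose rotated profile best correlates with a 12-bin chroma vector."""
    chroma_mean = np.asarray(chroma_mean, dtype=float)
    best = None
    for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
        for tonic in range(12):
            # Rotate the profile so its tonic weight lands on pitch class `tonic`.
            r = np.corrcoef(chroma_mean, np.roll(profile, tonic))[0, 1]
            if best is None or r > best[0]:
                best = (r, PITCH_CLASSES[tonic], mode)
    return best[1], best[2]


def segment_chroma_mean(y, sr):
    """Average chroma over a segment (requires librosa, imported lazily)."""
    import librosa

    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)  # shape (12, n_frames)
    return chroma.mean(axis=1)
```

Usage would be `estimate_key(segment_chroma_mean(y, sr))` on a loaded segment. Correlation-based key finding often confuses relative major and minor keys, since they share the same set of notes, so its labels would need spot-checking.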

There are notable rules that can be tested initially:

**1.** The tempos of songs that mix have to be integer multiples of each other (for example, 70 and 140 BPM). If one tempo does not divide the other cleanly, the beats of the tracks will drift out of phase.

**2.** Songs that mix should be in the same key or have their scales' root notes a perfect fourth or fifth apart (5 or 7 semitones). Keys related this way share all but one note of their scales, with a different root note between them. This helps make sure the notes of the matched segments blend together.
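The two rules above can be sketched as simple checks. This is a minimal sketch that assumes both songs are in a major key, interprets rule 2 as roots a perfect fourth or fifth apart (5 or 7 semitones), and adds a small, hypothetical tolerance `tol` to rule 1 because estimated BPM values are rarely exact integers.

```python
# The twelve notes, in the order listed in the Domain Expertise section.
NOTES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

# Semitone steps between the seven degrees of a major scale (W W H W W W).
MAJOR_STEPS = [2, 2, 1, 2, 2, 2]


def tempos_compatible(bpm_a, bpm_b, tol=0.01):
    """Rule 1: True if the faster tempo is within `tol` of an integer multiple of the slower one."""
    slow, fast = sorted((bpm_a, bpm_b))
    ratio = fast / slow
    return abs(ratio - round(ratio)) <= tol


def major_scale(root):
    """Return the set of seven notes in the major scale starting on `root`."""
    idx = NOTES.index(root)
    notes = [NOTES[idx]]
    for step in MAJOR_STEPS:
        idx = (idx + step) % 12
        notes.append(NOTES[idx])
    return set(notes)


def keys_compatible(root_a, root_b):
    """Rule 2: same root, or roots a perfect fourth (5) or fifth (7 semitones) apart."""
    interval = (NOTES.index(root_b) - NOTES.index(root_a)) % 12
    return interval in (0, 5, 7)
```

For example, `major_scale("C") & major_scale("G")` leaves six shared notes (only F and F# differ), which is why keys a fifth apart tend to blend.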

For now these two rules can lay the foundation for more complex ones that will be learned down the line.


# Overall Design

The design will draw on a large amount of data from various sources of music. Preprocessing will slice each individual track into segments and classify them by their defining attributes. These attributes can be fed into an unsupervised machine learning model to group song segments by their features. With the groupings identified, the model can predict which segments would mix together appropriately. Some sort of rating scale will have to be devised in order to give the model feedback on its predictions.

# Data

## Data Selection

Data will come from music files of various file types such as MP3, WAV, and FLAC. Lossless formats such as WAV and FLAC are preferred, since they preserve the full frequency content of the original mix instead of discarding some of it during compression as MP3 does. Another type of audio file, known as stems, breaks a track's makeup out into its respective components, such as guitar, bass, drums, and vocals. These files are essential for professional mixing because they allow each component to be isolated. However, they are harder to obtain, and it may not be possible to build a library large enough to use them effectively. It may still be possible to use parts of a stem as an input.


Selecting a library of music that mixes well in general is an aspect that will be kept in mind. Some music contains a lot of noise or song elements, like odd time signatures, that may cause trouble with training. These factors will guide the whittling down of the library so that the algorithm can easily distinguish the characteristics of similar tracks. Once a working prototype shows promise, songs that are "less mixable" can be introduced and experimented with.

## Data Ingest

A large library of music will need to be assembled in order to train the model on enough variety that different combinations can be attempted. The library can be built from music I already own and from other sources such as CDs or free downloads from SoundCloud. There are many options, but building a library large enough to train an AI algorithm is going to be a difficult challenge.
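Assembling the library from mixed sources will likely start with an inventory of what is on disk. The sketch below, using only the standard library, recursively collects files with the extensions named in the Data Selection section and orders lossless files first. The function names and the extension sets are assumptions for illustration.

```python
from pathlib import Path

# Extensions from the Data Selection section; lossless formats are preferred.
LOSSLESS_EXTS = {".wav", ".flac"}
LOSSY_EXTS = {".mp3"}


def scan_library(root):
    """Recursively collect audio files under `root`, lossless formats first."""
    audio_exts = LOSSLESS_EXTS | LOSSY_EXTS
    files = [p for p in Path(root).rglob("*") if p.is_file() and p.suffix.lower() in audio_exts]
    # Sort lossless files ahead of lossy ones, then by path for a stable order.
    return sorted(files, key=lambda p: (p.suffix.lower() not in LOSSLESS_EXTS, str(p)))


def summarize(files):
    """Count files per extension, e.g. to see how much of the library is lossless."""
    counts = {}
    for p in files:
        ext = p.suffix.lower()
        counts[ext] = counts.get(ext, 0) + 1
    return counts
```

A call such as `summarize(scan_library(Path.home() / "Music"))` would give a quick picture of how large the library is and how much of it is lossless.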

## Data Validation

Since the data comes straight from existing music, validating that the data is correct is not really an issue. Some validation may be required to gauge the quality of the input tracks. An important caveat, however, is that using a model in a production environment trained with music from various sources could carry copyright ramifications. Copyright issues are not expected to arise for this project, but they would need to be addressed if it were ever scaled to production.
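Gauging input quality could begin with a few cheap checks on the decoded signal. The sketch below assumes a mono float signal scaled to [-1, 1] (the form `librosa.load` returns) and flags a low sample rate, a very short duration, silence, and a rough clipping indicator. Every threshold is a hypothetical placeholder to be tuned on the real library.

```python
import numpy as np


def quality_report(y, sr, min_sr=44100, min_seconds=30.0,
                   clip_level=0.999, max_clip_fraction=0.001):
    """Flag basic quality problems in a mono float signal scaled to [-1, 1].

    All thresholds are placeholder values, not established standards.
    """
    y = np.asarray(y, dtype=float)
    duration = len(y) / sr
    peak = float(np.max(np.abs(y))) if len(y) else 0.0
    # Fraction of samples at (or very near) full scale, a rough clipping indicator.
    clip_fraction = float(np.mean(np.abs(y) >= clip_level)) if len(y) else 0.0

    issues = []
    if sr < min_sr:
        issues.append("low_sample_rate")
    if duration < min_seconds:
        issues.append("too_short")
    if peak == 0.0:
        issues.append("silent")
    if clip_fraction > max_clip_fraction:
        issues.append("clipping")
    return {"duration_s": duration, "peak": peak,
            "clip_fraction": clip_fraction, "issues": issues}
```

Note that `librosa.load` resamples to 22050 Hz by default, so the file would need to be loaded with `sr=None` for the sample-rate check to be meaningful. A high sample rate also does not prove a file was never transcoded from a lossy source.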

## Data Preprocess

One of the decisions to be made is how to slice up each track. Some of the defining characteristics mentioned in the Domain Expertise section, such as tempo and time signature, can be used with the waveform to detect the boundaries between measures. With each song sliced into N slices, the attributes can be labeled for each slice. Luckily, many libraries and tools are openly available to make this process easier. Librosa is a promising Python library that includes many signal processing tools for music. Since a large library will need to be analyzed, a distributed method of doing this in parallel will need to be explored. This may be done in the cloud, or external hardware could be purchased to aid with this process.
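One simple slicing strategy is to cut a track into whole measures: a measure lasts `beats_per_measure * 60 / BPM` seconds. The sketch below assumes a constant tempo, a fixed time signature (4/4 by default), and that the first detected beat is a downbeat, which librosa does not guarantee. The helper names are hypothetical; the librosa calls (`librosa.load`, `librosa.beat.beat_track`, `librosa.frames_to_time`) are real, and the import is deferred so the pure NumPy helpers run without it.

```python
import numpy as np


def measure_length_samples(bpm, beats_per_measure, sr):
    """Samples in one measure: beats * (60 / BPM) seconds, converted at rate `sr`."""
    return int(round(beats_per_measure * 60.0 / bpm * sr))


def slice_into_measures(y, sr, bpm, beats_per_measure=4, offset_s=0.0):
    """Cut a signal into consecutive whole measures, starting at `offset_s` seconds.

    Assumes a constant tempo and time signature; a partial measure at the end is dropped.
    """
    step = measure_length_samples(bpm, beats_per_measure, sr)
    start = int(round(offset_s * sr))
    return [y[s:s + step] for s in range(start, len(y) - step + 1, step)]


def estimate_tempo_and_first_beat(path):
    """Load a track and estimate (signal, sr, bpm, first_beat_s) with librosa."""
    import librosa  # imported lazily so the helpers above work without librosa

    y, sr = librosa.load(path, sr=None, mono=True)  # sr=None keeps the native rate
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    # Depending on the librosa version, tempo may be a scalar or a 1-element array.
    bpm = float(np.atleast_1d(tempo)[0])
    first_beat = float(beat_times[0]) if len(beat_times) else 0.0
    return y, sr, bpm, first_beat
```

Usage might look like `y, sr, bpm, t0 = estimate_tempo_and_first_beat("song.flac")` followed by `slice_into_measures(y, sr, bpm, offset_s=t0)`. Because each track is processed independently, a per-track function like this could be mapped over the library with `concurrent.futures.ProcessPoolExecutor` as a first step toward parallel processing.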

A file system and attribute storage scheme will have to be decided on. The sliced song segments will need to be stored as their own files so they can be recombined with other slices later on. A naming convention as simple as "trackid_slice#" for each slice, stored in a generated folder named "trackid", might suffice for the slicing scheme. With the tracks sliced and their attributes labeled, a database will have to keep track of which attributes go with each song slice. If a simple JSON file doesn't work, I will build a basic PostgreSQL database to meet these requirements.
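The proposed naming convention and JSON index could be sketched with the standard library as below. The `.wav` extension, the attribute names (`bpm`, `key`), and the nested track-then-slice layout are assumptions for illustration; writing the slice audio itself could use a library such as `soundfile`.

```python
import json
from pathlib import Path


def slice_path(root, track_id, slice_index, ext=".wav"):
    """Build '<root>/<trackid>/<trackid>_slice<N><ext>' following the proposed naming scheme."""
    return Path(root) / track_id / f"{track_id}_slice{slice_index}{ext}"


def add_slice_record(index, track_id, slice_index, path, attributes):
    """Record a slice's file path and attributes in a nested dict keyed by track, then slice."""
    # JSON object keys must be strings, so the slice number is stored as one.
    index.setdefault(track_id, {})[str(slice_index)] = {"path": str(path), **attributes}


def save_index(index, json_path):
    """Write the whole index to a single JSON file."""
    Path(json_path).write_text(json.dumps(index, indent=2, sort_keys=True))


def load_index(json_path):
    """Read the index back into a dict."""
    return json.loads(Path(json_path).read_text())
```

If the index outgrows a single JSON file, the same track/slice/attribute layout maps naturally onto a table in a relational database such as PostgreSQL.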


# Model

## Model Selection

With the data preprocessed, different models will need to be experimented with. Clustering models such as k-means, possibly combined with dimensionality reduction such as PCA, can be tested and validated on the mined attributes. Autoencoders have shown promise in decomposing music for music generators, and they may be adapted to the case of "mashing up" different song segments instead of full-blown generation. Lastly, algorithms such as k-nearest neighbors, naive Bayes, and deep learning methods have been used in similar applications. More research into these methods and some light experimentation will narrow down which method to pursue in the final project.
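Since PCA reduces dimensionality rather than clustering, a plausible first pipeline is standardize, optionally reduce with PCA, then cluster. The sketch below implements Lloyd's k-means directly in NumPy to keep it dependency-free; scikit-learn's `KMeans` and `PCA` would be the usual choice in practice. The feature matrix is assumed to hold one row per segment with hypothetical numeric columns (tempo, spectral statistics, and so on). Pitch classes wrap around, so encoding key as a single integer column would mislead a Euclidean method.

```python
import numpy as np


def standardize(X):
    """Scale each feature column to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant columns unscaled
    return (X - X.mean(axis=0)) / sigma


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means: returns (labels, centroids) for an (n_samples, n_features) array."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct random samples.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

Segments that land in the same cluster become candidate pairings, which a rule check or a user rating could then confirm or reject.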


## Model Training

The model(s) will have to be trained on a large amount of signal data from the music in the library. The process will involve experimenting with several sub-libraries to see which sets of music produce the best output. The training workload will have to be split up intelligently to maximize the amount of experimentation possible in the time I have.
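One way to split the experimentation workload is to enumerate every combination of sub-library and model setting up front, then deal the combinations out to workers. The sketch below does a round-robin split so each process or machine runs a disjoint, near-equal share. The sub-library names, model names, and the cluster-count parameter `k` are hypothetical; a real grid would likely vary parameters per model type.

```python
from itertools import product


def experiment_grid(sub_libraries, model_types, cluster_counts):
    """Every combination of sub-library, model type, and cluster count to try."""
    return [
        {"library": lib, "model": model, "k": k}
        for lib, model, k in product(sub_libraries, model_types, cluster_counts)
    ]


def shard(experiments, n_workers, worker_id):
    """Round-robin split: worker `worker_id` (0-based) gets every n_workers-th experiment."""
    return experiments[worker_id::n_workers]
```

Each worker would run `shard(grid, n_workers, its_own_id)`, so no coordination is needed beyond agreeing on the grid and the worker count.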

## Model Tuning

The model will be tuned by providing it the same inputs and comparing its outputs across configurations. Different sets of songs can also be fed to the model to change its output. With various training data and a few model types, there will be many combinations to explore in order to find the one that creates the best output.

## Model Validation

The quality of the output will have to be gauged, and a system to rank the output will have to be devised. The ranking may end up being subjective and inevitably biased toward my taste, since how good a particular mix sounds is a matter of opinion. If this idea were seriously pursued for a production environment, the rankings could be crowdsourced from users through a survey or another feedback mechanism.
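A rating system could start as simply as collecting scores for suggested segment pairs and ranking pairs by a damped average, which pulls pairs with few ratings toward a prior mean so that a single enthusiastic rating does not outrank a consistently good pairing. This is a minimal sketch that assumes a 1 to 5 rating scale; the prior mean and weight are hypothetical tuning values.

```python
from collections import defaultdict


def rank_pairs(ratings, prior_mean=3.0, prior_weight=2.0):
    """Rank segment pairs by a damped mean rating.

    `ratings` is an iterable of (segment_a, segment_b, score), with score on a 1-5 scale.
    Returns a list of (pair, damped_score, n_ratings), best first.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for a, b, score in ratings:
        pair = tuple(sorted((a, b)))  # (a, b) and (b, a) are the same pairing
        totals[pair][0] += score
        totals[pair][1] += 1

    ranked = []
    for pair, (total, count) in totals.items():
        # Blend the observed ratings with `prior_weight` virtual ratings of `prior_mean`.
        damped = (total + prior_mean * prior_weight) / (count + prior_weight)
        ranked.append((pair, damped, count))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

The same structure works whether the ratings come from one person during development or are crowdsourced from many users later.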

## Model Inference

The model will be deployed as a microservice in a Docker container. I will need to determine the size requirements of the container and what compromises are needed to make the app run smoothly.

## Model Interface

The model will be accessed through a simple app that allows the user to specify the folder where the training music is located. A user will be able to input a song and get back a list of segments from other songs that match different pieces of it. If time permits, I may even try to assemble the mix automatically so it can be played.
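Returning "a list of matching segments" could be framed as a k-nearest-neighbors search over segment feature vectors. The sketch below assumes each segment is already described by a numeric feature row, standardizes queries and library with the library's statistics, and returns the closest library segments for each query segment. The function name and the segment ids are hypothetical, and closeness in feature space is only a proxy for mixability; results could be filtered further with the tempo and key rules.

```python
import numpy as np


def match_segments(query, library, ids, k=3):
    """For each query row, return the k nearest library segments as (id, distance) pairs."""
    library = np.asarray(library, dtype=float)
    query = np.atleast_2d(np.asarray(query, dtype=float))
    # Standardize with the library's statistics so no single feature dominates.
    mu = library.mean(axis=0)
    sigma = library.std(axis=0)
    sigma[sigma == 0] = 1.0
    lib_z = (library - mu) / sigma
    q_z = (query - mu) / sigma
    # Pairwise Euclidean distances, shape (n_query, n_library).
    dists = np.linalg.norm(q_z[:, None, :] - lib_z[None, :, :], axis=2)
    results = []
    for row in dists:
        nearest = np.argsort(row)[:k]
        results.append([(ids[j], float(row[j])) for j in nearest])
    return results
```

Each query row would correspond to one slice of the input song, so the output lines up with the "trackid_slice#" segments described in the Data Preprocess section.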

## Model Monitoring

For my application, there is no new influx of music to the model. That could change in a production setting where new music is added. In that case, I would have to decide how to handle drift and what the consequences would be of not recalibrating the model as new music arrives. My guess, similar to what I described in the Data section, is that the characteristics of the input tracks can have a dramatic effect on the model's ability to train and perform. In a production setting, it might be worth monitoring the types of music that cause the model to perform worse and keeping them out of the training pool.
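A lightweight starting point for monitoring is to compare incoming segments against the training distribution and flag those that fall far outside it. The sketch below uses per-feature z-scores against the training data's mean and standard deviation; the 3.0 threshold is a common rule of thumb used here as a hypothetical default. It detects shifts in the inputs, not drops in output quality, so it would complement rather than replace user ratings.

```python
import numpy as np


def drift_flags(train_features, new_features, z_threshold=3.0):
    """Flag new segments whose features fall far outside the training distribution.

    Returns a boolean array with one entry per row of `new_features`.
    """
    train = np.asarray(train_features, dtype=float)
    new = np.atleast_2d(np.asarray(new_features, dtype=float))
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid dividing by zero for constant features
    z = np.abs((new - mu) / sigma)
    # A segment is flagged if any single feature is an extreme outlier.
    return z.max(axis=1) > z_threshold
```

Tracking the fraction of flagged segments per batch of new music (`drift_flags(...).mean()`) would show whether the incoming library is moving away from what the model was trained on.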