## Results

This experiment produced a number of interesting results. Tuning took some time because some music does not mix well. Heavy metal, for instance, clashes with other genres, so it was hard to get continuity between tracks. Several artists that I found to produce undesirable results had to be removed from the dataset. It is no surprise that genres such as dance music combine much more easily. Most of those tracks are designed around that concept, which is why they mostly consist of simple, consistent beats.

**Note:** The following demonstrations were created from the entire dataset of 3,500 tracks, which is why they have much better results than the small-subset models shown in the code demonstration notebook and the Docker image. Processing the entire dataset took a long time and consumed a lot of space.

In [None]:
import IPython
import os
import librosa
import numpy as np
import matplotlib.pyplot as plt
from visualization.plotting import plot_chromagram, plot_mfcc, plot_tempo

After a lot of testing, the feature set combined with the K-NN model produced impressive results. The mixes from both the non-PCA and PCA models have good continuity when fading is not applied.
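
To make the comparison between the two model variants concrete, here is a minimal sketch of a nearest-neighbor lookup with and without a PCA projection. It is not the project's implementation: the feature matrix is random stand-in data, and `pca_project` and `nearest_neighbors` are hypothetical helper names written with plain NumPy.

```python
import numpy as np

def pca_project(features, n_components):
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def nearest_neighbors(features, query_index, k):
    """Return the indices of the k rows closest (Euclidean) to the query row."""
    distances = np.linalg.norm(features - features[query_index], axis=1)
    distances[query_index] = np.inf  # never return the query itself
    return np.argsort(distances)[:k]

# Hypothetical data: 6 segments described by 4 made-up feature values each.
rng = np.random.default_rng(0)
segments = rng.normal(size=(6, 4))
segments[3] = segments[0] + 0.01  # make segment 3 a near-duplicate of segment 0

raw_match = nearest_neighbors(segments, query_index=0, k=1)
pca_match = nearest_neighbors(pca_project(segments, 2), query_index=0, k=1)
print(raw_match, pca_match)
```

In both cases the query is a distance search over feature vectors; the PCA variant only changes the space in which the distances are measured.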

#### Non-Faded Non-PCA

In [None]:
# Non faded non-pca
sample_rate = 22050
cwd = os.getcwd()

In [None]:
# Build the path with os.path.join so it also resolves outside Windows
filename = os.path.join("generated_outputs", "knn_model", "pt1_jr-100bts-not-faded_mix2.wav")
filepath = os.path.join(cwd, filename)

IPython.display.Audio(filepath)

In [None]:
signal, sr = librosa.load(filepath)

plot_chromagram(signal, sr)

The transitions above are able to keep the same pitches with slightly different patterns.

In [None]:
# MFCC Plot
num_mfcc = 5
plot_mfcc(signal, sr, num_mfcc)

The MFCCs change to some degree at the jumps around the 15 second and 40 second marks. From listening to the sample, the shift in loudness is handled the least effectively of the three features.

In [None]:
plot_tempo(signal, sr)

The tempogram indicates that the transitions maintain a fairly consistent tempo, with a slight increase at the 13 second mark. The tempo decreases at the change around the 32 second mark and stays consistent for the rest of the duration.

The first sample above shows fairly good continuity across the features. Some jumps are rougher than others, but several transition really well. The next example is the PCA non-faded mix.

#### Non-Faded PCA

In [None]:
# Not faded pca
filename = os.path.join("generated_outputs", "knn_pca_model", "pca_pt1_jr-100bts-not-faded_mix2.wav")
filepath = os.path.join(cwd, filename)

IPython.display.Audio(filepath)

In [None]:
signal, sr = librosa.load(filepath)

plot_chromagram(signal, sr)

The F-G portions of the pitch stay consistent at the change around 10 seconds. The first track mixed in is a live track, which creates some noise in the spectrogram. The second segment sounds much less noisy. The transition at the 17 second mark maintains dominance in the G-C pitches. The shift at 25 seconds moves the dominance to D# and G#.

In [None]:
num_mfcc = 5
plot_mfcc(signal, sr, num_mfcc)

The coefficients change in intensity at the 10 and 15 second marks. The intensity then decreases at the 35 second mark. While the intensities change, the dominant coefficients are the same across all segments.

In [None]:
plot_tempo(signal, sr)

The tempo is fairly consistent and changes at the 36 second mark by a small amount. 

Many of the transitions are nearly seamless and have similar tempo, timbre, and pitch. Loudness appears to be where the mixes struggle the most, as demonstrated by the spectrogram. This is probably due to the wide variety of mastering that can be applied to audio tracks. It is also apparent that tracks from different time periods have very different mastering profiles. Future experiments should explore how to get better loudness continuity between tracks. If that proves difficult, something such as mastering could be applied as a postprocessing step when the tracks are combined. Adding another model to the ensemble, similar to what programs such as "LANDR" do, could be a potential avenue to solve this problem and add to the AI system. All in all, the examples above demonstrate the effectiveness of KNN for similarity search in audio signals.
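
As a rough illustration of the loudness postprocessing idea, the sketch below scales one segment so that its RMS level matches another's. This is an assumption about how such a step could look, not something the project does; `rms` and `match_rms` are hypothetical helpers, the sine waves are stand-ins for real segments, and RMS is only a crude proxy for perceived loudness.

```python
import numpy as np

def rms(signal):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(signal))))

def match_rms(segment, target_rms, eps=1e-12):
    """Scale `segment` so that its RMS level equals `target_rms`."""
    return segment * (target_rms / max(rms(segment), eps))

sr = 22050
t = np.arange(sr) / sr
quiet = 0.1 * np.sin(2 * np.pi * 220 * t)  # stand-in for a quietly mastered segment
loud = 0.8 * np.sin(2 * np.pi * 330 * t)   # stand-in for a loudly mastered segment

leveled = match_rms(quiet, rms(loud))
print(rms(quiet), rms(loud), rms(leveled))
```

A real mastering step would also have to deal with dynamics and frequency balance, which a single gain factor cannot fix.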

A few of the faded mixes had good transitions, such as the ones found in "pt1_jr-100bts-faded_mix3 - knn.wav" and "pca_pt1_jr-100bts-faded_mix3". Not all transitions came out seamless. It proved difficult to keep the overlaid songs from clashing with each other. Examples below:

#### Faded Non-PCA

In [None]:
# Faded non-pca
filename = os.path.join("generated_outputs", "knn_model", "pt1_jr-100bts-faded_mix3.wav")
filepath = os.path.join(os.getcwd(), filename)
IPython.display.Audio(filepath)

In [None]:
signal, sr = librosa.load(filepath)

plot_chromagram(signal, sr)

In [None]:
num_mfcc = 5
plot_mfcc(signal, sr, num_mfcc)

In [None]:
plot_tempo(signal, sr)

The faded mixes are harder to get sounding right. Making this work may require more advanced operations to master the overlaid section. The plots above show similarity in the chromagram and the tempogram. One interesting detail is that the mel spectrogram appears to have "spikes" of intensity at the points of transition. This is probably due to constructive interference: the two tracks add together when they are overlaid, which raises the intensity. This would need to be adjusted through a mastering process. As suggested previously, an AI system like LANDR could be added on top of this entire process.
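
The intensity spike can be reproduced on synthetic data. The sketch below overlays two identical, in-phase sine waves (the worst case for constructive interference) and compares summing them at full volume with a linear crossfade whose gains always sum to 1. The function names and signals are hypothetical, and the gain curve is an assumption rather than the fade used in this project.

```python
import numpy as np

def naive_overlay(a, b, n_overlap):
    """Sum the tail of `a` and the head of `b` at full volume."""
    overlap = a[-n_overlap:] + b[:n_overlap]
    return np.concatenate([a[:-n_overlap], overlap, b[n_overlap:]])

def linear_crossfade(a, b, n_overlap):
    """Fade `a` out and `b` in with gains that always sum to 1."""
    fade_in = np.linspace(0.0, 1.0, n_overlap)
    overlap = a[-n_overlap:] * (1.0 - fade_in) + b[:n_overlap] * fade_in
    return np.concatenate([a[:-n_overlap], overlap, b[n_overlap:]])

sr = 22050
t = np.arange(2 * sr) / sr
track_a = np.sin(2 * np.pi * 220 * t)
track_b = np.sin(2 * np.pi * 220 * t)  # same pitch and phase as track_a
n_overlap = sr // 2                    # half a second of overlap

naive_peak = np.max(np.abs(naive_overlay(track_a, track_b, n_overlap)))
faded_peak = np.max(np.abs(linear_crossfade(track_a, track_b, n_overlap)))
print(naive_peak, faded_peak)
```

With complementary gains the overlap can never be louder than the louder of the two inputs, although real tracks that are not in phase may instead dip in level during the fade.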

#### Faded PCA

In [None]:
# Faded pca
filename = os.path.join("generated_outputs", "knn_pca_model", "pca_pt1_jr-100bts-faded_mix3.wav")
filepath = os.path.join(os.getcwd(), filename)
IPython.display.Audio(filepath)

In [None]:
signal, sr = librosa.load(filepath)

plot_chromagram(signal, sr)

In [None]:
num_mfcc = 5
plot_mfcc(signal, sr, num_mfcc)

In [None]:
plot_tempo(signal, sr)

The resulting mix sounds pretty good for having the fade enabled. There is some discontinuity in tempo and loudness, as illustrated by the tempogram and the spectrogram. It is hard to tell whether PCA improved the performance of the model. Answering that would need a more structured set of tests than I have provided, likely involving more models and user feedback.

Fading introduces a new layer of complexity: the percussive beats and time signature need to line up so that the fade between songs does not clash. The way the search is made would need to be modified to make this work better. A possible solution would be to cut out intervals of beats that align with the time signature. For instance, you could cut out sets of 4 beats for 4/4 timing. Since there are 4 beats in a measure, they go together well in groups. The hard part would be combining songs with different timings, like 3/4, and training them all on the same feature set. Since the resulting slices would have differing numbers of frames, more processing would be needed so that every song contributes the same number of dimensions to the data matrix. This difficulty is why I avoided it for this project, since it would take a large amount of time to accomplish. However, with the time signature information and more intelligent beat slicing, aligning the percussive beats could possibly be done. There is also the fact that as the number of beats used for each datapoint increases, the number of dimensions increases with it, because there are more frames (assuming equal time intervals between frames). With that increase, the dimensionality reduction step becomes much more important, since KNN is sensitive to increases in dimensions.
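
The measure-slicing idea can be sketched as follows. This is a hypothetical helper, not code from the project: it assumes the beat positions are already available as sample indices (for example from `librosa.beat.beat_track` followed by `librosa.frames_to_samples`) and that the first detected beat is a downbeat, which a beat tracker does not guarantee.

```python
import numpy as np

def slice_measures(signal, beat_samples, beats_per_measure=4):
    """Cut `signal` into chunks that each span `beats_per_measure` beats."""
    beat_samples = np.asarray(beat_samples)
    # Every `beats_per_measure`-th beat marks a measure boundary.
    boundaries = beat_samples[::beats_per_measure]
    return [signal[start:end] for start, end in zip(boundaries[:-1], boundaries[1:])]

sr = 22050
signal = np.zeros(10 * sr)  # stand-in for 10 seconds of audio
# Hypothetical beat grid: 120 BPM means one beat every 0.5 seconds.
beat_samples = (np.arange(20) * 0.5 * sr).astype(int)

measures = slice_measures(signal, beat_samples, beats_per_measure=4)
print(len(measures), [len(m) for m in measures])
```

On this perfectly regular grid every measure has the same length. On real audio the lengths would vary with tempo, which is the frame-count mismatch described above, so each slice would still need to be resampled or padded before it could share a feature matrix with the others.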

Another piece that should have been considered is that the crossover should not just alter the volume; it should apply a filter to each track and adjust the filters over the course of the transition. For instance, the low frequencies of one song would be combined with the high frequencies of the other and then switched. This is normally how a human would do it in practice. Additionally, it is not desirable to combine measures that each have vocals. Vocals tend to clash and sound messy when overlaid. In general you want the instrumentals of one track combined with the vocals/instruments of another. Some preprocessing to remove vocals could be incorporated.
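
A minimal sketch of the filter-based crossover, assuming a simple brick-wall split in the frequency domain: the overlap keeps the low band of the outgoing track and the high band of the incoming one. The helper names, the 250 Hz cutoff, and the sine-wave "tracks" are all illustrative assumptions. A practical version would use smoother filters and move the gains over time.

```python
import numpy as np

def split_bands(signal, sr, cutoff_hz):
    """Split `signal` into (low, high) bands with a brick-wall FFT filter."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    low_spectrum = np.where(freqs <= cutoff_hz, spectrum, 0.0)
    low = np.fft.irfft(low_spectrum, n=len(signal))
    # The high band is whatever remains, so low + high reproduces the input.
    return low, signal - low

def bass_swap(outgoing, incoming, sr, cutoff_hz=250.0):
    """Overlap two equal-length segments: lows from `outgoing`, highs from `incoming`."""
    out_low, _ = split_bands(outgoing, sr, cutoff_hz)
    _, in_high = split_bands(incoming, sr, cutoff_hz)
    return out_low + in_high

sr = 22050
t = np.arange(sr) / sr
bass_a, lead_a = np.sin(2 * np.pi * 60 * t), 0.5 * np.sin(2 * np.pi * 1000 * t)
bass_b, lead_b = np.sin(2 * np.pi * 90 * t), 0.5 * np.sin(2 * np.pi * 1500 * t)

mixed = bass_swap(bass_a + lead_a, bass_b + lead_b, sr)
print(np.max(np.abs(mixed - (bass_a + lead_b))))
```

Here the mix contains the bass of the first signal and the lead of the second, so the two bass lines never play at the same time. Swapping the arguments gives the second half of the hand-over.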