
---

# **Analysis Phase**
---

The analysis phase is the first step in the Soundlight processing chain. It is in charge of loading the song, extracting all relevant artifacts and saving the results.

It is composed of the following tasks:

## 0. Audio Input 
In this task, the audio is loaded into the program. The file metadata is read and, if needed, the audio is converted to the appropriate format.
The audio levels may also be normalized, along with any other preprocessing steps that are necessary.

This task employs [ffmpeg][11] to convert audio between codecs and [tinytag][12] to read the file's metadata.

[ffmpeg][11] needs to be installed on the host system. If running on Windows, use `winget install ffmpeg`.
[tinytag][12] is installed with `pip install tinytag`.

This task is performed on audio load by the FileManager, and so it might not be considered strictly a part of analysis.
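
The level normalization mentioned above could take several forms. As a minimal sketch, assuming simple peak normalization is acceptable, the function below scales a signal so that its largest absolute sample reaches a target value. The name `peak_normalize` and the `target_peak` default are hypothetical and not part of the FileManager.

```python
import numpy as np


def peak_normalize(samples: np.ndarray, target_peak: float = 0.95) -> np.ndarray:
    """Scale samples so the largest absolute value equals target_peak (hypothetical helper)."""
    peak = np.max(np.abs(samples))
    if peak == 0:
        # A silent signal cannot be scaled, so return it unchanged
        return samples.copy()
    return samples * (target_peak / peak)


# Synthetic quiet 440 Hz tone standing in for a loaded song
sr = 22050
t = np.arange(sr) / sr
quiet_tone = 0.1 * np.sin(2 * np.pi * 440 * t)

normalized = peak_normalize(quiet_tone)
print(np.max(np.abs(quiet_tone)), np.max(np.abs(normalized)))
```

Peak normalization keeps the relative dynamics of the song intact; a loudness-based approach would be a different design choice.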

## 1. Beat and Key recognition  
In this task, a basic analysis of key, BPM and beat offset is performed, which forms the base of the song profile.

The key analysis is performed using `librosa`'s built-in methods.


## 2. Phrase Analysis  
In this task, the musical phrases of the song are extracted from the source and recorded. This improves the coherence of the generation, as it provides valuable structural data for later steps.

This task employs the methods provided by [All-in-One][31] to perform its analysis.
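
To illustrate how later steps might use this structural data, the sketch below stores phrases as `(start, end, label)` records and looks up the phrase that is active at a given time. The `Phrase` class, the `phrase_at` helper and the example labels are assumptions made for illustration; they do not reflect the exact output format of All-in-One.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    """Hypothetical phrase record: times in seconds plus a structural label."""
    start: float
    end: float
    label: str


def phrase_at(phrases, time_s):
    """Return the phrase containing time_s, or None. Expects phrases sorted by start."""
    starts = [p.start for p in phrases]
    index = bisect_right(starts, time_s) - 1
    if index >= 0 and time_s < phrases[index].end:
        return phrases[index]
    return None


# Made-up phrase boundaries for a short example song
phrases = [
    Phrase(0.0, 15.5, "intro"),
    Phrase(15.5, 46.5, "verse"),
    Phrase(46.5, 77.5, "chorus"),
]
current = phrase_at(phrases, 20.0)
print(current)
```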


## 3. Stemming (Source Separation)  
In this task, the source song is separated into stems (vocals, melody, drums and other). This is important because it allows the different parts of the song to be treated separately, enabling advanced generation that would otherwise be impossible. This task also decides whether each stem contains valuable information; stems that do not are discarded.
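
One possible way to decide whether a stem carries useful information is to measure its overall energy. The sketch below is only an assumption about how such a check could look: it computes the RMS level of each stem in dB relative to full scale and keeps the stems above a threshold. The `-45.0` dBFS cutoff and the function names are hypothetical.

```python
import numpy as np


def rms_dbfs(samples: np.ndarray) -> float:
    """RMS level in dB relative to full scale (amplitude 1.0)."""
    rms = np.sqrt(np.mean(np.square(samples)))
    # Guard against log10(0) for completely silent stems
    return 20.0 * np.log10(max(rms, 1e-10))


def stem_is_useful(samples: np.ndarray, min_dbfs: float = -45.0) -> bool:
    """Hypothetical rule: keep a stem only if its RMS level reaches min_dbfs."""
    return rms_dbfs(samples) >= min_dbfs


# Synthetic stems: a clearly audible one and a nearly silent one
sr = 22050
t = np.arange(sr) / sr
stems = {
    "drums": 0.5 * np.sin(2 * np.pi * 100 * t),
    "vocals": 1e-4 * np.sin(2 * np.pi * 440 * t),
}
kept = {name: s for name, s in stems.items() if stem_is_useful(s)}
print(list(kept))
```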


## 4. Stem Pulse and Frequency mapping  
In this task, stems are analyzed separately for sound pulses. From those, frequency and volume are extracted, and recorded individually for further use.
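
As a rough illustration of extracting frequency and volume from a single pulse, the sketch below takes a short segment, finds the strongest FFT bin and computes the RMS amplitude. The segment length, the Hann window and both helper names are assumptions chosen for the example.

```python
import numpy as np


def dominant_frequency(segment: np.ndarray, sr: int) -> float:
    """Frequency (Hz) of the strongest bin in the windowed magnitude spectrum."""
    window = np.hanning(len(segment))
    spectrum = np.abs(np.fft.rfft(segment * window))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
    return float(freqs[np.argmax(spectrum)])


def rms_volume(segment: np.ndarray) -> float:
    """Root-mean-square amplitude of the segment."""
    return float(np.sqrt(np.mean(np.square(segment))))


# Synthetic 220 Hz pulse standing in for one detected sound pulse
sr = 22050
t = np.arange(2048) / sr
pulse = 0.8 * np.sin(2 * np.pi * 220 * t)

freq = dominant_frequency(pulse, sr)
vol = rms_volume(pulse)
print(freq, vol)
```

The frequency estimate is limited to the FFT bin spacing (`sr / len(segment)`), so longer segments give finer resolution at the cost of time precision.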


## 5. Profile Construction  
In this task, the results of all preceding tasks are incorporated into the song profile, which will be fed to the next step's agents. This is important because saving the results in a reusable format avoids having to reanalyze songs.
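
A reusable profile format could be as simple as a JSON document. The sketch below assumes a flat profile with a handful of fields; the `SongProfile` class and every field name are hypothetical and only show the save and reload round trip.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SongProfile:
    """Hypothetical container for the analysis results of one song."""
    title: str
    key: str
    bpm: float
    beat_offset: float  # seconds until the first beat
    phrases: list = field(default_factory=list)
    stems: dict = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "SongProfile":
        return cls(**json.loads(text))


# Made-up values for illustration
profile = SongProfile(
    title="Example Song",
    key="A",
    bpm=124.0,
    beat_offset=0.12,
    phrases=[{"start": 0.0, "end": 15.5, "label": "intro"}],
)
restored = SongProfile.from_json(profile.to_json())
print(restored == profile)
```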


## 6. Database Registration  
In this task, the profile is optionally registered in a database for later library management.
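
As a sketch of what registration could look like, the example below stores a profile in SQLite using Python's built-in `sqlite3` module. The choice of SQLite, the table layout and the `register_profile` helper are all assumptions; the actual database is not specified here.

```python
import sqlite3


def register_profile(conn, title, musical_key, bpm, profile_json):
    """Insert or update one profile row (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS profiles (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL UNIQUE,
               musical_key TEXT,
               bpm REAL,
               profile_json TEXT NOT NULL
           )"""
    )
    # Registering the same title again replaces the previous row
    cur = conn.execute(
        "INSERT OR REPLACE INTO profiles (title, musical_key, bpm, profile_json) "
        "VALUES (?, ?, ?, ?)",
        (title, musical_key, bpm, profile_json),
    )
    conn.commit()
    return cur.lastrowid


# An in-memory database keeps the example self-contained
conn = sqlite3.connect(":memory:")
register_profile(conn, "Example Song", "A", 124.0, "{}")
register_profile(conn, "Example Song", "A", 125.0, "{}")
rows = conn.execute("SELECT title, bpm FROM profiles").fetchall()
print(rows)
```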

---

<!--- Link references --->
[11]: https://ffmpeg.org/
[12]: https://github.com/tinytag/tinytag (GitHub: tinytag)
[31]: https://github.com/mir-aidj/all-in-one (GitHub: All-in-One)


## 0. Audio Input
The first task in analysis is the audio input, which starts by loading the song. It must be loaded in binary mode, as the file is audio.

In [None]:
file = open(path, "rb")

If we need to convert the file, we use `ffmpeg` like this:

In [None]:
%pip install ffmpeg-python

In [None]:
import ffmpeg

# From .mp3 to .wav, just change the extension. ffmpeg takes care of the rest
ffmpeg.input(r"U:\TFG\PoC\resources\CamelPhat, Yannis, Foals - Hypercolour.mp3").output(r"U:\TFG\PoC\resources\CamelPhat, Yannis, Foals - Hypercolour.wav").run()
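
Since `ffmpeg-python` only wraps the `ffmpeg` executable, the same conversion can also be sketched with the standard library. The example below checks that `ffmpeg` is on the `PATH` before running it; the helper names `build_ffmpeg_command` and `convert` are hypothetical, and only the basic `-y` and `-i` options are used.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst_extension=".wav"):
    """Build the ffmpeg argument list for a conversion (hypothetical helper)."""
    src = Path(src)
    dst = src.with_suffix(dst_extension)
    # -y overwrites an existing output file, -i marks the input file
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert(src, dst_extension=".wav"):
    """Run the conversion, failing early if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    cmd = build_ffmpeg_command(src, dst_extension)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])


# Only the command is built here; convert() would actually run ffmpeg
command = build_ffmpeg_command("song.mp3")
print(command)
```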


## 1. Key and BPM Detection

The key detection is done with help from `librosa`, which is one of the most used libraries when it comes to audio processing in Python.
The code to detect the key is as follows:

In [None]:
%pip install librosa
%pip install numpy

In [None]:
import librosa
import numpy as np

# Load the audio file
audio_file_path = r'U:\TFG\PoC\resources\CamelPhat, Yannis, Foals - Hypercolour.wav'
y, sr = librosa.load(audio_file_path)

# Compute the Chroma Short-Time Fourier Transform (chroma_stft)
chromagram = librosa.feature.chroma_stft(y=y, sr=sr)

# Calculate the mean chroma feature across time
mean_chroma = np.mean(chromagram, axis=1)

# Define the mapping of chroma features to keys
chroma_to_key = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

# Find the key by selecting the maximum chroma feature
estimated_key_index = np.argmax(mean_chroma)
estimated_key = chroma_to_key[estimated_key_index]

# Print the detected key
print("Detected Key:", estimated_key)
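
Picking the maximum chroma bin yields the most prominent pitch class, which does not distinguish major from minor keys. One common alternative, shown here only as a sketch and not as the method used in this project, is to correlate the mean chroma vector with the Krumhansl-Kessler key profiles rotated to each of the 12 tonics. The `estimate_key` helper is hypothetical, and the synthetic input stands in for the `mean_chroma` vector computed above.

```python
import numpy as np

PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

# Krumhansl-Kessler key profiles for C major and C minor
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                          2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR_PROFILE = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                          2.54, 4.75, 3.98, 2.69, 3.34, 3.17])


def estimate_key(mean_chroma: np.ndarray) -> str:
    """Return the best-correlated key name, e.g. 'A minor' (hypothetical helper)."""
    best_score, best_key = -np.inf, None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            # Rotate the template so its tonic lines up with this pitch class
            score = np.corrcoef(mean_chroma, np.roll(profile, tonic))[0, 1]
            if score > best_score:
                best_score, best_key = score, f"{PITCH_CLASSES[tonic]} {mode}"
    return best_key


# Synthetic chroma vector shaped like the minor profile with tonic A
synthetic_chroma = np.roll(MINOR_PROFILE, 9)
estimated = estimate_key(synthetic_chroma)
print("Detected Key:", estimated)
```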

Now, we focus on the BPM analysis:

In [None]:
%pip install matplotlib

In [92]:
import librosa
import scipy.signal  # explicit submodule import, needed for scipy.signal.butter
import numpy as np
from IPython.display import Audio
import matplotlib.pyplot as plt

y, sr = librosa.load(r"U:\TFG\PoC\resources\CamelPhat, Yannis, Foals - Hypercolour.wav")

def butter_bandpass(lowcut, highcut, fs, order=5):
    return scipy.signal.butter(order, [lowcut, highcut], fs=fs, btype='band')

def butter_bandpass_filter(data, lowcut, highcut, fs, order=5):
    b, a = butter_bandpass(lowcut, highcut, fs, order=order)
    y = scipy.signal.lfilter(b, a, data)
    return y

#Filtering percussive components for a more accurate beat track
_, D_percussive = librosa.decompose.hpss(librosa.stft(y))
save_y_percussive = librosa.istft(D_percussive, length=len(y))
print(type(save_y_percussive))

<class 'numpy.ndarray'>


In [93]:
y_percussive = np.copy(save_y_percussive)

In [94]:
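# Trimming segments that are too quiet, as they do not provide valuable information.
# Caution: np.max(y_percussive) is a linear sample amplitude, not a dB value,
# whereas librosa reads `top_db` as decibels below the peak.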
print(f'Max dB: {np.max(y_percussive)}')
threshold = (np.max(y_percussive)/6)*5
print(f'Trimming to {threshold}')
y_percussive, _ = librosa.effects.trim(y_percussive, top_db=threshold)

# Band-pass filter, 25-970 Hz (values found by trial and error):
y_lp = butter_bandpass_filter(y, 25, 970, sr)
y_percussive_lp = butter_bandpass_filter(y_percussive, 25, 970, sr)

#fig, ax = plt.subplots(nrows=1, sharex=True)
#librosa.display.waveshow(y_lp, sr=sr, ax=ax)
Audio(data=y_percussive_lp, rate=sr)

Max dB: 0.8711955547332764
Trimming to 0.7259962558746338
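
`librosa.effects.trim` interprets `top_db` as a threshold in decibels below the reference level, so a threshold expressed as a fraction of the peak amplitude needs converting first. The sketch below shows that conversion with the standard amplitude formula; the helper name and the one-sixth fraction are assumptions chosen for illustration.

```python
import math


def amplitude_fraction_to_db(fraction: float) -> float:
    """Convert a fraction of the peak amplitude into decibels below the peak."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return -20.0 * math.log10(fraction)


# Treat anything quieter than one sixth of the peak as silence (assumed value)
top_db = amplitude_fraction_to_db(1 / 6)
print(f"top_db = {top_db:.2f}")
```

The resulting value could then be passed as `top_db` to `librosa.effects.trim`.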


In [None]:

tempo_dynamic = librosa.feature.tempo(y=y_percussive_lp, sr=sr, aggregate=None, std_bpm=4)
tempo, beats_static = librosa.beat.beat_track(y=y_percussive_lp ,sr=sr, units='time')
tempo, beats_dynamic = librosa.beat.beat_track(y=y_percussive_lp, sr=sr, units='time',bpm=tempo_dynamic)

click_track_static = librosa.clicks(times=beats_static, sr=sr, click_freq=440,click_duration=0.20, length=len(y))
click_track_dynamic = librosa.clicks(times=beats_dynamic, sr=sr, click_freq=440,click_duration=0.20, length=len(y))

Audio(data=y+click_track_static, rate=sr)

#Audio(data=y+click_track_dynamic, rate=sr)
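
Because `beat_track` is called with `units='time'`, the beats come back as timestamps in seconds. As a minimal sketch, those timestamps can be reduced to the BPM and beat offset that the song profile needs. The `summarize_beats` helper, and the definition of the offset as the first beat folded into one beat period, are assumptions; the synthetic beat grid stands in for `beats_static`.

```python
import numpy as np


def summarize_beats(beat_times):
    """Return (bpm, offset_seconds) from beat timestamps (hypothetical helper)."""
    beat_times = np.asarray(beat_times, dtype=float)
    if len(beat_times) < 2:
        raise ValueError("need at least two beats")
    intervals = np.diff(beat_times)
    # The median is robust to an occasional missed or doubled beat
    period = float(np.median(intervals))
    bpm = 60.0 / period
    # Offset: position of the first beat, folded into one beat period
    offset = float(beat_times[0] % period)
    return bpm, offset


# Synthetic beat grid: 120 BPM with the first beat at 0.25 s
beat_times = 0.25 + 0.5 * np.arange(16)
bpm, offset = summarize_beats(beat_times)
print(bpm, offset)
```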


In [None]:
onset_env = librosa.onset.onset_strength(y=y, sr=sr)
pulse = librosa.beat.plp(onset_envelope=onset_env, sr=sr)

# Or compute pulse with an alternate prior, like log-normal
import scipy.stats
prior = scipy.stats.lognorm(loc=np.log(120), scale=120, s=1)

pulse_lognorm = librosa.beat.plp(onset_envelope=onset_env, sr=sr,
                                 prior=prior)

melspec = librosa.feature.melspectrogram(y=y, sr=sr)

import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=3, sharex=True)
librosa.display.specshow(librosa.power_to_db(melspec,
                                             ref=np.max),
                         x_axis='time', y_axis='mel', ax=ax[0])
ax[0].set(title='Mel spectrogram')
ax[0].label_outer()
ax[1].plot(librosa.times_like(onset_env),
         librosa.util.normalize(onset_env),
         label='Onset strength')
ax[1].plot(librosa.times_like(pulse),
         librosa.util.normalize(pulse),
         label='Predominant local pulse (PLP)')
ax[1].set(title='Uniform tempo prior [30, 300]')
ax[1].label_outer()
ax[2].plot(librosa.times_like(onset_env),
         librosa.util.normalize(onset_env),
         label='Onset strength')
ax[2].plot(librosa.times_like(pulse_lognorm),
         librosa.util.normalize(pulse_lognorm),
         label='Predominant local pulse (PLP)')
ax[2].set(title='Log-normal tempo prior, mean=120', xlim=[5, 20])
ax[2].legend()

tempo, beats = librosa.beat.beat_track(onset_envelope=onset_env)
beats_plp = np.flatnonzero(librosa.util.localmax(pulse))
fig, ax = plt.subplots(nrows=2, sharex=True, sharey=True)
times = librosa.times_like(onset_env, sr=sr)
ax[0].plot(times, librosa.util.normalize(onset_env),
         label='Onset strength')
ax[0].vlines(times[beats], 0, 1, alpha=0.5, color='r',
           linestyle='--', label='Beats')
ax[0].legend()
ax[0].set(title='librosa.beat.beat_track')
ax[0].label_outer()
# Time axis for the PLP curve (the plot is limited to a 15-second window via xlim below)
times = librosa.times_like(pulse, sr=sr)
ax[1].plot(times, librosa.util.normalize(pulse),
         label='PLP')
ax[1].vlines(times[beats_plp], 0, 1, alpha=0.5, color='r',
           linestyle='--', label='PLP Beats')
ax[1].legend()
ax[1].set(title='librosa.beat.plp', xlim=[5, 20])
ax[1].xaxis.set_major_formatter(librosa.display.TimeFormatter())

## 2. Phrase Analysis

To perform the phrase analysis we employ `all-in-one`. Its installation is as follows:
1. Install `pytorch`: `pip3 install torch torchvision torchaudio`
2. Install dependencies: `pip install ninja pywin32 cython`
3. Install `make` (if running on Windows) from [gnuwin32](https://sourceforge.net/projects/gnuwin32/files/make/3.81/make-3.81.exe/download?use_mirror=deac-riga&download=). 