
---

# **Analysis Phase**
---

The analysis phase is the first step in the Soundlight processing chain. It is in charge of loading the song, extracting all relevant artifacts and saving the results.

It is composed of the following tasks:

## 1. Audio Input 
In this task, the audio is loaded into the program. The file's metadata is read and, if needed, the audio is converted to the appropriate format.
Audio levels may also be normalized, and other preprocessing steps are applied if necessary.

This task employs [ffmpeg][11] to convert audio between codecs and [tinytag][12] to read the file's metadata.

[ffmpeg][11] needs to be installed on the host system. On Windows, use `winget install ffmpeg`.
[tinytag][12] is installed via `pip` with `pip install tinytag`.
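
As a rough illustration of the conversion step, the sketch below builds an `ffmpeg` command line and runs it only if the binary is found on `PATH`. The function names `build_convert_command` and `convert` are hypothetical, and using the `loudnorm` filter for level normalization is an assumption, not necessarily what Soundlight does.

```python
import shutil
import subprocess


def build_convert_command(src, dst, normalize=False):
    """Return the ffmpeg argument list that converts src to dst.

    ffmpeg infers the output format from the extension of dst.
    """
    cmd = ["ffmpeg", "-y", "-i", src]  # -y: overwrite dst if it already exists
    if normalize:
        # Assumption: loudness normalization via ffmpeg's loudnorm audio filter
        cmd += ["-af", "loudnorm"]
    cmd.append(dst)
    return cmd


def convert(src, dst, normalize=False):
    """Run the conversion; raise if ffmpeg is not installed on the host."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_convert_command(src, dst, normalize), check=True)
```

Keeping the command construction separate from its execution makes it possible to inspect or log the exact call before any file is touched.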

## 2. Beat and Key recognition  
In this task, a basic analysis of key, BPM and beat offset is performed, which forms the base of the song profile.
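
To make the relation between beats, BPM and beat offset concrete, here is a minimal sketch. It assumes a beat tracker has already produced a list of beat timestamps in seconds; the function name `tempo_from_beats` and the definition of the offset as the position of the first beat within one beat period are illustrative choices.

```python
from statistics import median


def tempo_from_beats(beat_times):
    """Estimate BPM and beat offset from beat timestamps given in seconds."""
    if len(beat_times) < 2:
        raise ValueError("at least two beats are needed")
    # The median interval is robust against an occasional missed or extra beat
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    period = median(intervals)
    bpm = 60.0 / period
    # Offset: position of the first beat within one beat period
    offset = beat_times[0] % period
    return bpm, offset


# Beats every half second, starting a quarter of a second into the song
bpm, offset = tempo_from_beats([0.25, 0.75, 1.25, 1.75])
```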


## 3. Phrase Analysis  
In this task, the musical phrases of the song are extracted from the source and recorded. This increases the coherence of the generation, as it provides valuable structural data for later steps.

This task employs the methods provided by [All-in-One][31] to perform its analysis.
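
One possible way to hold the extracted phrases for later steps is a sorted list of labeled time segments. The sketch below does not call All-in-One; the `Phrase` class, its fields and the example labels are hypothetical and only show how a later step might look up the phrase active at a given time.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Phrase:
    start: float  # seconds
    end: float    # seconds
    label: str    # e.g. "intro", "verse" (illustrative labels)


def phrase_at(phrases, t):
    """Return the phrase that contains time t, or None if there is none.

    Assumes phrases are sorted by start time and do not overlap.
    """
    starts = [p.start for p in phrases]
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < phrases[i].end:
        return phrases[i]
    return None
```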


## 4. Stemming (Source Separation)  
In this task, the source song is separated into stems (vocals, melody, drums and other). This task is important because it allows the different parts of the song to be treated separately, enabling advanced generation that would otherwise be impossible. It is also decided here whether each stem contains valuable information; stems that do not are discarded.
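
The decision to discard a stem could, for example, be based on its overall level. The sketch below is an assumption rather than the method Soundlight uses: it keeps a stem only if its RMS level is above a threshold in dBFS. The function names and the default of -50 dBFS are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of float samples in [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def keep_informative_stems(stems, threshold_db=-50.0):
    """Drop stems whose RMS level is below threshold_db (dBFS)."""
    kept = {}
    for name, samples in stems.items():
        level = rms(samples)
        # A silent stem has no finite level in decibels
        level_db = 20 * math.log10(level) if level > 0 else float("-inf")
        if level_db >= threshold_db:
            kept[name] = samples
    return kept
```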


## 5. Stem Pulse and Frequency mapping  
In this task, stems are analyzed separately for sound pulses. From those, frequency and volume are extracted, and recorded individually for further use.
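
The sketch below shows one simple way such a mapping could work; the source does not specify the method, so everything here is an assumption. Pulses are detected as rising edges of the frame-wise RMS level, the volume is the RMS of the onset frame, and the frequency is estimated from the zero-crossing rate, which is only reliable for near-pure tones.

```python
import math


def frame_rms(samples, frame_size):
    """RMS level of each consecutive, non-overlapping frame."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_size]) / frame_size)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def detect_pulses(samples, sample_rate, frame_size=512, threshold=0.1):
    """Return (time_seconds, volume) for each frame whose level rises above threshold."""
    pulses = []
    previous = 0.0
    for index, level in enumerate(frame_rms(samples, frame_size)):
        if level >= threshold and previous < threshold:  # rising edge
            pulses.append((index * frame_size / sample_rate, level))
        previous = level
    return pulses


def zero_crossing_frequency(samples, sample_rate):
    """Rough frequency estimate: a pure tone crosses zero twice per cycle."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings * sample_rate / (2 * len(samples))
```

A spectral method such as an FFT peak would be more robust for real stems; the zero-crossing estimate is used here only to keep the example dependency-free.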


## 6. Profile Construction  
In this task, the results of all preceding tasks are incorporated into the song profile, which will be fed to the next step's agents. This is important because saving the results in a reusable format avoids having to reanalyze songs.
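
A reusable format could be as simple as a JSON file. The sketch below assumes a flat profile; the `SongProfile` class and all of its field names are illustrative and do not describe the actual Soundlight schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SongProfile:
    # All field names are illustrative; the real profile schema may differ
    title: str
    key: str
    bpm: float
    beat_offset: float
    phrases: list = field(default_factory=list)
    stems: dict = field(default_factory=dict)


def save_profile(profile, path):
    """Write the profile as JSON so later runs can skip the analysis."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(asdict(profile), fh, indent=2)


def load_profile(path):
    """Rebuild a SongProfile from a file written by save_profile."""
    with open(path, encoding="utf-8") as fh:
        return SongProfile(**json.load(fh))
```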


## 7. Database Registration  
In this task, the profile is optionally registered in a database for later library management.
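
As a minimal sketch of what registration might look like, the example below stores each profile as a JSON string in SQLite, keyed by song title. The choice of SQLite, the table layout and the function names are all assumptions; the source only states that a database is used.

```python
import json
import sqlite3


def open_library(path=":memory:"):
    """Open the library database and make sure the profiles table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        "title TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )
    return conn


def register_profile(conn, title, profile):
    """Insert or update one profile, stored as a JSON string keyed by title."""
    conn.execute(
        "INSERT OR REPLACE INTO profiles (title, data) VALUES (?, ?)",
        (title, json.dumps(profile)),
    )
    conn.commit()


def get_profile(conn, title):
    """Return the stored profile dict, or None if the title is not registered."""
    row = conn.execute(
        "SELECT data FROM profiles WHERE title = ?", (title,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```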

---

<!--- Link references --->
[11]: https://ffmpeg.org/
[12]: https://github.com/tinytag/tinytag (GitHub: tinytag)
[31]: https://github.com/mir-aidj/all-in-one (GitHub: All-in-One)


## 1. Audio Input
The first task in analysis is the audio input, which starts by loading the song. The file must be opened in binary mode, as it contains audio data rather than text.

In [None]:
file = open(path, "rb")

If we need to convert the file, we use `ffmpeg` like this:

In [None]:
%pip install ffmpeg-python

In [2]:
import ffmpeg

# From .mp3 to .wav, just change the extension. ffmpeg takes care of the rest
ffmpeg.input(r"U:\TFG\PoC\resources\CamelPhat, Yannis, Foals - Hypercolour.mp3").output(r"U:\TFG\PoC\resources\CamelPhat, Yannis, Foals - Hypercolour.wav").run()


(None, None)

## 2. Key and BPM Detection

Key detection is done with the help of `librosa`, one of the most widely used Python libraries for audio processing.
The code to detect the key is as follows:

In [None]:
%pip install librosa
%pip install numpy

In [6]:
import librosa
import numpy as np

# Load the audio file
audio_file_path = r'U:\TFG\PoC\resources\CamelPhat, Yannis, Foals - Hypercolour.wav'
y, sr = librosa.load(audio_file_path)

# Compute the Chroma Short-Time Fourier Transform (chroma_stft)
chromagram = librosa.feature.chroma_stft(y=y, sr=sr)

# Calculate the mean chroma feature across time
mean_chroma = np.mean(chromagram, axis=1)

# Define the mapping of chroma features to keys
chroma_to_key = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

# Pick the pitch class with the highest mean energy as a rough key estimate
# (this heuristic does not distinguish major from minor keys)
estimated_key_index = np.argmax(mean_chroma)
estimated_key = chroma_to_key[estimated_key_index]

# Print the detected key
print("Detected Key:", estimated_key)

Detected Key: F#
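
The cell above picks the strongest pitch class, which identifies a dominant note but not the mode. A common alternative, sketched here as an assumption and not as part of the original pipeline, is to correlate the mean chroma vector with the Krumhansl-Kessler major and minor key profiles at all twelve rotations. The function names are hypothetical, and `mean_chroma` is expected to be a 12-element vector like the one computed above.

```python
NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

# Krumhansl-Kessler key profiles for C major and C minor
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    denom = (var_x * var_y) ** 0.5
    return cov / denom if denom else 0.0


def estimate_key(mean_chroma):
    """Return (tonic, mode) whose rotated profile best matches the chroma vector."""
    chroma = [float(v) for v in mean_chroma]
    best = None
    for mode, profile in (("major", MAJOR), ("minor", MINOR)):
        for tonic in range(12):
            # Rotate the C-based profile so that index `tonic` holds the tonic weight
            rotated = profile[-tonic:] + profile[:-tonic]
            score = pearson(chroma, rotated)
            if best is None or score > best[0]:
                best = (score, NOTES[tonic], mode)
    return best[1], best[2]
```

With the variables from the cell above, `estimate_key(mean_chroma)` would return a pair such as a tonic name and `"major"` or `"minor"`.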
