# Audio Features Based Playlist
The goal of this project is to generate audio playlists from the precomputed audio features of a library of songs. The Essentia library is used to extract the features from the audio files in the MusAV dataset.

## Features extraction
For the audio feature analysis, the [_Essentia_](https://essentia.upf.edu/) library (an open-source library from the [Music Technology Group](https://www.upf.edu/web/mtg) at UPF) was used. The features extracted for each audio file are: BPM, danceability, whether the track is predominantly vocal or instrumental, arousal and valence values, and a genre prediction.

The script uses several _Essentia_ algorithms to perform the audio analysis, including `RhythmExtractor2013` for tempo analysis, `Danceability` for danceability analysis and several pretrained neural network models: `voice_instrumental-musicnn-msd-1.pb` for voice/instrumental classification, `msd-musicnn-1.pb` (embeddings) together with `emomusic-musicnn-msd-2.pb` for arousal and valence analysis, and `discogs-effnet-bs64-1.pb` for music style prediction.

Other _Essentia_ utilities, such as `MonoLoader` for loading audio files, are also used.

In [None]:
# I'm working on Google Drive so I mount my Drive into Colab and change the working directory to the project folder.
import os
from google.colab import drive
drive.mount('/content/drive')
path = '/content/drive/MyDrive/AMPLab/LargeScaleDatasets/AudioContentBasedPlaylist'
os.chdir(path)

Mounted at /content/drive


In [None]:
# Essentia installation
import importlib.util
!pip install essentia.TensorFlow

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting essentia.TensorFlow
  Downloading essentia_tensorflow-2.1b6.dev858-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (291.4 MB)
Installing collected packages: essentia.TensorFlow
Successfully installed essentia.TensorFlow-2.1b6.dev858


In [None]:
#Basic imports
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
import csv, json
import IPython.display as ipd
import random
from tqdm import tqdm
import essentia.standard as ess
import pandas as pd

In [None]:
# All the filenames of the audio files in all the subfolders are saved into a .csv file.
audio_dir = os.path.join(path, 'MusAV/audio_chunks')
data_file = 'filenames.csv'

overwrite = False

# Write the list only if it does not exist yet (or if overwrite is set)
if not Path(data_file).is_file() or overwrite:
  with open(data_file, 'w') as writer:
    for subpath, _, files in os.walk(audio_dir):
      for name in files:
        fileName = os.path.relpath(os.path.join(subpath,name), path)
        line2write = fileName + '\n'
        writer.write(line2write)
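
The listing step above can also be sketched with `pathlib` alone. This is an illustrative variant, not the code used in this notebook: the helper names `list_audio_files` and `write_listing` and the optional `suffixes` filter are assumptions introduced here. Sorting the result makes the order reproducible, which is convenient when a later step resumes by position in the list.

```python
from pathlib import Path

def list_audio_files(root, base, suffixes=None):
    """Return sorted paths of all files under `root`, relative to `base`.

    `suffixes` is an optional set of lowercase extensions (e.g. {'.mp3'});
    it is a hypothetical filter, not something the notebook applies.
    """
    root, base = Path(root), Path(base)
    found = []
    for p in root.rglob('*'):
        if p.is_file() and (suffixes is None or p.suffix.lower() in suffixes):
            found.append(p.relative_to(base).as_posix())
    return sorted(found)

def write_listing(paths, out_file):
    # One relative path per line, the same layout as filenames.csv
    Path(out_file).write_text(''.join(f'{p}\n' for p in paths))
```

A call such as `write_listing(list_audio_files(audio_dir, path), 'filenames.csv')` would produce the same kind of file as the cell above, assuming `audio_dir` and `path` are defined as in the notebook.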

In [None]:
# All files are loaded from the .csv file into a list
with open('filenames.csv', 'r') as fp:
  fnReader = csv.reader(fp)
  fileNames = list(fnReader)

In [None]:
# Models initialization. The files of the pretrained models are loaded into the algorithms.

# Voice/Instrumental model
modelFile = os.path.join('Models', 'voice_instrumental-musicnn-msd-1.pb')
modelVI = ess.TensorflowPredictMusiCNN(graphFilename=modelFile)

# Arousal and valence
modelFile = os.path.join('Models', 'msd-musicnn-1.pb')
embeddings_model = ess.TensorflowPredictMusiCNN(graphFilename = modelFile, output = 'model/dense/BiasAdd')

modelFile = os.path.join('Models', 'emomusic-musicnn-msd-2.pb')
modelAV = ess.TensorflowPredict2D(graphFilename = modelFile, output = 'model/Identity')

# Music Style
modelFile = os.path.join('Models', 'discogs-effnet-bs64-1.pb')
modelMS = ess.TensorflowPredictEffnetDiscogs(graphFilename=modelFile)

jsonPath = os.path.join('Models', 'discogs-effnet-bs64-1.json')
# Load the class labels; the with-block closes the file afterwards
with open(jsonPath) as f:
  data = json.load(f)
styles = data['classes']

In [None]:
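# Example analysis of a random audio file from the dataset.
# Note: the MusiCNN and EffNet-Discogs models expect 16 kHz audio (the rate used in the batch cell below), whereas this cell loads the file at 44.1 kHz.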
sample_file = random.choice(fileNames)[0]
fs = 44100
x = ess.MonoLoader(filename = sample_file, sampleRate = fs)()
print(f'Sample_file = {sample_file}')
plt.plot(x)
ipd.Audio(x, rate=fs)



In [None]:
# BPM Analysis
bpm, _, _, _, _ = ess.RhythmExtractor2013()(x)
# Danceability Analysis
danceability, _ = ess.Danceability()(x)
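# Voice/Instrumental Analysis using a pretrained model (the class order assumed here, voice then instrumental, should be checked against the model's .json metadata)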
activations = modelVI(x)
meanVocInst = sum(activations)/len(activations)
voice = meanVocInst[0]
inst = meanVocInst[1]
# Arousal and valence Analysis using a pretrained model on the Emomusic dataset
embeddings = embeddings_model(x)
arousalValence = modelAV(embeddings)
meanArousalValence = sum(arousalValence)/len(arousalValence)
arousal = meanArousalValence[0]
valence = meanArousalValence[1]
# Music Style Analysis using the Discogs Effnet pretrained model
activations = modelMS(x)
meanActivations = sum(activations)/len(activations)
genreIndex = np.argmax(meanActivations)
style = styles[genreIndex]
print(f'Features:\nBPM:\t\t{bpm}\nDanceability:\t{danceability}\nVocal:\t\t{voice}\nInstrumental:\t{inst}\nArousal:\t{arousal}\nValence:\t{valence}\nGenre:\t\t{style}')

Features:
BPM:		111.74371337890625
Danceability:	1.2504479885101318
Vocal:		0.6283199787139893
Instrumental:	0.34398943185806274
Arousal:	3.3797762393951416
Valence:	3.285983085632324
Genre:		Electronic---Experimental
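
The models return one row of activations per analysis frame, and the cell above reduces them to a single prediction by averaging over frames and taking the largest mean. A minimal sketch of that reduction with NumPy, assuming the activations form a frames × classes array; `top_class`, the toy numbers and the label names are made up for illustration and are not model output.

```python
import numpy as np

def top_class(activations, labels):
    """Average frame-wise activations and return (label, mean score)."""
    activations = np.asarray(activations, dtype=float)
    mean_activations = activations.mean(axis=0)  # one value per class
    index = int(np.argmax(mean_activations))
    return labels[index], float(mean_activations[index])

# Toy input: 3 frames x 2 classes (illustrative numbers only)
toy = [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
label, score = top_class(toy, ['class_a', 'class_b'])
print(label, round(score, 3))
```

`activations.mean(axis=0)` computes the same quantity as `sum(activations)/len(activations)` in the cell above.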


In [None]:
# The 2100 audio files in the dataset are analyzed and their features extracted. The results are saved into a tab-separated file, processed.csv.

overwrite = False
fs = 16000
start = 0

if not Path('processed.csv').is_file() or overwrite:
    mode = 'w'
else:
    with open('processed.csv', 'r') as pr:
        prReader = csv.reader(pr)
        start = sum(1 for line in prReader) - 1
        print(f'{start} files already processed')
        mode = 'a'
with open('processed.csv', mode) as writer:
    if mode == 'w':
        line2write = 'Filename\tTempo\tMusicStyle\tVoice\tInstrumental\tDanceability\tArousal\tValence\n'
        writer.write(line2write)

    for counter in tqdm(range(start, len(fileNames))):
        sample_file = fileNames[counter][0]
        x = ess.MonoLoader(filename = sample_file, sampleRate = fs)()
        # BPM Analysis
        bpm, _, _, _, _ = ess.RhythmExtractor2013()(x)
        # Danceability Analysis
        danceability, _ = ess.Danceability()(x)
        # Voice/Instrumental Analysis using a pretrained model
        activations = modelVI(x)
        meanVocInst = sum(activations)/len(activations)
        voice = meanVocInst[0]
        inst = meanVocInst[1]
        # Arousal and valence Analysis using a pretrained model on the Emomusic dataset
        embeddings = embeddings_model(x)
        arousalValence = modelAV(embeddings)
        meanArousalValence = sum(arousalValence)/len(arousalValence)
        arousal = meanArousalValence[0]
        valence = meanArousalValence[1]
        # Music Style Analysis using the Discogs Effnet pretrained model
        activations = modelMS(x)
        meanActivations = sum(activations)/len(activations)
        genreIndex = np.argmax(meanActivations)
        style = styles[genreIndex]

        # Some style names contain commas (e.g. 'Folk, World, & Country---African'),
        # so the fields are separated by tabs rather than commas.

        line2write = f'{sample_file}\t{bpm}\t{style}\t{voice}\t{inst}\t{danceability}\t{arousal}\t{valence}\n'
        writer.write(line2write)

100%|██████████| 2100/2100 [1:42:36<00:00,  2.93s/it]
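
The resume-and-append pattern used in the cell above can be sketched with the standard `csv` module, which takes care of delimiters and quoting. This is only an illustration under stated assumptions: `rows_already_written`, `process_all`, the shortened `HEADER` and the `analyze` callback are hypothetical names standing in for the notebook's feature extraction.

```python
import csv
from pathlib import Path

HEADER = ['Filename', 'Tempo', 'MusicStyle']  # shortened header for the sketch

def rows_already_written(out_file):
    """Number of data rows in `out_file` (header excluded), 0 if missing."""
    path = Path(out_file)
    if not path.is_file():
        return 0
    with path.open(newline='') as fp:
        return max(sum(1 for _ in csv.reader(fp, delimiter='\t')) - 1, 0)

def process_all(file_names, analyze, out_file):
    """Append one row per unprocessed file; `analyze` returns a list of values."""
    path = Path(out_file)
    start = rows_already_written(path)
    needs_header = not path.is_file() or path.stat().st_size == 0
    with path.open('a', newline='') as fp:
        writer = csv.writer(fp, delimiter='\t')
        if needs_header:
            writer.writerow(HEADER)
        for name in file_names[start:]:
            writer.writerow([name, *analyze(name)])
            fp.flush()  # keep finished rows on disk if the run is interrupted
    return start
```

Because the delimiter is a tab, a style name containing commas is written as a single field and read back unchanged.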


In [None]:
# This is what the resulting .csv file looks like.
df = pd.read_csv('processed.csv', sep= '\t')
df

| | Filename | Tempo | MusicStyle | Voice | Instrumental | Danceability | Arousal | Valence |
|---|---|---|---|---|---|---|---|---|
| 0 | MusAV/audio_chunks/audio.001/5Z/5Z54RgCfhRljLV... | 91.489594 | Pop---Vocal | 0.045020 | 0.954102 | 1.111470 | 3.304273 | 4.148719 |
| 1 | MusAV/audio_chunks/audio.001/7G/7GgYmXY3PfDjTi... | 108.233902 | Hip Hop---Trap | 0.096117 | 0.879926 | 1.296778 | 4.682764 | 5.522222 |
| 2 | MusAV/audio_chunks/audio.001/0z/0zf1BQJ4om2qU0... | 129.057831 | Rock---Alternative Rock | 0.079264 | 0.921760 | 0.966207 | 4.342440 | 4.190708 |
| 3 | MusAV/audio_chunks/audio.001/4s/4sCcDvX30uu39o... | 74.187294 | Pop---K-pop | 0.219603 | 0.757538 | 1.539265 | 5.979922 | 6.669918 |
| 4 | MusAV/audio_chunks/audio.001/6Z/6ZBiXweylRlROq... | 112.846481 | Rock---Pub Rock | 0.038193 | 0.959205 | 1.104233 | 6.538208 | 6.298110 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2095 | MusAV/audio_chunks/audio.002/3I/3I6v5wmP0joU1t... | 101.971222 | Classical---Romantic | 0.517653 | 0.452668 | 0.983944 | 4.228634 | 4.733572 |
| 2096 | MusAV/audio_chunks/audio.002/1x/1xM6rthhqPRmHf... | 125.497826 | Reggae---Ska | 0.120484 | 0.870172 | 1.005791 | 5.890504 | 5.668228 |
| 2097 | MusAV/audio_chunks/audio.002/1x/1xafrgeBMaBBrn... | 148.872238 | Rock---Punk | 0.415394 | 0.611961 | 2.510492 | 5.531362 | 6.508584 |
| 2098 | MusAV/audio_chunks/audio.002/1M/1MTFN8gHszESSn... | 90.777794 | Electronic---Acid Jazz | 0.718956 | 0.337801 | 1.873456 | 5.794608 | 5.934831 |


In [None]:
# Now it is possible to filter the audio files using different features, for example, genre.
# A separate name keeps the full label list in `styles` intact
selected_styles = ['Rock---Punk', 'Reggae---Ska']
df[df['MusicStyle'].isin(selected_styles)]

| | Filename | Tempo | MusicStyle | Voice | Instrumental | Danceability | Arousal | Valence |
|---|---|---|---|---|---|---|---|---|
| 76 | MusAV/audio_chunks/audio.001/1Y/1YMPBK7iKEy6JH... | 120.055237 | Rock---Punk | 0.325614 | 0.709848 | 1.384382 | 5.036670 | 6.615846 |
| 137 | MusAV/audio_chunks/audio.001/67/67FunyISd3BBWR... | 104.571541 | Rock---Punk | 0.215651 | 0.945499 | 1.966835 | 5.909913 | 7.513747 |
| 153 | MusAV/audio_chunks/audio.001/2z/2zolKF02IrH69q... | 95.768608 | Rock---Punk | 0.321996 | 0.687660 | 0.889024 | 4.473207 | 6.664960 |
| 162 | MusAV/audio_chunks/audio.001/0o/0oqgmnOc28UqGK... | 106.248093 | Rock---Punk | 0.116429 | 0.959011 | 1.202557 | 5.137464 | 7.569975 |
| 219 | MusAV/audio_chunks/audio.001/0k/0k30xe1ye3cRQx... | 118.028900 | Rock---Punk | 0.137359 | 0.932044 | 1.046606 | 4.414231 | 6.962415 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2017 | MusAV/audio_chunks/audio.002/2k/2kDDwrsNEY0EuC... | 103.359367 | Rock---Punk | 0.407260 | 0.673845 | 1.273588 | 5.234260 | 6.419311 |
| 2059 | MusAV/audio_chunks/audio.002/17/171sUxiVdqnu4C... | 143.822037 | Rock---Punk | 0.236458 | 0.875153 | 1.554458 | 5.840716 | 6.860712 |
| 2080 | MusAV/audio_chunks/audio.002/4E/4EeBnGb1ffKkBn... | 122.443642 | Rock---Punk | 0.013825 | 0.992733 | 1.477298 | 5.133000 | 7.176169 |
| 2096 | MusAV/audio_chunks/audio.002/1x/1xM6rthhqPRmHf... | 125.497826 | Reggae---Ska | 0.120484 | 0.870172 | 1.005791 | 5.890504 | 5.668228 |
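
The `MusicStyle` labels follow a `Genre---Style` pattern, so filtering by the broader genre is also possible once the label is split. A small sketch, assuming that pattern holds for every row; the `Genre` and `Style` column names are introduced here for illustration, and the toy frame reuses a few values from this notebook's tables with only two columns kept.

```python
import pandas as pd

# A few values copied from the tables in this notebook (two columns only)
toy = pd.DataFrame({
    'MusicStyle': ['Rock---Punk', 'Reggae---Ska',
                   'Rock---Alternative Rock', 'Pop---K-pop'],
    'Tempo': [148.872238, 125.497826, 129.057831, 74.187294],
})

# Split 'Genre---Style' once into two new columns
toy[['Genre', 'Style']] = toy['MusicStyle'].str.split('---', n=1, expand=True)

# Keep every track whose broad genre is Rock, whatever the style
rock = toy[toy['Genre'] == 'Rock']
print(rock[['Genre', 'Style', 'Tempo']])
```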


In [None]:
# It is also possible to select a range of tempo.
df[(df['Tempo'] > 90) & (df['Tempo'] < 120)]

| | Filename | Tempo | MusicStyle | Voice | Instrumental | Danceability | Arousal | Valence |
|---|---|---|---|---|---|---|---|---|
| 0 | MusAV/audio_chunks/audio.001/5Z/5Z54RgCfhRljLV... | 91.489594 | Pop---Vocal | 0.045020 | 0.954102 | 1.111470 | 3.304273 | 4.148719 |
| 1 | MusAV/audio_chunks/audio.001/7G/7GgYmXY3PfDjTi... | 108.233902 | Hip Hop---Trap | 0.096117 | 0.879926 | 1.296778 | 4.682764 | 5.522222 |
| 4 | MusAV/audio_chunks/audio.001/6Z/6ZBiXweylRlROq... | 112.846481 | Rock---Pub Rock | 0.038193 | 0.959205 | 1.104233 | 6.538208 | 6.298110 |
| 6 | MusAV/audio_chunks/audio.001/4X/4XQETu7QYGbUhT... | 119.912170 | Electronic---Drum n Bass | 0.695804 | 0.269239 | 1.438577 | 5.042440 | 4.367638 |
| 8 | MusAV/audio_chunks/audio.001/2D/2DOfw6UDf0eyCd... | 94.000603 | Electronic---UK Garage | 0.369899 | 0.599288 | 1.811990 | 5.742550 | 6.224292 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2089 | MusAV/audio_chunks/audio.002/2F/2Fct9QpsqokSBx... | 107.677010 | Folk, World, & Country---African | 0.043042 | 0.960570 | 2.295881 | 5.631078 | 5.455075 |
| 2092 | MusAV/audio_chunks/audio.002/3j/3jFdYsoa9izAwS... | 93.736694 | Pop---Ballad | 0.013072 | 0.986041 | 1.524763 | 4.371383 | 4.883561 |
| 2094 | MusAV/audio_chunks/audio.002/0q/0qw14X0kmzng7l... | 117.163849 | Rock---Alternative Rock | 0.001767 | 0.998094 | 1.553394 | 5.624684 | 5.801502 |
| 2095 | MusAV/audio_chunks/audio.002/3I/3I6v5wmP0joU1t... | 101.971222 | Classical---Romantic | 0.517653 | 0.452668 | 0.983944 | 4.228634 | 4.733572 |
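
The style and tempo filters shown above can be combined and sampled to form a playlist. The following is a minimal sketch of one way to do that, not a method defined in this notebook: `make_playlist` and its parameters are hypothetical, and the toy frame reuses a few values from the tables above with only two columns kept.

```python
import pandas as pd

def make_playlist(df, styles=None, tempo_range=None, size=10, seed=None):
    """Return up to `size` random rows matching the optional filters."""
    mask = pd.Series(True, index=df.index)
    if styles is not None:
        mask &= df['MusicStyle'].isin(styles)
    if tempo_range is not None:
        low, high = tempo_range
        # Exclusive bounds, as in the tempo filter above
        mask &= (df['Tempo'] > low) & (df['Tempo'] < high)
    candidates = df[mask]
    if candidates.empty:
        return candidates
    return candidates.sample(n=min(size, len(candidates)), random_state=seed)

# Values copied from the tables above (two columns only)
toy = pd.DataFrame({
    'MusicStyle': ['Rock---Punk', 'Reggae---Ska', 'Rock---Punk',
                   'Rock---Punk', 'Pop---Vocal'],
    'Tempo': [148.872238, 125.497826, 120.055237, 104.571541, 91.489594],
})

playlist = make_playlist(toy, styles=['Rock---Punk', 'Reggae---Ska'],
                         tempo_range=(90, 130), size=2, seed=0)
print(playlist)
```

Passing a `seed` makes the selection repeatable; leaving it as `None` gives a different playlist on each call.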
