## Part 1. Spectral features (3 points out of 10)


Prerequisites: install librosa and pandas with pip (inside a Jupyter notebook you can do that by running ```!pip install librosa```).

We are going to load the dataset metadata.

In [None]:
#!pip install librosa

In [None]:
import librosa
import pandas as pd
import os 

genre_dataset = pd.read_csv("genre_dataset_metadata.csv")

In [None]:
genre_dataset.head()

#### <font color='red'>Exercise 2a. Extract at least three low-level spectral features of your choice. </font>

In [None]:
import sys

if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")

In [None]:
dirname = "genres_audio"
feature_list = []
for filename in os.listdir(dirname):
    y, sr = librosa.load(os.path.join(dirname, filename))

    
    #add your feature extraction code here 
    feature_rms = librosa.feature.rms(y=y)
    feature_centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    feature_bandw = librosa.feature.spectral_bandwidth(y=y, sr=sr)
    feature_chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    
    feature_list.append([filename, feature_rms, feature_centroid, feature_bandw, feature_chroma])
    
# create a dataframe with your features
features = pd.DataFrame(feature_list, columns = ['filename','feature_rms', 'feature_centroid', 'feature_bandw', 'feature_chroma'])
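
To make the centroid feature less of a black box, here is a minimal sketch of the quantity it measures on a single frame: the magnitude-weighted mean frequency. The helper name `spectral_centroid_frame` and the synthetic sine signals are assumptions for illustration only. librosa applies the same idea frame by frame on a windowed STFT, so its values on real audio will differ from this simplified version.

```python
import numpy as np

def spectral_centroid_frame(frame, sr):
    """Magnitude-weighted mean frequency of one audio frame (hypothetical helper)."""
    magnitudes = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))

# Synthetic test signals (assumed values, chosen to fall exactly on FFT bins)
sr = 8000
n = 1024
t = np.arange(n) / sr
low = np.sin(2 * np.pi * 500 * t)            # a single 500 Hz tone
mix = low + np.sin(2 * np.pi * 1500 * t)     # add an equally loud 1500 Hz tone

centroid_low = spectral_centroid_frame(low, sr)
centroid_mix = spectral_centroid_frame(mix, sr)
print(f"500 Hz tone: {centroid_low:.1f} Hz, 500 + 1500 Hz mix: {centroid_mix:.1f} Hz")
```

Adding energy at higher frequencies pulls the centroid upward, which is why the feature is often read as a rough measure of "brightness".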


In [None]:
features[:5]

In [None]:
import matplotlib.pyplot as plt

for i in range(4):
    fig, ax = plt.subplots(nrows=4, sharex=True)
    times = librosa.times_like(features.feature_rms[i].size)
    ax[0].title.set_text(features.filename[i])
    ax[0].semilogy(times, features.feature_rms[i][0], label='RMS Energy')
    ax[0].legend()
    ax[1].semilogy(times, features.feature_centroid[i][0], label='Centroid')
    ax[1].legend();
    ax[2].semilogy(times, features.feature_bandw[i][0], label='Bandw')
    ax[2].legend();
    ax[3].semilogy(times, features.feature_chroma[i][0], label='Chroma (first bin)')
    ax[3].legend();

Join your features with the metadata

In [None]:
genre_dataset_with_features = genre_dataset.merge(features, left_on="filename", right_on="filename")
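
An inner merge silently drops songs that have metadata but no extracted features (or the other way round). The sketch below shows one way to check for that with the `indicator` option of `DataFrame.merge`. The two toy tables are made up for illustration; only the `filename` and `genre` column names come from this notebook.

```python
import pandas as pd

# Toy stand-ins for the metadata and feature tables (hypothetical values)
toy_metadata = pd.DataFrame({"filename": ["a.mp3", "b.mp3", "c.mp3"],
                             "genre": ["rock", "jazz", "pop"]})
toy_features = pd.DataFrame({"filename": ["a.mp3", "b.mp3"],
                             "feature_rms": [0.1, 0.2]})

# how="left" keeps every metadata row; the _merge column records where each row was found
merged = toy_metadata.merge(toy_features, on="filename", how="left", indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", "filename"].tolist()
print("Files without features:", missing)
```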

In [None]:
genre_dataset_with_features[:5]

In [None]:
feature_cols = [x for x in genre_dataset_with_features.columns.to_list() if x.startswith('feature_')]
feature_cols

In [None]:
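def col_means(row):
    # Collapse each frame-level feature matrix to one number per song.
    # Only the first row is averaged, so for chroma (12 rows) this uses the first bin only.
    for key in row.keys():
        if key.startswith('feature_'):
            row[key] = row[key][0].mean()
    return row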

genre_with_features = genre_dataset_with_features.drop(columns='filename')
genre_features = genre_with_features.apply(col_means, axis=1)
genre_features[:2]
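
The mean is only one way to summarize a frame-level feature. As a hedged sketch, the helper below (a hypothetical name, not part of the assignment) also keeps the standard deviation and one mean per row, which matters for multi-row features such as chroma. The toy arrays stand in for real librosa output.

```python
import numpy as np

def summarize_feature(matrix):
    """Overall mean/std plus one mean per row of a (rows, frames) feature matrix."""
    matrix = np.atleast_2d(np.asarray(matrix, dtype=float))
    return {
        "mean": float(matrix.mean()),
        "std": float(matrix.std()),
        "row_means": matrix.mean(axis=1),
    }

# Toy inputs: a 1-row "RMS" track and a 12-row "chroma" matrix (made-up values)
toy_rms = np.array([[0.1, 0.2, 0.3, 0.4]])
toy_chroma = np.tile(np.linspace(0.0, 1.0, 12)[:, None], (1, 4))

rms_summary = summarize_feature(toy_rms)
chroma_summary = summarize_feature(toy_chroma)
print(rms_summary["mean"], rms_summary["std"], chroma_summary["row_means"].shape)
```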

#### <font color='red'> Exercise 2b. Visualize your feature distributions across genre.</font>


In [None]:
import seaborn as sns

sns.histplot(data=genre_features, x="feature_centroid", hue="genre");

In [None]:
sns.histplot(data=genre_features, x="feature_rms", hue="genre");

In [None]:
sns.jointplot(data=genre_features, x="feature_rms", y="feature_centroid", hue="genre", kind="kde", height=8.27);

In [None]:
sns.jointplot(data=genre_features, x="feature_bandw", y="feature_chroma", hue="genre", kind="kde", height=8.27);

#### <font color='red'>Exercise 2c. Do you have an explanation for the distributions that you observed?</font>


#### <font color='green'>Using several features together increases the chance of distinguishing genres.</font>

## Part 2. Genre and chords (4 points out of 10)

In this homework, we will explore a small dataset of 60 songs with genre labels. You received a folder with music files (one-minute excerpts), a folder with automatically extracted chords that were parsed for you into chord root, chord triad type, and chord extension, and genre annotations in a CSV file.


Prerequisites: install **pandas** and **scikit-learn**.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os

#### <font color='red'> Design and extract some features based on the extracted chords (you might try number of unique chords or chord stems, consider summing up chord durations for a specific chord type, etc.) </font>

In [None]:
example_chord_file = pd.read_csv("chords/309.csv")
example_chord_file.head()

In [None]:
print(example_chord_file['chord'].unique())
print(example_chord_file['chord_root'].count())
#example_chord_file['chord_duration'].sum(axis=1).where(example_chord_file['chord_triad'] == 'm')
#example_chord_file['chord_duration'].sum()
example_chord_file['chord_duration'].where(example_chord_file['chord_triad'] == 'm').sum()
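
The feature ideas mentioned above (number of unique chords, summed durations per chord type) can be sketched on a tiny hand-made table. The column names match the chord files used here, but the rows below are invented for illustration and the variable names are assumptions.

```python
import pandas as pd

# Hand-made chord table with the same columns as the parsed chord files (toy values)
toy_chords = pd.DataFrame({
    "chord": ["C", "Am", "F", "C", "Am"],
    "chord_root": ["C", "A", "F", "C", "A"],
    "chord_triad": ["maj", "m", "maj", "maj", "m"],
    "chord_duration": [2.0, 1.0, 1.5, 0.5, 3.0],
})

# Total duration per triad type in one pass
duration_by_triad = toy_chords.groupby("chord_triad")["chord_duration"].sum()

# Share of the excerpt spent on minor chords, and the number of distinct chords
minor_share = duration_by_triad.get("m", 0.0) / toy_chords["chord_duration"].sum()
n_unique_chords = toy_chords["chord"].nunique()
print(duration_by_triad.to_dict(), minor_share, n_unique_chords)
```

Using a share rather than a raw duration keeps the feature comparable if the excerpts ever differ in length.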

In [None]:
os.listdir("chords")[:3]

In [None]:
feature_list=[]
chordfiles = os.listdir("chords")
for filename in chordfiles: 
    if filename != '.ipynb_checkpoints':
        chords = pd.read_csv(os.path.join("chords", filename), sep=',')
        #chords.drop('.ipynb_checkpoints', axis=1)
        #... design at least 5 features based on chords here 

        feature1 = chords['chord'].unique().size
        feature2 = chords['chord_root'].unique().size
        #feature3 = chords['chord_duration'].where(chords['chord_triad'] == 'dim').sum()
        #feature4 = chords['chord_duration'].where(chords['chord_triad'] == 'maj').sum()
        feature5 = chords['chord_duration'].where(chords['chord_root'] == 'C').sum()
        feature6 = chords['chord_duration'].where(chords['chord_root'] == 'C#').sum()
        feature7 = chords['chord_duration'].where(chords['chord_root'] == 'A').sum()
        feature8 = chords['chord_duration'].where(chords['chord_root'] == 'D').sum()
        feature9 = chords['chord_duration'].where(chords['chord_triad'] == 'm').sum()
        feature10 = chords['chord'].count()

        song_file_id = filename.split(".")[0] + '.mp3'
        #feature_list.append([song_file_id, feature1, feature2, feature3, feature4, feature5, feature6, feature7, feature8, feature9,feature10])
        feature_list.append([song_file_id, feature1, feature2, feature5, feature6, feature7, feature8, feature9,feature10])

# create a dataframe with your features
features = pd.DataFrame(feature_list, columns = ['filename','chord_number', 'chord_root_number','C','Csh','A','D', 'm', 'chord_count'])

In [None]:
chords.head(15)

In [None]:
features.head(15)

#### <font color='red'> Exercise 2. Cluster the music files based on your features. Choose a number of clusters of your liking. </font>

In [None]:
from sklearn.cluster import KMeans

# we remove the file ids from cluster variables
features_without_labels = features.drop("filename", axis=1)

# Choose your n_clusters here
kmeans = KMeans(n_clusters=4, random_state=0).fit(features_without_labels)
kmeans.labels_
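
K-means relies on Euclidean distance, so a feature with a large numeric range (for example a chord count) can dominate features with a small range. A common remedy, shown here only as a sketch on made-up numbers, is to standardize each column to zero mean and unit variance before clustering. The toy table and variable names are assumptions.

```python
import pandas as pd

# Toy feature table with columns on very different scales (hypothetical values)
toy_features = pd.DataFrame({
    "chord_number": [4, 6, 8],
    "chord_count": [100, 200, 300],
})

# Z-score each column; ddof=0 uses the population standard deviation
standardized = (toy_features - toy_features.mean()) / toy_features.std(ddof=0)
print(standardized.round(3))
```

The standardized table could then be passed to `KMeans(...).fit(...)` in place of the raw features.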

In [None]:
cluster_with_song_ids = pd.DataFrame(
    {'song_ids': features.filename,
     'chord_root_number': features.chord_root_number,
     'chord_number': features.chord_number,
     'chord_count': features.chord_count,
     'm' : features.m,
     'D' : features.D,
     'KMeans_clusters': kmeans.labels_})

In [None]:
clustered3 = cluster_with_song_ids.groupby('KMeans_clusters').head(3).sort_values('KMeans_clusters')
print(clustered3)

In [None]:
from IPython.display import HTML, display, Audio

def show_audio_with_controls(cluster, file, file_path):
    display(HTML(f"{file} cluster {cluster}<p><audio controls style='width:100%;'><source src='{file_path}' type='audio/mpeg'></audio>"))

for cluster, file in clustered3[['KMeans_clusters', 'song_ids']].to_numpy():
    show_audio_with_controls(cluster, file, 'music/' + file)

#### <font color='red'> Listen to the songs in your clusters and describe the clusters that you found. What characteristics do they have? </font>

#### <font color='green'>Cluster 1 has a greater diversity of chords, while cluster 3 has a lower number of chords.</font>

#### <font color='red'> Exercise 3. Now use your extracted features to predict the genre of the song. </font>

In [None]:
genre_annotations = pd.read_csv("genre_annotations.csv")
features_with_genre = genre_annotations.merge(features, left_on="song_id", right_on="filename")
features_with_genre = features_with_genre.drop('filename', axis = 1)

In [None]:
genre_annotations.genre.value_counts()

In [None]:
features_with_genre[13:17]

In [None]:
from matplotlib.pyplot import figure
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
plt.gcf().set_size_inches(15, 15)
for color in features_with_genre['genre'].unique():
    ax.scatter(features_with_genre['m'].where(features_with_genre['genre'] == color)
               , features_with_genre['D'].where(features_with_genre['genre'] == color)
               , features_with_genre['chord_count'].where(features_with_genre['genre'] == color)
               , label=color, alpha=0.8, edgecolors='none')
ax.legend()
ax.grid(True)
plt.show()

In [None]:
fig = plt.figure()
ay = fig.add_subplot(projection='3d')
plt.gcf().set_size_inches(15, 15)
for color in cluster_with_song_ids.KMeans_clusters.unique():
    ay.scatter(cluster_with_song_ids[cluster_with_song_ids.KMeans_clusters==color]['m']
               , cluster_with_song_ids[cluster_with_song_ids.KMeans_clusters==color]['D']
               , cluster_with_song_ids[cluster_with_song_ids.KMeans_clusters==color]['chord_count']
               , label=color, alpha=0.8, edgecolors='none')
ay.legend()
ay.grid(True)

plt.show()

#### <font color='red'> Train a model of your choice and evaluate its performance.</font>

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = features_with_genre.drop(columns=['genre', 'song_id']), features_with_genre['genre']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2)
neigh = KNeighborsClassifier(n_neighbors=10, weights='distance')
score = neigh.fit(X_train, y_train).score(X_test, y_test)
print(f"Mean accuracy {score:.2f}")
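
An accuracy figure is easier to interpret next to a trivial baseline. The sketch below computes the accuracy of always predicting the most frequent training genre. The toy label series are invented for illustration; in the notebook the same two lines could be applied to `y_train` and `y_test`.

```python
import pandas as pd

# Toy train/test labels (hypothetical values)
toy_y_train = pd.Series(["rock", "rock", "jazz", "pop"])
toy_y_test = pd.Series(["rock", "jazz", "rock"])

# Always predict the most frequent training genre
majority_genre = toy_y_train.mode()[0]
baseline_accuracy = (toy_y_test == majority_genre).mean()
print(f"Majority genre: {majority_genre}, baseline accuracy: {baseline_accuracy:.2f}")
```

A classifier that does not clearly beat this number has learned little from the features.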

In [None]:
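from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

# Recent xgboost releases expect integer class labels, so encode the genre strings first
label_encoder = LabelEncoder().fit(y_train)

model = XGBClassifier()
model.fit(X_train, label_encoder.transform(y_train))

# Map the integer predictions back to genre names before scoring
y_pred = label_encoder.inverse_transform(model.predict(X_test))
accuracy = accuracy_score(y_test, y_pred)
print(f"\nAccuracy {accuracy:.2f}")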

In [None]:
from xgboost import plot_importance

plot_importance(model);

#### <font color='red'> What were your best features?  </font>

#### <font color='green'> chord_count and the chord durations are the best features for classifying genre.</font>

## Part 3

In this section we will do automatic tagging using CNNs. We will use a pretrained model to extract tags (training a model would require a large amount of data, time, and compute power, which is beyond the scope of a homework). For this exercise you will use your own musical archive (the audio files that you have on your computer). If you do not have any music stored on your device, try downloading some of your playlists with the spotDL library.

Prerequisites: install the **musicnn** package with pip. Optionally, you might need the **wordcloud**, **mlxtend**, **pandas**, and **spotdl** packages as well.

#### <font color='red'> Exercise 1. Read this paper on automatic tagging (https://arxiv.org/pdf/1711.02520.pdf) and answer the following questions: </font>

a) Do you have an idea why the PR-AUC is that low (~35) on MagnaTagATune dataset (hint: think about the labels)? 

#### <font color='green'>The MagnaTagATune dataset has binary labels.</font>

b) You want to train a CNN for chord extraction from mel-spectrograms. Would you rather use filters that stretch vertically or horizontally? Why?

#### <font color='green'>For chord extraction, a combination of both vertical and horizontal filters is preferable, as patterns in spectrograms occur at different time-frequency scales.</font>

c) In the paper, the findings were extrapolated to larger dataset using a linear regression model. Based on this, what cutoff point would you use for dataset size, from which you would recommend using raw audio to train the model?

#### <font color='green'>Based on the 1.2M-song results, the waveform model outperforms the spectrogram model from around 600-700k songs.</font>

#### <font color='red'> Exercise 2. In this exercise, we will use a Python package that was developed based on the research in the paper you just read. We will extract the tags from your own musical archive. Make sure you have the musicnn package installed. </font>

In [None]:
# !pip install musicnn

In [None]:
import musicnn
from musicnn.tagger import top_tags

Now parse a folder from your computer that contains some music (possibly, the folder where Spotify stores music for offline listening).

In [None]:
import glob
music_folder = "my_music"

# replace with your music extensions if necessary; '**' lets the recursive search descend into subfolders
music_files = glob.glob(music_folder + '/**/*.mp3', recursive=True)
len(music_files)
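
If your archive mixes several audio formats, one possible approach is to walk the folder with `pathlib` and filter by suffix. This is only a sketch: the helper name `find_music_files` and the default extension list are assumptions, and the temporary folder below exists purely to demonstrate the behaviour.

```python
import tempfile
from pathlib import Path

def find_music_files(folder, extensions=(".mp3", ".flac", ".wav")):
    """Recursively collect files whose suffix (case-insensitive) is in extensions."""
    return sorted(
        str(path) for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() in extensions
    )

# Demonstration on a throwaway folder with one subfolder and one non-audio file
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "album").mkdir()
    for name in ["a.mp3", "album/b.FLAC", "notes.txt"]:
        (Path(tmp) / name).touch()
    found_names = sorted(Path(p).name for p in find_music_files(tmp))

print(found_names)
```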

Now, extract the tags from each file in this folder. This may take a while depending on your machine, mostly due to spectrogram extraction. If it takes too long, consider reducing the amount of music you give it.

In [None]:
tag_list = []
for music_file in music_files:
    tags = top_tags(music_file, model='MSD_musicnn', topN=5)
    tag_list.append(tags)

#### <font color='red'> Exercise 3. Analyze your results. Visualize the tags in at least three different ways (you can use word cloud, histograms, LDA). Describe your findings. </font>

To get you started, here is a possible way to analyze this data: frequent itemset mining. You will need the mlxtend library installed in order to do this analysis.

In [None]:
#first, let's create a pandas dataframe from our list of tags
import pandas as pd
df = pd.DataFrame(tag_list, columns = [f"tag{x}" for x in range(1,len(tag_list[0])+1)]) 

#next, let's find how many distinct tags out of 50 possible tags we have got
unique_tags_by_column = []
for i in range(1, len(tag_list[0]) + 1):
    unique_tags_by_column.extend(list(df[f"tag{i}"].unique()))
unique_tags = set(unique_tags_by_column)
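
Beyond the set of distinct tags, it is often useful to know how often each tag occurs. A minimal sketch with `collections.Counter` follows. The toy tag lists are invented; in the notebook, `tag_list` would take their place.

```python
from collections import Counter

# Toy stand-in for tag_list: one list of predicted tags per song (made-up values)
toy_tag_list = [
    ["rock", "guitar", "indie"],
    ["electronic", "dance", "rock"],
    ["rock", "indie", "pop"],
]

# Flatten the per-song lists and count every tag
tag_counts = Counter(tag for tags in toy_tag_list for tag in tags)
print(tag_counts.most_common(2))
```

The resulting counts can feed a bar chart or a word cloud directly.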

In [None]:
# let's transform our dataset into one-hot encoded form suitable for mlxtend library
def encode_ohe(df):
    ohe_values = []
    for row in df.to_numpy():
        rowset = set(row)
        labels = {}
        no_intersection = list(unique_tags - rowset)
        intersection = list(unique_tags.intersection(rowset))
        for itm in no_intersection:
            labels[itm] = 0
        for itm in intersection:
            labels[itm] = 1
        ohe_values.append(labels)
    return pd.DataFrame(ohe_values)

ohe_df = encode_ohe(df)
ohe_df.head(3)
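
As an alternative to the hand-written encoder, pandas can build the same kind of one-hot table by stacking the tag columns and calling `get_dummies`. This is a sketch on a toy frame with made-up tags, not a replacement required by the assignment.

```python
import pandas as pd

# Toy frame in the same layout as df: one row per song, one column per tag rank
toy_df = pd.DataFrame([["rock", "indie"], ["pop", "rock"]], columns=["tag1", "tag2"])

# Stack to one tag per row, one-hot encode, then collapse back to one row per song
toy_ohe = pd.get_dummies(toy_df.stack(), dtype=int).groupby(level=0).max()
print(toy_ohe)
```

Unlike `encode_ohe`, this version only creates columns for tags that actually occur in the frame it is given.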

In [None]:
ohe_df_top = encode_ohe(df[['tag1', 'tag2']])
columns0 = [x for x, y in ohe_df_top.sum().reset_index().to_numpy() if y == 0]
ohe_df_top = ohe_df_top.drop(columns=columns0)
ohe_df_top.head(3)

In [None]:
#!pip install wordcloud

In [None]:
from wordcloud import WordCloud
wordcloud = WordCloud().generate_from_frequencies(ohe_df.sum().to_dict())

plt.imshow(wordcloud)
plt.axis("off")
plt.show()

In [None]:
wordcloud = WordCloud().generate_from_frequencies(ohe_df_top.sum().to_dict())
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

In [None]:
from mlxtend.frequent_patterns import apriori
freq_items = apriori(ohe_df, min_support=0.2, use_colnames=True, verbose=1)
freq_items.sort_values('support', ascending=False).head()

In [None]:
freq_items_top = apriori(ohe_df_top, min_support=0.2, use_colnames=True, verbose=1)
freq_items_top.sort_values('support', ascending=False).head()

In [None]:
from mlxtend.frequent_patterns import association_rules
rules = association_rules(freq_items, metric="confidence", min_threshold=0.6)
rules.sort_values('leverage', ascending=False).head(20)
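
To read the rule table, it helps to recompute the main metrics by hand on a few toy "transactions". The sketch below uses the standard definitions: support is the fraction of songs containing an itemset, confidence is support(A and C) / support(A), and leverage is support(A and C) minus support(A) * support(C). The tag sets are invented for illustration.

```python
# Each set is the tags of one song (made-up values)
transactions = [
    {"rock", "guitar"},
    {"rock", "guitar", "indie"},
    {"rock", "pop"},
    {"electronic", "dance"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Metrics for the rule {guitar} -> {rock}
support_both = support({"guitar", "rock"})
confidence = support_both / support({"guitar"})
leverage = support_both - support({"guitar"}) * support({"rock"})
print(f"support={support_both}, confidence={confidence}, leverage={leverage}")
```

A leverage near zero means the two tags co-occur about as often as independence would predict, so high-confidence rules with low leverage are usually driven by one very common tag.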

What can you say about your music collection on the basis of these rules?

#### <font color='green'>According to musicnn, my collection consists mostly of rock/alternative/electronic/instrumental music, although it does in fact contain a few classical songs.</font>

Visualize your tags in at least two other ways in addition to the association rule mining. 

In [None]:
plt.figure(figsize=(20,15))
sns.heatmap(ohe_df.corr(),annot=True)
plt.title('Correlation Matrix of tags');

In [None]:
plt.figure(figsize=(16,12))
sns.heatmap(ohe_df_top.corr(),annot=True)
plt.title('Correlation Matrix of tags');

In [None]:
t_list = np.concatenate(tag_list, axis=0)
plt.figure(figsize=(26,6))
sns.histplot(data=t_list);