# Task
Classify songs into genres based on extracted audio features from "/content/features_30_sec.csv" using a multi-class model.

## Load data

### Subtask:
Load the song audio features from "/content/features_30_sec.csv" into a dataframe.


**Reasoning**:
Import the pandas library, load the CSV file into a DataFrame, and display its first five rows to confirm that the file was read correctly.



In [1]:
import pandas as pd

df = pd.read_csv('/content/features_30_sec.csv')
display(df.head())

,filename,length,chroma_stft_mean,chroma_stft_var,rms_mean,rms_var,spectral_centroid_mean,spectral_centroid_var,spectral_bandwidth_mean,spectral_bandwidth_var,...,mfcc16_var,mfcc17_mean,mfcc17_var,mfcc18_mean,mfcc18_var,mfcc19_mean,mfcc19_var,mfcc20_mean,mfcc20_var,label
0,blues.00000.wav,661794,0.350088,0.088757,0.130228,0.002827,1784.16585,129774.064525,2002.44906,85882.761315,...,52.42091,-1.690215,36.524071,-0.408979,41.597103,-2.303523,55.062923,1.221291,46.936035,blues
1,blues.00001.wav,661794,0.340914,0.09498,0.095948,0.002373,1530.176679,375850.073649,2039.036516,213843.755497,...,55.356403,-0.731125,60.314529,0.295073,48.120598,-0.283518,51.10619,0.531217,45.786282,blues
2,blues.00002.wav,661794,0.363637,0.085275,0.17557,0.002746,1552.811865,156467.643368,1747.702312,76254.192257,...,40.598766,-7.729093,47.639427,-1.816407,52.382141,-3.43972,46.63966,-2.231258,30.573025,blues
3,blues.00003.wav,661794,0.404785,0.093999,0.141093,0.006346,1070.106615,184355.942417,1596.412872,166441.494769,...,44.427753,-3.319597,50.206673,0.636965,37.31913,-0.619121,37.259739,-3.407448,31.949339,blues
4,blues.00004.wav,661794,0.308526,0.087841,0.091529,0.002303,1835.004266,343399.939274,1748.172116,88445.209036,...,86.099236,-5.454034,75.269707,-0.916874,53.613918,-4.404827,62.910812,-11.703234,55.19516,blues
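
Beyond eyeballing `df.head()`, a few quick checks catch common loading problems: the row and column counts, missing values, and how many songs each genre has. The sketch below wraps them in a hypothetical helper, `summarize_dataset`, and runs it on a tiny synthetic frame that only mimics a few of the real column names. With the notebook's data you would call `summarize_dataset(df)` instead.

```python
import numpy as np
import pandas as pd


def summarize_dataset(df: pd.DataFrame, label_col: str = "label") -> dict:
    """Basic sanity checks to run right after loading the features CSV."""
    return {
        "n_rows": len(df),
        "n_columns": df.shape[1],
        # Total count of missing cells across all columns
        "n_missing": int(df.isna().sum().sum()),
        # Songs per genre, sorted alphabetically by genre name
        "class_counts": df[label_col].value_counts().sort_index().to_dict(),
    }


# Hypothetical stand-in for the real CSV: 6 songs, 2 genres, 2 random features.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "filename": [f"{g}.{i:05d}.wav" for g in ("blues", "jazz") for i in range(3)],
    "length": 661794,  # scalar is broadcast to every row
    "chroma_stft_mean": rng.random(6),
    "rms_mean": rng.random(6),
    "label": ["blues"] * 3 + ["jazz"] * 3,
})
print(summarize_dataset(demo))
```

The per-genre counts also show whether the classes are balanced, which helps when reading the per-class `support` column in the evaluation step.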


## Prepare data

### Subtask:
Separate the features (X) from the target variable (y), and split the data into training and testing sets.


**Reasoning**:
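Drop the identifier and metadata columns (`filename`, `length`) and the target column (`label`) to form the feature matrix `X`, use `label` as the target `y`, and hold out 20% of the rows as a test set.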



In [2]:
from sklearn.model_selection import train_test_split

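# Features: every column except the file name, the clip length (metadata), and the genre label
X = df.drop(['filename', 'length', 'label'], axis=1)
# Target: the genre label
y = df['label']

# Hold out 20% of the songs for testing; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)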

display(X_train.head())
display(y_train.head())

,chroma_stft_mean,chroma_stft_var,rms_mean,rms_var,spectral_centroid_mean,spectral_centroid_var,spectral_bandwidth_mean,spectral_bandwidth_var,rolloff_mean,rolloff_var,...,mfcc16_mean,mfcc16_var,mfcc17_mean,mfcc17_var,mfcc18_mean,mfcc18_var,mfcc19_mean,mfcc19_var,mfcc20_mean,mfcc20_var
29,0.280357,0.105621,0.124736,0.004812,1648.835169,712342.593798,2189.985865,185296.921351,3759.892114,4161772.0,...,-8.310135,84.208382,-11.384393,98.334,-6.723499,96.627716,-10.013582,74.162971,-14.026128,77.931458
535,0.2646,0.089984,0.091492,0.001231,844.084418,51342.065119,1104.420736,26308.920034,1608.486974,428586.7,...,1.125303,43.192551,-1.974342,35.8419,-5.219293,65.051285,-1.86661,42.781399,-3.503479,53.901234
695,0.529182,0.068875,0.185447,0.00378,2446.267671,257141.784822,2331.010128,55816.09545,5192.807708,836453.6,...,9.323952,32.028889,-6.608163,33.616463,3.498461,48.407642,-3.286584,37.829609,3.655154,31.723753
557,0.234168,0.092644,0.078136,0.002283,1378.524274,168329.629531,1773.676404,86900.504631,2584.908654,889619.1,...,-4.048587,49.173058,-1.498452,73.097794,-3.520071,94.470222,-6.187496,113.834229,-7.340963,90.99482
836,0.434649,0.093606,0.079984,0.003172,1810.952863,654461.45888,2166.280664,182317.147014,4118.229261,2871598.0,...,5.381378,69.86142,0.286342,82.497223,5.01483,57.18903,-0.334739,74.218369,-0.855825,63.519684


,label
29,blues
535,jazz
695,metal
557,jazz
836,reggae
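
The split above is not stratified, which is why per-genre test support in the evaluation ranges from 13 songs (classical, pop) to 27 (country). Passing `stratify=y` to `train_test_split` keeps each genre's proportion the same in the training and test sets. The following is a minimal sketch using a hypothetical helper, `split_features`, and a synthetic balanced frame in place of the real CSV. Rerunning the notebook with stratification would change which songs land in the test set, so the reported scores would no longer match exactly.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_features(df, test_size=0.2, random_state=42):
    """Separate features from the target and make a stratified train/test split."""
    # Same columns the notebook drops: identifier/metadata plus the target itself
    X = df.drop(columns=["filename", "length", "label"])
    y = df["label"]
    # stratify=y keeps each genre's share identical in both splits
    return train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )


# Hypothetical balanced stand-in: 3 genres x 10 songs, 2 random features.
rng = np.random.default_rng(0)
genres = ["blues", "jazz", "metal"]
demo = pd.DataFrame({
    "filename": [f"{g}.{i:05d}.wav" for g in genres for i in range(10)],
    "length": 661794,
    "chroma_stft_mean": rng.random(30),
    "rms_mean": rng.random(30),
    "label": [g for g in genres for _ in range(10)],
})
Xtr, Xte, ytr, yte = split_features(demo)
print(yte.value_counts())
```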


## Train a model

### Subtask:
Train a multi-class classification model on the training data.


**Reasoning**:
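Import `RandomForestClassifier` and fit it on the training data. A random forest handles multi-class targets natively and does not require feature scaling, so the raw audio features can be used as-is.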



In [3]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
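
To see which audio features the trained forest relies on, its impurity-based `feature_importances_` can be paired with the column names. This sketch uses a hypothetical helper, `ranked_importances`, on synthetic data from `make_classification` (two informative features followed by two noise features; the column names are made up for illustration). With the notebook's objects, the equivalent call would be `ranked_importances(model, X_train.columns)`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def ranked_importances(model, feature_names):
    """Pair a fitted forest's impurity-based importances with names, largest first."""
    return pd.Series(model.feature_importances_, index=feature_names).sort_values(
        ascending=False
    )


# Hypothetical stand-in data: shuffle=False puts the 2 informative columns first.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, shuffle=False, random_state=0,
)
names = ["chroma_stft_mean", "rms_mean", "noise_a", "noise_b"]
forest = RandomForestClassifier(random_state=42).fit(X_demo, y_demo)
print(ranked_importances(forest, names))
```

Impurity-based importances are computed on the training data and can be biased toward high-cardinality features; `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.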

## Evaluate the model

### Subtask:
Evaluate the trained model on the testing data and display the classification report.


**Reasoning**:
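Predict genres for `X_test` with the trained model and compare them with `y_test` using scikit-learn's `classification_report`, which lists per-genre precision, recall, F1-score, and support.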



In [4]:
from sklearn.metrics import classification_report

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

       blues       0.65      0.75      0.70        20
   classical       1.00      1.00      1.00        13
     country       0.68      0.56      0.61        27
       disco       0.52      0.57      0.55        21
      hiphop       0.43      0.67      0.53        15
        jazz       0.83      0.91      0.87        22
       metal       0.77      0.92      0.84        25
         pop       0.73      0.62      0.67        13
      reggae       0.50      0.30      0.38        23
        rock       0.47      0.38      0.42        21

    accuracy                           0.66       200
   macro avg       0.66      0.67      0.66       200
weighted avg       0.65      0.66      0.65       200
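
The report shows that reggae (recall 0.30) and rock (recall 0.38) are the weakest genres, but not what they are being mistaken for. A confusion matrix answers that. The sketch below wraps `sklearn.metrics.confusion_matrix` in a hypothetical helper, `labeled_confusion`, that returns a labeled DataFrame; the toy labels are invented for illustration and are not the notebook's predictions. In the notebook you would call `labeled_confusion(y_test, y_pred)`.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix


def labeled_confusion(y_true, y_pred):
    """Confusion matrix as a DataFrame: rows = true genre, columns = predicted genre."""
    labels = sorted(set(y_true) | set(y_pred))
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    return pd.DataFrame(cm, index=labels, columns=labels)


# Hypothetical toy labels, not the notebook's results.
toy_true = ["reggae", "reggae", "reggae", "rock", "rock", "metal"]
toy_pred = ["reggae", "hiphop", "disco", "rock", "metal", "metal"]
print(labeled_confusion(toy_true, toy_pred))
```

Reading along a row shows where a genre's songs end up; large off-diagonal cells point to the genre pairs worth investigating first.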



## Summary:

### Data Analysis Key Findings

*   The data containing audio features and song genres was successfully loaded from the `/content/features_30_sec.csv` file.
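*   The data was split into training (80%) and testing (20%) sets with `random_state=42`, with the `filename`, `length`, and `label` columns excluded from the features.
*   A `RandomForestClassifier` with default hyperparameters (apart from `random_state=42`) was trained on the training data.
*   On the 200-song test set, the model reached an accuracy of 0.66 (macro-averaged F1 0.66). Classical was classified perfectly (F1 1.00), and jazz (0.87) and metal (0.84) also scored well, while reggae (0.38), rock (0.42), hiphop (0.53), and disco (0.55) were the weakest.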

### Insights or Next Steps

*   Investigate the weakest genres, reggae and rock (recall 0.30 and 0.38), to find out which genres they are most often mistaken for.
*   Tune the `RandomForestClassifier` hyperparameters or try alternative classification models. Because each genre has only 13 to 27 test songs, compare candidates with cross-validation rather than a single 200-song split.
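
A minimal sketch of the tuning step, assuming scikit-learn's `GridSearchCV` with stratified 5-fold cross-validation. The grid values are illustrative assumptions, not recommendations, and the data is synthetic; on the real data you would fit on `X_train, y_train` and only then score `search.best_estimator_` once on the test set. Scoring uses macro-averaged F1 so that every genre counts equally, matching the `macro avg` row of the report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Hypothetical stand-in data; in the notebook, pass X_train, y_train instead.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=4, n_classes=3, random_state=0
)

# Illustrative grid only: 2 x 2 x 2 = 8 candidate settings.
param_grid = {
    "n_estimators": [50, 150],
    "max_features": ["sqrt", 0.5],
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="f1_macro",  # average F1 over classes, each class weighted equally
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```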
