# Music Preference Prediction Using Random Forest Model
This dataset was made by our external consultant Alexander Makhratchev. He found about 500 songs he liked and 500 songs he disliked, and downloaded information about them through the Spotify API. There are 16 attribute columns, such as song name, danceability, and speechiness. You can see a sample of the dataset below.


In [1]:
%%bash
head spotify.csv

,artist,album,track_name,track_id,danceability,energy,key,loudness,mode,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,like
0,Tiësto,BLUE (Remixes),BLUE - Mike Williams Remix,10WrTQMhZu2gocF8UB6obr,0.644,0.9209999999999999,11,-3.201,1,0.045,0.000371,0.355,0.5770000000000001,128.015,191733,4,True
1,RICCI,Whistle,Whistle,0s7TF4xqdNcXn8U8cWXrhC,0.7120000000000001,0.934,2,-4.769,1,0.0595,6.22e-06,0.0542,0.53,120.024,196030,4,True
2,Harris & Ford,"Freitag, Samstag","Freitag, Samstag",35WEFAhw47XLju1gu40cjT,0.638,0.961,10,-3.8280000000000003,0,0.121,0.0055899999999999995,0.0663,0.353,138.034,156356,4,True
3,ILLENIUM,Awake,Feel Good,0e0UxWGgjXoYAYUFhJgwji,0.625,0.7070000000000001,2,-4.761,1,0.0337,0.0,0.213,0.479,138.064,248156,4,True
4,TJR,Bounce Generation,Bounce Generation - Radio Edit,3l3wjXneWieRL0yKd4Tihf,0.687,0.998,2,-1.304,1,0.29100000000000004,0.0431,0.32899999999999996,0.129,128.007,152813,4,True
5,Rush,Moving Pictures (2011 Remaster),Tom Sawyer,3QZ7
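
As a quick sanity check on the column layout, here is a minimal sketch that parses the header and two of the rows shown above with Python's standard csv module. The inline `SAMPLE` string stands in for the real file, and the variable names are illustrative only.

```python
import csv
import io

# Header and two rows copied from the spotify.csv sample above.
SAMPLE = '''\
,artist,album,track_name,track_id,danceability,energy,key,loudness,mode,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,like
1,RICCI,Whistle,Whistle,0s7TF4xqdNcXn8U8cWXrhC,0.7120000000000001,0.934,2,-4.769,1,0.0595,6.22e-06,0.0542,0.53,120.024,196030,4,True
2,Harris & Ford,"Freitag, Samstag","Freitag, Samstag",35WEFAhw47XLju1gu40cjT,0.638,0.961,10,-3.8280000000000003,0,0.121,0.0055899999999999995,0.0663,0.353,138.034,156356,4,True
'''

rows = list(csv.reader(io.StringIO(SAMPLE)))
header, records = rows[0], rows[1:]

# The first header cell is empty: it is the unnamed row-index column.
named_attributes = [name for name in header if name and name != "like"]
print(len(header))            # total columns, including index and target
print(len(named_attributes))  # named attribute columns

# The csv module keeps quoted fields such as "Freitag, Samstag" intact.
for record in records:
    assert len(record) == len(header)
    print(record[header.index("album")], record[header.index("like")])
```

The unnamed first column probably explains why btc reports 17 attributes while the description above counts 16: the row index is treated as one more attribute next to the 16 named ones.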

In this notebook, our goal is simply to demonstrate the Random Forest model. With the -f RF option, we can specify the exact type of predictor we would like btc to build.

In [2]:
! btc spotify.csv -f RF --yes


Brainome Table Compiler 0.991
Copyright (c) 2019-2021 Brainome, Inc. All Rights Reserved.
Licensed to:                 Alexander Makhratchev  (Evaluation)
Expiration Date:             2021-04-30   44 days left
Maximum File Size:           30 GB
Maximum Instances:           unlimited
Maximum Attributes:          unlimited
Maximum Classes:             unlimited
Connected to:                daimensions.brainome.ai  (local execution)

Command:
    btc spotify.csv -f RF --yes

Start Time:                 03/17/2021, 04:49 UTC





Pre-training Measurements
Data:
    Input:                      spotify.csv
    Target Column:              like
    Number of instances:       1224
    Number of attributes:        17 out of 17
    Number of classes:            2

Class Balance:                
                            True: 43.38%
                           False: 56.62%

Learnability:
    Best guess accuracy:          56.62%
    Data Sufficiency:            

From the measurements, we can see that the random forest predictor has better generalization and memory equivalent capacity than the decision tree. However, it is still fairly far off from the neural network. The accuracy of the random forest predictor on the validation set is 84.96%, which is about 28 percentage points above the best-guess accuracy of 56.62%.
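
The relationship between the class balance, the best-guess accuracy, and the reported validation accuracy can be checked with a little arithmetic. The sketch below uses only figures that appear in the output and text above; the variable names are illustrative, and the per-class counts are approximations recovered from the rounded percentages.

```python
# Figures copied from the btc output and the text above.
n_instances = 1224
class_balance = {"True": 0.4338, "False": 0.5662}
validation_accuracy = 0.8496

# Approximate per-class counts implied by the reported percentages.
class_counts = {
    label: round(n_instances * share) for label, share in class_balance.items()
}

# Best-guess accuracy: always predict the most common class.
best_guess = max(class_balance.values())

# Absolute gain in percentage points, and gain relative to the baseline.
gain_points = (validation_accuracy - best_guess) * 100
relative_gain = (validation_accuracy - best_guess) / best_guess * 100

print(class_counts)
print(f"best guess: {best_guess:.2%}")
print(f"gain: {gain_points:.2f} points ({relative_gain:.1f}% relative)")
```

Keeping the absolute gain (percentage points) separate from the relative gain avoids ambiguity when describing an improvement over the baseline.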

## Using Attribute Ranking
Now we will use the -rank option to select the attributes that are most strongly correlated with the target class. This will allow us to find the needle in the haystack.

In [3]:
! btc spotify.csv -f RF -rank --yes 


Brainome Table Compiler 0.991

Command:
    btc spotify.csv -f RF -rank --yes

Start Time:                 03/17/2021, 04:49 UTC




Attribute Ranking:
    Columns selected:           artist, , loudness, 
    Risk of coincidental column correlation:    0.0%
    
    Test Accuracy Progression:
                             artist :   72.06%
                                    :   72.63% change   +0.57%
                           loudness :   72.79% change   +0.16%
         



Pre-training Measurements
Data:
    Input:                      spotify

The two most important named columns selected by attribute ranking were artist and loudness. The output also lists an unnamed column, which is likely the row-index column that begins each line of spotify.csv. Surprisingly, the -rank option lowered our validation accuracy to 66.73%. This might suggest that the target class depends on many more columns, and a neural network might be most effective here, as indicated by the measurements.
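
To put the ranking numbers side by side, the following sketch recomputes the per-column changes from the "Test Accuracy Progression" output and compares the validation accuracies quoted in the text. All figures are copied from this notebook; the variable names are illustrative.

```python
# Test accuracies (in percent) from the "Test Accuracy Progression" output,
# in the order the columns were added. The second column has no name.
progression = [("artist", 72.06), ("(unnamed)", 72.63), ("loudness", 72.79)]

# Change contributed by each added column relative to the previous step.
changes = [
    (name, round(acc - prev_acc, 2))
    for (_, prev_acc), (name, acc) in zip(progression, progression[1:])
]
print(changes)

# Validation accuracies quoted in the text, in percent.
all_columns_accuracy = 84.96
ranked_columns_accuracy = 66.73
best_guess_accuracy = 56.62

# How much accuracy was lost by restricting the model to the ranked columns,
# and how far the restricted model still sits above the best guess.
drop_after_ranking = round(all_columns_accuracy - ranked_columns_accuracy, 2)
margin_over_best_guess = round(ranked_columns_accuracy - best_guess_accuracy, 2)
print(drop_after_ranking, margin_over_best_guess)
```

The columns added after artist contribute well under one percentage point each, while restricting the model to the ranked columns costs about 18 points of validation accuracy.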

## Using -ignoreclasses

One of the attributes that BTC identified as important to the target class was the artist. However, we do not want our predictor to use that column, so we pass artist to the -ignoreclasses option.

In [6]:
! btc spotify.csv -f RF --yes -ignoreclasses artist


Brainome Table Compiler 0.991

Command:
    btc spotify.csv -f RF --yes -ignoreclasses artist

Start Time:                 03/17/2021, 04:59 UTC






Pre-training Measurements
Data:
    Input:                      spotify.csv
    Target Column:              like
    Number of instances:       1224
    Number of attributes:        17 out of 17
    Number of classes:            2

Class Balance:                
                            True: 43.38%
                           False: 56.62%

Learnability:
    Best guess accuracy:          56.62%
    Data S

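We get 83.52% accuracy on the validation set, which is almost identical to the first time we ran the compiler on the dataset. Also, our accuracy is similar across classes, because the dataset is fairly balanced (43.38% True versus 56.62% False). Note that the pre-training measurements above still report 17 out of 17 attributes, so it is worth confirming that the artist column was actually excluded from this run.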
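
One way to be certain a column never reaches the predictor is to remove it from the CSV before running the compiler. The sketch below is not part of btc: `drop_columns` is a hypothetical helper written with Python's standard csv module, and the inline `SAMPLE` string (header plus two rows from the sample at the top of this notebook) stands in for the real file.

```python
import csv
import io

# Header and two rows copied from the spotify.csv sample; a real run would
# read the file from disk instead of this inline string.
SAMPLE = '''\
,artist,album,track_name,track_id,danceability,energy,key,loudness,mode,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,like
1,RICCI,Whistle,Whistle,0s7TF4xqdNcXn8U8cWXrhC,0.7120000000000001,0.934,2,-4.769,1,0.0595,6.22e-06,0.0542,0.53,120.024,196030,4,True
3,ILLENIUM,Awake,Feel Good,0e0UxWGgjXoYAYUFhJgwji,0.625,0.7070000000000001,2,-4.761,1,0.0337,0.0,0.213,0.479,138.064,248156,4,True
'''


def drop_columns(csv_text, unwanted):
    """Return csv_text with the named columns removed (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    keep = [i for i, name in enumerate(header) if name not in unwanted]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header[i] for i in keep])
    for row in reader:
        writer.writerow([row[i] for i in keep])
    return out.getvalue()


# "" is the unnamed row-index column; "artist" is the column we want excluded.
filtered = drop_columns(SAMPLE, unwanted={"artist", ""})
print(filtered.splitlines()[0])
```

Dropping the unnamed index column along with artist is optional; it is included here because attribute ranking picked that column up, and a row index is unlikely to be a meaningful feature.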