In [89]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

import warnings
warnings.filterwarnings('ignore')

In [90]:
music = pd.read_csv('music.csv')
music

   age  gender      genre
0   20       1     HipHop
1   23       1     HipHop
2   25       1     HipHop
3   26       1       Jazz
4   29       1       Jazz
5   30       1       Jazz
6   31       1  Classical
7   33       1  Classical
8   37       1  Classical
9   20       0      Dance


In this dataset, there are no missing values or duplicates.
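That claim can be checked directly with pandas. The following is a minimal sketch; the helper name `check_data_quality` is hypothetical, and it relies only on pandas' `isna` and `duplicated` methods.

```python
import pandas as pd

def check_data_quality(df: pd.DataFrame) -> dict:
    """Count missing values per column and fully duplicated rows."""
    return {
        # isna() marks NaN/None cells; summing counts them per column
        "missing_per_column": df.isna().sum().to_dict(),
        # duplicated() flags every row identical to an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
    }

# Usage in the notebook (assumes `music` from the cell above):
# check_data_quality(music)
```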

I am going to split the dataset into two parts: an input set (age and gender) and an output set (genre).
The output set holds the values we want our model to predict.

In [91]:
X = music.drop(columns=['genre'])
X

   age  gender
0   20       1
1   23       1
2   25       1
3   26       1
4   29       1
5   30       1
6   31       1
7   33       1
8   37       1
9   20       0


In [92]:
y = music['genre']
y

0        HipHop
1        HipHop
2        HipHop
3          Jazz
4          Jazz
5          Jazz
6     Classical
7     Classical
8     Classical
9         Dance
10        Dance
11        Dance
12     Acoustic
13     Acoustic
14     Acoustic
15    Classical
16    Classical
17    Classical
Name: genre, dtype: object

### Building a Model with a Decision Tree

In [93]:
music

   age  gender      genre
0   20       1     HipHop
1   23       1     HipHop
2   25       1     HipHop
3   26       1       Jazz
4   29       1       Jazz
5   30       1       Jazz
6   31       1  Classical
7   33       1  Classical
8   37       1  Classical
9   20       0      Dance


In [94]:
model = DecisionTreeClassifier()
model.fit(X, y)

predictions = model.predict([ [21, 1], [22, 0] ])
predictions

array(['HipHop', 'Dance'], dtype=object)

In the cell above, I asked the model which genre a 21-year-old male (gender 1) and a 22-year-old female (gender 0) would most likely listen to.

The predictions follow the patterns in the training data. Males (gender 1) aged 20 to 25 listen to HipHop, so the model predicts HipHop for the 21-year-old male; females (gender 0) in the same age range listen to Dance, so it predicts Dance for the 22-year-old female.
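Because the model was fitted on a DataFrame, scikit-learn (1.0 and later) emits a UserWarning when it is asked to predict on a plain list, since the list carries no column names; that is likely one of the warnings the `warnings.filterwarnings('ignore')` line at the top hides. A minimal sketch of an alternative, assuming the same `age`/`gender` columns and the 1 = male, 0 = female encoding used above: wrap the query in a DataFrame. The helper name `predict_genres` is hypothetical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def predict_genres(fitted_model: DecisionTreeClassifier, people) -> list:
    """Predict a genre for each (age, gender) pair; gender 1 = male, 0 = female."""
    # Use the same column names as the training DataFrame so scikit-learn
    # can match features by name instead of warning about missing names.
    query = pd.DataFrame(people, columns=["age", "gender"])
    return list(fitted_model.predict(query))

# Usage in the notebook (assumes `model` was fitted on X, y above):
# predict_genres(model, [(21, 1), (22, 0)])
```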

### Calculating the Accuracy

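To measure accuracy, we split the dataset into a training set and a test set: the model is fitted on the training set and scored on the test set, which it has not seen during training.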
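With `test_size=0.2`, `train_test_split` rounds the test share up, so the 18-row dataset (indices 0 to 17 in `y`) yields 4 test rows and 14 training rows. The sketch below checks this on a plain list of row indices; the helper name `split_sizes` is hypothetical.

```python
from sklearn.model_selection import train_test_split

def split_sizes(n_samples, test_size=0.2, seed=0):
    """Return (n_train, n_test) for a train_test_split of n_samples rows."""
    rows = list(range(n_samples))
    train_rows, test_rows = train_test_split(rows, test_size=test_size, random_state=seed)
    return len(train_rows), len(test_rows)

print(split_sizes(18))  # → (14, 4): 20% of 18 is 3.6, rounded up to 4 test rows
```

With only 4 test rows, the accuracy can take just five values (0, 0.25, 0.5, 0.75 or 1.0).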

In [95]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model_2 = DecisionTreeClassifier()
model_2.fit(X_train, y_train)

predictions = model_2.predict(X_test)

score = accuracy_score(y_test, predictions)
score

0.5

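The accuracy score changes every time the cell runs because `train_test_split` shuffles the rows randomly on each call (no `random_state` is set), and with only 18 rows the test set holds just 4 samples, so a single misclassified sample moves the score by 0.25. More data would make the estimate more stable.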
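One way to get a steadier, reproducible number is to fix `random_state` and average the accuracy over many seeded splits (a repeated hold-out estimate). This is a sketch of that technique, not the notebook's original method; the helper name `repeated_holdout_accuracy` and the default of 20 repeats are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def repeated_holdout_accuracy(X, y, n_repeats=20, test_size=0.2):
    """Mean and standard deviation of test accuracy over seeded random splits."""
    scores = []
    for seed in range(n_repeats):
        # A fixed seed per repeat makes the whole estimate reproducible.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        clf = DecisionTreeClassifier(random_state=seed)
        clf.fit(X_train, y_train)
        scores.append(accuracy_score(y_test, clf.predict(X_test)))
    return float(np.mean(scores)), float(np.std(scores))

# Usage in the notebook (assumes X and y from the cells above):
# mean_acc, std_acc = repeated_holdout_accuracy(X, y)
```

The standard deviation shows how much a single split's score can swing, which is the instability observed above.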

In [98]:
from sklearn import tree

tree.export_graphviz(model, out_file='music-recommender.dot',
                     feature_names=['age', 'gender'],  # label each split rule with its column name
                     class_names=sorted(y.unique()),   # genre names, sorted to match model.classes_
                     label='all',                      # show gini/samples/value labels in every node
                     rounded=True,                     # draw boxes with rounded corners
                     filled=True)                      # colour each node by its majority class
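The `.dot` file still needs Graphviz to be rendered as an image. As a dependency-free alternative, scikit-learn's `export_text` prints the same split rules as indented plain text. A minimal sketch follows; the wrapper name `tree_rules` is hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

def tree_rules(fitted_tree: DecisionTreeClassifier, feature_names: list) -> str:
    # export_text walks the fitted tree and renders each split and leaf class as text
    return export_text(fitted_tree, feature_names=feature_names)

# Usage in the notebook (assumes `model` was fitted on X, y above):
# print(tree_rules(model, ['age', 'gender']))
```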