# Music Genre Recommender using Decision Trees 

Based on a user's age and gender, we recommend the music genre (and therefore the kind of albums) that a new user with a similar profile would potentially want to buy.

Music.csv assumptions:
- In the gender column of music.csv, 1 is male and 0 is female (matching the `gender` mapping used in the model code)
- Men aged 20 to 25 like HipHop, women aged 20 to 25 like Dance, and so on

Expected output:
- If the model is asked for the music genre of a **21-year-old male** (no such row in music.csv), the expected output is **HipHop**.
- If the model is asked for the music genre of a **36-year-old female** (no such row in music.csv), the expected output is **Classical**.

## 1. Importing the data

In [None]:
import pandas as pd 

music_data = pd.read_csv('music.csv')
music_data

## 2. Cleaning the data

- Removing duplicates
- Removing null values
- Separating the input and output sets



In [None]:
# Input set: a new dataset without the genre column (the question)
X = music_data.drop(columns=['genre'])

# Output set: only the genre column (the answer)
y = music_data['genre']
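The cell above only separates the input and output sets. The first two cleaning steps in the list could be sketched as follows, as a minimal example on a few hypothetical rows (the `raw` DataFrame stands in for `music.csv` and is not its real content; `cleaned`, `X_clean` and `y_clean` are illustrative names):

```python
import pandas as pd

# Hypothetical rows standing in for music.csv (not the real file contents)
raw = pd.DataFrame({
    "age": [20, 20, 23, None, 26],
    "gender": [1, 1, 0, 1, 0],
    "genre": ["HipHop", "HipHop", "Dance", "HipHop", None],
})

# Drop exact duplicate rows, then drop rows with any missing value
cleaned = raw.drop_duplicates().dropna().reset_index(drop=True)

# Separate the input set (age, gender) from the output set (genre)
X_clean = cleaned.drop(columns=["genre"])
y_clean = cleaned["genre"]

print(cleaned)
```

If the real file has no duplicates or missing values, both calls leave the data unchanged, so they are safe to run either way.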

## 3. Split the data into training/test sets

In [40]:
from sklearn.model_selection import train_test_split

# Allocate 20% of the data for testing and unpack the result into input and output sets for training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
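`train_test_split` shuffles the rows before splitting, so each run of the cell above can pick different test rows. A minimal sketch of making the split repeatable with the `random_state` parameter, using hypothetical rows in place of `music.csv` (the seed value 42 and the `demo` data are arbitrary choices, not part of the original notebook):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical rows standing in for music.csv (not the real file contents)
demo = pd.DataFrame({
    "age": [20, 22, 23, 25, 20, 21, 24, 25, 31, 36],
    "gender": [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    "genre": ["HipHop"] * 4 + ["Dance"] * 4 + ["Classical"] * 2,
})
X_demo = demo.drop(columns=["genre"])
y_demo = demo["genre"]

# The same random_state gives the same shuffle, and so the same split
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Compare the row labels of the two test input sets
same_test_rows = split_a[1].index.equals(split_b[1].index)
print(f"Same test rows in both splits: {same_test_rows}")
```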

## 4. Create a model, make predictions and measure the accuracy

Using the [scikit-learn](https://scikit-learn.org/stable/) decision tree classifier model.

In [68]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

gender = {1:"male", 0:"female"}

model = DecisionTreeClassifier()
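# Train the model so it learns patterns in the data from the training input and output sets
model.fit(X_train, y_train)

male_prediction = [21,1]
female_prediction = [22,0]

# Predict the genre for a 21-year-old male and a 22-year-old female.
# Wrapping the rows in a DataFrame with the training column names avoids
# scikit-learn's feature-name warning.
samples = pd.DataFrame([male_prediction, female_prediction], columns=X.columns)
prediction = model.predict(samples)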

# Calculate prediction accuracy
predictions = model.predict(X_test)
score = accuracy_score(y_test, predictions)

print(f"Recommending {prediction[0]} for {gender[male_prediction[1]]} aged {male_prediction[0]}")
print(f"Recommending {prediction[1]} for {gender[female_prediction[1]]} aged {female_prediction[0]}")
print(f"Accuracy {score*100}%")


Recommending HipHop for male aged 21
Recommending Dance for female aged 22
Accuracy 75.0%
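Because the split is random and the dataset is small, the accuracy printed above can change from run to run. One way to get a steadier estimate is to average the score over several splits. The sketch below does this on hypothetical rows (the `demo` data, including the assumption that everyone aged 31 to 37 likes Classical, is made up for illustration and is not the content of `music.csv`):

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical rows standing in for music.csv (not the real file contents)
demo = pd.DataFrame({
    "age": [20, 22, 23, 25, 20, 21, 24, 25, 31, 33, 35, 37, 31, 34, 36, 37],
    "gender": [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
    "genre": ["HipHop"] * 4 + ["Dance"] * 4 + ["Classical"] * 8,
})
X_demo = demo.drop(columns=["genre"])
y_demo = demo["genre"]

scores = []
for seed in range(5):
    # A different seed gives a different train/test split each time
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, y_demo, test_size=0.2, random_state=seed
    )
    clf = DecisionTreeClassifier(random_state=seed).fit(X_tr, y_tr)
    scores.append(accuracy_score(y_te, clf.predict(X_te)))

mean_score = sum(scores) / len(scores)
print(f"Scores per split: {scores}")
print(f"Mean accuracy: {mean_score * 100:.1f}%")
```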


## 5. Visualising a Decision Tree

In [71]:
from sklearn import tree

# Creates a Graphviz .dot file for the decision tree
tree.export_graphviz(model,
                     out_file='music-recommender.dot',
                     # Column names of the input data
                     feature_names=["age", "gender"],
                     # Sorted list of all unique values of the genre column
                     class_names=sorted(y.unique()),
                     label="all",
                     rounded=True,
                     filled=True)
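The `.dot` file needs a Graphviz viewer to render. For a quick look without one, scikit-learn also provides `export_text`, which prints the learned rules as indented text. A minimal sketch, again trained on hypothetical rows rather than the real `music.csv` (the `demo` data and the `tree_model` name are illustrative):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical rows standing in for music.csv (not the real file contents)
demo = pd.DataFrame({
    "age": [20, 22, 23, 25, 20, 21, 24, 25, 31, 33, 35, 37, 31, 34, 36, 37],
    "gender": [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
    "genre": ["HipHop"] * 4 + ["Dance"] * 4 + ["Classical"] * 8,
})

tree_model = DecisionTreeClassifier(random_state=0)
tree_model.fit(demo[["age", "gender"]], demo["genre"])

# Plain-text view of the tree: one line per split, leaves show the predicted class
rules = export_text(tree_model, feature_names=["age", "gender"])
print(rules)
```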