# Machine Learning with Python
## Predicting Music Taste by Age and Gender

### Step 0: Import libraries and Data

* pandas
* DecisionTreeClassifier (for building a simple ML model)

In [11]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
import warnings # used below to silence one known scikit-learn warning

warnings.filterwarnings("ignore", message="X does not have valid feature names, but DecisionTreeClassifier was fitted with feature names")

music_data = pd.read_csv('/kaggle/input/music-data/music.csv')

In [12]:
# View The Data
music_data

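|   | age | gender | genre |
|---|---|---|---|
| 0 | 20 | 1 | HipHop |
| 1 | 23 | 1 | HipHop |
| 2 | 25 | 1 | HipHop |
| 3 | 26 | 1 | Jazz |
| 4 | 29 | 1 | Jazz |
| 5 | 30 | 1 | Jazz |
| 6 | 31 | 1 | Classical |
| 7 | 33 | 1 | Classical |
| 8 | 37 | 1 | Classical |
| 9 | 20 | 0 | Dance |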


### Step 1: Split The Data into Input and Output Data

In [13]:
# X is the input data and y is the output data

X = music_data.drop(columns = ['genre'])
y = music_data['genre']

In [14]:
X

|   | age | gender |
|---|---|---|
| 0 | 20 | 1 |
| 1 | 23 | 1 |
| 2 | 25 | 1 |
| 3 | 26 | 1 |
| 4 | 29 | 1 |
| 5 | 30 | 1 |
| 6 | 31 | 1 |
| 7 | 33 | 1 |
| 8 | 37 | 1 |
| 9 | 20 | 0 |


In [15]:
y

0        HipHop
1        HipHop
2        HipHop
3          Jazz
4          Jazz
5          Jazz
6     Classical
7     Classical
8     Classical
9         Dance
10        Dance
11        Dance
12     Acoustic
13     Acoustic
14     Acoustic
15    Classical
16    Classical
17    Classical
Name: genre, dtype: object
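
The split into input and output data can be checked on a small, self-contained sample. The sketch below is only an illustration: it types in the ten rows displayed above by hand instead of reading `music.csv`, and the names `music_sample`, `X_sample`, and `y_sample` are made up for this example.

```python
import pandas as pd

# The ten rows displayed above, typed in by hand.
# This is a stand-in for music.csv, which contains more rows.
music_sample = pd.DataFrame({
    "age":    [20, 23, 25, 26, 29, 30, 31, 33, 37, 20],
    "gender": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "genre":  ["HipHop", "HipHop", "HipHop", "Jazz", "Jazz", "Jazz",
               "Classical", "Classical", "Classical", "Dance"],
})

# Input data: every column except the one we want to predict.
X_sample = music_sample.drop(columns=["genre"])
# Output data: the single target column.
y_sample = music_sample["genre"]

print(X_sample.shape)          # (number of rows, number of feature columns)
print(list(X_sample.columns))  # names of the feature columns
print(y_sample.shape)          # one label per row
```

`drop` returns a new DataFrame, so `music_sample` itself still contains the `genre` column afterwards.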


### Step 2: Building a Machine Learning Model with scikit-learn

In [16]:
# Train the model, then test it with sample entries

model = DecisionTreeClassifier()
model.fit(X, y)


# Which genres would a 21-year-old male (gender = 1) and a 22-year-old female (gender = 0) listen to?

predictions = model.predict([ [21, 1], [22, 0] ])
predictions

array(['HipHop', 'Dance'], dtype=object)
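
The warning silenced in Step 0 appears because the model was fitted on a DataFrame with column names but is asked to predict from a plain list. One possible alternative, sketched below, is to pass the new entries as a DataFrame with the same columns. The sketch assumes a hand-typed copy of the ten rows displayed earlier rather than the full `music.csv`; `random_state=0` and the variable names are additions for this example, not part of the notebook.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hand-typed copy of the ten rows displayed earlier (not the full music.csv).
sample = pd.DataFrame({
    "age":    [20, 23, 25, 26, 29, 30, 31, 33, 37, 20],
    "gender": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "genre":  ["HipHop", "HipHop", "HipHop", "Jazz", "Jazz", "Jazz",
               "Classical", "Classical", "Classical", "Dance"],
})
X_sample = sample[["age", "gender"]]
y_sample = sample["genre"]

# random_state is an assumption added here so repeated runs build the same tree.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_sample, y_sample)

# New entries use the same column names, in the same order, as the training data,
# so scikit-learn has no feature-name mismatch to warn about.
new_listeners = pd.DataFrame([[21, 1], [22, 0]], columns=X_sample.columns)
sample_predictions = clf.predict(new_listeners)
print(sample_predictions)
```

With this approach the `warnings.filterwarnings` call is no longer needed for the prediction step.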

### Step 2 (continued): Training, Testing, and Prediction with the Model

* Using train_test_split from sklearn.model_selection

In [75]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

music_data = pd.read_csv('/kaggle/input/music-data/music.csv')
X = music_data.drop(columns = ['genre'])
y = music_data['genre']


# train_test_split divides the input and output data into training and test sets:
# 80% of the rows go to training and 20% to testing (test_size = 0.2)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)


model = DecisionTreeClassifier()
# model.fit(X, y)
model.fit(X_train, y_train)
# predictions = model.predict([ [21, 1], [22, 0] ])
predictions = model.predict(X_test)
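
To see what the 80/20 split produces, the sketch below applies `train_test_split` to placeholder arrays with 18 rows, the same number of labels that `y` shows above. The arrays are not the music data, and `random_state=42` is an assumption added so the split is repeatable; the notebook itself does not set it.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with 18 rows (same length as y above); the values are arbitrary.
X_demo = np.arange(36).reshape(18, 2)
y_demo = np.arange(18)

# test_size=0.2 reserves 20% of the rows for testing, rounded up to a whole row.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(len(X_tr), len(X_te))

# Calling it again with the same random_state reproduces the same split.
_, X_te_again, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(np.array_equal(X_te, X_te_again))
```

Without `random_state`, every call shuffles the rows differently, so the training and test sets change from run to run.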


### Measuring the Accuracy of our Model

* Use accuracy_score to measure how accurate the predictions are.

* If you run the following code several times, you get different accuracy values (between 75% and 100% in these runs), because train_test_split shuffles the data randomly on every call.
* With a test set this small, each score is only a rough estimate of how well the model predicts.

In [77]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)


model = DecisionTreeClassifier()
# model.fit(X, y)
model.fit(X_train, y_train)
# predictions = model.predict([ [21, 1], [22, 0] ])
predictions = model.predict(X_test)

score = accuracy_score(y_test, predictions)
score

1.0
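
The run-to-run variation can be made visible by repeating the split with different seeds and collecting the scores. This is a rough sketch under stated assumptions: it uses a hand-typed copy of the ten rows displayed earlier instead of `music.csv`, and the seed loop and `random_state` arguments are additions for illustration. Because accuracy is the fraction of test rows predicted correctly, a test set of n rows can only yield multiples of 1/n (multiples of 0.25 for the 4 test rows taken from 18, and of 0.5 for the 2 test rows in this sample).

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hand-typed copy of the ten rows displayed earlier (not the full music.csv).
sample = pd.DataFrame({
    "age":    [20, 23, 25, 26, 29, 30, 31, 33, 37, 20],
    "gender": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "genre":  ["HipHop", "HipHop", "HipHop", "Jazz", "Jazz", "Jazz",
               "Classical", "Classical", "Classical", "Dance"],
})
X_s = sample[["age", "gender"]]
y_s = sample["genre"]

scores = []
for seed in range(20):
    # A different seed gives a different random train/test split.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_s, y_s, test_size=0.2, random_state=seed
    )
    clf = DecisionTreeClassifier(random_state=seed).fit(X_tr, y_tr)
    scores.append(accuracy_score(y_te, clf.predict(X_te)))

print(sorted(set(scores)))        # the distinct accuracy values observed
print(sum(scores) / len(scores))  # mean accuracy over all splits
```

Averaging over many splits gives a steadier picture than any single run.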

### Step 3: Model Persistence

* Once the model has been trained and tested and its accuracy measured, the next step is to save the model to a file so that it behaves consistently every time we use it.
* We don't have to train a new model each time; we can load the saved model and reuse it.

In [79]:
# To achieve this we import joblib

import joblib

joblib.dump(model, 'music-recommender.joblib')

['music-recommender.joblib']

In [80]:
# You can now load the saved joblib file and make predictions without retraining

model = joblib.load('music-recommender.joblib')
predictions = model.predict([ [21, 1] ])
predictions

array(['HipHop'], dtype=object)
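
A quick way to confirm that persistence worked is to check that the reloaded model gives the same predictions as the one in memory. The sketch below assumes a hand-typed copy of the ten rows displayed earlier and writes to a temporary directory so it leaves no file behind; the variable names and `random_state=0` are additions for this example.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hand-typed copy of the ten rows displayed earlier (not the full music.csv).
sample = pd.DataFrame({
    "age":    [20, 23, 25, 26, 29, 30, 31, 33, 37, 20],
    "gender": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "genre":  ["HipHop", "HipHop", "HipHop", "Jazz", "Jazz", "Jazz",
               "Classical", "Classical", "Classical", "Dance"],
})
X_s = sample[["age", "gender"]]
y_s = sample["genre"]

clf = DecisionTreeClassifier(random_state=0).fit(X_s, y_s)

# Save the fitted model and load it back from a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "music-recommender.joblib")
    joblib.dump(clf, model_path)
    restored = joblib.load(model_path)

# The restored model should agree with the original on every row.
same = bool((clf.predict(X_s) == restored.predict(X_s)).all())
print(same)
```

Loading a joblib file runs pickled code, so only load model files from sources you trust.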

### Step 4: Visualizing Decision Trees
* Import the tree module from scikit-learn


In [81]:
from sklearn import tree

tree.export_graphviz(model, out_file = 'music-recommender.dot', 
                     feature_names = ['age', 'gender'],
                     class_names = sorted(y.unique()),
                     label = 'all',
                     rounded = True,
                     filled = True)
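
If Graphviz or an online viewer is not at hand, scikit-learn can also print the fitted tree as plain text, or return the DOT source as a string instead of writing a file. The sketch below shows both on a hand-typed copy of the ten rows displayed earlier; it is an illustration under that assumption, not the tree from the full dataset.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_graphviz, export_text

# Hand-typed copy of the ten rows displayed earlier (not the full music.csv).
sample = pd.DataFrame({
    "age":    [20, 23, 25, 26, 29, 30, 31, 33, 37, 20],
    "gender": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "genre":  ["HipHop", "HipHop", "HipHop", "Jazz", "Jazz", "Jazz",
               "Classical", "Classical", "Classical", "Dance"],
})
clf = DecisionTreeClassifier(random_state=0)
clf.fit(sample[["age", "gender"]], sample["genre"])

# Plain-text view of the decision rules, readable directly in the notebook.
rules = export_text(clf, feature_names=["age", "gender"])
print(rules)

# With out_file=None, export_graphviz returns the DOT source as a string.
dot_source = export_graphviz(
    clf,
    out_file=None,
    feature_names=["age", "gender"],
    class_names=list(clf.classes_),  # class labels in the order the model uses
    rounded=True,
    filled=True,
)
print(dot_source[:60])
```

The string in `dot_source` is the same kind of text that ends up in the `.dot` file, so it can be pasted into a Graphviz viewer as well.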

### Visual Representation of the Model

![Graph_Viz](https://github.com/user-attachments/assets/2fed9b16-f390-4082-9486-f48f2410fd94)


### Open the generated .dot file, copy its contents, and paste them into the editor at the link below to visualize the model

[View Graph Visualization Here:](https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%0A%20%20subgraph%20cluster_0%20%7B%0A%20%20%20%20style%3Dfilled%3B%0A%20%20%20%20color%3Dlightgrey%3B%0A%20%20%20%20node%20%5Bstyle%3Dfilled%2Ccolor%3Dwhite%5D%3B%0A%20%20%20%20a0%20-%3E%20a1%20-%3E%20a2%20-%3E%20a3%3B%0A%20%20%20%20label%20%3D%20%22process%20%231%22%3B%0A%20%20%7D%0A%0A%20%20subgraph%20cluster_1%20%7B%0A%20%20%20%20node%20%5Bstyle%3Dfilled%5D%3B%0A%20%20%20%20b0%20-%3E%20b1%20-%3E%20b2%20-%3E%20b3%3B%0A%20%20%20%20label%20%3D%20%22process%20%232%22%3B%0A%20%20%20%20color%3Dblue%0A%20%20%7D%0A%20%20start%20-%3E%20a0%3B%0A%20%20start%20-%3E%20b0%3B%0A%20%20a1%20-%3E%20b3%3B%0A%20%20b2%20-%3E%20a3%3B%0A%20%20a3%20-%3E%20a0%3B%0A%20%20a3%20-%3E%20end%3B%0A%20%20b3%20-%3E%20end%3B%0A%0A%20%20start%20%5Bshape%3DMdiamond%5D%3B%0A%20%20end%20%5Bshape%3DMsquare%5D%3B%0A%7D)