## Steps for an ML project

1. Import Data
2. Clean Data
3. Split Data into Training/Test Sets
4. Create a Model
5. Train a Model
6. Make Predictions
7. Evaluate and Improve


## Import Data

In [None]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

music_data = pd.read_csv('/content/music_data_set.csv')

## Clean Data

The data in [music_data_set.csv](./music_data_set.csv) is already clean, so no cleaning is needed here. The next step is to split the data.
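For a data set that has not been cleaned yet, a quick check for missing values and duplicate rows is a common first step. The following is a minimal sketch on a small made-up DataFrame, not on the real file; the column names `age`, `gender`, and `genre` follow the ones used in this notebook, and the rows themselves are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for music_data_set.csv (rows are made up);
# column names follow the ones used elsewhere in this notebook.
sample = pd.DataFrame({
    "age": [20, 23, 25, None, 23],
    "gender": [1, 1, 0, 0, 1],
    "genre": ["Electronic", "Electronic", "Classical", "Classical", "Electronic"],
})

# Count missing values per column and fully duplicated rows
missing_per_column = sample.isna().sum()
duplicate_rows = int(sample.duplicated().sum())

# Drop rows with missing values, then drop exact duplicates
cleaned = sample.dropna().drop_duplicates().reset_index(drop=True)

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print(cleaned)
```

Whether dropping rows is appropriate depends on the data; filling missing values is another option.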

## Split Data into Training/Test Sets

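Split the data into two sets: an input set and an output set. (The split into training and test sets happens later, when calculating the model's accuracy.)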

- `X` is the input set
- `y` is the output set

In [None]:
# input set
X = music_data.drop(columns=['genre'])

# output set
y = music_data['genre']

## Create a model

Choose a model for the problem. In this case it is the decision tree algorithm, `DecisionTreeClassifier` from scikit-learn.

In [None]:
model = DecisionTreeClassifier()
model.fit(X,y)

## Train a model

The model was already trained in the previous step by calling `model.fit(X, y)`.

## Make Predictions

In [45]:
# Predicting for a 21-year-old male and a 22-year-old female (each row is [age, gender])

prediction = model.predict([ [21,1], [22,0] ])

prediction



array(['Electronic', 'Classical'], dtype=object)
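When a model is fitted on a DataFrame, recent scikit-learn versions may emit a warning if `predict` is then given a plain list, because the feature names are missing. One way to avoid this is to pass the new rows as a DataFrame with the same column names. The sketch below uses a small made-up training set rather than the real data; the rows and the name `new_listeners` are hypothetical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data with the same column layout as X
train_inputs = pd.DataFrame({
    "age": [21, 22, 30, 31],
    "gender": [1, 0, 1, 0],
})
train_labels = ["Electronic", "Classical", "Electronic", "Classical"]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(train_inputs, train_labels)

# New rows carry the same column names as the training data
new_listeners = pd.DataFrame([[21, 1], [22, 0]], columns=["age", "gender"])
pred = clf.predict(new_listeners)
print(pred)
```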

## Calculating Model accuracy

In [40]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Splitting data into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# X_train, X_test are the input sets for training & testing
# y_train, y_test are the output sets

# passing only the training data set
model.fit(X_train, y_train)

# Passing testing data set
predictions = model.predict(X_test)

# Calculating accuracy score
score = accuracy_score(y_test, predictions)

score

0.0
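Because `train_test_split` shuffles the rows randomly, the score from a single split can change from run to run, especially on a small data set. Cross-validation averages the score over several splits and gives a steadier estimate. The following is a minimal sketch using `cross_val_score` on made-up data; the rows, the choice of `cv=3`, and `random_state=0` are assumptions for illustration only.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data, not the real music data set
toy = pd.DataFrame({
    "age": [20, 21, 22, 23, 24, 25, 30, 31, 32, 33, 34, 35],
    "gender": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "genre": ["Electronic"] * 6 + ["Classical"] * 6,
})
X_toy = toy.drop(columns=["genre"])
y_toy = toy["genre"]

# 3-fold cross-validation: every row is used for testing exactly once
scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), X_toy, y_toy, cv=3
)

print(scores)
print("mean accuracy:", scores.mean())
```

Passing `random_state` to `train_test_split` is another way to make a single split reproducible between runs.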

## Persisting Models

In [43]:
import joblib

# Persisting the model
joblib.dump(model,'music_recommender.joblib')

['music_recommender.joblib']

In [47]:
# loading the persisted model
model = joblib.load('/content/music_recommender.joblib')

# calculating predictions based on loaded model
prediction = model.predict([ [21,1], [22,0] ])

prediction



array(['Electronic', 'Classical'], dtype=object)
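A quick way to check that persisting worked is to compare the predictions of the original model with those of the reloaded one. The sketch below does this in a temporary directory with a small made-up training set; the file name `demo_model.joblib` and the rows are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training rows in the form [age, gender]
X_demo = [[21, 1], [22, 0], [30, 1], [31, 0]]
y_demo = ["Electronic", "Classical", "Electronic", "Classical"]

trained = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# Save to a temporary file and load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.joblib")
    joblib.dump(trained, path)
    restored = joblib.load(path)

# The reloaded model should predict exactly what the original predicts
same_predictions = list(trained.predict(X_demo)) == list(restored.predict(X_demo))
print(same_predictions)
```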

## Visualizing Decision Trees

In [51]:
from sklearn import tree

tree.export_graphviz(
    model,
    out_file='music_recommender.dot',
    feature_names=['age','gender'],
    class_names=sorted(y.unique()),
    label='all',
    rounded = True,
    filled = True
    )

The decision tree is exported in Graphviz DOT format to the output file `music_recommender.dot`.

## Function Arguments Explanation

- `out_file` The name of the output file.

- `feature_names=['age','gender']` The feature names to show in the DOT output.

- `class_names=sorted(y.unique())` The sorted unique genre values (music types), used as class names.

- `label='all'` Display the labels in every box.

- `rounded = True` Draw the boxes with rounded corners.

- `filled = True` Fill each box with a color that indicates its majority class.
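If Graphviz is not available to render the `.dot` file, scikit-learn's `export_text` can print the same tree structure as plain text. The following is a minimal sketch on a small made-up training set, not the model trained above; the rows are hypothetical and only the feature names match this notebook.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training rows in the form [age, gender]
X_demo = [[21, 1], [22, 0], [30, 1], [31, 0]]
y_demo = ["Electronic", "Classical", "Electronic", "Classical"]

clf = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# One line per node; leaves show the predicted class
text_tree = export_text(clf, feature_names=["age", "gender"])
print(text_tree)
```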