## Steps for an ML project

1. Import Data
2. Clean Data
3. Split Data into Training/Test Sets
4. Create a Model
5. Train a Model
6. Make Predictions
7. Evaluate and Improve


## Import Data

In [30]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

music_data = pd.read_csv('music_data_set.csv')

## Clean Data

The data in [music_data_set.csv](./music_data_set.csv) is already clean, so no cleaning is needed here. The next step is to split the data.
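
One way to confirm that a data set needs no cleaning is to count missing values and duplicate rows. The sketch below uses a small hand-made DataFrame as a stand-in for `music_data`; the column names `age` and `gender` and all values are assumptions for illustration, not taken from the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for music_data; column names and values are assumed.
sample = pd.DataFrame({
    "age": [20, 23, 25, 26, 29, 30],
    "gender": [1, 1, 1, 0, 0, 0],
    "genre": ["HipHop", "HipHop", "Jazz", "Dance", "Dance", "Acoustic"],
})

# Count missing values per column; all zeros means nothing to fill in or drop.
missing_per_column = sample.isna().sum()

# Count rows that exactly repeat an earlier row.
duplicate_rows = int(sample.duplicated().sum())

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
```

If either count were non-zero, `dropna()` or `drop_duplicates()` would be the usual starting points.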

## Split Data into Training/Test Sets

Before splitting into training and test sets, separate the data into two sets: an input set (the features) and an output set (the labels to predict).

- `X` is the input set
- `y` is the output set

In [31]:
# input set
X = music_data.drop(columns=['genre'])

# output set
y = music_data['genre']
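
The separation above can be checked on a small example. The following sketch uses a hand-made DataFrame in place of `music_data`; the `age` and `gender` column names and all values are assumptions for illustration. It shows that `drop(columns=...)` returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical stand-in for music_data; column names and values are assumed.
sample = pd.DataFrame({
    "age": [20, 23, 25, 26],
    "gender": [1, 1, 0, 0],
    "genre": ["HipHop", "HipHop", "Dance", "Dance"],
})

# Input set: every column except the label column.
X_sample = sample.drop(columns=["genre"])

# Output set: the label column alone, as a Series.
y_sample = sample["genre"]

print(list(X_sample.columns))       # feature columns only
print(X_sample.shape, y_sample.shape)
print("genre" in sample.columns)    # the original DataFrame still has the label
```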

## Create a model

Choose a model for the problem. This example uses a decision tree, implemented as `DecisionTreeClassifier` in scikit-learn.

In [32]:
# Create the decision tree classifier
model = DecisionTreeClassifier()
# Train it on the full input and output sets
model.fit(X, y)

## Train a model

The model was already trained in the previous step: the `fit` call builds the decision tree from the input set `X` and the output set `y`.
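
The difference between a created model and a trained one can be made visible with scikit-learn's `check_is_fitted` helper. This is only a sketch on hypothetical data: the `age` and `gender` columns, the values, and the names `X_demo`, `y_demo`, and `demo_model` are assumptions for illustration.

```python
import pandas as pd
from sklearn.exceptions import NotFittedError
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import check_is_fitted

# Hypothetical training data; column names and values are assumed.
X_demo = pd.DataFrame({
    "age": [20, 25, 30, 21, 26, 31],
    "gender": [1, 1, 1, 0, 0, 0],
})
y_demo = pd.Series(
    ["HipHop", "Jazz", "Classical", "Dance", "Acoustic", "Classical"],
    name="genre",
)

demo_model = DecisionTreeClassifier()

# Before fit() the estimator holds no learned tree, so the check raises.
try:
    check_is_fitted(demo_model)
    fitted_before = True
except NotFittedError:
    fitted_before = False

demo_model.fit(X_demo, y_demo)

# After fit() the learned class labels are stored on the estimator.
learned_classes = list(demo_model.classes_)

print("fitted before fit():", fitted_before)
print("classes after fit():", learned_classes)
```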

## Make Predictions

In [33]:
# Predict the genre for a 21-year-old male ([21, 1]) and a 22-year-old female ([22, 0])

prediction = model.predict([[21, 1], [22, 0]])

prediction



array(['HipHop', 'Classical'], dtype=object)
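
Because the model was fitted on a DataFrame, recent scikit-learn versions warn when `predict` receives a plain list without column names. One way to avoid this, sketched below, is to pass the new samples as a DataFrame with the same columns as the training input. The column names `age` and `gender` and the training rows are assumptions for illustration; the encoding (1 for male, 0 for female) follows the comment in the cell above.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data; column names and values are assumed.
X_demo = pd.DataFrame({
    "age": [20, 23, 25, 20, 23, 25],
    "gender": [1, 1, 1, 0, 0, 0],  # 1 = male, 0 = female
})
y_demo = pd.Series(["HipHop"] * 3 + ["Dance"] * 3, name="genre")

demo_model = DecisionTreeClassifier(random_state=0)
demo_model.fit(X_demo, y_demo)

# New samples carry the same column names, in the same order, as X_demo.
new_listeners = pd.DataFrame({"age": [21, 22], "gender": [1, 0]})
demo_predictions = demo_model.predict(new_listeners)

print(demo_predictions)
```

In this toy data the genre depends only on `gender`, so the two samples come out as `HipHop` and `Dance`.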

## Calculate Model Accuracy

In [38]:
# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# X_train, X_test are the input sets for training and testing
# y_train, y_test are the corresponding output sets

# Retrain the model on the training set only
model.fit(X_train, y_train)

# Predict genres for the held-out test set
predictions = model.predict(X_test)

# Accuracy score: the fraction of test predictions that match the true labels
score = accuracy_score(y_test, predictions)

score

0.0
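
`train_test_split` shuffles the rows at random, so the score above changes from run to run, and with only a few test rows a single split can land anywhere between 0 and 1. A minimal sketch of two common remedies follows: fixing `random_state` to get a reproducible, stratified split, and averaging over several folds with `cross_val_score`. The data set here is synthetic; its column names, age bands, and the `assumed_genre` rule are assumptions for illustration only.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier


def assumed_genre(age, gender):
    """Hypothetical labelling rule used only to generate demo data."""
    if age > 30:
        return "Classical"
    if gender == 1:
        return "HipHop" if age < 26 else "Jazz"
    return "Dance" if age < 26 else "Acoustic"


# Synthetic stand-in for music_data: ages 20-37 for both gender codes.
rows = [(age, gender) for gender in (0, 1) for age in range(20, 38)]
X_demo = pd.DataFrame(rows, columns=["age", "gender"])
y_demo = pd.Series([assumed_genre(a, g) for a, g in rows], name="genre")

# Reproducible split that keeps the genre proportions in both parts.
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

demo_model = DecisionTreeClassifier(random_state=42)
demo_model.fit(X_train_demo, y_train_demo)
split_score = accuracy_score(y_test_demo, demo_model.predict(X_test_demo))

# Three-fold cross-validation: every row is used for testing exactly once.
cv_scores = cross_val_score(
    DecisionTreeClassifier(random_state=42), X_demo, y_demo, cv=3
)

print("single split:", split_score)
print("cross-validation mean:", cv_scores.mean())
```

The cross-validation mean is usually the steadier of the two numbers, because it does not depend on which rows happen to fall into one test set.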