# Machine Learning Hello World with Python!

ML: "Field of study that gives computers the ability to learn without being explicitly programmed" - commonly attributed to Arthur Samuel (1959)

## Applications
- Self-Driving Cars
- Robotics
- Language Processing
- Vision Processing
- Forecasting stock market trends


## Machine Learning Steps

1. Import the Data
2. Clean the Data
3. Split the Data into Training / Test Sets
4. Create a Model
5. Train the Model
6. Make Predictions
7. Evaluate and Improve
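
The seven steps above can be sketched end to end in a few lines. This is a minimal illustration rather than the notebook's own pipeline: the `toy` table, its genre labels `A` to `D`, and the `random_state` values are made up so the block runs without any CSV file.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Import the data (a made-up in-memory table stands in for a CSV file)
toy = pd.DataFrame({
    "age":    [20, 22, 24, 26, 28, 30, 21, 23, 25, 27, 29, 31, None],
    "gender": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1],
    "genre":  ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "A"],
})

# 2. Clean the data: drop rows with missing values and exact duplicates
toy = toy.dropna().drop_duplicates()

# 3. Split the data into training and test sets (25% held out)
X = toy.drop(columns=["genre"])
y = toy["genre"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 4. Create a model
model = DecisionTreeClassifier(random_state=0)

# 5. Train the model
model.fit(X_train, y_train)

# 6. Make predictions on the held-out rows
predictions = model.predict(X_test)

# 7. Evaluate: fraction of held-out rows predicted correctly
score = accuracy_score(y_test, predictions)
print(score)
```

Step 7 usually feeds back into steps 2 to 5: if the score is poor, the data is cleaned further or a different model is tried.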

## Tools
- [Jupyter](https://jupyter.org) (notebook environment)
- [scikit-learn](https://scikit-learn.org/stable/index.html) (a widely used Python ML library)

## References

Datasets:

- videogamesales: https://www.kaggle.com/gregorut/videogamesales
- music.csv: https://programmingwithmosh.com/wp-content/uploads/2020/09/music.csv.zip

Tutorial:
- https://www.youtube.com/watch?v=7eh4d6sabA0&t=139s

In [8]:
!pip install joblib
!pip install mlxtend




In [10]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import tree

import joblib
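# `from sklearn.externals import joblib` raises an ImportError on recent
# scikit-learn releases, which no longer bundle joblib, so the standalone
# package is imported above instead.
import sys
# Alias the old module path so libraries that still import it keep working
sys.modules['sklearn.externals.joblib'] = joblib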
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

## Import the Data with Pandas

In [16]:
# '/content/drive/MyDrive/Colab Notebooks/vgsales.csv'
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/vgsales.csv')
df.head()

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.0 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.0 | 31.37 |


In [17]:
df.shape

(16598, 11)

In [18]:
df.describe()

| | Rank | Year | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|
| count | 16598.0 | 16327.0 | 16598.0 | 16598.0 | 16598.0 | 16598.0 | 16598.0 |
| mean | 8300.605254 | 2006.406443 | 0.264667 | 0.146652 | 0.077782 | 0.048063 | 0.537441 |
| std | 4791.853933 | 5.828981 | 0.816683 | 0.505351 | 0.309291 | 0.188588 | 1.555028 |
| min | 1.0 | 1980.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 |
| 25% | 4151.25 | 2003.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.06 |
| 50% | 8300.5 | 2007.0 | 0.08 | 0.02 | 0.0 | 0.01 | 0.17 |
| 75% | 12449.75 | 2010.0 | 0.24 | 0.11 | 0.04 | 0.04 | 0.47 |
| max | 16600.0 | 2020.0 | 41.49 | 29.02 | 10.22 | 10.57 | 82.74 |
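
In the summary above, `Year` has a count of 16327 against 16598 rows, so 271 rows have no year. The sketch below shows one way to find and handle such gaps as part of the "Clean the Data" step. The `games` table and its titles are made up for illustration, and dropping the incomplete rows is only one possible choice (filling them in is another).

```python
import numpy as np
import pandas as pd

# Made-up stand-in for vgsales.csv with two missing years
games = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],  # hypothetical titles
    "Year": [2006.0, np.nan, 2008.0, np.nan],
    "Global_Sales": [1.5, 0.3, 0.8, 0.1],
})

# Count missing values per column
missing_per_column = games.isna().sum()
print(missing_per_column)

# Drop the rows without a year, then store the year as an integer
cleaned = games.dropna(subset=["Year"]).copy()
cleaned["Year"] = cleaned["Year"].astype(int)
print(cleaned)
```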


In [19]:
df.values

array([[1, 'Wii Sports', 'Wii', ..., 3.77, 8.46, 82.74],
       [2, 'Super Mario Bros.', 'NES', ..., 6.81, 0.77, 40.24],
       [3, 'Mario Kart Wii', 'Wii', ..., 3.79, 3.31, 35.82],
       ...,
       [16598, 'SCORE International Baja 1000: The Official Game', 'PS2',
        ..., 0.0, 0.0, 0.01],
       [16599, 'Know How 2', 'DS', ..., 0.0, 0.0, 0.01],
       [16600, 'Spirits & Spells', 'GBA', ..., 0.0, 0.0, 0.01]],
      dtype=object)

## Project: Music Player Recommendation

In [33]:
# Import the data & cleaning
# music_data = pd.read_csv('music.csv')
music_data = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/music.csv')
music_data.head()

| | age | gender | genre |
|---|---|---|---|
| 0 | 20 | 1 | HipHop |
| 1 | 23 | 1 | HipHop |
| 2 | 25 | 1 | HipHop |
| 3 | 26 | 1 | Jazz |
| 4 | 29 | 1 | Jazz |


In [38]:
# music_data.values

In [39]:
# preparing the training data
X = music_data.drop(columns=['genre'])
y = music_data['genre']

print("X:\n", X)
# print("y:\n", y)

X:
     age  gender
0    20       1
1    23       1
2    25       1
3    26       1
4    29       1
5    30       1
6    31       1
7    33       1
8    37       1
9    20       0
10   21       0
11   25       0
12   26       0
13   27       0
14   30       0
15   31       0
16   34       0
17   35       0


In [35]:
# create a decision tree model
model = DecisionTreeClassifier()
# train/fit: find patterns in the data
## .values passes plain NumPy arrays (no column names) to fit(), which
## matches the plain lists passed to predict() in the next cell and avoids
## scikit-learn's feature-name mismatch warning
model.fit(X.values, y.values)
model

In [36]:
# prediction
predictions = model.predict([[21, 1], [22, 0]])
print(predictions)

['HipHop' 'Dance']
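
The cells above fit on `X.values` and predict on plain lists, so neither side carries column names. An alternative sketch is to keep the column names on both sides by wrapping new samples in a DataFrame. The `toy` table, the labels `A` to `D`, and the name `new_listeners` are made up for illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up data shaped like music.csv
toy = pd.DataFrame({
    "age":    [20, 25, 30, 20, 25, 30],
    "gender": [1, 1, 1, 0, 0, 0],
    "genre":  ["A", "A", "B", "C", "C", "D"],
})
X = toy[["age", "gender"]]
y = toy["genre"]

# Fit on the DataFrame itself, so the model remembers the column names
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# New samples use the same columns, in the same order, as the training data
new_listeners = pd.DataFrame([[21, 1], [22, 0]], columns=X.columns)
predictions = model.predict(new_listeners)
print(predictions)
```

Either approach works; what matters is that training and prediction inputs are consistent.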


## Calculating Model Accuracy to Compare Models

In [43]:
# '/content/drive/MyDrive/Colab Notebooks/music.csv'
music_data = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/music.csv')

# music_data.shape
# music_data
X = music_data.drop(columns=['genre'])
y = music_data['genre']
# split the data so the model is evaluated on rows it has not seen (20% held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# print("X_train", X_train)
# print("X_test", X_test)
# print("y_train", y_train)
# print("y_test", y_test)

In [47]:
# building the model
model = DecisionTreeClassifier()
# train (fit : find pattern in data)
model.fit(X_train, y_train)
# predict
predictions = model.predict(X_test)
# test the model accuracy
score = accuracy_score(y_test, predictions)

# accuracy is the fraction of correct predictions, in [0, 1] (it is not R^2)
# The score changes from run to run because the train/test split is random
score

1.0
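
Because the split is random, a single score such as the one above can be misleading on a dataset this small. The sketch below shows two common remedies: fixing `random_state` so a split is reproducible, and averaging over several folds with `cross_val_score`. The `toy` table, the labels `A` to `E`, the helper `split_score`, and the choice of three folds are assumptions made for illustration.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up data shaped like music.csv (18 rows, two features)
toy = pd.DataFrame({
    "age":    [20, 22, 24, 26, 28, 30, 32, 34, 36] * 2,
    "gender": [1] * 9 + [0] * 9,
    "genre":  ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3 + ["E"] * 3 + ["C"] * 3,
})
X = toy[["age", "gender"]]
y = toy["genre"]


def split_score(seed):
    """Train on one seeded 80/20 split and return the test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# The same seed always gives the same split, hence the same score;
# different seeds may give different scores
scores_by_seed = [split_score(seed) for seed in range(5)]
print(scores_by_seed)

# Cross-validation: every row is used for testing exactly once
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=3)
print(cv_scores.mean())
```

The mean cross-validation score is generally a steadier basis for comparing models than one lucky or unlucky split.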

## Persisting Models

In [None]:

# music_data = pd.read_csv('music.csv')
# X = music_data.drop(columns=['genre'])
# y = music_data['genre']

# creating a model
# model = DecisionTreeClassifier()
# train
# model.fit(X, y)
# to save the model
# joblib.dump(model, 'music-recommender.joblib')

# to load the model
model = joblib.load('/content/music-recommender.joblib')
# predict with the loaded model (no retraining needed)
predictions = model.predict([[21, 1], [22, 0]])
predictions
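
The save and load steps can be checked together with a small round trip. This sketch assumes a made-up training set and writes to a temporary directory with a hypothetical file name (`toy-recommender.joblib`), so it does not touch the notebook's own `music-recommender.joblib`.

```python
import os
import tempfile

import joblib
from sklearn.tree import DecisionTreeClassifier

# Made-up training data: [age, gender] -> genre label
X = [[20, 1], [25, 1], [30, 1], [20, 0], [25, 0], [30, 0]]
y = ["A", "A", "B", "C", "C", "D"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "toy-recommender.joblib")
    joblib.dump(model, model_path)      # save the trained model to disk
    restored = joblib.load(model_path)  # load it back without retraining

# The restored model should give the same answers as the original
samples = [[21, 1], [22, 0]]
same_predictions = list(model.predict(samples)) == list(restored.predict(samples))
print(same_predictions)
```

A saved model is best loaded with the same scikit-learn version that produced it, and joblib files should only be loaded from trusted sources because loading can execute arbitrary code.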

## Visualizing Decision Trees

In [53]:
# '/content/drive/MyDrive/Colab Notebooks/music.csv'

music_data = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/music.csv')
# data prep
X = music_data.drop(columns=['genre'])
y = music_data['genre']

# create the model
model = DecisionTreeClassifier()

# train
model.fit(X, y)

tree.export_graphviz(model, out_file='music-recommender.dot',
                     feature_names=['age', 'gender'],
                     class_names=sorted(y.unique()),
                     label='all',
                     rounded=True,
                     filled=True)
# To render the music-recommender.dot file, install the Graphviz (dot)
# extension for VS Code:
# https://marketplace.visualstudio.com/items?itemName=Stephanvs.dot

![music-recommender.png](https://github.com/afondiel/research-notes/blob/master/ai/ml-notes/lab/notebook/ml-hello-world/music-recommender.png?raw=true)
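
If Graphviz is not available, the tree can also be inspected as plain text with scikit-learn's `export_text`. This is a sketch on made-up data (labels `A` to `D`), not a rendering of the tree pictured above.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up training data: [age, gender] -> genre label
X = [[20, 1], [25, 1], [30, 1], [20, 0], [25, 0], [30, 0]]
y = ["A", "A", "B", "C", "C", "D"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# One line per split or leaf, indented by depth
rules = export_text(model, feature_names=["age", "gender"])
print(rules)
```

Each indented line is a decision rule on `age` or `gender`, and each leaf line names the predicted class.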