# ML Challenge (Optional)

Using a methodology of your choice, train, test, optimize, and analyze the performance of a classification model on the randomly generated moons dataset.

You are not being evaluated on the performance of your model. Instead, we are interested in whether you can implement a simple but rigorous ML workflow.

Show all of your work in this notebook.

In [6]:
# you are free to use any package you deem fit
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Dataset

In [7]:
# DO NOT MODIFY
from sklearn.datasets import make_moons

X, Y = make_moons(random_state=42, n_samples=(50, 450), noise=0.25)
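
The `n_samples=(50, 450)` argument requests 50 points in one moon and 450 in the other, so the classes are imbalanced roughly 1:9. As a minimal sketch (the names `counts` and `majority_baseline` are illustrative and not part of the original notebook), counting the labels makes the imbalance explicit and gives the accuracy of a trivial classifier that always predicts the majority class:

```python
import numpy as np
from sklearn.datasets import make_moons

# Same call as the dataset cell above
X, Y = make_moons(random_state=42, n_samples=(50, 450), noise=0.25)

# Number of samples per class label (index 0 -> class 0, index 1 -> class 1)
counts = np.bincount(Y)
print(dict(enumerate(counts.tolist())))

# Accuracy of always predicting the most frequent class
majority_baseline = counts.max() / counts.sum()
print(f"Majority-class baseline accuracy: {majority_baseline:.2f}")
```

Any accuracy reported later is best read against this baseline rather than against 50%.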

## Training

In [9]:
from sklearn.model_selection import train_test_split

# Hold out 30% of the data for testing; random_state keeps the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.30, random_state=0)
print(f'Training Set: {X_train.shape[0]} rows\nTest Set: {X_test.shape[0]} rows')

from sklearn.svm import SVC

# Baseline model: SVC with an RBF kernel and default hyperparameters
model = SVC(kernel='rbf').fit(X_train, y_train)
print(model.score(X_train, y_train))  # accuracy on the training set

# TODO: study how the SVC hyperparameters (e.g. C, gamma, kernel) affect the fit before tuning them.

Training Set: 350 rows
Test Set: 150 rows
0.96
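
Because the minority class has only 50 samples, a plain random split can leave the train and test sets with slightly different class ratios. One possible refinement, shown here as a sketch rather than as the method used in this notebook, is to pass `stratify=Y` to `train_test_split` so that both splits keep the 1:9 ratio. The variable names ending in `_s` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, Y = make_moons(random_state=42, n_samples=(50, 450), noise=0.25)

# stratify=Y preserves the class proportions in both splits
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, Y, test_size=0.30, random_state=0, stratify=Y
)

# Per-class counts in each split
train_counts = np.bincount(y_train_s)
test_counts = np.bincount(y_test_s)
print("Train class counts:", train_counts.tolist())
print("Test class counts: ", test_counts.tolist())
```

Switching to a stratified split would change the scores reported in this notebook, so it is kept separate from the cells above.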


## Testing / Optimization

In [11]:
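# Evaluate the RBF model on the held-out test set
y_pred = model.predict(X_test)
print(model.score(X_test, y_test))

# Fit one SVC per kernel on the training split only;
# fitting on the full X, Y would leak the test data into the models
kernels = ['linear', 'rbf', 'poly']
svcs = {kernel: SVC(kernel=kernel).fit(X_train, y_train) for kernel in kernels}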

0.9466666666666667
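
The kernel loop above fits the models without comparing them. A hedged sketch of one way to turn it into an actual optimization step is a cross-validated grid search on the training split. The grid values for `C`, the 5-fold `StratifiedKFold`, and the `balanced_accuracy` scorer are assumptions chosen for illustration, not settings taken from this notebook.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import SVC

X, Y = make_moons(random_state=42, n_samples=(50, 450), noise=0.25)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.30, random_state=0)

# Illustrative grid: the three kernels from the loop above plus a few values of C
param_grid = {"kernel": ["linear", "rbf", "poly"], "C": [0.1, 1, 10]}

# Stratified folds keep the minority class represented in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# balanced_accuracy averages per-class recall, so the majority class cannot dominate
search = GridSearchCV(SVC(), param_grid, scoring="balanced_accuracy", cv=cv)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validated balanced accuracy:", search.best_score_)

# The test set is touched only once, after model selection is finished
print("Test balanced accuracy:", search.score(X_test, y_test))
```

Selecting hyperparameters by cross-validation on the training data keeps the test score an honest estimate of performance on unseen data.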


## Performance Analysis

In [12]:
from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[ 10   7]
 [  1 132]]
              precision    recall  f1-score   support

           0       0.91      0.59      0.71        17
           1       0.95      0.99      0.97       133

    accuracy                           0.95       150
   macro avg       0.93      0.79      0.84       150
weighted avg       0.95      0.95      0.94       150
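
The report shows that the 0.95 accuracy is driven mostly by the majority class: recall for class 0 is only 0.59, meaning 7 of the 17 minority samples in the test set were misclassified. The short calculation below, a sketch that uses only the confusion matrix printed above, compares plain accuracy with the majority-class baseline and with balanced accuracy (the mean of the per-class recalls).

```python
import numpy as np

# Confusion matrix copied from the output above (rows = true class, columns = predicted class)
cm = np.array([[10, 7],
               [1, 132]])

# Fraction of all test samples classified correctly
accuracy = np.trace(cm) / cm.sum()

# Accuracy of always predicting the larger class in the test set
majority_baseline = cm.sum(axis=1).max() / cm.sum()

# Recall per class, then their unweighted mean
recall_per_class = np.diag(cm) / cm.sum(axis=1)
balanced_accuracy = recall_per_class.mean()

print(f"Accuracy:          {accuracy:.4f}")
print(f"Majority baseline: {majority_baseline:.4f}")
print(f"Balanced accuracy: {balanced_accuracy:.4f}")
```

The balanced accuracy matches the macro-average recall of 0.79 in the report, which is a more informative summary than accuracy for a dataset this imbalanced.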

