# <img src="https://drive.google.com/uc?id=1E_GYlzeV8zomWYNBpQk0i00XcZjhoy3S" width="100"/>  
# Drill Workshop 4: Intro to ML   

In this drill, you will apply what you have learned about machine learning at a high level.  
You will use pandas and scikit-learn (sklearn) to explore a new dataset.  
You will instantiate, train, and evaluate one of each of the following models:
- Support Vector Machine (SVM)
- Multilayer Perceptron (MLP)
- K-Nearest Neighbors (KNN)
- Decision Tree

<img src="https://drive.google.com/uc?id=1wXJnftSIBkOYodRlRl0dxbonAUPgG-9W" height='400'>


**Before starting this drill, please make a COPY of the workspace in your individual Deepnote account (the one registered under your university account).**

For each function header, fill in the parts marked "YOUR CODE HERE", following the instructions in the comments. Test cases are provided to check that your functions work properly. **Please DO NOT modify the test case code or hardcode your solution** (after all, you want to develop a strong understanding of the concepts).

# Drill   

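Load the Iris dataset using the provided code. As a brief intro, the Iris dataset contains 150 flowers from 3 classes (species of iris: setosa, versicolor, and virginica). The four features are sepal length, sepal width, petal length, and petal width, all measured in centimeters.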
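A quick way to confirm these facts from the loaded object: the `Bunch` returned by `datasets.load_iris()` carries the class names and feature names alongside the data. A minimal sketch:

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Class labels 0, 1, 2 map to these species names (setosa, versicolor, virginica)
print(iris.target_names)
# Four measurements per flower, all in centimeters
print(iris.feature_names)
# One row per flower, one column per feature
print(iris.data.shape)          # (150, 4)
# Number of flowers in each class
print(np.bincount(iris.target))  # [50 50 50]
```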




In [1]:
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np


In [2]:
#setup unit tests + configuration
!pip install ipytest



In [3]:
import pytest
import ipytest
ipytest.autoconfig()

In [4]:
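# Load data
iris = datasets.load_iris()
# Note: this X keeps only the two sepal columns (0 and 1); the train/test split cell below builds new X and Y from df instead
X, y = iris.data[:, [0, 1]], iris.target
# DataFrame with all four features plus the target column
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['target'])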

Explore the data using `.head()`, `.describe()`, or `.info()`.  
What type of data is provided? How many features are there? Are they categorical or continuous? What is the target/label? Fill in your responses in the template below.
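One way to answer these questions programmatically, as a sketch (it rebuilds the same `df` as the loading cell so it can run on its own): count distinct values per column to tell continuous features from a categorical label, check the class balance, and check for missing values.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

# Rebuild the same DataFrame as the loading cell above
iris = datasets.load_iris()
df = pd.DataFrame(data=np.c_[iris["data"], iris["target"]],
                  columns=iris["feature_names"] + ["target"])

features = df.drop(columns="target")

# Many distinct values per column suggests continuous measurements
print(features.nunique())

# Only three distinct values: the target is a categorical class label
print(df["target"].unique())

# Class balance (np.c_ stored the target as float, so cast back to int)
print(df["target"].astype(int).value_counts().sort_index())

# Total number of missing values across the whole table
print(df.isna().sum().sum())
```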

In [None]:
#YOUR CODE HERE
df

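|  | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0.0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0.0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0.0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0.0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0.0 |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | 2.0 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | 2.0 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | 2.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 2.0 |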


In [6]:
df.head()

|  | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0.0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0.0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0.0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0.0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0.0 |


In [7]:
df.describe()

|  | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.057333 | 3.758 | 1.199333 | 1.0 |
| std | 0.828066 | 0.435866 | 1.765298 | 0.762238 | 0.819232 |
| min | 4.3 | 2.0 | 1.0 | 0.1 | 0.0 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 | 0.0 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 | 1.0 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 | 2.0 |
| max | 7.9 | 4.4 | 6.9 | 2.5 | 2.0 |


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   target             150 non-null    float64
dtypes: float64(5)
memory usage: 6.0 KB


**YOUR RESPONSES HERE**:

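**Type of Data: ___Tabular numeric measurements of 150 iris flowers (the Iris dataset)___**

**Number of Features: ___4 (sepal length, sepal width, petal length, petal width)___**

**Categorical or Continuous: ___The features are continuous numeric values (float64, in cm); the target is categorical___**

**Target/Label: ___The `target` column, which classifies each flower into one of 3 iris species (encoded as 0, 1, 2)___**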


Perform a train/test split.  
What is the purpose of a train/test split?
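One way to see the purpose: a fully grown decision tree can fit its training rows almost perfectly, so training accuracy says little about how the model handles flowers it has never seen; the held-out test rows give that estimate. A sketch under stated assumptions: `random_state=0` and `stratify=y` are additions here (for reproducibility and balanced classes), not part of the drill, and the function name `train_vs_test_accuracy` is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_vs_test_accuracy(test_size=0.25, random_state=0):
    """Fit a fully grown tree and report accuracy on training and held-out rows."""
    X, y = load_iris(return_X_y=True)

    # Hold out a fraction of the rows; stratify=y keeps class proportions similar in both parts
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )

    tree = DecisionTreeClassifier(random_state=random_state).fit(X_train, y_train)

    train_acc = tree.score(X_train, y_train)  # rows the model has already seen (optimistic)
    test_acc = tree.score(X_test, y_test)     # rows never used during fitting (honest estimate)
    return train_acc, test_acc, len(X_train), np.bincount(y_test)


train_acc, test_acc, n_train, test_class_counts = train_vs_test_accuracy()
print(f"train accuracy: {train_acc:.3f}  test accuracy: {test_acc:.3f}")
print("training rows:", n_train, " test rows per class:", test_class_counts)
```

Wrapping the split in a function keeps its variables local, so running it inside this notebook does not overwrite `X_train`, `Y_train`, and the other names used by later cells.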




In [9]:
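from sklearn.model_selection import train_test_split
#YOUR CODE HERE
# Label: the target column; features: the four measurement columns
Y = df["target"]
X = df.drop(["target"], axis=1)
# Hold out 25% of the rows for testing; no random_state is set, so the split (and the scores below) change on each run
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25)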

Instantiate one of each of the following: SVM, Decision Tree, MLP (neural network), and KNN   


In [15]:
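#YOUR CODE HERE
svm = SVC(C=0.7)               # RBF kernel by default; C=0.7 regularizes slightly more than the default C=1.0
dt = DecisionTreeClassifier()  # no depth limit by default
mlp = MLPClassifier()          # one hidden layer of 100 units, at most 200 training iterations by default
knn = KNeighborsClassifier()   # majority vote of the 5 nearest neighbors by default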
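Every scikit-learn estimator exposes its hyperparameters through `get_params()`, which is a quick way to see the defaults you accept when you write `DecisionTreeClassifier()`. A sketch; the `random_state=0` arguments are an addition here to make the Decision Tree and MLP reproducible, and the `key_params` selection is just an illustrative subset.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Same four model types as the cell above (random_state added for reproducibility)
models = {
    "svm": SVC(C=0.7),
    "dt": DecisionTreeClassifier(random_state=0),
    "mlp": MLPClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}

# A few influential hyperparameters per model (illustrative selection)
key_params = {
    "svm": ["C", "kernel", "gamma"],
    "dt": ["criterion", "max_depth", "min_samples_leaf"],
    "mlp": ["hidden_layer_sizes", "activation", "max_iter"],
    "knn": ["n_neighbors", "weights", "metric"],
}

# get_params() returns every hyperparameter, including defaults you did not set
for name, model in models.items():
    params = model.get_params()
    print(name, {k: params[k] for k in key_params[name]})
```

Any of these names can be passed to the constructor (for example `KNeighborsClassifier(n_neighbors=3)`) or changed later with `set_params()`, which is what the challenge below asks you to experiment with.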

Fit each model to the data!  
What does "fitting" a model with data do? Hint: what happens to the parameters of a model?
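A sketch of what fitting changes: before `fit`, a model holds only its hyperparameters and cannot predict; after `fit`, the learned parameters appear as attributes whose names end in an underscore. Shown for an MLP (weight matrices) and a decision tree (split structure). KNN is the exception among the four: its `fit` essentially stores the training data, and the work happens at prediction time. The names `X_iris`, `y_iris`, and `net` are chosen here to avoid overwriting the notebook's variables.

```python
from sklearn.datasets import load_iris
from sklearn.exceptions import NotFittedError
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X_iris, y_iris = load_iris(return_X_y=True)

net = MLPClassifier(max_iter=1000, random_state=0)

# Before fit: the network has no learned weights, so predict() refuses to run
try:
    net.predict(X_iris[:1])
    fitted_before = True
except NotFittedError:
    fitted_before = False

net.fit(X_iris, y_iris)

# After fit: learned parameters live in attributes ending in "_"
print([w.shape for w in net.coefs_])  # [(4, 100), (100, 3)]: input->hidden and hidden->output weights
print(net.classes_)                   # labels seen during fitting: [0 1 2]

# A decision tree learns split features and thresholds instead of weights
tree = DecisionTreeClassifier(random_state=0).fit(X_iris, y_iris)
print("depth:", tree.get_depth(), " leaves:", tree.get_n_leaves())
```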


In [16]:
#hint print(svm.____())
#YOUR CODE HERE
# Fit every model on the same training split
for model in (svm, dt, mlp, knn):
    model.fit(X_train, Y_train)

What are the scores for each? Which performs best?   
Can you gain insight into how decisions are made by the model?   
 

In [17]:
#hint print(svm.____())
#YOUR CODE HERE
pred = svm.predict(X_test)
accuracy_score(Y_test, pred)

0.9473684210526315
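The cell above scores only the SVM. A sketch that scores all four models on one split and then looks inside the decision tree, the easiest of the four to interpret. Assumptions: a fixed `random_state=0` split, `max_iter=1000` for the MLP to give it more room to converge, and the hypothetical helper name `compare_models`; scores will differ from the notebook's random split.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text


def compare_models(random_state=0):
    """Fit the four drill models on one split; return test accuracies and the fitted tree."""
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.25, random_state=random_state
    )
    models = {
        "SVM": SVC(C=0.7),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "MLP": MLPClassifier(max_iter=1000, random_state=random_state),
        "KNN": KNeighborsClassifier(),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # For classifiers, .score() is mean accuracy, i.e. accuracy_score(y_test, model.predict(X_test))
        scores[name] = model.score(X_test, y_test)
    return scores, models["Decision Tree"], iris.feature_names


scores, tree, feature_names = compare_models()
for name, acc in scores.items():
    print(f"{name:>13}: {acc:.3f}")
print("best on this split:", max(scores, key=scores.get))

# Insight into decisions: the fitted tree's splits printed as nested if/else rules
rules = export_text(tree, feature_names=list(feature_names))
print(rules)

# Impurity-based importance: each feature's share of the tree's splitting (sums to 1)
for feat, imp in zip(feature_names, tree.feature_importances_):
    print(f"{feat}: {imp:.2f}")
```

With only 38 test flowers, one misclassification moves accuracy by about 0.026, so small differences between models on a single split are not a reliable ranking.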

# Challenge

Pick one of the models you made earlier. Adjust its hyperparameters until it reaches at least 90% test accuracy, and note down which hyperparameters you used.
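Rather than turning the dials by hand against the test set (which lets the test set leak into model selection), one common approach is a cross-validated grid search on the training rows only, followed by a single final check on the test rows. A sketch using `GridSearchCV` with an SVM; the grid values and the helper name `tune_svm` are illustrative choices, not part of the drill.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC


def tune_svm(random_state=0):
    """Grid-search SVM hyperparameters with cross-validation on the training split only."""
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )

    # Illustrative candidate values; every combination is tried
    param_grid = {"C": [0.1, 0.7, 1.0, 10.0], "kernel": ["linear", "rbf"]}

    # 5-fold cross-validation inside the training split picks the best combination
    search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)

    # With refit=True (the default), the best combination is retrained on all of X_train;
    # the test split is used once, only for the final estimate
    return search, search.score(X_test, y_test)


search, test_acc = tune_svm()
print("best hyperparameters:", search.best_params_)
print(f"mean cross-validated accuracy: {search.best_score_:.3f}")
print(f"test accuracy: {test_acc:.3f}")
```

To reuse the result in `accuracy_at_least_90`, you could set `model = SVC(**search.best_params_)`.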

In [20]:
def accuracy_at_least_90(X_train, X_test, Y_train, Y_test):
    """
    Choose a model from the instantiate_models section. Play around with the hyperparameters (e.g., turning the dials and
    knobs) and get the accuracy to at least 0.9.
    
    Recommended link to help you: https://scikit-learn.org/stable/supervised_learning.html
    
    The above link contains documentation to the scikit-learn library you are using
    
    Fill in the blanks
    """
    model = svm #pick model and put it here
    model.fit(X_train, Y_train) #once you pick a model, what do you need to do to train it?
    pred = model.predict(X_test) #once you fit a model, how do you predict
    
    return (model, accuracy_score(Y_test, pred)) #Return tuple format (model, accuracy). Look at sklearn.metrics to see how to calculate accuracy


In [21]:
# %%run_pytest
@pytest.mark.parametrize("X_train, X_test, Y_train, Y_test", [
    (X_train, X_test, Y_train, Y_test)
])
def test_accuracy_at_least_90(X_train, X_test, Y_train, Y_test):
  model, accuracy = accuracy_at_least_90(X_train, X_test, Y_train, Y_test)
  assert accuracy >= 0.9

ipytest.run()

[32m.[0m[32m                                                                                            [100%][0m
[32m[32m[1m1 passed[0m[32m in 0.02s[0m[0m


<ExitCode.OK: 0>

In [22]:
def accuracy_at_least_90(X_train, X_test, Y_train, Y_test):
    """
    Choose a model from the instantiate_models section. Play around with the hyperparameters (e.g., turning the dials and
    knobs) and get the accuracy to at least 0.9.
    
    Recommended link to help you: https://scikit-learn.org/stable/supervised_learning.html
    
    The above link contains documentation to the scikit-learn library you are using
    
    Fill in the blanks
    """
    model = dt #pick model and put it here
    model.fit(X_train, Y_train) #once you pick a model, what do you need to do to train it?
    pred = model.predict(X_test) #once you fit a model, how do you predict
    
    return (model, accuracy_score(Y_test, pred)) #Return tuple format (model, accuracy). Look at sklearn.metrics to see how to calculate accuracy


In [23]:
# %%run_pytest
@pytest.mark.parametrize("X_train, X_test, Y_train, Y_test", [
    (X_train, X_test, Y_train, Y_test)
])
def test_accuracy_at_least_90(X_train, X_test, Y_train, Y_test):
  model, accuracy = accuracy_at_least_90(X_train, X_test, Y_train, Y_test)
  assert accuracy >= 0.9

ipytest.run()

[32m.[0m[32m                                                                                            [100%][0m
[32m[32m[1m1 passed[0m[32m in 0.02s[0m[0m


<ExitCode.OK: 0>

In [24]:
def accuracy_at_least_90(X_train, X_test, Y_train, Y_test):
    """
    Choose a model from the instantiate_models section. Play around with the hyperparameters (e.g., turning the dials and
    knobs) and get the accuracy to at least 0.9.
    
    Recommended link to help you: https://scikit-learn.org/stable/supervised_learning.html
    
    The above link contains documentation to the scikit-learn library you are using
    
    Fill in the blanks
    """
    model = mlp #pick model and put it here
    model.fit(X_train, Y_train) #once you pick a model, what do you need to do to train it?
    pred = model.predict(X_test) #once you fit a model, how do you predict
    
    return (model, accuracy_score(Y_test, pred)) #Return tuple format (model, accuracy). Look at sklearn.metrics to see how to calculate accuracy


In [25]:
# %%run_pytest
@pytest.mark.parametrize("X_train, X_test, Y_train, Y_test", [
    (X_train, X_test, Y_train, Y_test)
])
def test_accuracy_at_least_90(X_train, X_test, Y_train, Y_test):
  model, accuracy = accuracy_at_least_90(X_train, X_test, Y_train, Y_test)
  assert accuracy >= 0.9

ipytest.run()

[32m.[0m[32m                                                                                            [100%][0m
t_be4e2474aa0c45189965a185b66f73c0.py::test_accuracy_at_least_90[X_train0-X_test0-Y_train0-Y_test0]



<ExitCode.OK: 0>

In [26]:
def accuracy_at_least_90(X_train, X_test, Y_train, Y_test):
    """
    Choose a model from the instantiate_models section. Play around with the hyperparameters (e.g., turning the dials and
    knobs) and get the accuracy to at least 0.9.
    
    Recommended link to help you: https://scikit-learn.org/stable/supervised_learning.html
    
    The above link contains documentation to the scikit-learn library you are using
    
    Fill in the blanks
    """
    model = knn #pick model and put it here
    model.fit(X_train, Y_train) #once you pick a model, what do you need to do to train it?
    pred = model.predict(X_test) #once you fit a model, how do you predict
    
    return (model, accuracy_score(Y_test, pred)) #Return tuple format (model, accuracy). Look at sklearn.metrics to see how to calculate accuracy


In [27]:
# %%run_pytest
@pytest.mark.parametrize("X_train, X_test, Y_train, Y_test", [
    (X_train, X_test, Y_train, Y_test)
])
def test_accuracy_at_least_90(X_train, X_test, Y_train, Y_test):
  model, accuracy = accuracy_at_least_90(X_train, X_test, Y_train, Y_test)
  assert accuracy >= 0.9

ipytest.run()

[32m.[0m[32m                                                                                            [100%][0m
[32m[32m[1m1 passed[0m[32m in 0.02s[0m[0m


<ExitCode.OK: 0>

**YOUR RESPONSES HERE:**

**Model I used: ___SVM (`SVC(C=0.7)`, all other hyperparameters left at their defaults)___**
