# Exercise: Regression and Classification Machine Learning

In this exercise, we'll dive deeper into ML concepts by building a regression model and a classification model.

Your tasks for this exercise are:
1. Load the iris dataset into a dataframe
2. Create a LinearRegression model and fit it to the dataset
3. Score the regression model on the dataset and predict its values
4. Create a RidgeClassifier model and fit it to the dataset; use `alpha=3.0` when initializing the model
5. Score the classification model on the dataset and predict its values

In [2]:
import numpy as np
import pandas as pd
import sklearn
from sklearn import datasets

In [3]:
# Load in the iris dataset
iris = datasets.load_iris()

In [4]:
# Create the iris `data` dataset as a dataframe and name the columns with `feature_names`
df = pd.DataFrame(iris['data'], columns=iris['feature_names'])

# Include the target as well
df['target'] = iris['target']

In [5]:
# Check your dataframe with `.head()`
df.head()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
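
As an aside, scikit-learn 0.23 and later can build an equivalent dataframe in one step. The following is a minimal sketch, assuming such a version is installed; the names `iris_frame` and `df_alt` are illustrative and not part of the exercise.

```python
from sklearn import datasets

# as_frame=True asks scikit-learn to return pandas objects instead of NumPy arrays
iris_frame = datasets.load_iris(as_frame=True)

# `.frame` holds the four feature columns plus a `target` column
df_alt = iris_frame.frame
print(df_alt.shape)
print(list(df_alt.columns))
```

Building the dataframe by hand, as done above, makes it clearer where each column comes from, which is likely why the exercise takes that route.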


## Regression ML

In [6]:
from sklearn.linear_model import LinearRegression

In [7]:
# Fit a standard regression model, as we've done in other exercises
reg = LinearRegression().fit(df[iris['feature_names']], y=df['target'])

In [8]:
# Score the model on the same dataset
reg.score(df[iris['feature_names']], y=df['target'])

0.9303939218549564
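
For a regressor, `.score()` returns the coefficient of determination (R²), not an accuracy. The sketch below illustrates this by computing R² explicitly with `r2_score` and comparing it with `.score()`. It reloads the data as NumPy arrays so it runs on its own; `reg_demo`, `r2_manual`, and `r2_builtin` are illustrative names.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Load features and target as NumPy arrays
X, y = datasets.load_iris(return_X_y=True)
reg_demo = LinearRegression().fit(X, y)

# R^2 computed explicitly from the predictions
r2_manual = r2_score(y, reg_demo.predict(X))

# .score() on a regressor reports the same quantity
r2_builtin = reg_demo.score(X, y)
print(r2_manual, r2_builtin)
```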

In [9]:
# The predictions are continuous values, not class labels, so they are not directly usable for classification
reg.predict(df[iris['feature_names']])

array([-8.25493616e-02, -4.01284476e-02, -4.86276768e-02,  1.22998627e-02,
       -7.53667248e-02,  5.82910066e-02,  3.83367194e-02, -4.44863248e-02,
        1.98324281e-02, -8.21970989e-02, -1.01272512e-01,  7.59348686e-04,
       -8.98630676e-02, -1.02503649e-01, -2.26652208e-01, -4.10494982e-02,
       -3.31670043e-02, -2.16241562e-02, -3.21980063e-02, -1.07834994e-02,
       -4.35196609e-02,  5.41496547e-02, -1.22062394e-01,  1.76835660e-01,
        6.93528569e-02, -5.59002750e-03,  1.00228589e-01, -7.08754443e-02,
       -8.97319983e-02,  1.99658314e-02,  1.27831946e-02,  3.26017444e-02,
       -1.55848342e-01, -1.55367344e-01, -2.12718935e-02, -1.05063936e-01,
       -1.50176206e-01, -1.25101345e-01, -7.04002332e-03, -5.56769102e-02,
       -3.32980735e-02,  7.07502372e-02, -1.50559206e-02,  2.18071051e-01,
        1.41599717e-01,  3.19873432e-02, -4.88442021e-02, -1.45725887e-02,
       -9.00819270e-02, -6.33428789e-02,  1.20248442e+00,  1.28482413e+00,
        1.32433716e+00, ...])

In [10]:
# If we really wanted to, we could round each regression value to the nearest integer
# and have the model "act" like a classification model
# (np.abs turns the -0. that np.rint returns for small negative values into 0.)
# This is not required, but something to keep in mind for future reference
reg_cls = np.abs(np.rint(reg.predict(df[iris["feature_names"]])))
reg_cls

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 2., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.])

In [11]:
# Evaluate accuracy
sum(reg_cls == df["target"]) / df.shape[0]

0.9733333333333334
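
Note that `np.abs` only handles predictions slightly below zero. A prediction below -0.5 or above 2.5 would still round to a value that is not a valid class. One possible variant, sketched here under the assumption that the valid labels are the integers between the smallest and largest target value, clips the rounded predictions with `np.clip` and measures accuracy with `accuracy_score`. The names `raw` and `labels` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score

X, y = datasets.load_iris(return_X_y=True)
raw = LinearRegression().fit(X, y).predict(X)

# Round to the nearest integer, then clip into the valid label range
labels = np.clip(np.rint(raw), y.min(), y.max()).astype(int)

# accuracy_score is the fraction of predictions that equal the true label
acc = accuracy_score(y, labels)
print(acc)
```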

## Classification ML

In [12]:
from sklearn.linear_model import RidgeClassifier

In [13]:
# Fit a ridge classifier, which suits this problem because the target is a class label
# Note: this call uses the default alpha=1.0; task 4 asks for RidgeClassifier(alpha=3.0)
clf = RidgeClassifier().fit(X=df[iris['feature_names']], y=df['target'])

In [14]:
# Score the model
clf.score(X=df[iris['feature_names']], y=df['target'])

0.8533333333333334
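
For a classifier, `.score()` is the mean accuracy, so this number is directly comparable to the accuracy computed for the rounded regression output. The task list asks for `alpha=3.0`, which is the regularization strength (the scikit-learn default is 1.0). The sketch below shows one way to pass it and compare the two settings. The `scores` dictionary is an illustrative helper, and no particular result is claimed here.

```python
from sklearn import datasets
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import accuracy_score

X, y = datasets.load_iris(return_X_y=True)

scores = {}
# alpha is the regularization strength; 1.0 is the scikit-learn default
for alpha in (1.0, 3.0):
    model = RidgeClassifier(alpha=alpha).fit(X, y)
    acc = accuracy_score(y, model.predict(X))
    # For classifiers, .score() is the mean accuracy, so both values agree
    scores[alpha] = (acc, model.score(X, y))

for alpha, (acc, builtin) in scores.items():
    print(alpha, acc, builtin)
```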

In [15]:
# Predict the class values for the dataset; unlike the regression output, these are valid class labels
clf.predict(X=df[iris['feature_names']])

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 2, 2, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2,
       2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 1,
       2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
       2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
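
A long array of labels is hard to read by eye. One way to summarize where the classifier goes wrong is a confusion matrix. This is a minimal sketch using `confusion_matrix` from `sklearn.metrics` and the same default `RidgeClassifier()` as above; `pred` and `cm` are illustrative names.

```python
from sklearn import datasets
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import confusion_matrix

X, y = datasets.load_iris(return_X_y=True)
pred = RidgeClassifier().fit(X, y).predict(X)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y, pred)
print(cm)
```

The diagonal counts the correct predictions for each class, and the off-diagonal entries show which classes are confused with each other.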