# OOP with Scikit-Learn - Lab

## Introduction

Now that you have learned some of the basics of object-oriented programming with scikit-learn, let's practice applying it!

## Objectives:

In this lab, you will:

* Recall the distinction between mutable and immutable types
* Define the four main inherited object types in scikit-learn
* Instantiate scikit-learn transformers and models
* Invoke scikit-learn methods
* Access scikit-learn attributes

## Mutable and Immutable Types

For each example below, decide whether it is a mutable or immutable type. Then click the item to reveal the answer.

<ol>
    <li>
        <details>
            <summary style="cursor: pointer">Python dictionary (click to reveal)</summary>
            <p><strong>Mutable.</strong> For example, the <code>update</code> method can be used to modify values within an existing dictionary.</p>
            <p></p>
        </details>
    </li>
    <li>
        <details>
            <summary style="cursor: pointer">Python tuple (click to reveal)</summary>
            <p><strong>Immutable.</strong> A tuple cannot be changed after it is created. To get a modified version, you build a new tuple (for example, by concatenation) and use <code>=</code> to reassign the variable to it.</p>
            <p></p>
        </details>
    </li>
    <li>
        <details>
            <summary style="cursor: pointer">pandas <code>DataFrame</code> (click to reveal)</summary>
            <p><strong>Mutable.</strong> Many methods accept an <code>inplace=True</code> argument that modifies the existing dataframe instead of returning a new one.</p>
            <p></p>
        </details>
    </li>
    <li>
        <details>
            <summary style="cursor: pointer">scikit-learn <code>OneHotEncoder</code> (click to reveal)</summary>
            <p><strong>Mutable.</strong> Calling the <code>fit</code> method causes the transformer to store information about the data that is passed in, modifying its internal attributes.</p>
            <p></p>
        </details>
    </li>
</ol>
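The answers above can be checked in code with `id()` and `hasattr()`. This is a minimal sketch: the variable names (`prices`, `point`, `encoder`) and the toy values are illustrative assumptions, not part of the lab data.

```python
from sklearn.preprocessing import OneHotEncoder

# Illustrative toy values (assumptions, not lab data)

# Dictionary: update() modifies the same object in place
prices = {"apple": 1.0}
dict_id_before = id(prices)
prices.update({"apple": 1.5})
dict_same_object = id(prices) == dict_id_before  # True: same object, new value

# Tuple: there is no in-place update, so build a new tuple and reassign the name
point = (1, 2)
tuple_id_before = id(point)
point = point + (3,)  # concatenation creates a new tuple object
tuple_same_object = id(point) == tuple_id_before  # False: the name now points to a new object

# OneHotEncoder: fit() stores learned state on the existing object
encoder = OneHotEncoder()
had_categories_before = hasattr(encoder, "categories_")  # False before fitting
encoder.fit([["red"], ["green"], ["red"]])
has_categories_after = hasattr(encoder, "categories_")   # True after fitting

print(prices, dict_same_object)
print(point, tuple_same_object)
print(had_categories_before, has_categories_after, encoder.categories_)
```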

## The Data

For this lab we'll use data from the built-in iris dataset:

In [None]:
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True, as_frame=True)

In [None]:
X

In [None]:
y
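Besides displaying `X` and `y`, it can help to confirm their types and shapes. This sketch assumes a scikit-learn version recent enough to support `as_frame=True` in `load_iris`.

```python
from sklearn.datasets import load_iris

# return_X_y=True returns (features, target); as_frame=True makes them pandas objects
X, y = load_iris(return_X_y=True, as_frame=True)

print(type(X).__name__, X.shape)      # DataFrame with 150 rows and 4 feature columns
print(type(y).__name__, y.shape)      # Series with one label per row
print(y.value_counts().sort_index())  # class labels 0, 1, 2 with 50 samples each
```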

## Scikit-Learn Classes

For the following exercises, follow the documentation link to understand the class you are working with, but **do not** worry about understanding the underlying algorithm. The goal is just to get used to creating and using these types of objects.

### Estimators

For all estimators, the steps are:

1. Import the class from the `sklearn` library
2. Instantiate an object from the class
3. Pass in the appropriate data to the `fit` method
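The three steps above can be sketched with `MinMaxScaler`. `get_params()` is a standard estimator method; the variable name `returned` is just for illustration. The final check shows that `fit` modifies the existing object and returns that same object, which is why calls like `MinMaxScaler().fit(X)` can be chained.

```python
from sklearn.datasets import load_iris
# Step 1: import the class
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True, as_frame=True)

# Step 2: instantiate an object (hyperparameters are set here, no data is seen yet)
scaler = MinMaxScaler()
print(scaler.get_params())

# Step 3: pass the data to fit; fit stores learned values on the object
# and returns the same object
returned = scaler.fit(X)
print(returned is scaler)
```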

#### `MinMaxScaler` ([documentation here](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html))

Import this scaler, instantiate an object called `scaler` with default parameters, and `fit` the scaler on `X`.

In [None]:
# Import
from sklearn.preprocessing import MinMaxScaler

# Instantiate
scaler = MinMaxScaler()

# Fit on X
scaler.fit(X)

#### `DecisionTreeClassifier` ([documentation here](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html))

Import the classifier, instantiate an object called `clf` (short for "classifier") with default parameters, and `fit` the classifier on `X` and `y`.

In [None]:
# Import
from sklearn.tree import DecisionTreeClassifier

# Instantiate
clf = DecisionTreeClassifier()

# Fit on X and y
clf.fit(X, y)

### Transformers

One of the two objects instantiated above (`scaler` or `clf`) is a transformer. Which one is it? Consult the documentation.

---

<details>
    <summary style="cursor: pointer">Hint (click to reveal)</summary>
    <p>The class with a <code>transform</code> method is a transformer.</p>
</details>

---
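One way to confirm the answer in code, assuming a recent scikit-learn version, is to check for a `transform` method and for inheritance from `sklearn.base.TransformerMixin`. Treat this as a quick heuristic rather than an official classification rule.

```python
from sklearn.base import TransformerMixin
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

scaler = MinMaxScaler()
clf = DecisionTreeClassifier()

for name, obj in [("scaler", scaler), ("clf", clf)]:
    # A transformer exposes a transform method; scikit-learn transformers
    # typically also inherit from TransformerMixin, which supplies fit_transform
    print(name,
          "has transform:", hasattr(obj, "transform"),
          "| TransformerMixin:", isinstance(obj, TransformerMixin))
```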

#### Using the transformer, print out two of the fitted attributes along with descriptions from the documentation.

---

<details>
    <summary style="cursor: pointer">Hint (click to reveal)</summary>
    <p>Attributes ending with <code>_</code> are fitted attributes.</p>
</details>
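The hint can be checked by listing every public attribute whose name ends in `_` before and after fitting. `fitted_attributes` is a hypothetical helper written for this sketch; it relies on `vars()` and on scikit-learn's trailing-underscore naming convention, not on a dedicated library function.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True, as_frame=True)

def fitted_attributes(estimator):
    """Hypothetical helper: public attribute names ending in '_' (fitted-attribute convention)."""
    return sorted(
        name for name in vars(estimator)
        if name.endswith("_") and not name.startswith("_")
    )

scaler = MinMaxScaler()
before = fitted_attributes(scaler)  # no fitted attributes yet
scaler.fit(X)
after = fitted_attributes(scaler)   # includes data_min_, data_max_, scale_, ...
print(before)
print(after)
```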

In [None]:
# Your code here
# Print two fitted attributes
print("Data min:", scaler.data_min_)   # Minimum value per feature
print("Data max:", scaler.data_max_)   # Maximum value per feature

#### Now, call the `transform` method on the transformer and pass in `X`. Assign the result to `X_scaled`.

In [None]:
# Your code here
# Transform X
X_scaled = scaler.transform(X)
X_scaled[:5]   # Show first 5 rows
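With the default `feature_range=(0, 1)`, `MinMaxScaler` maps each column value `x` to `(x - column min) / (column max - column min)`. The sketch below recomputes that from the fitted attributes and compares it with `transform`; `manual` is an illustrative name, and `np.allclose` is used because floating-point rounding can differ slightly between the two computations.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True, as_frame=True)
scaler = MinMaxScaler().fit(X)
X_scaled = np.asarray(scaler.transform(X))

# Recompute the scaling by hand from the fitted attributes
manual = (X.to_numpy() - scaler.data_min_) / scaler.data_range_
print(np.allclose(X_scaled, manual))

# Each scaled column spans [0, 1] (up to rounding) on the data used for fitting
print(X_scaled.min(axis=0), X_scaled.max(axis=0))

# fit_transform combines fit and transform in one call
print(np.allclose(np.asarray(MinMaxScaler().fit_transform(X)), X_scaled))
```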

### Predictors and Models

The other of the two scikit-learn objects instantiated above (`scaler` or `clf`) is a predictor and a model. Which one is it? Consult the documentation.

---

<details>
    <summary style="cursor: pointer">Hint (click to reveal)</summary>
    <p>The class with a <code>predict</code> method and a <code>score</code> method is a predictor and a model.</p>
</details>

---
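As with the transformer question, the answer can be checked in code. This sketch looks for `predict` and `score` methods and uses `sklearn.base.is_classifier`; like the earlier check, it is a quick heuristic, not an official rule.

```python
from sklearn.base import is_classifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

scaler = MinMaxScaler()
clf = DecisionTreeClassifier()

for name, obj in [("scaler", scaler), ("clf", clf)]:
    # Predictors expose predict; models also expose score.
    # For classifiers, score() comes from ClassifierMixin and returns mean accuracy.
    print(name,
          "predict:", hasattr(obj, "predict"),
          "| score:", hasattr(obj, "score"),
          "| classifier:", is_classifier(obj))
```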

#### Using the predictor, print out two of the fitted attributes along with descriptions from the documentation.

In [None]:
# Your code here
# Print two fitted attributes
print("Number of features:", clf.n_features_in_)   # number of input features
print("Number of classes:", clf.n_classes_)       # number of output classes
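Two other fitted attributes of the tree are `classes_` and `feature_importances_`. In this sketch, `random_state=0` is an added assumption (not in the lab) so that repeated runs build the same tree.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True, as_frame=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)  # random_state added for reproducibility

# classes_: the labels seen during fit; predict() returns values from this array
print(clf.classes_)

# feature_importances_: one non-negative value per input column, normalized to sum to 1
for column, importance in zip(X.columns, clf.feature_importances_):
    print(f"{column}: {importance:.3f}")
print(np.isclose(clf.feature_importances_.sum(), 1.0))
```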

#### Now, call the `predict` method on the predictor, passing in `X`. Assign the result to `y_pred`.

In [None]:
# Your code here
# Predict on X
y_pred = clf.predict(X)
print(y_pred[:10])  # first 10 predictions

#### Now, call the `score` method on the predictor, passing in `X` and `y`

In [None]:
# Your code here
score = clf.score(X, y)
print("Model accuracy:", score)

#### What does that score represent? Write your answer below

In [None]:
"""
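The score from clf.score(X, y) is the mean accuracy: the fraction of samples whose predicted label matches the true label.
Because we scored on the same data the model was trained on, this is training accuracy. A decision tree with default
parameters keeps splitting until its leaves are pure, so it can memorize the training data. A score of 1.0 here means
every training label was predicted correctly; it does not show how accurate the model would be on new, unseen data.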
"""

## Summary

In this lab, you practiced identifying mutable and immutable types as well as identifying transformers, predictors, and models using scikit-learn. You also instantiated scikit-learn objects, invoked the most common scikit-learn methods, and accessed some scikit-learn attributes.