<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:200%;
           font-family:Arial;letter-spacing:0.5px">

<p width = 20%, style="padding: 10px;
              color:white;">
Class inheritance / custom classes in scikit-learn 
              
</p>
</div>

Data Science Cohort Live NYC Feb 2022
<p>Phase 3: Topic 21</p>
<br>
<br>

<div align = "right">
<img src="Images/flatiron-school-logo.png" align = "right" width="200"/>
</div>

####  Hierarchical class structures and inheritance
<img src = "Images/class-inheritance.png" />

Cars and motorcycles are both kinds of vehicle:
- Vehicle: **parent class**
- Car/Motorcycle: **child classes**

Child classes:
- share methods/attributes from the parent
- also have their own additional, specific methods/attributes

#### Reasons to do this
- Tremendous reduction in code
- A specific custom class can inherit routines/variables/functionality from its parents.
- A custom class can adapt to a specific use case and still plug into a general framework.

#### Child class can have multiple parents
- Inherits methods/routines from multiple parents

<img src = "Images/multipleinh.png" width = 600 />

Let's see some of this in action in Python:
- Define the parent class:

In [47]:
class Vehicle:

    def __init__(self, color='Gray'):
        self.started = False
        self.color = color
        self.speed = 0

    def start(self):
        self.started = True

    def turn_off(self):
        # Only allow turning off when the vehicle is stopped
        if self.speed == 0:
            self.started = False
        else:
            print('Vehicle is still moving. Request denied.')

    def increase_speed(self, delta):
        if self.started:
            self.speed += delta
        else:
            print('Vehicle is off...')

    def slow_down(self, delta):
        new_speed = self.speed - delta
        # Ignore the request if the vehicle is off or the speed would go negative
        if self.started and new_speed >= 0:
            self.speed = new_speed

    def stop(self):
        self.speed = 0

In [48]:
veh_inst = Vehicle()

In [49]:
veh_inst.start() 

In [50]:
veh_inst.started

True

In [51]:
veh_inst.speed

0

In [52]:
veh_inst.increase_speed(10)

In [53]:
veh_inst.speed

10

In [54]:
veh_inst.color

'Gray'

The parent has general functionality that we want to reuse for specific types of vehicles:
- Define a child class by specifying its parent.
- The class definition takes parent classes as arguments, e.g. `class Car(Vehicle):`.
- Call `super().__init__()` inside the child's `__init__` method:

`super().__init__(parent_arguments)`

- `parent_arguments` are the arguments of the parent constructor (except `self`).

In the child class:

- the constructor should accept the parameters the parent constructor needs (or supply defaults for them) so it can pass them along.

Let's see what I mean by all that:

In [62]:
class Car(Vehicle):

    def __init__(self, color='Gray'):
        self.trunk_open = False
        # Pass color on to Vehicle.__init__, which also sets started and speed
        super().__init__(color)

    def open_trunk(self):
        if not self.trunk_open:
            self.trunk_open = True
        else:
            print('Trunk is already open.')

    def close_trunk(self):
        if self.trunk_open:
            self.trunk_open = False
        else:
            print('Trunk is already closed.')

In [63]:
my_moms_car = Car('Blue')

In [64]:
my_moms_car.color

'Blue'

In [65]:
my_moms_car.open_trunk()

In [68]:
my_moms_car.trunk_open

True

In [72]:
my_moms_car.start()
my_moms_car.started

True

`Car` is a child of `Vehicle`:
- it inherits methods/attributes from `Vehicle` (`start()`, `started`, `color`, etc.)
- it also has its own methods/attributes (`trunk_open`, `open_trunk()`, `close_trunk()`)

- `color` is an argument of the `Vehicle` constructor
- accepting it in the child's `__init__` method lets the child pass it to the parent:
    - through `super().__init__(color)`
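To see why the `super().__init__(color)` call matters, here is a minimal sketch comparing a child class that calls it with one that doesn't. `BrokenCar` and `WorkingCar` are hypothetical names for illustration, and `Vehicle` is a condensed copy of the class above so the cell runs on its own.

```python
class Vehicle:
    # Condensed copy of the Vehicle class above so this cell runs on its own
    def __init__(self, color='Gray'):
        self.started = False
        self.color = color
        self.speed = 0


class BrokenCar(Vehicle):
    # Hypothetical: overrides __init__ but never calls super().__init__()
    def __init__(self, color='Gray'):
        self.trunk_open = False


class WorkingCar(Vehicle):
    # Hypothetical: like Car, it passes color on to Vehicle.__init__
    def __init__(self, color='Gray'):
        self.trunk_open = False
        super().__init__(color)   # runs Vehicle.__init__: sets started, color, speed


print(WorkingCar('Blue').color)             # Blue
print(hasattr(BrokenCar('Blue'), 'color'))  # False: Vehicle.__init__ never ran
```

Overriding `__init__` replaces the parent's constructor entirely, so the parent's setup only happens if the child calls it.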

#### Multiple Inheritance

In [74]:
class Property:

    def __init__(self, owner_name=None, owner_address=None, years_owned=None, assessed_value=None):

        self.owner_name = owner_name
        self.owner_address = owner_address
        self.years_owned = years_owned
        self.assessed_value = assessed_value
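The `Property` class above isn't combined with anything yet, so here is one possible way to use it for multiple inheritance: a hypothetical `OwnedCar` that is both a `Vehicle` and a `Property`. Because neither parent's `__init__` calls `super().__init__()`, this sketch calls each parent constructor explicitly. `Vehicle` is condensed so the cell runs on its own.

```python
class Vehicle:
    # Condensed copy of the Vehicle class above so this cell runs on its own
    def __init__(self, color='Gray'):
        self.started = False
        self.color = color
        self.speed = 0

    def start(self):
        self.started = True


class Property:
    def __init__(self, owner_name=None, owner_address=None,
                 years_owned=None, assessed_value=None):
        self.owner_name = owner_name
        self.owner_address = owner_address
        self.years_owned = years_owned
        self.assessed_value = assessed_value


class OwnedCar(Vehicle, Property):
    '''Hypothetical class that is both a Vehicle and a Property.'''

    def __init__(self, color='Gray', owner_name=None, assessed_value=None):
        # Neither parent calls super().__init__(), so run each constructor explicitly
        Vehicle.__init__(self, color)
        Property.__init__(self, owner_name=owner_name,
                          assessed_value=assessed_value)


family_car = OwnedCar('Red', owner_name='Mom', assessed_value=15000)
family_car.start()               # method inherited from Vehicle
print(family_car.started)        # True
print(family_car.owner_name)     # Mom (attribute set by Property.__init__)
print([cls.__name__ for cls in OwnedCar.__mro__])
# ['OwnedCar', 'Vehicle', 'Property', 'object']
```

Python looks up methods and attributes along the class's method resolution order (`OwnedCar.__mro__`), which here is `OwnedCar`, `Vehicle`, `Property`, then `object`.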

The unifying principle here is **modularity**.

- It lets us use frameworks that already exist.
- We can plug custom functionality into a larger framework as a small, self-contained box.

**This enables building more complex systems.**

In [1]:
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler

We can also define classes in terms of _other_ classes, in which case the new classes **inherit** the attributes and methods from the classes in terms of which they're defined.

## Motivation: So What's the Benefit? 

_More abstraction is better_

Take a look at the code below and how much it already does:

In [None]:
# Look at all that code we wrote... do we have to do it all again...?
class Robot():
    purpose = 'To love humans'
    
    # We'd like to start off with some initial attributes
    def __init__(self, first_name='?', last_name=''):
        # Clean the names of extra spaces at beginning & end
        first_name = first_name.strip()
        last_name = last_name.strip()    
        # Setting attributes
        self._first_name = first_name
        self._last_name = last_name
        # Combine first and last names and remove any extra spacing
        self.name = ' '.join([first_name,last_name]).strip()

           
    def change_name(self, new_name):
        self.name = new_name
    
    def speak(self):
        print(f'I am {self.name}!')

Let's say we wanted to make another bot with some extra functionality like keeping track of its battery charge.

Do we have to copy and paste this and then add our new functionality? 

Nope! We can reuse everything we already wrote by inheriting from `Robot`.

In [None]:
class GarbageBot(Robot): # Specify the base class(es) we inherit from
    '''A robot that takes care of garbage while we're away!'''
    # Added functionality
    battery = 100
    
    def speak(self):
        print(f"I'm {self.name} and have {self.battery}% battery charged")
        self.battery -= 10

In [None]:
new_robot = GarbageBot('Wall-e')
new_robot.speak()

In [None]:
new_robot.speak()

And I still keep the other functionality from the original class!

In [None]:
new_robot.change_name('E-llaw') # Note we never defined this in GarbageBot!
new_robot.speak()
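Overriding a method doesn't have to throw away the parent's version. As a sketch, a hypothetical `ChattyGarbageBot` can call `super().speak()` to reuse `Robot.speak()` and then add its own battery report. `Robot` is condensed here so the cell runs on its own.

```python
class Robot:
    # Condensed copy of the Robot class above so this cell runs on its own
    purpose = 'To love humans'

    def __init__(self, first_name='?', last_name=''):
        self.name = ' '.join([first_name.strip(), last_name.strip()]).strip()

    def change_name(self, new_name):
        self.name = new_name

    def speak(self):
        print(f'I am {self.name}!')


class ChattyGarbageBot(Robot):
    '''Hypothetical variant that extends speak() instead of replacing it.'''
    battery = 100

    def speak(self):
        super().speak()   # run Robot's version first
        print(f'My battery is at {self.battery}%')
        self.battery -= 10


bot = ChattyGarbageBot('Wall-e')
bot.speak()
# I am Wall-e!
# My battery is at 100%
print(isinstance(bot, Robot))   # True: a child instance is also an instance of the parent
print(bot.purpose)              # To love humans (class attribute inherited from Robot)
```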

## Inheritance in Data Science

A lot of motivation in how we write our code can be summed up with, "Never reinvent the wheel". And using **inheritance** can make this really easy.

Later, we'll be taking Scikit-Learn's objects and customizing them to our particular needs. This is common practice when we use libraries and tools to write reproducible code.

Inheritance allows us to write some of this code quickly by avoiding a lot of "boilerplate" code (the same code we write over and over just to do a minor change).

# Duck Typing

But we don't need inheritance to do everything. 

A different method of getting functionality using different objects is called **duck typing**. The term comes from the saying: 
> **"If it walks like a duck and it quacks like a duck, then it must be a duck."**

![](img/duck.jpg)
> <a href="https://commons.wikimedia.org/wiki/File:Rubber_Duck_Front_View_in_Fine_Day_20140107.jpg">玄史生</a>, <a href="https://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a>, via Wikimedia Commons

With duck typing, you don't check an object's _type_ to decide whether it's compatible.

All you _care about are the **methods and properties**_ the object provides, not its type or even its class.
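A minimal sketch of duck typing in plain Python: the hypothetical function `make_it_quack()` never checks a type; it just calls `quack()`, so any object with that method works, whether or not it's related to `Duck` by inheritance.

```python
class Duck:
    def quack(self):
        return 'Quack!'


class RubberDuck:
    # Not related to Duck by inheritance; it just has a quack() method
    def quack(self):
        return 'Squeak!'


class Dog:
    def bark(self):
        return 'Woof!'


def make_it_quack(thing):
    # No isinstance() check: anything with a quack() method works
    return thing.quack()


print(make_it_quack(Duck()))        # Quack!
print(make_it_quack(RubberDuck()))  # Squeak!
# make_it_quack(Dog()) would raise AttributeError: Dog has no quack() method
```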

## Duck Typing in Scikit-Learn

Scikit-Learn relies more on duck typing than on pure inheritance. In general, if an object has the methods that `sklearn` expects (such as `fit()` and `transform()`), then it's mostly compatible!

However, Scikit-Learn typically uses inheritance to avoid _boilerplate_ code. Usually this involves classes from [`sklearn.base`](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.base) such as [`sklearn.base.BaseEstimator`](https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html#sklearn.base.BaseEstimator).
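As a rough illustration of the boilerplate `BaseEstimator` saves, the sketch below defines a hypothetical `ClipTransformer`. Because it inherits from `BaseEstimator`, it gets `get_params()`, `set_params()`, and a readable `repr` for free. This relies on the Scikit-Learn convention that `__init__` stores each argument, unchanged, in an attribute of the same name.

```python
import numpy as np
from sklearn.base import BaseEstimator


class ClipTransformer(BaseEstimator):
    '''Hypothetical transformer that clips values into [lower, upper].'''

    def __init__(self, lower=0.0, upper=1.0):
        # Scikit-Learn convention: just store the arguments, unchanged
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        return self   # nothing to learn

    def transform(self, X):
        return np.clip(X, self.lower, self.upper)


clipper = ClipTransformer(upper=5.0)
print(clipper.get_params())   # {'lower': 0.0, 'upper': 5.0}
clipper.set_params(lower=-1.0)
print(clipper)                # repr lists the parameters that differ from the defaults
```

We never wrote `get_params()` or `set_params()` ourselves; `BaseEstimator` builds them by inspecting the `__init__` signature.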

# Scikit-Learn's API: (Estimators, Transformers, Predictors)

Scikit-Learn has a great [API](https://scikit-learn.org/stable/developers/develop.html) whose objects are consistent and make it easy for your own objects to be compatible with them!

Let's go over the API objects that will be most relevant to us in the near future.

## Estimator

> This is an object that can take in data and _estimate_ (or *learn*) some parameters.

This means regression and classification models are estimators but so are objects that transform the original dataset ([Transformers](#Transformer)) such as `StandardScaler`.

### `fit`

All estimators estimate/learn when you call their `fit()` method and pass in the dataset. Settings that tweak the learning process are called **hyperparameters**; in Scikit-Learn they are usually passed to the estimator's constructor (e.g. `Ridge(alpha=1.0)`) rather than to `fit()`.
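A small sketch of that split using `Ridge` from `sklearn.linear_model`: `alpha` is a hyperparameter we choose in the constructor, while `coef_` and `intercept_` are parameters learned by `fit()`. The tiny dataset is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Made-up data where y = 2x + 1
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([1.0, 3.0, 5.0, 7.0])

model = Ridge(alpha=0.5)     # hyperparameter: set by us, before any learning
model.fit(X_demo, y_demo)    # fit() learns the parameters from the data
print(model.coef_, model.intercept_)
```

Because `alpha` penalizes large coefficients, the learned slope comes out somewhat smaller than the true slope of 2.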

## Transformer

> Some estimators can change the original data to something new, a **transformation**. 

You can think of examples of these **transformers** when you scale data, clean data, or expand/reduce the features of a dataset.

### `transform`

Transformers will call the `transform()` method to apply the transformation to a dataset after a `fit()` call.

### `fit_transform`

Remember that all estimators have a `fit()` method, so a transformer can use the `fit()` method to learn something about the given dataset. After learning with `fit()`, a transformation on the dataset can be made with the `transform()` method. 

An example of this would be a transformer that performs min-max normalization on the dataset: the `fit()` method learns the minimum and maximum of each column, and the `transform()` method uses them to scale the dataset.

When you call `fit` and `transform` on the same dataset, you can simply call the `fit_transform()` method instead. It gives essentially the same result as calling `fit()` and then `transform()`, but some implementations may be more efficient.
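`MinMaxScaler` from `sklearn.preprocessing` implements the min-max normalization described above, so it makes a convenient check that `fit()` followed by `transform()` matches `fit_transform()`. The data is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data: 3 rows, 2 columns/features
X_demo = np.array([[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]])

scaler = MinMaxScaler()
scaler.fit(X_demo)                    # learns each column's min and max
print(scaler.data_min_, scaler.data_max_)
X_scaled = scaler.transform(X_demo)   # applies (x - min) / (max - min) per column

# fit_transform() performs both steps in a single call
X_scaled_2 = MinMaxScaler().fit_transform(X_demo)
print(np.allclose(X_scaled, X_scaled_2))   # True
```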

## Predictor

> We would use the `fit()` method to train our predictor object and then feed in new data to make predictions (based on what it learned in the fitting stage).

We've used **predictors** whenever we've made predictions like with a `LinearRegression` model.

### `predict`

As you can probably guess, the `predict()` method predicts results for a dataset given to it, after the predictor has been trained with `fit()`.

### `score`

Predictors also have a `score()` method that can be used to evaluate how well the predictor performed on a dataset (such as the test set). By default this is $R^2$ for regressors and accuracy for classifiers.
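A minimal sketch of the predictor workflow with `LinearRegression`: `fit()` on training data, `predict()` on new inputs, and `score()` to evaluate. The data is made up so the expected results are easy to check by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data that follows y = 2x + 1 exactly
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()
model.fit(X_train, y_train)              # learn coef_ and intercept_

X_new = np.array([[4.0], [5.0]])
print(model.predict(X_new))              # close to [9, 11]
print(model.score(X_train, y_train))     # R^2, close to 1.0 for this noise-free data
```

In practice you would call `score()` on a held-out test set rather than on the training data.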

## Observing a Scikit-Learn Class Definition from Source

Let's begin by taking a look at the source code for `sklearn`'s [StandardScaler](https://github.com/scikit-learn/scikit-learn/blob/fd237278e/sklearn/preprocessing/_data.py#L517).

Take a minute to peruse the source code on your own. What do you notice?

# Creating a Scikit-Learn Transformer

> Sometimes we want to create our own Scikit-Learn objects to be used in our code.

Let's try to create a new _transformer_ that will transform the data in the following manner:

- If the value is **positive**, divide it by the **largest value** in that column
- If the value is **negative**, change it to $0$

## Creating a New Transformer

First, we create our base estimator/transformer through inheritance of [`sklearn.base.BaseEstimator`](https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html#sklearn.base.BaseEstimator):

In [None]:
class SpecialTransformer(BaseEstimator):
    pass

my_special_trans = SpecialTransformer()
my_special_trans

This by itself is pretty useless. But we can now add a `fit()` method, which will find the maximum value for each column/feature.

## Creating a `fit` Method

In [None]:
class SpecialTransformer(BaseEstimator):
    
    def fit(self, X, y=None): # By convention, we accept a y parameter
        # Get the maximum value for each column/feature
        # Note we use an ending underscore for values "learned" from fit()
        self.max_ = np.max(X, axis=0) 
        return self

In [None]:
my_special_trans = SpecialTransformer()

In [None]:
## Let's use some test data
# Note each column is a feature, each row a data point
X = np.array([
    [-4, 400, 40],
    [10, -100, 1],
    [6, -800, 700],
    [2, 0, 400],
    [8, 200, 1000]
])

X

> Quick check: What would be the max values for each column/feature?

In [None]:
# Not fitted yet: max_ only exists after fit(), so this raises an AttributeError
my_special_trans.max_

In [None]:
# No transformation yet, but finds the maximum values
my_special_trans.fit(X)
my_special_trans.max_

Great! 

## Creating `transform` Method

Let's now actually implement a way to transform our data:

In [None]:
class SpecialTransformer(BaseEstimator):
    
    def fit(self, X, y=None):
        self.max_ = np.max(X, axis=0) 
        return self
    
    def transform(self, X):
        '''
        Scale the values passed in: 
            - Negatives go to 0
            - Positives scaled by maximum value found in fit()
        '''
        X_copy = np.copy(X)
        # If negative value, turn it to 0
        X_copy[X_copy < 0] = 0
        # Scale everything by max value (previous negative values still 0)
        return X_copy / self.max_

In [None]:
# Recall the data
X

In [None]:
# Create a SpecialTransformer and fit with the data
my_special_trans = SpecialTransformer()
my_special_trans.fit(X)

In [None]:
# Transform the data
X_new = my_special_trans.transform(X)
X_new

## Conclusion

We've now created our very own transformer! We could even feed in one dataset to _fit_ our object and then a different dataset to _transform_.

We should note that there's still a lot of customization we could have done. 

For example, we didn't consider what happens if the maximum value for a feature is $0$: dividing by it would produce `nan` or `inf` values. We really should code how we want that handled (but we just ignored it for now).
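One possible way to handle it, sketched below as a hypothetical `SafeSpecialTransformer`, is to replace any non-positive column maximum with 1 before dividing. On the data used to fit, such a column contains only zeros after clipping, so this simply leaves it at zero; how to treat positive values that show up later in that column is a design choice you'd want to make deliberately.

```python
import numpy as np
from sklearn.base import BaseEstimator


class SafeSpecialTransformer(BaseEstimator):
    '''Hypothetical variant of SpecialTransformer that avoids dividing by 0.'''

    def fit(self, X, y=None):
        self.max_ = np.max(X, axis=0)
        return self

    def transform(self, X):
        # Float copy so the caller's array is untouched and the result is float
        X_copy = np.array(X, dtype=float)
        X_copy[X_copy < 0] = 0
        # Use 1 wherever the learned maximum is 0 or negative
        safe_max = np.where(self.max_ > 0, self.max_, 1)
        return X_copy / safe_max


X_zero_col = np.array([
    [-3, 2],
    [0, 4],
])
# The first column's maximum is 0; it stays 0 instead of becoming nan
print(SafeSpecialTransformer().fit(X_zero_col).transform(X_zero_col))
```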

We also could have gotten the `fit_transform()` method automatically by also inheriting from [`TransformerMixin`](https://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html#sklearn.base.TransformerMixin). See the code below:

In [None]:
class SpecialTransformer(BaseEstimator, TransformerMixin):
    
    def fit(self, X, y=None):
        self.max_ = np.max(X,axis=0) 
        return self
    
    def transform(self, X):
        X_copy = np.copy(X)
        X_copy[X_copy < 0] = 0
        return X_copy / self.max_

In [None]:
my_special_trans = SpecialTransformer()
# Note we can now do fit_transform()
X_new = my_special_trans.fit_transform(X)
X_new

# Exercise: Create Your Own Transformer

Your turn! Let's try to recreate the [`StandardScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) object!

Recall that standard scaling transforms the values in the following way:

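$$x_i' = \frac{x_i - \bar{x}_i}{\sigma_{x_i}}$$

where the $i$ subscript reminds us that everything comes from a single column/feature: $\bar{x}_i$ is that column's mean and $\sigma_{x_i}$ its (population) standard deviation.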

In [None]:
## YOUR CODE HERE!
class MyStandardScaler:
    pass

<details>
    <summary>Answer</summary>
        <code>class MyStandardScaler:
    def fit(self, arr, y=None):
        self.mean_ = np.mean(arr, axis=0)
        self.scale_ = np.std(arr, axis=0)
        return self
    def transform(self, arr):
        return (arr - self.mean_) / self.scale_</code>
</details>

## Test Your Code!

Once you have it, you can test it against the data below and Scikit-Learn's `StandardScaler`.

In [None]:
# Your test data
X = np.array([
    [-4, 400, 40],
    [10, -100, 1],
    [6, -800, 700],
    [2, 0, 400],
    [8, 200, 1000]
])
X

In [None]:
# Test against StandardScaler
sklearn_scaler = StandardScaler()
X_sklearn_scaled = sklearn_scaler.fit_transform(X)
X_sklearn_scaled

In [None]:
# Catch errors so a broken implementation doesn't stop the notebook
try:
    # Your implementation
    my_scaler = MyStandardScaler()
    my_scaler.fit(X)
    X_my_scaled = my_scaler.transform(X)

    # Compare with StandardScaler using a tolerance, since these are floats
    print('StandardScaler and MyStandardScaler same?')
    print(np.allclose(X_sklearn_scaled, X_my_scaled))
except Exception as err:
    print(f'Check your fit() and transform() methods! ({err!r})')

## Objectives Recap

- Understand the concept of object-oriented inheritance
- Understand the main object types of the Scikit-Learn API
- Extend and create custom Scikit-Learn Estimators