# Object-Oriented Programming

In [None]:
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd
import inspect
from sklearn.preprocessing import OneHotEncoder
from sklearn.datasets import make_regression

![fvo](https://cdn.educba.com/academy/wp-content/uploads/2018/07/Functional-Programming-vs-OOP-1.png)

## Agenda

Students will be able to (SWBAT):

1. explain the meaning and relevance of object orientation;
2. explain the idea that "everything in Python is an object";
3. define the notions of attribute, method, and dot notation;
4. describe the relationship between classes and objects, and code classes;
5. explain the notion of inheritance;
6. describe how the object structure is used in `sklearn` tools like `StandardScaler` and `OneHotEncoder`.

## 1. Why a data scientist should learn about OOP

  - By becoming familiar with the principles of OOP, you will increase your knowledge of what's possible.  Much of what you might think you need to code by hand is already built into the objects.
  - With a knowledge of classes and how objects store information, you will develop a better sense of when the learning in machine learning occurs in the code, and after that learning occurs, how to access the information gained.
  - You will become comfortable reading other people's code, which will improve your own code.
  - You will develop knowledge of the OOP family of programming languages, the strengths and weaknesses of Python, and the strengths and weaknesses of other language families.

  
Let's begin by taking a look at the source code for `sklearn`'s [StandardScaler](https://github.com/scikit-learn/scikit-learn/blob/fd237278e/sklearn/preprocessing/_data.py#L517)

Take a minute to peruse the source code on your own. What do you notice?

## 2. "Everything in Python is an object"

Python is an object-oriented programming language. You'll hear people say that "everything is an object" in Python. What does this mean?

Go back to the idea of a function for a moment. A function is a kind of abstraction whereby an algorithm is made repeatable. So instead of coding:

In [None]:
print(3**2 + 10)
print(4**2 + 10)
print(5**2 + 10)

or even:

In [None]:
for x in range(3, 6):
    print(x**2 + 10)

I can write:

In [None]:
def square_and_add_ten(x):
    return x**2 + 10

Now imagine a further abstraction: Before, creating a function was about making a certain algorithm available to different inputs. Now I want to make that function available to different **objects**.

Even Python integers are objects. Consider:

In [None]:
x = 3

We can see what type of object a variable points to with the built-in `type()` function:

In [None]:
type(x)

By setting x equal to an integer, I'm imbuing x with the methods of the integer class.

In [None]:
x.bit_length()

In [None]:
y = 4
y.bit_length()

In [None]:
x.__float__()

Python is dynamically typed, meaning you don't have to instruct it as to what type of object your variable is.  
A variable is a pointer to where an object is stored in memory.
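
As a small illustration of both points, the sketch below rebinds one name to objects of different types and shows that a function is itself an object. The name `value` is chosen here only for illustration.

```python
# Names are rebound freely; the object, not the name, carries the type.
value = 3
print(type(value).__name__)    # int

value = 'three'                # the same name now points at a str object
print(type(value).__name__)    # str


# Functions are objects too: they have a type and attributes of their own.
def square_and_add_ten(x):
    return x**2 + 10


print(type(square_and_add_ten).__name__)        # function
print(square_and_add_ten.__name__)              # square_and_add_ten
print(isinstance(square_and_add_ten, object))   # True
```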

### Side Note about Variables

In [None]:
id(x)

In [None]:
hex(id(x))

In [None]:
y = 3

In [None]:
hex(id(y))

In [None]:
x is y

In [None]:
# Two names can point to the same list, so changing it through one name
# is visible through the other.

x_list = [1,2,3,4]
y_list = x_list

x_list.pop()
print(x_list)
print(y_list)

In [None]:
# when you use copy(), you create a shallow copy of the object

z_list = y_list.copy()

In [None]:
id(z_list)

In [None]:
id(y_list)

In [None]:
y_list.pop()
print(y_list)
print(z_list)

In [None]:
a_list = [[1,2,3], [4,5,6]]
b_list = a_list.copy()
a_list[0][0] ='z'
b_list

In [None]:
import copy

# deepcopy is needed when an object contains other mutable objects
# (here, lists nested inside a list)

a_list = [[1,2,3], [4,5,6]]
b_list = copy.deepcopy(a_list)
a_list[0][0] ='z'
b_list

For more details on this general feature of Python, see [here](https://jakevdp.github.io/WhirlwindTourOfPython/03-semantics-variables.html).
For more on shallow and deep copying, go [here](https://docs.python.org/3/library/copy.html#copy.deepcopy).
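
One way to see the difference between the two kinds of copy is to compare identities with `is`. The sketch below uses new variable names (`nested`, `shallow`, `deep`) so that it does not depend on the cells above.

```python
import copy

nested = [[1, 2, 3], [4, 5, 6]]
shallow = nested.copy()          # new outer list, same inner lists
deep = copy.deepcopy(nested)     # new outer list and new inner lists

print(shallow is nested)         # False: the outer list was copied
print(shallow[0] is nested[0])   # True: the inner list is shared
print(deep[0] is nested[0])      # False: the inner list was copied too

nested[0][0] = 'z'
print(shallow[0][0])             # z
print(deep[0][0])                # 1
```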

## 3. Define attributes, methods, and dot notation

Dot notation is used to access both attributes and methods.

Take for example our familiar friend, the [`Pandas` DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).

In [None]:
# Dataframes are another type of object.

df = pd.DataFrame({'price': [50, 40, 30],'sqft': [1000, 950, 500]})

In [None]:
df

In [None]:
type(df)

Instance attributes are associated with each unique object.
They describe characteristics of the object, and are accessed with dot notation like so:

In [None]:
df.shape

What are some other DataFrame attributes we know?

In [None]:
# Other df attributes



A **method** is a function attached to an object:

In [None]:
df.info()

In [None]:
type(df.info())

In [None]:
# isna() is a method that comes along with the DataFrame object

df.isna()
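
The same attribute/method distinction holds for every object. The sketch below uses a built-in complex number instead of a DataFrame so that it runs without pandas; `callable()` is one way to check whether a name reached by dot notation is a method.

```python
z = 3 + 4j

# .real is an attribute: a stored value, accessed without parentheses
print(z.real)                  # 3.0
print(callable(z.real))        # False

# .conjugate is a method: a function attached to the object, called with ()
print(callable(z.conjugate))   # True
print(z.conjugate())           # (3-4j)
```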

What other DataFrame methods do we know?

In [None]:
# Other df methods



### Exercise

Let's practice accessing the methods associated with the built in `str` class.  
You are given a string below: 

In [None]:
example = '   hELL0, w0RLD?   '

Your task is to fix it so it reads `Hello, World!` using string methods.  To practice chaining methods, try to do it in one line.

Use the [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods), and use the inspect library to see the names of methods.

We can chain methods together because the **result of applying a method to an object is another object**.
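
A minimal sketch of chaining on a different string (so as not to give away the exercise): each call returns a new object, and the next method is called on that result. The last lines show a case where chaining stops, because `list.sort()` returns `None`.

```python
raw = '  data SCIENCE  '

# Step by step: each method returns a new str object
step_1 = raw.strip()      # 'data SCIENCE'
step_2 = step_1.title()   # 'Data Science'

# Chained: the same two calls in one expression
chained = raw.strip().title()
print(chained == step_2)  # True

# split() returns a list, so only list methods can follow it
print(chained.split())    # ['Data', 'Science']

# sort() changes the list in place and returns None, so nothing can be chained after it
sorted_result = [3, 1, 2].sort()
print(sorted_result)      # None
```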

In [None]:
inspect.getmembers(example)

In [None]:
# we can also use the built-in dir() function

dir(example)
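
Both `inspect.getmembers()` and `dir()` also list the dunder names, which can make the output hard to scan. One possible way to narrow it down to the public methods is sketched below; the variable name `public_methods` is just for illustration.

```python
example = '   hELL0, w0RLD?   '

# Keep only names that do not start with an underscore and that can be called
public_methods = [
    name for name in dir(example)
    if not name.startswith('_') and callable(getattr(example, name))
]

print(len(public_methods))
print(public_methods)
```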

<details>
    <summary>
        Answer here
    </summary>
<code>example.swapcase().replace('0', 'o').strip().replace('?', '!')</code>
    </details>

## 4. Describe the relationship of classes to objects, and learn to code classes

Each object is an instance of a **class**, which defines a bundle of attributes and functions. Functions that belong to a class are called *methods*. The point is that **every object of that class automatically has those attributes and methods**.

A class is like a blueprint that describes how to create a specific type of object.

![blueprint](img/blueprint.jpeg)

### Classes

We can define **new** classes of objects altogether by using the keyword `class`:

In [None]:
class Car:
    """Automotive object"""
    pass # This is called a stub.

In [None]:
# Instantiate a car object

ferrari = Car()
type(ferrari)

In [None]:
# We can give the Ferrari four wheels

ferrari.wheels = 4
ferrari.wheels

But wouldn't it be nice not to have to do that every time? We'll just include the 4-wheels specification in the blueprint!

In [None]:
class Car:
    """Automotive object"""
    
    wheels = 4                      # These are attributes of *every* car.

In [None]:
civic = Car()
civic.wheels

In [None]:
#  Then we can add more attributes
class Car:
    """Automotive object"""
    
    wheels = 4                      # These are attributes of *every* car.
    doors = 4

In [None]:
ferrari = Car()
ferrari.doors

In [None]:
ferrari.wheels

In [None]:
# Does your Ferrari have only 2 doors? 
# These attributes can be overwritten.

ferrari.doors = 2
ferrari.doors
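
It may help to see where the overwritten value lives. In the sketch below, assigning `ferrari.doors = 2` creates an attribute on that one instance; the class-level value and other instances are left alone. The class is redefined so the block runs on its own.

```python
class Car:
    """Automotive object"""

    wheels = 4
    doors = 4


ferrari = Car()
civic = Car()

ferrari.doors = 2          # creates an instance attribute on ferrari only

print(ferrari.doors)       # 2
print(civic.doors)         # 4: still read from the class
print(Car.doors)           # 4: the blueprint is unchanged

# __dict__ holds the attributes stored on the instance itself
print(ferrari.__dict__)    # {'doors': 2}
print(civic.__dict__)      # {}
```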

### Methods

We can also write functions that are associated with each class.  
As said above, a function associated with a class is called a method.

In [None]:
# Now we add a method to the class
class Car:
    """Automotive object"""
    
    wheels = 4                      # These are attributes of *every* car.
    doors = 4

    def honk(self):                   # These are methods we can call on *any* car.
        print('Beep beep')

In [None]:
# Create two separate Car objects
ferrari = Car()
civic = Car()
ferrari.honk()
civic.honk()

In [None]:
type(ferrari.wheels)

In [None]:
type(ferrari.honk())

Wait a second, what's that `self` doing? <br/> Every method should include `self` as its first parameter, **which refers to the individual object, i.e. to the instance of the class**.
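
To make the role of `self` concrete, the sketch below calls the same method two ways. The method `who_am_i()` is a hypothetical helper added only for this illustration.

```python
class Car:
    """Automotive object"""

    def honk(self):
        print('Beep beep')

    def who_am_i(self):    # hypothetical helper, for illustration only
        return self


ferrari = Car()

ferrari.honk()         # Python passes ferrari in as self automatically
Car.honk(ferrari)      # the same call written out explicitly

print(ferrari.who_am_i() is ferrari)   # True: self is the instance itself
```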

### Magic Methods

It is common for a class to have magic methods. These are identifiable by the "dunder" (i.e. **d**ouble **under**score) prefixes and suffixes, such as `__init__()`. These methods will get called **automatically** as a result of a different call, as we'll see below.

For more on these "magic methods", see [here](https://www.geeksforgeeks.org/dunder-magic-methods-python/).
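
As a small illustration of methods being called automatically, the hypothetical `Playlist` class below (not part of this lesson's `Car` example) defines three magic methods, none of which is ever called by name.

```python
class Playlist:
    """Hypothetical class, for illustration only."""

    songs = ('Intro', 'Verse', 'Outro')

    def __len__(self):                 # called by len(obj)
        return len(self.songs)

    def __repr__(self):                # called when the object is printed or displayed
        return f'Playlist with {len(self)} songs'

    def __contains__(self, title):     # called by the `in` operator
        return title in self.songs


mix = Playlist()
print(len(mix))          # 3
print(mix)               # Playlist with 3 songs
print('Verse' in mix)    # True
```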

When we create an instance of a class, Python invokes `__init__()` to initialize the object.  Let's add `__init__()` to our class.


In [None]:
# Now we set the instance attributes inside __init__
class Car:
    """Automotive object"""
    
    WHEELS = 4                      # As a convention, capital letters
                                    # are used for constants.
    
    def __init__(self, doors, fwd): # By adding doors and moving to init,
                                    # we shall now need to pass parameters when
                                    # instantiating the object!
        self.doors = doors
        self.fwd = fwd
        

    def honk(self):                 # These are methods we can call on *any* car.
        print('Beep beep')

In [None]:
# This will throw an error! doors and fwd are now required.

civic = Car()

In [None]:
civic = Car(doors=4, fwd=True)

print(civic.doors)
print(civic.fwd)

We can also pass default arguments if there is a value for a certain parameter which is very common.

In [None]:
# Default values make the arguments optional
class Car:
    """Automotive object"""
    
    WHEELS = 4                     
    
    # default arguments included now in __init__
    def __init__(self, doors=4, fwd=False):
        
        self.doors = doors
        self.fwd = fwd
        

    def honk(self):                  
        print('Beep beep')

In [None]:
civic = Car()
print(civic.doors)
print(civic.fwd)

### Positional vs. Named arguments

In [None]:
# we can pass our arguments without names

civic = Car(4, True)

In [None]:
# or with names

civic = Car(doors=4, fwd=True)
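
The two styles can also be mixed, as long as the positional arguments come first. The sketch below redefines a trimmed-down `Car` so it runs on its own, and shows why names are the safer choice when the order is easy to get wrong.

```python
class Car:
    """Automotive object"""

    WHEELS = 4

    def __init__(self, doors=4, fwd=False):
        self.doors = doors
        self.fwd = fwd


# Positional first, then named
mixed = Car(2, fwd=True)
print(mixed.doors, mixed.fwd)          # 2 True

# Named arguments can be given in any order
reordered = Car(fwd=True, doors=2)
print(reordered.doors, reordered.fwd)  # 2 True

# Positional arguments are matched by position only, so swapping them
# silently puts the values in the wrong attributes
swapped = Car(True, 2)
print(swapped.doors, swapped.fwd)      # True 2
```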

In [None]:
# The self argument allows our methods to update our attributes.

# Here honk() changes the driver's mood.

class Car:
    """Automotive object"""
    
    WHEELS = 4                     
    
    # default arguments included now in __init__
    def __init__(self, doors=4, fwd=False,
                 driver_mood='peaceful'):
        
        self.doors = doors
        self.fwd = fwd
        self.driver_mood = driver_mood
        

    def honk(self):                  
        print('Beep beep')
        self.driver_mood = 'aggravated'

In [None]:
civic = Car()
print(civic.driver_mood)
civic.honk()
print(civic.driver_mood)

### Exercise

Let's add an attribute `moving` which indicates, with a boolean, whether the car is moving or not.

Fill in the functions `stop()` and `go()` so that the attribute `moving` will reflect the car's present state of motion after the method is called.

Make sure the method works by calling it, then printing the attribute.

In [None]:
# Fill in go() and stop() below
class Car:
    """Automotive object"""
    
    WHEELS = 4
    
    # default arguments included now in __init__
    def __init__(self, doors=4, fwd=False, driver_mood='peaceful'):
        
        self.doors = doors
        self.fwd = fwd
        self.driver_mood = driver_mood
        self.moving = False             # a new car starts at rest
        
    def honk(self):                   # These are methods we can call on *any* car.
        print('Beep beep')
        
    def go(self):
        pass
    
    def stop(self):
        pass

In [None]:
# Test your code from above

civic = Car()
print(civic.moving)

civic.go()
print(civic.moving)

civic.stop()
print(civic.moving)

## 5. Overview of inheritance

We can also define classes in terms of *other* classes, in which case the new classes **inherit** the attributes and methods from the classes in terms of which they're defined.

Suppose we decided we want to create an electric car class.

In [None]:
# A child class inherits the attributes and methods of its parent
class ElectricCar(Car):
    """Automotive object"""
    
    pass

In [None]:
prius = ElectricCar()
prius.honk()
prius.WHEELS

In [None]:
# We can give the child class its own __init__
class ElectricCar(Car):
    """Automotive object"""
    
    # hybrid is a new parameter that only ElectricCar accepts
    def __init__(self, hybrid=False):
        super().__init__()                   # super() refers to the parent class.
                                             # See https://realpython.com/python-super/
                                             # for more.
        self.hybrid = hybrid                 # store the argument that was passed in

In [None]:
# And we can override parent methods and attribute values
class ElectricCar(Car):
    """Automotive object"""
    
    # default arguments included now in __init__
    def __init__(self, hybrid=False):
        
        # Prius owners are calmer than the average car owner
        super().__init__(driver_mood='serene')
        
        self.hybrid = hybrid
        
    # override inherited methods
    
    def go(self):
        
        print('Whirrrrrr')
        self.moving = True

In [None]:
prius = ElectricCar()
print(prius.moving)

In [None]:
prius.go()
print(prius.moving)
print(prius.driver_mood)

In [None]:
prius.stop()
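
The parent/child relationship can be inspected directly with built-ins. The sketch below redefines trimmed-down versions of `Car` and `ElectricCar` so that it runs on its own.

```python
class Car:
    """Automotive object"""

    WHEELS = 4

    def honk(self):
        print('Beep beep')

    def go(self):
        self.moving = True


class ElectricCar(Car):
    """Automotive object"""

    def go(self):                # overrides Car.go
        print('Whirrrrrr')
        self.moving = True


prius = ElectricCar()

print(isinstance(prius, ElectricCar))    # True
print(isinstance(prius, Car))            # True: a child instance is also a parent instance
print(issubclass(ElectricCar, Car))      # True

# The method resolution order: where Python looks for attributes, in order
print([cls.__name__ for cls in ElectricCar.__mro__])   # ['ElectricCar', 'Car', 'object']

print(ElectricCar.honk is Car.honk)      # True: inherited unchanged
print(ElectricCar.go is Car.go)          # False: overridden in the child
```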

### Another Example

In [None]:
class Shape:
    def __init__(self, n):
        self.n_sides = n
    sides = []

In [None]:
class Triangle(Shape):
    def __init__(self):
        Shape.__init__(self, 3)

    def findArea(self):
        a, b, c = self.sides
        # calculate the semi-perimeter
        s = (a + b + c) / 2
        area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
        print('The area of the triangle is %0.2f' %area)

In [None]:
isosc = Triangle()

isosc.n_sides

In [None]:
# This will throw an error!

isosc.findArea()

In [None]:
isosc.sides = [2, 2, 2]

isosc.findArea()

### Exercise

Use inheritance together with `StandardScaler` to create your own scaler that includes, as an attribute, a list of the smallest and largest z-scores for each feature to which the scaler has been fitted.

**Test**: After you run `fit_extra()` on the $X$ defined below and then fetch the extremes attribute, your array values should match these:

| | Smallest z-score | Largest z-score |
| - | - | - |
| Feature 1 | -1.2068162135708724 | 1.6237142612014306 |
| Feature 2 | -1.1298429699595565 | 1.6037759986784352 |

In [None]:
X, y = make_regression(n_features=2, n_samples=5, random_state=42)

In [None]:
class MyScaler(StandardScaler):
    
    def __init__(self):
        super().__init__()
    
    def fit_extra(self, X):
        
        # ???
        
        self.fit(X)
        return self

In [None]:
new = MyScaler()

In [None]:
new.fit_extra(X)

In [None]:
new.extremes

In [None]:
new.transform(X)

<details><summary>
    Answer code here
    </summary>
    <code>self.extremes = [((min(feat)-feat.mean()) / feat.std(),
        (max(feat)-feat.mean()) / feat.std()) for feat in X.T]</code>
    </details>

## 6. Important data science tools through the lens of objects

We are becoming more and more familiar with a series of methods with names such as `fit()` and `fit_transform()`.

After instantiating an instance of a `StandardScaler`, `LinearRegression`, or `OneHotEncoder`, we use `fit()` to learn about the dataset and save what is learned. What is learned is saved as attributes.
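
The fit-then-store pattern can be sketched in a few lines of plain Python. `TinyScaler` below is a hypothetical single-column class written for illustration; it is not how `sklearn` implements `StandardScaler`. It borrows the `sklearn` convention that attributes learned during fitting end in an underscore, and it assumes the population standard deviation (dividing by n).

```python
from statistics import fmean, pstdev


class TinyScaler:
    """Hypothetical single-column scaler, for illustration only."""

    def fit(self, values):
        # Learning happens here: what is learned is saved on the instance
        self.mean_ = fmean(values)
        self.scale_ = pstdev(values)    # population standard deviation
        return self

    def transform(self, values):
        # Transforming only reads the saved attributes
        return [(v - self.mean_) / self.scale_ for v in values]


scaler = TinyScaler()
print(hasattr(scaler, 'mean_'))     # False: nothing has been learned yet

scaler.fit([1, 2, 3, 4, 5])
print(scaler.mean_)                 # 3.0
print(round(scaler.scale_, 4))      # 1.4142

print([round(z, 4) for z in scaler.transform([1, 5])])   # [-1.4142, 1.4142]
```

Returning `self` from `fit()` is what allows a call such as `TinyScaler().fit(data).transform(data)` to be chained.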

### `StandardScaler`

The `StandardScaler` takes a series and, for each element, computes the difference between the element and the mean of the series, and then divides by the standard deviation.

$\Large z = \frac{x - \mu}{s}$

What attributes and methods are available for a `StandardScaler` object? Let's go back to the code on [GitHub](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/preprocessing/_data.py). In many typical cases the `.fit()` method relies on the [`_incremental_mean_and_var()` function](https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/extmath.py/).

#### Attributes

##### `.scale_`

In [None]:
# Instantiate a standard scaler object
greg = StandardScaler()

# We can instantiate as many scaler objects as we want
max_ = StandardScaler()

In [None]:
greg == max_

In [None]:
# Let's create a series of 1000 normally distributed values

np.random.seed(42)
series_1 = np.random.normal(3, 1, 1000)

print(series_1.mean())
print(series_1.std())

When we fit the `StandardScaler`, it studies the object passed to it, and saves what is learned in its instance attributes.

In [None]:
greg.fit(series_1.reshape(-1,1))

# standard deviation is saved in the attribute "scale_"
greg.scale_

In [None]:
# mean is saved into the attribute "mean_"
greg.mean_

In [None]:
# Knowledge Check

# What value should I pass into the `transform()` method to
# get a return of 0?

greg.transform([])

In [None]:
# We can then use these attributes to transform objects

np.random.seed(42)
random_numbers = np.random.normal(3, 1, 2)
random_numbers

In [None]:
greg.transform(random_numbers.reshape(-1, 1))

In [None]:
# We can also use a scaler on a DataFrame

series_1 = np.random.normal(3, 1, 1000)
series_2 = np.random.uniform(0, 100, 1000)
df_2 = pd.DataFrame([series_1, series_2]).T
ss_df = StandardScaler()
ss_df.fit_transform(df_2)

In [None]:
ss_df.transform([[5, 50]])

### Task: One-hot Encoder

In [None]:
np.random.seed(42)
# Let's create a DataFrame that records a total number of orders
# by day of the week: 

days = np.random.choice(['m', 't', 'w', 'th', 'f', 's', 'su'], 1000)
orders = np.random.randint(0, 1000, 1000)

df = pd.DataFrame([days, orders]).T
df.columns = ['days', 'orders']
df.head()

In [None]:
df.shape

Let's interact with an important parameter we can pass when instantiating the `OneHotEncoder` object: `drop`.  

By dropping a column, we avoid the [dummy variable trap](https://en.wikipedia.org/wiki/Dummy_variable_(statistics)).  

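By passing `drop='first'`, the encoder will drop the first category of each feature, in the sorted order shown by the `categories_` attribute.

If we want to drop a particular category, we can also pass that in as a list, with one entry per feature.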

In [None]:
# Instantiate a OneHotEncoder object

ohe = OneHotEncoder(drop=['m'])
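
To see what dropping a category does to the encoded columns, here is a plain-Python sketch. The function `one_hot()` is a hypothetical helper written for this illustration; it is not part of `sklearn` and handles a single column only.

```python
def one_hot(values, drop=None):
    """Hypothetical helper: one-hot encode one column, optionally dropping a category."""
    categories = sorted(set(values))
    kept = [c for c in categories if c != drop]
    rows = [[1 if value == c else 0 for c in kept] for value in values]
    return kept, rows


days = ['m', 't', 'w', 'm']

# Without dropping, every row sums to 1, so one column is redundant
columns, encoded = one_hot(days)
print(columns)     # ['m', 't', 'w']
print(encoded)     # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]

# With 'm' dropped, a row of all zeros now means 'm'
columns, encoded = one_hot(days, drop='m')
print(columns)     # ['t', 'w']
print(encoded)     # [[0, 0], [1, 0], [0, 1], [0, 0]]
```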

In [None]:
ohe_matrix = ohe.fit_transform(df[['days']])

In [None]:
ohe_matrix

In [None]:
# Look at __dict__ and check out drop_idx_.
# Did it do what you wanted it to do?

ohe.__dict__['drop_idx_']

In [None]:
# check out the categories_ attribute
ohe.categories_

In [None]:
# Check out the object itself
ohe_matrix

It is a sparse matrix: a matrix composed mostly of zeros, stored so that only the nonzero entries take up memory.

In [None]:
# We can convert it to a DataFrame like so
oh_df = pd.DataFrame.sparse.from_spmatrix(ohe_matrix)

In [None]:
# Now, using the categories_ attribute, set the column names
# to the correct days of the week.

ohe_columns = list(ohe.categories_[0])
# drop_idx_ holds one index per feature; we encoded a single feature
ohe_columns.pop(int(ohe.drop_idx_[0]))
oh_df.columns = ohe_columns
oh_df.head()

In [None]:
# Add the onehotencoded columns to the original df, and drop the days column

df = df.join(oh_df).drop('days', axis=1)
df.head()