# Object Oriented Programming

## Agenda
1. Why a data scientist should learn about OOP
2. "Everything in Python is an object"  
3. Define attributes, methods, and dot notation
4. Describe the relationship of classes and objects, and learn to code classes
5. Overview of Inheritance
6. Code Your Own Standard Scaler

# 1. Why a data scientist should learn about OOP

![hackerman](https://media.giphy.com/media/MM0Jrc8BHKx3y/giphy.gif)

  - By becoming familiar with the principles of OOP, you will increase your knowledge of what's possible.  Much of what you might think you need to code by hand is already built into the objects.
  - With a knowledge of classes and how objects store information, you will develop a better sense of where in the code the learning in machine learning occurs, and how to access what was learned afterward.
  - You will become more comfortable reading other people's code, which will improve your own code.
  - You will learn about the OOP family of programming languages, the strengths and weaknesses of Python, and the strengths and weaknesses of other language families.

  
Let's begin by taking a look at the source code for [Sklearn's `StandardScaler`](https://github.com/scikit-learn/scikit-learn/blob/fd237278e/sklearn/preprocessing/_data.py#L517).

Take a minute to peruse the source code on your own.  



# 2. "Everything in Python is an object"  


Python is an object-oriented programming language. You'll hear people say that "everything is an object" in Python. What does this mean?


In [1]:
# It means
type('Hello World')

str

In [2]:
type('')


str

In [3]:
type({})


dict

In [4]:
type(print)

builtin_function_or_method

Even Python integers are objects. Consider:

In [5]:
x = 5

In [6]:
type(x)

int

By assigning an integer to x, x now refers to an object of the `int` class, so all of the methods of that class are available through x.

In [7]:
x.bit_length()

3

In [8]:
x.__float__()

5.0
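
Operators and built-in functions are shorthand for methods like these. As a small illustration (not part of the original notebook), the sketch below compares a few familiar expressions with the dunder methods Python calls behind the scenes.

```python
x = 5

# The + operator calls the integer's __add__ method
print(x + 2 == x.__add__(2))        # True

# float() calls __float__, and abs() calls __abs__
print(float(x) == x.__float__())    # True
print(abs(-x) == (-x).__abs__())    # True
```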

In [9]:
help(int)

Help on class int in module builtins:

class int(object)
 |  int(x=0) -> integer
 |  int(x, base=10) -> integer
 |  
 |  Convert a number or string to an integer, or return 0 if no arguments
 |  are given.  If x is a number, return x.__int__().  For floating point
 |  numbers, this truncates towards zero.
 |  
 |  If x is not a number or if base is given, then x must be a string,
 |  bytes, or bytearray instance representing an integer literal in the
 |  given base.  The literal can be preceded by '+' or '-' and be surrounded
 |  by whitespace.  The base defaults to 10.  Valid bases are 0 and 2-36.
 |  Base 0 means to interpret the base from the string as an integer literal.
 |  >>> int('0b100', base=0)
 |  4
 |  
 |  Methods defined here:
 |  
 |  __abs__(self, /)
 |      abs(self)
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __and__(self, value, /)
 |      Return self&value.
 |  
 |  __bool__(self, /)
 |      self != 0
 |  
 |  __ceil__(...)
 |      Ceiling of

# Pair Exercise

Let's practice accessing the methods associated with the built-in string class.  
You are given a string below: 

In [10]:
example = '   hELL0, w0RLD?   '

Your task is to fix it so it reads `Hello, World!` using string methods.  To practice chaining methods, try to do it in one line.
Use the [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods), and use the inspect library to see the names of methods.
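
One way to list method names with `inspect`, as a rough sketch: `inspect.getmembers` returns `(name, value)` pairs, and filtering out names that start with an underscore leaves the public string methods. The variable name `public_methods` is only for illustration.

```python
import inspect

# getmembers returns (name, value) pairs sorted by name
public_methods = [name for name, _ in inspect.getmembers(str)
                  if not name.startswith('_')]

print(len(public_methods))
print(public_methods[:5])
```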

In [11]:
# We can also use
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(...)
 |      S.__format__(format_spec) -> str
 |      
 |      Return a formatted version of S as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getatt

In [12]:
# Your code here

In [13]:
#__SOLUTION__
example.swapcase().replace('0','o').strip().replace('?','!')

'Hello, World!'

# Fun Detour About How Python Works

Python is dynamically typed, meaning you don't have to instruct it as to what type of object your variable is.  
A variable is a pointer to where an object is stored in memory.

In [None]:
# interesting side note about how variables operate in Python

In [16]:
print(hex(id(x)))

0x109bcc700


In [17]:
y = 5

In [18]:
print(hex(id(y)))

0x109bcc700
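
`x` and `y` share an address here because CPython caches small integers, which is an implementation detail rather than a language guarantee. A sketch of the more general rule, using the `is` operator (identity) next to `==` (equality); the list names are made up for illustration:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list with equal contents
c = a           # a second name for the same list

print(a == b)            # True: same contents
print(a is b)            # False: two distinct objects
print(a is c)            # True: both names point to one object
print(id(a) == id(c))    # True: identical memory address
```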


In [19]:
# this can have implications 

x_list = [1,2,3,4]
y_list = x_list

x_list.pop()
print(x_list)
print(y_list)

[1, 2, 3]
[1, 2, 3]


In [20]:
# when you use copy(), you create a shallow copy of the object
z_list = y_list.copy()
y_list.pop()
print(y_list)
print(z_list)

[1, 2]
[1, 2, 3]


In [21]:
# a shallow copy makes a new outer list, but the inner lists are still shared
a_list = [[1,2,3], [4,5,6]]
b_list = a_list.copy()
a_list[0][0] ='z'
b_list

[['z', 2, 3], [4, 5, 6]]

In [22]:
import copy

# deepcopy is needed when an object contains nested mutable objects
a_list = [[1,2,3], [4,5,6]]
b_list = copy.deepcopy(a_list)
a_list[0][0] ='z'
b_list

[[1, 2, 3], [4, 5, 6]]

For more details on this general feature of Python, see [here](https://jakevdp.github.io/WhirlwindTourOfPython/03-semantics-variables.html).
For more on shallow and deep copies, see [here](https://docs.python.org/3/library/copy.html#copy.deepcopy).

# 3. Define attributes, methods, and dot notation

Dot notation is used to access both attributes and methods.

Take for example our familiar friend, the [Pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)

In [23]:
import pandas as pd
# Dataframes are another type of object which we are familiar with.

df = pd.DataFrame({'price':[50,40,30],'sqft':[1000,950,500]})

In [24]:
type(df)

pandas.core.frame.DataFrame

Instance attributes are associated with each unique object.
They describe characteristics of the object, and are accessed with dot notation like so:

In [25]:
df.shape

(3, 2)

What are some other DataFrame attributes we know?

In [26]:
# answer

In [27]:
#__SOLUTION__
# Other attributes
print(df.columns)
print(df.index)
print(df.dtypes)
print(df.T)

Index(['price', 'sqft'], dtype='object')
RangeIndex(start=0, stop=3, step=1)
price    int64
sqft     int64
dtype: object
          0    1    2
price    50   40   30
sqft   1000  950  500


A **method** is what we call a function attached to an object.
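
The difference shows up in the syntax: an attribute is read without parentheses, while a method must be called. A small sketch using a built-in complex number, chosen here only because it needs no imports:

```python
z = 3 + 4j

# Attributes: stored data, accessed without parentheses
print(z.real)    # 3.0
print(z.imag)    # 4.0

# Method: a function attached to the object, called with parentheses
print(z.conjugate())    # (3-4j)

# Without parentheses we get the method object itself, not its result
print(callable(z.conjugate))    # True
print(callable(z.real))         # False
```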

In [28]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
price    3 non-null int64
sqft     3 non-null int64
dtypes: int64(2)
memory usage: 176.0 bytes


In [29]:
# isna() is a method that comes along with the DataFrame object
df.isna()

   price   sqft
0  False  False
1  False  False
2  False  False


What other DataFrame methods do we know?

In [30]:
#__SOLUTION__
df.describe()
df.copy()
df.head()
df.tail()

   price  sqft
0     50  1000
1     40   950
2     30   500


# 4. Describe the relationship of classes and objects, and learn to code classes

Each object is an instance of a **class** that defines a bundle of attributes and functions. Functions that belong to a class are called *methods*. The point is that **every object of that class will automatically have those attributes and methods**.

A class is like a blueprint that describes how to create a specific type of object.

![blueprint](img/blueprint.jpeg)


## Classes

We can define **new** classes of objects altogether by using the keyword `class`:

In [31]:
class Car:
    """Automotive object"""
    pass  # This is called a stub.

In [32]:
# Instantiate a car object
ferrari =  Car()
type(ferrari)

__main__.Car

In [33]:
# Try importing car_b's automotive object and check the output of type.

In [34]:
# We can describe the ferrari as having four wheels

ferrari.wheels = 4
ferrari.wheels

4

In [35]:
# But wouldn't it be nice to not have to do that every time? 
# We assume the blueprint of a car will include the 4 wheels specification
# and assign it as an attribute when building the class

In [36]:
class Car:
    """Automotive object"""
    
    wheels = 4                      # These are attributes of *every* car.


In [37]:
civic = Car()
civic.wheels


4

In [38]:
#  Then we can add more attributes
class Car:
    """Automotive object"""
    
    wheels = 4                      # These are attributes of *every* car.
    doors = 4


In [39]:
ferrari = Car()
ferrari.doors


4

In [40]:
# But a ferrari does not have 4 doors! 
# These attributes can be overwritten 

ferrari.doors = 2
ferrari.doors

2
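
Assigning `ferrari.doors = 2` does not change the class. It creates an instance attribute on that one object, which shadows the class attribute. A short sketch (the `civic` instance is added here for comparison):

```python
class Car:
    """Automotive object"""

    wheels = 4
    doors = 4


ferrari = Car()
civic = Car()

ferrari.doors = 2          # creates an instance attribute on ferrari only

print(ferrari.doors)       # 2
print(civic.doors)         # 4, still read from the class
print(Car.doors)           # 4, the class attribute is unchanged

# __dict__ holds only the attributes stored on the instance itself
print(ferrari.__dict__)    # {'doors': 2}
print(civic.__dict__)      # {}
```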

### Methods

We can also write functions that are associated with each class.  
As said above, a function associated with a class is called a method.

In [41]:
#  Now we can add a method
class Car:
    """Automotive object"""
    
    wheels = 4                      # These are attributes of *every* car.
    doors = 4

    def honk(self):                   # These are methods we can call on *any* car.
        print('Beep beep')
        
    

In [42]:
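# create two separate instances (chained assignment would make both names point to one object)
ferrari = Car()
civic = Car()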
ferrari.honk()
civic.honk()


Beep beep
Beep beep


Wait a second, what's that `self` doing? 
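
In short, `self` is the instance the method was called on; Python passes it in automatically. The sketch below uses a simplified `Car` whose `honk` returns a string and reads a `name` attribute (both are assumptions made for this illustration) to show that `ferrari.honk()` is shorthand for `Car.honk(ferrari)`.

```python
class Car:
    """Simplified car for illustration"""

    def honk(self):
        # self is whichever instance the method is called on
        return f'{self.name} says beep beep'


ferrari = Car()
ferrari.name = 'ferrari'
civic = Car()
civic.name = 'civic'

print(ferrari.honk())       # ferrari says beep beep
print(Car.honk(ferrari))    # same call, with self passed explicitly
print(civic.honk())         # civic says beep beep
```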

## Magic Methods

It is common for a class to have magic methods. These are identifiable by the "dunder" (i.e. **d**ouble **under**score) prefixes and suffixes, such as `__init__()`. These methods will get called **automatically** as a result of a different call, as we'll see below.

For more on these "magic methods", see [here](https://www.geeksforgeeks.org/dunder-magic-methods-python/).
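
To make "called automatically" concrete, here is a minimal sketch. `Garage` is a hypothetical class invented for this example; each comment names the ordinary expression that triggers the magic method.

```python
class Garage:
    """Hypothetical class used only to demonstrate magic methods"""

    def __init__(self, cars):
        # runs automatically when we call Garage(...)
        self.cars = list(cars)

    def __len__(self):
        # runs automatically when we call len(garage)
        return len(self.cars)

    def __contains__(self, car):
        # runs automatically for `car in garage`
        return car in self.cars

    def __repr__(self):
        # runs automatically when the object is printed or displayed
        return f'Garage({self.cars})'


garage = Garage(['civic', 'ferrari'])
print(len(garage))          # 2
print('civic' in garage)    # True
print(garage)               # Garage(['civic', 'ferrari'])
```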

When we create an instance of a class, Python invokes `__init__` to initialize the object.  Let's add `__init__` to our class.


In [43]:
#  Now we add __init__ to set attributes when the object is created
class Car:
    """Automotive object"""
    
    WHEELS = 4                      # Capital letters signal, by convention, that WHEELS is a constant
    
    def __init__(self, doors, fwd):
        
        self.doors = doors
        self.fwd = fwd
        

    def honk(self):                   # These are methods we can call on *any* car.
        print('Beep beep')
    

Because `doors` and `fwd` are now parameters of `__init__`, we need to pass arguments when instantiating the object.

In [44]:
civic = Car(4, True)
print(civic.doors)
print(civic.fwd)

4
True


We can also define default arguments when a parameter has a value that is very common.

In [45]:
#  Then we can add more attributes
class Car:
    """Automotive object"""
    
    WHEELS = 4                     
    
    # default arguments included now in __init__
    def __init__(self, doors=4, fwd=False):
        
        self.doors = doors
        self.fwd = fwd
        

    def honk(self):                  
        print('Beep beep')
    

In [46]:
civic = Car()
print(civic.doors)
print(civic.fwd)

4
False


#### Positional vs. Named arguments

In [47]:
# we can pass our arguments without names
civic = Car(4, True)


In [48]:
# or with names
civic = Car(doors=4, fwd=True)


In [49]:
# or with a mix
civic = Car(4, fwd=True)


In [50]:
# but only when positional arguments precede named ones
civic = Car(doors = 4, True)

SyntaxError: positional argument follows keyword argument (<ipython-input-50-6046029021d3>, line 2)

In [51]:
# The self argument allows our methods to read and update the instance's attributes.

class Car:
    """Automotive object"""
    
    WHEELS = 4                     
    
    # default arguments included now in __init__
    def __init__(self, doors=4, fwd=False, driver_mood='peaceful'):
        
        self.doors = doors
        self.fwd = fwd
        self.driver_mood = driver_mood
        

    def honk(self):                  
        print('Beep beep')
        self.driver_mood = 'pissed'
    

In [52]:
civic = Car()
print(civic.driver_mood)
civic.honk()
print(civic.driver_mood)

peaceful
Beep beep
pissed


# Pair

 Let's bring our knowledge together, and in pairs, code out the following:

We have an attribute `moving` which indicates, with a boolean, whether the car is moving or not.  

Fill in the methods `stop` and `go` to change the attribute `moving` to reflect the car's present state of motion after the method is called.  Also, include a print statement that indicates the car has started moving or has stopped.

Make sure the method works by calling it, then printing the attribute.


In [None]:
#  Then we can add more attributes
class Car:
    """Automotive object"""
    
    # default arguments included now in __init__
    def __init__(self, doors=4, fwd=False, driver_mood='peaceful'):
        
        self.doors = doors
        self.fwd = fwd
        self.driver_mood = driver_mood
        
    def honk(self):                   # These are methods we can call on *any* car.
        print('Beep beep')
        
    def go(self):
        pass
    
    def stop(self):
        pass

In [53]:
#__SOLUTION__
class Car:
    """Automotive object"""
    WHEELS = 4
    # default arguments included now in __init__
    def __init__(self, doors=4, fwd=False, driver_mood='peaceful', moving=False):
        
        self.doors = doors
        self.fwd = fwd
        self.moving = moving
        self.driver_mood = driver_mood

    def honk(self):                   # These are methods we can call on *any* car.
        print('Beep beep')
        
    def go(self):
        self.moving = True
        print('Whoa, that\'s some acceleration!')
    
    def stop(self):
        self.moving = False
        print('Screeech!')

In [54]:
# run this code to make sure your methods work
civic = Car()
print(civic.moving)

civic.go()
print(civic.moving)

civic.stop()
print(civic.moving)

False
Whoa, that's some acceleration!
True
Screeech!
False


# 5. Overview of Inheritance

We can also define classes in terms of *other* classes, in which case the new classes **inherit** the attributes and methods of the classes they are built on.

Suppose we decided we want to create an electric car class.

In [55]:
#  A child class names its parent class in parentheses
class ElectricCar(Car):
    """Automotive object"""
    
    pass

In [56]:
prius = ElectricCar()
prius.honk()
prius.WHEELS

Beep beep


4
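
Inheritance also means an `ElectricCar` *is a* `Car`. A minimal sketch with a pared-down `Car` (redefined here so the block runs on its own) that checks the relationship with the built-ins `isinstance` and `issubclass`:

```python
class Car:
    """Pared-down parent class for illustration"""

    WHEELS = 4

    def honk(self):
        return 'Beep beep'


class ElectricCar(Car):
    pass


prius = ElectricCar()

print(isinstance(prius, ElectricCar))    # True
print(isinstance(prius, Car))            # True
print(issubclass(ElectricCar, Car))      # True

# attribute lookup walks the classes in this order
print([cls.__name__ for cls in ElectricCar.__mro__])
# ['ElectricCar', 'Car', 'object']
```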

In [57]:
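#  The child class can define its own __init__ and call the parent's
class ElectricCar(Car):
    """Automotive object"""
    
    def __init__(self, hybrid=False):
        # super().__init__() runs Car.__init__ with its default arguments;
        # self is passed implicitly, so we must not pass it again
        super().__init__()
        self.hybrid = hybrid 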

In [58]:
volt = ElectricCar(hybrid=True)
volt.hybrid
volt.driver_mood

'peaceful'
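
One side effect of this `__init__` is that `ElectricCar` no longer accepts `doors`, `fwd`, or `driver_mood`. A common pattern, shown here as one possible approach rather than the notebook's own solution, is to forward extra keyword arguments to the parent with `**kwargs`. `Car` is redefined in reduced form so the block runs on its own.

```python
class Car:
    """Reduced parent class for illustration"""

    def __init__(self, doors=4, fwd=False, driver_mood='peaceful'):
        self.doors = doors
        self.fwd = fwd
        self.driver_mood = driver_mood


class ElectricCar(Car):
    def __init__(self, hybrid=False, **kwargs):
        # pass any remaining keyword arguments on to Car.__init__
        super().__init__(**kwargs)
        self.hybrid = hybrid


volt = ElectricCar(hybrid=True, doors=2)
print(volt.hybrid)         # True
print(volt.doors)          # 2
print(volt.driver_mood)    # peaceful
```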

In [59]:
#  And we can override methods and parent attributes
class ElectricCar(Car):
    """Automotive object"""
    
    # default arguments included now in __init__
    def __init__(self, hybrid=False):
        
        # Prius owners are calmer than the average car owner
        super().__init__(driver_mood='serene')
        
        # store the argument instead of hard-coding True
        self.hybrid = hybrid
        
    # override inherited methods
    
    def go(self):
        
        print('Whirrrrrr')
        self.moving = True

In [60]:
prius = ElectricCar()
print(prius.moving)
prius.go()
print(prius.moving)
print(prius.driver_mood)

False
Whirrrrrr
True
serene


# 6. Standard Scaler through the object lens

We are becoming more and more familiar with a series of methods with names such as `fit` or `fit_transform`.

After instantiating an instance of a Standard Scaler, Linear Regression model, or One Hot Encoder, we use fit to learn about the dataset and save what is learned. What is learned is saved in the attributes.
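
The pattern can be sketched with a toy transformer. `MeanCenterer` below is a made-up class, not part of sklearn; it borrows sklearn's convention of a trailing underscore for learned attributes and works on plain Python lists.

```python
class MeanCenterer:
    """Toy transformer: learns the mean in fit, subtracts it in transform"""

    def fit(self, values):
        # the "learning" step: compute and store what we need later
        self.mean_ = sum(values) / len(values)
        return self    # returning self allows method chaining

    def transform(self, values):
        # use what was learned during fit
        return [v - self.mean_ for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


centerer = MeanCenterer()
print(centerer.fit_transform([1, 2, 3, 6]))    # [-2.0, -1.0, 0.0, 3.0]
print(centerer.mean_)                          # 3.0

# new data is shifted by the mean learned earlier, not its own mean
print(centerer.transform([3, 4]))              # [0.0, 1.0]
```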

### Standard Scaler 

The standard scaler takes a series and, for each element, subtracts the mean of the series and divides the result by the standard deviation. The sign is kept, so elements below the mean get negative z-scores.

$\Large z = \frac{x - \mu}{s}$
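
A small worked example of standardization, $z = (x - \mu) / s$, using only the standard library. `statistics.pstdev` is the population standard deviation, which matches the default behavior of NumPy's `std`. The data values are arbitrary.

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = statistics.mean(values)     # 5.0
s = statistics.pstdev(values)    # 2.0

# subtract the mean, then divide by the standard deviation
z_scores = [(x - mu) / s for x in values]
print(z_scores)
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Values below the mean come out negative, and the z-scores themselves have mean 0.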


## Attributes and Methods of Standard Scaler

### `.scale_`, `.mean_`, `.fit`, `.transform`, `.fit_transform`

In [2]:
from sklearn.preprocessing import StandardScaler
import numpy as np

# instantiate a standard scaler object
ss = StandardScaler()

# We can instantiate as many scaler objects as we want
maxs_scaler = StandardScaler()

In [3]:
# Let's work with a random array of numbers.
np.random.seed(42)
series_1 = np.random.normal(3,1,1000)
print(series_1.mean())
print(series_1.std())

3.0193320558223253
0.9787262077473542


When we fit the standard scaler, it studies the object passed to it and saves what it learns in its instance attributes.

In [4]:
ss.fit(series_1.reshape(-1,1))

# standard deviation is saved in the attribute scale_
ss.scale_

array([0.97872621])

In [5]:
# mean is saved in the attribute mean_
ss.mean_

array([3.01933206])

Then, we can use the `transform` method to convert every element in the array to the z-score computed from the mean and standard deviation learned when `fit()` was called on the array.

In [6]:
ss.transform(series_1.reshape(-1,1))[:5]

array([[ 0.48775857],
       [-0.1610219 ],
       [ 0.64201457],
       [ 1.53638248],
       [-0.25899524]])

In [7]:
# let's double check the math by applying the z-score formula
(series_1[0]-ss.mean_)/ss.scale_

array([0.48775857])

In [8]:
# We can call fit and transform in one step as well

ss.fit_transform(series_1.reshape(-1,1))[:5]

array([[ 0.48775857],
       [-0.1610219 ],
       [ 0.64201457],
       [ 1.53638248],
       [-0.25899524]])

# Pair program

Now we will take our new knowledge of how to create classes and make our own standard scaler. 

Look in scaler.py for the steps to the activity.

Once you have completed the task, instantiate your scaler and check that fitting it returns the same results as sklearn's standard scaler fit above. 

Make sure the transform functions return the same results as well.

In [10]:
%load_ext autoreload
%autoreload 2


In [11]:
#__SOLUTION__
from scaler_solution import MyStandardScaler

mss = MyStandardScaler()
mss.fit_transform(series_1)[:5]

[0.48775857171297654,
 -0.16102190351705759,
 0.6420145667955479,
 1.53638248233551,
 -0.2589952415079247]

In [12]:
#__SOLUTION__
print(mss.scale_)
print(ss.scale_[0])

0.9787262077473544
0.9787262077473542
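
The two values differ in the last digit, most likely because the two implementations carry out the floating-point arithmetic in a different order. For that reason, comparing with `==` is too strict; a tolerance-based check such as `math.isclose` (or `numpy.allclose` for arrays) is the usual approach. A sketch using the two values printed above, with made-up variable names:

```python
import math

mine = 0.9787262077473544      # value printed for mss.scale_
theirs = 0.9787262077473542    # value printed for ss.scale_[0]

print(mine == theirs)                 # False: exact comparison fails
print(math.isclose(mine, theirs))     # True: equal within the default rel_tol of 1e-09
```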


In [13]:
#__SOLUTION__
print(mss.mean_)
print(ss.mean_[0])

3.0193320558223253
3.0193320558223253
