<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Introduction to Object-Oriented Programming
_Author:_ Tim Book

# Programming Paradigms

There are many **programming paradigms** (i.e., programming patterns/styles). Each paradigm provides a completely different way of thinking about how to design software. Many languages (including Python) are multi-paradigm; they borrow popular aspects of several paradigms.

The best way to learn a paradigm is to use a language that _only_ supports the paradigm, a so-called "pure" language. (The exact distinction between "pure" and not is often debated!)

## Functional Programming
The FP paradigm involves writing programs that consist only of _pure functions_. In FP:
- No mutable variables. This has the consequence of outlawing _for_ loops and emphasizing recursion.
- No side effects. Functions have no effect on the system aside from returning a value (i.e. no `print`).
- Functions _always_ return the same output given the same inputs.



You already have some experience with this style. For example, `map` and `apply` avoid _for_ loops by accepting a function as an argument. 
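
To make the three rules above concrete, here is a minimal sketch contrasting an impure function with pure ones. The names `add_to_total`, `add`, and `square` are made up for illustration, and only the built-in `map` is used (pandas' `apply` follows the same idea of taking a function as an argument).

```python
# Impure: depends on and mutates state that lives outside the function.
total = 0

def add_to_total(x):
    global total
    total += x          # side effect: changes a global variable
    return total

print(add_to_total(5))  # 5
print(add_to_total(5))  # 10 (same input, different output)

# Pure: the output depends only on the inputs, and nothing else is modified.
def add(x, y):
    return x + y

def square(x):
    return x * x

numbers = [1, 2, 3, 4]

# map takes a function as an argument, so no explicit for loop is needed.
squares = list(map(square, numbers))
print(squares)          # [1, 4, 9, 16]
```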

| "Pure" FP languages | Highly influenced by FP |
| -----      | ----- |
| Haskell | Scala (popular for big data) |
| Scheme (and Lisp)   | R (popular with statisticians) |
| Clojure | Mathematica |
| Racket |  |

Advantages of FP:
- Allows some complex ideas to be expressed simply (e.g. `map`, `apply`).
- Pure functions are much easier to scale and parallelize, since they share no mutable state. Hence, FP is popular when working with big data.
- Code can more easily be proven correct, due to simplicity and reliance on recursion.
- New languages are often easy to prototype in Lisp-family languages, where code is itself represented as data.

Disadvantages of FP:
- Often has a steep learning curve. 
- Has a strong foundation in theoretical mathematics, so the community can feel mathy to newcomers. (Functional languages are based on the "lambda calculus" model of computation instead of the "Turing machine" model.)

## Object-Oriented Programming
The OOP paradigm involves **bundling together variables and functions into "classes"** -- aka **creating your own data types**.

| "Pure" OOP languages | Highly influenced by OOP |
| -----      | ----- |
| Java | Scala |
| C#   | C++ |
| Smalltalk | Python |
| | Ruby |

OOP software design follows four main principles:
- **Encapsulation.** Attributes and methods are bound together and protected from misuse.
- **Abstraction.** Implementation details are not exposed.
- **Inheritance.** In a hierarchical manner, objects can inherit properties and methods from other objects.
- **Polymorphism.** Functions (and hence operators) can be "overloaded" and change their functionality based on the data type.
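
As a rough illustration of these four principles, here is a small sketch. The `Shape`, `Circle`, and `Square` classes are hypothetical and are not used elsewhere in this lesson.

```python
import math

class Shape:
    def __init__(self, name):
        self.name = name          # encapsulation: data lives next to the methods that use it

    def area(self):
        raise NotImplementedError("Subclasses must implement area()")

    def describe(self):
        # abstraction: callers use describe() without knowing how area() is computed
        return f"{self.name} with area {self.area():.2f}"

class Circle(Shape):              # inheritance: Circle gets describe() from Shape
    def __init__(self, radius):
        super().__init__("circle")
        self.radius = radius

    def area(self):               # polymorphism: same method name, different behavior
        return math.pi * self.radius ** 2

class Square(Shape):
    def __init__(self, side):
        super().__init__("square")
        self.side = side

    def area(self):
        return self.side ** 2

shapes = [Circle(1), Square(2)]
descriptions = [shape.describe() for shape in shapes]
print(descriptions)  # ['circle with area 3.14', 'square with area 4.00']
```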

OOP was developed in an attempt to make code easier to write in large teams. Initial optimism led to a wave of popular OOP implementations, e.g. Java and C++. However, although OOP seems intuitive for some things, many programming ideas are not easily expressed in the OOP paradigm. 

In fact, OOP programs can easily become extremely complex and verbose if not carefully designed. For example, because data is hidden, it is often difficult for one object to get access to the data it needs.

---

Other paradigms:
- **Procedural programming**: In DSI, we definitely _use_ some aspects of functional programming and OOP. However, the way we design software tends to be _procedural_. A procedural style is where functions (aka "procedures") are the highest level of abstraction. This style is good to know, because it closely mirrors how your CPU executes instructions: one step after another. "Pure" procedural languages include C and BASIC.
- **Declarative programming**: A declarative style is where you tell the computer _what_ to do, rather than _how_ to do it. Declarative programming works best in so-called "domain-specific languages," since optimizing _how_ to do things is not possible in a general way. An example of a declarative language is SQL, e.g. `SELECT make, model, year, mpg FROM car WHERE mpg > 40;`.
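
To see the declarative idea in action, the sketch below runs the query above against an in-memory SQLite database using Python's built-in `sqlite3` module, next to an imperative loop that does the same filtering by hand. The rows in `cars` are made-up sample data.

```python
import sqlite3

# Made-up sample data for illustration: (make, model, year, mpg)
cars = [
    ("Toyota", "Prius", 2015, 50),
    ("Nissan", "Cube", 2010, 27),
    ("Honda", "Insight", 2019, 52),
]

# Imperative: spell out *how* to find the rows.
efficient_loop = []
for car in cars:
    if car[3] > 40:
        efficient_loop.append(car)

# Declarative: say *what* rows we want and let the database decide how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (make TEXT, model TEXT, year INTEGER, mpg INTEGER)")
conn.executemany("INSERT INTO car VALUES (?, ?, ?, ?)", cars)
efficient_sql = conn.execute(
    "SELECT make, model, year, mpg FROM car WHERE mpg > 40;"
).fetchall()
conn.close()

# Both approaches select the same rows (row order is not guaranteed by SQL).
print(sorted(efficient_loop) == sorted(efficient_sql))
```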


## Python supports Object-Oriented Programming!
![](imgs/gvr.jpg)

While Python _does_ support a lot of FP ideas, **Python is fundamentally object-oriented -- everything in Python is an object**.
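
A quick way to convince yourself of this is to inspect a few everyday values. In the sketch below (where `greet` is a made-up function), numbers, strings, lists, functions, classes, and even `None` are all instances of `object`.

```python
def greet():
    return "hello"

things = [42, "text", [1, 2], greet, len, int, None]
for thing in things:
    # Every value has a type (its class) and is an instance of `object`.
    print(type(thing).__name__, isinstance(thing, object))

# Even an integer literal has methods.
print((255).bit_length())  # 8
print("abc".upper())       # ABC
```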

## Cool, but: Why now?
You're actually very familiar with some OOP ideas. Instantiations of `DataFrame`, `StandardScaler`, and `LinearRegression` _(recall that we imported the `LinearRegression` class from the `sklearn.linear_model` module to construct our first ML model)_ have all followed the traditional OOP pattern. If you understand how to manipulate those objects, you know the basics of OOP!

But, we don't know how to **make our own templates for objects** (called "classes") yet. That's what we're going to explore today.

![](imgs/ds-def.png)

In data science, we don't make our own classes very often. But it's absolutely imperative for data scientists to be comfortable with the idea, and to recognize when making a class is a good idea. **If data science is a cross between statistics and computer science, this lesson falls more on the computer science side.** After today's lesson, a lot of the magic surrounding what we've been doing up until now should "click".

## OOP Vocab

**Covered in this lesson:**
* Class
* Instance
* Attribute
* Method
* Constructor method
* State
* "self"

**Not covered in this lesson (but some covered in supplemental material):**
* Inheritance
* Encapsulation
* Magic methods (aka "dunder methods")
* Class method
* Static method
* Public and private methods
* Getter and setter methods

## Part I: The Dog Class
_**Refer to intuition deck for class-object relation fundamentals**_

In [1]:
class Dog: # defining a class 'Dog', a container to hold attributes + methods
    def __init__(self, name, breed): # __init__ is the constructor method; it runs whenever an object of the class 'Dog' is created
        # instance attributes (attributes are 'variables' belonging to an object)
        # self: refers to the specific instance being created or used. "self" is a naming convention, not a keyword; through it we access the instance's attributes and methods
        self.name = name 
        self.breed = breed
        self.hungry = True

    def speak(self): # method that can be called on an object of class 'Dog'
        print(f"Bark bark, I'm {self.name} the {self.breed}!")
        
    def feed(self): # another method that can be called on an object of class 'Dog'
        if self.hungry:
            print(f"{self.name} eats...")
            self.hungry = False
        else:
            print(f"{self.name} is not hungry!")

In [2]:
# Instantiate a Dog named Chloe --> where chloe is an object of the class 'Dog'
# the line below executes the code block within __init__ during creation of the object 'chloe'
chloe = Dog("Chloe", "pug")

In [3]:
# A new type of thing!
type(chloe)

__main__.Dog

In [4]:
# This instance of Dog has attributes
chloe.name

'Chloe'

In [5]:
chloe.hungry

True

In [6]:
# Call a method on this instance of Dog (method is a function belonging to an object) 
chloe.speak()

Bark bark, I'm Chloe the pug!
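
A note on `self`: when you call a method on an instance, Python passes that instance in as the first argument. The sketch below uses a small made-up `Puppy` class (so it does not overwrite `Dog`) to show that the two call styles are equivalent.

```python
class Puppy:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return f"Yip, I'm {self.name}!"

rex = Puppy("Rex")

# These two calls are equivalent: Python passes the instance in as `self`.
print(rex.speak())        # Yip, I'm Rex!
print(Puppy.speak(rex))   # Yip, I'm Rex!
```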


In [7]:
# Another method. This one changes the state of the Dog
chloe.feed()

Chloe eats...


In [8]:
# State has changed!
chloe.hungry

False

In [9]:
# Feed again. The method behaves differently now because Chloe's state has changed!
chloe.feed()

Chloe is not hungry!


In [10]:
# If I make a different Dog, it doesn't share state with Chloe
# buddy is a new object of the class 'Dog'
buddy = Dog("Buddy", "golden retriever")
buddy.hungry

True

In [11]:
# We can also make a Cat class, but it's a totally separate concept from Dog.
class Cat:
    pass # In Python "pass" keyword is used to indicate that nothing happens—the function, class or loop is empty

In [12]:
garfield = Cat() # instantiating Cat object named garfield

In [13]:
# Cat doesn't magically get Dog class's methods.
# Keeping methods specific to only the classes that can use them is called "encapsulation" - a core tenet of OOP.
garfield.speak()

AttributeError: 'Cat' object has no attribute 'speak'

## Part II: The Car Class
Let's create a car with a make and model. This car will have the following features:
* It will keep track of its own miles
* It will keep track of its state as to whether the car is on or off
* If the car is off, it can't drive!
* It will have methods to turn the car on and off.

**(THREAD):** Build a `drive()` method that takes one argument and adds that many miles to the car's odometer.

In [14]:
class Car:
    def __init__(self, make, model): # to initialize the following attributes when an object is created from class 'Car'
        self.make = make
        self.model = model
        self.miles = 0
        self.on = False
        
    def honk(self):
        print("Beep beep!")
        
    def drive(self, distance):
        if self.on:
            self.miles += distance
        else:
            print("Car is off!")
    
    def turn_on(self):
        self.on = True
        
    def turn_off(self):
        self.on = False

In [15]:
mycar = Car("Nissan", "Cube") # instantiating an object 'mycar' of class 'Car' 
mycar.honk() # calling method 'honk()' on object 'mycar'

Beep beep!


In [16]:
mycar.miles # accessing attribute 'miles' on object 'mycar'

0

In [17]:
mycar.drive(10) # calling method 'drive()' on object 'mycar' with a 'distance' = 10
mycar.on # car is off when the object is initialized (based on definition inside __init__)

Car is off!


False

In [18]:
mycar.turn_on() # calling method to turn the car 'on'
mycar.drive(20) # now the 'if' condition gets executed within the 'drive' method
mycar.miles # so, miles is incremented to 0 + 20 = 20

20

### Can you see how this can quickly get complicated?
Cars are more intricate than this.

**_`[attempt during flextime: sample soln in car.py]`_ An exercise left to the reader:** Can you modify this car class to keep track of its own gas, too? That might involve an `mpg` attribute as well as a `tank_size` attribute. When the car drives a certain number of miles, compute how much gas is consumed. You'll also probably need a `fill_tank()` method to refuel.

**Further, much more advanced considerations:** What if the car only has enough gas for 15 miles, but you try to drive 20 miles? Should it drive 15 miles and then stop? Should it throw an error? Should it throw the error before or after deducting the gas? If it throws an error, what kind? Maybe you'll need to create your own `EmptyTankError` exception that inherits from `Exception`.

Sounds hard to make? This is one of the tradeoffs of OOP: a class can be hard to build, but it's very easy to use. When done right, code looks like this:

```python
mycar = Car("Chrysler", "PT Cruiser", mpg=30, tank_size=11)
mycar.turn_on()
mycar.drive(30)
mycar.turn_off()
mycar.turn_on()
mycar.drive(100)
mycar.fill_tank()
mycar.drive(100)
mycar.turn_off()
```

Simple! Easy to read! Building classes is said to be a **layer of abstraction** for your code for this reason. This syntax is hiding potentially hundreds of lines of code that you don't need to worry about.

**Fun fact:** The file that defines the pandas `DataFrame` is more than 8,000 lines long! You don't need to read those 8,000+ lines to know how to use a `DataFrame`. Check out what this really looks like in the wild [here](https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py).

Another cool but more manageable example is the definition of the sklearn `LinearRegression` class that we've been using in our previous lectures. You can read it [here](https://github.com/scikit-learn/scikit-learn/blob/0d378913be6d7e485b792ea36e9268be31ed52d0/sklearn/linear_model/_base.py#L507).

## Part III: Hiding Your Ugly Code to Keep Your Notebooks Clean
Have you ever wondered how to make your own "importable" things? Let's check out a basic example. Let's open up the `car.py` file in this directory.

In [20]:
# importing the class 'AdvancedCar' from car.py
from car import AdvancedCar

In [21]:
# instantiating an object 'newcar' of the class 'AdvancedCar'
newcar = AdvancedCar("Ford", "Focus", mpg=25, tank_size=11)

In [22]:
# calling method 'honk()' on object
newcar.honk()

Beep beep!


In [23]:
# accessing attribute 'gas' on the object; it starts equal to tank_size, as set inside __init__
newcar.gas

11

In [24]:
# Let's turn the car on and go!
newcar.turn_on()
newcar.drive(25) # executes operations under 'if' based on self.on = True, with a distance = 25
newcar.miles # car state has now changed. We've added miles

25

In [25]:
# And used up some gas: 25 miles at 25 mpg burns 1 gallon, so self.gas dropped from 11 to 10
newcar.gas

10.0

In [26]:
# What if we try to drive too far?
# A custom error!
newcar.drive(300) # needs 300 / 25 = 12 gallons but only 10 remain, so the gas_used <= self.gas check in drive() fails

InsufficientGasError: You don't have enough gas!

In [27]:
# Fill up
newcar.fill_tank(1)

In [28]:
# All the gas
newcar.gas

11.0
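
The contents of `car.py` are not reproduced in this notebook. As a rough idea of what such a file could contain, here is a minimal sketch that reproduces the behavior seen in the cells above. Everything about its internals is an assumption: that the car starts with a full tank, that `fill_tank()` with no argument fills up completely, and that `drive()` checks the gas before changing any state.

```python
class InsufficientGasError(Exception):
    """Raised when a trip needs more gas than is in the tank."""


class AdvancedCar:
    def __init__(self, make, model, mpg, tank_size):
        self.make = make
        self.model = model
        self.mpg = mpg
        self.tank_size = tank_size
        self.gas = tank_size      # assumption: the car starts with a full tank
        self.miles = 0
        self.on = False

    def honk(self):
        print("Beep beep!")

    def turn_on(self):
        self.on = True

    def turn_off(self):
        self.on = False

    def drive(self, distance):
        if not self.on:
            print("Car is off!")
            return
        gas_used = distance / self.mpg
        # Check before changing any state, so a failed trip leaves the car untouched.
        if gas_used <= self.gas:
            self.gas -= gas_used
            self.miles += distance
        else:
            raise InsufficientGasError("You don't have enough gas!")

    def fill_tank(self, gallons=None):
        # Assumption: no argument means "fill up"; never exceed tank_size.
        if gallons is None:
            self.gas = float(self.tank_size)
        else:
            self.gas = min(float(self.tank_size), self.gas + gallons)
```

Raising the error before deducting gas is one answer to the design questions posed in Part II: the car's miles and gas stay consistent even when a trip is refused.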

### When would we need this?
Luckily for us, between `pandas` and `sklearn`, most of the classes we need have already been built for us. But data scientists don't work in a vacuum! Here are some examples of times where building your own class is the right thing to do:

#### Whenever you want to bundle your code into a **package**.
You can certainly put plain functions in a module and `import` them, and for small utilities that is fine. But when related tools share state or configuration, bundling them into classes makes them easier to share amongst coworkers. **If you set this up properly, you can even have them be `pip install`able from either a private or public Git repository!** Think about all of the different libraries we've used so far. You know this pattern to be true!

#### Whenever you want to "build once, run many times later."
Imagine a complicated task, such as connecting to a server and executing code on it. These tasks typically have a lot of rote boilerplate code that you'd want to automate. For example, check out this fantasy code you might write for connecting to a SQL server:

```python
conn = SQLServer("12.34.56.78")
conn.connect()
conn.login("tim", "p@ssw0rd1!")
conn.execute("SELECT name, age FROM users")
conn.close()
```
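
The `SQLServer` class above is imaginary, but the same pattern can be sketched with Python's built-in `sqlite3` module. The `SQLiteDatabase` class below is a hypothetical wrapper written for this example (it has no login step, since SQLite is just a local file or in-memory database), and the inserted row is made-up data.

```python
import sqlite3

class SQLiteDatabase:
    """Hypothetical wrapper that hides connection boilerplate."""

    def __init__(self, path=":memory:"):
        self.path = path
        self.conn = None

    def connect(self):
        self.conn = sqlite3.connect(self.path)

    def execute(self, query, params=()):
        if self.conn is None:
            raise RuntimeError("Call connect() before execute().")
        cursor = self.conn.execute(query, params)
        self.conn.commit()
        return cursor.fetchall()

    def close(self):
        if self.conn is not None:
            self.conn.close()
            self.conn = None

# Build once above, then use it as many times as you like.
db = SQLiteDatabase()
db.connect()
db.execute("CREATE TABLE users (name TEXT, age INTEGER)")
db.execute("INSERT INTO users VALUES (?, ?)", ("tim", 30))
rows = db.execute("SELECT name, age FROM users")
db.close()
print(rows)  # [('tim', 30)]
```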

#### Unit Testing
Python's built-in `unittest` module requires you to build classes, where each method is an individual test. (Other tools, such as `pytest`, also accept plain functions.)

> **Unit testing** is a type of automated testing you can do to ensure that minor changes you make to your code don't fundamentally change what your code is doing.

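As a small illustration, here is a sketch of a test class written with the standard library's `unittest` module. The function under test, `miles_to_gallons`, is a made-up helper for this example.

```python
import unittest

def miles_to_gallons(miles, mpg):
    """Hypothetical helper: gallons of gas needed to drive `miles`."""
    return miles / mpg

class TestMilesToGallons(unittest.TestCase):
    # Each method whose name starts with "test" is run as one test.
    def test_basic_trip(self):
        self.assertEqual(miles_to_gallons(25, 25), 1.0)

    def test_zero_miles(self):
        self.assertEqual(miles_to_gallons(0, 30), 0.0)

# Run the tests programmatically (works in a notebook or a script).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMilesToGallons)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```
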
#### Sometimes you literally just _need_ to.
There are actually a few data science packages that force you to build a class in order to use them properly. Specifically these two:

![](imgs/scrapy.png)
![](imgs/pytorch.png)

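* **PyTorch** - A popular deep learning library. Second only in popularity to TensorFlow/Keras and gaining. You define a model by subclassing `torch.nn.Module`.
* **Scrapy** - A heavy-duty webscraping library, much more powerful than BeautifulSoup. You define a scraper by subclassing `scrapy.Spider`.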

## Conclusions and Takeaways
* OOP is a really cool coding paradigm that takes some getting used to.
* Classes are easy to use, but the code that defines them can be pretty long sometimes.
* OOP can serve to really clean your code up and make it easier to read.
* We won't _need_ to build classes very often, but we should definitely do it more!
* Let us all be more OO.

- Leo's advice: think of the architecture of your entire code base as a hierarchy. 
- If you are writing multiple functions that are related to each other and have to share values of variables between them, it might be a good idea to group these related functions into a class.
- Taking it one step further, if you have multiple classes that are related to each other and it makes sense to group them together, then create a separate module (a `.py` file) with a descriptive name and put all these classes inside that file. You can import the classes into your script as if they were a package!
- Taking the concept one step further, if you have a large code base of many modules that are related to each other, put them all into a folder and add some wrapper code to create your own package and share it with the community on GitHub! Who knows? You might create the next sklearn or pandas in your domain!
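
To illustrate the second piece of advice, here is a minimal sketch of related functions that pass shared values around, followed by the same logic grouped into a class. The names (`fit_scaler`, `apply_scaler`, `RangeScaler`) are made up; the `fit`/`transform` shape simply echoes the sklearn objects used earlier in the course.

```python
# Before: related functions that have to pass the same values around.
def fit_scaler(values):
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    return mean, spread

def apply_scaler(values, mean, spread):
    return [(v - mean) / spread for v in values]

# After: the shared values live on the instance as state.
class RangeScaler:
    """Hypothetical scaler: center on the mean, divide by the range."""

    def fit(self, values):
        self.mean = sum(values) / len(values)
        self.spread = max(values) - min(values)
        return self

    def transform(self, values):
        return [(v - self.mean) / self.spread for v in values]

data = [0, 5, 10]
scaler = RangeScaler().fit(data)
print(scaler.transform(data))  # [-0.5, 0.0, 0.5]
```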