### Python: An Object-Oriented Programming Language

If you're an IBM data scientist, you will be familiar with a suite of tools for data analysis, such as Excel, R, SAS, or Python. If you're coming straight from school, though, or have a background as a business analyst or data analyst, you may not be familiar with the concept of object-oriented programming. 

Python is an object-oriented programming (OOP) language, and the sooner you begin treating it as such, using its full capabilities in accordance with best practices, the sooner you'll become a flexible, powerful data scientist embedded within an IBM team. OOP is a large part of the reason that Python, unlike R, SAS, or Excel, is widely used for things other than data analysis, and integrates well within, say, a team devoted to maintaining a web-based application, or within a cloud-based infrastructure. 

OOP means that Pythonic code is best encapsulated within the following constructs: classes, objects (or "class instances"), methods, and attributes. Of these, the central concept is the class, and the sooner you become used to programming with classes, the better. 

### What is a class? 

A class is a blueprint. When you "instantiate" a class, you create a new object built from that blueprint, called an object or "class instance," which carries the methods (functions defined within the blueprint) and attributes that the class describes. 

The following code defines a very simple class called SimpleClass. The __init__ function, which is typically the first function defined within the class, tells Python how to create, or "instantiate", an object from that blueprint. In this case, we'll require a dataframe to instantiate the class, and then store that dataframe as an attribute of that class instance. 

*Note the documentation style, by the way. It may seem like overkill for code so simple, but clear documentation becomes critical when working within larger teams and on complex projects. Develop the habit of clearly documenting your code while you write it and both your teammates and future self will thank you.* 

In [1]:
import pandas as pd

class SimpleClass:
    """
    SimpleClass defines a blueprint for a basic class that holds a DataFrame. 
    """
    def __init__(self, df: pd.DataFrame):
        """
        This function initializes the class instance, taking a Pandas dataframe as its argument.

        Parameters
        -------
        df : pd.DataFrame
            A dataframe defined prior to instantiation

        Returns
        -------
            SimpleClass, a class instance which holds a dataframe. 
        """
        self.df = df
        
    def print_df(self, num_times: int) -> None: 
        """ 
        This function prints the df attribute to the console a number of times equal to 
        num_times
        
        Parameters
        -------
        num_times : int
            The number of times you want to print the dataframe df.

        Returns
        -------
        None - it prints the df num_times to the console 
        """
        for i in range(num_times): 
            print(self.df)

In [2]:
# Define a simple Pandas dataframe to store in the simple class

df = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5, 6, 7],
        "amount": [1.00, 1.31, 20.5, 0.5, 0.2, 0.2,1.2]
    }
)

# Print the dataframe

print(f"Printing the dataframe we constructed:\n{df}")

# Instantiate the SimpleClass, using df as the argument for the __init__ function

simpleclass = SimpleClass(df)

# Print the SimpleClass dataframe 

print("Printing the instance attribute of simpleclass called df:\n", simpleclass.df)

# Call the print_df method to print simpleclass.df three times

print("Calling the print_df method to print simpleclass.df three times: \n")
simpleclass.print_df(3)

Printing the dataframe we constructed:
   customer_id  amount
0            1    1.00
1            2    1.31
2            3   20.50
3            4    0.50
4            5    0.20
5            6    0.20
6            7    1.20
Printing the instance attribute of simpleclass called df:
    customer_id  amount
0            1    1.00
1            2    1.31
2            3   20.50
3            4    0.50
4            5    0.20
5            6    0.20
6            7    1.20
Calling the print_df method to print simpleclass.df three times: 

   customer_id  amount
0            1    1.00
1            2    1.31
2            3   20.50
3            4    0.50
4            5    0.20
5            6    0.20
6            7    1.20
   customer_id  amount
0            1    1.00
1            2    1.31
2            3   20.50
3            4    0.50
4            5    0.20
5            6    0.20
6            7    1.20
   customer_id  amount
0            1    1.00
1            2    1.31
2            3   20.50
3            4    0.50
4            5    0.20
5            6    0.20
6            7    1.20

Ensure that you understand every line of the code above. Try the following steps to adapt the class and make it your own. You will likely need to search online for additional Python packages to import for at least some of these: 

1. Add a new argument to the __init__ function that stores the current time at instantiation as an instance attribute, and pass a value for it when you instantiate the class. 
2. Create a new dataframe and use it to create a new class instance. 
3. Add a new method to the class, add_column, that takes a numpy array as an argument and adds it as a column to simpleclass.df. You can use this column, or create your own: *product_id = [34,23,54,76,43,23,45]*

### OOP Principles 

You've been introduced to the basic idea and syntax of a class and its methods. These concepts will become routine, even ingrained, as you progress as a data scientist. Just keep reminding yourself to stick to OOP, and avoid the lazy but ultimately much less flexible and scalable route of developing one-off scripts for analysis rather than packaging analyses into functions and classes. 

In the long term, these are some of the benefits of sticking to OOP: 

#### 1. Encapsulation 

This means that all important information is contained inside an object and only select information is exposed. The implementation and state of each object are held privately inside a defined class. Other objects are not meant to reach into that internal state or change it directly; they should only call a list of public functions or methods. Note that Python does not enforce this: a leading underscore on an attribute name marks it as internal by convention only. 

*Why does it matter?* 

This feature improves security, prevents objects from being changed without permission, and allows objects to be passed around an application more easily with their attributes intact. 
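As a minimal sketch of how encapsulation tends to look in Python, here is a small class that keeps its state internal and exposes it only through a read-only property and one public method. The Account class and all of its names are hypothetical, made up purely for illustration.

```python
class Account:
    """Hypothetical example: a balance that should only change through methods."""

    def __init__(self, opening_balance: float = 0.0):
        # Leading underscore: "internal, please don't touch directly".
        # This is a convention; Python does not enforce it.
        self._balance = opening_balance

    @property
    def balance(self) -> float:
        """Read-only public view of the internal state."""
        return self._balance

    def deposit(self, amount: float) -> None:
        """The public way to change the balance, with validation."""
        if amount <= 0:
            raise ValueError("deposit amount must be positive")
        self._balance += amount


account = Account(10.0)
account.deposit(5.0)
print(account.balance)

try:
    account.balance = 1_000_000.0  # no setter is defined, so this fails
except AttributeError as err:
    print("Cannot assign to balance directly:", err)
```

The point of the design is that every change to the balance goes through deposit, so the validation cannot be skipped by accident.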

#### 2. Abstraction 

Objects only reveal internal mechanisms that are relevant for the use of other objects, hiding any unnecessary implementation code. The derived class can have its functionality extended. 

*Why does it matter?* 

This concept helps developers make changes or additions more easily over time. If you need to add a simple method or attribute to a class or class instance, you don't have to rewrite everything from scratch. 
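One way to sketch abstraction in Python is with the standard library's abc module: callers rely on a small public interface and never see how it is implemented. The Model and LinearModel classes below are hypothetical names chosen for illustration, not part of any library.

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Hypothetical interface: callers only need to know about predict()."""

    @abstractmethod
    def predict(self, x: float) -> float:
        """Return a prediction for a single input value."""


class LinearModel(Model):
    """One concrete implementation hidden behind the Model interface."""

    def __init__(self, slope: float, intercept: float):
        self._slope = slope
        self._intercept = intercept

    def predict(self, x: float) -> float:
        # The caller never needs to see how the prediction is computed
        return self._slope * x + self._intercept


model = LinearModel(slope=2.0, intercept=1.0)
print(model.predict(3.0))
```

Because the rest of your code only calls predict, you could later swap in a different implementation without touching the callers.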

#### 3. Inheritance 

This is a more advanced topic, but suffice it to say that classes can reuse code from other classes, with some classes ("subclasses") defined as more specialized blueprints based on more general classes ("superclasses"). You might envision this as one class giving a blueprint for creating a car (engine, four wheels, windshield, etc.), with multiple subclasses giving blueprints for cars of different styles, engine types, etc. Relationships between classes can be defined in this way, enabling developers to reuse common logic while still maintaining a clear hierarchy. 

*Why does it matter?* 

As with abstraction, this concept saves development time. It also encourages a more thorough analysis of what your classes have in common and, because shared logic lives in one place, helps ensure a higher level of accuracy.
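The car analogy can be sketched in a few lines. Everything here (Car, ElectricCar, SportsCar and their attributes) is a hypothetical illustration of the idea rather than a recommended design.

```python
class Car:
    """Hypothetical superclass: what every car has in common."""

    wheels = 4

    def __init__(self, engine: str):
        self.engine = engine

    def describe(self) -> str:
        return f"{type(self).__name__}: {self.wheels} wheels, {self.engine} engine"


class ElectricCar(Car):
    """Subclass: reuses Car's code and adds a battery."""

    def __init__(self, battery_kwh: float):
        super().__init__(engine="electric")  # reuse the parent's setup
        self.battery_kwh = battery_kwh


class SportsCar(Car):
    """Subclass: inherits everything from Car and adds one method."""

    def rev(self) -> str:
        return "Vroom!"


electric = ElectricCar(battery_kwh=60.0)
sports = SportsCar(engine="V8")

# describe() is written once in Car and reused by both subclasses
print(electric.describe())
print(sports.describe())
print(sports.rev())
```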

#### 4. Polymorphism 

Objects are designed to share behaviors, and they can take on more than one form. A child class extends or overrides the functionality of its parent class, and at run time the program determines which version of a method to use for each object, reducing the need to duplicate code. Polymorphism allows different types of objects to pass through the same interface.

*Why does it matter?* 

Polymorphism is a fancy word for "flexibility." You'll find, as a data scientist, that once you've created the analysis part of your code, increasing its flexibility is the key to scaling it, and polymorphism helps. 
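Here is a small sketch of what "the same interface, different forms" can look like. The Shape, Rectangle and Circle classes are hypothetical; the thing to notice is that the loop at the bottom never checks which kind of object it is handling.

```python
import math


class Shape:
    """Hypothetical parent class defining the shared interface."""

    def area(self) -> float:
        raise NotImplementedError("subclasses must implement area()")

    def describe(self) -> str:
        # Calls whichever area() belongs to the actual object
        return f"{type(self).__name__} with area {self.area():.2f}"


class Rectangle(Shape):
    def __init__(self, width: float, height: float):
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


class Circle(Shape):
    def __init__(self, radius: float):
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


shapes = [Rectangle(2.0, 3.0), Circle(1.0)]
for shape in shapes:
    # Same call, different behavior depending on the object's class
    print(shape.describe())
```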

### Advanced Topics

There are any number of advanced topics I could cover for OOP, but here I'll just address a few that often trip new developers up. I'll keep adding to these as I continue to teach the class. 

#### Decorators

The first is **decorators**. Seeing an @ in the codebase seems to freak out a lot of new data scientists, but there's a simple explanation for it. A decorator is a function that adds functionality to a pre-defined function. Usually, decorators are relatively simple and are used for logging or aesthetic purposes, hence the name.

As above with the concept of a class, I'll show the syntax for a very simple example, and you can take it from there. 

In [3]:
#Importing datetime to tell current time 
from datetime import datetime

#Defining the decorator function
def display_info(func):
    def inner(): 
        # Adding the functionality 
        print("You executed the function", func.__name__, "at", datetime.now().strftime("%H:%M:%S"))
        func()
        print("Finished execution")
    return inner

# Apply the decorator with @, directly above the definition of the
# function to be decorated
@display_info
def printer() -> None: 
    """
    A simple printing function

    Parameters
    -------
    None

    Returns
    -------
    None - prints a message to the console
    """
    print("I'm running the printer function now.")
    
printer()

You executed the function printer at 12:50:35
I'm running the printer function now.
Finished execution
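The display_info example above only works for functions that take no arguments and return nothing. As a hedged sketch of a more general version, the variant below forwards any arguments, passes the return value back, and uses functools.wraps so the decorated function keeps its own name and docstring. The names display_info_general and add are hypothetical.

```python
import functools
from datetime import datetime


def display_info_general(func):
    """Variant of display_info that works for functions with arguments."""

    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def inner(*args, **kwargs):
        print("You executed the function", func.__name__, "at", datetime.now().strftime("%H:%M:%S"))
        result = func(*args, **kwargs)
        print("Finished execution")
        return result  # pass the return value through to the caller

    return inner


@display_info_general
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


total = add(2, 3)
print(total)
print(add.__name__)
```

It also helps to know that the @ line is just shorthand: writing @display_info_general above add has the same effect as defining add normally and then running add = display_info_general(add).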


#### Class attributes versus instance attributes 

Class attributes are part of the blueprint. They are defined in the body of the class, outside any method, and *not* with the "self." prefix. If you change a class attribute on the class itself, the change is visible through every instance, including ones you create afterwards. 

Instance attributes are part of the instance, i.e., an object created from the blueprint. They *are* defined with the "self." prefix, usually in the __init__ function. If you change an instance attribute, the change will only be reflected in that instance, and not in any other or new instance of the class. 

For example, let's alter the SimpleClass slightly to add an instance attribute and a class attribute: 

In [12]:
import pandas as pd

class SimpleClass:
    """
    SimpleClass defines a blueprint for a basic class that holds a DataFrame. 
    """
    
    class_attribute = "This is the class secret."
    
    def __init__(self, df: pd.DataFrame):
        """
        This function initializes the class instance, taking a Pandas dataframe as its argument.

        Parameters
        -------
        df : pd.DataFrame
            A dataframe defined prior to instantiation

        Returns
        -------
            SimpleClass, a class instance which holds a dataframe. 
        """
        self.df = df
        
        self.instance_attribute = "This is specific to the instance."
        
        
        
    def print_df(self, num_times: int) -> None: 
        """ 
        This function prints the df attribute to the console a number of times equal to 
        num_times
        
        Parameters
        -------
        num_times : int
            The number of times you want to print the dataframe df.

        Returns
        -------
        None - it prints the df num_times to the console 
        """
        for i in range(num_times): 
            print(self.df)

In [13]:
# Instantiating the class: 
simple_class1 = SimpleClass(df)
print(f"This is the instance attribute for simple_class1:\n {simple_class1.df}")
print(f"This is the class attribute for simple_class1:\n {simple_class1.class_attribute}")

This is the instance attribute for simple_class1:
    customer_id  amount
0            1    1.00
1            2    1.31
2            3   20.50
3            4    0.50
4            5    0.20
5            6    0.20
6            7    1.20
This is the class attribute for simple_class1:
 This is the class secret.


To show how class and instance attributes change, let's assign a new value to the instance attribute df on simple_class1, and a new value to the class attribute class_attribute on SimpleClass itself. Then let's create a new instance, simple_class2, and see which change persisted through a new instantiation: 

In [21]:
# Changing df
simple_class1.df = pd.DataFrame(
    {
        "customer_id": [8, 9, 10, 11, 12, 13, 14],
        "amount": [1.00, 1.31, 20.5, 0.5, 0.2, 0.2,1.2]
    }
)

# Changing class attribute using SimpleClass
SimpleClass.class_attribute = "The class secret has been changed!"

print(f"This is the instance attribute for simple_class1:\n {simple_class1.df}")
print(f"This is the class attribute for simple_class1:\n {simple_class1.class_attribute}")

This is the instance attribute for simple_class1:
    customer_id  amount
0            8    1.00
1            9    1.31
2           10   20.50
3           11    0.50
4           12    0.20
5           13    0.20
6           14    1.20
This is the class attribute for simple_class1:
 The class secret has been changed!


In [22]:
# Instantiating a new class instance: simple_class2
simple_class2 = SimpleClass(df)

print(f"This is the instance attribute for simple_class2:\n {simple_class2.df}")
print(f"This is the class attribute for simple_class2:\n {simple_class2.class_attribute}")

This is the instance attribute for simple_class2:
    customer_id  amount
0            1    1.00
1            2    1.31
2            3   20.50
3            4    0.50
4            5    0.20
5            6    0.20
6            7    1.20
This is the class attribute for simple_class2:
 The class secret has been changed!


The most likely place you'll run into this is when you're trying to change an instance attribute or class attribute and you're using the opposite syntax. Just be mindful of where you're defining attributes and whether you're using the __self.__ prefix or not, and you'll be fine. 
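To make the "opposite syntax" pitfall concrete, here is a minimal sketch using a hypothetical Settings class (not part of the SimpleClass example). Assigning to a class attribute through an instance does not change the class; it quietly creates a new instance attribute that shadows the shared one.

```python
class Settings:
    """Hypothetical class with one class attribute."""

    currency = "USD"  # class attribute: shared through the class

    def __init__(self, name: str):
        self.name = name  # instance attribute


first = Settings("first")
second = Settings("second")

# Assigning through an instance creates a NEW instance attribute on `first`
# that shadows the class attribute; the class itself is untouched.
first.currency = "EUR"

# Assigning through the class changes the shared value.
Settings.currency = "GBP"

print(first.currency)              # the instance attribute wins on `first`
print(second.currency)             # falls back to the class attribute
print(Settings("third").currency)  # new instances see the changed class value
print(vars(first))                 # shows which attributes live on the instance
```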

#### Inheritance 

Finally, I want to briefly touch on the syntax for inheritance. As I noted above, inheritance is incredibly useful for classes. 

Here is the basic syntactical pattern for a superclass and subclass: 

__class BaseClass:__
      
      Code defining base class
      
__class DerivedClass(BaseClass):__
      
      Body of derived class

For example, let's use the class SimpleClass we've been working with, and create a derived class called ComplexClass. It will contain a new method that prints the dataframe df with the column order and row order reversed: 

In [41]:
class ComplexClass(SimpleClass): 
    def __init__(self,df): 
        SimpleClass.__init__(self,df)
        
    def reverse_df(self, num_times: int) -> None: 
        """ 
        This function reverses the df attribute of the instance
        
        Parameters
        -------
        num_times : int
            The number of times you want to print the reversed dataframe.

        Returns
        -------
        None - it reverses the df and prints the reversed df num_times to the console
        """
        # Defining self.reversed_df as the reverse of self.df
        self.reversed_df = self.df.loc[::-1,::-1]
        # Printing it num_times to the console
        for i in range(num_times): 
            print(self.reversed_df)

In [42]:
# Instantiating the new complex class
complexclass = ComplexClass(df)

# Executing the reverse_df method
complexclass.reverse_df(2)

   amount  customer_id
6    1.20            7
5    0.20            6
4    0.20            5
3    0.50            4
2   20.50            3
1    1.31            2
0    1.00            1
   amount  customer_id
6    1.20            7
5    0.20            6
4    0.20            5
3    0.50            4
2   20.50            3
1    1.31            2
0    1.00            1
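ComplexClass calls SimpleClass.__init__ by name, which works fine. A more common spelling in modern Python is `super().__init__(...)`, which avoids repeating the parent's name. The sketch below shows that pattern, along with overriding a parent method, using hypothetical Base and Derived classes that hold a plain list instead of a dataframe.

```python
class Base:
    """Hypothetical parent class holding some data."""

    def __init__(self, data: list):
        self.data = data

    def show(self) -> str:
        return f"data: {self.data}"


class Derived(Base):
    """Hypothetical child class that extends Base."""

    def __init__(self, data: list, label: str):
        super().__init__(data)  # run Base.__init__ without naming Base
        self.label = label

    def show(self) -> str:
        # Override: extend the parent's behavior rather than replace it
        return f"[{self.label}] {super().show()}"

    def reversed_data(self) -> list:
        # New behavior that only the child class has
        return self.data[::-1]


derived = Derived([1, 2, 3], label="demo")
print(derived.show())
print(derived.reversed_data())
print(isinstance(derived, Base))
```

If a subclass's `__init__` does nothing except forward its arguments to the parent, as in ComplexClass, you can leave it out entirely and the parent's version is used automatically.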


And that's enough of OOP for now! Developing OOP instincts is an important part of becoming a data scientist in any complex enterprise. Keep developing those skills and pushing yourself to package Python code within classes and functions, and you'll find the transition from a junior to a senior developer much easier. 