# 04: Introduction to objects in Python

**Author:** Benedikt Goodman \
**Assistants:** Mistral Large, ChatGPT-4

As we discussed in lesson 1, everything in Python is an object. Now we will learn how to create objects and when doing so might be a good idea, and we will also touch on a number of other key concepts in object-oriented programming. As usual, I have written this in English with help from both Mistral Large and ChatGPT-4.

## Outline
1. A short note on object oriented programming (OOP)
2. The TLDR on how OOP works
3. Defining a class
4. Class attributes and methods
    1. Making an object from a class blueprint
    2. What's the difference between a class variable and an instance variable?
    3. The different methods available to a class
    4. Classmethods
    5. Staticmethods
5. The `self` parameter
6. Inheritance and Polymorphism
7. A word of caution
8. Why use composition?
9. A real-world example of OOP
10. Exercises


## A short note on object oriented programming

Object-Oriented Programming (OOP) is a programming paradigm that uses `objects` and their interactions to design applications and computer programs. The prime benefits of OOP are code reusability, modularity and overall tidiness (if done correctly). The primary drawback is that it makes it easy for us programmers to overengineer programs and make them tightly coupled, so that they become very hard to change.

## The TLDR about how OOP works

In Python, a `class` is a *blueprint* for creating `objects`. An `object` is an `instance` of a `class`. `attributes` are the `properties` of the object; these store the different types of `data` attached to the object. `methods` (functions) are the actions that the object can perform.

Here is a list of definitions you can return to:
- `class`: Blueprint for creating an object
- `object`: A created instance of a class
- `attribute/property`: Data related to the class stored inside it
- `method`: A function tied to a class
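
To connect these terms to actual code, here is a minimal sketch. The `Dog` class and everything in it are made up for illustration only.

```python
class Dog:                      # class: the blueprint
    def __init__(self, name):
        self.name = name        # attribute: data stored on the object

    def bark(self):             # method: a function tied to the class
        return f"{self.name} says woof"


rex = Dog("Rex")                # object: an instance created from the class
print(rex.name)
print(rex.bark())
```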

## Defining a Class

A class is defined using the `class` keyword. Here's an example of a simple class:

In [None]:
class SomeClass:
    pass

## Class Attributes and Methods

Attributes are variables that belong to a class or an instance of a class. Methods are functions that belong to a class. As with any other variable in Python, an attribute can hold a value of any type.

In [None]:
# This is our blueprint
class MyClass:
    # Class attribute, handy when you want to share a given variable across all instances of a class
    class_attribute = "I'm a class attribute"

    def __init__(self, instance_var_1, instance_var_2):
        # Instance attribute
        self.instance_var_1 = instance_var_1
        self.instance_var_2 = instance_var_2

    # Class method: receives the class itself (cls) instead of an instance, so it can read class attributes
    @classmethod
    def class_method(cls):
        return f"I'm a class method. {cls.class_attribute}"

    # Instance method
    def instance_method(self):
        return f"I'm an instance method. {self.instance_var_1} rules"

    # Instance method with external arguments
    def can_take_external_arguments(self, arg1, kwarg1=None):
        output = [
            f"I'm also an instance method. I have access to {self.instance_var_1}",
            f"and {self.instance_var_2}. External vars arg1: {arg1}, kwarg1: {kwarg1}",
        ]
        return " ".join(output)

    def can_trigger_other_instance_methods(self, arg, kwarg=None):
        print(self.instance_method())
        print(self.can_take_external_arguments(arg, kwarg))
    
    # staticmethods do not require the object to be initialised to be called
    @staticmethod
    def static_method(arg, kwarg):
        print(
            "I don't need an instance of the class to be created.",
            f"{arg} is overrated, I always preferred {kwarg}",
        )
        


### Making an object from a class blueprint

In [None]:
import pandas as pd

# Making an object out of the blueprint
obj = MyClass('Snoop Dogg', 'Dr. Dre')

# Doesn't this look eerily similar to something you should be familiar with by this point?
df = pd.DataFrame({'a': [1,2,3],'b': [4,5,6]})

In [None]:
# Calls on variable (property) stored inside the created object, these are what we call instance attributes/properties ...or just an instance variable
obj.instance_var_1

In [None]:
# Fetches the class attribute
obj.class_attribute

### What's the difference between a class variable and an instance variable?

In Python, both class variables and instance variables are used to store data within a class, but they behave differently.

A `class variable` is a variable that is shared by all instances of a class. It is defined directly within the class, outside of any methods. Changes made through the class itself (for example `MyClass.class_attribute = ...`) are visible from every instance. Assigning through an instance (`obj.class_attribute = ...`) instead creates an instance variable with the same name, which shadows the class variable for that one object.

An `instance variable`, on the other hand, is a variable that is unique to each instance of a class. It is set on `self` inside a method, usually `__init__`. Changes made to an instance variable only affect that specific instance and not other instances.

In [None]:
# Let's make a new object
other_obj = MyClass("Cookie Monster", "Hermit the Frog")

# It's got the same class attribute as the first object
print(other_obj.class_attribute)

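# Assigning through an instance only affects that instance: it creates an instance attribute that shadows the class attribute.
# To change the value for all instances, assign on the blueprint instead (MyClass.class_attribute = ...)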
other_obj.class_attribute = ["New", "class", "attribute"]

print(
    "obj class attribute:",
    obj.class_attribute,
    "\nother_obj class attribute",
    other_obj.class_attribute,
)
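
The shadowing behaviour is easier to see in isolation. Below is a small sketch with a made-up `Counter` class (not part of the lesson code) that contrasts assigning through an instance with assigning through the class.

```python
class Counter:
    step = 1  # class attribute, looked up on the class when an instance has no own value


a = Counter()
b = Counter()

# Assigning through an instance creates an instance attribute on `a` only
a.step = 10
print(a.step, b.step, Counter.step)  # 10 1 1

# Assigning through the class changes what every non-shadowing instance sees
Counter.step = 5
print(a.step, b.step, Counter.step)  # 10 5 5

# Removing the instance attribute makes `a` fall back to the class attribute
del a.step
print(a.step)  # 5
```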


In [None]:
print(
    "obj instance_var1:",
    obj.instance_var_1,
    "\nother_obj instance_var1:",
    other_obj.instance_var_1,
)

### The different methods available to a class

In [None]:
# This is the instance method. It is bound to an object and has access to an object's instance variables (aka the self. variables)
obj.instance_method()

In [None]:
other_obj.instance_method()

### Classmethods
A classmethod receives the class itself as its first argument (`cls`), so it can access class variables. These are shared across all instances of a class.

In [None]:
obj.class_method()

In [None]:
other_obj.class_method()
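
A common use of classmethods, beyond reading class variables, is to offer an alternative way of constructing an object. The sketch below assumes a hypothetical `Temperature` class; none of these names come from the lesson code.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # cls is the class itself, so cls(...) builds a new instance
        return cls((fahrenheit - 32) * 5 / 9)


# Called on the class, no existing instance needed
boiling = Temperature.from_fahrenheit(212)
print(boiling.celsius)  # 100.0
```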

### Staticmethods

Staticmethods are somewhat special in that they don't need an object to be initialised in order to work.
I like to use them when I create classes whose methods are used by another class. This is known as the "utility class" design pattern; more on that later.

In [None]:
# Note the lack of () after MyClass: the method is called on the class itself, not on an instance
MyClass.static_method('Britney Spears', kwarg=['Jennifer Lopez'])

In Python, a `staticmethod` is a method that belongs to a class rather than an instance of a class. It can be called without creating an instance of the class, and it doesn't have access to the class's instance variables or methods.

There are several reasons why you might want to use a `staticmethod`:

1. **Utility Functions**: If you have a function that's related to the class but doesn't need access to the class's instance variables or methods, you can make it a static method. This is often used for utility functions that perform a specific task but don't need to interact with the class's data. I normally group related utility functions together. That way others can more easily tell which functions belong together. For example, let's say you have 5 functions that all do operations on dictionaries. Why not group these together and call the class `DictUtilities`?

2. **Code Organization**: Static methods can help organize your code by grouping related functions within a class, even if those functions don't need access to the class's data. This can make your code easier to read and understand.

3. **Improved Performance**: Because a static method is not bound to an instance, Python doesn't need to pass the instance (`self`) when calling it, so the call can be marginally cheaper than a regular instance method call. The difference is tiny, so treat it as a side effect rather than a reason to choose a static method.
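
As a rough sketch of the dictionary utility class idea from the list above: the class name and both helper methods are hypothetical, chosen only to show the grouping.

```python
class DictUtilities:
    """Hypothetical utility class that groups dictionary helpers."""

    @staticmethod
    def invert(mapping):
        # Swap keys and values; assumes the values are unique and hashable
        return {value: key for key, value in mapping.items()}

    @staticmethod
    def without_none(mapping):
        # Drop entries whose value is None
        return {key: value for key, value in mapping.items() if value is not None}


codes = {"NO": "Norway", "SE": "Sweden"}
print(DictUtilities.invert(codes))
print(DictUtilities.without_none({"a": 1, "b": None}))
```

Nothing here needs `self` or `cls`, which is the signal that a staticmethod (or a plain module-level function) is enough.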


## The self Parameter

The `self` parameter is a reference to the current instance of the class and is used to access that instance's attributes and methods.

In [None]:
class Person:
    def __init__(self, name):
        self.name = name

    def greet(self):
        print(f"Hello, I'm {self.name}!")
    
    def handshake(self):
        print(f'{self.name} shakes your hand.')
    
    # What do you think will happen here?
    def greet_and_shake_hand_error():
        self.greet()
        self.handshake()
        
    def greet_and_shake_hand(self):
        self.greet()
        self.handshake()    

In [None]:
person = Person('Vincent Adultman')

person.greet()
person.handshake()

In [None]:
person.greet_and_shake_hand_error()

In [None]:
person.greet_and_shake_hand()
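
To make it explicit what `self` is, the sketch below uses a trimmed-down copy of `Person` (the `broken` method is a hypothetical stand-in for a method defined without `self`). Calling a method on an instance is shorthand for calling it on the class and passing the instance as the first argument.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello, I'm {self.name}!"

    def broken():  # no self parameter
        return "never reached"


person = Person("Vincent Adultman")

# These two calls are equivalent: Python passes `person` in as `self`
print(person.greet())
print(Person.greet(person))

# Python still passes the instance, but broken() accepts no arguments
try:
    person.broken()
except TypeError as error:
    print(f"TypeError: {error}")
```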

## Inheritance and Polymorphism

Inheritance is a way to form new classes using classes that have already been defined. The new classes, known as derived classes, inherit attributes and behavior from the pre-existing classes, which are referred to as base classes. This is both a very powerful feature, and the source of why objects can sometimes be a pain in the butt to debug.

The fact that objects can inherit from each other means that they can take on many forms and variations if you want them to. This concept is known as `polymorphism`. The most common use of polymorphism in OOP occurs when a parent class reference is used to refer to a child class object.

Here is what it looks like in practice:

In [None]:
class SsbEmployee(Person):
    def __init__(self, name, division):
        super().__init__(name)
        self.division = division
        self.coffee_consumption = 0  # tracks cups of coffee consumed

    def report_stats(self):
        print(f"{self.name} reports: 'Did you know 87.3% of statistics are made up on the spot?'")

    def drink_coffee(self):
        # Increase the coffee consumption count
        self.coffee_consumption += 1
        print(f"{self.name} drinks cup of coffee number {self.coffee_consumption}. Ah, the fuel of statistics!")

    def attend_meeting(self, topic):
        # Simulates attending a meeting
        print(f"{self.name} is attending a meeting about {topic}. They are definitely not sleeping with their eyes open.")

    def refactor_code(self):
        # Simulates a Python code refactoring session
        print(f"{self.name} is refactoring code... Found a way to replace a perfectly working 50-line function with a 120-line hardcoded mess. Satisfying :)")

In [None]:
# Let's make an object
benny = SsbEmployee('Benedikt', 'National Accounts')

# Benny has access to the methods of class Person
benny.greet_and_shake_hand()


In [None]:
# But he also has access to the new methods and attributes of the SsbEmployee class
benny.attend_meeting('GNI-inventories')

In [None]:
benny.refactor_code()
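
The cells above show inheritance. Polymorphism shows up when a child class overrides a parent method and code written against the parent still works. The sketch below uses a trimmed-down `Person` and a hypothetical `Statistician` subclass to illustrate this.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello, I'm {self.name}!"


class Statistician(Person):  # hypothetical subclass for illustration
    def greet(self):
        # Override the parent's method, reusing it via super()
        return super().greet() + " Ask me about confidence intervals."


# The loop only knows it is dealing with Person-like objects;
# each object decides which greet() implementation runs
people = [Person("Ada"), Statistician("Benedikt")]
greetings = [p.greet() for p in people]
for greeting in greetings:
    print(greeting)

print(isinstance(people[1], Person))  # True
```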

## A word of caution about using inheritance too much

Inheritance is a fundamental concept in object-oriented programming where a class can inherit properties and behaviors (methods) from another class. This helps in creating a new class with modified or additional features without rewriting the code from the first class. Though useful, over-using inheritance can lead to several issues:

1. **Tight Coupling**: When classes are tightly coupled, they depend heavily on each other's functionality. Changes in the parent class can inadvertently affect all child classes, which might lead to bugs that are hard to trace.

2. **Fragile Base Class Problem**: If a base (parent) class is changed, all derived (child) classes may need to be modified. This can make the code fragile and unstable, as changes in the base class can propagate bugs to various parts of an application that seemed unrelated.

3. **Complexity and Readability**: Excessive use of inheritance can make the code more complex and harder to read. New developers, or even the original developers coming back to the code after some time, may find it difficult to follow the hierarchy and relationships between classes.

4. **Inappropriate Abstractions**: Sometimes, inheritance is used to share code between classes, not because they conceptually share a parent-child relationship, but just because they share some functionalities. This misuse can lead to inappropriate designs and confusing abstractions.
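
A small sketch of how the fragile base class problem can look in practice. The `Basket` classes are hypothetical. The child class assumed that `Basket.add_many` did not call `add` internally, so once the base class reuses `add`, the child silently double counts.

```python
class Basket:
    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def add_many(self, items):
        # Implementation detail of the base class: reuses add() for every item
        for item in items:
            self.add(item)


class CountingBasket(Basket):
    def __init__(self):
        super().__init__()
        self.added = 0

    def add(self, item):
        self.added += 1
        super().add(item)

    def add_many(self, items):
        # Counts the items here, unaware that the base class calls add() too
        self.added += len(items)
        super().add_many(items)


basket = CountingBasket()
basket.add_many(["apple", "pear"])
print(len(basket.items), basket.added)  # 2 4
```

Two items were added, but the counter says four. Neither class looks wrong on its own, which is what makes this kind of bug hard to trace.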

## Why using composition is much better

Composition is an alternative to inheritance. It means building classes that contain objects of other classes, representing a "has-a" relationship rather than the "is-a" relationship of inheritance. This is often more flexible, because the composed objects can be swapped out where the class is used instead of being fixed in a class hierarchy, and it reduces the links between components. That in turn lowers the chance of bugs spreading, since methods and attributes aren't shared as directly as they are with inheritance.

In [None]:
class Weapon:
    def use(self):
        print("Using weapon")


class Sword(Weapon):
    def use(self):
        print("Swinging a sword")


class Bow(Weapon):
    def use(self):
        print("Shooting an arrow")


class Character:
    def __init__(self, weapon):
        self.weapon = weapon

    def attack(self):
        self.weapon.use()


# Usage: note how it's functionally almost the same as using inheritance, but our code is now a lot less coupled
archer = Character(Bow())  # Archer with a bow
knight = Character(Sword())  # Knight with a sword

archer.attack()  # Outputs: Shooting an arrow
knight.attack()  # Outputs: Swinging a sword
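
One practical consequence of the "has-a" relationship is that a component can be replaced while the program runs. The sketch below repeats simplified versions of the classes (returning strings instead of printing, which is a change made here for illustration) to show a character switching weapons without any new subclass.

```python
class Sword:
    def use(self):
        return "Swinging a sword"


class Bow:
    def use(self):
        return "Shooting an arrow"


class Character:
    def __init__(self, weapon):
        self.weapon = weapon

    def attack(self):
        # Delegates to whatever weapon the character currently holds
        return self.weapon.use()


hero = Character(Sword())
first_attack = hero.attack()

hero.weapon = Bow()  # swap the component on the existing object
second_attack = hero.attack()

print(first_attack)
print(second_attack)
```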


## A real-world example

I've previously said that making a utility class and then having another class use it can be a great way of grouping similar functions together. Here is an example which resembles some of the code written for the common functions and objects in our division so far. The pattern is that we first define a so-called "utility class" with a set of staticmethods, which are then used by another object that needs to apply them in a specific sequence.

In [None]:
import pandas as pd


class DataFrameManager:
    @staticmethod
    def top_items_by_column(
        dataframe: pd.DataFrame, column_to_sum: str, top_n: int = 5
    ) -> pd.DataFrame:
        """
        Return the top items based on a sum of a specified column.

        Parameters
        ----------
        dataframe : pd.DataFrame
            The input DataFrame.
        column_to_sum : str
            The column to sum.
        top_n : int, optional
            The number of top items to return, by default 5.

        Returns
        -------
        pd.DataFrame
            The DataFrame with the top items.

        Raises
        ------
        ValueError
            If the specified column does not exist in the DataFrame.
        """
        if column_to_sum in dataframe.columns:
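            # as_index=False keeps column_to_sum as a regular column,
            # so nlargest can still sort on it after the groupby
            result = (
                dataframe.groupby(column_to_sum, as_index=False)
                .sum()
                .nlargest(top_n, column_to_sum)
            )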
            return result
        else:
            raise ValueError(f"Column {column_to_sum} does not exist in the DataFrame")

    @staticmethod
    def aggregate_by_column(
        dataframe: pd.DataFrame, group_by_column: str, agg_column: str, agg_func: str
    ) -> pd.DataFrame:
        """
        Aggregate data by a specified column using a specified aggregation function.

        Parameters
        ----------
        dataframe : pd.DataFrame
            The input DataFrame.
        group_by_column : str
            The column to group by.
        agg_column : str
            The column to aggregate.
        agg_func : str
            The aggregation function to use.

        Returns
        -------
        pd.DataFrame
            The DataFrame with the aggregated data.

        Raises
        ------
        ValueError
            If one or more columns specified do not exist in the DataFrame.
        """
        if group_by_column in dataframe.columns and agg_column in dataframe.columns:
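            # reset_index turns the grouped Series into a DataFrame, matching the return annotation
            return (
                dataframe.groupby(group_by_column)[agg_column].agg(agg_func).reset_index()
            )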
        else:
            raise ValueError(
                "One or more columns specified do not exist in the DataFrame"
            )

    @staticmethod
    def resample_data(
        dataframe: pd.DataFrame, date_column: str, freq: str, agg_dict: dict
    ) -> pd.DataFrame:
        """
        Resample data based on a date column and a frequency, applying specified aggregation functions.

        Parameters
        ----------
        dataframe : pd.DataFrame
            The input DataFrame.
        date_column : str
            The date column.
        freq : str
            The resampling frequency.
        agg_dict : dict
            The dictionary of aggregation functions to apply.

        Returns
        -------
        pd.DataFrame
            The resampled DataFrame.

        Raises
        ------
        ValueError
            If the date column does not exist in the DataFrame.
        """
        if date_column in dataframe.columns:
            # Build the datetime index on a copy so the caller's DataFrame is left untouched
            resampled_data = (
                dataframe.assign(**{date_column: pd.to_datetime(dataframe[date_column])})
                .set_index(date_column)
                .resample(freq)
                .agg(agg_dict)
            )
            return resampled_data
        else:
            raise ValueError(
                f"Date column {date_column} does not exist in the DataFrame"
            )

    @staticmethod
    def add_data(dataframe: pd.DataFrame, new_data: pd.DataFrame) -> pd.DataFrame:
        """
        Append new data to a DataFrame and reset index.

        Parameters
        ----------
        dataframe : pd.DataFrame
            The input DataFrame.
        new_data : pd.DataFrame
            The new data to add.

        Returns
        -------
        pd.DataFrame
            The updated DataFrame.
        """
        new_df = pd.DataFrame(new_data)
        return pd.concat([dataframe, new_df], ignore_index=True)


Let's use the staticmethods from the class above in another class, where they are chained together in specific ways.

In [None]:



class DataManager:
    def __init__(self, df: pd.DataFrame, df_manager: DataFrameManager):
        """
        Initialize the DataManager.

        Parameters
        ----------
        df : pd.DataFrame
            The input DataFrame.
        df_manager : DataFrameManager
            The DataFrameManager instance.
        """
        self.data = df
        self.df_manager = df_manager

    def get_top_items_and_aggregate(
        self,
        column_to_sum: str,
        group_by_column: str,
        agg_column: str,
        agg_func: str,
        top_n: int = 5,
    ) -> pd.DataFrame:
        """
        Combine top_items_by_column and aggregate_by_column methods.

        Parameters
        ----------
        column_to_sum : str
            The column to sum.
        group_by_column : str
            The column to group by.
        agg_column : str
            The column to aggregate.
        agg_func : str
            The aggregation function to use.
        top_n : int, optional
            The number of top items to return, by default 5.

        Returns
        -------
        pd.DataFrame
            The DataFrame with the aggregated top items.
        """
        top_items = self.df_manager.top_items_by_column(self.data, column_to_sum, top_n)
        aggregated_result = self.df_manager.aggregate_by_column(
            top_items, group_by_column, agg_column, agg_func
        )
        return aggregated_result

    def resample_and_update_data(
        self, date_column: str, freq: str, agg_dict: dict, new_data: pd.DataFrame
    ) -> pd.DataFrame:
        """
        Combine resample_data and add_data methods.

        Parameters
        ----------
        date_column : str
            The date column.
        freq : str
            The resampling frequency.
        agg_dict : dict
            The dictionary of aggregation functions to apply.
        new_data : pd.DataFrame
            The new data to add.

        Returns
        -------
        pd.DataFrame
            The updated DataFrame.
        """
        resampled_data = self.df_manager.resample_data(
            self.data, date_column, freq, agg_dict
        )
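        # reset_index turns the date index back into a column; otherwise
        # add_data's ignore_index=True would silently discard the dates
        updated_data = self.df_manager.add_data(
            resampled_data.reset_index(), new_data
        )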
        self.data = updated_data  # Update the internal DataFrame
        return updated_data


### An explanation of what the code above does

**DataFrameManager Class**

This class contains several static methods that perform various operations on pandas DataFrames. These methods are utility functions that do not depend on the state of any instance of the class (self is not used). Each method takes a DataFrame as an input, performs operations on it, and returns a result:

- **top_items_by_column**:
    - **Purpose**: Finds the top n items from a DataFrame based on the sum of a specified column.
    - **How it works**: It groups the DataFrame by a specified column, sums the values, and then selects the top n items based on these sums. If the specified column doesn't exist, it raises a ValueError.

- **aggregate_by_column**:
    - **Purpose**: Aggregates data in the DataFrame based on a specified column using a given aggregation function (e.g., sum, mean).
    - **How it works**: It groups the DataFrame by one column and then applies the aggregation function to another column. If any of the specified columns don't exist, it raises a ValueError.

- **resample_data**:
    - **Purpose**: Resamples time-series data in the DataFrame based on a date column to a specified frequency, applying given aggregation functions.
    - **How it works**: It converts the specified column to a datetime type, sets it as the index, and then resamples the DataFrame to the given frequency applying the specified aggregations. If the date column is missing, it raises a ValueError.

- **add_data**:
    - **Purpose**: Adds new data to an existing DataFrame.
    - **How it works**: It creates a new DataFrame from the given data and concatenates it with the existing DataFrame, resetting the index to maintain continuity.

**DataManager Class**

This class is designed to manage a DataFrame with more complex operations using methods from the DataFrameManager class:

- **Constructor (__init__)**:
    - **Purpose**: Initializes a DataManager instance with a pandas DataFrame and an instance of DataFrameManager.
    - **How it works**: Stores the DataFrame and DataFrameManager instance as attributes.

- **get_top_items_and_aggregate**:
    - **Purpose**: Combines the functionalities of top_items_by_column and aggregate_by_column to first find top items based on the sum of one column and then aggregate another column among these top items.
    - **How it works**: Calls top_items_by_column to get the top items and then aggregate_by_column to perform further aggregation on these top items.

- **resample_and_update_data**:
    - **Purpose**: Resamples the DataFrame to a specified frequency based on a date column and then updates the resampled DataFrame with new data.
    - **How it works**: Calls resample_data to adjust the DataFrame's time frequency, and then add_data to merge new data into this resampled DataFrame. It updates the DataManager's stored DataFrame with this new data.

**Interaction Between Classes**

- DataManager uses an instance of DataFrameManager (self.df_manager) to access the static methods for data processing. This design illustrates a dependency injection pattern where DataManager depends on DataFrameManager for data manipulation tasks.
- The methods of DataFrameManager are called statically but are accessed through the instance passed during initialization of DataManager. This approach provides flexibility in potentially using different configurations or subclasses of DataFrameManager.
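
To see the pattern run end to end, here is a condensed sketch. `FrameUtils` and `Pipeline` are hypothetical, much smaller stand-ins for `DataFrameManager` and `DataManager`, and the sales data is made up. Because the utility methods are static, the sketch passes the class itself rather than an instance; an instance would work just as well.

```python
import pandas as pd


class FrameUtils:
    """Hypothetical, trimmed-down utility class with static methods only."""

    @staticmethod
    def add_data(dataframe, new_data):
        # Append rows and renumber the index
        return pd.concat([dataframe, pd.DataFrame(new_data)], ignore_index=True)

    @staticmethod
    def aggregate_by_column(dataframe, group_by_column, agg_column, agg_func):
        # Group, aggregate and return a regular DataFrame
        return dataframe.groupby(group_by_column)[agg_column].agg(agg_func).reset_index()


class Pipeline:
    """Hypothetical class that chains the utility methods in a fixed order."""

    def __init__(self, df, utils):
        self.data = df
        self.utils = utils  # injected dependency

    def add_and_aggregate(self, new_data, group_by_column, agg_column, agg_func):
        self.data = self.utils.add_data(self.data, new_data)
        return self.utils.aggregate_by_column(
            self.data, group_by_column, agg_column, agg_func
        )


sales = pd.DataFrame({"region": ["north", "south", "north"], "amount": [10, 20, 5]})
new_rows = pd.DataFrame({"region": ["south"], "amount": [7]})

pipeline = Pipeline(sales, FrameUtils)
totals = pipeline.add_and_aggregate(new_rows, "region", "amount", "sum")
print(totals)
```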

## Exercises

Object oriented programming takes practice to master, so here are some exercises. Pick one of the exercises or all of them and try to solve them yourself (i.e. no usage of LLMs!)

1. **Create a simple calculator**: Create a Calculator class with methods for addition, subtraction, multiplication, and division. The class should take two numbers as inputs and return the result of the selected operation.
2. **Create a shape class hierarchy**: Create a base class Shape with methods for calculating area and perimeter. Then, create subclasses for specific shapes like Circle, Square, and Rectangle that inherit from the Shape class and implement their own area and perimeter calculations.
3. **Create a simple bank account**: Create a BankAccount class with methods for depositing and withdrawing money, and for checking the account balance. The class should also have a method for displaying the transaction history.

We'll go through the exercises above in the next lecture.