## Object Oriented Programming

In Python, an object (also called an instance) combines two software concepts into a single unit:
1. State
2. Behaviors

For example, `seattle_air` below is a **DataFrame** object from the pandas library. Its state is the table of data it holds, and its behaviors include the `groupby` method, which groups rows by the key passed to it.

In [2]:
import pandas as pd

seattle_air = pd.read_csv("seattle_air.csv", index_col="Time", parse_dates=True)
seattle_air.groupby(seattle_air.index.year).count()

Time,PM2.5
2017,6283
2018,8540
2019,8597
2020,8683
2021,8664
2022,2292


### Reference Semantics

Calling the `groupby` method and then `count` returns a new object; the original `seattle_air` object is left unchanged.

If we now ask for `seattle_air`, we get the ungrouped, uncounted version of the dataframe.

In [3]:
seattle_air

Time,PM2.5
2017-04-06 00:00:00,6.8
2017-04-06 01:00:00,5.3
2017-04-06 02:00:00,5.3
2017-04-06 03:00:00,5.6
2017-04-06 04:00:00,5.9
...,...
2022-04-06 19:00:00,5.1
2022-04-06 20:00:00,5.0
2022-04-06 21:00:00,5.3
2022-04-06 22:00:00,5.2


However, some methods can modify the underlying dataframe object. One example is `dropna` when it is called with `inplace=True`.

In [4]:
seattle_air.dropna(inplace=True)  # inplace=True drops rows with missing values from seattle_air itself and returns None
seattle_air
# seattle_air[seattle_air["PM2.5"].isna()] would select rows with missing PM2.5 values (none remain after dropna)

Time,PM2.5
2017-04-06 00:00:00,6.8
2017-04-06 01:00:00,5.3
2017-04-06 02:00:00,5.3
2017-04-06 03:00:00,5.6
2017-04-06 04:00:00,5.9
...,...
2022-04-06 19:00:00,5.1
2022-04-06 20:00:00,5.0
2022-04-06 21:00:00,5.3
2022-04-06 22:00:00,5.2
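
The same distinction between returning a new object and modifying an existing one shows up in Python's built-in lists. As a minimal sketch (the variable names are made up for illustration, and the list only stands in for the dataframe), `sorted` returns a new list, `list.sort` modifies the list in place and returns `None`, and a second name bound to the same list sees the change:

```python
# Hypothetical readings standing in for the PM2.5 column.
readings = [6.8, 5.3, 5.9, 5.6]
alias = readings              # a second reference to the SAME list object

ordered = sorted(readings)    # returns a new list; readings is unchanged
unchanged = readings == [6.8, 5.3, 5.9, 5.6]

result = readings.sort()      # sorts in place and returns None
shared = alias is readings    # alias still refers to the same, now sorted, list

print(ordered)    # [5.3, 5.6, 5.9, 6.8]
print(unchanged)  # True
print(result)     # None
print(alias)      # [5.3, 5.6, 5.9, 6.8]
```

Assigning the result of an in-place call back to the variable (for example `readings = readings.sort()`) would replace the list with `None`, which is a common mistake with `inplace=True` as well.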


### Defining Classes
You can create your own custom objects by defining classes. A simplified blueprint of a `DataFrame` class is provided below for demonstration.

In [5]:
class DataFrame:
    """Represents two-dimensional tabular data structured around an index and column names."""

    def __init__(self, index, columns, data):
        """Initializes a new DataFrame object from the given index, columns, and tabular data."""
        print("Initializing DataFrame")
        self.index = index
        self.columns = columns
        self.data = data

    def dropna(self, inplace=False):
        """
        Drops all rows containing NaN from this DataFrame. If inplace, returns None and modifies
        self. If not inplace, returns a new DataFrame without modifying self.
        """
        print("Calling dropna")
        if not inplace:
            return DataFrame([...], [...], [...])
        else:
            self.columns = [...]
            self.index = [...]
            self.data = [...]
            return None

    def __getitem__(self, column_or_indexer):
        """Given a column or indexer, returns the selection as a new Series or DataFrame object."""
        print("Calling __getitem__")
        if column_or_indexer in self.columns:
            return "Series" # placeholder for series/ one dimensional array
        else:
            return DataFrame([...], [...], [...])

Notice how every method always takes `self` as the first parameter.

Notice that `__getitem__` is a *dunder method*: defining it overloads Python's built-in `[]` indexing operator for this class.
- Dunder method characteristics:
    1. They follow the double-underscore naming scheme: `__<name>__`.
    2. They are called through built-in Python operators and functions rather than by name.
    3. They can be overloaded (customized) by user-defined classes.
- For example, `__init__` is the most crucial dunder method when writing a class: it defines how the instance variables of a new object are initialized.
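
The characteristics above can be sketched with a small class. `Playlist` and its attribute names are hypothetical and exist only for this illustration; the dunder methods themselves (`__init__`, `__len__`, `__getitem__`, `__repr__`) are standard Python hooks.

```python
class Playlist:
    """Hypothetical class used only to illustrate dunder methods."""

    def __init__(self, songs):
        # Called by Playlist(...) to set up the instance variables.
        self.songs = list(songs)

    def __len__(self):
        # Called by the built-in len() function.
        return len(self.songs)

    def __getitem__(self, position):
        # Called by the [] operator.
        return self.songs[position]

    def __repr__(self):
        # Called by repr(), and by print() when __str__ is not defined.
        return f"Playlist({self.songs!r})"


mix = Playlist(["intro", "verse", "outro"])
print(len(mix))   # 3
print(mix[1])     # verse
print(mix)        # Playlist(['intro', 'verse', 'outro'])
```

None of these methods is called by name in the usage lines; Python dispatches to them from `len(...)`, `[...]`, and `print(...)`.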

In [20]:
# example usage of the DataFrame class written above
example = DataFrame([0, 1, 2], ["PM2.5"], [10, 20, 30])
print(example["PM2.5"])
example["PM2.5"]
# expected output:
    # the print statement in __init__, once
    # the print statement in __getitem__, twice (once per [] access)
    # the placeholder string "Series" rather than an object representation or memory address,
    # because __getitem__ returns a plain str when given a single column
# note: Python calls __getitem__ whenever [] is used on the object

Initializing DataFrame
Calling __getitem__
Series
Calling __getitem__


'Series'

### Practice: `Student` class

Write a `Student` class that represents a UW student, where each student has a `name`, a student `number`, and a `courses` dictionary that associates the name of each course to a number of credits. The `Student` class should include the following methods:

- An initializer that takes the student number and the name of a file containing information about their schedule.
- A method `__getitem__` that takes a `str` course name and returns the `int` number of credits for the course. If the student is not taking the given course, return `None`.
- A method `get_courses` that returns a list of the courses the student is taking.

Consider the following file `nicole.txt`.

```
CSE163 4
PHIL100 4
CSE390HA 1
```

The student's `name` is just the name of the file without the file extension. The file indicates they are taking CSE163 for 4 credits, PHIL100 for 4 credits, and CSE390HA for 1 credit.
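
One way to sketch the two parsing steps described above (deriving the name from the filename and reading course/credit pairs) uses `pathlib` from the standard library. The temporary directory is only there to make the example self-contained, and `schedule` and `num_credits` are made-up names; treat this as an illustration rather than the required solution.

```python
import tempfile
from pathlib import Path

# Write the sample schedule to a temporary directory so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "nicole.txt"
    path.write_text("CSE163 4\nPHIL100 4\nCSE390HA 1\n")

    # Path.stem is the final path component without its last extension.
    name = path.stem

    # Each line holds a course name and a credit count separated by whitespace.
    schedule = {}
    with open(path) as file:
        for line in file:
            course, num_credits = line.split()
            schedule[course] = int(num_credits)

print(name)      # nicole
print(schedule)  # {'CSE163': 4, 'PHIL100': 4, 'CSE390HA': 1}
```

`Path(...).stem` also drops any leading directories, so a path such as `data/nicole.txt` would still give `nicole`.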

In [35]:

class Student:
    def __init__(self, number: int, filename: str) -> None:
        self.number = number
        self.name = filename.split('.')[0]  
        self.courses = {}

        with open(filename, 'r') as file:
            for line in file:
                course, credits = line.split()
                self.courses[course] = int(credits)

    def __getitem__(self, course: str) -> int | None:  # Optional[int] or Union[int, None] in older Python versions
        return self.courses.get(course)  # dict.get returns None for a missing course instead of raising KeyError

    def get_courses(self) -> list[str]:
        return list(self.courses.keys())

    def __repr__(self) -> str:
        return f'(name:{self.name}, number: {self.number})'
    
nicole = Student(1234567, "nicole.txt")
for course in nicole.get_courses():
    print(course, nicole[course])

CSE163 4
PHIL100 4
CSE390HA 1


Optional, yet useful: type annotations. Annotating parameter and return types is a habit worth adopting because it makes code clearer. See the [Type hints cheat sheet](https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html). The `Student` class above already uses some type annotations, and the `University` class below uses them throughout.
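
As a minimal sketch of what annotations look like on ordinary functions (the function names here are hypothetical, and `dict[str, int]` assumes Python 3.9 or newer): Python does not enforce annotations at runtime, but a checker such as mypy reads them to flag inconsistent types.

```python
from typing import Optional


def credits_for(courses: dict[str, int], course: str) -> Optional[int]:
    """Returns the credits for the given course, or None if it is not in courses."""
    return courses.get(course)


def total_credits(courses: dict[str, int]) -> int:
    """Returns the sum of all credits in courses."""
    return sum(courses.values())


schedule = {"CSE163": 4, "PHIL100": 4, "CSE390HA": 1}
print(credits_for(schedule, "CSE163"))   # 4
print(credits_for(schedule, "MATH126"))  # None
print(total_credits(schedule))           # 9
```

`Optional[int]` means the same thing as `int | None`; the latter spelling is available from Python 3.10.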

### Practice: `University` class

Write a `University` class that represents one or more students enrolled in courses at a university. The `University` class should include the following methods:

- An initializer that takes the university name and, optionally, a list of `Student` objects to enroll in this university.
- A method `enrollments` that returns all the enrolled `Student` objects sorted in alphabetical order by student name.
- A method `enroll` that takes a `Student` object and enrolls them in the university.

In [34]:
!pip install -q nb_mypy # package that checks type consistency; reports an error when types are inconsistent
%reload_ext nb_mypy
%nb_mypy mypy-options --strict


Version 1.0.5


In [36]:


class University:
    def __init__(self, name: str, students: list['Student'] | None = None) -> None:
        self.name = name
        self.students = students.copy() if students else [] 

    def enrollments(self) -> list['Student']:
        return sorted(self.students, key=lambda s: s.name) # sort by name attribute using lambda function

    def enroll(self, student: 'Student') -> None:
        self.students.append(student)

uw = University("Udub", [nicole])
print(uw.enrollments())  # prints the list of Student objects, each shown via Student.__repr__


[(name:nicole, number: 1234567)]


### Mutable default parameters
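The `students` parameter of `University` defaults to `None` rather than to an empty list `[]`. Python evaluates a default value only once, when the function is defined, so a mutable default such as `[]` would be shared by every `University` created without a student list. With `None` as the default, each university builds its own list, so enrolling `nicole` at `sea_u` below leaves `wsu` untouched. The `name` parameter, by contrast, has no default (such as `name: str = 'Udub'`), so every university must be named explicitly; whether to provide a default is a design choice with pros and cons.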

In [38]:
wsu = University("Wazzu")
wsu.enrollments()


[]

In [39]:
sea_u = University("SeaU")
sea_u.enrollments()

[]

In [40]:

sea_u.enroll(nicole)
sea_u.enrollments()

[(name:nicole, number: 1234567)]
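
A minimal sketch of what can go wrong when a list is used directly as a default value. Both classes are hypothetical and store student names as plain strings: `SharedRoster` uses `students=[]`, while `SeparateRoster` uses `None` as the default, like `University` above.

```python
class SharedRoster:
    """Hypothetical class showing the pitfall: the default list is created once and shared."""

    def __init__(self, name, students=[]):  # avoid: mutable default value
        self.name = name
        self.students = students


class SeparateRoster:
    """Hypothetical class using None as the default, like University above."""

    def __init__(self, name, students=None):
        self.name = name
        self.students = list(students) if students else []


a, b = SharedRoster("A"), SharedRoster("B")
a.students.append("nicole")
print(b.students)  # ['nicole'] because a and b share one default list

c, d = SeparateRoster("C"), SeparateRoster("D")
c.students.append("nicole")
print(d.students)  # []
```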