# Classes and Object Oriented Programming in Python

This tutorial is a part of the [Zero to Data Analyst Bootcamp by Jovian](https://www.jovian.ai/data-analyst-bootcamp)

![](https://i.imgur.com/yBsPHnF.png)

Object-oriented programming (OOP) is a method of structuring programs into _objects_ that encapsulate _data_ and _functionality_. For example, NumPy arrays and Pandas data frames are objects that contain data and offer methods to retrieve, manipulate and perform operations on the data stored within them.

Python is an object-oriented language: everything in Python is an object, and every object is an _instance_ of a class. Classes are blueprints for creating objects. In this tutorial, we'll explore how to create new classes and objects in Python.
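To check this claim yourself, you can ask any value for its class with the built-in `type` function, and use `isinstance` to confirm that it is an instance of `object`, the base class of every class. A small sketch (the sample values are arbitrary):

```python
# Every value in Python is an object, i.e. an instance of some class
values = [42, 3.14, "hello", [1, 2, 3], {"a": 1}, len]

for value in values:
    # type() returns the class that the value is an instance of
    print(repr(value), "is an instance of", type(value).__name__)

# All classes ultimately inherit from the built-in `object` class
print(all(isinstance(value, object) for value in values))  # True

# Even classes are objects: they are instances of `type`
print(type(int))  # <class 'type'>
```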

This tutorial covers the following topics:

- Defining classes and creating objects
- Class constructor, properties and methods
- Implementing "dunder" methods for easier usage
- Getters, setters, static methods & class methods
- Inheritance, overriding and abstract methods

### How to Run the Code

The best way to learn the material is to execute the code and experiment with it yourself. This tutorial is an executable [Jupyter notebook](https://jupyter.org). You can _run_ this tutorial and experiment with the code examples in a couple of ways: *using free online resources* (recommended) or *on your computer*.

#### Option 1: Running using free online resources (1-click, recommended)

The easiest way to start executing the code is to click the **Run** button at the top of this page and select **Run on Binder**. You can also select "Run on Colab" or "Run on Kaggle", but you'll need to create an account on [Google Colab](https://colab.research.google.com) or [Kaggle](https://kaggle.com) to use these platforms.


#### Option 2: Running on your computer locally

To run the code on your computer locally, you'll need to set up [Python](https://www.python.org), download the notebook and install the required libraries. We recommend using the [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) distribution of Python. Click the **Run** button at the top of this page, select the **Run Locally** option, and follow the instructions.





In [None]:
!pip install jovian --upgrade --quiet

In [None]:
!pip install pandas --quiet

## Problem Statement - Implementing Pandas Data Frames from Scratch

To understand classes, we'll attempt to implement Pandas data frames from scratch in Python.

![](https://i.imgur.com/zfxLzEv.png)

Here's some of the functionality we'll try to replicate.

In [1]:
import pandas as pd

In [2]:
artists_data = {
    'Artist': ['Billie Holiday', 'Jimi Hendrix', 'Miles Davis', 'SIA'],
    'Genre': ['Jazz', 'Rock', 'Jazz', 'Pop'],
    'Listeners': [130000, 270000, 150000, 200000]
}

In [3]:
pandas_df = pd.DataFrame(artists_data)

In [4]:
pandas_df

|   | Artist | Genre | Listeners |
|---|---|---|---|
| 0 | Billie Holiday | Jazz | 130000 |
| 1 | Jimi Hendrix | Rock | 270000 |
| 2 | Miles Davis | Jazz | 150000 |
| 3 | SIA | Pop | 200000 |


In [5]:
print(pandas_df)

           Artist Genre  Listeners
0  Billie Holiday  Jazz     130000
1    Jimi Hendrix  Rock     270000
2     Miles Davis  Jazz     150000
3             SIA   Pop     200000


In [6]:
pandas_df.shape

(4, 3)

In [7]:
len(pandas_df)

4

In [8]:
pandas_df.columns

Index(['Artist', 'Genre', 'Listeners'], dtype='object')

In [9]:
pandas_df['Artist']

0    Billie Holiday
1      Jimi Hendrix
2       Miles Davis
3               SIA
Name: Artist, dtype: object

In [10]:
pandas_df.loc[1]

Artist       Jimi Hendrix
Genre                Rock
Listeners          270000
Name: 1, dtype: object

In [11]:
pandas_df2 = pandas_df.copy()

In [12]:
pandas_df2.columns = ['Singer', 'Category', 'Followers']
pandas_df2

|   | Singer | Category | Followers |
|---|---|---|---|
| 0 | Billie Holiday | Jazz | 130000 |
| 1 | Jimi Hendrix | Rock | 270000 |
| 2 | Miles Davis | Jazz | 150000 |
| 3 | SIA | Pop | 200000 |


In [13]:
for col in pandas_df:
    print(col)

Artist
Genre
Listeners


In [14]:
pandas_df.to_csv('artists.csv', index=False)

In [15]:
!cat artists.csv

Artist,Genre,Listeners
Billie Holiday,Jazz,130000
Jimi Hendrix,Rock,270000
Miles Davis,Jazz,150000
SIA,Pop,200000


## Defining classes and creating objects

A class is a blueprint for creating an object. Classes are defined using the `class` keyword. The _body_ of a class is an indented block of code that defines its functionality. Here's the simplest way of defining a class:

In [16]:
class DataFrame:
    pass

Note that the body contains just one statement, `pass`, which does nothing, i.e., the class has no functionality.

We can now create an object of the class by invoking the class like a function.

In [17]:
DataFrame()

<__main__.DataFrame at 0x7feb60969ac8>

We just created an object of the class `DataFrame`. However, we have no way to access the object afterwards. We can keep a reference to it by assigning it to a variable.

In [18]:
df1 = DataFrame()

The variable `df1` holds a reference to the object, and can be used to retrieve the object.

In [19]:
df1

<__main__.DataFrame at 0x7feb60969518>

When we invoke `DataFrame()` again, it creates a new object.

In [20]:
df2 = DataFrame()

In [21]:
df2

<__main__.DataFrame at 0x7feb60969c18>

You can tell that the objects are different because they are at different addresses in memory (the hexadecimal number at the end of the output).

Note that we can have multiple variables pointing to the same object, simply by assigning one variable to another.

In [22]:
df3 = df2

In [23]:
df3

<__main__.DataFrame at 0x7feb60969c18>

`df2` and `df3` point to the same object, but `df1` points to a different object. More precisely, `df2` and `df3` point to the same _location in memory_, while `df1` points to a different _memory location_.

You can check if two variables point to the same object using the `is` operator, which compares the identities (in CPython, the memory addresses) of the objects the two variables refer to.

In [24]:
df1 is df2

False

In [25]:
df2 is df3

True
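The `is` operator checks identity, whereas `==` checks equality. For a class like ours that doesn't define `__eq__`, `==` falls back to identity as well, but for built-in types such as lists, two distinct objects can be equal without being the same object. The built-in `id` function returns an integer that is unique among the objects alive at the same time (in CPython, it is the memory address). A small sketch:

```python
class DataFrame:
    pass

df1 = DataFrame()
df2 = DataFrame()
df3 = df2  # df3 refers to the same object as df2

# id() gives each live object a distinct identifier
print(id(df2) == id(df3))  # True: same object
print(id(df1) == id(df2))  # False: different objects

# Two distinct lists with the same contents are equal but not identical
a = [1, 2, 3]
b = [1, 2, 3]
print(a == b)  # True: same contents
print(a is b)  # False: different objects
```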

## Class constructor, properties and methods

Our data frame objects aren't doing much. They don't store any data or offer any functionality. Let's give them the ability to store some data.

We'll store a fixed dictionary in each new object by defining a _constructor method_, `__init__`, which is executed automatically when an object is created.


In [26]:
class DataFrame:
    def __init__(self):
        self.data = {'a': [1] }

Note the following in the definition above:

- The double underscores before and after `init` in `__init__`: Python recognizes this special name and calls the method automatically.
- The `self` argument passed to `__init__`, which will be set to the object that is created.
- Setting a property on `self` called `data`. We can name a property anything we wish (`val`, `number`, `the_thing_inside`, etc.).

Let's create an object of this class.

In [27]:
df4 = DataFrame()

In [28]:
df4

<__main__.DataFrame at 0x7feb60981080>

We can now access the property `data` of `df4`.

In [29]:
df4.data

{'a': [1]}

Internally, Python first creates an empty object and passes it as the `self` argument to the `__init__` method, which then sets the property `data` on the object to the value `{'a': [1]}`. Finally, the object is assigned to the variable `df4`. Note that `self` is just a parameter (a local variable) of `__init__`, so it is not accessible outside the method:

In [30]:
self

NameError: name 'self' is not defined
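Under the hood, `DataFrame()` first calls the special method `__new__` to create the empty object, and then calls `__init__` on it. The sketch below performs those two steps by hand, for illustration only; in real code you should always just call `DataFrame()`.

```python
class DataFrame:
    def __init__(self):
        self.data = {'a': [1]}

# Step 1: create an empty, uninitialized object of the class
obj = DataFrame.__new__(DataFrame)
print(hasattr(obj, 'data'))  # False: __init__ hasn't run yet

# Step 2: call __init__ with the new object as `self`
DataFrame.__init__(obj)
print(obj.data)  # {'a': [1]}

# This is roughly what `df4 = DataFrame()` does for us
df4 = DataFrame()
print(df4.data == obj.data)  # True
```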

We can not only access, but also change the value of the property `data`.

In [31]:
df4.data = {'b': [5]}

In [32]:
df4.data

{'b': [5]}

In [33]:
df4.data['c'] = 99

In [34]:
df4.data

{'b': [5], 'c': 99}

Note that every new object will contain its own local copy of the `data` property.

In [35]:
df5 = DataFrame()
df5.data['b'] = [27]

In [36]:
df6 = DataFrame()
df6.data = { 'l': [44], 'm': [99]}

In [37]:
df4.data

{'b': [5], 'c': 99}

In [38]:
df5.data

{'a': [1], 'b': [27]}

In [39]:
df6.data

{'l': [44], 'm': [99]}
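Each object gets its own dictionary because `self.data = ...` runs inside `__init__` for every new object. Contrast this with a _class attribute_, which is defined directly in the class body and is shared by all objects of the class. A sketch of the difference (the class names `SharedData` and `OwnData` are made up for this example):

```python
class SharedData:
    data = {'a': [1]}  # class attribute: one dict shared by every object

class OwnData:
    def __init__(self):
        self.data = {'a': [1]}  # instance attribute: a new dict per object

s1, s2 = SharedData(), SharedData()
s1.data['b'] = [27]   # mutates the single shared dict
print(s2.data)        # {'a': [1], 'b': [27]}

o1, o2 = OwnData(), OwnData()
o1.data['b'] = [27]   # mutates only o1's dict
print(o2.data)        # {'a': [1]}
```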

We can also set the initial value of the property while creating the object, by passing arguments to the constructor.

In [40]:
class DataFrame:
    def __init__(self, data):
        self.data = data

The value for the argument `data` can be passed while invoking `DataFrame` to create a new object.

In [41]:
df7 = DataFrame({ 'a': [1], 'b': [2], 'c': [3]})

In [42]:
df7.data

{'a': [1], 'b': [2], 'c': [3]}

Note that we can no longer invoke `DataFrame` without arguments.

In [43]:
df8 = DataFrame()

TypeError: __init__() missing 1 required positional argument: 'data'

Let's define another property `columns`, which is set to the list of columns of the dataframe.

In [44]:
class DataFrame:
    def __init__(self, data):
        self.data = data
        self.columns = list(data.keys())

In [45]:
artists_data

{'Artist': ['Billie Holiday', 'Jimi Hendrix', 'Miles Davis', 'SIA'],
 'Genre': ['Jazz', 'Rock', 'Jazz', 'Pop'],
 'Listeners': [130000, 270000, 150000, 200000]}

In [46]:
df9 = DataFrame(artists_data)

In [47]:
df9.columns

['Artist', 'Genre', 'Listeners']

Next, let's define a method `get_column`, which retrieves the values in a given column.

In [48]:
class DataFrame:
    def __init__(self, data):
        self.data = data
        self.columns = list(data.keys())
        
    def get_column(self, col_name):
        return self.data[col_name]

In [49]:
df10 = DataFrame(artists_data)

In [50]:
df10.data

{'Artist': ['Billie Holiday', 'Jimi Hendrix', 'Miles Davis', 'SIA'],
 'Genre': ['Jazz', 'Rock', 'Jazz', 'Pop'],
 'Listeners': [130000, 270000, 150000, 200000]}

In [51]:
df10.get_column('Genre')

['Jazz', 'Rock', 'Jazz', 'Pop']

Note that `df10` is automatically passed as the `self` argument to `get_column`.

In fact, the above call is the same as:

In [52]:
DataFrame.get_column(df10, 'Genre')

['Jazz', 'Rock', 'Jazz', 'Pop']

Let's implement a method `get_row` which can be used to retrieve the row at a given position in the data frame, as a dictionary.

In [53]:
class DataFrame:
    def __init__(self, data):
        self.data = data
        self.columns = list(data.keys())
        
    def get_column(self, col_name):
        return self.data[col_name]
    
    def get_row(self, i):
        result = {}
        for col in self.columns:
            result[col] = self.data[col][i]
        return result

In [54]:
df11 = DataFrame(artists_data)

In [55]:
df11.data

{'Artist': ['Billie Holiday', 'Jimi Hendrix', 'Miles Davis', 'SIA'],
 'Genre': ['Jazz', 'Rock', 'Jazz', 'Pop'],
 'Listeners': [130000, 270000, 150000, 200000]}

In [56]:
df11.get_row(2)

{'Artist': 'Miles Davis', 'Genre': 'Jazz', 'Listeners': 150000}

Let's also add a `copy` method to easily create copies of data frames. We'll use the `deepcopy` function from the `copy` module to copy the dictionary along with the lists stored inside it.

In [57]:
from copy import deepcopy

class DataFrame:
    def __init__(self, data):
        self.data = data
        self.columns = list(data.keys())
        
    def get_column(self, col_name):
        return self.data[col_name]
    
    def get_row(self, i):
        result = {}
        for col in self.columns:
            result[col] = self.data[col][i]
        return result
    
    def copy(self):
        data_copy = deepcopy(self.data)
        return DataFrame(data_copy)

In [58]:
df12 = DataFrame(artists_data)

In [59]:
df13 = df12.copy()

In [60]:
df13.data

{'Artist': ['Billie Holiday', 'Jimi Hendrix', 'Miles Davis', 'SIA'],
 'Genre': ['Jazz', 'Rock', 'Jazz', 'Pop'],
 'Listeners': [130000, 270000, 150000, 200000]}

Verify that the `data` in `df13` is indeed a copy, and modifying it won't affect the `data` in `df12`.
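One way to verify this, sketched with a trimmed-down class containing only `__init__` and `copy`, is to modify a value in the copy and inspect the original. The sketch also shows why `deepcopy` is needed: a shallow copy of the dictionary (here via `dict(...)`) would still share the inner lists.

```python
from copy import deepcopy

class DataFrame:
    def __init__(self, data):
        self.data = data
        self.columns = list(data.keys())

    def copy(self):
        # deepcopy copies the dict *and* the lists stored inside it
        return DataFrame(deepcopy(self.data))

original = DataFrame({'a': [1, 2], 'b': [3, 4]})
duplicate = original.copy()

# Modifying a list inside the copy leaves the original untouched
duplicate.data['a'][0] = 99
print(original.data['a'])  # [1, 2]

# A shallow copy of the dict, by contrast, still shares the inner lists
shallow = dict(original.data)
shallow['a'][0] = 99
print(original.data['a'])  # [99, 2]
```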

Our `DataFrame` class now contains the following functionality:

- A constructor method that can be used to pass a dictionary of data
- A `data` property that can be used to access the dictionary
- A `columns` property that can be used to get a list of columns
- A `get_column` method for getting the list of values in a column
- A `get_row` method for getting a row of data as a dictionary.
- A `copy` method for creating a copy of the data frame.

> **EXERCISES**: Enhance the implementation of `DataFrame` to include the following:
> 
> 1. Ensure that the `data` argument to the constructor is a dictionary, and that each value in the dictionary is a list of the same length. If these conditions are not satisfied, raise an exception.
> 2. Add a property `shape` which returns a tuple containing the number of rows and number of columns in the data frame.
> 3. Add a method `get_element` which extracts a single value from a data frame, given a column name and row index.

Let's save our work before continuing.

In [None]:
import jovian
jovian.commit()

## Implementing "dunder" methods for easier usage

Our implementation of `DataFrame` is shaping up well; however, it still has several limitations, which we'll discuss and address one by one in this section.

### String representation using `__str__` and `__repr__`

We can't view the contents of a `DataFrame` object in the same way we view the contents of a Pandas data frame.


In [61]:
pandas_df

|   | Artist | Genre | Listeners |
|---|---|---|---|
| 0 | Billie Holiday | Jazz | 130000 |
| 1 | Jimi Hendrix | Rock | 270000 |
| 2 | Miles Davis | Jazz | 150000 |
| 3 | SIA | Pop | 200000 |


In [62]:
df12

<__main__.DataFrame at 0x7feb60a19e80>

In [63]:
print(pandas_df)

           Artist Genre  Listeners
0  Billie Holiday  Jazz     130000
1    Jimi Hendrix  Rock     270000
2     Miles Davis  Jazz     150000
3             SIA   Pop     200000


In [64]:
print(df12)

<__main__.DataFrame object at 0x7feb60a19e80>


We can add this functionality by implementing the `__repr__` and `__str__` methods in the class. These are special methods in Python (also called "double underscore methods" or "dunder methods").

We'll use a helper library called `tabulate` to create a table-like output for our dataframe.

In [65]:
!pip install tabulate --quiet

In [66]:
from copy import deepcopy
from tabulate import tabulate

class DataFrame:
    def __init__(self, data):
        self.data = data
        self.columns = list(data.keys())
        
    def get_column(self, col_name):
        return self.data[col_name]
    
    def get_row(self, i):
        result = {}
        for col in self.columns:
            result[col] = self.data[col][i]
        return result
    
    def copy(self):
        data_copy = deepcopy(self.data)
        return DataFrame(data_copy)
    
    def __repr__(self):
        return tabulate(self.data, self.columns, 'pretty')
    
    def __str__(self):
        return tabulate(self.data, self.columns, 'pretty')

What's the difference between `__str__` and `__repr__`? Look it up!
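In short, `__repr__` is meant to be an unambiguous representation aimed at developers (it's what the interactive prompt shows when a value is evaluated, and what containers such as lists use for their elements), while `__str__` is meant to be a readable representation for end users (used by `print` and `str`). If a class defines only `__repr__`, `str()` falls back to it. A small sketch with a hypothetical `Point` class:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Unambiguous; ideally looks like code that recreates the object
        return f"Point({self.x!r}, {self.y!r})"

    def __str__(self):
        # Friendly, human-readable form
        return f"({self.x}, {self.y})"

p = Point(1, 2)
print(repr(p))   # Point(1, 2)
print(str(p))    # (1, 2)
print(p)         # (1, 2): print() uses __str__
print([p])       # [Point(1, 2)]: containers use __repr__ for their items
```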

In [67]:
df14 = DataFrame(artists_data)

In [68]:
df14

+----------------+-------+-----------+
|     Artist     | Genre | Listeners |
+----------------+-------+-----------+
| Billie Holiday | Jazz  |  130000   |
|  Jimi Hendrix  | Rock  |  270000   |
|  Miles Davis   | Jazz  |  150000   |
|      SIA       |  Pop  |  200000   |
+----------------+-------+-----------+

In [69]:
print(df14)

+----------------+-------+-----------+
|     Artist     | Genre | Listeners |
+----------------+-------+-----------+
| Billie Holiday | Jazz  |  130000   |
|  Jimi Hendrix  | Rock  |  270000   |
|  Miles Davis   | Jazz  |  150000   |
|      SIA       |  Pop  |  200000   |
+----------------+-------+-----------+


Great, we now have a readable string representation of our data.

### Length using `__len__`

We can find the number of rows in a Pandas dataframe using the `len` function. 

In [70]:
pandas_df

|   | Artist | Genre | Listeners |
|---|---|---|---|
| 0 | Billie Holiday | Jazz | 130000 |
| 1 | Jimi Hendrix | Rock | 270000 |
| 2 | Miles Davis | Jazz | 150000 |
| 3 | SIA | Pop | 200000 |


In [71]:
len(pandas_df)

4

However, our implementation of `DataFrame` does not support this.

In [72]:
len(df14)

TypeError: object of type 'DataFrame' has no len()

To support usage with the `len` function, we can define the `__len__` method.

In [73]:
from copy import deepcopy
from tabulate import tabulate

class DataFrame:
    def __init__(self, data):
        self.data = data
        self.columns = list(data.keys())
        
    def get_column(self, col_name):
        return self.data[col_name]
    
    def get_row(self, i):
        result = {}
        for col in self.columns:
            result[col] = self.data[col][i]
        return result
    
    def copy(self):
        data_copy = deepcopy(self.data)
        return DataFrame(data_copy)
    
    def __repr__(self):
        return tabulate(self.data, self.columns, 'pretty')
    
    def __str__(self):
        return tabulate(self.data, self.columns, 'pretty')
    
    def __len__(self):
        return len(self.data[self.columns[0]])

In [74]:
df15 = DataFrame(artists_data)

In [75]:
len(df15)

4

Note that not every class you define needs to support the `len` function; implement `__len__` only when a length makes sense for your objects.
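One side effect worth knowing: if a class defines `__len__` but not `__bool__`, Python uses `__len__` to decide truthiness, so an object of length 0 counts as `False` in an `if` statement. A sketch with a trimmed-down class (the guard for a dictionary with no columns is our own addition; the version above would raise `IndexError` in that case):

```python
class DataFrame:
    def __init__(self, data):
        self.data = data
        self.columns = list(data.keys())

    def __len__(self):
        # Number of rows = length of the first column (0 if there are no columns)
        if not self.columns:
            return 0
        return len(self.data[self.columns[0]])

empty = DataFrame({'a': []})
full = DataFrame({'a': [1, 2, 3]})

print(len(full))    # 3
print(bool(empty))  # False: len(empty) == 0
print(bool(full))   # True
```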

### `__getitem__` and `__setitem__`

While we do have a method `get_column` to retrieve values in a column from our custom data frames, Pandas dataframes allow doing this easily using the indexing notation.

In [76]:
df16 = DataFrame(artists_data)

In [77]:
df16.get_column('Artist')

['Billie Holiday', 'Jimi Hendrix', 'Miles Davis', 'SIA']

In [78]:
pandas_df['Artist']

0    Billie Holiday
1      Jimi Hendrix
2       Miles Davis
3               SIA
Name: Artist, dtype: object

In [79]:
df16['Artist']

TypeError: 'DataFrame' object is not subscriptable

Further, pandas dataframes also allow creating new columns using the indexing notation.

In [80]:
pandas_df2 = pandas_df.copy()
pandas_df2

|   | Artist | Genre | Listeners |
|---|---|---|---|
| 0 | Billie Holiday | Jazz | 130000 |
| 1 | Jimi Hendrix | Rock | 270000 |
| 2 | Miles Davis | Jazz | 150000 |
| 3 | SIA | Pop | 200000 |


In [81]:
sales = [20000, 31000, 24000, 27000]

In [82]:
pandas_df2['Sales'] = sales

In [83]:
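pandas_df2

|   | Artist | Genre | Listeners | Sales |
|---|---|---|---|---|
| 0 | Billie Holiday | Jazz | 130000 | 20000 |
| 1 | Jimi Hendrix | Rock | 270000 | 31000 |
| 2 | Miles Davis | Jazz | 150000 | 24000 |
| 3 | SIA | Pop | 200000 | 27000 |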


In [84]:
df16['Sales'] = sales

TypeError: 'DataFrame' object does not support item assignment

To support the indexing notation for getting and creating columns, we can implement the `__getitem__` and `__setitem__` methods on our class.

In [88]:
from copy import deepcopy
from tabulate import tabulate

class DataFrame:
    def __init__(self, data):
        self.data = deepcopy(data)
        self.columns = list(data.keys())
        
    def __getitem__(self, col_name):
        return self.data[col_name]
    
    def __setitem__(self, col_name, col_values):
        self.data[col_name] = col_values
        self.columns = list(self.data.keys())
    
    def get_row(self, i):
        result = {}
        for col in self.columns:
            result[col] = self.data[col][i]
        return result
    
    def copy(self):
        data_copy = deepcopy(self.data)
        return DataFrame(data_copy)
    
    def __repr__(self):
        return tabulate(self.data, self.columns, 'pretty')
    
    def __str__(self):
        return tabulate(self.data, self.columns, 'pretty')
    
    def __len__(self):
        return len(self.data[self.columns[0]])

In [89]:
df17 = DataFrame(artists_data)

In [90]:
df17

+----------------+-------+-----------+
|     Artist     | Genre | Listeners |
+----------------+-------+-----------+
| Billie Holiday | Jazz  |  130000   |
|  Jimi Hendrix  | Rock  |  270000   |
|  Miles Davis   | Jazz  |  150000   |
|      SIA       |  Pop  |  200000   |
+----------------+-------+-----------+

In [91]:
df17['Artist']

['Billie Holiday', 'Jimi Hendrix', 'Miles Davis', 'SIA']

In [92]:
df17['Sales'] = sales

In [93]:
df17

+----------------+-------+-----------+-------+
|     Artist     | Genre | Listeners | Sales |
+----------------+-------+-----------+-------+
| Billie Holiday | Jazz  |  130000   | 20000 |
|  Jimi Hendrix  | Rock  |  270000   | 31000 |
|  Miles Davis   | Jazz  |  150000   | 24000 |
|      SIA       |  Pop  |  200000   | 27000 |
+----------------+-------+-----------+-------+

We now have a way to access, add and modify columns in our dataframe.
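Note that `__setitem__` as written accepts any value, including a list with the wrong number of rows. Pandas raises a `ValueError` in that situation, and a sketch of a similar check for our class might look like this (the validation rule and error message are our own choices, not Pandas' exact behaviour):

```python
from copy import deepcopy

class DataFrame:
    def __init__(self, data):
        self.data = deepcopy(data)
        self.columns = list(data.keys())

    def __len__(self):
        return len(self.data[self.columns[0]]) if self.columns else 0

    def __getitem__(self, col_name):
        return self.data[col_name]

    def __setitem__(self, col_name, col_values):
        col_values = list(col_values)
        # Reject columns whose length doesn't match the existing rows
        if self.columns and len(col_values) != len(self):
            raise ValueError(
                f"Column '{col_name}' has {len(col_values)} values, expected {len(self)}"
            )
        self.data[col_name] = col_values
        self.columns = list(self.data.keys())

demo = DataFrame({'Artist': ['Billie Holiday', 'SIA'], 'Genre': ['Jazz', 'Pop']})
demo['Sales'] = [20000, 27000]   # OK: two values for two rows
print(demo.columns)              # ['Artist', 'Genre', 'Sales']

try:
    demo['Rating'] = [5]         # only one value for two rows
except ValueError as e:
    print(e)
```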

### `__iter__`

Pandas dataframes also support iteration, and can be used in `for` loops. In each iteration of the loop, we get the name of one column of the dataframe.

In [94]:
for x in pandas_df:
    print(x)

Artist
Genre
Listeners


In [95]:
for x in df17:
    print(x)

KeyError: 0
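This `KeyError: 0` comes from Python's older sequence iteration protocol: when a class defines `__getitem__` but not `__iter__`, a `for` loop calls `__getitem__` with 0, 1, 2, and so on, stopping only when `IndexError` is raised. Our `__getitem__` looks up a dictionary key, so the very first call, `self.data[0]`, raises `KeyError`, which is not treated as the end of the loop. A sketch with a hypothetical `Squares` class that relies on this fallback:

```python
class Squares:
    """Defines __getitem__ but not __iter__ (hypothetical example class)."""

    def __getitem__(self, i):
        if i >= 5:
            raise IndexError(i)  # signals the for loop to stop
        return i * i

# Python calls __getitem__(0), __getitem__(1), ... until IndexError
for x in Squares():
    print(x)  # prints 0, 1, 4, 9, 16 on separate lines
```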

To support iteration for custom classes, we can implement the `__iter__` method.

In [96]:
from copy import deepcopy
from tabulate import tabulate

class DataFrame:
    def __init__(self, data):
        self.data = deepcopy(data)
        self.columns = list(data.keys())
        
    def __getitem__(self, col_name):
        return self.data[col_name]
    
    def __setitem__(self, col_name, col_values):
        self.data[col_name] = col_values
        self.columns = list(self.data.keys())
    
    def get_row(self, i):
        result = {}
        for col in self.columns:
            result[col] = self.data[col][i]
        return result
    
    def copy(self):
        data_copy = deepcopy(self.data)
        return DataFrame(data_copy)
    
    def __repr__(self):
        return tabulate(self.data, self.columns, 'pretty')
    
    def __str__(self):
        return tabulate(self.data, self.columns, 'pretty')
    
    def __len__(self):
        return len(self.data[self.columns[0]])
    
    def __iter__(self):
        for col in self.columns:
            yield col

Note the use of the `yield` keyword instead of `return`. This turns `__iter__` into a _generator_ function: calling it returns a generator object, which produces the next value each time the loop asks for one, pausing at `yield` in between.
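To see what a generator does step by step, you can drive it manually with the built-in `next` function. Behind the scenes, a `for` loop calls `iter()` on the object (which invokes `__iter__` and, for a generator function, gets back a generator) and then calls `next()` on the result until `StopIteration` is raised. A sketch using a stand-alone generator function (`column_names` is a made-up name mirroring `__iter__`):

```python
def column_names(columns):
    for col in columns:
        yield col  # pause here and hand `col` to the caller

gen = column_names(['Artist', 'Genre', 'Listeners'])
print(type(gen).__name__)  # generator

print(next(gen))  # Artist
print(next(gen))  # Genre
print(next(gen))  # Listeners

try:
    next(gen)     # no values left
except StopIteration:
    print("done")  # a for loop stops when it sees StopIteration
```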

In [97]:
df18 = DataFrame(artists_data)

In [98]:
for x in df18:
    print(x)

Artist
Genre
Listeners


We can now iterate over our dataframe using a `for` loop.

You can find a full list of "dunder" methods and their usage here: https://holycoders.com/python-dunder-special-methods/. Keep in mind that only some dunder methods are relevant for any given class, and you needn't implement all (or any) of them for every class you create.

Let's save our work before continuing.

In [None]:
import jovian
jovian.commit()

## Getters, setters, static methods and class methods

One of the issues with our implementation is that we can't reliably rename the columns of a dataframe, like we can in Pandas.

In [99]:
pandas_df2 = pandas_df.copy()
pandas_df2

|   | Artist | Genre | Listeners |
|---|---|---|---|
| 0 | Billie Holiday | Jazz | 130000 |
| 1 | Jimi Hendrix | Rock | 270000 |
| 2 | Miles Davis | Jazz | 150000 |
| 3 | SIA | Pop | 200000 |


In [102]:
pandas_df2.columns = ['Singer', 'Category', 'Listeners']
pandas_df2

|   | Singer | Category | Listeners |
|---|---|---|---|
| 0 | Billie Holiday | Jazz | 130000 |
| 1 | Jimi Hendrix | Rock | 270000 |
| 2 | Miles Davis | Jazz | 150000 |
| 3 | SIA | Pop | 200000 |


In [103]:
pandas_df2['Singer']

0    Billie Holiday
1      Jimi Hendrix
2       Miles Davis
3               SIA
Name: Singer, dtype: object

In [104]:
df19 = DataFrame(artists_data)

In [105]:
df19.columns = ['Singer', 'Category', 'Listeners']
df19

+----------------+----------+-----------+
|     Singer     | Category | Listeners |
+----------------+----------+-----------+
| Billie Holiday |   Jazz   |  130000   |
|  Jimi Hendrix  |   Rock   |  270000   |
|  Miles Davis   |   Jazz   |  150000   |
|      SIA       |   Pop    |  200000   |
+----------------+----------+-----------+

In [106]:
df19['Singer']

KeyError: 'Singer'

This error occurs because assigning to `df19.columns` only replaced the `columns` list; the keys of the internal `data` dictionary were not modified.

In [107]:
df19.data

{'Artist': ['Billie Holiday', 'Jimi Hendrix', 'Miles Davis', 'SIA'],
 'Genre': ['Jazz', 'Rock', 'Jazz', 'Pop'],
 'Listeners': [130000, 270000, 150000, 200000]}

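We can solve this issue by defining two functions for the `columns` property: a "getter", decorated with `@property`, which computes the column names from `data`, and a "setter", decorated with `@columns.setter`, which renames the keys of `data` whenever `columns` is assigned.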

In [113]:
from copy import deepcopy
from tabulate import tabulate

class DataFrame:
    def __init__(self, data):
        self.data = deepcopy(data)
        
    @property
    def columns(self):
        return list(self.data.keys())
    
    @columns.setter
    def columns(self, new_cols):
        result = {}
        for old_col, new_col in zip(self.columns, new_cols):
            result[new_col] = self.data[old_col]
        self.data = result
        
    def __getitem__(self, col_name):
        return self.data[col_name]
    
    def __setitem__(self, col_name, col_values):
        self.data[col_name] = col_values
    
    def get_row(self, i):
        result = {}
        for col in self.columns:
            result[col] = self.data[col][i]
        return result
    
    def copy(self):
        data_copy = deepcopy(self.data)
        return DataFrame(data_copy)
    
    def __repr__(self):
        return tabulate(self.data, self.columns, 'pretty')
    
    def __str__(self):
        return tabulate(self.data, self.columns, 'pretty')
    
    def __len__(self):
        return len(self.data[self.columns[0]])
    
    def __iter__(self):
        for col in self.columns:
            yield col

In [114]:
df20 = DataFrame(artists_data)

In [115]:
df20

+----------------+-------+-----------+
|     Artist     | Genre | Listeners |
+----------------+-------+-----------+
| Billie Holiday | Jazz  |  130000   |
|  Jimi Hendrix  | Rock  |  270000   |
|  Miles Davis   | Jazz  |  150000   |
|      SIA       |  Pop  |  200000   |
+----------------+-------+-----------+

In [116]:
df20.columns

['Artist', 'Genre', 'Listeners']

In [117]:
df20.columns = ['Singer', 'Category', 'Listeners']

In [118]:
df20

+----------------+----------+-----------+
|     Singer     | Category | Listeners |
+----------------+----------+-----------+
| Billie Holiday |   Jazz   |  130000   |
|  Jimi Hendrix  |   Rock   |  270000   |
|  Miles Davis   |   Jazz   |  150000   |
|      SIA       |   Pop    |  200000   |
+----------------+----------+-----------+

In [119]:
df20.data

{'Singer': ['Billie Holiday', 'Jimi Hendrix', 'Miles Davis', 'SIA'],
 'Category': ['Jazz', 'Rock', 'Jazz', 'Pop'],
 'Listeners': [130000, 270000, 150000, 200000]}

In [120]:
df20['Singer']

['Billie Holiday', 'Jimi Hendrix', 'Miles Davis', 'SIA']
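One caveat: `zip` stops at the shorter of its inputs, so assigning a list with fewer names than there are columns would silently drop the remaining columns. Pandas raises a `ValueError` on such a length mismatch; a sketch of a setter that does the same (the error message is our own):

```python
from copy import deepcopy

class DataFrame:
    def __init__(self, data):
        self.data = deepcopy(data)

    @property
    def columns(self):
        # Getter: always derived from the keys of the underlying dict
        return list(self.data.keys())

    @columns.setter
    def columns(self, new_cols):
        new_cols = list(new_cols)
        if len(new_cols) != len(self.columns):
            raise ValueError(
                f"Expected {len(self.columns)} column names, got {len(new_cols)}"
            )
        # Rebuild the dict with the new keys, keeping the column order
        self.data = {new: self.data[old] for old, new in zip(self.columns, new_cols)}

demo = DataFrame({'Artist': ['SIA'], 'Genre': ['Pop'], 'Listeners': [200000]})
demo.columns = ['Singer', 'Category', 'Listeners']
print(demo.data)  # {'Singer': ['SIA'], 'Category': ['Pop'], 'Listeners': [200000]}

try:
    demo.columns = ['Singer']  # too few names
except ValueError as e:
    print(e)  # Expected 3 column names, got 1
```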

### Static Methods

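We can also define methods in a class that are not bound to any specific object, using the `@staticmethod` decorator. A static method receives neither the object (`self`) nor the class as an implicit first argument, and can be called directly on the class.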

In [124]:
from copy import deepcopy
from tabulate import tabulate

class DataFrame:
    @staticmethod
    def is_valid(data_dict):
        """Checks if a dictionary is in a valid format to create a dataframe"""
        if not isinstance(data_dict, dict):
            return False
        cols = list(data_dict.keys())
        if len(cols) == 0:
            return False
        for col in cols:
            values = data_dict[col]
            if not isinstance(values, list):
                return False
            if len(values) != len(data_dict[cols[0]]):
                return False
        return True
    
    def __init__(self, data):
        self.data = deepcopy(data)
        
    @property
    def columns(self):
        return list(self.data.keys())
    
    @columns.setter
    def columns(self, new_cols):
        result = {}
        for old_col, new_col in zip(self.columns, new_cols):
            result[new_col] = self.data[old_col]
        self.data = result
        
    def __getitem__(self, col_name):
        return self.data[col_name]
    
    def __setitem__(self, col_name, col_values):
        self.data[col_name] = col_values
    
    def get_row(self, i):
        result = {}
        for col in self.columns:
            result[col] = self.data[col][i]
        return result
    
    def copy(self):
        data_copy = deepcopy(self.data)
        return DataFrame(data_copy)
    
    def __repr__(self):
        return tabulate(self.data, self.columns, 'pretty')
    
    def __str__(self):
        return tabulate(self.data, self.columns, 'pretty')
    
    def __len__(self):
        return len(self.data[self.columns[0]])
    
    def __iter__(self):
        for col in self.columns:
            yield col

In [125]:
DataFrame.is_valid([])

False

In [127]:
DataFrame.is_valid({})

False

In [128]:
DataFrame.is_valid({'a': 1, 'b': 2})

False

In [129]:
DataFrame.is_valid({'a': [1], 'b': [2, 3]})

False

In [130]:
DataFrame.is_valid({'a': [1, 4], 'b': [2, 3]})

True

In [132]:
DataFrame.is_valid(artists_data)

True
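Because a static method receives neither `self` nor the class, it behaves like a plain function that lives in the class's namespace, and it can be called on the class or on an instance with the same result. One natural use, sketched below with a cut-down class that reuses `is_valid` unchanged, is to validate the input inside the constructor (raising `ValueError` here is our own choice):

```python
class DataFrame:
    @staticmethod
    def is_valid(data_dict):
        """Checks if a dictionary is in a valid format to create a dataframe"""
        if not isinstance(data_dict, dict):
            return False
        cols = list(data_dict.keys())
        if len(cols) == 0:
            return False
        for col in cols:
            values = data_dict[col]
            if not isinstance(values, list):
                return False
            if len(values) != len(data_dict[cols[0]]):
                return False
        return True

    def __init__(self, data):
        # Static methods can be called through self as well as through the class
        if not self.is_valid(data):
            raise ValueError("data must be a non-empty dict of equal-length lists")
        self.data = data

demo = DataFrame({'a': [1, 4], 'b': [2, 3]})
print(DataFrame.is_valid({'a': [1]}))  # True: called on the class
print(demo.is_valid({'a': 1}))         # False: called on an instance

try:
    DataFrame({'a': [1], 'b': [2, 3]})
except ValueError as e:
    print(e)
```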

### Class Method

Another special type of method is a _class method_, defined with the `@classmethod` decorator. It receives the class itself (conventionally named `cls`) as the first argument, and is often used to provide alternate ways of creating an object.

As an example, let's define a class method `read_json`, which can read data from a JSON file. Along with this, let's also add a normal method `to_json`.

In [176]:
import json
from copy import deepcopy
from tabulate import tabulate

class DataFrame:
    @staticmethod
    def is_valid(data_dict):
        """Checks if a dictionary is in a valid format to create a dataframe"""
        if not isinstance(data_dict, dict):
            return False
        cols = list(data_dict.keys())
        if len(cols) == 0:
            return False
        for col in cols:
            values = data_dict[col]
            if not isinstance(values, list):
                return False
            if len(values) != len(data_dict[cols[0]]):
                return False
        return True
    
    @classmethod
    def read_json(cls, filename):
        with open(filename, 'r') as f:
            data = json.load(f)
        if cls.is_valid(data):
            return cls(data)
        else:
            raise ValueError('Invalid data in file ' + filename)
    
    def to_json(self, filename):
        with open(filename, 'w') as f:
            f.write(json.dumps(self.data))
    
    def __init__(self, data):
        self.data = deepcopy(data)
        
    @property
    def columns(self):
        return list(self.data.keys())
    
    @columns.setter
    def columns(self, new_cols):
        result = {}
        for old_col, new_col in zip(self.columns, new_cols):
            result[new_col] = self.data[old_col]
        self.data = result
        
    def __getitem__(self, col_name):
        return self.data[col_name]
    
    def __setitem__(self, col_name, col_values):
        self.data[col_name] = col_values
    
    def get_row(self, i):
        result = {}
        for col in self.columns:
            result[col] = self.data[col][i]
        return result
    
    def copy(self):
        data_copy = deepcopy(self.data)
        return DataFrame(data_copy)
    
    def __repr__(self):
        return tabulate(self.data, self.columns, 'pretty')
    
    def __str__(self):
        return tabulate(self.data, self.columns, 'pretty')
    
    def __len__(self):
        return len(self.data[self.columns[0]])
    
    def __iter__(self):
        for col in self.columns:
            yield col

In [177]:
with open('artists.json', 'w') as f:
    f.write(json.dumps(artists_data))

In [178]:
!cat artists.json

{"Artist": ["Billie Holiday", "Jimi Hendrix", "Miles Davis", "SIA"], "Genre": ["Jazz", "Rock", "Jazz", "Pop"], "Listeners": [130000, 270000, 150000, 200000]}

In [179]:
df = DataFrame.read_json('artists.json')

In [180]:
df

+----------------+-------+-----------+
|     Artist     | Genre | Listeners |
+----------------+-------+-----------+
| Billie Holiday | Jazz  |  130000   |
|  Jimi Hendrix  | Rock  |  270000   |
|  Miles Davis   | Jazz  |  150000   |
|      SIA       |  Pop  |  200000   |
+----------------+-------+-----------+

In [181]:
df.to_json('artists2.json')

In [182]:
!cat artists2.json

{"Artist": ["Billie Holiday", "Jimi Hendrix", "Miles Davis", "SIA"], "Genre": ["Jazz", "Rock", "Jazz", "Pop"], "Listeners": [130000, 270000, 150000, 200000]}
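Using `cls(data)` rather than `DataFrame(data)` inside `read_json` matters once you subclass: a class method then builds an instance of whichever class it was called on. A sketch with hypothetical `Table` and `NamedTable` classes and a made-up alternate constructor `from_rows` (not part of the tutorial's `DataFrame`):

```python
class Table:
    def __init__(self, data):
        self.data = data

    @classmethod
    def from_rows(cls, rows):
        """Alternate constructor: build a column dict from a list of row dicts."""
        columns = list(rows[0].keys()) if rows else []
        data = {col: [row[col] for row in rows] for col in columns}
        return cls(data)  # cls is Table, or a subclass if called on one

class NamedTable(Table):
    def describe(self):
        return f"{len(self.data)} columns: {', '.join(self.data)}"

rows = [{'Artist': 'SIA', 'Genre': 'Pop'}, {'Artist': 'Miles Davis', 'Genre': 'Jazz'}]
t = NamedTable.from_rows(rows)
print(type(t).__name__)  # NamedTable
print(t.describe())      # 2 columns: Artist, Genre
print(t.data['Artist'])  # ['SIA', 'Miles Davis']
```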

**Exercises**:

1. Implement a class method `read_csv` and a normal method (also called an instance method) `to_csv` to read from and write to CSV files. You may find the standard library `csv` module (in particular `csv.reader` and `csv.writer`) useful.


2. Recall that Pandas dataframes also allow accessing columns using the `.` notation, e.g. `pandas_df.Artist`. Add support for this behaviour to our implementation of the dataframe. Hint: Use the `__getattr__` dunder method.


3. Our current implementation does not support custom indexes. Implement two more classes, `Index` and `Series`. An `Index` is simply a list of indices used within a dataframe. A `Series` encapsulates the values within a column and associates them with an `Index`. Study and replicate the functionality of the Pandas `Series` and `Index` classes.


4. Implement other commonly used methods and properties of Pandas dataframes. Compare the performance of your implementations with those of Pandas dataframes. What causes the performance difference?
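
As a starting point for exercise 1, here is a minimal sketch of the standard library `csv` calls involved. It assumes column-oriented data shaped like `artists_data`. It writes to an in-memory `io.StringIO` buffer instead of a real file; when working with files, the `csv` documentation recommends opening them with `newline=''`. Note that `csv.reader` returns every value as a string, so numeric columns need converting back.

```python
import csv
import io

# Column-oriented data, in the same shape as artists_data
data = {
    'Artist': ['Billie Holiday', 'Jimi Hendrix'],
    'Listeners': [130000, 270000],
}

# Writing: csv.writer works row by row, so transpose the columns with zip()
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(data.keys())             # header row
writer.writerows(zip(*data.values()))    # one row per record

# Reading: csv.reader yields each row as a list of strings
buffer.seek(0)
header, *rows = csv.reader(buffer)
columns = {name: list(values) for name, values in zip(header, zip(*rows))}

print(columns)
# {'Artist': ['Billie Holiday', 'Jimi Hendrix'], 'Listeners': ['130000', '270000']}
```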



Let's save our work before continuing.

In [None]:
jovian.commit()


[jovian] Attempting to save notebook..


## Inheritance, overriding and abstract methods

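Classes in Python can extend other classes, i.e., they can inherit properties and methods from other classes. A subclass can also _override_ an inherited method by defining its own version. A base class can declare _abstract methods_ that its subclasses are expected to implement. Here's an example of inheritance using geometric shapes.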

![](https://i.imgur.com/BSCxOkG.png)

In [148]:
import math
from abc import ABC, abstractmethod

# Inheriting from ABC makes @abstractmethod take effect: a subclass can only
# be instantiated once it implements every abstract method
class Shape(ABC):
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "Shape<{}>".format(self.name)

    @abstractmethod
    def area(self):
        pass

    @abstractmethod
    def perimeter(self):
        pass

    @staticmethod
    def compare(shape1, shape2):
        # True if shape1 has a larger area than shape2
        return shape1.area() > shape2.area()


class Circle(Shape):
    def __init__(self, radius):
        # Call the parent constructor to set the name
        super().__init__('Circle')
        self.radius = radius

    def area(self):
        return self.radius * self.radius * math.pi

    def perimeter(self):
        return self.radius * 2 * math.pi


class Polygon(ABC):
    @abstractmethod
    def sides(self):
        pass

    def has_more_sides_than(self, other_polygon):
        return self.sides() > other_polygon.sides()


class Rectangle(Shape, Polygon):
    def __init__(self, length, breadth):
        # super() follows the method resolution order
        # (Rectangle -> Shape -> Polygon -> ABC -> object), so this calls Shape.__init__
        super().__init__('Rectangle')
        self.length = length
        self.breadth = breadth

    def area(self):
        return self.length * self.breadth

    def perimeter(self):
        return 2 * (self.length + self.breadth)

    def sides(self):
        return 4


class Triangle(Shape, Polygon):
    def __init__(self, side1, side2, side3):
        super().__init__('Triangle')
        self.side1 = side1
        self.side2 = side2
        self.side3 = side3

    def perimeter(self):
        return self.side1 + self.side2 + self.side3

    def area(self):
        # Heron's formula, where s is the semi-perimeter
        s = self.perimeter() / 2
        return math.sqrt(s * (s - self.side1) * (s - self.side2) * (s - self.side3))

    def sides(self):
        return 3


class Square(Rectangle):
    def __init__(self, side):
        super().__init__(side, side)
        # Replace the name 'Rectangle' set by the parent constructor
        self.name = 'Square'


Let's create a circle and try using some of the methods.

In [149]:
circle1 = Circle(4)

In [150]:
circle1

Shape<Circle>

In [151]:
type(circle1)

__main__.Circle

In [152]:
isinstance(circle1, Shape)

True

In [153]:
circle1.perimeter()

25.132741228718345

In [154]:
circle1.area()

50.26548245743669

In [155]:
circle2 = Circle(3)

In [156]:
Shape.compare(circle1, circle2)

True

Let's compare rectangles and triangles using methods from `Shape` and `Polygon`.

In [157]:
rect1 = Rectangle(4, 3)

In [158]:
rect1

Shape<Rectangle>

In [159]:
rect1.area()

12

In [160]:
rect1.perimeter()

14

In [161]:
triangle1 = Triangle(2, 2, 3)

In [162]:
triangle1.area()

1.984313483298443
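
`Triangle.area` relies on Heron's formula. It expresses the area $A$ of a triangle with side lengths $a$, $b$ and $c$ in terms of its semi-perimeter $s$, which the code computes as `self.perimeter() / 2`:

$$
s = \frac{a + b + c}{2}, \qquad A = \sqrt{s\,(s - a)\,(s - b)\,(s - c)}
$$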

In [163]:
rect1.has_more_sides_than(triangle1)

True

In [164]:
Shape.compare(rect1, triangle1)

True
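
`Rectangle` and `Triangle` each inherit from two classes. When Python looks up a method, or resolves a `super()` call, it searches the classes in the order given by the class's method resolution order (MRO). The MRO is available as the `__mro__` attribute. Here is a small standalone sketch using hypothetical classes `Named`, `Sided` and `Quad`, which loosely mirror `Shape`, `Polygon` and `Rectangle`:

```python
class Named:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return "Named<{}>".format(self.name)


class Sided:
    def sides(self):
        return 0

    def describe(self):
        return "Sided"


class Quad(Named, Sided):
    def __init__(self):
        # super() follows the MRO (Quad -> Named -> Sided -> object),
        # so this calls Named.__init__
        super().__init__('Quad')

    def sides(self):
        # Overrides Sided.sides
        return 4


q = Quad()
print([cls.__name__ for cls in Quad.__mro__])  # ['Quad', 'Named', 'Sided', 'object']
print(q.describe())  # Named<Quad>
print(q.sides())     # 4
```

Because `Named` is listed first in `class Quad(Named, Sided)`, its `describe` is found first. Swapping the order of the base classes would make `Sided.describe` win instead.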

Let's create a square. `Square` inherits from `Rectangle`, which in turn inherits from `Shape` and `Polygon`, so a square can use methods from all three classes.

In [165]:
square1 = Square(4)

In [166]:
square1.area()

16

In [167]:
square1.sides()

4
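
Every method marked with `@abstractmethod` in `Shape` and `Polygon` has an implementation somewhere in `Square`'s inheritance chain, so a square can be created and used. Keep in mind that the decorator is only enforced when the class's metaclass is `abc.ABCMeta`, which is usually arranged by inheriting from `abc.ABC`. On an ordinary class it is silently ignored. The sketch below shows the difference using hypothetical classes `PlainBase`, `EnforcedBase`, `Incomplete` and `Complete`:

```python
from abc import ABC, abstractmethod


class PlainBase:
    # No ABC base class: @abstractmethod is recorded but never enforced
    @abstractmethod
    def area(self):
        pass


class EnforcedBase(ABC):
    # Inheriting from ABC (whose metaclass is ABCMeta) enforces abstract methods
    @abstractmethod
    def area(self):
        pass


class Incomplete(EnforcedBase):
    pass  # does not implement area()


class Complete(EnforcedBase):
    def area(self):
        return 1


PlainBase()   # works, even though area() is marked abstract
Complete()    # works, since every abstract method is implemented

for cls in (EnforcedBase, Incomplete):
    try:
        cls()
    except TypeError:
        print(cls.__name__, "cannot be instantiated")
# prints:
# EnforcedBase cannot be instantiated
# Incomplete cannot be instantiated
```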

Let's save our work before continuing.

In [169]:
jovian.commit()


[jovian] Attempting to save notebook..
[jovian] Updating notebook "aakashns/python-classes-oop" on https://jovian.ai/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ai/aakashns/python-classes-oop


'https://jovian.ai/aakashns/python-classes-oop'

## Summary and Further Reading

The following topics were covered in this tutorial:

- Defining classes and creating objects
- Class constructor, properties and methods
- Implementing "dunder" methods for easier usage
- Getters, setters, static methods & class methods
- Inheritance, overriding and abstract methods

Check out the following resources to learn more: 

- https://www.w3schools.com/python/python_classes.asp
- https://dabeaz-course.github.io/practical-python/Notes/04_Classes_objects/01_Class.html
- https://realpython.com/python3-object-oriented-programming/
- https://dbader.org/blog/python-dunder-methods
- https://realpython.com/python-super/