In [1]:
%matplotlib inline
import numpy as np
import numpy.ma as ma
import pandas as pd
from pandas import DataFrame, Series
import matplotlib.pyplot as plt
import seaborn as sns
import timeit
import line_profiler

from collections import deque, namedtuple

import datetime

from io import StringIO
from pandas.api.types import CategoricalDtype

import pandas._testing as tm
import re

# Data Classes

Classes combine two things: behaviour and data.
Behaviour takes the form of methods.
Data takes the form of attributes.
Classes are the blueprint for objects and form the basis of OOP.

Some classes are mostly containers of behaviour, e.g. a class that computes the area of different shapes such as rectangles and squares, or a class that provides password-hashing functionality.
When working with behaviour-oriented classes, you might use inheritance to change the behaviour, or a design pattern such as the strategy pattern.
You probably won't have many instances of such a class in your application.
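A minimal sketch of a behaviour-oriented class, assuming a hypothetical `Shape` hierarchy (the names `Shape`, `Rectangle` and `Square` are illustrative, not from the original). The classes hold very little data; what matters is the `area()` behaviour, and inheritance is used to reuse or change it.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Hypothetical behaviour-oriented base class: it defines what a shape can do."""

    @abstractmethod
    def area(self) -> float:
        ...


class Rectangle(Shape):
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


class Square(Rectangle):
    # Inheritance reuses Rectangle.area(); only the constructor differs.
    def __init__(self, side: float) -> None:
        super().__init__(side, side)


shapes = [Rectangle(2, 3), Square(4)]
print([s.area() for s in shapes])  # [6, 16]
```

Code that calls `area()` does not need to know which concrete shape it is dealing with, which is the kind of flexibility behaviour-oriented classes are designed for.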


Other classes act more as containers of data, e.g. a class representing a vehicle in a vehicle registration system.
A data-container class is typically used differently: we may need to create many instances of it, and we may want to order them, compare them easily, inspect the data inside them, and so on.

How are data classes different from regular classes?
Data classes have a built-in initializer that lets you quickly fill an object with data, and they provide easy ways to print, compare and order that data. You can also make the data read-only.
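Each of these features can be seen on a small record type. This is a minimal sketch assuming a hypothetical `Vehicle` class for the registration example above; the field names and sample values are made up for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError


# Hypothetical record for a vehicle registration system (illustrative only).
@dataclass(order=True, frozen=True)
class Vehicle:
    registration: str
    make: str
    year: int


a = Vehicle("KA01AB1234", "Maruti", 2018)  # generated __init__
b = Vehicle("KA01AB1234", "Maruti", 2018)
c = Vehicle("DL05CD5678", "Tata", 2020)

print(a)              # generated __repr__ shows the field values
print(a == b)         # generated __eq__ compares field values: True
print(sorted([a, c])) # order=True: compared as (registration, make, year)

try:
    a.year = 2021     # frozen=True: fields cannot be reassigned
except FrozenInstanceError as exc:
    print("read-only:", exc)
```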

In [3]:
class Person: 
    ## These bare annotations only document the field types; they do not
    ## create class attributes. Data classes use this same syntax to declare
    ## their fields. A regular class still needs a hand-written __init__.
    
    name: str
    job: str
    age: int
        
    def __init__(self, name, job, age):
        self.name = name
        self.job = job
        self.age = age

person1 = Person('Ramesh', 'Kumar', 30)
person2 = Person('Suresh', 'Chanda', 25)
person3 = Person('Suresh', 'Chanda', 25)

print(id(person2))
print(id(person3))
print(person1)

print(person3 == person2)

4836625040
4836622832
<__main__.Person object at 0x120490ee0>
False


The example above shows how regular classes behave. When we print a Person object we get '<__main__.Person object at 0x120490ee0>', i.e. the class name and the object's memory address. This is not very useful: it tells us nothing about what's inside the object.

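Also, person2 and person3 have the same name, job and age, yet the equality operator returns False.
Why is it False?
Because they are different objects: a regular class inherits `__eq__` from `object`, which compares identity (like `is`), not contents. The different `id()` values printed above confirm that person2 and person3 are distinct objects.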


When we are dealing with data, we usually want different behaviour. We want an easy way to print the data, so that we see the contents of the object rather than its memory address. We also often want deeper, value-based comparisons: when two objects hold the same data, they should compare equal.
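With a regular class, we would have to write these methods by hand. A minimal sketch (the class name `PlainPerson` is hypothetical, chosen to avoid clashing with `Person`):

```python
class PlainPerson:
    def __init__(self, name: str, job: str, age: int) -> None:
        self.name = name
        self.job = job
        self.age = age

    def __repr__(self) -> str:
        # Show the contents instead of the memory address.
        return f"PlainPerson(name={self.name!r}, job={self.job!r}, age={self.age!r})"

    def __eq__(self, other: object) -> bool:
        # Value-based comparison: equal when all fields are equal.
        if not isinstance(other, PlainPerson):
            return NotImplemented
        return (self.name, self.job, self.age) == (other.name, other.job, other.age)


p2 = PlainPerson('Suresh', 'Chanda', 25)
p3 = PlainPerson('Suresh', 'Chanda', 25)
print(p2)        # PlainPerson(name='Suresh', job='Chanda', age=25)
print(p2 == p3)  # True
print(p2 is p3)  # False: still two distinct objects
```

Note that defining `__eq__` without `__hash__` makes instances unhashable. This kind of boilerplate is exactly what `@dataclass` generates for us, and with `frozen=True` it also generates a `__hash__` based on the fields.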

Data classes can solve all of these problems for us.

Let's turn Person into a data class and see what happens.

In [19]:
from dataclasses import dataclass, field

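## order=True generates <, <=, > and >=, so two instances can be compared
## like p1 > p2. Fields are compared as a tuple, in definition order.
## frozen=True makes instances read-only: assigning to a field after
## creation raises dataclasses.FrozenInstanceError.
@dataclass(order=True, frozen=True)
class Person: 
    ## Fields are declared with type annotations; @dataclass reads them and
    ## generates __init__, __repr__ and __eq__, so no __init__ is written here.
    ## sort_index is left out of __init__ and repr, and is declared first so
    ## that it is compared first when ordering.
    sort_index: int = field(init=False, repr=False)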
    name: str
    job: str
    age: int
    strength: int = 100
    
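    def __post_init__(self):
        # Choose what to sort by, e.g. self.age or self.strength.
        # The instance is frozen, so `self.sort_index = ...` would raise
        # FrozenInstanceError; object.__setattr__ bypasses that check.
        object.__setattr__(self, 'sort_index', self.strength)
        
    def __str__(self):
        # Used by print() and str(); repr() still uses the generated __repr__.
        return f'{self.name}, {self.job}, {self.age}'

person1 = Person('Ramesh', 'Kumar', 30, 99)
person2 = Person('Suresh', 'Chanda', 25)
person3 = Person('Suresh', 'Chanda', 25)

print(id(person2))
print(id(person3))
print(repr(person1))
print(repr(person2))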

print(person3 == person2)
print(person3 > person2)
print(person1 > person2) 

4836795296
4840320688
Person(name='Ramesh', job='Kumar', age=30, strength=99)
Person(name='Suresh', job='Chanda', age=25, strength=100)
True
False
False
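Because `sort_index` is the first field and `order=True` compares fields as a tuple in definition order, `sorted()` orders people by strength. This is a self-contained sketch that repeats the `Person` definition above without `__str__`; the third person is made-up sample data.

```python
from dataclasses import dataclass, field


@dataclass(order=True, frozen=True)
class Person:
    sort_index: int = field(init=False, repr=False)
    name: str
    job: str
    age: int
    strength: int = 100

    def __post_init__(self):
        # Frozen instances reject normal assignment, so bypass it once here.
        object.__setattr__(self, 'sort_index', self.strength)


people = [
    Person('Suresh', 'Chanda', 25),
    Person('Ramesh', 'Kumar', 30, 99),
    Person('Mahesh', 'Rao', 40, 50),  # hypothetical sample data
]

# Weakest first: sort_index (= strength) is compared before any other field.
for p in sorted(people):
    print(p)
```

To sort by age instead, set `sort_index` from `self.age` in `__post_init__`. Fields after `sort_index` only act as tie-breakers when two people have the same strength.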
