## Dataclasses

- The `dataclasses` module was added to the standard library in Python *`3.7`*.
- Writing the same boilerplate by hand for every class we define is a lot of work.
- Take the initialization function: if every parameter passed to `__init__()` simply becomes a property of the object, there is no reason to write the same assignment statements again and again.
- Likewise, for representation and comparison we have to override `__repr__()` and `__eq__()` manually in every class, for example when we want the equality operator to compare objects by their values.
- The **dataclasses** module generates this boilerplate for us.

#### Normal Class Situation

In [1]:
# Creating a Normal class

class Person:
    # initializing the attributes/properties of the class
    def __init__(self, name, age, city):
        self.name = name
        self.age = age
        self.city = city

In [2]:
# Now creating an instance of the class

p = Person("Arunava", 35, "Kolkata")

In [3]:
# Checking the properties of the instance

p.name, p.age, p.city

('Arunava', 35, 'Kolkata')

In [4]:
# Now the representation of the normal class object
# This is the default representation

p

<__main__.Person at 0x7f118bfe68f0>

In [5]:
# Now to give a better representation of the class object 
# we can override the magic method __repr__() in the class

class Person:
    # initializing the attributes/properties of the class
    def __init__(self, name, age, city):
        self.name = name
        self.age = age
        self.city = city
        
    # customizing the __repr__() to change the default representation
    def __repr__(self):
        return f"Person(name={self.name}, age={self.age}, city={self.city})"
    
    
# Now creating an instance of the class
p = Person("Arunava", 35, "Kolkata")

In [6]:
# Now again checking the representation of the class object
# It is much more readable than the previous one

p

Person(name=Arunava, age=35, city=Kolkata)

In [7]:
# Now comparing two class objects
# Here we get False even though the two instances hold the same values,
# because the default __eq__() compares identity and these are two separate objects in memory

p1 = Person("Arunava", 35, "Kolkata")
p2 = Person("Arunava", 35, "Kolkata")

# checking equality of two class objects
p1 == p2

False

In [8]:
# To overcome this problem we need to override another magic method, __eq__(),
# which implements the equality operator
# __eq__() is called when we check equality between two objects with ==

class Person:
    # initializing the attributes/properties of the class
    def __init__(self, name, age, city):
        self.name = name
        self.age = age
        self.city = city
        
    # customizing the __repr__() to change the default representation
    def __repr__(self):
        return f"Person(name={self.name}, age={self.age}, city={self.city})"
    
    # here 'self' refers to the first object and 'other' refers to the other object
    def __eq__(self, other):
        # let Python handle comparison with unrelated types
        if not isinstance(other, Person):
            return NotImplemented
        return (self.name, self.age, self.city) == (other.name, other.age, other.city)
    
    
# Now creating instances of the class
p1 = Person("Arunava", 35, "Kolkata")
p2 = Person("Arunava", 35, "Kolkata")

In [9]:
# Now again comparing
# Now it is True

p1 == p2

True

#### Now using `dataclass`

- **dataclass** is a `decorator` for classes, so we apply it as **`@dataclass`**.
- To apply it, we first decide which properties the class objects need.
- Each property is then declared with a `type hint` using the `name: type` notation. The annotation is what marks a name as a field; the type itself is not checked at runtime.
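As a small sketch of the point about type hints (the class `Sketch` and its attribute `species` are hypothetical, used only for illustration): only annotated names become fields, and the annotation is not validated when an object is created.

```python
from dataclasses import dataclass, fields

@dataclass
class Sketch:  # hypothetical class, only for illustration
    name: str          # annotated -> becomes a field
    age: int           # annotated -> becomes a field
    species = "human"  # no annotation -> ordinary class attribute, not a field

# The type hints are not enforced: a string is accepted for 'age'
s = Sketch("Arunava", "thirty-five")

field_names = [f.name for f in fields(Sketch)]
print(field_names)   # ['name', 'age']
print(s.age)         # thirty-five
```

If the types need to be checked, that has to be done separately, for example with a static type checker or with explicit validation code.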

In [10]:
# importing the function 'dataclass' from the module 'dataclasses'

from dataclasses import dataclass

In [11]:
# Applying dataclass
# Here we want the properties 'name', 'age' and 'city' of the class object.
# 'name' and 'city' are annotated as 'str' and 'age' as 'int'.
# This is a brief declaration of the properties we want our class objects to have.

@dataclass
class NewPerson:
    name: str
    age: int
    city: str

In [12]:
# Now creating an instance of the class

p = NewPerson("Arunava", 35, "Kolkata")

In [13]:
# Checking the properties of the instance
# Here we get the values even without creating the initialization function 
# it is done by the '@dataclass' by default.

p.name, p.age, p.city

('Arunava', 35, 'Kolkata')

In [14]:
# Now again checking the representation of the class object
# Here the '__repr__()' gets modified by the @dataclass

p

NewPerson(name='Arunava', age=35, city='Kolkata')

In [15]:
# Now comparing two class objects
# Here also the '__eq__()' is generated by the @dataclass

p1 = NewPerson("Arunava", 35, "Kolkata")
p2 = NewPerson("Arunava", 35, "Kolkata")

# checking equality of two class objects
p1 == p2

True

### Different `parameters` of the `dataclass`

- syntax is:
    
    **`dataclass(cls=None, *, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)`**
    

- **init** : Optional, `True` by default. It generates the `__init__()` method for the class. This method runs automatically when we create a new object, and the values passed to it are set as properties of the object.

- **repr** : It is an optional parameter that is set to `True` by default. It creates the `__repr__()` for the class object by default that provides the representation of the class object. If we change the value to `False` we will get the output as we get in case of the default representation of the class object.

- **eq** : Optional, `True` by default. It generates `__eq__()`, which compares two objects of the same class field by field, as if comparing tuples of their values. `__ne__()` (**Not Equal To**) is not generated separately; Python's default `__ne__()` simply inverts the result of `__eq__()`.

- **order** : Optional, `False` by default. When set to `True`, it generates the methods behind the `<`, `<=`, `>` and `>=` operators. They compare two objects of the same class as tuples of their fields, in the order the fields are declared, and require `eq=True`. Each operator calls the following method:
    - For `<` it triggers the `__lt__()`
    - For `<=` it triggers the `__le__()`
    - For `>` it triggers the `__gt__()`
    - For `>=` it triggers the `__ge__()`
    

- **unsafe_hash** : Python computes a `hash value` for hashable objects (typically immutable ones): an integer derived from the object's value, which dictionaries and sets use for fast lookup. Equal objects must have equal hashes, but a hash does not uniquely identify an object. A plain `@dataclass` (with `eq=True` and `frozen=False`) is unhashable because its fields can change. Setting this parameter to `True` forces a `__hash__()` to be generated even though the properties are mutable, which is why it is called unsafe.

- **frozen** : `False` by default, which means we can change the properties of an object after it is created. Setting it to `True` makes the properties read-only: assigning a new value to any field raises `FrozenInstanceError`. A frozen dataclass with `eq=True` also gets a `__hash__()` generated automatically.
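The cells that follow only switch features on, so here is a minimal sketch of the opposite case: turning `repr` and `eq` off. The class name `Plain` is hypothetical.

```python
from dataclasses import dataclass

@dataclass(repr=False, eq=False)
class Plain:  # hypothetical class name
    name: str
    age: int

a = Plain("Arunava", 35)
b = Plain("Arunava", 35)

# __init__() is still generated, so the fields are set as usual
print(a.name, a.age)

# With eq=False the inherited identity comparison is used
print(a == b)   # False: same values, but two different objects
print(a == a)   # True

# With repr=False the default object representation is used
print(repr(a))  # something like <__main__.Plain object at 0x...>
```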

In [16]:
# Using the 'init'

# Creating a class using @dataclass
@dataclass
class Person:
    name: str
    age: int
    city: str

# Creating instance of the class    
p = Person("Arunava", 35, "Kolkata")

# Checking the properties of the class
p.name, p.age, p.city

('Arunava', 35, 'Kolkata')

In [17]:
# Using the 'repr'

# Creating a class using @dataclass
@dataclass
class Person:
    name: str
    age: int
    city: str

# Creating instance of the class    
p = Person("Arunava", 35, "Kolkata")

# Checking the representation of the class
p

Person(name='Arunava', age=35, city='Kolkata')

In [18]:
# Using the 'eq'

# Creating a class using @dataclass
@dataclass
class Person:
    name: str
    age: int
    city: str

# Creating instances of the class    
p1 = Person("Arunava", 35, "Kolkata")
p2 = Person("Arunava", 35, "Kolkata")

# Comparing the objects
p1 == p2

True

In [19]:
# Using the 'order=True'

# Creating a class using @dataclass
@dataclass(order=True)
class Person:
    name: str
    age: int
    city: str

# Creating instances of the class    
p1 = Person("Arunava", 35, "Kolkata")
p2 = Person("Arunava", 37, "Kolkata")
p3 = Person("Arunava", 35, "Kolkata")
p4 = Person("Arunava", 38, "Kolkata")

# Comparing the objects
p1 < p2, p3 >= p2, p4 > p1

(True, False, True)
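Because the ordering methods compare the fields as tuples, objects of an `order=True` dataclass can be passed straight to `sorted()`. A short sketch, with sample data made up for illustration:

```python
from dataclasses import dataclass

@dataclass(order=True)
class Person:
    name: str
    age: int
    city: str

people = [
    Person("Bivash", 30, "Delhi"),
    Person("Arunava", 37, "Kolkata"),
    Person("Arunava", 35, "Kolkata"),
]

# sorted() relies on __lt__(), which compares (name, age, city) tuples:
# 'name' decides first, 'age' only breaks ties between equal names
ordered = sorted(people)
print([(p.name, p.age) for p in ordered])
# [('Arunava', 35), ('Arunava', 37), ('Bivash', 30)]
```

The declaration order of the fields therefore decides the sort priority; to sort by `age` first, it would have to be declared before `name`, or a `key` function passed to `sorted()`.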

In [20]:
# Changing the property value of a class object

# Creating a class using @dataclass
@dataclass
class Person:
    name: str
    age: int
    city: str

# Creating instance of the class    
p = Person("Arunava", 35, "Kolkata")

# Changing the 'name' property of the object
p.name = "Bivash"

# Checking the properties of the class
# Here the 'name' property of the class 'Person' is mutable
p.name, p.age, p.city

('Bivash', 35, 'Kolkata')

In [21]:
# Using the 'frozen=True'

# Creating a class using @dataclass
@dataclass(frozen=True)
class Person:
    name: str
    age: int
    city: str

# Creating instance of the class    
p = Person("Arunava", 35, "Kolkata")

# Again trying to change the 'name' property of the object
try:
    p.name = "Bivash"
except Exception as err:
    print("The error is:", err)

The error is: cannot assign to field 'name'


In [22]:
# Checking the properties of the class
# Here the 'name' property of the class 'Person' becomes immutable

p.name, p.age, p.city

('Arunava', 35, 'Kolkata')
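A frozen object cannot be edited, but a changed copy can still be made. The sketch below uses `replace()` and `FrozenInstanceError` from the `dataclasses` module; the variable `q` is just an illustrative name.

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class Person:
    name: str
    age: int
    city: str

p = Person("Arunava", 35, "Kolkata")

# Assignment raises the dedicated FrozenInstanceError (a subclass of AttributeError)
try:
    p.name = "Bivash"
except FrozenInstanceError as err:
    print("The error is:", err)

# replace() builds a new object and leaves the original untouched
q = replace(p, name="Bivash")
print(p)  # Person(name='Arunava', age=35, city='Kolkata')
print(q)  # Person(name='Bivash', age=35, city='Kolkata')
```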

In [23]:
# Usage of 'hash()'

# creating variables with same value and one variable with different value
a = "Arunava"
b = "Arunava"
c = "ARUNAVA"
d = [1,2,3]

try:
    print("Printing the hash values of both the objects")
    print("hash of a:", hash(a))
    print("hash of b:", hash(b))
    print("hash of c:", hash(c))
except Exception as err:
    print("The error is:", err)
else:    
    print("The comparison of hash values of a and b is:", hash(a) ==  hash(b))
    print("The comparison of hash values of a and c is:", hash(a) ==  hash(c))
    print("The comparison of hash values of b and c is:", hash(b) ==  hash(c))
finally:
    try:
        print("hash of mutable object d:", hash(d))
    except Exception as err:
        print("The error is:", err)

Printing the hash values of both the objects
hash of a: 9213208724754725304
hash of b: 9213208724754725304
hash of c: -1069285078868206832
The comparison of hash values of a and b is: True
The comparison of hash values of a and c is: False
The comparison of hash values of b and c is: False
The error is: unhashable type: 'list'


In [24]:
# Trying to get the hash value of the class

# Creating a class using @dataclass
@dataclass
class Person:
    name: str
    age: int
    city: str

# Creating instance of the class    
p = Person("Arunava", 35, "Kolkata")

# Trying to get the hash value
try:
    print("The hash value of the class Person is:", hash(p))
except Exception as err:
    print("The error is:", err)

The error is: unhashable type: 'Person'


In [25]:
# One way is to set 'frozen=True', as in this case all the properties become immutable
# and a '__hash__()' is generated automatically (because 'eq=True' by default)
# But here we lose flexibility, as we can no longer change the values at any stage

# Creating a class using @dataclass
@dataclass(frozen=True)
class Person:
    name: str
    age: int
    city: str

# Creating instance of the class    
p = Person("Arunava", 35, "Kolkata")

# Trying to get the hash value
try:
    print("The hash value of the class Person is:", hash(p))
except Exception as err:
    print("The error is:", err)

The hash value of the class Person is: <an integer; the exact value changes between interpreter runs>


In [26]:
# Now if we try to change the value of properties of 'p'
# the frozen class rejects the assignment, so the 'else' branch never runs

# Again trying to get the hash value
try:
    p.name = "ARUNAVA"
    p.age = 36
    p.city = "Delhi"
except Exception as err:
    print("The error is:", err)
else:
    print("The hash value of the class Person is:", hash(p))

The error is: cannot assign to field 'name'


In [27]:
# Another way is we set the 'unsafe_hash=True'

# Creating a class using @dataclass
@dataclass(unsafe_hash=True)
class Person:
    name: str
    age: int
    city: str

# Creating instance of the class    
p = Person("Arunava", 35, "Kolkata")

# Trying to get the hash value
try:
    print("The hash value of the class Person is:", hash(p))
except Exception as err:
    print("The error is:", err)

The hash value of the class Person is: -3393930350265040838


In [28]:
# Now if we change the value of properties of 'p'
# the hash value changes as well, which is what makes this option unsafe

# Again trying to get the hash value
try:
    p.name = "ARUNAVA"
    p.age = 36
    p.city = "Delhi"
except Exception as err:
    print("The error is:", err)
else:
    print("The hash value of the class Person is:", hash(p))

The hash value of the class Person is: 5659647284053826394
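The changing hash is the reason for the word "unsafe". A sketch of what can go wrong when such an object is stored in a set and then modified (the set name `seen` is illustrative):

```python
from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class Person:
    name: str
    age: int
    city: str

p = Person("Arunava", 35, "Kolkata")
seen = {p}              # the set files 'p' under its current hash
print(p in seen)        # True

p.name = "ARUNAVA"      # the hash of 'p' is now different
print(p in seen)        # False: the lookup uses the new hash and misses the old entry
print(len(seen))        # 1: the object is still stored in the set
```

So `unsafe_hash=True` is reasonable only when the objects are treated as immutable by convention after they have been used as dictionary keys or set members.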


### `field` in `dataclass`

In [29]:
# dataclass module provides a function named 'field' that helps to define the properties of the field

from dataclasses import dataclass, field

- To attach `field` to a class property, write `field()` after the `=`: `property_name: type_annotation = field()`. `field()` returns an object that tells `dataclass` about extra options for that particular property.

- syntax is:

    **`field(*, default=MISSING, default_factory=MISSING, init=True, repr=True, hash=None, compare=True, metadata=None)`**
    
- **default** : It will set the default value for that property variable of that class. We can also provide the value directly with a `=` without using the `field(default=)`.


- **default_factory** : Instead of a default value we pass a function, and the return value of that function is set as the default value of the property each time an object is created. It is mainly used where a single shared value cannot serve as the default, for example when the default needs some computation or is a mutable object such as a list. **Remember that the `default_factory` function takes no arguments**.


- **init** : Controls whether the property is a parameter of the generated `__init__()`. It is `True` by default, so all properties are part of the initialization function. To leave a property out, set it to `False`.


- **repr** : This is used to keep a property as part of the representation. By default it is set to `True` that is why we can see all the properties by default.


- **hash** : Controls whether the property is used when calculating the hash value of the object. The default is `None`, which means the property follows its `compare` setting; set it to `False` to exclude the property from the hash. For the hash to be generated at all, the class needs `unsafe_hash=True` (or `frozen=True` together with `eq=True`).


- **compare** : Controls whether the property is used by the generated comparison methods (`__eq__()` and, with `order=True`, the ordering methods). It is `True` by default.


- **metadata** : A mapping (usually a dictionary) of extra information attached to the field. `dataclass` itself does not use it; it exists so that third-party classes or functions working with the class can look up additional details about the property.
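One common use of `default_factory` is a mutable default such as a list. The sketch below assumes two hypothetical classes, `Broken` and `Student`: a bare `[]` default is rejected, while `default_factory=list` gives every object its own list.

```python
from dataclasses import dataclass, field

# A mutable default such as [] is rejected when the class is created
try:
    @dataclass
    class Broken:  # hypothetical class name
        subjects: list = []
except ValueError as err:
    print("The error is:", err)

@dataclass
class Student:  # hypothetical class name
    name: str
    # list() is called once per object, so every object gets its own list
    subjects: list = field(default_factory=list)

s1 = Student("Arunava")
s2 = Student("Bivash")
s1.subjects.append("maths")
print(s1.subjects)  # ['maths']
print(s2.subjects)  # []
```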

In [30]:
# Creating a class using @dataclass
@dataclass
class Person:
    name: str
    age: int
    city: str

# Creating instance of the class    
p = Person("Arunava", 35, "Kolkata")

In [31]:
# Every class wrapped by the '@dataclass' decorator (and each of its objects)
# has a built-in attribute named '__dataclass_fields__'
# that holds all the fields the dataclass is managing.
# It is a dictionary whose keys are the property names

p.__dataclass_fields__

{'name': Field(name='name',type=<class 'str'>,default=<dataclasses._MISSING_TYPE object at 0x7f118e306fe0>,default_factory=<dataclasses._MISSING_TYPE object at 0x7f118e306fe0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 'age': Field(name='age',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x7f118e306fe0>,default_factory=<dataclasses._MISSING_TYPE object at 0x7f118e306fe0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 'city': Field(name='city',type=<class 'str'>,default=<dataclasses._MISSING_TYPE object at 0x7f118e306fe0>,default_factory=<dataclasses._MISSING_TYPE object at 0x7f118e306fe0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}

In [32]:
# To see the Field for property 'age'

p.__dataclass_fields__['age']

Field(name='age',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x7f118e306fe0>,default_factory=<dataclasses._MISSING_TYPE object at 0x7f118e306fe0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)
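The same information is also available through the module's public helper `fields()`, which avoids touching the double-underscore attribute directly. A minimal sketch (the `summary` list is only an illustrative way to display the result):

```python
from dataclasses import dataclass, fields, MISSING

@dataclass
class Person:
    name: str
    city: str
    age: int = 35

# fields() accepts the class or an object and returns a tuple of Field objects
summary = [(f.name, f.default is not MISSING) for f in fields(Person)]
print(summary)
# [('name', False), ('city', False), ('age', True)]
```

Here a field counts as having a default when its `default` is not the `MISSING` sentinel that appears in the outputs above.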

In [33]:
# Using 'default' field function
# Here we are setting the default age value as 25

# Creating a class using @dataclass
@dataclass
class Person:
    name: str
    city: str
    age: int = field(default=25)

# Creating instance of the class    
p = Person("Arunava", "Kolkata")

# Checking the representation
p

Person(name='Arunava', city='Kolkata', age=25)

In [34]:
# Another way to set 'default' field value

# Creating a class using @dataclass
@dataclass
class Person:
    name: str
    city: str
    age: int = 35

# Creating instance of the class    
p = Person("Arunava","Kolkata")

# Checking the representation
p

Person(name='Arunava', city='Kolkata', age=35)

In [35]:
# Using the 'default_factory' parameter of field()
# Here we are going to set the age using a function 
# which takes a list of ages and uses its (integer) mean as the default value 

# Creating function to find mean age
def get_default_age():
    ages = [12, 34, 56, 34, 12]
    return sum(ages) // len(ages)

# Creating a class using @dataclass
@dataclass
class Person:
    name: str
    city: str
    age: int = field(default_factory=get_default_age)

# Creating instance of the class    
p = Person("Arunava", "Kolkata")

# Checking the representation
p

Person(name='Arunava', city='Kolkata', age=29)

In [36]:
# Using 'init' 
# Setting 'init=False' to keep the 'city' property out of the initialization function
# We give it a default value, otherwise the attribute is never set
# and reading it (for example in the repr) raises an AttributeError

# Creating a class using @dataclass
@dataclass
class Person:
    name: str
    city: str = field(init=False, default="Kolkata")
    age: int = 35

# Creating instance of the class    
p = Person("Arunava")

# Checking the representation
p

Person(name='Arunava', city='Kolkata', age=35)

In [37]:
# Using 'repr' 
# Setting 'repr=False' to keep the 'city' property out of the representation
# Here only 'name' and 'age' will be part of the representation

# Creating a class using @dataclass
@dataclass
class Person:
    name: str
    city: str = field(init=False, default="Kolkata", repr=False)
    age: int = 35

# Creating instance of the class    
p = Person("Arunava")

# Checking the representation
p

Person(name='Arunava', age=35)

In [38]:
# Calculating the default hash value

# Creating a class using @dataclass
@dataclass(unsafe_hash=True)
class Person:
    name: str
    city: str = field(init=False, default="Kolkata", repr=False)
    age: int = 35

# Creating instance of the class    
p = Person("Arunava")

# Trying to get the hash value
try:
    print("The hash value of the class Person with all its properties is:", hash(p))
except Exception as err:
    print("The error is:", err)

The hash value of the class Person with all its properties is: -6809336010142008453


In [39]:
# Calculating the hash value without the 'city' property

# Creating a class using @dataclass
@dataclass(unsafe_hash=True)
class Person:
    name: str
    city: str = field(init=False, default="Kolkata", repr=False, hash=False)
    age: int = 35

# Creating instance of the class    
p = Person("Arunava")

# Trying to get the hash value
try:
    print("The hash value of the class Person without city is:", hash(p))
except Exception as err:
    print("The error is:", err)

The hash value of the class Person without city is: 3357454814853105017


In [40]:
# Using 'compare'
# Here we are comparing without the 'age' property

# Creating a class using @dataclass
@dataclass(unsafe_hash=True)
class Person:
    name: str
    city: str = field(init=False, default="Kolkata", repr=False)
    age: int = field(default=35, compare=False)

# Creating instances of the class    
p1 = Person("Arunava")
p2 = Person("Arunava")
p3 = Person("Arunava", 28)

# Now comparing them
try:
    print("The comparison between object p1 and p2 is:", p1==p2)
    print("The comparison between object p1 and p3 is:", p1==p3)
    # here we get True even though the ages are different
except Exception as err:
    print("Error is:", err)

The comparison between object p1 and p2 is: True
The comparison between object p1 and p3 is: True


In [41]:
# Using 'metadata'
# Here we are saving the 'age' in years

# Creating a class using @dataclass
@dataclass(unsafe_hash=True)
class Person:
    name: str
    city: str = field(init=False, default="Kolkata", repr=False)
    age: int = field(default=35, compare=False, metadata={'format': "year"})

# Creating instances of the class    
p = Person("Arunava")

In [42]:
# Retrieving the metadata

p.__dataclass_fields__['age'].metadata['format']

'year'

### `__post_init__()`

- It is a special method that the generated `__init__()` calls as its last step, once all the fields have been assigned.
- It is used when some extra logic has to run as soon as the object is initialized.
- With it we can set properties whose values we do not want to pass in when creating the object, but can compute from the other properties.

In [43]:
# Here we will not provide the value for 'is_senior' property when creating instance of the class.
# Instead it will calculate the value from the value of the property 'age' itself.

# Creating a class using @dataclass
@dataclass
class Person:
    name: str
    city: str
    age: int
    is_senior: bool = field(init=False)  # excluded from __init__(); set in __post_init__()
    
    # Using '__post_init__()' to create a function 
    # which will decide the value of the 'is_senior' on the basis of 'age'
    def __post_init__(self):
        if self.age >= 60:
            self.is_senior = True
        else:
            self.is_senior = False

# Creating instances of the class    
p1 = Person("Arunava", "Kolkata", 35)
p2 = Person("Bivash", "Kolkata", 65)

try:
    print(p1)
    print("The value of the field is_senior for object p1 is:", p1.is_senior)
    print(p2)
    print("The value of the field is_senior for object p2 is:", p2.is_senior)
except Exception as err:
    print("Error is:", err)

Person(name='Arunava', city='Kolkata', age=35, is_senior=False)
The value of the field is_senior for object p1 is: False
Person(name='Bivash', city='Kolkata', age=65, is_senior=True)
The value of the field is_senior for object p2 is: True
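`__post_init__()` is also a natural place for validation, since it runs after every field has been assigned. The sketch below is one possible variant of the class above; the rule that `age` must be non-negative is an assumption added for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    age: int
    is_senior: bool = field(init=False)

    def __post_init__(self):
        # Validate first: the type hint alone does not reject bad values
        if self.age < 0:
            raise ValueError(f"age must be non-negative, got {self.age}")
        # Then derive the computed field; the comparison already yields a bool
        self.is_senior = self.age >= 60

print(Person("Bivash", 65))
# Person(name='Bivash', age=65, is_senior=True)

try:
    Person("Arunava", -1)
except ValueError as err:
    print("The error is:", err)
# The error is: age must be non-negative, got -1
```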


### Inheritance in `dataclass`

- When an object of the child class is initialized, the fields of the parent class come first, followed by the fields added by the child class.
- If the parent and the child define the same property with different default values, the default set in the child class is used. The property keeps the position it had in the parent class.

In [44]:
# Creating parent class using @dataclass
@dataclass
class Person:
    name: str
    city: str
    age: int
    
    
# Creating the child class using @dataclass
@dataclass
class Student(Person):
    grade: int
    subjects: list
    
    
# Creating object of the child class
s = Student("Arunava", "Kolkata", 25, 9, ['maths', 'physics'])

# Checking the representation
s

Student(name='Arunava', city='Kolkata', age=25, grade=9, subjects=['maths', 'physics'])

In [45]:
# Creating parent class using @dataclass
@dataclass
class A:
    x: int = 10
    y: int = 20

# Creating the child class using @dataclass
@dataclass
class B(A):
    z: int = 30
    x: int = 40
    
# Creating instance of the child class
b = B()

# Checking representation of the child class to see the default values
b

B(x=40, y=20, z=30)
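Because parent fields come first, a child class cannot add a field without a default after a parent field that has one. A sketch of that pitfall, using the hypothetical classes `Base` and `Child`:

```python
from dataclasses import dataclass

@dataclass
class Base:  # hypothetical class names
    x: int = 10

# 'y' has no default but would come after 'x', which has one,
# so the generated __init__(self, x=10, y) would be invalid
try:
    @dataclass
    class Child(Base):
        y: int
except TypeError as err:
    print("The error is:", err)

# Giving the new field a default keeps the signature valid
@dataclass
class Child(Base):
    y: int = 0

print(Child())      # Child(x=10, y=0)
print(Child(1, 2))  # Child(x=1, y=2)
```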

### `asdict` and `astuple`

- We often need the values of all the properties of an object as a plain data structure such as a **dictionary** or a **tuple**, so that we can pass it around as one packet and use it in different places.
- `asdict()` and `astuple()` provide these two formats, converting nested dataclasses recursively.

In [47]:
# Creating the Address class using a @dataclass
@dataclass
class Address:
    lat: float
    lng: float
    city: str
    country: str

    

# Creating a class using @dataclass 
# where we will use the class 'Address' as typehint for the property 'addr'
@dataclass
class Person:
    name: str
    addr: Address
    age: int
    
    
# Creating an object of class Address
a = Address(22.5, 88.3, "Kolkata", "India")

# Creating an object of class Person
p = Person("Arunava", a, 35)

# Checking the representation of 'p'
p

Person(name='Arunava', addr=Address(lat=22.5, lng=88.3, city='Kolkata', country='India'), age=35)

In [48]:
# To transform the object we need the functions asdict() and astuple()
# importing them from the 'dataclasses' module

from dataclasses import asdict, astuple

In [49]:
# Now checking in dictionary format the object 'p'
asdict(p)

{'name': 'Arunava',
 'addr': {'lat': 22.5, 'lng': 88.3, 'city': 'Kolkata', 'country': 'India'},
 'age': 35}

In [50]:
# Now checking in tuple format the object 'p'
astuple(p)

('Arunava', (22.5, 88.3, 'Kolkata', 'India'), 35)
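One place where the dictionary form is handy is serialization. The sketch below sends the object through JSON and back; rebuilding the nested `Address` by hand is one simple approach, not the only one, and the names `text`, `data` and `restored` are illustrative.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Address:
    lat: float
    lng: float
    city: str
    country: str

@dataclass
class Person:
    name: str
    addr: Address
    age: int

p = Person("Arunava", Address(22.5, 88.3, "Kolkata", "India"), 35)

# asdict() yields only plain dicts and values, so json can serialize it
text = json.dumps(asdict(p))

# Going back is manual: the nested dict must be turned into an Address first
data = json.loads(text)
data["addr"] = Address(**data["addr"])
restored = Person(**data)

print(restored == p)  # True
```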