# Dataclasses

Before Python 3.7, representing simple data structures often involved using tuples, dictionaries, or basic classes, each with its limitations in readability, maintainability, and structure. 
Data classes, introduced in Python 3.7 via PEP 557, addressed these challenges by offering a structured yet flexible way to define classes primarily for storing data.

With **dataclasses**:

- Readability is enhanced with less boilerplate and clear type annotations.
- Maintainability improves due to auto-generated methods and reduced custom code.
- Flexibility is achieved as they combine the best features of tuples and dictionaries.
- Type Safety is promoted by integrating with static type checkers like mypy.

In essence, data classes represent a modern approach to data-centric object-oriented programming in Python, emphasizing clarity, structure, and type safety.

Defining a dataclass

In [95]:
from dataclasses import dataclass, field

@dataclass
class DcPerson:
    first_name: str
    last_name: str
    age: int

The `dataclasses` module auto-generates the boilerplate for several foundational class operations, which cuts repetitive code and makes classes easier to maintain.

At the same time, a dataclass is still a regular class: you can add your own methods, override the generated ones, and use inheritance as usual, so you keep full control over the class's behavior and structure.

Basic functionality provided by dataclasses:

1. **Auto-generation of Special Methods**:
    - `__init__`: Constructor method
    - `__repr__`: Representation method
    - `__eq__`: Equality comparison method
    - Ordering methods (`__lt__`, `__le__`, `__gt__`, `__ge__`) with the `order` parameter
    - `__hash__`: Hash method, generated under certain conditions
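
The "certain conditions" for `__hash__` in the list above can be illustrated with a minimal sketch. It assumes two hypothetical classes, `MutablePoint` and `FrozenPoint`, that are not part of the original notebook. With the default `eq=True` and no `frozen=True`, a dataclass sets `__hash__` to `None`, so its instances are unhashable. With `eq=True` and `frozen=True`, a `__hash__` based on the fields is generated.

```python
from dataclasses import dataclass


@dataclass  # eq=True by default, not frozen: __hash__ is set to None
class MutablePoint:  # hypothetical example class
    x: int
    y: int


@dataclass(frozen=True)  # eq=True and frozen=True: __hash__ is generated from the fields
class FrozenPoint:  # hypothetical example class
    x: int
    y: int


try:
    hash(MutablePoint(1, 2))
    mutable_hashable = True
except TypeError:
    # Instances of the non-frozen dataclass are unhashable
    mutable_hashable = False

# Equal frozen instances produce equal hashes, so they work as dict keys or set members
frozen_hashes_match = hash(FrozenPoint(1, 2)) == hash(FrozenPoint(1, 2))

print(mutable_hashable)     # False
print(frozen_hashes_match)  # True
```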


As shown below, defining a dataclass does not require writing the `__init__` method yourself.

In [96]:
from dataclasses import dataclass, field, asdict, astuple, replace, is_dataclass

@dataclass(order=True, frozen=True)
class Person:
    name: str
    age: int
    email: str = field(default='')


person1 = Person('Alice', 25, 'alice@example.com')
person2 = Person('Bob', 30, 'bob@example.com')
person3 = Person('Charlie', 35, 'charlie@example.com')

print(person1, person2, person3, sep='\n')


Person(name='Alice', age=25, email='alice@example.com')
Person(name='Bob', age=30, email='bob@example.com')
Person(name='Charlie', age=35, email='charlie@example.com')


As the output above shows, `dataclass` automatically adds a `__repr__` implementation that presents the class name and all field values.

To exclude a field from the `__repr__` output, declare it with `field(repr=False)`, as shown below.

In [97]:
@dataclass
class Person1:
    name: str
    age: int = field(repr=False)
    email: str = field(default='')
    
person4 = Person1('Dave', 40, 'dave@dummuy.com')

print(person4)

Person1(name='Dave', email='dave@dummuy.com')


Instances of a regular class compare by identity by default, so two distinct objects are never equal. A dataclass instead compares all field values by default. Individual fields can also be excluded from the generated `__eq__`.

In [98]:
class DefaultPerson:
    def __init__(self, name, age, email=''):
        self.name = name
        self.age = age
        self.email = email
        
person5 = DefaultPerson('Eve', 45, 'dummy@abc.com')
person6 = DefaultPerson('Eve', 45, 'dummy@abc.com')

print(f'Is {person5} and {person6} from Default class equal? {person5 == person6}')

person7 = Person('Eve', 45, 'dummy@abc.com')
person8 = Person('Eve', 45, 'dummy@abc.com')

print(f'Is {person7} and \n {person8} from Dataclass equal? {person7 == person8}')


Is <__main__.DefaultPerson object at 0x10c6e3700> and <__main__.DefaultPerson object at 0x10c6e12d0> from Default class equal? False
Is Person(name='Eve', age=45, email='dummy@abc.com') and 
 Person(name='Eve', age=45, email='dummy@abc.com') from Dataclass equal? True


A dataclass can also be configured to compare only the age of a person to determine equality, by marking the other fields with `field(compare=False)`.
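
The age-only equality described above could be sketched as follows. `PersonCompareAge` and the sample values are hypothetical names chosen for illustration. Every field except `age` is declared with `compare=False`, so the generated `__eq__` ignores it.

```python
from dataclasses import dataclass, field


@dataclass
class PersonCompareAge:  # hypothetical example class
    name: str = field(compare=False)               # ignored by __eq__
    age: int                                       # the only field compared
    email: str = field(default='', compare=False)  # ignored by __eq__


person_a = PersonCompareAge('Eve', 45, 'eve@example.com')
person_b = PersonCompareAge('Dave', 45, 'dave@example.com')
person_c = PersonCompareAge('Eve', 50, 'eve@example.com')

print(person_a == person_b)  # True: same age, other fields are ignored
print(person_a == person_c)  # False: ages differ
```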

2. **Field Definitions**:
    - Default values for fields
    - `default_factory`: A callable that produces a default value for fields
    - Field metadata


By default, a dataclass does not allow a mutable default value (such as a list, dict, or set) to be assigned to a field directly; doing so raises a `ValueError`.
Use `default_factory` instead, which creates a separate default value for each instance of the dataclass.

In [99]:
try:
    @dataclass
    class PersonMutableDefault:
        name: str
        age: int = field(repr=False)
        email: str = field(default='')
        list_values: list = []
except ValueError as e:
    print(str(e))
    
@dataclass
class PersonDefaultFactory:
    name: str
    age: int = field(repr=False)
    email: str = field(default='')
    list_values: list = field(default_factory=list)
    
person10 = PersonDefaultFactory('Eve', 45, 'test@test.com')
person10.list_values.append(1)

person11 = PersonDefaultFactory('Dave', 40, 'dave@tests.com')

print(f'Person10 list values: {person10.list_values} \nPerson11 list values: {person11.list_values}')

mutable default <class 'list'> for field list_values is not allowed: use default_factory
Person10 list values: [1] 
Person11 list values: []
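
The "Field metadata" item from the list above has no example, so here is a minimal sketch. It assumes a hypothetical `PersonWithMetadata` class and an arbitrary `unit` key. Dataclasses do not interpret metadata themselves; it is exposed as a read-only mapping on each field object returned by `dataclasses.fields()`.

```python
from dataclasses import dataclass, field, fields


@dataclass
class PersonWithMetadata:  # hypothetical example class
    name: str
    # The metadata mapping is free-form; the 'unit' key is an arbitrary choice
    age: int = field(default=0, metadata={'unit': 'years'})


# fields() returns the Field objects; metadata is a read-only mapping on each
metadata_by_field = {f.name: dict(f.metadata) for f in fields(PersonWithMetadata)}

print(metadata_by_field)
# {'name': {}, 'age': {'unit': 'years'}}
```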


3. **Decorator Parameters**:
    - `order`: Generate ordering methods
    - `frozen`: Create immutable data classes
    - `unsafe_hash`: Force generation of a `__hash__` method

Ordering allows us to use comparisons such as `>` and `<`, based on the fields that take part in comparison.

In [100]:
@dataclass(order=True)
class PersonOrder:
    name: str
    age: int
    
person12 = PersonOrder('Alice', 30)
person13 = PersonOrder('Bob', 25)

print(f'Field comparison in order of declaration (name is getting compared): \nIs {person12} > {person13}? {person12 > person13}')

@dataclass(order=True)
class PersonOrderAge:
    name: str = field(compare=False)
    age: int
    
person14 = PersonOrderAge('Alice', 30)
person15 = PersonOrderAge('Bob', 25)

print()
print(f'Field comparison based on age: \nIs {person14} > {person15}? {person14 > person15}')

Field comparison in order of declaration (name is getting compared): 
Is PersonOrder(name='Alice', age=30) > PersonOrder(name='Bob', age=25)? False

Field comparison based on age: 
Is PersonOrderAge(name='Alice', age=30) > PersonOrderAge(name='Bob', age=25)? True


Use `frozen=True` to make dataclass fields immutable.

In [101]:
@dataclass(frozen=True)
class PersonFrozen:
    name: str
    age: int
    
person16 = PersonFrozen('Alice', 30)

try:
    person16.age = 40
except Exception as e:
    print(str(e))



cannot assign to field 'age'


4. **Utility Functions**:
    - `dataclasses.field()`: Provides customization for individual fields
    - `dataclasses.asdict()`: Converts data class instances to dictionaries
    - `dataclasses.astuple()`: Converts data class instances to tuples
    - `dataclasses.replace()`: Creates a new instance replacing specified fields
    - `dataclasses.is_dataclass()`: Checks if an object is a data class or an instance of one

In [102]:
from dataclasses import dataclass, field, asdict, astuple, replace, is_dataclass

@dataclass
class Person:
    name: str
    age: int
    email: str = field(default='')

person = Person('Alice', 25, 'alice@example.com')

# Convert dataclass instance to a dictionary
person_dict = asdict(person)
print(person_dict)

# Convert dataclass instance to a tuple
person_tuple = astuple(person)
print(person_tuple)

# Create a new instance of the dataclass by replacing a field
new_person = replace(person, age=30)
print(new_person)

# Check if an object is a dataclass or an instance of one
print(is_dataclass(person))


{'name': 'Alice', 'age': 25, 'email': 'alice@example.com'}
('Alice', 25, 'alice@example.com')
Person(name='Alice', age=30, email='alice@example.com')
True
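
As a further sketch of the utility functions, the hypothetical classes `Address` and `PersonWithAddress` below show two details that are easy to miss: `asdict()` recurses into nested dataclasses and lists, and `is_dataclass()` accepts both a class and an instance.

```python
from dataclasses import dataclass, field, asdict, is_dataclass


@dataclass
class Address:  # hypothetical example class
    city: str


@dataclass
class PersonWithAddress:  # hypothetical example class
    name: str
    addresses: list = field(default_factory=list)


resident = PersonWithAddress('Alice', [Address('Paris'), Address('Oslo')])

# Nested dataclass instances inside the list are converted to dictionaries too
resident_dict = asdict(resident)
print(resident_dict)
# {'name': 'Alice', 'addresses': [{'city': 'Paris'}, {'city': 'Oslo'}]}

# True for the class, True for an instance, False for anything else
print(is_dataclass(PersonWithAddress), is_dataclass(resident), is_dataclass(42))
# True True False
```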


5. **Post Initialization**:
    - `__post_init__`: Allows further initialization after the `__init__` method

The `__post_init__` method in a dataclass is a special method that is called automatically at the end of the generated `__init__`. It allows you to perform additional initialization steps, validate input, or set attributes derived from the initial values provided.

In the example code below, the `__post_init__` method sets the `is_adult` attribute of the `PersonPostInit` object.
Because `is_adult` is declared with `init=False`, it is not a parameter of `__init__` and cannot be passed when creating the object.

In [103]:
@dataclass
class PersonPostInit:
    name: str
    age: int
    email: str = field(default='')
    is_adult: bool = field(init=False)
    
    def __post_init__(self):
        if self.age < 0:
            raise ValueError(f'Age {self.age} is not valid')
        self.is_adult = self.age >= 18

try:
    person = PersonPostInit('Alice', 25, 'test', is_adult=False)
except TypeError as e:
    print(str(e))
    
person = PersonPostInit('Alice', 25, 'test')

print(f'Is person adult: {person.is_adult}')

PersonPostInit.__init__() got an unexpected keyword argument 'is_adult'
Is person adult: True
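
If the class were also declared with `frozen=True`, the plain assignment inside `__post_init__` would fail, because frozen dataclasses block attribute assignment. A common workaround is to call `object.__setattr__` directly. The sketch below assumes a hypothetical `FrozenPersonPostInit` class that is not part of the original notebook.

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(frozen=True)
class FrozenPersonPostInit:  # hypothetical example class
    name: str
    age: int
    is_adult: bool = field(init=False)

    def __post_init__(self):
        # `self.is_adult = ...` would raise FrozenInstanceError here,
        # so the derived value is written with object.__setattr__ instead.
        object.__setattr__(self, 'is_adult', self.age >= 18)


frozen_person = FrozenPersonPostInit('Alice', 25)
print(frozen_person)
# FrozenPersonPostInit(name='Alice', age=25, is_adult=True)

try:
    frozen_person.age = 30  # still blocked after initialization
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

print(assignment_blocked)  # True
```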


`InitVar` marks a field as an init-only variable: it is passed to `__init__` and forwarded to `__post_init__`, but it is not stored as an instance attribute.

In the example below, `age` is declared as an `InitVar`.

In [104]:
from dataclasses import InitVar

@dataclass
class PersonInitVar:
    name: str
    age: InitVar[int]
    email: str = field(default='')
    is_adult: bool = field(init=False)
    
    def __post_init__(self, age):
        if age < 0:
            raise ValueError(f'Age {age} is not valid')
        self.is_adult = age >= 18
        
person = PersonInitVar('Alice', 25, 'test')

print(person)

PersonInitVar(name='Alice', email='test', is_adult=True)


6. **Inheritance**:
    - Ability to inherit from other data classes and regular classes

In [105]:
@dataclass
class PersonParent:
    name: str
    age: int

@dataclass
class Employee(PersonParent):
    employer: str
    
employee = Employee('Alice', 25, 'ABC Corp')

print(employee)

Employee(name='Alice', age=25, employer='ABC Corp')
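
One inheritance detail worth sketching is field ordering. Base class fields come first in the generated `__init__`, so a subclass field without a default cannot follow a base field that has one. The classes `BaseWithDefault`, `BrokenChild`, and `WorkingChild` below are hypothetical and only illustrate that rule.

```python
from dataclasses import dataclass


@dataclass
class BaseWithDefault:  # hypothetical example class
    name: str
    country: str = 'unknown'


try:
    @dataclass
    class BrokenChild(BaseWithDefault):
        employer: str  # no default, but it follows the defaulted base field `country`
    definition_failed = False
except TypeError as e:
    # The decorator rejects a non-default field after a default one
    definition_failed = True
    print(f'TypeError: {e}')


@dataclass
class WorkingChild(BaseWithDefault):
    employer: str = ''  # giving the new field a default keeps the order valid


child = WorkingChild('Alice', employer='ABC Corp')
print(child)
# WorkingChild(name='Alice', country='unknown', employer='ABC Corp')
```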
