# Dataclasses in Python

Dataclasses, introduced in Python 3.7 (with a backport for 3.6), provide a decorator and functions for automatically adding generated special methods such as `__init__`, `__repr__`, and `__eq__` to user-defined classes.

This makes them particularly handy for classes that are primarily used to store data with minimal boilerplate code.

In this notebook, we will cover:
1. **What are Dataclasses and Why Use Them**
2. **Basic Usage of the `dataclass` Decorator**
3. **Default Values and the `field` Function**
4. **Generated Methods (`__init__`, `__repr__`, `__eq__`, etc.)**
5. **Comparison and Ordering**
6. **`frozen` Dataclasses (Immutable)**
7. **Advanced Features (`asdict`, `astuple`, `replace`)**
8. **Inheritance with Dataclasses**
9. **Best Practices and Common Pitfalls**

In [7]:
# Notebook Setup: If you are running Python version < 3.7, you can install the backport:
# !pip install dataclasses

from dataclasses import dataclass, field, asdict, astuple, replace
from typing import List, Optional

## 1. What are Dataclasses and Why Use Them

- **Dataclasses** reduce boilerplate when creating classes that are primarily used to store data.
- They automatically generate:
    - `__init__` methods
    - `__repr__` methods
    - `__eq__` (and optionally ordering methods)
- They also allow for additional flexibility via the `field` function to handle default values and other customization.

Example use cases:
- Configuration objects
- Entities in your application domain that mostly store values
- Data transfer objects for your code to communicate between modules/services

In [1]:
# Let's write a normal Python class first and see what it looks like:

class RegularPerson:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age

    def __repr__(self):
        return f"RegularPerson(name={self.name}, age={self.age})"

p1 = RegularPerson("Alice", 30)
p2 = RegularPerson("Bob", 25)

print(p1, p2)

# We can see we had to define __init__ and __repr__ ourselves.

RegularPerson(name=Alice, age=30) RegularPerson(name=Bob, age=25)


In [3]:
# Same class using a dataclass:

@dataclass
class Person:
    name: str
    age: int

# We do not need to define __init__ or __repr__ by ourselves.
# Dataclasses automatically do that for us.

person1 = Person(name="Alice", age=30)
person2 = Person(name="Bob", age=25)

print(person1)
print(person2)

Person(name='Alice', age=30)
Person(name='Bob', age=25)


**Observations**:
- The `@dataclass` decorator automatically created an `__init__` and a `__repr__`.
- Type annotations are what mark an attribute as a field. They are not enforced at runtime, but IDEs and static type checkers can use them to catch mistakes.
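
A caveat about the annotations is worth seeing in action: they declare fields, but `dataclass` does not validate values against them. The short sketch below redefines the same `Person` class so that it runs on its own.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


# No runtime validation happens: the annotation says int, but a str is accepted.
odd = Person(name="Alice", age="thirty")

print(odd)                     # Person(name='Alice', age='thirty')
print(type(odd.age).__name__)  # str
```

If you need validation, it has to come from your own code (for example in `__post_init__`) or from a static type checker run separately.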

----

## 2. Basic Usage of the `dataclass` Decorator

The `dataclass` decorator has several important parameters we can adjust:
- `init`: If `True` (default), the dataclass will generate an `__init__`.
- `repr`: If `True` (default), the dataclass will generate a `__repr__`.
- `eq`: If `True` (default), the dataclass will generate an `__eq__` method.
- `order`: If `True`, it will generate `__lt__`, `__le__`, `__gt__`, and `__ge__`. (Note: `order=True` requires `eq=True`; combining it with `eq=False` raises a `ValueError`.)
- `frozen`: If `True`, all fields become read-only after object initialization, making the dataclass immutable.

We’ll look at more of these as we go along.

In [4]:
# Let's demonstrate how the eq and order parameters work.
# We'll define a small dataclass to illustrate comparisons:

@dataclass(eq=True, order=True)
class Point:
    x: float
    y: float

point1 = Point(1, 5)
point2 = Point(1, 5)
point3 = Point(2, 5)

print("Equality check:", point1 == point2)  # Should be True
print("Less-than check:", point1 < point3)  # Should be True because x=1 < x=2
print("Greater-than check:", point3 > point1)  # Should be True

Equality check: True
Less-than check: True
Greater-than check: True


**Notes**:
- By setting `order=True`, Python automatically creates ordering methods that compare instances as if they were tuples of their fields, in the order the fields are declared.
- Compare that to a normal class, where we would have to implement these methods ourselves if we wanted custom ordering.
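
Because the generated ordering methods behave like tuple comparison, instances work directly with `sorted`, `min`, and `max`. A minimal sketch, repeating the `Point` class so the snippet is self-contained:

```python
from dataclasses import dataclass


@dataclass(order=True)
class Point:
    x: float
    y: float


points = [Point(2, 1), Point(1, 5), Point(1, 3)]

# Instances compare like the tuples (x, y): x decides first, y breaks ties.
print(sorted(points))
# [Point(x=1, y=3), Point(x=1, y=5), Point(x=2, y=1)]
print(max(points))  # Point(x=2, y=1)
```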

----

## 3. Default Values and the `field` Function

### 3.1 Default Values

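Just like normal Python class attributes, we can set defaults for our fields by assigning them at class level. Fields with defaults must come after fields without defaults, for the same reason that default arguments must come last in a function signature.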

In [5]:
@dataclass
class Book:
    title: str
    author: str
    pages: int = 0  # default value for pages
    published_year: int = 2023  # default value for published_year

b1 = Book("The Great Gatsby", "F. Scott Fitzgerald")
b2 = Book("1984", "George Orwell", 328, 1949)

print(b1)
print(b2)

Book(title='The Great Gatsby', author='F. Scott Fitzgerald', pages=0, published_year=2023)
Book(title='1984', author='George Orwell', pages=328, published_year=1949)


### 3.2 The `field` Function
Dataclasses provide a `field()` function that gives fine-grained control over how each field is handled.
It can be used to specify:
- a default value
- a default factory (e.g., for mutable types like lists or dictionaries)
- whether a field should be included or excluded in comparison and representation
- metadata for additional info about the field

**Important**: For mutable defaults (like lists, dicts), we must use `default_factory` instead of a direct default, to avoid unwanted behavior caused by sharing the same default instance across objects.
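
A minimal sketch of `default_factory` in practice. The `ShoppingCart` and `BrokenCart` classes are hypothetical names chosen for illustration. Each instance gets its own list, and a bare list default is rejected when the class is defined.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShoppingCart:  # hypothetical example class
    owner: str
    # default_factory is called once per instance, so every cart gets a new list.
    items: List[str] = field(default_factory=list)


cart_a = ShoppingCart("Alice")
cart_b = ShoppingCart("Bob")
cart_a.items.append("apple")

print(cart_a.items)  # ['apple']
print(cart_b.items)  # []

# A bare mutable default is rejected at class-definition time.
rejected = False
try:
    @dataclass
    class BrokenCart:  # hypothetical example class
        items: List[str] = []
except ValueError as exc:
    rejected = True
    print("ValueError:", exc)
```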

In [8]:
# We can also use metadata to store extra information about a field:

@dataclass
class Product:
    name: str
    price: float
    currency: str = field(default="USD", metadata={"description": "Currency code"})

product = Product("Laptop", 999.99)

print(product)
print(product.__dataclass_fields__['currency'].metadata)

Product(name='Laptop', price=999.99, currency='USD')
{'description': 'Currency code'}


**Key takeaway**:
- Use `default_factory` for mutable types.
- Use `field` to configure how each field behaves for initialization, comparison, representation, etc.
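
To illustrate the second point, here is a small sketch using the `repr` and `compare` options of `field`. The `User` class and its attributes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class User:  # hypothetical example class
    username: str
    password_hash: str = field(repr=False)              # hidden from __repr__
    login_count: int = field(default=0, compare=False)  # ignored by __eq__


u1 = User("alice", "abc123", login_count=3)
u2 = User("alice", "abc123", login_count=7)

print(u1)        # User(username='alice', login_count=3)
print(u1 == u2)  # True, because login_count does not take part in comparison
```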

----

## 4. Generated Methods

When you use `@dataclass`, Python generates the following methods for you (unless disabled):
- `__init__`
- `__repr__`
- `__eq__` (if `eq=True`)
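- `__hash__` (only when both `eq=True` and `frozen=True`, or when `unsafe_hash=True` is passed; with the defaults `eq=True, frozen=False`, `__hash__` is set to `None` and instances are unhashable)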
- `__lt__`, `__le__`, `__gt__`, `__ge__` (if `order=True`)

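You can define these methods yourself if needed. If your class already defines `__init__`, `__repr__`, or `__eq__`, `dataclass` leaves your version in place. Defining one of the ordering methods while also passing `order=True` raises a `TypeError`.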

In [None]:
@dataclass
class Vehicle:
    brand: str
    model: str

    # Let's override the __repr__ method to show a custom string
    def __repr__(self):
        return f"Vehicle({self.brand} - {self.model})"

v = Vehicle("Toyota", "Corolla")
print(v)  # Notice how our custom __repr__ is used instead of the default dataclass version
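
The `__hash__` rule is the least obvious item in the list above, so here is a short sketch of it. `MutableTag` and `FrozenTag` are hypothetical classes used only for this demonstration.

```python
from dataclasses import dataclass


@dataclass
class MutableTag:  # defaults: eq=True, frozen=False
    label: str


@dataclass(frozen=True)
class FrozenTag:  # eq=True and frozen=True
    label: str


# With eq=True and frozen=False, __hash__ is set to None.
print(MutableTag.__hash__)  # None

unhashable = False
try:
    hash(MutableTag("a"))
except TypeError as exc:
    unhashable = True
    print("TypeError:", exc)

# With frozen=True, a field-based __hash__ is generated.
print(hash(FrozenTag("a")) == hash(FrozenTag("a")))  # True
```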

----

## 5. Comparison and Ordering

We already discussed `eq` and `order` briefly.
- `eq=True` adds `__eq__` to check if all fields are equal.
- `order=True` adds all ordering methods based on the field definition order.

**Remember**: If `eq=False`, then `order` must also be `False`.

We showed an example with the `Point` class above demonstrating how these methods work.
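
Two edge cases are worth a quick sketch: the invalid `eq=False, order=True` combination, and comparisons against objects of a different type. `Broken` and `Version` are hypothetical classes.

```python
from dataclasses import dataclass

# order=True with eq=False is rejected when the class is created.
invalid_combo = False
try:
    @dataclass(eq=False, order=True)
    class Broken:  # hypothetical example class
        x: int
except ValueError as exc:
    invalid_combo = True
    print("ValueError:", exc)


@dataclass(order=True)
class Version:  # hypothetical example class
    major: int
    minor: int


# Generated comparisons only apply between instances of the same class.
print(Version(1, 2) == (1, 2))  # False: a tuple is not a Version

cross_type_error = False
try:
    Version(1, 2) < (1, 3)
except TypeError as exc:
    cross_type_error = True
    print("TypeError:", exc)
```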

----

## 6. Frozen Dataclasses (Immutability)

If you set `frozen=True`, your dataclass becomes immutable:
- Fields cannot be modified after initialization.
- With the default `eq=True`, a frozen dataclass also gets a generated `__hash__`, so its instances are hashable as long as all of their fields are hashable.

In [9]:
@dataclass(frozen=True)
class ImmutablePoint:
    x: float
    y: float

ip = ImmutablePoint(3.0, 4.0)
print(ip)

# Trying to modify ip.x will raise an exception:
ip.x = 10.0


ImmutablePoint(x=3.0, y=4.0)


FrozenInstanceError: cannot assign to field 'x'

**Why use frozen?**
- Ensures objects cannot be modified after creation.
- Useful in contexts where immutability is desired (e.g., functional style, concurrency, dictionary keys).
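
A brief sketch of the dictionary-key use case, followed by one caveat: `frozen=True` blocks attribute assignment but does not make the field values themselves immutable. `GridCell` and `Playlist` are hypothetical classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class GridCell:  # hypothetical example class
    row: int
    col: int


# Equal frozen instances hash the same, so they address the same dict entry.
visits = {GridCell(0, 0): 1}
visits[GridCell(0, 0)] += 1
print(visits)                                 # {GridCell(row=0, col=0): 2}
print(len({GridCell(1, 2), GridCell(1, 2)}))  # 1


@dataclass(frozen=True)
class Playlist:  # hypothetical example class
    name: str
    songs: List[str] = field(default_factory=list)


playlist = Playlist("Road trip")
# Rebinding playlist.songs would fail, but the list object itself can still change.
playlist.songs.append("Song A")
print(playlist.songs)  # ['Song A']
```

If deep immutability matters, prefer immutable field types such as tuples or frozensets.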

----

## 7. Advanced Features

### 7.1 `asdict` and `astuple`
- `asdict(obj)` converts a dataclass instance (and nested dataclasses) to a dictionary.
- `astuple(obj)` converts a dataclass instance (and nested dataclasses) to a tuple.

In [None]:
@dataclass
class Address:
    street: str
    city: str

@dataclass
class Employee:
    name: str
    address: Address
    salary: float

emp = Employee("Carol", Address("123 Main St", "Metropolis"), 75000)

print("Dictionary:", asdict(emp))
print("Tuple:", astuple(emp))
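
The `replace` function, imported at the top of this notebook, creates a new instance of the same type with selected fields changed. This is especially handy with frozen dataclasses. A minimal sketch, assuming a hypothetical `Settings` class:

```python
from dataclasses import dataclass, replace, asdict


@dataclass(frozen=True)
class Settings:  # hypothetical example class
    host: str
    port: int = 8080
    debug: bool = False


base = Settings("localhost")
dev = replace(base, debug=True)  # new object; base is left untouched

print(base)         # Settings(host='localhost', port=8080, debug=False)
print(dev)          # Settings(host='localhost', port=8080, debug=True)
print(asdict(dev))  # {'host': 'localhost', 'port': 8080, 'debug': True}
```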

----

## 8. Inheritance with Dataclasses

Dataclasses support inheritance. However, some considerations:
- If a subclass is also a dataclass, it extends the base class’s fields and can define additional fields.
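- Decorator parameters such as `init`, `repr`, or `order` are not inherited; each class sets its own in its `@dataclass` call. The exception is `frozen`: a frozen dataclass cannot inherit from a non-frozen one, or vice versa.
- Inheriting from a non-dataclass works, but only attributes declared in dataclass-decorated classes become fields, and the generated `__init__` does not call the base class's `__init__`. Call it from `__post_init__` if the base class needs initializing.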

In [10]:
@dataclass
class Animal:
    name: str

@dataclass
class Dog(Animal):
    breed: str
    age: int

dog = Dog("Rex", "Golden Retriever", 5)
print(dog)

Dog(name='Rex', breed='Golden Retriever', age=5)


**Note**:
- In the above example, `Animal` is a dataclass. `Dog` also uses the `@dataclass` decorator, so the fields from `Animal` (`name`) and from `Dog` (`breed`, `age`) are all included in the final class.
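
Field order and defaults interact with inheritance in a way that often surprises people. The sketch below uses a variant of `Animal` with a hypothetical `sound` field to show it: base-class fields come first, a redefined field keeps its original position, and a field without a default cannot follow an inherited field that has one.

```python
from dataclasses import dataclass, fields


@dataclass
class Animal:
    name: str
    sound: str = "..."  # hypothetical field with a default


@dataclass
class Dog(Animal):
    sound: str = "Woof"     # overrides the default, keeps its original position
    breed: str = "unknown"  # needs a default because it follows a defaulted field


print([f.name for f in fields(Dog)])  # ['name', 'sound', 'breed']
print(Dog("Rex"))  # Dog(name='Rex', sound='Woof', breed='unknown')

# A subclass field without a default after an inherited default is rejected.
ordering_error = False
try:
    @dataclass
    class Cat(Animal):
        indoor: bool
except TypeError as exc:
    ordering_error = True
    print("TypeError:", exc)
```

On Python 3.10+, declaring the subclass with `@dataclass(kw_only=True)` is one way around this restriction, at the cost of requiring keyword arguments for those fields.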

----

## 9. Best Practices and Common Pitfalls

1. **Use `frozen=True` when immutability is beneficial**:
    - Minimizes bugs caused by unwanted state changes.
    - Provides built-in hashability if fields are hashable.

2. **Use `default_factory` for mutable types**:
    - Avoids shared references between instances.

3. **Be mindful when combining `frozen=True` with other decorators**:
    - For example, a property setter won’t work if the class is frozen.

4. **Overriding generated methods**:
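    - If you define `__init__`, `__repr__`, or `__eq__` yourself, `dataclass` won't generate them. Anything that depends on the generated version, such as the automatic call to `__post_init__` from `__init__`, then becomes your responsibility.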

5. **`field(init=False)`**:
    - Fields that should not be passed to `__init__` (e.g., computed fields) can be excluded from the constructor.
    - Such fields need a default, a `default_factory`, or a value assigned in `__post_init__`, which the generated `__init__` calls as its last step.

6. **Watch out for performance**:
    - The decorator does its work once, when the class is defined, so it adds a small cost at import time; instances then behave like those of a regular class. This is rarely a bottleneck. On Python 3.10+, `slots=True` can reduce per-instance memory.

7. **Dataclass vs NamedTuple**:
    - `NamedTuple` instances are immutable and behave like tuples (they can be indexed and unpacked), but they are less flexible than dataclasses: there are no mutable fields and no per-field options such as `default_factory`.
    - Dataclasses can have both mutable and immutable variants.
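
Point 5 above can be sketched as follows. The `Rectangle` classes are hypothetical. The frozen variant uses `object.__setattr__`, because normal assignment is blocked even inside `__post_init__`.

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:  # hypothetical example class
    width: float
    height: float
    area: float = field(init=False)  # not a constructor argument

    def __post_init__(self):
        # Called by the generated __init__ after width and height are set.
        self.area = self.width * self.height


@dataclass(frozen=True)
class FrozenRectangle:  # hypothetical example class
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self):
        # Frozen classes block normal assignment, so bypass it explicitly.
        object.__setattr__(self, "area", self.width * self.height)


r = Rectangle(3.0, 4.0)
fr = FrozenRectangle(3.0, 4.0)
print(r)   # Rectangle(width=3.0, height=4.0, area=12.0)
print(fr)  # FrozenRectangle(width=3.0, height=4.0, area=12.0)
```

Note that in the mutable version `area` is computed once and is not updated if `width` or `height` changes later; a `@property` may be a better fit when the value must stay in sync.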

**Summary**:
- Dataclasses significantly reduce boilerplate.
- Make sure to use the advanced features (`field`, `frozen`, `order`) wisely.
- They are a powerful addition to Python for clean, maintainable code.