# Chapter 5. Data Class Builders
---

## ToC

[More About @dataclass](#more-about-dataclass)

1. [Field Options](#field-options)  
2. [Post-init Processing](#post-init-processing)  
3. [Typed Class Attributes](#typed-class-attributes)  
4. [Initialization Variables That Are Not Fields](#initialization-variables-that-are-not-fields)  
5. [More on ClassVar and InitVar](#more-on-classvar-and-initvar)  
6. [@dataclass Example: Dublin Core Resource Record](#dataclass-example-dublin-core-resource-record)
        
---

## More About @dataclass

The decorator accepts several keyword arguments. This is its signature:

```python
@dataclass(*, init=True, repr=True, eq=True, order=False,
              unsafe_hash=False, frozen=False)
```


![Figure 80](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/80.PNG)

`@dataclass` is a convenience tool that automatically generates boilerplate code for classes that primarily store data.
When you use a dataclass, Python auto-generates methods like:

- `__init__`: constructor
- `__repr__`: developer-friendly string representation
- `__eq__`: equality comparison
- `__hash__`: hash support (if enabled)

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int
```
This is roughly equivalent to:
```python
class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point(x={self.x}, y={self.y})"

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x and self.y == other.y
```        


**Options Most Often Changed from Their Defaults**

`frozen=True`  
Protects against accidental changes to the class instances.

`order=True`  
Allows sorting of instances of the data class.

If the `eq` and `frozen` arguments are both `True`, `@dataclass` produces a suitable
`__hash__` method, so the instances will be hashable.

In [1]:
from dataclasses import dataclass

@dataclass(frozen=True, eq=False)
class Point:
    x: int
    y: int

p1 = Point(1, 2)
p2 = Point(1, 2)

print(p1 == p2)  # False (uses object identity)
print(hash(p1), hash(p2))  # Different values: with eq=False, the identity-based hash from object is used


False
115819066664 115817491226
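
For contrast with the cell above, here is a minimal sketch in which `eq` keeps its default `True` and `frozen=True` is added, so `@dataclass` generates a `__hash__` based on the field values. `order=True` is included to show sorting. The `Version` class is a hypothetical example, not part of the chapter's materials.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True, order=True)
class Version:  # hypothetical class, used only for illustration
    major: int
    minor: int


v1 = Version(1, 2)
v2 = Version(1, 2)
v3 = Version(1, 10)

# eq=True (the default) plus frozen=True: equal instances hash equally
print(v1 == v2, hash(v1) == hash(v2))  # True True

# order=True compares instances as tuples of their fields
print(sorted([v3, v1]))

# frozen=True blocks attribute assignment on instances
try:
    v1.major = 5  # type: ignore
except FrozenInstanceError as exc:
    print(type(exc).__name__)  # FrozenInstanceError
```

Because the instances are hashable, they can be used as `dict` keys or `set` members.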


### Field Options

The instance fields you declare will become parameters in the generated `__init__`. Python does not allow parameters without defaults after parameters with defaults, therefore after you declare a field with a default value, all remaining fields must also have default values.

**Example: Valid Case**

In [1]:
from dataclasses import dataclass

@dataclass
class Person:
    name: str            # No default
    age: int = 30        # Default value provided


This will generate an `__init__` like:

```python
def __init__(self, name: str, age: int = 30):
    ...
```    

**Example: Invalid Case**

In [1]:
from dataclasses import dataclass

@dataclass
class Person:
    age: int = 30        # Default value provided
    name: str            # No default

TypeError: non-default argument 'name' follows default argument 'age'

To accept this class, `@dataclass` would have to generate an invalid `__init__` like:

```python
def __init__(self, age: int = 30, name: str):
    ...
```

This violates Python’s function signature rule: **non-default arguments must come before default arguments.**
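
The sketch below suggests that the error is raised when the class is defined (when the decorator runs), not when an instance is created. The class name `BadPerson` is hypothetical.

```python
from dataclasses import dataclass

try:
    @dataclass
    class BadPerson:  # hypothetical name, mirrors the invalid case above
        age: int = 30  # field with a default
        name: str      # field without a default: not allowed here
except TypeError as exc:
    # The decorator fails while building __init__, before any instance exists
    error_message = str(exc)
    print(error_message)
```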

**Mutable default values are a common source of bugs for beginning Python developers.**  
Class attributes are often used as default attribute values for
instances, including in data classes. `@dataclass` uses the default values declared alongside the type hints to generate parameters with defaults for `__init__`.

In [2]:
@dataclass
class ClubMember:
    name: str
    guests: list = []

ValueError: mutable default <class 'list'> for field guests is not allowed: use default_factory

The `ValueError` message explains the problem and suggests a solution: use `default_factory`. This option lets you provide a function, class, or any other callable, which will be invoked with zero arguments to build a default value each time an instance of the data class is created.

In [6]:
from dataclasses import dataclass, field
@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)

A more precise annotation also declares the type of the list items:

In [7]:
from dataclasses import dataclass, field
@dataclass
class ClubMember:
    name: str
    guests: list[str] = field(default_factory=list)

In [11]:
alice = ClubMember("Alice", ["Alice's guest"])
bob = ClubMember("Bob", ["Bob's guest"])

alice.guests.append("Charlie")
bob.guests.append("David")

print(alice)
print(bob)

ClubMember(name='Alice', guests=["Alice's guest", 'Charlie'])
ClubMember(name='Bob', guests=["Bob's guest", 'David'])
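
To see why `@dataclass` rejects a list as a default, the following sketch compares a plain class that uses a class attribute as a default with a data class that uses `default_factory`. `PlainMember` and `SafeMember` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass, field


class PlainMember:  # hypothetical plain class with the classic bug
    guests = []  # a single list object shared by every instance

    def __init__(self, name):
        self.name = name


a = PlainMember("Alice")
b = PlainMember("Bob")
a.guests.append("Charlie")
print(b.guests)  # ['Charlie'] : Bob sees Alice's guest


@dataclass
class SafeMember:  # hypothetical data class using default_factory
    name: str
    guests: list[str] = field(default_factory=list)


c = SafeMember("Carol")
d = SafeMember("Dave")
c.guests.append("Erin")
print(d.guests)  # [] : each instance gets its own list
```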


![Figure 81](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/81.PNG)

![Figure 82](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/82.PNG)

The `default` option exists because the `field` call takes the place of the default value
in the field annotation. If you want to create an `athlete` field with a default value of
`False`, and also omit that field from the `__repr__` method, you’d write this:

In [14]:
@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)
    athlete: bool = field(default=False, repr=False)

In [15]:
m = ClubMember("Bob")
print(m)

ClubMember(name='Bob', guests=[])


Without `repr=False`, you'd get:

```python
ClubMember(name='Bob', guests=[], athlete=False)
```
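
The options passed to `field` can be inspected at runtime with `dataclasses.fields`, which returns one `Field` object per instance field. This is a small sketch that reuses the `ClubMember` definition from above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)
    athlete: bool = field(default=False, repr=False)


# Each Field object records the options given (or defaulted) for that field
for f in fields(ClubMember):
    print(f.name, f.repr)
# name True
# guests True
# athlete False
```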

### Post-init Processing

The `__init__` method generated by `@dataclass` only takes the arguments passed and assigns them—or their default values, if missing—to the instance attributes that are instance fields. But you may need to do more than that to initialize the instance. If that’s the case, you can provide a `__post_init__` method.

**Common use cases**: validation and computing field values based on other fields.
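
Both use cases can be sketched with a small hypothetical class. `Rectangle` is not part of the chapter's materials; it validates its arguments and computes `area` from the other fields. Declaring `area` with `field(init=False)` keeps it out of the generated `__init__` parameters.

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:  # hypothetical class, used only for illustration
    width: float
    height: float
    area: float = field(init=False)  # computed, not an __init__ parameter

    def __post_init__(self):
        # Validation: reject non-positive dimensions
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")
        # Computed field: derived from the other fields
        self.area = self.width * self.height


r = Rectangle(2, 3)
print(r)  # Rectangle(width=2, height=3, area=6)
```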

In [None]:
import sys
from pathlib import Path

# Add the 'materials' directory to the sys.path
sys.path.append(str(Path("materials").resolve()))

# Otherwise, we would have to use a relative import:
#from .materials.hackerclub import HackerClubMember

from hackerclub import HackerClubMember # type: ignore


In [2]:
HackerClubMember.__doc__

"HackerClubMember(name: str, guests: list = <factory>, handle: str = '')"

`<factory>` is a short way of saying that some callable will produce the default value for `guests` (in our case, the factory is the `list` class).

In [3]:
anna = HackerClubMember('Anna Ravenscroft', handle='AnnaRaven')
anna

HackerClubMember(name='Anna Ravenscroft', guests=[], handle='AnnaRaven')

In [4]:
# If `handle` is omitted, it's set to the first part of the member's name:
leo = HackerClubMember('Leo Rochael')
leo

HackerClubMember(name='Leo Rochael', guests=[], handle='Leo')

In [5]:
test = HackerClubMember('test')
test

HackerClubMember(name='test', guests=[], handle='test')

In [6]:
# Members must have a unique handle. The following `leo2` will not be created,
# because its `handle` would be 'Leo', which was taken by `leo`:

leo2 = HackerClubMember('Leo DaVinci')

ValueError: handle 'Leo' already exists.

In [7]:
# To fix, `leo2` must be created with an explicit `handle`:
leo2 = HackerClubMember('Leo DaVinci', handle='Neo')
leo2

HackerClubMember(name='Leo DaVinci', guests=[], handle='Neo')

![Figure 83](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/83.PNG)

The implementation of the `HackerClubMember` subclass works, but it does not satisfy a static type checker. The next section explains why and how to improve it.
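
The class itself is only shown in the figure, so here is a minimal sketch that reproduces the behavior demonstrated in the cells above. It is an assumption about how such a class could be written, and the actual `materials/hackerclub.py` may differ. The name `HackerClubMemberSketch` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)


@dataclass
class HackerClubMemberSketch(ClubMember):  # hypothetical name
    all_handles = set()  # no annotation, so this is a class attribute, not a field
    handle: str = ''

    def __post_init__(self):
        cls = self.__class__
        # Computed default: first part of the member's name
        if self.handle == '':
            self.handle = self.name.split()[0]
        # Validation: handles must be unique across all members
        if self.handle in cls.all_handles:
            raise ValueError(f'handle {self.handle!r} already exists.')
        cls.all_handles.add(self.handle)


leo = HackerClubMemberSketch('Leo Rochael')
print(leo.handle)  # Leo

try:
    HackerClubMemberSketch('Leo DaVinci')
except ValueError as exc:
    print(exc)  # handle 'Leo' already exists.
```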

### Typed Class Attributes

In [1]:
import os
print(os.getcwd())  # show current working directory
print(os.listdir("./materials"))  # list files in materials directory

c:\Users\hamed\OneDrive\Documenti\Training\Python\FluentPython\repo\Training-Python\src\Part_I\Chapter_05_DataClassBuilders
['club.py', 'hackerclub.py', 'hackerclub_annotated.py', 'nocheck_demo.py']


In [3]:
!mypy ./materials/hackerclub.py

materials\hackerclub.py:37: error: Need type annotation for "all_handles" (hint: "all_handles: set[<type>] = ...")  [var-annotated]
Found 1 error in 1 file (checked 1 source file)


Unfortunately, the hint provided by Mypy is not helpful in the context of `@dataclass` usage. It suggests annotating `all_handles` with `set[<type>]`, but if we add a type hint like `set[…]` to `all_handles`, `@dataclass` will find that
annotation and make `all_handles` an instance field. (The built-in `set` works as a generic type here because I am using Python 3.13, so there is no need to import `Set` from `typing`.)

To code a class variable with a type hint, we need to use a pseudotype named `typing.ClassVar`, which leverages the generics `[]` notation to set the type of the variable and also declare it a class attribute.

**Solution**

```python
all_handles: ClassVar[set[str]] = set()

```
That type hint is saying:
> `all_handles` is a class attribute of type `set`-of-`str`, with an empty `set` as its default
value.
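
The effect of each annotation style can be checked with `dataclasses.fields`. The three classes below are hypothetical and exist only to compare the variants. The plain `set[str]` variant uses `default_factory`, because a bare `= set()` default would be rejected as a mutable default.

```python
from dataclasses import dataclass, field, fields
from typing import ClassVar


@dataclass
class Unannotated:  # hypothetical: works at runtime, but Mypy asks for a hint
    handle: str = ''
    all_handles = set()


@dataclass
class PlainHint:  # hypothetical: the hint turns all_handles into an instance field
    handle: str = ''
    all_handles: set[str] = field(default_factory=set)


@dataclass
class WithClassVar:  # hypothetical: typed, and still a class attribute
    handle: str = ''
    all_handles: ClassVar[set[str]] = set()


for cls in (Unannotated, PlainHint, WithClassVar):
    print(cls.__name__, [f.name for f in fields(cls)])
# Unannotated ['handle']
# PlainHint ['handle', 'all_handles']
# WithClassVar ['handle']
```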

In [1]:
!mypy ./materials/hackerclub_annotated.py

Success: no issues found in 1 source file


### Initialization Variables That Are Not Fields

Sometimes you may need to pass arguments to `__init__` that are not instance fields.
Such arguments are called *init-only variables* by the [`dataclasses` documentation](https://docs.python.org/3/library/dataclasses.html#init-only-variables).

**Example from documentation**

In [3]:
from dataclasses import dataclass, InitVar

# Dummy database class for demonstration
class DatabaseType:
    def lookup(self, key):
        return 42  # Dummy value for testing

my_database = DatabaseType()

@dataclass
class C:
    i: int
    j: int | None = None
    database: InitVar[DatabaseType | None] = None

    def __post_init__(self, database):
        if self.j is None and database is not None:
            self.j = database.lookup('j')

In [4]:
# Create an instance
c = C(10, database=my_database)

# Print the instance to verify
print(c)


C(i=10, j=42)


### More on `ClassVar` and `InitVar`

#### 🔹 `ClassVar`

- Defined using `typing.ClassVar`
- Declares a variable shared across all instances
- Not included in `__init__`, `__repr__`, `__eq__`, or stored in the instance
- Useful for constants or static configuration

In [19]:
from dataclasses import dataclass
from typing import ClassVar

@dataclass
class MyClass:
    x: int
    y: int
    scale: int = 10  # instance-level

a = MyClass(1, 2)
b = MyClass(1, 2, scale=20)
print(a.scale)
print(b.scale)

10
20


In [20]:
from dataclasses import dataclass
from typing import ClassVar

@dataclass
class MyClass:
    x: int
    y: int
    scale: ClassVar[int] = 10  # Class-level, not instance-level

a = MyClass(1, 2)
print(a.scale)

10


In [None]:
b = MyClass(1, 2, scale=20) # type: ignore

TypeError: MyClass.__init__() got an unexpected keyword argument 'scale'
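
Because a `ClassVar` lives on the class, rebinding it there is visible through every instance. A short sketch, reusing the `MyClass` definition above:

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class MyClass:
    x: int
    y: int
    scale: ClassVar[int] = 10  # class-level, shared by all instances


a = MyClass(1, 2)
b = MyClass(3, 4)

MyClass.scale = 20          # rebind the class attribute
print(a.scale, b.scale)     # 20 20
print(vars(a))              # {'x': 1, 'y': 2} : scale is not stored per instance
print(a)                    # MyClass(x=1, y=2)
```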


#### 🔹 `InitVar`

- Defined using `dataclasses.InitVar`
- Passed to `__init__` and available only in `__post_init__`
- Not stored in the instance
- Useful for temporary initialization logic

In [25]:
from dataclasses import dataclass, InitVar

@dataclass
class MyClass:
    x: int
    y: int
    factor: InitVar[int]

    def __post_init__(self, factor):
        self.x *= factor
        self.y *= factor

obj = MyClass(1, 2, factor=3)
print(obj.x, obj.y)

3 6


In [None]:
print(obj.factor) # type: ignore

AttributeError: 'MyClass' object has no attribute 'factor'

**Summary Table**

| Feature                  | `ClassVar`                     | `InitVar`                          |
|--------------------------|--------------------------------|------------------------------------|
| Included in `__init__`   | ❌ No                          | ✅ Yes (but only for `__post_init__`) |
| Stored in instance       | ❌ No                          | ❌ No                              |
| Used in `__post_init__`  | ❌ Not passed automatically     | ✅ Yes                             |
| Typical use case         | Shared/static config           | Temporary init-time parameters     |
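
The rows of the table can be checked on a single class that uses both pseudotypes. `Scaled` is a hypothetical class written for this comparison.

```python
import inspect
from dataclasses import dataclass, fields, InitVar
from typing import ClassVar


@dataclass
class Scaled:  # hypothetical class combining both pseudotypes
    x: int
    factor: InitVar[int] = 1     # __init__ parameter, passed on to __post_init__
    unit: ClassVar[str] = "px"   # class attribute, ignored by @dataclass

    def __post_init__(self, factor):
        self.x *= factor


s = Scaled(2, factor=3)

print(list(inspect.signature(Scaled).parameters))  # ['x', 'factor']
print([f.name for f in fields(Scaled)])            # ['x']
print(vars(s))                                     # {'x': 6}
print(Scaled.unit)                                 # px
```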

### @dataclass Example: Dublin Core Resource Record

[Dublin Core](https://www.dublincore.org/specifications/dublin-core/) provides the foundation for a more typical `@dataclass` example:

> The Dublin Core Schema is a small set of vocabulary terms that can be used to describe
digital resources (video, images, web pages, etc.), as well as physical resources such as
books or CDs, and objects like artworks.

In [None]:
import sys
from pathlib import Path

# Add the 'materials' directory to the sys.path
sys.path.append(str(Path("materials").resolve()))


from resource import Resource # type: ignore

In [7]:
import datetime
from enum import Enum, auto


class ResourceType(Enum):
    BOOK = auto()
    EBOOK = auto()
    VIDEO = auto()


description = "Improving the design of existing code"
book = Resource(
    "978-0-13-475759-9",
    "Refactoring, 2nd Edition",
    ["Martin Fowler", "Kent Beck"],
    datetime.date(2018, 11, 19),
    ResourceType.BOOK,
    description,
    "EN",
    ["computer programming", "OOP"],
)

book

Resource(identifier='978-0-13-475759-9', title='Refactoring, 2nd Edition', creators=['Martin Fowler', 'Kent Beck'], date=datetime.date(2018, 11, 19), type=<ResourceType.BOOK: 1>, description='Improving the design of existing code', language='EN', subjects=['computer programming', 'OOP'])

The `__repr__` generated by `@dataclass` is OK, but we can make it more readable.
This is the format we want from `repr(book)`:

```python
>>> book # doctest: +NORMALIZE_WHITESPACE
Resource(
    identifier = '978-0-13-475759-9',
    title = 'Refactoring, 2nd Edition',
    creators = ['Martin Fowler', 'Kent Beck'],
    date = datetime.date(2018, 11, 19),
    type = <ResourceType.BOOK: 1>,
    description = 'Improving the design of existing code',
    language = 'EN',
    subjects = ['computer programming', 'OOP'],
)
```
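
One way to produce this layout is to write a custom `__repr__` that iterates over `dataclasses.fields`. `@dataclass` does not replace a `__repr__` defined in the class body. The sketch below uses a hypothetical, shortened `Record` class rather than the real `Resource`, whose source is in the `materials` directory and may be implemented differently.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Record:  # hypothetical, shortened stand-in for Resource
    identifier: str
    title: str = '<untitled>'
    subjects: list[str] = field(default_factory=list)

    def __repr__(self):
        cls = self.__class__
        indent = ' ' * 4
        lines = [f'{cls.__name__}(']
        # One "name = value," line per field, in declaration order
        for f in fields(cls):
            value = getattr(self, f.name)
            lines.append(f'{indent}{f.name} = {value!r},')
        lines.append(')')
        return '\n'.join(lines)


record = Record('978-0-13-475759-9', 'Refactoring, 2nd Edition', ['OOP'])
print(record)
```

Since the method reads the field list at call time, it keeps working if fields are added to or removed from the class.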

In [8]:
import sys
from pathlib import Path

# Add the 'materials' directory to the sys.path
sys.path.append(str(Path("materials").resolve()))


from resource_repr import Resource # type: ignore

In [9]:
book = Resource(
    "978-0-13-475759-9",
    "Refactoring, 2nd Edition",
    ["Martin Fowler", "Kent Beck"],
    datetime.date(2018, 11, 19),
    ResourceType.BOOK,
    description,
    "EN",
    ["computer programming", "OOP"],
)

book

Resource(
    identifier = '978-0-13-475759-9',
    title = 'Refactoring, 2nd Edition',
    creators = ['Martin Fowler', 'Kent Beck'],
    date = datetime.date(2018, 11, 19),
    type = <ResourceType.BOOK: 1>,
    description = 'Improving the design of existing code',
    language = 'EN',
    subjects = ['computer programming', 'OOP'],
)