# Python Data Classes Tutorial: N Things You Must Learn About Data Classes

## Why learn about data classes?

Data classes are one of those Python features that, once you discover them, you never go back to the old way. Consider this regular class:

In [2]:
class Exercise:
    def __init__(self, name, reps, sets, weight):
        self.name = name
        self.reps = reps
        self.sets = sets
        self.weight = weight

To me, that class definition is needlessly repetitive: in the `__init__` method, each parameter name appears three times. You may not think this is a big deal, but think about how many classes you will write over the years, many of them with far more parameters.

In comparison, take a look at the data class alternative to the code above:

In [3]:
from dataclasses import dataclass


@dataclass
class Exercise:
    name: str
    reps: int
    sets: int
    weight: float  # Weight in lbs

This modest-looking piece of code does much more than the regular class while requiring far less typing. The `@dataclass` decorator generates not only the `__init__` method but `__repr__` and `__eq__` as well. The four ordering operators (`<`, `<=`, `>`, `>=`) are a single parameter away: `order=True`.

Data classes also support immutability, advanced type hinting, advanced default value definitions and much more. All these features are available through regular classes too, but they take considerably more code.
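To make the comparison concrete, here is a rough sketch of what you would have to write by hand to get the same `__init__`, `__repr__` and `__eq__` behavior. The class name `ManualExercise` is made up for this illustration, and the code only approximates what the decorator generates.

```python
class ManualExercise:
    """Hand-written approximation of what @dataclass generates (illustrative only)."""

    def __init__(self, name: str, reps: int, sets: int, weight: float):
        self.name = name
        self.reps = reps
        self.sets = sets
        self.weight = weight

    def __repr__(self):
        return (
            f"ManualExercise(name={self.name!r}, reps={self.reps!r}, "
            f"sets={self.sets!r}, weight={self.weight!r})"
        )

    def __eq__(self, other):
        # Only compare against instances of the very same class
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.name, self.reps, self.sets, self.weight) == (
            other.name,
            other.reps,
            other.sets,
            other.weight,
        )


a = ManualExercise("Bench press", 10, 3, 52.5)
b = ManualExercise("Bench press", 10, 3, 52.5)

print(a)  # ManualExercise(name='Bench press', reps=10, sets=3, weight=52.5)
print(a == b)  # True
```

That is roughly twenty lines of boilerplate for a class with four fields, and every new field has to be added in several places.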

So, the purpose of this tutorial is to move you from hand-written class boilerplate to the modern alternative: data classes. Let's get started!

## Introduction to data classes

1. Data classes are shorter than regular classes
2. The `__init__` method is already implemented
3. They work just like regular classes but take much less code
4. `__eq__` and `__repr__` are already implemented
5. Comparing two instances with identical field values using `==` returns `True`
6. They require the `@dataclass` decorator
7. They require type hints
8. But the type hints are not enforced at runtime
9. They can also be created with the `make_dataclass` function

In [8]:
from dataclasses import dataclass


@dataclass
class Exercise:
    name: str
    reps: int
    sets: int
    weight: float


ex1 = Exercise("Bench press", 10, 3, 52.5)

In [9]:
ex1

Exercise(name='Bench press', reps=10, sets=3, weight=52.5)

In [6]:
class RegularExercise:  # Renamed so it doesn't shadow the data class above
    def __init__(self, name, reps, sets, weight):
        self.name = name
        self.reps = reps
        self.sets = sets
        self.weight = weight


ex3 = RegularExercise("Bench press", 10, 3, 52.5)

ex3

<__main__.RegularExercise at 0x7fc5bc35cd60>

In [10]:
ex1.name

'Bench press'

In [11]:
ex2 = Exercise("Bench press", 10, 3, 52.5)

In [12]:
ex1 == ex1

True

In [13]:
ex1 == ex2

True

In [14]:
silly_exercise = Exercise("Bench press", "ten", "three sets", 52.5)

silly_exercise.sets

'three sets'
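If you do want the hints checked, you have to do it yourself. The following is only a minimal sketch: `CheckedExercise` is a hypothetical class, the check relies on the `__post_init__` hook (which runs right after the generated `__init__`), and it only handles plain classes such as `str` or `int`, not generic hints like `List[int]`.

```python
from dataclasses import dataclass, fields


@dataclass
class CheckedExercise:
    name: str
    reps: int
    sets: int
    weight: float

    def __post_init__(self):
        # Compare each value against its annotation; skip hints that are not plain classes
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(f.type, type) and not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, got {type(value).__name__}"
                )


CheckedExercise("Bench press", 10, 3, 52.5)  # Passes the check

try:
    CheckedExercise("Bench press", "ten", 3, 52.5)
except TypeError as err:
    print(err)  # reps must be int, got str
```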

In [15]:
from dataclasses import make_dataclass

Exercise = make_dataclass(
    "Exercise",
    [
        ("name", str),
        ("reps", int),
        ("sets", int),
        ("weight", float),
    ],
)

ex3 = Exercise("Deadlifts", 8, 3, 69.0)
ex3

Exercise(name='Deadlifts', reps=8, sets=3, weight=69.0)

10. Default values are easy to add
11. Fields without defaults must come before fields with defaults


In [16]:
@dataclass
class Exercise:
    name: str = "Push-ups"
    reps: int = 10
    sets: int = 3
    weight: float = 0


ex5 = Exercise()
ex5

Exercise(name='Push-ups', reps=10, sets=3, weight=0)

In [17]:
@dataclass
class Exercise:
    name: str = "Push-ups"
    reps: int = 10
    sets: int = 3
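    weight: float  # NOT ALLOWED: no default, but it follows fields with defaults


# Never reached: the class definition above already raises the error
ex5 = Exercise()
ex5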

TypeError: non-default argument 'weight' follows default argument
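One way around the error above is simply to reorder the fields so that the one without a default comes first. The class name `ReorderedExercise` is used here only to keep this sketch separate from the running example.

```python
from dataclasses import dataclass


@dataclass
class ReorderedExercise:
    weight: float  # No default, so it must be declared before the defaulted fields
    name: str = "Push-ups"
    reps: int = 10
    sets: int = 3


ex = ReorderedExercise(weight=0)
print(ex)  # ReorderedExercise(weight=0, name='Push-ups', reps=10, sets=3)
```

Note that the field order also determines the order of positional arguments in `__init__` and the order of fields in the `repr`.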

## Advanced data classes

In [82]:
from dataclasses import dataclass
from typing import List


@dataclass
class Exercise:
    name: str = "Push-ups"
    reps: int = 10
    sets: int = 3
    weight: float = 0


@dataclass
class WorkoutSession:
    exercises: List[Exercise]
    duration_minutes: int

In [83]:
from typing import Any


@dataclass
class Dummy:
    attr: Any  # Any is a placeholder hint for a field that can hold any type

In [84]:
# Define the Exercise instances for HIIT training
ex1 = Exercise(name="Burpees", reps=15, sets=3)
ex2 = Exercise(name="Mountain Climbers", reps=20, sets=3)
ex3 = Exercise(name="Jump Squats", reps=12, sets=3)
exercises_monday = [ex1, ex2, ex3]

hiit_monday = WorkoutSession(exercises=exercises_monday, duration_minutes=30)

```python
@dataclass
class WorkoutSession:
    exercises: List[Exercise] = []  # NOT ALLOWED: mutable default
    duration_minutes: int = 0


hiit_monday = WorkoutSession()
```

```python
ValueError: mutable default <class 'list'> for field exercises is not allowed: use default_factory
```
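The reason data classes reject a list default is the classic shared-default pitfall. The sketch below reproduces it with a regular class; `NaiveSession` is a hypothetical name used only for this demonstration.

```python
class NaiveSession:
    # The default list is created once, when the function is defined,
    # and then shared by every instance that does not pass its own list
    def __init__(self, exercises=[]):
        self.exercises = exercises


first = NaiveSession()
second = NaiveSession()

first.exercises.append("Burpees")

print(second.exercises)  # ['Burpees']
print(first.exercises is second.exercises)  # True
```

`default_factory` avoids this by calling the factory anew for each instance, so every session gets its own list.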

In [85]:
from dataclasses import dataclass, field


@dataclass
class WorkoutSession:
    exercises: List[Exercise] = field(default_factory=list)
    duration_minutes: int = 0


hiit_monday = WorkoutSession()  # Each instance now gets its own empty list

In [87]:
@dataclass
class WorkoutSession:
    exercises: List[Exercise] = field(default_factory=list)
    duration_minutes: int = 0

    def add_exercise(self, exercise: Exercise):
        self.exercises.append(exercise)

    def increase_duration(self, minutes: int):
        self.duration_minutes += minutes


hiit_monday = WorkoutSession()

In [88]:
# Add burpees
hiit_monday.add_exercise(ex1)
hiit_monday.increase_duration(10)

# Print the session
print(hiit_monday)

WorkoutSession(exercises=[Exercise(name='Burpees', reps=15, sets=3, weight=0)], duration_minutes=10)


In [89]:
def create_warmup():
    return [
        Exercise("Jumping jacks", 30, 1),
        Exercise("Squat lunges", 10, 2),
        Exercise("High jumps", 20, 1),
    ]

In [91]:
@dataclass
class WorkoutSession:
    exercises: List[Exercise] = field(default_factory=create_warmup)
    duration_minutes: int = 5  # Increase the default duration as well

    def add_exercise(self, exercise: Exercise):
        self.exercises.append(exercise)

    def increase_duration(self, minutes: int):
        self.duration_minutes += minutes


hiit_monday = WorkoutSession()
hiit_monday

WorkoutSession(exercises=[Exercise(name='Jumping jacks', reps=30, sets=1, weight=0), Exercise(name='Squat lunges', reps=10, sets=2, weight=0), Exercise(name='High jumps', reps=20, sets=1, weight=0)], duration_minutes=5)

In [92]:
# add_exercise expects a single Exercise, so add them one by one
for exercise in exercises_monday:
    hiit_monday.add_exercise(exercise)

hiit_monday  # Too verbose

WorkoutSession(exercises=[Exercise(name='Jumping jacks', reps=30, sets=1, weight=0), Exercise(name='Squat lunges', reps=10, sets=2, weight=0), Exercise(name='High jumps', reps=20, sets=1, weight=0), Exercise(name='Burpees', reps=15, sets=3, weight=0), Exercise(name='Mountain Climbers', reps=20, sets=3, weight=0), Exercise(name='Jump Squats', reps=12, sets=3, weight=0)], duration_minutes=5)

In [93]:
@dataclass
class Exercise:
    name: str = "Push-ups"
    reps: int = 10
    sets: int = 3
    weight: float = 0

    def __str__(self):
        base = f"{self.name}: {self.reps}/{self.sets}"
        if self.weight == 0:
            return base
        return base + f", {self.weight} lbs"


ex1 = Exercise(name="Burpees", reps=15, sets=3)
ex1

Exercise(name='Burpees', reps=15, sets=3, weight=0)

In [94]:
print(ex1)

Burpees: 15/3


In [95]:
@dataclass
class WorkoutSession:
    exercises: List[Exercise] = field(default_factory=create_warmup)
    duration_minutes: int = 5  # Increase the default duration as well

    def add_exercise(self, exercise: Exercise):
        self.exercises.append(exercise)

    def increase_duration(self, minutes: int):
        self.duration_minutes += minutes

    def __str__(self):
        base = ""

        for ex in self.exercises:
            base += str(ex) + "\n"
        base += f"\nSession duration: {self.duration_minutes} minutes."

        return base


hiit_monday = WorkoutSession()
print(hiit_monday)

Jumping jacks: 30/1
Squat lunges: 10/2
High jumps: 20/1

Session duration: 5 minutes.


## Comparing data classes

In [96]:
hiit_wednesday = WorkoutSession()

hiit_wednesday.add_exercise(Exercise("Pull-ups", 7, 3))
print(hiit_wednesday)

Jumping jacks: 30/1
Squat lunges: 10/2
High jumps: 20/1
Pull-ups: 7/3

Session duration: 5 minutes.


In [97]:
hiit_monday > hiit_wednesday

TypeError: '>' not supported between instances of 'WorkoutSession' and 'WorkoutSession'

In [108]:
@dataclass(order=True)
class WorkoutSession:
    exercises: List[Exercise] = field(default_factory=create_warmup)
    duration_minutes: int = 5  # Increase the default duration as well

    def add_exercise(self, exercise: Exercise):
        self.exercises.append(exercise)

    def increase_duration(self, minutes: int):
        self.duration_minutes += minutes

    def __str__(self):
        base = ""

        for ex in self.exercises:
            base += str(ex) + "\n"
        base += f"\nSession duration: {self.duration_minutes} minutes."

        return base

In [109]:
hiit_monday = WorkoutSession()
# hiit_monday.add_exercise(...)
hiit_monday.increase_duration(10)

hiit_wednesday = WorkoutSession()

hiit_monday > hiit_wednesday

True
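With `order=True`, instances are compared as if they were tuples of their fields, in the order the fields are declared. In the cell above the two exercise lists are equal, so the comparison falls through to `duration_minutes`. If you want the ordering to depend on the duration alone, one option is to exclude a field with `field(compare=False)`. The sketch below uses a hypothetical `TimedSession` class with plain strings as exercises to keep it short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class TimedSession:
    # compare=False removes this field from ==, <, <=, > and >=
    exercises: List[str] = field(default_factory=list, compare=False)
    duration_minutes: int = 0


short_session = TimedSession(["Pull-ups", "Sit-ups"], 20)
long_session = TimedSession(["Burpees"], 45)

print(short_session < long_session)  # True
print(sorted([long_session, short_session])[0].duration_minutes)  # 20
```

Keep in mind that `compare=False` also removes the field from `__eq__`, so two sessions with different exercises but the same duration would compare as equal.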

## Post-init field manipulation

In [110]:
@dataclass
class WorkoutSession:
    exercises: List[Exercise] = field(default_factory=create_warmup)
    duration_minutes: int = field(default=0, init=False)

    def __post_init__(self):
        set_duration = 3  # Assumed length of a single set, in minutes
        for ex in self.exercises:
            self.duration_minutes += ex.sets * set_duration

    def add_exercise(self, exercise: Exercise):
        self.exercises.append(exercise)

    def increase_duration(self, minutes: int):
        self.duration_minutes += minutes

    def __str__(self):
        base = ""

        for ex in self.exercises:
            base += str(ex) + "\n"
        base += f"\nSession duration: {self.duration_minutes} minutes."

        return base

In [113]:
hiit_friday = WorkoutSession()
hiit_friday.add_exercise(Exercise("Sit-ups", 20, 3))

print(hiit_friday)

Jumping jacks: 30/1
Squat lunges: 10/2
High jumps: 20/1
Sit-ups: 20/3

Session duration: 12 minutes.
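Notice that the duration above stays at 12 minutes even after Sit-ups are added: `__post_init__` runs only once, right after `__init__`. One possible way to keep the value in sync is to update it inside `add_exercise` as well. This is only a sketch; `SimpleExercise`, `TrackedSession` and `MINUTES_PER_SET` are hypothetical names introduced for the illustration.

```python
from dataclasses import dataclass, field
from typing import List

MINUTES_PER_SET = 3  # Same three-minutes-per-set assumption as above


@dataclass
class SimpleExercise:
    name: str
    reps: int
    sets: int


@dataclass
class TrackedSession:
    exercises: List[SimpleExercise] = field(default_factory=list)
    duration_minutes: int = field(default=0, init=False)

    def __post_init__(self):
        # Runs once, after __init__, for the exercises passed at creation time
        self.duration_minutes = sum(ex.sets for ex in self.exercises) * MINUTES_PER_SET

    def add_exercise(self, exercise: SimpleExercise):
        self.exercises.append(exercise)
        # Keep the duration in sync with exercises added later
        self.duration_minutes += exercise.sets * MINUTES_PER_SET


session = TrackedSession([SimpleExercise("Jumping jacks", 30, 1)])
print(session.duration_minutes)  # 3

session.add_exercise(SimpleExercise("Sit-ups", 20, 3))
print(session.duration_minutes)  # 12
```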


## Immutability in data classes

In [114]:
@dataclass(frozen=True)
class FrozenExercise:
    name: str
    reps: int
    sets: int
    weight: int | float = 0  # The X | Y union syntax requires Python 3.10+


ex1 = FrozenExercise("Muscle-ups", 5, 3)
ex1.sets

3

In [115]:
ex1.sets = 5

FrozenInstanceError: cannot assign to field 'sets'

In [116]:
ex1.new_field = 10

FrozenInstanceError: cannot assign to field 'new_field'
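Since a frozen instance cannot be changed in place, the usual way to "modify" it is to create a new instance with some fields swapped out. The standard library offers `dataclasses.replace` for this. A minimal sketch, with the class redefined so the snippet runs on its own:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FrozenExercise:
    name: str
    reps: int
    sets: int
    weight: float = 0


ex1 = FrozenExercise("Muscle-ups", 5, 3)

# replace() returns a new instance and leaves the original untouched
ex2 = replace(ex1, sets=5)

print(ex1.sets, ex2.sets)  # 3 5
```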

In [117]:
@dataclass(frozen=True)
class ImmutableWorkoutSession:
    exercises: List[Exercise] = field(default_factory=create_warmup)
    duration_minutes: int = 5


session1 = ImmutableWorkoutSession()

In [118]:
session1.exercises = exercises_monday

FrozenInstanceError: cannot assign to field 'exercises'

In [119]:
session1.exercises[1] = FrozenExercise("Totally new exercise", 5, 5)

print(session1)

ImmutableWorkoutSession(exercises=[Exercise(name='Jumping jacks', reps=30, sets=1, weight=0), FrozenExercise(name='Totally new exercise', reps=5, sets=5, weight=0), Exercise(name='High jumps', reps=20, sets=1, weight=0)], duration_minutes=5)
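As the output above shows, `frozen=True` only blocks re-assigning the fields; a mutable object stored in a field can still be changed. If that matters, one option is to store the items in an immutable container such as a tuple. The sketch below uses a hypothetical `TupleSession` class with strings in place of `Exercise` objects.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TupleSession:
    # A tuple is immutable, so it can be used directly as a default value
    exercises: Tuple[str, ...] = ("Jumping jacks", "Squat lunges", "High jumps")
    duration_minutes: int = 5


session = TupleSession()

try:
    session.exercises[1] = "Totally new exercise"
except TypeError as err:
    print(err)  # 'tuple' object does not support item assignment
```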


In [120]:
@dataclass(frozen=True)
class ImmutableWorkoutSession:
    exercises: List[Exercise] = field(default_factory=create_warmup)
    duration_minutes: int = 5


@dataclass(frozen=True)
class CardioWorkoutSession(ImmutableWorkoutSession):
    intensity_level: str  # Not allowed: the parent's fields have defaults, so this needs one too

TypeError: non-default argument 'intensity_level' follows default argument
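The error occurs because the parent's fields come first in the generated `__init__`, and they all have defaults. The simplest fix is to give the child field a default too. In the sketch below, `BaseSession`, `CardioSession` and the `"moderate"` default are hypothetical choices made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseSession:
    duration_minutes: int = 5


@dataclass(frozen=True)
class CardioSession(BaseSession):
    intensity_level: str = "moderate"  # A default keeps the field order valid


cardio = CardioSession(intensity_level="high")
print(cardio)  # CardioSession(duration_minutes=5, intensity_level='high')
```

On Python 3.10 and later, declaring the child with `@dataclass(frozen=True, kw_only=True)` is another option, because keyword-only fields are allowed to lack defaults even after defaulted ones.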

## Other parameters of `dataclass` and `field`

## Conclusion and further resources