# Python Data Classes Tutorial: N Things You Must Learn About Data Classes

## Why learn about data classes?

Data classes are one of those Python features that, once discovered, you never go back from. Consider this regular class:

In [1]:
class Exercise:
    def __init__(self, name, reps, sets, weight):
        self.name = name
        self.reps = reps
        self.sets = sets
        self.weight = weight

To me, that class definition is inefficient: in the `__init__` method, you repeat each parameter three times. This may not sound like a big deal, but think about how many classes you will write over your lifetime, many of them with far more parameters.

In comparison, take a look at the data classes alternative of the above code:

In [1]:
from dataclasses import dataclass


@dataclass
class Exercise:
    name: str
    reps: int
    sets: int
    weight: float  # Weight in lbs

This modest-looking piece of code is a big improvement over the regular class. The tiny `@dataclass` decorator generates the `__init__`, `__repr__`, and `__eq__` methods behind the scenes, which would take many lines of boilerplate to write by hand. On top of that, features such as ordering comparisons (`<`, `>`, and so on) and immutability are each a single argument away, for example `@dataclass(order=True)` or `@dataclass(frozen=True)`.
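To make that concrete, here is a rough, hand-written sketch of the three methods `@dataclass` adds to `Exercise`. It approximates the generated code and does not reproduce it exactly. `HandWrittenExercise` is a hypothetical name used only for this comparison:

```python
class HandWrittenExercise:
    """Roughly what @dataclass generates for Exercise, written out by hand."""

    def __init__(self, name: str, reps: int, sets: int, weight: float):
        self.name = name
        self.reps = reps
        self.sets = sets
        self.weight = weight

    def __repr__(self):
        return (
            f"{type(self).__name__}(name={self.name!r}, reps={self.reps!r}, "
            f"sets={self.sets!r}, weight={self.weight!r})"
        )

    def __eq__(self, other):
        # Only compare against instances of exactly the same class
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.name, self.reps, self.sets, self.weight) == (
            other.name,
            other.reps,
            other.sets,
            other.weight,
        )


print(HandWrittenExercise("Bench press", 10, 3, 52.5))
# HandWrittenExercise(name='Bench press', reps=10, sets=3, weight=52.5)
```

That is a lot of repetition for four attributes, and every new field means touching all three methods.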

So, the purpose of this tutorial is to show you why data classes are the best thing to happen to Python if you love object-oriented programming. 

Let's get started!

## Basics of data classes

1. Defining data classes
2. The automatically generated methods
3. Type hints are required but not enforced
4. Fields accept any type from the `typing` module
5. Creating data classes on the fly with `make_dataclass`
6. Adding default values
7. Fields with defaults must come after fields without defaults

1. Defining data classes

In [None]:
from dataclasses import dataclass


@dataclass
class Exercise:
    name: str
    reps: int
    sets: int
    weight: float


ex1 = Exercise("Bench press", 10, 3, 52.5)

2. The automatically generated methods

In [None]:
ex1

In [None]:
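# A separate name keeps the data class `Exercise` intact for the cells below
class RegularExercise:
    def __init__(self, name, reps, sets, weight):
        self.name = name
        self.reps = reps
        self.sets = sets
        self.weight = weight


regular_ex = RegularExercise("Bench press", 10, 3, 52.5)

regular_ex  # The default repr only shows the class and memory address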

In [None]:
ex2 = Exercise("Bench press", 10, 3, 52.5)

In [None]:
ex1 == ex1

In [None]:
ex1 == ex2
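`ex1 == ex2` is `True` because the generated `__eq__` compares field values. A regular class without `__eq__` falls back to comparing identity instead. The self-contained sketch below puts the two side by side (both class names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class DataExercise:
    name: str
    reps: int


class PlainExercise:
    def __init__(self, name, reps):
        self.name = name
        self.reps = reps


# The generated __eq__ compares the fields in order
print(DataExercise("Squat", 5) == DataExercise("Squat", 5))  # True

# Without __eq__, Python checks whether both names point to the same object
print(PlainExercise("Squat", 5) == PlainExercise("Squat", 5))  # False
```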

In [83]:
from typing import Any


@dataclass
class Dummy:
    attr: Any
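`Any` is just one option. Annotations from the `typing` module such as `Optional`, `List`, and `Dict` work the same way. Here is a minimal sketch with a hypothetical `TrainingLog` class:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TrainingLog:
    athlete: str
    personal_bests: Dict[str, float]  # exercise name -> best weight
    tags: List[str]
    coach: Optional[str] = None  # None when training alone


log = TrainingLog("Alex", {"Deadlift": 140.0}, ["strength"])
print(log)
# TrainingLog(athlete='Alex', personal_bests={'Deadlift': 140.0}, tags=['strength'], coach=None)
```

As with plain `str` or `int`, these annotations document intent. The values themselves are not checked.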

In [None]:
silly_exercise = Exercise("Bench press", "ten", "three sets", 52.5)

silly_exercise.sets
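Because the annotations are never checked, `"three sets"` happily ends up in `sets`. If you want a safety net, one option is to validate the values yourself in `__post_init__`, a hook covered later in this tutorial. `@dataclass` does not do this for you. The sketch below uses a hypothetical `CheckedExercise` class. Its check only understands plain classes such as `str` and `int`, not `typing` constructs like `List[str]`:

```python
from dataclasses import dataclass, fields


@dataclass
class CheckedExercise:
    name: str
    reps: int
    sets: int
    weight: float

    def __post_init__(self):
        # Compare each value with its annotation; only plain classes are checked
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(f.type, type) and not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} should be {f.type.__name__}, got {type(value).__name__}"
                )


CheckedExercise("Bench press", 10, 3, 52.5)  # passes the checks

try:
    CheckedExercise("Bench press", "ten", "three sets", 52.5)
except TypeError as err:
    print(err)  # reps should be int, got str
```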

In [None]:
from dataclasses import make_dataclass

Exercise = make_dataclass(
    "Exercise",
    [
        ("name", str),
        ("reps", int),
        ("sets", int),
        ("weight", float),
    ],
)

ex3 = Exercise("Deadlifts", 8, 3, 69.0)
ex3
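`make_dataclass` accepts the same keyword options as the decorator, such as `frozen` or `order`. You can also give a field as a `(name, type, field(...))` triple to set a default. Here is a minimal sketch (the `Stretch` class is made up for illustration):

```python
from dataclasses import field, make_dataclass

# Each field is a name, a (name, type) pair, or a (name, type, Field) triple
Stretch = make_dataclass(
    "Stretch",
    [
        ("name", str),
        ("seconds", int, field(default=30)),
    ],
    frozen=True,  # same keyword arguments as @dataclass
)

s = Stretch("Hamstring stretch")
print(s)  # Stretch(name='Hamstring stretch', seconds=30)
```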

In [None]:
@dataclass
class Exercise:
    name: str = "Push-ups"
    reps: int = 10
    sets: int = 3
    weight: float = 0


ex5 = Exercise()
ex5

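To customize an individual field, assign it the result of `dataclasses.field()`. Besides a `default`, `field()` accepts a `metadata` mapping where you can attach arbitrary extra information to the field.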
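Here is a small sketch of `field()` with `metadata`. The `WeightedExercise` class and the `"unit"` key are hypothetical. The `dataclasses` module never reads metadata itself, so your own code has to look it up:

```python
from dataclasses import dataclass, field, fields


@dataclass
class WeightedExercise:
    name: str
    reps: int = 10
    sets: int = 3
    # metadata is stored on the field; dataclasses itself ignores it
    weight: float = field(default=0.0, metadata={"unit": "lbs"})


# fields() returns the Field objects in definition order
weight_field = next(f for f in fields(WeightedExercise) if f.name == "weight")
print(weight_field.metadata["unit"])  # lbs
print(WeightedExercise("Rows"))  # WeightedExercise(name='Rows', reps=10, sets=3, weight=0.0)
```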

In [None]:
@dataclass
class Exercise:
    name: str = "Push-ups"
    reps: int = 10
    sets: int = 3
    weight: float  # NOT ALLOWED: raises TypeError, no default after fields with defaults


ex5 = Exercise()
ex5
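The error is raised while the class is being defined, not when it is instantiated. The sketch below catches it and then shows the usual fix: declare the fields without defaults first. The class names are hypothetical:

```python
from dataclasses import dataclass

error_message = ""
try:
    # The decorator raises while building __init__, so the class is never bound
    @dataclass
    class BrokenExercise:
        name: str = "Push-ups"
        weight: float  # no default, yet it follows a field with one
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)


# The fix: declare the fields without defaults first
@dataclass
class FixedExercise:
    weight: float
    name: str = "Push-ups"


print(FixedExercise(0.0))  # FixedExercise(weight=0.0, name='Push-ups')
```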

## Flexibility in data classes

### Default factories
1. Show how to handle mutable default values of fields



In [None]:
from dataclasses import dataclass
from typing import List


@dataclass
class Exercise:
    name: str = "Push-ups"
    reps: int = 10
    sets: int = 3
    weight: float = 0


@dataclass
class WorkoutSession:
    exercises: List[Exercise]
    duration_minutes: int

In [None]:
# Define the Exercise instances for HIIT training
ex1 = Exercise(name="Burpees", reps=15, sets=3)
ex2 = Exercise(name="Mountain Climbers", reps=20, sets=3)
ex3 = Exercise(name="Jump Squats", reps=12, sets=3)
exercises_monday = [ex1, ex2, ex3]

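hiit_monday = WorkoutSession(exercises=exercises_monday, duration_minutes=30)

Passing both arguments every time gets tedious, so it is tempting to give each field a default. An empty list looks like the natural default for `exercises`:

```python
@dataclass
class WorkoutSession:
    exercises: List[Exercise] = []
    duration_minutes: int = 0


hiit_monday = WorkoutSession()
```

However, the class definition itself fails:

```python
ValueError: mutable default <class 'list'> for field exercises is not allowed: use default_factory
```

A list default would be created once and shared by every instance, so `dataclasses` rejects it. Instead, `field(default_factory=list)` tells the generated `__init__` to call `list()` and build a fresh list for each new instance:
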
In [None]:
from dataclasses import dataclass, field


@dataclass
class WorkoutSession:
    exercises: List[Exercise] = field(default_factory=list)
    duration_minutes: int = 0


hiit_monday = WorkoutSession()
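You can check that each instance really gets its own list. Here is a minimal sketch with a hypothetical `Session` class that stores exercise names as strings:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    # default_factory calls list() separately for every new instance
    exercises: List[str] = field(default_factory=list)


monday = Session()
tuesday = Session()
monday.exercises.append("Burpees")

print(monday.exercises)  # ['Burpees']
print(tuesday.exercises)  # []
print(monday.exercises is tuesday.exercises)  # False
```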

In [None]:
@dataclass
class WorkoutSession:
    exercises: List[Exercise] = field(default_factory=list)
    duration_minutes: int = 0

    def add_exercise(self, exercise: Exercise):
        self.exercises.append(exercise)

    def increase_duration(self, minutes: int):
        self.duration_minutes += minutes


hiit_monday = WorkoutSession()

In [None]:
# Add burpees
hiit_monday.add_exercise(ex1)
hiit_monday.increase_duration(10)

# Print the session
print(hiit_monday)

In [None]:
def create_warmup():
    return [
        Exercise("Jumping jacks", 30, 1),
        Exercise("Squat lunges", 10, 2),
        Exercise("High jumps", 20, 1),
    ]

In [None]:
@dataclass
class WorkoutSession:
    exercises: List[Exercise] = field(default_factory=create_warmup)
    duration_minutes: int = 5  # Increase the default duration as well

    def add_exercise(self, exercise: Exercise):
        self.exercises.append(exercise)

    def increase_duration(self, minutes: int):
        self.duration_minutes += minutes


hiit_monday = WorkoutSession()
hiit_monday

In [None]:
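# add_exercise expects one Exercise, so add the list items one at a time
for ex in exercises_monday:
    hiit_monday.add_exercise(ex)

hiit_monday  # The generated repr gets verbose quickly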
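Before writing a custom method, note that `field(repr=False)` can leave a field out of the generated `__repr__` entirely. Here is a minimal sketch with a hypothetical `CompactSession` class:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CompactSession:
    # Hidden from the generated __repr__, but still a normal attribute
    exercises: List[str] = field(default_factory=list, repr=False)
    duration_minutes: int = 0


session = CompactSession(["Burpees", "Mountain Climbers"], 30)
print(session)  # CompactSession(duration_minutes=30)
print(session.exercises)  # ['Burpees', 'Mountain Climbers']
```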

### `__repr__` and `__str__` in data classes
1. Show how to override `__str__` in data classes



In [None]:
@dataclass
class Exercise:
    name: str = "Push-ups"
    reps: int = 10
    sets: int = 3
    weight: float = 0

    def __str__(self):
        base = f"{self.name}: {self.reps}/{self.sets}"
        if self.weight == 0:
            return base
        return base + f", {self.weight} lbs"


ex1 = Exercise(name="Burpees", reps=15, sets=3)
ex1

In [None]:
print(ex1)
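Defining `__str__` leaves the generated `__repr__` in place. `print()` and f-strings use your readable version, while the notebook display and containers such as lists keep using `__repr__`. That is also why printing a list of exercises, or a `WorkoutSession` without its own `__str__`, still shows the long form. Here is a self-contained check with a hypothetical `Lift` class:

```python
from dataclasses import dataclass


@dataclass
class Lift:
    name: str
    weight: float

    def __str__(self):
        return f"{self.name}: {self.weight} lbs"


lift = Lift("Deadlift", 225.0)
print(lift)  # Deadlift: 225.0 lbs  (print uses __str__)
print(repr(lift))  # Lift(name='Deadlift', weight=225.0)
print([lift])  # [Lift(name='Deadlift', weight=225.0)]  (containers use __repr__)
```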

In [None]:
@dataclass
class WorkoutSession:
    exercises: List[Exercise] = field(default_factory=create_warmup)
    duration_minutes: int = 5  # Increase the default duration as well

    def add_exercise(self, exercise: Exercise):
        self.exercises.append(exercise)

    def increase_duration(self, minutes: int):
        self.duration_minutes += minutes

    def __str__(self):
        base = ""

        for ex in self.exercises:
            base += str(ex) + "\n"
        base += f"\nSession duration: {self.duration_minutes} minutes."

        return base


hiit_monday = WorkoutSession()
print(hiit_monday)

### Comparison in data classes
1. Show that comparisons are made field by field, in the order the fields are defined
2. Show how to set `order=True`

In [None]:
hiit_wednesday = WorkoutSession()

hiit_wednesday.add_exercise(Exercise("Pull-ups", 7, 3))
print(hiit_wednesday)

In [None]:
hiit_monday > hiit_wednesday  # TypeError: <, >, <=, >= are not generated by default

In [None]:
@dataclass(order=True)
class WorkoutSession:
    exercises: List[Exercise] = field(default_factory=create_warmup)
    duration_minutes: int = 5  # Increase the default duration as well

    def add_exercise(self, exercise: Exercise):
        self.exercises.append(exercise)

    def increase_duration(self, minutes: int):
        self.duration_minutes += minutes

    def __str__(self):
        base = ""

        for ex in self.exercises:
            base += str(ex) + "\n"
        base += f"\nSession duration: {self.duration_minutes} minutes."

        return base

In [None]:
hiit_monday = WorkoutSession()
# hiit_monday.add_exercise(...)
hiit_monday.increase_duration(10)

hiit_wednesday = WorkoutSession()

hiit_monday > hiit_wednesday
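With `order=True`, instances are compared as if they were tuples of their fields, in the order the fields are defined. A small sketch with a hypothetical `PersonalBest` class shows the rule and checks it against `dataclasses.astuple`:

```python
from dataclasses import astuple, dataclass


@dataclass(order=True)
class PersonalBest:
    weight: float  # compared first
    reps: int  # only consulted when the weights are equal


bests = [PersonalBest(100.0, 5), PersonalBest(100.0, 3), PersonalBest(90.0, 8)]
ranked = sorted(bests)
print(ranked)
# [PersonalBest(weight=90.0, reps=8), PersonalBest(weight=100.0, reps=3), PersonalBest(weight=100.0, reps=5)]

a, b = PersonalBest(100.0, 5), PersonalBest(100.0, 3)
print(a > b, astuple(a) > astuple(b))  # True True
```

Reordering the field definitions changes the ranking, so put the field that matters most first.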

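Here both sessions start with the same warm-up, so their `exercises` lists are equal and the comparison falls through to `duration_minutes` (15 vs 5). If the lists held different exercises at the same position, Python would have to compare two `Exercise` objects, which raises a `TypeError` because `Exercise` is not ordered. Passing `compare=False` to `field()` leaves a field out of the generated comparison methods.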
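Here is a sketch of `compare=False`, using hypothetical `Move` and `RankedSession` classes. `Move` is deliberately not ordered, like `Exercise`. Because `exercises` is excluded from comparisons, sessions are ranked by `duration_minutes` alone:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Move:  # no order=True, so two Moves can't be compared with < or >
    name: str


@dataclass(order=True)
class RankedSession:
    # Left out of __eq__, <, >, <= and >=
    exercises: List[Move] = field(default_factory=list, compare=False)
    duration_minutes: int = 0


shorter = RankedSession([Move("Burpees")], 20)
longer = RankedSession([Move("Sit-ups")], 45)

print(longer > shorter)  # True
print(RankedSession([Move("Burpees")], 30) == RankedSession([Move("Squats")], 30))  # True
```

Without `compare=False`, `longer > shorter` would compare `Move("Sit-ups")` with `Move("Burpees")` and raise a `TypeError`.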

### Post-init field manipulation

In [None]:
@dataclass
class WorkoutSession:
    exercises: List[Exercise] = field(default_factory=create_warmup)
    duration_minutes: int = field(default=0, init=False)

    def __post_init__(self):
        set_duration = 3
        for ex in self.exercises:
            self.duration_minutes += ex.sets * set_duration

    def add_exercise(self, exercise: Exercise):
        self.exercises.append(exercise)

    def increase_duration(self, minutes: int):
        self.duration_minutes += minutes

    def __str__(self):
        base = ""

        for ex in self.exercises:
            base += str(ex) + "\n"
        base += f"\nSession duration: {self.duration_minutes} minutes."

        return base

In [None]:
hiit_friday = WorkoutSession()
hiit_friday.add_exercise(Exercise("Sit-ups", 20, 3))

print(hiit_friday)
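Because `duration_minutes` is declared with `init=False`, it is not a constructor parameter. Its value comes only from `__post_init__`, which runs right after the generated `__init__`. Notice also that `add_exercise` above does not update the duration, so the Sit-ups are not counted. The self-contained sketch below shows both behaviours and keeps the duration in sync by recomputing it whenever an exercise is added. The `TimedSession` name and the `(name, sets)` tuples are hypothetical simplifications, and the 3-minutes-per-set rule is the assumption from the cell above:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SET_DURATION = 3  # assumed minutes per set, as in the cell above


@dataclass
class TimedSession:
    # Each exercise is a (name, sets) pair to keep the sketch short
    exercises: List[Tuple[str, int]] = field(default_factory=list)
    duration_minutes: int = field(default=0, init=False)

    def __post_init__(self):
        # Runs automatically at the end of the generated __init__
        self._recompute()

    def _recompute(self):
        self.duration_minutes = sum(sets * SET_DURATION for _, sets in self.exercises)

    def add_exercise(self, name: str, sets: int):
        self.exercises.append((name, sets))
        self._recompute()


session = TimedSession([("Jumping jacks", 1), ("Squat lunges", 2)])
print(session.duration_minutes)  # 9
session.add_exercise("Sit-ups", 3)
print(session.duration_minutes)  # 18

try:
    TimedSession([], duration_minutes=10)  # init=False: not a parameter
except TypeError as err:
    print("TypeError:", err)
```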

## Immutability in data classes

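By default, data class instances are mutable, just like regular objects:

In [None]:
ex1.sets = 5  # Works: fields can be reassigned

In [None]:
ex1.new_field = 10  # Also works: you can even add attributes that aren't fields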
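With `frozen=True`, both of those assignments fail with `dataclasses.FrozenInstanceError`, a subclass of `AttributeError`. Here is a minimal sketch using a small `FrozenExercise` class defined just for illustration:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FrozenExercise:
    name: str
    reps: int
    sets: int


squats = FrozenExercise("Squats", 10, 3)

blocked = []
for attr, value in [("sets", 5), ("new_field", 10)]:
    try:
        setattr(squats, attr, value)
    except FrozenInstanceError as err:
        # Both reassigning a field and adding a new attribute are rejected
        blocked.append(attr)
        print(f"{attr}: {err}")
```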

In [None]:
@dataclass(frozen=True)
class ImmutableWorkoutSession:
    exercises: List[Exercise] = field(default_factory=create_warmup)
    duration_minutes: int = 5


session1 = ImmutableWorkoutSession()

In [None]:
session1.exercises = exercises_monday  # Raises FrozenInstanceError

In [None]:
# Works: frozen only blocks reassigning fields, not mutating the list stored in one
session1.exercises[1] = Exercise("Totally new exercise", 5, 5)

print(session1)
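`frozen=True` only stops you from reassigning fields. It does not freeze the objects those fields point to, which is why the list above can still be changed. For deeper immutability, one option is to store immutable values, such as a tuple of frozen instances. As a bonus, `frozen=True` together with the default `eq=True` also generates `__hash__`. Here is a sketch with hypothetical `FrozenMove` and `FrozenSession` classes:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FrozenMove:
    name: str
    reps: int


@dataclass(frozen=True)
class FrozenSession:
    # A tuple cannot be modified in place, unlike a list
    exercises: Tuple[FrozenMove, ...] = ()
    duration_minutes: int = 5


session = FrozenSession((FrozenMove("Burpees", 15), FrozenMove("Sit-ups", 20)))

try:
    session.exercises[0] = FrozenMove("Squats", 10)
except TypeError as err:
    print("TypeError:", err)  # tuples don't support item assignment

# Equal frozen sessions hash the same, so they collapse into one set entry
unique_sessions = {session, FrozenSession(session.exercises)}
print(len(unique_sessions))  # 1
```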

## Inheritance in data classes

1. Inheritance works as usual
2. Just make sure fields without defaults don't follow fields with defaults, including the ones inherited from the parent class

In [None]:
@dataclass(frozen=True)
class ImmutableWorkoutSession:
    exercises: List[Exercise] = field(default_factory=create_warmup)
    duration_minutes: int = 5


@dataclass(frozen=True)
class CardioWorkoutSession(ImmutableWorkoutSession):
    intensity_level: str  # NOT ALLOWED: raises TypeError because the inherited fields have defaults
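The subclass's fields are appended after the parent's, so the simplest fix is to give the new field a default too. Note also that a frozen data class can only be subclassed by another frozen data class. Here is a sketch with hypothetical `BaseSession` and `CardioSession` classes:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BaseSession:
    exercises: Tuple[str, ...] = ()
    duration_minutes: int = 5


@dataclass(frozen=True)  # a frozen parent requires a frozen child
class CardioSession(BaseSession):
    # A default keeps this field valid after the parent's defaulted fields
    intensity_level: str = "moderate"


run = CardioSession(("Treadmill run",), 20, "high")
print(run)
# CardioSession(exercises=('Treadmill run',), duration_minutes=20, intensity_level='high')
print(CardioSession().intensity_level)  # moderate
```

On Python 3.10 and later, `@dataclass(frozen=True, kw_only=True)` on the subclass is another option. Keyword-only fields are placed after the regular ones in `__init__`, so `intensity_level` could stay without a default, at the cost of always passing it by name.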

## Conclusion and further resources