# Data Class Builders

Using a plain `class` to build a data class requires a boilerplate `__init__`, and the result doesn't provide the basic features you would expect from a data class.

Main issues of using a plain `class` to build a data class:

* the `__repr__` inherited from `object` is not very helpful: it shows only the class name and a memory address
* the `==` operator is meaningless: the `__eq__` inherited from `object` compares object identities, not field values
* comparing two instances by value requires explicitly comparing each attribute

In [1]:
class Coordinate:
    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon

moscow = Coordinate(55.76, 37.62)
location = Coordinate(55.76, 37.62)

In [2]:
moscow

<__main__.Coordinate at 0x2370fa97150>

In [3]:
location == moscow

False

In [4]:
(location.lat, location.lon) == (moscow.lat, moscow.lon)

True
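
For comparison, here is a minimal sketch of the boilerplate a plain class needs before it gets a readable `__repr__` and value-based `==`. The class name `PlainCoordinate` is hypothetical, chosen so it doesn't clash with the notebook's `Coordinate`.

```python
class PlainCoordinate:
    """Hand-written data class: every method below is boilerplate."""

    def __init__(self, lat: float, lon: float) -> None:
        self.lat = lat
        self.lon = lon

    def __repr__(self) -> str:
        # Show the field values instead of the default <... at 0x...> form
        return f'{type(self).__name__}(lat={self.lat!r}, lon={self.lon!r})'

    def __eq__(self, other: object) -> bool:
        # Compare field values; let Python handle unrelated types
        if not isinstance(other, PlainCoordinate):
            return NotImplemented
        return (self.lat, self.lon) == (other.lat, other.lon)


moscow = PlainCoordinate(55.76, 37.62)
location = PlainCoordinate(55.76, 37.62)
print(moscow)              # PlainCoordinate(lat=55.76, lon=37.62)
print(location == moscow)  # True
```

Defining `__eq__` also sets `__hash__` to `None`, so these instances are unhashable unless we write `__hash__` too: yet another piece of boilerplate.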

## namedtuple

`namedtuple` is a factory function that builds a subclass of `tuple` enhanced with a class name and field names.
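
Because the generated class is a `tuple` subclass, its instances also support indexing, unpacking and `len()`, and their fields cannot be reassigned. A short sketch:

```python
from collections import namedtuple

Coordinate = namedtuple('Coordinate', 'lat lon')
moscow = Coordinate(55.76, 37.62)

print(issubclass(Coordinate, tuple))  # True
print(moscow[0], len(moscow))         # 55.76 2
lat, lon = moscow                     # unpacks like a plain tuple
print(lat, lon)                       # 55.76 37.62

assignment_failed = False
try:
    moscow.lat = 0.0                  # fields are read-only
except AttributeError:
    assignment_failed = True
print(assignment_failed)              # True
```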

In [31]:
from collections import namedtuple

Coordinate = namedtuple("Coordinate", "lat lon")

moscow = Coordinate(55.76, 37.62)
location = Coordinate(55.76, 37.62)

In [32]:
moscow

Coordinate(lat=55.76, lon=37.62)

In [33]:
moscow == location

True

In [34]:
moscow.lat

55.76

In [35]:
Coordinate._fields

('lat', 'lon')

In [36]:
moscow._asdict()

{'lat': 55.76, 'lon': 37.62}

In [38]:
import json

json.dumps(moscow._asdict())

'{"lat": 55.76, "lon": 37.62}'
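
Named tuples have a couple more helpers besides `_fields` and `_asdict`: the class method `_make` builds an instance from any iterable, and `_replace` returns a new instance with some fields changed. The original instance is untouched, since tuples are immutable. A sketch:

```python
from collections import namedtuple

Coordinate = namedtuple('Coordinate', 'lat lon')

# _make: build an instance from an iterable, e.g. a row parsed from a file
row = [55.76, 37.62]
moscow = Coordinate._make(row)
print(moscow)    # Coordinate(lat=55.76, lon=37.62)

# _replace: return a modified copy; the original instance is unchanged
shifted = moscow._replace(lon=0.0)
print(shifted)   # Coordinate(lat=55.76, lon=0.0)
print(moscow)    # Coordinate(lat=55.76, lon=37.62)
```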

`namedtuple` accepts the `defaults` keyword-only argument: an iterable of *N* default values for the *N* rightmost fields of the class.

In [40]:
Coordinate = namedtuple("Coordinate", 'lat lon reference', defaults=["WGS84"])
print(Coordinate(0, 0))
print(Coordinate(0, 0, "NEW"))

Coordinate(lat=0, lon=0, reference='WGS84')
Coordinate(lat=0, lon=0, reference='NEW')
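
Defaults are matched to fields from the right, so with two defaults the two rightmost fields become optional. The `_field_defaults` class attribute records which defaults were set. A sketch with a hypothetical extra `altitude` field:

```python
from collections import namedtuple

# Two defaults -> they apply to the two rightmost fields (reference, altitude);
# 'altitude' is a hypothetical field added for illustration
Coordinate = namedtuple('Coordinate', 'lat lon reference altitude',
                        defaults=['WGS84', 0.0])

print(Coordinate._field_defaults)  # {'reference': 'WGS84', 'altitude': 0.0}
print(Coordinate(1, 2))            # Coordinate(lat=1, lon=2, reference='WGS84', altitude=0.0)
```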


## NamedTuple

`typing.NamedTuple` provides the same functionality as `collections.namedtuple`, adding a type annotation to each field.

In [41]:
import typing

Coordinate = typing.NamedTuple('Coordinate',
                              [('lat', float), ('lon', float)])

moscow = Coordinate(55.76, 37.62)
location = Coordinate(55.76, 37.62)

In [42]:
typing.get_type_hints(Coordinate)

{'lat': float, 'lon': float}

In [43]:
moscow

Coordinate(lat=55.76, lon=37.62)

In [44]:
moscow == location

True

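**Tip:** A typed named tuple can also be constructed with the fields given as keyword arguments. This is more readable, and it lets you pass a mapping of field names to types as `**fields_and_types`. Note that this keyword-argument form is deprecated since Python 3.13 and scheduled for removal in Python 3.15; the list-of-pairs form and the `class` syntax are not affected.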

In [45]:
Coordinate = typing.NamedTuple('Coordinate', lat=float, lon=float)

print(typing.get_type_hints(Coordinate))

{'lat': <class 'float'>, 'lon': <class 'float'>}


In [19]:
fields_and_types = {'lat': float, 'lon': float}

Coordinate = typing.NamedTuple('Coordinate', **fields_and_types)

print(typing.get_type_hints(Coordinate))

{'lat': <class 'float'>, 'lon': <class 'float'>}


`typing.NamedTuple` can also be used in a `class` statement with type annotations. This form is more readable and makes it easy to override existing methods or add new ones. It also lets us give default values by assigning them after the type annotation, as with `reference` below.

In [50]:
from typing import NamedTuple

class Coordinate(NamedTuple):
    lat: float
    lon: float
    reference: str = 'WGS84'

    def __str__(self):
        ns = 'N' if self.lat >= 0 else 'S'
        we = 'E' if self.lon >= 0 else 'W'
        return f'{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we} using {self.reference}'

moscow = Coordinate(55.76, 37.62)

In [51]:
print(moscow)

55.8°N, 37.6°E using WGS84


## dataclass

The `@dataclass` decorator supports the same syntax as `typing.NamedTuple` for declaring instance attributes.

In [27]:
from dataclasses import dataclass

@dataclass(frozen=True)
class Coordinate:
    lat: float
    lon: float
    
    def __str__(self):
        ns = 'N' if self.lat >= 0 else 'S'
        we = 'E' if self.lon >= 0 else 'W'
        return f'{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we}'

moscow = Coordinate(55.76, 37.62)

In [28]:
print(moscow)

55.8°N, 37.6°E


## Main features and differences of class builders

### Mutable instances

A key difference between these class builders is that `collections.namedtuple` and `typing.NamedTuple` build `tuple` subclasses, so their instances are **immutable**. In contrast, `@dataclass` produces **mutable** classes by default. The decorator accepts a `frozen` keyword argument: with `frozen=True`, assigning a value to a field after initialization raises an exception (`dataclasses.FrozenInstanceError`).
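
A sketch of the difference: a regular `@dataclass` instance can be updated in place, while assigning to a field of a `frozen=True` instance raises `FrozenInstanceError`, a subclass of `AttributeError`. The class names are hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass
class MutableCoordinate:
    lat: float
    lon: float

@dataclass(frozen=True)
class FrozenCoordinate:
    lat: float
    lon: float

point = MutableCoordinate(55.76, 37.62)
point.lat = 0.0                # allowed: plain dataclasses are mutable
print(point)                   # MutableCoordinate(lat=0.0, lon=37.62)

frozen = FrozenCoordinate(55.76, 37.62)
error = None
try:
    frozen.lat = 0.0           # rejected by the generated __setattr__
except FrozenInstanceError as exc:
    error = exc
print(type(error).__name__)    # FrozenInstanceError
```

With the default `eq=True`, a frozen dataclass also gets a generated `__hash__`, so its instances can be dict keys or set members, while the mutable version has `__hash__` set to `None`.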

### Class statement syntax

Only `typing.NamedTuple` and `@dataclass` support the regular `class` statement syntax; `collections.namedtuple` is always called as a function.

### Construct dict

Both named tuple variants provide the `._asdict()` method to build a `dict` from the fields of an instance; the `dataclasses` module provides the `dataclasses.asdict()` function for the same purpose.
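
A sketch comparing the three builders on the same two fields (class names are hypothetical):

```python
import dataclasses
from collections import namedtuple
from typing import NamedTuple

NTCoordinate = namedtuple('NTCoordinate', 'lat lon')

class TypedCoordinate(NamedTuple):
    lat: float
    lon: float

@dataclasses.dataclass
class DCCoordinate:
    lat: float
    lon: float

# Named tuples expose a method; dataclasses use a module-level function
nt_dict = NTCoordinate(55.76, 37.62)._asdict()
typed_dict = TypedCoordinate(55.76, 37.62)._asdict()
dc_dict = dataclasses.asdict(DCCoordinate(55.76, 37.62))
print(nt_dict, typed_dict, dc_dict)  # each is {'lat': 55.76, 'lon': 37.62}
```

`dataclasses.asdict()` also converts nested dataclass instances (and lists, tuples and dicts containing them) recursively.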

### Get field names and default values

All three class builders let you get the field names and the default values declared for them.

**named tuples**: in the `._fields` and `._field_defaults` class attributes

**dataclass**: via the `dataclasses.fields()` function, which returns a tuple of `Field` objects with attributes such as `name`, `type`, `default` and `default_factory`
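
A sketch of both approaches. A dataclass field declared with `default_factory` has no plain default: its `default` attribute is the `dataclasses.MISSING` sentinel. The class names are hypothetical.

```python
from dataclasses import dataclass, field, fields, MISSING
from typing import NamedTuple

class TupleMember(NamedTuple):
    name: str
    handle: str = ''

@dataclass
class DataMember:
    name: str
    guests: list[str] = field(default_factory=list)

print(TupleMember._fields)          # ('name', 'handle')
print(TupleMember._field_defaults)  # {'handle': ''}

for f in fields(DataMember):
    # MISSING marks "not set" for both default and default_factory
    has_default = f.default is not MISSING or f.default_factory is not MISSING
    print(f.name, f.type, has_default)
```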

### New class at runtime

Although the `class` statement syntax is more readable, the fields are hardcoded. To build data classes at runtime, call `collections.namedtuple` or `typing.NamedTuple` as regular functions, as shown above. The `dataclasses` module provides the `make_dataclass()` function for the same purpose.
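
A sketch building the same class with all three builders from a field specification that is only known at runtime. The `spec` dict is a hypothetical stand-in for, e.g., data read from a config file.

```python
import dataclasses
from collections import namedtuple
from typing import NamedTuple

# Field names and types known only at runtime
spec = {'lat': float, 'lon': float}

NTCoordinate = namedtuple('NTCoordinate', list(spec))
TypedCoordinate = NamedTuple('TypedCoordinate', list(spec.items()))
DCCoordinate = dataclasses.make_dataclass('DCCoordinate', list(spec.items()))

print(NTCoordinate(55.76, 37.62))     # NTCoordinate(lat=55.76, lon=37.62)
print(TypedCoordinate(55.76, 37.62))  # TypedCoordinate(lat=55.76, lon=37.62)
print(DCCoordinate(55.76, 37.62))     # DCCoordinate(lat=55.76, lon=37.62)
```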

### Field options

**Python does not allow parameters without defaults after those with defaults.** For the class builders this means that fields without defaults cannot follow fields with defaults.

When a field needs a mutable default value, a `default_factory` must be used. This naive definition is a mistake:

```python
from dataclasses import dataclass

@dataclass
class ClubMember:
    name: str
    guests: list = []  # mutable default: @dataclass raises ValueError here
```

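`@dataclass` rejects this class definition with a `ValueError`: a default `[]` would be a single list shared by all instances, a common source of bugs. (The check covers `list`, `dict` and `set` defaults; since Python 3.11 any unhashable default is rejected.)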


In [2]:
# Correct implementation of a dataclass with a mutable default field value
from dataclasses import dataclass, field

@dataclass
class ClubMember:
    name: str
    guests: list[str] = field(default_factory=list)
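
A quick check of both claims: the `guests: list = []` version is rejected with `ValueError` when the class is defined, and with `default_factory=list` every instance gets its own list. The member names are made-up sample data.

```python
from dataclasses import dataclass, field

error = None
try:
    @dataclass
    class BrokenClubMember:
        name: str
        guests: list = []        # mutable default: rejected by @dataclass
except ValueError as exc:
    error = exc
print(type(error).__name__)      # ValueError

@dataclass
class ClubMember:
    name: str
    guests: list[str] = field(default_factory=list)  # new list per instance

anna = ClubMember('Anna')
leo = ClubMember('Leo')
anna.guests.append('Bob')
print(anna.guests, leo.guests)   # ['Bob'] []
```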

### Post-init Processing

The `__init__` method generated by `@dataclass` only takes the arguments passed and assigns them (or their default values) to the instance fields. If you need to do more during initialization, define a `__post_init__` method. When that method exists, `@dataclass` adds code to `__init__` that calls `__post_init__` as its last step.

In [7]:
from dataclasses import dataclass
from typing import ClassVar

@dataclass 
class HackerClubMember(ClubMember):
    all_handles: ClassVar[set[str]] = set()
    handle: str = ''

    def __post_init__(self):
        cls = self.__class__
        if self.handle == '':
            self.handle = self.name.split()[0]
        if self.handle in cls.all_handles:
            msg = f"handle {self.handle} already exists"
            raise ValueError(msg)
        cls.all_handles.add(self.handle)

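To annotate a class variable in a `@dataclass`, wrap its type in `typing.ClassVar`. Otherwise `@dataclass` would treat `all_handles` as an instance field (and reject its mutable `set()` default).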
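
A usage sketch of `HackerClubMember`; the two classes are repeated so the block runs on its own, and the member names are made-up sample data. It shows that the `ClassVar` attribute is not a field, that `__post_init__` derives a missing handle from the first name, and that a duplicate handle is rejected.

```python
from dataclasses import dataclass, field, fields
from typing import ClassVar

@dataclass
class ClubMember:
    name: str
    guests: list[str] = field(default_factory=list)

@dataclass
class HackerClubMember(ClubMember):
    all_handles: ClassVar[set[str]] = set()  # shared class attribute, not a field
    handle: str = ''

    def __post_init__(self):
        cls = self.__class__
        if self.handle == '':
            self.handle = self.name.split()[0]
        if self.handle in cls.all_handles:
            raise ValueError(f'handle {self.handle} already exists')
        cls.all_handles.add(self.handle)

# ClassVar attributes are excluded from the fields and from __init__
print([f.name for f in fields(HackerClubMember)])  # ['name', 'guests', 'handle']

ada = HackerClubMember('Ada Lovelace', handle='ada')
alan = HackerClubMember('Alan Turing')  # handle derived from the first name
print(alan.handle)                      # Alan

error = None
try:
    HackerClubMember('Alan Kay')        # would also get the handle 'Alan'
except ValueError as exc:
    error = exc
print(error)                            # handle Alan already exists
```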

#### @dataclass use example

In [26]:
from dataclasses import dataclass, field, fields
from enum import Enum, auto
import datetime

class ResourceType(Enum):
    BOOK = auto()
    EBOOK = auto()
    VIDEO = auto()

@dataclass
class Resource:
    identifier: str
    title: str = '<untitled>'
    creators: list[str] = field(default_factory=list)
    date: datetime.date | None = None
    type: ResourceType = ResourceType.BOOK
    description: str = ''
    language: str = ''
    subject: list[str] = field(default_factory=list)

    def __repr__(self):
        cls = self.__class__
        indent = ' ' * 4
        res = [f'{cls.__name__}(']
        for f in fields(cls):
            value = getattr(self, f.name)
            res.append(f'{indent}{f.name} = {value!r},')
        res.append(')')
        return '\n'.join(res)

In [27]:
description = 'Improving the design of existing code'
book = Resource('978-0-13-475759-9', 'Refactoring, 2nd Edition',
    ['Martin Fowler', 'Kent Beck'], datetime.date(2018, 11, 19),
    ResourceType.BOOK, description, 'EN',
    ['computer programming', 'OOP'])

In [28]:
book

Resource(
    identifier = '978-0-13-475759-9',
    title = 'Refactoring, 2nd Edition',
    creators = ['Martin Fowler', 'Kent Beck'],
    date = datetime.date(2018, 11, 19),
    type = <ResourceType.BOOK: 1>,
    description = 'Improving the design of existing code',
    language = 'EN',
    subject = ['computer programming', 'OOP'],
)

### Pattern Matching Class Instances

#### Simple class patterns

The syntax for class patterns looks like a constructor invocation.

```python
match x:
    case float():
        do_something_with(x)
```

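This matches any `float` value without binding a variable.

**Remember the parentheses:** without them, `case float:` is a capture pattern that matches any subject and binds it to a variable named `float`.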

#### Keyword class patterns

In [29]:
import typing

class City(typing.NamedTuple):
    continent: str
    name: str
    country: str

cities = [
    City('Asia', 'Tokyo', 'JP'),
    City('Asia', 'Delhi', 'IN'),
    City('North America', 'Mexico City', 'MX'),
    City('North America', 'New York', 'US'),
    City('South America', 'São Paulo', 'BR'),
]

In [32]:
results = []
for city in cities:
    match city:
        case City(continent='Asia', country=cc):  # matches any Asian city and binds its country to cc
            results.append(cc)
print(results)

['JP', 'IN']


#### Positional class patterns

In [34]:
results = []
for city in cities:
    match city:
        case City('Asia', _, country):
            results.append(country)
print(results)

['JP', 'IN']
