# 05 Data Class Builders
Some notes, observations and questions along chapter 05.

Python offers different ways to create a new class easily. Those new classes are referred to as data classes; they are collections of fields, with little or no extra functionality.

These are three ways of making them:

collections.namedtuple
- simplest way
- factory function that builds a subclass of tuple with the name and fields you specify

typing.NamedTuple
- alternative requiring type hints

@dataclasses.dataclass
- class decorator that allows more customization


Question: I wonder how these differ from other ready-made classes like `OrderedDict`, `Enum` or `Counter`. The classes here have different base classes (`tuple` for the named tuples, plain `object` for `@dataclass`), and I suppose those others do too. Is the only difference which methods (`__repr__`, `__eq__`, etc.) are generated or inherited, and how? Or is there a further, more structural difference?
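One way to probe this question is to look at `type(cls)` (the metaclass) and `cls.__mro__` (the base classes) for each kind of class. The following is only a sketch; `PointA`, `PointB`, `PointC` and `Color` are hypothetical names made up for the comparison.

```python
from collections import Counter, OrderedDict, namedtuple
from dataclasses import dataclass
from enum import Enum
from typing import NamedTuple

PointA = namedtuple('PointA', 'x y')

class PointB(NamedTuple):
    x: float
    y: float

@dataclass
class PointC:
    x: float
    y: float

class Color(Enum):
    RED = 1

# type(cls) is the metaclass; `object` is a base class, not a metaclass
metaclasses = {cls.__name__: type(cls).__name__
               for cls in (PointA, PointB, PointC, OrderedDict, Counter, Color)}
print(metaclasses)

# the base classes are where these classes really differ
print(PointA.__mro__)   # a tuple subclass
print(PointC.__mro__)   # a direct object subclass
print(Counter.__mro__)  # a dict subclass
```

In this sketch only `Color` reports a metaclass other than `type`, so `Enum` seems to be the structurally special one; the other classes differ in their base classes and in which methods they define.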

### Overview of Data Class Builders

Classes made with these builders come with extra functionality that a plain class inheriting from `object` doesn't have:

In [1]:
# simple class inheriting from object
class Coordinate:

    # we need to type an explicit __init__:
    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon

object's default `__repr__` is not very helpful:

In [5]:
moscow = Coordinate(55.76, 37.62)
moscow

<__main__.Coordinate at 0x7a89300227b0>

`==` is not meaningful either; the `__eq__` method inherited from `object` compares object IDs:

In [6]:
location = Coordinate(55.76, 37.62)
location == moscow

False

Comparing two coordinates requires an explicit comparison of each attribute:

In [7]:
(location.lat, location.lon) == (moscow.lat, moscow.lon)

True

In comparison: a data class created with namedtuple:

In [7]:
from collections import namedtuple
# field_names can be a single string with the names separated by whitespace and/or commas
Coordinate = namedtuple('Coordinate', 'lat lon')
issubclass(Coordinate, tuple)

True

In [8]:
moscow = Coordinate(55.756, 37.617)
moscow

Coordinate(lat=55.756, lon=37.617)

In [9]:
moscow == Coordinate(lat=55.756, lon=37.617)

True

In [10]:
Coordinate.mro()

[__main__.Coordinate, tuple, object]

In contrast, the @dataclass decorator doesn't change the base classes of the decorated class: it is still a direct subclass of `object`. (The metaclass is `type` in both cases.)

In [11]:
from dataclasses import dataclass

@dataclass(frozen=True)
class Coordinate:
    lat: float
    lon: float

Coordinate.mro()

[__main__.Coordinate, object]

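This is how I think it works: the decorator does not wrap our class in another class. It adds the generated methods (`__init__`, `__repr__`, `__eq__`, ...) to the class itself and returns that same class (at least without `slots=True`). The namedtuple factory instead builds a new subclass of `tuple`; its metaclass is still `type`, since `object` is a base class and not a metaclass.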
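A quick way to check what the decorator does is to apply `dataclass` as a plain function and compare the class namespace before and after. This is a sketch only; `Plain` is a hypothetical class, and the exact set of added names varies between Python versions.

```python
from dataclasses import dataclass

class Plain:
    lat: float
    lon: float

before = set(vars(Plain))
Decorated = dataclass(Plain)    # same effect as writing @dataclass above the class
added = set(vars(Plain)) - before

print(Decorated is Plain)       # is it the very same class object?
print(sorted(added))            # names that the decorator attached to the class
print(type(Plain))              # the metaclass
```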

#### Differences and similarities:

Mutability

- namedtuple() and NamedTuple create immutable tuple subclasses
- @dataclass creates mutable classes by default, but we can set `frozen=True` to make instances immutable (comment: the generated `__setattr__` and `__delattr__` then raise `FrozenInstanceError`; by default, assignment simply sets the attribute)
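The mutability rules above can be checked with a short sketch. The class names `PointT`, `PointMutable` and `PointFrozen` are hypothetical.

```python
from collections import namedtuple
from dataclasses import dataclass, FrozenInstanceError

PointT = namedtuple('PointT', 'lat lon')

@dataclass
class PointMutable:
    lat: float
    lon: float

@dataclass(frozen=True)
class PointFrozen:
    lat: float
    lon: float

m = PointMutable(1.0, 2.0)
m.lat = 9.0                          # fine: dataclasses are mutable by default

errors = []
for obj in (PointT(1.0, 2.0), PointFrozen(1.0, 2.0)):
    try:
        obj.lat = 9.0
    except AttributeError as exc:    # FrozenInstanceError subclasses AttributeError
        errors.append(type(exc).__name__)

print(m)
print(errors)
```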

Class Statement Syntax
- only `typing.NamedTuple` and `dataclass` support the regular `class` statement syntax; `namedtuple()` is a factory function that we call like any other function

### Classic Named Tuples
namedtuple() is a factory function that creates classes that are very memory efficient and have a nice `__repr__` and comparison operators (`__eq__`, `__lt__`, and so forth).

In [13]:
from collections import namedtuple
# to build a class, we need to provide a class name ('City') and a list of field names,
# which can also be a single string with a separator ('name country population coordinates')
City = namedtuple('City', 'name country population coordinates')

tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
tokyo


City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))

In [16]:
# We can access fields by name or by position.
tokyo.coordinates

(35.689722, 139.691667)

In [15]:
tokyo[1]

'JP'

There are also the `_fields` class attribute, the `_make(iterable)` class method and the `_asdict()` instance method:

In [17]:
City._fields

('name', 'country', 'population', 'coordinates')

In [19]:
# _make() builds City from an iterable; City(*delhi_data) would do the same:
Coordinate = namedtuple('Coordinate', 'lat lon')
delhi_data = ('Delhi NCR', 'IN', 21.935, Coordinate(28.613889, 77.208889))
delhi = City._make(delhi_data)
delhi

City(name='Delhi NCR', country='IN', population=21.935, coordinates=Coordinate(lat=28.613889, lon=77.208889))

In [20]:
delhi._asdict()
# we could now build a JSON from it

{'name': 'Delhi NCR',
 'country': 'IN',
 'population': 21.935,
 'coordinates': Coordinate(lat=28.613889, lon=77.208889)}
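As a sketch of the JSON idea, the dict returned by `_asdict()` can be passed to `json.dumps`. The example rebuilds the same `City` and `Coordinate` classes so that it runs on its own.

```python
import json
from collections import namedtuple

Coordinate = namedtuple('Coordinate', 'lat lon')
City = namedtuple('City', 'name country population coordinates')
delhi = City('Delhi NCR', 'IN', 21.935, Coordinate(28.613889, 77.208889))

# _asdict() converts only the outer named tuple into a dict
payload = json.dumps(delhi._asdict())
print(payload)
```

Note that the nested `Coordinate` is still a tuple, so `json` writes it as an array and its field names are lost.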

`namedtuple` accepts the `defaults` keyword-only argument, an iterable of default values for the rightmost fields:

In [22]:
Coordinate = namedtuple('Coordinate', 'lat lon reference', defaults=['WGS84'])
Coordinate(0, 0)
Coordinate._field_defaults

{'reference': 'WGS84'}

### Typed Named Tuples
- requires a type hint for every field
- same methods as classic `namedtuple()`
- the only difference is the presence of the `__annotations__` class attribute

Type hints have no run time effect on the code. We can think about Python type hints as “documentation that can be verified by IDEs and type checkers like MyPy.”
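A minimal sketch of a typed named tuple, reusing the `Coordinate` fields from earlier in these notes. The last lines illustrate that the hints are not enforced at runtime.

```python
from typing import NamedTuple

class Coordinate(NamedTuple):
    lat: float
    lon: float
    reference: str = 'WGS84'

print(Coordinate.__annotations__)
print(Coordinate._fields, Coordinate._field_defaults)

# no runtime check: values of the "wrong" type are accepted silently
odd = Coordinate('not a float', None)
print(odd)
```

A type checker such as MyPy would be expected to flag the last call, but Python itself runs it without complaint.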

### More about @dataclass
- the fields we declare become parameters in the auto-generated `__init__`
- we can set default values and type hints
- we need to set a type hint for an attribute to be recognised as a field and included in the auto-generated `__init__`; @dataclass then lists it in the tuple returned by `dataclasses.fields()`
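The role of the type hint can be seen in a small sketch. `Demo` is a hypothetical class with one required field, one field with a default, and one attribute without an annotation.

```python
from dataclasses import dataclass, fields

@dataclass
class Demo:
    a: int            # annotated, no default: required __init__ parameter
    b: float = 1.1    # annotated with default: optional __init__ parameter
    c = 'spam'        # no annotation: plain class attribute, not a field

print([f.name for f in fields(Demo)])
print(Demo(a=3))
print(Demo.c)
```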

"Mutable default values are a common source of bugs for beginning Python developers. In function definitions, a mutable default value is easily corrupted when one invocation of the function mutates the default, changing the behavior of further invocations."

In [23]:
@dataclass
class ClubMember:
    name: str
    guests: list = []  # mutable default: bad idea; @dataclass rejects it, but a plain class or function would accept it silently

ValueError: mutable default <class 'list'> for field guests is not allowed: use default_factory
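The function case described in the quote is not caught by Python at all. A small sketch with a hypothetical `add_guest` function shows the default list being shared between calls:

```python
def add_guest(name, guests=[]):   # the default list is created once, at definition time
    guests.append(name)
    return guests

first = add_guest('Ann')
second = add_guest('Bob')         # reuses the same default list

print(second)
print(first is second)
```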

In [25]:
# we could fix it like this:
from dataclasses import dataclass, field

@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)
    # default_factory builds a new default value each time an instance of the data class is created,
    # instead of sharing one list among all instances, which a plain class-level [] would do

# or use `None` as a default and change it to `[]` on first call inside the method that manipulates guests
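To see that `default_factory` really gives each instance its own list, here is a short check that repeats the fixed `ClubMember` so it runs on its own (the member names are made up):

```python
from dataclasses import dataclass, field

@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)

ann = ClubMember('Ann')
bob = ClubMember('Bob')
ann.guests.append('Zoe')          # must not leak into bob's list

print(ann.guests, bob.guests)
print(ann.guests is bob.guests)
```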

Post-init Processing
- lets us add a `__post_init__` method that is run after the auto-generated `__init__`
- "Common use cases for __post_init__ are validation and computing field values based on other fields."

In [None]:
from dataclasses import dataclass
from club import ClubMember

@dataclass
class HackerClubMember(ClubMember):
    all_handles = set()  # no annotation, so `all_handles` is a plain class attribute
    handle: str = ''  # the type annotation makes `handle` a dataclass field, and so an instance attribute

    def __post_init__(self):
        cls = self.__class__
        if self.handle == '':
            self.handle = self.name.split()[0]
        if self.handle in cls.all_handles:
            msg = f'handle {self.handle!r} already exists.'
            raise ValueError(msg)
        cls.all_handles.add(self.handle)


My understanding is that the generated `__init__` calls `__post_init__` as its last step, so `__post_init__` is the hook for extending initialization without writing `__init__` by hand.
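Since the `club` module is not included in these notes, here is a self-contained sketch of both quoted use cases (validation and a computed field). `Rectangle` is a hypothetical class, not taken from the chapter.

```python
from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)   # computed, so not an __init__ parameter

    def __post_init__(self):
        # called by the generated __init__ after the fields are assigned
        if self.width <= 0 or self.height <= 0:
            raise ValueError('width and height must be positive')
        self.area = self.width * self.height

print(Rectangle(2.0, 3.0))

try:
    Rectangle(-1.0, 3.0)
except ValueError as exc:
    print(exc)
```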