## Overview of data class builders
A plain class does not give us a readable `__repr__` or a value-based `__eq__` for free.

In [1]:
class Coordinate:
    def __init__(self, lat, long):
        self.lat = lat
        self.long = long

moscow = Coordinate(55, 37)
print(moscow) # `__repr__` and `__str__` do not represent this object clearly
location = Coordinate(55, 37)
location == moscow # `__eq__` method inherited from object compares `id`s

<__main__.Coordinate object at 0x7f7a144bc970>


False

### data class builder `collections.namedtuple`

In [2]:
from collections import namedtuple
# first argument for `__repr__`
Coordinate1 = namedtuple('Coordinate', ['lat', 'long'])

moscow = Coordinate1(55, 37)
print(moscow) # useful `__repr__` 
location = Coordinate1(55, 37)
location == moscow # meaningful `__eq__`

Coordinate(lat=55, long=37)


True

### data class builder `typing.NamedTuple`
It provides the same functionality as `collections.namedtuple` and adds a type annotation to each field.

In [3]:
import typing

# first argument for `__repr__`
Coordinate2 = typing.NamedTuple('Coordinate', [('lat', float), ('long', float)])

moscow = Coordinate2(55, 37)
print(moscow) # useful `__repr__` (used by `print` because `__str__` is not implemented)
location = Coordinate2(55, 37)
print(location == moscow) # meaningful `__eq__`
Coordinate2.__annotations__

Coordinate(lat=55, long=37)
True


{'lat': float, 'long': float}

In [4]:
# `typing.NamedTuple` also supports class statement syntax
class Coordinate2(typing.NamedTuple):
    lat: float
    long: float
    def __repr__(self):
        return f'Coordinate(lat={self.lat}, long={self.long})'
    def __str__(self):
        ns = 'N' if self.lat >= 0 else 'S'
        we = 'E' if self.long >= 0 else 'W'
        return f'{abs(self.lat):.1f}{ns}, {abs(self.long):.1f}{we}'
moscow = Coordinate2(55, 37)
print(moscow) # useful `__str__` 
location = Coordinate2(55, 37)
print(location == moscow) # meaningful `__eq__`
moscow, Coordinate2.__annotations__

55.0N, 37.0E
True


(Coordinate(lat=55, long=37), {'lat': float, 'long': float})

### data class builder `@dataclass`
Like `typing.NamedTuple`, the `dataclass` decorator supports PEP 526 syntax to declare instance attributes. The decorator reads the variable annotations and automatically generates methods for your class. 

In [5]:
from dataclasses import dataclass

# frozen = True if we want immutable instances
@dataclass(frozen=False)
class Coordinate3:
    lat: float
    long: float
    def __repr__(self):
        return f'Coordinate(lat={self.lat}, long={self.long})'
    def __str__(self):
        ns = 'N' if self.lat >= 0 else 'S'
        we = 'E' if self.long >= 0 else 'W'
        return f'{abs(self.lat):.1f}{ns}, {abs(self.long):.1f}{we}'
moscow = Coordinate3(55, 37)
print(moscow) # useful `__str__` 
location = Coordinate3(55, 37)
print(location == moscow) # meaningful `__eq__`
moscow, Coordinate3.__annotations__

55.0N, 37.0E
True


(Coordinate(lat=55, long=37), {'lat': float, 'long': float})

### Mutable instances

A key difference between these class builders is that `collections.namedtuple` and `typing.NamedTuple` build `tuple` subclasses, so their instances are **immutable**. By default, `@dataclass` produces mutable classes, unless we set `frozen=True`.

In [6]:
beijing = Coordinate1(39, 116)
berlin = Coordinate2(52, 13)
moscow = Coordinate3(55, 37)
try:
    beijing.lat = 40
except AttributeError:
    print("can't set attribute for `tuple`")
moscow.lat = 40
moscow

can't set attribute for `tuple`


Coordinate(lat=40, long=37)
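
The `frozen=True` option mentioned above can be sketched as follows. `FrozenCoordinate` is a hypothetical name used only for this illustration; the point is that assigning to a field of a frozen instance raises `dataclasses.FrozenInstanceError`, which is a subclass of `AttributeError`.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class FrozenCoordinate:  # hypothetical class, not part of the notebook above
    lat: float
    long: float

paris = FrozenCoordinate(48.9, 2.3)
message = None
try:
    paris.lat = 50.0  # assignment is blocked on frozen instances
except FrozenInstanceError as exc:
    message = str(exc)
    print(message)

# with the default eq=True, frozen=True also makes instances hashable
visited = {paris}
print(paris in visited)
```

Because `FrozenInstanceError` derives from `AttributeError`, the same `except AttributeError` clause used for the named tuples would also catch it.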

### Construct dict

- Both named tuple variants provide an instance method (`._asdict`) to construct a dict object
- `@dataclass` provides a module-level function to do it: `dataclasses.asdict`

In [7]:
import dataclasses
beijing._asdict(), berlin._asdict(), dataclasses.asdict(moscow)

({'lat': 39, 'long': 116}, {'lat': 52, 'long': 13}, {'lat': 40, 'long': 37})
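
One difference worth knowing, shown here as a small sketch with hypothetical classes (`Pin`, `Route`, `Point`, `Trip`): `dataclasses.asdict` recurses into nested data class instances, including those inside lists, while `._asdict` on a named tuple converts only the outer level.

```python
from collections import namedtuple
from dataclasses import dataclass, asdict

@dataclass
class Pin:  # hypothetical
    lat: float
    long: float

@dataclass
class Route:  # hypothetical
    name: str
    start: Pin
    stops: list

route = Route('demo', Pin(55, 37), [Pin(52, 13)])
route_dict = asdict(route)  # nested Pin instances become dicts too
print(route_dict)

Point = namedtuple('Point', 'lat long')
Trip = namedtuple('Trip', 'name start')
trip_dict = Trip('demo', Point(55, 37))._asdict()  # shallow: inner Point is kept
print(trip_dict)
```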

### Get field names and default values
All three class builders let you get the field names and default values that may be configured for them. 
- In named tuple classes, that metadata is in the `._fields` and `._field_defaults` class attributes.
- You can get the same metadata from a `@dataclass`-decorated class using the `fields` function from the `dataclasses` module.

In [8]:
for attribute in dataclasses.fields(moscow):
    print(attribute.name, attribute.default)
beijing._fields, beijing._field_defaults

lat <dataclasses._MISSING_TYPE object at 0x7f7a14492190>
long <dataclasses._MISSING_TYPE object at 0x7f7a14492190>


(('lat', 'long'), {})
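
The `_MISSING_TYPE` objects printed above are the sentinel `dataclasses.MISSING`, which marks a field without a default. A minimal sketch of telling required fields from optional ones, using the hypothetical classes `Place` and `PlaceTuple`:

```python
import dataclasses
from collections import namedtuple

@dataclasses.dataclass
class Place:  # hypothetical
    lat: float
    long: float
    reference: str = 'WGS84'

required = []
for f in dataclasses.fields(Place):  # `fields` accepts a class or an instance
    no_default = (f.default is dataclasses.MISSING
                  and f.default_factory is dataclasses.MISSING)
    if no_default:
        required.append(f.name)
    print(f.name, '(required)' if no_default else f.default)

PlaceTuple = namedtuple('PlaceTuple', 'lat long reference', defaults=['WGS84'])
print(PlaceTuple._fields, PlaceTuple._field_defaults)
```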

## Classic Named Tuples


In [9]:
from collections import namedtuple

Coordinate = namedtuple('Coordinate', 'lat long')
City = namedtuple('City', 'name country population coordinates')
tokyo = City('Tokyo', 'JP', 36.933, (35, 139))
print(tokyo)
delhi_data = ('Delhi NCR', 'IN', 21.935, Coordinate(28, 77))
delhi = City(*delhi_data)
delhi

City(name='Tokyo', country='JP', population=36.933, coordinates=(35, 139))


City(name='Delhi NCR', country='IN', population=21.935, coordinates=Coordinate(lat=28, long=77))

Since Python 3.7, `namedtuple` accepts the keyword-only argument `defaults`: an iterable of N default values, one for each of the N rightmost fields of the class.

In [10]:
Coordinate = namedtuple('Coordinate', 'lat long reference', defaults=['WGS84'])
Coordinate._field_defaults

{'reference': 'WGS84'}
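
To make the "N rightmost fields" rule concrete, here is a small sketch with a hypothetical `Span` tuple: two defaults are given for three fields, so they apply to `stop` and `step`, and `start` stays required.

```python
from collections import namedtuple

# two defaults for three fields: they bind to the two rightmost fields
Span = namedtuple('Span', 'start stop step', defaults=[None, 1])  # hypothetical
print(Span._field_defaults)

only_start = Span(0)
print(only_start)

missing_start = None
try:
    Span()  # `start` has no default, so this fails
except TypeError as exc:
    missing_start = exc
    print(exc)
```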

## Typed Named Tuples


In [11]:
import typing
class Coordinate(typing.NamedTuple):
    lat: float
    long: float
    reference:str = 'WGS84' # default value
Coordinate._field_defaults

{'reference': 'WGS84'}

### Type annotations
1. Type annotations don’t have any impact on the runtime behavior of Python programs. No type checking at runtime!

In [12]:
trash = Coordinate('string', None) # no type checking at runtime!
print(trash)

Coordinate(lat='string', long=None, reference='WGS84')
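
If a runtime check is wanted, it has to be written by hand or delegated to a third-party tool. The sketch below uses a hypothetical helper, `type_mismatches`, that compares each value with its annotation using `isinstance`. It only handles plain classes such as `float` or `str` (not generics like `list[str]`), and it is stricter than a static type checker: an `int` is not an instance of `float`.

```python
import typing

class Coordinate(typing.NamedTuple):
    lat: float
    long: float
    reference: str = 'WGS84'

def type_mismatches(record):
    """Hypothetical helper: names of fields whose value does not match the annotation."""
    hints = typing.get_type_hints(type(record))
    return [
        name for name, expected in hints.items()
        if not isinstance(getattr(record, name), expected)
    ]

bad = type_mismatches(Coordinate('string', None))
good = type_mismatches(Coordinate(55.0, 37.0))
print(bad, good)
```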


2. Basic syntax of type annotations, as defined in PEP 526

`var_name: some_type = a_value`

In [13]:
# For a plain class
class PlainClass:
    a: int
    b: float = 1.1
    c = 'dummy' # not a type annotation, just an attribute

try: 
    print(PlainClass.a) # `a` doesn’t become a class attribute because no value is bound to it
except AttributeError:
    print("type object 'PlainClass' has no attribute 'a'") 
# `__annotations__` is created by the interpreter even in a plain class
PlainClass.__annotations__, PlainClass.b, PlainClass.c

type object 'PlainClass' has no attribute 'a'


({'a': int, 'b': float}, 1.1, 'dummy')

In [14]:
# For a Named Tuple class
import typing

class NTClass(typing.NamedTuple):
    a: int
    b: float = 1.1
    c = 'dummy' # not a type annotation, just an attribute

# `a` and `b` are descriptors; think of them as read-only instance attributes
# (descriptors are covered in Chapter 24)
print(f'{NTClass.__annotations__}\n {NTClass.a}\n {NTClass.b}\n {NTClass.c}')
nt = NTClass(1)
nt.a, nt.b, nt.c

{'a': <class 'int'>, 'b': <class 'float'>}
 <_collections._tuplegetter object at 0x7f7a14481940>
 <_collections._tuplegetter object at 0x7f7a14481730>
 dummy


(1, 1.1, 'dummy')

In [15]:
# For a `class` decorated with `@dataclass`
from dataclasses import dataclass

@dataclass
class DataClass:
    a: int
    b: float = 1.1
    c = 'dummy' # not a type annotation, just an attribute

try: 
    print(DataClass.a) # `a` doesn’t become a class attribute because no value is bound to it
except AttributeError:
    print("type object 'DataClass' has no attribute 'a'") 
# `__annotations__` is populated here too; `@dataclass` reads it to find the fields
DataClass.__annotations__, DataClass.b, DataClass.c

type object 'DataClass' has no attribute 'a'


({'a': int, 'b': float}, 1.1, 'dummy')
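
What the decorator does with those annotations can be inspected directly. In this sketch (the class name `Demo` is hypothetical), only the annotated names `a` and `b` become fields and `__init__` parameters, while `c` stays a plain class attribute that instances read through the class.

```python
import inspect
from dataclasses import dataclass, fields

@dataclass
class Demo:  # hypothetical, same shape as `DataClass` above
    a: int
    b: float = 1.1
    c = 'dummy'  # no annotation, so not a field

field_names = [f.name for f in fields(Demo)]
init_params = list(inspect.signature(Demo).parameters)
print(field_names, init_params)

d = Demo(42)
print(vars(d))  # instance attributes set by the generated `__init__`
print(d.c)      # found on the class, not on the instance
```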

## Field Options
- Fields are read in order, and after you declare a field with a default value, all remaining fields must also have default values. 
- Mutable default values are a common source of bugs. To prevent them, `@dataclass` raises `ValueError` when a field's default value is a `list`, `dict`, or `set`.
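
The first rule in the list above can be checked with a small sketch (the class name `Broken` is hypothetical): declaring a field without a default after one with a default makes `@dataclass` raise `TypeError` when the class is created.

```python
from dataclasses import dataclass

error = None
try:
    @dataclass
    class Broken:  # hypothetical
        a: int = 0
        b: int  # no default after a field that has one
except TypeError as exc:
    error = exc
    print(exc)
```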

In [16]:
from dataclasses import dataclass, field
try:
    @dataclass
    class ClubMember:
        name: str
        guests: list = []
except ValueError:
    print("Mutable default values are not allowed!")
    @dataclass
    class ClubMember:
        name: str
        guests: list = field(default_factory=list)
# Docstring
ClubMember.__doc__ # `<factory>` is a short way of saying that some callable will produce the default value for `guests`

Mutable default values are not allowed!


'ClubMember(name: str, guests: list = <factory>)'

The `default_factory` parameter lets you provide a function, class, or any other callable, which will be invoked with zero arguments to build a default value each time an instance of the data class is created. This way, each instance of `ClubMember` will have its own `list`, **instead of all instances sharing the same `list` from the class**, which is rarely what we want and is often a bug.
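
The difference between a shared list and a per-instance list can be seen in a minimal sketch. `SharedGuests` and `Member` are hypothetical classes: the first keeps one list on the class, the second builds a new list for every instance through `default_factory`.

```python
from dataclasses import dataclass, field

class SharedGuests:  # hypothetical plain class
    guests = []  # a single list object stored on the class

@dataclass
class Member:  # hypothetical
    name: str
    guests: list = field(default_factory=list)  # list() is called per instance

s1, s2 = SharedGuests(), SharedGuests()
s1.guests.append('Ann')
print(s2.guests)  # the append is visible through the other instance

m1, m2 = Member('Ann'), Member('Bob')
m1.guests.append('Cy')
print(m1.guests, m2.guests)  # each instance keeps its own list
```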

### Post-init processing
The `__init__` method generated by `@dataclass` only takes the arguments passed and assigns them (or their default values) to the instance fields. But you may need to do more than that to initialize the instance. When a `__post_init__` method exists, `@dataclass` adds code to the generated `__init__` to call `__post_init__` as the last step.

In [17]:
from dataclasses import dataclass, field

@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)

@dataclass
class HackerClubMember(ClubMember):
    # `all_handles` has no type annotation, so it is a plain class attribute shared by all instances, not a field
    all_handles = set() # a class attribute, an empty set
    handle: str = ''

    def __post_init__(self):
        cls = self.__class__ # get the class of instance
        if self.handle == '':
            self.handle = self.name.split()[0]
        if self.handle in cls.all_handles:
            msg = f'handle {self.handle!r} already exists.'
            raise ValueError(msg)
        cls.all_handles.add(self.handle)

anna = HackerClubMember('Anna Ravenscroft', handle='AnnaRaven')
print(anna)
leo = HackerClubMember('Leo Rochael')
print(leo)
try:
    leo2 = HackerClubMember('Leo Davinci')
except ValueError as e:
    print(e)
HackerClubMember.__doc__

HackerClubMember(name='Anna Ravenscroft', guests=[], handle='AnnaRaven')
HackerClubMember(name='Leo Rochael', guests=[], handle='Leo')
handle 'Leo' already exists.


"HackerClubMember(name: str, guests: list = <factory>, handle: str = '')"

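Another common use of `__post_init__` is validation plus computing a derived field. The sketch below uses a hypothetical `Rectangle` class; `field(init=False)` keeps `area` out of the generated `__init__`, so it must be set in `__post_init__`.

```python
from dataclasses import dataclass, field

@dataclass
class Rectangle:  # hypothetical
    width: float
    height: float
    area: float = field(init=False)  # not an `__init__` parameter

    def __post_init__(self):
        # validate first, then compute the derived field
        if self.width < 0 or self.height < 0:
            raise ValueError('sides must be non-negative')
        self.area = self.width * self.height

rect = Rectangle(2, 3)
print(rect)

rejected = None
try:
    Rectangle(-1, 3)
except ValueError as exc:
    rejected = exc
    print(exc)
```
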
### Initialization variables that are not fields
Sometimes you may need to pass arguments to `__init__` that are not instance fields. Such arguments are called *init-only variables*. **Init-only variables must also be declared**: the `dataclasses` module provides the pseudo-type `InitVar` for this purpose.

In [18]:
from dataclasses import dataclass, InitVar
try:
    @dataclass
    class C:
        i: int
        j: int = None
        
        def __post_init__(self, lst):
            if self.j is None and lst is not None:
                self.j = lst[0]
    c = C(10, lst=[1])
except TypeError as e:
    print(e)
    @dataclass
    class C:
        i: int
        j: int = None
        # `InitVar` will prevent `@dataclass` from treating `lst` as a regular field
        lst: InitVar[list] = None 
        def __post_init__(self, lst):
            if self.j is None and lst is not None:
                self.j = lst[0]
    c = C(10, lst=[1])
c.j

__init__() got an unexpected keyword argument 'lst'


1
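
A second sketch, with a hypothetical `Reading` class, shows what happens to an init-only variable after construction: it is passed to `__post_init__`, but it does not appear in `fields()` and is not stored on the instance.

```python
from dataclasses import dataclass, field, fields, InitVar

@dataclass
class Reading:  # hypothetical
    sensor: str
    fahrenheit: InitVar[float]          # accepted by `__init__`, not stored
    celsius: float = field(init=False)  # stored, computed in `__post_init__`

    def __post_init__(self, fahrenheit):
        self.celsius = (fahrenheit - 32) * 5 / 9

reading = Reading('roof', 212.0)
field_names = [f.name for f in fields(Reading)]
print(reading)
print(field_names)
print(hasattr(reading, 'fahrenheit'))  # the init-only value was not kept
```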

## `@dataclass` Example: Dublin Core Resource Record

In [19]:
from dataclasses import dataclass, field
from typing import Optional
from enum import Enum, auto
from datetime import date

class ResourceType(Enum):
    BOOK = auto()
    EBOOK = auto()
    VIDEO = auto()

@dataclass
class Resource:
    identifier: str
    title: str = '<untitled>'
    # can be annotated as `list[str]` on Python 3.9+
    creators: list = field(default_factory=list)
    date: Optional[date] = None
    type: ResourceType = ResourceType.BOOK
    description: str = ''
    language: str = ''
    # can be annotated as `list[str]` on Python 3.9+
    subjects: list = field(default_factory=list)