<h1>Chapter 05. Data Class Constructors.</h1>

<h2>Overview of Data Class Constructors</h2>

A simple class for representing geographic coordinates:

In [1]:
class Coordinate:

    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon

Named tuples are lightweight data structures available in the collections module that behave like tuples but provide named access to their elements.

In [2]:
from collections import namedtuple


Coordinate = namedtuple('Coordinate', 'lat lon')

In [3]:
issubclass(Coordinate, tuple)

True

In [4]:
new_york = Coordinate(40.7, -74.0)  # west longitudes are negative
new_york

Coordinate(lat=40.7, lon=-74.0)

`typing.NamedTuple` is a class provided in the typing module that allows you to define named tuples with type annotations.

In [5]:
import typing


Coordinate = typing.NamedTuple(
    'Coordinate',
    [('lat', float), ('lon', float)]
)

In [6]:
issubclass(Coordinate, tuple)

True

In [7]:
typing.get_type_hints(Coordinate)

{'lat': float, 'lon': float}

In [8]:
from typing import NamedTuple


class Coordinate(NamedTuple):
    lat: float
    lon: float

    def __str__(self):
        ns = 'N' if self.lat >= 0 else 'S'
        we = 'E' if self.lon >= 0 else 'W'
        return f"{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we}"

Data classes are a feature introduced in Python 3.7 through the `dataclasses` module, providing a concise way to define classes for storing data.

In [9]:
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen=True argument makes instances of the data class immutable
class Coordinate:
    lat: float
    lon: float

    def __str__(self):
        ns = 'N' if self.lat >= 0 else 'S'
        we = 'E' if self.lon >= 0 else 'W'
        return f"{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we}"
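
To make the difference between the three approaches concrete, here is a minimal sketch that compares two instances built from the same values. The class names `PlainCoordinate`, `TupleCoordinate`, and `DataCoordinate` and the coordinate values are illustrative only and do not appear in the cells above.

```python
from collections import namedtuple
from dataclasses import dataclass


class PlainCoordinate:
    # Hand-written class: no __eq__ is defined, so == compares identity
    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon


TupleCoordinate = namedtuple('TupleCoordinate', 'lat lon')


@dataclass(frozen=True)
class DataCoordinate:
    lat: float
    lon: float


plain_equal = PlainCoordinate(55.76, 37.62) == PlainCoordinate(55.76, 37.62)
tuple_equal = TupleCoordinate(55.76, 37.62) == TupleCoordinate(55.76, 37.62)
data_equal = DataCoordinate(55.76, 37.62) == DataCoordinate(55.76, 37.62)

print(plain_equal, tuple_equal, data_equal)  # False True True
```

The plain class inherits `__eq__` from `object`, while the named tuple and the data class compare field by field without any extra code.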

<h2>Classic Named Tuples</h2>

Defining and using a Named Tuple

In [10]:
from collections import namedtuple


City = namedtuple('City', 'name country population coordinates')

In [11]:
tokyo = City(
    'Tokyo',
    'JP',
    36.933,
    (35.689722, 139.691667)
)
tokyo

City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))

In [12]:
tokyo.population

36.933

In [13]:
tokyo.coordinates

(35.689722, 139.691667)

In [14]:
tokyo[1]

'JP'

Named Tuple Attributes and Methods

In [15]:
City._fields  # tuple with the field names of the class

('name', 'country', 'population', 'coordinates')

In [16]:
Coordinate = namedtuple('Coordinate', 'lat lon')

In [17]:
delhi_data = (
    'Delhi NCR',
    'IN',
    21.935,
    Coordinate(28.613889, 77.208889)
)

In [18]:
delhi = City._make(delhi_data)  # create a City object from an iterable

In [19]:
delhi._asdict()  # returns a dict built from the fields of the named tuple

{'name': 'Delhi NCR',
 'country': 'IN',
 'population': 21.935,
 'coordinates': Coordinate(lat=28.613889, lon=77.208889)}

In [20]:
import json


# _asdict() produces a dict that json.dumps can serialize
json.dumps(delhi._asdict())

'{"name": "Delhi NCR", "country": "IN", "population": 21.935, "coordinates": [28.613889, 77.208889]}'
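
Note that the nested `Coordinate` is written as a JSON array, so the type is lost on the way out. A minimal sketch of a round trip, assuming the two-field `Coordinate` used in this cell, could rebuild it by hand:

```python
import json
from collections import namedtuple

City = namedtuple('City', 'name country population coordinates')
Coordinate = namedtuple('Coordinate', 'lat lon')

delhi = City('Delhi NCR', 'IN', 21.935, Coordinate(28.613889, 77.208889))

payload = json.dumps(delhi._asdict())
data = json.loads(payload)

# JSON has no tuple type: the nested Coordinate comes back as a plain list
data['coordinates'] = Coordinate(*data['coordinates'])
restored = City(**data)

print(restored == delhi)  # True
```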

In [21]:
Coordinate = namedtuple(
    'Coordinate',
    'lat lon reference',
    defaults=['WGS84']
)

In [22]:
Coordinate(0, 0)

Coordinate(lat=0, lon=0, reference='WGS84')

In [23]:
Coordinate._field_defaults

{'reference': 'WGS84'}
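
Another method worth knowing alongside `_make` and `_asdict` is `_replace`, which returns a new named tuple with some fields changed. The sketch below reuses the `Coordinate` definition with a default; the value `51.5` is illustrative.

```python
from collections import namedtuple

Coordinate = namedtuple('Coordinate', 'lat lon reference', defaults=['WGS84'])

origin = Coordinate(0, 0)
moved = origin._replace(lat=51.5)  # builds a new tuple, origin is untouched

print(origin)  # Coordinate(lat=0, lon=0, reference='WGS84')
print(moved)   # Coordinate(lat=51.5, lon=0, reference='WGS84')
```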

<h2>Typed Named Tuples</h2>

In [24]:
from typing import NamedTuple


class Coordinate(NamedTuple):
    lat: float  # every field must have a type annotation
    lon: float
    reference: str = 'WGS84'  # annotated with a type and given a default value

In [25]:
Coordinate.__annotations__

{'lat': float, 'lon': float, 'reference': str}

<h2>Introduction to Type Annotations</h2>

Type annotations are a way to declare the expected types of function arguments, return values, variables, and attributes.

Python does not enforce type annotations at runtime:

In [26]:
from typing import NamedTuple


class Coordinate(NamedTuple):
    lat: float
    lon: float

trash = Coordinate('Hi!', None)
trash

Coordinate(lat='Hi!', lon=None)

Type annotations are intended primarily for third-party type checkers and for IDEs with built-in type checking. These are static analysis tools: they examine the source code, not the running program.
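
If a runtime check is really wanted, it has to be written by hand. The sketch below is one possible approach, not a standard facility: `check_types` is a hypothetical helper that only handles annotations that are plain classes such as `float` or `str`.

```python
from typing import NamedTuple, get_type_hints


class Coordinate(NamedTuple):
    lat: float
    lon: float


def check_types(instance):
    """Hypothetical helper: list the fields whose value is not an
    instance of the annotated class (plain classes only)."""
    hints = get_type_hints(type(instance))
    return [
        name
        for name, expected in hints.items()
        if not isinstance(getattr(instance, name), expected)
    ]


print(check_types(Coordinate(40.7, -74.0)))  # []
print(check_types(Coordinate('Hi!', None)))  # ['lat', 'lon']
```

A check like this is stricter than a static checker in some ways (an `int` would be rejected for a `float` field) and cannot handle generic annotations such as `list[str]`, which is part of why static tools are the usual choice.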

<h3>The Semantics of Variable Annotations</h3>

A plain class with type annotations

In [27]:
class DemoPlainClass:
    a: int  # entry in __annotations__, the attribute is not created in the class
    b: float = 1.1  # stored in annotations and becomes a class attribute with value 1.1
    c = 'spam'  # simple class attribute

In [28]:
DemoPlainClass.__annotations__

{'a': int, 'b': float}

In [29]:
try:
    print(DemoPlainClass.a)
except AttributeError as e:
    print(e.__repr__())

AttributeError("type object 'DemoPlainClass' has no attribute 'a'")


In [30]:
try:
    print(DemoPlainClass.b)
except AttributeError as e:
    print(e.__repr__())

1.1


In [31]:
try:
    print(DemoPlainClass.c)
except AttributeError as e:
    print(e.__repr__())

spam


The name `a` survives only as an annotation. It does not become a class attribute, because no value is bound to it. `b` and `c` are stored as class attributes because they are bound to values.

<h3><code>typing.NamedTuple</code> Inspection</h3>

A class built with `typing.NamedTuple`

In [32]:
from typing import NamedTuple


class DemoNamedTupleClass(NamedTuple):
    a: int  # becomes an annotation and an instance attribute
    b: float = 1.1  # becomes an annotation and an instance attribute with default value 1.1
    c = 'spam'  # simple class attribute

In [33]:
DemoNamedTupleClass.__annotations__

{'a': int, 'b': float}

In [34]:
DemoNamedTupleClass.a

_tuplegetter(0, 'Alias for field number 0')

In [35]:
DemoNamedTupleClass.b

_tuplegetter(1, 'Alias for field number 1')

In [36]:
DemoNamedTupleClass.c

'spam'

The annotations for `a` and `b` are the same as in the `DemoPlainClass` example, but `typing.NamedTuple` also creates class attributes `a` and `b`. These are descriptors that read the corresponding tuple item from an instance. Attribute `c` is a normal class attribute with the value `'spam'`.
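
As a small sketch of what "descriptor" means here, the getter stored on the class can be invoked by hand through the descriptor protocol. The variable name `getter` is illustrative.

```python
from typing import NamedTuple


class DemoNamedTupleClass(NamedTuple):
    a: int
    b: float = 1.1
    c = 'spam'


nt = DemoNamedTupleClass(8)
getter = DemoNamedTupleClass.a  # the object stored on the class

print(hasattr(getter, '__get__'))                # True: it is a descriptor
print(getter.__get__(nt, DemoNamedTupleClass))   # 8, same as nt.a
print(DemoNamedTupleClass._fields)               # ('a', 'b')
print(DemoNamedTupleClass._field_defaults)       # {'b': 1.1}
```

Only the annotated names become tuple fields; `c` is absent from `_fields`.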

In [37]:
DemoNamedTupleClass.__doc__  # documentation string available

'DemoNamedTupleClass(a, b)'

In [38]:
nt = DemoNamedTupleClass(8)  # only a is required, b has a default

In [39]:
nt.a

8

In [40]:
nt.b

1.1

In [41]:
nt.c

'spam'

In [42]:
try:
    nt.a = 10
except AttributeError as e:
    print(e.__repr__())

AttributeError("can't set attribute")


In [43]:
try:
    nt.b = 2
except AttributeError as e:
    print(e.__repr__())

AttributeError("can't set attribute")


In [44]:
try:
    nt.c = 'not spam'
except AttributeError as e:
    print(e.__repr__())

AttributeError("'DemoNamedTupleClass' object attribute 'c' is read-only")


<h3>Inspecting the class with the <code>@dataclass</code> decorator</h3>

A class decorated with `@dataclass`

In [45]:
from dataclasses import dataclass


@dataclass
class DemoDataClass:
    a: int  # becomes an annotation and an instance attribute set by __init__
    b: float = 1.1  # becomes an annotation and an instance attribute with default value 1.1
    c = 'spam'  # plain class attribute, no annotation refers to it

In [46]:
DemoDataClass.__annotations__

{'a': int, 'b': float}

In [47]:
DemoDataClass.__doc__

'DemoDataClass(a: int, b: float = 1.1)'

In [48]:
try:
    print(DemoDataClass.a)
except AttributeError as e:
    print(e.__repr__())

AttributeError("type object 'DemoDataClass' has no attribute 'a'")


In [49]:
try:
    print(DemoDataClass.b)
except AttributeError as e:
    print(e.__repr__())

1.1


In [50]:
try:
    print(DemoDataClass.c)
except AttributeError as e:
    print(e.__repr__())

spam


In [51]:
dc = DemoDataClass(9)
dc.a

9

In [52]:
dc.b

1.1

In [53]:
dc.c

'spam'

In [54]:
dc.a = 10
dc.a

10

In [55]:
dc.b = False
dc.b

False

In [56]:
dc.c = 'oops'
dc.c

'oops'

In [57]:
dc.z = 'new'
dc.z

'new'

`DemoDataClass` instances are mutable, and no type checking is performed at runtime.

In [58]:
@dataclass(frozen=True)  # frozen=True argument makes instances of the data class immutable
class DemoDataClassFrozen:
    a: int
    b: float = 1.1
    c = 'spam'

In [59]:
dcf = DemoDataClassFrozen(11)

In [60]:
dcf.a

11

In [61]:
try:
    dcf.b = 2
except AttributeError as e:
    print(e.__repr__())

FrozenInstanceError("cannot assign to field 'b'")


In [62]:
try:
    dcf.c = 'another spam'
except AttributeError as e:
    print(e.__repr__())

FrozenInstanceError("cannot assign to field 'c'")
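
Since a frozen instance cannot be changed in place, the usual way to get a modified version is to build a new one. A minimal sketch using `dataclasses.replace`:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DemoDataClassFrozen:
    a: int
    b: float = 1.1
    c = 'spam'


dcf = DemoDataClassFrozen(11)
updated = replace(dcf, b=2.0)  # new instance, dcf is left as it was

print(dcf)      # DemoDataClassFrozen(a=11, b=1.1)
print(updated)  # DemoDataClassFrozen(a=11, b=2.0)

# frozen=True together with the default eq=True also makes instances hashable
print(hash(dcf) == hash(DemoDataClassFrozen(11)))  # True
```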


<h2>More about <code>@dataclass</code></h2>

<h3>Fields options</h3>

In [63]:
from dataclasses import dataclass


try:
    @dataclass
    class ClubMember:
        name: str
        guests: list = []
except ValueError as e:
    print(e.__repr__())

ValueError("mutable default <class 'list'> for field guests is not allowed: use default_factory")


The fix is to set the default for the `guests` field by calling the `dataclasses.field` function with `default_factory=list`, so that every instance gets its own new list.

In [64]:
from dataclasses import dataclass, field


@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)

In [65]:
@dataclass
class ClubMember:
    name: str
    guests: list[str] = field(default_factory=list)  # list of objects of type str
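
The reason `@dataclass` rejects a literal `[]` is the classic shared-default bug. The sketch below contrasts a hand-written class that has the bug with the `default_factory` version; `PlainMember` and the names used are hypothetical.

```python
from dataclasses import dataclass, field


class PlainMember:
    def __init__(self, name, guests=[]):  # one list object shared by every call
        self.name = name
        self.guests = guests


@dataclass
class ClubMember:
    name: str
    guests: list[str] = field(default_factory=list)  # list() is called per instance


p1, p2 = PlainMember('Ann'), PlainMember('Bob')
p1.guests.append('Cy')
print(p2.guests)  # ['Cy'], Bob got Ann's guest

m1, m2 = ClubMember('Ann'), ClubMember('Bob')
m1.guests.append('Cy')
print(m2.guests)  # []
```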

The `default` option exists because the `field` call takes the place of the default value in the field annotation. To give the `athlete` field a default value of `False` and also leave it out of the generated `__repr__`, write:

In [66]:
@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)
    athlete: bool = field(default=False, repr=False)
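
A quick check of the effect of `repr=False`, with an illustrative member name:

```python
from dataclasses import dataclass, field


@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)
    athlete: bool = field(default=False, repr=False)


member = ClubMember('Anna', athlete=True)

print(member)          # ClubMember(name='Anna', guests=[])
print(member.athlete)  # True, the attribute exists but is hidden from repr
```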

<h3>Postinitialization</h3>

`__post_init__` is a special method recognized by `@dataclass`: when it is defined, the generated `__init__` calls it as its last step, which allows custom validation or computed fields after the regular fields have been assigned.

In [67]:
@dataclass
class HackerClubMember(ClubMember):  # extends the ClubMember class
    all_handles = set()
    handle: str = ''

    def __post_init__(self):
        cls = self.__class__  # get instance class

        if self.handle == '':
            self.handle = self.name.split()[0]

        if self.handle in cls.all_handles:
            raise ValueError(f"handle {self.handle!r} already exists.")  # use !r - repr() for self.handle's string representation

        cls.all_handles.add(self.handle)
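
A short usage sketch of the class above, repeated here so the block runs on its own. The member names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)
    athlete: bool = field(default=False, repr=False)


@dataclass
class HackerClubMember(ClubMember):
    all_handles = set()  # no annotation, so this is a plain class attribute
    handle: str = ''

    def __post_init__(self):
        cls = self.__class__
        if self.handle == '':
            self.handle = self.name.split()[0]
        if self.handle in cls.all_handles:
            raise ValueError(f"handle {self.handle!r} already exists.")
        cls.all_handles.add(self.handle)


anna = HackerClubMember('Anna Ravenscroft', handle='AnnaRaven')
leo = HackerClubMember('Leo Rochael')  # handle defaults to the first name
print(leo.handle)  # Leo

try:
    HackerClubMember('Leo DaVinci')  # would also get the handle 'Leo'
except ValueError as e:
    print(repr(e))
```

The second `Leo` is rejected because `__post_init__` runs after the generated `__init__` has assigned `name` and `handle`, and finds the derived handle already registered.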

<h3>Typed Class Attributes</h3>

`ClassVar` is a type hint provided by Python's `typing` module. It indicates that a variable is intended to be a class variable, rather than an instance variable, within a class definition.

In [68]:
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class HackerClubMember(ClubMember):
    all_handles: ClassVar[set[str]] = set()  # attribute of the set-of-str type, the default value is an empty set
    handle: str = ''

    def __post_init__(self):
        cls = self.__class__

        if self.handle == '':
            self.handle = self.name.split()[0]

        if self.handle in cls.all_handles:
            raise ValueError(f"handle {self.handle!r} already exists.")

        cls.all_handles.add(self.handle)
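
One way to see what `ClassVar` changes is to ask `dataclasses.fields` which names were turned into fields. The `Member` class and its `registry` attribute below are hypothetical, chosen to keep the sketch short.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class Member:
    registry: ClassVar[set[str]] = set()  # annotated, but not a dataclass field
    handle: str = ''


print([f.name for f in fields(Member)])      # ['handle']
print('registry' in Member.__annotations__)  # True

try:
    Member(registry={'x'})  # not a field, so not an __init__ parameter
except TypeError as e:
    print(type(e).__name__)  # TypeError
```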

<h3>Initializable Variables that are not Fields</h3>

`InitVar` is a special type hint provided by Python's `dataclasses` module. It indicates that a variable is accepted by the class constructor (the `__init__()` method) as an initialization argument, but is not stored as an instance attribute. Instead, it is passed on to `__post_init__`, where it can be used for initialization tasks or calculations.

In [69]:
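from dataclasses import dataclass, InitVar
from typing import Optional


@dataclass
class Connection:
    host: str
    port: int
    ssl_enabled: bool = False
    credentials: InitVar[Optional[dict]] = None  # accepted by __init__, not stored as a field

    def __post_init__(self, credentials):
        if credentials is not None:
            self.authenticate(credentials)

    def authenticate(self, credentials):
        # placeholder: the example does not define how authentication works
        raise NotImplementedError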
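
The following sketch shows where an `InitVar` value ends up. It is a simplified variant of the class above: the `authenticated` field and the trivial check in `__post_init__` are assumptions made for illustration, standing in for a real login step.

```python
from dataclasses import dataclass, field, fields, InitVar
from typing import Optional


@dataclass
class Connection:
    host: str
    port: int
    credentials: InitVar[Optional[dict]] = None
    authenticated: bool = field(default=False, init=False)

    def __post_init__(self, credentials):
        # Stand-in for a real authentication step
        if credentials is not None:
            self.authenticated = True


conn = Connection('example.org', 443, credentials={'user': 'guest'})

print(conn)                            # Connection(host='example.org', port=443, authenticated=True)
print([f.name for f in fields(conn)])  # ['host', 'port', 'authenticated']
print('credentials' in vars(conn))     # False, the value was not stored on the instance
```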

<h3>Example of using <code>@dataclass</code>: resource record from the Dublin Core</h3>

The Dublin Core Schema is a small set of vocabulary terms that can be used to describe digital resources (videos, images, web pages, etc.) as well as physical resources: books, CDs, and art objects.

`Resource` class based on Dublin Core Schema

In [70]:
from dataclasses import dataclass, field
from datetime import date
from enum import Enum, auto
from typing import Optional


class ResourceType(Enum):
    BOOK = auto()
    EBOOK = auto()
    VIDEO = auto()


@dataclass
class Resource:
    """Media resource description."""
    identifier: str
    title: str = '<untitled>'
    creators: list[str] = field(default_factory=list)
    date: Optional[date] = None
    type: ResourceType = ResourceType.BOOK
    description: str = ''
    language: str = ''
    subjects: list[str] = field(default_factory=list)
    is_bestseller: bool = field(default=False, repr=False)

In [71]:
book = Resource(
    identifier='978-0-13-475',
    title='Refactoring, 2nd edition',
    creators=['Martin Fowler', 'Kent Beck'],
    date=date(2018, 11, 19),
    type=ResourceType.BOOK,
    description='Improving the design of existing code',
    language='EN',
    subjects=['computer programming', 'OOP'],
)

book

Resource(identifier='978-0-13-475', title='Refactoring, 2nd edition', creators=['Martin Fowler', 'Kent Beck'], date=datetime.date(2018, 11, 19), type=<ResourceType.BOOK: 1>, description='Improving the design of existing code', language='EN', subjects=['computer programming', 'OOP'])

Using `dataclasses.fields` to loop through the attributes of a `Resource` instance in a custom `__repr__` method

In [72]:
from dataclasses import fields


@dataclass
class Resource:
    """Media resource description."""
    identifier: str
    title: str = '<untitled>'
    creators: list[str] = field(default_factory=list)
    date: Optional[date] = None
    type: ResourceType = ResourceType.BOOK
    description: str = ''
    language: str = ''
    subjects: list[str] = field(default_factory=list)
    is_bestseller: bool = field(default=False, repr=False)

    def __repr__(self):
        res = '\n'.join(
            [
                f"    {f.name} = {getattr(self, f.name)},"
                for f in fields(self.__class__)
            ]
        )
        
        return f"{self.__class__.__name__}(\n{res}\n)"

In [73]:
book = Resource(
    identifier='978-0-13-475',
    title='Refactoring, 2nd edition',
    creators=['Martin Fowler', 'Kent Beck'],
    date=date(2018, 11, 19),
    type=ResourceType.BOOK,
    description='Improving the design of existing code',
    language='EN',
    subjects=['computer programming', 'OOP'],
)

book

Resource(
    identifier = 978-0-13-475,
    title = Refactoring, 2nd edition,
    creators = ['Martin Fowler', 'Kent Beck'],
    date = 2018-11-19,
    type = ResourceType.BOOK,
    description = Improving the design of existing code,
    language = EN,
    subjects = ['computer programming', 'OOP'],
    is_bestseller = False,
)
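
The custom `__repr__` above prints every field, including `is_bestseller`, even though it was declared with `repr=False`, and it shows strings without quotes. A possible variant, sketched here on a shortened `Resource` with only four of the fields, honors the `repr` flag of each field and formats values with `!r`:

```python
from dataclasses import dataclass, field, fields


@dataclass
class Resource:
    """Media resource description (shortened for this sketch)."""
    identifier: str
    title: str = '<untitled>'
    creators: list[str] = field(default_factory=list)
    is_bestseller: bool = field(default=False, repr=False)

    def __repr__(self):
        indent = ' ' * 4
        lines = [
            f"{indent}{f.name} = {getattr(self, f.name)!r},"
            for f in fields(self)
            if f.repr  # skip fields declared with repr=False
        ]
        return '\n'.join([f"{type(self).__name__}(", *lines, ")"])


book = Resource(
    identifier='978-0-13-475',
    title='Refactoring, 2nd edition',
    creators=['Martin Fowler', 'Kent Beck'],
)
print(book)
```

With `!r`, string values keep their quotes, so the output is closer to valid Python source.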