# Introduction

A data class is a class that typically contains mainly data, although there aren’t really any restrictions. The dataclasses module provides a decorator and functions for automatically adding generated special methods such as **\_\_init\_\_()** and **\_\_repr\_\_()** to user-defined classes. A data class is created using the @dataclass decorator, as follows:

In [24]:
from dataclasses import dataclass

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

Applied to InventoryItem, the decorator adds, among other things, an **\_\_init\_\_()** method that looks like:

In [25]:
def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0):
    self.name = name
    self.unit_price = unit_price
    self.quantity_on_hand = quantity_on_hand

Note that this method is automatically added to the class: it is not directly specified in the InventoryItem definition shown above.
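As a quick check of the generated methods, the sketch below repeats the InventoryItem definition so that it runs on its own, then builds two instances. The item names and numbers are made-up illustration values, not taken from any real inventory.

```python
from dataclasses import dataclass

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

# The generated __init__() accepts positional or keyword arguments.
widget = InventoryItem('widget', 2.5, quantity_on_hand=4)  # hypothetical values
print(widget)               # printed via the generated __repr__()
print(widget.total_cost())  # 2.5 * 4 = 10.0

# quantity_on_hand falls back to its default of 0 when omitted.
gadget = InventoryItem('gadget', 10.0)
print(gadget.quantity_on_hand)
```

Nothing in the class body defines how instances are constructed or printed; both behaviours come from the decorator.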

In [1]:
from dataclasses import dataclass

@dataclass
class DataClassCard:
    rank: str
    suit: str

A data class comes with basic functionality already implemented. For instance, you can **instantiate, print, and compare** data class instances straight out of the box:

In [3]:
queen_of_hearts = DataClassCard('Q', 'Hearts')
print(queen_of_hearts.rank)
print(queen_of_hearts)
print(queen_of_hearts == DataClassCard('Q', 'Hearts'))

Q
DataClassCard(rank='Q', suit='Hearts')
True


Compare that to a regular class. A minimal regular class would look something like this:

In [2]:
class RegularCard:
    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

While this is not much more code to write, you can already see signs of the boilerplate pain:
1. rank and suit are each repeated three times simply to initialize an object.
2. If you try to use this plain class, you’ll notice that the representation of the objects is not very descriptive.
3. A queen of hearts does not compare equal to another queen of hearts, because without an **.__eq__()** method objects are compared by identity.

In [4]:
queen_of_hearts = RegularCard('Q', 'Hearts')
print(queen_of_hearts.rank)
print(queen_of_hearts)
print(queen_of_hearts == RegularCard('Q', 'Hearts'))

Q
<__main__.RegularCard object at 0x7f0a5427a310>
False


The @dataclass decorator solves both problems:
1. It implements a **.__repr__()** method that provides a nice string representation automatically.
2. It implements an **.__eq__()** method that can do basic object comparisons.

For the RegularCard class to imitate the data class above, you need to add these methods as well:

In [6]:
class RegularCard:
    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

    def __repr__(self):
        return (f'{self.__class__.__name__}'
                f'(rank={self.rank!r}, suit={self.suit!r})')

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.rank, self.suit) == (other.rank, other.suit)
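To confirm that the extended RegularCard now behaves like the data class, the sketch below repeats the class so it is self-contained and reruns the earlier checks. The final comparison against a plain tuple is an extra check added here for illustration.

```python
class RegularCard:
    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

    def __repr__(self):
        return (f'{self.__class__.__name__}'
                f'(rank={self.rank!r}, suit={self.suit!r})')

    def __eq__(self, other):
        # Only compare with objects of exactly the same class.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.rank, self.suit) == (other.rank, other.suit)

queen_of_hearts = RegularCard('Q', 'Hearts')
print(queen_of_hearts)                                # RegularCard(rank='Q', suit='Hearts')
print(queen_of_hearts == RegularCard('Q', 'Hearts'))  # True
print(queen_of_hearts == ('Q', 'Hearts'))             # False: different type
```

Returning NotImplemented for other types lets Python fall back to its default comparison, which is why the last line prints False instead of raising an error.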

# Basic Data Classes

A data class is a regular Python class. The only thing that sets it apart is that it has [basic data model methods](https://docs.python.org/3/reference/datamodel.html#basic-customization), like **.__init__(), .__repr__(), and .__eq__()**, implemented for you. When you decorate a class with @dataclass, the decorator generates these methods from the field definitions and adds them to the class.

In [10]:
# As an example, we will create a Position class that will represent geographic 
# positions with a name as well as the latitude and longitude
from dataclasses import dataclass

@dataclass
class Position:
    """
    1. We just use variable annotations to list the fields in this class.
    2. For each field definition, we need to specify its type.
    """
    name: str  # fields are declared as class-level annotations, without self
    lon: float
    lat: float

print(Position)

<class '__main__.Position'>


In [11]:
pos = Position('Oslo', 10.8, 59.9)
print(pos)
print(pos.lat)
print(f'{pos.name} is at {pos.lat}°N, {pos.lon}°E')

Position(name='Oslo', lon=10.8, lat=59.9)
59.9
Oslo is at 59.9°N, 10.8°E


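You can also create data classes similarly to how named tuples are created. The following is (almost) equivalent to the definition of Position above. The differences are that fields given by name only are typed as typing.Any, and that lat is listed before lon here: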

In [12]:
from dataclasses import make_dataclass

Position = make_dataclass('Position', ['name', 'lat', 'lon'])
print(Position)

<class 'types.Position'>
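make_dataclass also accepts (name, type) pairs, which keeps both the explicit types and the field order of the decorated Position class. A minimal sketch, using the dataclasses.fields helper only to inspect the result:

```python
from dataclasses import fields, make_dataclass

# Each field is given as a (name, type) pair, in the same order as
# the decorated Position class: name, lon, lat.
Position = make_dataclass(
    'Position',
    [('name', str), ('lon', float), ('lat', float)],
)

pos = Position('Oslo', 10.8, 59.9)
print(pos)                                  # Position(name='Oslo', lon=10.8, lat=59.9)
print([f.name for f in fields(Position)])   # ['name', 'lon', 'lat']
```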


## Default Values

It is easy to add default values to the fields of your data class. This works exactly as if you had specified the default values in the definition of the **.__init__()** method of a regular class:

In [13]:
from dataclasses import dataclass

@dataclass
class Position:
    name: str
    lon: float = 0.0 # default value
    lat: float = 0.0
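The sketch below shows how those defaults behave when some or all of the optional arguments are left out. The place names are illustration values only.

```python
from dataclasses import dataclass

@dataclass
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0

# Both defaults are used.
print(Position('Null Island'))              # Position(name='Null Island', lon=0.0, lat=0.0)

# A keyword argument overrides one default and keeps the other.
print(Position('Greenwich', lat=51.8))      # Position(name='Greenwich', lon=0.0, lat=51.8)

# Positional arguments override both defaults.
print(Position('Vancouver', -123.1, 49.3))  # Position(name='Vancouver', lon=-123.1, lat=49.3)
```

As with regular function signatures, fields without a default must come before fields that have one.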

## Type Hints
Adding some kind of type hint is **mandatory** when defining the fields in your data class:
1. Without a type hint, the field will not be a part of the data class.
2. If you do not want to add explicit types to your data class, use typing.Any.

In [14]:
from dataclasses import dataclass
from typing import Any

@dataclass
class WithoutExplicitTypes:
    name: str
    value: Any = 42
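The sketch below illustrates both points. The un-annotated `note` attribute is a hypothetical addition for demonstration. The second half shows that type hints are not checked at runtime, which is standard Python behaviour and not specific to data classes.

```python
from dataclasses import dataclass, fields
from typing import Any

@dataclass
class WithoutExplicitTypes:
    name: str
    value: Any = 42
    note = 'not a field'  # hypothetical: no annotation, so @dataclass ignores it

# Only annotated attributes become fields.
print([f.name for f in fields(WithoutExplicitTypes)])  # ['name', 'value']

# Type hints are not enforced at runtime: this runs without error.
odd = WithoutExplicitTypes(name=3.14, value='anything')
print(odd)  # WithoutExplicitTypes(name=3.14, value='anything')
```

If you want the hints enforced, a static type checker such as mypy is the usual tool.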

## Adding Methods
You already know that a data class is just a regular class. That means that you can freely add your own methods to a data class.

You can add a **.distance_to()** method to your data class just like you can with normal classes:

In [16]:
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

@dataclass
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0
    
    # A regular method: like any instance method, it takes self explicitly.
    def distance_to(self, other):
        r = 6371  # Earth radius in kilometers
        lam_1, lam_2 = radians(self.lon), radians(other.lon)
        phi_1, phi_2 = radians(self.lat), radians(other.lat)
        h = (sin((phi_2 - phi_1) / 2)**2
             + cos(phi_1) * cos(phi_2) * sin((lam_2 - lam_1) / 2)**2)
        return 2 * r * asin(sqrt(h))

In [17]:
oslo = Position('Oslo', 10.8, 59.9)
vancouver = Position('Vancouver', -123.1, 49.3)
oslo.distance_to(vancouver)

7181.7841229421165

## field function

# Alternatives to Data Classes

In [23]:
###################### Data class version
from dataclasses import dataclass
@dataclass
class DataClassCard:
    rank: str
    suit: str
queen_of_hearts = DataClassCard('Q', 'Hearts')
        
###################### Tuple
queen_of_hearts_tuple = ('Q', 'Hearts')

###################### namedtuple
from collections import namedtuple
NamedTupleCard = namedtuple('NamedTupleCard', ['rank', 'suit'])
queen_of_hearts_name = NamedTupleCard('Q', 'Hearts')

###################### Dictionary
queen_of_hearts_dict = {'rank': 'Q', 'suit': 'Hearts'}


###################### the attrs project.

import attr

@attr.s
class AttrsCard:
    rank = attr.ib()
    suit = attr.ib()

## Comparing with Tuple/Dict 

The tuple and dict versions work. However, they put a lot of responsibility on you as a programmer:

1. You need to remember that the queen_of_hearts_... variable represents a card.
2. For the tuple version, you need to remember the order of the attributes. Writing ('Spades', 'A') will mess up your program but probably not give you an easily understandable error message.
3. If you use the dict version, you must make sure the names of the attributes are consistent. For instance {'value': 'A', 'suit': 'Spades'} will not work as expected.
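The pitfalls listed above can be sketched as follows. The variable names are chosen for illustration. The point is that the tuple and dict mistakes pass silently, while the data class rejects the wrong field name as soon as the object is created.

```python
from dataclasses import dataclass

# Tuple: the order is swapped by mistake, and nothing complains.
ace_of_spades_tuple = ('Spades', 'A')
rank, suit = ace_of_spades_tuple
print(rank)  # Spades (the suit ended up in the rank variable)

# Dict: an inconsistent key name only shows up later as a missing value.
ace_of_spades_dict = {'value': 'A', 'suit': 'Spades'}
print(ace_of_spades_dict.get('rank'))  # None

@dataclass
class DataClassCard:
    rank: str
    suit: str

# Data class: the wrong field name fails immediately with a TypeError.
try:
    DataClassCard(value='A', suit='Spades')
except TypeError as exc:
    error_message = str(exc)
    print(f'TypeError: {error_message}')
```

Note that the data class does not catch swapped positional arguments either. Passing fields by keyword avoids that problem.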

## Comparing with namedtuple

A better alternative is the [namedtuple](https://dbader.org/blog/writing-clean-python-with-namedtuples). It has long been used to create readable small data structures. 

1. Data classes come with many more features than you have seen so far.
2. The namedtuple has some other features that are not necessarily desirable. By design, a namedtuple is a regular tuple.
3. A namedtuple’s lack of awareness about its own type can lead to subtle and hard-to-find bugs, especially since it will also happily compare two different namedtuple classes.
4. It is hard to add default values to only some of the fields in a namedtuple.
5. A namedtuple is also by nature immutable. That is, the value of a namedtuple can never change.

In [21]:
# By design, a namedtuple is a regular tuple. This can be seen in comparisons, for instance:
print(queen_of_hearts_name == ('Q', 'Hearts'))

# Two different namedtuple definitions still compare equal when their values match
Person = namedtuple('Person', ['first_initial', 'last_name'])
ace_of_spades = NamedTupleCard('A', 'Spades')
print(ace_of_spades == Person('A', 'Spades'))

True
True
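The immutability point from the list above can be sketched in the same way. FrozenCard is a hypothetical class added here to show that a data class can opt in to immutability with the frozen=True parameter of @dataclass.

```python
from collections import namedtuple
from dataclasses import FrozenInstanceError, dataclass

NamedTupleCard = namedtuple('NamedTupleCard', ['rank', 'suit'])
card = NamedTupleCard('Q', 'Hearts')
try:
    card.rank = 'K'  # namedtuple fields cannot be reassigned
except AttributeError as exc:
    print('namedtuple:', exc)

@dataclass
class DataClassCard:
    rank: str
    suit: str

dc = DataClassCard('Q', 'Hearts')
dc.rank = 'K'  # a data class is mutable by default
print(dc)      # DataClassCard(rank='K', suit='Hearts')

@dataclass(frozen=True)
class FrozenCard:  # hypothetical name, for illustration only
    rank: str
    suit: str

frozen = FrozenCard('Q', 'Hearts')
try:
    frozen.rank = 'K'  # frozen data classes reject assignment
except FrozenInstanceError as exc:
    print('frozen data class:', exc)
```

With a data class, mutability is a choice made per class. With a namedtuple it is fixed.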


Data classes will not replace all uses of namedtuple. For instance, if you need your data structure to behave like a tuple, then a named tuple is a great alternative!


The [attrs](https://www.attrs.org/en/stable/) project is great and does support some features that data classes do not, including converters and validators. Furthermore, attrs has been around for a while and is supported in Python 2.7 as well as Python 3.4 and up. However, as attrs is not a part of the standard library, it does add an external dependency to your projects. Through data classes, similar functionality will be available everywhere.

In addition to tuple, dict, namedtuple, and attrs, there are many other similar projects, including typing.NamedTuple, namedlist, attrdict, plumber, and fields. While data classes are a great new alternative, there are still use cases where one of the older variants fits better, for instance if you need compatibility with a specific API expecting tuples or need functionality not supported in data classes.
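Of the alternatives mentioned, typing.NamedTuple is in the standard library and looks the most like a data class definition. The sketch below is one possible way to write the card example with it. The default value for suit is an arbitrary choice made for illustration.

```python
from typing import NamedTuple

class NamedTupleCard(NamedTuple):
    rank: str
    suit: str = 'Hearts'  # hypothetical default, for illustration only

queen = NamedTupleCard('Q')
print(queen)                     # NamedTupleCard(rank='Q', suit='Hearts')

# It is still a tuple, so it unpacks and compares like one.
rank, suit = queen
print(rank, suit)                # Q Hearts
print(queen == ('Q', 'Hearts'))  # True
```

This is a good fit when an API expects real tuples, which a data class cannot provide directly.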

# Reference
1. [Data Classes in Python 3.7+ (Guide)](https://realpython.com/python-data-classes/)
2. [PEP 557](https://peps.python.org/pep-0557/#id7)
3. [dataclasses — Data Classes](https://docs.python.org/3/library/dataclasses.html)