# Activity - dataclass

Python offers a few ways to build a simple class that is just a collection of fields (data), with
little or no extra functionality. That pattern is known as a `data class`. [1]

The purpose of such a class is primarily to hold data, such as a point on a plane, a location on a map, or a group of features of an object.

## Regular class

Let's build a class holding x and y coordinates of points. 

In [2]:
class PointClass:
    count = 0 # class attribute
    def __init__(self, x, y):
        # instance attributes 
        self.x = x
        self.y = y

In [3]:
p1 = PointClass(0,0)
p2 = PointClass(0,0)

In [4]:
# the default repr/str only shows the class name and a memory address
print(p1)

<__main__.PointClass object at 0x7f87681fa8f0>


In [5]:
# the default eq compares identity, so two points with the same values are not equal
p1 == p2

False

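A regular class only gets the defaults inherited from object. We have to write init ourselves, and we have to implement repr (or str) and eq explicitly to change the behaviour shown above.

By default, print shows only the class name and a memory address, and == compares object identity rather than field values.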
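As a rough sketch of the boilerplate this implies, the class below writes repr and eq by hand. The name `PointWithDunders` is made up for this illustration and is not part of the activity.

```python
class PointWithDunders:
    """Hand-written version of what a data class would generate (illustrative)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # print() uses this too, because str falls back to repr.
        return f"PointWithDunders(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        # Compare by field values instead of by identity.
        if not isinstance(other, PointWithDunders):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


q1 = PointWithDunders(0, 0)
q2 = PointWithDunders(0, 0)
print(q1)        # PointWithDunders(x=0, y=0)
print(q1 == q2)  # True
```

Every new field has to be added in three places here (init, repr, eq), which is the repetition the rest of this activity tries to avoid.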

## namedtuple

We can use the namedtuple we learned about in the previous section to get a readable print output and a value-based == by default.

In [6]:
from collections import namedtuple

PointTuple = namedtuple("Point", "x y")

In [7]:
p1 = PointTuple(0,0)
p2 = PointTuple(0,0)

In [8]:
print(p1)

Point(x=0, y=0)


In [9]:
p1 == p2

True

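However, `collections.namedtuple` has some limitations in more complex situations:

- The instance is immutable. An attribute cannot be rebound to another value, although a mutable value such as a list can still be changed in place.
- The factory-function form does not use a class statement, which makes it hard to add methods, docstrings, and other class functionality.
- Default values must be passed through the separate `defaults` argument, which is easy to misread.
- It has no place for type hints. The class-based `typing.NamedTuple` variant supports annotations, defaults, and methods, but its instances are still immutable.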

In [10]:
Student = namedtuple("Student", ["name", "courses"])
s = Student("John", ["CMPT300J"])

In [11]:
# change mutable instance attribute in place
s.courses.append("CMPT200")
s.courses

['CMPT300J', 'CMPT200']

In [12]:
# update immutable instance attribute
s.name = "Alice"

AttributeError: ignored

In [13]:
# reset mutable instance attribute
s.courses = ["DATA220"]

AttributeError: ignored
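For comparison, here is a minimal sketch of the class-based `typing.NamedTuple` form, which accepts type hints, defaults, and methods. The `StudentRecord` class, its `year` field, and the `label` method are hypothetical names used only for this illustration. Instances remain immutable, so a "change" is made by building a new tuple with `_replace`.

```python
from typing import NamedTuple


class StudentRecord(NamedTuple):
    """Hypothetical typed tuple, for illustration only."""

    name: str
    year: int = 1  # default value

    def label(self) -> str:
        return f"{self.name} (year {self.year})"


s1 = StudentRecord("John")
s2 = s1._replace(name="Alice")  # returns a new tuple; s1 is unchanged

print(s1)          # StudentRecord(name='John', year=1)
print(s2.label())  # Alice (year 1)
```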

## dataclass

The `dataclass` decorator was introduced in Python 3.7 in the standard library module `dataclasses`. It provides a convenient way to define classes that primarily store data.

By using the `@dataclass` decorator, you can define such a class with minimal code and let Python generate common methods such as `__init__()`, `__repr__()`, and `__eq__()` from the annotated fields.

### Type hint and annotation

Type hints—a.k.a. type annotations—are ways to declare the expected type of function
arguments, return values, variables, and attributes.

The first thing you need to know about type hints is that they are not enforced at all
by the Python bytecode compiler and interpreter. [1]

We will cover type hints in more detail in a later activity.

The basic syntax of variable annotation is:
```
var_name: some_type
```

You can also initialize the variable with a value. 
```
var_name: some_type = a_value
```

In [14]:
a: int # a is expected to be an int, but a is not actually assigned to any value. 
a # not defined error 

NameError: ignored

In [15]:
b: int = 0 # b is expected to be an int. The initial value of b is 0. 
b

0

In [17]:
b = 0.0 # however, the type int is not enforced at all. b can be any type anyway. 
b

0.0

In [18]:
# Type hint example in a function
def func(a: int, b: int) -> str:
    """
    a is an int
    b is an int
    The output is a string
    """
    return str(a+b)

func(1,1)

'2'
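To make the "not enforced" point concrete, the sketch below calls an annotated function with strings instead of ints. The function name `add_as_text` is invented for this example. Python stores the hints in `__annotations__` but does not check them at call time. A separate static type checker would be needed to flag the second call.

```python
def add_as_text(a: int, b: int) -> str:
    return str(a + b)


# The hints are stored as metadata on the function object.
print(add_as_text.__annotations__)

# Passing strings violates the hints, yet the call still runs:
# "1" + "2" is string concatenation.
print(add_as_text("1", "2"))  # 12
```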

In [19]:
# Type hint example in a class
class MyClass:
    a: int
    b: float = 1.0
    c = 'something'

    def __init__(self, x:float, y:float):
        self.x = x
        self.y = y

obj = MyClass(1.0, 1.0)

In [20]:
MyClass.a # a is annotated, but a is not bound to any value. Thus a is not a defined class attribute. 

AttributeError: ignored

In [21]:
print(MyClass.b) # b is an annotated class attribute. 
print(MyClass.c) # c is a plain old class attribute without an annotation. 

1.0
something


In [22]:
print(obj.x, obj.y) # x and y are instance attributes set in init from the annotated parameters. 

1.0 1.0


In [23]:
print(obj.b) # an instance can retrieve a class attribute

1.0


Here is a piece of testing code showing the relationship between class and instance attributes. 

In [24]:
# Assigning to b through an instance creates an instance attribute; b at class level is not affected. 
obj.b += 1.0
print(obj.b)
print(MyClass.b)

2.0
1.0


In [25]:
# Changing b at class level changes the value seen by instances without their own b; obj keeps its own 2.0. 
MyClass.b = 3.0
print(obj.b)
obj2 = MyClass(1.0, 1.0)
print(obj2.b)

2.0
3.0
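One way to see why the two cells above behave this way is to look at the instance namespace with `vars()`. This is a small sketch with a throwaway class named `Config`, which is not part of the activity.

```python
class Config:
    b: float = 1.0  # class attribute


c1 = Config()
c2 = Config()

print(vars(c1))  # {}  (no instance attributes yet, so c1.b reads Config.b)

c1.b += 1.0      # reads Config.b, then stores the result on c1 itself
print(vars(c1))  # {'b': 2.0}

Config.b = 3.0   # only affects lookups that fall back to the class
print(c1.b, c2.b)  # 2.0 3.0
```

Attribute lookup checks the instance first and falls back to the class, so `c1` now shadows the class-level value while `c2` still follows it.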


### Build a data class

In [26]:
from dataclasses import dataclass

@dataclass
class PointDataClass:
    x: float 
    y: float 
    z: float = 0.0

@dataclass is a class decorator. We will cover decorators in a later activity. With @dataclass, the class gets init, repr, and eq generated from its annotated fields. 
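The generated methods can be inspected directly. The sketch below uses a stand-in class named `Point3D` (an illustrative name with the same fields as the class above) and the standard `inspect` and `dataclasses.fields` helpers.

```python
import inspect
from dataclasses import dataclass, fields


@dataclass
class Point3D:
    x: float
    y: float
    z: float = 0.0


# Parameters of the generated init, in field order.
sig = inspect.signature(Point3D)
print(list(sig.parameters))         # ['x', 'y', 'z']
print(sig.parameters["z"].default)  # 0.0

# The fields that @dataclass collected from the annotations.
print([f.name for f in fields(Point3D)])  # ['x', 'y', 'z']

# repr and eq are defined on the class itself, not just inherited from object.
print("__repr__" in vars(Point3D), "__eq__" in vars(Point3D))  # True True
```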

#### instance attribute

x and y are annotated instance attributes without default values. 

z is an annotated instance attribute with a default value. 

The values of x and y are required when creating a new object, just as they are in the init of the regular class. 

In [27]:
p1 = PointDataClass(1.0, 1.0)
p1 # repr

PointDataClass(x=1.0, y=1.0, z=0.0)

In [28]:
p2 = PointDataClass(1.0, 1.0)
p1 == p2 # eq

True

In [29]:
p1.x = 2.0
print(p1) # change x in p1
print(p2) # x in p2 is independent of x in p1

PointDataClass(x=2.0, y=1.0, z=0.0)
PointDataClass(x=1.0, y=1.0, z=0.0)


In [30]:
p3 = PointDataClass(1.0, 1.0, 1.0) # override the default value of z
p3

PointDataClass(x=1.0, y=1.0, z=1.0)

#### class attributes

In the class below, dims has no annotation, so @dataclass treats it as a plain class attribute rather than a field. 

In [31]:
@dataclass
class PointDataClass2:
    x: float 
    y: float 
    z: float = 0.0
    dims = 3

In [32]:
p4 = PointDataClass2(1.0, 1.0)
print(PointDataClass2.dims) # access dims via class
print(p4.dims) # retrieve dims from the class via instance p4 

3
3
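If a class attribute should still carry a type hint, one option is to annotate it with `typing.ClassVar`, which `@dataclass` recognises and excludes from the fields. The class name `PointWithClassVar` below is made up for this sketch.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class PointWithClassVar:
    x: float
    y: float
    dims: ClassVar[int] = 2  # annotated, but still a class attribute


p = PointWithClassVar(1.0, 1.0)

print([f.name for f in fields(p)])  # ['x', 'y']
print(p)                            # PointWithClassVar(x=1.0, y=1.0)
print(PointWithClassVar.dims)       # 2
```

As with the unannotated `dims`, the attribute does not appear in init, repr, or eq.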


#### corrupted mutable default value 

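In the example below, the default value [] is created once, when init is defined, and is then shared by every instance that relies on the default: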

In [33]:
class DemoClass:
    def __init__(self, a = []):
        self.a = a


In [34]:
d1 = DemoClass()
print(d1.a)

[]


In [35]:
d1.a.append(1)
d2 = DemoClass()
print(d2.a)

[1]
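In a regular class, a common way to avoid this sharing is to use `None` as the default and build the list inside init. The sketch below shows that idiom with a hypothetical `SafeDemoClass`.

```python
class SafeDemoClass:
    def __init__(self, a=None):
        # Build a fresh list for each instance when no argument is given.
        self.a = [] if a is None else a


e1 = SafeDemoClass()
e1.a.append(1)
e2 = SafeDemoClass()

print(e1.a)  # [1]
print(e2.a)  # []
```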


#### default mutable value via field

In [36]:
from dataclasses import field

@dataclass
class PointDataClass3:
    x: float 
    y: float 
    z: float = 0.0
    dims = 3
    attr: list = field(default_factory = list)  # [] is rejected as a default (ValueError); default_factory is called for each new instance


In [37]:
p5 = PointDataClass3(1.0, 1.0)
p5.attr.append(1)
p5

PointDataClass3(x=1.0, y=1.0, z=0.0, attr=[1])

In [38]:
p6 = PointDataClass3(1.0, 1.0)
p6 # attr is still an empty list. 

PointDataClass3(x=1.0, y=1.0, z=0.0, attr=[])

In [39]:
def mylist():
    return [1,2,3]

@dataclass
class PointDataClass3:
    x: float 
    y: float 
    z: float = 0.0
    dims = 3
    attr: list = field(default_factory = mylist) # use self-defined callable function as default 

p7 = PointDataClass3(1.0, 1.0)
p7

PointDataClass3(x=1.0, y=1.0, z=0.0, attr=[1, 2, 3])
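For contrast, the sketch below shows what happens when a list literal is used directly as a field default. `@dataclass` refuses to create the class, which guards against the shared-list problem shown earlier. The class name `BadDefault` is illustrative.

```python
from dataclasses import dataclass

try:
    @dataclass
    class BadDefault:
        attr: list = []  # rejected while the class is being created
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```

The error message points to `default_factory` as the replacement.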

#### post_init

The `__init__` method generated by `@dataclass` only takes the arguments passed and
assigns them (or their default values, if missing) to the instance attributes that are
instance fields. But you may need to do more than that to initialize the instance.
If that's the case, you can provide a `__post_init__` method. When that method
exists, `@dataclass` will add code to the generated `__init__` to call `__post_init__` as
the last step.

Common use cases for `__post_init__` are validation and computing field values
based on other fields. [1]

In [40]:
from dataclasses import field

@dataclass
class PointDataClass4:
    x: float 
    y: float 
    z: float = 0.0
    dims = 3
    attr: list = field(default_factory = list)  

    def __post_init__(self):
        if self.x > 0:
            print("x is positive")
        else:
            print("x is not positive")

In [41]:
p8 = PointDataClass4(1.0, 1.0)
p8

x is positive


PointDataClass4(x=1.0, y=1.0, z=0.0, attr=[])
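The two use cases named above, validation and computed fields, could look roughly like the sketch below. `Vector2D` and its `length` field are hypothetical names for this illustration. `field(init=False)` keeps `length` out of the generated init so that `__post_init__` can fill it in.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Vector2D:
    x: float
    y: float
    length: float = field(init=False)  # computed, not an init parameter

    def __post_init__(self):
        # Validation: reject values that cannot be used as coordinates.
        if math.isnan(self.x) or math.isnan(self.y):
            raise ValueError("coordinates must not be NaN")
        # Computed field based on the other fields.
        self.length = math.hypot(self.x, self.y)


v = Vector2D(3.0, 4.0)
print(v)  # Vector2D(x=3.0, y=4.0, length=5.0)
```

Raising an exception in `__post_init__` aborts construction, so an invalid instance is never returned to the caller.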

## Summary

|                                   | Regular Class | namedtuple | dataclass     |
|-----------------------------------|---------------|------------|---------------|
| built-in repr/str/eq              | No            | Yes        | Yes           |
| mutable instance                  | Yes           | No         | Yes           |
| class statement                   | Yes           | No         | Yes           |
| type hint/annotation              | Easy          | Hard       | Easy          |
| default value                     | Easy          | Hard       | Easy          |
| mutable default value             | Corrupted     | N/A        | Not Corrupted |
| readability for holding data only | Hard          | Easy       | Easy          |


## Reference

1. Luciano Ramalho, Fluent Python, 2nd ed., O'Reilly Media, 2022. https://www.fluentpython.com/