# Dataclass Tutorial Notes

Here are some common use cases of `dataclass` in Python for AI and ML.

* **Storing Model Configurations:** Machine learning models often have various hyperparameters that can be tweaked. Dataclasses provide a clean way to define these configurations, making it easier to experiment with different settings and track results. For instance, you could create a dataclass to store learning rate, batch size, and optimizer settings for a neural network.

* **Data Preprocessing Pipelines:** Data preprocessing is a crucial step in machine learning. Dataclasses can be used to represent the various stages of a preprocessing pipeline, including data normalization, feature scaling, and transformation. This improves code readability and maintainability.

* **Experiment Logging:** When running machine learning experiments, it's essential to keep track of the data used, model configurations, and performance metrics. Dataclasses can be used to create structured logs that capture this information, simplifying analysis and comparison of different runs.

* **Feature Engineering:** Feature engineering involves creating new features from existing data. Dataclasses can be used to represent these new features, making it easier to track their origin and impact on model performance.


**NOTE:** <font color='green'>Dataclasses promote clean, concise, and well-organized code for data-centric tasks in AI and machine learning. This improves readability, maintainability, and helps you manage complex data structures effectively.</font>
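
As a rough illustration of the preprocessing use case listed above, a pipeline stage can be described by a small dataclass that holds its parameters. The names `ScalingStep`, `mean`, `std`, and `apply` are hypothetical and chosen only for this sketch; they do not come from any library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScalingStep:
    """Hypothetical preprocessing stage: standardise values with a fixed mean and std."""
    name: str
    mean: float = 0.0
    std: float = 1.0

    def apply(self, values: List[float]) -> List[float]:
        # Subtract the mean and divide by the standard deviation
        return [(v - self.mean) / self.std for v in values]


step = ScalingStep(name="standardise", mean=10.0, std=2.0)
scaled = step.apply([8.0, 10.0, 14.0])

print(step)    # ScalingStep(name='standardise', mean=10.0, std=2.0)
print(scaled)  # [-1.0, 0.0, 2.0]
```

Because the parameters live in fields, the generated `__repr__()` documents exactly how a stage was configured when it is printed or logged.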

## Example 1: Hyperparameter Management

In [52]:
from dataclasses import dataclass

@dataclass
class Hyperparameters:
    learning_rate: float = 0.01
    batch_size: int = 32
    epochs: int = 100

params = Hyperparameters(learning_rate=0.001,
                         batch_size=32,
                         epochs=10)
params2 = Hyperparameters(learning_rate=0.005, 
                          batch_size=8)

print(params)
print(params2)
print('='*20)
print(params.learning_rate)

Hyperparameters(learning_rate=0.001, batch_size=32, epochs=10)
Hyperparameters(learning_rate=0.005, batch_size=8, epochs=100)
====================
0.001


## Example 2: Model Configuration

In [4]:
@dataclass
class ModelConfig:
    input_dim: int
    hidden_dim: int
    output_dim: int
    activation: str

config = ModelConfig(input_dim=100,
                     hidden_dim=50,
                     output_dim=10, 
                     activation='relu')

print(config)

ModelConfig(input_dim=100, hidden_dim=50, output_dim=10, activation='relu')


## Example 3: Experiment Logging

In [53]:
from dataclasses import dataclass

@dataclass
class MLPConfig:
    learning_rate: float = 0.01
    batch_size: int = 32
    epochs: int = 100
    hidden_units: int = 64

# Example usage
config = MLPConfig(learning_rate=0.005, hidden_units=128)

@dataclass
class ExperimentLog:
    model_name: str
    config: MLPConfig
    training_time: float
    accuracy: float

# Example usage
log = ExperimentLog("MLP", config, 120.5, 0.87)
# You can then store or visualize this log information

In [55]:
import json
from dataclasses import asdict

# ... your experiment code ...

# After experiment run
log = ExperimentLog(model_name="MLP", config=config, training_time=1100.0, accuracy=0.90)

# asdict() recursively converts the log and its nested MLPConfig to plain dicts
log_dict = asdict(log)

# print(log_dict)

# Append one JSON object per line (JSON Lines) so the file stays parseable
# after repeated runs; plain json.dump() in append mode would run objects together
with open("experiment_log.jsonl", "a") as f:
    f.write(json.dumps(log_dict) + "\n")


In [40]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    
    def __repr__(self):
        return f"Point(x={self.x}, y={self.y})"
    
    def __str__(self):
        return f"({self.x}, {self.y})"

# Creating an instance of Point
p = Point(3, 4)

# Using __repr__()
print(repr(p))  # Output: Point(x=3, y=4)

# Using __str__()
print(str(p))  # Output: (3, 4)

# Without explicitly calling repr() or str()
print(p)  # Output: (3, 4) -> __str__() is called


Point(x=3, y=4)
(3, 4)
(3, 4)


In [41]:
item = 'abc'
print(f'{item!r}')

'abc'


## What are Python Data Classes?

Python Data Classes are a powerful feature introduced in Python 3.7, designed to simplify the creation of classes that primarily store data. 

They offer an elegant and concise way to define data structures while reducing boilerplate code, making your code more readable and maintainable.

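At their core, Python Data Classes are regular classes with added functionality provided by the `@dataclass` decorator from the standard-library `dataclasses` module. This decorator automatically generates special methods that are commonly needed when defining data-handling classes, such as `__init__()`, `__repr__()`, and `__eq__()`. A `__hash__()` method is generated only in certain configurations, for example with `frozen=True`; with the default arguments, a dataclass is unhashable.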

This automation reduces the need for manual method implementation, allowing developers to focus on the essential attributes and types of their data.
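
The generated methods can be observed directly. The following is a minimal sketch; the class name `Sample` and its fields are illustrative assumptions. It also shows that, with the decorator's default arguments, `__hash__` is set to `None`, so instances cannot be used as dictionary keys or set members.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Illustrative class; the name and fields are assumptions for this sketch
    feature: float
    label: int


a = Sample(0.5, 1)   # generated __init__ accepts the fields in order
b = Sample(0.5, 1)

print(a)                        # Sample(feature=0.5, label=1)  (generated __repr__)
print(a == b)                   # True  (generated __eq__ compares field values)
print(Sample.__hash__ is None)  # True  (unhashable with default arguments)
```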

## Benefits of Python Data Classes

* Conciseness: Python Data Classes significantly reduce boilerplate code. With a simple decorator, you can define a class with attributes, type hints, and default values in just a few lines.

* Optional immutability: Data Classes are mutable by default, but passing `frozen=True` to the decorator makes instances read-only, so assigning to a field after initialization raises `FrozenInstanceError`. This is beneficial when dealing with data that should remain constant, preventing accidental modifications and increasing code reliability.

* Readability: Data Classes automatically generate a human-readable `__repr__()` method, making it easier to inspect and debug instances.

* Structural equality: Data Classes provide structural equality out of the box. Instances with the same attributes and values are considered equal.

* Built-in methods: Python Data Classes generate essential special methods, such as `__init__()`, `__repr__()`, and `__eq__()`, reducing the need for manual method definition.

* Type hinting: Python's type hinting system integrates seamlessly with Data Classes.

* Customisation: Although Data Classes come with convenient defaults, you can customise their behaviour by passing arguments to the decorator or by explicitly defining methods.
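
A few of the customisation hooks mentioned in the list above can be sketched as follows. The `Run` class and its fields are hypothetical; `field()`, `default_factory`, `compare`, `order=True`, and `__post_init__()` are standard features of the `dataclasses` module.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Run:
    # Hypothetical record of a training run; names are illustrative
    accuracy: float
    # default_factory gives each instance its own list;
    # compare=False keeps tags out of equality and ordering
    tags: List[str] = field(default_factory=list, compare=False)

    def __post_init__(self):
        # Runs after the generated __init__ has set the fields
        if not 0.0 <= self.accuracy <= 1.0:
            raise ValueError("accuracy must be between 0 and 1")


runs = [Run(0.87), Run(0.91, ["tuned"]), Run(0.79)]
best = max(runs)  # order=True generates the comparison methods

print(best)  # Run(accuracy=0.91, tags=['tuned'])
```

Using `default_factory` instead of a literal `[]` matters because `@dataclass` rejects mutable defaults such as lists to avoid sharing one object across instances.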

## Example:

In [42]:
from dataclasses import dataclass

@dataclass
class MyClass:
    attribute1: int
    attribute2: str

obj = MyClass(42, "Hello, Data Class!")

In [43]:
## Accessing Attributes
print(obj.attribute1)
print(obj.attribute2)

42
Hello, Data Class!


### Immutability

Passing `frozen=True` to `@dataclass` makes instances immutable: once an instance is created, its field values cannot be reassigned.


In [56]:
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

point = Point(2, 3)

In [46]:
point.x = 5  # This will raise an error

FrozenInstanceError: cannot assign to field 'x'

In [57]:
try:
    point.x = 5
except Exception as e:
    print(e)

cannot assign to field 'x'
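
Since a frozen instance cannot be modified, a changed copy is usually created instead. The sketch below uses `dataclasses.replace()` for that; the class is named `FrozenPoint` here only to keep the example self-contained, and that name is an assumption rather than part of the notes above.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class FrozenPoint:
    # Same shape as the frozen Point above, renamed for a standalone example
    x: int
    y: int


p1 = FrozenPoint(2, 3)
p2 = replace(p1, x=5)  # builds a new instance; p1 is left untouched

print(p1)  # FrozenPoint(x=2, y=3)
print(p2)  # FrozenPoint(x=5, y=3)

# Frozen dataclasses (with the default eq=True) are hashable
print(hash(p1) == hash(FrozenPoint(2, 3)))  # True

try:
    p1.x = 5
except FrozenInstanceError as err:
    print(type(err).__name__)  # FrozenInstanceError
```

Catching `FrozenInstanceError` specifically, rather than a bare `Exception`, avoids hiding unrelated errors.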


### Structural Equality

In [58]:
obj1 = MyClass(42, "Hello, Data Class!")
obj2 = MyClass(42, "Hello, Data Class!")

print(obj1 == obj2) 

True


### Serialisation
Data Classes are easy to serialise and deserialise. When the fields hold JSON-compatible values, you can use the `json` module to convert data class instances to JSON and back.

In [51]:
import json
from dataclasses import asdict

# Serialise to JSON (asdict() also converts nested data classes)
data = json.dumps(asdict(obj))
print(data)

# Deserialise from JSON
new_obj = MyClass(**json.loads(data))
print(new_obj)

{"attribute1": 42, "attribute2": "Hello, Data Class!"}
MyClass(attribute1=42, attribute2='Hello, Data Class!')
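
The round trip above works for a flat class. For nested data classes, one possible approach is sketched below: `dataclasses.asdict()` recurses into nested instances, but `json.loads()` returns plain dictionaries, so the nested objects have to be rebuilt explicitly. The class names `OptimizerConfig` and `TrainingConfig` are hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class OptimizerConfig:
    # Hypothetical nested configuration
    name: str
    learning_rate: float


@dataclass
class TrainingConfig:
    epochs: int
    optimizer: OptimizerConfig


cfg = TrainingConfig(epochs=10, optimizer=OptimizerConfig("sgd", 0.01))

payload = json.dumps(asdict(cfg))  # asdict() recurses into nested dataclasses
raw = json.loads(payload)

# json.loads() yields plain dicts, so the nested dataclass is rebuilt by hand
restored = TrainingConfig(
    epochs=raw["epochs"],
    optimizer=OptimizerConfig(**raw["optimizer"]),
)

print(payload)          # {"epochs": 10, "optimizer": {"name": "sgd", "learning_rate": 0.01}}
print(restored == cfg)  # True
```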


## When to use Python Data Classes -- and when not to use them

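Python Data Classes simplify the creation of classes whose main job is to store data, such as configurations, records, and logs. They are a weaker fit for classes that are mostly behaviour with little state, where the generated methods add little.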

## Conclusion

Python Data Classes are a valuable addition to the Python language that simplifies the creation of classes for storing data. Their conciseness, optional immutability, readability, and generated methods make them an excellent choice for various use cases, such as configuration management, data transfer objects, and more. By using Python Data Classes, you can enhance the efficiency and maintainability of your code, ultimately making your development journey smoother and more enjoyable.