# **Special / Magic / Dunder Methods in Python**

## **Objectives:**
By the end of this lesson, students should be able to:
- Understand what special (dunder) methods are.
- Identify key magic methods like `__init__`, `__repr__`, `__str__`, and `__len__`.
- Implement these methods in beginner, intermediate, and advanced data science examples.
- Appreciate how these methods help in writing cleaner and more intuitive code.

## What Are Special / Magic / Dunder Methods?

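In Python, special methods (also known as **magic methods** or **dunder methods**) are methods whose names start and end with **double underscores**, e.g., `__init__`, `__str__`, `__len__`. You rarely call them directly; Python calls them for you when you use the matching syntax or built-in function.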

They are used to **define behavior for built-in Python operations** like:
- Object initialization
- String representation
- Length calculation
- Addition, subtraction, etc.
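
As a rough illustration of the list above, the sketch below uses a hypothetical `Vector2D` class (an assumption made only for this example) to show how `+` and `len()` are routed to `__add__` and `__len__`.

```python
class Vector2D:
    """Hypothetical 2-D vector used only to illustrate operator hooks."""

    def __init__(self, x, y):
        # Object initialization: runs when Vector2D(...) is called
        self.x = x
        self.y = y

    def __add__(self, other):
        # Addition: runs for `self + other`
        return Vector2D(self.x + other.x, self.y + other.y)

    def __len__(self):
        # Length calculation: runs for `len(self)`; a 2-D vector has two components
        return 2


a = Vector2D(1, 2)
b = Vector2D(3, 4)
total = a + b             # Python looks up __add__ on the class and calls it
print(total.x, total.y)   # 4 6
print(len(total))         # 2
```

The class never calls `__add__` or `__len__` by name; the operator and the built-in function trigger them.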

---

## 1. `__init__` – The Initializer (Constructor) Method

Python calls `__init__` automatically right after a new object is created, so this is where you **initialize** the object's attributes.


### Beginner Example:

In [1]:
class DataScientist:
    def __init__(self, name, specialty):
        self.name = name
        self.specialty = specialty

In [2]:
ds = DataScientist("Alice", "NLP")
print(ds.name)
print(ds.specialty)

Alice
NLP


### Intermediate Example:


In [3]:
class Dataset:
    def __init__(self, name, rows, columns):
        self.name = name
        self.rows = rows
        self.columns = columns

In [4]:
titanic = Dataset("Titanic", 891, 12)
print(titanic.name, titanic.rows, titanic.columns)

Titanic 891 12


### Advanced Example (with data validation):


In [5]:
class ModelResult:
    def __init__(self, model_name, accuracy):
        if accuracy < 0 or accuracy > 1:
            raise ValueError("Accuracy must be between 0 and 1")
        self.model_name = model_name
        self.accuracy = accuracy

In [6]:
result = ModelResult("Random Forest", 0.89)

In [7]:
result = ModelResult("Decision Tree", -0.1)

ValueError: Accuracy must be between 0 and 1

## 2. `__repr__` – Official String Representation (For Debugging)

### Purpose:
- Used by **developers** and debugging tools.
- Aims to give an **unambiguous**, code-like string representation of an object.
- Called by the built-in `repr()` function, by the interactive prompt when it echoes a value, and for each element when a container such as a list or dictionary is printed.
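
A minimal sketch of these triggers, using a hypothetical `Model` class that is not part of the lesson's other examples. It also uses the `!r` conversion inside the f-string, which applies `repr()` to the attribute and so adds the quotes automatically.

```python
class Model:
    """Hypothetical class with only __repr__ defined."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # {self.name!r} inserts repr(self.name), i.e. the quoted string
        return f"Model(name={self.name!r})"


m = Model("XGBoost")
print(repr(m))             # Model(name='XGBoost')
print([m, Model("SVM")])   # [Model(name='XGBoost'), Model(name='SVM')]
print({"best": m})         # {'best': Model(name='XGBoost')}
```

Containers always show their elements with `__repr__`, which is why a good `__repr__` pays off when you print lists of results.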


## Beginner Level: With vs Without `__repr__`

### Without `__repr__`:

In [8]:
class DataScientist:
    def __init__(self, name, specialty):
        self.name = name
        self.specialty = specialty

In [9]:
ds = DataScientist("Grace", "Data Engineering")
print(ds)

<__main__.DataScientist object at 0x7e75fc011070>


> Not helpful when debugging, especially in large data workflows.

---

### With `__repr__`:

In [10]:
class DataScientist:
    def __init__(self, name, specialty):
        self.name = name
        self.specialty = specialty

    def __repr__(self):
        return f"DataScientist(name='{self.name}', specialty='{self.specialty}')"

In [11]:
ds = DataScientist("Grace", "Data Engineering")
print(ds)

DataScientist(name='Grace', specialty='Data Engineering')


> Much better for understanding object content during debugging/logging.

---

## Intermediate Example: Dataset

### Without `__repr__`:

In [12]:
class Dataset:
    def __init__(self, name, rows, columns):
        self.name = name
        self.rows = rows
        self.columns = columns

In [13]:
iris = Dataset("Iris", 150, 4)
print(iris)

<__main__.Dataset object at 0x7e75fc011070>


### With `__repr__`:


In [14]:
class Dataset:
    def __init__(self, name, rows, columns):
        self.name = name
        self.rows = rows
        self.columns = columns

    def __repr__(self):
        return f"Dataset(name='{self.name}', rows={self.rows}, columns={self.columns})"

In [15]:
iris = Dataset("Iris", 150, 4)

In [16]:
print(iris)

Dataset(name='Iris', rows=150, columns=4)


### Expert Tip:

If you define **only `__repr__`**, Python will use it for both `print(obj)` and `repr(obj)`:

In [17]:
class ModelResult:
    def __init__(self, model_name, accuracy):
        self.model_name = model_name
        self.accuracy = accuracy

    def __repr__(self):
        return f"ModelResult(model='{self.model_name}', accuracy={self.accuracy:.2%})"

In [18]:
res = ModelResult("Random Forest", 0.937)
print(res)          # Uses __repr__ since __str__ is not defined
print(repr(res))    # Uses __repr__

ModelResult(model='Random Forest', accuracy=93.70%)
ModelResult(model='Random Forest', accuracy=93.70%)


> This is useful when you want **one unified representation** that's still developer-friendly.

---

## 3. `__str__` – Human-Readable String Representation

Defines the **human-readable string version of the object**. It is called by `print()`, `str()`, and f-string formatting; if a class does not define it, Python falls back to `__repr__`.


### Beginner Example without `__str__`:

In [19]:
class DataScientist:
    def __init__(self, name, specialty):
        self.name = name
        self.specialty = specialty

In [20]:
# Print output

ds = DataScientist("Bob", "Computer Vision")
print(ds)


<__main__.DataScientist object at 0x7e75fabcc0e0>


**This is the default object representation, not very informative for humans.**

---

### Beginner Example with `__str__`:

In [21]:
class DataScientist:
    def __init__(self, name, specialty):
        self.name = name
        self.specialty = specialty

    def __str__(self):
        return f"{self.name}, specializes in {self.specialty}"

In [22]:
ds = DataScientist("Bob", "Computer Vision")
print(ds)

Bob, specializes in Computer Vision


**Much more readable and user-friendly!**

---

### Intermediate Example without `__str__`:


In [23]:
class Dataset:
    def __init__(self, name, rows, columns):
        self.name = name
        self.rows = rows
        self.columns = columns


In [24]:
titanic = Dataset("Titanic", 891, 12)
print(titanic)

<__main__.Dataset object at 0x7e75fabcc8f0>


**This is the default object representation, not very informative for humans.**

---

### Intermediate Example with `__str__`:

In [25]:
class Dataset:
    def __init__(self, name, rows, columns):
        self.name = name
        self.rows = rows
        self.columns = columns

    def __str__(self):
        return f"Dataset: {self.name} | Shape: ({self.rows}, {self.columns})"

In [26]:
titanic = Dataset("Titanic", 891, 12)
print(titanic)

Dataset: Titanic | Shape: (891, 12)


**Much more readable and user-friendly!**

---

### Advanced Example with `__str__`:


In [27]:
class Experiment:
    def __init__(self, title, model, score):
        self.title = title
        self.model = model
        self.score = score

    def __str__(self):
        return f"[{self.title}] - Model: {self.model}, Accuracy: {self.score * 100:.2f}%"

In [28]:
exp = Experiment("Churn Prediction", "XGBoost", 0.943)
print(exp)

[Churn Prediction] - Model: XGBoost, Accuracy: 94.30%
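
When a class defines both methods, each one serves its own audience. The sketch below extends the `Experiment` class above with a `__repr__` (an addition made for illustration) to show which call picks which method.

```python
class Experiment:
    def __init__(self, title, model, score):
        self.title = title
        self.model = model
        self.score = score

    def __repr__(self):
        # Developer-facing: looks like the call that would recreate the object
        return (f"Experiment(title={self.title!r}, model={self.model!r}, "
                f"score={self.score!r})")

    def __str__(self):
        # User-facing: a friendly summary
        return f"[{self.title}] - Model: {self.model}, Accuracy: {self.score * 100:.2f}%"


exp = Experiment("Churn Prediction", "XGBoost", 0.943)
print(exp)          # [Churn Prediction] - Model: XGBoost, Accuracy: 94.30%
print(repr(exp))    # Experiment(title='Churn Prediction', model='XGBoost', score=0.943)
print(f"{exp}")     # same as str(exp)
print(f"{exp!r}")   # same as repr(exp)
```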


## 4. `__len__` – Custom Length Function

### Purpose:
- Makes your object compatible with the built-in `len()` function.
- Commonly used when your class **holds a collection** (e.g. list of data points, models, experiments).
- Returns an integer indicating the size or count of what's inside.
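
One side effect worth knowing: if a class defines `__len__` but not `__bool__`, Python treats an object of length zero as false. A small sketch, using a hypothetical `Batch` class:

```python
class Batch:
    """Hypothetical container for a batch of samples."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __len__(self):
        return len(self.samples)


empty = Batch([])
full = Batch([0.1, 0.2])

print(len(empty), bool(empty))   # 0 False
print(len(full), bool(full))     # 2 True

if not empty:
    print("Nothing to train on")  # runs because len(empty) == 0
```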

## Beginner Level: With vs Without `__len__`

### Without `__len__`:

In [29]:
class Team:
    def __init__(self, members):
        self.members = members

In [30]:
data_team = Team(["Alice", "Bob", "Charlie"])
print(len(data_team)) 

TypeError: object of type 'Team' has no len()

### With `__len__`:


In [31]:
class Team:
    def __init__(self, members):
        self.members = members

    def __len__(self):
        return len(self.members)

In [32]:
data_team = Team(["Alice", "Bob", "Charlie"])
print(len(data_team))

3


> **Now `len()` works on the object, just as it does on a list.**

---

##  Intermediate Example: Dataset

In [33]:
class Dataset:
    def __init__(self, name, records):
        self.name = name
        self.records = records

    def __len__(self):
        return len(self.records)

In [34]:
sensor_data = Dataset("Sensor Logs", records=[{"temp": 22}, {"temp": 23}, {"temp": 21}])
print(len(sensor_data))

3


**Why it's useful**:
- When a class wraps real-world data (e.g. rows loaded from pandas or a database), supporting `len()` lets you check the number of records quickly and use it to drive loops such as `for i in range(len(dataset))`.

---

## Advanced Example: Experiment Tracker

In [35]:
class ExperimentTracker:
    def __init__(self):
        self.experiments = []

    def add_experiment(self, exp):
        self.experiments.append(exp)

    def __len__(self):
        return len(self.experiments)

In [36]:
tracker = ExperimentTracker()
tracker.add_experiment("Logistic Regression")
tracker.add_experiment("Random Forest")
print(len(tracker))

2


> **Imagine you're running A/B tests or ML experiments: `len(tracker)` tells you at any point how many trials have been logged.**

---

## **5 and 6. `__getitem__` and `__iter__`**

### **Objectives**
By the end of this lecture, students will be able to:

- Understand what `__getitem__` and `__iter__` do in Python.
- Explain how these magic methods enable indexing and iteration in custom classes.
- Implement custom iterable datasets using these methods.
- Apply these concepts in data science workflows like dataset iteration and model training loops.

### `__getitem__(self, index)`

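Allows objects of a class to support **indexing** with square brackets, like a list or DataFrame. **Slicing** goes through the same method: Python passes a `slice` object as `index`, so slicing works whenever `__getitem__` handles it or forwards it to something that does.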

###  Why It Matters in Data Science:
Custom datasets used in training models (like in PyTorch or TensorFlow) typically implement `__getitem__`, usually together with `__len__`, so that feature-label pairs can be fetched by index.
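
The idea can be sketched in plain Python. `PairDataset` below is a hypothetical class written for this lesson; it imitates the pattern only and is not the PyTorch or TensorFlow API.

```python
class PairDataset:
    """Hypothetical dataset that returns (features, label) pairs by index."""

    def __init__(self, features, labels):
        if len(features) != len(labels):
            raise ValueError("features and labels must have the same length")
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        # One sample = one feature vector plus its label
        return self.features[index], self.labels[index]


pairs = PairDataset([[1.2, 0.7], [0.3, 0.4]], [1, 0])

# A loader only needs len() and indexing to visit every sample
for i in range(len(pairs)):
    features, label = pairs[i]
    print(i, features, label)
# 0 [1.2, 0.7] 1
# 1 [0.3, 0.4] 0
```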

### Beginner Example: Index Access in a Dataset

In [37]:
class Team:
    def __init__(self, members):
        self.members = members

    def __getitem__(self, index):
        return self.members[index]

In [38]:
data_team = Team(["Alice", "Bob", "Charlie"])
print(data_team[0])
print(data_team[2])

Alice
Charlie
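
Slicing uses the same method. In the `Team` class above, a slice is simply forwarded to the underlying list, so `data_team[0:2]` returns a plain list. The variant below is a sketch of one possible design (an assumption, not the only option) that returns a new `Team` for slices instead.

```python
class Team:
    def __init__(self, members):
        self.members = members

    def __getitem__(self, index):
        # For team[a:b], Python passes a slice object as `index`
        if isinstance(index, slice):
            return Team(self.members[index])
        return self.members[index]


team = Team(["Alice", "Bob", "Charlie", "Dana"])
print(team[-1])                                      # Dana
first_two = team[:2]
print(type(first_two).__name__, first_two.members)   # Team ['Alice', 'Bob']
```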


In [39]:
class MyDataset:
    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        return self.data[index]

In [40]:
dataset = MyDataset(["Row 1", "Row 2", "Row 3"])
print(dataset[1])

Row 2


### Intermediate Example: Return Dictionary of Features & Label


In [41]:
class MLData:
    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        features, label = self.data[index]
        return {"features": features, "label": label}

In [42]:
data = [([1.2, 0.7], 1), 
        ([0.3, 0.4], 0)]

- The whole `data` is a **list** of examples.
- Each example is a **tuple**:  
  → `([features], label)`
- Features are input values (like `[1.2, 0.7]`)  
- Label is the expected output (like `1` or `0`)

In [43]:
dataset = MLData(data)
print(dataset[0])

{'features': [1.2, 0.7], 'label': 1}


## `__iter__(self)`

### Purpose:
Allows objects of a class to be **iterated over** using `for ... in` loops. `__iter__` must return an iterator (for example `iter(some_list)` or a generator), which makes your object usable anywhere Python expects an iterable.

###  Why It Matters in Data Science:
Makes it easy to **loop over batches**, rows in a dataset, or a series of results in ML pipelines.
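
A minimal sketch of batch iteration, assuming a hypothetical `BatchedDataset` class and a fixed `batch_size` (both invented for this example). Writing `__iter__` as a generator means every `for` loop starts a fresh pass over the data.

```python
class BatchedDataset:
    """Hypothetical dataset that yields its rows in fixed-size batches."""

    def __init__(self, rows, batch_size):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.rows = rows
        self.batch_size = batch_size

    def __iter__(self):
        # Step through the rows in jumps of batch_size; the last batch may be shorter
        for start in range(0, len(self.rows), self.batch_size):
            yield self.rows[start:start + self.batch_size]


readings = BatchedDataset([22, 23, 21, 24, 20], batch_size=2)
for batch in readings:
    print(batch)
# [22, 23]
# [21, 24]
# [20]
```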

### Beginner Example: Iterate Through Team Members

In [44]:
class Team:
    def __init__(self, members):
        self.members = members

    def __iter__(self):
        return iter(self.members)

In [45]:
team = Team(["Alice", "Bob", "Charlie"])
for member in team:
    print(member)

Alice
Bob
Charlie


### Intermediate Example: Custom Dataset for Training Loop


In [46]:
class Dataset:
    def __init__(self, name, rows):
        self.name = name
        self.rows = rows  # list of dicts

    def __iter__(self):
        return iter(self.rows)

sensor_data = Dataset("Sensor Logs", [
    {"temp": 22}, {"temp": 23}, {"temp": 21}
])


In [47]:
for row in sensor_data:
    print(row)

{'temp': 22}
{'temp': 23}
{'temp': 21}


In [48]:
class MLDataset:
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        for item in self.data:
            yield item

In [49]:
dataset = MLDataset([("X1", "y1"), ("X2", "y2")])
for features, label in dataset:
    print(f"Input: {features}, Label: {label}")

Input: X1, Label: y1
Input: X2, Label: y2


### Advanced Example: Combining `__len__`, `__getitem__`, and `__iter__`


In [50]:
class ExperimentTracker:
    def __init__(self):
        self.experiments = []

    def add_experiment(self, model, accuracy):
        self.experiments.append({"model": model, "accuracy": accuracy})

    def __len__(self):
        return len(self.experiments)

    def __getitem__(self, index):
        return self.experiments[index]

    def __iter__(self):
        return iter(self.experiments)



In [51]:
tracker = ExperimentTracker()
tracker.add_experiment("XGBoost", 0.91)
tracker.add_experiment("Random Forest", 0.88)


In [52]:
print(len(tracker))
print(tracker[0])

2
{'model': 'XGBoost', 'accuracy': 0.91}


In [53]:
for exp in tracker:
    print(exp["model"], "→", exp["accuracy"])

XGBoost → 0.91
Random Forest → 0.88


### Advanced Example: Combining `__len__`, `__getitem__`, and `__iter__` in a Simulated Model Training Loop

In [54]:
class TrainingDataset:
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x, y = self.data[idx]
        return {"X": x, "y": y}

    def __iter__(self):
        # Note: iteration yields the raw (x, y) tuples, while indexing returns dicts
        return iter(self.data)

In [55]:
# Data
samples = [
    ([0.1, 0.2], 0),
    ([0.5, 0.6], 1),
    ([0.9, 1.0], 1),
]

train_data = TrainingDataset(samples)

In [56]:
# Testing __iter__: each item is an (X, y) tuple, unpacked in the for loop
for X, y in train_data:
    print(f"Training on input: {X}, label: {y}")

Training on input: [0.1, 0.2], label: 0
Training on input: [0.5, 0.6], label: 1
Training on input: [0.9, 1.0], label: 1


In [57]:
#  Testing __len__  
print("Length of dataset:", len(train_data))

Length of dataset: 3


In [58]:
# Testing __getitem__  
print("Item at index 0:", train_data[0])
print("Item at index 2:", train_data[2])

Item at index 0: {'X': [0.1, 0.2], 'y': 0}
Item at index 2: {'X': [0.9, 1.0], 'y': 1}
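
If you would rather not write `__iter__` by hand, the standard library offers a shortcut: a class that inherits from `collections.abc.Sequence` and defines `__len__` and `__getitem__` gets iteration, `in`, and `reversed()` for free. The sketch below uses a hypothetical `SampleSequence` class with the same sample data.

```python
from collections.abc import Sequence


class SampleSequence(Sequence):
    """Hypothetical read-only dataset built on the Sequence base class."""

    def __init__(self, data):
        self._data = list(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        return self._data[idx]


seq = SampleSequence([([0.1, 0.2], 0), ([0.5, 0.6], 1), ([0.9, 1.0], 1)])

print(len(seq))                       # 3
print(seq[0])                         # ([0.1, 0.2], 0)
print([label for _, label in seq])    # [0, 1, 1]   (iteration provided by Sequence)
print(([0.5, 0.6], 1) in seq)         # True        (membership provided by Sequence)
print(list(reversed(seq))[0])         # ([0.9, 1.0], 1)
```

Here indexing and iteration return the same kind of item, which avoids surprises when both are used on the same object.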


## Python Magic (Dunder) Methods – Summary for Data Scientists

Magic methods (also called **special** or **dunder** methods) are hooks that Python calls automatically, letting your custom classes **behave like Python's built-in types**. This makes your code more intuitive and readable, especially in Data Science pipelines where datasets, models, and results are passed between many functions.

### Core Magic Methods for Data Science

| Method         | What It Does                          | Data Science Example                           |
|----------------|----------------------------------------|------------------------------------------------|
| `__init__`     | Initializes object attributes          | Set up a dataset or model config               |
| `__str__`      | Human-readable string (`print(obj)`)   | Print info like model name or dataset shape    |
| `__repr__`     | Debugging string (`repr(obj)`)         | Console logging for debugging objects          |
| `__len__`      | Returns object size (`len(obj)`)       | Number of rows/samples in dataset              |
| `__getitem__`  | Index access (`obj[i]`)                | Retrieve a specific sample by index            |
| `__iter__`     | Looping support (`for item in obj`)    | Iterate over all data samples in training loop |


### Real-World Analogy (Datasets as Excel Sheets)

| You Want To...                | Magic Method   | Analogy                          |
|------------------------------|----------------|----------------------------------|
| Access a row by index        | `__getitem__`  | Like Excel’s row number          |
| Loop through rows            | `__iter__`     | Like scrolling through rows      |
| Count total rows             | `__len__`      | Like knowing total rows in sheet |
| Print readable summary       | `__str__`      | Like printing sheet description  |
| Debug internal structure     | `__repr__`     | Like looking at raw file data    |