# Day 1 - Advanced Python: Regex, Decorators, Mixins, Iterators and Generators

Welcome to the first day of the **Advanced Python** course. Today we will:

- Refresh working with **JupyterLab notebooks** in an “advanced user” way
- Dive deep into **regular expressions** and the `re` module  
  - Common regex functions  
  - Raw strings (`r"..."`)  
  - Complex substitutions with `re.sub` and lambda / helper functions
- Use **function, method, and class decorators** in real-world patterns  
  - `@staticmethod`, `@classmethod`  
  - Custom decorators and decorator factories (e.g. `@retry(n)`)  
  - Decorator chaining  
  - Class decorators for registration and attribute injection
- Understand **multiple inheritance**, **MRO**, the **diamond problem**, and **cooperative `super()`**
- Design and use **mixins** and **mixin factories** for reusable behavior
- Build custom **iterators** and **generators**, and compare them to lists
- Use **functional-style tools** like `map`, `filter`, `reduce` and `itertools` (`chain`, `count`, `cycle`, `repeat`, `groupby`)
- Combine all of the above into a **log processing pipeline** using regex, decorators, mixins, iterators/generators and `itertools`

## Daily agenda and course flow

**09:00 - 10:30 (1h 30m)**  
- JupyterLab refresher (fast)  
- Regular expressions (`re` module)  
  - Common regex functions  
  - Raw strings (`r"..."`)  
  - Substitution with lambda functions  

**10:30 - 10:45 (15m)**  
- Short break (stretch, water, quick walk)

**10:45 - 12:00 (1h 15m)**  
- Decorators  
  - `@staticmethod`, `@classmethod`  
  - Custom decorators and decorator chaining  
  - Decorator factories (e.g. `@retry(n)`)  
  - Class decorators  
  - Combining decorators and multiple inheritance (overview)

**12:00 - 13:00 (1h)**  
- Lunch break

**13:00 - 14:45 (1h 45m)**  
- Multiple inheritance and mixins  
  - Method Resolution Order (MRO)  
  - Diamond problem and solutions using `super()`  
  - Mixins and mixin factories  
  - Real-world mixin usage patterns

**14:45 - 15:00 (15m)**  
- Short break

**15:00 - 16:30 (1h 30m)**  
- Iterators and generators  
  - Iterator protocol, built-in iterators  
  - Custom iterators (e.g. `FractionalRange`, `Fibonacci`)  
  - Generator functions and expressions  
  - Performance characteristics (lists vs generators, `range`, membership tests)  
  - `map`, `filter`, `reduce`  
  - `itertools` (`chain`, `count`, `cycle`, `repeat`, `groupby`)  
- Complex end-of-day example: log processing pipeline combining regex, decorators, mixins, generators and `itertools`


## Getting started with JupyterLab and notebooks (advanced refresher)

We will keep using **JupyterLab** just like in the beginner course.

A notebook is still a sequence of **cells**:

- **Code cells**: contain Python code that you can run.
- **Markdown cells**: contain formatted text, titles, bullet points, links, etc.

### How to run a code cell

1. Click in the code cell.
2. Press `Shift + Enter` to run the cell and move to the next one.  
   Or `Ctrl + Enter` to run and stay in the same cell.
3. The execution counter on the left (`In [1]:`) shows the order in which cells were run.

Variables stay in memory until you **Restart kernel** from the Jupyter menu.

### How to add cells and switch types

- Use the `+` button in the toolbar to add a new cell.
- Use the dropdown (Code / Markdown) to change the type.
- In command mode (blue border):
  - `A` adds a cell **above**
  - `B` adds a cell **below**
  - `M` changes to Markdown
  - `Y` changes to Code

### Why notebooks are useful even for advanced work

- Exploratory data analysis (pandas, matplotlib, PyTorch).
- Interactive experimentation with APIs and libraries.
- Teaching / documentation mixing runnable examples with text.
- Prototyping before moving code into a package or script.

In real projects, it is very common to:
1. Solve a problem in a notebook.
2. Extract the stable parts into a `.py` module.
3. Keep the notebook as a "living documentation" and playground.

If you want to review the original Jupyter introduction, the general docs are here:  
- JupyterLab: https://jupyterlab.readthedocs.io/  
- Notebooks: https://docs.jupyter.org/en/stable/


In [None]:
# Quick sanity check: run this cell
import sys

print("Running Python version:")
print(sys.version)

### ✏ Exercise (easy): Describe your own notebook workflow

Write a few comment lines describing how **you** usually use notebooks in real life, for example:

- For trying out data transformations
- For quick API experiments
- For plotting metrics in a project

There is no enforced "correct" answer here - this is about being explicit about your own workflow.

In [None]:
# TODO: Replace the example comments with your own usage patterns.

# Example:
# - I use notebooks to prototype data cleaning steps.
# - When a cell becomes stable, I move the code into a .py module and import it back.
# - I keep a separate notebook for plots that I regularly regenerate.

In [None]:
# There is no single "correct" solution to this - your real workflow is the best solution.
# When reviewing during the course, we will discuss different patterns and best practices.

## 1. Regular expressions with the re module

Regular expressions (regex) are mini-languages for matching and manipulating text patterns.  
Python's built-in module for regex is `re`.

Useful links:

- Official docs: https://docs.python.org/3/library/re.html  
- Interactive playground: https://regexr.com/ (very useful for experimenting with patterns)

### Common regex functions in the re module

The most commonly used functions are:

- `re.search(pattern, text)` - find the **first** match anywhere in the string.
- `re.match(pattern, text)` - match only at the **beginning** of the string.
- `re.fullmatch(pattern, text)` - match the entire string from start to end.
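- `re.findall(pattern, text)` - return a **list** of all non-overlapping matches (if the pattern contains capturing groups, the group contents are returned instead of the whole match).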
- `re.finditer(pattern, text)` - return an **iterator** of match objects.
- `re.sub(pattern, repl, text)` - replace all matches with `repl`.
- `re.split(pattern, text)` - split a string using a regex delimiter.
- `re.compile(pattern)` - compile a pattern once for repeated use (cleaner code; the speed gain is usually small because `re` already caches recently used patterns).
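Running these functions side by side on one small string is a quick way to pin down how they differ. The sample text and patterns below are invented for illustration:

```python
import re

text = "id=42; id=7; name=bob"

# search: first match anywhere; group(1) is the captured digits
print(re.search(r"id=(\d+)", text).group(1))             # 42
# match: anchored at position 0, and the text starts with "id", not "name"
print(re.match(r"name", text))                           # None
# fullmatch: the whole string must match the pattern
print(re.fullmatch(r"id=\d+", "id=42") is not None)      # True
# findall with one capturing group returns only the group contents
print(re.findall(r"id=(\d+)", text))                     # ['42', '7']
# finditer yields Match objects, which also carry positions
print([m.span() for m in re.finditer(r"id=\d+", text)])  # [(0, 5), (7, 11)]
# split on a semicolon plus any following whitespace
print(re.split(r";\s*", text))                           # ['id=42', 'id=7', 'name=bob']
# sub replaces every match
print(re.sub(r"\d+", "N", text))                         # id=N; id=N; name=bob

# compile once, then reuse the pattern object's methods
id_pattern = re.compile(r"id=(\d+)")
print(id_pattern.findall(text))                          # ['42', '7']
```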

Basic syntax building blocks:

- Character classes:  
  - `\d` digit, `\w` word character, `\s` whitespace  
  - `.` any character except newline
- Quantifiers:  
  - `*` 0 or more, `+` 1 or more, `?` optional  
  - `{n}` exactly n, `{n,}` at least n, `{n,m}` between n and m
- Anchors:  
  - `^` start of string, `$` end of string (with the `re.MULTILINE` flag they also match at the start/end of each line)
- Groups:  
  - `( ... )` capturing group, `(?: ... )` non-capturing group
- Alternation:  
  - `a|b` means "a or b"
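These building blocks combine quickly. The sketch below (sample strings made up for the demo) shows capturing groups with alternation under `re.fullmatch`, the capturing vs non-capturing difference in `re.findall`, and a `{n,m}` quantifier that bites into a longer number when no `\b` boundary is used:

```python
import re

# Capturing groups + alternation; fullmatch anchors at both ends
line = "2025-11-18 ERROR disk full"
m = re.fullmatch(r"(\d{4}-\d{2}-\d{2}) (INFO|WARNING|ERROR) (.+)", line)
print(m.groups())  # ('2025-11-18', 'ERROR', 'disk full')

# Capturing vs non-capturing group: findall returns group contents if groups exist
words = "cat cats dog dogs"
print(re.findall(r"(cat|dog)s", words))    # ['cat', 'dog']
print(re.findall(r"(?:cat|dog)s", words))  # ['cats', 'dogs']

# {n,m} is greedy, and without \b it happily matches inside longer numbers
numbers = "7 42 123 4567"
print(re.findall(r"\d{2,3}", numbers))      # ['42', '123', '456']
print(re.findall(r"\b\d{2,3}\b", numbers))  # ['42', '123']
```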

When to think "regex":

- When you need to **validate** text (emails, phone numbers, IDs).
- When you need to **extract** structured data from logs or messy text.
- When simple `str.split` / `in` checks are not expressive enough.

Trivia: regex engines are tiny interpreters embedded in your program, with their own instruction set and execution model. Python's engine backtracks, so patterns with nested quantifiers such as `(a+)+$` can take exponential time on some non-matching inputs ("catastrophic backtracking"). Readability and simplicity matter.

In [1]:
import re

text = """Contact us:
- Email: alice@example.com
- Backup: support+eu@my-company.co.uk
- Phone: +36-30-123-4567
"""

# Simple email regex (still not perfect, but ok for demo)
email_pattern = re.compile(r"[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}")

emails = email_pattern.findall(text)
print("Found emails:", emails)

# Extract all phone-like groups: "+digits-digits-digits-digits"
phone_pattern = re.compile(r"\+\d{2}-\d{2}-\d{3}-\d{4}")
phones = phone_pattern.findall(text)
print("Found phones:", phones)

Found emails: ['alice@example.com', 'support+eu@my-company.co.uk']
Found phones: ['+36-30-123-4567']


### ✏ Exercise (easy): Extract all 3-digit numbers

You get a string that contains product codes, prices and random text.  
Your task:

1. Use `re.findall` and a regex to extract **all 3-digit numbers** (exactly 3 digits).
2. Store them in a list called `codes`.
3. Print the resulting list.

Hints:

- You can use `\d` for "digit".
- You can use `{3}` to mean "exactly 3 repetitions".
- Use `\b` (word boundary) on both sides so that `3210` does not produce a partial match `321`.
- Use the `re` module as shown in the example above.

In [2]:
import re

text = "Order 123: qty 5, code 999; ref 42, extension 007, backup 3210"

# TODO:
# - Write a pattern that matches exactly 3 digits.
# - Use re.findall to extract all occurrences into a list called codes.
# - Print the list.

# pattern = ...
# codes = ...
# print(codes)

In [4]:
# Solution

import re

text = "Order 123: qty 5, code 999; ref 42, extension 007, backup 3210"

pattern = r"\b\d{3}\b"
codes = re.findall(pattern, text)
print(codes)

['123', '999', '007']


### ⚡ Exercise (advanced): Validate a simple date format

You want to validate dates in the format `YYYY-MM-DD` where:

- Year: 4 digits.
- Month: 2 digits (ideally 01-12, but for this exercise `\d{2}`, which accepts 00-99, is fine; do not overcomplicate).
- Day: 2 digits (ideally 01-31; again, approximate validation is fine).

Tasks:

1. Define a regex pattern that matches strings like `2025-11-18`.
2. Use `re.fullmatch` to check if a given string is a valid date in this simple sense.
3. Test your function on a list of example strings and print which ones match.

Note: we are not doing full calendar validation here, just structural checks.

In [5]:
import re

def is_simple_date(s: str) -> bool:
    # TODO:
    # - Create a pattern for YYYY-MM-DD (4 digits, dash, 2 digits, dash, 2 digits).
    # - Use re.fullmatch to test s.
    # - Return True / False accordingly.
    #
    # Hint: you can start with something like r"\d{4}-\d{2}-\d{2}".
    pass

tests = ["2025-11-18", "1999-01-01", "20-01-01", "2025-1-01", "2025-13-01", "abcd-ef-gh"]

for t in tests:
    print(t, "->", is_simple_date(t))

2025-11-18 -> None
1999-01-01 -> None
20-01-01 -> None
2025-1-01 -> None
2025-13-01 -> None
abcd-ef-gh -> None


In [6]:
# Solution

import re

def is_simple_date(s: str) -> bool:
    pattern = r"\d{4}-\d{2}-\d{2}"
    return re.fullmatch(pattern, s) is not None

tests = ["2025-11-18", "1999-01-01", "20-01-01", "2025-1-01", "2025-13-01", "abcd-ef-gh"]

for t in tests:
    print(t, "->", is_simple_date(t))

2025-11-18 -> True
1999-01-01 -> True
20-01-01 -> False
2025-1-01 -> False
2025-13-01 -> True
abcd-ef-gh -> False
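If you want the stricter month and day ranges, one possible sketch narrows each field with alternation, and then hands the calendar rules (month lengths, leap years) to the standard library's `datetime.date.fromisoformat`. The helper names are our own:

```python
import datetime
import re

# Stricter structural check: month 01-12, day 01-31 (no per-month or leap-year logic)
STRICT_DATE = re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")


def is_strict_date(s: str) -> bool:
    return STRICT_DATE.fullmatch(s) is not None


def is_real_date(s: str) -> bool:
    # Regex for the shape, then let the standard library check the calendar
    if not is_strict_date(s):
        return False
    try:
        datetime.date.fromisoformat(s)
    except ValueError:
        return False
    return True


for t in ["2025-11-18", "2025-13-01", "2025-02-31", "2024-02-29"]:
    print(t, "->", is_strict_date(t), is_real_date(t))
```

`2025-02-31` passes the regex but fails the calendar check, which is exactly the gap regex alone cannot close.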


## 2. Raw strings and complex regex substitution

### What is an r-string (raw string)?

In Python, backslashes in normal strings are used for **escape sequences**:

- `"\n"` is a newline.
- `"\t"` is a tab.

This is annoying for regex, because regex patterns use a lot of backslashes too. For example:

- `\d` means "digit" in regex.
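- In a normal string you would need `"\\d"` so that Python turns `\\` into a single `\` before the regex engine sees it. (A plain `"\d"` still reaches the engine unchanged today, but Python 3.12+ emits a `SyntaxWarning` for such unrecognized escapes.)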

**Raw strings** solve this:

- `r"\d"` is interpreted by Python as **backslash + d**, no extra escaping.
- You should almost always write regex patterns as raw strings.

Examples:

- Normal: `"\\d{3}"`  
- Raw: `r"\d{3}"`  (this is what we want)

This does not affect what regex does. It only changes how Python parses the string literal.

Official docs: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
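A short check of what the two spellings produce, plus the classic `\b` trap, which is one of the strongest reasons to write every pattern as a raw string (sample text invented):

```python
import re

# Both literals produce the same 5-character string: \ d { 3 }
print("\\d{3}" == r"\d{3}")  # True
print(len(r"\d{3}"))         # 5

# Pitfall: in a normal string "\b" is the backspace character (\x08),
# not the regex word boundary, so this pattern silently matches nothing.
text = "code 123 and 4567"
print(re.findall("\b\\d{3}\b", text))  # []
print(re.findall(r"\b\d{3}\b", text))  # ['123']
```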

### Complex substitution with lambda in re.sub

`re.sub(pattern, repl, text)` can take a **function** instead of a simple replacement string:

- The function receives a `match` object.
- It returns the replacement string for that particular match.

This is very powerful for:

- Reformatting dates and IDs.
- Normalizing phone numbers.
- Anonymizing sensitive parts of logs.

Example: convert `YYYY-MM-DD` dates to `DD/MM/YYYY` with a named function (a lambda works the same way when the logic fits on one line).

In [7]:
import re

text = "Deadlines: 2025-11-18, 2024-01-05 and 2023-12-31."

date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def reverse_date(match: re.Match) -> str:
    year, month, day = match.groups()
    return f"{day}/{month}/{year}"

converted = date_pattern.sub(reverse_date, text)
print("Original:", text)
print("Converted:", converted)

Original: Deadlines: 2025-11-18, 2024-01-05 and 2023-12-31.
Converted: Deadlines: 18/11/2025, 05/01/2024 and 31/12/2023.
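The same conversion written with a lambda, compared with a plain replacement string that uses group backreferences (`\1`, `\2`, `\3`). For pure reordering the backreference string is enough; a function or lambda becomes necessary when the replacement needs computation, as in the last line (sample text made up):

```python
import re

text = "Deadlines: 2025-11-18, 2024-01-05 and 2023-12-31."
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Lambda version: m[1], m[2], m[3] are the captured year, month, day
via_lambda = date_pattern.sub(lambda m: f"{m[3]}/{m[2]}/{m[1]}", text)

# Replacement string with backreferences (raw string, so \3 is not an escape)
via_backrefs = date_pattern.sub(r"\3/\2/\1", text)

print(via_lambda)                  # Deadlines: 18/11/2025, 05/01/2024 and 31/12/2023.
print(via_lambda == via_backrefs)  # True

# Computation per match: double every number
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
print(doubled)                     # 6 apples, 20 pears
```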


### ✏ Exercise (easy): Use raw strings in patterns

You have a pattern that is currently written as a normal string with a lot of backslashes.  
Your tasks:

1. Rewrite the pattern using a **raw string**.
2. Confirm that both patterns produce the same result when used with `re.findall`.

We want to match Windows-style file paths like `C:\temp\data.txt` in a piece of text.

In [8]:
import re

text = "Paths: C:\\temp\\data.txt and D:\\logs\\2025\\errors.log"

# Old style: pattern with double escaping
old_pattern = "([A-Z]:\\\\[\\w\\.\\\\]+)"

# TODO:
# - Rewrite the pattern as a raw string in new_pattern (with fewer backslashes).
# - Use re.findall with both patterns.
# - Print the results and compare.

# new_pattern = r"..."

# old_matches = ...
# new_matches = ...

# print("Old:", old_matches)
# print("New:", new_matches)

In [9]:
# Solution

import re

text = "Paths: C:\\temp\\data.txt and D:\\logs\\2025\\errors.log"

old_pattern = "([A-Z]:\\\\[\\w\\.\\\\]+)"
new_pattern = r"([A-Z]:\\[\w.\\]+)"

old_matches = re.findall(old_pattern, text)
new_matches = re.findall(new_pattern, text)

print("Old:", old_matches)
print("New:", new_matches)

Old: ['C:\\temp\\data.txt', 'D:\\logs\\2025\\errors.log']
New: ['C:\\temp\\data.txt', 'D:\\logs\\2025\\errors.log']


### ⚡ Exercise (advanced): Mask email user names with lambda

You have texts that contain email addresses, and you want to **anonymize** them:

- `"alice@example.com"` becomes `"***@example.com"`.
- `"someone.sensitive@company.hu"` becomes `"***@company.hu"`.

Tasks:

1. Write a regex that finds email addresses (similar complexity as before).
2. Use `re.sub` with a lambda or named function that:
   - Keeps only the domain part.
   - Replaces the user part before `@` with `"***"`.
3. Print the anonymized text.

The exercise is about using `re.sub` with a processing function, not about perfect email regex.

In [10]:
import re

text = "Users: alice@example.com, bob.smith@company.hu and test+debug@my-app.org"

# TODO:
# - Build an email pattern with one group for user and one for domain.
# - Use re.sub with a function (or lambda) to transform "user@domain" into "***@domain".

# pattern = re.compile(...)
# def mask_email(match: re.Match) -> str:
#     user, domain = ...
#     return ...

# anonymized = ...
# print(anonymized)

In [11]:
# Solution

import re

text = "Users: alice@example.com, bob.smith@company.hu and test+debug@my-app.org"

pattern = re.compile(r"([\w.+-]+)@([\w.-]+)")
def mask_email(match: re.Match) -> str:
    user, domain = match.groups()
    return f"***@{domain}"

anonymized = pattern.sub(mask_email, text)
print(anonymized)

Users: ***@example.com, ***@company.hu and ***@my-app.org


---
# Short break (10:30 - 10:45)

Stand up, walk a bit, drink water. When you come back, we will move into decorators.

---

## 3. Decorators: functions, methods, and classes

Decorators are a powerful way to **wrap behavior** around functions and classes.

- A decorator is just a callable that takes a function (or class) and returns a new function (or class).
- Syntactic sugar: `@decorator` above a function is equivalent to `func = decorator(func)`.
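The equivalence in the last bullet can be checked directly. The `shout` decorator here is a made-up example:

```python
def shout(func):
    # A decorator: takes a function and returns a new function wrapping it
    def wrapper(name):
        return func(name).upper()
    return wrapper


def greet(name):
    return f"hello, {name}"


manual = shout(greet)  # explicit call


@shout  # the same thing with syntactic sugar
def greet_sugar(name):
    return f"hello, {name}"


print(manual("alice"))       # HELLO, ALICE
print(greet_sugar("alice"))  # HELLO, ALICE
```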

#### Practical Examples and Use Cases

- **Logging**: Automatically log function calls and return values.
- **Access Control**: Restrict access to certain methods or functions.
- **Memoization**: Cache the results of expensive function calls.

A common ready-made example is the `lru_cache` [decorator](https://docs.python.org/3/library/functools.html#functools.lru_cache) from the `functools` module, which implements memoization:

In [16]:
import functools

# Define a function to compute a value (e.g., Fibonacci numbers)
@functools.lru_cache(maxsize=10)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

# Test the LRU cache
print(fib(35)) # Should print 9227465

# The LRU cache should have stored the most recent calls up to the max size specified.
print(fib.cache_info())  # Prints cache info such as hits, misses, maxsize, and current size


9227465
CacheInfo(hits=33, misses=36, maxsize=10, currsize=10)


### Built-in method decorators: @staticmethod and @classmethod

Inside classes, we commonly use:

- `@staticmethod`  
  - No `self` or `cls` parameter.  
  - Logically belongs to the class, but does not depend on instance or class state.
- `@classmethod`  
  - First argument is `cls` instead of `self`.  
  - Used to create **alternative constructors** or methods that work at the class level.

Docs:  
- https://docs.python.org/3/library/functions.html#staticmethod  
- https://docs.python.org/3/library/functions.html#classmethod

When to think of these:

- `@staticmethod`: pure helper function, but you want it grouped within a class for structure.
- `@classmethod`: when you return instances in different ways (e.g. from config, from file, from regex match).

Trivia: under the hood, `@staticmethod` and `@classmethod` are descriptor objects that modify how attributes are looked up and bound.
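A small sketch of why alternative constructors use `cls` instead of a hardcoded class name: when called on a subclass, `cls` is the subclass, so the constructor returns the right type. The `Shape`/`Square` classes are hypothetical:

```python
class Shape:
    def __init__(self, size: float):
        self.size = size

    @classmethod
    def unit(cls) -> "Shape":
        # cls is whatever class the method was called on
        return cls(1.0)

    @staticmethod
    def is_valid_size(size: float) -> bool:
        # No access to cls or self: just a helper grouped under the class
        return size > 0


class Square(Shape):
    pass


print(type(Shape.unit()).__name__)   # Shape
print(type(Square.unit()).__name__)  # Square
print(Square.is_valid_size(-1))      # False
```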

In [16]:
import datetime  # datetime.UTC requires Python 3.11+; "X | None" hints require 3.10+

class Invoice:
    def __init__(self, customer: str, amount: float, created_at: datetime.datetime | None = None):
        self.customer = customer
        self.amount = amount
        self.created_at = created_at or datetime.datetime.now(datetime.UTC)

    @staticmethod
    def format_currency(amount: float) -> str:
        return f"{amount:,.2f} HUF"

    @classmethod
    def from_text(cls, line: str) -> "Invoice":
        # Very simplified: "customer,amount"
        customer, raw_amount = line.split(",")
        amount = float(raw_amount)
        return cls(customer.strip(), amount)

line = "ACME Kft, 123456.78"
inv = Invoice.from_text(line)
print(inv.customer, Invoice.format_currency(inv.amount))

ACME Kft 123,456.78 HUF


### ✏ Exercise (easy): Alternate constructor with @classmethod

Imagine a `User` class where users are stored in `"name:age"` lines.

Tasks:

1. Define a `User` class with attributes `name` and `age`.
2. Implement a `@classmethod from_line(cls, line: str)` that:
   - Takes a line like `"Alice:30"`.
   - Creates and returns a `User` instance.
3. Create a few `User` objects from different lines and print their data.

You can follow the `Invoice.from_text` idea from the example.

In [18]:
class User:
    # TODO:
    # - Add __init__(self, name, age).
    # - Add @classmethod from_line that parses "name:age".
    # - Create a few users from lines and print them.
    pass

# lines = ["Alice:30", "Bob:25", "Charlie:40"]
# users = [User.from_line(line) for line in lines]
# for u in users:
#     print(u.name, u.age)

In [19]:
# Solution

class User:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age

    @classmethod
    def from_line(cls, line: str) -> "User":
        name, raw_age = line.split(":")
        return cls(name.strip(), int(raw_age))

lines = ["Alice:30", "Bob:25", "Charlie:40"]
users = [User.from_line(line) for line in lines]
for u in users:
    print(u.name, u.age)

Alice 30
Bob 25
Charlie 40


### 3.1 Custom decorators and chaining

Custom function decorators are often used for:

- Logging and debugging.
- Access control and authorization.
- Caching (e.g. `functools.lru_cache`).
- Retrying operations, timing, input validation.

Basic pattern:

```python
import functools

def my_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Do something before
        result = func(*args, **kwargs)
        # Do something after
        return result
    return wrapper
```
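Why `functools.wraps` is in the basic pattern: without it, the wrapper replaces the decorated function's name and docstring. A comparison with two throwaway decorators:

```python
import functools


def without_wraps(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def with_wraps(func):
    @functools.wraps(func)  # copies __name__, __doc__, ... and sets __wrapped__
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@without_wraps
def add(a, b):
    """Add two numbers."""
    return a + b


@with_wraps
def mul(a, b):
    """Multiply two numbers."""
    return a * b


print(add.__name__, add.__doc__)  # wrapper None
print(mul.__name__, mul.__doc__)  # mul Multiply two numbers.
print(mul.__wrapped__(3, 4))      # 12 (the original function is still reachable)
```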

Decorator chaining:

```python
@decorator_a
@decorator_b
def my_func(...):
    ...
```

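This is equivalent to `my_func = decorator_a(decorator_b(my_func))`: the decorator closest to the function (`decorator_b`) is applied first, but at call time the outermost wrapper (`decorator_a`) runs first. In the example below, `timing` therefore also measures the time spent in the `debug_args` print.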

In [23]:
import time
import functools

def timing(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        duration = (time.perf_counter() - start) * 1000
        print(f"{func.__name__} took {duration:.2f} ms")
        return result
    return wrapper

def debug_args(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__} with args={args}, kwargs={kwargs}")
        return func(*args, **kwargs)
    return wrapper

@timing
@debug_args
def slow_add(a, b):
    time.sleep(0.05)
    return a + b

print("Result:", slow_add(2, 3))

Calling slow_add with args=(2, 3), kwargs={}
slow_add took 50.58 ms
Result: 5


### ⚡ Exercise (advanced): Simple retry decorator

You want to call functions that might fail occasionally (for example, network calls).
Create a decorator `@retry(n)` that:

1. Takes a parameter `n` (number of attempts).
2. Retries the decorated function up to `n` times if it raises an exception.
3. If all attempts fail, re-raises the last exception.

Tasks:

* Implement `retry` as a decorator **factory**: `def retry(n): ...`.
* Use it on a function that randomly fails (for example using `random.random()`).
* Print a message for each failed attempt, so you can see how many attempts were needed before it succeeded (or gave up).

Hint: the decorator factory returns the actual decorator.

In [None]:
import functools
import random

def retry(n: int):
    # TODO:
    # - Implement a decorator factory that retries n times on exception.
    # - Use functools.wraps on the inner wrapper.
    pass

@retry(3)
def sometimes_fails():
    if random.random() < 0.7:
        raise RuntimeError("Random failure")
    print("Success!")

for _ in range(5):
    try:
        sometimes_fails()
    except RuntimeError as e:
        print("Final failure:", e)
    print("-" * 20)

In [26]:
# Solution

import functools
import random

def retry(n: int):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(1, n + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
                    print(f"{func.__name__} failed on attempt {attempt}: {exc}")
            raise last_exc
        return wrapper
    return decorator

@retry(3)
def sometimes_fails():
    if random.random() < 0.7:
        raise RuntimeError("Random failure")
    print("Success!")

for _ in range(5):
    try:
        sometimes_fails()
    except RuntimeError as e:
        print("Final failure:", e)
    print("-" * 20)


sometimes_fails failed on attempt 1: Random failure
Success!
--------------------
Success!
--------------------
Success!
--------------------
Success!
--------------------
sometimes_fails failed on attempt 1: Random failure
sometimes_fails failed on attempt 2: Random failure
sometimes_fails failed on attempt 3: Random failure
Final failure: Random failure
--------------------
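One possible extension of the solution, sketched under our own assumptions: the hypothetical `retry_on` below retries only the exception types you list, waits with exponential backoff between attempts, and re-raises with a bare `raise` so the original traceback is kept. The flaky function fails deterministically, so the result is reproducible:

```python
import functools
import time


def retry_on(n: int, exceptions: tuple[type[BaseException], ...] = (Exception,), delay: float = 0.0):
    if n < 1:
        raise ValueError("n must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, n + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == n:
                        raise  # last attempt: re-raise with the original traceback
                    time.sleep(delay * 2 ** (attempt - 1))  # exponential backoff
        return wrapper
    return decorator


calls = 0


@retry_on(3, exceptions=(ConnectionError,))
def flaky():
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("temporary")
    return "ok"


print(flaky(), "after", calls, "attempts")  # ok after 3 attempts
```

Exceptions outside the `exceptions` tuple (for example a `ValueError` from a bug) propagate immediately instead of being retried.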


### 3.2 Class decorators

Decorators can also work on **classes**:

* A class decorator is a function that takes a class and returns a class.
* Typical uses:

  * Automatically registering classes in a central registry.
  * Modifying or injecting attributes or methods.
  * Validation of class definitions.

Example: automatic plugin registration based on a `@register` decorator.

Trivia: class decorators are often used as lightweight alternatives to metaclasses when you just want to tweak or wrap class behavior.

In [27]:
REGISTRY: dict[str, type] = {}

def register(name: str):
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator


@register("memory")
class InMemoryStorage:
    def save(self, key: str, value: str) -> None:
        print(f"[memory] saving {key}={value}")


@register("file")
class FileStorage:
    def save(self, key: str, value: str) -> None:
        print(f"[file] saving {key}={value}")


print("Registry:", REGISTRY)

storage_type = "memory"
StorageCls = REGISTRY[storage_type]
storage = StorageCls()
storage.save("user:1", "Alice")

Registry: {'memory': <class '__main__.InMemoryStorage'>, 'file': <class '__main__.FileStorage'>}
[memory] saving user:1=Alice
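The "validation of class definitions" use case can be sketched the same way. The hypothetical `requires_methods` factory below checks the class once, at definition time, and fails fast if required methods are missing:

```python
def requires_methods(*names: str):
    # Class decorator factory: validate the class, then return it unchanged
    def decorator(cls):
        missing = [n for n in names if not callable(getattr(cls, n, None))]
        if missing:
            raise TypeError(f"{cls.__name__} is missing required methods: {missing}")
        return cls
    return decorator


@requires_methods("save", "load")
class GoodStorage:
    def save(self, key, value): ...
    def load(self, key): ...


try:
    @requires_methods("save", "load")
    class BrokenStorage:
        def save(self, key, value): ...
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # BrokenStorage is missing required methods: ['load']
```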


### ✏ Exercise (easy): Add a debug attribute via class decorator

Create a class decorator `@mark_debug` that:

* Adds an attribute `debug = True` to the decorated class.

Tasks:

1. Implement `mark_debug`.
2. Decorate a simple class `Service`.
3. Instantiate `Service` and print the value of `service.debug`.

This is a small example of how class decorators can inject behavior.

In [None]:
def mark_debug(cls):
    # TODO:
    # - Add attribute debug = True to cls.
    # - Return cls.
    pass

@mark_debug
class Service:
    def run(self):
        print("Running service")


s = Service()
print("debug attribute:", s.debug)

In [28]:
# Solution

def mark_debug(cls):
    cls.debug = True
    return cls


@mark_debug
class Service:
    def run(self):
        print("Running service")


s = Service()
print("debug attribute:", s.debug)


debug attribute: True


## 4. Multiple inheritance, MRO, diamond problem, and mixins

Python supports **multiple inheritance**:

```python
class C(A, B):
    ...
```

When you access a method, Python follows the **Method Resolution Order (MRO)**.
You can inspect it via:

* `C.mro()` or
* `C.__mro__`



### Diamond problem and cooperative `super()`

The classic "diamond" layout:

```python
class A: ...
class B(A): ...
class C(A): ...
class D(B, C): ...
```

If both `B` and `C` override a method inherited from `A`, which version does `D` get, and does `A`'s version run once or twice?
Python uses the **C3 linearization** algorithm to compute a consistent MRO.

**C3 idea:**
C3 builds the MRO by *merging*:

* the class itself
* the MROs of its base classes
* the list of base classes

...while:

* preserving each class's local parent order (left-to-right),
* making sure a class always appears **before** its parents (a subclass precedes its base classes),
* and keeping the order **monotonic** (subclasses don't reorder their parents' MROs).
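To make the merge concrete, here is a small from-scratch sketch of C3 (our own helper, `c3_linearize`, not part of the standard library). It repeatedly picks the first head that does not appear in the tail of any remaining sequence, and checks its result against Python's own `D.mro()`:

```python
def c3_linearize(cls: type) -> list[type]:
    """Compute the C3 linearization (MRO) of cls from its __bases__."""
    bases = list(cls.__bases__)
    # Sequences to merge: the MRO of every base, plus the list of bases itself
    sequences = [c3_linearize(base) for base in bases] + [bases]
    result = [cls]
    while True:
        sequences = [seq for seq in sequences if seq]  # drop exhausted sequences
        if not sequences:
            return result
        for seq in sequences:
            head = seq[0]
            # A head is a valid next pick if it is not in the tail of any sequence
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise TypeError(f"Cannot create a consistent MRO for {cls.__name__}")
        result.append(head)
        # Remove the chosen class from the front of every sequence
        sequences = [seq[1:] if seq[0] is head else seq for seq in sequences]


class A: ...
class B(A): ...
class C(A): ...
class D(B, C): ...


print([k.__name__ for k in c3_linearize(D)])  # ['D', 'B', 'C', 'A', 'object']
print(c3_linearize(D) == D.mro())              # True
```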


```mermaid
graph TB
    A["A"] --> B["B"]
    A["A"] --> C["C"]
    B["B"] --> D["D"]
    C["C"] --> D["D"]
```

#### What actually happens in Python (step by step)

For:

```python
class A:
    def who(self):
        print("A")

class B(A):
    def who(self):
        print("B")
        super().who()

class C(A):
    def who(self):
        print("C")
        super().who()

class D(B, C):
    pass

print(D.mro())
d = D()
d.who()
```

1. **At class creation**, Python computes `D`'s MRO using C3, resulting in:

   ```python
   [D, B, C, A, object]
   ```
2. When you call `d.who()`, Python looks up `who` following `D`’s MRO:

   * finds it first in `B`, so it runs `B.who`.
3. Inside `B.who`, `super().who()` means: "call the next `who` in `D`'s MRO after `B`" -> that is `C.who`.
4. `C.who` runs, prints `"C"`, then its `super().who()` calls the next class after `C` in the MRO -> `A.who`.
5. `A.who` prints `"A"`, and the chain is done.

So the call chain is:

```text
D.who -> B.who -> C.who -> A.who
```

...and no class is visited twice, which is exactly what C3 + cooperative `super()` are designed to guarantee.


Good practice: in multiple inheritance hierarchies, use **cooperative `super()`**:

* Every class in the hierarchy calls `super().method(...)` instead of hardcoding parent names.
* This allows Python's MRO to "thread" the calls through the whole diamond.
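Cooperative `super()` matters most in `__init__`. A common pattern, sketched here with hypothetical `Named`/`Aged`/`Person` classes, is for every class to consume its own keyword arguments and forward the rest with `**kwargs`, so each `__init__` in the MRO runs exactly once:

```python
class Base:
    def __init__(self, **kwargs):
        # object.__init__ accepts no extra arguments, so any leftover
        # keyword argument at this point raises a TypeError (a useful check)
        super().__init__(**kwargs)


class Named(Base):
    def __init__(self, name, **kwargs):
        self.name = name
        super().__init__(**kwargs)  # next in the MRO, not necessarily Base


class Aged(Base):
    def __init__(self, age, **kwargs):
        self.age = age
        super().__init__(**kwargs)


class Person(Named, Aged):
    pass


p = Person(name="Alice", age=30)
print(p.name, p.age)                          # Alice 30
print([c.__name__ for c in Person.__mro__])  # ['Person', 'Named', 'Aged', 'Base', 'object']
```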

Docs: [https://docs.python.org/3/tutorial/classes.html#multiple-inheritance](https://docs.python.org/3/tutorial/classes.html#multiple-inheritance)

Trivia: many languages either forbid or strongly discourage multiple implementation inheritance. Python allows it, but idiomatic usage is typically via **mixins**.

In [29]:
class A:
    def who(self):
        print("A")


class B(A):
    def who(self):
        print("B")
        super().who()


class C(A):
    def who(self):
        print("C")
        super().who()


class D(B, C):
    pass


print("MRO for D:", D.mro())

d = D()
d.who()


MRO for D: [<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>, <class '__main__.A'>, <class 'object'>]
B
C
A


### ✏ Exercise (easy): Inspect MRO in your own diamond

Create four classes in a diamond shape:

* `Base`
* `Left(Base)`
* `Right(Base)`
* `Child(Left, Right)`

Tasks:

1. Give each class a method `identify` that prints its own name and then calls `super().identify()` (except `Base`, which just prints).
2. Create an instance of `Child` and call `identify`.
3. Print `Child.mro()` and observe the order.

The goal is to see how cooperative `super()` follows the MRO.

In [None]:
# TODO:
# - Implement Base, Left, Right, Child with an identify method.
# - Use super() consistently.
# - Print Child.mro() and call identify on a Child instance.

class Base:
    # TODO: ...
    pass

class Left(Base):
    # TODO: ...
    pass


class Right(Base):
    # TODO: ...
    pass

class Child(Left, Right):
    pass

print(Child.mro())

c = Child()
c.identify()


In [30]:
# Solution

class Base:
    def identify(self):
        print("Base")


class Left(Base):
    def identify(self):
        print("Left")
        super().identify()


class Right(Base):
    def identify(self):
        print("Right")
        super().identify()


class Child(Left, Right):
    pass


print(Child.mro())

c = Child()
c.identify()


[<class '__main__.Child'>, <class '__main__.Left'>, <class '__main__.Right'>, <class '__main__.Base'>, <class 'object'>]
Left
Right
Base


---

# Lunch break (12:00 - 13:00)

Time to rest your brain and eat something. After lunch we will continue with mixins and mixin factories.

---

## 4.1 Mixins and mixin factories

A **mixin** is a class that provides a reusable **piece of behavior**, not a complete type on its own.

Characteristics:

* Usually does not define its own meaningful `__init__`.
* Often has names like `LoggingMixin`, `TimestampMixin`, `ReprMixin`.
* Intended to be combined with a "real" base class via multiple inheritance.

Example real-world uses:

* `JsonSerializableMixin` that adds `to_json()` / `from_json()`.
* `CreatedUpdatedMixin` that stores timestamps on save.
* `PermissionsMixin` in Django's auth system.

Mixins vs normal parents:

* Normal parent: defines "what the object is" (e.g. `User`, `Invoice`, `Repository`).
* Mixin: defines "what the object can do additionally" (e.g. "log actions", "serialize to JSON").
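Because mixins usually override or extend methods of the "real" class via `super()`, their position in the bases list matters: they normally go to the **left** of the base class. A minimal sketch with a made-up `ShoutMixin`:

```python
class Greeter:
    def greet(self, name: str) -> str:
        return f"Hello, {name}"


class ShoutMixin:
    # Post-processes whatever the next class in the MRO returns
    def greet(self, name: str) -> str:
        return super().greet(name).upper() + "!"


class LoudGreeter(ShoutMixin, Greeter):  # mixin first: its greet wins
    pass


class QuietGreeter(Greeter, ShoutMixin):  # base first: Greeter.greet wins, mixin is ignored
    pass


print(LoudGreeter().greet("Alice"))   # HELLO, ALICE!
print(QuietGreeter().greet("Alice"))  # Hello, Alice
```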

### Mixin factories

Sometimes you want to **parameterize** a mixin, for example by:

* Logger name
* Date format
* Target output

A mixin factory is just a function that returns a mixin class with some configuration baked in.

In [32]:
import json
import datetime  # datetime.UTC requires Python 3.11+

class JsonSerializableMixin:
    def to_json(self) -> str:
        return json.dumps(self.__dict__, default=str)


class TimestampMixin:
    def touch(self):
        self.updated_at = datetime.datetime.now(datetime.UTC)


def make_tagged_mixin(tag: str):
    class TaggedMixin:
        def get_tag(self) -> str:
            return tag
    return TaggedMixin


TaggedMixin = make_tagged_mixin("important")


class Task(JsonSerializableMixin, TimestampMixin, TaggedMixin):
    def __init__(self, title: str):
        self.title = title
        self.created_at = datetime.datetime.now(datetime.UTC)
        self.updated_at = self.created_at


task = Task("Write advanced Python notebook")
task.touch()

print(task.to_json())
print("Tag:", task.get_tag())


{"title": "Write advanced Python notebook", "created_at": "2025-11-18 20:43:25.803207+00:00", "updated_at": "2025-11-18 20:43:25.803952+00:00"}
Tag: important


### ⚡ Exercise (advanced): Logging mixin with mixin factory

You want to add simple logging behavior to different classes, with a configurable prefix.

Tasks:

1. Write a mixin factory `make_logging_mixin(prefix: str)` that returns a `LoggingMixin` class:

   * The mixin should have a `log(self, message: str)` method.
   * It should print something like `"[PREFIX] ClassName: message"`.
2. Use the factory to create two mixins with different prefixes.
3. Combine these mixins with simple classes and test `log`.

This exercise shows how mixins can be configured without metaclasses.

In [None]:
# TODO:
# - Implement make_logging_mixin(prefix).
# - Create two mixins, e.g. InfoLoggingMixin and ErrorLoggingMixin.
# - Combine them with "real" classes and call log on instances.

def make_logging_mixin(prefix: str):
    # TODO: define a LoggingMixin class with a log() method and return it
    ...


# ...

class InfoService(InfoLoggingMixin):
    pass


class ErrorService(ErrorLoggingMixin):
    pass


s1 = InfoService()
s1.log("starting up")

s2 = ErrorService()
s2.log("something went wrong")


In [33]:
# Solution

def make_logging_mixin(prefix: str):
    class LoggingMixin:
        def log(self, message: str) -> None:
            cls_name = self.__class__.__name__
            print(f"[{prefix}] {cls_name}: {message}")
    return LoggingMixin


InfoLoggingMixin = make_logging_mixin("INFO")
ErrorLoggingMixin = make_logging_mixin("ERROR")


class InfoService(InfoLoggingMixin):
    pass


class ErrorService(ErrorLoggingMixin):
    pass


s1 = InfoService()
s1.log("starting up")

s2 = ErrorService()
s2.log("something went wrong")


[INFO] InfoService: starting up
[ERROR] ErrorService: something went wrong


### 4.2 Combining decorators and multiple inheritance

In real projects, you often **combine**:

* Mixins that provide reusable behavior.
* Decorators that wrap specific methods or classes.

For example:

* A `LoggingMixin` that centralizes logging behavior.
* A `@timed` decorator that measures runtime of selected methods.
* A class decorator that automatically registers mixin-based services.

The key is to keep each piece **orthogonal**:

* Mixins: reusable behavior at class level.
* Decorators: reusable behavior at call level.

In later parts of the course (and in the final example today) we will use both together.
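Here is a minimal sketch of that combination. It assumes a module-level dictionary serves as the registry. `SERVICE_REGISTRY`, `register_service`, `timed`, `LoggingMixin`, and `ReportService` are hypothetical names, not part of any library. The mixin supplies class-level behavior, the method decorator wraps a single call, and the class decorator runs once when the class is defined.

```python
import functools
import time

SERVICE_REGISTRY: dict[str, type] = {}  # hypothetical global registry


def register_service(cls):
    """Class decorator: record the class under its name, then return it unchanged."""
    SERVICE_REGISTRY[cls.__name__] = cls
    return cls


def timed(func):
    """Call-level decorator: print how long each call takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            print(f"{func.__qualname__} took {elapsed_ms:.2f} ms")
    return wrapper


class LoggingMixin:
    """Class-level behavior shared by many services."""
    def log(self, message: str) -> None:
        print(f"[{type(self).__name__}] {message}")


@register_service
class ReportService(LoggingMixin):
    @timed
    def build_report(self, rows: list[int]) -> int:
        self.log(f"building report from {len(rows)} rows")
        return sum(rows)


# Look the service up through the registry, as plugin-style code would
service = SERVICE_REGISTRY["ReportService"]()
print(service.build_report([1, 2, 3]))
```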

### Tutorial on Python Iterators

Python iterators are a fundamental concept for handling sequences of data efficiently. They are widely used due to their memory efficiency and clear, readable syntax. This tutorial will cover:

1. **What are Iterators?**
2. **The Iterator Protocol**
3. **Built-in Iterators and Iterable Objects**
4. **Custom Iterators**
5. **Generator Functions and Expressions**
6. **Performance Characteristics**
7. **Comparisons with Other Constructs**

#### 1. What are Iterators?

An iterator is an object that produces values one at a time, so you can traverse them, for example in a `for` loop, without computing them all up front. The sequence of values can be finite or infinite. In Python, an iterator implements two essential methods: `__iter__()` and `__next__()`.

**Example:**

In [38]:
my_list = [1, 2, 3]
iterator = iter(my_list)  # Creates an iterator from the list

print(next(iterator))  # Output: 1
print(next(iterator))  # Output: 2
print(next(iterator))  # Output: 3
print(next(iterator))  # Output: StopIteration exception: the iterator has been exhausted

1
2
3


StopIteration: 

#### 2. The Iterator Protocol

The Iterator Protocol is a fancy-sounding term that means "the way Python's for loops work". The iterator protocol consists of two methods:

- `__iter__()`: This method returns the iterator object itself. It is called by `iter()`, so a `for` loop invokes it once, when the loop starts.
- `__next__()`: This method returns the next value and raises a `StopIteration` exception when no more values are available.

**Example of a simple iterator class:**

In [39]:
class MyIterator:
    def __init__(self, data):
        self.data = data
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index < len(self.data):
            result = self.data[self.index]
            self.index += 1
            return result
        else:
            raise StopIteration

my_iter = MyIterator([1, 2, 3])
for item in my_iter:
    print(item)

1
2
3
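`MyIterator` returns itself from `__iter__`, so it is used up after a single loop. A common refinement separates the *iterable*, which can be looped over many times, from the *iterator*, which tracks the position of one pass. In the sketch below, `ListIterator` and `ReusableSequence` are illustrative names. The iterable's `__iter__` returns a fresh iterator on each call.

```python
class ListIterator:
    """Iterator: remembers its position and is used up after one pass."""

    def __init__(self, data):
        self.data = data
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        item = self.data[self.index]
        self.index += 1
        return item


class ReusableSequence:
    """Iterable: hands out a brand-new ListIterator every time iteration starts."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return ListIterator(self.data)


it = ListIterator([1, 2, 3])
print(list(it), list(it))    # [1, 2, 3] []

seq = ReusableSequence([1, 2, 3])
print(list(seq), list(seq))  # [1, 2, 3] [1, 2, 3]
```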


### ⚡ Exercise (Advanced): Fractional Range Iterator

#### Objective:
Build an advanced iterator class that generates fractional numbers within a given range. This exercise will help you practice creating custom iterators.

#### Exercise Description:
You are to create a custom iterator class `FractionalRange` that behaves similarly to Python's built-in `range()` [function](https://docs.python.org/3/library/functions.html#func-range) but supports fractional values. This class should accept three parameters: `start`, `stop`, and `step`. It should generate numbers starting from `start` up to but not including `stop`, with increments of `step`.

1. **Initialization (`__init__`)**:
   - `start`: The starting value of the sequence.
   - `stop`: The end value of the sequence (not inclusive).
   - `step`: The increment value for each step.

2. **Iteration Methods**:
   - `__iter__`: Should return the iterator object itself.
   - `__next__`: Should return the next value in the sequence and raise `StopIteration` when the end of the range is reached.

3. **Edge Cases**:
   - Handle cases where `step` is zero (which should raise a `ValueError`).
   - Ensure that if `start` is equal to `stop`, the iterator doesn't generate any values.

#### Requirements:
- Implement the `FractionalRange` class based on the above specifications.
- Test your class with the provided test cases to verify that it works with integer, fractional, and negative steps.
- Make sure to handle potential edge cases and errors.

In [None]:

# Step 1: implement the iterator class
class FractionalRange:
    # TODO: implement __init__, __iter__ and __next__
    ...

# Test code:
print("Fractional range with integer step:")
for num in FractionalRange(0, 5, 1):
    print(num)

# Expected output:
# Fractional range with integer step:
# 0
# 1
# 2
# 3
# 4

print("\nFractional range with fractional step:")
for num in FractionalRange(0.5, 2.0, 0.5):
    print(num)

# Expected output:
# Fractional range with fractional step:
# 0.5
# 1.0
# 1.5

print("\nFractional range with negative step:")
for num in FractionalRange(5, 0, -1):
    print(num)

# Expected output:
# Fractional range with negative step:
# 5
# 4
# 3
# 2
# 1

print("\nEdge case with zero step:")
try:
    for num in FractionalRange(0, 5, 0):
        print(num)
except ValueError as e:
    print(e)

# Expected output:
# Edge case with zero step:
# Step cannot be zero.

In [6]:
# Solution

# Step 1: implement the iterator class
class FractionalRange:
    def __init__(self, start, stop, step):
        if step == 0:
            raise ValueError("Step cannot be zero.")
        self.start = start
        self.stop = stop
        self.step = step
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if (self.step > 0 and self.current >= self.stop) or (self.step < 0 and self.current <= self.stop):
            raise StopIteration
        result = self.current
        self.current += self.step
        return result

# Test code:
print("Fractional range with integer step:")
for num in FractionalRange(0, 5, 1):
    print(num)

print("\nFractional range with fractional step:")
for num in FractionalRange(0.5, 2.0, 0.5):
    print(num)

print("\nFractional range with negative step:")
for num in FractionalRange(5, 0, -1):
    print(num)

print("\nEdge case with zero step:")
try:
    for num in FractionalRange(0, 5, 0):
        print(num)
except ValueError as e:
    print(e)

Fractional range with integer step:
0
1
2
3
4

Fractional range with fractional step:
0.5
1.0
1.5

Fractional range with negative step:
5
4
3
2
1

Edge case with zero step:
Step cannot be zero.
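The solution above adds `step` to `self.current` on every call, so floating-point rounding errors accumulate. With a step of `0.1`, the fourth value is already `0.30000000000000004`, and `FractionalRange(0, 1, 0.1)` yields 11 values instead of 10, the last one being `0.9999999999999999`. The following is a minimal alternative sketch that keeps the same interface. It counts the steps and computes each value as `start + index * step`, so rounding happens once per value instead of piling up. `IndexedFractionalRange` is a hypothetical name.

```python
class IndexedFractionalRange:
    """Like FractionalRange, but computes each value from its index to avoid drift."""

    def __init__(self, start, stop, step):
        if step == 0:
            raise ValueError("Step cannot be zero.")
        self.start = start
        self.stop = stop
        self.step = step
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        # One multiplication per value: errors do not accumulate across steps
        value = self.start + self.index * self.step
        if (self.step > 0 and value >= self.stop) or (self.step < 0 and value <= self.stop):
            raise StopIteration
        self.index += 1
        return value


print(len(list(IndexedFractionalRange(0, 1, 0.1))))  # 10
print(list(IndexedFractionalRange(5, 0, -1)))         # [5, 4, 3, 2, 1]
```

Individual values can still show ordinary float rounding, but the error no longer grows with each step. The class uses only `+`, `*`, and comparisons, so if you need exact decimal steps, it also works with `fractions.Fraction` or `decimal.Decimal` values for `start` and `step`.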


#### 3. Built-in Iterators and Iterable Objects

Python provides several built-in iterable objects such as lists, tuples, and dictionaries. The `iter()` function can be used to obtain an iterator from these objects.

**Example:**

In [7]:
my_list = [1, 2, 3]
my_iter = iter(my_list)

while True:
    try:
        print(next(my_iter))
    except StopIteration:
        break

1
2
3


Iterating over dictionaries can be done through keys, values, or key-value pairs:

In [9]:
my_dict = {'a': 1, 'b': 2}

# Iterating over keys
for key in my_dict:
    print(key)

# Iterating over values
for value in my_dict.values():
    print(value)

# Iterating over key-value pairs
for item in my_dict.items():
    print(item)

a
b
1
2
('a', 1)
('b', 2)


Files in Python are also iterable. You can loop through lines in a file:

In [11]:
# Step 1: Create and write to 'myfile.txt'
with open('myfile.txt', 'w') as f:
    f.write("Hello, World!\n")
    f.write("This is a test file.\n")
    f.write("Each line will be printed separately.\n")

# Step 2: Read from 'myfile.txt' and print each line
with open('myfile.txt', 'r') as f:
    for idx, line in enumerate(f):
        print(idx, "\t", line.strip())

0 	 Hello, World!
1 	 This is a test file.
2 	 Each line will be printed separately.


#### 4. Custom Iterators

You can create your own iterators by implementing the iterator protocol.

**Example of a Fibonacci sequence iterator:**

In [44]:
class Fibonacci:
    def __init__(self, max_value):
        self.max_value = max_value
        self.a, self.b = 0, 1

    def __iter__(self):
        return self

    def __next__(self):
        # Stop as soon as the current number exceeds the limit
        if self.a > self.max_value:
            raise StopIteration
        value = self.a
        self.a, self.b = self.b, self.a + self.b
        return value

# Using the custom iterator
for num in Fibonacci(100):
    print(num)

0
1
1
2
3
5
8
13
21
34
55
89


#### 5. Generator Functions and Expressions

Generators are a simple way to create iterators using functions and the `yield` statement.

**Example of a generator function:**

In [48]:
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

for idx, number in enumerate(infinite_sequence()):
    print(number)
    if idx == 10:
        break

0
1
2
3
4
5
6
7
8
9
10
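Calling a generator function does not run its body. It returns a generator object, which is itself an iterator. Each `next()` resumes the body until the next `yield`, and when the body finishes, `StopIteration` is raised automatically. The sketch below makes this visible with `print` calls; `countdown` and `naturals` are illustrative names. It also uses `itertools.islice` as an alternative to the `enumerate`/`break` pattern for taking the first few items of an infinite generator.

```python
from itertools import islice


def countdown(n):
    print("starting")
    while n > 0:
        yield n          # pause here and hand n to the caller
        n -= 1
    print("done")        # runs only when the caller asks for a value after the last one


gen = countdown(3)       # nothing is printed yet: the body has not started
print(next(gen))         # prints "starting", then 3
print(next(gen))         # 2
print(list(gen))         # prints "done", then [1]
print(iter(gen) is gen)  # True: a generator is its own iterator


def naturals():
    num = 0
    while True:
        yield num
        num += 1


print(list(islice(naturals(), 5)))  # [0, 1, 2, 3, 4]
```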


### ⚡ Exercise: Create a Generator Function

#### Objective:

Define a generator function named `fibonacci_sequence` that yields an infinite sequence of Fibonacci numbers. The Fibonacci sequence starts with 0 and 1, and each subsequent number is the sum of the previous two numbers. For example: 0, 1, 1, 2, 3, 5, 8, 13, ...

In [14]:
# Implement the `fibonacci_sequence` generator function:
def fibonacci_sequence():
    # TODO: yield the Fibonacci numbers one by one, forever
    ...

for idx, number in enumerate(fibonacci_sequence()):
    print(number)
    if idx == 9:
        break

# Expected output:
# 0
# 1
# 1
# 2
# 3
# 5
# 8
# 13
# 21
# 34



In [15]:
# Solution

def fibonacci_sequence():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

for idx, number in enumerate(fibonacci_sequence()):
    print(number)
    if idx == 9:
        break

0
1
1
2
3
5
8
13
21
34


##### Generator expressions provide a concise way to create generators.

**Example of a generator expression:**

In [18]:
gen_exp = (x * x for x in range(5))
for num in gen_exp:
    print(num)

0
1
4
9
16
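Like any generator, a generator expression can be consumed only once. When it is the sole argument to a function call, the surrounding parentheses can be omitted. This is the usual way to feed a generator expression into `sum`, `max`, `any`, and similar functions.

```python
squares = (x * x for x in range(5))
print(sum(squares))  # 30
print(sum(squares))  # 0: the generator is already exhausted

# As the only argument of a call, a generator expression needs no extra parentheses
print(sum(x * x for x in range(5)))                            # 30
print(max(len(word) for word in ["map", "filter", "reduce"]))  # 6
```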


### ⚡ Exercise: Generator Expression for Numbers and Their Squares

#### Objective:
Write a generator expression that yields tuples containing each number from 1 to 10 and its square. Use this generator expression to print the tuples.

In [None]:
# Complete the following code:
number_and_squares = ...  # TODO: replace ... with your generator expression

for number, square in number_and_squares:
    print(f'Number: {number}, Square: {square}')

# Expected output:
# Number: 1, Square: 1
# Number: 2, Square: 4
# Number: 3, Square: 9
# Number: 4, Square: 16
# Number: 5, Square: 25
# Number: 6, Square: 36
# Number: 7, Square: 49
# Number: 8, Square: 64
# Number: 9, Square: 81
# Number: 10, Square: 100

In [13]:
# Solution

number_and_squares = ((x, x**2) for x in range(1, 11))

for number, square in number_and_squares:
    print(f'Number: {number}, Square: {square}')


Number: 1, Square: 1
Number: 2, Square: 4
Number: 3, Square: 9
Number: 4, Square: 16
Number: 5, Square: 25
Number: 6, Square: 36
Number: 7, Square: 49
Number: 8, Square: 64
Number: 9, Square: 81
Number: 10, Square: 100


#### 6. Performance Characteristics

Iterators, especially generators, are memory efficient because they yield items one at a time rather than generating the entire list at once. This can be contrasted with list comprehensions, which generate the entire list in memory.

**Memory usage comparison:**

In [19]:
import sys

# List comprehension
list_comp = [x * x for x in range(1000000)]
print("Size of list comprehension", sys.getsizeof(list_comp))

# Generator expression
gen_exp = (x * x for x in range(1000000))
print("Size of generator expression", sys.getsizeof(gen_exp))

Size of list comprehension 8448728
Size of generator expression 200


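In the above example, the list comprehension builds a list of a million items in memory, whereas the generator expression produces one item at a time. The generator's size stays the same no matter how many items it will eventually produce. `sys.getsizeof` reports only the list's own array of references (about 8 bytes per slot on a 64-bit build), not the integer objects those references point to, so the list's real footprint is considerably larger than the number shown.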
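`sys.getsizeof` measures only the object itself, not the objects it references. A more telling comparison measures peak memory while the work actually runs. The sketch below uses the standard library's `tracemalloc` module; the helper names `peak_bytes`, `sum_squares_list`, and `sum_squares_gen` are illustrative. Exact byte counts depend on the Python version and platform, so only the order of magnitude matters.

```python
import tracemalloc


def peak_bytes(func, *args):
    """Run func(*args) and return the peak traced memory (in bytes) during the call."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def sum_squares_list(n):
    return sum([x * x for x in range(n)])  # builds the full list first


def sum_squares_gen(n):
    return sum(x * x for x in range(n))    # one square at a time


n = 200_000
list_peak = peak_bytes(sum_squares_list, n)
gen_peak = peak_bytes(sum_squares_gen, n)
print(f"list comprehension peak:   {list_peak:,} bytes")
print(f"generator expression peak: {gen_peak:,} bytes")
```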

In [30]:
import functools
import time

def measure_performance(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        t1 = time.perf_counter(), time.process_time()
        result = func(*args, **kwargs)
        t2 = time.perf_counter(), time.process_time()
        print(f"{func.__name__}()")
        print(f" Real time: {t2[0] - t1[0]:.2f} seconds")
        print(f" CPU time: {t2[1] - t1[1]:.2f} seconds")
        print()
        return result
    return wrapper

@measure_performance
def sum_list_comp(n=10000000):
    list_comp = [x * x for x in range(n)]
    return sum(list_comp)

@measure_performance
def sum_gen_exp(n=10000000):
    gen_exp = (x * x for x in range(n))
    return sum(gen_exp)

# Test the functions
n = 10000000
_ = sum_list_comp(n)
_ = sum_gen_exp(n)

sum_list_comp()
 Real time: 2.33 seconds
 CPU time: 2.30 seconds

sum_gen_exp()
 Real time: 2.13 seconds
 CPU time: 2.05 seconds



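In this example:

 - `sum_list_comp` has to allocate and fill a 10-million-element list before `sum` can start, which costs both time and memory.
 - `sum_gen_exp` feeds each square to `sum` as soon as it is computed, so it never holds more than one item at a time. In this run it was also slightly faster.

Treat single-run timings like these with caution. They vary between runs and machines, and for smaller inputs a list comprehension is often as fast as, or faster than, the equivalent generator expression. The dependable advantage of the generator is memory, not speed.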
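For more trustworthy numbers than a single run, the standard library's `timeit` module can repeat a measurement so you can keep the fastest run, which is the least disturbed by other activity on the machine. In the sketch below, the input size and repeat counts are arbitrary choices, and `list_comp_sum`/`gen_exp_sum` are illustrative names. Results differ from machine to machine, so no expected output is shown.

```python
import functools
import timeit


def list_comp_sum(n):
    return sum([x * x for x in range(n)])


def gen_exp_sum(n):
    return sum(x * x for x in range(n))


size = 200_000
assert list_comp_sum(1_000) == gen_exp_sum(1_000)  # same result before comparing speed

for func in (list_comp_sum, gen_exp_sum):
    # 5 repetitions of 3 calls each; the fastest repetition is the least noisy estimate
    times = timeit.repeat(functools.partial(func, size), repeat=5, number=3)
    print(f"{func.__name__}: {min(times) / 3 * 1000:.1f} ms per call (best of 5)")
```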

### Side Note: The Efficiency of `range`

In Python, the `range` object is a highly efficient iterable that generates numbers on-the-fly without storing them in memory. This efficiency extends to operations like membership testing with the `in` keyword.

Consider the expression:

In [25]:
1_000_000_000_000_000_000_000 in range(0, 1_000_000_000_000_000_000_001, 10)

True

Here, `1_000_000_000_000_000_000_000` is being checked for membership within a `range` that starts at 0 and ends at `1_000_000_000_000_000_000_001` with a step of 10. Despite the enormous size of the `range`, the membership test is instantaneous.

This is because `range` objects do not store their values; they keep only `start`, `stop`, and `step`. For an integer, `__contains__` checks whether the value lies within the bounds and whether its distance from `start` is a whole multiple of `step`. That takes constant time (`O(1)`) no matter how long the range is. The shortcut applies only to `int` values: a membership test with another type, such as the float `5.0`, falls back to scanning the range item by item.
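The arithmetic behind that check can be sketched in a few lines of pure Python. The `range_contains` helper below is illustrative; CPython implements the real check in C. For integers, it agrees with the built-in `in` operator, as the cross-check at the end confirms.

```python
def range_contains(value: int, start: int, stop: int, step: int = 1) -> bool:
    """Constant-time membership test for range(start, stop, step), integers only."""
    if step == 0:
        raise ValueError("step must not be zero")
    # 1. Is the value inside the half-open interval, respecting the direction of step?
    if step > 0:
        in_bounds = start <= value < stop
    else:
        in_bounds = stop < value <= start
    # 2. Is the value a whole number of steps away from start?
    return in_bounds and (value - start) % step == 0


big = 1_000_000_000_000_000_000_000
print(range_contains(big, 0, big + 1, 10))       # True
print(range_contains(big + 5, 0, big + 10, 10))  # False

# Cross-check against the built-in on many small cases
assert all(
    range_contains(v, a, b, s) == (v in range(a, b, s))
    for a in range(-5, 6)
    for b in range(-5, 6)
    for s in (-3, -2, -1, 1, 2, 3)
    for v in range(-7, 8)
)
```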


#### 7. Comparisons with Other Constructs

**Iterators vs. Lists:**
- **Memory Usage:** Iterators are more memory efficient.
- **Performance:** Iterators can be slightly slower for small data sizes because every item costs a `__next__` call. For large datasets, they avoid building the whole collection in memory, which saves memory and often time as well.

**Iterators vs. List Comprehensions:**
- **Syntax:** List comprehensions are often more readable for simple use cases.
- **Use Cases:** Use iterators for large data processing where memory efficiency is crucial.

---

# Short break (14:45 - 15:00)

Last stretch coming up: `map`, `filter`, `reduce`, `itertools`, and the final log processing pipeline.

---

### 1. The `map` Function
The `map` function applies a given function to all items in an input list (or any iterable) and returns an iterator.

**Syntax**:
```python
map(function, iterable, ...)
```

**Example**:

In [31]:
def square(x):
    return x * x

numbers = [1, 2, 3, 4, 5]
squared_numbers = map(square, numbers)
print(list(squared_numbers))  # Output: [1, 4, 9, 16, 25]

[1, 4, 9, 16, 25]


**Using lambda**:

In [32]:
numbers = [1, 2, 3, 4, 5]
squared_numbers = map(lambda x: x * x, numbers)
print(list(squared_numbers))  # Output: [1, 4, 9, 16, 25]

[1, 4, 9, 16, 25]


**Practical examples**:

In [33]:
celsius = [0, 20, 30, 40]
fahrenheit = list(map(lambda x: (x * 9/5) + 32, celsius))
print("Converting Temperatures from Celsius to Fahrenheit", fahrenheit)

strings = ['HELLO', 'World', 'Python']
lowercase_strings = list(map(str.lower, strings))
print("Converting Strings to Lowercase", lowercase_strings)

list1 = [1, 2, 3]
list2 = [4, 5, 6]
summed_lists = list(map(lambda x, y: x + y, list1, list2))
print("Adding Two Lists Element-wise", summed_lists)

tuples = [(1, 'a'), (2, 'b'), (3, 'c')]
first_elements = list(map(lambda x: x[0], tuples))
print("Extracting First Elements from List of Tuples", first_elements)

import datetime
dates = ['2021-01-01', '2022-02-02', '2023-03-03']
formatted_dates = list(map(lambda date: datetime.datetime.strptime(date, '%Y-%m-%d').strftime('%d/%m/%Y'), dates))
print("Formatting Dates", formatted_dates)

first_names = ['John', 'Jane', 'Doe']
last_names = ['Smith', 'Doe', 'Ray']
full_names = list(map(lambda fn, ln: f"{fn} {ln}", first_names, last_names))
print("Combining First Names and Last Names", full_names)

Converting Temperatures from Celsius to Fahrenheit [32.0, 68.0, 86.0, 104.0]
Converting Strings to Lowercase ['hello', 'world', 'python']
Adding Two Lists Element-wise [5, 7, 9]
Extracting First Elements from List of Tuples [1, 2, 3]
Formatting Dates ['01/01/2021', '02/02/2022', '03/03/2023']
Combining First Names and Last Names ['John Smith', 'Jane Doe', 'Doe Ray']
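Two details of `map` are worth knowing. With several iterables, it stops as soon as the shortest one is exhausted. And because it returns a lazy iterator, the function is not called until values are requested. The `loud_square` helper below is a hypothetical function that prints each time it runs.

```python
# With several iterables, map stops at the shortest one
print(list(map(lambda x, y: x + y, [1, 2, 3], [10, 20])))  # [11, 22]


def loud_square(x):
    print(f"squaring {x}")
    return x * x


lazy = map(loud_square, [1, 2, 3])  # nothing printed yet
print(next(lazy))                   # prints "squaring 1", then 1
print(list(lazy))                   # prints "squaring 2", "squaring 3", then [4, 9]
```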


### 2. The `reduce` Function
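The `reduce` function from the `functools` module applies a two-argument function cumulatively to the items of an iterable, from left to right, reducing them to a single value: `reduce(f, [a, b, c])` computes `f(f(a, b), c)`. If `initializer` is given, it is placed before the items and is also the result when the iterable is empty.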

**Syntax**:
```python
from functools import reduce
reduce(function, iterable[, initializer])
```

**Example**:

In [34]:
from functools import reduce

def add(x, y):
    return x + y

numbers = [1, 2, 3, 4, 5]
total = reduce(add, numbers)
print(total)  # Output: 15

15


**Using lambda**:

In [35]:
numbers = [1, 2, 3, 4, 5]
total = reduce(lambda x, y: x + y, numbers)
print(total)  # Output: 15

15
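To see what `reduce` does step by step, here is a rough pure-Python equivalent, similar in spirit to the one in the `functools` documentation. `my_reduce` is an illustrative name, not the real implementation. It also shows why the initializer matters for empty input.

```python
from functools import reduce

_MISSING = object()  # sentinel: distinguishes "no initializer" from initializer=None


def my_reduce(function, iterable, initializer=_MISSING):
    it = iter(iterable)
    if initializer is _MISSING:
        try:
            accumulator = next(it)  # the first item becomes the starting value
        except StopIteration:
            raise TypeError("reduce() of empty iterable with no initial value") from None
    else:
        accumulator = initializer
    for item in it:
        accumulator = function(accumulator, item)  # f(f(f(a, b), c), d) ...
    return accumulator


print(my_reduce(lambda x, y: x + y, [1, 2, 3, 4, 5]))  # 15
print(my_reduce(lambda x, y: x + y, [], 0))            # 0
print(reduce(lambda x, y: x + y, [], 0))               # 0: the built-in agrees
```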


### 3. The `filter` Function
The `filter` function constructs an iterator from elements of an iterable for which a function returns true.

**Syntax**:
```python
filter(function, iterable)
```

**Example**:

In [36]:
def is_even(x):
    return x % 2 == 0

numbers = [1, 2, 3, 4, 5, 6]
even_numbers = filter(is_even, numbers)
print(list(even_numbers))  # Output: [2, 4, 6]

[2, 4, 6]


**Using lambda**:

In [37]:
numbers = [1, 2, 3, 4, 5, 6]
even_numbers = filter(lambda x: x % 2 == 0, numbers)
print(list(even_numbers))  # Output: [2, 4, 6]

[2, 4, 6]
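`filter` also accepts `None` as the function; in that case it keeps only the truthy items. Any `filter` call can also be written as a generator expression with an `if` clause, which many Python developers find more readable when the condition would otherwise be a lambda.

```python
values = [0, 1, "", "text", None, [], [3], False, True]
print(list(filter(None, values)))  # [1, 'text', [3], True]

numbers = [1, 2, 3, 4, 5, 6]
via_filter = list(filter(lambda x: x % 2 == 0, numbers))
via_genexp = list(x for x in numbers if x % 2 == 0)
print(via_filter == via_genexp)  # True
```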


### 4. The `itertools` Module
The `itertools` module implements a number of iterator building blocks inspired by constructs from APL, Haskell, and SML. This module standardizes a core set of fast, memory-efficient tools that are useful by themselves or in combination.

The itertools module provides various functions that return iterators. Examples include:

 - `count(start=0, step=1)`: Returns evenly spaced values beginning with `start`, indefinitely.
 - `cycle(iterable)`: Cycles through an iterable indefinitely.
 - `repeat(object[, times])`: Repeats `object` indefinitely, or exactly `times` times if `times` is given.

**Example Using itertools**

In [38]:
import itertools

for num in itertools.count(start=5, step=2):
    if num > 15:
        break
    print(num)

5
7
9
11
13
15



#### 4.1. `itertools.chain`
Combines several iterables into one long iterable.

**Example**:

In [39]:
import itertools

list1 = [1, 2, 3]
list2 = [4, 5, 6]
combined = itertools.chain(list1, list2)
print(list(combined))  # Output: [1, 2, 3, 4, 5, 6]

[1, 2, 3, 4, 5, 6]
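Sometimes the iterables to combine are themselves stored in a list, or produced lazily. In that case, `itertools.chain.from_iterable` takes one iterable of iterables instead of separate arguments. It is a common way to flatten one level of nesting.

```python
import itertools

nested = [[1, 2, 3], [4, 5], [], [6]]
flat = itertools.chain.from_iterable(nested)
print(list(flat))  # [1, 2, 3, 4, 5, 6]

# Works with a generator of iterables too: the inner lists are created on demand
rows = ([n] * n for n in range(1, 4))
print(list(itertools.chain.from_iterable(rows)))  # [1, 2, 2, 3, 3, 3]
```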


#### 4.2. `itertools.cycle`
Repeats an iterable indefinitely.

**Example**:

In [40]:
import itertools

cycle_iter = itertools.cycle([1, 2, 3])
for i in range(10):
    print(next(cycle_iter), end=" ")  # Output: 1 2 3 1 2 3 1 2 3 1

1 2 3 1 2 3 1 2 3 1 
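Because `cycle` never ends, it is usually paired with something that does. `zip` stops at its shortest input, and `itertools.islice` takes a fixed number of items. The round-robin assignment below uses made-up `tasks` and `workers` data to illustrate both.

```python
import itertools

tasks = ["t1", "t2", "t3", "t4", "t5"]
workers = ["alice", "bob"]

# zip stops when tasks run out, so the infinite cycle is safe here
assignment = list(zip(tasks, itertools.cycle(workers)))
print(assignment)
# [('t1', 'alice'), ('t2', 'bob'), ('t3', 'alice'), ('t4', 'bob'), ('t5', 'alice')]

# islice takes the first 7 items without a manual counter loop
print(list(itertools.islice(itertools.cycle("AB"), 7)))  # ['A', 'B', 'A', 'B', 'A', 'B', 'A']
```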

#### 4.3. `itertools.count`
Returns evenly spaced values starting with a specified number.

**Example**:

In [41]:
import itertools

counter = itertools.count(start=10, step=2)
for i in range(5):
    print(next(counter), end=" ")  # Output: 10 12 14 16 18

10 12 14 16 18 
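`itertools.repeat` yields the same object over and over: a fixed number of times if a count is given, otherwise forever. The infinite form is handy for supplying a constant argument to `map` or `zip`.

```python
import itertools

print(list(itertools.repeat("ha", 3)))  # ['ha', 'ha', 'ha']

# Square each number by pairing it with an endless stream of 2s
print(list(map(pow, range(5), itertools.repeat(2))))  # [0, 1, 4, 9, 16]
```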

#### 4.4. `itertools.groupby`
Groups consecutive elements in an iterable that have the same key.

**Example**:

In [42]:
import itertools

data = [('a', 1), ('a', 2), ('b', 3), ('b', 4), ('a', 5)]
grouped_data = itertools.groupby(data, key=lambda x: x[0])

for key, group in grouped_data:
    print(key, list(group))
# Output:
# a [('a', 1), ('a', 2)]
# b [('b', 3), ('b', 4)]
# a [('a', 5)]

a [('a', 1), ('a', 2)]
b [('b', 3), ('b', 4)]
a [('a', 5)]


To group all occurrences of the same key, not just consecutive runs, sort the iterable by that key first.

In [43]:
import itertools

data = [('a', 1), ('a', 2), ('b', 3), ('b', 4), ('a', 5)]
data.sort(key=lambda x: x[0])
grouped_data = itertools.groupby(data, key=lambda x: x[0])

for key, group in grouped_data:
    print(key, list(group))
# Output:
# a [('a', 1), ('a', 2), ('a', 5)]
# b [('b', 3), ('b', 4)]

a [('a', 1), ('a', 2), ('a', 5)]
b [('b', 3), ('b', 4)]


### ⚡ Exercises
1. **Exercise 1**: Use `map` to convert a list of strings to uppercase.
2. **Exercise 2**: Use `reduce` to find the product of a list of numbers.
3. **Exercise 3**: Use `filter` to find all numbers greater than 10 in a list.
4. **Exercise 4**: Use `itertools.chain` to merge three different lists.
5. **Exercise 5**: Use `itertools.groupby` to group a list of tuples by the first element.

### ⚡ Exercise 1: Use `map` to Convert a List of Strings to Uppercase
**Objective:** Transform all the strings in a list to uppercase using the `map` function.

In [44]:
strings = ["hello", "world", "python", "is", "awesome"]

# Expected output: ["HELLO", "WORLD", "PYTHON", "IS", "AWESOME"]

In [45]:
# Solution

strings = ["hello", "world", "python", "is", "awesome"]

uppercase_strings = list(map(str.upper, strings))

print(uppercase_strings)

['HELLO', 'WORLD', 'PYTHON', 'IS', 'AWESOME']


### ⚡ Exercise 2: Use `reduce` to Find the Product of a List of Numbers
**Objective:** Calculate the product of all numbers in a list using the `reduce` function from the `functools` module.

In [46]:
numbers = [1, 2, 3, 4, 5]

# Expected output: 120

In [47]:
# Solution

from functools import reduce

numbers = [1, 2, 3, 4, 5]
product = reduce(lambda x, y: x * y, numbers)
print(product)

120


### ⚡ Exercise 3: Use `filter` to Find All Numbers Greater Than 10 in a List
**Objective:** Select all numbers greater than 10 from a list using the `filter` function.

In [48]:
numbers = [4, 11, 8, 15, 3, 22, 7]

# Expected output: [11, 15, 22]

In [49]:
# Solution

numbers = [4, 11, 8, 15, 3, 22, 7]
filtered_numbers = list(filter(lambda x: x > 10, numbers))
print(filtered_numbers)

[11, 15, 22]


### ⚡ Exercise 4: Use `itertools.chain` to Merge Three Different Lists
**Objective:** Combine three lists into one using the `chain` function from the `itertools` module.

In [50]:
list1 = [1, 2, 3]
list2 = ["a", "b", "c"]
list3 = [True, False, True]

# Expected output: [1, 2, 3, 'a', 'b', 'c', True, False, True]

In [51]:
# Solution

from itertools import chain

list1 = [1, 2, 3]
list2 = ["a", "b", "c"]
list3 = [True, False, True]

merged_list = list(chain(list1, list2, list3))
print(merged_list)

[1, 2, 3, 'a', 'b', 'c', True, False, True]


### ⚡ Exercise 5: Use `itertools.groupby` to Group a List of Tuples by the First Element
**Objective:** Group a list of tuples by their first element using the `groupby` function from the `itertools` module.

In [52]:
tuples = [(1, 'a'), (2, 'b'), (1, 'c'), (2, 'd'), (1, 'e')]

# Expected output:
# 1: [(1, 'a'), (1, 'c'), (1, 'e')]
# 2: [(2, 'b'), (2, 'd')]

In [53]:
# Solution

from itertools import groupby

tuples = [(1, 'a'), (2, 'b'), (1, 'c'), (2, 'd'), (1, 'e')]
sorted_tuples = sorted(tuples, key=lambda x: x[0])
grouped_tuples = {key: list(group) for key, group in groupby(sorted_tuples, key=lambda x: x[0])}

for key, group in grouped_tuples.items():
    print(f"{key}: {group}")

1: [(1, 'a'), (1, 'c'), (1, 'e')]
2: [(2, 'b'), (2, 'd')]


### Summary
Understanding and utilizing these advanced techniques—`map`, `reduce`, `filter`, and `itertools`—allows for more expressive and efficient Python code, particularly in data processing and functional programming scenarios. Experiment with these tools to become proficient and leverage their full potential in your projects.

### Sources

 - [itertools — Functions creating iterators for efficient looping](https://docs.python.org/3/library/itertools.html)
 - [Python built-in range function documentation](https://docs.python.org/3/library/functions.html#func-range)

### ⚡ Advanced Exercise: Process a List of Dictionaries to Extract and Transform Data

**Objective:** Combine the use of `map`, `filter`, `reduce`, and `itertools` to process a list of dictionaries. The task is to extract specific data, transform it, filter the results, and finally compute a summary statistic.

**Instructions:**
1. You have a list of dictionaries representing sales data. Each dictionary contains the following keys: `'item'`, `'price'`, and `'quantity'`.
2. Compute the total revenue (price * quantity) for each sale.
3. Filter out sales where the total revenue is less than a given threshold.
4. Calculate the total revenue of the filtered sales.
5. Calculate the total quantity sold for each filtered item.

In [54]:
sales_data = [
    {"item": "apple", "price": 1.0, "quantity": 10},
    {"item": "banana", "price": 0.5, "quantity": 20},
    {"item": "orange", "price": 1.2, "quantity": 5},
    {"item": "apple", "price": 1.0, "quantity": 15},
    {"item": "banana", "price": 0.5, "quantity": 5},
    {"item": "orange", "price": 1.2, "quantity": 10}
]

threshold = 10  # Minimum revenue threshold


# Expected output:
# Total revenue of filtered sales: 47.0
# Quantity sold per item:
# apple: 25
# banana: 20
# orange: 10

In [55]:
# Solution

from functools import reduce
from itertools import groupby

# Step 1: Define the sales data and threshold
sales_data = [
    {"item": "apple", "price": 1.0, "quantity": 10},
    {"item": "banana", "price": 0.5, "quantity": 20},
    {"item": "orange", "price": 1.2, "quantity": 5},
    {"item": "apple", "price": 1.0, "quantity": 15},
    {"item": "banana", "price": 0.5, "quantity": 5},
    {"item": "orange", "price": 1.2, "quantity": 10}
]

threshold = 10  # Minimum revenue threshold

# Step 2: Extract total revenue for each sale
total_revenues = map(lambda sale: {**sale, "total_revenue": sale["price"] * sale["quantity"]}, sales_data)

# Step 3: Filter sales by total revenue threshold
filtered_sales = filter(lambda sale: sale["total_revenue"] >= threshold, total_revenues)

# Materialize the filtered iterator into a list: reduce and groupby each need their own pass, and an iterator would be exhausted after the first
filtered_sales = list(filtered_sales)

# Step 4: Calculate the total revenue of the filtered sales
total_revenue = reduce(lambda acc, sale: acc + sale["total_revenue"], filtered_sales, 0)

# Step 5: Group the sales by item and calculate the total quantity sold for each item
# First, sort the filtered sales by item
filtered_sales.sort(key=lambda sale: sale["item"])
# Then, group by item and calculate total quantities
grouped_by = groupby(filtered_sales, key=lambda sale: sale["item"])
grouped_sales = {key: sum(sale["quantity"] for sale in group) for key, group in grouped_by}

# Output the results
print(f"Total revenue of filtered sales: {total_revenue}")
print("Quantity sold per item:")
for item, total_quantity in grouped_sales.items():
    print(f"{item}: {total_quantity}")


Total revenue of filtered sales: 47.0
Quantity sold per item:
apple: 25
banana: 20
orange: 10


## 6. Complex example: Log processing pipeline with regex, decorators, mixins, and generators

In this final example we combine many topics from today:

* **Regex** with raw strings and groups.
* **Decorators** for timing.
* **Mixins** to add JSON serialization.
* **Iterators / generators** to build a streaming pipeline.
* **itertools** to group and summarize.

### Scenario

We have a list of simplified log lines:

* Format: `"LEVEL YYYY-MM-DD message..."`, for example:
  `"INFO 2025-11-18 User logged in"`

We want to:

1. Use a compiled regex to parse each line into `(level, date, message)`.
2. Represent parsed lines as objects of a class `LogRecord` that:

   * Uses a `JsonSerializableMixin` for `to_json`.
3. Use a generator that:

   * Takes raw lines.
   * Yields `LogRecord` objects for lines that match.
4. Use a timing decorator on the pipeline function.
5. Use `itertools.groupby` to summarize how many records there are per level (`INFO`, `WARNING`, `ERROR`).

This is a toy example, but it reflects real patterns in log processing, ETL, and monitoring systems.

### 🧪 Complex Exercise: Implement the log processing pipeline

Tasks:

1. Implement a `JsonSerializableMixin` with a `to_json(self)` method.
2. Define a `LogRecord` class that:

   * Inherits from `JsonSerializableMixin`.
   * Has attributes: `level`, `date`, `message`.
3. Compile a regex that parses `"LEVEL YYYY-MM-DD message"`.
4. Implement a generator function `iter_log_records(lines)` that:

   * Takes an iterable of raw lines.
   * Uses the regex to parse matching lines.
   * Yields `LogRecord` instances.
5. Implement a `@timed` decorator (similar to the earlier `timing` decorator) that measures how long summarizing takes.
6. Implement a function `summarize_by_level(lines)` that:

   * Uses `iter_log_records` to get records.
   * Sorts them by `level`.
   * Uses `itertools.groupby` to count how many records are in each level.
   * Returns a dict like `{"INFO": 10, "ERROR": 3}`.

The starter code below gives you a scaffold. Fill in the `TODO` parts using concepts from earlier sections.

In [None]:
import re
import json
import time
import functools
from itertools import groupby

# TODO 1: JsonSerializableMixin

class JsonSerializableMixin:
    # Implement to_json using json.dumps on self.__dict__.
    pass


# TODO 2: LogRecord class

class LogRecord(JsonSerializableMixin):
    # __init__ should take level, date, message and store them.
    pass


# TODO 3: compiled regex for "LEVEL YYYY-MM-DD message"

# Hint: one word for level, date as 4-2-2 digits, and the rest as message.

log_pattern = re.compile(r"...")

# TODO 4: generator that yields LogRecord objects

def iter_log_records(lines):
    # For each line, try to match.
    # If it matches, create LogRecord and yield it.
    pass


# TODO 5: timed decorator

def timed(func):
    # Wrap func, measure execution time, and print it.
    pass


# TODO 6: summarize_by_level

@timed
def summarize_by_level(lines):
    # Use iter_log_records(lines) to get records.
    # Sort them by level.
    # Use groupby to count per level.
    # Return a dict level -> count.
    pass


# Example raw lines to test with

raw_lines = [
    "INFO 2025-11-18 User logged in",
    "ERROR 2025-11-18 Database connection failed",
    "WARNING 2025-11-18 Low disk space",
    "INFO 2025-11-18 User requested report",
    "DEBUG this line does not match and should be ignored",
    "ERROR 2025-11-19 Timeout while calling external API",
]

# When you are done, run:

# summary = summarize_by_level(raw_lines)
# print(summary)


In [41]:
# Solution

import re
import json
import time
import functools
from itertools import groupby


class JsonSerializableMixin:
    def to_json(self) -> str:
        return json.dumps(self.__dict__)


class LogRecord(JsonSerializableMixin):
    def __init__(self, level: str, date: str, message: str):
        self.level = level
        self.date = date
        self.message = message


log_pattern = re.compile(r"(\w+) (\d{4}-\d{2}-\d{2}) (.+)")


def iter_log_records(lines):
    for line in lines:
        m = log_pattern.match(line)
        if m:
            level, date, message = m.groups()
            yield LogRecord(level, date, message)


def timed(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        duration = (time.perf_counter() - start) * 1000
        print(f"{func.__name__} took {duration:.2f} ms")
        return result
    return wrapper


@timed
def summarize_by_level(lines):
    records = list(iter_log_records(lines))
    records.sort(key=lambda r: r.level)

    summary: dict[str, int] = {}
    for level, group in groupby(records, key=lambda r: r.level):
        count = sum(1 for _ in group)
        summary[level] = count
    return summary


raw_lines = [
    "INFO 2025-11-18 User logged in",
    "ERROR 2025-11-18 Database connection failed",
    "WARNING 2025-11-18 Low disk space",
    "INFO 2025-11-18 User requested report",
    "DEBUG this line does not match and should be ignored",
    "ERROR 2025-11-19 Timeout while calling external API",
]

summary = summarize_by_level(raw_lines)
print(summary)

print("First record as JSON:", next(iter_log_records(raw_lines)).to_json())


summarize_by_level took 0.03 ms
{'ERROR': 2, 'INFO': 2, 'WARNING': 1}
First record as JSON: {"level": "INFO", "date": "2025-11-18", "message": "User logged in"}
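The solution turns the generator into a list and sorts it because `groupby` only merges consecutive items with the same key, so every record ends up in memory at once. When only the per-level counts are needed, one possible alternative (a sketch, not the exercise's required approach) is to feed a generator straight into `collections.Counter`, which consumes it one item at a time. The helper names `iter_levels` and `summarize_by_level_streaming` are hypothetical and exist only for this illustration.

```python
import re
from collections import Counter

# Same pattern as the solution above: LEVEL YYYY-MM-DD message
log_pattern = re.compile(r"(\w+) (\d{4}-\d{2}-\d{2}) (.+)")


def iter_levels(lines):
    """Lazily yield the level of every line that matches log_pattern (hypothetical helper)."""
    for line in lines:
        m = log_pattern.match(line)
        if m:
            yield m.group(1)


def summarize_by_level_streaming(lines) -> dict[str, int]:
    # Counter pulls one level at a time from the generator:
    # no intermediate list of records and no sort are needed.
    return dict(Counter(iter_levels(lines)))


raw_lines = [
    "INFO 2025-11-18 User logged in",
    "ERROR 2025-11-18 Database connection failed",
    "WARNING 2025-11-18 Low disk space",
    "INFO 2025-11-18 User requested report",
    "DEBUG this line does not match and should be ignored",
    "ERROR 2025-11-19 Timeout while calling external API",
]

print(summarize_by_level_streaming(raw_lines))
# {'INFO': 2, 'ERROR': 2, 'WARNING': 1}
```

The counts match the `groupby` version. The only visible difference is that keys appear in first-seen order rather than sorted order. The `groupby` approach is still worth knowing when you need the grouped records themselves, not just their counts.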


## Day 1 summary

Today you used JupyterLab for advanced topics and learned:

* How to work efficiently with **Jupyter notebooks** as an advanced user:
  * A refresher on cell types, cell execution, and when to move code out of notebooks into modules.

* How to use **regular expressions** with Python's `re` module:
  * Common functions: `search`, `match`, `fullmatch`, `findall`, `finditer`, `sub`, `split`, `compile`.
  * Why **raw strings** (`r"..."`) are essential for clean regex patterns.
  * How to use **lambda/function replacements** in `re.sub` for complex transformations (e.g. date reformatting, email anonymization).

* How to write and use **decorators**:
  * `@staticmethod` and `@classmethod` for method organization and alternative constructors.
  * Custom decorators and decorator factories (like a `retry(n)` decorator).
  * Decorator chaining (stacking multiple decorators).
  * Class decorators for registration and attribute injection (e.g. plugin registries, `@mark_debug`).

* How **multiple inheritance** works in Python:
  * Method Resolution Order (MRO) and how to inspect it with `C.mro()` / `__mro__`.
  * The **diamond problem** and how Python’s C3 MRO resolves it.
  * Why **cooperative `super()`** calls are essential in multiple inheritance hierarchies.

* How to design and use **mixins**:
  * Separation between core types (“what the object is”) and mixin behaviors (“what it can also do”).
  * Practical patterns like JSON serialization, timestamps, logging, tagging.
  * **Mixin factories** for parameterized behavior (e.g. logging/tag prefixes).

* How Python's **iterator protocol** works:
  * `__iter__`, `__next__`, and `StopIteration`.
  * Implementing custom iterators like `FractionalRange` and a `Fibonacci` iterator.
  * Using built-in iterables (lists, dicts, files) through the iterator protocol.

* How to build **generator functions** and **generator expressions**:
  * Infinite and finite generators (e.g. `infinite_sequence`, `fibonacci_sequence`).
  * Generator expressions for concise, lazy pipelines.
  * Performance benefits of lazy evaluation vs. materializing large lists.

* How to leverage **functional tools**:
  * `map`, `filter`, `reduce` for transformations and aggregations.
  * `itertools` building blocks: `chain`, `count`, `cycle`, `repeat`, `groupby`.
  * Applying them to realistic data, e.g. processing sales records with `map`/`filter`/`reduce`/`groupby`.

* How to reason about **performance characteristics**:
  * Memory usage comparison of list comprehensions vs generator expressions.
  * Timing list-based vs generator-based summation with a measurement decorator.
  * Why `range` stays cheap even for huge bounds: it computes values on demand, and `in` on an integer `range` is answered arithmetically instead of by iterating.

* How to combine these ideas into a **log processing pipeline** that:
  * Uses regex to parse log lines into structured data.
  * Wraps logic with a timing decorator.
  * Models records with a mixin-based `LogRecord` (`JsonSerializableMixin`).
  * Streams data with a generator over log lines.
  * Summarizes log levels with `itertools.groupby`.
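As a compact recap of the MRO, diamond-problem, and cooperative `super()` points in the list above, here is a minimal sketch of a diamond hierarchy in which every class delegates with `super()`. The class names `Base`, `Left`, `Right`, and `Child` are purely illustrative.

```python
class Base:
    def greet(self) -> list[str]:
        # End of the cooperative chain: object has no greet(), so stop here.
        return ["Base"]


class Left(Base):
    def greet(self) -> list[str]:
        # super() means "next class in the instance's MRO", not necessarily Base.
        return ["Left"] + super().greet()


class Right(Base):
    def greet(self) -> list[str]:
        return ["Right"] + super().greet()


class Child(Left, Right):
    def greet(self) -> list[str]:
        return ["Child"] + super().greet()


# C3 linearization places Base after both Left and Right, so Base runs only once.
print([cls.__name__ for cls in Child.__mro__])
# ['Child', 'Left', 'Right', 'Base', 'object']
print(Child().greet())
# ['Child', 'Left', 'Right', 'Base']
```

Because `Left.greet` calls `super()` instead of `Base.greet` directly, the call continues to `Right` when invoked on a `Child` instance. This is exactly what lets mixins stack without skipping each other.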

In the coming days of the course, we will build on these tools for data handling, concurrency, and deep learning-related workflows.
