# Type Hints & Annotations 

<p>Python is dynamically typed, but type hints (added in Python 3.5 via PEP 484, with variable annotations following in Python 3.6 via PEP 526) let you declare the expected types of variables, function arguments, and return values. The interpreter does not enforce these hints at runtime. Instead, they make code easier to read and let static type checkers such as mypy catch type errors before the code runs.</p>
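To see that hints are recorded but not enforced, here is a small sketch using the standard `typing.get_type_hints` function. The function `double` is a hypothetical example: its annotations are stored on the function object, yet passing a `str` still runs without error.

```python
from typing import get_type_hints

def double(x: int) -> int:
    # The annotation says int, but Python does not check it when the function runs
    return x * 2

print(get_type_hints(double))  # {'x': <class 'int'>, 'return': <class 'int'>}
print(double("ab"))            # abab (the str argument is accepted without complaint)
```

Only a static checker such as mypy would report the `double("ab")` call as a type error.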




### Why Use It?
-> Cleaner Code:
Improves readability and maintainability by making the code more explicit and structured.

-> Easier Debugging:
Type-related issues can be caught early, reducing runtime errors and speeding up troubleshooting.

-> Better Tooling Support:
Enables powerful features like auto-completion, intelligent refactoring, and inline documentation in editors like VSCode and PyCharm.

-> Scalability:
Explicit types become more valuable as a codebase grows: they make collaboration easier, speed up onboarding, and reduce type-related bugs.

In [18]:
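from typing import Optional

def factorial(i: int) -> Optional[int]:
    # Hints are not enforced at runtime, so reject non-int or negative input manually
    if not isinstance(i, int) or i < 0:
        return None  # allowed by the Optional[int] return type
    if i == 0:
        return 1
    prev = factorial(i - 1)
    return i * prev if prev is not None else None

print(factorial(5.01))  # runs anyway; a type checker such as mypy would flag the float argument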

None


In [25]:
# Static Type Checking with mypy
!pip install mypy







## Why Use the typing Module?
-> Static Type Checking:
Enables tools like mypy to catch type-related bugs before runtime, making development safer.

-> Improved Readability & Maintainability:
Type hints make it clear what each function expects and returns — no more guessing what a variable holds.

-> Early Bug Detection:
While not enforced at runtime, type annotations help catch issues early during development, reducing debugging time.

In [7]:
#List
from typing import List

numbers: List[int] = [1, 2, 3]
names: List[str] = ["A", "B"]

In [14]:
# Dict
from typing import Dict

person: Dict[str, int] = {"age": 20, "height": 155}

In [16]:
# Union
from typing import Union

value: Union[int, str] = 40
value = "forty-one"  # Also valid

In [12]:
# Optional
from typing import Optional

nickname: Optional[str] = None
nickname = "Rose"  # Also valid

In [13]:
# Tuple
from typing import Tuple

user_info: Tuple[str, int] = ("Rose", 40)


In [17]:
from typing import List, Optional

# Function to greet a person
def greet(name: str) -> str:
    return "Hello, " + name

# Function to calculate the average of a list of numbers
def average(numbers: List[int]) -> float:
    return sum(numbers) / len(numbers)

# Function that returns an email if provided, else a default one
def get_email(username: Optional[str]) -> str:
    if username is None or username.strip() == "":
        return "guest@exa"

## Dataclass


<p>A dataclass is a class designed primarily to hold data, with far less boilerplate than a hand-written class. You create one with the @dataclass decorator from the dataclasses module (Python 3.7+), which automatically adds special methods to your class, such as:</p>

### What @dataclass Automatically Generates
-> __init__
Automatically creates the constructor so you don’t have to write it manually.

-> __repr__
Provides a human-readable string representation of the object — helpful for debugging and logging.

-> __eq__
Enables comparison between instances using == based on field values.

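-> __hash__ (conditional)
Generated when eq=True and frozen=True (or when unsafe_hash=True is set), which allows instances to be used in sets and as dictionary keys. With the default settings (eq=True, frozen=False), __hash__ is set to None, so instances are unhashable.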

-> __lt__, __gt__, etc. (optional)
Comparison methods like less-than or greater-than are included if order=True is set.
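A minimal sketch of the optional behaviours above. The `Version` and `Point` classes are hypothetical examples: `frozen=True` blocks attribute assignment and (together with the default `eq=True`) makes instances hashable, `order=True` compares instances field by field like tuples, and a default mutable dataclass gets `__hash__ = None`.

```python
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True, order=True)
class Version:
    major: int
    minor: int

v1 = Version(1, 4)
v2 = Version(1, 10)

print(v1 < v2)                   # True: compared like the tuples (1, 4) < (1, 10)
print(sorted([v2, v1]))          # [Version(major=1, minor=4), Version(major=1, minor=10)]
print(len({v1, Version(1, 4)}))  # 1: equal frozen instances hash the same, so the set keeps one

try:
    v1.major = 2  # frozen=True forbids assignment after creation
except FrozenInstanceError as exc:
    print("Cannot modify:", exc)

@dataclass
class Point:  # default settings: eq=True, frozen=False
    x: int
    y: int

print(Point.__hash__)  # None: mutable dataclasses are unhashable by default
```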

## When to Use Dataclasses?
-> Modeling Data Containers:
Ideal for representing structured data like User, Product, or Book — without boilerplate code.

-> Need for Immutability, Equality, or Auto-Generated __init__:
Use features like frozen=True for immutability, and get __init__, __repr__, and __eq__ auto-generated for free.

-> Building POPOs (Plain Old Python Objects):
@dataclass offers a clean, lightweight, and Pythonic way to build simple classes quickly.

In [5]:
# Example of dataclass
from dataclasses import dataclass
from typing import Optional

# Define the dataclass
@dataclass
class Student:
    name: str
    age: int
    grade: float
    email: Optional[str] = None  # Optional field with default = None

# Create student objects
student1 = Student("Roseeyy", 20, 75.5)
student2 = Student("Mina", 16, 99.0, "rose09@example.com")

# Accessing fields
print("Student 1 Name:", student1.name)
print("Student 2 Email:", student2.email)

# Auto-generated __repr__
print(student1)

# Auto-generated __eq__
print("Are they the same?", student1 == student2)

Student 1 Name: Roseeyy
Student 2 Email: rose09@example.com
Student(name='Roseeyy', age=20, grade=75.5, email=None)
Are they the same? False


# Testing and Debugging
## Unittest


<p>Unit testing is the first level of software testing: the smallest testable parts of the software, such as individual functions or methods, are tested in isolation to validate that each unit performs as designed. The unittest module is Python's built-in unit testing framework, inspired by JUnit. It provides tools for writing and running tests to ensure the correctness and reliability of Python code.</p>

## Key Features:
- **Test discovery**: `python -m unittest discover` finds test files matching `test*.py` and runs every test method whose name starts with `test`.
- **Test fixtures**:
    - `setUp()` runs before each test.
    - `tearDown()` runs after each test.
- **Command-line execution**: `python -m unittest test_file.py`
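A short sketch of the `setUp()`/`tearDown()` fixtures, run inside the notebook the same way as the `TestMath` cell. The `TestShoppingCart` class and its cart list are hypothetical: `setUp()` builds a fresh list before every test, so one test's changes cannot leak into another.

```python
import unittest

class TestShoppingCart(unittest.TestCase):
    def setUp(self):
        # Runs before every test: each test gets a fresh, independent list
        self.cart = ["apple"]

    def tearDown(self):
        # Runs after every test, even when the test fails
        self.cart.clear()

    def test_add_item(self):
        self.cart.append("banana")
        self.assertEqual(len(self.cart), 2)

    def test_starts_with_one_item(self):
        # Passes regardless of test order, because setUp rebuilds the list
        self.assertEqual(self.cart, ["apple"])

suite = unittest.TestLoader().loadTestsFromTestCase(TestShoppingCart)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print("All passed:", result.wasSuccessful())
```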

In [32]:
import unittest

def add(a, b):
    return a + b

class TestMath(unittest.TestCase):

    def test_add(self):
        self.assertEqual(add(2, 3), 5)
        self.assertNotEqual(add(2, 2), 5)

if __name__ == "__main__":
    unittest.TextTestRunner().run(unittest.TestLoader().loadTestsFromTestCase(TestMath))

.
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK


## Pytest

### What it is:

<p>Pytest is a third-party testing framework for Python. It scales from small unit tests to complex functional and API tests, and its concise syntax makes tests quicker to write and easier to read.</p>


- A powerful, **modern testing tool**.
- More flexible and concise than unittest.
- You write **plain functions** with bare `assert` statements; test classes are optional.
- Requires installation: `pip install pytest`.

### Key Features:

- **Fixtures** using `@pytest.fixture` for setup/teardown logic.
- **Parameterization** with `@pytest.mark.parametrize` to test multiple inputs.
- **Detailed failure introspection** – Shows what failed and why, without needing custom messages.
- **Plugins** galore (e.g. `pytest-django`, `pytest-cov`, `pytest-mock`).
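A sketch of a fixture and a parametrized test, written to a file with the same write-then-run approach this notebook uses for pytest. The file name `test_features.py`, the `numbers` fixture, and the test data are hypothetical. Code after `yield` in a fixture runs as teardown, and `@pytest.mark.parametrize` turns one test function into one test per tuple.

```python
# Hypothetical file name; pytest collects files named test_*.py
source = '''
import pytest

@pytest.fixture
def numbers():
    # Setup: build the data each test receives
    data = [1, 2, 3]
    yield data
    # Teardown: code after yield runs once the test finishes
    data.clear()

def test_sum(numbers):
    assert sum(numbers) == 6

@pytest.mark.parametrize("a, b, expected", [(1, 2, 3), (-1, 1, 0), (0, 0, 0)])
def test_add(a, b, expected):
    assert a + b == expected
'''

with open("test_features.py", "w") as f:
    f.write(source)
```

Running `!pytest test_features.py -v` afterwards should collect four tests: `test_sum` plus three parametrized cases of `test_add`.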


In [3]:
pip install pytest


Note: you may need to restart the kernel to use updated packages.





In [1]:
# Write to a .py file
with open("test.py", "w") as f:
    f.write('''
def add(a, b):
    return a + b

def test_add():
    assert add(1, 2) == 3
    assert add(-1, 1) == 0
''')
!pytest test.py

platform win32 -- Python 3.12.2, pytest-8.4.1, pluggy-1.6.0
rootdir: C:\Users\bhagi\Documents\Scrapping
plugins: anyio-4.8.0
collected 1 item

test.py [32m.[0m[32m                                                                [100%][0m



## Mocking
Mocking is a technique used in unit testing to isolate the code being tested from its external dependencies. Mocking is **faking** real objects or functions in your tests so you don’t rely on external systems (like APIs, databases, files, etc.) and **only test your logic**.

### Why Use Mocking?

→ Isolate the unit under test

→ Avoid side effects (network calls, DB writes)

→ Control return values and behavior

→ Speed up tests

→ Simulate edge cases easily

In [4]:
from unittest.mock import Mock

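# Real function that would normally call an external weather API
def get_weather():
    return "sunny"

# Mock object standing in for get_weather: returns a fixed value, no network call
weather = Mock(return_value="rainy")
print(weather())  # Output: rainy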


rainy
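In a real test the mock usually replaces a dependency *inside* the code under test. A minimal sketch with the standard `unittest.mock.patch.object`, where `WeatherClient` and `plan_day` are hypothetical: the real method would fail without network access, but while patched it returns `"rainy"`, and the mock records how it was called.

```python
from unittest.mock import patch

class WeatherClient:
    """Hypothetical client that would normally call a remote weather API."""
    def get_weather(self, city: str) -> str:
        raise RuntimeError("network access not available in tests")

def plan_day(client: WeatherClient, city: str) -> str:
    # Logic under test: depends on the client, not on the network itself
    forecast = client.get_weather(city)
    return "Take an umbrella" if forecast == "rainy" else "Enjoy the sun"

client = WeatherClient()
with patch.object(WeatherClient, "get_weather", return_value="rainy") as fake:
    result = plan_day(client, "Kathmandu")
    fake.assert_called_once_with("Kathmandu")  # verify the interaction

print(result)  # Take an umbrella
```

Outside the `with` block the original method is restored automatically.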


## Assert Statement
The `assert` statement in Python is a debugging aid used to check if a given condition is true. It is a built-in construct that helps in verifying assumptions about the state of the program during development.

**How it works:**

If condition is:

- `True` → nothing happens, code continues
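- `False` → raises `AssertionError` (with an optional message). Assertions are skipped entirely when Python runs with the `-O` flag, so use them to catch programmer errors, not to validate user input.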

**Purpose and Use Cases:**

- **Debugging:**
    
    Assertions are primarily used for debugging. They help identify logical errors or incorrect assumptions in the code early in the development cycle.
    
- **Sanity Checks:**
    
    They act as sanity checks, ensuring that certain conditions that are expected to always be true (unless there's a bug) are indeed met.
    
- **Pre-conditions and Post-conditions:**
    
    Assertions can be used to check pre-conditions (inputs to a function) and post-conditions (outputs of a function) to ensure correct behavior.
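A sketch of pre- and post-conditions with `assert`. The `apply_discount` function is a hypothetical example; the expected messages assume Python is running without `-O`.

```python
def apply_discount(price: float, percent: float) -> float:
    # Pre-conditions: a caller passing values outside these ranges has a bug
    assert price >= 0, "price must be non-negative"
    assert 0 <= percent <= 100, "percent must be between 0 and 100"

    result = price * (1 - percent / 100)

    # Post-condition: a discount can never raise the price or make it negative
    assert 0 <= result <= price, "discounted price out of range"
    return result

print(apply_discount(200.0, 25))  # 150.0

try:
    apply_discount(200.0, 150)
except AssertionError as exc:
    print("AssertionError:", exc)  # AssertionError: percent must be between 0 and 100
```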

## **Logging (Built-in `logging` module)**

**Purpose:**

Instead of printing stuff with `print()`, use `logging` to track events, errors, and statuses in a clean and professional way.

### **Why Logging > Print**

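- `print()` always writes to standard output, with no severity levels, no timestamps, and no way to silence it without editing the code.
- `logging` adds severity levels (DEBUG, INFO, WARNING, ERROR, CRITICAL), configurable formats, and multiple destinations such as the console or a file, all controlled through configuration.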

### Common Use Cases:

- Track app flow
- Debug unexpected behavior in production
- Record errors without crashing

In [6]:
import logging

logging.basicConfig(level=logging.INFO)

x = 42
logging.info(f"The value of x is {x}")


INFO:root:The value of x is 42
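Beyond `basicConfig`, a common pattern is a named logger with its own handler, level, and format. A sketch using only the standard `logging` API; the logger name `"app"` and the messages are hypothetical, and an in-memory `StringIO` stream stands in for a file (a `logging.FileHandler("app.log")` would be attached the same way).

```python
import io
import logging

logger = logging.getLogger("app")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False           # avoid duplicate output through the root logger

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setLevel(logging.WARNING)  # this handler only emits WARNING and above
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
logger.addHandler(handler)

logger.debug("connecting to database")  # filtered out by the handler's level
logger.warning("disk usage is high")
try:
    1 / 0
except ZeroDivisionError:
    logger.exception("calculation failed")  # ERROR level, traceback appended

print(stream.getvalue())
```

The program keeps running after the error: the exception is recorded with its traceback instead of crashing the app.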


## **Debugger (`pdb` — Python Debugger)**

**Purpose:**

Pause a running program and step through your code *line by line*, inspecting variables as you go, instead of scattering `print()` calls.

## Logging vs Debugging

| Feature | Logging | Debugging (`pdb`) |
| --- | --- | --- |
| When to use | During normal execution / prod | During development / bug fixing |
| Output | Terminal/file logs | Interactive terminal session |
| Performance | Minimal impact if managed well | Stops execution temporarily |
| Use case | Monitoring | Inspecting live code |
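The most common debugger commands are `n` (run the next line), `s` (step into a call), `p name` (print a value), `c` (continue), and `q` (quit). Since Python 3.7, the built-in `breakpoint()` also opens pdb by default. To show what those commands do without an interactive prompt, this sketch feeds scripted commands to `pdb.Pdb` through its `stdin`/`stdout` parameters; `add_numbers` is a hypothetical function.

```python
import io
import pdb

def add_numbers(a: int, b: int) -> int:
    total = a + b
    return total

# Commands you would normally type at the (Pdb) prompt, one per line:
#   n        -> execute the current line ("next")
#   p total  -> print the value of a variable
#   c        -> continue until the function finishes
commands = io.StringIO("n\np total\nc\n")
transcript = io.StringIO()

debugger = pdb.Pdb(stdin=commands, stdout=transcript, readrc=False, nosigint=True)
result = debugger.runcall(add_numbers, 10, 5)  # pauses at the first line of add_numbers

print(transcript.getvalue())  # the session: location lines, prompts, and the printed value 15
print("Result:", result)
```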

In [None]:
import pdb

a = 10
b = 5

pdb.set_trace()  # ← This line pauses execution. You can inspect vars.

c = a + b
print(c)


--Return--
None
> c:\users\bhagi\appdata\local\temp\ipykernel_5336\1108430576.py(6)<module>()


In [5]:
x = 5
y = 2 + 3

assert x == y, "x and y are not equal"  # No error, both are 5

# assert x == 10, "x is not 10"  ← this would crash with AssertionError


# Regular Expressions


<p>Regular expressions (regex) are powerful tools for pattern matching and text manipulation. They provide a concise way to search, match, and replace text based on patterns.</p>

### Key Concepts

- **Pattern**: A sequence of characters that defines a search pattern
- **Match**: When text conforms to a pattern
- **Metacharacters**: Special characters with special meanings (`.`, `*`, `+`, `?`, etc.)
- **Literal characters**: Characters that match themselves

Example: a regular expression for an email address:

`^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$`
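A sketch that applies this email pattern with `re.compile` and `match`. The candidate addresses are made up for illustration. The three groups split a valid address into user, domain, and top-level domain, and the `{2,5}` limit rejects a one-letter ending.

```python
import re

# The email pattern above; ^ and $ force the whole string to match
EMAIL_RE = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")

candidates = ["rose09@example.com", "user.name@mail.co", "no-at-sign.com", "bad@domain.c"]
results = {}
for candidate in candidates:
    m = EMAIL_RE.match(candidate)
    # On success, groups() is (user, domain, top-level domain)
    results[candidate] = m.groups() if m else None
    print(candidate, "->", results[candidate])
```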

## Pattern Matching, search, replace

## **Pattern Matching:**

<p>Pattern matching involves identifying specific sequences of characters or data structures that conform to a defined pattern. This pattern can be a simple literal string or a complex regular expression that describes a set of possible matches. Many programming languages and tools offer built-in functionalities for pattern matching, allowing for efficient identification of desired elements within larger bodies of text or data.</p>

### Use Cases:

- **Input validation**
    - Validate email addresses, phone numbers, ZIP codes, etc.
    - `r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"`
- **Data extraction**
    - Extract dates, hashtags, mentions, or URLs from a text.
    - Get all `@usernames` from a tweet.
- **Log parsing**
    - Match error messages, timestamps, or IP addresses in server logs.
    - Extract `[ERROR]` entries from thousands of log lines.
- **Tokenization in NLP**
    - Split sentences into tokens (words/punctuation) using regex patterns.
- **Syntax checking**
    - Quickly detect if strings follow a defined syntax (e.g., file paths or identifiers)

In [11]:
#Example
import re
phone = "9841-234-567"
pattern = r"^\d{4}-\d{3}-\d{3}$"

if re.match(pattern, phone):
    print("Valid")
else:
    print("Invalid")

Valid
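Two of the use cases listed above, data extraction and log parsing, sketched with `re.findall` and `re.search`. The tweet and log lines are invented sample data.

```python
import re

tweet = "Thanks @rose_k and @mina99 for the review! #python"
usernames = re.findall(r"@(\w+)", tweet)  # the group returns the name without "@"
hashtags = re.findall(r"#(\w+)", tweet)
print(usernames)  # ['rose_k', 'mina99']
print(hashtags)   # ['python']

log_lines = [
    "2025-08-07 10:01:02 [INFO] server started",
    "2025-08-07 10:02:13 [ERROR] database timeout",
    "2025-08-07 10:03:44 [ERROR] disk full",
]
# Square brackets are metacharacters, so they are escaped to match literally
errors = [line for line in log_lines if re.search(r"\[ERROR\]", line)]
print(errors)
```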


## **Search:**

Searching, in this context, refers to the process of locating occurrences of a specified pattern within a larger text or data set. This can involve finding the first instance, all instances, or instances that meet certain criteria (e.g., case-sensitive or case-insensitive). The result of a search operation typically indicates whether a match was found and, if so, its location(s) within the searched content.

### Use Cases:

- **Keyword search**
    - Find if a word like "urgent" or "error" exists in an email or file.
- **Log analysis**
    - Search logs for specific codes, like `404` or `503`.
- **File scanning**
    - Search for occurrences of specific API keys or secrets in files.
- **Web scraping**
    - Search for specific tags or class names in HTML content.
- **Case-insensitive matching**
    - Detect terms like "admin" regardless of uppercase/lowercase: `re.search("admin", text, re.IGNORECASE)`
- **Search with conditions**
    - E.g., Find all words that start with a capital letter followed by digits.

In [12]:
#Example
text = "Your OTP is 456789. Do not share it."
pattern = r"\d{6}"

match = re.search(pattern, text)
if match:
    print("OTP found:", match.group())   #456789
else:
    print("No OTP found.")


OTP found: 456789
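The match object also reports *where* the pattern was found. A sketch combining case-insensitive search with positions, using `re.search` for the first hit and `re.finditer` for every hit; the sample text is made up.

```python
import re

text = "Login attempt by ADMIN failed; admin account locked."

# re.search stops at the first match; start()/end() give its position
first = re.search(r"admin", text, re.IGNORECASE)
if first:
    print(first.group(), first.start(), first.end())  # ADMIN 17 22

# re.finditer yields a match object for every occurrence, left to right
spans = [(m.group(), m.span()) for m in re.finditer(r"admin", text, re.IGNORECASE)]
print(spans)  # [('ADMIN', (17, 22)), ('admin', (31, 36))]
```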


## re.findall(pattern, string):
<p>This function finds all non-overlapping occurrences of the pattern in the string and returns them as a list of strings. If the pattern contains exactly one capturing group, the list holds the text matched by that group; if it contains two or more groups, it returns a list of tuples, where each tuple contains the strings matched by the groups.
</p>

In [15]:
text = "apple banana apple orange"
matches = re.findall(r"apple", text)
print(matches) # Output: ['apple', 'apple']

text_with_numbers = "Price: $30, Quantity: 5, Discount: $5"
numbers = re.findall(r"\d+", text_with_numbers)
print(numbers) # Output: ['30', '5', '5']

['apple', 'apple']
['30', '5', '5']
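How capturing groups change what `re.findall` returns, sketched on a made-up string: two groups give a list of tuples, one group gives only that group's text.

```python
import re

text = "Rose: 20, Mina: 16"

pairs = re.findall(r"(\w+): (\d+)", text)  # two groups -> list of tuples
ages = re.findall(r"\w+: (\d+)", text)     # one group  -> list of strings

print(pairs)  # [('Rose', '20'), ('Mina', '16')]
print(ages)   # ['20', '16']
```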


## Use Case:
<p>Use re.search() when you need to find the first occurrence of a pattern and potentially extract specific parts of that match using groups.</p><p> Use re.findall() when you need to retrieve all instances of a pattern within a string.</p>

## **Replace:**

Replacement involves substituting all or specific occurrences of a matched pattern with a new string or value. This operation is often performed in conjunction with searching, where identified patterns are then modified. Regular expressions enhance replacement capabilities by allowing for dynamic replacements using captured groups, where parts of the original matched pattern can be reused or reordered in the replacement string.

### Use Cases:

- **Censoring sensitive content**
    - Replace curse words or personal data with placeholders.
    - Replace `\d{16}` (credit card numbers) with `XXXX-XXXX-XXXX-XXXX`
- **Format conversion**
    - Convert date formats: from `DD-MM-YYYY` → `YYYY/MM/DD`
- **HTML/Markdown cleanup**
    - Replace `<br>` tags with `\n`, or `**bold**` with `<strong>bold</strong>`
- **Refactoring code/text**
    - Replace deprecated function names or variables in multiple files.
- **Dynamic template generation**
    - Use regex to swap out `{{placeholders}}` with actual values.
- **Reordering strings using groups**
    - Swap "Last, First" to "First Last":
        
        `r"(\w+), (\w+)" → r"\2 \1"`

In [17]:
# re.sub(pattern, replacement, string)
text = "Today is 07-08-2025"
new_text = re.sub(r"(\d{2})-(\d{2})-(\d{4})", r"\3/\2/\1", text)
print(new_text)  # Today is 2025/08/07

Today is 2025/08/07
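Three more of the replacement use cases listed above, sketched with `re.sub`: filling `{{placeholders}}` with a function as the replacement (it receives each match object and returns the text to insert), masking a 16-digit number, and swapping "Last, First" with group references. The template, values, and card number are invented sample data.

```python
import re

template = "Hello {{name}}, your order #{{order_id}} has shipped."
values = {"name": "Rose", "order_id": "1042"}

# Unknown placeholders are left unchanged by falling back to the whole match
filled = re.sub(r"\{\{(\w+)\}\}", lambda m: values.get(m.group(1), m.group(0)), template)
print(filled)  # Hello Rose, your order #1042 has shipped.

masked = re.sub(r"\d{16}", "XXXX-XXXX-XXXX-XXXX", "Card: 1234567812345678")
print(masked)  # Card: XXXX-XXXX-XXXX-XXXX

swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Khatiwada, Rose")
print(swapped)  # Rose Khatiwada
```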


## Groups and capturing

## Groups let you:

- Extract specific parts of a match.
- Apply quantifiers to a **portion** of a pattern.
- Capture sub-patterns for later use.

Capturing groups are parts of a regular expression enclosed in parentheses `()`. They allow you to extract specific **substrings** from the matched text.

### **Purpose of Groups and Capturing:**

- To extract portions of matched patterns.
- To reuse matched content in substitutions.
- To apply quantifiers to part of a pattern.
- To make code readable and structured

In [16]:
pattern = r"(ab)+"
text = "ababab"
match = re.match(pattern, text)
print(match.group())  # Output: ababab


ababab


In [18]:
pattern = r"(\d{3})-(\d{3})-(\d{4})( ext(\d+))?"
text1 = "123-456-7890"
text2 = "123-456-7890 ext1234"

match1 = re.match(pattern, text1)
match2 = re.match(pattern, text2)

print(match1.groups())  # ('123', '456', '7890', None, None)
print(match2.groups())  # ('123', '456', '7890', ' ext1234', '1234')


('123', '456', '7890', None, None)
('123', '456', '7890', ' ext1234', '1234')



### Capturing Groups

These are **numbered automatically** from **left to right**, starting from 1.

Group 0 is always the **entire match**.

In [19]:
#Example for capturing group
text = "Name: Rose, Age: 20"
pattern = r"Name: (\w+), Age: (\d+)"

match = re.search(pattern, text)

if match:
    print(match.group(1))  # Rose
    print(match.group(2))  # 20


Rose
20


In [20]:
# Example: optional group (the extension may be missing)
text1 = "Call me at 9841-123-456"
text2 = "Call me at 9841-123-456 ext 789"

pattern = r"(\d{4}-\d{3}-\d{3})(?: ext (\d+))?"

for text in [text1, text2]:
    match = re.search(pattern, text)
    print(match.groups())


('9841-123-456', None)
('9841-123-456', '789')


In [23]:
text = "key:value:Rose"
pattern = r"(\w+):(\w+):(\w+)"  # Three capturing groups
match = re.search(pattern, text)

if match:
    print(match.group(1))  # key
    print(match.group(2))  # value
    print(match.group(3)) # Rose

key
value
Rose


Take the string `"first-last"`. The pattern `(\w+)-(\w+)` captures:

- Group 1: `"first"`
- Group 2: `"last"`

The replacement `\2 \1` puts group 2 (`"last"`) first, adds a space, then inserts group 1 (`"first"`). As a result, `re.sub()` swaps the words and returns `"last first"`.

In [24]:
text = "first-last"
pattern = r"(\w+)-(\w+)"
replacement = r"\2 \1"  # Swap first and last

new = re.sub(pattern, replacement, text)
print(new)  # Output: last first

last first


## Non-capturing group
A non-capturing group allows for grouping parts of a regular expression pattern without storing the matched text in a separate group within the match object. This is useful when you need to apply quantifiers or alternation to a group of characters or expressions but do not need to extract that specific part of the match.

In [27]:
# non capturing group
pattern = r"(?:Mr|Ms)\. (\w+)"
text = "Ms. Rose"
match = re.search(pattern, text)

print(match.group(1))  # Rose

Rose
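The difference is easiest to see with `re.findall`, which returns whatever the groups capture. In this sketch on a made-up string, the capturing version returns the title as well, while `(?:...)` groups the alternatives without storing them.

```python
import re

text = "Mr. Smith and Ms. Rose"

with_capture = re.findall(r"(Mr|Ms)\. (\w+)", text)   # both groups captured
no_capture = re.findall(r"(?:Mr|Ms)\. (\w+)", text)   # only the name is captured

print(with_capture)  # [('Mr', 'Smith'), ('Ms', 'Rose')]
print(no_capture)    # ['Smith', 'Rose']
```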


## Named Group
A named group, written `(?P<name>...)`, is a capturing group that also has a name. The group's matching result can later be retrieved by this name instead of by its index in the pattern.


In [29]:
# named groups
pattern = r"(?P<first>\w+)-(?P<last>\w+)"
text = "Rose-Khatiwada"
match = re.search(pattern, text)

print(match.group("first"))  # Rose
print(match.group("last"))   # Khatiwada


Rose
Khatiwada
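Names can also be used elsewhere: `groupdict()` returns every named group at once, `\g<name>` refers to a group in an `re.sub` replacement, and `(?P=name)` matches the same text again inside the pattern. A sketch reusing the pattern above, plus an invented sentence for the doubled-word check:

```python
import re

pattern = r"(?P<first>\w+)-(?P<last>\w+)"
m = re.search(pattern, "Rose-Khatiwada")
print(m.groupdict())  # {'first': 'Rose', 'last': 'Khatiwada'}

# \g<name> in the replacement string refers to a named group
reordered = re.sub(pattern, r"\g<last>, \g<first>", "Rose-Khatiwada")
print(reordered)  # Khatiwada, Rose

# (?P=word) must repeat exactly what the "word" group matched
doubled = re.search(r"\b(?P<word>\w+) (?P=word)\b", "this is is a test")
print(doubled.group())  # is is
```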
