# TP02: Clean Coding & Modularization

## Exercise 1: Apply PEP8 and Docstrings
**Objective**: Practice clean code formatting and documentation using PEP8 and docstrings.

**Instruction**: Refactor the code to follow PEP8 (indentation, spacing, variable naming) and add docstrings to explain each function.

In [40]:
import pandas as pd


class CSVReader:
    """
    A class to handle reading and previewing CSV files.
    """

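    def __init__(self, file_path: str, age: int):
        """
        Initialize the CSVReader with a file path and an age value.

        Args:
            file_path (str): The path to the CSV file.
            age (int): The age value stored on the reader, for example
                the AGE_THRESHOLD constant from the configuration module.
        """
        self.file_path = file_path
        self.age = age
        self.df = None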
        self.df = None

    def read(self) -> pd.DataFrame:
        """
        Load the CSV file into a pandas DataFrame.

        Returns:
            pd.DataFrame: The loaded DataFrame.
        """
        self.df = pd.read_csv(self.file_path)
        return self.df

    def get_age(self) -> int:
        """
        Return the age attribute.

        Returns:
            int: The age value.
        """
        return self.age

    def preview(self, n: int = 5) -> pd.DataFrame:
        """
        Return the top n rows of the DataFrame.

        Args:
            n (int): Number of rows to preview. Default is 5.

        Returns:
            pd.DataFrame: The first n rows of the DataFrame.
        """
        if self.df is None:
            self.read()
        return self.df.head(n)
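
One way to confirm that docstrings are picked up is to read them back at runtime. The sketch below uses a small stand-in function (`preview_rows` is hypothetical and not part of the exercise) together with the standard `inspect.getdoc` helper, which returns the docstring with its indentation cleaned up.

```python
import inspect


def preview_rows(rows: list, n: int = 5) -> list:
    """
    Return the first n items of a list.

    Args:
        rows (list): The items to preview.
        n (int): Number of items to return. Default is 5.

    Returns:
        list: The first n items.
    """
    return rows[:n]


# inspect.getdoc strips the common leading indentation and blank edges.
doc = inspect.getdoc(preview_rows)
summary_line = doc.splitlines()[0]
print(summary_line)
```

The same text is what `help(preview_rows)` displays, so a missing or outdated docstring shows up quickly.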

## Exercise 2: Add Configuration and Constants
**Objective**: Use a configuration module to separate constant values (like file paths or thresholds) from main code.

**Instruction**:
1. Create a new file named `config.py`.
2. Move constant values (like CSV path or threshold age) to that file.
3. Import and use them in your functions.

In [88]:
%%writefile config.py
# config.py

# Configuration constants
CSV_FILE_PATH = "/Users/macbookair/Documents/Data Science 5th Year/Advanced Programing for DS/Pratice/sample_data.csv"
AGE_THRESHOLD = 30

Writing config.py
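
Hard-coding an absolute path ties `config.py` to one machine. A possible alternative, sketched below, builds the path relative to a base directory and lets an environment variable override it. The variable name `SAMPLE_DATA_CSV`, the helper `resolve_csv_path`, and the `data/` folder are assumptions made for illustration; they are not part of the exercise.

```python
import os
from pathlib import Path

# Hypothetical names: SAMPLE_DATA_CSV and the data/ folder are assumptions.
DEFAULT_CSV_NAME = "sample_data.csv"
AGE_THRESHOLD = 30


def resolve_csv_path(base_dir: Path, env=None) -> Path:
    """Return the CSV path, preferring the SAMPLE_DATA_CSV override if set."""
    env = os.environ if env is None else env
    override = env.get("SAMPLE_DATA_CSV")
    if override:
        return Path(override)
    return base_dir / "data" / DEFAULT_CSV_NAME


# Resolved once at import time, like any other configuration constant.
CSV_FILE_PATH = resolve_csv_path(Path.cwd())
```

Passing the environment as an argument keeps the helper easy to test without touching the real `os.environ`.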


In [89]:
from config import CSV_FILE_PATH, AGE_THRESHOLD

In [90]:
print(f"Reading from: {CSV_FILE_PATH}")
reader = CSVReader(CSV_FILE_PATH, AGE_THRESHOLD)
df = reader.read()
print(reader.preview())

Reading from: /Users/macbookair/Documents/Data Science 5th Year/Advanced Programing for DS/Pratice/sample_data.csv
   id     name   age  height_cm  weight_kg         city  score
0   1    Alice  29.0      165.0       68.0     New York   85.0
1   2      Bob   NaN      172.0        NaN  Los Angeles   90.0
2   3  Charlie  35.0      168.0       72.0      Chicago    NaN
3   4    David   NaN        NaN       80.0      Houston   75.0
4   5      Eva  27.0      160.0       55.0     New York   88.0


In [92]:
filter_age = df[df['age'] < AGE_THRESHOLD]
print(filter_age)

   id   name   age  height_cm  weight_kg      city  score
0   1  Alice  29.0      165.0       68.0  New York   85.0
4   5    Eva  27.0      160.0       55.0  New York   88.0
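
The instruction asks for the constants to be used inside functions. A minimal sketch of that idea follows, with the threshold as a default argument so callers can still override it. The function name `filter_younger_than` is hypothetical, and the inline CSV only mirrors a few rows of the preview above.

```python
import io

import pandas as pd

AGE_THRESHOLD = 30  # same value as in the config.py cell above


def filter_younger_than(
    df: pd.DataFrame, threshold: int = AGE_THRESHOLD
) -> pd.DataFrame:
    """Return the rows whose age is strictly below the threshold.

    Rows with a missing age are excluded, because NaN < threshold is False.
    """
    return df[df["age"] < threshold]


# A few rows mirroring the preview above (id, name, and age only).
csv_text = (
    "id,name,age\n"
    "1,Alice,29\n"
    "2,Bob,\n"
    "3,Charlie,35\n"
    "5,Eva,27\n"
)
sample = pd.read_csv(io.StringIO(csv_text))
younger = filter_younger_than(sample)
print(younger["name"].tolist())
```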


## Exercise 3: Add Logging and Exception Handling
**Objective**: Implement logging and try/except blocks instead of print statements.

**Instruction**:
1. Import and configure the logging module.
2. Replace `print()` with appropriate logging levels.
3. Handle `FileNotFoundError` using a try/except block.

In [None]:
import logging

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)


class SafeCSVReader(CSVReader):
    """
    A version of CSVReader with logging and error handling.
    """

    def read(self) -> pd.DataFrame:
        """
        Load the CSV file, logging progress and any failure.

        Returns:
            pd.DataFrame: The loaded DataFrame.

        Raises:
            FileNotFoundError: If the file does not exist.
        """
        try:
            logging.info("Attempting to read file: %s", self.file_path)
            self.df = pd.read_csv(self.file_path)
            logging.info("File read successfully.")
            return self.df
        except FileNotFoundError:
            logging.error("The file %s was not found.", self.file_path)
            raise
        except Exception as e:
            logging.error("An unexpected error occurred: %s", e)
            raise

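# Test: CSVReader.__init__ requires both a file path and an age value.
try:
    safe_reader = SafeCSVReader(CSV_FILE_PATH, AGE_THRESHOLD)
    safe_reader.read()
except FileNotFoundError:
    # The error was already logged inside read(); nothing more to do here.
    pass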

2025-11-20 07:51:26,463 - INFO - Attempting to read file: /Users/macbookair/Documents/Data Science 5th Year/Advanced Programing for DS/Pratice/sample_data.csv
2025-11-20 07:51:26,466 - INFO - File read successfully.
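
A common refinement is a module-level logger obtained with `logging.getLogger(__name__)` instead of calls on the root logger, so that each module's messages can be filtered separately. The sketch below is standard-library only and uses a plain `open()` in place of `pd.read_csv`; `read_text_file` and the file name are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def read_text_file(path: str) -> str:
    """Read a text file, logging the attempt and any missing-file error."""
    logger.info("Attempting to read file: %s", path)
    try:
        with open(path, encoding="utf-8") as handle:
            content = handle.read()
    except FileNotFoundError:
        # Log at ERROR level, then re-raise so the caller decides what to do.
        logger.error("File not found: %s", path)
        raise
    logger.info("File read successfully.")
    return content


logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

# Demonstrate the error path with a file that is assumed not to exist.
try:
    read_text_file("no_such_file_for_demo.csv")
    missing_file_handled = False
except FileNotFoundError:
    missing_file_handled = True
```

Re-raising after logging keeps the failure visible to the caller, which can then choose to stop, retry, or fall back to another file.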


## Exercise 4: Modularize the Project
**Objective**: Split the code into multiple modules inside a package for a clean structure.

**Instruction**:
1. Create the following structure:
   ```
   preprocessing_package/
   ├── __init__.py
   ├── data_loader.py
   ├── data_cleaner.py
   └── config.py
   ```
2. Move each function into its proper file.
3. Import and use them in main.py (or this notebook).

In [10]:
import os
os.makedirs("preprocessing_package", exist_ok=True)

In [11]:
%%writefile preprocessing_package/__init__.py
# Init file for preprocessing_package

Overwriting preprocessing_package/__init__.py


In [12]:
%%writefile preprocessing_package/config.py
# preprocessing_package/config.py

CSV_FILE_PATH = "/Users/macbookair/Documents/Data Science 5th Year/Advanced Programing for DS/Pratice/sample_data.csv"
AGE_THRESHOLD = 20

Overwriting preprocessing_package/config.py


In [13]:
%%writefile preprocessing_package/data_loader.py
import logging

import pandas as pd


class CSVReader:
    """
    A class to handle reading CSV files.
    """

    def __init__(self, file_path: str):
        """
        Initialize the CSVReader with a file path.

        Args:
            file_path (str): The path to the CSV file.
        """
        self.file_path = file_path
        self.df = None

    def read(self) -> pd.DataFrame:
        """
        Load the CSV file into a pandas DataFrame.

        Returns:
            pd.DataFrame: The loaded DataFrame.
        """
        try:
            logging.info("Loading data from %s", self.file_path)
            self.df = pd.read_csv(self.file_path)
            return self.df
        except FileNotFoundError:
            logging.error("File not found: %s", self.file_path)
            raise

Overwriting preprocessing_package/data_loader.py


In [14]:
%%writefile preprocessing_package/data_cleaner.py
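from abc import ABC, abstractmethod

import pandas as pd


class MissingValueStrategy(ABC):
    """Interface for strategies that handle missing values."""

    @abstractmethod
    def handle(self, df: pd.DataFrame) -> pd.DataFrame:
        """Return a DataFrame with missing values handled."""


class DropMissing(MissingValueStrategy):
    """Strategy that drops every row containing a missing value."""

    def handle(self, df: pd.DataFrame) -> pd.DataFrame:
        """Return df without the rows that contain NaN."""
        return df.dropna()


class DataCleaner:
    """Clean a DataFrame using the strategy it was given."""

    def __init__(self, strategy: MissingValueStrategy):
        """Store the missing-value strategy to apply."""
        self.strategy = strategy

    def clean(self, df: pd.DataFrame) -> pd.DataFrame:
        """Apply the configured strategy to df and return the result."""
        return self.strategy.handle(df)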

Overwriting preprocessing_package/data_cleaner.py
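
Because `DataCleaner` depends only on the `MissingValueStrategy` interface, a new behavior can be added without touching the cleaner itself. The sketch below illustrates this with a hypothetical `FillWithMean` strategy that is not part of the exercise; the base class and cleaner are repeated inline only so the example runs on its own, whereas in the project they would be imported from `preprocessing_package.data_cleaner`.

```python
from abc import ABC, abstractmethod

import pandas as pd


class MissingValueStrategy(ABC):
    """Interface for missing-value handling (mirrors data_cleaner.py)."""

    @abstractmethod
    def handle(self, df: pd.DataFrame) -> pd.DataFrame:
        """Return a DataFrame with missing values handled."""


class FillWithMean(MissingValueStrategy):
    """Hypothetical strategy: fill numeric gaps with the column mean."""

    def handle(self, df: pd.DataFrame) -> pd.DataFrame:
        # fillna with a Series fills each column using the matching label.
        return df.fillna(df.mean(numeric_only=True))


class DataCleaner:
    """Apply whichever strategy it was given (mirrors data_cleaner.py)."""

    def __init__(self, strategy: MissingValueStrategy):
        self.strategy = strategy

    def clean(self, df: pd.DataFrame) -> pd.DataFrame:
        return self.strategy.handle(df)


df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "age": [29.0, None, 35.0]}
)
cleaned = DataCleaner(FillWithMean()).clean(df)
print(cleaned["age"].tolist())
```

Swapping `FillWithMean()` for `DropMissing()` changes the behavior without any change to `DataCleaner`, which is the point of the strategy pattern here.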


## Exercise 5: Package Setup & Code Quality Check
**Objective**: Convert the module into an installable Python package and check code style with flake8 or black.

**Instruction**:
1. Create a `setup.py` file.
2. Install locally using `pip install -e .`.
3. Run `black .` and `flake8 .`.

In [15]:
%%writefile setup.py
from setuptools import setup, find_packages

setup(
    name="preprocessing_package",
    version="0.1",
    packages=find_packages(),
    install_requires=["pandas"],
    description="Simple preprocessing package for data science",
    author="Your Name",
)

Overwriting setup.py


In [16]:
# Install the package in editable mode
!pip install -e .

# Run code quality checks (if tools are installed)
# !black .
# !flake8 .

Obtaining file:///Users/macbookair/Documents/Data%20Science%205th%20Year/Advanced%20Programing%20for%20DS
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: preprocessing_package
  Building editable for preprocessing_package (pyproject.toml) ... done
  Created wheel for preprocessing_package: filename=preprocessing_package-0.1-0.editable-py3-none-any.whl size=2930 sha256=35f474bba715b90b51e0b8b7e04d5843bc1b70ddc2d39fe4656129356719a8d6
  Stored in directory: /private/v