# PyDataUtils Notebook

This notebook demonstrates simple, beginner-friendly utilities for:

- Cleaning product names (strings)
- Working with Python lists
- Computing basic statistics using both **pure Python** and **NumPy**
- Using a small **text-based menu** to interact with the utilities

The underlying reusable code lives in the `src/` package so that it can be imported from other notebooks or scripts.

## 1. Setup

In this section, we:

- Adjust `sys.path` so we can import from `src/` when running the notebook directly.
- Import NumPy and pandas for basic data science work.
- Import our utility functions and menu system.

In [None]:
import os
import sys

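# Ensure the project root (one level up from this notebook) is on sys.path.
# abspath() resolves against the current working directory, so this assumes
# the kernel was started in the folder that contains the notebook.
NOTEBOOK_DIR = os.path.dirname(os.path.abspath("PyDataUtils.ipynb"))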
PROJECT_ROOT = os.path.abspath(os.path.join(NOTEBOOK_DIR, os.pardir))
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

import numpy as np
import pandas as pd

from src.utilities import (
    clean_product_name,
    clean_product_names,
    flatten_list,
    unique_preserve_order,
    python_stats,
    numpy_stats,
    compare_python_numpy_stats,
    format_stats,
    format_comparison,
)

from src.menu import run_menu

## 2. Loading Sample Data

We will use the small CSV file at `data/sample_products.csv` as an example dataset. It contains:

- A `product_id` column
- A `product_name` column with messy names (extra spaces, mixed case, special characters)
- A `price` column with simple numeric values


In [None]:
data_path = os.path.join(PROJECT_ROOT, 'data', 'sample_products.csv')
df = pd.read_csv(data_path)
df.head()

## 3. String Utilities: Cleaning Product Names

First, let's look at the `clean_product_name` function. It:

1. Strips leading/trailing whitespace.
2. Converts to lowercase.
3. Replaces any non-alphanumeric characters with a single space.
4. Collapses multiple spaces into a single space.

This kind of text normalization is a common step when working with product data or user input.
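
The four steps above can be sketched as follows. This is a minimal illustration, not the actual code in `src/utilities.py`: the name `clean_product_name_sketch` is hypothetical, the regular expressions are an assumption about how "non-alphanumeric" is defined (ASCII letters and digits only), and the final `strip()` is an extra step added here so that trailing punctuation does not leave a trailing space.

```python
import re


def clean_product_name_sketch(name: str) -> str:
    """Hypothetical re-implementation of the four cleaning steps."""
    text = name.strip()                     # 1. strip leading/trailing whitespace
    text = text.lower()                     # 2. convert to lowercase
    text = re.sub(r"[^a-z0-9]", " ", text)  # 3. non-alphanumeric character -> space
    text = re.sub(r" +", " ", text)         # 4. collapse runs of spaces
    # Assumption: also trim the space left behind by trailing punctuation.
    return text.strip()


print(clean_product_name_sketch("  Super-Deluxe Toaster!!! "))  # super deluxe toaster
print(clean_product_name_sketch("coffee-maker#1"))              # coffee maker 1
```

The real function may treat underscores or accented letters differently, so compare its output with this sketch before relying on either.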

In [None]:
example_names = [
    '  Super-Deluxe Toaster!!! ',
    'BASIC kettle (white)',
    '   Mega_MIXER 3000   ',
    'coffee-maker#1',
]

for name in example_names:
    print(f'Original: {name!r}')
    print(f'Cleaned : {clean_product_name(name)!r}')
    print('-' * 40)

### Cleaning a Column of Product Names

Now let's apply `clean_product_names` to the entire `product_name` column in our DataFrame and store the result in a new column called `clean_name`.


In [None]:
df['clean_name'] = clean_product_names(df['product_name'])
df[['product_name', 'clean_name']].head(10)

## 4. List Utilities

Next, we briefly demonstrate two simple list utilities:

- `flatten_list`: flattens a list of lists into a single list.
- `unique_preserve_order`: returns unique elements while keeping the original order.
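
For readers curious how such helpers might look, here is one possible sketch. The `_sketch` names are hypothetical and the implementations in `src/utilities.py` may differ; this version assumes exactly one level of nesting and hashable items.

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T")
H = TypeVar("H", bound=Hashable)


def flatten_list_sketch(nested: Iterable[Iterable[T]]) -> List[T]:
    """Flatten one level of nesting: [[1, 2], [3]] -> [1, 2, 3]."""
    return [item for sublist in nested for item in sublist]


def unique_preserve_order_sketch(items: Iterable[H]) -> List[H]:
    """Drop duplicates while keeping the first occurrence of each item."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


print(flatten_list_sketch([[1, 2], [3, 4, 4], [5]]))               # [1, 2, 3, 4, 4, 5]
print(unique_preserve_order_sketch(["toaster", "kettle", "toaster"]))  # ['toaster', 'kettle']
```

Using `set(items)` would also remove duplicates, but it does not guarantee the original order, which is why the dictionary approach is used here.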


In [None]:
nested = [[1, 2], [3, 4, 4], [5]]
flat = flatten_list(nested)
print('Nested:', nested)
print('Flat  :', flat)

items = ['toaster', 'kettle', 'toaster', 'mixer', 'kettle']
unique_items = unique_preserve_order(items)
print('Original:', items)
print('Unique  :', unique_items)

## 5. Statistics: Pure Python vs NumPy

The `src/utilities.py` module contains two implementations of the same basic statistics:

- `python_stats`: uses only core Python (loops and built-ins).
- `numpy_stats`: uses NumPy arrays and vectorized methods.

Both functions compute:

- Count
- Sum (total)
- Mean
- Minimum
- Maximum

They both return a `StatsResult` dataclass instance.
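
To make the shape of the result concrete, here is a rough sketch of what a pure-Python version could look like. The field names (`count`, `total`, `mean`, `minimum`, `maximum`) and the `_sketch` / `Sketch` names are assumptions made for illustration; check `src/utilities.py` for the real `StatsResult` definition.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StatsResultSketch:
    """Hypothetical stand-in for StatsResult; field names are assumed."""
    count: int
    total: float
    mean: float
    minimum: float
    maximum: float


def python_stats_sketch(values: Sequence[float]) -> StatsResultSketch:
    """Compute the five statistics with a single explicit loop."""
    if not values:
        raise ValueError("values must not be empty")
    count = 0
    total = 0.0
    minimum = maximum = values[0]
    for value in values:
        count += 1
        total += value
        if value < minimum:
            minimum = value
        if value > maximum:
            maximum = value
    return StatsResultSketch(count, total, total / count, minimum, maximum)


print(python_stats_sketch([2.0, 4.0, 6.0]))
```

Returning a dataclass rather than a tuple lets callers read `result.mean` by name instead of remembering positions.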

In [None]:
values = df['price'].tolist()
len(values), values[:5]

In [None]:
py_stats = python_stats(values)
np_stats = numpy_stats(values)

print('Pure Python stats:')
print(format_stats(py_stats))
print()
print('NumPy stats:')
print(format_stats(np_stats))

### Comparing Implementations

The helper function `compare_python_numpy_stats` computes both sets of statistics, checks if they match (within a tiny numerical tolerance), and measures execution time for each implementation.
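
The idea of "match within a tolerance, and time each implementation" can be sketched with the standard library alone. Everything below is hypothetical: the function names, the returned dictionary keys, and the tolerance of `1e-9` are illustrative assumptions, and the sketch compares two pure-Python means rather than calling NumPy.

```python
import math
import time
from typing import Dict, Sequence, Union


def loop_mean(values: Sequence[float]) -> float:
    """Mean computed with an explicit loop."""
    total = 0.0
    for value in values:
        total += value
    return total / len(values)


def builtin_mean(values: Sequence[float]) -> float:
    """Mean computed with the built-in sum()."""
    return sum(values) / len(values)


def compare_means_sketch(
    values: Sequence[float], rel_tol: float = 1e-9
) -> Dict[str, Union[bool, float]]:
    """Run both implementations, time them, and compare with a tolerance."""
    start = time.perf_counter()
    mean_a = loop_mean(values)
    loop_seconds = time.perf_counter() - start

    start = time.perf_counter()
    mean_b = builtin_mean(values)
    builtin_seconds = time.perf_counter() - start

    return {
        # Floating-point sums can differ in the last bits, so avoid ==.
        "match": math.isclose(mean_a, mean_b, rel_tol=rel_tol),
        "loop_seconds": loop_seconds,
        "builtin_seconds": builtin_seconds,
    }


print(compare_means_sketch([0.1] * 1000))
```

A single timed run like this is noisy; for more reliable numbers, repeat the measurement several times (for example with the `timeit` module) and on larger inputs.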

In [None]:
comparison = compare_python_numpy_stats(values)
print(format_comparison(comparison))

## 6. Interactive Menu

Finally, we can use a simple text-based menu from `src/menu.py` to interact with the utilities.

This is optional, but it shows:

- How to organize program flow using a `while` loop and `if`/`elif`/`else`.
- How to call functions from another module based on the user's choice.

**Note:** The menu uses `input()`, which works in a notebook, although the prompt appears in the cell's output area rather than in a terminal. You can stop the menu at any time by choosing the 'Quit' option or by interrupting the kernel.
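
As an illustration of that control flow, here is a small, self-contained menu loop. It is not the code in `src/menu.py`: the option labels, the `run_menu_sketch` name, and the `read` parameter (which lets the example run with scripted answers instead of real keyboard input) are all assumptions made for this sketch.

```python
from typing import Callable


def run_menu_sketch(read: Callable[[str], str] = input) -> int:
    """Loop until the user quits; return how many valid actions were run."""
    handled = 0
    while True:
        print("1) Lowercase a name   2) Count characters   q) Quit")
        choice = read("Choose an option: ").strip().lower()
        if choice == "1":
            print("Result:", read("Name: ").strip().lower())
            handled += 1
        elif choice == "2":
            print("Length:", len(read("Text: ")))
            handled += 1
        elif choice == "q":
            break
        else:
            print("Unknown option, please try again.")
    return handled


# Scripted answers stand in for keyboard input so the example does not block.
answers = iter(["1", "  Toaster ", "x", "q"])
print("Actions run:", run_menu_sketch(lambda prompt: next(answers)))  # Actions run: 1
```

Calling `run_menu_sketch()` with no argument falls back to the real `input()` and behaves like an interactive menu.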

In [None]:
# Uncomment the next line to run the interactive menu inside the notebook.
# Be prepared to provide text input in the notebook output area.

# run_menu()

## 7. Next Steps

Some ideas to practice and extend this project:

- Add more cleaning rules (e.g., remove stop words like 'basic', 'economy').
- Implement additional statistics (median, variance, standard deviation).
- Add small plots of the `price` distribution using Matplotlib or pandas.
- Improve the menu system with more options and error handling.
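
As a starting point for the additional statistics, the standard-library `statistics` module already provides reference values you can check your own implementations against. The `prices` list below is made up for illustration and is not taken from `sample_products.csv`.

```python
import statistics

# Made-up example values, not the notebook's dataset.
prices = [1.0, 2.0, 3.0, 4.0]

print("median  :", statistics.median(prices))     # 2.5
print("variance:", statistics.pvariance(prices))  # 1.25 (population variance)
print("std dev :", statistics.pstdev(prices))     # square root of the variance
```

Note that `pvariance` and `pstdev` treat the data as the whole population; `statistics.variance` and `statistics.stdev` compute the sample versions, which divide by `n - 1` instead of `n`.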

This project is intentionally simple, but it shows how to combine Python, NumPy, and basic project structure in a way that is clear and shareable.