# Orders Cleaning

This notebook performs basic cleaning of the `Orders` dataset to prepare it for dashboarding and storytelling.

Steps include:
* Removing columns that are empty or hold a single value.
* Removing columns that carry no useful information.
* Standardizing item names.
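
The first step in the list, dropping empty or constant columns, could be sketched roughly as follows. This is an illustration on a toy frame, not code from this notebook: `drop_uninformative_columns` is a hypothetical helper and the column names are made up.

```python
import pandas as pd


def drop_uninformative_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns holding zero or one distinct value."""
    # nunique() ignores NaN by default, so an all-NaN column counts as 0
    distinct_counts = df.nunique()
    keep = distinct_counts[distinct_counts > 1].index
    return df[keep].copy()


# Toy data (hypothetical): 'Currency' is constant and 'Notes' is entirely empty
toy_df = pd.DataFrame({
    'Item Name': ['Ribs', 'Wings', 'Ribs'],
    'Currency': ['USD', 'USD', 'USD'],
    'Notes': [None, None, None],
})
cleaned_df = drop_uninformative_columns(toy_df)
print(list(cleaned_df.columns))
```

Note that a column with one distinct value plus some missing entries is also dropped under this rule, which may or may not be what is wanted for a given column.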

In [None]:
import string
import pandas as pd

from belly_rubb.config import RAW_DATA_DIR, INTERIM_DATA_DIR

In [None]:
# Load orders

orders_df = pd.read_csv(INTERIM_DATA_DIR / 'orders.csv')
orders_df.head()

In [None]:
orders_df.shape

# Standardizing Item Names

In [None]:
# Load menu data

catalog_df = pd.read_csv(RAW_DATA_DIR / 'MLW4W4RYAASNM_catalog-2025-08-26-2046.csv')
catalog_df.head()

## Normalize `Item Name` in orders

- Lowercase
- Trim
- Remove punctuation marks
- Remove double spaces
- Remove parentheses
- Normalize unicode
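
The Unicode step is the least obvious item in this list, so here is a small sketch of what it can mean in practice. The list does not say which normalization form is intended; NFKC and the accent-stripping helper `strip_accents` below are assumptions for illustration, and the sample strings are made up.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


# NFKC folds compatibility characters such as the non-breaking space (U+00A0)
folded = unicodedata.normalize('NFKC', 'Pulled\u00a0Pork')
print(folded == 'Pulled Pork')

# Accent stripping is lossy; it only helps if the two sources disagree on accents
print(strip_accents('Jalapeño Poppers'))
```

NFKC alone keeps accented letters intact, so it is the more conservative choice when item names are later shown on a dashboard.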

In [None]:
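import unicodedata


def normalize(item: str) -> str:
    """Normalize an item name following the steps listed above."""
    # Fold compatibility characters such as non-breaking spaces (NFKC is an assumed choice)
    text = unicodedata.normalize('NFKC', item).lower()
    # Remove ASCII punctuation, which also covers parentheses
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Collapse any run of whitespace to one space and trim both ends
    return ' '.join(text.split())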

In [None]:
# Copy explicitly so the assignment below cannot trigger a SettingWithCopyWarning
normalized_df = orders_df.dropna(subset=['Item Name']).copy()
normalized_df['Item Name'] = normalized_df['Item Name'].apply(normalize)

In [None]:
normalized_df['Item Name'].unique()
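
One possible next step is to compare the unique normalized order names against the catalog to see which ones still need a manual mapping. The sketch below assumes the catalog names have been normalized the same way; the two toy series are hypothetical stand-ins, since the catalog export's column names are not shown in this notebook.

```python
import pandas as pd

# Toy stand-ins (hypothetical) for normalized order names and normalized catalog names
order_names = pd.Series(['pulled pork', 'bbq ribs', 'pulled pork', 'mac n cheese'])
catalog_names = pd.Series(['pulled pork', 'bbq ribs', 'mac and cheese'])

# Names present in orders but absent from the catalog need a manual mapping
unmatched = sorted(set(order_names.unique()) - set(catalog_names.unique()))
print(unmatched)
```

A set difference only catches exact mismatches; near-duplicates such as spelling variants would still need to be reviewed by hand.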