# AM_Filler: Comprehensive Demo (Numeric, Categorical, Text)

This notebook demonstrates how `AM_Filler` handles all three major data types automatically.

### 1. Numeric Strategies:
- **Mean**: Used for *Normal Distributions* (e.g., Age, Test Scores).
- **Median**: Used for *Skewed Distributions* or data with *Outliers* (e.g., Salary, House Prices).
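
The notebook does not say exactly how AM_Filler tells a normal column from a skewed one. As an illustration only, the sketch below uses pandas' sample skewness (`Series.skew()`) with an assumed cutoff of 1.0. Columns whose absolute skewness exceeds the cutoff get the median, and all others get the mean. The function name `choose_numeric_fill` and the threshold are hypothetical and are not part of AM_Filler's API.

```python
import numpy as np
import pandas as pd


def choose_numeric_fill(series: pd.Series, skew_threshold: float = 1.0) -> float:
    """Hypothetical rule: mean for roughly symmetric data, median for skewed data.

    The skewness cutoff is an assumption, not AM_Filler's documented behavior.
    """
    observed = series.dropna()
    if abs(observed.skew()) > skew_threshold:
        return observed.median()  # robust to the long tail / outliers
    return observed.mean()


symmetric = pd.Series([28.0, 30.0, 31.0, 29.0, np.nan, 32.0])
skewed = pd.Series([40_000.0, 45_000.0, 50_000.0, 52_000.0, np.nan, 1_000_000.0])

print(choose_numeric_fill(symmetric))  # mean of the observed values
print(choose_numeric_fill(skewed))     # median of the observed values
```

In the skewed series, the single 1,000,000 value drags the mean far above every other entry. The median stays at a typical value.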

### 2. Categorical Strategy:
- **Mode**: Fills missing values with the most frequent category (e.g., City, Department).
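
In plain pandas, a mode fill can be sketched with `Series.mode()` and `Series.fillna()`. By default, `mode()` skips missing values and returns every tied value in sorted order. Taking the first entry is therefore one possible tie-breaking choice. Whether AM_Filler breaks ties the same way is an assumption.

```python
import numpy as np
import pandas as pd

city = pd.Series(["New York", "London", "Paris", np.nan, "Paris", "Tokyo", np.nan])

# mode() ignores NaN by default and may return several tied values (sorted),
# so take the first one as the fill value.
most_frequent = city.mode().iloc[0]
city_filled = city.fillna(most_frequent)

print(most_frequent)
print(city_filled.tolist())
```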

### 3. Text Strategy:
- **Context-Aware Filling**: Detects whether a column contains free text (sentences) and fills missing entries with a placeholder that fits the column's context (e.g., "Feedback" -> "No review provided").
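
This notebook does not show AM_Filler's detection logic or its placeholder table. The sketch below is a hypothetical stand-in. A string column counts as text when its entries average at least two words (an assumed threshold), and the placeholder is chosen by matching keywords in the column name. `looks_like_text`, `text_placeholder`, and the `PLACEHOLDERS` mapping are illustrative names, not AM_Filler's API.

```python
import numpy as np
import pandas as pd

# Hypothetical keyword -> placeholder table (not AM_Filler's real mapping).
PLACEHOLDERS = {
    "review": "No review provided",
    "feedback": "No review provided",
    "comment": "No comment provided",
    "description": "No description provided",
}


def looks_like_text(series: pd.Series, min_avg_words: float = 2.0) -> bool:
    """Treat a column as free text if its non-missing entries average >= min_avg_words words."""
    observed = series.dropna().astype(str)
    if observed.empty:
        return False
    return bool(observed.str.split().str.len().mean() >= min_avg_words)


def text_placeholder(column_name: str) -> str:
    """Pick a placeholder by matching keywords in the column name (case-insensitive)."""
    name = column_name.lower()
    for keyword, placeholder in PLACEHOLDERS.items():
        if keyword in name:
            return placeholder
    return "Not provided"  # generic fallback when no keyword matches


reviews = pd.Series(["Great product!", "Loved it.", np.nan, "Fast delivery.", "Will buy again."])
cities = pd.Series(["Paris", "London", np.nan])

print(looks_like_text(reviews), looks_like_text(cities))
print(reviews.fillna(text_placeholder("Review_Text")).tolist())
```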

---

In [None]:
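import pandas as pd
import numpy as np
from IPython.display import display  # built into Jupyter; imported explicitly so the cell also runs under plain IPython
from am_filler import AMFiller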

# Set seed for reproducibility
np.random.seed(42)

# ------------------------------------------------------------------
# 1. Create a Complex Dataset
# ------------------------------------------------------------------

# A. Numeric Data (Normal vs Skewed)
age = np.random.normal(30, 5, 20)           # Normal (Use Mean)
salary = np.random.exponential(50000, 20)   # Skewed (Use Median)
salary[10] = 1_000_000                      # Add a massive outlier (row 10, so it survives the NaN injection below)

# B. Categorical Data
# "Paris" will be the Mode (most frequent)
cities = ["New York", "London", "Paris", "Paris", "Paris", "Tokyo"] * 3 + ["Paris", "Paris"]

# C. Text Data
# Context: Product Reviews
reviews = [
    "Great product!", "Loved it.", "Fast delivery.", 
    "Not consistent with description.", "Will buy again."
] * 4

df = pd.DataFrame({
    "Age_Normal": age,
    "Salary_Skewed": salary,
    "City_Cat": cities[:20],
    "Review_Text": reviews[:20]
})

# ------------------------------------------------------------------
# 2. Introduce Missing Values (The "Problem")
# ------------------------------------------------------------------
print("Injecting missing values...\n")

# Add NaNs to Age (Normal)
df.loc[0:2, "Age_Normal"] = np.nan

# Add NaNs to Salary (Skewed)
df.loc[0:2, "Salary_Skewed"] = np.nan

# Add NaNs to City (Categorical)
df.loc[0:3, "City_Cat"] = np.nan

# Add NaNs to Review (Text)
df.loc[0:4, "Review_Text"] = np.nan

print("Original Data with Missing Values:")
display(df.head(10))
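
`.loc` slicing is label-based and includes the end label. As a result, `df.loc[0:2, ...]` blanks three rows (0, 1 and 2), not two. `DataFrame.isna().sum()` gives a per-column count of the injected gaps. The toy frame below demonstrates both behaviors independently of the demo data.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"a": np.arange(6, dtype=float), "b": list("uvwxyz")})

# .loc slices by label and includes the end label: 0:2 -> rows 0, 1, 2.
demo.loc[0:2, "a"] = np.nan
demo.loc[0:3, "b"] = np.nan

print(demo.isna().sum())  # missing-value count per column
```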

In [None]:
# ------------------------------------------------------------------
# 3. Run AM_Filler (The "Solution")
# ------------------------------------------------------------------
print("Applying Automatic Imputation...\n")

filler = AMFiller(verbose=True)
df_clean = filler.fit_transform(df)

# ------------------------------------------------------------------
# 4. Analyze the Results
# ------------------------------------------------------------------
print("\n--- RESULTS ANALYSIS ---")

# 1. Age (Normal) -> expected: MEAN of the observed values
filled_age = df_clean.loc[0, "Age_Normal"]
print(f"[Age_Normal] Filled with: {filled_age:.2f} (observed mean: {df['Age_Normal'].mean():.2f})")
print("  -> Comment: The data is roughly normal, so the MEAN is the expected fill value.")

# 2. Salary (Skewed) -> expected: MEDIAN of the observed values
filled_salary = df_clean.loc[0, "Salary_Skewed"]
print(f"\n[Salary_Skewed] Filled with: {filled_salary:.2f} (observed median: {df['Salary_Skewed'].median():.2f})")
print("  -> Comment: The data is skewed and has an outlier, so the MEDIAN (more robust to outliers) is expected.")

# 3. City (Categorical) -> expected: MODE ('Paris')
filled_city = df_clean.loc[0, "City_Cat"]
print(f"\n[City_Cat] Filled with: '{filled_city}' (observed mode: '{df['City_Cat'].mode().iloc[0]}')")
print("  -> Comment: 'Paris' is the most frequent city (the mode).")

# 4. Review (Text) -> expected: a context-aware placeholder sentence
filled_review = df_clean.loc[0, "Review_Text"]
print(f"\n[Review_Text] Filled with: '{filled_review}'")
print("  -> Comment: The column is recognized as text and filled with a context-aware placeholder.")

print("\nFinal Cleaned Data Preview:")
display(df_clean.head(10))
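
To cross-check the filled values independently of `AMFiller`, the sketch below computes the same reference values with plain pandas. It rebuilds the demo data, placing the outlier at row 10 so it lies outside the blanked rows 0 to 2. It then fills each column with its mean, median or mode, and fills the text column with an assumed placeholder, `"No review provided"`, taken from the example in the introduction. Comparing `baseline` against `df_clean` (for instance with `pd.testing.assert_frame_equal`) would only pass if AM_Filler uses exactly these strategies and that placeholder, which this notebook does not confirm.

```python
import numpy as np
import pandas as pd

# Rebuild the demo data with the same seed and generation order.
np.random.seed(42)
age = np.random.normal(30, 5, 20)
salary = np.random.exponential(50000, 20)
salary[10] = 1_000_000  # outlier kept outside the blanked rows
cities = ["New York", "London", "Paris", "Paris", "Paris", "Tokyo"] * 3 + ["Paris", "Paris"]
reviews = ["Great product!", "Loved it.", "Fast delivery.",
           "Not consistent with description.", "Will buy again."] * 4

df = pd.DataFrame({"Age_Normal": age, "Salary_Skewed": salary,
                   "City_Cat": cities, "Review_Text": reviews})
df.loc[0:2, ["Age_Normal", "Salary_Skewed"]] = np.nan
df.loc[0:3, "City_Cat"] = np.nan
df.loc[0:4, "Review_Text"] = np.nan

# Reference fill values computed from the observed (non-missing) entries.
expected = {
    "Age_Normal": df["Age_Normal"].mean(),          # mean for the roughly normal column
    "Salary_Skewed": df["Salary_Skewed"].median(),  # median for the skewed column
    "City_Cat": df["City_Cat"].mode().iloc[0],      # most frequent category
    "Review_Text": "No review provided",            # assumed placeholder text
}
baseline = df.fillna(value=expected)

print(expected)
print("Salary mean vs median:", df["Salary_Skewed"].mean(), expected["Salary_Skewed"])
print(baseline.head(10))
```

The salary line shows why the median is preferred for this column. The single outlier pulls the mean well above the median.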