# 🌯 Machine Learning Lab: Challenger Edition

### Core Goal: Evolve from "Basic Guessing" to "Memory-Based Professional Prediction"

In the basic lab, we taught the computer to look at "Store Location" to guess prices. But in the real world, prices fluctuate with time, seasons, and trends. Today, we are performing a **"Brain Upgrade"** to give your AI model the ability to observe historical patterns.

-----

## 🧐 1. Why Upgrade? (Analogy Time)

Imagine two assistants trying to guess the price of a Kebab:

  * **Basic Assistant**: He only looks at "Which supermarket was this bought from?" He assumes prices at that store never change.
  * **Challenger Assistant**: He doesn't just look at the store; he flips through his **notebook**. He thinks: "What was the average price last week?" and "Has the price been jumpy lately?"

The **Challenger Assistant** is much more accurate because he has "memory" and "observation skills."

-----

## 🛠️ 2. Advanced Implementation: Step-by-Step Improvements

### Step A: Automated Labeling and "Wearing Uniforms" (Prefixing)

In the advanced version, we let the computer automatically detect all locations and brands, giving them a **Prefix**.

> **Why add a Prefix?**
> To avoid "Identity Confusion." If a Brand is named "Central" and a Location is also named "Central," then without prefixes we would end up with two tag columns that are both called `Central`, and neither we nor the computer could tell them apart. By adding prefixes like `brand_Central` and `location_Central`, it's like putting **uniforms** on the data. The computer won't get confused, and we can easily tell if a "1" represents a brand or a location.

In [None]:
# Advanced Demo: Automatically tag categories with 0s and 1s and put on "uniforms" (prefixes)
# Assumes `df` is the DataFrame already loaded in the basic lab.
import pandas as pd

categorical_cols = ['brand_name', 'supermarket', 'location']
df_dummies = pd.get_dummies(df, columns=categorical_cols, prefix=['brand', 'supermarket', 'location'], dtype=int)
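
To see the "uniforms" at work, here is a minimal sketch on a made-up two-row table. The `toy` data is an assumption for illustration and is not the lab dataset. Both columns contain the value "Central", yet the prefixes keep the resulting tags apart.

```python
import pandas as pd

# Hypothetical toy data: "Central" appears as both a brand and a location
toy = pd.DataFrame({
    'brand_name': ['Central', 'REWE'],
    'location': ['Berlin', 'Central'],
})

toy_dummies = pd.get_dummies(
    toy,
    columns=['brand_name', 'location'],
    prefix=['brand', 'location'],
    dtype=int,  # store the tags as 0/1 instead of False/True
)

# Each tag column now says which kind of "Central" it is
print(sorted(toy_dummies.columns))
print(toy_dummies)
```

The first row gets a 1 in `brand_Central` and the second row gets a 1 in `location_Central`, so the two meanings never share a column.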

-----

### Step B: Building "Memory" Features (Feature Engineering)

This is where we create "New Features" the computer couldn't see before. This forms the ingredients for **`X_advanced`**.

In [None]:
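# 0. CRITICAL: Sort by date! Otherwise 'rolling' memory is scrambled.
# (pandas matches rows by index, so results computed on the sorted df land on the right rows of df_dummies.)
df['date'] = pd.to_datetime(df['date'])  # make sure the column holds real dates
df = df.sort_values(by='date')

# 1. Time Features: day of the year, day of the week, and month (capturing seasonality and weekly rhythm)
df_dummies['day_of_year'] = df['date'].dt.dayofyear
df_dummies['day_of_week'] = df['date'].dt.dayofweek  # Monday=0 ... Sunday=6; Step C expects this column
df_dummies['month'] = df['date'].dt.month

# 2. Historical Memory (Rolling Stats): average price and stability over the last 7 rows
# (that is 7 days only if there is one row per day)
# rolling_avg: Weekly trend / price_volatility: Is the price stable or jumpy?
# Note: each window includes the current row's own price; add .shift(1) before .rolling(...) for a stricter setup.
df_dummies['rolling_avg'] = df['price'].rolling(window=7, min_periods=1).mean()
df_dummies['price_volatility'] = df['price'].rolling(window=7, min_periods=1).std().fillna(0)

# 3. Physical Attributes: Consider the weight of the Kebab (missing weights become 0)
df_dummies['weight_grams'] = df_dummies['weight_grams'].fillna(0)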
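
To get a feel for what the rolling "memory" produces, here is a small sketch on four made-up prices. The numbers and the shorter window of 3 are assumptions chosen so the arithmetic is easy to check by hand.

```python
import pandas as pd

# Hypothetical prices: three calm days, then a sudden jump
toy_prices = pd.Series([5.0, 5.0, 5.0, 8.0])

# Same calls as in the lab, but with a window of 3 rows instead of 7
toy_avg = toy_prices.rolling(window=3, min_periods=1).mean()
toy_vol = toy_prices.rolling(window=3, min_periods=1).std().fillna(0)

print(pd.DataFrame({
    'price': toy_prices,
    'rolling_avg': toy_avg,
    'price_volatility': toy_vol,
}))
```

While the price sits at 5.0, the average stays at 5.0 and the volatility stays at 0. On the last row the window holds 5, 5, and 8, so the average moves to 6.0 and the volatility rises to about 1.73: the "notebook" has noticed the jump.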

-----

### Step C: The Magic Filter (Automated Feature Selection)

This is the smartest line in your code. We use the "uniforms" (prefixes) we created earlier as a "magnet" to pull out all the tags we need at once.

In [None]:
# Define our other numerical features
features = ['day_of_year', 'day_of_week', 'month', 'rolling_avg', 'price_volatility', 'weight_grams']

# Use the "uniform" prefixes to automatically grab all category tags
categorical_features = [col for col in df_dummies.columns if col.startswith(('brand_', 'supermarket_', 'location_'))]

# Combine! This creates the final X_advanced
X_advanced = df_dummies[features + categorical_features]

#### 🔍 How does the computer run this filter? (Step-by-Step)

Imagine the computer sees this list of columns: `['price', 'brand_REWE', 'month', 'location_Berlin', 'weight_grams']`

| Round | Column Checked (`col`) | Does it start with `brand_/supermarket_/location_`? | Result |
| :--- | :--- | :--- | :--- |
| 1 | `price` | ❌ No | Discard |
| 2 | `brand_REWE` | ✅ Yes (starts with `brand_`) | **Add to List!** |
| 3 | `month` | ❌ No | Skip here (already listed in `features`) |
| 4 | `location_Berlin` | ✅ Yes (starts with `location_`) | **Add to List!** |
| 5 | `weight_grams` | ❌ No | Skip here (already listed in `features`) |
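
The rounds in the table can be replayed in plain Python. This sketch uses the same example column list and needs no data at all; the variable names are made up for the demo.

```python
# The example column list from the walkthrough
example_columns = ['price', 'brand_REWE', 'month', 'location_Berlin', 'weight_grams']

# str.startswith accepts a tuple, so one call checks all three "uniforms"
uniforms = ('brand_', 'supermarket_', 'location_')
selected = [col for col in example_columns if col.startswith(uniforms)]

print(selected)  # ['brand_REWE', 'location_Berlin']
```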

-----

### Step D: The "Fair Scale" (Scaling to `X_scaled`)

Why is `X_advanced` not enough? Why must we convert it to `X_scaled`?

  * **`X_advanced` (Raw Ingredients)**: Contains huge numbers (Weight 500g) and tiny numbers (Month 1).
  * **Scaling (Standardization)**: If we feed this directly to the computer, it will think "bigger numbers are more important." Standardization rescales every column to a mean of 0 and a standard deviation of 1, so most values land between roughly -3 and 3 and the computer can judge them fairly.
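
The "Fair Scale" can be worked through by hand. The sketch below uses made-up weights and months and a hypothetical helper, `standardize`, which applies the same formula `StandardScaler` uses for each column: subtract the mean, then divide by the population standard deviation.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (population std)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

weights = [400, 500, 600]  # hypothetical weights in grams (big numbers)
months = [1, 2, 3]         # hypothetical month numbers (small numbers)

print([round(z, 2) for z in standardize(weights)])  # [-1.22, 0.0, 1.22]
print([round(z, 2) for z in standardize(months)])   # [-1.22, 0.0, 1.22]
```

After scaling, both columns carry the same values, so neither one looks "more important" just because its raw numbers were larger.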


In [None]:
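from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import StandardScaler

# Pass through the "Fair Scale" (The Bridge)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_advanced) # X_advanced becomes X_scaled here

# The answer column: the prices the model learns to predict
# (assumption: y is taken from df_dummies so its rows line up with X_advanced)
y = df_dummies['price']

# Feed to the brain for training
# Note: In a real app, 'cv=5' will error if you have fewer than 5 rows of data.
# You would adjust this number dynamically (e.g. cv=min(len(X_scaled), 5), which still needs at least 2 rows),
# or use a try-except block.
model = RidgeCV(alphas=[0.1, 1.0, 10.0], cv=5)
model.fit(X_scaled, y)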
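
One possible way to pick the `cv` number dynamically is sketched below. The helper name `pick_cv_folds` is hypothetical, and the rule it encodes is only the basic requirement of K-fold cross-validation: at least 2 folds, and never more folds than rows. With very little data the validation scores are unreliable even when the call succeeds.

```python
def pick_cv_folds(n_rows, max_folds=5):
    """Hypothetical helper: choose a fold count that K-fold cross-validation can accept."""
    if n_rows < 2:
        raise ValueError("Cross-validation needs at least 2 rows of data.")
    return min(max_folds, n_rows)

print(pick_cv_folds(100))  # plenty of data: 5
print(pick_cv_folds(3))    # tiny dataset: 3
```

In the lab code this would be used as `cv=pick_cv_folds(len(X_scaled))`.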

-----

### Step E: The Crystal Ball (Predicting the Future)

A trained model is useless if it only remembers the past. We need to create a **"hypothetical future"** dataframe to ask: "What if I buy a Kebab next Tuesday?"

1.  **Generate Dates**: Create rows for the next 7 days.
2.  **Carry Forward Memory**: Assume the `rolling_avg` and `price_volatility` stay the same as the last known day (for simplicity).
3.  **Scale & Predict**: Run these future rows through the same `scaler` and `model`.
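
Step 1 can be tried on its own before touching the model. This sketch assumes a fixed, made-up start date (instead of today's date) so the result is the same every time you run it.

```python
from datetime import date, timedelta

start = date(2024, 3, 5)  # hypothetical fixed start date (a Tuesday)
demo_dates = [start + timedelta(days=i) for i in range(7)]

# The same three calendar features the model was trained on
demo_features = [
    {
        'day_of_year': d.timetuple().tm_yday,
        'day_of_week': d.weekday(),  # Monday=0 ... Sunday=6
        'month': d.month,
    }
    for d in demo_dates
]

for d, feats in zip(demo_dates, demo_features):
    print(d.strftime("%A %Y-%m-%d"), feats)
```

Only these calendar values change from row to row in the future table; the "memory" columns are copied from the last known day.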

In [None]:
from datetime import datetime, timedelta
import numpy as np

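# 1. Create Future Dates (7 rows, starting with today)
today = datetime.now()
future_dates = [today + timedelta(days=i) for i in range(7)]

# 2. Build Future Data (Simplification: Copy the stats from the most recent date)
# df_dummies keeps its original row order, so sort by date before taking the last row
last_row = df_dummies.sort_values(by='date').iloc[-1]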
future_data = {
    'day_of_year': [d.timetuple().tm_yday for d in future_dates],
    'day_of_week': [d.weekday() for d in future_dates],
    'month': [d.month for d in future_dates],
    'rolling_avg': [last_row['rolling_avg']] * 7,
    'price_volatility': [last_row['price_volatility']] * 7,
    'weight_grams': [last_row['weight_grams']] * 7
}
# Carry over the last row's brand/supermarket/location tags (the forecast is for that same product and store)
for col in categorical_features:
    future_data[col] = [last_row[col]] * 7

future_df = pd.DataFrame(future_data)
# Ensure columns match training data exactly
future_df = future_df[X_advanced.columns]

# 3. Predict
future_scaled = scaler.transform(future_df)
predictions = model.predict(future_scaled)

# Find the best day
best_idx = np.argmin(predictions)
best_day = future_dates[best_idx].strftime("%A")
predicted_price = predictions[best_idx]

-----

## 🏁 3. Final Results & Confidence Score

The advanced model doesn't just give you a price; it also reports a **"Confidence"** score that drops when **Price Volatility** has been high.

In [None]:
# Confidence Formula: If prices have been jumping (high volatility), confidence goes down
avg_volatility = df_dummies['price_volatility'].mean()

# Note: '50' is a sensitivity setting. Lower it (e.g. '20') for a calmer AI, or raise it (e.g. '80') for a nervous one.
confidence = max(0, min(100, int(100 - (avg_volatility * 50))))

print("--- Challenger Report ---")
print(f"Best Day to Buy: {best_day}")
print(f"Predicted Best Price: {predicted_price:.2f} €")
print(f"Machine Confidence: {confidence}%")
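
To see how the sensitivity setting changes the score, the formula can be wrapped in a small function and fed a few made-up volatility values. The function name `confidence_score` and the sample values are assumptions for this sketch.

```python
def confidence_score(avg_volatility, sensitivity=50):
    """Map average volatility to a 0-100 score using the lab's formula."""
    return max(0, min(100, int(100 - avg_volatility * sensitivity)))

# Hypothetical volatility values, from perfectly calm to very jumpy
for vol in [0.0, 0.5, 1.0, 3.0]:
    print(vol, confidence_score(vol))

# Same volatility, different "personalities"
print(confidence_score(1.0, sensitivity=20))  # calmer AI: 80
print(confidence_score(1.0, sensitivity=80))  # nervous AI: 20
```

With the default sensitivity of 50, the four sample values give 100, 75, 50, and 0. The last one would be negative without the `max(0, ...)` guard.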

-----

## 🕵️‍♂️ Critical Thinking (Discussion)

1.  **Why Prefix?** If we have two labels both named "Central" (one is a brand and one is a location), what happens if we don't use prefixes?
2.  **Filter Logic**: In the "Step-by-Step" table, why do we discard `price`? (Hint: Can a student take an exam while looking at the answer key?)
3.  **Fairness**: If we skip Scaling, do you think the computer will listen more to `weight_grams` (500) or `month` (1)?
4.  **Memory**: How does `rolling_avg` help the computer realize a shop is "quietly raising its prices"?

-----

**Summary**:
The essence of Machine Learning is **"Feature Engineering."** Through **`X_advanced`**, we gave the computer a broader vision and memory. Through **`X_scaled`**, we ensured the learning process was fair. This is the key process of evolving a simple "calculator" into a "Professional AI"!