-----

# 🌯 Machine Learning Lab: Doner Kebab Price Predictor

Welcome to the Doner Detective Lab! In this session, you will transform into a Data Scientist to solve a tasty mystery: What actually drives the price of a Doner Kebab?

Using Ridge Regression, you will teach a computer to find the mathematical "sweet spot" between historical patterns and real-world price spikes. You will learn to move from raw data to a stable predictive formula, discovering exactly how much "brand," "location," and "timing" weigh in on your final bill.

-----

## 🏗️ Phase 1: The Prediction Equation (The Brain's Structure)

Machines don't guess based on feelings; they use **Linear Models** as their logical foundation.

### 1\. The Mathematical Expression

$$\hat{y} = w_1x_1 + w_2x_2 + b$$

### 2\. Component Breakdown & Weight Selection ($w$)

  * **$\hat{y}$ (Y-hat)**: The predicted price (in Euro).
  * **$x_1, x_2$**: Input **Features** (e.g., $x_1$ = is it Weekend?, $x_2$ = is it REWE?, each encoded as 1 = yes, 0 = no).
  * **$w_1, w_2$ (Weights)**:
      * **Definition**: The "Influence" of a feature. If $w_1$ is high, it means the "Weekend" is a major factor in price increases.
      * **How are they chosen?**: At the very start, the machine chooses **random numbers** (e.g., 0.1). It starts "clueless" and improves through the "Learning" phase.
  * **$b$ (Bias)**: The "Base Price." Even without any extra factors, a Kebab has a fundamental cost.

### 3\. Real-world Example

Assume initial weights: $w_1 = 0.5$, $w_2 = 0.5$, $b = 4.0$.
If you buy on a **Weekend ($x_1=1$)** at **REWE ($x_2=1$)**:
$$\hat{y} = 0.5(1) + 0.5(1) + 4.0 = 5.0 \text{ EUR}$$
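To check the arithmetic, here is a minimal Python sketch of the prediction equation using the example weights above. These are illustrative starting values, not learned ones, and the helper name `predict` is our own.

```python
# Illustrative weights and bias from the example above (not learned values)
w = [0.5, 0.5]   # w1 = Weekend influence, w2 = REWE influence
b = 4.0          # base price in EUR

def predict(x, w, b):
    """Linear model: y_hat = w1*x1 + w2*x2 + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

weekend_at_rewe = [1, 1]    # x1 = 1 (weekend), x2 = 1 (REWE)
weekday_elsewhere = [0, 0]  # neither factor applies

print(predict(weekend_at_rewe, w, b))    # 5.0
print(predict(weekday_elsewhere, w, b))  # 4.0 (just the base price)
```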

-----

## 📉 Phase 2: The Loss Function (Measuring "Pain")

After making a guess, we must tell the machine how "bad" it was.

### 1\. The Mathematical Expression (Mean Squared Error)

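$$Loss_{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

For a single kebab ($n = 1$), this is simply $(y - \hat{y})^2$.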

### 2\. Machine Preference: Lower Loss

  * **Why?**: $Loss$ represents "Error." The goal of Machine Learning is **Loss Minimization**.
  * **$Loss = 0$** means the prediction matches reality perfectly. The machine will work tirelessly to reach the lowest possible Loss.

### 3\. Real-world Example

True Price $y = 7.0$. Current Prediction $\hat{y} = 5.0$.
$Loss = (7.0 - 5.0)^2 = 4.0$. The machine sees **4.0** and feels "pain," prompting it to adjust its weights.
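Here is a minimal sketch of the Mean Squared Error. It is written by hand with NumPy and cross-checked against scikit-learn's `mean_squared_error`. The three-kebab prices in the last line are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def mse(y_true, y_pred):
    """Mean Squared Error: average of (y - y_hat)^2 over all examples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# The single example above: true price 7.0, prediction 5.0
print(mse([7.0], [5.0]))                 # 4.0
print(mean_squared_error([7.0], [5.0]))  # 4.0 (scikit-learn agrees)

# Three made-up kebabs: errors 2.0, -0.5, 0.0 -> (4 + 0.25 + 0) / 3
print(mse([7.0, 5.0, 4.5], [5.0, 5.5, 4.5]))  # about 1.417
```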

-----

## 🛡️ Phase 3: Regularization (Preventing "Rote Memorization")

If there is an outlier (e.g., a typo saying a Kebab costs €100), a machine might over-adjust its weights to fit that error. Fitting noise and mistakes like this, instead of the real pattern, is called **Overfitting**.

### 1\. The Mathematical Expression (Ridge)

$$Loss_{Total} = Loss_{MSE} + \lambda \sum_{j} w_j^2$$

The sum runs over every weight $w_j$; the bias $b$ is not penalized.

### 2\. Machine Preference: Lower $Loss_{Total}$

  * This is a **Balance Game**. $\lambda$ (Lambda) is the "Restraint" we put on the machine. In scikit-learn, it is called `alpha`.
  * If the machine tries to make $w$ huge to fit an outlier, $\lambda \sum w^2$ will skyrocket. To keep the **Total Loss** low, the machine keeps its weights small, so the outlier pulls on them much less. It does not ignore the outlier completely; it just cannot chase it.
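To see the brake at work, here is a sketch on a small made-up dataset with hypothetical weekend/REWE features and one €100 typo. It fits scikit-learn's `Ridge` with growing $\lambda$ (passed as `alpha`) and prints the weights together with $\sum w^2$. That sum can only shrink or stay the same as $\lambda$ grows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Made-up data: 30 kebabs with two 0/1 features (weekend, REWE)
rng = np.random.default_rng(0)
n = 30
X = rng.integers(0, 2, size=(n, 2)).astype(float)
y = 4.0 + 0.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, n)
y[0] = 100.0  # the "typo" outlier: a EUR 100 kebab

# Baseline with no brake at all (ordinary least squares)
print("no penalty   :", np.round(LinearRegression().fit(X, y).coef_, 3))

# scikit-learn's name for lambda is `alpha`
sum_w2 = []
for lam in [0.1, 1.0, 10.0, 100.0]:
    w = Ridge(alpha=lam).fit(X, y).coef_
    sum_w2.append(float(np.sum(w ** 2)))
    print(f"lambda={lam:<6}:", np.round(w, 3), f"sum of w^2 = {sum_w2[-1]:.3f}")
```

The outlier still nudges the weights at every $\lambda$. A larger $\lambda$ only limits how far they can move.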

-----

## 🔄 Phase 4: Parameter Updates & Iteration (Getting Smarter)

### 1\. The Mathematical Expression

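$$w_{new} = w_{old} - \alpha \cdot \frac{\partial Loss_{Total}}{\partial w}$$

  * **$\alpha$ (Learning Rate)**: The step size of each update. Careful: scikit-learn's `alpha` argument means $\lambda$, not the learning rate.
  * **Gradient** ($\partial Loss_{Total} / \partial w$): The slope of the loss. It points "uphill," so subtracting it moves $w$ toward lower Loss.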

### 2\. The Logic of Iteration

The machine repeats a loop: **"See Data $\to$ Calculate Error $\to$ Update Weights."**

  * **One Epoch**: The machine has looked at every row of the training data once.
  * **Convergence**: After hundreds of repetitions, the weights stop changing. The machine has found its "Best Guess."
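Here is a minimal sketch of the update loop from this phase, run on made-up, already-scaled data. It performs plain gradient descent on $Loss_{Total} = Loss_{MSE} + \lambda \sum w_j^2$. It then checks the answer against scikit-learn's `Ridge`, which solves the same problem directly. Note that scikit-learn's Ridge sums the squared errors instead of averaging them, so its `alpha` equals $n \cdot \lambda$ here.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Made-up, already-scaled features (2 columns) and prices for 50 kebabs
rng = np.random.default_rng(42)
n = 50
X = rng.normal(size=(n, 2))
y = 5.0 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0, 0.05, n)

lam = 0.1            # lambda in Loss_Total = MSE + lambda * sum(w^2)
learning_rate = 0.1  # the alpha of the update rule (NOT scikit-learn's alpha)
w = rng.normal(scale=0.1, size=2)  # start "clueless" with small random weights
b = 0.0

for epoch in range(1000):  # each pass uses every row once: one epoch
    residual = y - (X @ w + b)                             # y - y_hat for every kebab
    grad_w = -2.0 / n * (X.T @ residual) + 2.0 * lam * w   # dLoss_Total/dw
    grad_b = -2.0 / n * residual.sum()                     # dLoss_Total/db (b is not penalized)
    w = w - learning_rate * grad_w
    b = b - learning_rate * grad_b

# Same objective solved directly; scikit-learn sums squared errors, so alpha = n * lambda
reference = Ridge(alpha=n * lam).fit(X, y)
print("gradient descent:", np.round(w, 4), round(b, 4))
print("sklearn Ridge:   ", np.round(reference.coef_, 4), round(reference.intercept_, 4))
```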
  
-----

## 💻 Practical Implementation (Python Code Flow)

```python
import os
import sys

import pandas as pd
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import StandardScaler

# Make the project's own `database` module importable from the parent folder
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(project_root)
import database

# 1. Prepare Ingredients (Feature Engineering)
# Load the data and fill missing values with 0
data = database.get_prices_by_item_and_brand("Doner Kebab")
data = data.fillna(0)
# display(data)
# Convert text columns to 0s and 1s (One-Hot Encoding); price (target) and date are dropped
X = pd.get_dummies(data.drop(columns=['price', 'date']))
y = data['price']

# 2. Fairness Training (Feature Scaling)
# Math: z = (x - μ) / σ
# We shrink all numbers (e.g., 2000 m vs 7 days) to a similar scale so Lambda is fair
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 3. Automatic Lambda Search (Finding the Sweet Spot)
# cv=5 means 5-fold cross-validation: the machine takes 5 exams to find the best λ
# (scikit-learn calls λ "alpha")
model = RidgeCV(alphas=[0.1, 1.0, 10.0], cv=5)

# 4. Train Model
# This line finds the weights that minimize Loss_Total. scikit-learn's Ridge usually
# solves this directly with linear algebra instead of a gradient-descent loop,
# but it reaches the same minimum.
model.fit(X_scaled, y)

# 5. Output Results (Reading the Machine's Brain)
# With standardized features, the bias equals the average price in the data
print(f"Base Price (Bias): {model.intercept_}")
print(f"Learned Influence (Weights): {model.coef_}")

# 6. The prediction result (for the first row of the training data)
final_prediction = model.predict(X_scaled)
print(f"The machine's best guess for the first Kebab is: {final_prediction[0]:.2f} EUR")
```

```
Base Price (Bias): 5.405172413793103
Learned Influence (Weights): [-0.10540443  0.27393774  0.         -0.02822659 -0.02851044  0.
  0.06136901  0.08463575 -0.12544891 -0.09974844  0.00247581 -0.07522629
  0.04160614  0.13111615  0.10973936 -0.02833555  0.02833555]
The machine's best guess for the first Kebab is: 4.81 EUR
```




-----

# 🕵️‍♂️ Detective Challenge: The Doner Mystery

**Goal**: Use your new skills to find the "Secret Price Rules".

### Step 1: Translation (Engineering)

**Task**: Convert dates and text to numbers.

  * **Hint**: Use `pd.to_datetime()` for the date and `pd.get_dummies()` for the supermarkets.

```python
# Your Code Here:
```
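If you get stuck, here is a minimal sketch on a tiny made-up table. The `brand` column name and all values are assumptions, so check what `database.get_prices_by_item_and_brand` actually returns.

```python
import pandas as pd

# Made-up raw data; the real lab data comes from the project's `database` module
raw = pd.DataFrame({
    "date": ["2024-03-01", "2024-03-02", "2024-03-04"],  # Fri, Sat, Mon
    "brand": ["REWE", "Lidl", "REWE"],                    # hypothetical column name
    "price": [5.49, 4.99, 5.29],
})

# Text date -> real datetime, then a 0/1 weekend flag (Monday=0 ... Sunday=6)
raw["date"] = pd.to_datetime(raw["date"])
raw["is_weekend"] = (raw["date"].dt.dayofweek >= 5).astype(int)

# One-hot encode the supermarket text into 0/1 columns (brand_Lidl, brand_REWE)
features = pd.get_dummies(raw.drop(columns=["price", "date"]), columns=["brand"], dtype=int)
print(features)
```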

### Step 2: The Level Playing Field (Scaling)

**Task**: Scale your features so the machine doesn't get biased.

  * **Hint**: Separate your target `y` (Price) first, then use `StandardScaler()`.
  * **Math Connection**: Why do we do this? To make sure $\lambda$ treats every $w$ equally\!

```python
# Your Code Here:
```
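Here is a sketch of the scaling step on a made-up table. `distance_m` and `days_since_restock` are hypothetical columns that echo the "2000 m vs 7 days" example. The sketch also confirms that `StandardScaler` computes exactly $z = (x - \mu) / \sigma$ using the population standard deviation.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up table: features on very different scales, plus the target
df = pd.DataFrame({
    "distance_m": [200.0, 1500.0, 2000.0, 800.0],
    "days_since_restock": [1.0, 7.0, 3.0, 5.0],
    "price": [5.2, 4.8, 4.5, 5.0],
})

# Separate the target first: only the features get scaled
y = df["price"]
X = df.drop(columns=["price"])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# The same z = (x - mu) / sigma by hand; StandardScaler uses the population std (ddof=0)
X_manual = (X - X.mean()) / X.std(ddof=0)

print(np.round(X_scaled, 2))
print("column means:", X_scaled.mean(axis=0).round(6))  # about 0
print("column stds: ", X_scaled.std(axis=0).round(6))   # about 1
```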

### Step 3: Training the Detective (Modeling)

**Task**: Train a `RidgeCV` model.

  * **Hint**: Provide a list of `alphas` (Lambdas) for the machine to try.

```python
# Your Code Here:
```
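Here is a sketch of the training step on random made-up data. It shows how to read back which $\lambda$ cross-validation picked, via `alpha_`. The candidate list is just an example.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Made-up, already-scaled features and prices for 60 kebabs
rng = np.random.default_rng(1)
X_scaled = rng.normal(size=(60, 3))
y = 5.0 + X_scaled @ np.array([0.3, -0.1, 0.0]) + rng.normal(0, 0.1, 60)

# Candidate lambdas (scikit-learn calls them alphas); cv=5 -> 5-fold cross-validation
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
model = RidgeCV(alphas=alphas, cv=5).fit(X_scaled, y)

print("Chosen lambda:", model.alpha_)
print("Weights:", np.round(model.coef_, 3))
print("Base price:", round(model.intercept_, 3))
```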

### Step 4: Cracking the Case (Interpretation)

**Task**: Look at the Weights.

  * **Almost there!**: Find the top 3 features with the highest weights. Tip: `model.coef_` is in the same order as the columns of `X`.
  * **The Condition?**: Write one sentence: "I discovered that if **[Condition]** happens, the Kebab price goes up significantly\!"

```python
# Your Code Here:
```
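`model.coef_` is a bare array, so the trick is to pair it with the column names. Here is a sketch on made-up data. The feature names and the effects baked into `y` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Made-up standardized features with (hypothetical) names
rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(40, 4)),
                 columns=["is_weekend", "brand_REWE", "brand_Lidl", "city_center"])
y = (5.0 + 0.4 * X["is_weekend"] + 0.2 * X["city_center"]
     - 0.1 * X["brand_Lidl"] + rng.normal(0, 0.05, 40))

model = Ridge(alpha=1.0).fit(X.to_numpy(), y)

# coef_ follows the column order, so attach the names to read it
weights = pd.Series(model.coef_, index=X.columns)
print(weights.sort_values(ascending=False))
print("Top 3 price raisers:", list(weights.nlargest(3).index))
```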

### Step 5: The Moment of Truth (Final Prediction)

**Task**: Ask your trained "Detective" (the model) to guess the prices for the items in your list.

  * **Hint**: Use `model.predict(X_scaled)` to generate the list of guesses. This step is where the machine finally uses the formula $\hat{y} = wX + b$ that it spent all that time learning in Step 3!

```python
# Your Code Here:
```
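To price a kebab the model has never seen, reuse the same columns and the same fitted scaler. Here is a minimal sketch with a made-up training table; the brands and prices are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Made-up training data
train = pd.DataFrame({
    "brand": ["REWE", "Lidl", "REWE", "Edeka", "Lidl", "Edeka"],
    "is_weekend": [1, 0, 0, 1, 1, 0],
    "price": [5.6, 4.7, 5.1, 5.8, 5.0, 5.3],
})
X_train = pd.get_dummies(train.drop(columns=["price"]), dtype=int)
scaler = StandardScaler().fit(X_train)
model = Ridge(alpha=1.0).fit(scaler.transform(X_train), train["price"])

# A new kebab: build the same dummy columns, in the same order (missing ones become 0)
new = pd.DataFrame({"brand": ["Lidl"], "is_weekend": [1]})
X_new = pd.get_dummies(new, dtype=int).reindex(columns=X_train.columns, fill_value=0)

# transform (not fit_transform): reuse the mean/std learned from the training data
guess = model.predict(scaler.transform(X_new))
print(f"Predicted price: {guess[0]:.2f} EUR")
```

Calling `fit_transform` on the new row would rescale it with its own mean and standard deviation. For a single row, that turns every feature into 0.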

-----

### 🌟 Summary for Students

1.  **Weights**: Start random, improve via **Gradient Descent**.
2.  **Loss**: The lower the better\!
3.  **Iteration**: A marathon of "Guessing and Correcting."
4.  **Regularization ($\lambda$)**: The "Brake" that keeps the weights small, so outliers cannot pull the machine too far off course.

-----




# 🎓 Further Study: Exploring the AI Universe

Now that you've mastered **Ridge Regression**, it's time to see how other "AI Personalities" think. In Machine Learning, there is no "perfect" model, only the model that fits your data best!

### 1\. Summary of Alternative ML Methods

| Method | Personality | Best Case |
| :--- | :--- | :--- |
| **KNN** | "Birds of a feather flock together" | When similar items (neighbors) have similar prices. |
| **SVM** | "Drawing a clear line in the sand" | When prices follow a smooth but curved pattern. The regression version (SVR) fits a curve and tolerates small errors inside a margin. |
| **XGBoost** | "The Super-Student" | When you want high accuracy: each new tree learns from the previous trees' mistakes. |
| **Random Forest** | "The Expert Panel" | When your data is messy and has many categories (like names/locations). |

-----



### 2\. Implementation: Switching the "Brain"

The amazing thing about Python's `scikit-learn` is that switching models is like changing a lightbulb. The steps remain the same: **Define $\to$ Fit $\to$ Predict.**

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# --- KNN (K-Nearest Neighbors) ---
# Averages the prices of the 5 most similar ("neighbor") kebabs.
knn_model = KNeighborsRegressor(n_neighbors=5)
knn_model.fit(X_scaled, y)

# --- SVM (Support Vector Regression) ---
# Fits a smooth curve and ignores errors smaller than a "buffer zone" (epsilon) around it.
svm_model = SVR(kernel='rbf')
svm_model.fit(X_scaled, y)

# --- "XGBoost" (Gradient Boosting) ---
# scikit-learn's GradientBoostingRegressor stands in for the separate XGBoost library here.
# It builds one small tree, then another to fix the first one's errors, and so on.
xgb_model = GradientBoostingRegressor(n_estimators=100)
xgb_model.fit(X_scaled, y)  # Tree models can also work without scaling!

# --- Random Forest ---
# 100 decision trees, each trained on a random sample; their price guesses are averaged.
rf_model = RandomForestRegressor(n_estimators=100)
rf_model.fit(X_scaled, y)
```



-----

### 3\. Comparison Challenge: Who is the Best Detective?

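**Task**: Let's compare how each "AI Detective" guessed the price for the **very first Kebab** in our list. Keep in mind that every model already saw this kebab during training, so this is a quick sanity check, not a fair exam.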

```python
# Create a dictionary of all our trained detectives
detectives = {
    "Ridge (Linear)": model,
    "KNN (Neighbor)": knn_model,
    "SVM (Boundary)": svm_model,
    "XGBoost (Expert)": xgb_model,
    "Random Forest (Panel)": rf_model
}

print("--- Comparison for Kebab #1 ---")
print(f"Actual Price: {y.iloc[0]:.2f} EUR\n")

for name, detective in detectives.items():
    # X_scaled[0:1] keeps the 2-D shape (one row) that predict() expects
    guess = detective.predict(X_scaled[0:1])[0]
    error = abs(guess - y.iloc[0])
    print(f"{name:25} | Guess: {guess:.2f} EUR | Error: {error:.2f} EUR")
```

```
--- Comparison for Kebab #1 ---
Actual Price: 4.70 EUR

Ridge (Linear)            | Guess: 4.81 EUR | Error: 0.11 EUR
KNN (Neighbor)            | Guess: 4.85 EUR | Error: 0.15 EUR
SVM (Boundary)            | Guess: 4.78 EUR | Error: 0.08 EUR
XGBoost (Expert)          | Guess: 4.69 EUR | Error: 0.01 EUR
Random Forest (Panel)     | Guess: 4.68 EUR | Error: 0.02 EUR
```
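Because every model already saw Kebab #1 during training, a fairer exam scores each detective only on kebabs it was not trained on. Here is a sketch using 5-fold `cross_val_score` on a made-up dataset; replace `X` and `y` with your own `X_scaled` and `y`. Which model wins will depend on the data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Made-up small dataset (swap in your own X_scaled, y)
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 4))
y = 5.0 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0, 0.2, 60)

detectives = {
    "Ridge (Linear)": RidgeCV(alphas=[0.1, 1.0, 10.0]),
    "KNN (Neighbor)": KNeighborsRegressor(n_neighbors=5),
    "SVM (Boundary)": SVR(kernel="rbf"),
    "Gradient Boosting (Expert)": GradientBoostingRegressor(random_state=0),
    "Random Forest (Panel)": RandomForestRegressor(random_state=0),
}

results = {}
for name, detective in detectives.items():
    # Each of the 5 folds is scored on kebabs the model did NOT train on
    scores = cross_val_score(detective, X, y, cv=5, scoring="neg_mean_absolute_error")
    results[name] = -scores.mean()  # flip the sign back to a plain average error
    print(f"{name:27} | Avg error on unseen kebabs: {results[name]:.2f} EUR")
```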


-----

### üë®‚Äçüè´ Critical Thinking Questions for the Lab

1.  **Which model was the most accurate on Kebab #1?** (The one with the smallest Error.)

2.  **Why do they differ?**
      * **Ridge** looked for a straight-line formula.
      * **KNN** just looked for similar kebabs.
      * **XGBoost** (here, scikit-learn's `GradientBoostingRegressor`) kept adding trees to fix earlier mistakes, so it can fit the training data almost perfectly.

3.  **The "Best" Choice**: If you had a million kebab records, which one would you trust to run your business? (Hint: usually XGBoost or Random Forest! They also show the smallest errors above, but Kebab #1 was part of their training data, so a fair test needs kebabs the models have never seen.)

4.  **BUT**, when you have very little data, a simpler, regularized model such as **Ridge** (or **SVM**) is often more stable than a complex forest, because it has fewer ways to memorize noise.

-----
