<a href="https://colab.research.google.com/github/GerardoMunoz/ML_2025/blob/main/Perceptron_3_inputs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Analyzing Order Completion Time in a Fast-Food Restaurant**

## **Introduction**
In a fast-food restaurant, a customer records the time it takes to receive their order each time they visit. After visiting **10 times**, they plot the number of hamburgers, fries, and drinks ordered against the wait time. The goal is to understand the relationship between the items ordered and the time required for order completion.

## **Collected Data**
Below is the recorded data from the customer's visits:

| **Number of Hamburgers Ordered ($x$)** | **Time to Get Called ($y$, in minutes)** |
|------------------|------------------|
| $x_1$=1  | $y_{r1}$=4.8  |
| $x_2$=2  | $y_{r2}$=7.2  |
| $x_3$=3  | $y_{r3}$=9.0  |
| $x_4$=2  | $y_{r4}$=6.7  |
| $x_5$=4  | $y_{r5}$=11.2 |
| $x_6$=5  | $y_{r6}$=13.3 |
| $x_7$=3  | $y_{r7}$=9.5  |
| $x_8$=6  | $y_{r8}$=15.1 |
| $x_9$=5  | $y_{r9}$=12.6 |
| $x_{10}$=7  | $y_{r10}$=17.5 |


## **Collected Data with Three Inputs**
The table above lists only the hamburgers. The following table records all three items ordered on each visit (hamburgers, fries, and drinks) along with the corresponding wait time.

| **Hamburgers ($x_h$)** | **Fries ($x_f$)** | **Drinks ($x_d$)** | **Time ($y_r$, minutes)** |
|------------------|------------------|------------------|------------------|
| 1  | 1  | 1  | 9.3  |
| 2  | 0  | 2  | 8.7  |
| 3  | 2  | 1  | 15.5  |
| 2  | 1  | 1  | 10.8  |
| 4  | 3  | 2  | 22.2  |
| 5  | 2  | 1  | 19.6  |
| 3  | 2  | 3  | 18.9  |
| 6  | 4  | 2  | 29.5  |
| 5  | 3  | 1  | 22.8  |
| 7  | 5  | 3  | 35.4  |

The subscript $r$ in $y_r$ denotes the real (measured) data, while $y_p$ denotes the predicted data.




We will approximate the data with the function

$y_p= m_h x_h + m_f x_f + m_d x_d + b$

so we need to update the vector notation.

## **Expressing in Vector Form**
Now we can write our equations in vector notation, which makes the computations more efficient. We use affine vectors and matrices: each vector carries an extra trailing 1, so the intercept $b$ becomes part of the matrix product.

$\mathbf{x} = \begin{bmatrix} x_h & x_f & x_d & 1 \end{bmatrix}$

$\mathbf{y_r} = \begin{bmatrix} y_r & 1 \end{bmatrix}$

$\mathbf{y_p} = \begin{bmatrix} y_p & 1 \end{bmatrix}$

$\mathbf{W} = \begin{bmatrix} m_h & 0 \\ m_f & 0 \\ m_d & 0 \\ b & 1 \end{bmatrix}$

So the model equation is:

$\mathbf{y_p} = \mathbf{x} \mathbf{W}$

$\begin{bmatrix} y_p & 1 \end{bmatrix} =
\begin{bmatrix} x_h & x_f & x_d & 1 \end{bmatrix}
\begin{bmatrix} m_h & 0 \\ m_f & 0 \\ m_d & 0 \\ b & 1 \end{bmatrix}$

$\begin{bmatrix} y_p & 1 \end{bmatrix} =
\begin{bmatrix}m_h x_h + m_f x_f + m_d x_d + b  & 1 \end{bmatrix} $

$y_p= m_h x_h + m_f x_f + m_d x_d + b$,  $\ \ \ \ 1=1$
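The affine product above can be checked numerically for a single order. The following is a minimal sketch; the coefficient values and the names `W_demo` and `x_demo` are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration
m_h_demo, m_f_demo, m_d_demo, b_demo = 2.0, 3.0, 1.0, 3.0

# Affine weight matrix: the second column carries the trailing 1 through
W_demo = np.array([
    [m_h_demo, 0.0],
    [m_f_demo, 0.0],
    [m_d_demo, 0.0],
    [b_demo,   1.0],
])

# One order: 1 hamburger, 1 fries, 1 drink, plus the trailing 1
x_demo = np.array([1.0, 1.0, 1.0, 1.0])

y_p_demo = x_demo @ W_demo
print(y_p_demo)  # first entry is the prediction, second entry stays 1
```

The first entry equals $m_h x_h + m_f x_f + m_d x_d + b$ and the second entry reproduces the identity $1=1$.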



The point and error matrices are:

$\mathbf{X}= \begin{bmatrix} x_{h1} & x_{f1} & x_{d1} & 1 \\  x_{h2} & x_{f2} & x_{d2}  & 1 \\ \vdots & \vdots & \vdots & \vdots \\  x_{hn} & x_{fn} & x_{dn}  & 1 \end{bmatrix}$

$\mathbf{Y_r}= \begin{bmatrix} y_{r1} & 1 \\ y_{r2} & 1 \\ \vdots & \vdots \\ y_{rn} & 1 \end{bmatrix}$


$\mathbf{Y_p} = \mathbf{X} \mathbf{W}$


$\mathbf{E}= \mathbf{Y_r}-\mathbf{Y_p}=
\begin{bmatrix} y_{r1} - y_{p1} & 1-1 \\ y_{r2}  - y_{p2} & 1-1  \\ \vdots & \vdots \\ y_{rn}  - y_{pn} & 1-1  \end{bmatrix}=\begin{bmatrix} e_1 & 0 \\ e_2 & 0 \\ \vdots & \vdots \\ e_n & 0 \end{bmatrix}$

Now we are going to see how to update $\mathbf{W}$:

$\mathbf{W} += \eta \mathbf{X}^T \mathbf{E}$



$ \begin{bmatrix} m_h & 0 \\ m_f & 0 \\ m_d & 0 \\ b & 1 \end{bmatrix}+= \eta
\begin{bmatrix}
x_{h1} &  x_{h2}  & \cdots & x_{hn} \\
x_{f1} &  x_{f2}  & \cdots & x_{fn} \\
x_{d1} &  x_{d2}  & \cdots & x_{dn} \\  
1 & 1 &\cdots & 1 \end{bmatrix}
\begin{bmatrix} e_1 & 0 \\ e_2 & 0 \\
\vdots & \vdots \\ e_n & 0 \end{bmatrix}$

$\begin{bmatrix} m_h & 0 \\ m_f & 0 \\ m_d & 0 \\ b & 1 \end{bmatrix} +=\eta\begin{bmatrix} e_1 x_{h1} + e_2 x_{h2} + \cdots + e_n x_{hn} & 0 \\ e_1 x_{f1} + e_2 x_{f2} + \cdots + e_n x_{fn} & 0 \\ e_1 x_{d1} + e_2 x_{d2} + \cdots + e_n x_{dn} & 0 \\ e_1 + e_2 + \cdots
 + e_n & 0 \end{bmatrix} $
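The update rule above can be tried on a tiny example. This is a minimal sketch, not the notebook's training loop: the helper `update_step` and the `*_demo` names are hypothetical, and the two rows mirror the first two rows of the `X` and `Y_r` arrays used in the code cells.

```python
import numpy as np

def update_step(W, X, Y_r, eta):
    """Return W + eta * X^T (Y_r - X W), i.e. one application of the update rule."""
    E = Y_r - X @ W
    return W + eta * X.T @ E

# Two affine input rows and their affine targets (illustrative subset)
X_demo = np.array([[1.0, 1.0, 1.0, 1.0],
                   [2.0, 0.0, 2.0, 1.0]])
Y_demo = np.array([[9.3, 1.0],
                   [8.7, 1.0]])

# Starting weights in affine form
W_demo = np.array([[1.5, 0.0],
                   [1.8, 0.0],
                   [1.2, 0.0],
                   [-1.0, 1.0]])

W_next = update_step(W_demo, X_demo, Y_demo, eta=0.01)

# Sum of squared errors before and after the step (first column only)
sse_before = np.sum((Y_demo - X_demo @ W_demo)[:, 0] ** 2)
sse_after = np.sum((Y_demo - X_demo @ W_next)[:, 0] ** 2)
print(sse_before, sse_after)
print(W_next[:, 1])  # the affine column is left untouched
```

Because the second column of $\mathbf{E}$ is zero, the second column of $\mathbf{W}$ never changes, so the update preserves the affine structure.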


In [20]:
import numpy as np

# Original inputs with an added 1 at the tail
X = np.array([
    [1, 1, 1, 1],
    [2, 0, 2, 1],
    [3, 2, 1, 1],
    [2, 1, 1, 1],
    [4, 3, 2, 1],
    [5, 2, 1, 1],
    [3, 2, 3, 1],
    [6, 4, 2, 1],
    [5, 3, 1, 1],
    [7, 5, 3, 1]
])

# Output converted to a column vector with 1 at the end of each row
Y_r = np.array([
    [9.3, 1],
    [8.7, 1],
    [15.5, 1],
    [10.8, 1],
    [22.2, 1],
    [19.6, 1],
    [18.9, 1],
    [29.5, 1],
    [22.8, 1],
    [35.4, 1]
])



In [21]:
# Noise-free function used to create the data: y = 2*x_h + 3*x_f + 1*x_d + 3
Y_ = 2*X[:, 0] + 3*X[:, 1] + 1*X[:, 2] + 3
Y_

array([ 9,  9, 16, 11, 22, 20, 18, 29, 23, 35])

In [22]:
Y_r[:,0]-Y_

array([ 0.3, -0.3, -0.5, -0.2,  0.2, -0.4,  0.9,  0.5, -0.2,  0.4])

In [23]:
# Initialize parameters
m_h = 1.5  # Initial weight for hamburgers
m_f = 1.8  # Initial weight for fries
m_d = 1.2  # Initial weight for drinks
b = -1  # Initial intercept
learning_rate = 0.01
iterations = 3000
# Affine weight matrix: the second column keeps the trailing 1
W = np.array([
    [m_h, 0],
    [m_f, 0],
    [m_d, 0],
    [b, 1]
])
Y_p = X @ W
E = Y_r - Y_p
print(f'Error: {max(abs(E[:,0]))}')  # maximum absolute error

Error: 13.299999999999997


In [24]:
for i in range(iterations):
    # Update values
    E = Y_r - Y_p
    W = W + learning_rate * X.T @ E
    if i % 100 == 0:
        # W[0,0] is m_h and W[1,0] is m_f (the intercept b is W[3,0])
        print(f'Error: {max(abs(E[:,0]))}, m_h: {W[0,0]}, m_f: {W[1,0]}')
    Y_p = X @ W

Error: 13.299999999999997, m_h: 5.143000000000001, m_f: 4.104
Error: 6.990905051874888e+27, m_h: 1.1142443604312148e+27, m_f: 7.041497612300763e+26
Error: 3.2782461417015055e+54, m_h: 5.225027730160142e+53, m_f: 3.3019705185574664e+53
Error: 1.5372684489111486e+81, m_h: 2.45017302760922e+80, m_f: 1.5483935244650595e+80
Error: 7.208715214993348e+107, m_h: 1.1489600008380687e+107, m_f: 7.260884048270472e+106
Error: 3.3803838937619447e+134, m_h: 5.387819834152377e+133, m_f: 3.4048474324795727e+133
Error: 1.5851639201168978e+161, m_h: 2.52651115305249e+160, m_f: 1.5966356109521098e+160
Error: 7.433311519077197e+187, m_h: 1.1847572493119277e+187, m_f: 7.487105735906447e+186
Error: 3.485703871910674e+214, m_h: 5.555683924455658e+213, m_f: 3.5109296019782064e+213
Error: 1.634551633073689e+241, m_h: 2.6052276857880357e+240, m_f: 1.6463807384115326e+240
Error: 7.664905394615005e+267, m_h: 1.2216698046697213e+267, m_f: 7.720375635800969e+266
Error: 3.594305222278022e+294, m_h: 5.728778024982143e+293, m_f: 3.620316890695457e+293
Error: nan, m_h:



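The error grows without bound: with the raw inputs, a learning rate of 0.01 is too large for the update to converge. We need to normalize the data. Each column is typically normalized individually so that all features contribute equally to the model.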
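One way to see why the unnormalized run above blows up is to compare the learning rate with the bound $2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of $\mathbf{X}^T\mathbf{X}$; the update $\mathbf{W} += \eta \mathbf{X}^T \mathbf{E}$ only converges for $\eta$ below that bound. The sketch below is an illustration under that assumption: `max_stable_rate`, `X_raw`, and `X_scaled` are hypothetical names, and the scaling is a plain min-max rescale of every column except the trailing ones.

```python
import numpy as np

# Same affine inputs as in the notebook (trailing column of ones)
X_raw = np.array([
    [1, 1, 1, 1], [2, 0, 2, 1], [3, 2, 1, 1], [2, 1, 1, 1], [4, 3, 2, 1],
    [5, 2, 1, 1], [3, 2, 3, 1], [6, 4, 2, 1], [5, 3, 1, 1], [7, 5, 3, 1],
], dtype=float)

def max_stable_rate(X):
    """Upper bound 2 / lambda_max(X^T X) on the learning rate."""
    eigenvalues = np.linalg.eigvalsh(X.T @ X)  # symmetric matrix, real eigenvalues
    return 2.0 / eigenvalues.max()

# Min-max scale every column except the trailing column of ones
features = X_raw[:, :-1]
col_min, col_max = features.min(axis=0), features.max(axis=0)
X_scaled = X_raw.copy()
X_scaled[:, :-1] = (features - col_min) / (col_max - col_min)

learning_rate_demo = 0.01
print(max_stable_rate(X_raw))     # bound for the raw inputs
print(max_stable_rate(X_scaled))  # bound for the scaled inputs
```

With the raw inputs the bound falls below 0.01, while after scaling it rises well above 0.01, which is consistent with the diverging and converging runs in the notebook.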



## **Common Normalization Techniques**
There are two widely used methods:

### **1. Min-Max Scaling (Normalization)**
Scales values between **0 and 1** using the formula:

$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$



### **2. Z-Score Standardization (Standardization)**
Centers data around **0** with a standard deviation of **1** using:

$x' = \frac{x - \mu}{\sigma}$

Where:
- $ \mu $ is the **mean** of the column.
- $ \sigma $ is the **standard deviation** of the column.
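Both formulas can be applied to a single column in a couple of lines. The following is a minimal sketch using the hamburger counts from the data; the variable names are hypothetical, and the standard deviation is the population one (NumPy's default, `ddof=0`).

```python
import numpy as np

# Hamburger counts from the ten visits
hamburgers = np.array([1, 2, 3, 2, 4, 5, 3, 6, 5, 7], dtype=float)

# 1. Min-max scaling: maps the smallest value to 0 and the largest to 1
min_max = (hamburgers - hamburgers.min()) / (hamburgers.max() - hamburgers.min())

# 2. Z-score standardization: zero mean, unit (population) standard deviation
z_score = (hamburgers - hamburgers.mean()) / hamburgers.std()

print(min_max)
print(z_score)
```

Both transformations are invertible as long as the column is not constant, which is what later allows the trained weights to be mapped back to the original units.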







In [25]:
import numpy as np

def normalize_columns(data):
    """
    Normalizes all columns in a NumPy array except the last column using Min-Max Scaling.
    Stores min and max values for later restoration.

    Parameters:
        data (np.array): The input NumPy array.

    Returns:
        normalized_data (np.array): The normalized array.
        min_max_values (list): List of (min, max) values for each column.
    """
    normalized_data = data.astype(float).copy()
    min_max_values = []  # Store min and max values for restoration

    for col in range(data.shape[1] - 1):  # Ignore last column
        min_val, max_val = data[:, col].min(), data[:, col].max()
        normalized_data[:, col] = (data[:, col] - min_val) / (max_val - min_val)
        min_max_values.append((min_val, max_val))

    #min_max_values.append((None, None))  # Last column remains unchanged
    return normalized_data, min_max_values


def restore_columns(normalized_data, min_max_values):
    """
    Restores the original values from the normalized data using stored min and max values.

    Parameters:
        normalized_data (np.array): The normalized array.
        min_max_values (list): List of (min, max) values for each column.

    Returns:
        np.array: Restored original data.
    """
    restored_data = normalized_data.astype(float).copy()

    for col in range(normalized_data.shape[1] - 1):  # Ignore last column
        min_val, max_val = min_max_values[col]
        restored_data[:, col] = normalized_data[:, col] * (max_val - min_val) + min_val

    return restored_data


def denormalize_W(W_norm, X_limits, Y_limits):
    """
    Denormalizes the weight matrix W using Min-Max Scaling limits.

    Parameters:
        W_norm (np.array): (n+1, 2) normalized affine weight matrix (bias in the last row).
        X_limits (list): List of (min, max) values for each input feature.
        Y_limits (list): List whose first element is the (min, max) pair of the output variable.

    Returns:
        W_denorm (np.array): (n+1, 2) denormalized affine weight matrix.
    """
    W_denorm = W_norm.copy()
    num_features = len(X_limits)  # Number of input variables

    # Extract min and max for output variable (Y)
    y_min, y_max = Y_limits[0]

    # Denormalize each coefficient
    for i in range(num_features):
        x_min, x_max = X_limits[i]
        W_denorm[i, 0] = W_norm[i, 0] * (y_max - y_min) / (x_max - x_min)

    # Denormalize bias term
    b_denorm = W_norm[-1, 0] * (y_max - y_min) + y_min
    for i in range(num_features):
        x_min, _ = X_limits[i]
        b_denorm -= W_denorm[i, 0] * x_min

    W_denorm[-1, 0] = b_denorm  # Update bias in the matrix

    return W_denorm


# Example usage


# Normalize the data
X_n, X_limits = normalize_columns(X)
Y_rn, Y_limits = normalize_columns(Y_r)

# Restore the original data
X_ = restore_columns(X_n, X_limits)

print("Original Data:\n", X)
print("\nNormalized Data:\n", X_n)
print("\nRestored Data:\n", X_)
print("\nNormalization limits:\n", X_limits)


Original Data:
 [[1 1 1 1]
 [2 0 2 1]
 [3 2 1 1]
 [2 1 1 1]
 [4 3 2 1]
 [5 2 1 1]
 [3 2 3 1]
 [6 4 2 1]
 [5 3 1 1]
 [7 5 3 1]]

Normalized Data:
 [[0.         0.2        0.         1.        ]
 [0.16666667 0.         0.5        1.        ]
 [0.33333333 0.4        0.         1.        ]
 [0.16666667 0.2        0.         1.        ]
 [0.5        0.6        0.5        1.        ]
 [0.66666667 0.4        0.         1.        ]
 [0.33333333 0.4        1.         1.        ]
 [0.83333333 0.8        0.5        1.        ]
 [0.66666667 0.6        0.         1.        ]
 [1.         1.         1.         1.        ]]

Restored Data:
 [[1. 1. 1. 1.]
 [2. 0. 2. 1.]
 [3. 2. 1. 1.]
 [2. 1. 1. 1.]
 [4. 3. 2. 1.]
 [5. 2. 1. 1.]
 [3. 2. 3. 1.]
 [6. 4. 2. 1.]
 [5. 3. 1. 1.]
 [7. 5. 3. 1.]]

Normalization limits:
 [(1, 7), (0, 5), (1, 3)]


In [26]:
W_n = np.array([
    [m_h, 0],
    [m_f, 0],
    [m_d, 0],
    [b, 1]
])
Y_pn =X_n @ W_n
E_n = Y_rn - Y_pn
print(f'Error: {max(abs(E_n[:,0]))}')

Error: 2.5


In [27]:

for i in range(iterations):
    # Update values
    E_n = Y_rn - Y_pn
    W_n = W_n + learning_rate * X_n.T @ E_n
    if i % 100 == 0:
        # W_n[0,0] is m_h and W_n[1,0] is m_f (the intercept b is W_n[3,0])
        print(f'Error: {max(abs(E_n[:,0]))}, m_h: {W_n[0,0]}, m_f: {W_n[1,0]}')
    Y_pn = X_n @ W_n

Error: 2.5, m_h: 1.4497156054931335, m_f: 1.751056479400749
Error: 0.34757996175044914, m_h: 0.6190463352859336, m_f: 0.9611501238513626
Error: 0.09624859719406897, m_h: 0.4308247524410763, m_f: 0.7819854323815847
Error: 0.038585397834888915, m_h: 0.3717909530398695, m_f: 0.7207990478530942
Error: 0.020147077880495967, m_h: 0.3536444232273951, m_f: 0.6962778576904617
Error: 0.020268155992763326, m_h: 0.3493757904249272, m_f: 0.6842880324807201
Error: 0.019822658661875714, m_h: 0.35001624021905164, m_f: 0.6770244877457782
Error: 0.019385224748425467, m_h: 0.35237328087325137, m_f: 0.6717494108526556
Error: 0.019017287411618566, m_h: 0.35525454437102766, m_f: 0.6674330909442346
Error: 0.018702854848725747, m_h: 0.3582046899190812, m_f: 0.6636637520325858
Error: 0.01842506154865503, m_h: 0.3610518402349865, m_f: 0.660265524295772
Error: 0.018173916758695563, m_h: 0.3637358783357618, m_f: 0.6571562385417213
Error: 0.017943990598018877, m_h: 0.3662411250845541, m_f: 0.6542921955007551
Error: 0.01773218046361974, m_h: 0.3685693805596489, m_f: 0.6

In [28]:
W_n

array([[ 0.38989282,  0.        ],
       [ 0.62751739,  0.        ],
       [ 0.10215826,  0.        ],
       [-0.11048498,  1.        ]])

In [29]:
W_=denormalize_W(W_n, X_limits, Y_limits)
W_

array([[1.73502303, 0.        ],
       [3.35094286, 0.        ],
       [1.36381273, 0.        ],
       [2.65121523, 1.        ]])
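The formulas inside `denormalize_W` can be recovered with a short derivation. The following is a sketch of that algebra; the shorthand $\Delta_y = y_{\max} - y_{\min}$ and $\Delta_i = x_{i,\max} - x_{i,\min}$ is introduced here only for illustration, and the subscript $n$ marks normalized quantities.

Min-Max Scaling of the output and of each input gives:

$y_n = \frac{y - y_{\min}}{\Delta_y}, \qquad x_{n,i} = \frac{x_i - x_{i,\min}}{\Delta_i}$

The model trained on the normalized data is:

$y_n = \sum_i w_{n,i}\, x_{n,i} + b_n$

Substituting both scalings and solving for $y$:

$y = \sum_i \left( w_{n,i} \frac{\Delta_y}{\Delta_i} \right) x_i + \left( b_n \Delta_y + y_{\min} - \sum_i w_{n,i} \frac{\Delta_y}{\Delta_i}\, x_{i,\min} \right)$

So each denormalized weight is $w_i = w_{n,i} \Delta_y / \Delta_i$ and the denormalized intercept is $b = b_n \Delta_y + y_{\min} - \sum_i w_i\, x_{i,\min}$, which matches the two loops in the function.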

In [30]:
Y_ =X @ W_
E_ = Y_r - Y_
print(f'Error: {max(abs(E_[:,0]))}')

Error: 0.4219827849420277


In [31]:
W__ = np.array([
    [2, 0],
    [3, 0],
    [1, 0],
    [3, 1]
])
Y__ =X @ W__
E__ = Y_r - Y__
print(f'Error: {max(abs(E__[:,0]))}')

Error: 0.8999999999999986


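The maximum absolute error of the approximation (the trained model, about 0.42) is smaller than that of the function used to create the data (without noise, about 0.90). This is plausible because training fits the noisy samples directly.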
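A closed-form least-squares fit offers an independent cross-check of this observation. The sketch below swaps the iterative update for `np.linalg.lstsq`, which is not the method used in this notebook; the names `X_aff`, `y_real`, `w_ls`, and `w_true` are hypothetical. By construction the least-squares fit cannot have a larger sum of squared errors than the noise-free function.

```python
import numpy as np

# Affine inputs and measured times, as in the notebook's X and Y_r arrays
X_aff = np.array([
    [1, 1, 1, 1], [2, 0, 2, 1], [3, 2, 1, 1], [2, 1, 1, 1], [4, 3, 2, 1],
    [5, 2, 1, 1], [3, 2, 3, 1], [6, 4, 2, 1], [5, 3, 1, 1], [7, 5, 3, 1],
], dtype=float)
y_real = np.array([9.3, 8.7, 15.5, 10.8, 22.2, 19.6, 18.9, 29.5, 22.8, 35.4])

# Closed-form least-squares fit (an alternative to the iterative update)
w_ls, *_ = np.linalg.lstsq(X_aff, y_real, rcond=None)

# Coefficients of the noise-free function 2*x_h + 3*x_f + 1*x_d + 3
w_true = np.array([2.0, 3.0, 1.0, 3.0])

# Sum of squared errors of each model against the noisy data
sse_ls = np.sum((y_real - X_aff @ w_ls) ** 2)
sse_true = np.sum((y_real - X_aff @ w_true) ** 2)
print(w_ls)
print(sse_ls, sse_true)
```

The comparison here uses the sum of squared errors, whereas the cells above report the maximum absolute error, so the two measures need not rank models identically.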

In [36]:
# Approximation (trained model)
Y_

array([[ 9.10099386,  1.        ],
       [ 8.84888676,  1.        ],
       [15.92198278,  1.        ],
       [10.83601689,  1.        ],
       [22.37176141,  1.        ],
       [19.39202885,  1.        ],
       [18.64960825,  1.        ],
       [29.19275034,  1.        ],
       [22.74297171,  1.        ],
       [35.64252897,  1.        ]])

In [32]:
# Data created with noise
Y_r

array([[ 9.3,  1. ],
       [ 8.7,  1. ],
       [15.5,  1. ],
       [10.8,  1. ],
       [22.2,  1. ],
       [19.6,  1. ],
       [18.9,  1. ],
       [29.5,  1. ],
       [22.8,  1. ],
       [35.4,  1. ]])

In [35]:
# Data without noise
Y__

array([[ 9,  1],
       [ 9,  1],
       [16,  1],
       [11,  1],
       [22,  1],
       [20,  1],
       [18,  1],
       [29,  1],
       [23,  1],
       [35,  1]])