<a href="https://colab.research.google.com/github/GerardoMunoz/ML_2025/blob/main/Perceptron_2_Two_outputs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

https://github.com/GerardoMunoz/ML_2025/blob/main/Perceptron_2_Two_outputs.ipynb

# **Analyzing Order Completion Time in a Fast-Food Restaurant for two outputs**

## **Introduction**
In a fast-food restaurant, the administrator records how long each customer waits to receive their order. Data was collected from 10 customers: the number of hamburgers, fries, and drinks ordered, along with the corresponding wait time and preparation cost. The goal is to understand how the number of items ordered relates to the waiting time and the preparation cost of an order.

## **Collected Data**
The following table shows the recorded data from the administrator.

| **Hamburgers ($ x_h $)** | **Fries ($ x_f $)** | **Drinks ($ x_d $)** | **Time ($ y_{rt} $)**  | **Cost ($ y_{rc} $)**  |
|------------------|------------------|------------------|------------------|------------------|
| 1  | 1  | 1  | 9.3  | 9.2  |
| 2  | 0  | 2  | 8.7  | 11.8  |
| 3  | 2  | 1  | 15.5  | 19.4  |
| 2  | 1  | 1  | 10.8  | 12.4  |
| 4  | 3  | 2  | 22.2  | 25.7  |
| 5  | 2  | 1  | 19.6  | 26.9  |
| 3  | 2  | 3  | 18.9  | 20.1  |
| 6  | 4  | 2  | 29.5  | 34.1  |
| 5  | 3  | 1  | 22.8  | 31.3  |
| 7  | 5  | 3  | 35.4  | 43.5  |


The subscript $r$ in $y_r$ denotes real (measured) data, distinguishing it from the predicted values, $y_p$. A second subscript names the output: $t$ for time and $c$ for cost.




We will approximate each output with an affine function of the inputs:

$y_{pt}= m_{ht} x_h + m_{ft} x_f + m_{dt} x_d + b_t$

$y_{pc}= m_{hc} x_h + m_{fc} x_f + m_{dc} x_d + b_c$

Each output has its own weights and bias, so we need to update the vector notation.

## **Expressing in Vector Form**
Now we can write our equations in vector notation, which makes the computations more compact. We use affine vectors and matrices: a constant 1 is appended to each vector so that the bias terms become one more row of the weight matrix.

$\mathbf{x} = \begin{bmatrix} x_h & x_f & x_d & 1 \end{bmatrix}$

$\mathbf{y_r} = \begin{bmatrix} y_{rt} & y_{rc} & 1 \end{bmatrix}$

$\mathbf{y_p} = \begin{bmatrix} y_{pt} & y_{pc} & 1 \end{bmatrix}$

$\mathbf{W} = \begin{bmatrix} m_{ht} & m_{hc} & 0 \\ m_{ft} & m_{fc} & 0 \\ m_{dt} & m_{dc} & 0 \\ b_t & b_c & 1 \end{bmatrix}$

So, the prediction equation is:

$\mathbf{y_p} = \mathbf{x} \mathbf{W}$

$\begin{bmatrix} y_{pt} & y_{pc} & 1 \end{bmatrix} =
\begin{bmatrix} x_h & x_f & x_d & 1 \end{bmatrix}
 \begin{bmatrix} m_{ht} & m_{hc} & 0 \\ m_{ft} & m_{fc} & 0 \\ m_{dt} & m_{dc} & 0 \\ b_t & b_c & 1 \end{bmatrix}$

$\begin{bmatrix} y_{pt} & y_{pc} & 1 \end{bmatrix} =
\begin{bmatrix}m_{ht} x_h + m_{ft} x_f + m_{dt} x_d + b_t  & m_{hc} x_h + m_{fc} x_f + m_{dc} x_d + b_c  & 1 \end{bmatrix} $

$y_{pt}= m_{ht} x_h + m_{ft} x_f + m_{dt} x_d + b_t$,  
$y_{pc}= m_{hc} x_h + m_{fc} x_f + m_{dc} x_d + b_c$,  

$1=1$
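
As a quick check of the vector form, the sketch below evaluates $\mathbf{y_p} = \mathbf{x}\mathbf{W}$ for a single order of one hamburger, one fries and one drink. The weight values are the starting values used later in this notebook and serve here only as an illustration.

```python
import numpy as np

# One order: 1 hamburger, 1 fries, 1 drink, plus the affine 1
x = np.array([1, 1, 1, 1])

# Columns: time weights, cost weights, affine column
W = np.array([
    [1.5, 2.1, 0],   # m_ht, m_hc
    [1.8, 1.1, 0],   # m_ft, m_fc
    [1.2, 1.7, 0],   # m_dt, m_dc
    [-1.0, 0.0, 1],  # b_t,  b_c
])

y_p = x @ W
print(y_p)  # approximately [3.5, 4.9, 1.0]: predicted time, predicted cost, affine 1
```

The trailing 1 in `y_p` comes from the affine column of `W` and carries no information about the order.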



The data and error matrices are:

$\mathbf{X}= \begin{bmatrix}
x_{h1} & x_{f1} & x_{d1} & 1 \\  
x_{h2} & x_{f2} & x_{d2}  & 1 \\
\vdots &\vdots &\vdots & \vdots \\  
x_{hn} & x_{fn} & x_{dn}  & 1 \end{bmatrix}$

$\mathbf{Y_r}= \begin{bmatrix} y_{rt1} & y_{rc1} & 1 \\ y_{rt2} & y_{rc2} & 1 \\ \vdots & \vdots & \vdots \\ y_{rtn} & y_{rcn} & 1 \end{bmatrix}$


$\mathbf{Y_p} = \mathbf{X} \mathbf{W}$


$\mathbf{E}= \mathbf{Y_r}-\mathbf{Y_p}=
\begin{bmatrix} y_{rt1} - y_{pt1} & y_{rc1} - y_{pc1} & 1-1 \\ y_{rt2}  - y_{pt2} & y_{rc2}  - y_{pc2} & 1-1  \\ \vdots & \vdots & \vdots \\ y_{rtn}  - y_{ptn} & y_{rcn}  - y_{pcn} & 1-1  \end{bmatrix}=\begin{bmatrix} e_{t1} & e_{c1} & 0 \\ e_{t2} & e_{c2} & 0 \\ \vdots & \vdots & \vdots \\ e_{tn} & e_{cn} & 0 \end{bmatrix}$

Now we are going to see how to update $\mathbf{W}$:

$\mathbf{W} += \eta \mathbf{X}^T \mathbf{E}$
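
One way to read this update, assuming the quantity being minimized is half the sum of squared errors (the notebook does not name a loss explicitly), is as a gradient-descent step:

$L(\mathbf{W}) = \frac{1}{2} \sum_{i,j} E_{ij}^2 = \frac{1}{2} \lVert \mathbf{Y_r} - \mathbf{X}\mathbf{W} \rVert_F^2$

$\frac{\partial L}{\partial \mathbf{W}} = -\mathbf{X}^T (\mathbf{Y_r} - \mathbf{X}\mathbf{W}) = -\mathbf{X}^T \mathbf{E}$

$\mathbf{W} \leftarrow \mathbf{W} - \eta \frac{\partial L}{\partial \mathbf{W}} = \mathbf{W} + \eta \mathbf{X}^T \mathbf{E}$

Because the last column of $\mathbf{E}$ is zero, the last column of $\mathbf{W}$ is never changed by the update.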



$ \begin{bmatrix} m_{ht} & m_{hc} & 0 \\ m_{ft} & m_{fc} & 0 \\ m_{dt} & m_{dc} & 0 \\ b_t & b_c & 1 \end{bmatrix}+= \eta
\begin{bmatrix}
x_{h1} &  x_{h2}  & \cdots & x_{hn} \\
x_{f1} &  x_{f2}  & \cdots & x_{fn} \\
x_{d1} &  x_{d2}  & \cdots & x_{dn} \\  
1 & 1 &\cdots & 1 \end{bmatrix}
\begin{bmatrix} e_{t1} & e_{c1} & 0 \\ e_{t2} & e_{c2} & 0 \\ \vdots & \vdots & \vdots \\ e_{tn} & e_{cn} & 0 \end{bmatrix}$

$\begin{bmatrix} m_{ht} & m_{hc} & 0 \\ m_{ft} & m_{fc} & 0 \\ m_{dt} & m_{dc} & 0 \\ b_t & b_c & 1 \end{bmatrix} +=\eta\begin{bmatrix} e_{t1} x_{h1} + e_{t2} x_{h2} + \cdots + e_{tn} x_{hn} & e_{c1} x_{h1} + e_{c2} x_{h2} + \cdots + e_{cn} x_{hn} & 0 \\ e_{t1} x_{f1} + e_{t2} x_{f2} + \cdots + e_{tn} x_{fn} & e_{c1} x_{f1} + e_{c2} x_{f2} + \cdots + e_{cn} x_{fn} & 0 \\ e_{t1} x_{d1} + e_{t2} x_{d2} + \cdots + e_{tn} x_{dn} & e_{c1} x_{d1} + e_{c2} x_{d2} + \cdots + e_{cn} x_{dn} & 0 \\ e_{t1} + e_{t2} + \cdots
 + e_{tn} & e_{c1} + e_{c2} + \cdots
 + e_{cn} & 0 \end{bmatrix} $
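
The following sketch applies one update step to the data used in the code cells of this notebook and checks two properties visible in the expanded product: each weight moves by $\eta$ times a sum of error-weighted inputs, and the affine column stays fixed. The names `eta`, `W_new` and `manual` are choices made for this illustration.

```python
import numpy as np

# Inputs (hamburgers, fries, drinks, affine 1) as used in the code cells
X = np.array([
    [1, 1, 1, 1], [2, 0, 2, 1], [3, 2, 1, 1], [2, 1, 1, 1], [4, 3, 2, 1],
    [5, 2, 1, 1], [3, 2, 3, 1], [6, 4, 2, 1], [5, 3, 1, 1], [7, 5, 3, 1],
])
# Outputs (time, cost, affine 1) as used in the code cells
Y_r = np.array([
    [9.3, 9.2, 1], [8.7, 11.8, 1], [15.5, 19.4, 1], [10.8, 12.4, 1], [22.2, 25.7, 1],
    [19.6, 26.9, 1], [18.9, 20.1, 1], [29.5, 34.1, 1], [22.8, 31.3, 1], [35.4, 43.5, 1],
])
W = np.array([
    [1.5, 2.1, 0],
    [1.8, 1.1, 0],
    [1.2, 1.7, 0],
    [-1.0, 0.0, 1],
])
eta = 0.01

E = Y_r - X @ W            # error matrix; its last column is 1 - 1 = 0
W_new = W + eta * X.T @ E  # one update step

# Entry (0, 0) written out as in the expanded matrix: m_ht += eta * sum(e_ti * x_hi)
manual = W[0, 0] + eta * np.sum(E[:, 0] * X[:, 0])

print(np.isclose(W_new[0, 0], manual))  # True
print(W_new[:, -1])                     # affine column is still 0, 0, 0, 1
```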


In [3]:
import numpy as np

# Original inputs with an added 1 at the tail
X = np.array([
    [1, 1, 1, 1],
    [2, 0, 2, 1],
    [3, 2, 1, 1],
    [2, 1, 1, 1],
    [4, 3, 2, 1],
    [5, 2, 1, 1],
    [3, 2, 3, 1],
    [6, 4, 2, 1],
    [5, 3, 1, 1],
    [7, 5, 3, 1]
])

# Outputs (time, cost) as a two-column matrix, with a 1 appended to each row
Y_r = np.array([
    [9.3, 9.2, 1],
    [8.7, 11.8, 1],
    [15.5, 19.4, 1],
    [10.8, 12.4, 1],
    [22.2, 25.7, 1],
    [19.6, 26.9, 1],
    [18.9, 20.1, 1],
    [29.5, 34.1, 1],
    [22.8, 31.3, 1],
    [35.4, 43.5, 1]
])




In [4]:
# Initial parameter values (fixed by hand rather than drawn at random)
m_ht, m_hc = 1.5, 2.1
m_ft, m_fc = 1.8, 1.1
m_dt, m_dc = 1.2, 1.7
b_t, b_c   = -1, 0
learning_rate = 0.01
iterations = 3000
W = np.array([
    [m_ht, m_hc, 0],
    [m_ft, m_fc, 0],
    [m_dt, m_dc, 0],
    [b_t, b_c, 1]
])
Y_p =X @ W
E = Y_r - Y_p
print(f'Error: {np.max(np.abs(E[:,:-1]))}')

Error: 18.199999999999996


In [5]:
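for i in range(iterations):
    # Update values
    E = Y_r - Y_p                    # errors with the current weights
    W = W + learning_rate * X.T @ E  # last column of E is 0, so W[:, -1] stays fixed
    if i%100 == 0:
        # Note: W[0,0] is m_ht and W[1,0] is m_ft (printed under the label "b");
        # the time bias b_t itself is stored in W[-1,0].
        print(f'Error: {np.max(np.abs(E[:,:-1]))}, m: {W[0,0]}, b: {W[1,0]}')
    Y_p = X @ W                      # predictions for the next iteration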

Error: 18.199999999999996, m: 5.143000000000001, b: 4.104
Error: 8.794428501580745e+27, m: 1.1142443604312148e+27, b: 7.041497612300763e+26
Error: 4.123972660170071e+54, m: 5.225027730160142e+53, b: 3.3019705185574664e+53
Error: 1.933855110513808e+81, m: 2.45017302760922e+80, b: 1.5483935244650595e+80
Error: 9.068429634802997e+107, m: 1.1489600008380687e+107, b: 7.260884048270472e+106
Error: 4.2524600521661575e+134, m: 5.387819834152377e+133, b: 3.4048474324795727e+133
Error: 1.9941067222783636e+161, m: 2.52651115305249e+160, b: 1.5966356109521098e+160
Error: 9.350967607115308e+187, m: 1.1847572493119277e+187, b: 7.487105735906447e+186
Error: 4.384950625381507e+214, m: 5.555683924455658e+213, b: 3.5109296019782064e+213
Error: 2.056235546405158e+241, m: 2.6052276857880357e+240, b: 1.6463807384115326e+240
Error: 9.642308394139024e+267, m: 1.2216698046697213e+267, b: 7.720375635800969e+266
Error: 4.5215691038036527e+294, m: 5.728778024982143e+293, b: 3.620316890695457e+293
Error: nan, m: 



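The error grows at every step until it overflows to `nan`: for data on this scale, a learning rate of 0.01 is too large and each update overshoots. To fix this we normalize the data. Each column is typically normalized individually so that all features contribute to the model on a comparable scale.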
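
One way to see why the scale of the inputs matters: for the update $\mathbf{W} += \eta \mathbf{X}^T \mathbf{E}$, the weights converge only if $\eta < 2/\lambda_{max}$, where $\lambda_{max}$ is the largest eigenvalue of $\mathbf{X}^T\mathbf{X}$ (a standard result for gradient descent on a quadratic loss). The sketch below computes this bound for the raw inputs and for inputs rescaled with Min-Max scaling, which is described in the next section. The helper name `max_stable_eta` is introduced here for illustration only.

```python
import numpy as np

# Inputs (hamburgers, fries, drinks, affine 1) as used in the code cells
X = np.array([
    [1, 1, 1, 1], [2, 0, 2, 1], [3, 2, 1, 1], [2, 1, 1, 1], [4, 3, 2, 1],
    [5, 2, 1, 1], [3, 2, 3, 1], [6, 4, 2, 1], [5, 3, 1, 1], [7, 5, 3, 1],
], dtype=float)

def max_stable_eta(X):
    """Upper bound 2 / lambda_max(X^T X) on the learning rate (illustrative helper)."""
    lam_max = np.linalg.eigvalsh(X.T @ X).max()
    return 2.0 / lam_max

# Min-Max scale every column except the trailing column of ones
cols = X[:, :-1]
X_n = X.copy()
X_n[:, :-1] = (cols - cols.min(axis=0)) / (cols.max(axis=0) - cols.min(axis=0))

eta = 0.01
print(max_stable_eta(X))    # below 0.01, so eta = 0.01 diverges on the raw data
print(max_stable_eta(X_n))  # above 0.01, so eta = 0.01 is stable after scaling
```

An alternative to normalizing would be to lower the learning rate below the first bound, at the cost of slower convergence.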



## **Common Normalization Techniques**
There are two widely used methods:

### **1. Min-Max Scaling (Normalization)**
Scales values between **0 and 1** using the formula:

$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$
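
A small example of the formula, using the hamburger counts from the data as the column:

```python
import numpy as np

# Hamburger counts for the 10 customers
x_h = np.array([1, 2, 3, 2, 4, 5, 3, 6, 5, 7], dtype=float)

# Min-Max scaling: the smallest value maps to 0 and the largest to 1
x_h_scaled = (x_h - x_h.min()) / (x_h.max() - x_h.min())
print(x_h_scaled)
```

Here the minimum is 1 and the maximum is 7, so an order with 2 hamburgers maps to $(2 - 1)/(7 - 1) = 1/6 \approx 0.167$.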



### **2. Z-Score Standardization (Standardization)**
Centers data around **0** with a standard deviation of **1** using:

$x' = \frac{x - \mu}{\sigma}$

Where:
- $ \mu $ is the **mean** of the column.
- $ \sigma $ is the **standard deviation** of the column.
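
For comparison, a minimal sketch of Z-score standardization on the same hamburger column. It assumes the population standard deviation (NumPy's default, `ddof=0`); the rest of this notebook uses Min-Max scaling instead.

```python
import numpy as np

# Hamburger counts for the 10 customers
x_h = np.array([1, 2, 3, 2, 4, 5, 3, 6, 5, 7], dtype=float)

mu = x_h.mean()     # mean of the column
sigma = x_h.std()   # population standard deviation (ddof=0)
x_h_std = (x_h - mu) / sigma

print(mu)
print(x_h_std.mean(), x_h_std.std())  # close to 0 and 1, up to rounding
```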







In [6]:
import numpy as np


def normalize_columns(data):
    """
    Normalizes all columns in a NumPy array except the last column using Min-Max Scaling.
    Stores min and max values for later restoration.

    Parameters:
        data (np.array): The input NumPy array.

    Returns:
        normalized_data (np.array): The normalized array.
        min_max_values (list): List of (min, max) values for each column.
    """
    normalized_data = data.astype(float).copy()
    min_max_values = []  # Store min and max values for restoration

    for col in range(data.shape[1] - 1):  # Ignore last column
        min_val, max_val = data[:, col].min(), data[:, col].max()
        normalized_data[:, col] = (data[:, col] - min_val) / (max_val - min_val)
        min_max_values.append((min_val, max_val))

    #min_max_values.append((None, None))  # Last column remains unchanged
    return normalized_data, min_max_values


def restore_columns(normalized_data, min_max_values):
    """
    Restores the original values from the normalized data using stored min and max values.

    Parameters:
        normalized_data (np.array): The normalized array.
        min_max_values (list): List of (min, max) values for each column.

    Returns:
        np.array: Restored original data.
    """
    restored_data = normalized_data.astype(float).copy()

    for col in range(normalized_data.shape[1] - 1):  # Ignore last column
        min_val, max_val = min_max_values[col]
        restored_data[:, col] = normalized_data[:, col] * (max_val - min_val) + min_val

    return restored_data


def denormalize_W(W_norm, X_limits, Y_limits):
    """
    Denormalizes the weight matrix W using Min-Max Scaling limits.

    Parameters:
        W_norm (np.array): (n+1, m+1) normalized affine weight matrix; the last row
            holds the biases and the last column is the affine column.
        X_limits (list): List of (min, max) values for each of the n input features.
        Y_limits (list): List of (min, max) values for each of the m output variables.

    Returns:
        W_denorm (np.array): (n+1, m+1) denormalized weight matrix.
    """
    W_denorm = W_norm.copy()
    num_features = len(X_limits)
    num_outputs = len(Y_limits)

    # Denormalize each coefficient
    for j in range(num_outputs):
        y_min, y_max = Y_limits[j]
        for i in range(num_features):
            x_min, x_max = X_limits[i]
            W_denorm[i, j] = W_norm[i, j] * (y_max - y_min) / (x_max - x_min)

    # Denormalize bias term
    for j in range(num_outputs):
        y_min, y_max = Y_limits[j]
        b_denorm = W_norm[-1, j] * (y_max - y_min) + y_min
        for i in range(num_features):
            x_min, _ = X_limits[i]
            b_denorm -= W_denorm[i, j] * x_min

        W_denorm[-1, j] = b_denorm  # Update bias in the matrix

    return W_denorm


# Example usage


# Normalize the data
X_n, X_limits = normalize_columns(X)
Y_rn, Y_limits = normalize_columns(Y_r)

# Restore the original data
X_ = restore_columns(X_n, X_limits)

print("Original Data:\n", X)
print("\nNormalized Data:\n", X_n)
print("\nRestored Data:\n", X_)
print("\nNormalization limits:\n", X_limits)


Original Data:
 [[1 1 1 1]
 [2 0 2 1]
 [3 2 1 1]
 [2 1 1 1]
 [4 3 2 1]
 [5 2 1 1]
 [3 2 3 1]
 [6 4 2 1]
 [5 3 1 1]
 [7 5 3 1]]

Normalized Data:
 [[0.         0.2        0.         1.        ]
 [0.16666667 0.         0.5        1.        ]
 [0.33333333 0.4        0.         1.        ]
 [0.16666667 0.2        0.         1.        ]
 [0.5        0.6        0.5        1.        ]
 [0.66666667 0.4        0.         1.        ]
 [0.33333333 0.4        1.         1.        ]
 [0.83333333 0.8        0.5        1.        ]
 [0.66666667 0.6        0.         1.        ]
 [1.         1.         1.         1.        ]]

Restored Data:
 [[1. 1. 1. 1.]
 [2. 0. 2. 1.]
 [3. 2. 1. 1.]
 [2. 1. 1. 1.]
 [4. 3. 2. 1.]
 [5. 2. 1. 1.]
 [3. 2. 3. 1.]
 [6. 4. 2. 1.]
 [5. 3. 1. 1.]
 [7. 5. 3. 1.]]

Normalization limits:
 [(1, 7), (0, 5), (1, 3)]


In [7]:
W_n = np.array([
    [m_ht, m_hc, 0],
    [m_ft, m_fc, 0],
    [m_dt, m_dc, 0],
    [b_t, b_c, 1]
])
Y_pn =X_n @ W_n
E_n = Y_rn - Y_pn
print(f'Error: {max(abs(E_n[:,0]))}')

Error: 2.5


In [8]:

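for i in range(iterations):
    # Update values
    E_n = Y_rn - Y_pn                        # errors in normalized units
    W_n = W_n + learning_rate * X_n.T @ E_n  # same update rule, now on scaled data
    if i%100 == 0:
        # Note: W_n[0,0] is the hamburger weight and W_n[1,0] the fries weight
        # (printed under the label "b"); the time bias is stored in W_n[-1,0].
        print(f'Error: {max(abs(E_n[:,0]))}, m: {W_n[0,0]}, b: {W_n[1,0]}')
    Y_pn = X_n @ W_n                         # predictions for the next iteration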

Error: 2.5, m: 1.4497156054931335, b: 1.751056479400749
Error: 0.34757996175044914, m: 0.6190463352859336, b: 0.9611501238513626
Error: 0.09624859719406897, m: 0.4308247524410763, b: 0.7819854323815847
Error: 0.038585397834888915, m: 0.3717909530398695, b: 0.7207990478530942
Error: 0.020147077880495967, m: 0.3536444232273951, b: 0.6962778576904617
Error: 0.020268155992763326, m: 0.3493757904249272, b: 0.6842880324807201
Error: 0.019822658661875714, m: 0.35001624021905164, b: 0.6770244877457782
Error: 0.019385224748425467, m: 0.35237328087325137, b: 0.6717494108526556
Error: 0.019017287411618566, m: 0.35525454437102766, b: 0.6674330909442346
Error: 0.018702854848725747, m: 0.3582046899190812, b: 0.6636637520325858
Error: 0.01842506154865503, m: 0.3610518402349865, b: 0.660265524295772
Error: 0.018173916758695563, m: 0.3637358783357618, b: 0.6571562385417213
Error: 0.017943990598018877, m: 0.3662411250845541, b: 0.6542921955007551
Error: 0.01773218046361974, m: 0.3685693805596489, b: 0.6

In [9]:
W_n

array([[ 0.38989282,  0.73491091,  0.        ],
       [ 0.62751739,  0.27910742,  0.        ],
       [ 0.10215826,  0.025428  ,  0.        ],
       [-0.11048498, -0.06508127,  1.        ]])

In [10]:
W_=denormalize_W(W_n, X_limits, Y_limits)
W_

array([[1.73502303, 4.20124071, 0.        ],
       [3.35094286, 1.91467689, 0.        ],
       [1.36381273, 0.43609024, 0.        ],
       [2.65121523, 2.33038135, 1.        ]])
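
The conversion performed by `denormalize_W` can be derived by substituting the Min-Max formulas into the normalized model. A sketch for one output $y$, writing $\Delta x_i = \max(x_i) - \min(x_i)$ and $\Delta y = \max(y) - \min(y)$ (shorthand introduced here), with normalized weights $w_i$ and normalized bias $b_n$:

$\frac{y - \min(y)}{\Delta y} = \sum_i w_i \frac{x_i - \min(x_i)}{\Delta x_i} + b_n$

$y = \sum_i \left( w_i \frac{\Delta y}{\Delta x_i} \right) x_i + \left[ b_n \Delta y + \min(y) - \sum_i w_i \frac{\Delta y}{\Delta x_i} \min(x_i) \right]$

So each weight is scaled by $\Delta y / \Delta x_i$, and the bias is rescaled by $\Delta y$, shifted by $\min(y)$, and corrected by the weighted input minima, which is what the two loops in the function compute. For example, the hamburger weight for time is $0.38989282 \cdot (35.4 - 8.7) / (7 - 1) \approx 1.735$.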

In [11]:
Y_ =X @ W_
E_ = Y_r - Y_
print(f'Error: {max(abs(E_[:,0]))}')

Error: 0.4219827849420277
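
As a sanity check on the denormalization, predictions made in original units with the denormalized weights should equal the normalized predictions mapped back to original units. The sketch below verifies this with a vectorized variant of the conversion. The function `denormalize_W_vec` and the random test weights are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

X = np.array([
    [1, 1, 1, 1], [2, 0, 2, 1], [3, 2, 1, 1], [2, 1, 1, 1], [4, 3, 2, 1],
    [5, 2, 1, 1], [3, 2, 3, 1], [6, 4, 2, 1], [5, 3, 1, 1], [7, 5, 3, 1],
], dtype=float)
Y_r = np.array([
    [9.3, 9.2, 1], [8.7, 11.8, 1], [15.5, 19.4, 1], [10.8, 12.4, 1], [22.2, 25.7, 1],
    [19.6, 26.9, 1], [18.9, 20.1, 1], [29.5, 34.1, 1], [22.8, 31.3, 1], [35.4, 43.5, 1],
])

# Min-Max limits of every column except the affine one
x_min, x_max = X[:, :-1].min(axis=0), X[:, :-1].max(axis=0)
y_min, y_max = Y_r[:, :-1].min(axis=0), Y_r[:, :-1].max(axis=0)
dx, dy = x_max - x_min, y_max - y_min

def denormalize_W_vec(W_norm):
    """Vectorized sketch of the weight conversion for Min-Max scaled data."""
    W = W_norm.astype(float).copy()
    # Weights: scale entry (i, j) by dy_j / dx_i
    W[:-1, :-1] = W_norm[:-1, :-1] * dy[None, :] / dx[:, None]
    # Biases: rescale, shift by y_min, remove the contribution of the input minima
    W[-1, :-1] = W_norm[-1, :-1] * dy + y_min - x_min @ W[:-1, :-1]
    return W

# Arbitrary normalized weights, keeping the affine column fixed
rng = np.random.default_rng(0)
W_n = rng.normal(size=(4, 3))
W_n[:, -1] = [0, 0, 0, 1]

X_n = X.copy()
X_n[:, :-1] = (X[:, :-1] - x_min) / dx

# Path 1: predict in normalized units, then map the outputs back
Y_from_norm = (X_n @ W_n)[:, :-1] * dy + y_min
# Path 2: predict directly in original units with the denormalized weights
Y_direct = (X @ denormalize_W_vec(W_n))[:, :-1]

print(np.allclose(Y_from_norm, Y_direct))  # True
```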


In [12]:
W__ = np.array([
    [2, 4, 0],
    [3, 2, 0],
    [1, 1, 0],
    [3, 2, 1]
])
Y__ =X @ W__
E__ = Y_r - Y__
print(f'Error: {max(abs(E__[:,0]))}')

Error: 0.8999999999999986


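For the time output, the maximum error of the approximation (the trained model, about 0.42) is smaller than that of the noise-free function that created the data (0.90). This is possible because training fits the weights to the noisy samples themselves.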
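
The cells above compare only the time column (index 0). The sketch below repeats the comparison for both outputs, using the denormalized weights as printed in this notebook (rounded to eight decimals) and the noise-free weights from `W__`. The helper name `max_abs_error` is introduced here for illustration.

```python
import numpy as np

X = np.array([
    [1, 1, 1, 1], [2, 0, 2, 1], [3, 2, 1, 1], [2, 1, 1, 1], [4, 3, 2, 1],
    [5, 2, 1, 1], [3, 2, 3, 1], [6, 4, 2, 1], [5, 3, 1, 1], [7, 5, 3, 1],
])
Y_r = np.array([
    [9.3, 9.2, 1], [8.7, 11.8, 1], [15.5, 19.4, 1], [10.8, 12.4, 1], [22.2, 25.7, 1],
    [19.6, 26.9, 1], [18.9, 20.1, 1], [29.5, 34.1, 1], [22.8, 31.3, 1], [35.4, 43.5, 1],
])

# Trained weights, copied from the printed value of W_
W_trained = np.array([
    [1.73502303, 4.20124071, 0.0],
    [3.35094286, 1.91467689, 0.0],
    [1.36381273, 0.43609024, 0.0],
    [2.65121523, 2.33038135, 1.0],
])
# Noise-free weights, as in W__
W_true = np.array([
    [2, 4, 0],
    [3, 2, 0],
    [1, 1, 0],
    [3, 2, 1],
])

def max_abs_error(W):
    """Largest absolute error per output column (time, cost), ignoring the affine column."""
    E = Y_r - X @ W
    return np.max(np.abs(E[:, :-1]), axis=0)

err_trained = max_abs_error(W_trained)
err_true = max_abs_error(W_true)
print("trained model (time, cost):", err_trained)
print("noise-free function (time, cost):", err_true)
```

With these values the trained model has the smaller maximum error on both outputs, although the margin is narrower for cost than for time.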

In [13]:
# Approximation (predictions of the trained model)
Y_

array([[ 9.10099386,  8.88238919,  1.        ],
       [ 8.84888676, 11.60504326,  1.        ],
       [15.92198278, 19.19954751,  1.        ],
       [10.83601689, 13.08362991,  1.        ],
       [22.37176141, 25.75155535,  1.        ],
       [19.39202885, 27.60202893,  1.        ],
       [18.64960825, 20.07172799,  1.        ],
       [29.19275034, 36.06871366,  1.        ],
       [22.74297171, 29.51670582,  1.        ],
       [35.64252897, 42.6207215 ,  1.        ]])

In [14]:
# Data created with noise
Y_r

array([[ 9.3,  9.2,  1. ],
       [ 8.7, 11.8,  1. ],
       [15.5, 19.4,  1. ],
       [10.8, 12.4,  1. ],
       [22.2, 25.7,  1. ],
       [19.6, 26.9,  1. ],
       [18.9, 20.1,  1. ],
       [29.5, 34.1,  1. ],
       [22.8, 31.3,  1. ],
       [35.4, 43.5,  1. ]])

In [15]:
# Data without noise
Y__

array([[ 9,  9,  1],
       [ 9, 12,  1],
       [16, 19,  1],
       [11, 13,  1],
       [22, 26,  1],
       [20, 27,  1],
       [18, 21,  1],
       [29, 36,  1],
       [23, 29,  1],
       [35, 43,  1]])