## Cost Function


#### Qa Given the following $\mathbf{x}^{(i)}$'s, construct and print the $\mathbf X$ matrix in Python.

In [8]:
# Qa

import numpy as np

y_true = np.array([1,2,3,4]) # NOTE:  you'll need this later

# Define the feature vectors x^(i)
x1 = np.array([1, 2, 3])
x2 = np.array([4, 2, 1])
x3 = np.array([3, 8, 5])
x4 = np.array([-9, -1, 0])

# Construct the X matrix where each row is (x^(i))^T
X = np.array([x1, x2, x3, x4])

print("Matrix X:")
print(X)
print(f"\nShape of X: {X.shape}")
print(f"X has {X.shape[0]} samples and {X.shape[1]} features")

Matrix X:
[[ 1  2  3]
 [ 4  2  1]
 [ 3  8  5]
 [-9 -1  0]]

Shape of X: (4, 3)
X has 4 samples and 3 features



#### Qb Implement the $\mathcal{L}_1$ and $\mathcal{L}_2$ norms for vectors in Python.

In [9]:
import math

def L1(x):
    """L1 norm using explicit implementation"""
    result = 0.0
    for i in range(len(x)):
        if x[i] >= 0:
            result = result + x[i]
        else:
            result = result + (-x[i])
    return result

def L2(x):
    """L2 norm using explicit implementation"""
    result = 0.0
    for i in range(len(x)):
        result = result + x[i] * x[i]
    return result ** 0.5

def L2Dot(x):
    """L2 norm using numpy dot product"""
    return (np.dot(x, x)) ** 0.5

# TEST vectors: here I test your implementation...calling your L1() and L2() functions
tx=np.array([1, 2, 3, -1])
ty=np.array([3,-1, 4,  1])

expected_d1=8.0
expected_d2=4.242640687119285

d1=L1(tx-ty)
d2=L2(tx-ty)

print(f"tx-ty={tx-ty}, d1-expected_d1={d1-expected_d1}, d2-expected_d2={d2-expected_d2}")

eps=1E-9 
# NOTE: remember to import 'math' for fabs for the next two lines..
assert math.fabs(d1-expected_d1)<eps, "L1 dist seems to be wrong" 
assert math.fabs(d2-expected_d2)<eps, "L2 dist seems to be wrong" 

print("OK(part-1)")

# comment-in once your L2Dot fun is ready...
d2dot=L2Dot(tx-ty)
print("d2dot-expected_d2=",d2dot-expected_d2)
assert math.fabs(d2dot-expected_d2)<eps, "L2Dot dist seems to be wrong" 
print("OK(part-2)")

tx-ty=[-2  3 -1 -2], d1-expected_d1=0.0, d2-expected_d2=0.0
OK(part-1)
d2dot-expected_d2= 0.0
OK(part-2)


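Why `L1` is defined this way:
Explanation: The absolute value of each element is computed manually with an if/else, without using the built-in `abs()`.

Why `L2Dot` is defined this way:
Explanation: `np.dot(x, x)` computes $\mathbf x^T \mathbf x$, which equals the sum of squares. Because the loop runs inside NumPy's compiled code rather than in the Python interpreter, this is much faster than an explicit loop for large arrays; this is vectorization.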
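
As a cross-check, the hand-written norms can be compared against NumPy's built-in `np.linalg.norm`. The sketch below is self-contained and uses the hypothetical helper names `l1_vec` and `l2_vec` (they are not part of the exercise); it reuses the test vectors `tx` and `ty` from the cell above.

```python
import numpy as np

def l1_vec(x):
    """L1 norm, vectorized: sum of absolute values (hypothetical helper name)."""
    return np.sum(np.abs(x))

def l2_vec(x):
    """L2 norm, vectorized: square root of x^T x (same idea as L2Dot)."""
    return np.sqrt(np.dot(x, x))

# Same difference vector as in the test above: tx - ty
d = np.array([1, 2, 3, -1]) - np.array([3, -1, 4, 1])

# For 1-D input, np.linalg.norm with ord=1 / ord=2 gives the reference values
assert np.isclose(l1_vec(d), np.linalg.norm(d, ord=1))
assert np.isclose(l2_vec(d), np.linalg.norm(d, ord=2))
print(l1_vec(d), l2_vec(d))
```

If both assertions pass, the vectorized versions agree with the library on this input, which is a useful sanity check before relying on either.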

#### Qc Construct the Root Mean Square Error (RMSE) function (Equation 2-1 [HOML]).

In [None]:
def RMSE(y_pred, y_true):
    """Root Mean Square Error using L2 norm"""
    diff = y_pred - y_true
    n = len(y_true)
    mse = (L2(diff) ** 2) / n
    return mse ** 0.5


# Dummy h function:
def h(X):    
    if X.ndim!=2:
        raise ValueError(f"expected X to be of ndim=2, got ndim={X.ndim}") 
    if X.shape[0]==0 or X.shape[1]==0:
        raise ValueError("X got zero data along the 0/1 axis, cannot continue")
    return X[:,0]

# Calls your RMSE() function:
r=RMSE(h(X), y_true)

# TEST vector:
eps=1E-9
expected=6.57647321898295
print(f"RMSE={r}, diff={r-expected}")
assert math.fabs(r-expected)<eps, "your RMSE dist seems to be wrong" 

print("OK")

RMSE=6.576473218982953, diff=2.6645352591003757e-15
OK


```Python
def RMSE(y_pred, y_true):
    """Root Mean Square Error using L2 norm"""
    diff = y_pred - y_true
    n = len(y_true)
    mse = (L2(diff) ** 2) / n
    return mse ** 0.5
```
Explanation:
`L2(diff)**2` gives the sum of squared differences. Dividing by `n` gives the mean squared error (MSE), and taking the square root of that gives the RMSE. This shows the mathematical relationship between the L2 norm and the MSE.
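
Written out, with $\mathbf d = \mathbf y_{pred} - \mathbf y_{true}$ and $n$ samples (the symbol $\mathbf d$ is introduced here only for illustration), the relationship the code relies on is:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^2} = \sqrt{\frac{\lVert \mathbf d \rVert_2^2}{n}} = \frac{\lVert \mathbf d \rVert_2}{\sqrt{n}}
$$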

```Python
def h(X):
    return X[:,0]
```
Explanation: Takes first column as prediction. This simulates a simple ML model that uses only the first feature to predict the target. Used for testing cost functions.

#### Qd Similarly, construct the Mean Absolute Error (MAE) function (Equation 2-2 [HOML]) and evaluate it.

In [11]:
def MAE(y_pred, y_true):
    """Mean Absolute Error using L1 norm"""
    diff = y_pred - y_true
    n = len(y_true)
    return L1(diff) / n


# Calls your MAE function:
r=MAE(h(X), y_true)

# TEST vector:
expected=3.75
print(f"MAE={r}, diff={r-expected}")
assert math.fabs(r-expected)<eps, "MAE dist seems to be wrong" 

print("OK")

MAE=3.75, diff=0.0
OK
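
The two expected test values (RMSE of about 6.5765 and MAE of 3.75) can be reproduced step by step. The following is a small self-contained sketch that repeats the data from Qa and writes out the intermediate arithmetic in comments; it uses NumPy directly instead of the `L1`/`L2` functions so that it runs on its own.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 2, 1], [3, 8, 5], [-9, -1, 0]])
y_true = np.array([1, 2, 3, 4])

y_pred = X[:, 0]        # what h(X) returns: [ 1  4  3 -9]
diff = y_pred - y_true  # [  0   2   0 -13]

sum_sq = np.sum(diff ** 2)              # 0 + 4 + 0 + 169 = 173
rmse = np.sqrt(sum_sq / len(diff))      # sqrt(173 / 4) = sqrt(43.25)
mae = np.sum(np.abs(diff)) / len(diff)  # (0 + 2 + 0 + 13) / 4 = 3.75

print(rmse, mae)
```

The single large error of 13 contributes 169 of the 173 in the squared sum, which is why the RMSE ends up well above the MAE here.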


#### Qe Robust Code

In [12]:
# Testing error handling

# Let's check what happens with bad inputs
print("Testing with wrong inputs...")

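try:
    L1(np.array([[1,2],[3,4]]))  # 2D instead of 1D
except ValueError:
    # raised by NumPy when the if-test sees a whole row instead of a scalar
    print("L1 failed with 2D array - good")

try:
    L2(np.array([]))  # empty
except ValueError:
    # NOTE: never reached; L2 returns 0.0 for an empty array without raising
    print("L2 failed with empty array - good")
    
try:
    RMSE(np.array([1,2,3]), np.array([1,2]))  # different sizes
except ValueError:
    # raised by NumPy because shapes (3,) and (2,) cannot be broadcast
    print("RMSE failed with different sizes - good")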

# Test normal usage still works
x = np.array([1,-2,3])
print(f"L1 norm of [1,-2,3]: {L1(x)}")
print(f"L2 norm of [1,-2,3]: {L2(x)}")

y_pred = np.array([2,3,4])
y_true = np.array([1,2,3]) 
print(f"RMSE: {RMSE(y_pred, y_true)}")
print(f"MAE: {MAE(y_pred, y_true)}")

print("Done testing")

Testing with wrong inputs...
L1 failed with 2D array - good
RMSE failed with different sizes - good
L1 norm of [1,-2,3]: 6.0
L2 norm of [1,-2,3]: 3.7416573867739413
RMSE: 1.0
MAE: 1.0
Done testing
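
Note that the output above contains no message for the empty-array case: `L2(np.array([]))` returns `0.0` instead of raising, and the other two cases fail only as a side effect of NumPy's own errors. A minimal sketch of explicit input validation could look like the following. The names `check_vector`, `L2_checked` and `RMSE_checked` are hypothetical and not part of the exercise.

```python
import numpy as np

def check_vector(x, name="x"):
    """Hypothetical helper: return x as a non-empty 1-D float array or raise ValueError."""
    x = np.asarray(x, dtype=float)
    if x.ndim != 1:
        raise ValueError(f"expected {name} to be 1-D, got ndim={x.ndim}")
    if x.size == 0:
        raise ValueError(f"{name} is empty")
    return x

def L2_checked(x):
    """L2 norm with input validation (assumed name, not from the exercise)."""
    x = check_vector(x)
    return float(np.sqrt(np.dot(x, x)))

def RMSE_checked(y_pred, y_true):
    """RMSE that rejects empty, non-1-D, or mismatched inputs up front."""
    y_pred = check_vector(y_pred, "y_pred")
    y_true = check_vector(y_true, "y_true")
    if y_pred.shape != y_true.shape:
        raise ValueError(f"shape mismatch: {y_pred.shape} vs {y_true.shape}")
    return L2_checked(y_pred - y_true) / np.sqrt(y_true.size)

# Both bad inputs are now rejected with a descriptive message
for bad in (np.array([]), np.array([[1, 2], [3, 4]])):
    try:
        L2_checked(bad)
    except ValueError as e:
        print("rejected:", e)
```

Raising a `ValueError` with a message that names the offending argument makes failures easier to trace than relying on whatever error NumPy happens to produce.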


#### Qf Conclusion

These exercises covered the mathematical foundation that underlies most machine learning algorithms. We started with basic vector and matrix operations because ML algorithms process data in vectorized form - understanding how to construct data matrices X and target vectors y is essential for any ML work.

The norm functions (L1 and L2) are fundamental because they measure distances between data points. Many ML algorithms need to quantify how far predictions are from actual values, which is exactly what these norms do. The L2 norm is especially important because its square is differentiable everywhere, which makes it a convenient basis for gradient-based optimization.

The cost functions RMSE and MAE represent how we measure prediction quality in machine learning. RMSE (built on the L2 norm) is widely used because squaring the errors penalizes large errors heavily, while MAE (built on the L1 norm) is more robust to outliers. Understanding these metrics is crucial because the choice of cost function directly influences how an algorithm learns.
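
The difference in outlier sensitivity can be illustrated with a small made-up example. The data and the lowercase helper names `rmse` and `mae` below are assumptions for illustration only, not values from the exercise.

```python
import numpy as np

def rmse(y_pred, y_true):
    """Root mean square error, vectorized."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mae(y_pred, y_true):
    """Mean absolute error, vectorized."""
    return np.mean(np.abs(y_pred - y_true))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_close = y_true + 1.0                       # every prediction is off by 1
y_outlier = np.array([2.0, 3.0, 4.0, 14.0])  # last prediction is off by 10

# Without the outlier both metrics are 1.0
print(rmse(y_close, y_true), mae(y_close, y_true))
# With the outlier: MAE = 13/4 = 3.25, RMSE = sqrt(103/4), roughly 5.07
print(rmse(y_outlier, y_true), mae(y_outlier, y_true))
```

A single bad prediction moves the RMSE noticeably further than the MAE, because its error is squared before averaging.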

Building these functions from scratch, rather than calling library functions, helped us understand what actually happens "under the hood" when we use those libraries. Error handling is critical to have in real ML projects, where data quality varies.