## Multiple Variable Linear Regression

### 1.1 Goals
- Extend our regression model routines to support multiple features
    - Extend data structures to support multiple features
    - Rewrite prediction, cost and gradient routines to support multiple features

### 1.2 Tools

In [59]:
import copy, math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### 2. Problem Statement
- Task: predict a student's chance of graduate admission
- X variables: GRE Score, TOEFL Score, University Rating, SOP, LOR, CGPA, Research
- Y variable: Chance of Admit

In [6]:
admission_df = pd.read_csv("~/Downloads/prediction of Graduate Admissions/Admission_Predict_Ver1.1.csv")
admission_df.head()

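|   | Serial No. | GRE Score | TOEFL Score | University Rating | SOP | LOR | CGPA | Research | Chance of Admit |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 337 | 118 | 4 | 4.5 | 4.5 | 9.65 | 1 | 0.92 |
| 1 | 2 | 324 | 107 | 4 | 4.0 | 4.5 | 8.87 | 1 | 0.76 |
| 2 | 3 | 316 | 104 | 3 | 3.0 | 3.5 | 8.0 | 1 | 0.72 |
| 3 | 4 | 322 | 110 | 3 | 3.5 | 2.5 | 8.67 | 1 | 0.8 |
| 4 | 5 | 314 | 103 | 2 | 2.0 | 3.0 | 8.21 | 0 | 0.65 |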


In [20]:
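# x_train holds the input variables; y_train holds the target.
# The target column name used here ends with a trailing space.
# 'Serial No.' is a row identifier rather than one of the X variables listed
# above; it is not dropped here, so x_train has 8 columns.
x_train = admission_df.drop(columns=['Chance of Admit '])
y_train = admission_df['Chance of Admit ']
print(f"x_train = {x_train}")
print(f"y_train = {y_train}")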

x_train =      Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR   CGPA  \
0             1        337          118                  4  4.5   4.5  9.65   
1             2        324          107                  4  4.0   4.5  8.87   
2             3        316          104                  3  3.0   3.5  8.00   
3             4        322          110                  3  3.5   2.5  8.67   
4             5        314          103                  2  2.0   3.0  8.21   
..          ...        ...          ...                ...  ...   ...   ...   
495         496        332          108                  5  4.5   4.0  9.02   
496         497        337          117                  5  5.0   5.0  9.87   
497         498        330          120                  5  4.5   5.0  9.56   
498         499        312          103                  4  4.0   5.0  8.43   
499         500        327          113                  4  4.5   4.5  9.04   

     Research  
0           1  
1        

### 2.1 Matrix X

In [21]:
print(f"X shape: {x_train.shape}, X Type:{type(x_train)}")
print(x_train)
print(f"y Shape: {y_train.shape}, y Type:{type(y_train)}")
print(y_train)

X shape: (500, 8), X Type:<class 'pandas.core.frame.DataFrame'>
     Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR   CGPA  \
0             1        337          118                  4  4.5   4.5  9.65   
1             2        324          107                  4  4.0   4.5  8.87   
2             3        316          104                  3  3.0   3.5  8.00   
3             4        322          110                  3  3.5   2.5  8.67   
4             5        314          103                  2  2.0   3.0  8.21   
..          ...        ...          ...                ...  ...   ...   ...   
495         496        332          108                  5  4.5   4.0  9.02   
496         497        337          117                  5  5.0   5.0  9.87   
497         498        330          120                  5  4.5   5.0  9.56   
498         499        312          103                  4  4.0   5.0  8.43   
499         500        327          113                  4  4.5   4

### 2.2 Parameter vector w, b

- $\mathbf{w}$ is a vector with $n$ elements.
  - Each element contains the parameter associated with one feature.

$$\mathbf{w} = \begin{pmatrix}
w_0 \\ 
w_1 \\
\vdots\\
w_{n-1}
\end{pmatrix}
$$

- $b$ is a scalar parameter.

In [68]:
b_init = 0
w_init = np.random.rand(x_train.shape[1]) * 0.01

### 3. Model Prediction With Multiple Variables

The model's prediction with multiple variables is given by the linear model:

$$ f_{\mathbf{w},b}(\mathbf{x}) = w_0x_0 + w_1x_1 + \cdots + w_{n-1}x_{n-1} + b \tag{1}$$
or in vector notation:
$$ f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b  \tag{2} $$ 
where $\cdot$ denotes the vector dot product.
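
To see that equations (1) and (2) describe the same computation, the sketch below evaluates both on a small made-up example. The names `predict_loop`, `predict_dot`, `x_demo`, `w_demo` and `b_demo` are hypothetical and the numbers are not taken from the admissions dataset.

```python
import numpy as np

def predict_loop(x, w, b):
    """Equation (1): add up w_j * x_j one feature at a time, then add b."""
    p = 0.0
    for j in range(x.shape[0]):
        p += w[j] * x[j]
    return p + b

def predict_dot(x, w, b):
    """Equation (2): the same sum written as a dot product."""
    return np.dot(w, x) + b

# Illustrative values only (assumed, not from the dataset)
x_demo = np.array([1.0, 2.0, 3.0])
w_demo = np.array([0.5, -1.0, 2.0])
b_demo = 4.0

p_loop = predict_loop(x_demo, w_demo, b_demo)
p_dot = predict_dot(x_demo, w_demo, b_demo)
print(p_loop, p_dot)  # both are 0.5 - 2.0 + 6.0 + 4.0 = 8.5
```

The dot-product form is usually preferred because NumPy performs the multiply-and-sum in compiled code rather than in a Python loop.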

### 3.1 Single Prediction, Vector

In [46]:
def predict(x, w, b):
    """
    Prediction using linear regression

    Args:
        x (pd.Series or pd.DataFrame): Shape (n,) for one example,
            or shape (m, n) for m examples
        w (pd.Series or np.ndarray): Shape (n,) model parameters (weights)
        b (scalar): model parameter (bias)

    Returns:
        p (scalar or np.ndarray): one prediction, or shape (m,) predictions
    """
    
    # Dot product of x and w (multiply element-wise, then sum over features)
    p = np.dot(x, w) + b

    return p
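
Because `np.dot` accepts both a vector and a matrix as its first argument, the same one-line `predict` can return a single prediction or one prediction per row. The following is a minimal sketch with made-up NumPy arrays (`X_demo`, `w_demo`, `b_demo` are hypothetical); it restates `predict` so the block runs on its own.

```python
import numpy as np

def predict(x, w, b):
    """Linear model: works for x of shape (n,) or (m, n)."""
    return np.dot(x, w) + b

# Illustrative data only: m = 3 examples, n = 2 features
X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])
w_demo = np.array([0.1, 0.2])
b_demo = 1.0

p_single = predict(X_demo[0], w_demo, b_demo)  # one row in, one scalar out
p_all = predict(X_demo, w_demo, b_demo)        # whole matrix in, shape (3,) out

print(np.ndim(p_single), p_all.shape)
```

Passing a single row, for example `x_train.iloc[0]`, would give the scalar case; passing the full `x_train` gives the per-row case.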

In [51]:
# Make a prediction for every row of x_train
f_wb = predict(x_train, w_init, b_init)

print("Predictions for all samples:", f_wb)

Predictions for all samples: [331.85906231 318.54890823 310.44287473 317.74486993 308.54573589
 328.70070102 318.97736847 306.24410526 300.54654499 322.09731638
 324.50488938 329.27953371 331.10567009 312.9879796  314.80487352
 318.08352232 322.36167133 324.53586714 325.48090892 311.56914805
 320.38524091 333.63930332 340.68227733 346.89597171 348.24243158
 353.06946705 335.34100507 310.47535568 306.67012378 321.16366348
 315.16671134 340.52311999 354.59017726 356.56520093 349.98257152
 341.80171797 321.99059371 320.11641808 324.9295619  330.70587169
 333.0896711  337.61643615 336.40208447 357.67336453 353.24449975
 350.12810901 357.4228436  367.21437663 350.16742024 355.51634645
 340.83016945 339.86270484 364.77678405 356.20468335 353.81305504
 350.63986534 347.21067986 333.20099023 334.67025142 345.41878668
 344.3215719  344.72739146 343.57823553 354.29970354 363.82417383
 365.55633307 367.59973558 357.89295219 361.49889614 372.86116357
 378.81584291 381.01322177 369.41798448 362.523

### 4. Compute Cost With Multiple Variables
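
Written out, the cost that the comment inside `compute_cost` describes is the squared-error cost, with $m$ the number of examples and $\mathbf{x}^{(i)}$, $y^{(i)}$ denoting the $i$-th example and its target (this superscript notation is an assumption added here for clarity):

$$J(\mathbf{w},b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)^2$$

where $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b$ is the model prediction from equation (2).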

In [54]:
def compute_cost(X, y, w, b): 
    """
    Compute cost using pandas for linear regression.
    
    Args:
      X (pd.DataFrame): Data, m examples with n features
      y (pd.Series)   : Target values
      w (np.ndarray or pd.Series): Model parameters (weights)
      b (scalar)      : Model parameter (bias)
      
    Returns:
      cost (scalar): The computed cost
    """
    m = X.shape[0]
    
    # Compute predictions: f_wb = X.dot(w) + b
    predictions = X.dot(w) + b  # Shape: (m,)
    
    # Compute the cost: J(w, b) = (1/2m) * sum((f_wb_i - y_i)^2)
    cost = ((predictions - y) ** 2).sum() / (2 * m)
    
    return cost
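
As a sanity check on the cost formula, the sketch below evaluates it by hand-checkable numbers. It uses plain NumPy arrays instead of pandas objects, and `compute_cost_np`, `X_demo`, `y_demo` and `w_exact` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def compute_cost_np(X, y, w, b):
    """Squared-error cost: sum of squared errors divided by 2m."""
    m = X.shape[0]
    errors = X @ w + b - y          # shape (m,)
    return np.sum(errors ** 2) / (2 * m)

# Illustrative data chosen so that y = 1*x0 + 2*x1 exactly
X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
y_demo = np.array([5.0, 11.0])
w_exact = np.array([1.0, 2.0])

# With the exact parameters every error is 0, so the cost is 0
cost_exact = compute_cost_np(X_demo, y_demo, w_exact, 0.0)

# Shifting the bias by 1 makes every error equal to 1:
# cost = (1^2 + 1^2) / (2 * 2) = 0.5
cost_off = compute_cost_np(X_demo, y_demo, w_exact, 1.0)

print(cost_exact, cost_off)
```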

In [56]:
cost = compute_cost(x_train, y_train, w_init, b_init)
print(f'Cost at initial w : {cost}')

Cost at initial w : 124488.76611348269


### 5. Gradient Descent With Multiple Variables
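
For reference, the gradient and update code in this section corresponds to the following expressions, where $\alpha$ is the learning rate, $m$ is the number of examples, and $x_j^{(i)}$ is feature $j$ of example $i$ (the superscript notation is an assumption added here for clarity). The partial derivatives of the squared-error cost $J(\mathbf{w},b)$ are:

$$\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

$$\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)$$

Each iteration then updates all parameters simultaneously:

$$w_j := w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \quad \text{for } j = 0, \ldots, n-1, \qquad b := b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b}$$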

In [71]:
def compute_gradient(X, y, w, b):
    """
    Computes the gradient for linear regression using pandas.

    Args:
      X (pd.DataFrame): Data, m examples with n features
      y (pd.Series): target values
      w (np.ndarray): model parameters (weights)
      b (scalar): model parameter (bias)

    Returns:
      dj_db (scalar): The gradient of the cost w.r.t. the parameter b. 
      dj_dw (np.ndarray): The gradient of the cost w.r.t. the parameters w. 
    """
    m, n = X.shape  # number of examples (m) and number of features (n)
    dj_dw = np.zeros(n)  # Initialize the gradient for weights
    dj_db = 0.0  # Initialize the gradient for bias

    # Loop through all examples
    for i in range(m):
        f_wb = np.dot(X.iloc[i], w) + b  # Model prediction
        error = f_wb - y.iloc[i]         # Error: prediction - actual value
        
        # Accumulate gradients
        dj_db += error
        dj_dw += error * X.iloc[i].values  # scalar error times the feature vector
        
    dj_dw /= m  # Average gradient for weights
    dj_db /= m  # Average gradient for bias

    return dj_db, dj_dw
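
The per-example loop can also be written with matrix operations. The sketch below is one possible vectorized form, not the notebook's own routine: it works on NumPy arrays (a DataFrame could be converted with `.to_numpy()`), and `compute_gradient_loop`, `compute_gradient_vectorized` and the random demo data are hypothetical names and values. It checks that both versions agree.

```python
import numpy as np

def compute_gradient_loop(X, y, w, b):
    """Reference version: accumulate the gradient one example at a time."""
    m, n = X.shape
    dj_dw = np.zeros(n)
    dj_db = 0.0
    for i in range(m):
        error = np.dot(X[i], w) + b - y[i]
        dj_db += error
        dj_dw += error * X[i]
    return dj_db / m, dj_dw / m

def compute_gradient_vectorized(X, y, w, b):
    """Same gradient using matrix products instead of a Python loop."""
    m = X.shape[0]
    errors = X @ w + b - y        # shape (m,)
    dj_db = errors.sum() / m
    dj_dw = X.T @ errors / m      # shape (n,)
    return dj_db, dj_dw

# Illustrative random data (assumed, not the admissions dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = rng.normal(size=20)
w_demo = rng.normal(size=3)
b_demo = 0.5

db_loop, dw_loop = compute_gradient_loop(X_demo, y_demo, w_demo, b_demo)
db_vec, dw_vec = compute_gradient_vectorized(X_demo, y_demo, w_demo, b_demo)
print(np.isclose(db_loop, db_vec), np.allclose(dw_loop, dw_vec))
```

Both functions return the bias gradient first and the weight gradient second, matching the return order of `compute_gradient`.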

In [72]:
def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): 
    """
    Performs gradient descent to learn w and b.

    Args:
      X (pd.DataFrame or np.ndarray): Data, m examples with n features
      y (pd.Series or np.ndarray): target values
      w_in (np.ndarray): initial model parameters  
      b_in (scalar): initial model parameter
      cost_function: function to compute cost
      gradient_function: function to compute the gradient
      alpha (float): Learning rate
      num_iters (int): number of iterations to run gradient descent

    Returns:
      w (np.ndarray): Updated weights 
      b (scalar): Updated bias
      J_history (list): Cost history over iterations
    """
    
    J_history = []  # List to store cost at each iteration
    w = np.copy(w_in)  # Avoid modifying original w
    b = b_in
    
    # Gradient descent loop
    for i in range(num_iters):

        # Compute gradients
        dj_db, dj_dw = gradient_function(X, y, w, b)

        # Gradient clipping (optional)
        max_gradient = 1e4  # Threshold to clip the gradients (aggressive clipping)
        dj_dw = np.clip(dj_dw, -max_gradient, max_gradient)
        dj_db = np.clip(dj_db, -max_gradient, max_gradient)

        # Update parameters (weights and bias)
        w -= alpha * dj_dw
        b -= alpha * dj_db

        # Compute and store the cost at each iteration
        J_history.append(cost_function(X, y, w, b))

        # Print the cost at ten evenly spaced iterations
        if i % max(1, num_iters // 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}")

    return w, b, J_history
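
The optional clipping step inside `gradient_descent` relies on `np.clip`, which limits each element independently to the given range. A small sketch with made-up gradient values (`dj_dw_demo` is hypothetical) shows the effect:

```python
import numpy as np

max_gradient = 1e4  # same threshold as in gradient_descent

# Illustrative gradient: one very large negative entry, one small entry,
# and one entry slightly above the threshold (values are assumed)
dj_dw_demo = np.array([-2.5e6, 3.0, 1.2e4])

clipped = np.clip(dj_dw_demo, -max_gradient, max_gradient)
print(clipped)  # entries outside [-1e4, 1e4] are replaced by the nearest bound
```

Because the clipping is applied element by element, it can change the direction of the gradient vector as well as its length; entries already inside the range are left untouched.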

In [73]:
# Learning rate and number of iterations
alpha = 1e-5
num_iters = 1000

w_final, b_final, cost_history = gradient_descent(x_train, y_train, w_init, b_init, compute_cost, compute_gradient, alpha, num_iters)

print("Final weights:", w_final)
print("Final bias:", b_final)

Iteration    0: Cost     2.38
Iteration  100: Cost     0.01
Iteration  200: Cost     0.01
Iteration  300: Cost     0.01
Iteration  400: Cost     0.01
Iteration  500: Cost     0.01
Iteration  600: Cost     0.01
Iteration  700: Cost     0.01
Iteration  800: Cost     0.01
Iteration  900: Cost     0.01
Final weights: [ 2.65240622e-05 -3.98296060e-04  7.01771478e-03  1.24036394e-03
  9.75474123e-03  5.36392819e-03  3.87731375e-03  5.09973415e-03]
Final bias: -3.615700555180322e-05
