## Multiple Variable Linear Regression

### 1.1 Goals
- Extend our regression model routines to support multiple features
    - Extend data structures to support multiple features
    - Rewrite prediction, cost and gradient routines to support multiple features

### 1.2 Tools

In [1]:
import copy, math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### 2. Problem Statement
- Task: predict a graduate applicant's chance of admission
- X variables: GRE Score, TOEFL Score, University Rating, SOP, LOR, CGPA, Research
- Y variable: Chance of Admit

In [2]:
admission_df = pd.read_csv("~/Downloads/prediction of Graduate Admissions/Admission_Predict_Ver1.1.csv")
admission_df.head()

Unnamed: 0,Serial No.,GRE Score,TOEFL Score,University Rating,SOP,LOR,CGPA,Research,Chance of Admit
0,1,337,118,4,4.5,4.5,9.65,1,0.92
1,2,324,107,4,4.0,4.5,8.87,1,0.76
2,3,316,104,3,3.0,3.5,8.0,1,0.72
3,4,322,110,3,3.5,2.5,8.67,1,0.8
4,5,314,103,2,2.0,3.0,8.21,0,0.65


In [3]:
# x_train holds the input features; y_train holds the target.
# Note: the target column name in this CSV ends with a trailing space.
x_train = admission_df.drop(columns=['Chance of Admit ', 'Serial No.'])
y_train = admission_df['Chance of Admit ']
print(f"x_train = {x_train}")
print(f"y_train = {y_train}")

x_train =      GRE Score  TOEFL Score  University Rating  SOP  LOR   CGPA  Research
0          337          118                  4  4.5   4.5  9.65         1
1          324          107                  4  4.0   4.5  8.87         1
2          316          104                  3  3.0   3.5  8.00         1
3          322          110                  3  3.5   2.5  8.67         1
4          314          103                  2  2.0   3.0  8.21         0
..         ...          ...                ...  ...   ...   ...       ...
495        332          108                  5  4.5   4.0  9.02         1
496        337          117                  5  5.0   5.0  9.87         1
497        330          120                  5  4.5   5.0  9.56         1
498        312          103                  4  4.0   5.0  8.43         0
499        327          113                  4  4.5   4.5  9.04         0

[500 rows x 7 columns]
y_train = 0      0.92
1      0.76
2      0.72
3      0.80
4      0.65
       .

### 2.1 Matrix X
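
As a notational sketch, the training examples can be viewed as a matrix with one row per example and one column per feature. The symbols $m$ (number of examples), $n$ (number of features) and the superscript $(i)$ (example index) are assumptions that follow a common convention; they are not defined elsewhere in this notebook.

$$\mathbf{X} = \begin{pmatrix}
x^{(0)}_0 & x^{(0)}_1 & \cdots & x^{(0)}_{n-1} \\
x^{(1)}_0 & x^{(1)}_1 & \cdots & x^{(1)}_{n-1} \\
\vdots & \vdots & \ddots & \vdots \\
x^{(m-1)}_0 & x^{(m-1)}_1 & \cdots & x^{(m-1)}_{n-1}
\end{pmatrix}
$$

For the admissions data, $m = 500$ and $n = 7$, which matches the shape `(500, 7)` of `x_train`.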

In [4]:
print(f"X shape: {x_train.shape}, X Type:{type(x_train)}")
print(x_train)
print(f"y Shape: {y_train.shape}, y Type:{type(y_train)}")
print(y_train)

X shape: (500, 7), X Type:<class 'pandas.core.frame.DataFrame'>
     GRE Score  TOEFL Score  University Rating  SOP  LOR   CGPA  Research
0          337          118                  4  4.5   4.5  9.65         1
1          324          107                  4  4.0   4.5  8.87         1
2          316          104                  3  3.0   3.5  8.00         1
3          322          110                  3  3.5   2.5  8.67         1
4          314          103                  2  2.0   3.0  8.21         0
..         ...          ...                ...  ...   ...   ...       ...
495        332          108                  5  4.5   4.0  9.02         1
496        337          117                  5  5.0   5.0  9.87         1
497        330          120                  5  4.5   5.0  9.56         1
498        312          103                  4  4.0   5.0  8.43         0
499        327          113                  4  4.5   4.5  9.04         0

[500 rows x 7 columns]
y Shape: (500,), y Type:

### 2.2 Parameter vector w, b

- $\mathbf{w}$ is a vector with $n$ elements
  - Each element contains the parameter associated with one feature.

$$\mathbf{w} = \begin{pmatrix}
w_0 \\ 
w_1 \\
\vdots\\
w_{n-1}
\end{pmatrix}
$$

- $b$ is a scalar parameter.

In [5]:
b_init = 0
w_init = np.random.rand(x_train.shape[1]) * 0.01

### 3. Model Prediction With Multiple Variables

The model's prediction with multiple variables is given by the linear model:

$$ f_{\mathbf{w},b}(\mathbf{x}) = w_0x_0 + w_1x_1 + \dots + w_{n-1}x_{n-1} + b \tag{1}$$
or in vector notation:
$$ f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b  \tag{2} $$
where $\cdot$ denotes the vector dot product.
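
To make the equivalence of equations (1) and (2) concrete, here is a minimal sketch that evaluates both forms on one example. The values of `x`, `w` and `b` are hypothetical toy numbers chosen only for illustration; they are not taken from the trained model.

```python
import numpy as np

# Hypothetical toy values, chosen only for illustration
x = np.array([337.0, 118.0, 4.0])   # one example with n = 3 features
w = np.array([0.01, 0.02, 0.03])    # one weight per feature
b = 0.5                             # scalar bias

# Equation (1): add the products w_j * x_j one feature at a time, starting from b
f_loop = b
for j in range(x.shape[0]):
    f_loop += w[j] * x[j]

# Equation (2): the same quantity computed as a single dot product
f_dot = np.dot(w, x) + b

print(f_loop, f_dot)
```

Both forms give the same number up to floating-point rounding; the dot product is shorter and usually faster because NumPy performs the loop in compiled code.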

### 3.1 Single Prediction, Vector

In [6]:
def predict(x, w, b):
    """
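    Prediction using linear regression.

    Works for a single example or, because np.dot also accepts 2-D input,
    for a whole matrix of examples at once.

    Args:
        x (pd.Series or pd.DataFrame): Shape (n,) single example, or shape (m, n) matrix of examples
        w (pd.Series or np.ndarray): Shape (n,) model parameters (weights)
        b (scalar): model parameter (bias)

    Returns:
        p (scalar or np.ndarray): prediction; shape (m,) when x is a matrix
    """

    # Dot product of x and w (multiply element-wise, then sum over the features)
    p = np.dot(x, w) + b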

    return p

In [7]:
# Predict for every row of x_train at once, using the initial parameters
f_wb = predict(x_train, w_init, b_init)

print("Predictions for all samples:", f_wb)

Predictions for all samples: [4.23175909 4.00517072 3.88374543 3.99471127 3.83603948 4.13736431
 3.97893689 3.77841933 3.70599779 3.98428251 3.99114853 4.07019341
 4.08876281 3.85833682 3.83618909 3.87239316 3.9263443  3.93664325
 3.9639787  3.74878606 3.86839251 4.0538104  4.14511939 4.22469616
 4.22734736 4.28263438 4.00802487 3.64019254 3.55954791 3.75073147
 3.66673001 3.98515812 4.22559313 4.2177922  4.12426409 4.01627039
 3.75638111 3.70941465 3.76160741 3.84308852 3.87185997 3.87800963
 3.8680042  4.17133695 4.08819997 4.02850111 4.12487056 4.25868045
 3.99967933 4.05685876 3.80074953 3.78995011 4.17074657 4.04092903
 3.99148056 3.90382184 3.84843396 3.6688023  3.67352108 3.81350306
 3.77259362 3.77942148 3.77875289 3.9060339  4.02863674 4.0485178
 4.07273276 3.91293327 3.958437   4.11962609 4.19999857 4.17859211
 4.03503215 3.92802345 3.891423   4.08334591 4.05619486 3.68648738
 3.60585386 3.54592643 3.84510375 4.28500489 4.01356686 4.0693902
 4.23477952 3.91877593 3.90628258 3

### 4. Compute Cost With Multiple Variables
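
The squared-error cost that the code in this section implements can be written as follows. The symbols $m$ (number of examples) and the superscript $(i)$ (example index) are assumptions following the usual convention; the formula itself mirrors the comment inside `compute_cost`.

$$ J(\mathbf{w},b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2 $$

where $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b$ is the model's prediction for example $i$.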

In [8]:
def compute_cost(X, y, w, b): 
    """
    Compute cost using pandas for linear regression.
    
    Args:
      X (pd.DataFrame): Data, m examples with n features
      y (pd.Series)   : Target values
      w (np.ndarray)  : Model parameters (weights), shape (n,)
      b (scalar)      : Model parameter (bias)
      
    Returns:
      cost (scalar): The computed cost
    """
    m = X.shape[0]
    
    # Compute predictions: f_wb = X.dot(w) + b
    predictions = X.dot(w) + b  # Shape: (m,)
    
    # Compute the cost: J(w, b) = (1/2m) * sum((f_wb_i - y_i)^2)
    cost = ((predictions - y) ** 2).sum() / (2 * m)
    
    return cost

In [9]:
cost = compute_cost(x_train, y_train, w_init, b_init)
print(f'Cost at initial w : {cost}')

Cost at initial w : 5.124632281692698


### 5. Gradient Descent With Multiple Variables
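
As a sketch of the math behind this section, these are the partial derivatives of the squared-error cost $J(\mathbf{w},b)$ computed by `compute_cost`, followed by the update that gradient descent repeats. The symbols $m$, $(i)$ and $\alpha$ (learning rate) are assumptions following the usual convention.

$$ \frac{\partial J(\mathbf{w},b)}{\partial w_j} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)} $$

$$ \frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) $$

Each iteration then updates all parameters simultaneously:

$$ w_j := w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \quad \text{for } j = 0, \dots, n-1, \qquad b := b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} $$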

In [10]:
def compute_gradient(X, y, w, b):
    """
    Computes the gradient for linear regression using pandas.

    Args:
      X (pd.DataFrame): Data, m examples with n features
      y (pd.Series): target values
      w (np.ndarray): model parameters (weights)
      b (scalar): model parameter (bias)

    Returns:
      dj_db (scalar): The gradient of the cost w.r.t. the parameter b. 
      dj_dw (np.ndarray): The gradient of the cost w.r.t. the parameters w. 
    """
    m, n = X.shape  # number of examples (m) and number of features (n)
    dj_dw = np.zeros(n)  # Initialize the gradient for weights
    dj_db = 0  # Initialize the gradient for bias

    # Loop through all examples
    for i in range(m):
        f_wb = np.dot(X.iloc[i], w) + b  # Model prediction
        error = f_wb - y.iloc[i]         # Error: prediction - actual value
        
        # Update gradients
        dj_db += error
        dj_dw += error * X.iloc[i].values  # scale the feature vector by the scalar error
        
    dj_dw /= m  # Average gradient for weights
    dj_db /= m  # Average gradient for bias

    return dj_db, dj_dw
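
The per-example loop above can be slow on larger datasets. As a hedged alternative, the sketch below computes the same two gradients with matrix operations and checks the result against a plain loop on toy data. The function name `compute_gradient_vectorized` and the toy arrays are hypothetical; they are not part of the original notebook.

```python
import numpy as np

def compute_gradient_vectorized(X, y, w, b):
    """Hypothetical vectorized counterpart of the loop-based gradient."""
    X = np.asarray(X, dtype=float)   # also accepts a pandas DataFrame
    y = np.asarray(y, dtype=float)   # also accepts a pandas Series
    m = X.shape[0]
    errors = X @ w + b - y           # shape (m,): prediction minus target
    dj_db = errors.sum() / m         # average error
    dj_dw = X.T @ errors / m         # shape (n,): average of error * feature
    return dj_db, dj_dw

# Toy data, chosen only for illustration
rng = np.random.default_rng(0)
X_toy = rng.random((5, 3))
y_toy = rng.random(5)
w_toy = np.array([0.1, -0.2, 0.3])
b_toy = 0.5

dj_db, dj_dw = compute_gradient_vectorized(X_toy, y_toy, w_toy, b_toy)

# Reference values from an explicit loop over the examples
dj_db_ref = 0.0
dj_dw_ref = np.zeros(3)
for i in range(X_toy.shape[0]):
    error = np.dot(X_toy[i], w_toy) + b_toy - y_toy[i]
    dj_db_ref += error
    dj_dw_ref += error * X_toy[i]
dj_db_ref /= X_toy.shape[0]
dj_dw_ref /= X_toy.shape[0]

print(np.allclose(dj_dw, dj_dw_ref), np.isclose(dj_db, dj_db_ref))
```

The function returns the gradients in the same order as `compute_gradient` (`dj_db` first, then `dj_dw`), so it could be passed to `gradient_descent` as the `gradient_function` argument.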

In [11]:
def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): 
    """
    Performs gradient descent to learn w and b.

    Args:
      X (pd.DataFrame or np.ndarray): Data, m examples with n features
      y (pd.Series or np.ndarray): target values
      w_in (np.ndarray): initial model parameters  
      b_in (scalar): initial model parameter
      cost_function: function to compute cost
      gradient_function: function to compute the gradient
      alpha (float): Learning rate
      num_iters (int): number of iterations to run gradient descent

    Returns:
      w (np.ndarray): Updated weights 
      b (scalar): Updated bias
      J_history (list): Cost history over iterations
    """
    
    J_history = []  # List to store cost at each iteration
    w = np.copy(w_in)  # Avoid modifying original w
    b = b_in
    
    # Gradient descent loop
    for i in range(num_iters):

        # Compute gradients
        dj_db, dj_dw = gradient_function(X, y, w, b)

        # Gradient clipping (optional)
        max_gradient = 1e4  # Clip each gradient component to [-1e4, 1e4]; a loose guard against divergence
        dj_dw = np.clip(dj_dw, -max_gradient, max_gradient)
        dj_db = np.clip(dj_db, -max_gradient, max_gradient)

        # Update parameters (weights and bias)
        w -= alpha * dj_dw
        b -= alpha * dj_db

        # Compute and store the cost at each iteration
        J_history.append(cost_function(X, y, w, b))

        # Print the cost about 10 times over the run (every num_iters // 10 iterations)
        if i % max(1, num_iters // 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}")

    return w, b, J_history

In [12]:
# Learning rate and number of iterations
alpha = 1e-5
num_iters = 1000

w_final, b_final, cost_history = gradient_descent(x_train, y_train, w_init, b_init, compute_cost, compute_gradient, alpha, num_iters)

print("Final weights:", w_final)
print("Final bias:", b_final)

Iteration    0: Cost     0.08
Iteration  100: Cost     0.01
Iteration  200: Cost     0.01
Iteration  300: Cost     0.00
Iteration  400: Cost     0.00
Iteration  500: Cost     0.00
Iteration  600: Cost     0.00
Iteration  700: Cost     0.00
Iteration  800: Cost     0.00
Iteration  900: Cost     0.00
Final weights: [-0.00044135  0.00727056  0.00560463  0.01017528  0.00576603  0.00128469
  0.00312425]
Final bias: -5.411274550921894e-05
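
As a quick sanity check, the learned parameters can be applied to a single example. The sketch below copies the final weights and bias from the printed output above (rounded to the digits shown there) and uses the first row of the dataset as the input; the variable names `x_first` and `y_first` are introduced here for illustration only.

```python
import numpy as np

# Final parameters copied from the printed training output (rounded as printed)
w_final = np.array([-0.00044135, 0.00727056, 0.00560463, 0.01017528,
                    0.00576603, 0.00128469, 0.00312425])
b_final = -5.411274550921894e-05

# First row of the dataset: GRE, TOEFL, University Rating, SOP, LOR, CGPA, Research
x_first = np.array([337, 118, 4, 4.5, 4.5, 9.65, 1])
y_first = 0.92  # recorded "Chance of Admit" for that row

# Same formula as predict(): dot product of features and weights, plus the bias
prediction = np.dot(x_first, w_final) + b_final
print(f"prediction: {prediction:.3f}, target: {y_first}")
```

With these values the prediction comes out at roughly 0.82 against a target of 0.92, so the model is in the right range for this row but has not fit it exactly.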
