In [16]:
r"""
# Applying Linear Algebra, Calculus, and Statistics in Machine Learning

## Linear Algebra
- **Mathematical Model for Linear Regression**:
  \[ \hat{y} = X \cdot \theta \]
  
  Where:
  - X: Feature matrix (with added column of 1s for the bias term)
  - θ: Parameter vector (weights and bias)
  - ŷ: Predicted values

  The feature matrix X has dimensions m×(n+1) where:
  - m: Number of training examples
  - n: Number of features
  - +1: Accounts for the bias term (intercept)

## Calculus
- **Optimization Objective**:
  Minimize the cost function (half the Mean Squared Error; the factor 1/2 cancels when differentiating):
  
  \[ J(\theta) = \frac{1}{2m} \sum_{i=1}^m (\hat{y}_i - y_i)^2 \]
  
  Where:
  - m: Number of training examples
  - ŷ_i: Predicted value for the i-th example
  - y_i: Actual value for the i-th example

- **Gradient Calculation**:
  The gradient of the cost function with respect to θ is:
  
  \[ \nabla J(\theta) = \frac{1}{m} X^T (X \cdot \theta - y) \]
  
  This gradient is used in optimization algorithms like gradient descent to update the parameters:
  
  \[ \theta := \theta - \alpha \nabla J(\theta) \]
  
  Where α is the learning rate that controls the step size of each iteration.

## Statistics
- **Model Evaluation Metrics**:
  - Mean Squared Error (MSE): 
    \[ \frac{1}{m} \sum_{i=1}^m (\hat{y}_i - y_i)^2 \]
    Measures the average squared difference between predicted and actual values
    
  - R-squared (R²):
    \[ 1 - \frac{\sum_{i=1}^m (y_i - \hat{y}_i)^2}{\sum_{i=1}^m (y_i - \bar{y})^2} \]
    Represents the proportion of variance in the dependent variable that is predictable from the independent variables
    
  - Additional metrics often used:
    * Mean Absolute Error (MAE)
    * Root Mean Squared Error (RMSE)
    * Adjusted R-squared

## Implementation Connection
These formulas map directly onto the code in the cells that follow: the model ŷ = X·θ becomes a matrix product, the gradient and update rule become the gradient descent loop, and MSE and R² become the evaluation functions.
"""


In [11]:
import numpy as np

# Generate synthetic data: y = 4 + 3x + Gaussian noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Add bias term to feature matrix
X_b = np.c_[np.ones((100, 1)), X]


# Initialize parameters (bias and weight) randomly
theta = np.random.randn(2, 1)
learning_rate = 0.1
iterations = 1000

In [12]:
# Task 1: Implement the Mathematical Formula for Linear Regression

def predict(X, theta):
    return np.dot(X, theta)
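
A quick sanity check on the shapes described in the Linear Algebra notes: the sketch below builds a tiny feature matrix by hand and runs it through the same `predict` function. The arrays `X_small` and `theta_small` are made-up illustration values, not part of the notebook's data.

```python
import numpy as np

def predict(X, theta):
    return np.dot(X, theta)

# Hypothetical toy data: m = 3 examples, n = 1 feature.
X_small = np.array([[0.0], [1.0], [2.0]])

# Prepend a column of ones so the first parameter acts as the bias.
X_small_b = np.c_[np.ones((3, 1)), X_small]   # shape (3, 2) = m x (n + 1)

# Hypothetical parameters: bias 4, weight 3.
theta_small = np.array([[4.0], [3.0]])

y_hat = predict(X_small_b, theta_small)
print(X_small_b.shape)           # (3, 2)
print(y_hat.ravel().tolist())    # [4.0, 7.0, 10.0]
```

Each prediction is simply bias + weight * x, which is why the column of ones is needed.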

In [13]:
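# Task 2: Use Gradient Descent to Optimize the Model Parameters
def gradient_decent(X, y, theta, learning_rate, iterations):
    m = len(y)
    theta = theta.copy()  # work on a copy so the caller's initial theta is not modified
    for _ in range(iterations):
        # Gradient of J(theta): (1/m) * X^T (X theta - y)
        gradients = (1/m) * np.dot(X.T, (np.dot(X, theta) - y))
        theta -= learning_rate * gradients
    return theta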
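
One way to gain confidence in the gradient formula used in the loop is to compare it with a finite-difference approximation of the cost J(θ). The sketch below is an optional check, not part of the original notebook; the helper names `cost`, `analytic_gradient`, and `numerical_gradient`, and the random test data, are assumptions made for illustration.

```python
import numpy as np

def cost(X, y, theta):
    # J(theta) = 1/(2m) * sum((X theta - y)^2), i.e. half the MSE
    m = len(y)
    residuals = X @ theta - y
    return float(np.sum(residuals ** 2) / (2 * m))

def analytic_gradient(X, y, theta):
    # Same expression as in the gradient descent loop
    m = len(y)
    return (1 / m) * X.T @ (X @ theta - y)

def numerical_gradient(X, y, theta, eps=1e-6):
    # Central differences, one parameter at a time
    grad = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[j, 0] = eps
        grad[j, 0] = (cost(X, y, theta + step) - cost(X, y, theta - step)) / (2 * eps)
    return grad

# Hypothetical small test problem (20 examples, 1 feature plus bias)
rng = np.random.default_rng(0)
X_chk = np.c_[np.ones((20, 1)), rng.random((20, 1))]
y_chk = rng.standard_normal((20, 1))
theta_chk = rng.standard_normal((2, 1))

max_diff = np.max(np.abs(analytic_gradient(X_chk, y_chk, theta_chk)
                         - numerical_gradient(X_chk, y_chk, theta_chk)))
print(max_diff < 1e-6)
```

If the two gradients disagree by more than rounding error, the usual suspects are a missing 1/m factor or a transposed matrix.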


In [20]:
# Task 3: Calculate Evaluation Metrics

def mean_squared_error(y_true, y_pred):
    return np.mean((y_pred - y_true) ** 2)

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - (ss_res/ss_tot)
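
The notes also list MAE, RMSE, and adjusted R-squared as additional metrics. A minimal sketch of these follows, using the standard textbook definitions; the function names and the `n_features` parameter are assumptions chosen to match the style of the functions above, and the demo arrays are made up.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Average absolute difference between predictions and targets
    return np.mean(np.abs(y_pred - y_true))

def root_mean_squared_error(y_true, y_pred):
    # Square root of the MSE, in the same units as y
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def adjusted_r_squared(y_true, y_pred, n_features):
    # Penalizes R^2 for the number of features; requires m > n_features + 1
    m = len(y_true)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (m - 1) / (m - n_features - 1)

# Hypothetical values for illustration only
y_true_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
y_pred_demo = np.array([[1.0], [2.0], [3.0], [6.0]])

print(mean_absolute_error(y_true_demo, y_pred_demo))      # 0.5
print(root_mean_squared_error(y_true_demo, y_pred_demo))  # 1.0
print(adjusted_r_squared(y_true_demo, y_pred_demo, n_features=1))
```

With a single feature and 100 examples, as in this notebook, adjusted R-squared will be only slightly lower than plain R-squared.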

In [21]:
theta_optimized = gradient_decent(X_b, y, theta, learning_rate, iterations)
y_pred = predict(X_b, theta_optimized)
mse = mean_squared_error(y, y_pred)
r2 = r_squared(y, y_pred)

print("Optimized Parameters (Theta): ", theta_optimized)
print("MSE: ", mse)
print("R2: ", r2)

Optimized Parameters (Theta):  [[4.21509616]
 [2.77011339]]
MSE:  0.8065845639670531
R2:  0.7692735413614223
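
The fitted parameters are close to, but not exactly, the true values 4 and 3 because of the noise added to `y`. To check that gradient descent has actually converged, rather than stopped early, its result can be compared with a direct least-squares solve. The sketch below is an optional cross-check that the notebook itself does not perform; it regenerates the same data and uses `np.linalg.lstsq` as the reference.

```python
import numpy as np

# Recreate the notebook's data with the same seed
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]

# Reference solution: direct least-squares solve
theta_ls, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# Gradient descent with the notebook's settings (learning rate 0.1, 1000 iterations)
theta_gd = np.random.randn(2, 1)
for _ in range(1000):
    gradients = (1 / len(y)) * X_b.T @ (X_b @ theta_gd - y)
    theta_gd = theta_gd - 0.1 * gradients

print(np.allclose(theta_gd, theta_ls, atol=1e-4))
```

Agreement between the two indicates that the remaining gap to (4, 3) comes from the noise in the data, not from the optimizer.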


In [19]:
X_b.T

array([[1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.  