MULTIPLE LINEAR REGRESSION: y = intercept + coef1*x1 + coef2*x2 + coef3*x3 + error term
- Linear regression function parameters:
    1. X: np.array -> [[1,2,3],[4,5,6]] "two-dimensional array"
    2. y: np.array -> [7,8] "one-dimensional array, one value per row of X"
    * returns -> [coef1, coef2, coef3] and intercept
- Prediction function parameters:
    1. X: np.array -> [[1,2,3],[4,5,6]] "two-dimensional array"
    2. coefficients: np.array "one-dimensional array"
    3. intercept: float
    * returns -> predictions: np.array "one-dimensional array"
- R2 function parameters:
    1. y_true: np.array -> [1,2,3]
    2. y_pred: np.array -> [1,2,3]
    * returns -> R2: float

In [2]:
import numpy as np

def multiple_linear_regression(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, float]:

    # Add a column of ones to X for the intercept term
    X_extended = np.c_[np.ones(X.shape[0]), X]
    # X^T X: product of the transposed extended feature matrix with itself
    prod_transpose = np.dot(X_extended.T, X_extended)
    # Inverse of X^T X (requires the columns of X_extended to be linearly independent)
    prod_transpose_inverse = np.linalg.inv(prod_transpose)
    # X^T y: product of the transposed extended feature matrix with the target
    prod_transpose_y = np.dot(X_extended.T, y)
    # Coefficients from the normal equations:
    # coefficients = (X^T X)^(-1) X^T y
    coefficients = np.dot(prod_transpose_inverse, prod_transpose_y)

    # Split the intercept (first entry) from the feature coefficients
    intercept = coefficients[0]
    coefficients = coefficients[1:]

    return coefficients, intercept

# Example usage:
# Replace X_data and y_data with your own data
#X_data = np.array([[1, 2], [2, 3], [3, 4]])
#y_data = np.array([5, 8, 10])

#coefficients, intercept = multiple_linear_regression(X_data, y_data)

#print("Coefficients:", coefficients)
#print("Intercept:", intercept)


Formula explanation

In summary, the code is solving the normal equation to find the coefficients for a multiple linear regression model. The normal equation provides a closed-form solution to the linear regression problem by directly calculating the coefficients that minimize the sum of squared differences between the predicted and actual values.
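
As a cross-check on the closed-form solution, the sketch below fits one model two ways: with the normal equations (solved with `np.linalg.solve` rather than an explicit inverse) and with NumPy's least-squares routine `np.linalg.lstsq`. The data is synthetic, and the names `X_demo`, `y_demo`, `beta_normal`, and `beta_lstsq` are illustrative assumptions that do not appear elsewhere in this notebook.

```python
import numpy as np

# Synthetic data (assumption): 50 rows, 3 features, known coefficients plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_beta = np.array([1.5, -2.0, 0.5])
y_demo = 4.0 + X_demo @ true_beta + rng.normal(scale=0.1, size=50)

# Column of ones for the intercept, as in multiple_linear_regression
X_ext = np.c_[np.ones(X_demo.shape[0]), X_demo]

# Normal equations: solve (X^T X) beta = X^T y
beta_normal = np.linalg.solve(X_ext.T @ X_ext, X_ext.T @ y_demo)

# Least-squares solver: minimizes ||X beta - y||^2 directly
beta_lstsq, *_ = np.linalg.lstsq(X_ext, y_demo, rcond=None)

print("normal equations:", beta_normal)
print("lstsq           :", beta_lstsq)
print("agree:", np.allclose(beta_normal, beta_lstsq))
```

On well-conditioned data the two should agree to floating-point tolerance. When features are nearly collinear, the least-squares solver is usually the safer choice because it avoids forming and inverting X^T X.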

In [3]:
#Prediction function

def predict_multiple_linear_regression(X: np.ndarray, coefficients: np.ndarray, intercept: float) -> np.ndarray:
    # Add a column of ones to X for the intercept term
    X_extended = np.c_[np.ones(X.shape[0]), X]

    # Calculate predictions
    y_pred = X_extended @ np.hstack((intercept, coefficients))

    return y_pred

# Example usage:
# Replace X_data, coefficients, and intercept with your own data
#X_data = np.array([[1, 2], [2, 3], [3, 4]])
#coefficients = np.array([2, 3])
#intercept = 1

#predictions = predict_multiple_linear_regression(X_data, coefficients, intercept)

#print("Input features (X):")
#print(X_data)
#print("\nPredicted values:")
#print(predictions)


In [4]:
#Compute R2 function
#R2 (coefficient of determination) = 1 - (SSR / SST)
def compute_r2(y_true, y_pred):
    residual = y_true - y_pred
    mean_y_true = np.mean(y_true)
    #total sum of squares (SST); ** is the power operator
    total_sum_squares = np.sum((y_true - mean_y_true) ** 2)
    #sum of squared residuals (SSR), i.e. the variation the model does not explain
    residual_sum_squares = np.sum(residual ** 2)

    r2 = 1 - (residual_sum_squares / total_sum_squares)

    return r2
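
To make the formula R2 = 1 - (SSR / SST) concrete, here is a small sketch with three hand-picked prediction vectors. The helper `r2_score_sketch` is a hypothetical stand-in that repeats the same arithmetic so the block runs on its own, and the numbers are made up for illustration.

```python
import numpy as np

def r2_score_sketch(y_true, y_pred):
    # SSR: sum of squared residuals; SST: total sum of squares around the mean
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])

# Perfect predictions: SSR = 0, so R2 = 1
r2_perfect = r2_score_sketch(y_true, y_true.copy())

# Always predicting the mean: SSR = SST, so R2 = 0
r2_mean = r2_score_sketch(y_true, np.full_like(y_true, y_true.mean()))

# Predictions worse than the mean (values reversed): SSR = 80, SST = 20, so R2 = -3
r2_bad = r2_score_sketch(y_true, y_true[::-1])

print(r2_perfect, r2_mean, r2_bad)
```

The third case shows that R2 is not bounded below by zero: a model that does worse than predicting the mean yields a negative value.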

REAL DATA APPLICATIONS

In [5]:

# using loadtxt()
Data = np.loadtxt("Startups.csv",
                 delimiter=",", dtype=str)

#display(Data)
#print(Data.ndim)
#print(Data.shape)
#print(Data.size)

In [6]:
#Process Data
Data_without_first_row = np.delete(Data, 0, axis=0)
shuffled_data = np.random.permutation(Data_without_first_row)
#print("Original 2D array:")
#print(Data)
#print("Shuffled copy:")
#print(shuffled_data)

In [7]:
#training and testing set size
n_rows = np.size(shuffled_data, 0)
train_size = int(0.75 * n_rows)
#the test set is every row left over after the training rows
test_size = n_rows - train_size
print("Training set size : " + str(train_size))
print("Testing set size : " + str(test_size))

Training set size : 37
Testing set size : 13


In [8]:
#SPLIT FEATURES AND TARGET
#Select the feature columns and the target column by column index
X=shuffled_data[:,[0,1,2]]
y=shuffled_data[:,4]

In [9]:
#SPLIT TRAIN/TEST (make sure X is a 2D array and y is a 1D array)
#np.float64 is used because the np.float_ alias was removed in NumPy 2.0
#training set split
X_train = X[0:train_size, :].astype(np.float64)
y_train = y[0:train_size].flatten().astype(np.float64)

#testing set split
X_test = X[train_size:, :].astype(np.float64)
y_test = y[train_size:].flatten().astype(np.float64)

In [10]:
#Find beta coef and Intercept
coefficients, intercept = multiple_linear_regression(X_train, y_train)
#predictions function
predictions = predict_multiple_linear_regression(np.array(X_train), np.array(coefficients), intercept)
#Compute R2
r2_train = compute_r2(np.array(y_train),predictions)

print("Coefficients:", coefficients)
print("Intercept:", intercept)
print("R-squared :", r2_train)


Coefficients: [ 0.85699668 -0.07421305  0.01715109]
Intercept: 54020.57032281859
R-squared : 0.9482678300082175
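
The R-squared printed above is computed on the training rows, so it says little about how the model does on unseen data. A minimal sketch of the same check on held-out rows follows. Since Startups.csv is not reproduced here, it uses synthetic data; `X_all`, `y_all`, `add_intercept`, and `r2` are illustrative names, and `np.linalg.lstsq` stands in for the notebook's normal-equation function.

```python
import numpy as np

# Synthetic stand-in for the dataset (assumption): 50 rows, 3 features
rng = np.random.default_rng(42)
n = 50
X_all = rng.uniform(0, 100, size=(n, 3))
y_all = 10.0 + X_all @ np.array([0.8, -0.1, 0.05]) + rng.normal(scale=2.0, size=n)

# Same 75/25 split by position as in the notebook
train_size = int(0.75 * n)
X_tr, X_te = X_all[:train_size], X_all[train_size:]
y_tr, y_te = y_all[:train_size], y_all[train_size:]

def add_intercept(X):
    # Prepend a column of ones for the intercept term
    return np.c_[np.ones(X.shape[0]), X]

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

# Fit on the training rows only
beta, *_ = np.linalg.lstsq(add_intercept(X_tr), y_tr, rcond=None)

# Score on both splits with the same fitted coefficients
r2_train = r2(y_tr, add_intercept(X_tr) @ beta)
r2_test = r2(y_te, add_intercept(X_te) @ beta)
print("train R2:", r2_train)
print("test  R2:", r2_test)
```

A test R2 far below the training R2 would suggest overfitting; similar values suggest the fit generalizes to the held-out rows.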


Simple Linear Regression and Linear Regression (here meaning the general case with several independent variables, also called multiple linear regression) are both statistical methods for modeling the relationship between a dependent variable and one or more independent variables. Both are commonly used to predict the dependent variable from the values of the independent variables.

### Simple Linear Regression:

1. **Definition:**
   - Simple Linear Regression involves modeling the relationship between a dependent variable (\(y\)) and a single independent variable (\(x\)).

2. **Model:**
   - The model equation is \(y = b_0 + b_1x + \varepsilon\), where:
     - \(y\) is the dependent variable,
     - \(x\) is the independent variable,
     - \(b_0\) is the y-intercept,
     - \(b_1\) is the slope of the line, and
     - \(\varepsilon\) is the error term representing unobserved factors affecting \(y\).

3. **Objective:**
   - The objective is to find the values of \(b_0\) and \(b_1\) that minimize the sum of squared differences between the observed (\(y\)) and predicted (\(b_0 + b_1x\)) values.

4. **Assumptions:**
   - Assumes a linear relationship between \(x\) and \(y\).
   - Assumes that the errors (\(\varepsilon\)) are normally distributed and have constant variance.
   - Assumes independence of errors.

5. **Use Case:**
   - Predicting the price of a house (\(y\)) based on its size (\(x\)).
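
For one independent variable, the least-squares objective above has a well-known closed form: \(b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\) and \(b_0 = \bar{y} - b_1\bar{x}\). The sketch below applies it to five made-up points and compares the result with `np.polyfit`; the data and variable names are illustrative assumptions.

```python
import numpy as np

# Made-up data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

x_mean, y_mean = x.mean(), y.mean()

# Slope: co-variation of x and y divided by the variation of x
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
# Intercept: the fitted line passes through the point of means
b0 = y_mean - b1 * x_mean

# np.polyfit with degree 1 returns [slope, intercept]
slope_np, intercept_np = np.polyfit(x, y, 1)

print("closed form:", b0, b1)
print("np.polyfit :", intercept_np, slope_np)
```

For these points the slope works out to 0.6 and the intercept to 2.2, up to floating-point rounding.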

### Linear Regression:

1. **Definition:**
   - Linear Regression is a more general form that can involve modeling the relationship between a dependent variable (\(y\)) and multiple independent variables (\(x_1, x_2, \ldots, x_n\)).

2. **Model:**
   - The model equation is \(y = b_0 + b_1x_1 + b_2x_2 + \ldots + b_nx_n + \varepsilon\), where:
     - \(y\) is the dependent variable,
     - \(x_1, x_2, \ldots, x_n\) are the independent variables,
     - \(b_0\) is the y-intercept,
     - \(b_1, b_2, \ldots, b_n\) are the slopes of the respective variables, and
     - \(\varepsilon\) is the error term.

3. **Objective:**
   - Similar to Simple Linear Regression, the objective is to find the values of \(b_0, b_1, \ldots, b_n\) that minimize the sum of squared differences between the observed (\(y\)) and predicted values.

4. **Assumptions:**
   - Assumes a linear relationship between \(y\) and the independent variables.
   - Assumes that the errors (\(\varepsilon\)) are normally distributed and have constant variance.
   - Assumes independence of errors.

5. **Use Case:**
   - Predicting the sales (\(y\)) of a product based on advertising spending (\(x_1\)), product price (\(x_2\)), and competitor prices (\(x_3\)).

### Key Differences:

1. **Number of Variables:**
   - Simple Linear Regression involves one independent variable.
   - Linear Regression involves multiple independent variables.

2. **Model Complexity:**
   - Simple Linear Regression is a simpler model with fewer parameters.
   - Linear Regression is more complex and can capture relationships involving multiple variables.

3. **Equation:**
   - The equation of Simple Linear Regression has one independent variable (\(x\)).
   - The equation of Linear Regression has multiple independent variables (\(x_1, x_2, \ldots, x_n\)).

4. **Use Cases:**
   - Simple Linear Regression is suitable when there is a clear linear relationship between one independent variable and the dependent variable.
   - Linear Regression is used when there are multiple factors influencing the dependent variable.

Both Simple Linear Regression and Linear Regression are fundamental techniques in statistical modeling, and their applications extend across various fields, including economics, finance, biology, and social sciences. The choice between the two depends on the nature of the data and the relationships being modeled.
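
One way to see the relationship between the two is that the matrix formulation with a single feature column reproduces the simple-regression formulas exactly. The sketch below checks this on synthetic data; all names are illustrative assumptions.

```python
import numpy as np

# Synthetic data (assumption): one feature, linear signal plus noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=30)

# Simple linear regression: slope = cov(x, y) / var(x)
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

# General formulation: design matrix with a ones column and one feature column
X_ext = np.c_[np.ones(x.size), x]
beta, *_ = np.linalg.lstsq(X_ext, y, rcond=None)

print("simple formulas:", b0, b1)
print("matrix solution:", beta)
```

The two results should match to floating-point tolerance, which is why a single implementation of the general case covers both models.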

LASSO AND RIDGE REGRESSION

In [11]:
def ridge_regression(X, y, alpha):
    """
    Perform ridge regression.

    Parameters:
    - X: Input features matrix, shape (n_samples, n_features)
    - y: Target variable, shape (n_samples,)
    - alpha: Regularization parameter (alpha >= 0)

    Returns:
    - coefficients: Ridge regression coefficients, intercept first

    Note: the penalty is applied to every entry, including the intercept.
    """
    n_samples, n_features = X.shape

    # Add a column of ones for the intercept term
    X_extended = np.c_[np.ones(n_samples), X]

    # Ridge regression closed-form solution: (X^T X + alpha * I)^(-1) X^T y
    A = np.dot(X_extended.T, X_extended) + alpha * np.identity(n_features + 1)
    b = np.dot(X_extended.T, y)
    coefficients = np.linalg.inv(A).dot(b)

    return coefficients

# Example usage:
# Replace X_data and y_data with your own data
X_data = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y_data = np.array([3, 7, 8, 12])

alpha_value = 1.0

# Perform ridge regression
coefficients = ridge_regression(X_data, y_data, alpha_value)

# Display the results
print("Ridge Regression Coefficients:", coefficients)

Ridge Regression Coefficients: [-0.04026846  1.28187919  1.24161074]


In [12]:
def lasso_regression(X, y, alpha, max_iters=100, tol=1e-4):
    """
    Perform lasso regression using coordinate descent.

    Minimizes 0.5 * ||y - X_extended @ coefficients||^2 + alpha * sum(|coefficients[1:]|).
    The intercept (index 0) is not penalized.

    Parameters:
    - X: Input features matrix
    - y: Target variable
    - alpha: Regularization parameter
    - max_iters: Maximum number of iterations
    - tol: Tolerance for convergence

    Returns:
    - coefficients: Lasso regression coefficients, intercept first
    """
    n_samples, n_features = X.shape

    # Initialize coefficients
    coefficients = np.zeros(n_features + 1)

    # Add a column of ones for the intercept term
    X_extended = np.c_[np.ones(n_samples), X]

    # Squared norm of each column; every coordinate update is scaled by it.
    # Without this scaling the updates overshoot and the iteration diverges.
    col_sq_norms = np.sum(X_extended ** 2, axis=0)

    # Initialize previous coefficients
    prev_coefficients = coefficients.copy()

    for _ in range(max_iters):
        # Coordinate descent for lasso regression
        for j in range(n_features + 1):
            X_j = X_extended[:, j]
            # Correlation of column j with the residual that leaves out column j's own contribution
            rho = np.dot(X_j, (y - np.dot(X_extended, coefficients) + coefficients[j] * X_j))

            if j == 0:
                # Intercept: plain least-squares update, no penalty
                coefficients[j] = rho / col_sq_norms[j]
            else:
                # Soft-threshold rho by alpha, then scale by the column's squared norm
                coefficients[j] = np.sign(rho) * max(0, abs(rho) - alpha) / col_sq_norms[j]

        # Check for convergence
        if np.linalg.norm(coefficients - prev_coefficients) < tol:
            break

        prev_coefficients = coefficients.copy()

    return coefficients

# Example usage:
# Replace X_data and y_data with your own data
X_data = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y_data = np.array([3, 7, 8, 12])

alpha_value = 1.0

# Perform lasso regression
lasso_coefficients = lasso_regression(X_data, y_data, alpha_value)

# Display the results
print("Lasso Regression Coefficients:", lasso_coefficients)


Lasso Regression and Ridge Regression are both techniques used in linear regression to handle the problem of overfitting by introducing regularization. Regularization is a way to prevent the model from becoming too complex and, in turn, reduces the risk of overfitting to the training data. Both methods add a penalty term to the cost function, but they differ in the type of penalty and the impact on the coefficients.

### Ridge Regression:

1. **Objective Function:**
   - **Cost Function:** \( J(\theta) = \text{MSE}(\theta) + \alpha \sum_{i=1}^{n} \theta_i^2 \)
   - The cost function includes the Mean Squared Error (MSE) term and an additional term that penalizes the square of the magnitude of the coefficients.

2. **Penalty Term:**
   - The penalty term is the sum of squared coefficients multiplied by the regularization parameter (\(\alpha\)).
   - It tends to shrink the coefficients towards zero but does not make them exactly zero.

3. **Effect on Coefficients:**
   - Ridge regression tends to produce models with all features included, but with smaller coefficients.
   - It is suitable when there is multicollinearity in the data, i.e., when features are highly correlated.
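
The shrinkage effect can be seen by solving the ridge problem for several values of \(\alpha\) on data with two nearly collinear features. This is a sketch under stated assumptions: the data is synthetic, and the hypothetical helper `ridge_centered` centers the data so that the intercept is left unpenalized, which differs from the `ridge_regression` cell in this notebook.

```python
import numpy as np

def ridge_centered(X, y, alpha):
    # Center features and target so the intercept is not penalized
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) coefs = Xc^T yc
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coefs = np.linalg.solve(A, Xc.T @ yc)
    intercept = y_mean - x_mean @ coefs
    return coefs, intercept

# Synthetic data (assumption): feature 2 is almost a copy of feature 0
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=40)
y = 1.0 + X @ np.array([2.0, -1.0, 2.0]) + rng.normal(scale=0.5, size=40)

alphas = [0.0, 1.0, 10.0, 100.0]
norms = []
for alpha in alphas:
    coefs, intercept = ridge_centered(X, y, alpha)
    norms.append(float(np.linalg.norm(coefs)))
    print(alpha, np.round(coefs, 3), round(norms[-1], 3))
```

The coefficient norm should fall as \(\alpha\) grows while every coefficient stays nonzero, and the estimates for the two collinear features should become more stable once \(\alpha > 0\).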

### Lasso Regression:

1. **Objective Function:**
   - **Cost Function:** \( J(\theta) = \text{MSE}(\theta) + \alpha \sum_{i=1}^{n} |\theta_i| \)
   - The cost function includes the Mean Squared Error (MSE) term and an additional term that penalizes the absolute values of the coefficients.

2. **Penalty Term:**
   - The penalty term is the sum of the absolute values of the coefficients multiplied by the regularization parameter (\(\alpha\)).
   - It tends to shrink some coefficients exactly to zero, effectively performing feature selection.

3. **Effect on Coefficients:**
   - Lasso regression encourages sparsity in the model by driving some coefficients to exactly zero.
   - It is particularly useful when there are many irrelevant or redundant features, as it performs automatic feature selection.
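
The mechanism behind the exact zeros is the soft-thresholding operator used in coordinate descent. The sketch below compares it with ridge-style proportional shrinkage for a single coefficient, assuming a feature column with unit squared norm; the input values are made up for illustration.

```python
import numpy as np

def soft_threshold(rho, alpha):
    # Move rho toward zero by alpha; anything within [-alpha, alpha] becomes zero
    return np.sign(rho) * np.maximum(np.abs(rho) - alpha, 0.0)

# rho plays the role of the unpenalized least-squares estimate of one coefficient
rho_values = np.array([-3.0, -0.5, 0.0, 0.8, 2.5])
alpha = 1.0

lasso_update = soft_threshold(rho_values, alpha)
# Ridge under the same assumption scales every value by 1 / (1 + alpha)
ridge_update = rho_values / (1.0 + alpha)

print("lasso:", lasso_update)
print("ridge:", ridge_update)
```

Entries with magnitude at most \(\alpha\) are set to zero by the lasso update, while the ridge update only scales them down and leaves them nonzero unless they were already zero.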

### Common Characteristics:

- **Regularization Parameter (\(\alpha\)):**
  - Both Ridge and Lasso regression have a regularization parameter (\(\alpha\)) that controls the strength of regularization.
  - Larger values of \(\alpha\) result in stronger regularization.

- **Trade-off:**
  - There is a trade-off between fitting the model to the training data (minimizing the MSE term) and keeping the model simple (minimizing the regularization term).

- **Preventing Overfitting:**
  - Both methods help prevent overfitting by penalizing overly complex models.

- **Regularization Path:**
  - Both Ridge and Lasso regression provide a regularization path, showing how the coefficients change for different values of \(\alpha\).
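
A regularization path can be sketched by refitting the model over a grid of \(\alpha\) values and stacking the coefficients. The example below does this for lasso with a small coordinate-descent solver on centered synthetic data. The helper `lasso_cd`, the grid, and the data are assumptions for illustration, and the objective is \(0.5\,\lVert y - X\beta \rVert^2 + \alpha \lVert \beta \rVert_1\) with no intercept.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iters=500):
    # Coordinate descent on centered data (no intercept term)
    n_features = X.shape[1]
    beta = np.zeros(n_features)
    col_sq = np.sum(X ** 2, axis=0)  # squared norm of each column
    for _ in range(n_iters):
        for j in range(n_features):
            # Residual with feature j's own contribution added back
            partial_resid = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_resid
            # Soft-threshold, then scale by the column's squared norm
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return beta

# Synthetic data (assumption): only features 0 and 2 carry signal
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 4))
X = X - X.mean(axis=0)
y = X @ np.array([3.0, 0.0, -2.0, 0.0]) + rng.normal(scale=0.5, size=60)
y = y - y.mean()

alphas = [0.0, 5.0, 50.0, 500.0]
path = np.array([lasso_cd(X, y, a) for a in alphas])

for a, beta in zip(alphas, path):
    print(a, np.round(beta, 3), "nonzero:", np.count_nonzero(beta))
```

Each row of `path` is the coefficient vector for one \(\alpha\). With \(\alpha = 0\) the solver returns the ordinary least-squares fit, and once \(\alpha\) exceeds the largest absolute correlation between a feature and the target, every coefficient is zero.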

In summary, while Ridge Regression tends to shrink coefficients towards zero, Lasso Regression goes a step further by encouraging sparsity and setting some coefficients exactly to zero. The choice between Ridge and Lasso depends on the specific characteristics of the data and the desired properties of the model. If feature selection is crucial, Lasso may be preferred; otherwise, Ridge might be more appropriate. In practice, a combination of both, known as Elastic Net, is also used to benefit from the advantages of both techniques.