<a href="https://www.kaggle.com/code/vizeno/linear-regression-from-scratch?scriptVersionId=208647518" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<div style="background-color: #ffcc99; color: #cc6600; padding: 10px; border-radius: 5px;">
    <h1>Simple Linear Regression in Python From Scratch 📈</h1>
</div>


**Linear Regression** is a statistical method used to model the relationship between a dependent variable Y and an independent variable X.

> The objective is to fit a linear equation to the observed data in order to make predictions. The equation of a simple linear regression is represented as:

$$ Y = b_0 + b_1 X $$

**Step 1: Calculate the Mean of X and Y**
The first step in calculating the regression coefficients is to compute the means of both X and Y:
$$ \overline{X}= {1 \over n}\sum\limits_{i=1}^{n} X_i$$



$$ \overline{Y}= {1 \over n}\sum\limits_{i=1}^{n} Y_i$$
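
As a rough illustration of Step 1, the sketch below computes both means as "sum divided by count". It assumes NumPy is available and reuses the five example points from the Model Usage section; the variable names are illustrative only.

```python
import numpy as np

# Example data: the same five points used in the Model Usage section
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2.2, 2.8, 4.5, 3.7, 5.5])

n = len(X)

# Mean as "sum divided by count", mirroring the two formulas above
x_mean = X.sum() / n
y_mean = Y.sum() / n

print(x_mean, y_mean)  # x_mean is 3.0, y_mean is approximately 3.74
```

`np.mean(X)` and `np.mean(Y)` should give the same values; the explicit form is shown only to match the formulas.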


**Step 2: Calculate the Slope b1**
The slope b1 represents the rate of change of Y with respect to X. It is calculated using the formula:

$$b_1 = \frac{\sum\limits_{i=1}^{n}(X_i - \overline{X})(Y_i - \overline{Y})}{\sum\limits_{i=1}^{n}(X_i - \overline{X})^2}$$

This is the covariance between X and Y divided by the variance of X (the common 1/n factors cancel), which gives the slope of the **best-fit line**.
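
The covariance-over-variance reading of the slope can be checked numerically. The sketch below is one possible way to do that, assuming NumPy and the five example points from the Model Usage section; it computes the slope from sums of deviations and again from `np.cov` and `np.var` (both in their population form).

```python
import numpy as np

# Example data: the same five points used in the Model Usage section
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2.2, 2.8, 4.5, 3.7, 5.5])

# Deviations of each point from the mean
dx = X - X.mean()
dy = Y - Y.mean()

# Slope as sum of cross-deviations over sum of squared X deviations
b1 = np.sum(dx * dy) / np.sum(dx ** 2)

# Same ratio written as population covariance over population variance
b1_cov = np.cov(X, Y, bias=True)[0, 1] / np.var(X)

print(b1, b1_cov)  # both approximately 0.75
```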


**Step 3: Calculate the Intercept b0**
The intercept b0 is the predicted value of Y when X = 0:

$$ b_0 = \overline{Y} - b_1 \overline{X} $$

This ensures that the regression line passes through the point of means $(\overline{X}, \overline{Y})$.
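
The intercept formula and the point-of-means property can be verified with a few lines. This is a minimal sketch, assuming NumPy and the five example points from the Model Usage section; `y_at_mean` is a name introduced here for illustration.

```python
import numpy as np

# Example data: the same five points used in the Model Usage section
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2.2, 2.8, 4.5, 3.7, 5.5])

x_mean, y_mean = X.mean(), Y.mean()

# Slope from Step 2, then intercept from Step 3
b1 = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

# Evaluating the fitted line at x_mean should give back y_mean
y_at_mean = b0 + b1 * x_mean

print(b0)                             # approximately 1.49
print(np.isclose(y_at_mean, y_mean))  # True
```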


**Step 4: Make Predictions**
Once the model has been trained, that is, once the coefficients b0 and b1 have been learned, the prediction for a value X is computed using:

$$ Y_{pred} = b_0 + b_1 X $$
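
To illustrate Step 4, the sketch below fits the coefficients on the five example points from the Model Usage section and then predicts for two unseen inputs. The helper `predict` and the values in `x_new` are hypothetical and chosen only for this example.

```python
import numpy as np

def predict(x, b0, b1):
    """Apply the fitted line Y_pred = b0 + b1 * X to scalar or array input."""
    return b0 + b1 * np.asarray(x, dtype=float)

# Example data: the same five points used in the Model Usage section
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2.2, 2.8, 4.5, 3.7, 5.5])

# Coefficients from Steps 1 to 3
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

# Hypothetical new inputs that were not part of the training data
x_new = [6, 7]
y_new = predict(x_new, b0, b1)

print(y_new)  # approximately [5.99, 6.74]
```

Predictions outside the range of the training data, as here, are extrapolations and rely on the linear trend continuing.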


<div style="background-color: #ffcc99; color: #cc6600; padding: 10px; border-radius: 5px;">
    <h1>Model Class</h1>
</div>


In [None]:
import numpy as np

class SimpleLinearRegression:
    def __init__(self):
        self.b0 = None
        self.b1 = None

    def fit(self, X, Y):
        # Compute means of X and Y 
        x_mean = np.mean(X)
        y_mean = np.mean(Y) # STEP 1
        
        # Number of data points
        n = len(X)
        
        # Calculate the slope (b1) and intercept (b0) using the formulas
        numerator = 0
        denominator = 0
        for i in range(n):
            numerator += (X[i] - x_mean) * (Y[i] - y_mean)
            denominator += (X[i] - x_mean) ** 2
        
        self.b1 = numerator / denominator  # slope STEP 2
        self.b0 = y_mean - (self.b1 * x_mean)  # intercept STEP 3
    
    def predict(self, X):
        # Make predictions using the learned coefficients;
        # np.asarray lets plain Python lists work as input too
        return self.b0 + self.b1 * np.asarray(X)  # STEP 4





<div style="background-color: #ffcc99; color: #cc6600; padding: 10px; border-radius: 5px;">
    <h1>Model Usage</h1>
</div>


In [None]:
if __name__ == "__main__":
    # Example dataset (X: 1-D feature vector, Y: target vector)
    X = np.array([1, 2, 3, 4, 5])  # Feature values
    Y = np.array([2.2, 2.8, 4.5, 3.7, 5.5])  # Target values

    model = SimpleLinearRegression()  # Create an instance of SimpleLinearRegression

    model.fit(X, Y)  # Fit the model to the data

    # Print the learned coefficients (b0, b1)
    print(f"Learned coefficients: b0 = {model.b0}, b1 = {model.b1}")
    
    # Make predictions using the trained model
    predictions = model.predict(X)
    print(f"Predictions: {predictions}")
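
One way to sanity-check the learned coefficients is to compare them with an independent least-squares fit. The sketch below is not part of the original notebook; it assumes NumPy's `np.polyfit` with degree 1, which returns the slope first and the intercept second, and recomputes the closed-form coefficients inline so that it runs on its own.

```python
import numpy as np

# Same example dataset as in the usage cell above
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2.2, 2.8, 4.5, 3.7, 5.5])

# Closed-form coefficients, as in SimpleLinearRegression.fit
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

# Reference fit: a degree-1 polynomial, coefficients ordered [slope, intercept]
slope, intercept = np.polyfit(X, Y, 1)

print(np.isclose(b1, slope), np.isclose(b0, intercept))  # True True
```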


<div style="background-color: #ffcc99; color: #cc6600; padding: 10px; border-radius: 5px;">
    <h1>Model Visualization</h1>
</div>




In [None]:
import matplotlib.pyplot as plt
# The red line represents the regression line,
# the blue dots represent the training data
plt.scatter(X, Y, color='blue', label='Data points')  
plt.plot(X, predictions, color='red', label='Regression line') 
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Linear Regression Model')
plt.legend()
plt.show()