## Simple Linear Regression

### Definition

Simple linear regression fits a straight line that describes the relationship between one independent variable and one dependent variable. The fitted line can then be used to predict or estimate values that are missing from the data; when the input lies within the range of the observed data, this is known as interpolation.

This Python code defines a simple linear regression class named `MeraLR`. Here's a breakdown of what each part of the code does:

1. **Class Definition (`MeraLR`):**
   - This line defines a class named `MeraLR`, which stands for "My Linear Regression".

2. **`__init__` method:**
   - This method serves as the constructor for the class. It initializes two instance variables `self.m` and `self.b` to `None`.

3. **`fit` method:**
   - This method is used to train the linear regression model. It takes two parameters `X_train` and `y_train`, which are the features and corresponding labels of the training data.
   - Inside the method, it calculates the slope `self.m` and intercept `self.b` of the regression line using the least squares method.
   - It prints the calculated slope and intercept.
   
4. **`predict` method:**
   - This method is used to make predictions using the trained linear regression model. It takes a parameter `X_test`, which represents the features of the test data.
   - Inside the method, it calculates the predicted values `y_pred` using the formula of a straight line (`y = mx + b`) where `m` is the slope and `b` is the intercept.
   - It then returns the predicted values.

5. **Main Functionality:**
   - The class is designed to fit a linear regression model (`fit` method) to a given dataset and then use this model to make predictions (`predict` method) on new data.

This implementation assumes that the input data (`X_train`, `y_train`, `X_test`) are one-dimensional NumPy arrays, where each element is a single feature value or label. Because `fit` calls `.shape` and `.mean()` on its arguments, plain Python lists will not work. The class also does no error handling or validation, which a production-grade implementation would need.
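As an illustration of the kind of validation that is missing, the sketch below checks the inputs before fitting. The helper name `check_1d_pair` and the specific checks are assumptions made for this example; they are not part of `MeraLR`.

```python
import numpy as np

def check_1d_pair(X, y):
    """Hypothetical helper: coerce inputs to 1-D float arrays and validate them."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim != 1 or y.ndim != 1:
        raise ValueError("X and y must be one-dimensional")
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same length")
    if X.shape[0] < 2:
        raise ValueError("need at least two samples to fit a line")
    if np.all(X == X[0]):
        # All x values identical: the slope denominator would be zero.
        raise ValueError("X must contain at least two distinct values")
    return X, y

# Lists are accepted here because they are converted to arrays first.
X, y = check_1d_pair([1, 2, 3], [2.0, 4.1, 5.9])
```

A check like this could be called at the top of `fit`, so that a zero denominator or mismatched lengths fail with a clear message instead of a cryptic error or a `nan` slope.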

**The formula for calculating the slope (`m`) in simple linear regression using the method of least squares is:**

```
m = (Σ(xi - x̄)(yi - ȳ)) / (Σ(xi - x̄)^2)
```

Where:
- m is the slope.
- Σ represents summation.
- xi and yi are individual data points.
- x̄ and ȳ are the means of the x and y values, respectively.


The numerator sums, over all data points, the product of each point's deviation from the mean of x and its deviation from the mean of y. The denominator sums the squared deviations of the x values from their mean.

In the provided code, `num` represents the numerator of this formula, and `den` represents the denominator. The slope (`m`) is then calculated by dividing `num` by `den`.
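To make the formula concrete, here is a small worked example in plain Python. The five data points are invented for illustration and do not come from the placement dataset.

```python
# Toy data, chosen so the arithmetic is easy to follow by hand.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_mean = sum(x) / len(x)  # 3.0
y_mean = sum(y) / len(y)  # 4.0

# Numerator: sum of (xi - x̄)(yi - ȳ)
num = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))  # 6.0

# Denominator: sum of (xi - x̄)^2
den = sum((xi - x_mean) ** 2 for xi in x)  # 10.0

m = num / den
print(m)  # 0.6
```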

**The formula for calculating the intercept (`b`) in simple linear regression is:**

```
b = ȳ - (m * x̄)
```

Where:
- b is the intercept.
- ȳ is the mean of the y values.
- m is the slope.
- x̄ is the mean of the x values.
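The intercept formula follows from the fact that the least-squares line passes through the point (x̄, ȳ). The sketch below checks this on a small invented dataset; the numbers are illustrative only.

```python
# Toy data (illustrative, not from the placement dataset).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# Slope from the least-squares formula.
m = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
     / sum((xi - x_mean) ** 2 for xi in x))

# Intercept: b = ȳ - m * x̄
b = y_mean - m * x_mean

# Evaluating the line at x̄ recovers ȳ (up to floating-point rounding).
y_at_mean = m * x_mean + b
print(round(b, 6), round(y_at_mean, 6))
```

For this data the slope is 0.6, so the intercept comes out at roughly 2.2, and the line evaluated at x̄ = 3 gives ȳ = 4.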

```
def fit(self,X_train,y_train):

    num = 0
    den = 0

    for i in range(X_train.shape[0]):

        num = num + ((X_train[i] - X_train.mean())*(y_train[i] - y_train.mean()))
        den = den + ((X_train[i] - X_train.mean())*(X_train[i] - X_train.mean()))

    self.m = num/den
    self.b = y_train.mean() - (self.m * X_train.mean())
    print(self.m)
    print(self.b)
```

This part of the code implements the `fit` method within the `MeraLR` class. Here's a detailed explanation of what it does:

1. **Initialization of Variables:**
   - `num` and `den` are initialized to 0. These variables will be used to compute the numerator and denominator of the slope (`m`) of the linear regression line.

2. **Loop through Training Data:**
   - It iterates through each data point in the training set (`X_train` and `y_train`). The loop runs `X_train.shape[0]` times, which is the number of samples in the training set.

3. **Numerator (`num`) and Denominator (`den`) Calculation:**
   - Within the loop, for each data point, it updates `num` and `den` by adding the product of the differences between the data points and their respective means.
   - `num` accumulates the sum of `(X_train[i] - X_train.mean()) * (y_train[i] - y_train.mean())`, while `den` accumulates the sum of `(X_train[i] - X_train.mean()) * (X_train[i] - X_train.mean())`.

4. **Calculation of Slope (`self.m`) and Intercept (`self.b`):**
   - After iterating through all the data points, it computes the slope (`self.m`) by dividing `num` by `den`.
   - It then calculates the intercept (`self.b`) as `y_train.mean() - self.m * X_train.mean()`, that is, `b = ȳ - m * x̄`. This is the line equation `y = mx + b` solved for `b` at the point (x̄, ȳ), which the least-squares line always passes through.

5. **Printing Results:**
   - It prints out the calculated slope (`self.m`) and intercept (`self.b`) to the console.

6. **Side Notes:**
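   - This implementation assumes that `X_train` and `y_train` are one-dimensional NumPy arrays; it relies on `.shape` and `.mean()`, which plain Python lists do not provide.
   - The method doesn't include any regularization, which might be necessary depending on the application. The intercept `self.b` already plays the role of the bias term.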
   - Printing the slope and intercept might be useful for debugging purposes, but in a production setting, it might be unnecessary and can be removed.

Overall, this `fit` method calculates the parameters (`self.m` and `self.b`) of a linear regression model using the least squares method.
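The loop above recomputes `X_train.mean()` and `y_train.mean()` on every iteration, which is correct but does more work than necessary. A vectorized sketch that computes the means once is shown below. The function name `fit_vectorized` and the synthetic data are assumptions made for this example; `np.polyfit` is used only as an independent cross-check.

```python
import numpy as np

def fit_vectorized(X_train, y_train):
    """Hypothetical standalone variant of fit; returns (m, b)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)

    # Deviations from the means, computed once for the whole array.
    x_dev = X_train - X_train.mean()
    y_dev = y_train - y_train.mean()

    m = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)  # slope
    b = y_train.mean() - m * X_train.mean()         # intercept
    return m, b

# Synthetic data: a noisy line, generated only for this illustration.
rng = np.random.default_rng(0)
X = rng.uniform(4, 10, size=50)
y = 0.5 * X - 1.0 + rng.normal(0, 0.1, size=50)

m, b = fit_vectorized(X, y)

# np.polyfit with deg=1 returns [slope, intercept] for the least-squares line.
m_ref, b_ref = np.polyfit(X, y, deg=1)
print(np.isclose(m, m_ref), np.isclose(b, b_ref))
```

Both approaches implement the same least-squares formulas, so they should agree up to floating-point rounding.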

In [1]:
class MeraLR:
    
    def __init__(self):
        self.m = None
        self.b = None
        
    def fit(self,X_train,y_train):
        
        num = 0
        den = 0
        
        for i in range(X_train.shape[0]):
            #m = (Σ(xi - x̄)(yi - ȳ)) / (Σ(xi - x̄)^2)
            num = num + ((X_train[i] - X_train.mean())*(y_train[i] - y_train.mean()))
            den = den + ((X_train[i] - X_train.mean())*(X_train[i] - X_train.mean()))
            
        #m = (Σ(xi - x̄)(yi - ȳ)) / (Σ(xi - x̄)^2)
        self.m = num/den
        
        #b = ȳ - (m * x̄)
        self.b = y_train.mean() - (self.m * X_train.mean())
        
        print('Slope(m) =',self.m)
        print('Intercept(b) =',self.b)       
    
    def predict(self,X_test):
        
        print('Test Value = ',X_test)
        #y = mx + b
        print ('↓ Predicted Value ↓')
        return self.m * X_test + self.b

In [2]:
import numpy as np
import pandas as pd

In [3]:
df = pd.read_csv('https://raw.githubusercontent.com/campusx-official/100-days-of-machine-learning/main/day48-simple-linear-regression/placement.csv')

In [4]:
df.head()

|   | cgpa | package |
|---|------|---------|
| 0 | 6.89 | 3.26 |
| 1 | 5.12 | 1.98 |
| 2 | 7.82 | 3.25 |
| 3 | 7.42 | 3.67 |
| 4 | 6.94 | 3.57 |


In [5]:
X = df.iloc[:,0].values
y = df.iloc[:,1].values

In [6]:
X

array([6.89, 5.12, 7.82, 7.42, 6.94, 7.89, 6.73, 6.75, 6.09, 8.31, 5.32,
       6.61, 8.94, 6.93, 7.73, 7.25, 6.84, 5.38, 6.94, 7.48, 7.28, 6.85,
       6.14, 6.19, 6.53, 7.28, 8.31, 5.42, 5.94, 7.15, 7.36, 8.1 , 6.96,
       6.35, 7.34, 6.87, 5.99, 5.9 , 8.62, 7.43, 9.38, 6.89, 5.95, 7.66,
       5.09, 7.87, 6.07, 5.84, 8.63, 8.87, 9.58, 9.26, 8.37, 6.47, 6.86,
       8.2 , 5.84, 6.6 , 6.92, 7.56, 5.61, 5.48, 6.34, 9.16, 7.36, 7.6 ,
       5.11, 6.51, 7.56, 7.3 , 5.79, 7.47, 7.78, 8.44, 6.85, 6.97, 6.94,
       8.99, 6.59, 7.18, 7.63, 6.1 , 5.58, 8.44, 4.26, 4.79, 7.61, 8.09,
       4.73, 6.42, 7.11, 6.22, 7.9 , 6.79, 5.83, 6.63, 7.11, 5.98, 7.69,
       6.61, 7.95, 6.71, 5.13, 7.05, 7.62, 6.66, 6.13, 6.33, 7.76, 7.77,
       8.18, 5.42, 8.58, 6.94, 5.84, 8.35, 9.04, 7.12, 7.4 , 7.39, 5.23,
       6.5 , 5.12, 5.1 , 6.06, 7.33, 5.91, 6.78, 7.93, 7.29, 6.68, 6.37,
       5.84, 6.05, 7.2 , 6.1 , 5.64, 7.14, 7.91, 7.19, 7.91, 6.76, 6.93,
       4.85, 6.17, 5.84, 6.07, 5.66, 7.57, 8.28, 6.

In [7]:
y

array([3.26, 1.98, 3.25, 3.67, 3.57, 2.99, 2.6 , 2.48, 2.31, 3.51, 1.86,
       2.6 , 3.65, 2.89, 3.42, 3.23, 2.35, 2.09, 2.98, 2.83, 3.16, 2.93,
       2.3 , 2.48, 2.71, 3.65, 3.42, 2.16, 2.24, 3.49, 3.26, 3.89, 3.08,
       2.73, 3.42, 2.87, 2.84, 2.43, 4.36, 3.33, 4.02, 2.7 , 2.54, 2.76,
       1.86, 3.58, 2.26, 3.26, 4.09, 4.62, 4.43, 3.79, 4.11, 2.61, 3.09,
       3.39, 2.74, 1.94, 3.09, 3.31, 2.19, 1.61, 2.09, 4.25, 2.92, 3.81,
       1.63, 2.89, 2.99, 2.94, 2.35, 3.34, 3.62, 4.03, 3.44, 3.28, 3.15,
       4.6 , 2.21, 3.  , 3.44, 2.2 , 2.17, 3.49, 1.53, 1.48, 2.77, 3.55,
       1.48, 2.72, 2.66, 2.14, 4.  , 3.08, 2.42, 2.79, 2.61, 2.84, 3.83,
       3.24, 4.14, 3.52, 1.37, 3.  , 3.74, 2.82, 2.19, 2.59, 3.54, 4.06,
       3.76, 2.25, 4.1 , 2.37, 1.87, 4.21, 3.33, 2.99, 2.88, 2.65, 1.73,
       3.02, 2.01, 2.3 , 2.31, 3.16, 2.6 , 3.11, 3.34, 3.12, 2.49, 2.01,
       2.48, 2.58, 2.83, 2.6 , 2.1 , 3.13, 3.89, 2.4 , 3.15, 3.18, 3.04,
       1.54, 2.42, 2.18, 2.46, 2.21, 3.4 , 3.67, 2.

In [8]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=2)

In [9]:
X_train.shape

(160,)

In [10]:
lr = MeraLR()

In [11]:
lr.fit(X_train,y_train)

Slope(m) = 0.5579519734250721
Intercept(b) = -0.8961119222429152


In [12]:
X_train.shape[0]

160

In [13]:
X_train[0]

7.14

In [14]:
X_train.mean()

6.989937500000001

In [15]:
X_test[0]

8.58

In [16]:
print(lr.predict(X_test[0]))

Test Value =  8.58
↓ Predicted Value ↓
3.891116009744203


In [17]:
lr.predict(X_test[2])

Test Value =  5.88
↓ Predicted Value ↓


2.3846456814965085
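The two predictions above can be reproduced by hand by plugging the printed slope and intercept into `y = mx + b`. The snippet below does just that; the standalone `predict` function is written for this check and is not the class method.

```python
# Slope and intercept as printed by lr.fit(X_train, y_train) above.
m = 0.5579519734250721
b = -0.8961119222429152

def predict(x):
    # y = mx + b
    return m * x + b

p1 = predict(8.58)  # X_test[0]
p2 = predict(5.88)  # X_test[2]
print(p1, p2)
```

The results agree with the notebook output: about 3.8911 for a CGPA of 8.58 and about 2.3846 for a CGPA of 5.88.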