# Simple Linear Regression

## Overview
Simple linear regression is a statistical method that models the relationship between two variables with a straight line. It predicts the value of a dependent variable from the value of a single independent variable.

## Mathematical Model
The equation for a simple linear regression model is:

$$ y = \beta_0 + \beta_1 x + \epsilon $$

Where:
- $$ y $$ is the dependent variable (target).
- $$ x $$ is the independent variable (feature).
- $$ \beta_0 $$ is the intercept (value of $$ y $$ when $$ x = 0 $$).
- $$ \beta_1 $$ is the slope of the line (change in $$ y $$ for a one-unit change in $$ x $$).
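- $$ \epsilon $$ is the error term (the part of $$ y $$ the line does not explain; for a fitted model it is estimated by the residual, the difference between observed and predicted values).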

## Objective
The objective of simple linear regression is to find the best-fitting line through the data points. This is achieved by choosing $$ \beta_0 $$ and $$ \beta_1 $$ to minimize the sum of the squared differences between the observed values and the predicted values, a procedure known as ordinary least squares (OLS).
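
For a single feature, this minimization has a closed-form solution: the slope is the sum of cross-deviations of $$ x $$ and $$ y $$ divided by the sum of squared deviations of $$ x $$, and the intercept follows from the two means. The sketch below illustrates it with NumPy on small made-up arrays (`x` and `y` are illustrative values chosen so that the exact line is $$ y = 1 + 2x $$; they are not data from this notebook).

```python
import numpy as np

# Illustrative data (hypothetical values lying exactly on y = 1 + 2x)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

x_mean, y_mean = x.mean(), y.mean()

# Slope: sum of cross-deviations divided by sum of squared x-deviations
beta_1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# Intercept: the least-squares line passes through (x_mean, y_mean)
beta_0 = y_mean - beta_1 * x_mean

print("intercept:", beta_0)
print("slope:", beta_1)
```

Because the data are noise-free, the recovered intercept and slope match the values used to build `y`. With noisy data the same formulas return the line with the smallest sum of squared residuals.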

## Evaluation Metrics
- **Mean Absolute Error (MAE)**: The average of the absolute differences between the observed and predicted values.
  
  $$ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| $$

- **Mean Squared Error (MSE)**: The average of the squared differences between the observed and predicted values.
  
  $$ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

- **R-squared (R2)**: The proportion of variance in the dependent variable that is predictable from the independent variable.
  
  $$ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $$
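
The three formulas above can be computed directly with NumPy, which helps when checking what the `sklearn.metrics` functions return. This is a minimal sketch on made-up numbers: `y_true` and `y_hat` are hypothetical observed and predicted values, not results from this notebook.

```python
import numpy as np

# Hypothetical observed and predicted values, for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 11.0])

residuals = y_true - y_hat

# MAE: mean of absolute residuals
mae = np.mean(np.abs(residuals))

# MSE: mean of squared residuals
mse = np.mean(residuals ** 2)

# R^2: one minus the ratio of residual to total sum of squares
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("MAE:", mae)
print("MSE:", mse)
print("R2:", r2)
```

MAE is in the same units as the target, while MSE is in squared units and penalizes large errors more heavily.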



In [1]:
from sklearn.datasets import make_regression
import pandas as pd
import numpy as np

import plotly.express as px
import plotly.graph_objects as go

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

In [2]:
# Generate a dataset with one feature
# Note: no random_state is passed, so the data (and the outputs shown below) differ on every run
X, y = make_regression(n_samples=100, n_features=1, n_informative=1, n_targets=1, noise=50)

In [3]:
# Create a DataFrame
df = pd.DataFrame({'feature': X.flatten(), 'target': y})

In [4]:
df.head()

|   | feature | target |
|---|---|---|
| 0 | -0.473603 | -68.727816 |
| 1 | 1.03389 | 41.191329 |
| 2 | -1.545053 | -7.994413 |
| 3 | -0.15448 | -30.046787 |
| 4 | 2.589282 | -81.083845 |


In [5]:
# Plot the data
fig = px.scatter(df, x='feature', y='target', title="Feature vs Target")
fig.show()

In [6]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)

In [7]:
# Create and train the linear regression model
lr = LinearRegression()
lr.fit(X_train, y_train)
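
After fitting, a `LinearRegression` object exposes the estimated parameters as `intercept_` (the estimate of $$ \beta_0 $$) and `coef_` (one slope per feature, so a single $$ \beta_1 $$ here). The sketch below is self-contained and uses hypothetical noise-free data (`X_demo`, `y_demo`) rather than the notebook's dataset, so that the expected values are known in advance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data following y = 1 + 2x
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])  # shape (n_samples, 1)
y_demo = 1.0 + 2.0 * X_demo.ravel()

model = LinearRegression()
model.fit(X_demo, y_demo)

# intercept_ estimates beta_0; coef_ has one entry per feature (here beta_1)
print("intercept:", model.intercept_)
print("slope:", model.coef_[0])
```

The same two attributes can be read from the `lr` model trained above to see the line it fitted.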

In [8]:
# Make predictions
y_pred = lr.predict(X_test)

In [9]:
# Print evaluation metrics
print("MAE:", mean_absolute_error(y_test, y_pred))
print("MSE:", mean_squared_error(y_test, y_pred))
print("R2 score:", r2_score(y_test, y_pred))

MAE: 45.49941091140783
MSE: 2869.376573378397
R2 score: -0.02590442945029059
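
The R2 score in this run is slightly negative. From the definition of $$ R^2 $$, that happens whenever the model's squared error on the evaluated data is larger than the squared error of always predicting the mean of the observed values. With `noise=50` and only 20 test points, such an outcome is plausible. The sketch below illustrates the effect on made-up numbers; `r_squared` is a hypothetical helper written for this example, not a library function.

```python
import numpy as np

def r_squared(y_obs, y_hat):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_obs."""
    ss_res = np.sum((y_obs - y_hat) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Hypothetical observations and two sets of predictions
y_obs = np.array([1.0, 2.0, 3.0, 4.0])
y_mean_pred = np.full_like(y_obs, y_obs.mean())  # always predict the mean
y_bad_pred = np.array([4.0, 3.0, 2.0, 1.0])      # trend in the wrong direction

r2_mean = r_squared(y_obs, y_mean_pred)
r2_bad = r_squared(y_obs, y_bad_pred)

print("R2 of mean predictor:", r2_mean)
print("R2 of bad predictor:", r2_bad)
```

Predicting the mean gives an R2 of exactly zero, and anything worse than that baseline goes below zero, so a value near zero means the fitted line adds little over the mean on this test split.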


In [10]:
# Create a grid of values for plotting the regression line
x_range = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_range = lr.predict(x_range)

In [11]:
# Plot the data and the regression line
fig = px.scatter(df, x='feature', y='target', title="Feature vs Target with Regression Line")
fig.add_trace(go.Scatter(x=x_range.flatten(), y=y_range, mode='lines', name='Regression Line'))

fig.show()