CASE STUDY: LINEAR REGRESSION
============================

Problem Statement:
Many real-world problems require predicting a continuous numerical value
from a single independent variable. Simple linear regression models the
relationship between that feature and the target variable as a straight
line, y = mx + c.

Objective:
- To understand the relationship between independent and dependent variables
- To build and train a Linear Regression model
- To evaluate model performance using appropriate metrics

Dataset:
- Small hand-entered NumPy arrays, each with one independent feature and one target variable
- A cars.csv file that the final cell attempts to load

Tools Used:
- Python
- Pandas
- NumPy
- Matplotlib
- Scikit-learn

Approach:
1. Load and explore the dataset
2. Perform Exploratory Data Analysis (EDA)
3. Split the dataset into training and testing sets
4. Train the Linear Regression model
5. Make predictions on test data
6. Evaluate the model using metrics such as R² score and Mean Squared Error
7. Visualize the regression line
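
Steps 3 to 6 of the approach could be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the notebook's own code: the data are the eight (x, y) pairs that appear later under the "new dataset" heading, and `test_size=0.25` and `random_state=42` are assumed values chosen only to make the split reproducible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Illustrative data: the eight (x, y) pairs from the "new dataset" cells.
# scikit-learn expects a 2-D feature matrix, hence the reshape.
X = np.array([8.3, 2.7, 7.7, 5.9, 4.5, 3.3, 1.1, 8.9]).reshape(-1, 1)
y = np.array([81, 25, 85, 62, 41, 42, 17, 95])

# Step 3: hold out 25% of the rows for testing (assumed split settings).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Step 4: fit the model on the training rows only.
model = LinearRegression()
model.fit(X_train, y_train)

# Step 5: predict the held-out rows.
y_pred = model.predict(X_test)

# Step 6: evaluate on the held-out rows.
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("MSE:", mse, "R2:", r2)
```

With only eight rows the test set holds two points, so the metrics are noisy and should be read as a demonstration of the workflow rather than a reliable score.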



In [None]:
#Importing Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings("ignore")

### Simple linear regression example (manual)

In [None]:
#a → independent variable (X)
#b → dependent variable (Y)
a=np.array([1,2,3,4,5])
b=np.array([3,4,2,4,5])

In [None]:
a.mean()

np.float64(3.0)

In [None]:
b.mean()

np.float64(3.6)

In [None]:
#Subtract the mean from each X value.
#This is X − X̄ (x-bar); a.mean() is 3.0.
x=a-a.mean()

In [None]:
#Subtract the mean from each Y value.
#This is Y − Ȳ (y-bar); b.mean() is 3.6.
y=b-b.mean()

In [None]:
#(X − X̄)(Y − Ȳ): products of the deviations, used to calculate the slope.
x*y

array([ 1.2, -0.4, -0. ,  0.4,  2.8])
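
The next cells use a slope of 0.4 without showing where it comes from. A minimal sketch of that missing step, assuming the usual least-squares formula (sum of deviation products divided by the sum of squared X deviations):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])   # independent variable (X)
b = np.array([3, 4, 2, 4, 5])   # dependent variable (Y)

x = a - a.mean()                # X - X̄
y = b - b.mean()                # Y - Ȳ

# Slope = sum of deviation products / sum of squared X deviations.
m = (x * y).sum() / (x ** 2).sum()
print(m)
```

Here the numerator sums to 4.0 and the denominator to 10, which gives the 0.4 used below (up to floating-point rounding).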

In [None]:
#Intercept formula: c = ȳ − m·x̄, with slope m = Σ(x·y) / Σ(x²) = 4.0 / 10 = 0.4
3.6-(0.4*3)

2.4

In [None]:
#Multiply the slope by the X values (here c holds m·x, not the intercept).
c=0.4*a

In [None]:
#Displays predicted Y values (without intercept).
c

array([0.4, 0.8, 1.2, 1.6, 2. ])

In [None]:
#Adds the intercept to give the final predicted Y values.
c+2.4

array([2.8, 3.2, 3.6, 4. , 4.4])
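
How far these predictions sit from the observed values can be quantified with the metrics named in the approach. A small sketch, reusing the slope 0.4 and intercept 2.4 from the cells above; the variable names `sse`, `sst`, `mse`, and `r2` are illustrative:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([3, 4, 2, 4, 5])
m, c = 0.4, 2.4                       # slope and intercept from the cells above

y_pred = m * a + c                    # fitted values
residuals = b - y_pred                # actual minus predicted

sse = (residuals ** 2).sum()          # sum of squared errors
mse = sse / len(b)                    # mean squared error
sst = ((b - b.mean()) ** 2).sum()     # total sum of squares
r2 = 1 - sse / sst                    # coefficient of determination
print(mse, r2)
```

For this data the squared errors sum to about 3.6 against a total sum of squares of about 5.2, so R² is roughly 0.31.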

### ERRORS

In [None]:
#Sample dataset.
X=np.array([1,2,3,4,5,6,7,8,9,10])
Y=np.array([4,3,2,4,5,4,6,4,3,4])

In [None]:
#Mean of X and Y.
xmean=(X.mean())
ymean=(Y.mean())

In [None]:
#Shows the deviation of each value from its mean.
print(X-xmean)
print(Y-ymean)
#Despite their names, xbar and ybar hold the deviations (x − x̄) and (y − ȳ), not the means.
xbar=(X-xmean)
ybar=(Y-ymean)

[-4.5 -3.5 -2.5 -1.5 -0.5  0.5  1.5  2.5  3.5  4.5]
[ 0.1 -0.9 -1.9  0.1  1.1  0.1  2.1  0.1 -0.9  0.1]


In [None]:
xbar

array([-4.5, -3.5, -2.5, -1.5, -0.5,  0.5,  1.5,  2.5,  3.5,  4.5])

In [None]:
ybar

array([ 0.1, -0.9, -1.9,  0.1,  1.1,  0.1,  2.1,  0.1, -0.9,  0.1])

In [None]:
#Square the X deviations and store them in x2.
x2=xbar**2

In [None]:
#Squares Y deviations.
y2 =ybar**2

In [None]:
x2

array([20.25, 12.25,  6.25,  2.25,  0.25,  0.25,  2.25,  6.25, 12.25,
       20.25])

In [None]:
#Mean of squared deviations.
x2.mean()

np.float64(8.25)

In [None]:
#Product of deviations.
z=xbar*ybar

In [None]:
z

array([-0.45,  3.15,  4.75, -0.15, -0.55,  0.05,  3.15,  0.25, -3.15,
        0.45])

In [None]:
z.mean()

np.float64(0.75)

In [None]:
#Slope calculation: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
#Dividing the two means gives the same value, because both sums are scaled by 1/n.
m=0.75/8.25


In [None]:
m

0.09090909090909091

In [None]:
#Intercept formula: c = ȳ − m·x̄ (ȳ = 3.9, x̄ = 5.5, slope rounded to 0.09)
c=3.9-(0.09*5.5)

In [None]:
c

3.405

In [None]:
#Prediction at x = 2 using y = mx + c
0.09*2+3.405

3.585

In [None]:
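#Prediction equation: y = mx + c, using the intercept c = 3.405 computed above (not the mean 3.9)
yp=0.09*X+3.405

In [None]:
#Displays predicted Y values.
yp

array([3.495, 3.585, 3.675, 3.765, 3.855, 3.945, 4.035, 4.125, 4.215,
       4.305])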
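
The cells above round the slope to 0.09 and stop before measuring any error. A sketch that keeps the unrounded slope and intercept and then computes the residuals, MSE, and R² for this dataset; the names `dx`, `dy`, `residuals`, `mse`, and `r2` are illustrative:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Y = np.array([4, 3, 2, 4, 5, 4, 6, 4, 3, 4])

dx = X - X.mean()                          # deviations of X from its mean
dy = Y - Y.mean()                          # deviations of Y from its mean
m = (dx * dy).mean() / (dx ** 2).mean()    # unrounded slope
c = Y.mean() - m * X.mean()                # unrounded intercept

yp = m * X + c                             # predicted Y values
residuals = Y - yp                         # actual minus predicted
mse = (residuals ** 2).mean()              # mean squared error
r2 = 1 - (residuals ** 2).sum() / (dy ** 2).sum()
print(m, c, mse, r2)
```

R² comes out near 0.06 here, which suggests that a straight line explains very little of the variation in this particular sample.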

## New dataset


In [None]:
a=np.array([8.3,2.7,7.7,5.9,4.5,3.3,1.1,8.9])
b=np.array([81,25,85,62,41,42,17,95])

In [None]:
amean=a.mean()

In [None]:
amean

np.float64(5.300000000000001)

In [None]:
bmean=b.mean()

In [None]:
bmean

np.float64(56.0)

In [None]:
abar=a-amean

In [None]:
abar

array([ 3. , -2.6,  2.4,  0.6, -0.8, -2. , -4.2,  3.6])

In [None]:
bbar=b-bmean

In [None]:
bbar

array([ 25., -31.,  29.,   6., -15., -14., -39.,  39.])

In [None]:
x2=abar**2

In [None]:
x2

array([ 9.  ,  6.76,  5.76,  0.36,  0.64,  4.  , 17.64, 12.96])

In [None]:
#Mean of the squared X deviations (slope denominator).
p=x2.mean()

In [None]:
z=abar*bbar

In [None]:
#Mean of the deviation products (slope numerator).
q=z.mean()

In [None]:
#Slope: m = q / p
m=q/p

In [None]:
m

np.float64(10.031512605042016)

In [None]:
#Intercept: c = ȳ − m·x̄
c=bmean-(m*amean)

In [None]:
c

np.float64(2.8329831932773075)
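
As a cross-check on the hand-computed slope and intercept, NumPy's `np.polyfit` can fit the same degree-1 least-squares line. This sketch is an independent verification, not part of the manual method:

```python
import numpy as np

a = np.array([8.3, 2.7, 7.7, 5.9, 4.5, 3.3, 1.1, 8.9])
b = np.array([81, 25, 85, 62, 41, 42, 17, 95])

# Degree-1 least-squares fit; coefficients are returned highest power first.
slope, intercept = np.polyfit(a, b, 1)
print(slope, intercept)
```

Both values should agree with `m` and `c` above to within floating-point tolerance.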

In [None]:
#Predict Y for x = 1 using y = mx + c
yp=m*1 +c
print(yp)

12.864495798319323


In [None]:
yp

np.float64(12.864495798319323)
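
Step 7 of the approach, visualizing the regression line, could look like the sketch below. It recomputes the slope and intercept so that it runs on its own; the axis labels, colours, and the 50-point grid are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

a = np.array([8.3, 2.7, 7.7, 5.9, 4.5, 3.3, 1.1, 8.9])
b = np.array([81, 25, 85, 62, 41, 42, 17, 95])

# Same manual least-squares fit as in the cells above.
da = a - a.mean()
db = b - b.mean()
m = (da * db).sum() / (da ** 2).sum()
c = b.mean() - m * a.mean()

# Evaluate the fitted line on an evenly spaced grid across the observed range.
x_line = np.linspace(a.min(), a.max(), 50)

fig, ax = plt.subplots()
ax.scatter(a, b, label="observed")
ax.plot(x_line, m * x_line + c, color="red", label="fitted line")
ax.set_xlabel("a (independent variable)")
ax.set_ylabel("b (dependent variable)")
ax.legend()
```

In a notebook the figure renders when the cell finishes; in a plain script a call to `plt.show()` would be needed afterwards.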

In [None]:
#Load the cars dataset; the path is the one reported in the error below.
df = pd.read_table("/content/cars.csv", sep=",")
df

FileNotFoundError: [Errno 2] No such file or directory: '/content/cars.csv'
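
The error above means the file was not present in the runtime when the cell ran. One way to fail more gracefully is to check for the file first. In this sketch `load_csv_if_present` is a hypothetical helper introduced only for illustration, and nothing is assumed about the columns of cars.csv:

```python
from pathlib import Path

import pandas as pd


def load_csv_if_present(path):
    """Hypothetical helper: return a DataFrame, or None when the file is missing."""
    csv_path = Path(path)
    if not csv_path.is_file():
        print(f"{csv_path} not found; upload it or correct the path before loading.")
        return None
    return pd.read_csv(csv_path)


df = load_csv_if_present("/content/cars.csv")
```

When the file is missing, `df` is `None`, so later cells can test for that instead of stopping on an exception.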

Conclusion:
* Simple linear regression was implemented by hand with NumPy: the slope and
intercept were computed from deviations about the mean and used to predict
continuous values.
* The manual fit gives a baseline understanding of the relationship
between variables and can be extended to more complex regression techniques.
