Assignment Code: D-AG-008
Supervised Learning: Regression

Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.
Answer:
Simple Linear Regression (SLR) is a statistical technique used to understand the relationship between two continuous variables — one independent variable (predictor) and one dependent variable (target).
It fits a straight line that best represents the relationship between them and helps in predicting the value of the dependent variable based on the independent variable.
Purpose:
•	To identify the strength and direction of the relationship between variables.
•	To make predictions using the linear equation.
•	To understand how one variable affects another.

Question 2: What are the key assumptions of Simple Linear Regression?
Answer:
The key assumptions of Simple Linear Regression are:
1.	Linearity: The relationship between the independent and dependent variable is linear.
2.	Independence: The residuals (errors) are independent of each other.
3.	Homoscedasticity: The variance of residuals is constant across all levels of the independent variable.
4.	Normality: The residuals are normally distributed.
5.	No multicollinearity: This applies only to multiple regression, where the independent variables should not be highly correlated with one another. With the single predictor of SLR, it does not arise.

Question 3: Write the mathematical equation for a simple linear regression model and explain each term.
Answer:
Equation:
Y = β₀ + β₁X + ε
Where:
•	Y: Dependent variable (output or target)
•	X: Independent variable (input or predictor)
•	β₀: Intercept — value of Y when X = 0
•	β₁: Slope — change in Y for a one-unit increase in X
•	ε: Error term, the difference between the observed Y and the true regression line β₀ + β₁X; in a fitted model it is estimated by the residuals (actual minus predicted values)

Question 4: Provide a real-world example where simple linear regression can be applied.
Answer:
A real-world example is predicting house prices based on house size.
Here,
•	X (independent variable): Size of the house (in square feet)
•	Y (dependent variable): House price
The regression line helps estimate the price of a house based on its size.

Question 5: What is the method of least squares in linear regression?
Answer:
The method of least squares is used to find the best-fitting regression line by minimizing the sum of the squares of the residuals (differences between actual and predicted values).
Formula:
Minimize Σ (from i = 1 to n) [ Yi(actual) − Yi(predicted) ]²
This ensures that the fitted line passes as close as possible to all the data points, minimizing overall error.
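As a minimal sketch of the idea above: with a single predictor, the slope and intercept that minimize the sum of squared residuals have a closed form (slope = covariance of X and Y divided by the variance of X). The data values and the function names `fit_least_squares` and `sum_squared_residuals` are made up for illustration.

```python
def fit_least_squares(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of (actual - predicted)^2 for the line intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, roughly following y = 1 + 2x with some noise
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]

b0, b1 = fit_least_squares(x, y)
best = sum_squared_residuals(x, y, b0, b1)
# Any other line, such as one with a slightly shifted intercept, gives a larger sum
shifted = sum_squared_residuals(x, y, b0 + 0.1, b1)
print(b0, b1, best, shifted)
```

Comparing `best` with `shifted` shows the defining property: moving away from the fitted coefficients increases the sum of squared residuals.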

Question 6: What is Logistic Regression? How does it differ from Linear Regression?
Answer:
Logistic Regression is a supervised learning algorithm used for classification problems where the output variable is categorical (e.g., Yes/No, 0/1). It predicts the probability that a sample belongs to a specific class using the logistic (sigmoid) function.
Differences:
Aspect	Linear Regression	Logistic Regression
Output	Continuous value	Probability (0–1)
Purpose	Predict quantities	Classify categories
Function	Linear function	Sigmoid/logistic function
Error Metric	Mean Squared Error	Log Loss / Cross-Entropy
Range of Output	(−∞, +∞)	0 to 1
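To illustrate the difference in output range, here is a minimal sketch of the sigmoid function applied to a linear score. The coefficients `b0` and `b1`, the 0.5 threshold, and the helper names are assumptions chosen for illustration, not values from a fitted model.

```python
import math


def sigmoid(z):
    """Map a real-valued score z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(x, b0, b1, threshold=0.5):
    """Return 1 when the predicted probability reaches the threshold, else 0."""
    probability = sigmoid(b0 + b1 * x)
    return 1 if probability >= threshold else 0


# Hypothetical coefficients
b0, b1 = -4.0, 2.0

for x in (0.0, 2.0, 4.0):
    linear_score = b0 + b1 * x           # unbounded, as in linear regression
    probability = sigmoid(linear_score)  # squeezed into (0, 1)
    print(x, linear_score, round(probability, 3), predict_class(x, b0, b1))
```

The linear score can take any real value, while the sigmoid output stays between 0 and 1 and can be thresholded to obtain a class label.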

Question 7: Name and briefly describe three common evaluation metrics for regression models.
Answer:
Three commonly used evaluation metrics are:
1.	Mean Absolute Error (MAE)
It measures the average magnitude of errors between actual and predicted values, without considering their direction.
Formula:
MAE = (1 / n) * Σ (from i = 1 to n) | Yi(actual) − Yi(predicted) |

2.	Mean Squared Error (MSE)
It measures the average of squared differences between actual and predicted values. Squaring penalizes larger errors more.
Formula:
MSE = (1 / n) * Σ (from i = 1 to n) [ Yi(actual) − Yi(predicted) ]²

3.	Root Mean Squared Error (RMSE)
It is the square root of MSE and expresses the error in the same units as the dependent variable.
Formula:
RMSE = √[ (1 / n) * Σ (from i = 1 to n) [ Yi(actual) − Yi(predicted) ]² ]

Lower MAE, MSE, and RMSE values indicate better model performance.
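The three metrics can be computed directly from their definitions. The sketch below uses plain Python and a small set of hypothetical actual and predicted values; the function names are chosen for illustration.

```python
import math


def mae(actual, predicted):
    """Mean of the absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean of the squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of MSE, in the same units as the target."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical values; the errors are 1, 0, -1 and -3
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 12.0]

print("MAE :", mae(actual, predicted))   # 1.25
print("MSE :", mse(actual, predicted))   # 2.75
print("RMSE:", rmse(actual, predicted))
```

The single error of 3 contributes 9 of the 11 squared-error units, which shows how MSE and RMSE weight large errors more heavily than MAE does.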

Question 8: What is the purpose of the R-squared metric in regression analysis?
Answer:
The R-squared (R²) metric measures how well the regression model explains the variance in the dependent variable. It shows the proportion of total variation in Y that is explained by X.
Formula:
R² = 1 − [ Σ (from i = 1 to n) ( Yi(actual) − Yi(predicted) )² / Σ (from i = 1 to n) ( Yi(actual) − Ȳ )² ]
Where:
•	Ȳ = Mean of actual values
•	Numerator: Residual sum of squares (variation left unexplained by the model)
•	Denominator: Total sum of squares (total variation of Y around its mean)
An R² value closer to 1 indicates a better fit, meaning the model explains most of the data variance.
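A minimal sketch of the R² formula above, written in plain Python. The actual and predicted values are hypothetical, and the function name `r_squared` is chosen for illustration.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical values
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [3.5, 4.5, 7.5, 8.5]

print(r_squared(actual, predicted))

# Always predicting the mean of Y explains none of the variation, so R^2 is 0
mean_only = [sum(actual) / len(actual)] * len(actual)
print(r_squared(actual, mean_only))
```

The second call gives a useful baseline: a model is only informative to the extent that its R² exceeds that of simply predicting the mean.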

Question 10: How do you interpret the coefficients in a simple linear regression model?
Answer:
In the regression equation Y = β₀ + β₁X:
•	Intercept (β₀): The predicted value of Y when X = 0.
•	Slope (β₁): Represents how much Y changes for a one-unit increase in X.
o	If β₁ > 0, Y increases as X increases.
o	If β₁ < 0, Y decreases as X increases.
Example:
If β₀ = 2 and β₁ = 0.6, then for each additional unit increase in X, Y increases by 0.6 units.
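The example above can be checked with a few lines of Python. This is only a sketch using the same illustrative values β₀ = 2 and β₁ = 0.6; the function name `predict` is an assumption.

```python
def predict(x, intercept=2.0, slope=0.6):
    """Predicted Y for a given X under Y = intercept + slope * X."""
    return intercept + slope * x


# At X = 0 the prediction equals the intercept
print(predict(0))

# Each one-unit increase in X changes the prediction by the slope
for x in (0, 1, 2):
    change = predict(x + 1) - predict(x)
    print(x, predict(x), round(change, 10))
```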


In [2]:
# Question 9: Write Python code to fit a simple linear regression model using scikit-learn
# and print the slope and intercept.
# (Include your Python code and output in the code box below.)

import pandas as pd

import numpy as np

df = pd.read_csv('/content/Salary_dataset.csv')
df

Unnamed: 0.1,Unnamed: 0,YearsExperience,Salary
0,0,1.2,39344.0
1,1,1.4,46206.0
2,2,1.6,37732.0
3,3,2.1,43526.0
4,4,2.3,39892.0
5,5,3.0,56643.0
6,6,3.1,60151.0
7,7,3.3,54446.0
8,8,3.3,64446.0
9,9,3.8,57190.0


In [4]:
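# iloc[:, :-1] keeps 'Unnamed: 0' as well as 'YearsExperience', so X has two features;
# X = df[['YearsExperience']] would give a single-predictor (simple) regression.
X = df.iloc[:, :-1]
y = df.iloc[:, -1]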

In [5]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)



In [7]:
X_train.shape, y_train.shape, X_test.shape, y_test.shape

((24, 2), (24,), (6, 2), (6,))

In [8]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()


In [9]:
model.fit(X_train, y_train)

In [13]:
# coef_ holds one value per column of X, in column order ('Unnamed: 0', then 'YearsExperience')
print(f' intercept is : {model.intercept_}')
print(f'slope is: {model.coef_}')

 intercept is : 23806.947115177056
slope is: [ -585.85283033 11291.35263817]
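
Two coefficients are printed above because X contains two columns. As a hedged sketch of a fit with a single predictor, the code below uses only the ten rows displayed earlier (not the full 30-row file), so that it is self-contained. The names `df_sample`, `X_single`, `y_single`, and `slr` are introduced here for illustration, and the resulting numbers will differ from a fit on the full dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# The ten rows shown in the notebook output; the full file is not reproduced here
df_sample = pd.DataFrame({
    "YearsExperience": [1.2, 1.4, 1.6, 2.1, 2.3, 3.0, 3.1, 3.3, 3.3, 3.8],
    "Salary": [39344.0, 46206.0, 37732.0, 43526.0, 39892.0,
               56643.0, 60151.0, 54446.0, 64446.0, 57190.0],
})

# Double brackets keep X two-dimensional with exactly one feature column
X_single = df_sample[["YearsExperience"]]
y_single = df_sample["Salary"]

slr = LinearRegression()
slr.fit(X_single, y_single)

# With one feature, coef_ has a single entry: the slope
print(f"intercept: {slr.intercept_}")
print(f"slope: {slr.coef_[0]}")
```

Here `coef_` has exactly one entry, which can be read as the estimated change in salary per additional year of experience for this small sample.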
