
Question 1 : What is Simple Linear Regression (SLR)? Explain its purpose.

ans : Simple Linear Regression (SLR) is a statistical method used to model the relationship between two continuous variables: one independent (predictor) and one dependent (outcome). Its purpose is to quantify this relationship with a straight line, allowing us to both understand how the variables are related and make predictions about the dependent variable based on the independent variable.

Understanding Relationships
- SLR helps determine whether and how strongly two variables are related.
- Example: Studying how hours studied (X) affect exam scores (Y).

Prediction
- Once the line is fitted, you can predict the dependent variable for new values of the independent variable.
- Example: Predicting a student’s exam score if they study for 10 hours.

Quantifying Impact
- The slope tells us the magnitude of change in Y for each unit increase in X.
- Example: If slope = 5, each extra hour of study is associated with a 5-point increase in the predicted score.

Decision-Making
- Used in economics, business, engineering, and science to guide decisions based on data-driven predictions.


Question 2: What are the key assumptions of Simple Linear Regression?

ans:  
 Linearity
- The relationship between the independent variable X and dependent variable Y is linear.

Independence of Errors
- Residuals (errors) are independent; no autocorrelation exists.

Homoscedasticity
- The variance of residuals is constant across all values of X.

Normality of Errors
- Residuals should be approximately normally distributed.

No Perfect Multicollinearity
- In SLR, only one predictor is used, so this is trivially satisfied. (Relevant in multiple regression.)

Measurement Accuracy of Predictor
- The independent variable X is measured without significant error.

Fixed Independent Variable
- Values of X are considered fixed in repeated samples (not random).

Additivity of Effects
- The effect of X on Y is additive: Y is modeled as the intercept plus the slope times X, with no interaction or multiplicative terms.

 Correct Model Specification
- The model includes the right variables and form; no important predictors are omitted.

 Error Term Expectation = 0
- The errors have an expected value of zero at every value of X, so the regression line does not systematically over- or under-predict.
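
Some of these assumptions can be checked informally from the residuals of a fitted line. The sketch below is a rough illustration using NumPy, not a full diagnostic procedure. The `hours` and `scores` values are hypothetical numbers invented for the example. It fits a line, computes the residuals, and compares their spread in the lower and upper halves of X as a crude homoscedasticity check.

```python
import numpy as np

# Hypothetical data: hours studied and exam scores (illustrative values only)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
scores = np.array([52, 55, 61, 64, 70, 73, 79, 82], dtype=float)

# Fit a degree-1 polynomial; np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(hours, scores, 1)
residuals = scores - (intercept + slope * hours)

# With an intercept in the model, least-squares residuals average to zero
print("Mean residual:", residuals.mean())

# Crude homoscedasticity check: compare residual spread for low vs. high X
low, high = residuals[:4], residuals[4:]
print("Residual std (low X): ", low.std())
print("Residual std (high X):", high.std())
```

The zero mean of the residuals is guaranteed by the fitting procedure itself, so it is not evidence about the true errors. Plotting residuals against X is the more usual way to look for curvature or changing variance.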


Question 3: Write the mathematical equation for a simple linear regression model and
explain each term.

ans : 
The simple linear regression model is written as:

$Y = \beta_{0} + \beta_{1}X + \epsilon$

Explanation of Each Term

Y (Dependent Variable / Response Variable)
- The outcome we are trying to predict or explain.
- Example: Exam score, house price, sales revenue.

 X (Independent Variable / Predictor Variable)
- The input variable used to predict Y.
- Example: Hours studied, square footage, advertising spend.

$\beta_{0}$ (Intercept)
- The value of Y when X=0.
- It represents the baseline level of the dependent variable.
- Example: If no hours are studied, the expected exam score might still be 20 due to prior knowledge.

$\beta_{1}$ (Slope Coefficient)
- The change in Y for a one-unit increase in X.
- It quantifies the strength and direction of the relationship.
- Example: If $\beta_{1} = 5$, each extra hour of study is associated with a 5-point increase in the predicted exam score.

$\epsilon$ (Error Term / Residual)
- Captures the variation in Y not explained by X.
- Represents randomness, measurement error, or other unobserved factors.
- Example: Two students studying the same hours may still score differently due to motivation, health, or luck.
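
To see how the terms fit together, the following sketch simulates data from the model and then estimates the coefficients back from it. The "true" values `beta0_true = 20` and `beta1_true = 5`, the noise level, and the sample size are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical "true" parameters, chosen only for illustration
beta0_true, beta1_true = 20.0, 5.0

x = rng.uniform(0, 10, size=200)        # predictor X (e.g. hours studied)
eps = rng.normal(0, 3, size=200)        # error term: mean 0, std 3
y = beta0_true + beta1_true * x + eps   # response Y

# Estimate the coefficients from the noisy data
beta1_hat, beta0_hat = np.polyfit(x, y, 1)
print("Estimated intercept:", beta0_hat)
print("Estimated slope:", beta1_hat)
```

The estimates should land close to, but not exactly on, the true values, because the error term adds variation that X does not explain.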

Question 4: Provide a real-world example where simple linear regression can be
applied.

ans : 
 Real-World Example of Simple Linear Regression
Scenario: Predicting house prices based on square footage.

1. Dependent Variable (Y)
- House price (in ₹ or $).
- This is what we want to predict.
2. Independent Variable (X)
- Square footage (size of the house).
- This is the predictor.
3. Regression Equation
 $y = \beta_{0} + \beta_{1}x + \epsilon$
- $\beta_{0}$: Base price (intercept) when square footage is 0.
- $\beta_{1}$: Increase in price per additional square foot.
- $\epsilon$ : Error term (captures other factors like location, age, amenities).


Question 5: What is the method of least squares in linear regression?

ans : 

The method of least squares is the standard technique used to estimate the parameters 
($\beta_0$, $\beta_1$) of a linear regression model.  
It works by finding the line that minimizes the sum of the squared differences between 
the observed values and the values predicted by the regression line.

For a regression model:

$y = \beta_0 + \beta_1 x + \epsilon$

We want to choose $\beta_0$ and $\beta_1$ so that the residuals (errors) are, taken together, as small as possible.

- Residual for each data point:

$e_i = y_i - (\beta_0 + \beta_1 x_i)$


- Least squares minimizes:

$S = \sum_{i=1}^{n} \big(y_i - (\beta_0 + \beta_1 x_i)\big)^2$


The line chosen this way is the best-fit line in the least-squares sense: no other straight line gives a smaller $S$ for the data.

 Purpose
- Find the best-fit line that represents the relationship between $x$ and $y$.  
- Minimize error by reducing the squared differences between actual and predicted values.  
- Provide estimates of slope ($\beta_1$)  and intercept ($\beta_0$)  that can be used for prediction.  


Example
Suppose we want to predict exam scores (y) from hours studied (x).

- Actual data points:  
  - (2 hrs → 50 marks)  
  - (4 hrs → 70 marks)  
  - (6 hrs → 90 marks)  

Least squares finds the line that minimizes the squared differences between actual scores and predicted scores.
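
For simple linear regression, minimizing $S$ has a well-known closed-form solution: the slope is $\sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and the intercept is $\bar{y}$ minus the slope times $\bar{x}$. As a minimal sketch, the code below applies these formulas to the three data points from the example; the variable names are illustrative.

```python
import numpy as np

# Data from the example above: hours studied and marks
x = np.array([2.0, 4.0, 6.0])
y = np.array([50.0, 70.0, 90.0])

x_mean, y_mean = x.mean(), y.mean()

# Closed-form least-squares estimates
beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
beta0 = y_mean - beta1 * x_mean

# Sum of squared residuals S for the fitted line
S = np.sum((y - (beta0 + beta1 * x)) ** 2)

print("Slope:", beta1)      # 10.0
print("Intercept:", beta0)  # 30.0
print("S:", S)              # 0.0
```

These three points happen to lie exactly on the line $y = 30 + 10x$, so the minimized sum of squares is zero. With real data the points scatter around the line and $S$ is positive.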


Question 6: What is Logistic Regression? How does it differ from Linear Regression?

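ans: Logistic Regression is a classification method that models the probability of a categorical (usually binary) outcome by passing a linear combination of the predictors through the sigmoid function. It differs from Linear Regression in the following ways: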

Nature of Output
- Linear Regression predicts continuous values.
- Logistic Regression predicts probabilities (between 0 and 1).

Dependent Variable Type
- Linear Regression → Continuous (e.g., salary, house price).
- Logistic Regression → Categorical (e.g., pass/fail, spam/not spam).

 Curve vs Line
- Linear Regression fits a straight line.
- Logistic Regression fits an S‑shaped curve (sigmoid).

Interpretation of Coefficients
- Linear Regression: $\beta _1$ = change in y per unit change in x.
- Logistic Regression: $\beta _1$ = change in log‑odds of the outcome per unit change in x.

Error Minimization
- Linear Regression minimizes sum of squared errors.
- Logistic Regression maximizes likelihood (log‑likelihood).

Range of Predictions
- Linear Regression predictions can be any real number.
- Logistic Regression predictions are probabilities, so they always lie between 0 and 1.

Use Cases
- Linear Regression → predicting quantities (e.g., sales, temperature).
- Logistic Regression → classification (e.g., disease/no disease, churn/no churn).

Assumptions
- Linear Regression assumes linearity, normality of errors, homoscedasticity.
- Logistic Regression assumes linearity in the log‑odds, not in the raw outcome.

Decision Making
- Linear Regression → directly uses predicted values.
- Logistic Regression → applies a threshold (e.g., 0.5) to classify into categories.
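
The link between log-odds, probability, and the classification threshold can be sketched in a few lines. The coefficients `beta0 = -4` and `beta1 = 1` below are hypothetical values picked for illustration, not estimates from any dataset.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients for P(pass) as a function of hours studied
beta0, beta1 = -4.0, 1.0

hours = np.array([1.0, 4.0, 7.0])
log_odds = beta0 + beta1 * hours            # linear in x: -3, 0, 3
prob_pass = sigmoid(log_odds)               # S-shaped in x
predicted = (prob_pass >= 0.5).astype(int)  # threshold at 0.5

print(prob_pass)
print(predicted)  # [0 1 1]
```

The log-odds change by `beta1` for every extra hour, while the probability changes by different amounts depending on where on the S-curve the point sits.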


Question 7: Name and briefly describe three common evaluation metrics for regression
models.

ans : 

Mean Absolute Error (MAE)
- Measures the average of the absolute differences between predicted and actual values.
- Easy to interpret: “On average, predictions are off by X units.”
- Treats all errors equally, without squaring.

 Mean Squared Error (MSE)
- Calculates the average of squared differences between predicted and actual values.
- Penalizes larger errors more heavily because of squaring.
- Useful when big mistakes are particularly undesirable.

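R-squared ($R^2$): Coefficient of Determination
- Represents the proportion of variance in the dependent variable explained by the model.
- Ranges from 0 to 1 for a least-squares fit with an intercept evaluated on its training data, with values closer to 1 indicating a better fit; on new data it can be negative if the model predicts worse than the mean of y.
- Shows how well the regression line captures the data’s variability.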
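
As a rough illustration, the three metrics can be computed by hand with NumPy. The sketch uses the five data points and the fitted line (slope 0.6, intercept 2.2) from the Question 9 code cell in this notebook; the variable names are illustrative.

```python
import numpy as np

# Data and fitted line taken from the Question 9 cell
x = np.array([1, 2, 3, 4, 5], dtype=float)
y_true = np.array([2, 4, 5, 4, 5], dtype=float)
y_pred = 2.2 + 0.6 * x

errors = y_true - y_pred

mae = np.mean(np.abs(errors))                    # Mean Absolute Error
mse = np.mean(errors ** 2)                       # Mean Squared Error
ss_res = np.sum(errors ** 2)                     # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
r2 = 1 - ss_res / ss_tot                         # R-squared

print("MAE:", round(mae, 2))   # 0.64
print("MSE:", round(mse, 2))   # 0.48
print("R^2:", round(r2, 2))    # 0.6
```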


Question 8: What is the purpose of the R-squared metric in regression analysis?

ans: The R-squared ($R^2$) metric, also known as the coefficient of determination, measures how well a regression model explains the variability of the dependent variable. It is the proportion of variance in the outcome that is accounted for by the independent variables, which makes it a measure of goodness of fit. A value closer to 1 means the model explains a larger share of the variation in the data, while a lower value means it captures less. A high $R^2$ by itself does not show that the model is correctly specified or that the relationship is causal.
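
For reference, the usual definition can be written as follows, where $\hat{y}_i$ denotes the model's prediction for observation $i$ and $\bar{y}$ the mean of the observed values (both symbols are introduced here for the formula and are not used elsewhere in these notes):

$R^2 = 1 - \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$

The numerator is the variation left unexplained by the model and the denominator is the total variation in y, so $R^2$ is the explained share.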


Question 10: How do you interpret the coefficients in a simple linear regression model?

ans:
 A simple linear regression model is written as:

$y=\beta _0+\beta _1x+\epsilon$ 

Where:
- y = dependent variable (outcome)
- x = independent variable (predictor)
- $\beta _0$ = intercept
- $\beta _1$ = slope (coefficient of x)
- $\epsilon$  = error term

Interpretation
 Intercept ($\beta _0$)
- Represents the expected value of y when x=0.
- It’s the baseline or starting point of the regression line.
- Example: If $\beta _0$=50, then when study hours = 0, the predicted exam score is 50.

Slope ($\beta _1$)
- Represents the change in y for a one-unit increase in x.
- Shows the strength and direction of the relationship between x and y.
- Example: If $\beta _1$=10, then each additional study hour increases the predicted exam score by 10 points.
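
The two interpretations can be checked numerically. This is a small sketch using the example values $\beta_0 = 50$ and $\beta_1 = 10$ from above; the helper name `predict` is hypothetical.

```python
# Coefficients from the example above (illustrative, not fitted to data)
beta0, beta1 = 50.0, 10.0

def predict(hours):
    """Predicted exam score for a given number of study hours."""
    return beta0 + beta1 * hours

print(predict(0))               # 50.0 -> the intercept
print(predict(3) - predict(2))  # 10.0 -> the slope
```

The prediction at x = 0 returns the intercept, and the difference between predictions one unit apart returns the slope, wherever along the line it is measured.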









In [5]:
pip install scikit-learn


Note: you may need to restart the kernel to use updated packages.





In [6]:
# Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.

# ans :

import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data (X = independent variable, y = dependent variable)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)   # predictor values
y = np.array([2, 4, 5, 4, 5])                  # target values

# Create and fit the model
model = LinearRegression()
model.fit(X, y)

# Print slope (coefficient) and intercept
print("Slope (β1):", model.coef_[0])
print("Intercept (β0):", model.intercept_)


Slope (β1): 0.6
Intercept (β0): 2.2
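
Once fitted, the same model object can also produce predictions and an $R^2$ value. The following is a minimal sketch that repeats the data from the cell above so it runs on its own; the new input value 6 is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same sample data as in the cell above
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

model = LinearRegression()
model.fit(X, y)

# Predict y for a new, previously unseen x value (must be 2-D: one row, one column)
x_new = np.array([[6]])
prediction = model.predict(x_new)
print("Prediction for x = 6:", round(prediction[0], 2))  # 2.2 + 0.6 * 6 = 5.8

# score() returns R^2 on the data it is given
r2 = model.score(X, y)
print("R^2 on training data:", round(r2, 2))  # 0.6
```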
