1. What is Simple Linear Regression (SLR)? Explain its purpose.
- Simple Linear Regression is a statistical method used to model the relationship between one independent variable (X) and one dependent variable (Y).
- SLR finds the best-fit line (called the regression line) that predicts Y from X using the equation:
  - Y = mX + c
    - Y = dependent variable (what you want to predict)
    - X = independent variable (input/predictor)
    - c = intercept (value of Y when X = 0)
    - m = slope (change in Y for each unit increase in X)

Purpose of Simple Linear Regression
- Prediction – Estimate the value of Y based on X.
  - Example: Predict house price from size.
- Understanding Relationships – Quantify how much Y changes when X changes.
  - Example: How study hours influence exam marks.
- Trend Analysis – Identify whether Y increases or decreases as X changes.
- Decision Making – Helps in business, economics, science, etc., to make data-driven decisions.

2. What are the key assumptions of Simple Linear Regression?
- The key assumptions of Simple Linear Regression (SLR) are conditions that must hold for the regression results (predictions, coefficients, and statistical tests) to be reliable and valid.
- Linearity: The relationship between X and Y must be linear.
- Independence of Errors: The residuals (errors) should be independent of each other.
- Homoscedasticity (Constant Variance): The residuals should have constant variance across all values of X.
- Normality of Residuals: The residuals should be approximately normally distributed (this matters mainly for confidence intervals and hypothesis tests).
- In short, SLR works best when the relationship is linear, errors are independent, variance is constant, residuals are normal, and there are no influential outliers.
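
The assumptions above are usually checked by looking at the residuals. The sketch below is a rough illustration only: it reuses the five sample points and the fitted coefficients (slope 0.6, intercept 2.2) from the scikit-learn example in question 9, and the variable names and the "low half versus high half" spread comparison are assumptions made for this example. With so few points these are informal checks, not formal statistical tests.

```python
# Sample data and fitted coefficients taken from the scikit-learn example in question 9.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = 0.6, 2.2

# Residual = observed value minus predicted value.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# A least-squares line with an intercept leaves residuals that average to about zero.
mean_residual = sum(residuals) / len(residuals)

# Independence: the Durbin-Watson statistic compares consecutive residuals.
# Values near 2 suggest little first-order autocorrelation.
dw_numerator = sum(
    (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
)
dw_denominator = sum(e ** 2 for e in residuals)
durbin_watson = dw_numerator / dw_denominator

# Homoscedasticity (crude check): compare residual size for small X and large X.
half = len(x) // 2
spread_low = max(abs(e) for e in residuals[:half])
spread_high = max(abs(e) for e in residuals[-half:])

print("Residuals:", [round(e, 2) for e in residuals])
print("Mean residual:", round(mean_residual, 6))
print("Durbin-Watson:", round(durbin_watson, 3))
print("Max |residual| for low X vs high X:", round(spread_low, 2), round(spread_high, 2))
```

In practice a residual plot against X (and a histogram or Q-Q plot of the residuals) is the more common way to judge linearity, constant variance, and normality.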

3. Write the mathematical equation for a simple linear regression model and explain each term.
- The standard equation of a Simple Linear Regression (SLR) model is:
  - Y = β₀ + β₁X + ε

| Symbol | Name                 | Meaning                                    |
| ------ | -------------------- | ------------------------------------------ |
| **Y**  | Dependent variable   | The variable we want to predict or explain |
| **X**  | Independent variable | The predictor/input variable               |
| **β₀** | Intercept            | Value of Y when X = 0                      |
| **β₁** | Slope coefficient    | Change in Y for a one-unit increase in X   |
| **ε**  | Error term           | Random variation not explained by X        |

- Observed value = Predicted value + Error
- Predicted value:
  - ŷ = mX + c (slope-intercept notation)
  - ŷ = β₀ + β₁X (the same line, with c = β₀ and m = β₁)



4. Provide a real-world example where simple linear regression can be applied.
- A teacher wants to understand how the number of hours a student studies (X) affects their exam score (Y).
  - Exam Score = β₀ + β₁ × (Study Hours) + ε
  - β₀ (Intercept): Expected exam score if a student studies 0 hours.
  - β₁ (Slope): Expected increase in score for each additional hour of study.
  - ε (Error): Other factors affecting marks (sleep, IQ, stress, preparation quality).
- If the slope is positive, more study time is associated with higher marks on average.

5. What is the method of least squares in linear regression?
- The method of least squares is a mathematical technique used in linear regression to find the best-fitting line for a set of data points. It chooses the line that minimizes the total squared difference between the actual observed values and the predicted values from the line.
  - The error (residual) for each point is: $e_i = y_i - \hat{y}_i$
  - Least squares minimizes: $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
- Squaring ensures that negative and positive errors do not cancel out, and it penalizes large errors more heavily.
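
For a single predictor, the least-squares line has a closed-form solution: the slope is the sum of cross-deviations divided by the sum of squared X deviations, and the intercept follows from the means. The sketch below applies it in plain Python to the five sample points from question 9; the function names `fit_least_squares` and `sum_squared_residuals` are made up for this illustration.

```python
def fit_least_squares(x, y):
    """Return (slope, intercept) of the line that minimizes the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared X deviations.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def sum_squared_residuals(x, y, slope, intercept):
    """Sum of squared differences between observed and predicted values."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Same sample data as the scikit-learn example in question 9.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_least_squares(x, y)
best = sum_squared_residuals(x, y, slope, intercept)

# Any other line, for example one with a slightly steeper slope, gives a larger total.
worse = sum_squared_residuals(x, y, slope + 0.1, intercept)

print("Slope:", round(slope, 4), "Intercept:", round(intercept, 4))
print("Sum of squared residuals (fitted line):", round(best, 4))
print("Sum of squared residuals (steeper line):", round(worse, 4))
```

The slope and intercept should agree with the values scikit-learn reports for the same data (0.6 and 2.2).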

6. What is Logistic Regression? How does it differ from Linear Regression?
- Logistic Regression is a statistical and machine learning algorithm used for classification problems, where the output variable is categorical (usually binary: 0 or 1, Yes/No, True/False).

| Feature       | Linear Regression        | Logistic Regression         |
| ------------- | ------------------------ | --------------------------- |
| Purpose       | Predict numerical values | Predict categories          |
| Output        | Continuous number        | Probability (0–1)           |
| Graph Shape   | Straight line            | S-shaped curve              |
| Equation Type | Linear equation          | Logistic (sigmoid) function |
| Example       | Predict salary           | Predict pass/fail           |



- Linear Regression: Predict a student’s marks from study hours.
- Logistic Regression: Predict whether a student will pass or fail.

- Linear regression answers → “How much?”
- Logistic regression answers → “Which category?”
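
To make the difference concrete, the sketch below shows how logistic regression turns a linear score into a probability with the sigmoid function and then into a category with a threshold. The coefficients `b0` and `b1` are hypothetical values chosen for illustration, not fitted from any data, and the 0.5 threshold is a common default rather than a requirement.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for "pass probability as a function of study hours".
b0, b1 = -4.0, 1.0


def pass_probability(hours):
    # The linear part (b0 + b1 * hours) is the same form as linear regression;
    # the sigmoid squeezes it into a probability.
    return sigmoid(b0 + b1 * hours)


def predict_pass(hours, threshold=0.5):
    # Classification step: compare the probability with a threshold.
    return pass_probability(hours) >= threshold


for hours in (1, 4, 8):
    print(hours, "hours ->", round(pass_probability(hours), 3), predict_pass(hours))
```

A linear regression model would stop at the linear score and report it as the predicted mark; logistic regression adds the sigmoid so the output can be read as a probability of the "pass" category.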

7. Name and briefly describe three common evaluation metrics for regression models.

Mean Absolute Error (MAE)
- Measures the average absolute difference between actual and predicted values.
- $\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$


Mean Squared Error (MSE)
- Calculates the average of squared errors.
- $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

R-squared (Coefficient of Determination)
- Shows how well the model explains variation in the data.
- $R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$

- MAE → average error size
- MSE → emphasizes large errors
- R² → goodness of fit
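
As a rough illustration, the three metrics can be computed by hand in a few lines. The sketch below uses the sample targets from question 9 together with the predictions of the fitted line ŷ = 2.2 + 0.6X; the helper names `mae`, `mse`, and `r_squared` are made up for this example (scikit-learn provides equivalents in `sklearn.metrics`).

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared errors."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - y_mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


# Observed values from question 9 and predictions from the line 2.2 + 0.6 * X.
y_true = [2, 4, 5, 4, 5]
y_pred = [2.2 + 0.6 * x for x in [1, 2, 3, 4, 5]]

print("MAE:", round(mae(y_true, y_pred), 4))
print("MSE:", round(mse(y_true, y_pred), 4))
print("R^2:", round(r_squared(y_true, y_pred), 4))
```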

8. What is the purpose of the R-squared metric in regression analysis?
- R-squared (R²), also called the coefficient of determination, measures how well a regression model explains the variability of the dependent variable.
- R² = 1 − (Residual Sum of Squares / Total Sum of Squares)
- The purpose of R² is to evaluate how well the regression line fits the data.
- R² = 0 → Model explains none of the variability
- R² = 1 → Model explains all variability
- Higher R² → Better explanatory power
- R² tells you what proportion of the variation in Y is explained by X.
- If R² = 0.80, then 80% of the variation in Y is explained by the model, and 20% is unexplained (random factors or missing variables).
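
The two reference points (R² = 0 and R² = 1) can be checked directly. The sketch below is only an illustration on the sample targets from question 9: predicting the mean of Y for every point explains none of the variability, while predicting every point exactly explains all of it. The helper name `r_squared` is made up for this example.

```python
def r_squared(y_true, y_pred):
    """R-squared = 1 - (residual sum of squares / total sum of squares)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - y_mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


y = [2, 4, 5, 4, 5]
y_mean = sum(y) / len(y)

# Baseline that ignores X and always predicts the mean of Y.
r2_mean_only = r_squared(y, [y_mean] * len(y))

# Hypothetical perfect model that reproduces every observation.
r2_perfect = r_squared(y, y)

# The fitted line from question 9 (intercept 2.2, slope 0.6) lands in between.
r2_line = r_squared(y, [2.2 + 0.6 * x for x in [1, 2, 3, 4, 5]])

print("Mean-only baseline:", r2_mean_only)
print("Perfect predictions:", r2_perfect)
print("Fitted line:", round(r2_line, 4))
```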

In [1]:
# 9. Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.



# Simple Linear Regression using scikit-learn

import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data (X = independent variable, y = dependent variable)
X = np.array([[1],[2],[3],[4],[5]])
y = np.array([2,4,5,4,5])

# Create and train model
model = LinearRegression()
model.fit(X, y)

# Print slope and intercept
print("Slope (coefficient):", model.coef_[0])
print("Intercept:", model.intercept_)

Slope (coefficient): 0.6
Intercept: 2.2


10. How do you interpret the coefficients in a simple linear regression model?
- In a Simple Linear Regression model, the equation is:
  - Ŷ = β₀ + β₁X
  - The coefficients β₀ and β₁ describe how the independent variable affects the predicted value of the dependent variable.
  - Intercept (β₀) :
    - Represents the predicted value of Y when X = 0
    - It is the point where the regression line crosses the Y-axis
    - Sometimes meaningful, sometimes not (depends on whether X = 0 makes sense in context)
    - Example: If β₀ = 5 → predicted Y is 5 when X = 0.
  - Slope (β₁):
    - Shows the change in predicted Y for a one-unit increase in X
    - Indicates the direction of the relationship and the size of the change (the strength of the relationship is measured by the correlation or R², not by the slope alone):
      - Positive slope → Y increases as X increases
      - Negative slope → Y decreases as X increases
      - Zero slope → No linear relationship
    - Example: If β₁ = 2 → every 1-unit increase in X increases the predicted Y by 2 units.

- Simple Real-Life Example
  - Salary = 20000 + 3000×(Years of Experience)
  - Intercept (20000): Predicted starting salary with 0 years of experience
  - Slope (3000): Each extra year of experience increases the predicted salary by 3000
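
The salary equation above can be written as a small function to see how the two coefficients act on a prediction. This is a minimal sketch; the names `INTERCEPT`, `SLOPE`, and `predict_salary` are made up for this illustration.

```python
INTERCEPT = 20000  # predicted salary at 0 years of experience
SLOPE = 3000       # predicted salary increase per additional year


def predict_salary(years_of_experience):
    """Predicted salary from the example equation: 20000 + 3000 * years."""
    return INTERCEPT + SLOPE * years_of_experience


# The intercept is the prediction at X = 0.
print(predict_salary(0))  # 20000

# Moving X up by one unit changes the prediction by exactly the slope.
print(predict_salary(6) - predict_salary(5))  # 3000

print(predict_salary(5))  # 35000
```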