# Supervised Learning: Regression Models and Performance Metrics


1. What is Simple Linear Regression (SLR)? Explain its purpose.
 - Simple Linear Regression estimates how the dependent variable (Y) changes when the independent variable (X) changes.
   The relationship is represented by the equation:
   $Y = a + bX + e$

   Where:

    - Y = Dependent variable (the one we want to predict)

    - X = Independent variable (the predictor)

    - a = Intercept (value of Y when X = 0)

    - b = Slope (change in Y for a one-unit change in X)

    - e = Error term (difference between actual and predicted values)

  - Purpose of Simple Linear Regression

    - Prediction:
To predict the value of one variable (Y) based on the value of another (X).
Example: Predicting a student's exam score (Y) based on hours studied (X).

    - Understanding Relationships:
To understand whether and how strongly two variables are related.
Example: Examining if advertising expenditure (X) affects sales (Y).

    - Trend Analysis:
To identify and describe trends or patterns in data over time.

    - Quantifying Effect:
To measure how much change in Y is associated with a one-unit change in X.



2. What are the key assumptions of Simple Linear Regression?
  - 1. Linearity

     There should be a linear relationship between the independent variable (X) and the dependent variable (Y).
     The change in Y is proportional to the change in X.
     Nonlinear relationships violate this assumption.

     Example: If each additional hour of study raises marks by roughly the same amount, the relationship is linear.

  - 2. Independence of Errors

     The residuals (errors), i.e. the differences between actual and predicted values, must be independent of each other.
     This means one observation's error should not depend on another's.

     Example: In time-series data, the error on one day should not be correlated with the error on the next day.

  - 3. Homoscedasticity (Constant Variance)
     The variance of the residuals should be constant across all values of X, so the errors have an equal spread throughout the data.
     If the spread of the errors increases or decreases as X increases, this is called heteroscedasticity.

     Example: The spread of exam score errors should be similar for all study hours.

  - 4. Normality of Errors
     The residuals should be normally distributed (bell-shaped curve).
     Important for hypothesis testing and confidence intervals.
     You can check this by plotting a histogram or Q-Q plot of residuals.

  - 5. No or Minimal Multicollinearity
     SLR has only one independent variable, so multicollinearity cannot arise within the model itself. This assumption applies to multiple regression, where the predictors should not be highly correlated with one another.

  - 6. No Measurement Error in X
     The independent variable (X) should be measured accurately, because errors in X can bias the results.
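
Some of these assumptions can be checked numerically from the residuals. The sketch below is only one illustrative way to do so: the data are invented, the line is fitted with `np.polyfit`, and the two checks shown (a Durbin-Watson statistic for independence, and a comparison of residual spread between the low-X and high-X halves for homoscedasticity) are rough diagnostics whose interpretation is a matter of judgment.

```python
import numpy as np

# Hypothetical data: hours studied (X) and marks (Y), invented for illustration
X = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
Y = np.array([45, 52, 58, 66, 70, 79, 83, 91], dtype=float)

# Fit Y = a + bX by least squares; for degree 1, polyfit returns [slope, intercept]
b, a = np.polyfit(X, Y, 1)
residuals = Y - (a + b * X)

# With an intercept in the model, least-squares residuals average to about zero
print("Mean residual:", residuals.mean())

# Independence: Durbin-Watson statistic (ranges from 0 to 4;
# values near 2 suggest little first-order autocorrelation)
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print("Durbin-Watson:", dw)

# Homoscedasticity (rough check): compare residual spread in the two halves of X
half = len(X) // 2
print("Residual std, low X: ", residuals[:half].std())
print("Residual std, high X:", residuals[half:].std())
```

Normality of the residuals is usually inspected visually, for example with the histogram or Q-Q plot mentioned above.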

3. Write the mathematical equation for a simple linear regression model and
explain each term.
 - Mathematical Equation:
   $Y = a + bX + e$

 - Interpretation Example:
   Suppose the regression equation is:
   $Y = 40 + 6.25X$

    - a = 40: When a student studies 0 hours, the predicted marks are 40.

    - b = 6.25: For every 1 extra hour of study, predicted marks increase by 6.25 points.

    - e: Accounts for random factors (like exam difficulty or the student's health) that affect marks but are not related to hours studied.

 - Graphically
   
   It is represented as a straight line on a scatter plot:
   Predicted $Y = a + bX$

   - The slope (b) shows how steeply Y changes with X.

   - The intercept (a) shows where the line crosses the Y-axis (at X = 0).
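
The interpretation of the example line can be verified with a few lines of Python. This is a minimal sketch; the function name `predict_marks` is made up for illustration, and the coefficients are the ones from the example equation above.

```python
def predict_marks(hours, a=40.0, b=6.25):
    """Predicted marks from the example line Y = a + bX."""
    return a + b * hours

# Intercept: predicted marks at 0 hours of study
print(predict_marks(0))                      # 40.0

# A prediction for 4 hours of study: 40 + 6.25 * 4
print(predict_marks(4))                      # 65.0

# Slope: the change in prediction for one extra hour
print(predict_marks(5) - predict_marks(4))   # 6.25
```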



4. Provide a real-world example where simple linear regression can be
applied.
 - Example: Predicting House Prices

   Simple Linear Regression can be used to predict the price of a house (Y) based on its size in square feet (X).

     Equation: $\text{House Price} = a + b \times \text{Size}$

 - Example:
   
   If the model is: $\text{Price} = 50{,}000 + 2000 \times \text{Size}$

   Then for a house of 1,000 sq. ft,

   $\text{Predicted Price} = 50{,}000 + 2000(1000) = 2{,}050{,}000$

 - Purpose: To estimate or predict house prices using their size as a single predictor.
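
The same calculation can be applied to several houses at once. The sketch below uses the illustrative coefficients from the example (they are not estimated from real data), and the names `INTERCEPT`, `PRICE_PER_SQFT`, and `predict_price` are hypothetical.

```python
import numpy as np

# Coefficients taken from the worked example above (illustrative only)
INTERCEPT = 50_000
PRICE_PER_SQFT = 2_000

def predict_price(size_sqft):
    """Predicted price for one size or an array of sizes, in square feet."""
    return INTERCEPT + PRICE_PER_SQFT * np.asarray(size_sqft)

sizes = [800, 1000, 1500]
for size, price in zip(sizes, predict_price(sizes)):
    print(f"{size} sq. ft -> {price:,}")
```

The 1,000 sq. ft house reproduces the hand calculation of 2,050,000.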



5. What is the method of least squares in linear regression?
 - The method of least squares chooses the line for which the total squared difference between the observed values and the values predicted by the line is the smallest.

   Minimize $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

 - where
   
   - $Y_i$ = actual value

   - $\hat{Y}_i = a + bX_i$ = predicted value

 - Example:
   
   Suppose you have data of hours studied (X) and marks (Y).
   The least squares method finds the line Y=a+bX that gives the minimum total squared error between the actual marks and the predicted marks.
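
For a single predictor, the minimizing values have a well-known closed form: $b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$ and $a = \bar{Y} - b\bar{X}$. The sketch below applies these formulas with NumPy to the small dataset from the scikit-learn example in question 9, then compares the squared error against an arbitrarily chosen alternative line.

```python
import numpy as np

# Same small dataset as the scikit-learn example in question 9
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2, 4, 5, 4, 5], dtype=float)

x_mean, y_mean = X.mean(), Y.mean()

# Closed-form least-squares estimates for one predictor
b = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)
a = y_mean - b * x_mean

def sum_squared_error(intercept, slope):
    """Total squared difference between actual and predicted Y."""
    return np.sum((Y - (intercept + slope * X)) ** 2)

sse_best = sum_squared_error(a, b)
sse_other = sum_squared_error(2.0, 0.7)  # an arbitrary alternative line

print(f"slope = {b:.2f}, intercept = {a:.2f}")   # slope = 0.60, intercept = 2.20
print(f"SSE of least-squares line: {sse_best:.2f}")
print(f"SSE of alternative line:   {sse_other:.2f}")
```

Any other line, such as the alternative tried here, gives a larger total squared error than the least-squares line.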

6. What is Logistic Regression? How does it differ from Linear Regression?
 - Logistic Regression is a statistical method used for classification problems, where the dependent variable is categorical (e.g., Yes/No, 0/1, Pass/Fail).
It predicts the probability that an observation belongs to a particular class using the logistic (sigmoid) function.

 - Equation:
   
    $P(Y=1) = \frac{1}{1 + e^{-(a + bX)}}$

    - This ensures predicted values are between 0 and 1 (probabilities).

 - Example:

    - Linear Regression: Predicts a continuous value, such as the exact marks scored.

    - Logistic Regression: Predicts the probability of passing the exam (e.g., 0.85 → likely to pass), which is then mapped to a class.
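
The logistic function can be sketched directly. The coefficients `a = -4.0` and `b = 1.0` below are hypothetical values chosen for illustration, not fitted from data, and the 0.5 cutoff is a common convention rather than a requirement.

```python
import math

def pass_probability(hours, a=-4.0, b=1.0):
    """P(Y=1) = 1 / (1 + e^-(a + bX)); a and b are hypothetical coefficients."""
    return 1.0 / (1.0 + math.exp(-(a + b * hours)))

for hours in (2, 4, 6):
    p = pass_probability(hours)
    label = "pass" if p >= 0.5 else "fail"  # 0.5 threshold is a convention
    print(f"{hours} hours -> P(pass) = {p:.3f} -> predicted class: {label}")
```

Whatever the value of X, the output stays strictly between 0 and 1, which is what lets it be read as a probability.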

7. Name and briefly describe three common evaluation metrics for regression
models.
 - 1. Mean Absolute Error (MAE)
     
     $\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat{Y}_i|$

      - Measures the average absolute difference between actual and predicted values.

      - Interpretation: Lower MAE → better model accuracy.

      - Example: If MAE = 5, predictions are off by about 5 units on average.

 - 2. Mean Squared Error (MSE)
       
       $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

       - Measures the average squared difference between actual and predicted values.

       - Penalizes larger errors more heavily than MAE.

       - Lower MSE means better performance.

 - 3. R-squared (Coefficient of Determination)
       
      $R^2 = 1 - \frac{SSR}{SST}$, where SSR is the sum of squared residuals and SST is the total sum of squares.

      - Shows how much of the variation in Y is explained by the model.

       - Values typically range from 0 to 1.

          1 → perfect fit

           0 → no explanatory power
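
The three metrics can be computed by hand with NumPy. The sketch below reuses the dataset from the scikit-learn example in question 9 together with the line reported in its sample output (intercept 2.2, slope 0.6).

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
y_true = np.array([2, 4, 5, 4, 5], dtype=float)

# Predictions from the fitted line reported in question 9
y_pred = 2.2 + 0.6 * X

errors = y_true - y_pred

mae = np.mean(np.abs(errors))                     # Mean Absolute Error
mse = np.mean(errors ** 2)                        # Mean Squared Error
ss_res = np.sum(errors ** 2)                      # sum of squared residuals
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2 = 1 - ss_res / ss_tot                          # R-squared

print(f"MAE: {mae:.2f}")   # MAE: 0.64
print(f"MSE: {mse:.2f}")   # MSE: 0.48
print(f"R^2: {r2:.2f}")    # R^2: 0.60
```

scikit-learn provides the same calculations as `mean_absolute_error`, `mean_squared_error`, and `r2_score` in `sklearn.metrics`.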



8. What is the purpose of the R-squared metric in regression analysis?
  - Assess Model Fit:

     - Indicates how well the regression line represents the data.

     - Higher R² → model explains more of the variation in Y.

  - Compare Models:

     - Helps compare different models predicting the same dependent variable.

     - The model with a higher R² generally fits the data better.

  - Interpret Explained Variability:

     - R² = 0.8 → 80% of the variability in Y is explained by X, 20% is unexplained.

  - Example:
     If you predict house prices (Y) based on size (X) and get R² = 0.75, it means 75% of the variation in house prices is explained by house size.

9. Write Python code to fit a simple linear regression model using scikit-learn
and print the slope and intercept.
(Include your Python code and output in the code box below.)


```python
# Import required libraries
from sklearn.linear_model import LinearRegression
import numpy as np

# Sample data
# X = independent variable (reshaped into 2D array for sklearn)
# y = dependent variable
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

# Create and fit the model
model = LinearRegression()
model.fit(X, y)

# Print slope (coefficient) and intercept
print("Slope (Coefficient):", model.coef_[0])
print("Intercept:", model.intercept_)
```

Sample Output:

```
Slope (Coefficient): 0.6
Intercept: 2.2
```

10. How do you interpret the coefficients in a simple linear regression model?
  - In a simple linear regression model, the equation is:
    
    $y = b_0 + b_1 x$

  - where:
     
     - y = predicted (dependent) variable

     - x = independent variable

     - b0 = intercept

     - b1 = slope (coefficient)

  - Interpretation:
     
     - Intercept (b0):

       The predicted value of y when x = 0

       It represents the point where the regression line crosses the y-axis

     - Slope (b1):

       The amount by which y changes when x increases by one unit.

       If b1 is positive, y increases as x increases;

       If b1 is negative, y decreases as x increases.
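
As a small illustration of reading coefficients off a fitted line, the sketch below fits two tiny datasets with `np.polyfit` and turns the result into a sentence. The helper `describe_fit` is hypothetical; the first dataset is the one from question 9 and the second is invented to show a negative slope.

```python
import numpy as np

def describe_fit(x, y):
    """Fit y = b0 + b1*x by least squares and describe the coefficients in words."""
    b1, b0 = np.polyfit(x, y, 1)  # for degree 1, returns [slope, intercept]
    if b1 > 0:
        direction = "increases"
    elif b1 < 0:
        direction = "decreases"
    else:
        direction = "does not change"
    text = (f"y {direction} by {abs(b1):.2f} per one-unit increase in x; "
            f"predicted y at x = 0 is {b0:.2f}")
    return b0, b1, text

# Positive slope: dataset from question 9
print(describe_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])[2])

# Negative slope: invented data lying exactly on y = 10 - 2x
print(describe_fit([0, 1, 2, 3], [10, 8, 6, 4])[2])
```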
     