#**Supervised Learning: Regression Models and Performance Metrics**

#**Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.**

#**Answer:**

Simple Linear Regression (SLR) is a supervised learning regression technique used to study the relationship between one independent variable (X) and one dependent variable (Y). It assumes that this relationship can be represented by a straight line.

**The main purpose of Simple Linear Regression is to:**

- Understand how the dependent variable changes with respect to the independent variable

- Predict the value of the dependent variable for a given input

- Identify the strength and direction of the relationship between two variables

Simple Linear Regression is easy to understand and interpret, which makes it widely used in data analysis, statistics, and machine learning applications such as predicting marks, sales, or prices.

#**Question 2: What are the key assumptions of Simple Linear Regression?**

#**Answer:**

Simple Linear Regression works correctly only when certain assumptions are satisfied:

1. Linearity

   There must be a linear relationship between the independent and dependent variables.

2. Independence of Errors

   The residuals (errors) should be independent of each other.

3. Homoscedasticity

   The variance of the errors should remain constant for all values of X.

4. Normality of Errors

   The error terms should be normally distributed. This matters mainly for confidence intervals and hypothesis tests on the coefficients.

5. No Influential Outliers

   Extreme outliers should not strongly influence the regression line.

If these assumptions are violated, the predictions made by the model may not be reliable.
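In practice, these assumptions are usually checked on the residuals after fitting. The sketch below uses NumPy and simulated data (the generating values 1.0, 0.5 and the noise level are assumptions chosen for illustration). It computes three informal diagnostics: the residual mean; the Durbin-Watson statistic for independence, which is meaningful only when rows have a natural order such as time; and the ratio of residual spread between the high-X and low-X halves as a rough homoscedasticity check. These are quick heuristics, not formal tests.

```python
import numpy as np

# Simulated data (assumed for illustration): y = 1 + 0.5x + noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 0.5 * x + rng.normal(0, 1, size=100)

# Fit a straight line; for degree 1, np.polyfit returns [slope, intercept]
b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

# With an intercept, least-squares residuals average to (numerically) zero
residual_mean = residuals.mean()

# Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Rough homoscedasticity check: compare residual spread for low and high x
order = np.argsort(x)
low_half, high_half = np.array_split(residuals[order], 2)
spread_ratio = high_half.std() / low_half.std()

print(f"Residual mean: {residual_mean:.2e}")
print(f"Durbin-Watson: {dw:.2f}")
print(f"Spread ratio (high x / low x): {spread_ratio:.2f}")
```

A spread ratio far from 1 hints at heteroscedasticity. A histogram or Q-Q plot of `residuals` is the usual visual check for normality.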

#**Question 3: Write the mathematical equation for a simple linear regression model and explain each term.**

#**Answer:**

The mathematical equation of Simple Linear Regression is:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

Explanation of each term:

- Y → Dependent variable (output)

- X → Independent variable (input)

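- β₀ (Intercept) → Expected value of Y when X is equal to 0

- β₁ (Slope) → Change in Y for a one-unit change in X

- ε (Error term) → Random variation in Y that the line does not explain; after fitting, it is estimated by the residual (actual value minus predicted value)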

This equation represents a straight-line relationship between X and Y.
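To see what each term contributes, the sketch below simulates data from the equation with assumed values β₀ = 3 and β₁ = 2 and a normally distributed ε. It then estimates the coefficients back from the noisy sample with NumPy. The estimates land close to, but not exactly on, the true values because ε adds random scatter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" parameters for the simulation
beta0, beta1 = 3.0, 2.0

X = rng.uniform(0, 10, size=200)   # independent variable
eps = rng.normal(0, 1, size=200)   # error term ε
Y = beta0 + beta1 * X + eps        # Y = β0 + β1·X + ε

# Estimate slope and intercept from the sample (degree-1 polynomial fit)
b1_hat, b0_hat = np.polyfit(X, Y, 1)
print(f"Estimated intercept: {b0_hat:.3f}, estimated slope: {b1_hat:.3f}")
```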

#**Question 4: Provide a real-world example where simple linear regression can be applied.**

#**Answer:**

A common real-world example of Simple Linear Regression is predicting exam scores based on study hours.

- Independent variable (X): Number of hours studied

- Dependent variable (Y): Exam score

Using past data, a regression model can predict how many marks a student may score if they study for a certain number of hours. This helps in understanding the impact of study time on performance.
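Here is a minimal sketch of this example using NumPy. The five data points are made up for illustration and do not come from a real class.

```python
import numpy as np

# Hypothetical data: hours studied and the exam score obtained
hours = np.array([1, 2, 3, 4, 5])
scores = np.array([52, 58, 65, 70, 76])

# Fit score = intercept + slope * hours by least squares
slope, intercept = np.polyfit(hours, scores, 1)

# Predict the score for a student who studies 6 hours
predicted = intercept + slope * 6
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted score for 6 h={predicted:.1f}")
# slope=6.00, intercept=46.20, predicted score for 6 h=82.2
```

Six hours lies just outside the observed range of 1 to 5 hours, so this prediction is a mild extrapolation. Predictions far outside the data range are less trustworthy.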

#**Question 5: What is the method of least squares in linear regression?**

#**Answer:**

The method of least squares is a mathematical approach used to find the best-fitting regression line. It works by minimizing the sum of the squares of the errors (residuals) between the actual values and the predicted values.

In simple words:

- It finds the line for which the sum of squared differences between actual and predicted values is as small as possible.

- Squaring the errors ensures that positive and negative errors do not cancel each other.

This method helps in finding the optimal values of slope and intercept for the regression model.
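For one input variable, minimizing the sum of squared errors has a closed-form solution. The slope is β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)², and the intercept is β̂₀ = ȳ − β̂₁x̄. The plain-Python sketch below implements these formulas on hypothetical data points. It then confirms that nudging either coefficient away from the fitted values increases the sum of squared errors.

```python
def least_squares_fit(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared differences between actual and predicted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data points
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_fit(xs, ys)   # approximately 0.05 and 1.99
best_sse = sum_squared_errors(xs, ys, b0, b1)
print(b0, b1, best_sse)

# Moving either coefficient away from the least-squares values increases the SSE
print(sum_squared_errors(xs, ys, b0 + 0.1, b1) > best_sse)  # True
print(sum_squared_errors(xs, ys, b0, b1 + 0.1) > best_sse)  # True
```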

#**Question 6: What is Logistic Regression? How does it differ from Linear Regression?**

#**Answer:**


Logistic Regression is a supervised learning algorithm used mainly for classification problems. Its dependent variable is categorical, most commonly binary, such as Yes/No or 0/1. Instead of predicting an exact numerical value, Logistic Regression predicts the probability that an input belongs to a particular class. It passes a linear combination of the inputs through the sigmoid (logistic) function, which maps any real number to a value between 0 and 1. Logistic Regression is widely applied in areas such as medical diagnosis, spam detection, and fraud analysis.

**Difference between Logistic Regression and Linear Regression:**

- Linear Regression is used to predict continuous numerical values, whereas Logistic Regression is used to predict categorical outcomes.

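- Linear Regression models the output directly as a linear function of the inputs. Logistic Regression instead models the log-odds of the outcome as a linear function, then applies the non-linear sigmoid function to obtain a probability.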

- The output of Linear Regression can take any real value, whereas the output of Logistic Regression is always between 0 and 1.

- Linear Regression is mainly used for regression problems, while Logistic Regression is designed for classification problems.

- Linear Regression uses the least squares method for training, whereas Logistic Regression uses maximum likelihood estimation.
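The sigmoid mapping and maximum likelihood training can be sketched in plain Python. The sketch below fits b0 and b1 for a single input by gradient ascent on the log-likelihood. The learning rate, iteration count, and tiny dataset are assumptions chosen for illustration. Libraries such as scikit-learn use more robust solvers and, by default, also apply regularization.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.05, n_iter=20000):
    """Estimate (b0, b1) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)   # predicted probability of class 1
            g0 += y - p                # d(log-likelihood)/d(b0)
            g1 += (y - p) * x          # d(log-likelihood)/d(b1)
        b0 += lr * g0                  # step uphill to increase the likelihood
        b1 += lr * g1
    return b0, b1


# Hypothetical binary data: 0/1 outcome versus one numeric feature
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(xs, ys)
for x in xs:
    p = sigmoid(b0 + b1 * x)
    print(f"x={x}: P(y=1)={p:.3f}, predicted class={int(p >= 0.5)}")
```

The classes in this data overlap (x = 3 is labeled 1 while x = 4 is labeled 0), so the likelihood has a finite maximum. On perfectly separable data, this plain version would keep growing the coefficients without converging.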

#**Question 7: Name and briefly describe three common evaluation metrics for regression models.**

#**Answer:**

To evaluate the performance of regression models, several standard evaluation metrics are used. Three commonly used metrics are explained below:

- **Mean Absolute Error (MAE):**

   Mean Absolute Error calculates the average of the absolute differences between the actual values and the predicted values. It is easy to understand and expresses the error in the same unit as the dependent variable.

- **Mean Squared Error (MSE):**

   Mean Squared Error computes the average of the squared differences between actual and predicted values. Squaring gives more weight to larger errors, making MSE useful when large errors need to be penalized. Its value is expressed in squared units of the dependent variable.

- **Root Mean Squared Error (RMSE):**

   Root Mean Squared Error is the square root of MSE. It represents the prediction error in the same unit as the output variable, which makes interpretation easier.

These metrics help in comparing different regression models and assessing their accuracy.
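All three formulas translate directly into code. Below is a minimal plain-Python sketch using made-up values. scikit-learn also provides `mean_absolute_error` and `mean_squared_error` in `sklearn.metrics`.

```python
import math


def regression_metrics(actual, predicted):
    """Return (MAE, MSE, RMSE) for two equal-length sequences."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    return mae, mse, rmse


# Hypothetical actual and predicted values
actual = [3, 5, 7]
predicted = [2, 5, 9]

mae, mse, rmse = regression_metrics(actual, predicted)
print(f"MAE={mae:.3f}, MSE={mse:.3f}, RMSE={rmse:.3f}")
# MAE=1.000, MSE=1.667, RMSE=1.291
```

Here the single error of 2 contributes 4 of the 5 units of total squared error. This is why MSE and RMSE exceed MAE: squaring magnifies the largest miss.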

#**Question 8: What is the purpose of the R-squared metric in regression analysis?**

#**Answer:**

R-squared (R²) is a statistical metric used to measure how well a regression model explains the variation in the dependent variable.

- R-squared shows the proportion of variance in the dependent variable that is explained by the independent variable(s).

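- For a least-squares model with an intercept, evaluated on the data it was fitted to, the value ranges from 0 to 1. On new data it can be negative, which means the model predicts worse than simply using the mean of the dependent variable.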

- A higher R-squared value indicates a better fit of the regression model.

- For example, an R² value of 0.80 means that 80% of the variation in the dependent variable is explained by the model.

- R-squared helps in evaluating and comparing regression models, but it does not indicate causation.
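R² compares the model's squared errors with those of a baseline that always predicts the mean of Y: R² = 1 − SS_res / SS_tot. The plain-Python sketch below uses made-up values. The second call shows that a model performing worse than this mean baseline receives a negative score.

```python
def r_squared(actual, predicted):
    """R² = 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total variation around the mean
    return 1 - ss_res / ss_tot


actual = [3, 5, 7]
print(r_squared(actual, [2, 5, 9]))  # 0.375
print(r_squared(actual, [7, 5, 3]))  # -3.0: worse than predicting the mean
```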

#**Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.**

#**Answer:**

```python
# Import required libraries
import numpy as np
from sklearn.linear_model import LinearRegression

# Create sample input data (independent variable)
X = np.array([[1], [2], [3], [4], [5]])

# Create output data (dependent variable)
Y = np.array([2, 4, 6, 8, 10])

# Create a Linear Regression model
model = LinearRegression()

# Train the model using the data
model.fit(X, Y)

# Display the slope and intercept
print("Slope of the regression line:", model.coef_[0])
print("Intercept of the regression line:", model.intercept_)
```

Output:

```
Slope of the regression line: 2.0
Intercept of the regression line: 0.0
```


#**Question 10: How do you interpret the coefficients in a simple linear regression model?**

#**Answer:**

In a simple linear regression model, the coefficients explain the relationship between the independent and dependent variables.

- **Intercept (β₀):**

  It represents the expected value of the dependent variable when the independent variable is zero.

- **Slope (β₁):**

  It indicates how much the dependent variable changes for a one-unit increase in the independent variable.

- A positive slope shows a direct relationship, while a negative slope shows an inverse relationship.

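- The sign of the slope gives the direction of the relationship, and its magnitude gives the size of the effect in units of Y per unit of X. Because that magnitude depends on the units chosen, the strength of the relationship is better judged with R² or the correlation coefficient.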
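The size of the slope depends on the units of X, so a large slope does not by itself indicate a strong relationship. In the NumPy sketch below (hypothetical data), recording study time in minutes instead of hours divides the slope by 60. The fitted line, its predictions, and R² all stay the same.

```python
import numpy as np

# Hypothetical data: the same study time recorded in hours and in minutes
hours = np.array([1, 2, 3, 4, 5])
minutes = hours * 60
scores = np.array([52, 58, 65, 70, 76])

slope_h, intercept_h = np.polyfit(hours, scores, 1)
slope_m, intercept_m = np.polyfit(minutes, scores, 1)

# Same fitted line, different slope values: 6 marks per hour = 0.1 marks per minute
print(f"per hour: {slope_h:.2f}, per minute: {slope_m:.2f}")  # per hour: 6.00, per minute: 0.10
print(f"intercepts: {intercept_h:.2f}, {intercept_m:.2f}")    # intercepts: 46.20, 46.20
```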