# Supervised Learning: Regression Models and Performance Metrics | Solution Assignment

Q1.  What is Simple Linear Regression (SLR)? Explain its purpose.

- Simple Linear Regression is a statistical method that models the relationship between two variables by fitting a straight line to observed data. One variable is treated as the explanatory (independent) variable and the other as the dependent (response) variable. The model assumes a linear relationship between the two, represented by the equation:

\[ Y = mX + c \]

where:
- \( Y \) is the dependent variable.
- \( X \) is the independent variable.
- \( m \) is the slope of the line (the change in \( Y \) for a one-unit change in \( X \)).
- \( c \) is the intercept (the value of \( Y \) when \( X \) is zero).

**Purpose of SLR**

- Understanding relationships: It helps determine if a relationship exists between two variables and the strength and direction of that relationship.

- Predicting outcomes: Once the line is established, it can be used to predict the value of the dependent variable for a new value of the independent variable.

- Foundation for advanced methods: Even though it's simple, it serves as a fundamental concept for more complex regression techniques.
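As a rough illustration of both purposes, the sketch below fits \( Y = mX + c \) with NumPy's `np.polyfit` (degree 1), reads off the strength and direction of the relationship from the Pearson correlation returned by `np.corrcoef`, and then uses the fitted line to predict a new value. The five data points are made up for illustration only.

```python
import numpy as np

# Toy data, made up for illustration only
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2, 4, 5, 4, 5], dtype=float)

# Fit Y = mX + c by least squares; for deg=1 polyfit returns [m, c]
m, c = np.polyfit(X, Y, deg=1)

# Strength and direction of the linear relationship (Pearson r)
r = np.corrcoef(X, Y)[0, 1]

# Use the fitted line to predict Y for a new X value
x_new = 6.0
y_new = m * x_new + c

print(f"m = {m:.2f}, c = {c:.2f}, r = {r:.3f}")
print(f"Predicted Y at X = {x_new}: {y_new:.2f}")
```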
-----------------------------------------------------------------

Q2. What are the key assumptions of Simple Linear Regression?  
- **Linearity:** The relationship between the independent and dependent variables should be linear.  
- **Independence:** The residuals (errors) should be independent.  
- **Homoscedasticity:** The residuals should have constant variance at every level of the independent variable.  
- **Normality:** The residuals should be normally distributed.  

**Example:** If the residuals show a pattern or funnel shape, it indicates a violation of homoscedasticity.  
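The residual checks behind these assumptions can be sketched in a few lines. This is a minimal illustration on made-up data, assuming SciPy is installed: it fits the line with `np.polyfit`, then computes the Durbin-Watson statistic for independence (values near 2 suggest no first-order autocorrelation), the correlation between fitted values and absolute residuals as a crude funnel-shape heuristic for homoscedasticity (not a formal test), and the Shapiro-Wilk test (`scipy.stats.shapiro`) for normality. With only five points none of these checks is conclusive; in practice you would also plot the residuals against the fitted values, which is also the usual way to spot non-linearity.

```python
import numpy as np
from scipy import stats

# Toy data, made up for illustration only
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Fit the line and compute residuals (observed minus fitted)
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
residuals = y - fitted

# Independence: Durbin-Watson statistic; values near 2 suggest
# no first-order autocorrelation in the residuals
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Homoscedasticity (crude heuristic, not a formal test): a strong correlation
# between fitted values and |residuals| hints at a funnel shape
spread_corr = np.corrcoef(fitted, np.abs(residuals))[0, 1]

# Normality: Shapiro-Wilk test; a small p-value suggests non-normal residuals
shapiro_stat, shapiro_p = stats.shapiro(residuals)

print("Residuals:", np.round(residuals, 2))
print(f"Durbin-Watson: {dw:.3f}")
print(f"corr(fitted, |residual|): {spread_corr:.3f}")
print(f"Shapiro-Wilk p-value: {shapiro_p:.3f}")
```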

-----------------------------------------------------------------

Q3. Write the mathematical equation for a simple linear regression model and explain each term.
- The simple linear regression model is:

\[ y = \beta_{0} + \beta_{1} x + \epsilon \]

where:
- \( y \) is the dependent variable and \( x \) is the independent variable.
- \(\beta_{0}\) (or \(a\)) is the intercept: the value of \( y \) when \( x \) is 0. On a graph, it is the point where the regression line crosses the y-axis.
- \(\beta_{1}\) (or \(b\)) is the slope, or regression coefficient: the average change in \( y \) for every one-unit change in \( x \). For example, if \(\beta_{1}\) is 5, then for every 1-unit increase in \( x \), \( y \) is predicted to increase by 5 units.
- \(\epsilon\) is the error term: the part of \( y \) that the line does not explain.

The goal of linear regression is to find the values of the coefficients that produce the "line of best fit" for the data.
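To see what the slope means numerically, the sketch below evaluates the fitted line \( \hat{y} = \beta_0 + \beta_1 x \) (the error term is left out because it is not observed) with \(\beta_1 = 5\) from the example above and a hypothetical intercept \(\beta_0 = 10\). The prediction at \( x = 0 \) equals the intercept, and predictions at consecutive values of \( x \) differ by exactly \(\beta_1\).

```python
def predict(x: float, beta_0: float, beta_1: float) -> float:
    """Predicted y for the simple linear regression line y = beta_0 + beta_1 * x."""
    return beta_0 + beta_1 * x

# beta_1 = 5 follows the example in the text; beta_0 = 10 is hypothetical
beta_0, beta_1 = 10.0, 5.0

for x in [0, 1, 2, 3]:
    print(f"x = {x}: predicted y = {predict(x, beta_0, beta_1)}")

# The intercept is the prediction at x = 0
assert predict(0, beta_0, beta_1) == beta_0
# Each one-unit increase in x raises the prediction by beta_1
assert predict(3, beta_0, beta_1) - predict(2, beta_0, beta_1) == beta_1
```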

-----------------------------------------------------------------

Q4. Provide a real-world example where simple linear regression can be applied.
- A real-world example of simple linear regression is predicting a student's exam score based on the number of hours they studied. In this case, "hours studied" is the independent variable and "exam score" is the dependent variable. The model would use data from previous students to find a linear relationship, allowing for the prediction of a future student's score if they study a certain number of hours.
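A minimal scikit-learn sketch of this example follows. The five (hours, score) pairs are invented purely for illustration; a real analysis would use actual records from previous students.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: hours studied and exam scores for five past students
# (made-up numbers for illustration only)
hours = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # independent variable, 2D for scikit-learn
scores = np.array([52, 55, 61, 64, 68])            # dependent variable

model = LinearRegression().fit(hours, scores)

# Predict the score of a future student who studies 6 hours
predicted = model.predict(np.array([[6]]))[0]

print(f"Score increase per extra hour: {model.coef_[0]:.2f}")
print(f"Predicted score for 6 hours: {predicted:.1f}")
```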

-----------------------------------------------------------------

Q5. What is the method of least squares in linear regression?
- The method of least squares in linear regression is a statistical technique used to find the best-fit line through a set of data points. It works by minimizing the sum of the squared differences (residuals) between the observed data values and the values predicted by the line. The goal is to find the line that has the smallest possible sum of squared errors, making it the line that best represents the relationship between the variables.
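For simple linear regression, minimizing the sum of squared residuals has a closed-form solution: the slope is \( \beta_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2 \) and the intercept is \( \beta_0 = \bar{y} - \beta_1 \bar{x} \). The sketch below computes these directly with NumPy on the same sample data used in Q9, then checks that nudging the slope away from the least-squares value increases the sum of squared errors (`sse` is a helper written here for illustration).

```python
import numpy as np

# Same sample data as the scikit-learn example in Q9
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

def sse(b0: float, b1: float) -> float:
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    residuals = y - (b0 + b1 * x)
    return float(np.sum(residuals ** 2))

# Closed-form least-squares estimates
x_mean, y_mean = x.mean(), y.mean()
beta_1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
beta_0 = y_mean - beta_1 * x_mean

print(f"Intercept: {beta_0:.2f}, Slope: {beta_1:.2f}")
print(f"SSE at least-squares fit: {sse(beta_0, beta_1):.2f}")

# Any other line gives a larger SSE, e.g. a slightly steeper or flatter slope
print(f"SSE with slope + 0.1: {sse(beta_0, beta_1 + 0.1):.2f}")
print(f"SSE with slope - 0.1: {sse(beta_0, beta_1 - 0.1):.2f}")
```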


-----------------------------------------------------------------

Q6. What is Logistic Regression? How does it differ from Linear Regression?
- Logistic regression predicts categorical outcomes (such as yes/no) by passing a linear combination of the inputs through the logistic (sigmoid) function, which produces a probability between 0 and 1.
- Linear regression predicts continuous outcomes (such as price or temperature) with a value that can range from negative to positive infinity.
- The key difference lies in their goal: logistic regression is used for classification, while linear regression is used for regression problems.
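A small scikit-learn sketch of the contrast, on made-up pass/fail data (1 = pass) and default model settings: `LogisticRegression.predict_proba` returns probabilities between 0 and 1 and `predict` returns class labels, whereas `LinearRegression` fitted to the same 0/1 labels produces unbounded values, such as a negative output at \( x = 0 \) and one well above 1 at \( x = 20 \).

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up data: hours studied vs. pass (1) / fail (0)
X = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1])

log_reg = LogisticRegression().fit(X, y)
lin_reg = LinearRegression().fit(X, y)

X_new = np.array([[0], [3.5], [20]])

# Logistic regression: probability of class 1, always within [0, 1]
proba = log_reg.predict_proba(X_new)[:, 1]
# ...and a discrete class label (0 or 1)
labels = log_reg.predict(X_new)
# Linear regression: a continuous value with no bounds
lin_pred = lin_reg.predict(X_new)

for x, p, lab, lp in zip(X_new.ravel(), proba, labels, lin_pred):
    print(f"x = {x:>4}: P(pass) = {p:.3f}, class = {lab}, linear output = {lp:.2f}")
```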


-----------------------------------------------------------------

Q7. Name and briefly describe three common evaluation metrics for regression models.
- Three common regression metrics are Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (\(R^{2}\)). MAE measures the average absolute difference between predicted and actual values, MSE calculates the average squared difference, and \(R^{2}\) represents the proportion of variance in the dependent variable that is explained by the model's independent variables.

**1. Mean Absolute Error (MAE)**

The average of the absolute differences between the predicted values and the actual values.
- What it shows: the average magnitude of the errors in a set of predictions, without considering their direction.
- Best for: giving an easy-to-understand measure of the typical error size, since it is in the same units as the original data.

**2. Mean Squared Error (MSE)**

The average of the squared differences between the predicted values and the actual values.
- What it shows: larger errors are penalized more heavily than smaller ones because the errors are squared (so MSE is in squared units of the data).
- Best for: highlighting and penalizing large errors, which is useful when large mistakes are particularly undesirable.

**3. R-squared (\(R^{2}\))**

The proportion of the variance in the dependent variable that is explained by the model's independent variables.
- What it shows: how much better the model fits than simply predicting the mean of the dependent variable for every observation.
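The three metrics can be computed with `sklearn.metrics` and checked against their definitions. The sketch below reuses the sample data from Q9 with predictions from its fitted line \( \hat{y} = 2.2 + 0.6x \).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Actual values and predictions from the line y_hat = 2.2 + 0.6 * x (Q9 sample data)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y_true = np.array([2, 4, 5, 4, 5], dtype=float)
y_pred = 2.2 + 0.6 * x

errors = y_true - y_pred

# MAE: mean of absolute errors
mae_manual = np.mean(np.abs(errors))
# MSE: mean of squared errors
mse_manual = np.mean(errors ** 2)
# R^2: 1 - (residual sum of squares / total sum of squares around the mean)
r2_manual = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

print(f"MAE = {mae:.2f} (manual {mae_manual:.2f})")
print(f"MSE = {mse:.2f} (manual {mse_manual:.2f})")
print(f"R^2 = {r2:.2f} (manual {r2_manual:.2f})")
```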
  
-----------------------------------------------------------------

Q8. What is the purpose of the R-squared metric in regression analysis?
- R-squared, or the coefficient of determination, indicates the proportion of the variance in the dependent variable that is predictable from the independent variables in a regression model.

**Key purposes of R-squared**
- Measures goodness of fit:  It quantifies how closely the data points cluster around the fitted regression line or curve.
- Indicates explanatory power: It represents the percentage of variation in the dependent variable that is explained by the independent variables in the model.
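- Provides a scale: For a least-squares fit with an intercept, evaluated on the data it was trained on, R-squared lies between \(0\) and \(1\) (or \(0\%\) and \(100\%\)), which makes the share of explained variance easy to interpret. On new data it can even be negative if the model predicts worse than the mean.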
- Helps compare models: It can be used to compare different models, though this can be misleading if models have a different number of predictors (in which case, adjusted R-squared is a better alternative).
- Detects overfitting and underfitting: A large gap between the training and testing R-squared values can signal overfitting, while low values on both can indicate underfitting.
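Two of the points above can be illustrated in a few lines. Adjusted R-squared applies the standard correction \( 1 - (1 - R^2)\frac{n - 1}{n - p - 1} \), where \( n \) is the number of observations and \( p \) the number of predictors; the sketch computes it with `adjusted_r2`, a helper written here for illustration. It also shows that `r2_score` goes below 0 when the predictions are worse than always predicting the mean, which is why the 0 to 1 range only holds for a least-squares fit with an intercept on its own training data.

```python
import numpy as np
from sklearn.metrics import r2_score

def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (illustrative helper)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Q9 sample data and predictions from its fitted line y_hat = 2.2 + 0.6 * x
x = np.array([1, 2, 3, 4, 5], dtype=float)
y_true = np.array([2, 4, 5, 4, 5], dtype=float)
y_pred = 2.2 + 0.6 * x

r2 = r2_score(y_true, y_pred)
adj = adjusted_r2(r2, n=len(y_true), p=1)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}")

# Predictions worse than the mean of y_true give a negative R^2
bad_pred = np.full_like(y_true, 10.0)
print(f"R^2 for constant predictions of 10: {r2_score(y_true, bad_pred):.3f}")
```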
-----------------------------------------------------------------





Q9. Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept. (Include your Python code and output in the code box below.)

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data
# X should be a 2D array for scikit-learn
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

# Create a Linear Regression model object
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Print the intercept and coefficient (slope)
print(f"Intercept: {model.intercept_}")
print(f"Slope (Coefficient): {model.coef_[0]}")
```

Output:

```
Intercept: 2.2
Slope (Coefficient): 0.6
```



Q10. How do you interpret the coefficients in a simple linear regression model?
- Linear regression is a cornerstone technique in statistical modeling, used extensively to understand relationships between variables and to make predictions. Its coefficients describe the nature of the relationship between the dependent variable and the independent variables, so interpreting them correctly is central to using the model. In simple linear regression there is a single independent variable, so the model has only an intercept and one slope; the general form below covers any number of independent variables.

**Linear Regression Equation**

The basic form of a linear regression equation is:

\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \epsilon \]

where:
- \( Y \) is the dependent variable.
- \( \beta_0 \) is the intercept.
- \( \beta_1, \beta_2, \ldots, \beta_n \) are the coefficients of the independent variables \( X_1, X_2, \ldots, X_n \).
- \( \epsilon \) is the error term.

**Interpreting the Intercept (\(\beta_0\)) and Coefficients (\(\beta_1, \beta_2, \ldots, \beta_n\))**

The intercept (\(\beta_0\)) represents the expected value of \( Y \) when all \( X \) variables are zero. It serves as the baseline level of the dependent variable. However, its practical interpretation can be limited, especially if zero values for all independent variables are unrealistic or outside the scope of the data.
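For the simple (one-predictor) case, both interpretations can be checked directly on a fitted model. Using the sample data from Q9, the sketch below confirms that the prediction at \( X = 0 \) equals the intercept and that raising \( X \) by one unit changes the prediction by exactly the slope. Note that \( X = 0 \) lies outside the observed range of 1 to 5, which is the extrapolation caveat described above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data from Q9
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

model = LinearRegression().fit(X, y)
beta_0, beta_1 = model.intercept_, model.coef_[0]

# Intercept: the predicted value of y when X = 0
pred_at_zero = model.predict(np.array([[0]]))[0]

# Slope: the change in predicted y for a one-unit increase in X (here from 3 to 4)
change_per_unit = model.predict(np.array([[4]]))[0] - model.predict(np.array([[3]]))[0]

print(f"beta_0 = {beta_0:.2f}, prediction at X = 0: {pred_at_zero:.2f}")
print(f"beta_1 = {beta_1:.2f}, change from X = 3 to X = 4: {change_per_unit:.2f}")
```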

