#Supervised Learning: Regression Models and Performance Metrics

#ASSIGNMENT

Q.1 What is Simple Linear Regression (SLR)? Explain its purpose.

     --> Simple Linear Regression (SLR) is a statistical method that models the relationship between a single independent (predictor) variable and a single dependent (outcome) variable by fitting a straight line to the data. Its purpose is to understand, predict, and quantify the linear relationship between the two variables, which can be used to make predictions and gain insights into how changes in the independent variable affect the dependent variable.

     Purpose of Simple Linear Regression (SLR):

     To establish a relationship between two variables: SLR helps determine if a linear relationship exists between an independent and a dependent variable and quantifies that relationship. For example, it can be used to see how monthly advertising cost (independent variable) relates to monthly sales (dependent variable).

     To make predictions: Once the linear relationship is established, you can use the model to predict the value of the dependent variable for a new value of the independent variable. For instance, you could use the model to predict a car's resale price based on its age.

     To gain insights and test hypotheses: The model provides a clear understanding of the relationship's direction and strength. The slope of the line (\(\beta _{1}\)) shows the average change in the dependent variable for a one-unit increase in the independent variable, allowing for the testing of hypotheses about the relationship's significance.

     To serve as a foundation for more complex models: Though simple, SLR is a fundamental technique in statistics and machine learning. Its principles form the basis for more advanced methods, such as multiple linear regression.



Q.2 What are the key assumptions of Simple Linear Regression?

     --> The key assumptions of simple linear regression are linearity, meaning the relationship between the variables is a straight line; independence, where errors are uncorrelated; homoscedasticity, or constant variance of errors; and normality, where the errors are normally distributed.
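
     These assumptions are usually checked on the residuals of a fitted model. The following is a minimal sketch in plain Python, assuming the sample data and the (rounded) slope and intercept from Q.9; the helper name `durbin_watson` and the choice of checks are illustrative, and with only five points the numbers are a demonstration rather than a real diagnostic.

```python
# Crude residual checks for a fitted simple linear regression (illustrative only).

def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest uncorrelated errors."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Sample data and rounded coefficients taken from Q.9 (assumed here).
x = [1, 2, 3, 4, 5]
y = [2.1, 4.3, 6.1, 7.9, 10.2]
intercept, slope = 0.18, 1.98

# Residual = observed value minus value predicted by the fitted line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Linearity and homoscedasticity: residuals should scatter around zero with
# no trend and a roughly constant spread across x (best judged from a plot).
print("Residuals:", [round(e, 3) for e in residuals])
print("Mean residual:", round(sum(residuals) / len(residuals), 6))

# Independence: the statistic always lies between 0 and 4.
print("Durbin-Watson:", round(durbin_watson(residuals), 3))
```

     Normality of the errors is typically judged from a histogram or Q-Q plot of the residuals, which this sketch leaves out.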


Q.3 Write the mathematical equation for a simple linear regression model and explain each term.

     --> The mathematical equation for a simple linear regression model is \(y=\beta _{0}+\beta _{1}x+\epsilon \), where \(y\) is the dependent variable, \(x\) is the independent variable, \(\beta _{0}\) is the y-intercept, \(\beta _{1}\) is the slope, and \(\epsilon \) represents the error term.

     Equation:

      \(y=\beta _{0}+\beta _{1}x+\epsilon \)

     Explanation of terms:

     \(y\): The dependent variable (or response variable) is the outcome you are trying to predict.

     \(\beta _{0}\): The intercept is the expected value of \(y\) when \(x\) is equal to zero. It is the point where the regression line crosses the vertical y-axis.

     \(\beta _{1}\): The slope (or regression coefficient) represents the change in the dependent variable (\(y\)) for a one-unit increase in the independent variable (\(x\)). It indicates the steepness of the line.

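     \(\epsilon \): The error term represents the difference between the observed value of \(y\) and the value given by the true regression line \(\beta _{0}+\beta _{1}x\). It accounts for the variability in \(y\) that is not explained by \(x\). Its estimate from a fitted model, the difference between an observed value and the predicted value, is called the residual.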
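
     To make the roles of the terms concrete, the sketch below generates observations from the model equation. The values chosen for \(\beta _{0}\), \(\beta _{1}\), and the error spread `sigma` are hypothetical, as is the function name `simulate_y`; the errors are assumed to be normally distributed with mean zero.

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same draws

beta_0 = 2.0   # intercept (hypothetical value)
beta_1 = 0.5   # slope (hypothetical value)
sigma = 1.0    # standard deviation of the error term (hypothetical value)

def simulate_y(x):
    """Return one observation y = beta_0 + beta_1 * x + epsilon."""
    epsilon = random.gauss(0.0, sigma)  # random error with mean zero
    return beta_0 + beta_1 * x + epsilon

for x in [0, 1, 2, 3, 4]:
    line_value = beta_0 + beta_1 * x   # deterministic part of the model
    y = simulate_y(x)                  # observed value, including the error
    print(f"x={x}  line={line_value:.2f}  y={y:.2f}  error={y - line_value:+.2f}")
```

     Each printed `error` is the gap between an observation and the line, which is the part of \(y\) that \(x\) does not explain.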



Q.4  Provide a real-world example where simple linear regression can be applied.

     --> A real-world example of simple linear regression is predicting a house's price based on its size. By analyzing historical sales data, a company can create a model to see how much the price typically increases for each additional square foot, allowing them to predict the price of a new house based on its size.

     Example: Predicting house prices

     Scenario: A real estate company wants to estimate the likely selling price of a new house based on its square footage.

     Independent variable: The size of the house in square feet (e.g., \(x\) represents the square footage).
     
     Dependent variable: The price of the house (e.g., \(y\) represents the price).
     
     Data: The company collects data from past house sales in a specific neighborhood, noting both the size and the final sale price for each house.
     
     Analysis: They use simple linear regression to find a line of best fit that shows the relationship between square footage and price. The slope of this line represents the average price increase for every extra square foot.
     
     Application: With this line, they can predict the price of a house with a certain square footage. For example, if the model indicates that houses in the area cost approximately \(\$100\) per square foot, they can estimate that a 1,500-square-foot house would likely sell for around \(\$150,000\) (\(1500\times 100\)).
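
     The arithmetic in this example can be written as a small function. This is a sketch using the illustrative figure of $100 per square foot from the text and an assumed intercept of zero; the function name `predict_price` is hypothetical, and a model fitted to real sales data would generally have a nonzero intercept.

```python
def predict_price(square_feet, slope=100.0, intercept=0.0):
    """Predicted price = intercept + slope * square_feet."""
    return intercept + slope * square_feet

# A 1,500-square-foot house at $100 per square foot.
print(predict_price(1500))  # 150000.0
```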

Q.5  What is the method of least squares in linear regression?

     --> The method of least squares in linear regression is a statistical technique used to find the best-fit line for a set of data points by minimizing the sum of the squared differences between the observed values and the values predicted by the line. This is achieved by choosing the intercept and slope of the line (\(\hat{y}=\beta _{0}+\beta _{1}x\)) that produce the smallest possible sum of squared residuals (errors), \(\sum _{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}\), where each residual is the vertical distance from a data point to the line.
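
     For simple linear regression the minimization has a closed-form solution: the slope is \(\sum (x_{i}-\bar{x})(y_{i}-\bar{y})/\sum (x_{i}-\bar{x})^{2}\) and the intercept is \(\bar{y}\) minus the slope times \(\bar{x}\). The sketch below applies these formulas in plain Python to the sample data from Q.9; the function names are illustrative.

```python
def least_squares(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2.1, 4.3, 6.1, 7.9, 10.2]

intercept, slope = least_squares(x, y)
best = sum_squared_residuals(x, y, intercept, slope)

# Shifting the fitted line even slightly increases the sum of squares.
shifted = sum_squared_residuals(x, y, intercept + 0.1, slope)

print("Slope:", round(slope, 4), "Intercept:", round(intercept, 4))
print("SSR at least-squares fit:", round(best, 4))
print("SSR with shifted intercept:", round(shifted, 4))
```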

Q.6 What is Logistic Regression? How does it differ from Linear Regression?

     --> Logistic regression predicts the probability of a categorical outcome (like yes/no) by passing a linear combination of the inputs through the sigmoid (logistic) function, which maps any real number to a value between 0 and 1, while linear regression predicts a continuous value (like price or temperature) directly from a linear equation. The primary difference is the type of problem they solve: logistic regression is used for classification, and linear regression is used for predicting continuous values.
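
     The contrast can be sketched with the same linear combination used in two ways. The coefficients below are hypothetical, the function names are illustrative, and the 0.5 decision threshold is a common default rather than a requirement.

```python
import math

def linear_output(x, b0, b1):
    """Linear regression: the prediction is the linear combination itself."""
    return b0 + b1 * x

def logistic_probability(x, b0, b1):
    """Logistic regression: squeeze the linear combination into (0, 1)."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))

b0, b1 = -4.0, 1.0  # hypothetical coefficients

for x in [0, 2, 4, 6, 8]:
    p = logistic_probability(x, b0, b1)
    label = 1 if p >= 0.5 else 0  # classify using a 0.5 threshold
    print(f"x={x}  linear={linear_output(x, b0, b1):+.1f}  probability={p:.3f}  class={label}")
```

     The linear output is unbounded, while the logistic output always stays between 0 and 1 and can be read as a class probability.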

Q.7 Name and briefly describe three common evaluation metrics for regression models.

     --> Three common regression metrics are Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (\(R^{2}\)). MAE measures the average absolute difference between predicted and actual values, while MSE calculates the average of the squared differences, penalizing larger errors more heavily. \(R^{2}\) indicates the proportion of the variance in the dependent variable that is predictable from the independent variables.

     1. Mean Absolute Error (MAE)

     Description: The average absolute difference between the predicted and actual values.
     
     Formula: \(\text{MAE}=\frac{1}{n}\sum _{i=1}^{n}|y_{i}-\hat{y}_{i}|\)
     
     Usefulness: It is easy to interpret because it is in the same units as the target variable.
     
     2. Mean Squared Error (MSE)
     
     Description: The average of the squared differences between predicted and actual values.
     
     Formula: \(\text{MSE}=\frac{1}{n}\sum _{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}\)
     
     Usefulness: It penalizes larger errors more significantly than MAE due to the squaring of the error term.
     
     3. R-squared (\(R^{2}\))
     
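     Description: The coefficient of determination, which represents the proportion of the variance in the dependent variable that is explained by the independent variable or variables in a regression model.
     
     Range: For a least-squares model with an intercept, evaluated on its training data, it ranges from \(0\) to \(1\), where a higher value indicates a better fit. On new data, or for a model that fits worse than simply predicting the mean, it can be negative.
     
     Usefulness: It provides a relative, unit-free measure of fit, showing how much better the model's predictions are than always predicting the mean of the dependent variable.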
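
     The three metrics can be computed directly from their definitions. The sketch below uses plain Python with the sample data and the rounded coefficients from Q.9 as assumed inputs; the function names are illustrative. In scikit-learn, `sklearn.metrics` provides `mean_absolute_error`, `mean_squared_error`, and `r2_score` for the same purpose.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(yt - yp) for yt, yp in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Squared Error: average of (actual - predicted)^2."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y_true = [2.1, 4.3, 6.1, 7.9, 10.2]
y_pred = [0.18 + 1.98 * xi for xi in x]  # predictions from the fitted line

print("MAE:", round(mae(y_true, y_pred), 4))
print("MSE:", round(mse(y_true, y_pred), 4))
print("R^2:", round(r_squared(y_true, y_pred), 4))
```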


Q.8 What is the purpose of the R-squared metric in regression analysis?

     --> The purpose of the R-squared (R²) metric is to measure the goodness of fit for a regression model by indicating the proportion of variance in the dependent variable that is explained by the independent variables. Essentially, it tells you how much of the variation in the outcome the model accounts for, with a higher value (closer to 1, or 100%) indicating a better fit. A high R² alone does not show that the model is appropriate, so it is usually read alongside error metrics and residual checks.
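
     One way to read R² is as a comparison against a baseline that always predicts the mean of the dependent variable. The sketch below, in plain Python with the sample values from Q.9 and an illustrative function name, shows three reference cases: exact predictions, the mean-only baseline, and predictions that are worse than the baseline.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, relative to a mean-only baseline."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [2.1, 4.3, 6.1, 7.9, 10.2]
y_mean = sum(y_true) / len(y_true)

perfect = r_squared(y_true, y_true)                    # every prediction exact
baseline = r_squared(y_true, [y_mean] * len(y_true))   # always predict the mean
worse = r_squared(y_true, y_true[::-1])                # predictions in reverse order

print("Exact predictions:", perfect)
print("Mean-only baseline:", round(baseline, 6))
print("Worse than baseline:", round(worse, 3))
```

     Exact predictions give 1, the mean-only baseline gives 0, and predictions that do worse than the baseline give a negative value.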

Q.9 Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.

In [1]:
# Import required libraries
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Independent variable
Y = np.array([2.1, 4.3, 6.1, 7.9, 10.2])      # Dependent variable

# Create and fit the model
model = LinearRegression()
model.fit(X, Y)

# Print the slope and intercept
print("Slope (Coefficient):", model.coef_[0])
print("Intercept:", model.intercept_)


Slope (Coefficient): 1.9800000000000004
Intercept: 0.17999999999999794
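
     The fitted slope and intercept define the line used for prediction. As a minimal sketch in plain Python, with the printed values rounded to 1.98 and 0.18 and an illustrative function name `predict`, a prediction for a new value such as x = 6 can be computed by hand. With the scikit-learn model above, `model.predict(np.array([[6]]))` would give the same value up to floating-point rounding.

```python
slope = 1.98      # rounded from the printed coefficient
intercept = 0.18  # rounded from the printed intercept

def predict(x):
    """Predicted y on the fitted line for a given x."""
    return intercept + slope * x

print("Prediction for x = 6:", round(predict(6), 2))
```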


Q.10 How do you interpret the coefficients in a simple linear regression model?

     --> In a simple linear regression model (\(y=\beta _{0}+\beta _{1}x\)), the coefficient \(\beta _{1}\) represents the estimated change in the dependent variable (\(y\)) for a one-unit increase in the independent variable (\(x\)). The intercept, \(\beta _{0}\), is the predicted value of \(y\) when \(x\) equals 0.

     Interpreting the slope coefficient (\(\beta _{1}\))

     Direction: The sign of \(\beta _{1}\) indicates the direction of the relationship.
     
     A positive coefficient means that as \(x\) increases, \(y\) is predicted to increase.
     
     A negative coefficient means that as \(x\) increases, \(y\) is predicted to decrease.

     Magnitude: The value of \(\beta _{1}\) tells you the size of the predicted change.
     
     For every one-unit increase in the independent variable (\(x\)), the dependent variable (\(y\)) is expected to change by the amount of the coefficient (\(\beta _{1}\)).
     
     Example: If the equation is \(Y=10+2X\), a one-unit increase in \(X\) is associated with an estimated increase of \(2\) in \(Y\).

     Interpreting the intercept coefficient (\(\beta _{0}\))
     
      Meaning: The intercept is the predicted value of \(y\) when \(x\) is \(0\).
     
     Context is crucial: The intercept only has a meaningful interpretation if a value of \(x=0\) is realistic within the context of the data.
     
     Example: In a model predicting car stopping distance (\(y\)) based on speed (\(x\)), an intercept of \(-17\) feet is not physically possible. In such cases, the intercept is just a mathematical necessity to position the line and does not have a real-world meaning.
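
     Both interpretations can be checked numerically. The sketch below uses the equation \(Y=10+2X\) from the slope example; the function name `predict` is illustrative.

```python
def predict(x, intercept=10.0, slope=2.0):
    """Value on the line Y = 10 + 2X."""
    return intercept + slope * x

# The prediction at x = 0 equals the intercept.
print(predict(0))               # 10.0

# A one-unit increase in x changes the prediction by the slope.
print(predict(4) - predict(3))  # 2.0
```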

#END