- Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.

  Simple Linear Regression (SLR) is a statistical method used to model the relationship between a dependent variable y and a single independent variable x. It assumes a linear relationship between the two variables. The purpose of SLR is to predict the value of y based on the value of x. It is a foundational technique in predictive analytics, helping to understand and quantify the relationship between variables.
- Question 2: What are the key assumptions of Simple Linear Regression?

  The key assumptions of Simple Linear Regression are:

  Linearity: There is a linear relationship between the independent variable x and the dependent variable y.

  Independence of errors: The residuals (errors) are independent of each other.

  Homoscedasticity: The variance of the errors is constant across all levels of the independent variable x.

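  Normality of errors: The errors are normally distributed for any fixed value of x. This matters mainly for confidence intervals and hypothesis tests on the coefficients; the least squares estimates themselves do not require it.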

  No multicollinearity: In the case of multiple predictors, there should be no high correlation between predictors (this does not apply to Simple Linear Regression as it has only one predictor).
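These assumptions are usually checked on the residuals of a fitted model. The sketch below uses only the Python standard library and hypothetical synthetic data. It computes two rough diagnostics. The first is the Durbin-Watson statistic for independence, where values near 2 suggest no lag-1 autocorrelation. The second compares residual variance in the lower and upper halves of x as a check on homoscedasticity, where values near 1 suggest constant variance. The helper names are illustrative. In practice these numbers are usually paired with residual plots.

```python
import random
from statistics import fmean, pvariance

def fit_slr(xs, ys):
    """Ordinary least squares estimates (intercept, slope) for a single predictor."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical synthetic data: true line y = 1 + 2x plus Gaussian noise (sd 0.5).
random.seed(0)
xs = [i / 10 for i in range(100)]  # sorted ascending
ys = [1 + 2 * x + random.gauss(0, 0.5) for x in xs]

b0, b1 = fit_slr(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Independence: Durbin-Watson statistic on residuals ordered by x (about 2 = no lag-1 autocorrelation).
dw = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))) / sum(
    e ** 2 for e in residuals
)

# Homoscedasticity: residual variance in the upper half of x divided by that in the lower half.
half = len(residuals) // 2
var_ratio = pvariance(residuals[half:]) / pvariance(residuals[:half])

print(f"slope={b1:.3f}, Durbin-Watson={dw:.2f}, variance ratio={var_ratio:.2f}")
```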

- Question 3: Write the mathematical equation for a simple linear regression model and explain each term.

  The mathematical equation for a simple linear regression model is:

  $y = \beta_0 + \beta_1x + \epsilon$

  Where:

  *   $y$ is the dependent variable (the variable you are trying to predict).
  *   $x$ is the independent variable (the variable used to predict $y$).
  *   $\beta_0$ is the intercept, representing the predicted value of $y$ when $x$ is 0.
  *   $\beta_1$ is the slope, representing the change in $y$ for a one-unit change in $x$.
  *   $\epsilon$ is the error term, representing the random variability or noise not explained by the linear relationship.
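To make each term concrete, the sketch below simulates observations from the model. It uses hypothetical values $\beta_0 = 2$ and $\beta_1 = 0.5$, and it assumes $\epsilon$ is normally distributed with mean 0 and standard deviation 1. Averaging many draws at a fixed $x$ shows that the zero-mean error term cancels out. What remains is the systematic part $\beta_0 + \beta_1 x$.

```python
import random
from statistics import fmean

# Hypothetical "true" parameters chosen for illustration.
beta_0, beta_1, sigma = 2.0, 0.5, 1.0

def draw_y(x):
    """One observation of y = beta_0 + beta_1 * x + epsilon, with epsilon ~ N(0, sigma^2)."""
    epsilon = random.gauss(0.0, sigma)
    return beta_0 + beta_1 * x + epsilon

random.seed(42)
x = 4.0
samples = [draw_y(x) for _ in range(10_000)]

# The noise averages out, so the sample mean is close to beta_0 + beta_1 * x = 4.0.
print(f"mean of y at x={x}: {fmean(samples):.3f} (systematic part: {beta_0 + beta_1 * x})")
```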

- Question 4: Provide a real-world example where simple linear regression can be applied.

  A real-world example of applying Simple Linear Regression is predicting house prices based on the size of the house. Here:

  x could be the size of the house (in square feet).

  y would be the price of the house.

  The relationship between size and price can be modeled using SLR to predict the price of a house based on its size.

- Question 5: What is the method of least squares in linear regression?

  The method of least squares is a technique used to estimate the parameters (slope and intercept) of the linear regression model. It chooses the values of $\beta_0$ and $\beta_1$ that minimize the sum of the squared differences (residuals) between the actual values $y_i$ and the predicted values $\hat{y}_i = \beta_0 + \beta_1 x_i$, i.e. $\sum_i (y_i - \hat{y}_i)^2$. The resulting line of best fit is the one closest to the data points in the sense of total squared vertical distance.
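For a single predictor, the least squares estimates have a closed form: $\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. The sketch below computes them on a small hypothetical dataset. It then checks that nudging either estimate increases the sum of squared errors, which is exactly the property that gives "least squares" its name. The function names are illustrative.

```python
from statistics import fmean

def least_squares_fit(xs, ys):
    """Closed-form OLS estimates (b0, b1) for y = b0 + b1 * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(b0, b1, xs, ys):
    """Sum of squared residuals: the quantity least squares minimizes."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Small hypothetical dataset.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_fit(xs, ys)
best = sse(b0, b1, xs, ys)

# Moving either parameter away from the OLS solution increases the SSE.
for d0, d1 in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(b0 + d0, b1 + d1, xs, ys) > best

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {best:.3f}")  # b0 = 0.050, b1 = 1.990, SSE = 0.107
```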

- Question 6: What is Logistic Regression? How does it differ from Linear Regression?

  Logistic Regression is a model used when the dependent variable $y$ is categorical, typically binary (e.g., yes/no, 0/1); despite its name, it is used for classification. It predicts the probability of an event occurring by passing a linear combination of the inputs through the logistic (sigmoid) function, which maps any real number to a value between 0 and 1.

  Differences from Linear Regression:

  Output: Linear regression predicts a continuous value, while logistic regression predicts a probability between 0 and 1, which is then converted to a class label using a threshold (commonly 0.5).

  Function: Linear regression outputs the linear combination $\beta_0 + \beta_1x$ directly, while logistic regression applies the sigmoid $\sigma(z) = \frac{1}{1 + e^{-z}}$ to that linear combination to obtain a probability.

  Use Case: Linear regression is used for regression tasks (continuous output), whereas logistic regression is used for classification tasks (categorical output).
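The following is a minimal sketch of how logistic regression turns a linear combination into a probability and then into a class label. The coefficients `b0` and `b1` and the pass/fail framing are hypothetical and not fitted to any data. A real model would estimate them, for example with scikit-learn's `LogisticRegression`. The sigmoid is written in a numerically stable form so that `math.exp` does not overflow for large negative inputs.

```python
import math

def sigmoid(z):
    """Logistic function, mapping any real z into (0, 1); split by sign to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical coefficients for one feature x (e.g. hours studied -> pass/fail).
b0, b1 = -4.0, 1.5

def predict_proba(x):
    """P(y = 1 | x): the linear combination b0 + b1 * x passed through the sigmoid."""
    return sigmoid(b0 + b1 * x)

def predict_class(x, threshold=0.5):
    """Convert the predicted probability into a 0/1 label."""
    return int(predict_proba(x) >= threshold)

for x in [1, 2, 3, 4]:
    print(x, round(predict_proba(x), 3), predict_class(x))
```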

- Question 7: Name and briefly describe three common evaluation metrics for regression models.

  Mean Absolute Error (MAE): Measures the average magnitude of the errors in a set of predictions, without considering their direction. It’s the average of the absolute differences between actual and predicted values.

  Mean Squared Error (MSE): Measures the average of the squared differences between actual and predicted values. It penalizes larger errors more than MAE.

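  R-squared (R²): Represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s). For a least squares model with an intercept, evaluated on its training data, it lies between 0 and 1, where 1 means perfect prediction; on new data it can be negative if the model does worse than always predicting the mean.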
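The sketch below computes MAE and MSE by hand on hypothetical values to show why MSE penalizes large errors more heavily. Both prediction sets have the same total absolute error. Concentrating that error in a single point, however, quadruples the MSE while leaving the MAE unchanged. scikit-learn offers equivalent `mean_absolute_error` and `mean_squared_error` functions in `sklearn.metrics`.

```python
from statistics import fmean

def mae(y_true, y_pred):
    """Mean absolute error."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))

def mse(y_true, y_pred):
    """Mean squared error."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))

y_true = [10.0, 12.0, 14.0, 16.0]
small_errors = [11.0, 11.0, 15.0, 15.0]   # every prediction off by 1
one_big_error = [10.0, 12.0, 14.0, 20.0]  # same total absolute error, all in one point

print(mae(y_true, small_errors), mse(y_true, small_errors))    # 1.0 1.0
print(mae(y_true, one_big_error), mse(y_true, one_big_error))  # 1.0 4.0
```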

- Question 8: What is the purpose of the R-squared metric in regression analysis?

  The R-squared metric measures how well the independent variable(s) explain the variance in the dependent variable: it is the proportion of the total variance in $y$ that is explained by the regression model,

  $R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$

  Higher values indicate that the model explains a greater portion of the variance.

  R² = 1 means that the regression model perfectly explains the variance in the target variable.

  R² = 0 means that the model explains none of the variance; it does no better than always predicting the mean $\bar{y}$.
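The following is a small sketch of $R^2 = 1 - SS_{res}/SS_{tot}$ on hypothetical data, showing the two reference points: a perfect model gives $R^2 = 1$, and always predicting the mean gives $R^2 = 0$. It also shows that predictions worse than the mean produce a negative value.

```python
from statistics import fmean

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_bar = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y = [2.0, 4.0, 6.0, 8.0]
perfect = list(y)                 # explains all of the variance
mean_only = [fmean(y)] * len(y)   # always predicts the mean of y
reversed_pred = y[::-1]           # worse than predicting the mean

print(r_squared(y, perfect))        # 1.0
print(r_squared(y, mean_only))      # 0.0
print(r_squared(y, reversed_pred))  # -3.0
```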

- Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.

  Here is an example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Example data (x: feature, y: target)
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Independent variable, reshaped to (n_samples, 1) as scikit-learn expects a 2D feature array
y = np.array([1, 2, 3, 4, 5])  # Dependent variable

# Initialize the model
model = LinearRegression()

# Fit the model to the data
model.fit(x, y)

# Get the slope (coef_ holds one coefficient per feature) and the intercept
slope = model.coef_[0]
intercept = model.intercept_

# Print the results
print("Slope:", slope)
print("Intercept:", intercept)
```

Output:

```text
Slope: 1.0
Intercept: 0.0
```


- Question 10: How do you interpret the coefficients in a simple linear regression model?

In a simple linear regression model, the mathematical equation is typically written as:

$y = \beta_0 + \beta_1x + \epsilon$

Where:

*	$y$ is the dependent variable.
*	$x$ is the independent variable.
*	$\beta_0$ is the intercept.
*	$\beta_1$ is the slope (or coefficient) for the independent variable.
*	$\epsilon$ is the error term.

Here's how you interpret the coefficients:

*	**Intercept ($\beta_0$)**: This represents the predicted value of the dependent variable ($y$) when the independent variable ($x$) is equal to zero. In some cases, this interpretation might not be meaningful if $x=0$ is outside the range of your data or doesn't make practical sense.

*	**Slope ($\beta_1$)**: This represents the change in the predicted value of the dependent variable ($y$) for a one-unit increase in the independent variable ($x$). It indicates the direction and magnitude of the linear relationship between $x$ and $y$.

For example, in the Question 9 code, the fitted slope is 1.0 and the intercept is 0.0. This means that for every one-unit increase in `x`, `y` is predicted to increase by 1.0 unit, and when `x` is 0, the predicted value of `y` is 0.0.
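To illustrate the interpretation with less trivial numbers, the sketch below uses hypothetical coefficients for a house price model, with price in dollars and size in square feet. These coefficients are not taken from any fitted model. The difference between two predictions one unit apart equals the slope. The prediction at $x = 0$ equals the intercept, even though a house of 0 square feet is not meaningful.

```python
# Hypothetical coefficients: price (dollars) as a function of size (square feet).
beta_0 = 50_000.0  # intercept: predicted price at size 0 (outside any realistic data range)
beta_1 = 150.0     # slope: predicted price change per additional square foot

def predict(size_sqft):
    """Predicted price from the linear model."""
    return beta_0 + beta_1 * size_sqft

print(predict(1501) - predict(1500))  # 150.0, one extra square foot adds beta_1 dollars
print(predict(1600) - predict(1500))  # 15000.0, 100 extra square feet add 100 * beta_1
print(predict(0))                     # 50000.0, equals the intercept
```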