# 1.  What is Simple Linear Regression (SLR)? Explain its purpose.
   - Simple Linear Regression (SLR) is a statistical technique for modeling the relationship between two variables: one independent (predictor) variable and one dependent (response) variable. It fits the straight line that best describes how the dependent variable changes as the independent variable changes. Its main purposes are to predict the value of the dependent variable from a known value of the independent variable and to quantify the strength and direction of the relationship. The regression equation is usually written as $Y = a + bX + e$, where $a$ is the intercept, $b$ is the slope (the change in $Y$ for each one-unit increase in $X$), and $e$ is the error term. SLR is widely used in business, economics, and science to make predictions, analyze trends, and quantify relationships between variables. Typical examples are predicting sales from advertising expenditure and estimating crop yield from rainfall.
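The following is a minimal sketch of the prediction use case. It assumes NumPy is available and reuses the small toy dataset from the scikit-learn example in question 9. `np.polyfit` with degree 1 performs a least-squares straight-line fit and returns the coefficients highest power first, so the slope comes before the intercept. The new input value `x_new = 6` is an arbitrary choice for illustration.

```python
import numpy as np

# Toy data reused from the scikit-learn example in question 9
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Degree-1 least-squares fit; polyfit returns [slope, intercept]
b, a = np.polyfit(x, y, 1)

# Use the fitted line Y_hat = a + b * X to predict Y for an unseen X
x_new = 6.0  # arbitrary illustrative value
y_pred = a + b * x_new

print(f"intercept a = {a:.2f}, slope b = {b:.2f}")   # intercept a = 2.20, slope b = 0.60
print(f"predicted Y at X = {x_new:.0f}: {y_pred:.2f}")  # predicted Y at X = 6: 5.80
```

The fitted line gives $\hat{Y} = 2.2 + 0.6X$. At $X = 6$, this predicts $2.2 + 0.6 \times 6 = 5.8$.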


# 2. What are the key assumptions of Simple Linear Regression?
  - The key assumptions of Simple Linear Regression are as follows. **Linearity**: the expected value of the dependent variable (Y) is a linear function of the independent variable (X), so each one-unit change in X is associated with the same average change in Y. **Independence of errors**: the residuals (errors) are not correlated with each other. **Homoscedasticity**: the residuals have constant variance, so the spread of errors is the same across all levels of X. **Normality of errors**: the residuals are normally distributed, which justifies the usual hypothesis tests and confidence intervals, especially in small samples. In practice, the fit should also not be driven by **outliers or influential points** that unduly pull the regression line. When these conditions hold, the least-squares estimates are unbiased and the reported standard errors, confidence intervals, and p-values are reliable.
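The residual-based assumptions can be inspected after fitting. The sketch below assumes NumPy and reuses the toy data from question 9. The helper name `residual_diagnostics` is illustrative. The sketch does three things:

- Computes the residuals and checks that they average to zero.
- Lists each residual next to its fitted value, so you can scan for patterns or a funnel shape that would hint at non-linearity or non-constant variance.
- Computes the Durbin-Watson statistic. Values near 2 suggest no first-order autocorrelation, which is only meaningful when the observations have a natural order, such as time.

With only five points, none of these checks is conclusive. In practice, one would also plot residuals against fitted values and use a Q-Q plot to judge normality.

```python
import numpy as np

def residual_diagnostics(x, y):
    """Fit Y = a + bX by least squares and return simple residual checks."""
    b, a = np.polyfit(x, y, 1)
    fitted = a + b * x
    residuals = y - fitted
    # Durbin-Watson: squared successive differences / squared residuals.
    # Values near 2 suggest no first-order autocorrelation (needs ordered data).
    dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
    return fitted, residuals, dw

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
fitted, residuals, dw = residual_diagnostics(x, y)

# With an intercept in the model, least-squares residuals always average to zero
print("residuals average to zero:", np.isclose(residuals.mean(), 0.0))

# Inspect residuals against fitted values for curvature or a funnel shape
for f, r in zip(fitted, residuals):
    print(f"fitted = {f:.2f}, residual = {r:+.2f}")

print(f"Durbin-Watson statistic: {dw:.3f}")
```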


# 3. Write the mathematical equation for a simple linear regression model and explain each term.
   - The mathematical equation for a Simple Linear Regression model is $Y = a + bX + e$. Each term means the following:
     - **Y** is the dependent variable, the outcome being predicted.
     - **X** is the independent variable, the predictor.
     - **a** (the intercept) is the expected value of Y when X equals zero.
     - **b** (the slope) is the change in Y for every one-unit increase in X.
     - **e** (the error term) is the difference between the actual value of Y and the value given by the line. It represents the influence of all other factors not included in the model.

     This equation defines a straight-line relationship between X and Y. The coefficients a and b are estimated from data using the method of least squares. This method chooses them to minimize the sum of squared residuals, which are the observed differences between actual and fitted values of Y.
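Minimizing the sum of squared residuals $\sum_i (y_i - a - b x_i)^2$ gives the standard closed-form least-squares estimates below, where $\bar{x}$ and $\bar{y}$ are the sample means. As a worked check, take the toy data used in question 9: $x = 1, \dots, 5$ and $y = 2, 4, 5, 4, 5$. This data has $\bar{x} = 3$ and $\bar{y} = 4$, and plugging in reproduces the slope and intercept printed there.

$$
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
$$

$$
b = \frac{(-2)(-2) + (-1)(0) + (0)(1) + (1)(0) + (2)(1)}{4 + 1 + 0 + 1 + 4} = \frac{6}{10} = 0.6,
\qquad
a = 4 - 0.6 \times 3 = 2.2
$$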


# 4.  Provide a real-world example where simple linear regression can be applied.
  - A real-world example of Simple Linear Regression is predicting **house prices** from house **size** in square feet. Here, the **dependent variable (Y)** is the house price and the **independent variable (X)** is the size of the house. Using data on many houses, a regression line is fitted to show how price changes with size. The model takes the form $\text{Price} = a + b \times \text{Size}$. In this model, **b** indicates how much the price increases, on average, for each additional square foot. **a** is the intercept, the base price the line predicts at a size of zero. Because no house has zero size, this value mainly positions the line rather than describing a real property. Such a model helps real estate agents, buyers, and sellers estimate prices and understand how property size relates to value.
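Here is a minimal sketch of this workflow, assuming scikit-learn and NumPy. No real housing data is used. Instead, prices are simulated from an assumed "true" line with a hypothetical base price of 50,000 and a hypothetical 150 per square foot, plus random noise. This shows that the fitted slope approximately recovers the per-square-foot effect. The 1,500 sq ft house at the end is also hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(seed=0)

# Simulated (hypothetical) data: sizes in square feet and prices generated
# from an assumed true line Price = 50,000 + 150 * Size plus Gaussian noise
true_a, true_b = 50_000.0, 150.0
sqft = rng.uniform(800, 3_000, size=200)
price = true_a + true_b * sqft + rng.normal(0, 20_000, size=200)

# scikit-learn expects a 2D feature array: one row per house, one column (Size)
model = LinearRegression()
model.fit(sqft.reshape(-1, 1), price)

print(f"estimated base price a: {model.intercept_:,.0f}")
print(f"estimated price per sq ft b: {model.coef_[0]:,.2f}")

# Estimate the price of a hypothetical 1,500 sq ft house
print(f"predicted price for 1,500 sq ft: {model.predict([[1_500]])[0]:,.0f}")
```

Because of the noise, the estimates will not match the assumed values exactly. With 200 houses, the slope estimate should land close to 150. The intercept is estimated less precisely, since it is an extrapolation to a size of zero, far from the data.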


# 5. What is the method of least squares in linear regression?
  - The **method of least squares** in linear regression finds the best-fitting line through a set of data points. It does this by minimizing the sum of the squared differences between the observed values and the values predicted by the model. In other words, it chooses the line $Y = a + bX$ that makes the total of all squared residuals as small as possible. The residuals are the vertical distances between the actual data points and the line. Squaring makes every deviation positive and penalizes large errors more heavily than small ones. For a straight line, the minimization has a closed-form solution for the intercept (**a**) and slope (**b**). Under the standard regression assumptions, these least-squares estimates are the best linear unbiased estimates; this result is the Gauss-Markov theorem.
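The minimization can be sketched directly in NumPy. This sketch assumes the closed-form formulas for the least-squares slope and intercept and reuses the toy data from question 9. The helper names `least_squares_fit` and `sse` are illustrative, not library functions. Nudging either coefficient away from its least-squares value increases the sum of squared errors (SSE), which is exactly what "least squares" means.

```python
import numpy as np

def least_squares_fit(x, y):
    """Closed-form least-squares intercept a and slope b for Y = a + bX."""
    x_mean, y_mean = x.mean(), y.mean()
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    a = y_mean - b * x_mean
    return a, b

def sse(x, y, a, b):
    """Sum of squared vertical distances between the data and the line a + bX."""
    return float(np.sum((y - (a + b * x)) ** 2))

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

a, b = least_squares_fit(x, y)
best = sse(x, y, a, b)
print(f"a = {a:.2f}, b = {b:.2f}, SSE = {best:.2f}")  # a = 2.20, b = 0.60, SSE = 2.40

# Every nudged line has a larger SSE than the least-squares line
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    print(f"a{da:+.2f}, b{db:+.2f}: SSE = {sse(x, y, a + da, b + db):.3f}")
```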


# 6. What is Logistic Regression? How does it differ from Linear Regression?
  - **Logistic Regression** is a statistical method for modeling the relationship between one or more independent variables and a **categorical dependent variable**. The outcome is most often binary, such as “yes/no” or “success/failure.” Unlike **Linear Regression**, which predicts a continuous numeric value, Logistic Regression predicts the **probability** that an event occurs, so its output always lies between 0 and 1. It models the log-odds of the event as a linear combination of the inputs. It then passes that linear combination through the **logistic (sigmoid) function**, $\sigma(z) = 1 / (1 + e^{-z})$, to obtain a probability. In short, Linear Regression fits a straight line to predict a continuous outcome, whereas Logistic Regression fits an S-shaped curve of probabilities. That probability is usually turned into a class label with a threshold, commonly 0.5, so Logistic Regression is mainly used for **classification problems**. Its coefficients are typically estimated by maximum likelihood rather than by least squares.
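Here is a minimal sketch, assuming scikit-learn and NumPy, on a tiny made-up binary dataset of hours studied versus pass/fail. The data is purely illustrative. The sketch shows two things:

- `LogisticRegression` still computes a linear score, available through `decision_function`.
- `predict_proba` for the positive class equals that score passed through the sigmoid, so the outputs stay between 0 and 1.

The `sigmoid` helper is written by hand for comparison. By default, scikit-learn applies L2 regularization, so the coefficients differ somewhat from an unpenalized maximum-likelihood fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up toy data: hours studied (X) and pass (1) / fail (0) outcome (y)
X = np.array([[1], [2], [3], [4], [5], [6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()  # default settings include L2 regularization
clf.fit(X, y)

# Linear part: z = intercept + coefficient * X (unbounded, like linear regression)
z = clf.decision_function(X)
# Probability of class 1 is the sigmoid of the linear score
p = clf.predict_proba(X)[:, 1]

print("linear scores z:", np.round(z, 2))
print("P(pass):        ", np.round(p, 2))
print("matches sigmoid(z):", np.allclose(p, sigmoid(z)))
# predict() thresholds the probability at 0.5 (equivalently, z at 0)
print("predicted classes:", clf.predict(X))
```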


# 7. Name and briefly describe three common evaluation metrics for regression models.
  - Three common evaluation metrics for regression models are **Mean Absolute Error (MAE)**, **Mean Squared Error (MSE)**, and **R-squared (R²)**:
    - **MAE** is the average of the absolute differences between predicted and actual values. It is expressed in the same units as the dependent variable, which makes it easy to interpret as the typical size of a prediction error.
    - **MSE** is the average of the squared differences between predicted and actual values. Squaring gives more weight to large errors, so MSE is more sensitive to outliers. It is expressed in squared units of the dependent variable.
    - **R-squared (R²)**, also called the coefficient of determination, is the proportion of the variance in the dependent variable explained by the model. Values closer to 1 indicate a better fit.

    For MAE and MSE, lower values are better. Together, these metrics show how accurately a regression model predicts outcomes.
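The short sketch below computes all three metrics with `sklearn.metrics`, assuming scikit-learn and reusing the toy data and model from question 9. It also reports RMSE, the square root of MSE, because RMSE is back in the units of Y. These are in-sample values, computed on the same points used to fit the model. To judge predictive accuracy, they are normally computed on held-out data instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)  # fitted values: 2.8, 3.4, 4.0, 4.6, 5.2

mae = mean_absolute_error(y, y_pred)
mse = mean_squared_error(y, y_pred)
rmse = np.sqrt(mse)  # back in the units of y
r2 = r2_score(y, y_pred)

print(f"MAE  = {mae:.3f}")   # MAE  = 0.640
print(f"MSE  = {mse:.3f}")   # MSE  = 0.480
print(f"RMSE = {rmse:.3f}")  # RMSE = 0.693
print(f"R^2  = {r2:.3f}")    # R^2  = 0.600
```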


# 8. What is the purpose of the R-squared metric in regression analysis?
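  - The **R-squared (R²)** metric in regression analysis measures how well the independent variable(s) explain the variation in the dependent variable. It is the proportion of the total variation in the outcome that is accounted for by the model, computed as $R^2 = 1 - SS_{\text{res}} / SS_{\text{tot}}$. Here, $SS_{\text{res}}$ is the sum of squared residuals and $SS_{\text{tot}}$ is the total sum of squared deviations of Y from its mean. For a least-squares model with an intercept, evaluated on the data it was fitted to, R² ranges from 0 to 1. On new data, however, R² can be negative, which happens whenever a model fits worse than simply predicting the mean. An R² close to 1 indicates that the model explains most of the variability in the data. A value near 0 suggests that the model captures little of the relationship. In simple terms, R² assesses the **goodness of fit** of a regression model, showing how much of the variation in the dependent variable the independent variable(s) account for.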
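R² can be computed straight from its definition, $R^2 = 1 - SS_{\text{res}} / SS_{\text{tot}}$. This minimal NumPy sketch reuses the toy data and the coefficients printed in question 9, where a = 2.2 and b = 0.6. The helper name `r_squared` is illustrative. The last line applies the same formula to a deliberately poor line with a slope of the wrong sign. This shows that R² drops below 0 when a line fits worse than simply predicting the mean of Y.

```python
import numpy as np

def r_squared(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_pred) ** 2)      # variation left unexplained by the line
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total variation around the mean of y
    return 1.0 - ss_res / ss_tot

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Least-squares line from question 9: Y_hat = 2.2 + 0.6 X
print("least-squares line R^2:", round(r_squared(y, 2.2 + 0.6 * x), 3))  # 0.6

# A deliberately poor line does worse than predicting the mean, so R^2 < 0
print("poor line R^2:", round(r_squared(y, 5.8 - 0.6 * x), 3))  # -1.8
```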


# 9.  Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.
  - Fitting a **Simple Linear Regression** model with the **scikit-learn** library means estimating the best-fitting straight line $Y = a + bX$ for the relationship between an independent variable (X) and a dependent variable (Y). Here, **a** is the intercept and **b** is the slope. The slope shows how much the dependent variable changes for each one-unit increase in the independent variable. The intercept is the predicted value of Y when X is zero. Scikit-learn’s `LinearRegression` class estimates these parameters by ordinary least squares, minimizing the sum of squared differences between actual and predicted Y values. The feature matrix `X` must be two-dimensional, with one row per observation and one column per feature, even when there is only one predictor. After training with `fit(X, y)`, the slope is available in `model.coef_` and the intercept in `model.intercept_`. Because `model.coef_` is an array with one entry per feature, the code reads the slope as `model.coef_[0]`.


```python
# Import necessary libraries
from sklearn.linear_model import LinearRegression
import numpy as np

# Example data
# X = independent variable (must be 2D for sklearn)
# y = dependent variable
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Create and fit the model
model = LinearRegression()
model.fit(X, y)

# Print slope (coefficient) and intercept
print("Slope (b):", model.coef_[0])
print("Intercept (a):", model.intercept_)
```


```
Slope (b): 0.6
Intercept (a): 2.2
```


# 10. How do you interpret the coefficients in a simple linear regression model?
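 - In a **Simple Linear Regression** model, the coefficients describe the relationship between the independent variable (X) and the dependent variable (Y).
   - The **intercept (a)** is the expected value of Y when X is zero, the point where the regression line crosses the Y-axis. It is only meaningful in practice when X = 0 is a realistic value within or near the range of the data.
   - The **slope (b)** is the expected change in Y for each one-unit increase in X. If **b** is positive, Y tends to increase as X increases; if **b** is negative, Y tends to decrease.

   The sign of the slope gives the **direction** of the relationship. Its magnitude gives the size of the change, in units of Y per unit of X. The strength of the relationship is better judged by R² or the correlation coefficient, because the slope's magnitude depends on the units of measurement. The intercept provides the **baseline value** of Y. For example, in the scikit-learn example above, a slope of 0.6 means that each one-unit increase in X is associated with a 0.6 increase in predicted Y, starting from a baseline of 2.2 at X = 0. Together, these coefficients describe how Y is associated with changes in X, though they do not by themselves establish that X causes Y.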
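The following small sketch illustrates this interpretation, assuming scikit-learn and refitting the model from question 9. It shows two properties of the fitted line. The prediction at X = 0 equals the intercept. Raising X by one unit changes the prediction by exactly the slope, wherever you start. The starting values `1.0`, `2.5`, and `10.0` are arbitrary; `10.0` is an extrapolation beyond the data and is included only to show that the property holds everywhere on the line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])
model = LinearRegression().fit(X, y)

a, b = model.intercept_, model.coef_[0]

# Intercept: the predicted Y when X = 0
print("prediction at X = 0:", round(model.predict([[0]])[0], 4), "| intercept:", round(a, 4))

# Slope: change in predicted Y for a one-unit increase in X, from any starting point
for x0 in [1.0, 2.5, 10.0]:
    change = model.predict([[x0 + 1]])[0] - model.predict([[x0]])[0]
    print(f"X {x0} -> {x0 + 1}: predicted Y changes by {change:.4f} (slope = {b:.4f})")
```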
