            Assignment : Logistic Regression

#Question 1 : What is Simple Linear Regression (SLR)? Explain its purpose.
- Simple Linear Regression (SLR) is a statistical method used to examine the relationship between two variables:

One independent variable (X) - the predictor or input

One dependent variable (Y) - the outcome or response

SLR assumes that the relationship between X and Y can be represented by a straight line.

Mathematical Form

The simple linear regression model is written as:

Y = a + bX

Where:

Y = dependent variable

X = independent variable

a = intercept (value of Y when X = 0)

b = slope (rate of change in Y for a one-unit change in X)

Purpose of Simple Linear Regression

The main purposes of SLR are:

To understand the relationship between variables
It helps determine whether, and how strongly, changes in one variable are associated with changes in another.

To predict values
Once the relationship is known, SLR can be used to predict the value of the dependent variable for a given value of the independent variable.

To quantify the effect of the independent variable
The slope shows how much the dependent variable changes when the independent variable increases by one unit.

To analyze trends
SLR is useful for identifying and explaining trends in data, such as sales growth over time or the effect of study hours on exam scores.

# Question 2: What are the key assumptions of Simple Linear Regression?
- Key Assumptions of Simple Linear Regression (SLR)

Simple Linear Regression is based on the following important assumptions:

> Linearity
There is a linear relationship between the independent variable (X) and the dependent variable (Y). This means each one-unit change in X is associated with the same constant change in Y, on average, across the whole range of X.

> Independence of Errors
The residuals (errors) are independent of each other. The error for one observation does not affect the error for another.

> Homoscedasticity
The variance of the errors is constant for all values of the independent variable. In other words, the spread of residuals remains the same across all levels of X.

> Normality of Errors
The residuals are normally distributed. This assumption matters most for hypothesis testing and confidence intervals.

> No Significant Outliers
There should be no extreme values that unduly influence the regression results, as outliers can distort the regression line.
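
Some of the assumptions above can be checked roughly from the residuals of a fitted line. The sketch below is only an illustration: the data are hypothetical, and the checks (mean residual, residual spread in the lower and upper halves of X, and the Durbin-Watson statistic) are informal screens rather than formal tests.

```python
import numpy as np

# Hypothetical data, invented purely for illustration (x is sorted).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

# Fit y = a + b*x by least squares (np.polyfit returns the slope first).
b, a = np.polyfit(x, y, deg=1)
residuals = y - (a + b * x)

# With an intercept in the model, least-squares residuals average to about zero.
print("Mean residual:", residuals.mean())

# Homoscedasticity (rough check): compare residual spread for low and high x.
half = len(x) // 2
print("Residual std, low x: ", residuals[:half].std())
print("Residual std, high x:", residuals[half:].std())

# Independence (rough check): Durbin-Watson statistic, which lies between 0 and 4.
# Values near 2 suggest little correlation between consecutive residuals.
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print("Durbin-Watson:", dw)
```

In practice, a plot of residuals against X is usually the first thing to look at; the numbers above only summarise what such a plot would show.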

#Question 3: Write the mathematical equation for a simple linear regression model and explain each term.
- Mathematical Equation of Simple Linear Regression (SLR)

The mathematical equation of a simple linear regression model is:

Y = a + bX + ε

Explanation of Each Term

Y (Dependent Variable):
The variable we want to predict or explain.

X (Independent Variable):
The variable used to predict Y.

a (Intercept):
The value of Y when X = 0. It represents where the regression line crosses the Y-axis.

b (Slope or Regression Coefficient):
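Shows the change in Y for a one-unit increase in X. Its sign gives the direction of the relationship and its size gives the rate of change. (The strength of the relationship is measured by the correlation or R-squared, not by the slope alone.)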

ε (Error Term):
Represents random variation or unexplained factors affecting Y that are not captured by X.
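
To see how the terms fit together, the sketch below simulates data from Y = a + bX + ε and then estimates a and b from the simulated data. The "true" values (a = 1, b = 2), the noise level and the sample size are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Assumed "true" parameters, chosen only for this illustration.
true_a, true_b = 1.0, 2.0

x = np.linspace(0, 10, 200)
eps = rng.normal(loc=0.0, scale=1.0, size=x.size)  # error term ε
y = true_a + true_b * x + eps                      # Y = a + bX + ε

# Estimate the intercept and slope from the noisy data.
b_hat, a_hat = np.polyfit(x, y, deg=1)

print("Estimated intercept:", a_hat)
print("Estimated slope:", b_hat)
```

The estimates should land close to the assumed values but not exactly on them, because ε adds random scatter around the line.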

#Question 4: Provide a real-world example where simple linear regression can be applied.
- Real-World Example of Simple Linear Regression

A common real-world application of Simple Linear Regression is in education.

Example: Study Time and Exam Scores

Independent Variable (X): Number of hours a student studies

Dependent Variable (Y): Exam score

Simple linear regression can be used to analyze how the number of hours studied affects exam performance. By fitting a regression line, educators can:

Understand the relationship between study time and exam scores

Predict a student’s expected score based on hours studied

Identify whether increased study time leads to better academic performance

Other Real-World Examples

Predicting sales based on advertising expenditure

Estimating house prices based on house size

Forecasting electricity consumption based on temperature
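
The study-time example can be sketched in a few lines. The hours and scores below are hypothetical numbers made up for illustration, and `predict_score` is a helper name introduced here, not part of any library.

```python
import numpy as np

# Hypothetical data: hours studied and exam scores for eight students.
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
scores = np.array([52, 55, 61, 64, 70, 74, 79, 83], dtype=float)

# Fit scores = intercept + slope * hours (np.polyfit returns the slope first).
slope, intercept = np.polyfit(hours, scores, deg=1)

def predict_score(h):
    """Predicted exam score for h hours of study, using the fitted line."""
    return intercept + slope * h

print("Extra points per additional hour:", slope)
print("Predicted score for 5 hours:", predict_score(5))
```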

#Question 5: What is the method of least squares in linear regression?
- Method of Least Squares in Linear Regression

The method of least squares is a statistical technique used in linear regression to find the best-fitting straight line through a set of data points.

Explanation

In simple linear regression, the best-fitting line is the one that minimizes the sum of the squared differences between the observed values and the predicted values.

These differences are called residuals.

Residual = Observed value - Predicted value


The method of least squares chooses the regression line such that:

∑ (Residual)²

is as small as possible.

Purpose of the Method of Least Squares

To find accurate estimates of the regression coefficients (intercept and slope).

To ensure the best fit between the regression line and the observed data.

To reduce prediction error by minimizing overall deviation from actual values.

Result

Using this method, we obtain:

a (Intercept): Best estimate of Y when X = 0

b (Slope): Best estimate of the change in Y for a one-unit change in X
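
For simple linear regression, the least-squares estimates have a standard closed form: b = ∑(Xᵢ - X̄)(Yᵢ - Ȳ) / ∑(Xᵢ - X̄)² and a = Ȳ - bX̄. The sketch below applies these formulas to hypothetical data and checks that nudging the line away from the solution increases the sum of squared residuals. The function names are illustrative.

```python
import numpy as np

# Hypothetical data, invented for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 4.1, 5.9, 8.3, 9.8])

def least_squares_fit(x, y):
    """Return (a, b) from the closed-form least-squares formulas."""
    x_mean, y_mean = x.mean(), y.mean()
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    a = y_mean - b * x_mean
    return a, b

def sum_squared_residuals(a, b, x, y):
    """Sum of squared differences between observed and predicted values."""
    return np.sum((y - (a + b * x)) ** 2)

a, b = least_squares_fit(x, y)
best = sum_squared_residuals(a, b, x, y)

print("Intercept:", a, "Slope:", b)
# Any other line gives a larger sum of squared residuals.
print("Shifted intercept is worse:", best < sum_squared_residuals(a + 0.1, b, x, y))
print("Changed slope is worse:    ", best < sum_squared_residuals(a, b - 0.05, x, y))
```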

#Question 6: What is Logistic Regression? How does it differ from Linear Regression?
- Logistic Regression

Logistic Regression is a statistical method used to model the relationship between one or more independent variables and a binary (categorical) dependent variable, such as Yes/No, Pass/Fail, or 0/1. Instead of predicting a continuous value, it predicts the probability that an outcome belongs to a particular class. This probability is calculated using a logistic (sigmoid) function, which ensures the output always lies between 0 and 1.

Difference Between Logistic Regression and Linear Regression

Linear Regression is used when the dependent variable is continuous (for example, marks, height, or price) and it predicts values using a straight-line equation. Logistic Regression is used when the dependent variable is categorical, and it predicts probabilities rather than exact values.

Another key difference is the nature of their outputs. Linear regression can produce any real-number value, while logistic regression outputs probabilities that are bounded between 0 and 1. Linear regression assumes a linear relationship between X and Y and normally distributed errors, whereas logistic regression passes a linear combination of the inputs through a non-linear S-shaped (sigmoid) curve and assumes the outcome follows a binomial (Bernoulli) distribution.
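
The difference in outputs can be shown with the sigmoid function, 1 / (1 + e^(-z)). In the sketch below the coefficients a and b are hypothetical values picked for illustration, not fitted from data.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration.
a, b = -4.0, 1.0

def linear_output(x):
    """Linear regression style output: can be any real number."""
    return a + b * x

def predicted_probability(x):
    """Logistic regression style output: a probability between 0 and 1."""
    return sigmoid(a + b * x)

for x in [0, 2, 4, 6, 8]:
    print(x, linear_output(x), round(predicted_probability(x), 3))
```

The linear output runs from -4 to 4 over these inputs, while the probability stays inside (0, 1) and passes through 0.5 where a + bX = 0.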

#Question 7: Name and briefly describe three common evaluation metrics for regression models
- Common Evaluation Metrics for Regression Models

Mean Absolute Error (MAE)
Mean Absolute Error measures the average of the absolute differences between the actual values and the predicted values. It shows how far predictions are from the true values on average, without considering the direction of the error.

Mean Squared Error (MSE)
Mean Squared Error calculates the average of the squared differences between actual and predicted values. Squaring the errors gives more weight to larger errors, making it useful when large mistakes are particularly undesirable.

R-squared (Coefficient of Determination)
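R-squared indicates how well the regression model explains the variation in the dependent variable. For a least-squares model with an intercept, evaluated on the data it was fitted to, its value ranges from 0 to 1, where a higher value means the model explains a greater proportion of the variability in the data. On new data it can be negative if the model predicts worse than simply using the mean.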
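
The three metrics can be computed directly from their definitions. The actual and predicted values below are hypothetical, and the variable names are illustrative.

```python
import numpy as np

# Hypothetical actual and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 9.5])

errors = y_true - y_pred

# Mean Absolute Error: average size of the errors, ignoring their sign.
mae = np.mean(np.abs(errors))

# Mean Squared Error: average of squared errors, so large errors count more.
mse = np.mean(errors ** 2)

# R-squared: 1 - (unexplained variation / total variation around the mean).
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("MAE:", mae)
print("MSE:", mse)
print("R-squared:", r2)
```

scikit-learn provides the same quantities as `mean_absolute_error`, `mean_squared_error` and `r2_score` in `sklearn.metrics`.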

#Question 8: What is the purpose of the R-squared metric in regression analysis?
- Purpose of the R-squared Metric in Regression Analysis

The R-squared (coefficient of determination) metric is used to measure how well a regression model explains the variation in the dependent variable.

Its main purpose is to show the proportion of the total variability in the dependent variable that is explained by the independent variable(s) in the model. An R-squared value closer to 1 indicates that the model fits the data well, while a value closer to 0 indicates a poor fit.

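R-squared also helps in comparing different regression models for the same dataset. A model with a higher R-squared value generally explains the data better, although adding more predictors never lowers R-squared on the training data, so a higher value alone does not prove that a model is better.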
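
Two reference points help when reading R-squared: a model that always predicts the mean of Y scores exactly 0, and in simple linear regression with an intercept, R-squared equals the square of the Pearson correlation between X and Y. The sketch below checks both on hypothetical data; `r_squared` is a helper defined here for illustration.

```python
import numpy as np

def r_squared(y, y_pred):
    """R-squared = 1 - (residual sum of squares / total sum of squares)."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Hypothetical data, invented for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 4.3, 5.9, 8.4, 9.7, 12.5])

# Baseline: always predict the mean of y.
baseline_r2 = r_squared(y, np.full_like(y, y.mean()))

# Fitted line: y_hat = a + b*x.
b, a = np.polyfit(x, y, deg=1)
model_r2 = r_squared(y, a + b * x)

# Pearson correlation between x and y.
r = np.corrcoef(x, y)[0, 1]

print("Baseline R-squared:", baseline_r2)
print("Model R-squared:", model_r2)
print("Squared correlation:", r ** 2)
```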

#Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.



In [1]:
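import numpy as np
from sklearn.linear_model import LinearRegression

# scikit-learn expects X as a 2D array of shape (n_samples, n_features)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 6, 8, 10])

# Fit the model y = intercept + slope * X by ordinary least squares
model = LinearRegression()
model.fit(X, y)

# coef_ holds one slope per feature; intercept_ is the constant term
print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)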


Slope: 2.0
Intercept: 0.0


#Question 10: How do you interpret the coefficients in a simple linear regression model?
- Interpretation of Coefficients in a Simple Linear Regression Model

In a simple linear regression model, the coefficients describe the relationship between the independent variable and the dependent variable.

The intercept represents the expected value of the dependent variable when the independent variable is equal to zero. It indicates where the regression line crosses the Y-axis. In some real-world situations, the intercept may not have a practical meaning, but it is still important for defining the regression line.

The slope (regression coefficient) represents the average change in the dependent variable for a one-unit increase in the independent variable. If the slope is positive, it means that as the independent variable increases, the dependent variable also increases. If the slope is negative, it indicates an inverse relationship, where the dependent variable decreases as the independent variable increases.
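
A small numeric illustration of this reading, assuming a hypothetical fitted model for exam scores with intercept 50 and slope 4 (these values are made up, not estimated from data):

```python
# Hypothetical fitted coefficients, chosen only for illustration.
intercept = 50.0  # expected score with zero hours of study
slope = 4.0       # expected change in score per additional hour

def predict(hours):
    """Predicted score from the fitted line: intercept + slope * hours."""
    return intercept + slope * hours

# The prediction at X = 0 is the intercept.
print(predict(0))               # 50.0

# A one-unit increase in X changes the prediction by exactly the slope.
print(predict(3) - predict(2))  # 4.0
```

With a negative slope, the same one-unit difference would come out negative, meaning the predicted value falls as X rises.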