
# Linear Regression

Linear Regression is a statistical method used to model the relationship between a dependent variable (also called the target or outcome variable) and one or more independent variables (also known as predictors or features). The goal of linear regression is to find the best-fitting linear equation that can predict the dependent variable based on the independent variables.

**Dependent Variable (Y):** The variable that we are trying to predict or explain.

**Independent Variable(s) (X):** The variable(s) that are used to predict the dependent variable.

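**Linear Relationship:** The relationship between the dependent and independent variables is assumed to be linear: the expected value of Y changes by a constant amount for each one-unit change in X. With one independent variable this is a straight line; with several it is a plane or hyperplane.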

**Types of Linear Regression**

Simple Linear Regression: Involves a single independent variable.

Multiple Linear Regression: Involves two or more independent variables.
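
Both types share the same form. As a notational sketch (the symbols $\beta_j$, $\varepsilon_i$, $n$, and $p$ are introduced here for illustration and are not used elsewhere in this notebook), the model for observation $i$ with $p$ independent variables can be written as:

$$
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i
$$

Ordinary least squares chooses the coefficients that minimize the sum of squared residuals over the $n$ observations:

$$
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2
$$

Simple linear regression is the special case $p = 1$.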

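**Dataset Used:** Boston Housing dataset. The outcome variable is medv, the median value of owner-occupied homes, measured in thousands of dollars.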

In [4]:
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset from GitHub
url = 'https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv'
data = pd.read_csv(url)

# Display the first few rows of the dataset
print(data.head())

# Define the predictors and the outcome variable
X = data.drop(columns=['medv'])
y = data['medv']

# Add a constant to the model (for the intercept)
X_const = sm.add_constant(X)

      crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio  \
0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296     15.3   
1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8   
2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8   
3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7   
4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7   

        b  lstat  medv  
0  396.90   4.98  24.0  
1  396.90   9.14  21.6  
2  392.83   4.03  34.7  
3  394.63   2.94  33.4  
4  396.90   5.33  36.2  


## Explanatory Linear Regression

An explanatory model is used to offer an explanation of a past event. Specifically, explanatory models are used when we want to:

Explain why the outcome variable is high for some observations and low for others, that is, why its value varies across observations.

Assess the relationship between the outcome variable and a factor (independent variable). This is done through hypothesis testing:

Null hypothesis (H0): There is no association between the independent variable and the outcome variable.

Alternative hypothesis (H1): There is an association between the independent variable and the outcome variable.

A p-value below the chosen significance level (0.05 in this notebook) leads us to reject H0 for that variable.
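
The test can be sketched by hand. The example below is a minimal NumPy illustration on synthetic data, not on the Boston Housing data; the variables `x1` and `x2`, the true coefficients, and the sample size are all assumptions made for the demonstration. It computes each coefficient's t-statistic (estimate divided by standard error) and a two-sided p-value using a normal approximation, which is only reasonable for large samples. statsmodels reports p-values based on the t distribution instead.

```python
import math
import numpy as np

# Synthetic data (hypothetical): x1 is related to y, x2 is pure noise.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

# Design matrix with an explicit intercept column.
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares estimates.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance and coefficient standard errors.
residuals = y - X @ beta
dof = n - X.shape[1]
sigma2 = residuals @ residuals / dof
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# t-statistics and two-sided p-values (normal approximation).
t_stats = beta / std_err
p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])

for name, b, p in zip(["const", "x1", "x2"], beta, p_values):
    decision = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{name}: coef={b:.3f}, p={p:.4f} -> {decision}")
```

Because `x1` was built into `y` with a large coefficient, its p-value falls far below 0.05. The result for `x2` depends on the random draw, which is the point of the test: an unrelated variable is still flagged about 5% of the time at this threshold.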

In [5]:
# Fit the explanatory linear regression model
model_explanatory = sm.OLS(y, X_const).fit()

# Summary of the explanatory model
print(model_explanatory.summary())

                            OLS Regression Results                            
Dep. Variable:                   medv   R-squared:                       0.741
Model:                            OLS   Adj. R-squared:                  0.734
Method:                 Least Squares   F-statistic:                     108.1
Date:                Wed, 29 May 2024   Prob (F-statistic):          6.72e-135
Time:                        01:44:33   Log-Likelihood:                -1498.8
No. Observations:                 506   AIC:                             3026.
Df Residuals:                     492   BIC:                             3085.
Df Model:                          13                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         36.4595      5.103      7.144      0.0

**Interpretation:**

_R-squared:_

0.741, indicating that 74.1% of the variance in the dependent variable (medv) is explained by the independent variables in the model.
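
As a rough illustration of where these figures come from, the sketch below defines R-squared as one minus the ratio of residual to total sum of squares, and then applies the standard adjusted R-squared formula to the values printed in the summary above (506 observations, 13 predictors, R-squared of 0.741). The function names are hypothetical helpers, not part of statsmodels.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R^2 for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Perfect predictions give 1.0; always predicting the mean gives 0.0.
print(r_squared([1, 2, 3], [1, 2, 3]))  # 1.0
print(r_squared([1, 2, 3], [2, 2, 2]))  # 0.0

# Values taken from the OLS summary above.
adj = adjusted_r_squared(0.741, n_obs=506, n_predictors=13)
print(round(adj, 3))  # 0.734
```

The result agrees with the Adj. R-squared of 0.734 in the summary table.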

*Significant Predictors:*

The predictors crim, zn, chas, nox, rm, dis, rad, tax, ptratio, b, and lstat have p-values below 0.05, so they are statistically significant predictors of the median value of owner-occupied homes at the 5% level. The remaining two predictors, indus and age, are not.

*Coefficients:*

Each coefficient represents the expected change in the dependent variable for a one-unit change in the predictor variable, holding all other variables constant. For example, the coefficient for rm is 3.8099. Because medv is measured in thousands of dollars, an increase of one in the average number of rooms per dwelling is associated with an average increase of about $3,810 in the median home value.
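
The "holding all other variables constant" reading can be checked numerically. The following is a small sketch on synthetic data; the variables `rooms` and `other` and their true coefficients are hypothetical and are not the Boston Housing columns. It shows that raising one predictor by one unit while fixing the other changes the prediction by exactly that predictor's coefficient, and then converts the rm coefficient quoted above into dollars.

```python
import numpy as np

# Synthetic data (hypothetical predictors, not the real dataset).
rng = np.random.default_rng(1)
n = 100
rooms = rng.uniform(4, 9, size=n)
other = rng.normal(size=n)
price = 5.0 + 3.8 * rooms - 1.2 * other + rng.normal(scale=0.5, size=n)

# Fit by ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), rooms, other])
beta = np.linalg.lstsq(X, price, rcond=None)[0]


def predict(rooms_value, other_value):
    """Prediction from the fitted linear equation."""
    return beta[0] + beta[1] * rooms_value + beta[2] * other_value


# One extra room, everything else unchanged.
diff = predict(7.0, 0.3) - predict(6.0, 0.3)
print(np.isclose(diff, beta[1]))  # True

# Unit conversion for the coefficient reported in this notebook:
# medv is in thousands of dollars, so 3.8099 corresponds to about $3,810.
dollars = 3.8099 * 1000
print(f"${dollars:,.0f}")  # $3,810
```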

## Predictive Linear Regression

In [3]:
# Split the data into training and testing sets for the predictive model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the predictive linear regression model
model_predictive = LinearRegression()
model_predictive.fit(X_train, y_train)

# Make predictions
y_pred = model_predictive.predict(X_test)

# Performance metrics for the predictive model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')

Mean Squared Error: 24.291119474973478
R-squared: 0.6687594935356326


**Interpretation:**

*Mean Squared Error (MSE):*

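24.291. This is the average squared difference between the observed outcomes and the model's predictions on the test set. Its square root (RMSE) is about 4.93, so a typical prediction error is roughly $4,900, since medv is measured in thousands of dollars.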
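
For reference, the calculation behind this metric can be written out in a few lines. This is a plain-Python sketch with hypothetical helper names and a two-point toy example; the only value taken from the notebook is the MSE printed above.

```python
import math


def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)


# Toy example: errors of 1 and -2 give (1 + 4) / 2.
print(mean_squared_error_manual([3, 5], [2, 7]))  # 2.5

# RMSE for the test-set MSE reported above, in the units of medv ($1000s).
rmse = math.sqrt(24.291119474973478)
print(round(rmse, 2))  # 4.93
```

RMSE is often easier to interpret than MSE because it is on the same scale as the outcome variable.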

*R-squared:*

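0.669, indicating that 66.9% of the variance in the dependent variable (medv) in the test data is explained by the model. This is lower than the 0.741 reported by the explanatory model, which was computed on the same data the model was fitted to.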
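
The gap between in-sample and held-out R-squared can be explored with a small experiment. The sketch below mirrors the workflow above (80/20 split, fit on the training part, score on the test part) using only NumPy and synthetic data; the data-generating coefficients and the helper names `add_intercept` and `r_squared` are assumptions for illustration, and the code is not a reimplementation of scikit-learn.

```python
import numpy as np

# Synthetic data (hypothetical): three predictors plus noise.
rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 3))
y = 4.0 + X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=n)

# Shuffle the row indices and hold out 20% for testing.
indices = rng.permutation(n)
n_test = n // 5
test_idx, train_idx = indices[:n_test], indices[n_test:]


def add_intercept(features):
    """Prepend a column of ones so the model has an intercept."""
    return np.column_stack([np.ones(len(features)), features])


def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Fit on the training rows only.
beta, *_ = np.linalg.lstsq(add_intercept(X[train_idx]), y[train_idx], rcond=None)

# Score on both parts; only the test score reflects unseen data.
r2_train = r_squared(y[train_idx], add_intercept(X[train_idx]) @ beta)
r2_test = r_squared(y[test_idx], add_intercept(X[test_idx]) @ beta)
print(f"train R-squared: {r2_train:.3f}")
print(f"test R-squared:  {r2_test:.3f}")
```

On any single split the test score may come out above or below the training score; averaged over many splits, the held-out score tends to be the lower and more honest of the two.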

## Conclusion

Both the explanatory and predictive models provide insights into the relationship between the predictors and the target variable. The explanatory model helps identify significant predictors and their impacts, while the predictive model evaluates the performance of predictions on new data.