<a href="https://colab.research.google.com/github/ReyhaneTaj/ML_Algorithms/blob/main/LinearRegression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Linear Regression: Concept and Implementation

## What is Linear Regression?
Linear Regression is a supervised machine learning model used for regression tasks. It predicts the value of a target variable based on the linear relationship between the target and one or more predictor variables.

- **Model Structure**: Linear regression fits a linear equation to the observed data.
- **Equation**: For a single predictor, the equation is \( y = \beta_0 + \beta_1 x + \epsilon \), where:
  - \( y \) is the target variable.
  - \( x \) is the predictor variable.
  - \( \beta_0 \) is the intercept.
  - \( \beta_1 \) is the slope (coefficient).
  - \( \epsilon \) is the error term.
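
As a rough illustration of the equation above, the sketch below fits a line to synthetic data with NumPy. The data-generating values (intercept 2, slope 3, noise scale 0.5) are assumptions chosen purely for the example.

```python
import numpy as np

# Hypothetical data: y = 2 + 3x plus Gaussian noise (values chosen for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=x.size)

# With deg=1, np.polyfit returns the least-squares [slope, intercept].
beta_1, beta_0 = np.polyfit(x, y, deg=1)

# Fitted values and residuals (the sample counterpart of the error term).
y_hat = beta_0 + beta_1 * x
epsilon_hat = y - y_hat

print(f"intercept: {beta_0:.3f}, slope: {beta_1:.3f}")
```

The estimated intercept and slope should land close to the values used to generate the data, and the residuals average out to approximately zero because the fit includes an intercept.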

## Types of Linear Regression
1. **Simple Linear Regression**: Involves a single predictor variable.
2. **Multiple Linear Regression**: Involves multiple predictor variables.
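
The difference between the two types shows up mainly in the shape of the input. The sketch below uses scikit-learn's `LinearRegression` on made-up, noiseless data; the coefficients 1.0, 2.0, and -0.5 are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noiseless data with two predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]

# Simple linear regression: one predictor column, one coefficient.
simple = LinearRegression().fit(X[:, [0]], y)

# Multiple linear regression: all predictor columns, one coefficient each.
multiple = LinearRegression().fit(X, y)

print("simple coef:  ", simple.coef_)
print("multiple coef:", multiple.coef_, "intercept:", multiple.intercept_)
```

Because the data here contain no noise, the multiple regression recovers the generating coefficients, while the simple regression can only approximate the target with the single predictor it sees.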

## Assumptions of Linear Regression
1. **Linearity**: The relationship between the predictors and the target is linear.
2. **Independence**: Observations are independent of each other.
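3. **Homoscedasticity**: The errors have constant variance across all values of the predictors.
4. **Normality**: The errors are normally distributed. This matters mainly for confidence intervals and hypothesis tests on the coefficients, not for fitting the line itself; in practice it is checked on the residuals.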
5. **No Multicollinearity**: Predictors are not highly correlated with each other.
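
One common way to check the multicollinearity assumption is the variance inflation factor (VIF). The sketch below is a minimal NumPy version under stated assumptions: the helper name `variance_inflation_factors` and the synthetic columns are hypothetical, and the threshold at which a VIF counts as "too high" (values around 5 or 10 are often quoted) is a rule of thumb rather than a fixed standard.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X,
    where R_j^2 comes from regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add an intercept column
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Hypothetical predictors: c is almost a copy of a, b is unrelated.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)

vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(vifs)
```

In this example the near-duplicate columns `a` and `c` receive very large VIFs, while the unrelated column `b` stays close to 1.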

## How Linear Regression Works
1. **Model Training**: The model learns the best-fitting line by ordinary least squares, that is, by choosing the coefficients that minimize the sum of squared residuals (the differences between the actual and predicted values).
2. **Prediction**: New data is passed through the learned linear equation to make predictions.
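
The two steps above can be sketched with NumPy alone. This is a minimal illustration on synthetic data, not a description of what any particular library does internally; `true_beta`, the noise scale, and `X_new` are made up for the example.

```python
import numpy as np

# Hypothetical data: intercept 4.0 and three slopes, plus small noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_beta = np.array([4.0, 1.5, -2.0, 0.5])  # [intercept, slope_1, slope_2, slope_3]
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=200)

# Training: prepend a column of ones for the intercept, then solve the
# least-squares problem min ||A @ beta - y||^2.
A = np.column_stack([np.ones(len(X)), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# Prediction: pass new rows through the learned linear equation.
X_new = np.array([[0.0, 0.0, 0.0],
                  [1.0, 1.0, 1.0]])
y_new = beta_hat[0] + X_new @ beta_hat[1:]

print("estimated coefficients:", beta_hat)
print("predictions:", y_new)
```

`np.linalg.lstsq` is used here instead of explicitly inverting \( X^T X \) because it is numerically more stable when predictors are correlated.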

## Advantages
- **Simplicity**: Easy to understand and implement.
- **Interpretability**: Coefficients provide clear insights into the relationship between predictors and the target variable.
- **Computational Efficiency**: Fitting has a closed-form least-squares solution, so training is fast and needs few computational resources.
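
To make the interpretability point concrete, the sketch below fits a model on a made-up housing-style table and reads off the coefficients. The column names `rooms` and `age_years`, and the numbers used to generate `price`, are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: price rises with rooms and falls with age.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "rooms": rng.uniform(2, 8, 300),
    "age_years": rng.uniform(0, 50, 300),
})
price = 50.0 + 20.0 * df["rooms"] - 0.8 * df["age_years"] + rng.normal(scale=2.0, size=300)

model = LinearRegression().fit(df, price)

# Each coefficient is the expected change in price for a one-unit increase
# in that predictor, with the other predictor held fixed.
coefficients = pd.Series(model.coef_, index=df.columns)
print(coefficients)
print("intercept:", model.intercept_)
```

Reading the output, a positive coefficient on `rooms` and a negative one on `age_years` match how the data were generated, which is the kind of direct reading that more complex models do not offer.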

## Disadvantages
- **Assumptions**: Assumes a linear relationship between predictors and the target, which may not hold in all cases.
- **Outliers**: Sensitive to outliers, which can disproportionately influence the model.
- **Multicollinearity**: Can be problematic if predictors are highly correlated.
- **Extrapolation**: Predictions can be unreliable for inputs outside the range of the training data, where nothing confirms that the fitted line still holds.
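
The sensitivity to outliers can be seen in a small experiment. The sketch below is illustrative only: the ten points and the size of the corruption (adding 50 to one value) are arbitrary choices.

```python
import numpy as np

# Ten points lying exactly on y = 1 + 2x.
x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x

slope_clean, intercept_clean = np.polyfit(x, y, deg=1)

# Corrupt a single observation at the edge of the x range.
y_outlier = y.copy()
y_outlier[-1] += 50.0
slope_outlier, intercept_outlier = np.polyfit(x, y_outlier, deg=1)

print(f"clean fit:        slope={slope_clean:.3f}, intercept={intercept_clean:.3f}")
print(f"with one outlier: slope={slope_outlier:.3f}, intercept={intercept_outlier:.3f}")
```

Because the loss squares each residual, one extreme point at the edge of the range pulls the whole line toward it, and both the slope and the intercept shift noticeably.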

## Implementation in Google Colab
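Let's implement a Linear Regression model using the California housing dataset, available through scikit-learn's `fetch_california_housing`.

### Code Implementation



In [None]:
# Step 1: Import Libraries
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
import seaborn as sns

# Step 2: Load Dataset
# load_boston is not available in scikit-learn 1.2 and later.
# fetch_california_housing downloads the data on first use and caches it locally.
data = fetch_california_housing()
X = data.data      # feature matrix, shape (n_samples, n_features)
y = data.target    # median house value per district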

# Step 3: Split Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Train Linear Regression Model
model = LinearRegression()
model.fit(X_train, y_train)

# Step 5: Make Predictions
y_pred = model.predict(X_test)

# Step 6: Evaluate the Model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R^2 Score: {r2}")

# Step 7: Visualize the Results
# Points on the red diagonal (predicted == actual) are perfect predictions.
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue')
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color='red', lw=2)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('Actual vs Predicted')
plt.show()

# Residual Plot: an even scatter around zero is consistent with linearity and homoscedasticity
residuals = y_test - y_pred
plt.figure(figsize=(10, 6))
plt.scatter(y_pred, residuals, color='blue')
plt.axhline(y=0, color='red', linestyle='--')
plt.xlabel('Predicted')
plt.ylabel('Residuals')
plt.title('Residual Plot')
plt.show()

# Distribution of Residuals: a roughly bell-shaped histogram is consistent with the normality assumption
plt.figure(figsize=(10, 6))
sns.histplot(residuals, kde=True, color='blue')
plt.xlabel('Residuals')
plt.title('Distribution of Residuals')
plt.show()
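
To show what the metrics in Step 6 measure, the sketch below computes them by hand with NumPy. The helper name `regression_metrics` and the four sample values are hypothetical; RMSE is included as an assumption that a metric in the target's own units is useful, since MSE is in squared units.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R^2) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)                    # average squared error
    rmse = np.sqrt(mse)                              # same units as the target
    ss_res = np.sum(residuals ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, r2

# Tiny made-up example.
mse, rmse, r2 = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 9.0])
print(f"MSE={mse:.4f}, RMSE={rmse:.4f}, R^2={r2:.4f}")
```

An \( R^2 \) of 1 means the predictions match the targets exactly, while 0 means the model does no better than always predicting the mean of the targets.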

## Comparison with Other Models
- **Accuracy**: Linear regression can be less accurate than more complex models, especially when the relationship between the predictors and the target is not linear.
- **Complexity**: Linear regression is simpler than models such as Random Forests or Neural Networks.
- **Interpretability**: Linear regression is highly interpretable: each coefficient gives the expected change in the target for a one-unit change in a predictor, with the other predictors held fixed.

## Conclusion
Linear Regression is a fundamental machine learning algorithm for predicting continuous values. Its simplicity and interpretability make it a good starting point for regression problems, although its linearity assumption and sensitivity to outliers limit how well it fits some data.