# Linear Regression and Implementation

## Definition

Linear regression is a statistical method that models the relationship between a dependent variable (often denoted as y) and one or more independent variables (denoted as 𝑋). With one independent variable it is called simple linear regression; with two or more it is called multiple linear regression.

Linear regression assumes that the relationship between the dependent and independent variables is linear, meaning it can be described by a straight line (or a hyperplane in higher dimensions). The objective is to find the line or hyperplane that best fits the data, that is, the one that minimizes the differences between the predicted and observed values.
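The description above can be written compactly in standard notation. This is a sketch rather than a full derivation; the symbols $\varepsilon$ (error term), $p$ (number of independent variables), and $n$ (number of observations) are notation assumed here for illustration.

$$
y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon
$$

$$
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2
$$

With $p = 1$ the first equation reduces to simple linear regression, $y = \beta_0 + \beta_1 x + \varepsilon$, where $\beta_0$ is the intercept and $\beta_1$ is the slope.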

## History

The origins of linear regression date back to the 1800s. The method of least squares was published by Adrien-Marie Legendre in 1805 and by Carl Friedrich Gauss in 1809. Later in the century, Sir Francis Galton, a British statistician, introduced the term "regression" in his study of the relationship between parents’ and children’s heights. Karl Pearson then formalized the mathematics behind the technique, which was further developed by R.A. Fisher and others into a core component of modern statistics.

## Key Concepts in Linear Regression

1. **Simple Linear Regression:** In simple linear regression, there is one independent variable (predictor) and one dependent variable (response), and the relationship between them is linear.

![Simple Linear Regression](attachment:image.png)

2. **Multiple Linear Regression:** In multiple linear regression, there are several independent variables (predictors) and one dependent variable (response), and the relationship between the independent variables and the dependent variable is linear.

![Multiple Linear Regression](attachment:image.png)

3. **Ordinary Least Squares (OLS):** Linear regression usually estimates its parameters (in the simple case, the intercept β0 and the slope β1) by minimizing the sum of squared residuals, a method known as Ordinary Least Squares (OLS). A residual is the difference between an actual value of the dependent variable and the value the line predicts for it, so OLS picks the line for which the sum of these squared differences is smallest.

![Ordinary Least Squares](attachment:image.png)
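For simple linear regression, the OLS estimates have a well-known closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below applies that formula with NumPy to the six data points used in the simple linear regression example later in this notebook. The variable names (`beta0`, `beta1`, `ssr`) are chosen here for illustration.

```python
import numpy as np

# Same six points as the simple linear regression example in this notebook
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 4, 5, 4, 5, 7], dtype=float)

x_mean, y_mean = x.mean(), y.mean()

# Closed-form OLS estimates for one predictor
beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
beta0 = y_mean - beta1 * x_mean                                          # intercept

# Sum of squared residuals: the quantity OLS minimizes
residuals = y - (beta0 + beta1 * x)
ssr = np.sum(residuals ** 2)

print(f"Intercept (beta0): {beta0:.4f}")
print(f"Slope (beta1): {beta1:.4f}")
print(f"Sum of squared residuals: {ssr:.4f}")
```

On this data the slope comes out at about 0.77 and the intercept at about 1.8. Because the sketch fits all six points rather than a training split, its numbers need not match a model fitted on a subset of the data.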

![Metrics for Evaluation](attachment:image.png)
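The contents of the figure are not reproduced here. Assuming it covers the usual regression metrics, the sketch below computes mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and R-squared directly from their definitions. The `y_true` and `y_pred` values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical observed and predicted values, for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 10.0])

errors = y_true - y_pred

mse = np.mean(errors ** 2)        # average squared error
rmse = np.sqrt(mse)               # same units as the target
mae = np.mean(np.abs(errors))     # average absolute error

# R-squared: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MSE: {mse}, RMSE: {rmse:.4f}, MAE: {mae}, R-squared: {r2}")
```

MSE and R-squared computed this way correspond to what `mean_squared_error` and `r2_score` from scikit-learn return for the same inputs.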

![Advantages and Disadvantages](attachment:image.png)

### Implementation of Simple Linear Regression

```python
# Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Example dataset
data = {
    'X': [1, 2, 3, 4, 5, 6],
    'Y': [2, 4, 5, 4, 5, 7]
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Split data into features (X) and target (y)
X = df[['X']]  # Feature (2D, as scikit-learn expects)
y = df['Y']    # Target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a LinearRegression object
model = LinearRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Predict on the test data
y_pred = model.predict(X_test)

# Calculate the mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

# Get the model's slope and intercept
print(f"Slope (Coefficient): {model.coef_[0]}")
print(f"Intercept: {model.intercept_}")
```


### Implementation of Multiple Linear Regression

```python
# Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Example dataset with multiple independent variables
# Note: X2 = 10 * X1 and X3 = 100 * X1, so the features are perfectly collinear.
# The individual coefficients are therefore not uniquely determined.
data = {
    'X1': [1, 2, 3, 4, 5, 6],  # First feature
    'X2': [10, 20, 30, 40, 50, 60],  # Second feature
    'X3': [100, 200, 300, 400, 500, 600],  # Third feature
    'Y': [2, 4, 5, 4, 5, 7]  # Target variable
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Define features (X1, X2, X3) and target (Y)
X = df[['X1', 'X2', 'X3']]  # Multiple features
y = df['Y']  # Target

# Split the data into training and testing sets
# With 6 rows and test_size=0.2, the test set holds only 2 rows,
# so the metrics below are noisy
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a LinearRegression object
model = LinearRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Predict on the test data
y_pred = model.predict(X_test)

# Calculate the Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

# Calculate the R-squared value
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2}")

# Get the coefficients and intercept of the model
coefficients = model.coef_
intercept = model.intercept_
print(f"Coefficients: {coefficients}")
print(f"Intercept: {intercept}")
```


```text
Mean Squared Error: 0.44500000000000034
R-squared: 0.5549999999999997
Coefficients: [6.93000693e-05 6.93000693e-04 6.93000693e-03]
Intercept: 2.100000000000001
```
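The three coefficients above are very small and differ by exact factors of 10. That pattern comes from the example data: X2 and X3 are exact multiples of X1, so the three columns carry the same information. The sketch below checks this with NumPy's matrix rank and shows that the printed coefficients combine into a single effective slope on X1. The names `effective_slope` and `scale` are assumptions made here for illustration.

```python
import numpy as np

# Feature matrix from the example: columns X1, X2 = 10 * X1, X3 = 100 * X1
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
X = np.column_stack([x1, 10 * x1, 100 * x1])

# A rank of 1 (instead of 3) means the columns are perfectly collinear
rank = np.linalg.matrix_rank(X)
print(f"Rank of feature matrix: {rank}")

# Coefficients as printed in the output above
coefficients = np.array([6.93000693e-05, 6.93000693e-04, 6.93000693e-03])

# Each unit of X1 also moves X2 by 10 and X3 by 100,
# so the combined effect per unit of X1 is the weighted sum below
scale = np.array([1.0, 10.0, 100.0])
effective_slope = coefficients @ scale
print(f"Effective slope on X1: {effective_slope:.4f}")
```

The effective slope is about 0.7, so the model behaves like a simple linear regression on X1 alone. With collinear features, only this combined effect is identified by the data; how it is divided among the three coefficients depends on the solver.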
