# **Linear Regression**

Linear regression is a fundamental algorithm in machine learning and statistics used for modeling the relationship between a dependent variable and one or more independent variables. It assumes a linear relationship between the variables.

---

## Key Concepts


### **Types of Linear Regression**

- **Simple Linear Regression**:
  - Models the relationship between one independent variable and a dependent variable.
  - Formula: \( y = mx + b \), where:
    - \( m \): Slope (coefficient)
    - \( b \): Intercept

- **Multiple Linear Regression**:
  - Models the relationship between multiple independent variables and a dependent variable.
  - Formula: \( y = b_0 + b_1x_1 + b_2x_2 + ... + b_nx_n \), where:
    - \( b_0 \): Intercept
    - \( b_1, b_2, ..., b_n \): Coefficients of independent variables.
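Both formulas compute the same thing: an intercept plus a weighted sum of the inputs. The NumPy sketch below makes that concrete. The `predict` helper is hypothetical, and the coefficient values are chosen only for illustration.

```python
import numpy as np


def predict(X, intercept, coefs):
    """Evaluate y = b_0 + b_1*x_1 + ... + b_n*x_n for every row of X."""
    X = np.atleast_2d(X)               # shape: (samples, features)
    return intercept + X @ np.asarray(coefs)


# Simple linear regression: one feature, slope m = 2, intercept b = 1
y_simple = predict([[0], [1], [2]], intercept=1.0, coefs=[2.0])
print(y_simple)  # [1. 3. 5.]

# Multiple linear regression: two features, b_0 = 1, b_1 = 2, b_2 = 3
y_multi = predict([[1, 2], [2, 1]], intercept=1.0, coefs=[2.0, 3.0])
print(y_multi)   # [9. 8.]
```

Simple linear regression is the one-feature special case, so the matrix product `X @ coefs` handles both.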

---

## Assumptions of Linear Regression


1. **Linearity**: The relationship between the independent and dependent variables is linear.
2. **Independence**: Observations are independent of each other.
3. **Homoscedasticity**: The errors have constant variance across all values of the independent variables.
4. **Normality**: Errors are normally distributed.
5. **No Multicollinearity**: Independent variables are not highly correlated.
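Several of these assumptions can be checked numerically once a model is fitted. The sketch below uses only NumPy on synthetic data, and the helper names `fit_ols` and `vif` are hypothetical. It first fits ordinary least squares. As a rough homoscedasticity check, it compares the residual spread for low and high fitted values. For multicollinearity, it computes the variance inflation factor (VIF); common rules of thumb flag VIFs above about 5 or 10. Normality is usually inspected separately, with a Q-Q plot or a test such as `scipy.stats.shapiro`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): x2 is deliberately built to track x1 closely
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    A = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w[0], w[1:]


def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2_j), where R^2_j comes
    from regressing column j on all the other columns."""
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        b0, b = fit_ols(others, X[:, j])
        ss_res = np.sum((X[:, j] - (b0 + others @ b)) ** 2)
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


b0, b = fit_ols(X, y)
fitted = b0 + X @ b
residuals = y - fitted

# Rough homoscedasticity check: residual spread should be similar
# for low and high fitted values
low = fitted < np.median(fitted)
print("residual std (low, high):", residuals[low].std(), residuals[~low].std())

# Multicollinearity check: both VIFs are large because x1 and x2 move together
print("VIF:", vif(X))
```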

---

## Mathematical Formulation


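The goal of linear regression is to minimize the **sum of squared residuals (errors)** between the predicted values and the actual values. In practice, this is written as a cost function to be minimized. The factor \( \frac{1}{2m} \) rescales the sum without changing where its minimum lies: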

\[
J(\mathbf{w}) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2
\]

Where:
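- \( J(\mathbf{w}) \): Cost function, as a function of the model parameters \( \mathbf{w} \)
- \( \hat{y}^{(i)} \): Predicted value for the \( i \)-th data point
- \( y^{(i)} \): Actual value for the \( i \)-th data point
- \( m \): Number of data points (not to be confused with the slope \( m \) in \( y = mx + b \))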
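The cost function translates directly into NumPy. Below is a minimal sketch for the simple-regression case, where \( \hat{y}^{(i)} = m x^{(i)} + b \). The function name and sample data are illustrative, and the code calls the slope `slope` to keep it distinct from the number of data points \( m \).

```python
import numpy as np


def cost(slope, intercept, x, y):
    """J = 1/(2m) * sum((y_hat - y)^2), where m is the number of data points."""
    m = len(x)
    y_hat = slope * x + intercept            # predicted values
    return np.sum((y_hat - y) ** 2) / (2 * m)


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])    # exactly y = 2x

print(cost(2.0, 0.0, x, y))  # 0.0 (perfect fit)
print(cost(1.0, 0.0, x, y))  # 5.5 = (1 + 4 + 9 + 16 + 25) / 10
```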

**Setting the derivatives of the cost function with respect to each parameter to zero gives the parameter values at the minimum.**
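For simple linear regression, setting \( \partial J / \partial m \) and \( \partial J / \partial b \) to zero gives a closed-form solution:

\( m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \) and \( b = \bar{y} - m\bar{x} \)

Here \( m \) is the slope. The short NumPy sketch below implements those two formulas. The helper name and the data are made up for illustration.

```python
import numpy as np


def closed_form_fit(x, y):
    """Slope and intercept where dJ/dm = 0 and dJ/db = 0 (least squares)."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])   # roughly y = 2x + 1
slope, intercept = closed_form_fit(x, y)
print(slope, intercept)  # approximately 1.97 and 1.09
```

The result matches `np.polyfit(x, y, 1)`, which also performs a degree-1 least-squares fit.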

![costfunction.png](../images/lr_cost_function.png)

The slope \( m \) and the intercept \( c \) (written \( b \) in \( y = mx + b \)) are chosen as the values that give the minimum error.

![cost_function_vs_m_and_c.png](../images//m,%20c%20vs%20cost.png)

- The best-fit \( m \) and \( c \) are the pair at the lowest point of the cost surface.

![cost_function_graph2.png](../images/lr_cost_graph.png)

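**Gradient descent** is an iterative algorithm for finding \( m \) and \( c \). It repeatedly adjusts both values in the direction that decreases the cost. Some libraries use it; for example, scikit-learn's `SGDRegressor` uses stochastic gradient descent. Others, such as scikit-learn's `LinearRegression`, solve the least-squares problem directly.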
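Below is a minimal batch gradient descent sketch for the simple case, minimizing the \( \frac{1}{2m} \) cost defined above. The learning rate, iteration count, and data are illustrative assumptions. If the learning rate is too large, the updates diverge instead of converging, so features are often scaled first.

```python
import numpy as np


def gradient_descent(x, y, learning_rate=0.05, n_iters=5000):
    """Batch gradient descent on J = 1/(2N) * sum((slope*x + intercept - y)^2),
    where N is the number of data points."""
    n_points = len(x)
    slope, intercept = 0.0, 0.0                      # initial guesses
    for _ in range(n_iters):
        error = slope * x + intercept - y            # y_hat - y
        grad_slope = np.sum(error * x) / n_points    # dJ/dm
        grad_intercept = np.sum(error) / n_points    # dJ/dc
        slope -= learning_rate * grad_slope          # step against the gradient
        intercept -= learning_rate * grad_intercept
    return slope, intercept


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # exactly y = 2x
slope, intercept = gradient_descent(x, y)
print(slope, intercept)  # very close to 2 and 0
```

The sketch uses every data point in each step (batch gradient descent). Stochastic variants update from one sample or a small batch at a time, which scales better to large datasets.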


---

## Implementation Steps


1. **Collect Data**:
   - Gather data with independent and dependent variables.

2. **Preprocess Data**:
   - Handle missing values, scale features, and split the dataset into training and test sets.

3. **Fit Model**:
   - Train the linear regression model on the training data.

4. **Evaluate Model**:
   - Use metrics like Mean Squared Error (MSE), R-squared, and Mean Absolute Error (MAE) to evaluate performance.

5. **Make Predictions**:
   - Use the trained model to predict outcomes on new data.
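The five steps can be strung together with scikit-learn. This is a sketch under stated assumptions:

- The data is synthetic, with a few values blanked out so the imputer has something to fill.
- Mean imputation plus standard scaling is one reasonable preprocessing choice, not the only one.
- The split happens before preprocessing, so the imputer and scaler learn their statistics from the training set only.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Collect data (synthetic here, an assumption for illustration)
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 2))
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=100)
X[rng.choice(100, size=5, replace=False), 0] = np.nan   # a few missing values

# 2. Preprocess: split first, then impute and scale inside a pipeline
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = make_pipeline(
    SimpleImputer(strategy="mean"), StandardScaler(), LinearRegression()
)

# 3. Fit the model on the training set only
model.fit(X_train, y_train)

# 4. Evaluate on the held-out test set
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE={mse:.3f}  MAE={mae:.3f}  R^2={r2:.3f}")

# 5. Predict on new, unseen data
X_new = np.array([[2.0, 5.0], [7.5, 1.0]])
print(model.predict(X_new))
```

Scaling does not change the predictions of ordinary least squares. It matters more for gradient-based solvers and regularized models, but keeping it in the pipeline makes swapping the final estimator straightforward.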

---

## Python Implementation


### Simple Linear Regression


```python
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt

# Sample Data
X = np.array([[1], [2], [3], [4], [5]])  # Independent variable
y = np.array([2, 4, 6, 8, 10])          # Dependent variable

# Model Training
model = LinearRegression()
model.fit(X, y)

# Predictions
predictions = model.predict(X)
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# Plot Results
plt.scatter(X, y, color='blue', label='Actual')
plt.plot(X, predictions, color='red', label='Predicted')
plt.legend()
plt.show()
```

### Multiple Linear Regression


```python
# Sample Data
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])  # Independent variables
y = np.array([5, 7, 9, 11])                     # Dependent variable

# Model Training
model = LinearRegression()
model.fit(X, y)

# Predictions
predictions = model.predict(X)
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
```


[Implement Linear Regression Model](02%20-%20Implement%20Linear%20Regression%20Model.ipynb)

## Applications


1. **Predicting Trends**:
   - Forecast sales, prices, or other measurable quantities.

2. **Risk Analysis**:
   - Assess risks in insurance or finance.

3. **Healthcare**:
   - Predict disease progression based on clinical features.

4. **Marketing**:
   - Understand the impact of factors like advertising spend on sales.

---

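Linear regression is one of the simplest and most widely used algorithms for predicting continuous values, and its coefficients are easy to interpret. It forms the foundation for many more advanced machine learning techniques, such as regularized regression (Ridge, Lasso) and logistic regression.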