# Correlation, Covariance, and Simple Linear Regression

This session explores the statistical relationships between variables and introduces simple linear regression, a foundational concept in machine learning and data science.

---

## 📘 Theory Explanation

### 1. Correlation
- **Definition**: Correlation measures the strength and direction of the linear relationship between two variables.
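- **Range**: \( r \) always lies between \( -1 \) and \( 1 \).
  - \( 1 \): Perfect positive correlation (all points lie exactly on a line with positive slope; as one variable increases, the other increases).
  - \( -1 \): Perfect negative correlation (all points lie exactly on a line with negative slope; as one variable increases, the other decreases).
  - \( 0 \): No *linear* correlation. The variables may still be related non-linearly.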

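The three reference values can be reproduced numerically. The minimal sketch below assumes three small made-up datasets (not part of this session's data): an exact increasing line, an exact decreasing line, and a parabola symmetric around zero. It computes Pearson's \( r \) with NumPy's `np.corrcoef`; `pearson_r` is a hypothetical helper name used only here. The parabola case shows that \( r = 0 \) rules out only a linear relationship, since \( y \) there is completely determined by \( x \).

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# Illustrative datasets (assumed for this sketch, not from the session's data)
y_up = 2 * x + 1      # exact increasing line
y_down = -3 * x       # exact decreasing line
y_curve = x ** 2      # fully dependent on x, but not linearly

def pearson_r(a, b):
    # Hypothetical helper: np.corrcoef returns the 2x2 correlation matrix,
    # and the off-diagonal entry [0, 1] is r(a, b)
    return np.corrcoef(a, b)[0, 1]

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r_curve = pearson_r(x, y_curve)

print(f"r(x, 2x + 1) = {r_up:.4f}")    # 1.0000
print(f"r(x, -3x)    = {r_down:.4f}")  # -1.0000
print(f"r(x, x^2)    = {r_curve:.4f}") # 0.0000
```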
#### Formula for Correlation Coefficient (Pearson's \( r \)):
$$
r = \frac{\text{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y}
$$

Where:
- \( \text{Cov}(X, Y) \): Covariance between \( X \) and \( Y \).
- \( \sigma_X \) and \( \sigma_Y \): Standard deviations of \( X \) and \( Y \).
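
As a sanity check of the formula, the sketch below computes \( r \) for the example data used in the practical part of this session (X = 1 to 5, Y = 2, 4, 5, 4, 5). It assumes the sample versions of covariance and standard deviation (both with an \( n - 1 \) denominator, `ddof=1` in NumPy) and compares the result with `np.corrcoef`. The denominators must match: mixing \( n \) and \( n - 1 \) would distort \( r \).

```python
import numpy as np

# Example data from the practical part of this session
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Sample covariance and sample standard deviations (all use n - 1)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
sigma_x = x.std(ddof=1)
sigma_y = y.std(ddof=1)

# Pearson's r = Cov(X, Y) / (sigma_X * sigma_Y)
r = cov_xy / (sigma_x * sigma_y)

# Cross-check against NumPy's built-in correlation matrix
r_numpy = np.corrcoef(x, y)[0, 1]
print(f"r (formula) = {r:.4f}")        # 0.7746
print(f"r (NumPy)   = {r_numpy:.4f}")  # 0.7746
```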

---

### 2. Covariance
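- **Definition**: Measures how two variables change together. Its magnitude depends on the units of \( X \) and \( Y \), so its sign is more informative than its size; correlation removes this unit dependence by dividing by both standard deviations.
- **Interpretation**:
  - **Positive**: Variables tend to move in the same direction.
  - **Negative**: Variables tend to move in opposite directions.
  - **Zero**: No linear relationship (a non-linear relationship may still exist).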

#### Formula for Covariance:
$$
\text{Cov}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
$$
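
A direct translation of the formula is sketched below on the session's example data, cross-checked against `np.cov`, which also uses the \( n - 1 \) denominator by default. `sample_cov` is a hypothetical helper name. The second half multiplies X by 100 (think of an assumed change of units, such as metres to centimetres) to show that the covariance scales with the units while the correlation coefficient stays the same.

```python
import numpy as np

def sample_cov(x, y):
    # Hypothetical helper implementing
    # Cov(X, Y) = sum((x_i - mean_x) * (y_i - mean_y)) / (n - 1)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

cov_xy = sample_cov(x, y)
print(f"Cov(X, Y)    = {cov_xy}")  # 1.5
print(f"np.cov       = {np.cov(x, y)[0, 1]}")

# Rescaling X by 100 scales the covariance by 100 ...
x_scaled = 100 * x
cov_scaled = sample_cov(x_scaled, y)
print(f"Cov(100X, Y) = {cov_scaled}")  # 150.0

# ... but leaves the correlation coefficient unchanged
print(f"r(X, Y)      = {np.corrcoef(x, y)[0, 1]:.4f}")
print(f"r(100X, Y)   = {np.corrcoef(x_scaled, y)[0, 1]:.4f}")
```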

---

### 3. Simple Linear Regression
- **Definition**: Models the linear relationship between a dependent variable (\( y \)) and an independent variable (\( x \)).
- **Equation**:
$$
y = mx + b
$$
  - \( m \): Slope (rate of change of \( y \) with respect to \( x \)).
  - \( b \): Intercept (value of \( y \) when \( x = 0 \)).

- **Line of Best Fit**: Minimizes the sum of squared residuals (differences between actual and predicted \( y \)).
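
For a single predictor, minimizing the sum of squared residuals has a closed-form solution: the slope is the covariance of \( x \) and \( y \) divided by the variance of \( x \), and the intercept makes the line pass through the point of means.

$$
m = \frac{\text{Cov}(X, Y)}{\text{Var}(X)}, \qquad b = \bar{y} - m\bar{x}
$$

The sketch below applies these two formulas to the session's example data and cross-checks them with `np.polyfit(x, y, 1)`, which returns least-squares coefficients highest degree first (slope, then intercept).

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Closed-form least-squares estimates for y = m*x + b.
# The (n - 1) factors in Cov and Var cancel, so raw sums are enough.
x_dev = x - x.mean()
y_dev = y - y.mean()
m = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)
b = y.mean() - m * x.mean()
print(f"m = {m:.4f}, b = {b:.4f}")  # m = 0.6000, b = 2.2000

# Cross-check: for degree 1, np.polyfit returns [slope, intercept]
m_fit, b_fit = np.polyfit(x, y, 1)
print(f"polyfit: m = {m_fit:.4f}, b = {b_fit:.4f}")
```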

---

### 4. \( R^2 \) (R-squared)
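- **Definition**: Measures how well the regression line fits the data, as the proportion of the variance in \( y \) that the model explains.
- **Range** (for a least-squares line with an intercept, evaluated on the data it was fitted to):
  - \( 1 \): Perfect fit; every point lies on the line.
  - \( 0 \): No explanatory power; the line does no better than predicting the mean \( \bar{y} \) for every point.
  - Outside that setting (for example, on held-out data), \( R^2 \) can be negative.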

#### Formula for \( R^2 \):
$$
R^2 = 1 - \frac{\text{SS}_{\text{res}}}{\text{SS}_{\text{tot}}}
$$
Where:
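- \( \text{SS}_{\text{res}} = \sum (y_i - \hat{y}_i)^2 \): Sum of squared residuals, where \( \hat{y}_i \) is the predicted value for point \( i \).
- \( \text{SS}_{\text{tot}} = \sum (y_i - \bar{y})^2 \): Total sum of squares around the mean \( \bar{y} \).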


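Writing the two sums out makes the formula concrete. The sketch below fits a least-squares line to the session's example data with `np.polyfit`, computes \( \text{SS}_{\text{res}} \) and \( \text{SS}_{\text{tot}} \) directly, and forms \( R^2 \). It also checks a known property of simple linear regression with an intercept: on the fitting data, \( R^2 \) equals the square of Pearson's \( r \).

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Least-squares line y_hat = m*x + b
m, b = np.polyfit(x, y, 1)
y_hat = m * x + b

ss_res = np.sum((y - y_hat) ** 2)     # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
r_squared = 1 - ss_res / ss_tot
print(f"SS_res = {ss_res:.4f}, SS_tot = {ss_tot:.4f}, R^2 = {r_squared:.4f}")
# SS_res = 2.4000, SS_tot = 6.0000, R^2 = 0.6000

# With one predictor and an intercept, R^2 equals Pearson's r squared
r = np.corrcoef(x, y)[0, 1]
print(f"r^2 = {r ** 2:.4f}")  # 0.6000
```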
---

## 💻 Practical Implementation
### 1. Correlation and Covariance
Calculate the correlation coefficient and covariance between two variables using pandas.

### 2. Simple Linear Regression
Use scikit-learn to fit a regression model, find the line of best fit, and evaluate it with \( R^2 \).

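```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Small example dataset: X is the independent variable, Y the dependent one
data = {'X': [1, 2, 3, 4, 5], 'Y': [2, 4, 5, 4, 5]}
df = pd.DataFrame(data)

# Pearson correlation and sample covariance (pandas uses n - 1 by default)
correlation = df['X'].corr(df['Y'])
covariance = df['X'].cov(df['Y'])
print(f"Correlation: {correlation}")
print(f"Covariance: {covariance}")

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features)
X = df['X'].values.reshape(-1, 1)
y = df['Y'].values

# Fit the least-squares line y = m*x + b
model = LinearRegression()
model.fit(X, y)
print(f"Slope (m): {model.coef_[0]:.4f}")
print(f"Intercept (b): {model.intercept_:.4f}")

# Predictions on the training data and the resulting R^2
y_pred = model.predict(X)
r2 = r2_score(y, y_pred)
print(f"R^2: {r2:.4f}")

# Plot the data points and the fitted regression line
plt.scatter(df['X'], df['Y'], color='blue', label='Data Points')
plt.plot(df['X'], y_pred, color='red', label='Regression Line')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Simple Linear Regression')
plt.legend()
plt.show()
```

Output:

```
Correlation: 0.7745966692414834
Covariance: 1.5
Slope (m): 0.6000
Intercept (b): 2.2000
R^2: 0.6000
```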