# Day 3 – Linear Regression & Logistic Regression


## Part 1: Linear Regression

### Topics Covered
- Concept of Linear Regression  
- Implementation using NumPy & scikit-learn  
- Model Evaluation & Interpretation  



### What is Linear Regression?
Linear Regression models the relationship between an independent variable **X** and a dependent variable **y** by fitting a straight line:

**y = mX + b**

Where:
- m = slope (coefficient)
- b = intercept
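
The NumPy cell in this section picks m and b by ordinary least squares, meaning it minimizes the sum of squared differences between the observed and predicted y. For a single feature this has a closed form. The symbols $\bar{x}$ and $\bar{y}$ (sample means) and the index $i$ are notation introduced for this sketch:

$$
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\,\bar{x}
$$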


In [1]:

# Simple Linear Regression using NumPy
import numpy as np

# Sample dataset
X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 6])

# Mean values
x_mean = np.mean(X)
y_mean = np.mean(y)

# Calculate coefficients
m = np.sum((X - x_mean) * (y - y_mean)) / np.sum((X - x_mean) ** 2)
b = y_mean - m * x_mean

print("Slope (m):", m)
print("Intercept (b):", b)


Slope (m): 0.8
Intercept (b): 1.7999999999999998
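
The same fit can be cross-checked with scikit-learn. The sketch below is one way to do it, assuming `LinearRegression` with default settings. The names `x_vals`, `y_vals`, and `lin_model` are illustrative and chosen so they do not overwrite the notebook's `X`, `y`, `m`, or `b`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same sample data as the NumPy cell
x_vals = np.array([1, 2, 3, 4, 5])
y_vals = np.array([2, 4, 5, 4, 6])

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features)
x_2d = x_vals.reshape(-1, 1)

lin_model = LinearRegression()
lin_model.fit(x_2d, y_vals)

slope = lin_model.coef_[0]
intercept = lin_model.intercept_

print("Slope (m):", slope)
print("Intercept (b):", intercept)
```

Both approaches solve the same least-squares problem, so the slope and intercept should agree with the NumPy result up to floating-point rounding.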



### Interpretation of Coefficients
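- **Slope (m)**: Expected change in y for a one-unit increase in X (here 0.8)  
- **Intercept (b)**: Predicted value of y when X = 0 (here 1.8, printed as 1.7999999999999998 because of floating-point rounding)  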


In [2]:

# Evaluation Metrics
y_pred = m * X + b

mse = np.mean((y - y_pred) ** 2)
r2 = 1 - (np.sum((y - y_pred) ** 2) / np.sum((y - y_mean) ** 2))

print("Mean Squared Error:", mse)
print("R-squared:", r2)


Mean Squared Error: 0.47999999999999987
R-squared: 0.7272727272727273



### Model Evaluation
- **MSE** is the average squared prediction error, in squared units of y; lower is better (here 0.48)  
- **R²** is the proportion of the variance in y that the model explains (here about 0.73, or roughly 73%)  
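
As a cross-check on the hand-computed metrics, scikit-learn offers `mean_squared_error` and `r2_score`. The sketch below recomputes them for the fitted line y = 0.8·X + 1.8 and also takes the square root of the MSE (RMSE), which is in the same units as y. The variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([2, 4, 5, 4, 6])

# Predictions from the fitted line y = 0.8 * X + 1.8 for X = 1..5
y_hat = 0.8 * np.array([1, 2, 3, 4, 5]) + 1.8

mse_check = mean_squared_error(y_true, y_hat)
r2_check = r2_score(y_true, y_hat)

# RMSE is easier to read than MSE because it shares the units of y
rmse_check = np.sqrt(mse_check)

print("MSE:", mse_check)
print("RMSE:", rmse_check)
print("R-squared:", r2_check)
```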


---


## Part 2: Logistic Regression

### Topics Covered
- Logistic Regression Concept  
- Binary Classification  
- Performance Metrics  



### What is Logistic Regression?
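Logistic Regression is used for **classification problems**.  
It computes a linear combination of the features, z, and converts it to a probability using the **sigmoid function**:

**σ(z) = 1 / (1 + e⁻ᶻ)**

The output is always strictly between 0 and 1, so it can be read as the probability of the positive class.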
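
A minimal NumPy sketch of the sigmoid, to make its shape concrete. The helper name `sigmoid` and the sample inputs are chosen for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Negative inputs map below 0.5, zero maps to exactly 0.5, positive inputs map above 0.5
z_values = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(z_values)

print(probs)
```

The function is increasing and symmetric around z = 0, in the sense that σ(-z) = 1 - σ(z).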



### Difference: Linear vs Logistic Regression
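- Linear Regression → Continuous output  
- Logistic Regression → Probability, which is thresholded (commonly at 0.5) into class labels  
- Both compute a linear combination of the features; Logistic passes it through the **sigmoid**, Linear uses it directly (**identity function**)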
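
To make the contrast concrete, the sketch below fits a scikit-learn `LogisticRegression` on a small toy dataset and rebuilds its probabilities by hand: first the linear part, then the sigmoid. The names `X_demo`, `y_demo`, and `clf` are illustrative, and the model is fitted on all six points purely for demonstration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X_demo = np.array([[1], [2], [3], [4], [5], [6]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X_demo, y_demo)

# Linear part: z = w * x + b, the same form a linear regression would output
z = clf.coef_[0, 0] * X_demo[:, 0] + clf.intercept_[0]

# Logistic part: squash z through the sigmoid to get P(class = 1)
manual_probs = 1.0 / (1.0 + np.exp(-z))

# Threshold the probabilities at 0.5 to get class labels
manual_labels = (manual_probs > 0.5).astype(int)

print("Probabilities:", manual_probs)
print("Labels:", manual_labels)
```

For a binary problem like this one, the hand-built probabilities should match the second column of `clf.predict_proba(X_demo)`.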


In [3]:

# Logistic Regression using scikit-learn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score
import numpy as np

# Sample binary classification dataset
X = np.array([[1],[2],[3],[4],[5],[6]])
y = np.array([0, 0, 0, 1, 1, 1])

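# Train-test split (with only 6 samples, test_size=0.3 leaves just 2 test points)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit the classifier on the training portion
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict class labels (0 or 1) for the held-out samples
y_pred = model.predict(X_test)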


In [4]:

# Evaluation Metrics
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))


Accuracy: 1.0
Precision: 0.0
Recall: 0.0


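**Note:** scikit-learn also emits an `UndefinedMetricWarning` for each of the two scores. The test set here holds only two samples, and the output is consistent with both being class 0 and both predicted as 0. Accuracy is therefore 1.0, but with no positive labels and no positive predictions, precision and recall are undefined and are reported as 0.0. These two values say nothing about model quality.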
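
On a dataset this small, one way to keep both classes in the test set is a stratified split. The sketch below assumes the `stratify` argument of `train_test_split` and the `zero_division` argument of the metric functions. The variable names are illustrative and the scores are not shown, since they depend on the fitted model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

X_all = np.array([[1], [2], [3], [4], [5], [6]])
y_all = np.array([0, 0, 0, 1, 1, 1])

# stratify=y_all keeps the class ratio similar in train and test,
# so the 2-sample test set gets one sample of each class
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=42, stratify=y_all
)

strat_model = LogisticRegression()
strat_model.fit(X_tr, y_tr)
pred_te = strat_model.predict(X_te)

# zero_division=0 states explicitly what to return if a score is undefined
prec = precision_score(y_te, pred_te, zero_division=0)
rec = recall_score(y_te, pred_te, zero_division=0)

print("Test labels:", y_te)
print("Precision:", prec)
print("Recall:", rec)
```

With two test points the scores remain very coarse. Cross-validation or more data would give a more trustworthy estimate.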



### Interpretation of Odds Ratio
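- Each coefficient is the change in **log-odds** of the positive class for a one-unit increase in its feature
- **Odds Ratio = e^(coefficient)**
- The odds are multiplied by this ratio for each one-unit increase in the feature (ratio > 1: odds rise; ratio < 1: odds fall)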


In [5]:

# Odds Ratio
odds_ratio = np.exp(model.coef_)
print("Odds Ratio:", odds_ratio)


Odds Ratio: [[2.26816789]]
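
The odds-ratio interpretation can be checked numerically: the predicted odds at x + 1 divided by the odds at x should equal e^(coefficient). The sketch below does this with a model fitted on all six toy samples, so its ratio will differ from the value printed above, which came from the training split only. All names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X_or = np.array([[1], [2], [3], [4], [5], [6]])
y_or = np.array([0, 0, 0, 1, 1, 1])

clf_or = LogisticRegression()
clf_or.fit(X_or, y_or)

# Odds ratio taken directly from the coefficient
odds_ratio_coef = np.exp(clf_or.coef_[0, 0])

# Predicted probability of class 1 at x = 3 and x = 4
p3, p4 = clf_or.predict_proba(np.array([[3], [4]]))[:, 1]

# Odds = p / (1 - p); their ratio for a one-unit step should match exp(coef)
odds3 = p3 / (1 - p3)
odds4 = p4 / (1 - p4)
ratio_from_probs = odds4 / odds3

print("exp(coef):", odds_ratio_coef)
print("odds(x=4) / odds(x=3):", ratio_from_probs)
```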



### Final Takeaways
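- Linear Regression → Predicts a continuous value  
- Logistic Regression → Predicts a class probability for classification  
- Metrics matter: use MSE and R² for regression; accuracy, precision, and recall for classification  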
