<a href="https://colab.research.google.com/github/Geetanshi-jain/DSAssignmentByGeetanshijain/blob/main/regression_modelling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



### 📊 #9: REGRESSION MODELING

---

### 🔍 **Introduction to Regression Modeling**

Regression modeling is a statistical method for describing the relationship between a target variable and one or more predictors, and for using that relationship to make predictions. 📈 For example, to estimate how much a customer might spend per visit, we can fit a regression model and use it to produce an approximate value.

---

### 🧮 **How Do We Estimate?**

Our goal is to predict a customer’s “sales per visit” based on:
- **Days between purchases**
- **Credit card use** (whether the customer has a credit card or not)

**Example**:  
A customer shops every 333 days and does not have a credit card. Plugging these details into the model, we estimate their sales per visit at **$128.13**. However, the actual amount they spent was **$184.23**.  

#### 📝 Prediction Error:
Prediction error = actual sales - predicted sales = $184.23 - $128.13 = $56.10.

The error is positive, so the model underestimated this customer's spending by $56.10.

---

### 📐 **Model Evaluation Metrics**

Model evaluation helps us assess how accurate our model is. Two key metrics are:

1. **Standard Error of the Estimate (s)** - This measures the typical size of a prediction error. Based on our data, it’s **$87.54**. This is relatively large, which suggests the model is missing predictors that would improve accuracy.

2. **Mean Absolute Error (MAE)** - This is the average absolute difference between predicted and actual values. Here, MAE is **$53.39**, meaning predictions miss the actual value by $53.39 on average.
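
The two metrics above can be sketched in a few lines of NumPy. This is a minimal illustration, not the calculation that produced the figures quoted above: the helper names `standard_error_of_estimate` and `mean_abs_error` and the sample values are assumptions made for the example. It uses the usual definition s = sqrt(SSE / (n - p - 1)), where p is the number of predictors.

```python
import numpy as np

def standard_error_of_estimate(actual, predicted, n_predictors):
    """s = sqrt(SSE / (n - p - 1)), with p = number of predictors."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    dof = len(actual) - n_predictors - 1  # degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    sse = np.sum((actual - predicted) ** 2)  # sum of squared errors
    return float(np.sqrt(sse / dof))

def mean_abs_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Hypothetical values, for illustration only
actual = [100.0, 150.0, 200.0, 250.0, 300.0]
predicted = [110.0, 140.0, 210.0, 240.0, 310.0]

s = standard_error_of_estimate(actual, predicted, n_predictors=2)
mae = mean_abs_error(actual, predicted)
print(f"s = {s:.2f}, MAE = {mae:.2f}")  # s = 15.81, MAE = 10.00
```

Because s squares the errors before averaging and divides by n - p - 1 rather than n, it is at least as large as MAE, which is consistent with the $87.54 versus $53.39 reported above.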

---

### 📊 **What is the R-squared (R²) Value?**

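The R-squared value tells us what proportion of the variation in sales per visit is explained by the predictors ("Days since Purchase" and "Credit Card"). Adjusted R² is R² corrected for the number of predictors in the model.  
In our model, the adjusted R² value is **0.064** (or 6.4%), meaning the two predictors explain only about 6.4% of the variation; most of what drives sales is not captured by this model.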
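
For reference, adjusted R² is commonly computed from plain R² as 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. The sketch below assumes that formula; the function name `adjusted_r2` and the input numbers are hypothetical, since the sample size behind the 0.064 figure is not given here.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjust R² for the number of predictors in the model."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof

# Hypothetical inputs, for illustration only
example = adjusted_r2(r2=0.10, n_samples=100, n_predictors=2)
print(f"Adjusted R²: {example:.4f}")
```

The adjustment always pulls the value down a little, and more so when the sample is small or the model has many predictors.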

---

## 📈 Estimation Modeling in Python

Let's set up and evaluate a regression model in Python:

---

### 🔹 **Step 1: Import Libraries and Set Up Data**

```python
# 📚 Import Libraries
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SequentialFeatureSelector

# 🚀 Sample Data for Demonstration
data = pd.DataFrame({
    'Sales per Visit': [184.23, 139.21, 170.18, 128.19, 163.21],
    'Days Since Purchase': [333, 150, 200, 120, 310],
    'Credit Card Use': [0, 1, 0, 1, 0]
})

# 🧹 Split data into training and test sets
X = data[['Days Since Purchase', 'Credit Card Use']]
y = data['Sales per Visit']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

---

### 🔹 **Step 2: Train the Regression Model**

```python
# 📈 Initialize and fit the model
model = LinearRegression()
model.fit(X_train, y_train)
```

---

### 🔹 **Step 3: Predict Sales for a New Customer**

```python
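# 🛒 Predict Sales for a New Customer (Days Since Purchase = 333, Credit Card Use = 0)
# Use a DataFrame with the training column names so scikit-learn does not warn about missing feature names
cust01 = pd.DataFrame([[333, 0]], columns=X_train.columns)
predicted_sales = model.predict(cust01)[0]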
print(f"🎯 Predicted Sales for New Customer: ${predicted_sales:.2f}")
```

---

### 🔹 **Step 4: Calculate Prediction Error**

```python
# 📝 Prediction Error Calculation (Actual vs. Predicted)
actual_sales = 184.23
prediction_error = actual_sales - predicted_sales
print(f"📉 Prediction Error: ${prediction_error:.2f}")
```

---

### 🔹 **Step 5: Model Evaluation with MAE**

```python
# 📊 Evaluate Model with Mean Absolute Error (MAE) on Test Data
y_pred = model.predict(X_test)
test_mae = mean_absolute_error(y_test, y_pred)
print(f"🔍 Model MAE: ${test_mae:.2f}")

# ➡️ Calculate R-squared (R²) on the training data
# (the test set here holds a single row, so R² is not defined on it)
r_squared = model.score(X_train, y_train)
print(f"📐 Training R-squared (R²): {r_squared:.4f}")
```

---

## 🔄 Stepwise Regression

Stepwise regression is a method for choosing which predictors to keep in a model. In forward selection, the variant used here, we start with no predictors and add one variable at a time, keeping it only if it improves the model.

---

### 🔹 **Stepwise Regression in Python**

```python
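# 🔄 Stepwise Regression with Sequential Feature Selector
# cv=2: the training set has only 4 rows, so the default 5-fold CV would raise an error
sfs = SequentialFeatureSelector(model, n_features_to_select=1, direction='forward',
                                scoring='neg_mean_absolute_error', cv=2)
sfs.fit(X_train, y_train)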

# 🎉 Selected Features
selected_features = X_train.columns[sfs.get_support()]
print("🌟 Selected Features:", selected_features.tolist())
```
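
To make the "add one variable at a time" idea concrete, here is a hand-rolled sketch of greedy forward selection in NumPy. It is not how `SequentialFeatureSelector` is implemented (that class scores candidates with cross-validation); this version simply scores each candidate on a held-out validation set and stops when no remaining feature lowers the MAE. The function names, the synthetic data, and the feature names `signal` and `noise` are all assumptions made for the example.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Least-squares fit with an intercept; returns predictions for X_test."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def forward_select(X, y, X_val, y_val, names):
    """Greedily add the feature that most lowers validation MAE."""
    selected = []
    remaining = list(range(X.shape[1]))
    # Start from a model with no predictors: always predict the training mean
    best_mae = float(np.mean(np.abs(y_val - y.mean())))
    while remaining:
        scores = {}
        for j in remaining:
            cols = selected + [j]
            pred = fit_predict(X[:, cols], y, X_val[:, cols])
            scores[j] = float(np.mean(np.abs(y_val - pred)))
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_mae:
            break  # no candidate improves the model, so stop
        selected.append(j_best)
        remaining.remove(j_best)
        best_mae = scores[j_best]
    return [names[j] for j in selected], best_mae

# Synthetic data: the target depends on the first column only
rng = np.random.default_rng(0)
X_all = rng.normal(size=(200, 2))
y_all = 3.0 * X_all[:, 0] + rng.normal(scale=0.1, size=200)

chosen, final_mae = forward_select(
    X_all[:150], y_all[:150], X_all[150:], y_all[150:], ["signal", "noise"]
)
print("Selected:", chosen, f"validation MAE: {final_mae:.3f}")
```

On data like this the informative column is picked first; whether the pure-noise column is also kept depends on the particular random draw.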

---

### 🔹 **Step 6: Model Comparison**

```python
# 🔍 Model Comparison
baseline_mae = 55.53  # Hypothetical Baseline MAE
print(f"🔹 Baseline MAE: ${baseline_mae:.2f}")
print(f"🔹 Regression Model MAE: ${test_mae:.2f}")

if test_mae < baseline_mae:
    print("✅ The Regression Model performs better!")
else:
    print("⚠️ The Baseline Model performs better.")
```
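
The baseline MAE above is hard-coded. One common choice of baseline, assumed here purely for illustration, is a model that ignores the predictors and always predicts the mean of the training target. The sketch below computes the MAE of such a baseline; the function name `baseline_mae_from_mean` and the demo values are hypothetical.

```python
import numpy as np

def baseline_mae_from_mean(y_train, y_test):
    """MAE of a baseline that always predicts the training-set mean."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    baseline_prediction = y_train.mean()
    return float(np.mean(np.abs(y_test - baseline_prediction)))

# Hypothetical values, for illustration only
y_train_demo = [120.0, 160.0, 170.0, 190.0]  # mean is 160
y_test_demo = [140.0, 200.0]                 # absolute errors: 20 and 40

demo_baseline = baseline_mae_from_mean(y_train_demo, y_test_demo)
print(f"Baseline MAE: ${demo_baseline:.2f}")  # Baseline MAE: $30.00
```

A regression model only earns its keep if its test MAE falls below this kind of no-predictor baseline.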

---
**Example**:

- Predicted value: $112.57
- Baseline MAE: $55.53
- Estimation model MAE: $53.39
- Comparison: 53.39 < 55.53, so the estimation model is better



## 🏆 Conclusion

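If the MAE of the regression model is lower than the baseline model's MAE, the regression model performs better. In the example above, the regression model's MAE ($53.39) is only slightly below the baseline's ($55.53), so the improvement is modest. Regression modeling provides the estimates, and stepwise regression helps decide which predictors are worth including.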