# Session 69: Capstone Project Part 2 (Baseline Modeling)

**Unit 6: Data Ethics, Privacy, and Future Trends**
**Hour: 69**
**Mode: Practical Project**

---

### 1. Objective

In this final technical session, we will build a simple **baseline predictive model**. The goal is not to build a perfect, production-ready model, but to use machine learning as a final validation step for our EDA. We want to see if a Logistic Regression model agrees with our findings about which features are the most important for predicting campaign response.

### 2. Setup

We will need our full cleaned DataFrame and several components from Scikit-learn.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# --- Start of Cleaning and Feature Engineering Code ---
url = 'https://raw.githubusercontent.com/LeoFernan/Marketing-Campaigns-Analysis/main/marketing_campaign.csv'
df = pd.read_csv(url, sep='\t')
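# Assign the result back instead of using inplace=True on a column selection,
# which raises a FutureWarning in pandas 2.x and may leave df unchanged under copy-on-write.
df['Income'] = df['Income'].fillna(df['Income'].median())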
df['Age'] = 2024 - df['Year_Birth']
df['Relationship'] = df['Marital_Status'].replace({'Married': 'In Relationship', 'Together': 'In Relationship', 'Single': 'Single', 'Divorced': 'Single', 'Widow': 'Single', 'Alone': 'Single', 'Absurd': 'Single', 'YOLO': 'Single'})
df['Children'] = df['Kidhome'] + df['Teenhome']
spend_cols = ['MntWines', 'MntFruits', 'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts', 'MntGoldProds']
df['Total_Spend'] = df[spend_cols].sum(axis=1)
df_model = df[['Income', 'Recency', 'Children', 'Total_Spend', 'Relationship', 'Response']].copy()
# --- End of Cleaning and Feature Engineering Code ---

### 3. Data Preparation for Modeling

We need to follow the standard ML workflow: separate X/y, one-hot encode, and split.

In [None]:
# 1. Separate Features (X) and Target (y)
X = df_model.drop('Response', axis=1)
y = df_model['Response']

# 2. One-Hot Encode Categorical Features
X_encoded = pd.get_dummies(X, columns=['Relationship'], drop_first=True)

# 3. Split Data
X_train, X_test, y_train, y_test = train_test_split(X_encoded, y, test_size=0.3, random_state=42, stratify=y)
# We use stratify=y because the dataset is imbalanced. This ensures the train and test sets have the same proportion of responders.
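
To make the effect of `stratify=y` concrete, here is a minimal sketch on synthetic data. The 15% positive rate and the names `X_demo` and `y_demo` are assumptions chosen for illustration; they are not taken from the campaign dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced target: 150 positives out of 1000 rows (assumed rate).
y_demo = pd.Series([1] * 150 + [0] * 850, name="Response")
X_demo = pd.DataFrame({"feature": np.arange(len(y_demo))})

# Same split settings as in the session, applied to the synthetic data.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# With stratification, both subsets keep (approximately) the overall positive rate.
overall_rate = y_demo.mean()
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"overall={overall_rate:.3f}  train={train_rate:.3f}  test={test_rate:.3f}")
```

On the real data, the equivalent check would be comparing `y_train.mean()` and `y_test.mean()` with `y.mean()`.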

### 4. Build and Evaluate the Model

In [None]:
# Initialize, Fit, Predict
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Baseline Model Accuracy: {accuracy*100:.2f}%")

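An accuracy of ~86% looks strong, but it has to be read against the class imbalance noted above. If only around 15% of customers responded (check with `y.value_counts(normalize=True)`), a model that always predicts "no response" would already score about 85%. Accuracy alone therefore says little here; the confusion matrix, which we imported in the setup cell, shows how many actual responders the model catches.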
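
A quick way to put an accuracy figure in context is to compare it with a majority-class baseline and to look at the confusion matrix. The sketch below uses hypothetical labels with an assumed 15% responder rate; `DummyClassifier` stands in for a "model" that always predicts the most frequent class.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels: 15 responders, 85 non-responders (assumed for illustration).
y_true = np.array([1] * 15 + [0] * 85)
X_placeholder = np.zeros((len(y_true), 1))  # DummyClassifier ignores the features

# A "model" that always predicts the most frequent class.
majority = DummyClassifier(strategy="most_frequent")
majority.fit(X_placeholder, y_true)
y_majority = majority.predict(X_placeholder)

baseline_accuracy = accuracy_score(y_true, y_majority)
cm = confusion_matrix(y_true, y_majority)  # rows = actual class, columns = predicted class

print(f"Majority-class accuracy: {baseline_accuracy:.2f}")
print(cm)
```

In the notebook, the same two calls can be applied to the fitted model: fit the dummy on `X_train, y_train`, score it on `X_test`, and call `confusion_matrix(y_test, y_pred)` for the Logistic Regression predictions. A baseline model is only informative to the extent that it beats the majority-class score.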

### 5. Interpret the Model

Let's look at the coefficients to see if the model agrees with our EDA findings.

In [None]:
coefficients = model.coef_[0]
feature_names = X_train.columns
coef_df = pd.DataFrame({'Feature': feature_names, 'Coefficient': coefficients})

coef_df = coef_df.sort_values(by='Coefficient', ascending=False)

print(coef_df)

**Interpretation:**
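*   **Positive Coefficients (Increase probability of responding):** `Total_Spend` and `Income` have positive coefficients. This matches the direction we saw in our EDA.
*   **Negative Coefficients (Decrease probability of responding):** `Recency`, `Children`, and `Relationship_Single` have negative coefficients. This also matches our EDA. A higher recency (more days since last purchase) leads to a lower probability of response.
*   **A caution on magnitude:** The features were not scaled, so the size of each coefficient depends on the feature's units (`Income` is in currency units, `Children` is a small count). Read the signs from this table, but do not rank features by raw coefficient size. Also note that, because of `drop_first=True`, `Relationship_Single` is measured relative to the dropped `In Relationship` category.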

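The direction of the model's coefficients is consistent with the insights we discovered through our manual exploration. Treat this as corroboration from a second method rather than proof: each coefficient describes an association with the other features held fixed, not a causal effect.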
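
If we want to compare coefficient magnitudes across features, one common option is to standardize the features before fitting. The following is a minimal sketch on synthetic data, not the session's method: the data-generating process, the sample size, and the name `X_synth` are all assumptions, and the two column names are borrowed only for readability.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Two synthetic features on very different scales (illustrative only).
income = rng.normal(50_000, 20_000, n)
children = rng.integers(0, 4, n).astype(float)

# Assumed process: both features matter about equally per standard deviation.
logit = (income - 50_000) / 20_000 - (children - 1.5) / 1.12
y_synth = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
X_synth = pd.DataFrame({"Income": income, "Children": children})

# Standardize, then fit; coefficients are now per standard deviation of each feature.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_synth, y_synth)

scaler = scaled_model.named_steps["standardscaler"]
clf = scaled_model.named_steps["logisticregression"]

# Dividing by each feature's standard deviation converts back to per-unit coefficients.
coef_table = pd.DataFrame(
    {"Standardized": clf.coef_[0], "Raw_units": clf.coef_[0] / scaler.scale_},
    index=X_synth.columns,
)
print(coef_table)
```

The `Raw_units` column shows why unscaled coefficients mislead: the per-unit coefficient for the income-like feature is tiny simply because one unit of income is tiny, even though its standardized effect is similar in size to that of the count feature. In the notebook, the same pipeline could be fitted on `X_train` and `y_train`.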

### 6. Conclusion

This session completed our analytical workflow. We have:
1.  Prepared our cleaned data for a machine learning task.
2.  Built a baseline Logistic Regression model and evaluated its accuracy on a held-out test set.
3.  Interpreted the model's coefficients and checked that the features we identified during our EDA (`Total_Spend`, `Income`, `Recency`, `Children`) are associated with campaign response in the same direction in the model.

This provides a useful final cross-check of our analysis.

**Next Session:** We will focus on structuring all of our findings into a final presentation for the business stakeholders.