# Module 8: Capstone Project

## Capstone Project: End-to-End Marketing Analytics
- Define a business problem
- Collect and clean data
- Perform EDA and segmentation
- Build predictive models
- Present actionable insights

## Practice Exercise
1. Complete an end-to-end marketing analytics project using a real or simulated dataset.
2. Present your findings and recommendations.

## Example Dataset

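We'll use the sample CSV file (`marketing_sample.csv`), which contains customer demographics, campaign responses, purchases, and review text. This file is available in your workspace. The loading code below reads it from the notebook's parent directory (`../marketing_sample.csv`), so adjust the path if your copy is stored elsewhere.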

In [None]:
# Step 1: Load and Explore Data
import pandas as pd

df = pd.read_csv('../marketing_sample.csv')
df.head()

## Step 2: Data Cleaning & EDA
- Check for missing values
- Summary statistics
- Visualize key variables

In [None]:
# Check for missing values
print(df.isnull().sum())

# Summary statistics
print(df.describe())

# Visualize distribution of Income (drop missing values first, since NaN breaks the histogram binning)
import matplotlib.pyplot as plt
plt.hist(df['Income'].dropna(), bins=5, color='skyblue', edgecolor='black')
plt.title('Income Distribution')
plt.xlabel('Income')
plt.ylabel('Frequency')
plt.show()
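
The cell above only reports missing values; it does not handle them. As a minimal sketch of one common option, the example below fills numeric gaps with each column's median. The `toy` DataFrame and its values are made up for illustration and are not taken from `marketing_sample.csv`; whether imputing or dropping rows is appropriate depends on your data.

```python
import numpy as np
import pandas as pd

# Toy data standing in for marketing_sample.csv; the values are made up.
toy = pd.DataFrame({
    'Age': [25, 34, np.nan, 52, 41],
    'Income': [40000, np.nan, 58000, 91000, 62000],
    'Purchases': [2, 5, 3, np.nan, 4],
})

# Share of missing values per column helps decide between dropping and imputing.
missing_share = toy.isnull().mean()
print(missing_share)

# Fill numeric gaps with each column's median, which is less sensitive to outliers than the mean.
cleaned = toy.fillna(toy.median(numeric_only=True))
print(cleaned)
```

If only a handful of rows are affected, `toy.dropna()` is a simpler alternative.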

## Step 3: Segmentation Example (RFM)
- RFM segments customers by Recency, Frequency, and Monetary value. Here we simulate it with Purchases and Income; the code below starts with a simpler split on Purchases alone.

In [None]:
# Simple segmentation: customers above the median purchase count are 'High Value'
import numpy as np

purchase_threshold = df['Purchases'].median()
df['Segment'] = np.where(df['Purchases'] > purchase_threshold, 'High Value', 'Low Value')
print(df['Segment'].value_counts())
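
For comparison, here is a rough sketch of how fuller RFM scoring could look when a transaction log is available. The `tx` table, its column names (`customer_id`, `order_date`, `amount`), and all values are hypothetical; they are not part of the course dataset. Scores use two bins per dimension only because the toy table is tiny.

```python
import pandas as pd

# Hypothetical transaction log; columns and values are illustrative only.
tx = pd.DataFrame({
    'customer_id': [1, 1, 2, 3, 3, 3, 4],
    'order_date': pd.to_datetime([
        '2024-01-05', '2024-03-01', '2023-11-20',
        '2024-02-10', '2024-02-25', '2024-03-10', '2023-09-15',
    ]),
    'amount': [50.0, 70.0, 20.0, 200.0, 150.0, 100.0, 30.0],
})

# Reference date: one day after the most recent order in the log.
snapshot = tx['order_date'].max() + pd.Timedelta(days=1)

rfm = tx.groupby('customer_id').agg(
    last_order=('order_date', 'max'),
    frequency=('order_date', 'count'),
    monetary=('amount', 'sum'),
)
rfm['recency'] = (snapshot - rfm['last_order']).dt.days

# Rank first so ties cannot produce duplicate bin edges, then split into two bins.
# A lower recency (more recent order) earns the higher score.
rfm['R'] = pd.qcut(rfm['recency'].rank(method='first'), 2, labels=[2, 1]).astype(int)
rfm['F'] = pd.qcut(rfm['frequency'].rank(method='first'), 2, labels=[1, 2]).astype(int)
rfm['M'] = pd.qcut(rfm['monetary'].rank(method='first'), 2, labels=[1, 2]).astype(int)
rfm['RFM_score'] = rfm['R'] + rfm['F'] + rfm['M']

print(rfm.sort_values('RFM_score', ascending=False))
```

With more customers, four or five bins per dimension is a common choice.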

## Step 4: Predictive Modeling
- Build a logistic regression model to predict campaign response

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

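# Prepare features and target, dropping rows with missing values
# because LogisticRegression cannot handle NaN
features = ['Age', 'Income', 'Purchases', 'Campaign_Contacted']
model_df = df[features + ['Response']].dropna()
X = model_df[features]
y = model_df['Response']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model; a higher max_iter helps the solver converge on unscaled features such as Income
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)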

# Predict
y_pred = model.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))
print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred))
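
Accuracy alone can mislead when few customers respond, so it may help to look at precision and recall per class as well. The sketch below also scales the features before fitting, which tends to matter for logistic regression when columns such as Income and Age sit on very different scales. The `demo` data is synthetic, and the link between Purchases and Response is invented purely so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the course data; the relationship below is invented.
rng = np.random.default_rng(42)
n = 400
demo = pd.DataFrame({
    'Age': rng.integers(18, 70, n),
    'Income': rng.normal(60000, 15000, n),
    'Purchases': rng.poisson(4, n),
})
# Responders are made more likely among frequent purchasers.
prob = 1 / (1 + np.exp(-(demo['Purchases'] - 4)))
demo['Response'] = (rng.random(n) < prob).astype(int)

X_demo = demo[['Age', 'Income', 'Purchases']]
y_demo = demo['Response']

# stratify keeps the responder share similar in the train and test sets.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# The pipeline fits the scaler on the training data only, avoiding leakage.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(Xd_train, yd_train)

report = classification_report(yd_test, pipe.predict(Xd_test), zero_division=0)
print(report)
```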

## Step 5: Present Insights
- Summarize key findings from your analysis and model
- Make recommendations for the marketing team
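
One way to turn results into a finding the marketing team can act on is a response-rate table by segment. This is a minimal sketch: the `results` records are made up, shaped like the `Segment` and `Response` columns used earlier, and `lift_vs_overall` is a column name introduced here for illustration.

```python
import pandas as pd

# Made-up records shaped like the notebook's Segment and Response columns.
results = pd.DataFrame({
    'Segment': ['High Value', 'High Value', 'High Value', 'High Value',
                'Low Value', 'Low Value', 'Low Value', 'Low Value'],
    'Response': [1, 1, 1, 0, 1, 0, 0, 0],
})

overall_rate = results['Response'].mean()

# Count customers and responders, and compute the response rate per segment.
summary = results.groupby('Segment')['Response'].agg(
    customers='count', responders='sum', response_rate='mean'
)

# Lift above 1 means the segment responds more often than the average customer.
summary['lift_vs_overall'] = summary['response_rate'] / overall_rate
print(summary.round(2))
```

A statement such as "segment X responds at N times the overall rate" is usually easier for a non-technical audience to act on than a raw model metric.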

### Your Turn
- Try different segmentation strategies (e.g., by age or income)
- Tune the model or try other algorithms
- Create visualizations to support your recommendations
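
As a starting point for trying other algorithms, the sketch below compares two classifiers with 5-fold cross-validation. It runs on synthetic data from `make_classification`, so the scores say nothing about the marketing dataset; swap in your own `X` and `y`. Random forest is just one possible alternative, chosen here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data; replace with your own features and target.
X_syn, y_syn = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0, random_state=0
)

candidates = {
    'logistic_regression': make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    'random_forest': RandomForestClassifier(n_estimators=100, random_state=0),
}

# Cross-validation averages performance over several splits,
# which is more stable than a single train/test split.
cv_scores = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X_syn, y_syn, cv=5, scoring='roc_auc')
    cv_scores[name] = scores.mean()
    print(f'{name}: mean ROC AUC = {scores.mean():.3f}')
```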