# Data Workflow – Exploratory Data Analysis

**Name:** Roberto Torres  
**Dataset:** Marketing Campaign Performance Dataset (Kaggle)  

This project explores marketing campaign performance metrics to understand relationships between spend, engagement, and conversion outcomes using reproducible EDA practices.


In [None]:
# Optional: Kaggle download (requires kaggle.json configured)
# !pip install kaggle
# !kaggle datasets download -d manishabhatt22/marketing-campaign-performance-dataset
# !unzip marketing-campaign-performance-dataset.zip

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid")


In [None]:
# Load dataset
df = pd.read_csv("marketing_campaign_dataset.csv")
df.head()


## Data Cleaning Functions

In [None]:

def standardize_column_names(df):
    """
    Standardize column names: trim surrounding whitespace, lowercase,
    and replace spaces with underscores.
    """
    df = df.copy()  # avoid mutating the caller's DataFrame
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    return df

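def coerce_numeric_and_handle_missing(df):
    """
    Convert numeric-looking text columns to numbers, then fill missing
    numeric values with each column's median.
    """
    df = df.copy()  # avoid mutating the caller's DataFrame
    for col in df.select_dtypes(include=["object"]).columns:
        try:
            df[col] = pd.to_numeric(df[col])
        except (ValueError, TypeError):
            # Column holds non-numeric text (e.g. category labels); leave it unchanged
            pass

    # Impute every numeric dtype, not only int64 and float64
    for col in df.select_dtypes(include="number").columns:
        df[col] = df[col].fillna(df[col].median())

    return df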

df = standardize_column_names(df)
df = coerce_numeric_and_handle_missing(df)

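As a rough check on the cleaning step above, the sketch below applies the same idea (numeric coercion followed by median imputation) to a small toy frame and reports how many values were filled in per column. The toy data and the helper name `impute_report` are illustrative assumptions, not part of the dataset or of the functions defined in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names mimic the standardized names used in this notebook.
toy = pd.DataFrame({
    "campaign_type": ["email", "social", "email", "search"],
    "spend": ["100", "250", None, "400"],        # numbers stored as text
    "conversions": [10.0, np.nan, 30.0, 50.0],
})

def impute_report(frame):
    """Return a cleaned copy plus the number of values imputed per numeric column."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        converted = pd.to_numeric(out[col], errors="coerce")
        # Accept the conversion only if coercion created no new missing values,
        # i.e. every non-missing entry really was a number.
        if converted.isna().sum() == out[col].isna().sum():
            out[col] = converted
    numeric_cols = out.select_dtypes(include="number").columns
    imputed = out[numeric_cols].isna().sum()
    # Fill each numeric column with its own median
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out, imputed

cleaned, imputed_counts = impute_report(toy)
print(imputed_counts)
print(cleaned)
```

Reporting the imputed counts alongside the cleaned frame makes it easier to judge whether median imputation is touching a handful of cells or a large share of a column.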

## Exploratory Data Analysis

In [None]:

def campaign_eda(df):
    """
    Perform summary statistics, grouped analysis, and correlation checks.
    """
    summary = df.describe()
    grouped = df.groupby("campaign_type").mean(numeric_only=True)
    correlations = df.corr(numeric_only=True)
    return summary, grouped, correlations

summary_stats, grouped_stats, corr = campaign_eda(df)
summary_stats


In [None]:
grouped_stats

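The correlation matrix returned by `campaign_eda` can be hard to scan when there are many metrics. One possible follow-up, sketched here on made-up toy values, is to flatten the matrix into a list of column pairs ranked by absolute correlation. The helper name `rank_correlations` and the toy numbers are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def rank_correlations(frame):
    """List each pair of numeric columns once, sorted by absolute Pearson correlation."""
    corr = frame.corr(numeric_only=True)
    cols = corr.columns
    # Indices of the strict upper triangle, so each pair appears exactly once
    row_idx, col_idx = np.triu_indices(len(cols), k=1)
    pairs = pd.DataFrame({
        "feature_a": cols[row_idx],
        "feature_b": cols[col_idx],
        "correlation": corr.to_numpy()[row_idx, col_idx],
    })
    return pairs.sort_values(
        "correlation", key=lambda s: s.abs(), ascending=False
    ).reset_index(drop=True)

# Hypothetical toy data, not taken from the Kaggle dataset
toy = pd.DataFrame({
    "campaign_type": ["email", "social", "search", "email", "social"],
    "spend": [100, 200, 300, 400, 500],
    "conversions": [12, 18, 33, 41, 52],
    "roi": [5.0, 4.1, 3.9, 2.5, 2.0],
})

ranked = rank_correlations(toy)
print(ranked)
```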
## Visualizations

In [None]:

plt.figure(figsize=(6,4))
sns.scatterplot(data=df, x="spend", y="conversions")
plt.title("Spend vs Conversions")
plt.xlabel("Spend")
plt.ylabel("Conversions")
plt.show()


In [None]:

plt.figure(figsize=(6,4))
sns.barplot(data=df, x="campaign_type", y="roi")
plt.title("ROI by Campaign Type")
plt.xlabel("Campaign Type")
plt.ylabel("ROI")
plt.show()

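seaborn's `barplot` draws the mean of `roi` for each campaign type together with an error bar (a bootstrapped 95% confidence interval by default). A numeric companion to the chart can be sketched with pandas, here using the standard error of the mean as a simpler stand-in for the bootstrap interval. The toy values below are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data; only the column names follow this notebook
toy = pd.DataFrame({
    "campaign_type": ["email"] * 3 + ["social"] * 3,
    "roi": [2.0, 4.0, 6.0, 1.0, 1.5, 2.0],
})

# Mean, standard error of the mean, and group size per campaign type
roi_summary = toy.groupby("campaign_type")["roi"].agg(["mean", "sem", "count"])
print(roi_summary)
```

Showing the group size next to the mean helps flag campaign types whose bars rest on very few observations.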

In [None]:

plt.figure(figsize=(8,6))
sns.heatmap(corr, cmap="coolwarm", annot=False)
plt.title("Correlation Heatmap of Campaign Metrics")
plt.show()


## Summary and Interpretation

The analysis points to a relationship between marketing spend and conversion outcomes, with notable variation in ROI across campaign types. Some campaign types achieve higher efficiency despite lower spend, which suggests room for budget optimization. Limitations include the lack of temporal data and potential imbalance across campaign categories, which makes averages for small groups less reliable.

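One way to make the efficiency comparison concrete is to compute conversions per unit of spend for each campaign type. This is a minimal sketch on invented toy data: "efficiency" is defined here as total conversions divided by total spend, which is an assumption and only one of several reasonable definitions.

```python
import pandas as pd

# Hypothetical toy data; column names follow those used in the plots above
toy = pd.DataFrame({
    "campaign_type": ["email", "email", "social", "social", "search", "search"],
    "spend": [100.0, 150.0, 400.0, 600.0, 300.0, 200.0],
    "conversions": [20, 30, 40, 60, 30, 20],
})

efficiency = (
    toy.groupby("campaign_type")
    .agg(total_spend=("spend", "sum"), total_conversions=("conversions", "sum"))
    # Assumed efficiency metric: conversions obtained per unit of spend
    .assign(conversions_per_spend=lambda t: t["total_conversions"] / t["total_spend"])
    .sort_values("conversions_per_spend", ascending=False)
)
print(efficiency)
```

In this toy example the `email` group spends the least yet converts at the highest rate per unit of spend, which is the kind of pattern the summary describes.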