# Consumer Behavior – Ads Engagement Analysis (Notebook)

**Goal:** Test whether **device** is associated with higher engagement with ads.

**Dataset:** `Ecommerce_Consumer_Behavior_Analysis_Data.csv`

> This notebook mirrors the Python script and adds quick EDA, a baseline vs model check, and a clean figure saved as `ads_by_device.png` for the README.

## 0. Setup
- Imports
- Display options
- Random seeds

In [13]:
import warnings

warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

pd.set_option("display.max_columns", 50)
pd.set_option("display.width", 120)
np.random.seed(42)


## 1. Load & Quick Inspect

In [14]:
CSV_PATH = "Ecommerce_Consumer_Behavior_Analysis_Data.csv"
df = pd.read_csv(CSV_PATH)

print("Rows, Cols:", df.shape)
display(df.head())
display(df.describe(include="all"))
print("\nMissing Engagement (raw):", df["Engagement_with_Ads"].isna().sum())

Rows, Cols: (1000, 28)


Unnamed: 0,Customer_ID,Age,Gender,Income_Level,Marital_Status,Education_Level,Occupation,Location,Purchase_Category,Purchase_Amount,Frequency_of_Purchase,Purchase_Channel,Brand_Loyalty,Product_Rating,Time_Spent_on_Product_Research(hours),Social_Media_Influence,Discount_Sensitivity,Return_Rate,Customer_Satisfaction,Engagement_with_Ads,Device_Used_for_Shopping,Payment_Method,Time_of_Purchase,Discount_Used,Customer_Loyalty_Program_Member,Purchase_Intent,Shipping_Preference,Time_to_Decision
0,37-611-6911,22,Female,Middle,Married,Bachelor's,Middle,Évry,Gardening & Outdoors,$333.80,4,Mixed,5,5,2.0,,Somewhat Sensitive,1,7,,Tablet,Credit Card,3/1/2024,True,False,Need-based,No Preference,2
1,29-392-9296,49,Male,High,Married,High School,High,Huocheng,Food & Beverages,$222.22,11,In-Store,3,1,2.0,Medium,Not Sensitive,1,5,High,Tablet,PayPal,4/16/2024,True,False,Wants-based,Standard,6
2,84-649-5117,24,Female,Middle,Single,Master's,High,Huzhen,Office Supplies,$426.22,2,Mixed,5,5,0.3,Low,Not Sensitive,1,7,Low,Smartphone,Debit Card,3/15/2024,True,True,Impulsive,No Preference,3
3,48-980-6078,29,Female,Middle,Single,Master's,Middle,Wiwilí,Home Appliances,$101.31,6,Mixed,3,1,1.0,High,Somewhat Sensitive,0,1,,Smartphone,Other,10/4/2024,True,True,Need-based,Express,10
4,91-170-9072,33,Female,Middle,Widowed,High School,Middle,Nara,Furniture,$211.70,6,Mixed,3,4,0.0,Medium,Not Sensitive,2,10,,Smartphone,Debit Card,1/30/2024,False,False,Wants-based,No Preference,4


Unnamed: 0,Customer_ID,Age,Gender,Income_Level,Marital_Status,Education_Level,Occupation,Location,Purchase_Category,Purchase_Amount,Frequency_of_Purchase,Purchase_Channel,Brand_Loyalty,Product_Rating,Time_Spent_on_Product_Research(hours),Social_Media_Influence,Discount_Sensitivity,Return_Rate,Customer_Satisfaction,Engagement_with_Ads,Device_Used_for_Shopping,Payment_Method,Time_of_Purchase,Discount_Used,Customer_Loyalty_Program_Member,Purchase_Intent,Shipping_Preference,Time_to_Decision
count,1000,1000.0,1000,1000,1000,1000,1000,1000,1000,1000,1000.0,1000,1000.0,1000.0,1000.0,753,1000,1000.0,1000.0,744,1000,1000,1000,1000,1000,1000,1000,1000.0
unique,1000,,8,2,4,3,2,969,24,989,,3,,,,3,3,,,3,3,5,344,2,2,4,3,
top,48-203-9118,,Female,High,Widowed,Bachelor's,High,Oslo,Electronics,$253.37,,Mixed,,,,High,Very Sensitive,,,High,Desktop,PayPal,3/3/2024,True,False,Need-based,No Preference,
freq,1,,452,515,260,341,517,4,54,2,,340,,,,268,350,,,270,350,219,8,521,509,256,372,
mean,,34.304,,,,,,,,,6.945,,3.026,3.033,1.01303,,,0.954,5.399,,,,,,,,,7.547
std,,9.353238,,,,,,,,,3.147361,,1.416803,1.436654,0.791802,,,0.810272,2.868454,,,,,,,,,4.035849
min,,18.0,,,,,,,,,2.0,,1.0,1.0,0.0,,,0.0,1.0,,,,,,,,,1.0
25%,,26.0,,,,,,,,,4.0,,2.0,2.0,0.0,,,0.0,3.0,,,,,,,,,4.0
50%,,34.5,,,,,,,,,7.0,,3.0,3.0,1.0,,,1.0,5.0,,,,,,,,,8.0
75%,,42.0,,,,,,,,,10.0,,4.0,4.0,2.0,,,2.0,8.0,,,,,,,,,11.0



Missing Engagement (raw): 256


## 2. Cleaning
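- Normalize engagement labels (strip whitespace, title-case)
- Map to a numeric score (None=0, Low=1, Medium=2, High=3)
- Keep missing values as NA (no `None` label survives loading, so score 0 never occurs in the mapped column)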

In [15]:
ENGAGEMENT_MAP = {"None": 0, "Low": 1, "Medium": 2, "High": 3}
consumer_data = df.copy()
consumer_data["Engagement_with_Ads"] = (
    consumer_data["Engagement_with_Ads"].astype("string").str.strip().str.title()
)
consumer_data["Engagement_with_Ads_Score"] = consumer_data["Engagement_with_Ads"].map(
    ENGAGEMENT_MAP
)

print(
    "Unique engagement (cleaned):",
    consumer_data["Engagement_with_Ads"].dropna().unique(),
)
print(
    "Missing after mapping (true blanks only):",
    consumer_data["Engagement_with_Ads_Score"].isna().sum(),
)

Unique engagement (cleaned): <StringArray>
['High', 'Low', 'Medium']
Length: 3, dtype: string
Missing after mapping (true blanks only): 256
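`ENGAGEMENT_MAP` has an entry for `"None"`, yet the cleaned labels contain only High, Low, and Medium. One possible explanation, not confirmed by this notebook, is that the raw file stores the text `None` and `pd.read_csv` converts it to NaN, since recent pandas versions list `"None"` among the default NA strings. The sketch below shows one way to check. The helper name `read_keeping_literal_none` and the toy CSV are made up for illustration.

```python
import io

import pandas as pd


def read_keeping_literal_none(source):
    """Read a CSV so that only empty fields become NaN.

    keep_default_na=False turns off pandas' built-in NA strings, so the
    text "None" is kept as an ordinary string; na_values=[""] restores
    NaN for truly empty fields.
    """
    return pd.read_csv(source, keep_default_na=False, na_values=[""])


# Toy CSV, made up purely for illustration (not rows from the dataset).
toy_csv = io.StringIO(
    "Engagement_with_Ads,Device_Used_for_Shopping\n"
    "High,Tablet\n"
    "None,Desktop\n"
    ",Smartphone\n"
)
toy = read_keeping_literal_none(toy_csv)
print(toy["Engagement_with_Ads"].value_counts(dropna=False))
```

Calling the same helper on `CSV_PATH` and comparing `isna().sum()` against the 256 reported above would show how many of the missing values are literal `None` entries rather than blanks. If most are, the device means in the next section exclude the lowest engagement level.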


## 3. EDA (Device & Engagement)
- Device distribution
- Mean engagement by device

In [16]:
# 3) EDA — device counts and mean engagement by device
device_counts = consumer_data["Device_Used_for_Shopping"].value_counts(dropna=False)
display(device_counts.to_frame(name="count"))

device_ads = (
    consumer_data.groupby("Device_Used_for_Shopping")["Engagement_with_Ads_Score"]
    .mean()
    .sort_values(ascending=False)
)
display(device_ads.to_frame(name="avg_engagement"))

Device_Used_for_Shopping,count
Desktop,350
Tablet,339
Smartphone,311


Device_Used_for_Shopping,avg_engagement
Desktop,2.145038
Smartphone,2.012821
Tablet,1.995968
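The group means are descriptive only. One way to test the association named in the goal is a chi-square test of independence on the device-by-label contingency table. This technique is swapped in for illustration and is not part of the original script. The function name `device_engagement_test` and the toy frame are hypothetical, and the sketch assumes SciPy is installed.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def device_engagement_test(
    frame,
    device_col="Device_Used_for_Shopping",
    label_col="Engagement_with_Ads",
):
    """Chi-square test of independence between device and engagement label."""
    # Drop rows where either column is missing before counting.
    clean = frame[[device_col, label_col]].dropna()
    table = pd.crosstab(clean[device_col], clean[label_col])
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return table, chi2, p_value, dof


# Toy frame, made up purely to show the call (not rows from the dataset).
toy = pd.DataFrame(
    {
        "Device_Used_for_Shopping": ["Desktop"] * 6 + ["Tablet"] * 6 + ["Smartphone"] * 6,
        "Engagement_with_Ads": ["High", "High", "High", "Medium", "Low", "Low"]
        + ["High", "Medium", "Medium", "Medium", "Low", "Low"]
        + ["High", "Medium", "Low", "Low", "Low", "Low"],
    }
)
table, chi2, p_value, dof = device_engagement_test(toy)
print(table)
print(f"chi2={chi2:.3f}, dof={dof}, p={p_value:.3f}")
```

In the notebook the call would be `device_engagement_test(consumer_data)`. A large p-value would be consistent with the small gaps between the device means.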


## 4. Visualization – Save `ads_by_device.png`
Horizontal bars, value labels, grid, fixed 0–3 score axis.

In [17]:
s = device_ads.dropna().sort_values(ascending=True)
fig, ax = plt.subplots(figsize=(9, 5), dpi=150)
bars = ax.barh(s.index.astype(str), s.values)
ax.bar_label(bars, labels=[f"{v:.2f}" for v in s.values], padding=4)
ax.set_title(
    "Average Engagement with Ads by Device\n(Score: 0=None, …, 3=High)", pad=10
)
ax.set_xlabel("Average Engagement Score")
ax.set_ylabel("Device")
ax.set_xlim(0, 3)
ax.grid(axis="x", linestyle="--", alpha=0.3)
plt.tight_layout()
plt.savefig("ads_by_device.png", bbox_inches="tight")
plt.close(fig)
print("Saved figure: ads_by_device.png")

Saved figure: ads_by_device.png


## 5. Model – Device-only Logistic Regression
Binary target: **High** vs. not High, using the 744 rows with a non-missing engagement label. Compare to a majority-class **baseline**.

In [None]:
# Prepare ML frame
ml = consumer_data[consumer_data["Engagement_with_Ads"].notna()].copy()
ml["Engagement_High"] = (ml["Engagement_with_Ads"] == "High").astype(int)
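# One-hot encode device; drop_first=True drops the first level (Desktop),
# so the Smartphone and Tablet coefficients are relative to Desktop.
X = pd.get_dummies(ml["Device_Used_for_Shopping"], drop_first=True)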
y = ml["Engagement_High"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)

# Baseline: share of the majority class in y_train. This is a training-split
# proportion, so it only approximates majority-class accuracy on y_test.
baseline = max(y_train.mean(), 1 - y_train.mean())

print(f"Baseline accuracy (majority class): {baseline:.3f}")
print(f"Model accuracy: {acc:.3f}")
print("\nCoefficients (aligned to columns):")
display(pd.Series(model.coef_[0], index=X.columns).sort_values(ascending=False))
print("\nClassification report:")
print(classification_report(y_test, y_pred, digits=3))

Baseline accuracy (majority class): 0.635
Model accuracy: 0.644

Coefficients (aligned to columns):


Tablet       -0.331423
Smartphone   -0.343972
dtype: float64


Classification report:
              precision    recall  f1-score   support

           0      0.644     1.000     0.784        96
           1      0.000     0.000     0.000        53

    accuracy                          0.644       149
   macro avg      0.322     0.500     0.392       149
weighted avg      0.415     0.644     0.505       149
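The baseline printed above is the majority-class share in the training split, while the model is scored on the test split. A like-for-like comparison scores the baseline on the same test rows. The sketch below uses scikit-learn's `DummyClassifier` for this. The helper name `majority_baseline_accuracy` is made up for illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy on y_test of always predicting the most frequent class in y_train."""
    dummy = DummyClassifier(strategy="most_frequent")
    # DummyClassifier ignores feature values, so a placeholder column is enough.
    dummy.fit(np.zeros((len(y_train), 1)), y_train)
    y_pred = dummy.predict(np.zeros((len(y_test), 1)))
    return accuracy_score(y_test, y_pred)
```

In the notebook the call would be `majority_baseline_accuracy(y_train, y_test)`. Assuming "not High" is the training majority, as the all-zero predictions suggest, the support counts in the classification report give 96 / 149 ≈ 0.644, the same as the model's accuracy.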



## Takeaways

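- **Baseline (majority-class share in the training split): 0.635**  
- **Model accuracy (device-only, test split): 0.644**

**What it means.** Device alone provides **no usable predictive signal** for “High” engagement at the default 0.5 threshold. The classification report shows that the model predicts “not High” for every test row (recall 0.000 for class 1), so its 0.644 accuracy is simply the share of “not High” rows in the test split (96 of 149). The 0.635 baseline was measured on the training split, so the apparent lift of 0.009 comes from comparing across splits, not from the model.  
**Averages by device** do differ (see `ads_by_device.png`): Desktop averages 2.15 against 2.01 for Smartphone and 2.00 for Tablet, and both non-Desktop coefficients are negative. The gap is too small to make device a strong standalone predictor.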

**Practical note.** Treat device as a **supporting feature**. For meaningful lift, combine it with richer signals such as visit frequency, region, and campaign/copy.

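**Next step ideas.** Add features and re-run: frequency of purchase, purchase date, region, or campaign type (the last is not a column in this dataset and would need to be joined in). Compare logistic regression to a tree-based model, and report recall and F1 for the “High” class alongside accuracy, since accuracy alone cannot distinguish a useful model from one that always predicts the majority class.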
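A minimal sketch of that comparison follows, assuming scikit-learn is available. The helper `compare_models`, the choice of a random forest as the tree-based model, and balanced accuracy as the metric are illustrative choices, not part of the original script.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder


def compare_models(frame, categorical, numeric, target, cv=5):
    """Return mean cross-validated balanced accuracy for two classifiers."""
    # One-hot encode the categorical columns; numeric columns pass through as-is.
    encode = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough",
    )
    X = frame[categorical + numeric]
    y = frame[target]
    candidates = {
        "logistic": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    }
    scores = {}
    for name, clf in candidates.items():
        # cross_val_score clones the pipeline for every fold.
        pipeline = make_pipeline(encode, clf)
        fold_scores = cross_val_score(
            pipeline, X, y, cv=cv, scoring="balanced_accuracy"
        )
        scores[name] = fold_scores.mean()
    return scores
```

With the `ml` frame from section 5, a call might look like `compare_models(ml, ["Device_Used_for_Shopping"], ["Frequency_of_Purchase"], "Engagement_High")`. Balanced accuracy is 0.5 for a model that always predicts the majority class, which makes an all-majority predictor easy to spot.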

