### Methodology

We infer demographic attributes (age band, gender) from behavioral signals observed in transactions. In the absence of ground-truth demographics, we treat purchase behavior as a proxy and frame the problem as probabilistic evidence aggregation over user events.

- **Assumptions**
  - Behavioral patterns (what, how often, when, and how much) correlate with demographics at population level.
  - Category- and brand-level priors approximate demographic propensity and can be refined as labeled data becomes available.
  - Signals are weak individually but informative in aggregate across a user’s history.

- **Scope**
  - Outputs: age buckets (`<25`, `25–40`, `40+`) and gender (`male`, `female`) with associated probabilities and a confidence score.
  - Inputs: product category/brand, order value, purchase frequency/recency, time-of-day/day-of-week.

#### Step 1: Identify Predictive Signals

- **Category and Brand Affinity**
  - Kids/baby categories → higher likelihood of `25–40`.
  - Beauty/apparel → higher likelihood of female; peak in `25–40`.
  - Consumer electronics → skew male; `<40` overall.
  - Health/grocery → relatively stronger `40+` signal.

- **Spend and Frequency Patterns**
  - Higher average order value (AOV) and lower frequency → older/wealthier skew.
  - Lower AOV and higher frequency → younger skew.

- **Temporal Behavior**
  - Late-night/weekend skew → `<25` uplift.
  - Daytime/weekday shopping → family/working-age skew.

These signals are treated as weak learners that contribute weighted evidence rather than hard rules.
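
The spend, frequency, and temporal signals listed above could be derived per user roughly as follows. This is a minimal sketch, not the notebook's pipeline: the field names (`user_id`, `order_value`, `timestamp`), the sample records, and the 22:00 to 05:00 late-night window are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transaction records; field names are assumptions for illustration.
transactions = [
    {"user_id": "u1", "order_value": 20.0, "timestamp": "2024-03-02T23:30:00"},
    {"user_id": "u1", "order_value": 40.0, "timestamp": "2024-03-03T01:10:00"},
    {"user_id": "u2", "order_value": 300.0, "timestamp": "2024-03-05T11:00:00"},
]

def user_signals(rows, night_start=22, night_end=5):
    """Aggregate per-user order count, AOV, late-night share, and weekend share."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["user_id"]].append(row)

    signals = {}
    for uid, events in grouped.items():
        times = [datetime.fromisoformat(e["timestamp"]) for e in events]
        n = len(events)
        # Late night wraps around midnight: hour >= 22 or hour < 5 (assumed window)
        late = sum(1 for t in times if t.hour >= night_start or t.hour < night_end)
        weekend = sum(1 for t in times if t.weekday() >= 5)  # Saturday=5, Sunday=6
        signals[uid] = {
            "n_orders": n,
            "aov": sum(e["order_value"] for e in events) / n,
            "late_night_share": late / n,
            "weekend_share": weekend / n,
        }
    return signals

signals = user_signals(transactions)
```

Each share can then be bucketed (for example, into a spend band or a late-night band) and mapped to its own prior, in the same way categories are.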


#### Step 2: Define Prior Weights (Evidence Model)

For each category/brand, we maintain priors over demographics that reflect propensity at population level. Examples:

- Electronics → Age: {`<25`: 0.40, `25–40`: 0.45, `40+`: 0.15}; Gender: {`male`: 0.60, `female`: 0.40}
- Beauty → Age: {`<25`: 0.35, `25–40`: 0.45, `40+`: 0.20}; Gender: {`male`: 0.20, `female`: 0.80}

These priors are initialized using domain knowledge/market research and are updated as labeled data arrives (e.g., via Bayesian updating or calibration). Unmapped categories back off to neutral distributions with smoothing to avoid overconfidence.
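
One way the Bayesian updating and smoothing mentioned above might look is a Dirichlet-style pseudo-count update. This is a sketch under stated assumptions: `update_prior`, `smooth_toward_uniform`, the `prior_strength` of 20 pseudo-observations, and the labeled counts are hypothetical choices, not values from this analysis.

```python
def update_prior(prior, label_counts, prior_strength=20.0):
    """Treat the prior as pseudo-counts and add observed label counts."""
    # prior_strength is an assumed number of pseudo-observations behind the prior
    pseudo = {k: prior_strength * p for k, p in prior.items()}
    posterior = {k: pseudo[k] + label_counts.get(k, 0) for k in prior}
    total = sum(posterior.values())
    return {k: v / total for k, v in posterior.items()}

def smooth_toward_uniform(dist, alpha=0.1):
    """Mix a distribution with the uniform distribution to limit overconfidence."""
    uniform = 1.0 / len(dist)
    return {k: (1 - alpha) * p + alpha * uniform for k, p in dist.items()}

electronics_gender = {"male": 0.60, "female": 0.40}
# Hypothetical labeled sample: 30 male and 50 female electronics buyers
updated = update_prior(electronics_gender, {"male": 30, "female": 50})
smoothed = smooth_toward_uniform({"male": 1.0, "female": 0.0}, alpha=0.1)
```

A larger `prior_strength` makes the domain-knowledge prior harder to move; a smaller one lets labeled data dominate sooner.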


#### Step 3: Probabilistic Scoring and Aggregation

For each user:
1) Collect all transactions and map each event to demographic priors (category/brand/time-of-day/spend band).
2) Convert priors to additive evidence (e.g., log-probabilities or weighted counts) and sum across events.
3) Normalize to obtain posterior probabilities per class. Apply smoothing to prevent dominance by sparse signals.
4) Output the argmax class and a confidence score (e.g., 1 − normalized entropy, or the margin between the top-1 and top-2 probabilities).

Interpretation: a user with sustained beauty/apparel purchases and moderate AOV will have uplift for `female, 25–40`, while occasional grocery purchases add mild `40+` weight without overriding consistent signals.
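
The log-probability variant of the steps above can be sketched as follows. It assumes events are independent and that no class base rate is applied, and it is not the same as adding raw probabilities as weighted counts: summing logs multiplies the evidence, so consistent signals sharpen the posterior faster. The helper names `aggregate` and `confidence` and the `eps` smoothing constant are assumptions for illustration.

```python
import math

def aggregate(event_priors, classes, eps=1e-6):
    """Sum log-probabilities across events, then normalize with a softmax."""
    log_scores = {c: 0.0 for c in classes}
    for prior in event_priors:
        for c in classes:
            # eps keeps log() finite if a prior assigns zero to a class
            log_scores[c] += math.log(prior.get(c, 0.0) + eps)
    peak = max(log_scores.values())  # subtract the max for numerical stability
    exp_scores = {c: math.exp(s - peak) for c, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}

def confidence(posterior):
    """Return (top-1 minus top-2 margin, 1 - normalized entropy)."""
    ranked = sorted(posterior.values(), reverse=True)
    margin = ranked[0] - ranked[1]
    entropy = -sum(p * math.log(p) for p in posterior.values() if p > 0)
    return margin, 1.0 - entropy / math.log(len(posterior))

beauty = {"male": 0.2, "female": 0.8}
electronics = {"male": 0.6, "female": 0.4}
# Three beauty purchases and one electronics purchase
posterior = aggregate([beauty, beauty, beauty, electronics], ["male", "female"])
prediction = max(posterior, key=posterior.get)
margin, certainty = confidence(posterior)
```

Here the single electronics purchase lowers the `female` probability slightly but does not override the three beauty purchases, which matches the interpretation above.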

#### Step 4: Validation and Calibration Plan

When ground truth becomes available:
- **Classification quality**: Precision/Recall/F1 (gender), per-class recall and macro metrics (age buckets), confusion matrix.
- **Calibration**: Reliability curves, Expected Calibration Error (ECE), Brier score; apply temperature scaling, Platt scaling, or isotonic regression if scores are miscalibrated.
- **Temporal robustness**: Time-based cross-validation to detect drift.
- **Ablations**: Contribution of each signal family (category, spend, temporal).
- **Business lift**: Uplift in campaign metrics vs. random/heuristic baselines.
- **Fairness**: Bias/parity checks where labels permit, with documented mitigations.
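
For the calibration checks in this plan, the Brier score and a top-label ECE could be computed along these lines once labels exist. The probabilities and labels below are made-up placeholders, and the equal-width binning with `n_bins=10` is one common convention rather than a requirement.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted P(positive) and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, hit))
    n = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(h for _, h in bucket) / len(bucket)
        ece += len(bucket) / n * abs(avg_conf - accuracy)
    return ece

# Placeholder data: predicted P(female) and true labels (1 = female)
p_female = [0.95, 0.85, 0.35, 0.65]
labels = [1, 1, 0, 0]

predicted = [1 if p >= 0.5 else 0 for p in p_female]
confidences = [max(p, 1 - p) for p in p_female]
correct = [int(y_hat == y) for y_hat, y in zip(predicted, labels)]

brier = brier_score(p_female, labels)
ece = expected_calibration_error(confidences, correct)
```

With so few samples the numbers are only illustrative; in practice both metrics need enough labeled users per bin to be stable.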

#### Step 5: Business Impact and Responsible AI

- **Personalization**: Improve targeting (e.g., beauty → female `25–40`, electronics → male `<40`) while measuring incremental lift.
- **Segmentation and Growth**: Identify under-served cohorts for acquisition/retention strategies.
- **Operationalization**: Serve scores to downstream systems with versioned priors and monitoring for drift.
- **Governance**: Document assumptions, maintain auditable mappings, and implement change controls for priors.
- **Privacy & Ethics**: Use only consented data, avoid sensitive inferences where prohibited, and monitor for stereotype reinforcement.
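
Drift monitoring for the operationalization point could start with something as simple as comparing the predicted class mix between a reference window and the current one. The sketch below uses total variation distance; the version tag, the sample predictions, and the 0.10 alert threshold are hypothetical.

```python
from collections import Counter

PRIORS_VERSION = "2024-01-v1"  # hypothetical version tag for the prior tables

def class_mix(predictions):
    """Share of each predicted class in a window of predictions."""
    counts = Counter(predictions)
    total = len(predictions)
    return {label: n / total for label, n in counts.items()}

def total_variation(reference, current):
    """Half the L1 distance between two class distributions (0 = identical)."""
    labels = set(reference) | set(current)
    return 0.5 * sum(abs(reference.get(l, 0.0) - current.get(l, 0.0)) for l in labels)

# Made-up prediction windows for illustration
baseline = class_mix(["25-40"] * 50 + ["<25"] * 30 + ["40+"] * 20)
this_week = class_mix(["25-40"] * 40 + ["<25"] * 45 + ["40+"] * 15)

drift = total_variation(baseline, this_week)
report = {
    "priors_version": PRIORS_VERSION,
    "drift": drift,
    "alert": drift > 0.10,  # assumed alert threshold
}
```

Recording the priors version alongside each drift reading makes it possible to tell a genuine population shift from a change in the prior tables.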

In [None]:
import pandas as pd

# Load transactional dataset
df = pd.read_csv("transactions.csv")  # Assuming the file is named transactions.csv
print("Total transactions:", len(df))
df.head()


In [None]:
# Age weights per category
category_age = {
    "kids": {"<25":0.1, "25-40":0.6, "40+":0.3},
    "electronics": {"<25":0.4, "25-40":0.45, "40+":0.15},
    "beauty": {"<25":0.35, "25-40":0.45, "40+":0.2},
    "apparel": {"<25":0.3, "25-40":0.45, "40+":0.25},
    "health": {"<25":0.2, "25-40":0.35, "40+":0.45}
}

# Gender weights per category
category_gender = {
    "electronics": {"male":0.6, "female":0.4},
    "beauty": {"male":0.2, "female":0.8},
    "apparel": {"male":0.3, "female":0.7},
    "kids": {"male":0.4, "female":0.6},
    "health": {"male":0.45, "female":0.55}
}


In [None]:
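def infer_demo(category):
    """Return (age_prior, gender_prior) for the first mapped key found in `category`."""
    c = str(category).lower()
    # Substring match, so e.g. "Consumer Electronics" maps to "electronics"
    for key in category_age:
        if key in c and key in category_gender:
            return category_age[key], category_gender[key]
    # Default neutral distributions if the category is not mapped
    return {"<25":0.33,"25-40":0.34,"40+":0.33}, {"male":0.5,"female":0.5}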

# Quick check
for cat in df['category'].head(5):
    a, g = infer_demo(cat)
    print(cat, "=>", a, g)


In [None]:
user_preds = {}

for uid, group in df.groupby("user_id"):
    age_score = {"<25": 0.0, "25-40": 0.0, "40+": 0.0}
    gen_score = {"male": 0.0, "female": 0.0}

    # Sum the per-event priors (weighted counts) across the user's transactions
    for cat in group["category"]:
        age_prior, gen_prior = infer_demo(cat)
        for k, v in age_prior.items():
            age_score[k] += v
        for k, v in gen_prior.items():
            gen_score[k] += v

    # Normalize to probabilities
    total_age = sum(age_score.values())
    total_gen = sum(gen_score.values())
    age_prob = {k: v / total_age for k, v in age_score.items()}
    gen_prob = {k: v / total_gen for k, v in gen_score.items()}

    # Take the argmax on unrounded values so rounding cannot create ties;
    # round only the probabilities that are stored for display
    user_preds[uid] = {
        "age_pred": max(age_prob, key=age_prob.get),
        "gender_pred": max(gen_prob, key=gen_prob.get),
        "age_prob": {k: round(v, 2) for k, v in age_prob.items()},
        "gender_prob": {k: round(v, 2) for k, v in gen_prob.items()},
    }

preds_df = pd.DataFrame(user_preds).T
preds_df.head()


## Validation Checklist
- Compare predictions with ground truth when available (holdout set).
- Report Precision/Recall/F1 (gender) and macro-averaged metrics (age).
- Plot reliability curves; compute ECE/Brier; calibrate if miscalibrated.
- Perform time-split validation to assess stability and drift.
- Quantify business lift vs. baseline targeting.
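
As a sketch of the classification metrics in this checklist, per-class precision, recall, and F1 with a macro average can be computed without extra dependencies. The true and predicted labels below are invented placeholders, since no ground truth is available yet.

```python
def per_class_metrics(y_true, y_pred, classes):
    """Precision, recall, and F1 for each class, treating it as one-vs-rest."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

def macro_f1(metrics):
    """Unweighted mean of per-class F1, so small classes count equally."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)

# Placeholder labels for illustration only
y_true = ["<25", "<25", "25-40", "25-40", "40+", "40+"]
y_pred = ["<25", "25-40", "25-40", "25-40", "40+", "<25"]

age_metrics = per_class_metrics(y_true, y_pred, ["<25", "25-40", "40+"])
age_macro_f1 = macro_f1(age_metrics)
```

Macro averaging is used for age because the buckets are unlikely to be balanced, and a plain accuracy figure would hide poor recall on the smaller buckets.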


## Strategic Insights
- **Personalization**: Deploy audience cohorts with confidence thresholds to balance scale vs. precision.
- **Growth**: Surface white-space opportunities by comparing inferred mix vs. market benchmarks.
- **Experimentation**: A/B test targeting policies; feed outcomes back to update priors/calibration.
- **Risk Management**: Monitor drift, confidence degradation, and fairness indicators; implement alerts.
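
The scale-versus-precision trade-off behind confidence thresholds can be explored with a simple sweep over candidate cut-offs. In this sketch the scored users, the helper name `cohort_at_threshold`, and the threshold values are hypothetical; precision at each threshold can only be measured once labels exist, so only cohort size is shown.

```python
def cohort_at_threshold(scored_users, label, threshold):
    """User ids whose top prediction is `label` with probability >= threshold."""
    return [
        uid
        for uid, probs in scored_users.items()
        if max(probs, key=probs.get) == label and probs[label] >= threshold
    ]

# Made-up gender probabilities for four users
scored = {
    "u1": {"male": 0.10, "female": 0.90},
    "u2": {"male": 0.35, "female": 0.65},
    "u3": {"male": 0.70, "female": 0.30},
    "u4": {"male": 0.45, "female": 0.55},
}

# Cohort size shrinks as the threshold rises
sizes = {t: len(cohort_at_threshold(scored, "female", t)) for t in (0.5, 0.6, 0.8)}
```

A higher threshold yields a smaller cohort that should, if the scores are calibrated, contain fewer misclassified users.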
