### Load & Prepare Data

In [1]:
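import json
import pandas as pd

# Load the raw event export (a JSON array of event objects)
with open("../data/raw/events.json", "r") as f:
    events = json.load(f)

# Flatten nested fields into columns, parse timestamps, and order chronologically
df = pd.json_normalize(events)
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.sort_values("timestamp")

# Identity resolution: prefer userId, fall back to anonymousId when it is missing
df["resolved_user"] = df["userId"].fillna(df["anonymousId"])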

### List All Event Types Again

In [2]:
df["event"].dropna().unique()

array(['Signup Completed', 'Login', 'Feature Used'], dtype=object)

## Event Taxonomy (Conceptual)
#### Signup Completed
-->Category: Acquisition
-->Meaning: User successfully created an account

#### Login
-->Category: Engagement
-->Meaning: User accessed the product

#### Feature Used
-->Category: Activation / Core Value
-->Meaning: User interacted with a core product feature
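
The taxonomy above can also be kept as a lookup table, so each event gets a category and any event name outside the taxonomy is easy to spot. The following is a minimal sketch, not part of the original notebook: the names `EVENT_TAXONOMY`, `sample`, and `event_category` are illustrative assumptions, and the `Page Viewed` row is made up to show what an unmapped event looks like.

```python
import pandas as pd

# Categories copied from the taxonomy above; the dict name is an assumption.
EVENT_TAXONOMY = {
    "Signup Completed": "Acquisition",
    "Login": "Engagement",
    "Feature Used": "Activation / Core Value",
}

# Stand-in frame with made-up rows (not the notebook's data).
sample = pd.DataFrame(
    {"event": ["Signup Completed", "Login", "Feature Used", "Page Viewed"]}
)

# Attach the category; events missing from the taxonomy become NaN.
sample["event_category"] = sample["event"].map(EVENT_TAXONOMY)

# Collect event names that the taxonomy does not cover yet.
unmapped = sample.loc[sample["event_category"].isna(), "event"].unique().tolist()
print(unmapped)
```

Running the same `map` on `df["event"]` would flag any new or misspelled event name before it reaches a metric.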

## Define Core Metrics

#### Activated User
-->A user who has performed at least one “Feature Used” event.
#### Activation Time
-->The time elapsed between a user’s first recorded event and their first “Feature Used” event.
#### Active Session
-->A session containing at least one meaningful event (Login or Feature Used).
#### Daily Active Users (DAU)
-->The number of unique resolved users who performed at least one meaningful event (Login or Feature Used) on a given calendar day.
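
The definitions above can be turned into pandas code fairly directly. The following is a minimal sketch under stated assumptions: it uses a small made-up frame (`sample`) with the same column names as `df`, treats a "day" as the calendar day of the stored timestamp, and covers only Activation Time and DAU. Active Session is left out because the data shown here has no session identifier.

```python
import pandas as pd

# Made-up rows that mimic the notebook's columns (assumption, not real data).
sample = pd.DataFrame({
    "resolved_user": ["u1", "u1", "u1", "u2", "u2", "u2"],
    "event": ["Signup Completed", "Login", "Feature Used",
              "Signup Completed", "Login", "Login"],
    "timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 09:05", "2024-01-01 09:20",
        "2024-01-01 10:00", "2024-01-01 10:01", "2024-01-02 08:00",
    ]),
})

# "Meaningful" events as listed in the metric definitions.
MEANINGFUL_EVENTS = ["Login", "Feature Used"]

# Activation Time: first "Feature Used" minus first event, per user.
first_event = sample.groupby("resolved_user")["timestamp"].min()
first_feature = (
    sample[sample["event"] == "Feature Used"]
    .groupby("resolved_user")["timestamp"]
    .min()
)
# Users who never used a feature get NaT, i.e. they are not activated.
activation_time = first_feature.reindex(first_event.index) - first_event

# DAU: unique users with at least one meaningful event per calendar day.
meaningful = sample[sample["event"].isin(MEANINGFUL_EVENTS)]
dau = (
    meaningful
    .groupby(meaningful["timestamp"].dt.normalize())["resolved_user"]
    .nunique()
)

print(activation_time)
print(dau)
```

In this sketch `u1` activates 20 minutes after their first event, while `u2` has no "Feature Used" event and therefore has no activation time. Swapping `sample` for `df` should work as long as `df` has the `resolved_user`, `event`, and `timestamp` columns prepared earlier.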

### (Light) Metric Sanity Check

In [3]:
# Users with at least one "Feature Used" event count as activated
activated_users = df.loc[df["event"] == "Feature Used", "resolved_user"].unique()
activated_users

array(['user_001', 'anon_456'], dtype=object)

#### Observations

1️⃣ Why is an event taxonomy important?
An event taxonomy gives every event one agreed meaning and category, so user behavior is interpreted consistently and metrics stay unambiguous as data volume grows.

2️⃣ Why define metrics before calculating them?
Defining metrics first ensures calculations reflect business intent rather than arbitrary data patterns.

3️⃣ What risks arise from poorly named events?
Poorly named events lead to misaligned analysis, incorrect conclusions, and loss of trust in metrics.

4️⃣ How does this prepare us for real datasets?
A clear taxonomy and explicit metric definitions make the analysis repeatable and easier to scale, even when the dataset is large and noisy.