# 20 More Event‑Processing Interview Problems (Stdlib‑Only, **No Solutions**)

*Each problem gives you a full task statement, a hint listing the stdlib constructs you’ll likely need, and a starter code cell.*

All problems presume events shaped like:

```python
{
    "id": int,
    "ts": "ISO‑8601 string",
    "user": "user42",
    "type": "LOGIN" | "VIEW" | "PURCHASE" | ...,
    "payload": {"value": int, ...}
}
```

Use the generator below or plug in your own dataset.


## Optional Helper – Synthetic Event Generator

In [1]:

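import random
from datetime import datetime, timedelta

EVENT_TYPES = ["CLICK", "VIEW", "LOGIN", "PURCHASE", "LOGOUT"]
random.seed(1337)  # fixed seed; timestamps still depend on the current time

def _rand_ts(start, end):
    """Return a random datetime between start and end (whole-second offsets)."""
    delta = end - start
    return start + timedelta(seconds=random.randint(0, int(delta.total_seconds())))

def generate_events(n=1000):
    """Yield n synthetic events from the last 24 hours (not sorted by ts)."""
    end = datetime.now()
    start = end - timedelta(days=1)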
    for i in range(n):
        yield {
            "id": i,
            "ts": _rand_ts(start, end).isoformat(),
            "user": f"user{random.randint(1, 200)}",
            "type": random.choice(EVENT_TYPES),
            "payload": {"value": random.randint(1, 10_000)}
        }
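
Because `ts` is stored as a string, most problems begin by turning it back into a `datetime`. The following is a minimal sketch of that step, not a solution to any problem. It assumes timestamps in the format produced by `datetime.isoformat()`; the `sample_events` list and the `parse_ts` helper are hypothetical names used only for illustration.

```python
from datetime import datetime

# Hand-written events in the shape described above (illustrative data only).
sample_events = [
    {"id": 0, "ts": "2024-03-04T09:15:00", "user": "user1",
     "type": "LOGIN", "payload": {"value": 120}},
    {"id": 1, "ts": "2024-03-04T09:15:02.500000", "user": "user2",
     "type": "VIEW", "payload": {"value": 35}},
]

def parse_ts(event):
    """Return the event's timestamp as a datetime object (hypothetical helper)."""
    return datetime.fromisoformat(event["ts"])

first, second = (parse_ts(e) for e in sample_events)
gap = second - first  # subtracting two datetimes yields a timedelta
print(first.hour, first.weekday(), gap.total_seconds())
```

Once parsed, attributes such as `.hour`, `.date()`, and `.weekday()` are available, and differences between timestamps can be compared against `timedelta` values.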


---

### Problem 1 – Hourly Event Histogram

**Task**  
Count how many events fall into each **hour of the day** (0‑23) and return a dict like `{hour: count}`.

**Hint – stdlib tools to consider** → datetime, collections.Counter


In [None]:
# TODO – your code for Problem 1


---

### Problem 2 – Streaming 90th‑Percentile Payload Value

**Task**  
Keep a data structure that lets you query the 90th percentile of `payload['value']` at any moment during ingestion.

**Hint – stdlib tools to consider** → heapq, bisect, statistics.quantiles
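
As a quick look at one of the hinted tools (not a solution), here is a sketch of what `statistics.quantiles` returns on a static toy list. The `values` data is made up for illustration. Note that `quantiles` sorts its input on every call, which is one reason a streaming design may prefer `heapq` or `bisect`.

```python
from statistics import quantiles

values = list(range(1, 101))  # toy data: the integers 1..100

# n=10 splits the data into ten equal-probability groups and
# returns the nine cut points between them.
cuts = quantiles(values, n=10)
p90 = cuts[-1]  # the last cut point estimates the 90th percentile
print(len(cuts), p90)
```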


In [None]:
# TODO – your code for Problem 2


---

### Problem 3 – Triple‑Login Without Logout Detector

**Task**  
For each user, emit the timestamp of the third `LOGIN` the first time that user reaches **three consecutive `LOGIN` events** without an intervening `LOGOUT`.

**Hint – stdlib tools to consider** → collections.defaultdict, collections.deque


In [None]:
# TODO – your code for Problem 3


---

### Problem 4 – Re‑order Events with ≤ 5 s Skew

**Task**  
Events arrive up to five seconds out of order. Buffer only as many events as needed to emit them in globally sorted `ts` order.

**Hint – stdlib tools to consider** → heapq, datetime
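
One `heapq` pitfall worth knowing before starting (this is an illustration on toy data, not a solution): heap entries are compared as tuples, so when two entries share the same key, Python falls through to comparing the next element, and dicts do not support `<`. A common workaround, sketched below with hypothetical names, is to add a sequence number as a tie-breaker.

```python
import heapq
from itertools import count

heap = []
tie_breaker = count()  # monotonically increasing sequence number

# Toy records: two of them share the same sort key (5).
records = [(5, {"name": "a"}), (3, {"name": "b"}), (5, {"name": "c"})]

for key, item in records:
    # (key, seq, item): equal keys are resolved by seq, so the dicts
    # themselves are never compared (which would raise TypeError).
    heapq.heappush(heap, (key, next(tie_breaker), item))

ordered = []
while heap:
    key, _, item = heapq.heappop(heap)
    ordered.append((key, item["name"]))
print(ordered)
```

The tie-breaker also keeps entries with equal keys in arrival order.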


In [None]:
# TODO – your code for Problem 4


---

### Problem 5 – Ten Equal‑Width Buckets of Payload Values

**Task**  
Produce counts for ten equal‑width buckets between the min and max `payload['value']`.

**Hint – stdlib tools to consider** → bisect, math


In [None]:
# TODO – your code for Problem 5


---

### Problem 6 – User‑to‑Distinct‑Types Map

**Task**  
Build a mapping `user → set(event types)` as events stream in (needed for downstream access‑control logic).

**Hint – stdlib tools to consider** → collections.defaultdict, set


In [None]:
# TODO – your code for Problem 6


---

### Problem 7 – Reservoir Sample of 100 Events

**Task**  
Maintain a uniform random sample of 100 events from an unbounded stream, so that every event seen so far is equally likely to be in the sample.

**Hint – stdlib tools to consider** → random, enumerate


In [None]:
# TODO – your code for Problem 7


---

### Problem 8 – Write NDJSON to Gzip

**Task**  
Serialize all events in NDJSON format and write to a `.gz` file called `events.ndjson.gz`.

**Hint – stdlib tools to consider** → json, gzip, io


In [None]:
# TODO – your code for Problem 8


---

### Problem 9 – Day‑of‑Week Aggregation

**Task**  
Count events for each day‑of‑week and return the counts as a list ordered Monday through Sunday.

**Hint – stdlib tools to consider** → itertools.groupby, operator.itemgetter, datetime
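
A note on `itertools.groupby`, shown on toy data rather than events: it only merges *adjacent* items with the same key, so the input usually needs to be sorted by that key first. The word list below is made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

words = ["apple", "bob", "avocado", "banana"]
first_letter = itemgetter(0)  # key function: first character of each word

# Unsorted input: groupby starts a new group whenever the key changes.
unsorted_keys = [key for key, _ in groupby(words, key=first_letter)]

# Sorted by the same key: each key appears exactly once.
by_letter = sorted(words, key=first_letter)
grouped = {key: len(list(group)) for key, group in groupby(by_letter, key=first_letter)}
print(unsorted_keys, grouped)
```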


In [None]:
# TODO – your code for Problem 9


---

### Problem 10 – Longest Daily Activity Streak per User

**Task**  
Find each user's maximum streak of consecutive calendar days with at least one event.

**Hint – stdlib tools to consider** → collections.defaultdict, datetime


In [None]:
# TODO – your code for Problem 10


---

### Problem 11 – Top‑10 Users by Spend

**Task**  
Assuming each `PURCHASE` event's `payload['value']` is dollars spent, output the top‑10 users by total spend.

**Hint – stdlib tools to consider** → collections.Counter, heapq.nlargest


In [None]:
# TODO – your code for Problem 11


---

### Problem 12 – Minute‑Level Surge Detection

**Task**  
Identify any minute whose event count exceeds `mean + 2*stdev` of the per‑minute counts over the previous 60 minutes.

**Hint – stdlib tools to consider** → collections.deque, statistics.mean, statistics.stdev


In [None]:
# TODO – your code for Problem 12


---

### Problem 13 – Simple Bloom‑like Duplicate Filter

**Task**  
Implement a fixed‑size bit‑array to flag when an `id` is *probably* a duplicate.

**Hint – stdlib tools to consider** → hashlib, bytearray


In [None]:
# TODO – your code for Problem 13


---

### Problem 14 – Sliding Window Maximum

**Task**  
For every 10‑event sliding window, output the maximum `payload['value']`.

**Hint – stdlib tools to consider** → collections.deque


In [None]:
# TODO – your code for Problem 14


---

### Problem 15 – Temporal Join of Two Streams

**Task**  
Given two event lists A and B, each *sorted* by `ts`, output `(a, b)` pairs where `a['id'] == b['id']` and the timestamps differ by ≤ 2 seconds.

**Hint – stdlib tools to consider** → dict, datetime


In [None]:
# TODO – your code for Problem 15


---

### Problem 16 – Session Duration Distribution

**Task**  
Break each user's activity into sessions separated by ≥30 min of inactivity, then compute session durations.

**Hint – stdlib tools to consider** → collections.defaultdict, datetime


In [None]:
# TODO – your code for Problem 16


---

### Problem 17 – One‑Line XML Event Writer

**Task**  
Convert every event dict to an XML element `<event …/>` and write one per line to `events.xmll` (yes, double‑l).

**Hint – stdlib tools to consider** → xml.etree.ElementTree
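
For orientation, here is a small `xml.etree.ElementTree` sketch on a toy element that is unrelated to the event schema (the `point` element and its attributes are invented for illustration). It shows two details that tend to trip people up: attribute values must be strings, and `tostring` returns bytes unless `encoding="unicode"` is passed.

```python
import xml.etree.ElementTree as ET

# Toy element, unrelated to the event schema. Attribute values must be
# strings, so numbers are converted with str() first.
point = ET.Element("point", {"x": str(1), "y": str(2)})

# encoding="unicode" returns a str instead of bytes.
line = ET.tostring(point, encoding="unicode")
print(line)
```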


In [None]:
# TODO – your code for Problem 17


---

### Problem 18 – Adjacent Type Pair Counts

**Task**  
Count how often each *ordered* pair of consecutive event types appears in the stream.

**Hint – stdlib tools to consider** → collections.Counter


In [None]:
# TODO – your code for Problem 18


---

### Problem 19 – Fixed‑Size Batch Averager

**Task**  
Process the stream in chunks of 500 events and print the average `payload['value']` for each chunk.

**Hint – stdlib tools to consider** → itertools.islice
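
A short illustration of how `itertools.islice` behaves on an iterator, using plain integers rather than events (not a solution): each call consumes items from the underlying iterator, so the next call continues where the previous one stopped.

```python
from itertools import islice

numbers = iter(range(7))  # any iterator or generator behaves the same way

# Each islice call consumes the next items from the same iterator,
# so repeated calls walk through the stream without re-reading it.
first = list(islice(numbers, 3))
second = list(islice(numbers, 3))
third = list(islice(numbers, 3))   # only one item is left
fourth = list(islice(numbers, 3))  # exhausted: empty list
print(first, second, third, fourth)
```

An empty result is a convenient signal that the stream has ended.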


In [None]:
# TODO – your code for Problem 19


---

### Problem 20 – First Timestamp Ordering Violation

**Task**  
In a list that is supposed to be sorted by `ts`, find the first index whose `ts` is earlier than the previous element's (return `-1` if there is none).

**Hint – stdlib tools to consider** → enumerate, datetime


In [None]:
# TODO – your code for Problem 20
