# 30 Event‑Processing Interview Problems (Stdlib‑Only, **No Solutions**)

Each section poses a realistic *events‑in‑memory* challenge you might face in a 60‑minute coding interview.  
For every problem you get:

* **Problem statement** – what to build.  
* **Hint** – the *one or two* Python‑stdlib constructs you’ll probably need (from the cheat‑sheet).  
* **Starter cell** – write your own code below the `# TODO` line.

> ✨ **Pro tip:** run `help(<obj>)` in a separate cell if you forget an API.


## Helper – Synthetic Event Generator

You can reuse this if you like, or roll your own. Delete or modify as you wish.


In [1]:

import random, string
from datetime import datetime, timedelta

random.seed(42)
EVENT_TYPES = ["CLICK", "VIEW", "LOGIN", "PURCHASE", "LOGOUT"]

def random_timestamp(start, end):
    delta = end - start
    return start + timedelta(seconds=random.randint(0, int(delta.total_seconds())))

def generate_events(n=1000):
    start = datetime.now() - timedelta(days=100)
    end = datetime.now()
    for i in range(n):
        yield {
            "id": i,
            "ts": random_timestamp(start, end).isoformat(),
            "user": f"user{random.randint(1, 50)}",
            "type": random.choice(EVENT_TYPES),
            "payload": {"value": random.randint(1, 1000)}
        }


---

### Problem 1 – Top‑K Event Types

**Task**  
Top‑K Event Types.

**Hint (constructs to consider)** → collections.Counter, heapq.nlargest



In [2]:
from collections import Counter
# ⬇️ Write your solution for Problem 1 below

g = generate_events()
c = Counter()
for event in g:
    # iterable of one
    c.update([event["type"]])

c.most_common(2)


[('LOGOUT', 208), ('PURCHASE', 204)]

In [3]:
import heapq

h = []
g = generate_events(1000)
for d in g:
    heapq.heappush(h, ((d["ts"]), d))



In [4]:
# Ascending Time

tsl = [heapq.heappop(h) for _ in h]
tsl[:2]

[('2025-03-04T08:57:34.517168',
  {'id': 203,
   'ts': '2025-03-04T08:57:34.517168',
   'user': 'user6',
   'type': 'LOGOUT',
   'payload': {'value': 987}}),
 ('2025-03-04T13:45:43.517168',
  {'id': 363,
   'ts': '2025-03-04T13:45:43.517168',
   'user': 'user50',
   'type': 'PURCHASE',
   'payload': {'value': 535}})]

In [5]:
# Descending Time 

sorted(tsl, reverse=True)[:2]

[('2025-04-22T11:45:04.517168',
  {'id': 682,
   'ts': '2025-04-22T11:45:04.517168',
   'user': 'user1',
   'type': 'PURCHASE',
   'payload': {'value': 510}}),
 ('2025-04-22T04:55:43.517168',
  {'id': 466,
   'ts': '2025-04-22T04:55:43.517168',
   'user': 'user48',
   'type': 'CLICK',
   'payload': {'value': 216}})]

---

### Problem 2 – Per‑User Sliding‑Window Rate Limiter

**Task**  
Per‑User Sliding‑Window Rate Limiter.

**Hint (constructs to consider)** → collections.defaultdict, collections.deque



In [6]:
from collections import deque
# ⬇️ Write your solution for Problem 2 below
USER = "user5"
g = generate_events(1000)
d = deque(maxlen=100)
for elem in generate_events(1000):
    if d.count(USER) >= 2:
        print(f"{USER} reached his limit")
        continue
    d.appendleft(elem["user"])



user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached his limit
user5 reached hi

In [7]:
# Moving Average
g = generate_events(1000)
d = deque(maxlen=100)
s = 0
for elem in generate_events(1000):
    d.appendleft(elem["user"])
    

In [8]:
def moving_average():
    g = generate_events(100)
    d = deque()
    count = 0
    s = 0
    for elem in g:
        if count == 10:
            count -= 1
            s -= d.pop()     
        payload = elem["payload"]["value"]
        d.appendleft(elem["payload"]["value"])
        count += 1
        s += payload
        mv = s/count
        yield mv, count

In [9]:
list(moving_average())

[(599.0, 1),
 (634.5, 2),
 (739.0, 3),
 (593.5, 4),
 (517.2, 5),
 (435.5, 6),
 (452.42857142857144, 7),
 (507.375, 8),
 (474.0, 9),
 (454.4, 10),
 (397.3, 10),
 (415.8, 10),
 (409.9, 10),
 (440.0, 10),
 (439.9, 10),
 (446.8, 10),
 (465.6, 10),
 (388.8, 10),
 (404.8, 10),
 (398.6, 10),
 (444.8, 10),
 (456.1, 10),
 (406.2, 10),
 (396.8, 10),
 (420.2, 10),
 (477.3, 10),
 (500.6, 10),
 (492.9, 10),
 (514.1, 10),
 (508.5, 10),
 (537.6, 10),
 (523.0, 10),
 (522.1, 10),
 (503.4, 10),
 (534.9, 10),
 (548.8, 10),
 (472.2, 10),
 (560.6, 10),
 (505.7, 10),
 (580.8, 10),
 (596.3, 10),
 (552.8, 10),
 (516.3, 10),
 (541.6, 10),
 (513.2, 10),
 (446.4, 10),
 (473.6, 10),
 (447.8, 10),
 (493.8, 10),
 (422.8, 10),
 (342.5, 10),
 (380.0, 10),
 (407.4, 10),
 (390.1, 10),
 (376.2, 10),
 (415.7, 10),
 (401.2, 10),
 (393.3, 10),
 (409.1, 10),
 (390.5, 10),
 (462.8, 10),
 (419.7, 10),
 (440.3, 10),
 (456.2, 10),
 (486.2, 10),
 (445.0, 10),
 (506.1, 10),
 (472.9, 10),
 (454.9, 10),
 (531.6, 10),
 (489.7, 10),


---

### Problem 3 – LRU Cache for Expensive Payload Transforms

**Task**  
LRU Cache for Expensive Payload Transforms.

**Hint (constructs to consider)** → collections.OrderedDict



In [10]:

# ⬇️ Write your solution for Problem 3 below
# TODO


---

### Problem 4 – Merge K Pre‑sorted Event Streams by Timestamp

**Task**  
Merge K Pre‑sorted Event Streams by Timestamp.

**Hint (constructs to consider)** → heapq.merge



In [11]:

# ⬇️ Write your solution for Problem 4 below
# TODO


---

### Problem 5 – Count Events in Last W Seconds Using Binary Search

**Task**  
Count Events in Last W Seconds Using Binary Search.

**Hint (constructs to consider)** → bisect, bisect_left, bisect_right



In [12]:
# ⬇️ Write your solution for Problem 5 below

import bisect
from typing import Generator

WDAYS = 5
evl = []
g: Generator = generate_events(300)
for ev in g:
    bisect.insort_right(evl, ev, key=lambda x: x.get("ts"))


last_ev = evl[-1]
first_event = datetime.fromisoformat(last_ev.get("ts"))
oldest_event = first_event - timedelta(days=5)

idx = bisect.bisect_right(evl, first_event.isoformat(), key=lambda e: e.get("ts"))
idx

# TODO

300

In [13]:
len(evl) - 281

19

In [14]:
datetime.fromisoformat(evl[282]["ts"]) < first_event <  datetime.fromisoformat(evl[283]["ts"])

False

In [15]:
oldest_event

datetime.datetime(2025, 6, 6, 1, 22, 1, 606680)

In [16]:
first_event

datetime.datetime(2025, 6, 11, 1, 22, 1, 606680)

In [17]:
evl[280:282]

[{'id': 155,
  'ts': '2025-06-05T07:37:50.606680',
  'user': 'user7',
  'type': 'CLICK',
  'payload': {'value': 235}},
 {'id': 115,
  'ts': '2025-06-05T15:06:14.606680',
  'user': 'user43',
  'type': 'LOGOUT',
  'payload': {'value': 102}}]

---

### Problem 6 – Daily Cumulative Event Counts

**Task**  
Daily Cumulative Event Counts.

**Hint (constructs to consider)** → itertools.groupby, itertools.accumulate



In [None]:
from itertools import groupby
# ⬇️ Write your solution for Problem 6 below

from datetime import datetime

events = [(datetime.fromisoformat(ev['ts']).date(), ev) for ev in generate_events(3000)]

events.sort(key = lambda x: x[0])

for day, group in groupby(events, key= lambda x: x[0]):
    print(f"{day}: {(list(group))[:3]}")





2025-03-04: [(datetime.date(2025, 3, 4), {'id': 197, 'ts': '2025-03-04T11:45:14.472049', 'user': 'user45', 'type': 'CLICK', 'payload': {'value': 581}}), (datetime.date(2025, 3, 4), {'id': 703, 'ts': '2025-03-04T22:14:44.472049', 'user': 'user33', 'type': 'LOGIN', 'payload': {'value': 637}}), (datetime.date(2025, 3, 4), {'id': 950, 'ts': '2025-03-04T14:37:16.472049', 'user': 'user45', 'type': 'VIEW', 'payload': {'value': 683}})]
2025-03-05: [(datetime.date(2025, 3, 5), {'id': 96, 'ts': '2025-03-05T03:06:44.472049', 'user': 'user14', 'type': 'LOGOUT', 'payload': {'value': 442}}), (datetime.date(2025, 3, 5), {'id': 109, 'ts': '2025-03-05T01:11:32.472049', 'user': 'user23', 'type': 'PURCHASE', 'payload': {'value': 601}}), (datetime.date(2025, 3, 5), {'id': 133, 'ts': '2025-03-05T04:59:59.472049', 'user': 'user39', 'type': 'LOGOUT', 'payload': {'value': 527}})]
2025-03-06: [(datetime.date(2025, 3, 6), {'id': 4, 'ts': '2025-03-06T08:13:05.472049', 'user': 'user27', 'type': 'VIEW', 'payload':

---

### Problem 7 – Memoise Payload Normalisation

**Task**  
Memoise Payload Normalisation.

**Hint (constructs to consider)** → functools.lru_cache



In [None]:

# ⬇️ Write your solution for Problem 7 below
# TODO


---

### Problem 8 – Convert NDJSON to CSV

**Task**  
Convert NDJSON to CSV.

**Hint (constructs to consider)** → json, csv



In [None]:

# ⬇️ Write your solution for Problem 8 below
# TODO


---

### Problem 9 – Multi‑Key Stable Sort of Events

**Task**  
Multi‑Key Stable Sort of Events.
> **Task**   Return the first 10 events ordered by `(ts, type, user)` in one pass.


**Hint (constructs to consider)** → operator.itemgetter, sorted



In [35]:
from operator import itemgetter
# ⬇️ Write your solution for Problem 9 below

first_10 = []
evl = list(generate_events(100))
sorted(evl, key=itemgetter("ts", "type", "user"))[:10]




[{'id': 68,
  'ts': '2025-03-05T01:46:39.063079',
  'user': 'user26',
  'type': 'VIEW',
  'payload': {'value': 336}},
 {'id': 72,
  'ts': '2025-03-05T13:30:35.063079',
  'user': 'user21',
  'type': 'PURCHASE',
  'payload': {'value': 917}},
 {'id': 2,
  'ts': '2025-03-07T00:40:14.063079',
  'user': 'user26',
  'type': 'VIEW',
  'payload': {'value': 250}},
 {'id': 52,
  'ts': '2025-03-07T15:57:19.063079',
  'user': 'user32',
  'type': 'CLICK',
  'payload': {'value': 227}},
 {'id': 82,
  'ts': '2025-03-08T21:35:29.063079',
  'user': 'user22',
  'type': 'PURCHASE',
  'payload': {'value': 929}},
 {'id': 99,
  'ts': '2025-03-10T23:01:54.063079',
  'user': 'user46',
  'type': 'VIEW',
  'payload': {'value': 277}},
 {'id': 23,
  'ts': '2025-03-15T03:13:07.063079',
  'user': 'user41',
  'type': 'PURCHASE',
  'payload': {'value': 112}},
 {'id': 16,
  'ts': '2025-03-15T05:10:21.063079',
  'user': 'user8',
  'type': 'CLICK',
  'payload': {'value': 979}},
 {'id': 35,
  'ts': '2025-03-15T05:30:54.063

---
### Problem 10 – Finding the Median Event `value` on the Fly   (`heapq` again, but *dual* heaps)

> **Task**   Stream events and be able to query the median of the integer `payload['value']` at any time.

**Hint (constructs to consider)** → heapq, dual‑heap technique



In [None]:
import heapq
# ⬇️ Write your solution for Problem 10 below
# TODO


---

### Problem 11 – Deduplicate Events by `id`

**Task**  
Deduplicate Events by `id`.

**Hint (constructs to consider)** → set, dict



In [None]:

# ⬇️ Write your solution for Problem 11 below
# TODO


---

### Problem 12 – Approximate Deduplication with a Bloom‑like Filter

**Task**  
Approximate Deduplication with a Bloom‑like Filter.

**Hint (constructs to consider)** → bytearray, hashlib



In [None]:

# ⬇️ Write your solution for Problem 12 below
# TODO


---

### Problem 13 – Rolling Average of a Metric (Sliding Window)

**Task**  
Rolling Average of a Metric (Sliding Window).

**Hint (constructs to consider)** → collections.deque



In [None]:

# ⬇️ Write your solution for Problem 13 below
# TODO


---

### Problem 14 – TTL Cache with Expiration

**Task**  
TTL Cache with Expiration.

**Hint (constructs to consider)** → collections.OrderedDict, datetime



In [75]:
from operator import itemgetter
from datetime import timedelta
from collections import deque

cache = deque()

# ⬇️ Write your solution for Problem 14 below
TTL = timedelta(days=1)

entries = list(generate_events(500))
entries.sort(key=itemgetter("ts"))
for entry in entries:
    current_entry_time = datetime.fromisoformat(entry["ts"])
    while cache and cache[0][0] + TTL < current_entry_time:
        # evict
        removed = cache.popleft()
    cache.append((datetime.fromisoformat(entry["ts"]), entry))
        


---

### Problem 15 – Sessionise Events Separated by Idle Time

**Task**  
Sessionise Events Separated by Idle Time.

**Hint (constructs to consider)** → dict, datetime, timedelta



In [None]:

# ⬇️ Write your solution for Problem 15 below
# TODO


---

### Problem 16 – Peak Concurrent Sessions via Sweep Line

**Task**  
Peak Concurrent Sessions via Sweep Line.

**Hint (constructs to consider)** → sorted list of tuples, heapq or simple counters



In [55]:

# ⬇️ Write your solution for Problem 16 below
import random
from datetime import datetime, timedelta

def generate_paired_sessions(n_sessions=500):
    now = datetime.now()
    start_base = now - timedelta(days=30)
    for i in range(n_sessions):
        user = f"user{random.randint(1,50)}"
        # pick a random start
        start = start_base + timedelta(seconds=random.randint(0, 30*24*3600))
        # give it a random duration between 1 minute and 4 hours
        end = start + timedelta(seconds=random.randint(60, 4*3600))
        yield {
            "user": user,
            "start_ts": start.isoformat(),
            "end_ts":   end.isoformat()
        }


In [65]:
time_window = timedelta(minutes=180)
sweep_line = []
for ev in generate_paired_sessions(500):
    sweep_line.append((datetime.fromisoformat(ev["start_ts"]), +1))
    sweep_line.append((datetime.fromisoformat(ev["end_ts"]), -1))

sweep_line.sort(key = lambda x: (x[0], x[1]))
concurrent = 0
max_concur = 0
for ts, delta in sweep_line:
    concurrent += delta
    max_concur = max(max_concur, concurrent)


---

### Problem 17 – Hourly Histograms of Event Types

**Task**  
Hourly Histograms of Event Types.

**Hint (constructs to consider)** → collections.defaultdict, collections.Counter



In [53]:
from collections import defaultdict, Counter
from datetime import datetime

hist = defaultdict(Counter)

for ev in generate_events(200):
    h = datetime.fromisoformat(ev["ts"]).hour
    t = ev["type"]
    #hist[h][t] += 1
    hist[t].update(t)


In [54]:
evl[:10]

[(17, 'LOGIN'),
 (15, 'LOGIN'),
 (10, 'LOGIN'),
 (23, 'VIEW'),
 (19, 'LOGOUT'),
 (3, 'LOGIN'),
 (22, 'CLICK'),
 (11, 'CLICK'),
 (17, 'LOGOUT'),
 (3, 'CLICK')]

---

### Problem 18 – Unique User Count per Rolling Window

**Task**  
Unique User Count per Rolling Window.

**Hint (constructs to consider)** → set, collections.deque



In [52]:
from collections import deque
# ⬇️ Write your solution for Problem 18 below
d = deque()
c = Counter()
WINDOW = 20
for ev in generate_events(200):
    user = ev["user"]
    d.appendleft(user)
    c.update([user])
    if len(d) > WINDOW:
        old_user = d.pop()
        c[old_user] -= 1
        if c[old_user] == 0:
            del c[old_user]        




c

Counter({'user49': 2,
         'user21': 2,
         'user11': 2,
         'user33': 2,
         'user15': 1,
         'user38': 1,
         'user28': 1,
         'user36': 1,
         'user44': 1,
         'user5': 1,
         'user30': 1,
         'user19': 1,
         'user12': 1,
         'user39': 1,
         'user41': 1,
         'user23': 1})

---

### Problem 19 – Throttle Logins per IP Address

**Task**  
Throttle Logins per IP Address.

**Hint (constructs to consider)** → collections.defaultdict, collections.deque



In [None]:

# ⬇️ Write your solution for Problem 19 below
# TODO


---

### Problem 20 – Maintain Top‑1% Highest payload['value']

**Task**  
Maintain Top‑1% Highest payload['value'].

**Hint (constructs to consider)** → heapq, nlargest / nsmallest



In [None]:

# ⬇️ Write your solution for Problem 20 below
# TODO


---

### Problem 21 – Detect Sequence: LOGIN → PURCHASE within Δ sec

**Task**  
Detect Sequence: LOGIN → PURCHASE within Δ sec.

**Hint (constructs to consider)** → enum.Enum, dict, datetime



In [None]:

# ⬇️ Write your solution for Problem 21 below
# TODO


---

### Problem 22 – Parse Apache‑style Logs with Regex

**Task**  
Parse Apache‑style Logs with Regex.

**Hint (constructs to consider)** → re



In [None]:

# ⬇️ Write your solution for Problem 22 below
# TODO


---

### Problem 23 – Tokenise and Count Words in Message Field

**Task**  
Tokenise and Count Words in Message Field.

**Hint (constructs to consider)** → str.split, collections.Counter



In [None]:

# ⬇️ Write your solution for Problem 23 below
# TODO


---

### Problem 24 – Bucketise Timestamps into Fixed 5‑min Windows

**Task**  
Bucketise Timestamps into Fixed 5‑min Windows.

**Hint (constructs to consider)** → datetime, integer division



In [None]:

# ⬇️ Write your solution for Problem 24 below
# TODO


---

### Problem 25 – Rolling Min/Max Queue

**Task**  
Rolling Min/Max Queue.

**Hint (constructs to consider)** → collections.deque (monotonic queue)



In [None]:

# ⬇️ Write your solution for Problem 25 below
# TODO


---

### Problem 26 – Aggregate After Sorting by User Then Type

**Task**  
Aggregate After Sorting by User Then Type.

**Hint (constructs to consider)** → itertools.groupby



In [None]:

# ⬇️ Write your solution for Problem 26 below
# TODO


---

### Problem 27 – Detect Repeated Pattern Within User Stream

**Task**  
Detect Repeated Pattern Within User Stream.

**Hint (constructs to consider)** → collections.deque, set



In [None]:

# ⬇️ Write your solution for Problem 27 below
# TODO


---

### Problem 28 – Join Two Streams on Shared `id`

**Task**  
Join Two Streams on Shared `id`.

**Hint (constructs to consider)** → dict



In [None]:

# ⬇️ Write your solution for Problem 28 below
# TODO


---

### Problem 29 – Real‑time Leaderboard of Top Buyers

**Task**  
Real‑time Leaderboard of Top Buyers.

**Hint (constructs to consider)** → heapq, dict



In [None]:

# ⬇️ Write your solution for Problem 29 below
# TODO


---

### Problem 30 – Checkpoint & Replay Events to/from Disk

**Task**  
Checkpoint & Replay Events to/from Disk.

**Hint (constructs to consider)** → json, open/read/write



In [None]:

# ⬇️ Write your solution for Problem 30 below
# TODO
