## Project 3 (Advanced) — Realtime App (Notifications / Chat)  
### Workbook Guidelines (What to build, why, how to verify) — not spoonfeeding full code

This project is your “realtime systems” capstone. The point is not just “make a
WebSocket connect,” but to build a **production-shaped realtime feature**:

- authenticated WebSockets
- tenant/org scoping
- message persistence (don’t lose events)
- “online now” presence (optional)
- delivery semantics (best-effort push + reliable catch-up)
- abuse prevention (rate limits, payload limits)
- observability (request IDs won’t apply to WS the same way; you need connection IDs)
- tests for realtime behavior

---

# 1) Project Goal (What you’re building)

Build a realtime feature for your app (choose one primary track):

### Track A — Org Notifications Stream (recommended baseline)
When tasks/events happen (task created, status changed, export finished), connected
clients in the org see notifications instantly.

### Track B — Org Chat (classic realtime use-case)
Members in an org can join a room and send chat messages in realtime.

### Track C — Realtime Dashboard (events + counters)
A “live activity” page shows the last N events and updates counters live.

Track A is the recommended baseline. Add Track B or C on top of it if you want deeper mastery.

---

# 2) Target Skills (What this project teaches)

- Channels architecture (Consumers, Routing, Channel Layer, Groups)
- WebSocket security:
  - origin validation
  - authentication via session/cookies or tokens
  - authorization before accept
- Tenant isolation (org boundaries) in realtime:
  - group naming must include org id
  - membership check before group_add
- Reliability patterns:
  - DB is source of truth
  - socket is best-effort delivery
  - catch-up fetch on reconnect
- Background jobs + realtime:
  - Celery job completion triggers WebSocket broadcast
- Tests:
  - WebsocketCommunicator
  - dedupe and idempotency for event delivery
- Operations:
  - Redis dependency
  - connection limits and backpressure
  - logging and metrics for realtime

---

# 3) Requirements (User Stories + Acceptance Criteria)

## 3.1 Authentication and authorization (non-negotiable)
1. As an anonymous user, I cannot connect to realtime endpoints.
2. As a logged-in user, I can connect only to orgs I belong to.
3. As a user in org A, I can never receive messages for org B.

Acceptance criteria:
- connect is rejected with a clear close code when unauthorized
- tenant leak regression tests exist (connect to org you don’t belong to fails)

---

## 3.2 Notifications stream (Track A baseline)
4. As an org member, I can open a page and see live notifications without refreshing.
5. When a task status changes, I receive a realtime event with:
   - task_id
   - action type (status_changed)
   - minimal details (from/to)
   - timestamp
6. When an export job finishes, I receive:
   - job_id
   - status (done/failed)
   - download link (if done and user authorized)

Acceptance criteria:
- events appear in UI live
- events are correctly scoped to org
- payloads contain only needed fields (no sensitive data)

---

## 3.3 Persistence + catch-up (reliability baseline)
7. If I disconnect and reconnect, I can fetch missed notifications from the DB.

Acceptance criteria:
- on connect, server sends “last N events” or client calls a “fetch missed” endpoint
- events are not lost permanently due to temporary disconnects

---

## 3.4 Chat (optional stretch)
8. As an org member, I can join a room and send messages.
9. Messages are saved to DB and appear in order.
10. A user cannot send more than X messages per minute (abuse protection).

Acceptance criteria:
- message appears for all connected users in room
- message is stored and visible after refresh
- rate limiting is enforced

---

# 4) Data Model Guidelines (What models you likely need)

You can reuse your existing `Organization` and `Membership`.

## Option 1 (Notifications-focused): `NotificationEvent`
Fields:
- organization FK (tenant boundary)
- type (task_event, export_done, webhook_failed, etc.)
- payload JSON (minimal structured data)
- created_at timestamp
- optional actor FK (who triggered it)
- optional “dedupe key” (for idempotency if same event can be emitted twice)

Indexes:
- (organization, created_at desc)

Purpose:
- DB becomes the source of truth for “recent notifications”
- WebSocket broadcasts are just a fast path
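
To make the field list, the index, and the dedupe key concrete, here is a minimal sketch that uses the standard-library `sqlite3` module as a stand-in for the Django ORM. The table name, column names, and the `record_event` helper are assumptions for illustration, not part of any framework.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE notification_event (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    organization_id INTEGER NOT NULL,   -- tenant boundary
    type            TEXT    NOT NULL,   -- task_event, export_done, ...
    payload         TEXT    NOT NULL,   -- JSON with minimal fields only
    actor_id        INTEGER,            -- optional: who triggered it
    dedupe_key      TEXT,               -- optional: idempotency key
    created_at      TEXT    NOT NULL
                    DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
    UNIQUE (organization_id, dedupe_key)
);
CREATE INDEX notification_event_org_created
    ON notification_event (organization_id, created_at DESC);
"""


def record_event(conn, organization_id, event_type, payload,
                 dedupe_key=None, actor_id=None):
    """Insert an event; return True if a row was written, False for a duplicate."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO notification_event "
        "(organization_id, type, payload, actor_id, dedupe_key) "
        "VALUES (?, ?, ?, ?, ?)",
        (organization_id, event_type, json.dumps(payload), actor_id, dedupe_key),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

Emitting the same event twice with the same dedupe key writes one row, while events without a key are never treated as duplicates. In a Django model the same idea would likely be expressed with a unique constraint on `(organization, dedupe_key)` and an entry in `Meta.indexes`.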

## Option 2 (Chat-focused): `Room` and `Message`
Room fields:
- organization FK
- name
- slug unique per org

Message fields:
- room FK
- organization FK (redundant but helpful for indexing/scoping)
- author FK
- body
- created_at
- optional client_message_id (dedupe on retries)

Indexes:
- (room, created_at)
- (organization, created_at)

Guideline:
- Always include org_id in queries even if room implies it (defense in depth).

---

# 5) WebSocket Endpoint Design (Protocol + Event Shapes)

A professional realtime feature is a **protocol**, not ad-hoc strings.

## 5.1 URL structure
Recommended:
- Notifications: `/ws/orgs/<org_slug>/notifications/`
- Chat rooms: `/ws/orgs/<org_slug>/rooms/<room_slug>/`

## 5.2 Event message shapes (JSON)
Use a consistent envelope:

### Server → Client examples
- Welcome:
  - `{ "type": "welcome", "org": {"id": 1, "slug": "acme"} }`
- Notification:
  - `{ "type": "task_event", "task_id": 123, "action": "status_changed", "details": {...}, "created_at": "..." }`
- Export done:
  - `{ "type": "export_done", "job_id": 9, "status": "done", "download_url": "..." }`
- Error:
  - `{ "type": "error", "error": {"code": "...", "message": "..."} }`

### Client → Server examples (optional)
- Ping:
  - `{ "type": "ping" }`
- Chat message:
  - `{ "type": "message", "client_message_id": "...", "body": "..." }`

Guidelines:
- Include `type` always.
- Keep payload minimal.
- Never send secrets or private fields unless you have explicit authorization.
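
One way to keep the envelope consistent is to route every outgoing event and every incoming frame through a pair of small helpers. The sketch below assumes a 4096-byte inbound limit and the error codes shown; both are illustrative choices, not a standard.

```python
import json

MAX_RAW_BYTES = 4096                 # assumed cap on one inbound frame
CLIENT_TYPES = {"ping", "message"}   # the client -> server types listed above


def make_event(event_type, **fields):
    """Build a server -> client event; `type` is always present and wins."""
    if not event_type:
        raise ValueError("event type is required")
    return {**fields, "type": event_type}


def make_error(code, message):
    """Build the error envelope shown in the server -> client examples."""
    return make_event("error", error={"code": code, "message": message})


def parse_client_message(raw):
    """Return (message, None) on success or (None, error_event) on failure."""
    if len(raw.encode("utf-8")) > MAX_RAW_BYTES:
        return None, make_error("payload_too_large", "message exceeds size limit")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None, make_error("invalid_json", "message is not valid JSON")
    msg_type = data.get("type") if isinstance(data, dict) else None
    if not isinstance(msg_type, str) or msg_type not in CLIENT_TYPES:
        return None, make_error("unknown_type", "missing or unsupported type")
    return data, None
```

A consumer's receive handler could call `parse_client_message` first and send the returned error event back instead of raising, which keeps malformed input from crashing the connection.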

---

# 6) Security Guidelines (Realtime-specific)

## 6.1 Origin validation (must-have)
- Use `AllowedHostsOriginValidator` or explicit `OriginValidator`.
- Do not accept sockets from arbitrary origins.

## 6.2 Authenticate before accept
- Use session auth (cookies) via `AuthMiddlewareStack` OR token-based auth in a
  custom middleware.
- Reject unauthenticated connections before `accept()`.

## 6.3 Authorize before joining groups
- Verify org membership **before** adding channel to `org-<id>` group.
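- Close connection with meaningful close codes. A close sent before `accept()` rejects the handshake (per the ASGI spec the client sees an HTTP 403), so the code shows up in server logs and tests but not in the browser; accept first and then close if the client must read the code.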
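
The ordering in 6.2 and 6.3 (authenticate, then authorize, then join the group) can be isolated in a pure function that is easy to unit test. This is a sketch under assumptions: the close codes 4401 and 4403 are arbitrary picks from the 4000-4999 private-use range of RFC 6455, and `is_member` is a hypothetical callable standing in for a `Membership` lookup.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed application close codes (RFC 6455 reserves 4000-4999 for private use).
CLOSE_UNAUTHENTICATED = 4401
CLOSE_FORBIDDEN = 4403


@dataclass(frozen=True)
class ConnectDecision:
    accept: bool
    close_code: Optional[int] = None
    group: Optional[str] = None


def decide_connect(user_id: Optional[int], org_id: int,
                   is_member: Callable[[int, int], bool]) -> ConnectDecision:
    """Decide what a consumer's connect() should do before any group join."""
    if user_id is None:
        # Anonymous: never reaches the membership check or the group.
        return ConnectDecision(accept=False, close_code=CLOSE_UNAUTHENTICATED)
    if not is_member(user_id, org_id):
        return ConnectDecision(accept=False, close_code=CLOSE_FORBIDDEN)
    # Only members get a group name, and it is keyed by org id.
    return ConnectDecision(accept=True, group=f"org-{org_id}")
```

In a Channels consumer, `connect()` would read the user from `self.scope["user"]`, call something like `decide_connect`, and then either `close(code=...)` or `group_add(...)` followed by `accept()`.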

## 6.4 Rate limit and payload limits
- Limit message size (chat body length, JSON payload size).
- Limit client send rate (messages/min).
- If violated: close connection or ignore messages.
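
A per-connection sliding-window limiter is one simple way to enforce "X messages per minute". The sketch below is an assumption-laden example: the class name, the 2000-character body limit, and the injectable `clock` parameter (there so tests do not need to sleep) are all illustrative.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds for one connection."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._stamps = deque()

    def allow(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False
        self._stamps.append(now)
        return True


MAX_BODY_CHARS = 2000  # assumed chat body limit


def body_ok(body):
    """Accept only non-blank strings within the length limit."""
    return (isinstance(body, str)
            and bool(body.strip())
            and len(body) <= MAX_BODY_CHARS)
```

State held on the consumer instance resets when the client reconnects, so a limit that must survive reconnects would need a shared store (for example, a counter in Redis keyed by user id).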

## 6.5 Multi-tenancy hard rule
- Group names must include org id, not org slug (slug can change).
- Never broadcast to a group without an org boundary.
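
Centralizing group-name construction makes the hard rule difficult to bypass: if every broadcast and every `group_add` goes through one helper, a group without an org id cannot be created by accident. The helper names and the `org-<id>.room-<id>` format are assumptions. The validity pattern reflects the Channels rule that group names contain only ASCII alphanumerics, hyphens, underscores, or periods and are shorter than 100 characters; verify it against the Channels version you use.

```python
import re

# ASCII alphanumerics, hyphens, underscores, periods; fewer than 100 characters.
_VALID_GROUP = re.compile(r"^[A-Za-z0-9._-]{1,99}$")


def _positive_int(value, label):
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{label} must be a positive integer")
    return value


def org_group(org_id):
    """Group for org-wide events, keyed by the immutable id (never the slug)."""
    return f"org-{_positive_int(org_id, 'org_id')}"


def room_group(org_id, room_id):
    """Group for one chat room; the org id is always part of the name."""
    return f"{org_group(org_id)}.room-{_positive_int(room_id, 'room_id')}"


def is_valid_group_name(name):
    return bool(_VALID_GROUP.match(name))
```

Passing a slug such as `"acme"` raises `ValueError`, which turns a silent tenant-scoping mistake into an immediate failure.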

---

# 7) Implementation Plan (Milestones)

## Milestone 1 — Channels setup and a secure notifications consumer
Deliverables:
- Channels installed, Redis channel layer configured
- ASGI routing supports websocket and http
- Consumer that:
  - requires login
  - requires org membership
  - joins org group and sends welcome message

Definition of Done:
- connection works under an ASGI server (e.g., Daphne or Uvicorn)
- non-member cannot connect
- member connects and receives welcome event
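
For the channel-layer deliverable, the settings fragment usually looks like the following when the separate `channels_redis` package is installed. The host and port are assumptions for a local Redis; in practice they would come from environment configuration.

```python
# settings.py (fragment). Host and port below are assumed local defaults.
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            "hosts": [("127.0.0.1", 6379)],
        },
    },
}
```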

---

## Milestone 2 — Broadcast task events into WebSockets
Deliverables:
- When a task status changes (service layer), broadcast an event to org group.

Definition of Done:
- open two browsers in same org
- change task in one, other receives event instantly

Guidelines:
- do not “broadcast directly from views”; broadcast from services where state changes
- persist event in DB first (recommended), then broadcast
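
The "persist first, then broadcast from the service layer" guideline can be sketched without any framework by injecting the two side effects. Here `save_event` and `broadcast` are hypothetical callables standing in for an ORM write and a channel-layer group send, and the task is a plain dict for brevity.

```python
import logging

logger = logging.getLogger("realtime")


def change_task_status(task, new_status, *, save_event, broadcast):
    """Apply a status change, persist the event, then attempt a best-effort push."""
    old_status = task["status"]
    if old_status == new_status:
        return None  # nothing changed, so nothing to announce
    task["status"] = new_status
    event = {
        "type": "task_event",
        "task_id": task["id"],
        "action": "status_changed",
        "details": {"from": old_status, "to": new_status},
    }
    # DB first: the stored row is the source of truth.
    event_id = save_event(task["org_id"], event)
    try:
        broadcast(f"org-{task['org_id']}", {**event, "event_id": event_id})
    except Exception:
        # The push is only a fast path; clients recover through catch-up.
        logger.exception("broadcast failed for event %s", event_id)
    return event_id
```

In a Django project the `broadcast` argument would typically wrap `async_to_sync(get_channel_layer().group_send)`, ideally scheduled with `transaction.on_commit` so nothing is pushed for a rolled-back change. Note that in `group_send` the message's `type` key selects the consumer handler method, so the client-facing event is usually nested under another key. The `created_at` timestamp is omitted here and would come from the stored row.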

---

## Milestone 3 — Add persistence + catch-up
Deliverables (choose one):
- On connect: send the last N `NotificationEvent` rows from the DB, or
- Provide an HTTP endpoint `/orgs/<slug>/notifications/?since=<cursor>` to fetch missed events

Definition of Done:
- disconnect socket
- perform events
- reconnect and see missed events
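
The catch-up flow has two halves: a server-side query for events after a cursor, and client-side deduplication, because a live push can arrive before the catch-up page that also contains it. The sketch below uses in-memory dicts and assumes the event id is a monotonically increasing integer that can serve as the cursor; `fetch_since` and `ClientFeed` are hypothetical names.

```python
def fetch_since(events, org_id, cursor=0, limit=50):
    """Return (page, next_cursor): one org's events with id > cursor, oldest first."""
    page = [e for e in events if e["org_id"] == org_id and e["id"] > cursor]
    page.sort(key=lambda e: e["id"])
    page = page[:limit]
    next_cursor = page[-1]["id"] if page else cursor
    return page, next_cursor


class ClientFeed:
    """Client-side view that merges catch-up pages and live pushes without duplicates."""

    def __init__(self):
        self._by_id = {}

    def apply(self, event):
        """Record an event; return False if it was already delivered."""
        if event["id"] in self._by_id:
            return False
        self._by_id[event["id"]] = event
        return True

    def ordered(self):
        return [self._by_id[i] for i in sorted(self._by_id)]
```

On reconnect the client would call the fetch endpoint with the last id it saw before the disconnect, apply each returned event, and keep applying live pushes through the same `apply` path.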

---

## Milestone 4 (Optional) — Export job completion notifications
Deliverables:
- When background export job finishes, broadcast `export_done` to org group.

Definition of Done:
- start export job
- when done, UI receives event with download link

Guidelines:
- user authorization must still apply to download endpoint (don’t rely on socket)

---

## Milestone 5 (Optional) — Chat rooms
Deliverables:
- room list page (HTML or API)
- websocket room consumer that:
  - accepts messages
  - validates body length and rate limits
  - persists messages
  - broadcasts message events to room group
- chat history fetched on page load (HTTP) + new messages via WS

Definition of Done:
- messages persist across reload
- unauthorized user can’t join room
- rate limiting works

---

# 8) Testing Guidelines (High-value tests, minimum set)

## 8.1 WebSocket connection tests
- member can connect to org notifications socket
- non-member cannot connect (reject)
- anonymous cannot connect

Use `WebsocketCommunicator` in Channels tests.

## 8.2 Tenant leak regression tests (realtime version)
- user in org A attempts to connect to org B socket → rejected
- broadcast to org A group is not received by org B connections (optional but powerful)
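
The shape of the broadcast-isolation assertion is shown below with an in-memory stand-in instead of a real channel layer, so it runs with only the standard library. `FakeGroupLayer` is hypothetical. In the real test you would connect two `WebsocketCommunicator` instances, send to the org A group, then assert that one communicator receives the event and the other receives nothing.

```python
import asyncio


class FakeGroupLayer:
    """In-memory stand-in (hypothetical) for a channel layer's group fan-out."""

    def __init__(self):
        self.groups = {}

    def group_add(self, group, queue):
        self.groups.setdefault(group, []).append(queue)

    async def group_send(self, group, message):
        for queue in self.groups.get(group, []):
            await queue.put(message)


async def received_within(queue, timeout=0.05):
    """Return the next message, or None if nothing arrives in time."""
    try:
        return await asyncio.wait_for(queue.get(), timeout)
    except asyncio.TimeoutError:
        return None


async def tenant_isolation_scenario():
    layer = FakeGroupLayer()
    org_a_client, org_b_client = asyncio.Queue(), asyncio.Queue()
    layer.group_add("org-1", org_a_client)
    layer.group_add("org-2", org_b_client)
    await layer.group_send("org-1", {"type": "task_event", "task_id": 123})
    return await received_within(org_a_client), await received_within(org_b_client)


got_a, got_b = asyncio.run(tenant_isolation_scenario())
```

The assertion that matters is the negative one: the org B client must receive nothing within the timeout.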

## 8.3 Event persistence tests
- when task status changes:
  - NotificationEvent row is created (if you implement it)
  - broadcast payload matches expected shape

## 8.4 Abuse control tests (if chat)
- sending too many messages triggers close/deny
- message too large is rejected

## 8.5 Background job + realtime integration tests (optional)
- export job completion triggers broadcast (can be tested by calling broadcaster function and asserting communicator receives)

---

# 9) Observability Guidelines (Realtime isn’t like HTTP)

You should log:
- connection opened/closed
- org_slug/org_id
- user_id
- close code and reason
- message counts (optional)
- errors (DB access in consumer, Redis channel layer errors)

Guidelines:
- introduce a connection_id for sockets (uuid hex)
- include connection_id in logs so you can debug “this socket is misbehaving”
- do not log message bodies in production for privacy unless required
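
A `logging.LoggerAdapter` is one lightweight way to stamp every log line with the connection's identity. The sketch below assumes a logger named `realtime.ws` and a key=value prefix format; both are illustrative.

```python
import logging
import uuid


class ConnectionLogger(logging.LoggerAdapter):
    """Prefix every record with the connection's identifying fields."""

    def process(self, msg, kwargs):
        ctx = " ".join(f"{key}={value}" for key, value in self.extra.items())
        return f"[{ctx}] {msg}", kwargs


def connection_logger(user_id, org_id, base=None):
    """Create a logger bound to a fresh connection_id for one socket."""
    if base is None:
        base = logging.getLogger("realtime.ws")
    extra = {
        "connection_id": uuid.uuid4().hex,
        "user_id": user_id,
        "org_id": org_id,
    }
    return ConnectionLogger(base, extra)


log = connection_logger(user_id=42, org_id=7)
log.info("connection opened")
log.info("connection closed code=%s", 1000)
```

Creating the adapter once in `connect()` and storing it on the consumer means later handlers log with the same `connection_id` without passing it around.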

---

# 10) Performance and Scaling Guidelines

## 10.1 Redis is a production dependency
- the channel layer needs a shared backend (Redis) to deliver messages across processes and instances
- in-memory layer is dev-only

## 10.2 Group fan-out cost
Broadcasting to a large org group has a cost:
- payload size matters
- broadcast frequency matters

Guidelines:
- keep payload minimal
- batch noisy events if needed (advanced)
- consider sending “event id” then clients fetch details via HTTP for heavy payloads

## 10.3 Backpressure
If clients can’t keep up:
- queue grows
- memory usage rises

Mitigation:
- close slow clients after thresholds
- reduce event volume
- use smaller payloads
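
"Close slow clients after thresholds" can be modelled as a bounded per-connection outbox: once the number of pending messages hits the limit, the client is marked closed rather than allowed to grow memory without bound. `ClientOutbox` and the default of 100 pending messages are assumptions for illustration.

```python
from collections import deque


class ClientOutbox:
    """Bounded per-connection send buffer; overflow marks the client for closing."""

    def __init__(self, max_pending=100):
        self.max_pending = max_pending
        self.pending = deque()
        self.closed = False

    def offer(self, message):
        """Queue a message without blocking; return False if the client is dropped."""
        if self.closed:
            return False
        if len(self.pending) >= self.max_pending:
            # Threshold exceeded: free the memory and stop queuing for this client.
            self.closed = True
            self.pending.clear()
            return False
        self.pending.append(message)
        return True

    def take(self):
        """Pop the oldest pending message, or None if there is none."""
        return self.pending.popleft() if self.pending else None
```

A dropped client loses nothing permanently if persistence and catch-up are in place, since it can reconnect and fetch what it missed. The Redis channel layer also has its own capacity settings, so check its documentation for how overflow is handled at that level.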

---

# 11) “Definition of Done” (Project 3 completion gate)

Project 3 is complete when:

- [ ] Secure websocket endpoint exists
- [ ] Membership enforced before accept and before group join
- [ ] Task events (or another domain event) broadcast successfully
- [ ] Persistence + catch-up exists (DB event store or fetch endpoint)
- [ ] Tests cover:
  - authorized connect
  - unauthorized connect
  - basic broadcast reception
- [ ] Redis channel layer used (not in-memory) for realistic environment
- [ ] Minimal observability logs exist for connect/disconnect/errors

---

# 12) Stretch Goals (Pick 3–6)

1. Presence (“online users in org”)
2. Per-user notifications group: `user-<id>`
3. Notification preferences (mute certain event types)
4. Delivery receipts (client sends ack; server marks delivered) — advanced
5. Offline queue: missed events fetched with cursor pagination
6. Rate limiting per socket
7. Admin ops tool: view active connections (approx) and recent events
8. E2E test: Playwright opens two tabs and verifies live update (advanced)

---

# 13) Common Failure Modes (and what to check)

- “WebSocket connects locally but not in production”
  - proxy not forwarding Upgrade headers
  - ASGI server not used (WSGI won’t work for websockets)
  - Origin validation blocking due to ALLOWED_HOSTS/Origin mismatch

- “User receives other org’s notifications”
  - group naming not tenant-scoped
  - broadcast function sending to wrong group
  - membership check missing before group_add

- “Events are lost”
  - relying only on WebSocket without DB persistence
  - fix: store NotificationEvent and fetch on reconnect

- “Redis down kills app”
  - treat realtime as optional degradation:
    - core app works, realtime fails gracefully
  - monitor Redis and have runbook

