TraceHub — Distributed Logging Platform (Monolithic MVP)

A production-grade learning system that simulates Kafka-like reliability, ACK/replay protocols, and real-time observability — all in one monolithic repo.

┌─────────────────────────────────────────────────────────┐
│                  TRACEHUB ARCHITECTURE                  │
│                                                         │
│  ┌──────────────┐   batch+seq    ┌──────────────────┐   │
│  │  auth-svc    │───────────────▶│                  │   │
│  │  payment-svc │  POST /ingest  │   Backend API    │   │
│  │  notif-svc   │◀───────────────│   (Express.js)   │   │
│  └──────────────┘   ACK/missing  └────────┬─────────┘   │
│   (Producer Sim)                          │             │
│                                    ┌──────▼──────┐      │
│                                    │    Redis    │      │
│                                    │  queue:logs │      │
│                                    │ queue:retry │      │
│                                    │  producer:* │      │
│                                    └──────┬──────┘      │
│                                           │             │
│                                    ┌──────▼──────┐      │
│                                    │ Log Worker  │      │
│                                    │   (batch    │      │
│                                    │   insert)   │      │
│                                    └──────┬──────┘      │
│                                           │             │
│                                    ┌──────▼──────┐      │
│                                    │ PostgreSQL  │      │
│                                    │ logs table  │      │
│                                    │  ack_state  │      │
│                                    │   replays   │      │
│                                    └─────────────┘      │
│                                                         │
│  ┌──────────────┐    Socket.io   ┌──────────────────┐   │
│  │   Next.js    │◀───────────────│   Backend WS     │   │
│  │   Dashboard  │                │   (real-time)    │   │
│  └──────────────┘                └──────────────────┘   │
└─────────────────────────────────────────────────────────┘

🚀 Quick Start

# Clone and start everything
git clone <repo>
cd tracehub
docker-compose up --build

# Services:
# Dashboard:  http://localhost:3000
# Backend:    http://localhost:3001
# PostgreSQL: localhost:5432
# Redis:      localhost:6379

🏗️ Project Structure

tracehub/
├── apps/
│   ├── backend/              # Node.js + Express + Socket.io
│   │   └── src/
│   │       ├── index.ts          # App entry + Socket.io setup
│   │       ├── routes/
│   │       │   ├── ingest.ts     # POST /ingest — receive log batches
│   │       │   ├── logs.ts       # GET /logs — query + replay API
│   │       │   └── metrics.ts    # GET /metrics + POST /control
│   │       ├── services/
│   │       │   ├── database.ts   # PostgreSQL pool + helpers
│   │       │   ├── redis.ts      # Redis client + queue ops
│   │       │   ├── ackService.ts # ACK/replay protocol
│   │       │   └── metricsService.ts
│   │       └── workers/
│   │           └── logWorker.ts  # Queue consumer + batch insert
│   │
│   ├── dashboard/            # Next.js 14 + Tailwind + Recharts
│   │   └── src/
│   │       ├── app/
│   │       │   ├── page.tsx          # Overview
│   │       │   ├── live-logs/        # Real-time log stream
│   │       │   ├── replay/           # Replay center
│   │       │   ├── queue/            # Queue monitor
│   │       │   └── services/         # Per-service metrics
│   │       ├── components/
│   │       │   ├── SocketProvider.tsx
│   │       │   ├── dashboard/
│   │       │   └── ui/
│   │       └── hooks/
│   │           └── useMetrics.ts     # Socket.io hooks
│   │
│   └── producer-simulator/   # Fake EC2 services
│       └── src/index.ts      # auth + payment + notification producers
│
├── postgres/init.sql         # Schema: logs, ack_state, replay_requests
├── redis/redis.conf
├── shared/src/index.ts       # Shared TypeScript types
└── docker-compose.yml

🧠 Core Concepts Implemented

1. Sequence-Based Log Protocol

Every log carries a monotonically increasing seq number per service:

{
  "seq": 1001,
  "service": "payment-service",
  "level": "error",
  "message": "payment failed",
  "requestId": "req_abc123",
  "timestamp": "2024-01-15T10:30:00.000Z"
}
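As a sketch, a producer can assign these seq numbers with a per-service counter. This is illustrative code, not the actual producer-simulator source; the starting value of 1000 is arbitrary.

```typescript
// Hypothetical producer-side sketch: each service keeps its own counter,
// so seq is monotonically increasing per service and independent of
// other services.

interface LogEntry {
  seq: number;
  service: string;
  level: "info" | "warn" | "error";
  message: string;
  requestId: string;
  timestamp: string;
}

const counters = new Map<string, number>();

function makeLog(
  service: string,
  level: LogEntry["level"],
  message: string,
  requestId: string
): LogEntry {
  // Advance this service's counter; other services are unaffected.
  const seq = (counters.get(service) ?? 1000) + 1;
  counters.set(service, seq);
  return {
    seq,
    service,
    level,
    message,
    requestId,
    timestamp: new Date().toISOString(),
  };
}
```

Because the counter is per service, a gap in one service's sequence never masks or shifts another service's ACK state.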

2. ACK/Replay Protocol

Producer → Backend flow:

  1. Producer generates logs, saves them in Redis sorted set (producer:{service}) keyed by seq
  2. Producer sends batch via POST /ingest
  3. Backend enqueues logs into Redis queue:logs
  4. Backend computes ACK response:
    • ackTill = highest contiguous seq received
    • missing = gaps in the sequence window
  5. Producer removes ACKed logs from its buffer
  6. Missing seqs are queued in replay_requests table

ACK Response Format:

{
  "batchId": "ack_1705312200000",
  "ackTill": 1099,
  "missing": [1088, 1092, 1095],
  "status": "partial"
}
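The ACK computation in step 4 can be sketched as a pure function. The names here are hypothetical; the real `ackService.ts` may differ.

```typescript
interface AckResponse {
  batchId: string;
  ackTill: number;   // highest contiguous seq received
  missing: number[]; // gaps in the sequence window
  status: "ok" | "partial";
}

// Given the last contiguously ACKed seq and the seqs in the incoming
// batch, advance the ACK watermark and collect any gaps.
function computeAck(lastAcked: number, receivedSeqs: number[]): AckResponse {
  const seen = new Set(receivedSeqs);
  const missing: number[] = [];
  let ackTill = lastAcked;

  const maxSeq = receivedSeqs.length ? Math.max(...receivedSeqs) : lastAcked;
  for (let seq = lastAcked + 1; seq <= maxSeq; seq++) {
    if (seen.has(seq)) {
      // Only advance the watermark while no gap has been found.
      if (missing.length === 0) ackTill = seq;
    } else {
      missing.push(seq);
    }
  }

  return {
    batchId: `ack_${Date.now()}`,
    ackTill,
    missing,
    status: missing.length === 0 ? "ok" : "partial",
  };
}
```

On a `partial` response the producer keeps everything above `ackTill` in its `producer:{service}` buffer so the listed seqs can be replayed.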

3. Redis Queue Architecture

queue:logs        LIST  → main processing queue (RPUSH / LPOP)
queue:retry       LIST  → failed/retry queue
producer:{svc}    ZSET  → producer buffer sorted by seq (for replay lookup)
ack:{svc}         STRING → current ACK state per service
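As an illustration, the key layout above maps onto these Redis commands. This is an assumed sketch of how the backend and worker might touch each key; payloads are abbreviated.

```
RPUSH queue:logs '{"seq":1001,...}'              # backend enqueues a log
LPOP  queue:logs                                 # worker pops for batch insert
ZADD  producer:payment-svc 1001 '{"seq":1001,...}'   # producer buffers by seq
ZRANGEBYSCORE producer:payment-svc 1088 1088     # replay lookup for a missing seq
SET   ack:payment-svc 1099                       # ACK watermark per service
```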

4. Worker Processing

The LogWorker runs continuously:

  1. Pops up to 50 logs from queue:logs
  2. Deduplicates by service:seq
  3. Batch inserts into PostgreSQL
  4. Checks replay_requests for pending replays
  5. Fetches buffered logs from Redis for replay
  6. Marks replays completed

5. Duplicate Protection

  • In-memory sliding window Set of service:seq pairs
  • PostgreSQL ON CONFLICT DO NOTHING on insert
  • Maximum 10,000 entries tracked before eviction
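A minimal sketch of the in-memory sliding window, assuming the 10,000-entry cap stated above (illustrative names, not the actual worker source):

```typescript
const MAX_TRACKED = 10_000;
const seen = new Set<string>();

/** Returns true the first time a service:seq pair is observed. */
function isFirstSeen(service: string, seq: number): boolean {
  const key = `${service}:${seq}`;
  if (seen.has(key)) return false;
  seen.add(key);
  // Evict the oldest entry once the window is full. JS Sets iterate in
  // insertion order, so the first value yielded is the oldest.
  if (seen.size > MAX_TRACKED) {
    const oldest = seen.values().next().value;
    if (oldest !== undefined) seen.delete(oldest);
  }
  return true;
}
```

The database-level `ON CONFLICT DO NOTHING` backstops this set: the in-memory window is lost on worker restart, but a duplicate insert still becomes a no-op.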

🎛️ Fault Injection Controls

Control          Effect
Crash Worker     Stops log processing for 5s, auto-recovers
Network Failure  Drops all /ingest requests for 10s
Delay ACK        Adds 3s latency to ACK responses
Flush Queue      Drops all queued logs (demonstrates data loss)
Reset All        Clears all failure states

📊 Dashboard Pages

Page             Path        Features
Overview         /           System metrics, EPS chart, service status, fault controls
Live Logs        /live-logs  Real-time log stream, filters, search
Replay Center    /replay     Replay requests, manual trigger, live events
Queue Monitor    /queue      Queue depth charts, worker status, sim events
Service Metrics  /services   Per-service EPS, errors, ACK state, charts

🔌 API Reference

POST /ingest              Receive log batch, return ACK
GET  /logs                Query logs (filters: service, level, search, from, to)
GET  /logs/replay         List replay requests
POST /logs/replay         Trigger manual replay
GET  /metrics             System metrics snapshot
GET  /metrics/queue       Queue depth + worker stats
GET  /metrics/replay      Replay stats by status
POST /metrics/control     Fault injection controls
GET  /health              Health check
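For example, with the stack running locally (the batch envelope shape here is an assumption; the log fields match the protocol section above):

```
# Ingest a batch and receive an ACK response
curl -X POST http://localhost:3001/ingest \
  -H 'Content-Type: application/json' \
  -d '{"service":"payment-service","logs":[{"seq":1001,"service":"payment-service","level":"error","message":"payment failed","requestId":"req_abc123","timestamp":"2024-01-15T10:30:00.000Z"}]}'

# Query recent error logs for one service
curl 'http://localhost:3001/logs?service=payment-service&level=error'
```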

📡 WebSocket Events

Event             Direction        Payload
metrics:snapshot  Server → Client  Full SystemMetrics every 2s
log:new           Server → Client  Individual LogEntry
ack:sent          Server → Client  ACK details
replay:completed  Server → Client  Replay result
worker:crashed    Server → Client  Crash notification
worker:recovered  Server → Client  Recovery notification
sim:control       Server → Client  Fault injection event

🔄 Future Microservices Migration Path

This monolith is structured to extract into:

tracehub-ingest-service    → /ingest route
tracehub-worker-service    → logWorker.ts
tracehub-query-service     → /logs route
tracehub-metrics-service   → metricsService.ts
tracehub-replay-service    → ackService.ts + replay worker

Each extracted service shares the same PostgreSQL schema and Redis keys, making the migration additive rather than disruptive.

🛠️ Local Development (Without Docker)

# Start dependencies
docker-compose up postgres redis -d

# Backend
cd apps/backend
npm install
npm run dev

# Producer
cd apps/producer-simulator
npm install
npm run dev

# Dashboard
cd apps/dashboard
npm install
npm run dev

About

A production-grade learning system that simulates Kafka-like reliability, ACK/replay protocols, and real-time observability — all in one monolithic repo. The next step is to turn it into a production-ready open-source project that can handle 100k events/sec.
