Description
The API has no observability infrastructure beyond basic request logging. There are no Prometheus metrics, no structured audit trail for financial operations, and no distributed tracing, which makes it impossible to monitor the system, detect anomalies, debug production issues, or meet the compliance requirements of a financial platform.
Problem Analysis
What is missing
- No metrics: No request latency histograms, no error rate counters, no business metrics (quiz completions/hour, reward claims/day)
- No audit trail: Reward claims and credential mints modify both DB and blockchain, but there is no audit log recording who did what, when, and the full transaction lifecycle
- No distributed tracing: A single request spans JWT verify → DB query → Redis lookup → Stellar RPC → DB update, with no way to trace the full path
- No Prometheus endpoint: There is no /metrics endpoint for Prometheus (or a Datadog agent) to scrape, so Grafana or Datadog dashboards have no data to query
- No alerting hooks: No way to detect spikes in error rates or latency
Current logging
The codebase uses Pino (src/utils/logger.ts) for basic request logging, but:
- Log fields are not standardized for aggregation (no consistent field names across log calls)
- No business event logging (e.g., "reward claimed" with amount, user, tx hash)
- No security event logging (e.g., "failed auth attempt" with IP, address)
Required Implementation
A. Prometheus Metrics
Install: npm install prom-client
```ts
// New file: src/metrics/index.ts
import { Registry, Counter, Histogram, Gauge } from "prom-client";

export const register = new Registry();

// HTTP metrics
export const httpRequestDuration = new Histogram({
  name: "http_request_duration_seconds",
  help: "Duration of HTTP requests in seconds",
  labelNames: ["method", "route", "status_code"],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10],
  registers: [register],
});

export const httpRequestTotal = new Counter({
  name: "http_requests_total",
  help: "Total number of HTTP requests",
  labelNames: ["method", "route", "status_code"],
  registers: [register],
});

// Stellar metrics
export const stellarTxTotal = new Counter({
  name: "stellar_transactions_total",
  help: "Total Stellar transactions submitted",
  labelNames: ["method", "status"],
  registers: [register],
});

export const stellarTxDuration = new Histogram({
  name: "stellar_transaction_duration_seconds",
  help: "Duration of Stellar transaction submission",
  labelNames: ["method"],
  buckets: [0.5, 1, 2, 5, 10, 30],
  registers: [register],
});

// Business metrics
export const rewardClaimsTotal = new Counter({
  name: "reward_claims_total",
  help: "Total reward claims",
  labelNames: ["status", "amount_bucket"],
  registers: [register],
});

export const credentialsMintedTotal = new Counter({
  name: "credentials_minted_total",
  help: "Total credentials minted",
  labelNames: ["course_id"],
  registers: [register],
});

export const quizzesCompletedTotal = new Counter({
  name: "quizzes_completed_total",
  help: "Total quizzes completed",
  labelNames: ["passed"],
  registers: [register],
});

// System metrics
export const activeConnections = new Gauge({
  name: "db_active_connections",
  help: "Number of active database connections",
  registers: [register],
});

export const redisConnected = new Gauge({
  name: "redis_connected",
  help: "Redis connection status (1=connected, 0=disconnected)",
  registers: [register],
});
```
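The `amount_bucket` label on `reward_claims_total` only works if it takes a small, fixed set of values; labelling with the raw amount would create a new time series for every distinct amount. A minimal sketch of a bucketing helper, assuming a hypothetical `amountBucket` function and placeholder thresholds (the issue does not specify any):

```typescript
// Hypothetical helper: maps a reward amount to one of a few fixed label values
// so reward_claims_total keeps a bounded number of time series.
// The thresholds below are placeholders, not values taken from the issue.
const AMOUNT_BUCKETS: ReadonlyArray<[upperBound: number, label: string]> = [
  [10, "le_10"],
  [100, "le_100"],
  [1000, "le_1000"],
];

export function amountBucket(amount: number): string {
  // NaN, Infinity and negative amounts share one label instead of leaking through
  if (!Number.isFinite(amount) || amount < 0) return "invalid";
  for (const [upperBound, label] of AMOUNT_BUCKETS) {
    if (amount <= upperBound) return label;
  }
  return "gt_1000";
}
```

In the reward service this would be used as `rewardClaimsTotal.inc({ status: "success", amount_bucket: amountBucket(amount) })`. Keeping the bucket list short makes the number of series per counter predictable.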
B. Metrics Endpoint
```ts
// In server.ts
import { register } from "./metrics/index.js";

app.get("/metrics", async (request, reply) => {
  reply.header("Content-Type", register.contentType);
  return reply.send(await register.metrics());
});
```
C. Fastify Metrics Hook
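```ts
// New file: src/metrics/fastify-hook.ts
import type { FastifyInstance } from "fastify";
import { httpRequestDuration, httpRequestTotal } from "./index.js";

export function registerMetricsHook(app: FastifyInstance) {
  app.addHook("onResponse", async (request, reply) => {
    // reply.elapsedTime is in milliseconds; Prometheus convention is seconds
    const duration = (reply.elapsedTime || 0) / 1000;
    const labels = {
      method: request.method,
      // Use the route template (path parameters such as :id left unexpanded),
      // never the raw URL: raw paths, query strings and unmatched 404s would
      // create an unbounded number of label values
      route: request.routeOptions?.url ?? "unmatched",
      status_code: reply.statusCode,
    };
    httpRequestDuration.observe(labels, duration);
    httpRequestTotal.inc(labels);
  });
}
```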
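The hook covers HTTP traffic only; `stellar_transactions_total` and `stellar_transaction_duration_seconds` from section A still have to be updated around each Stellar submission. A minimal sketch, assuming a hypothetical `instrumentStellarCall` helper. The two small structural interfaces describe only the `startTimer` and `inc` calls it uses, so the real `stellarTxDuration` and `stellarTxTotal` should be passable directly, and fakes can stand in during tests:

```typescript
type Labels = Record<string, string>;

// Structural subset of prom-client's Histogram used here
interface TimerHistogram {
  startTimer(labels?: Labels): (labels?: Labels) => number;
}

// Structural subset of prom-client's Counter used here
interface LabelCounter {
  inc(labels: Labels, value?: number): void;
}

// Hypothetical helper: times one Stellar call under the { method } label and
// counts it as success or failure. Errors are rethrown unchanged.
export async function instrumentStellarCall<T>(
  duration: TimerHistogram,
  total: LabelCounter,
  method: string,
  call: () => Promise<T>,
): Promise<T> {
  const endTimer = duration.startTimer({ method });
  try {
    const result = await call();
    total.inc({ method, status: "success" });
    return result;
  } catch (err) {
    total.inc({ method, status: "failure" });
    throw err;
  } finally {
    endTimer(); // records elapsed seconds for both outcomes
  }
}
```

Ending the timer in `finally` means failed submissions are timed too, which helps separate slow RPC failures from fast rejections.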
D. Structured Audit Logging
```ts
// New file: src/audit/index.ts
import { logger } from "../utils/logger.js";

export interface AuditEvent {
  event: string;
  userId?: string;
  stellarAddress?: string;
  resource: string;
  resourceId?: string;
  action: string;
  result: "success" | "failure";
  txHash?: string;
  amount?: number;
  metadata?: Record<string, unknown>;
  ip?: string;
  userAgent?: string;
}

export function auditLog(event: AuditEvent) {
  logger.info(
    {
      audit: true,
      ...event,
      timestamp: new Date().toISOString(),
    },
    `[AUDIT] ${event.event}`,
  );
}
```
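Usage in the reward service (request.ip is only available there if the route handler passes the Fastify request or the client IP down to the service):
```ts
auditLog({
  event: "reward_claimed",
  userId,
  stellarAddress: user.stellarAddress,
  resource: "quiz_submission",
  resourceId: submissionId,
  action: "claim_reward",
  result: "success",
  txHash,
  amount: REWARD_AMOUNT,
  ip: request.ip,
});
```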
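The usage above records only the success path. For financial operations the failure path matters as much (a Stellar submission can fail after the database was already touched), so one option is to wrap each operation so that exactly one audit event is emitted either way. A minimal sketch, assuming a hypothetical `withAudit` helper that receives the audit sink as a parameter (in the service it would be `auditLog`); the `AuditBase` and `AuditRecord` types are trimmed copies of `AuditEvent`, redeclared so the snippet stands alone:

```typescript
// Subset of AuditEvent (src/audit/index.ts), redeclared so the sketch stands alone
interface AuditBase {
  event: string;
  resource: string;
  action: string;
  userId?: string;
  stellarAddress?: string;
  resourceId?: string;
  amount?: number;
  ip?: string;
}

interface AuditRecord extends AuditBase {
  result: "success" | "failure";
  txHash?: string;
  metadata?: Record<string, unknown>;
}

// Hypothetical helper: runs `operation` and emits exactly one audit record,
// "success" with the returned tx hash or "failure" with the error message.
// Failures are rethrown so the caller's error handling is unchanged.
export async function withAudit<T extends { txHash?: string }>(
  base: AuditBase,
  sink: (record: AuditRecord) => void,
  operation: () => Promise<T>,
): Promise<T> {
  let outcome: T;
  try {
    outcome = await operation();
  } catch (err) {
    sink({
      ...base,
      result: "failure",
      metadata: { error: err instanceof Error ? err.message : String(err) },
    });
    throw err;
  }
  // Emitted outside the try so a throwing sink is not misreported as a failed operation
  sink({ ...base, result: "success", txHash: outcome.txHash });
  return outcome;
}
```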
E. OpenTelemetry Distributed Tracing
Install: npm install @opentelemetry/sdk-node @opentelemetry/api @opentelemetry/instrumentation-http @opentelemetry/instrumentation-fastify @opentelemetry/instrumentation-pg @opentelemetry/instrumentation-redis
```ts
// New file: src/tracing.ts
// Must be loaded before fastify, pg or redis are imported, e.g. as the very
// first import in server.ts or preloaded with `node --import`.
import { NodeSDK, tracing } from "@opentelemetry/sdk-node";
import { HttpInstrumentation } from "@opentelemetry/instrumentation-http";
import { FastifyInstrumentation } from "@opentelemetry/instrumentation-fastify";
import { PgInstrumentation } from "@opentelemetry/instrumentation-pg";
import { RedisInstrumentation } from "@opentelemetry/instrumentation-redis";

export const sdk = new NodeSDK({
  // Console output is for local debugging; use an OTLP exporter in production
  traceExporter: new tracing.ConsoleSpanExporter(),
  instrumentations: [
    // The Fastify instrumentation expects HTTP instrumentation to be enabled too
    new HttpInstrumentation(),
    new FastifyInstrumentation(),
    new PgInstrumentation(),
    // Targets the `redis` package; ioredis has a separate instrumentation package
    new RedisInstrumentation(),
  ],
});

sdk.start();
// Call sdk.shutdown() during graceful shutdown to flush batched spans
```
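The auto-instrumentations cover the Fastify, Postgres and Redis hops of the request path, but the Stellar submission step is easier to find in a trace when it has its own named span with business attributes (an outbound HTTP span, if one is produced at all, only shows a URL). A minimal sketch using the `@opentelemetry/api` tracer; the tracer name `stellar-client` and the `withSpan` helper are hypothetical. Without a registered SDK the API hands out no-op spans, so the wrapper is also safe to call in unit tests:

```typescript
import { trace, SpanStatusCode, type Span } from "@opentelemetry/api";

// Hypothetical tracer name; spans are no-ops unless an SDK is registered
const tracer = trace.getTracer("stellar-client");

// Runs `fn` inside an active span named `name`. If `fn` throws, the exception
// is recorded and the span status set to ERROR; the span always ends.
export async function withSpan<T>(
  name: string,
  attributes: Record<string, string | number>,
  fn: (span: Span) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, { attributes }, async (span) => {
    try {
      return await fn(span);
    } catch (err) {
      span.recordException(err instanceof Error ? err : String(err));
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```

Because `startActiveSpan` makes the span active for the callback, any instrumented calls made inside `fn` (for example a follow-up DB update) become its children.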
F. Business Event Dashboard Queries
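The SQL queries assume audit events are also persisted to an audit_events table (with event, user_id, amount and created_at columns); auditLog as defined above only writes to the Pino log stream.
```sql
-- Reward claims per hour (last 24h)
SELECT date_trunc('hour', created_at) AS hour, COUNT(*), SUM(amount)
FROM audit_events
WHERE event = 'reward_claimed'
  AND created_at > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;

-- Active users per day
SELECT date_trunc('day', created_at) AS day, COUNT(DISTINCT user_id)
FROM audit_events
WHERE event IN ('reward_claimed', 'quiz_submitted')
GROUP BY day
ORDER BY day DESC;
```
HTTP error rates live in Prometheus rather than in a SQL table, so the error-rate-per-endpoint query is PromQL over http_requests_total:
```promql
# 5xx responses per route and status code over the last hour
sum by (route, status_code) (increase(http_requests_total{status_code=~"5.."}[1h]))
```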
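The dashboard queries read from an `audit_events` table, while `auditLog` only writes to the Pino stream, so the events have to reach Postgres somehow (a log shipper or a direct insert). A minimal sketch of the direct-insert option, assuming a hypothetical `audit_events` schema with the columns the queries use plus `result` and a JSONB `payload` column for the remaining fields. `persistAuditEvent` and `Queryable` are hypothetical names; `Queryable` is meant to match the `query(text, values)` shape of a `pg` Pool or Client:

```typescript
// Minimal query interface; a pg Pool/Client or a test fake can satisfy it
export interface Queryable {
  query(text: string, values: unknown[]): Promise<unknown>;
}

export interface PersistedAuditEvent {
  event: string;
  userId?: string;
  amount?: number;
  result: "success" | "failure";
  [extra: string]: unknown; // remaining AuditEvent fields end up in payload
}

// Hypothetical table (not defined in the issue):
// CREATE TABLE audit_events (
//   id BIGSERIAL PRIMARY KEY,
//   event TEXT NOT NULL,
//   user_id TEXT,
//   amount NUMERIC,
//   result TEXT NOT NULL,
//   payload JSONB NOT NULL,
//   created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
// );
export async function persistAuditEvent(db: Queryable, e: PersistedAuditEvent): Promise<void> {
  // Parameterized: values are sent separately, never interpolated into the SQL text
  await db.query(
    `INSERT INTO audit_events (event, user_id, amount, result, payload)
     VALUES ($1, $2, $3, $4, $5)`,
    [e.event, e.userId ?? null, e.amount ?? null, e.result, JSON.stringify(e)],
  );
}
```

Writing synchronously in the request path makes the audit row part of the operation's latency; a log shipper avoids that at the cost of eventual consistency.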
Files to create or modify
- New: src/metrics/index.ts (Prometheus metrics definitions)
- New: src/metrics/fastify-hook.ts (Fastify metrics collection)
- New: src/audit/index.ts (structured audit logging)
- New: src/tracing.ts (OpenTelemetry setup)
- Modify: src/server.ts (register metrics hook, add /metrics endpoint, import tracing)
- Modify: src/modules/rewards/reward.service.ts (add audit logging)
- Modify: src/modules/credentials/credential.service.ts (add audit logging)
- Modify: src/modules/quizzes/quiz.service.ts (add business metrics)
Dependencies to Add
```sh
npm install prom-client @opentelemetry/sdk-node @opentelemetry/api \
  @opentelemetry/instrumentation-http @opentelemetry/instrumentation-fastify \
  @opentelemetry/instrumentation-pg @opentelemetry/instrumentation-redis
```
Testing Requirements
- Verify /metrics endpoint returns valid Prometheus text format
- Verify the http_request_duration_seconds histogram records observations in the correct buckets
- Verify audit logs contain all required fields
- Verify OpenTelemetry spans are created for Fastify requests
- Load test: verify metrics collection does not add significant latency (< 1ms per request)
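For the "audit logs contain all required fields" check, one option is to capture the JSON lines Pino writes during a test and validate every record marked `audit: true`. A minimal sketch, assuming the required set is the non-optional `AuditEvent` fields plus the `timestamp` that `auditLog` adds, and assuming (the issue does not say so) that a successful `reward_claimed` must also carry `userId`, `txHash` and `amount`; `missingAuditFields` is a hypothetical name:

```typescript
// Non-optional AuditEvent fields plus the timestamp that auditLog adds
const REQUIRED_FIELDS = ["event", "resource", "action", "result", "timestamp"] as const;

// Assumption: extra fields a successful financial event must carry.
// The issue names these events but does not define per-event requirements.
const SUCCESS_FIELDS_BY_EVENT: Record<string, readonly string[]> = {
  reward_claimed: ["userId", "txHash", "amount"],
};

// Returns the missing field names for one Pino JSON log line,
// or null when the line is not an audit record (no `audit: true` marker).
export function missingAuditFields(logLine: string): string[] | null {
  const record = JSON.parse(logLine) as Record<string, unknown>;
  if (record.audit !== true) return null;

  const isMissing = (field: string) => record[field] === undefined || record[field] === "";
  const missing: string[] = REQUIRED_FIELDS.filter(isMissing);

  const event = typeof record.event === "string" ? record.event : "";
  if (record.result === "success") {
    missing.push(...(SUCCESS_FIELDS_BY_EVENT[event] ?? []).filter(isMissing));
  }
  return missing;
}
```

A test would then assert that every non-null result is an empty array.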