feat: integrate UptimeRobot and Sentry for production observability #127

@GitAddRemote

Tech Story

As a solo engineer running Station in production, I want external uptime monitoring, in-app error tracking, and a public status page so that I know about outages and errors before my users do and can share system status transparently without building anything custom.

ELI5 Context

What is the difference between UptimeRobot and Sentry?
UptimeRobot is external — it's a server somewhere else on the internet that sends an HTTP request to your API every 5 minutes. If your server is down, on fire, or unreachable, UptimeRobot still knows because it's checking from the outside. Sentry is internal — it runs inside your NestJS process and catches errors as they happen. UptimeRobot tells you "the server is down." Sentry tells you "this specific function threw this specific error with this stack trace at 2:14 AM."

What is @nestjs/terminus?
Terminus is NestJS's official health-check library. It supplies the building blocks (HealthCheckService and ready-made health indicators) for a GET /health endpoint that checks whether the app's dependencies, such as the database or Redis, are actually responding, not just that the NestJS process is alive. If the database is down, /health returns HTTP 503 instead of 200, which tells UptimeRobot to fire an alert even though the NestJS process itself is running fine.

What is a Sentry DSN?
A DSN (Data Source Name) is a URL that tells the Sentry SDK where to send errors. It looks like https://abc123@o123456.ingest.sentry.io/789. It's not a password, but it should still be kept in environment secrets to prevent spam from third parties who might find it and flood your Sentry quota.
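
To make the structure of a DSN concrete, here is a minimal sketch that splits the example DSN above into its parts using the standard URL class. The helper name parseDsn and the DsnParts type are hypothetical and exist only for illustration; the Sentry SDK does its own parsing and this is not its API.

```typescript
// Hypothetical helper for illustration; not part of the Sentry SDK.
interface DsnParts {
  publicKey: string; // the part before "@"
  host: string;      // the ingest host
  projectId: string; // the path segment after the host
}

function parseDsn(dsn: string): DsnParts {
  const url = new URL(dsn);
  const projectId = url.pathname.replace(/^\/+/, '');
  if (!url.username || !projectId) {
    throw new Error('DSN must contain a public key and a project ID');
  }
  return { publicKey: url.username, host: url.host, projectId };
}

const parts = parseDsn('https://abc123@o123456.ingest.sentry.io/789');
console.log(parts);
// { publicKey: 'abc123', host: 'o123456.ingest.sentry.io', projectId: '789' }
```

The DSN carries only a public key, which is why it is not a password; it still identifies the project, which is why it belongs in environment secrets.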

What is a public status page?
A hosted webpage (e.g. status.drdnt.org) showing real-time and historical uptime for your services. UptimeRobot generates one automatically from your monitors — no code needed. It's what you link to when users ask "is the site down?"

Technical Elaboration

Part 1: NestJS Health Endpoint

Install:

cd backend
pnpm add @nestjs/terminus

New file: backend/src/health/health.module.ts

import { Module } from '@nestjs/common';
import { TerminusModule } from '@nestjs/terminus';
import { HealthController } from './health.controller';

@Module({
  imports: [TerminusModule],
  controllers: [HealthController],
})
export class HealthModule {}

New file: backend/src/health/health.controller.ts

import { Controller, Get } from '@nestjs/common';
import { HealthCheck, HealthCheckService, TypeOrmHealthIndicator } from '@nestjs/terminus';
import { InjectDataSource } from '@nestjs/typeorm';
import { DataSource } from 'typeorm';

@Controller('health')
export class HealthController {
  constructor(
    private health: HealthCheckService,
    private db: TypeOrmHealthIndicator,
    @InjectDataSource() private dataSource: DataSource,
  ) {}

  @Get()
  @HealthCheck()
  check() {
    return this.health.check([
      () => this.db.pingCheck('database', { connection: this.dataSource }),
    ]);
  }
}

Behaviour:

  • Returns HTTP 200 with { status: 'ok', ... } when all checks pass
  • Returns HTTP 503 with { status: 'error', ... } when any check fails
  • UptimeRobot treats a 5xx response as "down", so the 503 triggers an alert automatically
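
The behaviour above can be sketched from the consumer's side, for example in a smoke test or when reading the response by hand. The HealthBody type below mirrors the JSON shape Terminus returns (status, info, error, details) but is declared locally so the sketch stays self-contained. The helpers isHealthy and failingIndicators are hypothetical names, not Terminus APIs.

```typescript
// Local types mirroring the Terminus JSON body; declared here so the sketch is self-contained.
type IndicatorMap = Record<string, { status: 'up' | 'down'; message?: string }>;

interface HealthBody {
  status: 'ok' | 'error' | 'shutting_down';
  info?: IndicatorMap;
  error?: IndicatorMap;
  details: IndicatorMap;
}

// Hypothetical helper: names of the indicators that report "down".
function failingIndicators(body: HealthBody): string[] {
  return Object.entries(body.details)
    .filter(([, result]) => result.status === 'down')
    .map(([name]) => name);
}

// Hypothetical helper: healthy only when the response is 2xx and the body says "ok".
function isHealthy(httpStatus: number, body: HealthBody): boolean {
  return httpStatus >= 200 && httpStatus < 300 && body.status === 'ok';
}

// Example body for an unreachable database (the message text is made up).
const degraded: HealthBody = {
  status: 'error',
  info: {},
  error: { database: { status: 'down', message: 'timeout' } },
  details: { database: { status: 'down', message: 'timeout' } },
};

console.log(isHealthy(503, degraded));    // false
console.log(failingIndicators(degraded)); // [ 'database' ]
```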

Register in AppModule:

import { HealthModule } from './health/health.module';
// Add to imports array:
HealthModule,

Note: If GET /health is already implemented from issue #102 (a simpler version), replace it with this Terminus-based version. The Terminus version adds database health checking.

Part 2: Sentry Integration

Install:

cd backend
pnpm add @sentry/nestjs @sentry/node

New file: backend/src/instrument.ts (must be imported before anything else)

import * as Sentry from '@sentry/nestjs';

Sentry.init({
  dsn: process.env['SENTRY_DSN'],          // undefined = Sentry disabled (no error thrown)
  environment: process.env['NODE_ENV'],
  release: process.env['SENTRY_RELEASE'],  // set by deploy workflow
  enabled: !!process.env['SENTRY_DSN'],    // explicit opt-in
  tracesSampleRate: 0.1,                   // capture 10% of transactions for performance monitoring
});
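
One way to make the "DSN absent means Sentry disabled" rule easy to unit test is to build the options in a pure function and pass the result to Sentry.init. This is a sketch under assumptions: buildSentryOptions, SentryInitOptions, and Env are hypothetical names, and the interface covers only the options used above rather than the SDK's full option type.

```typescript
// Hypothetical shape covering only the options used in instrument.ts.
interface SentryInitOptions {
  dsn: string | undefined;
  environment: string | undefined;
  release: string | undefined;
  enabled: boolean;
  tracesSampleRate: number;
}

type Env = Record<string, string | undefined>;

// Hypothetical helper: builds the options from an explicit env map
// instead of reading process.env directly, so tests can pass any values.
function buildSentryOptions(env: Env): SentryInitOptions {
  const dsn = env['SENTRY_DSN'] || undefined; // treat an empty string as absent
  return {
    dsn,
    environment: env['NODE_ENV'],
    release: env['SENTRY_RELEASE'],
    enabled: Boolean(dsn), // explicit opt-in, same rule as above
    tracesSampleRate: 0.1,
  };
}

console.log(buildSentryOptions({ NODE_ENV: 'test' }).enabled); // false
```

In instrument.ts this would be used as Sentry.init(buildSentryOptions(process.env)), which keeps the behaviour identical while letting pnpm test cover the disabled case without a real DSN.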

Update backend/src/main.ts:

import './instrument';   // MUST be first import
import { NestFactory } from '@nestjs/core';
// ... rest of imports

New file: backend/src/filters/sentry-exception.filter.ts

import { Catch, ArgumentsHost, HttpException, HttpStatus } from '@nestjs/common';
import { BaseExceptionFilter } from '@nestjs/core';
import * as Sentry from '@sentry/nestjs';

@Catch()
export class SentryExceptionFilter extends BaseExceptionFilter {
  catch(exception: unknown, host: ArgumentsHost) {
    // Don't report 4xx errors to Sentry — those are user errors, not app bugs
    const status = exception instanceof HttpException
      ? exception.getStatus()
      : HttpStatus.INTERNAL_SERVER_ERROR;

    if (status >= 500) {
      Sentry.captureException(exception);
    }

    super.catch(exception, host);
  }
}
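
The 4xx/5xx decision in the filter can also be pulled into a small pure function so it is testable without booting Nest. The sketch below is an assumption-laden variant, not the filter itself: it swaps the instanceof HttpException check for a duck-typed getStatus check so it runs without @nestjs/common, and resolveStatus and shouldReport are hypothetical names.

```typescript
// Hypothetical helpers; duck-typed so the sketch runs without @nestjs/common.
interface HasStatus {
  getStatus(): number;
}

function hasStatus(exception: unknown): exception is HasStatus {
  return (
    typeof exception === 'object' &&
    exception !== null &&
    typeof (exception as { getStatus?: unknown }).getStatus === 'function'
  );
}

// Anything without a status is treated as a 500, matching the filter's fallback.
function resolveStatus(exception: unknown): number {
  return hasStatus(exception) ? exception.getStatus() : 500;
}

// Only server-side failures (5xx) are worth reporting.
function shouldReport(exception: unknown): boolean {
  return resolveStatus(exception) >= 500;
}

const notFound = { getStatus: () => 404 }; // stand-in for an HttpException
console.log(shouldReport(notFound));          // false
console.log(shouldReport(new Error('boom'))); // true
```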

Register in main.ts:

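import { HttpAdapterHost } from '@nestjs/core';   // static import; can share the NestFactory import line
import { SentryExceptionFilter } from './filters/sentry-exception.filter';

// Inside bootstrap(), after the app has been created:
const { httpAdapter } = app.get(HttpAdapterHost);
app.useGlobalFilters(new SentryExceptionFilter(httpAdapter));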

env.validation.ts additions:

SENTRY_DSN: Joi.string().uri().optional(),
SENTRY_RELEASE: Joi.string().optional(),

Deploy workflow additions (in release.yml, deploy-production job):

env:
  SENTRY_RELEASE: ${{ github.sha }}

Part 3: UptimeRobot Configuration (manual — no code)

  1. Create account at uptimerobot.com
  2. Add monitor: HTTP(S) → URL: https://api.drdnt.org/health → interval: 5 minutes → alert when status ≠ 200
  3. Add monitor: HTTP(S) → URL: https://station.drdnt.org → interval: 5 minutes
  4. Configure alert contacts: email (required), Discord webhook (optional — paste webhook URL from Discord server settings)
  5. Create status page: include both monitors → set custom domain status.drdnt.org

Part 4: DNS for status.drdnt.org (in Terraform, issue #106)

Add to infra/terraform/dns.tf:

resource "linode_domain_record" "status_cname" {
  domain_id   = linode_domain.drdnt_org.id
  name        = "status"
  record_type = "CNAME"
  target      = "stats.uptimerobot.com"   # UptimeRobot's CNAME target
  ttl_sec     = 300
}

New file: infra/docs/monitoring.md

Document:

  • UptimeRobot: how to log in, add monitors, configure alert contacts, and access the status page
  • Status page URL and how to find it: https://status.drdnt.org
  • Sentry: how to log in, how to triage an error, how to mark it as resolved
  • Health endpoint: what it checks and how to read the response
  • What to do when an alert fires (decision tree: check UptimeRobot → check Sentry → SSH into the server and run docker compose ps)

Definition of Done

  • @nestjs/terminus installed; HealthModule and HealthController created and registered in AppModule
  • GET /health returns 200 with database check passing; returns 503 when database is unreachable (test by stopping the postgres container temporarily)
  • @sentry/nestjs installed and initialized in instrument.ts; imported as first line in main.ts
  • SentryExceptionFilter registered globally; 5xx errors captured in Sentry, 4xx errors not captured
  • SENTRY_DSN optional in env.validation.ts; app starts without error when SENTRY_DSN is absent
  • SENTRY_DSN added to GitHub environment secrets for both staging and production environments
  • Two UptimeRobot monitors active: API health endpoint and frontend
  • Alert contact configured — test alert received (UptimeRobot has a "Test Alert" button)
  • Public status page live at https://status.drdnt.org
  • status.drdnt.org CNAME record added to Terraform DNS config
  • infra/docs/monitoring.md written
  • pnpm test passes (Sentry init must not break unit tests — SENTRY_DSN absent in test env)

Dependencies
