Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -7,6 +7,7 @@ Subscribe via RSS: <https://www.freightutils.com/changelog.xml>

- **Performance**: `/hs/code/*` and `/hs/heading/*` are now served from the Vercel edge cache with `Cache-Control: public, max-age=300, s-maxage=86400, stale-while-revalidate=604800`. The cold-serve response is unchanged; warm-cache responses are served from the edge without re-rendering the page. ISR (`export const revalidate = 86400`) was already in place on both routes; this change makes the cache strategy explicit and tunable in `next.config.ts`, and surfaces the `Cache-Control` header in the response so cache hits are observable via `curl -I`. Prompted by the 2026-05-14 scraper-signature audit (`docs/audit/scraper-signature-2026-05-14.md`), which confirmed an active 216.* scraper hitting these paths at sustained ~5-second intervals. The application-layer ScrapeGuard rate limiter still runs on every request and continues to 429 the scraper as designed; Phase 1.6 will measure whether edge-cache adoption reduces overall Redis-INCR volume enough to skip Phase 2 (static generation).
- **Internal**: ScrapeGuard now logs sanitised User-Agent + full source IP on block decisions (429s only — never on the success / cache-hit path). Supports evidence-based firewall rule additions; preserves existing structured 429 body and headers. UA sanitisation strips control characters (log-injection guard), replaces internal quotes with apostrophes, truncates to 200 chars, and falls back to `ua=empty` for null/whitespace UAs. Log line format converted to space-separated `key=value` pairs for grep/awk parsing. IP resolution unchanged (`x-real-ip` first, per existing Vercel-trust comment).
- **Reliability**: ScrapeGuard middleware — Redis errors now log at most once per minute per error class instead of per request, preventing log-storm during Redis quota exhaustion or connection issues. Fail-open behaviour preserved. (Incident: 2026-05-14 Upstash quota hit.)
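The User-Agent sanitisation steps listed in the **Internal** entry can be sketched roughly as follows. This is a minimal illustration, not the ScrapeGuard source (which this diff does not show): the names `sanitiseUserAgent` and `formatBlockLog`, the `ua="..."` quoting, and the other log keys are assumptions made for the example.

```typescript
// Hypothetical helper names; the real ScrapeGuard code is not part of this diff.
const MAX_UA_LENGTH = 200;

function sanitiseUserAgent(ua: string | null | undefined): string {
  if (ua == null) return 'ua=empty';
  const cleaned = ua
    // Strip ASCII control characters (C0 range and DEL) so a crafted UA
    // cannot inject newlines or escape sequences into the log stream.
    .replace(/[\u0000-\u001f\u007f]/g, '')
    // Replace double quotes with apostrophes so the value can be quoted safely.
    .replace(/"/g, "'")
    .trim()
    // Truncate to a fixed maximum length.
    .slice(0, MAX_UA_LENGTH);
  // Null, empty, and whitespace-only UAs all collapse to the same marker.
  return cleaned === '' ? 'ua=empty' : `ua="${cleaned}"`;
}

// Assumed log shape: space-separated key=value pairs, emitted on 429s only.
function formatBlockLog(ip: string, ua: string | null): string {
  return `[ScrapeGuard] action=block status=429 ip=${ip} ${sanitiseUserAgent(ua)}`;
}
```

Keeping the output as space-separated `key=value` pairs means a line can be filtered with plain `grep` or split with `awk` without a JSON parser.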

## 2026-05-13

5 changes: 5 additions & 0 deletions lib/changelog-data.ts
@@ -21,6 +21,11 @@ export const entries: ChangelogEntry[] = [
desc: 'Scraper-target page routes now served from Vercel edge cache with explicit Cache-Control (s-maxage=86400, SWR=7d). Warm-cache hits skip page render — cold-serve response unchanged. ISR was already enabled on both routes; this surfaces the cache strategy in next.config.ts so it is tunable and visible via curl -I. The middleware-layer ScrapeGuard rate limiter continues to 429 the active 216.* scraper as designed.',
link: '/hs',
},
{
isoDate: '2026-05-14', date: 'May 14', tag: 'Bug Fix',
title: 'ScrapeGuard: rate-limit Redis error logs to prevent log-storm',
desc: 'Redis errors in the ScrapeGuard middleware now log at most once per minute per error class instead of per request, preventing the Sentry-event flood we saw during Upstash quota exhaustion on 2026-05-14 (~1K events in 30 min). Fail-open behaviour preserved — Redis failures still allow the request through.',
},
// ─── May 2026 backfill (FAULT 5 catch-up, sprint chore/release-hygiene 2026-05-14) ───
{
isoDate: '2026-05-13', date: 'May 13', tag: 'Data Update',
93 changes: 93 additions & 0 deletions lib/scrapeguard-log-sampler.ts
@@ -0,0 +1,93 @@
/**
 * Module-level error sampler for ScrapeGuard Redis / KV failures.
 *
 * Incident 2026-05-14: Upstash hit its free-tier 500K-command monthly
 * cap. Every ScrapeGuard call thereafter threw, was caught (fail-open),
 * and logged a full `[ScrapeGuard] Redis error:` line — ~1K error
 * events in 30 minutes before the plan was upgraded. The middleware
 * already behaves correctly under @upstash/ratelimit's fail-open
 * default; only the log volume was the problem.
 *
 * This sampler limits console output to one line per error class per
 * 60s while keeping a low-cost Sentry breadcrumb on every error (so
 * we retain volume-level visibility without burning event quota). The
 * sampler is module-scoped: Map state lives for the lifetime of the
 * edge isolate, which is long enough to coalesce a quota-storm.
 */
type SentryLike = {
  addBreadcrumb?: (b: { category: string; level: string; message: string }) => void;
};

const lastLogged = new Map<string, number>();

const KNOWN_CLASSES = [
  'max_requests_limit',
  'ECONNRESET',
  'ETIMEDOUT',
  'ECONNREFUSED',
  'ENOTFOUND',
  'EAI_AGAIN',
  'timeout',
  'unauthorized',
] as const;

/**
 * Bucket the error to a small allow-list of stable classes so the
 * sampler key set stays bounded even if the underlying message
 * embeds variable detail (ids, timestamps).
 */
export function classifyRedisError(err: unknown): string {
  const raw = err instanceof Error ? err.message : String(err ?? '');
  if (!raw) return 'unknown';
  const lower = raw.toLowerCase();
  for (const cls of KNOWN_CLASSES) {
    if (lower.includes(cls.toLowerCase())) return cls;
  }
  const colonIdx = raw.indexOf(':');
  const prefix = (colonIdx > 0 ? raw.slice(0, colonIdx) : raw).trim();
  if (!prefix) return 'unknown';
  return prefix.length > 40 ? prefix.slice(0, 40) : prefix;
}

function getSentry(): SentryLike | null {
  try {
    const g = globalThis as unknown as { Sentry?: SentryLike };
    if (g.Sentry && typeof g.Sentry.addBreadcrumb === 'function') return g.Sentry;
  } catch {
    // ignore
  }
  return null;
}

/**
 * Setter used by middleware to inject the Sentry SDK without forcing
 * a static import path in this file (keeps the helper trivially
 * test-importable in node-strip-types mode).
 */
export function setSentry(sentry: SentryLike | null): void {
  (globalThis as unknown as { Sentry?: SentryLike | null }).Sentry = sentry ?? undefined;
}

export function logRedisErrorSampled(err: unknown, now: number = Date.now()): void {
  const errorClass = classifyRedisError(err);

  // Breadcrumb on every error: cheap, and keeps volume-level visibility.
  const sentry = getSentry();
  sentry?.addBreadcrumb?.({
    category: 'scrapeguard.redis',
    level: 'warning',
    message: `Redis degraded: ${errorClass}`,
  });

  // Compare against `undefined` rather than defaulting to 0, so an injected
  // `now` below 60_000 (as in tests) still logs the first occurrence.
  const last = lastLogged.get(errorClass);
  if (last !== undefined && now - last < 60_000) return;
  lastLogged.set(errorClass, now);
  const msg = err instanceof Error ? err.message : String(err);
  console.warn(
    `[ScrapeGuard] Redis degraded (${errorClass}): ${msg}. Suppressing further logs of this class for 60s.`,
  );
}

/** Test-only: reset module state between assertions. */
export function __resetSamplerStateForTests(): void {
  lastLogged.clear();
}
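
The one-log-per-class-per-window behaviour of this file can be seen in isolation with a small stand-alone sketch. It is not the module itself: `createSampler`, `sink`, and `windowMs` are hypothetical names, and the clock and output are injected so the suppression can be observed without `console` or Sentry.

```typescript
type Sink = (line: string) => void;

// Hypothetical factory: returns a logger that emits at most one line per
// error class per `windowMs`, mirroring the sampling idea used above.
function createSampler(windowMs: number, sink: Sink) {
  const lastLogged = new Map<string, number>();
  return function logSampled(errorClass: string, message: string, now: number): boolean {
    const last = lastLogged.get(errorClass);
    if (last !== undefined && now - last < windowMs) return false; // suppressed
    lastLogged.set(errorClass, now);
    sink(`Redis degraded (${errorClass}): ${message}`);
    return true;
  };
}

// Simulate a quota storm: one error per second for 90 seconds,
// plus a single unrelated timeout in the middle.
const lines: string[] = [];
const logSampled = createSampler(60_000, (line) => lines.push(line));
for (let t = 0; t < 90; t++) {
  logSampled('max_requests_limit', 'quota exceeded', t * 1000);
}
logSampled('ETIMEDOUT', 'socket timed out', 30_000);

// 90 quota errors collapse to 2 lines (t = 0 s and t = 60 s); the timeout
// has its own class and is logged once, so `lines.length` is 3.
```

Because suppression is keyed by error class, a storm of one class does not hide the first occurrence of a different one.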
21 changes: 17 additions & 4 deletions middleware.ts
@@ -30,6 +30,13 @@ import { NextRequest, NextResponse } from 'next/server';
import { kv } from '@vercel/kv';
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
import * as Sentry from '@sentry/nextjs';
import { logRedisErrorSampled, setSentry } from '@/lib/scrapeguard-log-sampler';

// Hand the SDK to the sampler so it can attach a low-cost breadcrumb on
// every Redis error. Breadcrumbs don't burn Sentry event quota — they ride
// along with the next captured exception (if any) and rotate out otherwise.
setSentry(Sentry as unknown as { addBreadcrumb: (b: { category: string; level: string; message: string }) => void });

// ─────────────────────────────────────────────────────────────────
// Scrape protection — two tiers, both sliding window
@@ -332,7 +339,9 @@ async function tryBulkRefScrape(req: NextRequest): Promise<NextResponse | null>
     void remaining;
     return null;
   } catch (err) {
-    console.error('[ScrapeGuard bulkref] Redis error:', err instanceof Error ? err.message : err);
+    // Fail open — same as @upstash/ratelimit's default. Log via the
+    // sampler so a Redis quota/connection storm can't flood Sentry.
+    logRedisErrorSampled(err);
     return null;
   }
 }
@@ -374,7 +383,8 @@ async function handleScrapeProtection(req: NextRequest): Promise<NextResponse> {
     response.headers.set('X-RateLimit-Reset', String(Math.ceil(reset / 1000)));
     return response;
   } catch (err) {
-    console.error('[ScrapeGuard] Redis error:', err instanceof Error ? err.message : err);
+    // Fail open. See note in tryBulkRefScrape for sampler rationale.
+    logRedisErrorSampled(err);
     return NextResponse.next();
   }
 }
@@ -476,8 +486,11 @@ async function handleApiRateLimit(req: NextRequest): Promise<NextResponse> {
       await kv.expire(usageKey, window === 'day' ? 172800 : 2764800); // 48h or 32d
     }
   } catch (err) {
-    console.error('[RateLimit] KV error:', err instanceof Error ? err.message : err);
-    // KV failed — allow request but log the failure
+    // KV uses the same Upstash backend as ScrapeGuard, so a quota /
+    // connection storm hits both. Route through the same sampler to
+    // keep failure logs at one-per-minute-per-class. Fail-open: allow
+    // the request through.
+    logRedisErrorSampled(err);
     return NextResponse.next();
   }
