RFC: Rate Limiting & Brute Force Protection #501

@lakhansamani

Phase: 1 — Security Hardening & Enterprise Foundation
Priority: P0 — Critical
Estimated Effort: Medium


Problem Statement

Authorizer currently has zero protection against credential stuffing, brute force attacks, or API abuse. Every competitor (WorkOS Radar, Clerk Bot Protection, Keycloak Brute Force Detector) ships rate limiting as a core feature. Without this, Authorizer cannot be recommended for production enterprise use.

The current middleware chain (LoggerMiddleware → ContextMiddleware → CORSMiddleware → ClientCheckMiddleware) has no rate limiting layer.


Current Architecture Context

  • HTTP framework: Gin (gin-gonic/gin)
  • Middleware chain defined in internal/server/http_routes.go
  • Memory store layer exists with Redis and DB-backed implementations (internal/memory_store/)
  • Session tokens already use key patterns like {userId}:{token_type}_{nonce} in the memory store
  • Config parsed via Cobra CLI flags in cmd/root.go
  • No rate limiting library currently in go.mod

Proposed Solution

1. Rate Limiter Middleware

Algorithm: Token bucket via golang.org/x/time/rate for in-memory, sliding window counter for Redis-backed.

Why token bucket + sliding window hybrid: Token bucket is simple and efficient for single-instance deployments. For distributed deployments (multiple Authorizer instances behind a load balancer), we need Redis-backed sliding window counters that are atomic across instances. The memory store abstraction already supports this pattern.
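
As a rough illustration of the single-instance path, here is a minimal token bucket keyed by client IP. It is a hand-rolled, stdlib-only stand-in for golang.org/x/time/rate, not that library's API; the names TokenBucket, NewTokenBucket and Allow are hypothetical, and time is passed in as Unix milliseconds so the behaviour is deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// bucket is the token state for one rate limit key (for example, one client IP).
type bucket struct {
	units  int64 // tokens scaled by windowMs, so refill math stays in integers
	lastMs int64 // time of the last refill
}

// TokenBucket is a hypothetical limiter: `requests` per `windowMs`, burst = requests.
type TokenBucket struct {
	mu       sync.Mutex
	requests int64
	windowMs int64
	buckets  map[string]*bucket
}

func NewTokenBucket(requests, windowMs int64) *TokenBucket {
	return &TokenBucket{requests: requests, windowMs: windowMs, buckets: make(map[string]*bucket)}
}

// Allow reports whether a request for key at nowMs may proceed.
func (t *TokenBucket) Allow(key string, nowMs int64) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	capacity := t.requests * t.windowMs
	b, ok := t.buckets[key]
	if !ok {
		b = &bucket{units: capacity, lastMs: nowMs}
		t.buckets[key] = b
	}
	// Refill in proportion to elapsed time, capped at capacity.
	if elapsed := nowMs - b.lastMs; elapsed > 0 {
		if elapsed > t.windowMs {
			elapsed = t.windowMs // a full window always refills the bucket
		}
		b.units += elapsed * t.requests
		if b.units > capacity {
			b.units = capacity
		}
		b.lastMs = nowMs
	}
	// One request costs windowMs units, i.e. exactly one token.
	if b.units < t.windowMs {
		return false
	}
	b.units -= t.windowMs
	return true
}

func main() {
	tb := NewTokenBucket(2, 1000) // 2 requests per second
	ip := "203.0.113.7"
	fmt.Println(tb.Allow(ip, 0), tb.Allow(ip, 0), tb.Allow(ip, 0)) // true true false
	fmt.Println(tb.Allow(ip, 500))                                  // true: one token refilled
}
```

This sketch never evicts idle keys; a real implementation would need to expire buckets to bound memory.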

New middleware: internal/http_handlers/rate_limit_middleware.go

type RateLimitConfig struct {
    // Per-IP limits
    RequestsPerWindow int           // default: 100
    WindowDuration    time.Duration // default: 60s
    
    // Auth-specific limits (stricter)
    AuthRequestsPerWindow int           // default: 20
    AuthWindowDuration    time.Duration // default: 60s
    
    // Enabled flag
    Enabled bool
}

Implementation approach:

  • Add RateLimitMiddleware to the Gin middleware chain, placed after LoggerMiddleware and before CORSMiddleware
  • Use c.ClientIP() (Gin's built-in, respects X-Forwarded-For with trusted proxies) as the rate limit key
  • For authenticated endpoints, use {user_id}:{client_ip} composite key
  • Auth endpoints (/oauth/token, /graphql mutations login, signup, verify_otp, magic_link_login, forgot_password) get stricter limits
  • Return 429 Too Many Requests with Retry-After header and JSON error body
  • Store counters in memory store (Redis when available, in-memory with DB fallback)
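
The 429 response path described above could look roughly like this. The sketch uses net/http instead of Gin so that it runs without dependencies; allowFunc, RateLimit, probe and the JSON field names are all hypothetical, and the key is the raw socket address rather than the proxy-aware value c.ClientIP() would return.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// allowFunc is a hypothetical hook into the limiter: it reports whether key
// may proceed and, if not, how many seconds the client should wait.
type allowFunc func(key string) (ok bool, retryAfterSecs int)

// RateLimit rejects over-limit requests with 429 before calling next.
func RateLimit(allow allowFunc, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Use the socket address as the key; fall back to the raw value
		// if it has no port.
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr
		}
		ok, retry := allow(ip)
		if ok {
			next.ServeHTTP(w, r)
			return
		}
		w.Header().Set("Retry-After", strconv.Itoa(retry))
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusTooManyRequests)
		// Field names are illustrative, not Authorizer's actual error schema.
		json.NewEncoder(w).Encode(map[string]string{
			"error":             "rate_limit_exceeded",
			"error_description": "too many requests, retry after " + strconv.Itoa(retry) + "s",
		})
	})
}

// probe sends one request through the middleware and returns the status
// code and Retry-After header it produced.
func probe(allow allowFunc) (int, string) {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/oauth/token", nil)
	RateLimit(allow, ok).ServeHTTP(rec, req)
	return rec.Code, rec.Result().Header.Get("Retry-After")
}

func main() {
	code, retry := probe(func(string) (bool, int) { return false, 30 })
	fmt.Println(code, retry) // 429 30
}
```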

Redis sliding window implementation (atomic via Lua script):

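-- KEYS[1] = rate limit key
-- ARGV[1] = window size in ms, ARGV[2] = current timestamp ms, ARGV[3] = max requests
local window = tonumber(ARGV[1])
local now = tonumber(ARGV[2])
-- Drop entries that have slid out of the window
redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, now - window)
local count = redis.call('ZCARD', KEYS[1])
if count < tonumber(ARGV[3]) then
    -- The member must be unique per request. On Redis versions that seed
    -- math.random() identically for every script run, two requests in the same
    -- millisecond would collide; a caller-supplied unique ID would be safer.
    redis.call('ZADD', KEYS[1], now, ARGV[2] .. ':' .. math.random())
    redis.call('PEXPIRE', KEYS[1], window)
    return 0  -- allowed
end
return 1  -- blocked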
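
For the unit tests and for reasoning about edge cases, the same trim, count, record sequence can be mirrored in plain Go. This is only a sketch of the script's logic, not the proposed Redis client code; SlidingWindow and its methods are hypothetical names, and it returns a bool instead of the script's 0/1.

```go
package main

import (
	"fmt"
	"sync"
)

// SlidingWindow keeps, per key, the timestamps (ms) of accepted requests,
// playing the role of the sorted set in the Lua script.
type SlidingWindow struct {
	mu       sync.Mutex
	windowMs int64
	max      int
	hits     map[string][]int64
}

func NewSlidingWindow(windowMs int64, max int) *SlidingWindow {
	return &SlidingWindow{windowMs: windowMs, max: max, hits: make(map[string][]int64)}
}

// Allow mirrors the script: trim old entries, count, then record if under the limit.
func (s *SlidingWindow) Allow(key string, nowMs int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	cutoff := nowMs - s.windowMs
	kept := s.hits[key][:0]
	for _, ts := range s.hits[key] {
		if ts > cutoff { // ZREMRANGEBYSCORE removes scores <= cutoff
			kept = append(kept, ts)
		}
	}
	if len(kept) >= s.max {
		s.hits[key] = kept
		return false // blocked; rejected requests are not recorded
	}
	s.hits[key] = append(kept, nowMs)
	return true
}

func main() {
	sw := NewSlidingWindow(1000, 2) // 2 requests per second
	k := "203.0.113.7"
	fmt.Println(sw.Allow(k, 0), sw.Allow(k, 10), sw.Allow(k, 20)) // true true false
	fmt.Println(sw.Allow(k, 1000))                                // true: the hit at 0 has expired
}
```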

2. Login Attempts Table & Account Lockout (Sliding Window)

Instead of adding failed_login_count/locked_until columns to the User table, we use a dedicated LoginAttempt table that tracks every login attempt with full metadata. Lockout is determined by counting failures within a sliding time window — no explicit lock/unlock state needed.

Why this approach over per-user columns:

  • Sliding window is more accurate — 10 failures in 15 minutes is suspicious; 10 failures over 6 months isn't. A simple counter can't distinguish these.
  • Multi-dimensional detection — same IP hitting many accounts (credential stuffing) vs many IPs hitting one account (distributed brute force). Per-user columns can't do this.
  • Natural retry after lockout — no need to "reset" anything. Once old attempts fall outside the window, the user is automatically unlocked.
  • Full audit/forensics — every attempt is preserved with IP, user agent, method, and failure reason. Feeds directly into the Audit Log system (Phase 1.3).
  • Clean User schema — no security-state pollution on the User table.

New schema: internal/storage/schemas/login_attempt.go

type LoginAttempt struct {
    ID            string `json:"id" gorm:"primaryKey;type:char(36)"`
    UserID        string `json:"user_id" gorm:"type:char(36);index:idx_login_attempt_user_time"`         // empty for non-existent users; those are tracked by email
    Email         string `json:"email" gorm:"type:varchar(256);index:idx_login_attempt_email_time"`       // always populated
    IPAddress     string `json:"ip_address" gorm:"type:varchar(45);index:idx_login_attempt_ip_time"`      // supports IPv6
    UserAgent     string `json:"user_agent" gorm:"type:text"`
    Method        string `json:"method" gorm:"type:varchar(50)"`                                          // password, otp, magic_link, totp, social
    Success       bool   `json:"success" gorm:"type:bool;default:false"`
    FailureReason string `json:"failure_reason" gorm:"type:varchar(100)"`                                 // invalid_password, account_not_found, mfa_failed, account_locked, etc.
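    CreatedAt     int64  `json:"created_at" gorm:"autoCreateTime;index:idx_login_attempt_user_time;index:idx_login_attempt_email_time;index:idx_login_attempt_ip_time"` // second column of each composite index above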
}

Composite indexes for query performance:

  • (user_id, created_at) — per-user lockout checks
  • (email, created_at) — lockout checks when user_id is unknown
  • (ip_address, created_at) — per-IP credential stuffing detection

New storage interface methods:

// AddLoginAttempt records a login attempt (success or failure)
AddLoginAttempt(ctx context.Context, attempt *schemas.LoginAttempt) error
// CountFailedAttempts counts failed login attempts for a user/email within a time window
CountFailedAttempts(ctx context.Context, email string, since int64) (int64, error)
// CountFailedAttemptsByIP counts failed attempts from an IP within a time window
CountFailedAttemptsByIP(ctx context.Context, ip string, since int64) (int64, error)
// ListLoginAttempts returns login attempts for a user (for admin/audit views)
ListLoginAttempts(ctx context.Context, userID string, pagination *model.Pagination) ([]*schemas.LoginAttempt, *model.Pagination, error)
// DeleteLoginAttemptsBefore removes attempts older than a timestamp (retention cleanup)
DeleteLoginAttemptsBefore(ctx context.Context, before int64) error

Lockout logic in login flow (internal/graphql/login.go and related auth handlers):

// Before password/OTP verification:
now := time.Now()
windowStart := now.Add(-lockoutWindow).Unix()
failedCount, err := store.CountFailedAttempts(ctx, email, windowStart)
if err != nil {
    return err // policy decision: fail closed here, or log and continue
}

if failedCount >= lockoutThreshold {
    // The lock lifts when the oldest failure in the window ages out, so
    // retry-after (seconds) = oldest failure time + window - now.
    // oldestAttemptInWindow (Unix seconds) needs one extra storage lookup.
    retryAfter := oldestAttemptInWindow + int64(lockoutWindow.Seconds()) - now.Unix()
    return Error("account_temporarily_locked", retryAfter)
}

// After verification:
attempt := &schemas.LoginAttempt{
    UserID:    user.ID,  // empty if user not found
    Email:     email,
    IPAddress: clientIP,
    UserAgent: userAgent,
    Method:    "password",
    Success:   passwordValid,
    FailureReason: failureReason, // "" on success
}
store.AddLoginAttempt(ctx, attempt)

How "retry after lock open" works:

  • Lockout window = 15 minutes (configurable)
  • Threshold = 10 attempts (configurable)
  • If a user has 10 failures in the last 15 minutes → locked
  • As time passes, old failures slide out of the window → naturally unlocked
  • No state to reset — the window handles everything
  • Example: 10 failures between 10:00–10:05, window=15min → locked until 10:15 (when the first failure at 10:00 falls outside the window). At 10:15 only 9 failures remain in window → unlocked.
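
The example above can be checked with a small pure function. This is a sketch under stated assumptions: lockStatus is a hypothetical helper, timestamps are Unix seconds, and a failure counts as inside the window only while it is strictly newer than now minus the window (which matches "unlocked at 10:15" in the example).

```go
package main

import "fmt"

// lockStatus reports whether the account is locked at `now` and, if so, how
// many seconds remain. failures holds timestamps of failed attempts in
// ascending order.
func lockStatus(failures []int64, now, windowSecs int64, threshold int) (locked bool, retryAfter int64) {
	windowStart := now - windowSecs
	// Skip failures that have already slid out of the window.
	i := 0
	for i < len(failures) && failures[i] <= windowStart {
		i++
	}
	inWindow := failures[i:]
	if len(inWindow) < threshold {
		return false, 0
	}
	// The count drops below the threshold once this failure leaves the
	// window. With exactly `threshold` failures it is the oldest one.
	pivot := inWindow[len(inWindow)-threshold]
	return true, pivot + windowSecs - now
}

func main() {
	// Seconds relative to 10:00: ten failures between 10:00:00 and 10:04:30.
	failures := []int64{0, 30, 60, 90, 120, 150, 180, 210, 240, 270}
	fmt.Println(lockStatus(failures, 300, 900, 10)) // at 10:05: true 600 (locked until 10:15)
	fmt.Println(lockStatus(failures, 900, 900, 10)) // at 10:15: false 0
}
```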

Admin override: _unlock_user(user_id: ID!): Response
Deletes recent failed attempts for the user, immediately bringing count below threshold. Used for emergency unlocks without waiting for the window to expire.

Credential stuffing detection (per-IP):

ipFailedCount, _ := store.CountFailedAttemptsByIP(ctx, clientIP, windowStart)
if ipFailedCount >= ipThreshold {  // e.g., 50 failed attempts from same IP
    // Block IP temporarily, or trigger CAPTCHA challenge
}
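
A raw per-IP failure count cannot tell one user mistyping a password from one IP probing many accounts. A possible refinement, sketched here, also counts the distinct emails targeted. failedAttempt, summarizeIP, shouldBlock and the minAccounts parameter are hypothetical and not part of the proposed storage interface.

```go
package main

import (
	"fmt"
	"strings"
)

// failedAttempt is the subset of LoginAttempt this sketch needs.
type failedAttempt struct {
	IPAddress string
	Email     string
}

// ipSignal summarises recent failures from one IP.
type ipSignal struct {
	Failures int // failed attempts from the IP in the window
	Accounts int // distinct emails those failures targeted
}

// summarizeIP scans failures already filtered to the lockout window.
func summarizeIP(attempts []failedAttempt, ip string) ipSignal {
	seen := make(map[string]struct{})
	var s ipSignal
	for _, a := range attempts {
		if a.IPAddress != ip {
			continue
		}
		s.Failures++
		seen[strings.ToLower(a.Email)] = struct{}{}
	}
	s.Accounts = len(seen)
	return s
}

// shouldBlock applies the per-IP threshold plus a minimum number of
// distinct accounts (an assumption of this sketch).
func shouldBlock(s ipSignal, ipThreshold, minAccounts int) bool {
	return s.Failures >= ipThreshold && s.Accounts >= minAccounts
}

func main() {
	attempts := []failedAttempt{
		{"198.51.100.4", "a@example.com"},
		{"198.51.100.4", "b@example.com"},
		{"198.51.100.4", "A@example.com"}, // same account as the first, different case
		{"203.0.113.7", "c@example.com"},
	}
	s := summarizeIP(attempts, "198.51.100.4")
	fmt.Println(s.Failures, s.Accounts, shouldBlock(s, 3, 2)) // 3 2 true
}
```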

3. IP Blocking/Allowlisting

New schema: internal/storage/schemas/ip_rule.go

type IPRule struct {
    ID        string `json:"id" gorm:"primaryKey;type:char(36)"`
    IP        string `json:"ip" gorm:"type:varchar(45);uniqueIndex"` // supports IPv6, CIDR notation
    Type      string `json:"type" gorm:"type:varchar(10)"`           // "block" or "allow"
    Reason    string `json:"reason" gorm:"type:text"`
    ExpiresAt int64  `json:"expires_at"`                             // 0 = permanent
    CreatedAt int64  `json:"created_at"`
}

New storage interface methods:

AddIPRule(ctx context.Context, rule *schemas.IPRule) (*schemas.IPRule, error)
DeleteIPRule(ctx context.Context, id string) error
ListIPRules(ctx context.Context, ruleType string, pagination *model.Pagination) ([]*schemas.IPRule, *model.Pagination, error)
GetIPRuleByIP(ctx context.Context, ip string) (*schemas.IPRule, error)

Middleware check: Early in request pipeline, check IP against cached allowlist/blocklist. Use memory store for caching (refresh every 60s from DB).
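
Because rules may be stored in CIDR notation, an exact string lookup is not enough for the middleware check. A sketch of evaluating the cached rule list with net/netip follows. The decide and matches helpers are hypothetical, and letting allow rules win over block rules is an assumption, since the RFC does not specify precedence.

```go
package main

import (
	"fmt"
	"net/netip"
)

// IPRule carries only the fields of the proposed schema that matching needs.
type IPRule struct {
	IP        string // single address or CIDR
	Type      string // "block" or "allow"
	ExpiresAt int64  // Unix seconds, 0 = permanent
}

// matches reports whether rule (address or CIDR) covers addr.
func matches(rule string, addr netip.Addr) bool {
	if p, err := netip.ParsePrefix(rule); err == nil {
		return p.Contains(addr)
	}
	a, err := netip.ParseAddr(rule)
	return err == nil && a.Unmap() == addr
}

// decide returns "allow", "block", or "" when no active rule matches.
func decide(rules []IPRule, clientIP string, now int64) (string, error) {
	addr, err := netip.ParseAddr(clientIP)
	if err != nil {
		return "", err
	}
	addr = addr.Unmap() // treat ::ffff:a.b.c.d as plain IPv4
	verdict := ""
	for _, r := range rules {
		if r.ExpiresAt != 0 && r.ExpiresAt <= now {
			continue // expired rule
		}
		if !matches(r.IP, addr) {
			continue
		}
		if r.Type == "allow" {
			return "allow", nil // assumption: allowlist takes precedence
		}
		if r.Type == "block" {
			verdict = "block"
		}
	}
	return verdict, nil
}

func main() {
	rules := []IPRule{
		{IP: "10.0.0.0/8", Type: "block"},
		{IP: "10.1.2.3", Type: "allow"},
		{IP: "203.0.113.9", Type: "block", ExpiresAt: 100},
	}
	fmt.Println(decide(rules, "10.9.9.9", 50)) // block <nil>
	fmt.Println(decide(rules, "10.1.2.3", 50)) // allow <nil>
}
```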

Automatic IP blocking: When credential stuffing is detected (high failed attempts from single IP across multiple accounts), automatically create a temporary IP block rule.

Admin GraphQL operations:

  • _add_ip_rule(ip: String!, type: String!, reason: String, expires_at: Int64): IPRule
  • _remove_ip_rule(id: ID!): Response
  • _list_ip_rules(params: PaginatedInput, type: String): IPRules

4. Leaked Password Detection

Integration: Have I Been Pwned k-Anonymity API (https://api.pwnedpasswords.com/range/{SHA1_PREFIX})

How it works (privacy-preserving):

  1. SHA-1 hash the password
  2. Send only the first 5 hex characters of the hash to the HIBP API
  3. Compare remaining hash suffix against returned list
  4. No full password or hash ever leaves the server
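
Steps 1 to 3 can be exercised offline, without calling the API. In this sketch, splitHash and breachCount are hypothetical helper names, and the sample response body (including the count of 42) is made up for illustration; only the "SUFFIX:COUNT" line format comes from the range API.

```go
package main

import (
	"bufio"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// splitHash returns the 5-character prefix sent to the API and the
// 35-character suffix that stays on the server.
func splitHash(password string) (prefix, suffix string) {
	sum := sha1.Sum([]byte(password))
	h := strings.ToUpper(hex.EncodeToString(sum[:]))
	return h[:5], h[5:]
}

// breachCount scans a range response body ("SUFFIX:COUNT" per line) and
// returns the count for suffix, or 0 if it is absent or malformed.
func breachCount(body, suffix string) int {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		s, c, ok := strings.Cut(strings.TrimSpace(sc.Text()), ":")
		if !ok || !strings.EqualFold(s, suffix) {
			continue
		}
		n, err := strconv.Atoi(c)
		if err != nil {
			return 0
		}
		return n
	}
	return 0
}

func main() {
	prefix, suffix := splitHash("password")
	fmt.Println(prefix) // 5BAA6

	// Illustrative response body; real counts come from the API.
	body := "0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n" + suffix + ":42\r\n"
	fmt.Println(breachCount(body, suffix) > 0) // true
}
```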

Implementation: New utility internal/utils/password_check.go

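// Needs crypto/sha1, encoding/hex, io, net/http, strings.
func IsPasswordLeaked(password string) (bool, error) {
    hash := sha1.Sum([]byte(password))
    hexHash := strings.ToUpper(hex.EncodeToString(hash[:]))
    prefix, suffix := hexHash[:5], hexHash[5:]
    resp, err := http.Get("https://api.pwnedpasswords.com/range/" + prefix)
    if err != nil {
        return false, err
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return false, err
    }
    // Each line is "SUFFIX:COUNT". Production code should also check resp.StatusCode and set a client timeout.
    return strings.Contains(string(body), suffix+":"), nil
}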

Integration points: Called during signup and reset_password mutations when --check-leaked-passwords=true.

5. Retention & Cleanup

Login attempts grow over time — automatic cleanup is essential:

  • CLI flag: --login-attempt-retention-days=90
  • Background goroutine runs DeleteLoginAttemptsBefore() daily
  • Aligns with audit log retention (Phase 1.3) — same cleanup pattern
  • Expired IP rules also cleaned up in the same sweep
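
A possible shape for the cleanup loop is sketched below. attemptPruner, retentionCutoff, RunCleanup and memStore are hypothetical names; only DeleteLoginAttemptsBefore comes from the storage interface proposed above, and the sketch assumes it deletes rows with created_at strictly before the cutoff.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// attemptPruner is the slice of the storage interface this sketch needs.
type attemptPruner interface {
	DeleteLoginAttemptsBefore(ctx context.Context, before int64) error
}

// retentionCutoff returns the Unix timestamp before which attempts expire.
func retentionCutoff(nowUnix int64, retentionDays int) int64 {
	return nowUnix - int64(retentionDays)*24*60*60
}

// RunCleanup prunes once per interval until ctx is cancelled.
func RunCleanup(ctx context.Context, store attemptPruner, retentionDays int, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case now := <-ticker.C:
			cutoff := retentionCutoff(now.Unix(), retentionDays)
			if err := store.DeleteLoginAttemptsBefore(ctx, cutoff); err != nil {
				// Real code would use the project logger.
				fmt.Println("login attempt cleanup failed:", err)
			}
		}
	}
}

// memStore is an in-memory stand-in used only for this example.
type memStore struct{ createdAt []int64 }

func (m *memStore) DeleteLoginAttemptsBefore(_ context.Context, before int64) error {
	kept := m.createdAt[:0]
	for _, ts := range m.createdAt {
		if ts >= before {
			kept = append(kept, ts)
		}
	}
	m.createdAt = kept
	return nil
}

func main() {
	store := &memStore{createdAt: []int64{100, 200, 300}}
	_ = store.DeleteLoginAttemptsBefore(context.Background(), 200)
	fmt.Println(store.createdAt) // [200 300]
}
```

At startup the loop would be launched once, for example `go RunCleanup(ctx, store, 90, 24*time.Hour)`, with ctx cancelled on shutdown.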

CLI Configuration Flags

--enable-rate-limit=true                    # Enable/disable rate limiting
--rate-limit-requests=100                   # Requests per window (general)
--rate-limit-window=60s                     # Window duration (general)
--rate-limit-auth-requests=20               # Requests per window (auth endpoints)
--rate-limit-auth-window=60s                # Window duration (auth endpoints)
--account-lockout-threshold=10              # Failed attempts before lockout
--account-lockout-window=15m                # Sliding window duration
--account-lockout-ip-threshold=50           # Per-IP failed attempts before blocking
--check-leaked-passwords=false              # Enable HIBP password check
--login-attempt-retention-days=90           # Days to keep login attempt records
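
To show how these flags might map onto a config struct, here is a sketch using the standard library flag package as a stand-in for the Cobra flags in cmd/root.go. SecurityConfig and parseSecurityFlags are hypothetical names, and the validation at the end is an assumption rather than part of the proposal.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"time"
)

// SecurityConfig is a hypothetical holder for the flags listed above.
type SecurityConfig struct {
	RateLimitEnabled     bool
	Requests             int
	Window               time.Duration
	AuthRequests         int
	AuthWindow           time.Duration
	LockoutThreshold     int
	LockoutWindow        time.Duration
	LockoutIPThreshold   int
	CheckLeakedPasswords bool
	RetentionDays        int
}

// parseSecurityFlags parses args using the defaults from the flag list.
func parseSecurityFlags(args []string) (*SecurityConfig, error) {
	c := &SecurityConfig{}
	fs := flag.NewFlagSet("authorizer", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch
	fs.BoolVar(&c.RateLimitEnabled, "enable-rate-limit", true, "enable rate limiting")
	fs.IntVar(&c.Requests, "rate-limit-requests", 100, "requests per window (general)")
	fs.DurationVar(&c.Window, "rate-limit-window", 60*time.Second, "window duration (general)")
	fs.IntVar(&c.AuthRequests, "rate-limit-auth-requests", 20, "requests per window (auth endpoints)")
	fs.DurationVar(&c.AuthWindow, "rate-limit-auth-window", 60*time.Second, "window duration (auth endpoints)")
	fs.IntVar(&c.LockoutThreshold, "account-lockout-threshold", 10, "failed attempts before lockout")
	fs.DurationVar(&c.LockoutWindow, "account-lockout-window", 15*time.Minute, "sliding window duration")
	fs.IntVar(&c.LockoutIPThreshold, "account-lockout-ip-threshold", 50, "per-IP failed attempts before blocking")
	fs.BoolVar(&c.CheckLeakedPasswords, "check-leaked-passwords", false, "enable HIBP password check")
	fs.IntVar(&c.RetentionDays, "login-attempt-retention-days", 90, "days to keep login attempt records")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	// Assumed sanity check: zero or negative limits would block everything.
	if c.Requests <= 0 || c.AuthRequests <= 0 || c.LockoutThreshold <= 0 {
		return nil, fmt.Errorf("request limits and lockout threshold must be positive")
	}
	return c, nil
}

func main() {
	c, err := parseSecurityFlags([]string{"--account-lockout-window=30m", "--check-leaked-passwords=true"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.LockoutWindow, c.CheckLeakedPasswords, c.Requests) // 30m0s true 100
}
```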

Migration Strategy

  1. Create login_attempts table/collection across all 13+ DB providers with composite indexes
  2. Create ip_rules table/collection across all DB providers
  3. Add memory store methods for rate limit counters
  4. No changes to User schema
  5. Rate limiting defaults to enabled for new deployments, documented flag to disable

Testing Plan

  • Unit tests for token bucket and sliding window algorithms
  • Integration tests for sliding window lockout flow:
    • Fail N times → locked → wait for window to slide → retry succeeds
    • Fail N times → admin unlock (delete attempts) → retry succeeds immediately
  • Integration tests for IP blocking middleware
  • Integration tests for credential stuffing detection (per-IP threshold)
  • Load tests to verify rate limiting under concurrent requests
  • Test with Redis and in-memory memory store backends
  • Test HIBP API integration with known-leaked passwords
  • Test retention cleanup removes old records correctly
