RFC: Rate Limiting & Brute Force Protection
Phase: 1 — Security Hardening & Enterprise Foundation
Priority: P0 — Critical
Estimated Effort: Medium
Problem Statement
Authorizer currently has zero protection against credential stuffing, brute force attacks, or API abuse. Every competitor (WorkOS Radar, Clerk Bot Protection, Keycloak Brute Force Detector) ships rate limiting as a core feature. Without this, Authorizer cannot be recommended for production enterprise use.
The current middleware chain (LoggerMiddleware → ContextMiddleware → CORSMiddleware → ClientCheckMiddleware) has no rate limiting layer.
Current Architecture Context
- HTTP framework: Gin (`gin-gonic/gin`)
- Middleware chain defined in `internal/server/http_routes.go`
- Memory store layer exists with Redis and DB-backed implementations (`internal/memory_store/`)
- Session tokens already use key patterns like `{userId}:{token_type}_{nonce}` in the memory store
- Config parsed via Cobra CLI flags in `cmd/root.go`
- No rate limiting library currently in `go.mod`
Proposed Solution
1. Rate Limiter Middleware
Algorithm: Token bucket via golang.org/x/time/rate for in-memory, sliding window counter for Redis-backed.
Why token bucket + sliding window hybrid: Token bucket is simple and efficient for single-instance deployments. For distributed deployments (multiple Authorizer instances behind a load balancer), we need Redis-backed sliding window counters that are atomic across instances. The memory store abstraction already supports this pattern.
New middleware: `internal/http_handlers/rate_limit_middleware.go`

```go
type RateLimitConfig struct {
	// Per-IP limits
	RequestsPerWindow int           // default: 100
	WindowDuration    time.Duration // default: 60s

	// Auth-specific limits (stricter)
	AuthRequestsPerWindow int           // default: 20
	AuthWindowDuration    time.Duration // default: 60s

	// Enabled flag
	Enabled bool
}
```

Implementation approach:
- Add `RateLimitMiddleware` to the Gin middleware chain, placed after `LoggerMiddleware` and before `CORSMiddleware`
- Use `c.ClientIP()` (Gin's built-in, respects `X-Forwarded-For` with trusted proxies) as the rate limit key
- For authenticated endpoints, use a `{user_id}:{client_ip}` composite key
- Auth endpoints (`/oauth/token`; `/graphql` mutations `login`, `signup`, `verify_otp`, `magic_link_login`, `forgot_password`) get stricter limits
- Return `429 Too Many Requests` with a `Retry-After` header and a JSON error body
- Store counters in the memory store (Redis when available, in-memory with DB fallback)
Redis sliding window implementation (atomic via Lua script):
```lua
-- KEYS[1] = rate limit key
-- ARGV[1] = window size in ms, ARGV[2] = current timestamp ms, ARGV[3] = max requests
redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, ARGV[2] - ARGV[1])
local count = redis.call('ZCARD', KEYS[1])
if count < tonumber(ARGV[3]) then
  redis.call('ZADD', KEYS[1], ARGV[2], ARGV[2] .. math.random())
  redis.call('PEXPIRE', KEYS[1], ARGV[1])
  return 0 -- allowed
end
return 1 -- blocked
```

2. Login Attempts Table & Account Lockout (Sliding Window)
Instead of adding failed_login_count/locked_until columns to the User table, we use a dedicated LoginAttempt table that tracks every login attempt with full metadata. Lockout is determined by counting failures within a sliding time window — no explicit lock/unlock state needed.
Why this approach over per-user columns:
- Sliding window is more accurate — 10 failures in 15 minutes is suspicious; 10 failures over 6 months isn't. A simple counter can't distinguish these.
- Multi-dimensional detection — same IP hitting many accounts (credential stuffing) vs many IPs hitting one account (distributed brute force). Per-user columns can't do this.
- Natural retry after lockout — no need to "reset" anything. Once old attempts fall outside the window, the user is automatically unlocked.
- Full audit/forensics — every attempt is preserved with IP, user agent, method, and failure reason. Feeds directly into the Audit Log system (Phase 1.3).
- Clean User schema — no security-state pollution on the User table.
New schema: `internal/storage/schemas/login_attempt.go`

```go
type LoginAttempt struct {
	ID            string `json:"id" gorm:"primaryKey;type:char(36)"`
	UserID        string `json:"user_id" gorm:"type:char(36);index:idx_login_attempt_user_time"` // nullable: for non-existent users, track by email
	Email         string `json:"email" gorm:"type:varchar(256);index:idx_login_attempt_email_time"` // always populated
	IPAddress     string `json:"ip_address" gorm:"type:varchar(45);index:idx_login_attempt_ip_time"` // supports IPv6
	UserAgent     string `json:"user_agent" gorm:"type:text"`
	Method        string `json:"method" gorm:"type:varchar(50)"` // password, otp, magic_link, totp, social
	Success       bool   `json:"success" gorm:"type:bool;default:false"`
	FailureReason string `json:"failure_reason" gorm:"type:varchar(100)"` // invalid_password, account_not_found, mfa_failed, account_locked, etc.
	CreatedAt     int64  `json:"created_at" gorm:"autoCreateTime"`
}
```

Composite indexes for query performance:
- `(user_id, created_at)` - per-user lockout checks
- `(email, created_at)` - lockout checks when user_id is unknown
- `(ip_address, created_at)` - per-IP credential stuffing detection
New storage interface methods:
```go
// AddLoginAttempt records a login attempt (success or failure)
AddLoginAttempt(ctx context.Context, attempt *schemas.LoginAttempt) error

// CountFailedAttempts counts failed login attempts for a user/email within a time window
CountFailedAttempts(ctx context.Context, email string, since int64) (int64, error)

// CountFailedAttemptsByIP counts failed attempts from an IP within a time window
CountFailedAttemptsByIP(ctx context.Context, ip string, since int64) (int64, error)

// ListLoginAttempts returns login attempts for a user (for admin/audit views)
ListLoginAttempts(ctx context.Context, userID string, pagination *model.Pagination) ([]*schemas.LoginAttempt, *model.Pagination, error)

// DeleteLoginAttemptsBefore removes attempts older than a timestamp (retention cleanup)
DeleteLoginAttemptsBefore(ctx context.Context, before int64) error
```

Lockout logic in the login flow (`internal/graphql/login.go` and related auth handlers):
```go
// Before password/OTP verification:
windowStart := time.Now().Add(-lockoutWindow).Unix()
failedCount, _ := store.CountFailedAttempts(ctx, email, windowStart)
if failedCount >= lockoutThreshold {
	// The lock lifts when the oldest attempt in the window slides out
	// of the window, i.e. at oldestAttemptInWindow + lockoutWindow.
	retryAfter := oldestAttemptInWindow + int64(lockoutWindow.Seconds()) - time.Now().Unix()
	return Error("account_temporarily_locked", retryAfter)
}

// After verification:
attempt := &schemas.LoginAttempt{
	UserID:        user.ID, // empty if user not found
	Email:         email,
	IPAddress:     clientIP,
	UserAgent:     userAgent,
	Method:        "password",
	Success:       passwordValid,
	FailureReason: failureReason, // "" on success
}
store.AddLoginAttempt(ctx, attempt)
```

How "retry after lock open" works:
- Lockout window = 15 minutes (configurable)
- Threshold = 10 attempts (configurable)
- If a user has 10 failures in the last 15 minutes → locked
- As time passes, old failures slide out of the window → naturally unlocked
- No state to reset — the window handles everything
- Example: 10 failures between 10:00–10:05, window=15min → locked until 10:15 (when the first failure at 10:00 falls outside the window). At 10:15 only 9 failures remain in window → unlocked.
Admin override: `_unlock_user(user_id: ID!): Response`
Deletes recent failed attempts for the user, immediately bringing count below threshold. Used for emergency unlocks without waiting for the window to expire.
Credential stuffing detection (per-IP):
```go
ipFailedCount, _ := store.CountFailedAttemptsByIP(ctx, clientIP, windowStart)
if ipFailedCount >= ipThreshold { // e.g., 50 failed attempts from same IP
	// Block IP temporarily, or trigger CAPTCHA challenge
}
```

3. IP Blocking/Allowlisting
New schema: `internal/storage/schemas/ip_rule.go`

```go
type IPRule struct {
	ID        string `json:"id" gorm:"primaryKey;type:char(36)"`
	IP        string `json:"ip" gorm:"type:varchar(45);uniqueIndex"` // supports IPv6, CIDR notation
	Type      string `json:"type" gorm:"type:varchar(10)"` // "block" or "allow"
	Reason    string `json:"reason" gorm:"type:text"`
	ExpiresAt int64  `json:"expires_at"` // 0 = permanent
	CreatedAt int64  `json:"created_at"`
}
```

New storage interface methods:
```go
AddIPRule(ctx context.Context, rule *schemas.IPRule) (*schemas.IPRule, error)
DeleteIPRule(ctx context.Context, id string) error
ListIPRules(ctx context.Context, ruleType string, pagination *model.Pagination) ([]*schemas.IPRule, *model.Pagination, error)
GetIPRuleByIP(ctx context.Context, ip string) (*schemas.IPRule, error)
```

Middleware check: Early in the request pipeline, check the client IP against a cached allowlist/blocklist. Use the memory store for caching (refresh every 60s from the DB).
Automatic IP blocking: When credential stuffing is detected (high failed attempts from single IP across multiple accounts), automatically create a temporary IP block rule.
Admin GraphQL mutations:
- `_add_ip_rule(ip: String!, type: String!, reason: String, expires_at: Int64): IPRule`
- `_remove_ip_rule(id: ID!): Response`
- `_list_ip_rules(params: PaginatedInput, type: String): IPRules`
4. Leaked Password Detection
Integration: Have I Been Pwned k-Anonymity API (https://api.pwnedpasswords.com/range/{SHA1_PREFIX})
How it works (privacy-preserving):
- SHA-1 hash the password
- Send first 5 characters to HIBP API
- Compare remaining hash suffix against returned list
- No full password or hash ever leaves the server
Implementation: New utility `internal/utils/password_check.go`

```go
func IsPasswordLeaked(password string) (bool, error) {
	hash := sha1.Sum([]byte(password))
	hexHash := strings.ToUpper(hex.EncodeToString(hash[:]))
	prefix, suffix := hexHash[:5], hexHash[5:]
	resp, err := http.Get("https://api.pwnedpasswords.com/range/" + prefix)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	// Response lines have the form "SUFFIX:COUNT", where SUFFIX is the full
	// 35-char remainder of the hash, so a substring match on suffix+":" is exact.
	return strings.Contains(string(body), suffix+":"), nil
}
```

Integration points: Called during the `signup` and `reset_password` mutations when `--check-leaked-passwords=true`.
5. Retention & Cleanup
Login attempts grow over time — automatic cleanup is essential:
- CLI flag: `--login-attempt-retention-days=90`
- Background goroutine runs `DeleteLoginAttemptsBefore()` daily
- Aligns with audit log retention (Phase 1.3) - same cleanup pattern
- Expired IP rules are also cleaned up in the same sweep
CLI Configuration Flags
```
--enable-rate-limit=true              # Enable/disable rate limiting
--rate-limit-requests=100             # Requests per window (general)
--rate-limit-window=60s               # Window duration (general)
--rate-limit-auth-requests=20         # Requests per window (auth endpoints)
--rate-limit-auth-window=60s          # Window duration (auth endpoints)
--account-lockout-threshold=10        # Failed attempts before lockout
--account-lockout-window=15m          # Sliding window duration
--account-lockout-ip-threshold=50     # Per-IP failed attempts before blocking
--check-leaked-passwords=false        # Enable HIBP password check
--login-attempt-retention-days=90     # Days to keep login attempt records
```
Migration Strategy
- Create `login_attempts` table/collection across all 13+ DB providers with composite indexes
- Create `ip_rules` table/collection across all DB providers
- Add memory store methods for rate limit counters
- No changes to the User schema
- Rate limiting defaults to enabled for new deployments, with a documented flag to disable it
Testing Plan
- Unit tests for token bucket and sliding window algorithms
- Integration tests for sliding window lockout flow:
- Fail N times → locked → wait for window to slide → retry succeeds
- Fail N times → admin unlock (delete attempts) → retry succeeds immediately
- Integration tests for IP blocking middleware
- Integration tests for credential stuffing detection (per-IP threshold)
- Load tests to verify rate limiting under concurrent requests
- Test with both the Redis-backed and in-memory memory store backends
- Test HIBP API integration with known-leaked passwords
- Test retention cleanup removes old records correctly