FluidifyAI/Regen

Part of the Fluidify open-source suite

Grafana OnCall was archived in March 2026. ~50,000 teams are looking for a self-hosted alternative. PagerDuty costs $50/user/month — $120,000/year for a 200-person team. Fluidify Regen is free, forever.


Install

Three ways to run — pick what fits your stack:

Docker (fastest)

docker pull ghcr.io/fluidifyai/regen:latest

Need the full stack? Download the Compose file and start it:

curl -O https://raw.githubusercontent.com/FluidifyAI/Regen/main/docker-compose.yml
docker-compose up -d

Open http://localhost:8080 — API and UI are ready. No configuration required to start receiving alerts.

Docker Compose (recommended for self-hosting)

git clone https://github.com/FluidifyAI/Regen.git
cd Regen
cp .env.example .env   # edit as needed
docker-compose up -d

Kubernetes (Helm)

helm install fluidify-regen deploy/helm/fluidify-regen \
  --set ingress.host=incidents.your-domain.com \
  --set postgresql.auth.password=<strong-password>

For production HA (external DB, Redis Sentinel, zero-downtime deploys), see docs/OPERATIONS.md.

Built for production

Fluidify Regen is designed to run as reliably as the tools it monitors.

Benchmark results (HA stack · Apple M2 / Colima · 2026-03-31)

| Scenario | Result |
| --- | --- |
| Webhook ingestion p99 | < 10 ms (target: < 200 ms) |
| Webhook sustained p50 / p95 | 1.55 ms / 2.82 ms |
| API reads p95 (list / detail) | 4.42 ms / 2.83 ms |
| Peak throughput (burst test) | 3,917 RPS, 0 × 5xx |
| PostgreSQL failover RTO | 11 s (Patroni + HAProxy, target: < 60 s) |
| Redis failover RTO | 5 s (Sentinel 3-node quorum) |
| In-flight requests lost on rolling deploy | 0 |

These numbers were captured on a single-machine local HA stack, so expect production results to differ. To reproduce them, run make load-test and make chaos-db. The full methodology is in docs/RELIABILITY.md.

How it stays up

  • Zero-downtime deploys — rolling restarts drain in-flight requests before pod shutdown (SIGTERM → 30 s drain → exit)
  • PostgreSQL HA — Patroni manages automatic primary election; HAProxy re-routes to the new primary within one health-check interval (3 s). No app restart, no config change.
  • Redis Sentinel — 3-node quorum detects primary loss; workers reconnect to new master automatically
  • Kubernetes-native — HPA, health-gated rolling deploys, resource limits out of the box
  • Webhook flood protection — rate limiter returns 429 before the DB sees load spikes; validated at 3,917 RPS with zero OOM events
  • Full observability: metrics (Prometheus) plus a pre-built Grafana dashboard in deploy/grafana/
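
The SIGTERM → 30 s drain → exit sequence from the first bullet can be sketched with Go's standard library. This is a minimal illustration, not Regen's actual shutdown code: newServer, serveUntil, and the placeholder handler are hypothetical names, and only the 30-second window comes from the text above.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// drainTimeout mirrors the 30 s drain window described above.
const drainTimeout = 30 * time.Second

// newServer builds a placeholder server (hypothetical; the handler is illustrative only).
func newServer(addr string) *http.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	})
	return &http.Server{Addr: addr, Handler: mux}
}

// serveUntil runs srv until stop is closed, then stops accepting new
// connections and waits up to drain for in-flight requests to finish.
func serveUntil(srv *http.Server, stop <-chan struct{}, drain time.Duration) error {
	errCh := make(chan error, 1)
	go func() { errCh <- srv.ListenAndServe() }()

	select {
	case err := <-errCh:
		return err // the listener failed before any stop request
	case <-stop:
	}

	ctx, cancel := context.WithTimeout(context.Background(), drain)
	defer cancel()
	return srv.Shutdown(ctx) // returns the context error if the drain window expires
}

func main() {
	// Translate SIGTERM (sent by Kubernetes before pod shutdown) into a stop signal.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM, os.Interrupt)

	stop := make(chan struct{})
	go func() {
		<-sig
		close(stop)
	}()

	if err := serveUntil(newServer(":8080"), stop, drainTimeout); err != nil {
		log.Fatal(err)
	}
}
```

In a Kubernetes deployment the pod's terminationGracePeriodSeconds would need to be at least as long as the drain window, otherwise the process is killed before the drain completes.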

Send a test alert

curl -X POST http://localhost:8080/api/v1/webhooks/prometheus \
  -H "Content-Type: application/json" \
  -d '{
    "receiver": "fluidify-regen",
    "status": "firing",
    "alerts": [{
      "status": "firing",
      "labels": {"alertname": "TestAlert", "severity": "critical"},
      "annotations": {"summary": "Test alert from curl"},
      "startsAt": "2024-01-01T00:00:00Z"
    }]
  }'

An incident is created automatically. If Slack is configured, a dedicated channel appears within seconds.
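
The same test alert can be sent from code. The sketch below builds the payload shown in the curl example and posts it to the same endpoint. The struct and function names (alert, webhookPayload, buildTestAlert) are hypothetical and only cover the fields used above, not the full Alertmanager webhook schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// alert holds the per-alert fields used in the curl example above.
type alert struct {
	Status      string            `json:"status"`
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations"`
	StartsAt    string            `json:"startsAt"`
}

// webhookPayload is the top-level body sent to the Prometheus webhook endpoint.
type webhookPayload struct {
	Receiver string  `json:"receiver"`
	Status   string  `json:"status"`
	Alerts   []alert `json:"alerts"`
}

// buildTestAlert encodes a single firing alert as JSON.
func buildTestAlert(name, severity, summary, startsAt string) ([]byte, error) {
	return json.Marshal(webhookPayload{
		Receiver: "fluidify-regen",
		Status:   "firing",
		Alerts: []alert{{
			Status:      "firing",
			Labels:      map[string]string{"alertname": name, "severity": severity},
			Annotations: map[string]string{"summary": summary},
			StartsAt:    startsAt,
		}},
	})
}

func main() {
	body, err := buildTestAlert("TestAlert", "critical", "Test alert from Go", "2024-01-01T00:00:00Z")
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	// Assumes Regen is listening locally, as in the install steps above.
	resp, err := http.Post("http://localhost:8080/api/v1/webhooks/prometheus",
		"application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("post:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```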


Coming from Grafana OnCall?

Grafana OnCall was archived in March 2026. Fluidify Regen is built to be the drop-in OSS successor — same self-hosted model, no SaaS lock-in.

Point your Alertmanager at Regen and you're receiving alerts in minutes:

# alertmanager.yml
receivers:
  - name: fluidify-regen
    webhook_configs:
      - url: http://your-regen-host:8080/api/v1/webhooks/prometheus

One-click migration from Grafana OnCall — import your users, on-call schedules, and escalation policies in under 60 seconds:

  1. Go to Settings → Migrations
  2. Enter your Grafana OnCall URL and API token
  3. Preview exactly what will be imported, then click Import everything

Your new Regen webhook URLs are shown immediately — just update them in Grafana Alertmanager and you're live. Full migration guide →


Features

  • Alert ingestion — Prometheus, Grafana, CloudWatch, generic webhook
  • Incident lifecycle with immutable timeline
  • On-call rotations, layers, overrides
  • Escalation policies with multi-step timeouts
  • Slack integration — channels, bot commands, timeline sync
  • Microsoft Teams integration — Adaptive Cards, bot commands
  • AI incident summaries + post-mortem drafts (BYO OpenAI key)
  • SSO / SAML — Okta, Azure AD, Google Workspace — free, always
  • Docker Compose + Kubernetes Helm chart
  • PostgreSQL HA + Redis Sentinel support

SSO is free. Gating SSO behind a paid tier is user-hostile. We stay off sso.tax.

What we're building next

We're working toward fully autonomous incident response — AI agents that triage, correlate, and resolve before your on-call engineer's phone rings. Interested? Join the discussion → or pick up an open issue →


AI Agents

Fluidify Regen ships with AI agents that work autonomously during and after incidents. They run on your infrastructure with your own OpenAI key: incident data is stored in your stack, and AI requests go to OpenAI under your key.

Incident Summarization

Reads the full incident timeline and linked Slack thread, then writes a concise summary of what happened, what was done, and the current status. Useful for commanders joining mid-incident and for shift handoffs.

curl -X POST http://localhost:8080/api/v1/incidents/INC-042/summarize \
  -H "Authorization: Bearer YOUR_TOKEN"

Historical Pattern Matching

When an incident fires, Regen searches your full incident history for similar patterns — same service, same alert fingerprint, similar timeline signatures — and surfaces the match directly in Slack:

🤖 Regen: This looks like INC-157 from November (Redis memory eviction, resolved in 18 min). [View timeline →]

Engineers stop re-diagnosing problems they've already solved. Every incident makes the next one faster.
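
One way to picture the "same alert fingerprint" part of the matching is a stable hash over an alert's labels. The sketch below is an assumption for illustration only: the README does not describe how Regen computes fingerprints or similarity, and fingerprint, pastIncident, findSimilar, and the label values are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// fingerprint hashes a label set in sorted key order, so the same labels
// always yield the same ID regardless of map iteration order.
func fingerprint(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		// NUL separators keep ("ab", "c") distinct from ("a", "bc").
		h.Write([]byte(k))
		h.Write([]byte{0})
		h.Write([]byte(labels[k]))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))[:16]
}

// pastIncident is a hypothetical record of a resolved incident.
type pastIncident struct {
	ID          string
	Fingerprint string
}

// findSimilar returns the IDs of past incidents whose fingerprint matches
// the incoming alert's labels exactly.
func findSimilar(history []pastIncident, labels map[string]string) []string {
	fp := fingerprint(labels)
	var ids []string
	for _, inc := range history {
		if inc.Fingerprint == fp {
			ids = append(ids, inc.ID)
		}
	}
	return ids
}

func main() {
	redisLabels := map[string]string{"alertname": "RedisMemoryHigh", "service": "redis"}
	history := []pastIncident{
		{ID: "INC-157", Fingerprint: fingerprint(redisLabels)},
		{ID: "INC-201", Fingerprint: fingerprint(map[string]string{"alertname": "DiskFull", "service": "db"})},
	}

	incoming := map[string]string{"service": "redis", "alertname": "RedisMemoryHigh"}
	fmt.Println(findSimilar(history, incoming)) // prints [INC-157]
}
```

Exact-match fingerprints only cover the simplest case. The service and timeline-signature matching mentioned above would need a fuzzier comparison than this.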

Post-Mortem Agent

Generates a structured post-mortem draft from the incident timeline, status changes, and linked alerts. Extracts contributing factors and action items automatically. Supports custom templates.

curl -X POST http://localhost:8080/api/v1/incidents/INC-042/postmortem/generate \
  -H "Authorization: Bearer YOUR_TOKEN"

Handoff Digest

Generates a shift-handoff briefing covering all open incidents, recent status changes, and pending action items — delivered to Slack or Teams at the start of each shift.

What's coming — the agent roadmap

| Agent | What it does | Status |
| --- | --- | --- |
| Triage agent | Calls Datadog, K8s, GitHub MCP; gathers context before you unlock your phone | In progress |
| Co-pilot mode | Agent proposes action + confidence score, human approves with one tap | In progress |
| Root cause agent | Correlates metrics, logs, and recent deploys to surface likely root cause | Planned |
| Runbook agent | Matches incident to known runbooks, executes with human gate | Planned |
| Noise reduction | Learns alert patterns, suppresses known-noisy low-signal alerts | Planned |

Want to help build this? The agent scaffolding is open. See the roadmap issues →


Integrations

Available now

| Category | Tools |
| --- | --- |
| Alert ingestion | Prometheus Alertmanager · Grafana · AWS CloudWatch · Generic webhook |
| Chat | Slack · Microsoft Teams · Telegram |
| AI | OpenAI GPT-4o / GPT-4 / GPT-3.5 (BYO key) |
| Auth | SAML 2.0: Okta · Azure AD · Google Workspace · any compliant IdP |
| Deploy | Docker Compose · Kubernetes Helm · bare metal |

Coming soon

| Category | Tools |
| --- | --- |
| Alert ingestion | Datadog · New Relic · Sentry · Dynatrace · Elastic · Zabbix · Uptime Kuma · Betterstack |
| Migration / import | PagerDuty · Opsgenie · Splunk On-Call |
| Post-mortem export | Confluence · Notion · Jira |
| AI providers | Anthropic Claude · local LLMs via Ollama |
| Chat | Discord |

Missing something? Open an issue — the generic webhook covers most tools today.


Comparison

Fluidify Regen PagerDuty incident.io Grafana OnCall
Price Free / flat enterprise ~$21–50/user/mo ~$30+/user/mo Archived
Self-hosted ✅ (archived)
Open source AGPLv3 Apache 2.0
SSO ✅ Free 💰 Paid tier 💰 Paid tier ✅ Free
BYO AI
Agent-native
Alert + incident + on-call in one ⚠️ ⚠️ ⚠️

Roadmap

Shipping next (v1.x)

  • PagerDuty schedule + escalation policy import
  • Co-pilot mode — agent proposes, human approves with confidence score
  • Fluidify MCP Server — Claude, GPT, and custom bots can call Regen natively
  • Confluence / Notion post-mortem export
  • RBAC, SCIM, audit log export

The bigger picture

Fluidify Regen is built for the age of AI agents. The vision: before your on-call engineer unlocks their phone, the triage agent has already pulled correlated metrics from Datadog, checked K8s pod health, matched the incident against your history, and posted a one-tap approval request to Slack. The engineer taps Approve. Done.

Every incident makes the system smarter. After 12 months, your triage agent knows your stack better than most of your engineers. That institutional memory lives in your own infrastructure — not in a SaaS vendor's cloud.

| Horizon | Theme |
| --- | --- |
| v1.x | Agent scaffolding: co-pilot mode, MCP server, Datadog/K8s/Linear integrations |
| v2.x | Autonomous ops: triage agent, runbook execution, confidence gates |
| v3.x | Multi-agent: triage + comms + runbook agents in parallel |
| Horizon | Predictive ops: incidents resolved before alerts fire |

Star the repo to follow along.


Contributing

Issues, PRs, and feature requests are welcome. If you're coming from Grafana OnCall, your experience building on that platform is exactly what we need.

# Start the backend's dependencies (db and redis)
docker-compose up -d db redis

# Run backend with hot reload
cd backend && go run ./cmd/regen/... serve

# Run frontend with hot reload
cd frontend && npm install && npm run dev

See CONTRIBUTING.md and Makefile (make help) for all commands. For bigger changes, open a discussion first.


Security

Fluidify Regen is built with security as a first-class concern:

  • Authentication: bcrypt (cost 12), timing-safe comparison, 5-attempt account lockout, HTTP-only SameSite=Strict session cookies
  • No SQL injection surface: All database access uses GORM parameterized queries — no raw string interpolation
  • Webhook verification: Slack (HMAC-SHA256 + replay protection), Teams (RSA/OIDC), CloudWatch (RSA + SSRF-safe cert validation)
  • Rate limiting: Redis Lua script enforcing three tiers — 10/min on auth endpoints, 120/min unauthenticated, 600/min authenticated
  • Security headers: CSP, HSTS (2 years), X-Frame-Options, X-Content-Type-Options, Permissions-Policy on every response
  • Container hardening: non-root UID 1001, read-only filesystem, all Linux capabilities dropped
  • CORS: explicit allowlist via CORS_ALLOWED_ORIGINS; dev-only fallback to localhost
  • Frontend: no dangerouslySetInnerHTML, no secrets in bundle, session token never accessible to JavaScript
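
For readers unfamiliar with the Slack check named in the webhook verification bullet, the sketch below follows Slack's published signing scheme: an HMAC-SHA256 over `v0:<timestamp>:<body>` keyed with the signing secret, plus a timestamp window against replays. It is an illustration, not Regen's code. The function names are hypothetical, and the five-minute window is Slack's suggested value rather than something this README states.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"time"
)

// maxSkew is the replay window (assumed; Slack's documentation suggests five minutes).
const maxSkew = 5 * time.Minute

// slackSignature computes the "v0=<hex>" value Slack sends in the
// X-Slack-Signature header for a given timestamp and raw request body.
func slackSignature(secret, timestamp, body string) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte("v0:" + timestamp + ":" + body))
	return "v0=" + hex.EncodeToString(mac.Sum(nil))
}

// verifySlack reports whether the request is authentic and fresh.
// now is passed in as Unix seconds so the check is easy to test.
func verifySlack(secret, timestamp, body, signature string, now int64) bool {
	ts, err := strconv.ParseInt(timestamp, 10, 64)
	if err != nil {
		return false
	}
	age := now - ts
	if age < 0 {
		age = -age
	}
	if age > int64(maxSkew/time.Second) {
		return false // too old or too far in the future: possible replay
	}
	expected := slackSignature(secret, timestamp, body)
	// hmac.Equal compares in constant time to avoid timing leaks.
	return hmac.Equal([]byte(expected), []byte(signature))
}

func main() {
	secret, body := "example-signing-secret", `{"type":"event_callback"}`
	now := time.Now().Unix()
	ts := strconv.FormatInt(now, 10)
	sig := slackSignature(secret, ts, body)

	fmt.Println(verifySlack(secret, ts, body, sig, now))     // true: fresh and valid
	fmt.Println(verifySlack(secret, ts, body, sig, now+600)) // false: replayed 10 minutes later
}
```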

Before going to production, review the Production Security Checklist — TLS, PostgreSQL password, Redis auth, and CORS origins must all be configured.
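
To make the three rate-limit tiers from the list above concrete, here is a small in-memory fixed-window counter. It is a stand-in under stated assumptions: Regen's limiter runs as a Redis Lua script and may use a different algorithm, and the tier names, limiter type, and allow method are hypothetical. Only the per-minute limits (10, 120, 600) come from this README.

```go
package main

import (
	"fmt"
	"sync"
)

// limits holds the per-minute request limits for the three tiers listed above.
var limits = map[string]int{
	"auth":            10,
	"unauthenticated": 120,
	"authenticated":   600,
}

// limiter is an in-memory fixed-window counter. Old windows are never
// evicted here; a Redis-backed version would expire keys instead.
type limiter struct {
	mu     sync.Mutex
	counts map[string]int // key: tier|client|minute
}

func newLimiter() *limiter {
	return &limiter{counts: make(map[string]int)}
}

// allow records one request and reports whether it fits in the current
// one-minute window. A false result would map to an HTTP 429 response.
func (l *limiter) allow(tier, client string, nowUnix int64) bool {
	limit, ok := limits[tier]
	if !ok {
		return false // unknown tier: fail closed
	}
	key := fmt.Sprintf("%s|%s|%d", tier, client, nowUnix/60)

	l.mu.Lock()
	defer l.mu.Unlock()
	if l.counts[key] >= limit {
		return false
	}
	l.counts[key]++
	return true
}

func main() {
	l := newLimiter()
	allowed := 0
	for i := 0; i < 12; i++ {
		if l.allow("auth", "203.0.113.7", 0) {
			allowed++
		}
	}
	fmt.Println("allowed:", allowed)                // allowed: 10
	fmt.Println(l.allow("auth", "203.0.113.7", 60)) // true: a new minute starts a new window
}
```

A fixed window is the simplest choice and permits short bursts across a window boundary. A sliding window or token bucket would smooth that out at the cost of more state.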

Full security architecture: SECURITY.md


License

AGPLv3 — free forever, including SSO.


Built by Fluidify · your incident data belongs to you

About

Open-source incident management: alerts, on-call, AI post-mortems. Self-hosted alternative to PagerDuty & incident.io. Works with Prometheus, Grafana, Datadog, Slack, and Teams. Free forever, BYO-AI.
