Add multi-instance safe event processing and deployment model

## Summary
This issue assumes that `promgithub` is intended to support multi-instance, highly available deployments. It tracks the architecture and implementation work required to keep webhook ingestion and metric production correct when multiple replicas run simultaneously.

Today, stateless horizontal scaling would not be correct on its own: duplicate webhook deliveries can be processed by multiple replicas, in-memory workflow/job state does not compose across instances, and retries or out-of-order events can corrupt gauges.
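
To make the failure mode concrete, here is a minimal sketch of how a per-process gauge can drift. It is an illustration only: `NaiveReplica` and the simplified event dicts are hypothetical and are not taken from the `promgithub` code base.

```python
class NaiveReplica:
    """Hypothetical per-process handler that adjusts a gauge with inc/dec."""

    def __init__(self):
        self.in_progress = 0  # stand-in for a Prometheus gauge

    def handle(self, event):
        if event["status"] == "in_progress":
            self.in_progress += 1
        elif event["status"] == "completed":
            self.in_progress -= 1


# One workflow run. The "in_progress" event is delivered twice, and the load
# balancer routes the deliveries to different replicas.
replica_a, replica_b = NaiveReplica(), NaiveReplica()
replica_a.handle({"run_id": 1, "status": "in_progress"})
replica_b.handle({"run_id": 1, "status": "in_progress"})  # duplicate delivery
replica_b.handle({"run_id": 1, "status": "completed"})

# Summed across replicas, the gauge still reports one running workflow even
# though the only run has finished: replica A never sees the completion.
total_in_progress = replica_a.in_progress + replica_b.in_progress
```

Neither replica holds enough information to correct the drift, which is why the rest of this issue moves deduplication and state out of process memory.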

This issue should define and drive the changes needed to make multi-instance deployment a first-class supported model.

## Why this matters
- Operators should be able to run multiple replicas behind a load balancer.
- GitHub retries and repeated deliveries must not inflate counters or corrupt gauges.
- Workflow/job state must remain consistent across replicas.
- HA deployments need a clear storage and failure model.

## Goals
- Make webhook processing idempotent across replicas.
- Support shared deduplication and shared workflow/job state.
- Define a recommended multi-instance deployment topology.
- Add observability for dedupe/state backend health and failure modes.
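
As a rough sketch of the idempotency goal, the example below claims each `X-GitHub-Delivery` ID once and treats later claims as duplicates. `InMemoryDedupeStore`, `handle_delivery`, the counter names, and the one-hour TTL are assumptions made for illustration; an in-memory dict stands in for the shared backend, and the delivery ID is just an example GUID.

```python
import time


class InMemoryDedupeStore:
    """Stand-in for a shared backend; hypothetical, not promgithub's API.

    With Redis, the same check could be a single atomic
    `SET key value NX EX ttl`, so that only one replica wins the claim.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}  # delivery_id -> expiry timestamp

    def claim(self, delivery_id):
        """Return True only for the first claim of an id within the TTL."""
        now = self._clock()
        # Bounded retention: drop ids whose TTL has passed.
        self._expires_at = {
            key: expiry for key, expiry in self._expires_at.items() if expiry > now
        }
        if delivery_id in self._expires_at:
            return False
        self._expires_at[delivery_id] = now + self._ttl
        return True


def handle_delivery(store, headers, counters):
    """Count an event once, however many times the delivery arrives."""
    delivery_id = headers["X-GitHub-Delivery"]
    if not store.claim(delivery_id):
        counters["duplicates"] += 1
        return False
    counters["events"] += 1
    return True


store = InMemoryDedupeStore(ttl_seconds=3600)  # assumed retention window
counters = {"events": 0, "duplicates": 0}
headers = {"X-GitHub-Delivery": "72d3162e-cc78-11e3-81ab-4c9367dc0958"}
first = handle_delivery(store, headers, counters)
second = handle_delivery(store, headers, counters)  # redelivery
```

The important property is that the check and the write happen as one atomic step in the shared backend; a separate read followed by a write would let two replicas both accept the same delivery.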

## Suggested scope
- Use `X-GitHub-Delivery` as a shared idempotency key.
- Add a shared backend for deduplication with bounded retention, likely Redis or equivalent.
- Move workflow/job state tracking to shared storage keyed by stable GitHub IDs such as workflow `run_id` and job `id`.
- Define how gauges are derived from shared state rather than per-process memory.
- Document the recommended HA deployment architecture, including receiver replicas, shared state backend, scrape behavior, and failure handling.
- Add internal metrics for duplicate deliveries, backend failures, backend latency, and state processing errors.
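
One possible shape for the state and gauge items above is sketched below: status is stored per `run_id`, regressions are ignored, and gauge values are computed by counting stored runs instead of incrementing and decrementing. `SharedRunState`, the status names, and their ordering are assumptions for illustration, and a dict stands in for the shared backend.

```python
from collections import Counter

# Assumed status ordering; a later status must never be overwritten by an
# earlier one that arrives late.
STATUS_RANK = {"queued": 0, "in_progress": 1, "completed": 2}


class SharedRunState:
    """Dict-backed stand-in for shared storage (hypothetical interface)."""

    def __init__(self):
        self._status_by_run = {}  # run_id -> latest known status

    def apply(self, run_id, status):
        """Record a status; ignore replays and out-of-order regressions."""
        current = self._status_by_run.get(run_id)
        if current is not None and STATUS_RANK[status] <= STATUS_RANK[current]:
            return False
        self._status_by_run[run_id] = status
        return True

    def gauge_values(self):
        """Derive gauge values by counting runs per status at scrape time."""
        counts = Counter(self._status_by_run.values())
        return {status: counts.get(status, 0) for status in STATUS_RANK}


state = SharedRunState()
state.apply(101, "queued")
state.apply(101, "in_progress")
state.apply(102, "completed")    # arrives before its earlier events
state.apply(102, "in_progress")  # late event, ignored
state.apply(101, "in_progress")  # duplicate, ignored
gauges = state.gauge_values()
```

Because the gauge is a function of stored state, replaying any event leaves the result unchanged. A real backend would need to make the compare-and-set in `apply` atomic and expire completed runs; the sketch also ignores re-runs, where a run can legitimately return to an earlier status.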

## Child issues
- Add shared deduplication for GitHub delivery IDs.
- Move workflow/job state tracking to a shared backend.
- Make event handling idempotent across replicas.
- Document recommended HA deployment architecture.
- Add observability for dedupe/state backend behavior.

## Acceptance criteria
- Running multiple `promgithub` replicas behind a load balancer is a supported and documented deployment mode.
- Duplicate deliveries do not inflate counters or corrupt workflow/job state.
- Workflow/job gauges are derived from shared state rather than relying on per-instance memory.
- Operators can observe failures and latency in the dedupe/state path through metrics and documentation.
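
For the observability criterion, a wrapper around backend calls could record call counts, failures, and latency along these lines. This is a sketch under assumptions: `BackendMetrics`, `instrumented`, and `flaky_backend` are hypothetical names, and plain Python attributes stand in for Prometheus counters and a latency histogram.

```python
import time


class BackendMetrics:
    """Plain-Python stand-in for Prometheus counters and a latency histogram."""

    def __init__(self):
        self.calls = 0
        self.failures = 0
        self.latencies = []  # seconds per backend call


def instrumented(metrics, operation, *args):
    """Run one backend operation, recording latency and failures."""
    start = time.perf_counter()
    metrics.calls += 1
    try:
        return operation(*args)
    except Exception:
        metrics.failures += 1
        raise
    finally:
        metrics.latencies.append(time.perf_counter() - start)


def flaky_backend(key):
    """Hypothetical backend call that fails for one specific key."""
    if key == "unreachable":
        raise ConnectionError("backend unavailable")
    return True


metrics = BackendMetrics()
instrumented(metrics, flaky_backend, "delivery-1")
try:
    instrumented(metrics, flaky_backend, "unreachable")
except ConnectionError:
    pass  # a real receiver must pick a failure policy here
```

The wrapper re-raises on purpose: whether a receiver should reject the webhook or process it without deduplication when the backend is down is a policy decision that belongs in the documented failure model.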


Add multi-instance safe event processing and deployment model #45
