block-storage-api

Pluggable block storage API in Go — Ceph backend with FSM volume lifecycle, retry policy, and real-time SSE streaming.

Live demo

Deployed on a Scaleway DEV1-S instance (Paris, fr-par-1):

Endpoint	URL
Health check	http://163.172.144.70:8080/healthz
Swagger UI	http://163.172.144.70:8080/swagger/index.html
Grafana (logs)	http://163.172.144.70:3000

curl -s http://163.172.144.70:8080/healthz | jq

Stack

Component	Choice
Language	Go 1.26
Router	chi v5
FSM	looplab/fsm
Database	PostgreSQL (pgx v5)
Migrations	golang-migrate
Observability	OpenTelemetry (traces + metrics)
Logs	Loki + Grafana
Ceph backend	go-ceph / librbd (`-tags ceph`)
API docs	Swagger (swaggo)
CI/CD	GitHub Actions → ghcr.io
Deployment	Docker Compose on Scaleway DEV1-S
Tailscale VPN	Ceph node connectivity (NAT traversal)
cephadm	Ceph cluster deployment (Docker-based)
ghcr.io	Container registry

Infrastructure & Data Flow

Architecture

Hexagonal architecture with a feature-based package structure inspired by production Go codebases.

block-storage-api/
├── cmd/
│   └── api/
│       ├── main.go              # minimal entrypoint (~20 lines)
│       ├── setup.go             # ResourcesRegistry — Setup() + Shutdown() LIFO
│       ├── backend_mock.go      # newBackend() — mock (//go:build !ceph)
│       └── backend_ceph.go      # newBackend() — Ceph  (//go:build ceph)
├── config/
│   └── config.go                # env vars + validation + RetryPolicy + ReconcilePolicy
├── assertor/
│   └── assertor.go              # lightweight dependency validation
├── internal/                    # private — not importable externally
│   ├── db/
│   │   ├── postgres.go          # pgxpool connection
│   │   ├── migrate.go           # golang-migrate runner
│   │   └── migrations/          # SQL migration files
│   └── observability/
│       ├── tracing.go           # OpenTelemetry tracer provider
│       └── metrics.go           # OpenTelemetry meter provider
├── volume/                      # main feature — hexagonal architecture
│   ├── contract.go              # FeatureContract + ApplicationContract
│   ├── dependency.go            # dependency interfaces (consumer-side)
│   ├── application.go           # pure business logic — zero transport imports
│   ├── factory.go               # wiring + WaitGroup + cancelCtx + bgCtx
│   ├── controller_http.go       # HTTP transport only — zero business logic
│   ├── fsm.go                   # FSM states, transitions, RetryPolicy
│   └── repository/
│       ├── postgres.go          # PostgreSQL impl of DatabaseDependency
│       └── inmemory.go          # in-memory impl for tests
├── storage/
│   ├── backend.go               # VolumeBackend interface + Volume type
│   ├── mock/
│   │   ├── mock.go              # in-memory backend (default, no deps)
│   │   └── mock_test.go
│   └── ceph/
│       ├── ceph.go              # Ceph RBD via go-ceph (-tags ceph)
│       └── stub.go              # stub without -tags ceph (//go:build !ceph)
├── transport/
│   └── nvmeof/
│       ├── target.go            # NVMe-oF target (transport, not a backend)
│       └── target_test.go
├── grafana/
│   └── provisioning/
│       ├── datasources/
│       │   └── loki.yaml        # Loki datasource — auto-provisioned at startup
│       └── dashboards/
│           ├── dashboard.yaml   # dashboard provider config
│           └── block-storage-api.json  # 4-panel dashboard
├── docs/
│   ├── docs.go                  # Swagger generated docs
│   ├── swagger.json
│   ├── block-storage-api-architecture.png   # infrastructure diagram
│   └── block-storage-api-infrastructure.html # animated diagram
├── .github/
│   └── workflows/
│       └── ci-cd.yml            # lint → test → build-and-push → deploy (manual)
├── Dockerfile                   # multi-stage: golang:1.26-bookworm + debian:bookworm-slim
├── docker-compose.yml           # API + PostgreSQL + Loki + Grafana
├── Makefile                     # run, run-ceph, test, coverage, lint, migrate
├── .golangci.yml                # golangci-lint v2 config
├── CLAUDE.md                    # architecture rules for Claude Code
├── LICENSE
└── README.md

Feature pattern

The volume/ package is self-contained:

application.go — zero imports from net/http, chi, or any storage impl
controller_http.go — zero business logic, only HTTP ↔ ApplicationContract
repository/postgres.go — implements DatabaseDependency, only known by setup.go
factory.go — wires everything, WaitGroup + LIFO shutdown

feat, err := volume.NewVolumeFeature(volume.NewVolumeFeatureParams{
    Logger:          logger,
    Backend:         storageBackend,
    DB:              volumeRepo,
    Tracer:          tracer,
    Meter:           meter,
    Router:          router,
    RetryPolicy:     volume.RetryPolicy{MaxAttempts: 3, InitialWait: 500*time.Millisecond},
    ReconcilePolicy: config.ReconcilePolicy{DBOnly: "error", CephOnly: "ignore"},
})

NVMe-oF — transport, not a backend

NVMe-oF is a transport layer — it exposes an existing volume over the network. It does NOT store data and does NOT implement VolumeBackend.

Ceph RBD ──► NVMe-oF Target ──► Initiator (sees /dev/nvme1n1 as local disk)

Ceph CAP strategy

Ceph is CP by default — managed at pool level, not application level:

ceph osd pool set rbd-demo min_size 2  # refuse writes if quorum not met
ceph osd pool set rbd-demo size 3

Goroutine lifecycle — WaitGroup

VolumeFeature tracks every internal goroutine with sync.WaitGroup:

type VolumeFeature struct {
    wg        sync.WaitGroup     // tracks all internal goroutines
    cancelCtx context.CancelFunc // cancels them on shutdown
    bgCtx     context.Context    // independent of HTTP request context
}

Shutdown sequence (VolumeFeature.Close):

f.cancelCtx() — signals all goroutines via ctx.Done()
f.wg.Wait() — blocks until every goroutine has exited
LIFO closers — remaining resources closed in reverse init order

Retry goroutines check ctx.Done() before each attempt and during backoff:

for attempt := 1; attempt <= policy.MaxAttempts; attempt++ {
    select {
    case <-ctx.Done():
        return // canceled — stop silently
    default:
    }
    err := op(ctx)
    if err == nil { return }
    select {
    case <-ctx.Done():
        return
    case <-time.After(backoff):
    }
}

Volume FSM lifecycle

stateDiagram-v2
    [*] --> pending
    pending --> creating: create
    creating --> available: ready
    creating --> creating_failed: error
    creating_failed --> creating: retry
    creating_failed --> error: fail

    available --> attaching: attach
    attaching --> attached: attached
    attaching --> attaching_failed: error
    attaching_failed --> attaching: retry
    attaching_failed --> error: fail

    attached --> detaching: detach
    detaching --> available: detached
    detaching --> detaching_failed: error
    detaching_failed --> detaching: retry
    detaching_failed --> error: fail

    available --> deleting: delete
    deleting --> deleted: deleted
    deleting --> deleting_failed: error
    deleting_failed --> deleting: retry
    deleting_failed --> error: fail

    error --> pending: reconcile (absent from backend)
    error --> available: reconcile (exists, not attached)
    error --> attached: reconcile (exists, attached to nodeX)

States

State	Description
`pending`	Volume requested
`creating`	Backend provisioning in progress
`creating_failed`	Provisioning failed — retry pending
`available`	Ready for use
`attaching`	Attach to node in progress
`attaching_failed`	Attach failed — retry pending
`attached`	Mounted on a node
`detaching`	Detach in progress
`detaching_failed`	Detach failed — retry pending
`deleting`	Deletion in progress
`deleting_failed`	Deletion failed — retry pending
`deleted`	Removed
`error`	Terminal failure — recovery via `POST /reconcile`

Retry policy

Every in-progress state has a *_failed intermediate with exponential backoff:

Parameter	Default	Env variable
`MaxAttempts`	`3`	`VOLUME_RETRY_MAX_ATTEMPTS`
`InitialWait`	`500ms`	`VOLUME_RETRY_INITIAL_WAIT`
`Multiplier`	`2.0`	`VOLUME_RETRY_MULTIPLIER`
`MaxWait`	`10s`	`VOLUME_RETRY_MAX_WAIT`

Delays: 500ms → 1s → 2s → terminal error after MaxAttempts.

Every FSM transition is persisted in volume_events (full audit trail for RCA).

Important: retry is triggered only by Ceph backend errors — never by FSM transition errors. An invalid transition (e.g. DELETE on an attached volume) returns 409 Conflict immediately with no retry.

Error recovery — reconcile

error is a terminal state. Recovery is explicit via POST /reconcile:

Real backend state	FSM target
Backend unreachable	stays `error` — returns 503
Volume absent	`pending`
Volume exists, not attached	`available`
Volume exists, attached to node X	`attached`

Startup reconciliation

On startup, volumes stuck in transitional states are reconciled against the real Ceph state. The policy is configurable:

Variable	Default	Description
`RECONCILE_DB_ONLY`	`error`	Volume in DB but absent from Ceph: `error` \| `delete` \| `ignore`
`RECONCILE_CEPH_ONLY`	`ignore`	Volume in Ceph but absent from DB: `ignore` \| `import`

REST endpoints

POST   /api/v1/volumes                    Create a volume
GET    /api/v1/volumes                    List volumes
GET    /api/v1/volumes/{name}             Get a volume
GET    /api/v1/volumes/{name}/events      Stream state changes (SSE)
PUT    /api/v1/volumes/{name}/attach      Attach to a node
PUT    /api/v1/volumes/{name}/detach      Detach
DELETE /api/v1/volumes/{name}             Delete
POST   /api/v1/volumes/{name}/reconcile   Align FSM with real Ceph state
GET    /healthz                           Backend health check
GET    /swagger/index.html                Swagger UI

Server-Sent Events — real-time state streaming

Volume operations are asynchronous (202 Accepted). The /events endpoint streams FSM state transitions in real time — no polling needed:

# Terminal 1 — open stream immediately after create
curl -N http://163.172.144.70:8080/api/v1/volumes/vol-01/events

# Output:
# data: {"name":"vol-01","state":"creating","event":"current","timestamp":"..."}
# data: {"name":"vol-01","state":"available","event":"ready","timestamp":"..."}
# event: done
# data: {"name":"vol-01","state":"available"}

The handler sends the current state immediately on connection, then pushes future transitions. The stream closes automatically on terminal states (available, attached, deleted, error).

Examples

# Create
curl -s -X POST http://localhost:8080/api/v1/volumes \
  -H 'Content-Type: application/json' \
  -d '{"name":"vol-01","size_mb":1024}' | jq

# Stream state changes
curl -N http://localhost:8080/api/v1/volumes/vol-01/events

# List
curl -s http://localhost:8080/api/v1/volumes | jq

# Attach
curl -s -X PUT http://localhost:8080/api/v1/volumes/vol-01/attach \
  -H 'Content-Type: application/json' \
  -d '{"node_id":"node-paris-01"}' | jq

# Detach
curl -s -X PUT http://localhost:8080/api/v1/volumes/vol-01/detach | jq

# Delete
curl -s -X DELETE http://localhost:8080/api/v1/volumes/vol-01

# Reconcile from error state
curl -s -X POST http://localhost:8080/api/v1/volumes/vol-01/reconcile | jq

# Health
curl -s http://localhost:8080/healthz | jq

Quick start

Mock backend (no infrastructure)

make run

Docker Compose (API + PostgreSQL + Loki + Grafana)

docker-compose up

API: http://localhost:8080
Swagger: http://localhost:8080/swagger/index.html
Grafana: http://localhost:3000 — datasource and dashboards provisioned automatically

Ceph backend

STORAGE_BACKEND=ceph \
CEPH_MONITORS=127.0.0.1:6789 \
CEPH_POOL=rbd-demo \
make run-ceph

Backend selection — the CGO nuance

Context	Build flags	Ceph available
`make run`	no `-tags ceph`, `CGO_ENABLED=0`	no — stub returns error
`make run-ceph`	`-tags ceph`, needs local libs	yes
`docker build`	`-tags ceph` inside builder image	yes

The storage/ceph/ package ships a stub.go (//go:build !ceph) that makes the package importable without the C libs, returning an explicit error if Ceph is selected at runtime without the real implementation compiled in.

Environment variables

Variable	Default	Description
`STORAGE_BACKEND`	`mock`	`mock` \| `ceph`
`DATABASE_URL`	(empty)	`postgres://user:pass@host/db?sslmode=disable`
`PORT`	`8080`	HTTP listen port
`ENV`	`development`	`development` \| `production`
`CEPH_MONITORS`	—	Comma-separated monitor addresses
`CEPH_POOL`	`rbd-demo`	Ceph RBD pool name
`CEPH_KEYRING`	`/etc/ceph/ceph.client.admin.keyring`	Keyring path
`OTEL_EXPORTER`	`stdout`	`stdout` \| `jaeger`
`OTEL_JAEGER_ENDPOINT`	`http://localhost:4318/v1/traces`	OTLP HTTP endpoint
`OTEL_SERVICE_NAME`	`block-storage-api`	OTel service name
`VOLUME_RETRY_MAX_ATTEMPTS`	`3`	Max retry attempts before `error` state
`VOLUME_RETRY_INITIAL_WAIT`	`500ms`	Initial backoff delay
`VOLUME_RETRY_MULTIPLIER`	`2.0`	Exponential backoff multiplier
`VOLUME_RETRY_MAX_WAIT`	`10s`	Maximum backoff delay
`RECONCILE_DB_ONLY`	`error`	`error` \| `delete` \| `ignore`
`RECONCILE_CEPH_ONLY`	`ignore`	`ignore` \| `import`

Development

make test         # go test ./... -race
make coverage     # HTML coverage report
make lint         # golangci-lint
make migrate      # apply SQL migrations
make migrate-down # rollback last migration

Test results

ok   config
ok   storage/mock
ok   transport/nvmeof
ok   volume
ok   volume/repository

Deployment

CI/CD pipeline

push → lint → test → build-and-push (main only)
                            ↓
               deploy via workflow_dispatch (manual trigger)

Build is automatic on every push to main. Deploy to production is manual — triggered via GitHub Actions → Run workflow. This prevents accidental deploys and allows grouping multiple commits into a single release.

The deploy step copies grafana/ and docker-compose.yml to the VPS via SCP, then restarts the containers.

Update production manually

ssh root@163.172.144.70
cd /app/block-storage-api
docker compose pull && docker compose up -d
docker compose logs -f

Observability — Grafana + Loki

Grafana datasource and dashboards are provisioned automatically at startup via grafana/provisioning/ — no manual configuration needed after docker compose up.

# HTTP traffic
{service="block-storage-api"} | json | msg="http request"
  | line_format "{{.method}} {{.path}} → {{.status}} ({{.duration_ms}}ms)"

# Errors only
{service="block-storage-api"} | json | level="ERROR"

# HTTP errors 4xx/5xx
{service="block-storage-api"} | json | msg="http request" | status=~"4.*|5.*"

# System / startup logs
{service="block-storage-api"} | json | level="INFO" | msg!="http request"

Context propagation

Context flows through every layer — never dropped or replaced with context.Background():

Layer	Timeout
Backend operations (Ceph)	30s
DB queries	5s
Health check	3s
Graceful shutdown	10s

The HTTP server's BaseContext is set to the application lifecycle context (appCtx). When SIGTERM fires, appCtx is canceled, propagating through all layers to retry goroutines which stop at the next ctx.Done() check. The bgCtx used by retry goroutines is independent of the HTTP request context — it lives for the full duration of the application, not the request.

Name		Name	Last commit message	Last commit date
Latest commit History 52 Commits
.github/workflows		.github/workflows
assertor		assertor
cmd/api		cmd/api
config		config
docs		docs
grafana/provisioning		grafana/provisioning
internal		internal
scripts		scripts
storage		storage
transport/nvmeof		transport/nvmeof
volume		volume
.dockerignore		.dockerignore
.gitignore		.gitignore
.golangci.yml		.golangci.yml
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml
go.mod		go.mod
go.sum		go.sum

Folders and files

Latest commit

History

Repository files navigation

block-storage-api

Live demo

Stack

Infrastructure & Data Flow

Architecture

Feature pattern

NVMe-oF — transport, not a backend

Ceph CAP strategy

Goroutine lifecycle — WaitGroup

Volume FSM lifecycle

States

Retry policy

Error recovery — reconcile

Startup reconciliation

REST endpoints

Server-Sent Events — real-time state streaming

Examples

Quick start

Mock backend (no infrastructure)

Docker Compose (API + PostgreSQL + Loki + Grafana)

Ceph backend

Backend selection — the CGO nuance

Environment variables

Development

Test results

Deployment

CI/CD pipeline

Update production manually

Observability — Grafana + Loki

Context propagation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages