KubeMapReduce is a distributed MapReduce platform for Kubernetes. It provides a CLI for users, a Manager for scheduling and fault tolerance, worker pods for execution, Keycloak-based authentication, PostgreSQL for state, and MinIO for object storage.
- CLI service: user/admin commands (`login`, `jobs submit`, `admin create-user`, etc.)
- Manager API: REST endpoints for job and admin operations
- Manager gRPC server: worker registration, heartbeats, completion/failure reporting
- Workers: dynamic Kubernetes Jobs spawned per task attempt
- Auth: Keycloak + JWT validation
- Storage: PostgreSQL (state) + MinIO (input/shuffle/output objects)
cli-service/ CLI entrypoint and command handlers
manager-service/ API, scheduler/orchestrator, gRPC worker server
worker-service/ Worker runtime/execution pipeline
auth-service/ Keycloak bootstrap and auth helpers
proto/ gRPC/protobuf contract (mapreduce.proto)
k8s/ Kubernetes manifests
migrations/ Numbered PostgreSQL schema migrations
docs/ Architecture, deployment, and operations guides
infra/ Dockerfiles and local infra assets
- Go 1.26+
- Docker + Docker Compose (local infra)
- `kubectl` (Kubernetes deployment / Minikube flow)
- A Kubernetes cluster (required for real worker execution)
This path is useful for API/auth/storage integration checks. Real MapReduce execution still requires Kubernetes worker Jobs.
- Copy environment template:
cp infra/docker/.env.example infra/docker/.env
- Start infra stack:
cd infra/docker
docker compose up -d
- Apply migrations:
Get-Content migrations\0001_initial_schema.sql | docker exec -i mapreduce-postgres psql -U mapreduce -d mapreduce
- Create initial admin user:
go run ./auth-service/cmd/setup --admin-password admin --username platform-admin --email platform-admin@example.com --prompt-password --role ADMIN
- Verify CLI/API path:
go run ./cli-service/cmd/cli login --username platform-admin
go run ./cli-service/cmd/cli health
go run ./cli-service/cmd/cli jobs list

For full local Kubernetes execution, use docs/MINIKUBE_LOCAL_DEV.md.
For full deployment details and production-oriented setup:
Core behavior:
- Manager runs as a StatefulSet
- Workers are created dynamically as `batch/v1` Jobs
- API/UI are stateless services
- Gateway API handles external routing (`api`, `storage`, `auth`)
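
To make the "one `batch/v1` Job per task attempt" idea concrete, here is a minimal sketch that builds such a manifest with the standard library only. The naming scheme, labels, `TASK_ID` variable, image, and the choice of `backoffLimit: 0` are assumptions for illustration; the Manager's actual manifests may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// workerJobManifest builds a batch/v1 Job for a single task attempt.
// Hypothetical: the name format, labels, env variable, and image are
// illustrative and not taken from the project.
func workerJobManifest(jobID, taskID string, attempt int, image string) map[string]any {
	name := fmt.Sprintf("mr-%s-%s-a%d", jobID, taskID, attempt)
	return map[string]any{
		"apiVersion": "batch/v1",
		"kind":       "Job",
		"metadata": map[string]any{
			"name": name,
			"labels": map[string]string{
				"app":    "mapreduce-worker",
				"job-id": jobID,
			},
		},
		"spec": map[string]any{
			// Assumption: each attempt gets its own Job, so Kubernetes
			// is told not to retry and the Manager decides on reattempts.
			"backoffLimit": 0,
			"template": map[string]any{
				"spec": map[string]any{
					"restartPolicy": "Never",
					"containers": []map[string]any{{
						"name":  "worker",
						"image": image,
						"env": []map[string]string{
							{"name": "TASK_ID", "value": taskID},
						},
					}},
				},
			},
		},
	}
}

func main() {
	m := workerJobManifest("job42", "map-0", 1, "example.local/worker:dev")
	out, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Encoding the attempt number in the Job name keeps retries from colliding with earlier attempts of the same task.
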
go build -o bin/cli ./cli-service/cmd/cli
go build -o bin/manager ./manager-service/cmd/manager
go build -o bin/api ./manager-service/cmd/api
go build -o bin/worker ./worker-service/cmd/worker
go build -o bin/auth-setup ./auth-service/cmd/setup

go fmt ./...
go vet ./...
go mod tidy
go test -v -race -coverprofile=coverage.out ./...
govulncheck ./...

Equivalent Make targets:
make build
make fmt
make vet
make test
make lint

kubemapreduce login
kubemapreduce logout
kubemapreduce health
kubemapreduce jobs submit|list|status|download|cancel
kubemapreduce whoami
kubemapreduce admin create-user|delete-user|worker-config|configure-nodes
kubemapreduce token inspect
Environment defaults used by the CLI:
- `API_URL` (default `http://localhost:8081`)
- `KEYCLOAK_BASE_URL` (default `http://localhost:8080`)
- `KEYCLOAK_REALM` (default `mapreduce`)
- `KEYCLOAK_AUDIENCE` (default `mapreduce-api`)
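
As an illustration of how these defaults could be resolved, the sketch below reads each variable and falls back to the documented value when it is unset or empty. The `CLIConfig` type and both function names are hypothetical; only the variable names and defaults come from this README, and the real CLI may load its configuration differently.

```go
package main

import (
	"fmt"
	"os"
)

// CLIConfig groups the four settings. Hypothetical type for illustration.
type CLIConfig struct {
	APIURL           string
	KeycloakBaseURL  string
	KeycloakRealm    string
	KeycloakAudience string
}

// valueOr returns the looked-up value for key, or fallback when the key
// is unset or empty.
func valueOr(lookup func(string) (string, bool), key, fallback string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return fallback
}

// LoadCLIConfig resolves every setting through lookup, applying the
// defaults documented above. Passing the lookup function in keeps the
// logic testable without touching the process environment.
func LoadCLIConfig(lookup func(string) (string, bool)) CLIConfig {
	return CLIConfig{
		APIURL:           valueOr(lookup, "API_URL", "http://localhost:8081"),
		KeycloakBaseURL:  valueOr(lookup, "KEYCLOAK_BASE_URL", "http://localhost:8080"),
		KeycloakRealm:    valueOr(lookup, "KEYCLOAK_REALM", "mapreduce"),
		KeycloakAudience: valueOr(lookup, "KEYCLOAK_AUDIENCE", "mapreduce-api"),
	}
}

func main() {
	cfg := LoadCLIConfig(os.LookupEnv)
	fmt.Printf("%+v\n", cfg)
}
```
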
All protected routes require `Authorization: Bearer <token>`.
| Method | Endpoint | Auth |
|---|---|---|
| GET | /healthz | None |
| GET | /readyz | None |
| POST | /api/v1/jobs | USER or ADMIN |
| GET | /api/v1/jobs | USER or ADMIN |
| GET | /api/v1/jobs/{job_id} | USER or ADMIN |
| DELETE | /api/v1/jobs/{job_id} | USER or ADMIN |
| POST | /api/v1/uploads/presigned | USER or ADMIN |
| POST | /api/v1/downloads/presigned | USER or ADMIN |
| POST | /api/v1/admin/users | ADMIN |
| DELETE | /api/v1/admin/users/{username} | ADMIN |
| POST | /api/v1/admin/config/workers | ADMIN |
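
For callers that talk to the API directly rather than through the CLI, a protected request could be assembled roughly as follows. This is a sketch using only `net/http`; the helper name `newAuthedRequest` is hypothetical, and the base URL is the CLI's documented `API_URL` default. The example only builds the request and does not send it.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newAuthedRequest builds a body-less request against the Manager API and
// attaches the bearer token that protected routes expect.
// Hypothetical helper, not part of the project's code.
func newAuthedRequest(method, baseURL, path, token string) (*http.Request, error) {
	// Trim a trailing slash so the base URL and path join cleanly.
	url := strings.TrimRight(baseURL, "/") + path
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	// The token value is a placeholder; a real one is issued by Keycloak.
	req, err := newAuthedRequest("GET", "http://localhost:8081", "/api/v1/jobs", "<token>")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```
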
Defined in proto/mapreduce.proto:
- `Register`
- `Heartbeat`
- `TaskComplete`
- `TaskFailed`
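
The order in which a worker might use these four calls can be sketched without the generated stubs. Everything below is an assumption for illustration: the `ManagerClient` interface, its parameter lists, and the single heartbeat before the work starts are simplified stand-ins, and the real message types are the ones in `proto/mapreduce.proto`.

```go
package main

import (
	"errors"
	"fmt"
)

// ManagerClient is a hypothetical, simplified view of the worker-facing
// RPCs. The real signatures come from the generated gRPC stubs.
type ManagerClient interface {
	Register(workerID string) error
	Heartbeat(workerID string) error
	TaskComplete(workerID, taskID string) error
	TaskFailed(workerID, taskID, reason string) error
}

// recordingClient is an in-memory fake that records which RPCs were called.
type recordingClient struct{ calls []string }

func (r *recordingClient) Register(string) error  { r.calls = append(r.calls, "Register"); return nil }
func (r *recordingClient) Heartbeat(string) error { r.calls = append(r.calls, "Heartbeat"); return nil }
func (r *recordingClient) TaskComplete(string, string) error {
	r.calls = append(r.calls, "TaskComplete")
	return nil
}
func (r *recordingClient) TaskFailed(string, string, string) error {
	r.calls = append(r.calls, "TaskFailed")
	return nil
}

// errDemo stands in for a task failure in the example below.
var errDemo = errors.New("demo failure")

// runTask registers the worker, sends one heartbeat, runs the work, and
// reports the outcome. A real worker would heartbeat periodically while
// the task runs; a single call keeps the sketch short.
func runTask(c ManagerClient, workerID, taskID string, work func() error) error {
	if err := c.Register(workerID); err != nil {
		return fmt.Errorf("register: %w", err)
	}
	if err := c.Heartbeat(workerID); err != nil {
		return fmt.Errorf("heartbeat: %w", err)
	}
	if err := work(); err != nil {
		return c.TaskFailed(workerID, taskID, err.Error())
	}
	return c.TaskComplete(workerID, taskID)
}

func main() {
	rec := &recordingClient{}
	_ = runTask(rec, "worker-1", "map-0", func() error { return nil })
	_ = runTask(rec, "worker-2", "map-1", func() error { return errDemo })
	fmt.Println(rec.calls)
}
```
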
Regenerate stubs after editing proto:
protoc --go_out=. --go-grpc_out=. proto/mapreduce.proto

- Architecture: docs/ARCHITECTURE.md
- Data locality architecture: docs/ARCHITECTURE_DATA_LOCALITY.md
- External access architecture: docs/ARCHITECTURE_EXTERNAL_ACCESS.md
- Cluster deployment checklist: docs/CLUSTER_DEPLOYMENT.md
- Minikube local flow: docs/MINIKUBE_LOCAL_DEV.md
- Linkerd setup: docs/LINKERD_SETUP.md
- Timeout tuning: docs/TIMEOUT_CONFIGURATION.md
- Monitoring/results: docs/MONITORING_AND_RESULTS.md
- Migrations guide: migrations/README.md