CodeMaestro1/KubeMapReduce

KubeMapReduce

KubeMapReduce is a distributed MapReduce platform for Kubernetes. It provides a CLI for users, a Manager for scheduling and fault tolerance, worker pods for execution, Keycloak-based authentication, PostgreSQL as DDS state, and MinIO for object storage.

Architecture at a glance

  • CLI service: user/admin commands (login, jobs submit, admin create-user, etc.)
  • Manager API: REST endpoints for job and admin operations
  • Manager gRPC server: worker registration, heartbeats, completion/failure reporting
  • Workers: dynamic Kubernetes Jobs spawned per task attempt
  • Auth: Keycloak + JWT validation
  • Storage: PostgreSQL (state) + MinIO (input/shuffle/output objects)

Repository layout

cli-service/         CLI entrypoint and command handlers
manager-service/     API, scheduler/orchestrator, gRPC worker server
worker-service/      Worker runtime/execution pipeline
auth-service/        Keycloak bootstrap and auth helpers
proto/               gRPC/protobuf contract (mapreduce.proto)
k8s/                 Kubernetes manifests
migrations/          Numbered PostgreSQL schema migrations
docs/                Architecture, deployment, and operations guides
infra/               Dockerfiles and local infra assets

Prerequisites

  • Go 1.26+
  • Docker + Docker Compose (local infra)
  • kubectl (Kubernetes deployment / Minikube flow)
  • A Kubernetes cluster (required for real worker execution)

Local development (infra smoke path)

This path is useful for API/auth/storage integration checks. Real MapReduce execution still requires Kubernetes worker Jobs.

  1. Copy the environment template:
     cp infra/docker/.env.example infra/docker/.env
  2. Start the infra stack:
     cd infra/docker
     docker compose up -d
  3. Apply migrations from the repository root (the command below uses PowerShell's Get-Content; in a POSIX shell, redirect the file into docker exec with < instead):
     Get-Content migrations\0001_initial_schema.sql | docker exec -i mapreduce-postgres psql -U mapreduce -d mapreduce
  4. Create the initial admin user:
     go run ./auth-service/cmd/setup --admin-password admin --username platform-admin --email platform-admin@example.com --prompt-password --role ADMIN
  5. Verify the CLI/API path:
     go run ./cli-service/cmd/cli login --username platform-admin
     go run ./cli-service/cmd/cli health
     go run ./cli-service/cmd/cli jobs list
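
Step 3 applies a single file, but the migrations/ directory is described as numbered. As a minimal sketch of how numbered files could be ordered before being applied one by one, assuming every file name starts with a numeric prefix followed by an underscore (only 0001_initial_schema.sql is named in this README; the other file names and both helper functions are hypothetical):

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strconv"
	"strings"
)

// migrationNumber extracts the numeric prefix from names such as
// "0001_initial_schema.sql". The naming rule is an assumption.
func migrationNumber(name string) (int, error) {
	prefix, _, ok := strings.Cut(name, "_")
	if !ok {
		return 0, fmt.Errorf("migration %q has no numeric prefix", name)
	}
	n, err := strconv.Atoi(prefix)
	if err != nil {
		return 0, fmt.Errorf("migration %q: %w", name, err)
	}
	return n, nil
}

// orderMigrations returns a copy of names sorted by numeric prefix and
// rejects two files that claim the same number.
func orderMigrations(names []string) ([]string, error) {
	nums := make(map[string]int, len(names))
	seen := make(map[int]string, len(names))
	for _, name := range names {
		n, err := migrationNumber(name)
		if err != nil {
			return nil, err
		}
		if prev, dup := seen[n]; dup {
			return nil, fmt.Errorf("migrations %q and %q share number %d", prev, name, n)
		}
		seen[n] = name
		nums[name] = n
	}
	ordered := append([]string(nil), names...)
	sort.Slice(ordered, func(i, j int) bool { return nums[ordered[i]] < nums[ordered[j]] })
	return ordered, nil
}

func main() {
	// Only the first name comes from this README; the rest are made up.
	ordered, err := orderMigrations([]string{
		"0010_example_c.sql", "0001_initial_schema.sql", "0002_example_b.sql",
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, name := range ordered {
		fmt.Println(name)
	}
}
```

Parsing the prefix instead of sorting the names as strings keeps the order correct even if the zero padding is ever inconsistent.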

For full local Kubernetes execution, use docs/MINIKUBE_LOCAL_DEV.md.

Kubernetes deployment

For full deployment details and production-oriented setup, see the deployment guides under docs/.

Core behavior:

  • Manager runs as a StatefulSet
  • Workers are created dynamically as batch/v1 Jobs
  • API/UI are stateless services
  • Gateway API handles external routing (api, storage, auth)
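
To make "workers are created dynamically as batch/v1 Jobs" concrete, here is a rough sketch of what a per-attempt Job manifest could look like when built in Go. It uses only the standard library and emits JSON, which kubectl accepts. The naming scheme, the container image, the MANAGER_ADDR variable, and the choice of backoffLimit 0 (leaving retries to the Manager) are all assumptions for illustration, not the repository's actual values:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// workerJobName derives a unique name per task attempt (hypothetical scheme).
// Kubernetes object names must be lowercase DNS labels of at most 63 characters.
func workerJobName(jobID, phase string, task, attempt int) string {
	return fmt.Sprintf("kmr-%s-%s-%d-a%d", jobID, phase, task, attempt)
}

// workerJobManifest builds a minimal batch/v1 Job as a generic map.
func workerJobManifest(name, image, managerAddr string) map[string]any {
	return map[string]any{
		"apiVersion": "batch/v1",
		"kind":       "Job",
		"metadata":   map[string]any{"name": name},
		"spec": map[string]any{
			// Assumption: one pod per attempt, so Kubernetes itself never retries.
			"backoffLimit": 0,
			"template": map[string]any{
				"spec": map[string]any{
					// Job pod templates must use Never or OnFailure.
					"restartPolicy": "Never",
					"containers": []any{
						map[string]any{
							"name":  "worker",
							"image": image,
							"env": []any{
								// Hypothetical variable name for the gRPC endpoint.
								map[string]any{"name": "MANAGER_ADDR", "value": managerAddr},
							},
						},
					},
				},
			},
		},
	}
}

func main() {
	name := workerJobName("job42", "map", 3, 1)
	manifest := workerJobManifest(name, "example.local/kubemapreduce-worker:dev", "manager:50051")
	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	if err := enc.Encode(manifest); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The printed JSON can be piped to kubectl apply -f - for experimentation; a real orchestrator would more likely use the typed Kubernetes client.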

Build and quality commands

go build -o bin/cli ./cli-service/cmd/cli
go build -o bin/manager ./manager-service/cmd/manager
go build -o bin/api ./manager-service/cmd/api
go build -o bin/worker ./worker-service/cmd/worker
go build -o bin/auth-setup ./auth-service/cmd/setup
go fmt ./...
go vet ./...
go mod tidy
go test -v -race -coverprofile=coverage.out ./...
govulncheck ./...

Equivalent Make targets:

make build
make fmt
make vet
make test
make lint

CLI commands

kubemapreduce login
kubemapreduce logout
kubemapreduce health
kubemapreduce jobs submit|list|status|download|cancel
kubemapreduce whoami
kubemapreduce admin create-user|delete-user|worker-config|configure-nodes
kubemapreduce token inspect
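
As a hedged illustration of what a command like token inspect might do, the sketch below decodes the claims of a JWT using only the standard library. It does not verify the signature, so it is suitable for inspection only and never for authorization. The function name is hypothetical, and the actual CLI may behave differently:

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"strings"
)

// decodeClaims returns the payload claims of a JWT WITHOUT verifying its
// signature. Use it for display only.
func decodeClaims(token string) (map[string]any, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return nil, errors.New("token is not in header.payload.signature form")
	}
	// JWT segments are base64url encoded without padding.
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return nil, fmt.Errorf("decode payload: %w", err)
	}
	var claims map[string]any
	if err := json.Unmarshal(payload, &claims); err != nil {
		return nil, fmt.Errorf("parse payload: %w", err)
	}
	return claims, nil
}

func main() {
	// Build an unsigned demo token so the example needs no Keycloak instance.
	enc := base64.RawURLEncoding.EncodeToString
	token := enc([]byte(`{"alg":"none"}`)) + "." +
		enc([]byte(`{"aud":"mapreduce-api","preferred_username":"platform-admin"}`)) + ".sig"

	claims, err := decodeClaims(token)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(claims["preferred_username"], claims["aud"]) // platform-admin mapreduce-api
}
```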

Environment defaults used by the CLI:

  • API_URL (default http://localhost:8081)
  • KEYCLOAK_BASE_URL (default http://localhost:8080)
  • KEYCLOAK_REALM (default mapreduce)
  • KEYCLOAK_AUDIENCE (default mapreduce-api)
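
The defaults above follow the usual "environment variable with fallback" pattern. A minimal sketch of that pattern is shown here; the struct and function names are illustrative and not taken from the repository, and treating an empty value as unset is an assumption:

```go
package main

import (
	"fmt"
	"os"
)

// cliConfig mirrors the four variables listed above (hypothetical type).
type cliConfig struct {
	APIURL           string
	KeycloakBaseURL  string
	KeycloakRealm    string
	KeycloakAudience string
}

// envOr returns the value of key, or def when the variable is unset or empty.
func envOr(key, def string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return def
}

// loadCLIConfig applies the documented defaults.
func loadCLIConfig() cliConfig {
	return cliConfig{
		APIURL:           envOr("API_URL", "http://localhost:8081"),
		KeycloakBaseURL:  envOr("KEYCLOAK_BASE_URL", "http://localhost:8080"),
		KeycloakRealm:    envOr("KEYCLOAK_REALM", "mapreduce"),
		KeycloakAudience: envOr("KEYCLOAK_AUDIENCE", "mapreduce-api"),
	}
}

func main() {
	fmt.Printf("%+v\n", loadCLIConfig())
}
```

For example, setting API_URL before invoking the CLI would point it at a different manager API endpoint while the Keycloak settings keep their defaults.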

Public REST API (manager-service API)

All protected routes require Authorization: Bearer <token>.

Method  Endpoint                          Auth
GET     /healthz                          None
GET     /readyz                           None
POST    /api/v1/jobs                      USER or ADMIN
GET     /api/v1/jobs                      USER or ADMIN
GET     /api/v1/jobs/{job_id}             USER or ADMIN
DELETE  /api/v1/jobs/{job_id}             USER or ADMIN
POST    /api/v1/uploads/presigned         USER or ADMIN
POST    /api/v1/downloads/presigned       USER or ADMIN
POST    /api/v1/admin/users               ADMIN
DELETE  /api/v1/admin/users/{username}    ADMIN
POST    /api/v1/admin/config/workers      ADMIN
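
Calling a protected route comes down to attaching the Authorization header. The sketch below builds such a request with net/http; the helper name is hypothetical, the base URL is the CLI default listed earlier, and the token is a placeholder. Request and response bodies are not documented here, so the example stops at constructing the request:

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

// newAPIRequest builds a request for a manager API route. A non-empty token
// is sent as "Authorization: Bearer <token>"; public routes such as /healthz
// can pass an empty token.
func newAPIRequest(method, baseURL, path, token string) (*http.Request, error) {
	url := strings.TrimRight(baseURL, "/") + path
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	req, err := newAPIRequest(http.MethodGet, "http://localhost:8081", "/api/v1/jobs", "example-token")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it against a running stack: resp, err := http.DefaultClient.Do(req)
}
```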

gRPC contract (manager <-> worker)

Defined in proto/mapreduce.proto:

  • Register
  • Heartbeat
  • TaskComplete
  • TaskFailed
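
The four RPCs suggest that the Manager tracks worker liveness between Register and a final TaskComplete or TaskFailed. One way this could look on the Manager side is sketched below as an in-memory registry with a heartbeat timeout. The type, its methods, and the 30-second timeout are assumptions for illustration; the message types in proto/mapreduce.proto are not shown in this README and are not modeled here:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// workerRegistry tracks the last time each worker was heard from (hypothetical).
type workerRegistry struct {
	mu       sync.Mutex
	timeout  time.Duration
	lastSeen map[string]time.Time
}

func newWorkerRegistry(timeout time.Duration) *workerRegistry {
	return &workerRegistry{timeout: timeout, lastSeen: make(map[string]time.Time)}
}

// Register records a new worker, as a Register RPC handler might.
func (r *workerRegistry) Register(id string, now time.Time) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.lastSeen[id] = now
}

// Heartbeat refreshes a known worker and reports false for unknown IDs.
func (r *workerRegistry) Heartbeat(id string, now time.Time) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.lastSeen[id]; !ok {
		return false
	}
	r.lastSeen[id] = now
	return true
}

// Finish forgets a worker after TaskComplete or TaskFailed.
func (r *workerRegistry) Finish(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.lastSeen, id)
}

// LastSeen returns the last recorded time for a worker, if it is known.
func (r *workerRegistry) LastSeen(id string) (time.Time, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	t, ok := r.lastSeen[id]
	return t, ok
}

// Expired returns the sorted IDs of workers silent for longer than the timeout.
func (r *workerRegistry) Expired(now time.Time) []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	var ids []string
	for id, seen := range r.lastSeen {
		if now.Sub(seen) > r.timeout {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	reg := newWorkerRegistry(30 * time.Second)
	start := time.Now()
	reg.Register("worker-a", start)
	reg.Register("worker-b", start)
	reg.Heartbeat("worker-a", start.Add(20*time.Second))
	fmt.Println(reg.Expired(start.Add(45 * time.Second))) // [worker-b]
}
```

A scheduler could poll Expired periodically and reschedule the tasks owned by the returned workers as new attempts.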

Regenerate stubs after editing proto:

protoc --go_out=. --go-grpc_out=. proto/mapreduce.proto

Documentation
