feat: Add Apache Spark operator implementation #77

mobs75 · 2025-11-22T16:10:45Z

Summary

Complete Apache Spark operator implementation for OpenServerless.

Changes

New Files

nuvolaris/spark.py: Core Spark operator module (create/delete/patch)
nuvolaris/templates/spark-*: Kubernetes manifests (Master, History Server, RBAC)
deploy/spark/: Deployment resources

Modified Files

nuvolaris/patcher.py: Added Spark patching logic
nuvolaris/main.py: Integrated Spark into operator workflow

Key Features

✅ Spark Master StatefulSet with configurable resources
✅ Spark History Server with MinIO S3 integration
✅ Proper memory format conversion (K8s 1Gi ↔ JVM 1g)
✅ RBAC with least-privilege access
✅ Health checks and owner references
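
The memory conversion listed above can be sketched as follows. This is a minimal illustration, not the code in nuvolaris/spark.py: the helper names `k8s_to_jvm_memory` and `jvm_to_k8s_memory` are hypothetical, and the sketch assumes only whole-number binary quantities (Ki, Mi, Gi, Ti) need to be handled.

```python
import re

# Hypothetical helpers for illustration; not taken from nuvolaris/spark.py.
# Kubernetes binary suffixes and the JVM/Spark suffixes both denote powers of 1024.
_K8S_TO_JVM = {"Ki": "k", "Mi": "m", "Gi": "g", "Ti": "t"}
_JVM_TO_K8S = {jvm: k8s for k8s, jvm in _K8S_TO_JVM.items()}


def k8s_to_jvm_memory(quantity: str) -> str:
    """Convert a Kubernetes quantity such as '1Gi' to a JVM-style size such as '1g'."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)", quantity.strip())
    if match is None:
        # Decimal suffixes (K, M, G) are 1000-based and have no exact JVM equivalent.
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    amount, suffix = match.groups()
    return f"{amount}{_K8S_TO_JVM[suffix]}"


def jvm_to_k8s_memory(size: str) -> str:
    """Convert a JVM-style size such as '512m' to a Kubernetes quantity such as '512Mi'."""
    match = re.fullmatch(r"(\d+)([kmgt])", size.strip().lower())
    if match is None:
        raise ValueError(f"unsupported JVM memory size: {size!r}")
    amount, suffix = match.groups()
    return f"{amount}{_JVM_TO_K8S[suffix]}"
```

Rejecting decimal suffixes instead of rounding them keeps the pod's resource limit and the JVM heap setting describing the same number of bytes.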

Testing

Verified on MicroK8s:

kubectl get pods -l app=spark
NAME                     READY   STATUS    RESTARTS   AGE
spark-history-xxx        1/1     Running   0          10m
spark-master-0           1/1     Running   0          10m

Part of main integration PR: apache/openserverless#TBD

- Add spark.py operator module with create/delete/patch functions
- Add Kubernetes manifests (StatefulSet, Deployment, Services, RBAC)
- Integrate Spark operator in main.py (create/delete handlers)
- Extend Whisk CRD with Spark configuration schema
- Add test configuration example (whisk-spark.yaml)

Features:
- Spark Master with optional HA support
- Scalable Spark Workers (1 to N replicas)
- History Server with persistent storage
- Complete RBAC configuration
- Health checks and resource management
- Event logging support

Implements all OpenServerless operator patterns:
- Kopf framework handlers
- Owner references for garbage collection
- Kustomize templating
- Status tracking in CRD
- Error handling and logging
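
The owner-reference pattern mentioned above can be illustrated with plain dictionaries. This is a sketch of the shape Kubernetes expects in `metadata.ownerReferences`, not the operator's actual code: the function names `owner_reference` and `adopt` and the sample object names and uid are hypothetical.

```python
import copy


def owner_reference(owner: dict) -> dict:
    """Build an ownerReferences entry pointing at the given parent object."""
    meta = owner["metadata"]
    return {
        "apiVersion": owner["apiVersion"],
        "kind": owner["kind"],
        "name": meta["name"],
        "uid": meta["uid"],
        # Mark the parent as the managing controller of the child.
        "controller": True,
        "blockOwnerDeletion": True,
    }


def adopt(child: dict, owner: dict) -> dict:
    """Return a copy of `child` with `owner` appended to its ownerReferences."""
    adopted = copy.deepcopy(child)
    refs = adopted.setdefault("metadata", {}).setdefault("ownerReferences", [])
    refs.append(owner_reference(owner))
    return adopted


# Hypothetical sample objects: names and uid are placeholders.
whisk = {
    "apiVersion": "nuvolaris.org/v1",
    "kind": "Whisk",
    "metadata": {"name": "controller", "uid": "00000000-0000-0000-0000-000000000000"},
}
master = {"apiVersion": "apps/v1", "kind": "StatefulSet", "metadata": {"name": "spark-master"}}
owned_master = adopt(master, whisk)
```

With the reference in place, deleting the parent resource lets the Kubernetes garbage collector remove the Spark objects without an explicit delete call for each one.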

- Implement kopf-based SparkJob operator handlers (create/delete)
- Add SparkJob CRD definition for nuvolaris.org/v1
- Create Kubernetes Job template for Spark driver execution
- Configure GHCR authentication for mobs75/openserverless-operator:spark-dev
- Add operator pod configuration with proper resource allocation
- Update build tasks for Spark operator deployment

- Add Spark cluster deployment templates (master, worker, history)
- Update Spark StatefulSets and Deployment configurations
- Add Spark ConfigMap and RBAC templates
- Update kaniko build and operator deployment configs
- Update README with Spark operator documentation

- Remove TaskfileBuild.yml, TaskfileDev.yml, TaskfileOlaris.yml, TaskfileTest.yml
- These belong to the task project, not operator project
- Keep main Taskfile.yml for operator-specific tasks

- Add Spark cluster deployment with master + 2 workers + history server
- Implement SparkJob CRD support (sparkjobs.nuvolaris.org)
- Configure NodePort services for Web UI external access
- Optimize memory settings for single-node MicroK8s deployment
- Add Spark component tracking in main.py and patcher.py
- Include test configuration whisk-spark-test.yaml
- Update operator image to ghcr.io/mobs75/openserverless-operator:dev-spark-v6

Tested features:
✅ Complete Spark cluster deployment and operation
✅ SparkJob CRD creation and management
✅ Web UIs accessible via NodePort (Master 30808, Workers 31081/31082, History 31808)
✅ Job submission and driver execution
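
The NodePort exposure of the Master Web UI could look roughly like the manifest built below. This is a sketch under assumptions: the service name `spark-master-ui` and the `component` label are hypothetical, and the target port 8080 is Spark's default Master UI port, which the PR's templates may override. Only the node port 30808 and the `app=spark` label come from this PR.

```python
def nodeport_service(name: str, selector: dict, target_port: int, node_port: int) -> dict:
    """Build a NodePort Service manifest exposing one web UI port."""
    # 30000-32767 is the default NodePort range of a Kubernetes cluster.
    if not 30000 <= node_port <= 32767:
        raise ValueError(f"nodePort {node_port} is outside the default range 30000-32767")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": dict(selector),
            "ports": [
                {
                    "name": "web-ui",
                    "port": target_port,
                    "targetPort": target_port,
                    "nodePort": node_port,
                }
            ],
        },
    }


# Hypothetical name and "component" label; 30808 is the Master node port listed above.
master_ui = nodeport_service(
    "spark-master-ui",
    {"app": "spark", "component": "master"},
    target_port=8080,
    node_port=30808,
)
```

The same helper would cover the Worker and History Server UIs by passing their own selectors and the node ports listed above.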

- Modify Spark worker
- Fix spark.py handler
- Update operator-deploy.yaml for Spark

Branch: feature/add-spark-operator

mobs75 added 9 commits November 10, 2025 17:12

fix: remove whisk-system mkdir from sources stage (Kaniko permission)

545573f

refactor: Move specialized Taskfiles to task project

cd05158

- Remove TaskfileBuild.yml, TaskfileDev.yml, TaskfileOlaris.yml, TaskfileTest.yml
- These belong to the task project, not operator project
- Keep main Taskfile.yml for operator-specific tasks

Fix whisk-crd.yaml duplicate type:integer in milvus section

8430d5b

fix: Spark operator and deploy updates

b3801bc

- Modify Spark worker
- Fix spark.py handler
- Update operator-deploy.yaml for Spark

Branch: feature/add-spark-operator

Enable Spark in whisk-full test configuration

a2f3dae

mobs75 mentioned this pull request Nov 22, 2025

test: Add Spark component to Whisk CRD schema apache/openserverless-testing#3

Open

mobs75 force-pushed the feature/add-spark-operator branch from afc74b4 to a2f3dae on November 24, 2025 18:36
