feat: Add Apache Spark integration to OpenServerless #188

mobs75 · 2025-11-22T16:03:57Z

Summary

This PR adds Apache Spark integration to OpenServerless, so users can deploy and manage Spark clusters alongside their serverless workloads for big data processing.

Architecture

The integration is implemented in the operator submodule and follows OpenServerless patterns:

Spark deployment managed by Kubernetes operator
Seamless integration with existing OpenServerless components (MinIO, PostgreSQL, MongoDB, Redis)
Declarative configuration through Whisk CRD

Key Features

Spark Components

✅ Spark Master: Standalone cluster manager with configurable resources
✅ Spark History Server: Web UI for completed applications with S3-compatible storage
✅ Spark Workers: Foundation in place; dynamic scaling is listed under Future Enhancements

Technical Implementation

Resource Management: Proper memory format handling (Kubernetes 1Gi ↔ JVM 1g)
Service Discovery: Automatic DNS configuration for inter-component communication
Storage Integration: MinIO S3-compatible storage for Spark event logs
RBAC: Least-privilege security with proper ServiceAccount and Role bindings
Health Checks: Comprehensive readiness/liveness probes
Lifecycle Management: Owner references for automatic cleanup
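
The memory format handling mentioned under Resource Management can be illustrated with a minimal sketch. It is not the code in `nuvolaris/spark.py`; the function name `k8s_memory_to_jvm` is hypothetical, and the sketch assumes only whole-number quantities with binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`) need to be supported.

```python
import re

# Binary suffixes used by Kubernetes quantities, mapped to the
# single-letter size suffixes accepted in Spark/JVM memory strings.
_K8S_TO_JVM = {"Ki": "k", "Mi": "m", "Gi": "g", "Ti": "t"}


def k8s_memory_to_jvm(quantity: str) -> str:
    """Convert a quantity such as '1Gi' to a JVM-style string such as '1g'.

    Hypothetical helper: decimal suffixes (for example 'G') are rejected
    because they denote powers of 10 in Kubernetes and would not map
    cleanly onto the binary units used here.
    """
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    value, suffix = match.groups()
    return f"{value}{_K8S_TO_JVM[suffix]}"
```

With this helper, the `memory: 1Gi` value from the Whisk spec would become `1g` before being passed to Spark. Rejecting unknown formats with an error, rather than guessing, keeps a misconfigured spec from silently starting a master with the wrong heap size.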

Changes

Operator Submodule (commit `afc74b4`)

New Module: nuvolaris/spark.py - Complete Spark operator implementation
Templates: Kubernetes manifests for RBAC, ConfigMaps, Services, StatefulSets
Integration: Hooks into main operator workflow (patcher.py, main.py)
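
As an illustration of how rendered manifests could be tied to the Whisk resource for automatic cleanup, the sketch below attaches a Kubernetes owner reference to a manifest dictionary. The helper name `with_owner_reference` is hypothetical, and the operator may achieve the same result differently; only the `ownerReferences` field layout comes from the Kubernetes API.

```python
import copy


def with_owner_reference(manifest: dict, owner: dict) -> dict:
    """Return a copy of `manifest` owned by `owner` (hypothetical helper).

    `owner` is expected to be the live owner object as returned by the
    API server, so that `metadata.uid` is populated.
    """
    patched = copy.deepcopy(manifest)  # leave the rendered template untouched
    reference = {
        "apiVersion": owner["apiVersion"],
        "kind": owner["kind"],
        "name": owner["metadata"]["name"],
        "uid": owner["metadata"]["uid"],
        "controller": True,
        "blockOwnerDeletion": True,
    }
    metadata = patched.setdefault("metadata", {})
    metadata.setdefault("ownerReferences", []).append(reference)
    return patched
```

When the owner object is deleted, the Kubernetes garbage collector removes dependents that carry such a reference, which is the behavior the Lifecycle Management item describes.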

Configuration

apiVersion: nuvolaris.org/v1
kind: Whisk
metadata:
  name: controller
spec:
  components:
    spark: true
  spark:
    enabled: true
    mode: standalone
    image: apache/spark:3.5.0
    master:
      memory: 1Gi
      cpu: 1000m
    history:
      enabled: true
      backend: s3a
      s3a:
        bucket: spark-history
        endpoint: http://minio.nuvolaris.svc.cluster.local:9000
        secretRef: nuvolaris-minio
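
To show how the `history` section above might translate into Spark settings, here is a minimal sketch. The function `history_spark_conf` is hypothetical and is not taken from the operator; the property names are standard Spark and Hadoop S3A configuration keys. Credentials referenced by `secretRef` are deliberately left out and are assumed to be injected separately, for example as environment variables.

```python
def history_spark_conf(spark_spec: dict) -> dict:
    """Derive event-log settings from the `spark` section of a Whisk spec.

    Hypothetical helper: returns an empty dict when the History Server
    is disabled or a backend other than s3a is selected.
    """
    history = spark_spec.get("history", {})
    if not history.get("enabled") or history.get("backend") != "s3a":
        return {}

    s3a = history["s3a"]
    log_dir = f"s3a://{s3a['bucket']}/"
    endpoint = s3a["endpoint"]
    use_ssl = endpoint.startswith("https://")

    return {
        # Applications write event logs here ...
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": log_dir,
        # ... and the History Server reads them from the same location.
        "spark.history.fs.logDirectory": log_dir,
        # MinIO is addressed by endpoint with path-style bucket access.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
    }
```

For the configuration shown above, this would yield `s3a://spark-history/` as both the event log directory and the History Server log directory, with SSL disabled because the MinIO endpoint uses `http://`.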

Testing

Tested on a MicroK8s cluster:

✅ Spark Master deployment and healthy startup
✅ History Server with MinIO integration
✅ Resource limits properly applied
✅ Service endpoints accessible (spark://spark-master:7077)
✅ Web UI available on port 8080

Verification

kubectl -n nuvolaris get pods -l app=spark
NAME                            READY   STATUS    RESTARTS   AGE
spark-history-7b7d97c7d         1/1     Running   0          10m
spark-master-0                  1/1     Running   0          10m
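
The manual check above could be automated along these lines. This is a sketch only: the function `pods_not_ready` is hypothetical, and it assumes the default table output of `kubectl get pods` (a header row followed by NAME, READY and STATUS columns).

```python
def pods_not_ready(kubectl_output: str) -> list[str]:
    """Return names of pods that are not Running with all containers ready.

    Hypothetical helper that parses the default `kubectl get pods` table.
    """
    rows = [line for line in kubectl_output.splitlines() if line.strip()]
    not_ready = []
    for row in rows[1:]:  # skip the header row
        name, ready, status = row.split()[:3]
        ready_count, total = ready.split("/")
        if status != "Running" or ready_count != total:
            not_ready.append(name)
    return not_ready
```

An empty list means every listed pod is healthy; a test task could fail when the list is non-empty. For anything beyond a quick check, requesting structured output with `kubectl get pods -o json` would be more robust than parsing the table.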

Use Cases

Data Processing: Run Spark jobs within OpenServerless environment
ETL Pipelines: Process large datasets stored in MinIO
Machine Learning: Train models using Spark MLlib
Analytics: Query and analyze data alongside serverless functions

Future Enhancements

Dynamic Spark Worker scaling
Spark application submission via operator API
Metrics integration with Prometheus
Support for Spark on Kubernetes mode
Jupyter notebook integration

Documentation

User documentation and examples to be added in follow-up PRs.

Related Issues: Closes #[issue-number]

Operator Submodule PR: mobs75/openserverless-operator#[pr-number]

mobs75 added 7 commits November 20, 2025 18:25

Update testing submodule to include Spark CRD support

c0a071a

Update operator submodule to feature/add-spark-operator for testing

2e19a53

chore: update task submodule to include Spark support in Whisk spec

8239206

feat: Update task submodule with Spark operator Taskfiles

6e6198c

- Add Spark operator build and test tasks in task project
- TaskfileBuild.yml for GHCR image building
- TaskfileTest.yml for SparkJob testing
- Sync with openserverless-task fork feature/enable-spark-in-whisk

fix: update operator submodule to point to mobs75 fork with feature/add-spark-operator branch

36d9140

Update operator submodule with Spark test configuration

e15fe70

feat: integrate Apache Spark operator

b30d168

- Update operator submodule to include Spark integration (commit afc74b4)
- Add comprehensive Spark deployment support
- Enable Spark Master, History Server, and Worker management
- Integrate with MinIO for Spark event logs storage

mobs75 mentioned this pull request Nov 22, 2025

test: Add Spark component to Whisk CRD schema apache/openserverless-testing#3

Open

task: integrate Spark build and test automation

dacb864

- Add Spark operator build tasks
- Add Spark testing workflows
- Update Whisk CRD with Spark configuration

mobs75 mentioned this pull request Nov 22, 2025

feat: Add Spark operator build and test automation apache/openserverless-task#175

Open

mobs75 added 3 commits November 24, 2025 19:10

Remove obsolete submodules olaris-op and task

9142289

Update operator submodule reference

c390c73

Update olaris submodule to feature/enable-spark-in-whisk branch

1a93a3f
