Performance and reliability testing for the OpenCloudHub platform using k6.
Explore OpenCloudHub »
This repository contains an example k6-based performance testing suite for the OpenCloudHub ML platform. It provides comprehensive testing capabilities across all platform services, from quick smoke tests that validate service health to extended soak tests that uncover memory leaks and stability issues.
Performance testing is critical for ML platforms where inference latency directly impacts user experience:
- Validate Correctness: Ensure services respond correctly under various load conditions
- Catch Regressions: Identify performance degradation before reaching production
- Capacity Planning: Understand system limits for infrastructure sizing
- SLA Compliance: Ensure ML model inference latency meets defined thresholds
| Capability | Description |
|---|---|
| Multi-Type Testing | Smoke, load, stress, spike, soak, and breakpoint tests |
| Service Coverage | ML models, MLOps tools, infrastructure, and observability |
| Kubernetes Native | Run tests inside the cluster using k6-operator |
| Grafana Integration | Results tagged for dashboard filtering and analysis |
This work demonstrates how k6 can be integrated for continuous performance validation, with results feeding into Grafana dashboards for trend analysis across deployments.
| Repository | Purpose |
|---|---|
| gitops | ArgoCD application definitions and Kubernetes manifests for the tests |
| api-testing (this repo) | Performance testing suite |
- Multiple Test Types – Smoke, load, stress, spike, soak, and breakpoint tests with predefined profiles
- Service Categories – Organized tests for ML models, MLOps, infrastructure, and observability services
- Reusable Helpers – Common HTTP utilities, data loading, and check functions
- Configurable Thresholds – Per-test-type thresholds tuned for local Kind/Minikube clusters
- Automatic Reporting – JSON output with detailed metrics per test run
- Grafana Tagging – All requests tagged for dashboard filtering (testid, test_type, test_target); see the sketch after this list
- DevContainer Ready – Works out of the box in VS Code DevContainers
- Kubernetes Native – Run tests inside the cluster using k6-operator
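As a concrete illustration of the tagging and threshold features, a smoke test might look roughly like the sketch below. The URL and tag values are taken from examples elsewhere in this README; the real scripts under `tests/01-smoke/` may structure this differently.

```javascript
// Illustrative sketch only – see tests/01-smoke/ for the real scripts.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 1,
  duration: '10s',
  // Global tags so every metric can be filtered in Grafana dashboards.
  tags: { testid: 'smoke-platform-mlops', test_type: 'smoke', test_target: 'mlflow' },
  // Smoke-level thresholds (see the thresholds table further down).
  thresholds: {
    http_req_failed: ['rate<0.10'],
    http_req_duration: ['p(95)<3000'],
    checks: ['rate>0.90'],
  },
};

export default function () {
  const res = http.get('https://mlflow.internal.opencloudhub.org');
  check(res, { 'MLflow responds with 200': (r) => r.status === 200 });
  sleep(1);
}
```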
Tests can run in two modes:
| Mode | Command | Description |
|---|---|---|
| Local | make smoke | Run k6 directly from the DevContainer against cluster services |
| In-Cluster | Via gitops repo | k6-operator runs tests inside Kubernetes |
┌──────────────────────────────────────────────────────────────────────────────┐
│ DevContainer                                                                  │
│  ┌───────────┐      ┌───────────────┐      ┌─────────────────────────────┐   │
│  │   make    │─────▶│  k6 Runtime   │─────▶│  Services (via Ingress)     │   │
│  │   smoke   │      │               │      │  *.opencloudhub.org         │   │
│  └───────────┘      └───────┬───────┘      └─────────────────────────────┘   │
│                             │                                                 │
│                             ▼                                                 │
│                    results/<timestamp>/                                       │
└──────────────────────────────────────────────────────────────────────────────┘
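The per-run JSON summaries shown above are the kind of output k6's `handleSummary` hook can produce. A minimal sketch, assuming the Makefile passes a `RESULTS_DIR` variable (hypothetical name):

```javascript
// Sketch only – the actual Makefile/scripts may wire the output path differently.
export function handleSummary(data) {
  // RESULTS_DIR is a hypothetical env var, e.g. results/20251203-120000
  const dir = __ENV.RESULTS_DIR || 'results/local';
  return {
    [`${dir}/smoke-platform-mlops-summary.json`]: JSON.stringify(data, null, 2),
  };
}
```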
For Kubernetes-native testing, the gitops repo manages TestRun CRDs that use this repo's Docker image:
┌──────────────────────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster                                                            │
│  ┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐      │
│  │   k6-operator    │────▶│  k6 Runner Pod   │────▶│     Services     │      │
│  │                  │     │ (k6-tests image) │     │  (internal DNS)  │      │
│  └──────────────────┘     └────────┬─────────┘     └──────────────────┘      │
│                                    │                                          │
│                                    ▼                                          │
│                          Prometheus (metrics)                                 │
└──────────────────────────────────────────────────────────────────────────────┘
The Docker image (opencloudhuborg/k6-tests) packages all tests, config, and data from this repo.
┌──────────────────────────────────────────────────────────────────────────────┐
│                                  Test Suite                                   │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                               │
│  config/                  helpers/               tests/                       │
│  ├── environments.js      ├── checks.js          ├── 01-smoke/                │
│  │   (service URLs)       │   (assertions)       ├── 02-load/                 │
│  ├── endpoints.js         ├── data.js            ├── 03-stress/               │
│  │   (API paths)          │   (test data)        ├── 04-spike/                │
│  └── thresholds.js        └── http.js            ├── 05-soak/                 │
│      (SLA limits)             (requests)         └── 06-breakpoint/           │
│                                                                               │
│  data/                    scripts/               results/                     │
│  ├── fashion-mnist.json   └── summary.sh         └── <timestamp>/             │
│  ├── wine.json                (aggregation)          (JSON output)            │
│  └── qwen-prompts.json                                                        │
│                                                                               │
└──────────────────────────────────────────────────────────────────────────────┘
| Requirement | Purpose |
|---|---|
| Docker | Container runtime for DevContainer |
| VS Code | IDE with DevContainers extension |
| OpenCloudHub Cluster | Target platform (Minikube or remote) |
1. Clone the repository
git clone https://github.com/opencloudhub/api-testing.git
cd api-testing

2. Open in DevContainer (Recommended)
Press Ctrl+Shift+P → Dev Containers: Rebuild and Reopen in Container
The DevContainer includes k6 pre-installed and configured.
3. Configure /etc/hosts (for local cluster)
Ensure your host machine has cluster IPs mapped:
cat /etc/hosts | grep opencloudhub
# Should show entries like:
# 192.168.49.2 mlflow.internal.opencloudhub.org
# 192.168.49.2 api.opencloudhub.org

4. Verify Setup
make help # Show all make targets
make list # List available test scripts

5. Run First Test
make smoke-platform-mlops # Quick health check

config/environments.js defines service URLs per environment. Two environments are supported:
| Environment | Use Case | URL Pattern |
|---|---|---|
| dev | Local testing via ingress | https://*.opencloudhub.org |
| internal | In-cluster testing | http://*.svc.cluster.local |
// Example: Switch environment
// CLI: TEST_ENV=internal make smoke
const ENVIRONMENTS = {
dev: {
models: { api: 'https://api.opencloudhub.org' },
platform: {
mlops: { mlflow: 'https://mlflow.internal.opencloudhub.org' }
}
},
internal: {
platform: {
mlops: { mlflow: 'http://mlflow.mlops.svc.cluster.local:5000' }
}
}
};

config/thresholds.js defines performance thresholds and load profiles per test type:
| Metric | Smoke | Load | Stress | Spike | Soak |
|---|---|---|---|---|---|
| http_req_failed | <10% | <5% | <10% | <15% | <5% |
| http_req_duration p95 | <3s | <2.5s | <4s | <5s | <3s |
| checks pass rate | >90% | >90% | >85% | >80% | >90% |
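A `config/thresholds.js` entry expressing these limits might look roughly as follows; the property names are illustrative, and the real file also carries the load profiles.

```javascript
// Illustrative shape only – see config/thresholds.js for the actual definitions.
export const THRESHOLDS = {
  smoke: {
    http_req_failed: ['rate<0.10'],     // <10% failed requests
    http_req_duration: ['p(95)<3000'],  // p95 under 3s
    checks: ['rate>0.90'],              // >90% of checks pass
  },
  load: {
    http_req_failed: ['rate<0.05'],
    http_req_duration: ['p(95)<2500'],
    checks: ['rate>0.90'],
  },
};
```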
config/endpoints.js defines common endpoint patterns by service type:
// Custom ML models (FastAPI)
export const CUSTOM_MODEL_ENDPOINTS = {
health: '/health',
info: '/info',
predict: '/predict'
};
// Base LLM models (OpenAI-compatible)
export const BASE_MODEL_ENDPOINTS = {
models: '/models',
chat: '/chat/completions'
};
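Combining the two config modules, a test might assemble its request URLs like this; the import paths and the `ENVIRONMENTS` export are assumptions based on the snippets above.

```javascript
// Hypothetical usage – actual wiring lives in helpers/ and the test scripts.
import http from 'k6/http';
import { ENVIRONMENTS } from '../config/environments.js';
import { CUSTOM_MODEL_ENDPOINTS } from '../config/endpoints.js';

// TEST_ENV=internal make smoke switches this to in-cluster URLs.
const ENV = ENVIRONMENTS[__ENV.TEST_ENV || 'dev'];

export default function () {
  // Health probe against the custom-model API for the selected environment.
  http.get(`${ENV.models.api}${CUSTOM_MODEL_ENDPOINTS.health}`);
}
```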
api-testing/
├── config/                     # Configuration files
│   ├── endpoints.js            # API endpoint patterns by service type
│   ├── environments.js         # Service URLs per environment (dev, internal)
│   └── thresholds.js           # Performance thresholds and load profiles
│
├── data/                       # Test data files
│   ├── fashion-mnist.json      # Image samples (784 pixels each)
│   ├── wine.json               # Wine feature samples (13 features)
│   ├── qwen-prompts.json       # LLM prompt samples
│   └── rag-queries.json        # RAG query samples
│
├── helpers/                    # Reusable test utilities
│   ├── checks.js               # Standardized k6 check functions
│   ├── data.js                 # Data loading (SharedArray) utilities
│   └── http.js                 # HTTP request wrappers with checks
│
├── tests/                      # Test scripts organized by type
│   ├── 01-smoke/               # Quick health validation (10s, 1 VU)
│   │   ├── apps/               # Team applications
│   │   ├── models/             # ML model tests
│   │   │   ├── base/           # Base LLM models (qwen)
│   │   │   └── custom/         # Custom models (fashion-mnist, wine)
│   │   └── platform/           # Platform services
│   │       ├── gitops.js       # ArgoCD
│   │       ├── infrastructure.js  # MinIO, pgAdmin
│   │       ├── mlops.js        # MLflow, Argo Workflows
│   │       └── observability.js   # Grafana
│   ├── 02-load/                # Normal traffic (~7.5 min, 10-50 VUs)
│   ├── 03-stress/              # Beyond normal (~18 min, 5-20 VUs)
│   ├── 04-spike/               # Traffic bursts (~2.5 min, 3-25 VUs)
│   ├── 05-soak/                # Extended duration (~34 min, 5 VUs)
│   └── 06-breakpoint/          # Find limits (~10 min, 10-100 req/s)
│
├── scripts/
│   └── summary.sh              # Results aggregation script
│
├── results/                    # Test output (gitignored)
│   └── <timestamp>/            # Per-run results
│       ├── smoke-*.json        # Full k6 output
│       └── smoke-*-summary.json   # Aggregated metrics
│
├── Dockerfile                  # Container image for k6-operator
├── Makefile                    # Test orchestration commands
└── README.md                   # This documentation
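The `helpers/data.js` utilities wrap k6's `SharedArray` so large JSON files such as `data/fashion-mnist.json` are parsed once and shared across VUs. A sketch of the pattern (helper name and path are illustrative):

```javascript
// Sketch of the SharedArray pattern – not the literal helpers/data.js.
import { SharedArray } from 'k6/data';

// Parsed once per test run and shared read-only across all VUs.
const samples = new SharedArray('fashion-mnist', () =>
  JSON.parse(open('../data/fashion-mnist.json'))
);

// Hypothetical helper: pick a random sample for each prediction request.
export function randomSample() {
  return samples[Math.floor(Math.random() * samples.length)];
}
```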
Different test types validate different aspects of system behavior:
| Test | Duration | VUs | Purpose | When to Use |
|---|---|---|---|---|
| Smoke | 10s | 1 | Quick health validation | After deployments, CI/CD |
| Load | ~7.5m | 10β50 | Normal traffic simulation | Capacity validation |
| Stress | ~18m | 5β20 | Beyond normal capacity | Find breaking points |
| Spike | ~2.5m | 3β25 | Sudden traffic bursts | Test auto-scaling |
| Soak | 34m+ | 5 | Extended duration | Find memory leaks |
| Breakpoint | ~10m | 10β100 req/s | Increasing until failure | Max capacity |
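These profiles map naturally onto k6 scenario executors. A rough sketch of how the load and breakpoint shapes could be expressed (VU/rate numbers follow the table above; the real profiles live in `config/thresholds.js` and may differ):

```javascript
// Illustrative shapes only – each test type normally runs as its own script.
export const options = {
  scenarios: {
    // Load: ramp up to normal traffic, hold, then ramp down (~7.5 min total).
    load: {
      executor: 'ramping-vus',
      startVUs: 10,
      stages: [
        { duration: '2m', target: 50 },
        { duration: '4m', target: 50 },
        { duration: '90s', target: 0 },
      ],
    },
    // Breakpoint: grow the request rate from 10 to 100 req/s until failure.
    breakpoint: {
      executor: 'ramping-arrival-rate',
      startRate: 10,
      timeUnit: '1s',
      preAllocatedVUs: 100,
      stages: [{ duration: '10m', target: 100 }],
    },
  },
};
```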
Quick validation that services are alive and responding correctly.
make smoke # All services
make smoke-platform # Platform services only
make smoke-fashion-mnist # Single model

Simulate expected production traffic patterns with ramping VUs.
make load # All load tests
make load-fashion-mnist # Single model (~7.5 minutes)

Push beyond normal capacity to observe degradation behavior.

make stress-fashion-mnist # ~18 minutes

Sudden traffic bursts to test resilience and recovery.

make spike-fashion-mnist # ~2.5 minutes

Extended duration to find memory leaks and connection exhaustion.

make soak-fashion-mnist # ~34 minutes

Continuously increase load until the system fails.

make breakpoint-fashion-mnist # ~10 minutes

# Run all smoke tests (recommended first step)
make smoke
# Run by category
make smoke-platform # MLOps, GitOps, Infrastructure, Observability
make smoke-models # Fashion MNIST, Wine, Qwen
# Run specific service
make smoke-fashion-mnist
make smoke-platform-mlops
# Different environment
TEST_ENV=internal make smoke

# Show summary of latest run
make summary
# Browse result files
ls results/
# View detailed JSON
cat results/20251203-120000/smoke-platform-mlops-summary.json | jq

Run make help to see all available targets:
Test Types:
smoke - Quick health checks (10s)
load - Normal load (~9min)
stress - Beyond normal (~18min)
spike - Sudden bursts (~2.5min)
soak - Extended duration (~34min)
breakpoint - Find limits (~10min)
Targets:
smoke Run all smoke tests
smoke-platform Platform smoke tests
smoke-models Model smoke tests
load Run all load tests
...
For in-cluster testing, see the testing section of the gitops repo, which manages:
- k6-operator deployment
- TestRun CRDs for each test
- Makefile for easy execution (make smoke-fashion-mnist)
- Prometheus integration for metrics export
The tests use the Docker image built from this repo (opencloudhuborg/k6-tests), which packages all test scripts, config, and data.
# Build locally
docker build -t opencloudhuborg/k6-tests:latest .
# Image contents
/tests/
├── config/   # Environment configs
├── helpers/  # Test utilities
├── tests/    # Test scripts
└── data/     # Test data

Contributions are welcome! This project follows OpenCloudHub's contribution standards.
- Add the service URL to config/environments.js
- Create a test file following existing patterns in tests/
- Add a make target to the Makefile
- Test locally before submitting
- Use descriptive check names for Grafana filtering
- Follow existing file structure and naming conventions
- Add JSDoc comments for exported functions
- Use helpers from helpers/ for consistency
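In that spirit, a check helper with descriptive, Grafana-friendly names might look like the following; the name and signature are illustrative, not the actual `helpers/checks.js` API.

```javascript
import { check } from 'k6';

/**
 * Standard health check with descriptive names for Grafana filtering.
 * Illustrative only – see helpers/checks.js for the real helpers.
 * @param {Object} res - k6 HTTP response
 * @param {string} service - service label used in the check names
 * @returns {boolean} true when all checks pass
 */
export function checkHealth(res, service) {
  return check(res, {
    [`${service}: status is 200`]: (r) => r.status === 200,
    [`${service}: responded in <3s`]: (r) => r.timings.duration < 3000,
  });
}
```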
- Fork the repository
- Create a feature branch
- Commit with descriptive messages
- Open a PR against main
See Contributing Guidelines for details.
Distributed under the Apache 2.0 License. See LICENSE for details.
OpenCloudHub – GitHub Organization
Project Link: https://github.com/opencloudhub/api-testing