v2.2.0-ce

Latest

Latest

Rader released this 16 Jun 03:13

eecbbd4

v2.2.0-ce

✨ New Features

AI Gateway & OpenAI-Compatible APIs
- Upstream Reliability: Added upstream health checks, circuit breaker state, fallback routing, and per-upstream availability filtering.
- Observability: Added Prometheus metrics for chat upstream attempts and optional LLM tracing for chat completions.
- Model Routing: Improved external model routing with relational upstreams, per-upstream provider/auth/model overrides, and session-aware routing.
ClawHub & Skills
- Added ClawHub-compatible APIs for skill search, publish, resolve, download, whoami, and skill version lookup.
- Added skill version publishing from existing skill repositories through /skills/{namespace}/{name}/publish.
- Added persisted skill version records so published skills can expose installable versions.
Platform Dataflow
- Added platform dataflow job APIs for creating, deleting, inspecting, and reading logs from Argo-based dataflow jobs.
- Added runner-side dataflow workflow support, including PVC creation, workflow status tracking, and log access.
Activity Logs
- Added activity log collection for model, dataset, git, inference, fine-tune, serverless, and agent operations.
- Added admin activity log listing via /api/v1/admin/activity_logs.
Code Repositories
- Added code package upload support with presigned S3 POST URLs.
- Added ZIP and tar.gz package import during code repository creation.

🚀 Enhancements & Bug Fixes

AI Gateway Reliability
- Fixed model cache miss behavior and reduced stale routing issues.
- Improved chat fallback retry behavior for failed upstream attempts.
- Improved multimedia generation behavior, including Seedance video download handling.
- Added safer token/API-key handling in metering and proxy logs.
Repository Search & Metadata
- Added model parameter range filters for model listing.
- Added dataset repository-size range filters for dataset listing.
- Fixed skill tag and skill mirror_from_saas route behavior..
Resource Scheduling
- Added admin listing for all space resources.
- Improved resource availability checks by returning explicit errors when cluster resource status is uncertain.
- Made CPU-only scheduling to GPU/XPU nodes configurable.
Finetune & Runner
- Allowed fine-tune logs to be read by either numeric job ID or task ID.
- Fixed runner service/workflow panic paths and cluster existence checks.
Repository & LFS
- Improved LFS mirroring for small files.
- Added object size mismatch cleanup during LFS sync.
- Improved LFS worker lifecycle cleanup.
- Fixed XNet public repository download behavior when no access token is available.
Proxy & Networking
- Sanitized unsafe forwarded headers.
- Used request host handling that preserves proper upstream Host behavior.

🛠 Maintenance

Migrated AI Gateway upstream configuration into relational upstream, health state, and circuit state tables.
Added persistence support for skill versions, activity logs, dataflow tasks, and repository range filters.
Updated inference runtime configs and images, including AMD, HF inference toolkit, and vLLM variants.
Added localized error codes for skill, git, agent, and task errors.
Refreshed tests across AI Gateway, activity logs, ClawHub, dataflow, skill publishing, resource checks, and LFS sync.

Assets 2