Insights: ray-project/kuberay
Overview
10 Pull requests merged by 6 people
-
add node selector option for kubectl plugin create worker group
#3235 merged
Mar 27, 2025 -
[Docs] update development md
#3230 merged
Mar 27, 2025 -
kubectl-plugin: set global flags at the root cmd
#3203 merged
Mar 26, 2025 -
[kubectl-plugin] Add head/worker node selector option
#3228 merged
Mar 26, 2025 -
Integrate with rayci
#3215 merged
Mar 25, 2025 -
[operator] add `// +optional` to CRD fields with `omitempty`
#3220 merged
Mar 25, 2025 -
[fix][proto] install missing `unzip` in Dockerfile
#3221 merged
Mar 24, 2025 -
[operator] remove incorrect comment
#3218 merged
Mar 22, 2025 -
[RayService][Test] make sure annotation populated to RayCluster
#3210 merged
Mar 21, 2025
14 Pull requests opened by 7 people
-
[Prometheus] Add `ray_cluster_head_pod_ready_duration_seconds` metric
#3222 opened
Mar 24, 2025 -
[RayJob][Test] refactor TestValidateRayJobSpec with table test
#3223 opened
Mar 25, 2025 -
[Prometheus] Add `ray_services_created_total` metric
#3224 opened
Mar 25, 2025 -
create cluster with config file
#3225 opened
Mar 25, 2025 -
Update KubeRay release documentation
#3226 opened
Mar 25, 2025 -
[plugin] warn if RayJob YAML has entrypoint
#3229 opened
Mar 26, 2025 -
[operator] add `+optional` to CRD fields that are optional
#3231 opened
Mar 26, 2025 -
Integrate with rayci (#3215)
#3234 opened
Mar 26, 2025 -
[WIP] refactor metrics and add ray_jobs_created_total metric
#3236 opened
Mar 27, 2025 -
[fix][plugin] add missing CLI flags for create cluster cmd
#3237 opened
Mar 27, 2025 -
[refactor][plugin] RayClusterSpecObject
#3238 opened
Mar 27, 2025 -
[feat] enforce DNS1035 validations on RayCluster, RayService, and RayJob names
#3239 opened
Mar 27, 2025 -
[apiserver][feat] add pagination to ListClustersRequest
#3240 opened
Mar 27, 2025 -
[kubectl-plugin] remove CPU limits by default
#3243 opened
Mar 28, 2025
30 Issues closed by 5 people
-
[Bug] test issue for Exalate integration (please ignore)
#3241 closed
Mar 27, 2025 -
What's the behavior for a Pod with restartPolicy Always when it is in PodFailed?
#2293 closed
Mar 26, 2025 -
[Bug] Priority Class Name from worker group spec not forwarded to final templated yaml files
#2086 closed
Mar 26, 2025 -
[Feat][kubectl-plugin] Support node selectors for creating clusters
#3143 closed
Mar 26, 2025 -
[Refactor] Use constants for image tag, image repo, and versions in golang to avoid hard-coded strings
#2939 closed
Mar 22, 2025 -
[release] Update TPU-related YAMLs to Ray 2.41
#2944 closed
Mar 22, 2025 -
[release] Update RayService YAMLs to Ray 2.41
#2948 closed
Mar 22, 2025 -
[release] Update the YAML files of the following examples to Ray 2.41.
#2947 closed
Mar 22, 2025 -
[release] Update YuniKorn YAML files to Ray 2.41
#2968 closed
Mar 22, 2025 -
[GCS FT] Use GcsFaultToleranceOptions instead of `ray.io/ft-enabled` in sample YAMLs
#2957 closed
Mar 22, 2025 -
[release] Update Volcano YAML files to Ray 2.41
#2967 closed
Mar 22, 2025 -
[Refactor] Make head Pod name deterministic
#3013 closed
Mar 22, 2025 -
[release] Add doc for RayJob DeletionPolicy
#3042 closed
Mar 22, 2025 -
[Doc] Update KubeRay upgrade guide for KubeRay v1.2.2 to v1.3.0
#3054 closed
Mar 22, 2025 -
[Request] Test Ray#46861 does not break Kuberay redis cleanup
#2319 closed
Mar 22, 2025 -
[release] Update sample YAMLs from Ray 2.9 to Ray 2.41
#2920 closed
Mar 22, 2025 -
[Feature] Add timestamps for logs in e2e tests
#2836 closed
Mar 22, 2025 -
[Feature] Update doc for GCS FT after API changes
#2697 closed
Mar 22, 2025 -
[Umbrella] RayService Refactoring
#2548 closed
Mar 22, 2025 -
[RayService][Refactor] RayService events
#2547 closed
Mar 22, 2025 -
[Epic] Python tests cleanup
#2509 closed
Mar 22, 2025 -
[Feature] Improve observability of e2e tests in Buildkite runner
#2651 closed
Mar 22, 2025 -
[Bug] Re-enable flaky kubectl plugin e2e test "should reconnect after pod connection is lost"
#2752 closed
Mar 22, 2025 -
[release] Update kubectl plugin doc to beta
#3044 closed
Mar 22, 2025 -
[Refactor] Merge raycluster_gcs_ft_test.go and raycluster_gcsft_test.go
#3002 closed
Mar 22, 2025 -
[Feature] Add shellcheck to pre-commit
#2932 closed
Mar 22, 2025 -
[Bug] Test E2E (kubectl-plugin) Has Different Buildkite Display Format
#3001 closed
Mar 21, 2025 -
[CI] Support upload artifacts in BuildKite
#3106 closed
Mar 21, 2025
6 Issues opened by 6 people
-
[Bug] please ignore - one more test ticket for Exalate connection (to delete)
#3242 opened
Mar 27, 2025 -
[Bug] Ray Dashboard breaks running k8s locally in kind w/ prometheus + grafana configured
#3233 opened
Mar 26, 2025 -
[Feature] Improve error message from "kubectl ray session"
#3232 opened
Mar 26, 2025 -
[Umbrella] Add Autoscaler e2e tests for partial placement groups
#3227 opened
Mar 26, 2025 -
[Feature][metrics] ray_cluster_head_pod_ready_duration_seconds
#3219 opened
Mar 21, 2025 -
[Bug] how to create jobs to use vgpu in hami with volcano and kuberay
#3217 opened
Mar 21, 2025
261 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[refactor][operator]: make `RayStartParams` optional
#3202 commented on
Mar 27, 2025 • 10 new comments -
feat: add aggregated clusterrole
#3193 commented on
Mar 21, 2025 • 2 new comments -
[Prometheus] Add `ray_cluster_provisioned_duration_seconds` metric
#3212 commented on
Mar 25, 2025 • 1 new comment -
[Chore][CI] Limit the release-image-build github workflow to only take tag as input
#3117 commented on
Mar 26, 2025 • 1 new comment -
[Bug] [RayService] KubeRay does not handle voluntary disruptions well
#1333 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Kuberay API Server Makefile target deploy-opeartor default image does not exist
#1350 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Address Ray's suggestion. Claiming an entire K8s Node per Pod.
#1378 commented on
Mar 22, 2025 • 0 new comments -
[Core] Metric unintentional_worker_failures_total is not accurate
#1918 commented on
Mar 22, 2025 • 0 new comments -
[Bug] KubeRay switched traffic when Serve did not start
#1559 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Distributed Inference Issues - GKE Autopilot
#1568 commented on
Mar 22, 2025 • 0 new comments -
[Bug] RayJob Volcano integration
#1580 commented on
Mar 22, 2025 • 0 new comments -
[Bug] KubeRay API Server ListAllComputeTemplates is intermittently failing.
#1591 commented on
Mar 22, 2025 • 0 new comments -
[Bug] KubeRay keep update RayJob CR when external Finalizer added to it.
#1626 commented on
Mar 22, 2025 • 0 new comments -
[Bug][GCS FT] Some Ray Serve configurations
#1628 commented on
Mar 22, 2025 • 0 new comments -
[autoscaler] Autoscaler can not know the overwritten port number.
#1644 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Kustomize cluster-scope-resources creates a namespace.
#1670 commented on
Mar 22, 2025 • 0 new comments -
[Bug] The command `make deploy-with-webhooks` does not succeed in the first run
#1692 commented on
Mar 22, 2025 • 0 new comments -
[Bug] worker group cannot be removed from RayCluster
#1739 commented on
Mar 22, 2025 • 0 new comments -
RayCluster raylet health probe uses hardcoded default dashboard agent port
#1760 commented on
Mar 22, 2025 • 0 new comments -
[RayService] [CI] Some tests for pending/active clusters may spuriously pass because head pod is not manually set to ready
#1768 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Autoscaler round up the amount of cpu resources a pod has.
#1166 commented on
Mar 22, 2025 • 0 new comments -
[Bug] RayService's Head Service does not inherit annotations from Head Pod's Service
#1092 commented on
Mar 22, 2025 • 0 new comments -
default K8s Service `appProtocol: tcp` causes Istio RequestAuthentication to not apply to that port
#1025 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Can not create utils.py in rayservice directory
#843 commented on
Mar 22, 2025 • 0 new comments -
[Bug] change to request and limits does not trigger pod updates
#787 commented on
Mar 22, 2025 • 0 new comments -
[Bug] De-dupe RayCluster comments
#765 commented on
Mar 22, 2025 • 0 new comments -
[Bug] 409 conflicts when updating status
#745 commented on
Mar 22, 2025 • 0 new comments -
Verify the state of `ray pdb` and `ray memory` on KubeRay
#710 commented on
Mar 22, 2025 • 0 new comments -
[Bug] GPU multitenancy issues
#687 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Investigate slow Ray pod termination
#503 commented on
Mar 22, 2025 • 0 new comments -
[RayService] Add envtests for RayService
#2878 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Support `timeout` in Ray job
#2743 commented on
Mar 22, 2025 • 0 new comments -
[Doc] Best practice to reduce image pulling overhead
#2742 commented on
Mar 22, 2025 • 0 new comments -
[Feature][CLI] Support `working_dir` is bigger than 500MB for `ray job submit`
#2717 commented on
Mar 22, 2025 • 0 new comments -
[RayService] Make `ServiceStatus` to be equivalent to `Ready` condition
#2851 commented on
Mar 22, 2025 • 0 new comments -
Make sure RayService example still work
#2307 commented on
Mar 22, 2025 • 0 new comments -
[Feature] shell completion for `kubectl ray get [workergroups|nodes]` CLI commands
#3051 commented on
Mar 22, 2025 • 0 new comments -
[Doc] Rewrite RayCluster config doc to make it more organized
#3055 commented on
Mar 22, 2025 • 0 new comments -
Revisit "Using GPU" doc
#3059 commented on
Mar 22, 2025 • 0 new comments -
[Doc] Update kuberay's performance test documentation
#3082 commented on
Mar 22, 2025 • 0 new comments -
[Bug][kubectl-plugin] E2e test DeferCleanup in BeforeEach may cause namespace race condition
#2758 commented on
Mar 22, 2025 • 0 new comments -
[Bug] `wait-gcs-ready` init-container going out-of-memory indefinitely (OOMKilled)
#2735 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Ray Job submit fails when GCS is enabled
#2676 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Submit multiple RayJobs concurrently will cause ray-operator to slow down
#2646 commented on
Mar 22, 2025 • 0 new comments -
[Bug] apiserver API + Client do not support setting headServiceAnnotations
#2626 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Incorrect labels on the Service for RayCluster head
#2564 commented on
Mar 22, 2025 • 0 new comments -
[RayService][Bug] Partial Removal of Deployments in ray-service.sample.yaml's ServeConfigV2 Causes WaitForServeDeploymentReady State
#2557 commented on
Mar 22, 2025 • 0 new comments -
[RayService][Bug] when doing multi-node serving with vLLM, non-primary worker pods reports unready.
#2552 commented on
Mar 22, 2025 • 0 new comments -
[Bug] ray.exceptions.RpcError: Timed out while waiting for GCS to become available.
#2540 commented on
Mar 22, 2025 • 0 new comments -
[Bug][High Availability] 502 Errors while Head Node in Recovery
#1153 commented on
Mar 22, 2025 • 0 new comments -
If the head node dies, the cluster is never restored [Bug]
#1141 commented on
Mar 22, 2025 • 0 new comments -
[Bug] lifecycle prestophook not available in helm chart values.yaml
#1103 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Kuberay can not start when enableBatchScheduler=true
#2430 commented on
Mar 22, 2025 • 0 new comments -
[Bug] KubeRay Worker group pod keeps restarting on EKS - Fails to CrashLoopBackOff
#2420 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Grafana DashBoard is too old
#2400 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Bubble ImagePullErr and ImagePullBackoff to the Ray CRD
#2387 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Autoscaler sideacr crashes, bringing down head pod, if request exceeds max pod replicas
#2385 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Old RayServices not deleted after operator update to 1.2.1
#2374 commented on
Mar 22, 2025 • 0 new comments -
[Bug] RayJob does not shut down the submitter pod properly
#2359 commented on
Mar 22, 2025 • 0 new comments -
[Bug] RayJob can have JobStatus = Running while JobDeploymentStatus = Failed
#2314 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Readiness and liveness probes failing when applying ray-service.sample.yaml file
#2269 commented on
Mar 22, 2025 • 0 new comments -
[Bug] What's minimum permission set for kuberay-operator?
#2213 commented on
Mar 22, 2025 • 0 new comments -
[Bug] "enable-batch-scheduler" bool flag is not working for schedulers other than Volcano
#2185 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Readiness probe failed: timeout on minikube
#2158 commented on
Mar 22, 2025 • 0 new comments -
[Bug] [API Server] Can't specify cluster rayVersion in Ray Job
#2109 commented on
Mar 22, 2025 • 0 new comments -
[Bug] What's the relationship between watching `Endpoints` and RayService e2e tests?
#2085 commented on
Mar 22, 2025 • 0 new comments -
[Bug/RayJob] k8sjob doesn't contain ray worker logs with ray.shutdown() at the end
#1975 commented on
Mar 22, 2025 • 0 new comments -
[Bug] Ray cluster terminates more worker pods than the amount of replica scale down requested
#1936 commented on
Mar 22, 2025 • 0 new comments -
[Umbrella] Remove the RayCluster `.Status.State` field
#2299 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Support get k8s pod log for interactive RayJob
#2701 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Make `suspend` API for worker group atomic
#2693 commented on
Mar 22, 2025 • 0 new comments -
[Umbrella] `Suspend` API for worker groups
#2692 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Emit metrics of cluster creation and other related metrics for observability
#2681 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Don't submit scale requests if the worker group is suspended
#2666 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Run Ginkgo tests with Ginkgo CLI
#2658 commented on
Mar 22, 2025 • 0 new comments -
Replace references of rayproject/ray-ml image
#2292 commented on
Mar 22, 2025 • 0 new comments -
[Feature] [API Server] Support `activeDeadlineSeconds` in API Server RayJob resource
#2278 commented on
Mar 22, 2025 • 0 new comments -
[Discuss][Feature] Authenticate requests sent to Ray Serve proxy actors
#2276 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Remove wait-for-gcs init container
#2275 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Support extended kube-scheduler as batch scheduler
#2052 commented on
Mar 22, 2025 • 0 new comments -
[Feature] [API Server] Allow autoscaling in python api server client
#2029 commented on
Mar 22, 2025 • 0 new comments -
[CI] RayJob `ray-job.shutdown.yaml` sample YAML test is flaky
#2979 commented on
Mar 24, 2025 • 0 new comments -
[CI] `TestAutoscalingRayService` is flaky
#2981 commented on
Mar 24, 2025 • 0 new comments -
[Doc] Explain the difference between head svc and serve of RayService
#2995 commented on
Mar 24, 2025 • 0 new comments -
[Bug] Minor inconsistency in RayJob submitter retries in v1.3
#3211 commented on
Mar 26, 2025 • 0 new comments -
[Feature][metrics] ray_services_ready_duration_seconds
#3177 commented on
Mar 26, 2025 • 0 new comments -
[Feature][metrics] ray_services_created_total
#3176 commented on
Mar 26, 2025 • 0 new comments -
[Bug] RayService applications do not load on v1.3.0
#3133 commented on
Mar 26, 2025 • 0 new comments -
[Discussion] KubeRay and Ray version compatibilities
#187 commented on
Mar 26, 2025 • 0 new comments -
[Epic][Feature] KubeRay v1.4.0 - Operator SLI Tracking
#3171 commented on
Mar 26, 2025 • 0 new comments -
[Umbrella] Autoscaler improvements
#2600 commented on
Mar 26, 2025 • 0 new comments -
[Bug] Compilation fails for apiserver
#3061 commented on
Mar 26, 2025 • 0 new comments -
[Bug] Minimum CPU and Memory requirements for KubeRay Head and worker pods
#2186 commented on
Mar 26, 2025 • 0 new comments -
[Feature] I want to be able to add annotations to the ServiceAccount that RayCluster creates
#2322 commented on
Mar 26, 2025 • 0 new comments -
[Bug] autoscaler doesn't launch larger nodes for pending tasks
#3214 commented on
Mar 26, 2025 • 0 new comments -
[Bug] [API Server] JobSubmission service does not work for cluster names >41 characters
#2169 commented on
Mar 27, 2025 • 0 new comments -
[release] Add doc tests for important KubeRay docs
#3157 commented on
Mar 28, 2025 • 0 new comments -
[RayService] Support Incremental Zero-Downtime Upgrades
#3166 commented on
Mar 25, 2025 • 0 new comments -
[Prometheus] Add `ray_clusters_created_total` metric
#3204 commented on
Mar 25, 2025 • 0 new comments -
[Feature] Add `DeleteWorkersOnFailure` deletion policy for RayJob
#2765 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Check Redis cleanup in Golang e2e tests
#2764 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Print out errors or upload logs as artifacts when kubectl plugin e2e test error occurs
#2753 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Add `force` to worker group `suspend` API
#2744 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Allow setting `app.kubernetes.io/name` in head-svc created by RayService
#2648 commented on
Mar 22, 2025 • 0 new comments -
[Feature] [kuberay/apiserver] adding custom init container configuration to worker container specs
#2623 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Support preserving only the Head pod with RayJob when shutdownAfterJobFinishes=true
#2615 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Add kubectl plugin to KubeRay repo README
#2603 commented on
Mar 22, 2025 • 0 new comments -
[auth] Support auth for RayJob / RayService
#2586 commented on
Mar 22, 2025 • 0 new comments -
[auth] RBAC to support both Autoscaler and Auth
#2585 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Configure dashboard readiness probe based on dashboard-port in rayStartParams
#2584 commented on
Mar 22, 2025 • 0 new comments -
[Feature] How to propagate the labels from CR to K8s built-in resources?
#2583 commented on
Mar 22, 2025 • 0 new comments -
[Umbrella] KubeRay Authentication Support
#2581 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Why KubeRay image becomes larger?
#2580 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Integrate KubeRay with Koordinator
#2573 commented on
Mar 22, 2025 • 0 new comments -
[Feature] ENABLE_GCS_FT_REDIS_CLEANUP should be an field in RayService
#2571 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Add E2E Test for Autoscaler Nested Remote Functions
#2568 commented on
Mar 22, 2025 • 0 new comments -
[Chore] Golang import grouping by package
#2567 commented on
Mar 22, 2025 • 0 new comments -
[Umbrella] Controller Expectation
#2566 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Add e2e Ray v2 Autoscaler Tests with KubeRay
#2561 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Light-weight job submitter
#2537 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Identify and apply changes on ray-cluster
#2534 commented on
Mar 22, 2025 • 0 new comments -
[Umbrella] Multi-tenant observability
#2526 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Deploy Ray Serve replicas across multiple K8s nodes / regions / zones to achieve HA
#2500 commented on
Mar 22, 2025 • 0 new comments -
Statistics of other types of gpu
#2484 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Ray Cluster: Preserving Job State After Cluster Restart
#2479 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Support cron scheduling for RayJob
#2426 commented on
Mar 22, 2025 • 0 new comments -
[Benchmark] Summarize the benchmark results and experiment setup
#2402 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Consider setting AUTOSCALER_CONSERVE_GPU_NODES by default in Ray autoscaler
#2381 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Remove `//nolint:gosec` to allow rule G115 after the false positive issue is solved
#2369 commented on
Mar 22, 2025 • 0 new comments -
[Doc] KubeRay configuration
#2356 commented on
Mar 22, 2025 • 0 new comments -
[RFC] Introduce new API-RayCluster Fleet and ReplicaSet in KubeRay
#2323 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Api Server Support API: Run Ray Service on the cluster which is created already
#1646 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Schedule RayJob
#1662 commented on
Mar 22, 2025 • 0 new comments -
[Feature] [ApiServer] Publish API Server Python client to PyPi
#1708 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Finalizer to block deletion of RayCluster with running jobs
#1740 commented on
Mar 22, 2025 • 0 new comments -
[Feature] rayStartParams UX improvement
#1758 commented on
Mar 22, 2025 • 0 new comments -
[RayService] Refactor to unify cluster decision for active and pending RayClusters
#1761 commented on
Mar 22, 2025 • 0 new comments -
[RayService] [Enhancement] Avoid unnecessary pod deletion when updating RayCluster
#1769 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Create `serve-svc` before all apps are running
#1803 commented on
Mar 22, 2025 • 0 new comments -
[Feature] [CLI] Add arm64 arch build for kuberay and output should ignore by git
#1819 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Use ServiceAccount token when submitting RayJobs
#1824 commented on
Mar 22, 2025 • 0 new comments -
[Test] Add RayJob + Kueue e2e tests
#1846 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Add default volumes and volumeMounts for `/tmp/ray` to Ray Pods
#1851 commented on
Mar 22, 2025 • 0 new comments -
[Docs] Add docs for structured config and default sidecar containers
#1869 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Support AWS IAM for Redis Auth
#1888 commented on
Mar 22, 2025 • 0 new comments -
[Doc] RayJob `suspend`
#1900 commented on
Mar 22, 2025 • 0 new comments -
[Doc] Create a doc for external Redis with TLS
#1921 commented on
Mar 22, 2025 • 0 new comments -
[Discussion][GCS FT] Delete Pods by the value of restartCount
#1356 commented on
Mar 22, 2025 • 0 new comments -
[Feature] publishNotReadyAddresses for services
#1365 commented on
Mar 22, 2025 • 0 new comments -
[Doc] Create a doc for #1386
#1393 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Extract autoscaler as separate deployment per raycluster
#1416 commented on
Mar 22, 2025 • 0 new comments -
[Feature][GCS FT] Test the behavior of `cleanup_redis_storage`
#1422 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Improve the Kustomization deployment
#1434 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Allow for specifying options when creating ingresses
#1436 commented on
Mar 22, 2025 • 0 new comments -
Enable usage of Ray Serve when using RayCluster CRD
#1451 commented on
Mar 22, 2025 • 0 new comments -
[Feature] [api-server] API documentation revamp
#1455 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Support networking.k8s.io.IngressSpec as ingress config for ray head
#1475 commented on
Mar 22, 2025 • 0 new comments -
[CI] Test the compatibility between KubeRay / Ray
#1499 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Serve gRPC health check
#1554 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Add a doc and YAML for Ray Serve grpc support
#1555 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Configurable RayCluster readiness definition
#1631 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Move kuberay-helm to KubeRay repo
#1635 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Aggregate roles on Kuberay resources to Kubernetes user-facing roles
#1641 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Should we stop publishing images on DockerHub?
#1924 commented on
Mar 22, 2025 • 0 new comments -
[Feature][metrics] ray_cluster_provisioned_duration_seconds
#3172 commented on
Mar 21, 2025 • 0 new comments -
[release] Automate the Helm chart release process
#3162 commented on
Mar 21, 2025 • 0 new comments -
[release] Port KubeRay CI tests to Ray CI
#3158 commented on
Mar 21, 2025 • 0 new comments -
[Umbrella] Optimize release process
#3098 commented on
Mar 21, 2025 • 0 new comments -
[Bug] Unable to deploy vLLM with llama 3 8B on Red Hat Openshift
#3164 commented on
Mar 21, 2025 • 0 new comments -
[Bug] Ray can not schedule the workload when there is enough resource
#3151 commented on
Mar 21, 2025 • 0 new comments -
[Bug] kubectl plugin not detecting KubeRay on GKE RayOperator Add-On
#3115 commented on
Mar 21, 2025 • 0 new comments -
[Feature][metrics] ray_jobs_terminated_total
#3180 commented on
Mar 21, 2025 • 0 new comments -
[Feature][metrics] ray_clusters_created_total
#3175 commented on
Mar 21, 2025 • 0 new comments -
[Feature][metrics] ray_job_deployment_status
#3179 commented on
Mar 21, 2025 • 0 new comments -
[Feature][metrics] ray_job_execution_duration_seconds
#3181 commented on
Mar 21, 2025 • 0 new comments -
[Feature] Add ability to add a prestop hook
#3185 commented on
Mar 21, 2025 • 0 new comments -
[doc] Update kubectl plugin doc for `scale` command
#3190 commented on
Mar 21, 2025 • 0 new comments -
[Feature][metrics] ray_job_submitter_pod_startup_duration_seconds
#3201 commented on
Mar 21, 2025 • 0 new comments -
[Feature] RayService Incremental Upgrade Project Tracker
#3209 commented on
Mar 21, 2025 • 0 new comments -
[Feature] Revisit RBAC
#3213 commented on
Mar 21, 2025 • 0 new comments -
Improve logging configuration experience
#1940 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Create a Script to Manually Clean up Redis
#1941 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Set appProtocol correctly for head service
#1946 commented on
Mar 21, 2025 • 0 new comments -
[Feature] Support dynamic refresh of watched namespaces
#2061 commented on
Mar 21, 2025 • 0 new comments -
[Feature] KubeRay Scalability Benchmarking
#2069 commented on
Mar 21, 2025 • 0 new comments -
[Feature] [API Server] [RFC] Add persistence for job history using a SQL database
#2114 commented on
Mar 21, 2025 • 0 new comments -
[Feature] RayService CRD to have ImagePullSecret Reference
#2137 commented on
Mar 21, 2025 • 0 new comments -
FT GCS should handle draining of node where head pod is scheduled
#2153 commented on
Mar 21, 2025 • 0 new comments -
[Feature] Should we also set PublishNotReadyAddresses if the service is not headless?
#2157 commented on
Mar 21, 2025 • 0 new comments -
[Umbrella] Ray Autoscaling tests
#2173 commented on
Mar 21, 2025 • 0 new comments -
[Doc] Release schedule
#2191 commented on
Mar 21, 2025 • 0 new comments -
Use login shell for Submitter Pod command
#2209 commented on
Mar 21, 2025 • 0 new comments -
[Feature] Add API reference documentation for KubeRay custom resources
#2247 commented on
Mar 21, 2025 • 0 new comments -
[CI] CI doesn't detect whether a PR runs `make generate` or not
#3091 commented on
Mar 21, 2025 • 0 new comments -
[Bug] RayCluster K8s event for creating worker Pods
#3056 commented on
Mar 21, 2025 • 0 new comments -
[Feature][metrics] ray_jobs_created_total
#3178 commented on
Mar 21, 2025 • 0 new comments -
[Feature] raycluster created by rayservice controller may need some default labels?
#412 commented on
Mar 22, 2025 • 0 new comments -
[Feature] use kubernetes events to trigger autoscaling vs directly patching the CR
#442 commented on
Mar 22, 2025 • 0 new comments -
[Feature] add ut for job spec in api server
#444 commented on
Mar 22, 2025 • 0 new comments -
[Feature] [Documentation] Document state and status fields
#454 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Enable replicated KubeRay operator deployments
#474 commented on
Mar 22, 2025 • 0 new comments -
[Feature] [Test] KubeRay vs. Kubernetes version compatibility testing
#516 commented on
Mar 22, 2025 • 0 new comments -
[Feature][RayService] Handle serve deployment delete during the cluster destroy.
#647 commented on
Mar 22, 2025 • 0 new comments -
integration test to stop ray job when delete job custom resource or timeout is triggered
#664 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Add a stale bot
#681 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Python linting
#682 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Streaming data from local machine to cluster
#684 commented on
Mar 22, 2025 • 0 new comments -
[Discussion][Feature] Clean up workersToDelete?
#733 commented on
Mar 22, 2025 • 0 new comments -
[Feature] [Test] E2E release tests on cloud K8s infra
#737 commented on
Mar 22, 2025 • 0 new comments -
[Feature] [Docs] Explain how to use Ray custom resources with examples
#742 commented on
Mar 22, 2025 • 0 new comments -
[Feature][Docs][Discussion] Provider consistent guidance on resource Request and Limits
#744 commented on
Mar 22, 2025 • 0 new comments -
[Feature] add ServiceAccountName
#746 commented on
Mar 22, 2025 • 0 new comments -
Create a table for YAML and who refers to it
#3050 commented on
Mar 22, 2025 • 0 new comments -
[Doc] Rewrite GPU training doc
#3049 commented on
Mar 22, 2025 • 0 new comments -
Use uv in sample YAML files
#3039 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Please publish the api as distinct go module
#3020 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Throw an error message on applying if redis password is set in GcsFaultToleranceOptions and rayStartParams both
#2986 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Support to generate PersistentVolumeClaim for Pod
#59 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Add affinity and toleration in ComputeTemplate
#177 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Enrich Github Issue Templates
#186 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Revisit share memory setting for Ray Cluster
#201 commented on
Mar 22, 2025 • 0 new comments -
[Feature] [apiserver] support custom resource in ComputeTemplate
#207 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Authn and Authz support for KubeRay API server and CLI
#263 commented on
Mar 22, 2025 • 0 new comments -
[Feature] [Docs] Document workflows for deploying KubeRay with GitOps systems (ArgoCD, Flux)
#273 commented on
Mar 22, 2025 • 0 new comments -
[Feature] CI test for single namespace
#282 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Ray remote kernels
#300 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Make a one click deployment of Kuberay by using EKS Blueprint or a eksctl example
#301 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Pagination support for ListClusters
#312 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Sort randomPodsToDelete in descending order of timestamp
#747 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Publish Helm chart for RayService
#1085 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Expose Ray --logging-level parameter
#1098 commented on
Mar 22, 2025 • 0 new comments -
[DOC] Add tips to use ray serve cli to check the kuberay cluster remotely.
#1124 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Add LLM examples using KubeRay
#1131 commented on
Mar 22, 2025 • 0 new comments -
How does ray cluster support third-party resources besides GPU and CPU
#1135 commented on
Mar 22, 2025 • 0 new comments -
[Feature][Discussion] RayCluster restart policy
#1174 commented on
Mar 22, 2025 • 0 new comments -
[RayJob] Add `RayCronJob` Custom Resource Definition
#1206 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Use the observedGeneration to determine whether to update the status or not.
#1217 commented on
Mar 22, 2025 • 0 new comments -
[Refactor] Separate the logic for retrieving Serve application statuses and checking Serve application statuses
#1234 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Add warning messages for instances where CPUs aren't specified when using Autoscaler.
#1254 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Add kubeconform to CI pipeline
#1267 commented on
Mar 22, 2025 • 0 new comments -
Add To Kuberay Metrics
#1272 commented on
Mar 22, 2025 • 0 new comments -
[Doc] Add guidelines for setting up ALB ingress for RayService
#1273 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Avoid manually importing Grafana dashboard JSON file
#1285 commented on
Mar 22, 2025 • 0 new comments -
[Enhancement] [Doc] Link shutdown sample YAML in RayJob doc
#1302 commented on
Mar 22, 2025 • 0 new comments -
Contribute Tutorial for model training/fine-tuning using KubeRay with CodeFlare and ODH stack.
#1328 commented on
Mar 22, 2025 • 0 new comments -
[CI][Release] Run CI for PRs against release branch
#799 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Connect to RayCluster via GCS port rather than Ray client in compatibility test
#848 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Autoscaler to support scaling from zero based on `custom_resource` or `accelerator_type`
#863 commented on
Mar 22, 2025 • 0 new comments -
[Feature] The raycluster-autoscaler serviceaccount inherit default serviceaccount imagePullSecrets
#892 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Nightly helm releases
#910 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Add tutorial to explain how to collect metrics from operator for Prometheus
#921 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Provide KubeRay backend implementation for SkyPilot
#930 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Reconcile workers when volumeMount differs
#944 commented on
Mar 22, 2025 • 0 new comments -
[Feature][Doc] Troubleshooting for networking
#955 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Add document for environment_variables and Audit all environment variables to identify which should not be modified by users.
#957 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Automatically expire GCS FT Redis keys
#959 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Doc about network policies
#961 commented on
Mar 22, 2025 • 0 new comments -
[Feature][Doc] Revisit the definitions for ports
#964 commented on
Mar 22, 2025 • 0 new comments -
[Feature] Ensure the number of healthy workers while keep the abnormal worker for troubleshooting
#1022 commented on
Mar 22, 2025 • 0 new comments -
[Umbrella] GCS fault tolerance on KubeRay
#1033 commented on
Mar 22, 2025 • 0 new comments -
Adding support for environment variables, labels, annotations, and toleraritions
#1067 commented on
Mar 22, 2025 • 0 new comments