
[pull] master from ray-project:master #958

Merged: 39 commits into ddelange:master from ray-project:master on Jul 11, 2024

Conversation

@pull pull bot commented Jul 10, 2024

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

hongchaodeng and others added 22 commits July 9, 2024 19:59
Signed-off-by: hongchaodeng <hongchaodeng1@gmail.com>
since the tests are currently flaky and slow

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
Signed-off-by: liuxsh9 <liuxiaoshuang4@huawei.com>
…s in ray.init(). (#46516)

This helps with debugging GCS connection issues.

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
for standard bazel build file formatting

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
the pycache and tests dirs are not useful, are non-deterministic, and just
make the wheel larger.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
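A minimal sketch of the pruning idea described above; `prune_wheel_tree` and its path handling are illustrative assumptions, not Ray's actual build tooling.

```
# Hypothetical sketch: drop __pycache__ and tests directories from the
# package tree before the wheel is assembled, since they are not useful,
# are non-deterministic, and only inflate the wheel.
import shutil
from pathlib import Path

def prune_wheel_tree(pkg_root: Path) -> None:
    """Remove wheel-bloating, non-deterministic directories in place."""
    candidates = list(pkg_root.rglob("__pycache__")) + list(pkg_root.rglob("tests"))
    for d in candidates:
        if d.is_dir():  # may already be gone if nested under a removed dir
            shutil.rmtree(d)
```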
to balance the review load

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
to master and release branches only

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…nosecond (#46518)

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
move it into the ray package and remove the one from the root.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
remove stuff that no longer works, and fix the Ray image
building parts

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
Did some bisecting and found that this commit was causing Serve's performance test latency to spike. Reverting 05067f4 to go back to the previous state.
- Do not block Windows on release automation runs anymore
- Make the block on Windows + Linux arm64 consistent

Test:
- CI

Signed-off-by: can <can@anyscale.com>
…perly (#46484)

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Split Windows flaky test jobs into core, serve, and serverless. A couple of
reasons:
- the script `ci/build/upload_build_info.sh` fails on Windows when called
repeatedly
https://buildkite.com/ray-project/postmerge/builds/5319#0190996e-9a8d-4385-bb22-0d51ff2cd9cd/7990-7991
- it can take hours and the whole thing keeps retrying

Test:
- CI
- postmerge: https://buildkite.com/ray-project/postmerge/builds/5349

Signed-off-by: can <can@anyscale.com>
…ed memory write operation (#46508)

Signed-off-by: kaihsun <kaihsun@anyscale.com>
Signed-off-by: hongchaodeng <hongchaodeng1@gmail.com>
Forgot to fix the bazel version in the new Windows flaky test jobs

Test:
- CI

Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
so that the effect of `refreshenv` is preserved

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
@pull pull bot added the ⤵️ pull label Jul 10, 2024
khluu and others added 7 commits July 10, 2024 05:11
```
New release perf metrics missing file scalability/object_store.json
REGRESSION 6.74%: placement_group_create/removal (THROUGHPUT) regresses from 840.8257707443967 to 784.1202913310515 in microbenchmark.json
REGRESSION 4.55%: client__1_1_actor_calls_sync (THROUGHPUT) regresses from 534.2825013844715 to 509.9599816194958 in microbenchmark.json
REGRESSION 4.46%: single_client_get_calls_Plasma_Store (THROUGHPUT) regresses from 10593.772848299006 to 10121.103242219997 in microbenchmark.json
REGRESSION 4.06%: single_client_put_gigabytes (THROUGHPUT) regresses from 20.28764104367834 to 19.46333348333893 in microbenchmark.json
REGRESSION 4.04%: multi_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 13048.216108376133 to 12520.58965968965 in microbenchmark.json
REGRESSION 3.95%: n_n_actor_calls_with_arg_async (THROUGHPUT) regresses from 2713.0325692965866 to 2605.856362562882 in microbenchmark.json
REGRESSION 3.83%: client__tasks_and_put_batch (THROUGHPUT) regresses from 11759.788796582228 to 11309.127935041968 in microbenchmark.json
REGRESSION 3.08%: single_client_get_object_containing_10k_refs (THROUGHPUT) regresses from 13.16167615938565 to 12.756244682120503 in microbenchmark.json
REGRESSION 2.54%: client__put_calls (THROUGHPUT) regresses from 814.6764560093619 to 794.0222907625882 in microbenchmark.json
REGRESSION 1.58%: tasks_per_second (THROUGHPUT) regresses from 588.1590100663536 to 578.8766226882515 in benchmarks/many_tasks.json
REGRESSION 1.58%: single_client_tasks_and_get_batch (THROUGHPUT) regresses from 8.033801054151493 to 7.9070880635954 in microbenchmark.json
REGRESSION 1.54%: n_n_actor_calls_async (THROUGHPUT) regresses from 27657.83033159681 to 27232.414296780542 in microbenchmark.json
REGRESSION 1.41%: single_client_wait_1k_refs (THROUGHPUT) regresses from 5.378868872174563 to 5.302957674144409 in microbenchmark.json
REGRESSION 1.39%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 5300.894918847503 to 5227.298677681264 in microbenchmark.json
REGRESSION 1.22%: pgs_per_second (THROUGHPUT) regresses from 22.96731187832995 to 22.687659485012095 in benchmarks/many_pgs.json
REGRESSION 1.16%: multi_client_tasks_async (THROUGHPUT) regresses from 23557.51911206466 to 23283.706392178385 in microbenchmark.json
REGRESSION 0.76%: tasks_per_second (THROUGHPUT) regresses from 346.9124752975113 to 344.2841239720449 in benchmarks/many_nodes.json
REGRESSION 0.57%: single_client_tasks_sync (THROUGHPUT) regresses from 987.4363632697047 to 981.7983599799647 in microbenchmark.json
REGRESSION 35.04%: dashboard_p95_latency_ms (LATENCY) regresses from 1221.413 to 1649.419 in benchmarks/many_tasks.json
REGRESSION 25.70%: dashboard_p95_latency_ms (LATENCY) regresses from 8.221 to 10.334 in benchmarks/many_pgs.json
REGRESSION 18.31%: dashboard_p99_latency_ms (LATENCY) regresses from 281.247 to 332.736 in benchmarks/many_pgs.json
REGRESSION 9.82%: dashboard_p50_latency_ms (LATENCY) regresses from 128.651 to 141.285 in benchmarks/many_tasks.json
REGRESSION 7.17%: dashboard_p95_latency_ms (LATENCY) regresses from 63.659 to 68.223 in benchmarks/many_nodes.json
REGRESSION 5.59%: dashboard_p99_latency_ms (LATENCY) regresses from 133.071 to 140.505 in benchmarks/many_nodes.json
REGRESSION 5.56%: stage_2_avg_iteration_time (LATENCY) regresses from 62.212187099456784 to 65.67325186729431 in stress_tests/stress_test_many_tasks.json
REGRESSION 4.42%: avg_pg_remove_time_ms (LATENCY) regresses from 0.8868805465475326 to 0.9261005015020346 in stress_tests/stress_test_placement_group.json
REGRESSION 3.93%: dashboard_p99_latency_ms (LATENCY) regresses from 3317.765 to 3448.302 in benchmarks/many_tasks.json
REGRESSION 3.92%: avg_iteration_time (LATENCY) regresses from 1.0120761251449586 to 1.0517002582550048 in stress_tests/stress_test_dead_actors.json
REGRESSION 3.80%: 3000_returns_time (LATENCY) regresses from 5.560233610000012 to 5.771739185999991 in scalability/single_node.json
REGRESSION 3.46%: 10000_get_time (LATENCY) regresses from 22.85316222099999 to 23.645023898000005 in scalability/single_node.json
REGRESSION 3.03%: 1000000_queued_time (LATENCY) regresses from 182.31759296599998 to 187.84350834300002 in scalability/single_node.json
REGRESSION 1.79%: stage_3_time (LATENCY) regresses from 3011.46821808815 to 3065.2378103733063 in stress_tests/stress_test_many_tasks.json
REGRESSION 1.20%: dashboard_p50_latency_ms (LATENCY) regresses from 3.924 to 3.971 in benchmarks/many_nodes.json
REGRESSION 1.05%: 10000_args_time (LATENCY) regresses from 17.234402031000002 to 17.415384294000006 in scalability/single_node.json
REGRESSION 0.38%: dashboard_p50_latency_ms (LATENCY) regresses from 3.377 to 3.39 in benchmarks/many_pgs.json
```

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
Co-authored-by: Lonnie Liu <lonnie@anyscale.com>
…ack and two separate optimizers (w/ different learning rates). (#46540)
linux://python/ray/dashboard:test_serve_dashboard has recently become flaky
and prone to timeouts (#46459);
not sure if it has anything to do with
#45943. I just increase its timeout
in this PR

Test:
- CI
- https://buildkite.com/ray-project/postmerge/builds/5358

Signed-off-by: can <can@anyscale.com>
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
…Ref`s in-memory (#46369)

Currently, the implementation of `Dataset.count()` retrieves the entire
list of `BlockRef`s associated with the Dataset when calculating the
number of rows per block. This PR is a minor performance improvement: it
iterates over the `BlockRef`s instead, so each ref can be dropped as soon
as its block's row count is read, and the entire list of `BlockRef`s never
needs to be held in memory.

Signed-off-by: sjl <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
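A hedged sketch of the pattern; the helper names (`get_num_rows`, `iter_block_refs`) are hypothetical stand-ins, not Ray Data's actual internals.

```
# Hypothetical sketch: sum per-block row counts from an iterator so each
# BlockRef becomes droppable right after its count is read, instead of
# holding the full list of BlockRefs in memory.
from typing import Iterator

def count_rows(per_block_counts: Iterator[int]) -> int:
    total = 0
    for num_rows in per_block_counts:  # earlier refs are now unreferenced
        total += num_rows
    return total

# Usage idea: count_rows(get_num_rows(ref) for ref in iter_block_refs())
```

Passing a generator rather than a materialized list is what lets each `BlockRef` be released as soon as its row count has been consumed.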
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
This dependency is needed to test video-related APIs.

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
can-anyscale and others added 10 commits July 10, 2024 20:53
Currently `api_policy_check` only obtains APIs from the head rst file and
from rsts included directly in the head file. However, we now have rsts
that include other rsts, and so on.

This PR updates the logic to recursively collect all rsts reachable from
the head file.

Test:
- CI

---------

Signed-off-by: can <can@anyscale.com>
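A rough sketch of the recursive collection, assuming plain `.. include::` directives; the real script's parsing rules (e.g. toctree handling) may differ.

```
# Hypothetical sketch: follow rst include directives recursively so every
# rst reachable from the head file is collected, not just direct includes.
import re
from pathlib import Path

INCLUDE_RE = re.compile(r"^\.\.\s+include::\s+(\S+)", re.MULTILINE)

def collect_rsts(head: Path, seen: set | None = None) -> set:
    if seen is None:
        seen = set()
    head = head.resolve()
    if head in seen or not head.is_file():
        return seen
    seen.add(head)
    for ref in INCLUDE_RE.findall(head.read_text()):
        collect_rsts(head.parent / ref, seen)  # recurse into included rsts
    return seen
```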
closes #46350

Signed-off-by: Superskyyy <yihaochen@apache.org>
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
simplify things

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…ndles` (#46547)

The name is misleading. The value represents bundles, not blocks.

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Fix the api policy check for auto-generated API docs. For the check to work
properly, we first need to compile the Ray docs to generate all API docs.

Test:
- CI

Signed-off-by: can <can@anyscale.com>
closes #46482

Signed-off-by: zhilong <zhilong.chen@mail.mcgill.ca>
…GcsJobManager::HandleGetAllJobInfo (#46335)

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
@pull pull bot merged commit bb1759a into ddelange:master Jul 11, 2024
1 check passed