ccl/streamingccl/streamingest: TestStreamingReplanOnLag failed #120688
Failure unrelated to the actual test, so likely an infra flake.
I took a peek at the logs on this one and the behaviour is relatively confusing to me. I'll backfill some notes, but I'm gonna bump this to P1.
Here is what I find confusing:
But then we time out waiting to reach that initial scan timestamp:
That is odd in itself. But even more odd is the timestamp in our job progress. At least, it was odd to me until I remembered that we started quantising our checkpoints. My guess is we forgot to update this test.
ccl/streamingccl/streamingest.TestStreamingReplanOnLag failed on master @ f0116ea373a2b87155e7f0264df4f783ce177360:
Parameters:
This test is very flaky. See cockroachdb#120688 Epic: none Release note: None
120419: sqlstats: simplify transaction latency test r=abarganier,xinhaoz a=dhartunian

Remove the need for a test case counter, which causes a data race. Fixes: #119580 Epic: None Release note: None

120633: physicalplan: bias towards streaks in bulk planning r=dt a=dt

The bulk oracle is used when planning large bulk jobs that are expected to involve many or all ranges in a cluster, where all nodes are likely to be assigned a large number of spans, and the overall plan and the specs that represent it will include a very large number of distinct spans. These large numbers of distinct spans in the specs can increase the cost of executing such a plan. In particular, processes that maintain a frontier of spans processed or not processed, or the times at which they are processed, such as CDC and PCR, have to track far more distinct spans in large clusters.

We can, however, in some cases reduce this number of distinct spans by biasing the assignment of key ranges to nodes during replica selection to pick the same node for sequential ranges. By assigning, say, ten spans to node one, then ten to node two, then ten to node three, each node may end up tracking only one logical span that is 10x wider, instead of ten distinct spans.

We can bias towards such streaks only when the candidate replicas for a span include one on the node that would extend the streak, so this is an opportunistic optimization that depends on replica placement making it an option. Additionally, we need to be careful when applying such a bias that we still *distribute* work roughly evenly to achieve our desired overall utilization of the cluster. Thus we only bias towards streaks when the streak length is short, or when the node on which we are extending a streak remains within some multiple of the least-assigned node, reverting to the normal random selection if this is not the case. Release note: none. Epic: none.

120766: sql: increase raft command size limit for some tests r=DrewKimball a=DrewKimball

The tests `TestLargeDynamicRows` and `TestLogic_upsert_non_metamorphic` occasionally flake because they set the raft command size limit to the minimum `4MiB`, and their batch size limiting is inexact. This commit prevents the flake by increasing the limit to `5MiB`. Making the batch size limit exact will still be tracked by #117070. Informs #117070 Release note: None

120769: sqlstats: skip TestSQLStatsCompactor r=abarganier a=dhartunian

Release note: None

120781: streamingest: skip `TestStreamingReplanOnLag` r=rail a=rickystewart

This test is very flaky. See #120688 Epic: none Release note: None

Co-authored-by: David Hartunian <davidh@cockroachlabs.com> Co-authored-by: David Taylor <tinystatemachine@gmail.com> Co-authored-by: Drew Kimball <drewk@cockroachlabs.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
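The streak bias described in the 120633 commit message above can be sketched as follows. This is a simplified model, not the real bulk oracle: `assignStreaky` and `maxSkew` are hypothetical names, the fallback picks the least-loaded node as a stand-in for random selection, and the real implementation also checks that a candidate replica actually exists on the streak-extending node.

```go
package main

import "fmt"

// assignStreaky assigns consecutive ranges to nodes, preferring to extend
// the previous node's streak as long as that node's load stays within
// maxSkew times the least-loaded node's load. Sketch only; see lead-in.
func assignStreaky(numRanges, numNodes int, maxSkew float64) []int {
	loads := make([]int, numNodes)
	out := make([]int, numRanges)
	prev := 0
	for i := range out {
		// Find the current minimum load across nodes.
		min := loads[0]
		for _, l := range loads {
			if l < min {
				min = l
			}
		}
		pick := prev
		// Extend the streak only while the previous node is not too far
		// ahead of the least-loaded node; otherwise fall back.
		if float64(loads[prev]) > maxSkew*float64(min+1) {
			for n, l := range loads {
				if l == min {
					pick = n
					break
				}
			}
		}
		loads[pick]++
		out[i] = pick
		prev = pick
	}
	return out
}

func main() {
	// Consecutive ranges land on the same node in runs, so a frontier
	// tracking contiguous spans sees fewer, wider entries.
	fmt.Println(assignStreaky(12, 3, 2.0))
}
```

With the skew cap in place, streaks stay bounded and every node still receives work, which is the balance-versus-span-count trade-off the commit message describes.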
@dt I downgraded this for now, but I think we skipped this after the quantization PR went in. I think some of your subsequent updates probably fixed this and we can unskip it now.
Fixes cockroachdb#120688 Release note: none
ccl/streamingccl/streamingest.TestStreamingReplanOnLag failed on master @ a2f1f379ee52ceee2b2aa6769f1fa162f8d6b8a7:
Parameters:
attempt=1
run=1
shard=15
See also: How To Investigate a Go Test Failure (internal)
Jira issue: CRDB-36820