
downloadOp leads to meta test timeouts #3430

Closed
RaduBerinde opened this issue Mar 20, 2024 · 3 comments · Fixed by #3445

RaduBerinde commented Mar 20, 2024

We have seen meta tests timing out since the introduction of downloadOp (e.g. #3426).

With the patch below, we see cases where downloadOp takes 20+ seconds.

--- a/metamorphic/meta.go
+++ b/metamorphic/meta.go
@@ -609,7 +609,11 @@ func Execute(m *Test) error {
                                        }
                                }
 
+                               start := time.Now()
                                m.ops[idx].run(m, m.h.recorder(t, idx, nil /* optionalRecordf */))
+                               if delta := time.Since(start); delta > 10*time.Second {
+                                       panic(fmt.Sprintf("%s took %s", m.ops[idx].String(), delta))
+                               }
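
For reference, here is a self-contained sketch of the same timing guard outside the metamorphic harness; runTimed and slowOpThreshold are illustrative names, not part of the metamorphic package:

package main

import (
	"fmt"
	"time"
)

// slowOpThreshold mirrors the 10s cutoff used in the patch above.
const slowOpThreshold = 10 * time.Second

// runTimed executes op and, as in the patch, panics if it runs longer than
// slowOpThreshold.
func runTimed(name string, op func()) {
	start := time.Now()
	op()
	if delta := time.Since(start); delta > slowOpThreshold {
		panic(fmt.Sprintf("%s took %s", name, delta))
	}
}

func main() {
	// Stand-in for m.ops[idx].run(...); a real downloadOp was observed to
	// take 20+ seconds, which would trip the panic above.
	runTimed("downloadOp", func() { time.Sleep(10 * time.Millisecond) })
	fmt.Println("op completed within threshold")
}
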
RaduBerinde (Member Author) commented:

#3431 temporarily disables download ops.

RaduBerinde (Member Author) commented:

The problem seems to be that we're constantly kicking off compactions but they're not the ones we want:

downloadSpan select len(downloads)=1

// INFO: [JOB 1142] sstable deleted 001189
// INFO: [JOB 1142] sstable deleted 001194
// INFO: [JOB 1142] sstable deleted 001222
// INFO: [JOB 1143] compacting(move) L3 [001223] (600B) Score=7.84 + L4 [] (0B) Score=6.48; OverlappingRatio: Single 0.00, Multi 0.00
// INFO: [JOB 1143] compacted(move) L3 [001223] (600B) Score=7.84 + L4 [] (0B) Score=6.48 -> L4 [001223] (600B), in 0.0s (0.0s total), output rate 12MB/s

downloadSpan select len(downloads)=1

// INFO: [JOB 1144] compacting(move) L2 [000977] (599B) Score=7.42 + L3 [] (0B) Score=7.12; OverlappingRatio: Single 0.00, Multi 0.00
// INFO: [JOB 1144] compacted(move) L2 [000977] (599B) Score=7.42 + L3 [] (0B) Score=7.12 -> L3 [000977] (599B), in 0.0s (0.0s total), output rate 59MB/s
// INFO: [JOB 1145] compacting(move) L3 [000974] (599B) Score=7.54 + L4 [] (0B) Score=6.74; OverlappingRatio: Single 0.00, Multi 0.00
// INFO: [JOB 1145] compacted(move) L3 [000974] (599B) Score=7.54 + L4 [] (0B) Score=6.74 -> L4 [000974] (599B), in 0.0s (0.0s total), output rate 28MB/s

downloadSpan select len(downloads)=1

// INFO: [JOB 1146] compacting(default) L4 [001144] (855B) Score=7.00 + L5 [] (0B) Score=6.97; OverlappingRatio: Single 0.00, Multi 0.00
// INFO: [JOB 1146] compacting: sstable created 001243
// INFO: [JOB 1146] compacted(default) L4 [001144] (855B) Score=7.00 + L5 [] (0B) Score=6.97 -> L5 [001243] (855B), in 0.0s (0.0s total), output rate 523KB/s

downloadSpan select len(downloads)=1

// INFO: [JOB 1147] compacting(move) L5 [000811] (825B) Score=7.19 + L6 [] (0B) Score=0.27; OverlappingRatio: Single 0.00, Multi 0.00
// INFO: [JOB 1147] compacted(move) L5 [000811] (825B) Score=7.19 + L6 [] (0B) Score=0.27 -> L6 [000811] (825B), in 0.0s (0.0s total), output rate 66MB/s

downloadSpan select len(downloads)=1

// INFO: [JOB 1148] compacting(move) L1 [000980] (598B) Score=6.75 + L2 [] (0B) Score=6.73; OverlappingRatio: Single 0.00, Multi 0.00
// INFO: [JOB 1148] compacted(move) L1 [000980] (598B) Score=6.75 + L2 [] (0B) Score=6.73 -> L2 [000980] (598B), in 0.0s (0.0s total), output rate 54MB/s

downloadSpan select len(downloads)=1

// INFO: [JOB 1149] compacting(move) L2 [000978] (598B) Score=7.42 + L3 [] (0B) Score=7.24; OverlappingRatio: Single 0.00, Multi 0.00
// INFO: [JOB 1149] compacted(move) L2 [000978] (598B) Score=7.42 + L3 [] (0B) Score=7.24 -> L3 [000978] (598B), in 0.0s (0.0s total), output rate 58MB/s
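
To make the starvation pattern in these logs concrete, here is a toy illustration (not Pebble's actual picker code; all names and fields are hypothetical) of a picker that considers score-based compactions before download compactions: as long as the workload keeps producing score-based work, the download branch is never reached.

package main

import "fmt"

// toyPicker is a deliberately simplified model of the compaction picker's
// ordering.
type toyPicker struct {
	pendingDownloads int
}

// workloadProducesScoreBasedWork models a running workload that keeps some
// level over its score threshold, so a score-based or move compaction is
// always available.
func (p *toyPicker) workloadProducesScoreBasedWork() bool { return true }

// pickOne checks score-based compactions first and downloads last, mirroring
// the ordering described in the next comment.
func (p *toyPicker) pickOne() string {
	if p.workloadProducesScoreBasedWork() {
		return "score-based/move compaction"
	}
	if p.pendingDownloads > 0 {
		p.pendingDownloads--
		return "download compaction"
	}
	return "nothing to do"
}

func main() {
	p := &toyPicker{pendingDownloads: 1}
	for i := 0; i < 5; i++ {
		fmt.Printf("pick %d: %s (downloads still pending: %d)\n", i, p.pickOne(), p.pendingDownloads)
	}
}
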

RaduBerinde (Member Author) commented:

In maybeScheduleCompactionPicker we call maybeScheduleDownloadCompaction last, so it only runs if there are no other compactions we could do. I think that if there is a running workload, it's conceivable that you'd always find some other compaction to perform.

I hesitate to move it up because we might get the opposite problem. Perhaps, given that these "compactions" don't use much CPU, we could have a separate concurrency count for them: we'd allow maxConcurrentCompactions regular compactions plus up to maxConcurrentCompactions download compactions.
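
A hedged sketch of what a separate concurrency count could look like (the struct, field, and method names here are hypothetical, not Pebble's API):

package main

import "fmt"

// scheduler tracks regular and download compactions against separate budgets.
type scheduler struct {
	maxConcurrentCompactions int // existing limit for regular compactions
	maxConcurrentDownloads   int // hypothetical separate limit for download compactions

	runningCompactions int
	runningDownloads   int
}

// canStartCompaction charges regular (score-based, move, etc.) compactions
// against the regular budget only.
func (s *scheduler) canStartCompaction() bool {
	return s.runningCompactions < s.maxConcurrentCompactions
}

// canStartDownload charges download compactions against their own budget, so
// score-based work can no longer crowd them out.
func (s *scheduler) canStartDownload() bool {
	return s.runningDownloads < s.maxConcurrentDownloads
}

func main() {
	s := &scheduler{
		maxConcurrentCompactions: 3,
		maxConcurrentDownloads:   3,
		runningCompactions:       3, // regular budget exhausted
	}
	fmt.Println("regular compaction allowed:", s.canStartCompaction()) // false
	fmt.Println("download compaction allowed:", s.canStartDownload())  // true
}
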

Storage automation moved this from Incoming to Done Mar 26, 2024