Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keyvisjob: key visualizer job should set MarkIdle to true #101539

Closed
adityamaru opened this issue Apr 14, 2023 · 3 comments · Fixed by #101750
Closed

keyvisjob: key visualizer job should set MarkIdle to true #101539

adityamaru opened this issue Apr 14, 2023 · 3 comments · Fixed by #101750
Assignees
Labels
A-kv-observability C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@adityamaru
Copy link
Contributor

adityamaru commented Apr 14, 2023

The keyviz job is a forever-running, background job that should not prevent serverless from winding down the SQL pod even when the job is running. This is indicated to the jobs registry by marking the job as idle, see

r.job.MarkIdle(true)
as an example.

Jira issue: CRDB-27000

@adityamaru adityamaru added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-observability labels Apr 14, 2023
@adityamaru adityamaru added this to Triage in Cluster Observability via automation Apr 14, 2023
@zachlite
Copy link
Contributor

zachlite commented Apr 17, 2023

@adityamaru IIUC, The key vis job doesn't run in SQL pods. The job record is created in a migration only for system tenants.

@adityamaru
Copy link
Contributor Author

Ahh I see that makes sense, my reasoning in the issue for marking this job as idle is then incorrect. However, I believe @tbg is using this metric to annotate prom graphs with jobs that are actively doing work, see https://cockroachlabs.slack.com/archives/C0514GA30G3/p1681477306778839?thread_ts=1681458391.221229&cid=C0514GA30G3 for context. In the future we may partition user-created jobs from background service jobs / forever running jobs and subsequently separate the metrics they increment, but for the time being can we toggle MarkIdle based on if the keyviz job is actually doing work or just idling?

@zachlite
Copy link
Contributor

Sure. I'll get back to you with a PR.

craig bot pushed a commit that referenced this issue Apr 18, 2023
98762: kvserver: remove below-raft PreIngestDelay during SST application r=erikgrinaker a=tbg

**This is for 23.2 only**

We shouldn't artificially delay SST ingestion below raft because this
exacerbates memory pressure (#81834).

Anecdotally I see in my "typical experiment" (restores under I/O
bandwidth constraints) that `PreIngestDelay` is mostly fairly small
compared to the slowness that comes from write bandwidth overload
itself, so at least in those experiments removing this has little
to no effect.

As we are also working on replication admission control[^1] and are
looking into improving the memory footprint under unchecked overload[^2]
now's a good time to rip this out as we'll be in a good place to address
any detrimental fallout from doing so.

[^1]: #95563
[^2]: #98576

Touches #81834.
Fixes #57247.

Epic: CRDB-25503
Release note: None

101381: ui: add trace rate option to stmt diagnostics r=maryliag a=maryliag

Fixes #92415

Add option to select the trace rate for statement
diagnostics collection directly on the Console.

https://www.loom.com/share/beaf1ce16f7d4efc845056e33cb3bee1

<img width="587" alt="Screenshot 2023-04-12 at 4 09 50 PM" src="https://user-images.githubusercontent.com/1017486/231573175-e05ea6dd-d03d-4044-ae53-bb4cba55bb77.png">

<img width="591" alt="Screenshot 2023-04-12 at 4 10 00 PM" src="https://user-images.githubusercontent.com/1017486/231573195-5189fee3-8af2-4ada-af1b-8cc6fde5ceb2.png">


Release note (ui change): Add option to select the trace rate for statement diagnostics collection.

101650: tenantcostclient: mark test as being `no-remote` r=rail a=rickystewart

This test is broken under remote execution.

Epic: CRDB-17165
Release note: None

101706: ui: update side nav and titles to match r=maryliag a=maryliag

Previously the values for Advanced Debug (side nav) and Advanced Debugging (page title) were not matching. This commit uses the name "`Advanced Debug` for both.

Similarly, we were using Network Latency on the side nav and Network Diagnostics on the page title. Since we might want to show more than just latency, this commit updates the title to the more generic `Network`, to match how we name other pages (e.g. SQL Activity, Database, etc).

This commit also removed an extra space on the filter on the Network page.

Before
<img width="968" alt="Screenshot 2023-04-17 at 6 02 56 PM" src="https://user-images.githubusercontent.com/1017486/232657702-c399f8b0-9755-4aff-8043-c1753c7ad913.png">


After
<img width="967" alt="Screenshot 2023-04-17 at 10 43 18 PM" src="https://user-images.githubusercontent.com/1017486/232657728-7b907868-abc4-4926-80c8-5cf616ee2d58.png">


Epic: none

Release note (ui change): Update Network Latency side nav name and Network Diagnostics page title to `Network`. Update the Advanced Debugging page title to `Advanced Debug`.

101750: keyvisjob: mark the keyvis job as idle when it is not doing useful work r=zachlite a=zachlite

Resolves #101539
Epic: none
Release note: None

Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: maryliag <marylia@cockroachlabs.com>
Co-authored-by: Ricky Stewart <rickybstewart@gmail.com>
Co-authored-by: Zach Lite <zach@cockroachlabs.com>
@craig craig bot closed this as completed in d51b0ea Apr 18, 2023
Cluster Observability automation moved this from Triage to Done Apr 18, 2023
blathers-crl bot pushed a commit that referenced this issue Apr 18, 2023
blathers-crl bot pushed a commit that referenced this issue Apr 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-observability C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants