Skip to content

Conversation

@spilchen
Copy link
Contributor

@spilchen spilchen commented Nov 5, 2025

Adds a new taskset package that provides task coordination for a number of workers. TaskSet hands out integer identifiers (TaskIDs) that workers claim and process, with the caller responsible for mapping TaskIDs to actual work (file indices, key ranges, batches, etc.).

Features round-robin initial distribution across workers and locality when getting a task ID. Not thread-safe; callers provide external synchronization.

This will be used by the new distributed merge pipeline.

Closes #156578

Epic: CRDB-48845
Release note: none

Co-authored by: @jeffswenson

@spilchen spilchen self-assigned this Nov 5, 2025
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@spilchen spilchen force-pushed the gh-156578/251105/1047/taskset-helper/pr-ready branch from 5d0b13b to 5e93754 Compare November 5, 2025 15:41
}

// Verify that tasks are claimed sequentially.
require.Equal(t, []TaskID{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, found)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat! It's cool to see this case got improved while keeping the same overall API.

@spilchen spilchen marked this pull request as ready for review November 5, 2025 18:09
@spilchen spilchen requested a review from mw5h November 5, 2025 18:09
Copy link
Contributor

@mw5h mw5h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@mw5h reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @spilchen)

@spilchen
Copy link
Contributor Author

spilchen commented Nov 6, 2025

TFTR!

bors r+

craig bot pushed a commit that referenced this pull request Nov 6, 2025
154467: roachprod: azure GetVMSpecs() implementation r=DarrylWong a=williamchoe3

Informs #146202 

Implementation for previously unimplemented interface method `GetVMSpecs()` for Azure. This will allow for cluster information to be fetched in roachtest via `FetchVMSpecs`

Also changed `azureIDPattern` to match the optional type name pairs that may exist in a fully qualified azure resource id. See comments for more details.  Previously, if these optional type name pairs existed, they were just a part of the `resourceName` field. Currently, I didn't see any of these child resource ids being parsed, but if we were to ever parse those in the future the previous implementation wouldn't make sense.
* Note: Although the regex selects child resource type and names in the last matching group, they are not saved to AzureId struct because they are not being used

For Azure, `client.Get(...)` allows you to pass in an optional `expander` parameter which seems to provide additional information for `"instanceView"` and runs an optionally passed in user script (passed in on vm creation) when `"userData"` is passed in.
*  `"userData"` doesn't seem relevant for us "retrieves the UserData property as part of the VM model view that was provided by the user during the VM Create/Update operation" https://learn.microsoft.com/en-us/rest/api/compute/virtual-machines/get?view=rest-compute-2025-04-01&tabs=HTTP
  *  "The `instanceView` of an Azure Virtual Machine (VM) provides a snapshot of the runtime properties and status of the VM" https://learn.microsoft.com/en-us/rest/api/compute/virtual-machines/instance-view?view=rest-compute-2025-04-01&tabs=HTTP Seems nice, but there's no diff in the json returned by `"instanceView"` and the default when `expander` is not set (referred to `model view` in some places on thier docs) 
```go
InstanceViewTypesInstanceView InstanceViewTypes = "instanceView"  
InstanceViewTypesUserData InstanceViewTypes = "userData"
```


IBM PR: #155368


155368: roachprod: implement GetVMSpecs for IBM r=DarrylWong,golgeek a=williamchoe3

Informs #146202

Implementation for previously unimplemented interface method GetVMSpecs() for IBM. This will allow for cluster information to be fetched in roachtest via FetchVMSpecs
* API: https://cloud.ibm.com/apidocs/resource-controller/resource-controller?code=go#get-resource-instance 

Also added comments to help contextualize some parts that I thought would be helpful while doing some debug.

See comment for e.g. vm specs


156504: sql: enable test tenants r=yuzefovich a=yuzefovich

**sql: remove some redundant kv.TestingIsRangeLookupRequest calls in tests**

These were redundant because we have GetRequests in all call sites.

**sql: enable test tenants**

Some tests have been adjusted to work with test tenants, others have specific issues to investigate further.

Fixes: #143114
Epic: CRDB-48945

156547: sql: add ColumnGeneratedAsIdentity as a new schema change element r=shghasemi a=shghasemi

Previously, GeneratedAsIdentity was part of the Column element.
Creating a new schema change element will allow the declarative schema changer
to add/remove/update such columns without dropping the Column element.

Epic: [CRDB-31283](https://cockroachlabs.atlassian.net/browse/CRDB-31283)

Part of: #142918

Release note: None

156940: util/taskset: add generic work distribution mechanism r=spilchen a=spilchen

Adds a new taskset package that provides task coordination for a number of workers. TaskSet hands out integer identifiers (TaskIDs) that workers claim and process, with the caller responsible for mapping TaskIDs to actual work (file indices, key ranges, batches, etc.).

Features round-robin initial distribution across workers and locality when getting a task ID. Not thread-safe; callers provide external synchronization.

This will be used by the new distributed merge pipeline.

Closes #156578

Epic: CRDB-48845
Release note: none

Co-authored by: `@jeffswenson`

156987: sql: remove now-stale growstack for internal executor r=yuzefovich a=yuzefovich

As of 4a5c503, we now grow stack unconditionally for all async tasks. Some time ago we added an ability to growstack in the async task of the internal executor when used in the LDR, and now that has become redundant, so we remove this special case.

Epic: None
Release note: None

Co-authored-by: William Choe <williamchoe3@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Shadi Ghasemitaheri <shadi.ghasemitaheri@cockroachlabs.com>
Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Nov 6, 2025

Build failed (retrying...):

craig bot pushed a commit that referenced this pull request Nov 6, 2025
155368: roachprod: implement GetVMSpecs for IBM r=DarrylWong,golgeek a=williamchoe3

Informs #146202

Implementation for previously unimplemented interface method GetVMSpecs() for IBM. This will allow for cluster information to be fetched in roachtest via FetchVMSpecs
* API: https://cloud.ibm.com/apidocs/resource-controller/resource-controller?code=go#get-resource-instance 

Also added comments to help contextualize some parts that I thought would be helpful while doing some debug.

See comment for e.g. vm specs


156504: sql: enable test tenants r=yuzefovich a=yuzefovich

**sql: remove some redundant kv.TestingIsRangeLookupRequest calls in tests**

These were redundant because we have GetRequests in all call sites.

**sql: enable test tenants**

Some tests have been adjusted to work with test tenants, others have specific issues to investigate further.

Fixes: #143114
Epic: CRDB-48945

156940: util/taskset: add generic work distribution mechanism r=spilchen a=spilchen

Adds a new taskset package that provides task coordination for a number of workers. TaskSet hands out integer identifiers (TaskIDs) that workers claim and process, with the caller responsible for mapping TaskIDs to actual work (file indices, key ranges, batches, etc.).

Features round-robin initial distribution across workers and locality when getting a task ID. Not thread-safe; callers provide external synchronization.

This will be used by the new distributed merge pipeline.

Closes #156578

Epic: CRDB-48845
Release note: none

Co-authored by: `@jeffswenson`

Co-authored-by: William Choe <williamchoe3@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Nov 6, 2025

Build failed (retrying...):

craig bot pushed a commit that referenced this pull request Nov 6, 2025
156940: util/taskset: add generic work distribution mechanism r=spilchen a=spilchen

Adds a new taskset package that provides task coordination for a number of workers. TaskSet hands out integer identifiers (TaskIDs) that workers claim and process, with the caller responsible for mapping TaskIDs to actual work (file indices, key ranges, batches, etc.).

Features round-robin initial distribution across workers and locality when getting a task ID. Not thread-safe; callers provide external synchronization.

This will be used by the new distributed merge pipeline.

Closes #156578

Epic: CRDB-48845
Release note: none

Co-authored by: `@jeffswenson`

Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Nov 6, 2025

Build failed:

@spilchen
Copy link
Contributor Author

bors retry

craig bot pushed a commit that referenced this pull request Nov 17, 2025
156940: util/taskset: add generic work distribution mechanism r=spilchen a=spilchen

Adds a new taskset package that provides task coordination for a number of workers. TaskSet hands out integer identifiers (TaskIDs) that workers claim and process, with the caller responsible for mapping TaskIDs to actual work (file indices, key ranges, batches, etc.).

Features round-robin initial distribution across workers and locality when getting a task ID. Not thread-safe; callers provide external synchronization.

This will be used by the new distributed merge pipeline.

Closes #156578

Epic: CRDB-48845
Release note: none

Co-authored by: `@jeffswenson`

Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Nov 17, 2025

Build failed:

@spilchen
Copy link
Contributor Author

The lint failed because of "The self-hosted runner lost communication with the server"

bors retry

@spilchen
Copy link
Contributor Author

bors retry

Adds a new taskset package that provides task coordination for a number of
workers. TaskSet hands out integer identifiers (TaskIDs) that workers claim and
process, with the caller responsible for mapping TaskIDs to actual work (file
indices, key ranges, batches, etc.).

Features round-robin initial distribution across workers and locality when
getting a task ID. Not thread-safe; callers provide external synchronization.

This will be used by the new distributed merge pipeline.

Closes cockroachdb#156578

Epic: CRDB-48845
Release note: none

Co-authored by: @jeffswenson
@spilchen spilchen force-pushed the gh-156578/251105/1047/taskset-helper/pr-ready branch from 5e93754 to ac82256 Compare November 17, 2025 13:35
@spilchen
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Nov 17, 2025

@craig craig bot merged commit 9510b99 into cockroachdb:master Nov 17, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

util/taskset: introduce lightweight distributed task scheduler

4 participants