fix: require connected and approved nodes for scheduling #3957

frrist · 2024-04-25T17:43:54Z

This change modifies the Requester nodes scheduling constraints s.t. jobs will only be scheduled on nodes that are online and approved. Disconnected nodes and nodes that are rejected or pending will not be eligible to run jobs.
Additionally, this change cleans up some code by making constraints an parameter to the node selector - which simplify various parts of dependency construction.
Lastly, this change removes some *Param types to avoid the possibility of NPD.
fixes The requester node attempts to schedule work on disconnected nodes resulting in the job never running #3784

Summary by CodeRabbit

Refactor
- Improved the initialization and configuration process for job schedulers and node selectors for better performance and maintainability.
- Streamlined node selection logic by removing redundant parameters and simplifying method signatures.
- Renamed several methods and fields to enhance clarity and consistency across the application.
New Features
- Enhanced node selection capabilities with additional constraints to better tailor node selection to specific job requirements.

coderabbitai · 2024-04-25T17:44:01Z

Important

Auto Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This set of changes focuses on restructuring the node selection and job scheduling logic across various components. Key modifications include simplifying interfaces by removing constraints parameters, updating method signatures, and enhancing code structure by passing parameters directly to constructors, thereby streamlining the setup and execution processes.

Changes

File Path	Change Summary
`pkg/node/requester.go`	Reorganized initialization of `nodeSelector` and `batchServiceJobScheduler`; enhanced selector constraints.
`pkg/orchestrator/interfaces.go` `pkg/orchestrator/mocks.go` `pkg/orchestrator/selection/selector/node_selector.go`	Modified `NodeSelector` interface and related mocks; updated method signatures and internal logic.
`pkg/orchestrator/scheduler/...`	Refactored scheduler constructors and methods across multiple files; updated field and method usage.

Possibly related issues

bacalhau-project/expanso-planning#410: The changes in this PR could streamline the scheduling process, potentially addressing the inefficiency issues highlighted in the proposal for a more efficient scheduling approach.
bacalhau-project/expanso-planning#405: Improvements in node selection and job execution handling could enhance the reliability and accuracy of job status updates, which is a concern in this issue.

🐇🌟
A hop and a skip, the code takes a leap,
With selectors so sleek, the nodes they will peep.
Scheduler's new dance, with parameters neat,
A whirl and a prance, efficiency we greet!
🥕🍃

Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai generate interesting stats about this repository and render them as a table.
- @coderabbitai show all the console.log statements in this repository.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

frrist · 2024-04-25T19:37:33Z

@coderabbitai review

frrist · 2024-04-25T20:28:24Z

pkg/test/requester/node_selection_test.go

-	requesterConfig, err := node.NewRequesterConfigWith(
-		node.RequesterConfigParams{
-			NodeRankRandomnessRange: 0,
-			OverAskForBidsFactor:    1,
-		},
-	)


No ides how this ever worked before NodeRankRandomnessRange cannot be 0 or the code that uses it panics, somehow it never panicked, meaning the setting of these values were likely ineffectual

frrist · 2024-04-25T20:28:29Z

pkg/test/requester/retries_test.go

-	requesterConfig, err := node.NewRequesterConfigWith(
-		node.RequesterConfigParams{
-			NodeRankRandomnessRange: 0,
-			OverAskForBidsFactor:    1,
-		},
-	)


No ides how this ever worked before NodeRankRandomnessRange cannot be 0 or the code that uses it panics, somehow it never panicked, meaning the setting of these values were likely ineffectual

wdbaruni

To make reviewing the PRs easier, avoid coupling refactoring with actual changes the PR is mainly about, such as fixing node selection in this case

wdbaruni · 2024-05-02T11:28:12Z

pkg/node/requester.go

-		NodeSelector:  nodeSelector,
-		RetryStrategy: retryStrategy,
-	})
+	batchServiceJobScheduler := scheduler.NewBatchServiceJobScheduler(jobStore, planners, nodeSelector, retryStrategy)


Keeping params in the constructor helps maintain code stability and makes the overall code more maintainable. This ensures that constructor signatures and tests do not require changes when new dependencies are introduced. This also makes the code more readable even if we fail to provide clear field names

I generally get where you are coming from - maintainable code is great! But please hear me out here:

Keeping params in the constructor helps maintain code stability and makes the overall code more maintainable.

I'd argue it does the opposite: the params pattern doesn't provide compile-time safety. Additionally, whenever a new dependency is introduced, we need to modify the the constructor, the call sites, and the params type.

Comparing this with declaring dependencies in the function signature: It provides compile-time safety, and reduces the number of modifications we need to make when introducing new dependencies. We only need to update the constructor and the call sites. (And the compiler will complain very loudly in places we don't, instead of panicking later when the application runs - this is what soured my additional refactor here.)

This ensures that constructor signatures and tests do not require changes when new dependencies are introduced

I don't follow. Constructors and tests still need to change when new dependencies are introduce - all fields are required and therefore must be provided.

This also makes the code more readable even if we fail to provide clear field names

I'll acknowledge we have a different in opinions here. Personally I want to know the type of something rather than its name in a structure.

Given the points regarding of compile time safety and reduced maintenance overhead, would you be alright moving forward with this code as is?

Having methods or constructors with long parameters is by standard a bad code design in the industry. First time I hear a discussion favouring long parameters :)

I hear your point about wanting to fail early at compile-time. It is all about tradeoffs. An example, you have picked fx as our dependency injection framework, and that will move a lot of compile-time failures to runtime. You have prioritized code readability, maintainability and testability in favour of compile-time checks, and that is okay as the gains are worth the cost.

Cons of long params include:

Increased error potential as more params increase the likelihood of passing arguments in the wrong order

Poor readability as methods with many params are harder to read and understand at a glance, without having to open the method implementation to understand what param does. The IDEs might help, but not always

Optional fields still need to be passed with their zero values

Harder to maintain as adding or modifying parameters requires changes in all your method calls. This is where you have a different opinion that you prefer the compiler to fail and force the developer to update all calls. Yes, initialization oversight is a true problem with using a single param object. Though in practice, it still makes the code more maintainable for the follow reasons:

Compile time are not sufficient to validate the correct initialization of types. They are great to fail fast, but we shouldn't assume if the compiler is not complaining that we don't need to do runtime validation or that we won't get NPE. This is an example where param object can help with runtime validation. I am still saying that failing at compile-time is great, but it is not the only criteria we should evaluate tradeoffs at. Runtime validation during node startup can be as good.

Usually constructors/methods are called once or very few times in production code, but multiple times in tests to test different branches and combinations. In practice, and with good tests, I see the parameter object is initialized once with default testing values, and then different tests override certain values that they want to test. This makes tests more maintainable and readable compared to having to modify tens of method signatures and that is before even adding any new tests

A lot of fields and configurations are exposed in the constructor mainly for testing purposes with fallback default values if not configured. e.g. allow to inject a clock to have more predictable tests. Having a param object enables this without complicating the type initialization

A bonus is fx seem to work well with param objects out of the box https://uber-go.github.io/fx/parameter-objects.html#using-parameter-objects

Overall I feel silly blocking the PR because of this code style, specially that the number of parameters (4) is still manageable. My concerns are I feel you are planning to change this code design in more places, I expect the scheduler to have more params soon as I work on the queueing, and our code is still in its early stages and more parameters might be added in places other than the scheduler.

There is certainly a limit to consider when declaring function parameters, and when encapsulating them starts to make sense. I completely agree that this isn't a black and white decision - nuance is required depending on the specific context.

A lot of fields and configurations are exposed in the constructor mainly for testing purposes with fallback default values if not configured

In this case I'd prefer to use functional options (we do a good job with this in some places already) as it makes it clearer when parameters are optional vs required. It's also easy to extend later with more options.

My concerns are I feel you are planning to change this code design in more places

I'll probably make a change like this again in the future if it makes sense while refactoring code. But I don't intend to remove param arguments everywhere! In this specific case 4 felt manageable enough to make the change.

Since it seems we are in agreement about 4 arguments being manageable here, can this get a ✅? 🙏

- fixes #3784

- cleans up some of the setup

- previously the test set NodeRankRandomnessRange to 0 which ought to have paniced. But somehow it didn't? Addtionally the test produced a requester config that had an 'unknown' membership state, now the state is the default value: `Approved`. This change addess both the potential for a panic, and sets the correct membership status. - More generlly, using mergo.Merge to reconcile config files is a shitty fucking pattern and needs to be purged from the codebase. It is far simpler to start with a default config and modify the fields that need to change for the test, instead of this overly complex circus fluster cluck that I just wasted 20mins of my life debugging. /rant

- Ensures a request config is produced for testing that will not cause panics and has a default node membership state of 'Approved'. - ffs same bug as previous commit, wonder how many more instances of this there are. *sobs into keyboard*

frrist self-assigned this Apr 25, 2024

frrist force-pushed the frrist/requester/fix-scheduling branch 2 times, most recently from fc65c1b to 3de4a9f Compare April 25, 2024 19:34

frrist requested a review from wdbaruni April 25, 2024 19:37

frrist commented Apr 25, 2024

View reviewed changes

wdbaruni requested changes May 2, 2024

View reviewed changes

wdbaruni approved these changes May 6, 2024

View reviewed changes

frrist force-pushed the frrist/requester/fix-scheduling branch from dab5242 to aace34c Compare May 6, 2024 15:18

frrist enabled auto-merge (squash) May 6, 2024 15:19

frrist added 4 commits May 8, 2024 10:07

fix: require connected and approved nodes for scheduling

68866fc

- fixes #3784

refactor: selection constraints are an attribute of the NodeSelector

4421492

- cleans up some of the setup

fix: RetriesSuite uses a valid config

87f1126

- Ensures a request config is produced for testing that will not cause panics and has a default node membership state of 'Approved'. - ffs same bug as previous commit, wonder how many more instances of this there are. *sobs into keyboard*

frrist force-pushed the frrist/requester/fix-scheduling branch from aace34c to 87f1126 Compare May 8, 2024 17:07

frrist merged commit 693c6d3 into main May 8, 2024
12 checks passed

frrist deleted the frrist/requester/fix-scheduling branch May 8, 2024 20:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: require connected and approved nodes for scheduling #3957

fix: require connected and approved nodes for scheduling #3957

frrist commented Apr 25, 2024 •

edited

Loading

coderabbitai bot commented Apr 25, 2024 •

edited

Loading

Auto Review Skipped

Chat

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configration File (`.coderabbit.yaml`)

Documentation and Community

frrist commented Apr 25, 2024

frrist Apr 25, 2024

frrist Apr 25, 2024

wdbaruni left a comment

wdbaruni May 2, 2024

frrist May 3, 2024

wdbaruni May 4, 2024

frrist May 4, 2024

frrist May 4, 2024

fix: require connected and approved nodes for scheduling #3957

fix: require connected and approved nodes for scheduling #3957

Conversation

frrist commented Apr 25, 2024 • edited Loading

Summary by CodeRabbit

coderabbitai bot commented Apr 25, 2024 • edited Loading

Auto Review Skipped

Walkthrough

Changes

Possibly related issues

Chat

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configration File (.coderabbit.yaml)

Documentation and Community

frrist commented Apr 25, 2024

frrist Apr 25, 2024

Choose a reason for hiding this comment

frrist Apr 25, 2024

Choose a reason for hiding this comment

wdbaruni left a comment

Choose a reason for hiding this comment

wdbaruni May 2, 2024

Choose a reason for hiding this comment

frrist May 3, 2024

Choose a reason for hiding this comment

wdbaruni May 4, 2024

Choose a reason for hiding this comment

frrist May 4, 2024

Choose a reason for hiding this comment

frrist May 4, 2024

Choose a reason for hiding this comment

frrist commented Apr 25, 2024 •

edited

Loading

coderabbitai bot commented Apr 25, 2024 •

edited

Loading

CodeRabbit Configration File (`.coderabbit.yaml`)