refactor: bidder to simplfiy exposing errors #3680

frrist · 2024-03-22T04:07:31Z

Refactors bidder to better expose errors
closes #3868

Summary by CodeRabbit

New Features
- Enhanced bidding strategies to better handle available capacity and resource availability.
- Introduced a new field to improve compute node bidding logic.
Refactor
- Streamlined bidding process by restructuring the Bidder struct and related methods.
- Simplified compute node setup with new bidder initialization function.
- Adjusted test setups and synchronization for improved testing reliability.
Bug Fixes
- Added error handling for job resource configuration parsing.
- Updated endpoint methods to properly handle bid completion and compute failures.
Chores
- Cleaned up import statements and added a TODO for documentation review.

coderabbitai · 2024-03-22T04:07:35Z

Important

Auto Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

This update streamlines the bidding and execution logic for compute resources, enhancing efficiency and clarity. It restructures the Bidder component, updates strategy implementations for resource availability, and improves error handling and status reporting mechanisms. Additionally, it addresses communication and recovery processes to ensure robust orchestration and execution of jobs.

Changes

| Files | Change Summary |
| --- | --- |
| `.../capacity_available_strategy.go`, `.../capacity_max_strategy.go` | Updated bidding logic based on resource availability, removing `ctx` parameter, and adding error handling. |
| `.../networking.go` | Added TODO for invalid documentation link. |
| `.../bidder.go`, `.../bidder_test.go` | Restructured `Bidder` and tests with new strategies and `UsageCalculator`; updated method parameters. |
| `.../endpoint.go` | Enhanced error handling and restructured calls for bidding based on job resource config. |
| `.../types.go` | Added `Wait` field to `BidResult` for better job bidding logic. |
| `.../compute.go` | Simplified `NewComputeNode` by abstracting bidder creation. |
| `.../endpoint.go` (`pkg/requester`) | Improved job status handling with added `Comment` field. |
| `.../async_bid_test.go`, `.../setup_test.go` | Adjusted test synchronization and channel initialization for better flow control. |

Related issues

bacalhau-project/expanso-planning#410: The PR's enhancements in bidding logic and efficiency could address concerns over legacy bidding semantics and inefficiency.
bacalhau-project/expanso-planning#179: By aligning resource allocation policies, this PR potentially resolves inconsistencies between bidding and execution components.
Requester node specifies ExecutionID/ComputeReference when creating an ExecutionState on Compute node(s) #2296: The standardized creation and handling of ExecutionIDs in this PR may streamline the process as intended in this issue.
bacalhau-project/expanso-planning#405: The improvements in job status communication between nodes could mitigate miscommunication issues outlined here.

🐇 A Poem by CodeRabbit

In the land of code and byte,
Where logic dances, and servers ignite,
We refined our bid, with might and main,
Streamlining processes, in our domain.
Through fields of data, we hop with ease,
🌟 Celebrating changes, in the digital breeze. 🌟
~ CodeRabbit 🐾

frrist · 2024-03-22T20:29:52Z

@coderabbitai review

coderabbitai

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 0948775 and 0a2b597.

Files selected for processing (11)

pkg/bidstrategy/resource/capacity_available_strategy.go (3 hunks)
pkg/bidstrategy/resource/capacity_max_strategy.go (2 hunks)
pkg/bidstrategy/semantic/networking.go (1 hunks)
pkg/compute/bidder.go (3 hunks)
pkg/compute/bidder_test.go (6 hunks)
pkg/compute/endpoint.go (2 hunks)
pkg/compute/types.go (2 hunks)
pkg/node/compute.go (3 hunks)
pkg/requester/endpoint.go (2 hunks)
pkg/test/compute/async_bid_test.go (3 hunks)
pkg/test/compute/setup_test.go (2 hunks)

Additional comments: 21

pkg/bidstrategy/semantic/networking.go (1)

20-20: The TODO comment regarding the invalid link in the docsLink constant is clear and actionable. It's good practice to track such TODOs in a project management tool or issue tracker to ensure they are addressed in a timely manner.

pkg/bidstrategy/resource/capacity_max_strategy.go (2)

5-5: The addition of the fmt import is appropriate for the functionality introduced in the ShouldBidBasedOnUsage method.

27-38: The refactored logic in the ShouldBidBasedOnUsage method, including the conditional check on resource availability and the construction of BidStrategyResponse, improves the clarity and functionality of the method.

pkg/bidstrategy/resource/capacity_available_strategy.go (3)

5-5: The addition of the fmt import is appropriate for the functionality introduced in the ShouldBidBasedOnUsage method.

22-22: The removal of the ctx parameter from NewAvailableCapacityStrategy simplifies the function signature, making it more straightforward to use.

35-46: The updated logic in the ShouldBidBasedOnUsage method, including the conditional check on available capacity and the construction of BidStrategyResponse, improves the clarity and functionality of the method.
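
The capacity check described in the comments above can be sketched roughly as follows. This is a minimal illustration only: `Resources`, `BidStrategyResponse`, and `shouldBidBasedOnUsage` are simplified stand-ins invented for this sketch, not the actual definitions in `pkg/bidstrategy`.

```go
package main

import "fmt"

// Resources is a hypothetical, trimmed-down resource model (assumption).
type Resources struct {
	CPU    float64 // cores
	Memory uint64  // bytes
}

// LessThanEq reports whether every dimension of r fits within other.
func (r Resources) LessThanEq(other Resources) bool {
	return r.CPU <= other.CPU && r.Memory <= other.Memory
}

// BidStrategyResponse is a hypothetical, simplified response type (assumption).
type BidStrategyResponse struct {
	ShouldBid bool
	Reason    string
}

// shouldBidBasedOnUsage bids only when the requested usage fits within the
// available capacity; otherwise it declines and explains why.
func shouldBidBasedOnUsage(requested, available Resources) BidStrategyResponse {
	if requested.LessThanEq(available) {
		return BidStrategyResponse{ShouldBid: true}
	}
	return BidStrategyResponse{
		ShouldBid: false,
		Reason: fmt.Sprintf("insufficient capacity: requested %+v, available %+v",
			requested, available),
	}
}

func main() {
	available := Resources{CPU: 2, Memory: 4 << 30}
	fits := shouldBidBasedOnUsage(Resources{CPU: 1, Memory: 1 << 30}, available)
	tooBig := shouldBidBasedOnUsage(Resources{CPU: 8, Memory: 1 << 30}, available)
	fmt.Println(fits.ShouldBid)
	fmt.Println(tooBig.ShouldBid, tooBig.Reason)
}
```

Returning the reason alongside the decision is what lets a caller surface a rejection message instead of a bare boolean.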

pkg/test/compute/async_bid_test.go (1)

75-106: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [81-125].

The introduction and usage of sync.WaitGroup instances (doneWg and bidSentWg) in the asynchronous bid tests improve synchronization and coordination, ensuring proper sequencing and handling of bid responses. This is a good practice for managing the flow of asynchronous operations in tests.
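
For readers unfamiliar with the pattern, here is a small sketch of coordinating asynchronous senders and a consumer with two `sync.WaitGroup` values. The names `bidSentWg` and `doneWg` come from the comment above; `runAsyncBids` and everything else is hypothetical and does not reproduce the actual test code.

```go
package main

import (
	"fmt"
	"sync"
)

// runAsyncBids is a hypothetical helper: it sends n bids from separate
// goroutines and returns how many were handled once everything finished.
func runAsyncBids(n int) int {
	var bidSentWg, doneWg sync.WaitGroup
	bids := make(chan int, n)
	handled := 0

	// Each sender marks itself done once its bid is on the channel.
	bidSentWg.Add(n)
	for i := 0; i < n; i++ {
		go func(id int) {
			defer bidSentWg.Done()
			bids <- id
		}(i)
	}

	// A single consumer drains the channel until it is closed.
	doneWg.Add(1)
	go func() {
		defer doneWg.Done()
		for range bids {
			handled++
		}
	}()

	bidSentWg.Wait() // every bid has been sent
	close(bids)      // safe: no sender is still running
	doneWg.Wait()    // the consumer has drained the channel
	return handled
}

func main() {
	fmt.Println(runAsyncBids(5))
}
```

Waiting on the senders before closing the channel avoids a send on a closed channel, and waiting on the consumer before reading `handled` avoids a data race.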

pkg/test/compute/setup_test.go (1)

71-71: Initializing the bidChannel with a buffer size of 1 is a good practice for preventing potential deadlocks in tests involving asynchronous operations. This allows for non-blocking sends to the channel, improving the reliability of the test setup.
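
As a rough illustration of what a buffer of size 1 does and does not buy, the sketch below uses a hypothetical `trySend` helper (not part of the PR) to show which sends would proceed without a receiver being ready.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether the value was
// accepted. It is a hypothetical helper used only for this illustration.
func trySend(ch chan string, v string) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan string)
	buffered := make(chan string, 1)

	fmt.Println(trySend(unbuffered, "bid")) // false: no receiver is waiting
	fmt.Println(trySend(buffered, "bid"))   // true: the buffer has one free slot
	fmt.Println(trySend(buffered, "bid"))   // false: the buffer is now full
	fmt.Println(<-buffered)                 // bid
}
```

Only the first send completes without a receiver; a second send would still block until the buffered value is read.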

pkg/compute/types.go (1)

146-146: The addition of the Wait field to the BidResult struct enhances the flexibility and control in the bidding process, allowing for more nuanced decision-making by the compute node.

pkg/compute/endpoint.go (3)

7-9: Reordering import statements improves readability by grouping similar imports together.

60-68: Adding error handling for parsing job resource configuration in the AskForBid function is a necessary improvement for robustness, ensuring that errors are properly logged and communicated.

70-76: The restructuring of the AskForBid function to include parsing of job resource config and passing it to RunBidding enhances the functionality and clarity of the bidding process.

pkg/compute/bidder_test.go (2)

58-63: Changing SemanticStrategy and ResourceStrategy to slices and updating the UsageCalculator initialization improves flexibility, extensibility, and cohesion in the bidder tests setup.

157-162: Adjusting the RunBidding method calls to pass a BidderRequest struct simplifies the method signature and improves code readability in the bidder tests.

pkg/node/compute.go (2)

146-158: Refactoring the bidder creation into a separate NewBidder function simplifies the NewComputeNode function and improves code modularity. Ensure that all necessary parameters are correctly passed to NewBidder.

253-328: The implementation of NewBidder correctly encapsulates the bidder creation logic, enhancing code modularity and readability. Ensure that it integrates well with the rest of the system and that all parameters are utilized appropriately.

pkg/requester/endpoint.go (2)

248-248: Adding the Comment field to capture the reason for bid completion enhances clarity and traceability. Ensure that response.Reason provides meaningful information and integrates well with existing mechanisms.

343-343: Adding the Comment field to capture the reason for compute failures enhances clarity and traceability. Ensure that result.Err provides meaningful information and integrates well with existing mechanisms.

pkg/compute/bidder.go (3)

22-40: The restructuring of the Bidder struct to include SemanticStrategy, ResourceStrategy, and UsageCalculator supports a more modular and simplified bidding process. Ensure that these new fields are utilized appropriately and integrate well with the rest of the system.

252-300: The implementation of runSemanticBidding correctly encapsulates semantic bidding logic, enhancing code modularity. Ensure that it properly evaluates semantic strategies and integrates well with error handling and logging mechanisms.

302-360: The implementation of runResourceBidding correctly encapsulates resource bidding logic, enhancing code modularity. Ensure that it properly calculates resource usage, evaluates resource strategies, and integrates well with error handling and logging mechanisms.
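
The split into semantic and resource phases might look something like the sketch below. It is an assumption-laden illustration: `Decision`, `Strategy`, `runStrategies`, and `runBidding` are hypothetical simplifications, and only the ordering (semantic strategies before resource strategies) is taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Decision is a hypothetical result of evaluating one strategy (assumption).
type Decision struct {
	ShouldBid bool
	Reason    string
}

// Strategy is a hypothetical strategy signature used only in this sketch.
type Strategy func(job string) (Decision, error)

// runStrategies evaluates strategies in order and stops at the first error
// or the first rejection.
func runStrategies(phase, job string, strategies []Strategy) (Decision, error) {
	for i, s := range strategies {
		d, err := s(job)
		if err != nil {
			return Decision{}, fmt.Errorf("%s strategy %d: %w", phase, i, err)
		}
		if !d.ShouldBid {
			return d, nil
		}
	}
	return Decision{ShouldBid: true}, nil
}

// runBidding runs the semantic phase first and only consults the resource
// phase when the semantic phase accepted the job without error.
func runBidding(job string, semantic, resource []Strategy) (Decision, error) {
	d, err := runStrategies("semantic", job, semantic)
	if err != nil || !d.ShouldBid {
		return d, err
	}
	return runStrategies("resource", job, resource)
}

func main() {
	allow := func(string) (Decision, error) { return Decision{ShouldBid: true}, nil }
	tooBig := func(string) (Decision, error) {
		return Decision{Reason: "not enough capacity"}, nil
	}
	broken := func(string) (Decision, error) {
		return Decision{}, errors.New("cannot parse job")
	}

	d, err := runBidding("job-1", []Strategy{allow}, []Strategy{tooBig})
	fmt.Println(d.ShouldBid, d.Reason, err)

	d, err = runBidding("job-2", []Strategy{broken}, []Strategy{allow})
	fmt.Println(d.ShouldBid, err)
}
```

Wrapping the error with the phase name keeps the failure attributable when it is surfaced to the caller.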

rossjones

LGTM. A few comments/queries which are optional. Mostly picking on TODO's as I'm thinking they're often better as tickets instead (maybe reference the ticket from the TODO?).

"TODO(forrest) fix before merging" seems like it is worth double-checking.

pkg/bidstrategy/semantic/networking.go

pkg/bidstrategy/resource/capacity_available_strategy.go

pkg/compute/bidder.go

pkg/node/compute.go

rossjones · 2024-03-28T08:43:25Z

pkg/node/compute.go

+	config ComputeConfig,
+	publishers publisher.PublisherProvider,
+	storages storage.StorageProvider,
+	executors executor.ExecutorProvider,
+	runningCapacityTracker capacity.Tracker,
+	enqueuedCapacityTracker capacity.Tracker,
+	nodeID string,
+	executionStore store.ExecutionStore,
+	computeCallback compute.Callback,
+	bufferRunner *compute.ExecutorBuffer,
+	apiServer *publicapi.Server,
+	calculator capacity.UsageCalculator,


When should we use Params, and when should we just inline all the parameters?

Yeah, this brings up a good point. We should probably give our usage of the *Params pattern more thought. It makes it easy to forget to pass values to a constructor when things change, unless we inline them as you've said, e.g.:

    // can panic when a new field is added to params but not populated in the constructor
    svr := NewServer(ServerParams{
        Host:   "0.0.0.0",
        Port:   80,
        Config: []byte("whatever"),
    })

    // will fail to compile if a new field is added to params
    svr := NewServer(ServerParams{
        "0.0.0.0",
        80,
        []byte("whatever"),
    })

I believe it is considered a best practice to provide field names when initializing a structure, but the catch is we lose compile-time guarantees when they change. The NewBidder method here accepts all arguments required to construct bidding and helps reduce the cognitive load of the NewComputeNode constructor. Is it okay if this is left as is?

I think the initial intent of the *Params pattern was to make it easier to adopt a dependency injection framework (do I remember this correctly @wdbaruni?). For example, if we use uber.fx *Params are easy to migrate to: https://uber-go.github.io/fx/parameter-objects.html#using-parameter-objects

I do think this is okay for now, really just highlighting that it might be a good candidate for a future PR
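
One possible middle ground for the trade-off discussed in this thread, offered purely as a sketch and not something proposed in the PR, is to keep named fields and have the constructor validate them, so a forgotten field becomes an explicit error at construction time rather than a later panic. `ServerParams` and `NewServer` reuse the hypothetical names from the example above; `Validate` and `Server` are additional assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// ServerParams mirrors the hypothetical example from the discussion.
type ServerParams struct {
	Host   string
	Port   int
	Config []byte
}

// Validate reports every required field that was left at its zero value.
func (p ServerParams) Validate() error {
	var missing []string
	if p.Host == "" {
		missing = append(missing, "Host")
	}
	if p.Port == 0 {
		missing = append(missing, "Port")
	}
	if p.Config == nil {
		missing = append(missing, "Config")
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing required fields: %s", strings.Join(missing, ", "))
	}
	return nil
}

// Server is a hypothetical type built from validated params.
type Server struct {
	addr   string
	config []byte
}

// NewServer rejects incomplete params instead of failing later at use.
func NewServer(p ServerParams) (*Server, error) {
	if err := p.Validate(); err != nil {
		return nil, fmt.Errorf("invalid server params: %w", err)
	}
	return &Server{addr: fmt.Sprintf("%s:%d", p.Host, p.Port), config: p.Config}, nil
}

func main() {
	_, err := NewServer(ServerParams{Host: "0.0.0.0", Port: 80})
	fmt.Println(err)
}
```

This still does not restore the compile-time guarantee of positional literals: a newly added field is only caught if `Validate` is updated alongside it.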

rossjones

LGTM

In a commit to follow it will be simpler to surface bidding failures to the compute node.

This commit simplifies the bidder's implementation with the following key changes:
- We've removed the use of network transport types for internal communications, which previously added unnecessary complexity.
- It's important to note that the responsibilities of the bidder significantly exceed its basic function of deciding whether to bid on jobs. The current scope includes a range of tasks such as updating execution states, initiating execution processes, making callbacks to the requester node, calculating job resource requirements, and managing job approval processes. The utility and continuation of some features, like job approval, are still under consideration due to lack of documentation. For illustration, refer to the setup in `bidder_test.go`.
- Recognizing the overstretched functionality of the bidder, this refactor does not aim to comprehensively overhaul it but makes a significant stride towards future refactoring. By centralizing the aforementioned complexities into a single method, `handleBidResult`, we lay the groundwork for shaving the yak later. This change does not shave the full yak, but has removed some hair.

Refactors bidder to better expose errors
closes https://github.com/bacalhau-project/expanso-planning/issues/389

## Summary by CodeRabbit

- **New Features**
  - Enhanced bidding strategies to better handle available capacity and resource availability.
  - Introduced a new field to improve compute node bidding logic.
- **Refactor**
  - Streamlined bidding process by restructuring the `Bidder` struct and related methods.
  - Simplified compute node setup with new bidder initialization function.
  - Adjusted test setups and synchronization for improved testing reliability.
- **Bug Fixes**
  - Added error handling for job resource configuration parsing.
  - Updated endpoint methods to properly handle bid completion and compute failures.
- **Chores**
  - Cleaned up import statements and added a TODO for documentation review.

---------

Co-authored-by: frrist <forrest@expanso.io>

frrist force-pushed the frrist/job-history-comments branch from 85013cf to 1596ae7 Compare March 22, 2024 19:43

frrist marked this pull request as ready for review March 22, 2024 20:29

frrist mentioned this pull request Mar 22, 2024

feat: include evaluation history in JobHistory #3499

Closed

coderabbitai bot reviewed Mar 22, 2024

rossjones reviewed Mar 22, 2024

pkg/bidstrategy/resource/capacity_available_strategy.go (outdated)

frrist self-assigned this Mar 25, 2024

frrist requested review from wdbaruni and rossjones March 27, 2024 19:34

rossjones reviewed Mar 28, 2024

frrist force-pushed the frrist/job-history-comments branch from 5a398ad to 0e519e8 Compare March 29, 2024 20:10

rossjones approved these changes Apr 2, 2024

frrist added 8 commits April 2, 2024 08:53

refactor: bidder to simplfiy exposing errors

d8eb632

In a commit to follow it will be simpler to surface bidding failures to the compute node.

feat: provide bid rejected and compute failure comments of job history

b6e9f23

- closes https://github.com/bacalhau-project/expanso-planning/issues/389

spelling is hard

fbd753b

fix: remove broken docs link for networking

444c9aa

fix: don't call OnComputeFailure when Run fails in handleBidResult

32adda2

- the executor will already handle this internally.

fix: better comment around ordering of bid strats

d3d45dc

- semantic should be first because: https://github.com/bacalhau-project/bacalhau/pull/3680/files#r1543761175

fix: remove extraneous TODO

44384e4

frrist force-pushed the frrist/job-history-comments branch from 0e519e8 to 44384e4 Compare April 2, 2024 15:53

frrist enabled auto-merge (squash) April 2, 2024 15:53

frrist merged commit 3cdcb6e into main Apr 2, 2024
12 checks passed

frrist deleted the frrist/job-history-comments branch April 2, 2024 16:08

simonwo mentioned this pull request Apr 4, 2024

Fix capacity string output #3531

Closed
