Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: bidder to simplfiy exposing errors #3680

Merged
merged 8 commits into from
Apr 2, 2024

Conversation

frrist
Copy link
Member

@frrist frrist commented Mar 22, 2024

Refactors bidder to better expose errors
closes #3868

Summary by CodeRabbit

  • New Features

    • Enhanced bidding strategies to better handle available capacity and resource availability.
    • Introduced a new field to improve compute node bidding logic.
  • Refactor

    • Streamlined bidding process by restructuring the Bidder struct and related methods.
    • Simplified compute node setup with new bidder initialization function.
    • Adjusted test setups and synchronization for improved testing reliability.
  • Bug Fixes

    • Added error handling for job resource configuration parsing.
    • Updated endpoint methods to properly handle bid completion and compute failures.
  • Chores

    • Cleaned up import statements and added a TODO for documentation review.

Copy link

coderabbitai bot commented Mar 22, 2024

Important

Auto Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

This update streamlines the bidding and execution logic for compute resources, enhancing efficiency and clarity. It restructures the Bidder component, updates strategy implementations for resource availability, and improves error handling and status reporting mechanisms. Additionally, it addresses communication and recovery processes to ensure robust orchestration and execution of jobs.

Changes

Files Change Summary
.../capacity_available_strategy.go
.../capacity_max_strategy.go
Updated bidding logic based on resource availability, removing ctx parameter, and adding error handling.
.../networking.go Added TODO for invalid documentation link.
.../bidder.go
.../bidder_test.go
Restructured Bidder and tests with new strategies and UsageCalculator; updated method parameters.
.../endpoint.go Enhanced error handling and restructured calls for bidding based on job resource config.
.../types.go Added Wait field to BidResult for better job bidding logic.
.../compute.go Simplified NewComputeNode by abstracting bidder creation.
.../endpoint.go (pkg/requester) Improved job status handling with added Comment field.
.../async_bid_test.go
.../setup_test.go
Adjusted test synchronization and channel initialization for better flow control.

Related issues

  • bacalhau-project/expanso-planning#410: The PR's enhancements in bidding logic and efficiency could address concerns over legacy bidding semantics and inefficiency.
  • bacalhau-project/expanso-planning#179: By aligning resource allocation policies, this PR potentially resolves inconsistencies between bidding and execution components.
  • Requester node specifies ExecutionID/ComputeReference when createing an ExecutionState on Compute node(s) #2296: The standardized creation and handling of ExecutionIDs in this PR may streamline the process as intended in this issue.
  • bacalhau-project/expanso-planning#405: The improvements in job status communication between nodes could mitigate miscommunication issues outlined here.

🐇 A Poem by CodeRabbit

In the land of code and byte,
Where logic dances, and servers ignite,
We refined our bid, with might and main,
Streamlining processes, in our domain.
Through fields of data, we hop with ease,
🌟 Celebrating changes, in the digital breeze. 🌟
~ CodeRabbit 🐾


Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@frrist frrist force-pushed the frrist/job-history-comments branch from 85013cf to 1596ae7 Compare March 22, 2024 19:43
@frrist frrist marked this pull request as ready for review March 22, 2024 20:29
@frrist
Copy link
Member Author

frrist commented Mar 22, 2024

@coderabbitai review

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 0948775 and 0a2b597.
Files selected for processing (11)
  • pkg/bidstrategy/resource/capacity_available_strategy.go (3 hunks)
  • pkg/bidstrategy/resource/capacity_max_strategy.go (2 hunks)
  • pkg/bidstrategy/semantic/networking.go (1 hunks)
  • pkg/compute/bidder.go (3 hunks)
  • pkg/compute/bidder_test.go (6 hunks)
  • pkg/compute/endpoint.go (2 hunks)
  • pkg/compute/types.go (2 hunks)
  • pkg/node/compute.go (3 hunks)
  • pkg/requester/endpoint.go (2 hunks)
  • pkg/test/compute/async_bid_test.go (3 hunks)
  • pkg/test/compute/setup_test.go (2 hunks)
Additional comments: 21
pkg/bidstrategy/semantic/networking.go (1)
  • 20-20: The TODO comment regarding the invalid link in the docsLink constant is clear and actionable. It's good practice to track such TODOs in a project management tool or issue tracker to ensure they are addressed in a timely manner.
pkg/bidstrategy/resource/capacity_max_strategy.go (2)
  • 5-5: The addition of the fmt import is appropriate for the functionality introduced in the ShouldBidBasedOnUsage method.
  • 27-38: The refactored logic in the ShouldBidBasedOnUsage method, including the conditional check on resource availability and the construction of BidStrategyResponse, improves the clarity and functionality of the method.
pkg/bidstrategy/resource/capacity_available_strategy.go (3)
  • 5-5: The addition of the fmt import is appropriate for the functionality introduced in the ShouldBidBasedOnUsage method.
  • 22-22: The removal of the ctx parameter from NewAvailableCapacityStrategy simplifies the function signature, making it more straightforward to use.
  • 35-46: The updated logic in the ShouldBidBasedOnUsage method, including the conditional check on available capacity and the construction of BidStrategyResponse, improves the clarity and functionality of the method.
pkg/test/compute/async_bid_test.go (1)
  • 75-106: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [81-125]

The introduction and usage of sync.WaitGroup instances (doneWg and bidSentWg) in the asynchronous bid tests improve synchronization and coordination, ensuring proper sequencing and handling of bid responses. This is a good practice for managing the flow of asynchronous operations in tests.

pkg/test/compute/setup_test.go (1)
  • 71-71: Initializing the bidChannel with a buffer size of 1 is a good practice for preventing potential deadlocks in tests involving asynchronous operations. This allows for non-blocking sends to the channel, improving the reliability of the test setup.
pkg/compute/types.go (1)
  • 146-146: The addition of the Wait field to the BidResult struct enhances the flexibility and control in the bidding process, allowing for more nuanced decision-making by the compute node.
pkg/compute/endpoint.go (3)
  • 7-9: Reordering import statements improves readability by grouping similar imports together.
  • 60-68: Adding error handling for parsing job resource configuration in the AskForBid function is a necessary improvement for robustness, ensuring that errors are properly logged and communicated.
  • 70-76: The restructuring of the AskForBid function to include parsing of job resource config and passing it to RunBidding enhances the functionality and clarity of the bidding process.
pkg/compute/bidder_test.go (2)
  • 58-63: Changing SemanticStrategy and ResourceStrategy to slices and updating the UsageCalculator initialization improves flexibility, extensibility, and cohesion in the bidder tests setup.
  • 157-162: Adjusting the RunBidding method calls to pass a BidderRequest struct simplifies the method signature and improves code readability in the bidder tests.
pkg/node/compute.go (2)
  • 146-158: Refactoring the bidder creation into a separate NewBidder function simplifies the NewComputeNode function and improves code modularity. Ensure that all necessary parameters are correctly passed to NewBidder.
  • 253-328: The implementation of NewBidder correctly encapsulates the bidder creation logic, enhancing code modularity and readability. Ensure that it integrates well with the rest of the system and that all parameters are utilized appropriately.
pkg/requester/endpoint.go (2)
  • 248-248: Adding the Comment field to capture the reason for bid completion enhances clarity and traceability. Ensure that response.Reason provides meaningful information and integrates well with existing mechanisms.
  • 343-343: Adding the Comment field to capture the reason for compute failures enhances clarity and traceability. Ensure that result.Err provides meaningful information and integrates well with existing mechanisms.
pkg/compute/bidder.go (3)
  • 22-40: The restructuring of the Bidder struct to include SemanticStrategy, ResourceStrategy, and UsageCalculator supports a more modular and simplified bidding process. Ensure that these new fields are utilized appropriately and integrate well with the rest of the system.
  • 252-300: The implementation of runSemanticBidding correctly encapsulates semantic bidding logic, enhancing code modularity. Ensure that it properly evaluates semantic strategies and integrates well with error handling and logging mechanisms.
  • 302-360: The implementation of runResourceBidding correctly encapsulates resource bidding logic, enhancing code modularity. Ensure that it properly calculates resource usage, evaluates resource strategies, and integrates well with error handling and logging mechanisms.

@frrist frrist self-assigned this Mar 25, 2024
Copy link
Contributor

@rossjones rossjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A few comments/queries which are optional. Mostly picking on TODO's as I'm thinking they're often better as tickets instead (maybe reference the ticket from the TODO?).

"TODO(forrest) fix before merging" seems like it is worth double-checking.

pkg/bidstrategy/semantic/networking.go Outdated Show resolved Hide resolved
pkg/compute/bidder.go Show resolved Hide resolved
pkg/compute/bidder.go Outdated Show resolved Hide resolved
pkg/compute/bidder.go Outdated Show resolved Hide resolved
pkg/compute/bidder.go Show resolved Hide resolved
pkg/compute/bidder.go Show resolved Hide resolved
pkg/node/compute.go Outdated Show resolved Hide resolved
Comment on lines +254 to +261
config ComputeConfig,
publishers publisher.PublisherProvider,
storages storage.StorageProvider,
executors executor.ExecutorProvider,
runningCapacityTracker capacity.Tracker,
enqueuedCapacityTracker capacity.Tracker,
nodeID string,
executionStore store.ExecutionStore,
computeCallback compute.Callback,
bufferRunner *compute.ExecutorBuffer,
apiServer *publicapi.Server,
calculator capacity.UsageCalculator,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When should we use Params and when should we just inline all the parameters.

Copy link
Member Author

@frrist frrist Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this brings up a good point. We should probably give our usage of the *Params pattern more thought. It makes it easy to forget to pass values to a constructor when things change unless we inline them as you've said e.g

// can panic when a new field is added to params but not populated in the constructor
svr := NewServer(ServerParams{
	Host: "0.0.0.0",
	Port: 80,
	Config: []byte("whatever"),
})
// will fail to compile if a new field is added to params
svr := NewServer(ServerParams{
	"0.0.0.0",
	80,
	[]byte("whatever"),
})

I believe it is considered a best practice to provide field names when initializing a structure, but the catch is we lose compile time guarantees when they change. The NewBidder method here accepts all arguments required to construct bidding and helps reduce the cognitive load of the NewComputeNode constructor, okay if this is left as is?

I think the initial intent of the *Params pattern was to make it easier to adopt a dependency injection framework (do I remember this correctly @wdbaruni?). For example, if we use uber.fx *Params are easy to migrate to: https://uber-go.github.io/fx/parameter-objects.html#using-parameter-objects

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think this is okay for now, really just highlighting that it might be a good candidate for a future PR

@frrist frrist force-pushed the frrist/job-history-comments branch from 5a398ad to 0e519e8 Compare March 29, 2024 20:10
Copy link
Contributor

@rossjones rossjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

frrist added 8 commits April 2, 2024 08:53
In a commit to follow it will be simpler to surface bidding failures to
the compute node.
This commit simplifies the bidder's implementation with the following key changes:

- We've removed the use of network transport types for internal communications,
which previously added unnecessary complexity.

- It's important to note that the responsibilities of the bidder significantly exceed
its basic function of deciding whether to bid on jobs. The current scope includes a
range of tasks such as updating execution states, initiating execution processes,
making callbacks to the requester node, calculating job resource requirements,
and managing job approval processes. The utility and continuation of some features,
like job approval, are still under consideration due to lack of documentation.
For illustration, refer to the setup in `bidder_test.go`.

- Recognizing the overstretched functionality of the bidder, this refactor does not
aim to comprehensively overhaul it but makes a significant stride towards future
refactoring. By centralizing the aforementioned complexities into a single
method, `handleBidResult`, we lay the groundwork for shaving the yak latter.

This change does not shave the full yak, but has removed some hair.
- the executor will already handle this internally.
@frrist frrist force-pushed the frrist/job-history-comments branch from 0e519e8 to 44384e4 Compare April 2, 2024 15:53
@frrist frrist enabled auto-merge (squash) April 2, 2024 15:53
@frrist frrist merged commit 3cdcb6e into main Apr 2, 2024
12 checks passed
@frrist frrist deleted the frrist/job-history-comments branch April 2, 2024 16:08
aronchick pushed a commit that referenced this pull request Apr 27, 2024
Refactors bidder to better expose errors
closes https://github.com/bacalhau-project/expanso-planning/issues/389


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Enhanced bidding strategies to better handle available capacity and
resource availability.
	- Introduced a new field to improve compute node bidding logic.

- **Refactor**
- Streamlined bidding process by restructuring the `Bidder` struct and
related methods.
- Simplified compute node setup with new bidder initialization function.
- Adjusted test setups and synchronization for improved testing
reliability.

- **Bug Fixes**
	- Added error handling for job resource configuration parsing.
- Updated endpoint methods to properly handle bid completion and compute
failures.

- **Chores**
- Cleaned up import statements and added a TODO for documentation
review.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: frrist <forrest@expanso.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Job Progress Visibility
2 participants