Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: heartbeat task&peer lookup in proc #4179

Merged
merged 11 commits into from
Jun 24, 2024

Conversation

discord9
Copy link
Contributor

@discord9 discord9 commented Jun 20, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

added heartbeat&peer lookup, allowing flow to register to metasrc& using flownode in procedure

  • add heart beat task for flownode
  • add lookup service for ddlcontext so procedure can look up Peer for a given node id
  • using round-robin allocator for flow create choosing flownode

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

Summary by CodeRabbit

  • New Features

    • Introduced heartbeat functionality for flownodes to periodically send status updates to the metasrv server.
    • Implemented leases with optional durations for datanodes and flownodes to improve node liveness determination.
  • Improvements

    • Enhanced error handling and peer lookup processes.
    • Added new peer lookup services and associated methods for more robust peer management.
  • Bug Fixes

    • Improved error handling for flownode not found scenarios during flow procedure drops.
  • Refactor

    • Renamed lookup_alive_datanode_peer to lookup_datanode_peer and added lookup_flownode_peer to streamline peer lookup functionalities.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Jun 20, 2024
@discord9 discord9 changed the title feat: heart beat task feat: heart beat task&peer lookup Jun 21, 2024
@discord9 discord9 changed the title feat: heart beat task&peer lookup feat: heart beat task&peer lookup in proc Jun 21, 2024
@discord9 discord9 marked this pull request as ready for review June 21, 2024 04:56
src/meta-srv/src/lease.rs Outdated Show resolved Hide resolved
src/common/meta/src/ddl/drop_flow.rs Outdated Show resolved Hide resolved
src/meta-srv/src/lease.rs Outdated Show resolved Hide resolved
src/meta-srv/src/metasrv/builder.rs Outdated Show resolved Hide resolved
src/meta-srv/src/selector/round_robin.rs Outdated Show resolved Hide resolved
src/meta-srv/src/selector/round_robin.rs Outdated Show resolved Hide resolved
src/flow/src/heartbeat.rs Show resolved Hide resolved
@fengjiachun fengjiachun changed the title feat: heart beat task&peer lookup in proc feat: heartbeat task&peer lookup in proc Jun 21, 2024
src/common/meta/src/peer.rs Outdated Show resolved Hide resolved
src/flow/src/heartbeat.rs Outdated Show resolved Hide resolved
src/flow/src/heartbeat.rs Outdated Show resolved Hide resolved
src/meta-srv/src/lease.rs Show resolved Hide resolved
src/meta-srv/src/metasrv.rs Outdated Show resolved Hide resolved
Copy link
Member

@WenyXu WenyXu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

codecov bot commented Jun 24, 2024

Codecov Report

Attention: Patch coverage is 29.48718% with 165 lines in your changes missing coverage. Please review.

Project coverage is 84.75%. Comparing base (5566dd7) to head (2ba19a9).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4179      +/-   ##
==========================================
- Coverage   85.06%   84.75%   -0.31%     
==========================================
  Files        1028     1030       +2     
  Lines      180477   181069     +592     
==========================================
- Hits       153514   153462      -52     
- Misses      26963    27607     +644     

Copy link
Contributor

coderabbitai bot commented Jun 24, 2024

Walkthrough

The recent updates introduce significant enhancements across various modules, focusing on peer lookup mechanisms, error handling, and heartbeat functionalities. Notably, a StandalonePeerLookupService is now woven into many components to streamline peer management. Additionally, a new heartbeat functionality ensures robust communication between flownodes and the metasrv server. These changes collectively improve error handling, peer lookup consistency, and the overall robustness of the system's network communication.

Changes

Files Change Summaries
src/cmd/.../standalone.rs Added peer_lookup_service with StandalonePeerLookupService in the StartCommand implementation.
src/common/meta/.../drop_flow.rs Introduced OptionExt for improved error handling and modified Peer creation in DropFlowProcedure.
src/common/meta/.../ddl_manager.rs Added StandalonePeerLookupService to the Peer import and included peer_lookup_service field.
src/common/meta/.../peer.rs Imported Arc and Error, declared new types and traits like PeerLookupService and StandalonePeerLookupService.
src/common/meta/.../test_util.rs Initialized peer_lookup_service with StandalonePeerLookupService in new_ddl_context_with_kv_backend.
src/flow/.../heartbeat.rs Introduced HeartbeatTask struct for sending heartbeat messages to the metasrv server.
src/meta-srv/.../lease.rs Modified peer lookup functions, introduced lease durations, and added MetaPeerLookupService.
src/meta-srv/.../metasrv.rs Replaced lookup_alive_datanode_peer with lookup_datanode_peer and updated method parameters.
src/meta-srv/.../procedure/utils.rs Imported StandalonePeerLookupService and initialized peer_lookup_service.
tests-integration/.../standalone.rs Initialized peer_lookup_service with StandalonePeerLookupService in GreptimeDbStandaloneBuilder.

Poem

In the realm of code, where bytes do dance,
A new peer service takes its chance.
Heartbeats whisper, nodes align,
Error handling now so fine.
Lookups swift, in clusters roam,
Hail to changes, our new home. 🐇✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Outside diff range and nitpick comments (3)
src/common/meta/src/peer.rs (1)

72-77: Add documentation for the PeerLookupService trait.

The PeerLookupService trait is crucial for the lookup functionality, but it lacks documentation. Providing a brief description of the trait and its methods would enhance code maintainability and readability.

src/common/meta/src/ddl.rs (1)

122-123: Document the purpose of peer_lookup_service in DdlContext.

The addition of peer_lookup_service to DdlContext is crucial for enabling peer lookups within DDL operations. It would be beneficial to add a comment explaining how this service is used within the context to aid future developers.

src/meta-srv/src/lease.rs (1)

Line range hint 37-54: Refactor to reduce code duplication in peer lookup functions.

The functions lookup_datanode_peer and lookup_flownode_peer share a significant amount of logic. Consider abstracting the common logic into a private helper function to reduce code duplication and improve maintainability.

+ private async fn lookup_peer_generic(lease_key: LeaseKey, meta_peer_client: &MetaPeerClientRef, lease_secs: Option<u64>) -> Result<Option<Peer>> {
+     let lease_filter = lease_secs.map(build_lease_filter);
+     let lease_key_bytes: Vec<u8> = lease_key.clone().try_into()?;
+     let Some(kv) = meta_peer_client.get(&lease_key_bytes).await? else {
+         return Ok(None);
+     };
+     let lease_value: LeaseValue = kv.value.try_into()?;
+     let is_alive = lease_filter.map(|f| f(&lease_value)).unwrap_or(true);
+     if is_alive {
+         Ok(Some(Peer {
+             id: lease_key.node_id,
+             addr: lease_value.node_addr,
+         }))
+     } else {
+         Ok(None)
+     }
+ }
- // Remove duplicated logic from `lookup_datanode_peer` and `lookup_flownode_peer` and replace with calls to `lookup_peer_generic`

Also applies to: 80-105

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 5566dd7 and 8bd6d3c.

Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
Files selected for processing (19)
  • src/cmd/src/standalone.rs (2 hunks)
  • src/common/meta/src/ddl.rs (2 hunks)
  • src/common/meta/src/ddl/create_flow/metadata.rs (1 hunks)
  • src/common/meta/src/ddl/drop_flow.rs (2 hunks)
  • src/common/meta/src/ddl/flow_meta.rs (4 hunks)
  • src/common/meta/src/ddl_manager.rs (2 hunks)
  • src/common/meta/src/peer.rs (2 hunks)
  • src/common/meta/src/test_util.rs (2 hunks)
  • src/flow/Cargo.toml (1 hunks)
  • src/flow/src/adapter.rs (3 hunks)
  • src/flow/src/heartbeat.rs (1 hunks)
  • src/flow/src/lib.rs (1 hunks)
  • src/meta-srv/src/flow_meta_alloc.rs (1 hunks)
  • src/meta-srv/src/lease.rs (6 hunks)
  • src/meta-srv/src/lib.rs (1 hunks)
  • src/meta-srv/src/metasrv.rs (2 hunks)
  • src/meta-srv/src/metasrv/builder.rs (4 hunks)
  • src/meta-srv/src/procedure/utils.rs (2 hunks)
  • tests-integration/src/standalone.rs (2 hunks)
Files skipped from review due to trivial changes (2)
  • src/flow/Cargo.toml
  • src/meta-srv/src/lib.rs
Additional comments not posted (23)
src/flow/src/lib.rs (2)

29-29: Approved addition of the heartbeat module.

This change aligns with the PR's objectives to enhance heartbeat functionality for flownodes.


35-35: Approved re-export of adapter entities.

Re-exporting these entities simplifies access to common functionalities used across the project.

src/meta-srv/src/flow_meta_alloc.rs (1)

24-55: Approved addition of FlowPeerAllocator structure and methods.

The new FlowPeerAllocator structure aligns with the PR's goals to manage flow metadata more effectively, particularly in allocating peers for flow partitions.

src/common/meta/src/ddl/create_flow/metadata.rs (1)

26-30: Approved changes to allocate_flow_id method.

The inclusion of cluster_id in the flow creation process is a necessary update to support the new functionality of managing flows across different clusters.

src/common/meta/src/ddl/flow_meta.rs (2)

63-72: Approved updates to the create method in FlowMetadataAllocator.

The method's update to include cluster_id in peer allocation supports the enhanced distributed architecture of the system.


46-54: Approved addition of the with_peer_allocator method.

This method enhances the configurability of the FlowMetadataAllocator, allowing for dynamic peer allocator selection based on system needs.

src/common/meta/src/peer.rs (2)

79-79: Approved the addition of PeerLookupServiceRef.

The introduction of PeerLookupServiceRef as a type alias for an Arc of the PeerLookupService trait is a good practice for simplifying the codebase and improving readability.


81-101: Skip redundant comment about #[cfg(test)] gate.

Previous discussions clarified why a #[cfg(test)] gate isn't feasible here, referencing StackOverflow for more details.

src/common/meta/src/test_util.rs (1)

183-183: Ensure consistency in the instantiation of StandalonePeerLookupService.

The instantiation of StandalonePeerLookupService within the test utility ensures that peer lookup functionality can be tested consistently. This is a positive change, enhancing the testing framework's capabilities.

src/meta-srv/src/lease.rs (2)

154-191: Approve the implementation of MetaPeerLookupService.

The MetaPeerLookupService implementation provides a centralized way to perform peer lookups, encapsulating the logic in a clean and reusable manner. This is a well-implemented enhancement to the system's architecture.


Line range hint 37-54: Skip redundant comment about documenting public functions.

The functions have been adequately documented following previous suggestions.

Also applies to: 80-105

src/flow/src/heartbeat.rs (1)

38-45: Consider using Option<Duration> for intervals.

Using Option<Duration> for report_interval and retry_interval would allow for more flexible configuration, such as disabling automatic retry or reporting by setting them to None. This approach can lead to cleaner control flow handling in the logic.
[REFACTOR_SUGGESTion]

-    report_interval: Duration,
-    retry_interval: Duration,
+    report_interval: Option<Duration>,
+    retry_interval: Option<Duration>,
src/meta-srv/src/procedure/utils.rs (1)

228-228: Use of DummyPeerLookupService in test utility.

The addition of DummyPeerLookupService to new_ddl_context is appropriate for testing environments where actual peer lookup isn't necessary. Ensure that this is well documented to avoid confusion in non-test environments.

tests-integration/src/standalone.rs (1)

201-201: Integration of StandalonePeerLookupService in standalone builder.

The integration of StandalonePeerLookupService into the DdlManager setup is crucial for simulating peer interactions in a standalone environment. This is a positive change, enhancing test fidelity.

src/meta-srv/src/metasrv/builder.rs (3)

45-45: Import of FlowPeerAllocator is well-placed.

The import statement for FlowPeerAllocator is correctly positioned within the import section, maintaining a clean and organized code structure.


245-264: Implementation of FlowPeerAllocator in the builder pattern.

The integration of FlowPeerAllocator using the builder pattern is well-implemented. The use of Arc and cloning of the selector context is appropriate for thread safety and shared ownership in a multi-threaded environment.


293-295: Correct instantiation of MetaPeerLookupService.

The instantiation of MetaPeerLookupService within the DdlManager setup is correct. It properly utilizes the cloned meta_peer_client, ensuring that the peer lookup service is appropriately configured.

src/meta-srv/src/metasrv.rs (1)

487-491: Proper modification to lookup_datanode_peer function call.

The update to include the DATANODE_LEASE_SECS as a parameter in the lookup_datanode_peer function call is appropriate. It aligns with the changes in the function definition and ensures that lease time is configurable and correctly passed.

src/flow/src/adapter.rs (2)

82-83: Approved the addition of node_id and grpc options to FlownodeOptions.

These additions are consistent with the PR's objectives to enhance node identification and communication capabilities. Please ensure that these new options are being utilized effectively throughout the system.


501-501: Reminder to implement heartbeat tasks.

The TODO comment suggests adding heartbeat tasks in the future. Please ensure this task is tracked appropriately in the project management tools to ensure it's not overlooked.

src/cmd/src/standalone.rs (1)

561-561: Approved the integration of StandalonePeerLookupService in create_ddl_task_executer.

The addition of StandalonePeerLookupService aligns with the PR's objectives to enhance peer lookup services. Please ensure that this service is properly integrated and tested throughout the system.

src/common/meta/src/ddl_manager.rs (2)

813-813: Approved the addition of imports for DummyPeerLookupService and Peer.

The newly added imports are crucial for supporting the peer lookup functionality introduced in this PR. It's good to see modular use of dummy services for testing or fallback scenarios.


858-858: Ensure appropriate usage of DummyPeerLookupService.

Initializing peer_lookup_service with DummyPeerLookupService in production code might not be ideal unless it's strictly for fallback or testing. If this is for production use, consider using a more robust service or ensure that there's a clear switch mechanism to toggle between dummy and actual services based on the environment.

Consider implementing a factory pattern or dependency injection for better manageability and testing of different peer lookup services.

src/flow/src/heartbeat.rs Outdated Show resolved Hide resolved
src/flow/src/heartbeat.rs Show resolved Hide resolved
src/flow/src/heartbeat.rs Outdated Show resolved Hide resolved
src/common/meta/src/ddl/drop_flow.rs Outdated Show resolved Hide resolved
src/common/meta/src/peer.rs Outdated Show resolved Hide resolved
src/common/meta/src/peer.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@fengjiachun fengjiachun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Outside diff range and nitpick comments (1)
src/meta-srv/src/lease.rs (1)

Line range hint 36-54: Refactor the lookup_datanode_peer function to improve readability and error handling.

The function's structure is generally good, but consider handling the potential error from lease_secs.checked_mul(1000).unwrap_or(u64::MAX) more gracefully to avoid panics in production.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 8bd6d3c and fcb98c2.

Files selected for processing (10)
  • src/cmd/src/standalone.rs (2 hunks)
  • src/common/meta/src/ddl/drop_flow.rs (2 hunks)
  • src/common/meta/src/ddl_manager.rs (2 hunks)
  • src/common/meta/src/peer.rs (2 hunks)
  • src/common/meta/src/test_util.rs (2 hunks)
  • src/flow/src/heartbeat.rs (1 hunks)
  • src/meta-srv/src/lease.rs (6 hunks)
  • src/meta-srv/src/metasrv.rs (2 hunks)
  • src/meta-srv/src/procedure/utils.rs (2 hunks)
  • tests-integration/src/standalone.rs (2 hunks)
Files skipped from review due to trivial changes (1)
  • src/common/meta/src/test_util.rs
Files skipped from review as they are similar to previous changes (8)
  • src/cmd/src/standalone.rs
  • src/common/meta/src/ddl/drop_flow.rs
  • src/common/meta/src/ddl_manager.rs
  • src/common/meta/src/peer.rs
  • src/flow/src/heartbeat.rs
  • src/meta-srv/src/metasrv.rs
  • src/meta-srv/src/procedure/utils.rs
  • tests-integration/src/standalone.rs
Additional comments not posted (1)
src/meta-srv/src/lease.rs (1)

154-186: Review the implementation of MetaPeerLookupService.

The implementation of MetaPeerLookupService is straightforward and adheres to the PeerLookupService trait. However, consider adding error handling documentation and possibly unit tests to ensure the robustness of these critical network operations.

src/meta-srv/src/lease.rs Show resolved Hide resolved
src/meta-srv/src/lease.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between fcb98c2 and 2ba19a9.

Files selected for processing (1)
  • src/meta-srv/src/lease.rs (6 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/meta-srv/src/lease.rs

@fengjiachun fengjiachun added this pull request to the merge queue Jun 24, 2024
Merged via the queue into GreptimeTeam:main with commit a44fe62 Jun 24, 2024
51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required This change does not impact docs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants