Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ref count segment error handling #490

Merged
merged 3 commits into from
Nov 15, 2023

Conversation

rbx
Copy link
Member

@rbx rbx commented Nov 15, 2023

  • shm: Throw RefCountBadAlloc when insufficient space in the ref count segment
  • shm: Bump the ref segment size 10x.
  • Examples: use multipart in the region example

@rbx rbx requested a review from dennisklein November 15, 2023 09:20
Copy link

coderabbitai bot commented Nov 15, 2023

Walkthrough

The changes across the files indicate a shift in the FairMQ framework towards enhanced message handling, specifically by adopting fair::mq::Parts for multi-part message processing. This allows for more efficient and flexible data processing. Additionally, the configuration and management of shared memory segments have been improved, with increased segment sizes and better error handling for allocation failures. The scripts and examples have been updated to reflect these changes, introducing new processors and sinks to demonstrate the expanded capabilities.

Changes

File(s) Change Summary
examples/region/fairmq-start-ex-region-advanced.sh.in Significant changes to process setup, introducing new processor and sink instances, and adjusting shared memory configurations.
examples/region/processor.cxx Replaced usage of fair::mq::MessagePtr with fair::mq::Parts for multi-part message handling, enabling parallel processing.
examples/region/sampler.cxx Introduced rc-segment-size configuration property, modified message sending logic, and updated Sampler struct.
examples/region/sink.cxx Replaced message handling with fair::mq::Parts for receiving data, simplifying the data receiving process.
fairmq/Message.h, fairmq/shmem/Message.h Added RefCountBadAlloc exception for handling memory allocation failures in reference counting.
fairmq/UnmanagedRegion.h Increased rcSegmentSize for larger reference count segments in unmanaged regions.

Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 961eca5 and 097aabf.
Files selected for processing (7)
  • examples/region/fairmq-start-ex-region-advanced.sh.in (1 hunks)
  • examples/region/processor.cxx (1 hunks)
  • examples/region/sampler.cxx (5 hunks)
  • examples/region/sink.cxx (1 hunks)
  • fairmq/Message.h (1 hunks)
  • fairmq/UnmanagedRegion.h (1 hunks)
  • fairmq/shmem/Message.h (1 hunks)
Files skipped from review due to trivial changes (2)
  • fairmq/Message.h
  • fairmq/UnmanagedRegion.h
Additional comments: 7
fairmq/shmem/Message.h (1)
  • 279-289: The exception handling here is correctly catching boost::interprocess::bad_alloc and throwing a custom exception RefCountBadAlloc. This is a good practice as it provides a clearer indication of what the problem is when there is insufficient space in the reference count segment. However, ensure that the RefCountBadAlloc exception is properly defined elsewhere in the codebase and that it inherits from an appropriate base exception class, such as std::exception, to maintain compatibility with exception handling mechanisms.

Additionally, it's important to verify that all parts of the code that could potentially throw this new exception are properly handling it. This includes any upstream code that calls functions which may indirectly lead to this exception being thrown.

examples/region/processor.cxx (1)
  • 36-57: The changes reflect a shift from handling single messages to multipart messages, which is a significant change in the message processing logic. The use of fair::mq::Parts allows for more complex message patterns, such as sending and receiving multiple parts of a message at once. This can be beneficial for performance and flexibility but requires careful handling to ensure that all parts are processed correctly.

The loop correctly receives multipart messages, creates new multipart messages for each output channel, copies the content from the input parts to the new messages, and sends them out. However, there are a few considerations:

  • Ensure that the NewMessage() function is correctly allocating memory and that there is no risk of memory leaks or other memory management issues.
  • Verify that the Copy method is the intended way to duplicate the message content and that it handles all necessary internal state correctly (e.g., reference counting, buffer management).
  • Confirm that the Send method on the dataOut1 and dataOut2 channels is designed to handle fair::mq::Parts and that it will correctly process the multipart messages.
  • Check that the fMaxIterations and fNumIterations variables are correctly used to control the number of iterations this loop should perform. It seems that the loop will exit once fNumIterations reaches fMaxIterations, which is expected behavior.

Given that the changes are quite extensive, it is crucial to perform thorough testing to ensure that the new message handling logic works as expected and does not introduce any regressions or new bugs.

examples/region/fairmq-start-ex-region-advanced.sh.in (1)
  • 22-24: The SAMPLER command construction includes a channel configuration for data1. Ensure that the address tcp://127.0.0.1:7777 does not conflict with any other services and that it is consistent with the intended network design.
examples/region/sampler.cxx (4)
  • 26-29: The new configuration property rc-segment-size is correctly retrieved from the configuration and stored in fRCSegmentSize. The default value is set to 10,000,000, which is consistent with the summary provided.

  • 68-89: The Run method has been updated to send multipart messages using fair::mq::Parts. The loop correctly increments fNumUnackedMsgs by the number of parts sent, and the Send method is used to send the multipart message. This change aligns with the shift towards using multipart messages as mentioned in the summary.

  • 124-130: The member variables are correctly initialized with default values. The fRCSegmentSize is set to 10,000,000 by default, which is consistent with the summary. The fRegion is initialized to nullptr, and fNumUnackedMsgs is initialized to 0, which is appropriate for tracking the number of unacknowledged messages.

  • 140-144: The custom options for the command-line interface are correctly defined with default values. The rc-segment-size option is added with a default value of 10,000,000, which is consistent with the summary and the new feature implementation.

examples/region/sink.cxx Outdated Show resolved Hide resolved
examples/region/fairmq-start-ex-region-advanced.sh.in Outdated Show resolved Hide resolved
@rbx rbx force-pushed the ref-count-segment-error-handling branch from 097aabf to 1ce8d4a Compare November 15, 2023 09:29
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 961eca5 and 1ce8d4a.
Files selected for processing (7)
  • examples/region/fairmq-start-ex-region-advanced.sh.in (1 hunks)
  • examples/region/processor.cxx (1 hunks)
  • examples/region/sampler.cxx (5 hunks)
  • examples/region/sink.cxx (1 hunks)
  • fairmq/Message.h (1 hunks)
  • fairmq/UnmanagedRegion.h (1 hunks)
  • fairmq/shmem/Message.h (1 hunks)
Files skipped from review due to trivial changes (3)
  • examples/region/sink.cxx
  • fairmq/Message.h
  • fairmq/UnmanagedRegion.h
Additional comments: 9
fairmq/shmem/Message.h (1)
  • 279-289: The exception handling for boost::interprocess::bad_alloc is correctly implemented to throw a RefCountBadAlloc exception when there is insufficient space in the reference count segment. This is a good use of custom exceptions to provide more context-specific error information to the caller. However, ensure that the RefCountBadAlloc exception is properly documented and that its handling is implemented wherever the Copy method is called.
examples/region/sampler.cxx (4)
  • 26-29: The new configuration property rc-segment-size is correctly retrieved and stored in fRCSegmentSize. This aligns with the summary provided.

  • 68-89: The logic for creating 64 message parts and sending them is implemented as described in the summary. However, ensure that the rest of the system is designed to handle multipart messages efficiently.

  • 124-130: The new member variable fRCSegmentSize is initialized with the default value of 10000000, which is consistent with the previous default size of the reference count segment. This is a good practice for backward compatibility.

  • 140-144: The new command-line option rc-segment-size is added with a default value of 10000000, which matches the updated default size for the reference count segment. This allows for configuration flexibility.

examples/region/fairmq-start-ex-region-advanced.sh.in (4)
  • 22-24: The SAMPLER command construction looks correct. It sets up the channel configuration for data1 with the push method and binds it to tcp://127.0.0.1:7777. This should work as expected provided that the address is not already in use and that the @EX_BIN_DIR@ variable is correctly replaced during the script's configuration.

  • 27-35: The PROCESSOR1 command is correctly constructed to pull from data1 and push to data2 and data3. The --shm-segment-id is set to 1, which should be unique to this processor instance. Ensure that the @EX_BIN_DIR@ variable is correctly replaced and that the TCP ports 7778 and 7779 are free to use.

  • 38-46: The PROCESSOR2 command is similar to PROCESSOR1 but uses a different shared memory segment (--shm-segment-id 2) and binds to different ports (7788 and 7789). This is a good practice to avoid collisions in shared memory and network ports. Again, ensure that the @EX_BIN_DIR@ variable is correctly replaced and that the TCP ports 7788 and 7789 are free to use.

  • 49-86: The SINK1_1, SINK1_2, SINK2_1, and SINK2_2 commands are set up to connect to the corresponding processor's push channels. The --shm-segment-id values are consistent with their respective processors. Ensure that the @EX_BIN_DIR@ variable is correctly replaced and that the geometry settings for the xterm windows do not overlap in a way that would be inconvenient for the user.

@rbx rbx force-pushed the ref-count-segment-error-handling branch 2 times, most recently from 05a9457 to a25c557 Compare November 15, 2023 09:40
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 4

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 961eca5 and a25c557.
Files selected for processing (7)
  • examples/region/fairmq-start-ex-region-advanced.sh.in (1 hunks)
  • examples/region/processor.cxx (1 hunks)
  • examples/region/sampler.cxx (5 hunks)
  • examples/region/sink.cxx (1 hunks)
  • fairmq/Message.h (1 hunks)
  • fairmq/UnmanagedRegion.h (1 hunks)
  • fairmq/shmem/Message.h (1 hunks)
Files skipped from review due to trivial changes (2)
  • examples/region/sink.cxx
  • fairmq/Message.h
Additional comments: 11
examples/region/sampler.cxx (4)
  • 26-29: The new configuration property rc-segment-size is correctly retrieved and stored in fRCSegmentSize. This allows the user to specify the size of the reference count segment for an unmanaged region, which is a useful feature for customizing the memory allocation based on the application's needs.

  • 68-89: The logic for creating and sending multipart messages using fair::mq::Parts is correctly implemented. This change allows for more complex message structures and can improve the efficiency of message processing. The use of a rate limiter to control the sampling rate is also a good practice to prevent overwhelming the receiver or the network.

  • 124-130: The member variables are correctly initialized with default values. It's important to ensure that these defaults are sensible for the expected use cases of the Sampler device.

  • 140-145: The new configuration option rc-segment-size is added to the addCustomOptions function with a default value of 10000000. This aligns with the changes made in the InitTask function and provides a default that matches the previous hardcoded value.

examples/region/fairmq-start-ex-region-advanced.sh.in (7)
  • 22-24: The --chan-name and --channel-config options are correctly set for the SAMPLER process. The --shm-monitor flag is also added, which is useful for debugging shared memory issues.

  • 27-35: The PROCESSOR1 configuration is correctly updated to include the new channel configurations and the shared memory segment ID. The --shm-monitor flag is included here as well.

  • 38-46: The PROCESSOR2 configuration is introduced with a unique --shm-segment-id and channel configurations that differ from PROCESSOR1, which is necessary to avoid conflicts in shared memory and channel bindings.

  • 49-56: The SINK1_1 configuration is set up to connect to the PROCESSOR1's data2 channel. The shared memory segment ID matches that of PROCESSOR1, which is correct for shared memory communication.

  • 59-66: The SINK1_2 configuration is set up to connect to the PROCESSOR1's data3 channel. The shared memory segment ID is consistent with PROCESSOR1, ensuring proper communication.

  • 69-76: The SINK2_1 configuration is set up to connect to the PROCESSOR2's data2 channel. The shared memory segment ID is set to 2, matching the PROCESSOR2 configuration.

  • 79-86: The SINK2_2 configuration is set up to connect to the PROCESSOR2's data3 channel. The shared memory segment ID is set to 2, which is consistent with PROCESSOR2.

fairmq/UnmanagedRegion.h Show resolved Hide resolved
fairmq/shmem/Message.h Outdated Show resolved Hide resolved
@rbx rbx force-pushed the ref-count-segment-error-handling branch from a25c557 to 40d2523 Compare November 15, 2023 10:06
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 961eca5 and 40d2523.
Files selected for processing (7)
  • examples/region/fairmq-start-ex-region-advanced.sh.in (1 hunks)
  • examples/region/processor.cxx (1 hunks)
  • examples/region/sampler.cxx (5 hunks)
  • examples/region/sink.cxx (1 hunks)
  • fairmq/Message.h (1 hunks)
  • fairmq/UnmanagedRegion.h (1 hunks)
  • fairmq/shmem/Message.h (1 hunks)
Files skipped from review due to trivial changes (2)
  • fairmq/Message.h
  • fairmq/UnmanagedRegion.h
Additional comments: 9
examples/region/sink.cxx (1)
  • 36-43: The code correctly replaces the previous message handling with fair::mq::Parts to receive multipart messages. This change is consistent with the pull request's goal of improving message processing efficiency. However, ensure that the rest of the codebase, especially any parts that interact with Sink, are updated to handle the new message format.
fairmq/shmem/Message.h (1)
  • 279-290: The addition of the RefCountBadAlloc exception handling is a good practice to provide more context when a bad_alloc is encountered. However, it's important to ensure that all parts of the code that might throw or catch this new exception are updated accordingly. This includes checking that the exception is properly caught and handled wherever Message::Copy is called.
examples/region/fairmq-start-ex-region-advanced.sh.in (4)
  • 22-24: The --channel-config option for data1 is set to use TCP transport (tcp://127.0.0.1:7777). If the intention is to use shared memory (shmem), this should be reflected in the transport method. Verify that this is the intended configuration.

  • 31-35: The --shm-segment-id is set to 1 for PROCESSOR1. This is fine as long as there is no conflict with other processes using the same shared memory segment. Ensure that segment IDs are unique across different processes if they are not supposed to share the same segment.

  • 42-46: Similar to PROCESSOR1, PROCESSOR2 has its --shm-segment-id set to 2. Verify that this is the correct segment ID and that it does not conflict with other processes.

  • 84-86: Each sink is configured with a --shm-segment-id. Ensure that these IDs are correctly assigned and that there is no overlap or conflict with other processes unless sharing is intended.

examples/region/sampler.cxx (3)
  • 26-29: The addition of fRCSegmentSize to store the reference count segment size from the configuration is a good practice for making the segment size configurable. This allows for flexibility in the application's memory management.

  • 68-89: The loop for sending 64 message parts is a significant change. It's important to ensure that the receiving end of these messages is also updated to handle multiple parts if it was previously expecting single-part messages. Additionally, the use of a mutex (fMtx) to protect fNumUnackedMsgs is good for thread safety.

  • 140-144: The addition of new command-line options is well done, with default values provided and clear naming. This will make it easier for users to configure the application without modifying the code.

examples/region/sampler.cxx Show resolved Hide resolved
examples/region/sampler.cxx Show resolved Hide resolved
examples/region/processor.cxx Show resolved Hide resolved
@rbx rbx merged commit 34e8a24 into FairRootGroup:dev Nov 15, 2023
12 of 13 checks passed
@rbx rbx deleted the ref-count-segment-error-handling branch February 16, 2024 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants