Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LIU-390: Move plasma drop implementation to this repository. #1

Merged
merged 4 commits into from
Jun 20, 2024

Conversation

myxie
Copy link
Collaborator

@myxie myxie commented Jun 19, 2024

Summary by Sourcery

This pull request moves the plasma drop implementation from the daliuge/daliuge-engine repository to this repository. It introduces new classes for handling shared-memory IO operations using plasma store and Arrow Flight protocol, and adds corresponding unit tests. The README.md has been updated to reflect these changes.

  • New Features:
    • Introduced PlasmaIO and PlasmaFlightIO classes for handling shared-memory IO operations using plasma store and Arrow Flight protocol.
    • Added PlasmaDROP and PlasmaFlightDROP classes to represent data stored in a Plasma Store and managed by Arrow Flight protocol.
    • Implemented PlasmaFlightClient class for accessing plasma-backed Arrow Flight data server.
  • Enhancements:
    • Moved the plasma drop implementation from daliuge/daliuge-engine to this repository, consolidating related functionality.
  • Documentation:
    • Updated README.md to reflect the new daliuge_plasma_components package and its installation instructions.
  • Tests:
    • Added unit tests for PlasmaDROP and PlasmaFlightDROP classes, including tests for writing data and dynamic writing with staging buffers.

Copy link

sourcery-ai bot commented Jun 19, 2024

Reviewer's Guide by Sourcery

This pull request moves the Plasma IO and PlasmaDrops implementation from the daliuge-engine repository to this repository. It introduces new classes for handling Plasma IO and PlasmaDrops, including PlasmaIO, PlasmaFlightIO, PlasmaDROP, and PlasmaFlightDROP. Additionally, it adds a new PlasmaFlightClient class for accessing plasma-backed arrow flight data servers. Comprehensive test cases for these new classes have been added, and the README and init.py files have been updated accordingly.

File-Level Changes

Files Changes
daliuge_plasma_components/data.py
daliuge_plasma_components/apps.py
Moved Plasma IO and PlasmaDrops implementation from daliuge-engine to this repository, including PlasmaIO, PlasmaFlightIO, PlasmaDROP, and PlasmaFlightDROP classes.
tests/test_components.py Added comprehensive test cases for PlasmaDROP and PlasmaFlightDROP, including setup and dynamic write tests.

Tips
  • Trigger a new Sourcery review by commenting @sourcery-ai review on the pull request.
  • You can change your review settings at any time by accessing your dashboard:
    • Enable or disable the Sourcery-generated pull request summary or reviewer's guide;
    • Change the review language;
  • You can always contact us if you have any questions or feedback.

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @myxie - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 4 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟡 Documentation: 2 issues found

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.

Comment on lines +78 to +79
self._expected_size = (
expected_size if expected_size and expected_size > 0 else None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Consider simplifying the conditional assignment.

The conditional assignment for _expected_size could be simplified to self._expected_size = expected_size if expected_size > 0 else None since expected_size is already optional.

Suggested change
self._expected_size = (
expected_size if expected_size and expected_size > 0 else None
self._expected_size = (
expected_size if expected_size > 0 else None
)


@overrides
def _open(self, **kwargs):
return plasma.connect(self._plasma_path)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Handle potential connection errors.

Consider adding error handling for the connection to the plasma store. This will help in diagnosing issues if the connection fails.

Comment on lines 115 to 116
assert isinstance(
data, Union[memoryview, bytes, bytearray, pyarrow.Buffer].__args__
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Use isinstance with a tuple.

Instead of using Union.__args__, you can pass a tuple of types directly to isinstance: assert isinstance(data, (memoryview, bytes, bytearray, pyarrow.Buffer)).

Suggested change
assert isinstance(
data, Union[memoryview, bytes, bytearray, pyarrow.Buffer].__args__
assert isinstance(
data, (memoryview, bytes, bytearray, pyarrow.Buffer)

scheme (str, optional): [description]. Defaults to "grpc+tcp".
connection_args (dict, optional): [description]. Defaults to {}.
"""
self.plasma_client: plasma.PlasmaClient = plasma.connect(socket)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Handle potential connection errors.

Consider adding error handling for the connection to the plasma store. This will help in diagnosing issues if the connection fails.

@@ -17,11 +17,7 @@ There are multiple options for the installation, depending on how you are intend
```bash
pip install daliuge_plasma_components
```
This will only work after releasing the project to PyPi.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (documentation): Clarify if the project has been released to PyPi.

The line about PyPi release has been removed. Has the project been released to PyPi, making this line obsolete?

Add your functionality in the run method and optional additional
methods.

class PlasmaFlightClient:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider breaking down the new functionality into smaller, more manageable pieces.

The new code introduces significant complexity compared to the original. Here are some points to consider:

  1. Increased Code Size: The new code is significantly longer, making it more complex to read and understand.
  2. Additional Dependencies: The new code introduces several new dependencies (pyarrow, pyarrow.flight, pyarrow.plasma, hashlib, BytesIO, and Optional), which increases the complexity of the codebase.
  3. New Functionality: The new code adds a lot of new functionality related to Plasma and Flight clients, which adds to the cognitive load required to understand the code.
  4. Error Handling: The new code includes more error handling, which adds to the complexity.
  5. Logging: The new code includes more logging statements, which, while useful for debugging, add to the overall complexity.
  6. Documentation: The new code has more extensive documentation, which adds to the length and complexity of the file.

To maintain simplicity while adding new functionality, consider breaking down the new functionality into smaller, more manageable pieces. For example, you could introduce a separate PlasmaClient class to handle Plasma-related operations, keeping it separate from the MyAppDROP class. This approach keeps the code more modular and easier to manage.

Comment on lines +55 to +57
if inputDrop.status == DROPStates.COMPLETED:
if inputDrop.checksum:
crcSum += inputDrop.checksum
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): Merge nested if conditions (merge-nested-ifs)

Suggested change
if inputDrop.status == DROPStates.COMPLETED:
if inputDrop.checksum:
crcSum += inputDrop.checksum
if inputDrop.status == DROPStates.COMPLETED and inputDrop.checksum:
crcSum += inputDrop.checksum


ExplanationToo much nesting can make code difficult to understand, and this is especially
true in Python, where there are no brackets to help out with the delineation of
different nesting levels.

Reading deeply nested code is confusing, since you have to keep track of which
conditions relate to which levels. We therefore strive to reduce nesting where
possible, and the situation where two if conditions can be combined using
and is an easy win.

Comment on lines +93 to +96
self._writer.close()
else:
self._desc.seal(self._object_id)
self._writer.close()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Hoist repeated code outside conditional statement (hoist-statement-from-if)

Comment on lines 326 to 328
return "plasma://%s" % (
binascii.hexlify(self.object_id).decode("ascii")
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): Replace interpolated string formatting with f-string (replace-interpolation-with-fstring)

Suggested change
return "plasma://%s" % (
binascii.hexlify(self.object_id).decode("ascii")
)
return f'plasma://{binascii.hexlify(self.object_id).decode("ascii")}'

Comment on lines 379 to 381
return "plasmaflight://%s" % (
binascii.hexlify(self.object_id).decode("ascii")
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): Replace interpolated string formatting with f-string (replace-interpolation-with-fstring)

Suggested change
return "plasmaflight://%s" % (
binascii.hexlify(self.object_id).decode("ascii")
)
return f'plasmaflight://{binascii.hexlify(self.object_id).decode("ascii")}'

@myxie myxie merged commit 2898d96 into main Jun 20, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant