-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LIU-117 Add PlasmaFlight Drops #58
Conversation
daliuge-engine/dlg/io.py
Outdated
self._done = True | ||
return data | ||
else: | ||
return [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seemed to have been a bug before as I found it would read the store contents over and over in a loop in droputils.copyDropContents()
logger = logging.getLogger(__name__) | ||
|
||
|
||
class PlasmaFlightClient(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is a copy from https://github.com/ICRAR/plasmaflight/blob/main/icrar/plasmaflight/client/plasmaflight_client.py
An alternative option of getting it here is to add icrar.plasmaflight as a package dependency for daliuge
@@ -315,6 +320,20 @@ def run(self): | |||
binds = list(set(binds)) # make this a unique list else docker complains | |||
logger.debug("Volume bindings: %r", binds) | |||
|
|||
portMappings = {} # = {'5005/tcp': 5005, '5006/tcp': 5006} | |||
for mapping in self._portMappings.split(','): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its worth noting this is exposing containers to public networks and a secure connection should be used. There are supposedly other possibilities for network binding such as specifying a private network ip (e.g. docker run -p localhost:5005:5005
or docker run --net=<custom bridge network>
)
if not self._writer: | ||
# use client.create and FixedSizeBufferWriter | ||
self._writer = pyarrow.BufferOutputStream() | ||
self._writer.write(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote the plasma writing here as the DataIO interface expects to write the number of bytes in count
and calls write
multiple times. Tested with a large file and this now works. Ideally if the write size is known in advanced a FixedSizeBufferWriter could be used here to write directly to the plasma mapped memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed sized buffer writer is now used in PlasmaFlightIO
@@ -119,7 +119,7 @@ def _test_runGraphInTwoNMs( | |||
drops.update(dm2._sessions[sessionId].drops) | |||
|
|||
leaf_drop = drops[leaf_oid] | |||
with droputils.DROPWaiterCtx(self, leaf_drop, 1): | |||
with droputils.DROPWaiterCtx(self, leaf_drop, 5): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why this sometimes fails as the test uses memory drops and not plasma
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit beyond me to understand completely. I just have a few comments:
- Is exposing the container to a public networks a serious security issue? Is it just temporary to get the system working? As long as you are happy that the issue isn't serious, or that there is a plan to mitigate the issue, it's fine with me.
- The change to test_dm.py seems weird, maybe create a ticket to look into that at some stage?
Do you want me to add plasma_path and object_id parameters to Plasma drops by default in EAGLE, and add those two params plus flight_path for PlasmaFlight Drops? I'm happy to do that.
Thanks for reviewing! The main issue with exposing the container at the moment is that the FlightServer does not use any form of network security so if the IP is discoverable on the internet anything with the ip and port combination can access and modify the data store. If the deployment tends to not change IPs very often a secure connection is likely a good idea. I have added some unit tests for creating a TLS flight server my plan for the security ticket is allow the server and client to be configured with a certificate and private key (possibly stored on the shared filesystem). I've looked over the changes a bit for anything possibly relating to intermittent the test failure but almost everything exclusively affects plasma. I'll mention in a ticket which one is failing the commit where I first noticed it. I was under the impression that plasma_path and object_id were already in the default palette, but if not that would be great to add. |
portMappings is in the Docker drop, but the others are missing from Plasma/PlasmaFlight. I'll add them tomorrow. |
LIU-117 Add PlasmaFlight Drops
Summary:
One of the problems of using plasma drops is data not being available to ports located on a different compute node due to plasma only being shared system memory store. Here a new PlasmaFlight drop has been made that uses a fallback machanism to a flight server-client whenever a plasma object does not exist locally.
The flight server uses plasma object IDs as ticket identifiers, plus only reads and writes raw binary data to plasma and the server container is maintained separately at https://github.com/ICRAR/plasmaflight
PlasmaFlight Drops have the parameters:
Demos:
![image](https://user-images.githubusercontent.com/7478405/128828798-f8227fd8-9fed-4a89-b995-d286998a9466.png)
To demonstrate the functionality the first graph shows the compatibility of the new drops with simple plasma drops and how the flight container can connect to a plasma_store container:
In the second data is not copied into
![image](https://user-images.githubusercontent.com/7478405/128829904-ea9088e4-3342-48c4-a9cc-57845c3e1e28.png)
![image](https://user-images.githubusercontent.com/7478405/128831692-9331aa25-a374-4905-8c45-4c442eee62c7.png)
PlasmaFlight 5006
and forces the flight grpc+tcp call to network retrieve the object from the other plasma store. The PlasmaFlight Service is also capable of running the flight server and plasma store in the same container.Future work:
True DataIO interface implementation which due to locking shared memory between reading and writing can't allow for immediate reading before a plasma buffer is sealed, but providing the memory size to the drop in advanced however should reduce the amount of in memory buffers for copying and be interface compatible for data over 4kb.Flight server examples with TLS security are publicly available and should be incorporated in some form.PlasmaFlight Service as a docker node currently blocks graph completion as the engine waits for the docker app to terminate. A new service type node that does not wait for termination would be ideal for situations such as this.