Python server sync - Fixes #1480 #2456
jmao-denver
left a comment
Very solid, needs only minor changes
devinrsmith
left a comment
This is a good first pass, welcome to Deephaven!
        self._grpc_app_stub = application_pb2_grpc.ApplicationServiceStub(session.grpc_channel)

    def list_fields(self):
        try:
            fields = self._grpc_app_stub.ListFields(
                application_pb2.ListFieldsRequest(),
                metadata=self.session.grpc_metadata
            )
            return fields
        except Exception as e:
            raise DHError("failed to list fields.") from e
Not to distract from this PR - but I think we might want to look into the python grpc asyncio APIs - it might be a more natural fit for streaming responses like this. @jmao-denver
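To illustrate the asyncio suggestion, here is a minimal sketch of consuming a server-streaming response with `async for`. It assumes, as a hedge, that a `grpc.aio` server-streaming call can be iterated the same way; `fake_list_fields` is a hypothetical stand-in async generator, not the real stub, so the example runs without a server.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_list_fields() -> AsyncIterator[List[str]]:
    """Hypothetical stand-in for a server-streaming ListFields call."""
    for batch in (["t1"], ["t2", "t3"]):
        await asyncio.sleep(0)  # yield control, as a network read would
        yield batch


async def consume(stream: AsyncIterator[List[str]], seen: List[str]) -> None:
    # Each streamed batch is handled as it arrives; no handler thread is needed.
    async for batch in stream:
        seen.extend(batch)


async def main() -> List[str]:
    seen: List[str] = []
    # The consumer runs as a task on the event loop alongside other work.
    task = asyncio.create_task(consume(fake_list_fields(), seen))
    await task
    return seen


result = asyncio.run(main())
```

The consumer is an ordinary coroutine, so cancelling the subscription would amount to cancelling the task.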
        self._list_fields = self.app_service.list_fields()
        self._parse_fields_change(next(self._list_fields))
Couple things:
- I don't think we should be subscribing on connect.
- As is, it's not correct - we've started a list_fields subscription, but we haven't started the threading handler for it. I don't know exactly how python gRPC will handle this (either a memory leak in list_fields, or eventually an error b/c it's not being consumed).
I think we need this logic self-contained within the subscribe_fields method.
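A minimal sketch of what "self-contained" could look like: the stream is opened and its consumer thread is started in the same call, so an opened stream is never left unread. `FieldSubscriber`, `open_stream`, and the `(created, removed)` batch shape are hypothetical names for illustration, not the client's actual API.

```python
import threading
from typing import Callable, Iterable, Iterator, Optional, Set, Tuple

# Hypothetical change batch: (names created, names removed).
Change = Tuple[Iterable[str], Iterable[str]]


class FieldSubscriber:
    def __init__(self, open_stream: Callable[[], Iterator[Change]]):
        self._open_stream = open_stream
        self._lock = threading.Lock()
        self._thread: Optional[threading.Thread] = None
        self.fields: Set[str] = set()

    def subscribe_fields(self) -> None:
        """Open the stream and start its consumer together."""
        with self._lock:
            if self._thread is not None:
                return  # already subscribed
            stream = self._open_stream()
            self._thread = threading.Thread(
                target=self._consume, args=(stream,), daemon=True
            )
            self._thread.start()

    def _consume(self, stream: Iterator[Change]) -> None:
        # Runs on the background thread until the stream ends.
        for created, removed in stream:
            with self._lock:
                self.fields.update(created)
                self.fields.difference_update(removed)

    def join(self, timeout: Optional[float] = None) -> None:
        if self._thread is not None:
            self._thread.join(timeout)
```

Nothing is opened at construction time, so connecting does not subscribe.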
    def _parse_script_response(self, response):
        self._parse_fields_change(response.changes)
I think we may be doubling up on our response handling for code we execute via scripts, in the case where we are also subscribing to list fields. We may want to ignore list_field changes that apply to the global script scope (FieldInfo.application_id == "scope"), or choose to only handle list_field changes (ignore script responses) when they are active.
The problem w/ the former is if we ignore the "scope" changes for list_field, that will miss updates executed in the web UI instead of via the python client.
The problem w/ the latter is that updates to the fields will be async with the response to script execution response.
I decided to go with the former option, because otherwise the user can't rely on any of the tables they create in a script being usable until after some arbitrary delay, which seems like a significant usability problem.
Plus, ignoring global-script-scope changes doesn't harm existing functionality.
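The "ignore global script scope" option can be sketched as a simple filter. The `FieldInfo` dataclass below is a stand-in for the protobuf message, and `field_name` is an assumed attribute; only `application_id == "scope"` is taken from the discussion above.

```python
from dataclasses import dataclass
from typing import Iterable, List

SCRIPT_SCOPE_ID = "scope"  # value quoted in the review comment


@dataclass(frozen=True)
class FieldInfo:
    """Stand-in for the protobuf FieldInfo; field_name is an assumption."""
    field_name: str
    application_id: str


def drop_script_scope(changes: Iterable[FieldInfo]) -> List[FieldInfo]:
    """Keep only changes that do not belong to the global script scope."""
    return [f for f in changes if f.application_id != SCRIPT_SCOPE_ID]


changes = [FieldInfo("t1", "scope"), FieldInfo("t2", "my_app")]
kept = drop_script_scope(changes)
```

As noted earlier in the thread, this filter also drops scope changes made from the web UI, which is the trade-off of this option.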
There have been a number of changes made. The two new functions have been collapsed into one:

By canceling the ListFields request before a script is run, then restarting it afterwards, it is possible to sidestep the sync/async changes issue entirely at the cost of needing to request the full list of tables again. Since this is a significant usability improvement for a mode that may or may not be very popular, this seems reasonable.
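A rough sketch of the cancel-then-restart pattern, using a context manager so the subscription is restarted even if the script raises. `FakeSubscription`, `subscription_paused`, and the `execute` callback are hypothetical names for illustration only.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


class FakeSubscription:
    """Hypothetical stand-in for an active ListFields request."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def cancel(self) -> None:
        self.events.append("cancel")

    def restart(self) -> None:
        # A real restart would request the full list of tables again.
        self.events.append("restart")


@contextmanager
def subscription_paused(sub: FakeSubscription) -> Iterator[None]:
    sub.cancel()
    try:
        yield
    finally:
        sub.restart()  # runs on success and on error


def run_script(sub: FakeSubscription, script: str,
               execute: Callable[[str], str]) -> str:
    # No ListFields updates can interleave with the script response here.
    with subscription_paused(sub):
        return execute(script)
```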
        # We can ignore the script response because
        # all the new tables are added by this call anyways
        self._fields = {}
        self.sync_fields(repeating=True)
So this isn't actually true if there are other list field subscribers at the time run_script is called. If so, run_script changes get batched with a certain frequency (250ms by default), so they aren't necessarily visible on the first return from sync_fields. I think just applying both will "work".
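One reason applying both can "work" is that the update is idempotent when fields are keyed by name: applying the same change from the script response and again from a later batched ListFields response leaves the same state. A minimal sketch, with `apply_changes` and the dict-of-names representation as assumptions:

```python
from typing import Dict, Iterable


def apply_changes(fields: Dict[str, str],
                  created: Dict[str, str],
                  removed: Iterable[str]) -> None:
    """Apply one batch of changes in place; safe to apply more than once."""
    fields.update(created)
    for name in removed:
        fields.pop(name, None)  # removing a missing name is a no-op


fields = {"old": "Table"}
# First from the run_script response, so changes are visible on return...
apply_changes(fields, {"t1": "Table"}, ["old"])
# ...then again when the batched ListFields update arrives later.
apply_changes(fields, {"t1": "Table"}, ["old"])
```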
    def sync_fields(self, repeating: bool):
        """ Check for fields that have been added/deleted by other sessions and add them to the local list

        This will start a new background thread when `repeating=True`.
It should be noted - when repeating, only the first response will modify global "scope" - future responses against global "scope" will not be handled.
Since the issue with script response/listfields response conflicts is fixed, can the repeating sync_fields be switched back to handling all changes?
Yes, I think that's correct. But my other comment is still relevant - we want run_script changes to be visible upon return.
    def __init__(self, host: str = None, port: int = None, never_timeout: bool = True, session_type: str = 'python', sync_fields: int = NO_SYNC):
        """ Initialize a Session object that connects to the Deephaven server
I'd prefer the default to be SYNC_ONCE. @jakemulf what do you think?
I think NO_SYNC might actually be the most reasonable default. I generally find that no-ops are the best defaults.
jmao-denver
left a comment
Approve based on Jake's feedback
This PR adds three things.
session.subscribe_fields() now starts up a thread that checks for new changes to tables, so tables created/deleted in other sessions/the web UI become visible automatically.
Fixes #1480