Skip to content

Python server sync - Fixes #1480#2456

Merged
jmao-denver merged 17 commits intodeephaven:mainfrom
SuperTails:python-server-sync
Jun 8, 2022
Merged

Python server sync - Fixes #1480#2456
jmao-denver merged 17 commits intodeephaven:mainfrom
SuperTails:python-server-sync

Conversation

@SuperTails
Copy link
Copy Markdown
Contributor

@SuperTails SuperTails commented Jun 1, 2022

This PR adds three things.

  1. Sessions now request the list of tables when connecting to the server, so tables created before the session starts (from previous sessions, from the web UI, etc.) can be opened.
  2. session.subscribe_fields() now starts up a thread that checks for new changes to tables, so tables created/deleted in other sessions/the web UI become visible automatically.
  3. Tests for the above two features.

Fixes #1480

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2022

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@SuperTails
Copy link
Copy Markdown
Contributor Author

I have read the CLA Document and I hereby sign the CLA

Copy link
Copy Markdown
Contributor

@jmao-denver jmao-denver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very solid, needs only minor changes

Comment thread py/client/pydeephaven/session.py Outdated
Comment thread py/client/tests/test_sync.py Outdated
Copy link
Copy Markdown
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good first pass, welcome to Deephaven!

Comment on lines +10 to +20
self._grpc_app_stub = application_pb2_grpc.ApplicationServiceStub(session.grpc_channel)

def list_fields(self):
try:
fields = self._grpc_app_stub.ListFields(
application_pb2.ListFieldsRequest(),
metadata=self.session.grpc_metadata
)
return fields
except Exception as e:
raise DHError("failed to list fields.") from e No newline at end of file
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to distract from this PR - but I think we might want to look into the python grpc asyncio APIs - it might be a more natural fit for streaming responses like this. @jmao-denver

Comment thread py/client/pydeephaven/session.py Outdated
Comment thread py/client/pydeephaven/session.py Outdated
Comment on lines +170 to +171
self._list_fields = self.app_service.list_fields()
self._parse_fields_change(next(self._list_fields))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple things:

  1. I don't think we should be subscribing on connect.
  2. As is, it's not correct - we've starting a list_fields subscription, but we haven't started the threading handler for it. I don't know exactly how python gRPC will handle this (either a memory leak in list_fields, or eventually an error b/c it's not being consumed).

I think we need this logic self-contained within the subscribe_fields method.

Comment on lines +235 to +236
def _parse_script_response(self, response):
self._parse_fields_change(response.changes)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may be doubling up on our response handling for code we execute via scripts, in the case where we are also subscribing to list fields. We may want to ignore list_field changes that apply to the global script scope (FieldInfo.application_id == "scope"), or choose to only handle list_field changes (ignore script responses) when they are active.

The problem w/ the former is if we ignore the "scope" changes for list_field, that will miss updates executed in the web UI instead of via the python client.

The problem w/ the latter is that updates to the fields will be async with the response to script execution response.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to go with the former option, because otherwise the user can't rely on any of the tables they create in a script being usable until after some arbitrary delay, which seems like a significant usability problem.
Plus, ignoring global-script-scope changes doesn't harm existing functionality.

Comment thread py/client/tests/test_sync.py Outdated
@SuperTails
Copy link
Copy Markdown
Contributor Author

There have been a number of changes made. The two new functions have been collapsed into one: sync_fields(repeating=bool). This is to make it more clear that calling it multiple times is redundant.
Session now takes an optional argument that effectively just defers to a sync_fields call; this is solely so that this new feature can be found easier when looking through docs.
In order to avoid problems with duplicate handling of table changes, however, the repeating/background syncing doesn't register changes made at the global scope, so it's not super useful. This does not affect the one-time sync: all tables/plots/etc are picked up and are usable as request in the issue.

Comment thread py/client/pydeephaven/session.py
Comment thread py/client/pydeephaven/session.py Outdated
Comment thread py/client/tests/test_multi_session.py Outdated
@SuperTails
Copy link
Copy Markdown
Contributor Author

By canceling the ListFields request before a script is run, then restarting it afterwards, it is possible to sidestep the sync/async changes issue entirely at the cost of needing to request the full list of tables again. Since this is a significant usability improvement for a mode that may or may not be very popular, this seems reasonable.

Comment thread py/client/pydeephaven/session.py Outdated
Comment on lines +298 to +301
# We can ignore the script response because
# all the new tables are added by this call anyways
self._fields = {}
self.sync_fields(repeating=True)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this isn't actually true if there are other list field subscribers at the time run_script is called. If so, run_script changes get batched with a certain frequency (250ms by default), so they aren't necessarily visible on the first return from sync_fields. I think just applying both will "work".

def sync_fields(self, repeating: bool):
""" Check for fields that have been added/deleted by other sessions and add them to the local list

This will start a new background thread when `repeating=True`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be noted - when repeating, only the first response will modify global "scope" - future responses against global "scope" will not be handled.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the issue with script response/listfields response conflicts is fixed, can the repeating sync_fields be switched back to handling all changes?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's correct. But my other comment is still relevant - we want run_script changes to be visible upon return.

@devinrsmith devinrsmith self-requested a review June 6, 2022 22:39
@jmao-denver jmao-denver self-requested a review June 6, 2022 23:32
Comment on lines +42 to 43
def __init__(self, host: str = None, port: int = None, never_timeout: bool = True, session_type: str = 'python', sync_fields: int = NO_SYNC):
""" Initialize a Session object that connects to the Deephaven server
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer the default to be SYNC_ONCE. @jakemulf what do you think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think NO_SYNC might actually be the most reasonable default. I generally find that no-ops are the best defaults

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jakemulf

@jmao-denver jmao-denver self-requested a review June 7, 2022 03:04
Copy link
Copy Markdown
Contributor

@jmao-denver jmao-denver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve based on Jake's feedback

@jmao-denver jmao-denver merged commit 9dafae1 into deephaven:main Jun 8, 2022
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pydeephaven Session() should display tables that currently exist in Deephaven

4 participants