Push notifications on data changes (client-server data sync) #300

Closed
Mec-iS opened this issue Dec 5, 2018 · 12 comments

Comments

@Mec-iS
Contributor

Mec-iS commented Dec 5, 2018

I'm submitting a

  • [x] feature request.

Current Behaviour:

There is currently no way to make the client-side graph representation of the data aware of data changes. Clients have to query the server every time to be sure that data has not been updated since the last query. This opens up the possibility of the client using stale data.

Expected Behaviour:

hydrus should allow clients to connect through a WebSocket so that data changes for every object can be pushed directly to the client.
Some sort of security mechanism may also be needed. The server keeps a log of changes in an outbox table and the client keeps an inbox; before sending each request, the client should check the server's outbox to confirm that its inbox is synced with the latest changes. This implies that every request is, under the hood, two requests.
This relates to #218, as the inbox/outbox may be a DAG.

Do you want to work on this issue?

This will probably be a task for GSoC 2019.

Please start collecting implementation ideas here, and tell the community how you would like to see this feature implemented.

@shravandoda
Contributor

shravandoda commented Mar 22, 2019

@Mec-iS shouldn't this work both ways? Rather than just making the client aware of data changes on the server side, the client should be able to push changes to the server as well (given it has been authorized). I don't know if hydrus currently supports any way to push changes to the server database. Please guide.

@vddesai1871
Contributor

vddesai1871 commented Mar 22, 2019

I do not think the client is supposed to change HydraDoc. If I am correct, this feature is mainly about pushing data changes made by other clients to any client connected with that instance of hydrus.

@shravandoda
Contributor

@vddesai1871 I think hydra-python-agent can add new instances of resources. It might not be able to add new classes or collections. Please take a look:
https://github.com/HydraCG/Specifications/blob/master/drafts/use-cases/5.1.creating-event-with-put.md

@vddesai1871
Contributor

@vddesai1871 I think hydra-python-agent can add new instances of resources. It might not be able to add new classes or collections.

That's what I wrote above

@shravandoda
Contributor

I guess I should've said server database instead of HydraDoc

@vddesai1871
Contributor

vddesai1871 commented Mar 22, 2019

client should be able to push changes to server as well (given it has been authorized)

The client does this with standard HTTP methods (through the operations provided in the HydraDoc).
We need a push mechanism on the server to propagate such individual changes to the other clients connected to it, so that the data at every client remains synchronized with the server data.

@Mec-iS
Contributor Author

Mec-iS commented Mar 23, 2019

Absolutely not; the only source of truth for data shall be the server.

@shravandoda
Contributor

Can we use some stateful protocol for synchronization between client and server?

@HTTP-APIs HTTP-APIs deleted a comment from HarsheetKakar May 3, 2019
@Mec-iS
Contributor Author

Mec-iS commented May 3, 2019

The best approach is something like PubSub (with Redis, as we already have it available) or, better, an inbox system like the one used by actors in the Actor Model: the server and each client have an inbox, and messages sit in a queue that is consumed by the actor. Types of messages could be, for example: PUT this payload in the resource with this id, POST this payload in the resource with this id, GET the resource with this id.
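As a rough illustration of the inbox idea, here is a minimal in-process sketch using only the Python standard library. The `Message` and `Actor` names, the message fields, and the dictionary used as a resource store are all assumptions made for this example; they are not part of hydrus or the agent, and a real system would sit on top of Redis or a socket rather than a local queue.

```python
import queue
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical message shape: an HTTP-like method, a resource id, a payload."""
    method: str                      # "PUT", "POST" or "GET"
    resource_id: str
    payload: Optional[dict] = None


class Actor:
    """Hypothetical actor: owns an inbox queue and a local resource store."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()   # FIFO queue of Message objects
        self.resources = {}          # resource_id -> payload

    def send(self, message):
        """Deliver a message to this actor's inbox."""
        self.inbox.put(message)

    def drain(self):
        """Consume every queued message in FIFO order; return the GET results."""
        results = []
        while True:
            try:
                message = self.inbox.get_nowait()
            except queue.Empty:
                return results
            if message.method in ("PUT", "POST"):
                self.resources[message.resource_id] = message.payload
            elif message.method == "GET":
                results.append(self.resources.get(message.resource_id))


client = Actor("client")
client.send(Message("PUT", "drone/1", {"speed": 10}))
client.send(Message("GET", "drone/1"))
read_back = client.drain()
```

Because the actor is the only consumer of its own inbox, messages are applied strictly in arrival order, which is the property that makes this model attractive for sync.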

@Guttz
Contributor

Guttz commented May 11, 2019

After a bit of discussion with @vddesai1871, and with inspiration from other comments here, I would like to put forward a solution that uses WebSockets and an inbox mechanism.

Websocket implementation

To implement the socket mechanism we would use Flask-SocketIO, a Socket.IO integration for Flask applications. It allows the client to connect via a socket when it queries for the API Doc, and it also deals with connection inconsistencies and the like.

Regarding the socket functionality, @vddesai1871 has already developed an experiment that adds basic WebSocket support to hydrus and simulates a small client. You can run multiple clients; when a request is made to hydrus, it forwards the modification after successfully adding the resource in resource.py.

Inbox mechanism

Below is the inbox that is kept by the server and by the clients.

[Image: the inbox kept by the server and by the clients]

This solution introduces modifications in both hydrus and the Hydra Python Agent. In hydrus, it is necessary to add the table log mechanism as well as a WebSocket that notifies clients of modifications to the table. On the client side, the agent should hold its own internal table, connect to the WebSocket, and handle the four situations described below to keep its data up to date.

Client Initialization

When a client starts, it queries the basic API structure and also copies the server's current modification log table. After that, three situations have to be addressed when dealing with new rows in the modification table: the client holds an outdated resource that needs an update, the client has never queried that resource, or the client itself made the transaction.

The client finds a JOB ID referring to an outdated resource

The client finds a new JOB ID and queries its Redis graph to check whether it already has that specific resource. If it does, it compares the internal resource date with the one provided by the server. If the date provided by the server is more recent, the client queries that resource again from the hydrus server and updates its internal copy accordingly.

The client finds a JOB ID for a resource it does not hold internally

The client finds a new JOB ID and queries its Redis graph to check whether it already has that specific resource. If it does not, meaning that the client has not yet queried for that resource, the client can ignore the modification and simply add the row to its internal table, since the change is not relevant to it.
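The two cases described so far (an outdated local copy versus a resource the client never fetched) can be sketched as a single decision function. This is only an illustration under stated assumptions: the row is assumed to carry a `date` field alongside `job_id` and `resource_url`, a plain dictionary stands in for the Redis graph lookup, and the outcome labels `"refetch"` and `"record_only"` are invented for this example.

```python
from datetime import datetime, timezone


def classify_modification(row, local_dates):
    """Decide what to do with one new row of the modification table.

    row         -- hypothetical dict with "job_id", "resource_url" and "date"
    local_dates -- dict mapping resource URL to the datetime of the local
                   copy; it stands in for the client's Redis graph lookup
    """
    local_date = local_dates.get(row["resource_url"])
    if local_date is None:
        # The client never queried this resource: just record the row.
        return "record_only"
    if row["date"] > local_date:
        # The server-side change is newer than the local copy: fetch it again.
        return "refetch"
    # The local copy is at least as recent as the logged change.
    return "record_only"


known = {
    "http://server.com/Collection/a": datetime(2019, 5, 11, 10, 0, tzinfo=timezone.utc),
}
newer_row = {
    "job_id": "job-1",
    "resource_url": "http://server.com/Collection/a",
    "date": datetime(2019, 5, 11, 10, 5, tzinfo=timezone.utc),
}
unknown_row = {
    "job_id": "job-2",
    "resource_url": "http://server.com/Collection/b",
    "date": datetime(2019, 5, 11, 10, 6, tzinfo=timezone.utc),
}
decisions = [
    classify_modification(newer_row, known),
    classify_modification(unknown_row, known),
]
```

In both outcomes the row itself would still be appended to the client's internal table; only the "refetch" outcome triggers a new request to hydrus.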

The client finds a JOB ID in the table for a change it made itself

When the client finds a new job that isn't in its table, the first thing it has to do is check whether its internal resource has a Date signature later than or equal to the one in the server table. If the internal client representation is at least as recent, the client simply adds that transaction to its table, since it already holds the more up-to-date resource, which will soon appear in the server table.

An important observation here is that the client should set the internal Date for a resource from the Date header sent in the response by the hydrus server (to make sure it uses a standard, centralized date). Also, it should only set its internal Date for a resource when it receives a successful response to the HTTP request sent to modify or create that resource on the server.
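A small sketch of that rule, using only the standard library: the timestamp is taken from the response's `Date` header, and only when the status code signals success. The function name and the plain-dict representation of the headers are assumptions for illustration; `email.utils.parsedate_to_datetime` is the standard-library parser for this date format.

```python
from email.utils import parsedate_to_datetime


def resource_date_from_response(status, headers):
    """Return the server-provided timestamp for a modified resource.

    Returns None when the request did not succeed or no Date header is
    present, in which case the client should leave its internal Date untouched.
    """
    if not 200 <= status < 300:
        return None
    date_header = headers.get("Date")
    if date_header is None:
        return None
    return parsedate_to_datetime(date_header)


stamp = resource_date_from_response(201, {"Date": "Sat, 11 May 2019 10:15:00 GMT"})
failed = resource_date_from_response(500, {"Date": "Sat, 11 May 2019 10:15:00 GMT"})
```

Taking the date from the server rather than from the local clock avoids comparing timestamps produced by machines whose clocks may disagree.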

That's the overall concept. I've been trying to grasp more concepts from the Actor Model so the solution is more robust and can perhaps process some concurrent changes.

@Guttz
Contributor

Guttz commented Jul 19, 2019

[Following the discussion at https://github.com/HTTP-APIs/hydra-python-agent/pull/123]
The server automatically launches a Flask server and creates a socket in the namespace '/sync'; all clients connect to that socket and listen for events.

The server has a limited-size modifications_table like the following:

JOB_ID       METHOD  RESOURCE_URL
fece6d5e...  POST    http://server.com/Collection/98e8e272-e5ae-4f1a-a0b2-117fb052ca50
aaa49974...  DELETE  http://server.com/Collection/f1404e8d-0a52-4359-88c3-29ec9f208525
360df976...  POST    http://server.com/Collection/f1404e8d-0a52-4359-88c3-29ec9f208525

  • On connecting, every client copies the last_job_id (fece6d5e here).
  • When hydrus receives POST and DELETE modifications, it simply informs all clients that "there's updates".
  • Each client sends hydrus the last job ID it processed (fece6d5e), then gets all the new rows above fece6d5e and processes them.
  • After that, it updates its last_job_id variable to the new last_job_id on the server.
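The client side of these steps can be sketched with an in-memory stand-in for hydrus. Everything here is hypothetical: `FakeServer`, `SyncClient`, and `on_updates` are names invented for the example, no socket is involved, and the table is assumed to be stored oldest-first so that "new rows" are the ones after the client's last job ID.

```python
class FakeServer:
    """In-memory stand-in for hydrus; rows are (job_id, method, resource_url)."""

    def __init__(self):
        self.rows = []  # assumed ordering: oldest first

    def add(self, job_id, method, resource_url):
        self.rows.append((job_id, method, resource_url))

    def rows_after(self, job_id):
        """Return the rows recorded after job_id, or everything if it is unknown."""
        ids = [row[0] for row in self.rows]
        if job_id not in ids:
            return list(self.rows)
        return self.rows[ids.index(job_id) + 1:]


class SyncClient:
    """Hypothetical client that tracks the last job ID it has processed."""

    def __init__(self, server):
        self.server = server
        # On connect, copy the server's current last job ID.
        self.last_job_id = server.rows[-1][0] if server.rows else None
        self.processed = []

    def on_updates(self):
        """Called when the server announces that there are updates."""
        new_rows = self.server.rows_after(self.last_job_id)
        for row in new_rows:
            self.processed.append(row)  # a real client would act on each row
        if new_rows:
            self.last_job_id = new_rows[-1][0]


server = FakeServer()
server.add("fece6d5e", "POST", "http://server.com/Collection/1")
client = SyncClient(server)
server.add("aaa49974", "DELETE", "http://server.com/Collection/2")
server.add("360df976", "POST", "http://server.com/Collection/2")
client.on_updates()
```

The notification itself carries no data; the client always pulls the diff, so a missed or duplicated notification cannot leave it with a gap as long as the row for its last job ID is still in the table.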

In the end, hydrus has three core implementations:

  1. A socket connection that broadcasts that there were new events to all Clients

  2. A limited size table that should contain modifications made to resources with POST and DELETE

  3. An endpoint for the clients to fetch the modifications table difference:

@app.route('/modification-table-diff')

This receives a job ID as a query parameter and sends the table diff relative to the last updated resource the Agent had.
GET example: https://localhost:5000/modification-table-diff?agent_job_id=2
Note: if the parameter is empty, the endpoint returns the full table (for initialization purposes).
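The logic behind such an endpoint might look like the sketch below, written as a plain function rather than a Flask view so it can run on its own. The `MAX_ROWS` value, the `record` helper, the oldest-first ordering, and the fallback of returning the full table when the given job ID has already been evicted are all assumptions for illustration, not behavior confirmed by the source.

```python
from collections import deque
from urllib.parse import parse_qs, urlparse

MAX_ROWS = 3  # hypothetical size limit for the modification table

# A deque with maxlen drops the oldest row automatically when it is full.
modification_table = deque(maxlen=MAX_ROWS)


def record(job_id, method, resource_url):
    """Append one modification; the oldest row is evicted once the table is full."""
    modification_table.append(
        {"job_id": job_id, "method": method, "resource_url": resource_url}
    )


def modification_table_diff(request_url):
    """Return the rows recorded after the agent_job_id given in the query string."""
    query = parse_qs(urlparse(request_url).query)
    agent_job_id = query.get("agent_job_id", [""])[0]
    rows = list(modification_table)
    if not agent_job_id:
        return rows  # empty parameter: full table, for initialization
    ids = [row["job_id"] for row in rows]
    if agent_job_id not in ids:
        return rows  # assumption: unknown or evicted ID falls back to the full table
    return rows[ids.index(agent_job_id) + 1:]


record("1", "POST", "http://server.com/Collection/a")
record("2", "DELETE", "http://server.com/Collection/a")
record("3", "POST", "http://server.com/Collection/b")
diff = modification_table_diff(
    "https://localhost:5000/modification-table-diff?agent_job_id=2"
)
full = modification_table_diff("https://localhost:5000/modification-table-diff")
```

Since the table is bounded, a client that stays offline for longer than the table's history cannot compute an exact diff; how that case should be handled is a design decision this sketch only papers over.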

A simulated server that reproduces this behavior, and is compatible with the current Agent PR, is available at: https://github.com/Guttz/simulated-hydrus-sync-socket.

@Mec-iS
Contributor Author

Mec-iS commented Sep 20, 2019

@chrizandr please close if it is done.


5 participants