Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Changes only" status output #686

Closed
meejah opened this issue Jan 19, 2023 · 9 comments · Fixed by #694
Closed

"Changes only" status output #686

meejah opened this issue Jan 19, 2023 · 9 comments · Fixed by #694

Comments

@meejah
Copy link
Collaborator

meejah commented Jan 19, 2023

The status endpoint currently has the behavior of sending the entire JSON-serialized status state on every change to that state.

Instead, all clients could be kept in sync by using the following protocol:

  • the first message is always a "full state" message

  • any number of "update" messages may follow

  • (think; optional? required?) any message may be a "full state" message, after which any "update" messages pertain to changes to the latest "full state" message

  • on any incoming "full state" message, reset all state (e.g. GUI controls) to match this message

  • on "update" messages, change the existing state as per the message

We should spec out all the messages; see e.g. interface.rst and the status-API (there's just one message currently). This would make the entire API /v2 I guess? (Or, how do we tell existing clients what version this is?)

...and for compatibility the /v1 endpoint would have to keep doing what it does now, until deprecation? Also we don't have a way to deprecate APIs.

@crwood
Copy link
Member

crwood commented Jan 20, 2023

Changes in this direction would be extremely welcome. Presently, Gridsync (and any other hypothetical/future consumers of the status API) must compare/diff each incoming status message against the previous one in order to determine what the actually-salient changes were (which, at higher volumes of files/changes, becomes quite expensive). Having some sort of stream/endpoint that conveys only the updates would remove the need for such comparisons altogether (and, in the case of Gridsync, would allow about 50%(!) of the code relating to magic-folder to be deleted).

For what it's worth, Gridsync presently parses out -- and emits Qt "signals" for -- these events from magic-folder. It would be great if the same or similar information could be provided directly from magic-folder itself as individual "update" messages -- maybe in some form similar to these (only examples, of course):

# Folder-related events:
{"event": "folder-added", "folder": "Cat Pics", "collective-dircap": "URI:...", "personal-dircap": "URI:..." }
{"event": "folder-status-updated", "folder": "Cat Pics", "status": "syncing" }
{"event": "folder-mtime-updated", "folder": "Cat Pics", "mtime": 1674229561 }
{"event": "folder-size-updated", "folder": "Cat Pics", "size": 28409382 }
{"event": "folder-removed", "folder": "Cat Pics" }

# Membership-related events:
{"event": "device-added", "folder": "Cat Pics", "device": "Mallory's Machination", "pubkey": "Ed25519:BcEe3h84+uCLz4/n..." }
{"event": "device-removed", "folder": "Cat Pics", "device": "Mallory's Machination" }

# Polling/scanning:
{"event": "scan-started", "folder": "Cat Pics" }
{"event": "scan-finished", "folder": "Cat Pics" }
{"event": "poll-started", "folder": "Cat Pics" }
{"event": "poll-finished", "folder": "Cat Pics" }

# Transfer-related events:
{"event": "upload-queued", "folder": "Cat Pics", "relpath": "Garfield.jpg" }
{"event": "upload-started", "folder": "Cat Pics", "relpath": "Garfield.jpg" }
{"event": "upload-progressed", "folder": "Cat Pics", "relpath": "Garfield.jpg", "transferred": 0.58 }
{"event": "upload-finished", "folder": "Cat Pics", "relpath": "Garfield.jpg" }
{"event": "download-started", "folder": "Cat Pics", "relpath": "Grumpy Cat.jpg", "device": "Bob's laptop" }
{"event": "download-progressed", "folder": "Cat Pics", "relpath": "Grumpy Cat.jpg", "device": "Bob's laptop", "transferred": 0.33 }
{"event": "download-finished", "folder": "Cat Pics", "relpath": "Grumpy Cat.jpg", "device": "Bob's laptop" }
{"event": "folder-sync-started", "folder": "Cat Pics" }
{"event": "folder-sync-finished", "folder": "Cat Pics" }

# File-related events (e.g., for when changes to the DMD are detected by a remote-poll):
{"event": "file-added", "folder": "Cat Pics", "relpath": "lolcat.jpg", "device": "Alice's phone" }
{"event": "file-modified", "folder": "Cat Pics", "relpath": "lolcat.jpg", "device": "Alice's phone", "mtime": 1674229569, "size": 49152 }
{"event": "file-removed", "folder": "Cat Pics", "relpath": "lolcat.jpg", "device": "Alice's phone" }

# Conflicts:
{"event": "conflict-detected", "folder": "Cat Pics", "relpath": "lolcat.jpg" }
{"event": "conflict-resolved", "folder": "Cat Pics", "relpath": "lolcat.jpg" }

# Folder errors:
{"event": "folder-error-detected", "folder": "Cat Pics", "message": {"kind": "upload-failed", "relpath": "lolcat.jpg" } }
{"event": "folder-error-resolved", "folder": "Cat Pics", "message": {"kind": "upload-retry-succeeded", "relpath": "lolcat.jpg" } }

# Connectivity:
{"event": "connecting", "known": 10, "happy": 3, "connected": 1 }
{"event": "connected", "known": 10, "happy": 3, "connected": 7 }

# Others??

We should spec out all the messages; see e.g. interface.rst and the status-API (there's just one message currently). This would make the entire API /v2 I guess? (Or, how do we tell existing clients what version this is?)

...and for compatibility the /v1 endpoint would have to keep doing what it does now, until deprecation? Also we don't have a way to deprecate APIs.

Another option might be to just keep the existing (/v1/status) endpoint as-is for now and add the more granular message stream described here under an entirely different path (e.g., /v1/events)? To me, at least, incrementing a protocol version usually signals a break in compatibility (and it seems like maybe this functionality can be added without changing the old behavior?).

@exarkun
Copy link
Member

exarkun commented Jan 20, 2023

the first message is always a "full state" message

It should also be possible to represent the full state as a sequence of updates. If the update protocol is like:

{"events": [{...}]}

Then the only thing that is special about the first message a client receives when they connect is that it happens to have all of the events necessary to describe the state that "already" existed in magic-folder at the time of the connection.

This makes event notification the only message in the protocol which could be a nice simplification.

@meejah
Copy link
Collaborator Author

meejah commented Jan 21, 2023

I do believe that could work.

That would mean some potentially extra work on a new client connection ("figure out what the updates are" from the existing state) but it should be a straightforward mapping from the current serializer, I suppose.

To me, at least, incrementing a protocol version usually signals a break in compatibility (and it seems like maybe this functionality can be added without changing the old behavior?).

I guess that's the question, sort of: do we want to increment on any change? Or only increment on a change we "can't" make work (which basically implies /v1/... can add any endpoints at all but changing or similar is harder).

One benefit of getting good at incrementing the version easily is that we can improve the API more freely.

That said, there wasn't a statement about the HTTP API stability, so currently I added one (saying it's not yet considered stable). So we could just change the status WebSocket to emit "whatever makes the most sense" and leave the above question for "some point after we declare it stable". (We also could declare it stable now and use the /v1/events approach suggested above...)

@meejah
Copy link
Collaborator Author

meejah commented Jan 27, 2023

One use-case that's hard to cover with the "updates-only" API is a "one-time" status command to display the current status -- such a command would have to wait some hard-to-define (but presumably fairly short) amount of time to have "a" coherent status.

If a single coherent status message is sent upon connect (i.e. like now), such a command waits for just the first message and exits.

(I don't know the details of Qt enough to know if this would help or hinder there -- e.g. does it special-case the initial state, such that this could save some amount of "UI churn" versus starting with a blank slate and applying all the updates incrementally?)

I also like the idea of the separate /v1/events allowing clients to choose -- I wonder if these could be cohesively combined somehow?

@exarkun
Copy link
Member

exarkun commented Jan 27, 2023

One use-case that's hard to cover with the "updates-only" API is a "one-time" status command to display the current status -- such a command would have to wait some hard-to-define (but presumably fairly short) amount of time to have "a" coherent status.

If magic-folder sends all of the updates representing the current state in one {"events": [...]} message then I think this concern is addressed (e.g., a "show me the current status" tool would just wait for one message and be done). So, essentially - yes, it is good to have a single coherent "here's the state" message delivered first, but the message can still be event/update shaped.

And more generally, it seems like it would make sense if it always sent a consistent batch of events in one container message. That way any UI (or any other kind of processing) has a clear indication of the boundaries of whatever state transitions are going on.

This is all supposing that it is simpler to implement just event processing instead of event processing & some other "here's the current state" message. That seems like a safe bet but I haven't worked as closely with magic-folder as @meejah or @crwood .

@meejah
Copy link
Collaborator Author

meejah commented Jan 27, 2023

One the sending side, since we already track "the state" and have a way to serialize it, simply sending that is .. already done (and fairly easy to expand). (I was originally viewing "update" messages as one-each, so yeah being able to "batch" them could be a solution to "here is a coherent state").

The "update the state methods" already (I believe) fairly closely track what the UI might want (e.g. "add a folder" etc) so simply emitting those as updates also seems straightforward.

It would be "work" to do an "updates-only" initial message, because a new client would arrive and we'd have to deduce what the updates are from the current state -- that is, re-write "marshal the state" to be in terms of updates only. (Saving each update individually could be a strategy, but would lead to unbounded memory use).

Consider, for example, uploads or downloads: there aren't usually a lot going on at once, but there could have been a lot that have happened. So with the current strategy of "track the state", the "downloads" list may have two elements in it .. so then we deduce that a new clients gets a state with two "download-queued" events followed by two "download-started" events (or just serialize the "downloads" list as now).

However, to pursue a "save all events that happened" we'd have a ton of redundant stuff like download-queued' -> download-started->download-finished` events saved up. They could in principal of course "prune" the old events, but ...

Aside: does a download-queued have to come before a download-started? Probably, to get the queued-at timestamp.

Anyway, I think it could work with just events (from the sending side). Maybe it's best to just "try it and see"?

I guess the good-news side is that the internal API is fairly "event" focused already...looking a little, the one thing that doesn't fit very well is the "recent" list which is the last 30 most-recently updated files (that aren't currently being uploaded or downloaded). Not entirely sure how to fit this into an "event" format....maybe it simply shouldn't be in the status API at all. @chris what does this actually get used for? Could it be moved to a separate API (then also giving control over the "30" part via a query-arg)?

@crwood
Copy link
Member

crwood commented Jan 27, 2023

...looking a little, the one thing that doesn't fit very well is the "recent" list which is the last 30 most-recently updated files (that aren't currently being uploaded or downloaded). Not entirely sure how to fit this into an "event" format....maybe it simply shouldn't be in the status API at all. @chris what does this actually get used for? Could it be moved to a separate API (then also giving control over the "30" part via a query-arg)?

Currently, Gridsync doesn't make use of the "recent" list at all. Instead, it fetches the "full" (files) state of a given folder via an initial GET /v1/<FOLDER_NAME>/file-status and updates the UI in accordance with the "status" websocket messages it subsequently reads about that folder (for example, showing that a folder is "Syncing" when a file within that folder enters the upload/download queue, or adding an entry to the top of the "History" liest/view once a file finishes uploading, etc.). That said, I would have no objections to (re)moving the "recent" list from "status" messages (or future "event" messages, or whatever).

@meejah
Copy link
Collaborator Author

meejah commented Jan 28, 2023

Okay, so maybe we just remove "recent" (from status) then.

'.../file-status` is going to be pretty inefficient (unless you do need the state of all the files). But, we can add different endpoints as-needed for other things. (I don't remember of the top of my head why "recent" was added?)

@meejah
Copy link
Collaborator Author

meejah commented Feb 17, 2023

magic-folder status will also need to be re-written (to deserialize in terms of "events" instead of "here is the state").

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants