Accurately inform user of document import progress (even on background jobs) #268
Comments
For background: now that a document is usable before being "fully" processed (i.e., now that you can view/edit/annotate/etc. while entities are still processing), we need a redesigned and more robust "what's the status of my document?" interface in the workspace, both in the index (workspace) and show (viewer) screens. That will require a bit more work. This solution lays the technical foundation for that but with a stopgap interface (see "successful test case"). @cometman Rather than "Your document is still being processed", let's go with the more explicit "We're still extracting entities from your document. Check back in a little while." |
👍 error message text change |
Yeah, just for a bit more background (ha): up to this point, Entity Extraction has been one part of document importation and processing, and processing has been treated as monolithic. Either all processing has been completed, or it hasn't. That processing has included image extraction, text extraction, and then entity extraction from the text. To be able to rate limit Entity Extraction, we need to break document processing apart into multiple backgrounded jobs that can be chained together in sequence, so that Entity Extraction can be controlled independently of document importation. As a consequence, the DocumentCloud workspace, which currently can only treat documents as Available or Unavailable, must be updated to reflect a sequence of possible statuses. Rather than do a whole-hog reworking of the workspace with an actual state machine, for the time being we will provide the workspace with additional information about which jobs are being run for a particular document. If an entity extraction job is currently running, entity display tools will not be available for that document in the workspace. |
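As a rough illustration of the split described above, here is a minimal Python sketch of a document whose processing is a chain of separate jobs, with the outstanding jobs exposed to the workspace. The class, method, and job names are assumptions made for this example only; they are not DocumentCloud's actual code or API.

```python
from dataclasses import dataclass, field

# Hypothetical job names, based on the processing stages named in this thread.
PIPELINE = ["image_extraction", "text_extraction", "entity_extraction"]


@dataclass
class Document:
    title: str
    # Jobs that have not finished yet, in the order they will run.
    outstanding: list = field(default_factory=lambda: list(PIPELINE))
    completed: list = field(default_factory=list)

    def run_next_job(self):
        """Finish the next chained job, if any, and return its name."""
        if not self.outstanding:
            return None
        job = self.outstanding.pop(0)
        self.completed.append(job)
        return job

    @property
    def entities_available(self):
        # Entity tools depend only on the entity extraction job.
        return "entity_extraction" in self.completed

    def workspace_status(self):
        """Extra per-document information the workspace could receive."""
        return {
            "title": self.title,
            "outstanding_jobs": list(self.outstanding),
            "entities_available": self.entities_available,
        }
```

With something like this, the workspace no longer has to infer everything from a single Available/Unavailable flag: it can see that entity extraction is still outstanding and hide the entity tools for just that document.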
The messages to the user should be framed in terms of "what can I do with the document." So, for example, we might consider status messages that include:
|
That's an excellent idea, Tony. |
Thanks. To expand the idea further, these statuses could be grayed out and gradually get filled in (or a check mark added) as each step completes. This also serves to educate users as to what's happening to their docs behind the scenes and is better than a simple percent-done indicator. |
Right. I like the idea of effectively enabling visible feature flags, versus a progress bar which implies both sequential processing and a mythical concept of "completeness". For instance, users who don't use entities would consider a pre-entity-extracted document as complete, so to generically communicate that a pre-extraction document is "80% ready for use" would be wrong. The model you laid out is more like "hey, you can do A/B/C on this doc, but check back for Z" which communicates personal completeness. I like. |
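To make the "visible feature flags" idea concrete, here is a small Python sketch that turns the set of completed processing steps into per-capability status lines instead of a percentage. The capability labels and their mapping to steps are hypothetical placeholders chosen for this example; they are not the status messages proposed earlier in the thread.

```python
# Hypothetical mapping from a user-facing capability to the processing
# step it depends on. Labels are placeholders, not agreed wording.
CAPABILITY_REQUIRES = {
    "View pages": "image_extraction",
    "Search text": "text_extraction",
    "View entities": "entity_extraction",
}


def capability_report(completed_steps):
    """Split capabilities into those usable now and those still waiting."""
    done = set(completed_steps)
    ready = [c for c, step in CAPABILITY_REQUIRES.items() if step in done]
    waiting = [c for c, step in CAPABILITY_REQUIRES.items() if step not in done]
    return ready, waiting


def render(completed_steps):
    """Render one line per capability: checked when ready, empty when not."""
    ready, waiting = capability_report(completed_steps)
    lines = [f"[x] {c}" for c in ready]
    lines += [f"[ ] {c} (check back later)" for c in waiting]
    return "\n".join(lines)
```

Because each line is tied to one capability, a user who never opens entities can treat the document as done as soon as the capabilities they care about are checked.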
Yeah, that's my preferred course of action as well. However, doing that will require taking a look at the document tiles, what information is conveyed there, and how we can integrate that information into the design and interaction. The thing we're trying to prevent right now is the following: Once a document has finished processing, it is marked as complete. The workspace has only one notion of "completeness" and notifies the user that the document is done. The user then clicks on the completed document and asks for the entities. The workspace reports back that the document has no entities. The workspace needs to know on a data level which jobs are connected with which documents, and what users are or are not allowed to do. The starting point for this is definitely just having the workspace tell users they should come back later for entities if entities are currently being processed. |
_Objective_
As a DC user, after uploading a document to DC, I need to be fully informed of any additional processing taking place on my document that could impede my current task.
_History_
Now that entity extraction is decoupled from document imports, documents are first imported, and then a background job is started to run entity extraction. Today, after a document has been imported, it appears to be complete even if the entity extraction job is still running.
_Successful test case_
After a user uploads a document and the initial import completes, if the user attempts to click "View Entities" on a document that is still undergoing entity extraction, the user should be informed: "We're still extracting entities from your document. Check back in a little while."
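The test case could be sketched in Python roughly as follows. The `view_entities` function and the shape of the document record (`running_jobs`, `entities`) are assumptions made for illustration; only the message text comes from this issue.

```python
# Message text as agreed in this issue.
ENTITIES_PENDING_MESSAGE = (
    "We're still extracting entities from your document. "
    "Check back in a little while."
)


def view_entities(document):
    """Return entities, or the pending message if extraction is still running.

    `document` is a hypothetical dict with optional 'running_jobs' and
    'entities' keys; it stands in for whatever the workspace really holds.
    """
    if "entity_extraction" in document.get("running_jobs", []):
        return {"ok": False, "message": ENTITIES_PENDING_MESSAGE}
    return {"ok": True, "entities": document.get("entities", [])}
```

The key point is that an empty entity list is only reported once the extraction job is no longer running, so "no entities yet" and "no entities found" are never confused.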
_Technical Change Overview_