This repository has been archived by the owner on Mar 11, 2021. It is now read-only.

Accurately inform user of document import progress (even on background jobs) #268

Open
cometman opened this issue Sep 8, 2015 · 8 comments

Comments

@cometman
Contributor

cometman commented Sep 8, 2015

_Objective_
As a DC user, after uploading a document to DC, I need to be fully informed of any additional processing taking place on my document that could impede my current task.

_History_
Since entity extraction was decoupled from document imports, documents are first imported, and then a background job is started to perform entity extraction. Today, once a document has been imported, it appears to be complete even if the entity extraction job is still running.

_Successful test case_
After a user uploads a document and the initial import completes, if the user attempts to click "View Entities" on a document that is still undergoing entity extraction, the user should be informed: "We're still extracting entities from your document. Check back in a little while."

_Technical Change Overview_

  • Update API to include jobs being processed in the response of GET /document/:id
  • Add error message (also localized) to constants
  • When clicking the action "View Entities" ensure all dependent jobs have completed.
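
A minimal sketch of the check described in the list above, written in Python purely for illustration. The `pending_jobs` field, the `entity_extraction` job name, and the function name are assumptions; the issue does not specify the actual shape of the GET /document/:id response.

```python
from typing import Optional

# Message text taken from the "Successful test case" in this issue.
ENTITIES_PENDING_MESSAGE = (
    "We're still extracting entities from your document. "
    "Check back in a little while."
)

# Hypothetical job name; the real identifier is not given in this issue.
ENTITY_EXTRACTION_JOB = "entity_extraction"


def view_entities_blocker(document: dict) -> Optional[str]:
    """Return a message if "View Entities" should be blocked, else None.

    `document` stands in for the parsed response of GET /document/:id,
    assumed here to carry a "pending_jobs" list of job names.
    """
    pending = document.get("pending_jobs", [])
    if ENTITY_EXTRACTION_JOB in pending:
        return ENTITIES_PENDING_MESSAGE
    return None
```

With this shape, the "View Entities" action would call `view_entities_blocker` first and show the returned message instead of an empty entity list when the job is still pending.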
@cometman cometman self-assigned this Sep 8, 2015
@cometman cometman changed the title Document upload progress notification Accurately inform user of document import progress (even on background jobs) Sep 8, 2015
@reefdog
Contributor

reefdog commented Sep 8, 2015

For background: now that a document is usable before being "fully" processed (i.e., now that you can view/edit/annotate/etc. while entities are still processing), we need a redesigned and more robust "what's the status of my document?" interface in the workspace, both in the index (workspace) and show (viewer) screens. That will require a bit more work. This solution lays the technical foundation for that but with a stopgap interface (see "successful test case").

@cometman Rather than "Your document is still being processed", let's go with the more explicit "We're still extracting entities from your document. Check back in a little while."

@cometman
Contributor Author

cometman commented Sep 8, 2015

👍 error message text change

@knowtheory
Member

Yeah, just for a bit more background (ha):

Entity Extraction to this point has been a portion of document importation & processing. Processing has been treated as monolithic. Either all processing has been completed, or it hasn't. That processing has included image extraction, text extraction and then entity extraction from the text.

In order to be able to rate limit Entity Extraction, it is necessary to break apart document processing (so that Entity Extraction can be controlled independently from document importation) into multiple backgrounded jobs which may be chained together in sequence.

As a consequence, the DocumentCloud workspace, which currently can only treat documents as Available or Unavailable, must be updated to also reflect a sequence of possible statuses.

Rather than do a whole-hog reworking of the workspace with an actual state machine, for the time being we will provide the workspace with additional information about which jobs are being run for a particular document.

If an entity extraction job is currently being run, entity display tools will not be available for that document in the workspace.
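
To make the split concrete, here is a rough Python sketch of processing broken into chained stages, where the document becomes usable before entity extraction finishes. The stage names follow the description above; the `DocumentJobs` class and its queue mechanics are hypothetical and not taken from the DocumentCloud codebase.

```python
from collections import deque

# Stage names follow the comment above; everything else is assumed.
IMPORT_STAGES = ["image_extraction", "text_extraction"]
ENTITY_STAGE = "entity_extraction"


class DocumentJobs:
    """Tracks the chained background jobs for a single document."""

    def __init__(self):
        # Entity extraction is queued last so it can be throttled separately.
        self.pending = deque(IMPORT_STAGES + [ENTITY_STAGE])
        self.completed = []

    def run_next(self):
        """Pretend to run the next job in the chain and record it as done."""
        stage = self.pending.popleft()
        self.completed.append(stage)
        return stage

    @property
    def available(self):
        # The document is usable once the import stages are done.
        return all(stage in self.completed for stage in IMPORT_STAGES)

    @property
    def entity_tools_enabled(self):
        # Entity display tools wait for the entity extraction job.
        return ENTITY_STAGE in self.completed
```

After the two import stages run, `available` is true while `entity_tools_enabled` stays false until the last job in the chain completes.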

@anthonydb
Member

> As a consequence, the DocumentCloud workspace, which currently can only treat documents as Available or Unavailable, must be updated to also reflect a sequence of possible statuses.

The messages to the user should be framed in terms of "what can I do with the document." So, for example, we might consider status messages that include:

  • Ready to annotate.
  • Ready to publish.
  • Ready to explore entities.
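
One way to derive such messages, sketched in Python under assumptions: each message is tied to the set of jobs it depends on, and a message is shown once all of those jobs have completed. The job names and the mapping itself are hypothetical; which jobs gate annotating or publishing is not stated in this thread.

```python
# Hypothetical mapping from status message to the jobs it depends on.
REQUIREMENTS = {
    "Ready to annotate.": {"image_extraction", "text_extraction"},
    "Ready to publish.": {"image_extraction", "text_extraction"},
    "Ready to explore entities.": {
        "image_extraction",
        "text_extraction",
        "entity_extraction",
    },
}


def ready_messages(completed_jobs):
    """Return the status messages whose required jobs have all completed."""
    done = set(completed_jobs)
    # Dicts preserve insertion order, so messages come back in display order.
    return [msg for msg, needed in REQUIREMENTS.items() if needed <= done]
```

For a document that has finished import but not entity extraction, this returns the first two messages only.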

@reefdog
Contributor

reefdog commented Sep 8, 2015

That's an excellent idea, Tony.

@anthonydb
Member

Thanks. To expand the idea further, these statuses could start grayed out and gradually get filled in (or get a check mark) as each step completes. This also educates users about what's happening to their docs behind the scenes, and is better than a simple percent-done indicator.
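
As a plain-text illustration of that checklist idea, a short Python sketch: each status is rendered with a check mark once its step is done and an empty box otherwise. The function name and the `[x]` / `[ ]` notation are assumptions standing in for whatever the real UI would draw.

```python
def render_checklist(steps):
    """Render (label, done) pairs as a text checklist, one step per line."""
    lines = []
    for label, done in steps:
        marker = "[x]" if done else "[ ]"  # checked vs. still grayed out
        lines.append(f"{marker} {label}")
    return "\n".join(lines)


# Example: import has finished, entity extraction has not.
example = render_checklist([
    ("Ready to annotate.", True),
    ("Ready to publish.", True),
    ("Ready to explore entities.", False),
])
```

Unlike a percentage, the checklist says nothing about overall "completeness"; it only reports which actions are available right now.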

@reefdog
Contributor

reefdog commented Sep 8, 2015

Right. I like the idea of effectively enabling visible feature flags, versus a progress bar which implies both sequential processing and a mythical concept of "completeness". For instance, users who don't use entities would consider a pre-entity-extracted document as complete, so to generically communicate that a pre-extraction document is "80% ready for use" would be wrong. The model you laid out is more like "hey, you can do A/B/C on this doc, but check back for Z" which communicates personal completeness. I like.

@knowtheory
Member

Yeah, that's my preferred course of action as well. However, doing that will require taking a look at the document tiles, what information is conveyed there, and how we can integrate that information into the design & interaction.

The thing we're trying to prevent right now is the following:

Once a document has finished processing it is marked as complete. The workspace has only one notion of "completeness" and notifies the user that the document is done. The user then clicks on the completed document and asks for the entities. The workspace reports back that the document has no entities.

The workspace needs to know on a data level what jobs are connected with which documents, and what users are or are not allowed to do.

The start for this is definitely just having the workspace tell users they should come back later for entities if entities are currently being processed.
