Write tech spec to solve mismatch in reader/vbms documents #14518

Closed
8 tasks
hschallhorn opened this issue Jun 11, 2020 · 5 comments
Labels
Priority: Medium Blocking issue w/workaround, or "second in" priority for new work. Product: caseflow-reader Stakeholder: BVA Functionality associated with the Board of Veterans' Appeals workflows/feature requests Team: Echo 🐬 Type: Tech-Spec Label for issues that contain a tech spec

Comments

@hschallhorn
Contributor

The Board has raised some incidents where the documents available in VBMS did not match the documents shown in Reader (#14081). This led us to dig into what happens when a user refreshes Reader (#14298). The documentation from that investigation is at https://github.com/department-of-veterans-affairs/caseflow/wiki/Reader-Backend. The investigation raised multiple concerns.

Concerns

  • In EfolderService.fetch_documents_for(appeal, user), if conversion to PDF fails, the unconverted image is saved to S3 anyway (even though Reader can only display PDFs), and no alert is logged.

  • For each document shown to the user, DocumentController#pdf is called for the current, next, and previous documents. It serves the PDF file from the /tmp/pdfs/ directory. The PDF can come from 3 places:

    • from memory
    • S3 if it's not in memory
    • VBMS if it's not in S3

    Because the only fallback after S3 is VBMS, a document that is not in S3 and comes from VVA cannot be shown in Reader.

  • Reader pulls document files from S3 when they are available. RetrieveDocumentsForReaderJob caches documents in S3 every 5 minutes for active Reader users. Each run chooses up to 5 users who (1) logged in within the last week and (2) have either never fetched documents through eFolder or have not done so within the last day. Open questions:
    • Is every 5 minutes too frequent? Could consecutive jobs pick the same 5 users while the first job is still running? Because efolder_documents_fetched_at is not set until a job finishes, a job that takes longer than 5 minutes (e.g., for 1000+ documents) leaves the same users eligible for the next run.
    • How often are documents served from S3 compared to being retrieved from VBMS/VVA? The job's intent is for Reader to serve as many documents as possible (ideally all of them) from S3. We should measure how well the job achieves this and improve it, taking S3 file auto-deletion into account.

  • In the Reader UI, document counts are displayed through ExternalApi::EfolderService.document_count (Document List page) and ExternalApi::EfolderService.fetch_documents_for().count (Document View page). These counts can change over time, and a small net change can hide several individual differences. For example:

    • 2 Document records were created and retrieved earlier but are no longer retrievable from eFolder, possibly because newer versions are available.
    • eFolder has 1 new record that Reader doesn't know about yet, possibly because a new document was uploaded to VBMS/VVA.
    • The net document count changes by only 1, but there are 3 differences.
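The example above is why comparing counts alone cannot explain a mismatch. A minimal sketch, assuming every document has a stable identifier (the `doc-*` IDs and the `diff_documents`/`DocumentDiff` names are hypothetical, not Caseflow or eFolder APIs), compares the set of documents Reader knows about with the set eFolder currently returns:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentDiff:
    """Differences between Reader's known documents and eFolder's current documents."""
    removed: frozenset  # known to Reader, no longer returned by eFolder
    added: frozenset    # returned by eFolder, not yet known to Reader

    @property
    def net_change(self) -> int:
        # What a count-only comparison would report.
        return len(self.added) - len(self.removed)

    @property
    def total_differences(self) -> int:
        # What actually changed, document by document.
        return len(self.added) + len(self.removed)


def diff_documents(reader_ids, efolder_ids) -> DocumentDiff:
    """Compare two collections of document IDs as sets (hypothetical helper)."""
    reader_set, efolder_set = set(reader_ids), set(efolder_ids)
    return DocumentDiff(
        removed=frozenset(reader_set - efolder_set),
        added=frozenset(efolder_set - reader_set),
    )


# The scenario from the list above: 2 documents dropped, 1 new document.
diff = diff_documents(["doc-1", "doc-2", "doc-3", "doc-4"], ["doc-3", "doc-4", "doc-5"])
print(diff.net_change, diff.total_differences)  # -1 3
```

The `removed` and `added` sets are the kind of data the "Differences since last sync" and "Removed docs" items in the suggested popup would need to display.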

Suggested improvements

Note: new features should be in Reader so that other non-Queue users (Intake, ...) can make use of them.

  1. Have a nightly job that retrieves VBMS and VVA documents for all open appeals and saves them to S3. Log which appeals and documents have been processed and which remain to be processed. S3 files currently expire (auto-delete) after 5 days; we will probably want to lengthen this to reduce re-downloading, or add another job that deletes files for closed appeals. Are there S3 cost considerations or storage limits?
  2. On the Reader front page (where all doc count links lead), have a popup that shows stats about the docs such as:
    • Last sync time
    • Failed syncs
    • Differences since last sync
    • Removed docs and possible reasons
    • Metadata for new docs. New docs should also be highlighted in the document list somehow, or the list could show a downloaded timestamp and/or receipt date.
  3. Either in the popup above or elsewhere in Reader, have a button that lets the user start a document download/refresh from VBMS/VVA, with a warning that it may take many minutes to complete, depending on how many similar requests are already queued. This would start a background job so users can keep reviewing existing docs.
    We should focus on reducing the need for this feature, since (ideally) all docs as of the prior night are already in S3 and are updated nightly. This should reduce the risk of doc count discrepancies between pages, since none of those pages would query external systems. The button would be the only way to manually trigger such a query.
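A rough sketch of suggestion 1, written in Python purely for illustration (Caseflow is a Rails app, and none of these names are Caseflow classes). It assumes hypothetical `fetch_document_ids` and `download` callables, a dictionary standing in for S3, and a set standing in for a shared claim/lock. The 5-day expiry comes from this issue; the 1-day refresh margin is an assumption. Each appeal is claimed *before* its documents are downloaded, so an overlapping run skips it instead of repeating the work, which is the problem described above for efolder_documents_fetched_at.

```python
import logging
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

logger = logging.getLogger("nightly_document_sync")

S3_EXPIRY = timedelta(days=5)       # current auto-delete window, per this issue
REFRESH_MARGIN = timedelta(days=1)  # assumption: re-download a day before expiry


def needs_download(cached_at: Optional[datetime], now: datetime) -> bool:
    """True if a document is not cached, or its cached copy expires within the margin."""
    return cached_at is None or now - cached_at >= S3_EXPIRY - REFRESH_MARGIN


def nightly_sync(
    open_appeal_ids: Iterable[str],
    fetch_document_ids: Callable[[str], List[str]],  # hypothetical VBMS/VVA listing call
    download: Callable[[str], bytes],                # hypothetical download call
    cache: Dict[str, Tuple[bytes, datetime]],        # stand-in for S3: doc id -> (body, cached_at)
    claimed: Set[str],                               # stand-in for a shared DB lock/column
    now: datetime,
) -> Dict[str, List[str]]:
    """Cache documents for every open appeal and report processed/skipped/failed appeals."""
    appeal_ids = list(open_appeal_ids)
    report: Dict[str, List[str]] = {"processed": [], "skipped": [], "failed": []}
    for index, appeal_id in enumerate(appeal_ids, start=1):
        if appeal_id in claimed:
            # Another run is already working on this appeal; don't duplicate its work.
            report["skipped"].append(appeal_id)
            continue
        claimed.add(appeal_id)  # claim *before* the slow part, not after it finishes
        try:
            for doc_id in fetch_document_ids(appeal_id):
                cached = cache.get(doc_id)
                if needs_download(cached[1] if cached else None, now):
                    cache[doc_id] = (download(doc_id), now)
            report["processed"].append(appeal_id)
        except Exception:
            logger.exception("Document sync failed for appeal %s", appeal_id)
            report["failed"].append(appeal_id)
        finally:
            claimed.discard(appeal_id)
        # Progress log: position in the list of open appeals.
        logger.info("Handled %d of %d open appeals", index, len(appeal_ids))
    return report
```

In a multi-process deployment the claim would need to be an atomic database update or lock rather than an in-process set; the returned report is one way to log which appeals were processed and which remain.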

Goals of the implementation

  • All PDF documents viewable in Reader match (or can be updated to match) the documents viewable in VBMS/VVA, or there is a descriptive error/reason why a given document is not viewable
  • Dependency on eFolder reduced to improve page load times
    • If the caching needed to achieve this could cause noticeable discrepancies, allow the user to force a document refresh
  • Document counts are consistent throughout Caseflow (in the Queue table view, in Case Details, and in Reader)
  • More transparency available to the user around syncs/doc count changes/etc.

Non-goals/Out of scope

@hschallhorn hschallhorn added Product: caseflow-reader Stakeholder: BVA Functionality associated with the Board of Veterans' Appeals workflows/feature requests Team: Echo 🐬 Type: Tech-Spec Label for issues that contain a tech spec Priority: Medium Blocking issue w/workaround, or "second in" priority for new work. labels Jun 11, 2020
@yoomlam
Contributor

yoomlam commented Jun 11, 2020

Out of scope but FYI: there is a new AWS Lambda for converting TIFFs to PDFs (https://github.com/department-of-veterans-affairs/appeals-lambdas/pull/68). eFolder Express (which Reader queries) will need to be updated to use the Lambda.
PDF conversion errors should probably be logged/stored so they can be reported to the user.
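A minimal sketch of recording conversion errors, assuming a hypothetical `convert` callable that turns image bytes into PDF bytes and raises on failure. It does not model the AWS Lambda or any eFolder Express API; the `ConversionResult` and `convert_for_reader` names are illustrative. Instead of silently storing the unconverted image (the first concern in this issue), the failure is logged and kept alongside the document so Reader could show the user why it is not viewable.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("pdf_conversion")


@dataclass
class ConversionResult:
    document_id: str
    pdf: Optional[bytes] = None  # set only when conversion succeeded
    error: Optional[str] = None  # human-readable reason, suitable for showing in Reader

    @property
    def viewable(self) -> bool:
        return self.pdf is not None


def convert_for_reader(document_id: str, image: bytes,
                       convert: Callable[[bytes], bytes]) -> ConversionResult:
    """Convert an image to PDF, recording (not swallowing) any failure."""
    try:
        return ConversionResult(document_id, pdf=convert(image))
    except Exception as exc:
        # Log an alert instead of silently saving the non-PDF image.
        logger.error("PDF conversion failed for %s: %s", document_id, exc)
        return ConversionResult(document_id, error=f"PDF conversion failed: {exc}")
```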

@hschallhorn hschallhorn changed the title Write tech spec to solve incorrect reader documents Write tech spec to solve mismatch in reader/vbms documents Jun 11, 2020
@lomky
Contributor

lomky commented Jun 11, 2020

what is this chart?

1 | 
2 | 
3 | 
5 | |||||||||||
8 | 

Why 3?

  • Normal timebox

Why 5?

  • Requires investigation & confirmation
  • Requires spin up
  • Solution may require re-architecting of some aspects of Reader-eFolder interaction

@yoomlam
Contributor

yoomlam commented Jun 12, 2020

@pshahVA provided some information that could be considered in the tech spec: VBMS was migrated to the cloud, so documents may already be in the cloud and accessible to Caseflow. Related links: VBMS API convo, VBMS eFolder API.

Recent convo

Questions:

  1. Where does VBMS store documents?
  2. Can Caseflow access those documents directly (for example, via a URL provided by VBMS)? If so, and if Reader and eFolder Express can retrieve them quickly, Caseflow would no longer need to download and temporarily cache documents in S3.

@hschallhorn
Contributor Author

@hschallhorn
Contributor Author

Another instance of not all docs showing up in Reader? https://dsva.slack.com/archives/CHX8FMP28/p1603477605369000?thread_ts=1601667814.184200&cid=CHX8FMP28

There were 13 documents in Caseflow that the user did not view before dispatching the case.


4 participants