Write tech spec to solve mismatch in reader/vbms documents #14518

hschallhorn · 2020-06-11T16:07:33Z

The board has raised some incidents where the documents available in VBMS did not match the documents shown in Reader (#14081). This led us to dig into what occurs when a user refreshes Reader (#14298). The documentation from that investigation resides in https://github.com/department-of-veterans-affairs/caseflow/wiki/Reader-Backend. The investigation raised multiple concerns.

Concerns

In EfolderService.fetch_documents_for(appeal, user), if conversion to PDF fails, the original image is saved to S3 (even though Reader can only show PDFs) and no alert is logged.
With each document shown to the user, DocumentController#pdf is called for the current, next, and previous documents. It serves up the pdf file from directory /tmp/pdfs/. The pdf could come from 3 places:
- from memory
- S3 if it's not in memory
- VBMS if it's not in S3
So if the document is not in S3 and comes from VVA, then Reader won't be able to show it.
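The lookup order above can be sketched as a tiered fetch. This is only an illustration, not the actual DocumentController code: `TieredPdfFetcher` is a hypothetical name, and plain hashes stand in for the local copy, S3, and VBMS. A VVA-only document that was never cached in S3 ends up in the `:missing` branch, which is the failure described above.

```ruby
# Hypothetical sketch: hashes stand in for the local copy, S3, and VBMS.
class TieredPdfFetcher
  def initialize(local:, s3:, vbms:)
    @local = local
    @s3 = s3
    @vbms = vbms
  end

  # Returns [source, content], or [:missing, nil] when no tier has the document.
  def fetch(document_id)
    return [:local, @local[document_id]] if @local.key?(document_id)

    if @s3.key?(document_id)
      # Keep a local copy so the next request skips S3.
      @local[document_id] = @s3[document_id]
      return [:s3, @local[document_id]]
    end

    if @vbms.key?(document_id)
      # Cache in both tiers so later requests are served locally or from S3.
      @s3[document_id] = @local[document_id] = @vbms[document_id]
      return [:vbms, @local[document_id]]
    end

    # Nothing left to try, e.g. a VVA-only document that was never cached in S3.
    [:missing, nil]
  end
end
```

Returning the source alongside the content would also make it possible to measure how often each tier is actually hit.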
Reader pulls document files from S3 if they're available. A RetrieveDocumentsForReaderJob caches documents in S3 every 5 minutes for active Reader users. This job chooses up to 5 users who (1) logged in within the last week and (2) have either never had documents fetched from eFolder or not had them fetched within the last day. Running every 5 minutes may be too frequent. Could the same 5 users be chosen by consecutive jobs if the first job is still processing? Since efolder_documents_fetched_at is not set until a job finishes, if the first job takes longer than 5 minutes (e.g., 1000+ documents), then the next job would pick the same users. How often is S3 used compared to document retrievals from VBMS/VVA? The intent of the job is for documents to be retrieved from S3 whenever possible, preferably all of them. We should measure how well the job achieves this intent and improve it, while taking S3 file auto-deletions into account.
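One possible way to avoid consecutive jobs picking the same users is to record a claim timestamp when a user is selected rather than only when the fetch finishes. The sketch below is an assumption, not the existing job: `ReaderUserStub`, `fetch_started_at`, `pick_users_for_fetch`, and the one-hour claim timeout are all hypothetical; only the "logged in within a week" and "not fetched within a day" criteria come from the description above.

```ruby
# Hypothetical stand-in for a Reader user record.
ReaderUserStub = Struct.new(:id, :last_login_at, :fetched_at, :fetch_started_at)

WEEK_SECONDS = 7 * 24 * 3600
DAY_SECONDS = 24 * 3600

# Picks up to `limit` users and marks them as claimed so that a job starting
# while this one is still running skips them. A claim expires after
# `claim_timeout` seconds in case the claiming job died.
def pick_users_for_fetch(users, now:, limit: 5, claim_timeout: 3600)
  candidates = users.select do |user|
    recently_active = user.last_login_at && user.last_login_at >= now - WEEK_SECONDS
    stale = user.fetched_at.nil? || user.fetched_at < now - DAY_SECONDS
    unclaimed = user.fetch_started_at.nil? || user.fetch_started_at < now - claim_timeout
    recently_active && stale && unclaimed
  end

  picked = candidates.first(limit)
  picked.each { |user| user.fetch_started_at = now }
  picked
end
```

In a real database-backed job the select-and-claim step would also need to be atomic (for example, done inside a transaction with row locking), which this in-memory sketch does not attempt.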
In the Reader UI, document counts are displayed to the user through ExternalApi::EfolderService.document_count (Document List page) and ExternalApi::EfolderService.fetch_documents_for().count (Document View page). These document counts can change over time. For example,
- 2 Document records were created and retrieved but are no longer retrievable by eFolder, possibly because new versions are available.
- eFolder has 1 new record that Reader doesn't yet know about, possibly because a new document was uploaded to VBMS/VVA.
- The net change in document count is only 1 (2 removed, 1 added), but there are 3 differences.
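The example above can be made concrete with a small diff of document IDs. This is a minimal sketch: `document_diff` and the sample IDs are hypothetical, and it assumes each document has a stable identifier that can be compared between Reader's records and eFolder's current list.

```ruby
# Compares the document IDs Reader already knows about with the IDs eFolder
# currently returns. A count comparison alone would only surface `net_change`.
def document_diff(known_ids, efolder_ids)
  known = known_ids.uniq
  current = efolder_ids.uniq
  removed = known - current
  added = current - known

  {
    removed: removed,
    added: added,
    net_change: current.size - known.size,
    differences: removed.size + added.size
  }
end

diff = document_diff(%w[doc1 doc2 doc3 doc4], %w[doc3 doc4 doc5])
puts diff[:net_change]  # -1
puts diff[:differences] # 3
```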

Suggested improvements

Note: new features should be in Reader so that other non-Queue users (Intake, ...) can make use of them.

Have a nightly job that retrieves VBMS and VVA documents for all open appeals and saves them to S3. Log which appeals and documents have been processed and which remain to be processed. Note that S3 files currently expire (auto-delete) after 5 days; we will probably want to lengthen this to reduce redownloading, or have another job that deletes the files for closed appeals. Are there S3 cost considerations or storage bounds?
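A rough sketch of the bookkeeping such a job could do, under stated assumptions: `sync_open_appeals` is a hypothetical name, `fetcher` is any callable that returns a hash of document ID to content for an appeal, and `store` is a plain hash standing in for S3. None of these are existing Caseflow APIs.

```ruby
require "logger"

# Fetches documents for each appeal, saves them to the store, and records
# which appeals succeeded or failed so the remaining work is visible in logs.
def sync_open_appeals(appeal_ids, fetcher:, store:, logger: Logger.new($stdout))
  summary = { processed: [], failed: [] }

  appeal_ids.each do |appeal_id|
    begin
      documents = fetcher.call(appeal_id)
      documents.each { |doc_id, content| store["#{appeal_id}/#{doc_id}"] = content }
      summary[:processed] << appeal_id
      logger.info("appeal #{appeal_id}: cached #{documents.size} documents")
    rescue StandardError => error
      # One failing appeal should not stop the rest of the nightly run.
      summary[:failed] << appeal_id
      logger.error("appeal #{appeal_id}: #{error.message}")
    end
  end

  remaining = appeal_ids - summary[:processed]
  logger.info("processed=#{summary[:processed].size} remaining=#{remaining.size}")
  summary
end
```

The returned summary could also feed the "Last sync time" and "Failed syncs" stats suggested for the Reader front page.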
On the Reader front page (where all doc count links lead), have a popup that shows stats about the docs such as:
- Last sync time
- Failed syncs
- Differences since last sync
- Removed docs and possible reasons
- Metadata for new docs - should also somehow highlight these new docs in the document list or add a downloaded timestamp and/or receipt date
Either in the popup above or elsewhere in Reader, have a button to allow the user to initiate a documents download/refresh from VBMS/VVA with a warning that it may take many minutes for it to complete, depending on other similar requests already in line. This would initiate a background job so users can continue reviewing existing docs.
We should focus efforts to reduce the need for this feature since (ideally) all docs as of the prior night are already in S3 and are updated nightly. This should reduce the risk of doc count discrepancies among pages since none of those pages should initiate a query to external systems. The button mentioned in this feature would be the only way to manually initiate such a query.

Goals of the implementation

All PDF documents viewable in Reader match (or can be updated to match) documents viewable in VBMS/VVA (or there is a descriptive error/reason why a certain document is not viewable)
Dependency on eFolder reduced to improve page load times
- If the caching needed to achieve this is heavy enough to cause noticeable discrepancies, allow the user to force refresh documents
Document counts throughout Caseflow are consistent (in the Queue table view, in case details, in Reader)
More transparency available to the user around syncs/doc count changes/etc.

Non Goals/Out of scope

Ensuring all non-PDF documents are correctly converted to be viewable in Reader (e.g., Fix support for TIFF documents in Reader/eFolder #14193)
Improving load times of document counts in Queue or removing the async request for doc counts altogether
Reimplementing the new documents icon!


yoomlam · 2020-06-11T16:12:01Z

Out of scope but FYI: There is a new AWS Lambda for converting TIFFs to PDFs (https://github.com/department-of-veterans-affairs/appeals-lambdas/pull/68). eFolder Express (which Reader queries) will need to be updated to use the Lambda.
PDF conversion errors should probably be logged/stored so they can be reported to the user.

lomky · 2020-06-11T17:22:26Z

what is this chart?

1 | 
2 | 
3 | 
5 | |||||||||||
8 |

Why 3?

Normal timebox

Why 5?

Requires investigation & confirmation
Requires spin up
Solution may require re-architecting of some aspects of Reader-eFolder interaction

yoomlam · 2020-06-12T20:48:48Z

@pshahVA provided some information that could be considered in the tech spec: VBMS was migrated to the cloud, so documents may already be in the cloud for access by Caseflow. VBMS API convo, VBMS eFolder API.

Recent convo

Questions:

Where does VBMS store documents?
Can Caseflow access those documents directly (such as via some URL provided by VBMS)? If so and if the documents can be quickly retrieved by Reader and EE, then this would avoid Caseflow having to download and cache documents temporarily in S3.

hschallhorn · 2020-06-26T14:06:17Z

Reimplement doc count mismatch: https://github.com/department-of-veterans-affairs/caseflow/pull/14586/files

hschallhorn · 2020-10-23T18:36:05Z

Another instance of all docs not showing up in reader? https://dsva.slack.com/archives/CHX8FMP28/p1603477605369000?thread_ts=1601667814.184200&cid=CHX8FMP28

There were 13 documents in Caseflow that the user did not view before dispatching the case.

hschallhorn changed the title ~~Write tech spec to solve incorrect reader documents~~ Write tech spec to solve mismatch in reader/vbms documents Jun 11, 2020

ThorntonMatthew closed this as completed Apr 25, 2023
