Fix support for TIFF documents in Reader/eFolder #14193

nanotone · 2020-05-05T21:58:35Z

Description

TIFF documents are sometimes included in the Reader documents for a case, and cannot currently be displayed by Reader's PDF viewer, PDF.js. See technical notes for some thoughts on implementation.

Background/context/resources

Bat Team thread

Related ticket on "corrupted" files in eFolder: #10504

Technical notes

There is some internet literature on getting PDF.js to display TIFF images embedded within the PDF, but displaying plain TIFF images (mimetype image/tiff) is out of scope for PDF.js and will likely never be implemented.

Rather than detecting the file type in Reader and using yet another frontend library to display TIFFs, a more straightforward approach may be to implement a TIFF-to-PDF converter in eFolder.

Because TIFF-to-PDF conversion is likely to be fraught and full of exotic edge cases, here is the output of /usr/bin/file on the header of one such problematic TIFF file seen in Reader:
TIFF image data, little-endian, direntries=19, height=3367, bps=1, compression=bi-level group 4, PhotometricIntepretation=WhiteIsZero, orientation=upper-left, width=2541
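
Whichever side ends up handling these files, something has to tell a TIFF apart from a PDF first. A minimal sketch of that check, looking only at the leading magic bytes, might look like the following. The helper name sniff_document_type is hypothetical and is not part of Caseflow or eFolder Express; the byte signatures are the standard TIFF headers ("II*\0" for little-endian, "MM\0*" for big-endian) and the "%PDF-" header.

```ruby
# Hypothetical helper, not taken from either codebase.
# Classifies raw file content by its leading "magic" bytes.
TIFF_LITTLE_ENDIAN = "II*\x00".b # 0x49 0x49 0x2A 0x00
TIFF_BIG_ENDIAN    = "MM\x00*".b # 0x4D 0x4D 0x00 0x2A
PDF_MAGIC          = "%PDF-".b

def sniff_document_type(bytes)
  # Work on a binary copy so multi-byte encodings cannot confuse the comparison.
  header = bytes.to_s.b[0, 5]
  return :pdf  if header.start_with?(PDF_MAGIC)
  return :tiff if header.start_with?(TIFF_LITTLE_ENDIAN, TIFF_BIG_ENDIAN)

  :unknown
end
```

The file shown above is little-endian, so it would start with the "II*\0" signature. A check like this only identifies the container; it says nothing about whether a particular compression (such as Group 4) will convert cleanly.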


pkarman · 2020-05-05T22:33:14Z

The Caseflow efolder app has a TIFF-to-PDF converter already. Maybe we just need to expose that as an API?

https://github.com/department-of-veterans-affairs/caseflow-efolder/blob/master/app/services/image_converter_service.rb

An earlier thought from @enriquemanuel and me was to make that into a lambda so it would not be married to either app.

nanotone · 2020-05-06T12:02:05Z

That's good news, although it does raise the question of why Reader was getting TIFFs when convert_tiff_images is enabled for efolder prod. I'm not as familiar with that codebase, but it looks like the conversion should already be done during Document#fetch_content!.

Perhaps this ticket should be reincarnated as a straight-up bug report in the efolder repo?

pkarman · 2020-05-06T13:04:16Z

I believe that's because Reader does not read from eFolder Express. It reads directly from VBMS efolder.

nanotone · 2020-05-06T13:26:32Z

For the document that I was investigating in the Bat Team thread above, Reader was hitting eFolder Express's /api/v2/records/ID for the file itself. I was able to look at the TIFF headers by SSMing into eFolder prod and retracing some of the controller code for that record.

pkarman · 2020-05-14T19:44:09Z

While researching a related efolder INC, we discovered an example of an imagemagick error:

[ActiveJob] [V2::SaveFilesInS3Job] [3f4fda3c-1baa-4b96-a026-cb7bbb902689] [2020-05-14 15:27:06 -0400] STARTED ImageConverterService for {57B5D24F-0D8C-4B20-871A-B3452617CC93}.tiff
[ActiveJob] [V2::SaveFilesInS3Job] [3f4fda3c-1baa-4b96-a026-cb7bbb902689] [2020-05-14 15:27:06 -0400] STARTED Image Magick: Convert tiff to pdf
[ActiveJob] [V2::SaveFilesInS3Job] [3f4fda3c-1baa-4b96-a026-cb7bbb902689] [2020-05-14 15:27:07 -0400] RESCUED Image Magick: Convert tiff to pdf
[ActiveJob] [V2::SaveFilesInS3Job] [3f4fda3c-1baa-4b96-a026-cb7bbb902689] [2020-05-14 15:27:07 -0400] RESCUED ImageConverterService for {57B5D24F-0D8C-4B20-871A-B3452617CC93}.tiff
[ActiveJob] [V2::SaveFilesInS3Job] [3f4fda3c-1baa-4b96-a026-cb7bbb902689] [2020-05-14 15:27:07 -0400] Error performing V2::SaveFilesInS3Job (Job ID: 3f4fda3c-1baa-4b96-a026-cb7bbb902689) from Shoryuken(efolder_prod_low_priority) in 5243.6ms: HTTPClient::KeepAliveDisconnected (HTTPClient::KeepAliveDisconnected: Broken pipe):
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient/session.rb:524:in `rescue in query'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient/session.rb:514:in `query'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient/session.rb:177:in `query'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1242:in `do_get_block'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1019:in `block in do_request'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1138:in `rescue in protect_keep_alive_disconnected'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1131:in `protect_keep_alive_disconnected'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1014:in `do_request'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:856:in `request'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:765:in `post'
/opt/efolder-express/src/app/services/image_converter_service.rb:51:in `block (2 levels) in convert_tiff_to_pdf'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/tempfile.rb:295:in `open'
/opt/efolder-express/src/app/services/image_converter_service.rb:45:in `block in convert_tiff_to_pdf'
/opt/efolder-express/src/app/services/metrics_service.rb:13:in `block in record'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/benchmark.rb:293:in `measure'
/opt/efolder-express/src/app/services/metrics_service.rb:12:in `record'
/opt/efolder-express/src/app/services/image_converter_service.rb:42:in `convert_tiff_to_pdf'
/opt/efolder-express/src/app/services/image_converter_service.rb:66:in `convert'
/opt/efolder-express/src/app/services/image_converter_service.rb:12:in `process'
/opt/efolder-express/src/app/services/record_fetcher.rb:34:in `block in content_from_vbms'
/opt/efolder-express/src/app/services/metrics_service.rb:13:in `block in record'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/benchmark.rb:293:in `measure'
/opt/efolder-express/src/app/services/metrics_service.rb:12:in `record'
/opt/efolder-express/src/app/services/record_fetcher.rb:31:in `content_from_vbms'
/opt/efolder-express/src/app/services/record_fetcher.rb:15:in `process'
/opt/efolder-express/src/app/models/record.rb:36:in `fetch!'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activerecord-5.2.4.2/lib/active_record/relation/delegation.rb:71:in `each'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activerecord-5.2.4.2/lib/active_record/relation/delegation.rb:71:in `each'
/opt/efolder-express/src/app/jobs/v2/save_files_in_s3_job.rb:7:in `perform'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/execution.rb:39:in `block in perform_now'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:109:in `block in run_callbacks'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/sentry-raven-2.7.2/lib/raven/integrations/rails/active_job.rb:18:in `capture_and_reraise_with_sentry'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/sentry-raven-2.7.2/lib/raven/integrations/rails/active_job.rb:12:in `block (2 levels) in included'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:118:in `instance_exec'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/i18n-1.8.2/lib/i18n.rb:308:in `with_locale'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/translation.rb:9:in `block (2 levels) in <module:Translation>'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:118:in `instance_exec'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/logging.rb:26:in `block (4 levels) in <module:Logging>'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/notifications.rb:168:in `block in instrument'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/notifications/instrumenter.rb:23:in `instrument'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/notifications.rb:168:in `instrument'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/logging.rb:25:in `block (3 levels) in <module:Logging>'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/logging.rb:46:in `block in tag_logger'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/tagged_logging.rb:71:in `block in tagged'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/tagged_logging.rb:28:in `tagged'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/tagged_logging.rb:71:in `tagged'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/logging.rb:46:in `tag_logger'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/logging.rb:22:in `block (2 levels) in <module:Logging>'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:118:in `instance_exec'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:136:in `run_callbacks'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/execution.rb:38:in `perform_now'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/execution.rb:18:in `perform_now'
(irb):17:in `irb_binding'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/workspace.rb:85:in `eval'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/workspace.rb:85:in `evaluate'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/context.rb:380:in `evaluate'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:491:in `block (2 levels) in eval_input'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:623:in `signal_status'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:488:in `block in eval_input'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/ruby-lex.rb:246:in `block (2 levels) in each_top_level_statement'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/ruby-lex.rb:232:in `loop'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/ruby-lex.rb:232:in `block in each_top_level_statement'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/ruby-lex.rb:231:in `catch'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/ruby-lex.rb:231:in `each_top_level_statement'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:487:in `eval_input'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:428:in `block in run'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:427:in `catch'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:427:in `run'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:383:in `start'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/railties-5.2.4.2/lib/rails/commands/console/console_command.rb:64:in `start'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/railties-5.2.4.2/lib/rails/commands/console/console_command.rb:19:in `start'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/railties-5.2.4.2/lib/rails/commands/console/console_command.rb:96:in `perform'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/thor-0.20.3/lib/thor/command.rb:27:in `run'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/thor-0.20.3/lib/thor/invocation.rb:126:in `invoke_command'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/thor-0.20.3/lib/thor.rb:387:in `dispatch'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/railties-5.2.4.2/lib/rails/command/base.rb:69:in `perform'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/railties-5.2.4.2/lib/rails/command.rb:46:in `invoke'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/railties-5.2.4.2/lib/rails/commands.rb:18:in `<top (required)>'
bin/rails:4:in `require'
bin/rails:4:in `<main>'
Traceback (most recent call last):
       15: from (irb):17
       14: from app/jobs/v2/save_files_in_s3_job.rb:7:in `perform'
       13: from app/models/record.rb:36:in `fetch!'
       12: from app/services/record_fetcher.rb:15:in `process'
       11: from app/services/record_fetcher.rb:31:in `content_from_vbms'
       10: from app/services/metrics_service.rb:12:in `record'
        9: from app/services/metrics_service.rb:13:in `block in record'
        8: from app/services/record_fetcher.rb:34:in `block in content_from_vbms'
        7: from app/services/image_converter_service.rb:12:in `process'
        6: from app/services/image_converter_service.rb:66:in `convert'
        5: from app/services/image_converter_service.rb:42:in `convert_tiff_to_pdf'
        4: from app/services/metrics_service.rb:12:in `record'
        3: from app/services/metrics_service.rb:13:in `block in record'
        2: from app/services/image_converter_service.rb:45:in `block in convert_tiff_to_pdf'
        1: from app/services/image_converter_service.rb:51:in `block (2 levels) in convert_tiff_to_pdf'
HTTPClient::KeepAliveDisconnected (HTTPClient::KeepAliveDisconnected: Broken pipe)

We are wondering whether the tiff-to-pdf service is broken. Investigation in https://dsva.slack.com/archives/CAM9FJ85P/p1589485314027300

pkarman · 2020-05-22T16:09:01Z

Update: yes, the TIFF service in EE (eFolder Express) was broken.

Still not the case that Caseflow Reader uses it, afaik.

yoomlam · 2020-05-22T16:16:25Z

Support ticket came up about this issue again: https://dsva.slack.com/archives/CHX8FMP28/p1590085212373400

yoomlam · 2020-05-22T17:06:11Z

I have a sequence of actions that enables Reader to display TIFF as PDF (almost all the time) based on this code:

# Refresh Reader at https://appeals.cf.ds.va.gov/reader/appeal/4025589/documents/13466417
# Get "Unable to load document" error

# In Certification console
doc=Document.find(13466417)
vbms_doc_id=doc.vbms_document_id
RequestStore.store[:application]="reader"
doc.content_url
=> "https://efolder.cf.ds.va.gov/api/v2/records/7F45E2D6-6060-46F3-AFAA-041D666694AF"
# Go to that doc.content_url in the browser, and it downloads the file as a TIFF
# doc.content_url is used by Reader's PDF.js

# In eFolder Express console
vbms_doc_id="{7F45E2D6-6060-46F3-AFAA-041D666694AF}"
record=Record.find_by(version_id: vbms_doc_id)
# check if conversion worked in the past
record.conversion_status
content=record.service.v2_fetch_document_file(record)
content=ImageConverterService.new(image: content, record: record).process
# If "conversion_success", then store file in S3 for Reader to retrieve.
S3Service.store_file(record.s3_filename, content) if record.conversion_status=="conversion_success"

# I don't know why this is necessary: 
#     Refresh the browser at doc.content_url; browser downloads file as a PDF
# Refresh Reader and it shows the pdf

Note that the download button (near the top-right corner) within Reader may still download the file as TIFF.

To do a mass conversion, we may want to query for records with conversion_status "not_converted" and mime_type "image/tiff". Something like:

# Count all records for this manifest source, then only the unconverted TIFFs
record.manifest_source.records.count
retry_records = record.manifest_source.records.where(mime_type: "image/tiff", conversion_status: "not_converted")
retry_records.count
# Re-fetch and convert each one; only successful conversions are stored in S3
retry_records.each do |tiff_record|
  content = tiff_record.service.v2_fetch_document_file(tiff_record)
  content = ImageConverterService.new(image: content, record: tiff_record).process
  S3Service.store_file(tiff_record.s3_filename, content) if tiff_record.conversion_status == "conversion_success"
end
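
One caveat with a loop like this is that a single exception during fetch or conversion aborts the whole run. A rough sketch of a more defensive variant is below. It is not code from eFolder Express: reconvert_records is a hypothetical name, and fetch, convert, and store are injected callables standing in for record.service.v2_fetch_document_file, ImageConverterService#process, and S3Service.store_file so the control flow can be shown on its own.

```ruby
# Sketch only. The three callables are hypothetical stand-ins for the
# eFolder Express calls used in the console snippet above.
def reconvert_records(records, fetch:, convert:, store:)
  results = { converted: [], skipped: [], failed: [] }

  records.each do |record|
    begin
      content = convert.call(fetch.call(record), record)

      # Only store the output when the converter marked the record as converted.
      if record.conversion_status == "conversion_success"
        store.call(record.s3_filename, content)
        results[:converted] << record
      else
        results[:skipped] << record
      end
    rescue StandardError => error
      # Record the failure and keep going with the remaining records.
      results[:failed] << [record, error.message]
    end
  end

  results
end
```

In a console session the callables could be small lambdas wrapping the real calls, and the returned hash gives a quick summary of which records still need attention.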

pkarman · 2020-05-22T18:36:36Z

@yoomlam that's good stuff. I would suggest working it into this https://github.com/department-of-veterans-affairs/appeals-deployment/issues/2718

yoomlam · 2020-06-03T00:21:01Z

Some more info as I'm digging into a related ticket #14298.
In Reader's Document View page, DocumentController#pdf is called for the current, next, and previous documents. (Note this is not the same Reader::DocumentsController used for Reader's Document List page.)

DocumentController#pdf will serve up the pdf file from directory /tmp/pdfs/ and add a Rails log "File #{vbms_document_id} fetched from VBMS" if the file came from VBMS. The pdf could come from 3 places, as the code comment explains:

Currently three levels of caching. Try to serve content
from memory, then look to S3 if it's not in memory, and
if it's not in S3 grab it from VBMS
Log where we get the file from for now for easy verification
of S3 integration.

So if the document is not in S3 and comes from VVA, then Reader won't be able to show it.
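
The three-level lookup described in that code comment can be sketched roughly as follows. This is an illustration, not Caseflow's implementation: TieredDocumentFetcher is a hypothetical class, plain hashes stand in for the memory and S3 layers, and vbms is any callable that returns file content. Writing the VBMS result back to S3 is an assumption on my part; the comment above only describes the read order.

```ruby
# Hypothetical sketch of a memory -> S3 -> VBMS lookup.
class TieredDocumentFetcher
  def initialize(memory:, s3:, vbms:, logger: ->(message) { puts message })
    @memory = memory # Hash standing in for the in-memory cache
    @s3 = s3         # Hash standing in for the S3 bucket
    @vbms = vbms     # callable: document id -> content
    @logger = logger
  end

  def fetch(document_id)
    return @memory[document_id] if @memory.key?(document_id)

    content = @s3[document_id]
    if content.nil?
      content = @vbms.call(document_id)
      @logger.call("File #{document_id} fetched from VBMS")
      # Assumption: cache the VBMS result in S3 for later requests.
      @s3[document_id] = content unless content.nil?
    end

    @memory[document_id] = content
  end
end
```

Note that nothing in this chain inspects or converts the content, which is consistent with a TIFF passing straight through to PDF.js.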

A RetrieveDocumentsForReaderJob caches documents in S3:

According to serverless.yml, this job runs every 5 minutes for active Reader users.
For up to 5 users whose last_login_at >= 1.week.ago and whose efolder_documents_fetched_at is nil or <= 24.hours.ago (that is, never fetched or last fetched at least 24 hours ago), it gets the Legacy and AMA appeals the users are assigned to, then calls appeal.document_fetcher.find_or_create_documents!, the same as on Reader's Document List page.
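
To make the selection rule concrete, here is a plain-Ruby sketch of the same filter. It is not the job's actual query: ReaderUser is a hypothetical Struct standing in for the User model, users_due_for_document_fetch is a made-up name, and the real job may order or batch users differently.

```ruby
# Hypothetical stand-in for the User model; only the two timestamps matter here.
ReaderUser = Struct.new(:id, :last_login_at, :efolder_documents_fetched_at)

ONE_DAY  = 24 * 60 * 60 # seconds
ONE_WEEK = 7 * ONE_DAY

def users_due_for_document_fetch(users, now: Time.now, limit: 5)
  due = users.select do |user|
    # Active: logged in within the last week.
    logged_in_recently = !user.last_login_at.nil? && user.last_login_at >= now - ONE_WEEK
    # Due: never fetched, or last fetched at least 24 hours ago.
    fetched_at = user.efolder_documents_fetched_at
    never_or_stale = fetched_at.nil? || fetched_at <= now - ONE_DAY
    logged_in_recently && never_or_stale
  end

  due.first(limit)
end
```

With a limit of 5 users per run, a user whose documents were fetched within the last 24 hours is skipped until that window passes.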

When developing a solution, we should also consider that these S3 documents are auto-deleted after 5 days (per a Slack convo).

nanotone added the labels Product: caseflow-reader, Product: caseflow-eFolder-Express, Stakeholder: BVA, and Source: Bat Team on May 5, 2020

nanotone changed the title ~~Add support for TIFF documents in Reader/eFolder~~ Fix support for TIFF documents in Reader/eFolder May 6, 2020

yoomlam mentioned this issue Jun 3, 2020

Investigate what occurs when Reader is refreshed #14298

Closed

2 tasks

hschallhorn mentioned this issue Jun 11, 2020

Write tech spec to solve mismatch in reader/vbms documents #14518

Closed

8 tasks

ThorntonMatthew closed this as completed Apr 25, 2023
