Fix support for TIFF documents in Reader/eFolder #14193

Closed
nanotone opened this issue May 5, 2020 · 10 comments
Labels
Product: caseflow-eFolder-Express Product: caseflow-reader Source: Bat Team Stakeholder: BVA Functionality associated with the Board of Veterans' Appeals workflows/feature requests

Comments

@nanotone
Contributor

nanotone commented May 5, 2020

Description

TIFF documents are sometimes included in the Reader documents for a case, and cannot currently be displayed by Reader's PDF viewer, PDF.js. See technical notes for some thoughts on implementation.

Background/context/resources

Bat Team thread

Related ticket on "corrupted" files in eFolder: #10504

Technical notes

There is some internet literature on getting PDF.js to display TIFF images embedded within the PDF, but displaying plain TIFF images (mimetype image/tiff) is out of scope for PDF.js and will likely never be implemented.

Rather than detecting the file type in Reader and using yet another frontend library to display TIFFs, a more straightforward approach may be to implement a TIFF-to-PDF converter in eFolder.

Because TIFF-to-PDF conversion is likely to be fraught and full of exotic edge cases, here is the output of /usr/bin/file on the header of one such problematic TIFF file seen in Reader:
TIFF image data, little-endian, direntries=19, height=3367, bps=1, compression=bi-level group 4, PhotometricIntepretation=WhiteIsZero, orientation=upper-left, width=2541
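
For triage, the file type can also be checked from the leading bytes without shelling out to /usr/bin/file. The following is a minimal sketch in plain Ruby that assumes nothing about the eFolder codebase; the method name sniff_type and the constants are hypothetical. It relies only on the standard file signatures: a TIFF starts with II*\0 (little-endian) or MM\0* (big-endian), and a PDF starts with %PDF-.

```ruby
# Hypothetical helper, not part of Caseflow or eFolder Express.
TIFF_LE   = "II*\x00".b # little-endian TIFF signature: 49 49 2A 00
TIFF_BE   = "MM\x00*".b # big-endian TIFF signature:    4D 4D 00 2A
PDF_MAGIC = "%PDF-".b   # PDF header prefix

# Classify raw file content by its first few bytes.
def sniff_type(bytes)
  head = bytes.to_s.b[0, 5].to_s
  return :pdf if head.start_with?(PDF_MAGIC)
  return :tiff if head.start_with?(TIFF_LE) || head.start_with?(TIFF_BE)

  :unknown
end
```

A check like this only identifies the container; it says nothing about whether a particular compression (such as the Group 4 encoding in the header above) will convert cleanly.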

@nanotone nanotone added Product: caseflow-reader Product: caseflow-eFolder-Express Stakeholder: BVA Functionality associated with the Board of Veterans' Appeals workflows/feature requests Source: Bat Team labels May 5, 2020
@pkarman
Contributor

pkarman commented May 5, 2020

The Caseflow efolder app has a TIFF-to-PDF converter already. Maybe we just need to expose that as an API?

https://github.com/department-of-veterans-affairs/caseflow-efolder/blob/master/app/services/image_converter_service.rb

An earlier thought from @enriquemanuel and me was to make that into a lambda so that it is not married to either app.

@nanotone
Contributor Author

nanotone commented May 6, 2020

That's good news, although it does raise the question of why Reader was getting TIFFs when convert_tiff_images is enabled for efolder prod. I'm not as familiar with that codebase, but it looks like the conversion should already be done during Document#fetch_content!.

Perhaps this ticket should be reincarnated as a straight-up bug report in the efolder repo?

@nanotone nanotone changed the title Add support for TIFF documents in Reader/eFolder Fix support for TIFF documents in Reader/eFolder May 6, 2020
@pkarman
Contributor

pkarman commented May 6, 2020

I believe that's because Reader does not read from eFolder Express. It reads directly from VBMS efolder.

@nanotone
Contributor Author

nanotone commented May 6, 2020

For the document that I was investigating in the Bat Team thread above, Reader was hitting eFolder Express's /api/v2/records/ID for the file itself. I was able to look at the TIFF headers by SSMing into eFolder prod and retracing some of the controller code for that record.

@pkarman
Contributor

pkarman commented May 14, 2020

While researching a related efolder INC, we discovered an example ImageMagick error:

[ActiveJob] [V2::SaveFilesInS3Job] [3f4fda3c-1baa-4b96-a026-cb7bbb902689] [2020-05-14 15:27:06 -0400] STARTED ImageConverterService for {57B5D24F-0D8C-4B20-871A-B3452617CC93}.tiff
[ActiveJob] [V2::SaveFilesInS3Job] [3f4fda3c-1baa-4b96-a026-cb7bbb902689] [2020-05-14 15:27:06 -0400] STARTED Image Magick: Convert tiff to pdf
[ActiveJob] [V2::SaveFilesInS3Job] [3f4fda3c-1baa-4b96-a026-cb7bbb902689] [2020-05-14 15:27:07 -0400] RESCUED Image Magick: Convert tiff to pdf
[ActiveJob] [V2::SaveFilesInS3Job] [3f4fda3c-1baa-4b96-a026-cb7bbb902689] [2020-05-14 15:27:07 -0400] RESCUED ImageConverterService for {57B5D24F-0D8C-4B20-871A-B3452617CC93}.tiff
[ActiveJob] [V2::SaveFilesInS3Job] [3f4fda3c-1baa-4b96-a026-cb7bbb902689] [2020-05-14 15:27:07 -0400] Error performing V2::SaveFilesInS3Job (Job ID: 3f4fda3c-1baa-4b96-a026-cb7bbb902689) from Shoryuken(efolder_prod_low_priority) in 5243.6ms: HTTPClient::KeepAliveDisconnected (HTTPClient::KeepAliveDisconnected: Broken pipe):
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient/session.rb:524:in `rescue in query'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient/session.rb:514:in `query'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient/session.rb:177:in `query'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1242:in `do_get_block'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1019:in `block in do_request'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1138:in `rescue in protect_keep_alive_disconnected'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1131:in `protect_keep_alive_disconnected'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1014:in `do_request'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:856:in `request'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:765:in `post'
/opt/efolder-express/src/app/services/image_converter_service.rb:51:in `block (2 levels) in convert_tiff_to_pdf'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/tempfile.rb:295:in `open'
/opt/efolder-express/src/app/services/image_converter_service.rb:45:in `block in convert_tiff_to_pdf'
/opt/efolder-express/src/app/services/metrics_service.rb:13:in `block in record'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/benchmark.rb:293:in `measure'
/opt/efolder-express/src/app/services/metrics_service.rb:12:in `record'
/opt/efolder-express/src/app/services/image_converter_service.rb:42:in `convert_tiff_to_pdf'
/opt/efolder-express/src/app/services/image_converter_service.rb:66:in `convert'
/opt/efolder-express/src/app/services/image_converter_service.rb:12:in `process'
/opt/efolder-express/src/app/services/record_fetcher.rb:34:in `block in content_from_vbms'
/opt/efolder-express/src/app/services/metrics_service.rb:13:in `block in record'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/benchmark.rb:293:in `measure'
/opt/efolder-express/src/app/services/metrics_service.rb:12:in `record'
/opt/efolder-express/src/app/services/record_fetcher.rb:31:in `content_from_vbms'
/opt/efolder-express/src/app/services/record_fetcher.rb:15:in `process'
/opt/efolder-express/src/app/models/record.rb:36:in `fetch!'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activerecord-5.2.4.2/lib/active_record/relation/delegation.rb:71:in `each'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activerecord-5.2.4.2/lib/active_record/relation/delegation.rb:71:in `each'
/opt/efolder-express/src/app/jobs/v2/save_files_in_s3_job.rb:7:in `perform'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/execution.rb:39:in `block in perform_now'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:109:in `block in run_callbacks'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/sentry-raven-2.7.2/lib/raven/integrations/rails/active_job.rb:18:in `capture_and_reraise_with_sentry'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/sentry-raven-2.7.2/lib/raven/integrations/rails/active_job.rb:12:in `block (2 levels) in included'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:118:in `instance_exec'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/i18n-1.8.2/lib/i18n.rb:308:in `with_locale'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/translation.rb:9:in `block (2 levels) in <module:Translation>'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:118:in `instance_exec'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/logging.rb:26:in `block (4 levels) in <module:Logging>'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/notifications.rb:168:in `block in instrument'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/notifications/instrumenter.rb:23:in `instrument'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/notifications.rb:168:in `instrument'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/logging.rb:25:in `block (3 levels) in <module:Logging>'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/logging.rb:46:in `block in tag_logger'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/tagged_logging.rb:71:in `block in tagged'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/tagged_logging.rb:28:in `tagged'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/tagged_logging.rb:71:in `tagged'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/logging.rb:46:in `tag_logger'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/logging.rb:22:in `block (2 levels) in <module:Logging>'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:118:in `instance_exec'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:136:in `run_callbacks'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/execution.rb:38:in `perform_now'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.4.2/lib/active_job/execution.rb:18:in `perform_now'
(irb):17:in `irb_binding'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/workspace.rb:85:in `eval'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/workspace.rb:85:in `evaluate'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/context.rb:380:in `evaluate'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:491:in `block (2 levels) in eval_input'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:623:in `signal_status'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:488:in `block in eval_input'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/ruby-lex.rb:246:in `block (2 levels) in each_top_level_statement'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/ruby-lex.rb:232:in `loop'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/ruby-lex.rb:232:in `block in each_top_level_statement'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/ruby-lex.rb:231:in `catch'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb/ruby-lex.rb:231:in `each_top_level_statement'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:487:in `eval_input'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:428:in `block in run'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:427:in `catch'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:427:in `run'
/opt/rbenv/versions/2.5.3/lib/ruby/2.5.0/irb.rb:383:in `start'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/railties-5.2.4.2/lib/rails/commands/console/console_command.rb:64:in `start'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/railties-5.2.4.2/lib/rails/commands/console/console_command.rb:19:in `start'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/railties-5.2.4.2/lib/rails/commands/console/console_command.rb:96:in `perform'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/thor-0.20.3/lib/thor/command.rb:27:in `run'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/thor-0.20.3/lib/thor/invocation.rb:126:in `invoke_command'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/thor-0.20.3/lib/thor.rb:387:in `dispatch'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/railties-5.2.4.2/lib/rails/command/base.rb:69:in `perform'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/railties-5.2.4.2/lib/rails/command.rb:46:in `invoke'
/opt/efolder-express/src/vendor/bundle/ruby/2.5.0/gems/railties-5.2.4.2/lib/rails/commands.rb:18:in `<top (required)>'
bin/rails:4:in `require'
bin/rails:4:in `<main>'
Traceback (most recent call last):
       15: from (irb):17
       14: from app/jobs/v2/save_files_in_s3_job.rb:7:in `perform'
       13: from app/models/record.rb:36:in `fetch!'
       12: from app/services/record_fetcher.rb:15:in `process'
       11: from app/services/record_fetcher.rb:31:in `content_from_vbms'
       10: from app/services/metrics_service.rb:12:in `record'
        9: from app/services/metrics_service.rb:13:in `block in record'
        8: from app/services/record_fetcher.rb:34:in `block in content_from_vbms'
        7: from app/services/image_converter_service.rb:12:in `process'
        6: from app/services/image_converter_service.rb:66:in `convert'
        5: from app/services/image_converter_service.rb:42:in `convert_tiff_to_pdf'
        4: from app/services/metrics_service.rb:12:in `record'
        3: from app/services/metrics_service.rb:13:in `block in record'
        2: from app/services/image_converter_service.rb:45:in `block in convert_tiff_to_pdf'
        1: from app/services/image_converter_service.rb:51:in `block (2 levels) in convert_tiff_to_pdf'
HTTPClient::KeepAliveDisconnected (HTTPClient::KeepAliveDisconnected: Broken pipe)

We are wondering whether the tiff-to-pdf service is broken. Investigation in https://dsva.slack.com/archives/CAM9FJ85P/p1589485314027300

@pkarman
Contributor

pkarman commented May 22, 2020

Update: yes, the TIFF service in EE was broken.

It is still not the case that Caseflow Reader uses it, afaik.

@yoomlam
Contributor

yoomlam commented May 22, 2020

Support ticket came up about this issue again: https://dsva.slack.com/archives/CHX8FMP28/p1590085212373400

@yoomlam
Contributor

yoomlam commented May 22, 2020

I have a sequence of actions that enables Reader to display a TIFF as a PDF (almost all the time), based on this code:

# Refresh Reader at https://appeals.cf.ds.va.gov/reader/appeal/4025589/documents/13466417
# Get "Unable to load document" error

# In Certification console
doc=Document.find(13466417)
vbms_doc_id=doc.vbms_document_id
RequestStore.store[:application]="reader"
doc.content_url
=> "https://efolder.cf.ds.va.gov/api/v2/records/7F45E2D6-6060-46F3-AFAA-041D666694AF"
# Go to that doc.content_url in the browser, and it downloads the file as a TIFF
# doc.content_url is used by Reader's PDF.js

# In eFolder Express console
vbms_doc_id="{7F45E2D6-6060-46F3-AFAA-041D666694AF}"
record=Record.find_by(version_id: vbms_doc_id)
# check if conversion worked in the past
record.conversion_status
content=record.service.v2_fetch_document_file(record)
content=ImageConverterService.new(image: content, record: record).process
# If "conversion_success", then store file in S3 for Reader to retrieve.
S3Service.store_file(record.s3_filename, content) if record.conversion_status=="conversion_success"

# I don't know why this is necessary: 
#     Refresh the browser at doc.content_url; browser downloads file as a PDF
# Refresh Reader and it shows the pdf

Note that the download button (near the top-right corner) within Reader may still download the file as a TIFF.

To do a mass conversion, you may want to query for records with conversion_status "not_converted" and mime_type "image/tiff". Something like:

record.manifest_source.records.count
record.manifest_source.records.where(mime_type: "image/tiff", conversion_status: "not_converted").count
retry_records=record.manifest_source.records.where(mime_type: "image/tiff", conversion_status: "not_converted")
# Use each rather than map: the loop is run for its side effects only.
retry_records.each{|rec|
  content=rec.service.v2_fetch_document_file(rec)
  content=ImageConverterService.new(image: content, record: rec).process
  S3Service.store_file(rec.s3_filename, content) if rec.conversion_status=="conversion_success"
}
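
One caveat with the loop above: a single failed conversion (such as the HTTPClient::KeepAliveDisconnected seen earlier in this thread) would abort the whole batch. A rough sketch of a per-record rescue follows. It takes the fetch/convert/store steps as a block, so it assumes nothing about the real eFolder Express classes; retry_conversions and the shape of the result hash are hypothetical.

```ruby
# Hypothetical sketch: run a block for each record and collect failures
# instead of letting one exception stop the batch.
def retry_conversions(records)
  results = { completed: [], failed: {} }
  records.each do |rec|
    begin
      yield rec
      results[:completed] << rec
    rescue StandardError => e
      # Keep the error class and message so failures can be reviewed later.
      results[:failed][rec] = "#{e.class}: #{e.message}"
    end
  end
  results
end
```

In the eFolder Express console, the block body would be the three lines from the loop above (fetch, convert, store). Note that "completed" here only means the block did not raise; conversion_status would still need to be checked per record.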

@pkarman
Contributor

pkarman commented May 22, 2020

@yoomlam that's good stuff. I would suggest working it into this https://github.com/department-of-veterans-affairs/appeals-deployment/issues/2718

@yoomlam
Contributor

yoomlam commented Jun 3, 2020

Some more info as I'm digging into a related ticket #14298.
In Reader's Document View page, DocumentController#pdf is called for the current, next, and previous documents. (Note that this is not the same as the Reader::DocumentsController used for Reader's Document List page.)

DocumentController#pdf will

So if the document is not in S3 and comes from VVA, then Reader won't be able to show it.

A RetrieveDocumentsForReaderJob caches documents in S3

When developing a solution, we should also consider that these S3 documents are auto-deleted after 5 days -- Slack convo.


4 participants