Errno::EPIPE on certain png images #7

Closed
rywall opened this issue Aug 1, 2019 · 5 comments · Fixed by #9

Comments

rywall commented Aug 1, 2019

First off, great work on this gem. It works amazingly well 99.9% of the time.

I've encountered certain PNG images (one uploaded as test2.png) that produce an Errno::EPIPE error when calling .text. I would expect either an empty string (as with test1.png) or a more intelligible error message from Henkei.

test1.png

> Henkei.new("test1.png").text
=> ""

test2.png

> Henkei.new("test2.png").text
Errno::EPIPE (Broken pipe)


Owner

abrom commented Aug 1, 2019

Thanks, although I've just been updating Tika as new versions are released. The gem is mostly the work of https://github.com/yomurb/yomu (which has been inactive for some time).

Hmm, so it seems like Tika is closing the pipe before Henkei has finished writing the image file.
Piping the file on the console works fine, so the problem is in either the reading or the writing of the file within Henkei, although I couldn't say which just yet.

Can Tika even extract text out of a PNG? Or is this just a repeatable use-case you've found?

One option would be to capture the pipe exception and handle it more gracefully. Return nil? Raise some other exception? Hmm... not a fan.

This is certainly not the first time this issue has come up; see yomurb#7 (unresolved).

I'll have a look to see if there is a better way to pipe the data into Tika, but I'm open to suggestions.

Author

rywall commented Aug 1, 2019

I'm actually not sure if Tika can extract text from a png or not. My app just tries to extract text from any uploaded file.

I agree that ideally the root of the problem would be fixed, but even just being able to rescue a Henkei::TikaError or something like that instead of Errno::EPIPE would be an improvement IMO.

Thoughts?
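
A minimal sketch of the wrapping idea discussed above, assuming a hypothetical `HenkeiSketch::TikaError` class (the thread only floats the name `Henkei::TikaError`; nothing here is Henkei's actual API). The broken pipe is simulated with `IO.pipe` rather than a real Tika process:

```ruby
# Hypothetical namespace and error class, for illustration only.
module HenkeiSketch
  class TikaError < StandardError; end

  # Runs the block and re-raises a broken pipe as a TikaError.
  def self.wrap_pipe_errors
    yield
  rescue Errno::EPIPE => e
    raise TikaError, "Tika closed its input early: #{e.message}"
  end
end

# Simulates the failure without Tika: writing to a pipe whose read end
# is already closed raises Errno::EPIPE.
def demo_broken_pipe
  reader, writer = IO.pipe
  reader.close
  HenkeiSketch.wrap_pipe_errors { writer.syswrite("image bytes") }
ensure
  writer.close if writer && !writer.closed?
end
```

A caller could then `rescue HenkeiSketch::TikaError` and fall back to an empty string. The original `Errno::EPIPE` remains reachable through `Exception#cause`.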

Owner

abrom commented Aug 2, 2019

Interestingly, using the following in the client_read method returns just an empty string (as expected):

    sh = Shell.new
    # Echo the file data into the Tika command and collect its output as a string
    (sh.echo(data) | sh.system(tika_command(type))).to_s

I still need to do some more research into the differences between writing to a Ruby IO and using Ruby Shell's echo, though.

On the limited number of files I've tested it with, I get the expected results.
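
For comparison, here is a sketch of feeding data to a child process with `Open3` from the standard library while tolerating a child that stops reading early. `pipe_through` is a hypothetical helper, not Henkei's `client_read`, and the child is a throwaway Ruby one-liner standing in for the Tika command:

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: writes `data` to the command's stdin and returns
# whatever the command printed on stdout.
def pipe_through(command, data)
  Open3.popen2(*command) do |stdin, stdout, _wait_thread|
    stdin.binmode
    stdout.binmode
    # Read in a separate thread so a chatty child cannot block the write.
    reader = Thread.new { stdout.read }
    begin
      stdin.write(data)
    rescue Errno::EPIPE
      # The child stopped reading early; keep whatever output it produced.
    ensure
      stdin.close
    end
    reader.value
  end
end

# Stand-in child that exits without ever reading its stdin.
ignores_input = [RbConfig.ruby, "-e", "exit"]
result = pipe_through(ignores_input, "x" * 1_000_000)
```

Whether swallowing the broken pipe like this is preferable to raising a dedicated error is a design choice: it matches the "empty string" expectation from the original report, but it also hides the fact that the child never consumed the input.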

Owner

abrom commented Sep 19, 2019

@rywall did you get a chance to try my suggestion?

Author

rywall commented Sep 28, 2019

@abrom I've been using your suggestion in production for the past couple of weeks and it seems to be working great. 😄
