Error when running Windows #44

Closed
dongnizh opened this issue Jun 2, 2015 · 15 comments

Comments

@dongnizh
Contributor

dongnizh commented Jun 2, 2015

@chrismattmann When I parse a file with tika-python on Windows (actually a Windows virtual machine), I get the following error:
[screenshot: a6781832-2fb4-495a-a926-c2dea2467c1d]
However, running a command like "python tika.py config mime-types" works.
This is one link I found so far on this problem: https://github.com/kennethreitz/requests/issues/2364

Please have a look.

@chrismattmann
Owner

thanks for filing this @dongnizh ! how are you doing in Seattle?

@dongnizh
Contributor Author

dongnizh commented Jun 2, 2015

Hi @chrismattmann, hope you are doing well. Today is my first day at work. It is kind of different from what I did at school, and I still need much more time to adapt to the new environment. ^_^

@chrismattmann
Owner

hang in there and keep me posted @dongnizh

@dongnizh
Contributor Author

dongnizh commented Jun 3, 2015

@chrismattmann Of course!!

@chrismattmann
Owner

@dongnizh are we still seeing this error?

@chrismattmann
Owner

More info on this: someone reported it against Python's httplib at https://bugs.python.org/issue23054 and it is still open as of December 2014.

@chrismattmann
Owner

Overall, this issue has to do with PUT requests being made on Windows. Since Tika Server uses PUT requests nearly everywhere, this causes the issue, but only on Windows.

@chrismattmann
Owner

See my comment on: http://bugs.python.org/issue23054

@chrismattmann
Owner

@dongnizh
Contributor Author

Hi @chrismattmann, I will look into this.

@chrismattmann
Owner

Thanks @dongnizh !

@chrismattmann
Owner

Hey @dongnizh, I think this error has to do with the fact that you have a bogus tika-server.jar in your temp folder. For whatever reason, Windows doesn't seem to clear the C:\Users\appdata\Local\Temp directory when you restart, so if tika-server was downloaded incorrectly, the bad jar will remain there. If you delete that jar file, it will be redownloaded. However, see the two related problems in #54 that are affecting Windows use.
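
A minimal sketch of the cleanup described above, assuming the cached jar is named `tika-server.jar` and sits directly in the directory returned by `tempfile.gettempdir()`. The helper name `remove_cached_jar` is hypothetical and is not part of tika-python.

```python
import os
import tempfile


def remove_cached_jar(directory=None, name="tika-server.jar"):
    """Delete a cached jar so that it gets downloaded again on the next run.

    `directory` defaults to the system temp directory. The file name is an
    assumption taken from the comment above, not a confirmed tika-python path.
    Returns True if a file was removed, False if there was nothing to remove.
    """
    directory = directory or tempfile.gettempdir()
    jar_path = os.path.join(directory, name)
    if os.path.isfile(jar_path):
        os.remove(jar_path)
        return True
    return False
```

Calling `remove_cached_jar()` with no arguments targets the system temp directory; passing `directory=` makes it easy to try against a scratch folder first.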

@chrismattmann
Owner

See my fix for #54 and #56, @dongnizh, and let me know if that fixes this. We could probably add some more robust code here in #44 to verify the downloaded tika-server jar against a published checksum (sha1 or md5). For example, in getRemoteFile we could check whether there is a corresponding .md5 file for the URL and, if so, use it to test whether the download was successful. For example, https://repo1.maven.org/maven2/org/apache/tika/tika-server/1.9/tika-server-1.9.jar.md5 exists, and we could test against it.
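
A rough sketch of what such a check could look like, assuming Python 3 and that the `.md5` file holds the hex digest as its first whitespace-separated token. The function names `md5_of_file`, `fetch_expected_md5`, and `jar_is_intact` are hypothetical and are not the code that was committed to tika-python.

```python
import hashlib
import urllib.request


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_expected_md5(jar_url):
    """Download `<jar_url>.md5` and return the hex digest it contains.

    Assumption: the digest is the first token in the file; some checksum
    files append a file name after it.
    """
    with urllib.request.urlopen(jar_url + ".md5") as response:
        text = response.read().decode("ascii")
    return text.split()[0].lower()


def jar_is_intact(jar_path, expected_md5):
    """Return True if the local jar's MD5 matches the expected hex digest."""
    return md5_of_file(jar_path) == expected_md5.strip().lower()
```

A caller could compare `jar_is_intact(local_path, fetch_expected_md5(url))` after a download and delete and refetch the jar when it returns False. MD5 only detects accidental corruption here; it is not a defense against tampering.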

chrismattmann added a commit that referenced this issue Jul 21, 2015
this closes #44. Code to make downloading of the Tika Server jar use an MD5 to ensure we have the right Java file and that it was downloaded correctly. This will make sure that if there was e.g., HTTPS or HTTP issues, or if we got a corrupt jar, we have the ability to start the Tika server correctly.
@dongnizh
Contributor Author

@chrismattmann Thanks for your update. Will try the new version on Windows and update the result later.

@chrismattmann
Owner

Btw I implemented the md5 check
