make tika CLI similar to parser.from_file #329

Closed
vedal wants to merge 1 commit into chrismattmann:master from vedal:master


@vedal commented Dec 1, 2020

Thanks a lot for tika-python. It's fast and awesome!

I suggest the following change to make the command line tool
$ tika-python parse all file.pdf
behave more like the inline Python function
tika.parser.from_file("file.pdf", service='all')

Currently, the command line tool produces XHTML content by default, while the inline function produces plain text by default and only returns XHTML when called with xmlContent=True (it defaults to False). I did not expect the two to differ. Without this change, it is unclear how to get plain-text output from the command line tool.
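
One way to read the proposal, as a minimal sketch: a small wrapper that defaults to plain text like `parser.from_file` and switches to XHTML only on request. The `--xml` flag and the names `build_arg_parser` and `run` are hypothetical, chosen for illustration; they are not part of the existing tika-python CLI. `parse_fn` is injectable so the argument handling can be tried without a running Tika server.

```python
import argparse


def build_arg_parser():
    # Hypothetical CLI layout, loosely mirroring "parse all file.pdf".
    p = argparse.ArgumentParser(prog="tika-parse-sketch")
    p.add_argument("service", choices=["all", "meta", "text"])
    p.add_argument("path")
    # Hypothetical flag: opt in to XHTML, so plain text is the default,
    # matching xmlContent=False in tika.parser.from_file.
    p.add_argument("--xml", action="store_true",
                   help="return XHTML content instead of plain text")
    return p


def run(argv, parse_fn=None):
    args = build_arg_parser().parse_args(argv)
    if parse_fn is None:
        # Imported lazily: needs tika-python installed and a Java runtime.
        from tika import parser as tika_parser
        parse_fn = tika_parser.from_file
    # Forward the CLI choices to the same keyword arguments the
    # inline function accepts.
    return parse_fn(args.path, service=args.service, xmlContent=args.xml)
```

With this shape, `run(["all", "file.pdf"])` would make the same call as `tika.parser.from_file("file.pdf", service='all')`, and `run(["all", "file.pdf", "--xml"])` would request XHTML. Note that the current tool emits XHTML by default, so switching the default this way would change output for existing users.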

@coveralls

Coverage remained the same at 47.645% when pulling e2e602a on vedal:master into d692c0f on chrismattmann:master.

@chrismattmann
Owner

This is an interesting suggestion, thank you @vedal. In the PR, I would rather not just comment out the old code; I'd rather see a clean update. Also, how do you think this will affect backward compatibility for other users? I'm going to schedule this for tika-next (the post-release milestone) for discussion. Thanks.

@chrismattmann
Owner

I will close this for now. If you still want this change, it needs a cleaner patch, as I mentioned, and some documentation explaining it for users. Thanks for the idea though, @vedal.
