Fix for TIKA-2347 Adds underline extraction from word documents #173

Closed
stuarthendren wants to merge 1 commit

stuarthendren

Extracts underline for both doc and docx and assigns tag <u>.
Given lowest nesting among style tags.
Adds tests using testWORD_various.doc and testWord_various.docx
Updates affected output in other WordParserTests.
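
To illustrate the nesting rule described above, here is a minimal, stdlib-only Java sketch. It is not the code from this PR and does not use any Tika classes: `UnderlineNestingSketch`, `Run`, and `toXhtml` are hypothetical names, and the sketch assumes that "lowest nesting" means `<u>` is emitted as the innermost of the style tags, inside `<b>` and `<i>`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the Tika implementation.
class UnderlineNestingSketch {

    /** Stand-in for a run of Word text with its character styles. */
    static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;
        final boolean underline;

        Run(String text, boolean bold, boolean italic, boolean underline) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
            this.underline = underline;
        }
    }

    /** Emits each run with tags opened in the fixed order b, i, u (u innermost). */
    static String toXhtml(List<Run> runs) {
        StringBuilder out = new StringBuilder();
        for (Run r : runs) {
            if (r.bold) out.append("<b>");
            if (r.italic) out.append("<i>");
            if (r.underline) out.append("<u>"); // assumed to be the innermost tag
            out.append(escape(r.text));
            // Close in reverse order so the markup stays well-formed.
            if (r.underline) out.append("</u>");
            if (r.italic) out.append("</i>");
            if (r.bold) out.append("</b>");
        }
        return out.toString();
    }

    /** Escapes the characters that would otherwise break the markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        List<Run> runs = Arrays.asList(
                new Run("plain ", false, false, false),
                new Run("styled", true, true, true),
                new Run(" a<b", false, false, true));
        // prints: plain <b><i><u>styled</u></i></b><u> a&lt;b</u>
        System.out.println(toXhtml(runs));
    }
}
```

For simplicity the sketch opens and closes tags per run. A real extractor would more likely keep a tag open across adjacent runs that share a style.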

darkdreamingdan commented Sep 16, 2017

Could you also add strikethrough support? It's just the same thing, but using the <strike> XHTML element. We have our own branch for this code, but it would be good to unify our PRs.

Also, any news on this getting merged?

@dameikle dameikle self-assigned this Nov 23, 2017
dameikle (Member)

Merged into master now, including strikethrough support for docx. Thanks @stuarthendren!

@dameikle dameikle closed this Nov 24, 2017