feat: include text from shapes in docx #2510
I've settled on this XPath for now: It'll find the first occurrence of
I've added two example documents and one test for each. Also, this was reported as a bug, but I wonder whether it should be considered a fix or a feature. What's your take? I'll update the changelog accordingly.
A few remarks :)
Hi @ds-filipknefel, that XPath expression looks good to me. I think the trailing

Btw, to understand the possible parent-child relationships in the XML, you can consult the XML Schema for Open XML here: https://github.com/python-openxml/python-docx/blob/master/ref/xsd/wml.xsd. That said, I feel pretty confident this XPath will do just what we want.

I'd call this a feature, albeit a small one. This is extending the range of what
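The schema linked above documents which elements may contain which. A complementary, empirical check is to print the element path that leads to every `w:t` in a paragraph. Below is a minimal sketch using the standard library's `xml.etree.ElementTree`. The fragment is hand-written and trimmed (an assumption: a real `document.xml` has more levels, e.g. `wp:anchor`, `a:graphic` and `a:graphicData` between `w:drawing` and `wps:wsp`), and the helper names `short` and `text_paths` are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Namespace URI -> conventional prefix, used only for readable output.
PREFIXES = {
    W: "w",
    "http://schemas.openxmlformats.org/markup-compatibility/2006": "mc",
    "http://schemas.microsoft.com/office/word/2010/wordprocessingShape": "wps",
    "urn:schemas-microsoft-com:vml": "v",
}

# Hypothetical, trimmed paragraph: a plain run plus a run holding a text-box shape.
SAMPLE = f"""<w:p xmlns:w="{W}"
  xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
  xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
  xmlns:v="urn:schemas-microsoft-com:vml">
  <w:r><w:t>Body text. </w:t></w:r>
  <w:r>
    <mc:AlternateContent>
      <mc:Choice Requires="wps">
        <w:drawing><wps:wsp><wps:txbx><w:txbxContent>
          <w:p><w:r><w:t>Shape text</w:t></w:r></w:p>
        </w:txbxContent></wps:txbx></wps:wsp></w:drawing>
      </mc:Choice>
      <mc:Fallback>
        <w:pict><v:shape><v:textbox><w:txbxContent>
          <w:p><w:r><w:t>Shape text</w:t></w:r></w:p>
        </w:txbxContent></v:textbox></v:shape></w:pict>
      </mc:Fallback>
    </mc:AlternateContent>
  </w:r>
</w:p>"""


def short(tag: str) -> str:
    """Turn ElementTree's '{uri}local' tag into 'prefix:local'."""
    if not tag.startswith("{"):
        return tag
    uri, _, local = tag[1:].partition("}")
    return f"{PREFIXES.get(uri, uri)}:{local}"


def text_paths(root: ET.Element) -> list[str]:
    """Return the slash-separated element path from root to each <w:t>."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    paths = []
    for t in root.iter(f"{{{W}}}t"):
        chain, node = [], t
        while node is not None:
            chain.append(short(node.tag))
            node = parent.get(node)
        paths.append("/".join(reversed(chain)))
    return paths


for path in text_paths(ET.fromstring(SAMPLE)):
    print(path)
```

On this fragment the same shape text is reachable through both `mc:Choice` and `mc:Fallback`, which is one plausible reason an extraction rule would want only the first match rather than every nested `w:t`.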
Okay, I think this is good to go. One remark on a test that can now be simplified.
If it's ready, you can change it from Draft to "ready" status and let me know, and I'll approve it :)
3fe73be to b9c534a
Nice. Amazingly compact :)
One small nit but approving in advance.
Reported bug: Text from docx shapes is not included in the partition output.

Fix: Extend docx partitioning to search for text tags nested inside the structures responsible for creating the shape.
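As a rough illustration of the fix described above, here is a minimal Python sketch, not the XPath used in this PR, that collects ordinary run text from a paragraph and also descends into a shape's `w:txbxContent`. It assumes Word often stores a shape's text twice (under `mc:Choice` and again under `mc:Fallback` for older readers) and therefore reads only the first `w:txbxContent` found in each run. The function name `paragraph_text` and the trimmed sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical, trimmed paragraph (real files also have wp:anchor/a:graphic/
# a:graphicData between w:drawing and wps:wsp).
SAMPLE = f"""<w:p xmlns:w="{W}"
  xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
  xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
  xmlns:v="urn:schemas-microsoft-com:vml">
  <w:r><w:t>Body text. </w:t></w:r>
  <w:r>
    <mc:AlternateContent>
      <mc:Choice Requires="wps">
        <w:drawing><wps:wsp><wps:txbx><w:txbxContent>
          <w:p><w:r><w:t>Shape text</w:t></w:r></w:p>
        </w:txbxContent></wps:txbx></wps:wsp></w:drawing>
      </mc:Choice>
      <mc:Fallback>
        <w:pict><v:shape><v:textbox><w:txbxContent>
          <w:p><w:r><w:t>Shape text</w:t></w:r></w:p>
        </w:txbxContent></v:textbox></v:shape></w:pict>
      </mc:Fallback>
    </mc:AlternateContent>
  </w:r>
</w:p>"""


def paragraph_text(p: ET.Element) -> str:
    """Join the text of a <w:p>, including text nested in a shape's text box."""
    parts = []
    # Only direct child runs; runs inside the text box are handled below.
    for run in p.findall("w:r", NS):
        # Ordinary text sits directly under the run.
        parts.extend(t.text or "" for t in run.findall("w:t", NS))
        # Shape text sits deeper; find() returns only the first match in
        # document order, so the duplicate mc:Fallback copy is skipped.
        box = run.find(".//w:txbxContent", NS)
        if box is not None:
            parts.extend(t.text or "" for t in box.iterfind(".//w:t", NS))
    return "".join(parts)


print(paragraph_text(ET.fromstring(SAMPLE)))  # Body text. Shape text
```

Taking the first container keeps the sketch simple; a production rule would also need to decide how to treat runs with several shapes, and whether shape text should be emitted inline with the paragraph or as a separate element.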