
fix: page_number appears in partition_html metadata if include_metadata=False #658

Merged
MthwRobinson merged 4 commits into Unstructured-IO:main from Coniferish:jj/test_metadata on May 30, 2023

@Coniferish (Contributor)

Summary

- Fixes a bug from #592
- Adds a test for partitioning HTML from a filename with include_metadata=False
- Adds a _remove_element_metadata helper
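For illustration, here is a minimal, self-contained sketch of what a helper like _remove_element_metadata is meant to do: clear all metadata, page_number included, from a list of elements. It uses hypothetical stand-in ElementMetadata and Element dataclasses (only a few fields) and a hypothetical remove_element_metadata function instead of the real unstructured classes, so it is not the library's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ElementMetadata:
    # Hypothetical stand-in for unstructured's ElementMetadata (subset of fields).
    filename: Optional[str] = None
    filetype: Optional[str] = None
    page_number: Optional[int] = None


@dataclass
class Element:
    # Hypothetical stand-in for an unstructured document element.
    text: str
    metadata: ElementMetadata = field(default_factory=ElementMetadata)


def remove_element_metadata(elements: List[Element]) -> List[Element]:
    """Hypothetical sketch: reset every element's metadata to an empty object."""
    for element in elements:
        # Replace the metadata wholesale with a fresh, empty instance so that
        # no field (filename, filetype, page_number, ...) survives.
        element.metadata = ElementMetadata()
    return elements


elements = [
    Element("Title", ElementMetadata(filename="example-10k.html", page_number=1)),
    Element("Body text", ElementMetadata(page_number=2)),
]
stripped = remove_element_metadata(elements)
print(stripped[-1].metadata)
# ElementMetadata(filename=None, filetype=None, page_number=None)
```

Replacing the whole metadata object, rather than deleting fields one by one, means any field added to the metadata class later is also cleared without changing the helper.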

Testing

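```python
from unstructured.partition.html import partition_html

filename = "example-docs/example-10k.html"
# With include_metadata=False, the returned elements should carry no metadata,
# so page_number should no longer appear.
elements = partition_html(filename=filename, include_metadata=False)
print(elements[-1].metadata)
```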


@MthwRobinson left a comment


Implementation looks good. Just one small question. Could you also add a CHANGELOG entry?

Review comment on unstructured/partition/common.py (outdated)
@Coniferish marked this pull request as draft on May 30, 2023 at 20:19

@MthwRobinson left a comment


LGTM - thanks for the contribution!

@Coniferish marked this pull request as ready for review on May 30, 2023 at 20:33
@MthwRobinson enabled auto-merge (squash) on May 30, 2023 at 20:39
@MthwRobinson merged commit c78c5b6 into Unstructured-IO:main on May 30, 2023
@Coniferish deleted the jj/test_metadata branch on June 1, 2023 at 16:09
2 participants