Improve content text extraction #40

Merged
freddyheppell merged 1 commit into dev from text-exclude-tables on Aug 6, 2024

@freddyheppell freddyheppell commented Aug 6, 2024

  • Fix bug where only the first element to be excluded from post content was removed
    • Switched from el.extract() to el.decompose(), which destroys the element and its contents outright instead of only detaching it from the tree
  • Add tables to the list of elements which should be excluded when extracting post content
  • Improve the test to cover these changes
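
The fix described above can be sketched roughly as follows. This is a minimal illustration, assuming the extraction is built on BeautifulSoup (suggested by the `el.decompose()` / `el.extract()` method names). The function name `extract_text` and the `EXCLUDED_TAGS` list are hypothetical and are not taken from the project's code.

```python
from bs4 import BeautifulSoup

# Hypothetical exclusion list; the PR only confirms that tables were added.
EXCLUDED_TAGS = ["figure", "table", "script", "style"]


def extract_text(html: str) -> str:
    """Return the visible text of `html` with excluded elements removed."""
    soup = BeautifulSoup(html, "html.parser")
    # Iterate over every match, not just the first one.
    for el in soup.find_all(EXCLUDED_TAGS):
        # A nested match may already have been destroyed with its parent.
        if not el.decomposed:
            el.decompose()
    return soup.get_text(separator=" ", strip=True)
```

With this shape, a document containing several tables and figures loses all of them, which is the behaviour the improved test is meant to cover.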

@freddyheppell freddyheppell merged commit 8e3260f into dev Aug 6, 2024
@freddyheppell freddyheppell deleted the text-exclude-tables branch August 6, 2024 15:38
freddyheppell added a commit that referenced this pull request Aug 6, 2024
* Add package version attribute (#36)

* Add version attribute to package

* Revert "Hotfix: remove usage of __version__ in docs (#35)"

This reverts commit 641375a.

* add contributing guidelines (#37)

* Add ref to langcodes docs (#38)

* add manual ref to Language class

* fix footnote in start

* make opening to multilingual docs clearer

* Fix element exclusion in text extraction (#40)

* Prepare 1.0.3 release (#41)

* prepare 1.0.3

* fix changelog sections