WAT extractor: do not extract page title from embedded SVG images#37

Merged: sebastian-nagel merged 2 commits into master from ia-web-commons-36-title-embedded-svg on Oct 18, 2024
@sebastian-nagel

Address #36:

  • do not use <title> elements embedded in <svg> as page/document title
  • use the first non-empty <title> element to set the page/document title. This is required for documents where the <title> is not enclosed in the <head> element.
    Note: HTML5 allows the <head> element to be omitted, see https://www.w3.org/TR/2011/WD-html5-20110525/syntax.html#optional-tags
  • overwrite the page/document title with the content of a <title> element inside the <head> element
  • for text extraction: define the title element as block element
  • add a unit test checking that the correct title is extracted from a document which includes an embedded SVG image containing a title element
  • extend existing unit tests to test for proper title extraction
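
The title-selection rules listed above can be illustrated with a minimal sketch. It is written in Python with the standard library's `html.parser` and is not the code changed in this pull request; the names `TitleExtractor` and `extract_title` are hypothetical, and the sketch only tracks an explicit `<head>` tag (it does not infer an omitted one).

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Hypothetical helper illustrating the title-selection rules."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = None            # selected page/document title
        self._svg_depth = 0          # > 0 while inside an <svg> element
        self._in_head = False        # True between <head> and </head>
        self._in_title = False       # True while collecting a candidate title
        self._title_in_head = False  # whether the candidate started inside <head>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "svg":
            self._svg_depth += 1
        elif tag == "head":
            self._in_head = True
        elif tag == "title" and self._svg_depth == 0:
            # Only <title> elements outside of <svg> are candidates.
            self._in_title = True
            self._title_in_head = self._in_head
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "svg" and self._svg_depth > 0:
            self._svg_depth -= 1
        elif tag == "head":
            self._in_head = False
        elif tag == "title" and self._in_title:
            self._in_title = False
            text = " ".join("".join(self._buffer).split())
            # A non-empty title inside <head> overwrites any earlier one;
            # outside <head>, only the first non-empty title is used.
            if text and (self._title_in_head or self.title is None):
                self.title = text

    def handle_data(self, data):
        if self._in_title:
            self._buffer.append(data)


def extract_title(html):
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title
```

In this sketch, a document whose only `<title>` sits inside an embedded `<svg>` yields no title at all, and a `<title>` placed outside a missing `<head>` is still picked up.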

- add unit test that correct title is extracted from a document
  which includes an embedded SVG image containing a title element
- extend existing unit tests to test for proper title extraction
- do not use <title> elements embedded in <svg> as page/document title
- use the first non-empty <title> element to set the page/document
  title. This is required for documents where the <title> is not
  enclosed in the <head> element. Note: HTML5 allows the <head> element
  to be omitted, see
   https://www.w3.org/TR/2011/WD-html5-20110525/syntax.html#optional-tags
- overwrite the page/document title by the content of a <title> element
  inside the <head> element
- for text extraction: define the title element as block element
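
To illustrate why treating the title as a block element matters for text extraction, here is a minimal Python sketch, again not the project's code. The `BLOCK_TAGS` set, the `TextExtractor` class, and `extract_text` are assumptions made for this example; the extractor's real list of block elements may differ.

```python
from html.parser import HTMLParser

# Assumed, illustrative set of block-level tags (hypothetical).
BLOCK_TAGS = frozenset({"title", "p", "div", "h1", "h2", "li"})


class TextExtractor(HTMLParser):
    """Hypothetical extractor that puts line breaks around block elements."""

    def __init__(self, block_tags):
        super().__init__(convert_charrefs=True)
        self._block_tags = block_tags
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self._block_tags:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self._block_tags:
            self._parts.append("\n")

    def handle_data(self, data):
        self._parts.append(data)

    def text(self):
        # Normalize whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._parts).split("\n"))
        return "\n".join(line for line in lines if line)


def extract_text(html, block_tags=BLOCK_TAGS):
    parser = TextExtractor(block_tags)
    parser.feed(html)
    parser.close()
    return parser.text()
```

With `title` in the block set, the title text ends up on its own line; without it, the title runs directly into the text that follows.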
@sebastian-nagel sebastian-nagel merged commit da324f9 into master Oct 18, 2024
@sebastian-nagel sebastian-nagel deleted the ia-web-commons-36-title-embedded-svg branch October 18, 2024 14:06