Fix DOCTYPE is disallowed error in NCBI taxonomy parsing #623

Open
vagisha wants to merge 1 commit into release26.3-SNAPSHOT from 26.3_fb_panoramapublic-ncbi-doctype-parsing

vagisha (Collaborator) commented Apr 8, 2026

Rationale

NcbiUtils.getScientificNames throws an exception on NCBI eSummary lookup because the response begins with a <!DOCTYPE> declaration that XmlBeansUtil.DOCUMENT_BUILDER_FACTORY rejects. The exception:
SAXParseException: DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true
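
The failure can be reproduced outside LabKey with a plain JAXP factory. The sketch below is an illustration only: the class name `DoctypeRejectionDemo`, the helper `parseStrict`, and the tiny sample document are hypothetical, and the factory is configured by hand rather than taken from `XmlBeansUtil`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

final class DoctypeRejectionDemo
{
    // Hypothetical stand-in for an eSummary response: only the DOCTYPE line matters here.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
        "<!DOCTYPE eSummaryResult>\n" +
        "<eSummaryResult/>";

    /** Returns the parser's error message, or null if the document parsed cleanly. */
    static String parseStrict(String xml) throws Exception
    {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Same feature named in the exception above: any DOCTYPE becomes a fatal error.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        try
        {
            dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return null;
        }
        catch (SAXParseException e)
        {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception
    {
        System.out.println(parseStrict(XML));
    }
}
```

The same helper returns `null` for a document without a DOCTYPE, which is why only responses such as NCBI's trigger the error.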

Related Pull Requests

Changes

  • Switched to the new XmlBeansUtil.DOCUMENT_BUILDER_FACTORY_ALLOWING_DOCTYPE
  • Added a unit test
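
The unit test itself is not shown in this conversation. As a rough sketch of what such a check could exercise, the code below parses a canned response that starts with a DOCTYPE and collects scientific names by taxonomy ID. Everything here is an assumption for illustration: the class name, the simplified sample (which assumes the `DocSum` / `Id` / `Item Name="ScientificName"` layout of eSummary v1), and the hand-built factory, which stands in for `XmlBeansUtil.DOCUMENT_BUILDER_FACTORY_ALLOWING_DOCTYPE`. It is not the actual `NcbiUtils` implementation.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class ScientificNamesSketch
{
    // Simplified, hypothetical sample; a real response carries more Item elements and a PUBLIC DTD reference.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
        "<!DOCTYPE eSummaryResult>\n" +
        "<eSummaryResult>\n" +
        "  <DocSum><Id>9606</Id><Item Name=\"ScientificName\" Type=\"String\">Homo sapiens</Item></DocSum>\n" +
        "  <DocSum><Id>10090</Id><Item Name=\"ScientificName\" Type=\"String\">Mus musculus</Item></DocSum>\n" +
        "</eSummaryResult>";

    static Map<Integer, String> parseScientificNames(InputStream in) throws Exception
    {
        // Stand-in factory: DOCTYPE is allowed, but external DTDs and entities are not loaded.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        Document doc = dbf.newDocumentBuilder().parse(in);

        Map<Integer, String> names = new LinkedHashMap<>();
        NodeList docSums = doc.getElementsByTagName("DocSum");
        for (int i = 0; i < docSums.getLength(); i++)
        {
            Element docSum = (Element) docSums.item(i);
            NodeList ids = docSum.getElementsByTagName("Id");
            if (ids.getLength() == 0)
                continue; // skip summaries without a taxonomy ID
            int taxId = Integer.parseInt(ids.item(0).getTextContent().trim());
            NodeList items = docSum.getElementsByTagName("Item");
            for (int j = 0; j < items.getLength(); j++)
            {
                Element item = (Element) items.item(j);
                if ("ScientificName".equals(item.getAttribute("Name")))
                    names.put(taxId, item.getTextContent().trim());
            }
        }
        return names;
    }
}
```

A test along these lines would feed `SAMPLE` through the parser and assert on the returned map, so a regression in DOCTYPE handling surfaces as a parse failure rather than at lookup time.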

…tils.getScientificNames threw PxException on NCBI eSummary lookup because the response begins with a <!DOCTYPE> declaration that XmlBeansUtil.DOCUMENT_BUILDER_FACTORY rejects. Switch to the new DOCUMENT_BUILDER_FACTORY_ALLOWING_DOCTYPE.
- Added unit test
private static Map<Integer, String> parseScientificNames(InputStream in)
        throws ParserConfigurationException, SAXException, IOException
{
    Document doc = getDocumentBuilder().parse(in);
Review comment (Contributor):

We may need to suppress this as a false positive if it's still showing up post-merge.

labkey-jeckels (Contributor) pushed a commit to LabKey/platform that referenced this pull request on Apr 9, 2026

#### Rationale
`XmlBeansUtil.DOCUMENT_BUILDER_FACTORY` sets
`disallow-doctype-decl=true` for XXE protection, which causes parsers to
fail on any XML with a `<!DOCTYPE>` declaration. This breaks the Panorama
Public code that parses NCBI's `esummary.fcgi` response, which begins
with `<!DOCTYPE eSummaryResult PUBLIC ... esummary-v1.dtd>`.

#### Related Pull Requests
- LabKey/MacCossLabModules#605
- LabKey/MacCossLabModules#623

#### Changes
- Added `DOCUMENT_BUILDER_FACTORY_ALLOWING_DOCTYPE` to `XmlBeansUtil`,
mirroring the existing `SAX_PARSER_FACTORY_ALLOWING_DOCTYPE`. The
DOCTYPE declaration is permitted, but every other XXE mitigation stays
in place.
- Extracted a private `documentBuilderFactory(boolean allowDocType)`
helper, mirroring the existing `saxParserFactory(boolean)` helper.
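
A helper of this shape might look roughly like the sketch below. This is not the LabKey source: the class name is hypothetical, and the specific feature set is an assumption based on the commonly recommended JAXP hardening flags. The only part taken from the description above is the idea that a single boolean toggles `disallow-doctype-decl` while the remaining mitigations are applied unconditionally.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DocumentBuilderFactorySketch
{
    /** Hypothetical helper: only the DOCTYPE rule depends on the flag. */
    static DocumentBuilderFactory documentBuilderFactory(boolean allowDocType) throws ParserConfigurationException
    {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // The one setting that differs between the strict and the DOCTYPE-allowing factory.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", !allowDocType);
        // Assumed set of remaining XXE mitigations: no external entities, no external DTD loading.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    /** Returns true if the XML parses with the given factory, false if the parser rejects it. */
    static boolean parses(DocumentBuilderFactory dbf, String xml) throws ParserConfigurationException, IOException
    {
        try
        {
            dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        }
        catch (SAXException e)
        {
            return false;
        }
    }

    public static void main(String[] args) throws Exception
    {
        String xml = "<!DOCTYPE eSummaryResult>\n<eSummaryResult/>";
        System.out.println("strict factory parses:   " + parses(documentBuilderFactory(false), xml));
        System.out.println("allowing factory parses: " + parses(documentBuilderFactory(true), xml));
    }
}
```

Keeping both factories behind one helper means a future hardening change lands in both at once, and the DOCTYPE-allowing variant cannot silently drift away from the strict one.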