Unable to extract text inside <code> tag #1302

@FernandoSSI

Bug

When I extract the content of an API documentation page with docling, the text inside HTML <code> tags is missing from the output.

Steps to reproduce

url: https://api.slack.com/methods/conversations.acceptSharedInvite

The <code> blocks shown in the following screenshot, which contain the path of the HTTP request, are missing from the resulting Markdown:
Image

The resulting Markdown for the part shown in the screenshot above is:
Image
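
A minimal sketch of how the missing text could be checked programmatically. The helper names `CodeTextCollector`, `missing_code_snippets`, and `docling_markdown` are hypothetical, the sample HTML and Markdown are made up for illustration, and the docling call assumes the `DocumentConverter` usage from docling's quickstart. The idea is to collect the text of every <code> element from the page's HTML and report which of those strings do not appear in the exported Markdown.

```python
from html.parser import HTMLParser


class CodeTextCollector(HTMLParser):
    """Collect the text content of every <code> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # nesting depth of open <code> tags
        self._buf = []       # text pieces of the <code> element being read
        self.snippets = []   # finished <code> texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "code" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(self._buf).strip()
                if text:
                    self.snippets.append(text)
                self._buf = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buf.append(data)


def missing_code_snippets(html, markdown):
    """Return the <code> texts from `html` that do not occur in `markdown`."""
    collector = CodeTextCollector()
    collector.feed(html)
    collector.close()
    return [s for s in collector.snippets if s not in markdown]


def docling_markdown(source):
    """Convert a URL or file path to Markdown with docling.

    Assumes docling is installed; imported lazily so the check above
    can be used without it.
    """
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    return result.document.export_to_markdown()


# Made-up sample standing in for the real page and its converted output.
sample_html = "<p>Call <code>GET /v1/items</code> then <code>POST /v1/items</code></p>"
sample_markdown = "Call then `POST /v1/items`"
print(missing_code_snippets(sample_html, sample_markdown))  # ['GET /v1/items']
```

To run the check against the page from this report, download its HTML separately, pass the URL above to `docling_markdown`, and give both results to `missing_code_snippets`; a non-empty list would indicate dropped <code> text.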

Docling version

Version: 2.28.4

Python version

Version: 3.12.15

Metadata

Labels

enhancement (New feature or request)
html (issue related to html backend)
