Doesn't detect bullet points within tables #335
Comments
I agree, it's uncommon to find bullet points in tables but it would indeed be a useful addition.
Do you feel this could be related to #318?
If the parent element is a table it's a duplicate issue, otherwise it's a problem with the extraction of nested elements.
This site uses bullet points in tables a lot: https://www.spotify.com/in-en/legal/privacy-policy/
I see that the first case now appears to be solved. The second one can be addressed by focusing on recall:
Site: https://stackoverflow.com/legal/privacy-policy#:~:text=We%20will%20only%20process%20your,be%20shared%20with%20other%20parties.
One of the tables has a list within a cell, and that list content is missing from the output.
This is what trafilatura generated:
<row> <cell> <p>Marketing our services and those of selected third parties to:</p> </cell> <cell>For our legitimate interests or those of a third party, i.e., to promote our business to existing and former customers</cell> </row>
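To check which items are being dropped, one option is to list the `<li>` text found inside table cells of the source HTML and compare it against the extracted output. Below is a minimal sketch using only the standard library's `html.parser`. The sample markup and the names `CellListCollector` and `list_items_in_cells` are hypothetical, made up for illustration, and this is not how trafilatura itself processes tables.

```python
from html.parser import HTMLParser

# Hypothetical sample imitating the reported case; not the actual page markup.
SAMPLE = """
<table>
  <tr>
    <td>
      <p>Marketing our services to:</p>
      <ul><li>existing customers</li><li>former customers</li></ul>
    </td>
    <td>For our legitimate interests</td>
  </tr>
</table>
"""


class CellListCollector(HTMLParser):
    """Collect the text of <li> elements that occur inside table cells."""

    def __init__(self):
        super().__init__()
        self.cell_depth = 0   # greater than 0 while inside a <td> or <th>
        self.in_item = False  # True while inside an <li> within a cell
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.cell_depth += 1
        elif tag == "li" and self.cell_depth > 0:
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell_depth > 0:
            self.cell_depth -= 1
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        # Only text that belongs to a list item inside a cell is kept.
        if self.in_item:
            self.items[-1] += data


def list_items_in_cells(html):
    """Return the stripped text of every list item nested in a table cell."""
    parser = CellListCollector()
    parser.feed(html)
    parser.close()
    return [item.strip() for item in parser.items]


print(list_items_in_cells(SAMPLE))
```

Any string returned here that does not appear in the extracted XML is a list item that was lost.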
Hope this helps, thanks