On some documents Trafilatura 0.6.0 fails with this error:
...
  File "/usr/local/Caskroom/miniconda/base/envs/myenv/lib/python3.7/site-packages/trafilatura/external.py", line 123, in sanitize_tree
    tree = prune_html(tree)
  File "/usr/local/Caskroom/miniconda/base/envs/myenv/lib/python3.7/site-packages/trafilatura/htmlprocessing.py", line 63, in prune_html
    element.drop_tree()
AttributeError: 'lxml.etree._Element' object has no attribute 'drop_tree'
A git blame on this line shows that the code is new: it was introduced 21 days ago in revision 74444d2.
Note: I am using the latest version of lxml (4.6.1)
Further investigation suggests that there are two types of elements in the lxml library:
lxml.etree._Element (the type Trafilatura generates here, which does not have the drop_tree method)
lxml.html.HtmlElement (not used here, but it does provide the drop_tree method that Trafilatura calls)
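The difference between the two element types can be reproduced outside Trafilatura. The sketch below is only an illustration under stated assumptions: `SNIPPET` is made-up input, and `drop_tree_compat` is a hypothetical helper, not part of lxml or Trafilatura and not necessarily how the library should fix this. It falls back to removing the element by hand while keeping its tail text, which is roughly what `drop_tree` does for `HtmlElement`.

```python
from lxml import etree, html

# Made-up sample input; it is valid as both HTML and XML.
SNIPPET = "<div><p>keep</p><script>x()</script>tail text</div>"

# Parsed with lxml.html: the elements are HtmlElement and provide drop_tree().
html_tree = html.fromstring(SNIPPET)

# Parsed with lxml.etree: the elements are plain _Element without drop_tree().
etree_tree = etree.fromstring(SNIPPET)


def drop_tree_compat(element):
    """Hypothetical helper: remove an element and its children, keep its tail text."""
    if hasattr(element, "drop_tree"):
        # HtmlElement: use the method lxml.html already offers.
        element.drop_tree()
        return
    parent = element.getparent()
    if parent is None:
        # A root element cannot be removed from a parent.
        return
    tail = element.tail
    if tail:
        # lxml removes the tail together with the element, so move it first.
        previous = element.getprevious()
        if previous is not None:
            previous.tail = (previous.tail or "") + tail
        else:
            parent.text = (parent.text or "") + tail
    parent.remove(element)
```

With this helper, `drop_tree_compat(etree_tree.find("script"))` removes the `script` element from the `etree` tree without raising the `AttributeError` shown above, and the text following the element stays in place.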