ValueError in xml #681

Closed
Honesty-of-the-Cavernous-Tissue opened this issue Aug 26, 2024 · 3 comments · Fixed by #685
Labels
bug Something isn't working

Comments

@Honesty-of-the-Cavernous-Tissue

Honesty-of-the-Cavernous-Tissue commented Aug 26, 2024

trafilatura: 1.12.1

Raised by: https://raw.githubusercontent.com/Honesty-of-the-Cavernous-Tissue/trafilatura/master/tests/test.html

ValueError: invalid literal for int() with base 10: ''
Raised from this line:

max_span = min(int(element.get("colspan") or element.get("span", 1)), 1000)
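
A minimal sketch of how the quoted line can raise this error, plus one defensive variant. It assumes the input contains an element whose `span` attribute is present but empty and which has no usable `colspan`. The function names `span_original` and `span_defensive` are hypothetical, the standard-library `Element` stands in for the lxml element that trafilatura works with, and the defensive variant is an illustration only, not necessarily the change merged in #685.

```python
from xml.etree.ElementTree import Element  # stdlib stand-in for an lxml element


def span_original(element):
    # The expression quoted in the report: a missing "colspan" falls through
    # to "span", and an empty "span" attribute reaches int() as ''.
    return min(int(element.get("colspan") or element.get("span", 1)), 1000)


def span_defensive(element, default=1, limit=1000):
    # Hypothetical variant: treat empty or non-numeric values as the default.
    raw = element.get("colspan") or element.get("span") or default
    try:
        value = int(raw)
    except (TypeError, ValueError):
        value = default
    return min(value, limit)


# An element with an empty "span" attribute and no "colspan" (assumed input).
cell = Element("col", {"span": ""})

try:
    span_original(cell)
except ValueError as err:
    print("original:", err)

print("defensive:", span_defensive(cell))
```

With this variant, an empty or malformed attribute falls back to a span of 1 instead of aborting the extraction, while well-formed values are still capped at 1000.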

@adbar
Owner

adbar commented Aug 26, 2024

I just edited your comment to replace the URL by the raw data, but I still cannot reproduce the bug with XML output, do you use particular options?

adbar added the "feedback" (Feedback from users requested) label on Aug 26, 2024
@Honesty-of-the-Cavernous-Tissue
Author

> I just edited your comment to replace the URL by the raw data, but I still cannot reproduce the bug with XML output, do you use particular options?

Sorry, I found out that it seems to be related to the Python version: my environment is 3.12.0, and there is no error in 3.9.18.

adbar added the "bug" (Something isn't working) label and removed the "feedback" (Feedback from users requested) label on Aug 26, 2024
@adbar
Owner

adbar commented Aug 26, 2024

My bad, the bug occurs when Trafilatura is used with Python, the CLI suppresses the error.

@adbar adbar linked a pull request Aug 30, 2024 that will close this issue