OverflowError: signed integer is greater than maximum #49

Closed
Lucabenj opened this issue Dec 28, 2020 · 2 comments
Labels
bug Something isn't working

Comments

@Lucabenj

Traceback (most recent call last):
  File "indexer.py", line 53, in <module>
    content_trafilatura = trafilatura.extract(document, json_output=True, with_metadata=False, include_tables=False, deduplicate=True, include_comments=False)
  File "/Users/luca/enviroments/3.7/lib/python3.7/site-packages/trafilatura/core.py", line 684, in extract
    max_tree_size=max_tree_size, url_blacklist=url_blacklist
  File "/Users/luca/enviroments/3.7/lib/python3.7/site-packages/trafilatura/core.py", line 586, in bare_extraction
    docmeta = extract_metadata(tree, url, date_extraction_params)
  File "/Users/luca/enviroments/3.7/lib/python3.7/site-packages/trafilatura/metadata.py", line 367, in extract_metadata
    metadata['date'] = find_date(tree, **date_config)
  File "/Users/luca/enviroments/3.7/lib/python3.7/site-packages/htmldate/core.py", line 605, in find_date
    original_date, min_date, max_date)
  File "/Users/luca/enviroments/3.7/lib/python3.7/site-packages/htmldate/core.py", line 124, in examine_header
    headerdate = tryfunc(elem.get('content'))
  File "/Users/luca/enviroments/3.7/lib/python3.7/site-packages/htmldate/extractors.py", line 385, in try_ymd_date
    customresult = custom_parse(string, outputformat, extensive_search, min_date, max_date)
  File "/Users/luca/enviroments/3.7/lib/python3.7/site-packages/htmldate/extractors.py", line 302, in custom_parse
    result = parse_datetime_as_naive(string)
  File "/Users/luca/enviroments/3.7/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 1374, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/Users/luca/enviroments/3.7/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 655, in parse
    ret = self._build_naive(res, default)
  File "/Users/luca/enviroments/3.7/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 1241, in _build_naive
    naive = default.replace(**repl)
OverflowError: signed integer is greater than maximum

adbar added the bug label Dec 29, 2020

adbar (Owner) commented Dec 29, 2020

Hi, thanks for the bug report. It happens during date extraction; I referenced it accordingly and will fix it in the next version of htmldate.
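
For illustration only, here is a minimal sketch of the kind of guard that avoids this crash. It is not the fix that went into htmldate. The traceback ends in `default.replace(**repl)`, so the sketch assumes the parsed string contained a numeric field too large for `datetime` to accept; the actual input is not shown in this report. `safe_replace` is a hypothetical helper name, not part of htmldate, dateutil, or trafilatura.

```python
from datetime import datetime
from typing import Optional


def safe_replace(default: datetime, **fields: int) -> Optional[datetime]:
    """Hypothetical helper: return None instead of raising on bad date fields."""
    try:
        return default.replace(**fields)
    except (ValueError, OverflowError):
        # ValueError: a field is outside datetime's valid range (e.g. year 10000).
        # OverflowError: a field does not fit in a C int (CPython's C datetime).
        return None


base = datetime(2020, 12, 28)
print(safe_replace(base, year=2019))    # a valid date is returned unchanged in kind
print(safe_replace(base, year=10**12))  # an oversized year yields None, no crash
```

Catching both exception types matters here: an out-of-range value normally raises `ValueError`, while a value too large for a C integer raises `OverflowError`, which is the case reported above.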

Lucabenj (Author) commented Dec 29, 2020 via email
