"URL couldn't be processed: %s" when calling find_date() #53

Closed
HubLubas opened this issue May 28, 2022 · 6 comments
Labels
question Further information is requested

Comments

@HubLubas

I have a problem extracting the date from a website:
date = find_date('https://uk.investing.com/news/astrazeneca-earnings-revenue-beat-in-q4-2582731');

I get the following error:

ValueError Traceback (most recent call last)
in ()
----> 1 date = find_date('https://uk.investing.com/news/astrazeneca-earnings-revenue-beat-in-q4-2582731');

1 frames
/usr/local/lib/python3.7/dist-packages/htmldate/core.py in find_date(htmlobject, extensive_search, original_date, outputformat, url, verbose, min_date, max_date)
598 if verbose is True:
599 logging.basicConfig(level=logging.DEBUG)
--> 600 tree = load_html(htmlobject)
601 find_date.extensive_search = extensive_search
602 min_date, max_date = get_min_date(min_date), get_max_date(max_date)

/usr/local/lib/python3.7/dist-packages/htmldate/utils.py in load_html(htmlobject)
165 # log the error and quit
166 if htmltext is None:
--> 167 raise ValueError("URL couldn't be processed: %s", htmlobject)
168 # start processing
169 tree = None

ValueError: ("URL couldn't be processed: %s", 'https://uk.investing.com/news/astrazeneca-earnings-revenue-beat-in-q4-2582731')

I would be grateful for any help with this.

@adbar
Owner

adbar commented Jun 1, 2022

Hi @HubLubas, I cannot reproduce the bug:

>>> from htmldate import find_date
>>> find_date('https://uk.investing.com/news/astrazeneca-earnings-revenue-beat-in-q4-2582731')
'2022-02-10'

Are you using the latest version? Which system are you on?

@HubLubas
Author

HubLubas commented Jun 2, 2022

Hi @adbar

I'm working in a Jupyter notebook on Google Colab.
htmldate 1.2.1 /usr/local/lib/python3.7/dist-packages pip

@adbar
Owner

adbar commented Jun 2, 2022

I cannot reproduce it, it works for me:

[Screenshot: Screenshot_2022-06-02_17-39-43]

@adbar added the "question" label (Further information is requested) Jun 2, 2022
@HubLubas
Author

HubLubas commented Jun 2, 2022

I found the cause of the problem.
I changed the install command from
!pip install htmldate
to
!pip install -U htmldate
and now it works.
Thank you @adbar for answering the issue!
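
The fix above suggests that the notebook was running an outdated htmldate release. A minimal sketch for checking which version is actually installed, before and after the upgrade, could look like this. `installed_version` is a hypothetical helper, and `importlib.metadata` assumes Python 3.8 or later; on the Python 3.7 runtime shown in this thread, `pip show htmldate` gives the same information.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the htmldate version found in the current environment (or None).
    print("htmldate:", installed_version("htmldate"))
```

In a notebook, the runtime may need a restart after `pip install -U` before the new version is picked up.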

@adbar closed this as completed Jun 2, 2022
@jifan-chen

I am facing the same issue again:
[image]

@adbar
Owner

adbar commented Jan 6, 2023

Hi @jifan-chen, an error is raised because the web page couldn't be downloaded.

You can try to use another download utility or an archived version of the page. Then you can use this pre-existing HTML file as input to the find_date() function.
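
The workaround described above can be sketched as follows: download the page with a separate utility (here the standard library's `urllib`), then pass the HTML string to `find_date()` together with the `url` parameter that appears in the function signature in the traceback. The helper names (`build_request`, `fetch_html`, `date_from_url`) and the `USER_AGENT` value are assumptions made for illustration, not part of htmldate, and a browser-like user agent is not guaranteed to make every site respond.

```python
import urllib.request
from typing import Optional

# Assumed header value: some sites reject Python's default user agent.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url: str) -> urllib.request.Request:
    """Prepare a GET request carrying a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_html(url: str, timeout: float = 30.0) -> Optional[str]:
    """Download the page ourselves; return None if the download fails."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except OSError:  # URLError, HTTPError and socket timeouts are OSError subclasses
        return None


def date_from_url(url: str) -> Optional[str]:
    """Fetch the HTML separately, then hand the markup to find_date()."""
    from htmldate import find_date  # imported lazily; requires htmldate

    html = fetch_html(url)
    if html is None:
        return None  # the download failed, so there is nothing to analyse
    return find_date(html, url=url)
```

Calling `date_from_url(...)` with the article URL returns whatever `find_date()` extracts from the downloaded markup, or `None` if the page could not be fetched. The same approach works with HTML read from a saved file or an archived copy of the page.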
