Error on "urllib.request import urlopen" from Chapter01_BeginningToScrape.ipynb #91

Open
efebuyuk opened this issue Aug 12, 2020 · 2 comments

@efebuyuk

Hi,

I get the error below when I run this code:

```python
from urllib.request import urlopen

html = urlopen('http://pythonscraping.com/pages/page1.html')
```

```
Traceback (most recent call last):
  File "C:\Anaconda3\envs\py38\lib\http\client.py", line 871, in _get_hostport
    port = int(host[i+1:])
ValueError: invalid literal for int() with base 10: 'port'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 1379, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 1319, in do_open
    h = http_class(host, timeout=req.timeout, **http_conn_args)
  File "C:\Anaconda3\envs\py38\lib\http\client.py", line 833, in __init__
    (self.host, self.port) = self._get_hostport(host, port)
  File "C:\Anaconda3\envs\py38\lib\http\client.py", line 876, in _get_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
http.client.InvalidURL: nonnumeric port: 'port'
```

I am using the latest version of Python (3.8.5). What could be the problem?

Thank you.
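A note on the traceback: `_get_hostport` takes everything after the last `:` in the host and passes it to `int()`, so the host that reached `http.client` ended in the literal text `:port`. The URL being opened has no port, so one plausible source (an assumption; nobody in the thread confirms it) is a proxy setting such as `http://proxy.example.com:port` copied from a template into `HTTP_PROXY` or the Windows proxy settings, which `urlopen` picks up automatically. The sketch below uses a hypothetical helper, `find_bad_proxies`, to list proxy entries whose port is not a number:

```python
import urllib.parse
import urllib.request


def find_bad_proxies(proxies):
    """Return the entries of `proxies` whose port is not a valid integer.

    `proxies` maps a scheme such as 'http' to a proxy URL, the same shape
    as the dict returned by urllib.request.getproxies().
    """
    bad = {}
    for scheme, proxy_url in proxies.items():
        if scheme == "no":
            # 'no' carries the NO_PROXY host list, not a proxy URL.
            continue
        # Some settings omit the scheme ("host:8080"); prepend one so that
        # urlsplit() parses the value as a host and port.
        candidate = proxy_url if "://" in proxy_url else "http://" + proxy_url
        try:
            # .port raises ValueError when the port is not a number,
            # e.g. the literal text 'port' left over from a template.
            urllib.parse.urlsplit(candidate).port
        except ValueError:
            bad[scheme] = proxy_url
    return bad


if __name__ == "__main__":
    # The proxies urlopen() uses by default. On Windows this falls back to
    # the system proxy settings when no *_proxy environment variables are set.
    current = urllib.request.getproxies()
    print("Proxies in effect:", current)
    print("Proxies with a non-numeric port:", find_bad_proxies(current))
```

If the last line prints a non-empty dict, correcting or removing that setting should let the original `urlopen` call through.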

@CrustyBarnacle

```
~ bpython
bpython version 0.18 on top of Python 3.8.5 /usr/bin/python3
>>> from urllib.request import urlopen
>>> response = urlopen('http://pythonscraping.com/pages/page1.html')
>>> response
<http.client.HTTPResponse object at 0x7f196406b850>
>>>
```

And read the data:

```
>>> data = response.read().decode('utf-8')
>>> data
'<html>\n<head>\n<title>A Useful Page</title>\n</head>\n<body>\n<h1>An Interesting Title</h1>\n<div>\nLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n</div>\n</body>\n</html>\n'
>>>
```
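The same read-and-decode steps can be wrapped in a small helper. This is a sketch, not code from the book: the name `fetch_text` is hypothetical, and it assumes the server either declares a charset in its `Content-Type` header or sends UTF-8. The `with` block closes the response once the body has been read.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Download `url` and return its body decoded to a str."""
    # The with-block closes the response when it exits.
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Prefer the charset the server declares; fall back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset)
```

Usage would be `fetch_text('http://pythonscraping.com/pages/page1.html')`. Note that this goes through the same proxy settings as plain `urlopen`, so it fails the same way if those settings are broken.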

@miroslavsavel

Try this:

```python
import urllib.request

request_url = urllib.request.urlopen('https://www.pythonscraping.com/pages/page1.html')
print(request_url.read())
```

Read here:
https://www.geeksforgeeks.org/python-urllib-module/
