Cannot download urls with Cyrillic letters and https protocol. #949

Closed
amorgun opened this issue Jul 1, 2016 · 9 comments

@amorgun

amorgun commented Jul 1, 2016

My script:

import aiohttp
import asyncio


async def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return await response.text()

if __name__ == '__main__':
    url = u'https://цфоут.мвд.рф/news/item/8065038/'
    loop = asyncio.get_event_loop()
    with aiohttp.ClientSession(loop=loop) as session:
        html = loop.run_until_complete(
            fetch(session, url))
        print(html)

It fails with the following error:

Exception in callback None
handle: <Handle cancelled>
Traceback (most recent call last):
  File "/usr/lib/python3.5/asyncio/events.py", line 125, in _run
    self._callback(*self._args)
  File "/usr/lib/python3.5/asyncio/selector_events.py", line 671, in _read_ready
    self._protocol.data_received(data)
  File "/usr/lib/python3.5/asyncio/sslproto.py", line 492, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "/usr/lib/python3.5/asyncio/sslproto.py", line 200, in feed_ssldata
    self._sslobj.do_handshake()
  File "/usr/lib/python3.5/ssl.py", line 633, in do_handshake
    match_hostname(self.getpeercert(), self.server_hostname)
  File "/usr/lib/python3.5/ssl.py", line 296, in match_hostname
    % (hostname, ', '.join(map(repr, dnsnames))))
ssl.CertificateError: hostname 'цфоут.мвд.рф' doesn't match either of '*.xn--b1aew.xn--p1ai', 'xn--b1aew.xn--p1ai'

Interestingly enough, the IDNA-encoded form of the string 'цфоут.мвд.рф' actually matches '*.xn--b1aew.xn--p1ai':

>>> 'цфоут.мвд.рф'.encode('idna').decode('utf8').endswith('.xn--b1aew.xn--p1ai')
True
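
The check above can be extended into a rough wildcard comparison. This is a minimal sketch, not the matching logic used by `ssl.match_hostname`: the helper names `to_a_label` and `matches_wildcard` are hypothetical, and the rule (a `*` stands for exactly one leftmost label) is a simplification. It only illustrates that the comparison succeeds once the hostname is in its ASCII form.

```python
def to_a_label(hostname: str) -> str:
    """Return the ASCII (punycode) form of an internationalized hostname."""
    return hostname.encode("idna").decode("ascii")


def matches_wildcard(hostname: str, pattern: str) -> bool:
    """Simplified check: '*' may stand for exactly one leftmost label."""
    host_labels = to_a_label(hostname).lower().split(".")
    pattern_labels = pattern.lower().split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        # Compare everything to the right of the wildcard label.
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


# Names taken from the CertificateError message in the traceback.
cert_names = ["*.xn--b1aew.xn--p1ai", "xn--b1aew.xn--p1ai"]
host = "цфоут.мвд.рф"

print(to_a_label(host))
print(any(matches_wildcard(host, name) for name in cert_names))
```

The failure in the traceback is consistent with the comparison being made against the Unicode form of the hostname rather than the encoded one.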

Same script with requests:

# This works fine
import requests

if __name__ == '__main__':
    url = u'https://цфоут.мвд.рф/news/item/8065038/'
    print(requests.get(url).text)

Versions

$ python -V
Python 3.5.1
$ pip3 freeze
aiohttp==0.22.0a0
chardet==2.3.0
multidict==1.0.3
requests==2.10.0
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.3 LTS
Release:    14.04
Codename:   trusty

What I think

I found a similar question on SO, but setting verify_ssl=False looks like a pretty dangerous hack to me.

@minhoryang

minhoryang commented Aug 15, 2016

This problem is caused by https://hg.python.org/cpython/file/3.5/Lib/ssl.py#l381
Before this function is called, server_hostname == 'xn--n1aiccj.xn--b1aew.xn--p1ai'.
Afterwards, server_hostname == 'цфоут.мвд.рф'. (I don't know why yet.)
I can't dig any deeper.
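
The observation above can be probed without a network connection. The following is a small sketch, assuming Python 3.7 or later, that hands the encoded hostname to `SSLContext.wrap_bio()` through in-memory BIOs and reads back `server_hostname`. No handshake is performed, so it only shows what the SSL object stores. The variable names are illustrative. On Python 3.5, per the comment above, the value was seen to come back in its Unicode form.

```python
import ssl

# ASCII (A-label) form of the hostname from the report.
a_label = "цфоут.мвд.рф".encode("idna").decode("ascii")

# Client-side context; hostname checking is enabled by default for this protocol.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)

# In-memory buffers stand in for the socket, so nothing is sent anywhere.
incoming = ssl.MemoryBIO()
outgoing = ssl.MemoryBIO()

ssl_object = context.wrap_bio(incoming, outgoing, server_hostname=a_label)

# The value that later takes part in certificate hostname matching.
reported = ssl_object.server_hostname
print(repr(reported))
```

If `reported` differs from `a_label` on a given interpreter, that interpreter is likely affected by the behaviour described in this issue.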

@minhoryang

requests uses wrap_socket(), not wrap_bio(), in requests.packages.urllib3.connection.HTTPSConnection.

@fafhrd91
Member

This is not an aiohttp bug. Please file a Python bug report.

@fafhrd91
Member

Python 3.4 uses wrap_socket().

@asvetlov asvetlov reopened this Oct 25, 2017
@asvetlov
Member

We should use url.raw_host.
I'll make a PR soon.
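
The comment does not spell out the mechanics. Assuming `raw_host` means the IDNA-encoded host of the request URL, the idea can be approximated with the standard library alone. The helper name `tls_server_hostname` is hypothetical, and this is not the code from the eventual fix.

```python
from urllib.parse import urlsplit


def tls_server_hostname(url: str) -> str:
    """Extract the host from a URL and return its ASCII (IDNA) form."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError("URL has no host component: %r" % url)
    # Encode each label to punycode so that TLS hostname matching
    # compares like with like against the certificate's DNS names.
    return host.encode("idna").decode("ascii")


url = "https://цфоут.мвд.рф/news/item/8065038/"
print(tls_server_hostname(url))
```

Passing the encoded form as the TLS server hostname keeps the value ASCII from the start, so the library does not depend on how the underlying `ssl` layer transforms Unicode hostnames.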

@asvetlov asvetlov added this to the 3.0 milestone Oct 25, 2017
@hellysmile
Member

Hey, for now we have created a small monkey patch: https://github.com/wikibusiness/idna_ssl

Can you check whether it helps in your case?

@asvetlov
Member

asvetlov commented Nov 7, 2017

CPython fix: python/cpython#3010

@asvetlov
Member

asvetlov commented Feb 9, 2018

Fixed in aiohttp 2.3.10

@lock

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there are important points made in this discussion, please include those excerpts in that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019