New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DNSCache BUG #2620

Closed
socketpair opened this Issue Dec 24, 2017 · 2 comments

Comments

Projects
None yet
3 participants
@socketpair
Contributor

socketpair commented Dec 24, 2017

(please help me to choose good subject)

It's not easy to explain why this bug happens. I will try:

Here is current code (cutted):

class _DNSCacheTable:
    def add(self, host, addrs):
        self._addrs[host] = addrs
        self._addrs_rr[host] = cycle(addrs)
    def next_addrs(self, host):
        # Return an iterator that will get at maximum as many addrs
        # there are for the specific host starting from the last
        # not itereated addr.
        return islice(self._addrs_rr[host], len(self._addrs[host]))

another place (very simplified):

    @asyncio.coroutine
    def _create_direct_connection(....):
        hosts = dns_cache.next_addrs()
        for hinfo in hosts:
            host = hinfo['host']
            transp, proto = yield from self._wrap_create_connection(   # YIELD FROM IS CRUSHIAL

Suppose now, that _create_direct_connection is run in parallel. Since dns_cache.next_addrs() returns islice which points to cycle which points to internal state, it's state IS REUSED in parallel instances of _create_direct_connection. This lead to wrong results returned by islice.

And now, let's try to reconstruct situation in test program:

from itertools import cycle, islice
import random
import asyncio

addrs = ['a','b','c','d']
rr_addrs = cycle(addrs)

async def cccc(ident):
    await asyncio.sleep(random.random())
    hosts = islice(rr_addrs, len(addrs))
    for i in hosts:
        print(ident, i)
        await asyncio.sleep(random.random())

async def amain():
    await asyncio.wait([cccc(i) for i in range(5)])

loop = asyncio.get_event_loop()
loop.run_until_complete(amain())

and run it:

./qwe.py | sort
0 a
0 a
0 b
0 c
1 a
1 b
1 d
1 d
2 a
2 b
2 d
2 d
3 b
3 b
3 c
3 d
4 a
4 c
4 c
4 c

As you can see, it's not what author expect! No one instance received a,b,c and d!

How did I come to this bug? I was just doing parallel requests to Dropbox. The resolver gives me IPv6 address as well as IPv4. But my system does not have IPv6 and so connect() reports EUNREACH for IPv6 addresses. In my case sometimes islice gave only IPv6 addresses! and whole request failed.

How to cure?
simpliest solution is to return list(islice(....))

Except for my problem, there is two more:

  1. uneven distribution
  2. race-conditions with DNSCace::remove() and DNSCache.clean() since the operate on "shared between green threads" state.

socketpair added a commit that referenced this issue Dec 24, 2017

socketpair added a commit that referenced this issue Dec 24, 2017

@pfreixes

This comment has been minimized.

Contributor

pfreixes commented Dec 24, 2017

Well spotted, this is a corner case tha was not take into consideration. However, the outcome of your test matches with the desired implementation that considered a full round robin between all of the ongoing tasks to create connections, indeed the addrs used at the end of your test are in terms of how many, the same per addr.

The fix should still meet IMHO the requirement which try to use as a first option for the incomming task the next addr avoiding use the same addr and using the list only when some host is unreachable.

socketpair added a commit that referenced this issue Dec 25, 2017

socketpair added a commit that referenced this issue Dec 25, 2017

socketpair added a commit that referenced this issue Dec 26, 2017

socketpair added a commit that referenced this issue Dec 26, 2017

socketpair added a commit that referenced this issue Dec 26, 2017

socketpair added a commit that referenced this issue Dec 26, 2017

@socketpair socketpair referenced this issue Dec 26, 2017

Merged

Fix DNSCache race-condition #2621

5 of 5 tasks complete

socketpair added a commit that referenced this issue Dec 27, 2017

socketpair added a commit that referenced this issue Dec 27, 2017

asvetlov added a commit that referenced this issue Dec 27, 2017

@asvetlov

This comment has been minimized.

Member

asvetlov commented Dec 27, 2017

Fixed by #2621

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment