Description
(please help me choose a better title)
It's not easy to explain why this bug happens, but I will try.
Here is the current code (trimmed):
```python
class _DNSCacheTable:
    def add(self, host, addrs):
        self._addrs[host] = addrs
        self._addrs_rr[host] = cycle(addrs)

    def next_addrs(self, host):
        # Return an iterator that will yield at most as many addrs
        # as there are for the specific host, starting from the last
        # not-yet-iterated addr.
        return islice(self._addrs_rr[host], len(self._addrs[host]))
```

and another place (very simplified):
```python
@asyncio.coroutine
def _create_direct_connection(....):
    hosts = dns_cache.next_addrs()
    for hinfo in hosts:
        host = hinfo['host']
        transp, proto = yield from self._wrap_create_connection(  # YIELD FROM IS CRUCIAL
            ...)
```

Now suppose that `_create_direct_connection` runs in parallel. Since `dns_cache.next_addrs()` returns an `islice` that points to a `cycle`, which holds internal state, that state IS REUSED across parallel instances of `_create_direct_connection`. This leads to wrong results being returned by `islice`.
And now, let's try to reconstruct the situation in a test program:
```python
#!/usr/bin/env python3
from itertools import cycle, islice
import random
import asyncio

addrs = ['a', 'b', 'c', 'd']
rr_addrs = cycle(addrs)

async def cccc(ident):
    await asyncio.sleep(random.random())
    hosts = islice(rr_addrs, len(addrs))
    for i in hosts:
        print(ident, i)
        await asyncio.sleep(random.random())

async def amain():
    await asyncio.wait([cccc(i) for i in range(5)])

loop = asyncio.get_event_loop()
loop.run_until_complete(amain())
```

and run it:
```
./qwe.py | sort
0 a
0 a
0 b
0 c
1 a
1 b
1 d
1 d
2 a
2 b
2 d
2 d
3 b
3 b
3 c
3 d
4 a
4 c
4 c
4 c
```
As you can see, this is not what the author expected! Not a single instance received all of a, b, c and d!
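The interleaving can also be reproduced deterministically, without asyncio or randomness: two `islice` views over the same `cycle` steal items from each other as soon as their consumption overlaps (a minimal sketch; the names `it1`, `seen1`, etc. are made up for illustration):

```python
from itertools import cycle, islice

addrs = ['a', 'b', 'c', 'd']
rr = cycle(addrs)

# Two "parallel" consumers, each holding its own islice over the SAME cycle.
it1 = islice(rr, len(addrs))
it2 = islice(rr, len(addrs))

seen1, seen2 = [], []
for _ in range(len(addrs)):
    seen1.append(next(it1))  # advances the shared cycle by one
    seen2.append(next(it2))  # advances it again, stealing the next item

print(seen1)  # ['a', 'c', 'a', 'c']
print(seen2)  # ['b', 'd', 'b', 'd']
```

Each `islice` dutifully yields four items, but since both draw from the same underlying `cycle`, each consumer only ever sees half of the addresses.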
How did I come across this bug? I was just making parallel requests to Dropbox. The resolver gives me IPv6 addresses as well as IPv4 ones, but my system has no IPv6 connectivity, so connect() reports ENETUNREACH for the IPv6 addresses. In my case, islice sometimes gave only IPv6 addresses, and the whole request failed.
How to cure?
The simplest solution is to return `list(islice(....))`.
Besides my problem, there are two more issues:
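Wrapping the slice in `list()` forces the shared `cycle` to be advanced all at once, at call time, so a context switch between coroutines can no longer interleave two consumers (a sketch with a standalone `next_addrs` function, not the real aiohttp method):

```python
from itertools import cycle, islice

addrs = ['a', 'b', 'c', 'd']
rr = cycle(addrs)

def next_addrs():
    # Materializing eagerly consumes exactly len(addrs) items in one go;
    # nothing another coroutine does later can change the returned list.
    return list(islice(rr, len(addrs)))

h1 = next_addrs()
h2 = next_addrs()
# Even if iteration over h1 and h2 is interleaved afterwards,
# each consumer sees every address exactly once.
print(h1)  # ['a', 'b', 'c', 'd']
print(h2)  # ['a', 'b', 'c', 'd']
```

Note that in this sketch a full-length slice wraps the cycle exactly once, so every call returns the same order; a real round-robin would also want to rotate the starting point between calls.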
- uneven distribution
- race conditions with `DNSCache.remove()` and `DNSCache.clean()`, since they operate on state that is shared between green threads
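The remove()/clean() race can be seen even without concurrency: a caller holding a lazy `islice` keeps the old `cycle` alive after the entry is removed, so it silently iterates stale data while a fresh lookup fails (a minimal, hypothetical model of the cache class; `remove` is assumed to pop both dicts):

```python
from itertools import cycle, islice

class DNSCacheTable:
    """Simplified, hypothetical model of the cache in this report."""
    def __init__(self):
        self._addrs = {}
        self._addrs_rr = {}

    def add(self, host, addrs):
        self._addrs[host] = addrs
        self._addrs_rr[host] = cycle(addrs)

    def remove(self, host):
        self._addrs.pop(host, None)
        self._addrs_rr.pop(host, None)

    def next_addrs(self, host):
        return islice(self._addrs_rr[host], len(self._addrs[host]))

cache = DNSCacheTable()
cache.add('example.com', ['1.1.1.1', '2.2.2.2'])

hosts = cache.next_addrs('example.com')  # lazy view, nothing consumed yet
cache.remove('example.com')              # another coroutine cleans the cache

stale = list(hosts)  # the old cycle is still alive inside the islice
print(stale)         # ['1.1.1.1', '2.2.2.2'] -- stale data, no error
```

Meanwhile, a fresh `cache.next_addrs('example.com')` after the removal raises `KeyError`, so the two callers observe inconsistent states of the same cache.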