Conversation
I used single underscores rather than double underscores because double underscores do confusing name mangling that makes it more difficult to use pdb, which I depend upon a lot.
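For illustration, this is the mangling being referred to; the class and attribute names here are hypothetical, not from magneticod:

```python
class Node:
    def __init__(self):
        self._single = 1   # visible as node._single, easy to inspect in pdb
        self.__double = 2  # compiler rewrites this to node._Node__double

node = Node()
assert node._single == 1
assert node._Node__double == 2  # the mangled name a pdb user must remember
# Accessing node.__double outside the class body raises AttributeError.
```

Inside pdb, `p node._single` works as written, while the double-underscore attribute must be spelled with its mangled `_Node__double` name.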
@richardkiss , see https://github.com/boramalper/magnetico/projects/1 . There is a pending task of rewriting it on asyncio.
Hey @richardkiss, thanks a ton! Really. I'll review it when I'm done with my finals (one week left...), but in the meantime, can you check how it performs with uvloop? I expect some increase in hash throughput or a decrease in CPU usage, and I wonder whether it will actually happen.
I'm skeptical because although I expect uvloop to reduce CPU usage, magneticod doesn't seem to be CPU-bound. I'll give it a try though.
Here is the experiment I ran to measure CPU usage and torrent count:
(or you can rename this path if you want to preserve the data in it)
This runs magneticod for half an hour from a clean slate, then counts the torrents gathered.
@richardkiss , see #60 for real speed tracking.
OK, I have some measurements. v0.3.0:
asyncio:
uvloop:
We could also have dht keep a running total of bytes sent and received quite easily, since there is only one function for each.
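A minimal sketch of that idea: because all datagrams pass through a single send point and a single receive point, two counters suffice. The class and method names here are hypothetical, not magneticod's actual API:

```python
class TrafficCounter:
    """Running totals of DHT traffic; hook on_send/on_receive into the
    single send and receive functions mentioned above."""

    def __init__(self) -> None:
        self.bytes_sent = 0
        self.bytes_received = 0

    def on_send(self, data: bytes) -> None:
        self.bytes_sent += len(data)

    def on_receive(self, data: bytes) -> None:
        self.bytes_received += len(data)


counter = TrafficCounter()
counter.on_send(b"d1:ad2:id20:")    # e.g. an outgoing bencoded DHT query
counter.on_receive(b"d1:rd2:id20:")  # e.g. an incoming response
```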
For benchmarking, it's important to choose a unique random port for each trial; otherwise the state of the global DHT will affect the results.
I haven't been specifying a port on the command line, so the default of 0.0.0.0:0 is being used. I suspect that uses a random port.
You'd hope, but uvloop coming in last makes me wonder. It definitely increased my throughput with mana...
I agree that it doesn't make sense. I think there is a lot of variance, which means one trial doesn't really give that much information: it's just a random sample. One thing I think this data does show is that we're within an order of magnitude in performance. And I think the code is a lot less fragile, maybe even easier to follow.
magneticod/magneticod/__main__.py
Outdated
    import uvloop
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    logging.info("using uvloop")
except ModuleNotFoundError:
2017-05-17 09:15:21,906 INFO magneticod v0.3.0 started
Traceback (most recent call last):
  File "/home/artur/.local/lib/python3.5/site-packages/magneticod/__main__.py", line 42, in main
    import uvloop
ImportError: No module named 'uvloop'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/artur/.local/bin/magneticod", line 9, in <module>
    load_entry_point('magneticod==0.3.0', 'console_scripts', 'magneticod')()
  File "/home/artur/.local/lib/python3.5/site-packages/magneticod/__main__.py", line 45, in main
    except ModuleNotFoundError:
NameError: name 'ModuleNotFoundError' is not defined
Regardless, this suggests that setup.py should print a message during installation saying whether uvloop will be used or not.
Ah, I tested this on python3.6. I will change ModuleNotFoundError to ImportError.
Fixed.
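For reference, a sketch of the fixed guard: catching ImportError works on every Python 3 version, since ModuleNotFoundError (3.6+) is a subclass of it:

```python
import asyncio
import logging

try:
    import uvloop
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    logging.info("using uvloop")
except ImportError:
    # ImportError exists on all Python 3 versions; ModuleNotFoundError
    # (its subclass) was only added in 3.6, which caused the NameError above.
    logging.info("uvloop not found, using the stock asyncio event loop")
```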
I noticed this snippet:
The following exceptions appear after some time:
magneticod does not crash; the errors only appear in the log.
That's nice!
Does our SQLite setup require loading the full database into memory? I flushed the database before startup. The size of the database corresponds to the increase in memory consumption.
I found a leak of DisposablePeer objects that I will fix.
Regarding:
Since the only TCP open is in
At the end of the day (it's 23:24 my time), just before shutting the app down, I'm sharing the current results. Calculation:
Logs: https://transfer.sh/qjEeb/log-1495030970.txt.tar.gz @richardkiss , I have no idea how to debug uvloop.
I think it's a bug in uvloop, and I believe it's safe to ignore.
My 9b1bbfc check-in should fix at least some of the leaks.
Some leaks definitely still exist.
With this latest change, I don't know of any further outstanding leaks (although I would not be surprised if there were some).
@richardkiss , do you need me to run this on my server and collect data on how the application behaves over time? Or are you still working on it yourself, or is the current behavior intentional?
I think testing this for memory leaks and crashes would be useful.
There are definitely still leaks... I ran out of memory on DigitalOcean after a few days. I need to try https://docs.python.org/3/library/tracemalloc.html on it.
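A minimal sketch of how tracemalloc can be pointed at a suspected leak; the workload line here is a stand-in, not magneticod code:

```python
import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames per allocation for useful traces

# ... run the suspected-leaky workload; a stand-in allocation for this sketch ...
retained = [bytearray(1000) for _ in range(1000)]

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:5]:
    print(stat)  # biggest allocation sites first, grouped by file and line
```

Taking two snapshots some minutes apart and diffing them with `snapshot2.compare_to(snapshot1, "lineno")` is the usual way to see which lines keep growing.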
Commit e4c33ce fixes the issue. I had initially thought that keeping the temporary files in memory would increase performance, but the memory consumption increased drastically as well. @richardkiss Do you think you might have misidentified this as the cause of the leaks? If not, can you please be more specific about what you mean when you say "leaks"? I would love to help. =)
A couple of days ago I looked to see where all the memory was going, and it appeared to be the set of known info hashes. I wrote a patch to query the SQLite db instead, and that drastically reduced the memory usage. I can check it in tomorrow.
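The idea can be sketched as follows: instead of an ever-growing in-memory set, ask SQLite whether the hash is already stored. The table and column names below are assumptions for illustration, not necessarily magneticod's actual schema:

```python
import sqlite3

def is_info_hash_known(db: sqlite3.Connection, info_hash: bytes) -> bool:
    # An indexed point lookup replaces the in-memory set membership test;
    # memory use no longer grows with the number of known hashes.
    row = db.execute(
        "SELECT 1 FROM torrents WHERE info_hash = ? LIMIT 1", (info_hash,)
    ).fetchone()
    return row is not None

# Demonstration with an in-memory database:
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE torrents (info_hash BLOB PRIMARY KEY)")
db.execute("INSERT INTO torrents VALUES (?)", (b"\x01" * 20,))
assert is_info_hash_known(db, b"\x01" * 20)
assert not is_info_hash_known(db, b"\x02" * 20)
```

The trade-off is an extra query per lookup, but with a primary-key index the cost stays small while the resident set stays constant.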
I left some comments on unclear points.
@@ -6,6 +6,7 @@
]
PENDING_INFO_HASHES = 10  # threshold for pending info hashes before being committed to database

TICK_INTERVAL = 1  # in seconds (soft constraint)
Why did you remove the tick interval?
There's no real reason to have it or to remove it, either way is fine. I probably did it because I don't usually factor constants to one place when I write in Python.
magneticod/magneticod/dht.py
Outdated
def when_peer_found(info_hash: InfoHash, peer_addr: PeerAddress) -> None:
    raise NotImplementedError()

async def launch(self, loop):
    self._loop = loop
Couldn't we just use asyncio.get_event_loop() here, or can it return different event loops each time it is called? I looked at the documentation and it talks about the context, but it was unclear to me what is meant by the word "context".
Yes, you can do that. I think in general, it's better practice to explicitly pass along the current loop, but we could certainly get away with not doing that here.
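A small sketch contrasting the two styles discussed here. Inside a running coroutine, asyncio.get_event_loop() returns the currently running loop, so both approaches assign the same object; the "context" is, roughly, the current thread and its current event loop. The Node class is a stand-in, not magneticod's actual class:

```python
import asyncio

class Node:
    async def launch(self, loop):
        # Explicit: the caller passes the loop, making the dependency
        # visible at the call site (the style used in this PR).
        self._loop = loop

    async def launch_implicit(self):
        # Implicit: fetch the loop from the current context instead.
        self._loop = asyncio.get_event_loop()

async def main():
    node = Node()
    await node.launch(asyncio.get_event_loop())
    explicit_loop = node._loop
    await node.launch_implicit()
    assert node._loop is explicit_loop  # same loop either way

asyncio.run(main())
```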
magneticod/magneticod/dht.py
Outdated
if addr[1] == 0:
    continue

def connection_lost(self, exc):
    self._is_paused = True
What can connection_lost mean in a connectionless protocol? Can this ever be called in an asyncio.DatagramProtocol, and if yes, under what conditions? Again, asking because I couldn't make sense of it.
I don't remember whether this is called or not. I may have just added it when I added connection_made. You could try logging or putting a pdb.set_trace() there to see if it's ever called. I believe it's called when close is called on the transport. https://github.com/python/cpython/blob/master/Lib/asyncio/selector_events.py#L621
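A quick probe, along the lines suggested above (using a future rather than pdb), confirming that connection_lost does fire for a DatagramProtocol when the transport is closed:

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    lost = loop.create_future()

    class Probe(asyncio.DatagramProtocol):
        def connection_lost(self, exc):
            # For UDP there is no peer connection to lose; this runs when
            # the transport itself is closed (exc is None) or hits a fatal
            # error (exc is the exception instance).
            lost.set_result(exc)

    transport, _ = await loop.create_datagram_endpoint(
        Probe, local_addr=("127.0.0.1", 0))
    transport.close()
    print("connection_lost exc:", await lost)  # prints: connection_lost exc: None

asyncio.run(main())
```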
magneticod/magneticod/dht.py
Outdated
async def on_tick(self) -> None:
    while True:
        await asyncio.sleep(1)
constants.TICK_INTERVAL could be used here, if not removed.
Agreed
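With the constant restored, the loop might look like this sketch (written as a module-level coroutine for brevity; in magneticod it is a method, and the constant lives in constants.py):

```python
import asyncio

TICK_INTERVAL = 1  # in seconds (soft constraint); constants.TICK_INTERVAL in the project

async def on_tick() -> None:
    while True:
        await asyncio.sleep(TICK_INTERVAL)
        # ... periodic maintenance work would go here ...
```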
Hello @richardkiss. First and foremost, thank you! A lot! I was planning the same but never had enough courage and energy at the same time to dare. I left some comments above about certain places that I couldn't understand. I went through the whole PR, and to me it seems perfectly fine, although I'll make some changes myself to clarify naming and add typing annotations (plus some other small changes if necessary).
How much memory was it consuming, and for how many info-hashes? EDIT: Holy! 316.0 MiB!
Hey @richardkiss, once again!
I would be glad if you can clarify one more point. I'm nearly done with cleanup, annotation, and some renaming so it will be merged quite soon. =)
parent_task.child_count -= 1
try:
    metadata = child_task.result()
    if metadata and not parent_task.done():
Why do we check for parent_task being done here when a child got a result? I mean, if parent_task is already done and successful, all of its children will have been terminated and this function cannot be called anyway.
Suppose two child tasks are fetching the same metadata for a parent and they finish at the same time (or very close together). The first one wakes up and sets the parent_task result, which causes the done callback to be scheduled. The scheduler might still choose the second child task to run next (why not? It's been waiting longer) before the parent has a chance to cancel it.
I just looked through asyncio and confirmed that indeed, this really could happen.
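A tiny illustration of why the done() guard matters, using a bare future as a stand-in for parent_task: without the check, the second finisher would hit asyncio.InvalidStateError when setting an already-resolved result.

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    parent = loop.create_future()  # stand-in for parent_task

    # First child finishes and sets the result.
    parent.set_result(b"metadata from child 1")

    # Second child finishes "at the same time"; without the done() check,
    # calling set_result again would raise asyncio.InvalidStateError.
    if not parent.done():
        parent.set_result(b"metadata from child 2")

    print(parent.result())  # prints: b'metadata from child 1'

asyncio.run(main())
```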
I've rewritten and simplified much of magneticod to use asyncio.
This is a big change, so I can understand if you're reluctant to merge it. If you have requests about specific stylistic or other changes, I'm happy to work with you.
I've tested it in a Digital Ocean droplet, and it got 700 hashes in 30 minutes, about the same as v0.3.0. It seems to be a bit lighter on CPU.
The code is smaller by over 100 lines, and all the globals in main are now gone.