Add the ability to forcefully close a connection. #29
# Conflicts:
#	amqpstorm/channel.py
#	amqpstorm/io.py
#	amqpstorm/rpc.py
Current coverage is 99.74% (diff: 59.37%)

@@            master    #29    diff @@
======================================
  Files           64     64
  Lines         5668   5557    -111
  Methods          0      0
  Messages         0      0
  Branches         0      0
======================================
- Hits          5663   5543    -120
- Misses           5     14      +9
  Partials         0      0
Hey, thanks for the pull request. I actually saw your changes a few weeks ago in your project, and I was quite amused by some of your commit messages. =] I have actually been considering implementing a flag on connection.close, something along the lines of connection.close(forcefully=True). I'll try to review your changes in the upcoming weeks, after my vacation.
I saw you had watched my project, which is actually why I figured I'd make the pull request. And yes, I tend to be an angry, sweary coder, particularly on personal projects I'm working on only for my own interests. Anyway, I don't pretend that the above pull request is necessarily /good/ code; it's just functional enough to work in an environment with an extremely unreliable network. FWIW, this is the only AMQP library that I've been able to convince to be genuinely durable.
AMQPStorm would wedge on close sometimes, but the library is actually structured well enough that I could hack the killer calls in this pull request into it. Anyway, practically speaking, I think having a "force" method on the connection makes sense. No hurry, have a fun vacation!
I am glad you found it durable. My whole design is about keeping things simple and straightforward, and making sure that we fail fast and reliably.

Reviewing this, I actually noticed that I wasn't always closing the connection cleanly. It is generally considered good behavior to wait for a confirmation from RabbitMQ that the connection was closed successfully, so that the connection and its channels get cleaned up immediately on the server side. Previously it would only wait for the response when closing a channel. This fix will actually make it worse for you, though. Anyway, I added a flag called "wait_for_rpc". I am not going to pretend that it will fix your problem, since it does lack some components, and I am still looking into adopting your kill concept, but if you have time, feel free to test this as an alternative. You can enable this new behavior like this.
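The code sample that originally followed this comment was lost from the thread. Below is a self-contained toy sketch of what the two close behaviors might look like; MockConnection, the timings, and the return values are all invented for illustration, and only the wait_for_rpc name comes from the comment above:

```python
import threading

class MockConnection:
    """Toy stand-in for a connection, illustrating the difference between
    a fire-and-forget close and waiting for the server's Connection.CloseOk."""
    def __init__(self):
        self._close_ok = threading.Event()
        # Pretend the broker acknowledges the close after a short delay.
        threading.Timer(0.05, self._close_ok.set).start()

    def close(self, wait_for_rpc=False, timeout=1.0):
        if wait_for_rpc:
            # Block until the (simulated) CloseOk frame arrives, or time out.
            acknowledged = self._close_ok.wait(timeout)
        else:
            # Return immediately, whether or not the server confirmed.
            acknowledged = self._close_ok.is_set()
        return acknowledged

print(MockConnection().close(wait_for_rpc=True))   # True: waited for CloseOk
print(MockConnection().close(wait_for_rpc=False))  # False: returned right away
```

The real flag would of course sit on AMQPStorm's own Connection.close; the point here is only the blocking-versus-immediate distinction.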
I tend to be more of a "hacker" than an actual proper "programmer", so I wouldn't pretend my modifications are elegant or cleanly implemented in any way. In any event, I am apparently working across an unreliable link (the RabbitMQ server is in Europe, the host is on the west coast of the US), and it will apparently drop or garble (not sure which) packets periodically. The problem I encountered is that if there is any requirement for a disconnect handshake/RPC, it will eventually fail at some point, resulting in the connection being wedged. I think the "proper" solution here is to have the disconnection also respect timeouts: if a close request doesn't receive a response within some function of the socket timeout (3x the socket timeout or similar), the connection is considered "closed", even without confirmation.
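The timeout scheme proposed above (treat the connection as closed if no confirmation arrives within ~3x the socket timeout) can be sketched as follows; wait_for_close_ok and its parameters are hypothetical names for illustration, not part of AMQPStorm:

```python
import time

def wait_for_close_ok(poll_close_ok, socket_timeout=1.0, multiplier=3):
    """Wait for a close confirmation, but give up after a multiple of the
    socket timeout and consider the connection closed anyway.

    poll_close_ok: callable returning True once CloseOk has arrived.
    """
    deadline = time.monotonic() + socket_timeout * multiplier
    while time.monotonic() < deadline:
        if poll_close_ok():
            return True   # clean close, confirmed by the server
        time.sleep(0.01)
    return False          # no confirmation; treat as closed regardless

# Simulate the close request being "eaten" by the network: we still return.
print(wait_for_close_ok(lambda: False, socket_timeout=0.05))  # False
print(wait_for_close_ok(lambda: True, socket_timeout=0.05))   # True
```

Either way the caller gets control back within a bounded time, which is the property the unreliable link requires.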
That might actually be the issue, as with the current implementation the RPC call for closing channels does not respect the TCP timeout. Instead it uses the individual Channel RPC timeout, which is set to 60s by default. I think that sending an RPC for each channel when closing the connection is a bad idea. Instead, I think we should make it up to the client to properly close the channel, e.g.
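The inline example that followed the "e.g." was lost from the thread. A toy illustration of the proposed contract, where the client closes its own channels and connection.close() no longer issues a per-channel RPC; every name below is invented for illustration:

```python
class ToyConnection:
    """Stand-in for a connection where channel cleanup is the client's
    job, rather than something connection.close() does via RPC."""
    def __init__(self):
        self._channels = []

    def channel(self):
        self._channels.append('open')
        return len(self._channels) - 1   # index acts as a channel handle

    def close_channel(self, index):
        self._channels[index] = 'closed'  # client-initiated Channel.Close

    def close(self):
        # No per-channel close RPC here; just report what the client
        # failed to tidy up before closing the connection.
        return self._channels.count('open')

conn = ToyConnection()
ch = conn.channel()
conn.close_channel(ch)   # client closes its own channel first...
print(conn.close())      # ...then the connection: 0 channels leaked
```

With this split, the connection-level close never has to wait on a channel RPC that the network may have eaten.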
I removed the channel close RPC on connection close. With this, the connection should always be properly closed within the socket timeout (+ ~0.25s).
I'm pretty sure the problem I was encountering was channels wedging on close when the close request had been lost without a corresponding closure of the socket. The worker thread would then get stuck indefinitely. It's also possible it wasn't stuck indefinitely, but just longer than I was waiting while debugging, though I did reduce timeouts to (IIRC) 30 seconds without any change.
Yep, that should hopefully not be an issue any longer. You may however want to skip channel.close() and go straight to connection.close() when trying to kill the connection fast. |
(branch force-pushed from 4b959a1 to f45fc43)
It should probably be noted that I found a pile of issues in this PR; don't merge it, whatever you do. Other comments: I've run 100+ MB messages through AMQP, but its durability is heavily, HEAVILY dependent on the connection load percentage. If you have one HUGE message every 5 minutes, it'll be fine. If you have continuous huge messages, the connection (I think) starves for control-channel bandwidth, and everything is on fire.
It might be a good idea to go through and audit every while loop in the codebase. I'm still consistently getting things wedged in various places, for reasons I haven't tracked down yet (I can't replicate the issue on command).
Interesting. I'll take a look at that build message. Even though the code never got merged, I learnt a lot from your pull request. I have a lot more ideas based on this that I want to implement, but I am in the US for a few months and my laptop broke, so I don't have a workstation to work from at the moment.
Might be good to add some sort of timeout or something similar on the while loops. |
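A minimal sketch of what a bounded while loop could look like; drain_until is a hypothetical helper for illustration, not an AMQPStorm API:

```python
import time

def drain_until(predicate, timeout=30.0, interval=0.01):
    """Bounded replacement for a bare `while not predicate(): ...` loop.

    Returns True if the predicate became true, False if the timeout
    expired first, so the caller can never wedge indefinitely.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True

print(drain_until(lambda: True))                 # True: condition already met
print(drain_until(lambda: False, timeout=0.05))  # False: gave up after 50 ms
```

Auditing each unbounded loop into this shape turns a silent wedge into an observable timeout the caller can act on.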
Yeah, it looks like my major source of wedging was indeed that. No worries about turnaround time or "WHY U NO FIX" or whatever. I'm mostly trying to get this noted down just in case you'd be interested, or for when I look at it in a few months and wonder why the heck I did what I did. I'm "fixing" (read: hacking around horribly) in my vendored version of the library, so this is more of an "FYI" than anything.
Yep, and I appreciate the feedback. I am here to write a kickass library and to try to level up my coding skills. Always helpful to have someone else help review code and report flaws. |
Pushed a patch using my phone, lol. That might fix that issue.
Wow, you're way more hardcore than I am. Put me on a box without Sublime Text and 2 monitors and I'm all "Ugh... I can't work like this."
Hah, I normally have three monitors and complain when I have to use my laptop. ;) I guess that's what a month without either does to you.
Basically, I'm dealing with a context where I have a high-volume AMQP connection across a high-latency, unreliable link (intercontinental).
If there is a way for a connection to possibly wedge, I'll encounter it.
Anyway, the issue I ran into here is that it's possible for the "close" operation on a connection to wedge indefinitely, preventing the connection from actually closing (I /think/ when the shutdown RPC request gets garbled/eaten somewhere).
The question of /how/ this happens aside, I therefore need to be able to kill an open connection in a dirty manner. This adds the ability to kill() a connection, which will force the worker thread to exit immediately, without bothering to do any proper cleanup. Because this is an operation that's generally done from an additional watcher process, outside of the normal program flow, I used multiprocessing primitives to manage the _die flag (at one point, the watcher was a separate /process/, not a thread). Anyway, I'm not sure if this is in line with the ideas of the library, but I haven't been able to wedge the AMQP connection with this patch set, so that's something.
Only tested on Py3.5x64, Ubuntu 14.04.
Things I haven't done: Additional unit tests.