sftp upload request hangs in disconnect after callback-abort #3650
The description looks good to me but we can edit it a bit:
Data is uploaded over a connection with high latency (500 ms RTT) and heavy packet loss (30%). We implemented timeout logic using the libcurl progress callback. Once the timeout is hit, we clean up the curl handles by calling curl_multi_remove_handle, and the hang happens inside curl_multi_remove_handle. Here it seems libssh2 can get stuck in the libssh2_session_disconnect call. |
Answer to your question asked on email:
Yes, that sums up the situation properly.
I believe this happens before the actual upload starts. From the debugging so far, my understanding is that, because of the latency and packet loss, it takes a lot of time just to establish an SSH session, transfer the AUTH KEYS and then begin the actual transfer of the data (payload). This initial SSH setup sometimes completes quickly and sometimes takes more than 5 minutes. So we hit the progress timeout and call libssh2_session_disconnect, which doesn't complete because libssh2 is still stuck in the initial setup.
The libssh2 version I am using is 1.7.0 and the latest available is 1.8.0. I am not sure if this explains the problem properly, but I am uploading the log here, which might shed some more light on the issue. |
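For readers who want to picture the timeout logic described above, here is a minimal sketch of a stall check driven from a progress callback. It is an assumption about how such logic could look, not the reporter's code: `struct stall_guard`, `stall_guard_check` and `progress_cb` are hypothetical names, and `long long` stands in for libcurl's `curl_off_t` so the sketch compiles without the curl headers. The only libcurl behavior relied on is that a progress callback returning non-zero aborts the transfer.

```c
#include <time.h>

/* Hypothetical per-transfer bookkeeping; these names are not from the thread. */
struct stall_guard {
    time_t last_progress;    /* time at which the upload counter last advanced */
    long long last_ulnow;    /* bytes uploaded at that time */
    double limit_seconds;    /* how long a stall is tolerated */
};

/* Returns non-zero when no upload progress has been seen for longer than
 * limit_seconds, zero otherwise. */
int stall_guard_check(struct stall_guard *g, long long ulnow, time_t now)
{
    if (ulnow > g->last_ulnow) {
        /* Progress was made: remember it and keep going. */
        g->last_ulnow = ulnow;
        g->last_progress = now;
        return 0;
    }
    return difftime(now, g->last_progress) > g->limit_seconds;
}

/* Same parameter order as a libcurl transfer-info (progress) callback.
 * A non-zero return value asks libcurl to abort the transfer. */
int progress_cb(void *clientp, long long dltotal, long long dlnow,
                long long ultotal, long long ulnow)
{
    (void)dltotal;
    (void)dlnow;
    (void)ultotal;
    return stall_guard_check((struct stall_guard *)clientp, ulnow, time(NULL));
}
```

In the scenario described here the upload counter would stay at zero while the SSH setup drags on, so a check like this would fire before any payload is sent, and the abort would then lead into the cleanup path where the hang is observed.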
This log seems to pretty accurately pinpoint either a bad use of libssh2 or a bug in libssh2. I'm looking at the lines that keep repeating at the end:
They come from this code in libssh2: The comment there says: /* Don't write any new packets if we're still in the middle of a key exchange. */ ... which of course is crazy, since we've given up on the key exchange and just want to shut down and now we're stuck!
Can you try this simple libssh2 patch?
diff --git a/src/session.c b/src/session.c
index 1aee429..aaa7103 100644
--- a/src/session.c
+++ b/src/session.c
@@ -1154,11 +1154,11 @@ session_disconnect(LIBSSH2_SESSION *session, int reason,
 LIBSSH2_API int
 libssh2_session_disconnect_ex(LIBSSH2_SESSION *session, int reason,
                               const char *desc, const char *lang)
 {
     int rc;
-
+    session->state = 0;
     BLOCK_ADJUST(rc, session,
                  session_disconnect(session, reason, desc, lang));
     return rc;
 } |
I tried this patch. The hang frequency has dropped drastically: I now see a hang about once every 3 hours. I still see it sometimes and am not exactly sure what the reason is, although the hang is still in curl_multi_remove_handle. I need some more time to debug it and will share more details soon. |
Cool. A step in the right direction then at least. But what kind of (libssh2) log output do you get now when it hangs? Surely it's not the same repeated logs as before? |
If authentication is started but not completed before the application gives up and instead wants to shut down the session, the '->state' field might still be set and thus effectively dead-lock session_disconnect. This happens because both _libssh2_transport_send() and _libssh2_transport_read() refuse to do anything as long as state is set without the LIBSSH2_STATE_KEX_ACTIVE bit. Reported in curl bug curl/curl#3650
Since this now happens randomly, I am not able to capture the logs at the particular instant when it hangs. I am looking for a way to trigger it and get the libssh2/libcurl logs.
I am sure it is not the same repeated logs. I will update as soon as I have something. |
Hi, I saw a hang that lasted about 10-12 minutes, after which everything went back to normal. Sharing the log:
39281:Mar 13 17:53:00 UniTask: * Callback aborted
It got stuck here for a good ~12 minutes and then recovered by itself. I have not yet caught the exact hang, but this might shed some light on where it could possibly hang. |
If authentication is started but not completed before the application gives up and instead wants to shut down the session, the '->state' field might still be set and thus effectively dead-lock session_disconnect. This happens because both _libssh2_transport_send() and _libssh2_transport_read() refuse to do anything as long as state is set without the LIBSSH2_STATE_KEX_ACTIVE bit. Reported in curl bug curl/curl#3650 Closes #310
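The dead-lock described in this commit message can be pictured with a small toy model. This is a sketch under stated assumptions, not libssh2 code: `toy_session`, `toy_transport_send`, `toy_disconnect` and the flag values are made up for illustration, and the real transport functions do more than this. The model only captures the outcome that matters here: while a key exchange is marked as pending but not active, no new packet goes out, so a disconnect that keeps retrying never finishes unless the state is cleared first.

```c
/* Illustrative flag values for this toy model only. */
#define TOY_EXCHANGING_KEYS 0x1u   /* a key exchange was started */
#define TOY_KEX_ACTIVE      0x8u   /* the key exchange code is the caller */

struct toy_session {
    unsigned state;
};

/* Models the guard: with a key exchange pending but not active, nothing is
 * written. Returns 0 when a packet could be sent, -1 for "try again". */
int toy_transport_send(const struct toy_session *s)
{
    if ((s->state & TOY_EXCHANGING_KEYS) && !(s->state & TOY_KEX_ACTIVE))
        return -1;
    return 0;
}

/* Tries to send a disconnect message at most max_tries times. Returns the
 * number of attempts that were needed, or -1 if it never went out. With
 * reset_state set, the state is cleared first, as in the proposed patch. */
int toy_disconnect(struct toy_session *s, int reset_state, int max_tries)
{
    int i;

    if (reset_state)
        s->state = 0;

    for (i = 1; i <= max_tries; i++) {
        if (toy_transport_send(s) == 0)
            return i;
    }
    return -1;
}
```

In this model a session left with the pending bit set never gets its disconnect out, however many times it retries, while the same session with the state cleared succeeds on the first attempt. A real blocking call has no retry limit, which is where the hang comes from.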
This hang happens only when there is a state change from SSH_SFTP_INIT to SSH_SFTP_SHUTDOWN (see above log) in case of an abort. At SSH_SFTP_INIT all the authentication has been done and we have established a valid session. After SSH_SFTP_SHUTDOWN comes SSH_SESSION_DISCONNECT and then SSH_SESSION_FREE. It is here that it hangs. In the session_free call it tries to close the channels associated with the transfer, where it gets stuck in this while loop:
This loop is in src/channel.c, in the function _libssh2_channel_close, which is called from _libssh2_channel_free, which in turn is called by libssh2_session_free. Here it expects to read from the socket and just waits there (in _libssh2_transport_read). The log in the comment above is from the hang. Below is the log when there is no hang and everything is cleaned up fine:
Mar 13 14:58:05 UniTask: * SFTP 0x1c3a9e0 state change from SSH_SFTP_INIT to SSH_SFTP_SHUTDOWN
Mar 13 14:58:05 UniTask: 0000: 00 00 00 0C 06 60 00 00 00 00 85 3C 6E F5 5D 27 : .....`.....<n.]' |
More problems that are not curl bugs but libssh2 bugs. This issue doesn't belong here.
If it gets stuck there it must mean that |
You can register a shutdown function. This will be called when the script finishes, when the user aborts or when you call exit() or die(). http://php.net/manual/en/function.register-shutdown-function.php |
@Sammydov30 you speak PHP, we don't. |
I agree with you. I raised this issue on the libssh2 forum a while back, but no one has responded to it.
I am not sure how or why it is getting stuck there but from the logs, that seems to be the only place where there is a while loop. |
While that is sad and unfortunate, it is still not a good enough reason to discuss libssh2 bugs in the curl issue tracker. curl seems to do right and the problem looks like it is within libssh2. I must insist that you continue this in the libssh2 project. |
I totally understand. Is there a way we can move this existing issue to the libssh2 project? Since @bagder opened this issue, maybe you can do it. Otherwise it will be hard to properly present all the data points from this issue in a new issue under the libssh2 project. Thanks for helping out. |
Issue transfer requirements are pretty strict: you need to be an admin on both repos, so it would only be possible if @bagder is an admin there as well and agrees. The meat of this seems to be #3650 (comment), so if it doesn't work out and you have to post manually to the libssh2 tracker, I suggest sticking with that and then referencing this issue and the one you previously made through a different channel with libssh2. |
This is a libssh2 bug, closing here. |
I did this
Ketul Barot reported this on the mailing list, Feb 14 2019: "sftp upload request hangs with a high latency server using multi interface"
It seems data is uploaded over a connection with high latency (>500 ms RTT) and packet loss. The application implements custom timeout logic that returns an error from the progress callback when the timeout is hit.
If that timeout happens, it seems libssh2 can get stuck on the libssh2_session_disconnect call.
I expected the following
It should disconnect the transfer just fine.
curl/libcurl version
operating system
Probably independent