Libssh: Use sftp_aio instead of sftp_async for sftp_recv#17440
Libssh: Use sftp_aio instead of sftp_async for sftp_recv#17440galorithm wants to merge 5 commits intocurl:masterfrom
Conversation
What limit is this referring to? |
|
The Source / complexity (pull_request) CI job turns red because the function |
@bagder The removed limit is referring to the one removed by this commit: (94fe2f8) [Removed only for versions since 0.11.0] The limit internally used by libssh refers to the limit that the server may optionally support informing the clients if it supports the |
That limit was always kind of silly on libssh's behalf. libssh2 for example instead sends multiple requests to fill more buffer if there is a larger one to fill. That kind of "pipelining" is what can make SFTP fast and I presume that is what this "aio" system does as well? |
The aio system doesn't directly do this, but provides an API that allows its users to do this (using which they can implement the pipelining themselves in the manner they intend to as described in the later sections here: https://api.libssh.org/stable/libssh_tutor_sftp_aio.html) A single aio read/write still fundamentally corresponds to a single SFTP read/write request/response (unlike libssh2, which if I understand correctly, does read aheads and breaks larger write chunks to multiple requests in its read/write API). The libssh team intends to follow this PR in future with another one that implements the pipelining using the aio API for sftp_send() and sftp_recv(). |
|
Analysis of PR #17440 at 94fe2f8e: Test ../../tests/http/test_07_upload.py::TestUpload::test_07_37_upload_307[h3] failed, which has NOT been flaky recently, so there could be a real issue in this PR. Test 637 failed, which has NOT been flaky recently, so there could be a real issue in this PR. Generated by Testclutch |
|
@Jakuje Since libssh supports the aio API and the internal limits from 0.11.0, so should the check consider instead of (existing checks for aio) |
Yes, this would be technically correct, but the 0.11.0 had few bugs that were fixed very soon after the release with 0.11.1 so I hope nobody ends up using that. So in practice it won't matter: |
|
@bagder I wanted to confirm once whether the check and free performed here ( Line 2694 in ed07f59 Since the sftp_aio would be freed and the variable storing the sftp_aio would be assigned NULL by the libssh API in all cases (even errors) except when libssh returns SSH_AGAIN, which has been handled above that line. So I think it isn't needed there, but maybe it was kept for safety (or to satisfy the code analyzers) |
I'm not the one most familiar with the libssh.c code. I presume it was added there for a reason but I don't know it. It was brought in the 8b25949 commit that introduced sftp_aio use. If you don't think it is needed there, take it away! |
I think it is still needed in case the To make the complexity linter happy, we could move this cleanup to separate function ( |
@Jakuje I agree that the However, I was referring to the above done In the above case, there are three things possible:
The free done in the code mentioned above is done in case of success, which (I think) isn't needed. (Or am I missing something here too) |
Sorry, i confused myself what free we were talking about as I read it as a continuation of the complexity failure (which needs to be addressed anyway).
Yes, you are right. There should not be a reason to free the aio here as you show. Lets remove it. |
62bcb94 to
7fa145d
Compare
|
@bagder while moving the block of Unless I am missing something, this would set the (If this interpretation is correct, probably a |
74c410b to
a0fd5ff
Compare
@bagder Pinging for your thoughts on the quoted comment. If:
then this merge request should be good to merge. |
Jakuje
left a comment
There was a problem hiding this comment.
Thank you! Code-wise looks good. I have just some inline comments regarding the structure, that could be improved (but are not really wrong).
Regarding the curlx_dyn_addf(), I think this should change the state and it is an unintentional omission, which should be added. It will likely never happen, but we should not depend on that though.
lib/vssh/libssh.c
Outdated
There was a problem hiding this comment.
For separate commit:
| failf(data, "Could not open remote file for reading: %s", | |
| failf(data, "Could not open remote directory for reading: %s", |
There was a problem hiding this comment.
This seems like a sound suggestion? It is not a file, it is a directory.
lib/vssh/libssh.c
Outdated
There was a problem hiding this comment.
I find it odd that this function always returns SSH_NO_ERROR. Even though we break out of the loop with the state change, I would either:
- remove the return value when it does not have any effect (this might be considered a bad for a function not to return anything though)
- Change it to something semantically different, returning some error code when something fails and handle it by the caller.
There was a problem hiding this comment.
Same for the myssh_state_sftp_readdir_link() -- it also always return the same return value. But in this case, the success could be used to implement the fall-through (even though the optimization would be really marginal).
There was a problem hiding this comment.
Hi Jakub, sorry for the delayed response.
One:
I find it odd that this function always returns SSH_NO_ERROR
In one case it would return SSH_ERROR (
Line 1141 in a0fd5ff
sftp_readdir() fails, because the macro expansion would lead to that assignment for rc.
#define MOVE_TO_SFTP_CLOSE_STATE() do { \
myssh_state(data, sshc, SSH_SFTP_CLOSE); \
sshc->actualcode = \
sftp_error_to_CURLE(sftp_get_error(sshc->sftp_session)); \
rc = SSH_ERROR; \
} while(0)
Two:
I agree that it feels a bit odd that the function returns SSH_NO_ERROR even when something like aprintf() fails.
However, I kept it that way as I noticed that, the code of myssh_statemach_act() keeps track of errors through three ways:
rcfor libssh related errors. (Most places use this as the variable to store return value of libsshssh_*API functions)resultfor curl related errors. (its a CURLcode type variable used to store return values ofCurl_*()functions)sshc->actualcodeseems to be the most consistently set on errors whether they be libssh errors or curl errors (except few cases which I think are overlooks)
Besides, all the myssh_state_*() functions (e.g
Line 668 in a0fd5ff
- set the sshc->actualcode on error
- return rc representing libssh errors
So I decided to follow that pattern for these too
Three:
Regarding the function behaviour, three ways come to mind:
- Keep it same and document the above information.
- return value void (but as you mention this may not be considered good and this would devoid the caller of information about the libssh error code rc, unless we add an additional
int *rc_ptrparameter) - change it to return a CURLCode which is always set (while also setting sshc->actualcode) on error with a parameter
int *rc_ptrso that the caller also has information about the libssh error code because the state machine loop logic depends on it (while(!rc && (sshc->state != SSH_STOP));).
I prefer 3, but it means deviating from the current pattern of the myssh_state_*() functions,
@Jakuje What is your opinion on this ?
There was a problem hiding this comment.
Oh, I missed the point the macros could also return something.
If @bagder is ok with this as it is, I am ok to keep it as it is. The option 3 would be more work and could be done in follow-up PR if needed.
I suspect the conflict is a little complicated to untangle since we did other cleanups in libssh.c in the meantime. The previous widespread macro use is no more for example. |
bagder
left a comment
There was a problem hiding this comment.
It needs a rebase + force-push to get a proper review.
Renamed sftp_aio -> sftp_send_aio to highlight its association with sftp_send() (This is consistent with how sftp_send_state, associated with sftp_send() has been named) Signed-off-by: Eshan Kelkar <eshankelkar@galorithm.com>
libssh's sftp_aio_wait_write() would free the sftp_aio and assign NULL to the variable storing it in case of success and error (not in case of SSH_AGAIN). Hence freeing sftp_send_aio and assigning it NULL is not needed in case of successful sftp_aio_wait_write() due to which this commit removes that code. Signed-off-by: Eshan Kelkar <eshankelkar@galorithm.com>
This commit replaces the usage of the old deprecated sftp_async API with the new sftp_aio API for remote file reading. Signed-off-by: Eshan Kelkar <eshankelkar@galorithm.com>
Since version 0.11.0, libssh has started applying read/write length capping appropriately for both its sync and async API. The limit applied by libssh would be as per what the server imposes if the server supports sharing this info, otherwise the limits would be according to the protocol specified minimum limit that the servers should support. Hence, the calling code does not need to bother applying these limits as per the sftp protocol (libssh's newer versions since 0.11.0 would handle that) Signed-off-by: Eshan Kelkar <eshankelkar@galorithm.com>
a0fd5ff to
1a655ae
Compare
Using convenience macro SFTP_AIO_FREE() for checking for non NULL before freeing and assigning NULL after freeing using sftp_aio_free(). Signed-off-by: Eshan Kelkar <eshankelkar@galorithm.com>
c86ae3a to
bcdce27
Compare
|
Thanks! |
Changes:
Though this change won't give the full performance benefits that are expected by a proper use of sftp aio API (See 'Using the sftp aio API to speed up a transfer' section here: https://api.libssh.org/stable/libssh_tutor_sftp_aio.html) as the current use of sftp aio API here is as good as using the sync API (single request, single response at a time).
But still, this would provide some performance improvements as:
This pull request is a start and would be followed up by a proper use of the sftp aio API for downloading/uploading to increase curl 's performance when using libssh as a backend.