Libssh: Use sftp_aio instead of sftp_async for sftp_recv #17440

Closed
galorithm wants to merge 5 commits into curl:master from galorithm:libssh_sftp_aio_api

@galorithm
Contributor

Changes:

  1. Use sftp_aio instead of sftp_async API as the latter has been deprecated (since libssh 0.11.0)
  2. Remove the write limit capping for the calling code as libssh manages the limit internally (since 0.11.0)

This change won't give the full performance benefit expected from proper use of the sftp aio API (see the 'Using the sftp aio API to speed up a transfer' section here: https://api.libssh.org/stable/libssh_tutor_sftp_aio.html), because the current use of the sftp aio API here is equivalent to using the sync API (a single request and a single response at a time).

Still, it should provide some performance improvement because:

  1. The sftp aio API avoids an extra buffer copy that the sftp_async API performs.
  2. The limits used internally by libssh could be higher than the protocol-specified minimum write limit of 32KB (provided the server supports reporting this information). Hence removing the write limit applied by curl's calling code, would lead to larger chunks being transferred in one go, potentially leading to faster transfers.

This pull request is a start and will be followed up by a proper use of the sftp aio API for downloading/uploading to increase curl's performance when using libssh as a backend.

@bagder
Member

bagder commented May 24, 2025

Hence removing the write limit applied by curl's calling code, would lead to larger chunks being transferred in one go, potentially leading to faster transfers.

What limit is this referring to?

@bagder
Member

bagder commented May 24, 2025

The Source / complexity (pull_request) CI job turns red because the function myssh_statemach_act now scores >100 in "complexity". It was already on the border of this limit before the change, so basically anything added makes it hit the limit. Maybe you can split off some part of it into a sub function to make it less complex?

@galorithm
Contributor Author

Hence removing the write limit applied by curl's calling code, would lead to larger chunks being transferred in one go, potentially leading to faster transfers.

What limit is this referring to?

@bagder The removed limit is referring to the one removed by this commit: (94fe2f8) [Removed only for versions since 0.11.0]

The limit used internally by libssh is the one that the server may optionally report to clients if it supports the limits@openssh.com extension (https://github.com/openssh/openssh-portable/blob/73ef0563a59f90324f8426c017f38e20341b555f/PROTOCOL#L597); if it does not, libssh falls back to the protocol-specified minimum limit of 32KB.

@bagder
Member

bagder commented May 24, 2025

@bagder The removed limit is referring to the one removed by this commit

That limit was always kind of silly on libssh's behalf. libssh2 for example instead sends multiple requests to fill more buffer if there is a larger one to fill. That kind of "pipelining" is what can make SFTP fast and I presume that is what this "aio" system does as well?

@galorithm
Contributor Author

@bagder The removed limit is referring to the one removed by this commit

That limit was always kind of silly on libssh's behalf. libssh2 for example instead sends multiple requests to fill more buffer if there is a larger one to fill. That kind of "pipelining" is what can make SFTP fast and I presume that is what this "aio" system does as well?

The aio system doesn't do this directly, but it provides an API that lets its users implement the pipelining themselves in the manner they intend, as described in the later sections here: https://api.libssh.org/stable/libssh_tutor_sftp_aio.html

A single aio read/write still fundamentally corresponds to a single SFTP read/write request/response (unlike libssh2 which, if I understand correctly, does read-aheads and breaks larger write chunks into multiple requests in its read/write API).

The libssh team intends to follow up on this PR in the future with another one that implements the pipelining using the aio API for sftp_send() and sftp_recv().
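
To make the two effects discussed above concrete (larger chunks per request, and several requests in flight), here is a toy model in C. It is only a sketch under stated assumptions: it does not use libssh, the function names are hypothetical, and it assumes an idealised link where every round trip completes exactly as many requests as are allowed in flight.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of SFTP requests needed to move `total`
   bytes when each request carries at most `chunk` bytes. */
static size_t demo_request_count(size_t total, size_t chunk)
{
  assert(chunk > 0);
  return (total + chunk - 1) / chunk;
}

/* Hypothetical toy latency model: with up to `window` requests in
   flight, one round trip completes `window` requests. A window of 1
   models "single request, single response at a time". */
static size_t demo_round_trips(size_t total, size_t chunk, size_t window)
{
  size_t requests = demo_request_count(total, chunk);
  assert(window > 0);
  return (requests + window - 1) / window;
}
```

In this model, a 1 MiB transfer in 32 KiB chunks with a window of 1 costs 32 round trips. Raising the chunk size to a made-up server limit of 256 KiB, or keeping 8 requests in flight, each brings that down to 4.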

@testclutch

Analysis of PR #17440 at 94fe2f8e:

Test ../../tests/http/test_07_upload.py::TestUpload::test_07_37_upload_307[h3] failed, which has NOT been flaky recently, so there could be a real issue in this PR.

Test 637 failed, which has NOT been flaky recently, so there could be a real issue in this PR.

Generated by Testclutch

@galorithm
Contributor Author

galorithm commented May 24, 2025

@Jakuje Since libssh supports the aio API and the internal limits from 0.11.0, so should the check consider == also ? (unless I am missing something)

#if LIBSSH_VERSION_INT >= SSH_VERSION_INT(0, 11, 0) 

instead of (existing checks for aio)

#if LIBSSH_VERSION_INT > SSH_VERSION_INT(0, 11, 0)
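
The two checks differ only for the exact 0.11.0 release, which the following sketch illustrates. It assumes that libssh's SSH_VERSION_INT packs the version as (major << 16 | minor << 8 | micro); the DEMO_VERSION_INT macro and the helper names are hypothetical stand-ins so the example compiles without libssh.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for libssh's SSH_VERSION_INT, assumed to pack
   major/minor/micro into one comparable integer. */
#define DEMO_VERSION_INT(a, b, c) (((a) << 16) | ((b) << 8) | (c))

/* Inclusive check: true for 0.11.0 and everything newer. */
static bool demo_has_aio_inclusive(int version)
{
  return version >= DEMO_VERSION_INT(0, 11, 0);
}

/* Exclusive check (the existing style): true only from 0.11.1 onwards. */
static bool demo_has_aio_exclusive(int version)
{
  return version > DEMO_VERSION_INT(0, 11, 0);
}
```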

@Jakuje
Contributor

Jakuje commented May 24, 2025

@Jakuje Since libssh supports the aio API and the internal limits from 0.11.0, so should the check consider == also ? (unless I am missing something)

Yes, this would be technically correct, but 0.11.0 had a few bugs that were fixed very soon after the release, in 0.11.1, so I hope nobody ends up using it. So in practice it won't matter:

https://www.libssh.org/files/0.11/

@galorithm
Contributor Author

@bagder I wanted to confirm whether the check and free performed here (the if(sshc->sftp_aio) { block) for freeing an sftp_aio was done for safety.

libssh frees the sftp_aio and assigns NULL to the variable storing it in all cases (even errors) except when it returns SSH_AGAIN, and that case is handled above that line.

So I think it isn't needed there, but maybe it was kept for safety (or to satisfy the code analyzers)

@bagder
Member

bagder commented May 25, 2025

So I think it isn't needed there, but maybe it was kept for safety (or to satisfy the code analyzers)

I'm not the one most familiar with the libssh.c code. I presume it was added there for a reason but I don't know it. It was brought in the 8b25949 commit that introduced sftp_aio use.

If you don't think it is needed there, take it away!

@Jakuje
Contributor

Jakuje commented May 26, 2025

So I think it isn't needed there, but maybe it was kept for safety (or to satisfy the code analyzers)

I'm not the one most familiar with the libssh.c code. I presume it was added there for a reason but I don't know it. It was brought in the 8b25949 commit that introduced sftp_aio use.

If you don't think it is needed there, take it away!

I think it is still needed in case the sftp_aio_wait_write() would return SSH_AGAIN and then there would be some failure, we are kept with the sftp_aio that was not freed in the ssh_conn structure, if I follow the code right.

To make the complexity linter happy, we could move this cleanup to separate function (sshc_sftp_cleanup() for example).

@galorithm
Contributor Author

So I think it isn't needed there, but maybe it was kept for safety (or to satisfy the code analyzers)

I'm not the one most familiar with the libssh.c code. I presume it was added there for a reason but I don't know it. It was brought in the 8b25949 commit that introduced sftp_aio use.
If you don't think it is needed there, take it away!

I think it is still needed in case the sftp_aio_wait_write() would return SSH_AGAIN and then there would be some failure, we are kept with the sftp_aio that was not freed in the ssh_conn structure, if I follow the code right.

To make the complexity linter happy, we could move this cleanup to separate function (sshc_sftp_cleanup() for example).

      nwrite = sftp_aio_wait_write(&sshc->sftp_aio);
      myssh_block2waitfor(conn, sshc, (nwrite == SSH_AGAIN) ? TRUE : FALSE);
      if(nwrite == SSH_AGAIN) {
        *err = CURLE_AGAIN;
        return 0;
      }
      else if(nwrite < 0) {
        *err = CURLE_SEND_ERROR;
        return -1;
      }
      if(sshc->sftp_aio) {
        sftp_aio_free(sshc->sftp_aio);
        sshc->sftp_aio = NULL;
      }
      sshc->sftp_send_state = 0;
      return nwrite;

@Jakuje I agree that the sftp_aio_free() done during the cleanup stage (seemingly SSH_SFTP_SHUTDOWN) needs to be kept.

However, I was referring to the sftp_aio_free() in the snippet above.

In that case, three outcomes are possible:

  1. sftp_aio_wait_write() is successful -> sftp_aio would be freed and assigned NULL
  2. sftp_aio_wait_write() returns SSH_ERROR -> sftp_aio would be freed and assigned NULL
  3. sftp_aio_wait_write() returns SSH_AGAIN -> sftp_aio would be left intact.

The free done in the code mentioned above is done in case of success, which (I think) isn't needed.

(Or am I missing something here too)
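
The ownership rule in the three cases above can be modelled without libssh. The sketch below uses hypothetical mock_* names and constants, and only encodes the behaviour as described in this thread: the handle is consumed on success and on error, and survives SSH_AGAIN. The separate cleanup helper shows where a NULL-checked free still makes sense, for a handle left over after an AGAIN.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real types and constants live in libssh. */
#define MOCK_ERROR (-1)
#define MOCK_AGAIN (-2)

struct mock_aio { int pending; };
typedef struct mock_aio *mock_aio_t;

static mock_aio_t mock_aio_new(void)
{
  return calloc(1, sizeof(struct mock_aio));
}

/* Models the rule described above: the handle is freed and set to NULL
   on success (outcome >= 0) and on error, and left intact on AGAIN. */
static int mock_aio_wait_write(mock_aio_t *aio, int outcome)
{
  assert(aio && *aio);
  if(outcome == MOCK_AGAIN)
    return MOCK_AGAIN;
  free(*aio);
  *aio = NULL;
  return outcome;
}

/* Shutdown-style cleanup: only needed for a handle that survived an
   AGAIN. Safe to call when the handle is already NULL. */
static void mock_aio_cleanup(mock_aio_t *aio)
{
  if(*aio) {
    free(*aio);
    *aio = NULL;
  }
}
```

Under this model, a free placed directly after a successful wait never has anything to release, while the cleanup at shutdown can still find a live handle.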

@Jakuje
Contributor

Jakuje commented May 26, 2025

@Jakuje I agree that the sftp_aio_free() done during the cleanup stage (seemingly SSH_SFTP_SHUTDOWN) needs to be kept.

Sorry, I confused myself about which free we were talking about, as I read it as a continuation of the complexity failure (which needs to be addressed anyway).

The free done in the code mentioned above is done in case of success, which (I think) isn't needed.

Yes, you are right. There should not be a reason to free the aio here, as you show. Let's remove it.

@galorithm galorithm force-pushed the libssh_sftp_aio_api branch 2 times, most recently from 62bcb94 to 7fa145d Compare May 27, 2025 16:47
@galorithm
Contributor Author

@bagder while moving the block of SSH_SFTP_READDIR_LINK to a separate function, I wasn't able to understand how the failure handling done here in the original code helps.

if(curlx_dyn_addf(&sshc->readdir_buf, " -> %s",
                  sshc->readdir_filename)) {
  sshc->actualcode = CURLE_OUT_OF_MEMORY;
  break;
}

Unless I am missing something, this would set the actualcode and break out of the switch case, but the state machine loop will continue on (as neither rc has changed, nor state has moved to STOP) with the same SSH_SFTP_READDIR_LINK state next time.

  } while(!rc && (sshc->state != SSH_STOP));

(If this interpretation is correct, probably a myssh_state(data, sshc, SSH_STOP); or MOVE_TO_SFTP_CLOSE_STATE() should be added in that if block)
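
The concern can be reproduced with a toy state machine. This is not curl's code: the demo_* names, the placeholder error value and the iteration cap are all hypothetical, and the cap exists only so the unfixed variant terminates in the example.

```c
#include <assert.h>

enum demo_state { DEMO_READDIR_LINK, DEMO_CLOSE, DEMO_STOP };

#define DEMO_OUT_OF_MEMORY 27  /* placeholder error value */

struct demo_conn {
  enum demo_state state;
  int actualcode;  /* nonzero once an error has been recorded */
};

/* Runs a do/while loop shaped like the one quoted above and returns the
   number of iterations. With fix_state == 0 the failure branch only
   records the error; with fix_state != 0 it also leaves the state. */
static int demo_run(struct demo_conn *c, int fix_state, int max_iters)
{
  int rc = 0;
  int iters = 0;
  assert(max_iters > 0);
  do {
    iters++;
    switch(c->state) {
    case DEMO_READDIR_LINK:
      /* Simulated failure of the buffer append. */
      c->actualcode = DEMO_OUT_OF_MEMORY;
      if(fix_state)
        c->state = DEMO_CLOSE;
      break;
    case DEMO_CLOSE:
      c->state = DEMO_STOP;
      break;
    case DEMO_STOP:
      break;
    }
  } while(!rc && (c->state != DEMO_STOP) && iters < max_iters);
  return iters;
}
```

Without the state change, the model re-enters DEMO_READDIR_LINK until the cap is reached; with it, the machine reaches DEMO_STOP in two iterations.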

@galorithm galorithm force-pushed the libssh_sftp_aio_api branch 2 times, most recently from 74c410b to a0fd5ff Compare June 7, 2025 07:50
@galorithm
Contributor Author

galorithm commented Jun 7, 2025

@bagder while moving the block of SSH_SFTP_READDIR_LINK to a separate function, I wasn't able to understand how the failure handling done here in the original code helps.

if(curlx_dyn_addf(&sshc->readdir_buf, " -> %s",
                  sshc->readdir_filename)) {
  sshc->actualcode = CURLE_OUT_OF_MEMORY;
  break;
}

Unless I am missing something, this would set the actualcode and break out of the switch case, but the state machine loop will continue on (as neither rc has changed, nor state has moved to STOP) with the same SSH_SFTP_READDIR_LINK state next time.

  } while(!rc && (sshc->state != SSH_STOP));

(If this interpretation is correct, probably a myssh_state(data, sshc, SSH_STOP); or MOVE_TO_SFTP_CLOSE_STATE() should be added in that if block)

@bagder Pinging for your thoughts on the quoted comment.

If:

  • the mentioned comment is not an issue and
  • the failing check is not related (I don't think it is)

then this merge request should be good to merge.

Contributor

@Jakuje Jakuje left a comment

Thank you! Code-wise it looks good. I just have some inline comments regarding the structure, which could be improved (but is not really wrong).

Regarding the curlx_dyn_addf(), I think this should change the state and that it is an unintentional omission, which should be fixed. It will likely never happen, but we should not depend on that.

Contributor

For separate commit:

Suggested change
- failf(data, "Could not open remote file for reading: %s",
+ failf(data, "Could not open remote directory for reading: %s",

Member

This seems like a sound suggestion? It is not a file, it is a directory.

Contributor Author

@Jakuje thank you for pointing it out, I missed it during copying.

@bagder I've opened a new pull request for the readdir related changes and other fixes (#17856)

Contributor

I find it odd that this function always returns SSH_NO_ERROR. Even though we break out of the loop with the state change, I would either:

  • remove the return value when it does not have any effect (though it might be considered bad for a function not to return anything)
  • change it to something semantically different, returning an error code when something fails and having the caller handle it.

Contributor

Same for myssh_state_sftp_readdir_link(): it also always returns the same value. But in this case, the success could be used to implement the fall-through (even though the optimization would be really marginal).

Contributor Author

@galorithm galorithm Jun 26, 2025

Hi Jakub, sorry for the delayed response.

One:

I find it odd that this function always returns SSH_NO_ERROR

In one case it would return SSH_ERROR: when sftp_readdir() fails, MOVE_TO_SFTP_CLOSE_STATE(); runs and the macro expansion assigns SSH_ERROR to rc.

#define MOVE_TO_SFTP_CLOSE_STATE() do {                         \
    myssh_state(data, sshc, SSH_SFTP_CLOSE);                    \
    sshc->actualcode =                                          \
      sftp_error_to_CURLE(sftp_get_error(sshc->sftp_session));  \
    rc = SSH_ERROR;                                             \
  } while(0)

Two:

I agree that it feels a bit odd that the function returns SSH_NO_ERROR even when something like aprintf() fails.

However, I kept it that way because I noticed that the code of myssh_statemach_act() keeps track of errors in three ways:

  1. rc for libssh related errors. (Most places use this variable to store the return value of libssh ssh_* API functions.)
  2. result for curl related errors. (It's a CURLcode variable used to store return values of Curl_*() functions.)
  3. sshc->actualcode seems to be the most consistently set on errors, whether they are libssh errors or curl errors (except for a few cases which I think are oversights).

Besides, all the myssh_state_*() functions representing a state (e.g. myssh_state_authlist()) seem to follow a pattern where they:

  • set the sshc->actualcode on error
  • return rc representing libssh errors

So I decided to follow that pattern for these too.

Three:

Regarding the function behaviour, three ways come to mind:

  1. Keep it same and document the above information.
  2. Return void (but as you mention this may not be considered good, and it would deprive the caller of the libssh error code rc, unless we add an additional int *rc_ptr parameter).
  3. Change it to return a CURLcode which is always set (while also setting sshc->actualcode) on error, with an int *rc_ptr parameter so that the caller also has the libssh error code, because the state machine loop logic depends on it (while(!rc && (sshc->state != SSH_STOP));).

I prefer 3, but it means deviating from the current pattern of the myssh_state_*() functions.

@Jakuje What is your opinion on this ?
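
For illustration, option 3 could look roughly like the sketch below. Everything here is hypothetical: demo_code stands in for CURLcode, the DEMO_SSH_* values stand in for libssh return codes, and the failure is simulated by a flag rather than by a real allocation or libssh call.

```c
#include <assert.h>

/* Hypothetical stand-ins for CURLcode and libssh return codes. */
typedef enum { DEMO_E_OK = 0, DEMO_E_OUT_OF_MEMORY } demo_code;
#define DEMO_SSH_OK      0
#define DEMO_SSH_ERROR (-1)

struct demo_sshc {
  demo_code actualcode;
};

/* Option 3 shape: return the curl-style code, record it in actualcode,
   and report the libssh-style code through rc_ptr so the caller's loop
   condition can still depend on it. */
static demo_code demo_state_step(struct demo_sshc *sshc,
                                 int simulate_failure, int *rc_ptr)
{
  demo_code result = DEMO_E_OK;
  assert(sshc && rc_ptr);
  *rc_ptr = DEMO_SSH_OK;
  if(simulate_failure) {
    result = DEMO_E_OUT_OF_MEMORY;
    sshc->actualcode = result;
    *rc_ptr = DEMO_SSH_ERROR;
  }
  return result;
}
```

A caller would check the returned code for curl-level handling and keep feeding the value written through rc_ptr into the state machine loop.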

Contributor

Oh, I missed the point that the macros could also return something.

If @bagder is ok with this as it is, I am ok to keep it as it is. Option 3 would be more work and could be done in a follow-up PR if needed.

@bagder bagder self-requested a review June 9, 2025 15:54
@bagder
Member

bagder commented Jul 7, 2025

This branch has conflicts that must be resolved

I suspect the conflict is a little complicated to untangle since we did other cleanups in libssh.c in the meantime. The previous widespread macro use is no more for example.

Member

@bagder bagder left a comment

It needs a rebase + force-push to get a proper review.

galorithm added 4 commits July 8, 2025 00:23
Renamed sftp_aio -> sftp_send_aio to highlight its association
with sftp_send() (This is consistent with how sftp_send_state,
associated with sftp_send() has been named)

Signed-off-by: Eshan Kelkar <eshankelkar@galorithm.com>
libssh's sftp_aio_wait_write() would free the sftp_aio
and assign NULL to the variable storing it in case of
success and error (not in case of SSH_AGAIN).

Hence freeing sftp_send_aio and assigning it NULL is
not needed in case of successful sftp_aio_wait_write()
due to which this commit removes that code.

Signed-off-by: Eshan Kelkar <eshankelkar@galorithm.com>
This commit replaces the usage of the old deprecated
sftp_async API with the new sftp_aio API for remote
file reading.

Signed-off-by: Eshan Kelkar <eshankelkar@galorithm.com>
Since version 0.11.0, libssh has started applying
read/write length capping appropriately for both
its sync and async API.

The limit applied by libssh would be as per what the
server imposes if the server supports sharing this info,
otherwise the limits would be according to the protocol
specified minimum limit that the servers should support.

Hence, the calling code does not need to bother applying
these limits as per the sftp protocol (libssh's newer
versions since 0.11.0 would handle that)

Signed-off-by: Eshan Kelkar <eshankelkar@galorithm.com>
@galorithm
Contributor Author

It needs a rebase + force-push to get a proper review.

Hi @bagder, I've rebased and fixed the conflicts.

I also added a new commit (LINK) for what is, from my reading, a missing cleanup of the sftp aio in sshc_cleanup().

Using convenience macro SFTP_AIO_FREE() for checking
for non NULL before freeing and assigning NULL after
freeing using sftp_aio_free().

Signed-off-by: Eshan Kelkar <eshankelkar@galorithm.com>
@galorithm galorithm force-pushed the libssh_sftp_aio_api branch from c86ae3a to bcdce27 Compare July 23, 2025 15:00
Contributor

@Jakuje Jakuje left a comment

lgtm from my side!

@bagder bagder closed this in f7af8ad Jul 28, 2025
@bagder
Member

bagder commented Jul 28, 2025

Thanks!
