SFTP: Incomplete file when size changes during transfer #4344

Closed
Ben-Voris opened this issue Sep 13, 2019 · 5 comments

Ben-Voris commented Sep 13, 2019

I did this

Got a file using sftp but the received file was incomplete.

I expected the following

A complete file.

curl/libcurl version

curl 7.65.3 (x86_64-pc-cygwin) libcurl/7.65.3 OpenSSL/1.1.1c zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.0.4) libssh/0.8.7/openssl/zlib nghttp2/1.37.0
Release-Date: 2019-07-19
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS brotli Debug GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz Metalink NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP TrackMemory UnixSockets

operating system

Client on Cygwin
Server is an HPE NonStop.

Under rather unusual circumstances, curl using SFTP gets less data than the sftp client does when used directly. curl seems to stop the transfer at the reported file size rather than reading until EOF, while sftp gets the entire file. The file below is on an HPE NonStop system, in its "Guardian" file system. This kind of file ("an edit file") has spaces compressed on disk and restored on read, so the reported size is sometimes less than the amount of data returned.
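To illustrate how a reported size can be smaller than the data a reader receives, here is a toy Python model. It is only a sketch: the encoding (a zero byte followed by a run length) and the names `compress_spaces` and `expand_spaces` are made up for this example and are not the real Guardian edit-file format.

```python
# Toy model only: this is NOT the real Guardian edit-file format.
def compress_spaces(text: str) -> bytes:
    """Hypothetical on-disk encoding: 2..255 spaces become 0x00 + run length."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == " ":
            run = 1
            while i + run < len(text) and text[i + run] == " " and run < 255:
                run += 1
            # A single space is stored as-is; longer runs are compressed.
            out += bytes([0, run]) if run >= 2 else b" "
            i += run
        else:
            out += text[i].encode("ascii")
            i += 1
    return bytes(out)


def expand_spaces(stored: bytes) -> bytes:
    """Reverse of compress_spaces: what a reader of the file would receive."""
    out = bytearray()
    i = 0
    while i < len(stored):
        if stored[i] == 0:
            out += b" " * stored[i + 1]
            i += 2
        else:
            out.append(stored[i])
            i += 1
    return bytes(out)


line = "MOVE" + " " * 20 + "A TO B\n"   # 31 characters per line
stored = compress_spaces(line * 100)

reported_size = len(stored)              # what a stat-like call would report
readable_size = len(expand_spaces(stored))  # what a full read returns

print(reported_size, readable_size)      # 1300 3100
```

A client that trusts `reported_size` would stop after 1300 bytes in this model, even though 3100 bytes are available.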

Here's how the file is reported:

: ssh bvoris@ssg-1.houston.hpecorp.net "ls -og /G/ipsv1/t9633abc/dfmtr"
-r-x------ 1 84518 May 11  2010 /G/ipsv1/t9633abc/dfmtr
: curl -vs sftp://bvoris@ssg-1.houston.hpecorp.net/G/ipsv1/t9633abc/dfmtr -O
* STATE: INIT => CONNECT handle 0x600077118; line 1356 (connection #-5000)
* Added connection 0. The cache now contains 1 members
* STATE: CONNECT => WAITRESOLVE handle 0x600077118; line 1397 (connection #0)
*   Trying 16.209.76.76:22...
* TCP_NODELAY set
* STATE: WAITRESOLVE => WAITCONNECT handle 0x600077118; line 1476 (connection #0)
* Connected to ssg-1.houston.hpecorp.net (16.209.76.76) port 22 (#0)
* STATE: WAITCONNECT => SENDPROTOCONNECT handle 0x600077118; line 1532 (connection #0)
* Marked for [keep alive]: SSH default
* User: bvoris
* Known hosts: /home/BVoris/.ssh/known_hosts
* SSH 0x600079200 state change from SSH_STOP to SSH_INIT (line 2115)
* SSH 0x600079200 state change from SSH_INIT to SSH_S_STARTUP (line 582)
* STATE: SENDPROTOCONNECT => PROTOCONNECT handle 0x600077118; line 1547 (connection #0)
* SSH 0x600079200 state change from SSH_S_STARTUP to SSH_HOSTKEY (line 595)
* SSH 0x600079200 state change from SSH_HOSTKEY to SSH_AUTHLIST (line 605)
* SSH 0x600079200 state change from SSH_AUTHLIST to SSH_AUTH_PKEY_INIT (line 628)
* Authentication using SSH public key file
* Completed public key authentication
* SSH 0x600079200 state change from SSH_AUTH_PKEY_INIT to SSH_AUTH_DONE (line 693)
* Authentication complete
* SSH 0x600079200 state change from SSH_AUTH_DONE to SSH_SFTP_INIT (line 807)
* SSH 0x600079200 state change from SSH_SFTP_INIT to SSH_SFTP_REALPATH (line 833)
* SSH CONNECT phase done
* SSH 0x600079200 state change from SSH_SFTP_REALPATH to SSH_STOP (line 850)
* STATE: PROTOCONNECT => DO handle 0x600077118; line 1566 (connection #0)
* DO phase starts
* SSH 0x600079200 state change from SSH_STOP to SSH_SFTP_QUOTE_INIT (line 2332)
* SSH 0x600079200 state change from SSH_SFTP_QUOTE_INIT to SSH_SFTP_GETINFO (line 868)
* SSH 0x600079200 state change from SSH_SFTP_GETINFO to SSH_SFTP_TRANS_INIT (line 1060)
* SSH 0x600079200 state change from SSH_SFTP_TRANS_INIT to SSH_SFTP_DOWNLOAD_INIT (line 1085)
* SSH 0x600079200 state change from SSH_SFTP_DOWNLOAD_INIT to SSH_SFTP_DOWNLOAD_STAT (line 1485)
* SSH 0x600079200 state change from SSH_SFTP_DOWNLOAD_STAT to SSH_STOP (line 1629)
* DO phase is complete
* STATE: DO => DO_DONE handle 0x600077118; line 1621 (connection #0)
* STATE: DO_DONE => PERFORM handle 0x600077118; line 1743 (connection #0)
{ [84518 bytes data]
* readwrite_data: we're done!
* nread <= 0, server closed connection, bailing
* STATE: PERFORM => DONE handle 0x600077118; line 1933 (connection #0)
* multi_done
* SSH 0x600079200 state change from SSH_STOP to SSH_SFTP_CLOSE (line 2390)
* SFTP DONE done
* SSH 0x600079200 state change from SSH_SFTP_CLOSE to SSH_STOP 

sftp shows this version information:

debug1: Local version string SSH-2.0-OpenSSH_8.0
debug1: Remote protocol version 2.0, remote software version 1.37g sshlib: T9999L02_14JUL2017_comForte_SSH2_0104:\\SSG.$SSH00
debug1: no match: 1.37g sshlib: T9999L02_14JUL2017_comForte_SSH2_0104:\\SSG.$SSH00

Then, using sftp get /G/ipsv1/t9633abc/dfmtr dfmtr.sftp gives this:

: ls -do dfmtr*
-rw-r--r--+ 1 BVoris 84518 Sep 12 21:29 dfmtr
-rwx------+ 1 BVoris 91332 Sep 12 21:31 dfmtr.sftp*
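For scale, the shortfall between the two downloads can be worked out from the sizes in the listing above (a quick Python calculation; the variable names are only for illustration):

```python
reported = 84518   # size shown by ls on the server, and the size curl saved
received = 91332   # size of the copy fetched with sftp get

missing = received - reported
print(missing)                       # 6814
print(f"{missing / received:.1%}")   # 7.5%
```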

Could curl be changed to just read the file until eof, like sftp?

Although this server will always give a short output for a given file, couldn't the same thing happen on other systems if someone adds data after the size is obtained and before the transfer starts?

bagder (Member) commented Sep 13, 2019

The SFTP protocol doesn't have a concept of "read until eof". It instead reads a block of data from a specified offset. To read a full file numerous such read instructions are sent to the server.

curl starts out by checking the size of the remote file and then iterates, asking for block after block until it's done. To account for a server that lies about the size, or for a size that changes during the transfer, curl would have to either do size checks during the transfer or just try reading beyond the (supposedly known) file size to see if that works.

It can certainly be done, but it is not something curl does now with any of the SSH backends.
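The difference between stopping at the reported size and reading past it can be modeled as follows. This is a minimal Python sketch, not curl's code: `FakeSftpFile`, `download_by_size`, and `download_until_eof` are hypothetical names, and an empty result stands in for the end-of-file reply a real SFTP server would send.

```python
class FakeSftpFile:
    """Stand-in for a remote file whose reported size is too small."""

    def __init__(self, data: bytes, reported_size: int):
        self._data = data
        self.reported_size = reported_size

    def read(self, offset: int, length: int) -> bytes:
        """Offset-based block read; returns b"" at or past the end of data."""
        return self._data[offset:offset + length]


def download_by_size(f: FakeSftpFile, block: int = 32768) -> bytes:
    """Ask for blocks only up to the size reported before the transfer."""
    out = bytearray()
    while len(out) < f.reported_size:
        want = min(block, f.reported_size - len(out))
        chunk = f.read(len(out), want)
        if not chunk:
            break
        out += chunk
    return bytes(out)


def download_until_eof(f: FakeSftpFile, block: int = 32768) -> bytes:
    """Ignore the reported size and keep asking until a read returns nothing."""
    out = bytearray()
    while True:
        chunk = f.read(len(out), block)
        if not chunk:
            break
        out += chunk
    return bytes(out)


# Sizes taken from this report: 84518 reported, 91332 actually readable.
remote = FakeSftpFile(b"x" * 91332, reported_size=84518)
print(len(download_by_size(remote)))    # 84518
print(len(download_until_eof(remote)))  # 91332
```

The second loop costs one extra request at the end of every transfer, which is the price of not trusting the size.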

Ben-Voris (Author) commented Sep 13, 2019

As shown, when run directly, sftp reads until the server returns no more data. (The eof indicator is that read returns a count of zero.) So, curl has added an unnecessary dependency, at least for sftp.

bagder (Member) commented Sep 13, 2019

I already explained how SFTP works and what curl does there. It is not "an unnecessary dependency"; it is perhaps an incomplete or even wrong way to do it, depending on your view. I wouldn't mind seeing it improved.

Ben-Voris (Author) commented Sep 14, 2019

I don't doubt that curl does what you describe, but that is not "SFTP". At least to me, the OpenSSH sftp client is the reference implementation.

I won't argue the semantics of unnecessary dependency (on the returned file size at a particular time) versus incomplete or wrong.

bagder changed the title from "Incomplete file transfer getting some files using SFTP from HPE NonStop server" to "SFTP: Incomplete file when size changes during transfer" on Sep 14, 2019

Ben-Voris (Author) commented Sep 17, 2019

For reference here's what OpenSSH sftp does for this same file when processing the get:

debug3: Looking up /G/ipsv1/t9633abc/dfmtr
debug3: Sent message fd 7 T:7 I:2
debug3: Received stat reply T:105 I:2
debug3: Sent message fd 7 T:17 I:3
debug3: Received stat reply T:105 I:3
debug3: Sent message SSH2_FXP_OPEN I:4 P:/G/ipsv1/t9633abc/dfmtr
debug3: Request range 0 -> 32767 (0/1)
debug3: Received reply T:103 I:5 R:1
debug3: Received data 0 -> 32767
debug3: Request range 32768 -> 65535 (0/2)
debug3: Request range 65536 -> 98303 (1/2)
debug3: Received reply T:103 I:6 R:2
debug3: Received data 32768 -> 65535
debug3: Finish at 98304 ( 1)
debug3: Received reply T:103 I:7 R:1
debug3: Received data 65536 -> 91331
debug3: Short data block, re-requesting 91332 -> 98303 ( 1)
debug3: Finish at 98304 ( 1)
debug3: Received reply T:101 I:8 R:1
debug3: Sent message SSH2_FXP_CLOSE I:9

The important thing is that it requests more data than the size of the file and only stops reading when it gets no data.
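As a reading aid for the trace above, the sketch below maps the `T:` numbers to packet type names from the SFTP version 3 draft (draft-ietf-secsh-filexfer-02) and replays the read ranges for a file of 91332 bytes. It is a simplified, sequential Python model: `replay_reads` is a hypothetical helper, and the real OpenSSH client keeps several requests in flight at once.

```python
# Packet type numbers as defined in draft-ietf-secsh-filexfer-02.
SFTP_TYPES = {
    3: "SSH_FXP_OPEN",
    4: "SSH_FXP_CLOSE",
    5: "SSH_FXP_READ",
    7: "SSH_FXP_LSTAT",
    17: "SSH_FXP_STAT",
    101: "SSH_FXP_STATUS",
    103: "SSH_FXP_DATA",
    105: "SSH_FXP_ATTRS",
}


def replay_reads(actual_size: int, block: int = 32768):
    """Return (first_byte, last_byte, bytes_received) for each read request.

    Reads continue until a request returns no data, regardless of any
    size reported beforehand.
    """
    events = []
    offset = 0
    length = block
    while True:
        got = max(0, min(length, actual_size - offset))
        events.append((offset, offset + length - 1, got))
        if got == 0:
            break  # in the trace, this is the T:101 (status) reply
        offset += got
        # After a short block, re-request only the rest of that block.
        length = block if got == length else length - got
    return events


for first, last, got in replay_reads(91332):
    print(f"request {first} -> {last}: received {got} bytes")

for t in (7, 17, 105, 103, 101):
    print(t, SFTP_TYPES[t])
```

In this model the last request covers 91332 -> 98303 and returns nothing, matching the "Short data block, re-requesting" line in the trace.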

bagder closed this in 07e9878 on Oct 13, 2019
lock bot locked as resolved and limited conversation to collaborators on Jan 11, 2020