Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Early EOF errors when running git fetch over ssh #1322

Closed
congyiwu opened this issue Jan 11, 2019 · 82 comments
Closed

Early EOF errors when running git fetch over ssh #1322

congyiwu opened this issue Jan 11, 2019 · 82 comments

Comments

@congyiwu
Copy link

If I configure the Git for Windows client to use the SSH version that ships with Windows, I occasionally see the following error during git fetch:

fatal: the remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

Here's a log that includes ssh -vvv output:
fetch.log

And here's the test script that I used, which replays the same fetch 100 times in a row:
fetchloop.sh.txt

I believe that this is a bug in the SSH client that ships with Windows, because:

  • It does not reproduce with the version of OpenSSH built in to Git for Windows.
  • It reproduces with multiple Git server implementations that don't share Git or SSH code.

The bug also seems data-shape dependent in that it reproduces more or less frequently depending on the exact state of the Git repos on the client and server. I have snapshots of the client and server Git repo state. I can't share it publicly, can can share it internally with Microsoft employees.

My client OS is Windows 10 1809 Build 17763.253
((Get-Item (Get-Command sshd).Source).VersionInfo.FileVersion) says 7.7.2.2

The Git servers that I have tested against are:

  • Ubuntu 18.04.1 LTS in a Hyper-V VM running in the same machine as my git client. I followed the instructions here: https://git-scm.com/book/en/v2/Git-on-the-Server-Setting-Up-the-Server
  • Azure DevOps, which uses custom implementations of both Git and SSH which do not share code with either Git for Windows (we wrote it from scratch) or OpenSSH (ours is based on a proprietary SSH library (SecureBlackBox)). I am the current maintainer of the code that handles Git SSH requests for Azure DevOps.

From messing around with the Azure DevOps SSH code, I found the following:

  • I'm fairly certain that git.exe, which shells out to ssh.exe, is hitting EOF before the end of the expected fetch response.
  • I'm fairly certain our server is sending the correct SSH_MSG_CHANNEL_DATA packets for the entire fetch response
  • I can "fix" the issue by adding a sleep immediately before sending SSH_MSG_CHANNEL_CLOSE from our server.

So this looks like a race condition where channel data from the server is lost when the client receives SSH_MSG_CHANNEL_CLOSE.

BTW, I tried searching for related issues and found #752 and a few others, but they all seemed to be issues with running the Windows version of sshd on the server, as opposed to the client.

@dscho this should probably block git-for-windows/git#1981

@congyiwu congyiwu changed the title Early EOF errors when running git fetching over ssh Early EOF errors when running git fetch over ssh Feb 5, 2019
@congyiwu
Copy link
Author

congyiwu commented Feb 5, 2019

bump?

@NoMoreFood
Copy link

What is the latency between the client and server?

I'm willing to take a look at this, but I'd want some sort of repo snapshot (accessible to me) I know can reproduce the issue before diving into setting up a test.

petemounce pushed a commit to spatialos/gdk-for-unity that referenced this issue Apr 26, 2019
@petemounce
Copy link

petemounce commented Apr 26, 2019

I think we're seeing this as well.

Windows 2016 Server, desktop edition.
git for windows 2.21.0 via https://chocolatey.org/packages/git/2.21.0
openssh 7.9.0.0 via https://chocolatey.org/packages/openssh/7.9.0.1

ssh key is loaded into Windows ssh-agent service.

Verbose log output (using public repositories):

+ GIT_TRACE=1
+ GIT_TRACE_PACKET=1
+ GIT_TRACE_SETUP=1
+ GIT_CURL_VERBOSE=1
+ GIT_SSH_COMMAND='c:/Program\ Files/OpenSSH-Win64/ssh.exe -vvv'
+ git clone --verbose git@github.com:spatialos/gdk-for-unity-shared-ci.git /c/somewhere/2/gdk-for-unity-premerge/.shared-ci
13:39:23.682306 exec-cmd.c:236          trace: resolved executable dir: C:/Program Files/Git/mingw64/bin
13:39:23.683273 git.c:419               trace: built-in: git clone --verbose git@github.com:spatialos/gdk-for-unity-shared-ci.git C:/somewhere/2/gdk-for-unity-premerge/.shared-ci
13:39:23.696192 run-command.c:643       trace: run_command: unset GIT_DIR; 'c:/Program\ Files/OpenSSH-Win64/ssh.exe -vvv' git@github.com 'git-upload-pack '\''spatialos/gdk-for-unity-shared-ci.git'\'''
Cloning into 'C:/somewhere/2/gdk-for-unity-premerge/.shared-ci'...
bash.exe: warning: could not find /tmp, please create!
OpenSSH_for_Windows_7.9p1, LibreSSL 2.6.5
debug1: Reading configuration data C:\\Users\\someone-harmless/.ssh/config
debug1: C:\\Users\\someone-harmless/.ssh/config line 6: Applying options for *
debug1: C:\\Users\\someone-harmless/.ssh/config line 13: Applying options for github.com
debug3: Failed to open file:C:/ProgramData/ssh/ssh_config error:2
debug2: resolving "github.com" port 22
debug2: ssh_connect_direct
debug1: Connecting to github.com [140.82.118.3] port 22.
debug1: Connection established.
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_rsa error:2
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_rsa.pub error:2
debug1: identity file C:\\Users\\someone-harmless/.ssh/id_rsa type -1
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_rsa-cert error:2
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_rsa-cert.pub error:2
debug1: identity file C:\\Users\\someone-harmless/.ssh/id_rsa-cert type -1
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_dsa error:2
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_dsa.pub error:2
debug1: identity file C:\\Users\\someone-harmless/.ssh/id_dsa type -1
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_dsa-cert error:2
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_dsa-cert.pub error:2
debug1: identity file C:\\Users\\someone-harmless/.ssh/id_dsa-cert type -1
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_ecdsa error:2
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_ecdsa.pub error:2
debug1: identity file C:\\Users\\someone-harmless/.ssh/id_ecdsa type -1
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_ecdsa-cert error:2
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_ecdsa-cert.pub error:2
debug1: identity file C:\\Users\\someone-harmless/.ssh/id_ecdsa-cert type -1
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_ed25519 error:2
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_ed25519.pub error:2
debug1: identity file C:\\Users\\someone-harmless/.ssh/id_ed25519 type -1
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_ed25519-cert error:2
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_ed25519-cert.pub error:2
debug1: identity file C:\\Users\\someone-harmless/.ssh/id_ed25519-cert type -1
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_xmss error:2
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_xmss.pub error:2
debug1: identity file C:\\Users\\someone-harmless/.ssh/id_xmss type -1
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_xmss-cert error:2
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/id_xmss-cert.pub error:2
debug1: identity file C:\\Users\\someone-harmless/.ssh/id_xmss-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_for_Windows_7.9
debug1: Remote protocol version 2.0, remote software version babeld-d125c03e
debug1: no match: babeld-d125c03e
debug2: fd 3 setting O_NONBLOCK
debug1: Authenticating to github.com:22 as 'git'
debug3: hostkeys_foreach: reading file "C:\\Users\\someone-harmless/.ssh/known_hosts"
debug3: record_hostkey: found key type RSA in file C:\\Users\\someone-harmless/.ssh/known_hosts:3
debug3: load_hostkeys: loaded 1 keys from github.com
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/known_hosts2 error:2
debug3: Failed to open file:C:/ProgramData/ssh/ssh_known_hosts error:2
debug3: Failed to open file:C:/ProgramData/ssh/ssh_known_hosts2 error:2
debug3: order_hostkeyalgs: prefer hostkeyalgs: rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,rsa-sha2-512,rsa-sha2-256,ssh-rsa
debug3: send packet: type 20
debug1: SSH2_MSG_KEXINIT sent
debug3: receive packet: type 20
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1,ext-info-c
debug2: host key algorithms: rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,rsa-sha2-512,rsa-sha2-256,ssh-rsa,ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ssh-ed25519-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none
debug2: compression stoc: none
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256
debug2: host key algorithms: ssh-dss,rsa-sha2-512,rsa-sha2-256,ssh-rsa
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc
debug2: MACs ctos: hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib,zlib@openssh.com
debug2: compression stoc: none,zlib,zlib@openssh.com
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: rsa-sha2-512
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug3: send packet: type 30
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug3: receive packet: type 31
debug1: Server host key: ssh-rsa SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8
debug3: hostkeys_foreach: reading file "C:\\Users\\someone-harmless/.ssh/known_hosts"
debug3: record_hostkey: found key type RSA in file C:\\Users\\someone-harmless/.ssh/known_hosts:3
debug3: load_hostkeys: loaded 1 keys from github.com
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/known_hosts2 error:2
debug3: Failed to open file:C:/ProgramData/ssh/ssh_known_hosts error:2
debug3: Failed to open file:C:/ProgramData/ssh/ssh_known_hosts2 error:2
debug3: hostkeys_foreach: reading file "C:\\Users\\someone-harmless/.ssh/known_hosts"
debug3: record_hostkey: found key type RSA in file C:\\Users\\someone-harmless/.ssh/known_hosts:5
debug3: load_hostkeys: loaded 1 keys from 140.82.118.3
debug3: Failed to open file:C:/Users/someone-harmless/.ssh/known_hosts2 error:2
debug3: Failed to open file:C:/ProgramData/ssh/ssh_known_hosts error:2
debug3: Failed to open file:C:/ProgramData/ssh/ssh_known_hosts2 error:2
debug1: Host 'github.com' is known and matches the RSA host key.
debug1: Found key in C:\\Users\\someone-harmless/.ssh/known_hosts:3
Host key fingerprint is SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8
+---[RSA 2048]----+
| =+o...+=o..     |
|o++... *o .      |
|*.o.  *o.        |
|oo.  ..o.= .     |
|.+o. .. S =      |
|*=+ .  o = .     |
|OE .  . o        |
| o     .         |
|                 |
+----[SHA256]-----+
debug3: send packet: type 21
debug2: set_newkeys: mode 1
debug1: rekey after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug3: receive packet: type 21
debug1: SSH2_MSG_NEWKEYS received
debug2: set_newkeys: mode 0
debug1: rekey after 134217728 blocks
debug1: Will attempt key: c:/users/someone-harmless/nothing-to-see-here RSA SHA256:some-key-signature agent
debug1: Will attempt key: C:\\Users\\someone-harmless/.ssh/id_rsa
debug1: Will attempt key: C:\\Users\\someone-harmless/.ssh/id_dsa
debug1: Will attempt key: C:\\Users\\someone-harmless/.ssh/id_ecdsa
debug1: Will attempt key: C:\\Users\\someone-harmless/.ssh/id_ed25519
debug1: Will attempt key: C:\\Users\\someone-harmless/.ssh/id_xmss
debug2: pubkey_prepare: done
debug3: send packet: type 5
debug3: receive packet: type 7
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-rsa,rsa-sha2-512,rsa-sha2-256,ssh-dss>
debug3: receive packet: type 6
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug3: send packet: type 50
debug3: receive packet: type 51
debug1: Authentications that can continue: publickey
debug3: start over, passed a different list publickey
debug3: preferred publickey,keyboard-interactive,password
debug3: authmethod_lookup publickey
debug3: remaining preferred: keyboard-interactive,password
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Offering public key: c:/users/someone-harmless/nothing-to-see-here RSA SHA256:some-key-signature agent
debug3: send packet: type 50
debug2: we sent a publickey packet, wait for reply
debug3: receive packet: type 60
debug1: Server accepts key: c:/users/someone-harmless/nothing-to-see-here RSA SHA256:some-key-signature agent
debug3: sign_and_send_pubkey: RSA SHA256:some-key-signature
debug3: sign_and_send_pubkey: signing using rsa-sha2-512
debug3: send packet: type 50
debug3: receive packet: type 52
debug1: Authentication succeeded (publickey).
Authenticated to github.com ([140.82.118.3]:22).
debug2: fd 5 setting O_NONBLOCK
debug2: fd 6 setting O_NONBLOCK
debug2: fd 7 setting O_NONBLOCK
debug1: channel 0: new [client-session]
debug3: ssh_session2_open: channel_new: 0
debug2: channel 0: send open
debug3: send packet: type 90
debug1: Entering interactive session.
debug1: pledge: network
debug3: receive packet: type 91
debug2: channel_input_open_confirmation: channel 0: callback start
debug2: fd 3 setting TCP_NODELAY
debug2: client_session2_setup: id 0
debug1: Sending command: git-upload-pack 'spatialos/gdk-for-unity-shared-ci.git'
debug2: channel 0: request exec confirm 1
debug3: send packet: type 98
debug2: channel_input_open_confirmation: channel 0: callback done
debug2: channel 0: open confirm rwindow 32000 rmax 35000
debug3: receive packet: type 99
debug2: channel_input_status_confirm: type 99 id 0
debug2: exec request accepted on channel 0
13:39:24.767552 pkt-line.c:80           packet:        clone< 8f6b03dbc78f507f6753663bc7480d1ee2c2f2b1 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/master agent=git/github-g14823918a5db
13:39:24.767552 pkt-line.c:80           packet:        clone< 4ff2cf8863123af0a933c8d46f1028dd1cded4c7 refs/heads/feature/add-test-runner
13:39:24.768613 pkt-line.c:80           packet:        clone< 59dbfe262747d735e206fdbb32e963f6adeeab4b refs/heads/feature/api-docs-generator
13:39:24.768613 pkt-line.c:80           packet:        clone< abf7edaa258bd6e24a3f46178f0c1841e49fd1c3 refs/heads/feature/buildkite_premerge
13:39:24.768613 pkt-line.c:80           packet:        clone< 9753b61daac9ab12e2048d799d6985d4e7ddb3ec refs/heads/feature/ci-pipefail-fix
13:39:24.768613 pkt-line.c:80           packet:        clone< 995cc8471eb4f13779a25c51f8402840a55ebb06 refs/heads/feature/release-tool
13:39:24.768613 pkt-line.c:80           packet:        clone< 8f6b03dbc78f507f6753663bc7480d1ee2c2f2b1 refs/heads/master
13:39:24.768613 pkt-line.c:80           packet:        clone< e915467132935c74cc68be9e42f4165d4efb7e10 refs/pull/1/head
13:39:24.768613 pkt-line.c:80           packet:        clone< d6e6eff1a7356835a759c51176cf5a128f7bf47f refs/pull/10/head
13:39:24.768613 pkt-line.c:80           packet:        clone< be5ab167ac8e0930cb37ad5f4895d52b35b8d425 refs/pull/11/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 59dbfe262747d735e206fdbb32e963f6adeeab4b refs/pull/12/head
13:39:24.768613 pkt-line.c:80           packet:        clone< e099c93243af35974200acc44f387fdfde556cec refs/pull/13/head
13:39:24.768613 pkt-line.c:80           packet:        clone< be5ab167ac8e0930cb37ad5f4895d52b35b8d425 refs/pull/14/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 4a7ced7335017e8b8563790a3a350500ade199bf refs/pull/15/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 08d1e889439670b5fed0464d1182d9331d519166 refs/pull/16/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 0a6feaef63c327130d919157f9b35392b1e5f79f refs/pull/17/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 989ea6b4e240502f8de055431b56c234d77cbe0b refs/pull/18/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 060747b0a81506d0f8f9b98a5156633b5dfe7caa refs/pull/19/head
13:39:24.768613 pkt-line.c:80           packet:        clone< a9885703dea124e67e85a53e6e28cdea9f7c62fb refs/pull/2/head
13:39:24.768613 pkt-line.c:80           packet:        clone< f4b7b16934cd544468afcc4b2397fb54de6b1493 refs/pull/20/head
13:39:24.768613 pkt-line.c:80           packet:        clone< ecac4ae9159feaf301d038cbd642a0c66e06eb13 refs/pull/21/head
13:39:24.768613 pkt-line.c:80           packet:        clone< abf7edaa258bd6e24a3f46178f0c1841e49fd1c3 refs/pull/22/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 107a7183bf750b44e52c8117639491dcb1c93c97 refs/pull/23/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 995cc8471eb4f13779a25c51f8402840a55ebb06 refs/pull/24/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 8a58a3d12d7d99291767aad5d0f1ccc5d49b1e01 refs/pull/25/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 05295c4d819273d55ec684bfa17f603693cd3154 refs/pull/26/head
13:39:24.768613 pkt-line.c:80           packet:        clone< aaf3e032761242010345680ac247f6b191211d13 refs/pull/27/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 45e1c0579a64316bcef4d0bf1fabf3e02311305f refs/pull/28/head
13:39:24.768613 pkt-line.c:80           packet:        clone< fea48547a6393c3968b58b0de4bbf3e73fb5d579 refs/pull/29/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 47f1279126a218aea387a63b442d54bdd2d76f5d refs/pull/3/head
13:39:24.768613 pkt-line.c:80           packet:        clone< c16dbba21065767ad729bf5e40585d9352549558 refs/pull/30/head
13:39:24.768613 pkt-line.c:80           packet:        clone< d39c743b78209750580be152e15f1b724649a261 refs/pull/31/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 09a1c68285ca57a94d0cf6771123cb0d6212b772 refs/pull/4/head
13:39:24.768613 pkt-line.c:80           packet:        clone< bd276089863db4ec6a8ec06dcdec74f9c129f419 refs/pull/5/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 3640f47a1a4b41747b49b34ad22ced2371b31824 refs/pull/6/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 4ff2cf8863123af0a933c8d46f1028dd1cded4c7 refs/pull/7/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 54367fb969625c43a214c574c4e023a7d4704dbd refs/pull/8/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 49e01a15aee1392a66831915c3aefd8d56593fe9 refs/pull/9/head
13:39:24.768613 pkt-line.c:80           packet:        clone< 0000
13:39:24.771174 pkt-line.c:80           packet:        clone> want 8f6b03dbc78f507f6753663bc7480d1ee2c2f2b1 multi_ack_detailed side-band-64k thin-pack no-progress ofs-delta deepen-since deepen-not agent=git/2.21.0.windows.1
13:39:24.771174 pkt-line.c:80           packet:        clone> want 4ff2cf8863123af0a933c8d46f1028dd1cded4c7
13:39:24.771174 pkt-line.c:80           packet:        clone> want 59dbfe262747d735e206fdbb32e963f6adeeab4b
13:39:24.771174 pkt-line.c:80           packet:        clone> want abf7edaa258bd6e24a3f46178f0c1841e49fd1c3
13:39:24.771174 pkt-line.c:80           packet:        clone> want 9753b61daac9ab12e2048d799d6985d4e7ddb3ec
13:39:24.771174 pkt-line.c:80           packet:        clone> want 995cc8471eb4f13779a25c51f8402840a55ebb06
13:39:24.771174 pkt-line.c:80           packet:        clone> want 8f6b03dbc78f507f6753663bc7480d1ee2c2f2b1
13:39:24.771174 pkt-line.c:80           packet:        clone> 0000
13:39:24.771174 pkt-line.c:80           packet:        clone> done
debug2: channel 0: rcvd adjust 1248470
13:39:24.906427 pkt-line.c:80           packet:        clone< NAK
13:39:24.920583 run-command.c:643       trace: run_command: git index-pack --stdin --fix-thin '--keep=fetch-pack 14780 on some-uninteresting-hostname' --check-self-contained-and-connected
13:39:24.927445 pkt-line.c:80           packet:     sideband< PACK ...
13:39:24.930373 exec-cmd.c:236          trace: resolved executable dir: C:/Program Files/Git/mingw64/libexec/git-core
13:39:24.932299 trace.c:377             setup: git_dir: C:/somewhere/2/gdk-for-unity-premerge/.shared-ci/.git
13:39:24.932299 trace.c:378             setup: git_common_dir: C:/somewhere/2/gdk-for-unity-premerge/.shared-ci/.git
13:39:24.932299 trace.c:379             setup: worktree: C:/somewhere/2/gdk-for-unity-premerge
13:39:24.932299 trace.c:380             setup: cwd: C:/somewhere/2/gdk-for-unity-premerge
13:39:24.932299 trace.c:381             setup: prefix: (null)
13:39:24.932299 git.c:419               trace: built-in: git index-pack --stdin --fix-thin '--keep=fetch-pack 14780 on some-uninteresting-hostname' --check-self-contained-and-connected
debug2: channel 0: window 1971561 sent adjust 125591
debug2: channel 0: window 1966082 sent adjust 131070
debug2: channel 0: window 1966082 sent adjust 65535
debug2: channel 0: window 1966082 sent adjust 131070
debug2: channel 0: window 1966082 sent adjust 65535
debug2: channel 0: window 1966082 sent adjust 65535
debug2: channel 0: window 1802254 sent adjust 102400
debug2: channel 0: window 1904654 sent adjust 102400
debug2: channel 0: window 1974277 sent adjust 122875
debug2: channel 0: window 1966082 sent adjust 131070
debug2: channel 0: window 1966082 sent adjust 32758
debug2: channel 0: window 1998840 sent adjust 98312
debug2: channel 0: window 1553527 sent adjust 102400
debug2: channel 0: window 1584080 sent adjust 102400
debug2: channel 0: window 1686480 sent adjust 102400
debug2: channel 0: window 1788880 sent adjust 102400
debug2: channel 0: window 1891280 sent adjust 102400
debug2: channel 0: window 1993680 sent adjust 102400
debug2: channel 0: window 1988551 sent adjust 75808
debug2: channel 0: window 1908651 sent adjust 102400
debug2: channel 0: window 1978274 sent adjust 86101
debug2: channel 0: window 1933305 sent adjust 102400
debug2: channel 0: window 1413132 sent adjust 102400
debug2: channel 0: window 1155080 sent adjust 102400
debug2: channel 0: window 1093652 sent adjust 102400
debug2: channel 0: window 1083509 sent adjust 102400
debug2: channel 0: window 1083509 sent adjust 102400
debug2: channel 0: window 1171470 sent adjust 102400
debug2: channel 0: window 1273870 sent adjust 102400
debug2: channel 0: window 1376270 sent adjust 102400
debug2: channel 0: window 1478670 sent adjust 102400
debug2: channel 0: window 1581070 sent adjust 102400
debug2: channel 0: window 1683470 sent adjust 102400
debug2: channel 0: window 1785870 sent adjust 102400
debug2: channel 0: window 1888270 sent adjust 102400
debug2: channel 0: window 1990670 sent adjust 102400
debug2: channel 0: window 1972351 sent adjust 36840
debug2: channel 0: window 1998840 sent adjust 98312
debug2: channel 0: window 1935461 sent adjust 102400
debug2: channel 0: window 1859653 sent adjust 102400
debug2: channel 0: window 1962053 sent adjust 102400
debug2: channel 0: window 1875952 sent adjust 102400
debug2: channel 0: window 1875952 sent adjust 102400
debug2: channel 0: window 1978352 sent adjust 102400
debug2: channel 0: window 1978352 sent adjust 102400
debug2: channel 0: window 1994774 sent adjust 49158
debug2: channel 0: window 1978397 sent adjust 118755
debug2: channel 0: window 1998840 sent adjust 86045
debug2: channel 0: window 1986592 sent adjust 110560
debug2: channel 0: window 1998840 sent adjust 61482
debug2: channel 0: window 1994752 sent adjust 102400
debug2: channel 0: window 1994752 sent adjust 102400
debug2: channel 0: window 1994752 sent adjust 102400
debug2: channel 0: window 1994752 sent adjust 102400
debug2: channel 0: window 1994752 sent adjust 102400
debug2: channel 0: window 1994752 sent adjust 102400
debug2: channel 0: window 1994752 sent adjust 102400
debug3: receive packet: type 98
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug3: receive packet: type 96
debug2: channel 0: rcvd eof
debug2: channel 0: output open -> drain
debug3: receive packet: type 97
debug2: channel 0: rcvd close
debug2: channel 0: chan_shutdown_read (i0 o1 sock -1 wfd 5 efd 7 [write])
debug2: channel 0: input open -> closed
debug3: channel 0: will not send data after close
debug3: channel 0: will not send data after close
debug2: channel 0: obuf empty
debug2: channel 0: chan_shutdown_write (i3 o1 sock -1 wfd 6 efd 7 [write])
debug2: channel 0: output drain -> closed
debug2: channel 0: almost dead
debug2: channel 0: gc: notify user
debug2: channel 0: gc: user detached
debug2: channel 0: send close
debug3: send packet: type 97
debug2: channel 0: is dead
debug2: channel 0: garbage collecting
debug1: channel 0: free: client-session, nchannels 1
debug3: channel 0: status: The following connections are open:
  #0 client-session (t4 r43 i3/0 o3/0 e[write]/0 fd -1/-1/7 sock -1 cc -1)
 
debug3: send packet: type 1
fatal: early EOF
debug3: fd 0 is not O_NONBLOCK
debug3: fd 2 is not O_NONBLOCK
Transferred: sent 5076, received 5604380 bytes, in 1.2 seconds
Bytes per second: sent 4175.1, received 4609725.3
debug1: Exit status 0
fatal: the remote end hung up unexpectedly
fatal: index-pack failed

jamiebrynes7 pushed a commit to spatialos/gdk-for-unity that referenced this issue Apr 29, 2019
@petemounce
Copy link

@NoMoreFood I think in our case (based on the timestamps in my log above), around 1.1s between running the command in the client and starting to receive data.

@analogrelay
Copy link

I get this fairly consistently (but not 100% reliably) when cloning Git repos using Win32 SSH. Most commonly when cloning https://github.com/aspnet/AspNetCore

@petemounce
Copy link

We get this around 50% of the time, intermittently. This also happens with OpenSSH 8.0.0.0p1-Beta

@SteveL-MSFT SteveL-MSFT added this to the vNext milestone Jul 9, 2019
@NoMoreFood
Copy link

@anurse @petemounce What OSes are you running?

@petemounce
Copy link

@NoMoreFood I am running Windows Server 2016 Desktop/GUI edition within Google Cloud. Image Family is windows-2016, specific image at the moment is windows-server-2016-dc-v20190620. The images are regularly changed by Google as patch Tuesdays and other changes happen - if that image is no longer present, it's possible to show deprecated images via the console or a gcloud flag.

If it's relevant, https://github.com/GoogleCloudPlatform/compute-image-windows/ is I think what goes into the image-creation process to prepare them for customer usage.

@petemounce
Copy link

This is the script I'm using to bootstrap sshd onto the instances in question - this is relevant to this issue I think because it's how ssh.exe becomes present, and how the default shell is set to powershell.

(The verbatim script relies on GCE-GetMetadata to pull the base64 encoded sshd_config (also snippet'd here) and a base64 encoded public key from GCE instance-metadata. Those sections can be hacked out with hard-coded values if you don't run this within GCE to reproduce; I don't have the Azure/AWS equivalents supported, sorry. I included the full script + sshd_config because I thought you might be interested in adapting it into your docs around enabling pubkey auth - if there's an issue around that, please point me at it and I can replicate there if that's helpful)

bootstrap.ps1

param(
    [string] $username = "different", # This isn't who we use ;)
    [string] $open_ssh_version = "v8.0.0.0p1-Beta"
)
$ErrorActionPreference = 'Stop'; # stop on all errors

Start-Transcript -Path "$($PSScriptRoot)/bootstrap.log" -IncludeInvocationHeader

Write-Output "Make the administrator user for packer: $($username)"
$Count = Get-Random -min 24 -max 32
$TempPassword = -join ((65..90) + (97..122) + (48..57) | Get-Random -Count $Count | % {[char]$_})
New-LocalUser -Name $username -PasswordNeverExpires -Password ($TempPassword | ConvertTo-SecureString -AsPlainText -Force) | out-null
Add-LocalGroupMember -Group "Administrators" -Member $username | out-null

Write-Output "Install openssh..."
# https://github.com/PowerShell/Win32-OpenSSH/wiki/Install-Win32-OpenSSH
$url = "https://github.com/PowerShell/Win32-OpenSSH/releases/download/$($open_ssh_version)/OpenSSH-Win64.zip"
$output = "$($env:TEMP)/OpenSSH-Win64.zip"
# github went TLS 1.2 only from 2018
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.SecurityProtocolType]::Tls12;
(New-Object System.Net.WebClient).DownloadFile($url, $output)
Expand-Archive -Path $output -DestinationPath "$($env:PROGRAMFILES)"
. "$($env:PROGRAMFILES)/OpenSSH-Win64/install-sshd.ps1"

Write-Output "Set default shell for sshd connections to powershell to avoid legacy cmd nonsense"
# https://github.com/PowerShell/Win32-OpenSSH/wiki/DefaultShell
[System.Environment]::SetEnvironmentVariable("PATH", "$($env:PATH);c:\Program Files\OpenSSH-Win64", [System.EnvironmentVariableTarget]::Machine)
New-Item -path "HKLM:\SOFTWARE\OpenSSH" -force
New-ItemProperty -Path "HKLM:\SOFTWARE\OpenSSH" -Name DefaultShell -Value "C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" -PropertyType String -Force
New-ItemProperty -Path "HKLM:\SOFTWARE\OpenSSH" -Name DefaultShellCommandOption -Value "/c" -PropertyType String -Force

Write-Output "Set up sshd_config..."
# An sshd_config that's basically the stock one but
# * allows public key authentication (which is commented by default)
$sshd_config = "$($env:PROGRAMDATA)/ssh"
New-Item "$($sshd_config)" -Type Directory -Force
$sshd_config_file = "$($sshd_config)/sshd_config"
$sshd_config_base64 = get-gcemetadata -path "instance/attributes/imp-sshd-config-base64"
$sshd_config_text = [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($sshd_config_base64))
Set-Content -Path "$($sshd_config_file)" -Value "$($sshd_config_text)"

Write-Output "Place public key in system-wide authorized keys..."
# Put the public key in the system-wide config location to avoid needing to programmatically create user profile. Sigh.
$public_key_base64 = get-gcemetadata -path "instance/attributes/imp-public-key-base64"
$public_key = [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($public_key_base64))
$system_authorized_keys_file = "$($sshd_config)/administrators_authorized_keys"
if (Test-Path "$($system_authorized_keys_file)") {
  Remove-Item -Path "$($system_authorized_keys_file)" -force
}
Set-Content -Path "$($system_authorized_keys_file)" -Value "$($public_key)"

Write-Output "Ensure access control on authorized_keys meets the requirements..."
# https://stackoverflow.com/questions/16212816/setting-up-openssh-for-windows-using-public-key-authentication
# https://github.com/jen20/packer-aws-windows-ssh/blob/master/files/configure-source-ssh.ps1#L99-L114
$acl = Get-ACL -Path $system_authorized_keys_file
$acl.SetAccessRuleProtection($True, $True)
Set-Acl -Path $system_authorized_keys_file -AclObject $acl
$acl = Get-ACL -Path $system_authorized_keys_file
$ar = New-Object System.Security.AccessControl.FileSystemAccessRule( `
	"NT Authority\Authenticated Users", "ReadAndExecute", "Allow")
$acl.RemoveAccessRule($ar)
$ar = New-Object System.Security.AccessControl.FileSystemAccessRule( `
	"BUILTIN\Administrators", "FullControl", "Allow")
$acl.RemoveAccessRule($ar)
$ar = New-Object System.Security.AccessControl.FileSystemAccessRule( `
	"BUILTIN\Users", "FullControl", "Allow")
$acl.RemoveAccessRule($ar)
Set-Acl -Path $system_authorized_keys_file -AclObject $acl

Write-Output "Restart sshd..."
Restart-Service sshd
Set-Service sshd -StartupType Automatic

Write-Output "Open up the firewall port for ssh"
New-NetFirewallRule -Name sshd -DisplayName 'OpenSSH Server (sshd)' -Enabled True -Direction Inbound -Protocol TCP -Action Allow -LocalPort 22

Write-Output "NGEN powershell assembly, improves startup time of powershell by 10x"
$env:path = [Runtime.InteropServices.RuntimeEnvironment]::GetRuntimeDirectory()
[AppDomain]::CurrentDomain.GetAssemblies() | % {
  if (! $_.location) {continue}
  $Name = Split-Path $_.location -leaf
  if ($Name.startswith("Microsoft.PowerShell.")) {
    Write-Host -ForegroundColor Yellow "NGENing : $Name"
    ngen install $_.location | % {"`t$_"}
  }
}

Write-Output "Done."
Stop-Transcript

sshd_config

Port 22
# Logging
#SyslogFacility AUTH
#LogLevel INFO

# Authentication:

#LoginGraceTime 2m
#PermitRootLogin prohibit-password
#StrictModes yes
#MaxAuthTries 6
#MaxSessions 10

PubkeyAuthentication yes

# The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2
# but this is overridden so installations will only check .ssh/authorized_keys
AuthorizedKeysFile	.ssh/authorized_keys

PasswordAuthentication no
PermitEmptyPasswords no

TCPKeepAlive yes
ClientAliveInterval 300
ClientAliveCountMax 10

# override default of no subsystems
Subsystem	sftp	sftp-server.exe

Match Group administrators
    AuthorizedKeysFile __PROGRAMDATA__/ssh/administrators_authorized_keys

@analogrelay
Copy link

@anurse @petemounce What OSes are you running?

Windows 10 (various builds, including 18932.1000). Just a standard interactive instance (no docker containers, etc.). Literally just tried and got it again:

cmd /c ver

Microsoft Windows [Version 10.0.18932.1000]

git clone git@github.com:aspnet/AspNetCore.git ANCTest1

Cloning into 'ANCTest1'...
Enter passphrase for key 'C:\Users\anurse/.ssh/id_rsa':
remote: Enumerating objects: 44, done.
remote: Counting objects: 100% (44/44), done.
remote: Compressing objects: 100% (43/43), done.
fatal: The remote end hung up unexpectedly77.63 MiB | 19.90 MiB/s
fatal: early EOF
fatal: index-pack failed

Get-Command ssh

C:\Windows\System32\OpenSSH\ssh.exe

$env:GIT_SSH

C:\Windows\System32\OpenSSH\ssh.exe

C:\Windows\System32\OpenSSH\ssh.exe -V

OpenSSH_for_Windows_7.7p1, LibreSSL 2.6.5

git clone git@github.com:aspnet/AspNetCore.git ANCTest
With GIT_SSH_COMMAND=C:\\Windows\\System32\\OpenSSH\\ssh.exe -vvv

clone.log

@analogrelay
Copy link

It's not a 100% consistent repro for me, but it's like.. 80-90% consistent. I can usually get a successful clone by repeating a bunch of times. Alternately, if I reset git to use it's built-in build of ssh (OpenSSH_7.7p1, OpenSSL 1.0.2o 27 Mar 2018) then it clones successfully every time.

@petemounce
Copy link

I badly don't want to use the ssh bundled with git-for-windows because then I (I think?) would need to find a different arrangement for auth, differently from the other two platforms I support.

I think my options would be:

  • git-for-windows ssh-agent - but then I need to manage the ssh-agent as a windows service (probably nssm) but find a way to capture its stdout when it launches to record that for my CI processes to know about, and deal with restarts/crashes etc
  • https auth - but then, git credential helper and a different auth flow from the other two platforms I support
  • see if git-for-windows ssh can talk with Windows ssh-agent via a named pipe - but I know very little about those beyond the phrase.

@petemounce
Copy link

@NoMoreFood friendly ping? Is there any more information I could provide that might help?

@RNabel
Copy link

RNabel commented Aug 13, 2019

@NoMoreFood Running into this on Windows Server 2019 as well (OpenSSH version 7.7.2.1 from Windows Update / pre-installed with Windows). Is there a timeline on addressing this?

@NoMoreFood
Copy link

@petemounce @RNabel I've been working on other projects, but might be able to sneak this in. Can you verify it's still reproducible on 8.0?

@petemounce
Copy link

It's definitely still reproducible on 8. I'm less sure it's a data problem, since on retry it can succeed against the same sha being cloned.

@333fred
Copy link

333fred commented Aug 20, 2019

I see this all the time when attempting to clone dotnet/roslyn with 7.7.2.1. It's getting to the point that I'm going to have to disable the service.

@petemounce
Copy link

@NoMoreFood friendly ping? Any news?

@333fred
Copy link

333fred commented Sep 24, 2019

If you need a repo that commonly reproduces this, try cloning https://github.com/dotnet/roslyn. It's a pretty reliable failure for me.

@goldstar611
Copy link

goldstar611 commented Sep 24, 2019

I also consistently get this on Windows 10 when running git clone <<any repo>> over a SSH connection. Even doing ssh localhost before running git clone causes failures described here

@petemounce
Copy link

@NoMoreFood do you need any more information for repro, or want any other help?

@asherccohen
Copy link

Any news on this?I'm seeing the same behaviour...

@vvuk
Copy link

vvuk commented Nov 20, 2019

This seemed to be much better once I moved to OpenSSH_for_Windows_8.0p1 but now I'm back to seeing it all the time.

@mgkuhn
Copy link

mgkuhn commented Jul 22, 2022

Regarding your "real fix" comment: can you explain a bit more why you think this problem can't be solved properly via stdio and warrants use of a Windows named pipe? Especially in case all that is missing is an fsync() at the end? Since several additional widely-used tools, beyond Git, use ssh via stdio this way, such as Subversion and Rsync, it would be nice if this could be fixed without all these other tools needing new command-line options. After all, PuTTY's plink also appears to manage without that.

@vvuk
Copy link

vvuk commented Jul 22, 2022

Windows doesn't really have stdio in the same way that Unix-like machines do as just boring standard file descriptors, at least as far as I'm aware; there are extra layers involved because they're tied to the windows "console" subsystem. These layers add quirks that unix-y apps try to work around to bring things back to just boring file descriptors. This also gets even more complicated when multiple approaches to this are involved... e.g. git-for-windows uses the mingw layer, while this openssh does not.

I took a guess as to what might be happening: that there was still some data still in a buffer somewhere that gets lost when the ssh process exits. I have no idea how to validate this theory to confirm that that's exactly what's happening; it could be that the fsync() perturbs things just enough to make it seem like it works more frequently. On Unix machines, fsync on stdout that's being piped to another process shouldn't have any effect.

The named pipe idea just adds an optional path that some apps could decide to implement support for (assuming they don't have a way to read from a file already, because it can just be a \\.\pipe\... path), because it would remove a bunch of the console layers and allow things to get back to "just boring standard file descriptors". Certainly wouldn't be required to have all apps implement it, and might not even be needed if we can validate that flushing the stdout handle fixes things, but I'm always a fan of getting rid of unnecessary layers. I'm not sure what plink does; maybe it does the same flushing already.

@vvuk
Copy link

vvuk commented Jul 22, 2022

From a glance at plink code -- they have a full native-Windows IO layer, not going through the Microsoft CRT. Entirely possible (actually, totally likely) this buffering is at the CRT level.

@mgkuhn
Copy link

mgkuhn commented Jul 22, 2022

Hm. On POSIX systems, fflush() flushes the CRT/libc buffers of file streams to write(), whereas fsync() flushes kernel buffers of file descriptors to disk, and shouldn't really have any effect on pipes. On Windows, there is FlushFileBuffers on file handles, and fflush() in the Visual Studio C runtime library. But the fsync() that your patch calls on Windows comes from contrib/win32/win32compat/inc/unistd.h and is a macro for w32_fsync() in contrib/win32/win32compat/w32fd.c which just looks up the handle and calls FlushFileBuffers.

@vvuk
Copy link

vvuk commented Jul 22, 2022

Correct, this should be a no-op on POSIX systems; the fsync() should be doing nothing. I didn't realize there was a custom POSIX emulation layer in here, but ultimately FlushFileBuffers is where I assumed this would get to. It's poking something, and I'm not sure what the actual handle is at that point. I guess that's an easy thing to debug and look at (e.g. is it ultimately a pipe?).

@mgkuhn
Copy link

mgkuhn commented Jul 22, 2022

Does your fsync = FlushFileBuffers call essentially come before OpenSSH for Windows calls ClosePseudoConsole to close the ConPTY? Does this mean that a call to ClosePseudoConsole does not implicitly also do everything that FlushFileBuffers would do?

On the ClosePseudoConsole man page, I see this curious remark:

Upon closing a pseudoconsole, client applications attached to the session will be terminated as well.

A final painted frame may arrive on the hOutput handle originally provided to CreatePseudoConsole when this API is called. It is expected that the caller will drain this information from the communication channel buffer and either present it or discard it. Failure to drain the buffer may cause the Close call to wait indefinitely until it is drained or the communication channels are broken another way.

Is all that “draining of information and presenting it” after calling ClosePseudoConsole really done correctly by OpenSSH for Windows? (And if not, is your pre-closure fsync() call just helping to sometimes mitigate the effects of a failure to do that?)

Hm ... Looking at contrib/win32/win32compat/win32_pty.c I can't actually see any call to ClosePseudoConsole (and subsequent draining). Is that perhaps the real problem?

@vvuk
Copy link

vvuk commented Jul 22, 2022

Yeah; that's one thing I noticed from plink -- it does explicitly CloseHandle() on EOF. So it might well be that the lack of a ClosePseudoHandle is the actual problem.

@mgkuhn
Copy link

mgkuhn commented Jul 22, 2022

Does CreatePseudoConsole actually happen in conhost.exe (which is called by sshd), and is that what seems to be open-sourced here? But anyway, that's all on the sshd server side, whereas your fsync is on the ssh client, right?

@vvuk
Copy link

vvuk commented Jul 22, 2022

I don't think this goes through the win32_pty.c paths at all; ssh isn't creating a pseudoconsole that I can see. It's using whatever stdin/out/err handles are given to the process. Instead, we go through this https://github.com/PowerShell/openssh-portable/blob/latestw_all/ssh.c#L1879 which ultimately gets to DuplicateHandle() and then I'm not sure what the stdio hookup does. But perhaps my fsync() should be a close(); testing that now.

@vvuk
Copy link

vvuk commented Jul 22, 2022

Bah. Seeing the issue again with either fsync() or close(). So back to looking for another fix.

@vvuk
Copy link

vvuk commented Jul 26, 2022

So another theory of what's happening... git has this code that's doing the reading:

https://github.com/git/git/blob/6a475b71f8c4ce708d69fdc9317aefbde3769e25/builtin/index-pack.c#L300-L305

It's clearly expecting reads from stdin to be blocking (bytes read == 0 is the error we're seeing). Digging through the Windows ucrt source, all this ultimately flows into a call to ReadFile as you'd expect. MSDN says that amongst the reasons why ReadFile returns is "A write operation completes on the write end of the pipe." (which stdout->stdin should be). Later the docs say that 0 will be returned if the sender called WriteFile with a length of 0. I don't know if 0 is only if the sender called WriteFile with 0 size, or if a timing issue could cause 0 bytes to be returned with otherwise no error.

I ended up hitting the same "early EOF" issue even while moving comms between the two to a named pipe on windows; the git side is just using the microsoft CRT's implementation of posix file IO though, whereas openssh has its own win32 API layer.

@SixHourDays
Copy link

SixHourDays commented Jul 31, 2022

TL; DR: I had same issue on 8.1.p1, uninstalled it and installed Git-2.37.1-64, choosing its bundled ssh, which is OpenSSH_9.0p1, which has resolved it for me.

Full post:
I've recently had this issue as well. My test case is ssh cloning a 200mb repo from my local Ubuntu 20.04 server to my Win10 box, all over gigabit LAN. The server just has a gitea install serving my repo.

On Win10, with my .gitconfig containing sshCommand = "ssh -vvv", I'd been using OpenSSH v8.1.p1, and I'd get about 30% failure rate.

The command was simple, I just increment NN as 00, 01, 02 etc.
git.exe clone --progress -v "git@192.168.0.2:me/myrepo.git" localpath\myrepoNN

Here's the relevant ending output from a failed transfer. Notice the 'fatal: early EOF line'.

			debug2: channel 0: window 1718393 sent adjust 102400
			debug2: channel 0: window 1820793 sent adjust 102400
			debug3: receive packet: type 96
			debug2: channel 0: rcvd eof
			debug2: channel 0: output open -> drain
			debug3: receive packet: type 98
			debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
			debug3: receive packet: type 98
			debug1: client_input_channel_req: channel 0 rtype eow@openssh.com reply 0
			debug2: channel 0: rcvd eow
			debug2: channel 0: chan_shutdown_read (i0 o1 sock -1 wfd 4 efd 6 [write])
			debug2: channel 0: input open -> closed
			debug3: receive packet: type 97
			debug2: channel 0: rcvd close
			debug3: channel 0: will not send data after close
			debug3: channel 0: will not send data after close
			debug3: channel 0: will not send data after close
			debug2: channel 0: obuf empty
			debug2: channel 0: chan_shutdown_write (i3 o1 sock -1 wfd 5 efd 6 [write])
			debug2: channel 0: output drain -> closed
			debug2: channel 0: almost dead
			debug2: channel 0: gc: notify user
			debug2: channel 0: gc: user detached
			debug2: channel 0: send close
			debug3: send packet: type 978/1874), 61.25 MiB | 61.25 MiB/s
			fetch-pack: unexpected disconnect while reading sideband packet
			debug2: channel 0: is dead
			fatal: early EOF
			debug2: channel 0: garbage collecting
			debug1: channel 0: free: client-session, nchannels 1
			debug3: channel 0: status: The following connections are open:
			  #0 client-session (t4 r0 i3/0 o3/0 e[write]/0 fd -1/-1/6 sock -1 cc -1)

			debug3: send packet: type 1
			debug3: fd 0 is not O_NONBLOCK
			Transferred: sent 15884, received 66034268 bytes, in 1.6 seconds
			Bytes per second: sent 9679.3, received 40239429.8
			debug1: Exit status 0
			fatal: fetch-pack: invalid index-pack output

Having found this thread, I tried a newer OpenSSH. Sadly, OpenSSH V8.9.1.0p1 - same results, roughly 30% failure rate.
I removed OpenSSH (and the env vars pointing to it).

Then,
I uninstalled and reinstalled Git-2.37.1-64 (same version I'd used the whole time) ...but this time I let it use its bundled SSH, which turns out to be OpenSSH_9.0p1!

Retrying my test again exactly as above - perfect run, 20/20 clones now succeed.

So, I'm unsure if 9.0p1 has a fix, or if it's simply built different than the win32-openssh versions available here...

Either way, I suggest people try out OpenSSH_9.0p1 themselves, and see if it resolves the issue.

@mgkuhn
Copy link

mgkuhn commented Aug 1, 2022

If you enable level4 debug output with -vvvv, that should enable the lines

debug4("write - reporting %d bytes written, io:%p", bytes_copied, pio);

and

debug4("read - io:%p read: %d remaining: %d", pio, bytes_copied,
		pio->read_details.remaining);

in contrib/win32/win32compat/fileio.c, which might give clues whether the custom Win32 wrappers for read and write used here do anything inappropriate with zero-length reads and writes, as suggested by @vvuk.

@mgkuhn
Copy link

mgkuhn commented Aug 1, 2022

@vvuk Good idea! Looking at fileio_write_wrapper there does not appear to be any protection against a zero-length write happening, which Win32 (unlike POSIX) might pass on the the consumer, that a POSIX application might receive as a zero-length read and interpret as an EOF indication.

What happens, if you simply add at the start of that function something like

if (bytes_to_copy == 0) return 0;

to reduce the likelihood of WriteFileEx being called to do a zero-length write?

Or also add debug output in fileio_write to check if WriteFileEx or syncio_initiate_write get called with bytes_copied == 0 when things go wrong.

Similarly add debug checks in WriteThread in termio.c, to check that the functions WriteConsoleW and WriteFile there never get called with a zero length buffer. Or, if they do, add a test that doesn't call them in case there are zero bytes to be written.

@mgkuhn
Copy link

mgkuhn commented Aug 1, 2022

@manojampalam Are you confident that the write calls in fileio.h and termios.h will not accidentally cause zero-byte writes, which the receiving git process might then interpret as EOF indications?

@SixHourDays
Copy link

@mgkuhn

If you enable level4 debug output with -vvvv, that would enable...

so I re-added the Optional Features OpenSSH (8.1.p1), and added this to my global .gitconfig

[core]
	sshCommand = "C:/Windows/System32/OpenSSH/ssh.exe -vvvv"

I can still reliably reproduce the issue, but the output doesn't contain any new lines with "write - reporting" or "read - io" in it :-(

a recent output:

debug2: channel 0: window 1622541 sent adjust 102400
debug2: channel 0: window 1724941 sent adjust 102400
debug2: channel 0: window 1827341 sent adjust 102400
debug2: channel 0: window 1929741 sent adjust 102400
debug3: receive packet: type 96
debug2: channel 0: rcvd eof
debug2: channel 0: output open -> drain
debug3: receive packet: type 98
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug3: receive packet: type 98
debug1: client_input_channel_req: channel 0 rtype eow@openssh.com reply 0
debug2: channel 0: rcvd eow
debug2: channel 0: chan_shutdown_read (i0 o1 sock -1 wfd 4 efd 6 [write])
debug2: channel 0: input open -> closed
debug3: receive packet: type 97
debug2: channel 0: rcvd close
debug3: channel 0: will not send data after close
debug3: channel 0: will not send data after close
debug2: channel 0: obuf empty
debug2: channel 0: chan_shutdown_write (i3 o1 sock -1 wfd 5 efd 6 [write])
debug2: channel 0: output drain -> closed
debug2: channel 0: almost dead
debug2: channel 0: gc: notify user
debug2: channel 0: gc: user detached
debug2: channel 0: send close
debug3: send packet: type 97
debug2: channel 0: is dead
debug2: channel 0: garbage collecting
debug1: channel 0: free: client-session, nchannels 1
debug3: channel 0: status: The following connections are open:
  #0 client-session (t4 r0 i3/0 o3/0 e[write]/0 fd -1/-1/6 sock -1 cc -1)

debug3: send packet: type 1
debug3: fd 0 is not O_NONBLOCK
Transferred: sent 15904, received 66034628 bytes, in 1.6 seconds
Bytes per second: sent 10053.0, received 41740834.6
debug1: Exit status 0
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output

I'm happy to keep experimenting on my 'easy' test case here - lmk what else I could try.

@vvuk
Copy link

vvuk commented Aug 1, 2022

debug4/debug5 are actually disabled in code here -- https://github.com/PowerShell/openssh-portable/blob/latestw_all/contrib/win32/win32compat/Debug.h#L8 -- and are aliases for debug2/debug3; you need to recompile to enable them. openssh-portable doesn't have a debug level > 3.

@cwgreene
Copy link

cwgreene commented Oct 7, 2022

Hi I believe that I know what is causing this. Why was this issue closed?

@cwgreene
Copy link

cwgreene commented Oct 7, 2022

When git establishes its connection with ssh it uses a non-blocking pipe, so the nonblocking code paths are the relevant ones. In non-blocking mode, the main write (reallyt w32_write) method that happens in channel_handle_wfd,

https://github.com/PowerShell/openssh-portable/blob/latestw_all/channels.c#L2051

ultimately dispatches the copying writing to a pipe to a worker thread (this is because it appears that the pipe uses the termio.c code to do the communication). If another write to the pipe is needed before that completes, the main thread will block based on the write_details.pending attribute of the pio io descriptor (and is updated on the main thread). This ensures that all writes end up being done synchronously to the output.

However the close method

https://github.com/PowerShell/openssh-portable/blob/latestw_all/contrib/win32/win32compat/termio.c#L263

introduces several race conditions. If the thread has not yet invoked WriteFile, then this will actually work, as later on in that method we wait for the thread to complete if it's still writing. However, if WriteFile has been invoked in the thread, than the ouput will be truncated, the file will close, and git will raise the error.

I have found this issue is not reproducible in a debug build, but is reproducible in a release build. It is not clear to me why CancelIoEx(WINHANDLE(pio), NULL); is being invoked, and it seems like it should be removeable. Alternatively the check that happens below it

	if (pio->write_details.pending)
		WaitForSingleObject(pio->write_overlapped.hEvent, INFINITE);
	/* drain queued APCs */
	SleepEx(0, TRUE);

Could be moved prior to the invocation of CancelIoEx to allow the final packet to be fully written.

@octolith
Copy link

This is definitely not fixed.

@gruselglatz
Copy link

@bagajjal @tgauth this issue is not fixed, we see it in the latest version (9.2) too!

Pls where should I open an issue to get this fixed? In the other project?

@LukeCarrier
Copy link

I gave up waiting for Microsoft to fix the bug and ship it — if they’ve failed to fix it for three years and the issue is closed I’m not sure how much clearer they can make the fact they don’t care.

I have switched to using PuTTY’s plink as my GIT_SSH and set up a startup item for Pageant to serve as my SSH agent. Hope this helps.

@gruselglatz
Copy link

@LukeCarrier Thanks we also found this solution, but in reality its pretty sad to use another tool, roll it out, update it and so on. In an Enterprise context it's not so nice tbh.

And it's pretty sad for MS to just ignore or be unable to simply ship a small bugfix...

@333fred
Copy link

333fred commented Apr 28, 2023

For those still following along on this issue, the fix was merged and is available in the current beta: #2012 (comment).

@Okeanos
Copy link

Okeanos commented Apr 28, 2023

@333fred can you give us any information whether this fix will be backported to Windows 10 given Microsoft's current stance on the Windows 10 update policy? I am asking because Windows 10 is still used in a lot of corporate environments (sometimes not even version 22H2) and it's likely it'll stay this way for a long while.
Particularly for developers constrained by this it would be a very, very good thing if Microsoft could include this in some Windows 10 patch in the foreseeable future.

@333fred
Copy link

333fred commented Apr 28, 2023

I have absolutely no idea. I'm just a user who was hit by this bug here 🙂.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests