You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I've been seeing an intermittent issue where my pipeline starts erroring out on jobs and than resolves itself. On 8/7 starting at 11:40a on a pipeline a bunch of jobs that were triggered in close succession all failed with similar errors.
(don't mind the streak of red failures following up to the event)
Build # | Errors
93 | Put /volumes/42615995-7b9d-48af-75ea-c9c61b524a3c/stream-in?path=.: unexpected EOF
94 | Put /volumes/fb3c1dde-a9df-4292-41b2-6dbd94c90562/stream-in?path=.: http: unexpected EOF reading trailer
95 | Put /volumes/be66f2b7-82f3-49db-7dc3-f8abdce5380e/stream-in?path=.: unexpected EOF
96 | Put /volumes/26f1514f-8a49-4049-4e0f-b106edcdd8e9/stream-in?path=.: http: unexpected EOF reading trailer
97 | Put /volumes/0356294f-165c-413a-49f9-a96af4568cc0/stream-in?path=.: http: unexpected EOF reading trailer
99 | Put /volumes/bf63b10d-8a94-4ec5-6ad1-13f287051adb/stream-in?path=.: write tcp 127.0.0.1:49000->127.0.0.1:33561: write: connection reset by peer
~12:02p @ worker-1
{"timestamp":"1502107349.425066948","source":"guardian","message":"guardian.api.garden-server.destroy.destroyed","log_level":1,"data":{"handle":"5b371a5f-9d33-49ba-6d39-ea55453b76e8","session":"3.1.111336"}}
{"timestamp":"1502107354.649356604","source":"baggageclaim","message":"baggageclaim.api.volume-server.stream-out.done","log_level":0,"data":{"session":"4.1.91795","volume":"ae831976-e1f8-4961-7593-6c6377159c97"}}
{"timestamp":"1502107357.741293907","source":"guardian","message":"guardian.api.garden-server.run.exited","log_level":1,"data":{"handle":"dabd5591-79c5-4f00-7114-f459c50eef64","id":"task","session":"3.1.111335","status":1}}
{"timestamp":"1502107370.631527662","source":"worker","message":"worker.beacon.keepalive.failed","log_level":2,"data":{"error":"EOF","session":"4.9"}}
{"timestamp":"1502107370.631785393","source":"worker","message":"worker.beacon.restarting","log_level":2,"data":{"error":"wait: remote command exited without exit status or exit signal","session":"4"}}
{"timestamp":"1502107375.768595457","source":"tsa","message":"tsa.connection.channel.forward-worker.register.start","log_level":1,"data":{"remote":"192.168.8.1:42946","session":"121.1.1.5","worker-address":"127.0.0.1:40437","worker-platform":"linux","worker-tags":""}}
{"timestamp":"1502107375.771958351","source":"guardian","message":"guardian.list-containers.starting","log_level":1,"data":{"session":"79247"}}
~11:49a @ worker-2
{"timestamp":"1502131778.239357471","source":"guardian","message":"guardian.destroy.finished","log_level":1,"data":{"handle":"0b4b62b2-f5a8-46b1-4a72-c81aaaba96ee","session":"16919"}}
{"timestamp":"1502131778.239391804","source":"guardian","message":"guardian.api.garden-server.destroy.destroyed","log_level":1,"data":{"handle":"0b4b62b2-f5a8-46b1-4a72-c81aaaba96ee","session":"3.1.17088"}}
{"timestamp":"1502131785.931120634","source":"worker","message":"worker.beacon.restarting","log_level":2,"data":{"error":"wait: remote command exited without exit status or exit signal","session":"4"}}
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
{"timestamp":"1502131785.932104349","source":"baggageclaim","message":"baggageclaim.api.volume-server.stream-in.bad-stream-payload","log_level":1,"data":{"error":"exit status 2","session":"4.1.17910","volume":"bf63b10d-8a94-4ec5-6ad1-13f287051adb"}}
{"timestamp":"1502131785.932196379","source":"baggageclaim","message":"baggageclaim.api.volume-server.stream-in.done","log_level":0,"data":{"session":"4.1.17910","volume":"bf63b10d-8a94-4ec5-6ad1-13f287051adb"}}
~11:48a @ worker-3
{"timestamp":"1502131709.111960411","source":"guardian","message":"guardian.destroy.start","log_level":1,"data":{"handle":"51d6edca-04e2-4042-5ccf-2cbb7d9639df","session":"27912"}}
{"timestamp":"1502131709.112079144","source":"guardian","message":"guardian.destroy.finished","log_level":1,"data":{"handle":"51d6edca-04e2-4042-5ccf-2cbb7d9639df","session":"27912"}}
{"timestamp":"1502131709.112115145","source":"guardian","message":"guardian.api.garden-server.destroy.failed","log_level":2,"data":{"error":"unknown handle: 51d6edca-04e2-4042-5ccf-2cbb7d9639df","handle":"51d6edca-04e2-4042-5ccf-2cbb7d9639df","session":"3.1.28059"}}
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
{"timestamp":"1502131734.684698820","source":"baggageclaim","message":"baggageclaim.api.volume-server.stream-in.bad-stream-payload","log_level":1,"data":{"error":"exit status 2","session":"4.1.17860","volume":"42615995-7b9d-48af-75ea-c9c61b524a3c"}}
{"timestamp":"1502131734.684844494","source":"baggageclaim","message":"baggageclaim.api.volume-server.stream-in.done","log_level":0,"data":{"session":"4.1.17860","volume":"42615995-7b9d-48af-75ea-c9c61b524a3c"}}
~11:51 @ worker-3
{"timestamp":"1502131866.838005304","source":"guardian","message":"guardian.run.finished","log_level":1,"data":{"handle":"f6e0af69-2491-49da-77d2-6d68094336e5","path":"code/pikachu/ci/clean_build_and_unit_test","session":"27929"}}
{"timestamp":"1502131866.838015795","source":"guardian","message":"guardian.api.garden-server.run.spawned","log_level":1,"data":{"handle":"f6e0af69-2491-49da-77d2-6d68094336e5","id":"task","session":"3.1.28080","spec":{"Path":"code/pikachu/ci/clean_build_and_unit_test","Dir":"/tmp/build/eae1f940","User":"root","Limits":{},"TTY":{}}}}
{"timestamp":"1502131880.435690165","source":"worker","message":"worker.beacon.restarting","log_level":2,"data":{"error":"wait: remote command exited without exit status or exit signal","session":"4"}}
{"timestamp":"1502131880.435745001","source":"worker","message":"worker.beacon.keepalive.failed","log_level":2,"data":{"error":"EOF","session":"4.15"}}
{"timestamp":"1502131885.488456726","source":"tsa","message":"tsa.connection.channel.forward-worker.register.start","log_level":1,"data":{"remote":"192.168.8.1:53120","session":"132.1.1.5","worker-address":"127.0.0.1:36753","worker-platform":"linux","worker-tags":"compiler,demo_sim"}}
{"timestamp":"1502131885.488703012","source":"guardian","message":"guardian.list-containers.starting","log_level":1,"data":{"session":"27930"}}
{"timestamp":"1502131885.488749743","source":"guardian","message":"guardian.list-containers.finished","log_level":1,"data":{"session":"27930"}}
~11:49a @ worker-4
{"timestamp":"1502131720.101345539","source":"baggageclaim","message":"baggageclaim.api.volume-server.stream-in.start","log_level":0,"data":{"session":"4.1.18126","volume":"0356294f-165c-413a-49f9-a96af4568cc0"}}
{"timestamp":"1502131720.101536751","source":"baggageclaim","message":"baggageclaim.repository.stream-in.namespace.start","log_level":0,"data":{"full-path":"/home/ci/concourse-ci/data/volumes/live/0356294f-165c-413a-49f9-a96af4568cc0/volume","path":"/home/ci/concourse-ci/data/volumes/live/0356294f-165c-413a-49f9-a96af4568cc0/volume","session":"3.391262.1","sub-path":".","volume":"0356294f-165c-413a-49f9-a96af4568cc0"}}
{"timestamp":"1502131720.101616621","source":"baggageclaim","message":"baggageclaim.repository.stream-in.namespace.done","log_level":0,"data":{"full-path":"/home/ci/concourse-ci/data/volumes/live/0356294f-165c-413a-49f9-a96af4568cc0/volume","path":"/home/ci/concourse-ci/data/volumes/live/0356294f-165c-413a-49f9-a96af4568cc0/volume","session":"3.391262.1","sub-path":".","volume":"0356294f-165c-413a-49f9-a96af4568cc0"}}
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
{"timestamp":"1502131734.682876825","source":"baggageclaim","message":"baggageclaim.api.volume-server.stream-in.bad-stream-payload","log_level":1,"data":{"error":"exit status 2","session":"4.1.18126","volume":"0356294f-165c-413a-49f9-a96af4568cc0"}}
{"timestamp":"1502131734.682912588","source":"baggageclaim","message":"baggageclaim.api.volume-server.stream-in.done","log_level":0,"data":{"session":"4.1.18126","volume":"0356294f-165c-413a-49f9-a96af4568cc0"}}
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
{"timestamp":"1502131734.684317589","source":"baggageclaim","message":"baggageclaim.api.volume-server.stream-in.bad-stream-payload","log_level":1,"data":{"error":"exit status 2","session":"4.1.18106","volume":"be66f2b7-82f3-49db-7dc3-f8abdce5380e"}}
{"timestamp":"1502131734.684339523","source":"baggageclaim","message":"baggageclaim.api.volume-server.stream-in.done","log_level":0,"data":{"session":"4.1.18106","volume":"be66f2b7-82f3-49db-7dc3-f8abdce5380e"}}
{"timestamp":"1502131748.612316370","source":"worker","message":"worker.beacon.keepalive.failed","log_level":2,"data":{"error":"EOF","session":"4.11"}}
{"timestamp":"1502131748.612370968","source":"worker","message":"worker.beacon.restarting","log_level":2,"data":{"error":"wait: remote command exited without exit status or exit signal","session":"4"}}
{"timestamp":"1502131753.759077072","source":"tsa","message":"tsa.connection.channel.forward-worker.register.start","log_level":1,"data":{"remote":"192.168.8.1:53996","session":"130.1.1.5","worker-address":"127.0.0.1:41389","worker-platform":"linux","worker-tags":"compiler,dockerfile"}}
{"timestamp":"1502131753.759771585","source":"guardian","message":"guardian.list-containers.starting","log_level":1,"data":{"session":"16949"}}
~11:49 @ web-atc-tsa
{"log":"{\"timestamp\":\"1502131785.933176041\",\"source\":\"atc\",\"message\":\"atc.syncer.master:scheduler.tick.scheduling.try-start-next-pending-build.do.task.failed-to-create-container-in-garden\",\"log_level\":2,\"data\":{\"build-id\":228,\"build-name\":\"99\",\"container\":\"7f390717-384b-40b6-7eb8-048632981963\",\"error\":\"Put /volumes/bf63b10d-8a94-4ec5-6ad1-13f287051adb/stream-in?path=.: write tcp 127.0.0.1:49000-\\u003e127.0.0.1:33561: write: connection reset by peer\",\"session\":\"16.4.42886.3.5.4.2\"}}\n","stream":"stdout","time":"2017-08-07T18:49:45.93414368Z"}
{"log":"{\"timestamp\":\"1502131785.933877468\",\"source\":\"atc\",\"message\":\"atc.syncer.master:scheduler.tick.scheduling.try-start-next-pending-build.do.task.failed-to-run-action\",\"log_level\":2,\"data\":{\"build-id\":228,\"build-name\":\"99\",\"error\":\"Put /volumes/bf63b10d-8a94-4ec5-6ad1-13f287051adb/stream-in?path=.: write tcp 127.0.0.1:49000-\\u003e127.0.0.1:33561: write: connection reset by peer\",\"session\":\"16.4.42886.3.5.4.2\"}}\n","stream":"stdout","time":"2017-08-07T18:49:45.934521425Z"}
{"log":"{\"timestamp\":\"1502131785.932796955\",\"source\":\"tsa\",\"message\":\"tsa.connection.tcpip-forward.interrupted\",\"log_level\":1,\"data\":{\"remote\":\"192.168.8.1:32986\",\"session\":\"127.2\"}}\n","stream":"stdout","time":"2017-08-07T18:49:45.934542911Z"}
{"log":"{\"timestamp\":\"1502131785.934935570\",\"source\":\"tsa\",\"message\":\"tsa.connection.channel.forward-worker.wait-for-process.failed-to-close-channel\",\"log_level\":2,\"data\":{\"error\":\"EOF\",\"remote\":\"192.168.8.1:32986\",\"session\":\"127.1.1.4\"}}\n","stream":"stdout","time":"2017-08-07T18:49:45.936218943Z"}
{"log":"{\"timestamp\":\"1502131785.933032990\",\"source\":\"tsa\",\"message\":\"tsa.connection.tcpip-forward.failed-to-accept\",\"log_level\":2,\"data\":{\"error\":\"accept tcp [::]:33561: use of closed network connection\",\"remote\":\"192.168.8.1:32986\",\"session\":\"127.3\"}}\n","stream":"stdout","time":"2017-08-07T18:49:45.936513946Z"}
{"log":"{\"timestamp\":\"1502131785.937350035\",\"source\":\"atc\",\"message\":\"atc.syncer.master:scheduler.tick.scheduling.try-start-next-pending-build.do.task.errored\",\"log_level\":1,\"data\":{\"build-id\":228,\"build-name\":\"99\",\"error\":\"Put /volumes/bf63b10d-8a94-4ec5-6ad1-13f287051adb/stream-in?path=.: write tcp 127.0.0.1:49000-\\u003e127.0.0.1:33561: write: connection reset by peer\",\"session\":\"16.4.42886.3.5.4.2\"}}\n","stream":"stdout","time":"2017-08-07T18:49:45.93771808Z"}
{"log":"{\"timestamp\":\"1502131785.948861122\",\"source\":\"atc\",\"message\":\"atc.syncer.master:scheduler.tick.scheduling.try-start-next-pending-build.finish.errored\",\"log_level\":1,\"data\":{\"build-id\":228,\"build-name\":\"99\",\"error\":\"Put /volumes/bf63b10d-8a94-4ec5-6ad1-13f287051adb/stream-in?path=.: write tcp 127.0.0.1:49000-\\u003e127.0.0.1:33561: write: connection reset by peer\",\"session\":\"16.4.42886.3.5.5\"}}\n","stream":"stdout","time":"2017-08-07T18:49:45.949145862Z"}
Concourse version: 3.3.4
Deployment type (BOSH/Docker/binary): Binary
Infrastructure/IaaS: On-site. Mixture of VMs and bare metal. All Ubuntu 16.04, 4.10.0-30. All machines exist together in our LAN on the same switch.
Browser (if applicable): n/a
Did this used to work? This is an intermittent issue that's been happening for a while (hard to say when it started)
The text was updated successfully, but these errors were encountered:
Looking into the code, it appears worker.beacon.keepalive.failed which appears in a few of the workers logs at the incident is caused by a heartbeat request failing. More specifically it's a SendRequest returning EOF for golangs ssh library.
@AdamDorwart Looking to validate this as a Concourse specific issue or a deployment specific issue. Is it possible that there is some network traffic / congestion / other network related jitter which could be causing the ATC to lose ability to talk to workers or vice-versa?
Bug Report
I've been seeing an intermittent issue where my pipeline starts erroring out on jobs and than resolves itself. On 8/7 starting at 11:40a on a pipeline a bunch of jobs that were triggered in close succession all failed with similar errors.
(don't mind the streak of red failures following up to the event)
Build # | Errors
93 |
Put /volumes/42615995-7b9d-48af-75ea-c9c61b524a3c/stream-in?path=.: unexpected EOF
94 |
Put /volumes/fb3c1dde-a9df-4292-41b2-6dbd94c90562/stream-in?path=.: http: unexpected EOF reading trailer
95 |
Put /volumes/be66f2b7-82f3-49db-7dc3-f8abdce5380e/stream-in?path=.: unexpected EOF
96 |
Put /volumes/26f1514f-8a49-4049-4e0f-b106edcdd8e9/stream-in?path=.: http: unexpected EOF reading trailer
97 |
Put /volumes/0356294f-165c-413a-49f9-a96af4568cc0/stream-in?path=.: http: unexpected EOF reading trailer
99 |
Put /volumes/bf63b10d-8a94-4ec5-6ad1-13f287051adb/stream-in?path=.: write tcp 127.0.0.1:49000->127.0.0.1:33561: write: connection reset by peer
~12:02p @ worker-1
~11:49a @ worker-2
~11:48a @ worker-3
~11:51 @ worker-3
~11:49a @ worker-4
~11:49 @ web-atc-tsa
The text was updated successfully, but these errors were encountered: