Failed getting response from http://127.0.0.1:4001/: cancelled #1250

Open
Vinceveve opened this Issue Jun 15, 2015 · 11 comments

@Vinceveve

Vinceveve commented Jun 15, 2015

A 7-machine cluster: 3 full etcd members and 4 etcd proxies, all listening on ports 4001 and 2379.

  • etcd: 2.0.10
  • fleet: 0.10.1
  • coreos: 681.0.0

On all machines I'm getting this:

fleetd[24283]: INFO client.go:292: Failed getting response from http://127.0.0.1:4001/: cancelled

The fleet agent is working: it reconciles, schedules, etc.

But for many requests, fleetctl returns:

Error retrieving list of units from repository: googleapi: Error 503: fleet server unable to communicate with etcd

OR

2015/06/15 14:29:50 WARN fleetctl.go:772: Error retrieving Unit(worker.collector@x3.service) from Registry: Get http://domain-sock/fleet/v1/units/worker.collector%40x3.service?alt=json: forwarding request denied

etcd is healthy on all machines and answers correctly when I query it while the requests are being denied.
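
For reference, "healthy" here means spot checks like the following succeed even while fleet reports the error (a rough sketch using the stock etcd 2.x client tools; the key path is only an example):

$ etcdctl --peers http://127.0.0.1:4001 cluster-health
$ etcdctl --peers http://127.0.0.1:4001 ls /_coreos.com/fleet
$ curl -s http://127.0.0.1:4001/v2/keys/_coreos.com/fleet/machines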

Any ideas?

@tclavier

tclavier commented Jun 23, 2015

I have the same problem, and when I put fleet in debug mode I get this: https://gist.github.com/tclavier/3aa1cb84a229aedb53ab

Putting a value into the key space with etcdctl works fine.

@ghost

ghost commented Jul 1, 2015

I was getting a lot of the same messages; increasing etcd_request_timeout made them go away for me.
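
For anyone wanting to try the same change, here is a minimal sketch of one way to set it on CoreOS, assuming fleet picks up the FLEET_ETCD_REQUEST_TIMEOUT environment-variable form of etcd_request_timeout (the drop-in name and the value 3.0 are only examples):

$ sudo mkdir -p /etc/systemd/system/fleet.service.d
$ sudo tee /etc/systemd/system/fleet.service.d/10-etcd-timeout.conf <<'EOF'
[Service]
# seconds to wait for a single etcd request before fleet gives up
Environment="FLEET_ETCD_REQUEST_TIMEOUT=3.0"
EOF
$ sudo systemctl daemon-reload
$ sudo systemctl restart fleet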

@bcwaldon

bcwaldon commented Jul 9, 2015

@tclavier is it a timeout issue? Are you able to reproduce the error if you raise the etcd_request_timeout in both fleetctl and fleetd?
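
On the fleetctl side, the corresponding knob should be the --request-timeout flag (assuming a fleetctl build that exposes it), for example:

$ fleetctl --request-timeout=10 list-units
$ fleetctl --request-timeout=10 list-unit-files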

@tclavier

tclavier commented Jul 10, 2015

For the last test I used:

  • etcd v2.0.13
  • fleet master (with the native etcd client library)
  • etcd_request_timeout set to 2

The errors are the same:

fleet server unable to communicate with etcd

@SkyWriter

SkyWriter commented Aug 7, 2015

Looks like I've got the same issue on my side. I keep on getting INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled messages, and at some point Fleet randomly restarts units:

06:38:48 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:38:48 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:38:48 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:38:48 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:39:45 host fleetd[23463]: WARN job.go:272: No Unit found in Registry for Job(consul-registrator.service)
06:39:45 host fleetd[23463]: ERROR job.go:109: Failed to parse Unit from etcd: unable to parse Unit in Registry at key /_coreos.com/fleet/job/consul-registrator.service/object
06:39:45 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:39:46 host fleetd[23463]: INFO manager.go:138: Triggered systemd unit consul-registrator.service stop: job=11845869
06:39:46 host fleetd[23463]: INFO manager.go:259: Removing systemd unit consul-registrator.service
06:39:47 host fleetd[23463]: INFO manager.go:182: Instructing systemd to reload units
06:39:52 host fleetd[23463]: INFO reconcile.go:330: AgentReconciler completed task: type=UnloadUnit job=consul-registrator.service reason="unit loaded but not scheduled here"
06:39:52 host fleetd[23463]: INFO reconcile.go:330: AgentReconciler completed task: type=ReloadUnitFiles job=N/A reason="always reload unit files"
06:39:52 host fleetd[23463]: WARN reconcile.go:59: AgentReconciler completed reconciliation in 8.692721762s
06:39:52 host fleetd[23463]: INFO manager.go:246: Writing systemd unit consul-registrator.service (618b)
06:39:52 host fleetd[23463]: INFO manager.go:182: Instructing systemd to reload units
06:39:55 host fleetd[23463]: INFO manager.go:127: Triggered systemd unit consul-registrator.service start: job=11846043
06:39:55 host fleetd[23463]: INFO reconcile.go:330: AgentReconciler completed task: type=LoadUnit job=consul-registrator.service reason="unit scheduled here but not loaded"
06:39:55 host fleetd[23463]: INFO reconcile.go:330: AgentReconciler completed task: type=ReloadUnitFiles job=N/A reason="always reload unit files"
06:39:55 host fleetd[23463]: INFO reconcile.go:330: AgentReconciler completed task: type=StartUnit job=consul-registrator.service reason="unit currently loaded but desired state is launched"
06:40:03 host fleetd[23463]: WARN reconcile.go:59: AgentReconciler completed reconciliation in 6.414672939s
06:40:30 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:30 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:30 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:30 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:45 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:45 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:45 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled

Running:

  • etcd Version: 2.0.12
  • fleetd version 0.10.2

Timeouts for etcd2:

  • heartbeat-interval: 200
  • election-timeout: 2000

And for fleet:

  • etcd_request_timeout: 3

Happens on just one out of three machines in the cluster.
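
For reference, those two etcd2 settings map onto environment variables, so they can be raised with a systemd drop-in; a minimal sketch, using the more generous values from the workaround commit referenced at the end of this thread (adjust to your disks and network):

$ sudo mkdir -p /etc/systemd/system/etcd2.service.d
$ sudo tee /etc/systemd/system/etcd2.service.d/10-tuning.conf <<'EOF'
[Service]
# both values are in milliseconds; longer intervals tolerate slow I/O at the cost of slower failover
Environment="ETCD_HEARTBEAT_INTERVAL=600"
Environment="ETCD_ELECTION_TIMEOUT=6000"
EOF
$ sudo systemctl daemon-reload
$ sudo systemctl restart etcd2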

@tadas-subonis

tadas-subonis commented Sep 3, 2015

Same here.

Sep 03 14:49:06 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:49:06 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:50:51 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:50:51 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:50:51 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:51:21 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:51:21 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:51:21 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled

Also excessive CPU and RAM usage: CPU goes from 10% to 40%, and RAM sits at around 15% of 2 GB.

@bronger

bronger commented Sep 5, 2015

I got rid of the original poster's first error message by fixing the dependencies in my unit files: I had made them depend on etcd even though etcd2 is what is actually running.

I get the second error message, for example,

2015/09/05 13:07:45 WARN fleetctl.go:797: Error retrieving Unit(memcached.service) from Registry: Get http://domain-sock/fleet/v1/units/memcached.service?alt=json: forwarding request denied

only if I try to start a new unit using fleetctl through a tunnel, with the unit file residing on the host system. Executing it within the VM that's running CoreOS is fine.
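
In other words, unit files that still reference the legacy etcd unit need to point at etcd2 instead; a minimal sketch of the relevant [Unit] section (memcached.service is just the example from above, and your dependency list will differ):

[Unit]
Description=memcached
# depend on the etcd2 unit that is actually running, not the legacy etcd unit
Requires=etcd2.service
After=etcd2.service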

@l15k4

l15k4 commented Sep 7, 2015

I have the same issue as @bronger on Vagrant. Trying to load services via --tunnel hangs forever with these warnings:

Error retrieving Unit(es-data@1.service) from Registry: Get http://domain-sock/fleet/v1/units/es-data%401.service?alt=json: forwarding request denied

This is how I start up Vagrant: https://github.com/l15k4/coreos-es/blob/master/bin/start-vagrant.sh#L47

And then I'm just loading and starting services: https://github.com/l15k4/coreos-es/blob/master/bin/start-vagrant.sh#L91

Using fleetctl --debug=true --strict-host-key-checking=false.

@bweston92

bweston92 commented Jan 21, 2016

@bronger any update as to why you got that error? I'm getting a similar one too.

$ fleetctl start app-dashboard@{1..4}
2016/01/21 10:29:12 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@1.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%401.service?alt=json: forwarding request denied
2016/01/21 10:29:13 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@2.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%402.service?alt=json: forwarding request denied
2016/01/21 10:29:13 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@1.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%401.service?alt=json: forwarding request denied
2016/01/21 10:29:13 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@3.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%403.service?alt=json: forwarding request denied
Unit app-dashboard@4.service inactive
Unit app-dashboard@2.service inactive
Unit app-dashboard@1.service inactive
Unit app-dashboard@3.service inactive
2016/01/21 10:29:14 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@2.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%402.service?alt=json: forwarding request denied
2016/01/21 10:29:14 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@3.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%403.service?alt=json: forwarding request denied
2016/01/21 10:29:14 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@4.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%404.service?alt=json: forwarding request denied

$ fleetctl -version
fleetctl version 0.11.5
@bronger

bronger commented Jan 21, 2016

I abandoned fleet for my workflow, so I cannot help on this.

@brandonkruse

brandonkruse commented Jan 22, 2016

@bweston92 We handled this by changing the timeout settings. See issue 1289 and specifically this comment: #1289 (comment)

dongsupark pushed a commit to dongsupark/fleet that referenced this issue Jul 8, 2016

Dongsu Park
functional: tune intervals in etcd config to avoid connectivity errors
Increase heartbeat-interval from 100 to 600, also election-timeout from
1000 to 6000. This is a workaround for avoiding occasional failures of
TestSingleNodeConnectivityLoss like below:

===
--- FAIL: TestSingleNodeConnectivityLoss (41.41s)
    connectivity-loss_test.go:98: Failed listing unit files:
        stdout:
        stderr: Error retrieving list of units from repository:
googleapi: Error 503: fleet server unable to  communicate with etcd

        err: exit status 1
===

See also coreos#1289 (comment)
Workaround for coreos#1250
