Failed getting response from http://127.0.0.1:4001/: cancelled #1250

Open
Vinceveve opened this issue Jun 15, 2015 · 11 comments

Comments

@Vinceveve Vinceveve commented Jun 15, 2015

Seven-machine cluster: 3 main etcd members and 4 etcd proxies, all listening on ports 4001 and 2379.

  • etcd: 2.0.10
  • fleet: 0.10.1
  • CoreOS: 681.0.0

On all machines I'm getting this:

fleetd[24283]: INFO client.go:292: Failed getting response from http://127.0.0.1:4001/: cancelled

The fleet agent itself is working: it reconciles, schedules, and so on.

But for many requests fleetctl returns:

Error retrieving list of units from repository: googleapi: Error 503: fleet server unable to communicate with etcd

OR

2015/06/15 14:29:50 WARN fleetctl.go:772: Error retrieving Unit(worker.collector@x3.service) from Registry: Get http://domain-sock/fleet/v1/units/worker.collector%40x3.service?alt=json: forwarding request denied

etcd is healthy on all machines and answers fine when I query it directly while the request is being denied.

Any idea?

@tclavier tclavier commented Jun 23, 2015

I have the same problem, and when I put fleet in debug mode I get this: https://gist.github.com/tclavier/3aa1cb84a229aedb53ab

And when I put a value into the key space with etcdctl, that works fine.

@ghost ghost commented Jul 1, 2015

I was getting a lot of the same messages; increasing etcd_request_timeout made them go away for me.
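
For reference, raising it looks roughly like this in fleet's config file (a sketch only; /etc/fleet/fleet.conf is the usual path, 5.0 seconds is just an illustrative value, and the same setting can also be passed as the FLEET_ETCD_REQUEST_TIMEOUT environment variable instead):

# /etc/fleet/fleet.conf
etcd_servers=["http://127.0.0.1:4001"]
# seconds fleet waits for an etcd response before cancelling the request
etcd_request_timeout=5.0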

@bcwaldon bcwaldon commented Jul 9, 2015

@tclavier is it a timeout issue? Are you able to reproduce the error if you raise the etcd_request_timeout in both fleetctl and fleetd?

@tclavier tclavier commented Jul 10, 2015

For my last test I used:

  • etcd v2.0.13
  • fleet master (with the native etcd client library)
  • etcd_request_timeout set to 2

The errors are the same:

fleet server unable to communicate with etcd

@SkyWriter SkyWriter commented Aug 7, 2015

Looks like I've got the same issue on my side. I keep on getting INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled messages, and at some point Fleet randomly restarts units:

06:38:48 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:38:48 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:38:48 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:38:48 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:39:45 host fleetd[23463]: WARN job.go:272: No Unit found in Registry for Job(consul-registrator.service)
06:39:45 host fleetd[23463]: ERROR job.go:109: Failed to parse Unit from etcd: unable to parse Unit in Registry at key /_coreos.com/fleet/job/consul-registrator.service/object
06:39:45 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:39:46 host fleetd[23463]: INFO manager.go:138: Triggered systemd unit consul-registrator.service stop: job=11845869
06:39:46 host fleetd[23463]: INFO manager.go:259: Removing systemd unit consul-registrator.service
06:39:47 host fleetd[23463]: INFO manager.go:182: Instructing systemd to reload units
06:39:52 host fleetd[23463]: INFO reconcile.go:330: AgentReconciler completed task: type=UnloadUnit job=consul-registrator.service reason="unit loaded but not scheduled here"
06:39:52 host fleetd[23463]: INFO reconcile.go:330: AgentReconciler completed task: type=ReloadUnitFiles job=N/A reason="always reload unit files"
06:39:52 host fleetd[23463]: WARN reconcile.go:59: AgentReconciler completed reconciliation in 8.692721762s
06:39:52 host fleetd[23463]: INFO manager.go:246: Writing systemd unit consul-registrator.service (618b)
06:39:52 host fleetd[23463]: INFO manager.go:182: Instructing systemd to reload units
06:39:55 host fleetd[23463]: INFO manager.go:127: Triggered systemd unit consul-registrator.service start: job=11846043
06:39:55 host fleetd[23463]: INFO reconcile.go:330: AgentReconciler completed task: type=LoadUnit job=consul-registrator.service reason="unit scheduled here but not loaded"
06:39:55 host fleetd[23463]: INFO reconcile.go:330: AgentReconciler completed task: type=ReloadUnitFiles job=N/A reason="always reload unit files"
06:39:55 host fleetd[23463]: INFO reconcile.go:330: AgentReconciler completed task: type=StartUnit job=consul-registrator.service reason="unit currently loaded but desired state is launched"
06:40:03 host fleetd[23463]: WARN reconcile.go:59: AgentReconciler completed reconciliation in 6.414672939s
06:40:30 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:30 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:30 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:30 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:45 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:45 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:45 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled

Running:

  • etcd Version: 2.0.12
  • fleetd version 0.10.2

Timeouts for etcd2:

  • heartbeat-interval: 200
  • election-timeout: 2000

And for fleet:

  • etcd_request_timeout: 3

It happens on just one of the three machines in the cluster.
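
In cloud-config terms, those settings look roughly like this (a sketch only, not the exact file from this cluster; the cloud-config keys use dashes and map onto the ETCD_*/FLEET_* environment variables, with the etcd2 values in milliseconds and the fleet timeout in seconds):

#cloud-config
coreos:
  etcd2:
    heartbeat-interval: 200
    election-timeout: 2000
  fleet:
    etcd-request-timeout: 3.0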

@tadas-subonis tadas-subonis commented Sep 3, 2015

Same here.

Sep 03 14:49:06 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:49:06 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:50:51 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:50:51 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:50:51 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:51:21 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:51:21 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:51:21 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled

Also excessive CPU and RAM usage: CPU goes from 10% to 40%, and RAM sits around 15% of 2 GB.

@bronger bronger commented Sep 5, 2015

I got rid of the original poster's first error message by fixing the dependencies in my unit files: I had made them depend on etcd, although etcd2 is the daemon that is actually running.
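
Concretely, that means pointing the units at etcd2.service instead of etcd.service, something like this (a sketch with a placeholder unit; adjust Requires/After to whatever your unit actually needs):

[Unit]
Description=Example unit that talks to etcd
# before: Requires=etcd.service / After=etcd.service
Requires=etcd2.service
After=etcd2.service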

I get the second error message, for example,

2015/09/05 13:07:45 WARN fleetctl.go:797: Error retrieving Unit(memcached.service) from Registry: Get http://domain-sock/fleet/v1/units/memcached.service?alt=json: forwarding request denied

only if I try to start a new unit using fleetctl through a tunnel, with the unit file residing on the host system. Executing it within the VM that's running CoreOS is fine.

@l15k4 l15k4 commented Sep 7, 2015

I have the same issue as @bronger on Vagrant. Trying to load services via --tunnel hangs indefinitely with these warnings:

Error retrieving Unit(es-data@1.service) from Registry: Get http://domain-sock/fleet/v1/units/es-data%401.service?alt=json: forwarding request denied

This is how I start Vagrant: https://github.com/l15k4/coreos-es/blob/master/bin/start-vagrant.sh#L47

And then I just load and start the services:

https://github.com/l15k4/coreos-es/blob/master/bin/start-vagrant.sh#L91

Using fleetctl --debug=true --strict-host-key-checking=false.

@bweston92 bweston92 commented Jan 21, 2016

@bronger any update as to why you got that error? I'm getting a similar one too.

$ fleetctl start app-dashboard@{1..4}
2016/01/21 10:29:12 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@1.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%401.service?alt=json: forwarding request denied
2016/01/21 10:29:13 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@2.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%402.service?alt=json: forwarding request denied
2016/01/21 10:29:13 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@1.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%401.service?alt=json: forwarding request denied
2016/01/21 10:29:13 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@3.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%403.service?alt=json: forwarding request denied
Unit app-dashboard@4.service inactive
Unit app-dashboard@2.service inactive
Unit app-dashboard@1.service inactive
Unit app-dashboard@3.service inactive
2016/01/21 10:29:14 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@2.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%402.service?alt=json: forwarding request denied
2016/01/21 10:29:14 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@3.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%403.service?alt=json: forwarding request denied
2016/01/21 10:29:14 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@4.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%404.service?alt=json: forwarding request denied

$ fleetctl -version
fleetctl version 0.11.5

@bronger bronger commented Jan 21, 2016

I abandoned fleet for my workflow, so I cannot help on this.

@brandonkruse brandonkruse commented Jan 22, 2016

@bweston92 We handled this by changing the timeout settings. See issue 1289 and specifically this comment: #1289 (comment)

dongsupark pushed a commit to dongsupark/fleet that referenced this issue Jul 8, 2016
Increase heartbeat-interval from 100 to 600, also election-timeout from
1000 to 6000. This is a workaround for avoiding occasional failures of
TestSingleNodeConnectivityLoss like below:

===
--- FAIL: TestSingleNodeConnectivityLoss (41.41s)
    connectivity-loss_test.go:98: Failed listing unit files:
        stdout:
        stderr: Error retrieving list of units from repository:
googleapi: Error 503: fleet server unable to communicate with etcd

        err: exit status 1
===

See also coreos#1289 (comment)
Workaround for coreos#1250
dongsupark pushed commits to endocode/fleet that referenced this issue Jul 11–12, 2016 (same commit message as above)