This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

Problem w/ Running fleetctl submit #1113

Closed
micahasmith opened this issue Jan 29, 2015 · 10 comments

Comments

@micahasmith

When I run submit on my local Vagrant CoreOS cluster, I get the following:

core@core-01 ~/share/containers/webapp-test $ fleetctl submit webapp-test.service
2015/01/29 01:13:55 INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
2015/01/29 01:13:55 ERROR client.go:200: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 100ms
2015/01/29 01:13:55 INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
2015/01/29 01:13:55 ERROR client.go:200: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 200ms
2015/01/29 01:13:55 INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
2015/01/29 01:13:55 ERROR client.go:200: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 400ms
2015/01/29 01:13:56 INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
2015/01/29 01:13:56 ERROR client.go:200: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 800ms
2015/01/29 01:13:56 INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
2015/01/29 01:13:56 ERROR client.go:200: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 1s
2015/01/29 01:13:57 INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
2015/01/29 01:13:57 ERROR client.go:200: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 1s
2015/01/29 01:13:58 ERROR fleetctl.go:171: error attempting to check latest fleet version in Registry: timeout reached
2015/01/29 01:13:58 INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
2015/01/29 01:13:58 ERROR client.go:200: Unable to get result for {Get /_coreos.com/fleet/job/webapp-test.service}, retrying in 100ms
2015/01/29 01:13:58 INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
2015/01/29 01:13:58 ERROR client.go:200: Unable to get result for {Get /_coreos.com/fleet/job/webapp-test.service}, retrying in 200ms
2015/01/29 01:13:58 INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
2015/01/29 01:13:58 ERROR client.go:200: Unable to get result for {Get /_coreos.com/fleet/job/webapp-test.service}, retrying in 400ms
2015/01/29 01:13:59 INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
2015/01/29 01:13:59 ERROR client.go:200: Unable to get result for {Get /_coreos.com/fleet/job/webapp-test.service}, retrying in 800ms
2015/01/29 01:13:59 INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
2015/01/29 01:13:59 ERROR client.go:200: Unable to get result for {Get /_coreos.com/fleet/job/webapp-test.service}, retrying in 1s
2015/01/29 01:14:00 INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
2015/01/29 01:14:00 ERROR client.go:200: Unable to get result for {Get /_coreos.com/fleet/job/webapp-test.service}, retrying in 1s
Error creating units: error retrieving Unit(webapp-test.service) from Registry: timeout reached

I have a separate script to test etcd, so while the above looks like an etcd issue, my test script confirms that putting and getting keys works against 127.0.0.1:4001.

The only thing different about my setup is that I'm configuring the fleet etcd-* security settings in my cloud-config. I'm not sure whether that could be part of the issue (although when I remove them I get cert errors, as expected).

Any ideas?

Things I Have Tried

  • setting the FLEET_ETCD_* env vars manually -- same error
  • removing the etcd-* settings from the user-data.coreos.fleet config settings -- same error
@bcwaldon
Contributor

bcwaldon commented Feb 9, 2015

It looks like your etcd cluster is running with TLS enabled. You'll need to configure fleetctl using the FLEETCTL_ environment variables accordingly.
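
For anyone reading the log above: the bytes quoted in the "malformed HTTP response" message can be read as the start of a TLS record, which is consistent with a plain-HTTP client talking to a TLS-enabled server. A minimal sketch of that decoding, assuming the six bytes are a five-byte TLS record header plus the first payload byte (the variable names are illustrative only):

```shell
#!/bin/sh
# The six bytes quoted in the "malformed HTTP response" log lines, as hex.
record="15 03 01 00 02 02"

# Split into fields: content type, version (2 bytes), length (2 bytes),
# and the first payload byte.
set -- $record
content_type=$1
version="$2$3"
length="$4$5"
alert_level=$6

# TLS record content types, as defined in the TLS specification.
case "$content_type" in
  14) kind="change_cipher_spec" ;;
  15) kind="alert" ;;
  16) kind="handshake" ;;
  17) kind="application_data" ;;
  *)  kind="not a TLS record" ;;
esac

# For an alert record, the first payload byte is the alert level.
case "$alert_level" in
  01) level="warning" ;;
  02) level="fatal" ;;
  *)  level="unknown" ;;
esac

echo "content type: 0x$content_type ($kind)"
echo "version:      0x$version (0x0301 is the TLS 1.0 record version)"
echo "length:       $((0x$length)) bytes"
echo "alert level:  $level"
```

If the record type comes out as "alert", the server answered the plain-text request with a TLS alert rather than an HTTP response, so the endpoint most likely needs to be reached over https.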

@micahasmith
Author

If providing the etcd-* settings in the user-data file implies TLS, shouldn't it automatically switch to using TLS when those config values are set?

https://github.com/coreos/fleet/blob/master/Documentation/deployment-and-configuration.md#etcd_cafile-etcd_keyfile-etcd_certfile

My current settings:

fleet:
    public-ip: $public_ipv4
    etcd-cafile: /etc/etcd/printio.pem
    etcd-keyfile: /etc/etcd/printio.key.insecure
    etcd-certfile: /etc/etcd/printio.crt

Mind pointing me towards the specific TLS change I need to make in my env?

etcd-to-etcd communication works, just not fleet -> etcd.

@bcwaldon
Contributor

@micahasmith you're conflating the daemon fleetd and the command-line tool fleetctl. They do not share configuration. The cloud-config you have only configures fleetd. Since fleetctl still hits etcd directly (rather than going through an HTTP API hosted by fleetd), you need to configure fleetctl with the same information you gave to fleetd.

Looking at the help text for fleetctl:

fleetctl -h
NAME:
    fleetctl - fleetctl is a command-line interface to fleet, the cluster-wide CoreOS init system.

USAGE:
    fleetctl [global options] <command> [command options] [arguments...]

VERSION:
    0.9.0+git

COMMANDS:
    cat     Output the contents of a submitted unit
    destroy     Destroy one or more units in the cluster
    fd-forward  Proxy stdin and stdout to a unix domain socket
    help        Show a list of commands or help for one command
    journal     Print the journal of a unit in the cluster to stdout
    list-machines   Enumerate the current hosts in the cluster
    list-unit-files List the units that exist in the cluster.
    list-units  List the current state of units in the cluster
    load        Schedule one or more units in the cluster, first submitting them if necessary.
    ssh     Open interactive shell on a machine in the cluster
    start       Instruct systemd to start one or more units in the cluster, first submitting and loading if necessary.
    status      Output the status of one or more units in the cluster
    stop        Instruct systemd to stop one or more units in the cluster.
    submit      Upload one or more units to the cluster without starting them
    unload      Unschedule one or more units in the cluster.
    verify      DEPRECATED - No longer works
    version     Print the version and exit

GLOBAL OPTIONS:
    --ca-file=                  Location of TLS CA file used to secure communication with the fleet API or etcd
    --cert-file=                    Location of TLS cert file used to secure communication with the fleet API or etcd
    --debug=false                   Print out more debug information to stderr
    --driver=etcd                   Adapter used to execute fleetctl commands. Options include "API" and "etcd".
    --endpoint=http://127.0.0.1:4001        Location of the fleet API if --driver=API. Alternatively, if --driver=etcd, location of the etcd API.
    --etcd-key-prefix=/_coreos.com/fleet/       Keyspace for fleet data in etcd (development use only!)
    -h=false                    Print usage information and exit
    --help=false                    Print usage information and exit
    --key-file=                 Location of TLS key file used to secure communication with the fleet API or etcd
    --known-hosts-file=~/.fleetctl/known_hosts  File used to store remote machine fingerprints. Ignored if strict host key checking is disabled.
    --request-timeout=3             Amount of time in seconds to allow a single request before considering it failed.
    --ssh-timeout=10                Amount of time in seconds to allow for SSH connection initialization before failing.
    --ssh-username=core             Username to use when connecting to CoreOS instance.
    --strict-host-key-checking=true         Verify host keys presented by remote machines before initiating SSH connections.
    --tunnel=                   Establish an SSH tunnel through the provided address for communication with fleet and etcd.
    --version=false                 Print the version and exit

Global options can also be configured via upper-case environment variables prefixed with "FLEETCTL_"
For example, "some-flag" => "FLEETCTL_SOME_FLAG"

Run "fleetctl help <command>" for more details on a specific command.

...there seem to be three relevant flags you can set: --ca-file, --cert-file and --key-file. The blurb at the bottom instructs you how to use environment variables.
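
As a rough sketch of that mapping, the rule from the help text ("some-flag" => "FLEETCTL_SOME_FLAG") can be applied mechanically. The helper name `flag_to_env` is hypothetical, and the paths and endpoint are simply the ones mentioned in this thread, so adjust them for your own cluster:

```shell
#!/bin/sh
# Convert a fleetctl global flag name into the matching environment variable,
# following the rule in the help text: "some-flag" => "FLEETCTL_SOME_FLAG".
# flag_to_env is a hypothetical helper, not part of fleetctl.
flag_to_env() {
  printf 'FLEETCTL_%s\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

# Export the TLS settings so that later fleetctl calls in this shell pick them up.
# The values below are taken from this thread and are assumptions for any other setup.
export "$(flag_to_env endpoint)=https://127.0.0.1:4001"
export "$(flag_to_env ca-file)=/etc/etcd/printio.pem"
export "$(flag_to_env cert-file)=/etc/etcd/printio.crt"
export "$(flag_to_env key-file)=/etc/etcd/printio.key.insecure"

# Show what was set.
env | grep '^FLEETCTL_' | sort

# With the variables exported, a plain invocation should use them (not run here):
# fleetctl list-machines
```

Setting the variables once per shell avoids repeating the three TLS flags and the endpoint on every fleetctl command.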

@micahasmith
Author

Very true.

If I run:

fleetctl \
    --debug=true \
    --etcd-keyfile=/etc/etcd/printio.key.insecure  \
    --etcd-certfile=/etc/etcd/printio.crt \
    --etcd-cafile=/etc/etcd/printio.pem \
    --endpoint=https://127.0.0.1:4001 \
    status

I get:

core@core-01 ~ $ fleetctl \
> --debug=true \
> --etcd-keyfile=/etc/etcd/printio.key.insecure  \
> --etcd-certfile=/etc/etcd/printio.crt \
> --etcd-cafile=/etc/etcd/printio.pem \
> --endpoint=https://127.0.0.1:4001 \
> status
2015/02/10 20:33:42 INFO client.go:353: etcd: sending HTTP request GET https://127.0.0.1:4001/v2/keys/_coreos.com/fleet/machines?consistent=true&recursive=true&sorted=true
2015/02/10 20:33:42 INFO client.go:356: etcd: recv error response from GET https://127.0.0.1:4001/v2/keys/_coreos.com/fleet/machines?consistent=true&recursive=true&sorted=true: dial tcp 127.0.0.1:4001: connection refused
2015/02/10 20:33:42 INFO client.go:278: Failed getting response from https://127.0.0.1:4001/: dial tcp 127.0.0.1:4001: connection refused
2015/02/10 20:33:42 ERROR client.go:200: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 100ms
2015/02/10 20:33:42 INFO client.go:353: etcd: sending HTTP request GET https://127.0.0.1:4001/v2/keys/_coreos.com/fleet/machines?consistent=true&recursive=true&sorted=true

But if I run the following curl command, everything works:

curl --key /etc/etcd/printio.key.insecure --cert /etc/etcd/printio.crt --cacert /etc/etcd/printio.pem -L https://127.0.0.1:4001/v2/keys/foo -XPUT -d value=bar -v

@bcwaldon
Contributor

@micahasmith It's not entirely obvious why you would get "connection refused" from fleetctl, but not from curl. Are you running those two commands from the exact same shell? Can you share the output of curl -v?

@micahasmith
Author

Well, it was working, but I did a vagrant destroy, a box update, and a vagrant up, and now I'm getting connection refused on my etcd curl test as well.

@bcwaldon
Contributor

Is this a multinode etcd cluster? If so, did you use a new discovery token when you brought the cluster back up?

@micahasmith
Author

No, I hadn't. Here are the etcd curl -v results:

core@core-01 ~ $ . share/tests/etcdctrl-works.sh
-bash: $'\r': command not found
* About to connect() to 127.0.0.1 port 4001 (#0)
*   Trying 127.0.0.1...
* Adding handle: conn: 0x7f3205b80570
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7f3205b80570) send_pipe: 1, recv_pipe: 0
* Connected to 127.0.0.1 (127.0.0.1) port 4001 (#0)
* successfully set certificate verify locations:
*   CAfile: /etc/etcd/printio.pem
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Request CERT (13):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS handshake, CERT verify (15):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using AES256-SHA
* Server certificate:
*        subject: C=USA; O=etcd-ca; OU=printio; CN=127.0.0.1
*        start date: 2014-12-19 21:15:58 GMT
*        expire date: 2024-12-19 21:16:08 GMT
*        subjectAltName: 127.0.0.1 matched
*        issuer: C=USA; O=etcd-ca; OU=CA
*        SSL certificate verify ok.
> PUT /v2/keys/foo HTTP/1.1
> User-Agent: curl/7.30.0
> Host: 127.0.0.1:4001
> Accept: */*
> Content-Length: 9
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 9 out of 9 bytes
< HTTP/1.1 201 Created
< Content-Type: application/json
< X-Etcd-Index: 3
< X-Raft-Index: 208
< X-Raft-Term: 0
< Date: Wed, 18 Feb 2015 17:24:24 GMT
< Transfer-Encoding: chunked
<
* Connection #0 to host 127.0.0.1 left intact
{"action":"set","node":{"key":"/foo","value":"bar","modifiedIndex":3,"createdIndex":3}}
core@core-01 ~ $
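
A side note on the first line of that output: "-bash: $'\r': command not found" usually means the script contains carriage returns, which would be consistent with a file saved with Windows-style (CRLF) line endings. It does not stop the curl call here, but it is easy to clean up. A small sketch using throwaway files (the file names are illustrative; the real script in this thread is share/tests/etcdctrl-works.sh):

```shell
#!/bin/sh
# Create a throwaway script with CRLF line endings to reproduce the symptom.
tmpdir=$(mktemp -d)
printf 'echo hello\r\n\r\n' > "$tmpdir/crlf.sh"

# Strip the carriage returns. tr is used here because dos2unix may not be installed.
tr -d '\r' < "$tmpdir/crlf.sh" > "$tmpdir/lf.sh"

# Count the lines that still contain a carriage return in each file.
# grep exits non-zero when nothing matches, hence the "|| true".
cr=$(printf '\r')
before=$(grep -c "$cr" "$tmpdir/crlf.sh" || true)
after=$(grep -c "$cr" "$tmpdir/lf.sh" || true)

echo "lines with CR before: $before"
echo "lines with CR after:  $after"

rm -rf "$tmpdir"
```

The same tr invocation can be pointed at the real script, writing to a new file and then replacing the original once the result looks right.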

fleetctl test

I've updated my fleetctl variables since those seemed to change in v0.9. Here is the command and the response from it:

core@core-01 ~ $ fleetctl \
> --debug=true \
> --key-file=/etc/etcd/printio.key.insecure  \
> --cert-file=/etc/etcd/printio.crt \
> --ca-file=/etc/etcd/printio.pem \
> --endpoint=https://127.0.0.1:4001 \
> status
2015/02/18 17:27:41 INFO client.go:366: etcd: sending HTTP request GET https://127.0.0.1:4001/v2/
.com/fleet/machines?consistent=true&recursive=true&sorted=true
2015/02/18 17:27:41 INFO client.go:373: etcd: recv response from GET https://127.0.0.1:4001/v2/ke
om/fleet/machines?consistent=true&recursive=true&sorted=true: 404 Not Found
2015/02/18 17:27:41 INFO client.go:366: etcd: sending HTTP request GET https://127.0.0.1:4001/v2/
.com/fleet/job?consistent=true&recursive=true&sorted=true
2015/02/18 17:27:41 INFO client.go:373: etcd: recv response from GET https://127.0.0.1:4001/v2/ke
om/fleet/job?consistent=true&recursive=true&sorted=true: 404 Not Found
2015/02/18 17:27:41 INFO client.go:366: etcd: sending HTTP request GET https://127.0.0.1:4001/v2/
.com/fleet/job?consistent=true&recursive=true&sorted=true
2015/02/18 17:27:41 INFO client.go:373: etcd: recv response from GET https://127.0.0.1:4001/v2/ke
om/fleet/job?consistent=true&recursive=true&sorted=true: 404 Not Found

@bcwaldon
Contributor

bcwaldon commented May 1, 2015

@micahasmith It's not clear to me why the etcd requests coming out of fleetctl don't have the required /v2/keys prefix on the fleet keys... is there anything else unique about your deployment?
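
One thing worth ruling out is that the pasted log lines were simply cut off at the terminal width: the "recv response" lines end in "/v2/ke" and resume with "om/fleet/...", which looks like a truncated "/v2/keys/_coreos.com/fleet/...". A sketch that rebuilds the expected URL for comparison, assuming the endpoint used in this thread and the default --etcd-key-prefix from the help text (`fleet_url` is a hypothetical helper):

```shell
#!/bin/sh
# Values taken from this thread: the endpoint passed to fleetctl and the
# default --etcd-key-prefix shown in the fleetctl help text.
endpoint="https://127.0.0.1:4001"
key_prefix="/_coreos.com/fleet/"

# Build the URL fleetctl is expected to request for a given key under the
# fleet keyspace, matching the full URL logged earlier in the thread.
# fleet_url is a hypothetical helper for illustration only.
fleet_url() {
  printf '%s/v2/keys%s%s?consistent=true&recursive=true&sorted=true\n' \
    "$endpoint" "$key_prefix" "$1"
}

fleet_url machines
fleet_url job

# To query etcd directly with the same certificates (not run here):
# curl --cacert /etc/etcd/printio.pem --cert /etc/etcd/printio.crt \
#      --key /etc/etcd/printio.key.insecure "$(fleet_url machines)"
```

If the direct query also returns 404, the fleet keys may not exist in etcd at all, which would point at fleetd rather than at fleetctl.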

@bcwaldon
Contributor

bcwaldon commented Jul 9, 2015

No updates in 2 months

@bcwaldon bcwaldon closed this as completed Jul 9, 2015