Skip to content
This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

Something wrong with fleet 0.3.1 in CoreOS master 315.0.0+2014-05-13-2126 #452

Closed
YungSang opened this issue May 13, 2014 · 53 comments
Closed

Comments

@YungSang
Copy link

core@core-3 ~ $ fleetctl list-units
UNIT                STATE       LOAD    ACTIVE  SUB DESC            MACHINE
etcd-amb-redis.service      launched    loaded  active  running Ambassador on A     d8695a82.../192.168.65.4
etcd-amb-redis2.service     launched    loaded  active  running Ambassador on B     7382eb69.../192.168.65.2
redis-demo.service      inactive    -   -   -   Redis on A      -
redis-docker-reg.service    launched    loaded  active  running Register on A       d8695a82.../192.168.65.4
redis-dyn-amb.service       launched    loaded  active  running Etcd Ambassador on B    7382eb69.../192.168.65.2
core@core-3 ~ $ systemctl status redis-demo.service -l
● redis-demo.service - Redis on A
   Loaded: loaded (/run/fleet/units/redis-demo.service; linked-runtime)
   Active: active (running) since Tue 2014-05-13 23:01:36 UTC; 13min ago
  Process: 13338 ExecStartPre=/usr/bin/docker pull crosbymichael/redis (code=exited, status=0/SUCCESS)
 Main PID: 13610 (docker)
   CGroup: /system.slice/redis-demo.service
           └─13610 /usr/bin/docker run --rm --name redis-demo.service -p 192.168.65.4::6379 crosbymichael/redis

May 13 23:01:36 core-3 docker[13610]: `-._    `-._`-.__.-'_.-'    _.-'
May 13 23:01:36 core-3 docker[13610]: |`-._`-._    `-.__.-'    _.-'_.-'|
May 13 23:01:36 core-3 docker[13610]: |    `-._`-._        _.-'_.-'    |
May 13 23:01:36 core-3 docker[13610]: `-._    `-._`-.__.-'_.-'    _.-'
May 13 23:01:36 core-3 docker[13610]: `-._    `-.__.-'    _.-'
May 13 23:01:36 core-3 docker[13610]: `-._        _.-'
May 13 23:01:36 core-3 docker[13610]: `-.__.-'
May 13 23:01:36 core-3 docker[13610]: [1] 13 May 23:01:36.592 # Server started, Redis version 2.8.8
May 13 23:01:36 core-3 docker[13610]: [1] 13 May 23:01:36.592 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
May 13 23:01:36 core-3 docker[13610]: [1] 13 May 23:01:36.593 * The server is now ready to accept connections on port 6379
@YungSang
Copy link
Author

It happenes with fleet 0.3.0 in CoreOS alpha 315.0.0, too.

@bcwaldon
Copy link
Contributor

@YungSang You need to give me a bit more of a hint here. What problem are you seeing?

@YungSang
Copy link
Author

It works fine with Ambassador Pattern and I can connect to a redis server from redis client in another container. But fleetctl list-units says,

redis-demo.service      inactive    -   -   -   Redis on A      -

@bcwaldon
Copy link
Contributor

@YungSang How did you get to this point? Did you actually call fleetctl start redis-demo.service? You're not giving me a lot to go on here...

@YungSang
Copy link
Author

Yes, I did. So I can connect to the redis server from a client.
Like this http://coreos.com/blog/docker-dynamic-ambassador-powered-by-etcd/
CoreOS beta and before was fine.

@bcwaldon
Copy link
Contributor

@YungSang Start from the beginning. What did the cluster look like initially? Did you do a rolling upgrade to a newer version of CoreOS? What happens if you call fleetctl start redis-demo.service now?

@YungSang
Copy link
Author

systemctl says 'redis-demo.service` is running, but

$ fleetctl status redis-demo.service
Job redis-demo.service does not appear to be running.

@YungSang
Copy link
Author

Actually I did exactly same as http://coreos.com/blog/docker-dynamic-ambassador-powered-by-etcd/.
https://github.com/YungSang/coreos-cluster#ambassador-pattern

$ fleetctl start redis-demo.service
Timed out waiting for job redis-demo.service to report state launched

Still

redis-demo.service      inactive    -   -   -   Redis on A      -

Waiting.

@bcwaldon
Copy link
Contributor

@YungSang Do you get any useful information if you run fleetctl --debug start redis-demo.service ?

@YungSang
Copy link
Author

Here it is.

$ fleetctl --debug start redis-demo.service
I0513 16:31:35.874488 12885 fleetctl.go:359] Found Job(redis-demo.service) in Registry, no need to recreate it
I0513 16:31:35.879512 12885 fleetctl.go:421] Setting Job(redis-demo.service) target state to launched
Timed out waiting for job redis-demo.service to report state launched

@bcwaldon
Copy link
Contributor

@YungSang The output of the following commands will be useful:

fleetctl list-machines
etcdctl get /_coreos.com/fleet/job/redis-demo.service/target
etcdctl get /_coreos.com/fleet/job/redis-demo.service/target-state

@YungSang
Copy link
Author

$ fleetctl list-machines
MACHINE     IP      METADATA
7382eb69... 192.168.65.2    -
0da32005... 192.168.65.3    -
d8695a82... 192.168.65.4    -
$ etcdctl get /_coreos.com/fleet/job/redis-demo.service/target
d8695a82ad9f4f3497c4910e7cae34ea
$ etcdctl get /_coreos.com/fleet/job/redis-demo.service/target-state
launched

@bcwaldon
Copy link
Contributor

@YungSang Can you check the fleet logs on d8695a82ad9f4f3497c4910e7cae34ea?

@YungSang
Copy link
Author

@bcwaldon How to do that?

@bcwaldon
Copy link
Contributor

@YungSang Use fleetctl ssh d8695a82ad9f4f3497c4910e7cae34ea to open an ssh connection to the machine. Then you can use journalctl -u fleet -e -l to inspect the log.

@YungSang
Copy link
Author

-- Logs begin at Tue 2014-05-13 22:58:06 UTC, end at Tue 2014-05-13 23:35:36 UTC. --
May 13 22:58:17 core-3 systemd[1]: Starting fleet...
May 13 22:58:17 core-3 systemd[1]: Started fleet.
May 13 22:58:17 core-3 fleet[5446]: I0513 22:58:17.615999 05446 fleet.go:143] No provided or default config file found - proceeding without
May 13 22:58:17 core-3 fleet[5446]: I0513 22:58:17.628957 05446 agent.go:104] Initializing Agent
May 13 22:58:17 core-3 fleet[5446]: I0513 22:58:17.861967 05446 engine.go:49] Polling etcd for actionable Jobs
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.033411 05446 event.go:29] CommandLoadJob(etcd-amb-redis.service): publishing JobOffer
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.090469 05446 event.go:21] EventJobOffered(etcd-amb-redis.service): verifying ability to run Job
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.091175 05446 agent.go:448] Job(etcd-amb-redis.service) has requirements: map[ConditionMachineOf:[redis-demo.service]]
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.095790 05446 agent.go:467] Required Peer(redis-demo.service) of Job(etcd-amb-redis.service) is not scheduled locally
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.096178 05446 event.go:37] EventJobOffered(etcd-amb-redis.service): not all criteria met, not bidding
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.096566 05446 event.go:29] CommandLoadJob(etcd-amb-redis2.service): publishing JobOffer
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.132022 05446 event.go:29] CommandLoadJob(redis-demo.service): publishing JobOffer
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.195014 05446 event.go:21] EventJobOffered(etcd-amb-redis2.service): verifying ability to run Job
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.195609 05446 agent.go:448] Job(etcd-amb-redis2.service) has requirements: map[Conflicts:[redis-demo.service]]
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.195919 05446 event.go:41] EventJobOffered(etcd-amb-redis2.service): passed all criteria, submitting JobBid
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.196749 05446 agent.go:382] Submitting JobBid for Job(etcd-amb-redis2.service)
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.209322 05446 event.go:21] EventJobOffered(redis-demo.service): verifying ability to run Job
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.218033 05446 event.go:29] CommandLoadJob(redis-docker-reg.service): publishing JobOffer
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.223355 05446 agent.go:448] Job(redis-demo.service) has requirements: map[]
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.223994 05446 event.go:41] EventJobOffered(redis-demo.service): passed all criteria, submitting JobBid
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.224355 05446 agent.go:382] Submitting JobBid for Job(redis-demo.service)
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.279410 05446 event.go:84] EventJobBidSubmitted(etcd-amb-redis2.service): failed to schedule Job to Machine(7382eb69a03948cfab68f2f5808835c7)
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.317053 05446 event.go:84] EventJobBidSubmitted(etcd-amb-redis2.service): failed to schedule Job to Machine(0da320053b94479ba97b9f01ff5319b5)
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.319233 05446 event.go:84] EventJobBidSubmitted(etcd-amb-redis2.service): failed to schedule Job to Machine(d8695a82ad9f4f3497c4910e7cae34ea)
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.326767 05446 event.go:21] EventJobOffered(redis-docker-reg.service): verifying ability to run Job
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.327949 05446 agent.go:448] Job(redis-docker-reg.service) has requirements: map[ConditionMachineOf:[redis-demo.service]]
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.338599 05446 agent.go:467] Required Peer(redis-demo.service) of Job(redis-docker-reg.service) is not scheduled locally
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.339256 05446 event.go:37] EventJobOffered(redis-docker-reg.service): not all criteria met, not bidding
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.339729 05446 event.go:29] CommandLoadJob(redis-dyn-amb.service): publishing JobOffer
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.373194 05446 event.go:84] EventJobBidSubmitted(redis-demo.service): failed to schedule Job to Machine(7382eb69a03948cfab68f2f5808835c7)
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.374142 05446 event.go:84] EventJobBidSubmitted(redis-demo.service): failed to schedule Job to Machine(0da320053b94479ba97b9f01ff5319b5)
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.374481 05446 event.go:84] EventJobBidSubmitted(redis-demo.service): failed to schedule Job to Machine(d8695a82ad9f4f3497c4910e7cae34ea)
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.392021 05446 event.go:21] EventJobOffered(redis-dyn-amb.service): verifying ability to run Job
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.392434 05446 agent.go:448] Job(redis-dyn-amb.service) has requirements: map[ConditionMachineOf:[etcd-amb-redis2.service]]
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.397041 05446 agent.go:467] Required Peer(etcd-amb-redis2.service) of Job(redis-dyn-amb.service) is not scheduled locally
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.397928 05446 event.go:37] EventJobOffered(redis-dyn-amb.service): not all criteria met, not bidding
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.429644 05446 event.go:56] EventJobScheduled(etcd-amb-redis2.service): Job not scheduled to this Agent, purging related data from cache
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.430023 05446 event.go:59] EventJobScheduled(etcd-amb-redis2.service): Checking outstanding job offers
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.430136 05446 agent.go:448] Job(etcd-amb-redis.service) has requirements: map[ConditionMachineOf:[redis-demo.service]]
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.433185 05446 agent.go:467] Required Peer(redis-demo.service) of Job(etcd-amb-redis.service) is not scheduled locally
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.434254 05446 agent.go:448] Job(redis-docker-reg.service) has requirements: map[ConditionMachineOf:[redis-demo.service]]
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.436894 05446 agent.go:467] Required Peer(redis-demo.service) of Job(redis-docker-reg.service) is not scheduled locally
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.437219 05446 agent.go:448] Job(redis-dyn-amb.service) has requirements: map[ConditionMachineOf:[etcd-amb-redis2.service]]
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.438558 05446 agent.go:467] Required Peer(etcd-amb-redis2.service) of Job(redis-dyn-amb.service) is not scheduled locally
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.464095 05446 engine.go:109] Published JobOffer(redis-dyn-amb.service)
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.474001 05446 event.go:56] EventJobScheduled(redis-demo.service): Job not scheduled to this Agent, purging related data from cache
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.474419 05446 event.go:59] EventJobScheduled(redis-demo.service): Checking outstanding job offers
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.474617 05446 agent.go:448] Job(etcd-amb-redis.service) has requirements: map[ConditionMachineOf:[redis-demo.service]]
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.480680 05446 agent.go:467] Required Peer(redis-demo.service) of Job(etcd-amb-redis.service) is not scheduled locally
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.481110 05446 agent.go:448] Job(redis-docker-reg.service) has requirements: map[ConditionMachineOf:[redis-demo.service]]
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.485027 05446 agent.go:467] Required Peer(redis-demo.service) of Job(redis-docker-reg.service) is not scheduled locally
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.485494 05446 agent.go:448] Job(redis-dyn-amb.service) has requirements: map[ConditionMachineOf:[etcd-amb-redis2.service]]
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.489137 05446 event.go:21] EventJobOffered(redis-dyn-amb.service): verifying ability to run Job
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.492421 05446 agent.go:467] Required Peer(etcd-amb-redis2.service) of Job(redis-dyn-amb.service) is not scheduled locally
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.494021 05446 agent.go:448] Job(redis-dyn-amb.service) has requirements: map[ConditionMachineOf:[etcd-amb-redis2.service]]
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.498543 05446 agent.go:467] Required Peer(etcd-amb-redis2.service) of Job(redis-dyn-amb.service) is not scheduled locally
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.499923 05446 event.go:37] EventJobOffered(redis-dyn-amb.service): not all criteria met, not bidding
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.557245 05446 event.go:84] EventJobBidSubmitted(redis-dyn-amb.service): failed to schedule Job to Machine(7382eb69a03948cfab68f2f5808835c7)
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.620571 05446 event.go:56] EventJobScheduled(redis-dyn-amb.service): Job not scheduled to this Agent, purging related data from cache
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.621189 05446 event.go:59] EventJobScheduled(redis-dyn-amb.service): Checking outstanding job offers
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.621385 05446 agent.go:448] Job(etcd-amb-redis.service) has requirements: map[ConditionMachineOf:[redis-demo.service]]
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.624016 05446 agent.go:467] Required Peer(redis-demo.service) of Job(etcd-amb-redis.service) is not scheduled locally
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.625151 05446 agent.go:448] Job(redis-docker-reg.service) has requirements: map[ConditionMachineOf:[redis-demo.service]]
May 13 23:00:12 core-3 fleet[5446]: I0513 23:00:12.629313 05446 agent.go:467] Required Peer(redis-demo.service) of Job(redis-docker-reg.service) is not scheduled locally
May 13 23:00:45 core-3 fleet[5446]: I0513 23:00:45.970928 05446 event.go:130] EventJobUnscheduled(redis-demo.service): not scheduled here, ignoring
May 13 23:00:45 core-3 fleet[5446]: I0513 23:00:45.992506 05446 event.go:67] EventJobUnscheduled(redis-demo.service): publishing JobOffer
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.062013 05446 event.go:84] EventJobBidSubmitted(redis-dyn-amb.service): failed to schedule Job to Machine(7382eb69a03948cfab68f2f5808835c7)
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.077336 05446 event.go:21] EventJobOffered(redis-demo.service): verifying ability to run Job
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.078404 05446 agent.go:448] Job(redis-demo.service) has requirements: map[]
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.078727 05446 event.go:41] EventJobOffered(redis-demo.service): passed all criteria, submitting JobBid
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.078894 05446 agent.go:382] Submitting JobBid for Job(redis-demo.service)
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.186601 05446 event.go:84] EventJobBidSubmitted(redis-demo.service): failed to schedule Job to Machine(0da320053b94479ba97b9f01ff5319b5)
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.308230 05446 engine.go:139] Scheduled Job(redis-demo.service) to Machine(d8695a82ad9f4f3497c4910e7cae34ea)
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.309725 05446 event.go:64] EventJobScheduled(redis-demo.service): Job scheduled to this Agent
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.316446 05446 agent.go:448] Job(redis-demo.service) has requirements: map[]
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.316869 05446 event.go:84] EventJobScheduled(redis-demo.service): Loading Job
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.316916 05446 agent.go:290] Loading Job(redis-demo.service)
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.317148 05446 manager.go:198] Writing systemd unit redis-demo.service
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.361272 05446 event.go:87] EventJobScheduled(redis-demo.service): Bidding for all possible peers of Job
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.362658 05446 event.go:82] EventJobBidSubmitted(redis-demo.service): successfully scheduled Job to Machine(d8695a82ad9f4f3497c4910e7cae34ea)
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.369247 05446 agent.go:448] Job(etcd-amb-redis.service) has requirements: map[ConditionMachineOf:[redis-demo.service]]
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.370696 05446 agent.go:382] Submitting JobBid for Job(etcd-amb-redis.service)
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.415405 05446 agent.go:448] Job(redis-docker-reg.service) has requirements: map[ConditionMachineOf:[redis-demo.service]]
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.415955 05446 agent.go:382] Submitting JobBid for Job(redis-docker-reg.service)
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.458488 05446 event.go:84] EventJobBidSubmitted(etcd-amb-redis.service): failed to schedule Job to Machine(d8695a82ad9f4f3497c4910e7cae34ea)
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.467586 05446 event.go:95] EventJobScheduled(redis-demo.service): Starting Job
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.525985 05446 event.go:84] EventJobBidSubmitted(redis-docker-reg.service): failed to schedule Job to Machine(d8695a82ad9f4f3497c4910e7cae34ea)
May 13 23:00:46 core-3 fleet[5446]: I0513 23:00:46.984847 05446 event.go:163] EventUnitStateUpdated(redis-demo.service): pushing state (loadState=loaded, activeState=activating, subState=start-pre) to Registry
May 13 23:01:36 core-3 fleet[5446]: I0513 23:01:36.151341 05446 manager.go:142] Started systemd unit redis-demo.service(done)
May 13 23:01:36 core-3 fleet[5446]: I0513 23:01:36.156305 05446 event.go:64] EventJobScheduled(etcd-amb-redis.service): Job scheduled to this Agent
May 13 23:01:36 core-3 fleet[5446]: I0513 23:01:36.190242 05446 agent.go:448] Job(etcd-amb-redis.service) has requirements: map[ConditionMachineOf:[redis-demo.service]]
May 13 23:01:36 core-3 fleet[5446]: I0513 23:01:36.195082 05446 event.go:84] EventJobScheduled(etcd-amb-redis.service): Loading Job
May 13 23:01:36 core-3 fleet[5446]: I0513 23:01:36.195964 05446 agent.go:290] Loading Job(etcd-amb-redis.service)
May 13 23:01:36 core-3 fleet[5446]: I0513 23:01:36.196275 05446 manager.go:198] Writing systemd unit etcd-amb-redis.service
May 13 23:01:36 core-3 fleet[5446]: I0513 23:01:36.262240 05446 event.go:87] EventJobScheduled(etcd-amb-redis.service): Bidding for all possible peers of Job
May 13 23:01:36 core-3 fleet[5446]: I0513 23:01:36.264698 05446 event.go:95] EventJobScheduled(etcd-amb-redis.service): Starting Job
May 13 23:01:37 core-3 fleet[5446]: I0513 23:01:37.022166 05446 event.go:163] EventUnitStateUpdated(redis-demo.service): pushing state (loadState=loaded, activeState=active, subState=running) to Registry
May 13 23:01:37 core-3 fleet[5446]: I0513 23:01:37.022853 05446 event.go:163] EventUnitStateUpdated(etcd-amb-redis.service): pushing state (loadState=loaded, activeState=activating, subState=start-pre) to Registry
May 13 23:02:11 core-3 fleet[5446]: I0513 23:02:11.070605 05446 manager.go:142] Started systemd unit etcd-amb-redis.service(done)
May 13 23:02:11 core-3 fleet[5446]: I0513 23:02:11.070684 05446 event.go:64] EventJobScheduled(redis-docker-reg.service): Job scheduled to this Agent
May 13 23:02:11 core-3 fleet[5446]: I0513 23:02:11.083716 05446 agent.go:448] Job(redis-docker-reg.service) has requirements: map[ConditionMachineOf:[redis-demo.service]]
May 13 23:02:11 core-3 fleet[5446]: I0513 23:02:11.085751 05446 event.go:84] EventJobScheduled(redis-docker-reg.service): Loading Job
May 13 23:02:11 core-3 fleet[5446]: I0513 23:02:11.086137 05446 agent.go:290] Loading Job(redis-docker-reg.service)
May 13 23:02:11 core-3 fleet[5446]: I0513 23:02:11.086197 05446 manager.go:198] Writing systemd unit redis-docker-reg.service
May 13 23:02:11 core-3 fleet[5446]: I0513 23:02:11.107623 05446 event.go:87] EventJobScheduled(redis-docker-reg.service): Bidding for all possible peers of Job
May 13 23:02:11 core-3 fleet[5446]: I0513 23:02:11.111646 05446 event.go:95] EventJobScheduled(redis-docker-reg.service): Starting Job
May 13 23:02:12 core-3 fleet[5446]: I0513 23:02:12.040830 05446 event.go:163] EventUnitStateUpdated(etcd-amb-redis.service): pushing state (loadState=loaded, activeState=active, subState=running) to Registry
May 13 23:02:12 core-3 fleet[5446]: I0513 23:02:12.041262 05446 event.go:163] EventUnitStateUpdated(redis-docker-reg.service): pushing state (loadState=loaded, activeState=activating, subState=start-pre) to Registry
May 13 23:02:27 core-3 fleet[5446]: I0513 23:02:27.929087 05446 manager.go:142] Started systemd unit redis-docker-reg.service(done)
May 13 23:02:28 core-3 fleet[5446]: I0513 23:02:28.080846 05446 event.go:163] EventUnitStateUpdated(redis-docker-reg.service): pushing state (loadState=loaded, activeState=active, subState=running) to Registry
May 13 23:29:46 core-3 fleet[5446]: I0513 23:29:46.336111 05446 event.go:29] CommandLoadJob(redis-demo.service): publishing JobOffer
May 13 23:29:46 core-3 fleet[5446]: I0513 23:29:46.422553 05446 event.go:21] EventJobOffered(redis-demo.service): verifying ability to run Job
May 13 23:29:46 core-3 fleet[5446]: I0513 23:29:46.423571 05446 agent.go:448] Job(redis-demo.service) has requirements: map[]
May 13 23:29:46 core-3 fleet[5446]: I0513 23:29:46.423910 05446 event.go:41] EventJobOffered(redis-demo.service): passed all criteria, submitting JobBid
May 13 23:29:46 core-3 fleet[5446]: I0513 23:29:46.424125 05446 agent.go:382] Submitting JobBid for Job(redis-demo.service)
May 13 23:29:46 core-3 fleet[5446]: I0513 23:29:46.511768 05446 event.go:84] EventJobBidSubmitted(redis-demo.service): failed to schedule Job to Machine(d8695a82ad9f4f3497c4910e7cae34ea)
May 13 23:29:46 core-3 fleet[5446]: I0513 23:29:46.512201 05446 event.go:84] EventJobBidSubmitted(redis-demo.service): failed to schedule Job to Machine(0da320053b94479ba97b9f01ff5319b5)
May 13 23:31:35 core-3 fleet[5446]: I0513 23:31:35.131797 05446 event.go:29] CommandLoadJob(redis-demo.service): publishing JobOffer
May 13 23:31:35 core-3 fleet[5446]: I0513 23:31:35.229180 05446 event.go:21] EventJobOffered(redis-demo.service): verifying ability to run Job
May 13 23:31:35 core-3 fleet[5446]: I0513 23:31:35.230678 05446 agent.go:448] Job(redis-demo.service) has requirements: map[]
May 13 23:31:35 core-3 fleet[5446]: I0513 23:31:35.231221 05446 event.go:41] EventJobOffered(redis-demo.service): passed all criteria, submitting JobBid
May 13 23:31:35 core-3 fleet[5446]: I0513 23:31:35.231519 05446 agent.go:382] Submitting JobBid for Job(redis-demo.service)
May 13 23:31:35 core-3 fleet[5446]: I0513 23:31:35.311468 05446 event.go:84] EventJobBidSubmitted(redis-demo.service): failed to schedule Job to Machine(0da320053b94479ba97b9f01ff5319b5)
May 13 23:31:35 core-3 fleet[5446]: I0513 23:31:35.314599 05446 event.go:84] EventJobBidSubmitted(redis-demo.service): failed to schedule Job to Machine(d8695a82ad9f4f3497c4910e7cae34ea)

@bcwaldon
Copy link
Contributor

@YungSang Run this command:

etcdctl rm /_coreos.com/fleet/job/redis-demo.service/target

What does fleetctl list-units look like after that?

@YungSang
Copy link
Author

$ fleetctl list-units
UNIT                STATE       LOAD    ACTIVE  SUB DESC            MACHINE
etcd-amb-redis.service      launched    loaded  active  running Ambassador on A     d8695a82.../192.168.65.4
etcd-amb-redis2.service     launched    loaded  active  running Ambassador on B     7382eb69.../192.168.65.2
redis-demo.service      inactive    -   -   -   Redis on A      -
redis-docker-reg.service    launched    loaded  active  running Register on A       d8695a82.../192.168.65.4
redis-dyn-amb.service       launched    loaded  active  running Etcd Ambassador on B    7382eb69.../192.168.65.2
$ etcdctl rm /_coreos.com/fleet/job/redis-demo.service/target

$ fleetctl list-units
UNIT                STATE       LOAD    ACTIVE      SUB     DESC            MACHINE
etcd-amb-redis.service      launched    loaded  active      running     Ambassador on A     d8695a82.../192.168.65.4
etcd-amb-redis2.service     launched    loaded  active      running     Ambassador on B     7382eb69.../192.168.65.2
redis-demo.service      launched    loaded  activating  start-pre   Redis on A      0da32005.../192.168.65.3
redis-docker-reg.service    launched    loaded  active      running     Register on A       d8695a82.../192.168.65.4
redis-dyn-amb.service       launched    loaded  active      running     Etcd Ambassador on B    7382eb69.../192.168.65.2
$ fleetctl list-units
UNIT                STATE       LOAD    ACTIVE      SUB DESC            MACHINE
etcd-amb-redis.service      inactive    loaded  failed      failed  Ambassador on A     d8695a82.../192.168.65.4
etcd-amb-redis2.service     launched    loaded  active      running Ambassador on B     7382eb69.../192.168.65.2
redis-demo.service      launched    loaded  inactive    dead    Redis on A      d8695a82.../192.168.65.4
redis-docker-reg.service    inactive    loaded  failed      failed  Register on A       d8695a82.../192.168.65.4
redis-dyn-amb.service       launched    loaded  active      running Etcd Ambassador on B    7382eb69.../192.168.65.2
$ fleetctl list-units
UNIT                STATE       LOAD    ACTIVE      SUB     DESC            MACHINE
etcd-amb-redis.service      launched    loaded  activating  start-pre   Ambassador on A     0da32005.../192.168.65.3
etcd-amb-redis2.service     launched    loaded  active      running     Ambassador on B     7382eb69.../192.168.65.2
redis-demo.service      launched    loaded  active      running     Redis on A      0da32005.../192.168.65.3
redis-docker-reg.service    loaded      loaded  failed      failed      Register on A       d8695a82.../192.168.65.4
redis-dyn-amb.service       launched    loaded  active      running     Etcd Ambassador on B    7382eb69.../192.168.65.2
$ fleetctl list-units
UNIT                STATE       LOAD    ACTIVE      SUB     DESC            MACHINE
etcd-amb-redis.service      launched    loaded  active      running     Ambassador on A     0da32005.../192.168.65.3
etcd-amb-redis2.service     launched    loaded  active      running     Ambassador on B     7382eb69.../192.168.65.2
redis-demo.service      launched    loaded  active      running     Redis on A      0da32005.../192.168.65.3
redis-docker-reg.service    launched    loaded  activating  start-pre   Register on A       0da32005.../192.168.65.3
redis-dyn-amb.service       launched    loaded  active      running     Etcd Ambassador on B    7382eb69.../192.168.65.2
$ fleetctl list-units
UNIT                STATE       LOAD    ACTIVE  SUB DESC            MACHINE
etcd-amb-redis.service      launched    loaded  active  running Ambassador on A     0da32005.../192.168.65.3
etcd-amb-redis2.service     launched    loaded  active  running Ambassador on B     7382eb69.../192.168.65.2
redis-demo.service      launched    loaded  active  running Redis on A      0da32005.../192.168.65.3
redis-docker-reg.service    launched    loaded  active  running Register on A       0da32005.../192.168.65.3
redis-dyn-amb.service       launched    loaded  active  running Etcd Ambassador on B    7382eb69.../192.168.65.2

It seems fine now.

@YungSang
Copy link
Author

Is this OK?

@bcwaldon
Copy link
Contributor

@YungSang Apparently the initial attempt to schedule the unit failed. You could have fixed this by destroying and starting the unit, too.

@YungSang
Copy link
Author

So you mean it's a normal behavior?

@bcwaldon
Copy link
Contributor

@YungSang No, but we do not have the information to debug the issue. I'll reproduce it locally and file any issues I find.

@YungSang
Copy link
Author

@bcwaldon Thanks.

@YungSang
Copy link
Author

FYI: Fleet v0.3.2 in CoreOS master 317.0.0+2014-05-14-0333 works fine so far.

@YungSang
Copy link
Author

I have same problem in CoreOS master 317.0.0+2014-05-14-0333 again.

@bcwaldon bcwaldon reopened this May 14, 2014
@bcwaldon
Copy link
Contributor

@YungSang Ok. Did you start with a fresh cluster, or did you attempt to upgrade the cluster we were debugging yesterday?

@YungSang
Copy link
Author

I destroyed VMs yesterday and created them again today.

@YungSang
Copy link
Author

Right after I started fleetctl start *.services,

redis-demo.service      launched    loaded  activating  start-pre   Redis on A      f3ee2982.../192.168.65.4

Then,

redis-demo.service      inactive    -   -   -   Redis on A      -

But redis-demo.service is running at f3ee2982.../192.168.65.4, as I reported yesterday.

Anything else works fine.

@bcwaldon
Copy link
Contributor

@YungSang It would help if you set fleet's verbosity to 1 and reproduce the problem. Without that log information, we are not going to get to the bottom of this easily.

@YungSang
Copy link
Author

@bcwaldon OK, how to set verbosity to 1? in cloud-config/user-data?

    - name: fleet.service
      command: start
      runtime: no
      content: |
        [Unit]
        Description=fleet

        [Service]
        Environment=FLEET_PUBLIC_IP=$public_ipv4
        ExecStart=/usr/bin/fleet

@bcwaldon
Copy link
Contributor

Yes. Add this to your [Service] section:

Environment=FLEET_VERBOSITY=1

Documented here: https://github.com/coreos/fleet/blob/master/Documentation/configuration.md#verbosity

@YungSang
Copy link
Author

Thanks. I will. And then journalctl -u fleet -e -l at the target?

@bcwaldon
Copy link
Contributor

Yep

@YungSang
Copy link
Author

Re-starting

@YungSang
Copy link
Author

@YungSang
Copy link
Author

FYI: After that, I restarted the target VM only and then the services switched over another VM nicely without this issue.

@bcwaldon
Copy link
Contributor

From your log, I can clearly see where the state was published, and subsequently deleted:

May 14 18:19:57 core-3 fleet[6837]: I0514 18:19:57.768574 06837 event.go:163] EventUnitStateUpdated(red
is-demo.service): pushing state (loadState=loaded, activeState=active, subState=running) to Registry
May 14 18:19:57 core-3 fleet[6837]: I0514 18:19:57.779628 06837 event.go:52] Received response from etcd watcher: Action=set ModifiedIndex=150 Key=/_coreos.com/fleet/state/redis-demo.service
...
May 14 18:20:28 core-3 fleet[6837]: I0514 18:20:28.824712 06837 event.go:52] Received response from etcd watcher: Action=delete ModifiedIndex=166 Key=/_coreos.com/fleet/job/redis-demo.service/job-state
May 14 18:20:29 core-3 fleet[6837]: I0514 18:20:29.030115 06837 event.go:52] Received response from etcd watcher: Action=delete ModifiedIndex=167 Key=/_coreos.com/fleet/state/redis-demo.service

I have no clue what would have deleted those keys. Do you have the logs from your other VMs?

@YungSang
Copy link
Author

I will get it.

@YungSang
Copy link
Author

@YungSang
Copy link
Author

@bcwaldon
Copy link
Contributor

@YungSang These two log files are lacking the appropriate logs. Could you make sure you pull the logs back to at least May 14 18:15:00 from all three nodes?

@YungSang
Copy link
Author

I see. I will re-work from scratch.

@bcwaldon
Copy link
Contributor

@YungSang You can just run journalctl with the --lines option to go further back in history.

@YungSang
Copy link
Author

I rebooted them, as I mentioned at #452 (comment). So It seems no log at that time.

@bcwaldon
Copy link
Contributor

@YungSang Ok. I guess we'll have to wait until it reproduces again. Please grab the logs from every host next time.

@YungSang
Copy link
Author

I will. Now restarting.

@YungSang
Copy link
Author

@bcwaldon
Copy link
Contributor

@YungSang Ok, I'll look through it.

@bcwaldon
Copy link
Contributor

@YungSang It still appears that your logs don't go back far enough in time.

@YungSang
Copy link
Author

OK. I will get it more longer.

@YungSang
Copy link
Author

I updated the files and this time I got full logs from start.

@bcwaldon
Copy link
Contributor

@YungSang I finally got to the bottom of this: #466. This was nasty to track down, and thank you for sticking with me. It will be fixed in the next release, slated for May 26.

@YungSang
Copy link
Author

@bcwaldon You're welcome and thank you.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants