Something wrong with fleet 0.3.1 in CoreOS master 315.0.0+2014-05-13-2126 #452
Comments
It happens with fleet 0.3.0 in CoreOS alpha 315.0.0, too. |
@YungSang You need to give me a bit more of a hint here. What problem are you seeing? |
It works fine with Ambassador Pattern and I can connect to a redis server from redis client in another container. But
|
@YungSang How did you get to this point? Did you actually call |
Yes, I did. So I can connect to the redis server from a client. |
@YungSang Start from the beginning. What did the cluster look like initially? Did you do a rolling upgrade to a newer version of CoreOS? What happens if you call |
systemctl says `redis-demo.service` is running, but
|
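(Editor's note: the command output in this comment was not preserved. Checking a unit's status locally on a CoreOS host, as described above, would typically look like this; `redis-demo.service` is the unit name used throughout this thread.)

```shell
# On the host where the unit was scheduled:
systemctl status redis-demo.service
```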
Actually I did exactly the same as http://coreos.com/blog/docker-dynamic-ambassador-powered-by-etcd/.
Still
Waiting. |
@YungSang Do you get any useful information if you run |
Here it is.
|
@YungSang The output of the following commands will be useful:
|
|
@YungSang Can you check the fleet logs on d8695a82ad9f4f3497c4910e7cae34ea? |
@bcwaldon How do I do that? |
@YungSang Use |
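(Editor's note: the exact command in the comment above was not preserved. On a CoreOS host, fleet runs as the systemd unit `fleet.service`, so its logs are normally read with `journalctl`; the invocations below are standard `journalctl` usage, not quoted from the original thread.)

```shell
# Dump fleet's logs on the affected host
journalctl -u fleet.service --no-pager

# Or limit to a time window, useful when collecting logs from every node
journalctl -u fleet.service --since "2014-05-14 18:00:00"
```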
|
@YungSang Run this command:
What does |
It seems fine now. |
Is this OK? |
@YungSang Apparently the initial attempt to schedule the unit failed. You could have fixed this by destroying and starting the unit, too. |
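(Editor's note: the destroy-and-restart workaround mentioned above would look roughly like this with `fleetctl`; the unit name is taken from the thread, and the commands assume a working fleet cluster.)

```shell
# Remove the unit from the cluster, then schedule and start it again
fleetctl destroy redis-demo.service
fleetctl start redis-demo.service
```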
So you mean it's normal behavior? |
@YungSang No, but we do not have the information to debug the issue. I'll reproduce it locally and file any issues I find. |
@bcwaldon Thanks. |
FYI: Fleet v0.3.2 in CoreOS master 317.0.0+2014-05-14-0333 works fine so far. |
I have the same problem in CoreOS master 317.0.0+2014-05-14-0333 again. |
@YungSang Ok. Did you start with a fresh cluster, or did you attempt to upgrade the cluster we were debugging yesterday? |
I destroyed VMs yesterday and created them again today. |
Right after I started
Then,
But redis-demo.service is running at f3ee2982.../192.168.65.4, as I reported yesterday. Everything else works fine. |
@YungSang It would help if you set fleet's verbosity to 1 and reproduce the problem. Without that log information, we are not going to get to the bottom of this easily. |
@bcwaldon OK, how do I set the verbosity to 1? In cloud-config/user-data?
|
Yes. Add this to your [Service] section:
Documented here: https://github.com/coreos/fleet/blob/master/Documentation/configuration.md#verbosity |
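(Editor's note: the snippet from the comment above was not preserved. Per the linked configuration docs, fleet reads its options from `FLEET_*` environment variables, so a systemd drop-in for `fleet.service` along these lines should work; the drop-in file path shown is illustrative.)

```ini
# /etc/systemd/system/fleet.service.d/10-verbosity.conf
[Service]
Environment="FLEET_VERBOSITY=1"
```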
Thanks. I will. And then |
Yep |
Re-starting |
FYI: After that, I restarted only the target VM, and the services switched over to another VM nicely without this issue. |
From your log, I can clearly see where the state was published, and subsequently deleted:
I have no clue what would have deleted those keys. Do you have the logs from your other VMs? |
I will get it. |
@YungSang These two log files are lacking the appropriate logs. Could you make sure you pull the logs back to at least May 14 18:15:00 from all three nodes? |
I see. I will re-work from scratch. |
@YungSang You can just run |
I rebooted them, as I mentioned at #452 (comment). So it seems there are no logs from that time. |
@YungSang Ok. I guess we'll have to wait until it reproduces again. Please grab the logs from every host next time. |
I will. Now restarting. |
https://www.dropbox.com/s/tjinmt41nif8nyp/fleet1.log This time, fleet2.log is the target VM's one. |
@YungSang Ok, I'll look through it. |
@YungSang It still appears that your logs don't go back far enough in time. |
OK. I will capture logs going further back. |
I updated the files, and this time I got full logs from the start. |
@bcwaldon You're welcome and thank you. |