Marathon crash, /migration-in-progress, where is zkCli/zookeepercli in PanteraS? #299

Closed
kopax opened this issue Sep 27, 2019 · 7 comments

Comments

@kopax
Contributor

kopax commented Sep 27, 2019

Dear PanteraS users, @sielaq,

I am running PanteraS with marathon 1.4.9-1.0.668.ubuntu1604

All three of my Marathon masters fail to elect a leader, which causes Marathon to crash.

It happens after I deploy an application through Marathon. This happens often and forces me to restart the whole platform, which is super annoying.

This time I have created a JIRA issue for Marathon, and also contacted the Mesos mailing list hoping to get a solution.

Apparently there is a possible fix that avoids losing all my deployments and restoring every app one by one.

It is explained here.

It requires access to the ZooKeeper CLI. Do you know how I can access the ZooKeeper CLI within the PanteraS container?

Any help would be appreciated.

@kopax
Contributor Author

kopax commented Sep 27, 2019

I managed with what I think is a temporary solution:

wget https://github.com/outbrain/zookeepercli/releases/download/v1.0.12/zookeepercli_1.0.12_amd64.deb
dpkg -i zookeepercli_1.0.12_amd64.deb
zookeepercli --servers master-rbx-01,master-sbg-01,master-bhs-01 -c rm /marathon/state/migration-in-progress

zookeepercli is a Go client. I wish I could find where bin/zkCli.sh is without downloading this. Thanks.

@sielaq
Contributor

sielaq commented Sep 27, 2019

Hi @kopax

Strange, we have never faced problems like yours.
The battle-tested Marathon version is:
1.4.5-1.0.654.ubuntu1604

The version we promote now (and believe me, we had many issues with previous ones),
and which finally works fine, is
1.7.189-48bfd6000 (PanteraS 0.4.2); unfortunately the debs are missing from the vendor.

I will take a look and provide zookeepercli for you in 0.4.3 if needed.

@sielaq
Contributor

sielaq commented Sep 27, 2019

By the way, inside the container you should have
/usr/share/zookeeper/bin/zkCli.sh
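
For reference, a minimal sketch of how the same cleanup could be done with the bundled zkCli.sh instead of the downloaded zookeepercli. The helper names are hypothetical, port 2181 is only the ZooKeeper default and an assumption about this cluster, and the hostnames are the ones from the earlier comment. The script only prints the command so it can be reviewed before being run inside the container:

```shell
# Build a ZooKeeper connect string ("host:port,host:port") from hostnames.
# ZK_PORT defaults to 2181, which is an assumption about this cluster.
zk_connect_string() {
  zk_out=""
  for zk_host in "$@"; do
    zk_out="${zk_out:+$zk_out,}${zk_host}:${ZK_PORT:-2181}"
  done
  printf '%s\n' "$zk_out"
}

# Print (do not run) the zkCli.sh call that deletes the stale flag node.
zk_remove_migration_flag_cmd() {
  printf '%s -server %s delete %s\n' \
    "/usr/share/zookeeper/bin/zkCli.sh" \
    "$(zk_connect_string "$@")" \
    "/marathon/state/migration-in-progress"
}

zk_remove_migration_flag_cmd master-rbx-01 master-sbg-01 master-bhs-01
```

Once the printed line looks right, it can be pasted into a shell inside the PanteraS container.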

@sielaq
Contributor

sielaq commented Sep 27, 2019

Keep in mind that you can always play with the Marathon and ZooKeeper memory ENV variables, as set in generate_yml.sh:

# Memory settings
MARATHON_JAVA_OPTS=${MARATHON_JAVA_OPTS:-"-Xmx512m"}
ZOOKEEPER_JAVA_OPTS=${ZOOKEEPER_JAVA_OPTS:-"-Xmx512m"}

which results in the following being injected:

JVMFLAGS=-Xmx512m
MARATHON_JAVA_OPTS=-Xmx512m

Tweaking these might help in a large-scale cluster.
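
As a small illustration of the `${VAR:-default}` pattern used in those settings: a value supplied by the caller wins, otherwise the 512m default applies. The function name is hypothetical and `-Xmx1g` is just an example value, not a recommendation:

```shell
# Mirror the default-expansion used in generate_yml.sh:
# keep the supplied value if non-empty, else fall back to -Xmx512m.
resolve_java_opts() {
  printf '%s\n' "${1:--Xmx512m}"
}

resolve_java_opts ""         # nothing supplied: the default is used
resolve_java_opts "-Xmx1g"   # caller-supplied value is kept

# In practice this would mean exporting the variable before generating
# the config, for example (assumed invocation, adjust to your setup):
#   MARATHON_JAVA_OPTS="-Xmx1g" ZOOKEEPER_JAVA_OPTS="-Xmx1g" ./generate_yml.sh
```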

@kopax
Contributor Author

kopax commented Sep 27, 2019

Thanks a lot for /usr/share/zookeeper/bin/zkCli.sh, this was exactly what I was looking for. It should be written somewhere in the documentation.

Also thanks for the extra advice.

The battle-tested Marathon version is:
1.4.5-1.0.654.ubuntu1604

The version we promote now (and believe me, we had many issues with previous ones),
and which finally works fine, is
1.7.189-48bfd6000 (PanteraS 0.4.2); unfortunately the debs are missing from the vendor.

I am stuck on upgrading PanteraS since you removed HAProxy in favor of Fabio.
I wanted to switch to Fabio but I didn't succeed in getting the same working configuration. I still look forward to upgrading PanteraS when I can afford the time. Since I am stuck with my PanteraS version, I could try to rebuild the image with HAProxy and a newer Marathon version.

What problem exactly are you talking about?

The one problem I often have is this one, and since I have zkCli now, I can fix it, but I hope this will be fixed by Marathon in the future. They say this can happen when Marathon is being upgraded. In my situation, Marathon was not upgraded and I just had some deployments in progress. This happens every six months.

@kopax kopax closed this as completed Sep 27, 2019
@sielaq
Contributor

sielaq commented Oct 2, 2019

You can play with Traefik, which is included in 0.4.2.
The examples show how to use it.

@kopax
Contributor Author

kopax commented Oct 2, 2019

Traefik is replacing HAProxy, console, and nginx at the same time, right? I've never heard of it, is it new? Why do you use Traefik?
