FIX: mesos rpm add command and change mesos_isolation. #422

Open
wants to merge 1 commit into
base: master
2 changes: 1 addition & 1 deletion support/mesos-mini/mesos/agent_environment
@@ -3,7 +3,7 @@ MESOS_HOSTNAME=localhost
MESOS_WORK_DIR=/var/lib/mesos/agent
MESOS_MASTER=127.0.0.1:5050
MESOS_CONTAINERIZERS=mesos,docker
-MESOS_ISOLATION=filesystem/linux,network/cni,cgroups/cpu,cgroups/mem,cgroups/blkio,cgroups/devices,disk/du,docker/runtime,volume/sandbox_path,volume/host_path,posix/rlimits,namespaces/pid,linux/capabilities
Contributor

So it is the linux/capabilities isolator that prevents the Mesos agent from starting. Can you please paste the error log here? I'd like to know why the agent fails to start.

Contributor Author

No problem. :-)

Mar 31 09:57:55 andreas-pc mesos-agent[40]: Reached unreachable statement at linux/capabilities.cpp:497
Mar 31 09:57:55 andreas-pc mesos-agent[40]: *** Aborted at 1648720675 (unix time) try "date -d @1648720675" if you are using GNU date ***
Mar 31 09:57:55 andreas-pc mesos-agent[40]: PC: @     0x7f2e97e6f387 __GI_raise
Mar 31 09:57:55 andreas-pc mesos-agent[40]: *** SIGABRT (@0x16) received by PID 22 (TID 0x7f2e9db92a00) from PID 22; stack trace: ***
Mar 31 09:57:55 andreas-pc mesos-agent[40]:     @     0x7f2e98733630 (unknown)
Mar 31 09:57:55 andreas-pc mesos-agent[40]:     @     0x7f2e97e6f387 __GI_raise
Mar 31 09:57:55 andreas-pc mesos-agent[40]:     @     0x7f2e97e70a78 __GI_abort
Mar 31 09:57:55 andreas-pc mesos-agent[40]:     @     0x7f2e9ae164d7 Unreachable()
Mar 31 09:57:55 andreas-pc mesos-agent[40]:     @     0x7f2e9b81dba1 mesos::internal::capabilities::operator<<()
Mar 31 09:57:55 andreas-pc mesos-agent[40]:     @     0x7f2e9b8217ac stringify<>()
Mar 31 09:57:55 andreas-pc mesos-agent[40]:     @     0x7f2e9b820076 mesos::internal::capabilities::Capabilities::create()
Mar 31 09:57:55 andreas-pc mesos-agent[40]:     @     0x7f2e9b908c2d mesos::internal::slave::LinuxCapabilitiesIsolatorProcess::create()
Mar 31 09:57:55 andreas-pc mesos-agent[40]:     @     0x7f2e9b42f9ec std::_Function_handler<>::_M_invoke()
Mar 31 09:57:55 andreas-pc mesos-agent[40]:     @     0x7f2e9b41c9f2 mesos::internal::slave::MesosContainerizer::create()
Mar 31 09:57:55 andreas-pc mesos-agent[40]:     @     0x7f2e9b399696 mesos::internal::slave::Containerizer::create()
Mar 31 09:57:55 andreas-pc mesos-agent[40]:     @     0x55d68541b10a (unknown)
Mar 31 09:57:55 andreas-pc mesos-agent[40]:     @     0x7f2e97e5b555 __libc_start_main
Mar 31 09:57:55 andreas-pc mesos-agent[40]:     @     0x55d68541e221 (unknown)
Mar 31 09:57:56 andreas-pc systemd[1]: mesos-slave.service: main process exited, code=dumped, status=6/ABRT
Mar 31 09:57:56 andreas-pc systemd[1]: Unit mesos-slave.service entered failed state.
Mar 31 09:57:56 andreas-pc systemd[1]: mesos-slave.service failed.

Contributor

Hmm, it seems to have run into https://github.com/apache/mesos/blob/1.11.0/src/linux/capabilities.cpp#L497. Can you please let me know how you ran the Mesos agent, e.g. the OS version, and whether it was in a Docker container? I'd suggest figuring out why it ran into that unreachable statement first.
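
One possible starting point for that investigation, offered only as a hypothesis and not confirmed anywhere in this thread, is to compare the highest capability number the running kernel reports with the highest one the agent build understands. The sketch below reads `/proc/sys/kernel/cap_last_cap`; the helper name `check_cap_range` and the value passed as the known maximum are assumptions for illustration.

```shell
#!/bin/sh
# check_cap_range KNOWN_MAX [FILE]
# Compare the kernel's highest capability number with KNOWN_MAX, the
# highest number the agent build is assumed to understand (hypothetical;
# look it up in the capabilities source of your Mesos version).
check_cap_range() {
    known_max=$1
    file=${2:-/proc/sys/kernel/cap_last_cap}
    kernel_max=$(cat "$file") || return 2
    if [ "$kernel_max" -gt "$known_max" ]; then
        echo "kernel reports capability $kernel_max, above assumed maximum $known_max"
        return 1
    fi
    echo "kernel maximum $kernel_max is within assumed maximum $known_max"
}

# 37 is only a placeholder value here, not a verified Mesos constant.
check_cap_range 37 || true
```

The function returns 1 when the kernel value exceeds the assumed maximum, so it can be compared across hosts such as the Debian bullseye and Ubuntu focal machines mentioned in this thread.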

Contributor Author

Hi @qianzhangxa, I found out that I have the same issue under Debian bullseye (outside of Docker). Under Ubuntu focal I do not have this issue. 🤷‍♂️

Contributor Author

It matches this one: #377

Contributor

Yeah, a patch release sounds good to me.

Contributor Author

👍

Contributor Author

How is it going with the patch release?

Contributor

Sorry, I do not have free cycles to do a patch release 😞

Contributor Author

:-( What can I do to support you?

+MESOS_ISOLATION=filesystem/linux,network/cni,cgroups/cpu,cgroups/mem,cgroups/blkio,cgroups/devices,disk/du,docker/runtime,volume/sandbox_path,volume/host_path,posix/rlimits,namespaces/pid
MESOS_LAUNCHER=linux
MESOS_CGROUPS_ROOT=`grep memory /proc/1/cgroup | cut -d: -f3`/mesos
MESOS_IMAGE_PROVIDERS=DOCKER
2 changes: 1 addition & 1 deletion support/packaging/centos/build-docker-centos.sh
@@ -24,7 +24,7 @@ DOCKER_CONTEXT_DIR="${SOURCE_DIR}/centos${CENTOS_DISTRO}/rpmbuild/RPMS/x86_64"

cat <<EOF > "${DOCKER_CONTEXT_DIR}/Dockerfile"
FROM centos:${CENTOS_DISTRO}
-ADD mesos-?.?.?-*.rpm /
Contributor

Can you please elaborate a bit on why ?.?.? does not work?

Contributor Author

Yes! A single question mark "?" is a wildcard for exactly one character, so the pattern cannot match Mesos versions greater than 1.9.9 (for example 1.10.0).
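
To illustrate the point, here is a minimal shell sketch that tests the old pattern against two RPM file names with `case`, which applies the same `?` and `*` wildcard rules. The file names and the helper `glob_matches` are hypothetical, not taken from an actual build. Docker's `ADD` uses its own Go-based matcher, so this only demonstrates the wildcard semantics.

```shell
#!/bin/sh
# glob_matches PATTERN NAME: print "match" or "no match".
# Uses the shell's glob rules: "?" is exactly one character, "*" is any run.
glob_matches() {
    # $1 is left unquoted on purpose so it is treated as a pattern.
    case "$2" in
        $1) echo "match" ;;
        *)  echo "no match" ;;
    esac
}

old_pattern='mesos-?.?.?-*.rpm'

# Hypothetical file names, for illustration only.
glob_matches "$old_pattern" 'mesos-1.9.0-1.el7.x86_64.rpm'   # prints "match"
glob_matches "$old_pattern" 'mesos-1.10.0-1.el7.x86_64.rpm'  # prints "no match"
```

The second name fails because the minor version `10` needs two characters where the pattern allows only one.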

Contributor Author

Feedback? :-)

Contributor

A single * seems too broad to me. Can we have a pattern that still enforces the semantic versioning format?
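
One candidate for such a pattern, sketched here as an assumption rather than as what the PR adopts, requires each of the three version components to start with a digit. Glob syntax has no "one or more digits", so this is stricter than `mesos-*.rpm` but still looser than real semantic versioning. The file names and the helper `check` are hypothetical, and the pattern would still need to be verified against Docker's `ADD` matching.

```shell
#!/bin/sh
# Candidate pattern (an assumption, not from the PR): each of the three
# version components must start with a digit. In glob syntax "[0-9]*"
# means "a digit, then anything", not "one or more digits".
candidate='mesos-[0-9]*.[0-9]*.[0-9]*-*.rpm'

# check NAME: report whether NAME matches the candidate pattern.
check() {
    case "$1" in
        $candidate) echo "$1: match" ;;
        *)          echo "$1: no match" ;;
    esac
}

# Hypothetical file names, for illustration only.
check 'mesos-1.9.0-1.el7.x86_64.rpm'         # match
check 'mesos-1.10.0-1.el7.x86_64.rpm'        # match
check 'mesos-devel-1.10.0-1.el7.x86_64.rpm'  # no match
```

The last name would be picked up by `mesos-*.rpm` but is rejected here, because the character after `mesos-` is not a digit.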

Contributor Author

👍

+ADD mesos-*.rpm /
RUN yum --nogpgcheck -y localinstall /mesos-*.rpm
EOF
