Latest waagent coredumps python interpreter when started at boot on FreeBSD #1687

Closed
minusbat opened this issue Oct 29, 2019 · 14 comments

@minusbat
Contributor

minusbat commented Oct 29, 2019

I am running the latest waagent on the 'develop' branch. The operating system is FreeBSD 12.1 and I am using python3.6, which is the current default Python in FreeBSD. I have removed all traces of the older Python 2.7.

When I run this using 'service waagent start' logged in as root then it runs fine.

If, however, I reboot the machine, and waagent is started on boot, then I find this repeated in dmesg:

2019/10/29 16:31:23.617805 INFO Daemon Agent WALinuxAgent-2.2.44 launched with command 'python3 -u /usr/local/sbin/waagent -run-exthandlers' is successfully running
2019/10/29 16:31:23.628337 INFO Daemon Installed Agent WALinuxAgent-2.2.44 is the most current agent
pid 4766 (python3.6), jid 0, uid 0: exited on signal 6 (core dumped)
2019/10/29 16:31:24.933556 INFO Daemon Agent WALinuxAgent-2.2.44 launched with command 'python3 -u /usr/local/sbin/waagent -run-exthandlers' is successfully running
2019/10/29 16:31:24.939914 INFO Daemon Installed Agent WALinuxAgent-2.2.44 is the most current agent
pid 4767 (python3.6), jid 0, uid 0: exited on signal 6 (core dumped)
2019/10/29 16:31:26.246526 INFO Daemon Agent WALinuxAgent-2.2.44 launched with command 'python3 -u /usr/local/sbin/waagent -run-exthandlers' is successfully running
2019/10/29 16:31:26.257418 INFO Daemon Installed Agent WALinuxAgent-2.2.44 is the most current agent
pid 4768 (python3.6), jid 0, uid 0: exited on signal 6 (core dumped)

It keeps trying to start it and then coredumping. If I run gdb against the coredump I see this:

Core was generated by `python3 -u /usr/local/sbin/waagent -run-exthandlers'.
Program terminated with signal SIGABRT, Aborted.
#0  thr_kill () at thr_kill.S:3
3	RSYSCALL(thr_kill)
(gdb) bt
#0  thr_kill () at thr_kill.S:3
#1  0x0000000800768b94 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x00000008006dd229 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x000000080045e535 in Py_FatalError () from /usr/local/lib/libpython3.6m.so.1.0
#4  0x000000080045e2eb in _Py_InitializeEx_Private () from /usr/local/lib/libpython3.6m.so.1.0
#5  0x000000080047c914 in Py_Main () from /usr/local/lib/libpython3.6m.so.1.0
#6  0x000000000020144b in main ()
(gdb) 
@minusbat minusbat added the triage Needs Triaging label Oct 29, 2019
@narrieta
Member

@minusbat - could you capture a verbose log? It may point us to where the problem is.

In /etc/waagent.conf, set this to 'y':

# Enable verbose logging (y|n)
Logs.Verbose=n

The log will be in /var/log/waagent.log
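
For anyone scripting this change, here is a minimal sketch of flipping a `key=value` option in the configuration text. The helper name `set_conf_option` is hypothetical and is not part of the agent. It assumes the file uses plain `key=value` lines with `#` comments, as in the snippet above.

```python
def set_conf_option(conf_text, key, value):
    """Return conf_text with `key=...` set to `value`.

    Comment lines are left alone; if the key is absent, a new line is appended.
    """
    out = []
    found = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith("#")
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Possible usage (run as root; the path is the one mentioned above):
# with open("/etc/waagent.conf") as f:
#     updated = set_conf_option(f.read(), "Logs.Verbose", "y")
# with open("/etc/waagent.conf", "w") as f:
#     f.write(updated)
```

The agent presumably needs to be restarted before the new setting takes effect.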

Thanks

@minusbat
Contributor Author

OK, just doing that now....

@minusbat
Contributor Author

Here you are... I think it's just repeating over and over again, so I stopped it and uploaded what there is.

waagent.log

@narrieta
Member

Thank you.

The stack trace from the core dump seems to indicate a problem while initializing python3.

Could you see if these 2 commands produce the same issue?


python3 -u /usr/local/sbin/waagent -run-exthandlers
python -u /usr/local/sbin/waagent -run-exthandlers

Also, could you see if you can execute other Python scripts using python3?

@minusbat
Contributor Author

Python3 works fine everywhere, but bear in mind that the coredumps only happen when waagent is started at boot time. It doesn't coredump if I start it from a shell. So testing anything will involve changing what it does on boot and rebooting the machine. I am not sure how to change the interpreter from 'python' to 'python3'.

By the way, if I execute those commands directly at the command line they both do this:

2019/10/29 21:33:25.004634 INFO ExtHandler Agent WALinuxAgent-2.2.44 is running as the goal state agent
2019/10/29 21:33:25.006073 INFO ExtHandler Distro info: freebsd 12.1, osutil class being used: FreeBSDOSUtil, agent service name: waagent
2019/10/29 21:33:25.008658 INFO ExtHandler Wire server endpoint:168.63.129.16
2019/10/29 21:33:25.670394 INFO ExtHandler Start env monitor service.
2019/10/29 21:33:25.672922 INFO ExtHandler Configure routes
2019/10/29 21:33:25.679322 INFO ExtHandler Gateway:None
2019/10/29 21:33:25.680125 INFO ExtHandler Routes:None
2019/10/29 21:33:25.964397 ERROR ExtHandler Command: [iptables --version], return code: [127], result: [/bin/sh: iptables: not found
]
2019/10/29 21:33:25.965888 WARNING ExtHandler Unable to determine version of iptables
2019/10/29 21:33:25.966762 WARNING ExtHandler Unable to retrieve firewall packets droppedUnable to determine version of iptables
2019/10/29 21:33:25.990633 INFO ExtHandler Wire server endpoint:168.63.129.16
2019/10/29 21:33:25.991696 INFO ExtHandler WALinuxAgent-2.2.44 running as process 12581
2019/10/29 21:33:26.409316 INFO ExtHandler CGroups Status: Cgroups are not supported by the platform
2019/10/29 21:33:26.416503 INFO ExtHandler Route table: [{"Iface": "hn0", "Destination": "0.0.0.0", "Gateway": "10.113.10.1", "Mask": "255.255.255.255", "Flags": "0x0003", "Metric": "6"},{"Iface": "hn0", "Destination": "10.113.10.0", "Gateway": "0.0.0.0", "Mask": "255.255.255.0", "Flags": "0x0017", "Metric": "5"},{"Iface": "lo0", "Destination": "10.113.10.202", "Gateway": "0.0.0.0", "Mask": "255.255.255.255", "Flags": "0x0005", "Metric": "4"},{"Iface": "lo0", "Destination": "127.0.0.1", "Gateway": "0.0.0.0", "Mask": "255.255.255.255", "Flags": "0x0021", "Metric": "3"},{"Iface": "hn0", "Destination": "168.63.129.16", "Gateway": "10.113.10.1", "Mask": "255.255.255.255", "Flags": "0x0003", "Metric": "2"},{"Iface": "hn0", "Destination": "169.254.169.254", "Gateway": "10.113.10.1", "Mask": "255.255.255.255", "Flags": "0x0003", "Metric": "1"}]
2019/10/29 21:33:26.434822 INFO ExtHandler Agent WALinuxAgent-2.2.44 is an orphan -- exiting
root@clementine-ams:/home/webadmin # 

@minusbat
Contributor Author

Could this be something to do with the environment in which rc.d scripts are executed at boot time compared to from the shell? Is waagent needed for any part of the boot process? I could try changing the rc.d script to make it start much later in the process, after login for example.
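
One way to test that hypothesis is to record the process context in both situations and diff the results. The sketch below is only an illustration: `startup_snapshot` and `write_snapshot` are hypothetical helpers, and the choice of what to record (working directory, environment variables, whether each standard stream is a terminal) is an assumption about what might differ, not a known cause.

```python
import json
import os


def startup_snapshot():
    """Collect details that may differ between a boot-time run and a login shell."""
    stdio_is_tty = {}
    for name, fd in (("stdin", 0), ("stdout", 1), ("stderr", 2)):
        # os.isatty() returns False (rather than raising) for a closed descriptor.
        stdio_is_tty[name] = os.isatty(fd)
    return {
        "cwd": os.getcwd(),
        "env": dict(os.environ),
        "stdio_is_tty": stdio_is_tty,
    }


def write_snapshot(path):
    """Write the snapshot as sorted JSON so two runs can be compared with diff."""
    with open(path, "w") as f:
        json.dump(startup_snapshot(), f, indent=2, sort_keys=True)
```

Calling `write_snapshot` once from a script started by rc.d and once from a root shell, with different output paths, gives two files that can be compared line by line.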

@minusbat
Contributor Author

minusbat commented Oct 29, 2019

'python' and 'python3' are identical, by the way:

root@clementine-ams:/home/webadmin # which python
/usr/local/bin/python
root@clementine-ams:/home/webadmin # file /usr/local/bin/python
/usr/local/bin/python: symbolic link to python3
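
The same check can be scripted. This is a small sketch with hypothetical helper names (`resolve_command`, `same_binary`). It looks each command up on `PATH` and follows symlinks to the final file.

```python
import os
import shutil


def resolve_command(name):
    """Return the real path behind a command name, following symlinks, or None."""
    found = shutil.which(name)
    return os.path.realpath(found) if found else None


def same_binary(a, b):
    """True if both commands exist and resolve to the same file on disk."""
    real_a = resolve_command(a)
    real_b = resolve_command(b)
    return real_a is not None and real_a == real_b


# Possible usage: same_binary("python", "python3")
```

Note that `PATH` at boot may differ from `PATH` in a login shell, so the result can depend on where the check is run.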

@pgombar pgombar removed the triage Needs Triaging label Oct 29, 2019
@narrieta
Member

Thanks.

The tasks performed by the Linux Agent are split into two processes, which in this case execute the same script (/usr/local/sbin/waagent) with two different command line options: -daemon and -run-exthandlers.

The system starts the first process, and that process runs fine. But when this process tries to start the same script with -run-exthandlers, Python crashes on initialization.

I see here https://github.com/Azure/WALinuxAgent/blob/develop/init/freebsd/waagent#L14 that -daemon process is started with "python", while the logs show that the -run-exthandlers is started with "python3", but you already confirmed that they are the same binary, and that -run-exthandlers runs fine from the command line.

Sorry, at this point I do not have other suggestions for you, but I'll let you know if I think of any.

As far as moving the agent later in the boot process... not sure if that'll work. Do you know if FreeBSD is provisioned in Azure using cloud-init? If not, then the agent is required to provision new VMs.

@minusbat
Contributor Author

It's midnight here, so I am going to have to stop trying stuff, but I did try moving the startup to much later in the process, with no effect. I always create new machines in Azure by cloning old ones, so I don't need any of the provisioning features. If it is possible to simply run without waagent then that would be a solution for me, but I would quite like to have this fixed, as would you I suspect!

I'll do some more experiments tomorrow, and also maybe look into what the difference is between starting a script at boot compared to from a shell. Thanks.

@frostygoth

Expressing interest in this issue. If I can replicate the issue or discover any solution I'll update the thread.

@minusbat
Contributor Author

There has been some discussion of it over on the FreeBSD STABLE mailing list:

https://lists.freebsd.org/pipermail/freebsd-stable/2019-November/thread.html

It should be easy to reproduce, but tracking it down has proved elusive for me! Let me know how you get on, or if you have any ideas I could try. Just capturing the stderr from the subprocess would help.
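
As an illustration of what capturing the child's stderr could look like, here is a minimal sketch using the standard `subprocess` module. It is not the agent's actual launch code, and `run_and_capture` and `describe_exit` are hypothetical names. Standard input is deliberately left inherited so the child starts under the same conditions as before.

```python
import signal
import subprocess


def run_and_capture(cmd):
    """Run cmd and return (returncode, stderr_text); stdout is not redirected."""
    proc = subprocess.run(cmd, stderr=subprocess.PIPE, universal_newlines=True)
    return proc.returncode, proc.stderr


def describe_exit(returncode):
    """On POSIX, a return code of -N means the child was killed by signal N."""
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


# Possible usage with the command from the log above:
# rc, err = run_and_capture(
#     ["python3", "-u", "/usr/local/sbin/waagent", "-run-exthandlers"])
# print(describe_exit(rc))
# print(err)
```

A child that dies from `abort()`, as in the backtrace earlier in this thread, would show up here as killed by SIGABRT, and any message the interpreter printed before aborting would be in the captured text.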

@lwhsu
Contributor

lwhsu commented Dec 7, 2019

With help from @delphij, we found the error log via ktrace:

https://gist.github.com/lwhsu/c020efbd6388fe1636b5ca31baaa1eee

which led to:

https://bugs.python.org/issue32849

But this fix is only in Python >= 3.7. I backported the patch to 3.6, and with it waagent starts at boot without dumping core.

I will work on backporting this to FreeBSD's lang/python36 port.
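
The backtrace earlier in the thread shows the abort happening inside interpreter initialization, and my reading of the linked Python issue is that it concerns how the interpreter decides whether file descriptors 0 to 2 are usable before it sets up the standard streams. The actual fix is C code inside CPython and is not reproduced here. The sketch below only shows, at the Python level, two different ways of probing a descriptor, which can be handy for logging the state a process was started in. Both function names are hypothetical.

```python
import os


def fd_valid_via_dup(fd):
    """Probe a descriptor by duplicating it and closing the duplicate."""
    try:
        os.close(os.dup(fd))
        return True
    except OSError:
        return False


def fd_valid_via_fstat(fd):
    """Probe a descriptor by asking the kernel for its file status."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


# Possible usage: log both results for the three standard descriptors.
# for fd in (0, 1, 2):
#     print(fd, fd_valid_via_dup(fd), fd_valid_via_fstat(fd))
```

If the two probes ever disagree for the same descriptor, that disagreement is worth including in a bug report.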

uqs pushed a commit to freebsd/freebsd-ports that referenced this issue Dec 7, 2019
This is needed for starting sysutils/azure-agent at boot:
Azure/WALinuxAgent#1687

Obtained from:	python/cpython@f9c01a1
MFH:		2019Q4
Sponsored by:	The FreeBSD Foundation


git-svn-id: svn+ssh://svn.freebsd.org/ports/head@519241 35697150-7ecd-e111-bb59-0022644237b5
uqs pushed a commit to freebsd/freebsd-ports that referenced this issue Dec 9, 2019
Fix namespace pollution in python3.5 and python3.6 (upstreamed fix)

The standard math library (libm) may follow IEEE-754 recommendation to
include an implementation of sinPi(), i.e. sinPi(x):=sin(pi*x).
And this triggers a name clash, found by FreeBSD developer
Steve Kargl, who worked on putting sinpi into libm used on FreeBSD
(it has to be named "sinpi", not "sinPi", cf. e.g.
https://en.cppreference.com/w/c/experimental/fpext4).

- python2.7 and > 3.6 are already fixed

PR:		232792
Submitted by:	Steve Kargl <sgk@troutmask.apl.washington.edu>, Dima Pasechnik <dimpase+freebsd@gmail.com>
Approved by:	python (maintainer timeout)
Obtained from:	python/cpython@b545ba0

Backport fix of https://bugs.python.org/issue32849

This is needed for starting sysutils/azure-agent at boot:
Azure/WALinuxAgent#1687

Obtained from:	python/cpython@f9c01a1
Sponsored by:	The FreeBSD Foundation

Fix makefile ordering.

Reported by:	mat

Approved by:	ports-secteam (miwi)
Sponsored by:	The FreeBSD Foundation
Jehops pushed a commit to Jehops/freebsd-ports-legacy that referenced this issue Dec 9, 2019
@lwhsu
Contributor

lwhsu commented Jan 20, 2020

This should be fixed and the default Python version of FreeBSD has been updated to 3.7.

@minusbat
Contributor Author

I can verify that it's fixed for me, and works with the latest default Python. Thank you - I never would have found this on my own given the nature of the bug!

uqs pushed a commit to freebsd/freebsd-ports that referenced this issue Apr 1, 2021
svmhdvn pushed a commit to svmhdvn/freebsd-ports that referenced this issue Jan 10, 2024

7 participants