Recently released EdgeAgent container v1.0.10 causes iotedged v1.0.9.4 to crash on Yocto arm64 build #3746
Comments
Thanks for the report, we'll investigate. |
Similar to the issue linked in this one, the backtrace is truncated. This is probably because it's a release build. Are you able to try with a debug build? This might let us see the backtrace code. There is also a chance that it's actual truncation, and that by doing "journalctl" with no filtering we might see more. |
I modified both meta-iotedge/recipes-core/iotedge-daemon/iotedge-daemon.inc and meta-iotedge/recipes-core/iotedge-cli/iotedge-cli.inc to include |
I'm not familiar with the yocto build but I assume it implicitly invokes |
I've confirmed that the information provided was, in fact, from a debug build. The poky build generates an auto.conf which has |
There would be a lot more frames in the backtrace if it were a debug build. So either it isn't a debug build, or something else has broken the backtrace-taking code on yocto (not impossible). Like I said in #2747 the string that's being indexed incorrectly doesn't appear to be in our code nor our deps, so we can't tell without a proper backtrace who the culprit is. Maybe configure your distro to take a coredump, and see if a debugger gives a better stack trace than the built-in stack-walker. |
I added gdb to our build. I focused on
Output of
Backtrace of thread 7 which SIGABRT'd:
Backtrace of thread 1:
Backtrace of thread 2:
Backtrace of thread 8:
Backtrace of thread 9:
Backtrace of thread 10:
|
This likely isn't helpful, but a colleague of mine who is well versed in Rust helped me to set breakpoints to navigate back up through the stack to try to get a better callstack. This is as far as I could get:
|
Thread 2 parked in You might get a better stack with |
Okay, so if |
This time I set a breakpoint on mod.rs:1792..... progress?
|
That is excellent! Let me check what sysinfo 0.9.6 is doing. |
Okay, so https://docs.rs/sysinfo/0.9.6/src/sysinfo/linux/component.rs.html#51 is the problem. It was fixed by GuillaumeGomez/sysinfo@01e0c73 which was made just after 0.9.6 was released. It is in 0.10.0 - https://docs.rs/sysinfo/0.10.0/src/sysinfo/linux/component.rs.html#66 In our 1.0.10 we've updated the dependency to 0.12.0, so it should be fixed there. |
Awesome. I think you nailed it. I was chatting in real-time with @srwalter and he pointed out the following:
I can try updating that crate first as a quick fix. We have plans to move to 1.0.10, but that will take a bit more time. Thanks for all your help! |
That also explains why both So please update iotedged to 1.0.10 and see if the issue is resolved. |
Your hwmon0 listing doesn't have any files that would cause the problem. Do you have any other hwmon devices that have a file named just |
Sorry, wasn't paying close enough attention. Yes, there is also an hwmon1 that has just 'temp'. |
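For readers following along, here is a minimal Rust sketch of why a file named just 'temp' can trip up code that expects names like `temp1_input`. This is not the actual sysinfo source: `sensor_index` is a hypothetical helper, and the assumption that the index sits between the `temp` prefix and the first underscore is an illustration only. It shows one panic-free way to parse such names using `str::get`.

```rust
/// Hypothetical helper, not part of sysinfo: pull the numeric sensor index
/// out of a hwmon entry name such as "temp1_input".
fn sensor_index(file_name: &str) -> Option<u32> {
    // Ignore anything that is not a temperature entry.
    if !file_name.starts_with("temp") {
        return None;
    }
    // `get` returns None instead of panicking when a range is out of bounds.
    // A fixed slice such as `&file_name[4..5]` would panic on the bare name "temp".
    let rest = file_name.get(4..)?;
    // Assumed layout: the index is whatever sits between "temp" and the first underscore.
    let digits = rest.split('_').next()?;
    // An empty or non-numeric index yields None rather than a panic.
    digits.parse::<u32>().ok()
}
```

With this approach, `sensor_index("temp1_input")` yields `Some(1)`, while `sensor_index("temp")` yields `None`, so a caller can skip the entry instead of aborting.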
Are you sure 1.0.10 is updated to sysinfo 0.12.0? https://github.com/Azure/iotedge/blob/release/1.0.10/edgelet/iotedge/Cargo.toml I've successfully built 1.0.10, but I'm crashing in the same place. Please note that I'm still using the prior pinned version of Rust in the build and I'm working on bringing that up to Rust 1.44.1. |
fyi. I patched the above file to pin sysinfo to version 0.12.0 and 'iotedge check' now successfully runs on our platform. |
This issue is being marked as stale because it has been open for 30 days with no activity. |
Looks like we updated |
This change upgrades the sysinfo package in the iotedge CLI and iotedged. It fixes bugs called out in Azure#2747 and Azure#3746. This was fixed in master earlier and was supposed to be ported to the 1.1 branch, but never was.
* Fix GetModules call error using RFC3339 DateTime format. (#4293)
  Fix issue calling GetModuleLogs using DirectMethod with a payload that contains parameters using RFC3339 DateTime format.
* Prepare for release 1.1.0 (#4342)
* Update nuget config per security recommendations (#4289)
* Set IOTEDGE_HOST before 'make install' (#4396)
  After commit a34e0ed, the iotedge CLI fails in Mariner because IOTEDGE_HOST was not set during compilation, so the CLI has the wrong path to iotedged's management endpoint. To fix this, we set IOTEDGE_HOST prior to running `make install`. It was already being set prior to `make release`.
* Update Base Images for Security Vulnerability 3.1.12 (#4380)
  - Update ARM32/ARM64 Bionic base images to 3.1.12
  - Update AMD64 (Linux) Alpine base images to 3.1.12
  - Update AMD64 (Windows) Nanoserver base images to 3.1.12
* Update release 1.1.0 (#4399)
  Update the CHANGELOG to include the latest commits.
* Config yaml 1.1 (#4434)
  This change updates the default agent tag in config.yaml to 1.1. Cherry-pick a4faab5 and update 1.1-specific Windows config.yaml.
* Upgrade sysinfo package (#4444)
  This change upgrades the sysinfo package in the iotedge CLI and iotedged. It fixes bugs called out in #2747 and #3746. This was fixed in master earlier and was supposed to be ported to the 1.1 branch, but never was.
* EFLOW: Introduce environment variables for nuget operations (#4499)
  Set env variables to direct nuget caches to the data partition instead of to the rootfs

Co-authored-by: Pedro Marcelo Zara <pmzara@hotmail.com>
Co-authored-by: Damon Barry <damonbarry@users.noreply.github.com>
Co-authored-by: yophilav <54859653+yophilav@users.noreply.github.com>
Co-authored-by: ms-mahuber <60939654+ms-mahuber@users.noreply.github.com>
We have a custom embedded yocto build on an arm64 platform utilizing Azure IoT Edge. Earlier this week, when new versions of Azure IoT Edge and the associated EdgeAgent / EdgeHub containers were released, we observed iotedge caught in a crash loop on all of our devices. Both config.yaml and the deployment manifests specified edgeAgent versions of "1.0" (i.e., 'rolling tags'). After the release, the devices updated to the 1.0.10 version of edgeAgent, and upon restart of edgeAgent, iotedge began to crash in a loop. We have since pinned our config.yaml and deployment manifests to version 1.0.9 to get everything working again.
Expected Behavior
I expected that the recently released containers (v1.0.10) would have been compatible with the previous version of iotedge (v1.0.9.4).
Current Behavior
iotedge always crashes with this output:
Steps to Reproduce
See problem description above.
Context (Environment)
Output of `iotedge check`
Interestingly, `iotedge check` also fails in a similar way. A similar issue was previously logged, but never resolved. There seems to be a difference though, in that the issue below was at least able to get a full stacktrace (although RUST_BACKTRACE was not enabled).
#2747
Device Information
Runtime Versions
iotedge version: 1.0.9.4
docker version:
Logs
iotedged logs
Additional Information
We are building iotedge via this meta-iotedge layer:
https://github.com/Azure/meta-iotedge/tree/zeus