-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: extend support to all Jetsons #10
Conversation
As of L4T R32.x, nv_update_engine can be told not to reboot the system after an update. This eliminates the need for the RebootEnter script, which we can replace with install and rollback scripts to handle the Tegra specifics. This also eliminates the need for the mender-install-manual command, since we no longer need to manually invoke the RebootEnter script. We also split out the U-Boot-specific scripts separately, so they only get used on platforms with U-Boot. Signed-off-by: Matt Madison <matt@madison.systems>
Tegra updates include the bootloaders, and if we are using U-Boot, bootloader updates can change the location of the U-Boot environment. So to ensure that Mender updates propagate U-Boot variables correctly, we use scripts to ensure that needed variables are set as part of the installation and rollback flows. Signed-off-by: Matt Madison <matt@madison.systems>
A subset of those used in other meta layers. Signed-off-by: Matt Madison <matt@madison.systems>
For Mender updates, we disable the nv_update_verifier service and handle update verification through a commit state script in the mender service. Signed-off-by: Matt Madison <matt@madison.systems>
and rework the bbappends so we don't have to modify the tegra-binaries recipe - doing so causes many recipes to get rebuilt when something changes. Signed-off-by: Matt Madison <matt@madison.systems>
U-Boot is either not supported or optional on some Tegra platforms, so only include the mender-uboot feature when U-Boot is the bootloader. Signed-off-by: Matt Madison <matt@madison.systems>
A fake version for fw_printenv and fw_setenv for Mender to use on non-U-Boot Tegra platforms. The bootloader interface is implemented in the state scripts instead. Signed-off-by: Matt Madison <matt@madison.systems>
For cboot, the BUP payload is generated with the initrd image. Also, remove the RDEPENDS on tegra-state-scripts, since that recipe no longer installs files into the rootfs. Signed-off-by: Matt Madison <matt@madison.systems>
to add a runtime dependency on the BUP payload and fake U-Boot fw-utils for non-U-Boot Tegra platforms. Signed-off-by: Matt Madison <matt@madison.systems>
We need to ensure that the state scripts are deployed before the mender image is created. Signed-off-by: Matt Madison <matt@madison.systems>
Bootloader redundancy isn't supported on tegra210 platforms yet, so make sure we don't try to depend on BUP payload construction for them. Signed-off-by: Matt Madison <matt@madison.systems>
The variable migration scripts are *only* needed on tegra186 platforms that use U-Boot, to provide for the possibility of relocation of the U-Boot environment during a bootloader update. Only tegra186 and tegra194 platforms need the other scripts that interface with NVIDIA's bootloader update system. Signed-off-by: Matt Madison <matt@madison.systems>
to match the real u-boot-fw-utils. Signed-off-by: Matt Madison <matt@madison.systems>
to support for TX1 and Nano development platforms. Nano dev kits use SPI flash for the U-Boot environment, and an SDcard for main storage. Signed-off-by: Matt Madison <matt@madison.systems>
to pull in a custom sdcard-layout file for Nano devkits. Signed-off-by: Matt Madison <matt@madison.systems>
This will need to be reworked a bit for Nano production modules, which have a different layout for their eMMCs. Signed-off-by: Matt Madison <matt@madison.systems>
Signed-off-by: Matt Madison <matt@madison.systems>
to restore CONFIG_ENV_IS_IN_MMC. Signed-off-by: Matt Madison <matt@madison.systems>
I thought that the Mender overrides could also be used for file paths, but I was wrong, so handle the U-Boot vs. non-U-Boot cases via functions in the recipe. Signed-off-by: Matt Madison <matt@madison.systems>
Nice changes! I am back from holiday on the 6th and should have some time to review the branch. Regarding the u-boot environment being in the SD-card in my PR, the reason I chose the SD-card instead of the SPI flash is that currently, in the company I am working for, only the SD card is flashed to provision a new Jetson Nano, and the SPI flash is left in its factory state. AFAIU having the environment in the SPI flash would add one more step to the provisioning process without any advantage for our use-case. In the past, as I worked on projects having different machines/models using the same module but a different carrier-board, we solved this by using some identification resistor on the carrier-board connected to some GPIO of the module so that the software on the module could recognize the carrier-board, rather than by storing the machine ID in the SPI flash on the carrier-board. This has the advantage that it is more robust than storing the machine ID in software. |
@madisongh Thanks for sharing! This is great! I just read through, haven't tried the changes myself yet.
Heroic efforts to handle this! I'd be less worried about them not working properly as I would be that they could somehow overwrite an area they shouldn't overwrite, although I can't think of a specific scenario where this would happen or would cause problems. If anything I'd say keep them in git and just remove from do_compile_append with a comment saying they aren't fully tested.
I don't have a strong opinion on this one, but since it's useful for @Ecordonnier to have this option it seems like it would be worth keeping it.
Fine with me. |
I tried today, using cboot-prebuilt configuration on TX2 and tegraflash to flash the new image. I scp'd in the .mender file to the /tmp directory, then ran this command
This completed successfully.
Here's mender-install.log I then rebooted, here's an example reboot log: At this point I'm still on I also noticed I'm still on slot 0 after the reboot, was expecting to go to slot 1
and I noticed the retry counter went to 0 on slot 1
I don't understand how the root filesystem is supposed to be setup in the cboot case. I see that fw_printenv and fw_setenv u-boot-fw-utils-fake files just use a Any pointers are appreciated. This is the first time I've tried to use cboot only as bootloader so I'm probably missing something obvious. |
From the reboot log, it looks like there was something wrong with the BUP payload:
which caused the bootloader to revert back to slot 0.... the slots info you see reflect the unsuccessful boot. The boot-time rootfs selection for the cboot case is driven by the boot slot. cboot passes the current boot slot as a kernel command line parameter, and the init script in the tegra-minimal-initramfs initrd uses that to decide which rootfs to mount. My posted changes may be missing something there, though, since the script in meta-tegra by default expects the slot 1 rootfs partition to be labeled |
I checked the code and didn't find any issue in the logic. Also see my comment, basically since my branch and yours are not compatible I think only your branch should get merged. Feel free to take over the README.md change of my branch as well as update the wiki entry for jetson-nano once you send a PR to the upstream repository. Regarding the u-boot environment I still think it would be better to have it in the sd-card (one step less for provisioning, meaning reduced cost of provisioning and reduced risk of error). |
Signed-off-by: Matt Madison <matt@madison.systems>
to use the '_b' suffix implemented by NVIDIA's A/B boot slot support and meta-tegra's initrd scripts. The naming isn't important for the platforms using U-Boot, but make the change across the board for consistency. Signed-off-by: Matt Madison <matt@madison.systems>
@dwalkes I ran a bunch of tests today after reproducing the issue you're seeing. Looks like it might be related to the issue I was seeing on my Xavier. I don't see the issue with the TX2 I have set up for secure boot, though - bootloader updates work fine there (although my setup for that machine is a bit different from the default). Bootloader updates also work fine on both Xavier and TX2/cboot when I don't include meta-mender-core/meta-mender-tegra in my layers. And they appear to work fine when using TX2/U-Boot with the mender stuff. I'm not sure what's going on yet. I'm going to run some comparisons of the built artifacts from working vs. non-working builds and see if that will point me in the right direction. |
Thanks @madisongh I spent a bit of time on it last night mostly just understanding more about how bup creation works in tegra. I was planning to take out mender layers as a next step which it looks like you already did. I should have more time to spend on it this weekend if you haven't solved by then. Were you able to find out what |
I tracked down the problem and will push updates to address that and the other issues reported here shortly. |
We need back the mmc dev and part #defines, just as we have for the TX1. Signed-off-by: Matt Madison <matt@madison.systems>
The scripts were sending non-error output to stderr Signed-off-by: Matt Madison <matt@madison.systems>
…e-scripts * Split the rollback state script in two - pre-reboot and post-reboot. Mark the current (rolled-back-to) boot slot successful post-reboot. * When using U-Boot on TX2, a boot failure detected after U-Boot takes control would desyncrhonize the the Tegra boot slot and Mender's rootfs slot. Have the ArtifactRollback_Leave state script detect this and switch the boot slot to resynchronize them. * Also with U-Boot on TX2, a failure in the Tegra bootloader (before U-Boot takes over) would cause the processor to switch boot slots back to the good one, and Mender would happily mark that boot as successful since U-Boot didn't detect the problem. To deal with this, add an ArtifactCommit_Enter state script to check that the boot slot matches the rootfs slot and if not, return an error so Mender rolls back the update. * Remove the '-e' from the #! lines of the scripts. Some use expr, which returns a non-zero status if the resulting expression is zero, and that was causing error exits from the script. Signed-off-by: Matt Madison <matt@madison.systems>
For Jetson-Nano devices with SPI flash and SDcards, the default is to use SPI flash for the U-Boot environment. However, if you prefer to store the U-Boot environment on the SDcard, add TEGRA_MENDER_UBOOT_ENV_IN_SPIFLASH = "0" to your build configuration (e.g., local.conf) to override the default. Signed-off-by: Matt Madison <matt@madison.systems>
to increase the ENV partition to 256K, so it can actually hold two full copies of the Mender-sized (128K) U-Boot environment. Signed-off-by: Matt Madison <matt@madison.systems>
* Set MENDER_RESERVED_SPACE_BOOTLOADER_DATA when using SDcard space for the U-Boot environment on Jetson-Nano devices. * Decouple the sizing of the rootfs partition from the requested size for the image rootfs, so the partition is always a bit larger. In particular with SDCard partitioning, the image rootfs size was exceeding the partition size. Also renamed 'mender_tegra_*' to 'tegra_mender_*' to make sure that there are no conflicts with Mender's own naming of things, and to be consistent with our local variable names (which cannot begin with MENDER_). Signed-off-by: Matt Madison <matt@madison.systems>
So "shortly" turned out to take longer than I expected, but I've pushed changes for:
I also did some fault injection testing and made additional changes to the state scripts to better handle update failures and rollbacks. In particular, there are checks to make sure that we don't let the Tegra A/B boot slot get out of sync with Mender's for different failure scenarios. I'll run some further tests this weekend, but this latest should be in reasonable shape. |
Thanks @madisongh! I can confirm this fixes issues I noticed with cboot TX2 related updates. I've also tested TX2 u-boot, Nano with and without TEGRA_MENDER_UBOOT_ENV_IN_SPIFLASH and rollback in each scenario, all worked for me. I don't have an Xavier platform to test with but based on the relatively small number of changes from TX2 I'm not expecting issues with that platform. I started a conversation on mender hub to get some feedback on the fake fw_printenv/fw_setenv approach here. You are of course welcome to add your name to the maintainer list if you like but I'm guessing you have enough on your plate with meta-tegra. If you did want to maintain this it might make more sense to make it a part of meta-tegra. If not I'll plan to continue to maintain as I can through meta-mender-community. Thanks for sharing your expertise and massive improvements with these contributions, very much appreciated! |
The wrapper script to unlock the eMMC boot1 partition is also needed there. Signed-off-by: Matt Madison <matt@madison.systems>
For platforms that configure ROOTFSPART_SIZE directly we need to base the MENDER_IMAGE_ROOTFS_SIZE setting on that value rather than on MENDER_CALC_ROOTFS_SIZE. Signed-off-by: Matt Madison <matt@madison.systems>
which is in bytes, rather than Kibytes. Signed-off-by: Matt Madison <matt@madison.systems>
…tall script Missed that on the last pass. Signed-off-by: Matt Madison <matt@madison.systems>
I've pushed a handful a fixes for problems that cropped up in my testing this weekend. With these changes, Jetson-TX1 now updates properly, and the one system I have with a custom flash layout that hard-codes the value for ROOTFSPART_SIZE also works. Retested across all my devices this morning with these changes, all check out. Let me know if you'd like me to rebase these changes and either re-push to this PR or open up a new one with the cleaned-up history. |
Thanks @madisongh
No need as far as I'm concerned. Assuming you want to keep these changes in meta-mender-community I think we need to wait for them to add a zeus branch before submitting a PR there. Depending on what happens with mendersoftware#119 I'll rebase these changes on top of warrior as needed. |
Let's keep it separate, although I'm open to making changes in meta-tegra to simplify the integration where it makes sense to do so. As you say, I already have my hands full, so I'm fine with handing off the ongoing maintenance, but feel free to reach out if you need something. |
FYI, I have updates for L4T R32.3.1 on this branch. Flash layouts changed quite a bit between R32.2.x and R32.3.x, so OTA upgrades between them aren't going to be possible without a bunch of work (and even then, maybe not). |
Thanks @madisongh
Would you suggest I just make the zeus branch for the meta-mender-mender community project target zeus-l4t-r32.3.1 for people who don't have specific requirements, assuming 32.3.x is most likely to be what's closest to matching flash layouts going forward? |
Might be a good idea. Another approach which I'm going to try is to use the 32.3.1 layouts with the 32.2.x-based zeus branch, to see if that will work. But going with 32.3.1 as the base has the added advantage that bootloader update support looks better in that version. |
…son-nano" This reverts commit c41fe6e, reversing changes made to f524d9d. Reverted to make rebase merges for zeus easier, since these were placed on top of warrior before initial Nano support. See #10 and mendersoftware#119 (comment)
I've opened #11 to track what I plan to merge on the zeus branch, targeting l4t r32.3.1 and reverting previous merges for warrior support before applying the changes on @madisongh zeus-l4t-r32.3.1 branch |
I've been able to make this work by using an R32.3.1-based flash layout XML file for my R32.2.x builds. |
…son-nano" This reverts commit c41fe6e, reversing changes made to f524d9d. Reverted to make rebase merges for zeus easier, since these were placed on top of warrior before initial Nano support. See #10 and mendersoftware#119 (comment)
…son-nano" This reverts commit c41fe6e, reversing changes made to f524d9d. Reverted to make rebase merges for zeus easier, since these were placed on top of warrior before initial Nano support. See #10 and mendersoftware#119 (comment) Signed-off-by: Dan Walkes <danwalkes@trellis-logic.com>
Note that I still consider this a work in progress, but I thought I'd get these changes up for review for some early feedback.
This patch series extends Mender support to all of the current Jetson development kits: TX1, TX2, Xavier, and Nano.
Key changes:
nv_update_engine
to eliminate the need for an ArtifactRebootEnter script.I've done some testing both manually and OTA on TX1, TX2 with and without U-Boot, Nano, and Xavier. Xavier has some issues that are probably not Mender-specific - this is the first time I've actually tried using the NVIDIA bootloader update support on that platform, and it looks like something may need fixing in meta-tegra. Other platforms look OK so far.
During testing I ran into some issues that required fixes in meta-tegra, so if you play with these changes, make sure you're up-to-date with the zeus branch there.
A couple of things in particular I'd like feedback on: