[Xavier NX Devkit SD-CARD] Occasional cboot panic and halt on 32.6.1 after reboot #891

Open
acostach opened this issue Jan 30, 2022 · 14 comments

@acostach

Not sure whether this also happens with older L4T releases, but I've noticed this sporadic panic in cboot 32.6.1. It happens occasionally after rebooting the device multiple times:

[0001.031] I>  2) Base:0xf2000000 Size:0x00200000
[0001.185] I>  3) Base:0xf1200000 Size:0x00200000
[0001.190] I>  4) Base:0xf1000000 Size:0x00100000
[0001.194] I>  5) Base:0xf0f00000 Size:0x00100000
[0001.199] I>  6) Base:0xf3800000 Size:0x00400000
[0001.203] I>  7) Base:0xf1c00000 Size:0x00400000
[0001.208] I>  8) Base:0xf0e00000 Size:0x00100000
[0001.212] I>  9) Base:0xf0d00000 Size:0x00100000
[0001.217] I> 10) Base:0xf3000000 Size:0x00800000
[0001.221] I> 11) Base:0x40000000 Size:0x00040000
[0001.225] I> 12) Base:0xf0c00000 Size:0x00100000
[0001.230] I> 13) Base:0x40046000 Size:0x00002000
[0001.234] I> 14) Base:0x40048000 Size:0x00002000
[0001.239] I> 15) Base:0xac000000 Size:0x00004000
[0001.243] I> 16) Base:0x4004a000 Size:0x00002000
[0001.248] I> 17) Base:0xf0b00000 Size:0x00100000
[0001.252] I> 18) Base:0x4004c000 Size:0x00002000
[0001.257] I> 19) Base:0xf2200000 Size:0x00600000
[0001.261] I> 20) Base:0x4004e000 Size:0x00002000
[0001.266] I> 21) Base:0xf0ad0000 Size:0x0000c000
[0001.270] I> 22) Base:0x00000000 Size:0x00000000
[0001.275] I> 23) Base:0xf0ae0000 Size:0x00020000
[0001.279] I> 24) Base:0xf6000000 Size:0x02000000
[0001.284] I> 25) Base:0x40050000 Size:0x00002000
[0001.288] I> 26) Base:0x40040000 Size:0x00006000
[0001.292] I> 27) Base:0xf1800000 Size:0x00400000
[0001.297] I> 28) Base:0xf4c00000 Size:0x01400000
[0001.301] I> 29) Base:0xf1400000 Size:0x00400000
[0001.306] I> 30) Base:0x00000000 Size:0x00000000
[0001.310] I> 31) Base:0x00000000 Size:0x00000000
[0001.315] I> 32) Base:0xf8000000 Size:0x08000000
[0001.319] I> 33) Base:0x00000000 Size:0x00000000
[0001.324] I> 34) Base:0xf3c00000 Size:0x01000000
[0001.328] I> 35) Base:0xab000000 Size:0x01000000
[0001.333] I> 36) Base:0xa0000000 Size:0x0b000000
[0001.337] I> 37) Base:0xf2800000 Size:0x00800000
[0001.342] I> 38) Base:0x80000000 Size:0x20000000
[0001.346] I> 39) Base:0xb0000000 Size:0x08000000
[0001.350] I> 40) Base:0x00000000 Size:0x00000000
[0001.355] I> 41) Base:0x00000000 Size:0x00000000
[0001.359] I> 42) Base:0x00000000 Size:0x00000000
[0001.364] I> 43) Base:0x00000000 Size:0x00000000
[0001.368] I> 44) Base:0x00000000 Size:0x00000000
[0001.373] I> 45) Base:0x00000000 Size:0x00000000
[0001.377] GIC-SPI Target CPU: 0
[0001.380] Interrupts Init done
[0001.383] calling constructors
[0001.386] initializing heap
[0001.389] I> Heap: [0xa069cb70 ... 0xab000000]
[0001.393] initializing threads
[0001.396] initializing timers
[0001.399] creating bootstrap completion thread
[0001.403] top of bootstrap2()
[0001.406] CPU: MIDR: 0x4E0F0040, MPIDR: 0x80000000
[0001.410] initializing platform
[0001.413] E> DEVICE_PROD: Invalid value data = 0, size = 0.
[0001.419] W> device prod register failed
[0001.422] I> Bl_dtb @0xaaf00000
[0001.431] W> "plugin-manager" doesn't exist, creating
[0001.431] W> "ids" doesn't exist, creating
[0001.435] W> "connection" doesn't exist, creating
[0001.439] W> "configs" doesn't exist, creating
[0001.450] E> failed to read label property for node 158736: 13
[0001.452] E> failed to read reg property for node 158800: 13
[0001.456] E> failed to read reg property for node 158852: 13
[0001.461] E> failed to read reg property for node 158936: 13
[0001.469] I> Find /i2c@3160000's alias i2c0
[0001.469] I> Reading eeprom i2c=0 address=0x50
[0001.499] I> Device at /i2c@3160000:0x50
[0001.499] I> Reading eeprom i2c=0 address=0x57
[0001.524] I> Device at /i2c@3160000:0x57
[0001.525] I> Find /i2c@c240000's alias i2c1
[0001.526] I> Reading eeprom i2c=1 address=0x50
[0001.527] E> I2C: slave not found in slaves.
[0001.528] E> I2C: Could not write 0 bytes to slave: 0x00a0 with repeat start true.
[0001.529] E> I2C_DEV: Failed to send register address 0x00000000.
[0001.529] E> I2C_DEV: Could not read 256 registers of size 1 from slave 0xa0 at 0x00000000 via instance 1.
[0001.538] E> eeprom: Retry to read I2C slave device.
[0001.543] E> I2C: slave not found in slaves.
[0001.547] E> I2C: Could not write 0 bytes to slave: 0x00a0 with repeat start true.
[0001.555] E> I2C_DEV: Failed to send register address 0x00000000.
[0001.560] E> I2C_DEV: Could not read 256 registers of size 1 from slave 0xa0 at 0x00000000 via instance 1.
[0001.570] E> eeprom: Failed to read I2C slave device
[0001.574] I> Eeprom read failed 0x3526070d
[0001.578] I> create_pm_ids: id: 3668-0000-200-J, len: 15
[0001.584] I> config: mem-type:00,power-config:00,misc-config:00,modem-config:00,touch-config:00,display-config:00,, len: 93
[0001.595] I> create_pm_ids: id: 3509-0000-100-G, len: 15
[0001.600] I> config: mem-type:00,power-config:00,misc-config:00,modem-config:00,touch-config:00,display-config:00,, len: 93
[0001.611] I> Adding plugin-manager/ids/3668-0000-200=/i2c@3160000:module@0x50
[0001.618] W> "i2c@3160000" doesn't exist, creating
[0001.622] W> "module@0x50" doesn't exist, creating
[0001.627] I> Adding plugin-manager/ids/3509-0000-100=/i2c@3160000:module@0x57
[0001.634] W> "module@0x57" doesn't exist, creating
[0001.640] I> Adding plugin-manager/cvm
[0001.643] W> "chip-id" doesn't exist, creating
[0001.647] I> Adding plugin-manager/chip-id/A02P
[0001.651] I> Plugin-manager override starting
[0001.656] I> node /plugin-manager/fragment-pcie-c5-rp matches
[0001.665] I> node /plugin-manager/fragement-tegra-wdt-en matches
[0001.670] I> node /plugin-manager/fragement-tegra-sdhci-emmc-dis matches
[0001.677] I> Disable plugin-manager status in FDT
[0001.678] I> Plugin-manager override finished successfully
[0001.683] I> gpio framework initialized
[0001.688] I> tegrabl_gpio_driver_register: register 'nvidia,tegra194-gpio' driver
[0001.695] I> tegrabl_gpio_driver_register: register 'nvidia,tegra194-gpio-aon' driver
[0001.702] I> tegrabl_tca9539_init: i2c bus: 1, slave addr: 0x46
[0001.709] W> fetch_driver_phandle_from_dt: failed to get node with compatible ti,tca9539
[0001.717] W> fetch_driver_phandle_from_dt: failed to get node with compatible nxp,tca9539
[0001.724] W> tegrabl_tca9539_init: failed to fetch phandle from dt
[0001.730] I> tegrabl_tca9539_init: i2c bus: 1, slave addr: 0x44
[0001.737] W> fetch_driver_phandle_from_dt: failed to get node with compatible ti,tca9539
[0001.745] W> fetch_driver_phandle_from_dt: failed to get node with compatible nxp,tca9539
[0001.751] W> tegrabl_tca9539_init: failed to fetch phandle from dt
[0001.759] I> fixed regulator driver initialized
[0001.767] I> register 'maxim' power off handle
[0001.767] I> virtual i2c enabled
[0001.769] I> registered 'maxim,max20024' pmic
[0001.774] I> tegrabl_gpio_driver_register: register 'max20024-gpio' driver
[0001.780] I> Boot-device: QSPI
[0001.783] I> Boot_device: QSPI_FLASH instance: 0
[0001.788] I> QSPI source rate = 204000 Khz
[0001.791] I> Requested rate for QSPI clock = 34000 Khz
[0001.796] I> BPMP-set rate for QSPI clk = 34000 Khz
[0001.801] I> QSPI Flash Size = 32 MB
[0001.809] I> Qspi initialized successfully
[0001.809] I> qspi flash-0 params source = boot args
[0001.813] I> create_pm_ids: id: 3668-0000-200-J, len: 15
[0001.818] I> config: mem-type:00,power-config:00,misc-config:00,modem-config:00,touch-config:00,display-config:00,, len: 93
[0001.829] I> create_pm_ids: id: 3509-0000-100-G, len: 15
[0001.835] I> config: mem-type:00,power-config:00,misc-config:00,modem-config:00,touch-config:00,display-config:00,, len: 93
[0001.846] I> Found sdcard
[0001.850] I> enabling 'vdd-sdmmc1-sw' regulator
[0001.856] I> regulator 'vdd-sdmmc1-sw' already enabled
[0002.097] I> sdmmc SDR mode
[0002.111] I> -0 params source = 
[0002.113] I> Found 47 partitions in QSPI_FLASH (instance 0)
[0002.125] I> Found 13 partitions in SDCARD (instance 0)
[0002.133] I> regulator 'vdd-hdmi-5v0' already enabled
[0002.138] I> regulator 'vdd-hdmi-5v0' already enabled
[0002.138] E> tegrabl_display_init_regulator: hdmi cable is not connected
[0002.139] E> tegrabl_display_get_pdata, failed to parse dtb settings
[0002.142] E> invalid display type
[0002.144] E> cannot find any other nvdisp nodes
[0002.144] E> no valid display unit config found in dtb
[0002.145] W> display init failed
[0002.147] I> Load in CBoot Boot Options partition and parse it
[0002.152] I> Active slot suffix: 
[0002.156] E> Error -9 when finding node with path /boot-configuration
[0002.162] E> tegrabl_cbo_parse_info: "boot-configuration" not found in CBO file.
[0002.169] I> Using default boot order
[0002.172] I> boot-dev-order :-
[0002.175] I> 1.sd
[0002.177] I> 2.usb
[0002.179] I> 3.nvme
[0002.181] I> 4.emmc
[0002.183] I> 5.net
[0002.185] I> Hit any key to stop autoboot:     4       3       2       1
[0004.192] initializing target
[0004.192] calling apps_init()
[0004.193] starting app kernel_boot_app
[0004.203] I> found decompressor handler: lz4-legacy
[0004.204] I> decompressing BMP blob ...
[0004.215] I> Kernel type = Normal
[0004.215] I> ########## SD (0) boot ##########
[0004.216] I> Found sdcard
[0004.218] I> regulator 'vdd-sdmmc1-sw' already enabled
[0004.222] I> regulator 'vdd-sdmmc1-sw' already enabled
[0004.257] I> sdmmc SDR mode
[0004.271] I> -0 params source = 
[0004.272] I> Already published: 00060000
[0004.272] I> Look for boot partition
[0004.272] I> Fallback: assuming 0th partition is boot partition
[0004.273] I> Detect filesystem
[0004.277] I> fs_detect:173: Unsupported or no filesystem present
[0004.278] I> Loading kernel-bootctrl from partition
[0004.278] I> Loading partition kernel-bootctrl at 0xa42e0000 from device(0x6)
[0004.306] W> tegrabl_get_kernel_bootctrl: magic number(0x00000000) is invalid
[0004.307] W> tegrabl_get_kernel_bootctrl: use default dummy boot control data
[0004.307] I> Active slot suffix: 
[0004.308] I> Slot suffix: 
[0004.308] I> booting into recovery image, disabling A/B slot selection[0004.309] I> Loading recovery ...
[0004.312] E> Cannot open partition recovery
[0004.316] E> Load recovery image or recovery dtb failed, err: 202184205
[0004.322] E> SD boot failed, err: 202184205
[0004.326] I> ########## USB (0) boot ##########
[0004.347] I> Validate XUSB-FW ...
[0004.347] I> T19x: Authenticate XUSB-FW (bin_type: 11), max size 0x30000
[0004.348] E> Stage2Signature validation failed with SHA2!!
[0004.348] C> OEM authentication of XUSB-FW header failed!
[0004.351] E> Failed to validate XUSB-FW binary (err=1077936152)
[0004.357] E> failed to initialize xhci controller
[0004.361] E> Error in init of XUSB host driver, err: 40400018
[0004.367] W> Failed to initialize device 5-0
[0004.371] E> USB boot failed, err: 1077936152
[0004.375] I> ########## NVME (0) boot ##########
[0004.380] I> Initializing nvme device instance 0
[0004.384] I> Initializing nvme controller
[0004.388] I> tegrabl_locate_pcie_ctrl_in_dt: found match at 0x14180000
[0004.394] I> vpcie3v3-supply not found
[0004.398] I> vpcie12v-supply not found
[0004.402] W> Failed to get nvidia,plat-gpios
[0004.406] I> tegrabl_pcie_soc_preinit: (0):
[0004.410] I> Unpowergate
[0004.412] I> tegrabl_car_clk_disable(0) ...
[0004.416] I> tegrabl_car_rst_set(CORE, 0) ...
[0004.420] I> tegrabl_car_rst_set(APB, 0) ...
[0004.425] I> tegrabl_car_clk_enable(0) ...
[0004.429] I> tegrabl_car_rst_clear(APB, 0) ...
[0004.433] I> tegrabl_set_ctrl_state(0)
[0004.436] I> CLR PCIE_APB:6
[0004.439] I> tegrabl_pcie_soc_init: (0):
[0004.443] I> APPL initialization ...
[0004.446] I> poweron phys
[0004.449] I> tegrabl_locate_pcie_ctrl_in_dt: found match at 0x14180000
[0004.455] I> tegrabl_power_on_phy: controller 0 not available
[0004.460] E> Failed to power on phy on controller-0
[0004.466] W> Failed tegrabl_pcie_soc_init(), error=0x1
[0004.470] I> Failed to initialize SoC Host PCIe controller
[0004.476] E> tegrabl_nvme_init: Failed tegrabl_pcie_init(0); error=0x1
[0004.482] W> tegrabl_nvme_bdev_open: Failed NVME INIT; error=0x80800601
[0004.488] W> Failed to open NVME-0, err = 80800601
[0004.493] W> Failed to initialize device 10-0
[0004.497] E> NVME (0) boot failed, err: 0x80800601
[0004.502] I> ########## NVME (1) boot ##########
[0004.506] I> Initializing nvme device instance 1
[0004.511] I> Initializing nvme controller
[0004.515] I> tegrabl_locate_pcie_ctrl_in_dt: found match at 0x14100000
[0004.521] I> vpcie3v3-supply not found
[0004.525] I> vpcie12v-supply not found
[0004.528] W> Failed to get nvidia,plat-gpios
[0004.532] I> tegrabl_pcie_soc_preinit: (1):
[0004.536] I> Unpowergate
[0004.539] I> tegrabl_car_clk_disable(1) ...
[0004.543] I> tegrabl_car_rst_set(CORE, 1) ...
[0004.547] I> tegrabl_car_rst_set(APB, 1) ...
[0004.551] I> tegrabl_car_clk_enable(1) ...
[0004.555] I> tegrabl_car_rst_clear(APB, 1) ...
[0004.559] I> tegrabl_set_ctrl_state(1)
[0004.563] I> CLR PCIE_APB:6
[0004.566] I> tegrabl_pcie_soc_init: (1):
[0004.569] I> APPL initialization ...
[0004.573] I> poweron phys
[0004.575] I> tegrabl_locate_pcie_ctrl_in_dt: found match at 0x14100000
[0004.582] I> tegrabl_power_on_phy: controller 1 not available
[0004.587] E> Failed to power on phy on controller-1
[0004.592] W> Failed tegrabl_pcie_soc_init(), error=0x1
[0004.597] I> Failed to initialize SoC Host PCIe controller
[0004.602] E> tegrabl_nvme_init: Failed tegrabl_pcie_init(1); error=0x1
[0004.609] W> tegrabl_nvme_bdev_open: Failed NVME INIT; error=0x80800601
[0004.615] W> Failed to open NVME-1, err = 80800601
[0004.620] W> Failed to initialize device 10-1
[0004.624] E> NVME (1) boot failed, err: 0x80800601
[0004.629] I> ########## NVME (2) boot ##########
[0004.633] I> Initializing nvme device instance 2
[0004.637] I> Initializing nvme controller
[0004.642] I> tegrabl_locate_pcie_ctrl_in_dt: found match at 0x14120000
[0004.648] I> vpcie3v3-supply not found
[0004.651] I> vpcie12v-supply not found
[0004.655] W> Failed to get nvidia,plat-gpios
[0004.659] I> tegrabl_pcie_soc_preinit: (2):
[0004.663] I> Unpowergate
[0004.666] I> tegrabl_car_clk_disable(2) ...
[0004.669] I> tegrabl_car_rst_set(CORE, 2) ...
[0004.674] I> tegrabl_car_rst_set(APB, 2) ...
[0004.678] I> tegrabl_car_clk_enable(2) ...
[0004.682] I> tegrabl_car_rst_clear(APB, 2) ...
[0004.686] I> tegrabl_set_ctrl_state(2)
[0004.690] I> CLR PCIE_APB:6
[0004.692] I> tegrabl_pcie_soc_init: (2):
[0004.696] I> APPL initialization ...
[0004.699] I> poweron phys
[0004.702] I> tegrabl_locate_pcie_ctrl_in_dt: found match at 0x14120000
[0004.708] I> tegrabl_power_on_phy: controller 2 not available
[0004.714] E> Failed to power on phy on controller-2
[0004.719] W> Failed tegrabl_pcie_soc_init(), error=0x1
[0004.723] I> Failed to initialize SoC Host PCIe controller
[0004.729] E> tegrabl_nvme_init: Failed tegrabl_pcie_init(2); error=0x1
[0004.735] W> tegrabl_nvme_bdev_open: Failed NVME INIT; error=0x80800601
[0004.742] W> Failed to open NVME-2, err = 80800601
[0004.746] W> Failed to initialize device 10-2
[0004.750] E> NVME (2) boot failed, err: 0x80800601
[0004.755] I> ########## NVME (3) boot ##########
[0004.760] I> Initializing nvme device instance 3
[0004.764] I> Initializing nvme controller
[0004.768] I> tegrabl_locate_pcie_ctrl_in_dt: found match at 0x14140000
[0004.774] I> vpcie3v3-supply not found
[0004.778] I> vpcie12v-supply not found
[0004.781] W> Failed to get nvidia,plat-gpios
[0004.786] I> tegrabl_pcie_soc_preinit: (3):
[0004.790] I> Unpowergate
[0004.792] I> tegrabl_car_clk_disable(3) ...
[0004.796] I> tegrabl_car_rst_set(CORE, 3) ...
[0004.800] I> tegrabl_car_rst_set(APB, 3) ...
[0004.804] I> tegrabl_car_clk_enable(3) ...
[0004.808] I> tegrabl_car_rst_clear(APB, 3) ...
[0004.813] I> tegrabl_set_ctrl_state(3)
[0004.816] I> CLR PCIE_APB:6
[0004.819] I> tegrabl_pcie_soc_init: (3):
[0004.822] I> APPL initialization ...
[0004.826] I> poweron phys
[0004.829] I> tegrabl_locate_pcie_ctrl_in_dt: found match at 0x14140000
[0004.835] I> tegrabl_power_on_phy: controller 3 not available
[0004.840] E> Failed to power on phy on controller-3
[0004.845] W> Failed tegrabl_pcie_soc_init(), error=0x1
[0004.850] I> Failed to initialize SoC Host PCIe controller
[0004.855] E> tegrabl_nvme_init: Failed tegrabl_pcie_init(3); error=0x1
[0004.862] W> tegrabl_nvme_bdev_open: Failed NVME INIT; error=0x80800601
[0004.868] W> Failed to open NVME-3, err = 80800601
[0004.873] W> Failed to initialize device 10-3
[0004.877] E> NVME (3) boot failed, err: 0x80800601
[0004.882] I> ########## NVME (4) boot ##########
[0004.886] I> Initializing nvme device instance 4
[0004.891] I> Initializing nvme controller
[0004.895] I> tegrabl_locate_pcie_ctrl_in_dt: found match at 0x14160000
[0004.901] I> vpcie3v3-supply not found
[0004.904] I> vpcie12v-supply not found
[0004.908] W> Failed to get nvidia,plat-gpios
[0004.912] I> tegrabl_pcie_soc_preinit: (4):
[0004.916] I> Unpowergate
[0004.919] I> tegrabl_car_clk_disable(4) ...
[0004.923] I> tegrabl_car_rst_set(CORE, 4) ...
[0004.927] I> tegrabl_car_rst_set(APB, 4) ...
[0004.931] I> tegrabl_car_clk_enable(4) ...
[0004.935] I> tegrabl_car_rst_clear(APB, 4) ...
[0004.939] I> tegrabl_set_ctrl_state(4)
[0004.943] I> CLR PCIE_APB:6
[0004.946] I> tegrabl_pcie_soc_init: (4):
[0004.949] I> APPL initialization ...
[0004.953] I> poweron phys
[0004.956] I> tegrabl_locate_pcie_ctrl_in_dt: found match at 0x14160000
[0004.962] I> tegrabl_power_on_phy: power on phy @0x3f40000
[0005.076] I> PCIe controller-4 link is up
[0005.077] I> tegra_pcie_info[4].cfg0_base = 0x36000000
[0005.077] I> tegra_pcie_info[4].cfg1_base = 0x36020000
[0005.078] I> tegra_pcie_info[4].atu_dma_base = 0x36040000
[0005.078] I> tegra_pcie_bus[4].mem = 0x36200000
[0005.078] I> Scanning busnr: 0 devfn: 0
[0005.079] I> PCIe IDs: 0x1ad110de
[0005.082] I> PCIe RID_CC: 0x60400a1
[0005.085] I> Scanning busnr: 1 devfn: 0
[0005.089] I> PCIe IDs: 0xc82210ec
[0005.092] I> PCIe RID_CC: 0x2800000
[0005.095] I> PCI Config: I/O=0x36100000, Memory=0x36200000
[0005.101] I> IO bar_num=0 bar=0x36100000
[0005.104] I> MEM64 bar_num=2 bar=0x36200000
[0005.108] I> Number of PCIe devices detected: 2
[0005.113] E> tegrabl_nvme_init: Failed tegrabl_pcie_get_dev(4); error=0x0
[0005.120] W> tegrabl_nvme_bdev_open: Failed NVME INIT; error=0x80800612
[0005.126] W> Failed to open NVME-4, err = 80800612
[0005.130] W> Failed to initialize device 10-4
[0005.135] E> NVME (4) boot failed, err: 0x80800612
[0005.139] I> ########## NVME (5) boot ##########
[0005.144] I> Initializing nvme device instance 5
[0005.148] I> Initializing nvme controller
[0005.153] I> tegrabl_locate_pcie_ctrl_in_dt: found match at 0x141a0000
[0005.158] I> vpcie3v3-supply not found
[0005.162] I> vpcie12v-supply not found
[0005.166] W> Failed to get nvidia,plat-gpios
[0005.170] I> tegrabl_pcie_soc_preinit: (5):
[0005.174] I> Unpowergate
[0005.176] I> tegrabl_car_clk_disable(5) ...
[0005.180] I> tegrabl_car_rst_set(CORE, 5) ...
[0005.184] I> tegrabl_car_rst_set(APB, 5) ...
[0005.189] I> tegrabl_car_clk_enable(5) ...
[0005.193] I> tegrabl_car_rst_clear(APB, 5) ...
[0005.197] I> tegrabl_set_ctrl_state(5)
[0005.200] I> CLR PCIE_APB:6
[0005.203] I> tegrabl_pcie_soc_init: (5):
[0005.207] I> APPL initialization ...
[0005.210] I> poweron phys
[0005.213] I> tegrabl_locate_pcie_ctrl_in_dt: found match at 0x141a0000
[0005.219] I> tegrabl_power_on_phy: power on phy @0x3eb0000
[0005.224] I> tegrabl_power_on_phy: power on phy @0x3ec0000
[0005.230] I> tegrabl_power_on_phy: power on phy @0x3ed0000
[0005.235] I> tegrabl_power_on_phy: power on phy @0x3ee0000
[0005.240] I> tegrabl_power_on_phy: power on phy @0x3ef0000
[0005.246] I> tegrabl_power_on_phy: power on phy @0x3f00000
[0005.251] I> tegrabl_power_on_phy: power on phy @0x3f10000
[0005.257] I> tegrabl_power_on_phy: power on phy @0x3f20000
[0006.362] C> Failed to link up controller-5
[0006.362] W> Failed tegrabl_pcie_soc_init(), error=0x12
[0006.363] I> Failed to initialize SoC Host PCIe controller
[0006.363] E> tegrabl_nvme_init: Failed tegrabl_pcie_init(5); error=0x12
[0006.364] W> tegrabl_nvme_bdev_open: Failed NVME INIT; error=0x80800612
[0006.364] W> Failed to open NVME-5, err = 80800612
[0006.369] W> Failed to initialize device 10-5
[0006.373] E> NVME (5) boot failed, err: 0x80800612
[0006.378] I> ########## Fixed storage boot ##########
[0006.383] I> Loading kernel-bootctrl from partition
[0006.387] E> Cannot find partition kernel-bootctrl
[0006.392] E> Cannot open partition kernel-bootctrl
[0006.397] W> tegrabl_get_kernel_bootctrl: failed to read primary bootctrl data
[0006.404] I> Loading kernel-bootctrl_b from partition
[0006.408] E> Cannot find partition kernel-bootctrl_b
[0006.413] E> Cannot open partition kernel-bootctrl_b
[0006.418] W> tegrabl_get_kernel_bootctrl: failed to read recovery bootctrl data
[0006.425] W> tegrabl_get_kernel_bootctrl: use default dummy boot control data
[0006.432] E> Blockdev open: exit error
[0006.436] E> tegrabl_display_clear: display is not initialized
[0006.442] W> Boot logo display failed...
[0006.445] I> Kernel EP: 0x0, DTB: 0x90000000
[0006.450] 
[0006.451] -----------------------------------------------
[0006.456] Synchronous Exception: UNKNOWN EXCEPTION
[0006.461] -----------------------------------------------
[0006.466] 
[0006.467] ESR 0x2000000: ec 0x0, il 0x1, iss 0x0
[0006.471] -----------------------------------------------
[0006.477]  [Stack Trace]
[0006.479] 
[0006.480] => pc:0x00000000, sp:0xA06A7F30
[0006.484] => pc:0xA060F880, sp:0xA06A8160
[0006.488] => pc:0xA060F89C, sp:0xA06A81B0
[0006.492] => pc:0xA060F644, sp:0xA06A81E0
[0006.495] => pc:0xA060EB68, sp:0xA06A81F0
[0006.499] => pc:0xA060EB30, sp:0xA06A8200
[0006.503] -----------------------------------------------
[0006.508] iframe 0xa06a7e40:
[0006.511] x0  0x        90000000 x1  0x               0 x2  0x               0 x3  0x               0
[0006.520] x4  0x               0 x5  0x              20 x6  0x         b200123 x7  0x        ffffffc0
[0006.529] x8  0x               1 x9  0xffffffffffffffff x10 0x               6 x11 0x               2
[0006.538] x12 0x               1 x13 0x              40 x14 0x               1 x15 0x             2c0
[0006.548] x16 0x        a061c4a8 x17 0x               0 x18 0x               0 x19 0x               0
[0006.557] x20 0x               0 x21 0x               0 x22 0x               0 x23 0x               0
[0006.566] x24 0x               0 x25 0x               0 x26 0x               0 x27 0x               0
[0006.575] x28 0x               0 x29 0x        a06a8160 lr  0x        a060f7f4 sp  0x        a06a7f30
[0006.584] elr 0x               0
[0006.587] spsr 0x        400003c9
[0006.590] -----------------------------------------------
[0006.595] panic (caller 0xa0601238): die
[0006.599] HALT: spinning forever...

I guess the SD card is sometimes in a bad state, so I'm wondering whether it would be possible to configure cboot to simply reboot in this case instead of halting. Thank you!

@madisongh
Member

That it's triggering an exception like that is probably a bug; maybe it's not properly handling the case where no boot device is found. Considering how complicated the T19x boot sequence has become as they've tacked on support for NVMe, extlinux, CBO, etc., it wouldn't surprise me if some infrequently exercised code path is broken.

Still, bugs happen, and cboot should have a watchdog timer enabled by default, unless you've configured otherwise. I don't remember the default timeout off the top of my head, but it's on the order of minutes.

@acostach
Author

Thanks Matt. I left it for about 5 minutes and nothing happened, but I'll reproduce this again and leave it for longer. I read in the docs that the PMIC WDT is disabled by default via ODMDATA on the Xavier NX, which is why I was asking. I'll get back with an update.

@acostach
Author

@madisongh Unfortunately the device did not restart after 40 minutes in this state; perhaps that watchdog is not enabled by default?

@madisongh
Member

Sure enough, the default ODMDATA value (0xB8190000) disables the PMIC WDT. I thought they at least enabled the internal WDT in the processor, but perhaps not. Try flashing with ODMDATA set to 0xB81A0000; that should enable the PMIC WDT, if I'm reading the cboot sources correctly.
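For reference, the two ODMDATA values differ in exactly two bits. The small C helper below (a hypothetical name, not part of cboot) lists the differing bit positions; for 0xB8190000 versus 0xB81A0000 it reports bits 16 and 17, i.e. bit 16 is cleared and bit 17 is set. Which cboot or plugin-manager options those bits map to is not stated in this thread, so treat any such mapping as an assumption to check against the L4T documentation for your release.

```c
#include <stddef.h>
#include <stdint.h>

/* Collect the indices of the bits that differ between two 32-bit
 * ODMDATA words into out[] (lowest bit first) and return the count.
 * Hypothetical helper: it only compares bits, it does not interpret them. */
static size_t odmdata_changed_bits(uint32_t a, uint32_t b, unsigned out[32])
{
    uint32_t diff = a ^ b; /* a set bit marks a position that differs */
    size_t n = 0;

    for (unsigned bit = 0; bit < 32; bit++) {
        if (diff & (UINT32_C(1) << bit)) {
            out[n++] = bit;
        }
    }
    return n;
}
```

Calling `odmdata_changed_bits(0xB8190000, 0xB81A0000, bits)` returns 2, with `bits[0] == 16` and `bits[1] == 17`.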

@acostach
Author

acostach commented Feb 1, 2022

Hi @madisongh, unfortunately the PMIC WDT doesn't appear to work with systemd: the device reboots as soon as it finishes booting. We're using systemd to kick the watchdog with RuntimeWatchdogSec set to 10s. This doesn't happen with the PMIC WDT disabled by the previous ODMDATA, though.
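As background, systemd's RuntimeWatchdogSec= support works by opening the hardware watchdog device, setting its timeout and pinging it periodically. The sketch below shows that sequence using the standard Linux watchdog ioctl interface from `<linux/watchdog.h>` (`WDIOC_SETTIMEOUT`, `WDIOC_KEEPALIVE`) plus the "magic close" convention. The helper names are hypothetical and this is not systemd's implementation; it only illustrates that opening the device is what arms the timer on most drivers.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <linux/watchdog.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* "Magic close": writing 'V' before close() asks the driver to stop the
 * timer. Drivers built with CONFIG_WATCHDOG_NOWAYOUT ignore this. */
static int wdt_close_magic(int fd)
{
    ssize_t written = write(fd, "V", 1);
    int rc = close(fd);
    return (written == 1 && rc == 0) ? 0 : -1;
}

/* Open a watchdog device (which arms it on most drivers) and request a
 * timeout in seconds. Returns the fd, or -1 on error. */
static int wdt_open(const char *path, int timeout_s)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    /* The driver may round the value; it writes back what it applied. */
    if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout_s) < 0) {
        wdt_close_magic(fd); /* try not to leave an armed timer behind */
        return -1;
    }
    return fd;
}

/* Ping the watchdog; must be called more often than the timeout. */
static int wdt_kick(int fd)
{
    return ioctl(fd, WDIOC_KEEPALIVE, 0);
}
```

A caller would open the device once, call `wdt_kick()` at an interval comfortably shorter than the timeout, and call `wdt_close_magic()` on shutdown. If the driver was built with nowayout, the magic close is ignored and the timer keeps running.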

@madisongh
Member

Ah, right, you may need to modify your device tree and/or kernel config and/or systemd configuration to switch to using the PMIC WDT from Linux, too. I believe the driver (CONFIG_MAX77620_WATCHDOG) is enabled in the default kernel config, but it may not be enabled in the device tree by default. Even if it is, if there are multiple watchdog devices, systemd uses /dev/watchdog0 by default, which could be the wrong one.
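One way to check which driver sits behind each device node is the watchdog class in sysfs, which exposes an `identity` attribute per device when the kernel is built with CONFIG_WATCHDOG_SYSFS. A minimal sketch that picks the device whose identity contains a given substring; the function name is hypothetical, and the exact identity string reported by the MAX77620 driver is an assumption to verify on the target.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Look through <class_dir>/watchdogN/identity (class_dir is normally
 * "/sys/class/watchdog") and copy the name of the first device whose
 * identity contains `needle` into `out`. Returns 0 if found, else -1.
 * readdir() order is unspecified, so "first" means first one seen. */
static int find_watchdog_by_identity(const char *class_dir, const char *needle,
                                     char *out, size_t out_len)
{
    DIR *dir = opendir(class_dir);
    if (dir == NULL)
        return -1;

    int rc = -1;
    struct dirent *ent;
    while (rc != 0 && (ent = readdir(dir)) != NULL) {
        if (strncmp(ent->d_name, "watchdog", 8) != 0)
            continue; /* skip ".", ".." and anything else */

        char path[1024];
        snprintf(path, sizeof path, "%s/%s/identity", class_dir, ent->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue; /* e.g. kernel built without CONFIG_WATCHDOG_SYSFS */

        char ident[128] = "";
        if (fgets(ident, sizeof ident, f) != NULL && strstr(ident, needle) != NULL) {
            snprintf(out, out_len, "%s", ent->d_name);
            rc = 0;
        }
        fclose(f);
    }
    closedir(dir);
    return rc;
}
```

The resulting name (for example `watchdog1`) could then be handed to systemd through the `WatchdogDevice=` setting in system.conf instead of relying on the default device.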

@acostach
Author

acostach commented Feb 2, 2022

@madisongh Yes, the config option is enabled, and from what I see the plugin manager also enables the watchdog in the device tree based on the selected ODMDATA.

There are no other watchdog devices (only /dev/watchdog and /dev/watchdog0), and if I remove the watchdog.conf file, systemd no longer opens the device and the board doesn't reboot. However, if I run cat on /dev/watchdog, the board reboots immediately.

@madisongh
Member

Now that you mention it, I vaguely remember running into a similar problem on the TX2 a while back. IIRC there was a bug somewhere that kept the SoC internal WDT around even when I chose the PMIC WDT. Looking at my kernel config, I think the solution was simply to disable support for the other WDTs. Here's my pmic-watchdog-only.cfg fragment:

# CONFIG_TEGRA21X_WATCHDOG is not set
# CONFIG_TEGRA18X_WATCHDOG is not set
# CONFIG_SOFT_PLATFORM_WATCHDOG is not set
CONFIG_MAX77620_WATCHDOG=y

I don't remember now whether the bug was in cboot or the kernel, but there could be something similar going on here.
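To confirm that a fragment like this actually ended up in the built kernel, the generated .config can be checked symbol by symbol. A small hypothetical C helper (not part of any kernel tooling) that classifies a symbol as built-in, module, explicitly unset, or absent, matching whole lines so that a symbol is never confused with a longer one sharing its prefix:

```c
#include <stdio.h>
#include <string.h>

/* True if the line [line, line+len) is exactly `want`. */
static int line_equals(const char *line, size_t len, const char *want)
{
    return strlen(want) == len && strncmp(line, want, len) == 0;
}

/* Classify Kconfig symbol `sym` (without the CONFIG_ prefix) in
 * .config-style text:
 *    1  "CONFIG_<sym>=y"            built in
 *    2  "CONFIG_<sym>=m"            module
 *    0  "# CONFIG_<sym> is not set"
 *   -1  not mentioned at all */
static int kconfig_state(const char *text, const char *sym)
{
    char want_y[160], want_m[160], want_off[192];
    snprintf(want_y, sizeof want_y, "CONFIG_%s=y", sym);
    snprintf(want_m, sizeof want_m, "CONFIG_%s=m", sym);
    snprintf(want_off, sizeof want_off, "# CONFIG_%s is not set", sym);

    const char *line = text;
    for (;;) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        if (line_equals(line, len, want_y))
            return 1;
        if (line_equals(line, len, want_m))
            return 2;
        if (line_equals(line, len, want_off))
            return 0;
        if (end == NULL)
            return -1;
        line = end + 1; /* advance to the next line */
    }
}
```

Run against the fragment above, it reports MAX77620_WATCHDOG as built in (1) and the TEGRA21X, TEGRA18X and SOFT_PLATFORM watchdog symbols as unset (0).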

@acostach
Author

acostach commented Feb 2, 2022

Many thanks, @madisongh! Indeed, disabling the other Tegra watchdog modules made the max77620-watchdog work with systemd.

@acostach acostach closed this as completed Feb 2, 2022
@acostach
Author

acostach commented Feb 4, 2022

Hi @madisongh, unfortunately enabling the PMIC WDT did not solve the original cboot problem. I managed to reproduce the issue once more and left the board running for 30 minutes, but the PMIC did not reset it.

The easiest way to reproduce the watchdog not resetting the device is to flash the NX SD devkit with that ODMDATA and then, before powering it on, remove the SD card and any other medium it could boot from.

Could it be that cboot doesn't actually start this watchdog?

I see it's overriding some nodes in the DTB, but I'm not sure whether they have anything to do with this:

[0001.653] I> Plugin-manager override starting
[0001.658] I> node /plugin-manager/fragment-pcie-c5-rp matches
[0001.666] I> node /plugin-manager/fragement-pmic-wdt-en matches
[0001.670] I> node /plugin-manager/fragement-tegra-wdt-dis matches
[0001.676] I> node /plugin-manager/fragement-tegra-sdhci-emmc-dis matches
[0001.684] I> Disable plugin-manager status in FDT
[0001.686] I> Plugin-manager override finished successfully

@acostach acostach reopened this Feb 4, 2022
@madisongh
Member

Hmm. I took a closer look at the cboot code, and sure enough, it disables the WDT as part of the kernel handoff. I don't know if that's a recent change, or whether I was misremembering how it worked, but that surprised me. :(

Also, it turns out that cboot always uses the internal WDT, even if you've chosen the PMIC WDT via ODMDATA.

Some additional patches to cboot will be needed to address these shortcomings.

@acostach
Author

acostach commented Feb 5, 2022

Thanks again for looking into this, @madisongh! I see that tegrabl_reset() restarts the boot sequence from MB1, at least on the NX devkit, and I'm wondering whether resetting instead of halting is a good idea. I'll try to reproduce the problem with the following patch to see whether cboot can load the kernel and DTB from the raw partitions after the reset:

--- a/bootloader/partner/t18x/cboot/platform/tegra_shared/debug.c
+++ b/bootloader/partner/t18x/cboot/platform/tegra_shared/debug.c
@@ -15,6 +15,7 @@
 #include <platform_c.h>
 #include <printf.h>
 #include <tegrabl_timer.h>
+#include <tegrabl_exit.h>
 #include <tegrabl_debug.h>

 #if defined(CONFIG_DEBUG_TIMESTAMP)
@@ -91,7 +92,9 @@ int platform_dgetc(char *c, bool wait)

 void platform_halt(void)
 {
-       dprintf(ALWAYS, "HALT: spinning forever...\n");
+       dprintf(ALWAYS, "Will reset in 10 seconds...\n");
+       tegrabl_mdelay(10 * 1000);
+       tegrabl_reset();
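The same policy can be sketched in a host-testable form by passing the delay and reset operations in as function pointers. `halt_or_reset()` and its callback types are hypothetical; on the target they would wrap `tegrabl_mdelay()` and `tegrabl_reset()` as in the patch above, and in a host build they can be replaced by fakes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef void (*delay_ms_fn)(uint32_t ms);
typedef void (*reset_fn)(void);

/* Hypothetical halt handler. With reset_on_halt == false it behaves like
 * the stock handler and spins; otherwise it waits delay_ms and then
 * requests a reset through the supplied callback. */
static void halt_or_reset(bool reset_on_halt, uint32_t delay_ms,
                          delay_ms_fn delay, reset_fn reset)
{
    if (!reset_on_halt) {
        printf("HALT: spinning forever...\n");
        for (;;) {
            /* stock behaviour: wait for a power cycle or an armed watchdog */
        }
    }

    printf("Will reset in %u ms...\n", (unsigned)delay_ms);
    delay(delay_ms);
    reset();
    /* On hardware reset() does not return; a fake reset() in a host
     * test does return, so this function returns too. */
}
```

Keeping the spin-forever path behind a flag preserves the old behaviour for anyone who wants to inspect the panic on a serial console.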

@acostach
Author

acostach commented Feb 5, 2022

Yes, it looks like calling tegrabl_reset() after the panic caused by the SD card partition read failure makes the NX boot normally:

[0006.571] x24 0x               0 x25 0x               0 x26 0x               0 x27 0x               0
[0006.580] x28 0x               0 x29 0x        a06a8160 lr  0x        a060f7f4 sp  0x        a06a7f30
[0006.589] elr 0x               0
[0006.592] spsr 0x        400003c9
[0006.595] -----------------------------------------------
[0006.600] panic (caller 0xa0601238): die
[0006.604] Will reset in 10 seconds...
[0016.607] E> tegrabl_display_shutdown: display is not initialized
Shutdown state requested 1
Rebooting system ...
[0000.024] W> RATCHET: MB1 binary ratchet value 4 is too large than ratchet level 2 from HW fuses.
[0000.033] I> MB1 (prd-version: 1.5.1.7-t194-41334769-98030a79)
[0000.038] I> Boot-mode: Coldboot
[0000.041] I> Chip revision : A02P
[0000.044] I> Bootrom patch version : 15 (correctly patched)
[0000.049] I> ATE fuse revision : 0x200
[0000.053] I> Ram repair fuse : 0x0
[0000.056] I> Ram Code : 0x0
[0000.058] I> rst_source : 0xb
[0000.061] I> rst_level : 0x1
[0000.065] I> Boot-device: QSPI
[0000.067] I> Qspi flash params source = brbct

@madisongh
Member

@acostach Great, that looks like a much simpler way to solve the problem you're seeing. Please leave this issue open, though, as I will try to fix the problems with cboot's WDT handling when I get a chance.
