Reset ASPEED 2500 on Gigabyte possible? #51

Closed
bielids opened this issue Mar 27, 2024 · 53 comments

@bielids

bielids commented Mar 27, 2024

Hi, this may be a long shot, but I've got a Gigabyte MZ32-AR0 rev3 with a BMC that's not starting up (steady BMC_LED light, no firmware version displayed during POST, no devices detected by ipmi_si). I came across this utility and I was wondering if I could leverage this to possibly factory reset it and get it back to a working state. It looks like the BMC stopped working after I put the board in BIOS recovery mode and I've spent countless hours trying to get it back up. Connecting to the server's serial port gave me no more useful info and connecting to BMC_UART only got me this:

DRAM Init-V12-DDR4
0abc1-4Gb-Done
Read margin-DL:0.3862/DH:0.3941 CK (min:0.30)


U-Boot 2013.07 (Nov 01 2023 - 17:52:22)

I2C:   ready
DRAM:  424 MiB
eSPI Handshake complete
OEM_BOARD_INIT - Start (BMC)
LPC mode
OEM_BOARD_INIT - End
Flash: ERROR: Unable to Detect SPI Flash
*** failed ***
### ERROR ### Please RESET the board ###

I've tried flashing a new FW (a supported one of course, from the mobo's support page) to no avail (it flashes "correctly" but I still can't detect the BMC, and the mgmt port doesn't appear to be in a proper state based on the flashing). I've tried connecting to the console using this tool but that didn't get me anywhere either.

(meson) root@pve:~/culvert/build/src# ./culvert -v console uart3 uart2 115200 admin *****************
[*] Found 5 registered bridge drivers
[*] Trying bridge driver l2a
[*] Failed to initialise L2A bridge: -95
[*] Trying bridge driver ilpc
[*] Probing ilpc
[*] Probing 0x2e for SuperIO
[*] Found SuperIO device at 0x2e
[*] Probing for SoC revision registers
[*] Found revision 0x4030303
[*] Trying bridge driver devmem
[*] failed to initialise devmem bridge: -1
[*] Trying bridge driver debug-uart
[*] Unrecognised argument list for debug interface (0)
[*] Trying bridge driver p2a
[*] Probing p2a
[*] Probing for SoC revision registers
[*] Found revision 0x4030303
[*] Probing for SoC revision registers
[*] Found revision 0x4030303
[*] Selected devicetree for SoC 'aspeed,ast2500'
[*] Found 15 registered drivers
[*] Bound trace driver to /ahb/bus-controller@1e600000
[*] Bound sfc driver to /ahb/apb/spi@1e620000
[*] Bound sfc driver to /ahb/apb/spi@1e630000
[*] Bound sfc driver to /ahb/apb/spi@1e631000
[*] Bound sdmc driver to /ahb/apb/memory-controller@1e6e0000
[*] Bound clk driver to /ahb/apb/syscon@1e6e2000/clock
[*] Bound strap driver to /ahb/apb/syscon@1e6e2000/strapping
[*] Bound sioctl driver to /ahb/apb/syscon@1e6e2000/superio
[*] Bound bridge-controller driver to /ahb/apb/syscon@1e6e2000/bridge-controller
[*] Bound debugctl driver to /ahb/apb/syscon@1e6e2000/debug-bridge-controller
[*] Bound pciectl driver to /ahb/apb/syscon@1e6e2000/pcie-bridge-controller
[*] Bound scu driver to /ahb/apb/syscon@1e6e2000
[*] Bound wdt driver to /ahb/apb/watchdog@1e785000
[*] Bound wdt driver to /ahb/apb/watchdog@1e785020
[*] Bound wdt driver to /ahb/apb/watchdog@1e785040
[*] Bound vuart driver to /ahb/apb/serial@1e787000
[*] Bound ilpcctl driver to /ahb/apb/lpc@1e789000/bridge-controller
[*] Bound uart-mux driver to /ahb/apb/lpc@1e789000
[*] Initialised scu driver
[*] Initialised clk driver
[*] Initialised uart-mux driver
[*] Enabling UART clocks
[*] Routing UART3 to UART5
[*] Initialising SUART3
[*] SUART base address: 0x3e8
[*] SUART SIRQ: 6
[*] Configuring baud rate of 115200 for BMC console
[*] Starting getty from BMC console
[*] Launched getty with: /sbin/agetty -8 -L ttyS1 1200 xterm &
[*] Routing UARTs to connect UART3 with UART2
[*] Setting target baud rate of 115200

I'm running out of options, and I only have until Friday to return the board (a return will take a few weeks), so I'm hoping for a Hail Mary. I went through the issues and what little documentation I could find for this tool but didn't get anywhere. Do you have anything to suggest?

This is what I see when probing:

debug:	Permissive
	Debug UART port: UART5
xdma:	Restricted
	BMC: Disabled
	VGA: Enabled
	XDMA on VGA: Enabled
	XDMA is constrained: Yes
p2a:	Permissive
	BMC: Disabled
	VGA: Enabled
	MMIO on VGA: Enabled
	[0x00000000 - 0x0fffffff]   Firmware: Writable
	[0x10000000 - 0x1fffffff]     SoC IO: Writable
	[0x20000000 - 0x2fffffff]  BMC Flash: Writable
	[0x30000000 - 0x3fffffff] Host Flash: Writable
	[0x40000000 - 0x5fffffff]   Reserved: Writable
	[0x60000000 - 0x7fffffff]   LPC Host: Writable
	[0x80000000 - 0xffffffff]       DRAM: Writable
ilpc:	Permissive
	SuperIO address: 0x2e
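As a reading aid for the probe output, here is a minimal sketch that maps an AHB address to the P2A window it falls in. The table is copied from the `p2a` section above; the function name `region_of` is made up for illustration and is not part of culvert. The sample addresses are the SPI controller and syscon bases that appear in the driver binding log earlier in this post.

```python
# Address windows copied from the `p2a` section of the probe output above.
P2A_REGIONS = [
    (0x00000000, 0x0FFFFFFF, "Firmware"),
    (0x10000000, 0x1FFFFFFF, "SoC IO"),
    (0x20000000, 0x2FFFFFFF, "BMC Flash"),
    (0x30000000, 0x3FFFFFFF, "Host Flash"),
    (0x40000000, 0x5FFFFFFF, "Reserved"),
    (0x60000000, 0x7FFFFFFF, "LPC Host"),
    (0x80000000, 0xFFFFFFFF, "DRAM"),
]


def region_of(addr):
    """Return the name of the P2A window containing a 32-bit AHB address."""
    for start, end, name in P2A_REGIONS:
        if start <= addr <= end:
            return name
    raise ValueError(f"address {addr:#x} is outside the 32-bit map")


if __name__ == "__main__":
    # spi@1e620000 and syscon@1e6e2000 come from the console log above.
    for addr in (0x1E620000, 0x1E6E2000, 0x20000000, 0x80000000):
        print(f"{addr:#010x} -> {region_of(addr)}")
```

Since every window is reported as writable, any register in the "SoC IO" range should be reachable from the host over this bridge.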

Reading the RAM seems to get me info similar to what I'm getting from u-boot:

(meson) root@pve:~/culvert/build/src# ./culvert read ram 
[*] failed to initialise devmem bridge: -1
[*] 512MiB DRAM with 32MiB VRAM; dumping 480MiB (0x80000000-0x9dffffff)
t<�T=�t<�������T=�t<�
в424T=�_���������>�й
��������T=��=�l���=�V=�(%P%�|>�424�=�(%P%�u/���>� MiBoot 2013.07 (Nov 01 2023 - 17:52:22)

�, ?���� ?����u/����]���w�M�u/�K�u/��I�,�0�K��k������n���$S�k��~k��n��k�0x1c200�����@�@.........

Thanks!

@amboar
Owner

amboar commented Mar 27, 2024

Hello. Firstly, I'm sorry for your situation, it sounds like a bit of a pickle.

Working backwards, I have a few points:

  • I don't think there's going to be much interesting in the way of BMC RAM content given you're stuck in u-boot, but it's good that it appears to work
  • The output from probe at least suggests you can take complete control of the SoC, so we can do what we want with it if we decide there's something to do
  • As for the culvert console ... subcommand not doing anything useful, you've already got access to the BMC UART, so there's no need for you to use it. Its implementation (well, really all of culvert, but I digress) was a bit of a stunt to surprise people with things they expect shouldn't be possible. It relies on the BMC actually having reached Linux userspace in order to function, and that's not the case for you. You should ignore it.

Really, the u-boot output you posted at the start is the meaty bit:

eSPI Handshake complete
OEM_BOARD_INIT - Start (BMC)
LPC mode
OEM_BOARD_INIT - End
Flash: ERROR: Unable to Detect SPI Flash
*** failed ***
### ERROR ### Please RESET the board ###

This is interesting on a couple of fronts.

  • Looks like there's some vendor code that runs quite early
  • It's interesting that eSPI and LPC setup are mentioned adjacent to the error
  • You suggested you put the board in BIOS recovery mode

So a few thoughts:

  1. It's not clear to me whether you've taken the board out of BIOS recovery mode (presumably by resetting DIP switch 4 mentioned in the board manual). What state is it currently in?

  2. Have you AC-cycled the machine since it entered this state? (I'm not suggesting you should; it's probably best not to if you haven't.)

  3. What's the output of culvert p2a vga read 0x1e6e2070?

The interest here is that LPC and eSPI are both buses that connect the BMC to the host, and that the host BIOS recovery state may in some way be interfering with the BMC start up (... which is pretty unfortunate if true).

There are some experiments that might be worth trying, but I'd want more information on the state of the machine prior to making some guesses.

@zevweiss
Collaborator

+1 to all the above from @amboar, though a couple additional things:

I've tried flashing a new FW (a supported one of course, from the mobo's support page) to no avail (it flashes "correctly" but I still can't detect the BMC, ...)

What mechanism did you use for this firmware flashing? If it was something other than culvert I might suggest retrying it via culvert, because its firmware write path also reads the data back and checks it against the input, so if it was via another flashing mechanism that doesn't do similar checking and the flash write actually got borked somehow it could potentially have gone undetected...

And on a related note, what version of culvert are you running? Until fairly recently there were some problems with it that could lead to flash corruption in some cases (see #45 for details), so I'd recommend using the latest commit if you're not already.

@bielids
Author

bielids commented Mar 27, 2024

Hey, thanks for the quick follow-up! I'm not home yet so I can't test for a few more hours, but here's what I can answer for now:

  • I've power cycled the host a few times (turned off the PDU and left it off overnight at times)
  • all DIP switches and jumpers are back to defaults by now so the host should no longer be in recovery mode AFAIK
  • BMC stopped working before flashing anything
  • BIOS was flashed via the EFI shell using Gigabyte's provided script
  • BMC was flashed via the EFI shell first using the AMI tool linked in Gigabyte's documentation, then Linux with gigaflash
  • I built culvert from source and pulled the repo yesterday

I'll run the commands you suggested later tonight and report back. Ty!

@amboar
Owner

amboar commented Mar 27, 2024

I've power cycled the host a few times (turned off the PDU and left it off overnight at times)

It's not clear to me exactly what your power setup is, but if turning off your PDU removes AC supply to the PSU for the board then that gives me enough. A common design for boards with BMCs is that the BMCs are powered even if the host is not, so just double-checking here.

all DIP switches and jumpers are back to defaults by now so the host should no longer be in recovery mode AFAIK

Okay, interesting. Honestly it sounds like a bit of a QA fail, and returning the board seems like a reasonable course of action. The firmware should behave better than that.

@bielids
Author

bielids commented Mar 28, 2024

Alright I just connected to the server and here's what I got when I read the address you suggested:

root@pve:~/culvert/build/src# ./culvert p2a vga read 0x1e6e2070
0x1e6e2070: 0xf100d28a

And yeah, turning off the PDU does cut power to the PSU entirely and all the mobo's LEDs turn off.

Honestly it sounds like a bit of a QA fail, and returning the board seems like a reasonable course of action.

I completely agree, but I see this as a good learning opportunity for me and I'm a bit impatient to try out my new server so if possible I'd rather avoid returning the mobo 😄. If there's anything I can do to help make sense of the current situation let me know. I manage Linux servers on a daily basis so while embedded systems are not my forte I hope to be able to make sense of this given a bit of guidance.

@amboar
Owner

amboar commented Mar 28, 2024

Okay, so 0x1e6e2070: 0xf100d28a decodes as:

         Enable SPI Flash Strap Auto Fetch Mode: 0x1 [Enable]
                         Enable GPIO Strap Mode: 0x1 [Enable]
                         Select UART Debug Port: 0x1 [Select UART5 as BMC console port]
                                       Reserved: 0x1
    Enable fast reset mode for ARM ICE debugger: 0x0 [Long reset mode, normal operation (default)]
                         Enable eSPI flash mode: 0x0 [eSPI respond with no flash attached]
                               Enable eSPI mode: 0x0 [LPC mode]
                              Select DDR4 SDRAM: 0x1 [DDR4 SDRAM]
        Select 25MHz reference clock input mode: 0x0 [CLKIN is 24MHz and USBCKI not used]
                 Enable GPIOE pass-through mode: 0x0 [Disable, pass through can be enabled by SCU8C[15:12]]
                 Enable GPIOD pass-through mode: 0x0 [Disable, pass through can be enabled by SCU8C[11:8]]
Disable LPC to decode SuperIO 0x2E/0x4E address: 0x0 [Enable address decoding (default)]
                           Enable ACPI function: 0x0 [Disable ACPI]
                  Select USBCKI input frequency: 0x0 [24MHz (default)]
             Enable BMC 2nd boot watchdog timer: 0x0 [Disable]
        SuperIO configuration address selection: 0x0 [Decode 0x2E]
                       VGA Class Code selection: 0x1 [Select the class code for VGA device (default)]
               Select dedicated LPC reset input: 0x1 [LPC reset is located at pin number G22, shared with GPIOAC7]
                             SPI mode selection: 0x1 [Enable SPI master]
        AXI/AHB clock frequency ratio selection: 0x1 [2:1 (default)]
                                       Reserved: 0x0
                         Define MAC#2 interface: 0x1 [RGMII]
                         Define MAC#1 interface: 0x0 [RMII/NCSI]
                  Enable dedicated VGA BIOS ROM: 0x0 [No VGA BIOS ROM, VGA BIOS is merged in the system BIOS (default)]
                                       Reserved: 0x0
                      VGA memory size selection: 0x2 [32MB]
                                       Reserved: 0x1
                               Disable CPU boot: 0x0 [Enable boot]

It's a bit curious. I wonder why the eSPI handshake message is reported given the SoC is strapped for LPC. It's also interesting that u-boot runs but claims not to be able to detect the flash, given u-boot needs to be loaded from the flash. A dump of the FMC registers might be interesting.

Also it might be worth dumping the flash image (culvert read firmware > bmc.img) and booting it in qemu to see if the problem reproduces? I expect it won't, but maybe we could gain some insight from that. Possibly we could provoke it into failing in the same manner?
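For anyone following along without a register decoder, the field values in the listing above can be reproduced with a few lines of Python. This is only a sketch: the field widths are inferred from the listing itself (it runs from bit 31 down to bit 0, and the widths below are the ones that make every printed value line up with 0xf100d28a), not taken from the datasheet, and the short field names are abbreviations.

```python
# (name, width in bits), ordered from bit 31 down to bit 0.
# Assumption: widths are inferred from the decoded listing, not a datasheet.
FIELDS = [
    ("SPI flash strap auto fetch", 1),
    ("GPIO strap mode", 1),
    ("UART debug port", 1),
    ("reserved", 1),
    ("ARM ICE fast reset", 1),
    ("eSPI flash mode", 1),
    ("eSPI mode", 1),
    ("DDR4 SDRAM", 1),
    ("25MHz reference clock", 1),
    ("GPIOE pass-through", 1),
    ("GPIOD pass-through", 1),
    ("disable SuperIO decode", 1),
    ("ACPI", 1),
    ("USBCKI frequency", 1),
    ("2nd boot watchdog", 1),
    ("SuperIO address select", 1),
    ("VGA class code", 1),
    ("dedicated LPC reset", 1),
    ("SPI mode", 2),
    ("AXI/AHB ratio", 3),
    ("reserved", 1),
    ("MAC#2 interface", 1),
    ("MAC#1 interface", 1),
    ("dedicated VGA BIOS ROM", 1),
    ("reserved", 1),
    ("VGA memory size", 2),
    ("reserved", 1),
    ("disable CPU boot", 1),
]


def decode(value):
    """Split a 32-bit strap value into (name, field value) pairs, MSB first."""
    assert sum(width for _, width in FIELDS) == 32
    out = []
    shift = 32
    for name, width in FIELDS:
        shift -= width
        out.append((name, (value >> shift) & ((1 << width) - 1)))
    return out


if __name__ == "__main__":
    for name, field in decode(0xF100D28A):
        print(f"{name:>28}: {field:#x}")
```

The interesting bit for this issue is "eSPI mode", which comes out as 0 (LPC mode) even though u-boot prints an eSPI handshake message.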

@amboar
Owner

amboar commented Mar 28, 2024

Alright, I've pushed a real hack of a patch that will dump the FMC controller state for you @bielids: e1a919d

Run it like:

./culvert read controller

Currently it's hard-coded to read the FMC registers. Something to improve as I make the patch less of a hack.

Example output (in this case I'm using the debug UART bridge, the trailing /dev/ttyUSB0 isn't immediately relevant to you as you're using the PCIe VGA P2A bridge):

$ ./build/src/culvert read controller /dev/ttyUSB0
[*] Opening /dev/ttyUSB0
[*] Entering debug mode
0x1e620000: 0x0007002a
0x1e620004: 0x00002a11
0x1e620008: 0x00000600
0x1e62000c: 0x00000000
0x1e620010: 0x406b0641
0x1e620014: 0x00000400
0x1e620018: 0x00000400
...

@bielids
Author

bielids commented Mar 28, 2024

Thanks a lot for the patch! I'm having some issues building but I'm sure that it's due to something in my conda environment so let me look into that after my meetings and update you once I get it running. I'll do the QEMU test after that

@bielids
Author

bielids commented Mar 29, 2024

Alright I spent quite a while getting it to compile on PVE to no avail (devmem.h not found), which is weird since I first compiled it directly from the host. Anyways, I finally managed to get it to run by building it with -Dprefer_static=true from my laptop and pushing the binary over. I also had to change line 201 in read to keep going even if I only give it 1 arg. I tried figuring out what controller it expects (thought it would be fmc but that did not work) but it's getting late so I just bruteforced it.

Here's what I got in the end:

root@pve:~# ./culvert.bin read controller
[*] failed to initialise devmem bridge: -1
0x1e620000: 0x8007002a
0x1e620004: 0x00000701
0x1e620008: 0x00000600
0x1e62000c: 0x00000000
0x1e620010: 0x00002400
0x1e620014: 0x000b0041
0x1e620018: 0x00000000
0x1e62001c: 0xffffffff
0x1e620020: 0xffffffff
0x1e620024: 0xffffffff
0x1e620028: 0xffffffff
0x1e62002c: 0xffffffff
0x1e620030: 0x50400000
0x1e620034: 0x54500000
0x1e620038: 0x58540000
0x1e62003c: 0xffffffff
0x1e620040: 0xffffffff
0x1e620044: 0xffffffff
0x1e620048: 0xffffffff
0x1e62004c: 0xffffffff
0x1e620050: 0xffffffff
0x1e620054: 0x000000ab
0x1e620058: 0xffffffff
0x1e62005c: 0xffffffff
0x1e620060: 0xffffffff
0x1e620064: 0xffffffff
0x1e620068: 0xffffffff
0x1e62006c: 0xffffffff
0x1e620070: 0xffffffff
0x1e620074: 0xffffffff
0x1e620078: 0xffffffff
0x1e62007c: 0xffffffff
0x1e620080: 0x00000000
0x1e620084: 0x20010204
0x1e620088: 0x54534128
0x1e62008c: 0x00000000
0x1e620090: 0x00000000
0x1e620094: 0x00000000
0x1e620098: 0xffffffff
0x1e62009c: 0x00000301
0x1e6200a0: 0x00000000
0x1e6200a4: 0x00000000
0x1e6200a8: 0x00000000
0x1e6200ac: 0x00000000
0x1e6200b0: 0xafe0e77b
0x1e6200b4: 0xc5acaf9f
0x1e6200b8: 0xbb3eff93
0x1e6200bc: 0xb26f73dc
0x1e6200c0: 0xf6dfc79e

How did you decode the value I gave you to end up with the output you sent here by the way?

@bielids
Author

bielids commented Mar 29, 2024

I was able to boot the FW I dumped with culvert by the way. I've never tried dumping firmwares and loading them in QEMU before so let me know if there's anything else I should try to get more info. Here's what I got:

root@pve:~# ls -lsa ./2C600{40,80}
9138 -rw-r--r-- 1 root root 20578240 Mar 28 22:58 ./2C60040
9138 -rw-r--r-- 1 root root 20578176 Mar 28 22:58 ./2C60080
root@pve:~# binwalk ./2C600{40,80}

Scan Time:     2024-03-28 23:14:59
Target File:   /root/new_dump/_fw.extracted/2C60040
MD5 Checksum:  0b629d8180b982d80977a8278099b05e
Signatures:    411

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             uImage header, header size: 64 bytes, header CRC: 0x178CFDDF, created: 2023-11-01 09:51:21, image size: 2792584 bytes, Data Address: 0x80008000, Entry Point: 0x80008000, data CRC: 0x59454E13, OS: Linux, CPU: ARM, image type: OS Kernel Image, compression type: none, image name: "Linux-3.14.17-ami"
64            0x40            Linux kernel ARM boot executable zImage (little-endian)
16919         0x4217          gzip compressed data, maximum compression, from Unix, last modified: 1970-01-01 00:00:00 (null date)
2883520       0x2BFFC0        JFFS2 filesystem, little endian
3407808       0x33FFC0        CramFS filesystem, little endian, size: 5963776, version 2, sorted_dirs, CRC 0x43FDF23D, edition 0, 1566 blocks, 131 files


Scan Time:     2024-03-28 23:15:01
Target File:   /root/new_dump/_fw.extracted/2C60080
MD5 Checksum:  edd0523939eb9d1ea14c082374af568f
Signatures:    411

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             Linux kernel ARM boot executable zImage (little-endian)
16855         0x41D7          gzip compressed data, maximum compression, from Unix, last modified: 1970-01-01 00:00:00 (null date)
2883456       0x2BFF80        JFFS2 filesystem, little endian
3407744       0x33FF80        CramFS filesystem, little endian, size: 5963776, version 2, sorted_dirs, CRC 0x43FDF23D, edition 0, 1566 blocks, 131 files

root@pve:~# qemu-system-arm -m 512 -M g220a-bmc -nographic -drive file=/root/fw,format=raw,if=mtd -net nic 
qemu-system-arm: warning: hub 0 is not connected to host network
qemu-system-arm: warning: nic ftgmac100.1 has no peer


U-Boot 2013.07 (Nov 01 2023 - 17:52:22)

I2C:   ready
DRAM:  424 MiB
eSPI Handshake complete
OEM_BOARD_INIT - Start (BMC)
LPC mode
OEM_BOARD_INIT - End
Flash: Found SPI Chip Micron/Numonyx N25Q512A(0x20ba) 2x I/O READ, NORMAL WRITE
Found SPI Chip Micron/Numonyx N25Q512A(0x20ba) 2x I/O READ, NORMAL WRITE
128 MiB
MMC:   
*** Warning - bad CRC, using default environment

Un-Protected 1 sectors
Erasing Flash...
Erasing sector  4 ... ok.
Erased 1 sectors
Writing to Flash... done
Protected 1 sectors
Net:   RTL8211E, EEECR = 0x00
RTL8211E, EEEAR = 0x00
RTL8211E, EEELPAR = 0x00
RTL8211E, LACR = 0x00
RTL8211E, LCR = 0x00
ast_eth0, ast_eth1
DRAM ECC enabled
Hit any key to stop autoboot:  0 
Image to be booted is 1
EMMC and EXT4 is not enabled - Cannot locate kernel file in Root
Initing KCS...done
Uboot waiting for firmware update to start...
Uboot waiting for fwupdate to start timed out
Disabling Watchdog 2 Timer
AST2500EVB>
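A side note on the two extracted files: they differ in size by exactly 64 bytes (20578240 vs 20578176), which matches the "header size: 64 bytes" that binwalk reports for the uImage header. The sketch below parses that header using the legacy U-Boot image layout (seven big-endian 32-bit words, four single-byte codes, and a 32-byte name). The function name and the synthetic test header are made up for illustration; the header is built from the values binwalk printed, with the timestamp left at 0 because the time zone of the printed date is unknown.

```python
import struct

UIMAGE_MAGIC = 0x27051956
# magic, hcrc, time, size, load, entry, dcrc, os, arch, type, comp, name[32]
HEADER = struct.Struct(">7I4B32s")  # big-endian, 64 bytes in total


def parse_uimage_header(blob):
    """Parse a legacy uImage header from the first 64 bytes of blob."""
    (magic, hcrc, time, size, load, entry, dcrc,
     os_code, arch, img_type, comp, name) = HEADER.unpack_from(blob)
    if magic != UIMAGE_MAGIC:
        raise ValueError("not a legacy uImage header")
    return {
        "header_size": HEADER.size,
        "header_crc": hcrc,
        "image_size": size,
        "load": load,
        "entry": entry,
        "data_crc": dcrc,
        "codes": (os_code, arch, img_type, comp),
        "name": name.rstrip(b"\0").decode("ascii", "replace"),
    }


if __name__ == "__main__":
    # Synthetic header using the fields binwalk printed above (time set to 0).
    # 5, 2, 2, 0 are the legacy codes for Linux, ARM, kernel, uncompressed.
    fake = HEADER.pack(UIMAGE_MAGIC, 0x178CFDDF, 0, 2792584,
                       0x80008000, 0x80008000, 0x59454E13,
                       5, 2, 2, 0, b"Linux-3.14.17-ami")
    print(parse_uimage_header(fake))
```

To run it against the real file, pass the first 64 bytes of ./2C60040, for example `parse_uimage_header(open("2C60040", "rb").read(64))`.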

When I try using one of the two files that appears to be a kernel I just get:

root@pve:~# qemu-system-arm -m 256 -M g220a-bmc -nographic -drive file=/root/fw,format=raw,if=mtd -net nic -kernel ./2C60040
qemu-system-arm: warning: hub 0 is not connected to host network
qemu-system-arm: warning: nic ftgmac100.1 has no peer
Uncompressing Linux... done, booting the kernel.

@amboar
Owner

amboar commented Mar 29, 2024

How did you decode the value I gave you to end up with the output you sent here by the way?

The tool that I use to decode registers is bitfield, but that requires that you have the necessary configs set up to decode the registers. I write my own bitfield configs for the AST2500, but they're derived from the SoC datasheet, which cannot be freely distributed.

I'll decode the FMC register values at some point, though it may not be until next week.

Something that I should have mentioned is that the culvert read controller ... dump needs to be taken after u-boot outputs ### ERROR ### Please RESET the board ### following an AC power-cycle, but before you take any other action that might touch the flash controller (such as running culvert read firmware ...). Is this the case with the register dump you provided above?
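If a second dump is taken from a clean state, comparing it against the earlier one by eye is error-prone. A minimal sketch of a diff helper follows; it assumes the `address: value` line format shown in the dumps above and skips anything else (such as the `[*]` log lines). The function names are made up for illustration. The sample data is just the first few lines of the two dumps already posted in this thread, which come from different machines, so the differences it reports say nothing about clean versus touched state.

```python
import re

# Matches lines such as "0x1e620000: 0x8007002a".
LINE = re.compile(r"^(0x[0-9a-fA-F]{8}):\s+(0x[0-9a-fA-F]{8})$")


def parse_dump(text):
    """Turn `culvert read controller` style output into {address: value}."""
    regs = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            regs[int(match.group(1), 16)] = int(match.group(2), 16)
    return regs


def diff_dumps(before, after):
    """Return {address: (before, after)} for registers whose value differs."""
    changed = {}
    for addr in sorted(before.keys() & after.keys()):
        if before[addr] != after[addr]:
            changed[addr] = (before[addr], after[addr])
    return changed


if __name__ == "__main__":
    first = parse_dump("""[*] Opening /dev/ttyUSB0
0x1e620000: 0x0007002a
0x1e620004: 0x00002a11
0x1e620008: 0x00000600
0x1e62000c: 0x00000000
""")
    second = parse_dump("""[*] failed to initialise devmem bridge: -1
0x1e620000: 0x8007002a
0x1e620004: 0x00000701
0x1e620008: 0x00000600
0x1e62000c: 0x00000000
""")
    for addr, (old, new) in diff_dumps(first, second).items():
        print(f"{addr:#010x}: {old:#010x} -> {new:#010x}")
```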

I was able to boot the FW I dumped with culvert by the way.

This in itself is interesting. It suggests culvert can drive the FMC to read the BMC SPI-NOR just fine in order to dump the image, but the firmware image's u-boot can't. That's strange, and seems to rule out any obvious faults with the SPI bus or flash chip.

Can you paste the log (stderr) output of culvert -vv read firmware? What flash chip does it identify?

I was able to boot the FW I dumped with culvert by the way. I've never tried dumping firmwares and loading them in QEMU before so let me know if there's anything else I should try to get more info. Here's what I got:

Nice.

qemu-system-arm -m 512 -M g220a-bmc

Some of the trick here is to choose a machine with the same SPI-NOR part. Given qemu boots it at all I guess you picked one with at least the same size flash part?

When I try using one of the two files that appears to be a kernel I just get

Try tacking the following onto the qemu commandline: -append "console=ttyS4,115200n8 earlyprintk debug"

@amboar
Owner

amboar commented Mar 29, 2024

I also had to change line 201 in read to keep going even if I only give it 1 arg. I tried figuring out what controller it expects (thought it would be fmc but that did not work) but it's getting late so I just bruteforced it.

Oh, yeah, sorry, the condition on 201 was an oversight after a few iterations of the patch, you were right to change it. It was also late here when I wrote the patch :)

@amboar
Owner

amboar commented Mar 29, 2024

Another thought - what do you see on the BMC console if you run culvert reset soc wdt1 after it hits ### ERROR ### Please RESET the board ###? Does the behaviour change if you run culvert sfc fmc read 0 1 before culvert reset soc wdt1?

@bielids
Author

bielids commented Mar 29, 2024

but before you take any other action that might touch the flash controller

Ah OK, I'll need to power-cycle the box again then, I had to do a bit of troubleshooting due to line 201 so it's unlikely that it was in a "clean" state.

The flash chip detected by culvert is a Macronix MXxxL51235F:

[*] LIBFLASH: Flash ID: c2.20.1a (c2201a)
[*] LIBFLASH: Found chip Macronix MXxxL51235F size 64M erase granule: 4K
[*] LIBFLASH: Flash >16MB, enabling 4B mode...
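For reference, the ID in those lines can be tied back to the raw read in the full log below, where `ahb_readl: 0x20000000` returns `0xc21a20c2` just before the Flash ID line. The sketch assumes the word holds the ID bytes in little-endian order (which is what makes it line up with `c2.20.1a`) and that the third byte is log2 of the capacity in bytes, a common convention that happens to match the reported 64M here; neither is taken from culvert's source.

```python
def jedec_id_from_word(word):
    """Split a 32-bit little-endian read into the three JEDEC ID bytes."""
    raw = word.to_bytes(4, "little")
    return raw[0], raw[1], raw[2]


def capacity_bytes(capacity_code):
    # Assumption: the capacity byte is log2(size in bytes); not all parts do this.
    return 1 << capacity_code


if __name__ == "__main__":
    manufacturer, memory_type, capacity = jedec_id_from_word(0xC21A20C2)
    print(f"Flash ID: {manufacturer:02x}.{memory_type:02x}.{capacity:02x}")
    print(f"size: {capacity_bytes(capacity) // (1024 * 1024)}M")
```

0xc2 is the Macronix manufacturer ID, so this agrees with what LIBFLASH reports. The QEMU machine used earlier models a different part (the Micron/Numonyx N25Q512A, ID 0x20ba), which may be worth keeping in mind when comparing the two boots.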
Full output
root@pve:~# ./culvert.bin -vv read firmware > /dev/null 
[*] Found 5 registered bridge drivers
[*] Trying bridge driver l2a
[*] Failed to initialise L2A bridge: -95
[*] Trying bridge driver ilpc
[*] Probing ilpc
[*] Probing 0x2e for SuperIO
[*] Unlocking SuperIO: 0
[*] Selecting SuperIO device 2 (SUART1): 0
[*] Found device 2 selected: 0
[*] Selecting SuperIO device 12 (SUART4): 0
[*] Found device 12 selected: 0
[*] Locking SuperIO
[*] Found SuperIO device at 0x2e
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf78ffed8
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Trying bridge driver devmem
[*] failed to initialise devmem bridge: -1
[*] Trying bridge driver debug-uart
[*] Unrecognised argument list for debug interface (0)
[*] Trying bridge driver p2a
[*] Probing p2a
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf78ffed8
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf78ffed8
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Selected devicetree for SoC 'aspeed,ast2500'
[*] Found 15 registered drivers
[*] Processing devicetree node at /aliases
[*] Processing devicetree node at /memory@80000000
[*] Processing devicetree node at /ahb
[*] Processing devicetree node at /ahb/sram@1e720000
[*] Processing devicetree node at /ahb/bus-controller@1e600000
[*] Bound trace driver to /ahb/bus-controller@1e600000
[*] Processing devicetree node at /ahb/apb
[*] Processing devicetree node at /ahb/apb/spi@1e620000
[*] Bound sfc driver to /ahb/apb/spi@1e620000
[*] Processing devicetree node at /ahb/apb/spi@1e630000
[*] Bound sfc driver to /ahb/apb/spi@1e630000
[*] Processing devicetree node at /ahb/apb/spi@1e631000
[*] Bound sfc driver to /ahb/apb/spi@1e631000
[*] Processing devicetree node at /ahb/apb/memory-controller@1e6e0000
[*] Bound sdmc driver to /ahb/apb/memory-controller@1e6e0000
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/clock
[*] Bound clk driver to /ahb/apb/syscon@1e6e2000/clock
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/strapping
[*] Bound strap driver to /ahb/apb/syscon@1e6e2000/strapping
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/superio
[*] Bound sioctl driver to /ahb/apb/syscon@1e6e2000/superio
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/bridge-controller
[*] Bound bridge-controller driver to /ahb/apb/syscon@1e6e2000/bridge-controller
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/debug-bridge-controller
[*] Bound debugctl driver to /ahb/apb/syscon@1e6e2000/debug-bridge-controller
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/pcie-bridge-controller
[*] Bound pciectl driver to /ahb/apb/syscon@1e6e2000/pcie-bridge-controller
[*] Bound scu driver to /ahb/apb/syscon@1e6e2000
[*] Processing devicetree node at /ahb/apb/watchdog@1e785000
[*] Bound wdt driver to /ahb/apb/watchdog@1e785000
[*] Processing devicetree node at /ahb/apb/watchdog@1e785020
[*] Bound wdt driver to /ahb/apb/watchdog@1e785020
[*] Processing devicetree node at /ahb/apb/watchdog@1e785040
[*] Bound wdt driver to /ahb/apb/watchdog@1e785040
[*] Processing devicetree node at /ahb/apb/serial@1e787000
[*] Bound vuart driver to /ahb/apb/serial@1e787000
[*] Processing devicetree node at /ahb/apb/lpc@1e789000
[*] Processing devicetree node at /ahb/apb/lpc@1e789000/bridge-controller
[*] Bound ilpcctl driver to /ahb/apb/lpc@1e789000/bridge-controller
[*] Bound uart-mux driver to /ahb/apb/lpc@1e789000
[*] Initialising flash controller
[*] fdt: Looking up device name 'fmc'
[*] fdt: Locating node with device path '/ahb/apb/spi@1e620000'
[*] ahb_readl: 0x1e6e2000: 0x00000001
[*] Initialised scu driver
[*] Initialised clk driver
[*] ahb_readl: 0x1e6e2070: 0xf100d28a
[*] ahb_readl: 0x1e620010: 0x00002400
[*] ahb_readl: 0x1e620000: 0x8007002a
[*] ahb_writel: 0x1e620000: 0x8007002a
[*] ahb_writel: 0x1e620010: 0x00000400
[*] ahb_writel: 0x1e620094: 0x00000000
[*] Initialised sfc driver
[*] Initialising flash chip
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_readl: 0x20000000: 0x02020202
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] LIBFLASH: Init status: 02
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_readl: 0x20000000: 0xc21a20c2
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] LIBFLASH: Flash ID: c2.20.1a (c2201a)
[*] LIBFLASH: Found chip Macronix MXxxL51235F size 64M erase granule: 4K
[*] LIBFLASH: Flash >16MB, enabling 4B mode...
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_readl: 0x20000000: 0x02020202
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] LIBFLASH: Enabling controller 4B mode...
[*] ahb_readl: 0x1e620004: 0x00000701
[*] ahb_writel: 0x1e620010: 0x00002400
[*] ahb_writel: 0x1e620004: 0x00000701
[*] Write-protecting all chip-selects
[*] ahb_readl: 0x1e620000: 0x8007002a
[*] ahb_writel: 0x1e620000: 0x8007002a
[*] Exfiltrating BMC flash to stdout

................................................................
[*] ahb_readl: 0x1e620000: 0x8007002a
[*] ahb_writel: 0x1e620000: 0x8007002a
[*] Unbound instance of driver uart-mux
[*] Unbound instance of driver ilpcctl
[*] Unbound instance of driver vuart
[*] Unbound instance of driver wdt
[*] Unbound instance of driver wdt
[*] Unbound instance of driver wdt
[*] Unbound instance of driver scu
[*] Unbound instance of driver pciectl
[*] Unbound instance of driver debugctl
[*] Unbound instance of driver bridge-controller
[*] Unbound instance of driver sioctl
[*] Unbound instance of driver strap
[*] Unbound instance of driver clk
[*] Unbound instance of driver sdmc
[*] Unbound instance of driver sfc
[*] Unbound instance of driver sfc
[*] ahb_writel: 0x1e620010: 0x00002400
[*] Unbound instance of driver sfc
[*] Unbound instance of driver trace
root@pve:~# ./culvert.bin -vv read firmware > /dev/null 
[*] Found 5 registered bridge drivers
[*] Trying bridge driver l2a
[*] Failed to initialise L2A bridge: -95
[*] Trying bridge driver ilpc
[*] Probing ilpc
[*] Probing 0x2e for SuperIO
[*] Unlocking SuperIO: 0
[*] Selecting SuperIO device 2 (SUART1): 0
[*] Found device 2 selected: 0
[*] Selecting SuperIO device 12 (SUART4): 0
[*] Found device 12 selected: 0
[*] Locking SuperIO
[*] Found SuperIO device at 0x2e
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf78ffed8
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Trying bridge driver devmem
[*] failed to initialise devmem bridge: -1
[*] Trying bridge driver debug-uart
[*] Unrecognised argument list for debug interface (0)
[*] Trying bridge driver p2a
[*] Probing p2a
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf78ffed8
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf78ffed8
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Selected devicetree for SoC 'aspeed,ast2500'
[*] Found 15 registered drivers
[*] Processing devicetree node at /aliases
[*] Processing devicetree node at /memory@80000000
[*] Processing devicetree node at /ahb
[*] Processing devicetree node at /ahb/sram@1e720000
[*] Processing devicetree node at /ahb/bus-controller@1e600000
[*] Bound trace driver to /ahb/bus-controller@1e600000
[*] Processing devicetree node at /ahb/apb
[*] Processing devicetree node at /ahb/apb/spi@1e620000
[*] Bound sfc driver to /ahb/apb/spi@1e620000
[*] Processing devicetree node at /ahb/apb/spi@1e630000
[*] Bound sfc driver to /ahb/apb/spi@1e630000
[*] Processing devicetree node at /ahb/apb/spi@1e631000
[*] Bound sfc driver to /ahb/apb/spi@1e631000
[*] Processing devicetree node at /ahb/apb/memory-controller@1e6e0000
[*] Bound sdmc driver to /ahb/apb/memory-controller@1e6e0000
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/clock
[*] Bound clk driver to /ahb/apb/syscon@1e6e2000/clock
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/strapping
[*] Bound strap driver to /ahb/apb/syscon@1e6e2000/strapping
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/superio
[*] Bound sioctl driver to /ahb/apb/syscon@1e6e2000/superio
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/bridge-controller
[*] Bound bridge-controller driver to /ahb/apb/syscon@1e6e2000/bridge-controller
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/debug-bridge-controller
[*] Bound debugctl driver to /ahb/apb/syscon@1e6e2000/debug-bridge-controller
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/pcie-bridge-controller
[*] Bound pciectl driver to /ahb/apb/syscon@1e6e2000/pcie-bridge-controller
[*] Bound scu driver to /ahb/apb/syscon@1e6e2000
[*] Processing devicetree node at /ahb/apb/watchdog@1e785000
[*] Bound wdt driver to /ahb/apb/watchdog@1e785000
[*] Processing devicetree node at /ahb/apb/watchdog@1e785020
[*] Bound wdt driver to /ahb/apb/watchdog@1e785020
[*] Processing devicetree node at /ahb/apb/watchdog@1e785040
[*] Bound wdt driver to /ahb/apb/watchdog@1e785040
[*] Processing devicetree node at /ahb/apb/serial@1e787000
[*] Bound vuart driver to /ahb/apb/serial@1e787000
[*] Processing devicetree node at /ahb/apb/lpc@1e789000
[*] Processing devicetree node at /ahb/apb/lpc@1e789000/bridge-controller
[*] Bound ilpcctl driver to /ahb/apb/lpc@1e789000/bridge-controller
[*] Bound uart-mux driver to /ahb/apb/lpc@1e789000
[*] Initialising flash controller
[*] fdt: Looking up device name 'fmc'
[*] fdt: Locating node with device path '/ahb/apb/spi@1e620000'
[*] ahb_readl: 0x1e6e2000: 0x00000001
[*] Initialised scu driver
[*] Initialised clk driver
[*] ahb_readl: 0x1e6e2070: 0xf100d28a
[*] ahb_readl: 0x1e620010: 0x00002400
[*] ahb_readl: 0x1e620000: 0x8007002a
[*] ahb_writel: 0x1e620000: 0x8007002a
[*] ahb_writel: 0x1e620010: 0x00000400
[*] ahb_writel: 0x1e620094: 0x00000000
[*] Initialised sfc driver
[*] Initialising flash chip
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_readl: 0x20000000: 0x02020202
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] LIBFLASH: Init status: 02
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_readl: 0x20000000: 0xc21a20c2
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] LIBFLASH: Flash ID: c2.20.1a (c2201a)
[*] LIBFLASH: Found chip Macronix MXxxL51235F size 64M erase granule: 4K
[*] LIBFLASH: Flash >16MB, enabling 4B mode...
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_readl: 0x20000000: 0x02020202
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] LIBFLASH: Enabling controller 4B mode...
[*] ahb_readl: 0x1e620004: 0x00000701
[*] ahb_writel: 0x1e620010: 0x00002400
[*] ahb_writel: 0x1e620004: 0x00000701
[*] Write-protecting all chip-selects
[*] ahb_readl: 0x1e620000: 0x8007002a
[*] ahb_writel: 0x1e620000: 0x8007002a
[*] Exfiltrating BMC flash to stdout

Given qemu boots it at all I guess you picked one with at least the same size flash part?

Yeah exactly, I can't say that I know exactly what I'm doing but I noticed that g220a-bmc expects a 64M flash partition which lines up with my dump.

Let me try adding options to qemu to see if it works and running the additional commands you suggested before I head out. It's getting late but this is the furthest I've gotten in like 2 weeks haha

@bielids
Author

bielids commented Mar 29, 2024

Here's what I get when I read the FMC controller state right after a reboot:

root@pve:~# ./culvert.bin read controller
[*] failed to initialise devmem bridge: -1
0x1e620000: 0x8003002a
0x1e620004: 0x00000700
0x1e620008: 0x00000600
0x1e62000c: 0x00000000
0x1e620010: 0x000b0041
0x1e620014: 0x000b0041
0x1e620018: 0x00000000
0x1e62001c: 0xffffffff
0x1e620020: 0xffffffff
0x1e620024: 0xffffffff
0x1e620028: 0xffffffff
0x1e62002c: 0xffffffff
0x1e620030: 0x50400000
0x1e620034: 0x54500000
0x1e620038: 0x58540000
0x1e62003c: 0xffffffff
0x1e620040: 0xffffffff
0x1e620044: 0xffffffff
0x1e620048: 0xffffffff
0x1e62004c: 0xffffffff
0x1e620050: 0xffffffff
0x1e620054: 0x000000bb
0x1e620058: 0xffffffff
0x1e62005c: 0xffffffff
0x1e620060: 0xffffffff
0x1e620064: 0xffffffff
0x1e620068: 0xffffffff
0x1e62006c: 0xffffffff
0x1e620070: 0xffffffff
0x1e620074: 0xffffffff
0x1e620078: 0xffffffff
0x1e62007c: 0xffffffff
0x1e620080: 0x00000000
0x1e620084: 0x20010204
0x1e620088: 0x54534128
0x1e62008c: 0x00000000
0x1e620090: 0x00000000
0x1e620094: 0x00000010
0x1e620098: 0xffffffff
0x1e62009c: 0x00000301
0x1e6200a0: 0x00000000
0x1e6200a4: 0x00000000
0x1e6200a8: 0x00000000
0x1e6200ac: 0x00000000
0x1e6200b0: 0xabe0c77b
0x1e6200b4: 0xd5ecaf9f
0x1e6200b8: 0xbb3fff93
0x1e6200bc: 0xba6b71dd
0x1e6200c0: 0xf6dfc79e
And just the diff:
root@pve:~# diff before after      
1,2c1,2
< 0x1e620000: 0x8003002a
< 0x1e620004: 0x00000700
---
> 0x1e620000: 0x8007002a
> 0x1e620004: 0x00000701
5c5
< 0x1e620010: 0x000b0041
---
> 0x1e620010: 0x00002400
22c22
< 0x1e620054: 0x000000bb
---
> 0x1e620054: 0x000000ab
38c38
< 0x1e620094: 0x00000010
---
> 0x1e620094: 0x00000000
45,48c45,48
< 0x1e6200b0: 0xabe0c77b
< 0x1e6200b4: 0xd5ecaf9f
< 0x1e6200b8: 0xbb3fff93
< 0x1e6200bc: 0xba6b71dd
---
> 0x1e6200b0: 0xafe0e77b
> 0x1e6200b4: 0xc5acaf9f
> 0x1e6200b8: 0xbb3eff93
> 0x1e6200bc: 0xb26f73dc

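As a minimal sketch (not part of culvert), the two register dumps can also be compared bit by bit, which makes it easier to see which fields changed. The sample rows are copied from the diff above; the function names are made up for illustration, and no meaning is assigned to individual bits here.

```python
def parse_dump(text):
    """Parse lines of the form '0xADDR: 0xVALUE' into {addr: value}."""
    regs = {}
    for line in text.splitlines():
        addr, sep, value = line.partition(":")
        if not sep:
            continue
        try:
            regs[int(addr.strip(), 16)] = int(value.strip(), 16)
        except ValueError:
            continue  # skip log lines that are not register rows
    return regs


def changed_bits(before, after):
    """Return {addr: (old, new, [bit positions that differ])}."""
    out = {}
    for addr in sorted(before.keys() & after.keys()):
        old, new = before[addr], after[addr]
        if old != new:
            xor = old ^ new
            out[addr] = (old, new, [b for b in range(32) if (xor >> b) & 1])
    return out


# Rows copied from the 'before' and 'after' sides of the diff.
BEFORE = """\
0x1e620000: 0x8003002a
0x1e620004: 0x00000700
0x1e620010: 0x000b0041
0x1e620094: 0x00000010
"""
AFTER = """\
0x1e620000: 0x8007002a
0x1e620004: 0x00000701
0x1e620010: 0x00002400
0x1e620094: 0x00000000
"""

diff = changed_bits(parse_dump(BEFORE), parse_dump(AFTER))
for addr, (old, new, bits) in diff.items():
    print(f"{addr:#010x}: {old:#010x} -> {new:#010x} changed bits {bits}")
```

Which field each bit belongs to still has to be looked up in the AST2500 datasheet.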
And after running culvert reset soc wdt1 I just get the same output as I do immediately after a power-cycle:

Terminal output
Type [C-a] [C-h] to see available commands
Terminal ready


U-Boot 2013.07 (Nov 01 2023 - 17:52:22)

I2C:   ready
DRAM:  424 MiB
eSPI Handshake complete
OEM_BOARD_INIT - Start (BMC)
LPC mode
OEM_BOARD_INIT - End
Flash: ERROR: Unable to Detect SPI Flash
*** failed ***
### ERROR ### Please RESET the board ###


U-Boot 2013.07 (Nov 01 2023 - 17:52:22)

I2C:   ready
DRAM:  424 MiB
eSPI Handshake complete
OEM_BOARD_INIT - Start (BMC)
LPC mode
OEM_BOARD_INIT - End
Flash: ERROR: Unable to Detect SPI Flash
*** failed ***
### ERROR ### Please RESET the board ###

does the behaviour change if you run culvert sfc fmc read 0 1 before culvert reset soc wdt1

I don't get much when running the sfc read command, and no, the behaviour does not change:

root@pve:~# ./culvert.bin -v sfc fmc read 0 1 
[*] Found 5 registered bridge drivers
[*] Trying bridge driver l2a
[*] Failed to initialise L2A bridge: -95
[*] Trying bridge driver ilpc
[*] Probing ilpc
[*] Probing 0x2e for SuperIO
[*] Found SuperIO device at 0x2e
[*] Probing for SoC revision registers
[*] Found revision 0x4030303
[*] Trying bridge driver devmem
[*] failed to initialise devmem bridge: -1
[*] Trying bridge driver debug-uart
[*] Unrecognised argument list for debug interface (0)
[*] Trying bridge driver p2a
[*] Probing p2a
[*] Probing for SoC revision registers
[*] Found revision 0x4030303
[*] Probing for SoC revision registers
[*] Found revision 0x4030303
[*] Selected devicetree for SoC 'aspeed,ast2500'
[*] Found 15 registered drivers
[*] Bound trace driver to /ahb/bus-controller@1e600000
[*] Bound sfc driver to /ahb/apb/spi@1e620000
[*] Bound sfc driver to /ahb/apb/spi@1e630000
[*] Bound sfc driver to /ahb/apb/spi@1e631000
[*] Bound sdmc driver to /ahb/apb/memory-controller@1e6e0000
[*] Bound clk driver to /ahb/apb/syscon@1e6e2000/clock
[*] Bound strap driver to /ahb/apb/syscon@1e6e2000/strapping
[*] Bound sioctl driver to /ahb/apb/syscon@1e6e2000/superio
[*] Bound bridge-controller driver to /ahb/apb/syscon@1e6e2000/bridge-controller
[*] Bound debugctl driver to /ahb/apb/syscon@1e6e2000/debug-bridge-controller
[*] Bound pciectl driver to /ahb/apb/syscon@1e6e2000/pcie-bridge-controller
[*] Bound scu driver to /ahb/apb/syscon@1e6e2000
[*] Bound wdt driver to /ahb/apb/watchdog@1e785000
[*] Bound wdt driver to /ahb/apb/watchdog@1e785020
[*] Bound wdt driver to /ahb/apb/watchdog@1e785040
[*] Bound vuart driver to /ahb/apb/serial@1e787000
[*] Bound ilpcctl driver to /ahb/apb/lpc@1e789000/bridge-controller
[*] Bound uart-mux driver to /ahb/apb/lpc@1e789000
[*] fdt: Looking up device name 'fmc'
[*] fdt: Locating node with device path '/ahb/apb/spi@1e620000'
[*] Unlocking SCU
[*] Initialised scu driver
[*] Initialised clk driver
[*] Initialised sfc driver
[*] LIBFLASH: Init status: 02
[*] LIBFLASH: Flash ID: c2.20.1a (c2201a)
[*] LIBFLASH: Found chip Macronix MXxxL51235F size 64M erase granule: 4K
[*] LIBFLASH: Flash >16MB, enabling 4B mode...
[*] LIBFLASH: Enabling controller 4B mode...
![*] Unbound instance of driver uart-mux
[*] Unbound instance of driver ilpcctl
[*] Unbound instance of driver vuart
[*] Unbound instance of driver wdt
[*] Unbound instance of driver wdt
[*] Unbound instance of driver wdt
[*] Unbound instance of driver scu
[*] Unbound instance of driver pciectl
[*] Unbound instance of driver debugctl
[*] Unbound instance of driver bridge-controller
[*] Unbound instance of driver sioctl
[*] Unbound instance of driver strap
[*] Re-locking SCU
[*] Unbound instance of driver clk
[*] Unbound instance of driver sdmc
[*] Unbound instance of driver sfc
[*] Unbound instance of driver sfc
[*] Unbound instance of driver sfc
[*] Unbound instance of driver trace

@bielids
Author

bielids commented Mar 29, 2024

By the way, I found this issue, which describes the Aspeed 2400 failing to read from flash at boot due to incorrectly set registers (pflash[1321]: FFS: Flash header not found. Code: 100).

Interestingly they tested it with 3 PNOR chips and only mx66l51235f had the issue.

Is it in any way related?

@bielids changed the title from "Factory reset ASPEED 2500 on Gigabyte possible?" to "Reset ASPEED 2500 on Gigabyte possible?" on Mar 29, 2024
@amboar
Owner

amboar commented Apr 2, 2024

Is it in any way related?

Haha, yes, in so many ways. Regarding the people commenting on that issue, @legoater maintains the Linux driver for the Aspeed FMC and looks after the Aspeed QEMU models. @shenki is the upstream Linux kernel maintainer for BMC SoCs. In past lives we all worked together on IBM's Power systems firmware. That thread documents some troubles booting the Power8 host processor with an OpenBMC-based BMC firmware and the OpenPOWER host firmware stack. Part of the OpenPOWER host firmware stack is skiboot. culvert's flash subsystem is a vendored copy of skiboot's flash subsystem, with a bunch of integration (and formatting) changes applied. So it's related both socially and technically :)

However, culvert is able to read the BMC flash just fine from what you've indicated so far. So I don't think the 4B command confusion is the issue here, at least for culvert. Is your u-boot affected by it? Potentially, but that doesn't help your cause, and arguably the BMC should never have booted. Perhaps you could try a non MXxxL51235F, 64MB flash chip though? Or set the FMC model for a QEMU BMC machine to mx66l51235f for a negative test?

I've been doing some poking. The AST2400 had a feature where SDRAM could be remapped to 0x0 in the physical address space of the SoC. Unfortunately the datasheet for the AST2500 claims that they dropped that feature. If they hadn't we could potentially do an in-memory boot for you (essentially we'd stop the ARM core, flip the remap bit, load a different u-boot at address 0 (which is now in RAM), perform a reset of the ARM core, and then clear the stop bit). However, without the memory remap capability we have to modify flash to do anything different.

Are you comfortable with reflashing the BMC SPI-NOR? It might be interesting to experiment with a u-boot build from e.g. OpenBMC to see if we can get past the failure to initialise the flash subsystem.

@bielids
Author

bielids commented Apr 2, 2024

Oh wow, lots of interesting information to parse through here. Re. the in-memory boot, that's a bummer; it sounds like it would have been a low-risk way to troubleshoot the BMC.

Are you comfortable with reflashing the BMC SPI-NOR? It might be interesting to experiment with a u-boot build from e.g. OpenBMC to see if we can get past the failure to initialise the flash subsystem.

Absolutely, I've got until tomorrow to return this board. This can be done from within the OS right (ie. no need to solder)?

@amboar
Owner

amboar commented Apr 2, 2024

This can be done from within the OS right (ie. no need to solder)?

Yeah, no need to solder. We can reflash the BMC firmware via the PCIe bridge you're using currently, or over the BMC UART (without requiring a functioning BMC or host). I'm aware of at least a couple of large-ish cloud-ish vendors doing OpenBMC conversions this way using culvert, so you're not entirely on your own.

I had a look at the marketing photos of the board, and it looks like it has a socketed flash as well (handy if you have a flash programmer).

Given that we have to modify flash we'd need to do a bit more digging to find the partition boundaries being used by the vendor image. If we're to do the experiment, we only want to re-write u-boot and not corrupt anything subsequent. We may be more constrained than the size of current OpenBMC u-boot builds (e.g. the Romulus image is ~366kB). binwalk output for the entire fw image would be useful.

Also before we go re-writing anything on the BMC flash, it would pay to dd the new payload into your dumped firmware image and verify it at least boots in qemu. After that we'd need to figure out any required tricks for loading and jumping into the kernel.
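A minimal sketch of that splice step, assuming the payload goes at a known offset and must end before a known boundary. The function name, the offset and the limit are all hypothetical parameters here; nothing in the sketch discovers the vendor layout. It is roughly what dd with conv=notrunc does, plus a bounds check.

```python
def splice_payload(image: bytes, payload: bytes, offset: int, limit: int) -> bytes:
    """Return a copy of image with payload written at offset.

    limit is the first byte that must not be touched (the start of the
    next region). Both offset and limit are supplied by the caller and
    are assumptions about the layout, not values read from the image.
    """
    end = offset + len(payload)
    if offset < 0 or end > limit or limit > len(image):
        raise ValueError("payload does not fit before the next region")
    patched = bytearray(image)
    patched[offset:end] = payload  # same length on both sides, so size is unchanged
    return bytes(patched)


# Toy demonstration with made-up sizes, not the real firmware layout.
image = bytes([0xFF]) * 64
patched = splice_payload(image, b"UBOOT", offset=0, limit=16)
print(len(patched), patched[:8])
```

Working on a copy keeps the original dump intact, so the patched file can be tried in qemu first and the untouched dump stays available for restoring.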

@bielids
Author

bielids commented Apr 2, 2024

Here's the data from the FW I dumped a few days ago using culvert:

(py3) root@pve:~# binwalk /root/fw

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
163848        0x28008         CRC32 polynomial table, little endian
213604        0x34264         CRC32 polynomial table, little endian
393216        0x60000         JFFS2 filesystem, little endian
5636096       0x560000        CramFS filesystem, little endian, size: 40894464, version 2, sorted_dirs, CRC 0x63908029, edition 0, 27707 blocks, 6551 files
46530624      0x2C60040       uImage header, header size: 64 bytes, header CRC: 0x178CFDDF, created: 2023-11-01 09:51:21, image size: 2792584 bytes, Data Address: 0x80008000, Entry Point: 0x80008000, data CRC: 0x59454E13, OS: Linux, CPU: ARM, image type: OS Kernel Image, compression type: none, image name: "Linux-3.14.17-ami"
46530688      0x2C60080       Linux kernel ARM boot executable zImage (little-endian)
46547543      0x2C64257       gzip compressed data, maximum compression, from Unix, last modified: 1970-01-01 00:00:00 (null date)
49414144      0x2F20000       JFFS2 filesystem, little endian
49938432      0x2FA0000       CramFS filesystem, little endian, size: 5963776, version 2, sorted_dirs, CRC 0x43FDF23D, edition 0, 1566 blocks, 131 files

(py3) root@pve:~# ls -lsa /root/fw
49756 -rw-r--r-- 1 root root 67108864 Mar 29 10:52 /root/fw
(py3) root@pve:~#  
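As a rough sketch, the space available in front of the first filesystem can be read off this table with a few lines of Python. The rows below are an abridged copy of the binwalk output above, and the helper name is made up. The result is only an upper bound for a replacement u-boot, since the U-Boot environment or other vendor data may also live below that offset.

```python
# Abridged rows from the binwalk output (descriptions shortened).
BINWALK = """\
163848        0x28008         CRC32 polynomial table, little endian
393216        0x60000         JFFS2 filesystem, little endian
5636096       0x560000        CramFS filesystem, little endian, size: 40894464
46530624      0x2C60040       uImage header, header size: 64 bytes
49414144      0x2F20000       JFFS2 filesystem, little endian
49938432      0x2FA0000       CramFS filesystem, little endian, size: 5963776
"""


def parse_binwalk(text):
    """Return [(decimal_offset, description)] for each data row."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit():
            rows.append((int(parts[0]), parts[2]))
    return rows


rows = parse_binwalk(BINWALK)
first_fs = min(off for off, desc in rows if "filesystem" in desc)
print(f"space before first filesystem: {first_fs} bytes ({first_fs // 1024} KiB)")
```

That leaves 384 KiB at most, so a ~366kB OpenBMC u-boot build would fit only with a slim margin, if at all.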

As for this:

I had a look at the marketing photos of the board, and it looks like it has a socketed flash as well (handy if you have a flash programmer)

Unfortunately I do not. I've got a few Arduinos handy but I don't know if I'll have time tomorrow to set up something that could help.

@bielids
Author

bielids commented Apr 2, 2024

Perhaps you could try a non MXxxL51235F, 64MB flash chip though? Or set the FMC model for a QEMU BMC machine to mx66l51235f for a negative test?

Mind pointing me in the right direction? I see that it's an option I can set but I don't quite get how I can set it.

@bielids
Author

bielids commented Apr 2, 2024

Not sure if that's useful but with ast2600-evb as my machine type I get this:

root@pve:~# qemu-system-arm -m 512 -M ast2600-evb -nographic -drive file=/root/fw,format=raw,if=mtd -net nic 
qemu-system-arm: warning: hub 0 is not connected to host network
qemu-system-arm: warning: nic ftgmac100.1 has no peer
qemu-system-arm: warning: nic ftgmac100.2 has no peer
qemu-system-arm: warning: nic ftgmac100.3 has no peer


U-Boot 2013.07 (Nov 01 2023 - 17:52:22)

I2C:   ready
DRAM:  424 MiB
eSPI Handshake complete
OEM_BOARD_INIT - Start (BMC)
LPC mode
OEM_BOARD_INIT - End
Flash: ERROR: Unable to Detect SPI Flash
*** failed ***
### ERROR ### Please RESET the board ###

I noticed that the fmc_model and spi_model for ast2600 in QEMU's code lines up with what I have on my machine, and so does the error...

@amboar
Owner

amboar commented Apr 2, 2024

Mind pointing me in the right direction? I see that it's an option I can set but I don't quite get how I can set it.

There's some more detail in the Aspeed-specific qemu documentation, which indicates you can specify fmc-model on the qemu command-line:

https://www.qemu.org/docs/master/system/arm/aspeed.html#boot-options

For instance, try:

qemu-system-arm -M ast2500-evb,fmc-model=mx66l51235f -drive file=/root/fw,format=raw,if=mtd  -nographic

Regarding the flash type, culvert indicates the following:

...
[*] LIBFLASH: Flash ID: c2.20.1a (c2201a)
...

For reference, you can find the mapping from the JEDEC chip ID (0xc2201a) to the part name here:

https://gitlab.com/qemu-project/qemu/-/blob/master/hw/block/m25p80.c?ref_type=heads#L243
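For illustration, the three ID bytes can be pulled apart as below. This is a sketch, not how culvert or QEMU look the part up (both use a table keyed on the full ID). Treating the last byte as log2 of the size in bytes is a common convention that happens to hold for this part, but it is an assumption and not universal.

```python
def decode_jedec(jedec_id: int):
    """Split a 24-bit JEDEC ID into manufacturer, type and capacity bytes."""
    manufacturer = (jedec_id >> 16) & 0xFF
    memory_type = (jedec_id >> 8) & 0xFF
    capacity_code = jedec_id & 0xFF
    # Assumption: the capacity byte encodes log2(size in bytes).
    size_bytes = 1 << capacity_code
    return manufacturer, memory_type, capacity_code, size_bytes


mfr, mem_type, cap, size = decode_jedec(0xC2201A)
print(f"manufacturer {mfr:#04x}, type {mem_type:#04x}, "
      f"capacity code {cap:#04x} -> {size // (1024 * 1024)} MiB")
```

The 64 MiB result agrees with the size libflash reported and with the 67108864-byte dump.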

@legoater

legoater commented Apr 2, 2024

0x1e620004: 0x00000701 means that 4B mode is selected, which can confuse the chip if it doesn't support 4B commands. The CE0 control register is bogus: 0x1e620010: 0x00002400. There is something wrong with the flash driver in U-Boot.

@amboar
Owner

amboar commented Apr 3, 2024

What I wonder is how u-boot got into the state where there was something wrong with its flash driver, when the BMC had apparently booted okay prior to entering BIOS recovery mode.

@bielids can you provide more details on which BMC and host firmware updates you applied in what sequence relative to changes to the BIOS recovery state?

@bielids
Author

bielids commented Apr 3, 2024

Hey, I didn't make any progress last night as I broke my initrd files and pxeboot stopped working on this host for some reason.

can you provide more details on which BMC and host firmware updates you applied in what sequence relative to changes to the BIOS recovery state?

So basically I went into BIOS recovery mode in the hope of getting more info out of my system, as I had just received it (CPU & motherboard) and couldn't get it to POST at all (I think screw torque was the issue in the end). Here's a full timeline:

  • receive CPU & mobo separately
    • can't POST but BMC works (DHCP IP, BMC_READY light flashes)
  • clear CMOS
  • enable bios recovery
  • disable bios recovery
  • buy torque screwdriver, fix torque
    • don't recall if it POSTs per se, but I remember finally seeing proper CPU info in the IPMI webgui
  • realize I left clear CMOS jumper enabled, move jumper back to default. Install all DIMMs, reinstall cpu cooler
  • system POSTs but BMC no longer works
  • flash 12.61.17 BMC firmware via efi shell in hopes of fixing issue, don't retain existing settings
    • BMC still not working, BMC_READY light is steady
  • flash bios M18_R34 via efi
    • BMC still not working
  • flash 12.61.17 BMC fw via Linux
  • here we are

I spent a few nights troubleshooting POST issues so I may be missing a few steps in between, but that's the gist of it.

@amboar
Owner

amboar commented Apr 5, 2024

So I downloaded the 12.61.17 BMC firmware from Gigabyte. Extracting it, it seems they just provide the complete flash image, along with what is presumably their own copy of Aspeed's socflash called gigaflash (culvert is approximately an open-source implementation).

We can boot it directly in qemu:

0 andrew@heihei:/tmp/gigabyte/126117/fw$ truncate -s 64M 126117.bin 
0 andrew@heihei:/tmp/gigabyte/126117/fw$ qemu-system-arm -M ast2500-evb,fmc-model=mx66l51235f -drive file=126117.bin,if=mtd,format=raw -nographic
qemu-system-arm: warning: Aspeed iBT has no chardev backend


U-Boot 2013.07 (Nov 01 2023 - 17:52:22)

I2C:   ready
DRAM:  424 MiB
eSPI Handshake complete
OEM_BOARD_INIT - Start (BMC)
LPC mode
OEM_BOARD_INIT - End
Flash: Found SPI Chip Macronix MX66L51235F(0x1a20) 2x I/O READ, NORMAL WRITE
64 MiB
MMC:   
*** Warning - bad CRC, using default environment

Un-Protected 1 sectors
Erasing Flash...
Erasing sector  4 ... ok.
Erased 1 sectors
Writing to Flash... done
Protected 1 sectors
Net:   RTL8211E, EEECR = 0x00
RTL8211E, EEEAR = 0x00
RTL8211E, EEELPAR = 0x00
RTL8211E, LACR = 0x00
RTL8211E, LCR = 0x00
ast_eth0, ast_eth1
DRAM ECC enabled
Hit any key to stop autoboot:  0 
Image to be booted is 1
EMMC and EXT4 is not enabled - Cannot locate kernel file in Root
Initing KCS...done
Uboot waiting for firmware update to start...
Uboot waiting for fwupdate to start timed out
Disabling Watchdog 2 Timer
AST2500EVB>

I wonder if you should try reflashing the image using culvert if you haven't already returned the board...
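The truncate -s 64M step above grows the vendor file to the size of the flash part, presumably so the backing file matches the emulated chip. A rough Python equivalent is sketched below, assuming the vendor file is smaller than 64 MiB. Unlike truncate -s, it refuses to shrink a larger file. The helper name and the demo file are made up.

```python
import os
import tempfile

FLASH_SIZE = 64 * 1024 * 1024  # size libflash reported for the flash part


def pad_image(path: str, size: int = FLASH_SIZE) -> int:
    """Grow the file at path to size bytes; return how many bytes were added."""
    current = os.path.getsize(path)
    if current > size:
        raise ValueError("image is larger than the flash part")
    os.truncate(path, size)  # the extension reads back as zero bytes
    return size - current


# Small self-contained demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.bin")
    with open(demo, "wb") as f:
        f.write(b"\x01" * 1000)
    added = pad_image(demo, size=4096)
    final_size = os.path.getsize(demo)
    with open(demo, "rb") as f:
        tail = f.read()[1000:]

print(added, final_size)
```

Note that the padding is zero-filled, whereas erased NOR flash reads back as 0xFF, so the padded region will not match a dump of the real chip byte for byte.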

@bielids
Author

bielids commented Apr 6, 2024

Hey, the seller agreed to extend the return window until the end of the month.

I truncated and then flashed the firmware using culvert, which finally got the BMC to a working state! That said, a few boots later I was back to square one, weirdly enough.

Type [C-a] [C-h] to see available commands
Terminal ready

DRAM Init-V12-DDR4
0abc1-4Gb-Done
Read margin-DL:0.3882/DH:0.3960 CK (min:0.30)
IICC    bbuurrttrraayyDDAA::  IICC    bbuurrtt  ccBBss  00xx    %

112233EErrrr  nnllttIIii22))rrttrree  dd
eSPI Handshake complete
OEM_BOARD_INIT - Start (BMC)
LPC mode
OEM_BOARD_INIT - End
Flash: Found SPI Chip Macronix MX66L51235F(0x1a20) 2x I/O READ, NORMAL WRITE
64 MiB
MMC:   
Net:   RTL8211E, EEECR = 0x06
RTL8211E, EEEAR = 0x00
RTL8211E, EEELPAR = 0x00
RTL8211E, LACR = 0xc1
RTL8211E, LCR = 0x9742
ast_eth0, ast_eth1
DRAM ECC enabled
Hit any key to stop autoboot:  0 
Image to be booted is 1
conf @ /dev/mtdblock1 Address 20060000
conf @ /dev/mtdblock2 Address 20260000
ec @ /dev/mtdblock3 Address 20460000
Found Root File System @ /dev/mtdblock4
Root File System is CRAMFS
root @ /dev/mtdblock4 Address 20560000
dre @ /dev/mtdblock5 Address 22f20000
www @ /dev/mtdblock6 Address 22fa0000
Un-Protect Flash Bank # 1
Booting from Primary side
Booting from MODULE_PIMAGE ...
Bootargs = [root=/dev/mtdblock4 ro ip=none mem=424M console=ttyS4,115200 rootfstype=cramfs bigphysarea=6144 imagebooted=1]
## Booting kernel from Legacy Image at 80100000 ...
   Image Name:   Linux-3.14.17-ami
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    2792584 Bytes = 2.7 MiB
   Load Address: 80008000
   Entry Point:  80008000
   Loading Kernel Image ... OK

Starting kernel ...

Uncompressing Linux... done, booting the kernel.
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 3.14.17-ami (AMI@localhost) (gcc version 4.9.4 (GCC) ) #1 Wed Nov 1 17:51:13 CST 2023
[    0.000000] CPU: ARMv6-compatible processor [410fb767] revision 7 (ARMv7), cr=00c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
[    0.000000] Machine: AST2500EVB
[    0.000000] cma: CMA: reserved 44 MiB at 97c00000
[    0.000000] Memory policy: Data cache writeback
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 107696
[    0.000000] Kernel command line: root=/dev/mtdblock4 ro ip=none mem=424M console=ttyS4,115200 rootfstype=cramfs bigphysarea=6144 imagebooted=1
[    0.000000] PID hash table entries: 2048 (order: 1, 8192 bytes)
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.000000] Memory: 354976K/434176K available (3854K kernel code, 194K rwdata, 1276K rodata, 150K init, 127K bss, 79200K reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
[    0.000000]     vmalloc : 0xdb000000 - 0xff000000   ( 576 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xda800000   ( 424 MB)
[    0.000000]     modules : 0xbf000000 - 0xc0000000   (  16 MB)
[    0.000000]       .text : 0xc0008000 - 0xc050add4   (5132 kB)
[    0.000000]       .init : 0xc050b000 - 0xc053082c   ( 151 kB)
[    0.000000]       .data : 0xc0532000 - 0xc0562b40   ( 195 kB)
[    0.000000]        .bss : 0xc0562b40 - 0xc05828c0   ( 128 kB)
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS:64
[    0.000000] AST Interrupt Controller Enabled
[    0.000000] AST Timer Enabled
[    0.000000] sched_clock: 32 bits at 100 Hz, resolution 10000000ns, wraps every 21474836480000000ns
[    0.120000] console [ttyS4] enabled
[    0.120000] Calibrating delay loop... 789.70 BogoMIPS (lpj=3948544)
[    0.170000] pid_max: default: 32768 minimum: 301
[    0.170000] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
[    0.180000] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
[    0.190000] CPU: Testing write buffer coherency: ok
[    0.190000] Setting up static identity map for 0x803ca060 - 0x803ca098
[    0.200000] devtmpfs: initialized
[    0.220000] NET: Registered protocol family 16
[    0.220000] DMA: preallocated 256 KiB pool for atomic coherent allocations
[    0.300000] bio: create slab <bio-0> at 0
[    0.310000] usbcore: registered new interface driver usbfs
[    0.320000] usbcore: registered new interface driver hub
[    0.330000] usbcore: registered new device driver usb
[    0.340000] FS-Cache: Loaded
[    0.340000] CacheFiles: Loaded
[    0.360000] NET: Registered protocol family 2
[    0.360000] TCP established hash table entries: 4096 (order: 2, 16384 bytes)
[    0.370000] TCP bind hash table entries: 4096 (order: 2, 16384 bytes)
[    0.380000] TCP: Hash tables configured (established 4096 bind 4096)
[    0.380000] TCP: reno registered
[    0.390000] UDP hash table entries: 256 (order: 0, 4096 bytes)
[    0.390000] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
[    0.400000] NET: Registered protocol family 1
[    0.400000] RPC: Registered named UNIX socket transport module.
[    0.410000] RPC: Registered udp transport module.
[    0.410000] RPC: Registered tcp transport module.
[    0.420000] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.430000] futex hash table entries: 256 (order: -1, 3072 bytes)
[    0.470000] bigphysarea: Allocated 6144 pages at 0xd6089000.
[    0.490000] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.500000] FS-Cache: Netfs 'nfs' registered for caching
[    0.510000] NFS: Registering the id_resolver key type
[    0.510000] Key type id_resolver registered
[    0.520000] Key type id_legacy registered
[    0.520000] FS-Cache: Netfs 'cifs' registered for caching
[    0.530000] jffs2: version 2.2. © 2001-2006 Red Hat, Inc.
[    0.540000] fuse init (API version 7.22)
[    0.550000] msgmni has been set to 781
[    0.560000] alg: No test for stdrng (krng)
[    0.570000] io scheduler noop registered (default)
[    0.570000] Serial: 8250/16550 driver, 5 ports, IRQ sharing disabled
[    0.590000] serial8250: ttyS2 at MMIO 0x1e78e000 (irq = 33, base_baud = 1500000) is a 16550A
[    0.610000] serial8250: ttyS3 at MMIO 0x1e78f000 (irq = 34, base_baud = 1500000) is a 16550A
[    0.630000] serial8250: ttyS4 at MMIO 0x1e784000 (irq = 10, base_baud = 1500000) is a 16550A
[    0.680000] brd: module loaded
[    0.700000] loop: module loaded
[    0.700000] Ractrends Flash mapping: 0x08000000 at 0x20000000
[    0.710000] Flash total banks (2)
[    0.710000] Probing for Flash at Bank # 0
[    0.720000] Ractrends: No spi compatible flash device found
[    0.720000] Probing for Flash at Bank # 1
[    0.730000] Ractrends: No spi compatible flash device found
[    0.730000] ERROR: init_ractrends_flash: flash concat failed
[    0.740000] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
[    0.780000] i2c /dev entries driver
[    0.790000] sdhci: Secure Digital Host Controller Interface driver
[    0.790000] sdhci: Copyright(c) Pierre Ossman
[    0.840000] AST SoC SD/MMC Driver Init Success
[    0.840000] Netfilter messages via NETLINK v0.30.
[    0.850000] nfnl_acct: registering with nfnetlink.
[    0.850000] xt_time: kernel timezone is -0000
[    0.860000] ip_tables: (C) 2000-2006 Netfilter Core Team
[    0.870000] arp_tables: (C) 2002 David S. Miller
[    0.870000] TCP: cubic registered
[    0.880000] NET: Registered protocol family 10
[    0.880000] ip6_tables: (C) 2000-2006 Netfilter Core Team
[    0.890000] sit: IPv6 over IPv4 tunneling driver
[    0.900000] NET: Registered protocol family 17
[    0.900000] 8021q: 802.1Q VLAN Support v1.8
[    0.910000] Key type dns_resolver registered
[    0.920000] VFS: Cannot open root device "mtdblock4" or unknown-block(0,0): error -6
[    0.930000] Please append a correct "root=" boot option; here are the available partitions:
[    0.930000] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    0.930000] CPU: 0 PID: 1 Comm: swapper Not tainted 3.14.17-ami #1
[    0.930000] Backtrace: 
[    0.930000] [<c000bbc4>] (dump_backtrace) from [<c000be78>] (show_stack+0x18/0x1c)
[    0.930000]  r6:c052c480 r5:d5de8000 r4:c0562e88 r3:d7f3dd5d
[    0.930000] [<c000be60>] (show_stack) from [<c03c727c>] (dump_stack+0x20/0x28)
[    0.930000] [<c03c725c>] (dump_stack) from [<c03c49a0>] (panic+0x80/0x1cc)
[    0.930000] [<c03c4924>] (panic) from [<c050c110>] (mount_block_root+0x234/0x298)
[    0.930000]  r3:d7f3dd5d r2:d7f3dd5d r1:d5c35eb4 r0:c0493855
[    0.930000]  r7:d5de8000
[    0.930000] [<c050bedc>] (mount_block_root) from [<c050c37c>] (mount_root+0xe8/0x110)
[    0.930000]  r10:00000000 r9:c0562b40 r8:c0562b40 r7:c053079c r6:c0562b64 r5:c053ebb8
[    0.930000]  r4:00000000
[    0.930000] [<c050c294>] (mount_root) from [<c050c508>] (prepare_namespace+0x164/0x1c4)
[    0.930000]  r7:c053079c r6:c052c495 r5:c0562b64 r4:c052c480
[    0.930000] [<c050c3a4>] (prepare_namespace) from [<c050bc94>] (kernel_init_freeable+0x17c/0x1c4)
[    0.930000]  r6:c052c45c r5:00000080 r4:00000008
[    0.930000] [<c050bb18>] (kernel_init_freeable) from [<c03c1fd0>] (kernel_init+0x10/0xec)
[    0.930000]  r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c03c1fc0 r4:00000000
[    0.930000] [<c03c1fc0>] (kernel_init) from [<c0009338>] (ret_from_fork+0x14/0x3c)
[    0.930000]  r4:00000000 r3:d5c34000
[    0.930000] Rebooting in 1 seconds..IICC    bbuurrttrraayyDDAA::  IICC    bbuurrtt  ccBBss  00xx    %

112233EErrrr  nnllttIIii22))rrttrree  dd
eSPI Handshake complete
OEM_BOARD_INIT - Start (BMC)
LPC mode
OEM_BOARD_INIT - End
Flash: ERROR: Unable to Detect SPI Flash
*** failed ***
### ERROR ### Please RESET the board ###
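As a cross-check on the partition layout, the addresses U-Boot printed in the log above can be turned into flash offsets by subtracting the 0x20000000 flash window base (the kernel log line "Ractrends Flash mapping: 0x08000000 at 0x20000000" suggests the same base). A minimal sketch, with the dictionary keys invented for readability:

```python
FLASH_WINDOW_BASE = 0x20000000  # taken from the kernel's flash mapping message

# Addresses as printed by U-Boot in the boot log.
UBOOT_PARTITIONS = {
    "conf@mtdblock1": 0x20060000,
    "conf@mtdblock2": 0x20260000,
    "ec": 0x20460000,
    "root": 0x20560000,
    "dre": 0x22F20000,
    "www": 0x22FA0000,
}

# Filesystem offsets reported by binwalk for the dumped image.
BINWALK_OFFSETS = {0x60000, 0x560000, 0x2F20000, 0x2FA0000}

offsets = {name: addr - FLASH_WINDOW_BASE for name, addr in UBOOT_PARTITIONS.items()}
matched = sorted(name for name, off in offsets.items() if off in BINWALK_OFFSETS)

for name, off in offsets.items():
    note = "matches binwalk" if off in BINWALK_OFFSETS else "no binwalk signature"
    print(f"{name:16s} flash offset {off:#09x}  {note}")
```

The four filesystem signatures binwalk found line up with the conf, root, dre and www partitions, which suggests the vendor's boundaries can be taken from the U-Boot log.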

@bielids
Author

bielids commented Apr 6, 2024

Unfortunately I'm no longer able to detect the SPI flash even with culvert now:

root@pve:~/bmc/126117# ./bmc_fw_update_linux.sh 
gigaflash v2.0.8
Do you want to preserve configuration? (Y/N)
n
Loading Firmware...
Failed to connect BMC, try to update BMC!
Update Firmware
Find ASPEED Device 1a03:2000 on c3:0.0 
MMIO Virtual Address: 694e6000 
Relocate IO Base: e000 
Found ASPEED Device 1a03:2500 rev. 41 
Static Memory Controller Information: 
CS0 Flash Type is SPI 
CS1 Flash Type is SPI 
CS2 Flash Type is SPI 
CS3 Flash Type is NOR 
CS4 Flash Type is NOR 
Boot CS is 0 
Option Information: 
CS: 0 
Flash Type: SPI 
[Warning] Don't AC OFF or Reboot System During BMC Firmware Update!! 
Can't Find Flash Chip #1 
Press y to force update, or press other keys to skip the flash .... 
Wait 90 seconds for BMC Ready...
^C
root@pve:~/bmc/126117# /root/culvert.bin write firmware < /root/bmc/126117/fw/126117.bin.trunc 
[*] failed to initialise devmem bridge: -1
[*] Preventing system reset
[*] Gating ARM clock
[*] Configuring VUART for host Tx discard
[*] Initialising flash subsystem
[*] LIBFLASH: Flash identification failed: -6
[*] Deconfiguring VUART host Tx discard
[*] Ungating ARM clock

and here's the BMC console after a reset:

Type [C-a] [C-h] to see available commands
Terminal ready
IICC    bbuurrttrraayyDDAA::  IICC    bbuurrtt  ccBBss  00xx    %

112233EErrrr  nnllttIIii22))rrttrree  dd
eSPI Handshake complete
OEM_BOARD_INIT - Start (BMC)
LPC mode
OEM_BOARD_INIT - End
Flash: ERROR: Unable to Detect SPI Flash
*** failed ***
### ERROR ### Please RESET the board ###

The only thing that I did between the BMC working and now is setting an IPv4 address using the BIOS settings. 😕

Current culvert probe output:

root@pve:~# ./culvert.bin probe
[*] failed to initialise devmem bridge: -1
debug:	Permissive
	Debug UART port: UART1
xdma:	Restricted
	BMC: Disabled
	VGA: Enabled
	XDMA on VGA: Enabled
	XDMA is constrained: Yes
p2a:	Permissive
	BMC: Disabled
	VGA: Enabled
	MMIO on VGA: Enabled
	[0x00000000 - 0x0fffffff]   Firmware: Writable
	[0x10000000 - 0x1fffffff]     SoC IO: Writable
	[0x20000000 - 0x2fffffff]  BMC Flash: Writable
	[0x30000000 - 0x3fffffff] Host Flash: Writable
	[0x40000000 - 0x5fffffff]   Reserved: Writable
	[0x60000000 - 0x7fffffff]   LPC Host: Writable
	[0x80000000 - 0xffffffff]       DRAM: Writable
ilpc:	Permissive
	SuperIO address: 0x2e

EDIT: after leaving the server turned off all night the BMC booted up without issues. I've left it off overnight many times before to no avail but this time it made a difference. ipmitool works (locally or via ipv4)

@amboar
Owner

amboar commented Apr 8, 2024

Okay. Where would you like to take this? Should we close it for now?

@amboar
Owner

amboar commented Apr 11, 2024

Closing this for now, re-open if you think there are things we can do to improve culvert for dealing with these circumstances.

@amboar amboar closed this as completed Apr 11, 2024
@bielids
Author

bielids commented Apr 22, 2024

Hey hey, thanks a lot, we can leave this closed, I appreciate everything you folks have done to help. I was away from the computer for the past 2 weeks and unexpectedly did not have any kind of reception/connectivity during that time.

So yesterday I did some memory tests and noticed that I would frequently get errors when running test #4 in memtest86+ on CPUs 23/51. I had been having random kernel panics so the idea that the CPU had some sort of issue causing this didn't seem out of the question. I checked sibling CPUs for core 23 and 51 came up. I did a few more tests using memtester and whenever core/thread 23/51 is used to write to memory I get errors. After disabling this core my system has been stable. I don't think that this caused my issues in the first place but I don't think that it helped either. Moving forward I'll leave that core disabled.
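For anyone wanting to repeat the sibling check described above: on Linux the kernel exposes which logical CPUs share a physical core through sysfs. This is a minimal sketch, assuming the usual `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` layout; the helper names `parse_cpu_list` and `thread_siblings` are made up for illustration and are not part of any tool mentioned in this thread.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '23,51' or '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def thread_siblings(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return the logical CPUs that share a physical core with `cpu` (Linux only)."""
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    return parse_cpu_list(path.read_text())
```

On a system like the one described, `thread_siblings(23)` would be expected to list both 23 and 51, which is the pair to take offline together if one of them misbehaves.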

Right now I have a zpool to recover from before I can further look into the IPMI issues I was having but once that's done (hopefully today) I'd be more than happy to do dumps or whatever else you think might be useful in figuring out what happened here in the first place. I would love to contribute to this project after all the work that you've put into this issue but I'm no C dev so there's only so much I can do. Let me know if you want anything from this system and I'll be glad to help. I understand if you're not interested, you've already spent a lot of time on this :)

This tool saved me so many headaches and along the way I've learned so much about embedded systems. I'm now thinking of taking a course on u-boot and embedded systems in general just to get a better understanding of it all. Thanks again!

@amboar
Owner

amboar commented Apr 22, 2024

Hey hey, thanks a lot, we can leave this closed, I appreciate everything you folks have done to help. I was away from the computer for the past 2 weeks and unexpectedly did not have any kind of reception/connectivity during that time.

No worries at all, I hope everything is okay for you.

I would love to contribute to this project after all the work that you've put into this issue but I'm no C dev so there's only so much I can do.

Don't be concerned about that, there's certainly no expectation that you contribute back on my part! Sometimes it's interesting to dig into a problem and I enjoy helping others.

Let me know if you want anything from this system and I'll be glad to help. I understand if you're not interested, you've already spent a lot of time on this :)

Yeah, largely I need whatever is to be done to be led by you. Unfortunately I don't have the capacity to take on the issues with your system as a project, but I can tinker around the edges by improving the tools to help you out.

This tool saved me so many headaches and along the way I've learned so much about embedded systems. I'm now thinking of taking a course on u-boot and embedded systems in general just to get a better understanding of it all

Awesome, I'm glad it was in some part interesting and not just 100% frustration at expensive hardware gone wrong. All the best with the future embedded adventures :)

@y8

y8 commented May 21, 2024

Hi!

This thread is the only thing on the whole internet about AST2500 BMC recovery, so I hope I can chime in with my own AST2500 issue. If you know of a more appropriate place, I would be very grateful if you could point it out :)

I did a very stupid thing and tried to upgrade the BMC firmware from 12.40.17 to 12.61.21 on a Gigabyte MZ01-CE1 rev2 from Linux running on the "host" system, via the BMC's KVM session.

Obviously (in hindsight) the KVM session dropped at some point during the firmware update. The BMC_LED switched off and the BMC stopped responding to pings. I waited for 15 minutes and nothing happened. After I reset power, BMC_LED is now steady on. I also tried to reset CMOS by setting the CLR_CMOS jumper and power cycling the board, but this didn't help either (and I suspect that's why the host CPU doesn't boot anymore).

I've located the BMC UART port and can see that the BMC is stuck in u-boot:

DRAM Init-V12-DDR4
0abc1-4Gb-Done
Read margin-DL:0.3843/DH:0.4 CK (min:0.30)


U-Boot 2013.07 (Mar 12 2024 - 14:08:49)

I2C:   ready
DRAM:  424 MiB
eSPI Handshake complete

OEM_BOARD_INIT - Start (BMC)
LPC mode
OEM_BOARD_INIT - End
Flash: Found SPI Chip Macronix MX66L51235F(0x1a20) 2x I/O READ, NORMAL WRITE
64 MiB
MMC:
Net:   RTL8211E, EEECR = 0x06
RTL8211E, EEEAR = 0x00
RTL8211E, EEELPAR = 0x00
RTL8211E, LACR = 0xc1
RTL8211E, LCR = 0x9742
ast_eth0, ast_eth1
DRAM ECC enabled
Hit any key to stop autoboot:  0
Image to be booted is 1
conf @ /dev/mtdblock1 Address 20060000
conf @ /dev/mtdblock2 Address 20260000
ec @ /dev/mtdblock3 Address 20460000
Found Root File System @ /dev/mtdblock4
Root File System is CRAMFS
root @ /dev/mtdblock4 Address 20560000
Un-Protect Flash Bank # 1
Booting from Primary side
JFFS2 support is not enabled - Cannot locate kernel File in Root
Unable to locate /boot/uImage
EMMC and EXT4 is not enabled - Cannot locate kernel file in Root
Initing KCS...done
Uboot waiting for firmware update to start...
Uboot waiting for fwupdate to start timed out
Disabling Watchdog 2 Timer
AST2500EVB>

I ran binwalk on the firmware file, then tried to load it over tftp and boot it:

# binwalk rom.ima_enc

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
163848        0x28008         CRC32 polynomial table, little endian
213604        0x34264         CRC32 polynomial table, little endian
393216        0x60000         JFFS2 filesystem, little endian
5636096       0x560000        CramFS filesystem, little endian, size: 40951808, version 2, sorted_dirs, CRC 0x417607BC, edition 0, 27801 blocks, 6575 files
46596160      0x2C70040       uImage header, header size: 64 bytes, header CRC: 0x596F847A, created: 2024-03-12 06:08:06, image size: 2792592 bytes, Data Address: 0x80008000, Entry Point: 0x80008000, data CRC: 0xC9C2F025, OS: Linux, CPU: ARM, image type: OS Kernel Image, compression type: none, image name: "Linux-3.14.17-ami"
46596224      0x2C70080       Linux kernel ARM boot executable zImage (little-endian)
46613079      0x2C74257       gzip compressed data, maximum compression, from Unix, last modified: 1970-01-01 00:00:00 (null date)
49479680      0x2F30000       JFFS2 filesystem, little endian
50003968      0x2FB0000       CramFS filesystem, little endian, size: 5963776, version 2, sorted_dirs, CRC 0xBCBB0E63, edition 0, 1566 blocks, 131 files
setenv ipaddr 192.168.1.108
setenv serverip 192.168.1.128
ping 192.168.1.128
tftp 81000000 192.168.1.128:rom.ima_enc

bootm 0x83C70040 # 81000000 + 0x2C70040

## Booting kernel from Legacy Image at 83c70040 ...
   Image Name:   Linux-3.14.17-ami
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    2792592 Bytes = 2.7 MiB
   Load Address: 80008000
   Entry Point:  80008000
   Loading Kernel Image ... OK

Starting kernel ...

Uncompressing Linux... done, booting the kernel.

After a while the BMC resets and drops back to u-boot.
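The bootm address used above is just the tftp load address plus the file offset binwalk reported for the uImage header. Here is a small sketch of that arithmetic, plus a sanity check that the kernel's load region does not overlap the staged image in DRAM. All constants are copied from the logs in this thread; the function names are hypothetical.

```python
# Values taken from the u-boot session and binwalk output in this thread.
TFTP_LOAD_ADDR = 0x81000000    # address passed to the tftp command
UIMAGE_OFFSET = 0x2C70040      # file offset of the uImage header per binwalk
IMAGE_SIZE = 66_060_424        # size of rom.ima_enc as reported elsewhere in this thread
KERNEL_LOAD_ADDR = 0x80008000  # "Load Address" printed by bootm
KERNEL_SIZE = 2_792_592        # "Data Size" printed by bootm


def bootm_address(load_addr, file_offset):
    """Address of an embedded image once the whole file sits at load_addr."""
    return load_addr + file_offset


def ranges_overlap(start_a, size_a, start_b, size_b):
    """True if [start_a, start_a + size_a) intersects [start_b, start_b + size_b)."""
    return start_a < start_b + size_b and start_b < start_a + size_a


if __name__ == "__main__":
    addr = bootm_address(TFTP_LOAD_ADDR, UIMAGE_OFFSET)
    print(f"bootm 0x{addr:08X}")
    clash = ranges_overlap(KERNEL_LOAD_ADDR, KERNEL_SIZE, TFTP_LOAD_ADDR, IMAGE_SIZE)
    print("kernel copy overlaps staged image:", clash)
```

With these numbers the two regions do not overlap, so the address arithmetic itself looks sound, and the reset after "booting the kernel" is probably down to something else.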

I've also tried to boot from the addresses listed in the fmh output:

AST2500EVB>fmh

Listing FMH Modules

FMH Located at 0x20000000 of Size 0x00050000
Name    : boot
Ver     : 12.01.000000
Type .: 0x0001
Flags.: 0x0000
Size .: 0x0003b488
Location: 0x20000000
LoadAddr: 0xffffffff
CheckSum: Not Computed
--------------------------------------------------

FMH Located at 0x20050000 of Size 0x00200000
Name    : conf
Ver     : 12.01.000000
Type .: 0x0011
Flags.: 0x0001
Size .: 0x001f0000
Location: 0x20060000
LoadAddr: 0xffffffff
CheckSum: Not Computed
--------------------------------------------------

FMH Located at 0x20250000 of Size 0x00200000
Name    : conf
Ver     : 12.01.000000
Type .: 0x0011
Flags.: 0x0001
Size .: 0x001f0000
Location: 0x20260000
LoadAddr: 0xffffffff
CheckSum: Not Computed
--------------------------------------------------

FMH Located at 0x20450000 of Size 0x00100000
Name    : ec
Ver     : 1.25.000000
Type .: 0x0011
Flags.: 0x0001
Size .: 0x00000000
Location: 0x20460000
LoadAddr: 0xffffffff
CheckSum: Not Computed
--------------------------------------------------

FMH Located at 0x20550000 of Size 0x02720000
Name    : root
Ver     : 12.01.000000
Type .: 0x0012
Flags.: 0x0001
Size .: 0x0270e000
Location: 0x20560000
LoadAddr: 0x81000000
CheckSum: Not Computed
--------------------------------------------------

FMH Located at 0x23ef0000 of Size 0x00010000
Name    : ast2500e
Ver     : 12.40.17
Type .: 0x0002
Flags.: 0x0000
Size .: 0x00000098
Location: 0x23ef0040
LoadAddr: 0xffffffff
CheckSum: Not Computed
--------------------------------------------------

AST2500EVB>bootm 0x20000000 # Name    : boot

   No valid image found at 0x20000000
Wrong Image Format for bootm command
ERROR: can't get kernel image!
AST2500EVB>

   No valid image found at 0x20000000
Wrong Image Format for bootm command
ERROR: can't get kernel image!
AST2500EVB>bootm 0x20550000 # 

   No valid image found at 0x20550000
Wrong Image Format for bootm command
ERROR: can't get kernel image!
AST2500EVB>

   No valid image found at 0x20550000
Wrong Image Format for bootm command Name    : root
ERROR: can't get kernel image!
AST2500EVB>

I have only a very basic understanding of u-boot, so I have no further ideas on how to re-flash the BMC.
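One way to see which addresses bootm could accept is to look for a legacy uImage header, which starts with the magic 0x27051956 and is 64 bytes long with big-endian fields. binwalk found one at file offset 0x2C70040, whereas 0x20000000 and 0x20550000 are FMH module locations rather than uImage headers. The following is a sketch of a header parser for offline inspection of the image file; the function name and the returned dictionary keys are my own choices, not anything from u-boot or culvert.

```python
import struct
import zlib

UIMAGE_MAGIC = 0x27051956
# magic, header CRC, timestamp, data size, load addr, entry point, data CRC,
# then OS, arch, type, compression (1 byte each) and a 32-byte name.
HEADER = struct.Struct(">7I4B32s")  # 64 bytes, big-endian


def parse_uimage_header(blob, offset=0):
    """Return the header fields at `offset`, or None if no legacy uImage starts there."""
    raw = blob[offset:offset + HEADER.size]
    if len(raw) < HEADER.size:
        return None
    (magic, hcrc, timestamp, size, load, entry, dcrc,
     os_id, arch, img_type, comp, name) = HEADER.unpack(raw)
    if magic != UIMAGE_MAGIC:
        return None
    # The header CRC is calculated with the CRC field itself set to zero.
    zeroed = raw[:4] + b"\x00\x00\x00\x00" + raw[8:]
    return {
        "size": size,
        "load": load,
        "entry": entry,
        "data_crc": dcrc,
        "name": name.split(b"\x00")[0].decode("ascii", "replace"),
        "header_crc_ok": (zlib.crc32(zeroed) & 0xFFFFFFFF) == hcrc,
    }
```

Reading rom.ima_enc into memory and calling `parse_uimage_header(data, 0x2C70040)` should report the same name, load address and entry point that binwalk printed, assuming the file matches the one examined above.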

Unfortunately, I don't have a VGA → HDMI adapter and it will take about a week to get one, so I don't know what happens when the host is powered on. The CPU fan starts spinning at max PWM but nothing else happens. I've made a bootable USB flash drive with Alpine Linux that should autostart, enable the network interface, get a DHCP address and start sshd, but it doesn't seem to boot.

Another problem is that I'm on an Apple-silicon Mac and don't have access to Intel hardware. I couldn't build culvert on macOS, so right now I'm setting up Ubuntu to try there.

What gives me hope is that culvert's README says "Reflash or dump the firmware of a running BMC from the host", but I can't find any details on this process.

I suppose the main question is: is it even possible to reflash via the physical UART port with the BMC in this state? If not, are there any other options?

Any help is greatly appreciated!

@zevweiss
Collaborator

zevweiss commented May 21, 2024

I suppose the main question is: is it even possible to reflash via the physical UART port with the BMC in this state?

Unless the Gigabyte u-boot version that's currently on it disables the AST2500's on-by-default debug UART functionality (which I'd guess is unlikely, especially given that it's apparently built on a 3.x (!) kernel), yes, it should be possible to reflash the BMC firmware via the UART -- it is very slow, however. If you have a raw flash image for the desired firmware, you can do so by running something like:

$ culvert write firmware $PORT < $IMGFILE

where $PORT is the device node for the serial port connected to the BMC's serial console (e.g. /dev/ttyUSB0 if you're using a common USB/serial adapter), and $IMGFILE is the raw flash image file for the firmware to flash to it. Note that the file provided on a vendor's firmware downloads page may or may not be in an appropriate format for this -- if you're not sure, we may be able to help determine whether or not what you've got is in fact a raw flash image, and if not how you might be able to extract one (a good indicator would be if the file size is exactly 64MiB, or 2^26 bytes).
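The size heuristic mentioned above can be sketched as a quick check. This assumes a 64 MiB flash part, as on this board, and uses hypothetical helper names; it only looks at the size and says nothing about whether the contents are valid.

```python
import os

FLASH_SIZE = 64 * 1024 * 1024  # 2**26 bytes, the flash size discussed above


def classify_image_size(size, flash_size=FLASH_SIZE):
    """Classify an image by size relative to the flash part."""
    if size == flash_size:
        return "exact"      # plausible raw flash image
    if size < flash_size:
        return "short"      # may need padding, or may be a container format
    return "too large"      # cannot be a raw image for this part


def classify_image_file(path, flash_size=FLASH_SIZE):
    """Same check, reading the size from a file on disk."""
    return classify_image_size(os.path.getsize(path), flash_size)
```

A "short" result does not rule the file out: it only means more inspection is needed before writing it to flash.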

[EDIT: looking back at previous discussion on this issue, I see from @amboar's comment that it looks like Gigabyte does in fact provide what's pretty much just a raw flash image.]

For a 64MiB flash part a full reflash via the debug UART will likely take something on the order of dozens of hours to finish (perhaps a few days, I don't remember exactly how fast it generally runs offhand).

There's some chance it might be possible to tftp in an alternate kernel/initramfs to boot (e.g. an OpenBMC evb-ast2500 build, which I'd guess would stand a decent chance of booting functionally enough for basic network & flash access to work) and do the full reflash from there, which would likely run quite a bit faster (~5-10 minutes if you can get it going successfully), but would be more experimental -- given the age of the u-boot you've got installed, you might need to also temporarily install OpenBMC's (newer) u-boot to be able to boot a FIT image.

Note also that an x86 host isn't required to run culvert -- I've been successfully using it from a Raspberry Pi (ARM) for quite some time, for example. The host OS may be a bigger issue; I don't know offhand exactly how portable culvert and its various dependencies are expected to be, or if they should compile/work on macOS -- what problem(s) did you run into while trying?

@zevweiss
Collaborator

@y8, note also that there happens to be a somewhat related discussion currently going on on the OpenBMC mailing list -- see in particular this post which includes some details on manually booting an OpenBMC FIT image if you want to try that route.

@amboar
Owner

amboar commented May 22, 2024

If you know another place where it's more appropriate, I would be very grateful for pointing it out :)

I don't want to derail this issue as there's a lot of interesting stuff happening in it, but for anyone looking in the future, I prefer that we use a discussion instead.

@y8 - @zevweiss has already covered a fair bit here otherwise, so I'll let that run its course.

@y8

y8 commented May 22, 2024

First, I'd like to thank you for your tremendous dedication and the depth of your help! As Johannes said on the mailing list, it's very inspiring!

The host OS may be a bigger issue; I don't know offhand exactly how portable culvert and its various dependencies are expected to be, or if they should compile/work on macOS -- what problem(s) did you run into while trying?

  1. On the latest stable macOS Sonoma, with dependencies installed via Homebrew, the build fails this way: https://gist.github.com/y8/ee765ada10ba8d0df6470c3e4af5db06

  2. On the latest ubuntu-24.04-live-server-arm64, the build failed on a warning that the iob() function is used ahead of its definition. I suspect this happens because [mb.h](https://github.com/amboar/culvert/blob/main/src/mb.h) has no definition of the actual memory barrier function for aarch64

But in the end I built a static version on an x86 cloud host, moved it to an x86 emulator on my Mac, and it worked :) Kudos to #46!

However, when I tried to run ./culvert -vv read firmware /dev/ttyACM0 > fw.bin it complained that the env variable AST_DEBUG_PASSWORD is not set. I set it to an empty value but the process hung after Entering debug mode:

AST_DEBUG_PASSWORD= ./culvert -vv read firmware /dev/ttyACM0
[*] Found 5 registered bridge drivers
[*] Trying bridge driver l2a
[*] Trying bridge driver ilpc
[*] Trying bridge driver devmem
[*] Trying bridge driver debug-uart
[*] Opening /dev/ttyACM0
[*] Entering debug mode

I tried all the default IPMI passwords that Gigabyte provides in their documentation, but no luck. According to strace, culvert sent the password value to the UART and waited for a response, which won't come because it is basically writing to the u-boot shell.

write(2, "Entering debug mode\n", 20Entering debug mode
)   = 20
write(3, "", 0)                         = 0
read(3, ^

Note that the file provided on a vendor's firmware downloads page may or may not be in an appropriate format for this

The image file is slightly smaller than 64 MiB (67,108,864 bytes):

  • 126121.bin: 66,060,552 bytes
  • rom.ima_enc: 66,060,424 bytes

possible to tftp in an alternate kernel/initramfs to boot (e.g. an OpenBMC evb-ast2500 build, which I'd guess would stand a decent chance of booting functionally enough for basic network & flash access to work)

I've considered this option, but couldn't find any docs on how this process works. The discussion you pointed to provides very detailed instructions, which I'll try later today! Thank you!

Seems like booting into OpenBMC and flashing from there is a viable option. I don't mind having it as the BMC firmware if KVM, power management and fan control (the fan for the EPYC 7551P is very loud at full speed) work on the AST2500 :)

I'll leave reflashing via culvert as a last resort, because we can have short blackouts here and I don't have a backup power supply to keep the BMC powered while it is being flashed.

Once again, thank you very much for your help!

@zevweiss
Collaborator

On the latest stable macOS Sonoma, with dependencies installed via Homebrew, the build fails this way: https://gist.github.com/y8/ee765ada10ba8d0df6470c3e4af5db06

I see...so yeah, I guess dtc has some incompatibility with the Apple toolchain's assembler somehow? It might be worth filing a bug if you feel like it, though I'm not sure if the dtc devs regard macOS as a supported platform.

On the latest ubuntu-24.04-live-server-arm64, the build failed on a warning that the iob() function is used ahead of its definition. I suspect this happens because mb.h has no definition of the actual memory barrier function for aarch64

Ah, yeah...I suspect you may be the first to try to build culvert on arm64 -- something like this might fix it:

diff --git src/mb.h src/mb.h
index 58bc83abc6f0..49229e71563b 100644
--- src/mb.h
+++ src/mb.h
@@ -10,7 +10,7 @@
 #elif defined(__x86_64__)
 #include "x86.h"
 #define iob() mfence()
-#elif defined(__arm__)
+#elif defined(__arm__) || defined(__aarch64__)
 /*
  * HACK: Assumes we're running remotely or on the AST itself. If ARM is
  * the host arch then we need to fix up the barriers

...it complained that env variable AST_DEBUG_PASSWORD is not set.

Oh dear -- I'm sorry, I completely forgot about that aspect, because I have that variable set automatically in my shell profile on systems where I use culvert. Unfortunately it needs to be set to a specific 64-byte string of gibberish from a specific page in the AST2500 datasheet -- a document that is (sadly) officially confidential. Aspeed's efforts at security-by-obscurity there haven't been (ahem) entirely successful (heck, it's embedded in their socflash binary and they don't seem to make any effort to restrict the distribution of that AFAICT), but posting it on github would be a little more bold than might be prudent for me at the moment. So uh...internet scavenger hunt to see if you can find it? (Apologies.)
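Since an empty AST_DEBUG_PASSWORD just leads to a silent hang, a small preflight check before launching culvert can save some head-scratching. This is only a sketch: the 64-character length comes from the description above, the helper name is hypothetical, and culvert itself is not assumed to perform this validation.

```python
import os

EXPECTED_LENGTH = 64  # per the description above; an assumption, not read from culvert


def check_debug_password(env=None):
    """Return None if AST_DEBUG_PASSWORD looks plausible, else a short explanation."""
    env = os.environ if env is None else env
    value = env.get("AST_DEBUG_PASSWORD")
    if value is None:
        return "AST_DEBUG_PASSWORD is not set"
    if len(value) != EXPECTED_LENGTH:
        return f"AST_DEBUG_PASSWORD has {len(value)} characters, expected {EXPECTED_LENGTH}"
    return None


if __name__ == "__main__":
    problem = check_debug_password()
    print(problem or "AST_DEBUG_PASSWORD looks plausible")
```

A length check cannot catch a wrong character, so a password that passes here can still fail to unlock the debug UART.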

Image file is slightly smaller than 64M (67,108,864)

126121.bin: 66,060,552 bytes
rom.ima_enc: 66,060,424 bytes

As noted somewhere in the mailing list thread I think, it seems like Gigabyte's images are indeed slightly smaller than the flash chip, which seems a bit odd (I guess they just omit an unused tail end of it?), but should be easy to just truncate -s out to the full size.

Looking at those two files, however, reveals something slightly surprising (or maybe not, I dunno):

$ cmp fw/126121.bin fw/rom.ima_enc 
cmp: EOF on fw/rom.ima_enc after byte 66060424, in line 176495

The two are identical aside from a 128-byte footer present in the larger one (126121.bin). Given that, I'd guess the footer is probably some auxiliary metadata (checksums or the like) that isn't meant to actually go into the flash, so I'd probably use rom.ima_enc as the starting point to truncate out to the full 64MiB and then flash (though if it's unused space it will in all likelihood not actually matter).
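The two steps above (confirming that one file is a prefix of the other, then padding the smaller one out to the flash size) can be sketched as follows. Helper names are hypothetical. The padding is written to a copy so the original download stays untouched, and the fill byte defaults to zero to mirror what `truncate -s` does; erased NOR flash reads back as 0xFF, so that may be a reasonable alternative, though as noted it likely does not matter for unused space.

```python
import os
import shutil

FLASH_SIZE = 64 * 1024 * 1024  # 64 MiB flash part


def footer_length(small_path, large_path, chunk=1 << 20):
    """If the small file is an exact prefix of the large one, return the
    number of extra trailing bytes in the large file, else None."""
    small_size = os.path.getsize(small_path)
    large_size = os.path.getsize(large_path)
    if small_size > large_size:
        return None
    with open(small_path, "rb") as small, open(large_path, "rb") as large:
        while True:
            block = small.read(chunk)
            if not block:
                break
            if large.read(len(block)) != block:
                return None
    return large_size - small_size


def pad_copy(src, dst, size=FLASH_SIZE, fill=b"\x00", chunk=1 << 20):
    """Copy src to dst and pad dst with `fill` up to `size` bytes.
    Returns the number of padding bytes written."""
    shutil.copyfile(src, dst)
    missing = size - os.path.getsize(dst)
    if missing < 0:
        raise ValueError("image is larger than the target size")
    with open(dst, "ab") as out:
        remaining = missing
        while remaining > 0:
            step = min(chunk, remaining)
            out.write(fill * step)
            remaining -= step
    return missing
```

For the files discussed here, `footer_length("fw/rom.ima_enc", "fw/126121.bin")` would be expected to return 128, matching the size difference reported above.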

Seems like booting into OpenBMC and flashing from there is a viable option. I don't mind having it as the BMC firmware if KVM, power management and fan control (the fan for the EPYC 7551P is very loud at full speed) work on the AST2500 :)

Unfortunately I'm not aware of anyone porting OpenBMC to any Gigabyte hardware thus far (certainly not in mainline OpenBMC), so functionality would be pretty limited -- on most systems KVM support is pretty plug-and-play in my experience and would likely work, but host power and fan control generally require a fair amount of platform-specific information (specific GPIOs, temperature sensors, etc. etc.), so it's probably not going to be something you'll want to install and use day to day (unless you or someone else puts in the time & effort to reverse-engineer a working port...which would be cool, but also quite a bit of work).

I'll leave reflashing via culvert as a last resort, because we can have short blackouts here and I don't have a backup power supply to keep the BMC powered while it is being flashed.

Ah -- yeah, given that, aiming for the faster method is probably wise. If you can get an OpenBMC FIT image booted to an initrd shell, you can use the busybox tftp command from it to load the desired flash image in a tmpfs on the BMC, after which running flashcp -v /path/to/image /dev/mtd0 should do the trick.

Once again, thank you very much for your help!

No problem!

@amboar
Owner

amboar commented May 22, 2024

@y8 (and @zevweiss) Regarding AST_DEBUG_PASSWORD, grab the freely published Aspeed SDK User Guide and turn to page 381, where you'll see a handy 64 character string under 5. start use debug command

@y8

y8 commented May 22, 2024

So uh...internet scavenger hunt to see if you can find it? (Apologies.)

I found the datasheet a bit before @amboar kindly provided the public link :D But the password didn't work for some reason. I also tried to enter it manually over a 1200 baud terminal connection, but that didn't work either: no output after pasting the password (I waited for ~5 minutes).

From the SDK User Guide it seems like the BMC has 2 UART ports and only one has debug mode enabled. Maybe JTAG_BMC is the wrong one and I need to use the SOC_UART port, which is not populated.

I followed Johannes' instructions and was able to chain-load a patched u-boot, yay!

But I haven't had much luck with the OpenBMC FIT image:

. setup evb-ast2500
bitbake core-image-full-cmdline

From tmp/deploy/images I copied fitImage--6.6.31+git0+e0d77d0f38-r0-evb-ast2500-20240522093611.bin and loaded it over tftp:

ast# setenv ethact ethernet@1e680000
ast# dhcp
ast# tftpboot 0x84000000 192.168.2.123:openbmc-fit-image
ast# setenv bootargs console=ttyS4,115200n8 root=/dev/ram rw enable-initrd-debug-sh debug-init-sh
ast# bootm 0x84000000

But Linux panics with Cannot open root device "/dev/ram":

[    1.457002] /dev/root: Can't open blockdev
[    1.461218] VFS: Cannot open root device "/dev/ram" or unknown-block(1,0): error -6
[    1.469037] Please append a correct "root=" boot option; here are the available partitions:
[    1.477554] 1f00           65536 mtdblock0 
[    1.477594]  (driver?)
[    1.484275] 1f01             384 mtdblock1 
[    1.484309]  (driver?)
[    1.490897] 1f02             128 mtdblock2 
[    1.490924]  (driver?)
[    1.497613] 1f03            4352 mtdblock3 
[    1.497650]  (driver?)
[    1.504324] 1f04           23808 mtdblock4 
[    1.504361]  (driver?)
[    1.510940] 1f05            4096 mtdblock5 
[    1.510966]  (driver?)
[    1.517633] List of all bdev filesystems:
[    1.521682]  squashfs
[    1.521701] 
[    1.525579] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)
[    1.533881] CPU: 0 PID: 1 Comm: swapper Not tainted 6.6.31-e0d77d0-00168-ge0d77d0f38aa #1
[    1.542103] Hardware name: Generic DT based system
[    1.546935]  unwind_backtrace from show_stack+0x18/0x1c
[    1.552263]  show_stack from dump_stack_lvl+0x24/0x2c
[    1.557402]  dump_stack_lvl from panic+0xf4/0x30c
[    1.562180]  panic from mount_root_generic+0x1fc/0x2d0
[    1.567385]  mount_root_generic from prepare_namespace+0x1d4/0x254
[    1.573616]  prepare_namespace from kernel_init+0x1c/0x130
[    1.579154]  kernel_init from ret_from_fork+0x14/0x28
[    1.584254] Exception stack(0x9b015fb0 to 0x9b015ff8)

Does it mean that there is something wrong with the device tree in this image? I tried both /dev/ram and /dev/ram0.

I'm building core-image-minimal now, maybe it will boot.

@y8

y8 commented May 22, 2024

Uh, I suppose this story abruptly ends here: it seems my USB UART ground wire was loose and I fried the BMC when I was powering the PSU on :| Any tips on how to safely wire a UART to a mains-powered motherboard to avoid incidents like this would be much appreciated :)

But anyway, now there is no serial output at all. I tried a different UART adapter and still got radio silence.

I will solder and check what is going on with SOC_UART later today, and maybe manage to find a VGA → HDMI adapter later this week to check whether the host system can boot, but it feels unlikely. If by chance it does, I can manage to live without a BMC.

A bit sad, but after all, it was my attempt to build a cheap homelab for machine learning. I got this motherboard for a few hundred dollars on eBay. I bet the seller will be more than happy to supply me with another one :D

I will post an update on how it goes.

Once again, I'd like to thank you: your help is priceless and I've learned a lot! I believe, in the spirit of this project, my main takeaway is first-hand experience of how easy it is to plant something in a BMC, and a blurry (and already scary) picture of what implications that might have for the host system.

Thanks again!

@zevweiss
Collaborator

@y8 (and @zevweiss) Regarding AST_DEBUG_PASSWORD, grab the freely published Aspeed SDK User Guide and turn to page 381, where you'll see a handy 64 character string under 5. start use debug command

Well I'll be...news to me! Though given that they apparently consider the password straight-up public now, is there any reason we shouldn't just include it in culvert instead of requiring it to be passed via an environment variable?

@zevweiss
Collaborator

From tmp/deploy/images I copied fitImage--6.6.31+git0+e0d77d0f38-r0-evb-ast2500-20240522093611.bin and loaded it over tftp:

   ast# setenv ethact ethernet@1e680000
   ast# dhcp
   ast# tftpboot 0x84000000 192.168.2.123:openbmc-fit-image
   ast# setenv bootargs console=ttyS4,115200n8 root=/dev/ram rw enable-initrd-debug-sh debug-init-sh
   ast# bootm 0x84000000

But linux panics with Cannot open root device "/dev/ram"

The long file names and multiple levels of symlinks in the tmp/deploy/images/$PLATFORM directory are admittedly a bit hard to keep track of, but I think you may have inadvertently grabbed a FIT image that doesn't include the initramfs -- the one pointed to by the image-kernel symlink should include it though, so I'd recommend using that instead (also a conveniently shorter/friendlier name).

Uh, I suppose this story abruptly ends here: seems like my USB UART ground wire was loose and I fried the BMC when I was powering PSU on :

Damn, well that's a shame. In my experience the serial adapters themselves are usually the easier thing to accidentally fry, so I was going to suggest trying another, but it sounds like you've already ruled that out.

If it's any consolation, I once accidentally connected my serial adapter's GND to +12VDC (I could have sworn I checked the rail with a multimeter and confirmed it was ground, but apparently I mixed something up somewhere along the line)...it got quite warm, and never worked again. Though actually, now that I think about it I've also managed to cook the BMC side of the UART on one unit at some point, though I no longer remember how. Experiences like these are why I now have one of these posters on the wall in the corner of my office where the relevant hardware sits:
[image: smoke]

@zevweiss
Collaborator

Oh, and re:

I found the datasheet a bit before @amboar kindly provided the public link :D But the password didn't work for some reason. I also tried to enter it manually over a 1200 baud terminal connection, but that didn't work either: no output after pasting the password (I waited for ~5 minutes).

The debug UART mechanism is in my experience a bit fragile and lockup-prone; if you haven't done so already I'd suggest (once you've got some working hardware) trying a full DC power cut/restore to get it back into a stable initial state before attempting to poke it with culvert. If it works it should do so fairly promptly (within a second); if it still doesn't after a power cycle I'm not sure what else to suggest other than double/triple checking the password (e.g. for 1 vs l mixups or the like).

@y8

y8 commented May 23, 2024

A bit of good news: kind people from a local user group, where I was looking to borrow some VGA-equipped hardware, suggested that if the board starts the fans and enables the ethernet controller, then it can most likely POST, and if it can't boot from my USB pen drive it might try to boot from PXE.

And yes, it indeed does! I'm not sure why it's hit and miss after power on: sometimes it resets, sometimes I can see the negotiated link speed jump from 10 to 1000M. But about a minute after it stabilizes on 1000M, it broadcasts DHCP requests!

Also, the network interface on the BMC establishes a link with the router, so hopefully only the UART side is fried and I can still boot the host system and flash the BMC with culvert :) I also tried both COM1 and COM2, but no luck here: according to the Gigabyte docs serial forwarding is disabled by default, and unfortunately both ports were silent. I have double-checked both of my UART adapters by connecting RX to TX, and they both properly echo input at 9600 and 115200 baud.

Now I need to figure out how to properly boot it over PXE, given that I don't have ethernet on my Mac (the first time wireless has bitten my arse) and my router is from my ISP with all the fun stuff locked out. Worst of all, on average it takes about 12 minutes for the motherboard to actually broadcast a DHCP request. I wonder what really happens during those 12 minutes!

So far, I have managed to configure dnsmasq to respond only to PXE requests originating from the motherboard's MAC, to avoid messing up DHCP on the router. I have dnsmasq announcing PXE options to the MB, but PXE is (according to the internet and my fresh experience) a complete PITA. I can't even find the proper combination of pxelinux.0 and ldlinux.c32 to boot a qemu VM with SystemRescue Linux: it fails with Failed to load ldlinux.c32 after actually loading it over tftp. Maybe it's something platform/architecture dependent.

So this story is not over yet :)

@zevweiss I'm totally sticking that warning sign on this motherboard!

@amboar
Owner

amboar commented May 23, 2024

I see...so yeah, I guess dtc has some incompatibility with the Apple toolchain's assembler somehow? It might be worth filing a bug if you feel like it, though I'm not sure if the dtc devs regard macOS as a supported platform.

Note that I pinned the dtc dependency to some past release for ... reasons I don't quite recall. We'd need to re-evaluate with current upstream before filing any bugs.

Well I'll be...news to me! Though given that they apparently consider the password straight-up public now, is there any reason we shouldn't just include it in culvert instead of requiring it to be passed via an environment variable?

Yep, I'm thinking the same. Just need to be clear in the commit message for the change what the source of the password is.

The debug UART mechanism is in my experience a bit fragile and lockup-prone; if you haven't done so already I'd suggest (once you've got some working hardware) trying a full DC power cut/restore to get it back into a stable initial state before attempting to poke it with culvert.

100% this (setting aside your other hardware concerns). The debug UART init sequence is fragile and will enter a bad state if someone even so much as looks at it funny.

@y8

y8 commented May 24, 2024

Hooray, I have working BMC! 🥳

I finally managed to boot over PXE (not an easy task) into Linux with ssh and re-flash the firmware.

However I had issues with culvert.

It could probe the controller:

[root@sysrescue ~]# ./culvert probe
[*] failed to initialise devmem bridge: -1
debug:	Permissive
	Debug UART port: UART5
xdma:	Restricted
	BMC: Disabled
	VGA: Enabled
	XDMA on VGA: Enabled
	XDMA is constrained: Yes
p2a:	Permissive
	BMC: Disabled
	VGA: Enabled
	MMIO on VGA: Enabled
	[0x00000000 - 0x0fffffff]   Firmware: Writable
	[0x10000000 - 0x1fffffff]     SoC IO: Writable
	[0x20000000 - 0x2fffffff]  BMC Flash: Writable
	[0x30000000 - 0x3fffffff] Host Flash: Writable
	[0x40000000 - 0x5fffffff]   Reserved: Writable
	[0x60000000 - 0x7fffffff]   LPC Host: Writable
	[0x80000000 - 0xffffffff]       DRAM: Writable
ilpc:	Permissive
	SuperIO address: 0x2e
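
To make the region table in the probe output easier to relate to the addresses that show up in culvert's verbose logs (for example 0x1e6e2004 or 0x20000000), here is a minimal sketch that maps an AHB address to the P2A window it falls in. The table is copied from the output above; the function name `region_for` is made up for illustration and is not part of culvert.

```python
# P2A windows exactly as printed by `culvert probe` above:
# (start, end inclusive, name).
P2A_REGIONS = [
    (0x00000000, 0x0FFFFFFF, "Firmware"),
    (0x10000000, 0x1FFFFFFF, "SoC IO"),
    (0x20000000, 0x2FFFFFFF, "BMC Flash"),
    (0x30000000, 0x3FFFFFFF, "Host Flash"),
    (0x40000000, 0x5FFFFFFF, "Reserved"),
    (0x60000000, 0x7FFFFFFF, "LPC Host"),
    (0x80000000, 0xFFFFFFFF, "DRAM"),
]


def region_for(addr):
    """Return the name of the P2A window containing a 32-bit AHB address.

    Hypothetical helper for reading logs; not a culvert API.
    """
    for start, end, name in P2A_REGIONS:
        if start <= addr <= end:
            return name
    raise ValueError(f"address {addr:#x} is outside the 32-bit AHB space")


if __name__ == "__main__":
    for addr in (0x1E6E2004, 0x1E620010, 0x20000000):
        print(f"{addr:#010x} -> {region_for(addr)}")
```

Read this way, the register accesses around 0x1e6xxxxx land in the "SoC IO" window, while the flash readback at 0x20000000 lands in "BMC Flash".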

But it couldn't read the firmware (I wanted to dump it just in case things went sideways):

[root@sysrescue ~]# ./culvert -vv read firmware > fw.bin
[*] Found 5 registered bridge drivers
[*] Trying bridge driver l2a
[*] Failed to initialise L2A bridge: -95
[*] Trying bridge driver ilpc
[*] Probing ilpc
[*] Probing 0x2e for SuperIO
[*] Unlocking SuperIO: 0
[*] Selecting SuperIO device 2 (SUART1): 0
[*] Found device 2 selected: 0
[*] Selecting SuperIO device 12 (SUART4): 0
[*] Found device 12 selected: 0
[*] Locking SuperIO
[*] Found SuperIO device at 0x2e
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf7cffedc
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Trying bridge driver devmem
[*] failed to initialise devmem bridge: -1
[*] Trying bridge driver debug-uart
[*] Unrecognised argument list for debug interface (0)
[*] Trying bridge driver p2a
[*] Probing p2a
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf7cffedc
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf7cffedc
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Selected devicetree for SoC 'aspeed,ast2500'
[*] Found 15 registered drivers
[*] Processing devicetree node at /aliases
[*] Processing devicetree node at /memory@80000000
[*] Processing devicetree node at /ahb
[*] Processing devicetree node at /ahb/sram@1e720000
[*] Processing devicetree node at /ahb/bus-controller@1e600000
[*] Bound trace driver to /ahb/bus-controller@1e600000
[*] Processing devicetree node at /ahb/apb
[*] Processing devicetree node at /ahb/apb/spi@1e620000
[*] Bound sfc driver to /ahb/apb/spi@1e620000
[*] Processing devicetree node at /ahb/apb/spi@1e630000
[*] Bound sfc driver to /ahb/apb/spi@1e630000
[*] Processing devicetree node at /ahb/apb/spi@1e631000
[*] Bound sfc driver to /ahb/apb/spi@1e631000
[*] Processing devicetree node at /ahb/apb/memory-controller@1e6e0000
[*] Bound sdmc driver to /ahb/apb/memory-controller@1e6e0000
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/clock
[*] Bound clk driver to /ahb/apb/syscon@1e6e2000/clock
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/strapping
[*] Bound strap driver to /ahb/apb/syscon@1e6e2000/strapping
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/superio
[*] Bound sioctl driver to /ahb/apb/syscon@1e6e2000/superio
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/bridge-controller
[*] Bound bridge-controller driver to /ahb/apb/syscon@1e6e2000/bridge-controller
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/debug-bridge-controller
[*] Bound debugctl driver to /ahb/apb/syscon@1e6e2000/debug-bridge-controller
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/pcie-bridge-controller
[*] Bound pciectl driver to /ahb/apb/syscon@1e6e2000/pcie-bridge-controller
[*] Bound scu driver to /ahb/apb/syscon@1e6e2000
[*] Processing devicetree node at /ahb/apb/watchdog@1e785000
[*] Bound wdt driver to /ahb/apb/watchdog@1e785000
[*] Processing devicetree node at /ahb/apb/watchdog@1e785020
[*] Bound wdt driver to /ahb/apb/watchdog@1e785020
[*] Processing devicetree node at /ahb/apb/watchdog@1e785040
[*] Bound wdt driver to /ahb/apb/watchdog@1e785040
[*] Processing devicetree node at /ahb/apb/serial@1e787000
[*] Bound vuart driver to /ahb/apb/serial@1e787000
[*] Processing devicetree node at /ahb/apb/lpc@1e789000
[*] Processing devicetree node at /ahb/apb/lpc@1e789000/bridge-controller
[*] Bound ilpcctl driver to /ahb/apb/lpc@1e789000/bridge-controller
[*] Bound uart-mux driver to /ahb/apb/lpc@1e789000
[*] Initialising flash controller
[*] fdt: Looking up device name 'fmc'
[*] fdt: Locating node with device path '/ahb/apb/spi@1e620000'
[*] ahb_readl: 0x1e6e2000: 0x00000001
[*] Initialised scu driver
[*] Initialised clk driver
[*] ahb_readl: 0x1e6e2070: 0xf120d286
[*] ahb_readl: 0x1e620010: 0x00000000
[*] ahb_readl: 0x1e620000: 0x8000002a
[*] ahb_writel: 0x1e620000: 0x8007002a
[*] ahb_writel: 0x1e620010: 0x00000400
[*] ahb_writel: 0x1e620094: 0x00000000
[*] Initialised sfc driver
[*] Initialising flash chip
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_readl: 0x20000000: 0x00000000
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] LIBFLASH: Init status: 00
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_readl: 0x20000000: 0x00000000
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] LIBFLASH: Flash ID: 00.00.00 (000000)
[*] LIBFLASH: Flash identification failed: -6
[*] Unbound instance of driver uart-mux
[*] Unbound instance of driver ilpcctl
[*] Unbound instance of driver vuart
[*] Unbound instance of driver wdt
[*] Unbound instance of driver wdt
[*] Unbound instance of driver wdt
[*] Unbound instance of driver scu
[*] Unbound instance of driver pciectl
[*] Unbound instance of driver debugctl
[*] Unbound instance of driver bridge-controller
[*] Unbound instance of driver sioctl
[*] Unbound instance of driver strap
[*] Unbound instance of driver clk
[*] Unbound instance of driver sdmc
[*] Unbound instance of driver sfc
[*] Unbound instance of driver sfc
[*] ahb_writel: 0x1e620010: 0x00000400
[*] Unbound instance of driver sfc
[*] Unbound instance of driver trace

(without verbose output it fails at LIBFLASH: Flash identification failed: -6)
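
For readers trying to follow the tail of that log: each flash command appears as a burst of writes to 0x1e620010 (0x407, 0x403, then 0x407 and 0x400 again) around a read of 0x20000000. The sketch below decodes those values under an assumed register layout (bits [1:0] select the command mode, with 3 meaning user mode, and bit 2 holds CE# inactive). That layout is an assumption to verify against the AST2500 datasheet, not something stated in this thread, and the helper names are invented. `format_flash_id` just mirrors the `Flash ID: 00.00.00 (000000)` line format.

```python
# Assumed meaning of bits [1:0] of the CE0 control register at 0x1e620010.
# This is an assumption for illustration, not taken from this thread.
MODES = {0: "normal read", 1: "fast read", 2: "write", 3: "user"}


def decode_ce_ctrl(value):
    """Return (mode name, CE# held inactive?) for a CE control value."""
    mode = MODES[value & 0x3]
    ce_inactive = bool(value & 0x4)  # assumed: bit 2 = CE# stop/inactive
    return mode, ce_inactive


def format_flash_id(id_bytes):
    """Format three ID bytes the way the LIBFLASH log line prints them."""
    if len(id_bytes) != 3:
        raise ValueError("expected exactly three ID bytes")
    dotted = ".".join(f"{b:02x}" for b in id_bytes)
    packed = int.from_bytes(id_bytes, "big")
    return f"{dotted} ({packed:06x})"


if __name__ == "__main__":
    # The write sequence seen in the log around each flash command.
    for value in (0x407, 0x403, 0x407, 0x400):
        print(f"{value:#06x} -> {decode_ce_ctrl(value)}")
    # The readback in the log was all zeros.
    print("Flash ID:", format_flash_id(bytes(3)))
```

Under that assumption the sequence reads as "enter user mode with CE# released, assert CE#, transfer, release CE#, return to normal read", and the all-zero readback is what produces the `00.00.00` ID that fails identification.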

So I decided to try flashing it with the stock utility first. And it worked!

[root@sysrescue ~/126121]# ./bmc_fw_update_linux.sh
gigaflash v2.0.10
Do you want to preserve configuration? (Y/N)
n
Loading Firmware...
Failed to connect BMC, try to update BMC!
Update Firmware
Find ASPEED Device 1a03:2000 on 2:0.0
MMIO Virtual Address: 60eae000
Relocate IO Base: 1000
Found ASPEED Device 1a03:2500 rev. 41
Static Memory Controller Information:
CS0 Flash Type is SPI
CS1 Flash Type is SPI
CS2 Flash Type is SPI
CS3 Flash Type is NOR
CS4 Flash Type is NOR
Boot CS is 0
Option Information:
CS: 0
Flash Type: SPI
[Warning] Don't AC OFF or Reboot System During BMC Firmware Update!!
Find Flash Chip #1: 64MB SPI Flash
Update Flash Chip #1 O.K.
Update Flash Chip O.K.
Wait 90 seconds for BMC Ready...

After a power cycle the BMC booted, blinked at me, and everything seems to work (and much better than with the firmware from 2020).

I will test the host hardware to figure out whether there is any damage beyond the UART, and then, if needed, I can do whatever might help to contribute to culvert! I have aarch64 around, so I will try the patch suggested by @zevweiss.

If you don't mind I can contribute GitHub Actions to produce static binaries for PR's and tag pushes for supported platforms. This might be handy!

Can't thank you enough for your help on this journey! This experience is so inspiring and helps me believe in the FOSS community.

If we happen to meet, drinks and dinner are on me :)

Thanks again!

@amboar
Owner

amboar commented May 27, 2024

Based on some of the commentary here I've pushed a dev/gigabyte-misc branch. It would be helpful if you could test it.

[*] LIBFLASH: Init status: 00
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_readl: 0x20000000: 0x00000000
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] LIBFLASH: Flash ID: 00.00.00 (000000)
[*] LIBFLASH: Flash identification failed: -6

It's not clear to me why it might have failed to identify the flash. However, from the logs I'm inferring that culvert has chosen the P2A bridge to access the BMC. I expect gigaflash is doing the same. If you're up for a bit of experimentation, my quick poking around suggests we can use mmiotrace to understand what gigaflash is doing. Attaching a trace here may allow us to understand what gigaflash did differently to succeed at identifying the flash chip.
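
As a rough idea of how such a trace could be inspected once captured, here is a minimal sketch that pulls the read and write events out of mmiotrace text output and filters them to one MMIO window. It assumes the documented line layout `R|W width timestamp map_id physical value PC PID`. The sample lines, the window base and size, and the helper names are all invented for illustration; they are not from a real gigaflash run.

```python
from typing import Iterable, Iterator, List, NamedTuple


class MmioAccess(NamedTuple):
    kind: str         # "R" or "W"
    width: int        # access width in bytes
    timestamp: float  # seconds
    map_id: int
    phys: int         # physical address
    value: int


def parse_accesses(lines: Iterable[str]) -> Iterator[MmioAccess]:
    """Yield read/write events; every other line type is skipped."""
    for line in lines:
        f = line.split()
        if len(f) < 6 or f[0] not in ("R", "W"):
            continue
        yield MmioAccess(f[0], int(f[1]), float(f[2]), int(f[3]),
                         int(f[4], 16), int(f[5], 16))


def in_window(accesses: Iterable[MmioAccess], base: int,
              size: int) -> List[MmioAccess]:
    """Keep only accesses whose address falls in [base, base + size)."""
    return [a for a in accesses if base <= a.phys < base + size]


# Made-up sample in the documented format (not real gigaflash output).
SAMPLE = """\
VERSION 20070824
MAP 10.000001 1 0xee000000 0xffffc90000000000 0x20000 0x0 0
W 4 10.000100 1 0xee000004 0x12345678 0x0 0
R 4 10.000200 1 0xee000010 0x400 0x0 0
R 4 10.000250 2 0xd0000000 0x1 0x0 0
UNMAP 10.000300 1 0x0 0
""".splitlines()

if __name__ == "__main__":
    for a in in_window(parse_accesses(SAMPLE), 0xEE000000, 0x20000):
        print(f"{a.kind} {a.phys:#010x} = {a.value:#x}")
```

Filtering by the device's MMIO BAR keeps the output focused on one device even if other drivers are active while the trace runs.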

If you don't mind I can contribute GitHub Actions to produce static binaries for PR's and tag pushes for supported platforms. This might be handy!

Thanks for the enthusiasm, but I prefer we don't distribute culvert binaries.

If we happen to meet, drinks and dinner are on me :)

Oh, no need at all :)

@y8

y8 commented May 28, 2024

Here's what I've got running culvert from that branch:

Linux host 6.8.0-31-generic #31-Ubuntu SMP PREEMPT_DYNAMIC Sat Apr 20 00:40:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04 LTS
Release:	24.04
Codename:	noble
List of packages needed to build culvert on a clean Ubuntu install, since I lost the list I made for the last build (maybe add them to the README?):
sudo apt update && sudo apt install -y build-essential meson bison swig flex cmake ccache pkg-config device-tree-compiler python3-dev libyaml-dev libftdi-dev libreadline-dev zlib1g-dev libssl-dev
$ sudo ./build/src/culvert -vvv probe
[*] Found 5 registered bridge drivers
[*] Trying bridge driver l2a
[*] Failed to initialise L2A bridge: -95
[*] Trying bridge driver ilpc
[*] Probing ilpc
[*] Probing 0x2e for SuperIO
[*] Unlocking SuperIO: 0
[*] Selecting SuperIO device 2 (SUART1): 0
[*] Found device 2 selected: 0
[*] Selecting SuperIO device 12 (SUART4): 0
[*] Found device 12 selected: 0
[*] Locking SuperIO
[*] Found SuperIO device at 0x2e
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf70ea098
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Trying bridge driver devmem
[*] failed to initialise devmem bridge: -1
[*] Trying bridge driver debug-uart
[*] Unrecognised argument list for debug interface (0)
[*] Trying bridge driver p2a
[*] Probing p2a
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf70ea098
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Accessing the BMC's AHB via the p2a bridge
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf70ea098
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Selected devicetree for SoC 'aspeed,ast2500'
[*] Found 15 registered drivers
[*] Processing devicetree node at /aliases
[*] Processing devicetree node at /memory@80000000
[*] Processing devicetree node at /ahb
[*] Processing devicetree node at /ahb/sram@1e720000
[*] Processing devicetree node at /ahb/bus-controller@1e600000
[*] Bound trace driver to /ahb/bus-controller@1e600000
[*] Processing devicetree node at /ahb/apb
[*] Processing devicetree node at /ahb/apb/spi@1e620000
[*] Bound sfc driver to /ahb/apb/spi@1e620000
[*] Processing devicetree node at /ahb/apb/spi@1e630000
[*] Bound sfc driver to /ahb/apb/spi@1e630000
[*] Processing devicetree node at /ahb/apb/spi@1e631000
[*] Bound sfc driver to /ahb/apb/spi@1e631000
[*] Processing devicetree node at /ahb/apb/memory-controller@1e6e0000
[*] Bound sdmc driver to /ahb/apb/memory-controller@1e6e0000
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/clock
[*] Bound clk driver to /ahb/apb/syscon@1e6e2000/clock
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/strapping
[*] Bound strap driver to /ahb/apb/syscon@1e6e2000/strapping
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/superio
[*] Bound sioctl driver to /ahb/apb/syscon@1e6e2000/superio
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/bridge-controller
[*] Bound bridge-controller driver to /ahb/apb/syscon@1e6e2000/bridge-controller
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/debug-bridge-controller
[*] Bound debugctl driver to /ahb/apb/syscon@1e6e2000/debug-bridge-controller
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/pcie-bridge-controller
[*] Bound pciectl driver to /ahb/apb/syscon@1e6e2000/pcie-bridge-controller
[*] Bound scu driver to /ahb/apb/syscon@1e6e2000
[*] Processing devicetree node at /ahb/apb/watchdog@1e785000
[*] Bound wdt driver to /ahb/apb/watchdog@1e785000
[*] Processing devicetree node at /ahb/apb/watchdog@1e785020
[*] Bound wdt driver to /ahb/apb/watchdog@1e785020
[*] Processing devicetree node at /ahb/apb/watchdog@1e785040
[*] Bound wdt driver to /ahb/apb/watchdog@1e785040
[*] Processing devicetree node at /ahb/apb/serial@1e787000
[*] Bound vuart driver to /ahb/apb/serial@1e787000
[*] Processing devicetree node at /ahb/apb/lpc@1e789000
[*] Processing devicetree node at /ahb/apb/lpc@1e789000/bridge-controller
[*] Bound ilpcctl driver to /ahb/apb/lpc@1e789000/bridge-controller
[*] Bound uart-mux driver to /ahb/apb/lpc@1e789000
[*] ahb_readl: 0x1e6e2000: 0x00000000
[*] Unlocking SCU
[*] ahb_writel: 0x1e6e2000: 0x1688a8a8
[*] Initialised scu driver
[*] Initialised strap driver
[*] Initialised sioctl driver
[*] Initialised ilpcctl driver
[*] Initialised ilpcctl AHB bridge controller
[*] fdt: Searching devicetree for type 'memory'
[*] Initialised sdmc driver
[*] Initialised pciectl driver
[*] Initialised pciectl AHB bridge controller
[*] Initialised bridge-controller driver
[*] Initialised debugctl driver
[*] Initialised debugctl AHB bridge controller
[*] ahb_readl: 0x1e6e202c: 0x00600401
debug:	Disabled
[*] ahb_readl: 0x1e6e2180: 0x000c003b
[*] ahb_readl: 0x1e6e2180: 0x000c003b
[*] ahb_readl: 0x1e6e0008: 0x2003000f
xdma:	Disabled
[*] ahb_readl: 0x1e6e2180: 0x000c003b
[*] ahb_readl: 0x1e6e2180: 0x000c003b
[*] ahb_readl: 0x1e6e202c: 0x00600401
p2a:	Permissive
[*] ahb_readl: 0x1e6e2180: 0x000c003b
	BMC: Disabled
[*] ahb_readl: 0x1e6e2180: 0x000c003b
	VGA: Enabled
	MMIO on VGA: Enabled
[*] ahb_readl: 0x1e6e202c: 0x00600401
	[0x00000000 - 0x0fffffff]   Firmware: Readable
	[0x10000000 - 0x1fffffff]     SoC IO: Writable
	[0x20000000 - 0x2fffffff]  BMC Flash: Readable
	[0x30000000 - 0x3fffffff] Host Flash: Readable
	[0x40000000 - 0x5fffffff]   Reserved: Writable
	[0x60000000 - 0x7fffffff]   LPC Host: Writable
	[0x80000000 - 0xffffffff]       DRAM: Writable
[*] ahb_readl: 0x1e6e2070: 0xf100d28a
[*] ahb_readl: 0x1e789100: 0x00000040
ilpc:	Restricted
[*] ahb_readl: 0x1e6e2070: 0xf100d28a
	SuperIO address: 0x2e
[*] Unbound instance of driver uart-mux
[*] Unbound instance of driver ilpcctl
[*] Unbound instance of driver vuart
[*] Unbound instance of driver wdt
[*] Unbound instance of driver wdt
[*] Unbound instance of driver wdt
[*] Unbound instance of driver scu
[*] Unbound instance of driver pciectl
[*] Unbound instance of driver debugctl
[*] Unbound instance of driver bridge-controller
[*] Unbound instance of driver sioctl
[*] Re-locking SCU
[*] ahb_writel: 0x1e6e2000: 0xe9775757
[*] Unbound instance of driver strap
[*] Unbound instance of driver clk
[*] Unbound instance of driver sdmc
[*] Unbound instance of driver sfc
[*] Unbound instance of driver sfc
[*] Unbound instance of driver sfc
[*] Unbound instance of driver trace
$ sudo ./build/src/culvert -vvv read firmware > fw.bin
[*] Found 5 registered bridge drivers
[*] Trying bridge driver l2a
[*] Failed to initialise L2A bridge: -95
[*] Trying bridge driver ilpc
[*] Probing ilpc
[*] Probing 0x2e for SuperIO
[*] Unlocking SuperIO: 0
[*] Selecting SuperIO device 2 (SUART1): 0
[*] Found device 2 selected: 0
[*] Selecting SuperIO device 12 (SUART4): 0
[*] Found device 12 selected: 0
[*] Locking SuperIO
[*] Found SuperIO device at 0x2e
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf70ea098
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Trying bridge driver devmem
[*] failed to initialise devmem bridge: -1
[*] Trying bridge driver debug-uart
[*] Unrecognised argument list for debug interface (0)
[*] Trying bridge driver p2a
[*] Probing p2a
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf70ea098
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Accessing the BMC's AHB via the p2a bridge
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf70ea098
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Selected devicetree for SoC 'aspeed,ast2500'
[*] Found 15 registered drivers
[*] Processing devicetree node at /aliases
[*] Processing devicetree node at /memory@80000000
[*] Processing devicetree node at /ahb
[*] Processing devicetree node at /ahb/sram@1e720000
[*] Processing devicetree node at /ahb/bus-controller@1e600000
[*] Bound trace driver to /ahb/bus-controller@1e600000
[*] Processing devicetree node at /ahb/apb
[*] Processing devicetree node at /ahb/apb/spi@1e620000
[*] Bound sfc driver to /ahb/apb/spi@1e620000
[*] Processing devicetree node at /ahb/apb/spi@1e630000
[*] Bound sfc driver to /ahb/apb/spi@1e630000
[*] Processing devicetree node at /ahb/apb/spi@1e631000
[*] Bound sfc driver to /ahb/apb/spi@1e631000
[*] Processing devicetree node at /ahb/apb/memory-controller@1e6e0000
[*] Bound sdmc driver to /ahb/apb/memory-controller@1e6e0000
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/clock
[*] Bound clk driver to /ahb/apb/syscon@1e6e2000/clock
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/strapping
[*] Bound strap driver to /ahb/apb/syscon@1e6e2000/strapping
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/superio
[*] Bound sioctl driver to /ahb/apb/syscon@1e6e2000/superio
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/bridge-controller
[*] Bound bridge-controller driver to /ahb/apb/syscon@1e6e2000/bridge-controller
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/debug-bridge-controller
[*] Bound debugctl driver to /ahb/apb/syscon@1e6e2000/debug-bridge-controller
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/pcie-bridge-controller
[*] Bound pciectl driver to /ahb/apb/syscon@1e6e2000/pcie-bridge-controller
[*] Bound scu driver to /ahb/apb/syscon@1e6e2000
[*] Processing devicetree node at /ahb/apb/watchdog@1e785000
[*] Bound wdt driver to /ahb/apb/watchdog@1e785000
[*] Processing devicetree node at /ahb/apb/watchdog@1e785020
[*] Bound wdt driver to /ahb/apb/watchdog@1e785020
[*] Processing devicetree node at /ahb/apb/watchdog@1e785040
[*] Bound wdt driver to /ahb/apb/watchdog@1e785040
[*] Processing devicetree node at /ahb/apb/serial@1e787000
[*] Bound vuart driver to /ahb/apb/serial@1e787000
[*] Processing devicetree node at /ahb/apb/lpc@1e789000
[*] Processing devicetree node at /ahb/apb/lpc@1e789000/bridge-controller
[*] Bound ilpcctl driver to /ahb/apb/lpc@1e789000/bridge-controller
[*] Bound uart-mux driver to /ahb/apb/lpc@1e789000
[*] Initialising flash controller
[*] fdt: Looking up device name 'fmc'
[*] fdt: Locating node with device path '/ahb/apb/spi@1e620000'
[*] ahb_readl: 0x1e6e2000: 0x00000000
[*] Unlocking SCU
[*] ahb_writel: 0x1e6e2000: 0x1688a8a8
[*] Initialised scu driver
[*] Initialised clk driver
[*] ahb_readl: 0x1e6e2070: 0xf100d28a
[*] ahb_readl: 0x1e620010: 0x00000400
[*] ahb_readl: 0x1e620000: 0x8007002a
[*] ahb_writel: 0x1e620000: 0x8007002a
[*] ahb_writel: 0x1e620010: 0x00000400
[*] ahb_writel: 0x1e620094: 0x00000000
[*] Initialised sfc driver
[*] Initialising flash chip
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_readl: 0x20000000: 0x00000000
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] LIBFLASH: Init status: 00
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_readl: 0x20000000: 0x00000000
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] LIBFLASH: Flash ID: 00.00.00 (000000)
[*] LIBFLASH: Flash identification failed: -6
[*] Unbound instance of driver uart-mux
[*] Unbound instance of driver ilpcctl
[*] Unbound instance of driver vuart
[*] Unbound instance of driver wdt
[*] Unbound instance of driver wdt
[*] Unbound instance of driver wdt
[*] Unbound instance of driver scu
[*] Unbound instance of driver pciectl
[*] Unbound instance of driver debugctl
[*] Unbound instance of driver bridge-controller
[*] Unbound instance of driver sioctl
[*] Unbound instance of driver strap
[*] Re-locking SCU
[*] ahb_writel: 0x1e6e2000: 0xe9775757
[*] Unbound instance of driver clk
[*] Unbound instance of driver sdmc
[*] Unbound instance of driver sfc
[*] Unbound instance of driver sfc
[*] ahb_writel: 0x1e620010: 0x00000400
[*] Unbound instance of driver sfc
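
One small observation on the SCU traffic in these logs: a read of 0x1e6e2000 returning 0x00000000 is followed by "Unlocking SCU" with a write of 0x1688a8a8, and "Re-locking SCU" writes 0xe9775757. Those two values are bitwise complements of each other, which the sketch below checks. Whether culvert derives one from the other is an assumption I have not verified; the constant and function names are invented for illustration.

```python
# Values copied from the verbose log above.
SCU_UNLOCK_VALUE = 0x1688A8A8   # written when "Unlocking SCU"
SCU_RELOCK_VALUE = 0xE9775757   # written when "Re-locking SCU"


def complement32(value):
    """Bitwise NOT truncated to 32 bits."""
    return ~value & 0xFFFFFFFF


if __name__ == "__main__":
    print(f"~{SCU_UNLOCK_VALUE:#010x} = {complement32(SCU_UNLOCK_VALUE):#010x}")
    print("matches re-lock value:",
          complement32(SCU_UNLOCK_VALUE) == SCU_RELOCK_VALUE)
```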

PSA: gigaflash -dump fw.bin shuts down the BMC while it makes the dump, so either wait for it to complete the dump or power cycle the motherboard (or maybe there is some magic hidden button somewhere?)

Surprisingly, the trace is quite small:

# echo mmiotrace > /sys/kernel/tracing/current_tracer
# cat /sys/kernel/tracing/trace_pipe > gigatrace.txt &
[1] 4856
# ./gigaflash_x64 -dump out.bin
gigaflash v2.0.10

--- Dump image from BMC...
Find ASPEED Device 1a03:2000 on 2:0.0
MMIO Virtual Address: 589a7000
Relocate IO Base: 1000
Found ASPEED Device 1a03:2500 rev. 41
Static Memory Controller Information:
CS0 Flash Type is SPI
CS1 Flash Type is SPI
CS2 Flash Type is SPI
CS3 Flash Type is NOR
CS4 Flash Type is NOR
Boot CS is 0
Option Information:
CS: 0
Flash Type: SPI
[Warning] Don't AC OFF or Reboot System During BMC Firmware Update!!
Find Flash Chip #1: 64MB SPI Flash
Backup Flash Chip O.K.
--- Dump image finished
--- Wait 90 secs for BMC ready...
# echo nop > /sys/kernel/tracing/current_tracer
bash: echo: write error: Device or resource busy
# fg
cat /sys/kernel/tracing/trace_pipe > gigatrace.txt
^C
# grep -i lost gigatrace.txt
# cat gigatrace.txt
VERSION 20070824
PCIDEV 0000 10221450 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 0008 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 000e 10221453 1a 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
PCIDEV 0010 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 0018 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 0020 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 0038 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 0039 10221454 1b 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
PCIDEV 0040 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 0041 10221454 1d 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
PCIDEV 00a0 1022790b 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 piix4_smbus
PCIDEV 00a3 1022790e 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00c0 10221460 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00c1 10221461 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00c2 10221462 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00c3 10221463 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 k10temp
PCIDEV 00c4 10221464 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00c5 10221465 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00c6 10221466 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00c7 10221467 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00c8 10221460 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00c9 10221461 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00ca 10221462 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00cb 10221463 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 k10temp
PCIDEV 00cc 10221464 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00cd 10221465 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00ce 10221466 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00cf 10221467 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00d0 10221460 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00d1 10221461 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00d2 10221462 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00d3 10221463 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 k10temp
PCIDEV 00d4 10221464 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00d5 10221465 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00d6 10221466 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00d7 10221467 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00d8 10221460 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00d9 10221461 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00da 10221462 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00db 10221463 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 k10temp
PCIDEV 00dc 10221464 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00dd 10221465 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00de 10221466 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 00df 10221467 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 0100 1a031150 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 0200 1a032000 18 ec000000 ee000000 1001 0 0 0 c0002 2000000 20000 80 0 0 0 20000 ast
PCIDEV 0300 1022145a 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 0302 10221456 8f 0 0 ee300000 0 0 ee400000 0 0 0 100000 0 0 2000 0 ccp
PCIDEV 0303 1022145f 2c ee200004 0 0 0 0 0 0 100000 0 0 0 0 0 0 xhci_hcd
PCIDEV 0400 10221455 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 0401 10221468 92 0 0 ee500000 0 0 ee600000 0 0 0 100000 0 0 2000 0 ccp
PCIDEV 0402 10227901 37 0 0 0 0 0 ee602000 0 0 0 0 0 0 1000 0 ahci
PCIDEV 2000 10221450 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 2008 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 2010 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 2018 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 2020 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 2038 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 2039 10221454 1e 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
PCIDEV 2040 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 2041 10221454 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
PCIDEV 2100 1022145a 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 2102 10221456 94 0 0 e7b00000 0 0 e7c00000 0 0 0 100000 0 0 2000 0 ccp
PCIDEV 2103 1022145f 35 e7a00004 0 0 0 0 0 0 100000 0 0 0 0 0 0 xhci_hcd
PCIDEV 2200 10221455 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 2201 10221468 97 0 0 e7d00000 0 0 e7e00000 0 0 0 100000 0 0 2000 0 ccp
PCIDEV 2202 10227901 41 0 0 0 0 0 e7e02000 0 0 0 0 0 0 1000 0 ahci
PCIDEV 4000 10221450 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 4008 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 4010 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 4018 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 4020 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 4038 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 4039 10221454 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
PCIDEV 4040 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 4041 10221454 24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
PCIDEV 4100 1022145a 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 4102 10221456 99 0 0 e7600000 0 0 e7700000 0 0 0 100000 0 0 2000 0 ccp
PCIDEV 4200 10221455 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 4201 10221468 9c 0 0 e7400000 0 0 e7500000 0 0 0 100000 0 0 2000 0 ccp
PCIDEV 6000 10221450 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 6008 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 6010 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 6018 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 601a 10221453 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
PCIDEV 601b 10221453 26 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
PCIDEV 601c 10221453 27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
PCIDEV 6020 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 6038 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 6039 10221454 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
PCIDEV 6040 10221452 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 6041 10221454 2b 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
PCIDEV 6100 80861533 42 e3100000 0 3001 e3180000 0 0 0 80000 0 20 4000 0 0 0 igb
PCIDEV 6200 15b75030 2a e3000004 0 0 0 0 0 0 4000 0 0 0 0 0 0 nvme
PCIDEV 6300 80861533 48 e2f00000 0 2001 e2f80000 0 0 0 80000 0 20 4000 0 0 0 igb
PCIDEV 6400 1022145a 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 6402 10221456 9e 0 0 e2d00000 0 0 e2e00000 0 0 0 100000 0 0 2000 0 ccp
PCIDEV 6500 10221455 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PCIDEV 6501 10221468 a1 0 0 e2b00000 0 0 e2c00000 0 0 0 100000 0 0 2000 0 ccp
#

I did it right, right? :) I'm a bit confused by bash: echo: write error: Device or resource busy, but there is nothing except mmiotrace: CPUxx is down in dmesg.

@amboar
Owner

amboar commented May 29, 2024

So we do have this:

PCIDEV 0200 1a032000 18 ec000000 ee000000 1001 0 0 0 c0002 2000000 20000 80 0 0 0 20000 ast

but it looks like I missed this note towards the end of the documentation:

PID is always zero as tracing MMIO accesses originating in user space memory is not yet supported.
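
To make it concrete what the captured trace does and does not contain, here is a small sketch that decodes a PCIDEV line and counts event types. It assumes PCIDEV lines follow the /proc/bus/pci/devices layout (bus+devfn, vendor+device, IRQ, seven resource starts, seven resource sizes, optional driver name); that reading of the format, and the helper names, are my assumptions rather than anything confirmed in this thread.

```python
from collections import Counter


def parse_pcidev(line):
    """Decode one PCIDEV line, assuming /proc/bus/pci/devices layout."""
    fields = line.split()
    if len(fields) < 18 or fields[0] != "PCIDEV":
        raise ValueError("not a PCIDEV line")
    bus_devfn = int(fields[1], 16)
    vendor_device = int(fields[2], 16)
    bus, devfn = bus_devfn >> 8, bus_devfn & 0xFF
    slot, func = devfn >> 3, devfn & 0x7
    return {
        "address": f"{bus:02x}:{slot:02x}.{func}",
        "vendor": vendor_device >> 16,
        "device": vendor_device & 0xFFFF,
        "irq": int(fields[3], 16),
        "bars": [int(x, 16) for x in fields[4:11]],    # 6 BARs + ROM
        "sizes": [int(x, 16) for x in fields[11:18]],
        "driver": fields[18] if len(fields) > 18 else None,
    }


def event_counts(lines):
    """Count trace lines by their leading keyword (VERSION, PCIDEV, R, W...)."""
    return Counter(line.split()[0] for line in lines if line.strip())


# Two lines copied from the trace posted above.
TRACE_EXCERPT = [
    "VERSION 20070824",
    "PCIDEV 0200 1a032000 18 ec000000 ee000000 1001 0 0 0 c0002 "
    "2000000 20000 80 0 0 0 20000 ast",
]

if __name__ == "__main__":
    dev = parse_pcidev(TRACE_EXCERPT[1])
    print(dev["address"], f"{dev['vendor']:04x}:{dev['device']:04x}",
          dev["driver"])
    print(f"BAR1 {dev['bars'][1]:#x} size {dev['sizes'][1]:#x}")
    print(dict(event_counts(TRACE_EXCERPT)))
```

Decoded this way, the line describes device 1a03:2000 at 02:00.0 bound to the ast driver, which lines up with gigaflash's "Find ASPEED Device 1a03:2000 on 2:0.0". Running `event_counts` over the whole posted trace would show only VERSION and PCIDEV entries and no R or W events, consistent with the documentation note quoted above.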

@zevweiss
Collaborator

PID is always zero as tracing MMIO accesses originating in user space memory is not yet supported.

Hmm...this would of course be its own chunk of development effort, but could a userspace mmio interposer/tracer perhaps be rigged up via a combination of ptrace and userfaultfd?

@amboar
Owner

amboar commented May 30, 2024

hmm, that's a bit of a nerd snipe...
