5. Implement TPM command parsing and communication between FPGA and MCU #20
We have at least partially working communication between the MCU and the FPGA, however we are unable to fully test it due to a non-working LPC controller (more on this below). The MCU-side firmware is partially done; before finishing it, we must fix the FPGA. The problem with the FPGA is the non-working LPC controller, caused by timing issues and too-high delays. We tried to solve this by manually writing timing constraints which the LPC controller must meet to work properly. However, we ran into issues with the toolchain we use. Initially, we were using Qorc SDK, which contains the official tools for FPGA synthesis. Later on, we tried F4PGA. The first problem is that we are unable to use any of the pads designed to be used as clock inputs, as the FPGA does not receive any signal on those pads. This looks like a bug in the SDK. Another problem is that we are not able to use timing constraints. Qorc SDK contains an old version of VPR (part of Verilog to Routing) which has a number of problems: timings don't propagate properly through clock buffers if we use a negative-edge trigger on that clock; we are unable to constrain the clock buffer directly, nor anything that isn't an input or output pin; and a constrained clock must use one of the special clock pins (which don't work). The same is true for clocks coming from the Cortex-M4, such as the Wishbone clock. The inability to constrain the Wishbone clock, as well as to manually propagate clocks through buffers, results in all connections being unconstrained:
Those problems seem to be partially related to
Due to the problems with Qorc SDK we tried F4PGA (a continuation of SymbiFlow, which was used in Qorc SDK), however we ran into similar problems. Clock input pads are still not working; what's more, newer Yosys detects clocks and moves them into
Due to these issues we are going to try QuickLogic proprietary tools. Those tools were used to synthesize usb2ser, which works at 48 MHz; this gives a better chance of getting things working until the problems with open-source tools are solved.
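For reference, the pattern that triggered the propagation problem looks roughly like this (an illustrative sketch with made-up names, not the actual TwPM source): a clock enters through a pad, passes through a buffer, and drives a negative-edge-triggered register, and the constraint placed on the pad never reaches the negedge domain.

```verilog
// Illustrative sketch only (made-up names, not the actual TwPM code):
// the combination that the old VPR could not constrain - a buffered
// clock driving a negative-edge-triggered flip-flop.
module negedge_clkbuf_example (
    input  wire clk_pad_i, // external clock pad; the constraint is set here
    input  wire d_i,
    output reg  q_o
);
    wire clk_buf;

    // Stands in for a dedicated clock-buffer cell; constraints placed on
    // clk_pad_i were not propagated through this point by the old VPR.
    assign clk_buf = clk_pad_i;

    // Negative-edge trigger on the buffered clock - with this, all paths
    // ending at q_o were left unconstrained.
    always @(negedge clk_buf)
        q_o <= d_i;
endmodule
```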
Tests performed by manually connecting relevant signals and flipping the clock signal show that at lower frequencies (below 1 Hz; wires were physically moved around on a breadboard) it is possible to read TPM registers. This reassured us that the problem is indeed caused by timing issues. However, this exposed another issue. As part of the test, the TPM_ACCESS register was read before any other operation. The relevant part of the implementation:

```verilog
TPM_ACCESS: begin
  data_o <= {/* tpmRegValidSts */ 1'b1, /* Reserved */ 1'b0,
             addrLocality === activeLocality ? 1'b1 : 1'b0,
             beenSeized[addrLocality], /* Seize, write only */ 1'b0,
             /* pendingRequest */ |(requestUse & ~(5'h01 << addrLocality)),
             requestUse[addrLocality], tpmEstablishment};
end
```

At this point the expected register value is
We tried QuickLogic's proprietary tools - the latest version (and the only one that supports EOS-S3) is available on GitHub; that version uses Yosys for synthesis (instead of Precision Synthesis) and SpDE for PnR. We saw some improvements: most notably, the problem with non-working CLOCK cells is gone, it is possible to use both positive- and negative-edge triggered procedural blocks, and timings do propagate properly. However, other issues showed up, and we were unable to get LPC working at the target frequency. We could achieve working LPC only with most of the code commented out - basically we left only LPC (without SERIRQ) and removed Wishbone, RAM, and all registers (responding with a single constant value to any read). Even then, reads succeeded only at an approximately 50% rate. SpDE has bugs which result in a wrong maximum frequency being reported, so we had to resort to manual analysis of propagation delays, which turned out to be complicated as SpDE does not report the critical path. We were also unable to set most constraints, such as placement constraints, max fanout, or even false paths - attempts to set them resulted in SpDE crashes. Adding any module, such as SERIRQ, or even modules not directly related to LPC, such as Wishbone, increased propagation delays between nets used by the LPC controller. In effect the controller breaks completely even after minor changes. This suggests that the resources available in this FPGA are not enough to build a functional design with the LPC controller and the additional interfaces we need. Due to these problems we are giving up on QuickLogic EOS-S3 and will look for another platform.
We decided to use Lattice ECP5. For development purposes we will use either a Radiona ULX3S (FPGA variant 25F or higher) or an ORANGECRAB-R0D2-25, whichever is more easily available. ECP5-25F contains nearly 128 KiB of BlockRAM, which will be enough to implement a FIFO for the LPC controller and SRAM for the softcore CPU. We have 3 candidates for the softcore: NeoRV32, VexRiscv and PicoRV32. NeoRV32 is the most complete implementation and forms a full SoC; with the others, the remaining interfaces need to be provided from external sources, such as LiteX. NeoRV32 does have the peripherals we need, namely:
Unfortunately, the NeoRV32 SPI master and slave are not supported by Zephyr. In the case of LiteX we have LiteSPI, which does have basic support in Zephyr, however LiteSPI itself does not support operation as a slave. NeoRV32 contains everything we need except for the LPC controller, which can be integrated through Wishbone, so it is probably the best CPU choice. The other CPUs could be usable if we decide to port TwPM to smaller FPGAs. As for the SPI implementation on EOS-S3, we could try to:
Currently we are working on the SPI implementation; SPI has a chance of working as it requires a lower frequency and is simpler than LPC. For testing we are using LiteX. So far, I made a few bug fixes in F4PGA:
Additionally, a few fixes to LiteX were needed:
I got my SPI test code to build, however most of it is optimized out; I'm not sure yet whether I'm missing something or there is something else wrong. Hopefully, tomorrow I will solve the remaining issues and get meaningful results. With that, we will know whether SPI can work on EOS-S3. The test code is available here.
Synthesis of very simple SPI code that responds with a constant value (https://github.com/Dasharo/twpm-f4pga-tests/blob/f12b1790de2920376bac80822efe44fc3baef026/spi_litex/test_spi.py) gave the following result:

```json
{
    "cpd": 21.357,
    "fmax": 46.8232,
    "swns": -4.69029,
    "stns": -80.2244
}
```

This is the result for the CPU clock domain (on which SPI depends). The results don't look promising, given that the real circuit will be more complex. I couldn't constrain the SPI clock port as VPR keeps claiming there is no such port/net:
So far I found out that this happens if we use a clock as a data input instead of as a clock input for FFs. The LiteX SPI slave probes the SPI clock on the CPU clock domain edge: https://github.com/enjoy-digital/litex/blob/6ab156e2253b3a832203d726fdb04f069894adf8/litex/soc/cores/spi/spi_slave.py#L56-L62. We actually hit this bug before, when using a negative trigger on the clock. I tried to add
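For clarity, this is roughly what "clock as data input" means in this context (a minimal sketch with assumed names, not the actual LiteX output): the SPI clock is oversampled by the system clock and edges are detected in logic, so from VPR's point of view spi_clk never drives a clock pin.

```verilog
// Minimal sketch (assumed names, not generated LiteX code): the SPI
// clock is treated as ordinary data and oversampled by the CPU clock,
// which is why VPR sees no clock on the spi_clk port.
module spi_clk_sample (
    input  wire       sys_clk,  // CPU clock domain
    input  wire       spi_clk,  // SPI clock, sampled as data
    input  wire       spi_mosi,
    output reg  [7:0] shreg
);
    reg [1:0] clk_sync;
    wire spi_clk_rise;

    // Two-stage sampler; spi_clk is a data input to these FFs, so no
    // clock constraint can be attached to it.
    always @(posedge sys_clk)
        clk_sync <= {clk_sync[0], spi_clk};

    // Old sample low, new sample high -> rising edge detected
    assign spi_clk_rise = (clk_sync == 2'b01);

    // Shift in MOSI on each detected SPI clock rising edge. This only
    // works if sys_clk runs at more than ~2x the SPI clock frequency.
    always @(posedge sys_clk)
        if (spi_clk_rise)
            shreg <= {shreg[6:0], spi_mosi};
endmodule
```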
After the first attempt failed, I tried to finish the LiteX-based SPI controller as well as to create a custom SPI controller from scratch in Verilog. Unfortunately, despite better timings with the Verilog controller, the target frequency was not achieved. The latest version of the LiteX controller is available here. It supports reading and writing a single register; the maximum frequency as reported by VPR was 30 MHz. However, the SPI controller runs entirely in the CPU clock domain, sampling SPI signals (including the SPI clock) at the positive edge of the CPU clock, so we need to run at a frequency at least twice the SPI frequency. I then tried the custom SPI controller written in Verilog (available here). The controller follows the TPM protocol and implements wait states and register reads. Register writes and some other mandatory features are not implemented, as the maximum frequency quickly dropped the more logic I added, falling to 23 MHz with just the minimum required to write a single byte to a register. We decided to stop further work on EOS-S3 and continue on ECP5 platforms. The half-baked SPI controller I made can be used as a basis for further work on SPI support, however, currently we are implementing LPC. For further work we are using the ORANGECRAB-R0D2-25 with LFE5U-25F. Since OrangeCrab does not contain a hard CPU we need to use a softcore; our choice is NeoRV32. In Dasharo/TwPM_toplevel#9 I integrated NeoRV32 and the TwPM LPC controller. The design synthesizes, and if flashed now it should work up to what is implemented: the softcore should enter the BootROM, it should be possible to communicate with it through UART, and the LPC controller together with the TPM register interface should work. JTAG is enabled in NeoRV32 but not connected to anything - ECP5 has internal JTAG which can be used for programming the FPGA; the port cannot be accessed directly by assigning pads to a toplevel port, but can be accessed indirectly. CPU configuration needs fine-tuning of some parameters, such as SRAM size, I/D-cache size, and boot source. NeoRV32 can support XIP from SPI flash. Overall, results with ECP5 were positive: the basic design synthesizes and gives good timings:
We continued work on NeoRV32 and ECP5 and we got a working soft core on the OrangeCrab platform. At this stage NeoRV32 should be able to boot from SPI flash, however the firmware has not been ported yet, and at least debug versions of the firmware may not fit in 64 KiB.
DRAM initialization was added in Dasharo/TwPM_toplevel#11. Initialization is controlled mostly by software; the required code was added to the bootloader. Unfortunately, it can only start after DDR clocks are stable, which takes relatively long, and the CPU is held in reset until that happens. However, the LPC and TPM register modules should work independently of the CPU, so the host should be able to start sending TPM commands before the software TPM stack is available. DMEM was completely disabled; the stack is now located in DRAM. We can use the saved BlockRAM for implementing a cache, which should help significantly if we decide to execute code directly from flash.
I started porting the firmware to OrangeCrab (Dasharo/twpm-firmware#3), however the work has stalled as I had to focus on a DRAM problem I discovered during the first attempt to boot the firmware. The initial symptom was that the NeoRV32 bootloader was printing
The problem occurs in get_exe_word, however the code is correct. It turned out that the DRAM does not properly handle transfers smaller than 4 bytes, which results in bytes being shifted around. We've been testing LiteX and found out that RAM works properly there, and @krystian-hebel has found this, which should help us.
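For illustration, this is what correct sub-word handling looks like on a Wishbone slave (a hypothetical sketch, not the actual DRAM core): byte lanes are selected with sel_i, and each byte stays on its own lane instead of being shifted down, which is exactly the failure mode described above.

```verilog
// Illustrative sketch (hypothetical module, not the actual DRAM core):
// a Wishbone write narrower than 32 bits must honor the sel_i byte
// lanes; ignoring them, or shifting data toward bit 0, produces the
// "bytes shifted around" symptom.
module wb_ram_bytesel (
    input  wire        clk_i,
    input  wire        stb_i,
    input  wire        we_i,
    input  wire [3:0]  sel_i,   // one bit per byte lane
    input  wire [7:0]  adr_i,   // word address
    input  wire [31:0] dat_i,
    output reg  [31:0] dat_o
);
    reg [31:0] mem [0:255];
    integer i;

    always @(posedge clk_i) begin
        if (stb_i && we_i)
            // Update only the byte lanes selected by sel_i; the data
            // for lane N always travels on bits [8*N +: 8] and is
            // never shifted down to bit 0.
            for (i = 0; i < 4; i = i + 1)
                if (sel_i[i])
                    mem[adr_i][8*i +: 8] <= dat_i[8*i +: 8];
        dat_o <= mem[adr_i];
    end
endmodule
```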
After applying the workaround mentioned in the previous comment, RAM seems to work properly. The BootROM is working as expected and I can transmit firmware to the CPU over UART, however it does not boot currently (no output on UART). I'm debugging this problem; I tried to increase Zephyr's verbosity and added debug prints in various places, but it didn't help. I have exported the NeoRV32 JTAG to debug this issue; due to some problem with nextpnr I had to comment out most components, removing LPC and TPM registers and leaving only DRAM and CPU - otherwise nextpnr was freezing. The DRAM fix is available here. Recent work is located here.
I've been trying to get JTAG working, however it was unstable and I couldn't do anything useful with it. I started inspecting Zephyr's code and found out that the DTS definitions for NeoRV32 are wrong - Zephyr supports NeoRV32 v1.6.1, and the memory map has changed since then.
Bits 0, 1, 2 control the red, green and blue LEDs, respectively. After loading Zephyr, the red and green LEDs light up, so execution gets at least that far. The blue LED never lights up, so the MTIMER interrupt is probably not arriving. I don't know what the cause is.
Got Zephyr working:
The latest work is available in Dasharo/TwPM_toplevel#14 and Dasharo/twpm-firmware#3. I've opened a draft PR for latest NeoRV32 support in Zephyr; currently it isn't working and the latest Zephyr does not boot properly. Zephyr v3.4.0 with custom patches works properly as long as NeoTRNG is disabled.
We are working on updating the TwPM build environment and integrating https://github.com/Dasharo/twpm-firmware into https://github.com/Dasharo/TwPM_toplevel. In Dasharo/TwPM_toplevel#15 I added some bits of the Zephyr SDK to the Nix Flake and added the ability to turn the SDK into a container - for reproducible builds and to make sure nothing depends on the host. Development using a standard Nix shell will still be possible. There are some problems with Yosys's ABC, and the build fails both in the container and in pure mode:
If the command is run manually from a shell, it succeeds.
Can't wait to see that, good job 👍
A setup like the above didn't work initially after connecting TwPM through proper goldpins instead of the ad-hoc "lets-hope-it-connects" approach. This was caused by a double ground connection - one through the LPC connector to the platform I was testing on (Protectli), and the other through a USB cable connected to a USB hub and then my computer. I hadn't seen any proper read with that setup, although the reads weren't all FFs either. Most likely those two grounds created a loop, and this relatively high-frequency connection is susceptible to electromagnetic noise. Disconnecting the LPC ground made things better; after that I got a <0.5% error rate. Supplying power through USB connected to the Protectli brought the number of errors down to 0, across 500k reads of a 4B register. Unfortunately, connecting either a logic analyzer or UART to a machine other than the Protectli used for testing recreates that ground loop, and with it comes up to a 1% error rate (it seems to be higher with UART than with the analyzer, but maybe that is just the different cabling). This makes debugging and development harder, but shouldn't be a problem in the final solution.
The current code can be found in Dasharo/TwPM_toplevel#22 and the submodules pointed to by it. Communication between the host and the TPM stack mostly works, including command execution and sending the response back to the PC; unfortunately, "mostly" isn't enough. I'm using UART connected to the OrangeCrab without ground and it is surprisingly reliable. There are very few random bytes sent every now and then, but I've seen setups where "properly" connected UART was worse than this. Today I used it to get some output from execution of proper TPM2 functions and I haven't noticed any random errors, however there is one nibble in one command that seems to be always wrong. Here's part of the log, as produced by the current code when booting Ubuntu 22.04.1:
The first command gets the supported hash algorithms and their PCRs out of the TPM. It properly returns the information that there are 4 algorithms, along with their registered IDs. Unfortunately, this is part of the kernel probing for TPM existence, and any failure results in
I have a few more notes on that subject:
Despite that, most of the longer commands time out, based on the serial output from TwPM. After (re-)enabling DMEM, execution time went down to 5 minutes 15 seconds - barely not enough. On this hardware we can get up to 64KB of DMEM, which is not enough to cover all required data and instructions. By moving data below the code (which is already cached anyway) in the linker script, and by stripping every possible buffer (and some impossible ones just for testing, like breaking failure mode by returning a pointer to data on the stack instead of a static buffer) to fit as much as possible in those 64KB, I managed to get

Unfortunately, there is an issue with the data cache. Enabling it with more than one block prevents booting. Using one block allows for booting, but it actually makes things slower: every time data is needed from outside the cached region, a whole block is trashed and fetched, even if only one byte is acted upon. This is clearly visible by setting a big block size (4KB or more) and watching as serial output characters are printed one after another.
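To make the "whole block trashed" behavior concrete, here is a generic direct-mapped, single-block read cache sketch (hypothetical names, not NeoRV32's actual implementation): on any miss, the entire block is refetched word by word before the single requested byte can be served, so with one block every access outside the current block pays the full refill cost.

```verilog
// Generic sketch (not NeoRV32's cache): a single-block read cache.
// Any access outside the currently held block forces a full block
// refill, even if only one byte is needed - which is why a one-block
// configuration can be slower than no cache at all.
module one_block_cache #(
    parameter BLOCK_WORDS = 1024            // 4 KiB block of 32-bit words
)(
    input  wire        clk_i,
    input  wire        req_i,               // CPU read request (held until ack)
    input  wire [31:0] addr_i,              // byte address, word-aligned
    output reg  [31:0] data_o,
    output reg         ack_o,
    // simplified backing memory, combinational read assumed
    output reg  [31:0] mem_addr_o,
    input  wire [31:0] mem_data_i
);
    localparam OFS_BITS = $clog2(BLOCK_WORDS);

    reg [31:0] block [0:BLOCK_WORDS-1];
    reg [31-OFS_BITS-2:0] tag;              // address bits above the offset
    reg valid     = 1'b0;
    reg refilling = 1'b0;
    reg [OFS_BITS:0] refill_cnt;

    wire [31-OFS_BITS-2:0] addr_tag = addr_i[31:OFS_BITS+2];
    wire [OFS_BITS-1:0]    addr_ofs = addr_i[OFS_BITS+1:2];

    always @(posedge clk_i) begin
        ack_o <= 1'b0;
        if (refilling) begin
            // Fetch the whole block, one word per cycle - the cost paid
            // on every access outside the held block.
            block[refill_cnt[OFS_BITS-1:0]] <= mem_data_i;
            mem_addr_o <= mem_addr_o + 32'd4;
            refill_cnt <= refill_cnt + 1'b1;
            if (refill_cnt == BLOCK_WORDS - 1) begin
                refilling <= 1'b0;
                valid     <= 1'b1;          // CPU retries and hits next time
            end
        end else if (req_i) begin
            if (valid && tag == addr_tag) begin
                data_o <= block[addr_ofs];  // hit: single-cycle response
                ack_o  <= 1'b1;
            end else begin
                tag        <= addr_tag;     // miss: trash and refill the
                valid      <= 1'b0;         // only block we have
                refilling  <= 1'b1;
                refill_cnt <= 0;
                mem_addr_o <= {addr_tag, {OFS_BITS{1'b0}}, 2'b00};
            end
        end
    end
endmodule
```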
I am closing this issue as we have tested the changes made so far and published the test results in #21.
Minimal parsing of commands and responses (limited to just their sizes) must be done on the FPGA side in order to properly set the status bits that the host can use to check whether the TPM expects more bytes of a command or has more bytes of a response. Full command parsing and execution take place on the MCU, so the FPGA has to implement and expose a buffer with the command sent by the host, along with any required metadata like the type of message in the buffer or the currently active locality.
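As a sketch of what that minimal parsing amounts to (assumed module and signal names, not the actual TwPM implementation), the FPGA only needs to capture the big-endian commandSize field from bytes 2-5 of the 10-byte TPM command header and compare it against a byte counter to drive the "expecting more data" status bit:

```verilog
// Minimal sketch (assumed names, not the actual TwPM module): capture
// the 32-bit big-endian commandSize field from bytes 2..5 of the TPM
// command header and clear the "expect more data" status once the
// whole command has been buffered. Reset between commands via rst_i.
module tpm_cmd_size (
    input  wire       clk_i,
    input  wire       rst_i,
    input  wire       byte_valid_i,  // strobe: one command byte received
    input  wire [7:0] byte_i,
    output wire       expect_o       // host should send more bytes
);
    reg [31:0] cmd_size;   // total command size taken from the header
    reg [31:0] byte_cnt;   // bytes received so far

    always @(posedge clk_i) begin
        if (rst_i) begin
            cmd_size <= 32'd10;       // the header alone is 10 bytes
            byte_cnt <= 32'd0;
        end else if (byte_valid_i) begin
            byte_cnt <= byte_cnt + 32'd1;
            // commandSize occupies bytes 2..5, big-endian (MSB first);
            // after four shifts the placeholder value is shifted out
            if (byte_cnt >= 32'd2 && byte_cnt <= 32'd5)
                cmd_size <= {cmd_size[23:0], byte_i};
        end
    end

    // Analogue of the TPM_STS Expect bit: more command bytes expected
    assign expect_o = (byte_cnt < cmd_size);
endmodule
```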
Milestones: