icoprog upload to SRAM is flaky on icoboard gamma+USB #2
Comments
The title of the issue suggests you are using an UP Board (GPIOMODE=1), but the text suggests you are using the USB base board (USBMODE=1). Which one is it? I do not have an UP board that I could use for testing right now. I have a pre-production USB base board and have now used it to program an icoboard 100 times in a loop with the examples/pinout/ bitstream, it did not fail a single time. Do you have any other information that you can share with me that could help me reproduce the problem?
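A repeat-programming loop like the one described above could be scripted roughly as follows. This is only a sketch: the helper name `count_failures` is hypothetical, and the command to run is passed in by the caller rather than hard-coded, because the exact icoprog invocation (for example, whether `-p` reads the bitstream from standard input) is an assumption to check against your own setup.

```python
import subprocess

def count_failures(cmd, bitstream, runs=100):
    """Run `cmd` `runs` times with `bitstream` on stdin.

    Returns the number of runs that exited with a non-zero status.
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            input=bitstream,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures
```

Assuming `icoprog -p` does take the bitstream on stdin, a call might look like `count_failures(["icoprog", "-p"], open("design.bin", "rb").read())`. Note that a zero exit status only shows that the tool reported no error; it says nothing about whether the loaded design behaves correctly.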
It does not matter how small or large the circuit is, the bitstream is always the same size.
Where did you get the 100+ figure? The Lattice documentation says to send 49 dummy bits:
Do you have the equipment to measure this? The maximum "idle time" I see when capturing programming via FTDI is less than 1 µs. I know that the iCE40 FPGAs don't like very long pauses during SRAM programming, so if you see very long idle times during programming, this might explain the problem.
I'm using the USB board; I did not realize the distinction. Title clarified.
I agree. But I can reliably load small circuits, including your example circuits, to SRAM. Larger SoC circuits tend to fail more often. (It is possible that the larger SoC circuits are tickling some other bug and the size is a red herring... but so far once I pass about 700 LBs in use the success rate is near zero.)
Appendix A in TN1248, page 23, table line 5. The number is repeated in the pseudocode a few pages later. I agree that this is inconsistent with the timing diagram, which says 49. But I can report that either way it makes no difference in my case; CDONE is observed high at the appropriate time.
I do! I've got a flaky programming section captured at 100 MS/s and am analyzing it. I'll report back if I see anything obvious.
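One way to look for long pauses in such a capture is to find the longest stretch during which the clock line does not change. The sketch below assumes the SCK channel has been exported as a plain sequence of 0/1 samples taken at a fixed rate; the function name and the export format are assumptions, not something the capture tool provides directly.

```python
def longest_idle_ns(samples, sample_rate_hz):
    """Return the longest run of unchanged samples, in nanoseconds."""
    if not samples:
        return 0.0
    longest = run = 1
    for prev, cur in zip(samples, samples[1:]):
        # Extend the current run while the level is unchanged, else restart.
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest * 1e9 / sample_rate_hz
```

At 100 MS/s each sample is 10 ns, so a run of 92 unchanged samples corresponds to 920 ns. The result includes the normal high or low phase of the clock, so it is an upper bound on the true idle gap.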
I should clarify: when I say "fail," I don't mean that icoprog reports an error:
I mean that the design misbehaves. This is the behavior that seems to correlate with design size. (Unsupported conjecture: the way the design fails would be consistent with incomplete initialization of BRAMs. I'm adding some more nets to boundary scan to see if I can prove this.)
My initial review of the waveforms I captured shows no smoking gun. My host is able to keep the MPSSE engine fed, and no inter-byte delay exceeds 0.92 µs. The chip raises CDONE quickly (actually synchronous with bit 4 of the final 0x00 in the bitstream, or ~840 ns after the final bit of
My assumption from the docs was that the chip would pulse MISO low while erasing itself after CRESET. I don't see that, but I admit the docs are ambiguous (they really just say "high means housekeeping is completed"; they don't say it ever gets pulled low). I do see MISO activity during flash read-back, so I think I've got the right pin.
I can provide the waveform capture in VCD or Saleae Logic format if you'd like to see it.
Here is syn.bin.gz. When uploaded with
The design is relatively untested but simulates correctly on iverilog (and works in Flash). (Flow is CLaSH -> Yosys -> Arachne.)
My boards were sourced from Trenz. The Icoboard is marked "gamma". The baseboard is TE0889 01 revision with two bluewire fixes that implies.
Hope that helps. Let me know if I can answer any other questions.
Then I'd like to see the design sources, not just the generated .bin file. My guess would be that this is some kind of reset issue and your circuit does not initialize correctly during SRAM programming. It is a known issue that iCE40 block RAMs read zeros during the first few clock cycles when booting in SRAM mode (see YosysHQ/icestorm#76). That is a hardware issue and has nothing to do with the programming and/or synthesis tools. From your .bin file I can see that you use one of the PLLs to generate 16 MHz from the 100 MHz on the IcoBoard, which is good considering that the design itself is only good for approx. 37 MHz. I also see some BRAM with "deadbeef" in it. :) But it's not so easy to tell from the .bin file what the 3.5k LUTs in your design are doing, or whether the BRAM issue I mentioned above is the cause of the problems.
Good call! I was waiting for PLL lock but not delaying any further past that. I clearly missed an errata sheet. Waiting the magical 36 cycles after the clock stabilizes before deasserting reset makes everything stable. Thank you for your help!
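The fix described here (keep reset asserted for a fixed number of clocks after PLL lock) can be illustrated with a small cycle-level model. This is a behavioural sketch in Python, not the actual HDL from the design; the class name `ResetHoldoff` and the `hold_cycles` parameter are hypothetical, and the right cycle count depends on the clock frequency in use.

```python
class ResetHoldoff:
    """Model of a counter that delays reset release after PLL lock."""

    def __init__(self, hold_cycles=36):
        self.hold_cycles = hold_cycles
        self.count = 0

    def tick(self, pll_locked):
        """Advance one clock; return True while reset should stay asserted."""
        if not pll_locked:
            # Losing lock restarts the hold-off from zero.
            self.count = 0
            return True
        if self.count < self.hold_cycles:
            self.count += 1
            return True
        return False
```

With the default of 36, reset stays asserted for 36 locked clock cycles and is released on the 37th. In hardware this corresponds to a small saturating counter clocked by the PLL output and cleared while the lock signal is low.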
It may be worth noting for posterity that at 25 MHz the same design needs more than 36 clocks to get valid data out of BRAM. I'm having a hard time finding the details of the BRAM reset behavior. Did you discover this by experimentation, or has Lattice copped to it?
I think it's a time-based delay, not cycle-based. It is 36 cycles at 12 MHz = 3 µs, so I'd expect it to be around 75 cycles at 25 MHz. I have not seen a Lattice erratum for that (but I did not look for one either). @aappleby reported the issue that I linked to, and when I looked into it I figured out that there is a hardware problem with BRAMs at initialization time.
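If the delay really is time-based, the cycle count for any clock follows from the 3 µs estimate above. A minimal sketch of that arithmetic, using integer nanoseconds to avoid floating-point rounding (the function name is hypothetical, and 3 µs is the empirical figure from this thread, not a documented specification):

```python
def holdoff_cycles(clock_hz, delay_ns=3000):
    """Clock cycles needed to cover `delay_ns`, rounded up."""
    ns_per_second = 1_000_000_000
    return (clock_hz * delay_ns + ns_per_second - 1) // ns_per_second
```

This gives 36 cycles at 12 MHz, 48 at 16 MHz, and 75 at 25 MHz. Adding some margin on top seems prudent, since the underlying figure was found by experiment.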
I've been hunting around for documentation on this pretty aggressively, and all I can find are your writeups. Surprising. Thank you for identifying this and publishing it. I suspect there are a bunch of people early in iCE40 design flows who are tearing their hair out or cargo-culting solutions, like I was.
Writing images to serial flash and then restarting the FPGA with
icoprog -b
works great.
Writing images to SRAM with
icoprog -p
seems to write an incorrect bitstream most of the time. Small circuits are more reliable than larger ones, which suggests intermittent corruption to me... but the bitstream packets seem to be covered by CRC (I read your format docs and picked my bitstream apart), so simple line noise is probably not the cause. Moreover, Flash programming and read-back are both solid, so I don't think it's likely a signal integrity issue.
I read over the USBMODE implementation (I have some MPSSE experience) and it looks reasonable. In comparing the implementation to the iCE40 programming reference, I noticed that icoprog emits fewer clocks after programming than suggested (49 vs 100+), but changing this has no effect -- and after the default 49 clocks we still observe CDONE high at the expected time.
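To illustrate why CRC coverage makes simple line noise an unlikely explanation, here is a sketch of a bitwise CRC-16. It assumes the CCITT polynomial 0x1021 with an initial value of 0xFFFF, which is my recollection of the format notes; treat those parameters as an assumption and check them against the bitstream format documentation before relying on them.

```python
def crc16_ccitt(data, crc=0xFFFF):
    """Bitwise CRC-16 (polynomial 0x1021, MSB first, no final XOR)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

With this CRC variant, running the function over a payload followed by its own CRC (high byte first) yields zero, while flipping any single bit in the payload yields a non-zero result. A receiver that checks the CRC would therefore reject a stream with that kind of corruption rather than silently accept it.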
The datasheet specifies a pretty broad range of acceptable SCLK speeds during programming. Messing with the clock divider didn't seem to change things (except to make programming slower). The FTDI is probably generating significant idle times between each 1 KiB transfer, but the datasheet doesn't discuss clock stability requirements (and I'd be kind of surprised if that were it).
I don't have a Raspberry Pi available (to rule out the USB base board and USBMODE).
Any debugging suggestions?