icoprog upload to SRAM is flaky on icoboard gamma+USB #2

Closed
cbiffle opened this issue May 7, 2017 · 8 comments

@cbiffle

cbiffle commented May 7, 2017

Writing images to serial flash and then restarting the FPGA with icoprog -b works great.

Writing images to SRAM with icoprog -p seems to write an incorrect bitstream most of the time. Small circuits are more reliable than larger ones, which suggests intermittent corruption to me... but the bitstream packets seem to be covered by a CRC (I read your format docs and picked my bitstream apart), so simple line noise is probably not the cause. Moreover, Flash programming and read-back are both solid, so I don't think it's likely a signal integrity issue.
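For reference, a CRC check of the kind mentioned above catches every single-bit error in the data it covers. The sketch below assumes the iCE40 bitstream CRC is CRC-16/CCITT (polynomial 0x1021, initial value 0xFFFF, no reflection or final XOR); that is my reading of the IceStorm format notes and should be checked against them. Which bytes the checksum actually covers is not modelled here, and `crc16_ccitt` is a hypothetical helper name.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT (poly 0x1021, MSB first); assumed to match the iCE40 bitstream CRC."""
    for byte in data:
        crc ^= byte << 8                      # feed the next byte into the top 8 bits
        for _ in range(8):
            if crc & 0x8000:                  # top bit set: shift and apply the polynomial
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


if __name__ == "__main__":
    print(hex(crc16_ccitt(b"123456789")))     # standard check value for this variant
    payload = bytes(range(64))                # arbitrary stand-in for a bitstream packet
    good = crc16_ccitt(payload)
    # Flip every bit in turn and confirm the CRC changes each time.
    for i in range(len(payload) * 8):
        corrupted = bytearray(payload)
        corrupted[i // 8] ^= 0x80 >> (i % 8)
        assert crc16_ccitt(bytes(corrupted)) != good
    print("all single-bit errors detected")
```

This is consistent with the reasoning above that plain line noise is an unlikely cause of a silently wrong design.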

I read over the USBMODE implementation (I have some MPSSE experience) and it looks reasonable. In comparing the implementation to the iCE40 programming reference, I noticed that icoprog emits fewer clocks after programming than suggested (49 vs 100+), but changing this has no effect. Even after the default 49 clocks we still observe CDONE high at the expected time, so that does not seem to be the problem.

The datasheet specifies a pretty broad range of acceptable SCLK speeds during programming. Messing with the clock divider didn't seem to change anything (except to make it slower). The FTDI is probably generating significant idle times between each 1 KiB transfer, but the datasheet doesn't discuss clock stability requirements (and I'd be kind of surprised if that were it).
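For context on the clock divider: assuming the base board uses an FTDI H-series part (such as the FT2232H) in MPSSE mode, FTDI's AN_108 gives the SCK frequency as base / ((1 + divisor) * 2), where the base is 60 MHz with the divide-by-5 prescaler disabled and 12 MHz with it enabled. Which FTDI chip the base board carries, and which divisor icoprog programs, are not stated in this thread; `mpsse_sck_hz` is a hypothetical helper.

```python
def mpsse_sck_hz(divisor: int, div_by_5: bool = False) -> float:
    """SCK frequency for an FTDI H-series MPSSE clock divisor (per FTDI AN_108)."""
    if not 0 <= divisor <= 0xFFFF:
        raise ValueError("the MPSSE clock divisor is a 16-bit value")
    # 60 MHz master clock, or 12 MHz when the divide-by-5 prescaler is enabled.
    base_hz = 12_000_000 if div_by_5 else 60_000_000
    return base_hz / ((1 + divisor) * 2)


if __name__ == "__main__":
    for d in (0, 1, 2, 5):
        print(d, mpsse_sck_hz(d) / 1e6, "MHz")
```

Raising the divisor only stretches each bit; it does not by itself change how long the host leaves the MPSSE idle between transfers.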

I don't have a Raspberry Pi available (to rule out the USB base board and USBMODE).

Any debugging suggestions?

@cliffordwolf
Owner

The title of the issue suggests you are using an UP Board (GPIOMODE=1), but the text suggests you are using the USB base board (USBMODE=1). Which one is it?

I do not have an UP board that I could use for testing right now. I have a pre-production USB base board and have now used it to program an icoboard 100 times in a loop with the examples/pinout/ bitstream, it did not fail a single time.

Do you have any other information that you can share with me that could help me reproduce the problem?

> Small circuits are more reliable than larger ones

It does not matter how small or large the circuit is, the bitstream is always the same size.

> I noticed that icoprog emits fewer clocks after programming than suggested (49 vs 100+),

Where did you get the 100+ figure? The Lattice documentation says to send 49 dummy bits:

[image: excerpt from the Lattice documentation]

> The FTDI is probably generating significant idle times between each 1kiB transfer

Do you have the equipment to measure this? The maximum "idle times" I see when capturing programming via FTDI are less than 1 µs:

[image: logic analyzer capture of FTDI programming]

I know that the iCE40 FPGAs don't like very long pauses during SRAM programming. So if you see very long idle times during programming, this might explain the problem.

@cbiffle cbiffle changed the title icoprog upload to SRAM is flaky on icoboard gamma+UP icoprog upload to SRAM is flaky on icoboard gamma+USB May 7, 2017
@cliffordwolf cliffordwolf self-assigned this May 7, 2017
@cbiffle
Author

cbiffle commented May 7, 2017

I'm using the USB board; I did not realize the distinction. Title clarified.

> > Small circuits are more reliable than larger ones
>
> It does not matter how small or large the circuit is, the bitstream is always the same size.

I agree. But I can reliably load small circuits, including your example circuits, to SRAM. Larger SoC circuits tend to fail more often. (It is possible that the larger SoC circuits are tickling some other bug and the size is a red herring... but so far once I pass about 700 LBs in use the success rate is near zero.)

> Where did you get the 100+ figure?

Appendix A in TN1248, page 23, table line 5. The number is repeated in the pseudocode a few pages later. I agree that this is inconsistent with the timing diagram, which says 49. But I can report that either way, it makes no difference in my case; CDONE is observed as high at the appropriate time.

> Do you have the equipment to measure this?

I do! I've got a flaky programming session captured at 100 MS/s and am analyzing it. I'll report back if I see anything obvious.

@cbiffle
Author

cbiffle commented May 7, 2017

I should clarify: when I say "fail," I don't mean that icoprog reports an error:

```
$ icoprog -p <out/syn.bin
reset..
cdone: low
programming..
cdone: high
$ echo $?
0
```

I mean that the design misbehaves. This is the behavior that seems to correlate with design size.

(Unsupported conjecture: the way the design fails would be consistent with incomplete initialization of BRAMs. I'm adding some more nets to boundary scan to see if I can prove this.)

My initial review of the waveforms I captured shows no smoking gun. My host is able to keep the MPSSE engine fed, and no inter-byte delay exceeds 0.92 µs. The chip raises CDONE quickly (actually synchronous with bit 4 of the final 0x00 in the bitstream, or ~840 ns after the final bit of wakeup).
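An inter-byte delay figure like this can be pulled out of a capture by looking for unusually long intervals between SCLK edges. A minimal sketch, assuming the capture has already been exported as a sorted list of SCLK rising-edge timestamps in seconds (exporting and parsing the VCD or Saleae file is not shown); the function names and the 2x-period threshold are hypothetical choices.

```python
from typing import List, Sequence, Tuple


def max_edge_interval(edges_s: Sequence[float]) -> float:
    """Longest interval between consecutive SCLK rising edges, in seconds."""
    return max((b - a for a, b in zip(edges_s, edges_s[1:])), default=0.0)


def idle_gaps(edges_s: Sequence[float], period_s: float,
              factor: float = 2.0) -> List[Tuple[float, float]]:
    """(start_time, length) of every edge-to-edge interval longer than factor * period_s."""
    return [(a, b - a) for a, b in zip(edges_s, edges_s[1:])
            if b - a > factor * period_s]


if __name__ == "__main__":
    # Synthetic capture: 10 MHz SCLK, 3 bytes of 8 clocks, 0.8 us pause after each byte.
    period, pause = 100e-9, 800e-9
    edges, t = [], 0.0
    for _ in range(3):
        for _ in range(8):
            edges.append(t)
            t += period
        t += pause
    print(f"longest interval: {max_edge_interval(edges) * 1e6:.2f} us")
    print(f"pauses found: {len(idle_gaps(edges, period))}")
```

Note that an edge-to-edge interval includes one bit period on top of the actual pause, so it slightly overstates the idle time.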

My assumption from the docs was that the chip would pulse MISO low while erasing itself after CRESET. I don't see that, but I admit the docs are ambiguous (they really just say "high means housekeeping is completed," they don't say it ever gets pulled low). I do see MISO activity during flash read-back, so I think I've got the right pin.

I can provide the waveform capture in VCD or Saleae Logic format if you'd like to see it.


Here is syn.bin.gz.

When uploaded with icoprog -p < syn.bin, the behavior I get is (proportions approximate):

  • 80% of the time LED1 blinks around 4Hz
  • 10% of the time no LEDs light.

When uploaded with icoprog -f <syn.bin && icoprog -b I reliably get the correct behavior (LED1 and LED2 alternate around 4Hz).

The design is relatively untested but simulates correctly in iverilog (and works in Flash). (Flow is CLaSH -> Yosys -> Arachne.)

My boards were sourced from Trenz. The IcoBoard is marked "gamma". The base board is a TE0889 revision 01, with the two blue-wire fixes that revision implies. icoprog was built at 7ab66f0 with USBMODE=1.

Hope that helps. Let me know if I can answer any other questions.

@cliffordwolf
Owner

> I mean that the design misbehaves.

Then I'd like to see the design sources, not just the generated .bin file. My guess would be that this is some kind of reset issue and your circuit does not initialize correctly during SRAM programming. It is a known issue that iCE40 block rams do read zeros during the first few clock cycles when booting in SRAM mode (see YosysHQ/icestorm#76). That is a hardware issue and has nothing to do with the programming and/or synthesis tools.

From your .bin file I can see that you use one of the PLLs to generate 16 MHz from the 100 MHz on the IcoBoard, which is good considering that the design itself is only good for approx. 37 MHz. I also see some BRAM with "deadbeef" in it. :) But it's not so easy to tell from the .bin file what the 3.5k LUTs in your design are doing, and whether the BRAM issues I mentioned above are the cause of the problems.
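The PLL observation can be checked with a little arithmetic. Assuming the iCE40 PLL in simple-feedback mode, where F_out = F_in * (DIVF + 1) / ((DIVR + 1) * 2^DIVQ), and the limits that IceStorm's icepll tool applies (phase-detector input 10 to 133 MHz, VCO 533 to 1066 MHz, DIVR 0 to 15, DIVF 0 to 127, DIVQ 1 to 6), the sketch below brute-forces the closest setting for a target frequency. `best_pll` is a hypothetical helper, not icepll itself, and it does not compute FILTER_RANGE.

```python
from typing import Optional, Tuple


def best_pll(f_in_mhz: float, f_target_mhz: float) -> Optional[Tuple[int, int, int, float]]:
    """Return (DIVR, DIVF, DIVQ, f_out_mhz) closest to the target, or None."""
    best, best_err = None, float("inf")
    for divr in range(16):
        f_pfd = f_in_mhz / (divr + 1)          # phase-detector input frequency
        if not 10 <= f_pfd <= 133:
            continue
        for divf in range(128):
            f_vco = f_pfd * (divf + 1)         # VCO frequency
            if not 533 <= f_vco <= 1066:
                continue
            for divq in range(1, 7):
                f_out = f_vco / 2 ** divq
                err = abs(f_out - f_target_mhz)
                if err < best_err:             # strict: keep the first of equal candidates
                    best, best_err = (divr, divf, divq, f_out), err
    return best


if __name__ == "__main__":
    print(best_pll(100.0, 16.0))
```

Under these assumptions exactly 16 MHz is not reachable from 100 MHz; the closest setting found is 16.015625 MHz (DIVR=3, DIVF=40, DIVQ=6). Whether the design in this thread uses that exact configuration cannot be told from the discussion.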

@cbiffle
Author

cbiffle commented May 7, 2017

Good call! I was waiting for PLL lock but not delaying any further past that. I clearly missed an errata sheet.

Waiting the magical 36 cycles after the clock stabilizes before deasserting reset makes everything stable.
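The fix described here (hold reset until the PLL reports lock, then keep it asserted for a fixed number of further cycles) is normally a small counter in the HDL. The sketch below is a cycle-by-cycle Python model of such a counter, not the author's CLaSH code; `reset_trace` and its parameters are hypothetical, and restarting the count whenever lock drops is a design choice of this sketch.

```python
from typing import Iterable, List


def reset_trace(locked: Iterable[bool], hold_cycles: int) -> List[bool]:
    """Reset level per clock cycle (True = asserted).

    locked: the PLL LOCK output sampled once per cycle.
    hold_cycles: cycles of continuous lock required before reset is released.
    """
    count = 0
    out = []
    for lock in locked:
        if not lock:
            count = 0                  # no lock (yet, or lost): restart the delay
        elif count < hold_cycles:
            count += 1                 # count locked cycles, saturating at hold_cycles
        out.append(count < hold_cycles)
    return out


if __name__ == "__main__":
    trace = reset_trace([False] * 5 + [True] * 50, hold_cycles=36)
    print("reset released at cycle", trace.index(False))
```

With lock arriving at cycle 5, reset stays asserted through cycle 39 and is released on cycle 40, the 36th cycle with lock high.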

Thank you for your help!

@cbiffle cbiffle closed this as completed May 7, 2017
@cbiffle
Author

cbiffle commented May 7, 2017

It may be worth noting for posterity that at 25 MHz the same design needs more than 36 clocks to get valid data out of BRAM. I'm having a hard time finding the details of the BRAM reset behavior. Did you discover this by experimentation, or has Lattice copped to it?

@cliffordwolf
Owner

I think it's a time-based delay, not cycle-based. It is 36 cycles at 12 MHz = 3 µs, so I'd expect it to be around 75 cycles at 25 MHz. I have not seen a Lattice erratum for that (but I did not look for one either). @aappleby reported the issue that I linked to, and when I looked into it I figured out that there is a hardware problem with BRAMs at initialization time.
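If the delay is time-based as suggested, the required cycle count scales with the clock frequency. A small helper, assuming the roughly 3 µs figure derived above from 36 cycles at 12 MHz (an estimate from this thread, not a Lattice specification); `bram_wait_cycles` is a hypothetical name. Integer arithmetic avoids floating-point rounding pushing the result up by one cycle.

```python
def bram_wait_cycles(f_clk_hz: int, wait_ns: int = 3000) -> int:
    """Clock cycles needed to cover wait_ns at f_clk_hz, rounded up."""
    return (f_clk_hz * wait_ns + 999_999_999) // 1_000_000_000


if __name__ == "__main__":
    for f in (12_000_000, 16_000_000, 25_000_000):
        print(f // 1_000_000, "MHz:", bram_wait_cycles(f), "cycles")
```

Since the 3 µs value is itself inferred, adding a safety margin on top of the computed count is probably wise.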

@cbiffle
Author

cbiffle commented May 8, 2017

I've been hunting around for documentation on this pretty aggressively, and all I can find are your writeups. Surprising. Thank you for identifying this and publishing it; I suspect there are a bunch of people early in iCE40 design flows who are tearing their hair out or cargo-culting solutions, like I was.
