Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconsidering AsyncFIFO and resets #181

Closed
nmigen-issue-migration opened this issue Aug 22, 2019 · 19 comments
Closed

Reconsidering AsyncFIFO and resets #181

nmigen-issue-migration opened this issue Aug 22, 2019 · 19 comments
Milestone

Comments

@nmigen-issue-migration
Copy link

nmigen-issue-migration commented Aug 22, 2019

Issue by whitequark
Thursday Aug 22, 2019 at 02:05 GMT
Originally opened as m-labs/nmigen#180


Both ports of a SyncFIFO belong to the same control set, which means that the entire FIFO is always reset at once.

However, the ports of AsyncFIFO belong to different control sets, which means that, if either if the control sets includes a reset, only half of the FIFO will get reset; and as a result, stale data will be read.

In some cases, this problem may be alleviated by using a reset synchronizer: one port is chosen as the one that resets the entire FIFO, its reset signal is brought into the other clock domain, and used to reset either the entire clock domain, or only the port state. However, if the duration of the reset pulse at the leading port is shorter than the clock period of the following port, the following port will not be reset properly, and stale data will be read. In particular this happens if the clock to one of the ports is gated completely.

In some cases, this problem may be further alleviated by using asynchronous reset (of course, with synchronous release) for the entire clock domain of the following port. However, this carries the restriction that the entire domain has to be reset: since the assertion of the reset is not synchronized to the clock of the following port, it incurs metastability. Further, many FPGA elements will not be inferred from logic that is asynchronously reset, notably BRAMs and DSP blocks, severely penalizing this design.


It seems that there is no easy solution to the issue of resetting AsyncFIFO reliably in all circumstances, making it a completely safe design element that never spuriously produces stale data. I would suggest the following:

  1. AsyncFIFO should not have a reset at all, that is, it should not have a signal whose stated purpose is to ensure that one or both pointers becomes zero.
  2. The read side only of AsyncFIFO should have a synchronous clear signal that sets the read pointer to the same value as the write pointer, therefore flushing the buffer of any stale data that might be stuck there. Because AsyncFIFO reads are synchronous, this clear signal works regardless of how the read side is clocked. Because this clear signal operates entirely in the read domain, it works correctly regardless of how the write domain is clocked--the write clock could even be stopped completely.

This solves everything described above:

  1. The two ports of AsyncFIFO are never reset, therefore neither of them is reset while the other is not. Therefore stale data is never read as a result of incorrectly sequenced reset.
  2. When the leading clock domain resets the following clock domain to bring it into a well-defined state (in such case the FIFO could be full of garbage data), it is necessary to assert, simultaneously, all three of: following clock domain reset, clear signal of any FIFO read port in the leading clock domain (through an explicit connection), and clear signal of any FIFO read port in the following clock domain (the clear signal should be implicitly ORed with ResetSignal("read") in the FIFO). Therefore any garbage data is eliminated from FIFOs oriented in both directions before the reset is deasserted.

One possible logic bug with this scheme would be a case where the CDC FF chain inside AsyncFIFO is for some reason longer than the FF chain inside the reset synchronizer used to reset the following domain. Ideally, the CDC checker (#4) should flag any such cases.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Aug 22, 2019

Comment by sbourdeauducq
Thursday Aug 22, 2019 at 04:45 GMT


Sounds good.
IIRC the only state that needs to be in the write domain is one pointer value modulo the size of the register, so we can never get that into an unrecoverable illegal state (e.g. because of clock glitches) and this should not need a reset.
The situation may become more complicated if we add support for watermarks ("almost empty/full" signals) - I'm not sure how those would work.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Aug 22, 2019

Comment by whitequark
Thursday Aug 22, 2019 at 04:46 GMT


AsyncFIFO currently doesn't support non-power-of-2 memories anyway so it has no illegal states. And I'm not sure if it's possible to support such memories while still using a Gray counter.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Aug 22, 2019

Comment by jordens
Thursday Aug 22, 2019 at 07:38 GMT


Good.
How would the write half discard the data in the FIFO though (with minimal side effects)? CDC ping-pong with a "clear-request plus acknowledge" to the read side?

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Aug 22, 2019

Comment by whitequark
Thursday Aug 22, 2019 at 18:25 GMT


How would the write half discard the data in the FIFO though (with minimal side effects)? CDC ping-pong with a "clear-request plus acknowledge" to the read side?

I've thought a bit about this, and the exact same approach would work, since the FIFO is already symmetric: set the write pointer to the read pointer. In fact, it's possible to simplify this further: instead of an explicit clear signal, use the ResetSignal() of every port like a normal synchronous input to set the counter of that port to the value of the counter of the other port.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Aug 22, 2019

Comment by programmerjake
Thursday Aug 22, 2019 at 20:06 GMT


one problem you may run into is that resetting changes multiple bits simultaneously in the grey-counter read and write pointers, so relying on just flip-flops to synchronize the signals won't work anymore, so there will need to be some sort of reset synchronization mechanism anyway

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Aug 22, 2019

Comment by whitequark
Thursday Aug 22, 2019 at 20:08 GMT


@programmerjake You are (unfortunately, for this approach) completely correct. I will need to find a different one.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Aug 22, 2019

Comment by sbourdeauducq
Thursday Aug 22, 2019 at 23:53 GMT


For the clear signal on the read side, all this will cause is glitches on the writable signal that perhaps don't matter (I wonder if they could actually cause missing entries in FIFO data that need to be contiguous?). If we do want to suppress them, we can have a handshake mechanism:

  • read section sends request to block the write section (e.g. assert a signal through a FF synchronizer)
  • read section confirms that write section is blocked (by return of that signal through another FF synchronizer). Blocking forces writable to 0.
  • read section sets read pointer to write pointer
  • read section unblocks write section and waits for confirmation (to avoid a bug where you read the confirmation of a previous clear in step 2)

@awygle
Copy link
Collaborator

awygle commented Feb 7, 2020

Can we require that the control sets on the read and write ports share a (logical) reset signal? Reset synchronizers would still result in physically distinct reset signals but if both domains are reset simultaneously the problem becomes much easier (this appears to be the assumption held by the Sunburst Design papers).

If not, it may be worth trying to work out a solution to this problem with the reset synchronizer approach:

However, if the duration of the reset pulse at the leading port is shorter than the clock period of the following port, the following port will not be reset properly, and stale data will be read.

I'm not completely sure, but it seems like a synchronized feedback mechanism such as that described on page 17 of Cummings' SNUG-2008 paper could be used to solve this problem. Obviously such a solution would need to be thoroughly analyzed and ideally formally proven to be correct, which is why I proposed the first option first. I'll think on this more and maybe float some designs here if I come up with anything.

@whitequark
Copy link
Member

whitequark commented Feb 7, 2020

Can we require that the control sets on the read and write ports share a (logical) reset signal?

What do you mean exactly by "share a logical reset signal"? Would a case where one side uses synchronous reset and has a free-running clock, and another side uses asynchronous reset and has a clock that may be stopped indefinitely, count as sharing?

@awygle
Copy link
Collaborator

awygle commented Feb 7, 2020

What I mean is essentially that the write clock domain is never reset without the read clock domain also being reset. The scenario you describe would count provided that the side with the asynchronous reset could not miss a reset pulse asserted while its clock was stopped. That may or may not be achievable depending on the specific topology, which is why we'd have to impose restrictions in nMigen someplace, I think.

@whitequark
Copy link
Member

whitequark commented Feb 20, 2020

Per IRC discussion, we decided that it makes the most sense to change AsyncFIFO as follows:

  1. The read pointer is marked reset_less. This way, resetting the read domain will cause it to continue reading from the FIFO starting at whatever point it was when it was reset. This also eliminates a currently present CDC hazard, since resetting a gray counter violates its CDC invariant.
  2. The write domain reset is synchronized to the read domain. When it's asserted, the empty flag is overridden/asserted, and the read pointer is overridden/reset to the write pointer value. It's necessary to override the empty flag because changing a gray counter by more than 1 violates its CDC invariant. Because the read domain clock may be much slower than the write domain clock, it's necessary to use an asynchronous ResetSynchronizer here.

@awygle
Copy link
Collaborator

awygle commented Mar 5, 2020

Working through a design using an AsyncFIFO just now, I ran into the desire to also have a "write domain has been reset" signal in the read domain which I can use to reset a state machine that expects to be able to read a whole packet. This would be a purely informational signal, synchronous to the read domain, not an actual reset-reset. Thoughts on whether that makes sense?

@whitequark
Copy link
Member

whitequark commented Mar 5, 2020

Seems a bit niche, but I'm OK with having it. Naming bikeshed: w_rst input, r_rst output?

@awygle
Copy link
Collaborator

awygle commented Mar 5, 2020

Works for me.

@mithro
Copy link

mithro commented Mar 5, 2020

@awygle FYI IIRC the LiteX stream stuff uses "start of frame/packet" and "end of frame/packet" type signals for this. Then on a SOF signal you reset the receiver. There is some complications between a SOP signal signal the data in the fifo synchronization verses SOP being used to reset...

@whitequark
Copy link
Member

whitequark commented Mar 5, 2020

I think you still need a SOF signal once this issue is fixed, since a read domain reset doesn't flush the FIFO. Thus once the reader comes out of reset it'll have to realign to the framing somehow.

@awygle
Copy link
Collaborator

awygle commented Apr 9, 2020

Is this fixed now? We did make the described changes to resets.

@whitequark
Copy link
Member

whitequark commented Apr 9, 2020

Working through a design using an AsyncFIFO just now, I ran into the desire to also have a "write domain has been reset" signal in the read domain which I can use to reset a state machine that expects to be able to read a whole packet.

@awygle I think this part is missing.

@awygle
Copy link
Collaborator

awygle commented Apr 9, 2020

Ah, fair point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants