
Regression Bug in AXI-Stream slave when upgrading from v4.0.8 #573

Closed
FranzForstmayr opened this issue Oct 23, 2019 · 37 comments · Fixed by #858
Comments

@FranzForstmayr
Contributor

FranzForstmayr commented Oct 23, 2019

Hi,
I hit a regression when upgrading VUnit from v4.0.8 to v4.2.0.
With git bisect I found the offending commit, 3e098f5, merged in #429.

When using the axi-stream slave, the ready signal starts toggling.
[waveform screenshot: tready toggling]

The same issue shows up in the axi_stream_slave: the tests still pass, but they take approximately twice as long. The offending line is https://github.com/VUnit/vunit/blob/master/vunit/vhdl/verification_components/src/axi_stream_slave.vhd#L95; however, I haven't dug deep enough to find out why this change was made.

@umarcor
Member

umarcor commented Oct 23, 2019

FTR, other changes to AXI-Stream a few days before #429 introduced another partial regression, as explained in #494. Does GitHub make it easy to compare v4.0.8 and v4.2.0 of only the subdirectory https://github.com/VUnit/vunit/tree/master/vunit/vhdl/verification_components/src?

@eschmidscs
Contributor

We hit something similar some time ago.
You are probably using the pop procedure with direct response variables, or the check procedure with blocking => true.
In that case the slave takes one message, handles it, and goes back to idle. Going back to idle causes ready to go low for one cycle.
You either have to use pop with references (as sketched below) or check with blocking => false.

That was different back then, probably before the merge you mentioned, but the process structure in the slave has been changed since.
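For example, using pop with a reference (a minimal sketch; declarations are omitted, ref is an axi_stream_reference_t, and data/last are variables matching the bus widths):

    pop_axi_stream(net, slave_axi_stream, ref);        -- non-blocking: the request is only queued
    -- further pops could be issued here before consuming any reply
    await_pop_axi_stream_reply(net, ref, data, last);  -- blocks until that transfer has been received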

@eine
Collaborator

eine commented Nov 9, 2020

I created a synthetic example in https://github.com/eine/vunit/tree/axis/synthetic/examples/vhdl/array_axis_vcs/src/test, in an attempt to better understand this issue. According to previous comments here and in #673, the blocking behaviour is produced by pop_axi_stream in https://github.com/eine/vunit/blob/axis/synthetic/examples/vhdl/array_axis_vcs/src/test/tb_axis_loop.vhd#L113 (that is, by the API of the AXI Stream Slave verification component).

In #673, @FranzForstmayr proposed using a queue (https://github.com/VUnit/vunit/pull/673/files#diff-f2d4d6a3240eda80ddc8a14823d070f532d73aa493d7680a449d846445014477R113-R121), which means using pop_axi_stream with references and an intermediate queue, i.e. the approach suggested above. Hence, Franz's solution does not really fix the issue, but works around it by introducing a "throughput" adapter: the queue is effectively a FIFO.

On the one hand, I am concerned about the apparent inconsistency between push_axi_stream and pop_axi_stream: the former does not introduce a cycle, while the latter does. I think it is acceptable for both of them to introduce a cycle (should there be technical constraints that require it), but the inconsistency is confusing. @olafvandenberg, since you implemented #429, do you mind providing some insight?

On the other hand, new_axi_stream_slave and new_axi_stream_master accept actor as an optional argument. This can be used for embedding queues into the verification components:

  constant master_axi_stream : axi_stream_master_t := new_axi_stream_master (data_length => data_width, actor => new_actor(inbox_size => 64, outbox_size => 64));
  constant slave_axi_stream  : axi_stream_slave_t  := new_axi_stream_slave  (data_length => data_width, actor => new_actor(inbox_size => 64, outbox_size => 64));

Therefore, having to use an external queue in order to use pop_axi_stream by reference feels non-idiomatic to me. If a queue is necessary, it should be internal to the VC.

/cc @LarsAsplund

@olafvandenberg

I'm not sure I fully understand the issue.
The pop_axi_stream with a reference can be fully queued up, without any cycles in between.
When using the pop with a directly awaited reply, there is indeed a low cycle, so back-to-back transfers do not work in that case, I guess.
Is that the issue?

The changes at that time (if I remember correctly) were made to align to the clock input of the VC. Before the changes, the VC could become asynchronous when using a wait-for-time message, I think.
I use the wait-for-time a lot to generate randomized / bursty data traffic.

Having the queue outside of the VC was, I think, already the case in the original version.
It follows the same style as the stream slave VC. If the different style (passing an actor with queue sizes) is going to be used, it should be adapted in every VC that uses this style.

@eine
Collaborator

eine commented Nov 11, 2020

I'm not sure I fully understand the issue.
The pop_axi_stream with a reference can be fully queued up, without any cycles in between.
When using the pop with a directly awaited reply, there is indeed a low cycle, so back-to-back transfers do not work in that case, I guess.
Is that the issue?

The issue is the different behaviour of pop_axi_stream, depending on the overload.

My understanding is that the two await_pop_axi_stream_reply variants are the ones that should introduce a low cycle:

  • procedure await_pop_axi_stream_reply(
    signal net : inout network_t;
    variable reference : inout axi_stream_reference_t;
    variable tdata : out std_logic_vector;
    variable tlast : out std_logic;
    variable tkeep : out std_logic_vector;
    variable tstrb : out std_logic_vector;
    variable tid : out std_logic_vector;
    variable tdest : out std_logic_vector;
    variable tuser : out std_logic_vector
    );
  • procedure await_pop_axi_stream_reply(
    signal net : inout network_t;
    variable reference : inout axi_stream_reference_t;
    variable tdata : out std_logic_vector;
    variable tlast : out std_logic
    );

But the three pop_axi_stream overloads should behave equally and not introduce a low cycle.
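For reference, these are the three overloads in question (paraphrased from axi_stream_pkg; the exact argument names may differ slightly):

  • procedure pop_axi_stream(
    signal net : inout network_t;
    axi_stream : axi_stream_slave_t;
    variable reference : inout axi_stream_reference_t
    );
  • procedure pop_axi_stream(
    signal net : inout network_t;
    axi_stream : axi_stream_slave_t;
    variable tdata : out std_logic_vector;
    variable tlast : out std_logic;
    variable tkeep : out std_logic_vector;
    variable tstrb : out std_logic_vector;
    variable tid : out std_logic_vector;
    variable tdest : out std_logic_vector;
    variable tuser : out std_logic_vector
    );
  • procedure pop_axi_stream(
    signal net : inout network_t;
    axi_stream : axi_stream_slave_t;
    variable tdata : out std_logic_vector;
    variable tlast : out std_logic
    );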

So, the question is whether the different behaviour of pop_axi_stream corresponds to some limitation of how you implemented the stall feature, or whether it was an unwanted side effect.

@olafvandenberg

OK, I get it.
It is an unwanted side effect of the change needed for the clock re-alignment.
I'll have a look today to see if I can think of a proper fix for it.

@olafvandenberg

So I had a closer look, and I can't seem to reproduce this behavior.
There is a small glitch on ready if you do back-to-back transfers while using the blocking version of the pop_axi_stream procedure.
We can fix that easily with a small change, if necessary.
I do see the low cycle when there is no new message in the queue before the reply is sent, so that case is not truly back-to-back.
For example:

for i in 3 to 14 loop
  if i = 14 then
    last := '1';
  end if;
  pop_axi_stream(net, slave_axi_stream,
                 tdata => data,
                 tlast => last,
                 tkeep => keep,
                 tstrb => strb,
                 tid   => id,
                 tdest => dest,
                 tuser => user);
end loop;

The code above does not produce low cycles, only a small glitch on ready, which can be removed by lowering ready after sending the reply, and only if the queue is empty.
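A rough sketch of that kind of change inside the slave's bus process (names such as message_queue, msg and reply_msg are assumptions for illustration, not the actual VUnit source):

    reply(net, msg, reply_msg);        -- send the pop/check reply back to the caller
    if is_empty(message_queue) then    -- no further request pending:
      tready <= '0';                   -- stall until the next request arrives
    end if;
    -- if another request is already queued, tready simply stays high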

[waveform screenshot: glitches on ready, no low cycles]

In the picture above, the first set of transfers uses check_axi_stream with blocking => false.
The second set of transfers uses the blocking pop_axi_stream.

The code below does give a low cycle, but, as stated, it is not truly back-to-back:

for i in 3 to 14 loop
  if i = 14 then
    last := '1';
  end if;
  pop_axi_stream(net, slave_axi_stream,
                 tdata => data,
                 tlast => last,
                 tkeep => keep,
                 tstrb => strb,
                 tid   => id,
                 tdest => dest,
                 tuser => user);

  wait for 1 ps;
end loop;

[waveform screenshot: ready going low between transfers]

Removing the glitches by moving the tready <= '0' assignment in the AXI-Stream slave:
[waveform screenshot: glitches removed]

@eschmidscs
Contributor

Sorry for chiming in again here, but this looks really interesting. I am pretty sure that we did not see this behaviour here.
What simulator are you using?
And which testbench? Can I try to run that one here?
What happens if you replace the "wait for 1 ps" in the 50%-duty-cycle example with "wait for 0 ps" (i.e. the kind of zero-time wait used for communication with VUnit)?

@olafvandenberg

I'm using ModelSim PE 10.7b.
For the test I made a small modification to tb_axi_stream.vhd:
I modified the "test back-to-back passing check" test, simply duplicating it and replacing the check_axi_stream with a pop_axi_stream:

    elsif run("test back-to-back passing check") then
      wait until rising_edge(aclk);
      timestamp := now;

      last := '0';
      for i in 3 to 14 loop
        if i = 14 then
          last := '1';
        end if;
        push_axi_stream(net, master_axi_stream,
                        tdata => std_logic_vector(to_unsigned(i, 8)),
                        tlast => last,
                        tkeep => "1",
                        tstrb => "1",
                        tid => std_logic_vector(to_unsigned(42, id'length)),
                        tdest => std_logic_vector(to_unsigned(i+1, dest'length)),
                        tuser => std_logic_vector(to_unsigned(i*2, user'length)));
      end loop;

      last := '0';
      for i in 3 to 14 loop
        if i = 14 then
          last := '1';
        end if;
        check_axi_stream(net, slave_axi_stream,
                         expected => std_logic_vector(to_unsigned(i, 8)),
                         tlast => last,
                         tkeep => "1",
                         tstrb => "1",
                         tid => std_logic_vector(to_unsigned(42, id'length)),
                         tdest => std_logic_vector(to_unsigned(i+1, dest'length)),
                         tuser => std_logic_vector(to_unsigned(i*2, user'length)),
                         msg  => "check blocking",
                         blocking  => false);
      end loop;

      check_equal(now, timestamp, result(" setting up transaction stalled"));

      wait_until_idle(net, as_sync(slave_axi_stream));
      check_equal(now, timestamp + (12+1)*10 ns, " transaction time incorrect");

      -- Repeat while using blocking pop_axi_stream to check issue #573
      wait until rising_edge(aclk);
      timestamp := now;

      last := '0';
      for i in 3 to 14 loop
        if i = 14 then
          last := '1';
        end if;
        push_axi_stream(net, master_axi_stream,
                        tdata => std_logic_vector(to_unsigned(i, 8)),
                        tlast => last,
                        tkeep => "1",
                        tstrb => "1",
                        tid => std_logic_vector(to_unsigned(42, id'length)),
                        tdest => std_logic_vector(to_unsigned(i+1, dest'length)),
                        tuser => std_logic_vector(to_unsigned(i*2, user'length)));
      end loop;

      last := '0';
      for i in 3 to 14 loop
        if i = 14 then
          last := '1';
        end if;
        pop_axi_stream(  net, slave_axi_stream,
                         tdata => data,
                         tlast => last,
                         tkeep => keep,
                         tstrb => strb,
                         tid   => id,
                         tdest => dest,
                         tuser => user
                      );
      end loop;

      check_equal(now, timestamp, result(" setting up transaction stalled"));

      wait_until_idle(net, as_sync(slave_axi_stream));
      check_equal(now, timestamp + (12+1)*10 ns, " transaction time incorrect");

Note that the check on time will fail, since the blocking pop consumes time.

@eine
Collaborator

eine commented Nov 12, 2020

So I had a closer look, and I can't seem to reproduce this behavior.

Can you please try the array_axis_vcs example? The following waveform was generated with GHDL (python3 run.py -g) on Windows (MSYS2/MINGW64).

[waveform screenshot: GHDL run of the array_axis_vcs example]

Note: that waveform also shows 5% stalls for both the master and the slave.

The modification in https://github.com/VUnit/vunit/pull/673/files#diff-f2d4d6a3240eda80ddc8a14823d070f532d73aa493d7680a449d846445014477 shows that the UUT can handle back-to-back transfers. However, the current example using a single for loop does introduce dead cycles.

The code above does not produce low cycles, only a small glitch on ready, which can be removed by lowering ready after sending the reply, and only if the queue is empty.

Maybe this is a difference in behaviour between ModelSim and GHDL? That is, where ModelSim shows a glitch, GHDL shows a full cycle?

Or is it because of https://github.com/VUnit/vunit/blob/master/examples/vhdl/array_axis_vcs/src/test/tb_axis_loop.vhd#L122 ?

@eschmidscs
Contributor

I tried to run the same test as @olafvandenberg in Questa Sim 2019.2_2. The result is clearly 50% duty cycle:
[waveform screenshot: 50% duty cycle in Questa]

This is starting to get scary...

@eschmidscs
Contributor

eschmidscs commented Nov 12, 2020

... on the other hand: whether the result is 100% or 50% duty cycle depends on the loop exit in axi_stream_slave. It exits if the queue is empty, and then it waits for the next clock edge, causing the 50% duty cycle.
Whether the queue is empty depends on the message sequence, and that sequence ultimately depends on delta-cycle ordering, which AFAIK is not specified by the IEEE standard. So you end up with simulator-dependent behaviour.

@olafvandenberg

olafvandenberg commented Nov 12, 2020

I pulled the axis/synthetic branch from eine's fork and ran the array_vcs example tb_axis_loop.
I had to comment out some PSL stuff in the FIFO that breaks compilation on my side.
The result is shown below:
[waveform screenshot: glitches, but no low cycles]
As you can see, there are glitches, but no low cycles.

I wonder if it might have something to do with delta cycles.

Update: @eschmidscs I didn't see your latest reply, but indeed I think you are correct.

@olafvandenberg

olafvandenberg commented Nov 12, 2020

The delta-cycle explanation is proven, in my opinion.
I added a delta cycle by inserting a wait for 0 ps; right before the tready <= '0' on line 106 of axi_stream_slave.vhd, and I get a 50% duty cycle:

[waveform screenshot: 50% duty cycle]

Expanding to delta cycles for the 50% duty-cycle case:
[waveform screenshot: delta-cycle expansion of the 50% duty-cycle case]

The same for the glitch:
[waveform screenshot: delta-cycle expansion of the glitch]

@umarcor
Member

umarcor commented Nov 17, 2020

FTR, as commented in ghdl/ghdl#752, it seems that this clock-alignment issue might have further effects beyond the glitch or the additional cycle. The workaround in #494 (comment) should provide control over the sizes of the buffers, but it does not seem to work. Unfortunately, I did not get to understand what's going on...

@hcommin

hcommin commented Jan 5, 2022

I have also run into this issue. I am seeing different behavior even between simulators from the same vendor (Mentor).

Using several different versions of Modelsim (e.g. Modelsim AE 2020.1), I see the 100% duty cycle behavior (ignoring a single idle cycle near the start, which is not a problem in my case):

[waveform screenshot: 100% duty cycle in ModelSim]

Using Questa FE 2021.2, I see the 50% duty cycle behavior (which is a problem in my case):

[waveform screenshot: 50% duty cycle in Questa FE]

This is causing quite a headache. Has anyone found a workaround? I tried inserting large numbers of delta cycles here and there, but I wasn't able to change the behavior.

@eschmidscs
Contributor

The workaround: in a first loop, you pop as many references as you expect transactions, using the corresponding pop_axi_stream procedure, and store them in a queue. This does not consume simulation time.
Then you add a second loop that iterates over the same number, each time taking a reference from the queue and calling await_pop_axi_stream_reply with it, as sketched below.
This requires a bit more memory but decouples the AXIS behaviour from the message passing.
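A minimal sketch of that pattern, using an array instead of a queue for brevity (num_beats, the array type and the variable names are illustrative; tkeep/tstrb/tid/tdest/tuser are omitted):

    -- declared in the test process:
    --   type axi_stream_reference_array_t is array (natural range <>) of axi_stream_reference_t;
    --   variable refs  : axi_stream_reference_array_t(0 to num_beats - 1);
    --   variable tdata : std_logic_vector(data_width - 1 downto 0);
    --   variable tlast : std_logic;

    -- first loop: issue all pop requests up front; this consumes no simulation time
    for i in refs'range loop
      pop_axi_stream(net, slave_axi_stream, refs(i));
    end loop;

    -- second loop: collect the replies in order
    for i in refs'range loop
      await_pop_axi_stream_reply(net, refs(i), tdata, tlast);
    end loop;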

@tasgomes

tasgomes commented Jan 6, 2022

@eschmidscs that approach will solve the 50% duty-cycle behavior by providing 100% instead. But I am more concerned about the fact that different simulators behave differently (ModelSim vs GHDL/Questa).

If we clearly state that for 100% throughput one should use the non-blocking pop_axi_stream, and that the blocking pop_axi_stream cannot provide 100% throughput, then perhaps we should actually force TREADY low for one clock cycle and not rely on simulator behavior during delta cycles.

That way the behavior of the blocking pop_axi_stream will be the same across different simulators.

@eschmidscs
Contributor

@tasgomes There seem to be two different questions here. I tried to answer the question about the workaround @hcommin was asking for.
The question you are raising is about consistent behaviour, which would not be a workaround anymore but a real fix. There are some interesting thoughts in this.

If you look at #477, the blocking nature is assumed to be the standard case. From that point of view, I agree with your statement that TREADY should always go low after one transaction.

Personally, I would prefer to have a pop procedure that does not always cause a stall, but this does not work with the current VC architecture.

@tasgomes

tasgomes commented Jan 6, 2022

@eschmidscs Thanks for the reply. I think we are in sync. Ideally the pop procedure should allow 100% throughput, but if that is not possible, I would force TREADY low for a complete clock cycle in the AXI-Stream component, just to make the behavior consistent across simulators.

@olafvandenberg

olafvandenberg commented Jan 11, 2022

I would like to find a proper fix for this issue.
In my opinion the difficulty is that, in the current setup, you can't predict whether or not there will be another transaction "immediately" after sending the reply.
So there is no other option than to send the reply, check if there is something in the queue and, if there isn't, block until a new event occurs.
Would it be an option to extend the interface with something that indicates that another transaction will follow, so that the bus process should not start blocking on the clk event just yet?

Edit:
Something like an extra boolean flag in the pop_axi_stream calls, with a default value, that gives either a forced 50% throughput behavior (to remove the simulator-dependent behavior) or 100% throughput, in which case more pops are expected immediately after the reply while correct clock alignment is maintained; see the sketch below.
It could perhaps also be a separate call, like 'open session' -> pop, pop, pop -> 'close session'.
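Purely as an illustration of the flag idea (expect_more is a hypothetical parameter, not an existing VUnit API):

    for i in 0 to num_beats - 1 loop
      pop_axi_stream(net, slave_axi_stream,
                     tdata => data,
                     tlast => last,
                     expect_more => i /= num_beats - 1);  -- keep tready high while more pops follow
    end loop;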

I would like to hear your view(s) on this.

@eschmidscs
Contributor

@olafvandenberg Without doing any proper code study: what happens if you add a "wait for 0 ps;" before pulling TREADY low, but then check again for pop transactions before stalling?
That would allow the message passing to run before the stall, so depending on other waits, the second check might actually see the new pop request. But there might be many side effects...

@olafvandenberg

@eschmidscs I think that would give consistent 50% behavior, so always >= 1 low cycle (as mentioned in one of my earlier posts).
The check for pop transactions is, in this case, actually done by the queue-empty check, if I'm correct.
Perhaps a forced delta cycle after the reply might do something.
I will try to create a test setup to examine the behavior.

@eschmidscs
Contributor

@olafvandenberg That is what I intended: a forced delta cycle by the wait, but then another check of the queue. Did your test above with the wait already check the queue again after the wait?

@olafvandenberg

@eschmidscs I believe so, as the forced delta would be inside the while loop on 'not is_empty(message_queue)'.
I think the problem is that the blocking pop is made of a non-blocking pop followed by a wait for the reply, which blocks (sketched below).
The pop message is sent to the VC actor, arrives in the main process, and is forwarded to the bus process by means of the queue.
This queue is checked for emptiness, but I assume it is very unlikely that the next pop message is already present in this queue, since (in user code) it is sent only after receiving the reply, which in turn is sent by the bus process.
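Conceptually, the blocking overload behaves as if it were written like this (a sketch of the composition described above; the real code lives in axi_stream_pkg and may differ in detail):

    procedure blocking_pop_sketch(
      signal net : inout network_t;
      axi_stream : axi_stream_slave_t;
      variable tdata : out std_logic_vector;
      variable tlast : out std_logic
    ) is
      variable reference : axi_stream_reference_t;
    begin
      pop_axi_stream(net, axi_stream, reference);                -- non-blocking: queue the request
      await_pop_axi_stream_reply(net, reference, tdata, tlast);  -- block until the bus process replies
    end procedure;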

If there were prior knowledge that a new pop message will come, we could wait for that instead of waiting on a clk'event and a non-empty queue
(using a notify signal similar to the signal back to the main process).

@eschmidscs
Contributor

eschmidscs commented Jan 11, 2022

The problem with prior knowledge is always that it is not readily available. The workaround of popping references and then waiting for them generates exactly this prior knowledge: you tell the VC how many transactions to expect.
I fear that another notify signal will just result in a similar kind of uncertainty.

@eschmidscs
Contributor

@olafvandenberg Just for the record: my idea with "wait for 0 ps" cannot work, because the same statement is probably used in the message passing (I didn't check). So it won't help to push out the decision until "all" message passing has been done...

@olafvandenberg

@eschmidscs Yeah, I checked, and I am able to fix the issue by adding two consecutive 'wait for 0 ps;' statements right after the reply() call, as sketched below.
I believe a separate notify will give a more reliable solution.
I will continue with this on Thursday.
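A sketch of the experiment, inside the slave's bus process (the msg / reply_msg names are assumptions, not the actual source):

    reply(net, msg, reply_msg);
    wait for 0 ps;  -- let the reply be delivered to the waiting await_pop_axi_stream_reply
    wait for 0 ps;  -- give the test process a chance to issue the next pop before the queue is checked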

olafvandenberg pushed a commit to olafvandenberg/vunit that referenced this issue Jan 13, 2022
@olafvandenberg

I pushed a possible fix to a branch on my fork.
In my opinion it forces a suspend (over delta cycles) until the reply message has been handled, because of the signal assignment (I can't say I'm very experienced with delta cycles, though, so please review).
@hcommin could you give this possible fix a try?

@eschmidscs
Contributor

Interesting thought... But is that different from "wait for 0 ps"? I think both will just add some delta cycles, and if the message passing requires more, the message will arrive later and the stall will still happen.
Any simulator writers listening? ;)

@hcommin

hcommin commented Jan 14, 2022

@olafvandenberg I am hoping to find time to look into this properly today.

If I understand correctly, my rough expectations should be:

  • 100% duty cycle is possible (and consistent across simulators) using the "non-blocking" procedure, but with the caveat that I must know in advance how many data beats I will receive on that bus.
  • After your proposed changes, 50% duty cycle will be consistent across simulators using the "blocking" procedure. And I don't need to know in advance how many data beats I will receive.

I suspect my current use case falls into the awkward gap (I need 100% duty cycle and it is very difficult to calculate in advance how many data beats will be received)... but I'll look into it.

@olafvandenberg

The possible fix should give 100% no matter the usage (perhaps somebody with more experience in how simulators schedule events can say whether the fix will always work, though).
For me it works in ModelSim, but perhaps it doesn't work in your setup.

@tasgomes

I found some time to test the fix proposed by @olafvandenberg.

I have an axi_stream_master directly connected to an axi_stream_slave. Then I executed this code:

for beat in 0 to 19 loop
  push_axi_stream(net, AxiStreamMaster_c, m_tdata_v, m_tlast_v, m_tkeep_v,
                  m_tstrb_v, m_tid_v, m_tdest_v, m_tuser_v);
end loop;

for beat in 0 to 19 loop
  pop_axi_stream(net, AxiStreamSlave_c, s_tdata_v, s_tlast_v, s_tkeep_v,
                 s_tstrb_v, s_tid_v, s_tdest_v, s_tuser_v);
end loop;

Before the fix

Modelsim: 100% duty-cycle but there are glitches.
[waveform screenshot]

Questa: 50% duty-cycle with no glitches.
[waveform screenshot]

After the fix

Modelsim: 100% duty-cycle with no glitches.
[waveform screenshot]

Questa: 100% duty-cycle with no glitches.
[waveform screenshot]

I think the fix works fine. We get 100% duty-cycle, no glitches at all and the same behavior across different simulators.
@olafvandenberg push it :)

@eschmidscs
Contributor

Cool!
@olafvandenberg Could you add a test for this? It seems that people want this, and it broke before. I guess checking the reception time would do the job.

@umarcor
Member

umarcor commented Jan 18, 2022

@tasgomes, could you please confirm with GHDL as well? As reported above, there have been inconsistency issues between GHDL and ModelSim.

@tasgomes

tasgomes commented Jan 18, 2022

Ouch! I don't have GTKWave installed to quickly look at the waveforms. Anyway, I generated a VCD file with GHDL, converted the VCD file to a WLF file, and then opened the waveform in ModelSim. Unfortunately, it seems that with GHDL we still have the 50% duty cycle.

After the fix

GHDL: 50% duty-cycle with no glitches.
[waveform screenshot]

@umarcor
Member

umarcor commented May 12, 2022

Maybe 333c465 (#642) had some impact on this issue?
