Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix in ringbufindex #2046

Merged
merged 5 commits into from
Mar 9, 2017
Merged

Conversation

yatch
Copy link
Contributor

@yatch yatch commented Jan 11, 2017

There are two bugs in ringbufindex. They are similar.

While ringbufindex_peek_put() is supposed to return the index of the next element to be added, it returns the next index of the adding element. ringbufindex_peek_get() returns the next index of the first element.

Travis got green on my account. However, 19-z1-rpl-tsch.csc.flaky failed, that is one of non-regression tests of TSCH. Since this one fails even on the tip of the master branch, I would say, this PR doesn't cause the failure...

A regression test for ringbufindex has been introduced under regression-tests/03-base as 04-ringbufindex. Without the fixes, two of them fails.

Random seed: 1
549000 node-1 Rime started with address 0.1.0.1.0.1.0.1
549000 node-1 MAC 00:01:00:01:00:01:00:01 sicslowpan/CSMA/nullrdc, channel check rate 1000 Hz
549000 node-1 Tentative link-local IPv6 address fe80:0000:0000:0000:0201:0001:0001:0001
549000 node-1 Tentative global IPv6 address fd00:0000:0000:0000:0201:0001:0001:0001
549000 node-1 Starting 'ringbuf-index.c test'
549000 node-1 =check-me= SUCEEDED - Init
549000 node-1 =check-me= SUCEEDED - Put
549000 node-1 =check-me= FAILED   - PeekPut: exit at L104
549000 node-1 =check-me= SUCEEDED - Get
549000 node-1 =check-me= FAILED   - PeekGet: exit at L168
549000 node-1 =check-me= SUCEEDED - Size
549000 node-1 =check-me= SUCEEDED - Elements
549000 node-1 =check-me= SUCEEDED - Full
549000 node-1 =check-me= SUCEEDED - Empty
TEST FAILED
Test ended at simulation time: 10149000
 FAIL ಠ_ಠ

All the tests pass on this branch.

$ make 04-ringbufindex.testlog
Running test 04-ringbufindex with random Seed 1: ..... OK

$ cat 04-ringbufindex.testlog
Random seed: 1
549000 node-1 Rime started with address 0.1.0.1.0.1.0.1
549000 node-1 MAC 00:01:00:01:00:01:00:01 sicslowpan/CSMA/nullrdc, channel check rate 1000 Hz
549000 node-1 Tentative link-local IPv6 address fe80:0000:0000:0000:0201:0001:0001:0001
549000 node-1 Tentative global IPv6 address fd00:0000:0000:0000:0201:0001:0001:0001
549000 node-1 Starting 'ringbuf-index.c test'
549000 node-1 Run unit-test
549000 node-1 ---
549000 node-1 =check-me= SUCEEDED - Init
549000 node-1 =check-me= SUCEEDED - Put
549000 node-1 =check-me= SUCEEDED - PeekPut
549000 node-1 =check-me= SUCEEDED - Get
549000 node-1 =check-me= SUCEEDED - PeekGet
549000 node-1 =check-me= SUCEEDED - Size
549000 node-1 =check-me= SUCEEDED - Elements
549000 node-1 =check-me= SUCEEDED - Full
549000 node-1 =check-me= SUCEEDED - Empty
549000 node-1 =check-me= DONE
TEST OK
Test ended at simulation time: 549000

@neophob
Copy link

neophob commented Jan 13, 2017

Great catch. I saw that the JN platform uses the ringbufindex code as well, what's the impact there?

@yatch
Copy link
Contributor Author

yatch commented Jan 13, 2017

@neophob I don't think this change has any impact on platform/jn516x/dev/micromac-radio.c.

To my understanding, you don't have any problem with this bug as long as you don't use an index returned by ringbufindex_get(). The JN platform gets an index with ringbufindex_peek_get() and uses ringbufindex_get() just to remove the first element in the ringbuf. In fact, ringbufindex_get() removes a different element from one returned by ringbufindex_peek_get(). But, still, consistency is maintained.

@simonduq
Copy link
Member

Right, if I'm correct it looks like everything was off-by-one, causing no problem (jn nor tsch), but that is definitely unwanted indeed :p
How does 19-z1-rpl-tsch.csc.flaky fail? It should randomly fail to link. But when it links, it normally runs successfully (the run is not flaky).

@yatch
Copy link
Contributor Author

yatch commented Jan 16, 2017

@simonduq Regarding 19-z1-rpl-tsch.csc.flaky, the number of "routing links" on node-1 doesn't get to 8 within the simulation time. I think I've found a bug, which is not in ringbufindex. I'll send a PR later.

@simonduq
Copy link
Member

Right, looking forward to the PR then :) I guess the bug simply slipped in now with the test disabled :/

@yatch
Copy link
Contributor Author

yatch commented Jan 18, 2017

@simonduq I've created a new PR, #2079. My conclusion is, 19-z1-rpl-tsch.csc.flaky fails because some RPL nodes fail to receive broadcast frames including DIOs. This is caused by a bug in the frame filtering feature of the simulated CC2420 device.

At first, I noticed that the test failed on the commit of bfef0b5, although it succeeded before that. So, I suspected the adaptive scheduling, but I was wrong. I guess, some timing change by bfef0b5 triggers the test failure by chance.

A TSCH node is initialized with frame filtering disabled. As long as the frame filtering is disabled, the node doesn't suffer from the bug of CC2420. The filter is enabled once it sends a unicast (ACK-required) frame. Therefore, if it receives a broadcasted DIO before sending any unicast frame, it can join a DODAG. Otherwise, it cannot not. This happens in a failed test of 19-z1-rpl-tsch.csc.flaky. I'm confused why the frame filtering needs to be turned on and off in TSCH, though.

Anyway, the failure of the test is nothing to do with ringbufindex.

@simonduq
Copy link
Member

Could it be the recent changes to MAC header that result in frames that the emulated cc2420 filters out?
Maybe it would be wiser to unset TSCH_HW_FRAME_FILTERING by default. It's already manually disabled on the cc2538 platforms in some project-conf.h files.

@yatch
Copy link
Contributor Author

yatch commented Jan 18, 2017

@simonduq In the first place, why does TSCH needs the frame filtering disabled during scanning and waiting for an ACK...? (it's totally off-topic, by the way... 😅)

@simonduq
Copy link
Member

On some platforms (at least cc2420, which was our initial development platform), the radio would filter out enhanced ACKs, probably due to the len different from 3.

Scanning should work both with and without filtering.

@yatch
Copy link
Contributor Author

yatch commented Jan 19, 2017

@simonduq Thank you for the information! Now I understand what the following comment means, that is found in tsch-slot-operation.c.

/* Entering promiscuous mode so that the radio accepts the enhanced ACK */

if(utp->result == unit_test_failure) {
printf("FAILED - %s: exit at L%u\n", utp->descr, utp->exit_line);
} else {
printf("SUCEEDED - %s\n", utp->descr);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: SUCEEDED -> SUCCEEDED

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've fixed this typo 😄

@simonduq
Copy link
Member

simonduq commented Mar 8, 2017

Good. I've used the fix in multiple testbeds, successfully.
Could you enable in Travis the test added in #2108 ?

@yatch
Copy link
Contributor Author

yatch commented Mar 9, 2017

@simonduq Sure, I can do that today :-)

@yatch
Copy link
Contributor Author

yatch commented Mar 9, 2017

Added one commit enabling TSCH regression testing. Travis ran the test and got green.

@simonduq
Copy link
Member

simonduq commented Mar 9, 2017

👍

@simonduq
Copy link
Member

simonduq commented Mar 9, 2017

Thanks!

@simonduq simonduq merged commit a24ac86 into contiki-os:master Mar 9, 2017
@yatch yatch deleted the pr/ringbufindex-bugfix branch March 9, 2017 12:07
alexrayne pushed a commit to alexrayne/contiki that referenced this pull request Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants