gnrc_ndp: don't let addresses timeout #5309

Merged
merged 1 commit into from
Apr 20, 2016

miri64
Member

@miri64 miri64 commented Apr 13, 2016

This is a temporary quick-fix for #5122 to not have GUAs removed on an interface.
It solves the issue by both not letting the registration run out on the router and by not letting the lifetime of an auto-configured address expire.

I haven't found the time to test this yet, so this is why I marked it as WIP for now.
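
To make the idea concrete, here is a minimal Python sketch of what "not letting the lifetime expire" amounts to. It is not the actual patch: `AddressEntry`, `is_expired` and `pin_lifetime` are hypothetical names used only for illustration. The one external fact it relies on is that an all-ones 32-bit lifetime means "infinite" in IPv6 Neighbor Discovery (RFC 4861).

```python
# Illustrative model only; the names below are hypothetical and do not
# correspond to identifiers in the RIOT code base.
INFINITE_LIFETIME = 0xFFFFFFFF  # all-ones lifetime = never expires (RFC 4861)


class AddressEntry:
    """An auto-configured address with a valid lifetime in seconds."""

    def __init__(self, addr, valid_lifetime_s, configured_at_s=0):
        self.addr = addr
        self.valid_lifetime_s = valid_lifetime_s
        self.configured_at_s = configured_at_s

    def is_expired(self, now_s):
        # An infinite lifetime short-circuits the timeout check entirely.
        if self.valid_lifetime_s == INFINITE_LIFETIME:
            return False
        return now_s - self.configured_at_s >= self.valid_lifetime_s


def pin_lifetime(entry):
    """Quick-fix behaviour: treat the address as valid forever."""
    entry.valid_lifetime_s = INFINITE_LIFETIME
```

With a finite lifetime the entry would be dropped once the timer runs out; after `pin_lifetime` the expiry check can never fire, which is the trade-off this temporary fix accepts.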

@miri64 miri64 added the Type: bug, Area: network, State: WIP, and GNRC labels Apr 13, 2016
@miri64 miri64 added this to the Release 2016.04 milestone Apr 13, 2016
@haukepetersen
Contributor

Not at FU today, will test tomorrow to see if this fixes the issue for my setup. Otherwise this (of course temporary) fix seems good to me, better than losing connection all the time...

@haukepetersen
Contributor

haukepetersen commented Apr 15, 2016

OK, at first sight this seems to do the job, but after ~30 min the connection between the Linux host and the remote node is still lost. The ncache entry for the global address of the border router node is dropped on the remote node... Pinging between border router and remote node using link-local addresses still works, so it is again the global address that is dropped from the ncache.

EDIT: forget about below, I tried to ping the wrong global address (its own), so this is caused by the known issue about pinging itself...
~~And one more interesting observation: this is what the border router node did after the connection was lost (~30 min after startup), when trying to figure out its state:~~

> ncache
ncache
IPv6 address                    if  L2 address                state       type
------------------------------------------------------------------------------
fe80::3432:4833:46da:ac2a        6  36:32:48:33:46:da:ac:2a   STALE       REG
fe80::1cee:82ff:fe11:327         7  1e:ee:82:11:03:27         STALE       -
fe80::1                          7  1e:ee:82:11:03:27         REACHABLE   -
> ping6 2001:affe::585a:615d:a451:cbd6
ping6 2001:affe::585a:615d:a451:cbd6
0x2ddf
*** RIOT kernel panic:
FAILED ASSERTION.

    pid | name                 | state    Q | pri | stack ( used) | location   
      1 | idle                 | pending  Q |  15 |   256 (  128) | 0x20000e34 
      2 | main                 | pending  Q |   7 |  1536 (  844) | 0x20000f34 
      3 | 6lo                  | bl rx    _ |   3 |  1024 (  468) | 0x20003c50 
      4 | ipv6                 | running  Q |   4 |  1024 (  652) | 0x200032c4 
      5 | udp                  | bl rx    _ |   5 |  1024 (  348) | 0x20004468 
      6 | at86rf2xx            | bl rx    _ |   3 |  1024 (  432) | 0x20002e44 
      7 | gnrc_ethos           | bl rx    _ |   3 |  1024 (  344) | 0x20000224 
      8 | uhcp                 | bl mutex _ |   6 |  1536 (  976) | 0x2000486c 
        | SUM                  |            |     |  8448 ( 4192)

*** halted.

not good...

@haukepetersen
Contributor

Another test, pinging the Linux host from the remote node:

2016-04-15 11:44:00,229 - INFO # 1000 packets transmitted, 996 received, 1% packet loss, time 1100.06765163 s
2016-04-15 11:44:00,230 - INFO # rtt min/avg/max = 22.929/60.814/196.610 ms
...
2016-04-15 12:02:42,066 - INFO # --- 2001:638:80a:105:a46b:1b44:a737:7c46 ping statistics ---
2016-04-15 12:02:42,068 - INFO # 1000 packets transmitted, 997 received, 1% packet loss, time 1073.06350377 s
2016-04-15 12:02:42,069 - INFO # rtt min/avg/max = 0.776/37.931/196.609 ms
...
2016-04-15 12:20:07,041 - INFO # 12 bytes from 2001:638:80a:105:a46b:1b44:a737:7c46: id=86 seq=605 hop limit=63 time = 23.582 ms
2016-04-15 12:20:08,197 - INFO # 12 bytes from 2001:638:80a:105:a46b:1b44:a737:7c46: id=86 seq=606 hop limit=63 time = 153.775 ms
2016-04-15 12:20:09,311 - INFO # 12 bytes from 2001:638:80a:105:a46b:1b44:a737:7c46: id=86 seq=607 hop limit=63 time = 65.535 ms
2016-04-15 12:20:11,312 - INFO # ping timeout
2016-04-15 12:20:13,313 - INFO # ping timeout
2016-04-15 12:20:15,356 - INFO # ping timeout
2016-04-15 12:20:17,357 - INFO # ping timeout
2016-04-15 12:20:19,358 - INFO # ping timeout
2016-04-15 12:20:21,359 - INFO # ping timeout
2016-04-15 12:20:23,417 - INFO # ping timeout
2016-04-15 12:20:25,448 - INFO # ping timeout
--> broken from here on, so worked for ~35min

@haukepetersen
Contributor

by the way, setup is: remote node: iotlab-m3, border router: samr21 connected to my Mint desktop PC via ethos.

@haukepetersen
Contributor

haukepetersen commented Apr 15, 2016

This is freaky. I let the previous run continue (ping 1000 ...), and after some minutes (~11 min) the connection seemed to be good again:

...
2016-04-15 12:31:04,579 - INFO # ping timeout
2016-04-15 12:31:06,580 - INFO # ping timeout
2016-04-15 12:31:08,607 - INFO # ping timeout
2016-04-15 12:31:10,608 - INFO # ping timeout
2016-04-15 12:31:11,737 - INFO # 12 bytes from 2001:638:80a:105:a46b:1b44:a737:7c46: id=86 seq=936 hop limit=63 time = 118.632 ms
2016-04-15 12:31:12,809 - INFO # 12 bytes from 2001:638:80a:105:a46b:1b44:a737:7c46: id=86 seq=937 hop limit=63 time = 24.118 ms
2016-04-15 12:31:13,859 - INFO # 12 bytes from 2001:638:80a:105:a46b:1b44:a737:7c46: id=86 seq=938 hop limit=63 time = 24.757 ms
2016-04-15 12:31:14,907 - INFO # 12 bytes from 2001:638:80a:105:a46b:1b44:a737:7c46: id=86 seq=939 hop limit=63 time = 24.570 ms
...
2016-04-15 12:32:20,467 - INFO # --- 2001:638:80a:105:a46b:1b44:a737:7c46 ping statistics ---
2016-04-15 12:32:20,468 - INFO # 1000 packets transmitted, 672 received, 33% packet loss, time 1381.06607540 s
2016-04-15 12:32:20,469 - INFO # rtt min/avg/max = 0.790/36.610/262.145 ms
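
For anyone who wants to measure such a gap from a saved log instead of eyeballing it, the following is a small sketch. It assumes the line format shown above (a 23-character timestamp prefix, then either `ping timeout` or an `... bytes from ...` reply); `outage_window` is a made-up helper name, not part of any RIOT tooling.

```python
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
STAMP_LENGTH = 23  # e.g. "2016-04-15 12:20:11,312"


def outage_window(lines):
    """Return (first_timeout, first_reply_after_it) or None if no outage ended.

    Lines without a parsable timestamp (such as "...") are skipped.
    """
    start = None
    for line in lines:
        try:
            stamp = datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
        except ValueError:
            continue
        if "ping timeout" in line:
            if start is None:
                start = stamp  # remember where the outage began
        elif "bytes from" in line and start is not None:
            return start, stamp  # first reply after the outage
    return None


log = [
    "2016-04-15 12:20:09,311 - INFO # 12 bytes from 2001:638:80a:105:a46b:1b44:a737:7c46: id=86 seq=607 hop limit=63 time = 65.535 ms",
    "2016-04-15 12:20:11,312 - INFO # ping timeout",
    "...",
    "2016-04-15 12:31:10,608 - INFO # ping timeout",
    "2016-04-15 12:31:11,737 - INFO # 12 bytes from 2001:638:80a:105:a46b:1b44:a737:7c46: id=86 seq=936 hop limit=63 time = 118.632 ms",
]
start, end = outage_window(log)
print((end - start).total_seconds())  # roughly 660 s, i.e. about 11 min
```

That figure also fits the final statistics: 1000 - 672 = 328 lost pings at about 2 s per timeout is roughly 656 s.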

@miri64
Member Author

miri64 commented Apr 17, 2016

Only more reason to overhaul the ND... I have absolutely no idea how this happens... Will look into it the coming week.

@miri64
Member Author

miri64 commented Apr 17, 2016

EDIT: forget about below, I tried to ping the wrong global address (its own), so this is caused by the known issue about pinging itself...

Can you refer to the issue, please? This is new information for me. Edit: Got it, the issue fixed in #5326, sorry.

@OlegHahm
Member

#5279

@miri64
Member Author

miri64 commented Apr 17, 2016

Yepp, sorry did not see the relation ;-)

@miri64
Member Author

miri64 commented Apr 17, 2016

I tested it too now (continuous pinging of the 6LBR's GUA from a 6LR, breakpoint into gnrc_ipv6_nc_remove on the 6LR). I lost USB connectivity somehow (since both nodes were lost I believe it to be more of a host system problem than a RIOT problem), but before that the nodes ran for ~50 min without any issues.

@miri64 miri64 removed the State: WIP label Apr 17, 2016
@miri64 miri64 changed the title from "gnrc_ndp: don't don't let addresses timeout" to "gnrc_ndp: don't let addresses timeout" Apr 17, 2016
@miri64
Member Author

miri64 commented Apr 17, 2016

Repeated it, and it has now been running for >1h with no sign of weakness... Sorry @haukepetersen, I can't reproduce the behavior you observed.

@haukepetersen
Contributor

don't be sorry, that's rather a good thing, right? I will start another trial run as soon as I am in the office.

@OlegHahm
Member

Do I understand it correctly, that with this fix a peer that once got into the neighbor cache would stay there forever (until the next reboot or manual removal)?

@miri64
Member Author

miri64 commented Apr 18, 2016

Yes.
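
To spell out what that implies, here is a toy model in Python. It is only a sketch under assumptions: `NeighborCache` and its `capacity` parameter are hypothetical and say nothing about how gnrc's neighbor cache is actually sized or implemented. It just shows a table whose entries never time out, so a slot is only freed by manual removal (or a reboot).

```python
class NeighborCache:
    """Toy fixed-size neighbor cache whose entries never time out."""

    def __init__(self, capacity):
        self.capacity = capacity  # hypothetical table size
        self.entries = {}         # IPv6 address -> link-layer address

    def add(self, ipv6_addr, l2_addr):
        """Insert or update an entry; return False if the table is full."""
        if ipv6_addr not in self.entries and len(self.entries) >= self.capacity:
            return False
        self.entries[ipv6_addr] = l2_addr
        return True

    def tick(self, elapsed_s):
        """Timer hook; deliberately a no-op because nothing ever expires."""

    def remove(self, ipv6_addr):
        """Manual removal, the only way to free a slot in this model."""
        return self.entries.pop(ipv6_addr, None) is not None


nc = NeighborCache(capacity=2)
nc.add("fe80::1", "1e:ee:82:11:03:27")
nc.add("fe80::3432:4833:46da:ac2a", "36:32:48:33:46:da:ac:2a")
nc.tick(10**6)            # a long time passes, both entries are still there
print(len(nc.entries))    # 2
```

In this model a third neighbor can only be added after one of the existing entries has been removed by hand.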

@miri64
Member Author

miri64 commented Apr 18, 2016

Rebased to current master (just for good measure) and ran it for almost 2h now (and will keep it running): no problems so far.

@kYc0o kYc0o added the Process: needs backport label Apr 19, 2016
@miri64
Member Author

miri64 commented Apr 20, 2016

So, has anyone except me successfully tested it yet?

@kYc0o
Contributor

kYc0o commented Apr 20, 2016

I just did it, so ACK and go when Murdock agrees!

@kYc0o kYc0o added the CI: ready for build label Apr 20, 2016
@miri64
Member Author

miri64 commented Apr 20, 2016

Fixed typo in the commit message

@miri64
Member Author

miri64 commented Apr 20, 2016

Backport provided in #5369

This is a temporary quick-fix for RIOT-OS#5122 to not have GUAs removed on an
interface.
It solves the issue by both not letting the registration run out on the router
and by not letting the lifetime of an auto-configured address expire.
@miri64 miri64 removed the Process: needs backport label Apr 20, 2016
@kYc0o
Contributor

kYc0o commented Apr 20, 2016

and GO! :D

@kYc0o kYc0o merged commit d071b2a into RIOT-OS:master Apr 20, 2016
@miri64 miri64 deleted the gnrc_ndp/fix/gua-hack branch April 20, 2016 12:39
Labels
Area: network, CI: ready for build, Type: bug

4 participants