no memory left in handle_packet_fragment - when there still is plenty of memory left #299

Closed
benpicco opened this issue Nov 4, 2013 · 10 comments
Assignees
Labels
Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@benpicco
Contributor

benpicco commented Nov 4, 2013

On msba2, I occasionally receive these error messages:

2013-11-04 16:58:15,512 - INFO # ERROR: no memory left!
2013-11-04 16:58:16,465 - INFO # ERROR: no memory left!
2013-11-04 16:58:16,473 - INFO # ERROR: no memory left!
2013-11-04 16:58:16,481 - INFO # ERROR: no memory left!
2013-11-04 16:58:16,489 - INFO # ERROR: no memory left!
2013-11-04 16:58:16,497 - INFO # ERROR: no memory left!
2013-11-04 16:58:16,839 - INFO # heap
2013-11-04 16:58:16,846 - INFO # # heap 0: 0x40009ba8 -- 0x4000b000 -> 0x4000ec98 (15512 of 20720 free)
2013-11-04 16:58:16,854 - INFO # # heap 1: 0x7fd00000 -- 0x7fd00000 -> 0x7fd04000 (16384 of 16384 free)
2013-11-04 16:58:16,861 - INFO # # heap 2: 0x7fe00000 -- 0x7fe00000 -> 0x7fe04000 (16384 of 16384 free)
2013-11-04 16:58:17,001 - INFO # > ERROR: no memory left!
2013-11-04 16:58:17,009 - INFO # ERROR: no memory left!
2013-11-04 16:58:17,017 - INFO # ERROR: no memory left!
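
The numbers in the `heap 0` line above are consistent with plain address arithmetic, which may help when reading the output. The sketch below assumes the three addresses are the region start, the current break, and the region end; that interpretation, and the names `heap_span_t` and `heap_span`, are assumptions and not taken from RIOT.

```c
#include <stdint.h>

/* Hypothetical summary of one line of the `heap` output. */
typedef struct {
    uint32_t total;      /* end - start */
    uint32_t untouched;  /* end - current break */
    uint32_t handed_out; /* current break - start */
} heap_span_t;

/* Assumption: the log prints "start -- break -> end". */
static heap_span_t heap_span(uint32_t start, uint32_t brk, uint32_t end)
{
    heap_span_t s;
    s.total = end - start;
    s.untouched = end - brk;
    s.handed_out = brk - start;
    return s;
}
```

Calling `heap_span(0x40009ba8u, 0x4000b000u, 0x4000ec98u)` gives a total of 20720 and an untouched span of 15512, matching "15512 of 20720 free". If this reading is right, the "free" figure would only count memory beyond the break and would say nothing about blocks allocated or freed below it.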
@benpicco
Contributor Author

benpicco commented Nov 4, 2013

Still happens with RADIO_STACK_SIZE set to 580.

@BytesGalore
Member

Hi,

What is the datagram_size passed to handle_packet_fragment(...)?

Best regards,
Martin

@benpicco
Contributor Author

benpicco commented Nov 4, 2013

2013-11-04 18:48:46,283 - INFO # ERROR: no memory left! (datagram_size = 93)
2013-11-04 18:48:46,291 - INFO # ERROR: no memory left! (datagram_size = 93)
2013-11-04 18:48:46,760 - INFO # ERROR: no memory left! (datagram_size = 98)
2013-11-04 18:48:47,128 - INFO # ERROR: no memory left! (datagram_size = 108)
2013-11-04 18:48:47,144 - INFO # ERROR: no memory left! (datagram_size = 108)
2013-11-04 18:48:47,492 - INFO # ERROR: no memory left! (datagram_size = 93)
2013-11-04 18:48:47,500 - INFO # ERROR: no memory left! (datagram_size = 93)
2013-11-04 18:48:47,507 - INFO # ERROR: no memory left! (datagram_size = 93)
2013-11-04 18:48:47,515 - INFO # ERROR: no memory left! (datagram_size = 93)

However, it looks like the memory really is used up: in this condition calloc also returns 0 elsewhere, so heap doesn't show proper values.

The timeout for keeping IP fragments is set to 15s, so broken parts are kept for quite some time, and that seems to use up all the memory.
However, the node never recovers from that condition. One would expect it to discard the timed-out fragments at some point (this happens a lot before the condition appears).

2013-11-04 18:55:22,549 - INFO # TIMEOUT!cur_time: 149703819, temp_buf: 134266770
2013-11-04 18:55:22,555 - INFO # TIMEOUT!cur_time: 149703819, temp_buf: 134291965
2013-11-04 18:55:27,582 - INFO # TIMEOUT!cur_time: 154361222, temp_buf: 138907842
2013-11-04 18:55:27,589 - INFO # TIMEOUT!cur_time: 154361222, temp_buf: 139250848
2013-11-04 18:55:28,574 - INFO # TIMEOUT!cur_time: 155278896, temp_buf: 139983226

In the "no memory left" case, the timeout info is never shown again.
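
The TIMEOUT lines above fit a simple age check. In each of them `cur_time` minus `temp_buf` is a little over 15,000,000 (for the first line, 149703819 - 134266770 = 15437049), which matches a 15s timeout if the timestamps are in microseconds. The unit is an assumption, and `FRAG_TIMEOUT_US` and `fragment_expired` are hypothetical names, not the ones used in lowpan.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: the 15 s fragment timeout expressed in microseconds. */
#define FRAG_TIMEOUT_US UINT32_C(15000000)

/* Hypothetical helper. The unsigned subtraction stays correct
 * even if the timer has wrapped around since `timestamp`. */
static bool fragment_expired(uint32_t cur_time, uint32_t timestamp)
{
    return (uint32_t)(cur_time - timestamp) >= FRAG_TIMEOUT_US;
}
```

With the first logged pair, `fragment_expired(149703819u, 134266770u)` is true, as expected for a line tagged TIMEOUT.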

There are also several other data aborts originating in lowpan.c functions, indicating that something™ is going wrong while processing fragmented packets.

@benpicco
Contributor Author

benpicco commented Nov 4, 2013

heap fragmentation maybe?

@benpicco
Contributor Author

benpicco commented Nov 4, 2013

There are several strange things about the custom list implementation in lowpan.c:

e.g. in check_timeout() (https://github.com/RIOT-OS/RIOT/blob/master/sys/net/sixlowpan/lowpan.c#L634)

What happens when temp_buf points to head and gets freed? head will still point to the old, now invalid location.
Actually, it seems head is never updated.

That would explain why it leaks memory and occasionally crashes.
e.g. assuming the following list:

  head
    |
    v
[expired] -> [valid] -> [valid]

after collect_garbage()

  head
    |
    v
undefined     [valid] -> [valid]

I guess the original intention was that new fragments are always added to the head, so only the tail expires, but looking at https://github.com/RIOT-OS/RIOT/blob/master/sys/net/sixlowpan/lowpan.c#L422 shows that this is not true.
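
One common way to avoid the dangling head shown in the diagram is to walk the list through a pointer to the link itself, so that removing the first node rewrites `head` just like any other `next` pointer. This is a minimal sketch of that general technique, not the code in lowpan.c; `frag_node_t` and `remove_expired` are hypothetical names and the real reassembly buffer has more fields.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node type standing in for the reassembly buffer. */
typedef struct frag_node {
    uint32_t timestamp;
    struct frag_node *next;
} frag_node_t;

/* Unlink and free every node whose age is at least `timeout`.
 * Returns the number of nodes freed. */
static unsigned remove_expired(frag_node_t **head, uint32_t now, uint32_t timeout)
{
    unsigned freed = 0;
    frag_node_t **link = head;  /* points at head or at some node's next */

    while (*link != NULL) {
        frag_node_t *node = *link;

        if ((uint32_t)(now - node->timestamp) >= timeout) {
            *link = node->next;  /* updates *head when node is the first entry */
            free(node);
            freed++;
        } else {
            link = &node->next;
        }
    }
    return freed;
}
```

For the `[expired] -> [valid] -> [valid]` list from the diagram, a call such as `remove_expired(&head, now, timeout)` leaves `head` pointing at the first valid entry instead of freed memory.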

@benpicco
Contributor Author

benpicco commented Nov 5, 2013

Also, line 422 will overwrite the pointer to current_buf as temp_buf is the previous entry of current_buf - is this really what is intended?

@benpicco
Contributor Author

benpicco commented Nov 5, 2013

With the new list it still runs out of memory, and heap still reports free memory.
But there are fewer data aborts.

@ghost ghost assigned OlegHahm Dec 21, 2013
@mehlis
Contributor

mehlis commented Dec 23, 2013

@benpicco is this currently happening?

@benpicco
Contributor Author

Yes, the memory does indeed run out; I'm still trying to figure out why.
heap says there is plenty of free memory left.

@benpicco
Contributor Author

I've traced all calls to m/calloc and free with print_malloc. It turns out the memory is indeed exhausted by what looks like a memory leak in rfc5444_reader.c.

I'll create a new issue for heap not working correctly.
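
For anyone wanting to repeat this kind of tracing, the general idea can be sketched with thin wrappers that log each call and keep a count of live allocations. This is not how print_malloc works (the thread does not show it); `traced_calloc`, `traced_free`, and `live_allocs` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter: allocations that have not been freed yet. */
static long live_allocs = 0;

/* Allocate like calloc and log the caller-supplied location tag. */
static void *traced_calloc(size_t n, size_t size, const char *where)
{
    void *p = calloc(n, size);

    if (p != NULL) {
        live_allocs++;
    }
    printf("calloc(%lu, %lu) = %p at %s, live = %ld\n",
           (unsigned long)n, (unsigned long)size, p, where, live_allocs);
    return p;
}

/* Free like free and log the caller-supplied location tag. */
static void traced_free(void *p, const char *where)
{
    if (p != NULL) {
        live_allocs--;
    }
    printf("free(%p) at %s, live = %ld\n", p, where, live_allocs);
    free(p);
}
```

A `live` count that keeps growing for one location tag while traffic is steady would point at a leak in that module.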
