Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Available memory far less than reported #133

Closed
carterw opened this issue Jun 17, 2016 · 29 comments
Closed

Available memory far less than reported #133

carterw opened this issue Jun 17, 2016 · 29 comments

Comments

@carterw
Copy link

carterw commented Jun 17, 2016

I am seeing spontaneous reboots and I'm suspecting that my code is running up against an unseen memory limitation. Is all the free memory reported by GC.stat().sysfree actually available for use by running programs? It reports that there should be about 8k free at runtime, but if my program ingests an additional ~1k of data I get an abrupt reboot.

Are you using a portion of RAM for some other purpose, like a heap, and not subtracting it from the reported free amount?

@cpq
Copy link
Member

cpq commented Jun 17, 2016

I suspect the memory can be fragmented, and allocating contiguous chunks that is less then available memory can fail.

@dimonomid I think you've done memory analysis - can you confirm on this please ?

@dimonomid
Copy link
Contributor

Are you using a portion of RAM for some other purpose, like a heap, and not subtracting it from the reported free amount?

No, reported sysfree is actually available free heap size. Of course, as @cpq mentioned, it's fragmenged, but I admit that having 8K free and being unable to allocate 1K of contiguous memory sounds like quite a strong fragmentation. Usually, the memory map looks like this: typical memory map (and you might want to read our article where we examine fragmentation and improve memory usage: https://blog.cesanta.com/embedded-heap-behaviour-analysis-and-improvement )

On ESP8266, when an out-of-memory occurs, the message E:M 123 is printed to the console, where 123 is the block size which was requested but we failed to allocate it. Does your console output contain E:M?

@carterw
Copy link
Author

carterw commented Jun 18, 2016

I'm not seeing that E:M message being printed to the FNC console. What happens is that the program appears to hang for about 10 seconds and then it reboots, the first line on reboot is;
ets Jan 8 2013,rst cause:2, boot mode:(3,6)

If there is something I can look at further to see a cause please let me know, but it sure looks like this is related to memory fragmentation. This is a very serious problem. I read the blog post on fragmentation, but what I am not seeing is a mechanism that periodically (or on demand) defragments available memory. Many applications, including mine, will be exchanging data with a server which inherently results in a lot of memory allocations and frees. If you aren't consolidating the free memory into reasonably large usable blocks this will inevitably result in an eventual crash.

@dimonomid
Copy link
Contributor

I'm not seeing that E:M message being printed to the FNC console. What happens is that the program appears to hang for about 10 seconds and then it reboots

Ah sorry; this message gets printed to the stderr, but by default stderr goes to the uart1; so, you don't see it at the console (which uses uart0). To make stderr go to uart0, make sure you have the debug section in your fw/src/fs/conf.json, like this:

{
  "debug": {
    "stderr_uart": 0,
    "level": 3
  }
}

After you add this, rebuild the firmware, flash it, and reproduce the problem. While the program hangs for about 10 seconds, it is printing the coredump to the stderr, so you'll see lots of output this time. Again, look for the E:M message before the coredump, or just share the whole output here.

As to the memory defragmentation: unfortunately, while we stick to the standard heap API (malloc, free, etc) and don't have virtual memory, it's impossible to implement any kind of defragmentation. In the memory-constrained environment, it might make sense to use some kind of Slab allocator for the object types which get allocated and freed often.

@carterw
Copy link
Author

carterw commented Jun 18, 2016

Thanks for the info. I am seeing a conf.json file in the device filesystem. Is it really necessary to edit that file in the build environment and rebuild/reflash or can I simply add that json clause to the existing file on the device? Is the conf file you mentioned only evaluated at build time?

I find it very strange that memory fragmentation can't be managed effectively in some way, surely there can be a consolidation routine that runs periodically at the very least? From the standpoint of a JavaScript application, I don't see how there can be a realistic workaround. My program exchanges websocket messages with a server. An incoming message of more than a very small size will crash the device. There is apparently no way to know how much contiguous memory is available at runtime into which the message could be received.

@dimonomid
Copy link
Contributor

Is it really necessary to edit that file in the build environment and rebuild/reflash or can I simply add that json clause to the existing file on the device?

It is evaluated at the system start, before your js files are evaluated. As an alternative to reflashing, you can for example put the following in your js file (or probably execute from the console):

if (Sys.conf.debug.stderr_uart != 0) {
  Sys.conf.debug.stderr_uart = 0;
  Sys.conf.debug.level = 3;
  Sys.conf.save();
}

The last line Sys.conf.save() will actually reboot the device, and your new settings will be applied.

I find it very strange that memory fragmentation can't be managed effectively in some way, surely there can be a consolidation routine that runs periodically at the very least?

When we want to allocate some memory from heap in C, we call malloc(), which returns a pointer to the allocated data, and the calling code then uses the returned pointer. Even if we implement some routine which will relocate allocated blocks in order to defragment the heap, the "calling code" will still use the old pointer, which becomes invalid because the memory block might have been relocated. So, as I said, while we use standard heap API, it's impossible to introduce any kind of defragmentation.

It is probably possible to implement our own allocator (which, for example, will return a pointer-to-pointer-to-data instead of a pointer-to-data), and reimplement mongoose-iot platform on top of it. We were discussing this some time ago, but it's definitely not going to happen in the near future.

A couple of words about the memory for JavaScript stuff: FYI, we don't make a separate heap allocation for every single JS object and string. We have pools of object and pools of object properties, so e.g. when some JS object is freed, the memory is not necessarily returned to the system immediately: if the pool which contains the object is not empty, the memory is held for future object allocations. This reduces the fragmentation somewhat. Plus, all JavaScript strings are compacted by the Garbage Collector, so, you can consider strings to be kind of defragmented. "Kind of" - because the buffer where JS engine stores all its strings is still allocated from the system heap. So, when we run out of the buffer size, the buffer itself should be reallocated, possibly introducing some fragmentation.

There are plans to support at least several buffers with strings, so that we won't require a huge chunk of contiguous memory for all the alive JS strings. It's quite doable and I hope we'll get some time to implement it, but not quite soon.

@dimonomid
Copy link
Contributor

By the way, could you please elaborate on this:

my program ingests an additional ~1k of data

Is it JS code or C code? And could you share the code please?

@carterw
Copy link
Author

carterw commented Jun 18, 2016

It is Js code, I am using the websocket client API. All is well for a few message exchanges with a server, but then an incoming message will cause a reboot before the app even sees the data. I don't want to post the entire client/ server application but I may be able to come up with a condensed test case.

@mkmik
Copy link
Contributor

mkmik commented Jun 18, 2016

if the 1k you alloc is a 1k JS string then you might also have struck another issue which the fact that all of our strings live in a contiguous buffer that is reallocated when it needs to be grown. This requires a contiguous chunk for the old_string_arena+1k bytes. We'd like to improve this obviously.

As we're juggling with priorities, we'd really like to know more about your use case. Could you share some more details ?

For example, we're considering whether we need to introduce something like node.js Buffer object or ES6 TypedArray in order to manage binary data; in order to do so we'd like to know more about what kind of real world problems our users are facing and what kind of code they are writing.

@carterw
Copy link
Author

carterw commented Jun 19, 2016

I'm not allocating the buffer explicitly in my code. On the 8266 side it is just a websocket client that receives json messages from a server (which obviously are strings). Presumably the underlying websocket client implementation allocates the buffer for the incoming message. How this happens exactly I do not know, and since the incoming data can be of arbitrary size it could make allocation tricky.

As for my use case, I am writing a general purpose peer-to-peer message handling utility for IoT. My idea is to allow endpoint devices with limited resources such as the 8266 to easily be slaved to peers over the internet that can run much larger driver programs, access databases, etc. The 8266-class of device would merely execute primitives such as setting or sensing GPIO pins in response to remote messages. I have the framework for this up and running, but as I incrementally add capabilities to the 8266 client I am encountering this crash.

I'll try to gather up more clues today.

@carterw
Copy link
Author

carterw commented Jun 19, 2016

I was able to redirect stderr using the code you suggested above. Now when I run my program I see the following, not including the subsequent core dump. Does this help? What has happened here is that an incoming message arrived, which consisted of a small program file. The incoming file string is showing up in the console output as you can see, then there is the crash. If I send a smaller file no crash.

TAKEF received from c1
console_get_current_clubby clubby handle: 0x3fff2234
console_add_to_cache Cached 22 bytes, total cache size is 114 bytes
function blink(pin, nBlinks) {
    var nb;
    if (nBlinks == null) {
        nb = 5;
    }
    else if (nBlinks == 0)
        return;
    else
        nb = nBlinks - 1;
    pin = pin || 0;
    var level = GPIO.read(pin);
    GPIO.setMode(pin, GPIO.OUT, GPIO.PULLUP);
    GPIO.write(pin, !level);
    console.log("pin:", pin, ", level:", level);
    setTimeout(function() { blink(pin, nb); }, 500);
}
console_get_current_clubby clubby handle: 0x3fff2234
console_add_to_cache Flushing buf (114 bytes)
console_flush_buffer file id=27
console_flush_buffer Out of space
set_errno            spiffs error: -10001
E:M 7786 (974 blocks)
E:M 7289 (912 blocks)
heap integrity broken: block links don't match: 4911 -> 5812, but 5812 -> 13600

Trap 9: pc=0x402937d2 va=0x1
Dumping core
--- BEGIN CORE DUMP ---
{"arch": "ESP8266", "cause": 9, "REGS": {"addr": 1073651960, "data": "ztgkQDD6/z8BAAAACCAAAJDz/

@dimonomid
Copy link
Contributor

Thanks for the details. So yes, it tries to allocate 7786 bytes and fails to do that. As both I and Marko pointed out, currently all strings are stored in the single buffer, so I believe this is the issue in your case: it runs out of the current strings buffer and tries to allocate the new one, of the needed size + 500 bytes.

We need to make v7 store strings in multiple buffers. We were discussing that, but kept postponing it so far. I hope we'll be able to sort it out in July-August.

By the way, to be absolutely sure, you can share the whole coredump, plus two files:

  • fw/platforms/esp8266/.build/mongoose-iot.out
  • fw/platforms/esp8266/firmware/0x11000.bin

So that we'll be able to examine the core dump.

@carterw
Copy link
Author

carterw commented Jun 19, 2016

What is a convenient way to share those files with you?

Is there no temporary workaround? Can there be a function that reports the currently available amount of string buffer memory for example? Then I might be able to ensure that incoming message sizes don't exceed available string space. I'm not concerned about performance at this point.

@dimonomid
Copy link
Contributor

It is sounding like some of the 'available' RAM being reported by sysfree is actually being used by infrastructure?

Well, I wouldn't say so. It is actually free at the moment. And, of course, when you create some new data in your JavaScript code, some memory should be allocated to store it; this is exactly what happens. But, the implementation of strings is far from memory-efficient, and it will be improved.

Can there be a function that reports the currently available amount of string buffer memory for example?

GC.stat().strres returns the current string buffer size, and GC.stat().struse is the used size. So, you can get amount of free string buffer size by subtracting struse from strres.

@carterw
Copy link
Author

carterw commented Jun 20, 2016

Apparently there is so little of this 'string memory' available that I can't do much of anything.

I'm seeing quite a bit of clubby activity happening on the console though, and I am concerned that this is sucking up a lot of the scarce resources. I am not using clubby or interacting with your cloud at present. Is there a way to deactivate it or perhaps even prevent it from loading or building?

@dimonomid
Copy link
Contributor

dimonomid commented Jun 20, 2016

Apparently there is so little of this 'string memory' available that I can't do much of anything.

As a workaround, you can open v7/v7.c, find the line #define _V7_STRING_BUF_RESERVE 500, and replace 500 with the number which seems enough for your entire application lifetime. Say, if currently it consumes about 8K (judging by E:M 7786), try to set it to 9000 or 10000 and see if it works for you.

We don't currently expose the JS API to allocate string buffer of desired size; probably we should (until we implement storage of strings in multiple buffers), but this hack will do the trick for now.

I am not using clubby or interacting with your cloud at present. Is there a way to deactivate it or perhaps even prevent it from loading or building?

You can configure it to prevent it from connecting: either add this code to your js file:

if (Sys.conf.clubby.connect_on_boot) {
  Sys.conf.clubby.connect_on_boot = false;
  Sys.conf.save();
}

Or, set it right into your fw/src/fs/conf.json:

  "clubby": {
    "connect_on_boot": false
  },

Or, to prevent it even from loading, you can open fw/src/js/sys_init.js and comment/remove the code which creates and uses the clubby instance: from line 90 to 104.

@carterw
Copy link
Author

carterw commented Jun 20, 2016

That is extremely helpful, thanks! I will experiment with your suggestions and report back on what happens.

@dimonomid
Copy link
Contributor

No problem, looking forward to hear from you.

@mkmik
Copy link
Contributor

mkmik commented Jun 20, 2016

@carterw please set console.send_to_cloud to false. even if the we don't connect to the cloud endpoint, console.log still buffers all messages (so that you'll be able to see them in the cloud interactive console). While the messages are buffered on the flash FS, this functionality does put some pressure on the RAM, so perhaps you'd better disable it.

As you said, if you cannot use this cloud functionality, there is not much point to it. Would you like to be an early tester of our new cloud platform ?

@carterw
Copy link
Author

carterw commented Jun 21, 2016

It looks like your cloud platform has a lot of features, but right now I am just trying to get some pretty basic code to reliably send a few messages back and forth to the 8266 device running your firmware. If I can do this plus make lights blink, etc. and have it operate without crashing for a few hours it will give me a little confidence in your larger application stack.

I disabled the clubby connect code as suggested above. That seems to have freed up a couple of kbytes of RAM but it didn't modify the JS string buffer available memory.

I bumped up the STRING_BUF_RESERVE size as suggested and built a new firmware zip file. Unfortunately the FNC app refuses to flash it, it reports an error;
INVALID_ARGUMENT: part boot has no address specified

I am using FNC 0.8.6 which is the version you published in late April;
https://github.com/cesanta/fnc/releases

@dimonomid
Copy link
Contributor

@carterw , before new FnC is released, did you try to build it? You'll need Qt Creator for that.

@carterw
Copy link
Author

carterw commented Jun 22, 2016

I was hoping to not have to spend time figuring out how to build FNC. Frankly I am surprised that the released version of FNC is not routinely being kept up to date with the current FW. They should always be compatible.

@mkmik
Copy link
Contributor

mkmik commented Jun 22, 2016

well, that's not a released firmware yet.
I understand that it would be nice to have a nightly build. We're going to cut the next release very soon anyway.

What OS do you run? Let's see if we can quickly make a build for you.

@carterw
Copy link
Author

carterw commented Jun 22, 2016

That would be great, thanks. I'm using a Windows PC.

@mkmik
Copy link
Contributor

mkmik commented Jun 22, 2016

@carterw here it is: https://backend.cesanta.com/devel/FNC-win.zip

currently there is an issue with the Qt library we link on windows so you it won't be able to download the firmwares via https. However, you're use case is to flash custom firmwares, so it should be enough.

@carterw
Copy link
Author

carterw commented Jun 22, 2016

Much appreciated, I will give it a try later today.

@dimonomid
Copy link
Contributor

@carterw , just a quick note that you'll need to run it with --advanced key, in order to get old interface with firmware file picker, log window, etc.

@carterw
Copy link
Author

carterw commented Jun 23, 2016

Good news here. I downloaded that FNC executable and it was able to flash the recent FW I built with the larger JS string buffer. Now GC.stat() is reporting;
"struse":5797,"strres":8010

This should give me enough headroom to move forward on my project, and I am seeing some things that resulted in a crash working now. Thanks for the assistance!

@dimonomid
Copy link
Contributor

Cool, thanks for getting back to us.

Yeah, the way v7 stores strings is indeed an issue; we'll work on improving it.

@cpq cpq closed this as completed Dec 27, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants