Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce RAM use of Options/Hardware loading #2080

Merged
merged 3 commits into from Mar 4, 2023

Conversation

CapnBry
Copy link
Member

@CapnBry CapnBry commented Feb 12, 2023

Replaces the 8K static RAM allocation used for loading the Options/Hardware from flash, and replaces it with a buffered flash reader Stream that ArduinoJson can use to load the documents instead. The purpose of this optimization is to free up memory on ESP hardware, which has problems loading the webui on 3.2.0 in AP mode.

First of three PRs modifying the OPTIONS / devWifi libraries to reduce memory use.

Details

On a HappyModel EP2 with no home network, the webui fails to fully load. The page loads, the CSS loads, the javascript loads, but generating /config just hangs. If you wait long enough (many minutes) and hit the page again it seems to be OK (fewer simultaneous requests?) but who will wait that long?
elrs-loading

Checking the RAM usage, it isn't even getting to our handler, and when it does, there's about 8.3KB of RAM available, dropping to 3.7KB by the end of the function. We just need more available RAM. The lowest hanging fruit is this giant 8KB buffer that's only used once at boot. I changed this to simply do a malloc()/free() and it worked, but allocating this big chunk off the heap and freeing it created an 8KB hole, followed by a few small allocations from the loader, then the rest of the free RAM. The fragmentation is an issue because it essentially bifurcates the RAM into two memory pools since it is no longer contiguous.

There were some other issues as well with the existing code. It allocates 8KB, only reads 2KB from flash, and the code expects that it has 16+64+512+2048 bytes. Casting the buffer to const char * also made ArduinoJson regard the buffer as read-only, so it would allocate its own RAM and copied the buffer into it, doubling up on allocations.

There's no reason to read it all at once though, since ArduinoJson can load from Stream classes. I tried but could not get a const __FlashStringHelper * to bind to the correct deserializer, because I am pretty sure that would work without any new classes from us. Instead I created a EspFlashStream class that buffers 4 bytes as it reads them from flash. All of this memory allocation is strictly off the stack too, so no heap fragmentation.

Now when the /config page loads, there's over 16KB or RAM available vs 8.3KB before. The AP mode webui loads for me every time now.

Faster than the original too?!

  • 3.2.0 stock performs options_init() in 9901us +/- a couple us
  • 4 byte buffer EspFlashStream 9532us
  • 8 byte buffer 9240
  • 16 byte buffer 9037
  • 32 byte buffer 9879 -- Not sure why this was always slower, tested several times and methods
  • 64 byte buffer 9004

I stuck with a 4 byte buffer since it was already faster and 0.5ms isn't a huge deal.

Code Reorg

  • Constants were added for the size of the different parameter blocks.
  • Split the code into functions to make it a little easier to digest, but the behavior is almost identical. Differences:
    • hardware_init() would not attempt to load hardware JSON from the sketch if there was no product_name on the sketch (it tested the [0] offset of the passed buffer, not the 4 bytes at its starting location). It now will try to load hardware JSON if there looks to be a string there, even if there was no product name flashed in.
    • The options JSON now does the same check, but previously it would always try to load no matter what.

For future work I'd like to remove the String copies of getOptions() and getHardware() from RAM. Doing that made this hard to read patch even harder to follow. That will move this code so getOptions and getHardware return a Stream which may represent a SPIFFS file or a EspFlashStream, both of which can be used by the webserver and ArduinoJson without having their RAM hanging around all the time.

Things that might look wrong

  • I removed the copy of getSketchSize() that used the static buffer. The only difference I can imagine would be if there wasn't enough stack or the structures were not aligned (4) in the stocke code. It seems to work with the core version, but perhaps the new stack-based stream method also works around why this didn't work originally. This is where Paul points out the obvious thing I missed that makes everything make sense. πŸ˜…
  • Note that all the file.close() calls have been removed. SPIFFS file implementation classes automatically close the file in their destructor when they go out of scope. I tested this to be sure it is working so there's no need for us to micromanage the file handle in our code.
  • The device_name and product_name strings are loaded from flash as fixed sized blocks and a null is slapped to the end of the buffer. I believe this will have functionally the same behavior as the old strlcpy(). If they weren't null terminated, they will still not overrun the buffer. If they were null terminated then we'll have some undefined bytes in RAM after the string, but it should not be an issue since everything else handles them as null-terminated. Huge security flaw! An attacker could see what was in the firmware binary after the string, if they can somehow trick us into spitting it out. πŸ€·β€β™€οΈ

@CapnBry
Copy link
Member Author

CapnBry commented Feb 12, 2023

Ah bloody hell. The tests! I'll have to fix the tests tomorrow.

@wvarty
Copy link
Collaborator

wvarty commented Mar 2, 2023

Tested this by merging your private branch into my PoC mavling branch, and wifi is working again.
Looks good to me once you've fixed the tests.

@CapnBry
Copy link
Member Author

CapnBry commented Mar 2, 2023

Ah superfudge, I completely forgot to come back and fix this, thanks for following up, Wez! Tests: ROCKED

@pkendall64
Copy link
Collaborator

This is where Paul points out the obvious thing I missed that makes everything make sense.

I had to add mySketchSize back then when we were on 3.2.0 of the platform because it crashed when calling the built-in function. Now that we're on 4.0.1 it's good, as your PR works on ESP8285 based RXs πŸ‘ πŸ˜ƒ

@pkendall64 pkendall64 merged commit fd5dcd8 into ExpressLRS:master Mar 4, 2023
@CapnBry CapnBry deleted the config-stream branch June 22, 2023 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants