Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCC - Add support to split heap across 2-RAM banks #9944

Merged
merged 7 commits into from Apr 30, 2019

Conversation

@deepikabhavnani
Copy link
Contributor

commented Mar 5, 2019

Description

GCC - Add support to split heap across 2-RAM banks.
Commit messages provide more information.

Pull request type

[ ] Fix
[ ] Refactor
[ ] Target update
[X] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@marcuschangarm @juhoeskeli @c1728p9

Release Notes

Mbed heap split into 2-Ram banks is added which can be enabled by compiling source with MBED_SPLIT_HEAP macro. Linker symbols __mbed_sbrk_start __mbed_krbs_start __mbed_sbrk_start_0 __mbed_krbs_start_0 should be added by you in GCC linker script to state start and end of each HEAP region.

@cmonr cmonr requested review from ARMmbed/team-st-mcd Mar 5, 2019

@deepikabhavnani deepikabhavnani force-pushed the deepikabhavnani:stm32_splitheap branch Mar 5, 2019

@c1728p9

c1728p9 approved these changes Mar 5, 2019

@cmonr

This comment has been minimized.

Copy link
Contributor

commented Mar 5, 2019

@deepikabhavnani Please fix the trailing whitespace that astyle found.

@cmonr

This comment has been minimized.

Copy link
Contributor

commented Mar 5, 2019

Holding because @sg- has a problem with this in an offline conversation.

@deepikabhavnani

This comment has been minimized.

Copy link
Contributor Author

commented Mar 5, 2019

Need think of solution for trim operation in malloc, once the allocated memory is freed and trimmed back to region 1, it will not get allocated again.

@marcuschangarm

This comment has been minimized.

Copy link
Contributor

commented Mar 6, 2019

I don't see the problem; the current sbrk implementation doesn't support malloc_trim anyway so we are not losing functionality.

If we want to support it, the easiest solution would be to wrap it:

int __wrap_malloc_trim(size_t pad) {
    return 0; // unable to release memory
}
@deepikabhavnani

This comment has been minimized.

Copy link
Contributor Author

commented Mar 6, 2019

I need to check the free operation, if internally GCC does release of memory (not free) done at the end of last allocation which is usually done by passing -negative value to _sbrk(). We can have solution to that as well, just need to verify and add.

@deepikabhavnani

This comment has been minimized.

Copy link
Contributor Author

commented Mar 6, 2019

@sg- I tried allocating and free entire heap memory multiple times with this example https://gist.github.com/deepikabhavnani/5a515f2bced590fa9a9688ecc864886d and it worked fine.

GCC does not make the memory available to OS when we do free operation, it internally maintains the link list of all allocations and combines adjacent chunks to coalesce them. Verified by allocating and freeing different sizes of allocations.

Reference : https://sourceware.org/glibc/wiki/MallocInternals

@deepikabhavnani deepikabhavnani force-pushed the deepikabhavnani:stm32_splitheap branch Mar 6, 2019

@jeromecoutant
Copy link
Contributor

left a comment

  • Maybe we can enable this MBED_SPLIT_HEAP in L476 as well ?
  • what happened if MBED_SPLIT_HEAP is not enabled for a L475 chip ? Should we define memory regions in the ld file depending on this MBED_SPLIT_HEAP macro ?
@@ -3936,7 +3937,7 @@
}
},
"detect_code": ["0468"],
"macros_add": ["USBHOST_OTHER", "TWO_RAM_REGIONS"],

This comment has been minimized.

Copy link
@jeromecoutant

jeromecoutant Mar 6, 2019

Contributor

I understand that this TWO_RAM_REGIONS macro should have been deleted in #9571
Note it is also defined for RHOMBIO_L476DMW1K

This comment has been minimized.

Copy link
@deepikabhavnani

deepikabhavnani Mar 6, 2019

Author Contributor

👍 Thanks will add fix for RHOMBIO_L476DMW1K in separate PR Fixed

@deepikabhavnani

This comment has been minimized.

Copy link
Contributor Author

commented Mar 6, 2019

Maybe we can enable this MBED_SPLIT_HEAP in L476 as well ?

👍 Plan is to update other targets in separate PR after this is approved and tested. PR moved to 5.13 - added fix for all ST targets. Please review

what happened if MBED_SPLIT_HEAP is not enabled for a L475 chip ? Should we define memory regions in the ld file depending on this MBED_SPLIT_HEAP macro ?

Adding macro in linker script will need tools update, instead shall we change. Shall I change __end__ = __mbed_sbrk_start

@deepikabhavnani

This comment has been minimized.

Copy link
Contributor Author

commented Mar 6, 2019

Note for self -

bool result = rangeinrange((uint32_t) temp, sizeof(linked_list), mbed_heap_start, mbed_heap_size);

This test should be updated for split heap support

@deepikabhavnani deepikabhavnani force-pushed the deepikabhavnani:stm32_splitheap branch Mar 6, 2019

@deepikabhavnani deepikabhavnani force-pushed the deepikabhavnani:stm32_splitheap branch Mar 6, 2019

@bulislaw
Copy link
Member

left a comment

There's no other actors in the system or "real system" for that matter, therefore releasing memory to the system is not needed.

@deepikabhavnani deepikabhavnani force-pushed the deepikabhavnani:stm32_splitheap branch Mar 7, 2019

@cmonr cmonr added the needs: review label Apr 12, 2019

* If the new address is outside the first region, start allocating from the second region.
*/
if (once && (new_heap > (uint32_t) &__mbed_krbs_start_0)) {
once = false;

This comment has been minimized.

Copy link
@0xc0170

0xc0170 Apr 12, 2019

Member

what does once do - either changing the name to be more clear about the intention or adding a comment. Or at least 1258 comment could include this boolean - it touches only address check

This comment has been minimized.

Copy link
@deepikabhavnani

deepikabhavnani Apr 15, 2019

Author Contributor

once - is indicator of allowing to enter just once. I can update the comment to add more clarity

This comment has been minimized.

Copy link
@juhoeskeli

juhoeskeli Apr 30, 2019

Contributor

What I meant by sbrk behavior is that if you try to allocate here more memory than is available in first region, the code moves to another region and you end up nothing being allocated from first region in the worst case. And you can't come back here either.

This comment has been minimized.

Copy link
@kjbracey-arm

kjbracey-arm Apr 30, 2019

Contributor

Actually, given the current behaviour of newlib's malloc, I don't think the once flag is needed.

The regions must be in memory address order, or newlib malloc would get confused. (Is that known/documented for this feature? I spotted it by inspection of newlib source, and it makes sense given the normal POSIX function of sbrk).

So the test could be if (heap <= (uint32_t) &__mbed_krbs_start_0 && new_heap > (uint32_t) &_mbed_krbs_start_0).

@0xc0170
Copy link
Member

left a comment

Changes looks fine, question:

I find the naming confusing: start_0 means it comes before _start - it is in the lower ram bank (lower address) as heap starts pointing to start_0.
This could be trivial I might be just confused with _start vs _start_0 naming - if I have a target with two RAM banks for heap and want to apply this start/0 to it - I would not know how to apply this to memory sections based on the release notes addition .
Why is it not _0 and _1 / first/second ?

@deepikabhavnani

This comment has been minimized.

Copy link
Contributor Author

commented Apr 15, 2019

@0xc0170 - I understand the confusion, but only reason to not update both to _0 and _1 was to have compatibility with old/existing definitions. Some targets have __mbed_krbs_start and __mbed_sbrk_start in their linker files.

Let me know how I can improve to add more clarity (apart from changing to _0 and _1)

@0xc0170

This comment has been minimized.

Copy link
Member

commented Apr 16, 2019

CI started

@0xc0170 0xc0170 added needs: CI and removed needs: review labels Apr 16, 2019

@mbed-ci

This comment has been minimized.

Copy link

commented Apr 17, 2019

Test run: FAILED

Summary: 2 of 11 test jobs failed
Build number : 2
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_exporter
  • jenkins-ci/mbed-os-ci_greentea-test
@0xc0170

This comment has been minimized.

Copy link
Member

commented Apr 17, 2019

Aborted build, we need CI time for 5.12.2 jobs today

@deepikabhavnani

This comment has been minimized.

Copy link
Contributor Author

commented Apr 23, 2019

Aborted build, we need CI time for 5.12.2 jobs today

Can we start the CI on this?

@bulislaw

This comment has been minimized.

Copy link
Member

commented Apr 24, 2019

Why is that 5.13? And not 5.12.x?

@deepikabhavnani

This comment has been minimized.

Copy link
Contributor Author

commented Apr 24, 2019

Why is that 5.13? And not 5.12.x?

I believe because it is new feature. Initially it was 5.12.0, may be because of that?

@juhoeskeli

This comment has been minimized.

Copy link
Contributor

commented Apr 25, 2019

Hello @deepikabhavnani & others. At some point I was reporting allocation problems with pelion-enablement on L475 with this PR. There the allocation/freeing is made in order of 10 /20 / 40 / 60kb. The 40kB allocation failed in that test.

There are several factors at play in here, originating from the newlib implementation. One of these factors is that the _sbrk allocation is aligned to 4kB, and it always tries to allocate one extra page. Second thing is that the allocator does not try to allocate only the missing bytes (e.g. for 40kB we would need ~20kB extra allocated, as we freed that already) but the whole deal (in this case 40kB) in addition. And this is where it fails in this case because with the extra page put on top there is not enough memory (only 63kB or so). Probably the only way to make this work in application is to preallocate (and free) the memory so that both regions are available for use (if you allocate too much on the first go, you end up skipping the first region altogether). We looked this with @kjbracey-arm and he has filed question of this behaviour in here https://answers.launchpad.net/gcc-arm-embedded/+question/680433

@adbridge

This comment has been minimized.

Copy link
Contributor

commented Apr 25, 2019

ci started

@mbed-ci

This comment has been minimized.

Copy link

commented Apr 25, 2019

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 5
Build artifacts

@deepikabhavnani

This comment has been minimized.

Copy link
Contributor Author

commented Apr 25, 2019

Probably the only way to make this work in application is to preallocate (and free) the memory so that both regions are available for use

@juhoeskeli - Agree with larger chunks of memory it will be good to pre-allocate them or follow the allocation order of larger to small. In case of single RAM bank GCC can take care of merging the allocation regions, but in 2-region model if first block is 40K and another one 60K initial different sequence will result into different behavior.

Example
RAM 1 - 40K Available RAM2 - 60K available
Allocation sizes 10K, 20k, 40K, 60K (allocate and free)

Sequence 1 - In order 10K, 20K, 40K, 60K
10K is allocated in RAM1 - followed by 20K in RAM1. Now even though 10K and 20K is freed, GCC will not coalese small chunks till memory is available. As a result 40K is allocated in RAM2 and though we have free memory available 60K will not be allocated. But 60K allocation should be successful if we free 40K as well.

Sequence 2 - In order 60K, 40K, 20K, 10K
This should allocate all memory chunks correctly without any failures

Sequence 2 - In order 60K, 20K, 40K, 10K
This should be successful as well. if we free 20K irrespective of 60K is freed or not.

@juhoeskeli - Please clarify if your observation aligns to this understanding.

@adbridge

This comment has been minimized.

Copy link
Contributor

commented Apr 26, 2019

@deepikabhavnani I'm going to hold off on merging this until the question is answered

@adbridge adbridge added needs: review and removed needs: CI labels Apr 26, 2019

@juhoeskeli

This comment has been minimized.

Copy link
Contributor

commented Apr 30, 2019

Probably the only way to make this work in application is to preallocate (and free) the memory so that both regions are available for use

@juhoeskeli - Agree with larger chunks of memory it will be good to pre-allocate them or follow the allocation order of larger to small. In case of single RAM bank GCC can take care of merging the allocation regions, but in 2-region model if first block is 40K and another one 60K initial different sequence will result into different behavior.

Citing Kevin, from the link I posted earlier: "Best current work around from outside the library I can see is to have a "heap preload loop" at startup that allocates lots of small (<pagesize) blocks forcing the allocator to make maximally-efficient small sbrk calls, then releases them all."

Example
RAM 1 - 40K Available RAM2 - 60K available
Allocation sizes 10K, 20k, 40K, 60K (allocate and free)

Sequence 1 - In order 10K, 20K, 40K, 60K
10K is allocated in RAM1 - followed by 20K in RAM1. Now even though 10K and 20K is freed, GCC will not coalese small chunks till memory is available. As a result 40K is allocated in RAM2 and though we have free memory available 60K will not be allocated. But 60K allocation should be successful if we free 40K as well.

I think the 60k allocation will not be successful, as the allocator tries to allocate that 60k from address after the 40k which was freed.

Sequence 2 - In order 60K, 40K, 20K, 10K
This should allocate all memory chunks correctly without any failures

Yes, but what happens is that the RAM1 will be left unused and no way to reclaim that (due to how sbrk works - added a comment about this to relevant code section).

Sequence 2 - In order 60K, 20K, 40K, 10K
This should be successful as well. if we free 20K irrespective of 60K is freed or not.

Yes, but what happens is that the RAM1 will be left unused and no way to reclaim that. You must free the 60k otherwise it will not work.

@juhoeskeli - Please clarify if your observation aligns to this understanding.

@deepikabhavnani @adbridge I would still go forward with this, as there is nothing wrong with the PR itself. It is the feature of newlib allocator. With all these caveats I feel it is still better system like this than without. A PR could be later made with workaround to this allocation issue.

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

commented Apr 30, 2019

In case of single RAM bank GCC can take care of merging the allocation regions,

GCC/newlib still has the same basic problem in a single RAM bank. For example if you have 100K in a single region, the sequence malloc(60K); free; malloc(80K) will fail, as it issues the calls sbrk(60K) followed by sbrk(80K), and there isn't 140K of RAM.

The problem is that the sbrk interface is very crude, and newlib uses it in a fairly dumb way. sbrk cannot move backwards to return an earlier pointer to the first region it skipped, and it can't give some sort of limited return - eg for the sbrk(80K) it can't say "I don't have another 80K, but I can give you 40K contiguous with the 60K I already gave you, is that any good?".

And newlib also doesn't try harder - when sbrk(80K) fails, it doesn't attempt sbrk(20K) with the reasonable hope that it will get something contiguous with its free 60K at the top and thus satisfy the malloc(80K) request.

We can avoid the issue by something like this on init:

typedef struct block {
    ns_list_link_t link;
    // aim for block to be around 128 bytes - the SMALL_MEMORY "page size" for newlib malloc
    // (presently SMALL_MEMORY is not defined, so the page size is 4K, meaning 4K alignment
    // will be forced anyway, but this future-proofs in case SMALL_MEMORY gets set)
    char pad[100]; 
} block_t;

NS_LIST_HEAD(block_t, link) blocks;

// Fill heap with small blocks, so it makes small `sbrk` calls that fit nicely into our
// regions, getting as much as we can into each region, avoiding `sbrk`
// having to say "no" to a too-big request, or advance prematurely
// from one region to the next.
for (;;) {
    block_t *block = malloc(sizeof(block_t));
    if (!block) {
        break;
    }

    ns_list_add_to_end(&blocks, block);
}

// And free it all again - newlib will now have populated its free list with all available
// heap regions, and the heap will operate efficiently
ns_list_foreach_safe(block_t, block, &blocks) {
    ns_list_remove(&blocks, block);
    free(block);
}

tr_debug("Heap usable = %zu bytes", mallinfo().arena);

I think I'd advocate adding that code to the pre-main startup for the GCC toolchain, on all builds. Only concern is code-size impact for blinky/bootloader.

@kjbracey-arm
Copy link
Contributor

left a comment

Approving, despite discussion above - none of the issues are caused by this change which is sound. Would suggest removing the once flag next time it's touched though.

@0xc0170 0xc0170 merged commit 1de0712 into ARMmbed:master Apr 30, 2019

26 checks passed

continuous-integration/jenkins/pr-head This commit looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
jenkins-ci/build-ARM Success
Details
jenkins-ci/build-GCC_ARM Success
Details
jenkins-ci/build-IAR Success
Details
jenkins-ci/cloud-client-test Success
Details
jenkins-ci/dynamic-memory-usage Success
Details
jenkins-ci/exporter Success
Details
jenkins-ci/greentea-test Success
Details
jenkins-ci/mbed2-build-ARM Success
Details
jenkins-ci/mbed2-build-GCC_ARM Success
Details
jenkins-ci/mbed2-build-IAR Success
Details
jenkins-ci/unittests Success
Details
travis-ci/astyle Success!
Details
travis-ci/docs Success!
Details
travis-ci/doxy-spellcheck Success!
Details
travis-ci/events Success! Runtime is 9854 cycles.
Details
travis-ci/gitattributestest Success!
Details
travis-ci/include_check Success!
Details
travis-ci/licence_check Success!
Details
travis-ci/littlefs Success! Code size is 8448B.
Details
travis-ci/psa-autogen Success!
Details
travis-ci/tools-py2.7 Success!
Details
travis-ci/tools-py3.5 Success!
Details
travis-ci/tools-py3.6 Success!
Details
travis-ci/tools-py3.7 Success!
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.