New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor memap for speed #5125

Merged
merged 9 commits into from Sep 21, 2017

Conversation

Projects
None yet
7 participants
@theotherjimmy
Contributor

theotherjimmy commented Sep 15, 2017

Improves performance by 10-15x on my machine. See below for individual compiler comparisons.

Resolves #4976
Resolves #5124
Resolves #5127

Todo:

  • Parsing tests (let's never let this non-performance bug in again)
  • Resolve the Malformed input... output from GCC
@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 15, 2017

@MarceloSalazar Could you take a look?

@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 15, 2017

Adding needs work for parsing tests.

@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 15, 2017

GCC Performance

Speedup: 10.86

Before

# git checkout master 
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
# time python2 tools/memap.py -t GCC_ARM BUILD/tests/nrf51_dk/gcc_arm/TESTS/events/queue/queue.map -d 0
+------------------+-------+-------+------+
| Module           | .text | .data | .bss |
+------------------+-------+-------+------+
| [fill]           |    90 |     3 |   41 |
| [lib]/libc.a     | 24469 |  2472 |   56 |
| [lib]/libgcc.a   |  7134 |     0 |    0 |
| [lib]/libnosys.a |    32 |     0 |    0 |
| [lib]/misc       |   208 |     8 |   25 |
| blinky/BUILD     | 20425 |   261 | 5324 |
| fat-fs/BUILD     |  1389 |     0 | 1644 |
| main.o           |  9751 |     4 |  413 |
| mbed-os/BUILD    |  9215 |    12 |  769 |
| Subtotals        | 72713 |  2760 | 8272 |
+------------------+-------+-------+------+
Total Static RAM memory (data + bss): 11032 bytes
Total Flash memory (text + data): 75473 bytes

real    1.012522411s

After

# git checkout improve-memap-performance
Switched to branch 'improve-memap-performance'
Your branch is up-to-date with 'theotherjimmy/improve-memap-performance'.
# time python2 tools/memap.py -t GCC_ARM BUILD/tests/nrf51_dk/gcc_arm/TESTS/events/queue/queue.map -d 0
Malformed input found when parsing GCC map: linker stubs
Malformed input found when parsing GCC map: linker stubs
Malformed input found when parsing GCC map: linker stubs
Malformed input found when parsing GCC map: linker stubs
+--------------------------------+-------+-------+------+
| Module                         | .text | .data | .bss |
+--------------------------------+-------+-------+------+
| TESTS/events                   |  9751 |     4 |  413 |
| [fill]                         |    90 |     3 |   41 |
| [lib]/libc.a                   | 24469 |  2472 |   56 |
| [lib]/libgcc.a                 |  7134 |     0 |    0 |
| [lib]/libnosys.a               |    32 |     0 |    0 |
| [lib]/misc                     |   208 |     8 |   25 |
| drivers/I2C.o                  |    18 |     0 |    0 |
| drivers/InterruptIn.o          |    90 |     0 |    0 |
| drivers/RawSerial.o            |   392 |     0 |    0 |
| drivers/SerialBase.o           |   398 |     0 |    0 |
| drivers/Ticker.o               |   238 |     0 |    0 |
| drivers/Timeout.o              |   167 |     0 |    0 |
| drivers/Timer.o                |   202 |     0 |    0 |
| drivers/TimerEvent.o           |   180 |     0 |    0 |
| events/EventQueue.o            |    62 |     0 |    0 |
| events/equeue                  |  1514 |     0 |   93 |
| features/frameworks            |  7286 |    69 |  513 |
| hal/mbed_gpio.o                |   220 |     0 |    0 |
| hal/mbed_lp_ticker_api.o       |    56 |     0 |   24 |
| hal/mbed_sleep_manager.o       |   219 |     0 |    2 |
| hal/mbed_ticker_api.o          |   436 |     0 |    0 |
| hal/mbed_us_ticker_api.o       |    56 |     0 |   24 |
| platform/mbed_alloc_wrappers.o |    44 |     0 |    0 |
| platform/mbed_assert.o         |    85 |     0 |    0 |
| platform/mbed_board.o          |   366 |     0 |    0 |
| platform/mbed_critical.o       |    62 |     0 |    0 |
| platform/mbed_error.o          |    44 |     0 |    1 |
| platform/mbed_retarget.o       |   647 |     4 |  264 |
| platform/mbed_wait_api_rtos.o  |    64 |     0 |    0 |
| rtos/TARGET_CORTEX             | 10647 |   180 | 6036 |
| rtos/Thread.o                  |     8 |     0 |    0 |
| targets/TARGET_NORDIC          |  7528 |    20 |  780 |
| Subtotals                      | 72713 |  2760 | 8272 |
+--------------------------------+-------+-------+------+
Total Static RAM memory (data + bss): 11032 bytes
Total Flash memory (text + data): 75473 bytes

real    0.093222451s
@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 15, 2017

Arm Compiler 5/6 performance

Speedup: 14.67

Before

# git checkout master 
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
# time python2 tools/memap.py -t ARM BUILD/tests/nrf51_dk/arm/TESTS/events/queue/queue.map -e table
+-----------------+-------+-------+------+
| Module          | .text | .data | .bss |
+-----------------+-------+-------+------+
| [lib]/c_p.l     | 11603 |    16 |  348 |
| [lib]/cpprt_p.l |    44 |     0 |    0 |
| [lib]/fz_ps.l   |    32 |     0 |    0 |
| [lib]/m_ps.l    |    44 |     0 |    0 |
| anon$$obj.o     |    32 |     0 |    0 |
| blinky/BUILD    | 22684 |   395 | 5115 |
| fat-fs/BUILD    |  1680 |  2369 |   76 |
| main.o          | 10902 |     8 |  408 |
| mbed-os/BUILD   | 10677 |    74 | 1104 |
| Subtotals       | 57698 |  2862 | 7051 |
+-----------------+-------+-------+------+
Total Static RAM memory (data + bss): 9913 bytes
Total Flash memory (text + data): 60560 bytes

real    0.884907055s

After

# git checkout improve-memap-performance
Switched to branch 'improve-memap-performance'
Your branch is up-to-date with 'theotherjimmy/improve-memap-performance'.
# time python2 tools/memap.py -t ARM BUILD/tests/nrf51_dk/arm/TESTS/events/queue/queue.map -e table
+--------------------------------+-------+-------+------+
| Module                         | .text | .data | .bss |
+--------------------------------+-------+-------+------+
| ./main.o                       | 10902 |     8 |  408 |
| [lib]/c_p.l                    | 11603 |    16 |  348 |
| [lib]/cpprt_p.l                |    44 |     0 |    0 |
| [lib]/fz_ps.l                  |    32 |     0 |    0 |
| [lib]/m_ps.l                   |    44 |     0 |    0 |
| anon$$obj.o                    |    32 |     0 |    0 |
| drivers/I2C.o                  |    76 |     0 |    0 |
| drivers/InterruptIn.o          |    85 |     0 |    0 |
| drivers/RawSerial.o            |   324 |     0 |    0 |
| drivers/SerialBase.o           |   460 |     0 |    0 |
| drivers/Ticker.o               |   198 |     0 |    0 |
| drivers/Timeout.o              |    64 |     0 |    0 |
| drivers/Timer.o                |   188 |     0 |    0 |
| drivers/TimerEvent.o           |   144 |     0 |    0 |
| events/EventQueue.o            |    76 |     0 |    0 |
| events/equeue                  |  1838 |     8 |   88 |
| features/frameworks            |  8049 |   140 |  512 |
| hal/mbed_gpio.o                |   256 |     0 |    0 |
| hal/mbed_lp_ticker_api.o       |    56 |     0 |   24 |
| hal/mbed_sleep_manager.o       |   216 |     2 |    0 |
| hal/mbed_ticker_api.o          |   464 |     0 |    0 |
| hal/mbed_us_ticker_api.o       |    56 |     0 |   24 |
| platform/FileBase.o            |   244 |     4 |   52 |
| platform/FilePath.o            |   170 |     0 |    0 |
| platform/mbed_alloc_wrappers.o |    16 |     0 |    0 |
| platform/mbed_assert.o         |    84 |     0 |    0 |
| platform/mbed_board.o          |   380 |     0 |    0 |
| platform/mbed_critical.o       |    64 |     0 |    0 |
| platform/mbed_error.o          |    44 |     1 |    0 |
| platform/mbed_retarget.o       |  1191 |     8 |  376 |
| platform/mbed_wait_api_rtos.o  |    56 |     0 |    0 |
| rtos/Mutex.o                   |   252 |     0 |    0 |
| rtos/TARGET_CORTEX             | 11987 |  2569 | 4300 |
| rtos/Thread.o                  |     8 |     0 |    0 |
| targets/TARGET_NORDIC          |  7995 |   106 |  919 |
| Subtotals                      | 57698 |  2862 | 7051 |
+--------------------------------+-------+-------+------+
Total Static RAM memory (data + bss): 9913 bytes
Total Flash memory (text + data): 60560 bytes

real    0.060380164s
@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 15, 2017

IAR Performance

Speedup: 11.38

Before

# git checkout master 
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
# time python2 tools/memap.py -t IAR BUILD/tests/nrf51_dk/iar/TESTS/events/queue/queue.map
+----------------------+-------+-------+------+
| Module               | .text | .data | .bss |
+----------------------+-------+-------+------+
| [lib]/dl6M_tlf.a     | 11882 |   480 |  716 |
| [lib]/dlpp6M_tl_fc.a |    64 |     0 |    0 |
| [lib]/m6M_tl.a       |  1368 |     0 |    0 |
| [lib]/rt6M_tl.a      |   974 |     0 |    0 |
| [misc]               |   239 |     0 |    0 |
| blinky/BUILD         | 20712 |   488 | 5514 |
| fat-fs/BUILD         |  1520 |     0 | 1700 |
| main.o               | 13206 |     0 |  413 |
| mbed-os/BUILD        |  8870 |     8 |  818 |
| Subtotals            | 58835 |   976 | 9161 |
+----------------------+-------+-------+------+
Total Static RAM memory (data + bss): 10137 bytes
Total Flash memory (text + data): 59811 bytes

real    0.757136647s

After

git checkout improve-memap-performance
Switched to branch 'improve-memap-performance'
Your branch is up-to-date with 'theotherjimmy/improve-memap-performance'.
# time python2 tools/memap.py -t IAR BUILD/tests/nrf51_dk/iar/TESTS/events/queue/queue.map
+--------------------------------+-------+-------+------+
| Module                         | .text | .data | .bss |
+--------------------------------+-------+-------+------+
| ./main.o                       | 13206 |     0 |  413 |
| [lib]/dl6M_tlf.a               | 11882 |   480 |  716 |
| [lib]/dlpp6M_tl_fc.a           |    64 |     0 |    0 |
| [lib]/m6M_tl.a                 |  1368 |     0 |    0 |
| [lib]/rt6M_tl.a                |   974 |     0 |    0 |
| [misc]                         |   239 |     0 |    0 |
| drivers/RawSerial.o            |   276 |     0 |    0 |
| drivers/SerialBase.o           |   420 |     0 |    0 |
| drivers/Ticker.o               |   206 |     0 |    0 |
| drivers/Timeout.o              |    64 |     0 |    0 |
| drivers/Timer.o                |   210 |     0 |    0 |
| drivers/TimerEvent.o           |   146 |     0 |    0 |
| events/EventQueue.o            |    72 |     0 |    0 |
| events/equeue                  |  1556 |     0 |   96 |
| features/frameworks            |  7076 |   196 |  356 |
| hal/mbed_gpio.o                |   220 |     0 |    0 |
| hal/mbed_lp_ticker_api.o       |    48 |     0 |   24 |
| hal/mbed_sleep_manager.o       |   208 |     0 |    2 |
| hal/mbed_ticker_api.o          |   660 |     0 |    0 |
| hal/mbed_us_ticker_api.o       |    48 |     0 |   24 |
| platform/ATCmdParser.o         |    32 |     0 |    0 |
| platform/FileBase.o            |    96 |     0 |   56 |
| platform/FilePath.o            |   100 |     0 |    0 |
| platform/mbed_alloc_wrappers.o |    10 |     0 |    0 |
| platform/mbed_assert.o         |    84 |     0 |    0 |
| platform/mbed_board.o          |   600 |     0 |    0 |
| platform/mbed_critical.o       |    54 |     0 |    0 |
| platform/mbed_error.o          |    40 |     0 |    1 |
| platform/mbed_retarget.o       |   400 |     0 |  124 |
| platform/mbed_rtc_time.o       |   156 |     0 |    0 |
| platform/mbed_wait_api_rtos.o  |    56 |     0 |    0 |
| rtos/Mutex.o                   |   192 |     0 |    0 |
| rtos/RtosTimer.o               |    24 |     0 |    0 |
| rtos/TARGET_CORTEX             | 10866 |   180 | 6432 |
| rtos/Thread.o                  |    68 |     0 |    0 |
| targets/TARGET_NORDIC          |  7114 |   120 |  917 |
| Subtotals                      | 58835 |   976 | 9161 |
+--------------------------------+-------+-------+------+
Total Static RAM memory (data + bss): 10137 bytes
Total Flash memory (text + data): 59811 bytes

real    0.066505842s
@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 15, 2017

I had received in person bug reports from:

So you guys might want to check this PR out.

@MarceloSalazar

This comment has been minimized.

Contributor

MarceloSalazar commented Sep 18, 2017

The speedup is great! :)
But I'm seeing that 'mbed-os' folder is gone, which was one of the things that help to understand where memory is going (things that are part of mbed-os vs other external libraries, and application).
Can we check this please?

@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 18, 2017

@MarceloSalazar That was part of the cause of testing being so horribly slow on some people's machines. Check #5124. I'm going to add a bit of detail as to why this is a problem.

@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 18, 2017

But I'm seeing that 'mbed-os' folder is gone,

Only for tests.

@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 18, 2017

Can we check this please?

This is the intended behavior of mbed OS tests.

@MarceloSalazar

This comment has been minimized.

Contributor

MarceloSalazar commented Sep 18, 2017

But I'm seeing that 'mbed-os' folder is gone,
Only for tests.
This is the intended behavior of mbed OS tests.

Could we add/concatenate the 'mbed-os' text, so the output is consistent with the apps' output as well? In the end, tests as also some kind of apps.

@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 18, 2017

so the output is consistent with the apps' output as well?

To some degree, this PR is already consistent between the apps and tests; As this PR stands, the output is always relative to the root of the app's containing directory. You are suggesting that it "Look the same" which is a different form of consistency.

@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 18, 2017

FYI, I'm going to be completely reworking this PR. There's yet another speedup to take advantage of: the resources object already contains all of the information that we could get from scanning. So let's not be silly and just use that.

@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 18, 2017

Hmmmm... bad rebase. Just a sec.

@theotherjimmy theotherjimmy force-pushed the theotherjimmy:improve-memap-performance branch 2 times, most recently Sep 18, 2017

@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 18, 2017

There we go; that should be a clean history.

theotherjimmy added some commits Sep 18, 2017

Parse filenames from IAR map file
Instead of scanning.

Is a 8ms/15% speedup.
Remove warnings for Zero sized sections
We just don't care if we don't know where they go

@theotherjimmy theotherjimmy force-pushed the theotherjimmy:improve-memap-performance branch to 6d135c2 Sep 18, 2017

theotherjimmy added some commits Sep 18, 2017

Parse full paths from ARM map file
Now we don't have to scan!

This is a 20% speed improvement

@theotherjimmy theotherjimmy force-pushed the theotherjimmy:improve-memap-performance branch to fdd4ae3 Sep 18, 2017

@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 18, 2017

Second round here: IAR is now down to 60ms, and ARM down to 55ms. No files were scanned.

@0xc0170

This comment has been minimized.

Member

0xc0170 commented Sep 20, 2017

What is the current status of this patch?

Any more reviews from the referenced ppl here?

@MarceloSalazar

This comment has been minimized.

Contributor

MarceloSalazar commented Sep 20, 2017

From the technical point of view, I'll let others comment, but the figures speak for themselves :)

From the usability point of view, I still believe we need consistency and generate output that looks the same (IMO tests are basically apps with main components: mbed-os, C/C++ library, and the app/tests itself). Also, there are consumers for this output that would like consistency, either CI systems that need to keep track of this information or others working on UI tools to display/analyze it.

@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 20, 2017

@0xc0170 I have to write a test for GCC parsing. Then we can run it through a morph test.

@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 20, 2017

@MarceloSalazar I have to say that I don't think it's "consistent" to do path mangling for only tests. As a user, If I saw a .o file in the output, I would expect to see it exactly where memap says it is, relative to the build directory.

IMO tests are basically apps with main components: mbed-os, C/C++ library, and the app/tests itself

I think this is the point of contention. They are built that way (to some extent) but they are not structured that way in the filesystem. Like I said in another comment, I think the form of consistency I am suggesting is good for users, as it can help them discover how tests are build and structured and does not hide either of those aspects of test.

@sg- Could you share your opinion?

@sg-

This comment has been minimized.

Member

sg- commented Sep 20, 2017

if I created a folder test and imported mbed-os into it and then ran mbed test would that solve the problem?

@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 20, 2017

if I created a folder test and imported mbed-os into it and then ran mbed test would that solve the problem?

That would create a build error. You would have 2 of each Mbed OS symbol.

If you also added a .mbed file, then ran mbed compile that would produce the output @MarceloSalazar is suggesting.

theotherjimmy added some commits Sep 20, 2017

Remove prefix stuff from GCC memap parser
It's just not used anymore
@theotherjimmy

This comment has been minimized.

Contributor

theotherjimmy commented Sep 20, 2017

/morph test

@0xc0170

This comment has been minimized.

Member

0xc0170 commented Sep 21, 2017

@adbridge @MarceloSalazar Please review again

@MarceloSalazar

This comment has been minimized.

Contributor

MarceloSalazar commented Sep 21, 2017

Looks good! :)

@0xc0170 0xc0170 added needs: CI and removed needs: review labels Sep 21, 2017

@mbed-bot

This comment has been minimized.

mbed-bot commented Sep 21, 2017

Result: SUCCESS

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 1350

All builds and test passed!

@theotherjimmy theotherjimmy merged commit 4de4481 into ARMmbed:master Sep 21, 2017

4 checks passed

Cam-CI uvisor Build & Test Success
Details
ci/morph-test Job has completed
Details
continuous-integration/jenkins/pr-head This commit looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment