
AP_Scripting: move singleton method bindings to flash #18249

Merged: 2 commits merged into ArduPilot:master on Feb 2, 2022

Conversation

@IamPete1 (Member) commented Aug 7, 2021

This changes the singleton enum and method tables to be evaluated in C++, so they stay in flash and we don't have to load them into memory for Lua to use. This approach seems to work, and no changes are required to any scripts.
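
Roughly, the approach has the shape of the sketch below. This is a minimal illustration, not the actual generated bindings; the names (GCS_meta, GCS_index, the stub binding functions) are placeholders and a standard <lua.hpp> header is assumed.

#include <cstring>
#include <lua.hpp>

// Stub binding functions so the sketch is self-contained; the real ones
// are emitted by the binding generator.
static int lua_GCS_send_text(lua_State *L) { (void)L; return 0; }
static int lua_GCS_frame_type(lua_State *L) { (void)L; return 0; }

// const method table: there is no runtime registration, so on STM32
// targets the linker can keep it in flash rather than RAM
static const luaL_Reg GCS_meta[] = {
    {"send_text",  lua_GCS_send_text},
    {"frame_type", lua_GCS_frame_type},
};

// __index metamethod: look the name up in the flash table only when a
// script actually uses it, e.g. gcs:send_text(...)
static int GCS_index(lua_State *L) {
    const char *name = luaL_checkstring(L, 2);
    for (size_t i = 0; i < sizeof(GCS_meta)/sizeof(GCS_meta[0]); i++) {
        if (strcmp(name, GCS_meta[i].name) == 0) {
            lua_pushcfunction(L, GCS_meta[i].func);
            return 1;
        }
    }
    return 0; // unknown method: the script sees nil
}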

This is the memory usage of the simple loop example for master:
AP: Lua: Time: 0 Mem: 52879 + 72

And with this PR:
AP: Lua: Time: 0 Mem: 40312 + 616

So we save 12.5k of memory in total, but the per-run allocation is larger (+616 vs +72, an increase of 544). I suspect that means I have not got this quite right; if we pick up that much on a simple script, the memory saving may not be worth it for bigger scripts. It does not seem to scale, and sometimes nothing extra is picked up at all, so maybe there is some interaction with the garbage collector.

This also costs about 1.6K of flash.

I have only done singletons here; we could pick up some more savings by also doing userdata and AP object types.

There are also probably better ways to do this if you know what you're doing. It would be nice to grab the singleton in the indexing function and then pass it down to the method function (sketched below), which would save a lot of flash, but I don't know how to do that.
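
One way that could work, as a rough sketch only (the accessor get_rangefinder_singleton and the stubbed return value are placeholders, not real bindings): the __index function pushes the singleton pointer as a light userdata upvalue of a C closure, and the method function reads it back with lua_upvalueindex.

#include <cstring>
#include <lua.hpp>

// hypothetical accessor standing in for something like RangeFinder::get_singleton()
extern void *get_rangefinder_singleton();

// method binding: reads the singleton pointer stashed as upvalue 1
static int rangefinder_num_sensors(lua_State *L) {
    void *rf = lua_touserdata(L, lua_upvalueindex(1));
    (void)rf;              // a real binding would call the method on rf
    lua_pushinteger(L, 0); // placeholder return value
    return 1;
}

// __index metamethod: grab the singleton once, then hand it to the
// method function as an upvalue of a C closure
static int rangefinder_index(lua_State *L) {
    const char *name = luaL_checkstring(L, 2);
    if (strcmp(name, "num_sensors") == 0) {
        lua_pushlightuserdata(L, get_rangefinder_singleton());
        lua_pushcclosure(L, rangefinder_num_sensors, 1);
        return 1;
    }
    return 0;
}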

I have not investigated how this affects run times, or the total memory used by the scripting thread.

To get a better idea of what is going on it is easier to compare the generated bindings directly rather than comparing the binding generator.

@IamPete1 (Member, Author) commented Aug 7, 2021

Binary           text             data              bss             total
---------------  ---------------  ----------------  --------------  ---------------
ardusub          1316 (+0.0874%)  -152 (-12.1019%)  156 (+0.0597%)  1320 (+0.0747%)
blimp            1316 (+0.1066%)  -152 (-12.5413%)  148 (+0.0567%)  1312 (+0.0877%)
arducopter-heli  1648 (+0.0991%)  -152 (-5.9190%)   152 (+0.0585%)  1648 (+0.0856%)
arducopter       1648 (+0.0975%)  -152 (-5.9190%)   152 (+0.0585%)  1648 (+0.0844%)
ardurover        1316 (+0.0873%)  -152 (-12.3779%)  148 (+0.0567%)  1312 (+0.0741%)
antennatracker   1316 (+0.1010%)  -152 (-12.5000%)  148 (+0.0567%)  1312 (+0.0838%)
arduplane        1552 (+0.0917%)  -152 (-6.1389%)   152 (+0.0585%)  1552 (+0.0794%)

@tridge (Contributor) commented Aug 9, 2021

great!
@WickedShell can you review?

@khancyr (Contributor) commented Oct 13, 2021

@WickedShell I think this is really interesting for scripting; do you have some time to review it?

@IamPete1 (Member, Author) commented:

Now implemented the same approach for all singletons, ap_objects and userdata types. Still in memory are the manual bindings and the custom uint32 handling. The simple loop on master uses 55974 + 72; this branch uses 35531 + 616, so we're saving 20K of memory. I think @WickedShell and I decided the larger per-run memory was an interaction with how the garbage collection works; it is not larger in all cases.

Looks like we're picking up about 1K in flash. Not too bad on the whole.

I think the main thing now is to test this on lots of different scripts and make sure I have not broken anything, and that the runtime memory usage and execution time are not adversely affected. Setting SCR_DEBUG_OPTS = 8 to enable run time logging is very useful for this.

@@ -2379,6 +2396,43 @@ void emit_docs(struct userdata *node, int is_userdata, int emit_creation) {
}


void emit_index_helpers(void) {
fprintf(source, "static bool load_function(lua_State *L, const luaL_Reg *list, const uint8_t length, const char* name) {\n");
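
Inferred from the signature emitted above (not copied from the PR), the generated helper presumably does a linear scan of the flash-resident list and pushes the matching C function; a sketch:

#include <cstdint>
#include <cstring>
#include <lua.hpp>

static bool load_function(lua_State *L, const luaL_Reg *list, const uint8_t length, const char* name) {
    for (uint8_t i = 0; i < length; i++) {
        if (strcmp(name, list[i].name) == 0) {
            lua_pushcfunction(L, list[i].func);
            return true;  // pushed the matching C function for the caller
        }
    }
    return false; // name not found in this method table
}
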
Contributor (review comment):

To get accurate timing on a STM32 you need to do the same as ENABLE_EKF_TIMING does:

#if ENABLE_EKF_TIMING
    void *istate = hal.scheduler->disable_interrupts_save();
    static uint32_t timing_start_us;
    timing_start_us = dal.micros();
#endif
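
For completeness, the matching end of such a measurement restores interrupts with the saved state; this is a hedged fragment continuing the snippet above, with an illustrative accumulator name rather than the exact EKF code.

#if ENABLE_EKF_TIMING
    static uint32_t timing_total_us;                    // illustrative accumulator
    timing_total_us += dal.micros() - timing_start_us;  // time spent in the timed section
    hal.scheduler->restore_interrupts(istate);          // re-enable interrupts with the saved state
#endif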

@tridge (Contributor) commented Jan 31, 2022

a benchmark like Tools/CPUInfo would be very useful, along with disabling interrupts

@IamPete1 (Member, Author) commented Feb 1, 2022

This test script:

param:set("LOG_DISARMED", 1)
param:set("SCR_DEBUG_OPTS", 8)

function update()
  local sensor_count = 0
  for i = 0, 1000 do
    -- num_sensors call is to a singleton that does not take a semaphore
    -- it is at the bottom of the table search, so it should be a worst-case test
    sensor_count = sensor_count + rangefinder:num_sensors()
  end

  gcs:send_text(0, "found: " .. tostring(sensor_count))

  return update, 1000
end
return update()

Master:

[screenshot: run time log output on master]

This PR:

[screenshot: run time log output with this PR]

edit: Forgot to mention that the testing was done on a Pixhawk1

So approximately double the run time; num_sensors is the 9th item in the list. Vehicle is the largest singleton, with 25 items, so the last item in that list will see a worse slowdown.

Master memory: 36892 + 368

This PR: 22356 + 368

A saving of 14536 bytes, and this is a case where the per-call memory is the same.

@tridge tridge merged commit 427e8d1 into ArduPilot:master Feb 2, 2022