Improve voice loading times #85
@el-tocino took the time to run this on a device similar to the Raspberry Pi, and after some issues with swapping, an improvement similar to mine could be observed. However, when the voice file (in this case mycroft_voice_4.0.flitevox) is completely uncached by the kernel, the delay for the file operations is an order of magnitude larger than the improvements, reducing the effectiveness of the optimization attempt to a couple of percent.
d0320a8 to 456fff5
Now that Travis has run the OS X build there is an actual issue. I'll see if I can fix it.
e0bc047 to 95fdb94
Codecov Report
@@ Coverage Diff @@
## development #85 +/- ##
===============================================
+ Coverage 35.48% 35.53% +0.05%
===============================================
Files 97 97
Lines 10255 10262 +7
===============================================
+ Hits 3639 3647 +8
+ Misses 6616 6615 -1
There, it's passing. I used the workaround described in https://gist.github.com/jbenet/1087739. I haven't been able to test it properly since I have no computer running OS X.
Sorry for not replying for a long time. To me this looks good and can be merged. My only question is that I don't see why the optimization should be hidden behind a flag.
I don't know how far you got with the block allocator and the alignment issues, but I bet it can provide much larger improvements. Another option, if the alignment issues are annoying, would be to have different memory blocks per data type to ensure alignment.
Yeah, the flag was mainly added to test the differences. I'll make an update and remove it. I know I did a block allocator at one point, and I have a separate branch for that (somewhere)... I think I wanted this merged separately for some reason, but I can't quite remember.
--enable-voice-load-opt to enable optimization
- Fix errors due to unused variables when running unoptimized
- Remove malloc-optimization to remove crash when unloading voice
* Removed experimental malloc options
* Removed minor unintended code styling changes
src/cg/cst_cg_map.c
@@ -77,16 +77,22 @@ cst_cg_db *cst_cg_load_db(cst_voice *vox, cst_file fd)
    cst_cg_db *db = cst_alloc(cst_cg_db, 1);
    int i;
    uint32_t elements[2];
    uint32_t load_buff[4];
    struct load_buff_s {
Relying on `sizeof` of a structure is risky. We don't know where mimic may end up being used and what alignment issues we may face. Given that this happens just once (not in any inner loop), what do you think about using two `cst_fread` calls, one for the integers and one for the floats?
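To illustrate the suggestion, here is a minimal sketch of reading a mixed integer/float header with two separate reads instead of one read sized by `sizeof(struct)`. It uses plain `fread()` from stdio rather than `cst_fread`, and `demo_header`, its field names, and `demo_read_header` are hypothetical names chosen for the example, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory header; the on-disk layout is assumed to be
 * four 32-bit integers followed directly by two floats, no padding. */
typedef struct {
    uint32_t counts[4];
    float scales[2];
} demo_header;

/* Reads the header with one fread() per element type, so the result does
 * not depend on how the compiler pads or aligns demo_header.
 * Returns 0 on success and -1 on a short read. */
int demo_read_header(FILE *fp, demo_header *h)
{
    assert(fp != NULL && h != NULL);

    if (fread(h->counts, sizeof(uint32_t), 4, fp) != 4)
        return -1;
    if (fread(h->scales, sizeof(float), 2, fp) != 2)
        return -1;
    return 0;
}
```

This costs one extra read call per load, which should be negligible outside of inner loops.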
A good idea.
I was mainly testing my options with this push. Frankly, I didn't think this would pass Travis.
Updated previous commit according to your suggestion.
Merged! :-)
Then I'll move on to the block allocator :)
This PR intends to improve the loading times of voice files (.flitevox). The new code is not active by default; the option `--enable-voice-load-opt` needs to be added to configure for it to take effect. When this flag is enabled I see an improvement in loading speed of 20-25% (when running a series of 100 loads). I've added `time_voice_load` to the testsuite to check the load times (roughly).

My approach has been to try to reduce the number of calls to `fread()` and to limit the number of context switches into kernel mode.

The two most notable changes are [...] `fread()` in `cst_read_tree_nodes()` (10% improvement).

Many other tiny changes, each reducing the load time by a couple of percent, contribute the rest of the improvement.
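As a rough illustration of the "fewer `fread()` calls" idea, the sketch below reads all fields of all nodes with a single call and unpacks them in memory afterwards. The node layout (three 16-bit fields) and the names `demo_node` and `demo_read_nodes` are assumptions made for the example; this is not the actual code in `cst_read_tree_nodes()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_FIELDS 3 /* assumed number of 16-bit fields stored per node */

/* Hypothetical tree node, for illustration only. */
typedef struct {
    uint16_t feat;
    uint16_t op;
    uint16_t no_node;
} demo_node;

/* Reads n nodes with one fread() instead of one call per field.
 * Returns a malloc'd array the caller must free, or NULL on failure. */
demo_node *demo_read_nodes(FILE *fp, size_t n)
{
    uint16_t *raw;
    demo_node *nodes;
    size_t i, total = n * DEMO_FIELDS;

    assert(fp != NULL);
    if (n == 0)
        return NULL;

    raw = malloc(total * sizeof *raw);
    nodes = malloc(n * sizeof *nodes);
    if (raw == NULL || nodes == NULL ||
        fread(raw, sizeof *raw, total, fp) != total) {
        free(raw);
        free(nodes);
        return NULL;
    }

    /* Unpack field by field, so struct padding never matters. */
    for (i = 0; i < n; i++) {
        nodes[i].feat = raw[i * DEMO_FIELDS];
        nodes[i].op = raw[i * DEMO_FIELDS + 1];
        nodes[i].no_node = raw[i * DEMO_FIELDS + 2];
    }
    free(raw);
    return nodes;
}
```

The temporary buffer trades a little memory for far fewer library calls, and the field-wise unpacking avoids depending on the struct's in-memory layout.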
Currently I'm working on a block allocator to reduce context switches for memory allocation but I still have to fix some issues with that (make sure alignments are correct).
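For context, a block allocator of this kind could look roughly like the bump allocator below, which takes one large `malloc()` up front and hands out aligned slices of it. All names (`demo_arena`, `demo_arena_alloc`, and so on) are hypothetical, and this is only a sketch of the general technique, not the allocator from the separate branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bump allocator: one big block, handed out front to back. */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t used;
} demo_arena;

/* Returns 0 on success, -1 if the backing block could not be allocated. */
int demo_arena_init(demo_arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = size;
    a->used = 0;
    return a->base != NULL ? 0 : -1;
}

/* Returns n bytes whose offset is a multiple of align (a power of two),
 * or NULL if the arena is exhausted. Because malloc() returns memory
 * aligned for any fundamental type, an aligned offset gives an aligned
 * address for alignments up to that of max_align_t. */
void *demo_arena_alloc(demo_arena *a, size_t n, size_t align)
{
    size_t start;

    assert(align != 0 && (align & (align - 1)) == 0);
    start = (a->used + align - 1) & ~(align - 1); /* round up to align */
    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Releases every allocation made from the arena at once. */
void demo_arena_free(demo_arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

Individual allocations cannot be freed on their own here, so something like this only fits data that lives and dies together, such as everything belonging to one loaded voice.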
I haven't had the possibility to try this on a Raspberry Pi; the increased `vbuf` might make a bigger difference on such a system.
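For readers unfamiliar with the `vbuf` change, this is roughly what enlarging a stdio buffer looks like with the standard `setvbuf()` call. The wrapper names (`demo_stream`, `demo_open`, `demo_close`) are made up for the example, and the buffer size is left to the caller since the size used in this PR is not stated here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper that keeps the stream and its buffer together,
 * because the buffer must stay alive until the stream is closed. */
typedef struct {
    FILE *fp;
    char *buf;
} demo_stream;

/* Opens path for binary reading with a fully buffered stream of
 * bufsize bytes. Falls back to the default stdio buffer if the larger
 * one cannot be set up. Returns 0 on success, -1 if the open fails. */
int demo_open(demo_stream *s, const char *path, size_t bufsize)
{
    assert(s != NULL && path != NULL && bufsize > 0);

    s->buf = NULL;
    s->fp = fopen(path, "rb");
    if (s->fp == NULL)
        return -1;

    /* setvbuf() must be called before any other operation on the stream. */
    s->buf = malloc(bufsize);
    if (s->buf != NULL && setvbuf(s->fp, s->buf, _IOFBF, bufsize) != 0) {
        free(s->buf);
        s->buf = NULL;
    }
    return 0;
}

/* Closes the stream first, then frees the buffer it was using. */
void demo_close(demo_stream *s)
{
    if (s->fp != NULL)
        fclose(s->fp);
    free(s->buf);
    s->fp = NULL;
    s->buf = NULL;
}
```

A larger buffer means each underlying `read()` fetches more data, so the same sequence of small `fread()` calls results in fewer switches into kernel mode.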