New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace MemoryManager implementation with rpmalloc #3873

Merged
merged 8 commits into from Oct 18, 2017

Conversation

Projects
9 participants
@lukas-w
Member

lukas-w commented Oct 10, 2017

Fixes #3865, by integrating rpmalloc.

Quick – probably not very meaningful – benchmark, using Skiessi-RandomProjectNumber14253:

time user time system elapsed cpu
stable-1.2 23.26 1.33 8.09 303%
malloc/free in MemoryManager 21.30 1.30 6.57 344%
rpmalloc override, replaces all memory allocations 21.66 1.45 6.47 357%
rpmalloc (this PR) 21.15 1.45 6.71 337%
@zonkmachine

This comment has been minimized.

Show comment
Hide comment
@zonkmachine

zonkmachine Oct 10, 2017

Member

Cool!

On Linuxmint 17.3, I get:

-- Configuring done
CMake Error at src/3rdparty/rpmalloc/CMakeLists.txt:3 (add_library):
  Cannot find source file:

.

Edit. New to submodules. The commands that got me going was:

git submodule init
git submodule update --recursive --remote
Member

zonkmachine commented Oct 10, 2017

Cool!

On Linuxmint 17.3, I get:

-- Configuring done
CMake Error at src/3rdparty/rpmalloc/CMakeLists.txt:3 (add_library):
  Cannot find source file:

.

Edit. New to submodules. The commands that got me going was:

git submodule init
git submodule update --recursive --remote
@tresf

This comment has been minimized.

Show comment
Hide comment
@tresf

tresf Oct 10, 2017

Member

On Linuxmint 17.3, I get:

It's a submodule, you'll have to clone with the recurse option. git clone --recursive foo://bar

Member

tresf commented Oct 10, 2017

On Linuxmint 17.3, I get:

It's a submodule, you'll have to clone with the recurse option. git clone --recursive foo://bar

@Umcaruje

This comment has been minimized.

Show comment
Hide comment
@Umcaruje

Umcaruje Oct 10, 2017

Member

Error when compiling with carla:

In file included from /home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp:25:0:
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.h:81:5: error: ‘QMutex’ does not name a type
     QMutex fMutex;
     ^
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp: In member function ‘intptr_t CarlaInstrument::handleDispatcher(NativeHostDispatcherOpcode, int32_t, intptr_t, void*, float)’:
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp:253:12: warning: enumeration value ‘NATIVE_HOST_OPCODE_INTERNAL_PLUGIN’ not handled in switch [-Wswitch]
     switch (opcode)
            ^
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp: In member function ‘virtual void CarlaInstrument::play(sample_t (*)[2])’:
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp:355:32: error: ‘fMutex’ was not declared in this scope
         const QMutexLocker ml(&fMutex);
                                ^
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp: In member function ‘virtual bool CarlaInstrument::handleMidiEvent(const MidiEvent&, const MidiTime&, f_cnt_t)’:
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp:371:28: error: ‘fMutex’ was not declared in this scope
     const QMutexLocker ml(&fMutex);
                            ^
plugins/carlabase/CMakeFiles/carlabase.dir/build.make:72: recipe for target 'plugins/carlabase/CMakeFiles/carlabase.dir/carla.cpp.o' failed
make[2]: *** [plugins/carlabase/CMakeFiles/carlabase.dir/carla.cpp.o] Error 1
Member

Umcaruje commented Oct 10, 2017

Error when compiling with carla:

In file included from /home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp:25:0:
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.h:81:5: error: ‘QMutex’ does not name a type
     QMutex fMutex;
     ^
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp: In member function ‘intptr_t CarlaInstrument::handleDispatcher(NativeHostDispatcherOpcode, int32_t, intptr_t, void*, float)’:
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp:253:12: warning: enumeration value ‘NATIVE_HOST_OPCODE_INTERNAL_PLUGIN’ not handled in switch [-Wswitch]
     switch (opcode)
            ^
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp: In member function ‘virtual void CarlaInstrument::play(sample_t (*)[2])’:
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp:355:32: error: ‘fMutex’ was not declared in this scope
         const QMutexLocker ml(&fMutex);
                                ^
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp: In member function ‘virtual bool CarlaInstrument::handleMidiEvent(const MidiEvent&, const MidiTime&, f_cnt_t)’:
/home/umcaruje/github/lmmstest/plugins/carlabase/carla.cpp:371:28: error: ‘fMutex’ was not declared in this scope
     const QMutexLocker ml(&fMutex);
                            ^
plugins/carlabase/CMakeFiles/carlabase.dir/build.make:72: recipe for target 'plugins/carlabase/CMakeFiles/carlabase.dir/carla.cpp.o' failed
make[2]: *** [plugins/carlabase/CMakeFiles/carlabase.dir/carla.cpp.o] Error 1
@tresf

This comment has been minimized.

Show comment
Hide comment
@tresf

tresf Oct 10, 2017

Member

@Umcaruje that appears to be a bug with Carla, it should have imported QMutex without relying on it from another header. Can you add #include <QMutex> to the Carla plugin, it sounds like an existing (benign with MemoryManager, but still legit) bug with Carla on stable although @lukas-w can just add it to the PR too. :)

This is easy to miss when another common header accidentally provides the include.

Member

tresf commented Oct 10, 2017

@Umcaruje that appears to be a bug with Carla, it should have imported QMutex without relying on it from another header. Can you add #include <QMutex> to the Carla plugin, it sounds like an existing (benign with MemoryManager, but still legit) bug with Carla on stable although @lukas-w can just add it to the PR too. :)

This is easy to miss when another common header accidentally provides the include.

@Umcaruje

This comment has been minimized.

Show comment
Hide comment
@Umcaruje

Umcaruje Oct 10, 2017

Member

Built without carla, I can confirm I don't get stutters while playing on elementary os loki under both qt4 and qt5. Very good and comparable performance with 1.1. I'll keep using it for further testing.

Member

Umcaruje commented Oct 10, 2017

Built without carla, I can confirm I don't get stutters while playing on elementary os loki under both qt4 and qt5. Very good and comparable performance with 1.1. I'll keep using it for further testing.

@PhysSong

This comment has been minimized.

Show comment
Hide comment
@PhysSong

PhysSong Oct 11, 2017

Member

Tested. It showed very fast allocation speed and I got no crashes during tests.

Member

PhysSong commented Oct 11, 2017

Tested. It showed very fast allocation speed and I got no crashes during tests.

@MJansson MJansson referenced this pull request Oct 11, 2017

Closed

EINVAL undeclared on MINGW #30

@gi0e5b06

This comment has been minimized.

Show comment
Hide comment
@gi0e5b06

gi0e5b06 Oct 11, 2017

May someone try this project?
It runs at 60% CPU on my old computer.
memory20171011.zip

gi0e5b06 commented Oct 11, 2017

May someone try this project?
It runs at 60% CPU on my old computer.
memory20171011.zip

@qnebra

This comment has been minimized.

Show comment
Hide comment
@qnebra

qnebra Oct 11, 2017

@gi0e5b06

On my computer with 2.2 Ghz CPU LMMS with your project opened use 14% of CPU.

qnebra commented Oct 11, 2017

@gi0e5b06

On my computer with 2.2 Ghz CPU LMMS with your project opened use 14% of CPU.

@qnebra

This comment has been minimized.

Show comment
Hide comment
@qnebra

qnebra Oct 11, 2017

I am merged this PR into my local lmms and got this on cmake generate:

CMake Error at src/3rdparty/rpmalloc/CMakeLists.txt:3 (add_library):
Cannot find source file:

rpmalloc/rpmalloc/rpmalloc.c

Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp
.hxx .in .txx

CMake Error: Cannot determine link language for target "rpmalloc".
CMake Error: CMake can not determine linker language for target: rpmalloc

qnebra commented Oct 11, 2017

I am merged this PR into my local lmms and got this on cmake generate:

CMake Error at src/3rdparty/rpmalloc/CMakeLists.txt:3 (add_library):
Cannot find source file:

rpmalloc/rpmalloc/rpmalloc.c

Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp
.hxx .in .txx

CMake Error: Cannot determine link language for target "rpmalloc".
CMake Error: CMake can not determine linker language for target: rpmalloc

@tresf

This comment has been minimized.

Show comment
Hide comment
@tresf

tresf Oct 11, 2017

Member

@qnebra please see #3873 (comment). This PR switches the project to using submodules, which will take some adjustment, but will greatly reduce the amount of 3rd party source code we bundle.

Member

tresf commented Oct 11, 2017

@qnebra please see #3873 (comment). This PR switches the project to using submodules, which will take some adjustment, but will greatly reduce the amount of 3rd party source code we bundle.

@qnebra

This comment has been minimized.

Show comment
Hide comment
@qnebra

qnebra Oct 11, 2017

Thank you.
Maybe add information about submodules to a developer wiki of lmms? To avoid situations like this.

qnebra commented Oct 11, 2017

Thank you.
Maybe add information about submodules to a developer wiki of lmms? To avoid situations like this.

@zonkmachine

This comment has been minimized.

Show comment
Hide comment
@zonkmachine

zonkmachine Oct 11, 2017

Member

Maybe add information about submodules to a developer wiki of lmms?

Let's merge it first.

Member

zonkmachine commented Oct 11, 2017

Maybe add information about submodules to a developer wiki of lmms?

Let's merge it first.

@qnebra

This comment has been minimized.

Show comment
Hide comment
@qnebra

qnebra Oct 11, 2017

Test & merge, of course.

qnebra commented Oct 11, 2017

Test & merge, of course.

@gi0e5b06

This comment has been minimized.

Show comment
Hide comment
@gi0e5b06

gi0e5b06 Oct 12, 2017

@qnebra Thanks for testing. I'm making some tests about memory use. Actually half of the instruments are good (don't allocate extra memory) and the other half is not, mainly because of Oscillator. All oscillators are duplicated for each played note (with no new information). There must be a way to avoid that.

PS1: I also rewrote BaseDetuning to avoid allocation. @softrabbit was right, a single float works fine.

PS2: This memory manager doesn't seem to improve performance a lot. I wouldn't merge it as it is. The memory manager is a sensible part very depending on the OS, the OS version, the compiler, etc. It should be configurable at runtime. Right now, 5 are available (legacy, standard, rpmalloc, mine and PsySong's one). An abstract class would make wonders.

gi0e5b06 commented Oct 12, 2017

@qnebra Thanks for testing. I'm making some tests about memory use. Actually half of the instruments are good (don't allocate extra memory) and the other half is not, mainly because of Oscillator. All oscillators are duplicated for each played note (with no new information). There must be a way to avoid that.

PS1: I also rewrote BaseDetuning to avoid allocation. @softrabbit was right, a single float works fine.

PS2: This memory manager doesn't seem to improve performance a lot. I wouldn't merge it as it is. The memory manager is a sensible part very depending on the OS, the OS version, the compiler, etc. It should be configurable at runtime. Right now, 5 are available (legacy, standard, rpmalloc, mine and PsySong's one). An abstract class would make wonders.

@softrabbit

This comment has been minimized.

Show comment
Hide comment
@softrabbit

softrabbit Oct 12, 2017

Member

All oscillators are duplicated for each played note (with no new information). There must be a way to avoid that.

What, no new information? Frequency and phase come to mind... that being said, I'd like to see some kind of "oscillator bank" concept, where all 6 (TripleO) or 18 (Organic) Oscillators are merged into one object, preferably organizing the data and math in a way that makes it easy for compilers to optimize. (SIMD, anyone?) That might even up for some new features...

Member

softrabbit commented Oct 12, 2017

All oscillators are duplicated for each played note (with no new information). There must be a way to avoid that.

What, no new information? Frequency and phase come to mind... that being said, I'd like to see some kind of "oscillator bank" concept, where all 6 (TripleO) or 18 (Organic) Oscillators are merged into one object, preferably organizing the data and math in a way that makes it easy for compilers to optimize. (SIMD, anyone?) That might even up for some new features...

@gi0e5b06

This comment has been minimized.

Show comment
Hide comment
@gi0e5b06

gi0e5b06 Oct 12, 2017

@softrabbit Frequency is the same than for the NotePlayHandle, same for ext_phaseOffset. PhaseOffset seems to be more or less equal to ext_phaseOffset. So only phase stays (one float), which is probably computable (I think but I may be wrong). Anyway, there is a big room for improvment here. Especially as Oscillator by many many instruments.

I wouldn't group the oscillators into a single object. Instead I would pre-allocate 2048 oscillators (for example) and reuse them when needed because all the instruments use the same Oscillator type (for now). More important, if it goes over 2048, the note should be skipped.

As a result: no memory allocation and CPU under controlled for all the instruments.

gi0e5b06 commented Oct 12, 2017

@softrabbit Frequency is the same than for the NotePlayHandle, same for ext_phaseOffset. PhaseOffset seems to be more or less equal to ext_phaseOffset. So only phase stays (one float), which is probably computable (I think but I may be wrong). Anyway, there is a big room for improvment here. Especially as Oscillator by many many instruments.

I wouldn't group the oscillators into a single object. Instead I would pre-allocate 2048 oscillators (for example) and reuse them when needed because all the instruments use the same Oscillator type (for now). More important, if it goes over 2048, the note should be skipped.

As a result: no memory allocation and CPU under controlled for all the instruments.

@softrabbit

This comment has been minimized.

Show comment
Hide comment
@softrabbit

softrabbit Oct 12, 2017

Member

I wouldn't group the oscillators into a single object. Instead I would pre-allocate 2048 oscillators (for example) and reuse them when needed because all the instruments use the same Oscillator type (for now). More important, if it goes over 2048, the note should be skipped.

I hate to dig up my own code as an example, but: #2089

That's a speedup of maybe ~2x from reorganizing arrays of structs into structs of arrays. Probably even more if compiling for wider SIMD models than SSE2. That's not to say the same applies to the Oscillator situation, but it might be worth considering.

Member

softrabbit commented Oct 12, 2017

I wouldn't group the oscillators into a single object. Instead I would pre-allocate 2048 oscillators (for example) and reuse them when needed because all the instruments use the same Oscillator type (for now). More important, if it goes over 2048, the note should be skipped.

I hate to dig up my own code as an example, but: #2089

That's a speedup of maybe ~2x from reorganizing arrays of structs into structs of arrays. Probably even more if compiling for wider SIMD models than SSE2. That's not to say the same applies to the Oscillator situation, but it might be worth considering.

@gi0e5b06

This comment has been minimized.

Show comment
Hide comment
@gi0e5b06

gi0e5b06 Oct 12, 2017

@softrabbit Actually I don't disagree. But that was my understanding that you talked about grouping, not restructurating. As a first step requiring little code changes, I'm suggesting to group them all in a big array (with eventually informal subgrouping in 6/18).

As for switching from "arrays of structs into structs of arrays", there are certainly possible gains. But it is a complete rewrite of the Oscillator class and of the depending instruments. And IMHO, that's something that should be only tried later. Eventually in parallel (new classes: Oscillators, Kicker2, TripleOscillator2, Organic2, etc).

gi0e5b06 commented Oct 12, 2017

@softrabbit Actually I don't disagree. But that was my understanding that you talked about grouping, not restructurating. As a first step requiring little code changes, I'm suggesting to group them all in a big array (with eventually informal subgrouping in 6/18).

As for switching from "arrays of structs into structs of arrays", there are certainly possible gains. But it is a complete rewrite of the Oscillator class and of the depending instruments. And IMHO, that's something that should be only tried later. Eventually in parallel (new classes: Oscillators, Kicker2, TripleOscillator2, Organic2, etc).

@qnebra

This comment has been minimized.

Show comment
Hide comment
@qnebra

qnebra Oct 12, 2017

@gi0e5b06

I forgot about something, on my PC your project run "as is" consume 65% of CPU. My previous tests are incorrect, because I modified your project. Interesing behaviour with this loop points.

What tools can I use to test lmms perfomance? Because in my project program has low usage of system resources according to system monitoring tools, but sounds are extremely stutter and glitch.

qnebra commented Oct 12, 2017

@gi0e5b06

I forgot about something, on my PC your project run "as is" consume 65% of CPU. My previous tests are incorrect, because I modified your project. Interesing behaviour with this loop points.

What tools can I use to test lmms perfomance? Because in my project program has low usage of system resources according to system monitoring tools, but sounds are extremely stutter and glitch.

@zonkmachine zonkmachine added this to In Progress in Release LMMS 1.2.0-RC5 Oct 13, 2017

@PhysSong

This comment has been minimized.

Show comment
Hide comment
@PhysSong

PhysSong Oct 14, 2017

Member

Are there some benchmarks for cache miss rate? It is also important for performance.

Member

PhysSong commented Oct 14, 2017

Are there some benchmarks for cache miss rate? It is also important for performance.

@lukas-w

This comment has been minimized.

Show comment
Hide comment
@lukas-w

lukas-w Oct 18, 2017

Member

@PhysSong I did a couple more benchmarks, again using Skiessi-RandomProjectNumber14253.

Startup and load project:

time cpu Ir incl. L1 Miss Sum Cycle Estimation Called
stable-1.2 2.06 45% 1.6×10⁹
(27.9%)
3.0×10⁶
(3.79%)
1.7×10⁹
(23.74%)
6.6×10³
this PR 1.97 44% 4.7×10⁵
(0.01%)
5.8×10³
(0.01%)
1.7×10⁴
(0.00%)
6.6×10³

Render project using CLI:

time cpu Ir incl. L1 Miss Sum Cycle Estimation Called
stable-1.2 07.49 324% 2.0×1010
(15.3%)
3.9×10⁴
(7.27%)
2.0×1010
(14.9%)
2.8×10⁴
this PR 06.30 359% 2.2×10⁶
(0.00%)
7.2×10³
(0.00%)
5.5×10⁵
(0.00%)
2.8×10⁴

If KCachegrind's cycle estimation is accurate, the old MemoryManager is something between 36,000 (rendering) and 100,000 times (startup and loading) slower than rpmalloc.

Merging.

Member

lukas-w commented Oct 18, 2017

@PhysSong I did a couple more benchmarks, again using Skiessi-RandomProjectNumber14253.

Startup and load project:

time cpu Ir incl. L1 Miss Sum Cycle Estimation Called
stable-1.2 2.06 45% 1.6×10⁹
(27.9%)
3.0×10⁶
(3.79%)
1.7×10⁹
(23.74%)
6.6×10³
this PR 1.97 44% 4.7×10⁵
(0.01%)
5.8×10³
(0.01%)
1.7×10⁴
(0.00%)
6.6×10³

Render project using CLI:

time cpu Ir incl. L1 Miss Sum Cycle Estimation Called
stable-1.2 07.49 324% 2.0×1010
(15.3%)
3.9×10⁴
(7.27%)
2.0×1010
(14.9%)
2.8×10⁴
this PR 06.30 359% 2.2×10⁶
(0.00%)
7.2×10³
(0.00%)
5.5×10⁵
(0.00%)
2.8×10⁴

If KCachegrind's cycle estimation is accurate, the old MemoryManager is something between 36,000 (rendering) and 100,000 times (startup and loading) slower than rpmalloc.

Merging.

@lukas-w lukas-w merged commit 8d6cb12 into stable-1.2 Oct 18, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@lukas-w lukas-w deleted the fix/memory-manager-performance branch Oct 18, 2017

@zonkmachine zonkmachine moved this from In Progress to Done in Release LMMS 1.2.0-RC5 Oct 18, 2017

@zonkmachine

This comment has been minimized.

Show comment
Hide comment
@zonkmachine

zonkmachine Oct 20, 2017

Member

Maybe add information about submodules to a developer wiki of lmms?

@qnebra You are of course absolutely right here. The now updated wiki page is:
https://github.com/LMMS/lmms/wiki/Accessing-git-repository

Member

zonkmachine commented Oct 20, 2017

Maybe add information about submodules to a developer wiki of lmms?

@qnebra You are of course absolutely right here. The now updated wiki page is:
https://github.com/LMMS/lmms/wiki/Accessing-git-repository

@gi0e5b06

This comment has been minimized.

Show comment
Hide comment
@gi0e5b06

gi0e5b06 Oct 20, 2017

My memory manager is about 35% faster than rpmalloc.
And it's not even optimized yet.
Available here:
https://github.com/gi0e5b06/lmms/blob/master/src/core/MemoryManagerArray.cpp

Drawbacks: is dedicated to LMMS, uses on average more memory

To do: auta-adaptable to the style of the user

Suggestion: make the memory manager an option configurable by the user

gi0e5b06 commented Oct 20, 2017

My memory manager is about 35% faster than rpmalloc.
And it's not even optimized yet.
Available here:
https://github.com/gi0e5b06/lmms/blob/master/src/core/MemoryManagerArray.cpp

Drawbacks: is dedicated to LMMS, uses on average more memory

To do: auta-adaptable to the style of the user

Suggestion: make the memory manager an option configurable by the user

PhysSong added a commit to PhysSong/lmms that referenced this pull request Oct 23, 2017

@lukas-w lukas-w referenced this pull request Oct 27, 2017

Closed

LMMS doesn't free memory #3366

PhysSong added a commit to PhysSong/lmms that referenced this pull request Oct 29, 2017

Replace MemoryManager implementation with rpmalloc (#3873)
* Replace MemoryManager implementation with rpmalloc
    Fixes #3865
* Travis: Specify OSX image for Qt5 build

PhysSong added a commit to PhysSong/lmms that referenced this pull request Oct 29, 2017

Replace MemoryManager implementation with rpmalloc (#3873)
* Replace MemoryManager implementation with rpmalloc
    Fixes #3865
* Travis: Specify OSX image for Qt5 build

@tresf tresf referenced this pull request Nov 10, 2017

Open

Submodule candidates #3963

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment