Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative sortByPositionPresorted() and String generation optimization for TheoreticalSpectrumGenerator #89

Open
wants to merge 29 commits into
base: develop
Choose a base branch
from

Conversation

akriese
Copy link

@akriese akriese commented Apr 29, 2020

No description provided.

@akriese
Copy link
Author

akriese commented May 5, 2020

We've written a new sort-Function for partially sorted chunks of Peaks in a spectrum. Also MSSpectrum::select() got optimized by applying moves and reserves.
Furthermore, the functions of TheoreticalSpectrumGenerator were optimized by caching Strings and using emplace_back() and reserve() for meta information vectors.
So far, the new sort-Function is only applied for the getSpectrum() function, but might be used to replace stable_sort() in other functions too, due to the fact, that partially sorted chunks exist very often.
The above optimizations speed up getSpectrum() by about 30% in its usage in the SimpleSearchEngine.

@akriese akriese changed the title Alternative sortByPosition() and added String generation optimization Alternative sortByPositionPresorted() and String generation optimization for TheoreticalSpectrumGenerator May 5, 2020
@cbielow
Copy link
Owner

cbielow commented May 5, 2020

really nice work!

Can you say how much time the String creation saves? Its still a bit inconsistent, because sometimes strings are created even though they will never be used (add_meta == false), or they are created in a loop. So, is String creation still a larger chunk in the flame graph?

@akriese
Copy link
Author

akriese commented May 5, 2020

I think, add_metainfo_ is almost always true, so creating the Strings is almost always needed.
--> Otherwise we would have to take the loop apart and do one for add_metainfo_==true and the same thing for add_metainfo_==false
String creation is still a bigger part in the flamegraph, but got reduced. Im gonna test, how much exactly.

@akriese
Copy link
Author

akriese commented May 5, 2020

Note: I am currently only testing the SimpleSearchEngine, which doesnt use losses, precursors or abundantImmoniumIons. So, we don't actually test the optimizations in those functions right now.

@akriese
Copy link
Author

akriese commented May 5, 2020

Looking at the flamegraph, the biggest difference is, that the String::operator+(String) chunk is basically gone. All other chunks stayed more or less the same. The percentage of addPeaks_ went from 4.5% to 3%.

@cbielow
Copy link
Owner

cbielow commented May 6, 2020

in line

spectrum.getIntegerDataArrays()[0] = charges;

the local variables are copied to the spectrum data arrays.
It should be much cheaper to move them (both for move-assignment and move-construction cases).

@cbielow
Copy link
Owner

cbielow commented May 6, 2020

does this really work:
an inner chunk:


wrapped in an outer chunk:

??

@akriese
Copy link
Author

akriese commented May 6, 2020

The last commits have sped up getSpectrum() once more. Compared to a build from the develop-Branch, the function is now 40% faster (in the SimpleSearchEngine).

@akriese
Copy link
Author

akriese commented May 7, 2020

I changed the configuration of SimpleSearchEngine to test more functions, that we had optimized in above commits. EmpiriclaFormula::toString() appears to be very expensive. So I added a static map<EmpiricalFormula, String> to cache earlier computed formula-Strings. Also EmpiricalFormula::operator<() was optimized by reordering comparisons. That too brought a good speed-up.

Depending on the SimpleSearchEngine's settings, the speedup lies between 7% and 20%.

Copy link

@IceFreez3r IceFreez3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quiet get when you can do chunks.add(true) and when only chunks.add(false). For the rest i made some (little) comments, but looks good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants