Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base64 encoding/decoding using SIMDe #6978

Merged
merged 50 commits into from Aug 11, 2023
Merged

Conversation

cbielow
Copy link
Contributor

@cbielow cbielow commented Jul 31, 2023

Description

faster de&encoding of any XML data which contains base64 strings, e.g. mzML.
FileConverter is about 8-10% faster when loading+storing an mzML file on both Linux and Windows.

We now require some advanced instruction sets for x64 CPUs: SSE3 (g++/clang) or AVX (MSVC compiler for Windows); and NEON for ARM64 CPUs.
AVX was introduced 2013 - we may exclude some really old CPU's for Windows here, but everybody else should see significant benefits.
SSE3 was introduced 2004 -- not a problem here.

Checklist

  • Make sure that you are listed in the AUTHORS file
  • Add relevant changes and new features to the CHANGELOG file
  • I have commented my code, particularly in hard-to-understand areas
  • New and existing unit tests pass locally with my changes
  • Updated or added python bindings for changed or new classes (Tick if no updates were necessary.)

How can I get additional information on failed tests during CI

Click to expand If your PR is failing you can check out
  • The details of the action statuses at the end of the PR or the "Checks" tab.
  • http://cdash.openms.de/index.php?project=OpenMS and look for your PR. Use the "Show filters" capability on the top right to search for your PR number.
    If you click in the column that lists the failed tests you will get detailed error messages.

Advanced commands (admins / reviewer only)

Click to expand
  • /reformat (experimental) applies the clang-format style changes as additional commit. Note: your branch must have a different name (e.g., yourrepo:feature/XYZ) than the receiving branch (e.g., OpenMS:develop). Otherwise, reformat fails to push.
  • setting the label "NoJenkins" will skip tests for this PR on jenkins (saves resources e.g., on edits that do not affect tests)
  • commenting with rebuild jenkins will retrigger Jenkins-based CI builds

⚠️ Note: Once you opened a PR try to minimize the number of pushes to it as every push will trigger CI (automated builds and test) and is rather heavy on our infrastructure (e.g., if several pushes per day are performed).

Moritz Aubermann and others added 25 commits May 4, 2023 20:44
…source builds to run without weird error messages"

This reverts commit 30e5def.
…\simde\simde\x86\sse.h(3363,39): error C4100: 'i': unreferenced formal parameter ` on MSVC due to external includes since we are using /we4100 and /we4189
Copy link
Contributor

@jpfeuffer jpfeuffer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment before looking at the code:

How bad would it be to have this as a build option? I'm not sure about all its implications yet.

Does this allow linking to an existing simde lib installation? It's on all major package managers as far as I can see.

Also, if we make it default it would make sense to adapt the contrib to build everything (especially Eigen) with that support (as they will benefit as well).

@cbielow
Copy link
Contributor Author

cbielow commented Jul 31, 2023

How bad would it be to have this as a build option? I'm not sure about all its implications yet.

it would mean that we need to keep two implementations around (and make sure we test both of them), i.e. this would double CI costs, if done properly.
Any specific things we should be concerned about when switching to SSE/AVX?

Does this allow linking to an existing simde lib installation? It's on all major package managers as far as I can see.

Good point! I will add that.

Also, if we make it default it would make sense to adapt the contrib to build everything (especially Eigen) with that support (as they will benefit as well).

This probably needs careful consideration for each individual library, since this may have implications on their tests and we may need to pass these things into the respective build systems. But that can be done, yes.

@jpfeuffer
Copy link
Contributor

The reason is mainly conda:
https://longr.github.io/posts/gromacs-conda/

@timosachsenberg
Copy link
Contributor

I am not sure if I understand all implications.
Some technical questions

  • Would it make sense to use the single header version with https://github.com/simd-everywhere/simde/blob/master/amalgamate.py - not sure how often we would need to update it if we use only the older instruction sets
  • Could the simd stuff be put into the cpp files not exposing SIMD specific types or functions?
  • Eigen is header only. Can we also enable it there?

@cbielow
Copy link
Contributor Author

cbielow commented Aug 1, 2023

I am not sure if I understand all implications. Some technical questions

* Would it make sense to use the single header version with https://github.com/simd-everywhere/simde/blob/master/amalgamate.py - not sure how often we would need to update it if we use only the older instruction sets

Since we will have to allow for an external SIMDe anyway, I don't think there is an advantage. We'd even have to conditionally change the way we include SIMDe headers, since an external SIMDe will use different names then.

* Could the simd stuff be put into the cpp files not exposing SIMD specific types or functions?

This may work for Base64. Not sure about other potential applications. I can see why hiding the SIMDe headers from the OpenMS interface is beneficial, but if we allow for a custom external SIMDe, the problem is somewhat reduced.
So I'm not sure if its worth the effort.

* Eigen is header only. Can we also enable it there?

Yes, that's easy. Just use the compiler flags, as we already do:
https://eigen.tuxfamily.org/index.php?title=FAQ#Which_SIMD_instruction_sets_are_supported_by_Eigen.3F

@jpfeuffer
Copy link
Contributor

If you could confirm with the conda and debian guys that this is ok, I would be willing to merge if it passes.

@cbielow
Copy link
Contributor Author

cbielow commented Aug 8, 2023

rebuild jenkins

@cbielow
Copy link
Contributor Author

cbielow commented Aug 9, 2023

rebuild jenkins

@cbielow
Copy link
Contributor Author

cbielow commented Aug 11, 2023

rebuild jenkins

@cbielow
Copy link
Contributor Author

cbielow commented Aug 11, 2023

this is ready for merge. Please have a final look :)

@cbielow
Copy link
Contributor Author

cbielow commented Aug 11, 2023

btw: Debian devs said, we should just mention SSE in Changelog/Readme. That's enough for package creation.

@cbielow
Copy link
Contributor Author

cbielow commented Aug 11, 2023

tests pass

@cbielow cbielow merged commit 69a0ee6 into OpenMS:develop Aug 11, 2023
1 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants