Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant performance improvements to AddResult / GetParameter #4446

Merged
merged 4 commits into from Sep 20, 2022

Conversation

fblanchetNaN
Copy link

@fblanchetNaN fblanchetNaN commented Aug 1, 2022

Saving and loading data from the SQL database rely on _adapt_xxx and _convert_xxx methods. However, currently they are using many numpy functions (np.isnan, np.isinf, np.save, np.load) that are slow when dealing with arrays containing single values. This PR makes the following changes that significantly improve data loading and saving performance:

  • For float: Replace numpy functions by math functions, use math.isinfinite instead of combining isnan and isinf;
  • For complex: Do not save complex single values as numpy arrays of one value. It saves both time and space if we avoid reading and writing the relatively large numpy header. This changes the representation of the data in the database, but it's fully backwards compatible (and tested);
  • For array: Use directly functions from np.lib.format, it skips some work of np.save and np.load;

b0b6528 rewrites adapters and converters avoiding numpy functions and significantly improves performance (around x3 when loading regular numeric data and up to x70 for complex data).

20c58d7 is a minor improvement (~10% when loading complex or numeric data), it avoids unpacking sqlite3.Row to list when it's unnecessary (those functions can reorder the columns however, it's most of the time unnecessary because the SQL queries are properly sorted).

3d7b0fd follows the previous one: as the SQL queries are properly sorted, the named tuple feature of sqlite3.Row (row[column_name]) becomes superfluous.

459b342 is up-to-date benchmarks. (Note that the benchmarks are not run in CI.)

Commit b0b6528 is the most important one, the two other modifications give pretty minor improvements compared to the number of changes to the codebase. If you feel that it's not worth changing so much for such a small gain, they can be left out.

(we worked on these preformance improvements together with @mgunyho)

closes #1238 (?)

@codecov
Copy link

codecov bot commented Aug 2, 2022

Codecov Report

Merging #4446 (3362c91) into master (34f942b) will decrease coverage by 0.02%.
The diff coverage is 93.93%.

❗ Current head 3362c91 differs from pull request most recent head b476b79. Consider uploading reports for the commit b476b79 to get more accurate results

@@            Coverage Diff             @@
##           master    #4446      +/-   ##
==========================================
- Coverage   68.23%   68.21%   -0.03%     
==========================================
  Files         339      339              
  Lines       31801    31778      -23     
==========================================
- Hits        21700    21676      -24     
- Misses      10101    10102       +1     

@jenshnielsen
Copy link
Collaborator

Thanks @fblanchetNaN and @mgunyho I will need a bit of time to look at this so please bear with me

@jenshnielsen
Copy link
Collaborator

@fblanchetNaN and @mgunyho thanks for the contribution. @astafan8 And I finally had a chance to look at it today. We agreed that this looks good and we would like to merge most of it. We are not completely confident in the logic for scalar complex numbers so we would prefer if we can merge the other optimizations but leave that one for a subsequent pr.

@fblanchetNaN
Copy link
Author

Thanks for the response!

We agree that the complex scalar logic looks a bit convoluted, even though it is just following the numpy format standard. We can take it out from this PR and discuss it further separately, although for us this is actually the more significant change because it really affects the loading times in some of our experiments.

I can remove the changes related to complex stuff and force-push if that sounds good. I guess we anyway need to rebase to fix the conflicts.

@fblanchetNaN
Copy link
Author

The branch is now updated excluding the changes related to complex numbers

Copy link
Contributor

@astafan8 astafan8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qcodes/dataset/sqlite/queries.py Outdated Show resolved Hide resolved
qcodes/dataset/sqlite/queries.py Show resolved Hide resolved
qcodes/dataset/sqlite/queries.py Outdated Show resolved Hide resolved
qcodes/dataset/sqlite/queries.py Outdated Show resolved Hide resolved
qcodes/dataset/sqlite/queries.py Outdated Show resolved Hide resolved
qcodes/dataset/sqlite/queries.py Show resolved Hide resolved
qcodes/dataset/sqlite/query_helpers.py Outdated Show resolved Hide resolved
qcodes/dataset/sqlite/query_helpers.py Outdated Show resolved Hide resolved
qcodes/utils/types.py Outdated Show resolved Hide resolved
@fblanchetNaN
Copy link
Author

We just realized that some stuff from the complex adapter/converter was left over (like pointed out here, and also here). Is it ok to rebase and force-push? Or should I add new commits to preserve your review? Sorry about the mess.

@astafan8
Copy link
Contributor

Is it ok to rebase and force-push? Or should I add new commits to preserve your review? Sorry about the mess.

no worries :) both are fine, feel free to do what's most convenient for you, the comments will be preserved anyways.

@fblanchetNaN
Copy link
Author

The branch is now updated including the required changes, all checks have passed.

I have marked the conversations as resolved after replying/accepting them.

Copy link
Contributor

@astafan8 astafan8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors merge

@bors bors bot merged commit b48686e into microsoft:master Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug / DataSet.get_data takes long time for large dataset
3 participants