Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark for comparing the merge/galloping join algorithm with the hash join algorithm #956

Merged
merged 196 commits into from
Jun 7, 2024

Conversation

schlegan
Copy link
Contributor

@schlegan schlegan commented Apr 22, 2023

An exhaustive benchmark for the comparison of the two implementations that also showcases the usage of our benchmarking library.

…ning of the benchmark infrastructure development and then became deprecated.
…hmark and tried updating it to work with the new version of the benchmark infrastructure.
…c benchmark table generation into configuration options.
…now no longer creates names, but takes them as an argument.
…mplementation and added the general point in time, in which the benchmark measuring was finished, to the general metadata.
@codecov
Copy link

codecov bot commented Apr 22, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.85%. Comparing base (ccf592e) to head (26bb2e1).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #956      +/-   ##
==========================================
- Coverage   88.85%   88.85%   -0.01%     
==========================================
  Files         326      326              
  Lines       28926    28926              
  Branches     3205     3205              
==========================================
- Hits        25703    25702       -1     
  Misses       2070     2070              
- Partials     1153     1154       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

…here the smaller table keeps the same size and the bigger table grows. Now there are multiple benchmark tables, each with a different number of rows in the smaller table.
@schlegan schlegan marked this pull request as ready for review February 22, 2024 10:55
@schlegan schlegan marked this pull request as draft March 1, 2024 09:24
@schlegan schlegan marked this pull request as ready for review March 5, 2024 09:44
@schlegan schlegan marked this pull request as draft March 22, 2024 10:48
@schlegan schlegan marked this pull request as ready for review May 7, 2024 16:51
Copy link

sonarcloud bot commented Jun 6, 2024

Quality Gate Passed Quality Gate passed

Issues
2 New issues
5 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

Comment on lines 219 to 221
/*
The number of rows.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/*
The number of rows.
*/
// The number of rows.

Comment on lines 224 to 226
/*
The number of columns.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Comment on lines 213 to 217
/*
If nobody played around with the private member variables, every row
should have the same amount of columns and there should be AT LEAST one row,
and one column. So we can just return the length of the first row.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/*
If nobody played around with the private member variables, every row
should have the same amount of columns and there should be AT LEAST one row,
and one column. So we can just return the length of the first row.
*/
// Every row has the same number of columns, so we just return the length of the first row.

Comment on lines 28 to 32
Note: This will cause an exception, if the row and column is bigger
than the table. However, such a situation should only happen, if this
function was called with the wrong column numbers, in which case it
ain't our problem and can only be fixed by the user.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would replace this Note by the three AD_CONTRACT_CHECK(columnTo... < table->numColumns()).

Comment on lines +4 to +5
// Author of the file this file is based on: Björn Buchhold
// (buchhold@informatik.uni-freiburg.de)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Author of the file this file is based on: Björn Buchhold
// (buchhold@informatik.uni-freiburg.de)

(I think you wrote this one all by yourself, didn't you?)

Comment on lines 45 to 47
static void createOverlapRandomly(IdTableAndJoinColumn* const smallerTable,
const IdTableAndJoinColumn& biggerTable,
const double probabilityToCreateOverlap) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this helper should be in the same helper/util file that also creates random IdTables.
And all of those should also work with multiple join columns to make life easier later when we also benchmark join on multiple columns.
This requires basically changing IdTableAndJoinColumn to IdTableAndJoinColumns etc.

*/
void parseConfiguration(const BenchmarkConfiguration& config) {
numberRows =
config.getValueByNestedKeys<size_t>("numberRows").value_or(100);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rather numRows. numberRows sounds strange and numberOfRows is too long:)

@joka921 joka921 merged commit 57ba939 into ad-freiburg:master Jun 7, 2024
19 checks passed
@schlegan schlegan deleted the JoinAlgorithmBenchmark branch June 7, 2024 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants