Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: Add routine to debug join fuzz tests #10970

Merged
merged 15 commits into from
Jun 18, 2024
Merged

Conversation

comphead
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

It is pretty complicated to find the exact root cause for issues triggered by fuzz tests. Those tests have a random nature and sometime require stable things to reproduce the case especially when the test is flaky. Adding debug procedures and doc how to use then while debugging #10886

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core datafusion crate label Jun 18, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an improvement to me -- thanks for improving the testing infrastructure @comphead


// if debug flag is set the test will save randomly generated inputs and outputs to user folders
// so it is easy to run debug on top of the failed test data
if debug {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

It might make sense to add this information about saving information to the overall docstring of this function (run_tests) so people don't have to read the implementation to find out what debug does.

@@ -383,7 +419,7 @@ impl JoinFuzzTestCase {

/// Perform sort-merge join and hash join on same input
/// and verify two outputs are equal
async fn run_test(&self) {
async fn run_test(&self, join_test: &[JoinTest], debug: bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is already on a struct, another way to pass in the debug flag would be as a new field on JoinFuzzTestCase

like

struct JoinFuzzTestCase {
    batch_sizes: &'static [usize],
    input1: Vec<RecordBatch>,
    input2: Vec<RecordBatch>,
    join_type: JoinType,
    join_filter_builder: Option<JoinFilterBuilder>,
    // if debug flag is set the test will save randomly generated inputs and outputs to user folders
    // so it is easy to run debug on top of the failed test data
    debug: bool,
}

/// This method useful for debugging fuzz tests
/// It helps to save randomly generated input test data for both join inputs into the user folder
/// as a parquet files preserving partitioning.
/// Once the data is saved it is possible to run a custom test on top of the saved data and debug
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@comphead
Copy link
Contributor Author

Thanks @alamb for the review

@comphead comphead merged commit e9f9a23 into apache:main Jun 18, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants