Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize the separator in name #10363

Closed
wants to merge 11 commits into from

Conversation

jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented May 3, 2024

Which issue does this PR close?

Closes #10364 .

Rationale for this change

I got the name checking error that due to the mismatch of separator in creating name. I found that most of the name built with single comma and space, so I think we should switch others to this format.

Error is like

External error: query failed: DataFusion error: Internal error: Input field name SUM(get_field(CASE WHEN get_field(t2.struct(t1.time,t1.load1,t1.load2,t1.host), Utf8("c3")) IS NOT NULL THEN t2.struct(t1.time,t1.load1,t1.load2,t1.host) END,Utf8("c2"))) does not match with the projection expression SUM(get_field(CASE WHEN get_field(t2.struct(t1.time,t1.load1,t1.load2,t1.host),Utf8("c3")) IS NOT NULL THEN t2.struct(t1.time,t1.load1,t1.load2,t1.host) END,Utf8("c2"))).
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
[SQL] select t2."struct(t1.time,t1.load1,t1.load2,t1.host)"['c3'] as host, sum((case when t2."struct(t1.time,t1.load1,t1.load2,t1.host)"['c3'] is not null then t2."struct(t1.time,t1.load1,t1.load2,t1.host)" end)['c2']) from (select struct(time,load1,load2,host) from t1) t2 where t2."struct(t1.time,t1.load1,t1.load2,t1.host)"['c3'] IS NOT NULL group by t2."struct(t1.time,t1.load1,t1.load2,t1.host)"['c3'] order by host;

I got one with t2.struct(t1.time,t1.load1,t1.load2,t1.host), Utf8("c3") and another t2.struct(t1.time,t1.load1,t1.load2,t1.host),Utf8("c3"). That is why I think we should fix the separator.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

You can find the error here with cargo test --test sqllogictests -- expr. Although, it is not on the main branch, but I think most of the changes are unrelated to the name checking issue.

@github-actions github-actions bot added logical-expr Logical plan and expressions core Core datafusion crate sqllogictest labels May 3, 2024
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 jayzhan211 changed the title Minor: Standardize the separator in name Standardize the separator in name May 12, 2024
@@ -1859,7 +1859,7 @@ pub(crate) fn create_name(e: &Expr) -> Result<String> {
AggregateFunctionDefinition::UDF(..) => {
let names: Vec<String> =
args.iter().map(create_name).collect::<Result<_>>()?;
names.join(",")
names.join(", ")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one of the main change

@@ -291,7 +291,7 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
///
fn display_name(&self, args: &[Expr]) -> Result<String> {
let names: Vec<String> = args.iter().map(create_name).collect::<Result<_>>()?;
Ok(format!("{}({})", self.name(), names.join(",")))
Ok(format!("{}({})", self.name(), names.join(", ")))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one of the main change

@jayzhan211 jayzhan211 marked this pull request as ready for review May 12, 2024 08:41
Comment on lines +55 to +70
#[macro_export]
macro_rules! assert_snapshot {
($CHUNKS: expr) => {
let formatted = $crate::arrow::util::pretty::pretty_format_batches_with_options(
$CHUNKS,
&$crate::format::DEFAULT_FORMAT_OPTIONS,
)
.unwrap()
.to_string();

let actual_lines: Vec<&str> = formatted.trim().lines().collect();

insta::assert_yaml_snapshot!(actual_lines);
};
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use this for tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test with insta. Because changing them manually is painful. I did this in datafusion-example, since they are not test.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 this looks like a really nice contributon

Given the insta tool is a new dependency and I think has larger implications than just this PR I think we should create a new ticket / PR to propose adding it to DataFusion, rather than including it as part of this unrelated PR (though I suspect one reason to use insta is to make updating the tests easier)

];

assert_fn_batches!(expr, expected, 1);
assert_fn_batches!(expr, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was easier to see what was going on with these tests when the results are inlined -- i think insta supports that too "Inlined snapshots" -- https://insta.rs/docs/snapshot-types/

@@ -121,7 +121,7 @@ fn create_function_physical_name(
false => "",
};

let phys_name = format!("{}({}{})", fun, distinct_str, names.join(","));
let phys_name = format!("{}({}{})", fun, distinct_str, names.join(", "));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the changes in this file the only in the PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several one, but are all like this

@jayzhan211 jayzhan211 marked this pull request as draft May 13, 2024 23:37
Copy link

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Jul 13, 2024
@jayzhan211 jayzhan211 closed this Jul 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core datafusion crate logical-expr Logical plan and expressions sqllogictest Stale PR has not had any activity for some time
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Standardize the separator in name
3 participants