
Feat: allow user-supplied queries to generate unit test data #2579

Merged: georgesittas merged 2 commits into main from jo/generate_test_data_with_user_supplied_queries on May 8, 2024


@georgesittas (Contributor) commented May 7, 2024

Fixes #2542. TL;DR: this PR adds a new format for specifying unit test fixture and expected data: SQL. :-)

  • Discuss identifier normalization decisions. The initial approach was to execute user-supplied queries without normalizing or quoting them, but this caused issues in the create_view method: with quote_identifiers=False, the fully qualified view name in the CREATE VIEW <name> AS ... statement is not quoted either, which can break names that need quoting (for example, mixed-case names or names containing special characters).
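The quoting problem with the view name can be illustrated with a small stdlib-only sketch. It is not SQLMesh's create_view implementation; the helper names (quote_identifier, create_view_sql) and the ANSI double-quote convention are assumptions made for illustration. An unquoted part such as test-catalog could be parsed as an expression, and an unquoted Orders may be case-folded by the engine.

```python
def quote_identifier(part: str) -> str:
    # Hypothetical helper: ANSI-style quoting, escaping embedded quotes by doubling them.
    return '"' + part.replace('"', '""') + '"'


def qualified_name(parts: list[str], quote: bool) -> str:
    # Join catalog/schema/table parts with dots, optionally quoting each part.
    return ".".join(quote_identifier(p) if quote else p for p in parts)


def create_view_sql(parts: list[str], query: str, quote: bool = True) -> str:
    # Hypothetical stand-in for a CREATE VIEW generator; the query is embedded as-is.
    return f"CREATE VIEW {qualified_name(parts, quote)} AS {query}"


parts = ["test-catalog", "sushi", "Orders"]
print(create_view_sql(parts, "SELECT 1 AS id", quote=False))
# CREATE VIEW test-catalog.sushi.Orders AS SELECT 1 AS id
print(create_view_sql(parts, "SELECT 1 AS id"))
# CREATE VIEW "test-catalog"."sushi"."Orders" AS SELECT 1 AS id
```

Quoting only the view name, while leaving the user's query untouched, avoids the first failure mode without rewriting SQL the user wrote by hand.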

@erindru I ran your example against Trino and it seemed to work fine. I feel like the flexibility this feature provides will allow users to bypass conversion issues in unit tests pretty easily. Thanks for the idea!

EDIT: I will make a follow-up PR to update the unit test documentation.

@georgesittas requested review from a team, izeigerman and tobymao on May 7, 2024 23:23
@georgesittas force-pushed the jo/generate_test_data_with_user_supplied_queries branch from 7550c84 to 215b6ed
A contributor commented:

This is confusing, but I think it makes sense: we always need to normalize according to the model, or else references will be wrong.

The parsing should use the dialect of the actual test engine, so there are no transpilation nuances.
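The first point, normalizing names according to the model's dialect so that fixture names line up with the references the model resolves, can be sketched as follows. This is an illustration, not SQLMesh's or SQLGlot's normalization code: the function names are hypothetical, only two folding rules are modeled (PostgreSQL lowercases unquoted identifiers, Snowflake uppercases them), and splitting on "." assumes no dots inside name parts.

```python
from typing import Callable, Dict

# Hypothetical table: how each dialect case-folds unquoted identifiers.
CASE_FOLDING: Dict[str, Callable[[str], str]] = {
    "postgres": str.lower,
    "snowflake": str.upper,
}


def normalize_identifier(name: str, dialect: str, quoted: bool = False) -> str:
    # Quoted identifiers are case-sensitive, so they are kept verbatim.
    if quoted:
        return name
    fold = CASE_FOLDING.get(dialect)
    # Unknown dialects are left untouched in this sketch.
    return fold(name) if fold else name


def normalize_table(name: str, model_dialect: str) -> str:
    # Normalize each part of a dotted, unquoted table name using the model's dialect.
    return ".".join(normalize_identifier(p, model_dialect) for p in name.split("."))


print(normalize_table("Sushi.Orders", "snowflake"))  # SUSHI.ORDERS
print(normalize_table("Sushi.Orders", "postgres"))   # sushi.orders
```

Keying normalization to the model's dialect, rather than the engine used to run the test, keeps fixture names consistent with how the model's own references are normalized.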

@georgesittas (Contributor Author) replied:

Both points are exactly what I had in mind 👍

@georgesittas force-pushed the jo/generate_test_data_with_user_supplied_queries branch from 215b6ed to 12771bf
@erindru (Collaborator) commented May 7, 2024

Allowing users to supply arbitrary SQL to produce a result set for tests is a nice escape hatch, IMO.

Of course in an ideal world SQLGlot would be able to generate the correct query 100% of the time, but the variations in SQL syntax and obscure data types between engines are essentially infinite.

I think allowing users to unblock themselves by supplying the query manually when they encounter an edge case will go a long way toward reducing friction.
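The escape hatch can be sketched as a function that generates fixture SQL from rows by default, but uses a user-supplied query verbatim when one is given. The names fixture_query and render_value are hypothetical, not SQLMesh's API, and the literal rendering is deliberately minimal; engine-specific types such as Trino's ROW are exactly where a hand-written query would take over.

```python
from typing import Any, Optional, Sequence


def render_value(value: Any) -> str:
    # Minimal literal rendering; real engines need type-specific handling.
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int, since bool subclasses int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def fixture_query(
    columns: Sequence[str],
    rows: Sequence[Sequence[Any]] = (),
    query: Optional[str] = None,
) -> str:
    # Escape hatch: a user-supplied query is returned untouched, skipping generation.
    if query is not None:
        return query
    if not rows:
        raise ValueError("either rows or a query must be provided")
    values = ", ".join("(" + ", ".join(render_value(v) for v in row) + ")" for row in rows)
    return f"SELECT * FROM (VALUES {values}) AS t ({', '.join(columns)})"


print(fixture_query(["id", "name"], rows=[(1, "a'b")]))
# SELECT * FROM (VALUES (1, 'a''b')) AS t (id, name)
print(fixture_query(["r"], query="SELECT CAST(ROW(1, 'x') AS ROW(a INTEGER, b VARCHAR)) AS r"))
```

Giving the query precedence over generated rows means an edge case in SQL generation never blocks a test; the user can always state the fixture directly in the target engine's SQL.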

@georgesittas force-pushed the jo/generate_test_data_with_user_supplied_queries branch from 12771bf to 2c8ae82 on May 8, 2024 14:49
@georgesittas force-pushed the jo/generate_test_data_with_user_supplied_queries branch from 2c8ae82 to c580b0c on May 8, 2024 20:49
@georgesittas merged commit 21e84f0 into main on May 8, 2024
@georgesittas deleted the jo/generate_test_data_with_user_supplied_queries branch on May 8, 2024 21:02

Development: successfully merging this pull request may close these issues:

  • Invalid SQL generated for the ROW type in Trino unit test fixtures

3 participants