feat: data to test java substrait consumer #89

davisusanibar · 2023-03-21T23:09:20Z

kou · 2023-03-22T01:24:41Z

Could you remove the .DS_Store files?

lidavidm

@westonpace would it be useful to have more Acero/Substrait testing data like this in apache/arrow?

Also: are there any licensing concerns with including TPC-H queries? (Though for that matter: why are the SQL queries here in the first place? We can't do anything with them.)

Can you include a README to explain the purpose of these files?

westonpace · 2023-03-23T13:16:49Z

> @westonpace would it be useful to have more Acero/Substrait testing data like this in apache/arrow?

@lidavidm

We have a lot of hard-coded JSON but it's embedded in the test files themselves (e.g. serde_test.cc or test_substrait.py) and not in standalone files. The original concern around hard-coded JSON was that Substrait may evolve quickly and those JSON files would be difficult to maintain. For example, the JSON files in this PR are missing the version field (Isthmus does not yet populate this) and they don't have URIs for the extension functions (almost no one generates these yet). So they may need to change at some point.

As a result, I have been waiting for the text format to be ready before I made any attempt to curate a large set of test queries (but that is still a few months off at least).

I think SQL is probably a pretty good solution if you have a good SQL->Substrait library (that may be an advantage for Java). In that case I would suggest only storing the SQL and then generating the Substrait on the fly.

I don't actually know what the legal ramifications are for TPC-H but it is a good question.
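The two gaps mentioned above (no version field, no URIs for extension functions) can be detected mechanically in a checked-in plan. The following is only a minimal sketch, not something from this PR: the helper name `plan_gaps` is hypothetical, and the keys `version`, `extensionUris`, and `extensions` are assumed to be the protobuf-JSON spellings of the corresponding Substrait plan fields.

```python
import json


def plan_gaps(plan_json):
    """Return notes on fields that a JSON-serialized plan appears to lack."""
    plan = json.loads(plan_json)
    gaps = []

    # Assumption: the plan-level version message is serialized under "version".
    if "version" not in plan:
        gaps.append("missing version")

    # Assumption: URI declarations live under "extensionUris" and
    # function declarations under "extensions".
    uris = [entry.get("uri", "") for entry in plan.get("extensionUris", [])]
    if plan.get("extensions") and not any(uris):
        gaps.append("extension functions declared without URIs")

    return gaps


# Hypothetical plan fragment: one extension function, no version, no URIs.
example = json.dumps({
    "extensions": [
        {"extensionFunction": {"functionAnchor": 1, "name": "sum:i64"}}
    ],
    "relations": [],
})
print(plan_gaps(example))
```

A check like this would only flag plans that may need regenerating once producers start populating these fields; it does not validate the plan itself.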

lidavidm · 2023-03-23T13:33:44Z

Hmm, ok. Depending on Isthmus is a problem for Java since it requires a newer Java version than what we use, so we have to start doing acrobatics with the test setup.

@davisusanibar would you like to start a mailing list discussion about requiring JDK11+ for development & testing in general, but building JARs targeting JDK8? Then we could depend on Isthmus and avoid embedding potentially unstable plans. (Currently, our docs state we can still build on JDK8.)

westonpace · 2023-03-23T14:11:16Z

Another option might be to create some kind of Docker / CI script in this repo that generates the proto / JSON files from the SQL files?

lidavidm · 2023-03-23T14:24:35Z

Ah, that would work, too. Though to start with I'm not sure if it even needs to be automated per se.

westonpace · 2023-03-24T15:35:19Z

> Though to start with I'm not sure if it even needs to be automated per se.

Agreed. As long as the source SQL files are stored alongside the Substrait files, with a detailed README explaining how to regenerate the Substrait files, then I think we can proceed.
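The condition above, that every Substrait file has its source SQL stored next to it, is easy to verify with a small script. This is a rough sketch under assumed conventions that the thread does not specify: plans and queries share a file stem within the same directory (`q1.sql` next to `q1.json`), and the function name `unpaired_plans` is hypothetical.

```python
import tempfile
from pathlib import Path


def unpaired_plans(root, sql_ext=".sql", plan_ext=".json"):
    """Return (plans_without_sql, sql_without_plan) as sorted relative paths.

    Files are matched by stem within the same directory (assumed layout).
    """
    root = Path(root)
    sql = {p.with_suffix("") for p in root.rglob(f"*{sql_ext}")}
    plans = {p.with_suffix("") for p in root.rglob(f"*{plan_ext}")}

    # Plans whose source SQL is absent cannot be regenerated.
    orphans = sorted(str(p.relative_to(root)) + plan_ext for p in plans - sql)
    # SQL files whose plan has not been generated yet.
    pending = sorted(str(p.relative_to(root)) + sql_ext for p in sql - plans)
    return orphans, pending


# Usage on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "q1.sql").write_text("SELECT 1")
    (base / "q1.json").write_text("{}")
    (base / "q2.json").write_text("{}")  # plan without its SQL source
    print(unpaired_plans(base))
```

Run against the data directory, a non-empty first list would mean a plan was checked in without the SQL needed to regenerate it.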

lidavidm · 2023-03-24T15:59:15Z

OK, then I think what needs to happen here is just to check in a Java file (maybe something that'll work with jshell?) showing how to generate the plans given the SQL files, and to address the existing review feedback.

davisusanibar · 2023-03-28T18:47:25Z

Hi Team,

Just close this PR. Let's start the Java Substrait consumer the same way the Python/C++ Substrait plans were implemented (maintaining the JSON plans in the test files).

feat: data to test java substrait consumer

cfd10ea

fix: delete invalid files

aca3744

davisusanibar mentioned this pull request Mar 22, 2023

GH-34223: [Java] Java Substrait Consumer JNI call to ACERO C++ apache/arrow#34227

Merged

lidavidm reviewed Mar 22, 2023

davisusanibar closed this Mar 28, 2023
