Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exasol support #2046

Merged
merged 16 commits into from
Aug 20, 2024
Merged

Exasol support #2046

merged 16 commits into from
Aug 20, 2024

Conversation

ssteinhauser
Copy link
Contributor

@ssteinhauser ssteinhauser commented Jul 23, 2024

As already mentioned in one of my previous issues, I'd like to implement support for the Exasol syntax. For this, I will provide multiple PRs over time.

This is the first one, implementing REGEXP_LIKE support and allowing subselects without parentheses being part of functions with multiple parameters.

Hint: Hide whitespace changes to see the actual changes not related to formatting

Stefan Steinhauser added 2 commits July 22, 2024 16:59
Allow unparenthesesed sub selects being part of multiple function parameters
@manticore-projects
Copy link
Contributor

Thank you for your interest and contribution. I would look at it promptly, but please resolve the conflicts first (just resynchronize the repositories).

Copy link
Contributor

@manticore-projects manticore-projects left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't reformat files!
Beside this, there are 2 more relevant remarks.

@@ -4527,6 +4529,8 @@ Expression PrimaryExpression() #PrimaryExpression:

| LOOKAHEAD( ParenthesedSelect() , {!interrupted} ) retval=ParenthesedSelect()

| LOOKAHEAD( Select() , {!interrupted} ) retval=Select()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please where is this coming from and why do we need it?
It's certainly misplaces and the Semantic Lookahead will make it terribly expensive.

Copy link
Contributor

@manticore-projects manticore-projects Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is very possible that the two performance tests fail now because of this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes (at least) from the need to support sub selects (without parentheses) as function arguments for functions with mutliple parameters, like the example implemented in the unit tests: SELECT COALESCE(SELECT mycolumn from mytable, 0)

From what I could see, this is a InternalFunction and multiple parameters are handeled via ExpressionList and ultimately handeled via PrimaryExpression.
It's also likely that sub selects without parentheses are supported elsewhere (outside functions) in Exasol.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another example would be the following query, that won't be parsable without that change: select * from mytable where mycol = select mycol from anothertable

@manticore-projects
Copy link
Contributor

Greetings, two issues (beside the reformatting):

  1. please feel free to relax the time out conditions (temporarily):
testRecursiveBracketExpressionIssue1019_2 at NestedBracketsPerformanceTest.java:181
testDeepFunctionParameters at NestedBracketsPerformanceTest.java:355
  1. There are now many more Choice conflicts, please fix them (to the absolute bare minimum)

@ssteinhauser
Copy link
Contributor Author

Greetings, two issues (beside the reformatting):

1. please feel free to relax the time out conditions (temporarily):
testRecursiveBracketExpressionIssue1019_2 at NestedBracketsPerformanceTest.java:181
testDeepFunctionParameters at NestedBracketsPerformanceTest.java:355
2. There are now many more `Choice conflicts`, please fix them (to the absolute bare minimum)

The choice conflicts are solved.

Regarding the first one: I think this one does not come from my changes since the same error occurs on the master branch already. However, how can I reduce the timeout conditions of only those two tests?

@manticore-projects
Copy link
Contributor

Regarding the first one: I think this one does not come from my changes since the same error occurs on the master branch already. However, how can I reduce the timeout conditions of only those two tests?

Please check my remark on the semantic lookahead for the SELECT first -- I am pretty sure its the cause of this performance deterioration and I am not sure why you want parse for plain selects there.

But in general, yes you can increase the time out or reduce the number of loops for those 2 tests. Just check the source, it should be pretty self explaining.

@ssteinhauser
Copy link
Contributor Author

Are you referring to your comment regarding the SELECT above?
I've also already added remarks on that. Doesn't that explain why I want to implement that? I can also check for more examples if you want to?
Furthermore, if I revert the changes that were merged recently (i.e. 93c8210), the tests don't fail anymore. Also the other way around, tests are failing on the master branch also since those changes were merged (see https://github.com/JSQLParser/JSqlParser/actions/runs/10051649914/job/27781582452 and https://github.com/JSQLParser/JSqlParser/actions/runs/10052623497/job/27784109464)

@manticore-projects
Copy link
Contributor

Are you referring to your comment regarding the SELECT above?

yes.

I've also already added remarks on that. Doesn't that explain why I want to implement that? I can also check for more examples if you want to?

where, I did not see any response to my remark?

@ssteinhauser
Copy link
Contributor Author

where, I did not see any response to my remark?

Directly above under your remarks, I've added multiple comments to the different topics. Unfortunately, I cannot link it since GitHub does not provide a link directly to those comments.

@manticore-projects
Copy link
Contributor

where, I did not see any response to my remark?

Directly above under your remarks, I've added multiple comments to the different topics. Unfortunately, I cannot link it since GitHub does not provide a link directly to those comments.

Unfortunately I can't see any response, please don't ask me why. I am lost on this.

image

Can you please explain it here in this discussion what this SELECT shall do?

Copy link
Contributor Author

@ssteinhauser ssteinhauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am really sorry, the remarks were still on pending and I didn't see how to publish them in the first look.
Hopefully, now you should see my comments. If not let me know and I'll post them as normal comments again.

@@ -4527,6 +4529,8 @@ Expression PrimaryExpression() #PrimaryExpression:

| LOOKAHEAD( ParenthesedSelect() , {!interrupted} ) retval=ParenthesedSelect()

| LOOKAHEAD( Select() , {!interrupted} ) retval=Select()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes (at least) from the need to support sub selects (without parentheses) as function arguments for functions with mutliple parameters, like the example implemented in the unit tests: SELECT COALESCE(SELECT mycolumn from mytable, 0)

From what I could see, this is a InternalFunction and multiple parameters are handeled via ExpressionList and ultimately handeled via PrimaryExpression.
It's also likely that sub selects without parentheses are supported elsewhere (outside functions) in Exasol.

@@ -4527,6 +4529,8 @@ Expression PrimaryExpression() #PrimaryExpression:

| LOOKAHEAD( ParenthesedSelect() , {!interrupted} ) retval=ParenthesedSelect()

| LOOKAHEAD( Select() , {!interrupted} ) retval=Select()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another example would be the following query, that won't be parsable without that change: select * from mytable where mycol = select mycol from anothertable

@manticore-projects
Copy link
Contributor

manticore-projects commented Jul 23, 2024

Ok, I can see the explanation now and understand what you try to achieve. However, it gives me some headache:

  1. We have already Function allowing for a SELECT parameter (without brackets)
  2. select * from mytable where mycol = select mycol from anothertable should never work
  3. the SELECT production checks for parenthesed selects too. So if we really are going to add the Select then we can replace the ParenthesedSelect -- and we also can remove the Select from the Function then because it would be covered by the PrimaryExpression indeed.

I am not opposing Exasol per se, but I do see Sub Selects without brackets as very evil:

select select 1;

select 1 where 1 = select 1 + 1;

@ssteinhauser
Copy link
Contributor Author

Ok, I can see the explanation now and understand what you try to achieve. However, it gives me some headache:

1. We have already `Function` allowing for a `SELECT` parameter (without brackets)

That's right, but only if it is the only parameter of the function, right?

2. `select * from mytable where mycol = select mycol from anothertable` should never work

But this works on Exasol.

3. the `SELECT` production checks for parenthesed selects too. So if we really are going to add the `Select` then we can replace the `ParenthesedSelect` -- and we also can remove the `Select` from the `Function` then because it would be covered by the `PrimaryExpression` indeed.

You are right, I could implement that, but let us first discuss on how we proceed with that feature in general.

I am not opposing Exasol per se, but I do see Sub Selects without brackets as very evil:

select select 1;

select 1 where 1 = select 1 + 1;

I agree, that this is evil and not really nice, but Exasol supports all of those statements.

@manticore-projects
Copy link
Contributor

On formatting: you can always run the Spotless formatApply task, which will apply consistent and accepted formatting on all changed files.

@manticore-projects
Copy link
Contributor

Ok, I can see the explanation now and understand what you try to achieve. However, it gives me some headache:

1. We have already `Function` allowing for a `SELECT` parameter (without brackets)

That's right, but only if it is the only parameter of the function, right?

You are correct and I would prefer to extend the Function definition accordingly when needed.
Because really, a function argument is the only case where I will want to see a SELECT without brackets. And even then:

-- what exactly will this mean? 
SELECT function(SELECT 1, 2)
3. the `SELECT` production checks for parenthesed selects too. So if we really are going to add the `Select` then we can replace the `ParenthesedSelect` -- and we also can remove the `Select` from the `Function` then because it would be covered by the `PrimaryExpression` indeed.

You are right, I could implement that, but let us first discuss on how we proceed with that feature in general.

Agreed. This matter is complex because it quickly can have unwanted negative side effects.

I am not opposing Exasol per se, but I do see Sub Selects without brackets as very evil:

select select 1;

select 1 where 1 = select 1 + 1;

I agree, that this is evil and not really nice, but Exasol supports all of those statements.

I believe you can understand for my example, why I have so many concerns here.
So I wonder: Would Exasol not be the perfect example for introducing dialect specific parsing modes, e. g. allowing those Unparenthesed SubSelects only when the Exasol dialect is set:


LOOKAHEAD( {dialect==Dialect.EXASOL}, 2 ) select = SELECT()
|
LOOKAHEAD(3) select = ParenthesedSelect()

@ssteinhauser
Copy link
Contributor Author

On formatting: you can always run the Spotless formatApply task, which will apply consistent and accepted formatting on all changed files.

I tried this, but this reformats the whole file just as it has been in the first run.

@manticore-projects
Copy link
Contributor

On formatting: you can always run the Spotless formatApply task, which will apply consistent and accepted formatting on all changed files.

I tried this, but this reformats the whole file just as it has been in the first run.

This is strange because only files with actual changes should be formatted.
Anyway, so be it: please format with Spotless even when the whole file would be affected. I will waive it.

Sorry for the extra steps.

@ssteinhauser
Copy link
Contributor Author

I believe you can understand for my example, why I have so many concerns here. So I wonder: Would Exasol not be the perfect example for introducing dialect specific parsing modes, e. g. allowing those Unparenthesed SubSelects only when the Exasol dialect is set:


LOOKAHEAD( {dialect==Dialect.EXASOL}, 2 ) select = SELECT()
|
LOOKAHEAD(3) select = ParenthesedSelect()

Dialect specific parsing could make sense in other cases as well.

Do you have any thoughts on how to implement it? Do you want to have it implemented over the whole grammar or just for this special case for now?
Should it just be an enum or should it be something more flexible, e.g. if some parts of the grammar are supported by multiple databases?

@ssteinhauser
Copy link
Contributor Author

On formatting: you can always run the Spotless formatApply task, which will apply consistent and accepted formatting on all changed files.

I tried this, but this reformats the whole file just as it has been in the first run.

This is strange because only files with actual changes should be formatted. Anyway, so be it: please format with Spotless even when the whole file would be affected. I will waive it.

Sorry for the extra steps.

No problem, will reformat the files.
Just for clarification, the formatApply task just formats the files that have been changed, but it reformats the whole file, not only changed lines.

@manticore-projects
Copy link
Contributor

I believe you can understand for my example, why I have so many concerns here. So I wonder: Would Exasol not be the perfect example for introducing dialect specific parsing modes, e. g. allowing those Unparenthesed SubSelects only when the Exasol dialect is set:


LOOKAHEAD( {dialect==Dialect.EXASOL}, 2 ) select = SELECT()
|
LOOKAHEAD(3) select = ParenthesedSelect()

Dialect specific parsing could make sense in other cases as well.

Yes, we just tried to avoid it so far since many features have an impact on many dialects.

Do you have any thoughts on how to implement it? Do you want to have it implemented over the whole grammar or just for this special case for now?

I have thought about it for a long time actually and I see three ways in general:

a) the Grammar is built from a template and we would end up with specific parsers
b) the parser is generated hot and on demand
c) we control only certain lines using a dialect variable and semantic lookaheads (like I have shown)

Should it just be an enum or should it be something more flexible, e.g. if some parts of the grammar are supported by multiple databases?

I feel like right know it should be an enum, which would cover only a very few really exceptional corner cases like DuckDB structs and your Exasol unparenthesed subselects.
The idea of JSQLParser was to be RDBMS agnostic and I would love to keep this concept alive as long as possible -- but your unparenthesed subselects really push it.

So, in my opinion we should try 3 steps:

  1. define a new Feature dialect which can be of GENERAL, DUCKDB, EXASOL (see the other Features like allowComplexParsing for guidances)

  2. introduce your SELECT as PrimaryExpression with a LookAhead for dialect==EXASOL only

  3. the streamline the Grammar so we avoid any redundant semantic lookahead

Lets see how far we get.

@manticore-projects
Copy link
Contributor

Just for clarification, the formatApply task just formats the files that have been changed, but it reformats the whole file, not only changed lines.

Yes, exactly.
I got only very confused when the whole file got formatted although I did not see any content change in those files. Maybe I missed something.

@manticore-projects
Copy link
Contributor

Btw, 'IN' predicates seem to similar affected and its Exasol and DB2.
So maybe, instead of calling it a dialect, the feature may be ALLOW_UNPARENTHESED_SUB_SELECTS?!

@manticore-projects
Copy link
Contributor

The Exasol documentation is not helpful at all because it shows the brackets as mandatory: https://docs.exasol.com/db/latest/sql_references/predicates/not_in.htm

@ssteinhauser
Copy link
Contributor Author

So, in my opinion we should try 3 steps:

1. define a new `Feature` `dialect` which can be of `GENERAL`, `DUCKDB`, `EXASOL` (see the other `Features` like `allowComplexParsing` for guidances)

2. introduce your `SELECT` as `PrimaryExpression` with a LookAhead for `dialect==EXASOL` only

3. the streamline the Grammar so we avoid any redundant semantic lookahead

Btw, 'IN' predicates seem to similar affected and its Exasol and DB2. So maybe, instead of calling it a dialect, the feature may be ALLOW_UNPARENTHESED_SUB_SELECTS?!

I'll try to implement the steps above and agree with the last suggestion to call the feature something like ALLOW_UNPARENTHESED_SUB_SELECTS and not make it DBMS specific.

@manticore-projects
Copy link
Contributor

I'll try to implement the steps above and agree with the last suggestion to call the feature something like ALLOW_UNPARENTHESED_SUB_SELECTS and not make it DBMS specific.

Sounds like a plan.
If you would like to give me access to your fork/repo I would be more than willing to help/support with the implementation, especially with providing the new Feature.

@ssteinhauser
Copy link
Contributor Author

I'll try to implement the steps above and agree with the last suggestion to call the feature something like ALLOW_UNPARENTHESED_SUB_SELECTS and not make it DBMS specific.

Sounds like a plan. If you would like to give me access to your fork/repo I would be more than willing to help/support with the implementation, especially with providing the new Feature.

Sure, I will grant access to the fork. Thank you very much for your support, I really appreciate it.
I've already started introducing the new feature, but it's still missing the support of unparenthesized sub select as from item.

@manticore-projects
Copy link
Contributor

This looks really good to me, well done!

Considering the IN predicate (without brackets), can we find a more generic name for this new Feature which covers both sub-selects and IN predicates (as well as any other fragments, which should be in between brackets in general but not in Exasol/DB2)?

@manticore-projects
Copy link
Contributor

manticore-projects commented Jul 24, 2024

I have added a test for failing, when the specific flag is not set and this works as expected.
Added some fine-tuning on the performance tests so you can control the timeouts better via @timeout test annotation.

Please decide on the best name for the feature (depending if we want to cover IN predicates also) and then we can merge.

PS: really well done!

@ssteinhauser
Copy link
Contributor Author

I have added a test for failing, when the specific flag is not set and this works as expected. Added some fine-tuning on the performance tests so you can control the timeouts better via @timeout test annotation.

Please decide on the best name for the feature (depending if we want to cover IN predicates also) and then we can merge.

PS: really well done!

Thank you for your support on this!

I didn't really get the point yet on the DB2 IN predicate you mention. Isn't the whole syntax covered in the existing implementation? Maybe I miss something here or I have wrong documentations.

Besides that, I think it is really related to unparenthesized sub selects or can you imagine on another expression as well?

Before merging, I'd like to also cover the case, that an unparenthesized sub select can be a FromItem as well. Unfortunately, the abstract Select class does not implement FromItem right now.

@manticore-projects
Copy link
Contributor

On the IN predicate, my understanding is that both DB2 and Exasol allow:

-- IN with one argument does not need brackets?!
select 1 from dual where a IN 1;

-- IN with two or more arguments needs brackets
select 1 from dual where a IN (1, 2);

Looks similar to me, although I also agree that we should focus on the sub selects first and then introduce a separate Feature for the IN predicate later.

@manticore-projects
Copy link
Contributor

Before merging, I'd like to also cover the case, that an unparenthesized sub select can be a FromItem as well. Unfortunately, the abstract Select class does not implement FromItem right now.

You may want to look into LateralSubSelect for reference.

@ssteinhauser
Copy link
Contributor Author

On the IN predicate, my understanding is that both DB2 and Exasol allow:

-- IN with one argument does not need brackets?!
select 1 from dual where a IN 1;

-- IN with two or more arguments needs brackets
select 1 from dual where a IN (1, 2);

Looks similar to me, although I also agree that we should focus on the sub selects first and then introduce a separate Feature for the IN predicate later.

That's right, this looks similar. But both queries can already be parsed without any feature flag.

Before merging, I'd like to also cover the case, that an unparenthesized sub select can be a FromItem as well. Unfortunately, the abstract Select class does not implement FromItem right now.

You may want to look into LateralSubSelect for reference.

Thanks for the hint, I will have a look at this one.

@manticore-projects
Copy link
Contributor

That's right, this looks similar. But both queries can already be parsed without any feature flag.

Do you know the feeling, when you discover that apparently you have done something exceptionally in the past -- but you can't remember, when and why? :-D
I would have waged on we not supporting this yet.

@ssteinhauser
Copy link
Contributor Author

I've had a look at LateralSubSelect, but I think to support the complete syntax of the feature flagged unparenthesised Select() as FromItem, the Select class would have to implement the FromItem interface and I would have to implement the methods of FromItem in all sub classes which are extending Select.
Before I go ahead and do the implementation I would like to ask for your opinion on this. Do you agree with it or do you have another suggestion.

@manticore-projects
Copy link
Contributor

Greetings!

What will we d with this PR please?

@ssteinhauser
Copy link
Contributor Author

I've been waiting all the time for you reply on my previous comment on this. Since this is quite a little bit of work and I don't want to do any wasted effort, I wanted to discuss if my intention matches your opinion.
However, currently I am busy with other things unfortunately. So I'd suggest to merge that Pull Request and discuss the implementation of Unparenthesized sub selects as FromItem in a separate one.

@manticore-projects
Copy link
Contributor

I've been waiting all the time for you reply on my previous comment on this.

Sorry, I have not been fully aware of this.

So I'd suggest to merge that Pull Request and discuss the implementation of Unparenthesized sub selects as FromItem in a separate one.

Yes, lets just get a version compiling please.

@ssteinhauser
Copy link
Contributor Author

Yes, lets just get a version compiling please.

Yes, I am currently working on it, but due to a change in the master branch (probably this one dc14c4b) where no updateKeywords has been executed, the build fails now.

@manticore-projects manticore-projects merged commit 93ca661 into JSQLParser:master Aug 20, 2024
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants