Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

should bracket quotation (like sqlserver supports it) be removed to be able to support array constructs #677

Closed
wumpz opened this issue Sep 8, 2018 · 31 comments

Comments

@wumpz
Copy link
Member

wumpz commented Sep 8, 2018

At the moment JSqlParser supports quotation like [colname]. This is the reason it cannot support simple array constructs using brackets as well.

As the bracket quotation, IMHO, should be replaced by double quotes.

Should the bracket quotation be removed to get the way free to arrays?

@wumpz wumpz changed the title removal of sqlservers bracket quotation to be able to support array constructs should bracket quotation (like sqlserver supports it) be removed to be able to support array constructs Sep 10, 2018
@wumpz
Copy link
Member Author

wumpz commented Oct 26, 2018

So no opinion from JSqlParsers users so far.

I tend to do extactly this, removing bracket quotation for version 2.0 of JSqlParser, to do array processing.

@CodingFabian
Copy link

since we have seen arrays but not this quotation, I would vote for yes

@wumpz
Copy link
Member Author

wumpz commented Nov 11, 2018

So be it. Starting with version 2 JSqlParser drops support for bracket quotation.

@wumpz wumpz closed this as completed Nov 11, 2018
@yeswaydude
Copy link

There is a huge history of SQL that has continued to run against SQL Server for decades that uses the square bracket-quoted identifiers. If I recall correctly, some SQL Server syntax can only be accessed via the square brackets (i.e. FOR XML directives).

Does someone have an example of array syntax with an explanation of how support for that precludes support for square bracket-quoted identifiers?

@wumpz
Copy link
Member Author

wumpz commented Dec 4, 2018

Some would be (from Postgresqls manual (http://www.postgresqltutorial.com/postgresql-array/) ):

SELECT
 id,
 name,
 phones [ 2 ]
FROM
 contacts
WHERE
 id = 3;

SqlServer would interpret [2] as an alias, which is perfectly fine. Nevertheless array access using brackets is more or less standardized as well within most programming languages.

@gordthompson
Copy link

IMO it certainly would be a shame if "drops support for bracket quotation" meant that v2.x would be completely unable to parse T-SQL like

SELECT [column name] FROM [table name];

As mentioned in a previous comment, like it or not there is an enormous amount of T-SQL out there that uses [column name] instead of "column name", despite what ANSI (or others) might say.

@wumpz
Copy link
Member Author

wumpz commented Feb 21, 2019

I will happily reopen this issue. I as well was surprised, that so little seemed to be interested. I want a discussion. I want suggestions how to implement it. I would keep both, if it works, but I assume is collides to much:

select a [b]

is using sqlserver an alias and using e.g. postgres an array access.

@wumpz wumpz reopened this Feb 21, 2019
@gordthompson
Copy link

gordthompson commented Feb 22, 2019

If it is a question of dropping the currently-supported [bracket quoting] in order to support array constructs then one possibility might be to add the option to specify something like tsqlQuoting when creating a parser. If tsqlQuoting is true then [thing] would be a table/column reference, but if it is false then we may be able to safely assume that [thing] is an array reference (because a quoted table/column name would presumably be the ANSI-quoted "thing").

@wumpz
Copy link
Member Author

wumpz commented Feb 23, 2019

One can achieve this by semantic lookahead?

@gordthompson
Copy link

One can achieve this by semantic lookahead?

Maybe. Automatically detecting [column_name] vs. [array_reference] would certainly be nice if that is practical given the variety of SQL dialects out there. In other words, is it reasonable to expect that we could define parsing rules (possibly including lookahead) to reliably differentiate between the two cases for an arbitrary SQL statement without prior knowledge of the dialect?

Providing an explicit hint would definitely make it easier for the parser, in any case.

@wumpz
Copy link
Member Author

wumpz commented Feb 27, 2019

In the past, I tried various variations to tackle this problem. I could not get a proper solution. Any help would be appreciated.
One problem is, that an array construct could hold any type of expression.

A solution would be to switch off and on a defined token. But I could not achieve this.

@wumpz
Copy link
Member Author

wumpz commented Mar 19, 2019

Version 2 is released with bracket support. Skipped removal due to hopefully ongoing discussion.

@kiover
Copy link

kiover commented Apr 23, 2019

how to parser 'begin' statement?

@wumpz
Copy link
Member Author

wumpz commented Apr 23, 2019

@kiover could it be that you wanted to open a new issue?

@SerialVelocity
Copy link
Contributor

@wumpz It would be nice if you put SQL extensions behind feature flags.
e.g.

CCJSqlParserUtil.builder()
  .withTSQLBracketNotation()
  .withOracleHints()
  .withPostgresArraySyntax()
  .withPostgresSetCommands()
  .build()

(ofc the above would fail because of conflicting options though)
or even just specify a syntax to parse:

new CCJSqlParserUtil(Syntax.POSTGRES)

@wumpz
Copy link
Member Author

wumpz commented Apr 29, 2019

Again the same problem. How to modify behaviour of tokenizing by flags. Productions you can disable, but tokens? How would you do that using JavaCC? The brackets are part of a token.

@SerialVelocity
Copy link
Contributor

SerialVelocity commented Apr 30, 2019

Can you use a templated tokeniser? It would require compiling multiple tokenisers but will give you the flexibility to do the new CCJSqlParserUtil(Syntax.POSTGRES) solution.

@wumpz
Copy link
Member Author

wumpz commented May 2, 2019

@SerialVelocity What do you mean by a templated tokenizer? Does JavaCC support something like this? I know JavaCC token modes.

@SerialVelocity
Copy link
Contributor

Sorry, I phrased that comment badly. I meant you could use a templating language to generate your JavaCC input files. That way, you can choose one of the parsers based on user/runtime input.

I don't know anything about JavaCC or how you would do this in Maven, but in Gradle you could add a task that runs before JavaCC that takes a file like src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.jjt.tmpl and generates src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.generic.jjt and src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.t-sql.jjt.

@AnEmortalKid
Copy link
Contributor

@SerialVelocity would it be more feasible to have provider specific implementations or jsqlparser and the common api for parsing in different modules?

jsqlparser - the api that loads a parser implementation
jsqlparser-tsql - specific to tsql

So

Statement stmt = CCJSqlParserUtil.parse("SELECT * FROM tab1");

Would still be the public api but would return a TSQLStatement or MySQLStatement based on what parser impl you have on the classpath?

@SerialVelocity
Copy link
Contributor

Since the majority of the code is identical, I think having it all in one module would be better. Most of the parsers are nearly identical and I would expect them to all extend a base class with a common set of SQL. (You will have to be careful with things like "Oracle hints" since a generic statement visitor should probably visit them)

The idea of automatically deciding on a parser impl, depending on what is on the classpath, is magic that isn't needed. It will cause issues in the future where people may want to parse multiple different SQL languages. (e.g. When converting between them).

I think one of the following would work better:

CCJSqlPostgresParserUtil.parse("SELECT * FROM tab1");
CCJSqlParserUtil.parsePostgres("SELECT * FROM tab1");
new CCJPostgresSqlParser().parse("SELECT * FROM tab1");

The reason why I think that is that all of the above can return a concrete type (i.e. PostgresStatement) rather than just a Statement.

@SerialVelocity
Copy link
Contributor

This could also be used for things like #791 where limit is only a reserved word for certain SQL languages (https://www.petefreitag.com/tools/sql_reserved_words_checker/?word=limit)

@SerialVelocity
Copy link
Contributor

@GitHub4ZhangJX That is unrelated as it uses parentheses, not brackets. Please file a separate issue.

@wumpz
Copy link
Member Author

wumpz commented May 24, 2019

@GitHub4ZhangJX Please file a new issue. Your problem / issue does not belong to this discussion.

@SerialVelocity
Copy link
Contributor

@wumpz Did you come to a conclusion about what you think the best way of implementing this would be?

@wumpz
Copy link
Member Author

wumpz commented Jul 11, 2019

I think at the moment, the fallback token way of javacc/javacc#90 is the way to go. With this I could switch on / off token useage.

@wumpz
Copy link
Member Author

wumpz commented Aug 12, 2019

The implementation has started. I now learned how to implement this token fallback method. It seems to work as expected.

The bracket quotation will still be in place, but has to be enabled by some parser switch. I think the array functionality should be the default behaviour.

@wumpz wumpz added this to To do in array accessor Aug 12, 2019
@wumpz wumpz moved this from To do to In progress in array accessor Aug 12, 2019
@AnEmortalKid
Copy link
Contributor

I think the array functionality should be the default behaviour.

+1 , provided we have a note on the wiki/readme around this :)

@wumpz wumpz moved this from In progress to Done in array accessor Aug 13, 2019
@wumpz wumpz closed this as completed in 695d532 Aug 13, 2019
@revusky revusky mentioned this issue Oct 18, 2020
@revusky
Copy link

revusky commented Apr 16, 2021

Hi all (I mean whoever is interested in this sort of thing...). There is a specific reason why this sort of problem is basIcally intractable in legacy JavaCC. I explain it here

That is a blog post of mine from nearly 5 months ago. A very big problem is that legacy JavaCC provides no way to "massage" the token stream in a CommonTokenAction (what I call a "token hook") method because it's called in the wrong spot. But this has been fixed.

Now, in terms of supporting mutiple grammars (I know you reject that, and I can understand why, but if you finally did decide to go with that sort of solution, JavaCC21 now has a preprocessor.. And that's pretty robust. I'm using it increasingly in internal development. I mean to say, I can understand why you dislike the idea of maintaining multiple grammars/parsers, but the thing is that it's not such an unmaintainable mess if you have both an INCLUDE directive combined with the aforementioned preprocessor. You can do things like:

   #if standard_syntax
        INCLUDE "standard.javacc"
   #elif alternative_syntax1
        INCLUDE "alternative1.javacc"
    #else
         INCLUDE "other_alternative.javacc"
     #endif

This is the preprocessor that is standard in C# actually. So, with these features, maintaining multiple grammars that are slightly different one from the other (though perhaps still slightly evil) is now basically tractable....

But, you know, generally speaking, every single major problem in legacy JavaCC has been addressed in JavaCC21. In fact, the last two really big issues were the lack of support for supplementary unicode characters and that really annoying code too large problem. and I nailed those recently.

So, I very honestly believe that every last major issue in legacy JavaCC has been fixed. In my work, I mean, which is the only actively developed version of JavaCC.... Oh, and come to think of it, you might be interested in this.

Oh, that, and the fact that you are not limited to JDK7 level Java in your code actions. https://javacc.com/2021/03/21/announcement-jdk-16-support/

Well, anyway, there's the shameless plug (or multi-plug) but the whole situation is kind of dumbfounding really. Just look through the various links. I think that every problem you have with that crufty old legacy JavaCC is now solved. Do you want to pretend that all this work I did simply doesn't exist or something?

@revusky
Copy link

revusky commented Apr 16, 2021

@SerialVelocity would it be more feasible to have provider specific implementations or jsqlparser and the common api for parsing in different modules?

jsqlparser - the api that loads a parser implementation
jsqlparser-tsql - specific to tsql

So

Statement stmt = CCJSqlParserUtil.parse("SELECT * FROM tab1");

Would still be the public api but would return a TSQLStatement or MySQLStatement based on what parser impl you have on the classpath?

The problem is that if you were try to do this with the legacy JavaCC tool, it would quickly turn into an unmaintainable mess, because it lacks key features that allow you to do this in a maintainable way. The key features in question are a preprocessor and an INCLUDE directive.

(I basically said the same thing in another message already, but that could have gotten lost in the large amount of information, so....)

@revusky
Copy link

revusky commented Jun 15, 2021

Just for your information, this whole issue has been killed dead now in JavaCC21. Just look at this blog post

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

No branches or pull requests

8 participants