Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AQL: Update keyword list #2047

Merged
merged 6 commits into from
Sep 12, 2019
Merged

AQL: Update keyword list #2047

merged 6 commits into from
Sep 12, 2019

Conversation

Simran-B
Copy link
Contributor

@Simran-B Simran-B commented Sep 6, 2019

Add K_SHORTEST_PATHS, OPTIONS, SEARCH, TO

Technically, some of the words are not keywords / reserved words in AQL (you may use them as attribute names for instance without having to wrap them in backticks). Should they be removed from this list and implemented in a different way? If no, then e.g. doc.search will have search colored like a keyword even though ArangoDB processes it like a regular attribute access. If yes, how to do that in prism?

NEW and OLD are special variables (non-keyword), but their availability depends on the presence of a modification (sub-)query using the INSERT/UPDATE/REPLACE/REMOVE keywords. Note that they are case-sensitive! Because of that, they can not simply be added to the existing keyword list, which is case-insensitive.

True keywords:

  • AGGREGATE (keyword)
  • ALL (modifier)
  • AND (operator)
  • ANY (modifier)
  • ASC (keyword)
  • COLLECT (declaration)
  • DESC (keyword)
  • DISTINCT (modifier)
  • FILTER (declaration)
  • FOR (declaration)
  • GRAPH (keyword)
  • IN (keyword)
  • INBOUND (modifier)
  • INSERT (command)
  • INTO (keyword)
  • LET (declaration)
  • LIKE (operator)
  • LIMIT (declaration)
  • NONE (modifier)
  • NOT (operator)
  • OR (operator)
  • OUTBOUND (modifier)
  • REMOVE (command)
  • REPLACE (command)
  • RETURN (declaration)
  • SHORTEST_PATH (keyword)
  • SORT (declaration)
  • UPDATE (command)
  • UPSERT (command)
  • WITH (keyword)

Keyword-like:

  • true
  • false
  • null

Not keywords but used in language constructs:

  • NEW (case-sensitive!)
  • OLD (case-sensitive!)
  • OPTIONS
  • SEARCH
  • TO

Add K_SHORTEST_PATHS, OPTIONS, SEARCH, TO
@RunDevelopment
Copy link
Member

Thank you for the detailed explanation!

I added NEW and OLD to keywords (case-sensitive).

If no, then e.g. doc.search will have search colored like a keyword even though ArangoDB processes it like a regular attribute access.

This will simply be a false positive. Many other languages also suffer from those (e.g. JS with from). You could tokenize every property access but that will generate a lot of extra tokens which aren't styled in any way.

How common will this be?
Depending on how many users will experience this issue, the overhead might be worth it.

@Simran-B
Copy link
Contributor Author

I added NEW and OLD to keywords (case-sensitive).

Very nice, thanks!

How common will this be?
Depending on how many users will experience this issue, the overhead might be worth it.

The most common occurrence will probably be attribute access using the dot notation. It depends on what the user calls his attributes. If the attribute name is any of NEW, OLD, options, search or to then there will be a false-positive, e.g. FOR doc IN coll RETURN doc.options.

The bracket notation is largely unaffected (string literal: FOR doc IN coll RETURN doc["options"]), but users may also call collections and variables after one of these pseudo-keywords (FOR doc IN options / LET search = "xxx" RETURN doc[search]).

Overall, it shouldn't be too bad and I doubt that it's worth to go to great length to avoid false-positives. I wonder however if there are some low hanging fruits.

For example, to prevent search from being identified as keyword in doc.search, we could use something other than \b which excludes the dot. My first thought was \s, but it isn't actually guaranteed to be surrounded by whitespace:

FOR doc IN someView SEARCH(doc.attr == "foo") RETURN doc

Eventhough this should be a function call to an unknown function SEARCH, it is actually interpreted as SEARCH doc.attr == "foo". I need to think and test a bit more to be able to tell if there's always a space in front of SEARCH at least.

@RunDevelopment
Copy link
Member

I made a small adjustment to get rid of most of the false positives:

image

Code like foo. search or foo[ to] will still lead to false positives, so let's hope that AQL coders like formatted code.

@Simran-B
Copy link
Contributor Author

I appreciate your fixes for foo.search and foo[options]! I hope it's not a big performance hit?

While it would be nice to not color attribute keys in object literals, it doesn't seem like a big problem to me and is probably hard to detect (there is the short notation { search }, normal notation { search: search } as well as the notation for a dynamically computed key { [ search ]: ... }).

BTW. I missed LIKE (operator) in my list above (added it), but it is already implemented in Prism.

On a different note, should the expression for matching number literals be adjusted to match the current AQL parser?

  • 00.23: 2+ leading zeros are invalid
  • 01.23 / 00: padding zeroes are invalid
  • .123 / -.123 / .1e4 / -.1e4: omission of leading zero is supported since v3.5.0

https://www.arangodb.com/docs/stable/aql/fundamentals-data-types.html

The token definition distinguishes between integers and double values:
https://github.com/arangodb/arangodb/blob/devel/arangod/Aql/tokens.ll#L459

(0|[1-9][0-9]*)
((0|[1-9][0-9]*)(\.[0-9]+)?|\.[0-9]+)([eE][\-\+]?[0-9]+)?

I think the operator regex needs to be adjusted to account for the array contraction operator:

'operator': /\*{2,}|[=!]~|[!=<>]=?|&&|\|\||[-+*/%]/,

@RunDevelopment
Copy link
Member

I hope it's not a big performance hit?

I used a fast method but in turn, I accepted some false positives (see my previous comment).

Regarding the short notation (e.g. { search }): As you suspected, these will be almost impossible to detect. Square brackets, while less impossible will still be quite hard.

I fixed the array concat operator and implemented the stricter number pattern.

With this, I think we are ready to merge. This PR is already beyond updating the keyword list, so for further suggestion/improvements, please open a new issue. (It's just to keep things simple and organized.)

@RunDevelopment RunDevelopment merged commit 32a4c42 into PrismJS:master Sep 12, 2019
@RunDevelopment
Copy link
Member

Thank you for contributing!

@Simran-B
Copy link
Contributor Author

Thanks for your help!

It just occurred to be that there is one more non-reserved word used in language constructs: the COUNT (case-insensitive) in COLLECT ... WITH COUNT INTO .... I will check the bison/flex source code tomorrow to see if there are any more constructs that define such words and open another PR.

@RunDevelopment
Copy link
Member

Thank you! I'll be looking forward to it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants