Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feedback on AQL parser and WITH clause usage #19651

Open
Nahuel92 opened this issue Aug 22, 2023 · 5 comments
Open

Feedback on AQL parser and WITH clause usage #19651

Nahuel92 opened this issue Aug 22, 2023 · 5 comments
Labels
1 Feature 3 AQL Query language related

Comments

@Nahuel92
Copy link

Nahuel92 commented Aug 22, 2023

My Environment

  • ArangoDB Version: 3.10.8
  • Deployment Mode: Cluster
  • Deployment Strategy: Kubernetes
  • Configuration: One pod per node, operator version 1.2.24. 3 DB severs, 3 coordinators, 3 agents
  • Infrastructure: AWS (r5.large, 2 vCPU)
  • Operating System: Linux
  • Total RAM in your machine: 16 GB (per node)
  • Disks in use: EBS Volume (100 GiB per node)
  • Used Package: Unknown

Component, Query & Data

Affected feature:

Improvement request about AQL query error messages.

AQL query (if applicable):

N/A

AQL explain and/or profile (if applicable):

N/A

Dataset:

N/A

Size of your Dataset on disk:

N/A

Replication Factor & Number of Shards (Cluster only):

N/A

Steps to reproduce

N/A

Problem:

Hi, I have the following feedback that I wanted to share with the ArangoDB team:

  • The AQL parser should list all missing collections from the WITH clause at once instead of listing them one by one. I am currently fixing queries and what happens is that I fix one query by adding a collection and when I test again, I find out more collections are missing from the same query. The app complexity is huge, so I can't automate testing (at least not right now).

Expected result:

I expect the AQL parser to return all missing collections from the WITH clause at once. For example, if the query at runtime requires 3 collections but the AQL query has only one collection explicitly defined, I expect the parser to return the 2 missing collections at once.

And (optionally) WITH clause/operator usage should be mandatory following the opt-out model (even for non-cluster mode). This is to enforce that Local/Dev environments are as similar to Prod as possible.

@dothebart dothebart added 1 Feature 3 AQL Query language related 3 Optimizer AQL query optimizer is involved labels Aug 22, 2023
@jsteemann
Copy link
Contributor

The WITH clause can be made mandatory even in single server, by setting the following startup option:

  --query.require-with <boolean>                               whether AQL queries should require the `WITH collection-name` clause even on single servers (enable 
                                                               this to remove this behavior difference between single server and cluster) (default: false) 
                                                               (introduced in v3.8.0)

The problem with reporting the missing collections all at once is that the parser does not know which unspecified collections may be visited during the execution of a query.
For example, if you run the query

FOR v, e, p IN 1..3 OUTBOUND myEdgeCollection RETURN v

it is not clear at query parse time which other (vertex) collections the edges will link to. It will only become clear when looking up the edges and then trying to look up documents in the unspecified collections.

In theory we could make the query continue to collect further unspecified collections, and only make it fail at the very end. But that could also add considerable complexity, so I am not sure if that is a reasonable way forward.

@Simran-B Simran-B removed the 3 Optimizer AQL query optimizer is involved label Aug 24, 2023
@Nahuel92
Copy link
Author

The WITH clause can be made mandatory even in single server, by setting the following startup option:

  --query.require-with <boolean>                               whether AQL queries should require the `WITH collection-name` clause even on single servers (enable 
                                                               this to remove this behavior difference between single server and cluster) (default: false) 
                                                               (introduced in v3.8.0)

The problem with reporting the missing collections all at once is that the parser does not know which unspecified collections may be visited during the execution of a query. For example, if you run the query

FOR v, e, p IN 1..3 OUTBOUND myEdgeCollection RETURN v

it is not clear at query parse time which other (vertex) collections the edges will link to. It will only become clear when looking up the edges and then trying to look up documents in the unspecified collections.

In theory we could make the query continue to collect further unspecified collections, and only make it fail at the very end. But that could also add considerable complexity, so I am not sure if that is a reasonable way forward.

Hi there. Thanks for your answer :)
Yes, I am aware of the flag to make the WITH clause mandatory. I was just suggesting that it should be turned on by default (allowing users to opt-out, instead). As I see it, that would reduce differences between Dev and Prod environments, streamlining queries promotion from Dev to higher environments (not to mention query debugging).

On the other hand, I understand that returning all missing collections at once will add more complexity and may affect performance, but, from my perspective, it is more than reasonable if you consider how useful for debugging and troubleshooting it is (and it doesn't have to be enabled by default, it could be a flag that one can turn on as needed).

In my particular case, I had to fix tons of complex queries that didn't have a WITH clause and it was a huge pain in the neck having to do it one by one (and we are still finding broken queries used in less frequently used features).
I have been spending more than a month on fixing queries, just for you to have an idea of how awful the process is in my situation. Of course, this situation can't be generalized and say that every ArangoDB user will go through it, but I really would have loved to have some help on it.

@dothebart
Copy link
Contributor

you can choose your edges, and collect their relations to see all possible collections.

@Simran-B
Copy link
Contributor

The --query.require-with option has been introduced in 3.8.0 and it is off by default for backward compatibility. It could potentially be turned on by default in 4.x

@Nahuel92
Copy link
Author

Nahuel92 commented Sep 4, 2023

you can choose your edges, and collect their relations to see all possible collections.

That's a good idea, unfortunately I couldn't think of it when I was firefighting Prod but definitely will help in cases like the one I had.

The --query.require-with option has been introduced in 3.8.0 and it is off by default for backward compatibility. It could potentially be turned on by default in 4.x

That would be awesome, I find that useful for reducing the impedance across environments.

Thanks a lot, guys!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1 Feature 3 AQL Query language related
Projects
None yet
Development

No branches or pull requests

4 participants