Skip to content

AQL query does not properly handle null detection #12490

@Calrain

Description

@Calrain

My Environment

  • ArangoDB Version: 3.6.5
  • Storage Engine: RocksDB
  • Deployment Mode: Single Server
  • Deployment Strategy: Manual Start
  • Configuration: Running in Windows 2016 Server
  • Infrastructure: own
  • Operating System: Windows 2016 Server
  • Total RAM in your machine: 8Gb
  • Disks in use: SSD

Component, Query & Data

Affected feature:
AQL does not handle null detection with subqueries

__AQL query:

LET data = [
    {
        product: 1,
        info: [
            "advocado",
            "banana"
        ]
    },
    {
        product: 2,
        info: null
    },
    {
        product: 3,
        info: [
            "coconut",
            "mango"
        ]
    }
]

FOR d IN data
RETURN {
    product: d.product,
    summary: (d.info == null ? 'none' : CONCAT(FOR i IN d.info RETURN i))
}

When this command runs, I get this error:
Query: AQL: collection or array expected as operand to FOR loop; you provided a value of type 'null' (while executing)

I would expect this query to work.

__AQL explain:

Execution plan:
 Id   NodeType            Est.   Comment
  1   SingletonNode          1   * ROOT
  2   CalculationNode        1     - LET data = [ { "product" : 1, "info" : [ "advocado", "banana" ] }, { "product" : 2, "info" : null }, { "product" : 3, "info" : [ "coconut", "mango" ] } ]   /* json expression */   /* const assignment */
  4   EnumerateListNode      3     - FOR d IN data   /* list iteration */
 12   SubqueryStartNode      3       - LET #3 = ( /* subquery begin */
  6   CalculationNode        3         - LET #7 = d.`info`   /* attribute expression */
  7   EnumerateListNode    300         - FOR i IN #7   /* list iteration */
 13   SubqueryEndNode      300       - RETURN  i ) /* subquery end */
 10   CalculationNode      300       - LET #9 = { "product" : d.`product`, "summary" : ((d.`info` == null) ? "none" : CONCAT(#3)) }   /* simple expression */
 11   ReturnNode           300       - RETURN #9

Indexes used:
 none

Functions used:
 Name     Deterministic   Cacheable   Uses V8
 CONCAT   true            true        false  

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   remove-redundant-calculations
  3   remove-unnecessary-calculations
  4   splice-subqueries

Dataset:
This query has the dataset included, but I've found this exact problem with many queries that I've had to work with.

When running a query over an array of objects, and where you're trying to look at a key in those objects that is also an array, it seems to be impossible to get AQL to detect that it's a null valued key.

What happens is that subqueries that are only supposed to run on the key if it's non-null still fire off.

In the example above you can see (apologies for the formatting, just helps me understand it) that I am trying to determine if d.info is null, because if it is I want to return a simple string called 'none', but only if it's non-null should it perform that CONCAT command that iterates over d.info.

It seems the AQL is trying to still run over d.info, so I'm not sure if it's a bug with my query, a bad way of detecting if d.info is null, or something else that I'm missing.

Steps to reproduce

  1. Run the query above, that should show it

Problem:
Trying to get this AQL query to work.

I know that a simple solution is to ensure the second product have an .info key of [] instead of being null, but there are real situations where it's not possible to do that.

This example is a simplified version of it happening with much more complicated queries with COLLECT and AGGREGATE function use. Interestingly those commands aren't part of the problem, it seems to be that I'm just unable to get AQL to not perform that CONCAT (or any other query that iterates over d.info) if d.info is null.

Expected result:

[
  {
    "product": 1,
    "summary": "advocadobanana"
  },
  {
    "product": 2,
    "summary": "none"
  },
  {
    "product": 3,
    "summary": "coconutmango"
  }
]

Note: Flipping around the ternary operator doesn't fix it either, e.g. using:

FOR d IN data
RETURN {
    product: d.product,
    summary: (d.info ? CONCAT(FOR i IN d.info RETURN i) : 'none')
}

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions