variable deduplication #87

Geal · 2021-11-04T16:54:05Z

When doing a query like this:

query ExampleQuery($topProductsFirst: Int) {
  topProducts(first: $topProductsFirst) {
    name
   reviews {
    body author {
      username
      reviews {
        id
        body
        product {
          name
        }
      }
    }
  }
}

we will do a serie of requests, first to producs to get the topproducts, then to reviews, then to products. The second request to products aggregates the queries but does not deduplicate them, so we get variables like this:

{"representations":
  Array([
    Object({"__typename": String("Product"), "upc": String("1")}),
    Object({"__typename": String("Product"), "upc": String("2")}),
    Object({"__typename": String("Product"), "upc": String("3")}),  
    Object({"__typename": String("Product"), "upc": String("1")}), 
    Object({"__typename": String("Product"), "upc": String("1")}),
    Object({"__typename": String("Product"), "upc": String("2")}), 
    Object({"__typename": String("Product"), "upc": String("3")}),
    Object({"__typename": String("Product"), "upc": String("1")})
])}

The router should be able to recognize that we are asking multiple times for the same data, and only ask each item once, so there should be a mapping between the reviews and the products query: (with indexes, reviews should reference [0, 1, 2, 0, 0, 1, 2, 0] from the variables array [ { "__typename": "Product", "upc": "1" }, { "__typename": "Product", "upc": "2" }, { "__typename": "Product", "upc": "3" } ])

To go further, the first request to products returned this:

Object({"topProducts":
  Array([
    Object({"__typename": String("Product"), "upc": String("1"), "name": String("Table")}),
    Object({"__typename": String("Product"), "upc": String("2"), "name": String("Couch")}),
    Object({"__typename": String("Product"), "upc": String("3"), "name": String("Chair")})])})

so for each of these products, we already knew the name, so we could have avoided entirely the second call to products.

Implementation details

Variables in representations are generated here:

router/crates/apollo-router-core/src/federated.rs

Lines 314 to 315 in 46b0191

    
           let representations = response.select(current_dir, requires, &schema)?; 
        
           variables.insert("representations".into(), representations);

from select in

router/crates/apollo-router-core/src/response.rs

Lines 59 to 85 in 46b0191

    
           pub fn select( 
        
               &self, 
        
               path: &Path, 
        
               selections: &[Selection], 
        
               schema: &Schema, 
        
           ) -> Result<Value, FetchError> { 
        
               let values = 
        
                   self.data 
        
                       .get_at_path(path) 
        
                       .map_err(|err| FetchError::ExecutionPathNotFound { 
        
                           reason: err.to_string(), 
        
                       })?; 
        
               Ok(Value::Array( 
        
                   values 
        
                       .into_iter() 
        
                       .flat_map(|value| match (value, selections) { 
        
                           (Value::Object(content), requires) => { 
        
                               select_object(content, requires, schema).transpose() 
        
                           } 
        
                           (_, _) => Some(Err(FetchError::ExecutionInvalidContent { 
        
                               reason: "not an object".to_string(), 
        
                           })), 
        
                       }) 
        
                       .collect::<Result<Vec<_>, _>>()?, 
        
               )) 
        
           }

The annoying part is that since we are working with serde_json::Value elements, that do not implement Hash, they are hard to deduplicate (for now, there's work happening on that serde-rs/json#747 serde-rs/json#720 serde-rs/json#814 )

What could be done here? Should we use another structure than serde_json::Value? Should the response contain a kind of repository of values to look up and avoid some queries?

The text was updated successfully, but these errors were encountered:

abernix · 2021-12-08T14:03:14Z

This sounds related to one of the concerns raised in apollographql/federation#180 (just noting for the sake of posterity). I think right now this would be potentially solved in that repo's query planner, right?

abernix · 2021-12-08T14:04:13Z

(Well, I guess it's execution of the plan, but I guess this exists in both implementations right now, yea?)

Geal · 2021-12-08T14:52:42Z

I am not sure this could be solved in the query planner, this is very dependent on runtime data. If you query:

query {
  topProducts {
    reviews {
      author { name }
  }
}

on this data:

[
  {
    "id": 1,
    "name": "chair",
    "reviews": [
      {
         "author": {
           "id": 1234,
           "name": "Alice"
         }
      },
     {
         "author": {
           "id": 5678,
           "name": "Bob"
         }
      }
    ]
  },
  {
    "id": 2,
    "name": "couch",
    "reviews": [
      {
         "author": {
           "id": 1234,
           "name": "Alice"
         }
      }
    ]
  }
]

we could get a query to the users subgraph with the following representations: [{id: 1234}, {id: 5678}, {id: 1234}], and the query planner has no way to know how many of those will be generated

On the other hand, it is easily solvable with #250: in the phase generating the variables for a subgraph query, it creates for each representation a "path" into the end response, to know where to store it.
It would be possible to go through that list, and for each unique representation, store the list of paths. Then when we get the responses, write the corresponding object at all those paths

Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>

Geal added the triage label Nov 4, 2021

Geal mentioned this issue Nov 22, 2021

avoid cloning entities when building the response #132

Merged

This was referenced Dec 13, 2021

Remove the response mutex, refactor fetch_node and fix response merging #250

Merged

zero copy deserialization #283

Closed

bnjjj self-assigned this Apr 19, 2022

bnjjj added a commit that referenced this issue Apr 19, 2022

Variable deduplication for query planner #87

b58d92c

Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>

bnjjj mentioned this issue Apr 19, 2022

Variable deduplication in query planner #872

Merged

abernix removed the triage label May 23, 2022

bnjjj closed this as completed in #872 Jun 7, 2022

tinnou pushed a commit to Netflix-Skunkworks/router that referenced this issue Oct 16, 2023

fix: fixes crates.io publish for harmonizer (apollographql#87)

986f0fa

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

variable deduplication #87

variable deduplication #87

Geal commented Nov 4, 2021

abernix commented Dec 8, 2021

abernix commented Dec 8, 2021

Geal commented Dec 8, 2021 •

edited

variable deduplication #87

variable deduplication #87

Comments

Geal commented Nov 4, 2021

Implementation details

abernix commented Dec 8, 2021

abernix commented Dec 8, 2021

Geal commented Dec 8, 2021 • edited

Geal commented Dec 8, 2021 •

edited