Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

variable deduplication #87

Closed
Geal opened this issue Nov 4, 2021 · 3 comments · Fixed by #872
Closed

variable deduplication #87

Geal opened this issue Nov 4, 2021 · 3 comments · Fixed by #872
Assignees

Comments

@Geal
Copy link
Contributor

Geal commented Nov 4, 2021

When doing a query like this:

query ExampleQuery($topProductsFirst: Int) {
  topProducts(first: $topProductsFirst) {
    name
   reviews {
    body author {
      username
      reviews {
        id
        body
        product {
          name
        }
      }
    }
  }
}

we will do a serie of requests, first to producs to get the topproducts, then to reviews, then to products. The second request to products aggregates the queries but does not deduplicate them, so we get variables like this:

{"representations":
  Array([
    Object({"__typename": String("Product"), "upc": String("1")}),
    Object({"__typename": String("Product"), "upc": String("2")}),
    Object({"__typename": String("Product"), "upc": String("3")}),  
    Object({"__typename": String("Product"), "upc": String("1")}), 
    Object({"__typename": String("Product"), "upc": String("1")}),
    Object({"__typename": String("Product"), "upc": String("2")}), 
    Object({"__typename": String("Product"), "upc": String("3")}),
    Object({"__typename": String("Product"), "upc": String("1")})
])}

The router should be able to recognize that we are asking multiple times for the same data, and only ask each item once, so there should be a mapping between the reviews and the products query: (with indexes, reviews should reference [0, 1, 2, 0, 0, 1, 2, 0] from the variables array [ { "__typename": "Product", "upc": "1" }, { "__typename": "Product", "upc": "2" }, { "__typename": "Product", "upc": "3" } ])

To go further, the first request to products returned this:

Object({"topProducts":
  Array([
    Object({"__typename": String("Product"), "upc": String("1"), "name": String("Table")}),
    Object({"__typename": String("Product"), "upc": String("2"), "name": String("Couch")}),
    Object({"__typename": String("Product"), "upc": String("3"), "name": String("Chair")})])})

so for each of these products, we already knew the name, so we could have avoided entirely the second call to products.

Implementation details

Variables in representations are generated here:

let representations = response.select(current_dir, requires, &schema)?;
variables.insert("representations".into(), representations);
from select in
pub fn select(
&self,
path: &Path,
selections: &[Selection],
schema: &Schema,
) -> Result<Value, FetchError> {
let values =
self.data
.get_at_path(path)
.map_err(|err| FetchError::ExecutionPathNotFound {
reason: err.to_string(),
})?;
Ok(Value::Array(
values
.into_iter()
.flat_map(|value| match (value, selections) {
(Value::Object(content), requires) => {
select_object(content, requires, schema).transpose()
}
(_, _) => Some(Err(FetchError::ExecutionInvalidContent {
reason: "not an object".to_string(),
})),
})
.collect::<Result<Vec<_>, _>>()?,
))
}

The annoying part is that since we are working with serde_json::Value elements, that do not implement Hash, they are hard to deduplicate (for now, there's work happening on that serde-rs/json#747 serde-rs/json#720 serde-rs/json#814 )

What could be done here? Should we use another structure than serde_json::Value? Should the response contain a kind of repository of values to look up and avoid some queries?

@abernix
Copy link
Member

abernix commented Dec 8, 2021

This sounds related to one of the concerns raised in apollographql/federation#180 (just noting for the sake of posterity). I think right now this would be potentially solved in that repo's query planner, right?

@abernix
Copy link
Member

abernix commented Dec 8, 2021

(Well, I guess it's execution of the plan, but I guess this exists in both implementations right now, yea?)

@Geal
Copy link
Contributor Author

Geal commented Dec 8, 2021

I am not sure this could be solved in the query planner, this is very dependent on runtime data. If you query:

query {
  topProducts {
    reviews {
      author { name }
  }
}

on this data:

[
  {
    "id": 1,
    "name": "chair",
    "reviews": [
      {
         "author": {
           "id": 1234,
           "name": "Alice"
         }
      },
     {
         "author": {
           "id": 5678,
           "name": "Bob"
         }
      }
    ]
  },
  {
    "id": 2,
    "name": "couch",
    "reviews": [
      {
         "author": {
           "id": 1234,
           "name": "Alice"
         }
      }
    ]
  }
]

we could get a query to the users subgraph with the following representations: [{id: 1234}, {id: 5678}, {id: 1234}], and the query planner has no way to know how many of those will be generated

On the other hand, it is easily solvable with #250: in the phase generating the variables for a subgraph query, it creates for each representation a "path" into the end response, to know where to store it.
It would be possible to go through that list, and for each unique representation, store the list of paths. Then when we get the responses, write the corresponding object at all those paths

@bnjjj bnjjj self-assigned this Apr 19, 2022
bnjjj added a commit that referenced this issue Apr 19, 2022
Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
@abernix abernix removed the triage label May 23, 2022
@bnjjj bnjjj closed this as completed in #872 Jun 7, 2022
tinnou pushed a commit to Netflix-Skunkworks/router that referenced this issue Oct 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants