Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash composite children selection in addition to arguments #495

Closed
agu-z opened this issue Apr 7, 2021 · 24 comments · Fixed by #556
Closed

Hash composite children selection in addition to arguments #495

agu-z opened this issue Apr 7, 2021 · 24 comments · Fixed by #556
Labels

Comments

@agu-z
Copy link

agu-z commented Apr 7, 2021

It's currently possible to write queries that will compile but still fail by composing two or more SelectionSets that use the same composite field.

For example:

SelectionSet.map2 User 
    (Query.me User.firstName) 
    (Query.me User.lastName)

Generates the following query:

query { 
  me { 
    firstName3832528868: firstName 
  } 
  me { 
    lastName3832528868: lastName 
  } 
}

This query will always fail. Depending on the backend implementation, the validation engine could error out, indicating me needs to be aliased, or the resulting JSON will contain only firstName or lastName, leading to a decoding error.

This behavior is not limited to RootQuery selections, but any composite field used more than once in the final SelectionSet.

The same way this library automatically aliases fields with a hash of their arguments, it'd be nice if it would also consider a composite field's children for the hash. Generating:

query { 
  me4047547180: me { 
    firstName3832528868: firstName 
  } 
  me3393677220: me { 
    lastName3832528868: lastName 
  } 
}

I feel that's a natural extension of one of the top features which is not to have to think about aliasing.

An alternative option is to merge composite fields with different selections into a single one. In some cases, this could positively affect performance, but maybe that's too high-level or complicated.

Anyway, thanks for making the nicest graphql client out there :)

@TobiasPfeifer
Copy link

TobiasPfeifer commented Jul 2, 2021

Encountered the same Problem. Is there a workaround to create a valid query/mutation?

mutation { 
  createObj (arg1: "Argument", arg2: "Argument") {
    wrapped {
      arg1,
      arg2
    }
  } 
}

however I get a query that has a structure like this:

mutation { 
  createObj (arg1: "Argument", arg2: "Argument") {
    wrapped {
      arg1
    },
    wrapped {
      arg2
    }
  } 
}

Edit: Found a workaround using pipes and modelling the Wrapper as a type

createObj: SelectionSet ObjectCreated Obj.Object.ObjectCreated
createObj =
    let
        innerSelection : SelectionSet Wrapped Obj.Object.Wrapped
        innerSelection = SelectionSet.succeed Wrapped 
            |> with (Obj.Object.Wrapped.arg1)
            |> with (Obj.Object.Wrapped.arg2)
    in
    SelectionSet.succeed ObjectCreated 
        |> with (Obj.Object.ObjectCreated.wrapped innerSelection)

However I cannot find a way to skip the wrapper data type and have ObjectCreated.arg1 instead of ObjectCreated.wrapped.arg1

@agu-z
Copy link
Author

agu-z commented Jul 4, 2021

@TobiasPfeifer Here are some ways to keep your data structure flat:

If your ObjectCreated type only contains selections from wrapped

type alias ObjectCreated = 
    { arg1 : String
    , arg2 : String
    }

You can skip Wrapped and map into ObjectCreated from your selections on Obj.Object.Wrapped:

createObj: SelectionSet ObjectCreated Obj.Object.ObjectCreated
createObj =
    let
        innerSelection : SelectionSet ObjectCreated Obj.Object.Wrapped
        innerSelection = SelectionSet.succeed ObjectCreated 
            |> with (Obj.Object.Wrapped.arg1)
            |> with (Obj.Object.Wrapped.arg2)
         -- You can also do the above with map2
    in
    Obj.Object.ObjectCreated.wrapped innerSelection

If you have other fields on ObjectCreated

type alias ObjectCreated = 
    { arg1 : String
    , arg2 : String
    , somethingElse : SomethingElseType 
    }

Then you could temporarily keep arg1 and arg2 in a tuple (or your Wrapped record if you want to):

innerSelection : SelectionSet (String, String) Obj.Object.Wrapped   
innerSelection = SelectionSet.succeed Tuple.pair 
    |> with (Obj.Object.Wrapped.arg1)
    |> with (Obj.Object.Wrapped.arg2)

and then partially apply the ObjectCreated function with them:

SelectionSet.succeed (\(arg1, arg2) -> ObjectCreated arg1 arg2) 
    |> with (Obj.Object.ObjectCreated.wrapped innerSelection)
    |> with somethingElseOnObjectCreated

Conclusion

Because you can map at any level, the library doesn't make you adhere to any particular data structure, which is great.

However, according to the library types, you might think you can do this:

SelectionSet.succeed ObjectCreated  
    |> with (Obj.Object.ObjectCreated.wrapped Obj.Object.Wrapped.arg1)
    |> with (Obj.Object.ObjectCreated.wrapped Obj.Object.Wrapped.arg2)
    |> with somethingElseOnObjectCreated

This seems like a convenient way to flatten the structure to match your needs, but even though it compiles, it will always fail at runtime.

@dillonkearns
Copy link
Owner

Hello @agu-z and @TobiasPfeifer, thanks so much for all these useful details here! elm-graphql definitely promises to give you an abstraction that makes it impossible to have a field alias collision, so I would consider this to be a bug since it is not providing that guarantee here.

I'm not sure exactly what the fix is without digging into it more, but I pass in strings in the generated code in other places to ensure that the hash includes all the relevant information that can make a field alias unique. For example, we ended up adding the name of the types as a String argument to fix #95

Object.selectionForField "String" "greet" [ Argument.required "input" requiredArgs____.input Swapi.InputObject.encodeGreeting ] Decode.string

@agu-z
Copy link
Author

agu-z commented Aug 23, 2021

@dillonkearns I think in this case, all the information needed is already there.

Currently, Composite fields are only aliased based on their arguments:

maybeAliasHash : RawField -> Maybe Int
maybeAliasHash field =
(case field of
Composite name arguments children ->
if List.isEmpty arguments then
Nothing
else
arguments
|> Argument.serialize
|> Just

I think the solution would conceptually look like this:

maybeAliasHash : RawField -> Maybe Int
maybeAliasHash field =
    (case field of
        Composite name arguments children ->
            Argument.serialize arguments
                ++ serializeChildren Nothing children
                |> Just

However, this exact change might be bad for performance because children would now be recursively serialized twice.

So maybe Document.Field.serialize and friends would have to be refactored slightly so that children serialization only occurs once. I'm not sure how bad it would be though, maybe it's negligible.

I've only looked into the internals of elm-graphql for the last 30 min, so I might be way off 😄

@dillonkearns
Copy link
Owner

Hmm, okay so thinking about it a bit more... I think it would work and be a lot simpler to implement if all sibling composite fields were merged up. Not sure if there are any cases I'm not considering that would make this approach problematic, but on first thought it seems viable:

SelectionSet.map2 User 
    (Query.me User.firstName) 
    (Query.me User.lastName)
Generates the following query:

Currently serializes to (error):

query { 
  me { 
    firstName3832528868: firstName 
  } 
  me { 
    lastName3832528868: lastName 
  } 
}

Could instead serialize to:

query { 
  me { 
    firstName3832528868: firstName 
    lastName3832528868: lastName 
  } 
}

Since me can never be referenced directly in this context, it seems like this would solve the problem pretty cleanly.

@dillonkearns
Copy link
Owner

And it wouldn't require a change to the code generation, I believe. It would only change the serialize function.

@dillonkearns
Copy link
Owner

Okay, here's a failing test that I think could be a promising starting point for this approach:

https://github.com/dillonkearns/elm-graphql/compare/merge-composite-children

@dillonkearns
Copy link
Owner

The test output is

    "query {\n  me {\n    firstName0: firstName\n  }\n  me {\n    lastName0: lastName\n  }\n}"
    ╷
    │ Expect.equal
    ╵
    "query {\n  me {\n    firstName0: firstName\n    lastName0: lastName\n  }\n}"

@agu-z
Copy link
Author

agu-z commented Aug 23, 2021

Right, when I mentioned this in the first post I wasn't sure if that'd be expected.

However, if you're onboard, I do think it's a superior solution because it could free you to compose selections however you like without worrying about computing the same thing many times in the backend.

This another cool way this package can be higher level than other clients.

Would you merge on serialization or earlier on SelectionSet composition?

@dillonkearns
Copy link
Owner

Gotcha, yeah I think it will be pretty negligible considering that we only need to run the merge logic on composite fields with the same field name, which will generally be pretty rare. And even in those cases I don't think it will be a noticeable performance cost if it's done in a minimal way.

Here's the RawField type definition for reference.

type RawField
    = Composite String (List Argument) (List RawField)
    | Leaf { typeString : String, fieldName : String } (List Argument)

One algorithm to merge a List RawField could be:

  1. Split apart the List RawField into something like
{
   leaves: List { typeString : String, fieldName : String } (List Argument)`
  , composites : Dict String ((List Argument), (List RawField))
}
  1. For each key in composites, compare arguments and merge all that have equivalent List Argument. Merge by concatenating their List RawField together into a single Composite field.
  2. The List RawField at that level can now just be all of the leaves plus the result of merging together composites
  3. Recursively run the same algorithm on all of the List RawField children for each of the resulting Composite fields.

The part of the algorithm that checks if the List Argument is equivalent within sibling Composite fields with the same name is the most expensive part probably, but it only needs to be run within Composite siblings calling the same field, so I think the performance cost will be minimal.

@dillonkearns
Copy link
Owner

Oh wait, even better...

The hash needs to be uniquely computed for the args. So creating a Dict keyed off of the field name and hashed arguments (could just be concatenating those as a String) would ensure uniqueness without doing any looping to compare things manually. More robust, and as fast as building a Dict.

@dillonkearns
Copy link
Owner

I believe this fixes it.

Can anyone think of any other cases to test out? #556

@agu-z
Copy link
Author

agu-z commented Aug 23, 2021

The hash needs to be uniquely computed for the args. So creating a Dict keyed off of the field name and hashed arguments (could just be concatenating those as a String) would ensure uniqueness without doing any looping to compare things manually. More robust, and as fast as building a Dict.

Ah, hashes save the day again! 😄

Can anyone think of any other cases to test out? #556

Is there something about Union types to consider? Looking at the code, it seems like it'd be fine because fragment selections work like any other composite field selection and they'll get a different name based on the case.

As long as only sibling composite fields are merged, it should be fine. I didn't see a test for that in your PR, but I guess it's obvious that won't happen based on the nature of your mergeFields function.

@dillonkearns
Copy link
Owner

Union types are a good consideration, although I think that's more a concern of the code generation. Since this is happening at the low-level, I think it's only responsibility is to ensure that there are never any fields in the request that have the same field name or field alias. So that makes it a lot simpler to think about!

@agu-z
Copy link
Author

agu-z commented Aug 23, 2021

Makes sense. I like this outcome a lot :)

@dillonkearns
Copy link
Owner

Testing this out on some examples, I'm realizing that duplicate Leaf fields should also be deduped.

For example, this invalid GraphQL Request

query {
  human1213318493: human(id: "1001") {
    id994531791: id
    id994531791: id
    id994531791: id
  }
}

Is generated from this elm-graphql code in the current #556 branch:

    Query.human { id = CustomScalarCodecs.Id 1001 }
        (SelectionSet.map3 Response
            Swapi.Object.Human.id
            Swapi.Object.Human.id
            Swapi.Object.Human.id
        )

@agu-z
Copy link
Author

agu-z commented Aug 23, 2021

Ah yes, I did notice that but I figured that wasn't invalid GraphQL because in my testing the server responded fine and since they all had the same alias and content there was no decoding error.

@agu-z
Copy link
Author

agu-z commented Aug 23, 2021

But it seems natural to do it here too

@dillonkearns
Copy link
Owner

Ah, interesting, yes I didn't get a server error for that either. I was trying to find it in the GraphQL specification but couldn't.

Seems like it couldn't hurt to dedupe it anyway, though, and it was an easy change, so seems like I may as well leave it in: b8bc555

What do you think? Seems like it could only be more robust with the deduped Leaves, right?

@dillonkearns
Copy link
Owner

Oh, I found the section: https://spec.graphql.org/draft/#sec-Field-Selection-Merging

@dillonkearns
Copy link
Owner

Alright, so thinking about it a bit more, it is indeed clearly valid according to that section of the spec to have duplicate names for leaves. But I can't think of any place it would be problematic, and if anything I could imagine some implementations doing duplicate work based on that.

So I'm thinking let's leave the Leaf de-duping in. In the future if it ever comes up (which seems unlikely) then we can revisit it.

@agu-z
Copy link
Author

agu-z commented Aug 24, 2021

Seems like it couldn't hurt to dedupe it anyway, though, and it was an easy change, so seems like I may as well leave it in: b8bc555

Agreed, I would leave it in. Even if it doesn't actually break, it might help skip some work in the backend and save some surprises / concerns when looking at the generated queries.

@dillonkearns
Copy link
Owner

That was my line of thinking as well. Good to get a second opinion, thanks for sharing your thoughts on this!

@dillonkearns
Copy link
Owner

This is now live with Elm package version 5.0.6 🎉

Thanks again for the feedback and discussion on this 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants