Skip to content

Conversation

gyDBD
Copy link
Contributor

@gyDBD gyDBD commented May 11, 2021

issue: #75

The endpoint that would benefit from this would be an endpoint that takes multiple batch ids and returns a nested object. See more explanation in the issue linked above.

Solution:

I add two new configurable key: propertyBatchKey and propertyNewKey to config the property key we want to batch up.

Here are the 3 cases that we support for now:
They all expect to have same input parameters format:

ids: str[] (should be the batchKey)
properties: str[] (should be the propertyBatchKey)

But they could have different response contract:

case 1: properties are returned in a nested object, id is at the top level.
e.g.: { id: 'id_1', properties: { property_1: 'foo', property_2: 'boo' } }

case 2: properties is not returned in a nested object, but spread at top level as well.
e.g.: { id: 'id_1', property_1: 'foo', property_2: 'boo' }

case 3: id and its properties are returned as a key-value pair at the top level.
e.g.: { 'id_1': { property_1: 'foo', property_2: 'boo' } }

Testing

update: I added more unit tests so that we know the new propertyBatchKey works fine with existing config.
e2e test with star war example: (data masking is correct, and when we query three fields, they only call the function once)
output: https://fluffy.yelpcorp.com/i/MpJQFFdv18k4jnc6tHCRV2Br5qVpVmjd.html

@gyDBD gyDBD requested review from magicmark and kkellyy May 11, 2021 17:47
@magicmark
Copy link
Collaborator

Looking good!

Couple questions:

1. Merging Strategies

Will this support use cases as described in the original issue? #75 ?

Looks like there's different shapes for how secondary stuff (arguments/response) could be exposed. It could be an array that needs merging (your use case), or object properties (the use case described in the original issue description). Maybe even another strategy like boolean merging (e.g. if include_extra_info was your secondary batch key, you'd use "true" as the superset value to merge)

even if we don't support the original use case for now, given we've seen different "behaviours", I wonder if we want an option to control that (even if this this is the only behaviour we support right now) - something like secondaryMergeStrategy, or maybe it's just something we can infer by the types, and we don't need this key 🤔

2. Data Masking

Assuming we make the following set of network requests:

getBusiness({ foo_ids: [1,2], properties: ['rating', 'name'], include_extra_info: false});
getBusiness({ foo_ids: [3], properties: ['rating'], include_extra_info: true});

What is the response to the callsite that made this initial loader request?

{ foo_id: 2, property: 'name', include_extra_info: false },

They only requested the name property - but would they get back 'rating' too?

(can't tell from the tests yet)

I think we'd want to make sure we don't leak extra info that wasn't requested - to avoid screwing up the types and introducing buggy responses that people rely on that we may want to fix one day.

this is awesome stuff!!!

Comment on lines 124 to 128
export function getBatchKeyForPartitionItems(
batchKey: string,
ignoreKeys: Array<string>,
items: ReadonlyArray<object>,
): ReadonlyArray<ReadonlyArray<number>> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could partitionItems be re-used? What is this doing differently?

Copy link
Contributor Author

@gyDBD gyDBD May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of returning the order, here we return the batchKey value of that item, the main difference is line 135. https://github.com/Yelp/dataloader-codegen/pull/261/files#diff-92b6f52e2c1b302cd9d15133831c595f151215aa1dfda274ff29fbd33d89a9b2R135

For example, if we have (the batch key here is bar_id and secondary batch key is property) :

    { bar_id: 100, property: 'name', include_extra_info: true },
    { bar_id: 101, property: 'rating', include_extra_info: false },
    { bar_id: 100, property: 'rating', include_extra_info: true },

function partitionItems will return [ [ 0, 2 ], [ 1 ] ]
while, function getBatchKeyForPartitionItems with return [ [ 100, 100 ], [ 101 ] ]
We use this information to help us match results object to requests. see https://github.com/Yelp/dataloader-codegen/pull/261/files#diff-92b6f52e2c1b302cd9d15133831c595f151215aa1dfda274ff29fbd33d89a9b2R349-R350

@gyDBD
Copy link
Contributor Author

gyDBD commented May 11, 2021

1. Merging Strategies

Will this support use cases as described in the original issue? #75 ?

Looks like there's different shapes for how secondary stuff (arguments/response) could be exposed. It could be an array that needs merging (your use case), or object properties (the use case described in the original issue description). Maybe even another strategy like boolean merging (e.g. if include_extra_info was your secondary batch key, you'd use "true" as the superset value to merge)

even if we don't support the original use case for now, given we've seen different "behaviours", I wonder if we want an option to control that (even if this this is the only behaviour we support right now) - something like secondaryMergeStrategy, or maybe it's just something we can infer by the types, and we don't need this key 🤔

@magicmark I believe I did cover the use cases as described in the original issue #75. See this unit test the response is like { id1: {rating: 3, name: 'Burger King'}} vs my use case {foo_id: id1, rating: 3, name: 'Burger King'}. So either the batchKey value is as a key or as a value in the response, it's covered. However, if the response doesn't contain batchKey in it, e.g. {rating: 3, name: 'Burger King'} that cannot be covered.

Comment on lines 376 to 379
result = Object.assign(
{},
...[newKey, propertyBatchKeyPartion[i][j]].map((key) => ({ [key]: element[key] })),
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this syntax is really confusing to me

we usually don't need to explicitly call Object.assign for merging - we can use normal spread syntax:

const foo = {
  ...myObj1,
  ...myObj2,
};

I'm a bit more confused about what's going on here though.

> Object.assign({}, ...['hello', 'world'] )
{ '0': 'w', '1': 'o', '2': 'r', '3': 'l', '4': 'd' }

:thonk:

Could we separate out these lines or something? this is making my brain hurt a bit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I searched how to only keep selected key-value pair in an object, https://stackoverflow.com/questions/54907549/keep-only-selected-keys-in-every-object-from-array and stackoverflow gave me this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not following :/

Could you give a sample demo input/output of what's happening here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To just keep newKey ('id') and the property ('name') from the response. So that we only return what we asked for.

For example:

var data = {id: '1', name: 'John', age: 42};

var keys_to_keep = ['id', 'name']

var data_new=Object.assign({}, ...keys_to_keep.map(key => ({[key]: data[key]})));

console.log(data_new)

output:

{ id: '1', name: 'John' }

Copy link
Collaborator

@magicmark magicmark Jun 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha - thanks. (The simple example helped a lot!)

does https://lodash.com/docs/4.17.15#pick work?

Copy link
Collaborator

@magicmark magicmark Jun 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Object.assign({}, ... logic feels sufficiently tricky to warrant using the lodash method instead (I think we're already using lodash in here anyway)

@gyDBD gyDBD marked this pull request as ready for review June 1, 2021 20:03
Comment on lines 118 to 119
getFilmsV2: ({ film_ids, properties }) => {
return [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth a comment to explain what's going on here / why v2 exists

Comment on lines 132 to 133
const response = await swapiLoaders.getFilmsV2.load({ film_id: this.id, property: 'title' });

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you show the output of $ flow check when you try to read response.director?

hopefully it errors, which would verify the data masking is correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added console.log(response) there, and it only gives me film_id and title. I could also do the flow check.

Copy link
Contributor Author

@gyDBD gyDBD Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added console.log(response) and console.log(response.director).
got this

{ film_id: 4, title: 'The Phantom Menace' }
undefined

data masking is correct, but flow doesn't complain, which is something I should fix.

Copy link
Contributor Author

@gyDBD gyDBD Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the response type just matches the SWAPI_Film_v2 type I defined here
which is

export type SWAPI_Film_v2 = $ReadOnly<{|
    film_id: number,
    title: string,
    episode_id: number,
    director: string,
    producer: string,
|}>;

so it won't blow up when I do response.director.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I think we want flow to blow up here though don’t we, otherwise there’s a mismatch between what actually exists vs what flow thinks exists?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@magicmark I have trouble with generate correct flow type. Here's a try flow reproduce
Could you help me with that? I don't know how to make id not nullable while change other fields nullable.

On the other hand,

Hmm, here, we only get nullable because the swagger endpoint defines each as property so. That's just what the example endpoint does - there's no guarantee other endpoints will do that.

I think we can document supported resource must return nullable properties? If some endpoint require all properties be non-nullable, and we remove lots of properties in the dataloader-codegen, that seems wrong to me and we shouldn't support that case. So we require properties being nullable make sense to me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a (subtle) difference between getting the superset of all possible null properties and the subset of known guaranteed non-null properties

So if I did:

const properties1 = loaders.fetchStuff({ properties: ['foo', 'bar'] });
const properties2 = loaders.fetchStuff({ properties: ['baz'] });

If I understand what you're saying correctly, you'd expect both properties1 and properties2 to have the following identical type:

{ foo?: string, bar?: string, baz?: string }

If we're being super nitpicky, this is wrong:

  • there's no way that properties1 should ever have baz present since we didn't request it. Flow is kinda lying to us.
  • Inproperties2, baz should always be present. Again, flow is kinda lying to us.

These aren't catastrophic problems. The stock price of the company won't go up or down as a result either way here :)

But again, as this is foundational infrastructure that hundreds of our resolvers use, I think we probably want to get the flow typings as "correct" as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You commented this before

At a very minumum, we should mark all properties as nullable - the more correct thing to do would be to omit all non-requested properties from the response entirely.

So I thought just having all properties nullable is fine.
I have trouble with getting it as "correct" as possible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should mark all properties as nullable

Is that what the current PR does? Or does it rely on the endpoint already doing that for us? That's a key difference.

I think we can document supported resource must return nullable properties?

Is a reasonable alternative.

I'm happy to set up some time to pair on this to see if we can get flow to do the right thing? We can timebox it, and if not, fall back to your suggestion :)

Copy link
Contributor Author

@gyDBD gyDBD Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that what the current PR does? Or does it rely on the endpoint already doing that for us? That's a key difference.

We rely on the endpoint returns nullable properties. And that's the reasonable alternative thing you commented on.

I'm happy to set up some time to pair on this to see if we can get flow to do the right thing? We can timebox it, and if not, fall back to your suggestion :)

Thanks!

Copy link
Collaborator

@magicmark magicmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good! couple thoughts:

  • the examples only show one style of the property merging interface - given the complexity here, might be worth e2e testing all styles
  • can we verify the generated flow types are correct? (by showing it blowing up when you try to read a value we didn't request - a screenshot in the PR description would be good)
  • can we verify the runtime is correct, by doing the above and console.log() the bad value, and verify it's empty
  • do we need an extra config param to control what style of endpoint it is?

@magicmark
Copy link
Collaborator

magicmark commented Jun 3, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants