Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Reduce the number of SQL queries #342

Closed
wants to merge 84 commits into from

Conversation

benjie
Copy link
Member

@benjie benjie commented Feb 4, 2017

  • calculate hasNextPage/hasPreviousPage in the same query as fetching the data
  • remove a source of N+1 queries relating to relations (foo.barsByBazId)

TODO:

KNOWN ISSUES:

I've been informed by @cusxio that the following query returns null, but that including the username field fixes this - sounds like I need to add username to the list of implicit fields on the search endpoints:

query {
  userByUsername(username:"cusx") {
    id
  }
}

Workaround:

query {
  userByUsername(username:"cusx") {
    id
    username
  }
}

Fixes #219
Fixes #265

@valoricDe
Copy link
Contributor

valoricDe commented Feb 28, 2017

I was to eager. Throwing out the externalFieldNameDependencies ended in not fetching the sub sub queries. Ok so we rather account for classicIds with

query: sql.query`${sql.identifier(aliasIdentifier)}.${sql.identifier(externalFieldName === 'row_id' ? 'id' : externalFieldName)}`,

But thats just a hack. @benjie do you have a better idea?

@valoricDe
Copy link
Contributor

Subjective Evaluation: Request time cut in half 💛

@benjie
Copy link
Member Author

benjie commented Feb 28, 2017

externalFieldName is the name of the postgresql column (without id -> rowId shenanigans) this field represents

externalFieldNameDependencies is a list of external field names that we need to fetch in order for the system to work (because if we don't fetch them then we cannot generate the required field - e.g. the __id computed field (which is still __id because I've not merged the latest master yet), or relations that need to know the row's fooId so it can look up barsByFooId, and so on). When sorting we also have to add the sort criteria to the list of fields to fetch otherwise the logic in the paginators falls down I think.

@benjie
Copy link
Member Author

benjie commented Feb 28, 2017

I'm not sure how best to deal with rowId and I'm sure I'm getting various things from places I oughtn't since I just hacked it out as a proof of concept, so I'm awaiting Caleb's ideas on this point.

@benjie
Copy link
Member Author

benjie commented Mar 1, 2017

Just discovered a bug in this relating to one-to-many relations on sub-entries on mutation payloads.

@benjie
Copy link
Member Author

benjie commented Mar 1, 2017

Found it 👍

@0x6368656174
Copy link

0x6368656174 commented Mar 29, 2017

In this version, the GRAND (column) ON [TABLE] does not work, in contrast to the version of bg/performance-experiment. And I really need the support of GRAND (column) ON [TABLE] = (((

@benjie
Copy link
Member Author

benjie commented Mar 29, 2017

@0x6368656174 Do you know why it doesn't work on this branch? Given it's based on the other (which didn't pass tests), I'm theorising it's because this branch pulls in the other required columns that are necessary for PostGraphQL internals, e.g. if you order by a field, that field gets implicitly added to the selection; similarly if you filter by a field.

@0x6368656174
Copy link

@benjie if I execute the query:

{
  allUsers {
    nodes {
      id
      name
    }
  }
}

In this branch, the query is selected:

select coalesce(json_agg(__local_0__), '[]'::json) as "rows", ( select count(*) from ( select * from "public"."users" as __local_1__ where "id" is not null and true ) as __local_2__ ) ::integer as "totalCount", ( select count(*) from ( select * from "public"."users" as __local_1__ where "id" is not null and true ) as __local_3__ where true ) > count(*) as "hasNextPage", false as "hasPreviousPage" from ( select json_build_object( 'value', json_build_object('id', __local_4__."id", 'name', __local_4__."name"), 'cursor', json_build_array("id") ) as __local_0__ from ( select * from "public"."users" as __local_1__ where "id" is not null and true ) as __local_4__ where true and true order by "id" using < limit all ) as __local_5__

And it should be:

select coalesce(json_agg(__local_0__), '[]'::json) as "rows", ( select count(*) from ( select "id", "name" from "public"."users" as __local_1__ where "id" is not null and true ) as __local_2__ ) ::integer as "totalCount", ( select count(*) from ( select "id", "name" from "public"."users" as __local_1__ where "id" is not null and true ) as __local_3__ where true ) > count(*) as "hasNextPage", false as "hasPreviousPage" from ( select json_build_object( 'value', json_build_object('id', __local_4__."id", 'name', __local_4__."name"), 'cursor', json_build_array("id") ) as __local_0__ from ( select "id", "name" from "public"."users" as __local_1__ where "id" is not null and true ) as __local_4__ where true and true order by "id" using < limit all ) as __local_5__;

Plus, I do not need totalCount, hasNextPage, hasPreviosPage, but in the SQL-request they are writing.

@benjie
Copy link
Member Author

benjie commented Mar 30, 2017

Formatted, so I can compare:

SELECT coalesce(json_agg(__local_0__), '[]'::json) AS "rows",

  (SELECT count(*)
   FROM
     (SELECT *
      FROM "public"."users" AS __local_1__
      WHERE "id" IS NOT NULL
        AND TRUE ) AS __local_2__) ::integer AS "totalCount",

  (SELECT count(*)
   FROM
     (SELECT *
      FROM "public"."users" AS __local_1__
      WHERE "id" IS NOT NULL
        AND TRUE ) AS __local_3__
   WHERE TRUE ) > count(*) AS "hasNextPage",
       FALSE AS "hasPreviousPage"
FROM
  (SELECT json_build_object('value', json_build_object('id', __local_4__."id", 'name', __local_4__."name"), 'cursor', json_build_array("id")) AS __local_0__
   FROM
     (SELECT *
      FROM "public"."users" AS __local_1__
      WHERE "id" IS NOT NULL
        AND TRUE ) AS __local_4__
   WHERE TRUE
     AND TRUE
   ORDER BY "id" USING <
   LIMIT ALL) AS __local_5__
SELECT coalesce(json_agg(__local_0__), '[]'::json) AS "rows",

  (SELECT count(*)
   FROM
     (SELECT "id",
             "name"
      FROM "public"."users" AS __local_1__
      WHERE "id" IS NOT NULL
        AND TRUE ) AS __local_2__) ::integer AS "totalCount",

  (SELECT count(*)
   FROM
     (SELECT "id",
             "name"
      FROM "public"."users" AS __local_1__
      WHERE "id" IS NOT NULL
        AND TRUE ) AS __local_3__
   WHERE TRUE ) > count(*) AS "hasNextPage",
       FALSE AS "hasPreviousPage"
FROM
  (SELECT json_build_object('value', json_build_object('id', __local_4__."id", 'name', __local_4__."name"), 'cursor', json_build_array("id")) AS __local_0__
   FROM
     (SELECT "id",
             "name"
      FROM "public"."users" AS __local_1__
      WHERE "id" IS NOT NULL
        AND TRUE ) AS __local_4__
   WHERE TRUE
     AND TRUE
   ORDER BY "id" USING <
   LIMIT ALL) AS __local_5__;

@benjie
Copy link
Member Author

benjie commented Mar 30, 2017

@0x6368656174 So you're saying that the select * in the count statements and the source table are causing you trouble.

I'm aware that we over-fetch the hasPreviousPage/etc - I ran out of time whilst writing this proof of concept.

The intention of this PR was performance, rather than excluding fields for permissions reasons, and I'm hesitant to advance the code any further knowing that we'll probably be rebuilding a fair chunk of it. That said, you're welcome to build your own branch off of this one that solves your issue - if it's simple enough then I may consider merging, but you'll have to be careful to include all required fields, not just the ones that are directly requested (because PostGraphQL requires some internally for various things: sorting, relations, etc).

Personally I don't use column-level grants on select statements, I just split the tables based on permissions boundaries into sub-tables, e.g. instead of a monolithic user table, I might have user, user_profile, user_email, user_auth, etc.

@0x6368656174
Copy link

@benjie Splitting one monolithic table into several is not a way out of the problem. First, there can be too many small tables that will degrade performance. Secondly, if one column is available for two user roles, but the rest are not available, then you get a whole table with only one column.

@0x6368656174
Copy link

@benjie The easiest solution is to replace the select (*) with the select ("id"), and everything will work, because "Id" in PostGraphQL should always be available to the role, if it is available at least some other column.

@benjie
Copy link
Member Author

benjie commented Mar 30, 2017

Well, for the counts simply select 1 should suffice and doesn't require us to figure out the primary key(s).

I don't find performance degrades too much with splitting concerns into separate tables; one-to-one indexed joins are very fast.

@benjie
Copy link
Member Author

benjie commented May 22, 2017

@benjie
Copy link
Member Author

benjie commented May 22, 2017

There's also an issue here:

https://github.com/postgraphql/postgraphql/blob/bg/performance-experiment-2/src/postgraphql/schema/procedures/createPgProcedureObjectTypeGqlFieldEntry.ts#L70

If you have a procedure which returns a row from the database, this isn't sufficient to parse it into a Map as is required by much of the postgraphql internals - it just leaves it as a vanilla JSON object. If you then try and access attributes on this, it throws an error like value.get is not a function

@benjie
Copy link
Member Author

benjie commented May 22, 2017

Using fixtures.return.type.transformPgValueIntoValue seems to fix the latter.

@benjie
Copy link
Member Author

benjie commented May 22, 2017

Seems this same issue was fixed on master only a couple months ago.

@benjie
Copy link
Member Author

benjie commented Jul 18, 2017

Closing in favour of V4: #506

@benjie benjie closed this Jul 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance of computed columns in connections Performance when embedding children
5 participants