Choke Points for a GraphQL Performance Benchmark

Olaf Hartig edited this page Oct 3, 2018 · 3 revisions

We (researchers of the Database and Web Information Systems Group at Linköping University) aim to design a performance benchmark for GraphQL server implementations. With this document we are seeking feedback on the very first design artifact that we have created, that is, a description of so-called choke points that the benchmark will be designed to test.

Before we come to this description, we provide a few words about the motivation, the general setup, and the choke-point based benchmark design methodology that we are applying.

Motivation

We observe that some of the vendors of GraphQL server implementations have a performance-related test suite. However, these test suites are very limited, which makes them unsuitable as a general performance benchmark. Important limitations include: a single fixed dataset that cannot be scaled to different sizes, a fixed set of queries (often only a few) that have been designed to test features of the respective implementation, only a single type of metric that can be measured (typically, some form of throughput), single-machine setup (i.e., the benchmark driver that issues concurrent GraphQL requests runs on the same machine as the system under test). In addition to these limitations, the dataset and the queries of these test suites have to be customized for every system that is tested, which makes experimental comparisons of systems unreliable.

Our aim for the benchmark that we propose to develop is to address these limitations in order to enable users to compare the performance of different GraphQL implementations and setups, and also to provide developers of GraphQL server implementations with a well-designed and balanced test suite.

Scenario

The focus of our benchmark will be a scenario in which data from a legacy database has to be exposed as a read-only GraphQL API with a user-specified GraphQL schema.

Systems that can be used to implement GraphQL servers but that are not designed to support this scenario out of the box can also be tested with the benchmark, for which they have to be extended with an integration component such as a schema delegation layer. In such a case, from the perspective of the benchmark, the combination of the system and the integration component are treat as a black box.

Methodology

We plan to apply the design methodology for benchmark development as introduced by the Linked Data Benchmark Council. The main artifacts created by the process of applying this methodology are

  • a data schema,
  • a workload of operations to be performed by the system under test,
  • performance metrics, and
  • benchmark execution rules.

A crucial aspect of the methodology is to identify key technical challenges, so-called “choke points”, for the types of systems for which the benchmark is being designed. These choke points then inform the creation of the aforementioned artifacts.

In the remainder of this document we describe the choke points that we have identified for GraphQL server implementations. We have grouped these choke points into different classes.

Remember that we are seeking feedback on these choke points. You may provide feedback i) in one of the pull requests that we have opened for each of the choke points, ii) by opening a new issue in Github, or iii) via email. While any feedback is welcome, we are particularly interested in the following questions:

  • Are the descriptions of the choke points sufficiently clear?
  • Is each of the given choke points reasonable?
  • Are there other key technical challenges that are not captured by any of our choke points?

Choke Points Related to Attribute Retrieval

Queries may request the retrieval of multiple attributes (scalar fields) of the data objects selected by the queries. A naive implementation may perform multiple, separate operations to fetch these attributes from the underlying data source. An approach that is commonly used to improve the performance in this case is called batching; the idea of this approach is to combine the retrieval of all requested attributes into a single query over the underlying data source.

CP 1.1: Multi-attribute retrieval

This choke point captures the challenge to efficiently support queries that request multiple attributes of the selected data objects.

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/2

Choke Points Related to Relationship Traversal

One of the main innovations of GraphQL in comparison to REST APIs is that it allows users to traverse the relationships between data objects in a single request. Supporting such a traversal in a GraphQL server may pose different challenges. The following choke points capture these challenges.

CP 2.1: Traversal of different 1:N relationship types

Types of relationships between object types may be 1:1, 1:N, or M:N (note that the latter two are the same from the perspective of a given data object that has such relationships), and whether a data object has a specific relationship may be mandatory or optional. Supporting an efficient traversal of 1:N (or M:N) relationships may be challenging due to the fact that every such relationship may have a different fan-out (i.e., how big is N?). This challenge, together with the need to support traversals along multiple relationships (which may be of different types), is captured by this choke point.

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/3

CP 2.2: Efficient traversal of 1:1 relationship types

While all 1:1 relationship types trivially have the same fan-out of 1, they may differ in whether they are mandatory or optional. Additionally, a traversal of these types of relationships may be supported more efficiently by using techniques that cannot be applied to 1:N relationships. Consequently, the challenge captured by this choke point is to employ suitable traversal techniques that are efficient for the different types of relationships.

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/4

CP 2.3: Relationship traversal with and without retrieval of intermediate object data

In addition to traversing along multiple relationships, GraphQL allows users to retrieve attributes (scalar fields) of the objects that are visited during the traversal. As an example, consider the following GraphQL query.

query {
  person(name:″Alice″) {
    knows {
      name
      reviewedBooks { title }
    }
  }
}

This query does not only retrieve the titles of the books reached via the traversal, but also the names of the persons that are visited when traversing to the books. This choke point captures how efficient a system is in retrieving multiple attributes of such intermediate object data.

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/5

CP 2.4: Traversal of relationships that form cycles

In cases in which relationships form directed cycles, traversing along these relationships may result in coming back to a data object that has been visited before on the same traversal path (which, formally, turns the path into a walk). This choke point captures the challenge of avoiding unnecessary operations in these cases. For instance, a naive implementation may end up requesting the same data multiple times from the underlying data source. Even a more sophisticated solution that caches and reuses the results of such requests may end up repeating the same operations over the cached data.

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/6

CP 2.5: Acyclic relationship traversal that visits data objects repeatedly

Even without directed cycles, data objects may be visited multiple times during a traversal (via different paths). The challenge in this case is essentially the same as in the previous choke point. However, the techniques to address the challenge may be different, or they may not be equally effective in all cases. Consequently, we introduce a separate choke point related to avoiding unnecessary operations when data objects are visited repeatedly during the traversal of acyclic relationships.

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/7

Choke Points Related to Ordering and Paging

Given that an exhaustive traversal of a sequence of 1:N relationships may easily result in reaching a prohibitively large number of data objects, providers of public GraphQL APIs aim to protect their servers from queries that require such resource-intensive traversals. A common approach used in this context is to enforce clients to use paging when accessing 1:N relationships, which essentially establishes an upper bound on the maximum possible fan-out at every level of the traversal. As an example, consider the following query which retrieves only the last ten books reviewed by each of the first ten persons known by Alice (rather than all the books reviewed by all the persons known by Alice).

query {
  person(name:″Alice″) {
    knows (first:10) {
      reviewedBooks (last:10) { title }
    }
  }
}

A feature related to paging is to allow users to specify a particular order over the objects visited by traversing a 1:N relationship. This feature may be used in combination with paging, but also to simply request a particular order in which objects have have to appear in the result.

CP 3.1: Paging without offset

This choke point captures the general challenge of providing efficient support for the retrieval of a limited number of first objects or last objects (i.e., by avoiding unnecessary operations).

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/8

CP 3.2: Paging with offset

This choke point captures the additional challenge of efficiently supporting paging also in cases in which a limited number of objects have to be retrieved starting from a user-specified offset.

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/9

CP 3.3: Ordering

The challenge captured by this choke point is to efficiently apply a user-specified order over the objects visited by traversing a 1:N relationship.

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/10

Choke Points Related to Searching and Filtering

Field arguments in GraphQL queries are powerful not only because they can be used as a flexible approach to expose paging and ordering features. Another use case, which is perhaps even more interesting from a data retrieval point of view, is to expose arbitrarily complex search and filtering functionality. The following choke points capture different challenges related to this use case.

CP 4.1: String matching

The challenge captured by this choke point is to efficiently support string matching as illustrated in the following query (which retrieves the birthdate of persons whose name begins with an A).

query {
  person(match:{attribute:″name″, startswith:″A″}) {
    birthdate
  }
}

Notice that the string matching features exposed by a GraphQL API can be more powerful than supporting only the notion of prefix matching that is illustrated in the example.

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/11

CP 4.2: Date matching

While similar in spirit to the previous one, this choke point focuses on values that are dates. In addition to requiring exact matches for a given date, it is possible to match dates based on range conditions (e.g., everything before a specific date) or based on given periods (e.g., everything in some February).

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/12

CP 4.3: Subquery-based filtering

Consider the following GraphQL query which retrieves the title of books reviewed by Alice and, for each of these books, the names of all those reviewers of the book who have also reviewed a movie that was reviewed by Alice.

query {
 person(name:″Alice″) {
   reviewedBooks {
     title
     reviewedBy(where:{reviewedMovies:{reviewedBy:{name:″Alice″}}}) {
       name
     }
   }
 }
} 

The query represents a case in which candidate data objects selected by traversing a relationship are filtered based on some condition; this filter condition is expressed in some form of subquery that is embedded inside the corresponding field argument. As illustrated by the example query, such a filter condition may involve the traversal of relationships for each candidate data object. We notice that this feature bears similarities to the notion of correlated subqueries in relational query languages such as SQL, and supporting such a subquery-based filtering in GraphQL poses similar challenges as correlated subqueries in SQL.

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/13

CP 4.4: Subquery-based search

While the previous choke point focuses explicitly on cases in which a subquery-based filter condition has to be applied somewhere along a relationship traversal, it is also possible to use such subqueries as field arguments of the main query object (like in the string matching example given above). For instance, the following query searches for persons who have reviewed a movie that was reviewed by Alice and, for each such person, retrieves the persons that they know.

query {
  person(where:{reviewedMovies:{reviewedBy:{name:″Alice″}}}) {
    knows {
      name
    }
  }
} 

Such a usage of subqueries captures more a notion of query-based search for candidate objects rather than filtering. While techniques to perform a subquery-based filtering (as captured by the previous choke point) may also be employed for subquery-based search, some systems may be equipped with techniques that focus especially on only one of the two variants of using subqueries in field arguments. Therefore, we consider subquery-based search as an additional choke point (separate from the previous one about subquery-based filtering).

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/14

CP 4.5: Multiple filter conditions

In addition to single filter conditions, it is also possible to enable users to express conjunctions or disjunctions of multiple such conditions. The added challenge in this case is that the conditions may not be equally selective. Hence, pushing them together to the underlying data source, or at least choosing which of them to evaluate first, may have an impact on the performance (in particular, if they are subqueries).

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/15

Choke Points Related to Aggregation

Another advanced feature that GraphQL APIs may provide is to execute aggregation functions over the queried data. We identify the following two choke points related to aggregation.

CP 5.1: Calculation-based aggregation

This choke point focuses on aggregation functions that calculate a single value from a set of values specified by the query. Examples of such functions include SUM, AVERAGE, and MAX. The challenge is to push the computation of such aggregation functions into the underlying data source or, if this is not possible, employ some other techniques to aggregate the specified values efficiently.

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/16

CP 5.2: Counting

This choke point focuses on the aggregation function that counts the number of elements in a set of data objects or values. In contrast to the previous choke point, the challenge in this case is to recognize that the objects/values may not actually have to be retrieved from the underlying data source in order to count them.

To provide feedback on this choke point refer to: https://github.com/LiUGraphQL/LinGBM/pull/17

Clone this wiki locally
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
Press h to open a hovercard with more details.