Design: Range queries V1 support #55

Closed
raminqaf opened this issue Aug 16, 2022 · 0 comments
Labels
type/design Design documents for enhancements

raminqaf commented Aug 16, 2022

Design: Range queries V1 support

Development: 0.8

last update: 06.10.2022


This issue describes our approach for the support of Range queries in Quick.

Goals

  1. Quick CLI: user defines range mirrors that index the data for range queries
  2. GraphQL Range Query: the user defines the range field (from and to) in the GraphQL Query type
  3. GraphQL Range Data Fetcher: extracts the range information and prepares the request to the mirror
  4. Range Processor for Mirrors: the quick mirror builds a range index on a separate state store
  5. Range Index Structure: a flattened string key in the mirror
  6. Range Query Service: a service that calls the Interactive Query API to fetch the data from the range state store

Out of scope

  1. Custom State Store: the range queries use RocksDB as the default state store. Using a custom state store (SQLite) is currently not in the scope of this epic
  2. Custom order: the Interactive Query API of Kafka returns the results in lexicographic order. To customize this order, we would need to build a custom index on RocksDB
  3. To infinity: if the query contains only a from argument and no to argument, then all the values from the lower bound up to the highest available value should be returned
  4. Mirror Library: an abstraction over the mirror API where users can implement their own query logic
  5. Query Complex Keys: Quick does not support topics with complex keys yet
  6. Pagination: limit the returned data with pagination
  7. Multi Range: range queries over two fields

Implementation

1. Quick CLI

Goal: the user defines range mirrors that index the data for range queries

During topic creation, the user can pass a --range-field <Field> option. This option deploys a mirror with an extra state store containing the range query index.

Example:

quick topic user-request-range --key integer --value schema --schema gateway.UserRequests --range-field timestamp

This command sends a request to the manager, and the manager prepares the deployment of a mirror called user-request-range. This mirror creates two indexes:

  1. Range Index over the topic key (here the userId) and timestamp
  2. Point Index only over the topic key (here the userId)

2. GraphQL Range Query

Goal: the user defines the range (from and to) in GraphQL Query type

The user needs to define the range query and arguments in the GraphQL schema. The GraphQL schema should contain the necessary information for the range data fetcher. For simplicity, we decided to extend the @topic directive. The @topic directive gets two new arguments, rangeFrom and rangeTo. These two arguments define the range for a specific field.

Example:

type Query {
    userRequests(
        userId: Int
        timestampFrom: Int
        timestampTo: Int
    ): [UserRequests] @topic(name: "user-request-range", 
                             keyArgument: "userId", 
                             rangeFrom: "timestampFrom", 
                             rangeTo: "timestampTo")
}

type UserRequests {
    userId: Int
    serviceId: Int
    timestamp: Int
    requests: Int
    success: Int
}

3. GraphQL Range Data Fetcher

Goal: extracts the range information and prepares the request to the mirror

Given the example below:

# query from 1 to 2
{
    userRequests(userId: 1, timestampFrom: 1, timestampTo: 2)  {
        requests
    }
}

The range data fetcher gets the necessary information and prepares a range call to the mirror range endpoint:
GET /user-request-mirror/mirror/range/1?from=1&to=2. Note that the upper bound of the range is exclusive: the boundary point given in to is not included in the range. In this example, only the value with timestamp 1 is returned.
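
The mapping from the @topic directive arguments to the mirror request can be sketched as follows. This is only an illustration, not Quick's actual data fetcher: the function name build_range_url and the dictionary shapes are hypothetical, and the mirror host is taken from the example URL above.

```python
from urllib.parse import quote, urlencode


def build_range_url(mirror_host, directive, arguments):
    """Build the range request path for a mirror (hypothetical helper).

    directive: the @topic arguments that name the GraphQL arguments to read.
    arguments: the argument values of the incoming GraphQL query.
    """
    key = arguments[directive["keyArgument"]]
    range_from = arguments[directive["rangeFrom"]]
    range_to = arguments[directive["rangeTo"]]
    path = f"/{mirror_host}/mirror/range/{quote(str(key), safe='')}"
    return f"{path}?{urlencode({'from': range_from, 'to': range_to})}"


# Values taken from the schema and the query shown in this issue.
directive = {
    "keyArgument": "userId",
    "rangeFrom": "timestampFrom",
    "rangeTo": "timestampTo",
}
arguments = {"userId": 1, "timestampFrom": 1, "timestampTo": 2}

url = build_range_url("user-request-mirror", directive, arguments)
print(url)  # /user-request-mirror/mirror/range/1?from=1&to=2
```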

4. Range Processor for Mirrors

Goal: the mirror builds a range index on a separate state store

The mirror needs a new processor to prepare a range index in a separate state store for range queries. Consider the following example. The topic contains the following information:

key (userId)  value
1             {timestamp: 1, serviceId: 2, requests: 10, success: 8}
1             {timestamp: 2, serviceId: 3, requests: 5, success: 3}
2             {timestamp: 1, serviceId: 4, requests: 7, success: 2}
The range mirror will materialize the topic in RocksDB in two ways:

  1. For range queries:

     key            value
     1_00000000001  {timestamp: 1, serviceId: 2, requests: 10, success: 8}
     1_00000000002  {timestamp: 2, serviceId: 3, requests: 5, success: 3}
     2_00000000001  {timestamp: 1, serviceId: 4, requests: 7, success: 2}

  2. For point queries:

     key  value
     1    {timestamp: 2, serviceId: 3, requests: 5, success: 3}
     2    {timestamp: 1, serviceId: 4, requests: 7, success: 2}
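
A minimal sketch of this double materialization, using plain dictionaries as stand-ins for the two RocksDB state stores. It is not the mirror's processor code. The helper range_key and the padding width of 11 are assumptions chosen so that the keys match the example above.

```python
def range_key(topic_key, range_value, width=11):
    """Flattened range-index key (hypothetical helper; width is assumed)."""
    return f"{topic_key}_{range_value:0{width}d}"


# Records of the example topic as (key, value) pairs, in arrival order.
records = [
    (1, {"timestamp": 1, "serviceId": 2, "requests": 10, "success": 8}),
    (1, {"timestamp": 2, "serviceId": 3, "requests": 5, "success": 3}),
    (2, {"timestamp": 1, "serviceId": 4, "requests": 7, "success": 2}),
]

range_store = {}  # stand-in for the range state store
point_store = {}  # stand-in for the point state store

for key, value in records:
    # Range index: one entry per (key, timestamp) combination.
    range_store[range_key(key, value["timestamp"])] = value
    # Point index: a later record overwrites the earlier one for the same key.
    point_store[key] = value

for flattened_key in sorted(range_store):
    print(flattened_key, range_store[flattened_key])
```

The point store ends up with one entry per userId (the latest value), while the range store keeps every record.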

5. Range Index Structure

Goal: a flattened string key in the mirror

The mirror implements the Processor API to create an index that supports range queries. The index key is a flattened string that combines the topic key with the value of the field over which range queries are requested. The index pads the range field value with zeros (10 digits for Int, 19 digits for Long) to preserve the lexicographic order. The general format of the key in the state store is <topicKeyValue>_<zero_paddings><rangeFieldValue>. In our example, with a topic that has userId as its key and a range over the timestamp, the key in the state store looks like this:

1_00000000001

The flattened key approach creates a unique key for each combination of user and timestamp. Therefore, all the values are accessible when running a range query.
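
To see why the zero padding matters, the sketch below sorts the same timestamps with and without padding. The function flatten_key is a hypothetical illustration of the key format, and the width is passed explicitly because it depends on the type of the range field.

```python
def flatten_key(topic_key, range_value, width):
    """Build <topicKeyValue>_<zero_paddings><rangeFieldValue> (hypothetical helper)."""
    return f"{topic_key}_{str(range_value).zfill(width)}"


timestamps = [2, 10, 1]

# Without padding, string comparison puts 10 before 2.
unpadded = sorted(f"1_{t}" for t in timestamps)
print(unpadded)  # ['1_1', '1_10', '1_2']

# With padding, lexicographic order matches numeric order.
padded = sorted(flatten_key(1, t, 11) for t in timestamps)
print(padded)  # ['1_00000000001', '1_00000000002', '1_00000000010']
```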

6. Range Query Service

Goal: a service that calls the Interactive Query API to fetch the data from the range state store

When the mirror receives the request GET /user-request-mirror/mirror/range/<key>?from=<rangeFrom>&to=<rangeTo> (e.g. GET /user-request-mirror/mirror/range/1?from=1&to=2), it creates the range from key (in the above example, 00000000001_00000000001) and the range to key (in the example, 00000000001_00000000002) and passes these values to the range method of the Interactive Query (IQ) API. The mirror then puts the returned values in a list and returns them to the requesting gateway.
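
The lookup can be approximated with an in-memory sorted key list. This sketch does not call the Interactive Query API: it only mimics a range scan over lexicographically ordered keys. It assumes the key format from the Range Processor example (unpadded topic key, range value padded to 11 digits) and the exclusive upper bound described for the data fetcher. The function range_query is hypothetical.

```python
from bisect import bisect_left


def range_query(store, topic_key, range_from, range_to, width=11):
    """Return the values whose flattened key lies in [from, to) (hypothetical)."""
    keys = sorted(store)  # a real state store already keeps its keys ordered
    lower = f"{topic_key}_{range_from:0{width}d}"
    upper = f"{topic_key}_{range_to:0{width}d}"
    start = bisect_left(keys, lower)  # first key >= lower bound
    end = bisect_left(keys, upper)    # first key >= upper bound, so upper is excluded
    return [store[k] for k in keys[start:end]]


# Stand-in for the range state store of the example.
store = {
    "1_00000000001": {"timestamp": 1, "serviceId": 2, "requests": 10, "success": 8},
    "1_00000000002": {"timestamp": 2, "serviceId": 3, "requests": 5, "success": 3},
    "2_00000000001": {"timestamp": 1, "serviceId": 4, "requests": 7, "success": 2},
}

# Equivalent of GET /user-request-mirror/mirror/range/1?from=1&to=2
result = range_query(store, topic_key=1, range_from=1, range_to=2)
print(result)  # only the record with timestamp 1 for userId 1
```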
