Design: Substrate Query Infrastructure Framework #465

bedeho · 2020-04-07T16:25:55Z

Background

This issue describes what is meant by query infrastructure, and also why it's needed.

Ignore the proposal itself.

Goal

Develop a software framework for building query infrastructure for Substrate runtimes, and use it to start the implementation for the Joystream runtime specifically. The framework should also be as compatible as possible with The Graph protocol, in the sense that it should minimise the cost of translating an instance based on this framework into an instance based on a possible future Substrate-compatible standard from The Graph.

Architecture

The query infrastructure consists of the following three servers operating in concert, together with the Substrate archival node (4) that feeds them:

1. A GraphQL server serving the API. The GraphQL API only has queries, no subscriptions or mutations, and these are resolved against a standalone relational database server (2). Critically, each query must correspond to exactly one table in the database, meaning that a query maps to a single SELECT lookup without any joins.
2. A relational database server which holds the current query state. The database must also hold state recording exactly how much of the blockchain has been processed to produce the current state of the database. This information must be updated atomically with the processing of each mapping, as it allows the whole query infrastructure to resume if it halts during operation for any reason (power outage, lost connection to full node, etc.).
3. A block ingestion server which processes blocks and the corresponding emitted events originating from a given Substrate archival node (4), and updates the database correspondingly.
4. A Substrate archival node, which is a full node that stores, for all blocks, the set of events that were emitted. A normal full node will not do, as it only emits events generated by ongoing validation. This is not sufficient for our purposes, since the ingestion process may need access to other events, for example during initial synchronisation or catchup.

Component 1 and components 2-3 should be able to run on separate hosts.
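The atomic-update requirement on the database can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `InMemoryDb` stands in for the relational database, `members` is an invented example table, and blocks are numbered from 1 so that a cursor of 0 means "nothing processed yet". The point is only that table changes and the block cursor commit together or not at all, so ingestion can resume from the cursor after a halt.

```typescript
// Hypothetical in-memory stand-in for the relational database (component 2).
interface DbState {
  lastProcessedBlock: number;    // how far the chain has been processed (0 = nothing yet)
  members: Map<number, string>;  // invented example query table: member id -> handle
}

interface ChainEvent {
  name: string;      // e.g. "members.MemberRegistered" (illustrative)
  memberId: number;
  handle: string;
}

interface Block {
  height: number;
  events: ChainEvent[];
}

class InMemoryDb {
  private state: DbState = { lastProcessedBlock: 0, members: new Map() };

  // Runs `work` against a copy of the state and swaps the copy in only if
  // `work` returns without throwing, mimicking a database transaction.
  transaction(work: (draft: DbState) => void): void {
    const draft: DbState = {
      lastProcessedBlock: this.state.lastProcessedBlock,
      members: new Map(this.state.members),
    };
    work(draft);
    this.state = draft;
  }

  snapshot(): DbState {
    return this.state;
  }
}

// Applies every event of a block and advances the cursor in one transaction.
function processBlock(db: InMemoryDb, block: Block): void {
  db.transaction((draft) => {
    for (const event of block.events) {
      if (event.handle.length === 0) {
        throw new Error(`invalid payload in block ${block.height}`);
      }
      draft.members.set(event.memberId, event.handle);
    }
    draft.lastProcessedBlock = block.height;
  });
}

// Resumes from the persisted cursor, as the ingestion server would after a restart.
function ingest(db: InMemoryDb, chain: Block[]): void {
  const next = db.snapshot().lastProcessedBlock + 1;
  for (const block of chain.filter((b) => b.height >= next)) {
    processBlock(db, block);
  }
}
```

If processing a block throws part-way through, neither its table changes nor the cursor update become visible, and a later `ingest` call picks up at that same block.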

Developer Workflow

To instantiate query infrastructure for a given runtime using the framework, the developer has to provide the following:

API description: A description of all the types and queries which will be in the API, each of which also has a corresponding table in the database. The query that is exposed in the actual GraphQL API will also include OpenCrud arguments for filtering, pagination and ordering, but these are not included in this description. The description should also contain documentation, which should be propagated all the way to the database and GraphQL schemas.
Event processors: An event processor is a (TypeScript) function which corresponds to a specific event name in a specific module, and updates the query database based on the semantics of the event, along with information about the originating transaction and transaction parameters, if applicable. There isn't always an extrinsic, because some events originate from block finalisation code, e.g. on_finalize, or from genesis builder logic. The developer must write such a processor for each event that must be detected in order to properly manage the query state. These processors will often just update the table for a single query, but not always. They correspond quite closely to the concept of mappings in The Graph.
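As a rough sketch of what an event processor and its dispatch could look like: all names below (`SubstrateEvent`, `QueryDb`, `registerProcessor`, the `members.MemberRegistered` event and its parameters) are hypothetical and are not the framework's API. The sketch keys processors by module and event name, and treats the extrinsic as optional to cover events from on_finalize or genesis.

```typescript
// All types and names here are hypothetical, chosen only for illustration.
interface SubstrateEvent {
  module: string;                 // e.g. "members"
  name: string;                   // e.g. "MemberRegistered"
  params: Record<string, string>; // decoded event parameters
  extrinsic?: {                   // absent for on_finalize or genesis events
    signer: string;
    args: Record<string, string>;
  };
}

interface MemberRow {
  id: string;
  handle: string;
  registeredBy: string | null;    // null when there is no originating extrinsic
}

// Stand-in for the query database: one table per query.
interface QueryDb {
  members: Map<string, MemberRow>;
}

type EventProcessor = (db: QueryDb, event: SubstrateEvent) => void;

const processors = new Map<string, EventProcessor>();

// Registers a processor for one event name in one module.
function registerProcessor(module: string, name: string, processor: EventProcessor): void {
  processors.set(`${module}.${name}`, processor);
}

registerProcessor("members", "MemberRegistered", (db, event) => {
  const id = event.params["memberId"];
  db.members.set(id, {
    id,
    handle: event.params["handle"],
    registeredBy: event.extrinsic ? event.extrinsic.signer : null,
  });
});

// Routes an event to its processor; returns false if no processor is registered.
function dispatch(db: QueryDb, event: SubstrateEvent): boolean {
  const processor = processors.get(`${event.module}.${event.name}`);
  if (!processor) {
    return false;
  }
  processor(db, event);
  return true;
}
```

Events with no registered processor are skipped here; whether the real framework should ignore or report them is a design decision this sketch does not settle.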

Once a developer has these ready, there should be a simple CLI tool for generating the database schemas, the ORM library for talking to the database, and the GraphQL server schema. The CLI tool should probably also help with setting up a workspace for developing, packaging and deploying your own infrastructure.
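To make the generation step and the documentation requirement more concrete, here is a minimal sketch of propagating documentation from one description into both schemas. The `EntityDescription` shape, the function names and the type mapping are assumptions; the real input format is not specified in this issue, and in practice Warthog would do the generation. The SQL output assumes PostgreSQL-style `COMMENT ON` statements.

```typescript
// Hypothetical shape of an API description entry; not the framework's real format.
interface FieldDescription {
  name: string;
  type: "String" | "Int" | "Boolean";
  doc: string;
}

interface EntityDescription {
  name: string;
  doc: string;
  fields: FieldDescription[];
}

// Assumed mapping from GraphQL scalar names to SQL column types.
const SQL_TYPES: Record<FieldDescription["type"], string> = {
  String: "TEXT",
  Int: "INTEGER",
  Boolean: "BOOLEAN",
};

// Emits a GraphQL type definition with descriptions on the type and each field.
// Assumes doc strings do not themselves contain triple quotes.
function toGraphQL(entity: EntityDescription): string {
  const fields = entity.fields
    .map((f) => `  """${f.doc}"""\n  ${f.name}: ${f.type}!`)
    .join("\n");
  return `"""${entity.doc}"""\ntype ${entity.name} {\n${fields}\n}`;
}

// Quotes a string as an SQL literal, doubling embedded single quotes.
function quote(text: string): string {
  return `'${text.replace(/'/g, "''")}'`;
}

// Emits a table definition plus comments carrying the same documentation.
function toSql(entity: EntityDescription): string {
  const table = entity.name.toLowerCase();
  const columns = entity.fields
    .map((f) => `  ${f.name} ${SQL_TYPES[f.type]} NOT NULL`)
    .join(",\n");
  const comments = entity.fields.map(
    (f) => `COMMENT ON COLUMN ${table}.${f.name} IS ${quote(f.doc)};`
  );
  return [
    `CREATE TABLE ${table} (\n${columns}\n);`,
    `COMMENT ON TABLE ${table} IS ${quote(entity.doc)};`,
    ...comments,
  ].join("\n");
}
```

Because both outputs derive from the same description, the documentation only has to be written once.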

Framework Implementation Requirements

Must use TypeScript.
Must use the Warthog GraphQL API framework, which provides an autogenerated database schema, a GraphQL schema with OpenCrud support, and a client-side ORM.
Must have tests and CI.
Documentation written in the API description should propagate to become autogenerated documentation for the GraphQL API and database schemas.
Located in new root directory substrate-query-node in repo https://github.com/Joystream/substrate-runtime-joystream.

Joystream Node Implementation Requirements

Must use TypeScript.
Should use the Joystream/types library.
Must have tests and CI.
Located in new root directory joystream-query-node in repo https://github.com/Joystream/substrate-runtime-joystream.
Reliable and automated deployment, e.g. through dockerization of some sort.
Targets Constantinople runtime, with key queries for membership and proposal modules.

Questions

How should the types found in the runtime be encoded in the database and the GraphQL schema? For example, how should a u128 that is part of a runtime written in Rust be encoded? Keep in mind that we want an encoding which allows us to
(Hard Problem): How should we deal with runtime upgrades? Runtime upgrades often involve on-chain migrations of the stored state, and totally new types. How should the query infrastructure attempt to deal with this locally, and how should a query node sync up with a chain which may have had multiple upgrades since genesis? A full local migration may involve:
- updating database schema
- migrating database tables
- updating graphql schema & event processors
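On the first question, one candidate encoding for u128 (offered only as an option to evaluate, not a decision) is a fixed-width, zero-padded decimal string. JavaScript's `number` type cannot represent every u128 value exactly, so the sketch below uses `bigint` in code and a 39-digit string at the database and GraphQL boundary. The function names are hypothetical. The fixed width means plain string comparison agrees with numeric order.

```typescript
// 2^128 - 1, the largest u128 value; it has 39 decimal digits.
const U128_MAX = BigInt("340282366920938463463374607431768211455");
const U128_DIGITS = 39;

// Encodes a u128 as a zero-padded decimal string of fixed width.
function encodeU128(value: bigint): string {
  if (value < BigInt(0) || value > U128_MAX) {
    throw new RangeError("value is outside the u128 range");
  }
  return value.toString().padStart(U128_DIGITS, "0");
}

// Decodes the fixed-width string back into a bigint, validating its shape.
function decodeU128(text: string): bigint {
  if (!/^\d{39}$/.test(text)) {
    throw new RangeError("expected exactly 39 decimal digits");
  }
  const value = BigInt(text);
  if (value > U128_MAX) {
    throw new RangeError("value is outside the u128 range");
  }
  return value;
}

// Compares two encoded values; string order matches numeric order
// because both strings have the same width.
function compareEncoded(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}
```

The trade-off is that arithmetic (sums, averages) cannot be done on such a column in the database without casting, which a native numeric column type would allow.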


bedeho transferred this issue from another repository May 1, 2020

bedeho transferred this issue from Joystream/joystream Nov 27, 2021

dmtrjsg added the low-prio label Aug 17, 2022
