Orion V2: Simplify entity caching #54

Closed
Lezek123 opened this issue Jan 10, 2023 · 2 comments

Comments

@Lezek123
Contributor

Lezek123 commented Jan 10, 2023

The goal

Subsquid now recommends using a batch processor, which makes it possible to implement a memory layer where we can persist (cache) entities and query / update / remove them without hitting the database. The goal is to process as many events as possible through this memory layer before "flushing" the final state into the db. (The idea of "flushing" is borrowed from https://www.doctrine-project.org/, in case anyone is familiar with that framework, or https://mikro-orm.io/, which is a Doctrine-inspired TypeScript equivalent.) This would speed up the processing significantly, as:

  1. We can avoid making some redundant queries (e.g. when one entity is updated by many events within the same batch)
  2. We can batch all insert/update/delete operations into a single query.

To sum up we want a memory entity cache layer which:

  • avoids "flushing" changes to database as much as possible
  • avoids providing inconsistent data due to cache/db state differences (cached changes should always be prioritized)
  • ideally provides maximum type safety (this is especially important for relations, i.e. if a given relation is not part of the returned data, we don't want it in the typing either)
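
To make the requirements above a bit more concrete, here is a minimal sketch of such a layer. It is not the actual EntitiesCollector code: `Entity`, `Store` and `EntityCache` are hypothetical names, and the store is synchronous only to keep the example short (a real database client would be async).

```typescript
// Hypothetical names throughout; an illustration, not the EntitiesCollector API.
interface Entity {
  id: string
}

// Stand-in for the database. Synchronous for brevity (assumption).
interface Store<E extends Entity> {
  findById(id: string): E | undefined
  upsertMany(entities: E[]): void
  removeMany(ids: string[]): void
}

class EntityCache<E extends Entity> {
  private managed = new Map<string, E>() // entities loaded or saved in memory
  private dirty = new Set<string>() // ids that must be written on flush
  private removed = new Set<string>() // ids scheduled for removal

  constructor(private store: Store<E>) {}

  // Cached state is always prioritized over stored state.
  get(id: string): E | undefined {
    if (this.removed.has(id)) return undefined
    const cached = this.managed.get(id)
    if (cached) return cached
    const stored = this.store.findById(id)
    if (stored) this.managed.set(id, stored)
    return stored
  }

  // Repeated saves of the same entity collapse into one pending write.
  save(entity: E): void {
    this.removed.delete(entity.id)
    this.managed.set(entity.id, entity)
    this.dirty.add(entity.id)
  }

  remove(id: string): void {
    this.managed.delete(id)
    this.dirty.delete(id)
    this.removed.add(id)
  }

  // Write the final state in at most one upsert batch and one delete batch.
  flush(): void {
    const upserts: E[] = []
    this.dirty.forEach((id) => {
      const entity = this.managed.get(id)
      if (entity) upserts.push(entity)
    })
    if (upserts.length > 0) this.store.upsertMany(upserts)
    const removals = Array.from(this.removed)
    if (removals.length > 0) this.store.removeMany(removals)
    this.managed.clear()
    this.dirty.clear()
    this.removed.clear()
  }
}
```

With this shape, a batch handler would call `get` / `save` / `remove` for every event and `flush` once at the end of the batch. Type safety for relations is not addressed by this sketch.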

Current solution

The current solution is a very complex overlay on top of TypeORM: https://github.com/Lezek123/orion/blob/orion-v2/src/utils/EntitiesCollector.ts, which supports querying entities along with relations and combines stored + cached state if necessary.

Alternative solutions

MikroORM store

Subsquid devs are working on introducing support for the MikroORM framework, which would simplify some of the work, as the framework seems better suited for this use case than TypeORM: https://github.com/belopash/squid-mikroorm-store

It's not clear whether they will implement any additional utilities though.

Simplifying the current solution

There are ways to simplify the current over-complicated EntitiesCollector solution even without MikroORM (although the latter would be preferred):

  1. Disregard joins. Relations should always be queried explicitly. Only ids of related entities are provided when a given entity is queried, and only if the entity is the owning side of the relation.

    In this case the only complexity is one-to-many, i.e. when trying to get all child entities of a parent while avoiding having to flush all the children. This however just comes down to executing the following steps:

    1. Let's define ids of all child entities that are managed (including those scheduled for removal) as managed_children_ids
    2. Get all managed child entities (excluding those scheduled for removal) where child.parent_id = parent.id
    3. Get all stored child entities where child.parent_id = parent.id and child.id is NOT IN(managed_children_ids)
    4. Concat the results
  2. Consider disregarding the concept of EntityCollection and instead require providing an entity class as context whenever needed.
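
The four one-to-many steps from point 1 could look roughly like this. This is only a sketch under assumptions: `Child`, `ChildCache` and `FindStoredChildren` are hypothetical names, and the stored lookup is a synchronous stand-in for a query like `WHERE parent_id = :parentId AND id NOT IN (:...managedChildrenIds)`.

```typescript
// Hypothetical shapes; not the actual EntitiesCollector API.
interface Child {
  id: string
  parentId: string
}

// Stand-in for the db query: stored children of a parent, excluding given ids.
type FindStoredChildren = (parentId: string, excludeIds: string[]) => Child[]

class ChildCache {
  private managed = new Map<string, Child>()
  private scheduledForRemoval = new Set<string>()

  save(child: Child): void {
    this.scheduledForRemoval.delete(child.id)
    this.managed.set(child.id, child)
  }

  remove(id: string): void {
    this.managed.delete(id)
    this.scheduledForRemoval.add(id)
  }

  childrenOf(parentId: string, findStored: FindStoredChildren): Child[] {
    // Step 1: ids of ALL managed children, including those scheduled for
    // removal and those currently attached to a different parent.
    const managedChildrenIds = Array.from(this.managed.keys()).concat(
      Array.from(this.scheduledForRemoval)
    )
    // Step 2: managed children (not scheduled for removal) of this parent.
    const managedChildren = Array.from(this.managed.values()).filter(
      (child) => child.parentId === parentId
    )
    // Step 3: stored children of this parent that are not managed.
    const storedChildren = findStored(parentId, managedChildrenIds)
    // Step 4: concat the results.
    return managedChildren.concat(storedChildren)
  }
}
```

Note that step 1 has to cover every managed child, not just those of the given parent: a child that was moved to another parent in memory must be excluded from the stored results, otherwise the stale db row would show up under its old parent.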

@Lezek123
Contributor Author

Lezek123 commented Jan 12, 2023

https://github.com/belopash/squid-mikroorm-store has been abandoned:
image

Which leaves us with TypeORM for now (so just trying to make the current solution a little bit cleaner)

@Lezek123
Contributor Author

Done in 290d833
