Requirements: Orion state #30
Addendum

The virtue of the proposal above was that it aspired to leverage the existing investment made into developing and testing the QN we already had; however, it did not specify exactly how the interaction between the two states would work. There are a few alternatives that come to mind, but they all have serious drawbacks.

Both of these have serious drawbacks, and in general there are disadvantages to using our existing QN with yet more infrastructure.

One could address most of these last 4 issues by just using Subsquid instead; however, that still leaves the question from above of how. A variation not yet considered would be to just natively use the Subsquid processor database as the same state that …

Now, how to adjudicate all of this? Some prototyping in close contact with the Atlas codebase and team is probably useful.
I'd opt for this solution
It doesn't look like there's any "heavy lock", just standard SQL transactions, and the database transaction isolation level (used when processing a block / batch of blocks) is easily configurable in Subsquid, I think.
There is a way to extend the Subsquid-generated GraphQL API with custom resolvers. There also appears to be a way to add a custom request check function (not part of the docs, though), possibly allowing us to introduce authorization for mutation endpoints.
I'm not sure if I fully understand this approach, but it seems very complex and probably unnecessary(?) given the findings above.
I am very concerned about getting stuck with yet another auto-generated API where the number of exceptions will just start to pile up when combining filtering, relationships, unions and all the rest of it. We have never even tried doing very basic stuff like aggregation and grouping, despite these being table-stakes queries for lots of normal API calls. Subsquid by default will have even less of this than Hydra, because no Subsquid users come anywhere close to our needs. Even if there is a way to add new queries with custom endpoints, the resulting API may start becoming quite messy as you stop having genuine canonical entity queries, because the default ones are too weak. Also, we would need to remove all sorts of queries which make no sense because they either expose private data or they are just not useful to the application developer directly. Correct me if I am wrong, but is there not also some serious performance issue with all the default generated resolvers for Subsquid, just as I believe was pointed out for Hydra?
I'm not sure what you mean by this locking not being heavy, but perhaps you could describe in more detail exactly when locks are applied? I asked in the Subsquid/Hydra Telegram channel the other day, and they said they had switched to some new batch-based locking, but I did not get to the bottom of what that really meant. What I think would be good to understand a bit better is: in a scenario where there are lots of attempts to both read and write to the state from the operator, consumption actions and error events, how will whatever locking Subsquid natively does interact with that?
How would one configure such a low level choice as a Subsquid user, and why would there be any room for difference in approach from one use to another? Are we talking about forking Subsquid here?
I don't know what a request check function is, is that some GraphQL level concept? I could not find anything on this. But in any case, the ability to add mutations will be a necessity, so it would have to be established early that we can do this for sure, ideally without having to start forking Subsquid.
The idea is not very complex, just badly explained, but it is more work: use Subsquid exclusively as a producer of finalized events/calls relevant to Orion, and separate out the public API, data model and processing into a separate system which treats all writers the same, regardless of what locking Subsquid does. If the locking thing is either totally irrelevant or just a minor issue at small scales, then I think the API issue(s) would be my main concern really, because it is nice to keep it simple.
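To make the split concrete, here is a minimal sketch of what such a separation could look like, assuming a hypothetical raw-event table and consumer loop (none of the names below — `RawEventRow`, `persistRawEvents`, `orionConsumerLoop` — exist in any codebase; this is purely illustrative):

```ts
// Hypothetical sketch: Subsquid only appends finalized, decoded events to a raw table,
// while a separate Orion service consumes them and applies its own data model.
interface RawEventRow {
  id: string        // e.g. `${blockNumber}-${indexInBlock}`
  blockNumber: number
  name: string      // e.g. 'Content.VideoCreated'
  payload: unknown  // decoded event arguments, serialized as JSON
}

// Producer side: inside the Subsquid batch handler, just persist rows, no Orion logic.
async function persistRawEvents(
  rows: RawEventRow[],
  insert: (row: RawEventRow) => Promise<void>
): Promise<void> {
  for (const row of rows) {
    await insert(row)
  }
}

// Consumer side: a separate Orion worker polls the raw table and processes events
// with its own locking and data model, treating chain events like any other writer.
async function orionConsumerLoop(
  fetchAfter: (lastId: string) => Promise<RawEventRow[]>,
  processOrionEvent: (event: RawEventRow) => Promise<void>
): Promise<void> {
  let lastProcessedId = ''
  for (;;) {
    const batch = await fetchAfter(lastProcessedId)
    for (const event of batch) {
      await processOrionEvent(event)
      lastProcessedId = event.id
    }
    await new Promise((resolve) => setTimeout(resolve, 1_000)) // simple poll interval
  }
}
```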
The Subsquid API is easy to extend, as we can define our own custom models, GraphQL resolvers etc., so the auto-generated API becomes just a base and saves us some development time compared to building a GraphQL API from scratch. From my (not very long at this point) experience with Subsquid, I find it way easier to work with than Hydra, which tried to sort of "do everything for you".
I'm not sure what exact performance issue you have in mind, but it looks like there have been a lot of optimizations in terms of queries, processing speed etc. We can also define custom SQL indexes, which are very helpful for speeding up certain queries.
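For illustration, a rough sketch of the kind of index this refers to, written directly as a TypeORM entity (the entity and its fields are hypothetical, and in a real Subsquid project the models are normally generated from the schema, so this only shows the underlying mechanism):

```ts
import { Column, Entity, Index, PrimaryColumn } from 'typeorm'

// Hypothetical entity, for illustration only.
@Entity()
export class VideoView {
  @PrimaryColumn()
  id!: string

  // Index on videoId so per-video lookups and aggregations avoid a full table scan.
  @Index()
  @Column()
  videoId!: string

  @Column()
  timestamp!: Date
}
```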
So I'm not exactly sure what the alternative here is; as I see it, we have a few options:
There are no explicit locks, so the only locks that apply are those that are acquired by default by the database. There is, however, an SQL transaction which wraps all db operations executed when processing a block or batch of blocks (processing batches is now the recommended approach). This means that if Subsquid is processing a batch of blocks where 1000 new videos were created, other queries that we execute at this time by default won't see any of those new videos until the entire batch is committed, but we can still modify the existing records in the database etc. (and the effects will be instant). I've sent a link to the SQL documentation which describes this. One caveat, however, is that if the processor transaction already updated a row (say, a video of …
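As a side note, the visibility behaviour described here is just standard READ COMMITTED semantics, which can be illustrated with two plain Postgres connections (the connection string and the `video` table below are assumptions, not part of the actual schema):

```ts
import { Client } from 'pg'

// Sketch of READ COMMITTED visibility: rows inserted inside the processor's
// transaction only become visible to other connections once the batch commits.
async function demo(): Promise<void> {
  const processor = new Client({ connectionString: 'postgres://localhost/squid' })
  const apiReader = new Client({ connectionString: 'postgres://localhost/squid' })
  await processor.connect()
  await apiReader.connect()

  await processor.query('BEGIN ISOLATION LEVEL READ COMMITTED')
  await processor.query(`INSERT INTO video (id, views) VALUES ('v1', 0)`)

  // Not committed yet: the reader connection does not see the new video.
  const before = await apiReader.query(`SELECT count(*) FROM video WHERE id = 'v1'`)
  console.log(before.rows[0].count) // "0"

  await processor.query('COMMIT')

  // After commit the new row is visible.
  const after = await apiReader.query(`SELECT count(*) FROM video WHERE id = 'v1'`)
  console.log(after.rows[0].count) // "1"

  await processor.end()
  await apiReader.end()
}

demo().catch(console.error)
```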
Basically the Subsquid processor is now configurable programmatically; we can import a processor class and configure it ourselves. All we need to do is add:

```ts
import { TypeormDatabase } from '@subsquid/typeorm-store'

// `processor` is the squid's processor instance, defined elsewhere in the project
processor.run(new TypeormDatabase({ isolationLevel: 'READ COMMITTED' }), async (ctx) => {
  // ...
})
```

A note about this can also be found in the Subsquid documentation: https://docs.subsquid.io/develop-a-squid/substrate-processor/store-interface/#typeormdatabase-recommended
It's a plugin which is executed on each incoming request:

```ts
import { RequestCheckFunction } from '@subsquid/graphql-server/lib/check'

export const requestCheck: RequestCheckFunction = async (req) => {
  if (req.operation.operation === 'mutation' && !req.http.headers.get('x-admin')) {
    return 'Access denied'
  }
  return true
}
```

It requires an … Of course, it's much more customizable than this.
I tested it locally and it's possible by adding a custom GraphQL resolver, for example:

```ts
import { Args, ArgsType, Field, ID, Mutation, Query, Resolver } from 'type-graphql'
import { Video } from '../model'
import { EntityManager } from 'typeorm'

@ArgsType()
export class AddVideoViewArgs {
  @Field(() => ID)
  videoId: string
}

@Resolver()
export class VideoViewsResolver {
  // Set by dependency injection
  constructor(private tx: () => Promise<EntityManager>) {}

  @Mutation(() => Number, { description: "Add a single view to the target video's count" })
  async addVideoView(
    @Args() { videoId }: AddVideoViewArgs
  ): Promise<number> {
    const videoRepository = (await this.tx()).getRepository(Video)
    const video = await videoRepository.findOneBy({ id: videoId })
    if (!video) {
      throw new Error('Video not found')
    }
    video.views += 1
    await videoRepository.save(video)
    return video.views
  }

  @Query(() => Number, { description: 'Get number of views per video' })
  async getVideoViews(
    @Args() { videoId }: AddVideoViewArgs
  ): Promise<number> {
    const videoRepository = (await this.tx()).getRepository(Video)
    const video = await videoRepository.findOneBy({ id: videoId })
    if (!video) {
      throw new Error('Video not found')
    }
    return video.views
  }
}
```

The resolver also needs to have at least one `@Query` defined.
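For completeness, a rough sketch of how a client could then call this mutation while passing the `x-admin` header expected by the request-check example above (the server URL is a placeholder, not a confirmed endpoint):

```ts
// Hypothetical usage sketch; the GraphQL server URL is just a placeholder.
async function addVideoViewFromClient(videoId: string): Promise<number> {
  const response = await fetch('http://localhost:4350/graphql', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-admin': '1', // satisfies the requestCheck example above
    },
    body: JSON.stringify({
      query: 'mutation AddView($videoId: ID!) { addVideoView(videoId: $videoId) }',
      variables: { videoId },
    }),
  })
  const { data } = await response.json()
  return data.addVideoView
}
```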
Fantastic work, let's proceed as you are suggesting then.
What determines the boundaries of what makes up an individual batch? e.g. will two distinct subsquids be processing the exact same sequence of batches, regardless of how it runs or is halted?
Disclaimer: these are really rough high-level requirements that probably need to be refined and discussed; I'm just putting them down in order to have all thoughts in one place.
Background
Orion currently does very naive view & follow counting for Atlas, but we know that this should be global public state that will live in a forthcoming shared data layer Joystream/joystream#2753. There are a variety of problems where augmenting properties of Orion is a natural solution:
Requirements
Introducing some initial monolithic data representation in Orion puts us on a path to allow these issues to be addressed over time. It addresses these problems by having a distinct application-centric & operator-specific API. It ingests data from the …
This sloppy schema tries to summarize how the parts would fit together