Distinguish between spent and unspent matched inputs #20
Some thoughts / refinement:
(b) definitely sounds like the better solution, especially because it adds only a small memory overhead, remains quite efficient to construct, and would perform extremely well for small entries (i.e. restrictive patterns), which is precisely where this sort of optimization comes in handy.
Maybe a flag option for users that don't want to have the checks done? Absolutely love this process and am super excited about this feature.
@bakon11 are you suggesting a flag different from the one I mentioned in the issue description? If I get you correctly, you mean a flag to completely disable marking spent / unspent. If that's the case, then I would be quite opposed to it because of the inherent complexity / inconsistency it may introduce (for example, the DB schema would have to be modified for that feature, but the modification would be pointless / unused should the feature be disabled via a flag).
Ah ha I see, I got fixated on this portion here:
In my head I figured that if a user wanted a faster DB sync, they could have the option for Kupo not to check for spent UTxOs and flag them, and I completely misinterpreted the rest of the post.
This makes a bit more sense now, thanks again for explaining. Personally I think being able to keep track of spent UTxOs is a must-have feature, even if it means some performance sacrifice because of the extra operations performed.
At this stage I don't know what to expect performance-wise. It could well be that this addition does not much impact the overall behavior (hence the benchmark). But if it does, at least I have a plan.
Here are some preliminary results, without any particular optimization beyond good practices:
Still a few TODOs (in particular, making this available through the API), but it already looks promising without much further work needed. It's about 40% slower in the marking case but remains within an acceptable range. This version runs with and without a new flag. One weird thing, however, is the discrepancy in the number of "unspent" addresses between the two scenarios. The scenarios (and queries) are executed on the exact same time range, which is relatively far in the past (in the stable area), so I would expect them to be strictly equal 🤔 ... will investigate and add a few more tests.
@KtorZ Amazing work! Do you have performance results (i.e. how much time the query takes) for a large UTxO query on an address?
This is really amazing, it's literally the type of tool I was looking forward to. Absolutely fantastic work, man. IMO a 55 min sync time for a non-pruned, all-addresses run is pretty fast and MORE than acceptable.
@MartinSchere "large" UTxO query on the testnet is a bit tricky. The most active address on the testnet is this one:
Which is the one I used for the tests above. It now has around ~600 entries, 72 of which are unspent at the time of writing. With this, I get the following results (which include the whole e2e pipeline, including JSON serialization). This is timing:
I really don't understand what people have against SQLite. It's really an amazing tool 😄 |
@KtorZ Wow. It's fair to say that's an improvement compared to the 45 minutes it takes in db-sync. Great work!
45 min to query information about an address? For real 😳? |
To query UTxOs, you need to query TxOuts, then join them with their respective address, then check every transaction's inputs for a matching record to see whether each TxOut has been spent or not, resulting in quadratic complexity. On the positive side, it gives you time to reflect on life and grab a coffee.
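The costly existence check described above can be illustrated with a minimal SQLite sketch. The schema here is hypothetical and deliberately simplified (it is not db-sync's actual schema); the point is that, without a spent/unspent marker, every UTxO query must probe the inputs table for each candidate output:

```python
import sqlite3

# Hypothetical, simplified schema: an output is "unspent" only if
# no input anywhere in the chain references it.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE tx_out (tx_id TEXT, out_index INTEGER, address TEXT, value INTEGER);
    CREATE TABLE tx_in  (spends_tx_id TEXT, spends_index INTEGER);
""")
db.executemany("INSERT INTO tx_out VALUES (?, ?, ?, ?)", [
    ("tx1", 0, "addr1", 10),
    ("tx2", 0, "addr1", 20),
])
# tx1#0 has been consumed by some later transaction.
db.execute("INSERT INTO tx_in VALUES ('tx1', 0)")

# Every query pays for an existence check against the inputs:
unspent = db.execute("""
    SELECT o.tx_id, o.out_index, o.value FROM tx_out o
    WHERE o.address = 'addr1'
      AND NOT EXISTS (SELECT 1 FROM tx_in i
                      WHERE i.spends_tx_id = o.tx_id
                        AND i.spends_index = o.out_index)
""").fetchall()
print(unspent)  # -> [('tx2', 0, 20)]
```

Without an index on `(spends_tx_id, spends_index)`, the inner probe degenerates into a scan per output, which is where the quadratic behavior comes from.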
But.. a chain index should be optimized for querying UTxO because this is where the main added-value is :| Anyway. Okay. Whatever. I am trying to quit coffee though. |
Yeah, I absolutely agree with you, and many of us are waiting for an indexer that does this (I'll use it for my explorer). Let me know if I can contribute in some way.
What would you miss from Kupo beyond this ticket and perhaps #21 ? |
I think this is getting out of the scope of the issue. Mind continuing the conv. in Discord? |
I'd be interested in that conversation |
@waalge I'll create issues out of the conversation 👍 |
@MartinSchere, back to your question:
Thanks to @fivebinaries trying things out on mainnet, I now have an answer. On the current master ( 29835e5 ), querying one of the largest addresses on mainnet (200k+ UTxO):
That's about 132MB of JSON streamed from the database. So I guess we can say that ~6s is an upper-bound 😅 |
@KtorZ Thanks for the metric. Guess what, that's jpg.store's contract |
Describe your idea, in simple words.
Kupo currently synchronizes and stores the entire chain history (matching the provided patterns). However, there are use-cases where users are only interested in the edge of the UTxO graph. See the discussion here: #19
A 2-step idea could be to:

1. Mark matches as `spent` or `unspent` as blocks are being scrutinized, and allow searching only matches that are spent or unspent.
2. Prune `spent` entries from the database.

Why is it a good idea?
Are you willing to work on it yourself?
Yes.
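The 2-step idea from the issue body can be sketched against a simplified SQLite schema. Table and column names here are hypothetical (not Kupo's actual schema); the sketch only shows the shape of the technique, namely a nullable `spent_at` column maintained as blocks are processed, plus an optional pruning pass:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical 'matches' table: one row per matched output, with a
# nullable 'spent_at' column recording the slot at which it was spent.
db.execute("""
    CREATE TABLE matches (
        output_ref TEXT PRIMARY KEY,
        address    TEXT,
        spent_at   INTEGER  -- NULL while unspent
    )
""")

def on_block(slot, produced, consumed):
    """Step 1: as each block is scrutinized, insert new matches and
    mark previously matched outputs that this block consumes."""
    db.executemany(
        "INSERT OR IGNORE INTO matches VALUES (?, ?, NULL)",
        [(ref, addr) for ref, addr in produced])
    db.executemany(
        "UPDATE matches SET spent_at = ? WHERE output_ref = ?",
        [(slot, ref) for ref in consumed])

def prune():
    """Step 2 (optional): garbage-collect spent entries, keeping
    only the edge of the UTxO graph."""
    db.execute("DELETE FROM matches WHERE spent_at IS NOT NULL")

on_block(100, [("tx1#0", "addr1"), ("tx2#0", "addr1")], [])
on_block(101, [], ["tx1#0"])

# Searching only unspent matches is now a plain column filter,
# with no join against the inputs required.
rows = db.execute(
    "SELECT output_ref FROM matches WHERE spent_at IS NULL").fetchall()
print(rows)  # -> [('tx2#0',)]
```

The key trade-off the thread benchmarks is the extra `UPDATE` per consumed input during synchronization, in exchange for turning the UTxO query from a join with an existence check into a simple indexed filter.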