Pull data directly from chain #1

Open
davidgasquez opened this issue Oct 25, 2023 · 9 comments

@davidgasquez
Owner

Currently, we rely on the Allo Indexer API data. We should add an option to pull data straight from the chains using something like cryo or Subsquid. That way, we don't need to trust the Allo API data, if that's what we want.
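
For context, "pulling straight from chain" ultimately means asking an RPC node for logs or transactions. Below is a minimal sketch of the request body for the standard `eth_getLogs` JSON-RPC method, using only the standard library. The helper name `build_get_logs_request` and the 10,000-block window are illustrative assumptions; the contract address and start block are the ones that appear later in this thread. Tools like cryo handle chunking and retries on top of calls like this.

```python
import json


def build_get_logs_request(address: str, from_block: int, to_block: int, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 payload for the standard eth_getLogs method (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getLogs",
        "params": [
            {
                "address": address,
                # Block numbers are sent as hex-encoded quantities.
                "fromBlock": hex(from_block),
                "toBlock": hex(to_block),
            }
        ],
    }


payload = build_get_logs_request(
    "0x03506eD3f57892C85DB20C36846e9c808aFe9ef4",
    from_block=16_071_515,
    to_block=16_071_515 + 9_999,  # illustrative window size, not a recommendation
)
print(json.dumps(payload, indent=2))
```

The payload would be POSTed to an RPC endpoint; many providers cap the block window or result size per request, so the window usually needs tuning.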

@davidgasquez added the enhancement (New feature or request) label Oct 25, 2023
@davidgasquez self-assigned this Oct 25, 2023
@davidgasquez changed the title from "Pull data from chains" to "Pull data directly from chain" Oct 25, 2023
@davidgasquez
Owner Author

Can Gitcoin Data Portal rely on Indexed data?

@davidgasquez
Owner Author

> Can Gitcoin Data Portal rely on Indexed data?

Probably not, because Indexed is missing many of the chains on which GC rounds are running.

We need something like cryo.

@davidgasquez
Owner Author

This works!

import cryo

cryo.collect(
    "transactions",
    blocks=["18.9M"], 
    rpc="https://eth.merkle.io",
    reorg_buffer=1000,
    max_concurrent_chunks=15, 
    inner_request_size=10000,
    output_dir="data",
    contract=["0x03506eD3f57892C85DB20C36846e9c808aFe9ef4"],
    hex=True
)

Don't forget to pip install cryo-python polars though!

@davidgasquez
Owner Author

Made a small Colab notebook for people to play around with.

From a quick test, it'll take around 52 hours to fully index that contract, 0x03506eD3f57892C85DB20C36846e9c808aFe9ef4, on Ethereum mainnet.
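
The thread does not say how that figure was measured. One plausible way to get such an estimate is to time a small sample of blocks and extrapolate linearly, as in the sketch below. The function name and all the numbers in the example call are made up for illustration, and linear extrapolation assumes activity is spread evenly across blocks, which is rarely true.

```python
def estimate_hours(total_blocks: int, sample_blocks: int, sample_seconds: float) -> float:
    """Linearly extrapolate a full-range indexing time from a timed sample."""
    if sample_blocks <= 0 or sample_seconds <= 0:
        raise ValueError("sample_blocks and sample_seconds must be positive")
    blocks_per_second = sample_blocks / sample_seconds
    return total_blocks / blocks_per_second / 3600


# Hypothetical numbers: a 10,000-block sample that took 60 seconds,
# extrapolated to a 2.8M-block range.
hours = estimate_hours(total_blocks=2_800_000, sample_blocks=10_000, sample_seconds=60)
print(f"estimated {hours:.1f} hours")
```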

@DistributedDoge
Collaborator

  • got a low-effort 4x speedup while fetching events by raising max_concurrent_chunks to 100.
  • inside Colab, fetching all (undecoded) logs from the Project Registry took 14 seconds (from deployment 400 days ago until now).
  • while TXs need some thinking, if performance inside CI-runner is comparable, event-based assets seem feasible now

import cryo

cryo.freeze(
    "events",
    blocks=["16071515:"], 
    rpc="https://eth.merkle.io",
    reorg_buffer=1000,
    max_concurrent_chunks=100, 
    inner_request_size=10_000,
    output_dir="data_fast",
    contract=["0x03506eD3f57892C85DB20C36846e9c808aFe9ef4"],
    hex=True
)

@davidgasquez
Owner Author

Woah! I did try with higher max_concurrent_chunks but didn't get any speedup locally... interesting!

> while TXs need some thinking, if performance inside CI-runner is comparable, event-based assets seem feasible now

🚀

@DistributedDoge
Collaborator

Just leaving a note that tx data from Covalent is quite neat for analyzing the cost side, as it already has dollarized amounts for the actual gas cost.

  • I think total gas cost of mainnet transactions dealing with grants stack project profiles was $23k for about 2.3k operations.

Unfortunately, the fetch is a bit on the longer side. Figuring out the incremental part could help save a lot of time and API credits (that we still have aplenty).

  • Free API key request limit of 4/second => need to limit parallel runs for assets of that type
  • 3 minutes to pull 2.3k events in pages of 100 isn't that impressive
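
A quick back-of-the-envelope check on those numbers, plus a minimal sketch of a throttle that could be shared by assets hitting the same API key. The `Throttle` class is a hypothetical helper, not part of any Covalent client; it is single-threaded and only spaces calls evenly. The inputs (about 2.3k events, pages of 100, 4 requests/second, about 3 minutes) are taken from the bullets above and are approximate.

```python
import math
import time


class Throttle:
    """Space calls at least 1/max_per_second apart (simple, not thread-safe)."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


events, page_size, limit_per_second, observed_seconds = 2_300, 100, 4, 180
pages = math.ceil(events / page_size)        # number of requests needed
floor_seconds = pages / limit_per_second     # best case if only the rate limit mattered
seconds_per_page = observed_seconds / pages  # what was roughly observed per request
print(pages, floor_seconds, round(seconds_per_page, 2))
```

If these rough numbers hold, each page takes several seconds on its own, so the 4/second limit would only start to bite once several assets fetch in parallel; calling `throttle.wait()` before each request is one way to keep the combined rate under the cap.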

@davidgasquez
Owner Author

> I think total gas cost of mainnet transactions dealing with grants stack project profiles was $23k for about 2.3k operations.

Nice! Would be awesome to publish a report inside Quarto analyzing the new data and showing the process to derive these numbers.

> Unfortunately, the fetch is a bit on the longer side. Figuring out the incremental part could help save a lot of time and API credits (that we still have aplenty).
>   • Free API key request limit of 4/second => need to limit parallel runs for assets of that type
>   • 3 minutes to pull 2.3k events in pages of 100 isn't that impressive

Understandable. Really need to think harder about #28. Meanwhile, we can always do it slowly. GitHub Actions errors out after... 6 hours, I think. 🤷‍♂️
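
One way to "do it slow" without hitting the job timeout is to give each run a time budget and stop cleanly before it is spent, so the next run can pick up where this one left off. This is a minimal sketch under that assumption; `run_with_budget` and its parameters are hypothetical names, and persisting the returned count between runs is left out.

```python
import time


def run_with_budget(chunks, process, budget_seconds, clock=time.monotonic):
    """Process chunks in order until the time budget is spent.

    Returns the number of chunks completed so a later run can resume there.
    """
    start = clock()
    done = 0
    for chunk in chunks:
        # The check happens before each chunk, so a chunk already in
        # progress can overrun the budget; leave some margin.
        if clock() - start >= budget_seconds:
            break
        process(chunk)
        done += 1
    return done


# Example with a stand-in workload: three tiny chunks, generous budget.
completed = run_with_budget(["a", "b", "c"], process=lambda c: None, budget_seconds=5)
print(completed)
```

Setting the budget comfortably below the runner's limit (for example, 5.5 hours if the cap really is 6) leaves room for the last chunk and for saving state.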

@davidgasquez
Owner Author

I'm keeping an eye on mesc and its integration with Cryo. I think there might be a simple approach to get data from multiple chains easily. Probably slower than Covalent, except if we do partitions + incremental!
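
A rough sketch of what "partitions + incremental" could look like: split the block range into fixed-size partitions, fetch only the ones not yet indexed, and stay a reorg buffer behind the chain head. The function name, partition size, and block numbers below are illustrative assumptions; the default buffer of 1000 mirrors the `reorg_buffer` value used earlier in this thread. The ranges are half-open and would need adapting to the range convention of whatever tool does the fetching.

```python
def pending_partitions(next_block: int, chain_head: int, partition_size: int, reorg_buffer: int = 1000):
    """Return half-open (start, end) block ranges for full partitions not yet indexed."""
    # Exclusive upper bound: never touch blocks within the reorg buffer of the head.
    safe_end = chain_head - reorg_buffer + 1
    ranges = []
    start = next_block
    while start + partition_size <= safe_end:
        ranges.append((start, start + partition_size))
        start += partition_size
    return ranges


# Hypothetical state: indexed up to (not including) block 100, head at block 1350.
todo = pending_partitions(next_block=100, chain_head=1_350, partition_size=100, reorg_buffer=50)
print(len(todo), todo[0], todo[-1])
```

Keeping one `next_block` value per chain would let the same loop run across several chains, each fetching only its new partitions.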
