Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite AppData sync using csv upload feat #103

Merged
merged 9 commits into from
Jul 11, 2024
Merged

Conversation

fleupold
Copy link
Contributor

This PR dramatically simplifies the way we sync app_data into Dune. Instead of looking for hashes we see on Dune for which we don't have the app_data pre-image and then trying to fetch this data from IPFS, we simply mirror the entire app_data pre-image table we have locally to Dune using their csv upload feature.

We ensured that we are no longer seeing orders for which app_data can only be retrieved from IPFS (to the contrary, we see more and more orders for which the pre image has only been written into the DB).

In order to achieve this, we

  • Dramatically reduced the logic inside sync/app_data.py and its configuration to basically be a full tablescan from the backend which gets written using the CSV upload feature of the dune python client
  • Removed queries that were related to finding missing app data hashes
  • Rewrote the e2e test for app data hashes

App data for each network (mainnet, gnosis, arbitrum) will be written in its own table (since each network requires a separate db connection to the designated target db).
The table name can be specified as an additional parameter (which feels a bit 🤡, I'm not sure how sync job specific arguments were envisioned in the current architecture)

Test Plan

Both python3 -m src.main --sync-table app_data as well as python -m pytest tests/e2e/test_sync_app_data.py pass

@fleupold fleupold requested a review from harisang July 10, 2024 12:22
Copy link
Contributor

@bh2smith bh2smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So glad you are finally using the upload CSV endpoint! I mentioned this to @fhenneke several months ago. Be aware that this endpoint consumes API credits, so performing this regularly might use a lot (while the old solution would have remained free indefinitely).

Note that you can now also remove all the AWS dependencies and utilities from this code base (to make sure you caught everything).

Copy link
Contributor

@fhenneke fhenneke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great change.

Potentially the whole DuneFetcher code can be removed as well.

We should eventually move other tables to the new API as well.

max_retries=2,
give_up_threshold=3,
self.dune = DuneClient(os.environ["DUNE_API_KEY"])
self.fetcher = DuneFetcher(self.dune)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DuneFetcher does not seem to be used anywhere. Would it be safe to remove that class?

src/main.py Outdated Show resolved Hide resolved
Base automatically changed from update_dune_client to main July 11, 2024 12:45
@fleupold
Copy link
Contributor Author

Be aware that this endpoint consumes API credits, so performing this regularly might use a lot

I wasn't able to find any details on how much credits are used (so I ended up thinking it might be free). @bh2smith do you know where this is documented? Not here afaict.

@bh2smith
Copy link
Contributor

bh2smith commented Jul 11, 2024

Ahh it's just part of the storage plan on the data set. But looks like you've got 15Gb on the plus plan so nothing to worry about.

@fleupold fleupold merged commit b778238 into main Jul 11, 2024
4 checks passed
@fleupold fleupold deleted the rewrite_app_data_sync branch July 11, 2024 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants