Rewrite AppData sync using csv upload feat #103
Conversation
So glad you are finally using the upload CSV endpoint! I mentioned this to @fhenneke several months ago. Be aware that this endpoint consumes API credits, so performing this regularly might use a lot (while the old solution would have remained free indefinitely).
Note that you can now also remove all the AWS dependencies and utilities from this code base (to make sure you caught everything).
This is a great change.
Potentially the whole `DuneFetcher` code can be removed as well.
We should eventually move other tables to the new API as well.
tests/e2e/test_sync_app_data.py
max_retries=2,
give_up_threshold=3,
self.dune = DuneClient(os.environ["DUNE_API_KEY"])
self.fetcher = DuneFetcher(self.dune)
`DuneFetcher` does not seem to be used anywhere. Would it be safe to remove that class?
Ahh, it's just part of the storage plan on the data set. But it looks like you've got 15 GB on the plus plan, so nothing to worry about.
This PR dramatically simplifies the way we sync app_data into Dune. Instead of looking for hashes we see on Dune for which we don't have the app_data pre-image and then trying to fetch this data from IPFS, we simply mirror the entire app_data pre-image table we have locally to Dune using their csv upload feature.
We ensured that we are no longer seeing orders for which app_data can only be retrieved from IPFS (on the contrary, we see more and more orders for which the pre-image has only been written into the DB).
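The mirroring step can be sketched roughly as follows. Note this is a sketch, not the PR's actual implementation: `rows_to_csv` and `mirror_table` are hypothetical helper names, and the exact `upload_csv` signature depends on the installed dune-client version.

```python
import csv
import io


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize app_data rows (hash -> pre-image) into a CSV string."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def mirror_table(dune_client, rows: list[dict], table_name: str) -> None:
    """Replace the Dune table's contents with the full local table.

    Assumes a dune-client with an upload_csv method; check the version
    you have installed for the exact parameters it accepts.
    """
    dune_client.upload_csv(
        data=rows_to_csv(rows),
        table_name=table_name,
        description="Full mirror of the app_data pre-image table",
    )
```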
In order to achieve this, we rewrote `sync/app_data.py` and its configuration to basically be a full table scan from the backend, which gets written using the CSV upload feature of the Dune Python client. App data for each network (mainnet, gnosis, arbitrum) will be written to its own table (since each network requires a separate DB connection to the designated target DB).
The table name can be specified as an additional parameter (which feels a bit 🤡; I'm not sure how sync-job-specific arguments were envisioned in the current architecture).
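Since each network gets its own target table, the sync job needs one table name per network. A minimal sketch of one way to derive it, assuming a simple `app_data_<network>` naming scheme (the actual scheme and parameter wiring in this repo may differ):

```python
NETWORKS = ("mainnet", "gnosis", "arbitrum")


def app_data_table_name(network: str) -> str:
    """Derive the per-network Dune table name (naming scheme is illustrative)."""
    if network not in NETWORKS:
        raise ValueError(f"unknown network: {network}")
    return f"app_data_{network}"
```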
Test Plan
Both `python3 -m src.main --sync-table app_data` and `python -m pytest tests/e2e/test_sync_app_data.py` pass.