
Create regular public database snapshots #3

Closed
jamiew opened this issue May 20, 2021 · 14 comments

@jamiew
Contributor

jamiew commented May 20, 2021

Discussed with spillere --

  • Capture & share regular postgres database backups (huge, hundreds of GB)
  • Capture & share regular blockchain ledger snapshots too? (50 MB each)

For fast ETL spin-up we might want the actual full badgerdb too, rather than a simple ledger snapshot. I forget.

@dansku
Collaborator

dansku commented May 20, 2021

It would be great to share results here on how long the snapshot takes to build and how big the files are. We also need to find a place to host them and build the scripts to automate everything.

Just to leave it here: my current blockchain-etl database is 35 GB.

@jamiew jamiew added the devops label Aug 8, 2021
@jamiew
Contributor Author

jamiew commented Sep 17, 2021

dansku did this:

http://etl-snapshots.dewi.org/

@mfalkvidd

Has Dewi considered using the compressed output format of pg_dump for the ETL dumps? That would avoid the extra step to zip the dump (on Dewi's side) and to unzip (on the restore side). Would save time and disk space when restoring.

@dansku
Collaborator

dansku commented Oct 26, 2021

Has Dewi considered using the compressed output format of pg_dump for the ETL dumps? That would avoid the extra step to zip the dump (on Dewi's side) and to unzip (on the restore side). Would save time and disk space when restoring.

Could you provide a dump/restore command to test?

@mfalkvidd

Sorry, I can't. I've spent two weeks trying to follow various guides and advice on Discord but my ETL instance is still far from functional.

The pg_dump man page mentions the -Z flag for compression though, so adding compression should be as easy as passing that flag.
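Roughly something like this should do it, though I haven't been able to test it myself (the database name and file name below are just placeholders):

# Custom-format dump; -Fc output is compressed, and -Z sets the compression level
pg_dump -Fc -Z 6 -d blockchain_etl -f blockchain_etl.dump

# Restore into an existing (empty) database, using 4 parallel jobs
pg_restore -d blockchain_etl -j 4 blockchain_etl.dump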

@KnightAR

KnightAR commented Mar 8, 2022

dansku did this:

http://etl-snapshots.dewi.org/

The site is giving a 500 Internal Server Error. :(

@davetapley

Found this while needing to do davetapley/helium-tax#83 ahead of the new tax year in the USA.
I see https://etl-snapshots.dewi.org/ is still going, but with a 1 TB file 😱

Just an idea: I've been using https://duckdb.org/ and its ability to use parquet files for another project (migrating from Postgres for OLAP work) and the file sizes are staggeringly smaller.

It gets even more mind-bending when you discover duckdb can reference non-local parquet files (e.g. on GCS) and due to the format can run queries against it without pulling the whole file 🤯
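As a sketch of what that looks like (the bucket path here is made up, assuming a parquet copy of oracle_prices were hosted somewhere public):

# Query a hosted parquet file in place; duckdb only fetches the ranges it needs
./duckdb <<'SQL'
INSTALL httpfs;
LOAD httpfs;
-- placeholder URL, not a real DeWi bucket
SELECT block, price
FROM read_parquet('https://storage.googleapis.com/etl-snapshots/oracle_prices.parquet')
LIMIT 10;
SQL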

I see the ETL page @dansku built uses https://storage.googleapis.com (assuming that's GCS?), so it might not be too difficult to generate parquet files there instead.

Happy to do some work on it, if someone can point me to where http://etl-snapshots.dewi.org/ comes from?

@jamiew
Contributor Author

jamiew commented Dec 24, 2022 via email

@dansku
Collaborator

dansku commented Dec 24, 2022

Indeed we can do that @davetapley

@davetapley

Awesome! It's a bit hard to test without access to the DeWi ETL server,
but I can outline a script here and 🤞🏻 it'll be pretty easy to get running.

I just need oracle price history for purposes of davetapley/helium-tax#83,
so perhaps start with the oracle_prices table and add others as/when required?


Script:

  1. Download https://github.com/duckdb/duckdb/releases/download/v0.6.1/duckdb_cli-linux-amd64.zip
  2. Unzip and get duckdb binary
  3. Save SQL below into parquet_dump.sql (and set DeWi ETL server creds)
  4. Invoke with ./duckdb < parquet_dump.sql
  5. Copy exported parquet_dump folder to http://etl-snapshots.dewi.org/

INSTALL postgres;
LOAD postgres;
-- DeWi ETL server creds here:
CALL postgres_attach('host=HOST dbname=DBNAME user=USER password=PASS');
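-- (after attaching, the Postgres tables, e.g. oracle_prices, should be queryable from DuckDB)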

CREATE TABLE prices (block BIGINT, price BIGINT);
INSERT INTO prices (block, price) SELECT block, price FROM oracle_prices;
EXPORT DATABASE 'parquet_dump' (FORMAT PARQUET);

There's probably a way to go straight from Postgres to Parquet, but I figure it'd require a Postgres extension, and I know this approach works and only requires the duckdb binary, which is nice.

@dansku
Collaborator

dansku commented Dec 25, 2022

oracle_prices.csv

This data is quite small.

@davetapley

@dansku nice. Is there any way we can get that on http://etl-snapshots.dewi.org/ ?

Also: want to try a Parquet dump (as shown above) for rewards?

@davetapley

@dansku could you dump that .csv one more time so I have all of 2022? 🙏🏻

Quite a lot of people are asking for it over on:

which is blocked on:

@dansku
Collaborator

dansku commented Jan 9, 2023

Hey @davetapley, adding the oracle price dump again:
oracle_prices.csv

You can also do tax reporting with hotspotty.net.

@dansku dansku closed this as completed Jan 9, 2023
davetapley added a commit to davetapley/helium-tax that referenced this issue Jan 30, 2023