Make pipeline incremental #28

davidgasquez · 2024-01-05T18:55:05Z

The main idea is to rely on the latest portal data and run smaller incremental updates on CI. We should also provide a `--full-refresh` flag, as dbt does, to rebuild the data from scratch.
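A minimal sketch of how such a flag could decide what to compute, assuming the data is split into partitions keyed by date strings. `build_parser` and `partitions_to_run` are hypothetical names, not existing code in this repo; only the flag name is borrowed from dbt.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for the pipeline; --full-refresh mirrors dbt's flag name."""
    parser = argparse.ArgumentParser(description="Run the data pipeline.")
    parser.add_argument(
        "--full-refresh",
        action="store_true",
        help="Ignore previous results and rebuild all data from scratch.",
    )
    return parser


def partitions_to_run(all_partitions, materialized, full_refresh=False):
    """Return the sorted partition keys to compute.

    Incremental (default): only partitions missing from the previous run.
    Full refresh: every partition, ignoring what was already materialized.
    """
    if full_refresh:
        return sorted(all_partitions)
    return sorted(set(all_partitions) - set(materialized))


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Hypothetical data: the latest portal snapshot already covers two days.
    expected = ["2024-01-03", "2024-01-04", "2024-01-05"]
    previous = ["2024-01-03", "2024-01-04"]
    print(partitions_to_run(expected, previous, full_refresh=args.full_refresh))
```

Run with no flags, the demo prints `['2024-01-05']`; with `--full-refresh` it prints all three keys. Defaulting to incremental keeps CI runs small, while the opt-in flag covers schema changes or corrupted state.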

This is a big one!


davidgasquez · 2024-01-05T18:59:23Z

The ideal approach I can think of would be to rely on Dagster partitions and sensors.

1. Read the data from IPFS (or the GitHub Actions cache!).
2. Run Dagster sensors to check which partitions are missing.
3. Run the code for the missing partitions and rematerialize the datasets.
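The steps above, reduced to plain Python to show the logic a sensor would need. This is not Dagster's API (Dagster has its own partition definitions, such as `DailyPartitionsDefinition`, plus sensors for this); it assumes daily partitions keyed by ISO dates, that `existing` lists the keys recovered in step 1, and that `materialize` is a hypothetical stand-in for whatever rebuilds one partition.

```python
from datetime import date, timedelta
from typing import Callable, Iterable


def daily_partition_keys(start: date, end: date) -> list[str]:
    """All daily partition keys from start to end (inclusive), as ISO dates."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]


def missing_partitions(start: date, end: date, existing: Iterable[str]) -> list[str]:
    """Step 2: keys expected for the range but absent from the restored data."""
    have = set(existing)
    return [key for key in daily_partition_keys(start, end) if key not in have]


def backfill(
    start: date,
    end: date,
    existing: Iterable[str],
    materialize: Callable[[str], None],
) -> list[str]:
    """Step 3: call the (hypothetical) materialize function per missing key."""
    todo = missing_partitions(start, end, existing)
    for key in todo:
        materialize(key)
    return todo


if __name__ == "__main__":
    restored = ["2024-01-01", "2024-01-02", "2024-01-04"]  # hypothetical step 1 output
    backfill(date(2024, 1, 1), date(2024, 1, 5), restored,
             materialize=lambda key: print("materializing", key))
```

With the demo data, only `2024-01-03` and `2024-01-05` get materialized.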

Perhaps there is a much simpler approach we can use while we figure out all the Dagster stuff.

davidgasquez · 2024-01-09T09:30:22Z

Thinking about relying on external assets: treat the previous run's outputs as external assets and compute the diff using sensors?
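One way to compute that diff is to keep a content hash per partition next to the previous run's outputs and compare it with freshly pulled data. This is a sketch under assumptions: the manifest format (partition key to SHA-256 hex digest), the row shape, and the helper names are all hypothetical, and a real sensor would load the manifest from wherever the previous run's assets are stored.

```python
import hashlib
import json
from typing import Any, Mapping


def fingerprint(rows: list[dict[str, Any]]) -> str:
    """Stable hash of a partition's rows, insensitive to row and key order."""
    encoded = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(encoded).encode()).hexdigest()


def diff_partitions(
    previous: Mapping[str, str],
    current: Mapping[str, list[dict[str, Any]]],
) -> dict[str, list[str]]:
    """Compare the previous run's manifest with the current data.

    Returns which partitions are new, which changed, and which disappeared;
    only "new" and "changed" would need rematerializing.
    """
    new, changed = [], []
    for key, rows in sorted(current.items()):
        if key not in previous:
            new.append(key)
        elif previous[key] != fingerprint(rows):
            changed.append(key)
    removed = sorted(set(previous) - set(current))
    return {"new": new, "changed": changed, "removed": removed}
```

Hashing rows instead of trusting timestamps means a partition whose upstream data was corrected after the fact is still picked up.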

davidgasquez · 2024-01-19T17:15:56Z

We could also attach to the previous database and use it as the current state, then run the sensors and materialize the remaining partitions.
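A sketch of seeding the current run from the previous database and then working out what is left. The thread does not say which database engine is involved; `sqlite3` is used here only because it ships with Python and supports `ATTACH DATABASE`. The `partition_key` column, the `deals` table in the demo, and the function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def seed_from_previous(conn: sqlite3.Connection, previous_db: str, table: str) -> set[str]:
    """Copy `table` from the previous run's database into `conn` and return
    the partition keys it already holds (assumes a `partition_key` column)."""
    conn.execute("ATTACH DATABASE ? AS prev", (previous_db,))
    try:
        # Table names cannot be bound as parameters, so `table` must be trusted.
        conn.execute(f"CREATE TABLE main.{table} AS SELECT * FROM prev.{table}")
        conn.commit()
    finally:
        conn.execute("DETACH DATABASE prev")
    rows = conn.execute(f"SELECT DISTINCT partition_key FROM main.{table}")
    return {key for (key,) in rows}


def remaining_partitions(expected: list[str], done: set[str]) -> list[str]:
    """Partitions the sensors would still need to materialize, in order."""
    return [key for key in expected if key not in done]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        prev_path = os.path.join(tmp, "previous.db")
        prev = sqlite3.connect(prev_path)
        prev.execute("CREATE TABLE deals (partition_key TEXT, value INTEGER)")
        prev.executemany("INSERT INTO deals VALUES (?, ?)",
                         [("2024-01-01", 1), ("2024-01-02", 2)])
        prev.commit()
        prev.close()

        conn = sqlite3.connect(":memory:")
        done = seed_from_previous(conn, prev_path, "deals")
        print(remaining_partitions(["2024-01-01", "2024-01-02", "2024-01-03"], done))
        conn.close()
```

The demo prints `['2024-01-03']`: the first two days come from the previous database, so only the new day needs computing.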

davidgasquez added the enhancement (New feature or request) label Jan 5, 2024

davidgasquez self-assigned this Jan 5, 2024

davidgasquez mentioned this issue Jan 6, 2024

Migrate out of Dagster IO Managers #32 (Closed)

davidgasquez mentioned this issue Jan 11, 2024

Pull data directly from chain #1 (Open)
