Make pipeline incremental #28

Open
davidgasquez opened this issue Jan 5, 2024 · 3 comments
Assignees: davidgasquez
Labels: enhancement (New feature or request)

Comments

@davidgasquez
Owner

The main idea is to rely on the latest portal data and run smaller incremental updates on CI. We should provide a --full-refresh flag, à la dbt, to rebuild the data from scratch.
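A minimal sketch of how such a flag could gate the work, assuming a Python entry point built on `argparse`. The names `parse_args` and `select_partitions` are hypothetical and are not taken from the project:

```python
import argparse


def parse_args(argv=None):
    """Parse CLI flags for a hypothetical pipeline entry point."""
    parser = argparse.ArgumentParser(description="Run the data pipeline.")
    parser.add_argument(
        "--full-refresh",
        action="store_true",
        help="Ignore previously published data and rebuild everything.",
    )
    return parser.parse_args(argv)


def select_partitions(all_partitions, existing, full_refresh):
    """Return the partitions to build, preserving the input order."""
    if full_refresh:
        # Rebuild everything, regardless of what already exists.
        return list(all_partitions)
    done = set(existing)
    # Incremental mode: only build what the previous data lacks.
    return [p for p in all_partitions if p not in done]
```

On CI the default (no flag) would then touch only new partitions, while a manual run with `--full-refresh` rebuilds every partition.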

This is a big one!

@davidgasquez davidgasquez added the enhancement New feature or request label Jan 5, 2024
@davidgasquez davidgasquez self-assigned this Jan 5, 2024
@davidgasquez
Owner Author

The ideal approach I can think of would be to rely on Dagster partitions and sensors.

  1. Read the data from IPFS (or the GitHub Actions cache!).
  2. Run Dagster sensors to check which partitions are missing.
  3. Run the code for the missing partitions and rematerialize the datasets.
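The steps above can be sketched in plain Python, independent of Dagster. This is only an illustration under assumptions: partitions are taken to be daily, the previous state is a simple dict keyed by partition, and `daily_partitions`, `missing_partitions`, `run_incremental`, and the `build` callback are hypothetical names:

```python
from datetime import date, timedelta


def daily_partitions(start, end):
    """List ISO date keys from start to end, inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]


def missing_partitions(expected, materialized):
    """Step 2: keys that are expected but absent from the previous state."""
    have = set(materialized)
    return [key for key in expected if key not in have]


def run_incremental(expected, state, build):
    """Step 3: build only the missing keys and record them in the state."""
    built = []
    for key in missing_partitions(expected, state):
        state[key] = build(key)
        built.append(key)
    return built
```

Here `state` stands in for whatever step 1 loads (IPFS or a cache). In a Dagster setup, the sensor would play the role of `missing_partitions` and request one run per missing key.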

Perhaps there is a much easier approach we can use while we figure out all the Dagster stuff.

@davidgasquez
Owner Author

Thinking about relying on external assets. Treat the previous run's outputs as external assets and compute the diff using sensors?

@davidgasquez
Owner Author

We could also attach to the previous database and use it as the current state. Run the sensors, then build the remaining partitions.
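A rough sketch of the "attach and diff" idea. The thread does not name a database engine, so this assumes SQLite purely because it ships with Python; the `dataset` table, its `partition_key` column, and the function name are hypothetical:

```python
import sqlite3


def missing_from_previous(previous_db_path, expected_keys, table="dataset"):
    """Return expected partition keys absent from the previous database.

    `table` and its `partition_key` column are hypothetical names.
    """
    con = sqlite3.connect(":memory:")
    try:
        # Attach the previous run's database under the schema name `previous`.
        con.execute("ATTACH DATABASE ? AS previous", (previous_db_path,))
        rows = con.execute(
            f"SELECT DISTINCT partition_key FROM previous.{table}"
        ).fetchall()
    finally:
        con.close()
    have = {row[0] for row in rows}
    # Keep the caller's ordering so builds run in a predictable sequence.
    return [key for key in expected_keys if key not in have]
```

The returned keys are the "remaining partitions" to build; everything else can be read straight from the attached database.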
