Run the pipeline with fake pandera-generated data #23

Open
Galileo-Galilei opened this issue Aug 18, 2023 · 0 comments
Galileo-Galilei (Owner) commented Aug 18, 2023

Description

I want to be able to run a pipeline with fake data generated from each dataset's schema, mainly for pipeline unit testing or for debugging on a small dataset.

Context

Unit testing data pipelines is hard, and this may be a helpful solution.

Possible Implementation(s)

  • create a kedro pandera dryrun --pipeline <pipeline_name> command (name to be defined) which generates data for all input datasets with pandera's data synthesis and then runs the pipeline on it
  • create a PanderaRunner so the pipeline can be run with kedro run --runner=PanderaRunner --pipeline <pipeline_name>. The advantage is to stick to the kedro CLI and possibly enable "composition" with other logic; the drawback is that this solution is not compatible with a custom config file we may introduce.

Galileo-Galilei self-assigned this Sep 10, 2023