Run the pipeline with fake pandera-generated data #23

Galileo-Galilei · 2023-08-18T07:39:23Z

Description

I want to be able to run a pipeline with fake data generated from a dataset schema, mainly for unit testing the pipeline or debugging it with a small dataset.

Unit testing data pipelines is hard, and this may be a helpful solution.

Create a kedro pandera dryrun --pipeline <pipeline_name> command (name to be defined) that generates data for all input datasets with pandera [data synthesis] and then runs the pipeline.
Create a PanderaRunner to run the pipeline with kedro run --runner=PanderaRunner --pipeline <pipeline_name>. The advantage is that this sticks to the kedro CLI and potentially enables "composition" with other logic; the drawback is that this solution is not compatible with a custom config file we may introduce.
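
To make the idea concrete, here is a minimal sketch of what either option would do under the hood: find the pipeline's free inputs, synthesize fake data for each from its schema, then run the nodes in order. It deliberately uses only the standard library, so it is not pandera's or kedro's API. The names `synthesize`, `dry_run`, and the dict-of-generators schema are hypothetical stand-ins, and the nodes are assumed to be listed in execution order. With pandera installed together with its data synthesis extra, `DataFrameSchema.example(size=...)` would play the role of `synthesize`.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a dataset schema: column name -> value generator.
Schema = Dict[str, Callable[[random.Random], object]]
Row = Dict[str, object]
# Hypothetical stand-in for a pipeline node: (function, input names, output name).
Node = Tuple[Callable[..., List[Row]], List[str], str]


def synthesize(schema: Schema, size: int, rng: random.Random) -> List[Row]:
    """Generate `size` fake rows, one value per column generator."""
    return [{col: gen(rng) for col, gen in schema.items()} for _ in range(size)]


def dry_run(
    nodes: List[Node], schemas: Dict[str, Schema], size: int = 5, seed: int = 0
) -> Dict[str, List[Row]]:
    """Fill every free input with synthetic data, then run the nodes in order."""
    rng = random.Random(seed)
    produced = {output for _, _, output in nodes}
    catalog: Dict[str, List[Row]] = {}
    # Free inputs are datasets that are consumed but never produced by a node.
    for _, inputs, _ in nodes:
        for name in inputs:
            if name not in produced and name not in catalog:
                catalog[name] = synthesize(schemas[name], size, rng)
    # Assumption: `nodes` is already sorted in execution order.
    for func, inputs, output in nodes:
        catalog[output] = func(*(catalog[name] for name in inputs))
    return catalog


# Example schema and pipeline, both made up for illustration.
orders_schema: Schema = {
    "price": lambda rng: round(rng.uniform(0.0, 100.0), 2),
    "quantity": lambda rng: rng.randint(0, 10),
}


def add_total(orders: List[Row]) -> List[Row]:
    return [{**row, "total": row["price"] * row["quantity"]} for row in orders]


def sum_totals(orders_with_total: List[Row]) -> List[Row]:
    return [{"revenue": sum(row["total"] for row in orders_with_total)}]


nodes: List[Node] = [
    (add_total, ["orders"], "orders_with_total"),
    (sum_totals, ["orders_with_total"], "revenue"),
]

catalog = dry_run(nodes, {"orders": orders_schema}, size=5)
print(len(catalog["orders"]), "fake rows generated;", catalog["revenue"])
```

Seeding the generator keeps the dry run reproducible, which matters if the fake data is used in unit tests.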


Galileo-Galilei self-assigned this Sep 10, 2023