
kedro pandera coverage #34

Open
datajoely opened this issue Aug 31, 2023 · 2 comments
@datajoely

Description

The more I think about the importance of data contracts, the more that coverage checks as part of a team's workflow feel like a natural evolution of this pattern.

Context

The way I see this, there are two standards a user should aim for:

  • A "gold standard 🥇" pattern where every dataset in your project has pandera schemas attached (all parameter inputs also have pandera/pydantic definitions too)
  • A "silver standard 🥈" pattern where just the free-inputs/outputs of a pipeline are properly validated and the rest is treated a closed box.

Possible Implementation

  • Build an AST introspection utility which uses an instantiated KedroSession object to validate state

Possible Alternatives

  • Look at building a Pylint plugin to do the same thing
@Galileo-Galilei
Owner

I like the idea, but why would we need AST introspection? Couldn't we just check whether all the datasets in a pipeline have a schema attached in their metadata? Is it related to the decorator way of triggering data checks?

@datajoely
Author

So I was thinking about supporting the Python annotations (as well as the catalog metadata), but we don't actually have to do that with static analysis; we can just inspect the live objects at pipeline creation time.
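Inspecting live objects could look something like the sketch below. To keep it self-contained, catalog entries are modeled as plain dicts; in a real plugin you would read them from the instantiated `DataCatalog`/`KedroSession`, and all names here (`pandera_coverage`, the dataset names) are illustrative assumptions, not kedro-pandera API:

```python
# Hypothetical coverage check over "live" catalog state rather than AST
# introspection. Each catalog entry is a plain dict standing in for a
# dataset's configuration; a dataset counts as covered if its metadata
# carries a "pandera" block.

def pandera_coverage(catalog_entries, pipeline_datasets):
    """Split a pipeline's dataset names into (covered, missing)."""
    covered, missing = [], []
    for name in pipeline_datasets:
        metadata = catalog_entries.get(name, {}).get("metadata") or {}
        (covered if "pandera" in metadata else missing).append(name)
    return covered, missing

# Example: one dataset with a pandera schema attached, one without.
entries = {
    "iris_raw": {"metadata": {"pandera": {"schema": "iris_schema"}}},
    "iris_clean": {"metadata": {}},
}
covered, missing = pandera_coverage(entries, ["iris_raw", "iris_clean"])
print(covered)   # ['iris_raw']
print(missing)   # ['iris_clean']
```

A "gold standard" check would run this over every dataset in the pipeline, while a "silver standard" check would pass only the pipeline's free inputs/outputs.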
