
kedro pandera coverage #34

Open
datajoely opened this issue Aug 31, 2023 · 2 comments
@datajoely

Description

The more I think about the importance of data contracts, the more that coverage checks as part of a team's workflow feel like a natural evolution of this pattern.

Context

The way I see this, there are two standards a user should aim for:

  • A "gold standard 🥇" pattern where every dataset in your project has pandera schemas attached (all parameter inputs also have pandera/pydantic definitions too)
  • A "silver standard 🥈" pattern where just the free-inputs/outputs of a pipeline are properly validated and the rest is treated a closed box.

Possible Implementation

  • Build an AST introspection utility which uses an instantiated KedroSession object to validate state

Possible Alternatives

  • Look at building a Pylint plugin to do the same thing
@Galileo-Galilei
Owner

I like the idea, but why would we need AST introspection? Couldn't we just check whether all the datasets in a pipeline have a schema attached in their metadata? Is it related to the decorator way of triggering data checks?

@datajoely
Author

So I was thinking about supporting the Python annotations (as well as the catalog metadata), but we don't actually have to do that with static analysis; we can just inspect the live objects at pipeline creation time.
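Inspecting live objects could look something like the sketch below. To keep it self-contained, catalog entries are modeled as plain dicts; in a real plugin you would read them from the instantiated `DataCatalog`/`KedroSession`, and all names here (`pandera_coverage`, the dataset names) are illustrative assumptions, not kedro-pandera API:

```python
# Hypothetical coverage check over "live" catalog state rather than AST
# introspection. Each catalog entry is a plain dict standing in for a
# dataset's configuration; a dataset counts as covered if its metadata
# carries a "pandera" block.

def pandera_coverage(catalog_entries, pipeline_datasets):
    """Split a pipeline's dataset names into (covered, missing)."""
    covered, missing = [], []
    for name in pipeline_datasets:
        metadata = catalog_entries.get(name, {}).get("metadata") or {}
        (covered if "pandera" in metadata else missing).append(name)
    return covered, missing

# Example: one dataset with a pandera schema attached, one without.
entries = {
    "iris_raw": {"metadata": {"pandera": {"schema": "iris_schema"}}},
    "iris_clean": {"metadata": {}},
}
covered, missing = pandera_coverage(entries, ["iris_raw", "iris_clean"])
print(covered)   # ['iris_raw']
print(missing)   # ['iris_clean']
```

A "gold standard" check would run this over every dataset in the pipeline, while a "silver standard" check would pass only the pipeline's free inputs/outputs.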
