
Passing DataFrameSchema to function that check_output is decorating? #1513

Open
dwinski opened this issue Feb 23, 2024 · 0 comments
Labels: question (Further information is requested)

dwinski commented Feb 23, 2024

Is there a way to pass a DataFrameSchema as an argument to the function that check_output (or another check decorator) is decorating? In the documentation examples (https://pandera.readthedocs.io/en/stable/decorators.html), the decorated function receives only the dataframe to be processed (and validated), and the DataFrameSchema is defined in the global scope. Ideally, I'd like to load the DataFrameSchema from a YAML file inside the script's main function and then call helper dataframe-processing functions that use the validation decorators. To do this, I'd need to pass the loaded DataFrameSchema as an argument to those processing functions, in a way that lets the decorators access it too. I'm thinking of something like the code below (if it were possible). I'm not sure whether I'm missing an obvious solution or whether I need a workaround (maybe defining an inner function that gets decorated?) to use this workflow in my script.

```python
import pandas as pd
from pandera import check_output
from pandera.io import from_yaml


# Desired (hypothetical) usage: the decorator should use the schema passed to
# load_data. As written this fails with NameError, because the decorator
# expression is evaluated when load_data is defined, before output_schema exists.
@check_output(output_schema)
def load_data(path_to_data, output_schema):
    df = pd.read_csv(path_to_data)
    return df


def main(config_file):
    """Main function for the script.

    config_file holds the contents of a YAML config file with the
    read/write paths for the script.
    """
    # Load the DataFrameSchema from YAML for validating the raw data
    raw_data_schema = from_yaml(config_file['input_schema'])

    # Load the raw data and pass the validation schema on to the decorator
    raw_data_df = load_data(config_file['raw_data_path'], raw_data_schema)

    # ... do more data processing in further steps after the raw data has been validated
```
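One possible workaround, sketched here under assumptions: `check_output` receives its schema when the decorator is applied, so a small custom decorator can instead look up the schema among the decorated function's own arguments at call time and call its `validate` method on the return value. `validate_output_with` and its `schema_arg` parameter are hypothetical names, not part of pandera's API. The sketch only assumes that the schema object has a `validate(obj)` method that returns the validated object or raises, which is how `DataFrameSchema.validate` behaves.

```python
import functools
import inspect


def validate_output_with(schema_arg="output_schema"):
    """Hypothetical decorator factory (not part of pandera).

    Validates the wrapped function's return value against a schema that is
    passed to the function itself as the argument named ``schema_arg``.
    Any object with a ``validate(obj)`` method works, e.g. a pandera
    DataFrameSchema loaded from YAML.
    """
    def decorator(func):
        signature = inspect.signature(func)
        if schema_arg not in signature.parameters:
            raise TypeError(f"{func.__name__} has no parameter {schema_arg!r}")

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Bind the call's arguments so the schema is found whether it was
            # passed positionally or by keyword.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            schema = bound.arguments[schema_arg]
            result = func(*args, **kwargs)
            # validate() is expected to return the validated object or raise.
            return schema.validate(result)

        return wrapper

    return decorator
```

With this, `load_data` could be decorated with `@validate_output_with("output_schema")` instead of `@check_output(output_schema)`, and `main` could call it unchanged. Alternatively, because `check_output(schema)` returns an ordinary decorator, it should also be possible to apply it at call time to an undecorated function inside `main`, for example `check_output(raw_data_schema)(load_data)(path)`, or simply to call `raw_data_schema.validate(df)` on the returned dataframe.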

dwinski added the question label Feb 23, 2024