This repository has been archived by the owner on Jan 13, 2023. It is now read-only.

Chapter 3 : Cascade evaluate ValueError: The pyarrow library is not installed #26

Closed
jeromemassot opened this issue Mar 27, 2021 · 1 comment

jeromemassot commented Mar 27, 2021

Dear authors,
The evaluate component of the pipeline fails because the pyarrow module is not installed in its container.

Solved by changing the packages requested for that component in the pipeline definition:

@dsl.pipeline(
    name='Cascade pipeline on SF bikeshare',
    description='Cascade pipeline on SF bikeshare'
)
def cascade_pipeline(
    project_id=PROJECT_ID
):
    ddlop = comp.func_to_container_op(run_bigquery_ddl, packages_to_install=['google-cloud-bigquery'])
        
    c1 = train_classification_model(ddlop, PROJECT_ID)
    c1_model_name = c1.outputs['created_table']
    
    c2a_input = create_training_data(ddlop, PROJECT_ID, c1_model_name, 'Typical')
    c2b_input = create_training_data(ddlop, PROJECT_ID, c1_model_name, 'Long')
    
    c3a_model = train_distance_model(ddlop, PROJECT_ID, c2a_input.outputs['created_table'], 'Typical')
    c3b_model = train_distance_model(ddlop, PROJECT_ID, c2b_input.outputs['created_table'], 'Long')
    
    # Requesting the [bqstorage,pandas] extras is the change that resolved the missing-pyarrow error
    evalop = comp.func_to_container_op(evaluate, packages_to_install=['google-cloud-bigquery[bqstorage,pandas]', 'pandas'])
    error = evalop(PROJECT_ID, c1_model_name, c3a_model.outputs['created_table'], c3b_model.outputs['created_table'])
    print(error.output)

Best Regards

Jerome

jeromemassot (Author) commented

SOLVED by changing the packages installed for the evalop component:
evalop = comp.func_to_container_op(evaluate, packages_to_install=['google-cloud-bigquery[bqstorage,pandas]', 'pandas'])
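
For anyone hitting the same error, a quick way to confirm whether an environment has what the evaluate step needs is to check which modules are importable before running the query. This is only a rough sketch: the helper names `is_importable` and `missing_modules` are made up for illustration, and the `REQUIRED` list is an assumption based on the error message and the fix above, not something taken from the book's code.

```python
import importlib.util


def is_importable(name):
    """Return True if the module `name` can be located in this environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. `google` for `google.cloud.bigquery`)
        # raises instead of returning None, so treat that as "not importable".
        return False


def missing_modules(names):
    """Return the entries of `names` that cannot be imported, in the same order."""
    return [name for name in names if not is_importable(name)]


# Assumed list of modules the evaluate step relies on (hypothetical, for illustration).
REQUIRED = ["google.cloud.bigquery", "pandas", "pyarrow"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Calling `missing_modules(REQUIRED)` at the top of the component function would presumably list `pyarrow` when the container is built with only `google-cloud-bigquery` and `pandas`, which matches the ValueError in the title.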
