Error in action StoreValidationResultAction when checkpoint called via GreatExpectationsOperator with "include_unexpected_rows": True #52

Closed
Grudra1 opened this issue Apr 21, 2022 · 1 comment

Grudra1 commented Apr 21, 2022

GE version: 0.15.0, 0.15.1

I am calling the GreatExpectationsOperator from one of my Airflow DAGs and passing it a checkpoint that has the following runtime configuration:
runtime_configuration: {"result_format": {"result_format": "SUMMARY", "include_unexpected_rows": True}}

Because the unexpected rows are now included in the result, storing the validation result to local disk fails. The action list in my checkpoint is configured as follows:
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction

There is no problem when include_unexpected_rows is set to False.
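
To make the difference between the failing and the working setup explicit, here is a minimal sketch of the two configurations as plain Python dicts. The helper name `build_runtime_configuration` is hypothetical and is not part of Great Expectations or the Airflow provider; only the keys and values are taken from the configuration quoted above.

```python
def build_runtime_configuration(include_unexpected_rows):
    """Return the runtime_configuration mapping from this report.

    Hypothetical helper for illustration only; the flag is the single
    value that differs between the failing and the working run.
    """
    return {
        "result_format": {
            "result_format": "SUMMARY",
            "include_unexpected_rows": include_unexpected_rows,
        }
    }


failing = build_runtime_configuration(True)   # triggers the error reported here
working = build_runtime_configuration(False)  # stores the result without error

# Collect the keys whose values differ between the two variants.
differing_keys = [
    key
    for key in failing["result_format"]
    if failing["result_format"][key] != working["result_format"][key]
]
print(differing_keys)
```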

The error trace is below:
[2022-04-21 14:23:59,760] {cursor.py:720} INFO - query execution done
[2022-04-21 14:24:00,136] {validator.py:1646} INFO - 1 expectation(s) included in expectation_suite.
[2022-04-21 14:24:00,477] {logging_mixin.py:109} WARNING -
Calculating Metrics: 0%| | 0/12 [00:00<?, ?it/s]
[2022-04-21 14:24:00,604] {cursor.py:696} INFO - query: [SHOW /* sqlalchemy:_get_schema_primary_keys */PRIMARY KEYS IN SCHEMA sample_db.pub...]
[2022-04-21 14:24:02,266] {cursor.py:720} INFO - query execution done
[2022-04-21 14:24:02,370] {cursor.py:696} INFO - query: [SELECT /* sqlalchemy:_get_schema_columns */ ic.table_name, ic.column_name, ic.da...]
[2022-04-21 14:24:03,873] {cursor.py:720} INFO - query execution done
[2022-04-21 14:24:09,438] {logging_mixin.py:109} WARNING -
Calculating Metrics: 17%|#6 | 2/12 [00:08<00:44, 4.43s/it]
[2022-04-21 14:24:09,465] {cursor.py:696} INFO - query: [SELECT count(*) AS "table.row_count" FROM ge_temp_284c970f]
[2022-04-21 14:24:09,967] {cursor.py:720} INFO - query execution done
[2022-04-21 14:24:10,178] {logging_mixin.py:109} WARNING -
Calculating Metrics: 33%|###3 | 4/12 [00:09<00:16, 2.04s/it]
[2022-04-21 14:24:10,237] {cursor.py:696} INFO - query: [SELECT c_custkey, c_name, c_address, c_nationkey, c_phone, c_acctbal, c_mktsegme...]
[2022-04-21 14:24:10,782] {cursor.py:720} INFO - query execution done
[2022-04-21 14:24:11,241] {cursor.py:696} INFO - query: [SELECT c_nationkey AS unexpected_values FROM ge_temp_284c970f WHERE c_nationkey ...]
[2022-04-21 14:24:11,770] {cursor.py:720} INFO - query execution done
[2022-04-21 14:24:13,527] {logging_mixin.py:109} WARNING -
Calculating Metrics: 83%|########3 | 10/12 [00:12<00:01, 1.03it/s]
[2022-04-21 14:24:13,641] {cursor.py:696} INFO - query: [SELECT sum(CASE WHEN (c_nationkey IS NULL) THEN 1 ELSE 0 END) AS "column_values....]
[2022-04-21 14:24:14,463] {cursor.py:720} INFO - query execution done
[2022-04-21 14:24:15,786] {logging_mixin.py:109} WARNING -
Calculating Metrics: 100%|##########| 12/12 [00:15<00:00, 1.00s/it]
[2022-04-21 14:24:15,909] {logging_mixin.py:109} WARNING -
Calculating Metrics: 100%|##########| 12/12 [00:15<00:00, 1.28s/it]
[2022-04-21 14:24:15,912] {logging_mixin.py:109} WARNING -
[2022-04-21 14:24:16,255] {validation_operators.py:465} ERROR - Error running action with name store_validation_result
Traceback (most recent call last):
File "/root/.local/lib/python3.6/site-packages/great_expectations/validation_operators/validation_operators.py", line 453, in _run_actions
checkpoint_identifier=checkpoint_identifier,
File "/root/.local/lib/python3.6/site-packages/great_expectations/checkpoint/actions.py", line 78, in run
**kwargs,
File "/root/.local/lib/python3.6/site-packages/great_expectations/checkpoint/actions.py", line 830, in _run
expectation_suite_id=expectation_suite_ge_cloud_id,
File "/root/.local/lib/python3.6/site-packages/great_expectations/data_context/store/store.py", line 161, in set
self.key_to_tuple(key), self.serialize(key, value), **kwargs
File "/root/.local/lib/python3.6/site-packages/great_expectations/data_context/store/validations_store.py", line 168, in serialize
value, indent=2, sort_keys=True
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/schema.py", line 584, in dumps
serialized = self.dump(obj, many=many)
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/schema.py", line 560, in dump
result = self._serialize(processed_obj, many=many)
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/schema.py", line 524, in _serialize
value = field_obj.serialize(attr_name, obj, accessor=self.get_attribute)
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/fields.py", line 309, in serialize
return self._serialize(value, attr, obj, **kwargs)
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/fields.py", line 697, in _serialize
return [self.inner._serialize(each, attr, obj, **kwargs) for each in value]
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/fields.py", line 697, in <listcomp>
return [self.inner._serialize(each, attr, obj, **kwargs) for each in value]
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/fields.py", line 564, in _serialize
return schema.dump(nested_obj, many=many)
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/schema.py", line 555, in dump
PRE_DUMP, obj, many=many, original_data=obj
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/schema.py", line 1075, in _invoke_dump_processors
tag, pass_many=False, data=data, many=many, original_data=original_data
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/schema.py", line 1231, in _invoke_processors
data = processor(data, many=many, **kwargs)
File "/root/.local/lib/python3.6/site-packages/great_expectations/core/expectation_validation_result.py", line 253, in convert_result_to_serializable
data.result = convert_to_json_serializable(data.result)
File "/root/.local/lib/python3.6/site-packages/great_expectations/core/util.py", line 161, in convert_to_json_serializable
new_dict[str(key)] = convert_to_json_serializable(data[key])
File "/root/.local/lib/python3.6/site-packages/great_expectations/core/util.py", line 168, in convert_to_json_serializable
new_list.append(convert_to_json_serializable(val))
File "/root/.local/lib/python3.6/site-packages/great_expectations/core/util.py", line 257, in convert_to_json_serializable
f"{str(data)} is of type {type(data).__name__} which cannot be serialized."
TypeError: (891097, 'Customer#0007', 'eRL', 21, '31-843-843', Decimal('50.04'), 'FUTURE', 'some junk string for testing') is of type RowProxy which cannot be serialized.
[2022-04-21 14:24:16,838] {taskinstance.py:1463} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1165, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1283, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1313, in _execute_task
result = task_copy.execute(context=context)
File "/root/.local/lib/python3.6/site-packages/great_expectations_provider/operators/great_expectations.py", line 160, in execute
result = self.checkpoint.run()
File "/root/.local/lib/python3.6/site-packages/great_expectations/core/usage_statistics/usage_statistics.py", line 287, in usage_statistics_wrapped_method
result = func(*args, **kwargs)
File "/root/.local/lib/python3.6/site-packages/great_expectations/checkpoint/checkpoint.py", line 167, in run
validation_dict=validation_dict,
File "/root/.local/lib/python3.6/site-packages/great_expectations/checkpoint/checkpoint.py", line 367, in _run_validation
**operator_run_kwargs,
File "/root/.local/lib/python3.6/site-packages/great_expectations/core/async_executor.py", line 100, in submit
return AsyncResult(value=fn(*args, **kwargs))
File "/root/.local/lib/python3.6/site-packages/great_expectations/validation_operators/validation_operators.py", line 392, in run
checkpoint_identifier=checkpoint_identifier,
File "/root/.local/lib/python3.6/site-packages/great_expectations/validation_operators/validation_operators.py", line 466, in _run_actions
raise e
File "/root/.local/lib/python3.6/site-packages/great_expectations/validation_operators/validation_operators.py", line 453, in _run_actions
checkpoint_identifier=checkpoint_identifier,
File "/root/.local/lib/python3.6/site-packages/great_expectations/checkpoint/actions.py", line 78, in run
**kwargs,
File "/root/.local/lib/python3.6/site-packages/great_expectations/checkpoint/actions.py", line 830, in _run
expectation_suite_id=expectation_suite_ge_cloud_id,
File "/root/.local/lib/python3.6/site-packages/great_expectations/data_context/store/store.py", line 161, in set
self.key_to_tuple(key), self.serialize(key, value), **kwargs
File "/root/.local/lib/python3.6/site-packages/great_expectations/data_context/store/validations_store.py", line 168, in serialize
value, indent=2, sort_keys=True
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/schema.py", line 584, in dumps
serialized = self.dump(obj, many=many)
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/schema.py", line 560, in dump
result = self._serialize(processed_obj, many=many)
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/schema.py", line 524, in _serialize
value = field_obj.serialize(attr_name, obj, accessor=self.get_attribute)
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/fields.py", line 309, in serialize
return self._serialize(value, attr, obj, **kwargs)
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/fields.py", line 697, in _serialize
return [self.inner._serialize(each, attr, obj, **kwargs) for each in value]
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/fields.py", line 697, in <listcomp>
return [self.inner._serialize(each, attr, obj, **kwargs) for each in value]
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/fields.py", line 564, in _serialize
return schema.dump(nested_obj, many=many)
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/schema.py", line 555, in dump
PRE_DUMP, obj, many=many, original_data=obj
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/schema.py", line 1075, in _invoke_dump_processors
tag, pass_many=False, data=data, many=many, original_data=original_data
File "/root/.local/lib/python3.6/site-packages/great_expectations/marshmallow__shade/schema.py", line 1231, in _invoke_processors
data = processor(data, many=many, **kwargs)
File "/root/.local/lib/python3.6/site-packages/great_expectations/core/expectation_validation_result.py", line 253, in convert_result_to_serializable
data.result = convert_to_json_serializable(data.result)
File "/root/.local/lib/python3.6/site-packages/great_expectations/core/util.py", line 161, in convert_to_json_serializable
new_dict[str(key)] = convert_to_json_serializable(data[key])
File "/root/.local/lib/python3.6/site-packages/great_expectations/core/util.py", line 168, in convert_to_json_serializable
new_list.append(convert_to_json_serializable(val))
File "/root/.local/lib/python3.6/site-packages/great_expectations/core/util.py", line 257, in convert_to_json_serializable
f"{str(data)} is of type {type(data).__name__} which cannot be serialized."
TypeError: (891097, 'Customer#0007', 'eRL', 21, '31-843-843', Decimal('50.04'), 'FUTURE', 'some junk string for testing') is of type RowProxy which cannot be serialized.
[2022-04-21 14:24:17,362] {taskinstance.py:1513} INFO - Marking task as FAILED. dag_id=ge_test_01, task_id=validation_customer_table, execution_date=20220421T142326, start_date=20220421T142338, end_date=20220421T142417
[2022-04-21 14:24:22,341] {local_task_job.py:151} INFO - Task exited with return code 1
[2022-04-21 14:24:23,283] {local_task_job.py:261} INFO - 0 downstream tasks scheduled from follow-on schedule check
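
The TypeError in the trace says that a database row object reached the JSON serializer unconverted. The sketch below reproduces that class of failure with the standard library only and shows one possible way around it: turning each row into a plain dict first. `FakeRow` and `row_to_plain` are hypothetical names, `FakeRow` is only a stand-in and not SQLAlchemy's RowProxy, and this is not the conversion Great Expectations performs internally. The column names and values are taken from the log above.

```python
import json
from decimal import Decimal


class FakeRow:
    """Hypothetical stand-in for a database row object (not SQLAlchemy's RowProxy)."""

    def __init__(self, keys, values):
        self._keys = list(keys)
        self._values = list(values)

    def keys(self):
        return list(self._keys)

    def __iter__(self):
        return iter(self._values)


def row_to_plain(row):
    """Convert a row-like object into a dict that json.dumps accepts."""
    plain = {}
    for key, value in zip(row.keys(), row):
        # Decimal is not JSON serializable either; keep it as a string
        # so the exact value is preserved.
        plain[key] = str(value) if isinstance(value, Decimal) else value
    return plain


row = FakeRow(
    ["c_custkey", "c_name", "c_acctbal"],
    [891097, "Customer#0007", Decimal("50.04")],
)

# Serializing the raw row object fails, as in the trace above.
try:
    json.dumps({"unexpected_rows": [row]})
    raw_serializable = True
except TypeError:
    raw_serializable = False

# Serializing the converted row succeeds.
serialized = json.dumps({"unexpected_rows": [row_to_plain(row)]}, sort_keys=True)
print(raw_serializable)
print(serialized)
```

Whether a conversion like this belongs in the checkpoint action, in the provider, or in Great Expectations itself is not settled in this report.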

I think running the checkpoint through a BashOperator could be a workaround, but we would like to avoid the BashOperator.
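
For reference, a minimal sketch of the command such a workaround would run, assuming the Great Expectations CLI is installed on the worker and the checkpoint is run with `great_expectations checkpoint run`. The helper `checkpoint_run_command` and the checkpoint name `customer_checkpoint` are hypothetical, and whether the CLI path actually avoids the serialization error has not been verified here.

```python
import shlex


def checkpoint_run_command(checkpoint_name):
    """Build the shell command that runs a checkpoint via the CLI.

    Hypothetical helper; the returned string could be passed as the
    bash_command of an Airflow BashOperator.
    """
    parts = ["great_expectations", "checkpoint", "run", checkpoint_name]
    # Quote each part so names with spaces or shell metacharacters are safe.
    return " ".join(shlex.quote(part) for part in parts)


command = checkpoint_run_command("customer_checkpoint")
print(command)
```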

Thanks in advance!

@denimalpaca
Contributor

Hey @Grudra1, does this still occur in the newest version of the operator?

denimalpaca closed this as not planned on Mar 8, 2023