Fix issues with pickling & unpickling the Table object #871

Merged
utkarsharma2 merged 7 commits into main from PicklingIssueFix on Sep 16, 2022

Conversation

utkarsharma2
Collaborator

Description

What is the current behavior?

Currently, the Python SDK is failing with a pickling error.

airflow.exceptions.AirflowException: When using a synchronous executor (e.g. SequentialExecutor and DebugExecutor), the first run of this task will fail on purpose, so the single worker thread is unblocked to execute other tasks. The task is set up for retry and eventually works.
[2022-09-16 22:52:28,847] {backfill_job.py:190} ERROR - Task instance <TaskInstance: example_google_bigquery_gcs_load_and_save.cleanup backfill__2016-01-01T00:00:00+00:00 [failed]> failed
[2022-09-16 22:52:28,851] {backfill_job.py:370} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 3 | succeeded: 1 | running: 0 | failed: 1 | skipped: 0 | deadlocked: 0 | not ready: 3
[2022-09-16 22:52:28,865] {base_executor.py:95} INFO - Adding to queue: ['<TaskInstance: example_google_bigquery_gcs_load_and_save.extract_top_5_movies backfill__2016-01-01T00:00:00+00:00 [queued]>']
[2022-09-16 22:52:28,872] {base_executor.py:95} INFO - Adding to queue: ['<TaskInstance: example_google_bigquery_gcs_load_and_save.save_file_to_gcs backfill__2016-01-01T00:00:00+00:00 [queued]>']
[2022-09-16 22:52:28,901] {debug_executor.py:85} ERROR - Failed to execute task: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte.
Traceback (most recent call last):
  File "/Users/utkarsharma/sandbox/astronomer/astro/.nox/dev/lib/python3.8/site-packages/airflow/models/xcom.py", line 618, in deserialize_value
    return pickle.loads(result.value)
_pickle.UnpicklingError: state is not a dictionary
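For context, here is a minimal sketch of one way CPython can produce this exact message. It assumes a hypothetical `BrokenTable` class (not the SDK's `Table`) whose `__getstate__` returns something other than a dict and which defines no `__setstate__`. Whether this is the precise root cause in the SDK is an assumption, not something the traceback confirms.

```python
import pickle


class BrokenTable:
    """Hypothetical class for illustration; not the SDK's Table."""

    def __init__(self, name):
        self.name = name

    def __getstate__(self):
        # Returns a list rather than a dict, and no __setstate__ is defined,
        # so the default unpickling path cannot apply this state.
        return [self.name]


# Pickling succeeds: pickle stores whatever __getstate__ returned.
payload = pickle.dumps(BrokenTable(name="test"))

# Unpickling fails: without __setstate__, pickle expects a dict state.
try:
    pickle.loads(payload)
    error_message = None
except pickle.UnpicklingError as exc:
    error_message = str(exc)

print(error_message)
```

The failure only shows up on load, which matches the traceback above, where the error is raised from deserialize_value rather than at serialization time.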

What is the new behavior?

To fix the above issue, we need to add __getstate__() to the Table class.
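A minimal sketch of the idea, assuming a hypothetical slotted `Table` stand-in (the real class is built with attrs and has more fields; the `metadata` field and both method bodies here are illustrative only). The state is returned as a plain dict and restored by a matching `__setstate__`:

```python
import pickle


class Table:
    """Hypothetical stand-in for the SDK's Table; not the real implementation."""

    __slots__ = ("name", "metadata")

    def __init__(self, name, metadata=None):
        self.name = name
        self.metadata = metadata if metadata is not None else {}

    def __getstate__(self):
        # Expose the slot values as a plain dict so the state is self-describing.
        return {slot: getattr(self, slot) for slot in self.__slots__}

    def __setstate__(self, state):
        # Restore each slot from the dict produced by __getstate__.
        for slot, value in state.items():
            setattr(self, slot, value)


original = Table(name="test", metadata={"schema": "tmp"})
restored = pickle.loads(pickle.dumps(original))

print(restored.name, restored.metadata)
```

Defining the two methods as a pair keeps the pickled format under the class's control, so it does not depend on how the default slot handling behaves.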

Does this introduce a breaking change?

Nope

Checklist

  • Created tests that fail without the change (if possible)
  • Extended the README/documentation, if necessary

Comment on lines 151 to 153
def test_if_table_object_can_be_pickled():
    """Verify if we can pickle Table object"""
    pickle.loads(pickle.dumps(Table(name="test")))
Collaborator

Let's add this test for the File object as well. self.__dict__ is not available for File because it uses slots.

A class whose instances have no object.__dict__ attribute and define their attributes in a object.__slots__ attribute instead. In attrs, they are created by passing slots=True to @attr.s (and are on by default in attrs.define()/attrs.mutable()/attrs.frozen()).
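To illustrate the definition quoted above with plain Python (no attrs), here is a small sketch using two hypothetical classes, `WithDict` and `WithSlots`. They are not the SDK's classes; they only show the difference that slots make.

```python
class WithDict:
    """Regular class: attributes live in the per-instance __dict__."""

    def __init__(self, path):
        self.path = path


class WithSlots:
    """Slotted class: attributes live in fixed slots, so there is no __dict__."""

    __slots__ = ("path",)

    def __init__(self, path):
        self.path = path


regular = WithDict("a.csv")
slotted = WithSlots("a.csv")

print(hasattr(regular, "__dict__"))  # True
print(hasattr(slotted, "__dict__"))  # False

# Any code that reads self.__dict__ (for example a naive __getstate__)
# raises AttributeError on a slotted instance.
try:
    slotted.__dict__
    dict_access_failed = False
except AttributeError:
    dict_access_failed = True

print(dict_access_failed)  # True
```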

Collaborator Author

Got it

Collaborator

But at least this test will verify that you can pickle and unpickle the object.

Collaborator

Would the following be a better test?

table = Table(name="test")
assert pickle.loads(pickle.dumps(table)) == table

cc @ashb
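One caveat worth sketching: the equality assertion is only meaningful if the class defines value-based `__eq__`. attrs generates one by default; the example below uses a stdlib dataclass as a stand-in, and both `PlainTable` and `ValueTable` are hypothetical names, not SDK classes.

```python
import pickle
from dataclasses import dataclass


class PlainTable:
    """No __eq__ defined, so == falls back to identity comparison."""

    def __init__(self, name):
        self.name = name


@dataclass
class ValueTable:
    """The dataclass decorator generates a field-by-field __eq__."""

    name: str


plain = PlainTable(name="test")
value = ValueTable(name="test")

plain_copy = pickle.loads(pickle.dumps(plain))
value_copy = pickle.loads(pickle.dumps(value))

# The round trip always creates a new object, so identity comparison fails.
print(plain_copy == plain)  # False
# Value-based equality compares fields, so the round-tripped copy matches.
print(value_copy == value)  # True
```

With value-based equality in place, the stronger assertion also catches a round trip that silently drops or alters a field, which the bare pickle.loads(pickle.dumps(...)) call would not.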

Collaborator Author

@kaxil Ya this seems better.

@kaxil changed the title from "Add __getstate__() to table class" to "Fix pickling issue with Table object" on Sep 16, 2022
@kaxil changed the title from "Fix pickling issue with Table object" to "Fix issues with pickling & unpickling the Table object" on Sep 16, 2022
Collaborator

@kaxil left a comment

Approving so you are not blocked, but please address the review comments.

@utkarsharma2 merged commit 07c27c0 into main on Sep 16, 2022
@utkarsharma2 deleted the PicklingIssueFix branch on September 16, 2022 at 21:14