
Allow installing Python SDK without enabling pickling #997

Merged — 1 commit into main on Oct 5, 2022

Conversation

kaxil
Collaborator

@kaxil kaxil commented Oct 4, 2022

Description

What is the current behavior?

Users cannot even run Airflow if the Astro SDK is installed but they don't want to use it. For example, we can't install the Astro SDK in the Runtime image.

What is the new behavior?

This PR reverts #446. Without this change, users cannot even run Airflow if the Astro SDK is installed.

This will even allow users to use File and Table objects in inlets and outlets without using any operators or decorators.

This will also allow us to install the SDK in the Runtime image without impacting users who don't want to use the SDK.

The only caveat: if users don't enable XCom pickling, the following error is raised, but that is OK. We would like to get this into Astro SDK 1.2 (or 1.1.2), as we plan to tackle removing pickling completely in #795:

[2022-10-04, 23:09:10 UTC] {xcom.py:599} ERROR - Could not serialize the XCom value into JSON. If you are using pickle instead of JSON for XCom, then you need to enable pickle support for XCom in your *** config.
[2022-10-04, 23:09:10 UTC] {taskinstance.py:1851} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 2378, in xcom_push
    XCom.set(
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/xcom.py", line 206, in set
    value = cls.serialize_value(
  File "/usr/local/lib/python3.9/site-packages/airflow/models/xcom.py", line 597, in serialize_value
    return json.dumps(value).encode('UTF-8')
  File "/usr/local/lib/python3.9/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/local/lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/local/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Table is not JSON serializable
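The failure above is just Airflow's default JSON XCom serializer rejecting an arbitrary Python object; it can be reproduced with the standard library alone. The `Table` class below is a hypothetical stand-in for illustration, not the SDK's actual class:

```python
import json

# Hypothetical stand-in: any object that is not a JSON-native type
# (dict, list, str, int, float, bool, None) fails the same way.
class Table:
    def __init__(self, name):
        self.name = name

try:
    json.dumps(Table("my_table"))
except TypeError as exc:
    print(exc)  # Object of type Table is not JSON serializable
```

Users who do want pickled XComs can opt in via the `[core] enable_xcom_pickling` option (or its env-var form, `AIRFLOW__CORE__ENABLE_XCOM_PICKLING=True`).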

This will allow users to use the following without pickling:

import pendulum
from airflow import DAG, Dataset
from airflow.operators.bash import BashOperator

from astro.files.base import File
dag1_dataset = File(path="s3://dag1/output_1.txt")

with DAG(
    dag_id='dataset_produces_1',
    catchup=False,
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    schedule='@daily',
    tags=['produces', 'dataset-scheduled'],
) as dag1:
    BashOperator(outlets=[dag1_dataset], task_id='producing_task_1', bash_command="sleep 5")

with DAG(
    dag_id='dataset_consumes_1',
    catchup=False,
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    schedule=[dag1_dataset],
    tags=['consumes', 'dataset-scheduled'],
) as dag2:
    BashOperator(
        outlets=[Dataset('s3://consuming_1_task/dataset_other.txt')],
        task_id='consuming_1',
        bash_command="sleep 5",
    )

This works because of apache/airflow@25a1a1b, which we added in Airflow 2.4.0.

Does this introduce a breaking change?

No

Checklist

  • Created tests which fail without the change (if possible)
  • Extended the README / documentation, if necessary

@codecov

codecov bot commented Oct 5, 2022

Codecov Report

Base: 94.46% // Head: 94.45% // Decreases project coverage by -0.00% ⚠️

Coverage data is based on head (549a1a7) compared to base (7b791e4).
Patch has no changes to coverable lines.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #997      +/-   ##
==========================================
- Coverage   94.46%   94.45%   -0.01%     
==========================================
  Files          47       47              
  Lines        2113     2110       -3     
  Branches      229      228       -1     
==========================================
- Hits         1996     1993       -3     
  Misses         83       83              
  Partials       34       34              
Impacted Files Coverage Δ
python-sdk/src/astro/__init__.py 100.00% <ø> (ø)




if not conf.getboolean(section="core", key="enable_xcom_pickling"):
    raise OSError(
tatiana
Collaborator

@kaxil, what if we didn't raise an OSError but instead had a warning alerting users that some features may not work if enable_xcom_pickling is False (e.g. aql.dataframe)? It can be misleading to have no warning at all when several features do not work.

I understand the motivation in allowing users to install the SDK and use the parts that work without pickling, but it's still better to have a warning (at the very least on the tasks/decorators which will not work, such as aql.dataframe).
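A minimal sketch of the warning approach being suggested here; the function name and the plain boolean parameter are illustrative stand-ins (in the SDK this check would read from Airflow's `conf.getboolean("core", "enable_xcom_pickling")`):

```python
import warnings

def check_pickling_support(enable_xcom_pickling: bool) -> None:
    """Warn, rather than raise, when XCom pickling is disabled."""
    if not enable_xcom_pickling:
        warnings.warn(
            "XCom pickling is disabled; features such as aql.dataframe may "
            "fail at runtime unless a custom XCom backend is configured.",
            UserWarning,
        )

# Emits a UserWarning but lets the import/run proceed.
check_pickling_support(enable_xcom_pickling=False)
```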

kaxil
Collaborator Author

aql.dataframe has a check before pushing to XCom though:

    else:
        if (
            not settings.IS_CUSTOM_XCOM_BACKEND
            and not settings.ALLOW_UNSAFE_DF_STORAGE
        ):
            raise IllegalLoadToDatabaseException()
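Roughly, that guard amounts to the following pattern. The flag names are copied from the quoted snippet, but the module-level constants, function name, and exception message below are a self-contained sketch, not the SDK's actual code:

```python
class IllegalLoadToDatabaseException(Exception):
    """Raised when a dataframe would otherwise be stored in the metadata DB."""

# Hypothetical stand-ins for the astro.settings flags in the quoted snippet.
IS_CUSTOM_XCOM_BACKEND = False
ALLOW_UNSAFE_DF_STORAGE = False

def push_dataframe_to_xcom(df):
    # Refuse to serialize a dataframe into Airflow's metadata database unless
    # a custom XCom backend or unsafe storage has been explicitly enabled.
    if not IS_CUSTOM_XCOM_BACKEND and not ALLOW_UNSAFE_DF_STORAGE:
        raise IllegalLoadToDatabaseException(
            "Dataframe storage in the metadata database is disabled."
        )

try:
    push_dataframe_to_xcom(df={"col": [1, 2, 3]})
except IllegalLoadToDatabaseException as exc:
    print(exc)
```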

kaxil
Collaborator Author

Where would you like me to add the warning?

kaxil
Collaborator Author

@tatiana I merged this since the error message from Airflow is already useful (quoted below), but do let me know if you would like me to add any warning; happy to do that in a separate PR.

Could not serialize the XCom value into JSON. If you are using pickle instead of JSON for XCom, then you need to enable pickle support for XCom in your *** config.

feluelle
Member

@feluelle left a comment


The current exception looks sufficient to me:

Could not serialize the XCom value into JSON. If you are using pickle instead of JSON for XCom, then you need to enable pickle support for XCom in your *** config.

@kaxil kaxil merged commit edd99ca into main Oct 5, 2022
@kaxil kaxil deleted the remove-error-pickling branch October 5, 2022 11:39