
Airflow unsupported pickle protocol while moving from Python 3.8 to Python 3.7 #14134

Closed
ghost opened this issue Feb 8, 2021 · 9 comments
Labels
affected_version:2.0 Issues Reported for 2.0 kind:bug This is a clearly a bug

Comments

@ghost

ghost commented Feb 8, 2021

Apache Airflow version: 2.0.0

Environment: Python 3.7

What happened:

Something bad has happened.
Please consider letting us know by creating a bug report using GitHub.
 
Python version: 3.7.9
Airflow version: 2.0.0
Node: 0f52efb20ebe
-------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/airflow/.local/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/airflow/.local/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/airflow/.local/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/airflow/.local/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/airflow/.local/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/www/auth.py", line 34, in decorated
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/www/decorators.py", line 97, in view_func
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/www/decorators.py", line 60, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/www/views.py", line 1887, in tree
    .limit(num_runs)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3373, in all
    return list(self)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/loading.py", line 100, in instances
    cursor.close()
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    with_traceback=exc_tb,
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/loading.py", line 80, in instances
    rows = [proc(row) for row in fetch]
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/loading.py", line 80, in <listcomp>
    rows = [proc(row) for row in fetch]
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/loading.py", line 588, in _instance
    populators,
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/loading.py", line 725, in _populate_full
    dict_[key] = getter(row)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/sql/sqltypes.py", line 1723, in process
    return loads(value)
ValueError: unsupported pickle protocol: 5

How to reproduce it:

It happened after moving from Python 3.8 to Python 3.7.
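Why the traceback ends in "unsupported pickle protocol: 5" can be sketched as follows. The last frame is SQLAlchemy's PickleType result processor calling loads(value), and PickleType pickles with pickle.HIGHEST_PROTOCOL by default. That constant is 5 on Python 3.8 but 4 on Python 3.7, so rows written under 3.8 carry a protocol-5 header that 3.7 refuses. The snippet below is only an illustration, assuming the stored values are ordinary pickles; the helper names pickle_protocol and loadable_here are hypothetical. It reads the protocol number from the header (opcode 0x80 followed by one protocol byte, present for protocol 2 and above) and compares it with what the running interpreter supports.

```python
import pickle

PROTO = 0x80  # opcode that opens every pickle written with protocol >= 2


def pickle_protocol(blob: bytes):
    """Return the protocol number declared in a pickle header, or None.

    Protocols 0 and 1 have no header, so None is returned for them.
    """
    if len(blob) >= 2 and blob[0] == PROTO:
        return blob[1]
    return None


def loadable_here(blob: bytes) -> bool:
    """True if this interpreter's pickle module accepts the blob's protocol."""
    proto = pickle_protocol(blob)
    return proto is None or proto <= pickle.HIGHEST_PROTOCOL


if __name__ == "__main__":
    # Python 3.7 reports 4 here; Python 3.8+ reports 5.
    print("HIGHEST_PROTOCOL on this interpreter:", pickle.HIGHEST_PROTOCOL)
    for proto in range(pickle.HIGHEST_PROTOCOL + 1):
        blob = pickle.dumps({"conf": 1}, protocol=proto)
        print(proto, pickle_protocol(blob), loadable_here(blob))
```

Feeding a header newer than the interpreter supports straight to pickle.loads reproduces the same ValueError seen in the traceback.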

Anything else we need to know:

We are using Nomad + Docker to run Airflow, DBs used are Postgres and Redis

The issue is similar to #13317. I've already tried clearing the cookies and changing the address from which I access the web UI.

@ghost ghost added the kind:bug This is a clearly a bug label Feb 8, 2021
@boring-cyborg

boring-cyborg bot commented Feb 8, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

@ghost
Author

ghost commented Feb 8, 2021

Update: destroying the Postgres db seems to fix the issue. Is there a way to migrate without dropping the entire DB?

@kaxil
Member

kaxil commented Feb 8, 2021

You need to find all the columns in all tables that store a pickled value and clear them.

For example, clear the dag_pickle table; sometimes even the conf column in the dag_run table needs to be cleared.

@kaxil kaxil closed this as completed Feb 8, 2021
@vikramkoka vikramkoka added the affected_version:2.0 Issues Reported for 2.0 label Feb 9, 2021
@atejano

atejano commented Feb 3, 2022

You need to find all the Columns in all tables that store a Pickled value and clear them.

Example: clear the dag_pickle table, sometime even conf column in dag_run table needs to be cleared

How do I find the columns in the Airflow tables that store a pickled value?
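One heuristic way to answer this question is to scan binary values for a pickle header. A minimal sketch, demonstrated on SQLite from the standard library; the function names are hypothetical and nothing here is an Airflow API. It assumes that any binary value starting with 0x80 followed by a protocol byte is a pickle, so unrelated binary data could produce false positives, and it reads every row, so it is a full table scan. On Postgres, where SQLAlchemy's PickleType is stored as bytea, candidate columns could be listed with SELECT table_name, column_name FROM information_schema.columns WHERE data_type = 'bytea' and the same per-value check applied (psycopg2 returns bytea values as memoryview, which the helper accepts).

```python
import pickle
import sqlite3


def pickle_protocol(value):
    """Protocol number from a pickle header (PROTO opcode 0x80), or None.

    Accepts bytes, bytearray or memoryview; anything else yields None.
    """
    if isinstance(value, (bytes, bytearray, memoryview)):
        head = bytes(value[:2])
        if len(head) == 2 and head[0] == 0x80:
            return head[1]
    return None


def find_pickled_columns(conn):
    """Return {(table, column): {protocol: row_count}} for columns holding pickles.

    SQLite-only demo: tables come from sqlite_master, columns from PRAGMA table_info.
    """
    found = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk)
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            counts = {}
            for (value,) in conn.execute(f'SELECT "{column}" FROM "{table}"'):
                proto = pickle_protocol(value)
                if proto is not None:
                    counts[proto] = counts.get(proto, 0) + 1
            if counts:
                found[(table, column)] = counts
    return found


def too_new(found, highest=pickle.HIGHEST_PROTOCOL):
    """Columns holding at least one pickle newer than `highest` allows."""
    return sorted(key for key, counts in found.items()
                  if any(proto > highest for proto in counts))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag_run (id INTEGER PRIMARY KEY, run_id TEXT, conf BLOB)")
    conn.executemany("INSERT INTO dag_run (run_id, conf) VALUES (?, ?)", [
        ("manual_1", pickle.dumps({"x": 1}, protocol=4)),
        ("manual_2", b"\x80\x05N."),  # hand-written protocol-5 pickle of None
        ("manual_3", None),
    ])
    found = find_pickled_columns(conn)
    print(found)
    print(too_new(found, highest=4))  # what a Python 3.7 reader could not load
```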

@borismo
Contributor

borismo commented Feb 4, 2022

clear the dag_pickle table

@kaxil, do you mean truncating the whole table?

sometime even conf column in dag_run table needs to be cleared

Meaning:

UPDATE dag_run
SET conf = NULL

?

Edit: by the way, I am getting this error without changing the Python version.

@markhatch
Contributor

I ran into this as well when upgrading.

I ran airflow db upgrade.

@borismo's comment was helpful:

UPDATE dag_run
SET conf = NULL;

I also had to drop some pickled info in task_instance:

UPDATE task_instance
SET executor_config = NULL;

@obearn

obearn commented Apr 20, 2022

@markhatch Thanks a lot for this answer.
Just noticed that the celery_taskmeta table may also contain pickled data (result, args, kwargs).
During the migration I simply deleted the rows from this table, since they are not useful for tasks that are no longer running.
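The cleanup steps suggested in this thread can be run together. A minimal sketch, assuming a DB-API 2.0 connection to the Airflow metadata database and that every listed table exists (celery_taskmeta is only present when Celery stores its results in the same database); clear_pickled_data and CLEANUP_STATEMENTS are hypothetical names. Running all statements in one transaction means a failure part-way, such as a missing table, rolls everything back instead of leaving the database half-cleared. Back up the database first: this discards DAG pickles, dag_run confs, executor configs and Celery results.

```python
import sqlite3

# Statements collected from this thread. Each discards pickled data that an
# older Python may be unable to load.
CLEANUP_STATEMENTS = (
    "DELETE FROM dag_pickle",                           # kaxil
    "UPDATE dag_run SET conf = NULL",                   # borismo, markhatch
    "UPDATE task_instance SET executor_config = NULL",  # markhatch
    "DELETE FROM celery_taskmeta",                      # obearn (Celery DB result backend only)
)


def clear_pickled_data(conn, statements=CLEANUP_STATEMENTS):
    """Run every statement in one transaction; roll back everything on error.

    conn is any DB-API 2.0 connection (sqlite3 here; psycopg2 for Postgres).
    """
    cur = conn.cursor()
    try:
        for stmt in statements:
            cur.execute(stmt)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dag_pickle (id INTEGER PRIMARY KEY, pickle BLOB);
        CREATE TABLE dag_run (id INTEGER PRIMARY KEY, conf BLOB);
        CREATE TABLE task_instance (task_id TEXT, executor_config BLOB);
        CREATE TABLE celery_taskmeta (id INTEGER PRIMARY KEY, result BLOB);
    """)
    clear_pickled_data(conn)
```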

@markhatch
Contributor

Is there a reason why pickle is used to store information in the DB? It seems like a sure way to cause upgrade issues...

@potiuk
Member

potiuk commented Apr 20, 2022

That's how tasks are serialized for Celery, for example, so that tasks can be retried if they fail to execute. There are multiple reasons for that.

But if you have a proposal for how to get rid of it, feel free to discuss it on the devlist or, better, start a PR :) Airflow has more than 2000 contributors, so becoming one is a good idea.


7 participants