Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DB Migration Fails When Running Upgrade 6c7537a6004a -> 3e1b21cd94a4 #6606

Closed
3 tasks done
MikeMangialardi opened this issue Jan 7, 2019 · 11 comments
Closed
3 tasks done
Labels
!deprecated-label:bug Deprecated label - Use #bug instead inactive Inactive for >= 30 days

Comments

@MikeMangialardi
Copy link

Make sure these boxes are checked before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.

Superset version

Superset 0.999.0dev

Expected results

DB Upgrade migration completes successfully

Actual results

DB Upgrade migration fails with the following error:

INFO  [alembic.runtime.migration] Running upgrade 6c7537a6004a -> 3e1b21cd94a4, change_owner_to_m2m_relation_on_datasources.py
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
psycopg2.OperationalError: SSL SYSCALL error: EOF detected


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/superset", line 4, in <module>
    __import__('pkg_resources').run_script('superset==0.999.0.dev0', 'superset')
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1438, in run_script
    exec(code, namespace, namespace)
  File "/usr/lib/python3.6/site-packages/superset-0.999.0.dev0-py3.6.egg/EGG-INFO/scripts/superset", line 15, in <module>
    cli()
  File "/usr/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/flask/cli.py", line 557, in main
    return super(FlaskGroup, self).main(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/flask/cli.py", line 412, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/flask_migrate/cli.py", line 134, in upgrade
    _upgrade(directory, revision, sql, tag, x_arg)
  File "/usr/lib/python3.6/site-packages/flask_migrate/__init__.py", line 95, in wrapped
    f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/flask_migrate/__init__.py", line 280, in upgrade
    command.upgrade(config, revision, sql=sql, tag=tag)
  File "/usr/lib/python3.6/site-packages/alembic/command.py", line 254, in upgrade
    script.run_env()
  File "/usr/lib/python3.6/site-packages/alembic/script/base.py", line 427, in run_env
    util.load_python_file(self.dir, 'env.py')
  File "/usr/lib/python3.6/site-packages/alembic/util/pyfiles.py", line 81, in load_python_file
    module = load_module_py(module_id, path)
  File "/usr/lib/python3.6/site-packages/alembic/util/compat.py", line 145, in load_module_py
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/lib/python3.6/site-packages/superset-0.999.0.dev0-py3.6.egg/superset/migrations/env.py", line 100, in <module>
    run_migrations_online()
  File "/usr/lib/python3.6/site-packages/superset-0.999.0.dev0-py3.6.egg/superset/migrations/env.py", line 93, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/usr/lib/python3.6/site-packages/alembic/runtime/environment.py", line 836, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/usr/lib/python3.6/site-packages/alembic/runtime/migration.py", line 330, in run_migrations
    step.migration_fn(**kw)
  File "/usr/lib/python3.6/site-packages/superset-0.999.0.dev0-py3.6.egg/superset/migrations/versions/3e1b21cd94a4_change_owner_to_m2m_relation_on_.py", line 86, in upgrade
    batch_op.drop_column('user_id')
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/usr/lib/python3.6/site-packages/alembic/operations/base.py", line 300, in batch_alter_table
    impl.flush()
  File "/usr/lib/python3.6/site-packages/alembic/operations/batch.py", line 53, in flush
    fn(*arg, **kw)
  File "/usr/lib/python3.6/site-packages/alembic/ddl/impl.py", line 180, in drop_constraint
    self._exec(schema.DropConstraint(const))
  File "/usr/lib/python3.6/site-packages/alembic/ddl/impl.py", line 115, in _exec
    return conn.execute(construct, *multiparams, **params)
  File "/usr/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/usr/lib/python3.6/site-packages/sqlalchemy/sql/ddl.py", line 68, in _execute_on_connection
    return connection._execute_ddl(self, multiparams, params)
  File "/usr/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1009, in _execute_ddl
    compiled
  File "/usr/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/usr/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/usr/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 248, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SSL SYSCALL error: EOF detected
 [SQL: 'ALTER TABLE tables DROP CONSTRAINT user_id'] (Background on this error at: http://sqlalche.me/e/e3q8)

Steps to reproduce

Build superset from source
Run the command superset db upgrade

@kristw kristw added #question !deprecated-label:bug Deprecated label - Use #bug instead and removed #question labels Jan 8, 2019
@travisgu
Copy link

I am running the docker-compose based on the latest source and got stuck on the same step.
The log prints out:
INFO [alembic.runtime.migration] Running upgrade 6c7537a6004a -> 3e1b21cd94a4, change_owner_to_m2m_relation_on_datasources.py

Then it is hanging and no reponse.

@mattmorganpdx
Copy link

mattmorganpdx commented Jan 16, 2019

I'm having the same problem. When I look at the psql running queries I see 3 seemly identical queries:
SELECT ab_permission_view.id AS ab_permission_view_id, ab_permission_view.permission_id AS ab_permission_view_permission_id, ab_permission_view.view_menu_id AS ab_permission_view_view_menu_id
FROM ab_permission_view
WHERE 13 = ab_permission_view.permission_id AND 26 = ab_permission_view.view_menu_id
LIMIT 1

there is a blocking query:

ALTER TABLE tables DROP CONSTRAINT user_id

I'm working on trying to debug where in the upgrade script that is happening.

@KevinOnEarth
Copy link

I get to the same point. Occasionally, I will get to the next upgrade line, but mostly get stuck here like @travisgu
INFO [alembic.runtime.migration] Running upgrade 6c7537a6004a -> 3e1b21cd94a4, change_owner_to_m2m_relation_on_datasources.py

@mattmorganpdx
Copy link

I'm running with docker compose. I found that if I kill the SELECT ab_permission_view... query with the SELECT pg_terminate_backend(pid) command that the db upgrade would complete. I believe that something the running superset process is doing is locking something required by the alter table command run in the change_owner_to_m2m_relation_on_datasources.py script

@KevinOnEarth
Copy link

I've done that and I get a list of errors, the top of which:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute
    cursor.execute(statement, parameters)
psycopg2.OperationalError: terminating connection due to administrator command
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

and the following when pulling up the superset page:

 Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute
    cursor.execute(statement, parameters)
psycopg2.ProgrammingError: relation "user_attribute" does not exist
LINE 2: FROM user_attribute 

You didn't see these types of errors when you terminated the alter script?

@mohan-kumar
Copy link

mohan-kumar commented Jan 30, 2019

I'm facing same issue. DB upgrade gets stuck at 'ALTER TABLE tables DROP CONSTRAINT user_id'.
@mistercrunch any idea why this is happening?

@mistercrunch
Copy link
Member

So it looks like some the issues are around lingering locks. So either you find the locking processes and kill them, or another option is to schedule downtime, stop all the web servers, and restart the database process prior to running the migration.

There are different ways to hack around alembic (the migration toolkit we use). You can tweak the migration code to skip steps and apply them manually with the proper ALTER command. You can also skip migrations by setting the alembic_version table to a particular version. Say if the command that is getting stuck is removing a now-unused column, you could just decide that this doesn't matter and move past it.

@villebro
Copy link
Member

Ran into this on our test env and was later able to replicate on /contrib/docker. Was able to get around it by doing what @mattmorganpdx did, i.e. starting the upgrade, and once it hangs following the instructions on https://wiki.postgresql.org/wiki/Lock_Monitoring to find the blocking PIDs

SELECT blocked_locks.pid     AS blocked_pid,
         blocked_activity.usename  AS blocked_user,
         blocking_locks.pid     AS blocking_pid,
         blocking_activity.usename AS blocking_user,
         blocked_activity.query    AS blocked_statement,
         blocking_activity.query   AS current_statement_in_blocking_process
   FROM  pg_catalog.pg_locks         blocked_locks
    JOIN pg_catalog.pg_stat_activity blocked_activity  ON blocked_activity.pid = blocked_locks.pid
    JOIN pg_catalog.pg_locks         blocking_locks 
        ON blocking_locks.locktype = blocked_locks.locktype
        AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
        AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
        AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
        AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
        AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
        AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
        AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
        AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
        AND blocking_locks.pid != blocked_locks.pid
 
    JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
   WHERE NOT blocked_locks.GRANTED;

and then killing each violating PID (blocking_pid above) by issuing

SELECT pg_terminate_backend(PID);

Just make sure you kill the blocking PID, not the blocked PID.

@stale
Copy link

stale bot commented Apr 13, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Apr 13, 2019
@kakoni
Copy link
Contributor

kakoni commented Apr 16, 2019

Also ran into this.

 SELECT ab_permission_view.id AS ab_permission_view_id,
        ab_permission_view.permission_id AS ab_permission_view_permission_id, 
        ab_permission_view.view_menu_id AS ab_permission_view_view_menu_id 
 FROM ab_permission_view                                                                                                                                                                         
 WHERE 13 = ab_permission_view.permission_id AND 25 = ab_permission_view.view_menu_id                                                                                                            
 LIMIT 1

Looking at pg_stat_activity, this query was fired first, sitting in idle in transaction state.
Few milli seconds later

ALTER TABLE ab_view_menu ALTER COLUMN name TYPE VARCHAR(255)

is executed, but it locks, as ab_permission_view references ab_view_menu (and select+alter table on the same table block, as per https://www.citusdata.com/blog/2018/02/15/when-postgresql-blocks/)

Anyhow, this was reported earlier, see #1260 (comment)

@stale stale bot removed the inactive Inactive for >= 30 days label Apr 16, 2019
@stale
Copy link

stale bot commented Jun 15, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
!deprecated-label:bug Deprecated label - Use #bug instead inactive Inactive for >= 30 days
Projects
None yet
Development

No branches or pull requests

9 participants