Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrations for 0.11 hangs with posgresql #1260

Closed
karec opened this issue Oct 5, 2016 · 20 comments
Closed

Migrations for 0.11 hangs with posgresql #1260

karec opened this issue Oct 5, 2016 · 20 comments

Comments

@karec
Copy link

karec commented Oct 5, 2016

OS Debian GNU/Linux 8
Postgresql 9.5
Caravel version 0.11.0

To reproduce :

pip install caravel --upgrade
caravel db upgrade

Additional information

  • Webserver not running
  • No application using the database
  • Hangs even if db upgrade is called after postgresql restart

Postgres activity will migration hangs :

blocked_pid | blocked_user | blocking_pid | blocking_user |                         blocked_statement                          |                                  current_statement_in_blocking_process                                   
-------------+--------------+--------------+---------------+--------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------
        8050 | logs         |         8049 | logs          |                                                                   +| UPDATE alembic_version SET version_num='ab3d66c4246e' WHERE alembic_version.version_num = 'eca4694defa7'
             |              |              |               |             SELECT a.attname,                                     +| 
             |              |              |               |               pg_catalog.format_type(a.atttypid, a.atttypmod),    +| 
             |              |              |               |               (SELECT pg_catalog.pg_get_expr(d.adbin, d.adrelid)  +| 
             |              |              |               |                 FROM pg_catalog.pg_attrdef d                      +| 
             |              |              |               |                WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum+| 
             |              |              |               |                AND a.atthasdef)                                   +| 
             |              |              |               |               AS DEFAULT,                                         +| 
             |              |              |               |               a.attnotnull, a.attnum, a.attrelid as table_oid     +| 
             |              |              |               |             FROM pg_catalog.pg_attribute a                        +| 
             |              |              |               |             WHERE a.attrelid = 16962                              +| 
             |              |              |               |             AND a.attnum > 0 AND NOT a.attisdropped               +| 
             |              |              |               |             ORDER BY a.attnum                                     +| 

Migrations logs :

INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 960c69cb1f5b -> f162a1dea4c4, d3format_by_metric
INFO  [alembic.runtime.migration] Running upgrade f162a1dea4c4 -> ad82a75afd82, Update models to support storing the queries.
INFO  [alembic.runtime.migration] Running upgrade ad82a75afd82 -> 3c3ffe173e4f, add_sql_string_to_table
INFO  [alembic.runtime.migration] Running upgrade 3c3ffe173e4f -> 41f6a59a61f2, database options for sql lab
INFO  [alembic.runtime.migration] Running upgrade 41f6a59a61f2 -> 4500485bde7d, allow_run_sync_async
INFO  [alembic.runtime.migration] Running upgrade 4500485bde7d -> 65903709c321, allow_dml
INFO  [alembic.runtime.migration] Running upgrade 41f6a59a61f2 -> 33d996bcc382
INFO  [alembic.runtime.migration] Running upgrade 33d996bcc382, 65903709c321 -> b347b202819b, empty message
INFO  [alembic.runtime.migration] Running upgrade b347b202819b -> 5e4a03ef0bf0, Add access_request table to manage requests to access datastores.
INFO  [alembic.runtime.migration] Running upgrade 5e4a03ef0bf0 -> eca4694defa7, sqllab_setting_defaults
INFO  [alembic.runtime.migration] Running upgrade eca4694defa7 -> ab3d66c4246e, add_cache_timeout_to_druid_cluster
INFO  [alembic.runtime.migration] Running upgrade eca4694defa7 -> 3b626e2a6783, Sync DB with the models.py.

Edit :

First query to hangs is :

SELECT ab_permission_view.id AS ab_permission_view_id, ab_permission_view.permission_id AS ab_permission_view_permission_id, ab_permission_view.view_menu_id AS ab_permission_view_view_menu_id

But even after killing it, migration still hangs at the same point

@woel0007
Copy link

Was there any follow-up to this issue? I'm experiencing the same challenge, upgrading the database from eca4694defa7 to 3b626e2a6783 hangs...

Postgres Version 9.5.3.

Has anyone else been able to successfully upgrade the DB?

@mistercrunch
Copy link
Member

What I don't understand is that the builds on Postgres go through the migrations without any issues. Maybe it's because the tables are empty while they run on our side? A Postgres version issue?

It may help to find which exact line of the migration script hangs and research that specifically.

@woel0007
Copy link

woel0007 commented Oct 18, 2016

I've done a bit more fiddling with this issue this AM and found one interesting tidbit: If the migration is run one-step at a time in sequence, the upgrade succeeds without hanging. Perhaps there's something wrong with the branch/merge?

My starting point was 'eca4694defa7'
I tried to just simply run upgrade, which should have installed 'ab3d66c4246e', '3b626e2a6783', 'ef8843b41dac' and 'b46fa1b0b39e', but it hangs when attempting to install '3b626e2a6783'.

However, running each upgrade individually, one-after-the-other worked just fine:
./caravel/bin/caravel db upgrade ab3d66c4246e
./caravel/bin/caravel db upgrade 3b626e2a6783
./caravel/bin/caravel db upgrade ef8843b41dac
./caravel/bin/caravel db upgrade b46fa1b0b39e

This isn't so much a fix as it is a workaround as I can't really explain what is going wrong with the straight upgrade... but it solved my immediate issue.

@agfunder
Copy link

I succeeded in upgrading by using @woel0007 's method of manual upgrades.
I determined migration order by inspecting 'down_revision' in migrations folder (I assume there is a better way)

However at multiple points I had to kill 'idle in transaction' postgres PIDs using select pg_terminate_backend(PID) as per #238 (I hope a better way is found).

@phreatom
Copy link

phreatom commented Nov 3, 2016

I got the behavior of #238 in 0.12 when doing a fresh install. The first migration produced the deadlock. All following migrations did succeed.

@vavrusa
Copy link
Contributor

vavrusa commented Dec 14, 2016

Did anyone every make superset work with postgres without manually killing transactions?

@GlombSt
Copy link

GlombSt commented Jan 8, 2017

nope, had to kill the queries

@itajaja
Copy link
Contributor

itajaja commented Mar 14, 2017

if it can help, I am experiencing the same problem. starting from blank database, had to kill a couple of connections from time to time. using postgres 9.4.7

@nancykyo
Copy link

nancykyo commented Aug 18, 2017

anyone look into this?
This should be a bug of flask-appbuider, caused by no commit or autocommit.
Except the deadlock during migration, do you notice the "idle in transaction", which ends up with a slow query.

And the query is:

pg.[5432.slow_query.detail]:ip: 10.90.55.109, waiting: False, wait_event_type:NULL, 
wait_event:NULL, datname: superset, usename: xxxx, pid: 19191,appname: , backend_conn_time:2:15:57.151609, state: idle in transaction, run_time: 2:15:55.257645, query: 

[SELECT ab_permission_view.id AS ab_permission_view_id, ab_permission_view.permission_id AS ab_permission_view_permission_id, ab_permission_view.view_menu_id AS ab_permission_view_view_menu_id FROM ab_permission_view WHERE 58 = ab_permission_view.view_menu_id AND 11 = ab_permission_view.permission_id LIMIT 1] |  

@rhunwicks
Copy link
Contributor

tldr; adding sm.get_session.close() to the bottom of superset/__init__.py fixes it.

I have also experienced this bug. I think it happens because:

  • superset/__init__.py calls
  • ConnectorRegistry.register_sources(module_datasource_map) and that in turn does
  • __import__(module_name, fromlist=class_names) and that ends up calling
  • flask_appbuilder.base.add_view() which eventually calls
  • sm.add_permissions_menu() which calls
  • find_permission_view_menu() which does
  • return self.get_session.query(self.permissionview_model).filter_by(permission=permission, view_menu=view_menu).first()

The final call is executing a query against the database, and because of the Python dbapi transaction semantics we now have an open, idle transaction which blocks the CREATE TABLE. That happens because the table has a foreign key to ab_user and so creating the constraint on the database requires AccessExclusiveLock on ab_user, and the transaction can't get that if there is another open transaction somewhere that has selected it somehow.

It's not clear what the best place to fix it is.

  • you could file a bug against Flask-Appbuilder and propose that add_view should always leave the database without an open transaction - i.e. you call self.sm.get_session.close() at the bottom of add_view before return baseview
  • you could try and fix it in appbuilder.security.manager by closing the session before returning the found permission/view - but that will be invasive and you would need to apply it to all the find_ methods and it might break other things if we are calling find midway through a transaction to create a new menu or whatever
  • you could fix it in superset/__init__.py by calling sm.get_session.close at the bottom of the file - that seems safe as we can be sure we shouldn't have any transactions in flight at that point
  • you might also be able to make it work by getting the migration command to use the same session as the security manager

@xrmx
Copy link
Contributor

xrmx commented Oct 8, 2017

@rhunwicks thanks for the analysis! Can we add the session cleanup in a overriden add_permissions_menu in security/sqla/manager.py? Like the following untested:

diff --git a/flask_appbuilder/security/sqla/manager.py b/flask_appbuilder/security/sqla/manager.py
index c75ba548..138914e1 100644
--- a/flask_appbuilder/security/sqla/manager.py
+++ b/flask_appbuilder/security/sqla/manager.py
@@ -251,6 +251,10 @@ class SecurityManager(BaseSecurityManager):
                 log.error(c.LOGMSG_ERR_SEC_DEL_PERMISSION.format(str(e)))
                 self.get_session.rollback()
 
+    def add_permissions_menu(self, view_menu_name):
+        super(SecurityManager, self).add_permissions_menu(self, view_menu_name)
+        self.get_session.close()
+
     # ----------------------------------------------
     #       PRIMITIVES VIEW MENU
     #----------------------------------------------

@jquense
Copy link

jquense commented Jan 4, 2018

Is there any way we can get some attention or love on this issue? It essentially limits our ability to use superset in any serious way. I'd be happy to send a PR but it's not at all clear where to solve the issue

@xrmx
Copy link
Contributor

xrmx commented Jan 4, 2018

@jquense whatever fix the get merged in whatever project that fix installation for postgresql is good :)

@mistercrunch
Copy link
Member

Currently using MySQL so it's hard to reproduce or justify doing the work on my side.

Sounded like it's a FAB bug so it should be addressed there. I'm a committer on that project so I can review on that repo too.

@kakoni
Copy link
Contributor

kakoni commented Jun 27, 2018

Quick note: Personally encountered hanging with postgresql 9.2. However everything went smoothly with 9.6/10.1.

@sohelsarder
Copy link

Encountered hang with postgres 9.4, with postgres 9.6 it works perfectly.

@srs4bhavin
Copy link

hangs with mysql v 8.0.11 on osx 10.13

@yourlin
Copy link

yourlin commented Dec 3, 2018

hangs with mysql v 5.7.23 on osx 10.14

@bestao88
Copy link

confirmed hangs with mysql v8.0.
I tried to test 0.27 to 0.290RC7, if changed the metadata database to mysql 8.0, always hang with below:
INFO: [...] Running Upgrade -> 4e6a06bad7a8, Init

@jdbertron
Copy link

tldr; adding sm.get_session.close() to the bottom of superset/__init__.py fixes it.

I have also experienced this bug. I think it happens because:

  • superset/__init__.py calls
  • ConnectorRegistry.register_sources(module_datasource_map) and that in turn does
  • __import__(module_name, fromlist=class_names) and that ends up calling
  • flask_appbuilder.base.add_view() which eventually calls
  • sm.add_permissions_menu() which calls
  • find_permission_view_menu() which does
  • return self.get_session.query(self.permissionview_model).filter_by(permission=permission, view_menu=view_menu).first()

The final call is executing a query against the database, and because of the Python dbapi transaction semantics we now have an open, idle transaction which blocks the CREATE TABLE. That happens because the table has a foreign key to ab_user and so creating the constraint on the database requires AccessExclusiveLock on ab_user, and the transaction can't get that if there is another open transaction somewhere that has selected it somehow.

It's not clear what the best place to fix it is.

  • you could file a bug against Flask-Appbuilder and propose that add_view should always leave the database without an open transaction - i.e. you call self.sm.get_session.close() at the bottom of add_view before return baseview
  • you could try and fix it in appbuilder.security.manager by closing the session before returning the found permission/view - but that will be invasive and you would need to apply it to all the find_ methods and it might break other things if we are calling find midway through a transaction to create a new menu or whatever
  • you could fix it in superset/__init__.py by calling sm.get_session.close at the bottom of the file - that seems safe as we can be sure we shouldn't have any transactions in flight at that point
  • you might also be able to make it work by getting the migration command to use the same session as the security manager

Great investigation. These are essentially my findings also. Ideally, reusing the connection would be nice but difficult to implement. I implemented an alternative #7417 to using sm.get_session.close() bu leveraging an app context that closes when going out of scope.

zhaoyongjie pushed a commit to zhaoyongjie/incubator-superset that referenced this issue Nov 17, 2021
zhaoyongjie pushed a commit to zhaoyongjie/incubator-superset that referenced this issue Nov 24, 2021
zhaoyongjie pushed a commit to zhaoyongjie/incubator-superset that referenced this issue Nov 25, 2021
zhaoyongjie pushed a commit to zhaoyongjie/incubator-superset that referenced this issue Nov 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests