Proposal for modification to drop_test_schema #5198
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
The reason I used `adapter.drop_schema` was, in theory, to use more adapter-specific macros/methods. Do you know why this works and `adapter.drop_schema` doesn't?
For Redshift, when a table has a current transaction open on it, Redshift locks the table and prevents the `drop_schema` macro from completing successfully due to that interaction. I believe this is just a Redshift issue, and I'm a little worried that adjusting things to fix it might affect other adapters, but I'm not entirely sure.
core/dbt/tests/fixtures/project.py
```python
for schema_name in self.created_schemas:
    relation = self.adapter.Relation.create(database=self.database, schema=schema_name)
    schema = self.adapter.quote_as_configured(relation.schema, "schema")
    self.run_sql(f"drop schema if exists {schema} cascade")
```
I'd definitely prefer us not going back to a hard-coded `run_sql` here. This will definitely cause issues on other adapters.

Do we know why we're running `drop schema` at a point in time when there are still open transactions against objects in the schema? In theory, every test case uses its own dedicated schema, so by the time we're running `drop_test_schema`, we should be totally done with it.
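As an aside, the "dedicated schema per test case" idea can be sketched like this; the function name and naming scheme here are hypothetical, loosely modeled on schema names that appear in this thread (e.g. `test16515170591869289928_test_basic`), not dbt's actual fixture code:

```python
import random
import time

def unique_test_schema(test_name: str) -> str:
    # Hypothetical sketch of per-test schema naming: a timestamp plus a
    # random component keeps concurrent test cases in separate schemas,
    # so one test's open transactions can't lock another test's tables.
    suffix = "".join(str(random.randint(0, 9)) for _ in range(4))
    return f"test{int(time.time())}{suffix}_{test_name}"

schema = unique_test_schema("test_basic")
```

Because each test owns its schema, `drop schema ... cascade` at teardown should never contend with another test's transactions.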
@jtcohen6 from what I can tell, it seems like a lock gets put on tests like `test_basic` and `incremental_unique_id` because they each reference the same tables between many of their tests (e.g. `snapshot_expected` and `duplicate_insert`). I wonder if a test hasn't had enough time to tear down its ref to that table before the next test kicks off, especially if it's in the same class, so Redshift locks to avoid two writers updating the same table, as mentioned in https://docs.aws.amazon.com/redshift/latest/dg/r_STV_LOCKS.html. We were trying to deduce whether the `get_connection` or even the `drop_schema` calls we make as part of `drop_test_schema` were starting a transaction right when we were starting teardown. @kwigley does that sound correct?
> the same tables
These should be in different schemas, though—a dedicated schema for each test case, for all the resources required for that test—right?
Yes, every test should get a randomly generated unique schema, so in theory different tests should not be locking each other's tables. And `drop_test_schema` should only be executed at the end of the test.
It's been a while since I had to interact with Redshift, but I do vaguely remember that dropping tables in Redshift didn't necessarily give back everything... you had to recover the space by... vacuuming? It's been a while. I'm wondering if this is a Redshift management thing anyway.
Also, the `drop_schema` default implementation (which I believe is used by Redshift) is: `drop schema if exists {{ relation.without_identifier() }} cascade`. So that code should be doing pretty much the same thing as what's happening here.
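For reference, the default implementation being described is a Jinja macro along these lines; this is a paraphrase from memory, and the exact statement wrapper in dbt-core may differ:

```sql
{% macro default__drop_schema(relation) -%}
    {# Drops the whole schema; without_identifier() strips the table name
       so only database.schema is rendered. #}
    drop schema if exists {{ relation.without_identifier() }} cascade
{%- endmacro %}
```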
Today I've continued playing around with which tests leave files after running. None of the older integration tests have today; it's only the 11 files generated by the new functional tests. So could this be an issue with how the fixtures work, by chance? Could the initiating models/seeds staying open technically be an active transaction against the tests? Those files wouldn't go away until after the schemas are dropped at the end of the test, correct? Which would prevent the default
One thing that is different is that the older test framework used the initial adapter that was created by test/integration/base.py, which meant that we had to patch providers. To avoid that patching, the new tests now use the latest adapter. I wouldn't think that would make a difference; it's pretty much the same thing you get when you run separate dbt CLI commands (a "new" adapter object every time).
By "leave files after running", you actually mean "leave schemas after running"?
Also, what query are you running to see what tables are left over, and what are the schema/table names?
The general process to look this over has been to move breakpoints throughout the process; I even tried to debug in the macro, which is interesting. Then in Postico (or whatever client) you need to sign in as a superuser to see the lock tables and other relevant information, to see whether you can drop the table while the tests run.
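As context for that superuser lock inspection, a query against Redshift's STV_LOCKS system table (the AWS page linked earlier in the thread) looks roughly like this; the column names are recalled from the docs and worth verifying against that page:

```sql
-- Run as a superuser; shows which tables currently hold locks
-- and which process owns each lock.
select table_id, last_update, lock_owner_pid, lock_status
from stv_locks
order by last_update desc;
```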
Something wonky is going on here when calling this -- I'm not sure what happens before this...

```sql
commit;
begin;
drop schema if exists "test16515170591869289928_test_basic" cascade;
commit;
begin;
```

which works perfectly fine on its own, so I'm not 100% sure why the schema is not dropping. While this PR isn't a solution, it works properly for dropping schemas created by tests in
So it sounds like the problem is with the `drop_schema`/`drop_relation` Redshift code, not `drop_test_schema` per se.
The base implementation of `drop_relation` in core/dbt/adapters/sql/impl.py does `self.cache_dropped(relation)`, but the Redshift version calls the super version... so the only thing different is the fresh_transaction bit?
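To illustrate what a "fresh transaction" wrapper does, here is a minimal, hypothetical Python sketch (not dbt's actual classes): commit whatever is open, begin a new transaction, run the body, then commit, which matches the commit/begin pattern quoted elsewhere in this thread.

```python
from contextlib import contextmanager

class FakeConnection:
    # Toy stand-in for an adapter connection, just to record the SQL
    # that would be sent; not dbt's real connection class.
    def __init__(self):
        self.log = []

    def execute(self, sql):
        self.log.append(sql)

@contextmanager
def fresh_transaction(conn):
    # Sketch: close out any open transaction, start a clean one,
    # run the caller's statements, then commit them.
    conn.execute("commit;")
    conn.execute("begin;")
    try:
        yield
        conn.execute("commit;")
    except Exception:
        conn.execute("rollback;")
        raise

conn = FakeConnection()
with fresh_transaction(conn):
    conn.execute('drop schema if exists "test_schema" cascade;')
```

If nothing ever commits the transaction that issued the DROP, a later rollback would undo it, which is the failure mode discussed below.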
Yeah, which applies the ExclusiveLock and starts a new transaction, but is supposed to drop it to allow us to use
It looks like `adapter.drop_schema` doesn't work in Postgres either.
I looked at the logs, and the `drop_schema` call was getting rolled back every time. The `drop_schema` method in core/dbt/adapters/sql/impl.py did not have a commit. I added `self.commit_if_has_connection()` after `self.execute_macro(DROP_SCHEMA_MACRO_NAME, kwargs=kwargs)` and it worked.
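The fix described here can be sketched with a toy stand-in for the adapter; the class and method bodies are illustrative, and only the method names (`drop_schema`, `execute_macro`, `commit_if_has_connection`) come from the thread:

```python
class SQLAdapterSketch:
    # Illustrative stand-in for core/dbt/adapters/sql/impl.py; records
    # which macros ran and whether a commit happened afterwards.
    def __init__(self):
        self.executed = []
        self.committed = False

    def execute_macro(self, name, kwargs=None):
        self.executed.append(name)

    def commit_if_has_connection(self):
        # Without this call, the transaction that issued DROP SCHEMA
        # was rolled back, so the schema survived every teardown.
        self.committed = True

    def drop_schema(self, relation):
        self.execute_macro("drop_schema", kwargs={"relation": relation})
        self.commit_if_has_connection()

adapter = SQLAdapterSketch()
adapter.drop_schema("test_schema")
```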
Reverted back to previous
…op_test_schema, add commit to drop_schema
I think we need a test of this functionality. It's supposed to be tested in tests/adapter/dbt/tests/adapter/basic/test_adapter_methods.py, but it's pretty clear that wasn't actually testing that a schema was dropped. In addition, looking at that test, I see that they specifically decided in #1983 not to do the commit. But that really wasn't working. @jtcohen6 could you weigh in here?
If adding

@gshank Thanks for finding and linking the context from #1983. I have to disagree with Drew from 2019 on these points, though:

If the

```
adapter.drop_schema(database, schema)
adapter.commit_if_has_connection()
```

Of course, users may also call it for their own purposes. If they do, I think they'd expect it to work :)
It's up to you. In theory it should go in that test case that I mentioned, but that's an adapter-zone test... I'm okay without one in this case.
* proposal for modification to drop_test_schema
* changelog
* remove hard coded run_dbt version and put back previous version of drop_test_schema, add commit to drop_schema
* reorganize Dockerfile and use docker in github CI
* in dbt-core 1.2.x, commit is now being done in SQLAdapter.drop_schema(), so there's no need for it here anymore. see this PR: dbt-labs/dbt-core#5198
* rm file on disk on DROP SCHEMA
* load text.so sqlean extension to get split_parts fn
* add math.so for functions needed by datediff
resolves dbt-labs/dbt-redshift#110
Description
Test files were getting locked due to transactions leaving them in the database after the test ended, leading to an error:

due to the database reaching its limit on tables.
Checklist

- `changie new` to create a changelog entry