
Proposal for modification to drop_test_schema #5198

Merged: 3 commits into main from mcknight/modify_drop_test_schema on May 3, 2022

Conversation

@McKnight-42 (Contributor) commented Apr 29, 2022

Resolves dbt-labs/dbt-redshift#110

Description

Test tables were getting locked by transactions that left them in the database after the tests ended, eventually leading to this error:

psycopg2.errors.InternalError_: 1040
DETAIL:  Maximum tables limit exceeded. The maximum number of tables per cluster is 9900 for this instance type. The limit includes permanent and temporary tables. (pid:25694)

because the database had reached its limit on the number of tables.

Checklist

@cla-bot added the cla:yes label Apr 29, 2022
@github-actions (Contributor) commented:

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@McKnight-42 requested a review from gshank April 29, 2022 20:10
@McKnight-42 self-assigned this Apr 29, 2022
@McKnight-42 (Contributor, Author) commented:

@gshank, @jtcohen6 I was wondering if this change seems reasonable or if it poses any known problems?

@gshank (Contributor) commented Apr 29, 2022

The reason I used adapter.drop_schema was, in theory, to use more adapter-specific macros/methods. Do you know why this works and adapter.drop_schema doesn't?

@McKnight-42 (Contributor, Author) commented Apr 29, 2022

For Redshift, when a table has a transaction currently open against it, the table is locked, which prevents the drop_schema macro from completing successfully because of its interaction with drop schema if exists {{ relation.without_identifier() }} cascade.
The docstring at https://github.com/dbt-labs/dbt-redshift/blob/main/dbt/adapters/redshift/impl.py#L36 talks about this a bit, and so do:
https://docs.aws.amazon.com/redshift/latest/dg/r_DROP_TABLE.html
https://docs.aws.amazon.com/redshift/latest/dg/r_STV_LOCKS.html

I believe this is just a Redshift issue, and I'm a little worried that adjusting dbt-core to fix it might affect other adapters, but I'm not entirely sure.

@McKnight-42 reopened this Apr 29, 2022
# Proposed drop_test_schema body: drop each schema created during the test with raw SQL.
for schema_name in self.created_schemas:
    relation = self.adapter.Relation.create(database=self.database, schema=schema_name)
    schema = self.adapter.quote_as_configured(relation.schema, "schema")
    self.run_sql(f"drop schema if exists {schema} cascade")
Review comment from a Contributor:

I'd definitely prefer us not going back to hard-coded run_sql here. This will definitely cause issues on other adapters.

Do we know why we're running drop schema at a point in time when there are still open transactions against objects in the schema? In theory, every test case uses its own dedicated schema, so by the time we're running drop_test_schema, we should be totally done with it.

@McKnight-42 (Contributor, Author) replied:

@jtcohen6 from what I can tell, a lock seems to get put on things like test_basic and incremental_unique_id because they each reference the same tables across many of their tests (e.g. snapshot_expected and duplicate_insert). I wonder if a test hasn't had enough time to tear down its reference to such a table before the next test kicks off, especially within the same class, so Redshift locks the table to avoid two writers updating it at once, as mentioned in https://docs.aws.amazon.com/redshift/latest/dg/r_STV_LOCKS.html. We were trying to deduce whether the get_connection or even the drop_schema calls we make as part of drop_test_schema were starting a transaction right when we were starting teardown. @kwigley, does that sound correct?

Review comment from a Contributor:

> the same tables

These should be in different schemas, though: a dedicated schema for each test case, for all the resources required for that test. Right?

Review comment from a Contributor:

Yes, every test should get a randomly generated unique schema. So in theory, different tests should not be locking each other's tables. And drop_test_schema should only be executed at the end of the test.

@gshank (Contributor) commented May 2, 2022

It's been a while since I had to interact with Redshift, but I do vaguely remember that dropping tables in Redshift didn't necessarily give back everything... you had to recover the space, etc., by... vacuuming? It's been a while. I'm wondering if this is a Redshift management thing anyway.

@gshank (Contributor) commented May 2, 2022

Also, the drop_schema default implementation (which I believe is used by Redshift) is drop schema if exists {{ relation.without_identifier() }} cascade. So that code should be doing pretty much the same thing as what's happening here.

@McKnight-42 (Contributor, Author) commented May 2, 2022

Today I've continued playing around with which tests do leave files after running. None of the older integration tests have today; it's only the 11 files generated by the new functional tests. So could this be an issue with how the fixtures work, by chance? Could the initiating models/seeds staying open technically count as an active transaction against the tests? Those wouldn't go away until after the schemas are dropped at the end of the test, correct? Which would prevent the default drop_schema functionality from working?

@McKnight-42 closed this May 2, 2022
@McKnight-42 reopened this May 2, 2022
@gshank (Contributor) commented May 2, 2022

One thing that is different is that the older test framework used the initial adapter that was created by test/integration/base.py, which meant that we had to patch providers. In order to avoid that patching, the new tests now use the latest adapter. I wouldn't think that would make a difference. It's pretty much the same thing that you get when you run separate CLI dbt commands (a "new" adapter object every time).

@gshank (Contributor) commented May 2, 2022

By "leave files after running", you actually mean "leave schemas after running"?

@gshank (Contributor) commented May 2, 2022

Also, what query are you running to see what tables are left over, and what are the schema/table names?

@McKnight-42 (Contributor, Author) commented May 2, 2022

> Also, what query are you running to see what tables are left over, and what are the schema/table names?

My general process for looking this over has been to move breakpoints throughout the run; I even tried to debug inside the macro, which is interesting.

Then, in Postico (or whatever client), you need to sign in as a superuser to see lock tables and other relevant information.
Various commands run:

To see if you can drop it:
--drop schema if exists "<table_name>" cascade;
To look at currently locked tables:
select table_id, last_update, lock_owner, lock_owner_pid from stv_locks;
To see which tables hold open locks:
select "table" from svv_table_info where table_id = <insert a table_id from the command above>;
To get more information about a table, like what type of lock is on it:
select * from svv_transactions;

As for running the tests:
python -m pytest tests/functional/adapter/test_basic.py -s, then comparing the generated tables to the tables that remain at various points during the tests, at each breakpoint stop and after test completion.
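
(If it helps, the same checks can be scripted; a minimal sketch follows, with placeholder connection parameters and assuming a superuser login so the system tables are visible:)

import psycopg2

# Placeholders: point these at the test cluster and a superuser account.
conn = psycopg2.connect(host="<cluster-endpoint>", port=5439,
                        dbname="<database>", user="<superuser>", password="<password>")
with conn, conn.cursor() as cur:
    # Current locks, same query as above.
    cur.execute("select table_id, last_update, lock_owner, lock_owner_pid from stv_locks")
    for table_id, last_update, owner, pid in cur.fetchall():
        # Resolve each locked table_id to a table name.
        cur.execute('select "table" from svv_table_info where table_id = %s', (table_id,))
        row = cur.fetchone()
        print(table_id, row[0] if row else "<unknown>", last_update, owner, pid)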

@kwigley (Contributor) commented May 2, 2022

Something wonky is going on here when calling self.adapter.drop_schema(relation) for Redshift specifically, and it is difficult to debug. It involves database transactions and Python threading. I have a feeling it is related to this bit of code: https://github.com/dbt-labs/dbt-redshift/blob/6543b82bf5943e2bc272880984380205437d49e6/dbt/adapters/redshift/impl.py#L36-L52.
This snippet is essentially the equivalent of

-- I'm not sure what happens before this...
commit;
begin;
drop schema if exists "test16515170591869289928_test_basic" cascade;
commit;
begin;

which works perfectly fine, so I'm not 100% sure why the schema is not dropping. While this PR isn't a solution, it does properly drop the schemas created by tests in dbt-redshift. I'm still working to understand why, though.

@gshank (Contributor) commented May 2, 2022

So it sounds like the problem is with the Redshift drop_schema/drop_relation code, not with drop_test_schema per se.

@gshank (Contributor) commented May 2, 2022

The base implementation of "drop_relation" in core/dbt/adapters/sql/impl.py does "self.cache_dropped(relation)", but the redshift version calls the super version... so the only thing different is the fresh_transaction bit?

@McKnight-42 (Contributor, Author) replied:

> The base implementation of "drop_relation" in core/dbt/adapters/sql/impl.py does "self.cache_dropped(relation)", but the redshift version calls the super version... so the only thing different is the fresh_transaction bit?

Yeah, which acquires the exclusive lock and starts a new transaction, but it is supposed to release that again so we can use drop_relation from core.
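
(For illustration, the commit/begin dance that wrapper performs looks roughly like this. This is a sketch only, not the actual dbt-redshift code; drop_lock and connections are stand-in names:)

from contextlib import contextmanager
import threading

drop_lock = threading.Lock()  # stand-in for the adapter-level lock

@contextmanager
def fresh_transaction(connections):
    # Sketch: hold an exclusive lock, commit whatever transaction is open,
    # run the wrapped statement in a fresh transaction, then commit and
    # begin again before releasing the lock. This mirrors the
    # commit; begin; drop ...; commit; begin; sequence shown above.
    with drop_lock:
        connections.commit()
        connections.begin()
        yield
        connections.commit()
        connections.begin()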

@gshank (Contributor) commented May 2, 2022

It looks like adapter.drop_schema doesn't work in Postgres either.

@gshank (Contributor) commented May 2, 2022

I looked at the logs and the 'drop_schema' call was getting rolled back every time. The 'drop_schema' method in core/dbt/adapters/sql/impl.py did not have a commit. I added "self.commit_if_has_connection()" after "self.execute_macro(DROP_SCHEMA_MACRO_NAME, kwargs=kwargs)" and it worked.
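
(In other words, the shape of the fixed method is roughly the following; a sketch, not the verbatim dbt-core code:)

def drop_schema(self, relation):
    # Sketch of SQLAdapter.drop_schema in core/dbt/adapters/sql/impl.py after the fix.
    kwargs = {"relation": relation}
    self.execute_macro(DROP_SCHEMA_MACRO_NAME, kwargs=kwargs)
    # The added line: explicitly commit so the DROP SCHEMA isn't rolled back
    # when the connection is later cleaned up.
    self.commit_if_has_connection()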

@McKnight-42 (Contributor, Author) commented May 2, 2022

Reverted to the previous drop_test_schema method and added the suggested code above.

@McKnight-42 requested a review from jtcohen6 May 3, 2022 15:26
@McKnight-42 marked this pull request as ready for review May 3, 2022 16:23
@McKnight-42 requested review from a team as code owners May 3, 2022 16:23
@gshank (Contributor) left a review comment:

I think we need a test of this functionality. It's supposed to be tested in tests/adapter/dbt/tests/adapter/basic/test_adapter_methods.py, but it's pretty clear that wasn't actually testing that a schema was dropped. In addition, when looking at that test, I see that they specifically decided in #1983 not to do the commit. But that really wasn't working. @jtcohen6, could you weigh in here?

@gshank requested review from jtcohen6 and removed request for jtcohen6 May 3, 2022 16:32
@jtcohen6 (Contributor) commented May 3, 2022

If adding commit_if_has_connection() fixes this, I'm very happy.

@gshank Thanks for finding and linking the context from #1983. I have to disagree with Drew from 2019 on these points, though:

> these sprinkled-in transactional statements make it super hard to reason about what dbt is doing in different parts of the codebase.
> it's the destructive nature of this method that gave me pause

If the commit is necessary, we should have it. It should be sprinkled through the logs wherever it's needed. As far as the "destructive" nature of drop schema: dbt itself doesn't actually call drop_schema anywhere, in the course of its standard operations. It gets called in two highly specific places:

  • While running functional tests on dbt-core (this issue)
  • By dbt Cloud, after a PR is merged, to drop the "PR schema" used by the CI check built from that PR. I just checked into this part of the dbt Cloud codebase, and guess what's there?
            adapter.drop_schema(database, schema)
            adapter.commit_if_has_connection()

Of course, users may also call it for their own purposes. If they do, I think they'd expect it to work :)

@gshank (Contributor) commented May 3, 2022

It's up to you. In theory it should go in that test case that I mentioned, but that's an adapter zone test... I'm okay without one in this case.

@McKnight-42 merged commit c270a77 into main May 3, 2022
@McKnight-42 deleted the mcknight/modify_drop_test_schema branch May 3, 2022 18:10
agoblet pushed a commit to BigDataRepublic/dbt-core that referenced this pull request May 20, 2022
* propseal for modification to drop_test_schema

* changelog

* remove hard coded run_dbt version and put back previous version of drop_test_schema, add commit to drop_schema
codeforkjeff added a commit to codeforkjeff/dbt-sqlite that referenced this pull request Jul 16, 2022
codeforkjeff added a commit to codeforkjeff/dbt-sqlite that referenced this pull request Aug 2, 2022
* reorganize Dockerfile and use docker in github CI
* in dbt-core 1.2.x, commit is now being done in SQLAdapter.drop_schema(), so there's no need for it here anymore. see this PR: dbt-labs/dbt-core#5198
* rm file on disk on DROP SCHEMA
* load text.so sqlean extension to get split_parts fn
* add math.so for functions needed by datediff