
[Bug] Extremely slow query to determine relations that uses all disk space (160GB) #657

@michalc

Description


Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When running dbt --log-level debug run (after following https://docs.getdbt.com/guides/manual-install?step=1, but for PostgreSQL), I see it gets stuck on this query:

with relation as (
    select
        pg_rewrite.ev_class as class,
        pg_rewrite.oid as id
    from pg_rewrite
),
class as (
    select
        oid as id,
        relname as name,
        relnamespace as schema,
        relkind as kind
    from pg_class
),
dependency as (
    select distinct
        pg_depend.objid as id,
        pg_depend.refobjid as ref
    from pg_depend
),
schema as (
    select
        pg_namespace.oid as id,
        pg_namespace.nspname as name
    from pg_namespace
    where nspname != 'information_schema' and nspname not like 'pg\_%'
),
referenced as (
    select
        relation.id AS id,
        referenced_class.name ,
        referenced_class.schema ,
        referenced_class.kind
    from relation
    join class as referenced_class on relation.class=referenced_class.id
    where referenced_class.kind in ('r', 'v', 'm')
),
relationships as (
    select
        referenced.name as referenced_name,
        referenced.schema as referenced_schema_id,
        dependent_class.name as dependent_name,
        dependent_class.schema as dependent_schema_id,
        referenced.kind as kind
    from referenced
    join dependency on referenced.id=dependency.id
    join class as dependent_class on dependency.ref=dependent_class.id
    where
        (referenced.name != dependent_class.name or
         referenced.schema != dependent_class.schema)
)

select
    referenced_schema.name as referenced_schema,
    relationships.referenced_name as referenced_name,
    dependent_schema.name as dependent_schema,
    relationships.dependent_name as dependent_name
from relationships
join schema as dependent_schema on relationships.dependent_schema_id=dependent_schema.id
join schema as referenced_schema on relationships.referenced_schema_id=referenced_schema.id
group by referenced_schema, referenced_name, dependent_schema, dependent_name
order by referenced_schema, referenced_name, dependent_schema, dependent_name;

After roughly 53 minutes, it errors:

08:30:45  Finished running  in 0 hours 53 minutes and 15.01 seconds (3195.01s).
08:30:45  Encountered an error:
Database Error
  could not write to file "base/pgsql_tmp/pgsql_tmp16987.290": No space left on device

And I see in our database metrics that all the local disk space on the instance was used (I have now reproduced this multiple times): in AWS terms, the FreeLocalStorage metric drops to zero.

(Screenshot: FreeLocalStorage metric dropping to zero)

Admittedly, ours is quite a... full... database: terabytes of data, thousands of tables (some with a few thousand partitions each), a few thousand roles, and I suspect millions of rows in pg_auth_members, if that makes a difference.
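For anyone wanting to compare, the catalog sizes the planner has to work with can be gauged with a few counts; this is just a diagnostic sketch, and the comments reflect the estimates visible in the plan further down rather than exact measured figures:

```sql
-- Diagnostic sketch: row counts of the catalogs the query above joins.
-- On this database the planner estimates ~7.2M rows in pg_class and
-- ~14M rows in pg_depend.
select count(*) from pg_class;    -- every table, index, partition, view, toast table, ...
select count(*) from pg_depend;   -- one row per dependency edge between objects
select count(*) from pg_rewrite;  -- one row per view/rule rewrite entry
```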

Running EXPLAIN on the query produces this plan:

"QUERY PLAN"
"Sort  (cost=5802750502280924.00..5802750506280924.00 rows=1600000000 width=256)"
"  Sort Key: referenced_schema.name, referenced_class.name, dependent_schema.name, dependent_class.name"
"  CTE class"
"    ->  Seq Scan on pg_class  (cost=0.00..381562.96 rows=7231296 width=73)"
"  CTE schema"
"    ->  Seq Scan on pg_namespace  (cost=0.00..139.52 rows=2543 width=68)"
"          Filter: ((nspname <> 'information_schema'::name) AND (nspname !~~ 'pg\_%'::text))"
"  ->  HashAggregate  (cost=4719087471544695.00..5802749300264574.00 rows=1600000000 width=256)"
"        Group Key: referenced_schema.name, referenced_class.name, dependent_schema.name, dependent_class.name"
"        Planned Partitions: 128"
"        ->  Hash Join  (cost=2809231.22..14550321923462.91 rows=5284141410595979 width=256)"
"              Hash Cond: ((pg_rewrite.oid = pg_depend.objid) AND (dependent_class.id = pg_depend.refobjid))"
"              ->  Nested Loop  (cost=1561869.91..4785569549797.32 rows=129227083182591 width=264)"
"                    Join Filter: ((referenced_class.name <> dependent_class.name) OR (referenced_class.schema <> dependent_class.schema))"
"                    ->  Nested Loop  (cost=0.15..7485498.32 rows=1405503 width=136)"
"                          Join Filter: (referenced_class.schema = referenced_schema.id)"
"                          ->  Nested Loop  (cost=0.15..1160707.66 rows=110539 width=72)"
"                                Join Filter: (pg_rewrite.ev_class = referenced_class.id)"
"                                ->  Index Scan using pg_rewrite_oid_index on pg_rewrite  (cost=0.15..69.55 rows=267 width=8)"
"                                ->  CTE Scan on class referenced_class  (cost=0.00..171743.28 rows=108469 width=72)"
"                                      Filter: (kind = ANY ('{r,v,m}'::""char""[]))"
"                          ->  CTE Scan on schema referenced_schema  (cost=0.00..50.86 rows=2543 width=68)"
"                    ->  Materialize  (cost=1561869.76..5214698.30 rows=91945929 width=136)"
"                          ->  Merge Join  (cost=1561869.76..2959149.65 rows=91945929 width=136)"
"                                Merge Cond: (dependent_schema.id = dependent_class.schema)"
"                                ->  Sort  (cost=194.70..201.05 rows=2543 width=68)"
"                                      Sort Key: dependent_schema.id"
"                                      ->  CTE Scan on schema dependent_schema  (cost=0.00..50.86 rows=2543 width=68)"
"                                ->  Materialize  (cost=1561675.07..1597831.55 rows=7231296 width=72)"
"                                      ->  Sort  (cost=1561675.07..1579753.31 rows=7231296 width=72)"
"                                            Sort Key: dependent_class.schema"
"                                            ->  CTE Scan on class dependent_class  (cost=0.00..144625.92 rows=7231296 width=72)"
"              ->  Hash  (cost=1206078.13..1206078.13 rows=2183545 width=8)"
"                    ->  HashAggregate  (cost=1051564.59..1184242.68 rows=2183545 width=8)"
"                          Group Key: pg_depend.objid, pg_depend.refobjid"
"                          Planned Partitions: 64"
"                          ->  Seq Scan on pg_depend  (cost=0.00..253497.58 rows=14187858 width=8)"

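The hash join's estimated ~5.3 quadrillion rows explain the spill to pgsql_tmp. As a stopgap while this is unfixed, PostgreSQL's standard temp_file_limit setting can make a query like this fail fast instead of exhausting local storage; this is just a sketch of a session-level guard, not something dbt itself does, and the 10GB value is an arbitrary illustration:

```sql
-- Workaround sketch: cap per-process temp file usage so a runaway
-- hash/sort spill errors out early instead of filling the disk.
-- temp_file_limit is a standard PostgreSQL setting (9.2+); it can also
-- be set per role or in postgresql.conf.
SET temp_file_limit = '10GB';
```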
Expected Behavior

dbt run completes without error.

Steps To Reproduce

  1. Follow https://docs.getdbt.com/guides/manual-install?step=1, but use PostgreSQL instead of BigQuery
  2. dbt --log-level debug run

(I suspect this won't reproduce in most setups, which have far less data in the database.)

Relevant log output

07:37:32  Using postgres connection "master"
07:37:32  On master: BEGIN
07:37:32  Opening a new connection, currently in state <snip>
07:37:32  SQL status: BEGIN in 0.063 seconds
07:37:32  Using postgres connection "master"
07:37:32  On master: /* {"app": "dbt", "dbt_version": "1.9.1", "profile_name": "<snip>", "target_name": "dev", "connection_name": "master"} */
with relation as (
        select
            pg_rewrite.ev_class as class,
            pg_rewrite.oid as id
        from pg_rewrite
    ),
    class as (
        select
            oid as id,
            relname as name,
            relnamespace as schema,
            relkind as kind
        from pg_class
    ),
    dependency as (
        select distinct
            pg_depend.objid as id,
            pg_depend.refobjid as ref
        from pg_depend
    ),
    schema as (
        select
            pg_namespace.oid as id,
            pg_namespace.nspname as name
        from pg_namespace
        where nspname != 'information_schema' and nspname not like 'pg\_%'
    ),
    referenced as (
        select
            relation.id AS id,
            referenced_class.name ,
            referenced_class.schema ,
            referenced_class.kind
        from relation
        join class as referenced_class on relation.class=referenced_class.id
        where referenced_class.kind in ('r', 'v', 'm')
    ),
    relationships as (
        select
            referenced.name as referenced_name,
            referenced.schema as referenced_schema_id,
            dependent_class.name as dependent_name,
            dependent_class.schema as dependent_schema_id,
            referenced.kind as kind
        from referenced
        join dependency on referenced.id=dependency.id
        join class as dependent_class on dependency.ref=dependent_class.id
        where
            (referenced.name != dependent_class.name or
             referenced.schema != dependent_class.schema)
    )

    select
        referenced_schema.name as referenced_schema,
        relationships.referenced_name as referenced_name,
        dependent_schema.name as dependent_schema,
        relationships.dependent_name as dependent_name
    from relationships
    join schema as dependent_schema on relationships.dependent_schema_id=dependent_schema.id
    join schema as referenced_schema on relationships.referenced_schema_id=referenced_schema.id
    group by referenced_schema, referenced_name, dependent_schema, dependent_name
    order by referenced_schema, referenced_name, dependent_schema, dependent_name;
08:30:45  Postgres adapter: Postgres error: could not write to file "base/pgsql_tmp/pgsql_tmp16987.290": No space left on device

08:30:45  On master: ROLLBACK
08:30:45  Postgres adapter: Error running SQL: macro postgres__get_relations
08:30:45  Postgres adapter: Rolling back transaction.
08:30:45  On master: Close
08:30:45  Connection 'master' was properly closed.
08:30:45  Connection 'list_public_datasets_1__user_9c57e86a' was properly closed.
08:30:45  
08:30:45  Finished running  in 0 hours 53 minutes and 15.01 seconds (3195.01s).
08:30:45  Encountered an error:
Database Error
  could not write to file "base/pgsql_tmp/pgsql_tmp16987.290": No space left on device
08:30:45  Resource report: {"command_name": "run", "command_success": false, "command_wall_clock_time": 3196.026, "process_in_blocks": "23912", "process_kernel_time": 0.181269, "process_mem_max_rss": "109924", "process_out_blocks": "896", "process_user_time": 2.499159}
08:30:45  Command `dbt run` failed at 08:30:45.330191 after 3196.03 seconds
08:30:45  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f8614c4e0d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f861127d6a0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f861127d700>]}
08:30:45  Flushing usage events
08:30:45  An error was encountered while trying to flush usage events

Environment

- OS: Debian Bullseye
- Python: 3.9.19
- dbt-postgres: 1.9.1

Additional Context

No response
