Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2561] [Bug] dbt model referencing an ephemeral model results in broken compiled SQL code #7595

Closed
2 tasks done
MauroLuzzatto opened this issue May 11, 2023 · 5 comments
Closed
2 tasks done
Labels
bug Something isn't working duplicate This issue or pull request already exists

Comments

@MauroLuzzatto
Copy link

MauroLuzzatto commented May 11, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When running a dbt model that references an ephemeral model, the resulting compiled SQL code created by dbt contains an error.

Specifically, the ephemeral CTE is followed by a “with” statement instead of a comma, which causes a “Syntax error”.

Example of the incorrect compiled dbt model code:

 create or replace table `dwh`.`analytics`.`dim_table`
 partition by TIMESTAMP_trunc(source_partition_ts, day)
 cluster by id
 OPTIONS(
   description="""description"""
 )
 as (
   with __dbt__cte__my_ephemeral_model as (
      SELECT
         col
      FROM `dwh`.`analytics`.`activity`
      GROUP BY col
)

-- Comment: Here, the error appears with the wrongfully inserted "WITH" statement
WITH my_cte_code AS (
…

)

This results in a Syntax error (which is expected):

Syntax error: Expected "(" or "," or keyword SELECT but got keyword WITH at [33:1]

This error appeared during an airflow run using a docker image using the python:3.8-slim-bullseye base. It also appeared for other models following the same structure of referencing an ephemeral model during GHA runs (Ubuntu: 22.04.2 LTS distribution). However, we could not reproduce the error in our local environment with a macOS Ventura Version 13.3.1 (22E261) OS.

The dbt packages had the same versions in all of the environments.

Expected Behavior

Generate correct SQL with a comma instead of a “with” statement

Example of the correct code:

create or replace table `dwh`.`analytics`.`dim_table`
partition by TIMESTAMP_trunc(source_partition_ts, day)
cluster by track_id
OPTIONS(
   description="""description"""
 )
   as (
     with __dbt__cte__my_ephemeral_model as (
        SELECT
           col
        FROM `dwh`.`analytics`.`activity`
        GROUP BY col
),

my_cte_code AS (
…
)

Steps To Reproduce

We were struggling to reproduce the error in all our environments. We could see it appearing for the same model codes in GHA running on Ubuntu: 22.04.2 LTS OS and on airflow using a docker image using the python:3.8-slim-bullseye base.

The reoccurring pattern of the errors that the models were referencing ephemeral models and that the compiled SQL code contained the same error with the erroneous “with” statement.

  1. Env

Docker:

  • python:3.8-slim-bullseye

GHA:

  • Ubuntu: 22.04.2 LTS

Requirements:

  • dbt-core==1.3.0
  • dbt-bigquery==1.3.0
  1. With this config

Ephemeral model code (my_ephemeral_model.sql):

{{ config(materialized = 'ephemeral') }}   

SELECT
       col
FROM `dwh`.`analytics`.`activity`
GROUP BY col

dim_model example model code:

{{
    config(
        materialized = 'table',
        partition_by = {
            "field": "source_partition_ts",
            "data_type": "TIMESTAMP",
            "granularity": "day"
        },
        cluster_by = ["id"]
    )
}}

-- my comment
WITH metadata AS (
    SELECT
        *
    FROM {{ ref(metadata) }}
),

{# my comment #}
stats AS (
    SELECT
        meta.id,
        source_partition_ts
        …

    FROM metadata as meta
    LEFT JOIN {{ ref(my_ephemeral_model) }} as eph
        ON meta.track_id = eph.track_id

)
)

SELECT
    *
FROM stats
  1. Run
    dbt run -s dim_table --full-refresh --defer --state manifest

  2. See error

Syntax error: Expected "(" or "," or keyword SELECT but got keyword WITH at [33:1]

Relevant log output

17:47:35    Syntax error: Expected "(" or "," or keyword SELECT but got keyword WITH at [111:1]

Environment

- OS: Ubuntu: 22.04.2 LTS
- Python: 3.8.10
- dbt-core==1.3.0
- dbt-bigquery==1.3.0

Which database adapter are you using with dbt?

bigquery

Additional Context

noresponse

@MauroLuzzatto MauroLuzzatto added bug Something isn't working triage labels May 11, 2023
@github-actions github-actions bot changed the title [Bug] dbt model referencing an ephemeral model results in broken compiled SQL code [CT-2561] [Bug] dbt model referencing an ephemeral model results in broken compiled SQL code May 11, 2023
@jtcohen6
Copy link
Contributor

@MauroLuzzatto I believe this is a duplicate of #7396 (and #7521). Could you check the installed version of sqlparse in your environment?

pip freeze | grep 'sqlparse'

I'm guessing it's sqlparse==0.4.4. If you try downgrading to v0.4.3 or earlier, this error should be resolved.

We were struggling to reproduce the error in all our environments. We could see it appearing for the same model codes in GHA running on Ubuntu: 22.04.2 LTS OS and on airflow using a docker image using the python:3.8-slim-bullseye base.

We've also struggled to reliably reproduce this error — it seems to vary based on operating system. If you manage to do any more digging, could you weigh into #7515 (issue to resolve the root cause going forward)?

@MauroLuzzatto
Copy link
Author

MauroLuzzatto commented May 12, 2023

@jtcohen6 Thank you very much for the quick replay!
You are correct. We were indeed running sqlparse==0.4.4 in the runs that failed. We will downgrade the version.
Yes, I will leave a comment in #7515, if we will find out anything about the error.

So far, I have only seen the errors on

  • python:3.8-slim-bullseye
  • Ubuntu: 22.04.2 LTS

And not in any local development environment:

  • macOS Ventura Version 13.3.1

However, that might also be connected to the fact that the dbt and other modules are installed there (CI and deployment) on every run.

I have also seen the error disappearing after retriggering the GHA run.

@dbeatty10
Copy link
Contributor

I have also seen the error disappearing after retriggering the GHA run.

All of this is really useful troubleshooting information for us about sqlparse==0.4.4 -- thank you @MauroLuzzatto!

I added this comment summarizing your findings so far: #7515 (comment)

Thanks again for reporting this issue. I'm going to close this in favor of #7515. Much appreciated if you are able to share in that issue anything else you find find out the error.

@gshank
Copy link
Contributor

gshank commented Jul 27, 2023

New fix in #8215

@dbeatty10
Copy link
Contributor

I meant to close this in favor of #7515 earlier, but I'm really going to do it this time!

Please reach out if this is still an issue and we can re-open or create a new issue.

@dbeatty10 dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Aug 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working duplicate This issue or pull request already exists
Projects
None yet
Development

No branches or pull requests

4 participants