
[CT-346] [Bug] Snapshot never succeeds (but does all the work successfully) #4853

Closed
1 task done
MaxKrog opened this issue Mar 10, 2022 · 32 comments
Labels
bug (Something isn't working) · snapshots (Issues related to dbt's snapshot functionality)

Comments

@MaxKrog

MaxKrog commented Mar 10, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When I run `dbt snapshot`, my snapshot model runs, but the CLI never moves on to the following task. As far as I can tell, all of the snapshot work completes successfully (the snapshot table is appended to, etc.).

This happens both locally and on dbt Cloud, with exactly the same behaviour.

Expected Behavior

I expect the CLI to either return me to an empty prompt on success, or continue with the next command.

Steps To Reproduce

  1. Set up a snapshot model like this:
{% snapshot snap__transaction_records_state %}

{{
    config(
      description='Snapshot of transaction records state for accounting automation project. Started 2022-02-21.',
      target_database='transfer-galaxy',
      target_schema='dbt_snapshots',
      unique_key='tx_id',
      strategy='check',
      check_cols='all',
      partition_by={'field': 'tx_date_day', 'data_type': 'date'},
    )
}}

select
    tx_id,
    date(tx_time_cet) as tx_date_day,
    tx_state_enum,
    tx_state,
from
    {{ ref('src_portal__transaction_records') }}

{% endsnapshot %}
  2. Run `dbt snapshot`
  3. Get terminal output like:
11:12:50  Running with dbt=1.0.3
11:12:50  Found 205 models, 100 tests, 1 snapshot, 13 analyses, 392 macros, 0 operations, 5 seed files, 80 sources, 1 exposure, 0 metrics
11:12:50  
11:12:51  Concurrency: 16 threads (target='default')
11:12:51  
11:12:51  1 of 1 START snapshot dbt_snapshots.snap__transaction_records_state............. [RUN]

Wait an eternity (until a timeout of some sort; in dbt Cloud I think it's 13 hours), until the job is cancelled.

Relevant log output

Log output from dbt cloud for "dbt snapshot" here:

2022-03-10 11:12:50.561223 (MainThread): 11:12:50  Running with dbt=1.0.3
2022-03-10 11:12:50.561843 (MainThread): 11:12:50  running dbt with arguments Namespace(cls=<class 'dbt.task.snapshot.SnapshotTask'>, debug=None, defer=None, event_buffer_size=None, exclude=None, fail_fast=None, log_cache_events=False, log_format=None, partial_parse=None, printer_width=None, profile='user', profiles_dir='/tmp/jobs/47162168/.dbt', project_dir=None, record_timing_info=None, rpc_method='snapshot', select=None, selector_name=None, send_anonymous_usage_stats=None, single_threaded=False, state=None, static_parser=None, target='default', threads=None, use_colors=None, use_experimental_parser=None, vars='{}', version_check=None, warn_error=None, which='snapshot', write_json=None)
2022-03-10 11:12:50.562060 (MainThread): 11:12:50  Tracking: tracking
2022-03-10 11:12:50.571622 (MainThread): 11:12:50  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f49a8616a90>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f49a8616850>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f49a8616c70>]}
2022-03-10 11:12:50.701971 (MainThread): 11:12:50  Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
2022-03-10 11:12:50.702302 (MainThread): 11:12:50  Partial parsing enabled, no changes found, skipping parsing
2022-03-10 11:12:50.723178 (MainThread): 11:12:50  Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '35add917-e38d-4443-a411-d0171182a210', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f4991fe4e50>]}
2022-03-10 11:12:50.951696 (MainThread): 11:12:50  Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': '35add917-e38d-4443-a411-d0171182a210', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f49a8719e50>]}
2022-03-10 11:12:50.952118 (MainThread): 11:12:50  Found 205 models, 100 tests, 1 snapshot, 13 analyses, 392 macros, 0 operations, 5 seed files, 80 sources, 1 exposure, 0 metrics
2022-03-10 11:12:50.964288 (MainThread): 11:12:50  
2022-03-10 11:12:50.964778 (MainThread): 11:12:50  Acquiring new bigquery connection "master"
2022-03-10 11:12:50.965712 (ThreadPoolExecutor-0_0): 11:12:50  Acquiring new bigquery connection "list_transfer-galaxy"
2022-03-10 11:12:50.966081 (ThreadPoolExecutor-0_0): 11:12:50  Opening a new connection, currently in state init
2022-03-10 11:12:51.221515 (ThreadPoolExecutor-1_0): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_risk_scoring"
2022-03-10 11:12:51.222081 (ThreadPoolExecutor-1_1): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_forecasting"
2022-03-10 11:12:51.222541 (ThreadPoolExecutor-1_0): 11:12:51  Opening a new connection, currently in state closed
2022-03-10 11:12:51.222941 (ThreadPoolExecutor-1_2): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_staging_zendesk"
2022-03-10 11:12:51.223551 (ThreadPoolExecutor-1_3): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_snapshots"
2022-03-10 11:12:51.223754 (ThreadPoolExecutor-1_1): 11:12:51  Opening a new connection, currently in state init
2022-03-10 11:12:51.224255 (ThreadPoolExecutor-1_4): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_staging_airtable"
2022-03-10 11:12:51.224911 (ThreadPoolExecutor-1_5): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_behavioural"
2022-03-10 11:12:51.225430 (ThreadPoolExecutor-1_6): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_seeds"
2022-03-10 11:12:51.225943 (ThreadPoolExecutor-1_2): 11:12:51  Opening a new connection, currently in state init
2022-03-10 11:12:51.226308 (ThreadPoolExecutor-1_7): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_referral"
2022-03-10 11:12:51.227004 (ThreadPoolExecutor-1_8): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_playground"
2022-03-10 11:12:51.227548 (ThreadPoolExecutor-1_9): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_staging"
2022-03-10 11:12:51.228091 (ThreadPoolExecutor-1_10): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_staging_accounting"
2022-03-10 11:12:51.228691 (ThreadPoolExecutor-1_11): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_accounting"
2022-03-10 11:12:51.229038 (ThreadPoolExecutor-1_3): 11:12:51  Opening a new connection, currently in state init
2022-03-10 11:12:51.229424 (ThreadPoolExecutor-1_12): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_outgoing"
2022-03-10 11:12:51.230062 (ThreadPoolExecutor-1_13): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_marts"
2022-03-10 11:12:51.230479 (ThreadPoolExecutor-1_4): 11:12:51  Opening a new connection, currently in state init
2022-03-10 11:12:51.230918 (ThreadPoolExecutor-1_14): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_events"
2022-03-10 11:12:51.231544 (ThreadPoolExecutor-1_15): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_intermediary"
2022-03-10 11:12:51.232549 (ThreadPoolExecutor-1_5): 11:12:51  Opening a new connection, currently in state init
2022-03-10 11:12:51.232995 (ThreadPoolExecutor-1_6): 11:12:51  Opening a new connection, currently in state init
2022-03-10 11:12:51.233441 (ThreadPoolExecutor-1_7): 11:12:51  Opening a new connection, currently in state init
2022-03-10 11:12:51.233739 (ThreadPoolExecutor-1_8): 11:12:51  Opening a new connection, currently in state init
2022-03-10 11:12:51.234178 (ThreadPoolExecutor-1_9): 11:12:51  Opening a new connection, currently in state init
2022-03-10 11:12:51.235224 (ThreadPoolExecutor-1_10): 11:12:51  Opening a new connection, currently in state init
2022-03-10 11:12:51.235596 (ThreadPoolExecutor-1_11): 11:12:51  Opening a new connection, currently in state init
2022-03-10 11:12:51.236978 (ThreadPoolExecutor-1_12): 11:12:51  Opening a new connection, currently in state init
2022-03-10 11:12:51.237314 (ThreadPoolExecutor-1_13): 11:12:51  Opening a new connection, currently in state init
2022-03-10 11:12:51.238763 (ThreadPoolExecutor-1_14): 11:12:51  Opening a new connection, currently in state init
2022-03-10 11:12:51.240000 (ThreadPoolExecutor-1_15): 11:12:51  Opening a new connection, currently in state init
2022-03-10 11:12:51.509916 (ThreadPoolExecutor-1_11): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_staging_soluno"
2022-03-10 11:12:51.510480 (ThreadPoolExecutor-1_11): 11:12:51  Opening a new connection, currently in state closed
2022-03-10 11:12:51.664519 (MainThread): 11:12:51  Concurrency: 16 threads (target='default')
2022-03-10 11:12:51.664816 (MainThread): 11:12:51  
2022-03-10 11:12:51.667539 (Thread-18): 11:12:51  Began running node snapshot.transfergalaxy.snap__transaction_records_state
2022-03-10 11:12:51.667922 (Thread-18): 11:12:51  1 of 1 START snapshot dbt_snapshots.snap__transaction_records_state............. [RUN]
2022-03-10 11:12:51.668325 (Thread-18): 11:12:51  Acquiring new bigquery connection "snapshot.transfergalaxy.snap__transaction_records_state"
2022-03-10 11:12:51.668513 (Thread-18): 11:12:51  Began compiling node snapshot.transfergalaxy.snap__transaction_records_state
2022-03-10 11:12:51.668694 (Thread-18): 11:12:51  Compiling snapshot.transfergalaxy.snap__transaction_records_state
2022-03-10 11:12:51.673363 (Thread-18): 11:12:51  finished collecting timing info
2022-03-10 11:12:51.673586 (Thread-18): 11:12:51  Began executing node snapshot.transfergalaxy.snap__transaction_records_state
2022-03-10 11:12:51.692717 (Thread-18): 11:12:51  Opening a new connection, currently in state closed
2022-03-10 11:12:51.923759 (Thread-18): 11:12:51  On snapshot.transfergalaxy.snap__transaction_records_state: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "snapshot.transfergalaxy.snap__transaction_records_state"} */

    select CURRENT_TIMESTAMP() as snapshot_start
  
2022-03-10 11:12:52.774374 (Thread-18): 11:12:52  On snapshot.transfergalaxy.snap__transaction_records_state: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "snapshot.transfergalaxy.snap__transaction_records_state"} */
select * from (
            



select
    tx_id,
    date(tx_time_cet) as tx_date_day,
    tx_state_enum,
    tx_state,
from
    `transfer-galaxy`.`dbt_staging`.`src_portal__transaction_records`

        ) as __dbt_sbq
        where false
        limit 0
    
2022-03-10 11:12:53.703297 (Thread-18): 11:12:53  On snapshot.transfergalaxy.snap__transaction_records_state: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "snapshot.transfergalaxy.snap__transaction_records_state"} */
select * from (
            select * from `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state`
        ) as __dbt_sbq
        where false
        limit 0
    
2022-03-10 11:12:55.098683 (Thread-18): 11:12:55  On snapshot.transfergalaxy.snap__transaction_records_state: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "snapshot.transfergalaxy.snap__transaction_records_state"} */

        

  create or replace table `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state__dbt_tmp`
  partition by tx_date_day
  
  OPTIONS(
      expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)
    )
  as (
    with snapshot_query as (

        



select
    tx_id,
    date(tx_time_cet) as tx_date_day,
    tx_state_enum,
    tx_state,
from
    `transfer-galaxy`.`dbt_staging`.`src_portal__transaction_records`


    ),

    snapshotted_data as (

        select *,
            tx_id as dbt_unique_key

        from `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state`
        where dbt_valid_to is null

    ),

    insertions_source_data as (

        select
            *,
            tx_id as dbt_unique_key,
            TIMESTAMP("2022-03-10 11:12:52.097775+00:00") as dbt_updated_at,
            TIMESTAMP("2022-03-10 11:12:52.097775+00:00") as dbt_valid_from,
            nullif(TIMESTAMP("2022-03-10 11:12:52.097775+00:00"), TIMESTAMP("2022-03-10 11:12:52.097775+00:00")) as dbt_valid_to,
            to_hex(md5(concat(coalesce(cast(tx_id as string), ''), '|',coalesce(cast(TIMESTAMP("2022-03-10 11:12:52.097775+00:00") as string), '')))) as dbt_scd_id

        from snapshot_query
    ),

    updates_source_data as (

        select
            *,
            tx_id as dbt_unique_key,
            TIMESTAMP("2022-03-10 11:12:52.097775+00:00") as dbt_updated_at,
            TIMESTAMP("2022-03-10 11:12:52.097775+00:00") as dbt_valid_from,
            TIMESTAMP("2022-03-10 11:12:52.097775+00:00") as dbt_valid_to

        from snapshot_query
    ),

    insertions as (

        select
            'insert' as dbt_change_type,
            source_data.*

        from insertions_source_data as source_data
        left outer join snapshotted_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
        where snapshotted_data.dbt_unique_key is null
           or (
                snapshotted_data.dbt_unique_key is not null
            and (
                (snapshotted_data.tx_id != source_data.tx_id
        or
        (
            ((snapshotted_data.tx_id is null) and not (source_data.tx_id is null))
            or
            ((not snapshotted_data.tx_id is null) and (source_data.tx_id is null))
        ) or snapshotted_data.tx_date_day != source_data.tx_date_day
        or
        (
            ((snapshotted_data.tx_date_day is null) and not (source_data.tx_date_day is null))
            or
            ((not snapshotted_data.tx_date_day is null) and (source_data.tx_date_day is null))
        ) or snapshotted_data.tx_state_enum != source_data.tx_state_enum
        or
        (
            ((snapshotted_data.tx_state_enum is null) and not (source_data.tx_state_enum is null))
            or
            ((not snapshotted_data.tx_state_enum is null) and (source_data.tx_state_enum is null))
        ) or snapshotted_data.tx_state != source_data.tx_state
        or
        (
            ((snapshotted_data.tx_state is null) and not (source_data.tx_state is null))
            or
            ((not snapshotted_data.tx_state is null) and (source_data.tx_state is null))
        ))
            )
        )

    ),

    updates as (

        select
            'update' as dbt_change_type,
            source_data.*,
            snapshotted_data.dbt_scd_id

        from updates_source_data as source_data
        join snapshotted_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
        where (
            (snapshotted_data.tx_id != source_data.tx_id
        or
        (
            ((snapshotted_data.tx_id is null) and not (source_data.tx_id is null))
            or
            ((not snapshotted_data.tx_id is null) and (source_data.tx_id is null))
        ) or snapshotted_data.tx_date_day != source_data.tx_date_day
        or
        (
            ((snapshotted_data.tx_date_day is null) and not (source_data.tx_date_day is null))
            or
            ((not snapshotted_data.tx_date_day is null) and (source_data.tx_date_day is null))
        ) or snapshotted_data.tx_state_enum != source_data.tx_state_enum
        or
        (
            ((snapshotted_data.tx_state_enum is null) and not (source_data.tx_state_enum is null))
            or
            ((not snapshotted_data.tx_state_enum is null) and (source_data.tx_state_enum is null))
        ) or snapshotted_data.tx_state != source_data.tx_state
        or
        (
            ((snapshotted_data.tx_state is null) and not (source_data.tx_state is null))
            or
            ((not snapshotted_data.tx_state is null) and (source_data.tx_state is null))
        ))
        )
    )

    select * from insertions
    union all
    select * from updates

  );
    
2022-03-10 11:13:00.608568 (Thread-18): 11:13:00  BigQuery adapter: Adding columns ([]) to table `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state`".
2022-03-10 11:13:01.249736 (Thread-18): 11:13:01  Writing runtime SQL for node "snapshot.transfergalaxy.snap__transaction_records_state"
2022-03-10 11:13:01.250481 (Thread-18): 11:13:01  On snapshot.transfergalaxy.snap__transaction_records_state: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "snapshot.transfergalaxy.snap__transaction_records_state"} */

      merge into `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state` as DBT_INTERNAL_DEST
    using `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state__dbt_tmp` as DBT_INTERNAL_SOURCE
    on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id

    when matched
     and DBT_INTERNAL_DEST.dbt_valid_to is null
     and DBT_INTERNAL_SOURCE.dbt_change_type in ('update', 'delete')
        then update
        set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to

    when not matched
     and DBT_INTERNAL_SOURCE.dbt_change_type = 'insert'
        then insert (`tx_id`, `tx_date_day`, `tx_state_enum`, `tx_state`, `dbt_updated_at`, `dbt_valid_from`, `dbt_valid_to`, `dbt_scd_id`)
        values (`tx_id`, `tx_date_day`, `tx_state_enum`, `tx_state`, `dbt_updated_at`, `dbt_valid_from`, `dbt_valid_to`, `dbt_scd_id`)


  
2022-03-10 11:13:06.113359 (Thread-18): 11:13:06  On snapshot.transfergalaxy.snap__transaction_records_state: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "snapshot.transfergalaxy.snap__transaction_records_state"} */
drop table if exists `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state__dbt_tmp`
2022-03-10 11:13:06.726671 (Thread-18): 11:13:06  finished collecting timing info
2022-03-10 11:13:06.727242 (Thread-18): 11:13:06  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '35add917-e38d-4443-a411-d0171182a210', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f499065b280>]}
2022-03-10 14:11:42.039645 (MainThread): 14:11:42  The bigquery adapter does not support query cancellation. Some queries may still be running!
2022-03-10 14:11:42.040546 (MainThread): 14:11:42  
2022-03-10 14:11:42.040856 (MainThread): 14:11:42  Exited because of keyboard interrupt.
2022-03-10 14:11:42.041085 (MainThread): 14:11:42  
2022-03-10 14:11:42.041310 (MainThread): 14:11:42  Done. PASS=0 WARN=0 ERROR=0 SKIP=0 TOTAL=0
2022-03-10 14:11:42.041545 (MainThread): 14:11:42  Connection 'master' was properly closed.
2022-03-10 14:11:42.041723 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_risk_scoring' was properly closed.
2022-03-10 14:11:42.041904 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_forecasting' was properly closed.
2022-03-10 14:11:42.042059 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_staging_zendesk' was properly closed.
2022-03-10 14:11:42.042208 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_snapshots' was properly closed.
2022-03-10 14:11:42.042409 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_staging_airtable' was properly closed.
2022-03-10 14:11:42.042577 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_behavioural' was properly closed.
2022-03-10 14:11:42.042729 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_seeds' was properly closed.
2022-03-10 14:11:42.042878 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_referral' was properly closed.
2022-03-10 14:11:42.043024 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_playground' was properly closed.
2022-03-10 14:11:42.043170 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_staging' was properly closed.
2022-03-10 14:11:42.043315 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_staging_accounting' was properly closed.
2022-03-10 14:11:42.043459 (MainThread): 14:11:42  Connection 'snapshot.transfergalaxy.snap__transaction_records_state' was properly closed.
2022-03-10 14:11:42.043603 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_outgoing' was properly closed.
2022-03-10 14:11:42.043746 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_marts' was properly closed.
2022-03-10 14:11:42.043890 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_events' was properly closed.
2022-03-10 14:11:42.044032 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_intermediary' was properly closed.
2022-03-10 14:11:42.044218 (MainThread): 14:11:42  Flushing usage events
2022-03-10 14:11:42.065240 (MainThread):

Environment

- OS: macOS Big Sur version 11.1
- Python: 3.7.9
- dbt: 1.0

What database are you using dbt with?

bigquery

Additional Context

Snapshot model code:

{% snapshot snap__transaction_records_state %}

{{
    config(
      description='Snapshot of transaction records state for accounting automation project. Started 2022-02-21.',
      target_database='transfer-galaxy',
      target_schema='dbt_snapshots',
      unique_key='tx_id',
      strategy='check',
      check_cols='all',
      partition_by={'field': 'tx_date_day', 'data_type': 'date'},
    )
}}

select
    tx_id,
    date(tx_time_cet) as tx_date_day,
    tx_state_enum,
    tx_state,
from
    {{ ref('src_portal__transaction_records') }}

{% endsnapshot %}
@MaxKrog MaxKrog added bug Something isn't working triage labels Mar 10, 2022
@github-actions github-actions bot changed the title [Bug] Snapshot never succeeds (but does all the work successfully) [CT-346] [Bug] Snapshot never succeeds (but does all the work successfully) Mar 10, 2022
@Lunin-Marketing

I think I am having the same issue here.

@ChenyuLInx
Contributor

Hey @MaxKrog and @actonmarketing, sorry to hear this is happening. Here are some questions that might help us figure out what is going on:

  1. Does this happen for models other than snapshots?
  2. Can you look up the BigQuery run history and see whether the last query for the snapshot finished, and if so, how long it took? We store the invocation ID as a label on the query, e.g. 'label': '35add917-e38d-4443-a411-d0171182a210'. Hopefully this helps find the query we are looking for:

      merge into `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state` as DBT_INTERNAL_DEST
    using `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state__dbt_tmp` as DBT_INTERNAL_SOURCE
    on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id

    when matched
     and DBT_INTERNAL_DEST.dbt_valid_to is null
     and DBT_INTERNAL_SOURCE.dbt_change_type in ('update', 'delete')
        then update
        set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to

    when not matched
     and DBT_INTERNAL_SOURCE.dbt_change_type = 'insert'
        then insert (`tx_id`, `tx_date_day`, `tx_state_enum`, `tx_state`, `dbt_updated_at`, `dbt_valid_from`, `dbt_valid_to`, `dbt_scd_id`)
        values (`tx_id`, `tx_date_day`, `tx_state_enum`, `tx_state`, `dbt_updated_at`, `dbt_valid_from`, `dbt_valid_to`, `dbt_scd_id`)

@ChenyuLInx ChenyuLInx removed the triage label Mar 15, 2022
@Lunin-Marketing

@ChenyuLInx

This issue is only happening in snapshots for me, dbt run works just fine.

I have yet to have a snapshot run correctly.

Here are the details from a recently failed snapshot run:

9-bbab1e362607', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f01046257f0>]}
2022-03-17 01:21:05.629198 (Thread-13): 01:21:05  On snapshot.acton_marketing_data.sfdc_contact_snapshot: Close
2022-03-17 01:21:05.629719 (Thread-12): 01:21:05  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '0cd37f71-b6f3-4d45-b409-bbab1e362607', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f0104625280>]}
2022-03-17 01:21:05.630889 (Thread-11): 01:21:05  Database Error in snapshot sfdc_lead_snapshot (snapshots/sfdc/sfdc_lead_snapshot.sql)
  terminating connection due to administrator command
  SSL connection has been closed unexpectedly
  compiled SQL at target/run/acton_marketing_data/snapshots/sfdc/sfdc_lead_snapshot.sql
2022-03-17 01:21:05.631991 (Thread-13): 01:21:05  Database Error in snapshot sfdc_contact_snapshot (snapshots/sfdc/sfdc_contact_snapshot.sql)
  terminating connection due to administrator command
  SSL connection has been closed unexpectedly
  compiled SQL at target/run/acton_marketing_data/snapshots/sfdc/sfdc_contact_snapshot.sql
2022-03-17 01:21:05.631467 (Thread-10): 01:21:05  1 of 7 ERROR snapshotting snapshots.sfdc_account_snapshot....................... [ERROR in 86386.83s]
2022-03-17 01:21:05.632746 (Thread-12): 01:21:05  7 of 7 ERROR snapshotting snapshots.sfdc_opportunity_snapshot................... [ERROR in 86371.75s]
2022-03-17 01:21:05.633146 (Thread-11): 01:21:05  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '0cd37f71-b6f3-4d45-b409-bbab1e362607', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f010484b280>]}
2022-03-17 01:21:05.633573 (Thread-13): 01:21:05  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '0cd37f71-b6f3-4d45-b409-bbab1e362607', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f010484b760>]}
2022-03-17 01:21:05.634446 (Thread-10): 01:21:05  Finished running node snapshot.acton_marketing_data.sfdc_account_snapshot
2022-03-17 01:21:05.635025 (Thread-11): 01:21:05  6 of 7 ERROR snapshotting snapshots.sfdc_lead_snapshot.......................... [ERROR in 86381.29s]
2022-03-17 01:21:05.635944 (Thread-12): 01:21:05  Finished running node snapshot.acton_marketing_data.sfdc_opportunity_snapshot
2022-03-17 01:21:05.635605 (Thread-13): 01:21:05  4 of 7 ERROR snapshotting snapshots.sfdc_contact_snapshot....................... [ERROR in 86386.84s]
2022-03-17 01:21:05.636502 (Thread-11): 01:21:05  Finished running node snapshot.acton_marketing_data.sfdc_lead_snapshot
2022-03-17 01:21:05.637040 (Thread-13): 01:21:05  Finished running node snapshot.acton_marketing_data.sfdc_contact_snapshot
2022-03-17 01:21:05.691350 (MainThread): 01:21:05  On master: Close
2022-03-17 01:21:05.692341 (MainThread): 01:21:05  
2022-03-17 01:21:05.692828 (MainThread): 01:21:05  Exited because of keyboard interrupt.
2022-03-17 01:21:05.693162 (MainThread): 01:21:05  
2022-03-17 01:21:05.693498 (MainThread): 01:21:05  Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
2022-03-17 01:21:05.693804 (MainThread): 01:21:05  Connection 'master' was properly closed.
2022-03-17 01:21:05.693977 (MainThread): 01:21:05  Connection 'snapshot.acton_marketing_data.sfdc_lead_snapshot' was properly closed.
2022-03-17 01:21:05.694203 (MainThread): 01:21:05  Connection 'snapshot.acton_marketing_data.sfdc_account_snapshot' was properly closed.
2022-03-17 01:21:05.694381 (MainThread): 01:21:05  Connection 'snapshot.acton_marketing_data.sfdc_opportunity_snapshot' was properly closed.
2022-03-17 01:21:05.694545 (MainThread): 01:21:05  Connection 'snapshot.acton_marketing_data.sfdc_contact_snapshot' was properly closed.
2022-03-17 01:21:05.694736 (MainThread): 01:21:05  Flushing usage events
2022-03-17 01:21:05.723402 (MainThread):

@MaxKrog
Author

MaxKrog commented Mar 18, 2022

Hi @ChenyuLInx,

This only affects snapshots, no other models are affected.

Absolutely. These are the queries run in BigQuery:

[screenshot: BigQuery query history showing the four snapshot queries]

All 4 completed, with runtimes:

  1. 0.1 seconds
  2. 0.05 seconds
  3. 0.076 seconds
  4. 5.1 seconds

@ChenyuLInx ChenyuLInx assigned ChenyuLInx and iknox-fa and unassigned ChenyuLInx Mar 22, 2022
@ChenyuLInx ChenyuLInx assigned ChenyuLInx and unassigned iknox-fa Apr 4, 2022
@ChenyuLInx
Contributor

Sorry it took a while for us to get back on this.
@actonmarketing I am sorry to hear that you haven't been able to get a snapshot running correctly. In the log,

terminating connection due to administrator command
  SSL connection has been closed unexpectedly
  compiled SQL at target/run/acton_marketing_data/snapshots/sfdc/sfdc_lead_snapshot.sql
2022-03-17 01:21:05.631991 (Thread-13): 01:21:05  Database Error in snapshot sfdc_contact_snapshot (snapshots/sfdc/sfdc_contact_snapshot.sql)
  terminating connection due to administrator command
  SSL connection has been closed unexpectedly

This is when you manually cancel the run, right? And are you also running on BigQuery?

@MaxKrog From your comment I think we can be almost certain that, for some reason, dbt missed the query finishing and believed the query was still ongoing. I am going to try to reproduce this on my side and troubleshoot it.

@ChenyuLInx
Contributor

I was able to reproduce this locally; what fixed it in my test was removing the description from the config. I don't think description is a legal parameter to pass in here, but we should also throw better errors. I am still looking into why this happens.

@ChenyuLInx
Contributor

RCA: when a snapshot finishes, it fires an event; then in PrintSnapshotResultLine we hit TypeError: str.format() got multiple values for keyword argument 'description'. But because of #4357, this leaves the process hanging rather than raising the error.
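The duplicate-keyword collision in the RCA can be sketched in plain Python. This is a hypothetical minimal reproduction, not dbt's actual code: the template string and the `printer_kwargs`/`node_config` names are illustrative, assuming only that the user's `description` config value ends up unpacked into the same `str.format()` call as a keyword the printer already supplies.

```python
# Hypothetical sketch of the RCA: a user-supplied 'description' in the
# snapshot config colliding with a keyword that the result-printing code
# already passes to str.format(). Names here are illustrative only.
template = "{index} of {total} OK snapshotted {description}"

printer_kwargs = {"index": 1, "total": 1,
                  "description": "dbt_snapshots.snap__transaction_records_state"}
node_config = {"description": "Snapshot of transaction records state"}

try:
    # Unpacking both dicts into one call supplies 'description' twice.
    template.format(**printer_kwargs, **node_config)
except TypeError as exc:
    print(exc)  # ... got multiple values for keyword argument 'description'
```

Python raises this TypeError at the call site, before the format string is ever evaluated, which is why the message mentions `str.format()` rather than the template itself.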

@ChenyuLInx ChenyuLInx removed their assignment Apr 6, 2022
@Lunin-Marketing

@ChenyuLInx I don't manually cancel the run; it times out after ~24 hours and cancels itself. :(

I am not running on BigQuery; it's PostgreSQL through an Amazon AWS RDS database.

Here's a sample of my snapshot file. I don't have a description parameter, but it still fails.

{% snapshot sfdc_account_snapshot %}

{{
    config (
        target_schema='snapshots',
        unique_key = 'account_id',
        strategy='timestamp',
        updated_at='systemmodstamp'
    )
}}

SELECT *
FROM {{ref('account_source_xf')}}

{% endsnapshot %}

@Lunin-Marketing

@ChenyuLInx

I had one of my snapshots run successfully, but two other snapshots fail with an "EOF detected" error.

Working snapshot:

{% snapshot sfdc_ao_instance_user_snapshot %}

{{
    config (
        target_schema='snapshots',
        unique_key = 'ao_user_id',
        strategy='timestamp',
        updated_at='systemmodstamp'
    )
}}

SELECT *
FROM {{ref('ao_instance_user_source_xf')}}

{% endsnapshot %}

Failed Snapshot:

{% snapshot sfdc_account_snapshot %}

{{
    config (
        target_schema='snapshots',
        unique_key = 'account_id',
        strategy='timestamp',
        updated_at='systemmodstamp'
    )
}}

SELECT *
FROM {{ref('account_source_xf')}}

{% endsnapshot %}

@ChenyuLInx
Contributor

@actonmarketing Sorry, I didn't realize your issue is actually different.

it times out after ~24 hours and cancels itself.
two other snapshots fail with an "EOF Detected"

Are these two describing the same failure? The timeout one sounds a lot like something broke in a thread and we didn't handle it properly. The proper way to solve this kind of issue is to fix #4357. @jtcohen6, should we try to prioritize it?

In the meantime, @actonmarketing, your case looks tricky to reproduce on my side. Can you try setting this value to True and see if that gives us more traceback instead of hanging? Or set a breakpoint here and see if execution reaches it without issue; if so, step through the code to see which part goes wrong. Thanks a lot!
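For context on how an error in a worker thread turns into a multi-hour hang rather than a traceback, here is a minimal sketch. It is a simplified worker/queue pattern assumed for illustration, not dbt's actual runner: the worker dies before it ever reports a result, and a parent waiting on the result queue with no timeout would block forever.

```python
# Minimal sketch (not dbt's actual runner): a worker thread that raises
# before posting its result leaves the parent with nothing to receive.
# The 1-second timeout below exists only so this demo terminates; a real
# runner waiting with no timeout would hang indefinitely.
import queue
import threading

results = queue.Queue()

def worker():
    raise RuntimeError("result formatting failed")  # dies before reporting
    results.put("done")  # never reached

t = threading.Thread(target=worker, daemon=True)
t.start()
t.join()  # the thread's traceback goes to stderr, not to the parent

try:
    results.get(timeout=1)
    print("worker reported back")
except queue.Empty:
    print("hung: worker died without ever reporting a result")
```

Running this prints the worker's traceback on stderr and then "hung: worker died without ever reporting a result", which mirrors the symptom in this issue: the work itself may even have finished in the warehouse, but the parent never learns the task is over.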

@Lunin-Marketing

@ChenyuLInx yes, those two errors are the same failure.

I don't see a snapshot.py file anywhere in my github instance. Where should it be located and might that be the problem here?

@ChenyuLInx

@actonmarketing, sorry that it took me a while to get back to this! I was thinking maybe you could set up dbt-core locally, update that value in dbt-core's code, and see if it produces more logs.
If that doesn't work, do you mind giving me some detailed reproduction steps so that I can reproduce this issue on my side and see what I can do? For the detailed reproduction steps I am looking for:

  • sample data (maybe in CSV format) that I can put into a testing warehouse to run the snapshot on
  • the command I need to run in your project to reproduce the hanging behavior

Thanks again for reporting this!

@ChenyuLInx ChenyuLInx removed the triage label Apr 21, 2022
@Lunin-Marketing

@ChenyuLInx I am still struggling to get dbt running locally, so here are the logs, some sample data, and some steps that I take.

The logs are from a recently failed dbt snapshot job.

console_output.log
debug.log

Note: the sample data is only 500 lines, whereas my opp_source_xf table has 116,321 rows.

file____models_opp_source_xf.csv

@ChenyuLInx

@actonmarketing I am sorry to hear that.
You mentioned

a warning popup that I should contact support

Is there any additional information on that popup? And do you know which software that popup came from?

Another thing I can see from the log is that the snapshot query didn't seem to finish. Have you tried running that query against your data warehouse directly to see what happens?

@Lunin-Marketing

@ChenyuLInx the popup is in the dbt Web GUI. I can get a screenshot if that is helpful.

How would I go about running it directly against my data warehouse?
I would love to do that with a little guidance.

@ChenyuLInx

@actonmarketing I took a look at the log again and noticed something new. The 4 snapshots that didn't succeed got cancelled after roughly 24 hours.

7 of 7 ERROR snapshotting snapshots.sfdc_opportunity_snapshot................... [ERROR in 86355.68s]
6 of 7 ERROR snapshotting snapshots.sfdc_lead_snapshot.......................... [ERROR in 86356.66s]
4 of 7 ERROR snapshotting snapshots.sfdc_contact_snapshot....................... [ERROR in 86379.96s]
1 of 7 ERROR snapshotting snapshots.sfdc_account_snapshot....................... [ERROR in 86379.92s]

And I am guessing that you are running this in the cloud IDE? I remember there may be a hard limit of 24 hours for a command to run, so it might get cancelled because of that. I have forwarded the issue to the customer support team, but you might want to contact them directly as well.

@Lunin-Marketing

@ChenyuLInx I have been using the cloud IDE, however I FINALLY got dbt running locally, and am running the snapshot command there to see what happens.

Thank you for forwarding this to support, I will definitely reach out as well.

@Lunin-Marketing

@ChenyuLInx here are the results from running the above snapshot command locally:

dbt snapshot -s sfdc_opportunity_snapshot
18:24:24 Running with dbt=1.1.0
18:24:24 Unable to do partial parsing because profile has changed
18:24:26 Found 204 models, 0 tests, 7 snapshots, 0 analyses, 354 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
18:24:26
18:24:27 Concurrency: 5 threads (target='postgres')
18:24:27
18:24:27 1 of 1 START snapshot snapshots.sfdc_opportunity_snapshot ...................... [RUN]
20:24:28 1 of 1 ERROR snapshotting snapshots.sfdc_opportunity_snapshot .................. [ERROR in 7200.41s]
20:24:28
20:24:28 Finished running 1 snapshot in 7201.80s.
20:24:28
20:24:28 Completed with 1 error and 0 warnings:
20:24:28
20:24:28 Database Error in snapshot sfdc_opportunity_snapshot (snapshots\sfdc\sfdc_opportunity_snapshot.sql)
20:24:28 server closed the connection unexpectedly
20:24:28 This probably means the server terminated abnormally
20:24:28 before or while processing the request.
20:24:28 server closed the connection unexpectedly
20:24:28 This probably means the server terminated abnormally
20:24:28 before or while processing the request.
20:24:28 server closed the connection unexpectedly
20:24:28 This probably means the server terminated abnormally
20:24:28 before or while processing the request.

@ChenyuLInx

Hey @actonmarketing, sorry to get back to you late, and thanks for taking the effort to run this locally.

From what I gathered from @jeremyyeo (Thanks!!).
By default, all dbt Cloud accounts have a 24-hour maximum time limit that a single job can run for. Customer support can increase it when customers require a large backfill job that runs longer than 24 hours.
But your issue looks more like the snapshots on Postgres take a really long time, and multiple of them are running at once. Our guess is that the Postgres instance you are running the dbt project against can't really handle that workload. The fact that running just one snapshot also fails with server closed the connection unexpectedly supports that.

@Lunin-Marketing

@ChenyuLInx do I need to contact support directly to get that limit increased?

My thought is that once the backfill job runs (one time) then the incremental snapshots will be easy. There's just a LOT of data to process at once for the job. I can also temporarily increase the data limits on my database, so that both sides of the coin are working together.

@ChenyuLInx

@actonmarketing yes that would be great!

Another thing you could try is running those snapshot models locally once, since you already have a local dbt setup. Maybe running one snapshot at a time would also help. I am not super familiar with Postgres internals, but I would imagine that running all of the big snapshot models together creates a lot of stress on the instance and increases the chance of them failing.
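
If it helps, the four snapshots from the error log above can be selected and run one at a time with a single connection (a sketch; this assumes a working local dbt setup pointed at the same Postgres target):

```shell
# Run each snapshot on its own, with one thread, to keep
# the load on the Postgres instance as low as possible.
dbt snapshot -s sfdc_account_snapshot --threads 1
dbt snapshot -s sfdc_contact_snapshot --threads 1
dbt snapshot -s sfdc_lead_snapshot --threads 1
dbt snapshot -s sfdc_opportunity_snapshot --threads 1
```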

@Lunin-Marketing

@ChenyuLInx if I run them locally, how do I get it to sync back to the cloud? (forgive me, sort of new at all of this).

@ChenyuLInx

@actonmarketing No problem!!
If you run locally using the same target you use in production, against the same data warehouse/database (the Postgres instance in your case), your initial snapshot tables will be built inside the database. Any subsequent run, anywhere, including on the cloud, will keep using those snapshot tables and only add the changed rows.

I don't think you will need to sync anything back to the cloud.

And some tips from @dbeatty10

If I were @actonmarketing, here's what I would do to confirm/deny the hypothesis that the Postgres instance is under-provisioned:

  • Add a limit 1 clause to the end of the failed snapshot here
  • Run dbt snapshot -s sfdc_account_snapshot --full-refresh (which will hopefully work 🤞)

Assuming it works with limit 1, then steps 1 and 2 can be repeated with 10x the number of rows each time until it takes more than 5-10 minutes to execute.
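
One way to run that limit-then-scale experiment without editing the file each time is to parameterize the limit with a dbt var (a sketch: the snapshot_limit var name is hypothetical; the rest is taken from the snapshot in this thread):

```sql
{% snapshot sfdc_account_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='account_id',
        strategy='timestamp',
        updated_at='systemmodstamp'
    )
}}

SELECT *
FROM {{ ref('account_source_xf') }}
-- 'snapshot_limit' is a hypothetical var for this experiment;
-- raise it 10x per run: 1, 10, 100, 1000, ...
LIMIT {{ var('snapshot_limit', 1) }}

{% endsnapshot %}
```

Each iteration of the steps above then becomes the same command with a var override, e.g. dbt snapshot -s sfdc_account_snapshot --vars '{snapshot_limit: 10}'.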

@jeremyyeo

Nice tip @dbeatty10.

For the record:

18:24:27 1 of 1 START snapshot snapshots.sfdc_opportunity_snapshot ...................... [RUN]
20:24:28 1 of 1 ERROR snapshotting snapshots.sfdc_opportunity_snapshot .................. [ERROR in 7200.41s]

This tells me that the Postgres/RDS instance has a statement_timeout param (or an equivalent) of 7200 seconds: because the snapshot query is computationally (?) expensive, it is running into the timeout and Postgres has terminated the connection.
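
For anyone checking this on their own instance, the effective timeout can be inspected directly in a Postgres session (a generic Postgres sketch, not specific to this RDS setup; note that pg_settings stores the value in milliseconds, so a 2-hour cap would show as 7200000):

```sql
-- Timeout in effect for the current session (0 means disabled)
SHOW statement_timeout;

-- Server-wide value as stored in pg_settings
SELECT name, setting, unit
FROM pg_settings
WHERE name = 'statement_timeout';
```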

Possibly what could be happening on the dbt Cloud side is that this termination is not handled properly, so from dbt Cloud's perspective the job keeps waiting to complete, runs into the dbt Cloud job max of 24 hours, and then gets cancelled.

@Lunin-Marketing

@jeremyyeo thank you for all your help.

@dbeatty10 - great idea, testing now!

I checked in RDS and my statement_timeout is actually set ridiculously high. I am going to reach out to them with this error and make sure that the timeout setting is actually being applied.

I am testing smaller runs with LIMIT clauses coded into the snapshots, and testing some of the other snapshots.

One of them ran successfully, so I know the snapshots as a concept are working. :)

Thank you all for your help and I will report back with the results.

@Lunin-Marketing

Status Update:
I added LIMIT 1 to the bottom of my opportunity snapshot query:

{% snapshot sfdc_opportunity_snapshot %}

{{
    config (
        target_schema='snapshots',
        unique_key = 'opportunity_id',
        strategy='timestamp',
        updated_at='systemmodstamp'
    )
}}

SELECT *
FROM {{ref('opp_source_xf')}}
LIMIT 1

{% endsnapshot %}

It still won't run...

I validated in a Postgres console that the statement_timeout parameter is VERY high, so that's not the issue here. I also have an open support ticket with Amazon to help troubleshoot things on their end.

dbt support upped my query timeout to 48 hours for the next week, so I am running snapshots locally in git console, to see if I can generate some logs or successes.

Thanks again for all of your patience with this and your help group!

@Lunin-Marketing

So, there was something faulty with the original snapshot files. I created direct copies of them (without limits) and they ran just fine in a few seconds.

I guess this case is closed for me.

Thanks for all of the help.

@ChenyuLInx

@actonmarketing glad it works out!!
I noticed the statement_timeout discussion; did you figure out which setting on your RDS was in effect in the end? I am asking to figure out whether we should look into whether what @jeremyyeo mentioned above is true:

Possibly what could be happening on the dbt Cloud side is that this termination is not handled properly so from dbt Cloud's job perspective - it keeps waiting for the job to complete - running into the dbt Cloud job max of 24 hours and then getting canceled.

@Lunin-Marketing

@ChenyuLInx statement_timeout is the setting in the parameter group in RDS. Mine was set absurdly high and I verified that it was working in the Amazon CLI, so that wasn't the issue.

@ChenyuLInx

Thanks @actonmarketing, I am going to close this since we have everything resolved.
