Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Unhandled exception for failing on-run-* hooks in source freshness #9511

Closed
2 tasks done
jtcohen6 opened this issue Feb 2, 2024 · 5 comments · Fixed by #9763
Closed
2 tasks done

[Bug] Unhandled exception for failing on-run-* hooks in source freshness #9511

jtcohen6 opened this issue Feb 2, 2024 · 5 comments · Fixed by #9763
Assignees
Labels
bug Something isn't working High Severity bug with significant impact that should be resolved in a reasonable timeframe pre-release Bug not yet in a stable release
Milestone

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 2, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

follow up to #9366

Failing hooks in source freshness lead to an unhandled exception

Expected Behavior

The exception is handled, as it is for other commands

Steps To Reproduce

  1. Create a simple project that defines a source with freshness + loaded_at_field
  2. In the project file:
# dbt_project.yml
flags:
  source_freshness_run_project_hooks: true

on-run-start:
  "select fake_column from table_does_not_exist"
  1. dbt source freshness

Relevant log output

$ dbt source freshness
12:11:06  Running with dbt=1.8.0-a1
12:11:06  Registered adapter: postgres=1.8.0-a1
12:11:06  Unable to do partial parsing because a project config has changed
12:11:07  Found 1 operation, 1 source, 521 macros
12:11:07
12:11:08
12:11:08  Running 1 on-run-start hook
12:11:08  1 of 1 START hook: my_dbt_project.on-run-start.0 ............................... [RUN]
12:11:08  Database error while running on-run-start
12:11:08  Concurrency: 5 threads (target='dev')
12:11:08
12:11:08  1 of 1 START freshness of my_src.my_tbl ........................................ [RUN]
12:11:08  1 of 1 WARN freshness of my_src.my_tbl ......................................... [WARN in 0.04s]
12:11:08
12:11:08  Finished running 1 source in 0 hours 0 minutes and 0.32 seconds (0.32s).
12:11:08  Encountered an error:
'BaseResult' object has no attribute 'node'
12:11:08  Traceback (most recent call last):
  File "/Users/jerco/dev/product/dbt-core/core/dbt/cli/requires.py", line 97, in wrapper
    result, success = func(*args, **kwargs)
  File "/Users/jerco/dev/product/dbt-core/core/dbt/cli/requires.py", line 82, in wrapper
    return func(*args, **kwargs)
  File "/Users/jerco/dev/product/dbt-core/core/dbt/cli/requires.py", line 175, in wrapper
    return func(*args, **kwargs)
  File "/Users/jerco/dev/product/dbt-core/core/dbt/cli/requires.py", line 204, in wrapper
    return func(*args, **kwargs)
  File "/Users/jerco/dev/product/dbt-core/core/dbt/cli/requires.py", line 251, in wrapper
    return func(*args, **kwargs)
  File "/Users/jerco/dev/product/dbt-core/core/dbt/cli/requires.py", line 280, in wrapper
    return func(*args, **kwargs)
  File "/Users/jerco/dev/product/dbt-core/core/dbt/cli/main.py", line 789, in freshness
    results = task.run()
  File "/Users/jerco/dev/product/dbt-core/core/dbt/task/runnable.py", line 524, in run
    result.write(self.result_path())
  File "/Users/jerco/dev/product/dbt-core/core/dbt/artifacts/schemas/freshness/v3/freshness.py", line 67, in write
    FreshnessExecutionResultArtifact.from_result(self).write(path)
  File "/Users/jerco/dev/product/dbt-core/core/dbt/artifacts/schemas/freshness/v3/freshness.py", line 110, in from_result
    processed = [process_freshness_result(r) for r in base.results]
  File "/Users/jerco/dev/product/dbt-core/core/dbt/artifacts/schemas/freshness/v3/freshness.py", line 110, in <listcomp>
    processed = [process_freshness_result(r) for r in base.results]
  File "/Users/jerco/dev/product/dbt-core/core/dbt/artifacts/schemas/freshness/v3/freshness.py", line 119, in process_freshness_result
    unique_id = result.node.unique_id
AttributeError: 'BaseResult' object has no attribute 'node'

Environment

- OS: 13.4.1
- Python: 3.10.11
- dbt: 1.8.0-a1

Which database adapter are you using with dbt?

No response

Additional Context

No response

@jtcohen6 jtcohen6 added the bug Something isn't working label Feb 2, 2024
@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Feb 2, 2024

This is an issue while serializing the on-run-* result into the freshness result artifact (sources.json) here. Naïve fix:

    @classmethod
    def from_result(cls, base: FreshnessResult):
        # on-run-* hooks are of type BaseResult, not SourceFreshnessResult
        # they don't make sense in sources.json -- let's exclude them here
        processed = [
            process_freshness_result(r)
            for r in base.results
            if isinstance(r, SourceFreshnessResult)
        ]
        return cls(
            metadata=base.metadata,
            results=processed,
            elapsed_time=base.elapsed_time,
        )

@jtcohen6 jtcohen6 added this to the v1.8 milestone Feb 22, 2024
@graciegoheen graciegoheen added the High Severity bug with significant impact that should be resolved in a reasonable timeframe label Feb 22, 2024
@martynydbt martynydbt added the pre-release Bug not yet in a stable release label Feb 29, 2024
aranke added a commit that referenced this issue Mar 14, 2024
@aranke
Copy link
Member

aranke commented Mar 14, 2024

@jtcohen6 @graciegoheen

I tried the proposed fix locally and observed the following behaviors:

  1. ✅ The error is surfaced in the CLI
  2. ❓ The hook result isn't written to run_results.json or sources.json

Is behavior 2 to be expected?

Relevant logs

❯ dbt source freshness
22:24:43  Running with dbt=1.8.0-b1
22:24:43  Registered adapter: duckdb=1.7.2
22:24:43  Found 5 models, 3 seeds, 1 operation, 20 data tests, 1 source, 515 macros
22:24:43  
22:24:43  
22:24:43  Running 1 on-run-start hook
22:24:43  1 of 1 START hook: jaffle_shop.on-run-start.0 .................................. [RUN]
22:24:43  Database error while running on-run-start
22:24:43  Concurrency: 24 threads (target='dev')
22:24:43  
22:24:43  1 of 1 START freshness of external_source.flights .............................. [RUN]
22:24:43  1 of 1 WARN freshness of external_source.flights ............................... [WARN in 0.11s]
22:24:43  
22:24:43  Finished running 1 source in 0 hours 0 minutes and 0.18 seconds (0.18s).
22:24:43  
22:24:43    on-run-start failed, error:
 Catalog Error: Table with name table_does_not_exist does not exist!
Did you mean "system.information_schema.tables"?
LINE 2: select fake_column from table_does_not_exist
                                ^
22:24:43  Done.
❯ jq . target/sources.json
{
  "metadata": {
    "dbt_schema_version": "https://schemas.getdbt.com/dbt/sources/v3.json",
    "dbt_version": "1.8.0b1",
    "generated_at": "2024-03-17T06:05:18.514529Z",
    "invocation_id": "0f3a372c-80c0-4847-ae15-8e2f77c3a1ca",
    "env": {}
  },
  "results": [
    {
      "unique_id": "source.jaffle_shop.external_source.flights",
      "max_loaded_at": "2013-12-31T23:00:00+00:00",
      "snapshotted_at": "2024-03-17T06:05:18.448000+00:00",
      "max_loaded_at_time_ago_in_s": 322124718.448,
      "status": "warn",
      "criteria": {
        "warn_after": {
          "count": 1,
          "period": "minute"
        },
        "error_after": {
          "count": null,
          "period": null
        },
        "filter": null
      },
      "adapter_response": {
        "_message": "OK"
      },
      "timing": [
        {
          "name": "compile",
          "started_at": "2024-03-17T06:05:18.399754Z",
          "completed_at": "2024-03-17T06:05:18.399759Z"
        },
        {
          "name": "execute",
          "started_at": "2024-03-17T06:05:18.400089Z",
          "completed_at": "2024-03-17T06:05:18.508173Z"
        }
      ],
      "thread_id": "Thread-1 (worker)",
      "execution_time": 0.1091008186340332
    }
  ],
  "elapsed_time": 0.1851181983947754
}
❯ jq . target/run_results.json
{
  "metadata": {
    "dbt_schema_version": "https://schemas.getdbt.com/dbt/run-results/v6.json",
    "dbt_version": "1.8.0b1",
    "generated_at": "2024-03-17T06:03:22.241956Z",
    "invocation_id": "bfc13b1d-bf76-4849-965b-3843b65b1c20",
    "env": {}
  },
  "results": [
    {
      "status": "success",
      "timing": [
        {
          "name": "compile",
          "started_at": "2024-03-17T06:03:22.191457Z",
          "completed_at": "2024-03-17T06:03:22.193270Z"
        },
        {
          "name": "execute",
          "started_at": "2024-03-17T06:03:22.193623Z",
          "completed_at": "2024-03-17T06:03:22.234649Z"
        }
      ],
      "thread_id": "Thread-1 (worker)",
      "execution_time": 0.04505109786987305,
      "adapter_response": {
        "_message": "OK"
      },
      "message": "OK",
      "failures": null,
      "unique_id": "model.jaffle_shop.stg_customers",
      "compiled": true,
      "compiled_code": "with source as (\n    select * from \"jaffle_shop\".\"main\".\"raw_customers\"\n\n),\n\nrenamed as (\n\n    select\n        id as customer_id,\n        first_name,\n        last_name\n\n    from source\n\n)\n\nselect * from renamed",
      "relation_name": "\"jaffle_shop\".\"main\".\"stg_customers\""
    }
  ],
  "elapsed_time": 0.11771893501281738,
  "args": {
    "exclude": [],
    "print": true,
    "printer_width": 80,
    "send_anonymous_usage_stats": true,
    "quiet": false,
    "static_parser": true,
    "vars": {},
    "empty": false,
    "version_check": true,
    "which": "run",
    "log_format": "default",
    "use_colors_file": true,
    "introspect": true,
    "source_freshness_run_project_hooks": true,
    "favor_state": false,
    "log_level_file": "debug",
    "log_format_file": "debug",
    "select": [
      "stg_customers"
    ],
    "enable_legacy_logger": false,
    "log_level": "info",
    "log_file_max_bytes": 10485760,
    "project_dir": "/Users/karanke/workspace/jaffle_shop_duckdb",
    "strict_mode": false,
    "use_colors": true,
    "defer": false,
    "cache_selected_only": false,
    "write_json": true,
    "show_resource_report": false,
    "partial_parse_file_diff": true,
    "invocation_command": "dbt run --select stg_customers",
    "partial_parse": true,
    "warn_error_options": {
      "include": [],
      "exclude": []
    },
    "log_path": "/Users/karanke/workspace/jaffle_shop_duckdb/logs",
    "populate_cache": true,
    "profiles_dir": "/Users/karanke/workspace/jaffle_shop_duckdb",
    "macro_debugging": false,
    "indirect_selection": "eager"
  }
}

@graciegoheen
Copy link
Contributor

I am surprised that the failing hook isn't written to the run_results - how are similar errors typically handled?

@aranke
Copy link
Member

aranke commented Mar 17, 2024

It looks like on-run-start errors for dbt run aren't written to run_results.json, so I'll maintain this behavior for dbt source freshness for now.

Do we want to keep this behavior, or should we create a follow-up ticket to fix this?

@jtcohen6
Copy link
Contributor Author

Just chatted about this with @aranke. I agree that it's surprising behavior, and also outside the scope of the fix for this bug, so @aranke said he would open a new issue. The thing I'm not sure about is exactly how hook nodes should be included in the results dictionary of run_results.json / sources.json — especially the latter, which is currently Union[SourceFreshnessOutput, SourceFreshnessRuntimeError].

The ability to dbt retry from point of failures (including failed hooks!) makes sense. This also feels related to another issue, which is that hooks should actually participate in the DAG in the sense of on-run-start hooks preventing subsequent DAG nodes from running (marking them Skipped):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working High Severity bug with significant impact that should be resolved in a reasonable timeframe pre-release Bug not yet in a stable release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants