[CT-2959] [Feature] when --output=json for dbt list and dbt show, have additional structured data available inside the ListCmdOut data object #8358
Comments
It's also worth mentioning that we use the same approach for dbt-core events for It is working for us! But structured data is nice where we can have it.
I don't yet understand the nuances of either the end goal or the intermediate steps here, but my curiosity is piqued about both. Doing some experimentation:

rm -f logs/dbt.log
dbt --quiet --log-format-file json list --output json > dbt-list.json

This will create JSON output in two different files: dbt-list.json and logs/dbt.log.

Would having separate files, each containing valid non-stringified JSON, be useful for you @davidharting?

For human-readability, I like using:

cat dbt-list.json | jq . > dbt-list.pp.json
cat logs/dbt.log | jq . > dbt.pp.log

{
"name": "abc",
"resource_type": "model",
"package_name": "my_project",
"original_file_path": "models/abc_v1.sql",
"unique_id": "model.my_project.abc.v1",
"alias": "abc_v1",
"config": {
"enabled": true,
"alias": null,
"schema": null,
"database": null,
"tags": [],
"meta": {},
"group": null,
"materialized": "view",
"incremental_strategy": null,
"persist_docs": {},
"quoting": {},
"column_types": {},
"full_refresh": null,
"unique_key": null,
"on_schema_change": "ignore",
"grants": {},
"packages": [],
"docs": {
"show": true,
"node_color": null
},
"contract": {
"enforced": true
},
"post-hook": [],
"pre-hook": []
},
"tags": [],
"depends_on": {
"macros": [],
"nodes": []
}
}
{
"name": "abc",
"resource_type": "model",
"package_name": "my_project",
"original_file_path": "models/abc_v2.sql",
"unique_id": "model.my_project.abc.v2",
"alias": "abc_v2",
"config": {
"enabled": true,
"alias": null,
"schema": null,
"database": null,
"tags": [],
"meta": {},
"group": null,
"materialized": "view",
"incremental_strategy": null,
"persist_docs": {},
"quoting": {},
"column_types": {},
"full_refresh": null,
"unique_key": null,
"on_schema_change": "ignore",
"grants": {},
"packages": [],
"docs": {
"show": true,
"node_color": null
},
"contract": {
"enforced": true
},
"post-hook": [],
"pre-hook": []
},
"tags": [],
"depends_on": {
"macros": [],
"nodes": [
"model.my_project.abc.v1"
]
}
}
{
"name": "my_seed",
"resource_type": "seed",
"package_name": "my_project",
"original_file_path": "seeds/my_seed.csv",
"unique_id": "seed.my_project.my_seed",
"alias": "my_seed",
"config": {
"enabled": true,
"alias": null,
"schema": null,
"database": null,
"tags": [],
"meta": {},
"group": null,
"materialized": "seed",
"incremental_strategy": null,
"persist_docs": {},
"quoting": {},
"column_types": {},
"full_refresh": null,
"unique_key": null,
"on_schema_change": "ignore",
"grants": {},
"packages": [],
"docs": {
"show": true,
"node_color": "purple"
},
"contract": {
"enforced": false
},
"quote_columns": null,
"post-hook": [],
"pre-hook": []
},
"tags": [],
"depends_on": {
"macros": []
}
}
{
"data": {
"log_version": 3,
"version": "=1.5.1"
},
"info": {
"category": "",
"code": "A001",
"extra": {},
"invocation_id": "5a461268-907b-4f7d-84ca-0f17d333ebc0",
"level": "info",
"msg": "Running with dbt=1.5.1",
"name": "MainReportVersion",
"pid": 58578,
"thread": "MainThread",
"ts": "2023-08-11T15:17:40.092948Z"
}
}
{
"data": {
"args": {
"cache_selected_only": "False",
"debug": "False",
"fail_fast": "False",
"indirect_selection": "eager",
"introspect": "True",
"log_cache_events": "False",
"log_format": "default",
"log_path": "/Users/dbeatty/projects/copier-templates/duckdb-docs-440/logs",
"no_print": "None",
"partial_parse": "True",
"printer_width": "80",
"profiles_dir": "/Users/dbeatty/projects/copier-templates/duckdb-docs-440",
"quiet": "True",
"send_anonymous_usage_stats": "False",
"static_parser": "True",
"target_path": "None",
"use_colors": "True",
"use_experimental_parser": "False",
"version_check": "True",
"warn_error": "None",
"warn_error_options": "WarnErrorOptions(include=[], exclude=[])",
"write_json": "True"
}
},
"info": {
"category": "",
"code": "A002",
"extra": {},
"invocation_id": "5a461268-907b-4f7d-84ca-0f17d333ebc0",
"level": "debug",
"msg": "running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'profiles_dir': '/Users/dbeatty/projects/copier-templates/duckdb-docs-440', 'version_check': 'True', 'debug': 'False', 'log_path': '/Users/dbeatty/projects/copier-templates/duckdb-docs-440/logs', 'fail_fast': 'False', 'warn_error': 'None', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'True', 'log_format': 'default', 'static_parser': 'True', 'introspect': 'True', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'target_path': 'None', 'send_anonymous_usage_stats': 'False'}",
"name": "MainReportArgs",
"pid": 58578,
"thread": "MainThread",
"ts": "2023-08-11T15:17:40.097418Z"
}
}
{
"data": {
"checksum": "51f6b581eba8f8101bc020bf9faf8f96af641e2da86d581f66e2bdf0ff384b1c",
"profile": "",
"target": "",
"vars": "{}",
"version": "1.5.1"
},
"info": {
"category": "",
"code": "I025",
"extra": {},
"invocation_id": "5a461268-907b-4f7d-84ca-0f17d333ebc0",
"level": "debug",
"msg": "checksum: 51f6b581eba8f8101bc020bf9faf8f96af641e2da86d581f66e2bdf0ff384b1c, vars: {}, profile: , target: , version: 1.5.1",
"name": "StateCheckVarsHash",
"pid": 58578,
"thread": "MainThread",
"ts": "2023-08-11T15:17:41.073512Z"
}
}
{
"data": {
"added": 0,
"changed": 0,
"deleted": 0
},
"info": {
"category": "",
"code": "I040",
"extra": {},
"invocation_id": "5a461268-907b-4f7d-84ca-0f17d333ebc0",
"level": "debug",
"msg": "Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.",
"name": "PartialParsingEnabled",
"pid": 58578,
"thread": "MainThread",
"ts": "2023-08-11T15:17:41.104266Z"
}
}
{
"data": {},
"info": {
"category": "",
"code": "I017",
"extra": {},
"invocation_id": "5a461268-907b-4f7d-84ca-0f17d333ebc0",
"level": "debug",
"msg": "Partial parsing enabled, no changes found, skipping parsing",
"name": "PartialParsingSkipParsing",
"pid": 58578,
"thread": "MainThread",
"ts": "2023-08-11T15:17:41.104759Z"
}
}
{
"data": {
"stat_line": "2 models, 0 tests, 0 snapshots, 0 analyses, 313 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics, 0 groups"
},
"info": {
"category": "",
"code": "W006",
"extra": {},
"invocation_id": "5a461268-907b-4f7d-84ca-0f17d333ebc0",
"level": "info",
"msg": "Found 2 models, 0 tests, 0 snapshots, 0 analyses, 313 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics, 0 groups",
"name": "FoundStats",
"pid": 58578,
"thread": "MainThread",
"ts": "2023-08-11T15:17:41.119910Z"
}
}
{
"data": {
"command": "dbt list",
"completed_at": "2023-08-11T15:17:41.121310Z",
"elapsed": 1.0645814,
"success": true
},
"info": {
"category": "",
"code": "Q039",
"extra": {},
"invocation_id": "5a461268-907b-4f7d-84ca-0f17d333ebc0",
"level": "debug",
"msg": "Command `dbt list` succeeded at 09:17:41.121310 after 1.06 seconds",
"name": "CommandCompleted",
"pid": 58578,
"thread": "MainThread",
"ts": "2023-08-11T15:17:41.121495Z"
}
}
{
"data": {},
"info": {
"category": "",
"code": "Z042",
"extra": {},
"invocation_id": "5a461268-907b-4f7d-84ca-0f17d333ebc0",
"level": "debug",
"msg": "Flushing usage events",
"name": "FlushEvents",
"pid": 58578,
"thread": "MainThread",
"ts": "2023-08-11T15:17:41.122033Z"
}
}
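
A minimal sketch of consuming this output from Python, assuming each line of the un-prettified dbt-list.json holds exactly one JSON object. The helper name `iter_json_lines` is hypothetical, and the sample lines are trimmed from the output above.

```python
import json
from typing import Iterable, Iterator


def iter_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Sample lines trimmed from the list output shown in this thread.
sample_list_output = [
    '{"name": "abc", "resource_type": "model", "unique_id": "model.my_project.abc.v1"}',
    "",
    '{"name": "my_seed", "resource_type": "seed", "unique_id": "seed.my_project.my_seed"}',
]

nodes = list(iter_json_lines(sample_list_output))
unique_ids = [node["unique_id"] for node in nodes]
print(unique_ids)
```

Passing an open file handle (for example `open("dbt-list.json")`) instead of the sample list should read the real file the same way, since a file object iterates line by line.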
Doing further exploring, I discovered some key differences between
As a result, the example above is something that only works with the former, and not the latter. ❔ Maybe I should open a separate issue for this? ☝️
@dbeatty10 Thanks for digging into this!

My goal is to build a lineage graph out of structured information emitted by

Today, I am using the events emitted into

What I would love is for dbt to provide a well-defined schema for the structured data provided by

I hadn't considered redirecting standard out to a file and using that. I am using the

One downside of using stdout and using

Fortunately, that has never been a problem in the past. But having a data contract for this structured data could be beneficial.

This whole line of reasoning applies for
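
To illustrate the lineage-graph goal, here is a rough sketch that derives parent-to-child edges from the `depends_on.nodes` field seen in the list output earlier in this thread. The function name `build_lineage` is hypothetical, and the records are trimmed copies of that output; this is not how any particular tool does it.

```python
from collections import defaultdict


def build_lineage(nodes):
    """Map each upstream unique_id to the unique_ids that depend on it."""
    children = defaultdict(list)
    for node in nodes:
        # The seed in the sample output has no "nodes" key, so default to [].
        for parent in node.get("depends_on", {}).get("nodes", []):
            children[parent].append(node["unique_id"])
    return dict(children)


# Records trimmed from the sample list output in this thread.
sample_nodes = [
    {"unique_id": "model.my_project.abc.v1",
     "depends_on": {"macros": [], "nodes": []}},
    {"unique_id": "model.my_project.abc.v2",
     "depends_on": {"macros": [], "nodes": ["model.my_project.abc.v1"]}},
    {"unique_id": "seed.my_project.my_seed",
     "depends_on": {"macros": []}},
]

lineage = build_lineage(sample_nodes)
print(lineage)
```

Because the code relies on field names that are not covered by a published schema, a change to `depends_on` would break it silently, which is the risk a data contract would address.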
Is this your first time submitting a feature request?
Describe the feature
Originally from @davidharting via slack
@peterallenwebb noted some challenges with this
Describe alternatives you've considered
Parse the log message itself as JSON, then pull the msg string out and parse that as JSON as well.
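
A sketch of that double-parsing workaround, for illustration only. The `info`/`msg` layout mirrors the log events shown in the comments, but the sample event itself is hypothetical: the `ListCmdOut` name comes from the issue title, and the assumption that its `msg` carries stringified JSON comes from the description above.

```python
import json

# Hypothetical log line: a JSON event whose "msg" is itself a JSON string.
log_line = json.dumps({
    "data": {},
    "info": {
        "name": "ListCmdOut",
        "level": "info",
        "msg": json.dumps({"name": "abc", "unique_id": "model.my_project.abc.v1"}),
    },
})


def parse_nested_msg(line):
    """Parse the log line, then parse its msg string as JSON a second time."""
    event = json.loads(line)
    msg = event["info"]["msg"]
    try:
        return json.loads(msg)
    except json.JSONDecodeError:
        # Most events carry plain-text messages rather than JSON.
        return None


node = parse_nested_msg(log_line)
print(node)
```

The second `json.loads` call is the fragile step: nothing guarantees the shape of the string inside `msg`, which is why structured data in the event's `data` object would be preferable.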
Who will this benefit?
People who want a well-defined schema for the preview table JSON and the list output JSON.
Are you interested in contributing this feature?
No response
Anything else?
No response