From 6d766fd89930588334fc76c66503edf1816571d1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jose=20Mu=C3=B1oz?= Date: Sun, 12 Oct 2025 13:36:07 +0200 Subject: [PATCH 01/13] CI/CD suggestions. How to troubleshooting long running operations. Complete index docs. Document projections. Document typing complex types. --- .../dbt/features-and-configurations.md | 62 ++++++++++++++++++- .../data-ingestion/etl-tools/dbt/index.md | 37 ++++++++++- 2 files changed, 95 insertions(+), 4 deletions(-) diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md index 33a2b86f30b..7e9b4e0708b 100644 --- a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md +++ b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md @@ -144,9 +144,41 @@ dbt relies on a read-after-insert consistency model. This is not compatible with | settings | A map/dictionary of "TABLE" settings to be used to DDL statements like 'CREATE TABLE' with this model | | | query_settings | A map/dictionary of ClickHouse user level settings to be used with `INSERT` or `DELETE` statements in conjunction with this model | | | ttl | A TTL expression to be used with the table. The TTL expression is a string that can be used to specify the TTL for the table. | | -| indexes | A list of indexes to create, available only for `table` materialization. For examples look at ([#397](https://github.com/ClickHouse/dbt-clickhouse/pull/397)) | | -| sql_security | Allow you to specify which ClickHouse user to use when executing the view's underlying query. [`SQL SECURITY`](https://clickhouse.com/docs/sql-reference/statements/create/view#sql_security) has two legal values: `definer` `invoker`. | | +| indexes | A list of [data skipping indexes to create](/optimize/skipping-indexes). Check below for more information. | | +| sql_security | Allow you to specify which ClickHouse user to use when executing the view's underlying query. `SQL SECURITY` [has two legal values](/sql-reference/statements/create/view#sql_security): `definer` `invoker`. | | | definer | If `sql_security` was set to `definer`, you have to specify any existing user or `CURRENT_USER` in the `definer` clause. | | +| projections | A list of [projections](/data-modeling/projections) to be created. Check below for more information. | | + +#### About Data Skipping Indexes {#data-skipping-indexes} + +These indexes are only available for `table` materialization. A list of these indexes can be added in the tablle setting as + +```sql +{{ config( + materialized='table', + indexes=[{ + 'name': 'your_index_name', + 'definition': 'your_column TYPE minmax GRANULARITY 2' + }] +) }} +``` + +#### About Projections {#projections} + +Projections are added to the `table` and `distributed_table` materializations as a model setting. For distributed tables, the projection is applied to the `_local` tables, not to the distributed proxy table. For example + +```sql +{{ config( + materialized='table', + projections=[ + { + 'name': 'your_projection_name', + 'query': 'SELECT department, avg(age) AS avg_age GROUP BY department' + } + ] +) }} +``` + ### Supported table engines {#supported-table-engines} @@ -191,7 +223,7 @@ should be carefully researched and tested. | codec | A string consisting of arguments passed to `CODEC()` in the column's DDL. For example: `codec: "Delta, ZSTD"` will be compiled as `CODEC(Delta, ZSTD)`. 
| | ttl | A string consisting of a [TTL (time-to-live) expression](https://clickhouse.com/docs/guides/developer/ttl) that defines a TTL rule in the column's DDL. For example: `ttl: ts + INTERVAL 1 DAY` will be compiled as `TTL ts + INTERVAL 1 DAY`. | -#### Example {#example} +#### Example of schema configuration {#example-of-schema-configuration} ```yaml models: @@ -209,6 +241,30 @@ models: ttl: ts + INTERVAL 1 DAY ``` +#### Adding complex types {#adding-complex-types} + +dbt attempts to infer the types of each column based on the SQL used to create the model. However, some types may not be directly inferable by this process, which may lead to collisions with the types defined in the contract `data_type` property. To resolve this, we recommend using the `CAST()` function in the model SQL to force the type you need. For example: + +```sql +{{ + config( + materialized="materialized_view", + engine="AggregatingMergeTree", + order_by=["event_type"], + ) +}} + +select + -- event_type may be infered as a String but we may prefer LowCardinality(String): + CAST(event_type, 'LowCardinality(String)') as event_type, + -- countState() may be infered as `AggregateFunction(count)` but we may prefer to change the type of the argument used: + CAST(countState(), 'AggregateFunction(count, UInt32)') as response_count, + -- maxSimpleState() may be infered as `SimpleAggregateFunction(max, String)` but we may prefer to also change the type of the argument used: + CAST(maxSimpleState(event_type), 'SimpleAggregateFunction(max, LowCardinality(String))') as max_event_type +from {{ ref('user_events') }} +group by event_type +``` + ## Features {#features} ### Materialization: view {#materialization-view} diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/index.md b/docs/integrations/data-ingestion/etl-tools/dbt/index.md index 277b70afa33..4fe724b3910 100644 --- a/docs/integrations/data-ingestion/etl-tools/dbt/index.md +++ b/docs/integrations/data-ingestion/etl-tools/dbt/index.md @@ -41,6 +41,8 @@ List of supported features: - [x] Distributed table materialization (experimental) - [x] Distributed incremental materialization (experimental) - [x] Contracts +- [x] ClickHouse-specific column configurations (Codec, TTL...) +- [x] ClickHouse-specific table settings (indexes, projections...) All features up to dbt-core 1.9 are supported. We will soon add the features added in dbt-core 1.10. @@ -125,7 +127,36 @@ Execute `dbt debug` with the CLI tool to confirm whether dbt is able to connect Go to the [guides page](/integrations/dbt/guides) to learn more about how to use dbt with ClickHouse. -## Troubleshooting Connections {#troubleshooting-connections} +### Testing and Deploying your models (CI/CD) {#testing-and-deploying-your-models-ci-cd} + +There are many ways to test and deploy your dbt project. dbt has some suggestions for [best practice workflows](https://docs.getdbt.com/best-practices/best-practice-workflows#pro-tips-for-workflows) and [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs). We are going to discuss several strategies, but keep into account that these strategies may need to be deeply adjusted to fit your specific use case. + +#### CI/CD with simple data tests and unit tests {#ci-with-simple-data-tests-and-unit-tests} + +One simple way to kick-start your CI pipeline is to run a ClickHouse cluster inside your job and then run your models against it. You can insert demo data into this cluster before running your models. 
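For example, a minimal sketch of such a CI job could look like the following (it assumes a Docker-capable runner and a `ci` target defined in your `profiles.yml`; both are illustrative choices rather than requirements):

```bash
# Start a throwaway ClickHouse server for the duration of the CI job
docker run -d --name ci-clickhouse -p 8123:8123 -p 9000:9000 clickhouse/clickhouse-server:24.8
# (in a real job you would wait here until the server accepts connections)

# Install the adapter and run the project against the throwaway server
pip install dbt-core dbt-clickhouse
dbt seed --target ci     # load demo data
dbt build --target ci    # run models plus data and unit tests
```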
You can just use a [seed](https://docs.getdbt.com/reference/commands/seed) to populate the staging environment with a subset of your production data. + +Once the data is inserted, you can then run your [data tests](https://docs.getdbt.com/docs/build/data-tests) and your [unit tests](https://docs.getdbt.com/docs/build/unit-tests). + +Your CD step can be as simple as running `dbt build` against your production ClickHouse cluster. + +#### More complete CI/CD stage: Use recent data, only test affected models {#more-complete-ci-stage} + +One common strategy is to use [Slim CI](https://docs.getdbt.com/best-practices/best-practice-workflows#run-only-modified-models-to-test-changes-slim-ci) jobs, where only the refreshed models (and their downstream dependencies) are tested. You can use the artifacts from your production runs to keep your development environment(s) in sync + +To keep your development environments in sync and avoid running your models against stale deployments, you can use [clone](https://docs.getdbt.com/reference/commands/clone) or even [defer](https://docs.getdbt.com/reference/node-selection/defer). + +It's better to use a different ClickHouse cluster (an `staging` one) to handle the testing phase. That way you can avoid impacting the performance of your production environment and the data there. You can keep a small subset of your production data there so you can run your models against it. There are different ways of handling this: +- If your data doesn't need to be really recent, you can load backups of your production data into the staging cluster. +- If you need more recent data, you can also find different strategies to load your data into the staging cluster. For example, you could use a refreshable materialized view and `remoteSecure()` and insert the data daily. If the insert fails or if there is data loss, you should be able to quickly re-trigger it. +- Another way could be to use a cron or refreshable materialized view to write the data to object storage and then set up a clickpipe on staging to pull any new files when they drop. + +Doing your CI testing in an accessible cluster can let you also do some manual testing of your results. For example, you may want to access to this environment using one of your BI tools. + +Your CD step can reuse the artifacts from your last production deployment to only update the models that have changed with something like `dbt build --select state:modified+ --state path/to/last/deploy/state.json` + +## Troubleshooting common issues {#troubleshooting-common-issues} + +### Connections {#troubleshooting-connections} If you encounter issues connecting to ClickHouse from dbt, make sure the following criteria are met: @@ -134,6 +165,10 @@ If you encounter issues connecting to ClickHouse from dbt, make sure the followi - If you're not using the default table engine for the database, you must specify a table engine in your model configuration. +### Understanding long-running operations {#understanding-long-running-operations} + +Some operations may take longer than expected due to specific ClickHouse queries. To gain more insight into which queries are taking longer, you can increase the log level to `debug` as it will print the time used by each one. For example, this can be achieved by appending `---log-level debug` to the command. 
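For instance, a sketch of such an invocation (`my_model` is a placeholder, not a model from these docs):

```bash
# Print detailed logs, including the time spent on each ClickHouse query
dbt run --select my_model --log-level debug
```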
+ ## Limitations {#limitations} The current ClickHouse adapter for dbt has several limitations users should be aware of: From 1aa5430156f2199928c3d4602e9a58d7d3da3b6d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jose=20Mu=C3=B1oz?= Date: Sun, 12 Oct 2025 13:39:35 +0200 Subject: [PATCH 02/13] lint --- .../data-ingestion/etl-tools/dbt/features-and-configurations.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md index 7e9b4e0708b..318fe983753 100644 --- a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md +++ b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md @@ -179,7 +179,6 @@ Projections are added to the `table` and `distributed_table` materializations as ) }} ``` - ### Supported table engines {#supported-table-engines} | Type | Details | From 7b3fd5ec8ef2e44f1e2b0a2f8597a872190cd73d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jose=20Mu=C3=B1oz?= Date: Sun, 12 Oct 2025 13:58:34 +0200 Subject: [PATCH 03/13] spelling --- .../etl-tools/dbt/features-and-configurations.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md index 318fe983753..6b2583860c3 100644 --- a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md +++ b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md @@ -151,7 +151,7 @@ dbt relies on a read-after-insert consistency model. This is not compatible with #### About Data Skipping Indexes {#data-skipping-indexes} -These indexes are only available for `table` materialization. A list of these indexes can be added in the tablle setting as +These indexes are only available for `table` materialization. A list of these indexes can be added in the table setting as ```sql {{ config( @@ -242,7 +242,7 @@ models: #### Adding complex types {#adding-complex-types} -dbt attempts to infer the types of each column based on the SQL used to create the model. However, some types may not be directly inferable by this process, which may lead to collisions with the types defined in the contract `data_type` property. To resolve this, we recommend using the `CAST()` function in the model SQL to force the type you need. For example: +dbt automatically determines the data type of each column by analyzing the SQL used to create the model. However, in some cases this process may not accurately determine the data type, leading to conflicts with the types specified in the contract `data_type` property. To address this, we recommend using the `CAST()` function in the model SQL to explicitly define the desired type. 
For example: ```sql {{ From c1472c6df22bb52c436e92ad7bc8a73d31f02a30 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jose=20Mu=C3=B1oz?= Date: Mon, 13 Oct 2025 18:12:06 +0200 Subject: [PATCH 04/13] Update docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md Co-authored-by: Marta Paes --- .../data-ingestion/etl-tools/dbt/features-and-configurations.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md index 6b2583860c3..4227bdc227e 100644 --- a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md +++ b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md @@ -147,7 +147,7 @@ dbt relies on a read-after-insert consistency model. This is not compatible with | indexes | A list of [data skipping indexes to create](/optimize/skipping-indexes). Check below for more information. | | | sql_security | Allow you to specify which ClickHouse user to use when executing the view's underlying query. `SQL SECURITY` [has two legal values](/sql-reference/statements/create/view#sql_security): `definer` `invoker`. | | | definer | If `sql_security` was set to `definer`, you have to specify any existing user or `CURRENT_USER` in the `definer` clause. | | -| projections | A list of [projections](/data-modeling/projections) to be created. Check below for more information. | | +| projections | A list of [projections](/data-modeling/projections) to be created. Check [About projections](#projections) for details. | | #### About Data Skipping Indexes {#data-skipping-indexes} From f79ccf360fb33c43fc205d1505a085bf71108e54 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jose=20Mu=C3=B1oz?= Date: Mon, 13 Oct 2025 18:40:07 +0200 Subject: [PATCH 05/13] Update docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md Co-authored-by: Marta Paes --- .../data-ingestion/etl-tools/dbt/features-and-configurations.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md index 4227bdc227e..05f4d171b97 100644 --- a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md +++ b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md @@ -149,7 +149,7 @@ dbt relies on a read-after-insert consistency model. This is not compatible with | definer | If `sql_security` was set to `definer`, you have to specify any existing user or `CURRENT_USER` in the `definer` clause. | | | projections | A list of [projections](/data-modeling/projections) to be created. Check [About projections](#projections) for details. | | -#### About Data Skipping Indexes {#data-skipping-indexes} +#### About data skipping indexes {#data-skipping-indexes} These indexes are only available for `table` materialization. 
A list of these indexes can be added in the table setting as From 8bcb4c9f142180b7864f9ca0dcba771acb335a9b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jose=20Mu=C3=B1oz?= Date: Mon, 13 Oct 2025 18:40:21 +0200 Subject: [PATCH 06/13] Update docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md Co-authored-by: Marta Paes --- .../data-ingestion/etl-tools/dbt/features-and-configurations.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md index 05f4d171b97..499bd23aa8c 100644 --- a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md +++ b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md @@ -163,7 +163,7 @@ These indexes are only available for `table` materialization. A list of these in ) }} ``` -#### About Projections {#projections} +#### About projections {#projections} Projections are added to the `table` and `distributed_table` materializations as a model setting. For distributed tables, the projection is applied to the `_local` tables, not to the distributed proxy table. For example From b0a4cff5a0dcd6c8c052fceb30b14227b5adbabf Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jose=20Mu=C3=B1oz?= Date: Mon, 13 Oct 2025 18:40:59 +0200 Subject: [PATCH 07/13] Update docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md Co-authored-by: Marta Paes --- .../data-ingestion/etl-tools/dbt/features-and-configurations.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md index 499bd23aa8c..040c53ba0d2 100644 --- a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md +++ b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md @@ -151,7 +151,7 @@ dbt relies on a read-after-insert consistency model. This is not compatible with #### About data skipping indexes {#data-skipping-indexes} -These indexes are only available for `table` materialization. A list of these indexes can be added in the table setting as +Data skipping indexes are only available for the `table` materialization. To add a list of data skipping indexes to a table, use the `indexes` configuration: ```sql {{ config( From 896dbecc50dfd2ac678b15a7102e4766aa339077 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jose=20Mu=C3=B1oz?= Date: Mon, 13 Oct 2025 18:41:47 +0200 Subject: [PATCH 08/13] Update docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md Co-authored-by: Marta Paes --- .../data-ingestion/etl-tools/dbt/features-and-configurations.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md index 040c53ba0d2..a60102a3c7e 100644 --- a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md +++ b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md @@ -165,7 +165,7 @@ Data skipping indexes are only available for the `table` materialization. To add #### About projections {#projections} -Projections are added to the `table` and `distributed_table` materializations as a model setting. 
For distributed tables, the projection is applied to the `_local` tables, not to the distributed proxy table. For example +You can add [projections](/data-modeling/projections) to `table` and `distributed_table` materializations using the `projections` configuration: ```sql {{ config( From c238df841bd0ec2ac89058a52197428c0e323c0d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jose=20Mu=C3=B1oz?= Date: Mon, 13 Oct 2025 18:41:55 +0200 Subject: [PATCH 09/13] Update docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md Co-authored-by: Marta Paes --- .../data-ingestion/etl-tools/dbt/features-and-configurations.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md index a60102a3c7e..005e4ca0b8f 100644 --- a/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md +++ b/docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md @@ -178,6 +178,7 @@ You can add [projections](/data-modeling/projections) to `table` and `distribute ] ) }} ``` +**Note**: For distributed tables, the projection is applied to the `_local` tables, not to the distributed proxy table. ### Supported table engines {#supported-table-engines} From 4fd1e1e04a0fea5c89ad5ad493a847716a8262bf Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jose=20Mu=C3=B1oz?= Date: Mon, 13 Oct 2025 18:42:41 +0200 Subject: [PATCH 10/13] Update docs/integrations/data-ingestion/etl-tools/dbt/index.md Co-authored-by: Marta Paes --- docs/integrations/data-ingestion/etl-tools/dbt/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/index.md b/docs/integrations/data-ingestion/etl-tools/dbt/index.md index 4fe724b3910..8ecbc110f29 100644 --- a/docs/integrations/data-ingestion/etl-tools/dbt/index.md +++ b/docs/integrations/data-ingestion/etl-tools/dbt/index.md @@ -141,7 +141,7 @@ Your CD step can be as simple as running `dbt build` against your production Cli #### More complete CI/CD stage: Use recent data, only test affected models {#more-complete-ci-stage} -One common strategy is to use [Slim CI](https://docs.getdbt.com/best-practices/best-practice-workflows#run-only-modified-models-to-test-changes-slim-ci) jobs, where only the refreshed models (and their downstream dependencies) are tested. You can use the artifacts from your production runs to keep your development environment(s) in sync +One common strategy is to use [Slim CI](https://docs.getdbt.com/best-practices/best-practice-workflows#run-only-modified-models-to-test-changes-slim-ci) jobs, where only the modified models (and their up- and downstream dependencies) are re-deployed. This approach uses artifacts from your production runs (i.e., the [dbt manifest](https://docs.getdbt.com/reference/artifacts/manifest-json)) to reduce the run time of your project and ensure there is no schema drift across environments. To keep your development environments in sync and avoid running your models against stale deployments, you can use [clone](https://docs.getdbt.com/reference/commands/clone) or even [defer](https://docs.getdbt.com/reference/node-selection/defer). 
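As a rough sketch, a Slim CI invocation could look like the following (the artifact path is an assumption; it simply points at the manifest saved from your last production run):

```bash
# Build only the models modified since the last production run (plus their children),
# deferring references to unmodified models to the production artifacts
dbt build --select state:modified+ --defer --state ./prod-run-artifacts
```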
From 49b49b8b9b7b27b399b39b062fbf428ecdd50f25 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jose=20Mu=C3=B1oz?= Date: Mon, 13 Oct 2025 18:53:54 +0200 Subject: [PATCH 11/13] Update docs/integrations/data-ingestion/etl-tools/dbt/index.md Co-authored-by: Marta Paes --- .../data-ingestion/etl-tools/dbt/index.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/index.md b/docs/integrations/data-ingestion/etl-tools/dbt/index.md index 8ecbc110f29..cc5432402e2 100644 --- a/docs/integrations/data-ingestion/etl-tools/dbt/index.md +++ b/docs/integrations/data-ingestion/etl-tools/dbt/index.md @@ -145,14 +145,14 @@ One common strategy is to use [Slim CI](https://docs.getdbt.com/best-practices/b To keep your development environments in sync and avoid running your models against stale deployments, you can use [clone](https://docs.getdbt.com/reference/commands/clone) or even [defer](https://docs.getdbt.com/reference/node-selection/defer). -It's better to use a different ClickHouse cluster (an `staging` one) to handle the testing phase. That way you can avoid impacting the performance of your production environment and the data there. You can keep a small subset of your production data there so you can run your models against it. There are different ways of handling this: -- If your data doesn't need to be really recent, you can load backups of your production data into the staging cluster. -- If you need more recent data, you can also find different strategies to load your data into the staging cluster. For example, you could use a refreshable materialized view and `remoteSecure()` and insert the data daily. If the insert fails or if there is data loss, you should be able to quickly re-trigger it. -- Another way could be to use a cron or refreshable materialized view to write the data to object storage and then set up a clickpipe on staging to pull any new files when they drop. +We recommend using a dedicated ClickHouse cluster or service for the testing environment (i.e., a staging environment) to avoid impacting the operation of your production environment. To ensure the testing environment is representative, it's important that you use a subset of your production data, as well as run dbt in a way that prevents schema drift between environments. -Doing your CI testing in an accessible cluster can let you also do some manual testing of your results. For example, you may want to access to this environment using one of your BI tools. +- If you don't need fresh data to test against, you can restore a backup of your production data into the staging environment. +- If you need fresh data to test against, you can use a combination of the [`remoteSecure()` table function](/sql-reference/table-functions/remote) and refreshable materialized views to insert at the desired frequency. Another option is to use object storage as an intermediate and periodically write data from your production service, then import it into the staging environment using the object storage table functions or ClickPipes (for continuous ingestion). -Your CD step can reuse the artifacts from your last production deployment to only update the models that have changed with something like `dbt build --select state:modified+ --state path/to/last/deploy/state.json` +Using a dedicated environment for CI testing also allows you to perform manual testing without impacting your production environment. For example, you may want to point a BI tool to this environment for testing. 
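To make the fresh-data option above more concrete, here is a sketch of a refreshable materialized view that re-imports a recent slice of a production table into staging once a day. The host, credentials, table and column names are hypothetical, and the `staging_events` target table is assumed to exist already:

```sql
-- Hypothetical daily refresh on the staging service
CREATE MATERIALIZED VIEW staging_events_refresh
REFRESH EVERY 1 DAY
TO staging_events
AS
SELECT *
FROM remoteSecure('prod-host:9440', 'default', 'events', 'staging_reader', '<password>')
WHERE event_time >= now() - INTERVAL 7 DAY;
```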
+ +For deployment (i.e., the CD step), we recommend using the artifacts from your production deployments to only update the models that have changed. This requires setting up object storage (e.g., S3) as intermediate storage for your dbt artifacts. Once that is set up, you can run a command like `dbt build --select state:modified+ --state path/to/last/deploy/state.json` to selectively rebuild the minimum amount of models needed based on what changed since the last run in production. ## Troubleshooting common issues {#troubleshooting-common-issues} From 3c227b563b6508926fc2717850508108cdeedd6f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jose=20Mu=C3=B1oz?= Date: Mon, 13 Oct 2025 18:54:51 +0200 Subject: [PATCH 12/13] Update docs/integrations/data-ingestion/etl-tools/dbt/index.md Co-authored-by: Marta Paes --- docs/integrations/data-ingestion/etl-tools/dbt/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/index.md b/docs/integrations/data-ingestion/etl-tools/dbt/index.md index cc5432402e2..6108f1dee8a 100644 --- a/docs/integrations/data-ingestion/etl-tools/dbt/index.md +++ b/docs/integrations/data-ingestion/etl-tools/dbt/index.md @@ -167,7 +167,7 @@ If you encounter issues connecting to ClickHouse from dbt, make sure the followi ### Understanding long-running operations {#understanding-long-running-operations} -Some operations may take longer than expected due to specific ClickHouse queries. To gain more insight into which queries are taking longer, you can increase the log level to `debug` as it will print the time used by each one. For example, this can be achieved by appending `---log-level debug` to the command. +Some operations may take longer than expected due to specific ClickHouse queries. To gain more insight into which queries are taking longer, increase the [log level](https://docs.getdbt.com/reference/global-configs/logs#log-level) to `debug` — this will print the time used by each query. For example, this can be achieved by appending `--log-level debug` to dbt commands. ## Limitations {#limitations} From 9eb6a7d3af020df62ea8ac1e69bbf1b4ca681540 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jose=20Mu=C3=B1oz?= Date: Mon, 13 Oct 2025 19:06:03 +0200 Subject: [PATCH 13/13] Update docs/integrations/data-ingestion/etl-tools/dbt/index.md Co-authored-by: Bentsi Leviav --- docs/integrations/data-ingestion/etl-tools/dbt/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/integrations/data-ingestion/etl-tools/dbt/index.md b/docs/integrations/data-ingestion/etl-tools/dbt/index.md index 6108f1dee8a..945698631af 100644 --- a/docs/integrations/data-ingestion/etl-tools/dbt/index.md +++ b/docs/integrations/data-ingestion/etl-tools/dbt/index.md @@ -129,7 +129,7 @@ Go to the [guides page](/integrations/dbt/guides) to learn more about how to use ### Testing and Deploying your models (CI/CD) {#testing-and-deploying-your-models-ci-cd} -There are many ways to test and deploy your dbt project. dbt has some suggestions for [best practice workflows](https://docs.getdbt.com/best-practices/best-practice-workflows#pro-tips-for-workflows) and [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs). We are going to discuss several strategies, but keep into account that these strategies may need to be deeply adjusted to fit your specific use case. +There are many ways to test and deploy your dbt project. 
dbt has some suggestions for [best practice workflows](https://docs.getdbt.com/best-practices/best-practice-workflows#pro-tips-for-workflows) and [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs). We are going to discuss several strategies, but keep in mind that these strategies may need significant adjustment to fit your specific use case.

#### CI/CD with simple data tests and unit tests {#ci-with-simple-data-tests-and-unit-tests}