@@ -131,23 +131,23 @@

## General information about features {#general-information-about-features}

### General table configurations {#general-table-configurations}
### General model configurations {#general-model-configurations}

The following table shows configurations shared by some of the available materializations. For in-depth information about general dbt model configurations, see the [dbt documentation](https://docs.getdbt.com/category/general-configs). A short example combining several of these options follows the table:
Comment on lines -134 to +136
@morsapaes I have been doing some changes related to the reorganization of the settings per each materialization:

  • It has not been easy to redistribute the settings for each materialization, as most of them are used by several materializations. The only two that I could move have been unique_key and sharding_key
  • I have changed the title and the explanation to make it clear that these configs are shared by most of the materializations and that this is related to the model, not to a table.
  • Do you have any suggestion about possible structure improvements? I guess that at some point we can just move all configs to each of the materializations and just duplicate the setting explanation when needed. But thinking about it, I feel like it may complicate the reading/understanding of existing configurations.

What do you think?


| Option | Description | Default if any |
| ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------- |
| engine | The table engine (type of table) to use when creating tables | `MergeTree()` |
| order_by | A tuple of column names or arbitrary expressions. This allows you to create a small sparse index that helps find data faster. | `tuple()` |
| partition_by | A partition is a logical combination of records in a table by a specified criterion. The partition key can be any expression from the table columns. | |
| sharding_key | Sharding key determines the destination server when inserting into distributed engine table. The sharding key can be random or as an output of a hash function | `rand()`) |
| primary_key | Like order_by, a ClickHouse primary key expression. If not specified, ClickHouse uses the order by expression as the primary key | |
| unique_key | A tuple of column names that uniquely identify rows. Used with incremental models for updates. | |
| settings | A map/dictionary of "TABLE" settings to be used in DDL statements such as `CREATE TABLE` with this model | |
| query_settings | A map/dictionary of ClickHouse user-level settings to be used with `INSERT` or `DELETE` statements in conjunction with this model | |
| ttl | A TTL expression to be used with the table, specified as a string. | |
| indexes | A list of [data skipping indexes to create](/optimize/skipping-indexes). Check below for more information. | |
| sql_security | Allow you to specify which ClickHouse user to use when executing the view's underlying query. `SQL SECURITY` [has two legal values](/sql-reference/statements/create/view#sql_security): `definer` `invoker`. | |
| indexes | A list of [data skipping indexes](/optimize/skipping-indexes) to create. See [About data skipping indexes](#data-skipping-indexes) for details. | |
| sql_security | The ClickHouse user to use when executing the view's underlying query. [Accepted values](/sql-reference/statements/create/view#sql_security): `definer`, `invoker`. | |
Comment on lines +147 to +148
@morsapaes sorry, I changed these lines with your suggestions in #4562 (review) and I didn't commit them 🤦

| definer | If `sql_security` was set to `definer`, you have to specify any existing user or `CURRENT_USER` in the `definer` clause. | |
| projections | A list of [projections](/data-modeling/projections) to be created. Check [About projections](#projections) for details. | |
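
Below is a rough sketch (not part of this diff) of how several of these options could be combined in a single model's config block. The model, source, and column names are made up for illustration; the option names come from the table above:

```sql
-- models/trips.sql (hypothetical model)
{{ config(
    materialized='table',
    engine='MergeTree()',
    order_by='(pickup_datetime, trip_id)',
    partition_by='toYYYYMM(pickup_datetime)',
    primary_key='(pickup_datetime)',
    settings={'allow_nullable_key': 1},
    query_settings={'max_insert_threads': 4},
    ttl='pickup_datetime + INTERVAL 365 DAY'
) }}

select trip_id, pickup_datetime, fare_amount
from {{ source('raw', 'trips') }}
```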

#### About data skipping indexes {#data-skipping-indexes}

@@ -191,6 +191,8 @@
| EmbeddedRocksDB | https://clickhouse.com/docs/en/engines/table-engines/integrations/embedded-rocksdb |
| Hive | https://clickhouse.com/docs/en/engines/table-engines/integrations/hive |

**Note**: for materialized views, all `*MergeTree` engines are supported.
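
As an illustration (a sketch, not from this PR), a materialized view model could select one of the MergeTree-family engines through the `engine` config; the model and source names below are hypothetical:

```sql
-- models/latest_user_email.sql (hypothetical model)
{{ config(
    materialized='materialized_view',
    engine='ReplacingMergeTree()',
    order_by='(user_id)'
) }}

select user_id, email, updated_at
from {{ source('raw', 'user_updates') }}
```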

### Experimental supported table engines {#experimental-supported-table-engines}

| Type | Details |
@@ -341,7 +343,7 @@
) }}
```

#### Configurations {#configurations}
#### Configurations {#incremental-configurations}
Configurations specific to this materialization type are listed below:

| Option | Description | Required? |
@@ -599,6 +601,13 @@
ENGINE = Distributed ('cluster', 'db', 'table_local', cityHash64(id));
```

#### Configurations {#distributed-table-configurations}
Configurations specific to this materialization type are listed below, followed by a brief example:

| Option | Description | Default if any |
| ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------- |
| sharding_key | The sharding key determines the destination server when inserting into a distributed engine table. The sharding key can be random or the output of a hash function. | `rand()` |
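
A hedged sketch of the corresponding model config (the model, source, and column names are hypothetical; the sharding expression mirrors the `cityHash64(id)` used in the DDL above):

```sql
-- models/events_distributed.sql (hypothetical model)
{{ config(
    materialized='distributed_table',
    engine='MergeTree()',
    order_by='(id)',
    sharding_key='cityHash64(id)'
) }}

select id, event_type, created_at
from {{ source('raw', 'events') }}
```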

### materialization: distributed_incremental (experimental) {#materialization-distributed-incremental}

Incremental model based on the same idea as distributed table, the main difficulty is to process all incremental
24 changes: 8 additions & 16 deletions docs/integrations/data-ingestion/etl-tools/dbt/index.md
@@ -48,31 +48,23 @@ All features up to dbt-core 1.9 are supported. We will soon add the features add

This adapter is still not available for use inside [dbt Cloud](https://docs.getdbt.com/docs/dbt-cloud/cloud-overview), but we expect to make it available soon. Please reach out to support to get more information on this.

## Concepts {#concepts}
## dbt concepts and supported materializations {#concepts-and-supported-materializations}

dbt introduces the concept of a model. This is defined as a SQL statement, potentially joining many tables. A model can be "materialized" in a number of ways. A materialization represents a build strategy for the model's select query. The code behind a materialization is boilerplate SQL that wraps your SELECT query in a statement in order to create a new or update an existing relation.

dbt provides 4 types of materialization:
dbt provides 5 types of materialization, all of which are supported by `dbt-clickhouse` (a minimal example follows the list):

* **view** (default): The model is built as a view in the database.
* **table**: The model is built as a table in the database.
* **ephemeral**: The model is not directly built in the database but is instead pulled into dependent models as common table expressions.
* **view** (default): The model is built as a view in the database. In ClickHouse this is built as a [view](/sql-reference/statements/create/view).
* **table**: The model is built as a table in the database. In ClickHouse this is built as a [table](/sql-reference/statements/create/table).
* **ephemeral**: The model is not directly built in the database but is instead pulled into dependent models as CTEs (Common Table Expressions).
* **incremental**: The model is initially materialized as a table, and in subsequent runs, dbt inserts new rows and updates changed rows in the table.
* **materialized view**: The model is built as a materialized view in the database. In ClickHouse this is built as a [materialized view](/sql-reference/statements/create/view#materialized-view).
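
A model opts into one of these materializations through its `config` block. The sketch below shows a minimal, hypothetical incremental model (model, source, and column names are made up); `unique_key` matches the option listed in the general configuration table earlier in this diff:

```sql
-- models/trips_incremental.sql (hypothetical model)
{{ config(
    materialized='incremental',
    unique_key='trip_id'
) }}

select trip_id, fare_amount, pickup_datetime
from {{ source('raw', 'trips') }}

{% if is_incremental() %}
  -- only process rows newer than what is already in the target table
  where pickup_datetime > (select max(pickup_datetime) from {{ this }})
{% endif %}
```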

Additional syntax and clauses define how these models should be updated if their underlying data changes. dbt generally recommends starting with the view materialization until performance becomes a concern. The table materialization provides a query time performance improvement by capturing the results of the model's query as a table at the expense of increased storage. The incremental approach builds on this further to allow subsequent updates to the underlying data to be captured in the target table.

The[ current adapter](https://github.com/silentsokolov/dbt-clickhouse) for ClickHouse supports also support **materialized view**, **dictionary**, **distributed table** and **distributed incremental** materializations. The adapter also supports dbt[ snapshots](https://docs.getdbt.com/docs/building-a-dbt-project/snapshots#check-strategy) and [seeds](https://docs.getdbt.com/docs/building-a-dbt-project/seeds).
The [current adapter](https://github.com/silentsokolov/dbt-clickhouse) for ClickHouse also supports **dictionary**, **distributed table** and **distributed incremental** materializations. The adapter also supports dbt [snapshots](https://docs.getdbt.com/docs/building-a-dbt-project/snapshots#check-strategy) and [seeds](https://docs.getdbt.com/docs/building-a-dbt-project/seeds).

### Details about supported materializations {#details-about-supported-materializations}

| Type | Supported? | Details |
|-----------------------------|------------|----------------------------------------------------------------------------------------------------------------------------------|
| view materialization | YES | Creates a [view](https://clickhouse.com/docs/en/sql-reference/table-functions/view/). |
| table materialization | YES | Creates a [table](https://clickhouse.com/docs/en/operations/system-tables/tables/). See below for the list of supported engines. |
| incremental materialization | YES | Creates a table if it doesn't exist, and then writes only updates to it. |
| ephemeral materialized | YES | Creates a ephemeral/CTE materialization. This does model is internal to dbt and does not create any database objects |

The following are [experimental features](https://clickhouse.com/docs/en/beta-and-experimental-features) in ClickHouse:
The following are [experimental features](https://clickhouse.com/docs/en/beta-and-experimental-features) in `dbt-clickhouse`:

| Type | Supported? | Details |
|-----------------------------------------|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Expand Down