Merged
15 changes: 8 additions & 7 deletions docs/_snippets/_gather_your_details_http.mdx
Original file line number Diff line number Diff line change
@@ -4,17 +4,18 @@ import Image from '@theme/IdealImage';

To connect to ClickHouse with HTTP(S) you need this information:

- The HOST and PORT: typically, the port is 8443 when using TLS or 8123 when not using TLS.
| Parameter(s) | Description |
|-------------------------|---------------------------------------------------------------------------------------------------------------|
|`HOST` and `PORT` | Typically, the port is 8443 when using TLS or 8123 when not using TLS. |
|`DATABASE NAME` | Out of the box, there is a database named `default`. Use the name of the database that you want to connect to.|
|`USERNAME` and `PASSWORD`| Out of the box, the username is `default`. Use the username appropriate for your use case. |

- The DATABASE NAME: out of the box, there is a database named `default`, use the name of the database that you want to connect to.

- The USERNAME and PASSWORD: out of the box, the username is `default`. Use the username appropriate for your use case.

The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console. Select the service that you will connect to and click **Connect**:
The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console.
Select a service and click **Connect**:

<Image img={cloud_connect_button} size="md" alt="ClickHouse Cloud service connect button" border />

Choose **HTTPS**, and the details are available in an example `curl` command.
Choose **HTTPS**. Connection details are displayed in an example `curl` command.

<Image img={connection_details_https} size="md" alt="ClickHouse Cloud HTTPS connection details" border/>
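
As a minimal sketch of how the details above fit together, the following Python builds (but does not send) a request for the ClickHouse HTTP interface. The function name `build_http_request` and the host `example.invalid` are hypothetical. The sketch assumes credentials are passed in the `X-ClickHouse-User` and `X-ClickHouse-Key` headers and the database as a `database` URL parameter.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_http_request(host, username, password, database="default",
                       secure=True, port=None, query="SELECT 1"):
    """Build (without sending) a request for the ClickHouse HTTP interface."""
    # Typical ports from the table above: 8443 with TLS, 8123 without.
    if port is None:
        port = 8443 if secure else 8123
    scheme = "https" if secure else "http"
    url = f"{scheme}://{host}:{port}/?{urlencode({'database': database})}"
    # Assumption: credentials travel in headers rather than in the URL.
    headers = {
        "X-ClickHouse-User": username,
        "X-ClickHouse-Key": password,
    }
    # The SQL text is sent as the POST body.
    return Request(url, data=query.encode("utf-8"), headers=headers, method="POST")


# Hypothetical host and password, for illustration only.
request = build_http_request("example.invalid", "default", "secret")
print(request.full_url)
```

To actually run the query, the request could be passed to `urllib.request.urlopen(request)`. The example `curl` command in the Cloud console remains the authoritative source for your service's values.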

15 changes: 8 additions & 7 deletions docs/_snippets/_gather_your_details_native.md
Original file line number Diff line number Diff line change
@@ -4,13 +4,14 @@

To connect to ClickHouse with native TCP you need this information:

- The HOST and PORT: typically, the port is 9440 when using TLS, or 9000 when not using TLS.

- The DATABASE NAME: out of the box there is a database named `default`, use the name of the database that you want to connect to.

- The USERNAME and PASSWORD: out of the box the username is `default`. Use the username appropriate for your use case.

The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console. Select the service that you will connect to and click **Connect**:
| Parameter(s) | Description |
|---------------------------|---------------------------------------------------------------------------------------------------------------|
| `HOST` and `PORT` | Typically, the port is 9440 when using TLS, or 9000 when not using TLS. |
| `DATABASE NAME` | Out of the box there is a database named `default`. Use the name of the database that you want to connect to. |
| `USERNAME` and `PASSWORD` | Out of the box the username is `default`. Use the username appropriate for your use case. |

Check warning on line 7 in docs/_snippets/_gather_your_details_native.md (GitHub Actions / vale, ClickHouse.BadPlurals): Rewrite '(s)' to be plural without parentheses.

The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console.
Select the service that you will connect to and click **Connect**:

<Image img={cloud_connect_button} size="md" alt="ClickHouse Cloud service connect button" border/>
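
The native TCP details can be collected in one place before handing them to a client. The sketch below is illustrative only: the `NativeDetails` class is hypothetical, and it assumes the `clickhouse-client` command-line flags `--host`, `--port`, `--user`, `--password`, `--database`, and `--secure`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NativeDetails:
    """Connection details for the native TCP interface (hypothetical helper)."""
    host: str
    username: str = "default"
    password: str = ""
    database: str = "default"
    secure: bool = True
    port: Optional[int] = None

    def resolved_port(self) -> int:
        # Typical ports from the table above: 9440 with TLS, 9000 without.
        if self.port is not None:
            return self.port
        return 9440 if self.secure else 9000

    def client_args(self) -> List[str]:
        # Build an argument list for clickhouse-client without running it.
        args = [
            "clickhouse-client",
            "--host", self.host,
            "--port", str(self.resolved_port()),
            "--user", self.username,
            "--database", self.database,
        ]
        if self.password:
            args += ["--password", self.password]
        if self.secure:
            args.append("--secure")
        return args


# Local, non-TLS example with the out-of-the-box user and database.
print(NativeDetails(host="localhost", secure=False).client_args())
```

Returning a list instead of a single string means the arguments can go straight to `subprocess.run` without shell quoting.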

Original file line number Diff line number Diff line change
@@ -30,7 +30,9 @@ Please note that the Airbyte source and destination for ClickHouse are currently

<a href="https://www.airbyte.com/" target="_blank">Airbyte</a> is an open-source data integration platform. It allows the creation of <a href="https://airbyte.com/blog/why-the-future-of-etl-is-not-elt-but-el" target="_blank">ELT</a> data pipelines and is shipped with more than 140 out-of-the-box connectors. This step-by-step tutorial shows how to connect Airbyte to ClickHouse as a destination and load a sample dataset.

## 1. Download and run Airbyte {#1-download-and-run-airbyte}
<VerticalStepper headerLevel="h2">

## Download and run Airbyte {#1-download-and-run-airbyte}

1. Airbyte runs on Docker and uses `docker-compose`. Make sure to download and install the latest versions of Docker.

@@ -50,7 +52,7 @@ Please note that the Airbyte source and destination for ClickHouse are currently
Alternatively, you can sign up and use <a href="https://docs.airbyte.com/deploying-airbyte/on-cloud" target="_blank">Airbyte Cloud</a>
:::

## 2. Add ClickHouse as a destination {#2-add-clickhouse-as-a-destination}
## Add ClickHouse as a destination {#2-add-clickhouse-as-a-destination}

In this section, we will show how to add a ClickHouse instance as a destination.

@@ -80,7 +82,7 @@ GRANT CREATE ON * TO my_airbyte_user;
```
:::

## 3. Add a dataset as a source {#3-add-a-dataset-as-a-source}
## Add a dataset as a source {#3-add-a-dataset-as-a-source}

The example dataset we will use is the <a href="https://clickhouse.com/docs/getting-started/example-datasets/nyc-taxi/" target="_blank">New York City Taxi Data</a> (on <a href="https://github.com/toddwschneider/nyc-taxi-data" target="_blank">GitHub</a>). For this tutorial, we will use the subset of this dataset that covers January 2022.

@@ -98,7 +100,7 @@ The example dataset we will use is the <a href="https://clickhouse.com/docs/gett

3. Congratulations! You have now added a source file in Airbyte.

## 4. Create a connection and load the dataset into ClickHouse {#4-create-a-connection-and-load-the-dataset-into-clickhouse}
## Create a connection and load the dataset into ClickHouse {#4-create-a-connection-and-load-the-dataset-into-clickhouse}

1. Within Airbyte, select the "Connections" page and add a new connection

@@ -170,3 +172,5 @@ The example dataset we will use is the <a href="https://clickhouse.com/docs/gett
Now that the dataset is loaded on your ClickHouse instance, you can create a new table and use more suitable ClickHouse data types (<a href="https://clickhouse.com/docs/getting-started/example-datasets/nyc-taxi/" target="_blank">more details</a>).

8. Congratulations - you have successfully loaded the NYC taxi data into ClickHouse using Airbyte!

</VerticalStepper>
Original file line number Diff line number Diff line change
@@ -24,7 +24,9 @@ pip install "dlt[clickhouse]"

## Setup guide {#setup-guide}

### 1. Initialize the dlt Project {#1-initialize-the-dlt-project}
<VerticalStepper headerLevel="h3">

### Initialize the dlt Project {#1-initialize-the-dlt-project}

Start by initializing a new `dlt` project as follows:
```bash
@@ -42,7 +44,7 @@ pip install -r requirements.txt

or with `pip install dlt[clickhouse]`, which installs the `dlt` library and the necessary dependencies for working with ClickHouse as a destination.

### 2. Setup ClickHouse Database {#2-setup-clickhouse-database}
### Setup ClickHouse Database {#2-setup-clickhouse-database}

To load data into ClickHouse, you need to create a ClickHouse database. Here's a rough outline of what you should do:

@@ -60,7 +62,7 @@ GRANT SELECT ON INFORMATION_SCHEMA.COLUMNS TO dlt;
GRANT CREATE TEMPORARY TABLE, S3 ON *.* TO dlt;
```

### 3. Add credentials {#3-add-credentials}
### Add credentials {#3-add-credentials}

Next, set up the ClickHouse credentials in the `.dlt/secrets.toml` file as shown below:

@@ -78,8 +80,7 @@ secure = 1 # Set to 1 if using HTTPS, else 0.
dataset_table_separator = "___" # Separator for dataset table names from dataset.
```

:::note
HTTP_PORT
:::note HTTP_PORT
The `http_port` parameter specifies the port number to use when connecting to the ClickHouse server's HTTP interface. This is different from the default port 9000, which is used for the native TCP protocol.

You must set `http_port` if you are not using external staging (i.e. you don't set the staging parameter in your pipeline). This is because the built-in ClickHouse local storage staging uses the <a href="https://github.com/ClickHouse/clickhouse-connect">clickhouse-connect</a> library, which communicates with ClickHouse over HTTP.
@@ -94,6 +95,8 @@ You can pass a database connection string similar to the one used by the `clickh
destination.clickhouse.credentials="clickhouse://dlt:Dlt*12345789234567@localhost:9000/dlt?secure=1"
```
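
To see which connection details such a string carries, it can be split with Python's standard library. This is only an illustrative sketch, not how `dlt` parses credentials internally, and the helper name `parse_clickhouse_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def parse_clickhouse_url(url):
    """Split a clickhouse:// connection string into its parts (illustrative)."""
    parts = urlsplit(url)
    # Keep the last value if an option is repeated in the query string.
    options = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,  # not percent-decoded by urlsplit
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "secure": options.get("secure") == "1",
    }


details = parse_clickhouse_url(
    "clickhouse://dlt:Dlt*12345789234567@localhost:9000/dlt?secure=1"
)
print(details["host"], details["port"], details["database"])
```

Note that the port in the connection string is the native TCP port, which is separate from the `http_port` setting.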

</VerticalStepper>

## Write disposition {#write-disposition}

All [write dispositions](https://dlthub.com/docs/general-usage/incremental-loading#choosing-a-write-disposition)
Original file line number Diff line number Diff line change
@@ -33,20 +33,23 @@ import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained';

<a href="https://nifi.apache.org/" target="_blank">Apache NiFi</a> is an open-source workflow management software designed to automate data flow between software systems. It allows the creation of ETL data pipelines and is shipped with more than 300 data processors. This step-by-step tutorial shows how to connect Apache NiFi to ClickHouse as both a source and destination, and to load a sample dataset.

## 1. Gather your connection details {#1-gather-your-connection-details}
<VerticalStepper headerLevel="h2">

## Gather your connection details {#1-gather-your-connection-details}

<ConnectionDetails />

## 2. Download and run Apache NiFi {#2-download-and-run-apache-nifi}
## Download and run Apache NiFi {#2-download-and-run-apache-nifi}

1. For a new setup, download the binary from https://nifi.apache.org/download.html and start by running `./bin/nifi.sh start`
For a new setup, download the binary from https://nifi.apache.org/download.html and start by running `./bin/nifi.sh start`

## 3. Download the ClickHouse JDBC driver {#3-download-the-clickhouse-jdbc-driver}
## Download the ClickHouse JDBC driver {#3-download-the-clickhouse-jdbc-driver}

1. Visit the <a href="https://github.com/ClickHouse/clickhouse-java/releases" target="_blank">ClickHouse JDBC driver release page</a> on GitHub and look for the latest JDBC release version
2. In the release version, click on "Show all xx assets" and look for the JAR file containing the keyword "shaded" or "all", for example, `clickhouse-jdbc-0.5.0-all.jar`
3. Place the JAR file in a folder accessible by Apache NiFi and take note of the absolute path
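
The asset selection in step 2 can be sketched as a small filter. This is a hedged illustration: the function name `pick_bundled_jars` and every asset name other than `clickhouse-jdbc-0.5.0-all.jar` are hypothetical, and the real asset list on a release page may differ.

```python
from pathlib import PurePosixPath


def pick_bundled_jars(asset_names):
    """Return the JAR assets whose names contain the keyword 'all' or 'shaded'."""
    picked = []
    for name in asset_names:
        path = PurePosixPath(name)
        if path.suffix != ".jar":
            continue
        # Match whole dash-separated tokens so that unrelated words that
        # merely contain the letters "all" are not picked up by accident.
        tokens = path.stem.split("-")
        if "all" in tokens or "shaded" in tokens:
            picked.append(name)
    return picked


# Hypothetical asset list for illustration.
assets = [
    "clickhouse-jdbc-0.5.0.jar",
    "clickhouse-jdbc-0.5.0-all.jar",
    "clickhouse-jdbc-0.5.0-sources.jar",
    "clickhouse-jdbc-0.5.0-shaded.jar",
]
print(pick_bundled_jars(assets))
```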

## 4. Add `DBCPConnectionPool` Controller Service and configure its properties {#4-add-dbcpconnectionpool-controller-service-and-configure-its-properties}
## Add `DBCPConnectionPool` Controller Service and configure its properties {#4-add-dbcpconnectionpool-controller-service-and-configure-its-properties}

1. To configure a Controller Service in Apache NiFi, visit the NiFi Flow Configuration page by clicking on the "gear" button

@@ -90,7 +93,7 @@ import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained';

<Image img={nifi08} size="lg" border alt="Controller Services list showing enabled ClickHouse JDBC service" />

## 5. Read from a table using the `ExecuteSQL` processor {#5-read-from-a-table-using-the-executesql-processor}
## Read from a table using the `ExecuteSQL` processor {#5-read-from-a-table-using-the-executesql-processor}

1. Add an `ExecuteSQL` processor, along with the appropriate upstream and downstream processors

@@ -115,7 +118,7 @@ import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained';

<Image img={nifi12} size="lg" border alt="FlowFile content viewer showing query results in formatted view" />

## 6. Write to a table using `MergeRecord` and `PutDatabaseRecord` processor {#6-write-to-a-table-using-mergerecord-and-putdatabaserecord-processor}
## Write to a table using `MergeRecord` and `PutDatabaseRecord` processor {#6-write-to-a-table-using-mergerecord-and-putdatabaserecord-processor}

1. To write multiple rows in a single insert, we first need to merge multiple records into a single record. This can be done using the `MergeRecord` processor

@@ -153,3 +156,5 @@ import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained';
<Image img={nifi15} size="sm" border alt="Query results showing row count in the destination table" />

5. Congratulations - you have successfully loaded your data into ClickHouse using Apache NiFi!
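
The reason for merging records before the insert can be illustrated outside NiFi. The sketch below is plain Python and not NiFi's `MergeRecord` implementation. The helper `batched_rows` is hypothetical and only shows rows being grouped so that each batch could be written with one insert.

```python
from itertools import islice


def batched_rows(rows, batch_size):
    """Yield lists of up to batch_size rows, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Five single-row records become three batches instead of five inserts.
for batch in batched_rows(range(5), 2):
    print(batch)
```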

</VerticalStepper>