sidebar_label | sidebar_position | keywords | slug | description | ||||||
---|---|---|---|---|---|---|---|---|---|---|
NiFi |
12 |
|
/en/integrations/nifi |
Stream data into ClickHouse using NiFi data pipelines |
import ConnectionDetails from '@site/docs/en/_snippets/_gather_your_details_http.mdx';
Apache NiFi is an open-source workflow management software designed to automate data flow between software systems. It allows the creation of ETL data pipelines and is shipped with more than 300 data processors. This step-by-step tutorial shows how to connect Apache NiFi to ClickHouse as both a source and destination, and to load a sample dataset.
- For a new setup, download the binary from https://nifi.apache.org/download.html and start by running
./bin/nifi.sh start
- Visit the ClickHouse JDBC driver release page on GitHub and look for the latest JDBC release version
- In the release version, click on "Show all xx assets" and look for the JAR file containing the keyword "shaded" or "all", for example,
clickhouse-jdbc-0.5.0-all.jar
- Place the JAR file in a folder accessible by Apache NiFi and take note of the absolute path
-
To configure a Controller Service in Apache NiFi, visit the NiFi Flow Configuration page by clicking on the "gear" button
<img src={require('./images/nifi_01.png').default} class="image" alt="Nifi Flow Configuration" style={{width: '50%'}}/>
-
Select the Controller Services tab and add a new Controller Service by clicking on the
+
button at the top right<img src={require('./images/nifi_02.png').default} class="image" alt="Add Controller Service" style={{width: '80%'}}/>
-
Search for
DBCPConnectionPool
and click on the "Add" button<img src={require('./images/nifi_03.png').default} class="image" alt="Search for DBCPConnectionPool" style={{width: '80%'}}/>
-
The newly added DBCPConnectionPool will be in an Invalid state by default. Click on the "gear" button to start configuring
<img src={require('./images/nifi_04.png').default} class="image" alt="Nifi Flow Configuration" style={{width: '80%'}}/>
-
Under the "Properties" section, input the following values
Property | Value | Remark |
---|---|---|
Database Connection URL | jdbc:ch:https://HOSTNAME:8443/default?ssl=true | Replace HOSTNAME in the connection URL accordingly |
Database Driver Class Name | com.clickhouse.jdbc.ClickHouseDriver | |
Database Driver Location(s) | /etc/nifi/nifi-X.XX.X/lib/clickhouse-jdbc-0.X.X-patchXX-shaded.jar | Absolute path to the ClickHouse JDBC driver JAR file |
Database User | default | ClickHouse username |
Password | password | ClickHouse password |
-
In the Settings section, change the name of the Controller Service to "ClickHouse JDBC" for easy reference
<img src={require('./images/nifi_05.png').default} class="image" alt="Nifi Flow Configuration" style={{width: '80%'}}/>
-
Activate the DBCPConnectionPool Controller Service by clicking on the "lightning” button and then the "Enable" button
<img src={require('./images/nifi_06.png').default} class="image" alt="Nifi Flow Configuration" style={{width: '80%'}}/>
<img src={require('./images/nifi_07.png').default} class="image" alt="Nifi Flow Configuration" style={{width: '80%'}}/>
-
Check the Controller Services tab and ensure that the Controller Service is enabled
<img src={require('./images/nifi_08.png').default} class="image" alt="Nifi Flow Configuration" style={{width: '80%'}}/>
-
Add an ExecuteSQL processor, along with the appropriate upstream and downstream processors
<img src={require('./images/nifi_09.png').default} class="image" alt="ExecuteSQL processor" style={{width: '50%'}}/>
-
Under the "Properties” section of the ExecuteSQL processor, input the following values
Property Value Remark Database Connection Pooling Service ClickHouse JDBC Select the Controller Service configured for ClickHouse SQL select query SELECT * FROM system.metrics Input your query here -
Start the ExecuteSQL processor
<img src={require('./images/nifi_10.png').default} class="image" alt="ExecuteSQL processor" style={{width: '80%'}}/>
-
To confirm that the query has been processed successfully, inspect one of the FlowFile in the output queue
<img src={require('./images/nifi_11.png').default} class="image" alt="ExecuteSQL processor" style={{width: '80%'}}/>
-
Switch view to "formatted” to view the result of the output FlowFile
<img src={require('./images/nifi_12.png').default} class="image" alt="ExecuteSQL processor" style={{width: '80%'}}/>
-
To write multiple rows in a single insert, we first need to merge multiple records into a single record. This can be done using the MergeRecord processor
-
Under the "Properties” section of the MergeRecord processor, input the following values
Property Value Remark Record Reader JSONTreeReader Select the appropriate record reader Record Writer JSONReadSetWriter Select the appropriate record writer Minimum Number of Records 1000 Change this to a higher number so that the minimum number of rows are merged to form a single record. Default to 1 row Maximum Number of Records 10000 Change this to a higher number than “Minimum Number of Records”. Default to 1,000 rows -
To confirm that multiple records are merged into one, examine the input and output of the MergeRecord processor. Note that the output is an array of multiple input records
Input <img src={require('./images/nifi_13.png').default} class="image" alt="ExecuteSQL processor" style={{width: '50%'}}/>
Ouput <img src={require('./images/nifi_14.png').default} class="image" alt="ExecuteSQL processor" style={{width: '50%'}}/>
-
Under the "Properties" section of the PutDatabaseRecord processor, input the following values
Property Value Remark Record Reader JSONTreeReader Select the appropriate record reader Database Type Generic Leave as default Statement Type INSERT Database Connection Pooling Service ClickHouse JDBC Select the ClickHouse controller service Table Name tbl Input your table name here Translate Field Names false Set to "false" so that field names inserted must match the column name Maximum Batch Size 1000 Maximum number of rows per insert. This value should not be lower than the value of “Minimum Number of Records” in MergeRecord processor -
To confirm that each insert contains multiple rows, check that the row count in the table is incrementing by at least the value of "Minimum Number of Records” defined in MergeRecord.
<img src={require('./images/nifi_15.png').default} class="image" alt="ExecuteSQL processor" style={{width: '50%'}}/>
-
Congratulations - you have successfully loaded your data into ClickHouse using Apache Nifi!