This lab details the use of the self-managed Azure Data Explorer KafkaConnect sink connector with Confluent Cloud on Azure.
KafkaConnect is an open source Apache Kafka framework for integrating Kafka with other systems reliably and scalably, using only configuration and no code. Azure Data Explorer has an open source KafkaConnect sink connector, which is the focal point of this lab.
The lab showcases very basic Kafka ingestion into Azure Data Explorer (ADX). It does not feature a real-time use case and omits some of the streaming capabilities of Confluent Cloud, and of Kafka in general, to keep the lab simple and focused on Azure Data Explorer integration.
The lab covers the following aspects. Each one includes provisioning (with screenshots), code, step-by-step instructions, commands, and the outcome:
We will use the Chicago crimes public dataset. It is about 7 million records.
We will use Confluent Cloud on Azure.
We will use Spark on Databricks to publish to Kafka: it is a PaaS that is easy to provision and use, notebooks keep things simple, and Spark is distributed and integrates robustly with Kafka.
We will use Azure Kubernetes Service (AKS) to run KafkaConnect; AKS, and Kubernetes in general, make a great platform for distributed KafkaConnect.
For the purpose of simplicity, we will use a cluster that is not in a virtual network.
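As a rough illustration of the Spark-to-Kafka piece, the sketch below builds the options a Spark Kafka writer would typically need to reach a SASL_SSL/PLAIN-secured cluster such as Confluent Cloud. The function name, the placeholder endpoint, and the topic name are hypothetical and not taken from the lab; on Databricks the login module class may need a `kafkashaded.` prefix, which is why it is exposed as a parameter.

```python
def confluent_kafka_options(
    bootstrap_servers,
    api_key,
    api_secret,
    topic,
    login_module="org.apache.kafka.common.security.plain.PlainLoginModule",
):
    """Build Spark Kafka sink options for a SASL_SSL/PLAIN-secured cluster.

    Hypothetical helper for illustration only; the option keys are the
    standard Spark Kafka integration options ("kafka."-prefixed client
    settings plus "topic").
    """
    # JAAS configuration string carrying the API key and secret.
    jaas = (
        f'{login_module} required '
        f'username="{api_key}" password="{api_secret}";'
    )
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
        "topic": topic,
    }


# Placeholder values; substitute your own cluster endpoint and credentials.
options = confluent_kafka_options(
    bootstrap_servers="<your-bootstrap-server>:9092",
    api_key="<api-key>",
    api_secret="<api-secret>",
    topic="crimes",  # assumed topic name
)
```

In a notebook, a DataFrame with a string or binary `value` column could then be published with `df.write.format("kafka").options(**options).save()`.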
This lab is intended for any data practitioner, whether architect or developer.
The duration depends on your knowledge of Azure and the technologies included. It should take about 8-12 hours if you are entirely unfamiliar with them.
The cost is approximately $300-$600, depending on your familiarity with the services and Azure, and on whether you work continuously.
Follow the modules sequentially, completing each one before starting the next.
1. Provision foundational resources
2. Provision Confluent Cloud and configure Kafka
3. Provision Azure Data Explorer, and associated database objects and permissions
4. Import the Spark Kafka producer code, and configure Spark to produce to your Confluent Cloud Kafka topic
5. Configure the KafkaConnect cluster, launch connector tasks
6. Run the end to end pipeline
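To make the connector module more concrete, here is a minimal sketch of the kind of payload that could be posted to the Kafka Connect REST API (`POST /connectors`) to launch the Kusto sink connector. The helper name and all argument values are hypothetical. The property names reflect the connector's documented settings as best recalled and should be checked against the README of the connector version you deploy.

```python
import json


def kusto_sink_connector_payload(
    name, topic, database, table,
    ingest_url, query_url,
    tenant_id, app_id, app_key,
    tasks_max=1,
):
    """Build a Kafka Connect REST payload for the Kusto sink connector.

    Hypothetical helper for illustration; verify property names against
    the connector documentation before use.
    """
    # One entry per topic-to-table route; the connector expects a JSON string.
    mapping = [
        {"topic": topic, "db": database, "table": table, "format": "json"}
    ]
    config = {
        "connector.class":
            "com.microsoft.azure.kusto.kafka.connect.sink.KustoSinkConnector",
        "tasks.max": str(tasks_max),
        "topics": topic,
        "kusto.ingestion.url": ingest_url,
        "kusto.query.url": query_url,
        "kusto.tables.topics.mapping": json.dumps(mapping),
        # Service principal used by the connector to authenticate to ADX.
        "aad.auth.authority": tenant_id,
        "aad.auth.appid": app_id,
        "aad.auth.appkey": app_key,
    }
    return {"name": name, "config": config}


# Placeholder values; substitute your own cluster, database, and credentials.
payload = kusto_sink_connector_payload(
    name="kusto-sink",
    topic="crimes",
    database="<database>",
    table="<table>",
    ingest_url="https://ingest-<cluster>.<region>.kusto.windows.net",
    query_url="https://<cluster>.<region>.kusto.windows.net",
    tenant_id="<tenant-id>",
    app_id="<app-id>",
    app_key="<app-key>",
    tasks_max=4,
)
print(json.dumps(payload, indent=2))
```

Keeping `tasks.max` at or below the topic's partition count is the usual guideline, since each sink task consumes from at least one partition.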
About the KafkaConnect framework
Confluent Cloud on Azure
Azure Data Explorer docs
Azure Data Explorer Kafka ingestion docs
Git repo for the KafkaConnect Kusto sink connector
Confluent Connector Hub
From Zero to Hero with KafkaConnect - webinar by Robin Moffatt