Kafka review - Course openshift-labs#4 edits
Signed-off-by: prmellor <pmellor@redhat.com>
PaulRMellor committed May 5, 2021
1 parent de51272 commit 2d7b608
Showing 8 changed files with 88 additions and 93 deletions.
8 changes: 4 additions & 4 deletions middleware/middleware-kafka/kafka-basic/index.json
@@ -22,19 +22,19 @@
"details": {
"steps": [
{
"title": "Install Red Hat AMQ Streams Operator",
"title": "Installing Red Hat AMQ Streams Operators",
"text": "step1.md"
},
{
"title": "Deploy your Kafka cluster",
"title": "Deploying a Kafka cluster",
"text": "step2.md"
},
{
"title": "Access the cluster from console",
"title": "Accessing the Kafka cluster from a console",
"text": "step3.md"
},
{
"title": "Produce and consume records",
"title": "Producing and consuming records",
"text": "step4.md"
}
],
18 changes: 9 additions & 9 deletions middleware/middleware-kafka/kafka-debezium/00-intro.md
@@ -1,13 +1,13 @@
Change data capture, or CDC, is a well-established software design pattern for capturing changes to tables in a database.
CDC captures row-level changes that occur in database tables and emits event records for those changes to a Kafka data streaming bus.
You can configure applications that rely on the data in particular tables to consume the change event streams for those tables.
Consuming applications read the streams of event records in the order in which the events occurred.
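
To make this concrete, a Debezium change event for a row update carries the old and new row state in a `before`/`after` envelope. The following trimmed sketch uses illustrative table and field values; the envelope field names (`before`, `after`, `source`, `op`, `ts_ms`) follow Debezium's conventions:

```json
{
  "before": { "id": 1001, "first_name": "Sally" },
  "after": { "id": 1001, "first_name": "Sallie" },
  "source": { "connector": "mysql", "table": "customers" },
  "op": "u",
  "ts_ms": 1620000000000
}
```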

### What you will learn

In this scenario you will learn about [Debezium](https://debezium.io/), a component of [Red Hat Integration](https://www.redhat.com/en/products/integration) that provides change data capture for the following supported databases:

* Db2
* Microsoft SQL Server
* MongoDB
* MySQL
@@ -19,13 +19,13 @@ You will deploy a complete end-to-end solution that captures events from databas

![Logo](../../../assets/middleware/debezium-getting-started/debezium-logo.png)

[Debezium](https://debezium.io/) is a set of distributed services that capture row-level changes in a database.
Debezium records the change events for each table in a database to a dedicated Kafka topic.
You can configure applications to read from the topics that contain change event records for data in specific tables.
The consuming applications can then respond to the change events with minimal latency.
Applications read event records from a topic in the same order in which the events occurred.

A Debezium source connector captures change events from a database and uses the [Apache Kafka](https://kafka.apache.org/) streaming platform to distribute and publish the captured event records to a [Kafka broker](https://kafka.apache.org/documentation/#uses_messaging).
Each Debezium source connector is built as a plugin for [Kafka Connect](https://kafka.apache.org/documentation/#connect).

In this scenario we will deploy a Debezium MySQL connector and use it to set up a data flow between a MySQL database and a Kafka broker.
60 changes: 26 additions & 34 deletions middleware/middleware-kafka/kafka-debezium/01-deploying-a-broker.md
@@ -1,17 +1,16 @@
Debezium uses the Apache Kafka Connect framework. Debezium connectors are implemented as Kafka Connector source connectors.

Debezium connectors capture change events from database tables and emit records of those changes to a [Red Hat AMQ Streams](https://developers.redhat.com/blog/2018/10/29/how-to-run-kafka-on-openshift-the-enterprise-kubernetes-with-amq-streams/) Kafka cluster. Applications can consume event records through AMQ Streams.

In AMQ Streams, you use Kafka Connect custom Kubernetes resources to deploy and manage the Debezium connectors.

### Logging in to the cluster from the OpenShift CLI

Log in to the OpenShift cluster from a _terminal_ with the following command:

``oc login -u developer -p developer``{{execute}}

The command logs you in with the following credentials:

* **Username:** ``developer``
* **Password:** ``developer``
@@ -20,20 +19,19 @@ You can use the same credentials to log into the web console.

### Creating a namespace

Create a namespace (project) with the name ``debezium`` for the AMQ Streams Kafka Cluster Operator:

``oc new-project debezium``{{execute}}

### Creating a Kafka cluster

Now we'll create a Kafka cluster named `my-cluster` that has one ZooKeeper node and one Kafka broker node.
To simplify the deployment, the YAML file that we'll use to create the cluster specifies the use of `ephemeral` storage.
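
The contents of `kafka-cluster.yaml` are not reproduced here; a minimal single-node `Kafka` resource with `ephemeral` storage looks roughly like the following sketch (the API version and listener settings are assumptions):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 1                 # one Kafka broker node
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: ephemeral           # data does not survive a pod restart
  zookeeper:
    replicas: 1                 # one ZooKeeper node
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}
    userOperator: {}
```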

> **Note:**
Red Hat AMQ Streams Operators are pre-installed in the cluster. Because we don't have to install the Operators in this scenario, `admin` permissions are not required to complete the steps that follow. In an actual deployment, to make an Operator available from all projects in a cluster, you must be logged in with `admin` permission before you install the Operator.

Create the Kafka cluster by applying the YAML file:

`oc -n debezium apply -f /root/projects/debezium/kafka-cluster.yaml`{{execute}}

@@ -45,29 +43,23 @@ Enter the following command to check the status of the pods:

``oc -n debezium get pods -w``{{execute}}

After a few minutes, the status of the pods for ZooKeeper, Kafka, and the AMQ Streams Entity Operator change to `running`.
The output of the `get pods` command should look similar to the following example:

```bash
NAME                                          READY   STATUS
my-cluster-zookeeper-0                        0/1     ContainerCreating
my-cluster-zookeeper-0                        1/1     Running
my-cluster-kafka-0                            0/2     Pending
my-cluster-kafka-0                            0/2     ContainerCreating
my-cluster-kafka-0                            0/2     Running
my-cluster-kafka-0                            1/2     Running
my-cluster-kafka-0                            2/2     Running
my-cluster-entity-operator-57bb594d9d-z4gs6   0/2     Pending
my-cluster-entity-operator-57bb594d9d-z4gs6   0/2     ContainerCreating
my-cluster-entity-operator-57bb594d9d-z4gs6   0/2     Running
my-cluster-entity-operator-57bb594d9d-z4gs6   1/2     Running
my-cluster-entity-operator-57bb594d9d-z4gs6   2/2     Running
```

> Notice that the Cluster Operator starts the Apache ZooKeeper cluster, as well as the broker nodes and the Entity Operator.
@@ -83,14 +75,14 @@ Enter the following command to send a message to the broker that you just deploy

``echo "Hello world" | oc exec -i -c kafka my-cluster-kafka-0 -- /opt/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test``{{execute interrupt}}

The command does not return any output unless it fails.
If you see warning messages in the following format, you can ignore them:

```
>[DATE] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 1 : {test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
```

The warnings are generated when the producer requests metadata for the topic: the topic that the producer wants to write to, and its broker partition leader, do not exist yet in the cluster.

To verify that the broker is available, retrieve the message that you just sent.
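
The retrieval command mirrors the producer command above. The following is a sketch of a consumer invocation, assuming the same `test` topic and in-cluster listener (`--from-beginning` makes the consumer replay records from the start of the topic):

```bash
# A sketch mirroring the producer example; not necessarily the scenario's exact command
oc exec -c kafka my-cluster-kafka-0 -- /opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 --topic test --from-beginning
```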

@@ -1,29 +1,30 @@
After you set up a Kafka cluster, deploy Kafka Connect in a custom container image for Debezium.
The Kafka Connect service provides a framework for managing Debezium connectors.

You can create a custom container image by downloading the Debezium MySQL connector archive from the [Red Hat Integration](https://access.redhat.com/jbossnetwork/restricted/listSoftware.html?product=red.hat.integration&downloadType=distributions) download site and extracting it to create the directory structure for the connector plugin.

After you obtain the connector plugin, you can create and publish a custom Linux container image by running the `docker build` or `podman build` commands with a custom Dockerfile.
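
Such a Dockerfile is typically only a few lines. In the following sketch, the base image name, tag, and connector directory are illustrative, not the exact values used to build the image for this scenario:

```dockerfile
# Base image: an AMQ Streams Kafka image (name and tag are illustrative)
FROM registry.redhat.io/amq7/amq-streams-kafka-27-rhel7:1.7.0
USER root:root
# Copy the extracted Debezium MySQL connector into the Kafka Connect plugin path
COPY ./debezium-connector-mysql/ /opt/kafka/plugins/debezium-connector-mysql/
USER 1001
```

You would then run `podman build` and `podman push` (or the `docker` equivalents) to publish the image to a registry that your cluster can reach.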

> For detailed information about deploying Kafka Connect with Debezium, see the [Debezium documentation](https://access.redhat.com/documentation/en-us/red_hat_integration/2021.q1/html-single/getting_started_with_debezium/index#deploying-kafka-connect).

To save some time, we have already created an image for you.

To deploy the Kafka Connect cluster with the custom image, enter the following command:

``oc -n debezium apply -f /root/projects/debezium/kafka-connect.yaml``{{execute interrupt}}
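
The `kafka-connect.yaml` file is not reproduced here; a `KafkaConnect` resource that points at a custom image follows this general shape (the image URL and `config` values are illustrative):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: debezium
  annotations:
    # Lets you manage connectors as KafkaConnector custom resources
    strimzi.io/use-connector-resources: "true"
spec:
  replicas: 1
  image: quay.io/example/debezium-connect-mysql:latest  # illustrative image URL
  bootstrapServers: my-cluster-kafka-bootstrap:9092
  config:
    group.id: debezium-connect-cluster
    offset.storage.topic: connect-offsets
    config.storage.topic: connect-configs
    status.storage.topic: connect-status
```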

The Kafka Connect node is deployed.

Check the pod status:

``oc get pods -w -l app.kubernetes.io/name=kafka-connect``{{execute}}

The command returns status in the following format:

```bash
NAME                                READY   STATUS
debezium-connect-6fc5b7f97d-g4h2l   0/1     ContainerCreating
debezium-connect-6fc5b7f97d-g4h2l   1/1     Running
```
After a couple of minutes, the pod status changes to `Running`.
When the **READY** column shows **1/1**, you are ready to proceed.
@@ -34,14 +35,14 @@ Enter <kbd>Ctrl</kbd>+<kbd>C</kbd> to stop the process.

## Verify that Kafka Connect is running with Debezium

After the Connect node is running, you can verify that the Debezium connectors are available.
Because AMQ Streams lets you manage most components of the Kafka ecosystem as Kubernetes custom resources, you can obtain information about Kafka Connect from the `status` of the `KafkaConnect` resource.

List the connector plugins that are available on the Kafka Connect node:

``oc get kafkaconnect/debezium -o json | jq .status.connectorPlugins``{{execute interrupt}}

The output shows the type and version for the connector plugins:

```json
[
@@ -74,10 +75,10 @@ The command returns output that is similar to the following example:
]
```

> Note: The output shown is formatted to improve readability.

The Debezium `MySqlConnector` connector is now available for use on the Connect node.

You have successfully deployed a Kafka Connect node and configured it to contain Debezium.

In the next step of this scenario, we will finish the deployment by creating a connection between the MySQL database source and Kafka Connect.
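
As a preview of that step, registering a Debezium connector through AMQ Streams typically means applying a `KafkaConnector` resource along the following lines (the connector name and database coordinates are illustrative, not the scenario's exact values):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: mysql-connector               # illustrative name
  labels:
    strimzi.io/cluster: debezium      # must match the KafkaConnect resource name
spec:
  class: io.debezium.connector.mysql.MySqlConnector
  tasksMax: 1
  config:
    # Illustrative database coordinates and topic names
    database.hostname: mysql
    database.port: "3306"
    database.user: debezium
    database.password: dbz
    database.server.id: "184054"
    database.server.name: dbserver1
    database.include.list: inventory
    database.history.kafka.bootstrap.servers: my-cluster-kafka-bootstrap:9092
    database.history.kafka.topic: schema-changes.inventory
```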
