# Amazon MSK

## Introduction

*Amazon Managed Streaming for Apache Kafka (Amazon MSK)* is a fully managed service used to build and run applications that use Apache Kafka to process data. Apache Kafka is an open-source technology for distributed data storage, optimized for ingesting and processing streaming data in real time.

Apache Kafka clusters are challenging to set up, scale, and manage in production. Amazon MSK makes it easy for you to build and run production applications on Apache Kafka without needing Apache Kafka infrastructure management expertise. That means you spend less time managing infrastructure and more time building applications.

## How does MSK work?

The following diagram demonstrates the interaction between MSK components:

<p align="center">
    <img src="images/MSK workflow.png" height="400" width="400"/>
</p>

- *Broker nodes*: When creating an MSK cluster, you specify how many broker nodes you want to create in each *Availability Zone (AZ)*. An Availability Zone is a distinct location within an AWS Region that is engineered to be isolated from failures in other AZs. In the example above, there is one broker per Availability Zone.

- *ZooKeeper nodes*: MSK creates the Apache ZooKeeper nodes for you. 

- *Producers, consumers, and topics*: MSK lets you use Apache Kafka operations to create topics and to produce/consume data.

- *Cluster operations*: You can use the AWS Management Console or the AWS Command Line Interface (AWS CLI) to perform control-plane operations, such as creating or deleting an Amazon MSK cluster or listing all the clusters in an account.
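
As a rough illustration of the control-plane operations mentioned above, the sketch below filters the JSON that `aws kafka list-clusters` prints. The sample payload is invented for illustration and trimmed to three fields (`ClusterName`, `ClusterArn`, `State`), and the helper name `active_clusters` is hypothetical; real output carries more keys.

```python
from __future__ import annotations

import json

# Hypothetical, trimmed sample of `aws kafka list-clusters` output.
SAMPLE_LIST_CLUSTERS = """
{
    "ClusterInfoList": [
        {
            "ClusterName": "testcluster",
            "ClusterArn": "arn:aws:kafka:us-east-1:111122223333:cluster/testcluster/12345678-abcd-4567-2345-abcdef123456-2",
            "State": "ACTIVE"
        },
        {
            "ClusterName": "stagingcluster",
            "ClusterArn": "arn:aws:kafka:us-east-1:111122223333:cluster/stagingcluster/87654321-dcba-7654-5432-fedcba654321-3",
            "State": "CREATING"
        }
    ]
}
"""


def active_clusters(list_clusters_json: str) -> dict[str, str]:
    """Map cluster name to cluster ARN for every cluster in the ACTIVE state."""
    payload = json.loads(list_clusters_json)
    return {
        info["ClusterName"]: info["ClusterArn"]
        for info in payload.get("ClusterInfoList", [])
        if info.get("State") == "ACTIVE"
    }


if __name__ == "__main__":
    for name, arn in active_clusters(SAMPLE_LIST_CLUSTERS).items():
        print(f"{name}: {arn}")
```

In practice you would feed this function the text captured from the CLI rather than a hard-coded sample.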

## Benefits of using Amazon MSK

1. **Fully managed**: Amazon MSK lets you focus on creating your streaming applications without having to worry about the operational overhead of managing your Apache Kafka environment. Amazon MSK manages the provisioning, configuration, and maintenance of Apache Kafka clusters and Apache ZooKeeper nodes for you.

2. **Fully compatible**: Amazon MSK runs and manages Apache Kafka for you. This makes it easy for you to migrate and run your existing Apache Kafka applications on Amazon Web Services without changes to the application code.

3. **Highly available**: Amazon MSK creates an Apache Kafka cluster and offers multi-AZ replication within an Amazon Web Services Region. Amazon MSK continuously monitors cluster health. It also detects and automatically recovers from the most common failure scenarios for clusters. When Amazon MSK detects a broker failure, it mitigates the failure or replaces the unhealthy/unreachable broker with a new one.

4. **Highly secure**: Amazon MSK provides multiple levels of security for your Apache Kafka clusters, including VPC network isolation and AWS IAM for control-plane API authorization.

## MSK console: How to create a cluster?

With a few clicks in the Amazon MSK console you can create highly available Apache Kafka clusters. Amazon MSK automatically provisions and runs your Apache Kafka clusters, as well as monitors the cluster's health.

To create an Amazon MSK cluster using the Management Console, first sign in to the Management Console and open the Amazon MSK console, which should look like the picture below:

<p align="center">
    <img src="images/MSK console 1.png" width="650" height="300"/>
</p>

To create a new cluster, choose **Create Cluster**. You will be able to choose between a **Quick create** and a **Custom create** option. Custom create allows you to specify additional settings such as security, availability and custom configuration.

There are two main elements you need to provision when creating an Amazon MSK cluster: the *broker instances* and the *broker storage*.

<p align="center">
    <img src="images/General cluster properties.png" width="600" height="400"/>
</p>

**A broker instance** is a worker node that helps manage the Kafka cluster. Your cluster can have multiple brokers, but can also operate as a single node. Broker instances can run within the same Availability Zone or across several Availability Zones to create a highly available cluster, something many architectures require.

**Broker storage** is where all the data that comes into Amazon MSK is stored. Within AWS, this storage is housed in EBS volumes and gains all the protections that EBS provides, such as durability and fault tolerance. Once you have assigned your broker storage, you can increase its size later, but you cannot decrease it.

After you create the cluster, its status changes from **Creating** to **Active** once AWS has finished provisioning it. When the status is **Active** (as seen in the image below), you can connect to the cluster.

<p align="center">
    <img src="images/Active cluster.png" width="600" height="300"/>
</p>

Clusters can also be created using the AWS Command Line Interface (AWS CLI).
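
For the CLI route, `aws kafka create-cluster` takes the broker settings as a JSON document passed to `--broker-node-group-info`. The following is a minimal sketch of how such a document could be assembled in Python. The subnet and security group IDs are placeholders, both helper names are hypothetical, and the broker count assumes one client subnet per Availability Zone.

```python
import json


def broker_node_group_info(instance_type, client_subnets, security_groups, volume_size_gib):
    """Build the JSON document expected by `--broker-node-group-info`."""
    if not client_subnets:
        raise ValueError("at least one client subnet is required")
    return {
        "InstanceType": instance_type,
        "ClientSubnets": list(client_subnets),
        "SecurityGroups": list(security_groups),
        "StorageInfo": {"EbsStorageInfo": {"VolumeSize": volume_size_gib}},
    }


def total_brokers(brokers_per_az, client_subnets):
    """Total broker count, assuming one client subnet per Availability Zone."""
    if brokers_per_az < 1:
        raise ValueError("brokers_per_az must be at least 1")
    return brokers_per_az * len(client_subnets)


if __name__ == "__main__":
    # Placeholder IDs; replace them with subnets and a security group from your VPC.
    subnets = [
        "subnet-0123456789abcdef0",
        "subnet-2468013579abcdef1",
        "subnet-1357902468abcdef2",
    ]
    info = broker_node_group_info("kafka.m5.large", subnets, ["sg-0123456789abcdef0"], 1000)
    print(json.dumps(info, indent=4))
    print("--number-of-broker-nodes", total_brokers(1, subnets))
```

The printed JSON could be saved to a file and passed as `--broker-node-group-info file://<your-file>.json`, alongside `--cluster-name`, `--kafka-version`, and `--number-of-broker-nodes`.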


## Using the MSK Management Console to get cluster information

To obtain information about your desired cluster, you will first have to select it from the list of Clusters in the MSK Console.

<p align="center">
    <img src="images/Desired Cluster.png" width="850" height="300"/>
</p>

Once the desired cluster has been selected, choose **View client information** to get the cluster information. This opens a new window, where you can find the *Bootstrap servers* under the **Private endpoint** section. The bootstrap servers string is a comma-separated list of broker `host:port` endpoints for the cluster you provisioned.

On the same page, you will find the *Apache ZooKeeper connection string*. The ZooKeeper connection string contains `host:port` pairs, each corresponding to a ZooKeeper server. Your cluster might have two sets of connection strings: Plaintext and TLS. By default, Apache Kafka communicates in PLAINTEXT, which means that all data is sent in the clear. To encrypt communication, you can use Transport Layer Security (TLS).

Make a note of both strings: the **Bootstrap servers string** and the **Plaintext Apache ZooKeeper connection string**.

## Using AWS CLI to get cluster information

**Getting the Apache ZooKeeper connection string**

To get the Apache ZooKeeper connection string, we first need to make a note of the cluster ARN. You can find the cluster ARN as shown below:

<p align="center">
    <img src="images/ClusterARN.png" width="700" height="350"/>
</p>

Once you have correctly configured your AWS CLI, you can run the following command to get the ZooKeeper connection string. Replace `ClusterArn` with the ARN of your cluster:

`aws kafka describe-cluster --cluster-arn ClusterArn`

The output of this command will look like the following JSON example:

```json
{
    "ClusterInfo": {
        "BrokerNodeGroupInfo": {
            "BrokerAZDistribution": "DEFAULT",
            "ClientSubnets": [
                "subnet-0123456789abcdef0",
                "subnet-2468013579abcdef1",
                "subnet-1357902468abcdef2"
            ],
            "InstanceType": "kafka.m5.large",
            "StorageInfo": {
                "EbsStorageInfo": {
                    "VolumeSize": 1000
                }
            }
        },
        "ClusterArn": "arn:aws:kafka:us-east-1:111122223333:cluster/testcluster/12345678-abcd-4567-2345-abcdef123456-2",
        "ClusterName": "testcluster",
        "CreationTime": "2018-12-02T17:38:36.75Z",
        "CurrentBrokerSoftwareInfo": {
            "KafkaVersion": "2.2.1"
        },
        "CurrentVersion": "K13V1IB3VIYZZH",
        "EncryptionInfo": {
            "EncryptionAtRest": {
                "DataVolumeKMSKeyId": "arn:aws:kms:us-east-1:555555555555:key/12345678-abcd-2345-ef01-abcdef123456"
            }
        },
        "EnhancedMonitoring": "DEFAULT",
        "NumberOfBrokerNodes": 3,
        "State": "ACTIVE",
        "ZookeeperConnectString": "10.0.1.101:2018,10.0.2.101:2018,10.0.3.101:2018"
    }
}
```

This JSON example shows the **ZookeeperConnectString** key in the output of the `describe-cluster` command. Copy the value corresponding to this key and save it in a note on your local machine; you will need it later when you create a topic on the cluster.
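
If you prefer to pull the value out programmatically rather than copy it by hand, the sketch below shows one possible approach using only the Python standard library. The sample payload is a trimmed copy of the example output, and the two function names are hypothetical.

```python
import json

# Trimmed copy of the example `describe-cluster` output.
SAMPLE_DESCRIBE_CLUSTER = """
{
    "ClusterInfo": {
        "ClusterName": "testcluster",
        "State": "ACTIVE",
        "ZookeeperConnectString": "10.0.1.101:2018,10.0.2.101:2018,10.0.3.101:2018"
    }
}
"""


def zookeeper_connect_string(describe_cluster_json):
    """Return the raw ZooKeeper connection string from describe-cluster output."""
    return json.loads(describe_cluster_json)["ClusterInfo"]["ZookeeperConnectString"]


def split_endpoints(connect_string):
    """Split a comma-separated host:port list into (host, port) tuples."""
    endpoints = []
    for entry in connect_string.split(","):
        host, _, port = entry.strip().rpartition(":")
        endpoints.append((host, int(port)))
    return endpoints


if __name__ == "__main__":
    connect = zookeeper_connect_string(SAMPLE_DESCRIBE_CLUSTER)
    print(connect)
    for host, port in split_endpoints(connect):
        print(f"ZooKeeper node {host} on port {port}")
```

On the command line, the AWS CLI's global `--query 'ClusterInfo.ZookeeperConnectString' --output text` options should give a similar result without any extra code.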

**Getting the bootstrap brokers**

Once you have correctly configured your AWS CLI, you can run the following command to get the bootstrap brokers. Replace `ClusterArn` with the ARN of your cluster:

`aws kafka get-bootstrap-brokers --cluster-arn ClusterArn`

The output of this command will look like the following JSON example:

```json
{
    "BootstrapBrokerString": "b-1.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9098,b-2.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9098"
}
```

Copy the value of **BootstrapBrokerString** into the same note you made earlier.
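
The port in each bootstrap endpoint hints at how clients are expected to authenticate. The sketch below splits a bootstrap string and labels each port. The port table reflects MSK's commonly documented ports for access from within AWS as we understand them (9092 plaintext, 9094 TLS, 9096 SASL/SCRAM, 9098 IAM); treat it as an assumption and verify it against your own cluster. The function name is hypothetical.

```python
# Assumed mapping of MSK broker ports to access methods (verify for your cluster).
MSK_PORT_AUTH = {
    9092: "PLAINTEXT",
    9094: "TLS",
    9096: "SASL/SCRAM",
    9098: "IAM",
}


def parse_bootstrap_brokers(bootstrap_string):
    """Return a list of (host, port, auth_label) tuples for each broker endpoint."""
    brokers = []
    for entry in bootstrap_string.split(","):
        host, _, port = entry.strip().rpartition(":")
        port_number = int(port)
        brokers.append((host, port_number, MSK_PORT_AUTH.get(port_number, "unknown")))
    return brokers


if __name__ == "__main__":
    sample = (
        "b-1.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9098,"
        "b-2.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9098"
    )
    for host, port, auth in parse_bootstrap_brokers(sample):
        print(f"{host} -> port {port} ({auth})")
```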

## Create a client machine for the MSK cluster

In this step, you will create an EC2 instance that will act as an Apache Kafka client instance. You will later use this instance to create topics in the cluster.

To create an EC2 instance, open the EC2 console, go to the **Instances** tab, and choose **Launch instances**. Keep the default Amazon Machine Image, and for **Instance type** select **t2.micro**.

Under **Key pair (login)**, select **Create a new key pair** and enter a name for the pair. Then choose **Download Key Pair**. Alternatively, you can use an existing key pair. Finally, choose **Launch instance**.

### 1. Allow client machine to send data to the cluster

Make sure the client machine can send data to the MSK cluster by checking the security groups of the cluster's Virtual Private Cloud (VPC). To do this, open the VPC console and under **Security** choose **Security groups**. Select the default security group associated with the cluster VPC.

<p align="center">
    <img src="images/Security Groups.png" width="700" height="350"/>
</p>

Now, choose **Edit inbound rules** and select **Add rule**. In the **Type** column choose **All traffic**. In the **Source** column add the ID of the security group of the client machine (this can be found in the EC2 console). Once you choose **Save rules**, your cluster will accept all traffic from the client machine.

### 2. Install Kafka on the client machine

Connect to the EC2 client machine using the terminal. On the EC2 console you should choose **Connect**, and you will see something like the picture below: 

<p align="center">
    <img src="images/Connect EC2.png" width="650" height="350"/>
</p>

First, make sure your terminal is in the folder where you saved your **Private key file**, then follow the steps in the image above.

Once connected to the EC2 client, we first need to install Java by running the following command:

`sudo yum install java-1.8.0`

Then we will download Apache Kafka using the commands below:

```shell
wget https://archive.apache.org/dist/kafka/2.6.2/kafka_2.12-2.6.2.tgz
tar -xzf kafka_2.12-2.6.2.tgz
```

If you list your directories, you should see a `kafka_2.12-2.6.2` directory on your EC2 client.

MSK clusters also support IAM authentication. IAM access control allows MSK to handle both authentication and authorization for clusters. This means that if a client tries to write to the cluster, MSK uses IAM to check whether the client is an authenticated identity and whether it is authorized to produce to the cluster.

To connect to a cluster that uses IAM authentication, we will need to follow additional steps before we are ready to create a topic on our client machine.

First, navigate to your Kafka installation folder and then into the `libs` folder. There, download the **IAM MSK authentication package** from GitHub using the following command:

`wget https://github.com/aws/aws-msk-iam-auth/releases/download/v1.1.5/aws-msk-iam-auth-1.1.5-all.jar`

To read more about this package, check out its GitHub repository: https://github.com/aws/aws-msk-iam-auth.

Once downloaded, we will be able to see a new file inside the libs directory: `aws-msk-iam-auth-1.1.5-all.jar`. We will create an environment variable called `CLASSPATH` to store the location of this jar file and make sure that the Amazon MSK IAM libraries are accessible to the Kafka client, regardless of the location where we will be running commands from.

`export CLASSPATH=/home/ec2-user/kafka_2.12-2.6.2/libs/aws-msk-iam-auth-1.1.5-all.jar`

Make sure to double check that your location corresponds to the one given in the example above.

## Configure Kafka client to use AWS IAM

To configure a Kafka client to use AWS IAM for authentication, first navigate to your Kafka installation folder and then into the `bin` folder.

Here, you should create a `client.properties` file, using the following command:

`nano client.properties`

The client configuration file should contain the following:

```properties
# Sets up TLS for encryption and SASL for authN.
security.protocol = SASL_SSL

# Identifies the SASL mechanism to use.
sasl.mechanism = AWS_MSK_IAM

# Binds SASL client implementation.
sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="Your Access Role";

# Encapsulates constructing a SigV4 signature based on extracted credentials.
# The SASL client bound by "sasl.jaas.config" invokes this class.
sasl.client.callback.handler.class = software.amazon.msk.auth.iam.IAMClientCallbackHandler
```
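
A typo in this file usually surfaces only as a connection failure, so it can help to sanity-check it first. The sketch below is a simplified checker, not a full Java properties parser: it ignores line continuations and `:` separators. The role ARN in the sample is a made-up placeholder, and the function names are hypothetical.

```python
# Settings expected for IAM authentication, taken from the configuration shown above.
REQUIRED_IAM_SETTINGS = {
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "AWS_MSK_IAM",
    "sasl.client.callback.handler.class": "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
}
JAAS_PREFIX = "software.amazon.msk.auth.iam.IAMLoginModule required"

# Sample file content; the role ARN is a hypothetical placeholder.
SAMPLE_CLIENT_PROPERTIES = """
# Sets up TLS for encryption and SASL for authN.
security.protocol = SASL_SSL
sasl.mechanism = AWS_MSK_IAM
sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required awsRoleArn="arn:aws:iam::111122223333:role/example-msk-role";
sasl.client.callback.handler.class = software.amazon.msk.auth.iam.IAMClientCallbackHandler
"""


def parse_properties(text):
    """Parse simple `key = value` lines, skipping blanks and comment lines."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")  # split on the first '=' only
        if separator:
            props[key.strip()] = value.strip()
    return props


def iam_problems(props):
    """Return a sorted list of keys that are missing or have unexpected values."""
    problems = [
        key for key, expected in REQUIRED_IAM_SETTINGS.items()
        if props.get(key) != expected
    ]
    if not props.get("sasl.jaas.config", "").startswith(JAAS_PREFIX):
        problems.append("sasl.jaas.config")
    return sorted(problems)


if __name__ == "__main__":
    issues = iam_problems(parse_properties(SAMPLE_CLIENT_PROPERTIES))
    print("client.properties looks consistent" if not issues else f"check: {issues}")
```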

## Create a topic on a client machine

To create a topic, make sure you are inside `<KAFKA_FOLDER>/bin`, then run the following command, replacing `BootstrapServerString` with the bootstrap server string you saved earlier and `<topic_name>` with your desired topic name:

`./kafka-topics.sh --bootstrap-server BootstrapServerString --command-config client.properties --create --topic <topic_name>`

If the command runs successfully, you will see the following message: **Created topic `<topic_name>`.**
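
If you want to script topic creation, one option is to assemble the same command as an argument list and validate the topic name up front. The sketch below does that. The function name is hypothetical, and the name check encodes Kafka's topic naming rules as we understand them (letters, digits, `.`, `_`, and `-`, up to 249 characters).

```python
import re
import shlex

# Legal topic names: ASCII letters, digits, '.', '_' and '-', at most 249 characters.
_TOPIC_NAME = re.compile(r"[A-Za-z0-9._-]{1,249}")


def create_topic_command(bootstrap_servers, topic, command_config="client.properties"):
    """Build the kafka-topics.sh argument list for creating a topic."""
    if topic in (".", "..") or not _TOPIC_NAME.fullmatch(topic):
        raise ValueError(f"invalid topic name: {topic!r}")
    return [
        "./kafka-topics.sh",
        "--bootstrap-server", bootstrap_servers,
        "--command-config", command_config,
        "--create",
        "--topic", topic,
    ]


if __name__ == "__main__":
    # Placeholder bootstrap string; use the one you saved earlier.
    command = create_topic_command("BootstrapServerString", "demo-topic")
    print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from inside the Kafka `bin` folder.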

## Create & run a producer and a consumer

To start a producer, run the following command, replacing `BootstrapServerString` with the bootstrap server string you saved earlier and `<topic_name>` with the name of the topic you created:

`./kafka-console-producer.sh --bootstrap-server BootstrapServerString --producer.config client.properties --topic <topic_name>`

Type any message and press **Enter**. Each line you enter is sent to your Kafka cluster as a separate message.

To create a message consumer, open a new terminal window on your client machine and run the following command, using the same `BootstrapServerString` and `<topic_name>`:

`./kafka-console-consumer.sh --bootstrap-server BootstrapServerString --consumer.config client.properties --group students --topic <topic_name> --from-beginning`

You should start seeing the messages you entered earlier in your producer. You can now add more messages in the producer window and watch them appear in the consumer window.
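
Because the console producer treats every input line as one message, a script can feed it messages through standard input. The sketch below illustrates the idea under that assumption. The function names are hypothetical, and `send_messages` is only defined here, not called, since it needs a reachable cluster and must run from the Kafka `bin` folder.

```python
import subprocess


def console_producer_command(bootstrap_servers, topic, producer_config="client.properties"):
    """Build the kafka-console-producer.sh argument list."""
    return [
        "./kafka-console-producer.sh",
        "--bootstrap-server", bootstrap_servers,
        "--producer.config", producer_config,
        "--topic", topic,
    ]


def to_producer_input(messages):
    """Join messages into newline-terminated lines, one message per line."""
    lines = []
    for message in messages:
        if "\n" in message or "\r" in message:
            # An embedded line break would be split into several Kafka messages.
            raise ValueError("each message must fit on a single line")
        lines.append(message + "\n")
    return "".join(lines)


def send_messages(bootstrap_servers, topic, messages):
    """Run the console producer once and pipe all messages to its standard input."""
    subprocess.run(
        console_producer_command(bootstrap_servers, topic),
        input=to_producer_input(messages),
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Show what would be piped to the producer, without contacting a cluster.
    print(to_producer_input(["first message", "second message"]), end="")
```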

## Conclusion
At this point, you should have a good understanding of:
- Amazon MSK clusters and how to create one
- How to get MSK cluster information using the MSK Management Console and the AWS CLI
- How to create a Kafka topic using MSK
- How to produce and consume messages using MSK