<h1 align="center"> Lesson 2 - Kafka Topology and Components </h1>


As mentioned in Lesson 1, Kafka is a publish-subscribe messaging system used with real-time streaming data (in addition to its ability to process batch data).

For Kafka to handle massive volumes of data arriving at high velocity, several components are required to set up a robust Kafka system.

As a quick reminder, below is the overall Kafka topology at a high level:

<p align="center">

<img src= "images/Kafka_Architecture2.png">

</p>


## Main Concepts and Terminology

An __Event__ records the fact that "something happened" in the world or in your business. The documentation also calls it a record or a message. When you read data from or write data to Kafka, you do so in the form of Events.

Conceptually, an Event has a _Key_, _Value_, _Timestamp_, and optional _Metadata headers_. Here's an example Event:

- Event Key: "Alice"

- Event Value: "Made a payment of $200 to Bob"

- Event Timestamp: "Jun. 25, 2020 at 2:06 p.m."
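
To make the shape of an Event concrete, here is a minimal sketch in plain Python. The `Event` class and its field names are illustrative assumptions and not part of any Kafka client library; the time zone on the timestamp is also assumed, since the example above does not state one.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Illustrative stand-in for a Kafka record (not a Kafka client class)."""
    key: str
    value: str
    timestamp: datetime
    headers: dict = field(default_factory=dict)  # optional metadata headers


# The example Event from the list above; UTC is an assumption.
payment = Event(
    key="Alice",
    value="Made a payment of $200 to Bob",
    timestamp=datetime(2020, 6, 25, 14, 6, tzinfo=timezone.utc),
)
```

Because headers are optional, the sketch defaults them to an empty dictionary.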

__Producers__ are those client applications that publish (write) Events to Kafka, and __Consumers__ are those that subscribe to (read and process) these Events. 
In Kafka, Producers and Consumers are fully decoupled and agnostic of each other, which is a key design element in achieving the high scalability that Kafka is known for. For example, Producers never need to wait for Consumers. Kafka also provides various guarantees, such as the ability to process Events exactly once.

Events are organized and durably stored in __Topics__. Put very simply, a Topic is similar to a folder in a filesystem, and the Events are the files in that folder. An example Topic name could be "payments". Topics in Kafka are always multi-producer and multi-subscriber: a Topic can have zero, one, or many Producers that write Events to it, as well as zero, one, or many Consumers that subscribe to these Events.

Events in a Topic can be read as often as needed. Unlike in traditional messaging systems, Events are not deleted after consumption. Instead, you define how long Kafka should retain your Events through a per-Topic configuration setting, after which old Events are discarded. Kafka's performance is effectively constant with respect to data size, so storing data for a long time is perfectly fine.
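
The difference between "deleted after consumption" and "discarded after a retention period" can be sketched with a toy in-memory log. This is only an illustration under simplifying assumptions: the `RetainedLog` class is hypothetical, time is passed in as plain numbers, and retention is applied one record at a time, which is simpler than what a real Broker does.

```python
from collections import namedtuple

Record = namedtuple("Record", ["offset", "timestamp", "value"])


class RetainedLog:
    """Toy append-only log with time-based retention (illustration only)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.records = []
        self.next_offset = 0

    def append(self, value, now):
        self.records.append(Record(self.next_offset, now, value))
        self.next_offset += 1

    def read_from(self, offset):
        # Reading is non-destructive: nothing is removed from the log.
        return [r.value for r in self.records if r.offset >= offset]

    def enforce_retention(self, now):
        # Only age, not consumption, causes records to be discarded.
        cutoff = now - self.retention_seconds
        self.records = [r for r in self.records if r.timestamp >= cutoff]


log = RetainedLog(retention_seconds=60)
log.append("payment-1", now=0)
log.append("payment-2", now=50)

first_read = log.read_from(0)
second_read = log.read_from(0)       # identical to first_read
log.enforce_retention(now=100)       # cutoff is 40, so payment-1 is dropped
after_retention = log.read_from(0)
```

Both reads return the same two Events; only the retention pass removes the older one.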

When a new Event is published to a Topic, it is actually appended to one of the Topic's __Partitions__. Events with the same Event Key (e.g., a customer or vehicle ID) are written to the same Partition, and Kafka guarantees that any Consumer of a given Topic Partition will always read that Partition's Events in exactly the same order as they were written.

Below is a visual representation of what Partitions look like:
<p align="center">
<img src= "images/Kafka_Topics.png" width=600>
</p>
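
The rule that Events with the same Key land in the same Partition can be sketched as a hash of the Key taken modulo the number of Partitions. The function names below are hypothetical, and `zlib.crc32` is only a stand-in hash chosen for this sketch; Kafka's own clients use a different hash function.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition index; crc32 is a stand-in hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(partitions: list, key: str, value: str) -> int:
    """Append (key, value) to the partition chosen for the key."""
    index = choose_partition(key, len(partitions))
    partitions[index].append((key, value))
    return index


# A toy Topic with three Partitions, each an append-only list.
partitions = [[] for _ in range(3)]

first = publish(partitions, "Alice", "Made a payment of $200 to Bob")
publish(partitions, "Bob", "Received $200 from Alice")
second = publish(partitions, "Alice", "Made a payment of $50 to Carol")

# Alice's Events share one Partition and keep their write order.
alice_events = [v for k, v in partitions[first] if k == "Alice"]
```

Since the mapping depends only on the Key and the Partition count, ordering is preserved per Key, not across the whole Topic.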

To recap what we've covered so far in Lessons 1 and 2, there are __five main components__ in a Kafka System:

__1.	Broker:__ 
- A Broker is a Kafka node (server) that is part of the Kafka system
- A Kafka __Cluster__ consists of one or more Brokers, and usually several
- Each Broker has a unique ID
- A Broker stores the Topic log Partitions
- A Broker handles all requests from clients (produce, consume, and metadata) and keeps data replicated within the Cluster

For a video explanation on Brokers, please watch the following video:

- [__Brokers Introduction Video__](https://www.youtube.com/embed/jHnyBSUVcOU)



__2.	ZooKeeper:__
- Kafka uses ZooKeeper to manage service discovery for Brokers (e.g., when a new Broker joins or an existing Broker dies)
- ZooKeeper is a separate Apache project (external to Kafka) that originated in the Hadoop ecosystem; it provides distributed coordination for the Kafka Cluster
- ZooKeeper maintains the state of the Cluster (Brokers, Topics, Users)
- Newer Kafka releases can also run without ZooKeeper by using the built-in KRaft mode


__3.  Producer:__ 
- A Producer connects directly to one or more Kafka Brokers and discovers the rest of the Cluster from them; current clients do not connect through ZooKeeper
- A Producer sends records (data) to a Broker

For a video explanation on Producers, please watch the following video:

- [__Producer Introduction Video__](https://www.youtube.com/embed/I7zm3on_cQQ)

__4.	Consumer:__ 
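- A Consumer fetches batches of records (data) from a Broker
- A Consumer can specify both the Topic and the Partition from which it will consume data
- There are two types of Consumers:
    - Low-level: the application itself chooses the Partitions and offsets to read
    - High-level: Partition assignment and offset tracking are handled for the application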

For a video explanation on Consumers, please watch the following video:


- [__Consumer Introduction Video__](https://www.youtube.com/embed/Z9g4jMQwog0)


__5.	Topic:__
- A Topic is a named category (feed) to which records are published and in which they are stored
- All Kafka records are organized into Topics
- Producers write data to Topics and Consumers read data from Topics
- Records published to the Cluster are persisted until a specified Retention Period has passed
- Each Topic is divided into _Partitions_, each of which holds records in an immutable, ordered sequence. Each record within a Partition is assigned a unique Offset (ID) that identifies it
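
The role of the Offset can be sketched with a toy Partition in which a record's Offset is simply its position in an append-only list. The `Partition` and `Reader` classes are hypothetical illustrations, not Kafka client classes; the point is that each reader tracks its own position, so readers progress independently over the same records.

```python
class Partition:
    """Toy append-only partition; a record's offset is its list index."""

    def __init__(self):
        self._records = []

    def append(self, record):
        offset = len(self._records)
        self._records.append(record)
        return offset

    def read(self, offset, max_records=10):
        end = min(offset + max_records, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]


class Reader:
    """Toy consumer that remembers the next offset it should read."""

    def __init__(self, partition):
        self.partition = partition
        self.position = 0

    def poll(self, max_records=10):
        batch = self.partition.read(self.position, max_records)
        self.position += len(batch)
        return batch


partition = Partition()
offsets = [partition.append(v) for v in ["a", "b", "c"]]

fast, slow = Reader(partition), Reader(partition)
fast_batch = fast.poll()                  # reads all three records
slow_first = slow.poll(max_records=1)     # reads only offset 0
slow_second = slow.poll(max_records=1)    # continues at offset 1
```

Neither reader changes the Partition itself, which is why a slow reader never holds back a fast one.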

Below is a visual representation of how these different components interact with one another:


<p align="center">
<img src= "images/Kafka Zookeeper Brokers.png" width=600>
</p>

For a video explanation on Topics, please watch the following video:

- [__Topic Introduction Video__](https://www.youtube.com/embed/kj9JH3ZdsBQ)
