0.11 Kafka and Kafka stream

Amresh Verma edited this page Apr 18, 2024 · 18 revisions

In Kafka, everything starts with a topic. A topic is roughly comparable to a table in a database. Each record in a topic has two parts: a key and a value. The key plays a role similar to a primary key in a table, although Kafka does not require keys to be unique.

image

But if you are working with billions of records, a database keeps all the data of a table together, like one suitcase. With little data the database is fast, but with billions of records it becomes very slow, because a single table is not spread across servers to balance the load.

Kafka reduces this limitation of databases because it has a concept called partitions. A topic is divided into partitions.

image

Now the question is which data goes to which partition. That is decided by the key. In a database, all keys go into the same table: even if you have one billion records, they all land in one table, and reading one billion records back from it takes time.

image

In Kafka, by contrast, we can create partitions, which means one topic can have multiple partitions.

Another concept in Kafka is the partitioner. It is responsible for selecting the partition for each key, that is, it decides which key goes to which partition. For records that have a key, the default partitioner uses a hash algorithm called murmur2 and, in essence, applies the formula below:

 hash(key) % number of partitions

This is similar to the hashing mechanism in a HashMap.
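The formula can be tried out with a few lines of Python. This is only a minimal sketch: `choose_partition` is a hypothetical helper, and `zlib.crc32` is used as a stand-in hash for illustration, whereas Kafka's default partitioner uses murmur2, so the partition numbers printed here will not match what a real cluster chooses.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition index in [0, num_partitions)."""
    # Stand-in hash for illustration only; Kafka itself uses murmur2.
    hashed = zlib.crc32(key.encode("utf-8"))
    return hashed % num_partitions


if __name__ == "__main__":
    # The same key always lands in the same partition.
    for key in ["order-1", "order-2", "order-3", "order-1"]:
        print(key, "->", choose_partition(key, 3))
```

Because the result depends only on the key and the partition count, all records with the same key stay together in one partition, as long as the number of partitions does not change.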

image

In Kafka, each partition can be treated as an individual entity. This means every partition can be placed on a separate system, so each partition can be given its own server.

How many servers can we assign? At most as many as there are partitions. If you add more servers than partitions, the extra servers sit idle and are not used.

We can also assign fewer servers than partitions. In that case some servers handle more than one partition.
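Both cases can be illustrated with a small simulation. This is a sketch under simplifying assumptions: `assign_partitions` is a hypothetical helper that spreads partitions round-robin, and it does not reproduce any of the assignment strategies that Kafka actually ships.

```python
def assign_partitions(partitions, servers):
    """Give every partition to exactly one server, round-robin."""
    assignment = {server: [] for server in servers}
    for index, partition in enumerate(partitions):
        owner = servers[index % len(servers)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    # More servers than partitions: server "d" gets nothing and stays idle.
    print(assign_partitions([0, 1, 2], ["a", "b", "c", "d"]))
    # prints {'a': [0], 'b': [1], 'c': [2], 'd': []}

    # Fewer servers than partitions: each server takes two partitions.
    print(assign_partitions([0, 1, 2, 3], ["a", "b"]))
    # prints {'a': [0, 2], 'b': [1, 3]}
```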

image

To control them, there is another concept called a consumer group. A consumer group is simply a group of servers working together.

image

We can create one consumer group per microservice.

  • Some services, such as batch jobs or scheduled jobs, can take more time, while other microservices need to process messages quickly.

  • For example, suppose there are two microservices called order and transport.

  • The order service must deliver a message within 5 seconds, while the transport service can run once a day.

Each microservice gets its own consumer group.

image

The same data can be consumed by several groups of consumers, and each such group is called a consumer group.
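The idea that every consumer group reads the same data at its own pace can be sketched by keeping one read position (offset) per group. `PartitionLog` below is a hypothetical in-memory model of a single partition, not a Kafka client API.

```python
class PartitionLog:
    """In-memory model of one partition with a separate offset per group."""

    def __init__(self):
        self.records = []
        self.offsets = {}  # group name -> next offset to read

    def append(self, record):
        self.records.append(record)

    def poll(self, group, max_records=10):
        """Return the next records for this group and advance its offset."""
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


if __name__ == "__main__":
    log = PartitionLog()
    for record in ["o1", "o2", "o3"]:
        log.append(record)

    # The order group reads everything right away.
    print(log.poll("order"))                     # prints ['o1', 'o2', 'o3']
    # The transport group reads the same records later, in smaller batches.
    print(log.poll("transport", max_records=2))  # prints ['o1', 'o2']
    print(log.poll("transport", max_records=2))  # prints ['o3']
```

Reading does not remove records from the log, which is why a slow group such as transport does not hold back a fast group such as order.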

Another concept in Kafka is the topology. A topology is simply a group of steps: input, processing, and output combined together are called a topology.

image

A topology is simply the entire flow.
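As a rough illustration of input, processing, and output wired together, the sketch below models a topology as a source, a list of processing functions, and a sink. All names (`run_topology`, `drop_empty`, `to_upper`) are hypothetical and plain Python; this is not the Kafka Streams API.

```python
def run_topology(source, processors, sink):
    """Push every record from the source through the processors into the sink."""
    for record in source:
        for process in processors:
            record = process(record)
            if record is None:  # a processor may drop a record
                break
        else:
            # Reached only when no processor dropped the record.
            sink.append(record)
    return sink


def drop_empty(record):
    """Filter step: drop empty records."""
    return record if record else None


def to_upper(record):
    """Transform step: upper-case the record."""
    return record.upper()


if __name__ == "__main__":
    output = run_topology(["order", "", "transport"], [drop_empty, to_upper], [])
    print(output)  # prints ['ORDER', 'TRANSPORT']
```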

Another concept in Kafka is the state store. It stores data for stateful processing. To understand it, suppose we need to calculate the total salary of all employees. While calculating, we need to keep the intermediate result somewhere during processing, and that place is the state store. The state store sits beside the processing steps rather than in the flow of records itself. Here we keep the running total in the state store: for each record we fetch the total, add the salary, and store the result back. At the end, the store holds the total of all salaries.

image
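The fetch, add, and store-back cycle of the salary example can be sketched with a plain dictionary playing the role of the state store. The function name `total_salary` and the key `"total"` are assumptions made for this illustration; a real Kafka Streams state store is a managed, fault-tolerant component, not a Python dict.

```python
def total_salary(records, store=None):
    """Add every salary to a running total kept in the store."""
    store = {} if store is None else store
    for _employee, salary in records:
        current = store.get("total", 0)    # fetch the running total
        store["total"] = current + salary  # add and store it back
    return store


if __name__ == "__main__":
    store = total_salary([("alice", 100), ("bob", 250)])
    print(store["total"])  # prints 350

    # The store keeps its state, so later records continue from 350.
    total_salary([("carol", 50)], store)
    print(store["total"])  # prints 400
```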

In Kafka Streams there is another choice to make: the DSL or the Processor API. This decides how a processor does its processing. The Processor API is the low-level option. With it you do everything yourself: you write each processing step and wire the steps and state stores into the topology by hand.

The Streams DSL is the high-level option. It reads much like a forEach loop over the records, with ready-made operations such as map and filter.

image

Another concept in Kafka is windowing. In a table, if you want the data from 9 am to 6 pm, you use a WHERE condition. In Kafka, time-based operations are done with windowing. For example, with a 15-minute window you get one chunk of records every 15 minutes, because the topic itself contains the whole data. In that sense, windowing is Kafka's way of doing batch-style processing.
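The 15-minute chunks can be sketched by rounding each record's timestamp down to the start of its window. This is a minimal model of fixed, non-overlapping windows in plain Python; `window_start` and `count_per_window` are hypothetical helpers, and timestamps are assumed to be plain seconds.

```python
from collections import defaultdict

WINDOW_SECONDS = 15 * 60  # 15-minute windows


def window_start(timestamp, size=WINDOW_SECONDS):
    """Round a timestamp (in seconds) down to the start of its window."""
    return timestamp - (timestamp % size)


def count_per_window(events, size=WINDOW_SECONDS):
    """Count events per window; each event is a (timestamp, value) pair."""
    counts = defaultdict(int)
    for timestamp, _value in events:
        counts[window_start(timestamp, size)] += 1
    return dict(counts)


if __name__ == "__main__":
    events = [(0, "a"), (899, "b"), (900, "c"), (1800, "d")]
    print(count_per_window(events))  # prints {0: 2, 900: 1, 1800: 1}
```

The events at seconds 0 and 899 fall into the first window, while the event at second 900 starts the next one.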

image

If you want to use your own partitioner, you have to write it and plug it in yourself, for example through the producer's partitioner.class setting or, in Kafka Streams, through a StreamPartitioner. If you do not supply one, the default partitioner is used.
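What a custom partitioner might decide can be sketched as a plain function. Everything here is hypothetical: the `vip-` prefix rule, the function name `custom_partition`, and the use of `zlib.crc32` as a stand-in hash are assumptions for illustration, and a real custom partitioner would be written against Kafka's own partitioner interfaces.

```python
import zlib


def custom_partition(key: str, num_partitions: int, vip_prefix: str = "vip-") -> int:
    """Send keys with the given prefix to partition 0 and spread the rest."""
    if num_partitions < 2:
        return 0  # only one partition, nothing to choose
    if key.startswith(vip_prefix):
        return 0  # dedicated partition for prefixed keys
    # Stand-in hash over the remaining partitions 1..num_partitions-1.
    return 1 + zlib.crc32(key.encode("utf-8")) % (num_partitions - 1)


if __name__ == "__main__":
    for key in ["vip-42", "order-1", "order-2"]:
        print(key, "->", custom_partition(key, 4))
```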

Kafka Evolution

image

Kafka Component

image

Kafka partition offset

image

Kafka consumer group

image image

Kafka Connector

image image

Kafka Stream

image image image

Kafka Stream Architecture

image image image image image

KSQL

image image image image image image

https://github.com/LearningJournal/Apache-Kafka-For-Absolute-Beginners