Apache Kafka was originally created at LinkedIn and open-sourced in 2011 as a highly available, fault-tolerant, high-throughput, low-latency, horizontally scalable message bus optimized for writes.
The creators subsequently spun off Confluent to provide enterprise support, related products, and consulting services for Kafka.
Technically speaking, Kafka itself is a distributed commit log, and its inner workings, design, and performance have been written about extensively. The top response to this Stack Overflow question cites these principal design decisions:
- Zero copy: Kafka asks the OS kernel to move data directly from the page cache to the network socket (e.g. via `sendfile`), rather than copying it up into application-layer buffers and back down again.
- Batching/chunking: Kafka batches records into larger chunks, which amortises cross-machine latency and minimises the associated buffering and copying.
- Avoids random disk access: because Kafka is an append-only, immutable commit log, it does not need to seek back and forth across the disk performing random I/O; it can read and write largely sequentially, which disks handle far faster.
- Scales horizontally: a single topic can have thousands of partitions spread across thousands of machines, so Kafka can handle huge loads.
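The batching behaviour above is directly tunable on the producer side. As a minimal sketch (the broker address is a placeholder, and the values are illustrative, not recommendations), these are the standard producer settings that control how much data accumulates before a send:

```java
import java.util.Properties;

public class ProducerTuning {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address; a real deployment would differ.
        props.put("bootstrap.servers", "localhost:9092");
        // Batching: accumulate up to 64 KiB of records per partition before sending...
        props.put("batch.size", "65536");
        // ...or wait up to 10 ms for more records to arrive (a latency/throughput trade-off).
        props.put("linger.ms", "10");
        // Compression is applied per batch, which compresses far better than per message.
        props.put("compression.type", "lz4");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

A `KafkaProducer` constructed from these properties would group records destined for the same partition into single requests, which is what makes the zero-copy and sequential-write optimisations pay off at the network and disk layers.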
It is written in Scala and Java (source here), and deployments have historically required Apache ZooKeeper to facilitate leader election and store metadata, though a Raft-based mode (KRaft) that removes this requirement is underway.
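For a sense of what the Raft-based mode looks like in practice, a broker running in KRaft mode is configured roughly as below (a sketch based on the standard KRaft configuration keys; the node IDs, host, and ports are illustrative):

```properties
# This node acts as both a broker and a Raft controller (combined mode).
process.roles=broker,controller
node.id=1
# Raft quorum of controller nodes replaces the ZooKeeper ensemble.
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
```

Cluster metadata that ZooKeeper used to hold is instead stored in an internal metadata log replicated by the controller quorum itself.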
- See here
- See client APIs
- See serde
- See schema registry
- See securing kafka
There are a couple of runnable examples so far in: