avishkar-004/streamflow
StreamFlow - Event Streaming Platform

A high-performance, distributed event streaming platform inspired by Apache Kafka, built from scratch in Java.

🚀 Features

Core Capabilities

  • Persistent Storage: Append-only commit log with memory-mapped indexes
  • High Throughput: Batch processing and zero-copy transfers
  • Fault Tolerance: Leader-follower replication with ISR (In-Sync Replicas)
  • Scalability: Partitioned topics for parallel processing
  • Consumer Groups: Automatic load balancing and rebalancing
  • REST API: Spring Boot admin API with Prometheus metrics
  • Docker Support: Complete Docker Compose setup with monitoring

Performance Optimizations

  • Memory-mapped file I/O for fast reads
  • Zero-copy transfers using FileChannel.transferTo()
  • Message batching for reduced network overhead
  • GZIP compression support
  • Sparse indexing for efficient offset lookups
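To make the zero-copy item concrete: `FileChannel.transferTo()` asks the kernel to move bytes from a file directly to another channel, skipping user-space buffers. Below is a minimal, self-contained sketch of that pattern; the class and method names are illustrative, not StreamFlow's actual API:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {

    /** Copies src to dst via FileChannel.transferTo, avoiding user-space buffers. */
    public static long transferTo(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                                                StandardOpenOption.WRITE,
                                                StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long size = in.size();
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }

    /** Self-contained check: round-trips a small payload through transferTo. */
    public static boolean demo() {
        try {
            Path src = Files.createTempFile("segment", ".log");
            Path dst = Files.createTempFile("copy", ".log");
            Files.writeString(src, "hello, streamflow");
            long copied = transferTo(src, dst);
            return copied == Files.size(src)
                && Files.readString(dst).equals("hello, streamflow");
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("zero-copy round-trip ok: " + demo());
    }
}
```

In a broker, the same call serves log-segment bytes straight to a socket channel, which is where the throughput win comes from.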

πŸ“ Project Structure

streamflow/
├── common/              # Shared models and protocol
│   ├── model/           # Message, metadata
│   ├── protocol/        # Binary wire protocol
│   └── compression/     # Compression codecs
├── broker/              # Broker implementation
│   ├── storage/         # Log segments, partitions, topics
│   ├── replication/     # Leader election, ISR, fetchers
│   ├── coordinator/     # Consumer groups, offsets
│   └── server/          # Netty TCP server
├── client/              # Producer/Consumer clients
│   ├── producer/        # Producer with batching
│   └── consumer/        # Consumer with offset management
├── admin/               # Spring Boot REST API
│   ├── controller/      # REST endpoints
│   ├── service/         # Business logic
│   └── dto/             # Data transfer objects
└── docs/                # Documentation

πŸ› οΈ Building

Prerequisites

  • Java 17 or higher
  • Maven 3.8+
  • Docker (optional, for containerized deployment)

Build All Modules

mvn clean install

Build Individual Modules

# Broker
cd broker
mvn clean package

# Client
cd client
mvn clean package

# Admin API
cd admin
mvn clean package

🎯 Quick Start

Option 1: Run Locally

Start Broker

java -jar broker/target/streamflow-broker-1.0.0-SNAPSHOT.jar \
  --broker-id 0 \
  --host localhost \
  --port 9092 \
  --data-dir ./data

Start Admin API

java -jar admin/target/streamflow-admin-1.0.0-SNAPSHOT.jar

Access Swagger UI: http://localhost:8080/api/swagger-ui.html

Producer Example

StreamFlowProducer producer = new StreamFlowProducer("localhost", 9092);
producer.connect();

RecordMetadata metadata = producer.send("orders", "user-123", "Order created");
System.out.println("Sent to offset: " + metadata.offset());

producer.close();

Consumer Example

StreamFlowConsumer consumer = new StreamFlowConsumer("localhost", 9092, "analytics-group");
consumer.connect();
consumer.subscribe("orders", 0);

List<Message> messages = consumer.poll(100);
for (Message msg : messages) {
    System.out.println("Received: " + msg.getValue());
}

consumer.commitSync();
consumer.close();

Option 2: Run with Docker

Start Complete Stack

docker-compose up -d

This starts:

  • Broker on port 9092
  • Admin API on port 8080
  • Prometheus on port 9090
  • Grafana on port 3000

Access Services

  • Admin API: http://localhost:8080
  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3000

Stop Stack

docker-compose down

📊 REST API

Topics

# List all topics
curl http://localhost:8080/api/topics

# Get topic details
curl http://localhost:8080/api/topics/orders

# Create topic
curl -X POST http://localhost:8080/api/topics \
  -H "Content-Type: application/json" \
  -d '{"name":"orders","partitions":3,"replicationFactor":1}'

# Delete topic
curl -X DELETE http://localhost:8080/api/topics/orders

Broker Health

# Health check
curl http://localhost:8080/api/broker/health

# Broker info
curl http://localhost:8080/api/broker/info

Consumer Groups

# List all consumer groups
curl http://localhost:8080/api/consumer-groups

# Get consumer group details
curl http://localhost:8080/api/consumer-groups/analytics-group

Metrics

# Prometheus metrics
curl http://localhost:8080/api/actuator/prometheus

🧪 Testing

Run All Tests

mvn test

Run Performance Benchmark

cd client
mvn test -Dtest=PerformanceBenchmark

The benchmark measures:

  • Producer throughput (messages/sec)
  • Consumer throughput (messages/sec)
  • End-to-end latency (p50, p95, p99)
  • Batching impact
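For reference, latency percentiles like p50/p95/p99 can be computed with a simple nearest-rank method. This standalone sketch is illustrative only and is not taken from the benchmark's code:

```java
import java.util.Arrays;

public class LatencyStats {

    /** Nearest-rank percentile over recorded latencies; p is in (0, 100]. */
    public static long percentile(long[] latencies, double p) {
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        // Nearest-rank: the k-th smallest value where k = ceil(p/100 * n).
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] samples = {5, 1, 9, 3, 7, 2, 8, 4, 6, 10};
        System.out.println("p50=" + percentile(samples, 50));
        System.out.println("p95=" + percentile(samples, 95));
        System.out.println("p99=" + percentile(samples, 99));
    }
}
```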

📈 Monitoring

Prometheus Metrics

Key metrics exposed:

  • streamflow_topics_total - Total topics
  • streamflow_partitions_total - Total partitions
  • streamflow_messages_total - Total messages
  • streamflow_bytes_total - Total bytes
  • streamflow_consumer_groups_total - Total consumer groups

Grafana Dashboards

Import dashboards from monitoring/grafana-dashboards/ for:

  • Cluster overview
  • Topic metrics
  • Consumer group lag
  • Broker performance

πŸ—οΈ Architecture

Storage Layer

Topic (e.g., "orders")
├── Partition 0
│   ├── 00000000000000000000.log    (messages)
│   ├── 00000000000000000000.index  (offset index)
│   ├── 00000000000000100000.log
│   └── 00000000000000100000.index
├── Partition 1
└── Partition 2
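The sparse `.index` files keep offset lookups cheap: only every N-th message gets an index entry, so a read finds the nearest preceding entry and scans the `.log` forward from that byte position. A minimal sketch of the floor lookup, using a `TreeMap` for clarity (illustrative, not StreamFlow's actual on-disk index format):

```java
import java.util.Map;
import java.util.TreeMap;

public class SparseIndex {

    // Maps relative offset -> byte position in the .log file. Only every
    // N-th message is indexed, so lookups return the nearest preceding entry.
    private final TreeMap<Long, Long> entries = new TreeMap<>();

    public void add(long relativeOffset, long filePosition) {
        entries.put(relativeOffset, filePosition);
    }

    /** Returns the file position to start scanning from for targetOffset. */
    public long floorPosition(long targetOffset) {
        Map.Entry<Long, Long> e = entries.floorEntry(targetOffset);
        return e == null ? 0L : e.getValue();
    }

    /** Self-contained demo over a hypothetical index with entries every 100 offsets. */
    public static long demoLookup(long targetOffset) {
        SparseIndex idx = new SparseIndex();
        idx.add(0, 0);        // offset 0 starts at byte 0
        idx.add(100, 4096);   // offset 100 starts at byte 4096
        idx.add(200, 8192);   // offset 200 starts at byte 8192
        return idx.floorPosition(targetOffset);
    }

    public static void main(String[] args) {
        // Offset 150 is not indexed: scan forward from the entry for offset 100.
        System.out.println(demoLookup(150));
    }
}
```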

Replication

Partition 0: Leader=Broker0, Followers=[Broker1, Broker2]
Partition 1: Leader=Broker1, Followers=[Broker0, Broker2]
Partition 2: Leader=Broker2, Followers=[Broker0, Broker1]
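The layout above is what a simple round-robin placement produces: partition i's leader is broker i mod N, and the remaining replicas land on the next brokers in order. A sketch of that scheme (a hypothetical helper for illustration, not StreamFlow's actual assignment code):

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicaAssignment {

    /**
     * Round-robin replica placement: index 0 of the returned list is the
     * leader (partition mod brokerCount); the rest are followers.
     */
    public static List<Integer> replicasFor(int partition, int brokerCount,
                                            int replicationFactor) {
        List<Integer> replicas = new ArrayList<>();
        for (int i = 0; i < replicationFactor; i++) {
            replicas.add((partition + i) % brokerCount);
        }
        return replicas;
    }

    public static void main(String[] args) {
        for (int p = 0; p < 3; p++) {
            System.out.println("Partition " + p + ": " + replicasFor(p, 3, 3));
        }
    }
}
```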

Consumer Groups

Consumer Group "analytics"
├── Consumer A → Partitions [0, 1]
├── Consumer B → Partitions [2, 3]
└── Consumer C → Partitions [4, 5]
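The assignment above matches a range-style strategy: partitions are split into contiguous blocks, one block per consumer, with earlier consumers absorbing any remainder. A self-contained sketch (illustrative names; StreamFlow's actual assignor may differ):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeAssignor {

    /** Splits partitionCount partitions into contiguous ranges, one per consumer. */
    public static Map<String, List<Integer>> assign(List<String> consumers,
                                                    int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        int base = partitionCount / consumers.size();
        int extra = partitionCount % consumers.size();
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            // The first `extra` consumers each take one additional partition.
            int count = base + (i < extra ? 1 : 0);
            List<Integer> parts = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                parts.add(next++);
            }
            assignment.put(consumers.get(i), parts);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(List.of("A", "B", "C"), 6));
    }
}
```

Rebalancing reruns the same computation whenever a consumer joins or leaves the group.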

🔧 Configuration

Broker Configuration

BrokerConfig config = BrokerConfig.builder()
    .brokerId(0)
    .host("localhost")
    .port(9092)
    .dataDir(new File("/var/streamflow/data"))
    .defaultPartitions(3)
    .replicationFactor(1)
    .segmentSizeBytes(100 * 1024 * 1024)  // 100MB
    .build();

Admin API Configuration

Edit admin/src/main/resources/application.yml:

streamflow:
  broker:
    host: localhost
    port: 9092

server:
  port: 8080

📚 Documentation

The complete documentation suite (11 files, ~4,700 lines) lives in docs/, covering learning resources, developer guides, operations guides, phase-by-phase documentation, and project status.

🎓 Learning Objectives

This project demonstrates:

  • Distributed Systems: Replication, consensus, fault tolerance
  • Storage Engines: Log-structured storage, memory-mapped I/O
  • Network Programming: Binary protocols, async I/O with Netty
  • Performance Optimization: Zero-copy, batching, compression
  • Microservices: REST APIs, metrics, monitoring

🚧 Current Limitations

  • Single-broker mode (multi-broker replication not fully implemented)
  • No authentication/authorization
  • Basic leader election (not full Raft)
  • Consumer group info partially mocked
  • Snappy/LZ4 compression not implemented (GZIP only)

πŸ›£οΈ Future Enhancements

  • Multi-broker cluster support
  • ZooKeeper/etcd integration for coordination
  • Transactions and exactly-once semantics
  • Tiered storage (hot/cold data)
  • Schema registry
  • Stream processing (joins, aggregations)
  • Kafka protocol compatibility

🤝 Contributing

This is an educational project. Contributions welcome!

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

This project is for educational purposes.

πŸ™ Acknowledgments

Inspired by:

  • Apache Kafka
  • Apache Pulsar
  • RabbitMQ

Built with:

  • Java 17
  • Spring Boot 3.2
  • Netty 4.1
  • Lombok
  • Micrometer (Prometheus)

Project Status: 100% Complete (6/6 Phases)

Built with ❤️ for learning distributed systems
