Real Time Streaming API with Kafka(Redpanda)

A high-performance real-time streaming API built with Golang and Kafka (Redpanda). This API enables clients to start streams, send data, and retrieve results via a scalable and robust architecture. Designed for handling large-scale concurrent connections, this API is benchmarked and optimized for real-time performance.

Architecture

Design Overview

API Design:
- RESTful endpoints in Golang for initiating streams, sending data, and retrieving real-time results.
- Efficient management of concurrent connections using Goroutines.
Kafka (Redpanda) Integration:
- Each stream_id maps to a unique Kafka topic or partition ensure user isolation
- Redpanda ensures fault tolerance and scalability for message processing.
- Kafka consumers process incoming messages in real time.
Processing:
- Incoming data chunks are pushed to Kafka topics.
- Consumers simulate real-time processing, transforming data dynamically.
- Results are streamed back to clients via Server-Sent Events (SSE).

Components

API Server:
- Built in Golang.
- Manages stream lifecycle: start, send, retrieve results, and end.
- Ensures fault tolerance with graceful handling of inactive streams.
Kafka Cluster:
- Uses a Redpanda-backed Kafka cluster with 3 brokers.
- Each stream gets its own topic or partition for data isolation.
Fault Tolerance:
- Kafka’s replication ensures no data loss during broker failures.
- Timeouts for inactive streams prevent resource leakage.

API Design

Endpoints

1. API Design

Endpoint: POST /stream/start
- Description: Start a new data stream.
- Request Payload: None
- Response:
```
{
  "status": "success",
  "message": "Stream started successfully!"
}
```

Endpoint: POST /stream/{stream_id}/send

Description: Send chunks of data to the server.
Request Payload:
```
{
  "data": "Message content here"
}
```

Response:

{
  "status": "success",
  "message": "Data sent to stream {stream_id}!"
}

Endpoint: GET /stream/{stream_id}/results
- Description: Retrieve real-time results for the stream.
- Response:
```
{
  "status": "success",
  "data": "Processed data stream content"
}
```
Endpoint: POST /stream/{stream_id}/end
- Description: Ends a stream and cleans up resources.
- Request Payload: None
- Response:
```
{
  "status": "success",
  "message": "Stream {stream_id} disconnected successfully"
}
```

2. Kafka (Redpanda) Integration

Message Broker: Redpanda (a Kafka-compatible platform) is used to handle data streams for high throughput and fault tolerance.
Stream Isolation: Each stream_id corresponds to a unique Kafka topic or partition to simulate isolated clients.
Consumer Processing: Kafka consumers process the incoming data in real-time, simulating operations such as transformation or analytics.

3. High-Level Architecture Diagram

Client ----> [API Gateway] ----> [Kafka Producer]
  |                                     |
  |                                     v
  |                                [Kafka Topic]
  |                                     |
  v                                     v
[Client SSE] <---- [Kafka Consumer] <---- [Stream Processing]

Demo

This demo showcases the real-time streaming API handling one connection. Server(left) and Client (right)

Testing

Tool: wrk
Duration: 30 seconds per test
Threads: 3
Concurrent Connections: Varying (100, 200, 300, ..., 800)
API Endpoint: http://localhost:8080
Script: benchmark.lua for simulating traffic patterns

Command

wrk -t3 -c<CONNECTIONS> -d30s -s test/benchmark.lua http://localhost:8080

Active connections	Requests/Sec	Transfer/Sec	Avg Latency	Max Latency	Socket Errors (Connect/Read/Write/Timeout)	Total Requests	Total Data Read
100	81274.48	13.25MB	13.15ms	1.59s	0/0/0/84	2443754	398.35MB
200	60725.90	9.93MB	26.57ms	1.96s	0/47/0/145	1827753	298.90MB
300	55103.13	9.02MB	32.63ms	1.97s	0/179/0/208	1658462	271.51MB
400	39182.14	6.42MB	70.12ms	2.00s	0/341/0/211	1179250	193.19MB
500	37079.17	6.08MB	89.75ms	2.00s	0/302/64/358	1115027	182.74MB
600	33486.18	5.49MB	75.79ms	2.00s	0/679/92/298	1007270	165.12MB
700	13327.64	2.19MB	172.25ms	2.00s	0/1424/0/827	401533	65.88MB
800	14255.99	2.34MB	163.93ms	2.00s	0/1816/2/361	428815	70.36MB

Key Insights

Optimal Performance: Best performance with 100-200 concurrent connections.
High Connection Impact: Latency increases and throughput decreases with >300 connections.
Fault Tolerance: Kafka ensures resilience and data integrity under load.
Resource Bottlenecks: API struggles with heavy traffic due to server-side constraints.

Future Improvements

Load Balancing: Distribute traffic using tools like NGINX or HAProxy.
Horizontal Scaling: Deploy multiple API server instances with Kafka partitioning.
Kafka Tuning: Optimize producer/consumer configurations and increase partitions.
Connection Pooling: Enhance persistent connection handling for lower latency.
Backpressure: Regulate request rate to prevent server overload.

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
graphic		graphic
server		server
test		test
.DS_Store		.DS_Store
README.md		README.md
docker-compose.yml		docker-compose.yml
go.mod		go.mod
go.sum		go.sum
main.go		main.go
stress_test.txt		stress_test.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Real Time Streaming API with Kafka(Redpanda)

Architecture

Design Overview

Components

API Design

Endpoints

1. API Design

2. Kafka (Redpanda) Integration

3. High-Level Architecture Diagram

Demo

Testing

Command

Key Insights

Future Improvements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Real Time Streaming API with Kafka(Redpanda)

Architecture

Design Overview

Components

API Design

Endpoints

1. API Design

2. Kafka (Redpanda) Integration

3. High-Level Architecture Diagram

Demo

Testing

Command

Key Insights

Future Improvements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages