Arrow-Kafka-Pyo3

High-performance Kafka sink with Arrow zero-copy support for financial data, real-time streaming, and batch processing scenarios.

🚀 Key Features

✅ Production-Ready

Structured error handling: 7 dedicated exception classes with clear error context
Complete type support: Covers all financial Arrow types (Date32, Timestamp, Decimal128, etc.)
Reliability configuration: Supports idempotent producing and exactly-once semantics
Observability: Built-in statistics counters for monitoring cache hit rate and throughput

🔧 Core Capabilities

Zero-copy: pyarrow.Table data is passed through the Arrow FFI and encoded to Avro without intermediate memory copies
Schema Registry integration: Supports Confluent/Redpanda Schema Registry
Materialize compatible: Uses Confluent wire format, directly compatible with Materialize
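
For readers unfamiliar with the Confluent wire format mentioned above, the sketch below shows its framing: a zero magic byte, a 4-byte big-endian schema ID, then the Avro-encoded payload. The helper names `frame_confluent` and `parse_confluent` are illustrative only and are not part of this library's API, and the payload bytes are a placeholder rather than real Avro output.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # Confluent wire format marker (always 0)
HEADER_LEN = 5  # 1 magic byte + 4-byte schema ID


def frame_confluent(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an Avro-encoded payload with the 5-byte Confluent header."""
    # ">bI" = big-endian, 1-byte magic, 4-byte unsigned schema ID
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def parse_confluent(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema_id, avro_payload)."""
    if len(message) < HEADER_LEN:
        raise ValueError("message shorter than the 5-byte header")
    magic, schema_id = struct.unpack(">bI", message[:HEADER_LEN])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER_LEN:]


# Placeholder payload; a real message would carry Avro binary here.
framed = frame_confluent(42, b"payload")
print(parse_confluent(framed))  # (42, b'payload')
```

Consumers such as Materialize read the schema ID from this header and fetch the matching schema from the Schema Registry before decoding the payload.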

📦 Installation

From PyPI (Recommended)

pip install arrow-kafka-pyo3

From Source

git clone https://github.com/your-org/arrow-kafka.git
cd arrow-kafka

# Install Rust toolchain if needed
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Build Python extension
cd crates/arrow-kafka-pyo3
maturin develop

🚀 5-Minute Quick Start

import pyarrow as pa
from arrow_kafka_pyo3 import ArrowKafkaSink

# Create sink instance
sink = ArrowKafkaSink(
    kafka_servers="localhost:9092",
    schema_registry_url="http://localhost:8081",
)

# Prepare data
table = pa.table({
    "symbol": ["AAPL", "GOOGL", "MSFT"],
    "price": [189.3, 2750.5, 342.8],
    "volume": [1000, 500, 1200]
})

# Send data
rows_sent = sink.consume_arrow(
    table=table,
    topic="stock_quotes",
    key_cols=["symbol"]
)

print(f"✅ Sent {rows_sent} rows to Kafka")

# Ensure delivery
sink.flush(timeout_ms=10000)

# Close connection
sink.close()

For more detailed examples, see the Getting Started Guide.

📚 Documentation

Quick Navigation

Getting Started - Installation and first steps
Complete User Guide - Comprehensive usage guide with examples
API Reference - Detailed API documentation
Schema Evolution - Schema compatibility rules
FAQ - Common questions and troubleshooting
中文文档 - Complete documentation in Chinese

Topics Covered

Performance tuning and configuration presets
Production deployment and monitoring
Error handling and exception hierarchy
Kafka headers and topic administration
Materialize integration examples

🔧 Advanced Configuration Example

sink = ArrowKafkaSink(
    kafka_servers="kafka1:9092,kafka2:9092,kafka3:9092",
    schema_registry_url="http://schema-registry:8081",
    
    # Reliability
    enable_idempotence=True,
    acks="all",
    retries=10,
    
    # Performance
    linger_ms=20,
    batch_size=65536,
    compression_type="lz4",
    
    # Schema Registry
    subject_name_strategy="topic_name",  # Materialize compatible
)
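
The `subject_name_strategy="topic_name"` setting above presumably corresponds to Confluent's TopicNameStrategy, under which the Schema Registry subject is the topic name plus a `-key` or `-value` suffix. A minimal sketch of that naming rule, assuming the mapping holds for this library (the helper `subject_for` is hypothetical and not part of the API):

```python
def subject_for(topic: str, is_key: bool = False) -> str:
    """Return the subject name under Confluent's TopicNameStrategy."""
    # Assumption: "topic_name" maps to TopicNameStrategy.
    suffix = "key" if is_key else "value"
    return f"{topic}-{suffix}"


print(subject_for("stock_quotes"))               # stock_quotes-value
print(subject_for("stock_quotes", is_key=True))  # stock_quotes-key
```

Under this rule, a Materialize source reading `stock_quotes` would look up the value schema under the subject `stock_quotes-value`.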

📊 Monitoring

stats = sink.stats()
print(f"Rows enqueued: {stats.enqueued_total}")
print(f"Cache hit rate: {stats.sr_hit_rate():.1%}")
print(f"Cache hits: {stats.sr_cache_hits}, misses: {stats.sr_cache_misses}")

See the User Guide for detailed monitoring instructions.
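
As a rough illustration of how a cache hit rate like the one printed above is usually derived, the sketch below divides hits by total lookups. It is an assumption that `sr_hit_rate()` works this way; `hit_rate` is a hypothetical stand-alone helper, and returning 0.0 when there have been no lookups is a choice made here, not documented library behavior.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache, or 0.0 if there were none."""
    total = hits + misses
    if total == 0:
        return 0.0  # avoid dividing by zero before any lookup has happened
    return hits / total


print(f"Cache hit rate: {hit_rate(98, 2):.1%}")  # Cache hit rate: 98.0%
```

A persistently low rate would suggest that schemas are being re-fetched or re-registered more often than expected.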

🧪 Testing

# Rust tests
cargo test -p arrow-kafka

# Python tests (requires built extension)
cd crates/arrow-kafka-pyo3 && maturin develop
python -m pytest tests/ -v

📈 Performance Benchmarks

Scenario	Throughput	Latency	Memory usage
Low latency mode	10-100 MB/s	1-10 ms	Low
High throughput mode	500+ MB/s	20-100 ms	Medium
Exactly-once mode	100-300 MB/s	10-50 ms	Low

🤝 Contributing

We welcome issues and pull requests! See CONTRIBUTING.md for details.

Development Setup

rustup install stable
pip install -r requirements-dev.txt
pre-commit install

📄 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

librdkafka - Reliable Kafka client
Apache Arrow - Zero-copy data exchange
Materialize - Real-time data warehouse

📞 Support

GitHub Issues: Report bugs or request features
Documentation: Submit improvements via pull requests

Arrow-Kafka-Pyo3 - High-performance Kafka data sink for production environments
