Stream Processing Engine

Educational Python application demonstrating local message queuing, stream processing, windowing functions, data aggregation pipelines, and stream analytics with SQLite3 storage.

Features

📨 Local Message Queuing

  • FIFO Queue - First-in-first-out message ordering
  • Priority Queue - Priority-based message handling
  • Queue Persistence - SQLite storage for durability
  • Queue Manager - Manage multiple queues
  • Queue Statistics - Track enqueue/dequeue metrics

🌊 Stream Processing

  • Stream Processor - Process continuous data streams
  • Transformations - Map, filter, reduce operations
  • Stream Sources - Sensor data, events, queues
  • Stream Sinks - Database, console, queue outputs
  • Batch Processing - Process events in batches

⏱️ Windowing Functions

  • Tumbling Windows - Fixed-size, non-overlapping windows
  • Sliding Windows - Overlapping time windows
  • Window Aggregation - Aggregate data within windows
  • Time-Based Grouping - Group events by time
  • Window Management - Open, close, slide windows

📊 Data Aggregation Pipelines

  • Multi-Stage Pipelines - Chain aggregation operations
  • Group By - Aggregate by key
  • Aggregation Functions - Sum, avg, count, min, max
  • Pipeline Results - Store in SQLite
  • Custom Aggregators - Build custom aggregations

📈 Stream Analytics

  • Real-Time Metrics - Track metrics as they happen
  • Metric Statistics - Count, sum, avg, min, max
  • Rate Calculation - Events per second
  • Metric History - Store in SQLite
  • Analytics Dashboard - View metrics via API

Quick Start

1. Clone the Repository

git clone https://github.com/Amruth22/Python-Stream-Processing-Engine.git
cd Python-Stream-Processing-Engine

2. Create Virtual Environment

python -m venv venv

# On Windows:
venv\Scripts\activate

# On macOS/Linux:
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Run Demonstrations

python main.py

5. Run Flask API

python api/app.py

6. Run Tests

python tests.py

Project Structure

Python-Stream-Processing-Engine/
│
├── queue/
│   ├── message_queue.py         # Message queue
│   ├── queue_manager.py         # Queue management
│   └── queue_store.py           # SQLite persistence
│
├── stream/
│   ├── stream_processor.py      # Stream engine
│   ├── stream_source.py         # Data sources
│   └── stream_sink.py           # Data sinks
│
├── windowing/
│   ├── window_functions.py      # Base windowing
│   ├── tumbling_window.py       # Tumbling windows
│   └── sliding_window.py        # Sliding windows
│
├── aggregation/
│   ├── aggregation_pipeline.py  # Aggregation pipeline
│   ├── aggregators.py           # Aggregator functions
│   └── pipeline_store.py        # SQLite storage
│
├── analytics/
│   ├── stream_analytics.py      # Stream analytics
│   └── analytics_engine.py      # Analytics engine
│
├── api/
│   └── app.py                   # Flask API
│
├── main.py                      # Demonstration
├── tests.py                     # 10 unit tests
└── README.md                    # This file

Usage Examples

Message Queue

from queue.message_queue import MessageQueue

# Create queue
queue = MessageQueue('events', max_size=1000)

# Enqueue messages
queue.enqueue({'event': 'user_login', 'user_id': 1})
queue.enqueue({'event': 'page_view', 'page': '/dashboard'})

# Dequeue messages
message = queue.dequeue()
print(message)

# Get stats
stats = queue.get_stats()
print(f"Queue size: {stats['current_size']}")
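The Priority Queue listed under Features dequeues by priority rather than arrival order. The project's own priority-queue API isn't shown here, so this is a minimal stdlib sketch of both orderings (the names are illustrative, not the project's classes):

```python
from collections import deque
import heapq
import itertools

# FIFO: the first message enqueued is the first dequeued
fifo = deque()
fifo.append({'event': 'first'})
fifo.append({'event': 'second'})
assert fifo.popleft()['event'] == 'first'

# Priority: the lowest priority number is dequeued first; a counter
# breaks ties so messages with equal priority stay in FIFO order
pq = []
counter = itertools.count()
heapq.heappush(pq, (2, next(counter), {'event': 'low'}))
heapq.heappush(pq, (1, next(counter), {'event': 'urgent'}))
priority, _, message = heapq.heappop(pq)
print(message['event'])  # urgent
```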

Stream Processing

from stream.stream_processor import StreamProcessor

# Create processor
processor = StreamProcessor('my-processor')

# Add transformations
processor.map(lambda x: {**x, 'value': x['value'] * 2})
processor.filter(lambda x: x['value'] > 10)

# Process events
events = [{'value': 5}, {'value': 10}, {'value': 15}]
results = processor.process_batch(events)

Tumbling Window

import time

from windowing.tumbling_window import TumblingWindow

# Create 10-second tumbling window
window = TumblingWindow(duration=10)

# Add events
window.add_event({'value': 10, 'timestamp': time.time()})
window.add_event({'value': 20, 'timestamp': time.time()})

# Get closed windows
closed_windows = window.get_closed_windows()

# Aggregate window
total = window.aggregate(lambda events: sum(e['value'] for e in events))

Sliding Window

import time

from windowing.sliding_window import SlidingWindow

# Create sliding window: 30s size, slide every 10s
window = SlidingWindow(size=30, slide=10)

# Add events
window.add_event({'value': 10, 'timestamp': time.time()})

# Get current window
current = window.get_current_window()
print(f"Events in window: {current['count']}")

Aggregation Pipeline

from aggregation.aggregation_pipeline import AggregationPipeline

# Create pipeline
pipeline = AggregationPipeline('sales')

# Configure
pipeline.group_by('product')
pipeline.aggregate('count', '*')
pipeline.aggregate('sum', 'amount')
pipeline.aggregate('avg', 'amount')

# Process data
data = [
    {'product': 'Laptop', 'amount': 999.99},
    {'product': 'Laptop', 'amount': 1299.99},
    {'product': 'Mouse', 'amount': 29.99}
]

results = pipeline.process(data)
# Results: {'Laptop': {'count': 2, 'sum': 2299.98, 'avg': 1149.99},
#           'Mouse': {'count': 1, 'sum': 29.99, 'avg': 29.99}}

Stream Analytics

from analytics.stream_analytics import StreamAnalytics

# Create analytics
analytics = StreamAnalytics()

# Track metrics
analytics.track_metric('page_views', 1)
analytics.track_metric('api_calls', 5)

# Get metrics
metrics = analytics.get_all_metrics()
print(metrics)

# Get specific metric
page_views = analytics.get_metric('page_views')
print(f"Page views: {page_views['count']}")
print(f"Rate: {page_views['rate_per_second']}/sec")
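The rate-per-second figure above is simply the event count divided by elapsed time. A minimal standalone sketch of that calculation (RateTracker is a hypothetical name, not the project's StreamAnalytics API):

```python
import time

class RateTracker:
    """Tracks a counter and derives an events-per-second rate."""
    def __init__(self):
        self.count = 0
        self.total = 0
        self.started = time.time()

    def track(self, value=1):
        self.count += 1
        self.total += value

    def rate_per_second(self):
        elapsed = time.time() - self.started
        return self.count / elapsed if elapsed > 0 else 0.0

tracker = RateTracker()
for _ in range(5):
    tracker.track()
print(f"{tracker.rate_per_second():.1f} events/sec")
```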

Windowing Concepts

Tumbling Window

Fixed-size, non-overlapping windows

[0-10s] [10-20s] [20-30s] [30-40s]
   |       |        |        |
  W1      W2       W3       W4

Each event belongs to exactly one window

Sliding Window

Overlapping windows

[0-30s]
    [10-40s]
        [20-50s]
            [30-60s]

Events can belong to multiple windows
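Both membership rules can be computed directly from an event's timestamp. A minimal sketch (function names are illustrative, not part of the windowing package):

```python
def tumbling_window_id(ts, duration):
    """Each timestamp maps to exactly one tumbling window."""
    return int(ts // duration)

def sliding_window_ids(ts, size, slide):
    """A timestamp can fall into several overlapping sliding windows.
    Window k covers the interval [k*slide, k*slide + size)."""
    last = int(ts // slide)                         # latest window starting at or before ts
    first = max(0, int((ts - size) // slide) + 1)   # earliest window still covering ts
    return list(range(first, last + 1))

print(tumbling_window_id(25, 10))                 # -> 2, i.e. the [20-30s) window
print(sliding_window_ids(25, size=30, slide=10))  # -> [0, 1, 2]: windows starting at 0s, 10s, 20s
```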

Testing

Run the comprehensive test suite:

python tests.py

Test Coverage (10 Tests)

  1. Message Queue - Test enqueue/dequeue
  2. Queue Persistence - Test SQLite storage
  3. Stream Processing - Test pipeline
  4. Tumbling Window - Test fixed windows
  5. Sliding Window - Test overlapping windows
  6. Data Aggregation - Test aggregation functions
  7. Stream Analytics - Test real-time metrics
  8. Aggregators - Test count, sum, avg
  9. Window Aggregation - Test windowed aggregation
  10. Queue Manager - Test queue management

Educational Notes

1. Stream Processing vs Batch Processing

Batch:

  • Process data in chunks
  • Higher latency
  • Simpler to implement

Stream:

  • Process data as it arrives
  • Low latency
  • Real-time results
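The contrast shows up in plain Python: a batch function returns nothing until the whole chunk is processed, while a generator-based stream emits each result as its event arrives. A minimal sketch, independent of the project's StreamProcessor:

```python
def batch_process(events):
    """Batch: wait for the whole chunk, then return all results at once."""
    return [e['value'] * 2 for e in events]

def stream_process(events):
    """Stream: emit each result as soon as its event arrives."""
    for e in events:
        yield e['value'] * 2

events = [{'value': v} for v in (1, 2, 3)]
print(batch_process(events))             # [2, 4, 6] -- available only at the end
for result in stream_process(iter(events)):
    print(result)                        # 2, then 4, then 6 -- one result per event
```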

2. When to Use Windowing

Use Cases:

  • Calculate metrics per time period
  • Detect patterns in time windows
  • Aggregate recent data
  • Time-based analytics

3. Aggregation Patterns

Common Aggregations:

  • Count: Number of events
  • Sum: Total of values
  • Avg: Average value
  • Min/Max: Range of values
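Combined with Group By, these aggregations can be sketched in plain Python dictionaries (independent of the project's AggregationPipeline):

```python
from collections import defaultdict

def aggregate(events, key, field):
    """Group events by `key` and compute count/sum/avg/min/max of `field`."""
    groups = defaultdict(list)
    for e in events:
        groups[e[key]].append(e[field])
    return {
        k: {
            'count': len(vals),
            'sum': sum(vals),
            'avg': sum(vals) / len(vals),
            'min': min(vals),
            'max': max(vals),
        }
        for k, vals in groups.items()
    }

orders = [
    {'product': 'Laptop', 'amount': 100.0},
    {'product': 'Laptop', 'amount': 300.0},
    {'product': 'Mouse', 'amount': 30.0},
]
print(aggregate(orders, 'product', 'amount'))
# {'Laptop': {'count': 2, 'sum': 400.0, 'avg': 200.0, 'min': 100.0, 'max': 300.0},
#  'Mouse': {'count': 1, 'sum': 30.0, 'avg': 30.0, 'min': 30.0, 'max': 30.0}}
```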

Production Considerations

For production use:

  1. Message Broker:

    • Use Kafka or RabbitMQ
    • Implement distributed queuing
    • Add message persistence
  2. Stream Processing:

    • Use Apache Flink or Spark Streaming
    • Implement exactly-once semantics
    • Add checkpointing
  3. Storage:

    • Use time-series database
    • Implement data partitioning
    • Add data retention policies
  4. Scalability:

    • Distribute processing
    • Add horizontal scaling
    • Implement backpressure
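Backpressure, the last point above, can be demonstrated with Python's built-in bounded queue.Queue: a fast producer blocks on put() until the slower consumer frees a slot, instead of letting the buffer grow without bound. (Note: run this outside the repo root, where the local queue/ package would shadow the stdlib module.)

```python
import queue
import threading
import time

# Bounded buffer: at most 2 items may be in flight at once
buffer = queue.Queue(maxsize=2)
consumed = []

def consumer():
    while True:
        item = buffer.get()
        if item is None:        # sentinel: stop
            break
        time.sleep(0.01)        # simulate slow processing
        consumed.append(item)

t = threading.Thread(target=consumer)
t.start()
for i in range(10):
    buffer.put(i)               # blocks whenever the buffer is full
buffer.put(None)
t.join()
print(consumed)                 # all 10 events processed, none dropped
```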

Dependencies

  • Flask 3.0.0 - Web framework
  • python-dotenv 1.0.0 - Environment variables
  • pytest 7.4.3 - Testing framework
  • requests 2.31.0 - HTTP client
  • sqlite3 - Database (built-in)

License

This project is for educational purposes. Feel free to use and modify as needed.


Happy Streaming! 🚀
