Publish-subscribe with replay and batching for Windows Azure.
This project is heavily inspired by Apache Kafka and all the log shipping projects out there.
Message Vault is a thin wrapper around capabilities provided by Windows Azure. Producers push messages to the Vault which serves them to consumers. Consumers can replay events from any point in time or chase the tail.
Messages are partitioned into streams. Each stream is an immutable, ordered sequence of messages. Every message in a stream is assigned a unique offset and a timestamp. Ordering across different streams is not guaranteed (beyond the clock drift between Azure machines).
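The stream model above can be sketched in a few lines. This is a hypothetical in-memory illustration, not the real Message Vault API: each stream is append-only, producers get back the assigned offset, and consumers can replay from any known offset.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Message:
    offset: int          # unique, monotonically increasing within the stream
    timestamp: datetime  # assigned at append time
    payload: bytes

@dataclass
class Stream:
    messages: list = field(default_factory=list)

    def append(self, payload: bytes) -> int:
        """Producer side: assign the next offset and record a timestamp."""
        offset = len(self.messages)
        self.messages.append(Message(offset, datetime.now(timezone.utc), payload))
        return offset

    def read_from(self, offset: int, max_count: int = 100):
        """Consumer side: replay from any point (any previously seen offset)."""
        return self.messages[offset : offset + max_count]

stream = Stream()
for i in range(3):
    stream.append(f"event-{i}".encode())

# Replaying from offset 1 yields the later messages, in order.
replayed = stream.read_from(1)
assert [m.payload for m in replayed] == [b"event-1", b"event-2"]
```

Because the log is immutable, a consumer's position is just an integer offset, which is what makes replay from any point in time cheap.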
Message Vault makes several design trade-offs:
- optimize for high throughput over low latency;
- optimize for message streams which are gigabytes large;
- prefer code simplicity over complex performance optimizations;
- HTTP instead of a binary protocol;
- rely on Windows Azure to do all the heavy-lifting (this simplifies code, but couples implementation to Azure);
- high-availability via master-slave setup (uptime is limited by Azure uptime, no writes during failover);
- no channel encryption (if needed, terminate SSL at the Azure Load Balancer or your own load balancer);
- implemented in imperative C# (.NET runtime is heavy, but Windows Azure is optimized for it);
- client library is intentionally simple (view projections and even checkpoints are outside the scope);
- each stream is a separate page blob (each can grow to 1TB out-of-the-box, but having thousands of streams isn't a good idea).
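Given these trade-offs, a consumer that "chases the tail" is just a polling loop over HTTP reads. The sketch below is illustrative only: `fetch_after` is a hypothetical stand-in for the real HTTP read call, which it does not model.

```python
def fetch_after(stream, offset, limit=50):
    """Hypothetical stand-in for an HTTP GET of up to `limit` messages past `offset`."""
    return stream[offset : offset + limit]

def chase_tail(stream, start_offset=0):
    """Consume everything currently available, yielding (offset, message) pairs."""
    offset = start_offset
    while True:
        batch = fetch_after(stream, offset)
        if not batch:
            break  # caught up with the tail; a real consumer would sleep and retry
        for msg in batch:
            yield offset, msg
            offset += 1

log = ["a", "b", "c", "d"]
consumed = list(chase_tail(log, start_offset=1))
assert consumed == [(1, "b"), (2, "c"), (3, "d")]
```

Batched reads like this favor throughput over latency, matching the first trade-off above: a consumer pulls as much as is available per request rather than one message at a time.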
This project is currently deployed to production at SkuVault, holding 1.5 billion events totaling 400GB of data.
There is a follow-up implementation of this project, designed for ingesting Message Vault data (or any event data) into local storage, compressing and processing it iteratively for various analytical tasks. See Geyser-net.