
Buffer Unified Data Architecture

Documentation, guidelines, and resources related to the Buffer Unified Data Architecture, also known as BUDA. A unified data architecture means that all the different data paths share the same patterns. This architecture relies on event logs to synchronize the different steps of the data pipeline.

Principles

  • Simplicity: The different steps are easy to understand, modify, and test.
  • Reliability: Every error in the pipeline can be recovered from.
  • Modularity: Each step can be iterated independently.
  • Consistency: We follow the same conventions and design patterns in all pipelines.
  • Efficiency: Data flows as close to real time as possible, and we can scale out to handle load as needed.
  • Flexibility: We can transform, model, and annotate data easily to enable powerful analysis and application of data.

Overview

BUDA, or Buffer Unified Data Architecture, consolidates the architecture of our data flows. The general pattern of this architecture involves three steps: producing the data, storing the data in a log, and consuming the data.

[Diagram: Buffer Unified Data Architecture]

📝 Producers

Different services take care of gathering events from specific sources (Mongo, Stripe, Buffer) and sending them to a Kinesis Stream. The responsibility of integrating with the pipeline and providing a clean, well-structured data feed lies with the producers.
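
As a rough illustration of what a producer might do, the sketch below wraps a raw event in a small envelope and hands it to a Kinesis client. The envelope fields (`source`, `type`, `sent_at`, `payload`), the function names, and the choice of partition key are assumptions made for this example, not part of BUDA. The `client` argument is expected to behave like `boto3.client("kinesis")`, whose `put_record` call takes `StreamName`, `Data`, and `PartitionKey`.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def build_record(source: str, event_type: str,
                 payload: Dict[str, Any], entity_id: str) -> Dict[str, Any]:
    """Wrap a raw event in a hypothetical envelope and encode it as bytes."""
    envelope = {
        "source": source,          # e.g. "mongo", "stripe", "buffer"
        "type": event_type,
        "sent_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    return {
        "Data": json.dumps(envelope, sort_keys=True).encode("utf-8"),
        # Records sharing a partition key are routed to the same shard.
        "PartitionKey": entity_id,
    }


def send_event(client: Any, stream_name: str, record: Dict[str, Any]) -> Any:
    """Send one record through a Kinesis-like client (assumed interface)."""
    return client.put_record(
        StreamName=stream_name,
        Data=record["Data"],
        PartitionKey=record["PartitionKey"],
    )
```

Passing the client in as an argument keeps the producer easy to test with a stand-in object, which fits the simplicity principle above.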

📚 Streams

Kinesis Streams act as the unified log layer where the data is stored for a certain period of time. Each stream contains similar events.
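
To make the "log with a retention period" idea concrete, here is a toy in-memory model. It is not how Kinesis is implemented; the class name `RetentionLog` and its methods are invented for this sketch. It only shows the two properties that matter here: records are appended in order, and readers can fetch anything still inside the retention window, starting from a position they track themselves.

```python
import time
from typing import Any, List, Optional, Tuple


class RetentionLog:
    """Toy append-only log whose records expire after `retention_seconds`."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        # Each entry is (sequence number, append timestamp, data).
        self._records: List[Tuple[int, float, Any]] = []
        self._next_sequence = 0

    def append(self, data: Any, now: Optional[float] = None) -> int:
        """Append a record and return its sequence number."""
        timestamp = time.time() if now is None else now
        sequence = self._next_sequence
        self._records.append((sequence, timestamp, data))
        self._next_sequence += 1
        return sequence

    def read(self, after_sequence: int = -1,
             now: Optional[float] = None) -> List[Tuple[int, Any]]:
        """Return unexpired records with a sequence number > after_sequence."""
        current = time.time() if now is None else now
        cutoff = current - self.retention_seconds
        return [
            (sequence, data)
            for sequence, timestamp, data in self._records
            if timestamp >= cutoff and sequence > after_sequence
        ]
```

Because each reader keeps its own `after_sequence` position, several readers can work through the same log at different speeds without affecting one another.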

📖 Consumers

Services read data from the streams and apply transformations. All the data transformations should be included here. New processing services can subscribe to streams and add new transformations without having to rely on the other consumers. Consumer services take care of saving the data.
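
A consumer can be sketched as a loop that decodes each record, applies one transformation, and hands the result to a sink that saves it. Everything named below (`consume`, `to_warehouse_row`, the event fields `type` and `payload.amount_cents`) is hypothetical and chosen only for illustration. In a real service the records would come from a stream reader and the sink would write to the data warehouse.

```python
import json
from typing import Any, Callable, Dict, Iterable

Event = Dict[str, Any]
Transform = Callable[[Event], Event]
Sink = Callable[[Event], None]


def consume(records: Iterable[bytes], transform: Transform, sink: Sink) -> int:
    """Decode each JSON record, transform it, save it, and return the count."""
    processed = 0
    for raw in records:
        event = json.loads(raw.decode("utf-8"))
        sink(transform(event))
        processed += 1
    return processed


def to_warehouse_row(event: Event) -> Event:
    """Example transformation: flatten a hypothetical charge event into a row."""
    return {
        "event_type": event["type"],
        "amount_usd": event["payload"]["amount_cents"] / 100,
    }
```

A second consumer could call `consume` on the same records with a different transform and sink, which is how new transformations can be added without touching existing consumers.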

🏠 Data Warehouse

Data is in the desired state and placed in a Data Warehouse, ready to be used!

Structure

Each pipeline is self-contained in a repository. This repository should contain all the necessary code to run, understand, and modify the entire pipeline.

Resources

Roadmap

  • Handle Schema Evolution
  • Ability to replay events
  • Continuous Integration and Deployment
  • Standard format for data at rest
  • Compress data in streams
  • Kinesis Shards Autoscaling