Alluvium

A scalable realtime streaming search platform

Overview

Alluvium provides a clean, scalable architecture in Python for realtime streaming search. Realtime search provides insight into high-velocity feeds, with applications ranging from media monitoring to up-to-date anti-vandalism detection notifications. A practical example is monitoring community health by tracking the frequency of words in the Twitter feed that correlate with heart disease mortality rates, as presented by Eichstaedt et al. in 2015.

Achieving realtime search in high-volume streams presents a unique set of engineering challenges. In a static setting we typically build an index over the documents being searched, which is often not feasible when documents arrive in a high-volume stream. This limitation led to the development of reverse search, in which the queries are indexed and each incoming document is tokenized and matched against them. New challenges emerge as queries are added. Should we tokenize the streaming documents once per query, or tokenize them once and run them against several queries in batches? How should we remove queries from the list? How should we scale the distributed processing to handle both an increase in document volume and an increase in the number and complexity of queries? These are some of the questions I've been addressing with Alluvium.
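The reverse search idea above can be sketched in a few lines of plain Python. This is a minimal in-memory illustration, not Alluvium's implementation: the `ReverseIndex` class, the `tokenize` helper, and the rule that a query matches only when all of its terms appear in the document are assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a document and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class ReverseIndex:
    """Hypothetical reverse index: queries are indexed, documents stream past."""

    def __init__(self):
        self.query_terms = {}                    # query_id -> set of required terms
        self.term_to_queries = defaultdict(set)  # term -> ids of queries using it

    def add_query(self, query_id, query_text):
        terms = tokenize(query_text)
        self.query_terms[query_id] = terms
        for term in terms:
            self.term_to_queries[term].add(query_id)

    def remove_query(self, query_id):
        # Drop the query and clean up any term entries it leaves empty.
        for term in self.query_terms.pop(query_id, set()):
            self.term_to_queries[term].discard(query_id)
            if not self.term_to_queries[term]:
                del self.term_to_queries[term]

    def match(self, document):
        tokens = tokenize(document)  # tokenize once, reuse for every query
        candidates = set()
        for token in tokens:
            candidates |= self.term_to_queries.get(token, set())
        # A candidate matches only if all of its terms occur in the document.
        return sorted(q for q in candidates if self.query_terms[q] <= tokens)


index = ReverseIndex()
index.add_query("q1", "heart disease")
index.add_query("q2", "coffee")
matches = index.match("Study links stress to heart disease")
```

Because the document is tokenized once and only queries sharing at least one token are examined, the cost per document depends on the overlapping queries rather than on the total number of registered queries.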

Architecture

AWS (S3): Simulated firehose of tweets from 2012
Kafka: Scalable, fault-tolerant message delivery
Storm: Event-based stream processing
Elasticsearch: Tweet search with percolator index
RethinkDB: Key-value data store
Flask-SocketIO: Socket server delivering real-time results to the client
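To make the percolator component above more concrete, the following sketch builds the JSON bodies that an Elasticsearch percolator index typically works with. It only constructs dictionaries and does not call any client. The field names `query` and `text` and the helper function names are assumptions for illustration, and the mapping is written in the typeless form used by Elasticsearch 7 and later (older versions nest the properties under a mapping type).

```python
import json

# Index mapping: one field holds stored queries, one describes tweet text.
PERCOLATOR_MAPPING = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "text": {"type": "text"},
        }
    }
}


def stored_query(phrase):
    """Body of a document that registers a phrase query in the index."""
    return {"query": {"match_phrase": {"text": phrase}}}


def percolate_request(tweet_text):
    """Search body asking which stored queries match a single tweet."""
    return {
        "query": {
            "percolate": {
                "field": "query",
                "document": {"text": tweet_text},
            }
        }
    }


request_body = json.dumps(percolate_request("stress and heart disease"))
```

In this arrangement each user query is indexed once as a document, and every incoming tweet is sent as a percolate search whose hits are the matching stored queries.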

Engineering Challenges

Kafka tuning
Storm topology configuration and deployment in Python
Pipeline metrics

Performance monitoring

Currently clocking an average of 4 milliseconds per search on a 2000 tweets/second stream.
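A figure like the one above could be collected with a small timing wrapper. The sketch below is an assumption about how such a metric might be gathered, not the project's metrics code: `LatencyTracker` and `min_concurrent_searches` are hypothetical names. The second function applies Little's law under the further assumption that every tweet triggers exactly one search.

```python
import math
import time


class LatencyTracker:
    """Accumulates wall-clock time spent in a wrapped search call."""

    def __init__(self):
        self.count = 0
        self.total_seconds = 0.0

    def timed(self, func, *args, **kwargs):
        # Time a single call and fold it into the running totals.
        start = time.perf_counter()
        result = func(*args, **kwargs)
        self.total_seconds += time.perf_counter() - start
        self.count += 1
        return result

    def average_ms(self):
        if self.count == 0:
            return 0.0
        return 1000.0 * self.total_seconds / self.count


def min_concurrent_searches(avg_search_ms, docs_per_second):
    """Little's law: average searches in flight = arrival rate x latency."""
    return math.ceil(avg_search_ms * docs_per_second / 1000.0)


tracker = LatencyTracker()
total = tracker.timed(sum, [1, 2, 3])  # stand-in for a real search call
needed = min_concurrent_searches(4, 2000)
```

Under the one-search-per-tweet assumption, 4 ms per search at 2000 tweets/second corresponds to about 8 searches in flight on average, which gives a rough lower bound on the parallelism the stream processing layer needs.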

SioKCronin/alluvium
