Using Monoids for Large Scale Business Stats

At Indix we collect and process lots of data. Most of our processing was initially done as MapReduce (henceforth MR) jobs, but as our data grew in size we moved towards stream processing. We monitor the behaviour of our systems by collecting business metrics. It was relatively easy to write stats jobs on our MR output, but things got tricky when we moved to stream-based processing.

Our key learnings over the years have been:

  • Approximate stats now > accurate stats tomorrow
  • Our metrics were just aggregates (counts / uniques) with rollups
  • Existing open source systems were geared more towards system monitoring than business metrics
  • Aggregates can be modelled as commutative monoids using Algebird's typeclasses (see the sketch after this list)

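The sketch below illustrates the last point. It is not Abel's actual code; the names (`StatsSketch`, `contribution`) and the per-store count/unique metrics are assumptions chosen for illustration. The key idea is that once each event's contribution is expressed as a value with a commutative Monoid (here a `Map[String, (Long, HLL)]`, combining key-wise), partial aggregates from any number of workers can be merged in any order and at any granularity.

```scala
// A minimal sketch, not the Abel implementation: per-store event counts and
// approximate unique product counts, modelled as one commutative monoid.
import com.twitter.algebird._

object StatsSketch {
  // 12 bits of HyperLogLog precision (~1.6% standard error on the estimate)
  implicit val hllMonoid: HyperLogLogMonoid = new HyperLogLogMonoid(12)

  // One event's contribution: (event count, HLL of unique product ids),
  // keyed by store. Map values combine key-wise because (Long, HLL) already
  // has a Monoid, so Map[String, (Long, HLL)] gets one for free.
  def contribution(store: String, productId: String): Map[String, (Long, HLL)] =
    Map(store -> ((1L, hllMonoid.create(productId.getBytes("UTF-8")))))

  def main(args: Array[String]): Unit = {
    val events = Seq(("storeA", "p1"), ("storeA", "p2"), ("storeB", "p1"), ("storeA", "p1"))

    // Merge all partial aggregates; order and grouping don't matter.
    val totals = Monoid.sum(events.map { case (s, p) => contribution(s, p) })

    totals.foreach { case (store, (count, uniques)) =>
      println(s"$store: events=$count, approxUniqueProducts=${uniques.approximateSize.estimate}")
    }
  }
}
```

Because the merge is associative and commutative, the same code works whether the partial sums come from MR reducers, stream workers, or a rollup of already-aggregated time buckets.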
We put these learnings together and built a system called Abel, which solved this for us. It aggregates a million events in ~15 seconds on a single box.

About Author

Ashwanth Kumar is a Principal Engineer at Indix. His major interest lies in building and operating large data systems. When not dealing with data, he spends his time reading research papers on similar topics.

Abel as an idea was conceptualised by Vinoth Kumar while working at Indix as part of the Ingestion team.


License

Using Monoids for Large Scale Business Stats by Ashwanth Kumar is licensed under a Creative Commons Attribution 4.0 International License.
