Using Monoids for Large Scale Business Stats

At Indix we collect and process lots of data. Most of our processing was initially done as MapReduce (henceforth MR) jobs, but as our data grew in size we moved towards stream processing. We monitor the behaviour of our systems by collecting business metrics. It was relatively easy to write stats jobs on our MR output, but things got tricky when we moved to stream-based processing.

Our key learnings over the years have been:

  • Approximate stats now > accurate stats tomorrow
  • Our metrics were just aggregates (counts / uniques) with rollups
  • Existing open source systems were geared more towards system monitoring than business metrics
  • Aggregates can be modelled as commutative monoids using Algebird's typeclasses (see the sketch after this list)

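The sketch below illustrates the last point. It is not Abel's actual code; the names (`StatsSketch`, `contribution`) and the per-store count/unique metrics are assumptions chosen for illustration. The key idea is that once each event's contribution is expressed as a value with a commutative Monoid (here a `Map[String, (Long, HLL)]`, combining key-wise), partial aggregates from any number of workers can be merged in any order and at any granularity.

```scala
// A minimal sketch, not the Abel implementation: per-store event counts and
// approximate unique product counts, modelled as one commutative monoid.
import com.twitter.algebird._

object StatsSketch {
  // 12 bits of HyperLogLog precision (~1.6% standard error on the estimate)
  implicit val hllMonoid: HyperLogLogMonoid = new HyperLogLogMonoid(12)

  // One event's contribution: (event count, HLL of unique product ids),
  // keyed by store. Map values combine key-wise because (Long, HLL) already
  // has a Monoid, so Map[String, (Long, HLL)] gets one for free.
  def contribution(store: String, productId: String): Map[String, (Long, HLL)] =
    Map(store -> ((1L, hllMonoid.create(productId.getBytes("UTF-8")))))

  def main(args: Array[String]): Unit = {
    val events = Seq(("storeA", "p1"), ("storeA", "p2"), ("storeB", "p1"), ("storeA", "p1"))

    // Merge all partial aggregates; order and grouping don't matter.
    val totals = Monoid.sum(events.map { case (s, p) => contribution(s, p) })

    totals.foreach { case (store, (count, uniques)) =>
      println(s"$store: events=$count, approxUniqueProducts=${uniques.approximateSize.estimate}")
    }
  }
}
```

Because the merge is associative and commutative, the same code works whether the partial sums come from MR reducers, stream workers, or a rollup of already-aggregated time buckets.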
We put these learnings together and built a system called Abel, which solved this for us. It aggregates a million events in ~15 seconds on a single box.

About Author

Ashwanth Kumar is a Principal Engineer at Indix. His major interest lies in building and operating large data systems. When not dealing with data, he spends his time reading research papers on similar topics.

Abel as an idea was conceptualised by Vinoth Kumar while working at Indix as part of the Ingestion team.


License

Using Monoids for Large Scale Business Stats by Ashwanth Kumar is licensed under a Creative Commons Attribution 4.0 International License.
