This project demonstrates an implementation of HyperLogLog, a probabilistic algorithm that estimates the number of distinct elements in a dataset using a fixed amount of memory. HyperLogLog is widely used in big data processing because it stays efficient and accurate on large data streams.
This implementation leverages OpenMP and OpenMPI to optionally support parallelism and distributed computation.
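As a rough illustration of the idea described above, here is a minimal single-threaded sketch of the core HyperLogLog steps: hash each element, use the top bits of the hash to pick a register, and keep the largest "rank" (position of the first 1 bit) seen per register. The class name `HyperLogLogSketch`, the precision parameter `p`, and the SplitMix64-style hash are assumptions made for this example and are not taken from `Mpi.c`; the project's actual code may be organized differently.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch for illustration only; not the code in Mpi.c.
class HyperLogLogSketch {
public:
    // p index bits give m = 2^p registers. The alpha constant used in
    // estimate() assumes m >= 128, so keep p >= 7.
    explicit HyperLogLogSketch(int p = 12)
        : p_(p), m_(std::size_t{1} << p), regs_(m_, 0) {}

    void add(std::uint64_t value) {
        const std::uint64_t h = mix(value);
        const std::size_t idx = static_cast<std::size_t>(h >> (64 - p_)); // top p bits pick a register
        std::uint64_t rest = h << p_;                                     // remaining 64 - p bits
        // rank = 1-based position of the leftmost 1 bit in the remaining bits,
        // or (64 - p + 1) if all of them are zero.
        std::uint8_t rank = 1;
        while (rank <= 64 - p_ && !(rest & (1ULL << 63))) {
            rest <<= 1;
            ++rank;
        }
        if (rank > regs_[idx]) regs_[idx] = rank;
    }

    // Raw estimate: alpha * m^2 / sum(2^-register), with no range corrections.
    double estimate() const {
        double sum = 0.0;
        for (std::uint8_t r : regs_) sum += std::ldexp(1.0, -static_cast<int>(r));
        const double m = static_cast<double>(m_);
        const double alpha = 0.7213 / (1.0 + 1.079 / m);
        return alpha * m * m / sum;
    }

private:
    // SplitMix64 finalizer, used here as a stand-in 64-bit hash (an assumption).
    static std::uint64_t mix(std::uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    int p_;
    std::size_t m_;
    std::vector<std::uint8_t> regs_;
};
```

With `p = 12` the sketch keeps 4096 one-byte registers, and the typical relative error is about 1.04 / sqrt(4096), roughly 1.6%. Because duplicates hash to the same register and rank, adding an element twice does not change the estimate. Two sketches built with the same `p` can be combined by taking the element-wise maximum of their registers, which is the property that makes the algorithm a natural fit for parallel and distributed runs.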
A C++ compiler such as g++
OpenMP
OpenMPI
*The code has been tested on Ubuntu 22.04.3 LTS
git clone <repository-url>
cd <repository-name>
mpicxx -fopenmp Mpi.c -o opq && mpirun -np 4 ./opq
- -np sets the number of processes across which the workload is distributed.
github.com/AdKnow
- 0.2
    - Improved code quality
- 0.1
    - Initial release
- Improve implementation accuracy, especially for smaller datasets.
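
One common way to address small-dataset accuracy is the small-range correction from the original HyperLogLog paper: when the raw estimate is at most 2.5 times the register count and some registers are still empty, fall back to linear counting over the empty registers. The sketch below is only a possible direction, not something the project currently does as far as this README states. The function name and the assumption that registers are stored as one byte each are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: `registers` holds the max rank per register.
// Assumes a non-empty vector with at least 128 entries (for the alpha formula).
double estimate_with_small_range_correction(const std::vector<std::uint8_t>& registers) {
    const double m = static_cast<double>(registers.size());
    double sum = 0.0;
    std::size_t zeros = 0;
    for (std::uint8_t r : registers) {
        sum += std::ldexp(1.0, -static_cast<int>(r)); // 2^-r
        if (r == 0) ++zeros;                           // register never touched
    }
    const double alpha = 0.7213 / (1.0 + 1.079 / m);
    const double raw = alpha * m * m / sum;

    // Small-range correction: switch to linear counting while the sketch is sparse.
    if (raw <= 2.5 * m && zeros > 0) {
        return m * std::log(m / static_cast<double>(zeros));
    }
    return raw;
}
```

For an empty sketch every register is zero, so the corrected estimate is exactly 0, whereas the raw formula alone would report roughly 0.72 times the register count.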
Inspiration