Implementation of HyperLogLog Algorithm in C++

Description

This project demonstrates an implementation of the HyperLogLog algorithm, a probabilistic method that estimates the number of distinct elements in a dataset using a fixed amount of memory. HyperLogLog is widely used in big data processing because it stays memory-efficient and reasonably accurate on large data streams.
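To make the idea concrete, here is a minimal sketch of a HyperLogLog estimator. It is not the code from this repository: the class name `HllSketch`, the `mix64` hash (a splitmix64-style mixer), and the restriction to 64-bit integer items are all assumptions made for illustration. It uses the raw estimator only, with the bias constant that applies when there are at least 128 registers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// splitmix64-style mixer, used here as an illustrative 64-bit hash (assumption).
inline std::uint64_t mix64(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// Hypothetical minimal sketch with m = 2^p one-byte registers.
class HllSketch {
public:
    explicit HllSketch(int p) : p_(p), regs_(std::size_t{1} << p, 0) {
        assert(p >= 7 && p <= 16);  // the alpha formula below needs m >= 128
    }

    void add(std::uint64_t item) {
        const std::uint64_t h = mix64(item);
        const std::size_t idx = h >> (64 - p_);  // top p bits select a register
        std::uint64_t rest = h << p_;            // remaining 64 - p bits
        // rank = position of the leftmost 1-bit in `rest`, capped at 64 - p + 1
        std::uint8_t rank = 1;
        while (rank <= 64 - p_ && !(rest >> 63)) {
            rest <<= 1;
            ++rank;
        }
        if (rank > regs_[idx]) regs_[idx] = rank;  // keep the maximum rank seen
    }

    // Raw HyperLogLog estimate: alpha * m^2 / sum(2^-register).
    double estimate() const {
        const double m = static_cast<double>(regs_.size());
        double sum = 0.0;
        for (std::uint8_t r : regs_) sum += std::ldexp(1.0, -static_cast<int>(r));
        const double alpha = 0.7213 / (1.0 + 1.079 / m);
        return alpha * m * m / sum;
    }

private:
    int p_;
    std::vector<std::uint8_t> regs_;
};
```

With `p = 12` the sketch holds 4096 one-byte registers regardless of input size, and the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%. Adding the same item again never changes the registers, which is why duplicates do not affect the estimate.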

This implementation leverages OpenMP and OpenMPI to optionally support parallelism and distributed computation.
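HyperLogLog suits parallel and distributed execution because sketches built independently can be merged by taking the element-wise maximum of their registers. The sketch below illustrates that property without any MPI or OpenMP calls; the names `add_item`, `merge_into`, and `build_partitioned`, and the `mix64` hash, are hypothetical and not taken from this repository. In an MPI program the same merge could plausibly be expressed as a reduction over the register array using the `MPI_MAX` operation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Registers = std::vector<std::uint8_t>;

// splitmix64-style mixer, used here as an illustrative 64-bit hash (assumption).
inline std::uint64_t mix64(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// Record one item in a register array of size 2^p.
inline void add_item(Registers& regs, int p, std::uint64_t item) {
    const std::uint64_t h = mix64(item);
    const std::size_t idx = h >> (64 - p);  // top p bits select a register
    std::uint64_t rest = h << p;            // remaining 64 - p bits
    std::uint8_t rank = 1;                  // position of the leftmost 1-bit
    while (rank <= 64 - p && !(rest >> 63)) {
        rest <<= 1;
        ++rank;
    }
    regs[idx] = std::max(regs[idx], rank);
}

// Merge two sketches of equal size: element-wise maximum of the registers.
inline void merge_into(Registers& dst, const Registers& src) {
    assert(dst.size() == src.size());
    for (std::size_t i = 0; i < dst.size(); ++i) {
        dst[i] = std::max(dst[i], src[i]);
    }
}

// Simulate `workers` independent processes, each sketching its own share
// of the items 0..n-1, then merge the local sketches into one.
inline Registers build_partitioned(int p, std::uint64_t n, int workers) {
    const std::size_t m = std::size_t{1} << p;
    std::vector<Registers> local(static_cast<std::size_t>(workers), Registers(m, 0));
    for (std::uint64_t i = 0; i < n; ++i) {
        add_item(local[i % static_cast<std::uint64_t>(workers)], p, i);
    }
    Registers merged(m, 0);
    for (const Registers& r : local) merge_into(merged, r);
    return merged;
}
```

Because the merge is a maximum, the merged registers are identical to those of a single sketch that saw every item, so splitting the input across workers does not add error beyond that of the sketch itself.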

Getting Started

Dependencies

A C++ Compiler such as g++
OpenMP
OpenMPI

The code has been tested on Ubuntu 22.04.3 LTS.

Installing

git clone <repository-url>
cd <repository-name>

Executing program

mpicxx -fopenmp Mpi.c -o opq && mpirun -np 4 ./opq
The -np option sets the number of MPI processes across which the workload is distributed.

Author

github.com/AdKnow

Version History

0.2
- Improved code quality
0.1
- Initial Release

ToDo

Improve estimation accuracy, especially for smaller datasets.
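One well-known option for small cardinalities is the small-range correction from the original HyperLogLog paper, which falls back to linear counting when the raw estimate is at most 2.5 times the register count and some registers are still zero. The sketch below shows that correction under stated assumptions; the function name `estimate_small_range` is hypothetical, and this is not necessarily the approach this project will adopt.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Estimate cardinality from a register array, applying the small-range
// (linear counting) correction. Assumes at least 128 registers so that
// the alpha formula below is applicable.
inline double estimate_small_range(const std::vector<std::uint8_t>& regs) {
    assert(regs.size() >= 128);
    const double m = static_cast<double>(regs.size());

    double sum = 0.0;       // sum of 2^-register
    std::size_t zeros = 0;  // number of registers never touched
    for (std::uint8_t r : regs) {
        sum += std::ldexp(1.0, -static_cast<int>(r));
        if (r == 0) ++zeros;
    }

    const double alpha = 0.7213 / (1.0 + 1.079 / m);
    const double raw = alpha * m * m / sum;

    // Small range: use linear counting, m * ln(m / zeros).
    if (raw <= 2.5 * m && zeros > 0) {
        return m * std::log(m / static_cast<double>(zeros));
    }
    return raw;  // otherwise keep the raw HyperLogLog estimate
}
```

For an empty sketch every register is zero, so the corrected estimate is m * ln(1) = 0, whereas the raw formula alone would report roughly 0.72 * m.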

Acknowledgments

Inspiration

https://arxiv.org/abs/2205.11327
