
AdKnow/DistributedHyperLogLog


Implementation of the HyperLogLog Algorithm in C++

Description

This project demonstrates an implementation of the HyperLogLog algorithm, a probabilistic method that estimates the number of distinct elements (the cardinality) of a dataset using a small, fixed amount of memory. HyperLogLog is widely used in big data processing because it handles large data streams efficiently while keeping the estimation error bounded.

This implementation optionally uses OpenMP for shared-memory (multi-threaded) parallelism and Open MPI for distributed computation across multiple processes.

Getting Started

Dependencies

  • A C++ compiler such as g++
  • OpenMP
  • Open MPI

*The code has been tested on Ubuntu 22.04.3 LTS.

Installing

git clone <repository-url>
cd <repository-name>

Executing program

  • mpicxx -fopenmp Mpi.c -o opq && mpirun -np 4 ./opq
  • -np sets the number of MPI processes across which the workload is distributed.

Author

github.com/AdKnow

Version History

  • 0.2
    • Improved code quality
  • 0.1
    • Initial Release

ToDo

  • Improve estimation accuracy, especially for small datasets.

Acknowledgments

Inspiration
