how do I efficiently query for unique values of a field #82

Open
ami-m opened this issue Dec 9, 2022 · 2 comments

Comments

@ami-m

ami-m commented Dec 9, 2022

Say I receive a stream of records of the form {machineCode: "", lat: , lon: },
and I want to display a count of records per machineCode.

Is there a way to efficiently get all the unique machine codes, or should I just keep track of them while inserting data?

@kelindar
Owner

There is no built-in feature in column for this, but there are two ways I can think of to solve this problem:

  1. if you're okay with an approximate result, use HyperLogLog to track the machine codes
  2. otherwise, a standard map/set is required

Either structure can be maintained during insertion or built by a range query that iterates over all elements.
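
As a rough illustration of the map option maintained during insertion, here is a minimal sketch. It does not use the column API at all: `Record`, `CodeCounter`, and the field types are assumptions made up for this example, and the counter is meant to be updated next to whatever code performs the real insert.

```go
package main

import (
	"fmt"
	"sort"
)

// Record mirrors the shape from the question; the field types are assumptions.
type Record struct {
	MachineCode string
	Lat, Lon    float64
}

// CodeCounter (hypothetical helper) tracks how many records were seen per machine code.
type CodeCounter struct {
	counts map[string]int
}

// NewCodeCounter returns an empty counter.
func NewCodeCounter() *CodeCounter {
	return &CodeCounter{counts: make(map[string]int)}
}

// Add should be called once for every record that is inserted.
func (c *CodeCounter) Add(r Record) {
	c.counts[r.MachineCode]++
}

// Count returns the number of records seen for a machine code (0 if unseen).
func (c *CodeCounter) Count(code string) int {
	return c.counts[code]
}

// Codes returns the unique machine codes in sorted order.
func (c *CodeCounter) Codes() []string {
	codes := make([]string, 0, len(c.counts))
	for code := range c.counts {
		codes = append(codes, code)
	}
	sort.Strings(codes)
	return codes
}

func main() {
	counter := NewCodeCounter()
	stream := []Record{
		{MachineCode: "m-1", Lat: 1.0, Lon: 2.0},
		{MachineCode: "m-2", Lat: 3.0, Lon: 4.0},
		{MachineCode: "m-1", Lat: 5.0, Lon: 6.0},
	}
	for _, r := range stream {
		counter.Add(r) // done alongside the actual insert into the store
	}
	for _, code := range counter.Codes() {
		fmt.Printf("%s: %d\n", code, counter.Count(code))
	}
}
```

The map is not safe for concurrent writers as written; if inserts happen from several goroutines, it would need a mutex or similar guard.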

@ami-m
Author

ami-m commented Dec 12, 2022

Thanks, I went with the second method, but that leaves me having to run the range query whenever I restore state from a snapshot :-(
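
For reference, the rebuild step after a restore could look roughly like the sketch below. `RangeFunc` is a hypothetical stand-in for the range query over the restored data, not a type from column, and the example fakes the restored rows with a slice so it runs on its own.

```go
package main

import "fmt"

// RangeFunc is a hypothetical stand-in for a range query over restored data:
// it calls visit once per stored row, passing that row's machine code.
type RangeFunc func(visit func(machineCode string))

// RebuildCounts scans every row once and returns the per-code counts.
func RebuildCounts(scan RangeFunc) map[string]int {
	counts := make(map[string]int)
	scan(func(machineCode string) {
		counts[machineCode]++
	})
	return counts
}

func main() {
	// Pretend these machine codes came back from a restored snapshot.
	restored := []string{"m-1", "m-2", "m-1", "m-3"}
	scan := func(visit func(string)) {
		for _, code := range restored {
			visit(code)
		}
	}

	counts := RebuildCounts(scan)
	fmt.Println(len(counts), counts["m-1"])
}
```

The cost is a single pass over all rows, paid once per restore; after that the map can be kept current during insertion.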
