Feature Request: tobuffer should save IDs to samples, not the other way around #418

Open
balintlaczko opened this issue Apr 15, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@balintlaczko
Contributor

Is your feature request related to a problem? Please describe the problem.

When using tobuffer on a fluid.dataset~, you specify a fluid.labelset~ that will contain the dataset ID (as the label) for each sample in the buffer. You can then only query with a buffer sample index to get the corresponding dataset ID. I almost never need it this way, but I always need the opposite: to query with a dataset ID and get the corresponding buffer sample index. It is possible to work around this by dumping the dataset/labelset, but that becomes prohibitively slow with large datasets.

Describe the solution you'd like to see.

If tobuffer created a labelset where the dataset ID is the labelset identifier and the buffer sample index is the label, it would be much easier down the line to query which dataset ID got mapped to which index (without excessive loops).
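
To make the two orientations concrete, here is a minimal Python sketch. It is only an illustration of the data relationship, not FluCoMa code: the list `index_to_id`, the dictionary `id_to_index`, the helper `buffer_index_for` and the sample IDs are all hypothetical names chosen for this example.

```python
# Hypothetical stand-in for the current behaviour:
# position i holds the dataset ID stored at buffer sample index i.
index_to_id = ["point-7", "point-2", "point-9"]

# Requested orientation: the dataset ID is the key (identifier)
# and the buffer sample index is the value (label).
id_to_index = {dataset_id: index for index, dataset_id in enumerate(index_to_id)}


def buffer_index_for(dataset_id):
    """Return the buffer sample index for a dataset ID, or None if unknown."""
    return id_to_index.get(dataset_id)
```

With the flipped mapping, a lookup by dataset ID is a single query instead of a pass over every sample.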

Describe alternatives you've considered

An alternative could be that fluid.labelset~ gets an indexof method, which gets the first index (identifier) for a given label. But I don't think this would be as efficient as the above solution.
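
For comparison, a sketch of what such an indexof lookup might amount to, assuming it has to scan the labels one by one. The function below is hypothetical and plain Python, not an existing fluid.labelset~ method.

```python
def indexof(labels, label):
    """Return the first identifier (position) whose label matches, else None.

    Assumption: labels is a list where position i is the identifier
    (buffer sample index) and the value is the dataset ID.
    """
    for identifier, value in enumerate(labels):
        if value == label:
            return identifier
    return None
```

Each call walks the list until it finds a match, so many lookups on a large dataset repeat that scan every time.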

Another option (if this is not too expensive) would be to export two labelsets: one with <buffer-index> : <dataset identifier> (like now) and another with <dataset identifier> : <buffer-index> (which would be new). This option would be less likely to break backwards compatibility, since the additional labelset would simply be the next element in the list of what the dataset~ reports after tobuffer.

Additional context

Normally, other fluid objects (such as kdtree~) will operate on the dataset, and likely give you back dataset IDs. Example:
fluid.jit.plotter: to efficiently create a mesh from the 2D dataset, I go refer <datasetname> --> tobuffer --> to matrix with jit.buffer~. Luckily this is super fast even with millions of points, because I get to avoid loops.

But if I am highlighting the dataset elements closest to the mouse pointer using a kdtree~, I now get dataset IDs, which I need to map to buffer indices (which are the same as the matrix indices) to know which points to "highlight". For this it is unavoidable to dump the samples-to-ids labelset at least once, which can cause seconds of hanging with large datasets. It also adds the burden of keeping two records of the same data relationship in sync.
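
The highlighting step could be sketched like this in Python, assuming an ID-to-index mapping is already available. The function `highlight_mask`, its parameters and the example IDs are hypothetical and only illustrate the lookup, not any FluCoMa or Jitter API.

```python
def highlight_mask(neighbour_ids, id_to_index, num_points):
    """Build a 0/1 flag per buffer (matrix) index for the given dataset IDs.

    neighbour_ids: dataset IDs as a nearest-neighbour query might return them.
    id_to_index:   mapping from dataset ID to buffer sample index (assumed given).
    num_points:    number of samples in the buffer.
    """
    mask = [0] * num_points
    for dataset_id in neighbour_ids:
        index = id_to_index.get(dataset_id)
        if index is not None:  # skip IDs that are not in the mapping
            mask[index] = 1
    return mask


# Example with made-up IDs: "zz" is unknown and is ignored.
example = highlight_mask(["c", "a", "zz"], {"a": 0, "b": 1, "c": 2}, 3)
```

The cost here depends on the number of neighbours returned, not on the size of the dataset.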

@balintlaczko balintlaczko added the enhancement New feature or request label Apr 15, 2024
@tremblap
Member

Hello

This is an interesting flip. Maybe we could add that option, the same way we can transpose. Let me think of an interface that would not be a problem and would be backward compatible.

@balintlaczko
Contributor Author

The option to flip with an int after the labelset name (like the int after the buffer name) would be great!
