New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a dynamic LSH table #2

Open
ludwigschmidt opened this Issue Dec 10, 2015 · 4 comments

Comments

Projects
None yet
3 participants
@ludwigschmidt
Collaborator

ludwigschmidt commented Dec 10, 2015

No description provided.

@ludwigschmidt ludwigschmidt self-assigned this Dec 10, 2015

@hxwu

This comment has been minimized.

Show comment
Hide comment
@hxwu

hxwu Nov 21, 2016

Excuse me, could you please show me how to add a dynamic LSH table?

hxwu commented Nov 21, 2016

Excuse me, could you please show me how to add a dynamic LSH table?

@hxwu

This comment has been minimized.

Show comment
Hide comment
@hxwu

hxwu Nov 21, 2016

I just want to insert or remove some record from the established LSH table

hxwu commented Nov 21, 2016

I just want to insert or remove some record from the established LSH table

@ludwigschmidt

This comment has been minimized.

Show comment
Hide comment
@ludwigschmidt

ludwigschmidt Nov 27, 2016

Collaborator

Hi hxwu,

updates are supported in some of the core classes, in particular the DynamicLinearProbingHashTable and the DynamicCompositeHashTable:

https://github.com/ludwigschmidt/fth_lsh/blob/master/lsh_table/src/probing_hash_table.h
https://github.com/ludwigschmidt/fth_lsh/blob/master/lsh_table/src/composite_hash_table.h

What we need next is a dynamic version of the StaticLSHTable in

https://github.com/FALCONN-LIB/FALCONN/blob/master/src/include/falconn/core/lsh_table.h

We should be able to re-use quite a bit of code here. The main difference is that the static LSH table always assumes that the key range is between 0 and data_set_size - 1. This allows us to simply keep a large array for checking whether a certain candidate already appears in the answer for the current query. If the key range is arbitrary, we need a less specialized set data structure here (a hash table would probably be the right thing). Overall it should not be a big performance issue, but it's good to get it right.

The code is organized so that the query objects in the LSHTable are only responsible for producing a list of candidates. The actual distances are then computed in the NearestNeighborQuery class:

https://github.com/ludwigschmidt/fth_lsh/blob/master/lsh_table/src/nn_query.h

Two questions about your application:

  • Are you planning to use the C++ or Python interface?
  • How is your dynamic point set stored? Would it make sense to always have the point indices in a small range (say 0 to some_constant * largest_point_set_size)?

Cheers,
Ludwig

Collaborator

ludwigschmidt commented Nov 27, 2016

Hi hxwu,

updates are supported in some of the core classes, in particular the DynamicLinearProbingHashTable and the DynamicCompositeHashTable:

https://github.com/ludwigschmidt/fth_lsh/blob/master/lsh_table/src/probing_hash_table.h
https://github.com/ludwigschmidt/fth_lsh/blob/master/lsh_table/src/composite_hash_table.h

What we need next is a dynamic version of the StaticLSHTable in

https://github.com/FALCONN-LIB/FALCONN/blob/master/src/include/falconn/core/lsh_table.h

We should be able to re-use quite a bit of code here. The main difference is that the static LSH table always assumes that the key range is between 0 and data_set_size - 1. This allows us to simply keep a large array for checking whether a certain candidate already appears in the answer for the current query. If the key range is arbitrary, we need a less specialized set data structure here (a hash table would probably be the right thing). Overall it should not be a big performance issue, but it's good to get it right.

The code is organized so that the query objects in the LSHTable are only responsible for producing a list of candidates. The actual distances are then computed in the NearestNeighborQuery class:

https://github.com/ludwigschmidt/fth_lsh/blob/master/lsh_table/src/nn_query.h

Two questions about your application:

  • Are you planning to use the C++ or Python interface?
  • How is your dynamic point set stored? Would it make sense to always have the point indices in a small range (say 0 to some_constant * largest_point_set_size)?

Cheers,
Ludwig

@pengfu008

This comment has been minimized.

Show comment
Hide comment
@pengfu008

pengfu008 Aug 29, 2017

Insert and remove record is very important in my application, I need to dynamic update LSH table, I use python interface, Do you have the plan to support it? @ludwigschmidt

pengfu008 commented Aug 29, 2017

Insert and remove record is very important in my application, I need to dynamic update LSH table, I use python interface, Do you have the plan to support it? @ludwigschmidt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment