get_optimal_index_keys_v2 support faiss AutoTune #139

xiongqiangcs · 2022-11-16T09:49:20Z

autofaiss/autofaiss/external/optimize.py

Lines 139 to 159 in d5c773f

    
    def get_optimal_index_keys_v2(
        nb_vectors: int,
        dim_vector: int,
        max_index_memory_usage: str,
        flat_threshold: int = 1000,
        quantization_threshold: int = 10000,
        force_pq: Optional[int] = None,
        make_direct_map: bool = False,
        should_be_memory_mappable: bool = False,
        ivf_flat_threshold: int = 1_000_000,
        use_gpu: bool = False,
    ) -> List[str]:
        """
        Gives a list of interesting indices to try, *the one at the top is the most promising*
        See: https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index for
        detailed explanations.
        """
        # Exception cases:

rom1504 · 2022-11-16T10:13:34Z

Autofaiss is an alternative implementation of faiss AutoTune that works based on the constraints listed in the readme.
Do you have any specific suggestions?

xiongqiangcs · 2022-11-17T13:15:05Z

I think the faiss readme is a recommendation rather than a standard; choosing an index is a tradeoff between performance and recall.
Examples:
http://ann-benchmarks.com/sift-128-euclidean_10_euclidean.html

But autofaiss returns a single HNSW key whenever it fits in memory:

autofaiss/autofaiss/external/optimize.py

Lines 174 to 178 in d5c773f

    
    if not should_be_memory_mappable:
        # If we can build an HNSW with the given memory constraints, it's the best
        m_hnsw = int(floor((max_size_in_bytes / (4 * nb_vectors) - dim_vector) / 2))
        if m_hnsw >= 8:
            return [f"HNSW{min(m_hnsw, 32)}"]
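To make the arithmetic in the quoted snippet concrete, here is a minimal standalone sketch of the same formula. Rearranged, it treats the budget as roughly `4 * nb_vectors * (dim_vector + 2 * m_hnsw)` bytes. The function name `hnsw_key_for_budget` is hypothetical and not part of autofaiss, and `max_size_in_bytes` is assumed to be the memory budget already converted to a byte count.

```python
from math import floor
from typing import Optional


def hnsw_key_for_budget(nb_vectors: int, dim_vector: int, max_size_in_bytes: int) -> Optional[str]:
    """Hypothetical helper reproducing the HNSW branch of the quoted snippet."""
    # Same expression as in the snippet: budget per vector in 4-byte units,
    # minus the vector dimension, halved to obtain the HNSW M parameter.
    m_hnsw = int(floor((max_size_in_bytes / (4 * nb_vectors) - dim_vector) / 2))
    if m_hnsw >= 8:
        # The snippet caps M at 32.
        return f"HNSW{min(m_hnsw, 32)}"
    # Not enough memory for HNSW: the real function moves on to other index types.
    return None


if __name__ == "__main__":
    for budget in (1_000_000_000, 600_000_000, 500_000_000):
        print(budget, hnsw_key_for_budget(1_000_000, 128, budget))
```

For one million 128-dimensional vectors, a budget of 1,000,000,000 bytes gives `HNSW32` (the raw value 61 is capped), 600,000,000 bytes gives `HNSW11`, and 500,000,000 bytes gives `None` because the computed value falls below 8.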

I recommend randomly sampling the embeddings and evaluating recall/perf across various index key parameters to find the optimal index key, as is done in https://github.com/erikbern/ann-benchmarks/blob/8807d6ead4cf14318ac43166cdabf02b491f620e/algos.yaml#L146-L187
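A rough sketch of what such a sampling benchmark might look like is given below. It is an illustration under assumptions, not autofaiss code: the names `recall_at_k` and `benchmark_index_keys`, the sample size, and the query count are all hypothetical, and it assumes `faiss` and `numpy` are installed and that `embeddings` is a 2D float array. Each candidate index runs with its default search parameters, so no tuning is involved.

```python
import time
from typing import Dict, List, Sequence


def recall_at_k(found: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Fraction of the exact top-k neighbours that the approximate search returned."""
    hits = sum(len(set(f) & set(t)) for f, t in zip(found, truth))
    total = sum(len(t) for t in truth)
    return hits / total if total else 0.0


def benchmark_index_keys(embeddings, index_keys: List[str], sample_size: int = 100_000,
                         nb_queries: int = 1_000, k: int = 10, seed: int = 0) -> Dict[str, dict]:
    """Build every candidate key on a random sample; report recall and latency."""
    # Imported lazily so the pure-Python helper above works without these packages.
    import faiss
    import numpy as np

    rng = np.random.default_rng(seed)
    n = min(sample_size, len(embeddings))
    if n <= nb_queries:
        raise ValueError("sample must be larger than the number of queries")
    ids = rng.choice(len(embeddings), size=n, replace=False)
    sample = np.ascontiguousarray(embeddings[ids], dtype="float32")
    # Hold the queries out of the indexed set so a query never matches itself.
    queries, base = sample[:nb_queries], sample[nb_queries:]
    dim = base.shape[1]

    # Exact neighbours from a brute-force L2 index serve as ground truth.
    exact = faiss.IndexFlatL2(dim)
    exact.add(base)
    _, truth = exact.search(queries, k)

    results = {}
    for key in index_keys:
        index = faiss.index_factory(dim, key)
        if not index.is_trained:
            index.train(base)
        index.add(base)
        start = time.perf_counter()
        _, found = index.search(queries, k)
        elapsed = time.perf_counter() - start
        results[key] = {
            "recall": recall_at_k(found.tolist(), truth.tolist()),
            "ms_per_query": 1000 * elapsed / len(queries),
        }
    return results
```

A call such as `benchmark_index_keys(embeddings, ["HNSW32", "IVF1024,Flat"])` would return one recall and latency entry per key. Numbers measured on a sample may not carry over to the full dataset, and the cost grows with the number of candidate keys.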

rom1504 · 2022-11-17T14:43:04Z

Feel free to try it out. Beating autofaiss's current heuristic on all relevant metrics (latency, recall, and memory) while taking a reasonable amount of compute (i.e., not recomputing the whole index many times) would be great!


victor-paltz · 2022-11-17T14:59:19Z

Hello @xiongqiangcs !

We chose to use a heuristic rather than benchmarking n different indices because it is n times faster and because the heuristic is not too far from finding the best index.
Also, the fine-tuning step makes it possible to improve the final index performance, so we close the gap with a potentially better index key.

But indeed, benchmarking n index keys could give better results. There may be good ways to benchmark several indices quickly without taking too much time! We are open to any suggestions if you want to try it out 😁
