## FAISS

### Considerations
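- Normalize your vectors to unit length if you want cosine similarity: on unit vectors the inner product equals the cosine similarity, and L2 distance ranks neighbors in the same order.
- Tune the index type and its parameters to your speed vs. accuracy requirements; `IndexFlatL2` is exact but compares the query against every stored vector.
- Be mindful of memory usage, especially with very large datasets: a flat index stores every vector uncompressed (4 × dim bytes per float32 vector).
- If your dataset is too large to handle in memory, consider using FAISS's on-disk indexing, or cluster the dataset into smaller chunks and search them separately.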
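
The normalization point can be checked with plain NumPy. The sketch below uses made-up random data and a hypothetical helper `normalize_rows`; it only illustrates why an L2 search on unit vectors gives the same ranking as cosine similarity. In FAISS itself, `faiss.normalize_L2` normalizes a float32 array in place and `faiss.IndexFlatIP` searches by inner product.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.random((5, 8), dtype=np.float32)  # hypothetical example data
query = rng.random((1, 8), dtype=np.float32)

def normalize_rows(x):
    """Scale each row to unit L2 norm (returns a new array)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)  # guard against zero vectors

unit_vectors = normalize_rows(vectors)
unit_query = normalize_rows(query)

# On unit vectors the inner product is the cosine similarity ...
cosine = unit_vectors @ unit_query.T  # shape (5, 1)
# ... and the squared L2 distance equals 2 - 2 * cosine.
sq_l2 = ((unit_vectors - unit_query) ** 2).sum(axis=1, keepdims=True)

# So the nearest vector by L2 is also the most similar by cosine.
nearest_by_l2 = int(np.argmin(sq_l2))
nearest_by_cosine = int(np.argmax(cosine))
```

Because the two rankings agree, the `IndexFlatL2` index used in this notebook can serve cosine-similarity queries as long as both the indexed vectors and the queries are normalized first.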

In [None]:
%pip install faiss-cpu  # CPU build
# For GPU support, install the GPU build instead of the CPU build:
# %pip install faiss-gpu


In [None]:
import numpy as np

dim = 128  # Example value; set this to your vector dimension
size_small_set = 20  # Number of vectors in the small set
size_large_set = 14000  # Number of vectors in the large set

# Example data: random float32 vectors (FAISS expects float32 input)
small_set = np.random.rand(size_small_set, dim).astype('float32')
large_set = np.random.rand(size_large_set, dim).astype('float32')


In [None]:
import faiss

# Create a FAISS index
index = faiss.IndexFlatL2(dim)  # Exact (brute-force) search with L2 distance; 'dim' is the vector dimension

# Add the large set to the index
index.add(large_set)


### k-NN Search for the Small Set

To find, for each vector in the small set, its closest vectors in the large set, call the index's `search` method.

In [None]:
k = 1  # Number of nearest neighbors to find

# Find the k nearest neighbors of each vector in small_set
# D holds squared L2 distances; I holds the row indices of the neighbors in large_set
D, I = index.search(small_set, k)


`D` and `I` are arrays of shape `(size_small_set, k)`: `D` holds the squared L2 distances and `I` holds the row indices of the closest vectors in `large_set`, sorted nearest first.
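
As a sanity check, the same result can be reproduced by brute force in NumPy. The sketch below uses small made-up arrays (`queries`, `database`) rather than the notebook's data, and the names `D_ref` and `I_ref` are illustrative. On real data, FAISS output may differ in the last floating-point digits.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16  # hypothetical dimension for this sketch
queries = rng.random((4, dim), dtype=np.float32)
database = rng.random((200, dim), dtype=np.float32)
k = 3

# Squared L2 distance from every query to every database vector, shape (4, 200)
diff = queries[:, np.newaxis, :] - database[np.newaxis, :, :]
sq_dists = (diff ** 2).sum(axis=2)

# Indices of the k smallest distances per query, sorted nearest first
I_ref = np.argsort(sq_dists, axis=1)[:, :k]
D_ref = np.take_along_axis(sq_dists, I_ref, axis=1)

# Euclidean distances, if needed, are the square roots of the squared ones
euclidean = np.sqrt(D_ref)

# Gather the nearest database vector for each query
nearest_vectors = database[I_ref[:, 0]]
```

Here `I_ref` and `D_ref` play the roles of `I` and `D` from `index.search`, which makes it easy to compare the two on a small sample before trusting an index on the full dataset.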

### Searching for Furthest Vectors

The index's `search` method returns nearest neighbors, so the cell below finds the furthest vector in `large_set` for each vector in `small_set` by computing all pairwise distances with NumPy.

In [None]:
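# Calculate Euclidean distances between all pairs, shape (size_large_set, size_small_set).
# The broadcast builds a (size_large_set, size_small_set, dim) temporary, so this is
# inefficient for very large datasets.
all_distances = np.linalg.norm(large_set[:, np.newaxis, :] - small_set[np.newaxis, :, :], axis=2)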

# Get the index of the furthest vector for each vector in the small set
furthest_indices = np.argmax(all_distances, axis=0)

# Get the actual distances for the furthest vectors
furthest_distances = np.max(all_distances, axis=0)
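
If the three-dimensional temporary array becomes too large, one possible alternative is to expand the squared distance as ‖q‖² + ‖x‖² − 2 q·x, which needs only a two-dimensional matrix. The sketch below is a swapped-in variant on made-up data, not the method used above, and `furthest_neighbors` is a hypothetical helper name.

```python
import numpy as np

def furthest_neighbors(queries, database):
    """Return (indices, distances) of the furthest database vector per query."""
    q_sq = (queries ** 2).sum(axis=1)[:, np.newaxis]   # shape (n_queries, 1)
    d_sq = (database ** 2).sum(axis=1)[np.newaxis, :]  # shape (1, n_database)
    # Squared distances via the expansion; only an (n_queries, n_database) matrix
    sq_dists = q_sq + d_sq - 2.0 * (queries @ database.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from rounding
    idx = np.argmax(sq_dists, axis=1)
    dist = np.sqrt(sq_dists[np.arange(len(queries)), idx])
    return idx, dist

rng = np.random.default_rng(0)
queries = rng.random((20, 32)).astype('float32')     # stands in for small_set
database = rng.random((1000, 32)).astype('float32')  # stands in for large_set
furthest_idx, furthest_dist = furthest_neighbors(queries, database)
```

The expansion trades a little floating-point accuracy for memory, so results can differ from the direct computation in the last few digits.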
