In [7]:
import numpy as np
import faiss

# Step 1: Generate some random data (e.g., 100 vectors of dimension 128)
d = 128  # Dimension of the vectors
nb = 100  # Number of vectors in the database
nq = 10   # Number of query vectors

# Random vectors as database (nb vectors of dimension d)
xb = np.random.random((nb, d)).astype('float32')

# Random vectors as queries (nq vectors of dimension d)
xq = np.random.random((nq, d)).astype('float32')


print(xb.shape)
print(xq.shape)
# Step 2: Create a Faiss index
# IndexFlatL2 is an exact (brute-force) index that ranks neighbors by squared L2 (Euclidean) distance
index = faiss.IndexFlatL2(d)

# Step 3: Add the vectors to the index
index.add(xb)  # Add the database vectors to the index

# Step 4: Perform a search for the nearest neighbors
k = 5  # Number of nearest neighbors to find
D, I = index.search(xq, k)  # D will contain the distances, I will contain the indices

# Step 5: Display the results
print("Distances (D):\n", D)
print("Indices (I):\n", I)

(100, 128)
(10, 128)
Distances (D):
 [[16.995548 17.740063 17.741165 17.803356 17.846693]
 [16.34642  16.887878 17.500933 17.893898 17.894978]
 [15.288368 15.520387 15.683691 16.074154 16.28566 ]
 [15.604023 16.201342 17.199673 17.204807 17.214506]
 [16.742857 17.708885 18.032158 18.127968 18.315731]
 [17.388489 17.580383 17.709646 17.735073 17.801636]
 [16.588623 16.84495  17.018768 17.572897 17.71422 ]
 [15.266108 16.378756 17.153488 17.39001  17.579937]
 [16.03721  16.130058 16.4017   16.436161 16.4394  ]
 [16.28531  16.667662 16.839466 17.104464 17.175203]]
Indices (I):
 [[98  7 92 65 45]
 [46 49 15 95 12]
 [ 7  0 20 44 60]
 [74 39 44 26 94]
 [36 76 22 91 94]
 [43 74 66 39 86]
 [81 87 30 94 45]
 [62 99 27  0 83]
 [86 72 11 45 54]
 [44 22  5 29 10]]


The output above is the result of searching **10 query vectors** (`xq`) against a FAISS index containing **100 database vectors** (`xb`), with FAISS returning the **5 nearest neighbors (k=5)** for each query vector.

---

### Result Breakdown

You got two outputs:

1. **Distances (D)** – these are the squared L2 (Euclidean) distances between each query vector and its nearest neighbors, sorted from smallest to largest within each row.
2. **Indices (I)** – these are the indices (positions) of the nearest neighbors **in your original database (`xb`)**.
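
To make the two outputs concrete, here is a minimal NumPy sketch of what an exact flat L2 search computes. It is a stand-in for illustration, not FAISS's implementation; the helper name `brute_force_search` and the fixed seed are assumptions that do not appear in the notebook.

```python
import numpy as np

def brute_force_search(xb, xq, k):
    """Return (D, I): squared L2 distances and database indices of the k nearest neighbors."""
    # Pairwise differences via broadcasting: shape (nq, nb, d)
    diff = xq[:, None, :] - xb[None, :, :]
    # Squared L2 distance between every query and every database vector: shape (nq, nb)
    sq_dist = (diff ** 2).sum(axis=2)
    # For each query, the indices of the k smallest distances, nearest first
    I = np.argsort(sq_dist, axis=1)[:, :k]
    # Gather the matching distances so D[i, j] belongs to xb[I[i, j]]
    D = np.take_along_axis(sq_dist, I, axis=1)
    return D, I

rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
xb = rng.random((100, 128), dtype=np.float32)
xq = rng.random((10, 128), dtype=np.float32)

D, I = brute_force_search(xb, xq, k=5)
print(D.shape, I.shape)  # (10, 5) (10, 5)
```

Both arrays have one row per query and one column per neighbor, so `D[i, j]` and `I[i, j]` always describe the same match.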

---

### How to Read It

Each row corresponds to one query vector.

Example:

#### First row:
```text
Distances:
[16.995548 17.740063 17.741165 17.803356 17.846693]
Indices:
[98  7 92 65 45]
```

This means, for query vector `xq[0]`:

- Its closest vector in the database is `xb[98]`, at a distance of 17.00.
- The second closest is `xb[7]`, at a distance of 17.74.
- ...and so on, up to the fifth closest, `xb[45]`, at a distance of 17.85.
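
The same reading can be done programmatically. The sketch below pairs each index with its distance for the first row of the output printed by the notebook; `describe_row` is a hypothetical helper introduced only for this illustration.

```python
# First row of D and I, copied from the notebook output
D0 = [16.995548, 17.740063, 17.741165, 17.803356, 17.846693]
I0 = [98, 7, 92, 65, 45]

def describe_row(query_id, distances, indices):
    """Format one result row as human-readable lines, nearest neighbor first."""
    return [
        f"xq[{query_id}] neighbor #{rank}: xb[{idx}] at distance {dist:.2f}"
        for rank, (idx, dist) in enumerate(zip(indices, distances), start=1)
    ]

lines = describe_row(0, D0, I0)
for line in lines:
    print(line)
# First line printed: xq[0] neighbor #1: xb[98] at distance 17.00
```

With the full arrays in memory, the call would be `describe_row(i, D[i], I[i])` for any query `i`.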

Another example:

Query `xq[1]` → top 5 neighbors:

- Indices: `[46 49 15 95 12]`
- Distances: `[16.34642 16.887878 17.500933 17.893898 17.894978]`

So `xb[46]` is the closest match to `xq[1]` (distance 16.35), with `xb[49]` next (16.89); the remaining three neighbors are farther away (17.5+).

---

### What Does the Distance Mean?

`IndexFlatL2` reports the squared L2 (Euclidean) distance, i.e., the sum of squared differences across all 128 dimensions, without taking the square root.

- Lower values = more similar
- Higher values = less similar

In your case, the distances range from ~15.3 to ~18.3, so:

- Anything close to 15 is a comparatively close match
- Anything near 18 is a weaker match (still top 5, but not as close)
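
For a sense of scale, the sketch below estimates what the squared distance between two arbitrary vectors looks like, assuming (as in the code cell) that every coordinate is drawn independently and uniformly from [0, 1). Under that assumption the expected squared difference per dimension is 1/6, so a random pair in 128 dimensions sits around 21.3, and the top-5 values of roughly 15 to 18 are noticeably below that baseline. It also shows how to convert a reported value to a plain Euclidean distance. The seed and sample size are arbitrary choices for illustration.

```python
import numpy as np

d = 128

# For independent U(0, 1) coordinates, E[(u - v)^2] = 1/6 per dimension
expected_sq = d / 6  # about 21.33

# Empirical check on 10,000 random pairs
rng = np.random.default_rng(0)  # seed chosen arbitrarily
a = rng.random((10_000, d))
b = rng.random((10_000, d))
empirical_sq = ((a - b) ** 2).sum(axis=1).mean()

# Converting a reported (squared) value to a plain Euclidean distance
best_sq = 16.995548             # D[0, 0] from the notebook output
best_euclid = np.sqrt(best_sq)  # about 4.12

print(f"expected squared distance for a random pair: {expected_sq:.2f}")
print(f"empirical mean over 10,000 random pairs:     {empirical_sq:.2f}")
print(f"square root of D[0, 0]:                      {best_euclid:.2f}")
```

Because the square root is monotonic, ranking by squared distance gives the same neighbor order as ranking by Euclidean distance; only the scale of the numbers changes.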

---

### Summary

- Each row = 1 query vector's results
- `I[i]` tells you which `xb` vectors are closest to `xq[i]`
- `D[i]` tells you how close they are (lower is better)
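
As a small usage sketch of the points above, the snippet below looks across all queries to find the one with the tightest nearest-neighbor match. The two arrays are the first columns of `D` and `I` copied from the notebook output; with the full arrays in memory they would be `D[:, 0]` and `I[:, 0]`. The variable names are illustrative.

```python
import numpy as np

# Nearest neighbor per query: first column of D and I from the notebook output
nearest_dist = np.array([16.995548, 16.34642, 15.288368, 15.604023, 16.742857,
                         17.388489, 16.588623, 15.266108, 16.03721, 16.28531])
nearest_idx = np.array([98, 46, 7, 74, 36, 43, 81, 62, 86, 44])

# The query whose best match has the smallest distance overall
best_query = int(np.argmin(nearest_dist))
best_neighbor = int(nearest_idx[best_query])

print(f"Tightest match: xq[{best_query}] -> xb[{best_neighbor}] "
      f"(distance {nearest_dist[best_query]:.2f})")
# Tightest match: xq[7] -> xb[62] (distance 15.27)
```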