# Advanced cuVS Usage: IVF-PQ, CAGRA, and Parameter Tuning

This notebook explores advanced cuVS features: the IVF-PQ and CAGRA index types, their build and search parameters, recall evaluation, and performance benchmarks against FAISS.

## 1. Install and Import cuVS

We install cuVS (the CUDA 12 build from NVIDIA's package index) and import the modules we need.

In [None]:
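# cuVS for CUDA 12; CuPy is listed explicitly because the notebook uses it to move data to the GPU
!pip install cuvs-cu12 cupy-cuda12x --extra-index-url=https://pypi.nvidia.com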

import numpy as np
import cupy as cp
from cuvs.common import Resources
from cuvs.neighbors import ivf_pq, cagra
import time
import matplotlib.pyplot as plt

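## 2. Generate and Prepare Data

We generate a synthetic dataset of 50,000 random 128-dimensional vectors and 1,000 query vectors, then L2-normalize every row so that cosine distance, inner product, and Euclidean distance all rank neighbors identically. Random Gaussian vectors have no cluster structure, which makes them a hard case for approximate search, so recall here is typically lower than on real embeddings.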

In [None]:
np.random.seed(42)
n_samples = 50000
dim = 128
dataset = np.random.randn(n_samples, dim).astype(np.float32)
dataset = dataset / np.linalg.norm(dataset, axis=1, keepdims=True)

n_queries = 1000
queries = np.random.randn(n_queries, dim).astype(np.float32)
queries = queries / np.linalg.norm(queries, axis=1, keepdims=True)

print(f"Dataset: {dataset.shape}, Queries: {queries.shape}")
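Because every row is L2-normalized, cosine distance, inner product, and squared Euclidean distance are interchangeable for ranking: for unit vectors $\|x - q\|^2 = 2 - 2\,x \cdot q = 2\,(1 - \cos(x, q))$. The NumPy check below verifies this identity on a small random sample whose sizes are arbitrary assumptions; it is a sanity check, not part of the cuVS workflow.

```python
import numpy as np

# Small random sample; sizes are arbitrary and unrelated to the notebook's data
rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 16))
data /= np.linalg.norm(data, axis=1, keepdims=True)   # unit-length rows
q = rng.standard_normal(16)
q /= np.linalg.norm(q)

dot = data @ q                            # inner product = cosine similarity here
cos_dist = 1.0 - dot                      # cosine distance
sq_l2 = ((data - q) ** 2).sum(axis=1)     # squared Euclidean distance

# Identity for unit vectors: ||x - q||^2 = 2 * (1 - x.q)
identity_holds = np.allclose(sq_l2, 2.0 * cos_dist)

# All three measures select the same nearest neighbors
top_k = 10
top_ip = set(np.argsort(-dot)[:top_k].tolist())
top_cos = set(np.argsort(cos_dist)[:top_k].tolist())
top_l2 = set(np.argsort(sq_l2)[:top_k].tolist())
print(identity_holds, top_ip == top_cos == top_l2)
```

This is why the ground truth in Section 5 can use cosine distance while an index uses an L2 or inner-product metric.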

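## 3. Build the Vector Indexes

We build an IVF-PQ index and a CAGRA index over the same data. For IVF-PQ, `n_lists` is the number of clusters (inverted lists), `pq_dim` is the number of sub-vectors each vector is compressed into, and `pq_bits` is the number of bits per sub-vector code. For CAGRA, `intermediate_graph_degree` is the out-degree of the initial k-NN graph and `graph_degree` is the out-degree of the pruned graph kept for search.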
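Product quantization (PQ) is what keeps the IVF-PQ index small. The IVF-PQ cell in this section uses `dim = 128`, `pq_dim = 64`, and `pq_bits = 8`: each vector is split into 64 sub-vectors of 2 dimensions, and each sub-vector is replaced by the id of its nearest centroid in a 2^8 = 256-entry codebook, so a vector costs 64 bytes of codes instead of 512 bytes of float32 (8x smaller, ignoring codebooks and list metadata). The NumPy sketch below is textbook PQ with plain k-means on a toy problem. The helper names (`kmeans`, `pq_train`, `pq_encode`, `pq_decode`) are hypothetical, and cuVS's training and encoding differ in detail; for example, IVF-PQ indexes typically encode the residual relative to the assigned list centroid, which this sketch omits.

```python
import numpy as np

def kmeans(x, n_clusters, n_iter=10, seed=0):
    """Plain Lloyd's k-means; returns an (n_clusters, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = ((x[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
        sums = np.zeros_like(centroids)
        np.add.at(sums, labels, x)                      # per-cluster sums
        counts = np.bincount(labels, minlength=n_clusters)
        filled = counts > 0                             # empty clusters keep old centroid
        centroids[filled] = sums[filled] / counts[filled][:, None]
    return centroids

def pq_train(x, pq_dim, pq_bits):
    """One codebook of 2**pq_bits centroids per sub-space (assumes dim % pq_dim == 0)."""
    sub = x.shape[1] // pq_dim
    return [kmeans(x[:, j * sub:(j + 1) * sub], 2 ** pq_bits, seed=j) for j in range(pq_dim)]

def pq_encode(x, codebooks):
    """Replace each sub-vector by the id of its nearest codebook centroid."""
    sub = codebooks[0].shape[1]
    codes = np.empty((len(x), len(codebooks)), dtype=np.uint8)  # valid for pq_bits <= 8
    for j, cb in enumerate(codebooks):
        part = x[:, j * sub:(j + 1) * sub]
        codes[:, j] = ((part[:, None, :] - cb[None]) ** 2).sum(-1).argmin(1)
    return codes

def pq_decode(codes, codebooks):
    """Approximate reconstruction: concatenate the selected centroids."""
    return np.hstack([cb[codes[:, j]] for j, cb in enumerate(codebooks)])

# Toy settings, much smaller than the notebook's: dim=16, 8 sub-vectors, 4-bit codes
rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 16))
codebooks = pq_train(x, pq_dim=8, pq_bits=4)
codes = pq_encode(x, codebooks)
rel_err = np.linalg.norm(x - pq_decode(codes, codebooks)) / np.linalg.norm(x)
print(f"codes {codes.shape}, relative reconstruction error {rel_err:.3f}")

# Storage per vector for the notebook's configuration (PQ codes only)
cfg_dim, cfg_pq_dim, cfg_pq_bits = 128, 64, 8
raw_bytes = cfg_dim * 4                       # float32 vector
code_bytes = cfg_pq_dim * cfg_pq_bits // 8    # packed PQ codes
print(raw_bytes, code_bytes, raw_bytes / code_bytes)  # 512 64 8.0
```

Larger `pq_dim` or `pq_bits` lowers the reconstruction error (and usually raises recall) at the cost of more memory per vector.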

In [None]:
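resources = Resources()

# Copy the data to the GPU once, outside the timed regions
dataset_gpu = cp.asarray(dataset)

# The vectors are L2-normalized, so squared Euclidean distance ranks neighbors
# exactly like cosine distance. "sqeuclidean" is the default metric of both
# index types, whereas "cosine" may not be accepted by every index type in
# every cuVS release.
metric = "sqeuclidean"

# IVF-PQ
pq_dim = 64
build_params_ivf_pq = ivf_pq.IndexParams(
    n_lists=1024,
    metric=metric,
    pq_dim=pq_dim,
    pq_bits=8,
)
start = time.perf_counter()
index_ivf_pq = ivf_pq.build(build_params_ivf_pq, dataset_gpu, resources=resources)
resources.sync()  # calls that receive a Resources handle are asynchronous
ivf_pq_build_time = time.perf_counter() - start
print(f"IVF-PQ index built in {ivf_pq_build_time:.2f}s")

# CAGRA
build_params_cagra = cagra.IndexParams(
    metric=metric, intermediate_graph_degree=64, graph_degree=32
)
start = time.perf_counter()
index_cagra = cagra.build(build_params_cagra, dataset_gpu, resources=resources)
resources.sync()
cagra_build_time = time.perf_counter() - start
print(f"CAGRA index built in {cagra_build_time:.2f}s")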
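CAGRA is a graph index: `intermediate_graph_degree = 64` is the out-degree of the initial k-NN graph, which is then optimized down to `graph_degree = 32` edges per node, and search walks this graph while keeping an internal candidate list of `itopk_size` entries. The NumPy sketch below mimics that pipeline on toy data to show what the parameters control. It is not CAGRA's algorithm: the exact k-NN graph, the pruning rule (prefer edges with no two-hop detour through a closer neighbor), the random entry points, and all function names (`exact_knn_graph`, `prune_graph`, `graph_search`) are simplifying assumptions chosen for illustration.

```python
import numpy as np

def exact_knn_graph(data, degree):
    """Brute-force k-NN graph: row i lists the `degree` nearest other points."""
    d2 = ((data[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                     # no self-loops
    return np.argsort(d2, axis=1)[:, :degree]

def prune_graph(knn, graph_degree):
    """Toy heuristic: keep `graph_degree` edges per node, preferring edges i->j
    that cannot be replaced by a detour i->m->j through a closer neighbor m."""
    n, deg = knn.shape
    rank = [dict(zip(row.tolist(), range(deg))) for row in knn]  # rank[m][j]
    pruned = np.empty((n, graph_degree), dtype=knn.dtype)
    for i in range(n):
        scored = []
        for r, j in enumerate(knn[i].tolist()):
            detours = sum(1 for m in knn[i, :r].tolist() if rank[m].get(j, deg) < r)
            scored.append((detours, r, j))
        scored.sort()                                  # fewest detours, then closest
        pruned[i] = [j for _, _, j in scored[:graph_degree]]
    return pruned

def graph_search(data, graph, q, k, itopk_size, n_seeds=16, seed=0):
    """Best-first search keeping at most `itopk_size` candidates (needs itopk_size >= k)."""
    def dist(ids):
        return ((data[ids] - q) ** 2).sum(axis=1).tolist()

    rng = np.random.default_rng(seed)
    seeds = rng.choice(len(data), n_seeds, replace=False).tolist()
    visited, expanded = set(seeds), set()
    top = sorted(zip(dist(seeds), seeds))[:itopk_size]
    while True:
        # Expand the closest candidate that has not been expanded yet
        parent = next((node for _, node in top if node not in expanded), None)
        if parent is None:
            break
        expanded.add(parent)
        new = [j for j in graph[parent].tolist() if j not in visited]
        visited.update(new)
        if new:
            top = sorted(top + list(zip(dist(new), new)))[:itopk_size]
    return [node for _, node in top[:k]]

rng = np.random.default_rng(0)
data = rng.standard_normal((500, 8))
q = rng.standard_normal(8)
graph = prune_graph(exact_knn_graph(data, 32), graph_degree=16)
found = graph_search(data, graph, q, k=10, itopk_size=64)
exact_ids = np.argsort(((data - q) ** 2).sum(axis=1))[:10].tolist()
print("recall@10:", len(set(found) & set(exact_ids)) / 10)
```

A larger `itopk_size` keeps more candidates alive, so the walk explores more of the graph before stopping, which is the recall-versus-speed trade-off exposed by `cagra.SearchParams`.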

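## 4. Run Vector Search

We query both indexes for the k = 10 nearest neighbors of every query. `n_probes` sets how many IVF lists are scanned per query, and `itopk_size` sets how many intermediate results CAGRA keeps during its graph traversal; raising either usually improves recall at the cost of speed.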

In [None]:
k = 10
queries_gpu = cp.asarray(queries)  # copy once, outside the timed regions

# IVF-PQ search: scan the 20 closest of the 1024 lists for each query
search_params_ivf_pq = ivf_pq.SearchParams(n_probes=20)
start = time.perf_counter()
dist_ivf_pq, neigh_ivf_pq = ivf_pq.search(
    search_params_ivf_pq, index_ivf_pq, queries_gpu, k=k, resources=resources
)
resources.sync()
ivf_pq_search_time = time.perf_counter() - start

# CAGRA search: keep 64 intermediate candidates during the graph traversal
search_params_cagra = cagra.SearchParams(itopk_size=64)
start = time.perf_counter()
dist_cagra, neigh_cagra = cagra.search(
    search_params_cagra, index_cagra, queries_gpu, k=k, resources=resources
)
resources.sync()
cagra_search_time = time.perf_counter() - start

# Single cold runs: these timings may include one-time warm-up costs
print(f"IVF-PQ search: {ivf_pq_search_time:.4f}s")
print(f"CAGRA search: {cagra_search_time:.4f}s")
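The `n_probes` parameter comes from the inverted-file (IVF) half of IVF-PQ: at build time the data is clustered into `n_lists` lists (1024 here), and at query time only the `n_probes` lists whose centroids are closest to the query (20 here) are scanned. The NumPy sketch below shows that two-step search with exact distances inside the probed lists and no PQ compression; the names `kmeans`, `ivf_build`, and `ivf_search` and the toy sizes are assumptions for illustration. Because the probed lists for a smaller `n_probes` are a subset of those for a larger one, recall can only grow as `n_probes` increases, and probing every list reproduces exact search.

```python
import numpy as np

def kmeans(x, n_clusters, n_iter=10, seed=0):
    """Plain Lloyd's k-means; returns an (n_clusters, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = ((x[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
        sums = np.zeros_like(centroids)
        np.add.at(sums, labels, x)
        counts = np.bincount(labels, minlength=n_clusters)
        filled = counts > 0                          # empty clusters keep old centroid
        centroids[filled] = sums[filled] / counts[filled][:, None]
    return centroids

def ivf_build(data, n_lists):
    """Cluster the data and store the row ids of each cluster (inverted lists)."""
    centroids = kmeans(data, n_lists)
    labels = ((data[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
    return centroids, [np.flatnonzero(labels == c) for c in range(n_lists)]

def ivf_search(data, centroids, lists, q, k, n_probes):
    # Coarse step: choose the n_probes lists with the closest centroids
    probed = np.argsort(((centroids - q) ** 2).sum(axis=1))[:n_probes]
    # Fine step: exact distances, but only for vectors in the probed lists
    cand = np.concatenate([lists[c] for c in probed])
    d = ((data[cand] - q) ** 2).sum(axis=1)
    return cand[np.argsort(d)[:k]]

rng = np.random.default_rng(0)
data = rng.standard_normal((2000, 16))
qs = rng.standard_normal((50, 16))
centroids, lists = ivf_build(data, n_lists=32)

top_k = 10
truth = [set(np.argsort(((data - q) ** 2).sum(axis=1))[:top_k].tolist()) for q in qs]
recalls = {}
for n_probes in (1, 4, 16, 32):
    hits = sum(len(set(ivf_search(data, centroids, lists, q, top_k, n_probes).tolist()) & t)
               for q, t in zip(qs, truth))
    recalls[n_probes] = hits / (top_k * len(qs))
print(recalls)
```

In the real index the fine step compares PQ codes rather than raw vectors, so recall stays below 1.0 even when every list is probed.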

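## 5. Evaluate Search Accuracy

We measure Recall@k: for each query, the fraction of its true k nearest neighbors (computed by exact brute-force search) that appear among the k returned results, averaged over all queries.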

In [None]:
from sklearn.metrics.pairwise import cosine_distances

# Exact ground truth by brute force on the CPU (1,000 x 50,000 distance matrix)
exact_distances = cosine_distances(queries, dataset)
exact_neighbors = np.argsort(exact_distances, axis=1)[:, :k]

def recall_at_k(pred, true, k):
    """Mean fraction of the true top-k neighbors present in the predicted top-k."""
    recall = 0.0
    for i in range(len(pred)):
        recall += len(set(pred[i]) & set(true[i])) / k
    return recall / len(pred)

# cuVS returns device arrays: wrap them with CuPy, then copy to the host
recall_ivf_pq = recall_at_k(cp.asarray(neigh_ivf_pq).get(), exact_neighbors, k)
recall_cagra = recall_at_k(cp.asarray(neigh_cagra).get(), exact_neighbors, k)

print(f"IVF-PQ Recall@{k}: {recall_ivf_pq:.4f}")
print(f"CAGRA Recall@{k}: {recall_cagra:.4f}")
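The loop-based `recall_at_k` above is clear but slow for large query sets, and the full `np.argsort` over 50,000 distances per query does more work than needed. A vectorized alternative, sketched below under the assumption that rows are L2-normalized (so the largest dot products are the nearest cosine neighbors), uses `np.argpartition` to select the top k and compares ids by broadcasting. The helper names are hypothetical; unlike the set-based version, this one counts a duplicated id twice, which does not matter for indexes that return distinct neighbors.

```python
import numpy as np

def exact_knn_cosine(queries, dataset, k):
    """Exact top-k by cosine similarity for L2-normalized rows, nearest first."""
    sims = queries @ dataset.T
    part = np.argpartition(-sims, k - 1, axis=1)[:, :k]         # unordered top-k
    order = np.argsort(-np.take_along_axis(sims, part, axis=1), axis=1)
    return np.take_along_axis(part, order, axis=1)

def recall_at_k_vectorized(pred, true, k):
    """Mean fraction of each row's true top-k ids found in its predicted top-k."""
    pred, true = np.asarray(pred)[:, :k], np.asarray(true)[:, :k]
    hits = (pred[:, :, None] == true[:, None, :]).any(axis=2).sum(axis=1)
    return float(hits.mean() / k)

# Toy check against the full-sort approach (sizes are arbitrary)
rng = np.random.default_rng(0)
db = rng.standard_normal((5000, 32))
db /= np.linalg.norm(db, axis=1, keepdims=True)
qs = rng.standard_normal((100, 32))
qs /= np.linalg.norm(qs, axis=1, keepdims=True)

top_k = 10
fast = exact_knn_cosine(qs, db, top_k)
slow = np.argsort(1.0 - qs @ db.T, axis=1)[:, :top_k]
print(recall_at_k_vectorized(fast, slow, top_k))
```

With the notebook's normalized arrays, `exact_knn_cosine(queries, dataset, k)` could stand in for the `cosine_distances` plus `argsort` pair.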

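## 6. Performance Benchmark

We summarize the build and search times measured above and convert search time into throughput, QPS = number of queries / search time. Each number comes from a single cold run over one batch of 1,000 queries, so treat it as a rough indicator rather than a stable benchmark.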

In [None]:
print(f"IVF-PQ Build Time: {ivf_pq_build_time:.2f}s")
print(f"CAGRA Build Time: {cagra_build_time:.2f}s")
print(f"IVF-PQ Search Time: {ivf_pq_search_time:.4f}s ({n_queries / ivf_pq_search_time:.0f} QPS)")
print(f"CAGRA Search Time: {cagra_search_time:.4f}s ({n_queries / cagra_search_time:.0f} QPS)")
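Single cold runs can be dominated by one-time costs and timer noise. A common remedy is to run a few untimed warm-up calls, then time several repetitions and report the median. The helper below is a minimal, library-agnostic sketch of that pattern; the name `benchmark` and the default counts are our assumptions. The callable must block until its work is done: for cuVS that means calling `resources.sync()` inside it, because calls that receive a `Resources` handle return before the GPU finishes.

```python
import statistics
import time

def benchmark(fn, n_queries, warmup=2, repeats=5):
    """Return (median seconds per call, queries per second) for a blocking callable."""
    for _ in range(warmup):          # untimed runs absorb one-time setup costs
        fn()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    median = statistics.median(times)
    return median, n_queries / median

# Hypothetical use with the notebook's objects (not executed here):
# queries_gpu = cp.asarray(queries)            # copy once, outside the timing
# def run_cagra():
#     cagra.search(search_params_cagra, index_cagra, queries_gpu, k=k, resources=resources)
#     resources.sync()                         # wait for the GPU to finish
# t, qps = benchmark(run_cagra, n_queries)

def workload():
    """Stand-in CPU workload so the sketch runs anywhere."""
    return sorted(range(100_000), key=lambda x: -x)

seconds, qps = benchmark(workload, n_queries=1000)
print(f"median {seconds:.4f}s, {qps:.0f} QPS")
```

The median is less sensitive than the mean to an occasional slow repetition.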

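## 7. Compare with Alternatives

We compare against FAISS's `IndexIVFPQ` configured with the same number of lists (1024), sub-quantizers (64), and bits per code (8). As written, the FAISS index is trained and searched on the CPU, so its timing is a CPU baseline rather than a like-for-like GPU comparison.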

In [None]:
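# FAISS's install guide recommends conda for GPU builds; the faiss-cpu wheel
# from PyPI is sufficient for the CPU index used here
!pip install faiss-cpu
import faiss

# FAISS IVF-PQ: 1024 lists, pq_dim sub-quantizers, 8 bits per code.
# The vectors are unit-normalized, so the L2 metric (the IndexIVFPQ default)
# ranks neighbors like cosine; the coarse quantizer uses the same metric.
quantizer = faiss.IndexFlatL2(dim)
index_faiss = faiss.IndexIVFPQ(quantizer, dim, 1024, pq_dim, 8)
index_faiss.train(dataset)
index_faiss.add(dataset)
index_faiss.nprobe = 20  # FAISS defaults to nprobe = 1; match cuVS n_probes=20

start = time.perf_counter()
dist_faiss, neigh_faiss = index_faiss.search(queries, k)
faiss_search_time = time.perf_counter() - start

recall_faiss = recall_at_k(neigh_faiss, exact_neighbors, k)

print(f"FAISS IVF-PQ Search Time: {faiss_search_time:.4f}s")
print(f"FAISS Recall: {recall_faiss:.4f}")
print(f"cuVS IVF-PQ Recall: {recall_ivf_pq:.4f}")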

## 8. Visualize Results

We plot Recall@10 and search time for each algorithm as side-by-side bar charts.

In [None]:
algorithms = ['IVF-PQ', 'CAGRA', 'FAISS IVF-PQ']
recalls = [recall_ivf_pq, recall_cagra, recall_faiss]
times = [ivf_pq_search_time, cagra_search_time, faiss_search_time]

plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.bar(algorithms, recalls)
plt.ylabel('Recall@10')
plt.title('Accuracy Comparison')

plt.subplot(1, 2, 2)
plt.bar(algorithms, times)
plt.ylabel('Search Time (s)')
plt.title('Performance Comparison')
plt.show()
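The bar charts show one operating point per algorithm, but each index actually offers a trade-off curve: sweeping `n_probes` (IVF-PQ) or `itopk_size` (CAGRA) yields many (recall, QPS) pairs, and a common way to compare them is a recall-versus-QPS plot restricted to the Pareto frontier, the settings that no other setting beats on both axes. The helpers below sketch that selection step plus picking the fastest setting that meets a recall target; the function names are hypothetical, and the numbers in the example are placeholders, not measurements from this notebook.

```python
def pareto_frontier(points):
    """points: (label, recall, qps) tuples; return the non-dominated ones,
    ordered from highest recall to highest QPS."""
    frontier, best_qps = [], float("-inf")
    for label, recall, qps in sorted(points, key=lambda p: (-p[1], -p[2])):
        if qps > best_qps:                 # faster than every higher-recall point
            frontier.append((label, recall, qps))
            best_qps = qps
    return frontier

def fastest_at_recall(points, target):
    """Highest-QPS point whose recall reaches `target`, or None."""
    eligible = [p for p in points if p[1] >= target]
    return max(eligible, key=lambda p: p[2]) if eligible else None

# Placeholder values for illustration only (not results from this notebook)
sweep = [("A", 0.60, 9000.0), ("B", 0.75, 6000.0), ("C", 0.85, 3500.0), ("D", 0.84, 2000.0)]
print([p[0] for p in pareto_frontier(sweep)])   # ['C', 'B', 'A']
print(fastest_at_recall(sweep, 0.7))            # ('B', 0.75, 6000.0)
```

To obtain real points, one could rerun the Section 4 searches for several `n_probes` or `itopk_size` values and record recall and QPS for each.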