# S12: Vector Databases & Production

## 🎯 Objectives
- Compare FAISS and Milvus for vector storage
- Measure latency and recall metrics
- Understand production tradeoffs (scalability, persistence)
- Benchmark different index configurations

## 📋 Contents
1. Introduction to vector databases
2. FAISS: optimized local index
3. Milvus: distributed vector database
4. Performance comparison
5. Architecture choices for production

## 1. Installation and Configuration

In [None]:
# Install dependencies
# !pip install faiss-cpu pymilvus sentence-transformers pandas numpy matplotlib scikit-learn

In [None]:
import os
import time
import numpy as np
import pandas as pd
import faiss
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType, utility
from sentence_transformers import SentenceTransformer
import matplotlib.pyplot as plt
from typing import List, Tuple, Dict
import warnings
warnings.filterwarnings('ignore')

print("✅ Libraries imported")

## 2. Creating the Test Dataset

We build a synthetic dataset of 10,000 short documents to compare index performance.

In [None]:
# Load the embedding model
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
print(f"✅ Model loaded: dimension = {model.get_sentence_embedding_dimension()}")

In [None]:
# Create a dataset of 10,000 simulated documents
# Note: only 5 * 15 * 9 = 675 distinct sentences exist, so the dataset contains many duplicates
np.random.seed(42)

# Document templates
templates = [
    "{topic} is essential for {domain}.",
    "In the field of {domain}, {topic} is widely used.",
    "{topic} techniques improve {domain}.",
    "For {domain}, it is important to master {topic}.",
    "Applying {topic} to {domain} shows promising results.",
]

topics = [
    "machine learning", "deep learning", "transformers", "embeddings",
    "RAG", "fine-tuning", "prompt engineering", "vector search",
    "attention mechanism", "tokenization", "NLP", "computer vision",
    "reinforcement learning", "neural networks", "optimization"
]

domains = [
    "artificial intelligence", "data science", "information retrieval",
    "natural language processing", "computer vision", "recommendation",
    "predictive analytics", "classification", "text generation"
]

documents = []
for i in range(10000):
    template = np.random.choice(templates)
    topic = np.random.choice(topics)
    domain = np.random.choice(domains)
    text = template.format(topic=topic, domain=domain)
    documents.append({
        "id": i,
        "text": text,
        "topic": topic,
        "domain": domain
    })

df = pd.DataFrame(documents)
print(f"📊 Dataset created: {len(df)} documents")
print(f"Example: {df.iloc[0]['text']}")

In [None]:
# Generate the embeddings (may take a few minutes)
print("🔄 Generating embeddings...")
start = time.time()
embeddings = model.encode(df['text'].tolist(), show_progress_bar=True, batch_size=256)
embeddings = np.array(embeddings).astype('float32')
duration = time.time() - start

print(f"✅ Embeddings generated in {duration:.2f}s")
print(f"Shape: {embeddings.shape}")
print(f"Memory: {embeddings.nbytes / 1024 / 1024:.2f} MB")
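
The memory figure printed by this cell follows directly from the array shape: each float32 value takes 4 bytes. As a rough sketch (the helper name `embedding_memory_mib` is hypothetical, and the dimension of 384 assumes the MiniLM model loaded earlier), the raw footprint for other dataset sizes can be estimated before generating any embeddings:

```python
def embedding_memory_mib(n_vectors: int, dimension: int, bytes_per_value: int = 4) -> float:
    """Raw size in MiB of n_vectors dense vectors (float32 by default)."""
    return n_vectors * dimension * bytes_per_value / (1024 ** 2)


# Assumption: 384 dimensions, as reported by the MiniLM model used in this notebook
for n in (10_000, 100_000, 10_000_000):
    print(f"{n:>10,} vectors: {embedding_memory_mib(n, 384):,.1f} MiB")
```

This counts only the raw vectors; index structures such as inverted lists or graph links add to it.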

## 3. FAISS: Local Index

### 3.1 Flat Index (baseline)

In [None]:
class FAISSIndex:
    def __init__(self, dimension: int, index_type: str = 'flat'):
        self.dimension = dimension
        self.index_type = index_type

        if index_type == 'flat':
            # Exact (brute-force) search; used as the recall baseline
            self.index = faiss.IndexFlatL2(dimension)
        elif index_type == 'ivf':
            # IVF index: faster search on large datasets
            quantizer = faiss.IndexFlatL2(dimension)
            nlist = 100  # Number of clusters (inverted lists)
            self.index = faiss.IndexIVFFlat(quantizer, dimension, nlist)
        elif index_type == 'hnsw':
            # HNSW index: graph-based approximate search
            M = 32  # Number of neighbors per node in the graph
            self.index = faiss.IndexHNSWFlat(dimension, M)
        else:
            raise ValueError(f"Unsupported index type: {index_type}")

        self.documents = []

    def train(self, vectors: np.ndarray):
        """Train the index (required for IVF only)"""
        if self.index_type == 'ivf':
            print("🔄 Training the IVF index...")
            self.index.train(vectors)

    def add(self, vectors: np.ndarray, documents: List[Dict]):
        """Add vectors to the index"""
        self.index.add(vectors)
        self.documents.extend(documents)

    def search(self, query_vector: np.ndarray, k: int = 5) -> Tuple[np.ndarray, np.ndarray, float]:
        """Search for the k nearest neighbors; returns (distances, indices, latency in ms)"""
        start = time.perf_counter()

        if self.index_type == 'ivf':
            # nprobe: number of clusters to scan at query time
            self.index.nprobe = 10

        distances, indices = self.index.search(query_vector, k)
        latency = (time.perf_counter() - start) * 1000  # in ms

        return distances, indices, latency

    def get_stats(self) -> Dict:
        return {
            "index_type": self.index_type,
            "dimension": self.dimension,
            "total_vectors": self.index.ntotal,
            "is_trained": self.index.is_trained
        }

print("✅ FAISSIndex class defined")
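
To make the baseline concrete, here is a minimal NumPy sketch of what an exact L2 search computes: the squared Euclidean distance from the query to every stored vector, followed by a top-k selection. FAISS's `IndexFlatL2` also reports squared distances, but its implementation is heavily optimized; the function name `brute_force_l2` and the toy data are illustrative assumptions.

```python
import numpy as np


def brute_force_l2(vectors: np.ndarray, query: np.ndarray, k: int = 5):
    """Exact k-nearest-neighbor search using squared L2 distances."""
    diffs = vectors - query                # broadcast the query against every row
    sq_dists = (diffs ** 2).sum(axis=1)    # squared L2 distance per stored vector
    top_k = np.argsort(sq_dists)[:k]       # indices of the k smallest distances
    return sq_dists[top_k], top_k


# Toy data: four 2-D points and one query
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]], dtype='float32')
distances, indices = brute_force_l2(points, np.array([0.9, 0.1], dtype='float32'), k=2)
print(indices, distances)
```

Every query touches every vector, which is why the cost grows linearly with the dataset size and why IVF and HNSW trade some accuracy to avoid the full scan.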

In [None]:
# Build each FAISS index type
index_types = ['flat', 'ivf', 'hnsw']
faiss_indexes = {}

for idx_type in index_types:
    print(f"\n{'='*60}")
    print(f"Creating FAISS index: {idx_type.upper()}")
    print(f"{'='*60}")

    index = FAISSIndex(dimension=embeddings.shape[1], index_type=idx_type)

    # Training (a no-op for index types that do not need it)
    start = time.time()
    index.train(embeddings)
    train_time = time.time() - start

    # Indexing
    start = time.time()
    index.add(embeddings, documents)
    index_time = time.time() - start

    print(f"⏱️  Training time: {train_time:.2f}s")
    print(f"⏱️  Indexing time: {index_time:.2f}s")
    print(f"📊 Stats: {index.get_stats()}")

    faiss_indexes[idx_type] = index

print("\n✅ All FAISS indexes created")

### 3.2 FAISS Benchmark

In [None]:
# Create test queries
test_queries = [
    "How can transformers be used for NLP?",
    "What is machine learning?",
    "Neural network optimization",
    "Applications of deep learning in vision",
    "Vector search with embeddings"
]

query_embeddings = model.encode(test_queries)
query_embeddings = np.array(query_embeddings).astype('float32')

print(f"✅ {len(test_queries)} test queries created")

In [None]:
# Benchmark each FAISS index
# Note: with only 5 queries, P95 and P99 are close to the maximum latency;
# treat them as indicative rather than statistically meaningful.
k = 10  # Top-10
faiss_results = {}

for idx_type, index in faiss_indexes.items():
    print(f"\n{'='*60}")
    print(f"Benchmark FAISS-{idx_type.upper()}")
    print(f"{'='*60}")

    latencies = []

    for i, query in enumerate(query_embeddings):
        distances, indices, latency = index.search(query.reshape(1, -1), k)
        latencies.append(latency)

        if i == 0:  # Show the results of the first query
            print(f"\nQuery: '{test_queries[i]}'")
            print("Top-3 results:")
            for j in range(min(3, len(indices[0]))):
                idx = indices[0][j]
                dist = distances[0][j]
                print(f"  {j+1}. (distance={dist:.4f}) {documents[idx]['text'][:80]}...")

    avg_latency = np.mean(latencies)
    p95_latency = np.percentile(latencies, 95)
    p99_latency = np.percentile(latencies, 99)

    print(f"\n📊 Mean latency: {avg_latency:.2f} ms")
    print(f"📊 P95 latency: {p95_latency:.2f} ms")
    print(f"📊 P99 latency: {p99_latency:.2f} ms")
    
    faiss_results[idx_type] = {
        "avg_latency": avg_latency,
        "p95_latency": p95_latency,
        "p99_latency": p99_latency,
        "latencies": latencies
    }
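
With only five queries, single-shot timings are noisy, and the first call often includes one-off costs. A possible way to get more stable numbers is sketched here: run a few warm-up calls, repeat the timed call many times with `time.perf_counter`, and report percentiles. The helper `benchmark_latency` and its parameters are assumptions for illustration; `search_fn` stands in for any call such as `index.search(...)`.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_latency(search_fn: Callable[[], object], n_warmup: int = 5, n_runs: int = 200) -> Dict[str, float]:
    """Time repeated calls to search_fn and summarize latencies in milliseconds."""
    for _ in range(n_warmup):          # warm-up calls are not recorded
        search_fn()

    latencies_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        search_fn()
        latencies_ms.append((time.perf_counter() - start) * 1000)

    # quantiles(n=100) returns the 99 cut points P1..P99
    cuts = statistics.quantiles(latencies_ms, n=100)
    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
    }


# Stand-in workload; in this notebook it could be
# lambda: index.search(query_embeddings[:1], k)
stats = benchmark_latency(lambda: sum(range(1000)))
print(stats)
```

Repeating each query many times gives the tail percentiles enough samples to be meaningful.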

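## 4. Milvus: Vector Database

### 4.1 Milvus Configuration

⚠️ **Note**: To use Milvus, you need a running Milvus server. The command below is a minimal starting point; depending on the Milvus version, a standalone server may need additional configuration, so check the official installation guide if the container exits at startup.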

```bash
# With Docker
docker run -d --name milvus_standalone \
  -p 19530:19530 -p 9091:9091 \
  milvusdb/milvus:latest
```
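
Before switching on the Milvus cells, it can help to confirm that something is listening on the port published by the command above. The sketch below uses only the standard library; `is_port_open` is a hypothetical helper, and a successful TCP connection shows that the port is reachable, not that Milvus has finished initializing.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 19530, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if is_port_open():
    print("Port 19530 is reachable; the Milvus server may be ready")
else:
    print("Nothing is listening on port 19530 yet")
```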

In [None]:
class MilvusIndex:
    def __init__(self, collection_name: str, dimension: int, host: str = "localhost", port: str = "19530"):
        self.collection_name = collection_name
        self.dimension = dimension
        self.host = host
        self.port = port
        self.connected = False
        self.collection = None  # Set by create_collection()

    def connect(self):
        """Connect to Milvus"""
        try:
            connections.connect("default", host=self.host, port=self.port)
            self.connected = True
            print(f"✅ Connected to Milvus at {self.host}:{self.port}")
        except Exception as e:
            print(f"❌ Milvus connection error: {e}")
            print("💡 Start a Milvus server first (see the instructions above)")
            self.connected = False

    def create_collection(self):
        """Create a Milvus collection"""
        if not self.connected:
            return False

        # Drop the collection if it already exists
        if utility.has_collection(self.collection_name):
            utility.drop_collection(self.collection_name)

        # Define the schema
        fields = [
            FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
            FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=self.dimension),
            FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=500)
        ]

        schema = CollectionSchema(fields, description="Vector search collection")
        self.collection = Collection(self.collection_name, schema)

        print(f"✅ Collection '{self.collection_name}' created")
        return True

    def create_index(self, index_type: str = "IVF_FLAT"):
        """Create an index on the embedding field"""
        if not self.connected or self.collection is None:
            return False

        index_params = {
            "metric_type": "L2",
            "index_type": index_type,
            "params": {"nlist": 128}  # nlist applies to IVF-family indexes
        }

        self.collection.create_index(field_name="embedding", index_params=index_params)
        print(f"✅ Index {index_type} created")
        return True

    def insert(self, vectors: np.ndarray, documents: List[Dict]):
        """Insert vectors into Milvus"""
        if not self.connected or self.collection is None:
            return False

        ids = [doc['id'] for doc in documents]
        texts = [doc['text'][:500] for doc in documents]  # Truncate to fit the VARCHAR field

        # Column order must match the schema: id, embedding, text
        entities = [
            ids,
            vectors.tolist(),
            texts
        ]

        self.collection.insert(entities)
        self.collection.flush()
        print(f"✅ {len(ids)} vectors inserted")
        return True

    def search(self, query_vector: np.ndarray, k: int = 5) -> Tuple[List, float]:
        """Search for the k nearest neighbors; returns (results, latency in ms)"""
        if not self.connected or self.collection is None:
            return [], 0.0

        # Load the collection into memory (outside the timed section)
        self.collection.load()

        search_params = {"metric_type": "L2", "params": {"nprobe": 10}}

        start = time.perf_counter()
        results = self.collection.search(
            data=[query_vector.tolist()],
            anns_field="embedding",
            param=search_params,
            limit=k,
            output_fields=["text"]
        )
        latency = (time.perf_counter() - start) * 1000

        return results, latency

    def get_stats(self) -> Dict:
        if not self.connected or self.collection is None:
            return {}

        return {
            "collection_name": self.collection_name,
            "num_entities": self.collection.num_entities,
            "dimension": self.dimension
        }

print("✅ MilvusIndex class defined")

In [None]:
# Test Milvus (optional, requires a running server)
USE_MILVUS = False  # Set to True if Milvus is available

if USE_MILVUS:
    milvus_index = MilvusIndex(
        collection_name="vectors_demo",
        dimension=embeddings.shape[1]
    )

    milvus_index.connect()

    if milvus_index.connected:
        milvus_index.create_collection()
        milvus_index.create_index("IVF_FLAT")

        # Insert in batches to avoid overloading the server
        batch_size = 1000
        for i in range(0, len(embeddings), batch_size):
            end_idx = min(i + batch_size, len(embeddings))
            batch_embeddings = embeddings[i:end_idx]
            batch_docs = documents[i:end_idx]
            milvus_index.insert(batch_embeddings, batch_docs)

        print(f"✅ Milvus stats: {milvus_index.get_stats()}")

        # Milvus benchmark
        milvus_latencies = []
        for query in query_embeddings:
            results, latency = milvus_index.search(query, k=10)
            milvus_latencies.append(latency)

        print(f"\n📊 Milvus - Mean latency: {np.mean(milvus_latencies):.2f} ms")
        print(f"📊 Milvus - P95 latency: {np.percentile(milvus_latencies, 95):.2f} ms")
else:
    print("⚠️  Milvus not used (USE_MILVUS=False)")
    print("💡 To test Milvus, start a server and set USE_MILVUS=True")
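
The insertion loop in this cell slices the embeddings and documents into fixed-size batches. The same pattern can be factored into a small generator; `iter_batches` is a hypothetical helper name, and the sketch only produces index bounds, so it works with lists and NumPy arrays alike.

```python
from typing import Iterator, Tuple


def iter_batches(n_items: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs covering range(n_items) in batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_items, batch_size):
        yield start, min(start + batch_size, n_items)


# Example with 2,500 items: the last batch is smaller than the others
bounds = list(iter_batches(2500, 1000))
print(bounds)
```

In the notebook, the loop body would then read `milvus_index.insert(embeddings[start:end], documents[start:end])` for each `(start, end)` pair.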

## 5. Performance Comparison

In [None]:
# Latency visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: mean latency per index type
index_names = list(faiss_results.keys())
avg_latencies = [faiss_results[idx]['avg_latency'] for idx in index_names]

axes[0].bar(index_names, avg_latencies, color=['#3498db', '#e74c3c', '#2ecc71'])
axes[0].set_ylabel('Mean latency (ms)')
axes[0].set_title('Mean Latency by FAISS Index Type')
axes[0].set_xlabel('Index Type')
axes[0].grid(axis='y', alpha=0.3)

# Add the values above the bars
for i, v in enumerate(avg_latencies):
    axes[0].text(i, v + 0.01, f'{v:.2f}', ha='center', va='bottom')

# Plot 2: latency distribution
for idx_type in index_names:
    axes[1].hist(faiss_results[idx_type]['latencies'], bins=20, alpha=0.5, label=idx_type)

axes[1].set_xlabel('Latency (ms)')
axes[1].set_ylabel('Frequency')
axes[1].set_title('Latency Distribution')
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 Performance plots generated")

## 6. Recall Evaluation

**Recall** measures the proportion of relevant results that are retrieved. Here, the relevant results for a query are the exact top-k neighbors returned by the Flat index, so Recall@k is the fraction of those neighbors that an approximate index also returns. Because the synthetic dataset contains many duplicate sentences, ties in distance can lower ID-based recall even when the returned texts are equivalent.

In [None]:
def calculate_recall(results_fast, results_exact, k=10):
    """
    Compute recall by comparing against the exact results (Flat index)
    """
    recalls = []

    for i in range(len(results_fast)):
        # FAISS pads missing results with -1; ignore those placeholders
        fast_ids = set(results_fast[i]) - {-1}
        exact_ids = set(results_exact[i]) - {-1}

        intersection = len(fast_ids.intersection(exact_ids))
        recall = intersection / k
        recalls.append(recall)

    return np.mean(recalls)

# Get the reference results (Flat index)
flat_index = faiss_indexes['flat']
flat_results = []

for query in query_embeddings:
    distances, indices, _ = flat_index.search(query.reshape(1, -1), k=10)
    flat_results.append(indices[0].tolist())

# Compute recall for each approximate index type
print("\n" + "="*60)
print("RECALL EVALUATION (vs Flat Index)")
print("="*60)

for idx_type in ['ivf', 'hnsw']:
    index = faiss_indexes[idx_type]
    results = []

    for query in query_embeddings:
        distances, indices, _ = index.search(query.reshape(1, -1), k=10)
        results.append(indices[0].tolist())

    recall = calculate_recall(results, flat_results, k=10)
    print(f"📊 {idx_type.upper()}: Recall@10 = {recall:.4f} ({recall*100:.2f}%)")
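
As a small hand-checkable illustration of the metric (the ID lists below are made up, not taken from the benchmark), consider one query where the approximate index returns three of the four exact neighbors:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbors that the approximate search also returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


exact = [12, 7, 33, 5]     # ground truth from an exact search (made-up IDs)
approx = [12, 33, 5, 90]   # approximate result: misses ID 7, returns 90 instead

print(recall_at_k(approx, exact, k=4))  # 3 of 4 exact neighbors found -> 0.75
```

The order of the returned IDs does not matter for this metric, only their membership in the exact top-k set.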

## 7. Comparison Table

### Performance summary

In [None]:
# Build a comparison table
# Note: the memory, scalability, and recall columns are indicative labels, not measurements
comparison_data = []

for idx_type in index_names:
    comparison_data.append({
        "Index Type": idx_type.upper(),
        "Mean Latency (ms)": f"{faiss_results[idx_type]['avg_latency']:.2f}",
        "P95 Latency (ms)": f"{faiss_results[idx_type]['p95_latency']:.2f}",
        "Memory": "RAM" if idx_type == 'flat' else "RAM + index overhead",
        "Scalability": "Limited" if idx_type == 'flat' else "Medium" if idx_type == 'ivf' else "Good",
        "Recall (indicative)": "Exact" if idx_type == 'flat' else "~95%" if idx_type == 'ivf' else "~98%"
    })

comparison_df = pd.DataFrame(comparison_data)
print("\n" + "="*80)
print("COMPARISON TABLE - FAISS")
print("="*80)
print(comparison_df.to_string(index=False))

print("\n" + "="*80)
print("RECOMMENDATIONS")
print("="*80)
print("""
🎯 FLAT Index:
   ✅ Pros: Exact results, simple
   ❌ Cons: Slow on large volumes (>100K vectors)
   💡 Use: Prototypes, small datasets

🎯 IVF Index:
   ✅ Pros: Good speed/quality trade-off
   ❌ Cons: Requires training
   💡 Use: Production (100K-10M vectors)

🎯 HNSW Index:
   ✅ Pros: Very fast, excellent recall
   ❌ Cons: Extra memory
   💡 Use: Real-time applications

🎯 Milvus:
   ✅ Pros: Distributed, scalable, persistent
   ❌ Cons: Complex infrastructure
   💡 Use: Large-scale production (>10M vectors)
""")

## 8. Selection Criteria for Production

### Decision matrix

In [None]:
decision_matrix = pd.DataFrame({
    "Criterion": [
        "Data volume",
        "Required latency",
        "Quality (recall)",
        "Frequent updates",
        "Geographic distribution",
        "Infrastructure budget",
        "Team skills"
    ],
    "< 100K vectors": [
        "FAISS Flat/HNSW",
        "FAISS HNSW",
        "FAISS Flat",
        "FAISS (rebuild and reload)",
        "FAISS + CDN",
        "FAISS",
        "FAISS"
    ],
    "100K - 10M vectors": [
        "FAISS IVF/HNSW",
        "FAISS HNSW",
        "FAISS IVF",
        "Milvus",
        "Milvus",
        "FAISS or Milvus",
        "FAISS"
    ],
    "> 10M vectors": [
        "Milvus/Pinecone",
        "Milvus (tuned)",
        "Milvus",
        "Milvus",
        "Milvus distributed",
        "Milvus",
        "Milvus (DevOps)"
    ]
})

print("\n" + "="*100)
print("DECISION MATRIX")
print("="*100)
print(decision_matrix.to_string(index=False))
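
The volume thresholds in the matrix can be turned into a first-pass rule of thumb. The function below is only a sketch of that mapping using the notebook's own thresholds (100K and 10M vectors); `recommend_backend` and its `frequent_updates` flag are hypothetical, and a real decision would also weigh latency targets, budget, and team skills.

```python
def recommend_backend(n_vectors: int, frequent_updates: bool = False) -> str:
    """First-pass suggestion based on the decision matrix thresholds."""
    if n_vectors < 100_000:
        # Small collections: exact search or HNSW is usually sufficient
        return "FAISS Flat/HNSW"
    if n_vectors <= 10_000_000:
        # Mid-size collections: frequent updates push toward a managed database
        return "Milvus" if frequent_updates else "FAISS IVF/HNSW"
    return "Milvus"


for n in (50_000, 2_000_000, 50_000_000):
    print(f"{n:>11,} vectors -> {recommend_backend(n)}")
```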

## 9. Production Considerations

### 9.1 Persistence and Backup

In [None]:
# Save a FAISS index
import pickle

def save_faiss_index(index: FAISSIndex, filepath: str):
    """Save the FAISS index and its documents to disk"""
    # Save the FAISS index
    faiss.write_index(index.index, f"{filepath}.index")

    # Save the documents
    with open(f"{filepath}.docs.pkl", 'wb') as f:
        pickle.dump(index.documents, f)

    print(f"✅ Index saved: {filepath}")

def load_faiss_index(filepath: str, dimension: int, index_type: str) -> FAISSIndex:
    """Load a FAISS index and its documents from disk"""
    index_obj = FAISSIndex(dimension, index_type)

    # Load the FAISS index (replaces the empty one created above)
    index_obj.index = faiss.read_index(f"{filepath}.index")

    # Load the documents (only unpickle files from a trusted source)
    with open(f"{filepath}.docs.pkl", 'rb') as f:
        index_obj.documents = pickle.load(f)

    print(f"✅ Index loaded: {filepath}")
    return index_obj

# Example: saving
save_faiss_index(faiss_indexes['hnsw'], 'index_hnsw')

# Example: loading
loaded_index = load_faiss_index('index_hnsw', embeddings.shape[1], 'hnsw')
print(f"Loaded index: {loaded_index.get_stats()}")
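
`pickle.load` can execute arbitrary code, so pickled document files should only be loaded from trusted sources. Since the documents here are plain dictionaries of strings and integers, JSON is a possible alternative. The sketch below is one way to do it, not the notebook's method; `save_documents_json` and `load_documents_json` are hypothetical helpers.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_documents_json(documents: List[Dict], filepath: str) -> None:
    """Write documents to a temporary file, then rename it into place."""
    tmp_path = f"{filepath}.tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(documents, f, ensure_ascii=False)
    os.replace(tmp_path, filepath)  # avoids leaving a half-written file behind


def load_documents_json(filepath: str) -> List[Dict]:
    """Read documents back from a JSON file."""
    with open(filepath, "r", encoding="utf-8") as f:
        return json.load(f)


# Round trip in a temporary directory
sample_docs = [{"id": 0, "text": "vector search", "topic": "RAG", "domain": "NLP"}]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "index_hnsw.docs.json")
    save_documents_json(sample_docs, path)
    restored = load_documents_json(path)

print(restored == sample_docs)
```

Writing to a temporary file and renaming it means a crash during a backup does not corrupt the previous copy.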

### 9.2 Monitoring and Metrics

In [None]:
class VectorDBMonitor:
    """Monitor vector search performance in production"""
    
    def __init__(self):
        self.queries = []
        self.latencies = []
        self.results_counts = []
    
    def log_query(self, query: str, latency: float, num_results: int):
        self.queries.append(query)
        self.latencies.append(latency)
        self.results_counts.append(num_results)
    
    def get_metrics(self) -> Dict:
        if not self.latencies:
            return {}
        
        return {
            "total_queries": len(self.queries),
            "avg_latency_ms": np.mean(self.latencies),
            "p50_latency_ms": np.percentile(self.latencies, 50),
            "p95_latency_ms": np.percentile(self.latencies, 95),
            "p99_latency_ms": np.percentile(self.latencies, 99),
            "max_latency_ms": np.max(self.latencies),
            "avg_results": np.mean(self.results_counts)
        }
    
    def print_report(self):
        metrics = self.get_metrics()
        print("\n" + "="*60)
        print("MONITORING REPORT")
        print("="*60)
        for key, value in metrics.items():
            print(f"{key}: {value:.2f}" if isinstance(value, float) else f"{key}: {value}")

# Usage example
monitor = VectorDBMonitor()

# Simulate queries
for query in query_embeddings:
    distances, indices, latency = faiss_indexes['hnsw'].search(query.reshape(1, -1), k=10)
    monitor.log_query("test query", latency, len(indices[0]))

monitor.print_report()
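
The monitor in this cell keeps every latency in memory and does not report throughput. For a long-running service, a bounded window is one option. The class below is a sketch under that assumption; `RollingMonitor` and `window_size` are hypothetical names, and throughput is estimated as the serial rate implied by the recorded latencies, which ignores concurrency.

```python
from collections import deque
from typing import Dict


class RollingMonitor:
    """Keep only the most recent latencies (in ms) in a fixed-size window."""

    def __init__(self, window_size: int = 1000):
        self.latencies_ms = deque(maxlen=window_size)  # old entries drop off automatically

    def log(self, latency_ms: float) -> None:
        self.latencies_ms.append(latency_ms)

    def metrics(self) -> Dict[str, float]:
        if not self.latencies_ms:
            return {}
        total_s = sum(self.latencies_ms) / 1000
        return {
            "window_queries": len(self.latencies_ms),
            "avg_latency_ms": sum(self.latencies_ms) / len(self.latencies_ms),
            "max_latency_ms": max(self.latencies_ms),
            # Serial throughput: queries per second of cumulative search time
            "serial_qps": len(self.latencies_ms) / total_s if total_s > 0 else float("inf"),
        }


rolling = RollingMonitor(window_size=3)
for latency in (2.0, 4.0, 6.0, 8.0):   # the first value falls out of the window
    rolling.log(latency)
print(rolling.metrics())
```

A fixed window keeps memory use constant and makes the metrics reflect recent behavior rather than the whole process lifetime.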

## 10. Conclusion and Best Practices

### ✅ Key takeaways

1. **FAISS vs Milvus**:
   - FAISS: well suited to prototypes and datasets under 10M vectors
   - Milvus: suited to production workloads that need distributed scaling and managed persistence

2. **Trade-offs**:
   - Latency vs recall: IVF and HNSW typically reach about 95-98% recall with a speedup that can range from 10x to 100x over exact search; actual figures depend on the data and parameters, so measure them
   - Memory vs performance: HNSW uses more RAM but is faster

3. **Important metrics**:
   - P95/P99 latency (not only the mean)
   - Recall@k to measure quality
   - Throughput (queries per second)

4. **Production**:
   - Continuous performance monitoring
   - Regular index backups
   - Load testing before deployment
   - Consider incremental updates
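
To put a number on the memory side of the HNSW trade-off, the FAISS index-selection guidelines give an approximate per-vector cost of `d * 4 + M * 2 * 4` bytes for `IndexHNSWFlat`. Treat the sketch below as an estimate under that formula rather than an exact accounting; the helper name is hypothetical, and the values (384 dimensions, `M = 32`) mirror this notebook.

```python
def hnsw_flat_bytes_per_vector(dimension: int, m: int) -> int:
    """Approximate bytes per vector: float32 vector plus graph links (estimate)."""
    vector_bytes = dimension * 4      # raw float32 vector
    link_bytes = m * 2 * 4            # approximate cost of the neighbor links
    return vector_bytes + link_bytes


flat_bytes = 384 * 4
hnsw_bytes = hnsw_flat_bytes_per_vector(384, 32)
overhead = hnsw_bytes / flat_bytes - 1
print(f"Flat: {flat_bytes} B/vector, HNSW: {hnsw_bytes} B/vector (+{overhead:.0%})")
```

The relative overhead shrinks as the vector dimension grows and rises with larger `M`.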

### 📚 Resources

- [FAISS Wiki](https://github.com/facebookresearch/faiss/wiki)
- [Milvus Documentation](https://milvus.io/docs)
- [Vector Database Comparison](https://benchmark.vectorview.ai/)
- [HNSW Paper](https://arxiv.org/abs/1603.09320)

## 📝 Exercises

### Exercise 1: Tune the parameters
1. Test different values of `nprobe` for IVF (5, 10, 20, 50)
2. Measure the impact on latency and recall
3. Find the best trade-off

### Exercise 2: Scalability
1. Create a dataset of 100K vectors
2. Compare indexing times
3. Measure the performance degradation

### Exercise 3: Production-ready API
1. Build a FastAPI service backed by a FAISS index
2. Add latency monitoring
3. Implement a cache for frequent queries

### Exercise 4: Milvus deployment
1. Deploy Milvus with Docker Compose
2. Index 50K vectors
3. Compare with FAISS in terms of latency and features