# The MRR Metric

MRR (Mean Reciprocal Rank) is one of the key metrics for evaluating information retrieval systems. Here is an overview:

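1. **Definition**
MRR = 1/N * Σ(1/rank_i)
- N is the total number of queries
- rank_i is the 1-based position of the first correct answer for the i-th query
- If a query has no correct answer in the results, its term 1/rank_i is taken as 0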

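2. **Example**
```python
# Suppose there are 3 queries
# Query 1: correct answer at position 1, score 1/1
# Query 2: correct answer at position 3, score 1/3
# Query 3: correct answer at position 2, score 1/2

mrr = (1/1 + 1/3 + 1/2) / 3  # ≈ 0.611
```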
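The definition translates directly into a small function. This is a minimal sketch: `mrr_from_ranks` is a hypothetical helper (not from any library) that takes the 1-based rank of each query's first correct answer, with `None` standing for "no correct answer retrieved", which scores 0 as the definition requires.

```python
from typing import Optional, Sequence


def mrr_from_ranks(ranks: Sequence[Optional[int]]) -> float:
    """Mean Reciprocal Rank from the 1-based rank of each query's first correct answer.

    A rank of None means the query has no correct answer in the results,
    so that query contributes 0 to the sum.
    """
    if not ranks:
        raise ValueError("need at least one query")
    total = sum(1.0 / r if r is not None else 0.0 for r in ranks)
    return total / len(ranks)


print(round(mrr_from_ranks([1, 3, 2]), 3))     # 0.611, the example above
print(round(mrr_from_ranks([1, 3, None]), 3))  # 0.444: the third query scores 0
```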

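Here is a concrete implementation that computes MRR and also returns per-query details. Both functions are written as methods of the `EmbeddingEvaluator` class shown later, which is why they take `self`: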

```python
from typing import Any, Dict, List, Tuple

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def calculate_mrr(self,
                  query_embeddings: np.ndarray,
                  corpus_embeddings: np.ndarray,
                  relevant_docs: Dict[int, List[int]]) -> Tuple[float, List[Dict[str, Any]]]:
    """
    Compute MRR and return the per-query details.

    Args:
        query_embeddings: query embeddings, shape [num_queries, embedding_dim]
        corpus_embeddings: corpus embeddings, shape [num_documents, embedding_dim]
        relevant_docs: dict mapping a query ID (row index into query_embeddings)
            to the list of relevant document IDs (row indices into corpus_embeddings)

    Returns:
        tuple: (mrr_score, list_of_per_query_dicts)
    """
    reciprocal_ranks = []
    individual_scores = []

    # Similarity between every query and every document: [num_queries, num_documents]
    similarities = cosine_similarity(query_embeddings, corpus_embeddings)

    for query_id, relevant_doc_ids in relevant_docs.items():
        relevant_set = set(relevant_doc_ids)  # O(1) membership tests
        # Document indices sorted by similarity, highest first
        ranked_indices = np.argsort(similarities[query_id])[::-1]

        # 1-based rank of the first relevant document (None if there is none)
        rank = None
        for pos, doc_idx in enumerate(ranked_indices, 1):
            if doc_idx in relevant_set:
                rank = pos
                break

        # Reciprocal rank for this query; 0 if no relevant document was found
        score = 1.0 / rank if rank is not None else 0.0
        reciprocal_ranks.append(score)

        individual_scores.append({
            'query_id': query_id,
            'rank': rank,
            'score': score,
            'top_docs': ranked_indices[:5].tolist(),  # IDs of the top-5 documents
            'top_similarities': similarities[query_id][ranked_indices[:5]].tolist()  # their similarity scores
        })

    mrr = float(np.mean(reciprocal_ranks))
    return mrr, individual_scores

# Usage example:
def analyze_results(self, queries, corpus, results):
    """
    Print a detailed per-query breakdown for each model.

    Assumes results[model_key]['individual_scores'] holds the
    (mrr, individual_scores) tuple returned by calculate_mrr.
    """
    print("\nDetailed Analysis:")
    print("=================")

    for model_key, metrics in results.items():
        print(f"\n{self.model_names[model_key]}:")
        mrr, individual_scores = metrics['individual_scores']

        print(f"Overall MRR: {mrr:.4f}")
        print("\nPer-query analysis:")

        for score_info in individual_scores:
            query_id = score_info['query_id']
            print(f"\nQuery {query_id}: {queries[query_id]}")
            print(f"First relevant doc rank: {score_info['rank']}")
            print(f"Reciprocal rank score: {score_info['score']:.4f}")

            print("Top 5 retrieved documents:")
            for i, (doc_id, sim) in enumerate(zip(score_info['top_docs'],
                                                  score_info['top_similarities']), 1):
                print(f"{i}. [{sim:.4f}] {corpus[doc_id][:100]}...")
```

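Characteristics of MRR:

1. **Strengths**:
- Focuses on the position of the first correct answer
- Simple to compute and easy to understand
- Weights top positions heavily: moving the first correct answer from rank 1 to rank 2 halves that query's score

2. **Typical use cases**:
- Evaluating information retrieval systems
- Evaluating question answering systems
- Evaluating search engines
- Evaluating recommender systems

3. **Interpretation**:
- MRR lies between 0 and 1
- Higher is better
- 1 means every query has a correct answer at position 1
- A correct answer ranked far down contributes little: at rank 10 the query scores only 0.1

4. **Comparison with other metrics**:
- Hit@K only checks whether at least one correct answer appears in the top K results
- MAP (Mean Average Precision) takes the positions of all relevant documents into account
- MRR only looks at the position of the first correct answer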
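To make these differences concrete, the sketch below scores a single query's ranked result list with all three metrics. The helpers `hit_at_k`, `reciprocal_rank`, and `average_precision` are hypothetical names for illustration; averaging each one over all queries gives the Hit@K rate, MRR, and MAP respectively. It assumes the full ranking is given, so every relevant document appears somewhere in the list.

```python
from typing import List


def hit_at_k(relevance: List[int], k: int) -> float:
    """1.0 if any relevant item appears in the top k positions, else 0.0."""
    return 1.0 if any(relevance[:k]) else 0.0


def reciprocal_rank(relevance: List[int]) -> float:
    """1 / (1-based position of the first relevant item), or 0.0 if there is none."""
    for pos, rel in enumerate(relevance, 1):
        if rel:
            return 1.0 / pos
    return 0.0


def average_precision(relevance: List[int]) -> float:
    """Mean of precision@i over the positions i that hold a relevant item.

    Assumes every relevant document appears in `relevance` (full ranking),
    so the number of hits equals the number of relevant documents.
    """
    hits, precisions = 0, []
    for pos, rel in enumerate(relevance, 1):
        if rel:
            hits += 1
            precisions.append(hits / pos)
    return sum(precisions) / hits if hits else 0.0


# One query's ranked results: 1 = relevant, 0 = not relevant
ranking = [0, 1, 0, 1, 1]
print(hit_at_k(ranking, 1), hit_at_k(ranking, 3))  # 0.0 1.0
print(reciprocal_rank(ranking))                     # 0.5: only the rank-2 hit counts
print(round(average_precision(ranking), 3))         # 0.533: all three hits count
```

Hit@K ignores where inside the top K the answer sits, MRR ignores every relevant document after the first, and AP rewards placing all of them early.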

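This kind of fine-grained, per-query analysis helps us:
1. Understand how a model performs on different types of queries
2. Find a model's strengths and weaknesses
3. Identify what needs improvement
4. Compare concrete performance differences between models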

# `cosine_similarity` in sklearn

Suppose:
- query_embeddings has shape [num_queries, embedding_dim]
- corpus_embeddings has shape [num_documents, embedding_dim]

where:
- num_queries is the number of queries
- num_documents is the number of documents
- embedding_dim is the vector dimension (must be the same for both)

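Example:
```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Example shapes: 16 queries and 28 documents, each a 384-dimensional embedding
query_embeddings = np.random.rand(16, 384)
corpus_embeddings = np.random.rand(28, 384)

# Shape of the cosine_similarity result: one similarity per (query, document) pair
similarities = cosine_similarity(query_embeddings, corpus_embeddings)
print(similarities.shape)  # (16, 28)
```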

How cosine_similarity is computed:
```python
import numpy as np

def cosine_similarity_explained(query_embeddings, corpus_embeddings):
    """
    Demonstrate step by step how cosine similarity is computed
    
    Args:
        query_embeddings: shape [num_queries, embedding_dim]
        corpus_embeddings: shape [num_documents, embedding_dim]
    
    Returns:
        similarities: shape [num_queries, num_documents]
    """
    # 1. Compute the L2 norm (vector length) of each row
    query_norms = np.linalg.norm(query_embeddings, axis=1, keepdims=True)
    corpus_norms = np.linalg.norm(corpus_embeddings, axis=1, keepdims=True)
    
    # 2. Normalize each vector to unit length
    query_normalized = query_embeddings / query_norms
    corpus_normalized = corpus_embeddings / corpus_norms
    
    # 3. A matrix multiplication gives all pairwise similarities
    # (num_queries, embedding_dim) @ (embedding_dim, num_documents)
    # = (num_queries, num_documents)
    similarities = np.dot(query_normalized, corpus_normalized.T)
    
    return similarities

# Usage example
num_queries = 3
num_documents = 4
embedding_dim = 5

query_embeddings = np.random.rand(num_queries, embedding_dim)
corpus_embeddings = np.random.rand(num_documents, embedding_dim)

similarities = cosine_similarity_explained(query_embeddings, corpus_embeddings)
print("Query embeddings shape:", query_embeddings.shape)
print("Corpus embeddings shape:", corpus_embeddings.shape)
print("Similarities shape:", similarities.shape)
```
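As a sanity check, the same normalize-then-multiply steps can be compared against sklearn's `sklearn.metrics.pairwise.cosine_similarity`. `manual_cosine` is a condensed, hypothetical restatement of the function above. One caveat: sklearn leaves an all-zero row at zero after normalization, so its similarity is 0, whereas the manual version divides by a zero norm and produces `nan`.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def manual_cosine(queries: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Row-normalize both matrices, then take all pairwise dot products."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return q @ d.T


rng = np.random.default_rng(0)
queries = rng.random((3, 5))  # 3 queries, embedding_dim 5
docs = rng.random((4, 5))     # 4 documents, embedding_dim 5

manual = manual_cosine(queries, docs)
reference = cosine_similarity(queries, docs)
print(manual.shape)                    # (3, 4)
print(np.allclose(manual, reference))  # True
```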

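Key points:
1. embedding_dim must match, because both sets of vectors must live in the same vector space (for example, produced by the same model)
2. num_queries and num_documents can differ
3. The resulting similarity matrix has shape [num_queries, num_documents]
4. Each element similarities[i, j] is the cosine similarity between the i-th query and the j-th document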
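Points 1 and 2 can be checked directly: sklearn accepts different numbers of rows but raises a `ValueError` when the second dimensions differ. A minimal check, assuming two hypothetical embedding sizes (384 and 768), such as vectors produced by two different models:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

queries = np.ones((2, 384))
docs_same_dim = np.ones((3, 384))
docs_other_dim = np.ones((3, 768))  # e.g. embeddings from a different model

# Different numbers of queries and documents are fine: the result is [2, 3]
print(cosine_similarity(queries, docs_same_dim).shape)  # (2, 3)

# A different embedding_dim is rejected
try:
    cosine_similarity(queries, docs_other_dim)
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```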

In our evaluation code:
```python
def calculate_mrr(self, query_embeddings, corpus_embeddings, relevant_docs):
    # similarities[i, j] is the similarity between query i and document j
    similarities = cosine_similarity(query_embeddings, corpus_embeddings)

    for query_id, relevant_doc_ids in relevant_docs.items():
        # similarities[query_id] is a vector of length num_documents:
        # this query's similarity to every document, sorted here in descending order
        ranked_indices = np.argsort(similarities[query_id])[::-1]
        # ... (find the first relevant document, as in the full implementation)
```

This is why similarities[query_id] gives the vector of similarities between one particular query and all documents.
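Concretely, ranking one row works like this. The similarity values below are `similarities[0]` from the notebook session that follows; which document counts as relevant is a hypothetical label chosen for illustration.

```python
import numpy as np

# Row 0 of the 3x4 similarity matrix (query 0 vs. documents 0..3)
row = np.array([0.91206929, 0.95319192, 0.86055598, 0.60565383])

ranked_indices = np.argsort(row)[::-1]  # document indices, most similar first
print(ranked_indices)                   # [1 0 2 3]

relevant_doc_ids = {0}                  # hypothetical label: document 0 is relevant
rank = next(pos for pos, idx in enumerate(ranked_indices, 1) if idx in relevant_doc_ids)
print(rank, 1.0 / rank)                 # 2 0.5
```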

Sample output from running the example above in a notebook (the inputs are random, so the values differ from run to run):

```python
>>> query_embeddings
array([[0.10553585, 0.1955418 , 0.85905077, 0.50188372, 0.96730807],
       [0.61035641, 0.22100457, 0.46646891, 0.58499486, 0.94678857],
       [0.60733863, 0.30711586, 0.93369464, 0.47716306, 0.04734569]])
>>> corpus_embeddings
array([[0.5731121 , 0.02161613, 0.80159162, 0.8401583 , 0.80807207],
       [0.51913204, 0.0789993 , 0.85645666, 0.44378902, 0.92486281],
       [0.39858005, 0.98111986, 0.8457453 , 0.46485596, 0.8464015 ],
       [0.70138594, 0.77555282, 0.01866932, 0.52679028, 0.65922755]])
>>> corpus_embeddings.T
array([[0.5731121 , 0.51913204, 0.39858005, 0.70138594],
       [0.02161613, 0.0789993 , 0.98111986, 0.77555282],
       [0.80159162, 0.85645666, 0.8457453 , 0.01866932],
       [0.8401583 , 0.44378902, 0.46485596, 0.52679028],
       [0.80807207, 0.92486281, 0.8464015 , 0.65922755]])
>>> similarities
array([[0.91206929, 0.95319192, 0.86055598, 0.60565383],
       [0.94899571, 0.9500114 , 0.84549014, 0.8366131 ],
       [0.8076444 , 0.77653305, 0.7661669 , 0.57354788]])
>>> similarities[0]
array([0.91206929, 0.95319192, 0.86055598, 0.60565383])
>>> similarities[0,0]
0.9120692908380097
```

The complete evaluation script:
import numpy as np
from sentence_transformers import SentenceTransformer
import time
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
from typing import List, Dict, Tuple
from collections import defaultdict

class EmbeddingEvaluator:
    def __init__(self):
        self.model1 = SentenceTransformer('/root/app/models/point_large_embedding_zh')
        self.model2 = SentenceTransformer('/root/app/models/Conan-embedding-v1')
        
        self.model_names = {
            'model1': 'Point',
            'model2': 'Cona'
        }

    def compute_embeddings(self, texts: List[str], model: SentenceTransformer) -> np.ndarray:
        return model.encode(texts)

    def calculate_mrr(self, 
                     query_embeddings: np.ndarray, 
                     corpus_embeddings: np.ndarray, 
                     relevant_docs: Dict[int, List[int]]) -> float:
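        """
        Compute Mean Reciprocal Rank.
        relevant_docs: dict mapping a query ID to the list of relevant document IDs.
        A query with no relevant document in the ranking contributes 0.
        """
        reciprocal_ranks = []

        # Cosine similarity between every query and every document
        similarities = cosine_similarity(query_embeddings, corpus_embeddings)

        for query_id, relevant_doc_ids in relevant_docs.items():
            # Document indices sorted by similarity, highest first
            ranked_indices = np.argsort(similarities[query_id])[::-1]

            # Find the position of the first relevant document
            for rank, doc_idx in enumerate(ranked_indices, 1):
                if doc_idx in relevant_doc_ids:
                    reciprocal_ranks.append(1.0 / rank)
                    break
            else:
                # No relevant document found: this query scores 0
                reciprocal_ranks.append(0.0)

        return np.mean(reciprocal_ranks)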

    def calculate_hit_rate(self, 
                          query_embeddings: np.ndarray, 
                          corpus_embeddings: np.ndarray, 
                          relevant_docs: Dict[int, List[int]], 
                          k: int) -> float:
        """
        Compute Hit@k: the fraction of queries with at least one relevant
        document among the top k results.
        """
        hits = 0
        total_queries = len(relevant_docs)
        
        similarities = cosine_similarity(query_embeddings, corpus_embeddings)
        
        for query_id, relevant_doc_ids in relevant_docs.items():
            # Indices of the k most similar documents
            top_k_indices = np.argsort(similarities[query_id])[::-1][:k]
            
            # Check whether any relevant document is among the top k results
            if any(idx in relevant_doc_ids for idx in top_k_indices):
                hits += 1
                
        return hits / total_queries

    def evaluate_models(self, 
                       queries: List[str], 
                       corpus: List[str], 
                       relevant_docs: Dict[int, List[int]], 
                       k_values: List[int]) -> Dict:
        results = {
            'model1': defaultdict(dict),
            'model2': defaultdict(dict)
        }
        
        for model_key, model in [('model1', self.model1), ('model2', self.model2)]:
            # Compute embeddings (the timing covers both queries and corpus)
            start_time = time.time()
            query_embeddings = self.compute_embeddings(queries, model)
            corpus_embeddings = self.compute_embeddings(corpus, model)
            processing_time = time.time() - start_time
            
            # Compute MRR
            mrr = self.calculate_mrr(query_embeddings, corpus_embeddings, relevant_docs)
            
            # Compute Hit@k for each value of k
            hit_rates = {
                k: self.calculate_hit_rate(query_embeddings, corpus_embeddings, 
                                         relevant_docs, k) 
                for k in k_values
            }
            
            results[model_key] = {
                'processing_time': processing_time,
                'mrr': mrr,
                'hit_rates': hit_rates
            }
            
        return results

def get_test_data():
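    # The test set covers the following challenge categories:
    # 1. Cross-lingual similarity
    # 2. Synonyms and near-synonyms
    # 3. Abstract concepts
    # 4. Domain-specific knowledge
    # 5. Long vs. short texts
    # 6. Implicit semantics

    queries = [
        # Cross-lingual queries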
        "What is machine learning",
        "如何实现数据可视化",
        "deep learning applications",
        "自然语言处理的发展趋势",
        
        # Domain-specific queries
        "BERT模型的原理",
        "Transformer architecture explained",
        "GPU vs TPU performance comparison",
        "分布式训练策略",
        
        # Abstract-concept queries
        "人工智能的伦理问题",
        "The future of autonomous systems",
        "数据隐私保护方法",
        "Sustainable AI development",
        
        # Specific technical queries
        "PyTorch实现多GPU训练",
        "Kubernetes deployment best practices",
        "优化神经网络训练速度",
        "Implementing attention mechanism"
    ]
    
    corpus = [
        # Machine learning basics
        "Machine learning is a subset of artificial intelligence that focuses on data and algorithms",
        "机器学习是人工智能的一个子集，主要关注数据和算法",
        "深度学习是机器学习的一个分支，使用多层神经网络进行学习",
        "Deep learning is a branch of machine learning using multi-layer neural networks",
        
        # Data visualization
        "Data visualization techniques include charts, graphs, and interactive dashboards",
        "数据可视化技术包括图表、图形和交互式仪表板",
        "使用Python的Matplotlib和Seaborn库进行数据可视化",
        "Advanced data visualization can be achieved using D3.js and WebGL",
        
        # BERT and Transformer
        "BERT uses bidirectional transformer architecture to understand context",
        "BERT模型通过双向Transformer架构来理解上下文语义",
        "Transformer architecture relies heavily on self-attention mechanisms",
        "Transformer架构主要依赖于自注意力机制",
        
        # Hardware and performance optimization
        "GPUs excel at parallel processing while TPUs are optimized for tensor operations",
        "GPU适合并行处理，而TPU针对张量运算进行了优化",
        "分布式训练可以显著提高大规模模型的训练效率",
        "Distributed training can significantly improve the efficiency of large-scale models",
        
        # AI ethics and future development
        "AI ethics concerns include bias, privacy, and accountability",
        "人工智能伦理问题包括偏见、隐私和责任归属",
        "Autonomous systems must balance efficiency with safety and reliability",
        "自动驾驶系统需要在效率和安全性之间取得平衡",
        
        # Technical implementation details
        "PyTorch provides DataParallel and DistributedDataParallel for multi-GPU training",
        "PyTorch提供了DataParallel和DistributedDataParallel用于多GPU训练",
        "Kubernetes可以有效管理和扩展机器学习工作负载",
        "Kubernetes can effectively manage and scale machine learning workloads",
        
        # Neural network optimization
        "Neural network optimization techniques include gradient clipping and batch normalization",
        "神经网络优化技术包括梯度裁剪和批量归一化",
        "注意力机制通过权重计算来关注重要特征",
        "Attention mechanisms focus on important features through weight calculations"
    ]
    
    # Relevance mapping: query ID -> list of relevant document IDs
    relevant_docs = {
        0: [0, 1, 2, 3],           # What is machine learning
        1: [4, 5, 6, 7],           # data visualization
        2: [2, 3],                 # deep learning applications
        3: [2, 3, 10, 11],         # trends in natural language processing
        4: [8, 9, 10, 11],         # how BERT works
        5: [10, 11],               # Transformer architecture
        6: [12, 13],               # GPU vs TPU
        7: [14, 15],               # distributed training
        8: [16, 17],               # AI ethics
        9: [18, 19],               # autonomous systems
        10: [16, 17],              # data privacy
        11: [16, 17, 18, 19],      # Sustainable AI
        12: [20, 21],              # PyTorch multi-GPU
        13: [22, 23],              # Kubernetes
        14: [24, 25],              # neural network optimization
        15: [26, 27]               # attention mechanism
    }
    
    return queries, corpus, relevant_docs

def main():
    queries, corpus, relevant_docs = get_test_data()
    k_values = [1, 3, 5, 10]
    
    evaluator = EmbeddingEvaluator()
    results = evaluator.evaluate_models(queries, corpus, relevant_docs, k_values)
    
    print("\nEvaluation Results:")
    print("==================")
    
    for model_key, metrics in results.items():
        print(f"\n{evaluator.model_names[model_key]}:")
        print(f"Processing time: {metrics['processing_time']:.2f} seconds")
        print(f"MRR: {metrics['mrr']:.4f}")
        for k, hit_rate in metrics['hit_rates'].items():
            print(f"Hit@{k}: {hit_rate:.4f}")

if __name__ == "__main__":
    main()


Output:

```
Evaluation Results:

Point:
Processing time: 0.06 seconds
MRR: 0.9583
Hit@1: 0.9375
Hit@3: 1.0000
Hit@5: 1.0000
Hit@10: 1.0000

Cona:
Processing time: 0.08 seconds
MRR: 0.9688
Hit@1: 0.9375
Hit@3: 1.0000
Hit@5: 1.0000
Hit@10: 1.0000
```
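The class above calls `cosine_similarity` and re-sorts every row once for MRR and again for each k in `k_values`. Since a query is a hit at k exactly when its first relevant document has rank at most k, both metrics only need that one rank, so a single sort is enough. A sketch under that observation: `rank_metrics` is a hypothetical helper, it scores a query with no relevant document as 0 (matching the definition at the top of this note), and ties may be ordered differently than with `np.argsort(...)[::-1]`.

```python
from typing import Dict, List, Optional

import numpy as np


def rank_metrics(similarities: np.ndarray,
                 relevant_docs: Dict[int, List[int]],
                 k_values: List[int]) -> Dict[str, float]:
    """MRR and Hit@k for all queries from one argsort of the similarity matrix.

    similarities: shape [num_queries, num_documents]; row i belongs to query i.
    """
    # Sort each row once, most similar document first
    ranked = np.argsort(-similarities, axis=1)

    first_ranks: List[Optional[int]] = []
    for query_id, doc_ids in relevant_docs.items():
        hits = np.flatnonzero(np.isin(ranked[query_id], doc_ids))
        # 1-based position of the first relevant document, or None if absent
        first_ranks.append(int(hits[0]) + 1 if hits.size else None)

    n = len(first_ranks)
    metrics = {"mrr": sum(1.0 / r for r in first_ranks if r is not None) / n}
    for k in k_values:
        metrics[f"hit@{k}"] = sum(r is not None and r <= k for r in first_ranks) / n
    return metrics


sims = np.array([[0.9, 0.1, 0.5],
                 [0.2, 0.3, 0.8]])
print(rank_metrics(sims, {0: [2], 1: [2]}, k_values=[1, 3]))
# {'mrr': 0.75, 'hit@1': 0.5, 'hit@3': 1.0}
```

In `evaluate_models`, one could compute `cosine_similarity(query_embeddings, corpus_embeddings)` once per model and pass the matrix to such a helper instead of recomputing it per metric.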


## Loading data from a JSON file

The evaluation script adapted to this data source:
import os
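# Assumed intent: pin this process to GPU 6 (must be set before CUDA is initialized).
# The original set CUDA_LAUNCH_BLOCKING, a debugging switch (set to 1) that makes
# kernel launches synchronous; it does not select a device.
os.environ['CUDA_VISIBLE_DEVICES'] = '6'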
import numpy as np
from sentence_transformers import SentenceTransformer
import time
from sklearn.metrics.pairwise import cosine_similarity
from tqdm.autonotebook import tqdm, trange
from typing import List, Dict
from collections import defaultdict

class EmbeddingEvaluator:
    def __init__(self):
        self.model1 = SentenceTransformer('/root/app/models/Conan-embedding-v1')
        self.model2 = SentenceTransformer('/root/app/models/bge-m3')
        
        self.model_names = {
            'model1': 'Conan',
            'model2': 'BGE-M3'
        }

    def compute_embeddings(self, texts: List[str], model: SentenceTransformer) -> np.ndarray:
        return model.encode(texts)

    def calculate_mrr(self, 
                     query_embeddings: np.ndarray, 
                     corpus_embeddings: np.ndarray, 
                     relevant_docs: Dict[int, List[int]]) -> float:
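        """
        Compute Mean Reciprocal Rank.
        relevant_docs: dict mapping a query ID to the list of relevant document IDs.
        A query with no relevant document in the ranking contributes 0.
        """
        reciprocal_ranks = []

        # Cosine similarity between every query and every document
        similarities = cosine_similarity(query_embeddings, corpus_embeddings)

        for query_id, relevant_doc_ids in relevant_docs.items():
            # Document indices sorted by similarity, highest first
            ranked_indices = np.argsort(similarities[query_id])[::-1]

            # Find the position of the first relevant document
            for rank, doc_idx in enumerate(ranked_indices, 1):
                if doc_idx in relevant_doc_ids:
                    reciprocal_ranks.append(1.0 / rank)
                    break
            else:
                # No relevant document found: this query scores 0
                reciprocal_ranks.append(0.0)

        return np.mean(reciprocal_ranks)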

    def calculate_hit_rate(self, 
                          query_embeddings: np.ndarray, 
                          corpus_embeddings: np.ndarray, 
                          relevant_docs: Dict[int, List[int]], 
                          k: int) -> float:
        """
        Compute Hit@k: the fraction of queries with at least one relevant
        document among the top k results.
        """
        hits = 0
        total_queries = len(relevant_docs)
        
        similarities = cosine_similarity(query_embeddings, corpus_embeddings)
        
        for query_id, relevant_doc_ids in relevant_docs.items():
            # Indices of the k most similar documents
            top_k_indices = np.argsort(similarities[query_id])[::-1][:k]
            
            # Check whether any relevant document is among the top k results
            if any(idx in relevant_doc_ids for idx in top_k_indices):
                hits += 1
                
        return hits / total_queries

    def evaluate_models(self, 
                       queries: List[str], 
                       corpus: List[str], 
                       relevant_docs: Dict[int, List[int]], 
                       k_values: List[int]) -> Dict:
        results = {
            'model1': defaultdict(dict),
            'model2': defaultdict(dict)
        }
        
        for model_key, model in tqdm([('model1', self.model1), ('model2', self.model2)],
                                     desc="Evaluating models", position=0):
            # Compute embeddings (the timing covers both queries and corpus)
            start_time = time.time()
            query_embeddings = self.compute_embeddings(queries, model)
            corpus_embeddings = self.compute_embeddings(corpus, model)
            processing_time = time.time() - start_time
            
            # Compute MRR
            mrr = self.calculate_mrr(query_embeddings, corpus_embeddings, relevant_docs)
            
            # Compute Hit@k for each value of k
            hit_rates = {
                k: self.calculate_hit_rate(query_embeddings, corpus_embeddings, 
                                         relevant_docs, k) 
                for k in k_values
            }
            
            results[model_key] = {
                'processing_time': processing_time,
                'mrr': mrr,
                'hit_rates': hit_rates
            }
            
        return results



def get_test_data():
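    # The test set covers the following challenge categories:
    # 1. Cross-lingual similarity
    # 2. Synonyms and near-synonyms
    # 3. Abstract concepts
    # 4. Domain-specific knowledge
    # 5. Long vs. short texts
    # 6. Implicit semantics

    queries = [
        # Cross-lingual queries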
        "What is machine learning",
        "如何实现数据可视化",
        "deep learning applications",
        "自然语言处理的发展趋势",
        
        # Domain-specific queries
        "BERT模型的原理",
        "Transformer architecture explained",
        "GPU vs TPU performance comparison",
        "分布式训练策略",
        
        # Abstract-concept queries
        "人工智能的伦理问题",
        "The future of autonomous systems",
        "数据隐私保护方法",
        "Sustainable AI development",
        
        # Specific technical queries
        "PyTorch实现多GPU训练",
        "Kubernetes deployment best practices",
        "优化神经网络训练速度",
        "Implementing attention mechanism"
    ]
    
    corpus = [
        # Machine learning basics
        "Machine learning is a subset of artificial intelligence that focuses on data and algorithms",
        "机器学习是人工智能的一个子集，主要关注数据和算法",
        "深度学习是机器学习的一个分支，使用多层神经网络进行学习",
        "Deep learning is a branch of machine learning using multi-layer neural networks",
        
        # Data visualization
        "Data visualization techniques include charts, graphs, and interactive dashboards",
        "数据可视化技术包括图表、图形和交互式仪表板",
        "使用Python的Matplotlib和Seaborn库进行数据可视化",
        "Advanced data visualization can be achieved using D3.js and WebGL",
        
        # BERT and Transformer
        "BERT uses bidirectional transformer architecture to understand context",
        "BERT模型通过双向Transformer架构来理解上下文语义",
        "Transformer architecture relies heavily on self-attention mechanisms",
        "Transformer架构主要依赖于自注意力机制",
        
        # Hardware and performance optimization
        "GPUs excel at parallel processing while TPUs are optimized for tensor operations",
        "GPU适合并行处理，而TPU针对张量运算进行了优化",
        "分布式训练可以显著提高大规模模型的训练效率",
        "Distributed training can significantly improve the efficiency of large-scale models",
        
        # AI ethics and future development
        "AI ethics concerns include bias, privacy, and accountability",
        "人工智能伦理问题包括偏见、隐私和责任归属",
        "Autonomous systems must balance efficiency with safety and reliability",
        "自动驾驶系统需要在效率和安全性之间取得平衡",
        
        # Technical implementation details
        "PyTorch provides DataParallel and DistributedDataParallel for multi-GPU training",
        "PyTorch提供了DataParallel和DistributedDataParallel用于多GPU训练",
        "Kubernetes可以有效管理和扩展机器学习工作负载",
        "Kubernetes can effectively manage and scale machine learning workloads",
        
        # Neural network optimization
        "Neural network optimization techniques include gradient clipping and batch normalization",
        "神经网络优化技术包括梯度裁剪和批量归一化",
        "注意力机制通过权重计算来关注重要特征",
        "Attention mechanisms focus on important features through weight calculations"
    ]
    
    # Relevance mapping: query ID -> list of relevant document IDs
    relevant_docs = {
        0: [0, 1, 2, 3],           # What is machine learning
        1: [4, 5, 6, 7],           # data visualization
        2: [2, 3],                 # deep learning applications
        3: [2, 3, 10, 11],         # trends in natural language processing
        4: [8, 9, 10, 11],         # how BERT works
        5: [10, 11],               # Transformer architecture
        6: [12, 13],               # GPU vs TPU
        7: [14, 15],               # distributed training
        8: [16, 17],               # AI ethics
        9: [18, 19],               # autonomous systems
        10: [16, 17],              # data privacy
        11: [16, 17, 18, 19],      # Sustainable AI
        12: [20, 21],              # PyTorch multi-GPU
        13: [22, 23],              # Kubernetes
        14: [24, 25],              # neural network optimization
        15: [26, 27]               # attention mechanism
    }
    
    return queries, corpus, relevant_docs

def main():
    import json
    json_path = "/root/app/en-unshuffle_combined_data.json"
    with open(json_path, 'r', encoding='utf-8') as f:
        data = json.load(f)
    queries = [item['query'] for item in data]
    corpus = [item['chunk'] for item in data]
    # Query i is paired with chunk i, so chunk i is its only relevant document
    relevant_docs = {i: [i] for i in range(len(data))}

    # To use the built-in test set instead:
    # queries, corpus, relevant_docs = get_test_data()
    k_values = [1, 3, 5, 10]
    
    evaluator = EmbeddingEvaluator()
    results = evaluator.evaluate_models(queries, corpus, relevant_docs, k_values)
    
    print("\nEvaluation Results:")
    print("==================")
    
    for model_key, metrics in results.items():
        print(f"\n{evaluator.model_names[model_key]}:")
        print(f"Processing time: {metrics['processing_time']:.2f} seconds")
        print(f"MRR: {metrics['mrr']:.4f}")
        for k, hit_rate in metrics['hit_rates'].items():
            print(f"Hit@{k}: {hit_rate:.4f}")

if __name__ == "__main__":
    main()
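The JSON loading in `main()` can be factored into a reusable function. This sketch assumes what `main()` implies: the file holds a list of objects with `query` and `chunk` keys, and query i's only relevant document is chunk i. `load_query_chunk_pairs` is a hypothetical name, and the two sample records are made up for the demo. If the same chunk text appears in several records, this diagonal labeling treats the other copies as irrelevant, which can lower MRR and Hit@k.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple


def load_query_chunk_pairs(path: str) -> Tuple[List[str], List[str], Dict[int, List[int]]]:
    """Read a JSON list of {"query": ..., "chunk": ...} records.

    Query i is paired with chunk i, so the relevance map is the diagonal {i: [i]}.
    """
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    queries = [item["query"] for item in data]
    corpus = [item["chunk"] for item in data]
    relevant_docs = {i: [i] for i in range(len(data))}
    return queries, corpus, relevant_docs


# Usage with a tiny throwaway file in the assumed format (made-up sample records)
records = [
    {"query": "What is machine learning",
     "chunk": "Machine learning is a subset of artificial intelligence that focuses on data and algorithms"},
    {"query": "分布式训练策略", "chunk": "分布式训练可以显著提高大规模模型的训练效率"},
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    json.dump(records, tmp, ensure_ascii=False)
queries, corpus, relevant_docs = load_query_chunk_pairs(tmp.name)
os.remove(tmp.name)
print(relevant_docs)  # {0: [0], 1: [1]}
```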