demo1

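The training objective uses contrastive learning with the MNR (Multiple Negatives Ranking) loss. A similarity score measures how well a user matches an item, and the loss drives the model to pull positive pairs together and push negative pairs apart.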
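Written out for one anchor $a$ (the user or query), one positive $p$ and $N$ negatives $n_1, \dots, n_N$, with dot-product similarity $s(a, x) = a^\top x$ as in `mnr_loss` below, the loss is the negative log-softmax probability of the positive:

```latex
\mathcal{L}_{\text{MNR}} = -\log \frac{\exp\big(s(a, p)\big)}{\exp\big(s(a, p)\big) + \sum_{i=1}^{N} \exp\big(s(a, n_i)\big)}, \qquad s(a, x) = a^\top x
```

Minimizing it raises $s(a, p)$ relative to every $s(a, n_i)$; the loss reaches 0 only when the positive takes all of the softmax mass.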

In [1]:
import numpy as np

def mnr_loss(anchor: np.ndarray, positive: np.ndarray, negatives: np.ndarray) -> float:
    """
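    Compute the MNR loss (Multiple Negatives Ranking Loss).

    Args:
        anchor: [d,] user or query vector
        positive: [d,] positive sample vector (e.g., an item the user clicked)
        negatives: [N, d] N negative sample vectors (items the user did not click)

    Returns:
        loss: float, the MNR loss value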
    """

    # 1. Stack all candidates [positive; negatives] → shape: (1+N, d)
    all_items = np.vstack([positive[np.newaxis, :], negatives])  # shape: (1+N, d)

    # 2. Dot-product similarity between the anchor and each candidate → shape: (1+N,)
    logits = np.dot(all_items, anchor)  # logits[i] = dot(anchor, item_i)

    # 3. Softmax over the logits
    exp_logits = np.exp(logits - np.max(logits))  # subtract the max to avoid overflow
    probs = exp_logits / np.sum(exp_logits)

    # 4. Negative log-probability of the positive (index 0)
    loss = -np.log(probs[0])

    return float(loss)


In [2]:
# Set dimensions
d = 4
N = 3  # 3 negative samples

np.random.seed(42)

anchor = np.random.rand(d)
positive = np.random.rand(d)
negatives = np.random.rand(N, d)

loss = mnr_loss(anchor, positive, negatives)
print(f"MNR Loss = {loss:.4f}")


MNR Loss = 1.6965
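In practice MNR is often trained with in-batch negatives: for a batch of B (anchor, positive) pairs, the positives of the other B−1 pairs serve as negatives for each anchor, so no separate negative set is needed. The sketch below assumes that setup; the function name `mnr_loss_in_batch` and the `scale` parameter are hypothetical, not a library API. With `scale=1.0`, each row of the B×B score matrix is the same computation as `mnr_loss` above, using the other positives as `negatives`.

```python
import numpy as np

def mnr_loss_in_batch(anchors: np.ndarray, positives: np.ndarray, scale: float = 1.0) -> float:
    """Hypothetical in-batch MNR loss (a sketch, not a library API).

    anchors:   [B, d] user / query vectors
    positives: [B, d]; positives[j] is the positive for anchors[j]
               and acts as a negative for every other anchor in the batch
    scale:     assumed multiplier on the dot-product scores (default 1.0)
    """
    # Score matrix: scores[i, j] = scale * dot(anchors[i], positives[j]) -> [B, B]
    scores = scale * anchors @ positives.T
    # Row-wise log-softmax, subtracting the row max for numerical stability
    scores = scores - scores.max(axis=1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # The correct positive for row i sits on the diagonal (column i)
    return float(-np.mean(np.diag(log_probs)))

np.random.seed(0)
B, d = 4, 8
anchors = np.random.randn(B, d)
positives = anchors + 0.1 * np.random.randn(B, d)  # positives close to their anchors
print(f"In-batch MNR Loss = {mnr_loss_in_batch(anchors, positives):.4f}")
```

Reusing the batch as negatives is cheap, but it also means larger batches give each anchor more (and harder) negatives.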


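🎯 Typical applications in recommender systems:
Recommender systems with logged click/interaction data → supervised MNR

Items the user clicked are positives; items the user did not click are negatives (or negatives are obtained by negative sampling)

Cold-start or sparse-behavior scenarios, where MNR learns representations from text, images, etc. → unsupervised or self-supervised MNR

Generate augmented views from item copy (e.g., product descriptions) and contrast the multiple representations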
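The supervised setup needs negatives for each user. Below is a minimal sketch, assuming uniform sampling from the items a user did not click and item IDs `0..num_items-1`; real systems often use popularity-weighted or hard-negative sampling instead. The helper `sample_negatives` and its arguments are hypothetical.

```python
import numpy as np

def sample_negatives(clicked: set, num_items: int, num_neg: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: draw num_neg item IDs uniformly from items the user did not click.

    clicked:   set of item IDs the user interacted with (treated as positives)
    num_items: catalog size; item IDs are assumed to be 0..num_items-1
    """
    # Candidate pool = all items minus the clicked ones
    candidates = np.setdiff1d(np.arange(num_items), np.fromiter(clicked, dtype=int))
    # Sample without replacement so no negative repeats for the same user
    return rng.choice(candidates, size=num_neg, replace=False)

rng = np.random.default_rng(0)
clicked_items = {2, 5, 7}
negs = sample_negatives(clicked_items, num_items=10, num_neg=3, rng=rng)
print("positives:", sorted(clicked_items), "negatives:", negs.tolist())
```

Uniform sampling mostly yields easy negatives; it is a baseline rather than a recommendation.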

demo2

Goal:
Keep the user embedding "far" from the embeddings of items the user is not interested in, so that such items are not recommended.

In [3]:
import torch
import torch.nn.functional as F

def repel_loss(user_embedding, neg_item_embeddings, margin=0.3):
    """
    user_embedding: [batch_size, hidden_dim]
    neg_item_embeddings: [batch_size, neg_num, hidden_dim]
    """
    # [B, 1, D] vs [B, N, D] → cosine similarity → [B, N]
    user = F.normalize(user_embedding, dim=-1).unsqueeze(1)         # [B, 1, D]
    neg_items = F.normalize(neg_item_embeddings, dim=-1)            # [B, N, D]
    
    sim_scores = torch.bmm(user, neg_items.transpose(1, 2)).squeeze(1)  # [B, N]

    # Penalize similar items: only similarities above the margin count, and the loss grows with the excess
    loss = torch.clamp(sim_scores - margin, min=0).mean()
    
    return loss


In [4]:
# Assume batch_size=2, neg_num=3, hidden_dim=4
user_embedding = torch.randn(2, 4)
neg_item_embeddings = torch.randn(2, 3, 4)

loss = repel_loss(user_embedding, neg_item_embeddings)
print("Repel Loss:", loss.item())


Repel Loss: 0.19077010452747345
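To see what the margin does, here is a NumPy re-implementation of the same hinge-on-cosine penalty as `repel_loss`, applied to hand-picked vectors so the arithmetic can be checked by hand; `repel_loss_np` is a hypothetical name. Only negatives whose cosine similarity exceeds `margin` contribute (each adds `cos - margin`), and the result is averaged over all B×N pairs.

```python
import numpy as np

def repel_loss_np(user: np.ndarray, neg_items: np.ndarray, margin: float = 0.3) -> float:
    """NumPy sketch of the hinge-on-cosine repel loss.

    user:      [B, D] user embeddings
    neg_items: [B, N, D] embeddings of items the user is not interested in
    """
    # L2-normalize along the embedding dimension
    u = user / np.linalg.norm(user, axis=-1, keepdims=True)             # [B, D]
    n = neg_items / np.linalg.norm(neg_items, axis=-1, keepdims=True)  # [B, N, D]
    # Cosine similarity of each user with each of its N negatives -> [B, N]
    sim = np.einsum("bd,bnd->bn", u, n)
    # Hinge: only similarities above the margin are penalized
    return float(np.maximum(sim - margin, 0.0).mean())

user = np.array([[1.0, 0.0]])                              # B=1, D=2
negs = np.array([[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]])    # N=3: cos = 1.0, 0.0, ~0.707
print(f"{repel_loss_np(user, negs):.4f}")  # (0.7 + 0 + 0.4071) / 3 → 0.3690
```

The orthogonal item contributes nothing, so items that are already "far enough" (cosine at or below the margin) receive no gradient.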


InfoNCE in NumPy

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from sklearn.preprocessing import normalize

# 1. Simulate embeddings of user click sequences (e.g., outputs of an RNN or Transformer encoder)
np.random.seed(0)
batch_size = 6
embedding_dim = 8

# Original view
z1 = np.random.randn(batch_size, embedding_dim)
# Augmented view (simulates data augmentation such as sequence cropping or masking)
z2 = z1 + 0.1 * np.random.randn(batch_size, embedding_dim)  # add slight noise

# L2-normalize each row
z1_normalized = normalize(z1)
z2_normalized = normalize(z2)

# 2. Cosine similarity matrix
sim_matrix = np.dot(z1_normalized, z2_normalized.T)  # shape: [batch, batch]

# 3. InfoNCE loss: row i's positive is z2_i (the diagonal); the other columns are negatives
temperature = 0.1
exp_sim = np.exp(sim_matrix / temperature)
probs = exp_sim / np.sum(exp_sim, axis=1, keepdims=True)
positive_probs = np.diag(probs)
info_nce_loss = -np.log(positive_probs).mean()

# 4. Visualize the similarity matrix
sim_df = pd.DataFrame(sim_matrix, columns=[f'z2_{i}' for i in range(batch_size)],
                      index=[f'z1_{i}' for i in range(batch_size)])

plt.figure(figsize=(8, 6))
sns.heatmap(sim_df, annot=True, fmt=".2f", cmap="coolwarm", cbar=True)
plt.title("Cosine Similarity between z1 and z2 (Augmented Views)")
plt.tight_layout()
plt.show()

# 5. Output the similarity matrix and the loss
print(sim_df.round(2))
print(f"InfoNCE Loss = {info_nce_loss:.4f}")
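The cell above computes the loss in one direction only (z1 rows as queries, z2 rows as keys). A common variant averages in the reverse direction (z2 → z1) as well. The sketch below does that with a log-sum-exp formulation, which stays finite even for very small temperatures. The function name `info_nce` and the `symmetric` flag are hypothetical. With `symmetric=False` and the same seed and data generation, it should reproduce the one-way value from the cell above.

```python
import numpy as np

def info_nce(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1, symmetric: bool = True) -> float:
    """Sketch of InfoNCE between two views; row i of z1 and row i of z2 form the positive pair."""
    # L2-normalize so that dot products are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # [B, B]

    def ce_diag(x: np.ndarray) -> float:
        # Row-wise cross-entropy with the diagonal as the target, via log-sum-exp
        x_max = x.max(axis=1, keepdims=True)
        log_norm = x_max[:, 0] + np.log(np.exp(x - x_max).sum(axis=1))
        return float(np.mean(log_norm - np.diag(x)))

    loss = ce_diag(logits)  # z1 -> z2 direction
    if symmetric:
        loss = 0.5 * (loss + ce_diag(logits.T))  # average with the z2 -> z1 direction
    return loss

np.random.seed(0)
a = np.random.randn(6, 8)
b = a + 0.1 * np.random.randn(6, 8)
print(f"one-way: {info_nce(a, b, symmetric=False):.4f}, symmetric: {info_nce(a, b):.4f}")
```

Lowering `temperature` sharpens the softmax, so hard negatives (off-diagonal entries close to the diagonal) dominate the gradient.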
