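Cell 1: Environment Setup and Path Configuration

Description:
Add the project root directory to Python's module search path so that the project's internal modules can be imported in later cells.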

In [1]:
# Cell 1: Environment setup and path configuration
import sys
import os

# Add the project root directory to sys.path
project_path = "/scratch/guanguowei/Code/MyWork/VIP5_Shadowcast_DPA"
if project_path not in sys.path:
    sys.path.insert(0, project_path)
print("Project path:", project_path)

Project path: /scratch/guanguowei/Code/MyWork/VIP5_Shadowcast_DPA
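
The cell above hard-codes an absolute path. As a minimal sketch, the same idempotent `sys.path` insertion can be wrapped in a helper that optionally reads the root from an environment variable. The helper name `add_project_root` and the variable name `VIP5_ROOT` are hypothetical and not part of the project.

```python
import os
import sys
from pathlib import Path


def add_project_root(default_root: str, env_var: str = "VIP5_ROOT") -> str:
    """Insert the project root at the front of sys.path exactly once.

    The root is read from the (hypothetical) environment variable `env_var`
    when it is set, otherwise `default_root` is used.
    """
    root = str(Path(os.environ.get(env_var, default_root)).expanduser())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


project_path = add_project_root("/scratch/guanguowei/Code/MyWork/VIP5_Shadowcast_DPA")
print("Project path:", project_path)
```

Calling the helper repeatedly (for example when re-running the cell) does not add duplicate entries.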


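Cell 2: Import Dependencies and Modules

Description:
Import all required third-party libraries and project-internal modules. Note that some modules (such as P5Tokenizer) are only used in later cells.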

In [2]:
# Cell 2: Import dependencies and modules
import collections
import random
import re
import os
import logging
import shutil
import time
from pathlib import Path
from packaging import version
from collections import defaultdict

from tqdm import tqdm
import numpy as np
import gzip
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
import torch.distributed as dist
import torch.backends.cudnn as cudnn

# Import project-internal modules
from src.param import parse_args
from src.utils import LossMeter, load_state_dict, set_global_logging_level
from src.dist_utils import reduce_dict
from transformers import T5Tokenizer
from src.tokenization import P5Tokenizer
from src.model import VIP5Tuning
from src.trainer_base import TrainerBase

# Decide whether to use native AMP or Apex
_use_native_amp = False
_use_apex = False
if version.parse(torch.__version__) < version.parse("1.6"):
    from transformers.file_utils import is_apex_available
    if is_apex_available():
        from apex import amp
        _use_apex = True
else:
    _use_native_amp = True
    from torch.cuda.amp import autocast

print("All dependencies imported")


  from .autonotebook import tqdm as notebook_tqdm


All dependencies imported
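
The version check in this cell chooses between native AMP (PyTorch 1.6 or newer) and Apex. The sketch below restates that decision as a pure function so it can be exercised without PyTorch installed. It is a variant that selects Apex only when Apex is actually available. The names `parse_major_minor` and `select_amp_backend` are hypothetical, and the version parser is a simplified stand-in for `packaging.version.parse`.

```python
import re


def parse_major_minor(version_string: str) -> tuple:
    """Extract (major, minor) from strings such as '1.13.1+cu117'.

    Simplified stand-in for packaging.version.parse: it only looks at the
    leading 'major.minor' digits.
    """
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def select_amp_backend(torch_version: str, apex_available: bool) -> str:
    """Return 'native', 'apex', or 'none' for mixed-precision training."""
    if parse_major_minor(torch_version) >= (1, 6):
        return "native"  # torch.cuda.amp.autocast exists from 1.6 onwards
    return "apex" if apex_available else "none"


for torch_version, apex in [("1.5.0", True), ("1.5.0", False), ("1.13.1+cu117", False)]:
    print(torch_version, apex, "->", select_amp_backend(torch_version, apex))
```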


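Cell 3: Define Helper Functions

Description:
Define commonly used helpers, such as loaders for pickle and JSON files and file-reading functions, for use in later cells.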

In [4]:
# Cell 3: Define helper functions
import pickle
import json

def load_pickle(filename):
    with open(filename, "rb") as f:
        return pickle.load(f)

def save_pickle(data, filename):
    with open(filename, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
        
def load_json(file_path):
    with open(file_path, "r") as f:
        return json.load(f)
    
def ReadLineFromFile(path):
    lines = []
    with open(path, 'r') as fd:
        for line in fd:
            lines.append(line.rstrip('\n'))
    return lines

def parse(path):
    # NOTE: eval executes arbitrary code; only use this on trusted files
    with gzip.open(path, 'r') as g:
        for l in g:
            yield eval(l)

print("Helper functions defined")


Helper functions defined
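
The `parse` helper evaluates each line of a gzip file with `eval`. As a hedged alternative, the sketch below uses `ast.literal_eval`, which accepts only Python literals. The function name `parse_literal` and the sample records are made up for illustration, and the sketch assumes each line holds a single dict literal.

```python
import ast
import gzip
import os
import tempfile


def parse_literal(path):
    """Yield one Python literal (e.g. a dict) per non-empty line of a gzip file.

    ast.literal_eval only accepts literals, so it cannot execute arbitrary
    code the way eval can.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield ast.literal_eval(line)


# Round-trip demonstration on a temporary file with made-up records.
records = [{"asin": "X1", "overall": 5.0}, {"asin": "X2", "overall": 3.0}]
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.json.gz")
    with gzip.open(sample_path, "wt", encoding="utf-8") as handle:
        for record in records:
            handle.write(repr(record) + "\n")
    parsed = list(parse_literal(sample_path))

print(parsed)
```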


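Cell 4: Define the DotDict Class and Set Parameters

Description:
Define a DotDict class so that dictionary values can be accessed as attributes, then set all experiment parameters and random seeds so that results are reproducible.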

In [5]:
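# Cell 4: Set parameters and random seeds
# Purpose: build the argument object, set random seeds and experiment parameters so that results are reproducible.

class DotDict(dict):
    """Dictionary wrapper that supports attribute-style access."""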
    def __init__(self, **kwds):
        self.update(kwds)
        self.__dict__ = self
    def __repr__(self):
        # Avoid recursive __repr__ calls by using dict.__repr__ directly
        return dict.__repr__(self)

# Build the argument object
args = DotDict()

# ----------------- Basic training parameters -----------------
args.distributed = False
args.multiGPU = True
args.fp16 = True

args.split = "clothing"
args.train = args.split
args.valid = args.split
args.test = args.split
args.batch_size = 16
args.optim = 'adamw'
args.warmup_ratio = 0.1
args.lr = 1e-3
args.num_workers = 4
args.clip_grad_norm = 5.0
args.losses = 'sequential,direct,explanation'
args.backbone = 't5-small'

# ----------------- Model and visual feature parameters -----------------
args.image_feature_type = 'vitb32'
args.image_feature_size_ratio = 2
args.use_adapter = True
args.reduction_factor = 8
args.use_single_adapter = True
args.use_vis_layer_norm = True
args.add_adapter_cross_attn = True
args.use_lm_head_adapter = True

# ----------------- Epochs, random seed, etc. -----------------
args.epoch = 20
args.local_rank = 0
args.comment = ''
args.train_topk = -1
args.valid_topk = -1
args.dropout = 0.1
args.tokenizer = 'p5'
args.max_text_length = 1024
args.gen_max_length = 64
args.do_lower_case = False
args.weight_decay = 0.01
args.adam_eps = 1e-6
args.gradient_accumulation_steps = 1

# Set random seeds
args.seed = 2022
torch.manual_seed(args.seed)
random.seed(args.seed)
np.random.seed(args.seed)

# ----------------- Enable whole-word and category embeddings -----------------
args.whole_word_embed = True
args.category_embed = True

# ----------------- cuDNN and GPU parameters -----------------
cudnn.benchmark = True
ngpus_per_node = torch.cuda.device_count()
args.world_size = ngpus_per_node

# Build the list of loss names
LOSSES_NAME = [f'{name}_loss' for name in args.losses.split(',')]
LOSSES_NAME.append('total_loss')
args.LOSSES_NAME = LOSSES_NAME

print("Current configuration:")
print(args)


Current configuration:
{'distributed': False, 'multiGPU': True, 'fp16': True, 'split': 'clothing', 'train': 'clothing', 'valid': 'clothing', 'test': 'clothing', 'batch_size': 16, 'optim': 'adamw', 'warmup_ratio': 0.1, 'lr': 0.001, 'num_workers': 4, 'clip_grad_norm': 5.0, 'losses': 'sequential,direct,explanation', 'backbone': 't5-small', 'image_feature_type': 'vitb32', 'image_feature_size_ratio': 2, 'use_adapter': True, 'reduction_factor': 8, 'use_single_adapter': True, 'use_vis_layer_norm': True, 'add_adapter_cross_attn': True, 'use_lm_head_adapter': True, 'epoch': 20, 'local_rank': 0, 'comment': '', 'train_topk': -1, 'valid_topk': -1, 'dropout': 0.1, 'tokenizer': 'p5', 'max_text_length': 1024, 'gen_max_length': 64, 'do_lower_case': False, 'weight_decay': 0.01, 'adam_eps': 1e-06, 'gradient_accumulation_steps': 1, 'seed': 2022, 'whole_word_embed': True, 'category_embed': True, 'world_size': 4, 'LOSSES_NAME': ['sequential_loss', 'direct_loss', 'explanation_loss', 'total_loss']}
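
The following self-contained sketch shows how the DotDict pattern behaves: because `__dict__` is bound to the dictionary itself, attribute access and key access read and write the same storage. It repeats the class definition so it can run on its own, and the values are taken from the configuration printed above.

```python
class DotDict(dict):
    """Dictionary whose keys are also readable and writable as attributes."""

    def __init__(self, **kwds):
        super().__init__()
        self.update(kwds)
        self.__dict__ = self  # attribute storage and dict storage are the same object


args = DotDict(losses="sequential,direct,explanation")
args.seed = 2022             # attribute write ...
assert args["seed"] == 2022  # ... is visible as a key
args["lr"] = 1e-3            # key write ...
assert args.lr == 1e-3       # ... is visible as an attribute

# vars(args) returns the instance __dict__, which is the DotDict itself.
# This is why code such as vars(args).items() sees every parameter.
assert vars(args) is args

loss_names = [f"{name}_loss" for name in args.losses.split(",")] + ["total_loss"]
print(loss_names)
```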


Cell 5: GPU Setup and Run Name Generation

Description:
Select the GPU manually and build a run name (run_name) so that subsequent logs and saved results can be told apart.

In [6]:
# Cell 5: GPU setup and run name generation
# Purpose: select the GPU manually and build a run name

# Manually specify the GPU ID
gpu = 3
args.gpu = gpu
args.rank = gpu
print(f'Process Launching at GPU {gpu}')

# Set the current GPU device
torch.cuda.set_device(f'cuda:{gpu}')

# Build the run name
comments = []
dsets = []
if 'toys' in args.train:
    dsets.append('toys')
if 'beauty' in args.train:
    dsets.append('beauty')
if 'sports' in args.train:
    dsets.append('sports')
if 'clothing' in args.train:
    dsets.append('clothing')
comments.append(''.join(dsets))
if args.backbone:
    comments.append(args.backbone)
comments.append(''.join(args.losses.split(',')))
if args.comment != '':
    comments.append(args.comment)
comment = '_'.join(comments)

from datetime import datetime
current_time = datetime.now().strftime('%m%d')  # e.g. '0304'

if args.local_rank in [0, -1]:
    run_name = f'{current_time}_GPU{args.world_size}'
    if len(comments) > 0:
        run_name += f'_{comment}'
    args.run_name = run_name
    print("Run name:", args.run_name)


Process Launching at GPU 3
Run name: 0412_GPU4_clothing_t5-small_sequentialdirectexplanation
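
The run-name logic in this cell can be condensed into one function, which makes it easy to check against the printed run name. This is a sketch. The function name `build_run_name` and its parameter names are hypothetical, and the date is passed in explicitly instead of being read from the clock.

```python
def build_run_name(train, backbone, losses, comment, world_size, date_str):
    """Compose '<date>_GPU<n>_<datasets>_<backbone>_<losses>[_<comment>]'."""
    known_datasets = ["toys", "beauty", "sports", "clothing"]  # same order as the cell
    dsets = [name for name in known_datasets if name in train]
    parts = ["".join(dsets)]
    if backbone:
        parts.append(backbone)
    parts.append("".join(losses.split(",")))
    if comment:
        parts.append(comment)
    return f"{date_str}_GPU{world_size}_" + "_".join(parts)


run_name = build_run_name(
    train="clothing",
    backbone="t5-small",
    losses="sequential,direct,explanation",
    comment="",
    world_size=4,
    date_str="0412",
)
print(run_name)
```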


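Cell 6: Build the Model Config, Tokenizer, and Model

Description:
Build the model configuration (config) from the parameters, create the tokenizer, and load the pretrained model.
Note: the checkpoint was saved with T5Tokenizer but is loaded here through P5Tokenizer, so a warning is printed; functionality is not affected.
In addition, to fit the adapters, config.d_model must be assigned to adapter_config.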

In [7]:
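# Cell 6: Build the model config, tokenizer, and model
# Purpose: build the model config from the parameters, create the tokenizer, and load the pretrained model
import re  # make sure the re module is imported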

def create_config(args):
    from transformers import T5Config
    from adapters import AdapterConfig  # adapter configuration

    # Load the T5 config from the pretrained checkpoint
    config = T5Config.from_pretrained(args.backbone)
    # Copy all arguments into the config
    for k, v in vars(args).items():
        setattr(config, k, v)
    config.non_linearity = "relu"

    # Set visual feature parameters
    image_feature_dim_dict = {
        'vitb32': 512,
        'vitb16': 512,
        'vitl14': 768,
        'rn50': 1024,
        'rn101': 512
    }
    config.feat_dim = image_feature_dim_dict[args.image_feature_type]
    config.n_vis_tokens = args.image_feature_size_ratio
    config.use_vis_layer_norm = args.use_vis_layer_norm
    config.reduction_factor = args.reduction_factor

    config.use_adapter = args.use_adapter
    config.add_adapter_cross_attn = args.add_adapter_cross_attn
    config.use_lm_head_adapter = args.use_lm_head_adapter
    config.use_single_adapter = args.use_single_adapter

    config.dropout_rate = args.dropout
    config.dropout = args.dropout
    config.attention_dropout = args.dropout
    config.activation_dropout = args.dropout

    config.losses = args.losses

    # If adapters are enabled, create the adapter config and pass the main config's d_model to it
    tasks = re.split("[, ]+", args.losses)
    if args.use_adapter:
        adapter_config = AdapterConfig()
        adapter_config.tasks = tasks
        adapter_config.d_model = config.d_model  # pass the hidden size
        adapter_config.use_single_adapter = args.use_single_adapter
        adapter_config.reduction_factor = args.reduction_factor
        adapter_config.track_z = False
        config.adapter_config = adapter_config
    else:
        config.adapter_config = None

    return config

def create_tokenizer(args):
    from transformers import T5Tokenizer
    # Choose P5Tokenizer or T5Tokenizer based on the arguments
    if 'p5' in args.tokenizer:
        from src.tokenization import P5Tokenizer
        tokenizer_class = P5Tokenizer
    else:
        tokenizer_class = T5Tokenizer

    tokenizer = tokenizer_class.from_pretrained(
        args.backbone,
        max_length=args.max_text_length,
        do_lower_case=args.do_lower_case,
    )
    print("Tokenizer:", tokenizer_class, args.backbone)
    return tokenizer

def create_model(model_class, config):
    print(f'Building Model at GPU {args.gpu}')
    model = model_class.from_pretrained(
        args.backbone,
        config=config
    )
    return model

# Build the config, tokenizer, and model
config = create_config(args)
if args.tokenizer is None:
    args.tokenizer = args.backbone
tokenizer = create_tokenizer(args)
model_class = VIP5Tuning
model = create_model(model_class, config)

# Move the model to the selected GPU
model = model.cuda()

# If P5Tokenizer is used, resize the model's token embeddings
if 'p5' in args.tokenizer:
    model.resize_token_embeddings(tokenizer.vocab_size)
model.tokenizer = tokenizer

print("Model and tokenizer built")


The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'T5Tokenizer'. 
The class this function is called from is 'P5Tokenizer'.


Tokenizer: <class 'src.tokenization.P5Tokenizer'> t5-small
Building Model at GPU 3
JointEncoder initialized successfully.
T5Stack initialized successfully.


Some weights of VIP5Tuning were not initialized from the model checkpoint at t5-small and are newly initialized: ['decoder.block.1.layer.0.attn_adapter.adapters.direct.down_sampler.weight', 'decoder.block.0.layer.2.ff_adapter.adapters.explanation.down_sampler.bias', 'decoder.block.5.layer.0.attn_adapter.adapters.direct.down_sampler.weight', 'decoder.block.4.layer.2.ff_adapter.adapters.explanation.up_sampler.bias', 'encoder.block.0.layer.1.ff_adapter.adapters.direct.up_sampler.weight', 'encoder.block.2.layer.1.ff_adapter.adapters.direct.down_sampler.bias', 'decoder.block.5.layer.1.enc_attn_adapter.adapters.sequential.down_sampler.bias', 'encoder.block.4.layer.0.attn_adapter.adapters.direct.up_sampler.bias', 'encoder.block.0.layer.0.attn_adapter.adapters.sequential.up_sampler.bias', 'decoder.block.0.layer.0.attn_adapter.adapters.direct.down_sampler.weight', 'encoder.block.2.layer.1.ff_adapter.adapters.sequential.down_sampler.weight', 'encoder.block.4.layer.1.ff_adapter.adapters.sequentia

lm_head initialized successfully.
OutputParallelAdapterLayer initialized successfully.
AdapterConfig: AdapterConfig(add_layer_norm_before_adapter=False, add_layer_norm_after_adapter=False, non_linearity='gelu_new', reduction_factor=8)
Model and tokenizer built
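
Two small pieces of `create_config` can be checked in isolation: how the loss string is split into task names, and how `d_model` and `reduction_factor` relate to the adapter width. The sketch below ASSUMES the adapter bottleneck is `d_model // reduction_factor`, which is the usual convention for bottleneck adapters; the project's adapter code is not shown here, so treat that formula as an assumption. Both function names are hypothetical.

```python
import re


def split_tasks(losses: str) -> list:
    """Split a loss specification on commas and/or spaces, as the cell does."""
    return re.split("[, ]+", losses)


def adapter_bottleneck_size(d_model: int, reduction_factor: int) -> int:
    """Hidden width of a bottleneck adapter, ASSUMING d_model // reduction_factor."""
    return d_model // reduction_factor


tasks = split_tasks("sequential,direct,explanation")
# t5-small uses d_model = 512; reduction_factor = 8 comes from the arguments above.
bottleneck = adapter_bottleneck_size(d_model=512, reduction_factor=8)
print(tasks, bottleneck)
```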


Cell 7: Load Pretrained Model Weights

Description:
Load pretrained model weights from the specified checkpoint path and print the loading result.

In [8]:
# Cell 7: Load pretrained model weights
# Purpose: load pretrained model weights from a checkpoint
from pprint import pprint

def load_checkpoint(ckpt_path):
    state_dict = load_state_dict(ckpt_path, 'cpu')
    results = model.load_state_dict(state_dict, strict=False)
    pprint(results)

# Checkpoint path (adjust to your actual path)
args.load = "/scratch/guanguowei/Code/MyWork/VIP5_Shadowcast_DPA/snap/clothing/0411/clothing-vitb32-2-8-20-NoAttack/BEST_EVAL_LOSS.pth"
ckpt_path = args.load
load_checkpoint(ckpt_path)


_IncompatibleKeys(missing_keys=['output_adapter.adapter.down_sampler.weight', 'output_adapter.adapter.down_sampler.bias', 'output_adapter.adapter.up_sampler.weight', 'output_adapter.adapter.up_sampler.bias'], unexpected_keys=[])
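
With `strict=False`, the load reports keys the model expects but the checkpoint lacks (missing) and keys the checkpoint has but the model does not (unexpected). The sketch below reproduces that comparison on plain lists of key names, so it runs without PyTorch, and adds a check that every missing key belongs to an expected module. The function names and the toy key lists are hypothetical.

```python
def compare_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, similar to a non-strict load report."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    missing = [key for key in model_keys if key not in ckpt_set]
    unexpected = [key for key in checkpoint_keys if key not in model_set]
    return missing, unexpected


def only_expected_missing(missing, allowed_prefixes):
    """True when every missing key starts with one of the allowed prefixes."""
    return all(key.startswith(tuple(allowed_prefixes)) for key in missing)


# Toy key names for illustration only; they are not read from a real checkpoint.
model_keys = ["encoder.weight", "output_adapter.adapter.down_sampler.weight"]
checkpoint_keys = ["encoder.weight"]
missing, unexpected = compare_state_dict_keys(model_keys, checkpoint_keys)
print(missing, unexpected)
print(only_expected_missing(missing, ["output_adapter."]))
```

A check like this can guard against silently evaluating a model whose core weights failed to load.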


Cell 8: Load the Dataset and Data Maps

Description:
Load the data split file (rating_splits_augmented.pkl) and the data mapping file (datamaps.json) for the evaluation that follows.

In [9]:
# Cell 8: Load the dataset and data maps
# Purpose: load the rating_splits_augmented.pkl and datamaps.json data files

data_splits = load_pickle(f'../data/{args.split}/rating_splits_augmented.pkl')
test_review_data = data_splits['test']
print("Test data size:", len(test_review_data))
print("Test data example:", test_review_data[0])

data_maps = load_json(os.path.join('../data', args.split, 'datamaps.json'))
print("Number of users:", len(data_maps['user2id']))
print("Number of items:", len(data_maps['item2id']))


Test data size: 27867
Test data example: {'reviewerID': 'AQVU2X4NK5V31', 'asin': 'B004L4B7HG', 'reviewerName': 'Michael Duffy', 'helpful': [0, 3], 'reviewText': 'I loved the look of this shoe.  certainly better constructed than the Minimus but a different fit completely.  harder to break in.  another dust collector in my closet.', 'overall': 3.0, 'summary': 'Merrell Trail Glove Running', 'unixReviewTime': 1368144000, 'reviewTime': '05 10, 2013', 'explanation': 'certainly better constructed than the Minimus but a different fit completely', 'feature': 'fit'}
Number of users: 39387
Number of items: 23033


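Cell 9: Load the Data Loader and Evaluation Metrics

Description:
Import the data loading function and the metric functions used later to build data loaders and compute BLEU, ROUGE, and other metrics.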

In [10]:
# Cell 9: Import the data loader and evaluation metric functions
# Purpose: import get_loader and metric functions such as BLEU and ROUGE

from torch.utils.data import DataLoader, Dataset, Sampler
from src.data import get_loader
from evaluate.utils import rouge_score, bleu_score, unique_sentence_percent, root_mean_square_error, mean_absolute_error, feature_detect, feature_matching_ratio, feature_coverage_ratio, feature_diversity
from evaluate.metrics4rec import evaluate_all

print("Data loader and evaluation metric functions imported")


Data loader and evaluation metric functions imported


Cell 10: Evaluation - Explanation Task

Description:
Build the data loader for the explanation task, run the model to generate outputs, and compute BLEU and ROUGE metrics.

In [14]:
# =============================================================================
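# Cell 10: Evaluation - Explanation task (with prompt information)
# Description:
#   1. Load the Explanation task test data for the chosen prompt (e.g. 'C-12');
#   2. Run the model to generate predictions and compute BLEU (1-gram and 4-gram) and ROUGE;
#   3. Include the current prompt in both the name and the content of the saved results file,
#      so that results for different prompts can be compared later.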
# =============================================================================

import os
from datetime import datetime
from pathlib import Path
from tqdm import tqdm
import torch

# If args.load is set, extract the date from it; otherwise use the current date
if args.load is not None:
    # Assumes args.load looks like ".../snap/<split>/<date>/<exp_name>/BEST_EVAL_LOSS.pth"
    eval_date = Path(args.load).parents[1].name
else:
    eval_date = datetime.now().strftime("%m%d")

# Prompt and sample counts for the Explanation task
exp_prompt = 'C-3'  # change to the desired prompt ID, e.g. 'C-12' or 'C-3'
test_task_list = {'explanation': [exp_prompt]}
test_sample_numbers = {'sequential': (1, 1), 'direct': (1, 1), 'explanation': 1}

# Build the test data loader for the Explanation task
zeroshot_test_loader = get_loader(
    args,
    test_task_list,
    test_sample_numbers,
    split=args.test, 
    mode='test', 
    batch_size=args.batch_size,
    workers=args.num_workers,
    distributed=args.distributed,
    data_root="../data",        # explicitly set the data directory
    feature_root="../features"  # explicitly set the visual feature directory
)
print(f"Explanation task (Prompt: {exp_prompt}) number of batches:", len(zeroshot_test_loader))

tokens_predict = []
tokens_test = []

# Iterate over the test data loader and generate predictions with the model
for i, batch in tqdm(enumerate(zeroshot_test_loader), total=len(zeroshot_test_loader), ncols=100):
    with torch.no_grad():
        results = model.generate_step(batch)
        tokens_predict.extend(results)
        tokens_test.extend(batch['target_text'])

# Compute BLEU and ROUGE metrics
BLEU1 = bleu_score(tokens_test, tokens_predict, n_gram=1, smooth=False)
print(f'BLEU-1 {BLEU1:7.4f}')
BLEU4 = bleu_score(tokens_test, tokens_predict, n_gram=4, smooth=False)
print(f'BLEU-4 {BLEU4:7.4f}')

ROUGE = rouge_score(tokens_test, tokens_predict)
for k, v in ROUGE.items():
    print(f'{k} {v:7.4f}')

# Build the output directory and file name; the file name includes the current prompt
eval_dir = f"/scratch/guanguowei/Code/MyWork/VIP5_Shadowcast_DPA/log/{args.split}/{eval_date}/evaluation_logs"
os.makedirs(eval_dir, exist_ok=True)

explanation_filename = (
    f"VIP5_{args.split}_{args.image_feature_type}_"
    f"{args.reduction_factor}_{args.epoch}_evaluation_explanation_{exp_prompt}.txt"
)
explanation_log_path = os.path.join(eval_dir, explanation_filename)

# Save the evaluation results; the file content also records the prompt
with open(explanation_log_path, "w", encoding="utf-8") as f:
    f.write("Explanation Evaluation Results\n")
    f.write(f"Prompt: {exp_prompt}\n")
    f.write(f"BLEU-1: {BLEU1:7.4f}\n")
    f.write(f"BLEU-4: {BLEU4:7.4f}\n")
    for k, v in ROUGE.items():
        f.write(f"{k}: {v:7.4f}\n")

print(f"Explanation task (Prompt: {exp_prompt}) evaluation results saved to: {explanation_log_path}")


The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'T5Tokenizer'. 
The class this function is called from is 'P5Tokenizer'.


Data sources:  ['clothing']
compute_datum_info
Explanation task (Prompt: C-3) number of batches: 1120


100%|███████████████████████████████████████████████████████████| 1120/1120 [01:39<00:00, 11.30it/s]


BLEU-1  7.2843
BLEU-4  2.2932
rouge_1/f_score  6.0033
rouge_1/r_score  4.5774
rouge_1/p_score 11.5490
rouge_2/f_score  0.8001
rouge_2/r_score  0.6465
rouge_2/p_score  1.5125
rouge_l/f_score  4.6208
rouge_l/r_score  4.4421
rouge_l/p_score 11.3287
Explanation task (Prompt: C-3) evaluation results saved to: /scratch/guanguowei/Code/MyWork/VIP5_Shadowcast_DPA/log/clothing/0411/evaluation_logs/VIP5_clothing_vitb32_8_20_evaluation_explanation_C-3.txt
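
The evaluation date is taken from the checkpoint path with `Path(...).parents[1].name`, which relies on the layout `.../snap/<split>/<date>/<exp_name>/BEST_EVAL_LOSS.pth`. The sketch below isolates that step together with the log file name so both can be checked against the output above. The function names are hypothetical, and a relative path is used for brevity.

```python
from datetime import datetime
from pathlib import Path


def eval_date_from_checkpoint(load_path=None):
    """Return the <date> directory two levels above the checkpoint file.

    Falls back to today's date in MMDD form when no checkpoint path is given.
    """
    if load_path is not None:
        # parents[0] is <exp_name>, parents[1] is <date>
        return Path(load_path).parents[1].name
    return datetime.now().strftime("%m%d")


def explanation_log_name(split, feature_type, reduction_factor, epoch, prompt):
    """Build the results file name used for the Explanation task."""
    return (
        f"VIP5_{split}_{feature_type}_"
        f"{reduction_factor}_{epoch}_evaluation_explanation_{prompt}.txt"
    )


ckpt = "snap/clothing/0411/clothing-vitb32-2-8-20-NoAttack/BEST_EVAL_LOSS.pth"
print(eval_date_from_checkpoint(ckpt))
print(explanation_log_name("clothing", "vitb32", 8, 20, "C-3"))
```

If the checkpoint is stored under a different directory depth, `parents[1]` returns a different path component, so the layout assumption is worth verifying before relying on the log location.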


Cell 11: Evaluation - Direct Task

Description:
Load the test data for the direct task, generate outputs, and compute the evaluation metrics.

In [None]:
# =============================================================================
# Cell 11: Evaluation - Direct task (with prompt information)
# =============================================================================

import os
from datetime import datetime
from pathlib import Path
from tqdm import tqdm
import torch

# If args.load is set, extract the date from it; otherwise use the current date
if args.load is not None:
    eval_date = Path(args.load).parents[1].name
else:
    eval_date = datetime.now().strftime("%m%d")

# Test task and prompt for the Direct task
test_task_list = {'direct': ['B-5']}  # e.g. 'B-5' or 'B-8'
prompt = test_task_list['direct'][0]  # the prompt currently in use

test_sample_numbers = {
    'sequential': (1, 1),
    'direct': (1, 1),
    'explanation': 1
}

# Build the test data loader for the Direct task
zeroshot_test_loader = get_loader(
    args,
    test_task_list,
    test_sample_numbers,
    split=args.test,
    mode='test',
    batch_size=args.batch_size,
    workers=args.num_workers,
    distributed=args.distributed,
    data_root="../data",        # explicitly set the data directory
    feature_root="../features"  # explicitly set the visual feature directory
)

print(f"Direct task (Prompt: {prompt}) number of batches:", len(zeroshot_test_loader))

all_info = []
for i, batch in tqdm(enumerate(zeroshot_test_loader), total=len(zeroshot_test_loader)):
    with torch.no_grad():
        results = model.generate_step(batch)
        beam_outputs = model.generate(
            input_ids=batch['input_ids'].to('cuda'),
            whole_word_ids=batch['whole_word_ids'].to('cuda'),
            category_ids=batch['category_ids'].to('cuda'),
            vis_feats=batch['vis_feats'].to('cuda'),
            task=batch["task"][0],
            max_length=50,
            num_beams=20,
            no_repeat_ngram_size=0,
            num_return_sequences=20,
            early_stopping=True
        )
        generated_sents = model.tokenizer.batch_decode(beam_outputs, skip_special_tokens=True)

        # Iterate over each sample in the batch (assumes 20 candidates per sample)
        for j, (_, tgt_text, _) in enumerate(zip(results, batch['target_text'], batch['source_text'])):
            new_info = {}
            new_info['target_item'] = tgt_text
            new_info['gen_item_list'] = generated_sents[j * 20: (j + 1) * 20]
            all_info.append(new_info)

# Build the ground-truth and predicted-score dictionaries
gt = {}
ui_scores = {}
for i, info in enumerate(all_info):
    gt[i] = [int(info['target_item'])]
    pred_dict = {}
    for j, pred in enumerate(info['gen_item_list']):
        try:
            pred_dict[int(pred)] = -(j + 1)
        except Exception:
            pass
    ui_scores[i] = pred_dict

# Compute recommendation metrics
msg_top1, res_top1 = evaluate_all(ui_scores, gt, 1)
msg_top5, res_top5 = evaluate_all(ui_scores, gt, 5)
msg_top10, res_top10 = evaluate_all(ui_scores, gt, 10)

print("\nEvaluation Metrics at top-5:")
print(msg_top5)
print("\nEvaluation Metrics at top-10:")
print(msg_top10)

# Save the Direct task results to a file whose name includes the prompt
eval_dir = f"/scratch/guanguowei/Code/MyWork/VIP5_Shadowcast_DPA/log/{args.split}/{eval_date}/evaluation_logs"
os.makedirs(eval_dir, exist_ok=True)

direct_filename = (
    f"VIP5_{args.split}_"
    f"{args.image_feature_type}_"
    f"{args.reduction_factor}_"
    f"{args.epoch}_evaluation_direct_{prompt}.txt"
)
direct_log_path = os.path.join(eval_dir, direct_filename)

with open(direct_log_path, "w", encoding="utf-8") as f:
    f.write("Direct Evaluation Results\n")
    f.write(f"Prompt: {prompt}\n")
    f.write("Evaluation Metrics at top-5:\n")
    f.write(msg_top5 + "\n")
    f.write("Evaluation Metrics at top-10:\n")
    f.write(msg_top10 + "\n")

print(f"Direct task (Prompt: {prompt}) evaluation results saved to: {direct_log_path}")


The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'T5Tokenizer'. 
The class this function is called from is 'P5Tokenizer'.


Data sources:  ['clothing']
compute_datum_info
Direct task (Prompt: B-5) number of batches: 2462


100%|██████████| 2462/2462 [35:37<00:00,  1.15it/s]



NDCG@1	Rec@1	Hits@1	Prec@1	MAP@1	MRR@1
0.0473	0.0473	0.0473	0.0473	0.0473	0.0473

NDCG@5	Rec@5	Hits@5	Prec@5	MAP@5	MRR@5
0.0874	0.1265	0.1265	0.0253	0.0745	0.0745

NDCG@10	Rec@10	Hits@10	Prec@10	MAP@10	MRR@10
0.1097	0.1963	0.1963	0.0196	0.0836	0.0836

Evaluation Metrics at top-5:

NDCG@5	Rec@5	Hits@5	Prec@5	MAP@5	MRR@5
0.0874	0.1265	0.1265	0.0253	0.0745	0.0745

Evaluation Metrics at top-10:

NDCG@10	Rec@10	Hits@10	Prec@10	MAP@10	MRR@10
0.1097	0.1963	0.1963	0.0196	0.0836	0.0836
Direct task (Prompt: B-5) evaluation results saved to: /scratch/guanguowei/Code/MyWork/VIP5_Shadowcast_DPA/log/clothing/0411/evaluation_logs/VIP5_clothing_vitb32_8_20_evaluation_direct_B-5.txt
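
The cell converts each sample's ranked beam candidates into scores of `-(rank + 1)` and passes them to `evaluate_all`. The sketch below shows that conversion and the standard single-target definitions of Hits@k and NDCG@k. It is not the project's `evaluate_all` implementation, which is not shown here. One deliberate difference from the cell is that a repeated item keeps its best (earliest) rank instead of being overwritten by a later one. The function names and sample candidates are hypothetical.

```python
import math


def candidates_to_scores(candidates):
    """Map decoded beam candidates to scores; an earlier rank gives a higher score.

    Non-numeric strings are skipped, and a repeated item keeps its first rank.
    """
    scores = {}
    for rank, text in enumerate(candidates):
        try:
            item_id = int(text)
        except ValueError:
            continue
        scores.setdefault(item_id, -(rank + 1))
    return scores


def hit_and_ndcg_at_k(scores, target, k):
    """Hits@k and NDCG@k when exactly one item is relevant (ideal DCG is 1)."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if target not in ranked:
        return 0.0, 0.0
    position = ranked.index(target)  # 0-based position in the top-k list
    return 1.0, 1.0 / math.log2(position + 2)


scores = candidates_to_scores(["12", "oops", "7", "12", "30"])
print(scores)
print(hit_and_ndcg_at_k(scores, target=7, k=5))
print(hit_and_ndcg_at_k(scores, target=7, k=1))
```

Averaging these per-sample values over all test samples gives dataset-level Hits@k and NDCG@k.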



Cell 12: Evaluation - Sequential Task

Description:
Load the test data for the sequential task, generate outputs, decode the beam search results, and compute the evaluation metrics.

In [None]:
import os
from datetime import datetime
from pathlib import Path
from tqdm import tqdm
import torch

# If args.load is set, extract the date from it; otherwise use the current date
if args.load is not None:
    eval_date = Path(args.load).parents[1].name
else:
    eval_date = datetime.now().strftime("%m%d")

# Test task and prompt for the Sequential task
test_task_list = {'sequential': ['A-3']}  # e.g. 'A-9' or 'A-3'
prompt = test_task_list['sequential'][0]  # the prompt currently in use

test_sample_numbers = {
    'sequential': (1, 1),
    'direct': (1, 1),
    'explanation': 1
}

# Build the test data loader for the Sequential task
zeroshot_test_loader = get_loader(
    args,
    test_task_list,
    test_sample_numbers,
    split=args.test,
    mode='test',
    batch_size=args.batch_size,
    workers=args.num_workers,
    distributed=args.distributed,
    data_root="../data",         # explicitly set the data directory
    feature_root="../features"   # explicitly set the visual feature directory
)

print(f"Sequential task (Prompt: {prompt}) number of batches:", len(zeroshot_test_loader))

all_info = []
for i, batch in tqdm(enumerate(zeroshot_test_loader), total=len(zeroshot_test_loader)):
    with torch.no_grad():
        results = model.generate_step(batch)
        beam_outputs = model.generate(
            input_ids=batch['input_ids'].to('cuda'),
            whole_word_ids=batch['whole_word_ids'].to('cuda'),
            category_ids=batch['category_ids'].to('cuda'),
            vis_feats=batch['vis_feats'].to('cuda'),
            task=batch["task"][0],
            max_length=50,
            num_beams=20,
            no_repeat_ngram_size=0,
            num_return_sequences=20,
            early_stopping=True
        )
        generated_sents = model.tokenizer.batch_decode(beam_outputs, skip_special_tokens=True)

        # Iterate over each sample in the batch (assumes 20 candidates per sample)
        for j in range(len(batch['target_text'])):
            new_info = {}
            new_info['target_item'] = batch['target_text'][j]
            new_info['gen_item_list'] = generated_sents[j * 20: (j + 1) * 20]
            all_info.append(new_info)

# Build the ground-truth and predicted-score dictionaries
gt = {}
ui_scores = {}
for i, info in enumerate(all_info):
    gt[i] = [int(info['target_item'])]
    pred_dict = {}
    for j, pred in enumerate(info['gen_item_list']):
        try:
            pred_dict[int(pred)] = -(j + 1)
        except Exception:
            pass
    ui_scores[i] = pred_dict

# Compute recommendation metrics
msg_top1, res_top1 = evaluate_all(ui_scores, gt, 1)
msg_top5, res_top5 = evaluate_all(ui_scores, gt, 5)
msg_top10, res_top10 = evaluate_all(ui_scores, gt, 10)

print("\nEvaluation Metrics at top-1:")
print(msg_top1)
print("\nEvaluation Metrics at top-5:")
print(msg_top5)
print("\nEvaluation Metrics at top-10:")
print(msg_top10)

# Save the Sequential task results to a file whose name includes the prompt
eval_dir = f"/scratch/guanguowei/Code/MyWork/VIP5_Shadowcast_DPA/log/{args.split}/{eval_date}/evaluation_logs"
os.makedirs(eval_dir, exist_ok=True)

sequential_filename = (
    f"VIP5_{args.split}_"
    f"{args.image_feature_type}_"
    f"{args.reduction_factor}_"
    f"{args.epoch}_evaluation_sequential_{prompt}.txt"
)
sequential_log_path = os.path.join(eval_dir, sequential_filename)

with open(sequential_log_path, "w", encoding="utf-8") as f:
    f.write("Sequential Evaluation Results\n")
    f.write(f"Prompt: {prompt}\n")
    f.write("Evaluation Metrics at top-1:\n")
    f.write(msg_top1 + "\n")
    f.write("Evaluation Metrics at top-5:\n")
    f.write(msg_top5 + "\n")
    f.write("Evaluation Metrics at top-10:\n")
    f.write(msg_top10 + "\n")

print(f"Sequential task (Prompt: {prompt}) evaluation results saved to: {sequential_log_path}")


The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'T5Tokenizer'. 
The class this function is called from is 'P5Tokenizer'.


Data sources:  ['beauty']
compute_datum_info
Sequential task (Prompt: A-3) number of batches: 1398


100%|██████████| 1398/1398 [13:39<00:00,  1.71it/s]



NDCG@1	Rec@1	Hits@1	Prec@1	MAP@1	MRR@1
0.0328	0.0328	0.0328	0.0328	0.0328	0.0328

NDCG@5	Rec@5	Hits@5	Prec@5	MAP@5	MRR@5
0.0465	0.0597	0.0597	0.0119	0.0422	0.0422

NDCG@10	Rec@10	Hits@10	Prec@10	MAP@10	MRR@10
0.0508	0.0730	0.0730	0.0073	0.0440	0.0440

Evaluation Metrics at top-1:

NDCG@1	Rec@1	Hits@1	Prec@1	MAP@1	MRR@1
0.0328	0.0328	0.0328	0.0328	0.0328	0.0328

Evaluation Metrics at top-5:

NDCG@5	Rec@5	Hits@5	Prec@5	MAP@5	MRR@5
0.0465	0.0597	0.0597	0.0119	0.0422	0.0422

Evaluation Metrics at top-10:

NDCG@10	Rec@10	Hits@10	Prec@10	MAP@10	MRR@10
0.0508	0.0730	0.0730	0.0073	0.0440	0.0440
Sequential task (Prompt: A-3) evaluation results saved to: /scratch/guanguowei/Code/MyWork/VIP5_Shadowcast_DPA/log/beauty/0410/evaluation_logs/VIP5_beauty_vitb32_8_20_evaluation_sequential_A-3.txt
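
Both recommendation cells slice the flat list of decoded beam outputs with a hard-coded 20, which must match `num_return_sequences`. The sketch below groups the flat list by a parameter instead and fails loudly when the length does not divide evenly. It assumes the candidates for each input are returned contiguously, in input order. The function name `group_candidates` and the sample strings are hypothetical.

```python
def group_candidates(flat_candidates, num_return_sequences):
    """Split a flat candidate list into one sub-list per input sample."""
    if num_return_sequences <= 0:
        raise ValueError("num_return_sequences must be positive")
    if len(flat_candidates) % num_return_sequences != 0:
        raise ValueError(
            f"{len(flat_candidates)} candidates cannot be split into "
            f"groups of {num_return_sequences}"
        )
    return [
        flat_candidates[start:start + num_return_sequences]
        for start in range(0, len(flat_candidates), num_return_sequences)
    ]


# Two samples with three candidates each (made-up item IDs).
flat = ["11", "12", "13", "21", "22", "23"]
groups = group_candidates(flat, num_return_sequences=3)
print(groups)
```

Keeping the group size in one variable that is also passed to `generate` avoids a silent mismatch if the beam settings change.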

