# YOLOv8における知識蒸留の実装（PAAMA論文 参考コンセプト）

このノートブックは、論文「Positive Anchor Area Merge Algorithm：YOLOv8に基づく果物検出タスク向け知識蒸留アルゴリズム」で提示されたアイデアを参考に、YOLOv8モデルで知識蒸留を行う基本的なアプローチと概念を検討するものです。

**知識蒸留の目的:**
大規模で高性能な教師モデル（Teacher Model）から、軽量で高速な生徒モデル（Student Model）へ知識を転送し、生徒モデルの性能を単独で学習するよりも向上させること。

**参考論文のキーポイント:**
- YOLOv8に特化した知識蒸留手法「Positive Anchor Area Merge Algorithm (PAAMA)」を提案。
- 教師モデルと生徒モデルの予測における「正のアンカー領域」を4つ（共通、生徒専用、教師専用、負例）に分類し、それぞれに最適化された学習戦略を適用。
- 蒸留用に分類損失、ボックス損失、DFL損失を細分化した10項目の損失関数を設計。
- YOLOv8sを教師、YOLOv8nを生徒として使用し、独自の果物データセットでmAP(50) = 99.47%を達成。

**このノートブックでのアプローチ:**
- Ultralytics YOLOライブラリを使用します。
- PAAMAや並列蒸留ネットワークの完全な再現は、ライブラリの標準機能を超える大幅なカスタマイズが必要なため、ここでは行いません。
- 代わりに、知識蒸留の基本的な考え方、特に教師モデルのソフトターゲット（予測確率など）を生徒モデルの学習に利用する一般的な蒸留損失の概念と、Ultralyticsの枠組みでそれをどのように組み込むかを考察します。

## 1. 必要なライブラリのインポート

In [1]:
import torch
from ultralytics import YOLO
from ultralytics.models.yolo.detect.train import DetectionTrainer # MODIFIED
from ultralytics.cfg import DEFAULT_CFG # MODIFIED
from copy import deepcopy
import os
import gc
import psutil  # CPU メモリ監視用

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

PyTorch version: 2.7.0+cu128
CUDA available: True
CUDA device: NVIDIA GeForce RTX 4060


## 1.1. メモリ管理用ユーティリティ関数

CPU・GPUの両方のメモリ使用量の監視とクリーンアップ用の関数を定義します。

In [2]:
def get_memory_usage():
    """現在のCPUとGPUメモリ使用量を表示"""
    # CPU メモリ使用量
    cpu_memory = psutil.virtual_memory()
    cpu_used_gb = cpu_memory.used / 1024**3
    cpu_total_gb = cpu_memory.total / 1024**3
    cpu_percent = cpu_memory.percent
    
    print(f"CPU Memory - Used: {cpu_used_gb:.2f}GB / {cpu_total_gb:.2f}GB ({cpu_percent:.1f}%)")
    
    # GPU メモリ使用量
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1024**3  # GB
        cached = torch.cuda.memory_reserved() / 1024**3      # GB
        print(f"GPU Memory - Allocated: {allocated:.2f}GB, Cached: {cached:.2f}GB")
    else:
        print("GPU Memory - Not available (CPU mode)")

def clear_memory():
    """CPU・GPU両方のメモリを手動でクリア"""
    # Python オブジェクトのガベージコレクション実行（複数回）
    for _ in range(3):
        gc.collect()
    
    # PyTorchのGPUキャッシュクリア
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.synchronize()
        torch.cuda.ipc_collect()  # プロセス間通信のキャッシュもクリア
    
    # CPU メモリの追加クリーンアップ
    # 不要な参照を削除
    import sys
    if hasattr(sys, '_clear_type_cache'):
        sys._clear_type_cache()
    
    print("Memory cleared (CPU + GPU)")

def force_cleanup(*objects):
    """指定されたオブジェクトを削除してメモリクリア"""
    for obj in objects:
        if obj is not None:
            try:
                # PyTorchテンソルの場合、CPUに移動してから削除
                if hasattr(obj, 'cpu') and hasattr(obj, 'device'):
                    if obj.device.type == 'cuda':
                        obj.cpu()
                del obj
            except Exception as e:
                print(f"Error deleting object: {e}")
    
    clear_memory()

def get_current_process_memory():
    """現在のプロセスのメモリ使用量を詳細表示"""
    process = psutil.Process()
    memory_info = process.memory_info()
    memory_percent = process.memory_percent()
    
    rss_gb = memory_info.rss / 1024**3  # 物理メモリ
    vms_gb = memory_info.vms / 1024**3  # 仮想メモリ
    
    print(f"Process Memory - RSS: {rss_gb:.2f}GB, VMS: {vms_gb:.2f}GB, Percent: {memory_percent:.1f}%")

def memory_profile(func_name=""):
    """メモリ使用量をプロファイル表示"""
    print(f"\n=== Memory Profile: {func_name} ===")
    get_current_process_memory()
    get_memory_usage()
    print("=" * 40)

# 初期メモリ使用量を確認
memory_profile("Initial")


=== Memory Profile: Initial ===
Process Memory - RSS: 0.44GB, VMS: 1.16GB, Percent: 2.8%
CPU Memory - Used: 8.20GB / 15.83GB (51.8%)
GPU Memory - Allocated: 0.00GB, Cached: 0.00GB


In [3]:
# --- データセット --- 
# ユーザー自身の data.yaml ファイルへのパスを指定してください
# 例: 'C:/Users/YourUser/datasets/my_yolo_dataset/data.yaml'
# 例: '../datasets/co./datco128.yaml' (Ultralyticsのサンプルデータセット)
dataset_yaml_path = 'C:/Users/akama/AppData/Local/Programs/Python/Python310/python_file/projects/tennisvision/data/processed/datasets/final_merged_dataset/data.yaml' # ★★★ 要変更 ★★★

# --- モデル --- 
# 教師モデル: ファインチューニング済みのyolov8x.ptファイルへのパス
teacher_model_path = 'C:/Users/akama/AppData/Local/Programs/Python/Python310/python_file/projects/tennisvision/models/weights/last.pt' # ★★★ 要変更: ファインチューニング済みyolov8xモデルのパスを指定 ★★★

# 生徒モデル: 事前学習済みyolov8n.pt
student_model_config = 'yolov8n.pt' # 事前学習済みモデルを直接使用
student_model_weights = None   # configで事前学習済みを指定しているため不要

# --- トレーニングパラメータ --- 
epochs = 50  # トレーニングエポック数 (デモ用に短く設定)
batch_size = 8 
img_size = 640
project_name = 'YOLOv8_Distillation_Demo'
experiment_name = 'student_distilled_run1'
device = 0 if torch.cuda.is_available() else 'cpu'

# --- 知識蒸留パラメータ --- 
distillation_temperature = 2.0  # ソフトターゲット生成時の温度パラメータ
alpha_distillation_cls = 0.5    # 分類蒸留損失の重み
alpha_distillation_bbox = 0.5   # バウンディングボックス蒸留損失の重み (もし実装する場合)

# 出力ディレクトリの確認
if not os.path.exists(dataset_yaml_path):
    print(f"警告: データセットYAML '{dataset_yaml_path}' が見つかりません。パスを確認してください。")

if not os.path.exists(teacher_model_path):
    print(f"警告: 教師モデル '{teacher_model_path}' が見つかりません。パスを確認してください。")

# 設定後のメモリ使用量確認
memory_profile("After Configuration")


=== Memory Profile: After Configuration ===
Process Memory - RSS: 0.45GB, VMS: 1.16GB, Percent: 2.8%
CPU Memory - Used: 8.21GB / 15.83GB (51.9%)
GPU Memory - Allocated: 0.00GB, Cached: 0.00GB


In [4]:
teacher_model = YOLO(teacher_model_path)
teacher_model.to(device)
teacher_model.eval() # 評価モード
print(f"教師モデル ({teacher_model_path}) をロードしました。")

# 教師モデルロード後のメモリ使用量確認
memory_profile("After Teacher Model Load")

教師モデル (C:/Users/akama/AppData/Local/Programs/Python/Python310/python_file/projects/tennisvision/models/weights/last.pt) をロードしました。

=== Memory Profile: After Teacher Model Load ===
Process Memory - RSS: 1.09GB, VMS: 2.49GB, Percent: 6.9%
CPU Memory - Used: 8.93GB / 15.83GB (56.4%)
GPU Memory - Allocated: 0.26GB, Cached: 0.29GB


In [5]:
class CustomDistillationTrainer(DetectionTrainer):
    def __init__(self, cfg=DEFAULT_CFG, overrides=None, _callbacks=None):
        if overrides is None:
            overrides = {}
        super().__init__(cfg=cfg, overrides=overrides, _callbacks=_callbacks)

        # グローバルスコープから教師モデルと蒸留パラメータを取得
        self.teacher_model = teacher_model # 事前にロードされた教師モデル
        self.distillation_temperature = distillation_temperature
        self.alpha_distillation_cls = alpha_distillation_cls
        
        # 損失名をログ表示用に追加
        if hasattr(self, 'loss_names'):
            self.loss_names = list(self.loss_names) + ['kd_cls_loss']
        
        print("CustomDistillationTrainer initialized.")
        print(f"  Teacher model: {self.teacher_model.ckpt_path if hasattr(self.teacher_model, 'ckpt_path') else 'N/A'}")
        print(f"  Distillation temperature: {self.distillation_temperature}")
        print(f"  Alpha (CLS distillation): {self.alpha_distillation_cls}")

    def get_model(self, cfg=None, weights=None, verbose=True):
        # 生徒モデルをロード
        student_yolo_obj = YOLO(student_model_config)
        model_to_return = student_yolo_obj.model # nn.Module を取得
        
        # データセットのクラス数にモデルを適合させる
        # self.model は nn.Module インスタンスになるため、直接属性を設定
        if hasattr(self, 'data') and self.data:
            if hasattr(model_to_return, 'nc'):
                model_to_return.nc = self.data['nc']
            # YAML設定も更新
            if hasattr(model_to_return, 'yaml') and isinstance(model_to_return.yaml, dict):
                model_to_return.yaml['nc'] = self.data['nc']
        
        return model_to_return # YOLOオブジェクトではなく、nn.Moduleを返す

    def criterion(self, preds, batch):
        # 1. 生徒モデルの標準損失を計算 (親クラスのcriterionを利用)
        loss_student, loss_items_student = super().criterion(preds, batch)

        kd_cls_loss = torch.tensor(0.0, device=self.device) # 初期化

        try:
            # 2. 教師モデルの生の予測を取得 (勾配計算なし)
            with torch.no_grad():
                # 教師モデルの内部nn.Moduleのforwardメソッドを直接呼び出し
                teacher_preds_raw = self.teacher_model.model(batch['img'])
                # タプルの場合は最初の要素が主要な予測テンソル
                if isinstance(teacher_preds_raw, tuple):
                    teacher_preds_raw = teacher_preds_raw[0]

            # 生徒モデルの出力もタプルの場合は最初の要素を取得
            student_preds = preds
            if isinstance(preds, tuple):
                student_preds = preds[0]

            # 3. 生徒モデルの分類ロジット抽出
            # モデルのreg_maxとncを取得
            # self.model は nn.Module を指すように変更された
            reg_max_student = getattr(self.model[-1], 'reg_max', 16)  # Detection head
            nc_student = self.model.nc
            
            # 分類ロジットを抽出: [..., 4 * reg_max:]
            bbox_dims_student = 4 * reg_max_student
            student_cls_logits = student_preds[..., bbox_dims_student:bbox_dims_student + nc_student]

            # 4. 教師モデルの分類ロジット抽出
            teacher_nn_model = self.teacher_model.model
            reg_max_teacher = getattr(teacher_nn_model.model[-1], 'reg_max', 16)  # Detection head
            nc_teacher = teacher_nn_model.nc
            
            if nc_teacher != nc_student:
                print(f"Warning: Teacher nc ({nc_teacher}) and Student nc ({nc_student}) mismatch.")
                # クラス数が異なる場合は蒸留をスキップ
                raise ValueError("Class number mismatch between teacher and student models")
            
            bbox_dims_teacher = 4 * reg_max_teacher
            teacher_cls_logits = teacher_preds_raw[..., bbox_dims_teacher:bbox_dims_teacher + nc_teacher]

            # num_predictions (アンカー数など) が一致しているか確認
            if student_cls_logits.shape[-2] != teacher_cls_logits.shape[-2]:
                print(f"Warning: Mismatch in num_predictions between student ({student_cls_logits.shape[-2]}) and teacher ({teacher_cls_logits.shape[-2]}).")
                # 予測数が異なる場合、より小さい方に合わせるか、interpolationを行う
                min_preds = min(student_cls_logits.shape[-2], teacher_cls_logits.shape[-2])
                student_cls_logits = student_cls_logits[..., :min_preds, :]
                teacher_cls_logits = teacher_cls_logits[..., :min_preds, :]

            # 5. KLダイバージェンス損失の計算
            T = self.distillation_temperature
            
            # log_softmax と softmax を適用
            log_softmax_student = torch.nn.functional.log_softmax(student_cls_logits / T, dim=-1)
            softmax_teacher = torch.nn.functional.softmax(teacher_cls_logits / T, dim=-1)
            
            # KLダイバージェンス計算 (reduction='none' で要素ごとの損失を取得)
            kl_div_element_wise = torch.nn.functional.kl_div(
                log_softmax_student, 
                softmax_teacher, 
                reduction='none', 
                log_target=False
            )
            
            # クラス次元で合計し、バッチと予測次元で平均を取る
            # kl_div_element_wise.shape = (batch_size, num_predictions, num_classes)
            kd_cls_loss_val = kl_div_element_wise.sum(dim=-1).mean()  # sum over classes, mean over batch and predictions
            
            # 重みと温度の2乗を乗じる
            kd_cls_loss = self.alpha_distillation_cls * kd_cls_loss_val * (T ** 2)

            # print(f"KD CLS Loss: {kd_cls_loss.item():.6f}") # デバッグ用にコメントアウトまたは頻度を減らす

        except Exception as e:
            print(f"Error in KD CLS loss calculation: {e}. Skipping KD CLS loss for this batch.")
            # kd_cls_loss は既に torch.tensor(0.0) で初期化されている
            import traceback
            traceback.print_exc()

        # 6. 総損失の計算
        total_loss = loss_student + kd_cls_loss

        # 7. loss_items も更新 (蒸留損失項を追加)
        # loss_items_student は通常 (box_loss, cls_loss, dfl_loss) のテンソル
        loss_items_updated = torch.cat((loss_items_student, kd_cls_loss.detach().unsqueeze(0)))

        return total_loss, loss_items_updated
    
    def _do_train(self, world_size=1):
        """トレーニング後にメモリクリーンアップを追加"""
        try:
            # 親クラスのトレーニングメソッドを実行
            return super()._do_train(world_size)
        finally:
            # トレーニング終了後に必ずメモリクリーンアップ実行
            print("\nトレーニング終了後にメモリクリーンアップを実行中...")
            clear_memory()
            memory_profile("After Training Cleanup")
    
    def on_train_epoch_end(self):
        """エポック終了時にメモリクリーンアップ"""
        super().on_train_epoch_end() if hasattr(super(), 'on_train_epoch_end') else None
        
        # 10エポックごとにメモリクリーンアップ
        if hasattr(self, 'epoch') and self.epoch % 10 == 0:
            print(f"\nEpoch {self.epoch}: メモリクリーンアップ実行")
            clear_memory()

print("CustomDistillationTrainer定義完了")
memory_profile("After Trainer Definition")

CustomDistillationTrainer定義完了

=== Memory Profile: After Trainer Definition ===
Process Memory - RSS: 1.09GB, VMS: 2.49GB, Percent: 6.9%
CPU Memory - Used: 8.97GB / 15.83GB (56.6%)
GPU Memory - Allocated: 0.26GB, Cached: 0.29GB


In [6]:
if os.path.exists(dataset_yaml_path):
    print("トレーニングを開始します...")
    memory_profile("Before Training Start")
    
    # トレーナーの初期化に必要なオーバーライドを設定
    cfg_args = deepcopy(DEFAULT_CFG)
    cfg_args.data = dataset_yaml_path
    cfg_args.model = student_model_config # get_model で使うため、ここに設定
    cfg_args.epochs = epochs
    cfg_args.batch = batch_size
    cfg_args.imgsz = img_size
    cfg_args.project = project_name
    cfg_args.name = experiment_name
    cfg_args.device = device

    trainer = CustomDistillationTrainer(overrides=vars(cfg_args)) # または overrides として渡す
    
    try:
        trainer.train()
        print("トレーニングが完了しました。")
        print(f"結果は {os.path.join(project_name, experiment_name)} に保存されました。")
        
        memory_profile("After Training Complete")
        
    except Exception as e:
        print(f"トレーニング中にエラーが発生しました: {e}")
        import traceback
        traceback.print_exc()
        
    finally:
        # トレーナーオブジェクトを削除してメモリ解放
        print("\nトレーナーオブジェクトを削除してメモリクリーンアップ...")
        if 'trainer' in locals():
            # モデルをCPUに移動してから削除
            if hasattr(trainer, 'model') and trainer.model is not None:
                try:
                    trainer.model.cpu()
                except:
                    pass
            force_cleanup(trainer)
            del trainer
        
        memory_profile("Final Cleanup")
        
else:
    print(f"データセットYAML '{dataset_yaml_path}' が見つからないため、トレーニングをスキップします。")

トレーニングを開始します...

=== Memory Profile: Before Training Start ===
Process Memory - RSS: 1.09GB, VMS: 2.49GB, Percent: 6.9%
CPU Memory - Used: 9.00GB / 15.83GB (56.9%)
GPU Memory - Allocated: 0.26GB, Cached: 0.29GB
Ultralytics 8.3.146  Python-3.10.11 torch-2.7.0+cu128 CUDA:0 (NVIDIA GeForce RTX 4060, 8188MiB)
[34m[1mengine\trainer: [0magnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=8, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=C:/Users/akama/AppData/Local/Programs/Python/Python310/python_file/projects/tennisvision/data/processed/datasets/final_merged_dataset/data.yaml, degrees=0.0, deterministic=True, device=0, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=50, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=F

[34m[1mtrain: [0mScanning C:\Users\akama\AppData\Local\Programs\Python\Python310\python_file\projects\tennisvision\data\processed\datasets\final_merged_dataset\train\labels.cache... 6051 images, 0 backgrounds, 0 corrupt: 100%|██████████| 6051/6051 [00:00<?, ?it/s]



[34m[1mval: [0mFast image access  (ping: 0.20.0 ms, read: 618.4258.2 MB/s, size: 2489.8 KB)


[34m[1mval: [0mScanning C:\Users\akama\AppData\Local\Programs\Python\Python310\python_file\projects\tennisvision\data\processed\datasets\final_merged_dataset\val\labels.cache... 1295 images, 0 backgrounds, 0 corrupt: 100%|██████████| 1295/1295 [00:00<?, ?it/s]



Plotting labels to YOLOv8_Distillation_Demo\student_distilled_run18\labels.jpg... 
[34m[1moptimizer:[0m 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
[34m[1moptimizer:[0m AdamW(lr=0.001429, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to [1mYOLOv8_Distillation_Demo\student_distilled_run18[0m
Starting training for 50 epochs...
[34m[1moptimizer:[0m 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
[34m[1moptimizer:[0m AdamW(lr=0.001429, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to [1mYOLOv8_Distillation_Demo\student_distilled_run18[0m


       1/50      1.44G     0.8569     0.5641     0.8959         16        640: 100%|██████████| 757/757 [02:05<00:00,  6.03it/s]
       1/50      1.44G     0.8569     0.5641     0.8959         16        640: 100%|██████████| 757/757 [02:05<00:00,  6.03it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:15<00:00,  5.12it/s]

                   all       1295       3627      0.916      0.662      0.676      0.517






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       2/50      1.56G     0.8027     0.4758     0.8814         10        640: 100%|██████████| 757/757 [01:56<00:00,  6.50it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95):   0%|          | 0/81 [00:00<?, ?it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:17<00:00,  4.63it/s]

                   all       1295       3627      0.937      0.666      0.686      0.539






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       3/50      1.58G     0.7821     0.4619     0.8748         17        640: 100%|██████████| 757/757 [02:05<00:00,  6.04it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.96it/s]

                   all       1295       3627       0.87      0.671      0.674      0.524






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       4/50       1.6G     0.7451       0.44     0.8691         19        640: 100%|██████████| 757/757 [02:04<00:00,  6.08it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  5.01it/s]

                   all       1295       3627       0.83      0.676      0.681      0.568






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       5/50      1.62G     0.7158     0.4227     0.8617         22        640: 100%|██████████| 757/757 [01:56<00:00,  6.50it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  5.00it/s]

                   all       1295       3627      0.828      0.672      0.674      0.554






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       6/50      1.63G      0.695     0.4084     0.8563          5        640: 100%|██████████| 757/757 [01:55<00:00,  6.54it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.94it/s]

                   all       1295       3627      0.816      0.676      0.679      0.556






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       7/50      1.63G     0.6795     0.3947     0.8541          8        640: 100%|██████████| 757/757 [01:56<00:00,  6.51it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  5.02it/s]

                   all       1295       3627      0.826      0.673      0.682      0.565






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       8/50      1.63G     0.6626     0.3853       0.85         15        640: 100%|██████████| 757/757 [01:55<00:00,  6.56it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.96it/s]

                   all       1295       3627      0.831      0.676      0.677      0.566






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       9/50      1.63G     0.6476     0.3761     0.8495         10        640: 100%|██████████| 757/757 [01:56<00:00,  6.48it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.97it/s]


                   all       1295       3627      0.817      0.676      0.679      0.565

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      10/50      1.63G     0.6388     0.3693     0.8484         16        640: 100%|██████████| 757/757 [01:55<00:00,  6.53it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  5.00it/s]

                   all       1295       3627      0.837      0.678      0.682      0.582






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      11/50      1.63G     0.6344     0.3655     0.8431          8        640: 100%|██████████| 757/757 [01:56<00:00,  6.52it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.97it/s]

                   all       1295       3627      0.851      0.683      0.691      0.586






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      12/50      1.63G     0.6235     0.3609     0.8433         13        640: 100%|██████████| 757/757 [01:56<00:00,  6.50it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:15<00:00,  5.09it/s]

                   all       1295       3627      0.879      0.679      0.691      0.581






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      13/50      1.63G     0.6073     0.3518     0.8392          4        640: 100%|██████████| 757/757 [01:55<00:00,  6.53it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.89it/s]

                   all       1295       3627      0.842      0.685      0.689      0.587






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      14/50      1.63G     0.6038     0.3494     0.8381          9        640: 100%|██████████| 757/757 [01:56<00:00,  6.52it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  5.00it/s]

                   all       1295       3627      0.841      0.685      0.686      0.589






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      15/50      1.63G     0.5891     0.3422     0.8361         17        640: 100%|██████████| 757/757 [01:55<00:00,  6.58it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:17<00:00,  4.76it/s]

                   all       1295       3627      0.834      0.684      0.689      0.591






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      16/50      1.63G      0.594     0.3433      0.836         17        640: 100%|██████████| 757/757 [01:56<00:00,  6.47it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  5.01it/s]

                   all       1295       3627      0.851       0.68      0.698      0.593






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      17/50      1.63G     0.5907     0.3372     0.8355         11        640: 100%|██████████| 757/757 [01:55<00:00,  6.56it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  5.03it/s]

                   all       1295       3627      0.866      0.687      0.696      0.597






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      18/50      1.63G     0.5761      0.331     0.8347         18        640: 100%|██████████| 757/757 [01:55<00:00,  6.56it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  5.03it/s]

                   all       1295       3627       0.86      0.688      0.698      0.601






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      19/50      1.63G     0.5726     0.3332     0.8336         14        640: 100%|██████████| 757/757 [01:54<00:00,  6.58it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.85it/s]

                   all       1295       3627      0.863      0.688      0.704      0.596






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      20/50      1.63G     0.5685     0.3263     0.8311         16        640: 100%|██████████| 757/757 [01:57<00:00,  6.46it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.95it/s]

                   all       1295       3627      0.822      0.689      0.697        0.6






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      21/50      1.63G     0.5587     0.3217     0.8312         13        640: 100%|██████████| 757/757 [01:56<00:00,  6.48it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:15<00:00,  5.10it/s]

                   all       1295       3627       0.87       0.69      0.697        0.6






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      22/50      1.63G     0.5626     0.3222     0.8339         19        640: 100%|██████████| 757/757 [01:56<00:00,  6.52it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.96it/s]


                   all       1295       3627      0.873      0.688      0.697      0.601

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      23/50      1.63G     0.5465     0.3142     0.8301         19        640: 100%|██████████| 757/757 [01:56<00:00,  6.49it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.94it/s]

                   all       1295       3627      0.864      0.692        0.7      0.605






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      24/50      1.63G     0.5431     0.3117     0.8283         15        640: 100%|██████████| 757/757 [01:55<00:00,  6.53it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.99it/s]

                   all       1295       3627      0.879      0.695        0.7      0.601






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      25/50      1.63G     0.5464     0.3097     0.8315         13        640: 100%|██████████| 757/757 [01:55<00:00,  6.56it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  5.06it/s]

                   all       1295       3627      0.801       0.69        0.7      0.607






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      26/50      1.63G     0.5407     0.3074     0.8286         19        640: 100%|██████████| 757/757 [01:54<00:00,  6.59it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:15<00:00,  5.09it/s]

                   all       1295       3627      0.852      0.691      0.701      0.607






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      27/50      1.63G     0.5338     0.3048     0.8269         12        640: 100%|██████████| 757/757 [01:54<00:00,  6.61it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  5.05it/s]

                   all       1295       3627      0.856      0.692      0.701      0.608






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      28/50      1.63G     0.5329     0.3025     0.8277         12        640: 100%|██████████| 757/757 [01:56<00:00,  6.47it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.86it/s]

                   all       1295       3627       0.86      0.693      0.703       0.61






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      29/50      1.63G     0.5275     0.2992     0.8252          8        640: 100%|██████████| 757/757 [01:56<00:00,  6.51it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.98it/s]

                   all       1295       3627      0.873      0.694       0.71      0.612






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      30/50      1.63G     0.5236     0.2963     0.8235         13        640: 100%|██████████| 757/757 [01:56<00:00,  6.51it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  5.04it/s]

                   all       1295       3627      0.886      0.694      0.705       0.61






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      31/50      1.63G     0.5168     0.2943     0.8252         14        640: 100%|██████████| 757/757 [01:55<00:00,  6.54it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.96it/s]

                   all       1295       3627      0.883      0.693       0.71      0.612






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      32/50      1.63G     0.5099     0.2884     0.8207         10        640: 100%|██████████| 757/757 [01:56<00:00,  6.48it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  5.04it/s]

                   all       1295       3627      0.896      0.695       0.72      0.619






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      33/50      1.63G     0.5032     0.2832     0.8215         12        640: 100%|██████████| 757/757 [01:55<00:00,  6.55it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.91it/s]

                   all       1295       3627      0.878      0.697      0.707      0.617






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      34/50      1.63G     0.5085     0.2885      0.823         15        640: 100%|██████████| 757/757 [01:56<00:00,  6.51it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:17<00:00,  4.73it/s]

                   all       1295       3627      0.874      0.696      0.712      0.618






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      35/50      1.63G     0.4954     0.2806     0.8207         21        640: 100%|██████████| 757/757 [01:55<00:00,  6.53it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.98it/s]

                   all       1295       3627      0.875      0.697      0.702       0.61






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      36/50      1.63G     0.4979     0.2815     0.8218         14        640: 100%|██████████| 757/757 [01:56<00:00,  6.47it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.97it/s]

                   all       1295       3627        0.9      0.696      0.716      0.617






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      37/50      1.63G     0.4938     0.2784     0.8205         14        640: 100%|██████████| 757/757 [01:55<00:00,  6.57it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.95it/s]

                   all       1295       3627      0.878      0.698      0.709      0.619






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      38/50      1.63G     0.4836     0.2726     0.8185          6        640: 100%|██████████| 757/757 [02:00<00:00,  6.28it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.98it/s]

                   all       1295       3627      0.894      0.698      0.708      0.617






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      39/50      1.63G     0.4819     0.2708     0.8188          5        640: 100%|██████████| 757/757 [01:55<00:00,  6.56it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  5.02it/s]

                   all       1295       3627      0.872      0.697      0.708      0.619






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      40/50      1.63G      0.481     0.2692     0.8196         11        640: 100%|██████████| 757/757 [01:55<00:00,  6.56it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.96it/s]

                   all       1295       3627      0.896      0.698      0.714      0.623





Closing dataloader mosaic

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      41/50      1.63G     0.4406     0.2444     0.7926          7        640: 100%|██████████| 757/757 [01:55<00:00,  6.54it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  5.00it/s]

                   all       1295       3627      0.886        0.7      0.704      0.616






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      42/50      1.63G      0.432     0.2382     0.7921          6        640: 100%|██████████| 757/757 [01:54<00:00,  6.64it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.98it/s]

                   all       1295       3627      0.895      0.698      0.708      0.619






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      43/50      1.63G     0.4262     0.2351     0.7922          6        640: 100%|██████████| 757/757 [01:56<00:00,  6.52it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.91it/s]

                   all       1295       3627       0.89      0.699       0.71      0.621






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      44/50      1.63G     0.4215      0.232     0.7899          7        640: 100%|██████████| 757/757 [01:53<00:00,  6.65it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.86it/s]

                   all       1295       3627      0.891      0.698      0.715      0.624






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      45/50      1.63G     0.4169     0.2285     0.7887          7        640: 100%|██████████| 757/757 [01:55<00:00,  6.56it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.97it/s]

                   all       1295       3627      0.893        0.7      0.712      0.624






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      46/50      1.63G     0.4163     0.2269     0.7896          9        640: 100%|██████████| 757/757 [01:54<00:00,  6.61it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.87it/s]

                   all       1295       3627      0.904      0.699       0.71      0.624






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      47/50      1.63G     0.4076     0.2225     0.7887          8        640: 100%|██████████| 757/757 [01:54<00:00,  6.59it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.94it/s]

                   all       1295       3627      0.884      0.701      0.707      0.624






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      48/50      1.63G      0.402     0.2202     0.7878          7        640: 100%|██████████| 757/757 [01:53<00:00,  6.67it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.97it/s]

                   all       1295       3627      0.885      0.701       0.71      0.626






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      49/50      1.63G     0.4005     0.2176     0.7858          7        640: 100%|██████████| 757/757 [01:55<00:00,  6.58it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.84it/s]

                   all       1295       3627      0.891      0.701      0.711      0.625






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      50/50      1.63G     0.3941      0.214     0.7843          8        640: 100%|██████████| 757/757 [01:55<00:00,  6.57it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:16<00:00,  4.82it/s]

                   all       1295       3627      0.902      0.701      0.713      0.626






50 epochs completed in 1.860 hours.
Optimizer stripped from YOLOv8_Distillation_Demo\student_distilled_run18\weights\last.pt, 6.5MB
Optimizer stripped from YOLOv8_Distillation_Demo\student_distilled_run18\weights\best.pt, 6.5MB

Validating YOLOv8_Distillation_Demo\student_distilled_run18\weights\best.pt...
Ultralytics 8.3.146  Python-3.10.11 torch-2.7.0+cu128 CUDA:0 (NVIDIA GeForce RTX 4060, 8188MiB)
YOLOv8n summary (fused): 72 layers, 3,151,904 parameters, 0 gradients, 8.7 GFLOPs


                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 81/81 [00:18<00:00,  4.48it/s]


                   all       1295       3627      0.902      0.701      0.713      0.626
          player_front       1275       1276      0.991          1      0.995      0.975
           player_back       1290       1290       0.99      0.999      0.995      0.848
           tennis_ball       1061       1061      0.724      0.104      0.148     0.0537
Speed: 0.5ms preprocess, 2.3ms inference, 0.0ms loss, 3.1ms postprocess per image
Results saved to [1mYOLOv8_Distillation_Demo\student_distilled_run18[0m

トレーニング終了後にメモリクリーンアップを実行中...
Memory cleared (CPU + GPU)

=== Memory Profile: After Training Cleanup ===
Process Memory - RSS: 0.97GB, VMS: 5.62GB, Percent: 6.1%
CPU Memory - Used: 13.80GB / 15.83GB (87.2%)
GPU Memory - Allocated: 0.34GB, Cached: 0.42GB
トレーニングが完了しました。
結果は YOLOv8_Distillation_Demo\student_distilled_run1 に保存されました。

=== Memory Profile: After Training Complete ===
Process Memory - RSS: 0.97GB, VMS: 5.62GB, Percent: 6.1%
CPU Memory - Used: 13.80GB / 15.83GB (87.2%)
GPU Mem

In [None]:
# 教師モデルと関連オブジェクトを削除
print("教師モデルと関連オブジェクトのメモリクリーンアップ...")

# グローバル変数をクリア
objects_to_clear = []
if 'teacher_model' in globals():
    # 教師モデルをCPUに移動してから削除
    try:
        teacher_model.cpu()
    except:
        pass
    objects_to_clear.append(teacher_model)
    
if 'cfg_args' in globals():
    objects_to_clear.append(cfg_args)

force_cleanup(*objects_to_clear)

# グローバル変数の削除
globals_to_delete = ['teacher_model', 'cfg_args', 'trainer']
for var_name in globals_to_delete:
    if var_name in globals():
        del globals()[var_name]

# 強制的なメモリクリーンアップ
for _ in range(5):
    gc.collect()

print("メモリクリーンアップ完了")
memory_profile("Complete Cleanup")

## 5.2. メモリ監視用ユーティリティ (デバッグ用)

メモリリークの特定やメモリ使用量の詳細な監視が必要な場合は以下を実行してください。

In [None]:
import tracemalloc

def start_memory_trace():
    """メモリトレースを開始"""
    tracemalloc.start()
    print("メモリトレースを開始しました")

def get_memory_trace_stats():
    """メモリトレースの統計を表示"""
    if not tracemalloc.is_tracing():
        print("メモリトレースが開始されていません")
        return
    
    current, peak = tracemalloc.get_traced_memory()
    print(f"Current memory usage: {current / 1024**2:.1f} MB")
    print(f"Peak memory usage: {peak / 1024**2:.1f} MB")
    
    # トップ10のメモリ使用者を表示
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')
    
    print("\nTop 10 memory consumers:")
    for index, stat in enumerate(top_stats[:10], 1):
        print(f"{index}. {stat}")

def stop_memory_trace():
    """メモリトレースを停止"""
    if tracemalloc.is_tracing():
        tracemalloc.stop()
        print("メモリトレースを停止しました")

# 使用例:
# start_memory_trace()  # トレーニング前に実行
# get_memory_trace_stats()  # 途中でメモリ状況を確認
# stop_memory_trace()  # トレーニング後に実行