# YOLOv7 Parotid Gland Detection

## Data Preprocessing

First, copy the dataset into `./data`.

In [1]:
```python
import os

# Source folders (LabelMe .json annotations + .png images)
dataset_path = "./data/data_20231207/"
test_dataset_path = "./data/demo_test9/"
# Output folder in the YOLO layout
new_dataset_path = "./data/newdataset/"

images_path = os.path.join(new_dataset_path, "images")
images_train_path = os.path.join(images_path, "train")
images_val_path = os.path.join(images_path, "val")
images_test_path = os.path.join(images_path, "test")

labels_path = os.path.join(new_dataset_path, "labels")
labels_train_path = os.path.join(labels_path, "train")
labels_val_path = os.path.join(labels_path, "val")
labels_test_path = os.path.join(labels_path, "test")

# Create images/{train,val,test} and labels/{train,val,test};
# makedirs also creates missing parent folders and skips existing ones
for path in [
    images_train_path,
    images_val_path,
    images_test_path,
    labels_train_path,
    labels_val_path,
    labels_test_path,
]:
    os.makedirs(path, exist_ok=True)
```

### Convert annotations from JSON to TXT

[How the conversion works](https://hackmd.io/dEAIfI_hS4mG25OPhEK9JQ?both#%E8%B3%87%E6%96%99%E6%A0%BC%E5%BC%8F%E8%BD%89%E6%8F%9B-LablemeJson--gt-YOLOtxt)

In [2]:
```python
from json2txt import json2txt

# Signature: json2txt(json_dir, txt_dir). Only json_dir is passed here; the split
# step below expects each .txt to sit next to its .png in the same folder.
json_path = [
    os.path.join(dataset_path, "Cancer"),
    os.path.join(dataset_path, "Mix"),
    os.path.join(dataset_path, "Warthin"),
    test_dataset_path,
]
for path in json_path:
    json2txt(path)
```

```
./data/data_20231207/Cancer: 100 annotations
./data/data_20231207/Mix: 100 annotations
./data/data_20231207/Warthin: 100 annotations
./data/demo_test9/: 9 annotations
```
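The `json2txt` module itself is not shown in this notebook. As a rough illustration of the LabelMe JSON to YOLO TXT conversion described in the linked note, the sketch below turns each LabelMe `rectangle` shape into a YOLO label line `class x_center y_center width height`, with every coordinate normalized by the image width or height. This is not the author's `json2txt`; the `CLASS_IDS` mapping is a hypothetical assumption, and the real label names and class order must match your annotations and the `names` list used for training. Only rectangles are handled here.

```python
import json

# Hypothetical label -> class index mapping; replace with the labels in your JSON files
CLASS_IDS = {"Cancer": 0, "Mix": 1, "Warthin": 2}


def rectangle_to_yolo(points, img_w, img_h):
    """Convert two LabelMe corner points to normalized (xc, yc, w, h)."""
    (x1, y1), (x2, y2) = points
    # The two corners can be stored in either order, so sort them first
    x_min, x_max = sorted((x1, x2))
    y_min, y_max = sorted((y1, y2))
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return xc, yc, w, h


def labelme_to_yolo_lines(labelme_dict, class_ids=CLASS_IDS):
    """Return one YOLO label line per rectangle shape in a LabelMe annotation."""
    img_w, img_h = labelme_dict["imageWidth"], labelme_dict["imageHeight"]
    lines = []
    for shape in labelme_dict["shapes"]:
        if shape.get("shape_type") != "rectangle":
            continue  # this sketch only handles bounding boxes
        xc, yc, w, h = rectangle_to_yolo(shape["points"], img_w, img_h)
        cls = class_ids[shape["label"]]
        lines.append(f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


example = json.loads("""{
  "imageWidth": 640, "imageHeight": 480,
  "shapes": [{"label": "Warthin", "shape_type": "rectangle",
              "points": [[400, 300], [240, 180]]}]
}""")
print(labelme_to_yolo_lines(example))  # ['2 0.500000 0.500000 0.250000 0.250000']
```

Normalizing by the image size is what lets YOLO reuse the same label file when images are resized to `--img 640 640` during training.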


### Split into Train, Val, and Test sets in the YOLO layout

In [12]:
```python
# If scikit-learn is not installed, uncomment and run the line below
# ! pip install -U scikit-learn
```

#### Train and Val sets

In [3]:
```python
from sklearn.model_selection import train_test_split
import os
from glob import glob

# All images under dataset_path/<class folder>/*.png; sorting makes the file order
# (and therefore the seeded split) independent of the file system
all_image_path = sorted(glob(os.path.join(dataset_path, "*", "*.png")))
print(f"Total: {len(all_image_path)} images")
# Each YOLO label file shares its image's base name
all_annotation_path = [os.path.splitext(x)[0] + ".txt" for x in all_image_path]

# 80% train / 20% val with a fixed seed
X_train, X_val, y_train, y_val = train_test_split(
    all_image_path, all_annotation_path, test_size=0.2, random_state=42
)
print(f"Train: {len(X_train)} images")
print(f"Val: {len(X_val)} images")
```

```
Total: 300 images
Train: 240 images
Val: 60 images
```


In [4]:
```python
import shutil

# Copy each image and its label into the flat YOLO split folders
for x, y in zip(X_train, y_train):
    shutil.copy(x, images_train_path)
    shutil.copy(y, labels_train_path)

for x, y in zip(X_val, y_val):
    shutil.copy(x, images_val_path)
    shutil.copy(y, labels_val_path)
```
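Images from the Cancer, Mix and Warthin folders are copied into one flat folder per split, so two files with the same name in different class folders would silently overwrite each other, and the copied folders would then hold fewer than the 240/60 images printed above. A small check to run before copying; `find_name_collisions` is a hypothetical helper, not part of the original notebook:

```python
import os
from collections import defaultdict


def find_name_collisions(paths):
    """Return {basename: [full paths]} for basenames that occur more than once."""
    by_name = defaultdict(list)
    for p in paths:
        by_name[os.path.basename(p)].append(p)
    return {name: ps for name, ps in by_name.items() if len(ps) > 1}


# Usage with the split above (each split is copied into its own folder):
# assert not find_name_collisions(X_train), "rename duplicate train files first"
# assert not find_name_collisions(X_val), "rename duplicate val files first"
print(find_name_collisions(["data/Cancer/001.png", "data/Mix/001.png", "data/Mix/002.png"]))
# {'001.png': ['data/Cancer/001.png', 'data/Mix/001.png']}
```

If collisions show up, prefixing each file name with its class folder before copying (for both the `.png` and the `.txt`) keeps the image/label pairs aligned.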


#### Test set

In [5]:
```python
all_test_image_path = glob(os.path.join(test_dataset_path, "*.png"))
all_test_annotation_path = [os.path.splitext(x)[0] + ".txt" for x in all_test_image_path]
print(f"Test: {len(all_test_image_path)} images")

for x, y in zip(all_test_image_path, all_test_annotation_path):
    shutil.copy(x, images_test_path)
    shutil.copy(y, labels_test_path)
```

```
Test: 9 images
```
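In the upstream YOLOv7 data loader, each label path is derived from its image path by replacing the first `/images/` directory with `/labels/` and swapping the extension for `.txt`; that is why the copies above mirror `images/{train,val,test}` into `labels/{train,val,test}`. A quick sanity check that mimics this mapping, as a sketch; the helper names `image_to_label_path` and `missing_labels` are made up here:

```python
import os
from glob import glob


def image_to_label_path(img_path):
    """Map .../images/<split>/<name>.png to .../labels/<split>/<name>.txt."""
    img_path = os.path.normpath(img_path)  # unify separators (matters on Windows)
    sa, sb = os.sep + "images" + os.sep, os.sep + "labels" + os.sep
    return os.path.splitext(img_path.replace(sa, sb, 1))[0] + ".txt"


def missing_labels(images_dir):
    """Return the .png files in images_dir that have no matching label file."""
    imgs = sorted(glob(os.path.join(images_dir, "*.png")))
    return [p for p in imgs if not os.path.exists(image_to_label_path(p))]


# Usage with the folders created above:
# for split_dir in [images_train_path, images_val_path, images_test_path]:
#     print(split_dir, "missing labels:", missing_labels(split_dir))
```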


### Delete the .txt files (if the wrong path was used)

In [3]:
```python
# import os
# from glob import glob

# # Set del_path to the folder whose .txt files should be deleted,
# # e.g. one of the class folders under dataset_path, or test_dataset_path
# del_path = r"C:\Users\Xsheep\Desktop\FinalProject\demo_test9"
# del_txt = glob(os.path.join(del_path, "*.txt"))
# for txt in del_txt:
#     os.remove(txt)
```
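The cell above removes every `.txt` in `del_path` without asking, which is easy to regret if the path is wrong a second time. A dry-run variant that only lists the files unless `dry_run=False` is passed; `delete_txt_files` is a hypothetical helper, not part of the original notebook:

```python
import os
from glob import glob


def delete_txt_files(del_path, dry_run=True):
    """List the .txt files directly inside del_path; delete them only if dry_run is False."""
    txt_files = sorted(glob(os.path.join(del_path, "*.txt")))
    for txt in txt_files:
        if dry_run:
            print("would delete:", txt)
        else:
            os.remove(txt)
    return txt_files


# Usage: inspect first, then delete
# delete_txt_files(test_dataset_path)
# delete_txt_files(test_dataset_path, dry_run=False)
```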

## Train the Model

First, download [`yolov7_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt) from GitHub.

Run the following command from a terminal (in Jupyter Notebook the progress bar is not displayed).

In [1]:
```
! python train.py --workers 8 --device 0 --batch-size 32 --epochs 300 --img 640 640 --data ./data/data_20231207.yaml --hyp ./data/hyp.scratch.custom.yaml --cfg ./cfg/training/yolov7_custom.yaml --weights 'yolov7_training.pt'
```
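`train.py` reads the dataset locations and class names from the YAML passed via `--data`. The contents of `./data/data_20231207.yaml` are not shown in this notebook; the sketch below builds a plausible version for the folder tree created above, using the same keys as the upstream `data/coco.yaml` (`train`, `val`, `test`, `nc`, `names`). The class names and their order are assumptions: they must match the class indices written into the `.txt` labels, and `nc` in `cfg/training/yolov7_custom.yaml` should agree with it.

```python
# Hypothetical class list; order must match the class indices in the .txt labels
CLASS_NAMES = ["Cancer", "Mix", "Warthin"]


def make_data_yaml(root="./data/newdataset", names=CLASS_NAMES):
    """Build a YOLOv7-style data YAML string for images/{train,val,test}."""
    lines = [
        f"train: {root}/images/train",
        f"val: {root}/images/val",
        f"test: {root}/images/test",
        f"nc: {len(names)}",
        "names: [" + ", ".join(f"'{n}'" for n in names) + "]",
    ]
    return "\n".join(lines) + "\n"


print(make_data_yaml())
# To write the file (paths are relative to the folder train.py is run from):
# with open("./data/data_20231207.yaml", "w") as f:
#     f.write(make_data_yaml())
```

Writing the few lines by hand avoids depending on PyYAML for such a small file; `yaml.safe_dump` would work just as well.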

## Test

[How `detect.py` was modified](https://hackmd.io/dEAIfI_hS4mG25OPhEK9JQ?both#Yolo%E7%9A%84-Detectpy-%E7%95%AB-Bounding-box); the commands below run the modified copy, `mydetect.py`.

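Change `--name exp` to the name of this run. Likewise, `--weights` must point to the `best.pt` of the training run you want to evaluate (here `runs/train/nofliplr`).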

In [None]:
```
# ! python mydetect.py --weights runs/train/nofliplr/weights/best.pt --conf-thres 0.4  --iou-thres 0.5 --source data/newdataset/images/test --device 0 --save-txt --save-conf --agnostic-nms
```

In [None]:
```
! python mydetect.py --weights runs/train/nofliplr/weights/best.pt --conf-thres 0.5  --iou-thres 0.45 --source data/newdataset/images/test --device 0 --save-txt --save-conf
```
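With `--save-txt --save-conf`, the upstream `detect.py` writes one `.txt` per image under `runs/detect/<name>/labels/`, each line being `class x_center y_center width height conf` in normalized coordinates; this sketch assumes `mydetect.py` keeps that format. To compare those predictions with the ground truth in `data/newdataset/labels/test`, here is a minimal IoU helper for boxes in the YOLO center format, plus a parser for such files. The function names are hypothetical, not part of YOLOv7.

```python
def yolo_to_corners(xc, yc, w, h):
    """Center format (xc, yc, w, h) -> corner format (x1, y1, x2, y2)."""
    return xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


def iou(box_a, box_b):
    """IoU of two (xc, yc, w, h) boxes given in the same coordinate system."""
    ax1, ay1, ax2, ay2 = yolo_to_corners(*box_a)
    bx1, by1, bx2, by2 = yolo_to_corners(*box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def read_yolo_txt(path):
    """Parse a YOLO label/prediction file into (class, (xc, yc, w, h), conf or None)."""
    rows = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            cls, box = int(parts[0]), tuple(map(float, parts[1:5]))
            conf = float(parts[5]) if len(parts) > 5 else None  # ground truth has no conf
            rows.append((cls, box, conf))
    return rows
```

Computing IoU directly on normalized coordinates is fine: scaling x and y by the image width and height multiplies every area by the same factor, so the ratio is unchanged.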