# Training YOLOv7

In [1]:
import json
import os
import random
from tqdm import tqdm
import shutil
import datetime

## Yolo V7

We clone the [yolov7](https://github.com/WongKinYiu/yolov7.git) repo directly so that we can retrain the model on our own data.
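Re-running the clone cell fails once the folder exists, as its output shows (`fatal: destination path 'yolov7' already exists`). A minimal sketch of an idempotent clone using `subprocess`; `clone_if_missing` is a hypothetical helper, and an existing non-empty folder is simply assumed to be a usable checkout (it is neither verified nor updated).

```python
import os
import subprocess


def clone_if_missing(repo_url: str, dest: str) -> bool:
    """Clone repo_url into dest unless dest already exists and is non-empty.

    Returns True if a clone was performed, False if it was skipped.
    An existing directory is assumed to be a valid checkout: it is not
    verified and not updated (no `git pull`).
    """
    if os.path.isdir(dest) and os.listdir(dest):
        # git itself would refuse: "destination path ... already exists"
        return False
    # git clone accepts a missing or empty destination directory
    subprocess.run(["git", "clone", repo_url, dest], check=True)
    return True
```

In a notebook cell this would be called as `clone_if_missing("https://github.com/WongKinYiu/yolov7.git", "yolov7")`.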

In [2]:
!git clone https://github.com/WongKinYiu/yolov7.git

fatal: destination path 'yolov7' already exists and is not an empty directory.


Next, we install the requirements. Depending on the GPU machine you select on Paperspace Gradient, you may (or may not) need to downgrade the Torch and Torchvision versions. That is the case here with an A4000 VM.
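The pinned torchvision wheel has to match the installed torch build (torchvision 0.11.x is the series built for torch 1.10.x, and the training log further down reports `torch 1.10.2+cu111`). A minimal sketch, standard library only, for checking what pip actually installed; comparing the `+cuXXX` local version tags is our own heuristic, not an official compatibility test, and both helper names are hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def same_cuda_build(versions):
    """True if torch and torchvision are both installed and carry the same
    local build tag (e.g. '+cu111'). Versions without a tag return False,
    so treat False as 'check manually' rather than as a hard error."""
    torch_v, tv_v = versions.get("torch"), versions.get("torchvision")
    if not torch_v or not tv_v or "+" not in torch_v or "+" not in tv_v:
        return False
    return torch_v.split("+", 1)[1] == tv_v.split("+", 1)[1]
```

Typical use after the install cell: `same_cuda_build(installed_versions(["torch", "torchvision"]))`.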

In [3]:
!pip install -r ./yolov7/requirements.txt
!pip install setuptools==59.5.0
!pip install torchvision==0.11.3+cu111 -f https://download.pytorch.org/whl/cu111/torch_stable.html

Looking in links: https://download.pytorch.org/whl/cu111/torch_stable.html

## Train/validation/test split

We split the data manually based on the image names and place the `train`, `val` and `test` datasets, together with the associated .yaml file, directly inside the yolov7 repo for training.
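The cell below assumes the `.txt` files in `./TACO/data/labels` already exist; their generation is not shown here. For reference, YOLOv7 finds each image's label file by swapping the `images` directory for `labels` (hence the paired `images/` and `labels/` folders), and expects one line per object: `class x_center y_center width height`, with coordinates normalized to [0, 1] and `class` a 0-based index into `names`. A minimal sketch of converting one box, assuming the annotation JSON follows the COCO layout (`bbox` as `[x_min, y_min, width, height]` in pixels); all helper names are hypothetical.

```python
def coco_bbox_to_yolo(bbox, img_width, img_height):
    """Convert a COCO box [x_min, y_min, width, height] in pixels to YOLO
    (x_center, y_center, width, height) normalized by the image size."""
    x_min, y_min, w, h = bbox
    x_c = (x_min + w / 2) / img_width
    y_c = (y_min + h / 2) / img_height
    return x_c, y_c, w / img_width, h / img_height


def yolo_label_line(class_index, bbox, img_width, img_height):
    """Format one line of a YOLO label file: 'class xc yc w h'."""
    x_c, y_c, w, h = coco_bbox_to_yolo(bbox, img_width, img_height)
    return f"{class_index} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


def category_index_map(categories):
    """Map COCO category ids to their 0-based position in `categories`,
    i.e. the same order used to build `names:` in TACObbox.yaml."""
    return {cat["id"]: i for i, cat in enumerate(categories)}
```

The index map matters because the yaml's `names` list is written in the order of `json_file['categories']`: the class written in each label line must be that position, not necessarily the raw COCO id.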

In [4]:
# Split dataset

# read the JSON annotations file (read-only access is enough)
with open('./TACO/data/images/annotations_wo_subdir.json', 'r') as file:
    json_file = json.load(file)
    
# recreate the split directories (deleting them first if they already exist)
for dirname in ['train', 'val', 'test']:
    dirpath = f"./yolov7/data/TACObbox/{dirname}"
    if os.path.exists(dirpath):
        shutil.rmtree(dirpath)
    os.makedirs(dirpath + '/images')
    os.makedirs(dirpath + '/labels')
    
# write the dataset yaml file (overwritten if it already exists)
cats = [cat['name'] for cat in json_file['categories']]

with open('./yolov7/data/TACObbox.yaml', 'w') as f:
    f.write(
f"""train: ./data/TACObbox/train/images
val: ./data/TACObbox/val/images
test: ./data/TACObbox/test/images

nc: {len(cats)}
names: {cats}""")
    
    
# read json annotations file
with open('./TACO/data/images/annotations_wo_subdir.json', 'r+') as file:
    json_file = json.load(file)

# get image names (without extension) and shuffle them
img_names = [os.path.splitext(img['file_name'])[0] for img in json_file['images']]
random.shuffle(img_names)

# create a splitting dictionary
split = {
    'train' : img_names[:1200],
    'val' : img_names[1200:1400],
    'test' : img_names[1400:]
}

# copy each image and its label in the right directory
for setname, sample in split.items():
    print(f"Copying images to {setname.upper()} directory")
    for imgname in tqdm(sample):
        shutil.copy(f"./TACO/data/images/{imgname}.jpg", f"./yolov7/data/TACObbox/{setname}/images/{imgname}.jpg")
        shutil.copy(f"./TACO/data/labels/{imgname}.txt", f"./yolov7/data/TACObbox/{setname}/labels/{imgname}.txt")

Copying images to TRAIN directory
100%|██████████| 1200/1200 [00:06<00:00, 174.72it/s]
Copying images to VAL directory
100%|██████████| 200/200 [00:01<00:00, 181.41it/s]
Copying images to TEST directory
100%|██████████| 100/100 [00:01<00:00, 60.78it/s]
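The split above relies on hardcoded indices (1200 and 1400) and an unseeded `random.shuffle`, so every run produces a different split and the indices must be edited if the dataset size changes. A minimal sketch of a seeded, fraction-based alternative; `split_names` and its default fractions are our own choices, not part of the original notebook. Using a private `random.Random(seed)` leaves the global random state untouched.

```python
import random


def split_names(names, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle a copy of `names` with a fixed seed and cut it into
    train/val/test lists; the test set receives whatever remains."""
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }
```

For example, `split_names(img_names, train_frac=0.8, val_frac=200/1500)` yields the same 1200/200/100 set sizes on the 1500 images, although not the same images, since the original shuffle was unseeded.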


## Training (transfer learning)

We first need to download the initial weights of the pre-trained model, then launch training on our data.
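The training cell below uses a `!python train.py ...` shell line, which keeps going to the duration print even if training crashes. A minimal sketch of launching the same run from Python so that a failure raises instead; the flags and values are exactly those of the notebook's command, while `build_train_command` and `run_training` are hypothetical helpers. `cwd` must be the directory containing `train.py` (the notebook switches to it with `%cd yolov7`).

```python
import datetime
import shlex
import subprocess
import sys


def build_train_command(data_yaml, weights, name, epochs, batch_size=8,
                        img_size=640, device="0", workers=8,
                        cfg="cfg/training/yolov7.yaml",
                        hyp="data/hyp.scratch.custom.yaml"):
    """Assemble the yolov7 train.py argument list used in this notebook."""
    return [
        sys.executable, "train.py",
        "--workers", str(workers),
        "--device", device,
        "--batch-size", str(batch_size),
        "--data", data_yaml,
        "--img", str(img_size), str(img_size),  # train and test image sizes
        "--cfg", cfg,
        "--weights", weights,
        "--name", name,
        "--hyp", hyp,
        "--epochs", str(epochs),
    ]


def run_training(cmd, cwd=None):
    """Run the command, raising CalledProcessError on failure, and print
    the elapsed wall-clock time."""
    start = datetime.datetime.now()
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, cwd=cwd, check=True)
    print(f"Training duration: {datetime.datetime.now() - start}")
```

Usage from inside `yolov7/`: `run_training(build_train_command("data/TACObbox.yaml", "yolov7_training.pt", "yolov7-TACObbox", epochs=4))`.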

In [5]:
%cd yolov7

/notebooks/yolov7


In [6]:
if os.path.exists('yolov7_training.pt'):
    print("Déjà téléchargé")
else:
    !wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt

--2022-10-19 20:03:44--  https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/511187726/13e046d1-f7f0-43ab-910b-480613181b1f?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20221019%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20221019T200344Z&X-Amz-Expires=300&X-Amz-Signature=f0f02a9be9243d8aaecf6a0160c8f2311893d4d452df666f636bf2a610e0c850&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=511187726&response-content-disposition=attachment%3B%20filename%3Dyolov7_training.pt&response-content-type=application%2Foctet-stream [following]
--2022-10-19 20:03:44--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/511187726/13e046d1-f7f0-43ab-910b-480613181b1f?X-A

In [7]:
# training on a single GPU
start = datetime.datetime.now()
print(f"Training started: {start.strftime('%H:%M')}")
print("_________________________________________________________________")


!python train.py --workers 8 --device 0 --batch-size 8 --data data/TACObbox.yaml --img 640 640 \
    --cfg cfg/training/yolov7.yaml --weights 'yolov7_training.pt' --name yolov7-TACObbox \
    --hyp data/hyp.scratch.custom.yaml --epochs 4


print("_________________________________________________________________")
print(f"Training duration: {datetime.datetime.now() - start}")

Training started: 20:03
_________________________________________________________________
YOLOR 🚀 v0.1-115-g072f76c torch 1.10.2+cu111 CUDA:0 (NVIDIA RTX A4000, 16117.3125MB)

Namespace(weights='yolov7_training.pt', cfg='cfg/training/yolov7.yaml', data='data/TACObbox.yaml', hyp='data/hyp.scratch.custom.yaml', epochs=4, batch_size=8, img_size=[640, 640], rect=False, resume=False, nosave=False, notest=False, noautoanchor=False, evolve=False, bucket='', cache_images=False, image_weights=False, device='0', multi_scale=False, single_cls=False, adam=False, sync_bn=False, local_rank=-1, workers=8, project='runs/train', entity=None, name='yolov7-TACObbox', exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=-1, artifact_alias='latest', freeze=[0], v5_metric=False, world_size=1, global_rank=-1, save_dir='runs/train/yolov7-TACObbox3', total_batch_size=8)
[34m[1mtensorboard: [0mStart with 'tensorboard --logdir runs/t