<a href="https://colab.research.google.com/github/Vadimbuildercxx/AI-Learning/blob/main/Training_YOLOv7_on_Custom_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to Train YOLOv7 on a Custom Dataset

This tutorial is based on the [YOLOv7 repository](https://github.com/WongKinYiu/yolov7) by WongKinYiu. This notebook shows training on **your own custom objects**. Many thanks to WongKinYiu and AlexeyAB for putting this repository together.


### **Accompanying Blog Post**

We recommend following along in this notebook while reading the blog post on [how to train YOLOv7](https://blog.roboflow.com/yolov7-custom-dataset-training-tutorial/).

### **Steps Covered in this Tutorial**

To train our detector we take the following steps:

* Install YOLOv7 dependencies
* Load custom dataset from Roboflow in YOLOv7 format
* Run YOLOv7 training
* Evaluate YOLOv7 performance
* Run YOLOv7 inference on test images
* OPTIONAL: Deployment
* OPTIONAL: Active Learning


### Preparing a Custom Dataset

In this tutorial, we use one of the 90,000+ open source computer vision datasets available on [Roboflow Universe](https://universe.roboflow.com).

If you already have your own images (and, optionally, annotations), you can convert your dataset using [Roboflow](https://roboflow.com), a set of tools developers use to build better computer vision models quickly and accurately. 100k+ developers use Roboflow for (automatic) annotation, converting dataset formats (such as to YOLOv7), training, deploying, and improving their datasets and models.

Follow [the getting started guide here](https://docs.roboflow.com/quick-start) to create and prepare your own custom dataset.

# Install Dependencies

_(Remember to choose GPU in Runtime if not already selected. Runtime --> Change Runtime Type --> Hardware accelerator --> GPU)_

In [None]:
# Download YOLOv7 repository and install requirements
!git clone https://github.com/WongKinYiu/yolov7
%cd yolov7
!pip install -r requirements.txt

Cloning into 'yolov7'...
remote: Enumerating objects: 1127, done.
remote: Total 1127 (delta 0), reused 0 (delta 0), pack-reused 1127
Receiving objects: 100% (1127/1127), 69.94 MiB | 33.03 MiB/s, done.
Resolving deltas: 100% (521/521), done.
/content/yolov7
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting thop
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Collecting jedi>=0.10
  Downloading jedi-0.18.2-py2.py3-none-any.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 31.6 MB/s eta 0:00:00
Installing collected packages: jedi, thop
Successfully installed jedi-0.18.2 thop-0.1.1.post2209072238


# Download Correctly Formatted Custom Data

Next, we'll download our dataset in the right format. Use the `YOLOv7 PyTorch` export. Note that this model requires YOLO TXT annotations, a custom YAML file, and organized directories. The Roboflow export creates all of these for us and saves them in the correct location.
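
For orientation, each row of a YOLO TXT label file holds a class index followed by the box center, width, and height, all normalized to the image size. The sketch below converts one such row to pixel corner coordinates. The function name `yolo_line_to_pixels` and the sample row are illustrative assumptions, not part of the YOLOv7 repository or the Roboflow export.

```python
def yolo_line_to_pixels(line, img_width, img_height):
    """Convert one YOLO TXT row to (class_id, x_min, y_min, x_max, y_max) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    # Remaining fields: x_center, y_center, width, height, normalized to [0, 1]
    x_c, y_c, w, h = (float(v) for v in parts[1:5])
    x_min = (x_c - w / 2) * img_width
    y_min = (y_c - h / 2) * img_height
    x_max = (x_c + w / 2) * img_width
    y_max = (y_c + h / 2) * img_height
    return class_id, x_min, y_min, x_max, y_max

# Hypothetical label row: class 1, a centered box covering half of a 640x480 image
print(yolo_line_to_pixels("1 0.5 0.5 0.5 0.5", 640, 480))
# → (1, 160.0, 120.0, 480.0, 360.0)
```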


In [None]:
!pip install roboflow
from roboflow import Roboflow
rf = Roboflow(api_key="iK3IO4RGymzdS5x0XVwb")
project = rf.workspace("joseph-nelson").project("hard-hat-workers")
dataset = project.version(10).download("yolov7")

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting roboflow
  Downloading roboflow-0.2.29-py3-none-any.whl (49 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.0/49.0 KB 2.9 MB/s eta 0:00:00
Collecting pyparsing==2.4.7
  Downloading pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 67.8/67.8 KB 8.9 MB/s eta 0:00:00
Collecting python-dotenv
  Downloading python_dotenv-0.21.1-py3-none-any.whl (19 kB)
Collecting requests-toolbelt
  Downloading requests_toolbelt-0.10.1-py2.py3-none-any.whl (54 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.5/54.5 KB 6.8 MB/s eta 0:00:00
Collecting wget
  Downloading wget-3.2.zip (10 kB)
  Preparing metadata (setup.py) ... done
Collecting cycler==0.10.0
  Downloading cycler-0.10.0-py2.py3-none-any.whl (6.5 kB)
Collectin

loading Roboflow workspace...
loading Roboflow project...
Downloading Dataset Version Zip in Hard-Hat-Workers-10 to yolov7pytorch: 100% [245541925 / 245541925] bytes


Extracting Dataset Version Zip to Hard-Hat-Workers-10 in yolov7pytorch:: 100%|██████████| 14082/14082 [00:03<00:00, 3880.13it/s]


# Sample a Subset of the Dataset

In [None]:
# Get a random 10% of the images (and their labels) from each split
import random
import os
from os import listdir
from os.path import isfile, join, splitext

# Setup data paths
dataset_name = "Hard-Hat-Workers-10"
data_path = "/content/yolov7" + "/" + dataset_name
target_classes = ["head", "helmet", "person"]

# Change amount of data to get (e.g. 0.1 = random 10%, 0.2 = random 20%)
amount_to_get = 0.1

# Create function to separate a random amount of data
def get_subset(image_path=data_path,
               data_splits=("test", "train", "valid"),
               amount=0.1,
               seed=42):
    random.seed(seed)  # honor the seed argument so the sample is reproducible
    label_splits = {}
    index = 1  # position of the example file name printed for each split

    for data_split in data_splits:
        print(f"[INFO] Creating image split for: {data_split}...")
        images_path = join(image_path, data_split, "images")
        image_paths = [f for f in listdir(images_path) if isfile(join(images_path, f))]

        # Draw the random sample of image file names
        number_to_sample = round(amount * len(image_paths))
        sampled_images = random.sample(image_paths, k=number_to_sample)
        if len(sampled_images) > index:
            print(f"files detected {len(sampled_images)} the {index}th file name: {sampled_images[index]}")

        # Keep only the label files whose image (same base name, .jpg) was sampled
        sampled_set = set(sampled_images)
        label_path = join(image_path, data_split, "labels")
        label_paths_list = [f for f in listdir(label_path)
                            if isfile(join(label_path, f)) and splitext(f)[0] + ".jpg" in sampled_set]
        if len(label_paths_list) > index:
            print(f"files detected {len(label_paths_list)} the {index}th file name: {label_paths_list[index]}")

        label_splits[data_split] = {"images": sampled_images, "labels": label_paths_list}
    return label_splits

label_splits = get_subset(amount=amount_to_get)
label_splits["train"]["labels"][:10]

[INFO] Creating image split for: test...
files detected 71 the 1th file name: 005619_jpg.rf.9475f681ce754db0ab900a09e1fb4adc.jpg
files detected 71 the 1th file name: 005796_jpg.rf.53d3dfd1db86cb201f2137b4c60e4d3d.txt
[INFO] Creating image split for: train...
files detected 492 the 1th file name: 000173_jpg.rf.41b98513aaeecfb283d23df3e91cd44f.jpg
files detected 492 the 1th file name: 000735_jpg.rf.e5184da823a0bc624067f2d001a88434.txt
[INFO] Creating image split for: valid...
files detected 141 the 1th file name: 006936_jpg.rf.3a8dd93a22195e440abdc314ccb97431.jpg
files detected 141 the 1th file name: 004998_jpg.rf.2583635ad1948ede46836a416361d54e.txt


['004607_jpg.rf.4c1a9ec865263ba3a58ee31ee770cd48.txt',
 '000735_jpg.rf.e5184da823a0bc624067f2d001a88434.txt',
 '004465_jpg.rf.df7b10e5be196831935bbaf5f753514f.txt',
 '004687_jpg.rf.b7d9da0f4236cb53cdb3d09d28625dee.txt',
 '003714_jpg.rf.c7c0a411e1b57b618c1e63dbb709534d.txt',
 '002173_jpg.rf.85167615cb57ea23473e972436707d79.txt',
 '000743_jpg.rf.5dbd63c1d3e5e50ddb0e8b2a59cfec7c.txt',
 '004216_jpg.rf.8f07dc12bf379849fbfd091af4b0dc58.txt',
 '004458_jpg.rf.aaacbb6996b7e4a51d565f7e0a6353e9.txt',
 '003960_jpg.rf.b2967296fdac44c251ec2e60529beb63.txt']

In [None]:
import shutil
import yaml

In [None]:
# Copy the sampled images and labels into a new, smaller dataset directory
def create_sub_dataset_dir(old_dir_in, new_dir_in, label_splits):
  workspace_path = "/content/yolov7/"
  new_directory = workspace_path + new_dir_in
  old_directory = workspace_path + old_dir_in
  data_splits = ["test", "train", "valid"]
  data_types = ["images", "labels"]

  for data_split in data_splits:
    for data_type in data_types:
      split_dir_type = os.path.join(new_directory, data_split, data_type)
      split_dir_type_old = os.path.join(old_directory, data_split, data_type)
      # Create <new>/<split>/<type>, including any missing parent directories
      os.makedirs(split_dir_type, exist_ok=True)

      files = os.listdir(split_dir_type_old)
      wanted = set(label_splits[data_split][data_type])
      print(len(files))
      for fname in files:
        # Copy only sampled files that are not already in the destination
        if fname in wanted and not os.path.exists(os.path.join(split_dir_type, fname)):
          print(os.path.join(split_dir_type_old, fname))
          shutil.copy2(os.path.join(split_dir_type_old, fname), split_dir_type)

create_sub_dataset_dir("Hard-Hat-Workers-10", "Hard-Hat-Workers-10__10per", label_splits)

706
/content/yolov7/Hard-Hat-Workers-10/test/images/005863_jpg.rf.27c1a479dccb5aff0cc28747a99babc8.jpg
/content/yolov7/Hard-Hat-Workers-10/test/images/005934_jpg.rf.9b88756f84710c89bc49e4af6bdfbdac.jpg
/content/yolov7/Hard-Hat-Workers-10/test/images/005783_jpg.rf.b0d26f276a1bc1df46aee2477774283d.jpg
/content/yolov7/Hard-Hat-Workers-10/test/images/005738_jpg.rf.09f49a6673da104588b927ff6f94c7ac.jpg
/content/yolov7/Hard-Hat-Workers-10/test/images/005303_jpg.rf.3d32f0671214557596bb04022c82efda.jpg
/content/yolov7/Hard-Hat-Workers-10/test/images/005624_jpg.rf.b19e3a2d9991d5fa0901c91024cff172.jpg
/content/yolov7/Hard-Hat-Workers-10/test/images/005350_jpg.rf.f7b899e1160b8de5f1f5a470c2fbff4b.jpg
/content/yolov7/Hard-Hat-Workers-10/test/images/005570_jpg.rf.66db82fe73e3353ab1c1d2ab773c5c91.jpg
/content/yolov7/Hard-Hat-Workers-10/test/images/005646_jpg.rf.41a798048e258458a7cd41fe01052b18.jpg
/content/yolov7/Hard-Hat-Workers-10/test/images/005436_jpg.rf.2da0d7991e7ec319ac14d3a407334ed2.jpg
/conte
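
The copy step above transfers only images and labels, and the training commands later in this notebook point at the full dataset's `data.yaml`. To train on the subset instead, the subset folder would need its own `data.yaml`. The following is a minimal sketch of writing one, assuming the usual YOLOv7 keys (`train`, `val`, `test`, `nc`, `names`) and assuming the class order `head`, `helmet`, `person` matches the original export; check both against the `data.yaml` Roboflow generated. The helper name `write_subset_yaml` is hypothetical.

```python
import os
import tempfile

def write_subset_yaml(dataset_dir, class_names):
    """Write a minimal data.yaml pointing at the train/valid/test image folders."""
    lines = [
        f"train: {os.path.join(dataset_dir, 'train', 'images')}",
        f"val: {os.path.join(dataset_dir, 'valid', 'images')}",
        f"test: {os.path.join(dataset_dir, 'test', 'images')}",
        f"nc: {len(class_names)}",
        "names: [" + ", ".join(f"'{name}'" for name in class_names) + "]",
    ]
    yaml_path = os.path.join(dataset_dir, "data.yaml")
    with open(yaml_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return yaml_path

# Demo in a temporary directory; in the notebook this would be the subset folder,
# e.g. /content/yolov7/Hard-Hat-Workers-10__10per
demo_dir = tempfile.mkdtemp()
demo_yaml = write_subset_yaml(demo_dir, ["head", "helmet", "person"])
with open(demo_yaml) as f:
    demo_text = f.read()
print(demo_text)
```

Passing the resulting path to `--data` would then make training read the subset rather than the full dataset.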

In [None]:
import os
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image
import numpy as np

dataset_name = "Hard-Hat-Workers-10__10per"
# Pick a file that is guaranteed to be in the sampled subset; a hard-coded name
# raises FileNotFoundError when that image was not drawn by the random sample.
filename = os.path.splitext(label_splits["valid"]["labels"][0])[0]
im = Image.open(f'/content/yolov7/{dataset_name}/valid/images/{filename}.jpg')

with open(f"/content/yolov7/{dataset_name}/valid/labels/{filename}.txt") as f:
  data = f.read()
# Each row is: class x_center y_center width height (normalized to [0, 1]).
# Skip blank lines so a trailing newline does not break the array conversion.
rects = np.array([row.split() for row in data.splitlines() if row.strip()], dtype=np.float32)
print(rects)
# Create figure and axes
fig, ax = plt.subplots()

# Display the image
ax.imshow(im)

# Draw one rectangle per label, converting normalized center/size to pixel corner/size
for rect_info in rects:
  box_w = rect_info[3] * im.width
  box_h = rect_info[4] * im.height
  rect = patches.Rectangle((rect_info[1] * im.width - box_w / 2.0,
                            rect_info[2] * im.height - box_h / 2.0),
                           box_w, box_h, linewidth=1, edgecolor='r', facecolor='none')

  # Add the patch to the Axes
  ax.add_patch(rect)

plt.show()

# Augmentation

In [None]:
!pip install -U albumentations

# Begin Custom Training

We're ready to start custom training.

NOTE: The training commands below override several YOLOv7 training defaults. For speed, `--epochs` is reduced from 300 to 10 (5 in the second run); the commands also set `--batch 8`, `--img 1280 1280`, and `--freeze 50`. If you'd like to change other settings, see details in [our accompanying blog post](https://blog.roboflow.com/yolov7-custom-dataset-training-tutorial/).
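
When experimenting with these settings, it can help to keep them in one place and generate the command line from them. The sketch below does that with a hypothetical helper, `build_train_command`. It only uses flags that appear in the training cells of this notebook (`--batch`, `--epochs`, `--img`, `--data`, `--weights`, `--freeze`) and is not part of the YOLOv7 repository.

```python
import shlex

def build_train_command(data_yaml, weights, epochs=10, batch=8, img=1280, freeze=50):
    """Assemble a train.py command line from the settings used in this notebook."""
    args = [
        "python", "train.py",
        "--batch", str(batch),
        "--epochs", str(epochs),
        "--img", str(img), str(img),   # train and test image sizes
        "--data", data_yaml,
        "--weights", weights,
        "--freeze", str(freeze),
    ]
    # shlex.join quotes arguments where needed so the string is safe to paste into a shell
    return shlex.join(args)

cmd = build_train_command(
    "/content/yolov7/Hard-Hat-Workers-10/data.yaml", "yolov7_training.pt", epochs=5
)
print(cmd)
```

In a Colab cell the resulting string could be run with `!{cmd}`.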

In [None]:
# download COCO starting checkpoint
%cd /content/yolov7
!wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt

/content/yolov7
--2023-02-17 13:23:14--  https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt
Resolving github.com (github.com)... 192.30.255.113
Connecting to github.com (github.com)|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/511187726/13e046d1-f7f0-43ab-910b-480613181b1f?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230217%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230217T132314Z&X-Amz-Expires=300&X-Amz-Signature=e155a91766fa9b21c023dfd816e18e9a850ebd40c84564628c3151fd7545ea9f&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=511187726&response-content-disposition=attachment%3B%20filename%3Dyolov7_training.pt&response-content-type=application%2Foctet-stream [following]
--2023-02-17 13:23:14--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/511187726/13e046d1-f7f0-43ab-9

In [None]:
%cd /content/yolov7

# --project saves the run to Google Drive, so Drive must already be mounted at /content/drive
!python train.py --batch 8 --epochs 10 --img 1280 1280 --data /content/yolov7/Hard-Hat-Workers-10/data.yaml --weights '/content/yolov7/yolov7_training.pt' --freeze 50 --project "/content/drive/MyDrive/exp_10epochs"

/content/yolov7
YOLOR 🚀 v0.1-121-g2fdc7f1 torch 1.13.1+cu116 CUDA:0 (Tesla T4, 15109.875MB)

Namespace(adam=False, artifact_alias='latest', batch_size=8, bbox_interval=-1, bucket='', cache_images=False, cfg='', data='/content/yolov7/Hard-Hat-Workers-10/data.yaml', device='', entity=None, epochs=10, evolve=False, exist_ok=False, freeze=[50], global_rank=-1, hyp='data/hyp.scratch.p5.yaml', image_weights=False, img_size=[1280, 1280], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='exp', noautoanchor=False, nosave=False, notest=False, project='/content/drive/MyDrive/exp_10epochs', quad=False, rect=False, resume=False, save_dir='/content/drive/MyDrive/exp_10epochs/exp', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=8, upload_dataset=False, v5_metric=False, weights='/content/yolov7/yolov7_training.pt', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir /content/drive/MyDrive/exp_10epochs', view at http://local

In [None]:
%cd /content/yolov7

!python train.py --batch 8 --epochs 5 --img 1280 1280 --data /content/yolov7/Hard-Hat-Workers-10/data.yaml --weights 'yolov7_training.pt' --freeze 50


/content/yolov7
YOLOR 🚀 v0.1-121-g2fdc7f1 torch 1.13.1+cu116 CUDA:0 (Tesla T4, 15109.875MB)

Namespace(adam=False, artifact_alias='latest', batch_size=8, bbox_interval=-1, bucket='', cache_images=False, cfg='', data='/content/yolov7/Hard-Hat-Workers-10/data.yaml', device='', entity=None, epochs=5, evolve=False, exist_ok=False, freeze=[50], global_rank=-1, hyp='data/hyp.scratch.p5.yaml', image_weights=False, img_size=[1280, 1280], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='exp', noautoanchor=False, nosave=False, notest=False, project='runs/train', quad=False, rect=False, resume=False, save_dir='runs/train/exp2', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=8, upload_dataset=False, v5_metric=False, weights='yolov7_training.pt', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
2023-02-17 13:25:19.913069: I tensorflow/core/platform/cpu_feature_guard.cc:193]

In [None]:
%load_ext tensorboard
%tensorboard --logdir runs


# Evaluation

We can check the results of our custom training by running the provided detection script, `detect.py`, on test images; `test.py`, used further below, reports evaluation metrics.

Note we can adjust the below custom arguments. For details, see [the arguments accepted by detect.py](https://github.com/WongKinYiu/yolov7/blob/main/detect.py#L154).
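
Two of those arguments are thresholds: the `detect.py` run logged in this notebook reports `conf_thres` and `iou_thres`. As background for the second, intersection over union (IoU) measures how much two boxes overlap. The sketch below is a plain-Python illustration of the concept, not the repository's implementation; the function name `box_iou` and the `(x_min, y_min, x_max, y_max)` box layout are choices made for this example.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap region (zero if the boxes do not intersect)
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes that share half their area: overlap 50, union 150
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Identical boxes give an IoU of 1.0 and disjoint boxes give 0.0, so a higher IoU threshold demands closer agreement between boxes.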

In [None]:
!python detect.py --weights runs/train/exp2/weights/best.pt --conf 0.3 --source /content/yolov7/Hard-Hat-Workers-10/test/images

In [None]:
# Run inference on a single image
!python detect.py --weights runs/train/exp3/weights/best.pt --img-size 1280 --conf 0.1 --source /content/hight_cam3.jpg


Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.1, device='', exist_ok=False, img_size=1280, iou_thres=0.45, name='exp', no_trace=False, nosave=False, project='runs/detect', save_conf=False, save_txt=False, source='/content/hight_cam3.jpg', update=False, view_img=False, weights=['runs/train/exp3/weights/best.pt'])
YOLOR 🚀 v0.1-121-g2fdc7f1 torch 1.13.1+cu116 CUDA:0 (Tesla T4, 15109.875MB)

Fusing layers... 
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
IDetect.fuse
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Model Summary: 314 layers, 36492560 parameters, 6194944 gradients, 103.2 GFLOPS
 Convert model to Traced-model... 
 traced_script_module saved! 
 model is traced! 

14 helmets, 1 person, Done. (40.9ms) Inference, (1.7ms) NMS
 The image with the result is saved in: runs/detect/exp10/hight_cam3.jpg
Done. (0.287s)


In [None]:
# Display inference results for the test images

import glob
from IPython.display import Image, display

limit = 100  # maximum number of images to display
for image_name in glob.glob('/content/yolov7/runs/detect/exp/*.jpg')[:limit]:  # assuming JPG
    display(Image(filename=image_name))
    print("\n")
    

In [None]:
from google.colab import drive
drive.mount('/content/drive')
# copy it there
!cp -av /content/yolov7/runs/train/exp3 /content/drive/MyDrive

In [None]:
!python test.py --weights runs/train/exp3/weights/best.pt  --img-size 1280 --data /content/yolov7/Hard-Hat-Workers-10/data.yaml

Namespace(augment=False, batch_size=32, conf_thres=0.001, data='/content/yolov7/Hard-Hat-Workers-10/data.yaml', device='', exist_ok=False, img_size=1280, iou_thres=0.65, name='exp', no_trace=False, project='runs/test', save_conf=False, save_hybrid=False, save_json=False, save_txt=False, single_cls=False, task='val', v5_metric=False, verbose=False, weights=['runs/train/exp3/weights/best.pt'])
YOLOR 🚀 v0.1-121-g2fdc7f1 torch 1.13.1+cu116 CUDA:0 (Tesla T4, 15109.875MB)

Fusing layers... 
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
IDetect.fuse
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Model Summary: 314 layers, 36492560 parameters, 6194944 gradients, 103.2 GFLOPS
 Convert model to Traced-model... 
 traced_script_module saved! 
 model is traced! 

val: Scanning 'Hard-Hat-Workers-10/valid/labels.cache' images and labels... 1413 found, 0 missing, 0 empty, 0 corrupted: 100% 1413/1413 [00:00<?, ?it/s]
               Cl

# Reparameterize for Inference

See the [reparameterization notebook](https://github.com/WongKinYiu/yolov7/blob/main/tools/reparameterization.ipynb) in the YOLOv7 repository.

# OPTIONAL: Deployment

To deploy, you'll need to export your weights and save them to use later.

In [None]:
# optional, zip to download weights and results locally

!zip -r export.zip runs/detect
!zip -r export.zip runs/train/exp/weights/best.pt
!zip export.zip runs/train/exp/*

# OPTIONAL: Active Learning Example

Once our first training run is complete, we should use our model to help identify which images are most problematic in order to investigate, annotate, and improve our dataset (and, therefore, model).

To do that, we can execute code that automatically uploads images back to our hosted dataset if the image is a specific class or below a given confidence threshold.
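
The selection rule itself can be tried offline before wiring it to an upload call. The sketch below assumes each prediction is a dict with a `confidence` value between 0 and 1 and a `class` name; the `class` key, the helper name `select_for_upload`, and the sample predictions are assumptions for illustration. It does not call the Roboflow API.

```python
def select_for_upload(predictions, lower_percent=10, upper_percent=70, target_classes=None):
    """Return predictions worth re-annotating: uncertain ones, or ones of a target class."""
    selected = []
    for prediction in predictions:
        confidence_percent = prediction["confidence"] * 100
        in_band = lower_percent <= confidence_percent <= upper_percent
        is_target = target_classes is not None and prediction["class"] in target_classes
        if in_band or is_target:
            selected.append(prediction)
    return selected

# Hypothetical predictions for one image
sample_predictions = [
    {"class": "helmet", "confidence": 0.95},
    {"class": "head", "confidence": 0.42},
    {"class": "person", "confidence": 0.05},
]
uncertain = select_for_upload(sample_predictions)
print([p["class"] for p in uncertain])
# → ['head']
```

An image would then be uploaded only when this function returns at least one prediction for it.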


In [None]:
# # setup access to your workspace
# rf = Roboflow(api_key="YOUR_API_KEY")                               # used above to load data
# inference_project =  rf.workspace().project("YOUR_PROJECT_NAME")    # used above to load data
# model = inference_project.version(1).model

# upload_project = rf.workspace().project("YOUR_PROJECT_NAME")

# print("inference reference point: ", inference_project)
# print("upload destination: ", upload_project)

In [None]:
# # example upload: if prediction is below a given confidence threshold, upload it 

# confidence_interval = [10,70]                                   # [lower_bound_percent, upper_bound_percent]

# for prediction in predictions:                                  # predictions list to loop through
#   if(prediction['confidence'] * 100 >= confidence_interval[0] and 
#           prediction['confidence'] * 100 <= confidence_interval[1]):
        
#           # upload on success!
#           print(' >> image uploaded!')
#           upload_project.upload(image, num_retry_uploads=3)     # upload image in question

# Next steps

Congratulations, you've trained a custom YOLOv7 model! Next, start thinking about deploying and [building an MLOps pipeline](https://docs.roboflow.com) so your model gets better the more data it sees in the wild.