Name: Meghana Rabba and Rohan Singh Rajendra Singh

Hawk ID: A20572009 and A20572007

Subject: CS512 (Computer Vision)

Project: Road Damage Detection using YOLOv7 and Coordinate Attention

Kernel used: Python 3.9.21 (in a conda environment named "yolov7")

For GPU processing: CUDA 11.8 (Nvidia RTX 4070 Super)

YOLOv7 cloning

In [1]:
!git clone https://github.com/WongKinYiu/yolov7
%cd yolov7

d:\CS512-project\yolov7


Cloning into 'yolov7'...


YOLOv7 dependencies installation

In [2]:
# Install requirements
!pip install -r requirements.txt




In [3]:
# Ensure torch is CUDA-enabled
import torch
print("CUDA Available:", torch.cuda.is_available())
print("Torch CUDA version:", torch.version.cuda)
if torch.cuda.is_available():  # get_device_name raises an error when no GPU is visible
    print("Device Name:", torch.cuda.get_device_name(0))

CUDA Available: True
Torch CUDA version: 11.8
Device Name: NVIDIA GeForce RTX 4070 SUPER


Creating a dataset folder structure

In [4]:
import os
from pathlib import Path

# Define dataset paths
base_path = Path("datasets/RDD2022_combined")
(base_path / 'images/train').mkdir(parents=True, exist_ok=True)
(base_path / 'images/val').mkdir(parents=True, exist_ok=True)
(base_path / 'labels/train').mkdir(parents=True, exist_ok=True)
(base_path / 'labels/val').mkdir(parents=True, exist_ok=True)

print("Dataset folder structure created.")

Dataset folder structure created.


Converting the XML annotations to YOLO format (annotations/train to labels/train)

In [87]:
import os
import xml.etree.ElementTree as ET
from pathlib import Path

# Function to convert XML to YOLO format
def convert_xml_to_yolo(xml_folder, img_folder, output_folder):
    # Make output directory if it doesn't exist
    Path(output_folder).mkdir(parents=True, exist_ok=True)

    # Iterate over all XML files in the folder
    for xml_file in os.listdir(xml_folder):
        if xml_file.endswith(".xml"):
            xml_path = os.path.join(xml_folder, xml_file)
            tree = ET.parse(xml_path)
            root = tree.getroot()

            # Read the image named in the XML file to get its width and height
            image_filename = root.find("filename").text
            img_path = os.path.join(img_folder, image_filename)
            img_width, img_height = get_image_dimensions(img_path)

            # Prepare YOLO annotation file
            txt_filename = os.path.splitext(xml_file)[0] + ".txt"
            txt_path = os.path.join(output_folder, txt_filename)
            with open(txt_path, "w") as txt_file:
                # Iterate through each object in the XML
                for obj in root.findall("object"):
                    class_name = obj.find("name").text
                    # Map the class name to its class id; objects whose class is not in class_dict are skipped
                    class_id = class_dict.get(class_name, None)
                    if class_id is not None:
                        bndbox = obj.find("bndbox")
                        # int(float(...)) also tolerates coordinates stored as decimals
                        xmin = int(float(bndbox.find("xmin").text))
                        ymin = int(float(bndbox.find("ymin").text))
                        xmax = int(float(bndbox.find("xmax").text))
                        ymax = int(float(bndbox.find("ymax").text))

                        # Convert bounding box to YOLO format (normalized)
                        center_x = (xmin + xmax) / 2 / img_width
                        center_y = (ymin + ymax) / 2 / img_height
                        width = (xmax - xmin) / img_width
                        height = (ymax - ymin) / img_height

                        # Write to the txt file in YOLO format
                        txt_file.write(f"{class_id} {center_x} {center_y} {width} {height}\n")

# Function to get image dimensions (width, height)
def get_image_dimensions(img_path):
    from PIL import Image
    with Image.open(img_path) as img:
        return img.size

# Define class dictionary (adjust according to your dataset)
class_dict = {
    'D00': 0, # Longitudinal cracks
    'D10': 1, # Transverse cracks
    'D20': 2, # Alligator cracks
    'D40': 3, # Potholes and similar surface damage
    'D43': 4, # Crosswalk blur (faded crosswalk markings)
    'D44': 5 # White line blur (faded lane markings)
}

# Paths to the directories
xml_folder = "datasets/RDD2022_combined/annotations/train"
img_folder = "datasets/RDD2022_combined/images/train"
output_folder = "datasets/RDD2022_combined/labels/train"

# Convert all XML annotations to YOLO format
convert_xml_to_yolo(xml_folder, img_folder, output_folder)
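
As a quick check of the conversion arithmetic used above, the sketch below applies the same four formulas to a single box. The helper name voc_to_yolo and the sample numbers are illustrative assumptions and are not part of the notebook's pipeline.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_width, img_height):
    """Convert a Pascal VOC pixel box to normalized YOLO (cx, cy, w, h)."""
    center_x = (xmin + xmax) / 2 / img_width
    center_y = (ymin + ymax) / 2 / img_height
    width = (xmax - xmin) / img_width
    height = (ymax - ymin) / img_height
    return center_x, center_y, width, height

# Hypothetical 600x600 image with a box from (150, 300) to (450, 450)
print(voc_to_yolo(150, 300, 450, 450, 600, 600))  # (0.5, 0.625, 0.5, 0.25)
```

All four values should lie in [0, 1]; a value outside that range usually means the image size and the annotation do not belong together.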


YOLOv7 yaml file configuration (RDD2022.yaml)

In [86]:
dataset_yaml = """
train: datasets/RDD2022_combined/images/train
val: datasets/RDD2022_combined/images/val

nc: 6
names: ['D00', 'D10', 'D20', 'D40', 'D43', 'D44']
"""

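# Note: the training cell passes --data data/rdd2022.yaml, so make sure this file is also available at that path
with open("RDD2022.yaml", "w") as f:
    f.write(dataset_yaml.strip())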

print("YOLOv7 dataset YAML created: RDD2022.yaml")


YOLOv7 dataset YAML created: RDD2022.yaml


Downloading Coordinate Attention module

In [20]:
import os
import urllib.request

url = "https://raw.githubusercontent.com/houqb/CoordAttention/main/coordatt.py"
save_path = "models/common/coordatt.py"

# Create the directory if it doesn't exist
os.makedirs("models/common", exist_ok=True)

# Download the file
urllib.request.urlretrieve(url, save_path)

print("coordatt.py downloaded successfully.")


coordatt.py downloaded successfully.


Modify YOLOv7 to include CoordAtt (injecting into the model as a fresh import at the start)

In [23]:
# Inject import for CoordAtt into models/yolo.py
file_path = "models/yolo.py"

with open(file_path, "r") as file:
    lines = file.readlines()

# Add the import only if CoordAtt is not already imported somewhere in the file
coordatt_import = "from models.common.coordatt import CoordAtt\n"
already_imported = any("import" in line and "CoordAtt" in line for line in lines)

if not already_imported:
    # Insert a separate import line right after the existing models.common import,
    # or at the top of the file if no such line exists. Appending ", CoordAtt" to a
    # wildcard import ("from models.common import *") would be a syntax error.
    anchor = next((i for i, line in enumerate(lines) if line.startswith("from models.common import")), -1)
    lines.insert(anchor + 1, coordatt_import)

# Write the modified content back to the file
with open(file_path, "w") as file:
    file.writelines(lines)

print("CoordAtt imported into models/yolo.py")


CoordAtt imported into models/yolo.py


Modify YAML config to use CoordAtt block

In [24]:
import shutil

# Copy original config
shutil.copy("cfg/training/yolov7.yaml", "cfg/training/yolov7_CA.yaml")

# Inject CoordAtt block in backbone (you can tweak where to apply)
config_path = "cfg/training/yolov7_CA.yaml"
with open(config_path, "r") as file:
    yaml_lines = file.readlines()

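# Inject CoordAtt near the start of the backbone (two lines below the "backbone:" key).
# Caveats: this is a plain text insertion, so check that the edited file is still valid YAML.
# An extra layer also shifts the index of every later layer, so any absolute "from"
# indices further down the config may need to be incremented.
insert_index = next(i for i, line in enumerate(yaml_lines) if "backbone:" in line) + 2
yaml_lines.insert(insert_index, "  [-1, 1, CoordAtt, [64]],\n")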

with open(config_path, "w") as file:
    file.writelines(yaml_lines)

print("YOLOv7 config updated with CoordAtt (cfg/training/yolov7_CA.yaml)")


YOLOv7 config updated with CoordAtt (cfg/training/yolov7_CA.yaml)
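
Inserting a layer into the config moves every later layer down by one position, so absolute (non-negative) "from" indices that point at or beyond the insertion point no longer refer to the same layer. The sketch below illustrates that bookkeeping on a toy layer list. The helper shift_absolute_refs and the toy layers are hypothetical; relative references such as -2 that reach back across the insertion point are not handled here and would still need a manual check.

```python
def shift_absolute_refs(layers, insert_at):
    """Return a copy of `layers` in which absolute 'from' indices that point at
    or beyond `insert_at` are incremented by one. Relative (negative) indices
    are left unchanged."""
    def shift(ref):
        if isinstance(ref, list):
            return [shift(r) for r in ref]
        return ref + 1 if ref >= insert_at else ref

    return [[shift(layer[0]), *layer[1:]] for layer in layers]


# Hypothetical toy config in the [from, number, module, args] layout
toy_layers = [
    [-1, 1, "Conv", [32, 3, 1]],
    [-1, 1, "Conv", [64, 3, 2]],
    [[-1, 0], 1, "Concat", [1]],
    [1, 1, "Conv", [64, 1, 1]],
]

# Suppose a new layer is about to be inserted at position 1
shifted = shift_absolute_refs(toy_layers, insert_at=1)
print([layer[0] for layer in shifted])  # [-1, -1, [-1, 0], 2]
```

Only the last layer changes: its reference to old layer 1 becomes 2, while the reference to layer 0 (before the insertion point) stays as it is.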


In [79]:
# Confirm CoordAtt is recognized
!python models/yolo.py --cfg cfg/training/yolov7_CA.yaml

Reversing anchor order


YOLOR  v0.1-128-ga207844 torch 2.5.1 CUDA:0 (NVIDIA GeForce RTX 4070 SUPER, 12281.5MB)


                 from  n    params  module                                  arguments                     
  0                -1  1       102  models.common.coordatt.CoordAtt         [3, 3]                        
  1                -1  1       928  models.common.common.Conv               [3, 32, 3, 1]                 
  2                -1  1     18560  models.common.common.Conv               [32, 64, 3, 2]                
  3                -1  1     36992  models.common.common.Conv               [64, 64, 3, 1]                
  4                -1  1     73984  models.common.common.Conv               [64, 128, 3, 2]               
  5                -1  1      8320  models.common.common.Conv               [128, 64, 1, 1]               
  6                -2  1      8320  models.common.common.Conv               [128, 64, 1, 1]               
  7                -1  1     36992  models.common.commo

Verifying that the labels were converted from the annotations (XML) folder

In [90]:
import os

label_dir = 'datasets/RDD2022_combined/labels/train'
txt_files = [f for f in os.listdir(label_dir) if f.endswith('.txt')]
print(f"Total label files: {len(txt_files)}")


Total label files: 14488


In [91]:
import os

image_dir = 'datasets/RDD2022_combined/images/train'
label_dir = 'datasets/RDD2022_combined/labels/train'

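# Use os.path.splitext so that file names containing extra dots are handled correctly
image_files = [os.path.splitext(f)[0] for f in os.listdir(image_dir) if f.endswith(('.jpg', '.png'))]
label_files = [os.path.splitext(f)[0] for f in os.listdir(label_dir) if f.endswith('.txt')]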

missing_labels = set(image_files) - set(label_files)
print(f"Missing label files for {len(missing_labels)} images.")


Missing label files for 0 images.


In [92]:
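val_label_dir = 'datasets/RDD2022_combined/labels/val'
val_image_dir = 'datasets/RDD2022_combined/images/val'

# Compare file stems (names without extension); comparing raw image names such as
# "x.jpg" against label stems such as "x" would never match
val_label_files = {os.path.splitext(f)[0] for f in os.listdir(val_label_dir) if f.endswith('.txt')}
val_image_files = {os.path.splitext(f)[0] for f in os.listdir(val_image_dir) if f.lower().endswith(('.jpg', '.jpeg', '.png'))}

missing_val_labels = val_image_files - val_label_files
print(f"Missing validation label files for {len(missing_val_labels)} validation images.")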


Missing validation label files for 0 validation images.
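
The image/label matching used in the checks above can be wrapped in one reusable helper. This is a minimal sketch using pathlib; the function name find_unlabeled_images and the set of accepted extensions are assumptions for illustration.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def find_unlabeled_images(image_dir, label_dir):
    """Return the sorted stems of images that have no matching .txt label file."""
    image_stems = {
        p.stem for p in Path(image_dir).iterdir() if p.suffix.lower() in IMAGE_EXTS
    }
    label_stems = {p.stem for p in Path(label_dir).iterdir() if p.suffix == ".txt"}
    return sorted(image_stems - label_stems)
```

It could be called once per split, for example find_unlabeled_images('datasets/RDD2022_combined/images/val', 'datasets/RDD2022_combined/labels/val'); an empty list means every image has a label file.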


In [101]:
import os
import cv2
import torch
from models.experimental import attempt_load  # For loading YOLOv7 model
from utils.general import non_max_suppression, scale_coords
from utils.torch_utils import select_device

# Load model on CUDA
def load_yolo_model(model_path='yolov7.pt'):
    device = select_device('0')  # GPU index 0, matching --device 0 in the training command
    model = attempt_load(model_path, map_location=device)
    model.eval()
    return model, device

# Inference with postprocessing
def get_yolo_detections(image_path, model, device, conf_thres=0.25, iou_thres=0.45):
    # Load and preprocess image
    img0 = cv2.imread(image_path)
    assert img0 is not None, f"Image not found: {image_path}"
    image_height, image_width = img0.shape[:2]

    # Resize and normalize
    img = cv2.resize(img0, (640, 640))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.transpose(2, 0, 1)  # HWC to CHW
    img = torch.from_numpy(img).float() / 255.0
    img = img.unsqueeze(0).to(device)

    # Run inference
    with torch.no_grad():
        pred = model(img)[0]
        detections = non_max_suppression(pred, conf_thres, iou_thres)[0]  # [x1, y1, x2, y2, conf, cls]

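    if detections is not None and len(detections):
        detections = detections.clone().detach()
        # The image was resized directly to 640x640 (no letterbox padding), so x and y are
        # scaled back independently; scale_coords assumes letterbox padding and would misplace boxes
        sx = image_width / img.shape[3]
        sy = image_height / img.shape[2]
        detections[:, [0, 2]] = (detections[:, [0, 2]] * sx).clamp(0, image_width)
        detections[:, [1, 3]] = (detections[:, [1, 3]] * sy).clamp(0, image_height)
        detections[:, :4] = detections[:, :4].round()
    else:
        detections = []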

    return detections, image_width, image_height

def postprocess_predictions(detections, image_width, image_height):
    processed = []
    for det in detections:
        x1, y1, x2, y2, conf, cls = det
        x_center = ((x1 + x2) / 2) / image_width
        y_center = ((y1 + y2) / 2) / image_height
        w = (x2 - x1) / image_width
        h = (y2 - y1) / image_height
        processed.append((int(cls.item()), x_center.item(), y_center.item(), w.item(), h.item()))
    return processed

# Save to YOLO text file
def save_yolo_format(label_path, detections):
    with open(label_path, 'w') as f:
        for cls, x, y, w, h in detections:
            f.write(f"{cls} {x:.6f} {y:.6f} {w:.6f} {h:.6f}\n")

# Run on entire val set
def convert_to_yolo_format(model, device):
    val_images_dir = 'datasets/RDD2022_combined/images/val'
    label_dir = 'datasets/RDD2022_combined/labels/val'
    os.makedirs(label_dir, exist_ok=True)

    for image_file in os.listdir(val_images_dir):
        if image_file.lower().endswith(('.jpg', '.jpeg', '.png')):
            image_path = os.path.join(val_images_dir, image_file)
            label_path = os.path.join(label_dir, os.path.splitext(image_file)[0] + '.txt')

            detections, width, height = get_yolo_detections(image_path, model, device)
            if detections is not None and len(detections) > 0:
                detections = postprocess_predictions(detections, width, height)
                save_yolo_format(label_path, detections)

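# Load and run
# Note: yolov7.pt is the COCO-pretrained checkpoint, so the class IDs it predicts are
# COCO indices (0-79), not the six RDD2022 classes defined in class_dict
model_path = 'D:/CS512-project/yolov7/yolov7.pt'
model, device = load_yolo_model(model_path)
convert_to_yolo_format(model, device)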


Fusing layers... 
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block


In [93]:
import os
import cv2

def save_yolo_format(image_path, detections, label_path):
    # Convert detections to YOLO format (class_id x_center y_center width height)
    with open(label_path, 'w') as f:
        for detection in detections:
            # Example format: class_id x_center y_center width height
            # (You need to adjust the coordinates to be relative to image size)
            class_id, x, y, w, h = detection
            f.write(f"{class_id} {x} {y} {w} {h}\n")

def convert_to_yolo_format():
    # Get all images in the validation folder
    val_images_dir = 'datasets/RDD2022_combined/images/val'
    label_dir = 'datasets/RDD2022_combined/labels/val'
    os.makedirs(label_dir, exist_ok=True)
    
    # Iterate through all detected images and create labels
    for image_file in os.listdir(val_images_dir):
        if image_file.endswith(('.jpg', '.jpeg', '.png')):
            image_path = os.path.join(val_images_dir, image_file)
            # splitext swaps only the extension; str.replace could also alter the file name itself
            label_path = os.path.join(label_dir, os.path.splitext(image_file)[0] + '.txt')
            
            # Get detections for this image (from the output of pretrained YOLO model)
            # You will need to replace this with the actual detections from YOLO inference
            detections = get_yolo_detections(image_path)  # Your detection function here

            # Save detections as YOLO label file
            save_yolo_format(image_path, detections, label_path)
            
def get_yolo_detections(image_path):
    # Your YOLO model inference code here to get predictions for each image
    return [(0, 0.5, 0.5, 0.3, 0.4)]  # Dummy output, replace with real detection data

# Run the conversion
convert_to_yolo_format()


In [106]:
import os

label_dir = 'datasets/RDD2022_combined/labels/train'
image_dir = 'datasets/RDD2022_combined/images/train'
allowed_classes = set(range(6))  # Adjust if your classes differ

for filename in os.listdir(label_dir):
    if not filename.endswith('.txt'):
        continue

    label_path = os.path.join(label_dir, filename)
    image_path = os.path.join(image_dir, filename.replace('.txt', '.jpg'))

    with open(label_path, 'r') as file:
        lines = file.readlines()

    # Keep only non-blank lines whose class ID is allowed
    filtered_lines = [line for line in lines if line.strip() and int(line.split()[0]) in allowed_classes]

    if filtered_lines:
        with open(label_path, 'w') as file:
            file.writelines(filtered_lines)
    else:
        # Delete label and corresponding image if label is empty or invalid
        os.remove(label_path)
        if os.path.exists(image_path):
            os.remove(image_path)


In [95]:
import os

image_dir = 'datasets/RDD2022_combined/images/val'
label_dir = 'datasets/RDD2022_combined/labels/val'

images = [f for f in os.listdir(image_dir) if f.endswith(('.jpg', '.png'))]
labels = [f.replace('.jpg', '.txt').replace('.png', '.txt') for f in images]

missing_labels = [img for img, label in zip(images, labels) if not os.path.exists(os.path.join(label_dir, label))]
if missing_labels:
    print(f"Missing labels for {len(missing_labels)} images: {missing_labels}")
else:
    print("All images have corresponding labels.")


All images have corresponding labels.


Train the model

In [None]:
!python train.py \
  --img 640 \
  --batch 1 \
  --epochs 100 \
  --data data/rdd2022.yaml \
  --cfg cfg/training/yolov7_CA.yaml \
  --weights yolov7.pt \
  --device 0


[34m[1mwandb: [0mInstall Weights & Biases for YOLOR logging with 'pip install wandb' (recommended)
Reversing anchor order

[34m[1mautoanchor: [0mAnalyzing anchors... anchors/target = 4.44, Best Possible Recall (BPR) = 0.9916
Unhandled exception caught in c10/util/AbortHandler.h
00007FFC0546034400007FFC05445230 torch_python.dll!THPGenerator_initDefaultGenerator [<unknown file> @ <unknown line number>]
00007FFCB39619D700007FFCB39619C0 ucrtbase.dll!terminate [<unknown file> @ <unknown line number>]
00007FFC9B811911 <unknown symbol address> VCRUNTIME140_1.dll!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FFC9B81218F <unknown symbol address> VCRUNTIME140_1.dll!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FFC9B8121E9 <unknown symbol address> VCRUNTIME140_1.dll!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FFC9B81401900007FFC9B813F70 VCRUNTIME140_1.dll!_CxxFrameHandler4 [<unknown file> @ <unknown line number>]
00007FFCB64A06FF00007FFCB

YOLOR  v0.1-128-ga207844 torch 2.5.1 CUDA:0 (NVIDIA GeForce RTX 4070 SUPER, 12281.5MB)

Namespace(weights='yolov7.pt', cfg='cfg/training/yolov7_CA.yaml', data='data/rdd2022.yaml', hyp='data/hyp.scratch.p5.yaml', epochs=100, batch_size=1, img_size=[640, 640], rect=False, resume=False, nosave=False, notest=False, noautoanchor=False, evolve=False, bucket='', cache_images=False, image_weights=False, device='0', multi_scale=False, single_cls=False, adam=False, sync_bn=False, local_rank=-1, workers=8, project='runs/train', entity=None, name='exp', exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=-1, artifact_alias='latest', freeze=[0], v5_metric=False, world_size=1, global_rank=-1, save_dir='runs\\train\\exp17', total_batch_size=1)
[34m[1mtensorboard: [0mStart with 'tensorboard --logdir runs/train', view at http://localhost:6006/
[34m[1mhyperparameters: [0mlr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup

In [118]:
# Check max class index in your labels
import os

path = 'D:/CS512-project/yolov7/datasets/RDD2022_combined/labels/train'  # or val/test
for file in os.listdir(path):
    with open(os.path.join(path, file)) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            class_id = int(line.split()[0])
            if class_id >= 6:  # nc = 6 in the dataset YAML
                print(f"Out-of-bounds class {class_id} in file {file}")


In [121]:
import os

NUM_CLASSES = 6  # Make sure this matches your data.yaml

for split in ['train', 'val', 'test']:
    label_dir = f'D:/CS512-project/yolov7/datasets/RDD2022_combined/labels/{split}'
    for file in os.listdir(label_dir):
        with open(os.path.join(label_dir, file)) as f:
            for line in f:
                try:
                    class_id = int(line.strip().split()[0])
                    if class_id >= NUM_CLASSES:
                        print(f"[{split}] Out-of-bounds class {class_id} in file: {file}")
                except (ValueError, IndexError):
                    # Blank line or a class field that is not an integer
                    print(f"[{split}] Malformed line in file: {file}")


[val] Out-of-bounds class 7 in file: China_MotorBike_001983.txt
[val] Out-of-bounds class 24 in file: China_MotorBike_001983.txt
[val] Out-of-bounds class 74 in file: China_MotorBike_001984.txt
[val] Out-of-bounds class 61 in file: China_MotorBike_001985.txt
[val] Out-of-bounds class 74 in file: China_MotorBike_001989.txt
[val] Out-of-bounds class 7 in file: China_MotorBike_001998.txt
[val] Out-of-bounds class 7 in file: China_MotorBike_001998.txt
[val] Out-of-bounds class 7 in file: China_MotorBike_002005.txt
[val] Out-of-bounds class 24 in file: China_MotorBike_002009.txt
[val] Out-of-bounds class 10 in file: China_MotorBike_002010.txt
[val] Out-of-bounds class 29 in file: China_MotorBike_002013.txt
[val] Out-of-bounds class 38 in file: China_MotorBike_002013.txt
[val] Out-of-bounds class 10 in file: China_MotorBike_002016.txt
[val] Out-of-bounds class 32 in file: China_MotorBike_002016.txt
[val] Out-of-bounds class 7 in file: China_MotorBike_002024.txt
[val] Out-of-bounds class 7 in
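
The out-of-bounds IDs above appear only in the val split. One plausible explanation, given that the val labels were generated with the pretrained yolov7.pt weights, is that they are COCO class indices rather than RDD2022 classes. A per-split class histogram makes this kind of mismatch easier to spot than a line-by-line listing. The sketch below is a minimal version; the helper names class_histogram and out_of_range_classes are assumptions.

```python
from collections import Counter
from pathlib import Path

def class_histogram(label_dir):
    """Count how often each class ID appears across all .txt files in label_dir."""
    counts = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if parts:  # skip blank lines
                counts[int(parts[0])] += 1
    return counts

def out_of_range_classes(counts, num_classes):
    """Return the sorted class IDs that fall outside 0..num_classes-1."""
    return sorted(c for c in counts if not 0 <= c < num_classes)
```

For example, out_of_range_classes(class_histogram('datasets/RDD2022_combined/labels/val'), 6) would list every class ID in the val labels that the 6-class dataset YAML cannot represent.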

In [124]:
import os

label_dir = 'datasets/RDD2022_combined/labels/val'
image_dir = 'datasets/RDD2022_combined/images/val'
deleted = 0

for file in os.listdir(label_dir):
    path = os.path.join(label_dir, file)
    with open(path) as f:
        lines = f.readlines()
        if any(int(line.split()[0]) > 5 for line in lines):
            os.remove(path)
            image_path = os.path.join(image_dir, file.replace('.txt', '.jpg'))
            if os.path.exists(image_path):
                os.remove(image_path)
            deleted += 1

print(f"Deleted {deleted} image-label pairs with invalid class IDs.")


PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'datasets/RDD2022_combined/labels/val\\China_MotorBike_001983.txt'
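
The PermissionError above occurs because os.remove is called while the label file is still open inside the with block, and Windows does not allow deleting a file that has an open handle. A minimal sketch of a corrected version follows: it reads the class IDs first, lets the with block close the file, and only then deletes. The function name remove_invalid_pairs is an assumption, and like the original cell it assumes the images use the .jpg extension.

```python
import os

def remove_invalid_pairs(label_dir, image_dir, num_classes=6):
    """Delete label files that contain a class ID >= num_classes, plus the
    matching .jpg image if present. Returns the number of labels deleted."""
    deleted = 0
    for name in os.listdir(label_dir):
        if not name.endswith(".txt"):
            continue
        label_path = os.path.join(label_dir, name)
        # Read the class IDs; the file is closed as soon as the with block exits
        with open(label_path) as f:
            class_ids = [int(line.split()[0]) for line in f if line.strip()]
        if any(c >= num_classes for c in class_ids):
            os.remove(label_path)  # safe now: the handle is already closed
            image_path = os.path.join(image_dir, os.path.splitext(name)[0] + ".jpg")
            if os.path.exists(image_path):
                os.remove(image_path)
            deleted += 1
    return deleted
```

Calling remove_invalid_pairs('datasets/RDD2022_combined/labels/val', 'datasets/RDD2022_combined/images/val') would mirror the intent of the cell above. Since deletion is irreversible, it is worth running it on a copy of the dataset first.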