**In the Kaggle competition, I was able to achieve a score of .87246:**

![Image](images/Kaggle-leaderboard.png)

In earlier submissions I had accuracies of 0.56 and 0.59; I go over those in later sections.

# YOLO v7 experimentation

In the past, I have worked with YOLO v4 for object detection tasks, and I have heard of it being retrofitted for classification. Last semester, I found out that AlexeyAB et al. had published a new paper on YOLO v7, and I decided I wanted to try to implement it for this assignment.

### Preparing the environment

My first step was to clone the YOLO v7 GitHub repository. I also made sure to download the CIFAR-10 dataset, as the competition expected. Some of this code comes from the starter Jupyter notebook. I then loaded the dataset with torchvision.

```
# Clone the specified YOLOv7 repository
!git clone https://github.com/WongKinYiu/yolov7

# Import necessary libraries
import torch
import torchvision
import torchvision.transforms as transforms

# Download CIFAR-10 dataset
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
```


### Formatting the data

Unfortunately, I could not leave the data in the form that torchvision's dataset class downloads it in. YOLO v7 requires a specific data structure:

![Image2](images/folder-structure.png)

The images directory holds the train/validation split of my images (90% train, 10% validation). For the labels, YOLO v7 requires one txt file per train and validation image, containing a line in the format `<class> <x_center> <y_center> <width> <height>`, with the coordinates normalized by the image width and height. The top-level `train.txt` and `val.txt` files list the relative path to each image (this is how YOLO finds the images).
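As a quick sanity check of the label format: the box center and size are divided by the image width and height, so every value lies in [0, 1]. Because each CIFAR-10 image is a single object filling the frame, I treat the box as the whole image, which always produces `<class> 0.5 0.5 1.0 1.0`. A minimal sketch with a hypothetical helper, `to_yolo_line`, that converts a pixel-space box to a label line:

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to a YOLO label line: class x_center y_center width height (normalized)."""
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center} {y_center} {width} {height}"


# A whole-image box on a 32x32 CIFAR-10 image (class 3 is "cat")
print(to_yolo_line(3, 0, 0, 32, 32, 32, 32))
```

For a whole-image box the output is `3 0.5 0.5 1.0 1.0`, which is exactly what the conversion code writes for every image.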

To match this file structure, I wrote the following code to restructure the images and save them to disk, rather than keeping them loaded as a variable in memory.


```
import os
import torch
import torchvision

# Create the directory structure inside 'yolov7', but without the 'yolov7' prefix in paths
base_path = "cifar10"
for sub_dir in ("images/train", "images/val", "labels/train", "labels/val"):
    os.makedirs(f"{base_path}/{sub_dir}", exist_ok=True)

# Split the dataset into training (90%) and validation (10%) sets
train_size = int(0.9 * len(trainset))
val_size = len(trainset) - train_size
train_subset, val_subset = torch.utils.data.random_split(trainset, [train_size, val_size])

def convert_to_yolo_format(subset, mode):
    """Save each image as a .jpg, write its YOLO label file, and return the image paths."""
    image_paths = []

    for idx in range(len(subset)):
        # Get image and label
        image, label = subset[idx]

        # Undo Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) so pixels are back in [0, 1];
        # save_image clamps to [0, 1], which would otherwise clip every negative value
        image_path = f"{base_path}/images/{mode}/{idx}.jpg"
        torchvision.utils.save_image(image * 0.5 + 0.5, image_path)
        image_paths.append(image_path)  # relative path for train.txt and val.txt

        # Each CIFAR-10 image is one object filling the frame, so the box is the whole
        # image: normalized center (0.5, 0.5) and normalized size (1.0, 1.0)
        annotation = f"{label} 0.5 0.5 1.0 1.0\n"

        # Save annotation
        with open(f"{base_path}/labels/{mode}/{idx}.txt", "w") as f:
            f.write(annotation)

    return image_paths

# Convert datasets to YOLO format
train_image_paths = convert_to_yolo_format(train_subset, "train")
val_image_paths = convert_to_yolo_format(val_subset, "val")

# Generate .txt files listing the paths of training and validation images
with open(f"{base_path}/train.txt", "w") as f:
    f.write("\n".join(train_image_paths))

with open(f"{base_path}/val.txt", "w") as f:
    f.write("\n".join(val_image_paths))

print("Data preparation completed!")
```

### Training

Now that I have the images in the right structure, it was time to start training.

When I first tried to train, I got an error that I believe is caused by a bug. When training on a GPU, the loss code tries to use indices that live on one device (e.g., the GPU) to index a tensor that lives on another device (e.g., the CPU), but both must be on the same device. To work around this, I moved the `from_which_layer` tensor onto the GPU by inserting a line into `utils/loss.py`:


```
# File to patch
file_path = "yolov7/utils/loss.py"

# 1-indexed line numbers where the fix is inserted. Insert from the bottom of the
# file up, so the first insertion does not shift the line number of the second.
line_numbers = sorted([1557, 1404], reverse=True)

# Line to insert: move the layer-index tensor onto the GPU
# (6 tabs expanded at tab size 2 = 12 spaces of indentation)
insert_line = "\t\t\t\t\t\tfrom_which_layer = from_which_layer.to(\"cuda:0\")\n"
insert_line = insert_line.expandtabs(2)

# Read the file
with open(file_path, 'r') as file:
    lines = file.readlines()

# Insert the line at each position (run this only once, or the line is duplicated)
for line_num in line_numbers:
    lines.insert(line_num - 1, insert_line)  # adjust for 0-indexing

# Write back to the file
with open(file_path, 'w') as file:
    file.writelines(lines)
```
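Hard-coding lines 1404 and 1557 ties the patch to one exact version of `loss.py`, and re-running the cell inserts the line twice. A slightly more robust sketch inserts the fix before every line containing a given text pattern, reuses that line's indentation, and skips matches that are already patched. The helper names are hypothetical, and the pattern in the usage comment (`from_which_layer[fg_mask_inboxes]`) is my assumption about what the failing lines look like, so check it against your copy of `utils/loss.py` first.

```python
def insert_before_matches(lines, pattern, new_code):
    """Return a copy of `lines` with `new_code` inserted before every line containing `pattern`.

    The inserted line reuses the matched line's indentation, and a match that is
    already preceded by `new_code` is left alone, so patching twice is harmless.
    """
    patched = []
    for line in lines:
        already_patched = bool(patched) and patched[-1].strip() == new_code.strip()
        if pattern in line and not already_patched:
            indent = line[: len(line) - len(line.lstrip())]
            patched.append(f"{indent}{new_code.strip()}\n")
        patched.append(line)
    return patched


def patch_file(file_path, pattern, new_code):
    """Apply insert_before_matches to a file in place and return how many lines were added."""
    with open(file_path, "r") as f:
        lines = f.readlines()
    patched = insert_before_matches(lines, pattern, new_code)
    with open(file_path, "w") as f:
        f.writelines(patched)
    return len(patched) - len(lines)


# Hypothetical usage; the pattern is an assumption about the failing lines in loss.py:
# added = patch_file("yolov7/utils/loss.py",
#                    "from_which_layer[fg_mask_inboxes]",
#                    'from_which_layer = from_which_layer.to("cuda:0")')
# print(f"Inserted {added} line(s)")
```

Matching on text instead of line numbers also means that an unexpected match count (for example, zero) is visible in the returned value rather than silently patching the wrong line.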

Then we start the normal training process:

`!cd yolov7 && python train_aux.py --workers 8 --device 0 --img 128 128 --batch-size 16 --epochs 40 --data cifar10.yaml --cfg yolov7-e6e.yaml --hyp hyp.yaml --weights '' --cache`

There are a few things to note in this command:

- `workers` was set to 8; I later realized I could have increased this for my hardware.
- `img` was set to 128x128. YOLO v7 is not optimized for 32x32 images, so I upscaled them to 128x128.
- `epochs` was set to 40 in this example. The model converged very quickly, so many epochs were not needed.
- `data` was a YAML file that specified the classes and the train, val, and test paths. You can find it under `YOLOv7_files`.
- `cfg` was another YAML file that defined the model's architecture. I left the backbone and head at their defaults. The other big thing to note is the anchors: the really important ones are P3/8 and P4/16, which define the small and medium bounding boxes that should roughly fit our data (since the 32x32 images were scaled up to 128x128). This file is also in `YOLOv7_files`.
- `hyp` was the final YAML file, which stored all of my hyperparameters. Most were left at their defaults, with a few exceptions: I reduced the learning rate by a factor of 10, and since this is a relatively small dataset, I limited warmup to the first epoch only. I also toned down the image augmentations, setting translate to 0.1, but increased the rotation to 10.0 degrees.
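For reference, a YOLOv7 data file only needs a few keys. The real `cifar10.yaml` is in `YOLOv7_files`; the sketch below is an approximation that assumes the usual YOLOv7 keys (`train`, `val`, `nc`, `names`), points at the `train.txt` and `val.txt` lists written during data preparation, leaves out the `test` entry, and builds the YAML text by hand so it does not need PyYAML. The helper name `make_data_yaml` is hypothetical.

```python
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]  # label index order


def make_data_yaml(train_list, val_list, class_names):
    """Build a YOLOv7-style dataset YAML string (keys assumed: train, val, nc, names)."""
    names = ", ".join(f"'{name}'" for name in class_names)
    return (
        f"train: {train_list}\n"
        f"val: {val_list}\n"
        f"nc: {len(class_names)}\n"
        f"names: [{names}]\n"
    )


yaml_text = make_data_yaml("cifar10/train.txt", "cifar10/val.txt", CIFAR10_CLASSES)
with open("cifar10.yaml", "w") as f:  # written next to train_aux.py, inside yolov7/
    f.write(yaml_text)
print(yaml_text)
```

The class names must stay in CIFAR-10's label order, because the label files store only the integer index.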


I trained the model for 30 epochs and saved a few checkpoints at different intervals. The models converged very quickly and plateaued within the first 20 epochs.


### Evaluation

The code I used for evaluation was the following (also in `custom.py` under the `YOLOv7_files` directory):

```
import cv2
import torch
import torchvision.transforms as transforms
import yaml

from utils.datasets import letterbox  # YOLOv7 helper; run this from inside the yolov7 repo

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
half = device.type != 'cpu'  # half precision only supported on CUDA
print(device)

# Load hyperparameters
with open('data/hyp.yaml', 'r') as f:
    hyp = yaml.load(f, Loader=yaml.FullLoader)

weights = 'best.pt'
model = torch.load(weights, map_location=device)['model'].to(device)
if half:
    model.half()  # to FP16
else:
    model.float()  # CPU inference needs FP32 weights

# Get class names (for displaying results)
names = model.module.names if hasattr(model, 'module') else model.names

def get_actual_class(annotation_path):
    """Extract the actual class from the YOLO annotation file."""
    with open(annotation_path, 'r') as f:
        # The class is the first value on the first (and only) line of the YOLO annotation
        return int(f.readline().split()[0])

base_path = "cifar10"
with open(f"{base_path}/val.txt", "r") as f:
    image_paths = [line.strip() for line in f.readlines()]

correct_count = 0

for img_path in image_paths:
    img = cv2.imread(img_path)  # paths in val.txt are relative to the working directory
    img = letterbox(img, 640, stride=64, auto=True)[0]  # resize and pad to the network input size
    img = transforms.ToTensor()(img).unsqueeze(0).to(device)  # HWC uint8 -> 1xCxHxW float in [0, 1]
    if half:
        img = img.half()

    with torch.no_grad():
        outputs = model(img)

    # Take the first output tensor
    main_output = outputs[0]

    # Assuming a [batch, anchors, grid_y, grid_x, channels] layout:
    # average over the anchors, then over the two grid dimensions
    class_predictions = torch.mean(main_output, dim=1)
    class_predictions_avg = torch.mean(class_predictions, dim=[1, 2])

    # Predicted class = argmax over every channel of the averaged vector
    predicted_class = torch.argmax(class_predictions_avg, dim=1).item()

    # Get the actual class from the matching label file
    annotation_path = img_path.replace('images', 'labels').replace('.jpg', '.txt')
    actual_class = get_actual_class(annotation_path)

    # Check if the prediction is correct
    if predicted_class == actual_class:
        correct_count += 1

    print(f"Image: {img_path} | Predicted Class: {names[predicted_class]} | Actual Class: {names[actual_class]}")

accuracy = (correct_count / len(image_paths)) * 100
print(f"Accuracy: {accuracy:.2f}%")
```

I got some pretty horrendous accuracy scores (under 0.05%). I quickly decided to try another method, and found that the YOLO v8 model had finally been released.
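In hindsight, one thing worth checking (my hypothesis, not something I verified) is that the averaged vector in the evaluation code still contains YOLO's box coordinates and objectness score alongside the class scores, so `argmax` can land on a channel that is not a class at all. If each prediction row is laid out as `[x, y, w, h, objectness, class_0, ..., class_{nc-1}]` in raw logits (an assumption about the layout), a whole-image class can be read from the class channels only, weighted by objectness. A plain-Python sketch with a hypothetical `predict_class` helper:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_class(rows, num_classes):
    """Pick one class for the whole image from YOLO-style prediction rows.

    Each row is assumed to hold raw logits laid out as
    [x, y, w, h, objectness, class_0, ..., class_{num_classes-1}].
    Class probabilities are weighted by objectness and summed over all rows,
    so confident predictions dominate the vote. Box channels are ignored.
    """
    totals = [0.0] * num_classes
    for row in rows:
        obj = sigmoid(row[4])
        for c in range(num_classes):
            totals[c] += obj * sigmoid(row[5 + c])
    return max(range(num_classes), key=lambda c: totals[c])


rows = [
    [9.0, 9.0, 9.0, 9.0, 2.0, -3.0, 4.0, -1.0],   # confident prediction, class 1
    [9.0, 9.0, 9.0, 9.0, -4.0, 5.0, -2.0, -1.0],  # low objectness, class 0
]
print(predict_class(rows, num_classes=3))
```

On these two rows, an argmax over the full row would return index 0 (a box coordinate), while the class-only vote returns class 1.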

# YOLO v8

This time around, I decided to try the YOLO v8 model. It was interesting to see how much more professionally developed the package was: it had better support for working in code, and better documentation (albeit still pretty bad, but at least it exists in some form).

### Initial training and submission

My first step was to create a new folder and run the following code:

```
from ultralytics import YOLO

# Load a model
model = YOLO('yolov8-cls.yaml')

# Train the model
results = model.train(data='cifar10', epochs=100, imgsz=32)
```

The `yolov8-cls.yaml` file is a model cfg file, similar to the one I used for YOLO v7, but with less fluff and a simpler backbone. I let this train for 15 epochs, then decided to try it on the Kaggle dataset by submitting the following code:

```
import torch
import torchvision
import torchvision.transforms as transforms
import pandas as pd
from ultralytics import YOLO
import os
import matplotlib.pyplot as plt
import numpy as np

# Normalization from the starter notebook (kept for reference; the Kaggle test
# tensor loaded below is already normalized, so this transform is not applied)
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

test_images = torch.load('/kaggle/input/fall-2023-ist-557-individual-project-ii/test_image.pt')

# Load my trained YOLOv8 checkpoint
model = YOLO('/kaggle/input/models/last.pt')

# Classes for CIFAR-10
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# Make predictions on the test dataset
predictions = []

for image in test_images:
    # Run the classifier on a single image (add a batch dimension first)
    result = model(image.unsqueeze(0))
    predicted_class = result[0].probs.top1  # index of the highest-probability class
    predictions.append(classes[predicted_class])

# Visualize some of the test images along with their predicted labels
def imshow(img):
    img = img / 2 + 0.5  # Unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# Display first 4 test images
imshow(torchvision.utils.make_grid(test_images[:4]))
print('Predicted: ', ' '.join(f'{predictions[j]:5s}' for j in range(4)))

# Create a CSV submission file
submission = pd.DataFrame()
submission['label'] = predictions
submission.to_csv("submission.csv", index=True, index_label='id')
```

The code is pretty rudimentary, but it loads the model's `last.pt` checkpoint, predicts a class for every test image, and writes the submission CSV. I was able to get a score of 56% accuracy.

### Hyperparameter Tuning

After that successful run, I looked into hyperparameter tuning. The code for this is in the `YOLOv8_files/hyperparam tuning.py` file, but I will also add it here:

```
from ultralytics import YOLO

# Initialize the YOLO model
model = YOLO('yolov8x-cls.pt')

model.tune(data='cifar10', epochs=30, iterations=50, batch=96,workers=16, optimizer='AdamW', plots=False, save=False, val=False)
```

This time, I wanted to use their pretrained model as a starting point. It is much more complex than the initial YAML file I was using, and the Ultralytics classification models are pretrained on ImageNet. The `tune` method uses a mutation technique to search for the optimal hyperparameters for the given model. I found that 30 epochs was enough for the model to plateau in training. I adjusted the iterations, batch size, and workers to make full use of my GPU without causing system instability (I ran out of Google Colab credits for their free GPUs; luckily, I have a 2080 Ti).

In total, it took around 10 hours to get through 7 iterations, and I cut the tuning off there since I needed time to write and prepare this paper. The following images show the results of the hyperparameter tuning, and the actual values are in `YOLOv8_files/runs/classify/tune/best_hyperparameters.yaml`.

![Image](images/tune_fitness.png)
![Image](images/tune_scatter_plots.png)
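The mutation idea behind the tuner can be illustrated with a deliberately simplified search: start from one set of hyperparameters, randomly scale some of them, clip each to an allowed range, and keep the child only if it scores better. This is my own toy sketch, not Ultralytics' implementation. `toy_fitness` is a made-up stand-in for a full training run (in the real tuner each evaluation is a training session, which is what makes every iteration so slow), and the bounds are illustrative.

```python
import random


def mutate(parent, bounds, rng, prob=0.8, sigma=0.2):
    """Return a child where each hyperparameter is scaled by random noise with probability `prob`."""
    child = {}
    for name, value in parent.items():
        if rng.random() < prob:
            value *= 1.0 + rng.gauss(0.0, sigma)
        low, high = bounds[name]
        child[name] = min(max(value, low), high)  # clip to the allowed range
    return child


def evolve(fitness, start, bounds, iterations=50, seed=0):
    """Greedy mutation search: keep a child only if its fitness beats the current best."""
    rng = random.Random(seed)
    best, best_fit = dict(start), fitness(start)
    for _ in range(iterations):
        child = mutate(best, bounds, rng)
        child_fit = fitness(child)
        if child_fit > best_fit:
            best, best_fit = child, child_fit
    return best, best_fit


def toy_fitness(hp):
    """Stand-in for validation accuracy: peaks at lr0 = 0.01, momentum = 0.9."""
    return -((hp["lr0"] - 0.01) / 0.01) ** 2 - ((hp["momentum"] - 0.9) / 0.1) ** 2


bounds = {"lr0": (1e-5, 1e-1), "momentum": (0.6, 0.98)}
start = {"lr0": 0.001, "momentum": 0.7}
best, score = evolve(toy_fitness, start, bounds, iterations=200)
print(best, score)
```

Because a child replaces the parent only when it is strictly better, the best score never gets worse; the trade-off is that a greedy search like this can stall in a local optimum.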



### Re-training with the best hyperparameters

This is the code I ran (also in `final.py`) to train my final model:

```
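from ultralytics import YOLO

# Start from the pretrained classifier and train with the tuned hyperparameters
model = YOLO('yolov8x-cls.pt')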

results = model.train(data='cifar10', epochs=50, imgsz=32, batch=128, workers=16,
                      lr0= 0.00859,
                      lrf= 0.01068,
                      momentum= 0.92692,
                      weight_decay= 0.00046,
                      warmup_epochs= 3.06646,
                      warmup_momentum= 0.8081,
                      box= 6.46683,
                      cls= 0.55668,
                      dfl= 1.53146,
                      hsv_h= 0.01546,
                      hsv_s= 0.85974,
                      hsv_v= 0.44395,
                      degrees= 0.0,
                      translate= 0.06773,
                      scale= 0.49418,
                      shear= 0.0,
                      perspective= 0.0,
                      flipud= 0.0,
                      fliplr= 0.44357,
                      mosaic= 0.9805,
                      mixup= 0.0,
                      copy_paste= 0.0)
```

I evaluated the model(s) with the following code (I had multiple checkpoints saved at different points during training):

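```
import os

from ultralytics import YOLO

accuracy = {}

# Validate every checkpoint in the models folder and record its top-1 accuracy
for filename in sorted(os.listdir('models')):
    if filename.endswith(".pt"):
        model = YOLO(os.path.join('models', filename))

        metrics = model.val(data='./datasets/cifar10/')
        accuracy[filename] = metrics.top1

for filename, top1 in accuracy.items():
    print(f"Model {filename} has top-1 accuracy {top1}")
```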

The models folder would look something like this:

![Image](images/models.png)

There are training stats and images in the `YOLOv8_files/runs/classify/train9` folder. After running the evaluation code, I found that the 25-epoch model performed best on my test dataset.


### Discovering an error

When I submitted my 25-epoch model, I got an accuracy of 59% on Kaggle, which confused me. After some debugging, I realized the following:

When the Ultralytics package downloads the CIFAR-10 dataset, it does not allow modifications unless you import it as a custom dataset. As a result, the images it trains on are not normalized (pixel values are only scaled to [0, 1]). The `test_image.pt` tensor that holds the Kaggle test images, on the other hand, is already normalized, so the model saw inputs from a different range at test time.

Since I was short on time, I decided the simplest solution was to un-normalize the `test_image.pt` tensor. In the future, I would like to re-train my model on a normalized version of CIFAR-10 and re-submit.
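Un-normalizing simply inverts `transforms.Normalize`: normalization computes `x_norm = (x - mean) / std`, so the original pixel is `x = x_norm * std + mean`. The sketch below assumes the Kaggle tensor was normalized with mean 0.5 and std 0.5 on every channel (the same values as the starter notebook's transform), so values in [-1, 1] map back to the [0, 1] range the Ultralytics model was trained on. The helper name `unnormalize` is hypothetical; because it uses only `*` and `+`, it works on a plain float or on a whole torch tensor.

```python
def unnormalize(x, mean=0.5, std=0.5):
    """Invert Normalize(mean, std): x_norm = (x - mean) / std, so x = x_norm * std + mean."""
    return x * std + mean


for value in (-1.0, 0.0, 1.0):
    print(value, "->", unnormalize(value))

# With the Kaggle tensor (assumes mean = std = 0.5 for all three channels):
# test_images = unnormalize(torch.load('/kaggle/input/fall-2023-ist-557-individual-project-ii/test_image.pt'))
```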


# Actual code

If you want to train and test a model yourself, you can use the following code. My Kaggle submission is in the `Kaggle-upload.ipynb` file (note: that Kaggle notebook won't work unless you change the directories to point to a local folder). There is also a file called `yolo8.py` with some scratch code from my learning process.

```
# Requires the ultralytics package (pip install ultralytics)
from ultralytics import YOLO

# Load an initial model
model = YOLO('yolov8x-cls.pt')

# Train the model for 25 epochs (roughly where mine performed best)
results = model.train(data='cifar10', epochs=25, imgsz=32, batch=128, workers=16,
                      lr0= 0.00859,
                      lrf= 0.01068,
                      momentum= 0.92692,
                      weight_decay= 0.00046,
                      warmup_epochs= 3.06646,
                      warmup_momentum= 0.8081,
                      box= 6.46683,
                      cls= 0.55668,
                      dfl= 1.53146,
                      hsv_h= 0.01546,
                      hsv_s= 0.85974,
                      hsv_v= 0.44395,
                      degrees= 0.0,
                      translate= 0.06773,
                      scale= 0.49418,
                      shear= 0.0,
                      perspective= 0.0,
                      flipud= 0.0,
                      fliplr= 0.44357,
                      mosaic= 0.9805,
                      mixup= 0.0,
                      copy_paste= 0.0)
```

```
# Test against the CIFAR-10 dataset's test folder
metrics = model.val(data='./datasets/cifar10/')

print(metrics.top1)
```