<a href="https://colab.research.google.com/github/billzhao1030/KIT315_Project/blob/KIT/KIT315_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ***Introduction***

<p align="justify">
In this KIT315 project, I developed a machine learning application that detects penguins in images and classifies their species, in support of Antarctic scientific research. There are three species of penguin I need to classify: 
</p>

<ul>
  <li><b><i>Aptenodytes Forsteri (Emperor Penguin)</i></b></li>
  <li><b><i>Aptenodytes Patagonicus (King Penguin)</i></b></li>
  <li><b><i>Pygoscelis Antarcticus (Chinstrap Penguin)</i></b></li>
</ul>
<br>

<p align="justify">
In this Jupyter notebook, I describe the most important tasks I carried out during the development of the ML application: data preparation, processing and analysis; model development, evaluation and selection; and how I apply the model to new data in general cases. 
</p>
<br>

*Submission details:*
*   *Student Name: Xunyi Zhao (Bill)*
*   *Student ID: 560060*
*   *Date: 04/09/2022*
*   *Environment used:* [ *Google Colab* ](https://colab.research.google.com/)

# ***Motivation and purpose***

<p align="justify"> 
Penguins, a group of aquatic flightless birds, live almost exclusively in the Southern Hemisphere, including Antarctica. Among the 20 living species, the <b>Aptenodytes Forsteri (Emperor Penguin)</b>, <b>Aptenodytes Patagonicus (King Penguin)</b> and <b>Pygoscelis Antarcticus (Chinstrap Penguin)</b> are three of the most common and popular species. 
</p>
<p align="justify"> 
However, due to global warming and other pressures, emperor penguins have already been listed as Near Threatened in the latest <i>IUCN Red List (version 3.1)</i>. Sometimes one or two penguins can be found isolated on floating ice, with no continent or suitable living area within 10 kilometres.
</p>
<p align="justify">  
Thus, to help researchers detect this situation, and potentially to classify penguin species automatically or count the penguins in a specific area, I chose this project and aim to develop an appropriate machine learning application.
</p>
<br>
<p align="justify"> 
In summary, the purpose of this application is to <b>detect and then identify the species</b> of penguins living in Antarctica (limited to emperor, king and chinstrap penguins due to time restrictions), <b>and potentially to count the penguins if possible</b>.
</p>


# ***Data Preparation***



<p align="justify"> 
In this section, I will introduce how I collected and annotated the data (using the <a href="https://roboflow.com/"><i>Roboflow</i></a> tool), which helps the models achieve good performance.
</p>


### ***Data Collecting***

<p align="justify"> 
In this project, nearly all of the data was collected from <a href="https://images.google.com/"><i>Google Images</i></a>. The images listed there originally come from different sources, such as videos, original photographs, or even magazines and reports. 
</p>
<p align="justify"> 
To ensure the quality of the data and to help the model trained later achieve good performance, I manually filtered out bad images (for example, images with too many penguins, blurry images, low-resolution images, etc.). 
</p>
<p align="justify"> 
What's more, since the purpose of the application is to detect and then identify the species (and potentially count the penguins), it is important to collect images that contain different numbers of penguins (from a single penguin to many), that are clear (the higher the resolution, the better), and that cover various contexts and scenarios. Besides, the number of samples and penguin instances should be large enough, and both adult penguins and chicks (which look quite different) should be included in the dataset.
</p>
<p align="justify"> 
The following are the rules I stuck to in order to ensure the quality of the data collection process:
</p>
<ul>
  <li><b><i>If the resolution (image size) of the image is too small, then I won't collect it</i></b></li>
  <li><b><i>If the aspect ratio of the image is too extreme (say 16:3 or 1:5, which is far from square), then I won't collect it</i></b></li>
  <li><b><i>If the image contains too many penguins, then I won't collect it</i></b></li>
  <li><b><i>If the penguins in the image are drawn by a human, or the background looks artificial, then I won't collect it</i></b></li>
</ul>
<br>
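<p align="justify"> 
The first two rules were applied by hand in this project, but they can also be expressed as a simple check. The following is only a minimal sketch: the function name <code>keep_image</code> and the thresholds <code>min_side</code> and <code>max_aspect_ratio</code> are assumptions chosen for illustration, not values used when collecting this dataset.
</p>

```python
def keep_image(width: int, height: int,
               min_side: int = 400, max_aspect_ratio: float = 2.0) -> bool:
    """Return True if an image passes the resolution and aspect-ratio rules.

    The default thresholds are illustrative assumptions only.
    """
    if width <= 0 or height <= 0:
        return False
    # Rule 1: reject images whose shorter side is too small.
    if min(width, height) < min_side:
        return False
    # Rule 2: reject very elongated images (e.g. 16:3 or 1:5).
    aspect_ratio = max(width, height) / min(width, height)
    return aspect_ratio <= max_aspect_ratio


print(keep_image(1280, 960))   # True: large and close to square
print(keep_image(1600, 300))   # False: the shorter side is too small
print(keep_image(3200, 600))   # False: 16:3 is too elongated
```

<p align="justify"> 
In practice, the width and height would be read from each downloaded file before deciding whether to keep it. The other two rules (too many penguins, drawn or artificial images) still need a human judgement.
</p>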
<p align="justify"> 
Here are some example images I've collected (before annotation and processing):
</p>

<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1cgxSWK9I5USXEfPSf2NOkVJQ7GCiHZ9y" alt="Chinstrap" style="width: 200px;"/>
<img src="https://drive.google.com/uc?id=1nyeAbl6agnV9uhoUhf_6tLcKpFP_s6Px" alt="Emperor" style="width: 390px;"/>
<img src="https://drive.google.com/uc?id=180cl1GNv_4MLCWhmImD06Re8dfFJoAQ2" alt="King" style="width: 350px;"/>
<img src="https://drive.google.com/uc?id=1d4aCkSi9ySPKayVKAv10GBmCU7uXEHSi" alt="King" style="width: 350px;"/>
<img src="https://drive.google.com/uc?id=1vUM6SZyHKcrqF-MuXvCzPeq9f-ASDNOy" alt="Chinstrap" style="width: 350px;"/>
<img src="https://drive.google.com/uc?id=1fAhFMmvx1aBKmdbCLGg45LL0_yijeKxO" alt="Emperor" style="width: 350px;"/>
</p>


### ***Data annotating***

<p align="justify"> 
To annotate the data that I collected, I used the <a href="https://roboflow.com/"><i>Roboflow</i></a> tool. This online tool helps us annotate the data and export it quickly. Each type of penguin can be labeled using bounding boxes of a different color. An example screenshot of the annotation process is shown below (please note that there are 6 labels in this example image because Roboflow can modify the class names when generating the dataset; see the data processing section for more information): 
</p>
<br>
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1QCsvRvtE-pR0ixUSuaaE8f1pSqGAHq7c" alt="annotate" style="width: 800px;"/>
</p>



<p align="justify"> 
The following are the rules I stuck to in order to ensure the quality of the data annotation process:
</p>
<ul>
  <li><b><i>Make sure the bounding box covers the whole body of each penguin (including the beak, wings, etc.)</i></b></li>
  <li><b><i>If the image contains many penguins, and some of the penguins are too small/blurry/not distinctly visible, just annotate the clear penguins that stand in the front *</i></b></li>
  <li><b><i>If only the head of the penguin can be seen, make sure to annotate the head</i></b></li>
  <li><b><i>Annotate the penguin chick as well</i></b></li>
</ul>
<p align="justify"> 
<i>I ignored penguins that are too small, blurry or not distinctly visible because, although a human can tell that these "things" are emperor or king penguins, a model that learns from many such examples may overfit or detect penguins incorrectly (for example, by treating any similar blurry object as a penguin, which is bad for the general case). This does not mean that a computer cannot detect blurry penguins: computer vision works at the pixel level while human vision does not, so a computer can sometimes outperform a human in object detection. If the application were used mainly for counting the penguins in an image, then I should label every penguin that I can see.</i>
</p>
<p align="justify"> 
<i>Since the main purpose is detection and classification, I chose to strike a balance by ignoring the blurry penguins in the background, as shown below (see the arrows that point at the blurry areas):</i>
</p>

<br>
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1BJlFHlH9bbKqSUfSeO95rQmryQTCz3fI" alt="King" style="width: 530px;"/>
<img src="https://drive.google.com/uc?id=1KdyxvmciJPg7Ngux0vV7BIVrDgmRnNtJ" alt="Emperor" style="width: 400px;"/>
</p>
<br><br>
<p align="justify"> 
Here are some example images (after annotation but before processing):
</p>


<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1m8YcnxmP0P9LJas-WRJAgeclAzC0f_pp" alt="King" style="width: 350px;"/>
<img src="https://drive.google.com/uc?id=1YmHm8Or-YYH048hhOJ1Ww6Q4JYeROwR_" alt="Emperor" style="width: 400px;"/>
<img src="https://drive.google.com/uc?id=17uFFm1BjcBu4qWKhXCwT0gzWvZo7bug-" alt="Chinstrap" style="width: 230px;"/>
</p>
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1eHe4QgUJ1AKKIt8OmsS0ThxI0FCKw7i0" alt="Emperor" style="width: 350px;"/>
</p>


# ***Data Processing***

<p align="justify"> 
In this section, I will introduce how I identified whether data processing is needed, and how I applied the relevant pre-processing techniques using the <a href="https://roboflow.com/"><i>Roboflow</i></a> tool.
</p>


<p align="justify"> 
Data preprocessing is a crucial phase in the machine learning process since the quality of the data and the information that can be extracted from it directly influence how well our model can learn. For this reason, it is crucial that we preprocess the data before introducing it to the model.
</p>
<p align="justify"> 
For this project, we can make use of the <a href="https://roboflow.com/"><i>Roboflow</i></a> tool to help us process the data. We can see that, between the train/test split and dataset generation, Roboflow provides two further steps: data preprocessing and augmentation.
</p>
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1RCqsOdgtTOdr21QjUjg1rPWICpz8iSYF" alt="processing" style="width: 400px;"/>
</p>


## ***Data preprocessing***

<p align="justify"> 
The first step is data preprocessing (3), which can decrease training time and increase performance by applying image transformations to all images in the dataset. Roboflow already offers a few options for data preprocessing, so let's see whether they give us some ideas.
</p> 
<p align="justify"> 
Here, we can see a few options such as Auto-Orient, Grayscale, Filter Null, etc., but our dataset does not need all of them.
</p> 
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1c2AFNtNC1nnmXUauVtfuhzGJ3UCD0vlZ" alt="options" style="width: 400px;"/>
</p>


<h4><b><i>Step 1:</i></b></h4>
<p align="justify"> 
 The images I collected do not all have the same size (resolution). Although some models such as YOLOv5 resize images automatically, it is still good practice to resize all images to the same size. Thus, to help the model learn better, the Resize option should be chosen as such: 
</p> 
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=11u49exFv0JEd69Gm77r11gGCcBRwsR30" alt="resize" style="width: 400px;"/>
</p>
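
<p align="justify"> 
When an image is resized, its bounding boxes have to be scaled in the same way, which Roboflow takes care of when generating the dataset. The sketch below only illustrates the idea. It assumes that the image is stretched to 640 x 640 and that boxes are stored as pixel corner coordinates; the function name <code>resize_box</code> and the box format are assumptions for illustration.
</p>

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def resize_box(box: Box, old_size: Tuple[int, int],
               new_size: Tuple[int, int] = (640, 640)) -> Box:
    """Scale a bounding box when its image is stretched to a new size.

    Both sizes are given as (width, height).
    """
    old_w, old_h = old_size
    new_w, new_h = new_size
    x_min, y_min, x_max, y_max = box
    # Horizontal coordinates scale with the width, vertical ones with the height.
    return (x_min * new_w / old_w, y_min * new_h / old_h,
            x_max * new_w / old_w, y_max * new_h / old_h)


# A box in a 1280x960 image, after stretching the image to 640x640.
print(resize_box((100, 240, 500, 720), old_size=(1280, 960)))
# (50.0, 160.0, 250.0, 480.0)
```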

<h4><b><i>Step 2:</i></b></h4>
<p align="justify"> 
Usually, an image is captured with metadata that specifies how it should be displayed in relation to how the pixels are arranged on disc. This directive, which is stored in the EXIF orientation field, expedites the image encoding process at the time of capture, allowing cameras to efficiently sample data from their sensors without unwelcome artefacts.
</p> 
<p align="justify"> 
This means that most cameras store images' pixels exactly the same whether the camera is oriented in landscape or portrait mode. They just flip a bit to signal to the viewer whether to display the pixels as-is or to rotate them by 90 or 180 degrees when displaying the image.
</p> 
<p align="justify"> 
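Unfortunately, this can cause issues if the application displaying the images is unaware of the metadata and naively displays the image without respecting its EXIF orientation. The same applies to training: if the pixels are read without applying the orientation, the bounding boxes drawn on the displayed image no longer line up with the penguins. Thus, to help the model learn better, the Auto-Orient option should always be chosen as such: 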
</p> 
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1DLvFT1cED3_H53mxk9L4xeL7Wb4SW9h9" alt="auto orient" style="width: 400px;"/>
</p>
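
<p align="justify"> 
To make the effect of the orientation flag concrete, here is a minimal sketch that applies it to a tiny grid of pixel values. It covers only the four rotation-only EXIF orientation values (1, 3, 6 and 8). The helper name <code>apply_exif_orientation</code> is an assumption, and this is not a description of how Roboflow implements Auto-Orient. With real image files, Pillow's <code>ImageOps.exif_transpose</code> performs the same kind of correction.
</p>

```python
from typing import List

Grid = List[List[int]]  # a tiny stand-in for an image: rows of pixel values


def apply_exif_orientation(pixels: Grid, orientation: int) -> Grid:
    """Return the pixels as they should be displayed for an EXIF orientation.

    Only the rotation-only values (1, 3, 6, 8) are handled; the mirrored
    values (2, 4, 5, 7) are left out to keep the sketch short.
    """
    if orientation == 1:   # already upright
        return [row[:] for row in pixels]
    if orientation == 3:   # rotate 180 degrees
        return [row[::-1] for row in pixels[::-1]]
    if orientation == 6:   # rotate 90 degrees clockwise
        return [list(row) for row in zip(*pixels[::-1])]
    if orientation == 8:   # rotate 90 degrees counter-clockwise
        return [list(row) for row in zip(*pixels)][::-1]
    raise ValueError(f"unsupported EXIF orientation: {orientation}")


stored = [[1, 2],
          [3, 4]]
print(apply_exif_orientation(stored, 6))  # [[3, 1], [4, 2]]
```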

<h4><b><i>Step 3:</i></b></h4>
<p align="justify"> 
During data annotation, I used the scientific name of each type of penguin, which is hard for general users to understand. Thus, to help users know what the species are when the model detects objects, the Modify Classes option should be chosen as such: 
</p> 
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1rK77selPJZ6TqVKK2iikr3O7sWFwAd4R" alt="modify class" style="width: 450px;"/>
</p>
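
<p align="justify"> 
The renaming itself is just a lookup from one label to another. The sketch below shows the idea in code. The exact label strings stored in the Roboflow project are not shown in this notebook, so the keys and values here are assumptions based on the species list in the introduction, and <code>to_common_name</code> is a hypothetical helper.
</p>

```python
# Mapping from the scientific names used while annotating to the
# common names shown to users (label strings are illustrative assumptions).
CLASS_NAME_MAP = {
    "Aptenodytes Forsteri": "Emperor Penguin",
    "Aptenodytes Patagonicus": "King Penguin",
    "Pygoscelis Antarcticus": "Chinstrap Penguin",
}


def to_common_name(label: str) -> str:
    """Return the common name for a label, or the label itself if unmapped."""
    return CLASS_NAME_MAP.get(label, label)


print(to_common_name("Aptenodytes Forsteri"))  # Emperor Penguin
```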


## ***Data augmentation***

<p align="justify"> 
The second step here is data augmentation (4). Augmentation applies transforms to the existing images to create new variations and increase the number of images in our dataset. This ultimately makes models more accurate across a broader range of use cases. Roboflow already offers a few options for data augmentation:
</p> 
<p align="justify"> 
Here, we can see a few options such as Flip and Brightness (at both image level and bounding box level), but obviously our dataset does not need all of them. For example, we do not need bounding box level augmentation, since penguins usually stand upright in the picture and do not appear upside down. Besides, options such as Blur (the pictures are usually clear) and Grayscale (the pictures are always in RGB format) should not be considered. 
</p> 
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=13KYo_b0fid9_IJP67zuQe_LA7wLXsiCi" alt="augmentation options" style="width: 400px;"/>
</p>






<h4><b><i>Step 1:</i></b></h4>
<p align="justify"> 
In a real scenario, the environment we monitor may contain some noise that affects the prediction accuracy of our model. Roboflow can add noise to the training data to help our model be more resilient to camera artifacts. Thus, the Noise option is one of the most appropriate ways to perform data augmentation. Hence I chose the Noise option and set it to "up to 3% of pixels" as such:
</p> 
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1vsCTKr7UisuO3__HQEWXJWxHg4jtmT3u" alt="noise" style="width: 400px;"/>
</p>
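
<p align="justify"> 
To give an idea of what "up to 3% of pixels" means, here is a minimal sketch. The exact noise that Roboflow applies is not described in this notebook, so the sketch assumes simple salt-and-pepper noise on a grayscale grid; the function name <code>add_salt_and_pepper</code> and its parameters are assumptions for illustration.
</p>

```python
import random
from typing import List

Grid = List[List[int]]  # grayscale image as rows of 0-255 values


def add_salt_and_pepper(pixels: Grid, max_fraction: float = 0.03,
                        seed: int = 0) -> Grid:
    """Return a copy with at most max_fraction of pixels set to black or white."""
    rng = random.Random(seed)
    height, width = len(pixels), len(pixels[0])
    total = height * width
    # Pick how many pixels to corrupt: anywhere from 0 up to the cap.
    n_noisy = rng.randint(0, int(total * max_fraction))
    noisy = [row[:] for row in pixels]
    for index in rng.sample(range(total), n_noisy):
        row, col = divmod(index, width)
        noisy[row][col] = rng.choice((0, 255))
    return noisy


image = [[128] * 100 for _ in range(100)]  # flat grey 100x100 image
noisy = add_salt_and_pepper(image)
changed = sum(a != b for r1, r2 in zip(image, noisy) for a, b in zip(r1, r2))
print(changed <= 300)  # True: never more than 3% of the 10,000 pixels
```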

<h4><b><i>Step 2:</i></b></h4>
<p align="justify"> 
In a real scenario, the exposure of the photos/videos of the observed environment will vary. Roboflow can add variability to image brightness to help our model be more resilient to changes in lighting and camera settings. Thus, generating another two sets of images with different exposure is also one of the most appropriate ways to perform data augmentation. Hence I chose the Exposure option and set it to "Between -15% and +15%" as such:
</p>
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1gNhiEkCu3kCYsDTBwT9n9oJ7pl2fnzXY" alt="exposure" style="width: 400px;"/>
</p>
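
<p align="justify"> 
The idea behind the exposure setting can be sketched as follows. This is only an approximation under a stated assumption: the exposure change is modelled as a simple multiplicative brightness change on a grayscale grid, and Roboflow's actual transform may differ. The function names <code>adjust_exposure</code> and <code>random_exposure_variants</code> are hypothetical.
</p>

```python
import random
from typing import List

Grid = List[List[int]]  # grayscale image as rows of 0-255 values


def adjust_exposure(pixels: Grid, change: float) -> Grid:
    """Brighten (change > 0) or darken (change < 0) an image by a fraction."""
    factor = 1.0 + change
    # Scale every pixel and clip the result to the valid 0-255 range.
    return [[max(0, min(255, round(value * factor))) for value in row]
            for row in pixels]


def random_exposure_variants(pixels: Grid, n_variants: int = 2,
                             limit: float = 0.15, seed: int = 0) -> List[Grid]:
    """Create n_variants copies with a change drawn from [-limit, +limit]."""
    rng = random.Random(seed)
    return [adjust_exposure(pixels, rng.uniform(-limit, limit))
            for _ in range(n_variants)]


image = [[100, 200], [240, 0]]
print(adjust_exposure(image, 0.15))   # [[115, 230], [255, 0]]
print(adjust_exposure(image, -0.15))  # [[85, 170], [204, 0]]
```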

## ***Data Processing summary***

<p align="justify"> 
Here is the summary of the data processing:
</p>
<br>
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1DjpW7IlTK8v_XGB_OhoW5GrMi9v_x0JH" alt="summary" style="width: 600px;"/>
</p>

# ***Data Analysis***

In this section, I will perform some analysis of the data, including:

*   The number of samples, the number of attributes, and the image size.
*   Whether the data is balanced or not.
*   Whether there are any missing values.
*   The challenges of learning with this data.


## ***Sample size/train-test split***

<p align="justify"> 
After data collection and processing, there are <b>518</b> images (without augmentation) in the dataset, and the train/validation/test split I use is <b>65% for training, 20% for validation, and 15% for testing</b> (see the figure below). 
</p>
<br>
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1xi6ORaP-AgIuJiNNROgv5mRmn5ZzKHuV" alt="balance" style="width: 700px;"/>
</p>
<br>


<p align="justify"> 
However, in the data processing phase, I performed data augmentation on the training set (generating new images with different exposure and/or noise), so the sample size becomes <b>1188</b>. Hence the actual train/validation/test split is <b>1005 images for training, 103 images for validation, and 80 images for testing</b> (see the figure below).
</p>
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1BASQ9SJ5uAZ53AT99kartFdh9SKwY6kH" alt="balance" style="width: 700px;"/>
</p>
<br>
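<p align="justify"> 
The numbers above can be checked with a few lines of arithmetic. The sketch below uses only the figures reported in this section. The one assumption, marked in the code, is that each original training image produces three images in the generated dataset, which is what makes the reported totals add up.
</p>

```python
# Numbers reported by Roboflow for this dataset.
total_images = 518
val_images = 103
test_images = 80

# Whatever is not used for validation or testing is used for training.
train_images = total_images - val_images - test_images

# Assumption: each training image yields 3 images in the generated
# dataset, which turns the original training images into the reported 1005.
outputs_per_training_image = 3
augmented_train_images = train_images * outputs_per_training_image

final_dataset_size = augmented_train_images + val_images + test_images

print(train_images)            # 335
print(augmented_train_images)  # 1005
print(final_dataset_size)      # 1188
print(round(100 * train_images / total_images))  # 65 (percent for training)
```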
<p align="justify"> 
The size of each image, as mentioned in the previous section, is <b>640</b> pixels after resizing.
</p>
<br>

## ***Class balance***

<p align="justify"> 
To find out whether the classes in our data are balanced, I made use of the <b><i>dataset health check</i></b> function of Roboflow. This functionality provides a lot of useful information about my dataset.
</p>
<p align="justify"> 
My dataset originally has 518 images (before augmentation) and 2085 annotations (approximately 4 penguins per image). Besides, the dataset has no missing values or annotations, and there are 0 null examples.
</p>
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1WkqW_w0nneBFTxJBrOcpJb3JwUkMr3Zi" alt="balance" style="width: 700px;"/>
</p>
<p align="justify"> 
As we can see, the classes are not well balanced, with around 700-900 instances each for emperor and king penguins, but only 360 instances for chinstrap penguins. The reason is that the emperor penguin and the king penguin both belong to Aptenodytes (great penguins) and look very similar. The only differences are the patterns on their heads and their height. The chinstrap penguin, on the other hand, looks obviously different from the other two. Thus, to help the model learn more about emperor and king penguins, they have more instances than the chinstrap penguin. In this way, although the classes are not perfectly balanced, the performance of the model should not be noticeably affected. 
</p>
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1cxgNeoJ06TnfYemOZvTptMYeY2aVhIT-" alt="balance" style="width: 700px;"/>
</p>
<br>
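<p align="justify"> 
Roboflow's health check reports these counts directly, but they could also be recomputed from exported labels. The sketch below assumes YOLO-format label text (one object per line, starting with the class id) and a hypothetical class order. The three label strings are made-up examples, not data from this project, and <code>count_instances</code> is an illustrative helper.
</p>

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_instances(label_files: Iterable[str],
                    class_names: List[str]) -> Dict[str, int]:
    """Count annotated objects per class from YOLO-format label text.

    Each string is the content of one label file; each non-empty line is
    "<class_id> <x_center> <y_center> <width> <height>".
    """
    counts: Counter = Counter()
    for content in label_files:
        for line in content.splitlines():
            if line.strip():
                class_id = int(line.split()[0])
                counts[class_names[class_id]] += 1
    return {name: counts[name] for name in class_names}


names = ["Chinstrap Penguin", "Emperor Penguin", "King Penguin"]  # assumed order
labels = [
    "1 0.50 0.50 0.20 0.40\n1 0.25 0.55 0.18 0.35\n",  # image with 2 emperors
    "0 0.40 0.60 0.30 0.50\n",                          # image with 1 chinstrap
    "2 0.70 0.45 0.22 0.41\n1 0.30 0.50 0.20 0.38\n",  # 1 king and 1 emperor
]
counts = count_instances(labels, names)
print(counts)  # {'Chinstrap Penguin': 1, 'Emperor Penguin': 3, 'King Penguin': 1}
```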
<p align="justify"> 
In addition, in the images we collected, the group size of chinstrap penguins is a lot smaller than that of great penguins. This is an interesting fact, because there are around 3 million chinstrap penguins on Earth, but only 570 thousand emperor penguins living in Antarctica.
</p>


## ***Histogram of Object Count by Image***

<p align="justify"> 
The following three images roughly describe the distribution of the object count (number of instances) per image for each type of penguin. We can see from the histograms that images containing only 1-3 penguins and images containing many penguins (10 or more) are both included in the dataset. This helps the model learn better and perform better when it is used in real-life scenarios.
</p>
<br>
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1TY87cXz6cIRbLZikA5glg0zXCKmsXr0B" alt="hist-chinstrap" style="width: 600px;"/>
</p>

<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1xxL4fj7LO9keNdyrzJOrpNF4mgQyqkzN" alt="hist-king" style="width: 600px;"/>
</p>

<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1Z_O7dVIe2RicQ-eIccV8_HTajFEJTTxx" alt="emperor" style="width: 600px;"/>
</p>

## ***Challenges for learning with this data***

<p align="justify"> 
Through the above data analysis, we can see that the dataset is appropriate for this task. However, there are still several challenges for learning with this data.
</p>


<h4><b><i>Challenge 1:</i></b></h4>
<p align="justify"> 
Due to time and scope restrictions, the number of samples is still too small to train a very accurate model. When we apply the model in a real-life scenario, it may not be able to detect all the penguins in an image/video.
</p>


<h4><b><i>Challenge 2:</i></b></h4>
<p align="justify"> 
Some images contain too many penguins. As I mentioned in the data annotation process, I did not annotate all the penguins in some images if the penguins at the back are too hard to see or the area is too blurry, like this: 
</p>
<p style="text-align:center;">
<img src="https://drive.google.com/uc?id=1BJlFHlH9bbKqSUfSeO95rQmryQTCz3fI" alt="King" style="width: 400px;"/>
</p>
<p align="justify"> 
This means the model may not be able to identify all the penguins in an image/video if there are too many of them (say more than 40). This issue can also be seen in the object count histograms in the section above (we don't have enough images with many penguins in the dataset).
</p>

<h4><b><i>Challenge 3:</i></b></h4>
<p align="justify"> 
As mentioned before, emperor penguins and king penguins look very similar. Although these two types of penguin have more instances in the dataset than the chinstrap penguin, it is still a challenge for the model to learn how to distinguish them.
</p>

# ***Model Development***

## ***Model selection motivation***

## ***Preparation before training models***

In this section, there are **three** models we need to develop:


1.   YOLOv5
2.   Detectron2-Faster RCNN
3.   YOLOv7


Now we need to install ***Roboflow*** to load the data (in different formats).

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
version = 9

## ***YOLOv5 Model***

In [None]:
%cd /content
!git clone https://github.com/ultralytics/yolov5  # clone yolov5
%cd yolov5
%pip install -r requirements.txt  # install

import utils
display = utils.notebook_init() # check

YOLOv5 🚀 v6.2-99-g3cd66b1 Python-3.7.13 torch-1.12.1+cu113 CUDA:0 (Tesla T4, 15110MiB)


Setup complete ✅ (2 CPUs, 12.7 GB RAM, 38.8/78.2 GB disk)


In [None]:
!pip install roboflow

from roboflow import Roboflow
rf = Roboflow(api_key="pCjhgO6AaDWtsuKCrLgf")
project = rf.workspace("utas").project("kit315-dequm")
dataset = project.version(version).download("yolov5")

In [None]:
%cd /content/yolov5
!python train.py --batch 16 --epochs 50 --data KIT315-9/data.yaml --weights yolov5m6.pt --cache --cfg yolov5m6.yaml

In [None]:
%load_ext tensorboard
%tensorboard --logdir runs/train

In [None]:
%cd /content/yolov5
!python detect.py --weights ./runs/train/exp/weights/best.pt  --conf 0.3 --source /content/drive/MyDrive/KIT315_Project/Image_demo/penguin.mp4

In [None]:
%cp ./runs/train/exp/weights/best.pt /content/drive/My\ Drive/models/yolov5_tutorial_model

## ***Detectron2-Faster RCNN Model***

In [None]:
!python -m pip install pyyaml==5.1
!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

# !pip install -U torch==1.5 torchvision==0.6 -f https://download.pytorch.org/whl/cu101/torch_stable.html 
# !pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
# import torch, torchvision
# print(torch.__version__, torch.cuda.is_available())
# !gcc --version

# !pip install detectron2==0.1.3 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html

In [None]:
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import numpy as np
import cv2
import random
from google.colab.patches import cv2_imshow

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
from detectron2.data.catalog import DatasetCatalog

In [None]:
!pip install roboflow

from roboflow import Roboflow
rf = Roboflow(api_key="pCjhgO6AaDWtsuKCrLgf")
project = rf.workspace("utas").project("kit315-dequm")
dataset = project.version(version).download("coco")

In [None]:
!mv /content/KIT315-9 /content/KIT315

In [None]:
from detectron2.data.datasets import register_coco_instances
register_coco_instances("my_dataset_train", {}, "/content/KIT315/train/_annotations.coco.json", "/content/KIT315/train")
register_coco_instances("my_dataset_val", {}, "/content/KIT315/valid/_annotations.coco.json", "/content/KIT315/valid")
register_coco_instances("my_dataset_test", {}, "/content/KIT315/test/_annotations.coco.json", "/content/KIT315/test")

In [None]:
#visualize training data
my_dataset_train_metadata = MetadataCatalog.get("my_dataset_train")
dataset_dicts = DatasetCatalog.get("my_dataset_train")

import random
from detectron2.utils.visualizer import Visualizer

for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=my_dataset_train_metadata, scale=0.5)
    vis = visualizer.draw_dataset_dict(d)
    cv2_imshow(vis.get_image()[:, :, ::-1])

In [None]:
import os

from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator


class CocoTrainer(DefaultTrainer):
    """DefaultTrainer subclass that evaluates with COCO metrics."""

    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        # Write evaluation results to ./coco_eval unless a folder is given.
        if output_folder is None:
            os.makedirs("coco_eval", exist_ok=True)
            output_folder = "coco_eval"

        return COCOEvaluator(dataset_name, cfg, False, output_folder)

In [None]:
#from .detectron2.tools.train_net import Trainer
#from detectron2.engine import DefaultTrainer
# select from modelzoo here: https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md#coco-object-detection-baselines

from detectron2.config import get_cfg
#from detectron2.evaluation.coco_evaluation import COCOEvaluator
import os

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml"))
# faster_rcnn_R_101_FPN_3x.yaml
cfg.DATASETS.TRAIN = ("my_dataset_train",)
cfg.DATASETS.TEST = ("my_dataset_val",)

cfg.DATALOADER.NUM_WORKERS = 2 # Max 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 4 # This is the real "batch size" commonly known to deep learning people
cfg.SOLVER.BASE_LR = 0.001


cfg.SOLVER.MAX_ITER = 1200 #adjust up if val mAP is still rising, adjust down if overfit
# cfg.SOLVER.STEPS = (1000, 1500)
# cfg.SOLVER.GAMMA = 0.05
cfg.SOLVER.STEPS = [] # do not decay learning rate


cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4 #your number of classes + 1

cfg.TEST.EVAL_PERIOD = 500

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = CocoTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

In [None]:
%load_ext tensorboard
%tensorboard --logdir output

In [None]:
# Evaluate the trained model on the held-out test set
from detectron2.data import DatasetCatalog, MetadataCatalog, build_detection_test_loader
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.3
predictor = DefaultPredictor(cfg)
evaluator = COCOEvaluator("my_dataset_test", cfg, False, output_dir="./output/")
test_loader = build_detection_test_loader(cfg, "my_dataset_test")
inference_on_dataset(trainer.model, test_loader, evaluator)

In [None]:
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.DATASETS.TEST = ("my_dataset_test", )
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.3   # set the testing threshold for this model
predictor = DefaultPredictor(cfg)
test_metadata = MetadataCatalog.get("my_dataset_test")

In [None]:
from detectron2.utils.visualizer import ColorMode
import glob

for imageName in glob.glob('/content/KIT315/test/*jpg'):
  im = cv2.imread(imageName)
  outputs = predictor(im)
  v = Visualizer(im[:, :, ::-1],
                metadata=test_metadata, 
                scale=0.8
                 )
  out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
  cv2_imshow(out.get_image()[:, :, ::-1])

## ***YOLOv7 Model***

In [1]:
!git clone https://github.com/WongKinYiu/yolov7
%cd yolov7
!pip install -r requirements.txt

Cloning into 'yolov7'...
remote: Enumerating objects: 933, done.
remote: Counting objects: 100% (88/88), done.
remote: Compressing objects: 100% (63/63), done.
remote: Total 933 (delta 31), reused 76 (delta 24), pack-reused 845
Receiving objects: 100% (933/933), 68.26 MiB | 4.96 MiB/s, done.
Resolving deltas: 100% (456/456), done.
/content/yolov7
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting thop
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Collecting jedi>=0.10
  Downloading jedi-0.18.1-py2.py3-none-any.whl (1.6 MB)
     |████████████████████████████████| 1.6 MB 37.4 MB/s 
Installing collected packages: jedi, thop
Successfully installed jedi-0.18.1 thop-0.1.1.post2209072238


In [2]:
!pip install roboflow

from roboflow import Roboflow
rf = Roboflow(api_key="pCjhgO6AaDWtsuKCrLgf")
project = rf.workspace("utas").project("kit315-dequm")
dataset = project.version(9).download("yolov7")

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting roboflow
  Downloading roboflow-0.2.14.tar.gz (18 kB)
Collecting certifi==2021.5.30
  Downloading certifi-2021.5.30-py2.py3-none-any.whl (145 kB)
     |████████████████████████████████| 145 kB 33.4 MB/s 
Collecting chardet==4.0.0
  Downloading chardet-4.0.0-py2.py3-none-any.whl (178 kB)
     |████████████████████████████████| 178 kB 63.2 MB/s 
Collecting cycler==0.10.0
  Downloading cycler-0.10.0-py2.py3-none-any.whl (6.5 kB)
Collecting kiwisolver==1.3.1
  Downloading kiwisolver-1.3.1-cp37-cp37m-manylinux1_x86_64.whl (1.1 MB)
     |████████████████████████████████| 1.1 MB 59.6 MB/s 
Collecting pyparsing==2.4.7
  Downloading pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)


loading Roboflow workspace...
loading Roboflow project...
Downloading Dataset Version Zip in KIT315-9 to yolov7pytorch: 100% [107461134 / 107461134] bytes


Extracting Dataset Version Zip to KIT315-9 in yolov7pytorch:: 100%|██████████| 2388/2388 [00:01<00:00, 1524.43it/s]


In [None]:
%cd /content/yolov7
!wget "https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6e.pt"

In [3]:
%cd /content/yolov7
!python train.py --batch 16 --cfg cfg/training/yolov7.yaml --epochs 60 --data KIT315-9/data.yaml --weights 'yolov7.pt' --device 0

/content/yolov7
YOLOR 🚀 v0.1-107-g44d8ab4 torch 1.12.1+cu113 CUDA:0 (Tesla T4, 15109.75MB)

Namespace(adam=False, artifact_alias='latest', batch_size=16, bbox_interval=-1, bucket='', cache_images=False, cfg='cfg/training/yolov7.yaml', data='KIT315-9/data.yaml', device='0', entity=None, epochs=60, evolve=False, exist_ok=False, freeze=[0], global_rank=-1, hyp='data/hyp.scratch.p5.yaml', image_weights=False, img_size=[640, 640], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='exp', noautoanchor=False, nosave=False, notest=False, project='runs/train', quad=False, rect=False, resume=False, save_dir='runs/train/exp', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=16, upload_dataset=False, weights='yolov7.pt', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, wa

In [6]:
!python detect.py --weights runs/train/exp/weights/best.pt --conf 0.1 --source /content/drive/MyDrive/KIT315_Project/Image_demo/penguin.mp4

Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.1, device='', exist_ok=False, img_size=640, iou_thres=0.45, name='exp', no_trace=False, nosave=False, project='runs/detect', save_conf=False, save_txt=False, source='/content/drive/MyDrive/KIT315_Project/Image_demo/penguin.mp4', update=False, view_img=False, weights=['runs/train/exp/weights/best.pt'])
YOLOR 🚀 v0.1-107-g44d8ab4 torch 1.12.1+cu113 CUDA:0 (Tesla T4, 15109.75MB)

Fusing layers... 
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
IDetect.fuse
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Model Summary: 314 layers, 36492560 parameters, 6194944 gradients, 103.2 GFLOPS
 Convert model to Traced-model... 
 traced_script_module saved! 
 model is traced! 

video 1/1 (1/791) /content/drive/MyDrive/KIT315_Project/Image_demo/penguin.mp4: 6 Emperor Penguins, Done. (16.0ms) Inference, (1.4ms) NMS
video 1/1 (2/791) /content/drive/MyDrive/KIT315_Project/Image_d

In [None]:
!zip -r export.zip runs/detect
!zip -r export.zip runs/train/exp/weights/best.pt
!zip export.zip runs/train/exp/*

# ***Model Evaluation and Selection***

## ***Train the best model***

<p align="justify"> 
Based on the model evaluation and selection steps mentioned in previous sections, now I will train the best model - TODO, and save it for later use.
</p>

# ***Apply Model***

<p align="justify"> 
In this last section, I will apply the best model I've developed in previous steps to the new data (the samples mainly represent two different scenarios which are <b><i>images</i></b> and <b><i>videos</i></b>).
<p align="justify"> 