# 1 Learning objectives

This Jupyter notebook provides hands-on practice in deep learning for object detection in satellite images, specifically cranes in ports. The contents follow the objectives below.

* Understand the basis of deep learning
* Practice deep learning for object detection with steps of:
    * Train a detection model
    * Evaluate model performance
    * Improve model training
    * Tune a model with new datasets

# 2 Introduction to deep learning for object detection
##  2.1 What is deep learning?

### Introduction to deep learning

The term ‘Artificial Intelligence (AI)’ has been in use for more than 50 years and remains a global trend. Today, machine learning (ML) is often used interchangeably with AI because it is one of the most popular and successful branches of AI. Deep learning is a machine learning technique that uses multi-layered neural networks to mimic human-like recognition and to search for a near-optimal solution to a problem.

### Deep learning for image recognition

Deep learning has proven powerful for image recognition. Using neural networks with multiple layers (deep neural networks), these models can automatically learn features and patterns directly from raw image data, often significantly outperforming traditional image processing methods.

## 2.2 Benefits

### Application in practice

Deep learning for object detection has a wide range of practical applications, including:
* **Automated surveillance:** Enhancing security systems by accurately detecting and classifying objects in real-time.
* **Autonomous vehicles:** Enabling self-driving cars to recognize and respond to various objects on the road.
* **Medical imaging:** Assisting in the detection of anomalies in medical scans, improving diagnostic accuracy.
* **Industrial automation:** Streamlining manufacturing processes by identifying and categorizing different components.
* **Remote sensing and earth observation:** Deep learning models can analyze satellite and aerial imagery with high precision, improving environmental monitoring and disaster response. For example, they can detect deforestation, urban expansion, and climate change impacts, as well as rapidly identify areas affected by natural disasters such as floods, hurricanes, and wildfires. This facilitates timely decision-making and efficient resource allocation.

## 2.3 Outline of training a detection model

Developing a deep learning model for object detection involves several key steps:

*  **Dataset preparation:** Collecting and annotating images relevant to the detection task.
*  **Model configuration:** Configuring a network to train.
*  **Training:** Feeding the annotated dataset into the model for training.
*  **Evaluation:** Assessing the model's performance using accuracy metrics, such as Mean Average Precision (mAP).
*  **Tuning:** Refining the model training through hyperparameter tuning, data augmentation, and other techniques to achieve better performance and generality.
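
Accuracy metrics such as mAP build on the Intersection over Union (IoU) between a predicted box and a ground-truth box: a detection is typically counted as correct only when its IoU exceeds a chosen threshold. The following is a minimal sketch, assuming boxes are given as `(x_min, y_min, x_max, y_max)`. The function name `box_iou` is illustrative and is not part of MMDetection.

```python
def box_iou(box_a, box_b):
    """Return the IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes that overlap in a 5x5 region
print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

A higher IoU threshold makes the metric stricter, which is why mAP is usually reported together with the threshold that was used.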


# 3 Prerequisites

### 3.1 Preferred skillsets for the following hands-on practice

#### Recommended readings for in-depth understanding

- [Official documents of MMDetection](https://mmdetection.readthedocs.io/): General information of MMDetection.

* [MMDetection Benchmark and Model Zoo](https://mmdetection.readthedocs.io/en/latest/model_zoo.html): Theoretical background of the models and methods.

### 3.2 System requirements

#### Hardware recommendation

* Processor: A modern multi-core processor, such as an Intel Core i7 (8th generation or newer) or an AMD Ryzen 7 (3rd generation or newer)
* RAM: 16 GiB
* GPU: NVIDIA GeForce GTX 1070 or a more powerful GPU with at least 4 GB of VRAM, preferably 8 GB.
* NVIDIA driver version: 510 or later

#### Software Installation

**Please skip this section for the installation if you use GLODAL's JupyterHub environment.**

* [Anaconda/ Miniconda](https://docs.anaconda.com/miniconda/miniconda-install/)
* [Install JupyterLab](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html)
* [Install git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)

This notebook demonstrates the steps to fine-tune a model for crane detection. We start by setting up the environment and preparing the dataset, and then proceed to train and evaluate the model. Working with the proper versions of packages and libraries is essential. The environment can be set up with the following commands. First, we create the environment `craneDetection`, where all required packages will be installed.

### Before Installing 
Make sure the current directory is the home directory by running `%cd` without arguments.

```
%cd
```

#### Installing Required Packages
First, ensure that you have Miniconda installed. Then, create a new Conda environment and install the necessary libraries:


```bash
# Create a new conda environment
conda create -n craneDetection python==3.10.12

# Activate the newly created environment
conda activate craneDetection

# Install PyTorch and CUDA
conda install -n base -c conda-forge mamba
mamba install -c conda-forge cudatoolkit=11.8 cudnn
pip install torch==2.1.1 torchvision==0.16.1 --index-url https://download.pytorch.org/whl/cu118

# Install MMDetection dependencies
pip install -U openmim==0.3.9
mim install mmengine==0.9.1
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.1.0/index.html

# Install additional packages
pip install labelme
pip install -U labelme2coco

# Clone the mmdetection repository and install it
conda install anaconda::git
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -e .

# Install ipykernel
conda install anaconda::ipykernel
python -m ipykernel install --user --name craneDetection --display-name "Crane Detection"
```

### 3.3 Custom Loss and Hook

As of August 2024, the current version of **MMDetection (3.3.0)** does not report validation loss during training, so a custom hook is required for this purpose. This needs minor adjustments to libraries and frameworks, including changes to installed packages such as **MMDetection** and **MMEngine**.

- Custom loss functions are essential for training models in tasks like object detection. MMDetection allows tailored loss functions that address specific needs, enhancing model performance and accuracy. [This paper gives a useful theoretical overview of loss functions in deep learning](https://arxiv.org/abs/2307.02694).
- A logger hook is a vital tool for monitoring and recording the training process. It helps track various metrics, including custom losses, enabling better analysis and debugging of the model's performance. Implementing a custom logger hook ensures that all relevant information is captured during training.

To apply the **Custom Hook**, the **MMEngine** library and the **MMDetection** folder must be updated.

* 1. Locate the **MMEngine** path. Since this folder is part of an installed library, its location is not obvious. To find it, use the `%pip show mmengine` command.

    After running the `%pip show mmengine` command, the output should be similar to this:
    ```bash
        Name: mmengine
        Version: 0.9.1
        Summary: Engine of OpenMMLab projects
        Home-page: https://github.com/open-mmlab/mmengine
        Author: MMEngine Authors
        Author-email: openmmlab@gmail.com
        License: UNKNOWN
        Location: /opt/conda/lib/python3.10/site-packages
        Requires: addict, matplotlib, numpy, opencv-python, pyyaml, rich, termcolor, yapf
        Required-by: mmcv
        Note: you may need to restart the kernel to use updated packages.
    ```

    Now, locate the ***Location*** line in the output:
    ```bash
    Location: /opt/conda/lib/python3.10/site-packages
    ```
    Copy the location path, for example:
    ```bash
    /opt/conda/lib/python3.10/site-packages
    ```



    **Note**: The % symbol marks a notebook magic command, so the command should be run in a notebook cell. Use `pip show mmengine` in a terminal and `%pip show mmengine` in a notebook cell.


In [None]:
%pip show mmengine
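
As an alternative to reading the path from the `pip show` output, the package directory can also be looked up from Python. This is a minimal sketch using only the standard library. The helper name `find_package_dir` is an assumption for illustration. It returns the package folder itself (for example `.../site-packages/mmengine`), whose parent directory corresponds to the ***Location*** line.

```python
import importlib.util
from pathlib import Path


def find_package_dir(package_name):
    """Return the directory of an installed top-level package, or None if not found."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py, so its parent is the package folder
    return Path(spec.origin).parent


# Prints None when mmengine is not installed in the active environment
print(find_package_dir("mmengine"))
```

If the result is not `None`, `find_package_dir("mmengine").parent` can be used as the value of `source_path` in the next step.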

* 2. Once the location is provided, copy the location path and paste it into the variable ***source_path***. This cell will then copy the files that need to be updated. The files to be copied are:
    * ***runtime_info_hook.py***
    * ***logger_hook.py***

    **Note**: The libraries os and shutil are used for joining paths and copying files.

In [None]:
import shutil
import os

# Paste the location obtained from the `%pip show mmengine` command into this variable
source_path = "/opt/conda/lib/python3.10/site-packages" 
destination_path = "./files"  # Create a folder for the files to be copied into


os.makedirs(destination_path, exist_ok=True)  # Create the destination directory if it doesn't exist
runtime_info_hook_path = os.path.join(source_path, 'mmengine', 'hooks', 'runtime_info_hook.py') # Define the full path to the runtime_info_hook.py file
logger_hook_path = os.path.join(source_path, 'mmengine', 'hooks', 'logger_hook.py') # Define the full path to the logger_hook.py file

shutil.copy(runtime_info_hook_path, os.path.join(destination_path, 'runtime_info_hook.py')) # Copy runtime_info_hook.py from the source path to the destination folder
shutil.copy(logger_hook_path, os.path.join(destination_path, 'logger_hook.py')) # Copy logger_hook.py from the source path to the destination folder
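
Because the copied files are edited in the following steps and later written back into the installed package, it may be worth keeping untouched backups first. The sketch below is one possible way to do this. The helper name `backup_file` and the `.orig` suffix are assumptions, not part of MMEngine, and the demonstration uses a temporary file so that it runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def backup_file(path, suffix=".orig"):
    """Copy `path` to `path + suffix` unless that backup already exists."""
    src = Path(path)
    dst = src.with_name(src.name + suffix)
    if not dst.exists():  # never overwrite an earlier backup
        shutil.copy2(src, dst)
    return dst


# Demonstration on a temporary stand-in file
tmp_dir = Path(tempfile.mkdtemp())
demo = tmp_dir / "logger_hook.py"
demo.write_text("# original content\n")
print(backup_file(demo).name)
```

For the real files, the same call could be applied to `./files/runtime_info_hook.py` and `./files/logger_hook.py` before editing them.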


3. Inside the folder "files," there should be 2 files: ***runtime_info_hook.py*** and ***logger_hook.py***.

    * Open ***runtime_info_hook.py***. Inside this file, locate the class definition "**class RuntimeInfoHook(Hook)**":
    ```python
    @HOOKS.register_module()
    class RuntimeInfoHook(Hook):
        """A hook that updates runtime information into message hub.

        E.g. ``epoch``, ``iter``, ``max_epochs``, and ``max_iters`` for the
        training state. Components that cannot access the runner can get runtime
        information through the message hub.
        """

        priority = 'VERY_HIGH'
    ```

    * Inside this class, find the **before_val** function. It should be located roughly between lines 120 and 140 and should look like this:
    ```python
    def before_val(self, runner) -> None:
        self.last_loop_stage = runner.message_hub.get_info('loop_stage')
        runner.message_hub.update_info('loop_stage', 'val')
    ```

    * Below this function, add the following **custom function**:
    ```python
    ## ADD THIS CUSTOM FUNCTION
    def after_test_iter(self,
                        runner,
                        batch_idx: int,
                        data_batch: DATA_BATCH = None,
                        outputs: Optional[dict] = None) -> None:
        if outputs is not None:
            #print(outputs)
            try:
                for key, value in outputs.items():
                    runner.message_hub.update_scalar(f'test/{key}', value)
            except Exception as e:
                pass
    ```

    * The ***runtime_info_hook.py*** file should now look like this:
    ```python
    def before_val(self, runner) -> None:
        self.last_loop_stage = runner.message_hub.get_info('loop_stage')
        runner.message_hub.update_info('loop_stage', 'val')
        
    def after_test_iter(self,
                        runner,
                        batch_idx: int,
                        data_batch: DATA_BATCH = None,
                        outputs: Optional[dict] = None) -> None:
        if outputs is not None:
            #print(outputs)
            try:
                for key, value in outputs.items():
                    runner.message_hub.update_scalar(f'test/{key}', value)
            except Exception as e:
                pass
    
    def after_val_epoch(self,
                        runner,
                        metrics: Optional[Dict[str, float]] = None) -> None:
        """All subclasses should override this method, if they need any
        operations after each validation epoch.

        Args:
            runner (Runner): The runner of the validation process.
            metrics (Dict[str, float], optional): Evaluation results of all
                metrics on validation dataset. The keys are the names of the
                metrics, and the values are corresponding results.
        """
        if metrics is not None:
            for key, value in metrics.items():
                if _is_scalar(value):
                    runner.message_hub.update_scalar(f'val/{key}', value)
                else:
                    runner.message_hub.update_info(f'val/{key}', value)
    ```

    * Now that ***runtime_info_hook.py*** is updated, the next step is to update ***logger_hook.py***.
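
Manual edits can easily break indentation, so a quick check that the edited file still parses and that the class contains the new method may save debugging time later. This is a minimal sketch based on the standard `ast` module. The helper name `class_has_method` is hypothetical, and the inline `sample` string only stands in for the real file contents.

```python
import ast


def class_has_method(source, class_name, method_name):
    """Parse Python source text and report whether a class defines a given method."""
    tree = ast.parse(source)  # raises SyntaxError if the edit broke the file
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return any(
                isinstance(item, ast.FunctionDef) and item.name == method_name
                for item in node.body
            )
    return False


# Inline stand-in for the edited file
sample = '''
class RuntimeInfoHook:
    def before_val(self, runner):
        pass

    def after_test_iter(self, runner, batch_idx, data_batch=None, outputs=None):
        pass
'''
print(class_has_method(sample, "RuntimeInfoHook", "after_test_iter"))
```

For the real file, the text could be read with `Path("./files/runtime_info_hook.py").read_text()` and passed as `source`.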
    



4. Next, open the file ***logger_hook.py*** located in the **files** folder.

    * Inside the file, locate the class **class LoggerHook(Hook)**, which should look like this:
    ```python
    @HOOKS.register_module()
    class LoggerHook(Hook):
        """Collect logs from different components of ``Runner`` and write them to
        terminal, JSON file, tensorboard and wandb .etc.

        ``LoggerHook`` is used to record logs formatted by ``LogProcessor`` during
        training/validation/testing phase. It is used to control following
        behaviors:
    ```

    * Next, find the function named **after_test_iter**, which should be between lines 220 and 240.
    ```python
    def after_test_iter(self,
                        runner,
                        batch_idx: int,
                        data_batch: DATA_BATCH = None,
                        outputs: Optional[Sequence] = None) -> None:
        """Record logs after testing iteration.

        Args:
            runner (Runner): The runner of the testing process.
            batch_idx (int): The index of the current batch in the test loop.
            data_batch (dict or tuple or list, optional): Data from dataloader.
            outputs (sequence, optional): Outputs from model.
        """
        if self.every_n_inner_iters(batch_idx, self.interval):
            _, log_str = runner.log_processor.get_log_after_iter(
                runner, batch_idx, 'test')
            runner.logger.info(log_str)
    ```

    * Instead of adding a custom function, replace the existing **after_test_iter** function with the following updated version:
    ```python
    ###REPLACE THE after_test_iter function
    def after_test_iter(self,
                            runner,
                            batch_idx: int,
                            data_batch: DATA_BATCH = None,
                            outputs: Optional[dict] = None) -> None:
            """Record logs after testing iteration.

            Args:
                runner (Runner): The runner of the testing process.
                batch_idx (int): The index of the current batch in the test loop.
                data_batch (dict tuple or list, optional): Data from dataloader.
                outputs (dict, optional): Outputs from model.
            """

            runner.logger.setLevel(logging.INFO)
            
            if self.every_n_train_iters(
                    runner, self.interval_exp_name) or (self.end_of_epoch(
                        runner.test_dataloader, batch_idx)):
                exp_info = f'Exp name: {runner.experiment_name}'
                runner.logger.info(exp_info)
            if self.every_n_inner_iters(batch_idx, self.interval):
                tag, log_str = runner.log_processor.get_log_after_iter(
                    runner, batch_idx, 'test')
            elif (self.end_of_epoch(runner.test_dataloader, batch_idx)
                and (not self.ignore_last
                    or len(runner.test_dataloader) <= self.interval)):
                tag, log_str = runner.log_processor.get_log_after_iter(
                    runner, batch_idx, 'test')
            else:
                return
            
            runner.logger.info(log_str)
            runner.logger.info(tag)
            runner.visualizer.add_scalars(
                tag, step=runner.iter + 1, file_path=self.json_log_path)
    ```

    * Now that both files have been updated, the next step is to overwrite the existing files in the **mmengine** directory using the path obtained earlier from the command ```%pip show mmengine```.
    ```bash
    "/opt/conda/lib/python3.10/site-packages" 
    ```
    Using this path, replace the old files with the updated ones inside the **mmengine** folder.

    **Note**: Overwriting files inside the **system's environment** is risky because **site-packages** is where Python packages and libraries installed via pip (or other package managers) are stored, so a mistake can break those packages for everything that uses this environment. Depending on how Python was installed, **administrative permissions** may also be required when making changes to files within the **Python packages directory**.
    

In [None]:
import shutil
import os

# Define the path to the folder containing the updated files
folder_path = "./files"
# Define the path to the mmengine directory within the system's environment
mmengine_path = "/opt/conda/lib/python3.10/site-packages" 


runtime_info_hook_path = os.path.join(mmengine_path, 'mmengine', 'hooks', 'runtime_info_hook.py') # Define the full path to the runtime_info_hook.py file
logger_hook_path = os.path.join(mmengine_path, 'mmengine', 'hooks', 'logger_hook.py') # Define the full path to the logger_hook.py file


shutil.copyfile(os.path.join(folder_path, 'runtime_info_hook.py'), runtime_info_hook_path) # Overwrite the existing runtime_info_hook.py file in mmengine with the updated version from the files folder
shutil.copyfile(os.path.join(folder_path, 'logger_hook.py'), logger_hook_path) # Overwrite the existing logger_hook.py file in mmengine with the updated version from the files folder


* After running this cell, the hook files within **mmengine** should be updated, integrating the custom hook. The next step is to update the files inside the **mmdetection** folder.

5. Next, update the files in **mmdetection** and create a new file for the custom hook.

    **Note**: If **mmdetection** has not been cloned yet or if the folder is missing (due to skipping the installation or other reasons), use the following command to clone or download **mmdetection** to avoid any errors or missing files.

    Before cloning the mmdetection repository, double-check the current path using `pwd`:
    ```bash
    pwd
    ``` 

    If the current location is inside **mmdetection**, go to the home directory using `cd`:
    ```bash
    cd
    ```

    Now proceed with cloning.


    ```python 
    !git clone https://github.com/open-mmlab/mmdetection.git --progress --verbose
    ```
    The --progress --verbose options provide detailed output during the cloning process. Running this command will clone the repository. If the folder already exists, you may see the following message:
    ```python
    fatal: destination path 'mmdetection' already exists and is not an empty directory.
    ```
    This message only indicates that the folder already exists; it won’t cause any issues.

    **Note**: When running this command in a notebook cell, the **!** symbol is required to execute it as a shell command. If you’re running the command in a terminal, omit the **!** symbol.

    * In a notebook cell:
    ```python 
    !git clone https://github.com/open-mmlab/mmdetection.git --progress --verbose
    ```

    * In a terminal:
    ```bash
    git clone https://github.com/open-mmlab/mmdetection.git
    ```


In [None]:
!git clone https://github.com/open-mmlab/mmdetection.git --progress --verbose

Running the cell should output something similar to this:
```python
Cloning into 'mmdetection'...
POST git-upload-pack (175 bytes)
POST git-upload-pack (gzip 3902 to 1906 bytes)
remote: Enumerating objects: 38023, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 38023 (delta 0), reused 1 (delta 0), pack-reused 38021 (from 1)
Receiving objects: 100% (38023/38023), 63.25 MiB | 16.75 MiB/s, done.
Resolving deltas: 100% (26223/26223), done.
Updating files: 100% (2443/2443), done.
```

Now that **mmdetection** is cloned (or if the folder already exists), the next step is to update the necessary files. This process should be quick.

6. Inside the ***mmdetection*** folder, navigate to  **mmdetection** -> **mmdet** -> **engine** -> **hooks**.

    * Create a new file in this folder called **my_hook.py**. The file should now appear as:
    ```bash
    my_hook.py
    ```

    * Since ***my_hook.py*** is currently empty, we will add a custom logger. This custom hook will call the functions that were updated in ***MMEngine*** earlier. Add the following script to the ***my_hook.py*** file:

    ```python
    from mmengine.hooks import Hook
    from mmengine.runner import Runner

    from mmdet.registry import HOOKS

    from mmengine.hooks.logger_hook import LoggerHook
    from mmengine.hooks.runtime_info_hook import RuntimeInfoHook


    @HOOKS.register_module()
    class MyHook(Hook):
        def __init__(self):
            pass

        def val_step(self, model, data, optim_wrapper):
            with optim_wrapper.optim_context(model):
                data = model.data_preprocessor(data, True)
                losses = model(**data, mode='loss')
            parsed_losses, log_vars = model.parse_losses(losses)  
            return log_vars

        def after_train_epoch(self, runner) -> None:
            model = runner.model
            model.eval()  # Set model to evaluation mode
            optim_wrapper = runner.optim_wrapper
            dataloader = runner.test_dataloader
            # Look up the patched MMEngine hooks registered on the runner
            logger = None
            runtimeinfo = None
            for hook in runner._hooks:
                if isinstance(hook, LoggerHook):
                    logger = hook
                elif isinstance(hook, RuntimeInfoHook):
                    runtimeinfo = hook

            for i, data in enumerate(dataloader):
                outputs = self.val_step(model, data, optim_wrapper)
                # Pass the losses by keyword so they reach the `outputs` argument
                if runtimeinfo is not None:
                    runtimeinfo.after_test_iter(runner, batch_idx=i, outputs=outputs)
                if logger is not None:
                    logger.after_test_iter(runner, batch_idx=i + 1, outputs=outputs)
    ```

    * Now that the custom hook file is complete, the final step is to update the initialization file, `__init__.py`.


7. Navigate to the ***mmdetection*** path: **mmdetection** -> **mmdet** -> **engine** -> **hooks**. Inside this folder, there should be a file named **__init__.py**. Open this file.

    * After opening the file, it should look like this:
    ```python
    # Copyright (c) OpenMMLab. All rights reserved.
    from .checkloss_hook import CheckInvalidLossHook
    from .mean_teacher_hook import MeanTeacherHook
    from .memory_profiler_hook import MemoryProfilerHook
    from .num_class_check_hook import NumClassCheckHook
    from .pipeline_switch_hook import PipelineSwitchHook
    from .set_epoch_info_hook import SetEpochInfoHook
    from .sync_norm_hook import SyncNormHook
    from .utils import trigger_visualization_hook
    from .visualization_hook import (DetVisualizationHook,
                                    GroundingVisualizationHook,
                                    TrackVisualizationHook)
    from .yolox_mode_switch_hook import YOLOXModeSwitchHook

    __all__ = [
        'YOLOXModeSwitchHook', 'SyncNormHook', 'CheckInvalidLossHook',
        'SetEpochInfoHook', 'MemoryProfilerHook', 'DetVisualizationHook',
        'NumClassCheckHook', 'MeanTeacherHook', 'trigger_visualization_hook',
        'PipelineSwitchHook', 'TrackVisualizationHook',
        'GroundingVisualizationHook'
    ]
    ```

    * Below the line:
    ```python
    from .yolox_mode_switch_hook import YOLOXModeSwitchHook
    ``` 

    * Add the following import statement to include the custom hook you created earlier (**my_hook.py**). This ensures that the custom hook can be called in the configuration without causing errors:
    ```python
    from .my_hook import MyHook
    ``` 

    * Next, append '**MyHook**' to the `__all__` list. This makes the custom hook available for import when the module is loaded.
    * The updated file should look like this:
    ```python
    # Copyright (c) OpenMMLab. All rights reserved.
    from .checkloss_hook import CheckInvalidLossHook
    from .mean_teacher_hook import MeanTeacherHook
    from .memory_profiler_hook import MemoryProfilerHook
    from .num_class_check_hook import NumClassCheckHook
    from .pipeline_switch_hook import PipelineSwitchHook
    from .set_epoch_info_hook import SetEpochInfoHook
    from .sync_norm_hook import SyncNormHook
    from .utils import trigger_visualization_hook
    from .visualization_hook import (DetVisualizationHook,
                                    GroundingVisualizationHook,
                                    TrackVisualizationHook)
    from .yolox_mode_switch_hook import YOLOXModeSwitchHook
    from .my_hook import MyHook

    __all__ = [
        'YOLOXModeSwitchHook', 'SyncNormHook', 'CheckInvalidLossHook',
        'SetEpochInfoHook', 'MemoryProfilerHook', 'DetVisualizationHook',
        'NumClassCheckHook', 'MeanTeacherHook', 'trigger_visualization_hook',
        'PipelineSwitchHook', 'TrackVisualizationHook',
        'GroundingVisualizationHook', 'MyHook'
    ]
    ```


    * Now, the ***Custom Hook*** is fully integrated and ready to be used in your configuration without any issues.
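
In MMEngine-based configs, additional hooks are usually listed under the `custom_hooks` key. The fragment below is a minimal sketch of how the hook might be enabled, assuming it was registered under the name `MyHook` as shown above and takes no constructor arguments. Where exactly this goes depends on your own config file.

```python
# Fragment of an MMDetection config file (configs use Python syntax).
# Assumption: MyHook is registered in mmdet's HOOKS registry and needs no arguments.
custom_hooks = [
    dict(type='MyHook'),
]
```

Since `MyHook` iterates over `runner.test_dataloader`, the config also needs a test dataloader to be defined for the hook to have data to run on.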



# 4 Preparation of training dataset
## 4.1 Introduction
###  Formats for training datasets

Model training requires training datasets, which are pairs of image data and label data. The labels should be highly accurate. Human visual interpretation of sampled image data is often employed for labeling.

![Example of training data](https://cdn.prod.website-files.com/5d7b77b063a9066d83e1209c/6349238b2269c6b312ce24ca_image10.webp)

Source: [Labeling with LabelMe: Step-by-step Guide](https://www.v7labs.com/blog/labelme-guide)

### Steps to prepare training datasets 

Here are the steps to prepare training datasets from satellite images. You can skip these steps since the training dataset for this hands-on is already provided.

1.	Locating Harbor Scene:
  *  Search for the harbor location in Google Earth and navigate to its extent.
  *  Set the eye altitude to 900 m so that the view covers the harbor location.
  *  Export the scene, including the harbor, as an image file. The guide for exporting the scene from Google Earth is well documented [here](https://glodal.sharepoint.com/:w:/s/GLODAL/EdXDpdTBwohKrJKTMi2QVEMBM2QvPrLwyH8eI6mcDM2owg?e=AqXyeG).

2.	Annotating Cranes
  *  Open the exported image in the ‘labelme’ tool
  *  Use the ‘Create Polygons’ feature to draw the borders around target objects, such as cranes, in the image
  *  Once the crane annotation is complete, label each polygon with the target name, such as “crane”
  *  Save the annotation, which will be stored in JSON format in the same folder as the image.

3.	Converting Annotations to COCO Format
  *  Use the labelme2coco package to convert the JSON annotations from labelme to the COCO format
  *  Apply the `convert` function provided by labelme2coco to perform the conversion.
	
This process ensures that the annotations are correctly labeled and converted to the standard COCO data format for further use in object detection tasks.
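
To make the conversion more concrete: LabelMe stores each annotation as a polygon (a list of `[x, y]` points under `shapes`), while COCO stores a bounding box as `[x, y, width, height]`. The sketch below shows how a box can be derived from a polygon. It is only an illustration of the idea and not the implementation used by labelme2coco. The function name and the sample shape are hypothetical.

```python
def polygon_to_coco_bbox(points):
    """Convert polygon points [[x, y], ...] to a COCO-style bbox [x, y, width, height]."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


# Hypothetical LabelMe-style shape for one crane
shape = {"label": "crane", "points": [[10, 20], [50, 20], [50, 80], [10, 80]]}
print(shape["label"], polygon_to_coco_bbox(shape["points"]))
```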


### Download dataset

To download the demonstration dataset, run `wget` as shown below. The dataset is already processed and ready to be used for training.

In [None]:
!wget -O demo_dataset.zip http://owncloud-http/owncloud/index.php/s/S38aljMHL47rax2/download #Dataset class instances train:21, test:9   

After downloading the dataset, it needs to be unzipped to fully use it for model training, validation, and testing.

In [None]:
import shutil
shutil.unpack_archive("./demo_dataset.zip", "./demo_dataset", "zip") 
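
After unpacking, a quick sanity check of a COCO annotation file can confirm that images and labels are present in the expected numbers. The following is a minimal sketch operating on an already loaded COCO-style dictionary. The helper name `summarize_coco` and the in-memory example are assumptions, and the actual file names inside `./demo_dataset` are not specified here.

```python
from collections import Counter


def summarize_coco(coco):
    """Count images and annotations per category name in a COCO-style dict."""
    id_to_name = {c["id"]: c["name"] for c in coco.get("categories", [])}
    per_category = Counter(
        id_to_name.get(a["category_id"], "unknown")
        for a in coco.get("annotations", [])
    )
    return {"images": len(coco.get("images", [])), "annotations": dict(per_category)}


# Tiny in-memory example standing in for a real annotation file
example = {
    "images": [{"id": 1, "file_name": "a.tif"}, {"id": 2, "file_name": "b.tif"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 0, "bbox": [10, 20, 40, 60]},
        {"id": 2, "image_id": 2, "category_id": 0, "bbox": [5, 5, 30, 30]},
    ],
    "categories": [{"id": 0, "name": "crane"}],
}
print(summarize_coco(example))
```

For a real file, the dictionary could be loaded with `json.load` and passed to the same function.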

**Note**: Data split and conversion are required only if the dataset has not yet been processed (split) and converted into COCO format. In this demo, the dataset has already been processed.

## 4.2 Data Split (optional)

Training datasets should be split into two groups: a training set and a validation set. The training set is used in the iterative process of model training, whereas the validation set is used to evaluate the model at every iteration. Stricter model training uses three groups: a training set, a validation set, and a test set. [See this article for a comprehensive explanation of splitting training datasets](https://mlu-explain.github.io/train-test-validation/).

**NOTE**: This hands-on uses the name "test" for what are actually validation sets. The terms validation set and test set are often mixed in practice.

Once the images are annotated using LabelMe, the `./data` folder contains .tif image files along with .json files holding the annotations. The dataset is split by random sampling: the paths of all image files are collected and shuffled, and the dataset is then divided into 70% for the training set and 30% for the validation set, so that the two sets are randomly selected and non-overlapping. The script places these sets in the `train` and `test` subfolders of the `data_converted_to_coco` folder, maintaining the original pairing between image files (.tif) and their corresponding annotation files (.json).

The cell below performs the split, after which the data is converted to [COCO format](https://cocodataset.org/#home), a popular format for training image recognition models. This step is optional because the downloaded dataset files have already been processed.

In [None]:
import os
import glob
import random
import math
import shutil

orig_path = "./data"
to_path = "./data_converted_to_coco"
os.makedirs(os.path.join(to_path, "train"), exist_ok=True)
os.makedirs(os.path.join(to_path, "test"), exist_ok=True)

path_ = glob.glob(os.path.join(orig_path, "*.tif"))

random.shuffle(path_)
split_index = math.ceil(len(path_) * 0.7)

list_A = path_[:split_index]
list_B = path_[split_index:]

for x in list_A:
    img_path = x
    json_path = x.replace(".tif", ".json")

    shutil.copy(img_path, os.path.join(to_path, "train", os.path.basename(img_path)))
    shutil.copy(json_path, os.path.join(to_path, "train", os.path.basename(json_path)))

for x in list_B:
    img_path = x
    json_path = x.replace(".tif", ".json")

    shutil.copy(img_path, os.path.join(to_path, "test", os.path.basename(img_path)))
    shutil.copy(json_path, os.path.join(to_path, "test", os.path.basename(json_path)))
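
The shuffle above gives a different split on every run. If a repeatable split is preferred, for example to compare experiments, a fixed seed can be used. This is a minimal sketch under that assumption. The helper name `split_paths` and the seed value are illustrative, and the file names are placeholders.

```python
import math
import random


def split_paths(paths, train_ratio=0.7, seed=42):
    """Shuffle paths with a fixed seed and split them into train and test lists."""
    rng = random.Random(seed)  # local generator, leaves the global random state untouched
    shuffled = sorted(paths)   # sort first so the result does not depend on glob order
    rng.shuffle(shuffled)
    split_index = math.ceil(len(shuffled) * train_ratio)
    return shuffled[:split_index], shuffled[split_index:]


# Placeholder file names standing in for the .tif images
train, test = split_paths([f"img_{i}.tif" for i in range(10)])
print(len(train), len(test))
```

Calling the function again with the same inputs and seed returns the same two lists, while changing the seed produces a different split.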

Now we convert the split data to COCO format using labelme2coco:

In [None]:
import labelme2coco

labelme2coco.convert('./data_converted_to_coco/train','./data_converted_to_coco/train.json/')
labelme2coco.convert('./data_converted_to_coco/test','./data_converted_to_coco/test.json/')

# 5 Experiments

Before proceeding with further tasks such as model training or fine-tuning, run the following cell to check that all required dependencies are correctly installed and configured. The script collects and displays the versions of important packages.

In [None]:
from mmengine.utils import get_git_hash
from mmengine.utils.dl_utils import collect_env as collect_base_env

import mmdet
import os


def collect_env():
    """Collect the information of the running environments."""
    env_info = collect_base_env()
    env_info['MMDetection'] = f'{mmdet.__version__}+{get_git_hash()[:7]}'
    return env_info


if __name__ == '__main__':
    for name, val in collect_env().items():
        print(f'{name}: {val}')

## 5.1 Configure model training

Selecting a backbone is practically crucial for model training, as it determines complexity and computing cost. [The recent versions of MMDetection provides a variety of backbones for users to experiment with](https://mmdetection.readthedocs.io/en/latest/model_zoo.html). Loading and adjusting the configuration is necessary to meet specific objectives.

In this hands-on, we use Faster R-CNN with ResNet101. This model is simpler than more recent ones, which makes it a good choice for initial practice.

The settings to configure include:

* Path to the dataset
* Model configurations 
* Training hyperparameters, such as learning rate, batch size, number of epochs, etc.
* Data augmentation
* Optimizer configurations

When the config file is edited, use the mmdetection/tools/train.py script provided by MMDetection to start training. This script takes the configuration file as input and handles the training process. During training, the script saves checkpoints at specified intervals, allowing for resuming training or evaluating the model at different stages. The training process can be monitored using command-line outputs and log files. MMDetection also supports TensorBoard for visualizing training metrics like loss and accuracy. Hyperparameters can be adjusted if necessary based on the observed training behavior.

To clone the mmdetection repository to your local machine, run the following cell:

In [None]:
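# Clone into the current working directory. A `!cd` command would not persist between
# notebook commands, and the relative paths used later (./mmdetection/...) assume this location.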
!git clone https://github.com/open-mmlab/mmdetection.git

This command will download all the files from the mmdetection repository to a directory named mmdetection on your local machine, enabling you to start using and modifying the code right away.

To load the configuration correctly within mmdetection/configs, you need to import the configuration module:

In [None]:
from mmengine.config import Config

* Loading the configuration and outputting its structure is necessary to understand the default configuration of the selected backbone and identify the variables that need to be changed.

    To find a config path, e.g., Faster R-CNN:
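One way to browse the available files is to list the config directory. This is a minimal sketch, assuming the repository was cloned into `./mmdetection`; `list_configs` is a hypothetical helper, not an MMDetection function.

```python
from pathlib import Path


def list_configs(config_dir, pattern="*.py"):
    """Return the sorted config file names in `config_dir` that match `pattern`."""
    return sorted(p.name for p in Path(config_dir).glob(pattern))


# Usage (assumes the repository was cloned into ./mmdetection):
# for name in list_configs("./mmdetection/configs/faster_rcnn", "faster-rcnn_r101*"):
#     print(name)
```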


To load the configuration from mmdetection/configs, import the configuration module as shown below. The file names indicate the models and backbones. You may choose a pre-defined model architecture from the MMDetection model zoo, such as RPN, Faster R-CNN, Mask R-CNN, RetinaNet, etc. MMDetection provides config files for various pre-trained models. You can edit the config file to match the dataset and training settings, though the default configuration usually works well.

In [None]:
from mmengine.config import Config

cfg = Config.fromfile('./mmdetection/configs/faster_rcnn/faster-rcnn_r101_fpn_1x_coco.py')
print(cfg.pretty_text)

### 5.1.1 Setup model configurations

After loading and outputting the configuration, there are three main parts that are important to check, update, and double-check:
1. Path to data files and config files.
2. Model training parameters such as batch size, number of classes, type of model, and losses. These are essential for fine-tuning, so check them before training.
3. Train, validation, and test dataloaders, including the pipeline responsible for the datasets used during training, evaluation, and testing.

Set the important keys or data in the config dictionary, such as dataset root, output model, and other relevant parameters:

In [None]:
cfg.data_root = './demo_dataset/'  # data_root means the path of the dataset that will be used.
cfg.dataset_type = 'CocoDataset'   # type of the dataset, indicating its structure. Mostly it's CocoDataset.

# Set auto scaling of learning rate parameters:
#   - base_batch_size: Batch size used as a base for scaling.
#   - enable: Flag to enable auto scaling of learning rate.
cfg.auto_scale_lr = dict(base_batch_size=16, enable=True)  # Scales the learning rate in proportion to the actual total batch size relative to base_batch_size.
cfg.backend_args = None  # Configure backend arguments.

cfg.work_dir = './model'  # Output path of the model.

print(cfg.pretty_text)

To ensure that the previous output is now updated and to reduce the risk of errors during training, double-check the keys that were updated. Here’s how you can verify the updates: 

In [None]:
print(f"Data root: {cfg.data_root}")
print(f"Dataset type: {cfg.dataset_type}")
print(f"Auto scale LR: {cfg.auto_scale_lr}")
print(f"Backend args: {cfg.backend_args}")
print(f"Work directory: {cfg.work_dir}")
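To make the effect of `auto_scale_lr` concrete, the linear scaling rule can be worked out by hand. The sketch below assumes a base learning rate of 0.02 (an assumed value; check `cfg.optim_wrapper` for the one actually in use) and a single GPU. `scaled_lr` is a hypothetical helper for illustration, not an MMEngine function.

```python
def scaled_lr(base_lr, batch_size_per_gpu, num_gpus, base_batch_size):
    """Linear scaling rule: the learning rate changes in proportion to the total batch size."""
    total_batch_size = batch_size_per_gpu * num_gpus
    return base_lr * total_batch_size / base_batch_size


# Batch size 1 on one GPU with base_batch_size=16 yields 1/16 of the base learning rate.
print(scaled_lr(base_lr=0.02, batch_size_per_gpu=1, num_gpus=1, base_batch_size=16))
```

With very small batch sizes the scaled learning rate becomes small as well, which is one reason training on a single GPU may need more epochs to converge.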

### 5.1.2 Model Setup
Choose a pre-defined model architecture from the MMDetection model zoo, such as RPN, Faster R-CNN, Mask R-CNN, RetinaNet, etc. MMDetection provides config files for various pre-trained models. Edit the config file to match the dataset and training settings.

The default configuration mostly works well, but you should pay attention to some key parameters. For example, the number of classes must match the number of classes in the dataset used. While updating the loss and other keys is not always necessary, basic knowledge of the model structure is required if changes are needed. For this example, Faster R-CNN is used, but there are diverse backbones that can be employed.

First, check the model structure to find the keys that should be updated and verify the default values to ensure the model meets your objectives.

In [None]:
print(cfg.model)  # Print the model descriptions

Here are examples of how to update some key parameters:

In [None]:
# Update backbone parameters
cfg.model['backbone']['depth'] = 101  # ResNet depth used as the Faster R-CNN backbone
cfg.model['backbone']['init_cfg']['checkpoint'] = 'torchvision://resnet101'  # Pretrained weights; use path/to/custom/pretrained.pth for custom weights
cfg.model['backbone']['init_cfg']['type'] = 'Pretrained'  # Initialize the backbone from the checkpoint above
cfg.model['backbone']['norm_cfg']['requires_grad'] = False  # Disable gradient updates for normalization layers
# Update neck parameters
cfg.model['neck']['in_channels'] = [256, 512, 1024, 2048]  # Channels of the four ResNet stage outputs (unchanged from the default)
cfg.model['neck']['out_channels'] = 256  # Output channels of each FPN level (unchanged from the default)

# Update ROI Head parameters
cfg.model['roi_head']['bbox_head']['loss_bbox']['loss_weight'] = 1.0  # Weight of the bounding box regression loss (unchanged from the default)
cfg.model['roi_head']['bbox_head']['num_classes'] = 2  # Change number of classes to 2

In this example, the number of classes is updated. The model structure is now ready to be used.

### 5.1.3 Data Loading
Properly loading and configuring the dataset ensures that the model receives the data in the right format and structure, allowing for effective learning and evaluation during the training, validation, and testing phases. Proper augmentation, batch size, and workers are necessary to ensure the process is error-free and efficient, preventing issues like insufficient memory or incorrect model predictions due to poor augmentation combinations.

 1. **Training Dataloader Setup**: This step sets the augmentation pipeline, batch size, and number of workers used for the training data.
 
    First, print the current structure of the training dataloader and its training configuration to check defaults and settings for epochs and validation intervals.
    

In [None]:
print(cfg.train_dataloader)  # Print current training dataloader structure
print(cfg.train_cfg)  # Print training configuration (epochs, validation interval)

* **Train pipeline**: The augmentations the dataset will undergo before training. Well-chosen augmentation generally improves how well the model generalizes. [See also the official documentation for details](https://mmdetection.readthedocs.io/en/latest/advanced_guides/transforms.html?highlight=train_pipeline#design-of-data-transforms-pipeline).

In [None]:
cfg.train_pipeline = [
    dict(backend_args=None, type='LoadImageFromFile'),  # Load image from file
    dict(type='LoadAnnotations', with_bbox=True),  # Load annotations with bounding boxes
    #dict(type='CachedMosaic', img_scale=(1024, 1024), pad_val=114.0),  # Cached Mosaic augmentation to reduce memory consumption
    dict(type='Resize', keep_ratio=True, scale=(1333, 800)),  # Resize images to the target size
    dict(type='RandomFlip', prob=0.5),  # Apply random horizontal flip with 50% probability
    #dict(
    #    type='CachedMixUp',
    #    img_scale=(1024, 1024),  # Cached MixUp augmentation
    #    ratio_range=(1.0, 1.0),
    #    max_cached_images=20,
    #    pad_val=(114, 114, 114)),
    dict(type='PackDetInputs'),  # Pack inputs for detection
]

Next is the **Train Dataloader** setup, updating batch size for memory efficiency, setting the dataset's training JSON file, and configuring the pipeline to avoid redundant code.

In [None]:
# Update batch size for the training dataloader
cfg.train_dataloader['batch_size'] = 1  # Set batch size to 1
cfg.train_dataloader['num_workers'] = 1

# Set dataset type
cfg.train_dataloader['dataset']['type'] = cfg.dataset_type

# Configure training epochs and validation interval
cfg.train_cfg['max_epochs'] = 100  # Maximum number of training epochs
cfg.train_cfg['val_interval'] = 100  # Interval (in epochs) for validation

# Set the annotation file for training dataset
cfg.train_dataloader['dataset']['ann_file'] = 'train.json'  # Path to training annotations

# Initialize metainfo dictionary if not present
cfg.train_dataloader['dataset']['metainfo'] = {}  
cfg.train_dataloader['dataset']['metainfo']['classes'] = ('gantry_crane', 'standby_gantry_crane')  # Set classes

# Set data root directory and image prefix
cfg.train_dataloader['dataset']['data_root'] = cfg.data_root  # Root path for dataset
cfg.train_dataloader['dataset']['data_prefix']['img'] = "train/images"  # Path to training images

# Assign the train pipeline
cfg.train_dataloader['dataset']['pipeline'] = cfg.train_pipeline  # Apply training pipeline
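A common source of training failures is a mismatch between `data_root`, `ann_file`, and `data_prefix`. The helper below is a hypothetical sketch (not an MMDetection API) that joins the annotation file and image folder onto `data_root`, as the settings above imply, and reports whether each path exists. It accepts any dict-like dataset config, for example `cfg.train_dataloader['dataset']`.

```python
import os


def check_dataset_paths(dataset_cfg):
    """Return {resolved_path: exists} for the annotation file and the image folder."""
    root = dataset_cfg.get("data_root", "")
    ann_path = os.path.join(root, dataset_cfg["ann_file"])
    img_dir = os.path.join(root, dataset_cfg["data_prefix"]["img"])
    return {ann_path: os.path.isfile(ann_path), img_dir: os.path.isdir(img_dir)}


# Usage with the configuration above:
# for path, ok in check_dataset_paths(cfg.train_dataloader['dataset']).items():
#     print(f"{'OK' if ok else 'MISSING'}: {path}")
```

Running the same check on the validation and test dataset configs before training can save a failed run caused by a wrong folder name.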

 2. **Validation and Test Dataloader Setup**: Both should be configured similarly. The validation dataloader is used during the validation phase of training, and the test dataloader is used during model evaluation, such as when running the test.py script and generating a confusion matrix.

    The reason for minimal augmentation is to match real-world images and scenarios.
      * Validation Dataloader: Print its default structure before updating.
    

In [None]:
print(cfg.val_dataloader)  # Print current validation dataloader structure

Like the training dataloader, the validation dataloader uses the default pipeline with updated dimensions to match the dataset used.

In [None]:
cfg['val_pipeline'] = cfg.test_pipeline
cfg.val_pipeline = [
    dict(type='LoadImageFromFile', backend_args=cfg.backend_args),  # Load image from file
    dict(type='Resize', scale=(1333, 800), keep_ratio=True),  # Resize to fit within 1333x800 while keeping the aspect ratio
    dict(type='LoadAnnotations', with_bbox=True),  # Load annotations with bounding boxes
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor')
    )  # Pack input metadata for detection
]

Similar to the training dataloader structure, update the JSON and classes, then print to ensure correctness.

In [None]:
# Configuration for loading validation data during training
cfg.val_dataloader['batch_size'] = 1  # Update the batch size for the validation dataloader
cfg.val_dataloader['num_workers'] = 1
cfg.val_dataloader['dataset']['type'] = cfg.dataset_type  # Set dataset type

cfg.val_dataloader['dataset']['ann_file'] = 'test.json'  # Path to validation annotations

cfg.val_dataloader['dataset']['metainfo'] = {}  # Initialize metainfo if not present
cfg.val_dataloader['dataset']['metainfo']['classes'] = ('gantry_crane', 'standby_gantry_crane')  # Set classes

cfg.val_dataloader['dataset']['data_root'] = cfg.data_root  # Root path for dataset
cfg.val_dataloader['dataset']['data_prefix']['img'] = "test/images"  # Path to validation images

cfg.val_dataloader['dataset']['pipeline'] = cfg.val_pipeline  # Apply validation pipeline

#For the val evaluator
cfg.val_evaluator['ann_file'] = os.path.join(cfg.data_root, "test.json")

print(cfg.val_dataloader)  # Print updated validation dataloader structure
print(cfg.val_evaluator)

**Note**: The JSON file is set to test.json instead of val.json because the demo dataset does not contain validation data. However, this can be changed if validation data is available.

* **Test Dataloader**: The test dataloader is used during model evaluation, such as running the test.py script and generating a confusion matrix.

    First, set up the test pipeline:

In [None]:
cfg.test_pipeline = [
    dict(type='LoadImageFromFile', backend_args=cfg.backend_args),  # Load image from file
    dict(type='Resize', scale=(1333, 800), keep_ratio=True),  # Resize to fit within 1333x800 while keeping the aspect ratio
    dict(type='LoadAnnotations', with_bbox=True),  # Load annotations with bounding boxes
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor')
    )  # Pack input metadata for detection
]

Configure the test dataloader with necessary updates:

In [None]:
# Configuration for testing dataset for confusion matrix and model evaluation after training
cfg.test_dataloader['batch_size'] = 1  # Update the batch size for the test dataloader
cfg.test_dataloader['num_workers'] = 1
cfg.test_dataloader['dataset']['type'] = cfg.dataset_type  # Set dataset type

cfg.test_dataloader['dataset']['ann_file'] = 'test.json'  # Path to test annotations

cfg.test_dataloader['dataset']['metainfo'] = {}  # Initialize metainfo if not present
cfg.test_dataloader['dataset']['metainfo']['classes'] = ('gantry_crane', 'standby_gantry_crane')  # Set classes

cfg.test_dataloader['dataset']['data_root'] = cfg.data_root  # Root path for dataset
cfg.test_dataloader['dataset']['data_prefix']['img'] = "test/images"  # Path to test images

cfg.test_dataloader['dataset']['pipeline'] = cfg.test_pipeline  # Apply test pipeline

cfg.test_evaluator['ann_file'] = os.path.join(cfg.data_root, "test.json")  # Set annotation file for the evaluator

### 5.1.4 Setup custom hook
#### Custom Hook
The configuration is already set; now it's time to add the custom hook created earlier.
* Define custom_hooks with MyHook.

In [None]:
cfg['custom_hooks'] = [dict(type='MyHook')]

Now that the configuration is done, the next step is to save it. The `Config` class (from MMEngine, used by MMDetection 3.3.0) has a `dump` method for this. The file is saved to the model output path that was set earlier.

In [None]:
os.makedirs(cfg.work_dir, exist_ok = True)
cfg.dump(os.path.join(cfg.work_dir, "config.py"))

## 5.2 Run the model training


To start model training, run the following script:
```
python mmdetection/tools/train.py {config path}
```  
This command will initiate the model training process according to the settings specified in the configuration file, continuing until the specified number of epochs is reached.

In [None]:
!python mmdetection/tools/train.py "./model/config.py"

## 5.3 Model evaluation

### 5.3.1 Evaluating detection performance

Mean Average Precision (mAP) is a metric used to measure the performance of models for tasks such as object detection and information retrieval, and it is widely used for evaluating object detection models. It summarizes the precision-recall curve and provides a single number representing the overall performance of the model. [You may find comprehensive details with this article.](https://towardsdatascience.com/what-is-map-understanding-the-statistic-of-choice-for-comparing-object-detection-models-1ea4f67a9dbd)


* Precision: The ratio of true positive detections to the total number of detections (true positives + false positives).
* Recall: The ratio of true positive detections to the total number of ground truth instances (true positives + false negatives).
* Average Precision (AP): The area under the precision-recall curve for a single class. It is computed by taking the average of precision values at different recall levels.
* Mean Average Precision (mAP): The mean of APs across all classes. It gives an overall performance measure of the detection model across different object categories.

In the context of object detection, a model's performance is often reported using mAP at different Intersection over Union (IoU) thresholds (e.g., `mAP@0.5`, `mAP@0.75`, `mAP@[0.5:0.95]`).
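The definitions above can be illustrated with a small, self-contained sketch. It computes the IoU of two boxes and a simplified AP for one class from a score-sorted list of detections. This is not the COCO evaluator used by MMDetection (which averages over several IoU thresholds and uses its own interpolation); the function names are hypothetical and the code only shows the idea.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(is_tp, num_gt):
    """Simplified AP for one class.

    is_tp[i] is True when the i-th detection (sorted by descending score)
    matched a ground-truth box at the chosen IoU threshold.
    """
    if num_gt == 0:
        return 0.0
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_tp:
        tp += int(hit)
        fp += int(not hit)
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolation envelope: use the best precision reachable at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

For example, a detection counts as a true positive for `mAP@0.5` when `box_iou` with an unmatched ground-truth box of the same class is at least 0.5; mAP is then the mean of the per-class AP values.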

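The cell below plots the bounding box loss recorded in the training log as a learning curve, which complements mAP when judging how training progressed.

In [None]: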
import json
import matplotlib.pyplot as plt

# Change this path: the log folder name differs for every training run.
with open('./model/20240724_121658/vis_data/20240724_121658.json', 'r') as file:
    lines = file.readlines()

train_losses = []
val_losses = []
current_epoch = None

for line in lines:

    entry = json.loads(line.strip())
    if 'epoch' in entry:
        train_losses.append((entry['epoch'], entry['iter'], entry['loss_bbox']))
        current_epoch = entry['epoch']
    elif 'loss' in entry:
        val_losses.append((current_epoch, entry['iter'], entry['loss_bbox']))

train_epochs, train_iters, train_loss_values = zip(*train_losses)
if val_losses:
    val_epochs, val_iters, val_loss_values = zip(*val_losses)
    plt.plot(val_epochs, val_loss_values, label='Validation Loss')

plt.plot(train_epochs, train_loss_values, label='Train Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss (loss_bbox)')
plt.title('Detection Learning Curve for Crane')
plt.legend()
plt.grid(True)
plt.show()

### 5.3.2 Evaluating classification performance
Confusion Matrix is a valuable tool for evaluating the performance of a classification model, including object detection models. It provides a detailed breakdown of the model's predictions, which helps in understanding the strengths and weaknesses of the model. The matrix consists of four categories as below. [You may find comprehensive details with this article.](https://en.wikipedia.org/wiki/Confusion_matrix)

* **True Positives (TP)**: The model correctly identifies an object that is present.
* **False Positives (FP)**: The model incorrectly identifies an object that is not present (**false alarm**).
* **True Negatives (TN)**: The model correctly identifies the absence of an object (**background**).
* **False Negatives (FN)**: The model fails to identify an object that is present.

In object detection, the Confusion Matrix may include a background class even if it’s not explicitly defined. This is because the model needs to distinguish between objects and non-objects (**background**). The matrix helps assess how well the model detects objects and handles background areas.

Before creating a confusion matrix in MMDetection, a pickle file must be generated using **test.py**, which will test the model and use the dataloader (**test_dataloader**) to create the pickle. This pickle file is then used for generating the confusion matrix. Running the following command will generate a .pkl file necessary for the confusion matrix:

In [None]:
#python mmdetection/tools/test.py {config_path} {model_path} --out {pickle_output_path}

#example code
!python ./mmdetection/tools/test.py "./model/config.py" "./model/epoch_100.pth" --out "./result.pkl"

Since MMDetection’s confusion matrix has not been updated to handle custom outputs, a custom script is used for model evaluation. This script will produce a plot image that provides a detailed assessment of the model’s performance. The plot will include:

* Classes and background: A visual representation of detected classes and background.
* Overall Count of ground truth per Class: The total number of true instances for each class.
* Count of incorrect background detections: Instances where the model incorrectly identified the background.

This approach ensures an accurate evaluation of the model by highlighting both correct detections and areas of error.

In [None]:
from mmengine.fileio import load
from mmdet.utils import replace_cfg_vals
from mmengine.registry import init_default_scope
from mmdet.registry import DATASETS
from mmdet.evaluation import bbox_overlaps
from mmengine.config import Config
import numpy as np
import matplotlib.pyplot as plt  # Needed for the confusion matrix plot below
import os

def generate_confusionmatrix(config_path, pkl_path, score_thr = 0.3, tp_iou_thr = 0.5):
    cfg = Config.fromfile(config_path)
    cfg = replace_cfg_vals(cfg)
    init_default_scope(cfg.get('default_scope', 'mmdet'))
    results = load(pkl_path)
    dataset = DATASETS.build(cfg.test_dataloader.dataset)

    assert len(results) == len(dataset), "Please check the dataset"

    num_classes = len(dataset.metainfo['classes'])
    confusion_matrix = np.zeros(shape=[num_classes + 1, num_classes + 1])
    for idx, per_img_res in enumerate(results):
        res_bboxes = per_img_res['pred_instances']
        gts = dataset.get_data_info(idx)['instances']
    
        true_positives = np.zeros(len(gts))
        gt_bboxes = []
        gt_labels = []
        for gt in gts:
            gt_bboxes.append(gt['bbox'])
            gt_labels.append(gt['bbox_label'])
        gt_bboxes = np.array(gt_bboxes)
        gt_labels = np.array(gt_labels)
    
        unique_label = np.unique(res_bboxes['labels'].numpy())
    
        for det_label in unique_label:
            mask = (res_bboxes['labels'] == det_label)
            det_bboxes = res_bboxes['bboxes'][mask].numpy()
            det_scores = res_bboxes['scores'][mask].numpy()
        
            ious = bbox_overlaps(det_bboxes[:, :4], gt_bboxes)
            for i, score in enumerate(det_scores):
                det_match = 0
                if score >= score_thr:
                    for j, gt_label in enumerate(gt_labels):
                        if ious[i, j] >= tp_iou_thr:
                            det_match += 1
                            if gt_label == det_label:
                                true_positives[j] += 1  # TP
                            confusion_matrix[gt_label, det_label] += 1
                    if det_match == 0:  # BG FP
                        confusion_matrix[-1, det_label] += 1
        for num_tp, gt_label in zip(true_positives, gt_labels):
            if num_tp == 0:  # FN
                confusion_matrix[gt_label, -1] += 1

    class_labels = ['Gantry Crane', 'Standby Gantry Crane', 'Background']
    additional_info = confusion_matrix.sum(axis=1)[:, np.newaxis]
    
    fig, ax = plt.subplots()
    cax = ax.matshow(confusion_matrix, cmap=plt.cm.Blues)
    fig.colorbar(cax)
    
    ax.set_xticks(np.arange(confusion_matrix.shape[1]))
    ax.set_yticks(np.arange(confusion_matrix.shape[0]))
    ax.set_xticklabels(class_labels)
    ax.set_yticklabels(class_labels)
    
    plt.setp(ax.get_xticklabels(), rotation=45, ha="left", rotation_mode="anchor")
    
    for i in range(confusion_matrix.shape[0]):
        for j in range(confusion_matrix.shape[1]):
            ax.text(j, i, int(confusion_matrix[i, j]), ha='center', va='center', color='black')
    
    ax.set_xlabel('Predicted labels')
    ax.set_ylabel('True labels')
    
    for idx, info in enumerate(additional_info):
        plt.text(0.5, -0.2 - idx * 0.1, f"{class_labels[idx]}: {int(info[0])}", ha='center', va='center', transform=ax.transAxes, fontsize=10)
    
    plt.show()


config_path = "./model/config.py"  # Configuration path
pkl_path = "./result.pkl"  # Pickle output path

generate_confusionmatrix(config_path, pkl_path)

Analyzing the confusion matrix reveals errors like false positives and false negatives, guiding improvements in the model’s performance, such as tuning parameters, refining training data, or adjusting detection techniques.
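As one way to turn the matrix into numbers, per-class precision and recall can be derived from it. The sketch below assumes the layout used by the script above: rows are true labels, columns are predicted labels, and the last row and column are background. `per_class_precision_recall` is a hypothetical helper, and because one ground-truth box can be matched by several detections in that script, the values should be read as approximate.

```python
import numpy as np


def per_class_precision_recall(confusion_matrix):
    """Per-class precision and recall; the last row and column are background."""
    m = np.asarray(confusion_matrix, dtype=float)
    num_classes = m.shape[0] - 1
    tp = np.diag(m)[:num_classes].copy()
    predicted = m[:, :num_classes].sum(axis=0)  # Column sums: all detections per class
    actual = m[:num_classes, :].sum(axis=1)     # Row sums: all ground truths per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall


# Illustrative matrix with made-up counts (two classes plus background).
example = [[8, 1, 1],
           [2, 6, 2],
           [2, 0, 0]]
precision, recall = per_class_precision_recall(example)
print("precision:", precision)
print("recall:", recall)
```

Low precision for a class points to false alarms (background or the other class predicted as this class), while low recall points to missed objects.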


### 5.3.3 Model Inference
For inference visualization, MMDetection provides the `DetInferencer` API, which runs the model on images and saves the visualized results.

In [None]:
from mmdet.apis import DetInferencer

path_model = "./model/epoch_100.pth"
path_config = "./model/config.py"

detection_model = DetInferencer(model=path_config, weights=path_model, show_progress=True)

detection_model('./demo_dataset/test/images', out_dir='./inference_result', no_save_pred=False, pred_score_thr = 0.50)

**Note**: An error might occur during processing, as MMDetection version 3 has not yet issued a fix. Manual adjustment to the script may be necessary.

For a quick fix in MMDetection version 3, a modification is needed in the runtime_info_hook.py file located in the mmengine/hooks/ folder. Specifically, on line 132, within the after_test_iter function, the following changes should be made:

1. Open the runtime_info_hook.py file.
2. Locate the after_test_iter function around line 132.
3. Add a try and except block to handle the error related to custom validation loss, which occurs when generating the pickle file. The updated function might look like this:

```python
def after_test_iter(self,
                    runner,
                    batch_idx: int,
                    data_batch: DATA_BATCH = None,
                    outputs: Optional[dict] = None) -> None:

    if outputs is not None:
        try:
            for key, value in outputs.items():
                runner.message_hub.update_scalar(f'test/{key}', value)
        except Exception as e:
            pass
```

This adjustment will help bypass the error caused by custom validation loss when generating the pickle file, as this issue arises from using the script specifically for generating the pickle rather than for validation.

# 6 Improve model training
The models can be improved by combining data augmentation, model configuration changes, hyperparameter tuning, and other techniques.

## 6.1 Data Augmentation:

Applying various augmentations to the training dataset can make the model more robust and improve generalization. In mmdetection, the ‘pipeline’ contains dataset preprocessing steps such as augmentation. Some techniques are: Resize, RandomFlip, Normalize, Pad, RandomCrop, ColorTransform, etc.

In [None]:
# Add the following to the config to use the augmentations mentioned above
# (written in MMDetection 3.x style, consistent with section 5.1.3).
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', prob=0.5),
    dict(type='RandomCrop', crop_size=(800, 800)),
    dict(type='ColorTransform', prob=0.5, level=1),
    dict(type='Pad', size_divisor=32),
    # In MMDetection 3.x, normalization is handled by model.data_preprocessor,
    # and 'PackDetInputs' replaces 'DefaultFormatBundle' and 'Collect'.
    dict(type='PackDetInputs'),
]

# The choice of selecting augmentation depends upon dataset characteristics, task requirements and domain knowledge.
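Geometric augmentations must transform the bounding boxes together with the pixels, which is why detection pipelines use annotation-aware transforms. As a small illustration of what a horizontal flip does to a box, here is a sketch with a hypothetical helper `hflip_bbox` (not MMDetection's implementation), assuming boxes in `(x1, y1, x2, y2)` pixel coordinates.

```python
def hflip_bbox(bbox, image_width):
    """Horizontally flip a box given as (x1, y1, x2, y2) in pixel coordinates."""
    x1, y1, x2, y2 = bbox
    # The left edge becomes the right edge, so x1 and x2 swap roles after mirroring.
    return (image_width - x2, y1, image_width - x1, y2)


print(hflip_bbox((10, 20, 30, 40), image_width=100))
```

The vertical coordinates are unchanged, and the box keeps its width and height; only its horizontal position is mirrored.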

## 6.2 Exploring alternative models and datasets for model training
Pre-trained detectors from the COCO dataset serve as effective models for initializing on another dataset. Using models available in the Model Zoo can significantly enhance performance. To fine-tune a model for a new dataset, follow these steps:

1. **Inherit default configs:** mmdetection supports inheriting configurations from existing setups. Start by inheriting base configurations for model architecture (models/), dataset specifics (datasets/), and runtime settings (default_runtime.py) from mmdetection’s configs directory.
2. **Modify configurations:** Adjust settings such as model backbone, ROI heads, and dataset paths to suit the characteristics of the new dataset. Modify parameters like num_classes in the ROI head to match the number of classes in the new dataset.
3. **Load pre-trained weights:** Initialize the model with weights pretrained on a large-scale dataset (e.g., COCO). This step initializes the model with beneficial learned features for object detection tasks.
4. **Fine-tuning:** Fine-tune the initialized model using the new dataset. Optimize hyperparameters such as learning rate, optimizer type, and batch size to improve performance. Considerations include adjusting learning rates and epochs based on the dataset’s scale and complexity.

To use a pre-trained model, specify the path to the pretrained checkpoint in load_from. Ensure the model weights are downloaded before training to minimize download time during training.

In [None]:
# cfg.load_from = 'path_to_pretrained_checkpoint.pth'

**Model configurations:** To enhance the object detection capabilities, we can explore alternative backbones such as ResNet. ResNet101 is a deeper backbone that generally trades additional computation for accuracy. Here is how a ResNet101 backbone can be configured for training on your dataset.


In [None]:
# Model can be adjusted with following component within cfg.model
model = dict(
    backbone=dict(  # Define the backbone of the model (usually a pre-trained network for feature extraction)
        depth=101,  # Specifies the depth of the ResNet, in this case, ResNet-101 (101 layers)
        frozen_stages=1,  # The first stage of the network is frozen (not updated during training)
        init_cfg=dict(checkpoint='torchvision://resnet101', type='Pretrained'),  # Initialization configuration: uses a pre-trained ResNet-101 model from Torchvision
        norm_cfg=dict(requires_grad=False, type='BN'),  # Normalization configuration: batch normalization (BN) layers with fixed parameters (no gradient update)
        norm_eval=True,  # Keep batch normalization layers in eval mode during training, so their running statistics are not updated
        num_stages=4,  # The number of stages in the ResNet architecture (ResNet typically has 4 stages)
        out_indices=(
            0,
            1,
            2,
            3,
        ),  # Output features from all 4 stages of the network (0 to 3)
        style='pytorch',  # Indicates that the ResNet architecture follows the original PyTorch implementation
        type='ResNet'),  # Specifies that the backbone type is ResNet
    # The neck (a Feature Pyramid Network) and the detection heads follow here in the full config
)

Or load the entire configuration provided by MMDetection. To load the configuration, use:

```
cfg = Config.fromfile('mmdetection/configs/faster_rcnn/faster-rcnn_r101_fpn_1x_coco.py')
print(cfg.pretty_text)
```

Then, follow the guide in section 5. The flow remains the same; the differences lie in augmentation, batch size, or custom hooks, for example when switching to a model such as RTMDet. A prior understanding of the earlier steps is necessary.

## 6.3 Hyperparameter tuning:

Hyperparameter tuning involves adjusting training settings such as those of the optimizer. These settings, known as hyperparameters, are not learned from the training data but set prior to the training process. Effective tuning of hyperparameters such as learning rate, batch size, optimizer type, and the number of epochs can significantly impact the model’s accuracy and convergence speed. In mmdetection, hyperparameters are configured in the model’s configuration file and can be fine-tuned based on the specific dataset and task requirements to achieve better performance. MMDetection also provides a parameter scheduler, which dynamically adjusts the learning rate and other hyperparameters during training to improve convergence and performance.
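To illustrate what a step-based parameter scheduler does, the sketch below reproduces the arithmetic of a multi-step decay in plain Python: the learning rate is multiplied by `gamma` at each milestone epoch. The milestones, `gamma`, and base learning rate are assumptions chosen for a 100-epoch run, and `multistep_lr` is a hypothetical helper, not MMEngine's implementation.

```python
def multistep_lr(base_lr, epoch, milestones, gamma=0.1):
    """Learning rate at `epoch` (0-based) after decaying by `gamma` at each milestone."""
    num_decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** num_decays


# Print the learning rate just before and after each assumed milestone.
for epoch in (0, 69, 70, 89, 90, 99):
    print(epoch, multistep_lr(0.02, epoch, milestones=[70, 90]))
```

In an MMDetection 3.x config, a comparable schedule is typically expressed through `cfg.param_scheduler`, for example with an entry of `type='MultiStepLR'` that sets `milestones` and `gamma`.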

1. As in section 5, many of the cells in this section are identical; the primary differences are in the values and in the additional dictionaries used in the augmentation pipeline and the train_dataloader.

    First, load the configuration as in section 5.

In [None]:
from mmengine.config import Config
import os

cfg = Config.fromfile('./mmdetection/configs/faster_rcnn/faster-rcnn_r101_fpn_1x_coco.py')
print(cfg.pretty_text)

2. Set the important keys or data in the configuration dictionary, such as the dataset root, output model, and other relevant parameters:

In [None]:
cfg.data_root = './demo_dataset/'  # data_root means the path of the dataset that will be used.
cfg.dataset_type = 'CocoDataset'   # type of the dataset, indicating its structure. Mostly it's CocoDataset.

cfg.auto_scale_lr = dict(base_batch_size=16, enable=True)  # Automatically scale the learning rate by (total training batch size / base_batch_size).
cfg.backend_args = None  # Configure backend arguments.

cfg.work_dir = './model_finetuned'  # Output path of the model.

print(cfg.pretty_text)
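
With `enable=True`, `auto_scale_lr` is generally understood to apply the linear scaling rule: the learning rate is multiplied by the ratio of the real total batch size to `base_batch_size`. The sketch below reproduces that arithmetic in plain Python. The helper name `scaled_lr` and the base learning rate of 0.02 are assumptions for illustration only.

```python
def scaled_lr(base_lr, batch_size_per_gpu, num_gpus=1, base_batch_size=16):
    """Linear scaling rule: the learning rate grows in proportion to the total batch size."""
    real_batch_size = batch_size_per_gpu * num_gpus
    return base_lr * real_batch_size / base_batch_size


# One GPU with a batch size of 4, against a base batch size of 16.
print(scaled_lr(0.02, batch_size_per_gpu=4))  # 0.02 * 4 / 16
```

A smaller total batch size therefore yields a proportionally smaller learning rate, which helps keep training stable when GPU memory limits the batch size.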

3. Model setup is identical to the previous configuration, but there are some values that can be adjusted. These adjustments require a deep understanding of the model, so for now, we'll stick to the basics that still contribute to model training.

In [None]:
# Update backbone parameters
cfg.model['backbone']['depth'] = 101 # change FasterRCNN backbone depth
cfg.model['backbone']['init_cfg']['checkpoint'] = 'torchvision://resnet101' # Pretrained weights to load; replace with path/to/custom/pretrained.pth to use custom weights
cfg.model['backbone']['init_cfg']['type'] = 'Pretrained' # specifies how the model should be initialized or where it should load its initial weights from
cfg.model['backbone']['norm_cfg']['requires_grad'] = False  # Disable gradient updates for normalization layer
# Update neck parameters
cfg.model['neck']['in_channels'] = [256, 512, 1024, 2048]  # Input channels from the four backbone stages (must match the backbone outputs)
cfg.model['neck']['out_channels'] = 256  # Output channels of each pyramid level

# Update ROI Head parameters
cfg.model['roi_head']['bbox_head']['loss_bbox']['loss_weight'] = 1.0  # Weight of the bounding box regression loss (raise it to emphasize localization)
cfg.model['roi_head']['bbox_head']['num_classes'] = 2  # Change number of classes to 2
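
The neck's `in_channels` list has to match the channel counts produced by the backbone stages, so it changes whenever the backbone depth changes. The sketch below derives those counts for the standard ResNet family; the helper name `resnet_stage_channels` is hypothetical and not part of MMDetection.

```python
def resnet_stage_channels(depth):
    """Output channels of the four ResNet stages for a standard depth."""
    # Bottleneck blocks (ResNet-50/101/152) expand channels 4x; basic blocks (ResNet-18/34) do not.
    expansion = 4 if depth >= 50 else 1
    return [64 * 2 ** stage * expansion for stage in range(4)]


print(resnet_stage_channels(101))  # [256, 512, 1024, 2048]
print(resnet_stage_channels(18))   # [64, 128, 256, 512]
```

For example, switching the backbone to depth 18 or 34 would also require changing `in_channels` to `[64, 128, 256, 512]`.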

4. Training Dataloader

In [None]:
cfg.train_pipeline = [
    dict(type='LoadImageFromFile', backend_args=None),  # Load image from file
    dict(type='LoadAnnotations', with_bbox=True),  # Load annotations with bounding boxes
    dict(type='Resize', scale=(1024, 1024), keep_ratio=True),  # Resize to fit within 1024x1024, keeping the aspect ratio
    dict(type='RandomCrop', crop_size=(1024, 1024)),  # Randomly crop a region of at most 1024x1024 pixels
    dict(type='YOLOXHSVRandomAug'),  # Randomly jitter hue, saturation, and value
    dict(
        type='RandomFlip',
        direction=['horizontal', 'vertical', 'diagonal'],
        prob=0.75),  # Flip with 75% total probability, split equally among the three directions
    dict(type='Color', level=10, prob=1.0),  # Adjust color balance
    dict(type='AutoContrast', level=10, prob=1.0),  # Automatically adjust contrast
    dict(type='Brightness', level=6, prob=1.0),  # Adjust brightness
    dict(type='Sharpness', level=4, prob=1.0),  # Adjust sharpness
    dict(type='Pad', size=(1024, 1024), pad_val=dict(img=(114, 114, 114))),  # Pad to 1024x1024 with gray (114)
    dict(type='PackDetInputs'),  # Pack inputs for detection
]
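
When `RandomFlip` receives a list of directions together with a single float `prob`, the documented behaviour is that the probability is divided equally among the directions, with the remainder left for "no flip". The sketch below mimics that selection in plain Python. The helper names are hypothetical, and the code is an approximation of the behaviour rather than the library's implementation.

```python
import random


def flip_probabilities(directions, prob):
    """Split a total flip probability equally across directions; None means no flip."""
    per_direction = prob / len(directions)
    table = {direction: per_direction for direction in directions}
    table[None] = 1.0 - prob
    return table


def choose_direction(directions, prob, rng=random):
    """Draw one flip direction (or None) according to the probability table."""
    table = flip_probabilities(directions, prob)
    return rng.choices(list(table), weights=list(table.values()), k=1)[0]


directions = ['horizontal', 'vertical', 'diagonal']
print(flip_probabilities(directions, 0.75))
print(choose_direction(directions, 0.75, random.Random(0)))
```

With three directions and `prob=0.75`, each direction is chosen about a quarter of the time, and a quarter of the images are left unflipped.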

In [None]:
# Update batch size for the training dataloader
cfg.train_dataloader['batch_size'] = 4  # Set batch size to 4
cfg.train_dataloader['num_workers'] = 1

# Set dataset type
cfg.train_dataloader['dataset']['type'] = cfg.dataset_type

# Configure training epochs and validation interval
cfg.train_cfg['max_epochs'] = 100  # Maximum number of training epochs
cfg.train_cfg['val_interval'] = 100  # Interval (in epochs) for validation

# Set the annotation file for training dataset
cfg.train_dataloader['dataset']['ann_file'] = 'train.json'  # Path to training annotations

# Reset the metainfo dictionary (any existing entries are discarded)
cfg.train_dataloader['dataset']['metainfo'] = {}
cfg.train_dataloader['dataset']['metainfo']['classes'] = ('gantry_crane', 'standby_gantry_crane')  # Set classes

# Set data root directory and image prefix
cfg.train_dataloader['dataset']['data_root'] = cfg.data_root  # Root path for dataset
cfg.train_dataloader['dataset']['data_prefix']['img'] = "train/images"  # Path to training images

# Assign the train pipeline
cfg.train_dataloader['dataset']['pipeline'] = cfg.train_pipeline  # Apply training pipeline
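
The class names in `metainfo` need to match the category names stored in the COCO annotation file; otherwise annotations for unmatched classes are not picked up. The sketch below shows one way to check this before training. The helper `missing_classes` is hypothetical, and the `example` dictionary is a small stand-in for the parsed contents of `train.json`, not the real file.

```python
def missing_classes(coco, classes):
    """Return the class names that have no matching category in a COCO annotation dict."""
    names = {category['name'] for category in coco.get('categories', [])}
    return [name for name in classes if name not in names]


# Stand-in for the parsed annotation file (hypothetical content).
example = {
    'categories': [
        {'id': 1, 'name': 'gantry_crane'},
        {'id': 2, 'name': 'standby_gantry_crane'},
    ]
}
classes = ('gantry_crane', 'standby_gantry_crane')
print(missing_classes(example, classes))  # An empty list means every class was found
```

For the real dataset, parse the annotation file with `json.load` and pass the result in place of `example`.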

5. Validation Dataloader

In [None]:
cfg['val_pipeline'] = cfg.test_pipeline  # Create the val_pipeline key (overwritten by the definition below)
cfg.val_pipeline = [
    dict(type='LoadImageFromFile', backend_args=cfg.backend_args),  # Load image from file
    dict(type='Resize', scale=(1024, 1024), keep_ratio=True),  # Resize images to 1024x1024 to match the demo dataset
    dict(type='LoadAnnotations', with_bbox=True),  # Load annotations with bounding boxes
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor')
    )  # Pack input metadata for detection
]
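
With `keep_ratio=True`, `Resize` does not stretch every image to exactly 1024x1024; it scales the image so that it fits within that size while preserving the aspect ratio. The sketch below reproduces the size calculation in plain Python. The function name `rescale_size` here is a local helper written for illustration, and the rounding is an assumption about how the target size is computed.

```python
def rescale_size(width, height, scale=(1024, 1024)):
    """Compute the output size of an aspect-preserving resize that fits within `scale`."""
    max_long_edge, max_short_edge = max(scale), min(scale)
    factor = min(max_long_edge / max(height, width),
                 max_short_edge / min(height, width))
    return int(width * factor + 0.5), int(height * factor + 0.5)


print(rescale_size(2048, 1024))  # (1024, 512): the long edge limits the scale
print(rescale_size(512, 512))    # (1024, 1024): a square image is enlarged to the full size
```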

In [None]:
# Configuration for loading validation data during training
cfg.val_dataloader['batch_size'] = 2  # Update the batch size for the validation dataloader
cfg.val_dataloader['num_workers'] = 1
cfg.val_dataloader['dataset']['type'] = cfg.dataset_type  # Set dataset type

cfg.val_dataloader['dataset']['ann_file'] = 'test.json'  # Path to validation annotations

cfg.val_dataloader['dataset']['metainfo'] = {}  # Reset metainfo (any existing entries are discarded)
cfg.val_dataloader['dataset']['metainfo']['classes'] = ('gantry_crane', 'standby_gantry_crane')  # Set classes

cfg.val_dataloader['dataset']['data_root'] = cfg.data_root  # Root path for dataset
cfg.val_dataloader['dataset']['data_prefix']['img'] = "test/images"  # Path to validation images

cfg.val_dataloader['dataset']['pipeline'] = cfg.val_pipeline  # Apply validation pipeline

# Set the annotation file for the validation evaluator
cfg.val_evaluator['ann_file'] = os.path.join(cfg.data_root, "test.json")

print(cfg.val_dataloader)  # Print updated validation dataloader structure
print(cfg.val_evaluator)

6. Test Dataloader

In [None]:
cfg.test_pipeline = [
    dict(type='LoadImageFromFile', backend_args=cfg.backend_args),  # Load image from file
    dict(type='Resize', scale=(1024, 1024), keep_ratio=True),  # Resize images to 1024x1024 to match the demo dataset
    dict(type='LoadAnnotations', with_bbox=True),  # Load annotations with bounding boxes
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor')
    )  # Pack input metadata for detection
]

In [None]:
# Configuration for testing dataset for confusion matrix and model evaluation after training
cfg.test_dataloader['batch_size'] = 2  # Update the batch size for the test dataloader
cfg.test_dataloader['num_workers'] = 1
cfg.test_dataloader['dataset']['type'] = cfg.dataset_type  # Set dataset type

cfg.test_dataloader['dataset']['ann_file'] = 'test.json'  # Path to test annotations

cfg.test_dataloader['dataset']['metainfo'] = {}  # Reset metainfo (any existing entries are discarded)
cfg.test_dataloader['dataset']['metainfo']['classes'] = ('gantry_crane', 'standby_gantry_crane')  # Set classes

cfg.test_dataloader['dataset']['data_root'] = cfg.data_root  # Root path for dataset
cfg.test_dataloader['dataset']['data_prefix']['img'] = "test/images"  # Path to test images

cfg.test_dataloader['dataset']['pipeline'] = cfg.test_pipeline  # Apply test pipeline

cfg.test_evaluator['ann_file'] = os.path.join(cfg.data_root, "test.json")  # Set annotation file for the evaluator

7. Set Custom Hook and the Output/Dump Directory

In [None]:
cfg['custom_hooks'] = [dict(type='MyHook')]

os.makedirs(cfg.work_dir, exist_ok=True)  # Create the output directory if it does not exist
cfg.dump(os.path.join(cfg.work_dir, "config.py"))  # Save the full configuration used for training

The model is now ready to be trained again. Afterwards, compare the results with those of the previous training run.

## 6.4 Run the tuned model training

In [None]:
!python mmdetection/tools/train.py "./model_finetuned/config.py"

After training, run the cell in section 5.4.2 to evaluate classification performance.
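
To compare the tuned run with the earlier one numerically, the validation metric can be read from each run's scalar log. The sketch below assumes the log holds one JSON record per line and that the COCO box mAP is stored under the key `coco/bbox_mAP`. Both are assumptions to verify against the files in your work directories (in recent MMEngine versions the log is typically `vis_data/scalars.json` inside a timestamped folder). The helper `best_metric` is hypothetical, and the log lines shown are placeholders, not real results.

```python
import json


def best_metric(log_lines, key='coco/bbox_mAP'):
    """Return the highest value of `key` found in JSON-lines log records, or None."""
    values = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines
        record = json.loads(line)
        if key in record:
            values.append(record[key])
    return max(values) if values else None


# Placeholder log lines for illustration only (not real training results).
baseline_log = ['{"loss": 1.2, "step": 50}', '{"coco/bbox_mAP": 0.1, "step": 100}']
tuned_log = ['{"loss": 0.9, "step": 50}', '{"coco/bbox_mAP": 0.2, "step": 100}']

print('baseline:', best_metric(baseline_log))
print('tuned:   ', best_metric(tuned_log))
```

For real runs, open each log file and pass the file object (or its lines) to `best_metric` in place of the placeholder lists.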