# Create the ultimate neural network by using only curated data

# Note: a version trained on square images is at the end

Using manually annotated and curated images from both saturated and clean filters, we train an ultimate neural network. 
The curated data are augmented to 500 images, as many as the free plan allows. Further augmentation is not necessary because the annotations of the saturated filters are not based on highly recognizable features (they are biased by human perception).

Remove old data before downloading another version of the dataset. 

```
! rm -r ../../../0_DATA/IMPTOX/01-01_image_library/V4/download/*
```

In [None]:
#! rm -r ../../../0_DATA/IMPTOX/01-01_image_library/V4/download/*

Use the Roboflow SDK to log in and download the chosen version of the dataset. 

In [None]:
# V6
import roboflow

roboflow.login()
rf = roboflow.Roboflow()

# rf = roboflow.Roboflow(api_key="...")  # alternative: authenticate with an API key
project = rf.workspace("uftir-particles").project("uftir_curated")
download_folder = "../../../0_DATA/IMPTOX/01-01_image_library/V4/download_v6/"

# Set new_dataset = True to download. This overwrites the JSON annotation files!
# After downloading, the category fix described below must be applied again.
new_dataset = False

if new_dataset:
    # Download and process the data again only if they have changed.
    dataset = project.version(6).download(model_format="coco", location=download_folder)


In [None]:
# V7
import roboflow

roboflow.login()
rf = roboflow.Roboflow()

# rf = roboflow.Roboflow(api_key="...")  # alternative: authenticate with an API key
project = rf.workspace("uftir-particles").project("uftir_curated")
download_folder = "../../../0_DATA/IMPTOX/01-01_image_library/V4/download_v7/"

# Set new_dataset = True to download. This overwrites the JSON annotation files!
# After downloading, the category fix described below must be applied again.
new_dataset = False

if new_dataset:
    # Download and process the data again only if they have changed.
    dataset = project.version(7).download(model_format="coco", location=download_folder)


The Roboflow COCO export lists the particle category twice (the reason is unclear), so the duplicate must be removed from every annotation file. 

In all three annotation files (train, test, valid), keep only the following category:  


```
 "categories": [{
            "id": 0,
            "name": "particle",
            "supercategory": "particles"
        }
    ],
```

Then replace every `"category_id": 1,` with `"category_id": 0,` so that all annotations point to the single category above. 
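The manual edit above can also be scripted. This is a minimal sketch using only the standard library, assuming the Roboflow export layout used in this notebook (`<split>/_annotations.coco.json` for `train`, `test` and `valid`) and that every annotation of the duplicated category uses id 1. `fix_particle_categories` is a hypothetical helper, not part of Roboflow or detectron2.

```python
import json
from pathlib import Path

# The single category that should remain after the fix (same as the JSON above).
PARTICLE_CATEGORY = {"id": 0, "name": "particle", "supercategory": "particles"}


def fix_particle_categories(json_path):
    """Keep only the 'particle' category (id 0) and remap category_id 1 -> 0.

    Hypothetical helper: overwrites the file in place and returns the
    number of annotations that were remapped.
    """
    path = Path(json_path)
    coco = json.loads(path.read_text())
    coco["categories"] = [dict(PARTICLE_CATEGORY)]
    remapped = 0
    for ann in coco.get("annotations", []):
        if ann.get("category_id") == 1:
            ann["category_id"] = 0
            remapped += 1
    path.write_text(json.dumps(coco))
    return remapped


if __name__ == "__main__":
    # Assumed folder layout, matching the download folders used above.
    base = Path("../../../0_DATA/IMPTOX/01-01_image_library/V4")
    for version in ("download_v6", "download_v7"):
        for split in ("train", "test", "valid"):
            json_file = base / version / split / "_annotations.coco.json"
            if json_file.exists():  # skip splits that were not downloaded
                n = fix_particle_categories(json_file)
                print(f"{json_file}: remapped {n} annotations")
```

Running the script twice is harmless: on an already fixed file it rewrites the same single category and remaps nothing.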

## Register datasets

In [None]:
! pwd

In [None]:
import os
import random

import cv2
import matplotlib.pyplot as plt
import numpy as np

from detectron2.config import get_cfg
from detectron2.data import DatasetCatalog, MetadataCatalog, build_detection_test_loader
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultPredictor, DefaultTrainer
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.utils.visualizer import ColorMode, Visualizer

# Start from empty catalogs so the cell can be re-run without "already registered" errors
DatasetCatalog.clear()
MetadataCatalog.clear()

V4_ROOT = "../../../0_DATA/IMPTOX/01-01_image_library/V4"
V3_ROOT = "../../../0_DATA/IMPTOX/01-01_image_library/V3"
SPLITS = (("train", "train"), ("test", "test"), ("val", "valid"))  # (dataset prefix, folder)


def register_split(name, split_dir):
    """Register one COCO split and declare its single class."""
    register_coco_instances(name, {}, os.path.join(split_dir, "_annotations.coco.json"), split_dir)
    MetadataCatalog.get(name).set(thing_classes=["particle"])


# Train, test and validation splits of V6 and V7 (V7 uses modified augmentation parameters)
for version in ("v6", "v7"):
    for prefix, folder in SPLITS:
        register_split(f"{prefix}_particles_{version}", f"{V4_ROOT}/download_{version}/{folder}")

# The cells below use the generic names "train_particles", "test_particles" and "val_particles",
# which were never registered. Assumption: they should point to one of the versions above;
# change DATASET_VERSION to switch.
DATASET_VERSION = "v7"
for prefix, folder in SPLITS:
    register_split(f"{prefix}_particles", f"{V4_ROOT}/download_{DATASET_VERSION}/{folder}")

# Saturated filters (additional set, augmented from V3)
register_split("test_sat_particles", f"{V3_ROOT}/download_version2_augmented/train")


### Visualize datasets

1. Train
2. Val
3. Test
4. Saturated filters (V3)


In [None]:
def create_mosaic(dataset_name, max_images=30):
    """Show up to max_images random images of a registered dataset with their annotations."""
    dataset_dicts = DatasetCatalog.get(dataset_name)
    metadata = MetadataCatalog.get(dataset_name)
    
    num_images = min(len(dataset_dicts), max_images)
    # squeeze=False keeps a 2D array of axes, even when num_images == 1
    fig, axes = plt.subplots(1, num_images, figsize=(20, 5), squeeze=False)
    
    for i, d in enumerate(random.sample(dataset_dicts, num_images)):
        image = cv2.imread(d["file_name"])[:, :, ::-1]  # OpenCV loads BGR; Visualizer expects RGB
        v = Visualizer(image, metadata=metadata, scale=0.8)
        v = v.draw_dataset_dict(d)
        axes[0, i].imshow(v.get_image())  # get_image() already returns RGB
        axes[0, i].axis("off")
        axes[0, i].set_title(f"Image {i+1}")
    
    plt.tight_layout()
    plt.show()

In [None]:
create_mosaic("train_particles")

In [None]:
create_mosaic("val_particles")

In [None]:
create_mosaic("test_particles")

In [None]:
create_mosaic("test_sat_particles")

## Configure and train detectron2

In [None]:
cfg = get_cfg()
cfg.OUTPUT_DIR = "./TrainDetectron2Model"
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

cfg.merge_from_file(
    "../../Other/detectron2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
    #"../../Other/detectron2/configs/Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml"
)  # Path to the local detectron2 checkout (detectron2 is installed in more than one place)

# Specify the train, test, and validation subsets in your Detectron2 configuration
cfg.DATASETS.TRAIN = ("train_particles",)
cfg.DATASETS.TEST = ("test_particles",)  # Possible to later change the test set
cfg.DATASETS.VAL = ("val_particles",)  # Custom key: DefaultTrainer ignores it and only evaluates on DATASETS.TEST
cfg.DATALOADER.NUM_WORKERS = 4

# Before training
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"  # initialize from model zoo
cfg.MODEL.DEVICE = 'cuda:1'  # "cpu", "cuda:0" or "cuda:1"
cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.005
cfg.SOLVER.MAX_ITER = (
    300
)  # 300 iterations seem good enough, but you can certainly train longer
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = (
    512
)  # 512 is the detectron2 default; smaller values (e.g. 128) train faster
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # 1 class: particle


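# Detectron2 has no INPUT.AUG.AUGMENTATIONS_PER_BATCH key: cfg.INPUT.AUG does not exist and
# accessing it raises an AttributeError. Augmentation is controlled by INPUT.* keys
# (e.g. INPUT.RANDOM_FLIP) or by a custom DatasetMapper.
# cfg.INPUT.AUG.AUGMENTATIONS_PER_BATCH = 6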


Train if needed, otherwise load the weights

In [None]:
# If the parameters have changed, retrain the neural net. Otherwise, skip training: the
# "Create the predictor" cell below loads the saved weights (model_final.pth in cfg.OUTPUT_DIR).
new_parameters = True

if new_parameters:
    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()
    



### Create the predictor

In [None]:
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.01   # set the testing threshold for this model
cfg.TEST.DETECTIONS_PER_IMAGE = 2000
predictor = DefaultPredictor(cfg)

In [None]:
# Load the evaluation dataset
val_dataset_name = "test_particles"
val_dataset_dicts = DatasetCatalog.get(val_dataset_name)

# Collect predictions and ground truths for the plots below
predictions = []
ground_truths = []
for d in val_dataset_dicts:
    image_path = d["file_name"]
    image = cv2.imread(image_path)
    outputs = predictor(image)

    instances = outputs["instances"].to("cpu")
    predictions.append(instances)
    ground_truths.append(d)  # Append the whole dictionary, containing "file_name" and "annotations"

    
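# Note: COCO-style AP only scores the 100 highest-scoring detections per image by default,
# even though TEST.DETECTIONS_PER_IMAGE is 2000 (recent detectron2 versions expose max_dets_per_image).
evaluator = COCOEvaluator(val_dataset_name, cfg, False, output_dir=cfg.OUTPUT_DIR)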
val_loader = build_detection_test_loader(cfg, val_dataset_name)

# Perform evaluation using the predictor and evaluator
metrics = inference_on_dataset(predictor.model, val_loader, evaluator)

print(metrics)

In [None]:
# Plot every image with its predictions next to the ground truth

for i in range(len(predictions)):
    image_path = ground_truths[i]["file_name"]
    image = cv2.imread(image_path)

    v_pred = Visualizer(image[:, :, ::-1], metadata=MetadataCatalog.get(val_dataset_name), scale=0.8)
    v_pred = v_pred.draw_instance_predictions(predictions[i])

    v_gt = Visualizer(image[:, :, ::-1], metadata=MetadataCatalog.get(val_dataset_name), scale=0.8)
    v_gt = v_gt.draw_dataset_dict(ground_truths[i])

    # Plot the predicted and ground truth instances side by side
    plt.figure(figsize=(15, 8))
    plt.subplot(1, 2, 1)
    plt.imshow(v_pred.get_image())  # Visualizer returns RGB, as expected by matplotlib
    plt.title("Predicted")
    plt.axis("off")

    plt.subplot(1, 2, 2)
    plt.imshow(v_gt.get_image())
    plt.title("Ground Truth")
    plt.axis("off")

    plt.show()

### Test with the test set

Test the neural network on the test set (or the validation set). Note that the test set does not contain saturated filters. 

The APs is approximately 18 on the validation set (many images of both filter types) and 25 on the test set, which contains clean filters only. 

### Test with the saturated-only dataset

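Here we use the V3 set of saturated particles (`test_sat_particles`) to test the performance of our model. Note that this set is registered from the augmented V3 export (`download_version2_augmented`). 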

In [None]:
# Load the evaluation dataset
val_dataset_name = "test_sat_particles"
val_dataset_dicts = DatasetCatalog.get(val_dataset_name)

# Collect predictions and ground truths for the plots below
predictions = []
ground_truths = []
for d in val_dataset_dicts:
    image_path = d["file_name"]
    image = cv2.imread(image_path)
    outputs = predictor(image)

    instances = outputs["instances"].to("cpu")
    predictions.append(instances)
    ground_truths.append(d)  # Append the whole dictionary, containing "file_name" and "annotations"

    
evaluator = COCOEvaluator(val_dataset_name, cfg, False, output_dir=cfg.OUTPUT_DIR)
val_loader = build_detection_test_loader(cfg, val_dataset_name)

# Perform evaluation using the predictor and evaluator
metrics = inference_on_dataset(predictor.model, val_loader, evaluator)

print(metrics)

In [None]:
# Plot every image with its predictions next to the ground truth

for i in range(len(predictions)):
    image_path = ground_truths[i]["file_name"]
    image = cv2.imread(image_path)

    v_pred = Visualizer(image[:, :, ::-1], metadata=MetadataCatalog.get(val_dataset_name), scale=0.8)
    v_pred = v_pred.draw_instance_predictions(predictions[i])

    v_gt = Visualizer(image[:, :, ::-1], metadata=MetadataCatalog.get(val_dataset_name), scale=0.8)
    v_gt = v_gt.draw_dataset_dict(ground_truths[i])

    # Plot the predicted and ground truth instances side by side
    plt.figure(figsize=(15, 8))
    plt.subplot(1, 2, 1)
    plt.imshow(v_pred.get_image())  # Visualizer returns RGB, as expected by matplotlib
    plt.title("Predicted")
    plt.axis("off")

    plt.subplot(1, 2, 2)
    plt.imshow(v_gt.get_image())
    plt.title("Ground Truth")
    plt.axis("off")

    plt.show()

An APs of around 11 indicates that the model performs poorly on saturated filters. However, given how subjective the detection is, and since the overall purpose is to scan almost random points rather than the whole filter, the model remains useful. The primary goal is to obtain a selection of points on the filter, instead of the random detections produced by the first version of the neural network, which only worked on clean filters. Secondly, Gepard, the end user of this NN, allows manual correction and maximum size selection, which makes the small mistakes and inconsistencies of this neural network secondary. 

## Metrics

To assess the quality of the NN, let's compare the predictions with the ground truth. The metrics reported by Detectron2 are the following: 


1. AP (Average Precision):

    AP stands for Average Precision, a common metric for the accuracy of object detection models. The detections are ranked by confidence to build a precision-recall curve, and AP is the area under that curve, so it reflects performance across all confidence thresholds. In the COCO evaluation used by Detectron2, AP is additionally averaged over ten IoU thresholds (0.50, 0.55, ..., 0.95) and reported on a 0-100 scale.

2. AP50 (Average Precision at 50% IoU):

    AP50 is the Average Precision computed at an Intersection over Union (IoU) threshold of 0.5: a prediction matches a ground-truth object only if the area of their intersection is at least half the area of their union. Detectron2 reports the metrics separately for bounding boxes (`bbox`) and, for Mask R-CNN, for masks (`segm`).

3. AP75 (Average Precision at 75% IoU):

    AP75 is the Average Precision computed at an IoU threshold of 0.75, a stricter criterion that rewards precise localization of the predicted boxes or masks.

4. APs (Average Precision for Small objects):

    APs is the Average Precision computed only over small objects, i.e. objects with an area below 32² = 1024 pixels in the COCO definition.

5. APm (Average Precision for Medium objects):

    APm is the Average Precision computed only over medium-sized objects, with an area between 32² and 96² = 9216 pixels.

6. APl (Average Precision for Large objects):

    APl is the Average Precision computed only over large objects, with an area above 96² pixels.
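AP50 and AP75 both hinge on the IoU between a prediction and a ground-truth object. A minimal sketch for axis-aligned boxes given as `(x0, y0, x1, y1)` in pixels (an assumed format; for the `segm` metrics the same ratio is computed over mask pixels instead):

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (x0, y0, x1, y1)."""
    # Overlap rectangle (zero width or height if the boxes do not intersect).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# A prediction shifted by 2 px against a 10x10 ground-truth box.
gt = (0, 0, 10, 10)
pred = (2, 0, 12, 10)
iou = box_iou(gt, pred)  # 80 / 120
print(f"IoU = {iou:.3f}")                # IoU = 0.667
print("counts for AP50:", iou >= 0.5)    # True
print("counts for AP75:", iou >= 0.75)   # False
```

The example shows why small particles are hard: a 2-pixel offset on a 10-pixel particle is already enough to fail the AP75 criterion.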

In summary, these scores provide insights into the object detection model's performance across different aspects, such as overall AP, performance at different IoU thresholds, and performance on objects of different sizes. The higher the values, the better the model's performance for each of these metrics.

APs (Average Precision for Small objects) focuses on measuring how well the model detects and localizes small objects, which are generally more challenging to detect due to their limited spatial extent and lower visual prominence compared to larger objects.
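To see what share of the particles in a split is actually scored by APs, the annotation areas can be bucketed with the COCO size thresholds. A minimal sketch with hypothetical helpers `coco_size_bucket` and `bucket_counts`; note that pycocotools counts an object lying exactly on a boundary in both neighbouring ranges, whereas this sketch assigns it to the smaller one.

```python
SMALL_MAX = 32 ** 2   # 1024 px
MEDIUM_MAX = 96 ** 2  # 9216 px


def coco_size_bucket(area):
    """Return 'small', 'medium' or 'large' for an object area in pixels."""
    if area <= SMALL_MAX:
        return "small"
    if area <= MEDIUM_MAX:
        return "medium"
    return "large"


def bucket_counts(areas):
    """Count how many objects fall into each COCO size range."""
    counts = {"small": 0, "medium": 0, "large": 0}
    for area in areas:
        counts[coco_size_bucket(area)] += 1
    return counts


# Hypothetical particles of 20x20, 50x50 and 120x120 pixels.
print(bucket_counts([20 * 20, 50 * 50, 120 * 120]))
# {'small': 1, 'medium': 1, 'large': 1}
```

Applied to the `area` field of the entries in the `annotations` list of a COCO JSON file (assuming the export fills this standard field), this gives the fraction of particles that APs, APm and APl each cover.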

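To compute APs, ground-truth objects outside the small range are ignored, as are predictions matched to them and unmatched predictions that are themselves outside the range. The remaining predictions are sorted by confidence and greedily matched to ground-truth objects: a prediction is a true positive if its IoU with a not-yet-matched ground truth reaches the threshold, and a false positive otherwise. As for AP, the COCO evaluation averages the result over the IoU thresholds 0.50 to 0.95, not only 0.5.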
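A simplified sketch of this procedure for a single image and a single class, written with NumPy. It mimics pycocotools' greedy matching and 101-point interpolated precision, but ignores crowd annotations, the maximum-detections cap and area filtering, so it will not reproduce Detectron2's numbers exactly. `average_precision` is a hypothetical helper, not a library function.

```python
import numpy as np


def _iou(a, b):
    # Boxes as (x0, y0, x1, y1).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(pred_boxes, scores, gt_boxes, iou_thr=0.5):
    """Simplified COCO-style AP for one class at one IoU threshold."""
    if len(gt_boxes) == 0:
        return 0.0
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    matched = [False] * len(gt_boxes)
    tp = np.zeros(len(order))
    for rank, idx in enumerate(order):
        # Greedily match to the best still-unmatched ground truth with IoU >= iou_thr.
        best_iou, best_gt = iou_thr, -1
        for g, gt in enumerate(gt_boxes):
            if matched[g]:
                continue
            iou = _iou(pred_boxes[idx], gt)
            if iou >= best_iou:
                best_iou, best_gt = iou, g
        if best_gt >= 0:
            matched[best_gt] = True
            tp[rank] = 1
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1 - tp)
    recall = cum_tp / len(gt_boxes)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)
    # Precision envelope: make precision non-increasing with recall.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Sample precision at the 101 recall points 0.00, 0.01, ..., 1.00.
    sampled = []
    for r in np.linspace(0.0, 1.0, 101):
        i = np.searchsorted(recall, r, side="left")
        sampled.append(precision[i] if i < len(precision) else 0.0)
    return float(np.mean(sampled))


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0, 0, 10, 10), (50, 50, 60, 60), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]  # the middle prediction is a false positive
print(average_precision(preds, scores, gt))  # ≈ 0.835
# COCO-style AP: mean over the ten IoU thresholds 0.50, 0.55, ..., 0.95.
print(np.mean([average_precision(preds, scores, gt, t) for t in np.linspace(0.5, 0.95, 10)]))
```

The false positive ranked between the two correct detections is what pulls the score below 1.0: precision drops to 2/3 for the second half of the recall range.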

A comparison of good and bad scores for APs:

Good APs Score:
A good APs score is close to 1.0 (or 100 on Detectron2's 0-100 scale), indicating that the model is very effective at detecting and localizing small objects. It means that precision stays high over the whole recall range: the model finds most small objects with very few false positives.

Bad APs Score:
A bad APs score is close to 0.0, indicating poor performance in detecting and localizing small objects. The model then produces numerous false positives, misses many small objects, or both.




### Analysis

The APs of 18.4 is relatively low.

A more in-depth analysis and fine-tuning follow in the next notebook. 

# Square 256x256 images

Train a new NN with square images 



In [None]:
cfg = get_cfg()
cfg.OUTPUT_DIR = "./TrainDetectron2ModelSquare"
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

cfg.merge_from_file(
    "../../Other/detectron2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
    #"../../Other/detectron2/configs/Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml"
)  # Path to the local detectron2 checkout (detectron2 is installed in more than one place)

# Specify the train, test, and validation subsets in your Detectron2 configuration
cfg.DATASETS.TRAIN = ("train_particles",)
cfg.DATASETS.TEST = ("test_particles",)  # Possible to later change the test set
cfg.DATASETS.VAL = ("val_particles",)  # Custom key: DefaultTrainer ignores it and only evaluates on DATASETS.TEST
cfg.DATALOADER.NUM_WORKERS = 8

# Before training
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"  # initialize from model zoo
cfg.MODEL.DEVICE = 'cuda:0'  # "cuda:0-1" is not a valid device; multi-GPU training needs detectron2.engine.launch
cfg.SOLVER.IMS_PER_BATCH = 16
cfg.SOLVER.BASE_LR = 0.005
cfg.SOLVER.MAX_ITER = (
    300
)  # 300 iterations seem good enough, but you can certainly train longer
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = (
    512
)  # 512 is the detectron2 default; smaller values (e.g. 128) train faster
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # 1 class: particle


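# Detectron2 has no INPUT.AUG.AUGMENTATIONS_PER_BATCH key: cfg.INPUT.AUG does not exist and
# accessing it raises an AttributeError. Augmentation is controlled by INPUT.* keys
# (e.g. INPUT.RANDOM_FLIP) or by a custom DatasetMapper.
# cfg.INPUT.AUG.AUGMENTATIONS_PER_BATCH = 6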
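With 256x256 inputs, resizing matters: detectron2 resizes every image with a shortest-edge resize (`ResizeShortestEdge`) driven by `cfg.INPUT.MIN_SIZE_*` and `cfg.INPUT.MAX_SIZE_*` before it reaches the network. The sketch below is our own pure-Python re-statement of that output-shape arithmetic, not an import of the library, and it assumes the detectron2 test-time defaults `INPUT.MIN_SIZE_TEST = 800` and `INPUT.MAX_SIZE_TEST = 1333`.

```python
def resized_shape(h, w, short_edge, max_size):
    """Output (height, width) of a shortest-edge resize capped at max_size.

    Reimplementation of the arithmetic of detectron2's ResizeShortestEdge,
    written out here for illustration.
    """
    scale = short_edge / min(h, w)
    if h < w:
        new_h, new_w = short_edge, scale * w
    else:
        new_h, new_w = scale * h, short_edge
    # Shrink again if the longer side would exceed max_size.
    if max(new_h, new_w) > max_size:
        scale = max_size / max(new_h, new_w)
        new_h, new_w = new_h * scale, new_w * scale
    return int(new_h + 0.5), int(new_w + 0.5)


# Assumed detectron2 test-time defaults: MIN_SIZE_TEST = 800, MAX_SIZE_TEST = 1333.
print(resized_shape(256, 256, 800, 1333))  # (800, 800): square tiles upscaled 3.125x
print(resized_shape(256, 256, 256, 256))   # (256, 256): native resolution kept
```

If upscaling is not wanted, setting `cfg.INPUT.MIN_SIZE_TRAIN = (256,)`, `cfg.INPUT.MAX_SIZE_TRAIN = 256`, `cfg.INPUT.MIN_SIZE_TEST = 256` and `cfg.INPUT.MAX_SIZE_TEST = 256` should keep the native resolution; whether this helps or hurts APs on these particles has not been tested here.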
