# Farm ponds identification pipeline: Validating the results
Before you start this notebook, make sure that you have:
- Pretrained model configuration: ```config.pkl``` in the [parameters folder](../../output/parameters/)
- Tiles of the satellite image: ```tile_x_y.png``` in [train](../../data/train/) and [val](../../data/val/) folders
- Tiles of the masks: ```tile_x_y.png``` in [train_mask](../../data/train_mask/) and [val_mask](../../data/val_mask/) folders

The tiles and the masks should follow the naming scheme ```tile_x_y.png```, where ```(x,y)``` are the coordinates of the tile's top-left corner in the source image (in pixels).
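
As a rough illustration of the naming scheme, the sketch below recovers the top-left coordinates from a tile file name. The names `TILE_PATTERN` and `parse_tile_name` are hypothetical and are not part of the pipeline's own helpers; the sketch also assumes that both coordinates are non-negative integers.

```python
import re

# Hypothetical pattern for the tile naming scheme: tile_<x>_<y>.png
TILE_PATTERN = re.compile(r"^tile_(\d+)_(\d+)\.png$")


def parse_tile_name(filename):
    """Return the (x, y) top-left pixel coordinates encoded in a tile file name."""
    match = TILE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Not a valid tile name: {filename!r}")
    return int(match.group(1)), int(match.group(2))


x, y = parse_tile_name("tile_512_1024.png")
print(x, y)
```

A check like this can be run over the train and val folders before training to catch misnamed tiles early.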

## Install the packages for the pipeline
Make sure your environment is set up so that the packages used in this notebook can be imported. Check out the [setup info](../../README.md). Training the instance segmentation model requires ```torch``` and ```detectron2```, so these libraries need to be installed first. You can skip this step if you already installed ```detectron2``` in the previous steps.


In [None]:
!python3 -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

## Import the libraries that are used in the pipeline
Some of the libraries and requirements used in this pipeline are based on the Detectron2 documentation. You can find more details on the pre-trained models in [Meta Research's GitHub repository](https://github.com/facebookresearch/detectron2).

In [None]:
import torch, detectron2

# Setup detectron2 logger
from detectron2.utils.logger import setup_logger
setup_logger()

import numpy as np
import os, json, cv2, random, sys, re
import matplotlib.pyplot as plt
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.engine import DefaultTrainer
from detectron2.data import transforms as T
from detectron2.data import DatasetMapper, build_detection_train_loader
import cloudpickle
from detectron2.config import CfgNode
from detectron2.utils.visualizer import ColorMode
from detectron2.data.datasets import register_coco_instances
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader
from detectron2.projects import point_rend  # only needed for PointRend models; comment out otherwise


TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)
print("detectron2:", detectron2.__version__)

### Setting up the folder paths and parameters
You don't need to change the folder paths; missing folders are created if they do not exist. The data produced in the pipeline will be stored in the corresponding folders (e.g. training data in [train](./ponds/data/train/)). Refer to the [ponds README.md](./ponds/README.md) to see the structure of the folders in the package.


In [None]:
# Paths
ponds_root = os.path.dirname(os.path.dirname(os.getcwd()))
if ponds_root not in sys.path:
    sys.path.append(ponds_root)
train_image_path = os.path.join(ponds_root, "data/train.png")  # Path to the input image
train_mask_path =  os.path.join(ponds_root,"data/train_mask.png")
train_folder =  os.path.join(ponds_root,"data/train/")  # Output folder for tiles
train_mask_folder =  os.path.join(ponds_root,"data/train_mask/")
train_not_used_folder =  os.path.join(ponds_root,"data/train_not_used/")
val_folder =  os.path.join(ponds_root,"data/val/")
val_mask_folder =   os.path.join(ponds_root,"data/val_mask/")
output_folder = os.path.join(ponds_root, "output/")
parameter_folder = os.path.join(output_folder, "parameters/")
model_folder = os.path.join(output_folder, "model/")
performance_folder = os.path.join(output_folder, "performance/")
inference_path = os.path.join(performance_folder, "inference.txt")


In [None]:
from utils import helpers as hp
register_coco_instances("pond_train", {}, os.path.join(train_folder,"train.json"), train_folder)
register_coco_instances("pond_val", {}, os.path.join(val_folder,"val.json"), val_folder)
pond_metadata = MetadataCatalog.get("pond_train")

## Inference & evaluation using the trained model




In [None]:
cfg_path = os.path.join(parameter_folder, "config.pkl")
cfg = hp.load_from_cloudpickle(cfg_path)
point_rend.add_pointrend_config(cfg)  # only needed for PointRend models; comment out otherwise
cfg.OUTPUT_DIR = model_folder
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model.pth")  # path to the trained model 
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set a custom testing threshold
predictor = DefaultPredictor(cfg)

### Run multiple validations

In [None]:
def count_pth_files(directory):
    # This regular expression matches files with the format 'model_{number}.pth'
    pattern = re.compile(r'model_\d+\.pth$')
    # List all files in the given directory
    files = os.listdir(directory)
    # Filter files based on the pattern
    matched_files = [file for file in files if pattern.match(file)]
    # Return the count of matched files
    return len(matched_files)


count = count_pth_files(model_folder)
print(f"Number of matching .pth files: {count}")

In [None]:
with open(inference_path, "w") as file:
    for num in range(1, count+1):
        cfg_path = os.path.join(parameter_folder, f"config_{num}.pkl")
        cfg = hp.load_from_cloudpickle(cfg_path)  # the loader lives in the helpers module imported as hp
        #point_rend.add_pointrend_config(cfg)  # uncomment if changing the model to PointRend
        cfg.MODEL.WEIGHTS =  os.path.join(model_folder, f"model_{num}.pth")  # path to the trained model 
        cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set a custom testing threshold
        predictor = DefaultPredictor(cfg)
        evaluator = COCOEvaluator("pond_val", output_dir=performance_folder)
        val_loader = build_detection_test_loader(cfg, "pond_val")
        results = inference_on_dataset(predictor.model, val_loader, evaluator)
        file.write(f"model_{num}.pth\n")
        file.write(json.dumps(results, indent=4))
        file.write("end\n\n")
        print(results)

## Evaluation of the Performance on the validation data

Performance is evaluated with the AP (average precision) metric as implemented in the COCO API.
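
COCO AP is built on the intersection over union (IoU) between predicted and ground-truth regions, averaged over IoU thresholds from 0.50 to 0.95. The sketch below illustrates only that underlying overlap measure for two binary masks. It is not the COCO API implementation, and the function name `mask_iou` and the toy masks are made up for this example.

```python
def mask_iou(pred, truth):
    """IoU of two same-shaped binary masks given as nested lists of 0/1 values."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += bool(p and t)  # pixel set in both masks
            union += bool(p or t)          # pixel set in at least one mask
    # Two empty masks have no overlap to measure; report 0.0 by convention here
    return intersection / union if union else 0.0


# Toy example: two 2x2 squares shifted by one column
pred = [[1, 1, 0],
        [1, 1, 0]]
truth = [[0, 1, 1],
         [0, 1, 1]]
print(f"IoU = {mask_iou(pred, truth):.3f}")
```

A prediction counts as a match at a given threshold only if its IoU with a ground-truth pond reaches that threshold, which is why small boundary errors lower AP at the stricter thresholds.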

In [None]:
evaluator = COCOEvaluator("pond_val", output_dir=performance_folder)
val_loader = build_detection_test_loader(cfg, "pond_val")
results = inference_on_dataset(predictor.model, val_loader, evaluator)
print(results)

### Pick the best-performing model out of the trained models

In [None]:
def find_best_model_and_config(filename):
    highest_ap = -1
    best_model = None
    best_config = None

    try:
        # Open the file for reading
        with open(filename, 'r') as file:
            content = file.read()

        # Regular expression to match each model's data block
        pattern = re.compile(r'(model_(\d+)\.pth)(.*?)}end', re.DOTALL)
        matches = pattern.finditer(content)

        # Process each match
        for match in matches:
            model_name = match.group(1)
            model_number = match.group(2)  # This captures the number in 'model_number.pth'
            json_data = match.group(3) + '}'

            try:
                data = json.loads(json_data.strip())
                # Extract the segm AP value
                segm_ap = data['segm']['AP']
                
                # Update the best model if the current AP is higher than the highest found so far
                if segm_ap > highest_ap:
                    highest_ap = segm_ap
                    best_model = model_name
                    best_config = f"config_{model_number}.pkl"  # Construct the config filename
            except json.JSONDecodeError:
                print(f"JSON decode error in block for {model_name}. Skipping this block.")

    except FileNotFoundError:
        print("The file was not found. Check your file path.")
    except Exception as e:
        print("An unexpected error occurred:", str(e))

    return best_model, highest_ap, best_config

# Specify the path to your text file containing the inference scores
best_model, highest_ap, best_config = find_best_model_and_config(inference_path)
if best_model:
    print(f"The model with the highest segmentation AP score is {best_model} with an AP of {highest_ap:.2f}")

    # Record the paths of the best model and its config.
    # This is done only when a model was found, since os.path.join fails on None.
    best_performance_path = os.path.join(performance_folder, "best_performance.txt")
    with open(best_performance_path, "w") as file:
        best_model_path = os.path.join(model_folder, best_model)
        best_config_path = os.path.join(parameter_folder, best_config)
        file.write(f"{best_model_path}\n{best_config_path}")
else:
    print("No valid model data was found.")
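
The file written above holds two lines: the path to the best model weights, then the path to its config. A later step could read it back roughly as sketched below. The helper name `read_best_performance` and the example path are hypothetical and are not part of the pipeline's own helpers.

```python
def read_best_performance(path):
    """Return (model_path, config_path) from a two-line best_performance.txt file."""
    with open(path, "r") as file:
        # Drop blank lines and surrounding whitespace
        lines = [line.strip() for line in file if line.strip()]
    if len(lines) != 2:
        raise ValueError(f"Expected 2 non-empty lines in {path}, found {len(lines)}")
    model_path, config_path = lines
    return model_path, config_path


# Example usage (assumes the file exists at this hypothetical location):
# model_path, config_path = read_best_performance("output/performance/best_performance.txt")
```

Validating the line count makes a missing or truncated file fail loudly rather than silently loading the wrong weights.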

In [None]:
cfg_path = os.path.join(parameter_folder, best_config)
cfg = hp.load_from_cloudpickle(cfg_path)
cfg.MODEL.WEIGHTS =  os.path.join(model_folder, best_model)  # path to the trained model 
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set a custom testing threshold
predictor = DefaultPredictor(cfg)

Randomly select three samples from the val folder and visualize the prediction results.

In [None]:
dataset_dicts = DatasetCatalog.get("pond_val")
for d in random.sample(list(dataset_dicts), 3):
    print(d["file_name"])
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)
    # Compute the area (in pixels) of each predicted mask
    masks = outputs['instances'].pred_masks
    print(masks.shape)
    print(torch.sum(torch.flatten(masks, start_dim=1),dim=1))
  
    v = Visualizer(im[:, :, ::-1],
                   metadata=pond_metadata,
                   scale=0.5,
                   instance_mode=ColorMode.IMAGE_BW   
    )
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    plt.imshow(out.get_image()[:, :, ::-1])
    plt.show()

In [None]:
print(cfg)