# Preprocessing of images (whole process) for a single image

<font color='red'>THIS SCRIPT ONLY WORKS IF THE BOULDER SHAPEFILES CONTAIN NO MULTIPOLYGONS. PLEASE CHECK THIS WITH THE CHECK VALIDITY TOOL IN QGIS.</font>

In [1]:
import json
import geopandas as gpd
import numpy as np
import pandas as pd
import rasterio as rio
import sys

sys.path.append("/home/nilscp/GIT/BOULDERING/tools/graticule")
sys.path.append("/home/nilscp/GIT/")

#from detectron2.data import MetadataCatalog, DatasetCatalog
from graticule import grid
from MLtools import create_annotations
from pathlib import Path
from rastertools import raster
from shptools import polygon
from affine import Affine
from tqdm import tqdm
from shapely import geometry



## 1. Tiling of images from raster (global)

This part can be skipped if the grid was generated in QGIS: 
1. Processing/Toolbox/Create Grid/
2. Set Grid type to Rectangle (Polygon)
3. Grid extent is set to the extent of the raster
4. Horizontal/Vertical spacing is in meters. We want tiles of 500x500 px, so multiply the resolution of the image (in meters per pixel) by 500
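
As a quick check of step 4, the spacing can be computed directly. This is only a minimal sketch: `grid_spacing_m` is a hypothetical helper (it is not part of `create_annotations`), and the resolution is the pixel size that appears in the `transform` column further down in this notebook for M139694087LE.

```python
def grid_spacing_m(resolution_m_per_px: float, tile_size_px: int = 500) -> float:
    """Return the QGIS grid spacing (in meters) for square tiles of tile_size_px pixels."""
    return resolution_m_per_px * tile_size_px


# Pixel size of M139694087LE.tif as listed in the `transform` column (assumed to be in m/px).
resolution = 0.48357597846618
spacing = grid_spacing_m(resolution)
print(f"Horizontal/Vertical spacing: {spacing:.3f} m")  # Horizontal/Vertical spacing: 241.788 m
```

The same value is then entered for both the horizontal and the vertical spacing in the Create Grid dialog.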

**If you want to do it in Python with the script I have written, see below.**

### a) Generate a global rectangular grid (polygon) from a single raster
I know that you have done this step already, but I would like you to rerun it so that you get the `tile_id` column.

<font color='red'>-------INPUTS-------</font>

In [2]:
in_raster = Path("/home/nilscp/BOULDERING/qgis/moon/censorinus/NAC/reference-for-coreg/M139694087LE.tif")
block_width = 500 # in pixels
block_height = 500 # in pixels

<font color='red'>-------OUTPUT-------</font>

In [3]:
graticule_name = Path('/home/nilscp/tmp/preprocessing/shapefiles/Censorinus-M139694087LE-v001-global-tiles.shp')

In [9]:
(df_global, gdf_global) = create_annotations.generate_graticule_from_raster(in_raster, block_width, block_height, graticule_name)

pickle /home/nilscp/tmp/preprocessing/shapefiles/Censorinus-M139694087LE-v001-global-tiles.pkl has been generated
shapefile /home/nilscp/tmp/preprocessing/shapefiles/Censorinus-M139694087LE-v001-global-tiles.shp has been generated


### b) Generate a global rectangular grid (polygon) from multiple rasters
Just loop through the code above. Here is an example in which the same raster and the same graticule are simply repeated in each list:
```python
# The same raster and graticule are repeated here purely as placeholders.
# In practice, use one distinct output path per raster so that nothing is overwritten.
in_rasters = [Path("/home/nilscp/BOULDERING/qgis/moon/censorinus/NAC/reference-for-coreg/M139694087LE.tif"),
              Path("/home/nilscp/BOULDERING/qgis/moon/censorinus/NAC/reference-for-coreg/M139694087LE.tif")]

graticule_names = [Path("/home/nilscp/tmp/preprocessing/global-graticule.shp"),
                   Path("/home/nilscp/tmp/preprocessing/global-graticule.shp")]

# Keep the outputs of every raster instead of overwriting them at each iteration.
df_globals, gdf_globals = [], []
for in_raster, graticule_name in zip(in_rasters, graticule_names):
    (df_global, gdf_global) = create_annotations.generate_graticule_from_raster(in_raster, block_width, block_height, graticule_name)
    df_globals.append(df_global)
    gdf_globals.append(gdf_global)
```
NB! This is a Markdown cell, so the code above is not executed.

## 2. Selection of the 500x500 pixel tiles you are working with

<font color='red'>-------MANUAL STEP REQUIRED HERE-------</font>

This is done in QGIS. You just have to highlight the boxes you want to work with, and then:
1. Right-click on your shapefile
2. Export
3. Save Selected Features As
4. Enter a filename

## 3. Clipping boulders that intersect the selected rectangular grid(s) / graticules

After a bit of thinking, I would like to:
1. clip the boulders to the selected graticule(s)
2. filter out boulders that have an area below the `min_area_threshold`
3. explode MultiPolygons (and only keep the parts above the `min_area_threshold`)
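
The area filter in step 2 can be illustrated in plain Python. This is a sketch under stated assumptions, not the code inside `create_annotations.clip_boulders` (which works on GeoDataFrames): `polygon_area` and `filter_by_area` are hypothetical helpers, the two outlines are made-up squares, and the resolution is the pixel size of M139694087LE.tif as it appears in the `transform` column later in this notebook.

```python
def polygon_area(vertices):
    """Area of a simple polygon given its (x, y) vertices (shoelace formula)."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def filter_by_area(polygons, min_area):
    """Keep only the polygons whose area is at least min_area."""
    return [p for p in polygons if polygon_area(p) >= min_area]


res = 0.48357597846618                   # assumed pixel size in meters
min_area_threshold = (res * 6.0) ** 2.0  # a square of about 6 x 6 pixels, roughly 8.4 m^2

# Hypothetical boulder outlines in map units (meters).
big = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]    # 16 m^2, kept
small = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]  # 4 m^2, filtered out

kept = filter_by_area([big, small], min_area_threshold)
print(len(kept), "of 2 boulders kept")
```

The same test is applied to each part of an exploded MultiPolygon in step 3, so a MultiPolygon can contribute zero, one, or several boulders.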

### a) Example for a single image

In [4]:
global_tiles_pickle = Path('/home/nilscp/tmp/preprocessing/shapefiles/Censorinus-M139694087LE-v001-global-tiles.pkl')
selection_tiles_shapefile = global_tiles_pickle.with_name('Censorinus-M139694087LE-v001-selection-tiles.shp')
out_selection_tiles_pickle = global_tiles_pickle.with_name('Censorinus-M139694087LE-v001-selection-tiles.pkl')
boulders_shapefile = '/home/nilscp/BOULDERING/qgis/moon/censorinus/shapefiles/boulder_population/krishna-kumar/shp/M139694087LE-nils.shp'
out_selection_boulders_shapefile = global_tiles_pickle.with_name('Censorinus-M139694087LE-v001-selection-boulders.shp')
in_raster = Path("/home/nilscp/BOULDERING/qgis/moon/censorinus/NAC/reference-for-coreg/M139694087LE.tif")
res = raster.get_raster_resolution(in_raster)[0]
min_area_threshold = (res * 6.0)**2.0 # we do not take boulders smaller than about 6 times the resolution (assuming a square)

In [5]:
gdf_boulders, gdf_selection_tiles_updated, df_selection_tiles = create_annotations.clip_boulders(boulders_shapefile, selection_tiles_shapefile, min_area_threshold, global_tiles_pickle, out_selection_tiles_pickle, out_selection_boulders_shapefile)

100%|██████████| 161/161 [00:12<00:00, 12.44it/s]


1 MultiPolygon(s) was/were removed
2 exploded MultiPolygon(s) was/were added
shapefile /home/nilscp/tmp/preprocessing/shapefiles/Censorinus-M139694087LE-v001-selection-boulders.shp has been generated
shapefile /home/nilscp/tmp/preprocessing/shapefiles/Censorinus-M139694087LE-v001-selection-tiles.shp has been overwritten (graticules/rectangle grids with no boulder occurences are deleted)


In [6]:
gdf_boulders.shape, gdf_selection_tiles_updated.shape, df_selection_tiles.shape

((6038, 7), (161, 4), (161, 9))

### b) Example for multiple images
Here, I have just made two different selections of graticules for the same image.

In [7]:
global_tiles_pickle = Path('/home/nilscp/tmp/preprocessing/shapefiles/Censorinus-M139694087LE-v001-global-tiles.pkl')
selection_tiles_shapefile = global_tiles_pickle.with_name('Censorinus-M139694087LE-v001-selection2-tiles.shp')
out_selection_tiles_pickle = global_tiles_pickle.with_name('Censorinus-M139694087LE-v001-selection2-tiles.pkl')
boulders_shapefile = '/home/nilscp/BOULDERING/qgis/moon/censorinus/shapefiles/boulder_population/krishna-kumar/shp/M139694087LE-nils.shp'
out_selection_boulders_shapefile = global_tiles_pickle.with_name('Censorinus-M139694087LE-v001-selection2-boulders.shp')
in_raster = Path("/home/nilscp/BOULDERING/qgis/moon/censorinus/NAC/reference-for-coreg/M139694087LE.tif")
res = raster.get_raster_resolution(in_raster)[0]
min_area_threshold = (res * 6.0)**2.0 # we do not take boulders smaller than about 6 times the resolution (assuming a square)

In [8]:
gdf_boulders2, gdf_selection_tiles_updated2, df_selection_tiles2 = create_annotations.clip_boulders(boulders_shapefile, selection_tiles_shapefile, min_area_threshold, global_tiles_pickle, out_selection_tiles_pickle, out_selection_boulders_shapefile)

100%|██████████| 128/128 [00:10<00:00, 12.11it/s]


shapefile /home/nilscp/tmp/preprocessing/shapefiles/Censorinus-M139694087LE-v001-selection2-boulders.shp has been generated
shapefile /home/nilscp/tmp/preprocessing/shapefiles/Censorinus-M139694087LE-v001-selection2-tiles.shp has been overwritten (graticules/rectangle grids with no boulder occurences are deleted)


## 4. Merging if working with multiple images (most likely)

In [9]:
frames = [gdf_boulders, gdf_boulders2]
df_boulders_all_images = create_annotations.merge_dataframes(frames)

frames = [gdf_selection_tiles_updated, gdf_selection_tiles_updated2]
gdf_selection_tiles_all_images = create_annotations.merge_dataframes(frames)

frames = [df_selection_tiles, df_selection_tiles2]
df_selection_tiles_all_images = create_annotations.merge_dataframes(frames)

In [10]:
gdf_boulders.shape[0] + gdf_boulders2.shape[0] == df_boulders_all_images.shape[0] # ok it is working correctly

True

In [11]:
split = (0.6, 0.2, 0.2)

In [12]:
out_shapefile = global_tiles_pickle.with_name('Censorinus-M139694087LE-v001-selection-tiles-with-status.shp')

In [13]:
(df_selection_tiles_all_images, gdf_selection_tiles_all_images) = create_annotations.split_global(df_selection_tiles_all_images, gdf_selection_tiles_all_images, split, out_shapefile)

shapefile /home/nilscp/tmp/preprocessing/shapefiles/Censorinus-M139694087LE-v001-selection-tiles-with-status.shp has been generated


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


#### A `dataset` column was added

In [14]:
df_selection_tiles_all_images.columns

Index(['NAC_id', 'tile_id', 'file_name', 'raster_ap', 'raster_rp', 'rwindows',
       'transform', 'bbox_im', 'coord_sys', 'dataset'],
      dtype='object')

## 5. Tiling of images (tif, 1- and 3-band pngs)

In [15]:
dataset_directory = Path("/home/nilscp/tmp/preprocessing/preprocessing")

In [16]:
create_annotations.tiling_raster_from_dataframe(df_selection_tiles_all_images, dataset_directory, block_width, block_height)

100%|██████████| 289/289 [00:57<00:00,  5.07it/s]


#### Let's check if the 60%, 20%, 20% distribution has been respected

In [16]:
fo = ["train", "validation", "test"]
p = []

for f in fo:
    p.append(len(list((dataset_directory / f / "images").rglob("*image.png"))))
    
p = np.array(p)
p[0] / p.sum(), p[1] / p.sum(), p[2] / p.sum()

(0.5986159169550173, 0.20069204152249134, 0.20069204152249134)
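
The fractions are not exactly 0.6/0.2/0.2 because 289 tiles cannot be divided in those proportions; the values above correspond to 173, 58 and 58 tiles. The sketch below shows one way such a split could be produced. It is an assumption for illustration only and not the actual logic of `create_annotations.split_global`; `split_tiles` and the `seed` argument are hypothetical.

```python
import random


def split_tiles(tile_ids, split=(0.6, 0.2, 0.2), seed=0):
    """Shuffle tile ids and cut them into train/validation/test subsets."""
    ids = list(tile_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n_train = round(split[0] * len(ids))
    n_validation = round(split[1] * len(ids))
    return {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_validation],
        "test": ids[n_train + n_validation:],  # whatever is left
    }


parts = split_tiles(range(289))
counts = {name: len(ids) for name, ids in parts.items()}
print(counts)                  # {'train': 173, 'validation': 58, 'test': 58}
print(counts["train"] / 289)   # 0.5986159169550173
```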

## 6. Selecting train, validation and test datasets

In [17]:
(df_selection_boulders_train, df_selection_boulders_validation, df_selection_boulders_test) = create_annotations.selection_boulders(df_boulders_all_images, df_selection_tiles_all_images)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super(GeoDataFrame, self).__setitem__(key, value)


In [18]:
df_selection_boulders_train.shape, df_selection_boulders_validation.shape, df_selection_boulders_test.shape

((8189, 8), (2122, 8), (2343, 8))

## 7. Generate image annotations (as a dataframe)

In [19]:
train_image_annotations_pickle = global_tiles_pickle.with_name('Censorinus-M139694087LE-v001-train-image-annotations.pkl')
validation_image_annotations_pickle = global_tiles_pickle.with_name('Censorinus-M139694087LE-v001-validation-image-annotations.pkl')
test_image_annotations_pickle = global_tiles_pickle.with_name('Censorinus-M139694087LE-v001-test-image-annotations.pkl')

In [20]:
df_selection_tiles_all_images

| | NAC_id | tile_id | file_name | raster_ap | raster_rp | rwindows | transform | bbox_im | coord_sys | dataset |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M139694087LE | 30 | M139694087LE_0030_image.png | /home/nilscp/BOULDERING/qgis/moon/censorinus/N... | M139694087LE.tif | [0, 15000, 500, 500] | [0.48357597846618, 0.0, -1227.3158333472, 0.0,... | (-1227.3158333472, -7474.150323173278, -985.52... | PROJCS["EQUIRECTANGULAR MOON",GEOGCS["GCS_MOON... | train |
| 1 | M139694087LE | 31 | M139694087LE_0031_image.png | /home/nilscp/BOULDERING/qgis/moon/censorinus/N... | M139694087LE.tif | [0, 15500, 500, 500] | [0.48357597846618, 0.0, -1227.3158333472, 0.0,... | (-1227.3158333472, -7715.938312406368, -985.52... | PROJCS["EQUIRECTANGULAR MOON",GEOGCS["GCS_MOON... | test |
| 2 | M139694087LE | 34 | M139694087LE_0034_image.png | /home/nilscp/BOULDERING/qgis/moon/censorinus/N... | M139694087LE.tif | [0, 17000, 500, 500] | [0.48357597846618, 0.0, -1227.3158333472, 0.0,... | (-1227.3158333472, -8441.302280105638, -985.52... | PROJCS["EQUIRECTANGULAR MOON",GEOGCS["GCS_MOON... | test |
| 3 | M139694087LE | 36 | M139694087LE_0036_image.png | /home/nilscp/BOULDERING/qgis/moon/censorinus/N... | M139694087LE.tif | [0, 18000, 500, 500] | [0.48357597846618, 0.0, -1227.3158333472, 0.0,... | (-1227.3158333472, -8924.878258571818, -985.52... | PROJCS["EQUIRECTANGULAR MOON",GEOGCS["GCS_MOON... | train |
| 4 | M139694087LE | 37 | M139694087LE_0037_image.png | /home/nilscp/BOULDERING/qgis/moon/censorinus/N... | M139694087LE.tif | [0, 18500, 500, 500] | [0.48357597846618, 0.0, -1227.3158333472, 0.0,... | (-1227.3158333472, -9166.666247804907, -985.52... | PROJCS["EQUIRECTANGULAR MOON",GEOGCS["GCS_MOON... | test |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 284 | M139694087LE | 1240 | M139694087LE_1240_image.png | /home/nilscp/BOULDERING/qgis/moon/censorinus/N... | M139694087LE.tif | [5000, 25000, 500, 500] | [0.48357597846618, 0.0, 1190.5640589837, 0.0, ... | (1190.5640589837, -12309.910107835078, 1432.35... | PROJCS["EQUIRECTANGULAR MOON",GEOGCS["GCS_MOON... | validation |
| 285 | M139694087LE | 1241 | M139694087LE_1241_image.png | /home/nilscp/BOULDERING/qgis/moon/censorinus/N... | M139694087LE.tif | [5000, 25500, 500, 500] | [0.48357597846618, 0.0, 1190.5640589837, 0.0, ... | (1190.5640589837, -12551.698097068167, 1432.35... | PROJCS["EQUIRECTANGULAR MOON",GEOGCS["GCS_MOON... | train |
| 286 | M139694087LE | 1242 | M139694087LE_1242_image.png | /home/nilscp/BOULDERING/qgis/moon/censorinus/N... | M139694087LE.tif | [5000, 26000, 500, 500] | [0.48357597846618, 0.0, 1190.5640589837, 0.0, ... | (1190.5640589837, -12793.486086301258, 1432.35... | PROJCS["EQUIRECTANGULAR MOON",GEOGCS["GCS_MOON... | validation |
| 287 | M139694087LE | 1244 | M139694087LE_1244_image.png | /home/nilscp/BOULDERING/qgis/moon/censorinus/N... | M139694087LE.tif | [5000, 27000, 500, 500] | [0.48357597846618, 0.0, 1190.5640589837, 0.0, ... | (1190.5640589837, -13277.062064767439, 1432.35... | PROJCS["EQUIRECTANGULAR MOON",GEOGCS["GCS_MOON... | train |
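
The `tile_id`, `rwindows` and `file_name` columns appear to be related in a simple way. The sketch below reproduces the rows shown in this notebook under two assumptions that are inferred from those rows and not taken from the `create_annotations` source: tiles are numbered column by column with 119 tiles per column for this raster, and `rwindows` is ordered as `[col_off, row_off, width, height]`. `tile_window` and `tile_file_name` are hypothetical helpers.

```python
BLOCK_WIDTH = 500    # pixels
BLOCK_HEIGHT = 500   # pixels
N_TILE_ROWS = 119    # tiles per column, inferred from the tile ids shown (assumption)


def tile_window(tile_id):
    """Return [col_off, row_off, width, height] in pixels for a tile id."""
    col_index, row_index = divmod(tile_id, N_TILE_ROWS)
    return [col_index * BLOCK_WIDTH, row_index * BLOCK_HEIGHT, BLOCK_WIDTH, BLOCK_HEIGHT]


def tile_file_name(nac_id, tile_id):
    """Build the png file name, with the tile id zero-padded to four digits."""
    return f"{nac_id}_{tile_id:04d}_image.png"


print(tile_window(30))                     # [0, 15000, 500, 500]
print(tile_window(1240))                   # [5000, 25000, 500, 500]
print(tile_file_name("M139694087LE", 30))  # M139694087LE_0030_image.png
```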


In [21]:
image_annotations_train_df = create_annotations.image(df_selection_tiles_all_images, "train", dataset_directory, block_width, block_height, train_image_annotations_pickle)
image_annotations_validation_df = create_annotations.image(df_selection_tiles_all_images, "validation", dataset_directory, block_width, block_height, validation_image_annotations_pickle)
image_annotations_test_df = create_annotations.image(df_selection_tiles_all_images, "test", dataset_directory, block_width, block_height, test_image_annotations_pickle)

pickle /home/nilscp/tmp/preprocessing/shapefiles/Censorinus-M139694087LE-v001-train-image-annotations.pkl has been generated
pickle /home/nilscp/tmp/preprocessing/shapefiles/Censorinus-M139694087LE-v001-validation-image-annotations.pkl has been generated
pickle /home/nilscp/tmp/preprocessing/shapefiles/Censorinus-M139694087LE-v001-test-image-annotations.pkl has been generated


In [22]:
image_annotations_train_df.head(2)

| | NAC_id | tile_id | file_name | raster_ap | raster_rp | rwindows | transform | bbox_im | coord_sys | dataset | id | height | width | file_name_ap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M139694087LE | 985 | M139694087LE_0985_image.png | /home/nilscp/BOULDERING/qgis/moon/censorinus/N... | M139694087LE.tif | [4000, 16500, 500, 500] | [0.48357597846618, 0.0, 706.9880805175201, 0.0... | (706.9880805175201, -8199.514290872548, 948.77... | PROJCS["EQUIRECTANGULAR MOON",GEOGCS["GCS_MOON... | train | 0 | 500 | 500 | /home/nilscp/tmp/preprocessing/preprocessing/t... |
| 1 | M139694087LE | 881 | M139694087LE_0881_image.png | /home/nilscp/BOULDERING/qgis/moon/censorinus/N... | M139694087LE.tif | [3500, 24000, 500, 500] | [0.48357597846618, 0.0, 465.2000912844301, 0.0... | (465.2000912844301, -11826.334129368897, 706.9... | PROJCS["EQUIRECTANGULAR MOON",GEOGCS["GCS_MOON... | train | 1 | 500 | 500 | /home/nilscp/tmp/preprocessing/preprocessing/t... |


In [23]:
image_annotations_validation_df.head(2)

| | NAC_id | tile_id | file_name | raster_ap | raster_rp | rwindows | transform | bbox_im | coord_sys | dataset | id | height | width | file_name_ap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M139694087LE | 529 | M139694087LE_0529_image.png | /home/nilscp/BOULDERING/qgis/moon/censorinus/N... | M139694087LE.tif | [2000, 26500, 500, 500] | [0.48357597846618, 0.0, -260.16387641483993, 0... | (-260.16387641483993, -13035.274075534348, -18... | PROJCS["EQUIRECTANGULAR MOON",GEOGCS["GCS_MOON... | validation | 0 | 500 | 500 | /home/nilscp/tmp/preprocessing/preprocessing/v... |
| 1 | M139694087LE | 156 | M139694087LE_0156_image.png | /home/nilscp/BOULDERING/qgis/moon/censorinus/N... | M139694087LE.tif | [500, 18500, 500, 500] | [0.48357597846618, 0.0, -985.5278441141099, 0.... | (-985.5278441141099, -9166.666247804907, -743.... | PROJCS["EQUIRECTANGULAR MOON",GEOGCS["GCS_MOON... | validation | 1 | 500 | 500 | /home/nilscp/tmp/preprocessing/preprocessing/v... |


## 8. Generate boulder segmentation annotations (as a dataframe)

In [24]:
train_seg_annotations_pickle = global_tiles_pickle.with_name('Censorinus-M139694087LE-v001-train-segmentation-annotations.pkl')
validation_seg_annotations_pickle = global_tiles_pickle.with_name('Censorinus-M139694087LE-v001-validation-segmentation-annotations.pkl')
test_seg_annotations_pickle = global_tiles_pickle.with_name('Censorinus-M139694087LE-v001-test-segmentation-annotations.pkl')

In [25]:
boulder_annotation_segmentation_train = create_annotations.segmentation(df_selection_boulders_train, image_annotations_train_df, dataset_directory, train_seg_annotations_pickle)
boulder_annotation_segmentation_validation = create_annotations.segmentation(df_selection_boulders_validation, image_annotations_validation_df, dataset_directory, validation_seg_annotations_pickle)
boulder_annotation_segmentation_test = create_annotations.segmentation(df_selection_boulders_test, image_annotations_test_df, dataset_directory, test_seg_annotations_pickle)

100%|██████████| 8189/8189 [01:00<00:00, 134.94it/s]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super(GeoDataFrame, self).__setitem__(key, value)
  gdf_boulder_annotation_segmentation.to_file(out_pickle.with_name(out_pickle.name.replace(".pkl", ".shp")))
100%|██████████| 2122/2122 [00:14<00:00, 146.88it/s]
100%|██████████| 2343/2343 [00:15<00:00, 146.48it/s]


In [26]:
boulder_annotation_segmentation_train.head(5)

| | id | shifted | geometry | area | NAC_id | tile_id | file_name | dataset | image_id | category_id | bbox_xyxy_pixel | bbox_xywh_pixel | bbox_xyxy_image | segmentation_mask | iscrowd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2.0 | POLYGON ((-1013.885 -7279.843, -1013.138 -7279... | 43.238014 | M139694087LE | 30 | M139694087LE_0030_image.png | train | 103 | 0 | [438, 98, 447, 104] | [438, 98, 9, 6] | [-1015.228748186628, -7283.053913976853, -1011... | [[441, 98, 442, 98, 445, 98, 446, 100, 447, 10... | 0 |
| 1 | 1 | 2.0 | POLYGON ((-1111.946 -7381.643, -1111.542 -7381... | 40.391965 | M139694087LE | 30 | M139694087LE_0030_image.png | train | 103 | 0 | [237, 306, 245, 313] | [237, 306, 8, 7] | [-1112.693184078843, -7383.851938930882, -1108... | [[238, 308, 239, 307, 240, 307, 241, 306, 242,... | 0 |
| 2 | 2 | 2.0 | POLYGON ((-1035.649 -7317.115, -1034.716 -7316... | 96.042968 | M139694087LE | 30 | M139694087LE_0030_image.png | train | 103 | 0 | [391, 173, 403, 184] | [391, 173, 12, 11] | [-1038.1230578818504, -7321.54939973708, -1032... | [[396, 175, 398, 174, 400, 173, 402, 175, 402,... | 0 |
| 3 | 3 | 2.0 | POLYGON ((-1162.604 -7397.990, -1162.044 -7396... | 148.120391 | M139694087LE | 30 | M139694087LE_0030_image.png | train | 103 | 0 | [133, 337, 146, 353] | [133, 337, 13, 16] | [-1162.8837175884391, -7403.24126081133, -1156... | [[133, 342, 134, 339, 136, 338, 139, 337, 142,... | 0 |
| 4 | 4 | 2.0 | POLYGON ((-994.436 -7412.597, -993.963 -7412.5... | 45.281498 | M139694087LE | 30 | M139694087LE_0030_image.png | train | 103 | 0 | [477, 372, 485, 380] | [477, 372, 8, 8] | [-996.431153922664, -7416.21965996571, -992.59... | [[481, 372, 482, 372, 483, 373, 484, 373, 484,... | 0 |
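
To make the pixel columns easier to read, here is a small sketch of how a map coordinate could be turned into a pixel position inside tile 30, and how `bbox_xyxy_pixel` relates to `bbox_xywh_pixel`. It is not the code used by `create_annotations.segmentation`. It assumes that `bbox_im` is ordered as `(xmin, ymin, xmax, ymax)`, that pixel rows are counted downward from the top edge of the tile, and that coordinates are truncated to integer pixel indices; `map_to_pixel` and `xyxy_to_xywh` are hypothetical helpers.

```python
import math

RES = 0.48357597846618          # pixel size, from the `transform` column
TILE_XMIN = -1227.3158333472    # from `bbox_im` of tile 30
TILE_YMIN = -7474.150323173278  # from `bbox_im` of tile 30 (assumed to be ymin)
TILE_HEIGHT_PX = 500


def map_to_pixel(x, y):
    """Convert map coordinates to (column, row) pixel indices within tile 30."""
    y_top = TILE_YMIN + TILE_HEIGHT_PX * RES  # top edge of the tile in map units
    col = math.floor((x - TILE_XMIN) / RES)
    row = math.floor((y_top - y) / RES)       # rows increase downward
    return col, row


def xyxy_to_xywh(bbox):
    """Convert [x_min, y_min, x_max, y_max] to [x_min, y_min, width, height]."""
    x_min, y_min, x_max, y_max = bbox
    return [x_min, y_min, x_max - x_min, y_max - y_min]


# First vertex of boulder 0 (rounded to 3 decimals as displayed in the table).
print(map_to_pixel(-1013.885, -7279.843))   # (441, 98)
print(xyxy_to_xywh([438, 98, 447, 104]))    # [438, 98, 9, 6]
```

The first result matches the first `x, y` pair of the `segmentation_mask` for boulder 0, and the second matches its `bbox_xywh_pixel`.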


## 9. Convert the image and segmentation labels to a json file (that can be imported as a custom dataset in Detectron2)

In [27]:
coco_annotations_train_json = Path("/home/nilscp/tmp/preprocessing/json/Censorinus-M139694087LE-v001-train-annotations-coco-format.json")
coco_annotations_validation_json = Path("/home/nilscp/tmp/preprocessing/json/Censorinus-M139694087LE-v001-validation-annotations-coco-format.json")
coco_annotations_test_json = Path("/home/nilscp/tmp/preprocessing/json/Censorinus-M139694087LE-v001-test-annotations-coco-format.json")

In [28]:
create_annotations.dataframes_to_json_coco_format(image_annotations_train_df, boulder_annotation_segmentation_train, coco_annotations_train_json)
create_annotations.dataframes_to_json_coco_format(image_annotations_validation_df, boulder_annotation_segmentation_validation, coco_annotations_validation_json)
create_annotations.dataframes_to_json_coco_format(image_annotations_test_df, boulder_annotation_segmentation_test, coco_annotations_test_json)

Censorinus-M139694087LE-v001-train-annotations-coco-format.json has been generated
Censorinus-M139694087LE-v001-validation-annotations-coco-format.json has been generated
Censorinus-M139694087LE-v001-test-annotations-coco-format.json has been generated
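
For orientation, a COCO-format file has three top-level lists: `images`, `annotations` and `categories`. The sketch below builds a tiny example by hand from values shown earlier in this notebook (image 103, boulder 0). It is not the output of `dataframes_to_json_coco_format`: the polygon is a hypothetical rectangle drawn from the bounding box because the real `segmentation_mask` is truncated in the table, and the pixel `area` and the category name `"boulder"` are assumptions as well.

```python
import json

coco = {
    "images": [
        {"id": 103, "file_name": "M139694087LE_0030_image.png", "height": 500, "width": 500},
    ],
    "annotations": [
        {
            "id": 0,
            "image_id": 103,          # links the annotation to its image
            "category_id": 0,
            "bbox": [438, 98, 9, 6],  # [x, y, width, height] in pixels
            # Hypothetical polygon (the bounding-box rectangle), flat [x1, y1, x2, y2, ...] list.
            "segmentation": [[438, 98, 447, 98, 447, 104, 438, 104]],
            "area": 54,               # hypothetical: area of the rectangle above, in pixels
            "iscrowd": 0,
        },
    ],
    "categories": [{"id": 0, "name": "boulder"}],  # category name is an assumption
}

text = json.dumps(coco, indent=2)
loaded = json.loads(text)
print(sorted(loaded))  # ['annotations', 'categories', 'images']
```

A file with this structure can then be registered in Detectron2, for example with `register_coco_instances` from `detectron2.data.datasets`, by giving it a dataset name, the json path and the image directory.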
