# Multithreaded, Tiled Image Segmentation 

Image segmentation at large scales can be both time- and memory-intensive. The script 'tiledSegThreaded.py' (https://github.com/GeoscienceAustralia/dea-notebooks/blob/chad/segmentation/src/tiledSegThreaded.py) builds upon the image segmentation algorithm developed by Shepherd et al. (2019) (implemented in the package RSGISLib) to run image segmentation across multiple CPUs. A full description of the approach can be found in _Clewley et al. (2014) A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables_. This script requires 'pathos.multiprocessing', a fork of Python's multiprocessing package that uses dill instead of pickle for serialization.

    pip install --user pathos

There are two major caveats to the use of this script.
1. As the script uses the multiprocessing library, it cannot be run across multiple nodes.
2. The tiling approach is based on the bounding coordinates of the geotiff. If a geotiff is irregularly shaped such that one or more tiles contain none of the input geotiff, the script will fail during the second stage of the algorithm. If this occurs, check the ..._S1Tiles.shp_ file output during stage 1 of the algorithm: overlaying this file on your input geotiff shows whether any tiles contain none of the geotiff. At the moment, the only solution is to change the extent of the geotiff to be more regularly shaped.
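
The second caveat can be illustrated with a small, self-contained sketch. It is not the logic used by 'tiledSegThreaded.py' or RSGISLib; it assumes a plain regular grid laid over the bounding box, and the function name `find_empty_tiles` and the toy mask are hypothetical. It only shows how an irregularly shaped image can leave a tile with no valid pixels at all.

```python
def find_empty_tiles(valid_mask, tile_width, tile_height):
    """Return (tile_row, tile_col) indices of tiles with no valid pixels.

    valid_mask is a list of rows, each a list of booleans (True = valid data).
    Hypothetical helper for illustration only.
    """
    n_rows = len(valid_mask)
    n_cols = len(valid_mask[0]) if n_rows else 0
    empty = []
    for top in range(0, n_rows, tile_height):
        for left in range(0, n_cols, tile_width):
            rows = valid_mask[top:top + tile_height]
            # A tile is usable if at least one of its pixels is valid
            has_data = any(any(row[left:left + tile_width]) for row in rows)
            if not has_data:
                empty.append((top // tile_height, left // tile_width))
    return empty


# Toy 4x6 "image": the upper-right corner holds no data, as with an
# irregularly shaped geotiff inside its rectangular bounding box.
mask = [
    [True, True, True, False, False, False],
    [True, True, True, False, False, False],
    [True, True, True, True, True, True],
    [True, True, True, True, True, True],
]
print(find_empty_tiles(mask, tile_width=3, tile_height=2))  # [(0, 1)]
```

In this toy case the tile in row 0, column 1 is empty, which is the situation that would need the geotiff extent to be changed.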


THINGS TO DO (17/6/2019):
- To run this at continental scale, the code will need to be adjusted so that the input tiles are the Landsat Albers tiles. It might be best to separate that into a different Python file.
- Is it possible to multithread 'performStage2TilesSegmentation'? It is definitely a bottleneck, so worth attempting.


### User Inputs

In [None]:
#Location string of the geotiff you wish to segment
InputNDVIStats = "data/nmdb_Summer2017_18_NDVI_max.tif"

#Location string of the .KEA file the geotiff will be converted to
KEAFile = "data/nmdb_Summer2017_18_NDVI_max.kea"

#Location string of clumps mean .KEA file that will be output 
meanImage = "data/nmdb_Summer2017_18_ClumpMean.kea"

#Location to a folder to store temporary files during segmentation
temp = 'tmps/'

#How many CPUs will this run on?
ncpus=6

# What fraction of a tile must contain valid data? A tile below this
# threshold will be merged with its neighbour.
validDataTileFraction = 0.4

#Tile size parameters (in number of pixels)
width = 8000
height = 8000
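
As a rough guide to choosing `width` and `height`, the sketch below estimates how many tiles a regular grid would produce for a given image size. It is an assumption-laden approximation: the image dimensions are made up, `tile_grid` is a hypothetical helper, and the real layout may differ because tiles below `validDataTileFraction` are merged with a neighbour.

```python
import math


def tile_grid(image_width, image_height, tile_width, tile_height):
    """Return (n_cols, n_rows, last_col_width, last_row_height) in pixels.

    Hypothetical helper: assumes a plain regular grid starting at the
    top-left corner, with partial tiles along the right and bottom edges.
    """
    n_cols = math.ceil(image_width / tile_width)
    n_rows = math.ceil(image_height / tile_height)
    last_col_width = image_width - (n_cols - 1) * tile_width
    last_row_height = image_height - (n_rows - 1) * tile_height
    return n_cols, n_rows, last_col_width, last_row_height


# Made-up example: a 20000 x 13000 pixel image with 8000 x 8000 pixel tiles
print(tile_grid(20000, 13000, 8000, 8000))  # (3, 2, 4000, 5000)
```

Here the grid has 3 x 2 = 6 tiles, with a 4000-pixel-wide last column and a 5000-pixel-tall last row. Comparing the tile count with `ncpus` gives a feel for how evenly the work could be spread.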

### Run the cells below to conduct the image segmentation

In [None]:
from osgeo import gdal
import os
from rsgislib.segmentation import segutils
from pathos.multiprocessing import ProcessingPool as Pool
import dill
#import custom functions
import sys
sys.path.append('src')
import tiledSegThreaded

In [None]:
# Convert the geotiff to a KEA file (only run this once!)
gdal.Translate(KEAFile, InputNDVIStats, format='KEA', outputSRS='EPSG:3577')

In [None]:
# Run segmentation, without creation of clump means
tiledSegThreaded.performTiledSegmentation(
    KEAFile,
    meanImage,
    tmpDIR=temp,
    numClusters=20,
    validDataThreshold=validDataTileFraction,
    tileWidth=width,
    tileHeight=height,
    minPxls=100,
    ncpus=ncpus,
)