# Open Source Object-Based-Image-Analysis

Derivatives -> Segmentation -> Zonal statistics -> Classification 

## Stack
**python:**  
    - [geopandas](https://geopandas.org): vector (image object) manipulation  
    - [rasterio](https://rasterio.readthedocs.io/en/latest/): raster loading/manipulation  
    - [rasterstats](https://pythonhosted.org/rasterstats/)  
*The use of anaconda/miniconda is *highly* recommended for managing python dependencies, specifically installing from the `conda-forge` channel.*  

[WhiteBoxTools](https://jblindsay.github.io/wbt_book/intro.html): raster derivatives, specifically geomorphometic analyses.  
*Has python bindings, but tough to integrate with other dependencies, so called from the command line.*  

[**Orfeo Toolbox**](https://www.orfeo-toolbox.org/CookBook/): segmentation (also has general GIS tools).  
*Has python bindings, but tough to integrate with other dependencies, so called from the command line.*  

[**QGIS**](https://qgis.org/en/site/): Viewing outputs, general GIS.

## Imports + Logging Set up

In [1]:
# Standard lib
import copy
import logging
import numpy as np
import operator
from pathlib import Path
from pprint import pprint
import warnings
import sys

# Installed packages
from osgeo import gdal, osr
import geopandas
import rasterio
import rasterstats
from skimage.segmentation import quickshift

# Local packages
from lib import (run_subprocess, clean4cmdline, create_grm_outname,
                  rio_polygonize, read_vec, write_gdf, write_array)
from calc_zonal_stats import calc_zonal_stats

In [2]:
# Set up logger
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler(sys.stdout)
formatter = logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)

warnings.filterwarnings('ignore', r'RuntimeWarning: Sequential read of iterator was interrupted.')

## Paths (Orfeo + Data)

In [3]:
# Orfeo setup and tools
otb = Path(r'C:\OTB-7.2.0-Win64\OTB-7.2.0-Win64')
otb_init = otb / 'otbenv.bat'
otb_bin = otb / 'bin'
otb_grm = otb_bin / 'otbcli_GenericRegionMerging.bat'
# otb_lsms = otb_bin / 'otbcli_LargeScaleMeanShift.bat'

# Data
data_dir = Path(r'./data')
img = data_dir / 'naip_m_4509361_se_15_060_20190727_aoi.tif'
ndsm = data_dir / 'nDSM_clip_fill.tif'
ndvi = data_dir / 'ndvi_naip_m_4509361_se_15_060_20190727_aoi.tif'
roughness = data_dir / 'nDSM_clip_fill_roughness.tif'

# Out paths
out_dir = Path(r'./results')
seg_dir = out_dir / 'seg'

# Ensure all exist
missing_files = []
for file_path in [data_dir, img, ndsm, ndvi, roughness, out_dir]:
    if not file_path.exists():
        missing_files.append(file_path)
if len(missing_files) > 0:
    for file_path in missing_files:
        logger.error(f'Missing file/folder: {file_path}')
else:
    logger.info('All files/folders located.')

2021-04-22 23:27:56,413 - __main__ - INFO - All files/folders located.


# Check out input data
(how it was created, etc), histogram of values in image (qgis), stats of image to seg

## Segmentation

#### Generic Region Merging
[otb docs](https://www.orfeo-toolbox.org/CookBook/Applications/app_GenericRegionMerging.html?highlight=generic%20region%20merging)

In [4]:
# Parameters
# Homogeneity criterion to use. The default is 'bs'. One of: [bs, ed, fls]
criterion = 'bs'
threshold = 100
niter = 100
spectral_w = 0.6 # spectral weight, higher values slow processing time
spatial_w = 25 # spatial weight
out_img = create_grm_outname(img=img,
                             out_dir=seg_dir,
                             criterion=criterion,
                             threshold=threshold,
                             niter=niter,
                             spectral=spectral_w,
                             spatial=spatial_w)

In [5]:
# Build the command
otb_cmd = f"""
"{otb_grm}"
-in {str(img)}
-out {str(out_img)}
-criterion {criterion}
-threshold {threshold}
-niter {niter}
-cw {spectral_w}
-sw {spatial_w}"""

otb_cmd = clean4cmdline(otb_cmd)
logger.info(f'OTB command:\n{otb_cmd}')
otb_cmd = f'{otb_init} && {otb_cmd}'
logger.debug(f'OTB full command:\n{otb_cmd}')

2021-04-22 23:27:59,515 - __main__ - INFO - OTB command:
"C:\OTB-7.2.0-Win64\OTB-7.2.0-Win64\bin\otbcli_GenericRegionMerging.bat" -in data\naip_m_4509361_se_15_060_20190727_aoi.tif -out results\seg\naip_m_4509361_se_15_060_20190727_aoi_bst100ni100s0spec0x6spat25.tif -criterion bs -threshold 100 -niter 100 -cw 0.6 -sw 25
2021-04-22 23:27:59,523 - __main__ - DEBUG - OTB full command:
C:\OTB-7.2.0-Win64\OTB-7.2.0-Win64\otbenv.bat && "C:\OTB-7.2.0-Win64\OTB-7.2.0-Win64\bin\otbcli_GenericRegionMerging.bat" -in data\naip_m_4509361_se_15_060_20190727_aoi.tif -out results\seg\naip_m_4509361_se_15_060_20190727_aoi_bst100ni100s0spec0x6spat25.tif -criterion bs -threshold 100 -niter 100 -cw 0.6 -sw 25


In [None]:
# Run the segmentation
# import subprocess
# from subprocess import PIPE
# def run_subprocess(command):
#     proc = subprocess.Popen(command, stdout=PIPE, stderr=PIPE, shell=True)
#     for line in iter(proc.stdout.readline, b''):  # replace '' with b'' for Python 3
#         logger.info(line.decode())
#     output, error = proc.communicate()
#     logger.debug('Output: {}'.format(output.decode()))
#     logger.debug('Err: {}'.format(error.decode()))

# run_subprocess(otb_cmd)

(Check out resulting image)

In [6]:
# Convert output tif to polygons
out_vec = out_img.replace('.tif', '.gpkg/seg')
# vec_objects = rio_polygonize(img=out_img, out_vec=out_vec)

(Check out resuting polygons)

### Other segmentation options
[Orfeo Toolbox LargeScaleMeanShift](https://www.orfeo-toolbox.org/CookBook/Applications/app_LargeScaleMeanShift.html)

[Scikit-Image Segmenation](https://scikit-image.org/docs/dev/api/skimage.segmentation.html)  
Inputs and output are often nunmpy arrays/matrices that need to be converted from pixel space back to geographic space. A couple of useful functions for this conversion:

In [7]:
def pixel2geo(pixel_coord, geotransform):
    """
    Covert pixel coordinates to geographic coordinates
    """
    y, x = pixel_coord
    gy = geotransform[4] * x + geotransform[5] * y + geotransform[4] * 0.5 + geotransform[5] * 0.5 + self.geotransform[3]
    gx = geotransform[1] * x + geotransform[2] * y + geotransform[1] * 0.5 + geotransform[2] * 0.5 + self.geotransform[0]

    return gy, gx

def geo2pixel(geocoord, geotransform):
    """
    Convert geographic coordinates to pixel coordinates
    """
    y, x = geocoord
    py = int(np.around((y - geotransform[3]) / geotransform[5]))
    px = int(np.around((x - geotransform[0]) / geotransform[1]))
    return py, px

ds = gdal.Open(str(img))
geotransform = ds.GetGeoTransform()
array = ds.ReadAsArray()


logger.info(f'Geotransform:\n{geotransform}')
logger.info(f'Top left x: {geotransform[0]}')
logger.info(f'X Resolution: {geotransform[1]}')
logger.info(f'Rotation1: {geotransform[2]}')
logger.info(f'Top left y: {geotransform[3]}')
logger.info(f'Rotation: {geotransform[4]}')
logger.info(f'Y Resolution: {geotransform[5]}')

geo2pixel(geocoord=(467491.8, 4983821), geotransform=geotransform)
logger.info(f'Array shape: {array.shape}')

2021-04-22 23:28:05,631 - __main__ - INFO - Geotransform:
(467491.8, 0.600000000000014, 0.0, 4983821.4, 0.0, -0.600000000000258)
2021-04-22 23:28:05,631 - __main__ - INFO - Top left x: 467491.8
2021-04-22 23:28:05,643 - __main__ - INFO - X Resolution: 0.600000000000014
2021-04-22 23:28:05,645 - __main__ - INFO - Rotation1: 0.0
2021-04-22 23:28:05,648 - __main__ - INFO - Top left y: 4983821.4
2021-04-22 23:28:05,652 - __main__ - INFO - Rotation: 0.0
2021-04-22 23:28:05,653 - __main__ - INFO - Y Resolution: -0.600000000000258
2021-04-22 23:28:05,655 - __main__ - INFO - Array shape: (4, 1444, 2506)


In [36]:
rgb = array.reshape(array.shape[1], array.shape[2], array.shape[0])
rgb = rgb[:, :, :3]
labelled = quickshift(image=rgb, ratio=1, kernel_size=5, max_dist=25, sigma=1)
out_qs = out_img.replace('.tif', '_qs.tif')
write_array(labelled, out_qs, ds, dtype=3)
out_qs_vec = out_qs.replace('.tif', '.shp')
qs_poly = rio_polygonize(img=out_qs, out_vec=out_qs_vec)

(1444, 2506, 4)
(1444, 2506, 3)




2021-04-23 00:05:54,873 - lib - INFO - Polygonizing: results\seg\naip_m_4509361_se_15_060_20190727_aoi_bst100ni100s0spec0x6spat25_qs.tif
2021-04-23 00:05:57,093 - lib - INFO - Writing polygons to: results\seg\naip_m_4509361_se_15_060_20190727_aoi_bst100ni100s0spec0x6spat25_qs.shp


Unnamed: 0,geometry,raster_val
0,"POLYGON ((467938.800 4983821.400, 467938.800 4...",79.0
1,"POLYGON ((468752.400 4983821.400, 468752.400 4...",62.0
2,"POLYGON ((468974.400 4983821.400, 468974.400 4...",35.0
3,"POLYGON ((467781.600 4983820.800, 467781.600 4...",47.0
4,"POLYGON ((467794.200 4983820.800, 467794.200 4...",47.0
...,...,...
11078,"POLYGON ((468831.000 4982957.400, 468831.000 4...",2324.0
11079,"POLYGON ((468907.800 4982998.800, 468907.800 4...",2321.0
11080,"POLYGON ((468931.200 4982955.600, 468931.200 4...",2297.0
11081,"POLYGON ((468923.400 4982992.200, 468923.400 4...",2297.0


## Zonal Statistics

In [None]:
# Create dictionary of rasters, stats, and bands to compute
zonal_stats_params = (
    {"img": {
        "path": img,
        "stats": ["mean", "max"],
        "bands": [1, 2, 3, 4]
	},
    "nDSM": {
        "path": ndsm,
        "stats": ["mean"]
    },
    "roughness":{
        "path": roughness,
        "stats": ["mean"],
    },
    "ndvi":{
        "path": ndvi,
        "stats": ["mean"]
    } 
})
pprint(zonal_stats_params)

In [None]:
out_zs = out_vec.replace('/seg', '/zs')
calc_zonal_stats(shp=out_vec,
                 rasters=[zonal_stats_params],
                 compactness=True,
                 roundness=True,
                 out_path=out_zs)

## Classification
This is done with python+geopandas but could also be done via QGIS+select_by_attribute

In [None]:
obj = read_vec(out_zs)
obj.sample(5)

In [None]:
obj.describe()

In [None]:
# Field names
ndvi_mean = 'ndvi_mean'
ndsm_mean = 'nDSM_mean'
roughness_mean = 'roughness_mean'
imgb1_mean = 'imgb1_mean'
imgb2_mean = 'imgb2_mean'
imgb3_mean = 'imgb3_mean'
imgb4_mean = 'imgb4_mean'
roundness = 'roundness'
# Class names
trees = 'trees'
water = 'water'
open_green = 'open_green'
buildings = 'buildings'
roads_pave = 'roads_pavement'
shadow = 'shadow'
# Add a field to hold the class
CLASS = 'class'
obj[CLASS] = None


def apply_rules(gdf, rules, class_name, unclass_only=True):
    """Function to apply rules"""
    # Look up strings operators to get function
    op_lut = {'>': operator.gt,
              '<': operator.lt}
    
    matches = copy.deepcopy(gdf)
    for field, op, val in rules:
        if isinstance(op, str):
            # if  a string is passed, get the function
            op = op_lut[op]
        matches = matches[op(matches[field], val)]
    logger.info(f'Objects to be classified as {class_name}: {len(matches)}')
    if unclass_only:
        # Index in matches and not classified yet
        gdf.loc[gdf.index.isin(matches.index) & gdf[CLASS].isnull(), CLASS] = class_name
    else:
        # Index in matches - would overwrite class if present
        gdf.loc[gdf.index.isin(matches.index), CLASS] = class_name
    logger.info(f'Objects classified as {class_name}: {len(gdf[gdf[CLASS]==class_name])}')
    logger.debug(f'Remaining unclassified: {len(gdf[gdf[CLASS].isnull()])}')
    return gdf

In [None]:
# Trees
tree_rules = [
    (ndvi_mean, '>', 0),
    (ndsm_mean, '>', 0.75),
    (roughness_mean, '>', 1.25)
]
# Water
water_rules = [
    (ndvi_mean, '<', 0),
    (roughness_mean, '<', 0.13),
    (ndsm_mean, '<', 1),
    (imgb4_mean, '<', 10)
]
# Open Green Space
open_rules = [
    (ndvi_mean, '>', 0.1),
    (ndsm_mean, '<', 1)
]
# Buildings
build_rules = [
    (ndvi_mean, '<', 0),
    (ndsm_mean, '>', 2),
]
# Roads and pavement
roads_rules = [
    (ndvi_mean, '<', 0),
    (ndsm_mean, '<', 1),
    (roughness_mean, '<', 1.5),
    (roundness, '>', 2)
]
# Shadow
shadow_rules = [
    (imgb1_mean, '<', 150),
    (imgb2_mean, '<', 150),
    (imgb3_mean, '<', 150),
    (imgb4_mean, '<', 150)
]
classes = [trees, water, open_green, buildings, roads_pave, shadow]
rules = [tree_rules, water_rules, open_rules, build_rules, roads_rules, shadow_rules]
classes_rules = zip(classes, rules)
for class_name, class_rules in classes_rules:
    logger.info(f'Classifying: {class_name}')
    obj = apply_rules(obj, class_rules, class_name)

In [None]:
# Write classification
out_class = out_vec.replace('/seg', '/classified')
# write_gdf(obj, out_class)
print(out_class)