# Open Source Object-Based-Image-Analysis

*The great thing about python is...*  
<img src="img/build_it2.png" width="600">

### What is OBIA really?
Derivatives -> Segmentation -> Zonal statistics -> Classification 

## Tools / Software Stack

## python
# ![python](img/python_logo.png)  +  ![gdal](img/gdalicon.png)  
- [geopandas](https://geopandas.org): vector (image object) manipulation  
- [rasterio](https://rasterio.readthedocs.io/en/latest/): raster loading/manipulation  
- [rasterstats](https://pythonhosted.org/rasterstats/): zonal statistics  
*Using anaconda/miniconda is **highly** recommended for managing python dependencies, specifically installing packages from the `conda-forge` channel, e.g.:*  
`conda install -c conda-forge geopandas`
  
  
## WhiteBoxTools
![wbt](img/WhiteBoxToolsLogo_vert.png)  
[WhiteBoxTools](https://jblindsay.github.io/wbt_book/intro.html): raster derivatives, specifically geomorphometric analyses.  
*Has python bindings, but tough to integrate with other dependencies, so called from the command line.*  
  
  
## Orfeo Toolbox  
![orfeo](img/logo-orfeo-toolbox.png)  
[**Orfeo Toolbox**](https://www.orfeo-toolbox.org/CookBook/): segmentation (also has general GIS tools).  
*Has python bindings, but tough to integrate with other dependencies, so called from the command line.*  
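
Calling a command-line tool from python can be sketched with the standard library's `subprocess` module. This is only a minimal sketch, not the `run_subprocess` helper from the local `lib` module (which is not shown in this notebook); `run_command` is a hypothetical name, and `shell=True` is assumed so that chained commands (`a && b`) are interpreted by the shell.

```python
import subprocess
import sys


def run_command(cmd):
    """Run a shell command string and return its output as a list of lines.

    Hypothetical helper, for illustration only.
    """
    result = subprocess.run(
        cmd,
        shell=True,                # let the shell interpret '&&' chains
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f'Command failed ({result.returncode}): {cmd}\n{result.stdout}')
    return result.stdout.splitlines()


# Stand-in for a real tool call: run the current python interpreter
lines = run_command(f'"{sys.executable}" -c "print(42)"')
print(lines)
```

Returning the output as a list of lines makes it easy to log or print each line, which is how the `gdalinfo` output is handled later on.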


## QGIS
![qgis](img/qgis-logo.png)  
[**QGIS**](https://qgis.org/en/site/): Viewing outputs, general GIS.

## Imports + Logging Set up

In [None]:
# Standard lib
import copy
import logging
import operator
from pathlib import Path
from pprint import pprint
import warnings
import sys

# Installed packages
from osgeo import gdal, osr
import geopandas
import numpy as np
import rasterio
import rasterstats
from skimage.segmentation import quickshift
import scipy

# Local packages
from lib import (run_subprocess, clean4cmdline, create_grm_outname,
                 rio_polygonize, read_vec, write_gdf, write_array,
                 log_me)
from calc_zonal_stats import calc_zonal_stats

In [None]:
# Set up logger
root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler(sys.stdout)
ch.setLevel(1)
formatter = logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)
root_logger.addHandler(ch)

warnings.filterwarnings('ignore', category=RuntimeWarning, message='Sequential read of iterator*')

## Paths (Orfeo + Data)

In [None]:
# Orfeo setup and tools
# Change this to match where OTB is on your machine, the rest of the paths should be OK
otb = Path(r'C:\OTB-7.2.0-Win64\OTB-7.2.0-Win64')
otb_init = otb / 'otbenv.bat'
otb_bin = otb / 'bin'
otb_grm = otb_bin / 'otbcli_GenericRegionMerging.bat'
# otb_lsms = otb_bin / 'otbcli_LargeScaleMeanShift.bat'
otb_grm

In [None]:
# Data
data_dir = Path(r'./data')
img = data_dir / 'naip_m_4509361_se_15_060_20190727_aoi.tif'
ndsm = data_dir / 'nDSM_clip_fill.tif'
ndvi = data_dir / 'ndvi_naip_m_4509361_se_15_060_20190727_aoi.tif'
roughness = data_dir / 'nDSM_clip_fill_roughness.tif'
img

In [None]:
# Out paths
out_dir = Path(r'./results')
seg_dir = out_dir / 'seg'

In [None]:
# Ensure all exist
missing_files = []
for file_path in [data_dir, img, ndsm, ndvi, roughness, out_dir]:
    if not file_path.exists():
        missing_files.append(file_path)
if len(missing_files) > 0:
    for file_path in missing_files:
        root_logger.error(f'Missing file/folder: {file_path}')
else:
    root_logger.info('All files/folders located.')

### Check out input data
Things to look at: how the image was created, a histogram of its values (in QGIS), its profile, and the statistics of the image to be segmented.

In [None]:
response = run_subprocess(f'gdalinfo {img} -stats', log=False)
for line in response:
    print(line.strip('\n'))

## Segmentation

#### Generic Region Merging
[otb docs](https://www.orfeo-toolbox.org/CookBook/Applications/app_GenericRegionMerging.html?highlight=generic%20region%20merging)

In [None]:
# Parameters
# Homogeneity criterion to use. The default is 'bs'. One of: [bs, ed, fls]
criterion = 'bs'
threshold = 100
niter = 100
spectral_w = 0.6 # spectral weight, higher values slow processing time
spatial_w = 25 # spatial weight
out_img = create_grm_outname(img=img,
                             out_dir=seg_dir,
                             criterion=criterion,
                             threshold=threshold,
                             niter=niter,
                             spectral=spectral_w,
                             spatial=spatial_w)
root_logger.info(f'Out image: {out_img}')

In [None]:
# Build the command
otb_cmd = f"""
"{otb_grm}"
-in {str(img)}
-out {str(out_img)}
-criterion {criterion}
-threshold {threshold}
-niter {niter}
-cw {spectral_w}
-sw {spatial_w}"""

otb_cmd = clean4cmdline(otb_cmd)
root_logger.info(f'OTB command:\n{otb_cmd}')
otb_cmd = f'{otb_init} && {otb_cmd}'
root_logger.debug(f'OTB full command:\n{otb_cmd}')

In [None]:
run_subprocess(otb_cmd)

(Check out resulting image)
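
The command above is written as a multi-line string for readability and then flattened before being run. `clean4cmdline` comes from the local `lib` module and is not shown here, so the following is only a guess at the idea, using a hypothetical `flatten_command` function and placeholder file names.

```python
def flatten_command(cmd):
    """Collapse a multi-line command string into a single line."""
    # str.split() with no arguments splits on any run of whitespace,
    # including newlines, so re-joining with single spaces flattens it.
    return ' '.join(cmd.split())


# Placeholder command, not a real run
multi_line = """
"otbcli_GenericRegionMerging.bat"
-in input.tif
-out output.tif
-criterion bs"""

one_line = flatten_command(multi_line)
print(one_line)
```

Note that this simple approach also collapses repeated spaces inside quoted paths, so it assumes paths without consecutive spaces.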

In [None]:
# Convert output tif to polygons
out_vec = out_img.replace('.tif', '.gpkg/seg')
vec_objects = rio_polygonize(img=out_img, out_vec=out_vec)

(Check out resulting polygons)

### Other segmentation options

[Orfeo Toolbox LargeScaleMeanShift](https://www.orfeo-toolbox.org/CookBook/Applications/app_LargeScaleMeanShift.html)

[Scikit-Image Segmentation](https://scikit-image.org/docs/dev/api/skimage.segmentation.html)  
Inputs and output are often numpy arrays/matrices that need to be converted from geographic-space to pixel-space and back to geographic-space. 

![pixel_space](img/pixel_space.png)

A couple of useful functions for this conversion:

In [None]:
# Functions for going back and forth from pixel space to coordinate space
def pixel2geo(pixel_coord, geotransform):
    """
    Convert pixel coordinates to geographic coordinates
    """
    y, x = pixel_coord
    gy = geotransform[4] * x + geotransform[5] * y + geotransform[4] * 0.5 + geotransform[5] * 0.5 + geotransform[3]
    gx = geotransform[1] * x + geotransform[2] * y + geotransform[1] * 0.5 + geotransform[2] * 0.5 + geotransform[0]

    return gy, gx

def geo2pixel(geocoord, geotransform):
    """
    Convert geographic coordinates to pixel coordinates
    """
    y, x = geocoord
    py = int(np.around((y - geotransform[3]) / geotransform[5]))
    px = int(np.around((x - geotransform[0]) / geotransform[1]))
    return py, px

Luckily GDAL can handle much of this work, if you know where to look...  
<img src="img/gdal_geotransform.png" width="400">


In [None]:
# Open image and get geotransform, array
ds = gdal.Open(str(img))
geotransform = ds.GetGeoTransform()

# Log info 
root_logger.info(f'Geotransform:\n{geotransform}\n')
root_logger.info(f'Top left x: {geotransform[0]}')
root_logger.info(f'X Resolution: {geotransform[1]}')
root_logger.info(f'Rotation1: {geotransform[2]}')
root_logger.info(f'Top left y: {geotransform[3]}')
root_logger.info(f'Rotation2: {geotransform[4]}')
root_logger.info(f'Y Resolution: {geotransform[5]}\n')

In [None]:
root_logger.info(f'Top left corner in pixels (y, x): {geo2pixel(geocoord=(4983821.4, 467491.8), geotransform=geotransform)}')
root_logger.info(f'Top left corner in geocoords (y, x): {pixel2geo(pixel_coord=(0,0), geotransform=geotransform)}')
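
The six geotransform coefficients define an affine mapping from pixel (column, row) to geographic (x, y), so the reverse direction is just the inverse of that mapping. GDAL's python bindings expose this as `gdal.ApplyGeoTransform` and `gdal.InvGeoTransform`; the pure-python sketch below spells out the arithmetic, including the rotation terms. The function names are hypothetical, and the geotransform is made up: it reuses the top left corner from above with an assumed 0.6 unit pixel size.

```python
def apply_geotransform(gt, col, row):
    """Map pixel (col, row) to geographic (x, y) at the pixel's top left corner."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def invert_geotransform(gt):
    """Return the geotransform that maps geographic (x, y) back to (col, row)."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError('Geotransform is not invertible')
    inv1, inv2 = gt[5] / det, -gt[2] / det
    inv4, inv5 = -gt[4] / det, gt[1] / det
    inv0 = -(gt[0] * inv1 + gt[3] * inv2)
    inv3 = -(gt[0] * inv4 + gt[3] * inv5)
    return (inv0, inv1, inv2, inv3, inv4, inv5)


# Hypothetical north-up geotransform (pixel size of 0.6 is an assumption)
gt = (467491.8, 0.6, 0.0, 4983821.4, 0.0, -0.6)

x, y = apply_geotransform(gt, col=10, row=20)
col, row = apply_geotransform(invert_geotransform(gt), x, y)
print(f'geo: ({x}, {y}) -> pixel: ({col}, {row})')
```

Because the inverse is itself a geotransform, the same `apply_geotransform` function works in both directions. The results are floats; round or floor them to get integer array indices.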

#### Quickshift Segmentation
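
GDAL returns multi-band images as `(bands, rows, cols)` arrays, while scikit-image expects `(rows, cols, bands)`. Moving the band axis is not the same as reshaping: `reshape` keeps the flat order of the values and only changes how they are grouped, so pixel values end up mixed across bands. The sketch below uses a small made-up array to show the difference.

```python
import numpy as np

bands, rows, cols = 4, 2, 3
# Made-up image: each value is unique so it can be traced
array = np.arange(bands * rows * cols).reshape(bands, rows, cols)

# Move the band axis to the end: moved[r, c, b] == array[b, r, c]
moved = np.moveaxis(array, 0, -1)

# Reshape only regroups the flat values; pixels no longer line up
reshaped = array.reshape(rows, cols, bands)

print('True band values at pixel (0, 0):', array[:, 0, 0])
print('After moveaxis:                  ', moved[0, 0])
print('After reshape:                   ', reshaped[0, 0])
```

Both results have the shape `(rows, cols, bands)`, which is why the problem is easy to miss when only the shape is checked.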

In [None]:
# Read image as array
array = ds.ReadAsArray()
# Move bands to the last axis: (bands, rows, cols) -> (rows, cols, bands)
rgb = np.moveaxis(array, 0, -1)
# Drop last band to get three-bands
rgb = rgb[:, :, :3]
# Run segmentation
labelled = quickshift(image=rgb, ratio=1, kernel_size=5, max_dist=25, sigma=1)

In [None]:
root_logger.info(f'Source array shape: {array.shape}')
root_logger.info(f'RGB array shape: {rgb.shape}')
root_logger.info(f'Labelled shape: {labelled.shape}')
root_logger.info(f'Labelled min: {labelled.min()}')
root_logger.info(f'Labelled max: {labelled.max()}')
root_logger.info(f'Labelled mean: {labelled.mean()}')

In [None]:
# Write array out, using geotransform of original img
out_qs = out_img.replace('.tif', '_qs.tif')
write_array(labelled, out_qs, ds, dtype=3)

In [None]:
out_qs_vec = out_qs.replace('.tif', '_qs.gpkg/qs_seg')
qs_poly = rio_polygonize(img=out_qs, out_vec=out_qs_vec)

## Zonal Statistics

The [rasterstats](https://pythonhosted.org/rasterstats/) package does all of the work here. The simplest usage of `rasterstats` is very easy; this is straight from their docs:  
```python
from rasterstats import zonal_stats
zonal_stats("polygons.shp", "elevation.tif",
            stats="count min mean max median")
```
The only wonky bit is that this returns a `list` of `dicts`, one for each feature in `polygons.shp`:
```python
[...,
 {'count': 89,
  'max': 69.52958679199219,
  'mean': 20.08093536034059,
  'median': 19.33736801147461,
  'min': 1.5106816291809082},
]

```
The `calc_zonal_stats` function I wrote handles the logic of inputting multiple rasters, computing different stats for each raster, formatting the results into a GeoDataFrame and writing the output. It also adds the ability to compute `compactness` and `roundness`. 
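
To give a feel for the formatting step, here is a rough sketch of turning several `list`-of-`dicts` results into one row per object. It is not the actual `calc_zonal_stats` implementation: the function names are hypothetical, the values are made up, and the column naming (`ndvi_mean`, `imgb1_mean`) is inferred from the field names used in the classification step.

```python
def prefix_stats(stats, raster_name, band=None):
    """Rename each stat key to '<raster>[b<band>]_<stat>'."""
    label = raster_name if band is None else f'{raster_name}b{band}'
    return [{f'{label}_{stat}': value for stat, value in feature.items()}
            for feature in stats]


def merge_features(*stat_lists):
    """Combine per-raster stats into one dict per feature (same feature order)."""
    merged = []
    for rows in zip(*stat_lists):
        combined = {}
        for row in rows:
            combined.update(row)
        merged.append(combined)
    return merged


# Made-up results for two features, shaped like zonal_stats output
ndvi_stats = [{'mean': 0.42}, {'mean': -0.1}]
img_b1_stats = [{'mean': 101.5, 'max': 180.0}, {'mean': 60.25, 'max': 90.0}]

table = merge_features(prefix_stats(ndvi_stats, 'ndvi'),
                       prefix_stats(img_b1_stats, 'img', band=1))
print(table[0])
```

A list of flat dicts like `table` can be passed straight to a DataFrame constructor and joined to the polygons by position, since `zonal_stats` returns results in feature order.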

In [None]:
# Create dictionary of rasters, stats, and bands to compute
zonal_stats_params = (
    {"img": {
        "path": img,
        "stats": ["mean", "max"],
        "bands": [1, 2, 3, 4]
    },
    "nDSM": {
        "path": ndsm,
        "stats": ["mean"]
    },
    "roughness": {
        "path": roughness,
        "stats": ["mean"]
    },
    "ndvi":{
        "path": ndvi,
        "stats": ["mean"]
    } 
})

In [None]:
out_zs = out_vec.replace('/seg', '/zs')
calc_zonal_stats(shp=out_vec,
                 rasters=[zonal_stats_params],
                 compactness=True,
                 roundness=True,
                 out_path=out_zs)

## Classification

This is done here with python + geopandas, but could also be done in QGIS with select-by-attribute.

In [None]:
obj = read_vec(out_zs)
obj.sample(5)

In [None]:
obj.describe()

In [None]:
# Field names
ndvi_mean = 'ndvi_mean'
ndsm_mean = 'nDSM_mean'
roughness_mean = 'roughness_mean'
imgb1_mean = 'imgb1_mean'
imgb2_mean = 'imgb2_mean'
imgb3_mean = 'imgb3_mean'
imgb4_mean = 'imgb4_mean'
roundness = 'roundness'
# Class names
trees = 'trees'
water = 'water'
open_green = 'open_green'
buildings = 'buildings'
roads_pave = 'roads_pavement'
shadow = 'shadow'
# Add a field to hold the class
CLASS = 'class'
obj[CLASS] = None


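def apply_rules(gdf, rules, class_name, unclass_only=True):
    """Assign class_name to objects that satisfy every (field, operator, value) rule.

    If unclass_only is True, objects that already have a class are left unchanged.
    """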
    # Look up strings operators to get function
    op_lut = {'>': operator.gt,
              '<': operator.lt}
    
    matches = copy.deepcopy(gdf)
    for field, op, val in rules:
        if isinstance(op, str):
            # If a string is passed, look up the operator function
            op = op_lut[op]
        matches = matches[op(matches[field], val)]
    root_logger.info(f'Objects to be classified as {class_name}: {len(matches)}')
    if unclass_only:
        # Index in matches and not classified yet
        gdf.loc[gdf.index.isin(matches.index) & gdf[CLASS].isnull(), CLASS] = class_name
    else:
        # Index in matches - would overwrite class if present
        gdf.loc[gdf.index.isin(matches.index), CLASS] = class_name
    root_logger.info(f'Objects classified as {class_name}: {len(gdf[gdf[CLASS]==class_name])}')
    root_logger.debug(f'Remaining unclassified: {len(gdf[gdf[CLASS].isnull()])}')
    return gdf

In [None]:
# Trees
tree_rules = [
    (ndvi_mean, '>', 0),
    (ndsm_mean, '>', 0.75),
    (roughness_mean, '>', 1.25)
]
# Water
water_rules = [
    (ndvi_mean, '<', 0),
    (roughness_mean, '<', 0.13),
    (ndsm_mean, '<', 1),
    (imgb4_mean, '<', 10)
]
# Open Green Space
open_rules = [
    (ndvi_mean, '>', 0.1),
    (ndsm_mean, '<', 1)
]
# Buildings
build_rules = [
    (ndvi_mean, '<', 0),
    (ndsm_mean, '>', 2),
]
# Roads and pavement
roads_rules = [
    (ndvi_mean, '<', 0),
    (ndsm_mean, '<', 1),
    (roughness_mean, '<', 1.5),
    (roundness, '>', 2)
]
# Shadow
shadow_rules = [
    (imgb1_mean, '<', 150),
    (imgb2_mean, '<', 150),
    (imgb3_mean, '<', 150),
    (imgb4_mean, '<', 150)
]
classes = [trees, water, open_green, buildings, roads_pave, shadow]
rules = [tree_rules, water_rules, open_rules, build_rules, roads_rules, shadow_rules]
classes_rules = zip(classes, rules)
for class_name, class_rules in classes_rules:
    root_logger.info(f'Classifying: {class_name}')
    obj = apply_rules(obj, class_rules, class_name)

Other kinds of rules are possible, but more complicated to implement:  
* OR rules  
* adjacency
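
As a rough idea of how OR rules could work, each rule can be turned into a boolean mask and the masks combined with `|` instead of `&`. This is a sketch only: `rules_mask` is a hypothetical function, it uses a plain pandas DataFrame rather than the GeoDataFrame above, and the values and thresholds are made up.

```python
import operator
from functools import reduce

import pandas as pd

OPS = {'>': operator.gt, '<': operator.lt}


def rules_mask(df, rules, combine='and'):
    """Return a boolean Series: True where the combined rules hold."""
    masks = [OPS[op](df[field], val) for field, op, val in rules]
    joiner = operator.and_ if combine == 'and' else operator.or_
    return reduce(joiner, masks)


# Made-up objects
objects = pd.DataFrame({'ndvi_mean': [0.4, -0.2, 0.05],
                        'nDSM_mean': [5.0, 0.2, 3.0]})
rules = [('ndvi_mean', '>', 0.3),
         ('nDSM_mean', '>', 2)]

any_match = rules_mask(objects, rules, combine='or')
all_match = rules_mask(objects, rules, combine='and')
print(any_match.tolist(), all_match.tolist())
```

The resulting mask can be used with `.loc` in the same way as the index test in `apply_rules`. Adjacency rules would additionally need a spatial relationship between objects, which is not covered here.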


In [None]:
# Write classification
out_class = out_vec.replace('/seg', '/classified')
# write_gdf(obj, out_class)
print(out_class)