# Open Source Object-Based Image Analysis

Derivatives -> Segmentation -> Zonal statistics -> Classification 
[add image]

gdal hierarchy


## Stack
**python:**  
    - [geopandas](https://geopandas.org): vector (image object) manipulation  
    - [rasterio](https://rasterio.readthedocs.io/en/latest/): raster loading/manipulation  
    - [rasterstats](https://pythonhosted.org/rasterstats/): zonal statistics  
*Using anaconda/miniconda is **highly** recommended for managing Python dependencies, specifically installing from the `conda-forge` channel.*  

[**WhiteBoxTools**](https://jblindsay.github.io/wbt_book/intro.html): raster derivatives, specifically geomorphometric analyses.  
*Has Python bindings, but they are difficult to integrate with other dependencies, so it is called from the command line.*  

[**Orfeo Toolbox**](https://www.orfeo-toolbox.org/CookBook/): segmentation (also has general GIS tools).  
*Has Python bindings, but they are difficult to integrate with other dependencies, so it is called from the command line.*  

[**QGIS**](https://qgis.org/en/site/): Viewing outputs, general GIS.

## Imports + Logging Set up

In [89]:
# Standard lib
import copy
import logging
import operator
from pathlib import Path
from pprint import pprint
import warnings
import sys

# Installed packages
from osgeo import gdal
import geopandas
import rasterio
import rasterstats

# Local packages
from lib import (run_subprocess, clean4cmdline, create_grm_outname,
                  rio_polygonize, read_vec, write_gdf)
from calc_zonal_stats import calc_zonal_stats

In [25]:
# Set up logger
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler(sys.stdout)
formatter = logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)
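# Avoid attaching a second handler (and duplicated log lines) if this cell is re-run
if not logger.handlers:
    logger.addHandler(ch)

# The message pattern is matched against the start of the warning text,
# so the category is passed separately instead of being part of the message
warnings.filterwarnings(
    'ignore',
    message=r'Sequential read of iterator was interrupted',
    category=RuntimeWarning)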

## Paths (Orfeo + Data)

In [3]:
# Orfeo setup and tools
otb = Path(r'C:\OTB-7.2.0-Win64\OTB-7.2.0-Win64')
otb_init = otb / 'otbenv.bat'
otb_bin = otb / 'bin'
otb_grm = otb_bin / 'otbcli_GenericRegionMerging.bat'
# otb_lsms = otb_bin / 'otbcli_LargeScaleMeanShift.bat'

# Data
data_dir = Path(r'./data')
img = data_dir / 'naip_m_4509361_se_15_060_20190727_aoi.tif'
ndsm = data_dir / 'nDSM_clip_fill.tif'
ndvi = data_dir / 'ndvi_naip_m_4509361_se_15_060_20190727_aoi.tif'
roughness = data_dir / 'nDSM_clip_fill_roughness.tif'

# Out paths
out_dir = Path(r'./results')
seg_dir = out_dir / 'seg'

# Ensure all exist
missing_files = []
for file_path in [data_dir, img, ndsm, ndvi, roughness, out_dir]:
    if not file_path.exists():
        missing_files.append(file_path)
if len(missing_files) > 0:
    for file_path in missing_files:
        logger.error(f'Missing file/folder: {file_path}')
else:
    logger.info('All files/folders located.')

2021-04-22 15:00:33,802 - __main__ - INFO - All files/folders located.


## Check out input data
(How it was created, a histogram of the image values (e.g. in QGIS), and statistics of the image to segment.)
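
The notebook does not show how the NDVI raster was produced. As a minimal sketch, assuming it was derived from the NAIP image with band 1 as red and band 4 as near-infrared (the usual 4-band NAIP order), NDVI could be computed with NumPy roughly as follows. The function name `compute_ndvi` and the toy arrays are illustrative only.

```python
import numpy as np


def compute_ndvi(red, nir):
    """Return NDVI = (NIR - Red) / (NIR + Red) as float32.

    Pixels where NIR + Red == 0 are set to NaN instead of dividing by zero.
    """
    red = red.astype("float32")
    nir = nir.astype("float32")
    denom = nir + red
    ndvi = np.full(red.shape, np.nan, dtype="float32")
    # Only divide where the denominator is non-zero
    np.divide(nir - red, denom, out=ndvi, where=denom != 0)
    return ndvi


# Toy 2x2 example (hypothetical values, not taken from the NAIP image)
red = np.array([[50, 100], [0, 200]], dtype="uint8")
nir = np.array([[150, 100], [0, 50]], dtype="uint8")
ndvi = compute_ndvi(red, nir)
print(ndvi)
```

With rasterio, the two band arrays would typically come from `src.read(1)` and `src.read(4)` on the opened image.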

## Segmentation

### Generic Region Merging
[OTB docs](https://www.orfeo-toolbox.org/CookBook/Applications/app_GenericRegionMerging.html?highlight=generic%20region%20merging)

In [4]:
# Parameters
# Homogeneity criterion to use. The default is 'bs'. One of: [bs, ed, fls]
criterion = 'bs'
threshold = 100
niter = 100
spectral_w = 0.6 # spectral weight, higher values slow processing time
spatial_w = 25 # spatial weight
out_img = create_grm_outname(img=img,
                             out_dir=seg_dir,
                             criterion=criterion,
                             threshold=threshold,
                             niter=niter,
                             spectral=spectral_w,
                             spatial=spatial_w)

In [5]:
# Build the command
otb_cmd = f"""
"{otb_grm}"
-in {str(img)}
-out {str(out_img)}
-criterion {criterion}
-threshold {threshold}
-niter {niter}
-cw {spectral_w}
-sw {spatial_w}"""

otb_cmd = clean4cmdline(otb_cmd)
logger.info(f'OTB command:\n{otb_cmd}')
otb_cmd = f'{otb_init} && {otb_cmd}'
logger.debug(f'OTB full command:\n{otb_cmd}')

2021-04-22 15:00:33,839 - __main__ - INFO - OTB command:
"C:\OTB-7.2.0-Win64\OTB-7.2.0-Win64\bin\otbcli_GenericRegionMerging.bat" -in data\naip_m_4509361_se_15_060_20190727_aoi.tif -out results\seg\naip_m_4509361_se_15_060_20190727_aoi_bst100ni0s0spec0x6spat25.tif -criterion bs -threshold 100 -niter 0 -cw 0.6 -sw 25
2021-04-22 15:00:33,839 - __main__ - DEBUG - OTB full command:
C:\OTB-7.2.0-Win64\OTB-7.2.0-Win64\otbenv.bat && "C:\OTB-7.2.0-Win64\OTB-7.2.0-Win64\bin\otbcli_GenericRegionMerging.bat" -in data\naip_m_4509361_se_15_060_20190727_aoi.tif -out results\seg\naip_m_4509361_se_15_060_20190727_aoi_bst100ni0s0spec0x6spat25.tif -criterion bs -threshold 100 -niter 0 -cw 0.6 -sw 25
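
`clean4cmdline` comes from the local `lib` module, whose source is not shown here. Judging by the logged command, it collapses the multi-line f-string into a single line. A hypothetical stand-in (`flatten_command` is an invented name, not the actual implementation, and the paths are placeholders) might look like this:

```python
def flatten_command(command):
    """Collapse newlines and repeated whitespace into single spaces."""
    # str.split() with no arguments splits on any run of whitespace
    return " ".join(command.split())


multi_line = """
"otbcli_GenericRegionMerging.bat"
-in data/image.tif
-out results/seg/image_seg.tif
-criterion bs"""
flat = flatten_command(multi_line)
print(flat)
```

Paths that contain spaces would additionally need quoting, since this approach treats every run of whitespace as a separator.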


In [90]:
# Run the segmentation
# import subprocess
# from subprocess import PIPE
# def run_subprocess(command):
#     proc = subprocess.Popen(command, stdout=PIPE, stderr=PIPE, shell=True)
#     for line in iter(proc.stdout.readline, b''):  # replace '' with b'' for Python 3
#         logger.info(line.decode())
#     output, error = proc.communicate()
#     logger.debug('Output: {}'.format(output.decode()))
#     logger.debug('Err: {}'.format(error.decode()))

# run_subprocess(otb_cmd)
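
The commented-out helper above reads `stdout` line by line while `stderr` is piped separately, which can stall if the tool writes a lot to `stderr`. A minimal alternative sketch, assuming it is acceptable to merge `stderr` into `stdout` (`run_and_log` is a hypothetical name, not the `run_subprocess` imported from `lib`):

```python
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)


def run_and_log(command, shell=False):
    """Run a command, log each output line as it arrives,
    and return (exit code, list of output lines)."""
    proc = subprocess.Popen(command,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,  # merge stderr into stdout
                            text=True,
                            shell=shell)
    lines = []
    for line in proc.stdout:
        line = line.rstrip()
        lines.append(line)
        logger.info(line)
    proc.stdout.close()
    return proc.wait(), lines


# Example with a harmless command: ask the current Python to print a message
returncode, output = run_and_log([sys.executable, '-c', 'print("segmentation done")'])
```

For the OTB command string built earlier, `shell=True` would be needed because it chains two commands with `&&`.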

(Check out resulting image)

In [None]:
# Convert output tif to polygons
# str() in case out_img is a Path: Path.replace() renames files rather than substituting text
out_vec = str(out_img).replace('.tif', '.gpkg/seg')
# vec_objects = rio_polygonize(img=out_img, out_vec=out_vec)

(Check out resulting polygons)

### Other segmentation options
[Orfeo Toolbox LargeScaleMeanShift](https://www.orfeo-toolbox.org/CookBook/Applications/app_LargeScaleMeanShift.html)

[Scikit-Image Segmentation](https://scikit-image.org/docs/dev/api/skimage.segmentation.html)  
Inputs and outputs are often NumPy arrays that need to be converted from pixel space back to geographic space. A couple of useful functions for this conversion:

In [91]:
import numpy as np


def pixel2geo(pixel_coord, geotransform):
    """
    Convert pixel coordinates (row, col) to the geographic coordinates
    (y, x) of the pixel center
    """
    y, x = pixel_coord
    # Adding 0.5 to the row and column moves from the pixel corner to its center
    gy = geotransform[4] * (x + 0.5) + geotransform[5] * (y + 0.5) + geotransform[3]
    gx = geotransform[1] * (x + 0.5) + geotransform[2] * (y + 0.5) + geotransform[0]

    return gy, gx


def geo2pixel(geocoord, geotransform):
    """
    Convert geographic coordinates (y, x) to pixel coordinates (row, col).
    Assumes a north-up image (rotation terms are ignored).
    """
    y, x = geocoord
    # floor returns the pixel that contains the point
    py = int(np.floor((y - geotransform[3]) / geotransform[5]))
    px = int(np.floor((x - geotransform[0]) / geotransform[1]))
    return py, px


ds = gdal.Open(str(img))
geotransform = ds.GetGeoTransform()

logger.info(f'Geotransform:\n{geotransform}')
logger.info(f'Top left x: {geotransform[0]}')
logger.info(f'X Resolution: {geotransform[1]}')
logger.info(f'Rotation1: {geotransform[2]}')
logger.info(f'Top left y: {geotransform[3]}')
logger.info(f'Rotation: {geotransform[4]}')
logger.info(f'Y Resolution: {geotransform[5]}')

# geocoord is given as (y, x)
geo2pixel(geocoord=(4983821, 467491.8), geotransform=geotransform)

2021-04-22 18:20:07,325 - __main__ - INFO - Geotransform:
(467491.8, 0.600000000000014, 0.0, 4983821.4, 0.0, -0.600000000000258)
2021-04-22 18:20:07,332 - __main__ - INFO - Top left x: 467491.8
2021-04-22 18:20:07,332 - __main__ - INFO - X Resolution: 0.600000000000014
2021-04-22 18:20:07,332 - __main__ - INFO - Rotation1: 0.0
2021-04-22 18
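
Segmentation functions that return whole arrays of pixel indices are easier to handle with a vectorized conversion. The following sketch applies the GDAL geotransform formula to NumPy arrays of rows and columns. `pixels2geo` is a hypothetical helper, and the geotransform is a rounded version of the one logged above.

```python
import numpy as np


def pixels2geo(rows, cols, geotransform):
    """Return (y, x) arrays for the centers of the given pixels.

    geotransform follows the GDAL order:
    (top-left x, x resolution, row rotation,
     top-left y, column rotation, y resolution)
    """
    rows = np.asarray(rows, dtype=float) + 0.5  # shift to pixel center
    cols = np.asarray(cols, dtype=float) + 0.5
    x = geotransform[0] + cols * geotransform[1] + rows * geotransform[2]
    y = geotransform[3] + cols * geotransform[4] + rows * geotransform[5]
    return y, x


gt = (467491.8, 0.6, 0.0, 4983821.4, 0.0, -0.6)
y, x = pixels2geo(rows=[0, 10], cols=[0, 5], geotransform=gt)
print(y, x)
```

Returning `(y, x)` keeps the same ordering as the scalar functions in this notebook, which matches the `(row, col)` indexing of NumPy arrays.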

## Zonal Statistics

In [16]:
# Create dictionary of rasters, stats, and bands to compute
zonal_stats_params = {
    "img": {
        "path": img,
        "stats": ["mean", "max"],
        "bands": [1, 2, 3, 4]
    },
    "nDSM": {
        "path": ndsm,
        "stats": ["mean"]
    },
    "roughness": {
        "path": roughness,
        "stats": ["mean"]
    },
    "ndvi": {
        "path": ndvi,
        "stats": ["mean"]
    }
}
pprint(zonal_stats_params)

{'img': {'bands': [1, 2, 3, 4],
         'path': WindowsPath('data/naip_m_4509361_se_15_060_20190727_aoi.tif'),
         'stats': ['mean', 'max']},
 'nDSM': {'path': WindowsPath('data/nDSM_clip_fill.tif'), 'stats': ['mean']},
 'ndvi': {'path': WindowsPath('data/ndvi_naip_m_4509361_se_15_060_20190727_aoi.tif'),
          'stats': ['mean']},
 'roughness': {'path': WindowsPath('data/nDSM_clip_fill_roughness.tif'),
               'stats': ['mean']}}
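
Conceptually, each requested statistic is computed over the pixels that fall inside every image object. The following is a minimal NumPy sketch of a zonal mean, assuming the segments are available as a label raster aligned with the value raster. It illustrates the idea only and is not how `calc_zonal_stats` or `rasterstats` is implemented.

```python
import numpy as np


def zonal_mean(labels, values):
    """Return {label: mean of values within that label}, ignoring NaN pixels."""
    labels = labels.ravel()
    values = values.ravel().astype(float)
    valid = ~np.isnan(values)
    # Sum and count of valid pixels per label
    sums = np.bincount(labels[valid], weights=values[valid])
    counts = np.bincount(labels[valid])
    return {int(lab): float(sums[lab] / counts[lab])
            for lab in np.unique(labels[valid])}


# Toy label raster with two objects and a matching value raster
labels = np.array([[1, 1, 2],
                   [1, 2, 2]])
values = np.array([[2.0, 4.0, 10.0],
                   [6.0, np.nan, 20.0]])
result = zonal_mean(labels, values)
print(result)
```

Objects with no valid pixels do not appear in the result at all, so a caller would need to decide how to represent them (for example as missing values).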


In [17]:
out_zs = out_vec.replace('/seg', '/zs')
calc_zonal_stats(shp=out_vec,
                 rasters=[zonal_stats_params],
                 compactness=True,
                 roundness=True,
                 out_path=out_zs)


results\seg\naip_m_4509361_se_15_060_20190727_aoi_bst100ni0s0spec0x6spat25.gpkg/zs
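
The `compactness` and `roundness` flags add shape descriptors to each object. The notebook does not define them. The values reported in the tables further down are consistent with compactness = 4πA/P² (the Polsby-Popper ratio) and roundness as its reciprocal, so the sketch below assumes those definitions. The function names are illustrative.

```python
import math


def compactness(area, perimeter):
    """Polsby-Popper compactness: 1.0 for a circle, smaller for irregular shapes."""
    return 4 * math.pi * area / perimeter ** 2


def roundness(area, perimeter):
    """Reciprocal of compactness (assumed definition): 1.0 for a circle, larger otherwise."""
    return perimeter ** 2 / (4 * math.pi * area)


# A single square 0.6 m pixel: area 0.36, perimeter 2.4
print(round(compactness(0.36, 2.4), 6))
print(round(roundness(0.36, 2.4), 5))
```

For a single square pixel these evaluate to π/4 and 4/π, which match the maximum `compactness` (0.785398) and minimum `roundness` (1.27324) in the summary statistics. With geopandas, the inputs would come from `gdf.geometry.area` and `gdf.geometry.length`.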


## Classification
Classification here uses Python and geopandas, but the same rules could also be applied in QGIS with select by attribute.

In [54]:
obj = read_vec(out_zs)
obj.sample(5)


| | raster_val | imgb1_max | imgb1_mean | imgb2_max | imgb2_mean | imgb3_max | imgb3_mean | imgb4_max | imgb4_mean | nDSM_mean | roughness_mean | ndvi_mean | area_zs | compactness | roundness | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6948 | 7018.0 | 98.0 | 80.079365 | 83.0 | 69.386243 | 89.0 | 80.164021 | 213.0 | 189.190476 | 6.559516 | 7.47871 | 0.40251 | 68.04 | 0.579845 | 1.7246 | POLYGON ((468736.800 4983237.600, 468736.800 4... |
| 2416 | 2152.0 | 182.0 | 125.08867 | 178.0 | 116.827586 | 146.0 | 119.428571 | 80.0 | 31.591133 | 2.815 | 2.699571 | -0.584924 | 73.08 | 0.520607 | 1.920836 | POLYGON ((467815.800 4983820.800, 467815.800 4... |
| 4215 | 4286.0 | 159.0 | 90.942029 | 142.0 | 89.971014 | 159.0 | 94.76087 | 101.0 | 29.985507 | 0.313962 | 0.681132 | -0.518909 | 49.68 | 0.552984 | 1.808369 | POLYGON ((468494.400 4983621.600, 468494.400 4... |
| 3904 | 3825.0 | 163.0 | 111.684369 | 166.0 | 126.256513 | 129.0 | 88.481964 | 194.0 | 146.109218 | 10.849944 | 10.543547 | 0.233579 | 359.28 | 0.301356 | 3.318333 | POLYGON ((468284.400 4983681.600, 468284.400 4... |
| 5829 | 5825.0 | 122.0 | 58.592896 | 120.0 | 64.065574 | 110.0 | 64.571038 | 169.0 | 58.18306 | 3.367422 | 4.303204 | -0.097828 | 131.76 | 0.256142 | 3.904079 | POLYGON ((468622.200 4983406.200, 468622.200 4... |


In [53]:
obj.describe()

| | raster_val | imgb1_max | imgb1_mean | imgb2_max | imgb2_mean | imgb3_max | imgb3_mean | imgb4_max | imgb4_mean | nDSM_mean | roughness_mean | ndvi_mean | area_zs | compactness | roundness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 8821.0 | 8821.0 | 8821.0 | 8821.0 | 8821.0 | 8821.0 | 8821.0 | 8821.0 | 8821.0 | 5259.0 | 5248.0 | 5258.0 | 8821.0 | 8821.0 | 8821.0 |
| mean | 4414.0 | 89.444281 | 62.530597 | 89.692552 | 65.061934 | 78.089446 | 56.880884 | 87.69822 | 52.528765 | 2.893229 | 3.738809 | -0.150113 | 147.670767 | 0.554309 | 2.147803 |
| std | 2546.547696 | 80.698568 | 60.444144 | 79.254736 | 60.170375 | 71.380508 | 52.343567 | 85.796962 | 60.037277 | 3.392048 | 4.032962 | 0.371299 | 296.337555 | 0.207466 | 1.037152 |
| min | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.021763 | 0.0 | -0.913682 | 0.36 | 0.069582 | 1.27324 |
| 25% | 2209.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.06412 | 0.373922 | -0.423731 | 0.36 | 0.373546 | 1.27324 |
| 50% | 4414.0 | 99.0 | 66.763359 | 110.0 | 71.770386 | 88.0 | 70.806452 | 83.0 | 31.930233 | 1.496084 | 2.309888 | -0.135565 | 51.48 | 0.500046 | 1.999816 |
| 75% | 6619.0 | 165.0 | 103.245968 | 159.0 | 115.46832 | 135.0 | 88.622605 | 177.0 | 98.302469 | 4.761645 | 6.022746 | 0.204408 | 181.8 | 0.785398 | 2.677043 |
| max | 8824.0 | 255.0 | 231.715447 | 255.0 | 228.756098 | 255.0 | 222.398374 | 255.0 | 211.557692 | 18.680612 | 22.409891 | 0.420029 | 6719.4 | 0.785398 | 14.371454 |


In [84]:
# Field names
ndvi_mean = 'ndvi_mean'
ndsm_mean = 'nDSM_mean'
roughness_mean = 'roughness_mean'
imgb1_mean = 'imgb1_mean'
imgb2_mean = 'imgb2_mean'
imgb3_mean = 'imgb3_mean'
imgb4_mean = 'imgb4_mean'
roundness = 'roundness'
# Class names
trees = 'trees'
water = 'water'
open_green = 'open_green'
buildings = 'buildings'
roads_pave = 'roads_pavement'
shadow = 'shadow'
# Add a field to hold the class
CLASS = 'class'
obj[CLASS] = None


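def apply_rules(gdf, rules, class_name, unclass_only=True):
    """Assign class_name to the objects in gdf that satisfy every rule.

    rules is a list of (field, operator, value) tuples, where operator is
    '>' or '<' (or a callable such as operator.gt). If unclass_only is True,
    only objects that have no class yet are updated.
    """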
    # Look up strings operators to get function
    op_lut = {'>': operator.gt,
              '<': operator.lt}
    
    matches = copy.deepcopy(gdf)
    for field, op, val in rules:
        if isinstance(op, str):
            # if  a string is passed, get the function
            op = op_lut[op]
        matches = matches[op(matches[field], val)]
    logger.info(f'Objects to be classified as {class_name}: {len(matches)}')
    if unclass_only:
        # Index in matches and not classified yet
        gdf.loc[gdf.index.isin(matches.index) & gdf[CLASS].isnull(), CLASS] = class_name
    else:
        # Index in matches - would overwrite class if present
        gdf.loc[gdf.index.isin(matches.index), CLASS] = class_name
    logger.info(f'Objects classified as {class_name}: {len(gdf[gdf[CLASS]==class_name])}')
    logger.debug(f'Remaining unclassified: {len(gdf[gdf[CLASS].isnull()])}')
    return gdf

In [85]:
# Trees
tree_rules = [
    (ndvi_mean, '>', 0),
    (ndsm_mean, '>', 0.75),
    (roughness_mean, '>', 1.25)
]
# Water
water_rules = [
    (ndvi_mean, '<', 0),
    (roughness_mean, '<', 0.13),
    (ndsm_mean, '<', 1),
    (imgb4_mean, '<', 10)
]
# Open Green Space
open_rules = [
    (ndvi_mean, '>', 0.1),
    (ndsm_mean, '<', 1)
]
# Buildings
build_rules = [
    (ndvi_mean, '<', 0),
    (ndsm_mean, '>', 2),
]
# Roads and pavement
roads_rules = [
    (ndvi_mean, '<', 0),
    (ndsm_mean, '<', 1),
    (roughness_mean, '<', 1.5),
    (roundness, '>', 2)
]
# Shadow
shadow_rules = [
    (imgb1_mean, '<', 150),
    (imgb2_mean, '<', 150),
    (imgb3_mean, '<', 150),
    (imgb4_mean, '<', 150)
]
classes = [trees, water, open_green, buildings, roads_pave, shadow]
rules = [tree_rules, water_rules, open_rules, build_rules, roads_rules, shadow_rules]
classes_rules = zip(classes, rules)
for class_name, class_rules in classes_rules:
    logger.info(f'Classifying: {class_name}')
    obj = apply_rules(obj, class_rules, class_name)

2021-04-22 16:47:42,153 - __main__ - INFO - Classifying: trees
2021-04-22 16:47:42,222 - __main__ - INFO - Objects to be classified as trees: 1632
2021-04-22 16:47:42,260 - __main__ - INFO - Objects classified as trees: 1632
2021-04-22 16:47:42,275 - __main__ - INFO - Remaining unclassified: 7189
2021-04-22 16:47:42,290 - __main__ - INFO - Classifying: water
2021-04-22 16:47:42,290 - __main__ - INFO -
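
Because `apply_rules` only touches unclassified objects by default, the order of `classes` acts as a priority list: an object keeps the first class whose rules it satisfies. The following dependency-free sketch mirrors that logic on plain dictionaries. The objects, the abbreviated rule sets, and the `classify` helper are made up for illustration. It also shows that an object with a missing (`NaN`) value never satisfies a rule on that field.

```python
import math
import operator

OPS = {'>': operator.gt, '<': operator.lt}


def classify(obj, classes_rules):
    """Return the first class whose rules all hold for obj, else None."""
    for class_name, rules in classes_rules:
        if all(OPS[op](obj[field], val) for field, op, val in rules):
            return class_name
    return None


classes_rules = [
    ('trees', [('ndvi_mean', '>', 0), ('nDSM_mean', '>', 0.75)]),
    ('open_green', [('ndvi_mean', '>', 0.1), ('nDSM_mean', '<', 1)]),
    ('shadow', [('imgb1_mean', '<', 150)]),
]

objects = [
    # Satisfies both trees and open_green; trees wins because it comes first
    {'ndvi_mean': 0.4, 'nDSM_mean': 0.9, 'imgb1_mean': 80},
    # Too low for trees, so it falls through to open_green
    {'ndvi_mean': 0.3, 'nDSM_mean': 0.1, 'imgb1_mean': 120},
    # NaN height fails both '>' and '<' comparisons, leaving only shadow
    {'ndvi_mean': 0.3, 'nDSM_mean': math.nan, 'imgb1_mean': 90},
]
labels = [classify(o, classes_rules) for o in objects]
print(labels)
```

Reordering `classes_rules` changes the outcome for the first object, which is why broad rule sets such as the shadow rules are best placed last.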

In [83]:
# Write classification
out_class = out_vec.replace('/seg', '/classified')
# write_gdf(obj, out_class)
print(out_class)

results\seg\naip_m_4509361_se_15_060_20190727_aoi_bst100ni0s0spec0x6spat25.gpkg/classified
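
The output names in this notebook are derived by replacing `'/seg'` in the path string. That works for the Windows path shown, but with forward-slash paths the `results/seg/` directory would match as well. A minimal sketch of a helper that only swaps the trailing layer name, assuming the `file.gpkg/layer` convention understood by the local `read_vec` and `write_gdf` helpers (`swap_layer` is a hypothetical name and the paths are placeholders):

```python
def swap_layer(vec_path, new_layer):
    """Replace the layer name after the final '/' in a 'file.gpkg/layer' path."""
    base, _, _old_layer = str(vec_path).rpartition('/')
    return f'{base}/{new_layer}'


posix_path = 'results/seg/aoi_seg.gpkg/seg'
naive = posix_path.replace('/seg', '/zs')  # also rewrites the directory name
safe = swap_layer(posix_path, 'zs')        # only the layer changes
print(naive)
print(safe)
```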
