# Geospatial packages

- The SN6 dataset contains geo-referenced images. To make our lives easier we'll use `geopandas` for processing GeoJSON files and `rasterio` for handling .tiff raster files.
- `geopandas` is more or less pandas with added spatial processing; for `rasterio`, the documentation website covers how to do most things.
- I like modular, reusable code, so I'll wrap each group of processing steps in a function.
- I like bullet-point explanations: I'm busy, you probably are too, so let's focus on what matters.

In [None]:
import geopandas as gpd
import rasterio as rs
from rasterio.plot import show  # imshow for raster
from rasterio import features as feat  # handle binary mask

# some standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

I always like to start by viewing the annotations.

In [None]:
ROOT_DIR = '../input/spacenet-6-multisensor-allweather-mapping/AOI_11_Rotterdam/'

df = pd.read_csv(ROOT_DIR+'SummaryData/SN6_Train_AOI_11_Rotterdam_Buildings.csv')
print(f'total rows: {df.shape[0]}')
display(df.head())

- Here we can tell there are 214k buildings. Are they all unique? Let's sort by median building height. Why are there NaN values?
- You can also use geopandas for this, but it's faster to read the GeoJSON file of each ImageId
- Let's list the unique ImageIds that we'll use for previewing some examples


In [None]:
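# sort by median height; sort_values puts NaN last by default,
# so tail() surfaces the rows with missing heights
df_by_height = df.sort_values('Median_Building_Height')
df_by_height.tail()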
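
The `NaN` rows appear in `tail()` because `sort_values` places missing values last by default. A minimal sketch on a made-up frame: the values and the `toy` names are hypothetical, and only the column names mirror the CSV above.

```python
import numpy as np
import pandas as pd

# toy data with hypothetical values; only the column names mirror the CSV above
toy = pd.DataFrame({
    'ImageId': ['a', 'b', 'c', 'd'],
    'Median_Building_Height': [12.0, np.nan, 3.5, np.nan],
})

# NaN rows are placed last by default (na_position='last')
toy_sorted = toy.sort_values('Median_Building_Height')
print(toy_sorted)

# to rank real heights only, drop the missing values before sorting
toy_valid = toy.dropna(subset=['Median_Building_Height'])
toy_valid = toy_valid.sort_values('Median_Building_Height')
print(toy_valid.tail(1))
```

With the missing values dropped, `tail()` shows the tallest buildings instead of the `NaN` rows.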

In [None]:
# take unique ImageId for plotting the raster maps
image_ids = df.ImageId.unique()
print(f'total images: {len(image_ids)}')

# Raster maps
5 types of image:
- PAN: panchromatic, 900x900px, 1ch
- PS-RGB: pan-sharpened RGB, 900x900px, 3ch
- PS-RGBNIR: added Near InfraRed, 900x900px, 4ch
- RGBNIR: 450x450px, 4ch
- SAR-Intensity: quad polarimetric SAR, 900x900px, 4ch

## Viewing optical raster
- `rs.open` will load a .tiff file as a raster (with spatial information)
- `show` will preview the raster

In [None]:
def get_filepath(image_id, mode='PS-RGB'):
    return f'{ROOT_DIR}{mode}/SN6_Train_AOI_11_Rotterdam_{mode}_{image_id}.tif'

def get_raster(image_id, mode='PS-RGB'):
    return rs.open(get_filepath(image_id, mode))

ex_image_id = image_ids[0]; print(f'tile: {ex_image_id}')

# open the raster image
ex_raster = get_raster(ex_image_id)

# print properties of the raster
print(ex_raster.meta)

# view with rasterio's show
show(ex_raster)

## Read as image
- to convert the raster to a normal image array, use the `.read()` method
- `read()` also takes arguments such as which channels to extract, output dtype, cropping window, etc. [docs](https://rasterio.readthedocs.io/en/latest/api/rasterio.io.html)
- reorder the dimensions from `c,h,w` to `h,w,c`
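
To see what the axis reordering does, here is a small NumPy-only sketch. `raster.read()` returns arrays as `(bands, rows, cols)`, and the transpose below is roughly what `reshape_as_image` does for us; the array is synthetic.

```python
import numpy as np

# fake raster array in rasterio's band-first layout: (bands, rows, cols)
bands, rows, cols = 3, 4, 5
chw = np.arange(bands * rows * cols).reshape(bands, rows, cols)

# move the band axis to the end: (rows, cols, bands), the layout plt.imshow expects
hwc = np.transpose(chw, (1, 2, 0))
print(chw.shape, '->', hwc.shape)

# the same pixel is addressed as [band, row, col] before and [row, col, band] after
print(chw[1, 2, 3] == hwc[2, 3, 1])
```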

In [None]:
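def get_rgb(image_id):
    # use the image_id argument, not the global example id
    raster = rs.open(get_filepath(image_id, 'PS-RGB'))
    image = raster.read()  # shape: (channels, height, width)
    return rs.plot.reshape_as_image(image)  # shape: (height, width, channels)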

plt.imshow(get_rgb(ex_image_id)); plt.show()

## Loading the masks
- to generate a binary mask from polygons, we'll use the `geometry_mask` function from rasterio's `features` module
- some images have no buildings, which raises an error; to handle this, we output a mask of all zeros if the geojson file is empty
- `geometry_mask` takes a few important arguments
    - the geometries, here the `geometry` column of the GeoDataFrame
    - `out_shape`: height and width of the mask, which we can take from the raster properties
    - `invert=True` marks building pixels as True and background pixels as False
    - `transform`: the affine transform, which we can also take from the raster properties

In [None]:
def get_geopath(image_id):
    return f'{ROOT_DIR}geojson_buildings/SN6_Train_AOI_11_Rotterdam_Buildings_{image_id}.geojson'

def get_binary_mask(image_id, raster):
    # read the polygons for the requested tile, not the global example
    gdf = gpd.read_file(get_geopath(image_id))

    if gdf.shape[0]==0:  # handle rasters with no buildings
        # return early, otherwise geometry_mask below would still run
        return np.zeros((raster.height, raster.width), dtype=bool)

    mask = feat.geometry_mask(
        gdf.geometry,
        out_shape=(raster.height, raster.width),
        transform=raster.transform,
        invert=True
    )

    return mask

ex_gdf = gpd.read_file(get_geopath(ex_image_id))
display(ex_gdf.head(3))

mask = get_binary_mask(ex_image_id, ex_raster)

plt.imshow(mask, cmap='gray'); plt.show()

## Loading SAR
- pixel values are SAR intensity in dB, ranging from 1e-5 to 92.88 dB [1]
- values are of type float32
- notice that the `nodata` region (the area where there's no map) is set to the value 0.0
- the `.read` method lets you select which channels to extract and in which order
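
One detail worth keeping in mind: rasterio band indexes start at 1, while the array you get back is 0-based. The NumPy sketch below mimics the selection on a stand-in array; the band order `HH, HV, VH, VV` is the one used for the plot titles later in this notebook, and `stack` is synthetic.

```python
import numpy as np

band_names = ['HH', 'HV', 'VH', 'VV']  # order used for the plot titles in this notebook

# stand-in for a 4-band read: band k is filled with the value k
stack = np.stack([np.full((2, 2), k) for k in range(4)])

indexes = [1, 4, 3]                    # 1-based, as passed to raster.read(indexes=...)
zero_based = [i - 1 for i in indexes]  # equivalent positions in the NumPy array

subset = stack[zero_based]             # bands come back in the requested order
selected = [band_names[i] for i in zero_based]
print(selected, subset.shape)
```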

In [None]:
# 4ch sar raster
ex_raster_sar = get_raster(ex_image_id, mode='SAR-Intensity')
print(ex_raster_sar.meta)

show(ex_raster_sar); plt.show()

## Normalizing
- to stretch the pixel range to `uint8`, we scale the float32 pixel values to 0-255 and then convert
- apply the norm to each plane, then reshape
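
The scaling is just `value / max_val * 255` followed by a cast. Here is a tiny worked sketch on made-up values; `scale_to_uint8` is a hypothetical stand-alone helper, not part of rasterio.

```python
import numpy as np

DB_MAX = 92.88  # upper bound of the SAR intensity range quoted above

def scale_to_uint8(plane, max_val=DB_MAX):
    """Hypothetical helper: linearly map [0, max_val] to [0, 255]."""
    # divide by at least the largest value present, so nothing exceeds 255
    max_val = max(float(plane.max()), max_val)
    return (plane / max_val * 255).astype(np.uint8)

toy = np.array([[0.0, 20.0], [92.88, 60.0]], dtype=np.float32)
out = scale_to_uint8(toy)
print(out.dtype, out.min(), out.max())
print(out)
```

Note that `astype(np.uint8)` truncates the fractional part instead of rounding, which is fine for visualization.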

In [None]:
def norm(plane):
    # divide by at least 92.88 (the documented max) so scaled values never exceed 255,
    # which would overflow when cast to uint8
    max_val = max(plane.max(), 92.88)
    plane = plane / max_val * 255
    return plane.astype(np.uint8)

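def get_sar(image_id, sar_ch=(1,2,3,4), normalize=True):
    raster = get_raster(image_id, mode='SAR-Intensity')

    # read the requested channels (1-based band indexes), all 4 by default
    image = raster.read(indexes=list(sar_ch))

    # channel-wise norm (the original `if norm:` tested the function object, so it was always true)
    if normalize:
        for i in range(image.shape[0]):
            image[i] = norm(image[i])

        # each channel is uint8, but the combined array is still float32, so cast it
        image = image.astype(np.uint8)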
        
    return rs.plot.reshape_as_image(image)

# view each SAR polarization channel
ex_image_sar = get_sar(ex_image_id)
pol_title = ['HH','HV','VH','VV']

f = plt.figure(figsize=(20,5))
for i in range(ex_image_sar.shape[-1]):
    ax = f.add_subplot(1,4,i+1)
    ax.imshow(ex_image_sar[:,:,i], cmap='gray')
    ax.set_title(pol_title[i])
    plt.axis('off')
    
plt.tight_layout();plt.show()

# Histogram
Understand the pixel distribution of each channel.

In [None]:
def show_hist(image, ax, start=1, end=256):
    num_bins = 256
    color = ['r','g','b','k']
    if len(image.shape)==2:
        ax.hist(image.ravel(), num_bins, [start,end])
    else:
        for i in range(image.shape[-1]):
            ax.hist(image[:,:,i].ravel(), num_bins, [start,end],
                    color=color[i], histtype='step', alpha=0.6)

In [None]:
f,(ax1,ax2,ax3,ax4) = plt.subplots(1,4,figsize=(20,5))

ex_rgb = get_rgb(ex_image_id)
ax1.imshow(ex_rgb); ax1.set_title('Optical RGB Image')
show_hist(ex_rgb, ax2); ax2.set_title('RGB Histogram')

ex_raster_sar = get_raster(ex_image_id, mode='SAR-Intensity')
ex_image_sar = get_sar(ex_image_id)

show(ex_raster_sar, ax=ax3); ax3.set_title('SAR Raster')
show_hist(ex_image_sar, ax4); ax4.set_title('SAR Histogram')
plt.show()

Notice that if we start the histogram range at 0, we capture the `nodata` value 0.0, which dominates all other values.

In [None]:
f,ax = plt.subplots(figsize=(5,5))
show_hist(ex_image_sar, ax, 0, 255)
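
To put a number on how much the `nodata` bin dominates, here is a NumPy-only sketch on a synthetic tile. The random values and the 20-row nodata strip are made up; a real tile will differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy uint8 "SAR" tile: valid pixels in 1..255, plus a nodata strip of zeros
tile = rng.integers(1, 256, size=(100, 100), dtype=np.uint8)
tile[:20, :] = 0  # pretend the top 20 rows lie outside the imaged area

# histogram including 0 versus starting at 1 (like show_hist's default start=1)
counts_with_zero, _ = np.histogram(tile, bins=256, range=(0, 256))
counts_no_zero, _ = np.histogram(tile, bins=255, range=(1, 256))

print('share of pixels in the 0 bin:', counts_with_zero[0] / tile.size)
print('tallest bin once 0 is excluded:', counts_no_zero.max())
```

The single 0 bin dwarfs every other bin, which is why the earlier plots start the range at 1.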

When a transformation is applied to the SAR image, these `nodata` pixel values are included in the computation. To minimize this, we can crop the image to the corners where the map still has data, at `r0,r1,c0,c1`. When working with SAR it's common to show only 3 channels, or to process the quad-polarimetric channels into a false-colored RGB. Here I show R=HH, G=VV and B=VH.

In [None]:
def get_region_index(raster):
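    """
    The raster has nodata strips along its edges; return the row and
    column bounds (r0, r1, c0, c1) of the region with valid data.
    np.argwhere() returns a list of indices [[row,col],[row,col],...]
        for every element that meets the condition
    """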
    bin_mask = raster.read_masks(1)
    coords = np.argwhere(bin_mask==255)
    row0,col0 = coords.min(axis=0)  # find lowest row and col
    row1,col1 = coords.max(axis=0)  # find highest row and col
    return row0,row1+1,col0,col1+1

ex_raster = get_raster(ex_image_id)
r0,r1,c0,c1 = get_region_index(ex_raster)  # get cropping index

ex_image_sar = get_sar(ex_image_id, sar_ch=[1,4,3])
plt.imshow(ex_image_sar[r0:r1,c0:c1])  # keep all 3 channels for the false-color view
plt.title('Cropped SAR image false colored'); plt.show()
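
The bounding-box logic in `get_region_index` can be checked on a tiny synthetic mask, using the same convention as `read_masks` (255 = valid, 0 = nodata). The array below is made up.

```python
import numpy as np

# toy validity mask: 255 where data exists, 0 in the nodata border
valid = np.zeros((8, 10), dtype=np.uint8)
valid[2:6, 3:9] = 255

coords = np.argwhere(valid == 255)   # [[row, col], ...] of every valid pixel
r0, c0 = coords.min(axis=0)          # first valid row and column
r1, c1 = coords.max(axis=0) + 1      # +1 so the bounds work as slice ends

cropped = valid[r0:r1, c0:c1]
print((r0, r1, c0, c1), cropped.shape)
```

This gives the tightest axis-aligned rectangle around the valid pixels; if the valid region is slanted, some `nodata` pixels will remain in the corners of the crop.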

# Reference

[1] Shermeyer, J., Hogan, D., Brown, J., Van Etten, A., Weir, N., Pacifici, F., Hänsch, R., Bastidas, A., Soenen, S., Bacastow, T.M., & Lewis, R. (2020). SpaceNet 6: Multi-Sensor All Weather Mapping Dataset. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 768-777. [arXiv paper](https://arxiv.org/abs/2004.06500)