<a href="https://colab.research.google.com/github/MMoronto/ml-unstructured-data-projects/blob/master/Tile_based_classification_using_Sentinel_2_L1C_and_EuroSAT_data_Inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Introduction**

This workflow explores the process of inferring land use / land cover classes with a model pre-trained on the benchmark EuroSAT dataset. This notebook contains notes I've taken while working through the example workflows presented in the AI for Earth Monitoring MOOC on FutureLearn.

This is the second part of a two-part practice module. The first part covers training a convolutional neural network on EuroSAT benchmark data for land use / land cover classification. The second part uses the learned capabilities of that pre-trained model to infer land use / land cover classes on a Sentinel-2 Level-1C tile image.

## **Data**

The inference process utilizes the following data:

* a `Sentinel-2 Level-1C file`, which is available in the following folder: `./S2_Tile_based_classification/01_input_data/S2_tile_4_inference`.
The scene shows a coastal part of Italy on March 31, 2021. It is used as input data for the pretrained model in order to infer land use / land cover classes.
* a `pretrained model`, a `convolutional neural network` trained on EuroSAT data, a benchmark dataset for land use / land cover classification.

## **Additional Resources**

* 3B - Tile-based classification with EuroSAT data - Training
* EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification
* EuroSAT data

## **Notebook Outline**

* 1 - Load a Sentinel-2 Level-1C tile
* 2 - Resample all bands of a Sentinel-2 Level-1C tile to 10m spatial resolution
* 3 - Reorder the bands according to the order of the pretrained model
* 4 - Load the pretrained sequential convolutional neural network based on EuroSAT data
* 5 - Divide the Sentinel-2 L1C tile into 64x64 windows
* 6 - Infer land use classes
* 7 - Visualize the final classified image

**Import libraries**

In [4]:
## Begin S3FS import snippet ##
import os, sys
s3_home = os.getcwd()
try: sys.path.remove(s3_home) # Remove the S3 root from the path
except Exception: pass

# Begin imports #
import tensorflow as tf
from osgeo import gdal_array, osr, gdal
import glob
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt
import matplotlib.colors
import warnings
warnings.filterwarnings("ignore")
# end imports #

# os.chdir(current_dir) # go back to your previous directory

sys.path.append(s3_home) # restore the s3 root in the path

## end s3fs import snippet ##

Define helper functions

from_folder_to_stack

In [None]:
import numpy as np

'''
function name:
  from_folder_to_stack
description:
  This function transforms the .SAFE file into three different arrays (10m, 20m and 60m).
Input:
  safe_path: the path of the .SAFE file;
  data_bands_20m: if True, the function computes stack using Sentinel2 band with 20m of pixel resolution (default=True);
  data_bands_60m: if True, the function computes stack using Sentinel2 band with 60m of pixel resolution (default=True);
Output:
  stack_10m: stack with the following S2L1C bands (B02, B03, B04, B08)
  stack_20m: stack with the following S2L1C bands (B05, B06, B07, B11, B12, B8A)
  stack_60m: stack with the following S2L1C bands (B01, B09, B10)
'''
def from_folder_to_stack(
    safe_path,
    data_bands_20m=True,
    data_bands_60m=True,
    ):

  level_folder_name_list = glob.glob(safe_path + 'GRANULE/*')
  level_folder_name = level_folder_name_list[0]

  if level_folder_name.find("L2A") < 0:
    safe_path = [level_folder_name + '/IMG_DATA/']
  else:
    safe_path_10m = level_folder_name + '/IMG_DATA/R10m/'
    safe_path = [safe_path_10m]

  text_files = []

  for i in range(0, len(safe_path)):
      print("[AI4EO_MOOC]_log: Loading .jp2 images in %s" % (safe_path[i]))
      text_files_tmp = [f for f in os.listdir(safe_path[i]) if f.endswith('.jp2')]
      text_files.append(text_files_tmp)

  lst_stack_60m=[]
  lst_code_60m=[]
  lst_stack_20m=[]
  lst_code_20m=[]
  lst_stack_10m=[]
  lst_code_10m=[]
  for i in range(0, len(safe_path)):

    print("[AI4EO_MOOC]_log: Reading .jp2 files in %s" % (safe_path[i]))
    for name in range(0, len(text_files[i])):
      text_files_tmp = text_files[i]
      if data_bands_60m == True:
        cond_60m = ( (text_files_tmp[name].find("B01") > 0) or (text_files_tmp[name].find("B09") > 0)
                    or (text_files_tmp[name].find("B10") > 0))
        if cond_60m:
            print("[AI4EO_MOOC]_log: Using .jp2 image: %s" % text_files_tmp[name])
            lst_stack_60m.append(gdal_array.LoadFile(safe_path[i] + text_files_tmp[name]))
            lst_code_60m.append(text_files_tmp[name][24:26])

      if data_bands_20m == True:
          cond_20m = (text_files_tmp[name].find("B05") > 0) or (text_files_tmp[name].find("B06") > 0) or (
                      text_files_tmp[name].find("B07") > 0) or (text_files_tmp[name].find("B11") > 0) or (
                                  text_files_tmp[name].find("B12") > 0) or (text_files_tmp[name].find("B8A") > 0)
          cond_60m_L2 = (text_files_tmp[name].find("B05_60m") < 0) and (text_files_tmp[name].find("B06_60m") < 0) and (
                      text_files_tmp[name].find("B07_60m") < 0) and (text_files_tmp[name].find("B11_60m") < 0) and (
                                  text_files_tmp[name].find("B12_60m") < 0) and (text_files_tmp[name].find("B8A_60m") < 0)
          cond_20m_tot = cond_20m and cond_60m_L2
          if cond_20m_tot:
              print("[AI4EO_MOOC]_log: Using .jp2 image: %s" % text_files_tmp[name])
              lst_stack_20m.append(gdal_array.LoadFile(safe_path[i] + text_files_tmp[name]))
              lst_code_20m.append(text_files_tmp[name][24:26])
      else:
        stack_20m = 0

      cond_10m = (text_files_tmp[name].find("B02") > 0) or (text_files_tmp[name].find("B03") > 0) or (
                  text_files_tmp[name].find("B04") > 0) or (text_files_tmp[name].find("B08") > 0)
      cond_20m_L2 = (text_files_tmp[name].find("B02_20m") < 0) and (text_files_tmp[name].find("B03_20m") < 0) and (
                  text_files_tmp[name].find("B04_20m") < 0) and (text_files_tmp[name].find("B08_20m") < 0)
      cond_60m_L2 = (text_files_tmp[name].find("B02_60m") < 0) and (text_files_tmp[name].find("B03_60m") < 0) and (
                  text_files_tmp[name].find("B04_60m") < 0) and (text_files_tmp[name].find("B08_60m") < 0)
      cond_10m_tot = cond_10m and cond_20m_L2 and cond_60m_L2

      if cond_10m_tot:
          print("[AI4EO_MOOC]_log: Using .jp2 image: %s" % text_files_tmp[name])
          lst_stack_10m.append(gdal_array.LoadFile(safe_path[i] + text_files_tmp[name]))
          lst_code_10m.append(text_files_tmp[name][24:26])


  stack_10m=np.asarray(lst_stack_10m)
  sorted_list_10m = ['02', '03', '04', '08']
  print('[AI4EO_MOOC]_log: Sorting stack 10m...')
  stack_10m_final_sorted = stack_sort(stack_10m, lst_code_10m, sorted_list_10m)

  stack_20m=np.asarray(lst_stack_20m)
  sorted_list_20m = ['05', '06', '07', '11', '12', '8A']
  print('[AI4EO_MOOC]_log: Sorting stack 20m...')
  stack_20m_final_sorted = stack_sort(stack_20m, lst_code_20m, sorted_list_20m)
              
  stack_60m=np.asarray(lst_stack_60m)
  sorted_list_60m = ['01', '09', '10']
  print('[AI4EO_MOOC]_log: Sorting stack 60m...')
  stack_60m_final_sorted = stack_sort(stack_60m, lst_code_60m, sorted_list_60m)

  return stack_10m_final_sorted, stack_20m_final_sorted, stack_60m_final_sorted
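
The helper above recovers each band code with the fixed slice `[24:26]`, which only works when every file name has the same prefix length. As a hedged alternative, the sketch below extracts the code with a regular expression. The file names are illustrative (built from the tile ID and sensing time of the `.SAFE` product used in this notebook, not read from disk), and `band_code` is a hypothetical helper that is not part of the original workflow.

```python
import re

# Matches "_B02.jp2" (L1C style) as well as "_B02_10m.jp2" (L2A style).
_BAND_RE = re.compile(r"_B(0[1-9]|1[0-2]|8A)(?:_\d{2}m)?\.jp2$")

def band_code(file_name):
    """Return the two-character band code ('02', '8A', ...) or None."""
    match = _BAND_RE.search(file_name)
    return match.group(1) if match else None

# Illustrative file names (assumed, not read from disk).
examples = [
    "T33TTG_20210331T100021_B02.jp2",
    "T33TTG_20210331T100021_B8A.jp2",
    "T33TTG_20210331T100021_TCI.jp2",  # true-colour image, not a single band
]
codes = [band_code(name) for name in examples]
print(codes)
```

For file names of this shape the regular expression and the `[24:26]` slice agree; the regular expression additionally returns `None` for files that are not single-band images.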

stack_sort

In [None]:
def stack_sort(stack_in, lst_code, sorted_list):
  # stack_in is (bands, rows, cols); the output is (rows, cols, bands)
  b, r, c = stack_in.shape
  stack_sorted = np.zeros((r,c,b), dtype=np.uint16)

  len_list_bands = len(lst_code)

  # c[k] holds the position in stack_in of the band expected at position k
  c = np.zeros((len_list_bands), dtype=np.uint8)
  count = 0
  count_sort = 0
  while count_sort != len_list_bands:
    if lst_code[count] == sorted_list[count_sort]:
      c[count_sort] = count
      count_sort = count_sort + 1
      count = 0
    else:
      count = count + 1

  # Copy the bands only once the whole ordering has been resolved
  print('[AI4EO_MOOC]_log: sorted list:', sorted_list)
  print('[AI4EO_MOOC]_log: bands:', c)
  for i in range(0, len_list_bands):
      stack_sorted[:,:,i]=stack_in[c[i],:,:]

  return stack_sorted
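
For comparison, the same reordering can be sketched with `list.index` and NumPy indexing instead of a manual search loop. This is a minimal illustration, not the MOOC's implementation: `reorder_bands` is a hypothetical name, and the toy stack stands in for real band data.

```python
import numpy as np

def reorder_bands(stack_in, lst_code, sorted_list):
    """Reorder a (bands, rows, cols) stack and return (rows, cols, bands)."""
    # Position of each wanted code in the order the files were read.
    order = [lst_code.index(code) for code in sorted_list]
    # Pick the bands in the wanted order, then move the band axis last.
    return np.moveaxis(stack_in[order], 0, -1)

# Toy stack: each 2x3 band is filled with its own band number.
codes_read = ['08', '02', '04', '03']
toy = np.stack([np.full((2, 3), int(code), dtype=np.uint16) for code in codes_read])

sorted_toy = reorder_bands(toy, codes_read, ['02', '03', '04', '08'])
print(sorted_toy.shape)      # (2, 3, 4)
print(sorted_toy[0, 0, :])   # [2 3 4 8]
```

One difference in behaviour: `list.index` raises a `ValueError` as soon as an expected band code is missing from the files that were read.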

resample_3d

In [None]:
'''
function name:
  resample_3d
description:
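  Wrapper around ndimage.zoom for resampling an array. Note that zoom is called with its default spline order (3, cubic); passing order=1 would give bilinear interpolation.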
input:
  stack: array to be resampled;
  row10m: the expected row;
  col10m: the expected col;
  rate: the rate of the transformation;
output:
  stack_10m: resampled array
'''
def resample_3d(
        stack,
        row10m,
        col10m,
        rate):
    row, col, bands = stack.shape
    print("[AI4EO_MOOC]_log: Array shape (%d,%d,%d)" % (row, col, bands))

    stack_10m = np.zeros((row10m, col10m, bands),dtype=np.uint16)
    print("[AI4EO_MOOC]_log: Resize array bands from (%d,%d,%d) to (%d,%d,%d)" % (
        row, col, bands, row10m, col10m, bands))
    
    for i in range(0, bands):
      stack_10m[:, :, i] = ndimage.zoom(stack[:, :,i], rate)

    del (stack)

    return stack_10m
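
The `rate` argument is the ratio between a band's native pixel size and the 10m target. The sketch below works out that ratio and the matching array sizes, assuming the standard Sentinel-2 tile of 10980 x 10980 pixels at 10m. The constants and `zoom_rate` are illustrative names, not part of the original notebook.

```python
# Pixel size in metres of each native Sentinel-2 band group.
NATIVE_RESOLUTIONS_M = (10, 20, 60)
TARGET_RESOLUTION_M = 10
TILE_SIDE_10M = 10980  # assumed pixels per side of a tile at 10 m

def zoom_rate(native_res_m, target_res_m=TARGET_RESOLUTION_M):
    """Upsampling factor needed to go from native to target resolution."""
    if native_res_m % target_res_m != 0:
        raise ValueError("native resolution must be a multiple of the target")
    return native_res_m // target_res_m

for res in NATIVE_RESOLUTIONS_M:
    rate = zoom_rate(res)
    native_side = TILE_SIDE_10M // rate
    print(f"{res} m band: {native_side} px per side, rate {rate}")
```

Under these assumptions the 20m stack would be resampled with `rate=2` and the 60m stack with `rate=6`, both to `row10m = col10m = 10980`.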

sentinel2_format

In [None]:
'''
function name:
  sentinel2_format
description:
  This function transforms the multistack into sentinel2 format arrays with bands in the right positions for our AI model.
input:
  total_stack: array that is the concatenation of stack10, stack_20mTo10m and stack_60mTo10m.
output:
  sentinel2: sentinel2 format array
'''
def sentinel2_format(
        total_stack):
  
    row_tot, col_tot, bands_tot = total_stack.shape
    sentinel2 = np.zeros((row_tot, col_tot, bands_tot),dtype=np.uint16)

    print("[AI4EO_MOOC]_log: Creating a total stack with following list of bands:")
    print("[AI4EO_MOOC]_log: Band 1 - Coastal aerosol")
    print("[AI4EO_MOOC]_log: Band 2 - Blue")
    print("[AI4EO_MOOC]_log: Band 3 - Green")
    print("[AI4EO_MOOC]_log: Band 4 - Red")
    print("[AI4EO_MOOC]_log: Band 5 - Vegetation red edge")
    print("[AI4EO_MOOC]_log: Band 6 - Vegetation red edge")
    print("[AI4EO_MOOC]_log: Band 7 - Vegetation red edge")
    print("[AI4EO_MOOC]_log: Band 8 - NIR")
    print("[AI4EO_MOOC]_log: Band 8A - Narrow NIR")
    print("[AI4EO_MOOC]_log: Band 9 - Water vapour")
    print("[AI4EO_MOOC]_log: Band 10 - SWIR - Cirrus")
    print("[AI4EO_MOOC]_log: Band 11 - SWIR")
    print("[AI4EO_MOOC]_log: Band 12 - SWIR")

    sentinel2[:, :, 0] = total_stack[:, :, 10]
    sentinel2[:, :, 1] = total_stack[:, :, 0]
    sentinel2[:, :, 2] = total_stack[:, :, 1]
    sentinel2[:, :, 3] = total_stack[:, :, 2]
    sentinel2[:, :, 4] = total_stack[:, :, 4]
    sentinel2[:, :, 5] = total_stack[:, :, 5]
    sentinel2[:, :, 6] = total_stack[:, :, 6]
    sentinel2[:, :, 7] = total_stack[:, :, 3]
    sentinel2[:, :, 8] = total_stack[:, :, 9]
    sentinel2[:, :, 9] = total_stack[:, :, 11]
    sentinel2[:, :, 10] = total_stack[:, :, 12]
    sentinel2[:, :, 11] = total_stack[:, :, 7]
    sentinel2[:, :, 12] = total_stack[:, :, 8]

    return sentinel2
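
The thirteen assignments above amount to one permutation of the band axis. The sketch below derives that permutation from two lists of band names, assuming `total_stack` is concatenated in the order 10m, 20m, 60m as described in the docstring. `STACK_ORDER`, `MODEL_ORDER` and `reorder_for_model` are illustrative names.

```python
import numpy as np

# Band names in the order the three stacks are concatenated:
# 10 m stack, then 20 m stack, then 60 m stack.
STACK_ORDER = ['B02', 'B03', 'B04', 'B08',
               'B05', 'B06', 'B07', 'B11', 'B12', 'B8A',
               'B01', 'B09', 'B10']

# Band order produced by sentinel2_format.
MODEL_ORDER = ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07',
               'B08', 'B8A', 'B09', 'B10', 'B11', 'B12']

# For every output position, the index of that band in the input stack.
PERMUTATION = [STACK_ORDER.index(name) for name in MODEL_ORDER]

def reorder_for_model(total_stack):
    """Return a (rows, cols, 13) array with bands in MODEL_ORDER."""
    return total_stack[:, :, PERMUTATION]

print(PERMUTATION)  # [10, 0, 1, 2, 4, 5, 6, 3, 9, 11, 12, 7, 8]
```

The printed indices match the right-hand sides of the assignments in `sentinel2_format`, which makes the mapping easy to check against the band list.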

sliding

In [None]:
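'''
function name:
  sliding
description:
  Computes the windows that tile a 2D array of the given shape, scanning row by row.
input:
  shape: the target shape as (height, width);
  window_size: the side length of the square window, in pixels;
  step_size: the stride between consecutive windows (default=None, which uses window_size, i.e. non-overlapping windows);
  fixed: if True, windows at the borders that are smaller than window_size are skipped (default=True);
output:
  windows: list of (x, y, width, height) tuples
'''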

def sliding(shape, window_size, step_size=None, fixed=True):
    h, w = shape
    if step_size:
        h_step = step_size
        w_step = step_size
    else:
        h_step = window_size
        w_step = window_size

    h_wind = window_size
    w_wind = window_size
    windows = []
    for y in range(0, h, h_step):
        for x in range(0, w, w_step):
            h_min = min(h_wind, h - y)
            w_min = min(w_wind, w - x)
            if fixed:
              if h_min < h_wind or w_min < w_wind:
                  continue
            window = (x, y, w_min, h_min)
            windows.append(window)

    return windows
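
To make the effect of `fixed` concrete, the sketch below runs the helper on a small grid and on a tile-sized grid. The function is restated in condensed form only so that the snippet runs on its own, and the 10980 x 10980 shape assumes a full Sentinel-2 tile at 10m.

```python
def sliding(shape, window_size, step_size=None, fixed=True):
    """Condensed restatement of the helper above so this snippet runs alone."""
    h, w = shape
    step = step_size or window_size
    windows = []
    for y in range(0, h, step):
        for x in range(0, w, step):
            h_min = min(window_size, h - y)
            w_min = min(window_size, w - x)
            if fixed and (h_min < window_size or w_min < window_size):
                continue
            windows.append((x, y, w_min, h_min))
    return windows

# Small grid: 200 rows x 130 columns, 64-pixel windows.
full_only = sliding((200, 130), 64)
with_edges = sliding((200, 130), 64, fixed=False)
print(len(full_only), len(with_edges))  # 6 12
print(with_edges[-1])                   # (128, 192, 2, 8)

# Assumed 10980 x 10980 tile at 10 m: 171 full windows per side.
tile_windows = sliding((10980, 10980), 64)
print(len(tile_windows))                # 29241
```

With `fixed=True` the last 36 rows and columns of such a tile (10980 - 171 * 64) are not covered by any window, so they would not receive a class.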

# **1. Load a Sentinel-2 Level-1C tile**

In the first step, we load the input data for the inference process. The input data is a Sentinel-2 Level-1C tile, which will be classified with the help of the pre-trained model.

Let us define a Python dictionary that holds information about where the Sentinel-2 Level-1C data is stored. Its keys are `main_path`, `sentinel2_safe_name` and `sentinel2_safe_path`.

In [3]:
data_input = {}
data_input['main_path'] ='./S2_Tile_based_classification/'

data_input['sentinel2_safe_name'] = 'S2A_MSIL1C_20210331T100021_N0300_R122_T33TTG_20210331T113321.SAFE/'
data_input['sentinel2_safe_path'] = data_input['main_path']+'01_input_data/S2_tile_4_inference/'+data_input['sentinel2_safe_name']
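
The `.SAFE` folder name itself encodes the mission, product level, sensing time and tile ID. As a quick sanity check on the input, the sketch below splits such a name into its fields, assuming the standard compact Sentinel-2 naming layout. `parse_safe_name` is a hypothetical helper, not part of the MOOC code.

```python
from datetime import datetime

def parse_safe_name(safe_name):
    """Split a Sentinel-2 .SAFE product name into its main fields."""
    stem = safe_name.rstrip('/')
    if stem.endswith('.SAFE'):
        stem = stem[:-len('.SAFE')]
    # Assumed layout: mission_product_sensing_baseline_orbit_tile_discriminator
    mission, product, sensing, baseline, orbit, tile, discriminator = stem.split('_')
    return {
        'mission': mission,
        'product_level': product,
        'sensing_start': datetime.strptime(sensing, '%Y%m%dT%H%M%S'),
        'processing_baseline': baseline,
        'relative_orbit': orbit,
        'tile_id': tile,
        'discriminator': discriminator,
    }

info = parse_safe_name(
    'S2A_MSIL1C_20210331T100021_N0300_R122_T33TTG_20210331T113321.SAFE/')
print(info['sensing_start'].date(), info['tile_id'])  # 2021-03-31 T33TTG
```

The parsed sensing date agrees with the acquisition date given in the Data section.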

# **2. Resample all bands of a Sentinel-2 Level-1C tile to 10m spatial resolution**

# **3. Reorder the bands according to the order of the pretrained model**

# **4. Load the pretrained sequential convolutional neural network based on EuroSAT data**

# **5. Divide the Sentinel-2 L1C tile into 64x64 windows**

# **6. Infer land use classes**

# **7. Visualize the final classified image**