<img style="float: left; margin:0px 15px 15px 0px; width:120px" src="https://www.orfeo-toolbox.org/wp-content/uploads/2016/03/logo-orfeo-toolbox.png">

# OTB Guided Tour - FOSS4G 2019 Bucharest 
## Yannick TANGUY and David YOUSSEFI (CNES, French Space Agency)

<br>

<b> Press <span style="color:black;background:yellow">SHIFT+ENTER</span> to execute the notebook interactively cell by cell </b>

## Let's use the OTB classification framework to produce a land cover classification

In [1]:
import otbApplication
import os
import utils
import numpy as np

# Data directory
DATA_DIR = "data"

# Our dataset contains polygons of 4 different land cover classes: urban, crops, forest, sea/water
DB = "GroundTruth_4_classes.sqlite"

# Associates an RGB color with each class
CMAP = "4_classes_colormap.txt"

# Output directory
OUTPUT_DIR = "output"

# Input / Output filenames
images_list = utils.list_images(DATA_DIR,"xt_SENTINEL2B",".tif")
training_set = DATA_DIR+"/"+DB
colormap = DATA_DIR+"/"+CMAP
image_stack = OUTPUT_DIR+"/"+"image_stack.tif"
stats = OUTPUT_DIR+"/"+"stats.xml"
samples = OUTPUT_DIR+"/"+"samples.sqlite"
rf_model = OUTPUT_DIR+"/"+"rf_model.txt"
classif = OUTPUT_DIR+"/"+"classif_4_classes.tif"
# The confusion matrix is written as comma-separated text, hence the .csv extension
confmatrix = OUTPUT_DIR+"/"+"conf_matrix.csv"

3  images will be used
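
The `utils.list_images` helper is not shown in this notebook. As a rough idea of what such a helper might do, here is a minimal sketch that assumes it returns the sorted paths of the files whose names contain a pattern and end with a given extension. The name `list_images_sketch` and this behavior are assumptions, not the actual `utils` code.

```python
import os

def list_images_sketch(directory, pattern, extension):
    """Hypothetical stand-in for utils.list_images (assumed behavior)."""
    selected = [
        os.path.join(directory, name)
        for name in sorted(os.listdir(directory))
        # Keep files whose name contains the pattern and ends with the extension
        if pattern in name and name.endswith(extension)
    ]
    print(len(selected), "images will be used")
    return selected
```

Sorting by name keeps the acquisitions in chronological order only if the date is encoded in the file name, which is assumed here.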


### First, make an image stack with all our images
We will work on a (small) time series of Sentinel-2 images.

In [2]:
app = otbApplication.Registry.CreateApplication("ConcatenateImages")
app.SetParameterStringList("il",images_list)
app.SetParameterString("out",image_stack)
app.ExecuteAndWriteOutput()

2019-07-30 09:40:22 (INFO) ConcatenateImages: Default RAM limit for OTB is 256 MB
2019-07-30 09:40:22 (INFO) ConcatenateImages: GDAL maximum cache size is 12847 MB
2019-07-30 09:40:22 (INFO) ConcatenateImages: OTB will use at most 48 threads
2019-07-30 09:40:22 (INFO): Estimated memory for full processing: 917.413MB (avail.: 256 MB), optimal image partitioning: 4 blocks
2019-07-30 09:40:22 (INFO): File output/image_stack.tif will be written in 5 blocks of 2410x417 pixels
Writing output/image_stack.tif...: 100% [**************************************************] (3s)


0
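
The "optimal image partitioning" figures reported in the logs of this notebook appear consistent with a simple ceiling division of the estimated memory by the available RAM. The sketch below only reproduces that arithmetic. Whether OTB computes the value exactly this way is an assumption, and `expected_blocks` is a hypothetical helper name.

```python
import math

def expected_blocks(estimated_mb, available_mb=256):
    """Smallest number of blocks such that each block fits in the available RAM."""
    return math.ceil(estimated_mb / available_mb)

# Estimated memory values (in MB) taken from the logs of this notebook
for estimated in (917.413, 458.478, 687.946):
    print(estimated, "MB ->", expected_blocks(estimated), "blocks")
```

The number of blocks actually used can differ from this optimum: the log above reports 5 blocks for an optimal partitioning of 4.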

### Then, we compute stats on our ground-truth polygons
The aim is to observe how the different classes are represented over the input region...

In [3]:
app = otbApplication.Registry.CreateApplication("PolygonClassStatistics")
app.SetParameterString("in",image_stack)
app.SetParameterString("vec",training_set)
# OTB Python API trick: we need to call "UpdateParameters" so that OTB opens the training set
# and lists the possible field names...
app.UpdateParameters()
app.SetParameterString("field","code")
app.SetParameterString("out",stats)
app.ExecuteAndWriteOutput()

2019-07-30 09:40:25 (INFO) PolygonClassStatistics: Default RAM limit for OTB is 256 MB
2019-07-30 09:40:25 (INFO) PolygonClassStatistics: GDAL maximum cache size is 12847 MB
2019-07-30 09:40:25 (INFO) PolygonClassStatistics: OTB will use at most 48 threads
2019-07-30 09:40:25 (INFO) PolygonClassStatistics: Elevation management: setting default height above ellipsoid to 0 meters
2019-07-30 09:40:25 (INFO): Estimated memory for full processing: 458.478MB (avail.: 256 MB), optimal image partitioning: 2 blocks
2019-07-30 09:40:25 (INFO): Estimation will be performed in 3 blocks of 2410x694 pixels
Analyze polygons...: 100% [**************************************************] (8s)


0

### Display stats file

In [4]:
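# Print the first 8 lines of the XML statistics file; the file is closed when the block ends
with open(stats, "r") as stats_file:
    for cpt in range(8):
        print(stats_file.readline())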

<?xml version="1.0" ?>

<GeneralStatistics>

    <Statistic name="samplesPerClass">

        <StatisticMap key="1" value="437540" />

        <StatisticMap key="2" value="1650038" />

        <StatisticMap key="3" value="147831" />

        <StatisticMap key="4" value="468701" />

    </Statistic>



### We observe that class "3" (forest) is the least represented
* we will limit the number of samples per class to its sample count, 147831 (or lower)
* many other strategies exist!

Let's choose 80,000 samples per class
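
The upper bound discussed above can also be read programmatically instead of by eye. The following sketch parses an excerpt that reproduces the values displayed above and looks up the smallest class. The closing `</GeneralStatistics>` tag is added here so that the excerpt parses, and `samples_per_class` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Excerpt with the values displayed above (closing root tag added for parsing)
STATS_XML = """<?xml version="1.0" ?>
<GeneralStatistics>
    <Statistic name="samplesPerClass">
        <StatisticMap key="1" value="437540" />
        <StatisticMap key="2" value="1650038" />
        <StatisticMap key="3" value="147831" />
        <StatisticMap key="4" value="468701" />
    </Statistic>
</GeneralStatistics>
"""

def samples_per_class(xml_text):
    """Return {class label: sample count} from the samplesPerClass statistic."""
    root = ET.fromstring(xml_text)
    counts = {}
    for statistic in root.iter("Statistic"):
        if statistic.get("name") == "samplesPerClass":
            for entry in statistic.iter("StatisticMap"):
                counts[entry.get("key")] = int(entry.get("value"))
    return counts

counts = samples_per_class(STATS_XML)
smallest_class = min(counts, key=counts.get)
print(smallest_class, counts[smallest_class])  # prints: 3 147831
```

To work on the real file rather than on an excerpt, `ET.parse(stats).getroot()` could replace the `ET.fromstring` call.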

In [5]:
app = otbApplication.Registry.CreateApplication("SampleSelection")
app.SetParameterString("in",image_stack)
app.SetParameterString("vec",training_set)
# OTB Python API trick: we need to call "UpdateParameters" so that OTB opens the training set
# and lists the possible field names...
app.UpdateParameters()
app.SetParameterString("field","code")
app.SetParameterString("instats",stats)
app.SetParameterString("strategy","constant")
app.SetParameterInt("strategy.constant.nb",80000)
app.SetParameterString("out",samples)
app.ExecuteAndWriteOutput()

2019-07-30 09:40:33 (INFO) SampleSelection: Default RAM limit for OTB is 256 MB
2019-07-30 09:40:33 (INFO) SampleSelection: GDAL maximum cache size is 12847 MB
2019-07-30 09:40:33 (INFO) SampleSelection: OTB will use at most 48 threads
2019-07-30 09:40:33 (INFO) SampleSelection: Elevation management: setting default height above ellipsoid to 0 meters
2019-07-30 09:40:33 (INFO) SampleSelection: Sampling strategy : set a constant number of samples for all classes
2019-07-30 09:40:33 (INFO) SampleSelection: Sampling rates :  className  requiredSamples  totalSamples  rate
1	80000	437540	0.18284
2	80000	1650038	0.0484837
3	80000	147831	0.541158
4	80000	468701	0.170685

2019-07-30 09:40:33 (INFO): Estimated memory for full processing: 458.478MB (avail.: 256 MB), optimal image partitioning: 2 blocks
2019-07-30 09:40:33 (INFO): Estimation will be performed in 4 blocks of 1584x1584 pixels
Selecting positions with periodic sampler...: 100% [**************************************************] (15

0
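
The sampling rates printed in the log above can be checked by hand: with the constant strategy, each rate is the required number of samples divided by the number of samples available in the class. The sketch below reproduces that arithmetic. Capping the request at the class size is an assumption about how smaller classes would be handled, and `constant_sampling_rates` is a hypothetical helper name.

```python
def constant_sampling_rates(samples_per_class, required):
    """Return {class label: sampling rate} for a constant number of samples per class."""
    rates = {}
    for label, total in samples_per_class.items():
        # Assumption: a class cannot provide more samples than it contains
        selected = min(required, total)
        rates[label] = selected / total
    return rates

# Totals taken from the statistics computed earlier in this notebook
totals = {1: 437540, 2: 1650038, 3: 147831, 4: 468701}
for label, rate in constant_sampling_rates(totals, 80000).items():
    print(label, round(rate, 6))
```

More than half of the forest pixels (class 3) are selected, while fewer than 5% of the pixels of class 2 are needed to reach the same count.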

### Now we have to extract values for each sample from our image stack

Each sample (a 10 m by 10 m pixel) will be updated with the time series values (Blue band for image 1, Green band for image 1, ..., Near Infrared band for image 3).
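
To make the ordering described above concrete, the sketch below maps a feature index to the image and spectral band it comes from. It assumes 4 bands per image in the order Blue, Green, Red, Near Infrared (the Red band in third position is an assumption) and field names of the form `band_<index>`. `describe_feature` is a hypothetical helper name.

```python
# Assumed band order inside each image; only Blue, Green and Near Infrared are named in the text
BANDS_PER_IMAGE = ["Blue", "Green", "Red", "Near Infrared"]

def describe_feature(index, bands=BANDS_PER_IMAGE):
    """Return (field name, image number starting at 1, band name) for a stack band index."""
    image_number, band_position = divmod(index, len(bands))
    return "band_{}".format(index), image_number + 1, bands[band_position]

# 3 images of 4 bands each give 12 features
for index in range(12):
    print(describe_feature(index))
```

With 3 images of 4 bands, the features run from `band_0` (Blue, image 1) to `band_11` (Near Infrared, image 3).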

In [6]:
app = otbApplication.Registry.CreateApplication("SampleExtraction")
app.SetParameterString("in",image_stack)
app.SetParameterString("vec",samples)
# OTB Python API trick: we need to call "UpdateParameters" so that OTB opens the sample file
# and lists the possible field names...
app.UpdateParameters()
app.SetParameterString("field","code")
app.SetParameterString("outfield","prefix")
app.SetParameterString("outfield.prefix.name","band_")
app.ExecuteAndWriteOutput()

2019-07-30 09:40:49 (INFO) SampleExtraction: Default RAM limit for OTB is 256 MB
2019-07-30 09:40:49 (INFO) SampleExtraction: GDAL maximum cache size is 12847 MB
2019-07-30 09:40:49 (INFO) SampleExtraction: OTB will use at most 48 threads
2019-07-30 09:40:49 (INFO): Estimated memory for full processing: 687.946MB (avail.: 256 MB), optimal image partitioning: 3 blocks
2019-07-30 09:40:49 (INFO): Estimation will be performed in 4 blocks of 2410x521 pixels
Extracting sample values...: 100% [**************************************************] (25s)


0

### We can now train our classifier!
* Let's choose a Random Forest classifier
* it will learn from our sample set
* it needs to know:
    * the name of the field representing the classes
    * the names of the fields holding the features to learn from

In [None]:
# Workaround: UpdateParameters() did not seem to work properly for this application,
# so we call the command-line tool instead
import subprocess
args = ['otbcli_TrainVectorClassifier', '-io.vd', samples, 
        '-cfield', 'code', '-io.out', rf_model, '-classifier', 'rf', 
        '-classifier.rf.cat', '4', '-feat', 'band_0', 'band_1', 'band_2', 
        'band_3', 'band_4', 'band_5', 'band_6', 'band_7', 'band_8', 
        'band_9', 'band_10', 'band_11','-io.confmatout',confmatrix]
subprocess.call(args)
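
Typing the twelve feature names by hand is error-prone. As a possible variant, the argument list can be built from the number of bands. The flags are the ones used in the cell above, while `build_train_args` is a hypothetical helper name and the paths are placeholders.

```python
def build_train_args(samples_path, model_path, confmat_path, n_features=12):
    """Build the otbcli_TrainVectorClassifier argument list used in this notebook."""
    features = ["band_{}".format(i) for i in range(n_features)]
    return (['otbcli_TrainVectorClassifier', '-io.vd', samples_path,
             '-cfield', 'code', '-io.out', model_path,
             '-classifier', 'rf', '-classifier.rf.cat', '4', '-feat']
            + features
            + ['-io.confmatout', confmat_path])

# Placeholder paths, for illustration only
args = build_train_args("output/samples.sqlite", "output/rf_model.txt",
                        "output/conf_matrix.csv")
print(" ".join(args))
```

Passing the list to `subprocess.run(args, check=True)` would raise a `CalledProcessError` if the tool fails, whereas `subprocess.call` only returns the exit code.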

### The confusion matrix gives us some stats on the learning step

In [None]:
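# The first two lines of the file are headers; the next four hold one row per reference class.
# Classes are numbered 1 to 4, like the "code" values in the stats file.
tab_val = []
with open(confmatrix, "r") as stats_learn:
    for i in range(6):
        line = stats_learn.readline()
        if i > 1:
            datas = np.array(line.split(',')).astype(int)
            # Recall: diagonal value divided by the row sum (all reference samples of the class)
            print("Recall class", i-1, " \t:", datas[i-2]/datas.sum())
            tab_val.append(datas)
print("---------------------------------------------")
for i in range(4):
    # Precision: diagonal value divided by the column sum (all samples predicted as the class)
    precision = tab_val[i][i]/(tab_val[0][i]+tab_val[1][i]+tab_val[2][i]+tab_val[3][i])
    print("Precision class", i+1, " \t:", precision)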
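
To see what the two loops above compute, here is a small worked example on a 2-class confusion matrix. The values are invented for illustration and are not results from this notebook. Rows are taken as reference classes and columns as predicted classes, as in the cell above.

```python
def recall_and_precision(matrix):
    """Return per-class recall (diagonal / row sum) and precision (diagonal / column sum)."""
    n = len(matrix)
    recalls, precisions = [], []
    for i in range(n):
        row_sum = sum(matrix[i])                         # reference samples of class i
        col_sum = sum(matrix[r][i] for r in range(n))    # samples predicted as class i
        recalls.append(matrix[i][i] / row_sum)
        precisions.append(matrix[i][i] / col_sum)
    return recalls, precisions

# Toy matrix (invented values): 8 of the 10 samples of the first class are correctly labeled
toy = [[8, 2],
       [1, 9]]
recalls, precisions = recall_and_precision(toy)
print("Recall:", recalls)
print("Precision:", precisions)
```

In this toy case the first class has a recall of 8/10 and a precision of 8/9: two of its samples are missed, and one sample of the other class is wrongly assigned to it.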

In [None]:
# Classify the image stack with the trained model.
# Execute() sets up the application without writing its output to disk.
app = otbApplication.Registry.CreateApplication("ImageClassifier")
app.SetParameterString("in",image_stack)
app.SetParameterString("model",rf_model)
app.SetParameterString("out",classif)
app.Execute()

# Chain ColorMapping on the in-memory classification and write the colored map to disk
app2 = otbApplication.Registry.CreateApplication("ColorMapping")
app2.SetParameterInputImage("in",app.GetParameterOutputImage("out"))
app2.SetParameterString("method","custom")
app2.SetParameterString("method.custom.lut",colormap)
app2.SetParameterString("out",classif)
app2.ExecuteAndWriteOutput()
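
The content of the color map file is not shown in this notebook. The sketch below assumes the usual layout of a custom look-up table for ColorMapping: one `label R G B` entry per line, with lines starting with `#` ignored. The colors are invented for illustration, and `parse_lut` is a hypothetical helper name.

```python
def parse_lut(text):
    """Parse 'label R G B' lines into {label: (R, G, B)}, skipping comments and blank lines."""
    lut = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        label, red, green, blue = (int(value) for value in line.split()[:4])
        lut[label] = (red, green, blue)
    return lut

# Invented colors, one entry per class label (1 to 4)
EXAMPLE_LUT = """# label R G B
1 255 0 0
2 255 255 0
3 0 128 0
4 0 0 255
"""

for label, color in parse_lut(EXAMPLE_LUT).items():
    print(label, color)
```

Checking that every class label of the classification has an entry in the table is a cheap way to catch missing colors before running the mapping.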

In [None]:
import rasterio
import display_api

raster = rasterio.open(classif)

m, dc = display_api.rasters_on_map([raster, rasterio.open(images_list[0])], OUTPUT_DIR, ["Classification 4 classes","Image S2B"])
m