<img src="../../../img/logo-bdc.png" align="right" width="64"/>

# <span style="color:#336699">Web Land Trajectory Service (WLTS) - Example</span>
<hr style="border:2px solid #0077b9;">

<div style="text-align: left;">
    <a href="https://nbviewer.jupyter.org/github/brazil-data-cube/code-gallery/blob/master/jupyter/Python/wlts/wlts-introduction.ipynb"><img src="https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg" align="center"/></a>
</div>

<br/>

<div style="text-align: center;font-size: 90%;">
    Fabiana Zioti<sup><a href="https://orcid.org/0000-0002-7305-6043"><i class="fab fa-lg fa-orcid" style="color: #a6ce39"></i></a></sup>, Felipe Menino Carlos<sup><a href="https://orcid.org/0000-0002-3334-4315"><i class="fab fa-lg fa-orcid" style="color: #a6ce39"></i></a></sup>, Karine Reis Ferreira<sup><a href="https://orcid.org/0000-0003-2656-5504"><i class="fab fa-lg fa-orcid" style="color: #a6ce39"></i></a></sup>, Gilberto R. Queiroz<sup><a href="https://orcid.org/0000-0001-7534-0219"><i class="fab fa-lg fa-orcid" style="color: #a6ce39"></i></a></sup>
    <br/><br/>
    Earth Observation and Geoinformatics Division, National Institute for Space Research (INPE)
    <br/>
    Avenida dos Astronautas, 1758, Jardim da Granja, São José dos Campos, SP 12227-010, Brazil
    <br/><br/>
    Contact: <a href="mailto:brazildatacube@inpe.br">brazildatacube@inpe.br</a>
    <br/><br/>
    Last Update: March 24, 2021
</div>

<br/>

<div style="text-align: justify;  margin-left: 25%; margin-right: 25%;">
<b>Abstract.</b> This Jupyter Notebook presents an example scenario that applies the WLTS to extract water body samples. The samples are collected on a grid of regularly spaced points. After collection, the samples are used to train a linear classifier. Finally, the model is applied to a Remote Sensing image to identify water bodies.

This example is based on the Brazil Data Cube project's approach of selecting samples extracted through WLTS from different projects to classify multiple Brazilian biomes.
</div>    

<br/>

<div style="text-align: justify;  margin-left: 15%; margin-right: 15%;font-size: 75%; border-style: solid; border-color: #0077b9; border-width: 1px; padding: 5px;">
    <b>This Jupyter Notebook is a supplement to <a href="https://www.mdpi.com/2072-4292/12/24/4033/htm#sec5-remotesensing-12-04033" target="_blank">Section 5</a> of the following paper:</b>
    <div style="margin-left: 10px; margin-right: 10px; margin-top:10px">
      <p> Ferreira, K.R.; Queiroz, G.R.; Vinhas, L.; Marujo, R.F.B.; Simoes, R.E.O.; Picoli, M.C.A.; Camara, G.; Cartaxo, R.; Gomes, V.C.F.; Santos, L.A.; Sanchez, A.H.; Arcanjo, J.S.; Fronza, J.G.; Noronha, C.A.; Costa, R.W.; Zaglia, M.C.; Zioti, F.; Korting, T.S.; Soares, A.R.; Chaves, M.E.D.; Fonseca, L.M.G. 2020. Earth Observation Data Cubes for Brazil: Requirements, Methodology and Products. Remote Sens. 12, no. 24: 4033. DOI: <a href="https://doi.org/10.3390/rs12244033" target="_blank">10.3390/rs12244033</a>. </p>
      <p> Zioti, F.; Gomes, V.C.F.; Ferreira, K.R.; Queiroz, G.R.; Rodriguez, E. L. 2019. Um ambiente para acesso e análise de trajetórias de uso e cobertura da Terra. Anais do XIX Simpósio Brasileiro de Sensoriamento Remoto. São José dos Campos, INPE, 2019. <a href="https://proceedings.science/sbsr-2019/papers/um-ambiente-para-acesso-e-analise-de-trajetorias-de-uso-e-cobertura-da-terra" target="_blank"> Online </a>. </p>
    </div>
</div>

# Python Client API
<hr style="border:1px solid #0077b9;">

To run the examples in this Jupyter Notebook, you need to install the [WLTS client for Python](https://github.com/brazil-data-cube/wlts.py). To install it directly from the GitHub repository using pip, uncomment and run the following command:

In [None]:
#!pip install git+https://github.com/brazil-data-cube/wlts.py@v0.4.0-0

We also use the following libraries: [numpy](https://numpy.org/), [rasterio](https://rasterio.readthedocs.io/en/latest/), [pandas](https://pandas.pydata.org/), [geopandas](https://geopandas.org/), [seaborn](https://seaborn.pydata.org/), [matplotlib](https://matplotlib.org/), [scikit-learn](https://scikit-learn.org/stable/), and [folium](https://python-visualization.github.io/folium/). To install them from PyPI using pip, use the following command:

> pip install numpy rasterio pandas geopandas seaborn matplotlib scikit-learn folium

# Set the service and load samples
<hr style="border:1px solid #0077b9;">

In [None]:
import wlts

Define the service to be used:

In [None]:
service = wlts.WLTS('https://brazildatacube.dpi.inpe.br/wlts/')
service

In [None]:
service.collections

**Sampling GRID**

The trajectories are extracted at the locations of a sampling grid with equally spaced points. Below, the grid is loaded using the GeoPandas library.

> The sample points used below were generated using QGIS. If you wish, you can generate a similar grid with the [Verde](https://www.fatiando.org/verde/latest/) library.
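
As a rough alternative to QGIS or Verde, a regularly spaced grid can also be sketched with NumPy alone. The bounding box and the number of points per axis below are hypothetical values chosen for illustration; they are not the ones used to build the shapefile loaded in this notebook.

```python
import numpy

# Hypothetical bounding box (lon/lat, EPSG:4326), for illustration only
lon_min, lon_max = -51.25, -51.05
lat_min, lat_max = -0.60, -0.45

# Number of grid nodes along each axis (assumed values)
n_lon, n_lat = 5, 4

# Equally spaced coordinates along each axis, end points included
lons = numpy.linspace(lon_min, lon_max, n_lon)
lats = numpy.linspace(lat_min, lat_max, n_lat)

# Combine both axes into one (n_lat * n_lon, 2) array of (lon, lat) pairs
lon_grid, lat_grid = numpy.meshgrid(lons, lats)
grid_points = numpy.column_stack([lon_grid.ravel(), lat_grid.ravel()])

print(grid_points.shape)
```

The resulting pairs could then be turned into point geometries, for instance with `geopandas.points_from_xy`, to obtain a table similar to the one loaded below.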


In [None]:
import geopandas

In [None]:
samples_df = geopandas.read_file("/vsicurl/https://brazildatacube.dpi.inpe.br/public/workshop/bdc-2020-03/wlts/samples/roi_bdc-tile_043042.shp")
samples_df.head()

Below, the spatial location of each grid point is presented.

In [None]:
import folium

In [None]:
#
# extract sample (lat, long) pairs, the order expected by folium
#
latlon = samples_df.geometry.apply(lambda p: (p.y, p.x)).tolist()

#
# create folium map
#
folium_map = folium.Map( location=[-0.52, -51.1526], zoom_start=12)

#
# Google Satellite Layer
#
tile = folium.TileLayer(
        tiles = "https://mt1.google.com/vt/lyrs=s&x={x}&y={y}&z={z}",
        attr = 'Google',
        name = 'Google Satellite',
        overlay = False,
        control = True
       ).add_to(folium_map)

#
# add marker to map
#
for coord in latlon:
    folium.CircleMarker( location=[ coord[0], coord[1] ], fill_color='#43d9de', radius=3).add_to( folium_map )

folium_map

# Retrieving the Trajectory for specific collections
<hr style="border:1px solid #0077b9;">

For this example, the collections `Instituto Brasileiro de Geografia e Estatística (IBGE) - Monitoramento e uso da Terra` and `Projeto de Mapeamento Anual da Cobertura e Uso do Solo no Brasil (MapBiomas) version 5 - Mapa de uso e cobertura da Terra` are used. After extraction, the samples are filtered through an agreement (concordance) analysis: points whose classes are compatible in both collections are kept; otherwise, the point is removed.

> The trajectories are extracted separately in the subsections below to make them easier to use in this example, but the [wlts.py](https://github.com/brazil-data-cube/wlts.py/) library also supports extracting trajectories that consider multiple projects.


**IBGE - Monitoramento e uso da Terra (2018)**

In WLTS, the IBGE data from the Land Use Monitoring project is available in the collection named `ibge_cobertura_uso_terra`. The code below extracts the trajectory of this collection for the year 2018.

In [None]:
import pandas

In [None]:
trajectories_ibge = []

#
# Extract trajectory from WLTS
#
# iterrows() yields (index, row) pairs; only the row is needed here
for _, point_row in samples_df.iterrows():
    
    trajectories_ibge.append(
        service.tj(latitude  = point_row.geometry.y, 
                   longitude = point_row.geometry.x, 
                   start_date = 2018,
                   end_date = 2018,
                   collections = "ibge_cobertura_uso_terra").df()
    )

#
# Create a Data Frame
#
trajectories_ibge = pandas.concat(trajectories_ibge).reset_index(drop=True)
trajectories_ibge["geometry"] = samples_df["geometry"]

The table below presents the trajectory, with only one year, extracted for all the grid points presented above.

In [None]:
trajectories_ibge.head()

**MapBiomas version 5 - Mapa de uso e cobertura da Terra**

Analogous to the IBGE data, this section extracts the data from MapBiomas. In WLTS, the data from MapBiomas (Version 5) are represented through the collection `mapbiomas5_amazonia`.

In [None]:
trajectories_mapbiomas = []

#
# Extract trajectory from WLTS
#
# iterrows() yields (index, row) pairs; only the row is needed here
for _, point_row in samples_df.iterrows():
    
    trajectories_mapbiomas.append(
        service.tj(latitude  = point_row.geometry.y, 
                   longitude = point_row.geometry.x, 
                   start_date = 2018,
                   end_date = 2018,
                   collections = "mapbiomas5_amazonia").df()
    )

#
# Create a Data Frame
#
trajectories_mapbiomas = pandas.concat(trajectories_mapbiomas).reset_index(drop=True)
trajectories_mapbiomas["geometry"] = samples_df["geometry"]

In [None]:
trajectories_mapbiomas.head()

# Prepare data for classification
<hr style="border:1px solid #0077b9;">

Now that each of the sample points' trajectories has been extracted, they will be used to train a linear classifier, which identifies water bodies in remotely sensed images.

This section prepares the data for classification. In this process, all points identified as water have their class values converted to `1`, while all other values are converted to `0`. This allows a binary classifier to be trained, which determines whether or not water is present.

This conversion relies on the fact that each collection has one class that represents water. The table below summarizes the name each collection uses for it.

|         Collection        	|      Nomenclature for water class   	|
|:-------------------------:	|:----------------------------------:	|
|        IBGE (2018)        	|      Corpo d'água Continental      	|
| MapBiomas Version 5 (2018)	|         Rio, Lago e Oceano         	|

Based on the information in the table, each collection is prepared for classification below.

`IBGE Collection (2018)`

> After running the command below, notice that the `class` column has its values reduced to `0` and `1`.


In [None]:
trajectories_ibge.loc[trajectories_ibge["class"] != "Corpo d'água Continental", "class"] = 0
trajectories_ibge.loc[trajectories_ibge["class"] == "Corpo d'água Continental", "class"] = 1

In [None]:
trajectories_ibge.head(3)

`MapBiomas Collection (2018)`

In [None]:
trajectories_mapbiomas.loc[trajectories_mapbiomas["class"] != "Rio, Lago e Oceano", "class"] = 0
trajectories_mapbiomas.loc[trajectories_mapbiomas["class"] == "Rio, Lago e Oceano", "class"] = 1

In [None]:
trajectories_mapbiomas.head(3)
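
The two-step assignment used above can also be expressed as a single lookup driven by the table of water classes. The sketch below uses plain Python on a few made-up labels; the helper `to_binary` and the sample labels are hypothetical and are not part of wlts.py.

```python
# Water class name used by each collection (taken from the table above)
WATER_CLASS = {
    "ibge": "Corpo d'água Continental",
    "mapbiomas": "Rio, Lago e Oceano",
}


def to_binary(labels, water_label):
    """Return 1 for labels equal to the water class and 0 otherwise."""
    return [1 if label == water_label else 0 for label in labels]


# Made-up labels, for illustration only
ibge_labels = ["Corpo d'água Continental", "Vegetação Florestal", "Área Artificial"]
binary = to_binary(ibge_labels, WATER_CLASS["ibge"])
print(binary)  # [1, 0, 0]
```

With pandas, an expression such as `(df["class"] == water_label).astype(int)` should give the same result in one vectorized step and leaves the column with an integer type.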

# Select the training data
<hr style="border:1px solid #0077b9;">

Before using the trajectories to train the linear classifier, it is essential to perform the agreement analysis so that conflicting labels do not introduce uncertainty into the model. To do this, the classes from both data sets are compared. A confusion matrix is also computed to understand and quantify the points of agreement.

In [None]:
import seaborn
from matplotlib import pyplot as plt
from sklearn.metrics import confusion_matrix

In [None]:
cm_arr = confusion_matrix(trajectories_ibge["class"].astype("int"), trajectories_mapbiomas["class"].astype("int"))

In [None]:
plt.figure(dpi = 300)
seaborn.heatmap(cm_arr, annot=True, fmt = 'g', cmap="YlGnBu", cbar = False)
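
To read the heatmap: scikit-learn's `confusion_matrix` places its first argument on the rows and its second on the columns, so here the rows correspond to IBGE and the columns to MapBiomas, and the diagonal counts the points on which both agree. The sketch below, using NumPy only and made-up binary labels, shows how an overall agreement rate can be derived from such a matrix.

```python
import numpy

# Made-up binary labels (1 = water, 0 = other), for illustration only
ibge = numpy.array([1, 1, 0, 0, 0, 1])
mapbiomas = numpy.array([1, 0, 0, 0, 1, 1])

# 2x2 matrix: rows index the IBGE label, columns the MapBiomas label
cm = numpy.zeros((2, 2), dtype=int)
numpy.add.at(cm, (ibge, mapbiomas), 1)

# Points on the diagonal carry the same label in both collections
agreement = numpy.trace(cm) / cm.sum()

print(cm)
print(f"agreement: {agreement:.2f}")
```

In this toy case four of the six points agree, so roughly two thirds of the samples would survive the filtering step.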

> Below, the samples are filtered to keep only the points where both data sets agree.

In [None]:
true_matrix = trajectories_ibge["class"].values == trajectories_mapbiomas["class"].values

trajectories_ibge_filtered = trajectories_ibge[true_matrix]
trajectories_mapbiomas_filtered  = trajectories_mapbiomas[true_matrix]

In [None]:
trajectories_ibge_filtered.head(5)

# Classifying
<hr style="border:1px solid #0077b9;">

In this section, the previously extracted and filtered samples are used to train a linear classifier. After training, the model is applied to a scene extracted from the Landsat-8/OLI data cube (16-day temporal composition, in which the pixels with the least cloud influence are selected by the STACK algorithm).

The defined study region is located within the Amazon biome, in a cube tile in Pará.

> In this example, to reduce the computational requirements, only a small region of the scene is used: the one that intersects the grid points presented earlier. Furthermore, to facilitate classification, the **N**ormalized **D**ifference **W**ater **I**ndex (NDWI) is calculated.

The command below loads the brick file containing bands `3` and `5` and the `NDWI` (already computed beforehand).


In [None]:
import rasterio

In [None]:
brick = rasterio.open(
    "https://brazildatacube.dpi.inpe.br/public/workshop/bdc-2020-03/wlts/brick/2018/LC8_30_16D_STK_v001_043042_2018-06-10_2018-06-25_brick.tif"
)
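
The NDWI band stored in the brick was computed beforehand. As a reminder of what such a band contains, the sketch below assumes the McFeeters formulation, NDWI = (Green - NIR) / (Green + NIR), with Landsat-8/OLI band 3 as green and band 5 as near-infrared. This notebook does not state which formulation or scaling was used to build the brick, so both the formula and the toy reflectance values should be read as assumptions.

```python
import numpy


def ndwi(green, nir):
    """McFeeters NDWI, (green - nir) / (green + nir); 0 where the sum is 0."""
    green = numpy.asarray(green, dtype="float64")
    nir = numpy.asarray(nir, dtype="float64")
    total = green + nir
    out = numpy.zeros_like(total)
    # Divide only where the denominator is non-zero to avoid warnings
    numpy.divide(green - nir, total, out=out, where=total != 0)
    return out


# Toy reflectances: a water-like pixel, a vegetation-like pixel, an empty pixel
green = numpy.array([0.10, 0.05, 0.0])
nir = numpy.array([0.02, 0.40, 0.0])

result = ndwi(green, nir)
print(result)
```

Under this formulation, water tends to produce positive values and vegetation negative ones, which is what makes the index a convenient feature for a linear classifier.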

The code below reprojects the grid points to the Coordinate Reference System (CRS) of the scene.

In [None]:
trajectories_ibge_filtered = geopandas.GeoDataFrame(trajectories_ibge_filtered)\
                                .set_geometry("geometry")\
                                .set_crs("EPSG:4326")

points = trajectories_ibge_filtered["geometry"].to_crs(brick.crs)
points

Now, we will train the linear classifier. 

> The [scikit-learn](https://scikit-learn.org/) library provides the classifier used.


Extract data for each point

In [None]:
points = list(
    brick.sample((x, y) for x, y in zip(points.x, points.y))
)

Training the linear classifier

In [None]:
from sklearn.linear_model import SGDClassifier

In [None]:
model = SGDClassifier().fit(points, 
                            trajectories_ibge_filtered["class"].astype("int"))

Classify the image

In [None]:
brick_array = brick.read()
prediction_array = model.predict(brick_array.T.reshape((-1, 3)))

prediction_array = prediction_array.reshape(brick_array.shape[2], brick_array.shape[1]).T.astype(int)
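
The transpose/reshape pair above turns the `(bands, rows, cols)` array into one row of band values per pixel and then restores the image layout. The sketch below checks that round trip on a tiny made-up array and also shows an equivalent formulation with `numpy.moveaxis`, which some readers may find easier to follow. The array and the stand-in "prediction" (simply the first band) are for illustration only.

```python
import numpy

# Made-up brick: 3 bands, 2 rows, 4 columns
bands, rows, cols = 3, 2, 4
brick_array = numpy.arange(bands * rows * cols).reshape(bands, rows, cols)

# Same flattening as in the notebook: (cols, rows, bands) -> (cols * rows, bands)
pixels = brick_array.T.reshape((-1, bands))

# Stand-in for model.predict: return the first band value of every pixel
fake_prediction = pixels[:, 0]

# Restore the (rows, cols) layout exactly as done for the real prediction
restored = fake_prediction.reshape(brick_array.shape[2], brick_array.shape[1]).T

# Alternative: put the band axis last, keeping pixels in row-major order
pixels_alt = numpy.moveaxis(brick_array, 0, -1).reshape(-1, bands)
restored_alt = pixels_alt[:, 0].reshape(rows, cols)

print(numpy.array_equal(restored, brick_array[0]))      # True
print(numpy.array_equal(restored_alt, brick_array[0]))  # True
```

Both variants place every predicted value back at the pixel it came from; they differ only in the order in which pixels are handed to the model.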

Plot classified image

In [None]:
plt.figure(figsize = (10, 10))
plt.imshow(prediction_array, cmap='GnBu')
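
Besides plotting, the mask can be summarized numerically. The sketch below counts water pixels in a made-up mask and converts the count to an area, assuming a 30 m pixel size. That value is only suggested by the `LC8_30` prefix of the brick file name and is not confirmed here; the actual pixel size can be read from `brick.res`.

```python
import numpy

# Made-up classification mask (1 = water, 0 = other), for illustration only
mask = numpy.array([[0, 0, 1, 1],
                    [0, 1, 1, 1]])

pixel_size_m = 30.0  # assumed pixel size in metres

water_pixels = int(mask.sum())
water_fraction = water_pixels / mask.size
water_area_km2 = water_pixels * pixel_size_m ** 2 / 1e6

print(water_pixels, water_fraction, water_area_km2)
```

Applied to `prediction_array`, the same three lines would give a quick sanity check on how much of the scene the model labels as water.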

Save results

In [None]:
import numpy

In [None]:
profile = brick.profile
profile["dtype"] = "int16"
profile["count"] = 1

with rasterio.open("water-mask-classification.tif", "w", **profile) as file:
    file.write(prediction_array[numpy.newaxis, ...].astype('int16'))