# PyEumap - Land-Cover Mapping

In this tutorial, we will use the overlaid points (see [Overlay tutorial](02_overlay.ipynb)) to train an ML model and predict the land cover (LC) over the last two decades, using the **LandMapper** class.

The training step will use *elevation*, *slope*, *landsat* (7 spectral bands, 4 seasons and 3 percentiles per year) and *night light* (VIIRS Night Band) data to predict the following LC classes:
* 211: Non-irrigated arable land
* 311: Broad-leaved forest
* 312: Coniferous forest
* 324: Transitional woodland-shrub
* 411: Inland wetlands
* 512: Water bodies
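
When reading reports and maps later on, it can help to turn the numeric codes back into names. The lookup below is only a convenience sketch built from the list above; `LC_CLASSES` and `lc_label` are hypothetical names and are not part of pyeumap. Codes that are not in the list are returned unchanged as text.

```python
# Class codes and names copied from the list above (hypothetical helper, not a pyeumap API).
LC_CLASSES = {
    211: 'Non-irrigated arable land',
    311: 'Broad-leaved forest',
    312: 'Coniferous forest',
    324: 'Transitional woodland-shrub',
    411: 'Inland wetlands',
    512: 'Water bodies',
}

def lc_label(code):
    """Return the class name for a code, or the code itself as text if it is not listed."""
    code = int(code)  # accepts float labels such as 311.0
    return LC_CLASSES.get(code, str(code))

print(lc_label(311.0))
print(lc_label(231))
```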

First, let's import the necessary modules:

In [11]:
import sys
sys.path.append('../../')

import os
from osgeo import gdal
from pathlib import Path
import pandas as pd
import geopandas as gpd
from pyeumap.mapper import LandMapper
from sklearn.ensemble import RandomForestClassifier

## Dataset

Our dataset covers a single tile, located in Croatia, extracted from the tiling system created for the European Union (7,042 tiles) by the GeoHarmonizer project.

In [19]:
from pyeumap import datasets

tile = datasets.TILES[1]

data_root = datasets.DATA_ROOT_NAME
data_dir = Path(os.getcwd()).joinpath(data_root, tile)
data_dir

PosixPath('/home/leandro/Code/eumap/demo/python/eumap_data/9529_croatia')

Let's load the overlaid points:

In [21]:
fn_points = Path(os.getcwd()).joinpath(data_dir, tile + '_landcover_samples_overlayed.gpkg')
print(fn_points)
points = gpd.read_file(fn_points)
points

/home/leandro/Code/eumap/demo/python/eumap_data/9529_croatia/9529_croatia_landcover_samples_overlayed.gpkg


Unnamed: 0,confidence,dtm_elevation,dtm_slope,landsat_ard_fall_blue_p25,landsat_ard_fall_blue_p50,landsat_ard_fall_blue_p75,landsat_ard_fall_count,landsat_ard_fall_green_p25,landsat_ard_fall_green_p50,landsat_ard_fall_green_p75,...,landsat_ard_winter_thermal_p25,landsat_ard_winter_thermal_p50,landsat_ard_winter_thermal_p75,lc_class,lucas,night_lights,overlay_id,survey_date,tile_id,geometry
0,100.0,721.0,3.952847,6.0,6.0,6.0,3.0,14.0,14.0,14.0,...,185.0,186.0,186.0,311,True,0.180370,1,2015-06-19T00:00:00+00:00,9529,POINT (4770000.000 2404000.000)
1,100.0,630.0,0.000000,6.0,6.0,7.0,3.0,14.0,14.0,15.0,...,187.0,187.0,187.0,321,True,0.000390,2,2015-06-19T00:00:00+00:00,9529,POINT (4772000.000 2406000.000)
2,100.0,684.0,5.803495,8.0,8.0,9.0,2.0,17.0,17.0,18.0,...,186.0,186.0,186.0,322,True,0.615905,3,2015-06-26T00:00:00+00:00,9529,POINT (4772000.000 2424000.000)
3,100.0,648.0,10.307764,5.0,6.0,6.0,3.0,14.0,14.0,15.0,...,188.0,188.0,188.0,311,True,0.293385,4,2015-06-19T00:00:00+00:00,9529,POINT (4772000.000 2402000.000)
4,100.0,438.0,9.718253,5.0,5.0,5.0,2.0,13.0,14.0,15.0,...,187.0,188.0,189.0,322,True,0.535678,5,2015-10-20T00:00:00+00:00,9529,POINT (4788000.000 2422000.000)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
754,85.0,666.0,25.873625,3.0,3.0,3.0,2.0,11.0,11.0,12.0,...,186.0,186.0,186.0,231,False,0.059816,176,2012-06-30T00:00:00+00:00,9529,POINT (4796422.302 2416657.855)
755,85.0,702.0,6.718548,5.0,6.0,6.0,2.0,14.0,14.0,14.0,...,180.0,180.0,180.0,231,False,0.039074,177,2012-06-30T00:00:00+00:00,9529,POINT (4797303.915 2404796.255)
756,85.0,560.0,6.508542,5.0,5.0,5.0,1.0,15.0,15.0,15.0,...,182.0,182.0,182.0,231,False,0.051485,178,2012-06-30T00:00:00+00:00,9529,POINT (4797638.685 2415216.855)
757,85.0,658.0,17.159384,3.0,3.0,3.0,1.0,11.0,11.0,11.0,...,183.0,183.0,183.0,324,False,0.025304,179,2012-06-30T00:00:00+00:00,9529,POINT (4798052.494 2405145.480)


Which columns are available to the ML model?

In [22]:
print("Columns:")
# List every column together with its pandas dtype
for col_name, col_type in zip(points.columns, points.dtypes):
    print(f' - {col_name} ({col_type})')

Columns:
 - confidence (float64)
 - dtm_elevation (float64)
 - dtm_slope (float64)
 - landsat_ard_fall_blue_p25 (float64)
 - landsat_ard_fall_blue_p50 (float64)
 - landsat_ard_fall_blue_p75 (float64)
 - landsat_ard_fall_count (float64)
 - landsat_ard_fall_green_p25 (float64)
 - landsat_ard_fall_green_p50 (float64)
 - landsat_ard_fall_green_p75 (float64)
 - landsat_ard_fall_nir_p25 (float64)
 - landsat_ard_fall_nir_p50 (float64)
 - landsat_ard_fall_nir_p75 (float64)
 - landsat_ard_fall_red_p25 (float64)
 - landsat_ard_fall_red_p50 (float64)
 - landsat_ard_fall_red_p75 (float64)
 - landsat_ard_fall_swir1_p25 (float64)
 - landsat_ard_fall_swir1_p50 (float64)
 - landsat_ard_fall_swir1_p75 (float64)
 - landsat_ard_fall_swir2_p25 (float64)
 - landsat_ard_fall_swir2_p50 (float64)
 - landsat_ard_fall_swir2_p75 (float64)
 - landsat_ard_fall_thermal_p25 (float64)
 - landsat_ard_fall_thermal_p50 (float64)
 - landsat_ard_fall_thermal_p75 (float64)
 - landsat_ard_spring_blue_p25 (float64)
 - landsa

## Training 

To map the land-cover classes we will use LandMapper, which trains an ML model and performs the spacetime prediction. LandMapper receives the following parameters:
* *fn_points*: the GeoPackage file path or a [GeoPandas DataFrame](https://geopandas.org/reference/geopandas.GeoDataFrame.html) instance
* *feat_col_prfxs*: the prefixes of all columns that should be included as covariates in the feature space
* *target_col*: the name of the column that the model should treat as the target variable
* *estimator*: the model implementation, which can be any estimator available in [scikit-learn](https://scikit-learn.org/stable/modules/classes.html)
* *val_samples_pct*: the proportion of samples that should be held out for validation
* *min_samples_per_class*: the minimum proportion of samples per class. For example, with a value of 0.05, every class holding less than 5% of the samples is removed from training.

In [23]:
feat_col_prfxs = ['landsat', 'dtm', 'night_lights']
target_col = 'lc_class'
estimator = RandomForestClassifier(n_estimators=100)

landmapper = LandMapper(fn_points, feat_col_prfxs, target_col, 
                        estimator=estimator, 
                        val_samples_pct=0.5, 
                        min_samples_per_class=0.05,
                        verbose = False
)

[02:40:43] Filling the missing values (0.71% / 458 values)...
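
To make the *feat_col_prfxs* and *min_samples_per_class* parameters more concrete, here is a rough pandas sketch of prefix-based column selection and rare-class filtering on a toy table. It is not the LandMapper implementation: `select_feature_columns` and `drop_rare_classes` are hypothetical helpers, and whether the real threshold comparison is inclusive is an assumption.

```python
import pandas as pd

def select_feature_columns(df, prefixes):
    """Return the columns whose names start with any of the given prefixes."""
    return [c for c in df.columns if c.startswith(tuple(prefixes))]

def drop_rare_classes(df, target_col, min_fraction):
    """Keep only the rows whose class holds at least min_fraction of all samples."""
    fractions = df[target_col].value_counts(normalize=True)
    kept = fractions[fractions >= min_fraction].index
    return df[df[target_col].isin(kept)]

# Toy table with made-up values; only the column naming mimics the tutorial data.
toy = pd.DataFrame({
    'dtm_slope': [1.0, 2.0, 3.0, 4.0],
    'landsat_ard_fall_blue_p50': [5.0, 6.0, 7.0, 8.0],
    'night_lights': [0.1, 0.2, 0.3, 0.4],
    'confidence': [100.0, 100.0, 85.0, 85.0],
    'lc_class': [311, 311, 311, 324],
})

feature_cols = select_feature_columns(toy, ['landsat', 'dtm', 'night_lights'])
filtered = drop_rare_classes(toy, 'lc_class', 0.3)

print(feature_cols)
print(filtered['lc_class'].value_counts())
```

In this toy table, *confidence* and *lc_class* match no prefix and so stay out of the feature space, and class 324 (25% of the rows) falls below the 30% threshold and is dropped.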


Let's train the model

In [24]:
landmapper.train()

and check the summary of the model performance:

In [26]:
print(f'Overall accuracy: {landmapper.overall_acc * 100:.2f}%\n\n')
print(landmapper.classification_report)

Overall accuracy: 63.59%


              precision    recall  f1-score   support

       231.0       0.68      0.56      0.62       103
       311.0       0.76      0.52      0.62        61
       312.0       0.86      0.95      0.90        20
       321.0       0.57      0.25      0.35        32
       324.0       0.57      0.78      0.66       141

    accuracy                           0.64       357
   macro avg       0.69      0.61      0.63       357
weighted avg       0.65      0.64      0.63       357



It is also possible to access the confusion matrix:

In [27]:
landmapper.cm

array([[ 58,   2,   0,   1,  42],
       [  3,  32,   2,   0,  24],
       [  0,   1,  19,   0,   0],
       [  6,   0,   0,   8,  18],
       [ 18,   7,   1,   5, 110]])
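
The figures in the classification report can be recomputed from this matrix. The sketch below assumes the scikit-learn convention (rows are true classes, columns are predicted classes) and that the rows follow the class order of the report; the row sums matching the *support* column are consistent with that assumption. The `labels` list is typed in by hand from the report.

```python
import numpy as np

# Class order taken from the classification report above (assumed to match the matrix rows).
labels = [231, 311, 312, 321, 324]
cm = np.array([[58, 2, 0, 1, 42],
               [3, 32, 2, 0, 24],
               [0, 1, 19, 0, 0],
               [6, 0, 0, 8, 18],
               [18, 7, 1, 5, 110]])

overall_acc = np.trace(cm) / cm.sum()      # correct predictions / all validation samples
recall = np.diag(cm) / cm.sum(axis=1)      # per true class (row-wise)
precision = np.diag(cm) / cm.sum(axis=0)   # per predicted class (column-wise)

for label, p, r in zip(labels, precision, recall):
    print(f'{label}: precision={p:.2f} recall={r:.2f}')
print(f'Overall accuracy: {overall_acc * 100:.2f}%')
```

Read this way, the largest off-diagonal entry (42) corresponds to samples of class 231 that were predicted as class 324.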

## Predictions

Now we are ready to run the predictions. To do so, the LandMapper *predict* method should receive the following parameters:
* *dirs_layers*: a list of directory paths containing all the raster layers used in the training phase
* *fn_result*: the file path where the model output is written
* *data_type*: the GDAL data type of the output file

First, let's predict only the year 2000:

In [29]:
dir_timeless_layers = os.path.join(data_dir, 'timeless') 
dir_2000_layers = os.path.join(data_dir, '2000')

dirs_layers = [dir_2000_layers, dir_timeless_layers]
fn_result = 'land_cover_2000.tif'
data_type = gdal.GDT_Int16

landmapper.predict(dirs_layers, fn_result, data_type)

[02:41:52] Filling the missing values (0.62% / 560113 values)...


To predict the other years, we call the same method with different *dirs_layers* and *fn_result* values:

In [None]:
dir_timeless_layers = os.path.join(data_dir, 'timeless') 

for year in range(2001, 2003):
    dir_time_layers = os.path.join(data_dir, str(year))
    dirs_layers = [dir_time_layers, dir_timeless_layers]
    fn_result = f'land_cover_{year}.tif'
    
    print(f"Predicting the land-cover for {year} and saving the result in land_cover_{year}.tif")
    landmapper.predict(dirs_layers, fn_result, data_type)

Predicting the land-cover for 2001 and saving the result in land_cover_2001.tif
[02:43:47] Filling the missing values (0.93% / 849897 values)...
Predicting the land-cover for 2002 and saving the result in land_cover_2002.tif
[02:44:16] Filling the missing values (0.74% / 674670 values)...
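
A quick sanity check on the written rasters is to count the pixels per class. The sketch below is hedged: it assumes that band 1 of each output file stores the predicted class codes, which this tutorial does not confirm, and `class_counts` and `read_first_band` are hypothetical helpers rather than pyeumap functions. A small toy array stands in for a real map so that the counting logic can be run without any raster file.

```python
import numpy as np

def class_counts(values, nodata=None):
    """Count pixels per class code in an array, ignoring the nodata value if given."""
    values = np.asarray(values)
    if nodata is not None:
        values = values[values != nodata]
    codes, counts = np.unique(values, return_counts=True)
    return dict(zip(codes.tolist(), counts.tolist()))

def read_first_band(fn_raster):
    """Read band 1 of a raster and its nodata value with GDAL."""
    from osgeo import gdal  # imported lazily so class_counts works without GDAL installed
    ds = gdal.Open(str(fn_raster))
    if ds is None:
        raise FileNotFoundError(fn_raster)
    band = ds.GetRasterBand(1)
    return band.ReadAsArray(), band.GetNoDataValue()

# Toy array with made-up class codes, standing in for a predicted map.
toy = np.array([[311, 311, 324],
                [231, 324, 324]])
toy_counts = class_counts(toy)
print(toy_counts)
```

For a real output, one would call `read_first_band('land_cover_2000.tif')` and pass the returned array and nodata value to `class_counts`.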
