# PyEumap - Land-Cover Mapping

In this tutorial, we will use the overlaid points (see [Overlay tutorial](02_overlay.ipynb)) to train an ML model and predict the land cover (LC) over the last two decades, using the **LandMapper** class.

The training step uses *elevation*, *slope*, *Landsat* (7 spectral bands, 4 seasons and 3 percentiles per year) and *night light* (VIIRS Night Band) data to predict the following LC classes:
* 211: Non-irrigated arable land
* 311: Broad-leaved forest
* 312: Coniferous forest
* 324: Transitional woodland-shrub
* 411: Inland wetlands
* 512: Water bodies
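
Because the model outputs numeric codes, a small lookup table helps when reading reports and maps. The following is a minimal sketch built only from the list above; the names `LC_CLASSES` and `lc_label` are hypothetical helpers and are not part of pyeumap.

```python
# Class codes and names copied from the list above.
LC_CLASSES = {
    211: "Non-irrigated arable land",
    311: "Broad-leaved forest",
    312: "Coniferous forest",
    324: "Transitional woodland-shrub",
    411: "Inland wetlands",
    512: "Water bodies",
}


def lc_label(code):
    """Return the class name for a code, or a placeholder for unknown codes."""
    # Reports may print codes as floats (e.g. 211.0), so cast to int first.
    code = int(code)
    return LC_CLASSES.get(code, f"Unknown class ({code})")


print(lc_label(312.0))
print(lc_label(511))
```

Codes that are not in the list (the second call) return a placeholder instead of raising, which is convenient when the samples contain classes outside the six targets.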

First, let's import the necessary modules:

In [1]:
import sys
sys.path.append('../../')

import os
from osgeo import gdal
from pathlib import Path
import pandas as pd
import geopandas as gpd
from pyeumap.mapper import LandMapper
from sklearn.ensemble import RandomForestClassifier

## Dataset

Our dataset covers one tile, located in Sweden, extracted from the tiling system (7,042 tiles) created for the European Union by the GeoHarmonizer project.

In [2]:
from pyeumap import datasets

tile = datasets.TILES[0]

data_root = datasets.DATA_ROOT_NAME
data_dir = Path(os.getcwd()).joinpath(data_root,tile)

Let's load the overlaid points:

In [3]:
fn_points = data_dir.joinpath(tile + '_landcover_samples_overlayed.gpkg')
points = gpd.read_file(fn_points)
points

| | lucas | survey_date | confidence | lc_class | tile_id | overlay_id | dtm_elevation | dtm_slope | landsat_ard_fall_green_p25 | landsat_ard_fall_blue_p75 | ... | landsat_ard_winter_swir1_p50 | landsat_ard_winter_red_p50 | landsat_ard_winter_nir_p50 | landsat_ard_winter_thermal_p25 | landsat_ard_winter_swir2_p50 | landsat_ard_winter_swir1_p75 | landsat_ard_winter_swir1_p25 | landsat_ard_winter_thermal_p75 | night_lights | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | 2012-05-29T00:00:00 | 100 | 312 | 22497 | 1 | 239.0 | 2.946278 | 8.0 | 3.0 | ... | 24.0 | 6.0 | 48.0 | 183.0 | 9.0 | 24.0 | 23.0 | 183.0 | 0.059711 | POINT (4650000.166 4483999.711) |
| 1 | True | 2012-05-16T00:00:00 | 100 | 312 | 22497 | 2 | 391.0 | 5.559027 | 8.0 | 2.0 | ... | 23.0 | 6.0 | 55.0 | 182.0 | 8.0 | 24.0 | 23.0 | 183.0 | 0.009072 | POINT (4650000.255 4471999.472) |
| 2 | False | 2012-06-30T00:00:00 | 85 | 411 | 22497 | 3 | 416.0 | 1.666667 | 14.0 | 6.0 | ... | 49.0 | 17.0 | 68.0 | 184.0 | 23.0 | 49.0 | 49.0 | 184.0 | 0.030172 | POINT (4650097.582 4470351.405) |
| 3 | False | 2012-06-30T00:00:00 | 85 | 312 | 22497 | 4 | 357.0 | 8.700255 | 7.0 | 2.0 | ... | 21.0 | 4.0 | 39.0 | 182.0 | 8.0 | 21.0 | 20.0 | 182.0 | -0.012674 | POINT (4651001.339 4472046.880) |
| 4 | False | 2012-06-30T00:00:00 | 85 | 312 | 22497 | 5 | 133.0 | 23.109041 | 8.0 | 3.0 | ... | 24.0 | 6.0 | 47.0 | 183.0 | 8.0 | 24.0 | 23.0 | 183.0 | 0.200448 | POINT (4651217.720 4488931.650) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 675 | True | 2015-10-29T00:00:00 | 100 | 312 | 22497 | 6 | 207.0 | 17.834112 | 8.0 | 4.0 | ... | 20.0 | 5.0 | 44.0 | 183.0 | 8.0 | 21.0 | 20.0 | 183.0 | 0.859678 | POINT (4668000.057 4493999.655) |
| 676 | True | 2015-10-29T00:00:00 | 100 | 312 | 22497 | 7 | 172.0 | 21.122986 | 7.0 | 2.0 | ... | 19.0 | 5.0 | 46.0 | 183.0 | 6.0 | 19.0 | 18.0 | 183.0 | 0.925818 | POINT (4670000.156 4491999.887) |
| 677 | True | 2015-07-27T00:00:00 | 100 | 511 | 22497 | 8 | 113.0 | 0.000000 | 7.0 | 3.0 | ... | 14.0 | 4.0 | 19.0 | 182.0 | 4.0 | 15.0 | 12.0 | 183.0 | 0.900600 | POINT (4662000.000 4478000.000) |
| 678 | True | 2015-10-29T00:00:00 | 100 | 311 | 22497 | 9 | 40.0 | 15.023130 | 7.0 | 3.0 | ... | 9.0 | 4.0 | 14.0 | 182.0 | 3.0 | 10.0 | 8.0 | 183.0 | 2.580281 | POINT (4676000.000 4488000.000) |


Which columns are available to the ML model?

In [4]:
print("Columns:")
# Print each column name together with its pandas dtype
for col_name, col_type in zip(points.columns, points.dtypes):
    print(f' - {col_name} ({col_type})')

Columns:
 - lucas (bool)
 - survey_date (object)
 - confidence (object)
 - lc_class (int64)
 - tile_id (int64)
 - overlay_id (int64)
 - dtm_elevation (float64)
 - dtm_slope (float64)
 - landsat_ard_fall_green_p25 (float64)
 - landsat_ard_fall_blue_p75 (float64)
 - landsat_ard_fall_blue_p50 (float64)
 - landsat_ard_fall_green_p75 (float64)
 - landsat_ard_spring_blue_p25 (float64)
 - landsat_ard_fall_blue_p25 (float64)
 - landsat_ard_fall_thermal_p50 (float64)
 - landsat_ard_fall_nir_p75 (float64)
 - landsat_ard_fall_red_p50 (float64)
 - landsat_ard_fall_green_p50 (float64)
 - landsat_ard_fall_nir_p50 (float64)
 - landsat_ard_fall_swir1_p25 (float64)
 - landsat_ard_fall_nir_p25 (float64)
 - landsat_ard_fall_thermal_p75 (float64)
 - landsat_ard_fall_red_p25 (float64)
 - landsat_ard_fall_swir2_p75 (float64)
 - landsat_ard_fall_swir1_p75 (float64)
 - landsat_ard_spring_nir_p25 (float64)
 - landsat_ard_fall_thermal_p25 (float64)
 - landsat_ard_fall_swir2_p25 (float64)
 - landsat_ard_spring_b
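
The Landsat columns in this listing follow a regular pattern, `landsat_ard_{season}_{band}_p{percentile}`. As a rough sketch, the full set of expected names can be enumerated and compared against the table. The band names are taken from the columns shown above; the season name `summer` is an assumption, because the visible part of the listing shows only fall, spring and winter. `landsat_feature_names` is a hypothetical helper, not a pyeumap function.

```python
from itertools import product

# Band names as they appear in the column listing.
BANDS = ["blue", "green", "red", "nir", "swir1", "swir2", "thermal"]
# "summer" is assumed; only fall, spring and winter are visible in the listing.
SEASONS = ["winter", "spring", "summer", "fall"]
PERCENTILES = [25, 50, 75]


def landsat_feature_names():
    """Enumerate every season/band/percentile combination as a column name."""
    return [
        f"landsat_ard_{season}_{band}_p{pct}"
        for season, band, pct in product(SEASONS, BANDS, PERCENTILES)
    ]


names = landsat_feature_names()
# 7 bands x 4 seasons x 3 percentiles
print(len(names))
```

With the points loaded, `set(names) - set(points.columns)` would then list any expected Landsat column that is missing from the table, provided the naming assumption holds.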

## Training 

To map the land-cover classes we will use LandMapper, which trains an ML model and runs the space-time prediction. LandMapper receives the following parameters:
* *fn_points*: the geopackage file path or a [GeoPandas DataFrame](https://geopandas.org/reference/geopandas.GeoDataFrame.html) instance
* *feat_col_prfxs*: the prefixes of all columns that should be included as covariates in the feature space
* *target_col*: the name of the column that should be considered as the target variable by the model
* *estimator*: the model implementation, which can be any estimator available in [sklearn](https://scikit-learn.org/stable/modules/classes.html)
* *val_samples_pct*: the proportion of samples held out for validation
* *min_samples_per_class*: the minimum proportion of samples per class. For example, with a value of 0.05, all classes with fewer than 5% of the samples are removed from training.
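
To make the prefix and class-proportion parameters concrete, here is a minimal sketch of the two selection steps on plain Python lists. It illustrates the idea only and is not LandMapper's actual implementation: the helper names are hypothetical, and treating the threshold as inclusive is an assumption.

```python
from collections import Counter


def select_feature_columns(columns, prefixes):
    """Keep the columns whose name starts with any of the given prefixes."""
    return [c for c in columns if c.startswith(tuple(prefixes))]


def classes_to_keep(labels, min_fraction):
    """Return the classes whose share of the samples is at least min_fraction."""
    counts = Counter(labels)
    total = len(labels)
    return sorted(cls for cls, n in counts.items() if n / total >= min_fraction)


columns = ["lucas", "lc_class", "dtm_elevation", "dtm_slope",
           "landsat_ard_fall_green_p25", "night_lights", "geometry"]
print(select_feature_columns(columns, ["landsat", "dtm", "night_lights"]))

# Toy labels: class 511 is 1 of 40 samples (2.5%), below a 5% threshold.
labels = [312] * 25 + [211] * 14 + [511]
print(classes_to_keep(labels, 0.05))
```

In the toy data the rare class 511 is dropped, which mirrors why a class with very few samples in the points table would not appear in the trained model.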

In [5]:
feat_col_prfxs = ['landsat', 'dtm', 'night_lights']
target_col = 'lc_class'
estimator = RandomForestClassifier(n_estimators=100)

landmapper = LandMapper(fn_points, feat_col_prfxs, target_col, 
                        estimator=estimator, 
                        val_samples_pct=0.5, 
                        min_samples_per_class=0.05,
                        verbose=False
)

Let's train the model:

In [6]:
landmapper.train()

and check the summary of the model performance:

In [7]:
print(f'Overall accuracy: {landmapper.overall_acc * 100:.2f}%\n\n')
print(landmapper.classification_report)

Overall accuracy: 90.32%


              precision    recall  f1-score   support

       211.0       0.97      0.94      0.95        62
       311.0       0.89      0.40      0.55        20
       312.0       0.80      0.96      0.87        89
       324.0       0.90      0.97      0.93        29
       411.0       0.96      0.93      0.94        55
       512.0       0.98      0.91      0.94        55

    accuracy                           0.90       310
   macro avg       0.92      0.85      0.87       310
weighted avg       0.91      0.90      0.90       310



It is also possible to access the confusion matrix:

In [8]:
landmapper.cm

array([[58,  0,  2,  0,  2,  0],
       [ 0,  8, 10,  1,  0,  1],
       [ 2,  1, 85,  1,  0,  0],
       [ 0,  0,  1, 28,  0,  0],
       [ 0,  0,  3,  1, 51,  0],
       [ 0,  0,  5,  0,  0, 50]])
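
The figures in the classification report can be recomputed from this matrix, which is a useful check on how to read it. The sketch below copies the matrix as printed and assumes the scikit-learn convention (rows are true classes, columns are predicted classes); the row sums matching the `support` column are consistent with that assumption.

```python
# Class order follows the classification report.
labels = [211, 311, 312, 324, 411, 512]
cm = [
    [58, 0, 2, 0, 2, 0],
    [0, 8, 10, 1, 0, 1],
    [2, 1, 85, 1, 0, 0],
    [0, 0, 1, 28, 0, 0],
    [0, 0, 3, 1, 51, 0],
    [0, 0, 5, 0, 0, 50],
]

total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(len(cm)))
print(f"Overall accuracy: {correct / total * 100:.2f}%")

precisions, recalls = {}, {}
for i, label in enumerate(labels):
    support = sum(cm[i])                    # samples that truly belong to the class
    predicted = sum(row[i] for row in cm)   # samples predicted as the class
    recalls[label] = cm[i][i] / support
    precisions[label] = cm[i][i] / predicted
    print(f"{label}: precision={precisions[label]:.2f} "
          f"recall={recalls[label]:.2f} support={support}")
```

The second row explains the low recall of class 311: 10 of its 20 validation samples were predicted as class 312, so broad-leaved forest is mostly confused with coniferous forest.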

## Predictions

Now we are ready to run the predictions. To do so, the LandMapper `predict` method should receive the following parameters:
* *dirs_layers*: a list of directory paths containing all the raster layers used in the training phase
* *fn_result*: the file path where the model output is written
* *data_type*: the GDAL data type of the output file

First, let's predict only the year 2000:

In [9]:
dir_timeless_layers = os.path.join(data_dir, 'timeless') 
dir_2000_layers = os.path.join(data_dir, '2000')

dirs_layers = [dir_2000_layers, dir_timeless_layers]
fn_result = 'land_cover_2000.tif'
data_type = gdal.GDT_Int16

landmapper.predict(dirs_layers, fn_result, data_type)

To predict the other years we will call the same method changing the dirs_layers and fn_result parameters:

In [None]:
dir_timeless_layers = os.path.join(data_dir, 'timeless') 

for year in range(2001, 2020):
    dir_time_layers = os.path.join(data_dir, str(year))
    dirs_layers = [dir_time_layers, dir_timeless_layers]
    fn_result = f'land_cover_{year}.tif'
    
    print(f"Predicting the land-cover for {year} and saving the result in {fn_result}")
    landmapper.predict(dirs_layers, fn_result, data_type)

Predicting the land-cover for 2001 and saving the result in land_cover_2001.tif
Predicting the land-cover for 2002 and saving the result in land_cover_2002.tif
Predicting the land-cover for 2003 and saving the result in land_cover_2003.tif
Predicting the land-cover for 2004 and saving the result in land_cover_2004.tif
Predicting the land-cover for 2005 and saving the result in land_cover_2005.tif
Predicting the land-cover for 2006 and saving the result in land_cover_2006.tif
Predicting the land-cover for 2007 and saving the result in land_cover_2007.tif
Predicting the land-cover for 2008 and saving the result in land_cover_2008.tif
Predicting the land-cover for 2009 and saving the result in land_cover_2009.tif
Predicting the land-cover for 2010 and saving the result in land_cover_2010.tif
Predicting the land-cover for 2011 and saving the result in land_cover_2011.tif
Predicting the land-cover for 2012 and saving the result in land_cover_2012.tif
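
Since each prediction can take a while, it can help to build and inspect the per-year inputs before starting the loop. The following is a minimal sketch under the same directory layout as above (one folder per year plus a `timeless` folder); `prediction_jobs` and its `out_dir` parameter are hypothetical, and `"data/tile"` is a placeholder path.

```python
from pathlib import Path


def prediction_jobs(data_dir, first_year, last_year, out_dir="."):
    """Yield (year, dirs_layers, fn_result) for each year in the inclusive range."""
    data_dir = Path(data_dir)
    timeless = str(data_dir / "timeless")
    for year in range(first_year, last_year + 1):
        # Yearly layers first, then the layers that do not change over time.
        dirs_layers = [str(data_dir / str(year)), timeless]
        fn_result = str(Path(out_dir) / f"land_cover_{year}.tif")
        yield year, dirs_layers, fn_result


# "data/tile" is a placeholder; in the notebook this would be data_dir.
jobs = list(prediction_jobs("data/tile", 2001, 2019))
print(len(jobs))
print(jobs[0][2])
```

Each tuple can then be passed on as `landmapper.predict(dirs_layers, fn_result, data_type)`, exactly as in the loop above.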
