Skip to content
Landmap package for R
R
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
R fixed spsample.prob Sep 10, 2019
data spLearner examples May 24, 2019
man fixed spsample.prob Sep 10, 2019
.Rbuildignore spLearner examples May 24, 2019
.gitignore Initial commit May 24, 2019
DESCRIPTION version Jun 10, 2019
LICENSE Initial commit May 24, 2019
NAMESPACE migrated to mlr Jun 10, 2019
README.md oblique coords Sep 10, 2019
landmap.Rproj spLearner examples May 24, 2019

README.md

landmap package for R

Package provides methodology for automated mapping i.e. spatial interpolation and/or prediction using Ensemble Machine Learning (extends functionality of the mlr package). Key functionality includes:

  • train.spLearner --- train a spatial prediction and/or interpolation model using Ensemble Machine Learning (works with numeric, binomial and factor-type variables),
  • buffer.dist --- derive buffer (geographical) distances that can be used as covariates in spLearner,
  • spc --- derive Principal Components using stack of spatial layers,
  • tile --- tile spatial layers so they can be used to run processing in parallel,
  • download.landgis --- access and download LandGIS layers from www.openlandmap.org,

Warning: most of functions are optimized to run in parallel by default. This might result in high RAM and CPU usage.

Spatial prediction using Ensemble Machine Learning with geographical distances is explained in detail in:

Installing

Install development versions from github:

library(devtools)
install_github("envirometrix/landmap")

Under construction. Use for testing purposes only.

Functionality

Automated mapping using Ensemble Machine Learning

The following examples demostrates spatial prediction using the meuse data set:

library(rgdal)
library(geoR)
library(plotKML)
library(raster)
library(glmnet)
library(xgboost)
library(kernlab)
library(deepnet)
library(Cubist)
demo(meuse, echo=FALSE)
m <- train.spLearner(meuse["lead"], covariates=meuse.grid[,c("dist","ffreq")], lambda = 1)

this runs several steps:

Converting ffreq to indicators...
Converting covariates to principal components...
Deriving oblique coordinates...TRUE
Fitting a variogram using 'linkfit' and trend model...TRUE
Estimating block size ID for spatial Cross Validation...TRUE
Starting parallelization in mode=socket with cpus=8.
Using learners: regr.ranger, regr.ksvm, regr.glmnet, regr.cubist...TRUE
Fitting a spatial learner using 'mlr::makeRegrTask'...TRUE
Exporting objects to slaves for mode socket: .mlr.slave.options
Mapping in parallel: mode = socket; cpus = 8; elements = 5.
Exporting objects to slaves for mode socket: .mlr.slave.options
Mapping in parallel: mode = socket; cpus = 8; elements = 5.
Exporting objects to slaves for mode socket: .mlr.slave.options
Mapping in parallel: mode = socket; cpus = 8; elements = 5.
Exporting objects to slaves for mode socket: .mlr.slave.options
Mapping in parallel: mode = socket; cpus = 8; elements = 5.
Stopped parallelization. All cleaned up.

The variogram model is only fitted to estimate effective range of spatial dependence. Spatial Prediction models are based only on fitting the Ensemble Machine Learning (by default landmap uses c("regr.ranger", "regr.ksvm", "regr.glmnet", "regr.cubist"); see a complete list of learners available via mlr) with oblique coordinates (rotated coordinates) as described in Moller et al. (2019) "Oblique Coordinates as Covariates for Digital Soil Mapping" to account for spatial autocorrelation in values. Geographical distances to ALL points can be added by specifying buffer.dist=TRUE; this is however not recommended for large point data sets. The meta-learning i.e. the SuperLearner model shows which individual learners are most important:

m@spModel$learner.model$super.model$learner.model
Call:  stats::glm(formula = f, family = family, data = d, control = ctrl, 
    model = FALSE)

Coefficients:
(Intercept)  regr.ranger    regr.ksvm  regr.glmnet  regr.cubist  
  -15.79378     -0.06904      1.40024     -0.24365      0.02731  

Degrees of Freedom: 154 Total (i.e. Null);  150 Residual
Null Deviance:	    1908000 
Residual Deviance: 1041000 	AIC: 1818

in this case regr.ksvm seems to be most important for predicting lead concentration, while regr.cubist is the least important. Overall this ensemble model explains ca 45% of variance (based on repeated cross-validation):

rvar = m@spModel$learner.model$super.model$learner.model$deviance
tvar = m@spModel$learner.model$super.model$learner.model$null.deviance
1-rvar/tvar

Next we can generate predictions using:

meuse.lead <- predict(m)

Note that, based on the current set-up with method = "stack.cv", every time you re-run the model training you might get somewhat different models / different betas. On the other hand, the final ensemble predictions (map) should visually not differ too much.

figure Figure: Predicted lead content for the Meuse data set. Model error is derived as weighted standard deviation from multiple model predictions.

Animated predictions by 9 models (3x independently fitted random forest, SVM and Xgboost) looks like this (the coefficients are beta coefficients from the metalearner fit: the higher the coefficient, more important the model for the ensemble merge):

The predictions shown in the image above incorporate spatial correlation between values, and hence can be used as a possible replacement for kriging methods (Hengl et al. 2018). Automation comes, however, at the high computing and RAM usage costs.

In the following example we use somewhat larger data set from the SIC1997 exercise.

data("sic1997")
X <- sic1997$swiss1km[c("CHELSA_rainfall","DEM")]
mR <- train.spLearner(sic1997$daily.rainfall, covariates=X, lambda=1)
rainfall1km <- predict(mR)

The processing is now much more computational because the data set consists from 467 points (hence 467 buffer distance maps need to be produced). This will make the regression matrix becoming extensive, and also 5x3 models need to be fitted. At the moment, using train.spLearner for point data set with >>1000 points should be done with caution.

The final results also shows quite similar results to universal kriging in geoR. The model error map above, however, shows more spatial contrast and helps detect areas of especially high errors.

figure Figure: Predicted daily rainfall for the SIC1997 data set.

The same function can also be used to interpolate factor-type variables:

library(plotKML)
data(eberg_grid)
gridded(eberg_grid) <- ~x+y
proj4string(eberg_grid) <- CRS("+init=epsg:31467")
data(eberg)
coordinates(eberg) <- ~X+Y
proj4string(eberg) <- CRS("+init=epsg:31467")
X <- eberg_grid[c("PRMGEO6","DEMSRT6","TWISRT6","TIRAST6")]
mF <- train.spLearner(eberg["TAXGRSC"], covariates=X, buffer.dist=FALSE)
TAXGRSC <- predict(mF)

figure Figure: Predicted Ebergotzen soil types (probabilities).

Note that in the case of factor variables, prediction are based on ensemble stacking based on the following three classification algorithms c("classif.ranger", "classif.multinom", "classif.svm"). See mlr documentation on how to add additional learners.

In summary: package mlr provides a comprehensive environment for Machine Learning:

  • Ensemble predictions are based on the mlr::makeStackedLearner function,
  • Additional learners can be added,
  • Processing can be parallelized using the parallelMap package,

Ensemble Machine Learning is also available via the subsemble and the SuperLearner packages (not used here).

Accessing LandGIS layers

Landmap package also provides functionality to access and download LandGIS layers from www.openlandmap.org. Recommend process is to first search the coverage ID names and file names e.g.:

search.landgis(pattern=c("clay", "10..10cm"))

This shows that a clay map at 10 cm depth of the world is available via:

[[1]]
                                                   predicted250m.file 
"sol_clay.wfraction_usda.3a1a1a_m_250m_b10..10cm_1950..2017_v0.2.tif" 

[[2]]
[1] "https://www.zenodo.org/api/files/d95e82f3-203d-4ae5-86dc-b1ddd65ff8b2/sol_clay.wfraction_usda.3a1a1a_m_250m_b10..10cm_1950..2017_v0.2.tif" 
[2] "https://www.zenodo.org/api/files/d95e82f3-203d-4ae5-86dc-b1ddd65ff8b2/sol_clay.wfraction_usda.3a1a1a_md_250m_b10..10cm_1950..2017_v0.2.tif"

Web Coverage Service functionality and zenodo.org API are explained in detail here. Next we can download only clay map for Switzerland using the Web Coverage Service functionality of LandGIS:

coverageId = "predicted250m:sol_clay.wfraction_usda.3a1a1a_m_250m_b10..10cm_1950..2017_v0.2"
swiss1km.ll <- raster::projectRaster(raster(sic1997$swiss1km[2]), crs = "+init=epsg:4326", res=c(1/120, 1/120))
s1 = paste0("Lat(", swiss1km.ll@extent@ymin, ",", swiss1km.ll@extent@ymax,")")
s2 = paste0("Long(", swiss1km.ll@extent@xmin, ",", swiss1km.ll@extent@xmax,")")
sf = (1/480)/(1/120)
download.landgis(coverageId, filename = "clay_ch1km.tif", subset = c(s1,s2), scalefactor = sf)
swiss1km.ll1km <- as(swiss1km.ll, "SpatialGridDataFrame")
swiss1km.ll1km$clay_10..10cm <- readGDAL("clay_ch1km.tif")$band1
swiss1km.ll1km$clay_10..10cm <- ifelse(is.na(swiss1km.ll1km$DEM), NA, swiss1km.ll1km$clay_10..10cm)
mapview(swiss1km.ll1km["clay_10..10cm"])

figure Figure: Clay content map for Switzerland.

This takes few steps because you have to determine:

  • bounding box,
  • scaling factor,
  • mask out pixels of interest,

For smaller areas (<500Mb in size) download of data using WCS is fast and efficient. For accessing and using global layers larger than 1GB we recommend directly downloading data from zenodo.org.

Contributions

  • Contributions to landmap are welcome. Issues and pull requests are the preferred ways of sharing them.
  • We are interested in your results and experiences with using the train.spLearner function for generating spatial predictions with your own data. Share your data sets, code and results either using github issues and/or R-sig-geo mailing list.
You can’t perform that action at this time.