Package for the spatial implementation of Bayesian Networks and mapping in geographical space. Outputs are GIS ready maps of expected value (or most likely state) given known and unknown conditions, maps of uncertainty measured as both coefficient of variation or Shannon index (entropy), maps of probability associated to any states of any node of the network. Some additional features are provided as well, such as parallel processing options, data discretization routines and function wrappers designed for users with minimal knowledge of the R programming language.
To install, open a R session and enter the following commands, which will install some mandatory functions underlying gRain:
# Use this for R version > 3.5.0 if (!requireNamespace("BiocManager")) install.packages("BiocManager") BiocManager::install('RBGL', version = "3.8") # ...or this for older versions: source("http://bioconductor.org/biocLite.R") biocLite("RBGL")
Then install bnspatial, by selecting it from the packages list, or typing in the console:
Alternatively, to install from source:
- download the source file (.tar.gz) to the R working directory (or any other directory)
- start an R session
- run the following commands in the console:
## Install the required packages # Use this for R version > 3.5.0 if (!requireNamespace("BiocManager")) install.packages("BiocManager") BiocManager::install('RBGL', version = "3.8") # ...or this for older versions: source("http://bioconductor.org/biocLite.R") biocLite("RBGL") install.packages("gRain", repos="http://cran.uk.r-project.org/", dependencies=T, clean=T) install.packages("raster", repos="http://cran.uk.r-project.org/", dependencies=T, clean=T) ## Install bnspatial (full path to the file, if not in the R working directory) install.packages("~/bnspatial_1.0.3.tar.gz", repos = NULL, type="source")
help(package=bnspatial) ## opens package index ?bnspatial ## help file for the main function
This document describes the package bnspatial. The package is aimed at implementing Bayesian networks in the geographical space. It makes maps of expected value (or most likely state) given known and unknown conditions specified in the network, maps of uncertainty measured as coefficient of variation or Shannon index (entropy), maps of probability associated to any states of any node in the network. bnspatial can be used any time a Bayesian network contains variables for which some sort of spatial data is available and maps are needed from the model. However, it can be useful in non-spatial context as well, for instance to discretize data before querying the network, or when many iterations and/or queries over a network are needed. Utility nodes are currently under development.
The package is designed to facilitate the spatial implementation of Bayesian networks with minimal knowledge of the R programming language. R provides a single consistent working environment, while currently available options to make maps out of Bayesian networks do not. The package acts partly as function wrapper integrating the packages raster and gRain, but offers some additional features too, including powerful parallel processing options (via foreach package) and data discretization routines. Bayesian networks must be constructed beforehand, either in R, with gRain or bnlearn packages, or via an external software, such as GeNIe (free for academic use) or Hugin, and saved in .net file format (Hugin native format); .net files can then be imported in bnspatial. Currently, .net files written with Netica are not supported, due to reading issues which are encountered by Hugin itself. A workaround is to import Netica files to GeNIe and save to .net from there. In bnspatial, Bayesian networks are objects of class
grain and therefore are queried through the functions provided by the gRain package. All basic GIS tasks are performed through the raster package and spatial data are objects of its native class
Development was partly funded by the European Union FP7 (2007-2013), under project ROBIN - Role Of Biodiversity In climate change mitigation (agreement 283093).
A worked example
Setting the scene
This example shows the use of bnspatial in a hypotetical land use change context and is inspired by works like Celio et al.1. The aim is to analyze how different factors can influence land use change and where the change is most likely to occur.
A Bayesian network was built and its conditional probability table populated, in order to catch some of the relevant drivers of land use change in a sample catchment (Conwy, North Wales, UK). The links in the network reflect the relationships among variables: slope, legal status and current land use are assumed to be independent, as well as two hypotetical scenarios. One of them, intensification, predicts increase in demand and prices of agricultural products, while the other entails a more balanced management of natural resources, including measures for climate change mitigation, such as payments for carbon storage. Legal status of land determines a different level of direct or indirect involvement of stakeholders in land management and explain the link directed from it. Overall, change is defined by a combination of factors: stakeholder preferences, suitability of land for a given use (simplified with a single variable, slope) and the current land use, as a proxy for past suitability and preferences. Under protection, a given land use may or may not change, quite independently from other factors.
The simplified land use map under current and modelled conditions contains only three land use classes: arable, forest and other. The "final" land use type to be modelled depends on a set of drivers: some of them vary spatially (slope, legal status and current land use), while some other do not (socio-economic setting and stakeholder preferences). The model links the land use types to carbon stock as well, summarised under three broad categories identified as high, moderate or low.
Together with the spatial data and the network, a lookup list must be provided, to link properly the nodes from the network to the spatial data in input.
In this example the following questions are asked:
- What is the most likely outcome given the known conditions? What land use type can be expected under current conditions?
- If a range of conditions are set to reflect a scenario, what outcome can be expected? Are the proportions of forest and arable land expected to change under intensification or suistainable scenarios?
- What are the uncertainties associated to the modelled variable of interest (target node)? What is the likelihood of having forest and how uncertain is the land use output?
- Can we quantify the expected change in a given variable? Is carbon stock expected to increase under sustainable scenario and how is carbon stock distributed when accounting for uncertainty in the model?
Below a brief description of the data (also accessed through command
?ConwyData in R console), which are available both as raw data (i.e. file of different formats), in the extdata folder under bnspatial package directory, and as objects in native R binary format, for a quicker access and code conciseness.
- LandUseChange (LandUseChange.net): the Bayesian network.
- currentLU (CurrentLULC.tif): a simplified version of the current land use map from the Conwy catchment (Wales, UK). It includes three classes: arable (raster value 3), forest (2), other (1).
- slope (degSlope.tif): a raster of slope derived from a digital elevation model at 50 meters resolution, units are degrees.
- status (LegalStatus.tif): the land ownership (dummy data), divided into three possible classes: public (raster value 4), private (3), protected (1).
- evidence: a matrix, the collection of previous spatial data as extracted from each location (i.e. raster cell) in the catchment. Each value from the spatial data was discretized through
bulkDiscretizefunctions, then assigned to the corresponding state from the Bayesian network (LandUseChange).
- LUclasses (LUclasses.txt): the look up list and classification of input spatial data (corresponding states and values). The list is formatted accordingly to bnspatial functions requirements and as returned by functions
1: Celio, E., Koellner, T., & Gret-Regamey, A., 2014. Modelling land use decisions with Bayesian networks: Spatially explicit analysis of driving forces on land use change. Environmental Modelling & Software, 52, 222-233.
In the following the main function
bnspatial will be used to query the network and map the outputs. This function wraps most bnspatial functions into one, bringing directly from the network and input spatial data to the output maps. The use of
bnspatial allows the user to provide external files as inputs (the Bayesian network in .net format, a look up classification file in .txt format, spatial data in various formats; see details below), and minimizes the user interaction with R and it is therefore recommended to light R users. Several additional options are available in both input and output, as shown by the tutorial below (also, type
?bnspatial in R console). Below,
FinalLULC is the name of the node of interest (target), as indicated in the Bayesian network.
## Loading package and data stored in it library(bnspatial) data(ConwyData) list2env(ConwyData, environment()) network <- LandUseChange spatialData <- c(currentLU, slope, status) lookup <- LUclasses target <- 'FinalLULC' ## Run a spatial query on the Bayesian network ## A summary will be printed on screen to check whether spatial data correspond ## to network nodes as intended. bn <- bnspatial(network, target, spatialData, lookup) bn ## Plot output maps library(raster) par(mfrow=c(1,2)) plot(bn$Class, main='Most likely class') plot(bn$Entropy, main='Uncertainty (Shannon index)')
bn <- bnspatial(network, 'FinalLULC', spatialData, lookup, verbose=FALSE)
Maps from the previous example can be greatly improved by setting additional arguments, for instance
msk=currentLU, which masks the output map to the
currentLU only, instead of the union of all spatial inputs (default). However, the example shows how the Bayesian network is able to produce coherent outputs even under missing values, such as the case of background areas external to the catchment, for which spatial values are missing.
Loading external data
For ease of use, bnspatial allows the use of external files in input, which can be handy if the user resorted to external software to build the Bayesian network. In particular, the network itself and the classification file, used to link node states to the input spatial data, may be compiled more easily outside R. For conciseness, the data used in this document is loaded from the R binary format which comes with the package, i.e. was already transformed in a suitable format for direct use in bnspatial (including spatial data). However, the same data are provided in raw format as well (see sub-directory extdata under bnspatial directory).
In fact, a more common case is when spatial data in typical formats are available to the user (e.g. .tif), Bayesian network was built using a stand alone software (e.g. GeNIe, Hugin or Netica) and structure, states and CPT were defined within that.
bnspatial allows to use such inputs; if the example above resorted to external data, it would be as follows, with identical outputs:
## Loading data stored in package network <- 'fullPath/toPackageLibrary/bnspatial/extdata/LandUseChange.net' spatialData <- c('fullPath/toPackageLibrary/bnspatial/extdata/CurrentLULC.tif', 'fullPath/toPackageLibrary/bnspatial/extdata/degSlope.tif', 'fullPath/toPackageLibrary/bnspatial/extdata/LegalStatus.tif') lookup <- 'fullPath/toPackageLibrary/bnspatial/extdata/LUclasses.txt' ## Run a spatial query on the Bayesian network bn <- bnspatial(network, 'FinalLULC', spatialData, lookup)
'fullPath/toPackageLibrary/' stands for the full path to the package library sub-directory, under R directory.
The text file (
lookup argument) is the only object that the user has to build from scratch. This can be done with any text editor, then imported for use in bnspatial with fucntion
importClasses. Alternatively, the user can build it directly in R by using the function
setClasses, then optionally export to .txt for future reference setting argument
As a general guideline, the text file should be formatted as follows, for each node:
First line: the node name.
Second line: the node states, comma separated (spaces allowed).
Third line: interval values from the spatial data associated to the states (integer values for discrete data; interval boundaries, including endpoints, for continuous data). The same exact order as node states is required.
In the above example, the external .txt appears as follows:
2, 1, 3
flat, moderate, steep
-Inf, 1, 7, Inf
public, private, protected
4, 3, 1
Please refer to help pages for
importClasses for further details (
?setClasses from R console).
Under current release, Bayesian networks from external software must be provided in .net format. The use of either GeNIe (free for academic use) or Hugin is recommended to build the Bayesian networks, as both can export them successfully to the .net format. Currently, .net files written with Netica are not supported, due to reading issues which are encountered by Hugin itself. A workaround is to open Netica networks with GeNIe and export to .net from there.
The use of
bnlearn packages is recommended for building and manipulating Bayesian networks in the R environment. If the network was built with bnlearn, through the function
as.grain it can be exported to an object of class
grain, suitable for bnspatial.
All input spatial data must be in raster format, no vector data is accepted. Categorical data in shapefile format, for instance, must be converted to raster. Functionalities to accept other file formats for networks and vactor data formats are planned for future releases. Packages rgdal, raster and sp offer a great deal of GIS tools for R.
By default, outputs of
bnspatial are the expected state of target node (i.e. the state with the highest relative probability) and the uncertainty, expressed as entropy by the Shannon index.
Other outputs can be obtained, by adding or removing the following to the argument
class(default) returns the relatively most likely state of the target node (expected state).
entropy(default) calculates the Shannon index and returns the entropy given the state probabilities.
probabilityreturns an object for each state of the target node, with associated probability.
expectedreturns the expected value for the target node. Only valid for continuous target nodes.
midValuesargument must be provided. The expected value is calculated by summing the mid values of target node states weighted by their probability.
variationreturns the coefficient of variation, as a measure of uncertainty.
So for instance, the following command will produce two maps, one of probability of land use to be forest and the other for arable:
bn <- bnspatial(network, 'FinalLULC', spatialData, lookup, msk=currentLU, what="probability", targetState=c("arable","forest")) par(mfrow=c(1,2)) plot(bn$Probability$forest, main="Probability of forest") plot(bn$Probability$arable, main="Probability of arable land")
Note that if
targetState argument was not specified (default is
NULL), maps of probability for all states of the target node would be returned.
Exploring scenarios: setting node evidence
The user may be interested in exploring the likelihood of outcome for a target node, given certain conditions. To do so, any node can be fixed to a given state and maps can be produced accordingly, by setting them as additional arguments. So, drawing from the main example, we may be interested in mapping the most likely land use of preference by farmers, under the intensification scenario and compare that with the sustainable scenario. To do so it is sufficient to set
target="RelativePreference" and the two additional arguments
"sustainable" for the comparison)
par(mfrow=c(2,2)) bn <- bnspatial(network, "RelativePreference", spatialData, lookup, msk=currentLU, Scenarios="intensification") plot(bn$Class, main="Intensification scenario") plot(bn$Entropy, main="Uncertainty (Shannon index)") bn <- bnspatial(network, 'RelativePreference', spatialData, lookup, msk=currentLU, Scenarios="sustainable") plot(bn$Class, main="Sustainability scenario") plot(bn$Entropy, main="Uncertainty (Shannon index)")
No big differences are noticed, mainly because preferences do not change sensibly in relative terms. However uncertainty of outcome is quite different between the two scenarios. This is further explained by mapping the probabilities of preference for forest and arable under the two scenarios:
par(mfrow=c(2,2)) bn <- bnspatial(network, "RelativePreference", spatialData, lookup, msk=currentLU, what="probability", targetState=c("forest","arable"), Scenarios="intensification") plot(bn$Probability$forest, main="P of forest preference (intensif.)") plot(bn$Probability$arable, main="P of arable preference (intensif.)") bn <- bnspatial(network, 'RelativePreference', spatialData, lookup, msk=currentLU, what="probability", targetState=c("forest","arable"), Scenarios="sustainable") plot(bn$Probability$forest, main="P of forest preference (sustain.)") plot(bn$Probability$arable, main="P of arable preference (sustain.)")
Now a further question can be asked, in particular how carbon stock is expected to be affected by land use under different scenarios and how much carbon we can expect under different scenarios. To answer the question and have quantitative maps, additional information about the meaning of the carbon stock classes is needed. More specifically, a value of carbon stock must be assigned to each of the three classes:
- high carbon stock is considered high when above 120 tons per hectare. A maximum is set at 230 t/ha, as values above that can be rather unlikely. The value of reference for this class is therefore in between the outer values: 175 tons per hectare
- moderate when between 30 and 120 t/ha, therefore a mid value of 75 t/ha is the reference for this class
- low when ranging between 0 to 30 t/ha, setting a class mid value at 15 t/ha for this class.
par(mfrow=c(2,2)) midValues <- c(175, 75, 15) bn <- bnspatial(network, "CarbonStock", spatialData, lookup, msk=currentLU, midvals=midValues, what=c("expected","variation"), Scenarios="intensification") plot(bn$ExpectedValue, main="Expected carbon (t/ha) (intensif.)") plot(bn$CoeffVariation, main="Uncertainty (intensif.)") bn <- bnspatial(network, "CarbonStock", spatialData, lookup, msk=currentLU, midvals=midValues, what=c("expected","variation"), Scenarios="sustainable") plot(bn$ExpectedValue, main="Expected carbon (t/ha) (sustain.)") plot(bn$CoeffVariation, main="Uncertainty (sustain.)")
It should be noted that this model does not return a quantitative estimate of carbon as it may be in a deterministic or process based model, but rather a quantitative expectation of carbon stock. That is the estimate based on uncertainty. For instance, a cell in the output raster may have any value between 15 and 175, but its land use class would be only one of three (forest, arable or other). However, the probabilities of having any of the three land use classes would vary and this variation is accounted for in the carbon stock estimate. This is not a generally valid statement and the meaning of the expected value of a target node depends on the model structure and variables.
In the examples above the function
bnspatial was used exclusively, but it can be quite inefficient when multiple options have to be tested, or the spatial data involved is big enough to require significant computing time. An option is to resort to parallel processing, by setting argument
inparallel=TRUE (see next paragraph), or making sure that the query is run on the area of interest only, by setting argument
msk as already shown.
However, these options do not prevent
bnspatial from repeatedly loading and discretize the same data over and over, which can be a very time consuming process. To avoid that, using directly the wider set of functions provided by bnspatial is recommended, instead of their wrapper
Speeding up: parallel processing
When the scale of analysis is very wide or spatial data has high resolution, computation time can be significant. To speed up the network querying, parallel processing options are provided in bnspatial, integrating the powerful functions from
foreach package and applying them in complete autonomy.
When using parallel processing options, care should be paid to the RAM commitment, as memory should be sufficient to contain all data being processed in parallel (use of memory can be easily checked from Task Manager in Windows systems). Solutions to this issue in bnspatial are currently under development.
To apply parallel processing, simply set
inparallel=TRUE. By default the number of tasks will be the number of cores/processors available minus one. If a different number of tasks is required, that number can be set, e.g.
inparallel=2 will use two cores. A higher number of tasks than number of cores/processors can be provided, but there are potential shortcomings and the user should avoid that choice, unless knowing exactly the reason fot it.
bn <- bnspatial(network, 'FinalLULC', spatialData, lookup, inparallel=2)
Exporting spatial data
All the spatial data created with bnspatial can be easily exported to a portable format, suitable for use in any GIS. The task of writing data are handled entirely by rgdal and raster packages. Available file formats are listed for function
writeFormats, from raster package.
In bnspatial, data can be exported indirectly, by creating the output data and then exporting via
writeRaster function available from raster package, or directly with
mapTarget functions, which will write the data to the directory and format of choice. For both functions, this is done by setting arguments
exportRaster=TRUE, rasters will be exported in .tif format by default, to the working directory. A character can be provided instead of the logical value, specifying another preferred extension (e.g.
".tif"), in which case the raster will be exported in that format.
path is not specified, output files will be written to the working directory, as by
File names can't be provided, as they are set automatically, depending on the output required by
what and accordingly to the following naming convention:
"class"will set the file name to _Class.<file format -default .tif>
"entropy"-> _ShanEntropy.<file format -default .tif>
"probability"-> Probability..<file format -default .tif>
"expected"-> _ExpectedValue.<file format -default .tif>
"variation"-> _CoefVariation.<file format -default .tif>
An additional comma separated file (.csv) is written to the same directory when
"class"}, providing a key to relate the raster values to the node states they refer to.
Unfolding bnspatial functions
bnspatial is actually a wrapper for a bunch of other functions. Below the same example as above is used, but using explicitly the functions underneath
bnspatial. These can all be accessed independently from the user and offer many more options to handle and process the network spatially.
The first step is loading the Bayesian network from an external file format. In the example, LandUseChange.net can be found at the extdata subfolder of bnspatial, under the the R package library directory. To load the network:
network <- loadNetwork(LandUseChange, target='FinalLULC') network
An object of class
grain is returned. Optional argument
target should be set when known, as it will speed up the network query remarkably.
These functions are meant to import, read and/or export of a list or text file, to be used for the integration and error checking of Bayesian network and input spatial variables. See
?setClasses for details.
fullpath <- system.file("extdata", "LUclasses.txt", package = "bnspatial") fullpath intervals <- importClasses(fullpath) intervals
linkNode links a node of the Bayesian network to its corresponding spatial dataset (in raster format), returning a list of objects, including the spatial data and relevant information about the node.
linkMultiple operates on multiple rasters and nodes.
spatialData <- c(currentLU, slope, status) spatialDataList <- linkMultiple(spatialData, network, intervals)
Create a mask within which the query is performed. A subset of the data can be provided with argument
## Return coordinates of valid cells inside the mask instead of a raster layer msk <- aoi(spatialData, xy=TRUE) head(msk)
This function extracts the values from a given input raster based on a mask.
m <- aoi(msk=currentLU, mskSub=c(2,3)) head( extractByMask(slope, msk=m), 20)
These functions discretize continuous input data into classes. Classes can be defined by the user or, if the user provides the number of expected classes, calculated from quantiles (default option) or by equal intervals.
s <- runif(30) ## Split by user defined values. Values out of boundaries are set to NA: dataDiscretize(s, classBoundaries = c(0.2, 0.5, 0.8))
Queries the Bayesian network and returns the probabilities for each state of the target node. Available input variables are set as evidence.
queryNetParallel works as
queryNet, but makes use of multi cores/processors facilities for big network queries, by splitting data into chunks and processing them in parallel.
q <- queryNet(network, 'FinalLULC', evidence) head(q)
This function creates the required spatial outputs for the target node. By default most relatively lilkely class and entropy (uncertainty) associated are returned. More options available, see
target <- 'FinalLULC' statesProb <- queryNet(network, target, evidence) maps <- mapTarget(target, statesProb, msk=currentLU) maps par(mfrow=c(1,2)) plot(maps$Class, main='Most likely land use class') plot(maps$Entropy, main='Uncertainty (Shannon index)')