manuscript/Appendix B - source code.qmd

---
title: Making use of spatially biased variables in ecosystem condition accounting – a GIS based workflow
author:
  - name: Anders Lorentzen Kolstad
    email: anders.kolstad@nina.no
    orcid: https://orcid.org/0000-0002-9623-9491
    affiliations: 
        - id: nina
          name: Norwegian Institute for Nature Research
          department: Department of Terrestrial Ecology
          address: Pb 5685 Torgarden
          city: Trondheim
          postal-code: 7485
    attributes:
        corresponding: true
  - name: Matthew Grainger
    email: matthew.grainger@nina.no
    orcid: https://orcid.org/0000-0001-8426-6495
    affiliations:
        - ref: nina
  - name: Marianne Evju
    email: marianne.evju@nina.no
    orcid: https://orcid.org/0000-0001-7338-5376
    affiliations:
        - id: nina2
          name: Norwegian Institute for Nature Research
          department: NINA Oslo
          address: Sognsveien 68
          city: Oslo
          postal-code: NO-0855
abstract: |
  Ecosystem Condition Accounts (ECA) should reflect the integrity or quality of all nature inside the scope of the account, and therefore rely on spatially representative indicators for condition. All ECAs are subject to data constraints in some way. Therefore, being able to make use of spatially biased data sets would be very valuable. For national ECAs, modelling approaches can in some cases be used to control for sampling biases. For local ECAs however, like at the scale of individual municipalities or development projects, this is often not an option as it involves spatial extrapolation using data from outside the ecosystem accounting area. In this study we develop three ecosystem condition indicators from the same spatially biased data set based on nature type mapping of Norwegian mires. The indicators are Alien species, Trenching and Anthropogenic Disturbance to Soil and Vegetation. We test our approach in three municipalities in southern Norway. We discretised a spatial variable representing infrastructure prevalence, and we refer to this new map as Homogeneous Impact Areas (HIAs). To facilitate reliable estimation of indicator uncertainties, even in cases with very low sample sizes, we use a Bayesian updating method and produce probability distributions for the area-weighted mean indicator values in each HIA separately. Then we use area-weighted resampling and produce indicator probability distributions for each municipality. With this Bayesian updating and resampling approach, small sample sizes can be compensated for by correspondingly large uncertainty ranges, as long as the full dataset is large enough to estimate the true population standard deviation. This paper demonstrates the use of a GIS-based workflow to control for some of the most problematic biases in an opportunistic field survey so that the data can be used for indicators in ECAs. The workflow can be used at any scale, including national scale.
  Because indicator values are calculated for unique spatial strata, local governments or others can target their data acquisition towards strata with low sample sizes, and thus achieve higher cost effectiveness and ultimately better spatial indicator coverage.
keywords: 
  - alien species
  - disturbance
  - ecosystem accounting
  - ecosystem condition
  - horizontal aggregation
  - indicator
  - mire
  - peatlands
  - SEEA EA
  - wetlands
date: last-modified
bibliography: bibliography.bib
format:
  elsevier-pdf:
    keep-tex: true
    journal:
      name: EcoEvoRxiv
      formatting: preprint
      model: 3p
      cite-style: authoryear
    include-in-header:
      text: '\usepackage{lineno}\linenumbers'
editor: 
  markdown: 
    wrap: sentence
execute:
  echo: false
  include: false
  eval: false
  warning: false
  message: false
header-includes:
- |
  \usepackage{pdflscape}
  \newcommand{\blandscape}{\begin{landscape}}
  \newcommand{\elandscape}{\end{landscape}}
---

```{r setup}
#| eval: true

library(tidyverse)
library(knitr)
library(sf)
library(tmap)
library(tmaptools)
library(stars)
library(terra)
library(tidyterra)
library(ggtext)
library(cowplot)
library(units)
library(rnaturalearth)
library(rnaturalearthdata)
library(ggmagnify)
library(ggridges)
library(eaTools) #https://ninanor.github.io/eaTools/ version 0.0.0.9000
library(ggpubr)
library(kableExtra)

myCRS <- 25832
```

```{r paths}
#| eval: true

# Conditional file directory
dir <- substr(getwd(), 1, 2)

# Some directories

# Ecosystem delineation map
path_mire <- "P:/41201785_okologisk_tilstand_2022_2023/data/Myrmodell/myrmodell90pros.tif"

# infrastructure index (i.e. land use intensity index)
path_infrastructure <- ifelse(dir == "C:",
  "R:/GeoSpatialData/Utility_governmentalServices/Norway_Infrastructure_Index/Original/Infrastrukturindeks_UTM33/infra_tiff.tif",
  "/data/R/GeoSpatialData/Utility_governmentalServices/Norway_Infrastructure_Index/Original/Infrastrukturindeks_UTM33/infra_tiff.tif"
)

# field survey
# # downloaded from https://kartkatalog.geonorge.no/metadata/naturtyper-miljoedirektoratets-instruks/eb48dd19-03da-41e1-afd9-7ebc3079265c
path_naturetypes <- "../data/survey.gdb"

# municipality outline
path_muni <- "../data/Basisdata_0000_Norge_25833_Kommuner_FGDB.gdb"

# path to local caching folder
path_temp <- ifelse(dir == "C:",
  "P:/41201785_okologisk_tilstand_2022_2023/data/cache/",
  "/data/P-Prosjekter2/41201785_okologisk_tilstand_2022_2023/data/cache/"
)
```

```{r import}
#| eval: true
#| cache: true

# I already did some work to identify the relevant nature types
# summary file (https://github.com/NINAnor/ecosystemCondition/blob/main/data/naturetypes/natureType_summary.rds)
naturetypes_summary <- readRDS("../data/natureType_summary.rds")

# Survey data
# The data is too big to be stored on GitHub
# Import polygon data set
st_layers(path_naturetypes)
naturetypes <- sf::st_read(dsn = path_naturetypes, layer = "naturtyper_nin_omr")
# 142k polygons (2023)

# Import survey coverage map
coverage <- sf::st_read(dsn = path_naturetypes, layer = "naturtyper_nin_dekning") |>
  st_transform(myCRS)

# Outline of Norway (coastline)
outline <- sf::read_sf("../data/outlineOfNorway_EPSG25833.shp") |>
  st_transform(myCRS)

# Municipalities
# find the correct layer
st_layers(path_muni)
# read in data and transform
muni <- sf::read_sf(path_muni, layer = "kommune") |>
  st_transform(myCRS)

# Infrastructure index (read proxy)
infra <- stars::read_stars(path_infrastructure)

```

```{r terraLoad}
#| eval: true
# Mire data
# Spat rasters cannot be cached
mire_terra <- terra::rast(path_mire)
```

```{r getRelevantNTs}
#| eval: true

myVars <- c("7TK", "7SE", "PRTK", "PRSL", "7FA", "7GR-GI")
nts <- naturetypes_summary %>%
  rowwise() %>%
  mutate(keepers = sum(c_across(
    all_of(myVars))>0, na.rm=T)) |>
  filter(
    keepers >0,
    Ecosystem == "våtmark"
    ) |>
  pull(Nature_type)
```

```{r natureTypeData}
#| eval: true
#| cache: true

# Clean the survey data

naturetypes <- naturetypes |>
  # keep only wetlands
  filter(
    hovedøkosystem == "våtmark",
    naturtype %in% nts,
    naturtype != "Kalkrik helofyttsump"
  ) |>
  # calculate the areas (m2) of the polygons
  mutate(area = SHAPE |> st_area()) |>
  # the variable codes and values are all in the same column
  separate_rows(ninBeskrivelsesvariable, sep = ",") |>
  separate(
    col = ninBeskrivelsesvariable,
    into = c("NiN_variable_code", "NiN_variable_value"),
    sep = "_",
    remove = F
  ) |>
  mutate(NiN_variable_value = as.numeric(NiN_variable_value)) |>
  filter(NiN_variable_code %in% myVars) |>
  select(
    id = identifikasjon_lokalId,
    municipality = kommunenummer,
    year = kartleggingsår,
    mosaic = mosaikk,
    quality = lokalitetskvalitet,
    biodiversity = naturmangfold,
    condition = tilstand,
    natureType = naturtype,
    variable = NiN_variable_code,
    value = NiN_variable_value,
    area
  ) |>
  st_transform(myCRS) # Choosing this to match the EDM (see further down)
# 19k obs.
```

```{r}
#| eval: false
# Plot to show what the most common nature types in the data set are
naturetypes |>
  as_tibble() |>
  count(natureType, sort=T) |>
  mutate(natureType = fct_reorder(natureType, n)) |>
  ggplot(aes(x = natureType, y = n))+
  geom_col()+
  coord_flip()
```

```{r convertToPercent}
#| eval: true
# I now want to normalise the variables so that they can be combined
# even though they are recorded on different scales.
# I will first convert them into % (not for 7GR-GI).
# Remember that the ordinal categories represent frequency ranges.
# The data is strongly right-skewed, so simply taking the centre value of each
# bin will not work:
naturetypes %>%
  ggplot() +
  theme_bw() +
  geom_histogram(aes(x = value),
    binwidth = 1
  ) +
  facet_wrap(. ~ variable,
    scales = "free"
  )

# I will use the lower bound for each bin instead.
# The exception is when the variable is 1, because then the lower bound
# is 0, the same as when the variable is 0.
# For these I will manually set a slightly higher value.

naturetypes <- naturetypes %>%
  mutate(value = case_when(
    # selecting the variables that follow the same 4 step scale
    variable %in% c("7TK", "7SE", "7FA") ~
      case_match(
        value,
        0 ~ 0,
        1 ~ mean(c(0, 1 / 16)) * 100,
        2 ~ 1 / 16 * 100,
        3 ~ 50
      ), # note that it is not possible to get a value of 1
    # selecting the eight step variables
    variable %in% c("PRTK", "PRSL") ~
      case_match(
        value,
        0 ~ 0,
        1 ~ 1.5,
        2 ~ 3,
        3 ~ 6.25,
        4 ~ 12.5,
        5 ~ 25,
        6 ~ 50,
        7 ~ 75
      ),
    .default = value
  ))

naturetypes %>%
  filter(variable != "7GR-GI") |>
  ggplot() +
  theme_bw() +
  geom_histogram(aes(x = value),
    binwidth = 1,
    color = "orange",
    fill = "orange"
  ) +
  xlab("%") +
  facet_wrap(. ~ variable,
    scales = "free"
  )

```
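
The ordinal-to-percent conversion above can be illustrated on a toy vector. This is a minimal sketch and not part of the workflow: the lookup vector `lower_bounds_4step` and the helper `ordinal_to_percent()` are hypothetical names, and the values are simply those used in the `case_match()` call for the four-step variables (7TK, 7SE, 7FA).

```{r}
#| eval: false
# Hypothetical lookup: ordinal score -> percent, using lower bin bounds.
# Score 1 gets the midpoint of its bin (0 to 1/16) so that it differs
# from score 0.
lower_bounds_4step <- c(
  "0" = 0,
  "1" = mean(c(0, 1 / 16)) * 100,
  "2" = 1 / 16 * 100,
  "3" = 50
)

ordinal_to_percent <- function(score, lookup = lower_bounds_4step) {
  # Scores that are not in the lookup (including NA) return NA
  unname(lookup[as.character(score)])
}

ordinal_to_percent(c(0, 1, 2, 3, NA))
```

A named lookup vector keeps the mapping in one place, which may be easier to audit than nested `case_match()` calls, at the cost of one lookup per scale.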

```{r}
#| eval: true
# Now I make the data wide, and remove 7TK and 7SE if PRTK or PRSL are present,
# respectively

naturetypes_wide <- naturetypes |>
  filter(variable %in% c("7TK", "7SE", "PRTK", "PRSL")) |>
  # Column names starting with a number are problematic, so adding a prefix
  mutate(variable = paste0("var_", variable)) |>
  pivot_wider(
    names_from = "variable",
    values_from = "value",
    id_cols = "id") |>
    as_tibble()

head(naturetypes_wide, 10)
```

```{r}
#| eval: true
# First I will combine 7TK and PRTK, and also 7SE and PRSL.
naturetypes_wide <- naturetypes_wide %>%
  mutate(
    TK = if_else(
      is.na(var_PRTK), var_7TK, var_PRTK
    ),
    SE = if_else(
      is.na(var_PRSL), var_7SE, var_PRSL
    )
  )

plot_grid(
  naturetypes_wide %>%
    as_tibble() |>
    count(SE,
      name = "sum"
    ) |>
    ggplot(
      aes(
        x = factor(SE),
        y = sum
      )
    ) +
    geom_bar(
      stat = "identity",
      fill = "grey",
      colour = "black"
    ) +
    theme_bw(base_size = 12) +
    labs(
      x = "7SE or PRSL score",
      y = "Number of localities"
    ),
  naturetypes_wide %>%
    as_tibble() |>
    count(TK,
      name = "sum"
    ) |>
    ggplot(
      aes(
        x = factor(TK),
        y = sum
      )
    ) +
    geom_bar(
      stat = "identity",
      fill = "grey",
      colour = "black"
    ) +
    theme_bw(base_size = 12) +
    labs(
      x = "7TK or PRTK score",
      y = "Number of localities"
    )
)

# The NAs represent localities where only one of the two variables
# (treating 7SE and PRSL as the same variable, and likewise 7TK and PRTK)
# is recorded.

# To combine these into one metric, ADSV, I could take the
# one with the highest value (worst-rule) or the sum.
# The sum is problematic as not all localities have two values to sum together.
# But the worst-rule is also problematic, since I think field workers often
# tend to split the effects over two variables if they have that option.
# And if we have 50% vehicle damage and 50% hiking damage, that is no doubt
# worse than just having 50% of either. So I will use the sum, despite its issues.
```
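
The trade-off between the two aggregation rules discussed in the comments above can be seen on a few made-up localities. The data frame below is invented for illustration only; it is not survey data, and the column names merely mirror `SE` and `TK`.

```{r}
#| eval: false
# Invented example values (percent of area affected); NA = not recorded
toy <- data.frame(
  SE = c(50, 50, NA, 0),
  TK = c(50, NA, 6.25, 0)
)

# Worst-rule: the larger of the two values, ignoring NA
toy$worst <- pmax(toy$SE, toy$TK, na.rm = TRUE)

# Sum: adds both pressures, treating NA as zero
toy$sum <- rowSums(toy[, c("SE", "TK")], na.rm = TRUE)

toy
```

The first row is the case that motivates the sum: two co-occurring pressures of 50% each are scored higher by the sum than by the worst-rule, whereas the two rules agree whenever only one variable is recorded.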

```{r}
#| eval: true
# Taking the sum of 7SE and 7TK (incl the PR.. variables)
naturetypes_wide <- naturetypes_wide |>
  rowwise() |>
  mutate(ADSV = sum(c(SE, TK), na.rm = TRUE))

naturetypes_wide %>%
  as_tibble() |>
  count(ADSV,
    name = "sum"
  ) |>
  ggplot(
    aes(
      x = ADSV,
      y = sum
    )
  ) +
  geom_bar(
    stat = "identity",
    fill = "grey",
    colour = "black"
  ) +
  theme_bw(base_size = 12) +
  labs(
    x = "Summed ADSV score",
    y = "Number of localities"
  ) +
  scale_x_continuous(
    labels = scales::label_number(accuracy = 1)
  )

```

```{r}
#| eval: true

# Now I will copy these ADSV values into the sf object again, keeping things in wide format
naturetypes <- naturetypes |>
  pivot_wider(
    names_from = "variable",
    values_from = "value"
  ) |>
  left_join(naturetypes_wide |> select(id, ADSV), by = "id") |>
  select(!c("7TK", "7SE", "PRSL", "PRTK"))

head(naturetypes)
```

```{r rescale}
#| eval: true

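# Now I rescale the (now continuous) variables using reference and threshold values.
# I will use the same reference values for all of Norway for ADSV and alien species.
# Note: `upper` and `lower` refer to condition (best and worst), not to magnitude,
# which is why they are swapped in the calls to ea_normalise() below.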

upper <- 0
lower <- 100
threshold <- 10

# For 7GR-GI I use this
upper2 <- 1
lower2 <- 5
threshold2 <- 2.5 # = observable effect. Value 3 indicates a shift to a new type (grunntype)


scale1 <- eaTools::ea_normalise(data = naturetypes,
  vector = "ADSV",
  upper_reference_level = lower,
  lower_reference_level = upper,
  break_point = threshold,
  plot=T,
  reverse = T
  ) +
  labs(x = "ADSV (converted to %)") +
  ylim(0,1)

# There is no point yet in making this a time series
# I will assign all the indicator values to the same time (2018-2022)

# same for 7FA
scale2 <- eaTools::ea_normalise(data = naturetypes,
  vector = "7FA",
  upper_reference_level = lower,
  lower_reference_level = upper,
  break_point = threshold,
  plot=T,
  reverse = T
  ) +
  labs(x = "7FA (converted to %)",
       y = "") +
  ylim(0,1)
# The variables are really coarse

scale3 <- eaTools::ea_normalise(data = naturetypes,
  vector = "7GR-GI",
  upper_reference_level = lower2,
  lower_reference_level = upper2,
  break_point = threshold2,
  plot=T,
  reverse = T
  )+
  labs(x = "7GR-GI (original units)",
       y = "") +
  ylim(0,1)

(scaling_plot <- ggarrange(scale1,
          scale2,
          scale3,
          ncol=3)
)

#ggsave(plot = scaling_plot,
#  "../images/scaling-plot.jpg",
#  width=8,
#  height=5)
```
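
For readers without eaTools installed, the shape of the scaling function can be sketched in base R. This is an illustration under stated assumptions, not the `ea_normalise()` implementation: it assumes a piecewise-linear rescaling in which the best reference value maps to 1, the worst to 0 and the threshold to 0.6, and that values beyond the reference levels are truncated. The helper name `rescale_sketch()` and its arguments are hypothetical.

```{r}
#| eval: false
# Assumption: the threshold maps to 0.6 and scaling is linear on either side.
rescale_sketch <- function(x, best, worst, threshold, threshold_score = 0.6) {
  knots <- c(best, threshold, worst)
  scores <- c(1, threshold_score, 0)
  # Sort the knots so the helper works whether best < worst or best > worst;
  # rule = 2 holds the end values constant outside the reference range.
  ord <- order(knots)
  approx(x = knots[ord], y = scores[ord], xout = x, rule = 2)$y
}

# ADSV and alien species: 0 % is best, 100 % is worst, threshold at 10 %
rescale_sketch(c(0, 5, 10, 55, 100), best = 0, worst = 100, threshold = 10)

# Trenching (7GR-GI): 1 is best, 5 is worst, threshold at 2.5
rescale_sketch(1:5, best = 1, worst = 5, threshold = 2.5)
```

Expressing the scaling as three knots makes it explicit that the steep part of the curve lies between the best reference value and the threshold, which is where the coarse ordinal classes have the most influence.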

```{r}
#| eval: true
# Adding scaled indicator values to the dataset
# Same code as above, but with plot=F.
naturetypes$i_ADSV <- eaTools::ea_normalise(
  data = naturetypes,
  vector = "ADSV",
  upper_reference_level = lower,
  lower_reference_level = upper,
  break_point = threshold,
  reverse = T
)

naturetypes$i_alien <- eaTools::ea_normalise(
  data = naturetypes,
  vector = "7FA",
  upper_reference_level = lower,
  lower_reference_level = upper,
  break_point = threshold,
  reverse = T
)

naturetypes$i_ditch <- eaTools::ea_normalise(
  data = naturetypes,
  vector = "7GR-GI",
  upper_reference_level = lower2,
  lower_reference_level = upper2,
  break_point = threshold2,
  reverse = T
)
```

```{r getMunicipalities}
#| eval: true

# Preparing the outlines for the three municipalities

# The data contains some multisurfaces 
# table(st_geometry_type(muni))
# Here is a function to make sure that multipolygons are returned
ensure_multipolygons <- function(X) {
  tmp1 <- tempfile(fileext = ".gpkg")
  tmp2 <- tempfile(fileext = ".gpkg")
  st_write(X, tmp1)
  gdalUtilities::ogr2ogr(tmp1, tmp2, f = "GPKG", nlt = "MULTIPOLYGON")
  Y <- st_read(tmp2)
  st_sf(st_drop_geometry(X), geom = st_geometry(Y))
}

muni <- ensure_multipolygons(muni)
# table(st_geometry_type(muni)) #OK

# subset of the three target municipalities
muni3 <- muni |>
  filter(kommunenummer %in% c(
    "3020", # Nordre Follo
    "3451", # Nord-Aurdal
    "3446" # Gran
  )) |>
  mutate(Municipality = case_when(
    kommunenummer == "3020" ~ "Nordre Follo",
    kommunenummer == "3451" ~ "Nord-Aurdal",
    kommunenummer == "3446" ~ "Gran"
  ))

# To crop the EDM, I need the three municipalities separately.
nf <- muni3 |>
  filter(kommunenummer == "3020")
na <- muni3 |>
  filter(kommunenummer == "3451")
gr <- muni3 |>
  filter(kommunenummer == "3446")
```

```{r prepPolygons}
#| eval: true

# I need to intersect the naturetypes data with the municipalities
nature3 <- naturetypes |>
  st_intersection(muni3)

nature3 |>
  as_tibble() |>
  count(municipality,
    sort = TRUE,
    name = "Number of polygons")
# There were some polygons that spanned municipal borders.
# It's not a problem

# and also to get the data coverage polygon.
coverage3 <- coverage |>
  st_intersection(muni3)
```

```{r prepSomeMoreMunicipalityShapes}
#| eval: true

# Simplified coastline / terrestrial area
terrestrial <- outline |>
  st_intersection(muni3)

# Polygons for the oceans in each municipality
ocean <- muni3 |>
  st_difference(outline)

# calculate stats - terrestrial area
terrestrial <- terrestrial |>
  mutate(
    area_t = geometry |> st_area(),
    t_area_km =
      round(units::drop_units(area_t * 1e-6))
  ) 
```

```{r positionMap}
#| eval: true
# Make map to show where the three municipalities are
world <- ne_countries(scale = "medium", returnclass = "sf") |>
  st_transform(myCRS) |>
  filter(admin %in% c("Norway", "Sweden")) |>
  st_make_valid()

# get centroids
centroids <- muni3 |>
  st_centroid()

inc <- 200000
myBbox <- st_bbox(centroids)
myBbox[1:2] <- myBbox[1:2]-inc 
myBbox[3:4] <- myBbox[3:4]+inc 

(positionMap <- 
  tm_shape(world,
           bbox = myBbox) +
    tm_polygons() +
  tm_shape(muni3) +
    tm_polygons(col = "green") +
  tm_shape(centroids) +
  tm_text(
    text = "Municipality",
    just= "left",
    size = .8,
    xmod = 1,
    ymod = 0
  ) +
  tm_grid(projection = 4326) +
  tm_layout(
    bg.color = "skyblue",
    outer.margins = c(0.01, .02, .02, .02))+
  tm_compass()+
  tm_scale_bar()
)

tmap_save(tm = positionMap,
       "../images/positionMap.jpg")
```

```{r distanceBetweenMunis}
#| eval: true

# what is the distance between Nordre Follo and Nord-Aurdal
(km_distance <- centroids |>
  st_distance() |>
  max() |>
  set_units("km") |>
  drop_units() |>
  round())
```

```{r mireTerra}
#| eval: false

# I first tried to import and crop the mire data using stars, 
# but that failed (see pre 21 feb 2023).
# Trying { terra } instead

# convert municipal outlines to vect via sp
nf_vect <- as(nf, "Spatial") |>
  terra::vect()
gr_vect <- as(gr, "Spatial") |>
  terra::vect()
na_vect <- as(na, "Spatial") |>
  terra::vect()

# crop and mask (very fast!)
mire_terra_nf <- mire_terra |>
  terra::crop(nf_vect) |>
  terra::mask(nf_vect)

mire_terra_gr <- mire_terra |>
  terra::crop(gr_vect) |>
  terra::mask(gr_vect)

mire_terra_na <- mire_terra |>
  terra::crop(na_vect) |>
  terra::mask(na_vect)

# Plot to check overlap
# ggplot()+
#  geom_spatraster(data = mire_terra_nf)+
#  geom_spatvector(data = nf_vect,
#                  fill = NA)
# The cropping and masking worked.

# # I like the stars, sf and tmap combo better, so I return to stars
mire_stars_nf <- mire_terra_nf |>
  st_as_stars()
mire_stars_gr <- mire_terra_gr |>
  st_as_stars()
mire_stars_na <- mire_terra_na |>
  st_as_stars()

par(mfrow=c(3,1))
plot(mire_stars_nf)
plot(mire_stars_gr)
plot(mire_stars_na)

saveRDS(mire_stars_nf, "manual_cache/mire_stars_nf.RDS")
saveRDS(mire_stars_gr, "manual_cache/mire_stars_gr.RDS")
saveRDS(mire_stars_na, "manual_cache/mire_stars_na.RDS")

```

```{r terraToSTars}
#| eval: true
mire_stars_nf <- readRDS("manual_cache/mire_stars_nf.RDS")
mire_stars_gr <- readRDS("manual_cache/mire_stars_gr.RDS")
mire_stars_na <- readRDS("manual_cache/mire_stars_na.RDS")
mire_terra_nf <- rast(mire_stars_nf)
mire_terra_gr <- rast(mire_stars_gr)
mire_terra_na <- rast(mire_stars_na)
```

```{r dk2-CoverageMaps}
#| eval: true
#| cache: true

# calculate area of survey coverage maps
# values go into the summary table in the ms
dk2 <- coverage3 |>
  group_by(Municipality) |>
  summarise(SHAPE = st_union(SHAPE)) |>
  mutate(
    dk_area_km = SHAPE |> st_area(),
    dk_area_km = round(units::drop_units(dk_area_km * 1e-6))
  )
```

```{r mireArea}
#| eval: true
#| cache: true

# calculate area of the mires in each municipality
# -- Nordre Follo
(mireArea <- mire_terra_nf |>
  global(c("mean", "sum"), na.rm = T) |>
  add_column("Municipality" = "Nordre Follo") |>
  mutate(
    mirePercent = round(mean * 100, 1),
    mire_km2 = sum / 1e+4
  ))

# -- Gran
mireArea2 <- mire_terra_gr |>
  global(c("mean", "sum"), na.rm = T) |>
  add_column("Municipality" = "Gran") |>
  mutate(
    mirePercent = round(mean * 100, 1),
    mire_km2 = sum / 1e+4
  )

# -- Nord-Aurdal
mireArea3 <- mire_terra_na |>
  global(c("mean", "sum"), na.rm = T) |>
  add_column("Municipality" = "Nord-Aurdal") |>
  mutate(
    mirePercent = round(mean * 100, 1),
    mire_km2 = sum / 1e+4
  )

mireArea <- mireArea |>
  rbind(mireArea2, mireArea3)

# Calculate the area of mire inside the coverage maps
# -- Nordre Follo
mire_in_dk <- mire_terra_nf |>
  terra::mask(dk2 |> filter(Municipality == "Nordre Follo")) |>
  global("sum", na.rm = T) |>
  mutate(mireInSurvey_km2 = sum / 1e+4) |>
  add_column(Municipality = "Nordre Follo")

mire_in_dk2 <- mire_terra_gr |>
  terra::mask(dk2 |> filter(Municipality == "Gran")) |>
  global("sum", na.rm = T) |>
  mutate(mireInSurvey_km2 = sum / 1e+4) |>
  add_column(Municipality = "Gran")
mire_in_dk3 <- mire_terra_na |>
  terra::mask(dk2 |> filter(Municipality == "Nord-Aurdal")) |>
  global("sum", na.rm = T) |>
  mutate(mireInSurvey_km2 = sum / 1e+4) |>
  add_column(Municipality = "Nord-Aurdal")

mire_in_dk <- mire_in_dk |>
  rbind(mire_in_dk2, mire_in_dk3)
```

```{r infrastructureIndex}
#| eval: false
# This data is on a 100 x 100 m grid, which is finer than we need,
# so I warp it to 1 x 1 km
infra <- infra |>
  st_warp(
    cellsize = c(1000, 1000),
    crs = st_crs(nf),
    use_gdal = TRUE,
    method = "average"
  ) |>
  setNames("infrastructureIndex") |>
  st_transform(myCRS) |>
  mutate(infrastructureIndex = case_when(
    infrastructureIndex < 1 ~ 0,
    infrastructureIndex < 6 ~ 1,
    infrastructureIndex < 12 ~ 2,
    infrastructureIndex >= 12 ~ 3
  )) |>
  # removing cells in the sea
  st_crop(outline)

# This step might seem odd: we vectorize a rather large raster, which makes
# for quite a big data object. The reason is that there is no really good way
# to burn polygon data onto raster grid cells now that the raster package is
# no longer in use. It was not straightforward then either. But
# calculating intersections between polygons is very fast and easy.

infra <- eaTools::ea_homogeneous_area(infra,
  groups = infrastructureIndex
)

saveRDS(infra, paste0(path_temp, "infrastructureIndex_discrete_vectorized.rds"))
```

```{r InfraStructureIndex}
#| eval: true

# read cached vectorized infrastructure data
infra <- readRDS(paste0(path_temp, "infrastructureIndex_discrete_vectorized.rds"))

```

```{r}
#| eval: true

# Calculate area
infra <- infra |>
  mutate(
    area = geometry |> st_area(),
    area_km = area |> set_units("km2")
  )

# show the summed area per HIA
infra |>
  as_tibble() |>
  group_by(infrastructureIndex) |>
  summarise(area_km = sum(area_km)) |>
  ggplot(aes(
    x = infrastructureIndex,
    y = area_km
  )) +
  geom_col()


# intersect with the three municipalities
# and calculate area
infraMuni3 <- infra |>
  st_intersection(muni3) |>
  mutate(area = geometry |> st_area())

# Turn m2 into km2
# and sum the total area per HIA
(infraMuni3_tbl <- infraMuni3 |>
  as.data.frame() |>
  mutate(area_HIA_km2 = units::drop_units(area) * 1e-6) |>
  group_by(Municipality, infrastructureIndex) |>
  summarise(total_area_HIAs_km2 = round(sum(area_HIA_km2))))

# Calculate the area weighted mean HIA value per municipality
infraMuni3_summary <- infraMuni3_tbl |>
  group_by(Municipality) |>
  summarise(
    meanHIA =
      round(
        weighted.mean(
          infrastructureIndex, total_area_HIAs_km2
        ), 2
      )
  )

# Make a plot to check that it has worked
(infra_dist_plot <- infraMuni3_tbl |>
  ggplot() +
  geom_bar(
    aes(
      x = infrastructureIndex,
      y = total_area_HIAs_km2,
      fill = factor(infrastructureIndex),
      colour = factor(infrastructureIndex)
    ),
    stat = "identity",
    lwd = 1.2
  ) +
  scale_fill_manual(values = RColorBrewer::brewer.pal(4, "YlOrBr")) +
  scale_color_manual(values = RColorBrewer::brewer.pal(5, "YlOrBr")[-1]) +
  theme_minimal_hgrid() +
  labs(
    x = "Homogeneous Impact Areas",
    y = "Area (km<sup>2</sup>)"
  ) +
  theme(
    axis.title.x = element_textbox_simple(
      width = NULL,
      padding = margin(4, 4, 4, 4),
      margin = margin(4, 0, 0, 0),
      linetype = 1,
      r = grid::unit(8, "pt"),
      fill = "azure1"
    ),
    axis.title.y = element_textbox_simple(
      width = NULL,
      padding = margin(4, 4, 4, 4),
      margin = margin(4, 0, 0, 0),
      linetype = 1,
      orientation = "left-rotated",
      r = grid::unit(8, "pt"),
      fill = "azure1"
    ),
    strip.background = element_blank(),
    strip.text = element_textbox(
      size = 12,
      color = "white", fill = "#5D729D", box.color = "#4A618C",
      halign = 0.5, linetype = 1, r = unit(5, "pt"), width = unit(1, "npc"),
      padding = margin(2, 0, 1, 0), margin = margin(3, 3, 3, 3)
    )
  ) +
  guides(fill = "none", colour = "none") +
  #scale_y_log10() +
  facet_grid(cols = vars(Municipality))
)

#ggsave(plot = infra_dist_plot,
#       "../images/infra-dist-plot.jpg")
```

```{r corrCheck}
#| eval: true
#| cache: true
# Now I want to see if the indicator values depend on the HIA in a predictable
# way, to justify the stratification
corrCheck <- st_intersection(naturetypes, infra)
```

```{r HIA-validate}
#| eval: false
# A first look
corrCheck |>
  mutate(
    i_ADSV_fct = floor(round(i_ADSV * 10, 2)) / 10,
    i_alien_fct = floor(round(i_alien * 10, 2)) / 10,
    i_ditch_fct = floor(round(i_ditch * 10, 2)) / 10
  ) |>
  pivot_longer(
    cols = c(i_ADSV_fct, i_alien_fct, i_ditch_fct),
    values_to = "indicatorValue",
    names_to = "indicator",
    values_drop_na = T
  ) |>
  ggplot(aes(
    x = factor(infrastructureIndex),
    fill = factor(indicatorValue)
  )) +
  geom_bar(
    position = "fill"
  ) +
  theme_bw(base_size = 12) +
  guides(fill = guide_legend("Scaled indicator values")) +
  ylab("Fraction of data points") +
  xlab("HIA") +
  scale_fill_brewer(palette = "RdYlGn") +
  facet_grid(indicator ~ year)

# After this I also tried changing the color gradient
# and the number of categories. I tried discrete colors and log-transformation.
# These variants are interpreted slightly differently by the brain.
# See versions pre 27.02.2023
```

```{r realValidationPlot}
#| eval: true
# Using a color gradient emphasizes the first color (dark green).
# Let's try discrete colors, and merge some classes to simplify
(validationPlot <- corrCheck |>
  pivot_longer(
    cols = c(i_ADSV, i_alien, i_ditch),
    values_to = "indicatorValue",
    names_to = "indicator",
    values_drop_na = T
  ) |>
  mutate(
    condition = case_when(
      indicatorValue < 0.6 ~ "<0.6",
      indicatorValue < 0.8 ~ "0.6 to 0.8",
      indicatorValue < 0.91 ~ "0.8 to 0.9",
      .default = "0.9 to 1"
    ),
    condition = fct_reorder(condition, indicatorValue),
    indicator = case_when(
      indicator == "i_ADSV" ~ "ADSV",
      indicator == "i_alien" ~ "Alien species",
      indicator == "i_ditch" ~ "Trenching"
    )
  ) |>
  as_tibble() |>
  group_by(indicator, infrastructureIndex, condition) |>
  summarise(n = n()) |>
  ungroup() |>
  group_by(indicator, infrastructureIndex) |>
  mutate(lab = round(n/sum(n)*100),
         lab = case_when(
           lab < 5 ~ NA,
           .default = paste0(lab, "%")
         )) |>
  ggplot(aes(
    x = infrastructureIndex,
    y = n,
    fill = condition
  )) +
  geom_bar(
    position = "fill",
    stat = "identity"
  ) +
  geom_text(aes(label = lab),
    position = position_fill(vjust = 0.5),
    color= "black", vjust = 0.5, size = 4) +
  theme_minimal(base_size = 15) +
  theme(
    panel.grid = element_blank(), 
    axis.text.x = element_text(margin = margin(t = -10)),
    axis.text.y = element_blank(),
    axis.title.y = element_blank(),
    strip.text = element_textbox(
      size = 12,
      halign = 0.5, linetype = 1, r = unit(5, "pt"), width = unit(1, "npc"),
      padding = margin(2, 0, 1, 0), margin = margin(3, 3, 3, 3))) +
  guides(fill = guide_legend("Indicator values")) +
  xlab("Homogeneous Impact Areas") +
  scale_fill_manual(values = c("#E85437","#FBAF00", "#B5DF73", "#009000")) +
  facet_grid(~indicator)
)

#ggsave("../images/validation-plot.jpg",
#  plot=validationPlot,
#  width = 9,
#  height = 5)
```

```{r infraMuniMaps}
#| eval: true
#| cache: true

# Infrastructure in each municipality
infraMuniMap <- tm_shape(muni3) +
  tm_borders() +
  tm_shape(infraMuni3) +
  tm_polygons(
    col = "infrastructureIndex",
    style = "cat",
    title = "Homogeneous Impact Areas"
  ) +
  tm_layout(
    legend.show = F,
    panel.label.height = 0) +
  tm_shape(muni3) +
  tm_borders(lwd = 3, col = "black") +
  tm_facets(by = "Municipality")

# A figure with just the legend
infraMuniMap_l <- tm_shape(muni3) +
  tm_borders() +
  tm_shape(infraMuni3) +
  tm_polygons(
    col = "infrastructureIndex",
    style = "cat",
    title = "Homogeneous\nImpact Area"
  ) +
  tm_layout(legend.only = TRUE,
            legend.position = c("left", "bottom"),
            legend.outside = F)


empty <- tm_shape(muni3) +
  tm_borders(col="white") +
  tm_layout(frame = F)

# Survey coverage and other layers overlaid on the municipalities
muniPlot <- tm_shape(muni3) +
  tm_borders() +
  tm_facets(
    by = "Municipality",
    ncol = 3
  ) +
  tm_shape(terrestrial) +
  tm_fill(
    col = "lightgreen",
    alpha = .4
  ) +
  tm_shape(ocean) +
  tm_polygons(
    col = "skyblue",
    border.col = "black"
  ) +
  tm_shape(dk2) +
  tm_polygons(
    col = "grey",
    alpha = .8
  ) +
  tm_shape(nature3) +
  tm_polygons(
    col = "red",
    border.col = "red",
    lwd = 3
  )


(methodsMap <- tmap_arrange(
  muniPlot,
  empty,
  infraMuniMap,
  infraMuniMap_l,
  ncol = 2,
  widths = c(.8, .2),
  heights = c(.6, .4),
  outer.margins = NULL
))


#tmap_save(methodsMap, "../images/studyLocations.tiff",
#           dpi= 1000,
#           units = "cm",
#           width = 18,
#           height = 10)
#tmap_save(methodsMap, "../images/studyLocations.jpg",
#           units = "cm",
#           width = 18,
#           height = 10)
#
#saveRDS(methodsMap, "../figures/studyLocation.RDS")
```

```{r infra-NA}
# Infrastructure in Nord-Aurdal only
(infraMuniMap_NA <- 
  tm_shape(muni3 |> filter(Municipality == "Nord-Aurdal")) +
  tm_borders() +
  tm_shape(infraMuni3) +
  tm_polygons(
    col = "infrastructureIndex",
    style = "cat",
    title = "Homogeneous\nImpact\nAreas"
  ) +
  tm_layout(
    legend.show = T,
    legend.position = c("left", "top"),
    legend.text.size = 1.2) +
  tm_shape(muni3) +
  tm_borders(lwd = 3, col = "black")
)

tmap_save(tm = infraMuniMap_NA,
       "../images/HIA-NA.jpg",
       width=4,
       height=4)
```

```{r muni-tbl}
#| eval: true
#| cache: true

# Calculate some more stats for the table with municipality stats

# Number of mire polygons per municipality
nature3_tbl <- nature3 |>
  group_by(Municipality) |>
  summarise(n = n()) |>
  as_tibble()

# Take muni3 and add all sorts of other data to it
# using left_join.
muni_tbl <- muni3 |>
  # calculate total area
  mutate(
    area_km =
      round(
        units::drop_units(
          geom |> st_area() * 1e-6
        )
      )
  ) |>
  # make tibble for the coming join
  as_tibble() |>
  # paste in terrestrial area
  left_join(
    terrestrial |>
      as_tibble() |>
      select(kommunenummer, t_area_km),
    keep = F
  ) |>
  # area of survey
  left_join(dk2 |> select(Municipality, dk_area_km)) |>
  mutate(dk_percent = round((dk_area_km / t_area_km) * 100)) |>
  # number of polygons
  left_join(nature3_tbl |> select(Municipality, n)) |>
  # total mire area and %
  left_join(mireArea |> select(Municipality, mirePercent, mire_km2)) |>
  # % mire inside survey coverage map
  left_join(mire_in_dk |> select(Municipality, mireInSurvey_km2)) |>
  mutate(mireInSurvery_percent = round(mireInSurvey_km2 / mire_km2 * 100, 1)) |>
  left_join(infraMuni3_summary) |>
  mutate(mire_km2 = round(mire_km2, 1),
         meanHIA = round(meanHIA, 1))
```

```{r}
#| eval: true

# Now I need to spatially (horizontally) aggregate the indicator values for each 
# HIA. Rather than taking the arithmetic mean, I will use a Bayesian updating 
# approach. The point estimate (central tendency) will probably be the same
# more or less, with the two approaches, but with the updating approach I can
# get an honest measure of the uncertainty even with very small sample sizes.
# For this I need a 'true' value for the variation in the indicator
# I will use the entire national data set to determine this number

national_sd_ADSV <- sd(naturetypes$i_ADSV, na.rm=T)
national_sd_alien <- sd(naturetypes$i_alien, na.rm=T)
national_sd_ditch <- sd(naturetypes$i_ditch, na.rm=T)

barplot(c(national_sd_ADSV, national_sd_alien, national_sd_ditch),
        names.arg = c("ADSV", "Alien", "Trenching"),
        ylab="SD")

# The figure shows that the Trenching indicator is more spatially variable
```

```{r wgt-mean-fn}
#| eval: true

# This function is modified from Bolstad::normdp
# It updates a flat prior based on a sample of values (indicator values)
# and weight (polygon area) and returns a distribution for the weighted mean
# that has a gaussian distribution. It assumes the sampled population is also
# Gaussian. The variance is estimated from the full sample of indicator values.


wgt_mean <- function(x, weights, 
                     sigma.x = NULL, 
                     mu = seq(0, 1, length.out=1000), 
                     mu.prior = rep(1/length(mu), times=length(mu)), 
                     stat = "mean",
                     ...) {
  
  mx <- weighted.mean(x, weights)
  if (round(sum(mu.prior), 7) != 1) {
    warning("The prior probabilities did not sum to 1, therefore the prior has been normalized")
    mu.prior <- mu.prior / sum(mu.prior)
  }
  n.mu <- length(mu)
  nx <- length(x)
  snx <- sigma.x^2 / nx
  likelihood <- exp(-0.5 * (mx - mu)^2 / snx)
  posterior <- likelihood * mu.prior / sum(likelihood * mu.prior)
  mx <- sum(mu * posterior)
  vx <- sum((mu - mx)^2 * posterior)
  
  # draw 1k samples from the posterior to calculate the quantiles and sd from
  sample <- sample(mu, size = 1000, prob = posterior, replace = T)
  lower <- quantile(sample, probs = 0.025)
  upper <- quantile(sample, probs = 0.975)
  # we can assume that the distribution of the mean is Gaussian, so we take a 
  # symmetrical sd from here and use it, along with the mean, to recreate a
  # normal distribution later which we can sample from
  sdx <- sd(sample)
  results <- list(
    name = "mu", 
    param.x = mu, 
    prior = mu.prior,
    likelihood = likelihood, 
    posterior = posterior, 
    weighted_mean = mx,
    var = vx
  )
  if(stat == "mean") return(mx)
  if(stat == "lower") return(lower)
  if(stat == "upper") return(upper)
  if(stat == "sd") return(sdx)
}

```
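
The effect of sample size on the width of the posterior can be illustrated with a toy example. The sketch below is not the function used in the analysis: `grid_posterior` is a hypothetical, stripped-down version of the same grid update, and the indicator values, polygon areas and `sigma` are made-up numbers chosen only for illustration.

```r
# Hypothetical, stripped-down version of the grid update, for illustration only
grid_posterior <- function(x, w, sigma, mu = seq(0, 1, length.out = 1000)) {
  prior <- rep(1 / length(mu), length(mu))   # flat prior over the grid
  mx <- weighted.mean(x, w)                  # area-weighted sample mean
  se2 <- sigma^2 / length(x)                 # variance of the mean
  lik <- exp(-0.5 * (mx - mu)^2 / se2)       # Gaussian likelihood on the grid
  post <- lik * prior / sum(lik * prior)     # normalised posterior
  post_mean <- sum(mu * post)
  post_sd <- sqrt(sum((mu - post_mean)^2 * post))
  c(mean = post_mean, sd = post_sd)
}

# Made-up indicator values and polygon areas (m2); sigma is an assumed SD
few  <- grid_posterior(x = c(0.8, 0.9), w = c(1000, 3000), sigma = 0.2)
many <- grid_posterior(x = rep(c(0.8, 0.9), 20),
                       w = rep(c(1000, 3000), 20),
                       sigma = 0.2)
print(rbind(few, many))
```

Both samples have the same area-weighted mean, but the two-polygon sample should give a wider posterior than the 40-polygon sample. Because the grid is bounded at 0 and 1, a wide posterior close to a boundary is truncated, which pulls its mean inward.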

```{r meanPerHIA}
#| eval: true
#| cache: true

# Intersect nature3 with the HIA
stats_tbl <- nature3 |>
  st_intersection(infraMuni3) |>
  pivot_longer(cols = c(
    i_ADSV,
    i_alien,
    i_ditch),
    names_to = "indicator",
    values_to = "indicatorValue") |>
  filter(!is.na(indicatorValue)) |>
  mutate(area = drop_units(area)) |>
  group_by(indicator, Municipality, infrastructureIndex) |>
  summarise(sd = wgt_mean(indicatorValue,
      weights = area,
      sigma.x = case_when(
        indicator == "i_alien" ~ national_sd_alien,
        indicator == "i_ditch" ~ national_sd_ditch,
        indicator == "i_ADSV" ~ national_sd_ADSV,
        .default = NULL
      ),
      stat = "sd"),
    mean = wgt_mean(indicatorValue,
      weights = area,
      sigma.x = case_when(
        indicator == "i_alien" ~ national_sd_alien,
        indicator == "i_ditch" ~ national_sd_ditch,
        indicator == "i_ADSV" ~ national_sd_ADSV,
        .default = NULL
      ),
      stat = "mean"),
    n = n())
stats_tbl
```

```{r forets-plot}
#| eval: true
#| cache: true


# Forest plot

# #define colours for dots and bars
dotCOLS = c("grey90","grey70", "grey50", "grey40")
#barCOLS <- tmaptools::get_brewer_pal("Set1", n = 4)
barCOLS <- c("#fff7bd", "#fecf66", "#f88b22", "#cc4c02")


(forest_plot <- stats_tbl |>
  mutate(
    indicator = case_when(
      indicator == "i_ADSV" ~ "ADSV",
      indicator == "i_alien" ~ "Alien species",
      indicator == "i_ditch" ~ "Trenching"
    )
  ) |>
  rowwise() |>
  mutate(
    low = quantile(rnorm(200, mean, sd), probs = 0.025),
    high = quantile(rnorm(200, mean, sd), probs = 0.975),
    high = ifelse(high > 1, 1, high)
  ) |>
  ggplot(aes(x=infrastructureIndex, 
             y=mean, 
             ymin=low,
             ymax=high
             )) + 
  geom_linerange(
    aes(colour=factor(infrastructureIndex)),
    size=10) +
  geom_point(
    aes(fill=factor(infrastructureIndex)),
    size=3, 
    shape=21, 
    colour="white", 
    stroke = 0.5,
    ) +
  geom_text(
    aes(y = low, label = n),
    nudge_y = -0.1,
    show.legend = F
  ) +
  scale_fill_manual(values=rev(dotCOLS))+
  scale_color_manual(values=barCOLS)+
  scale_x_discrete(name="") +
  scale_y_continuous(name="Indicator values", limits = c(-0.1, 1)) +
  coord_flip() +
  theme_bw() +
  labs(fill = "HIA",
       col = "HIA") +
  facet_grid(indicator ~ Municipality)
)

#ggsave("../images/forest-plot.jpg",
#       plot=forest_plot,
#       width = 8,
#       height = 6)
```

```{r forest_plotExample}
#| eval: true
#| cache: true

# Here is the same plot but only for trenching
  
(forest_plot_ex <- stats_tbl |>
  filter(indicator == "i_ditch",
         Municipality == "Nord-Aurdal") |>
  rowwise() |>
  mutate(
    low = quantile(rnorm(200, mean, sd), probs = 0.025),
    high = quantile(rnorm(200, mean, sd), probs = 0.975),
    high = ifelse(high > 1, 1, high)
  ) |>
  ggplot(aes(x=infrastructureIndex, 
             y=mean, 
             ymin=low,
             ymax=high,
             col=factor(infrastructureIndex),
             fill=factor(infrastructureIndex))) + 
  geom_linerange(
    size=10) +
  geom_linerange(
    size=10,
    lwd=1,
    colour="black") +
  geom_point(
    size=3, 
    shape=21, 
    colour="white", 
    stroke = 0.5,
    ) +
  scale_fill_manual(values=rev(dotCOLS))+
  scale_color_manual(values=barCOLS)+
  scale_x_discrete(name="") +
  scale_y_continuous(name="Indicator values", limits = c(.7, 1)) +
  coord_flip() +
  theme_bw() +
  labs(fill = "HIA",
       col = "HIA")
)

ggsave(plot = forest_plot_ex,
       "../images/forest-plot-ex.jpg",
       width=3,
       height = 3)
```

```{r notes}
# Next I need to 
#  - copy the weighted means and SDs over to  HIA map and then over to the 
#    ecosystem delineation map polygons. 
#  - show intermittent resulting map with ggmagnify
#  - sample from these distributions (ignore missing HIAs)
```

```{r}
#| eval: true
#| cache: true

# Copy the weighted means and SDs over to HIA map and then over to the 
# ecosystem delineation map polygons. 
spread_na <- mire_stars_na |>
  st_as_sf(merge=T) |>
  filter(Myr153 == 1) |>
  st_intersection(infraMuni3 |> select(infrastructureIndex)) |>
  mutate(area = geometry |> st_area()) |>
  left_join(stats_tbl |> 
              as_tibble() |>
              filter(Municipality == "Nord-Aurdal") |>
              select(mean, sd, indicator, n, infrastructureIndex, Municipality),
            by = "infrastructureIndex")

spread_nf <- mire_stars_nf |>
  st_as_sf(merge=T) |>
  filter(Myr153 == 1) |>
  st_intersection(infraMuni3 |> select(infrastructureIndex)) |>
  mutate(area = geometry |> st_area()) |>
  left_join(stats_tbl |> 
              as_tibble() |>
              filter(Municipality == "Nordre Follo") |>
              select(mean, sd, indicator, n, infrastructureIndex, Municipality),
            by = "infrastructureIndex")

spread_gr <- mire_stars_gr |>
  st_as_sf(merge=T) |>
  filter(Myr153 == 1) |>
  st_intersection(infraMuni3 |> select(infrastructureIndex)) |>
  mutate(area = geometry |> st_area()) |>
  left_join(stats_tbl |> 
              as_tibble() |>
              filter(Municipality == "Gran") |>
              select(mean, sd, indicator, n, infrastructureIndex, Municipality),
            by = "infrastructureIndex")


```

```{r example-spread}
#| eval: true
#| cache: true

# A plot of Nord-Aurdal with the mean indicator values per HIA spread over the 
# EDM
from <- c(xmin = 510000, xmax = 515000, ymin = 6741000, ymax = 6746000)
to <-   c(xmin = 495000, xmax = 520000, ymin = 6765000, ymax = 6790000)
myCols <- c(
  #"#E85437",
  "#FBAF00", 
  "#B5DF73", 
  "#009000"
  )


(spread_na_map <- spread_na |>
  filter(indicator == "i_ditch") |>
  mutate(Trenching = factor(round(mean, 2))) |>
  ggplot() +
  geom_sf(aes(fill=Trenching,
              color = Trenching)) +
  geom_sf(data = na,
          alpha=0) +
  scale_fill_manual(values = myCols) +
  scale_color_manual(values = myCols) +
  coord_sf(
    datum = st_crs(myCRS),
    xlim = c(494174.8 , 537114.7 ), 
    ylim = c(6737092 , 6789676)) +
  ggmagnify::geom_magnify(from = from, to = to, 
                          expand = 0,
                          shadow =T,
                          corners = 0.1) +
  theme_bw()
)
#ggsave("../images/spread-na.tiff",
#       plot = spread_na_map)
#ggsave("../images/spread-na-small.tiff",
#       plot = spread_na_map,
#       dpi=150)
#ggsave("../images/spread-na-small.jpg",
#       plot = spread_na_map)
```

```{r EAA}
#| eval: true
#| cache: true

# Now I sample indicator values from the EDM and create a new distribution for 
# indicator values in the EAAs (the municipalities). First I sample the individual
# distributions for each mire polygon with n defined by the polygon area.
# The distribution for the polygon areas is strongly right skewed, meaning
# some polygons will contribute much more to the EAA value than others.
# I will keep this design, in-line with SEEA EA guidelines for area weighting, 
# but it's worth noting that this could be solved in other ways.
# 
combineAll <- rbind(
  spread_nf,
  spread_gr,
  spread_na
) |>
  as_tibble() |>
  drop_na() |>
  group_by(Municipality, indicator) |>
  rowwise() |>
  mutate(
    # this draws one sample per m2 from a normal distribution:
    i_sample = list(sample(rnorm(n, mean, sd)))) |>
  select(-geometry) |>
  group_by(Municipality, indicator) |>
  reframe(i_sample = 
  # this samples randomly from i_sample, i.e. large polygons are more likely
  # to contribute:
            sample(
              unlist(i_sample), 
              size = 1000,
              replace = TRUE)) |>
  # truncation, since the tails from the normal distribution can go beyond 0 and 1
  mutate(i_sample = case_when(
    i_sample < 0 ~ 0,
    i_sample >1  ~ 1,
    .default = i_sample
  ))
```
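
The idea behind area-weighted resampling can be shown with a small base R example. This is a sketch under stated assumptions, not the analysis code: the two polygons, their areas and their means and SDs are invented, and the number of draws per polygon is set equal to the area in m^2^.

```r
set.seed(1)

# Two made-up polygons: a small one in poor condition, a large one in good condition
polys <- data.frame(
  area_m2 = c(500, 20000),
  mean    = c(0.4, 0.9),
  sd      = c(0.05, 0.05)
)

# One draw per m2 from each polygon's normal distribution
draws <- unlist(Map(
  function(a, m, s) rnorm(a, m, s),
  polys$area_m2, polys$mean, polys$sd
))

# Resample 1000 values; large polygons dominate because they hold more draws
eaa <- sample(draws, size = 1000, replace = TRUE)

# Truncate to the valid indicator range
eaa <- pmin(pmax(eaa, 0), 1)

mean(eaa)
quantile(eaa, probs = c(0.025, 0.5, 0.975))
```

The resampled mean should land close to the area-weighted mean of the two polygon means, since the large polygon supplies about 98% of the pooled draws.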

```{r EEA-plot}
#| eval: true
#| cache: true
(ridgepPlot <- combineAll |>
  mutate(
    indicator = case_when(
      indicator == "i_ADSV" ~ "ADSV",
      indicator == "i_alien" ~ "Alien species",
      indicator == "i_ditch" ~ "Trenching"
    )) |>
  ggplot(
    aes(
      x = i_sample, 
      y = Municipality,
      fill = after_stat(x))
  ) +
  geom_density_ridges_gradient(
    bandwidth=.05,
    quantile_lines = TRUE, 
    quantiles = c(0.025, .5, 0.975)
  ) +
  #scale_fill_viridis_c(option = "D") +
  scale_fill_distiller(palette = "RdYlGn",
    direction = 1) +
  theme_bw(base_size = 15) +
  guides(fill = "none") +
  geom_vline(xintercept = .6,
    size=1.2,
    lty=2) +
  theme(axis.text.x = element_text(angle=45, hjust=1, vjust=1))+
  labs(y = "", x = "Indicator values")+
  xlim(c(0,1)) +
  facet_wrap(.~indicator)
)

#ggsave(plot = ridgepPlot,
#       "../images/ridgePlot.jpg",
#       width=10,
#       height=5)
```

```{r EEA-plot2}
#| eval: true
#| cache: true

# Same as above, but for trenching only

(ridgepPlot2 <- combineAll |>
  filter(indicator == "i_ditch",
     Municipality == "Nord-Aurdal") |>
  select(indicator, i_sample, Municipality) |>  
  ggplot(
    aes(
      x = i_sample)
    ) +
  geom_density(
    fill = "lightgreen",
    alpha=.5
  ) +
  theme_bw(base_size = 15) +
  guides(fill = "none") +
  theme(
        axis.ticks.y = element_blank(),
        axis.text.y = element_blank())+
  labs(y = "", x = "Indicator values")+
  xlim(c(0.7,1))
)

#ggsave(plot = ridgepPlot2,
#       "../images/ridgePlot-ex.jpg",
#       width=6,
#       height=4)
```

```{r EEA-tbl}
#| eval: true
(EEA_tbl <- combineAll |>
  mutate(
    indicator = case_when(
      indicator == "i_ADSV" ~ "ADSV",
      indicator == "i_alien" ~ "Alien species",
      indicator == "i_ditch" ~ "Trenching"
    )) |>
  group_by(Municipality, indicator) |>
  summarise(mean = round(mean(i_sample), 2),
            median = round(median(i_sample), 2),
            percentile_025 = round(quantile(i_sample, probs = 0.025), 2),
            percentile_975 = round(quantile(i_sample, probs = 0.975), 2))
)
# saveRDS(EEA_tbl, "../output/EEA-table.RDS")
```

```{r}
(EEA_tbl_out <- EEA_tbl |>
  ungroup() |>
  mutate("Indicator value" = 
    paste0(
      format(median, nsmall = 2), 
      " [",
      format(percentile_025, nsmall = 2),
      " - ",
      format(percentile_975, nsmall = 2),
      "] ")) |>
  rename(Indicator = indicator) |>
  select(
    -mean,
    -median,
    -percentile_025,
    -percentile_975,
    -Municipality
  )|>
  kbl(table.attr = "style = \"color: black;\"",
      align = "lr") |>
  kable_classic("striped",
    full_width = F) |>
  row_spec(0, bold=T) |>
    pack_rows("Gran", 1, 3, 
      label_row_css = "background-color: #cef598; color: #000000;") |>
  pack_rows("Nord-Aurdal", 4, 6, 
      label_row_css = "background-color: #cef598; color: #000000;") |>
  pack_rows("Nordre Follo", 7, 9, 
      label_row_css = "background-color: #cef598; color: #000000;")
)

#saveRDS(EEA_tbl_out, "../output/EEA-tbl.RDS")
```

```{r indicator-map-zoom}
#| eval: true
#| cache: true

# Just an example figure to show what the spatial indicator data looks like
from <- c(xmin = 510000, xmax = 510600, ymin = 6747200, ymax = 6747800)
to <-   c(xmin = 495000, xmax = 520000, ymin = 6765000, ymax = 6790000)
myCols <- c(
  #"#E85437",
  "#FBAF00", 
  "#B5DF73", 
  "#009000"
  )


(indicator_magnify <- nature3 |>
  select(i_ditch) |>
  drop_na() |>
  mutate(Trenching = factor(format(round(i_ditch, 2), 2))) |>
  ggplot() +
  geom_sf(data = na,
          alpha=0) +
  geom_sf(aes(fill=Trenching,
              color = Trenching)) +
  scale_color_manual(values = RColorBrewer::brewer.pal(5, "Set2")) +
  scale_fill_manual(values = RColorBrewer::brewer.pal(5, "Set2")) +
  coord_sf(
    datum = st_crs(myCRS),
    xlim = c(494174.8 , 537114.7 ), 
    ylim = c(6737092 , 6789676)) +
  ggmagnify::geom_magnify(from = from, to = to, 
                          expand = 0,
                          shadow =T,
                          corners = 0.1) +
  theme_bw()
)

#ggsave(plot = indicator_magnify,
#       "../images/indicator-magnify.jpg")
```

# Introduction

Ecosystem condition accounting is the process of compiling relevant data on the status, trends and qualities of ecosystems (i.e. nature) and communicating this in a structured format [@comte2022].
Its purpose is to make it easier to account for nature in policy by making the environmental costs of certain policies and practices visible to decision makers.
A statistical standard for ecosystem accounting, including ecosystem condition accounting, was developed by the UN and adopted by the UN Statistical Commission in 2021 under the name System of Environmental-Economic Accounting - Ecosystem Accounting (SEEA EA) [@unitednations2021].
The standard, or framework, is a set of rules, principles and best practices for compiling ecosystem accounts, mainly aimed at national accounts.

Central to ecosystem condition accounts are *variables* and *indicators*.
These are metrics chosen to reflect the central condition characteristics of the ecosystems.
These metrics are quantified and ideally monitored over time to reflect the status and trends in condition.
Indicators are variables that are normalised (rescaled) against upper and lower reference values to become bound between the values 0 and 1.
This normalisation makes indicators more comparable because an indicator value of 1 means the same for all indicators, i.e. that the variable equals the upper reference value, which in turn reflects the value of the variable under the reference (pristine) condition.
Similarly, a value of 0 means that the variable is in the worst possible state, for example that the species or ecosystem function in question is completely lost from the ecosystem.
The reference condition needs to be defined for each ecosystem condition assessment separately, but SEEA EA gives some suggestions, such as an ecosystem with no or minimal anthropogenic disturbance.

The SEEA EA framework recommends that ecosystem condition indicators give an unbiased representation of the condition for a given area [table 1 in @czucz_selection_2021; see also @unitednations2021 §2.87], and that condition indicators are recorded uniquely for each ecosystem asset so that the condition in the ecosystem accounting area (EAA) can be inferred from an area weighted mean based on the relative extents of the ecosystem assets [@unitednations2021 §5.54].
Ecosystem assets are defined as "ecological entities" (meaning areas) about which information is sought and about which statistics are compiled [@unitednations2021].
This recommendation for representativity in the indicators means that spatially biased data are ill suited, especially if sampling intensity varies systematically along gradients of anthropogenic pressures and hence ecosystem condition.
The recommendation for spatial resolution also means that when a condition metric is sampled coarsely, for example in an opportunistic and spatially biased field campaign, then it is very likely that you will not have enough data points to reliably estimate an indicator value for each ecosystem asset.
This problem does not exist for complete wall-to-wall data, like remotely sensed imagery, where each ecosystem asset will have enough data to estimate its unique indicator value.
This challenge of spatial generalisation is not unique to ecosystem condition accounts, and it is much discussed in the ecosystem service accounting community of practice via the related term *value transfer* [@unitednations2022; @ncaves; @grammatikopoulou2023; @barton2023].
There are at least three ways to achieve this same complete areal coverage from indicator values that originally are spatially incomplete:

a.  Predict (project) indicator values from a statistical model that accounts for the effect of environmental variation on the variables.
b.  Take a central tendency (e.g. the mean) from the sample population and project it to all ecosystem assets of a given ecosystem type.
c.  Take a central tendency from stratified sections of the sample population and project them to the ecosystem assets, adhering to the corresponding strata.

The need for an unbiased estimation of indicator values is unquestionable, but nonetheless, this requirement puts a large limitation on what types of data one can use in ecosystem condition accounts.
Ecosystem condition assessments are generally limited by data availability, and the choice of variables and indicators to include in assessments is more often than not a pragmatic and opportunistic one, which is unlikely to reflect the full scope of the ecosystem condition characteristics.
Note that the same is true for thematic biases.
This is for example reflected in the scarcity of data commonly included on insects or soil biota, even though most will agree they represent key ecosystem characteristics.
Having data from only one or a subset of the nature types within what is defined as the ecosystem in the assessment is another typical thematic bias in ecosystem accounting.
However, in this paper we chose to focus on spatial bias.

Being able to make use of spatially biased data would greatly alleviate data shortage problems in ecosystem condition accounts.
One way to achieve this is modelling (option *a* in the list above).
Statistical models can describe the general associations between the sampled data and the context where it was sampled (e.g. a set of environmental variables) and use these relationships to predict and project indicator values to areas that were not originally sampled.
Depending on the data that go into these models, they can yield precise indicators.
This is especially true when the EAAs are large (e.g. regions or nations).
But when they are small, like the scale of a municipality, and when the indicator is more likely to be used as the evidence base in concrete physical land use planning, then the inherent level of uncertainty from the spatial extrapolation of such models becomes unacceptable.

In this study we explore the potential for using a stratified aggregation technique to make use of spatially biased field data in ecosystem condition accounting.
We demonstrate this technique on three indicators from the same spatially biased nature type mapping data set from Norwegian mires.
We highlight the opportunities for local use-cases of this GIS-based workflow by contrasting our findings across three neighboring municipalities in Norway.

# Material and Methods {#sec-methods}

This study makes use of a data set from a standardised field survey of nature types in Norway that started in 2018 and which is still ongoing [@norwegian_environmental_agency_naturtyper_2024].
In this survey, selected nature types are delineated on a map (over 140 000 polygons at the end of 2023), and each locality is scored on a range of variables relevant for describing the state and quality of nature [@norwegian_environmental_agency_protocol].
The surveys are commissioned with the goal of producing data relevant for immediate land-use decisions, and are therefore spatially biased, typically towards areas with high human impact or expected impact.
In addition, there is a thematic and size bias in the sampling protocol.
For example, for the forest ecosystem, rare, endangered or calcareous forest types are delineated, whereas more common or ordinary forest types are not.
In this study we focused on open mire ecosystems in Norway where the thematic bias is less severe (the spatial bias being presumably equal for all ecosystems).
The survey maps the following mire types which we use in our analyses:

-   Southern ombrotrophic mires (bogs) \> 2500 m^2^

-   Northern ombrotrophic mires \> 10 000 m^2^

-   All semi-natural mires \> 1000 m^2^

-   Calcareous southern fens (minerotrophic mires) \> 500 m^2^

-   Calcareous northern fens \> 1000 m^2^

In the above, *southern* refers to boreonemoral and southboreal zones, and *northern* refers to mid-boreal, north boreal, and alpine zones [@asbjørnmoen1998].
In addition, the northern fens need to be even more calcareous than the southern fens in order to be surveyed.
We included data from 2018 to 2023.
In this paper we assume the survey is representative for the entire mire ecosystem in Norway (i.e. that there is no thematic bias).
Although it is possible that smaller or less calcareous mires will score systematically different than the ones that are surveyed, we do not think this is so much the case for our variables (see below).
However, we do assume that alien plants are slightly less common on bogs relative to fens, and therefore that this variable, which is only recorded in fens, will be biased.
The other variables are recorded in all delineated mires.

From the survey data set we identified six relevant variables (variable 1-6 in @tbl-variables) which we attribute to three different ecosystem condition characteristics that describe the typical behavior of open mire ecosystems in the reference condition (Figure S1).
Variable 1, 7FA, represents the abundance of alien species, mainly plants.
Variables 2-5 (7SE, PRSL, 7TK and PRTK) describe very related aspects, and are attributed to the same ecosystem condition characteristic (vegetation intactness) and so they were combined into a single indicator called anthropogenic disturbance to soil and vegetation (ADSV; Figure S1).
These variables were originally recorded along binned frequency ranges [@norwegian_environmental_agency_protocol].
Because the data were strongly right skewed, we used the lower limit for each frequency range to convert them into percentages.
The four variables were then combined by summing them after they had been converted to percentages.
This was not a perfect solution, especially since some localities only had one of the variables recorded, but we chose this, rather than for example using a *worst rule* principle, to better separate the localities in terms of their indicator values.
Variable 6, 7GR-GI, is different from the preceding variables in @tbl-variables in that it includes an estimation of future effects that the observed trenches are projected to have on mire vegetation, function, or structure over time.
This is not a favourable trait in a metric used to evaluate the ecosystem condition as it is today, yet we include it here nonetheless because there is a general shortage of data on mire hydrology, which is a fundamental part of mire integrity.

To turn variables into indicators we scaled them using three normative reference values each: an upper (best possible condition), a lower (worst possible condition), and a threshold value that defines the breaking point between good and poor condition [@jakobsson2020].
The reference values make up a numerical representation of the reference condition.
We define the reference condition as one where ecosystems are subject to little or no human influence, with a climate as in the period 1961-1990 and a native species pool similar as today [@jakobsson2021].

The variable 7FA was rescaled into the indicator named *Alien species*, 7GR-GI was rescaled into the indicator *Trenching*, and ADSV was rescaled and kept the name *ADSV* (Figure S1).
For *Alien species* and *ADSV*, the lower and upper reference values were defined as 100% and 0%, respectively.
The threshold for *good ecosystem condition* was defined as 10%, which was then mapped to the value 0.6 on the rescaled indicators, thus creating a non-linear rescaling of the variable (Figure S2).
For *Trenching* we used the lower and upper reference values 5 and 1, respectively, and a threshold value of 2.5 (Figure S2).
A variable value of 1 indicates an intact mire, and a value of 5 indicates a mire transitioning away from a wetland.
A value of 2 indicates observable change within the range expected for the same mapping unit, and a value of 3 indicates a mire transitioning into a neighbouring (ecologically speaking) mapping unit.
See @fig-workflowExample and Figure S1 for schematic workflows for the indicator development.
*Alien species* was attributed to Ecosystem Condition Typology (ECT) class B1 - Compositional state characteristics, and *ADSV* and *Trenching* were attributed to ECT class A1 - Physical state characteristics [@czucz_common_2021].
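
As an illustration of rescaling with three reference values, the sketch below maps the lower reference value to 0, the threshold to 0.6 and the upper reference value to 1. It assumes straight-line interpolation between these three anchor points, which is our simplification for the example (the rescaling actually used is shown in Figure S2), and `rescale_indicator` is a hypothetical helper name.

```r
# Hypothetical helper: piecewise-linear rescaling through three anchor points
rescale_indicator <- function(x, lower, threshold, upper,
                              threshold_score = 0.6) {
  anchors <- c(lower, threshold, upper)
  scores  <- c(0, threshold_score, 1)
  ord <- order(anchors)                 # approx() expects increasing x values
  # rule = 2 clamps values outside the reference range to 0 or 1
  approx(x = anchors[ord], y = scores[ord], xout = x, rule = 2)$y
}

# Percent cover variables: lower = 100 %, threshold = 10 %, upper = 0 %
cover <- c(0, 5, 10, 55, 100)
scaled <- rescale_indicator(cover, lower = 100, threshold = 10, upper = 0)
data.frame(cover, scaled)
```

Under this assumption the indicator drops steeply between 0% and 10% cover and more gently between 10% and 100%, which is what makes the rescaling non-linear overall.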

```{r fig-studyLocation2}
#| eval: true
#| include: true
#| out.width: '80%'
#| fig.cap:
#| - "Map of the three focal municipalities. For each municipality the maps
#| in the top row show ocean in blue and non-ocean in green. 
#| The survey coverage maps are in grey, and the mapped mires are
#| in red, with polygon borders made extra thick to make them visible,
#| but then also exaggerating their size.
#| The bottom row shows the delineation for homogeneous impact areas which
#| is an ordinal gradient from 0–3 with increasing presence of human 
#| infrastructure."

knitr::include_graphics("../images/studyLocations.jpg")
```

\pagebreak

We used an ecosystem delineation map for open mires in southern Norway, produced using remotely sensed data and a deep learning model [@bakkestuen_delineation_2023].
This model estimates that 12.7% of the area in southern Norway is mire [@bakkestuen_delineation_2023].
Mires are ecologically and socially important in Norway simply due to their large extent, and due to their role in climate mitigation as mires store a large amount of carbon.
There has not been a national assessment of the ecosystem condition of mires in Norway, but the authors recently contributed to a report which presented several new indicators that can be used in future national assessments for this ecosystem [@nybo_indikatorer_2023; see also @kolstad2023].
The current study builds on the work in that report.

Because the nature type survey data are spatially biased, we cannot assume that they are area-representative.
In an attempt to overcome this issue, we divided the area of Norway into four non-overlapping Homogeneous Impact Areas (HIAs) based on an infrastructure index for the year 2022 (@tbl-variables).
This index is a continuous variable that represents the frequency of different infrastructure types inside 500 m radius circles around each 100×100 m pixel [@erikstad2023].
We then categorised this continuous variable into four classes (0–3) using ranges chosen to produce a sensible and relatively linear area classification when visualised along an urban-wilderness gradient, whilst at the same time delineating sufficient areas for each stratum across Norway (Table S1).
We then aggregated the data to 1×1 km pixels to ease computations and vectorised it.
The name HIAs is not a perfect representation of the information found in the infrastructure index, but we introduce this name here as a general term which is aligned with the concept of Homogeneous Ecosystem Areas, *sensu* @vallecillo_eu-wide_2022.
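
The discretisation of the infrastructure index can be sketched in R as follows. The index values and class breaks below are hypothetical placeholders (the ranges actually used are given in Table S1), and `cut()` is simply one convenient way to bin a continuous variable into ordered classes.

```r
# Hypothetical infrastructure index values for five pixels
infra_index <- c(0, 0.5, 3, 8, 15)

# Hypothetical class breaks; the real ranges are listed in Table S1
breaks <- c(-Inf, 1, 6, 12, Inf)

# Left-closed intervals, labelled 0 (least infrastructure) to 3 (most)
hia <- cut(infra_index,
           breaks = breaks,
           labels = as.character(0:3),
           right = FALSE)

data.frame(infra_index, hia)
```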

We used the entire national data set for assessing the relationship between the HIA levels and the indicator values.
We then subsetted the national data set and chose three municipalities in south Norway to test out the indicators.
The municipalities differ in several aspects, such as the amount of mire area, the total area surveyed, and the prevalence of infrastructure (@tbl-munis, @fig-studyLocation2, S3, S4).
The nature type polygons with the indicator values were intersected with the HIA map and a map of municipal outlines in Norway.
The relationship between the HIA classes and indicator values was examined visually.
For each HIA and municipality combination we then created a probability distribution for the area weighted mean of the indicator values using Bayesian updating of a uniform prior between 0 and 1, informed by the standard deviation (SD) of the indicator values in the national data set (@fig-workflowExample C; Appendix C, line 134 <!--# update line number manually when re-rendering appendix C -->).
We refer to this step as the first of two *aggregation* steps (Figure S1).
The resulting distributions are assumed to be normally distributed, and we therefore simply carried the mean and SD from the posterior distributions over to individual polygons in the ecosystem delineation map, for each HIA class separately.
This step is referred to as *spreading* the data (Figure S1).
For each polygon we then sampled random numbers from a normal distribution with this same mean and SD.
The number of m^2^ dictated the number of samples for each polygon, thus ensuring that large polygons ended up counting more towards the indicator value in the given municipality.
We then drew 1000 random values from this vector of possible indicator values and created a probability distribution for the indicators for each municipality (the second aggregation step; Figure S1).
When there were no indicator values for a given HIA class, we ignored that class also in the municipal estimate.
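
The logic of the two aggregation steps and the spreading step can be illustrated with a minimal R sketch. This is not the code used in the analysis (see Appendix C for that); it assumes a grid approximation of the posterior, a normal likelihood with a known SD taken from the national data set, and made-up indicator values, areas and polygons. The helper `posterior_mean_sd()` and all input numbers are hypothetical.

```r
set.seed(1)

# Hypothetical helper: grid-approximated posterior for the mean indicator value
# in one HIA class. Assumes a Uniform(0, 1) prior and a normal likelihood whose
# SD is known from the national data set (sd_national).
posterior_mean_sd <- function(values, areas, sd_national, grid_n = 1001) {
  grid <- seq(0, 1, length.out = grid_n)        # candidate mean values
  prior <- rep(1, grid_n)                       # uniform prior between 0 and 1
  w_mean <- weighted.mean(values, w = areas)    # area-weighted sample mean
  se <- sd_national / sqrt(length(values))      # standard error given the known SD
  post <- prior * dnorm(w_mean, mean = grid, sd = se)
  post <- post / sum(post)                      # normalise so the weights sum to 1
  m <- sum(grid * post)
  s <- sqrt(sum((grid - m)^2 * post))
  c(mean = m, sd = s)
}

# First aggregation step: one posterior per HIA class (made-up survey data)
hia_post <- list(
  "1" = posterior_mean_sd(c(0.9, 0.8, 1.0), c(1200, 300, 500), sd_national = 0.25),
  "2" = posterior_mean_sd(0.4, 800, sd_national = 0.25)
)

# Made-up ecosystem delineation map: one row per mire polygon
polys <- data.frame(hia = c("1", "1", "2", "3"),
                    area_m2 = c(5000, 2000, 3000, 1000))

# HIA classes without indicator data (here HIA-3) are ignored
polys <- polys[polys$hia %in% names(hia_post), ]

# Spreading: one random draw per square metre, using the mean and SD of the
# posterior for the HIA class in which the polygon lies
draws <- unlist(lapply(seq_len(nrow(polys)), function(i) {
  p <- hia_post[[polys$hia[i]]]
  rnorm(polys$area_m2[i], mean = p[["mean"]], sd = p[["sd"]])
}))

# Second aggregation step: 1000 values drawn from the pooled vector
eaa <- sample(draws, 1000)
quantile(eaa, c(0.025, 0.5, 0.975))
```

In this sketch the HIA class with a single observation receives a wider posterior than the class with three observations, and larger polygons contribute proportionally more draws to the pooled vector.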

All the data preparation and analyses were done in RStudio [@positteam2024] and R version 4.3.0 [@rcoreteam2023], relying heavily on the packages tidyverse [@wickham2019], sf [@pebesma2018], and stars [@pebesma2023].

\pagebreak

\blandscape

| id  | Variable code         | Variable name                                    | Measurement unit                                                                         | Description                                                                                                                                                       | Reference                                |
|------------|------------|------------|------------|------------|------------|
| 1   | 7FA                   | Prevalence of alien species                      | Unit-less, ordinal, non-linear scale from 1 (no alien species) to 7 (only alien species) | The fraction of the species composition made up from alien species                                                                                                | @runehalvorsen2019                       |
| 2   | 7SE                   | Human caused abrasion or abrasion-caused erosion | Unit-less, ordinal, non-linear scale from 1 to 4.                                        | Measures the frequency of imagined 4 m^2^ quadrats laid over the area that has some sign of abrasion                                                              | @runehalvorsen2019                       |
| 3   | PRSL                  | *as above*                                       | Unit-less, ordinal, non-linear scale from 0 to 7.                                        | Same as 7SE, but recorded at a higher resolution                                                                                                                  | @norwegian_environmental_agency_protocol |
| 4   | 7TK                   | Tracks from large vehicles                       | Unit-less, ordinal, non-linear scale from 1 to 4.                                        | Measures the frequency of imagined 100 m^2^ quadrats laid over the area that has some signs of vehicle tracks                                                     | @runehalvorsen2019                       |
| 5   | PRTK                  | *as above*                                       | Unit-less, ordinal, non-linear scale from 0 to 7.                                        | Same as 7TK, but recorded at a higher resolution                                                                                                                  | @norwegian_environmental_agency_protocol |
| 6   | 7GR-GI                | Trenching intensity                              | Unit-less, ordinal scale from 1 to 5                                                     | Describes the effect that drainage ditches are estimated to have on the species composition and environmental variables once the system has reached its new equilibrium | @norwegian_environmental_agency_protocol |
| 7   | Infra-structure Index | Infrastructure Index                             | Unit-less linear scale from 0 to 13.2                                                    | Unit-less index ranging from 0 to 13.2                                                                                                                            | @erikstad_index_2023                     |
| 8   | HIA                   | Homogeneous Impact Area                          | Ordinal, non-linear scale from 1 to 4                                                    | A categorical representation of the infrastructure index                                                                                                          | this paper                               |

: Variables used in this study {#tbl-variables} {tbl-colwidths="\[5,10,15,25,25,20\]"}

{{< pagebreak >}}

| Municipality  | Total terrestrial area (km^2^)                                            | \% of terrestrial area surveyed                                            | \% open mires in relation to total terrestrial area                         | Total mire area (km^2^)                                                  | \% of mire area inside survey area                                                    | Number of mire polygons in survey                                 | Mean Infrastructure Index value                                         |
|---------|---------|---------|---------|---------|---------|---------|---------|
| Nordre Follo  | `r muni_tbl |> filter(Municipality == "Nordre Follo") |> pull(t_area_km)` | `r muni_tbl |> filter(Municipality == "Nordre Follo") |> pull(dk_percent)` | `r muni_tbl |> filter(Municipality == "Nordre Follo") |> pull(mirePercent)` | `r muni_tbl |> filter(Municipality == "Nordre Follo") |> pull(mire_km2)` | `r muni_tbl |> filter(Municipality == "Nordre Follo") |> pull(mireInSurvery_percent)` | `r muni_tbl |> filter(Municipality == "Nordre Follo") |> pull(n)` | `r muni_tbl |> filter(Municipality == "Nordre Follo") |> pull(meanHIA)` |
| Gran          | `r muni_tbl |> filter(Municipality == "Gran") |> pull(t_area_km)`         | `r muni_tbl |> filter(Municipality == "Gran") |> pull(dk_percent)`         | `r muni_tbl |> filter(Municipality == "Gran") |> pull(mirePercent)`         | `r muni_tbl |> filter(Municipality == "Gran") |> pull(mire_km2)`         | `r muni_tbl |> filter(Municipality == "Gran") |> pull(mireInSurvery_percent)`         | `r muni_tbl |> filter(Municipality == "Gran") |> pull(n)`         | `r muni_tbl |> filter(Municipality == "Gran") |> pull(meanHIA)`         |
| Nord-Aurdal   | `r muni_tbl |> filter(Municipality == "Nord-Aurdal") |> pull(t_area_km)`  | `r muni_tbl |> filter(Municipality == "Nord-Aurdal") |> pull(dk_percent)`  | `r muni_tbl |> filter(Municipality == "Nord-Aurdal") |> pull(mirePercent)`  | `r muni_tbl |> filter(Municipality == "Nord-Aurdal") |> pull(mire_km2)`  | `r muni_tbl |> filter(Municipality == "Nord-Aurdal") |> pull(mireInSurvery_percent)`  | `r muni_tbl |> filter(Municipality == "Nord-Aurdal") |> pull(n)`  | `r muni_tbl |> filter(Municipality == "Nord-Aurdal") |> pull(meanHIA)`  |

: Information for the three target municipalities in Norway {#tbl-munis}

\elandscape

# Results

## Indicator validity and trends in Norway

The three indicators showed some association with the HIAs when we looked at data from all of Norway with `r nrow(naturetypes)` individual mires.
The indicators mostly showed a worsening of condition with increasing presence of human infrastructure (@fig-validation).
This relationship was strongest for the indicator *Trenching*, and weakest for the indicator *Alien species*.
*Alien species* also had the highest indicator values overall, with most mires having no alien species recorded.
For *Trenching*, 57% of mires in HIA-3 had some trenches, whereas in HIA-0 this number was 10%.

## The three focal municipalities

Only in Gran municipality was there a significant difference in the *Trenching* indicator between HIAs, with HIA-2 having worse condition than HIA-1 (@fig-forest).
Conversely, the *ADSV* indicator in Gran showed worse condition in HIA-1 compared to HIA-2.
Nordre Follo had considerably fewer data points than Gran, and especially than Nord-Aurdal, which had the most, and this paucity of data is reflected in the wide credible intervals for all three indicators in Nordre Follo.
The credible intervals are widest for the *Trenching* indicator because the national data set showed more variation in the underlying variable 7GR-GI than in the other variables (1-5 in @tbl-variables), and this variation informs the Bayesian updating process (see @sec-methods).
Apart from Nordre Follo, which does not contain any HIA-0 areas (i.e. relatively wild areas), the municipalities had areas in all HIA classes.
Yet, none of the three municipalities had survey data from all HIAs.
Therefore, when we transferred the mean and SD for each HIA (@fig-forest) over to the ecosystem delineation map (@fig-spread-example), some mire polygons in the ecosystem delineation map were not assigned any data, and were therefore not included in (i.e. did not have any influence on) the subsequent aggregation to ecosystem accounting areas (EAA) (@fig-workflowExample, E).
At the municipality level (i.e. the EAA level) the three municipalities show some differentiation.
Nordre Follo had the highest (the best) indicator values for ADSV, but the lowest for Trenching, which was also the only instance of an indicator crossing the threshold from good to deteriorated condition (@fig-EEA-ridgePlot; Table S2).

```{r fig-workflowExample}
#| eval: true
#| include: true
#| out.width: '100%'
#| fig.cap:
#| - "Simplified schematic showing the workflow (Panes A-E) for spatial generalisation of a spatially biased ecosystem condition indicator. Pane A shows a spatially explicit, patchy and spatially biased indicator. The outline is the ecosystem accounting area (EAA). Pane B shows the location of four homogeneous impact areas (HIAs) inside the EAA. The indicator values in Pane A are used, in combination with the HIA map in Pane B, to update a uniform prior and produce a posterior probability distribution for the area-weighted mean indicator value for each HIA (Pane C; here simplified to only show the mean (circles) and 95% credible intervals (coloured bands)). The Bayesian updating is informed by the standard deviation for the indicator in a much bigger national data set. The width of the posterior distribution responds both to variation in the data and to the sample size, giving a realistic measure for the uncertainty around the indicator even with a single observation. The distributions in Pane C are assumed to be normally distributed. Note that because in this example there were no indicator data for HIA-3, indicator values are only aggregated for HIAs 0–2. In Pane D the mean and SD from the posterior distributions are transferred to individual polygons in an ecosystem delineation map, for each HIA separately. The colours indicate different indicator values (highlighted in the inset). For each ecosystem occurrence (i.e. ecosystem asset) in Pane D, we draw one random value for each square meter of area from a normal distribution with the mean and SD that are associated with that polygon. In Pane E we have randomly sampled 1000 values from the entire vector of samples in the previous step, and we get an area-weighted probability distribution for the mean indicator value for the EAA."

knitr::include_graphics("../images/aggregation-workflow.jpg")

```

```{r fig-validation}
#| eval: true
#| include: true
#| out.width: '90%'
#| fig.cap:
#| - "Proportion of localities with different indicator values for three 
#| ecosystem condition indicators along an ordinal gradient of increasing
#| infrastructure densities (Homogeneous Impact Areas 0–3). 
#| The data is from a national nature type survey in Norway." 

knitr::include_graphics("../images/validation-plot.jpg")
```

```{r fig-forest}
#| eval: true
#| include: true
#| out.width: '80%'
#| fig.cap:
#| - "Indicator values (circles = mean; bars = 95% credible intervals) for three
#| mire ecosystem condition indicators (rows) 
#| in three Norwegian municipalities (columns).
#| The indicator values 
#| are calculated uniquely for each Homogeneous Impact Area (HIA) in each 
#| municipality. The number to the left of each bar is the sample size, i.e. 
#| the number of surveyed mires."  

knitr::include_graphics("../images/forest-plot.jpg")
```

```{r fig-spread-example}
#| eval: true
#| include: true
#| out.width: '80%'
#| fig.cap:
#| - "An indicator for mire trenching shown for Nord-Aurdal municipality. 
#| Individual mire polygons are coloured by the mean indicator value for the
#| homogeneous impact area in which they lie. Colours are chosen to best reflect 
#| categorical differences and exaggerate the absolute difference between 
#| areas. The inset is just a visual aid. Coordinate reference system is 
#| EPSG 25832. Axes are in meters."

knitr::include_graphics("../images/spread-na-small.jpg")
```

```{r fig-EEA-ridgePlot}
#| eval: true
#| include: true
#| out.width: '80%'
#| fig.cap:
#| - "Distributions for three ecosystem condition indicators in the Norwegian 
#| municipalities. The colour gradient reflects the value of the x-axis. 
#| The dotted vertical line represents the threshold for what is considered 
#| reduced ecosystem condition (< 0.6). Vertical lines under the density curves
#| are 2.5%, 50% (the median) and 97.5% percentiles."

# Histogram smoothing makes it look like there are more data points beyond the lower and upper percentiles than there really is. The 2.5% percentile sometimes looks like the 25% percentile. 

knitr::include_graphics("../images/ridgePlot.jpg")
```

# Discussion

## The stratified GIS-based approach using HIAs

In this paper we have demonstrated a generalisable GIS-based workflow for making use of spatially biased field data to produce representative ecosystem condition indicators at a local municipal scale (@fig-workflowExample).
This approach is useful especially for local assessments where we do not want to use data from outside the EAA to inform or influence our estimates for condition indicators inside the EAA.
The method relies on stratifying the field data by Homogeneous Impact Areas (HIAs).
Our HIA was an ordinal gradient from low to high prevalence of infrastructure, and we found that indicator values decreased with increasing infrastructure, with a relationship ranging from weak, in the case of the *Alien species* indicator, to strong, for the indicator for *Trenching* in mires (the *ADSV* indicator falling somewhere in the middle) (@fig-validation).
This implies that the stratification was warranted in the latter case (*Trenching*), i.e. that field data from urban or sub-urban mires could not be said to represent the status of mires in more remote wilderness areas.
In the case of *Alien species*, stratifying the variable by the HIAs can be expected to have had a less directional effect on average, but we did find a difference between HIA-1 and HIA-2 in Nord-Aurdal municipality, indicating that the impact from alien species differs between these two strata of varying infrastructure (@fig-forest).
The drawback of using this stratification technique when the HIA variable is not a good predictor of variation in the indicator is that we end up with fewer data points, and therefore higher uncertainties around our indicator estimates, than we otherwise would.
However, we do not see this as a major problem.
Also, since our HIA areas are somewhat cohesive, forming a clumped or aggregated pattern, stratifying by it will on average lead to relatively more local data being used for each local indicator estimate, with less geographic extrapolation.
Therefore, we see little risk in using the stratification process also on indicators where the relationship between the indicator and the HIA is weak, although we stress that not accounting for other, potentially more relevant predictors of indicator values, would leave the indicator as spatially biased as the original variable.

## The benefits of using Bayesian updating

We have also demonstrated how Bayesian updating and re-sampling can be used instead of traditional arithmetic aggregation of indicators, from ecosystem assets to EAAs, to produce probability density distributions for indicator values.
A major benefit of this approach in our example was that it allowed us to present reliable uncertainty ranges for our indicator estimates even with very small sample sizes, thus making it possible to estimate indicator values for a much greater area than if we were forced to set a minimum sample size threshold.
An indicator estimate is of little value unless it is accompanied by a measure of uncertainty.
The variance is commonly used, usually calculated from the variation between spatially distant sampling points.
However, the variance is not reliable for low sample sizes.
For example, even if a numerical population has a large variance, it is still possible to sample, by chance alone, two or three numbers that are very close to each other, and this sample would then display a low variance which is not reflective of the true variance.
Knowing this, one approach is to set a sample size limit, and refrain from estimating indicator values unless there are more than a given number of data points.
In our test municipality Nordre Follo, we had only 1-3 data points for each HIA stratum, yet we were able to use these data regardless because the Bayesian updating produces a reliable, and relatively broad, uncertainty range for the estimate.
This is an advantage, especially for local ecosystem condition assessments where data scarcity is likely to be more of an issue than in for example national assessments.
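
The unreliability of the sample variance at small sample sizes can be illustrated with a short simulation. The sketch below is purely illustrative and assumes a hypothetical normally distributed indicator with a mean of 0.7 and a true SD of 0.25; none of these numbers come from our data.

```r
set.seed(42)

true_sd <- 0.25  # assumed population SD (hypothetical)

# SD of one random sample of size n from the assumed population
sample_sd <- function(n) sd(rnorm(n, mean = 0.7, sd = true_sd))

sd_n3  <- replicate(5000, sample_sd(3))   # very small samples
sd_n50 <- replicate(5000, sample_sd(50))  # moderately large samples

# Share of samples whose SD is less than half the true SD
share_n3  <- mean(sd_n3 < true_sd / 2)
share_n50 <- mean(sd_n50 < true_sd / 2)
c(n3 = share_n3, n50 = share_n50)
```

Under these assumptions a substantial share of the three-observation samples underestimates the SD by more than half, whereas this essentially never happens with 50 observations, which is why borrowing the SD from a large data set is attractive when local samples are small.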

## How to get more data for local Ecosystem Condition Accounts

The UN standard for ecosystem accounting has been tailored towards national accounts [@unitednations2021].
Similarly, reporting of Ecosystem Accounts to Eurostat, expected to become mandatory for all EU and EEA countries from 2026, is unlikely to require sub-national resolution.
Assessments at national scales can make use of data sets, and hence indicators, that local assessments cannot.
For example, national forest inventories or other national area representative field-based nature monitoring provide enough data points for reliable estimates of indicator values only at relatively large spatial scales.
Local ecosystem accounts instead rely more on remote sensing or field measurements.
Remote sensing holds great promise for delivering valuable information to ecosystem accounts across all scales, yet there are some ecosystem characteristics that cannot easily be sensed remotely.
This includes for example species identities, and hence indicators on topics such as alien species or occurrence of threatened or certain keystone species.
Fieldwork will therefore still be vital for collection of information for ecosystem accounts also in the future.

To increase efficacy of fieldwork at local scales - and hence, reduce costs - our results demonstrate that municipalities can stratify their field sampling according to the HIAs.
For example, Gran and Nord-Aurdal have not recorded any field data from their most urban areas (HIA 3) and could make these areas a priority in upcoming field campaigns in order to get more complete spatial indicator coverage.
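
As a minimal sketch of how such gaps could be identified, the R example below cross-tabulates surveyed mires by municipality and HIA class and lists the combinations without any data. The municipality names `A` and `B` and the survey records are made up for illustration; in practice one would also check that the municipality actually contains mire area in the HIA class in question.

```r
# Made-up survey records: one row per surveyed mire
survey <- data.frame(
  municipality = c("A", "A", "A", "B", "B"),
  hia = factor(c(0, 1, 1, 2, 3), levels = 0:3)  # keep all four HIA classes
)

# Number of surveyed mires per municipality and HIA class
counts <- table(survey$municipality, survey$hia)

# Long format, then keep the combinations without any surveyed mires
gaps <- as.data.frame(counts, stringsAsFactors = FALSE)
names(gaps) <- c("municipality", "hia", "n")
gaps <- gaps[gaps$n == 0, c("municipality", "hia")]
gaps
```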

We used a national data set of nature type surveys that follow a shared protocol [@norwegian_environmental_agency_protocol].
There can be other data sets, that may not be compiled nationally, but which could be relevant to local ECAs.
In some cases, a local ECA is mostly interested in describing and monitoring the status within its own EAA, but great benefits may be had if accounts can also be compared between EAAs, say between municipalities.
This kind of information can give grounds for a more regional assessment, for example looking at the effects of policies on ecosystem condition trajectories.
To accomplish this level of synthesis it helps to be able to use indicators that are common across EAAs.
This all depends on the right variables being collected in a similar way, for example across different projects or monitoring frameworks.
In Norway, ecosystem impact assessments (EIA) are frequently being produced in conjunction with planned development projects.
This data could be made very valuable for local ecosystem condition accounts, but currently there are several things stopping this.
Firstly, there is no standardised reporting for EIA in Norway, and no mandatory publication pathway, and therefore no way to obtain a unified data set of variables recorded across different EIAs.
If the data were made publicly available in a national database, they could be used to design indicators that are informative at multiple scales, also larger than the project-level scale at which they were recorded.
Second, the data collection is mainly on the occurrence of specific nature types or species of interest, and true condition parameters are not part of the information required by the authorities.
In contrast, other nations have implemented EIA or EIA equivalent systems that have much greater synergistic properties towards ecosystem accounting, such as the Biodiversity Net Gain program in England [@departmentforenvironmentfoodruralaffairs].
This program requires developers to record and compile data on ecosystem extent and condition at very fine resolution in order to document their effect on nature and ensure a positive net gain from their activities.
Relevant ecosystem condition variables are recorded for each delineated ecosystem area, providing an explicit link between ecosystem type and condition.

## Indicator values in the three pilot municipalities

The indicator values for our three indicators across the three municipalities showed in general a good condition in the mire ecosystems (@fig-EEA-ridgePlot).
The exception was the *Trenching* indicator in Nordre Follo, where the indicator value was 0.25, well below the threshold for good ecosystem condition (0.6).
This estimate is based on just 4–5 data points (i.e. mires), and this is also reflected in the wide credible intervals \[0.04 – 0.50\].
Nordre Follo has surveyed more of its total land area compared to the other two municipalities, but has less mire area to begin with (@tbl-munis), which is the reason for this low sample size.
We believe this indicator estimate is trustworthy, considering the wide credible intervals, and that it could reflect the true prevalence of trenching in Nordre Follo.
Future work could test this assumption by simulating the effect of varying the sample size for an EAA where all the mires are sampled.

Mire condition in Norway has not been assessed at a national scale.
Still, we know that mires are subject to multiple threats, especially land conversion into agriculture, forestry, or development, and we see this reflected in the Norwegian Red List for Nature types, where many ombrotrophic bog types and rich southern minerotrophic fens are listed [@artsdatabanken2018].
Land conversion of mires is often initiated with trenching to lower the water table, but note that our *Trenching* indicator does not document the effect of trenching on former mires that have since transitioned into another ecosystem.
This is in line with the UN standard for ecosystem accounting, where these effects should be captured by the extent accounts [@unitednations2021].
Therefore, the negative consequences of trenching on all Norwegian mires, old and current, must be higher than what our indicator is made to show, most likely considerably higher.

Alien species risk is not considered a threat to any of the Norwegian mire types, and our results confirm this, at least for the present (@fig-EEA-ridgePlot).
Motorised vehicle traffic and other forms of human transportation are also threats listed for some mire types in the red list assessment, and we see some reduction in the ADSV indicator, most notably in Gran and in areas closer to infrastructure (@fig-forest).
Still, the indicator values are consistently above the threshold value for good ecological condition (@fig-EEA-ridgePlot).

# Conclusion

In conclusion, we have demonstrated a new GIS-based workflow for constructing ecosystem condition indicators from spatially biased variables using a method of stratified aggregation by an ordinal gradient of Homogeneous Impact Areas (HIA).
In addition, we show how Bayesian updating can help produce uncertainty estimates for condition indicators that suffer from small sample sizes where traditional variances would be unreliable.
Our workflow can also help guide local field efforts so that future local ecosystem condition accounts become better and less resource-demanding.

# CRediT authorship contribution statement {.unnumbered}

**Anders L. Kolstad**: Conceptualization, Methodology, Software, Validation, Formal analysis, Data Curation, Writing - Original Draft, Visualization, Project administration, Funding acquisition.
**Matthew Grainger**: Validation, Writing - Review & Editing.
**Marianne Evju**: Conceptualization, Writing - Review & Editing.

# Declaration of Competing Interest {.unnumbered}

We have no competing interests to declare.

# Acknowledgements {.unnumbered}

We wish to thank Kwaku Peprah Adjei for statistical help, and Hanno Sandvik for code review.

# Data availability {.unnumbered}

This manuscript is written in Quarto, and the source file (Appendix B) also contains code for all the analyses underlying this study, including data exploration and cleaning.
For a rendered version of the source file, with all code and calculations visible, see Appendix C. The source files and the data are also located on GitHub (https://github.com/anders-kolstad/HIAs) with an archived stable version on Zenodo (doi:10.5281/zenodo.11504703).
The exception is the large nature type survey data, which was downloaded locally, but which is freely available (see @norwegian_environmental_agency_naturtyper_2024).

{{< pagebreak >}}

# References {.unnumbered}

::: {#refs}
:::