# Fireveg DB - overview of taxonomic coverage

Author: [José R. Ferrer-Paris](https://github.com/jrfep) and [Ada Sánchez-Mercado](https://github.com/adasanchez)

Date: July 2024

This Jupyter Notebook includes R code to visualise data from the Fireveg Database. 

The input is loaded from a public data record of the database.

We use this code to ...

## Set-up

### Load packages

In [1]:
library(ggplot2)
library(dplyr)
require(tidyr)
library(igraph)
library(ggraph)


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Loading required package: tidyr


Attaching package: ‘igraph’


The following object is masked from ‘package:tidyr’:

    crossing


The following objects are masked from ‘package:dplyr’:

    as_data_frame, groups, union


The following objects are masked from ‘package:stats’:

    decompose, spectrum


The following object is masked from ‘package:base’:

    union




We use the `IRdisplay` package to [display Markdown from a code cell](https://stackoverflow.com/questions/35786496/using-r-in-jupyter-display-markdown-in-loop?rq=4).

In [2]:
library(IRdisplay)

We use the following packages to download data from cloud storage:

In [3]:
require(osfr)
library(jsonlite)

Loading required package: osfr

Automatically registered OSF personal access token



### Paths for inputs and outputs

Locate the root directory of the repo

In [4]:
here::i_am("Notebooks/Overview-taxonomic-coverage.ipynb")

here() starts at /Users/z3529065/proyectos/fireveg/fireveg-analysis



Relative path to local data files within project repository

In [5]:
data_dir <- here::here("data")
if (!dir.exists(data_dir))
    dir.create(data_dir)

### Download data from OSF

 <div class="alert alert-info">
     <img src='../img/open-data-2.png' width=25 alt="open data icon"/>
Data for this Notebook is available from the following OSF repository:

> Ferrer-Paris, J. R., Keith, D., & Sánchez-Mercado, A. (2024, August 15). Export data records from Fire Ecology Traits for Plants database. Retrieved from [osf.io/h96q2](https://osf.io/h96q2/)
</div>

Here we download data programmatically from OSF cloud storage to our local data folder. First we check the metadata for the target files: the `osf_ls_files` function from package `osfr` lists the files associated with the OSF component, along with their metadata.

In [6]:
osf_project <- osf_retrieve_node("https://osf.io/h96q2")
file_list <- osf_ls_files(osf_project)

In [7]:
select(file_list, name, id)

name,id
<chr>,<chr>
fire-history.rds,6452ba9d13904f00b7fc85d2
Quadrat-sample-data.rds,6452bab38ea16b0093b69427
site-visits.rds,6452bac07177850087b0f73c
Summary-traits-family.rds,6452bacfb30b4900b4b9ddc4
Summary-traits-species.rds,6452bae3717785008bb0f4b1
field-sites.gpkg,648a583bbee36d028d0e6261
Summary-traits-sources.rds,64966f6fa2a2f4075a436743
Trait-info.rds,649a64e8a2a2f40aa7436407
References-traits-sources.rds,66c8198039554f1e062d2f46
Summary-traits-species-orders.rds,66c829ef6725569184c5ca7a


We will select a subset of files to download

In [8]:
files_to_download <- c(
    "Trait-info.rds",
    "Summary-traits-sources.rds", 
    "References-traits-sources.rds",
    "Quadrat-sample-data.rds",
    "Summary-traits-species.rds",
    "Summary-traits-species-orders.rds"
)

In [9]:
selected_files <- filter(file_list, name %in% files_to_download)

To download the latest version we apply the `osf_download` function with option `conflicts="overwrite"`. 
If we already have the latest version we can choose option `conflicts="skip"`.

In [10]:
downloaded_files <- osf_download(selected_files,
             data_dir,
             conflicts = "overwrite")
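The choice between `"overwrite"` and `"skip"` can also be made in code. The following is a minimal sketch, not part of `osfr`: the helper `choose_conflicts` and its `refresh` argument are hypothetical names introduced here for illustration. It only checks whether local copies exist, so `"skip"` may keep an outdated file.

```r
# Hypothetical helper (not part of osfr): pick a value for the `conflicts`
# argument depending on whether all target files already exist locally.
choose_conflicts <- function(file_names, dir, refresh = FALSE) {
  all_present <- all(file.exists(file.path(dir, file_names)))
  if (all_present && !refresh) "skip" else "overwrite"
}

# Toy demonstration in a temporary directory (no download involved)
tmp_dir <- tempfile("fireveg-data-")
dir.create(tmp_dir)
wanted <- c("Trait-info.rds", "Summary-traits-species.rds")

policy_before <- choose_conflicts(wanted, tmp_dir)   # nothing downloaded yet
file.create(file.path(tmp_dir, wanted))              # simulate local copies
policy_after  <- choose_conflicts(wanted, tmp_dir)   # local copies exist
policy_forced <- choose_conflicts(wanted, tmp_dir, refresh = TRUE)
```

In this notebook the result could be passed on as `osf_download(selected_files, data_dir, conflicts = choose_conflicts(files_to_download, data_dir))`.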


**What about older versions?**

We can request more complete version information with a direct call to the API using the `read_json` function. For example, the versions of the first downloaded file are available here:

In [11]:
file_versions <- read_json(downloaded_files$meta[[1]]$relationships$versions$links$related$href)

And we could use these URLs to download specific versions:

In [12]:
results <- lapply(file_versions$data, function(x) {
    sprintf("version id %s from %s available at %s\n",
            x$id,
            x$attributes$date_created,
            x$links$download)
})
for (res in results) 
    cat(res)

version id 7 from 2024-08-23T09:56:59.235285 available at https://osf.io/download/6452bab38ea16b0093b69427/?revision=7
version id 6 from 2024-08-23T04:45:32.377746 available at https://osf.io/download/6452bab38ea16b0093b69427/?revision=6
version id 5 from 2024-08-22T12:27:43.452137 available at https://osf.io/download/6452bab38ea16b0093b69427/?revision=5
version id 4 from 2024-08-14T21:45:11.547888 available at https://osf.io/download/6452bab38ea16b0093b69427/?revision=4
version id 3 from 2023-09-11T09:37:38.971966 available at https://osf.io/download/6452bab38ea16b0093b69427/?revision=3
version id 2 from 2023-06-01T04:36:35.649577 available at https://osf.io/download/6452bab38ea16b0093b69427/?revision=2
version id 1 from 2023-05-03T19:49:07.778770 available at https://osf.io/download/6452bab38ea16b0093b69427/?revision=1
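To pick one of these versions programmatically, the list can be turned into a data frame and filtered by date. The sketch below uses a mock object that mimics only the three fields used above (`id`, `attributes$date_created`, `links$download`), with values copied from the printed output. The cut-off date and the destination file name are assumptions chosen for illustration.

```r
# Mock object mimicking the fields of `file_versions$data` used above;
# values are copied from the printed output, not fetched from the API.
mock_versions <- list(
  list(id = "7",
       attributes = list(date_created = "2024-08-23T09:56:59.235285"),
       links = list(download = "https://osf.io/download/6452bab38ea16b0093b69427/?revision=7")),
  list(id = "1",
       attributes = list(date_created = "2023-05-03T19:49:07.778770"),
       links = list(download = "https://osf.io/download/6452bab38ea16b0093b69427/?revision=1"))
)

# One row per version: number, creation date and download URL
versions_df <- do.call(rbind, lapply(mock_versions, function(x) {
  data.frame(
    version = as.integer(x$id),
    created = as.Date(substr(x$attributes$date_created, 1, 10)),
    url = x$links$download,
    stringsAsFactors = FALSE
  )
}))

# Pick the most recent version created on or before a cut-off date
cutoff <- as.Date("2024-01-01")  # assumed cut-off, for illustration only
eligible <- versions_df[versions_df$created <= cutoff, ]
chosen <- eligible[which.max(eligible$created), ]

# Not run here (needs network access); the destination name is hypothetical:
# download.file(chosen$url, destfile = "Quadrat-sample-data-v1.rds", mode = "wb")
```

With the real object, `file_versions$data` would take the place of `mock_versions`.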


### Read data from local folder

The data is now available in our local data folder and we can use `readRDS` to read these files into our R session:

In [13]:
traits_table <- readRDS(here::here(data_dir,'Summary-traits-sources.rds'))
references <- readRDS(here::here(data_dir,"References-traits-sources.rds"))
quadrat_samples <- readRDS(here::here(data_dir,"Quadrat-sample-data.rds"))
spp_traits_table <- readRDS(here::here(data_dir,"Summary-traits-species.rds"))
trait_info <- readRDS(here::here(data_dir,"Trait-info.rds"))

In [14]:
sptraits <- readRDS(here::here(data_dir,"Summary-traits-species-orders.rds")) %>%
  rowwise() %>% 
  mutate(litdata=sum(c_across(germ8:surv1))>0) %>%
  ungroup() %>%
  mutate(
    fielddata=nquadrat>0,
    kldg=case_when(
      fielddata & litdata ~ "both",
      fielddata ~ "field",
      litdata ~ "literature",
      TRUE ~ "none"
    )
  )


In [22]:
dim(sptraits)
dim(spp_traits_table)
sptraits |>
    filter(taxonrank %in% "Species") |>
    group_by(fielddata, litdata) |>
    summarise(n_distinct(current_species), .groups = "drop") 
# this can introduce errors due to double counting of a species with multiple names
# for example, if two names are synonyms but one name has data and the other hasn't

fielddata,litdata,n_distinct(current_species)
<lgl>,<lgl>,<int>
False,False,6140
False,True,5484
True,False,100
True,True,793


In [23]:
5484+100+793
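The sum above (6377) is larger than the 6287 distinct current species with any data reported further down in this notebook. One plausible reason is the double counting mentioned in the code comment. The toy example below illustrates the effect with made-up names, where `"Old name"` is assumed to be a synonym of `"Species A"`; it is a sketch of the mechanism, not a check of the actual data.

```r
library(dplyr)

# Toy data (hypothetical names): "Old name" is a synonym of "Species A"
toy <- data.frame(
  spp             = c("Species A", "Old name", "Species B", "Species C"),
  current_species = c("Species A", "Species A", "Species B", "Species C"),
  fielddata       = c(TRUE, FALSE, FALSE, FALSE),
  litdata         = c(TRUE, TRUE, TRUE, FALSE)
)

# Counting within groups: "Species A" is counted in two different groups
per_group <- toy |>
  group_by(fielddata, litdata) |>
  summarise(n = n_distinct(current_species), .groups = "drop")
sum_of_groups <- sum(per_group$n[per_group$fielddata | per_group$litdata])

# Counting once across all records that have any data
overall <- toy |>
  filter(fielddata | litdata) |>
  summarise(n = n_distinct(current_species)) |>
  pull(n)
```

Here `sum_of_groups` exceeds `overall` because the same current species falls into two groups under different original names.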

## Trait information

The data frame `trait_info` includes descriptions of all traits. Here we show the priority traits that are already uploaded in the current version of the database.

In [27]:
tbl_trait_info <- trait_info %>% 
  filter(!is.na(priority)) %>%
  rowwise() %>% 
  mutate(Code=code, Trait=name, 
         Description = description,
            `Classification` = paste( 
              life_stage,
              life_history_process, 
              sep="/")) %>%
  ungroup() %>% 
    arrange(desc(life_history_process),Code) %>% 
  select(Code, Trait, Classification, Description) %>%
  knitr::kable()
    
display_markdown(paste(as.character(tbl_trait_info), collapse="\n"))

|Code   |Trait                                                |Classification              |Description                                                                                                                                   |
|:------|:----------------------------------------------------|:---------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------|
|surv1  |Resprouting - full canopy scorch                     |Standing plant/Survival     |Ordinal categories of survival and resprouting proportions for plants subjected to 100% canopy scorch                                         |
|surv4  |Regenerative Organ                                   |Standing plant/Survival     |NA                                                                                                                                            |
|surv5  |Standing plant longevity (Max)                       |Standing plant/Survival     |Age at which 50% of individuals in a cohort (excluding outliers) have died from senescence                                                    |
|surv6  |Seedbank half-life                                   |Seed/Survival               |Age at which 50% of a seed cohort in an in situ seedbank have decayed or become inviable                                                      |
|surv7  |Seed longevity                                       |Seed/Survival               |Age at which all seeds in a cohort (excluding outliers, e.g. 95th percentile) have decayed or become inviable                                 |
|repr2  |Post-fire flowering response                         |Standing plant/Reproduction |NA                                                                                                                                            |
|repr3  |Age at first flower production (from seed)           |Standing plant/Reproduction |The time taken for first individual in a recruitment cohort to produce their first reproductive organs (e.g. flowers, sporophylls)            |
|repr3a |Time to first postfire reproduction (from resprouts) |Standing plant/Reproduction |The time taken after fire for first reproductive organs (e.g. flowers, sporophylls) to be produced in a population of resprouting individuals |
|repr4  |Maturation age                                       |Standing plant/Reproduction |The time taken for 50% of individuals in a cohort [even aged recruits] to produce their first viable seed                                     |
|rect2  |Establishment pattern                                |Seedling/Recruitment        |The temporal pattern of seedling or clonal establishment through the fire cycle                                                               |
|grow1  |Age to develop regenerative/resistance organs        |Standing plant/Growth       |The time taken to develop organs or tissues enabling at least 50% of cohort survival when fully scorched in a fire                            |
|germ1  |Seedbank Type                                        |Seed/Germination            |NA                                                                                                                                            |
|germ8  |Seed dormancy type                                   |Seed/Germination            |NA                                                                                                                                            |
|disp1  |Propagule dispersal mode                             |Seed/Dispersal              |Propagule dispersal mode                                                                                                                      |

## Plant species in NSW according to BioNet

The data frame `spp_traits_table` is based on the BioNet Atlas list of species.

This list includes around 8170 distinct taxa (based on current taxonomic status) at the species level which are considered native and alive in NSW. It also includes around 1250 infra-species level taxa considered to be native and alive in NSW.

In [13]:
spp_traits_table |>
    group_by(`species level`= taxonrank %in% "Species",establishment) |>
    summarise(records=n(), `original names` = n_distinct(spp), `current names`=n_distinct(current_species), .groups = "drop") 

species level,establishment,records,original names,current names
<lgl>,<chr>,<int>,<int>,<int>
False,"Alive in NSW, Native",1813,1813,1251
False,"Extinct in NSW, Native",9,9,4
False,Hybrid,1,1,1
False,Introduced,399,399,242
False,Not Known from NSW,5,5,5
True,"Alive in NSW, Native",10556,10556,8169
True,"Extinct in NSW, Native",38,38,28
True,Introduced,3309,3309,2804
True,Not Known from NSW,27,27,26


We can filter this table by considering how many species have at least one record in the field sample (column `nquadrat` in this dataframe).

In [14]:
spp_traits_table |>
    filter(nquadrat>0) |>
    group_by(`species level`= taxonrank %in% "Species",establishment) |>
    summarise(records=n(), `original names` = n_distinct(spp), `current names`=n_distinct(current_species), .groups = "drop") 

species level,establishment,records,original names,current names
<lgl>,<chr>,<int>,<int>,<int>
False,"Alive in NSW, Native",68,68,68
False,Introduced,2,2,2
True,"Alive in NSW, Native",812,812,806
True,Introduced,84,84,81


And we can do the same for the species with at least one record from existing sources in the list of references and authors:

In [15]:
spp_traits_table |>
    mutate(
        `existing sources` = (disp1 +
                           germ1 + germ8 + 
                           rect2 + 
                           grow1 + 
                           repr2 + repr3a + repr3 + repr4 +  
                           surv1 + surv4 + surv5 + surv6 + surv7) > 0
    ) |>
    filter(`existing sources`>0) |>
    group_by(`species level`= taxonrank %in% "Species",establishment) |>
    summarise(records=n(), `original names` = n_distinct(spp), `current names`=n_distinct(current_species), .groups = "drop") 

species level,establishment,records,original names,current names
<lgl>,<chr>,<int>,<int>,<int>
False,"Alive in NSW, Native",1019,1019,844
False,"Extinct in NSW, Native",7,7,4
False,Introduced,59,59,46
False,Not Known from NSW,1,1,1
True,"Alive in NSW, Native",5854,5854,5509
True,"Extinct in NSW, Native",22,22,20
True,Introduced,696,696,669
True,Not Known from NSW,17,17,16
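The long sum of trait columns can be written more compactly with `rowSums()` and `across()`. The sketch below uses a made-up table with only three of the trait columns. Note one difference in behaviour: with `na.rm = TRUE` a missing value counts as zero, whereas the `+` version returns `NA` for that row. Whether the trait columns contain missing values is not checked here.

```r
library(dplyr)

# Toy table with three of the trait count columns (values are made up)
toy_traits <- data.frame(
  spp   = c("sp1", "sp2", "sp3"),
  disp1 = c(0, 2, 0),
  germ1 = c(0, 0, 0),
  surv1 = c(1, 0, 0)
)
trait_cols <- c("disp1", "germ1", "surv1")

# Flag taxa with at least one record in any of the selected trait columns
flagged <- toy_traits |>
  mutate(`existing sources` = rowSums(across(all_of(trait_cols)), na.rm = TRUE) > 0)
```

Keeping the column names in a vector such as `trait_cols` also makes it easier to reuse the same set of traits in later cells.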


If we focus only on the taxa at the species level which are native and alive in NSW, we can look at the overlap between field work data and existing sources:

In [66]:
spp_traits_table |> 
    filter(
           taxonrank %in% "Species",
           establishment %in% "Alive in NSW, Native") |>
    mutate(
     fieldwork_sources =  nquadrat>0,
     literature_sources = (disp1 +
                           germ1 + germ8 + 
                           rect2 + 
                           grow1 + 
                           repr2 + repr3a + repr3 + repr4 +  
                           surv1 + surv4 + surv5 + surv6 + surv7) > 0
    ) |> 
    group_by(fieldwork_sources, literature_sources) |>
    summarise(total = n_distinct(scientific_name), total_current = n_distinct(current_species), .groups = "drop")

fieldwork_sources,literature_sources,total,total_current
<lgl>,<lgl>,<int>,<int>
False,False,4574,3844
False,True,5129,4843
True,False,87,87
True,True,724,723


In [77]:
total_records <- spp_traits_table |> 
    filter(
           taxonrank %in% "Species") |>
    mutate(
     any_sources = (nquadrat +
                           disp1 +
                           germ1 + germ8 + 
                           rect2 + 
                           grow1 + 
                           repr2 + repr3a + repr3 + repr4 +  
                           surv1 + surv4 + surv5 + surv6 + surv7) > 0
    ) |>
    filter(any_sources>0) |>
    summarise(total_current=n_distinct(current_species))

In [78]:
total_records

total_current
<int>
6287


In [24]:
total_records <- sptraits |> 
    filter(
           taxonrank %in% "Species") |>
    filter(fielddata | litdata) |>
    summarise(total_current=n_distinct(current_species))

In [25]:
total_records

total_current
<int>
6287


## Comparing existing trait records from different sources

These sources are mentioned in the primary source column, but bibliographic details are incomplete or missing from our database:

In [17]:
traits_table |> 
    filter(!primary_source %in% references$ref_code) |>
    distinct(primary_source) |> arrange(primary_source) |> pull() # sort alphabetically
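The same check can be expressed as a set difference. The following base R sketch uses made-up source codes standing in for `traits_table$primary_source` and `references$ref_code`.

```r
# Toy inputs (hypothetical codes) standing in for traits_table and references
toy_traits <- data.frame(
  primary_source = c("Smith 2001", "Lee 1999", "Smith 2001", "Zed 2010")
)
toy_refs <- data.frame(ref_code = c("Smith 2001"))

# Sources cited in the trait records but absent from the reference list
missing_refs <- sort(setdiff(toy_traits$primary_source, toy_refs$ref_code))
```

`setdiff()` already drops duplicates, so no separate `distinct()` step is needed.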


In [18]:
traits_table |>
    filter(!primary_source %in% 'Kenny Orscheg Tasker Gill Bradstock 2014', # this is the same as NSWFFRDv2.1
           primary_source %in% references$ref_code) |> # exclude transcription errors
    group_by(main_source) |>
    summarise(
        total=n(), 
        records=n_distinct(rid), 
        species=n_distinct(current_species), 
        sources=n_distinct(primary_source))

main_source,total,records,species,sources
<chr>,<int>,<int>,<int>,<int>
NSWFFRDv2.1,12175,10391,2860,206
austraits-6.0.0,38532,37883,6848,102


In [75]:
summary_per_source <- traits_table |> 
    filter(taxonrank %in% "Species") |>
    group_by(main_source) |>
    summarise(
        records=n_distinct(rid),
        traits = n_distinct(traitcode),
        species=n_distinct(current_species), 
        sources=n_distinct(primary_source))
NSWFFRD_records <- filter(summary_per_source, main_source %in% 'NSWFFRDv2.1')
Austraits_records <- filter(summary_per_source, main_source %in% 'austraits-6.0.0')

In [76]:
summary_per_source

main_source,records,traits,species,sources
<chr>,<int>,<int>,<int>,<int>
NSWFFRDv2.1,9486,12,2549,203
austraits-6.0.0,39820,5,6173,153


## Field work data

In [51]:
summary_per_survey <- quadrat_samples |> 
    filter(!is.na(species_code),
           taxonrank %in% "Species") |>
    group_by(survey_group=survey_name %in% "Mallee Woodlands") |>
    summarise(
        locations = n_distinct(visit_id),
        visits  = n_distinct(visit_id, visit_date),
              samples  = n_distinct(visit_id, visit_date, sample_nr),
              records = n(), 
              species = n_distinct(species),
              current_species = n_distinct(current_species), 
              codes = n_distinct(species_code)) |> 
    arrange(survey_group)
Postfire_samples <- filter(summary_per_survey, !survey_group)
Mallee_samples <- filter(summary_per_survey, survey_group)

In [52]:
summary_per_survey

survey_group,locations,visits,samples,records,species,current_species,codes
<lgl>,<int>,<int>,<int>,<int>,<int>,<int>,<int>
False,85,85,714,8936,773,765,773
True,61,85,510,7054,129,129,129


Summary of data inputs

In [74]:
new_surveys <- quadrat_samples |> filter(!survey_name %in% c("Mallee Woodlands"))
old_surveys <- quadrat_samples |> filter(survey_name %in% c("Mallee Woodlands"))
total_spp <- pull(total_records,total_current)

tbl = sprintf("
| Type	| Unit of observation	| Spatial information	| Number of records	| Number of taxa (including non-native)	| Data source |
|---|---|---|---|---|---|
| **Primary Observations** |
| Post-fire field surveys | Individual | %s sites | %s | %s  | East coast post-fire surveys 2020-2022 |
| Time series field observations | Individual | %s sites | %s | %s | Mallee vegetation dynamics 2007-2018 [@Keith_Tozer_2012] |
| **Compilations** |
| Fire response 	| Species	| Not applicable	| %s	| %s	 | %s sources compiled in NSW plant fire response database [@Kenny2014] | 
|Species traits	| Individuals / Populations / Species	| Variable |	%s	| %s	 | %s sources compiled in AusTraits plant database [@Falster2021] |
| Total | |  |  | %s  |
", 
              Postfire_samples$locations, Postfire_samples$records, Postfire_samples$current_species,
            Mallee_samples$locations, Mallee_samples$records, Mallee_samples$current_species,
              NSWFFRD_records$records, NSWFFRD_records$species, NSWFFRD_records$sources,
              Austraits_records$records, Austraits_records$species, Austraits_records$sources,
               total_spp
             )

display_markdown(tbl)



| Type	| Unit of observation	| Spatial information	| Number of records	| Number of taxa (including non-native)	| Data source |
|---|---|---|---|---|---|
| **Primary Observations** |
| Post-fire field surveys | Individual | 85 sites | 8936 | 765  | East coast post-fire surveys 2020-2022 |
| Time series field observations | Individual | 61 sites | 7054 | 129 | Mallee vegetation dynamics 2007-2018 [@Keith_Tozer_2012] |
| **Compilations** |
| Fire response 	| Species	| Not applicable	| 9130	| 2325	 | 203 sources compiled in NSW plant fire response database [@Kenny2014] | 
|Species traits	| Individuals / Populations / Species	| Variable |	37555	| 5472	 | 142 sources compiled in AusTraits plant database [@Falster2021] |
| Total | |  |  | 6287  |


## That is it for now!

✅ Job done! 😎👌🔥

You can:
- go [back home](../Instructions-and-workflow.ipynb),
- continue navigating the repo on [GitHub](https://github.com/ces-unsw-edu-au/fireveg-db-exports),
- continue exploring the repo on [OSF](https://osf.io/h96q2/), or
- visit the database at <http://fireecologyplants.net>.