# Fireveg DB imports -- import trait data from AusTraits

Author: [José R. Ferrer-Paris](https://github.com/jrfep)

Date: July 2024, updated 19 August 2024

This Jupyter Notebook includes [Python](https://www.python.org) code to populate fire ecology traits for plants in the Fireveg database. 

We will download data from [AusTraits](https://austraits.org/) and add entries to the database for several traits and for each species.

**Please note:**
<div class="alert alert-warning">
    This repository contains code that is intended for internal project management and is documented for the sake of reproducibility.<br/>
    🛂 Only users contributing directly to the project have access to the credentials for data download/upload. 
</div>

## Set-up
### Load modules

We are using Python for this. Start your session and load the packages.

In [1]:
# work with paths in operating system
from pathlib import Path
import os, sys

import json
import urllib
from zipfile import ZipFile
import pandas as pd
import numpy as np
from pybtex.database.input import bibtex
import yaml
import psycopg2

# Pyprojroot for easier handling of working directory
import pyprojroot

### Define paths for input and output

Define project directory using the `pyprojroot` functions, and add this to the execution path.

In [2]:
repodir = pyprojroot.find_root(pyprojroot.has_dir(".git"))
sys.path.append(str(repodir))
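
For readers without `pyprojroot`, the idea behind locating the project root can be sketched with the standard library alone. The helper name `find_repo_root` and the temporary directory layout below are hypothetical and only serve to illustrate the upward search for a `.git` folder; this is not how `pyprojroot` is implemented.

```python
from pathlib import Path
import tempfile


def find_repo_root(start: Path, marker: str = ".git") -> Path:
    """Walk upwards from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No parent of {start} contains a '{marker}' directory")


# Demonstration on a throw-away directory tree (hypothetical layout)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".git").mkdir()
    nested = root / "notebooks" / "imports"
    nested.mkdir(parents=True)
    found = find_repo_root(nested)
    root_was_found = (found == root)
```

Searching from the notebook's own folder means the notebook can be moved within the repository without editing any paths.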

Path to the folder with the downloaded data:

In [3]:
inputdir = repodir / "data" 

### Load own functions
Load functions from the `lib` folder. We will use one function to read the database credentials, two for executing database queries and uploading records, and a module of helper functions for extracting data from the reference description string.

In [4]:
from lib.parseparams import read_dbparams
from lib.firevegdb import dbquery, batch_upsert
import lib.austraits_util as aust

### Database credentials

🤫 We use a folder named "secrets" to keep the credentials for connection to different services (database credentials, API keys, etc). We listed this folder in our `.gitignore` so that its contents are not tracked by git and not exposed. Future users need to copy the contents of this folder manually.

We read database credentials stored in a `database.ini` file using our own `read_dbparams` function.

In [5]:
dbparams = read_dbparams(repodir / 'secrets' / 'database.ini', 
                         section='fireveg-db-v1.1')
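
The code of `read_dbparams` is not shown in this notebook. As a rough sketch of what such a reader might do, the example below uses `configparser` from the standard library to turn one section of an INI file into a dictionary. The function name `read_ini_section`, the section name `example-db` and the keys in the file are all made up for illustration.

```python
from configparser import ConfigParser
from pathlib import Path
import tempfile


def read_ini_section(filename, section):
    """Return the key/value pairs of one section of an INI file as a dict."""
    parser = ConfigParser()
    if not parser.read(filename):
        raise FileNotFoundError(f"Could not read {filename}")
    if not parser.has_section(section):
        raise KeyError(f"Section '{section}' not found in {filename}")
    return dict(parser.items(section))


# Write a small example file with made-up values, then read it back
with tempfile.TemporaryDirectory() as tmp:
    ini = Path(tmp) / "database.ini"
    ini.write_text("[example-db]\nhost = localhost\ndbname = example\nuser = reader\n")
    example_params = read_ini_section(ini, "example-db")
```

A dictionary of this shape can be unpacked into a connection call, in the same way that `dbparams` is passed as `psycopg2.connect(**dbparams)` later in this notebook.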

## Read data

### Read reference taxonomic data

We will also read the updated species data from BioNET:

In [6]:
BIONET = pd.read_excel(inputdir / "vis-survey-datasheet-6000.PowerQuery.20210708.xlsx")

### Read _austraits_ data 
We downloaded the file from the [Zenodo repository](https://zenodo.org/record/5112001) using the API URL and saved it under the data folder. We used version 3.0.2 before; here we try version 6.0.0.

We will read from the zipfile the data that we need:

In [7]:
zfobj = ZipFile(inputdir / "austraits" / "austraits-6.0.0.zip" ) 
zfobj.namelist()

['austraits-6.0.0/',
 'austraits-6.0.0/schema.yml',
 'austraits-6.0.0/taxa.csv',
 'austraits-6.0.0/methods.csv',
 'austraits-6.0.0/definitions.yml',
 'austraits-6.0.0/build_info.md',
 'austraits-6.0.0/contributors.csv',
 'austraits-6.0.0/contexts.csv',
 'austraits-6.0.0/excluded_data.csv',
 'austraits-6.0.0/locations.csv',
 'austraits-6.0.0/traits.csv',
 'austraits-6.0.0/taxonomic_updates.csv',
 'austraits-6.0.0/metadata.yml',
 'austraits-6.0.0/sources.bib']

We will need to read the files with the definitions (in _yaml_ format), the sources or references (in _bibtex_ format) and the traits and taxonomic data (in _csv_ format).

In [8]:
with zfobj.open('austraits-6.0.0/definitions.yml') as file:
    try:
        ATdefinitions = yaml.safe_load(file)   
    except yaml.YAMLError as exc:
        print(exc)

Here we parse the bibliography file. The parsed entries are used later by the functions that extract the reference information and the reference label.

In [9]:
parser = bibtex.Parser()
ATrefs = parser.parse_bytes(zfobj.open('austraits-6.0.0/sources.bib').read())

Now we read the trait data:

In [10]:
ATtraits = pd.read_csv(zfobj.open('austraits-6.0.0/traits.csv'),
                       low_memory=False,
                       encoding="ISO-8859-1")
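
Reading a member of a zip archive without extracting it to disk can also be done with the standard library only. The sketch below builds a tiny archive in memory so that it runs without the real download; the archive name, the column names and the single data row are invented for the example.

```python
import csv
import io
from zipfile import ZipFile

# Build a tiny in-memory archive so the example runs without the real download
buffer = io.BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr(
        "example-1.0/traits.csv",
        "taxon_name,trait_name,value\nAcacia sp.,resprouting_capacity,resprouts\n",
    )

# Open one member as a byte stream, decode it, and parse it as CSV
with ZipFile(buffer) as zf:
    with zf.open("example-1.0/traits.csv") as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        rows = list(csv.DictReader(text))
```

Using `with` blocks closes both the member and the archive once the data has been read.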

## Import fire response data

The AusTraits trait used to be called `fire_response` in version 3.0.2 and is now called `resprouting_capacity` in version 6.0.0.

In [11]:

fireveg_defs={
    'germ1':[{
        'austrait_name': 'seedbank_location',
        'matched_values': {
            "soil_seedbank": "Soil-persistent",
            'canopy_seedbank': 'Canopy',
            "canopy_seedbank_absent soil_seedbank": "Soil-persistent",
            "canopy_seedbank_absent": "Non-canopy",
            "canopy_seedbank soil_seedbank_absent": "Canopy",
            "none": None,
            "soil_seedbank_absent": "Transient",
            "canopy_seedbank_absent soil_seedbank_absent": "Non-canopy",
            "canopy_seedbank soil_seedbank": "Transient"
        }
    },],
    'repr2':[{
        'austrait_name': 'post_fire_flowering',
        'matched_values':{
            "fire_dependent_flowering": "Exclusive",
            'fire_enhanced_flowering': 'Facultative',
            "fire_independent_flowering": "Negligible",
            "fire_suppressed_flowering": "Negligible",
            "fire_dependent_flowering fire_independent_flowering": "Facultative",
            "fire_dependent_flowering fire_enhanced_flowering": "Facultative",
            "fire_enhanced_flowering fire_suppressed_flowering": "Facultative"
        }
    },],
    'surv1':[{
        'austrait_name': 'resprouting_capacity',
        'matched_values':{
            "fire_killed": "None",
            'fire_killed resprouts': 'Half',
            "resprouts": "All",
            "partial_resprouting": "Half",
            "fire_killed partial_resprouting": "Half",
            "partial_resprouting resprouts": "Half",
            "fire_killed partial_resprouting resprouts": "Half"
        }
    },],
    'disp1':[
        {
            'austrait_name': 'dispersal_appendage',
            'matched_values':{
                "aril": "ant",
                "awns": "animal-cohesion",
                "awn_bristle": "animal-cohesion",
                "barbs": "animal-cohesion",
                "beak": "animal-cohesion",
                "berry": "animal-ingestion",
                "caruncle": "animal-cohesion",
                "curved_awn": "animal-cohesion",
                "drupe": "animal-ingestion",
                "elaiosome": "ant",
                "glumes": "wind-hairs",
                "plumose": "wind-hairs",
                "pseudo-wing": "wind-wing",
                "receptacle": "wind-wing",
                "seed_airsac": "wind-wing",
                "seed_unilaterally_winged": "wind-wing",
                "seed_wing_obsolete": "wind-wing",
                "winged_fruit": "wind-wing",
                "wings": "wind-wing",
                "wings_small": "wind-wing",
                "floating seed": "water"}
        },
        {
            'austrait_name': 'dispersers',
            'matched_values':{
                "ants": "ant", 
                "bats": "animal-unspec.", 
                "birds": "animal-unspec.", 
                "cassowary": "animal-unspec.", 
                "flying": "animal-unspec.", 
                "flying_foxes": "animal-unspec.", 
                "mammals": "animal-unspec.", 
                "non-flying": "animal-unspec.", 
                "rodents": "animal-unspec.", 
                "vertebrate": "animal-unspec.",
                "vertebrates": "animal-unspec.", 
                "invertebrates": "animal-unspec.", 
                "wind": "wind-unspec.", 
                "water": "water", 
                }
        },
        {
            'austrait_name': 'dispersal_syndrome',
            'matched_values':{
                "adhesion": "animal-cohesion",
                "anemochory": "wind-unspec.",
                "animal_vector": "animal-unspec.", 
                "aril": "ant",
                "ballistic": "ballistic", 
                "bird": "animal-unspec.", 
                "dispersal_rare": "passive",
                "dyszoochory": "animal-ingestion",
                "elaiosome": "ant",
                "endozoochory": "animal-ingestion",
                "endozoochory_mammal": "animal-ingestion",
                "endozoochory_bird": "animal-ingestion",
                "exozoochory": "animal-cohesion",
                "epizoochory": "animal-cohesion",
                "exozoochory_mammal": "animal-cohesion",
                "exozoochory_bird": "animal-cohesion",
                "gravity":"passive",
                "hydrochory":"water",
                "insect": "ant",
                "invertebrate_insect": "ant", 
                "mammal": "animal-unspec.", 
                "myrmecochory": "ant", 
                "nautohydrochory": "water", 
                "ombrohydrochory": "water",
                "synzoochory": "animal-unspec.",  
                "unassisted": "passive",
                "vertebrate": "animal-unspec.", 
                "water": "water",
                "wind": "wind-unspec.", 
                "zoochory": "animal-unspec.",
            }
        },
    ],
    'germ8':[
        {
            'austrait_name': 'seed_dormancy_class',
            'matched_values':{
                "non_dormant": "ND",
                'physiological_dormancy': "PD",
                'morphophysiological_dormancy': 'MPD', 
                'physical_dormancy': 'PY'
            }
        },]
}


In [12]:
len(fireveg_defs['disp1'])

3
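
To make the structure of `fireveg_defs` concrete, the following sketch looks up the Fireveg category for a single AusTraits value. It uses a reduced copy of the `surv1` entry, and the helper `translate_value` is hypothetical. How `aust.create_record` treats values that are missing from the mapping is not shown in this notebook, so returning `None` here is an assumption.

```python
# A reduced copy of the 'surv1' entry, kept short for illustration
example_defs = {
    'surv1': [{
        'austrait_name': 'resprouting_capacity',
        'matched_values': {
            "fire_killed": "None",
            "resprouts": "All",
            "partial_resprouting": "Half",
        }
    }]
}


def translate_value(defs, trait, austrait_name, value):
    """Look up the Fireveg category for one AusTraits value, or None if unmapped."""
    for entry in defs.get(trait, []):
        if entry['austrait_name'] == austrait_name:
            return entry['matched_values'].get(value)
    return None


mapped = translate_value(example_defs, 'surv1', 'resprouting_capacity', 'resprouts')
unmapped = translate_value(example_defs, 'surv1', 'resprouting_capacity', 'unknown_value')
```

Each Fireveg trait maps to a list because one trait can draw on several AusTraits variables, as `disp1` does with three of them.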

First, upload a reference record for the AusTraits data:

In [13]:
austraits = [{'ref_code': 'austraits-6.0.0',
             'alt_code': 'austraits-6.0.0',
             'ref_cite': 'Falster, D., Gallagher, R., Wenk, E., & Sauquet, H. (2024). AusTraits: a curated plant trait database for the Australian flora [Data set]. In Scientific Data (v6.0.0, Vol. 8, p. 254). Zenodo. https://doi.org/10.5281/zenodo.11188867'
            },]

batch_upsert(dbparams, 
             table='litrev.ref_list',
             records=austraits, 
             keycol=['ref_code',], 
             idx=None,
            execute = True)

Connecting to the PostgreSQL database...
0 rows updated
Database connection closed.
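
The implementation of `batch_upsert` lives in `lib.firevegdb` and is not reproduced here. As a loose illustration of what an "upsert" means in PostgreSQL, the sketch below composes an `INSERT ... ON CONFLICT` statement with named placeholders. The function `build_upsert` is hypothetical, it assumes a unique constraint on the key column, and the real function may work differently.

```python
import re

# Accept only plain identifiers, optionally schema-qualified (e.g. schema.table)
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_upsert(table, columns, keycol):
    """Compose an INSERT ... ON CONFLICT statement with named placeholders."""
    for name in [table, *columns, *keycol]:
        if not _IDENT.match(name):
            raise ValueError(f"Unsafe identifier: {name!r}")
    cols = ", ".join(columns)
    # Values are left as placeholders so the database driver can bind them
    placeholders = ", ".join(f"%({c})s" for c in columns)
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c not in keycol)
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    return (f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
            f"ON CONFLICT ({', '.join(keycol)}) {action}")


statement = build_upsert('litrev.ref_list',
                         ['ref_code', 'alt_code', 'ref_cite'],
                         ['ref_code'])
```

Only identifiers are formatted into the string, and only after validation; the record values would be passed separately to the driver.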


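Before looping over the traits, we replace missing values with the string `"nan"` and prepare a query template that counts the records already imported from this AusTraits version, plus a message template for the log.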

In [14]:
ATtraits.fillna("nan",inplace=True)
qrystr="""
SELECT count(*) 
FROM litrev.{} 
WHERE main_source = 'austraits-6.0.0';
"""
connstr='Connecting to the PostgreSQL database to update trait %s from %s'
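
The import loop in this notebook processes rows in chunks of 500. The slicing pattern it relies on can be tried on a plain list; the helper name `split_into_chunks` and the row count of 1164 are chosen only for this example.

```python
def split_into_chunks(items, size):
    """Return consecutive slices of `items`, each with at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


rows = list(range(1164))          # stand-in for 1164 dataframe rows
chunks = split_into_chunks(rows, 500)
chunk_sizes = [len(chunk) for chunk in chunks]
```

The last chunk simply holds the remainder, so no rows are dropped when the total is not a multiple of the chunk size.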

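For each trait, we split the matching rows into smaller dataframes of 500 rows contained in a list. For each one of these dataframes, we extract a list of references and a list of records, and then upload all this information into the database. A trait is skipped if the database already holds at least as many records as the source.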

In [15]:
for trait, austvals in fireveg_defs.items():
    for vals in austvals: 
        ss = (ATtraits['trait_name']==vals['austrait_name'])
        df = ATtraits[ss]
        n = 500  #chunk row size
        list_df = [df[i:i+n] for i in range(0,df.shape[0],n)]
    
        db_conn = psycopg2.connect(**dbparams)
        print(connstr % (trait,vals['austrait_name']))
        qry = qrystr.format(trait)
        res = dbquery(qry, dbparams, useconn=db_conn)
        nrecords=list(res[0])[0]
        if int(nrecords)>=df.shape[0]:
            print("Already %s records in the database, will skip this." % nrecords)
        else:
            for target in list_df:    
                reflist=list()
                records=list()
                refrecords=list()
                for idx, row in target.iterrows():
                    record=aust.create_record(row, ATrefs, vals['matched_values'], BIONET)
                    refids=[row['dataset_id'],]
                    if row['source_id'] != "nan":
                        srcids = [x.strip() for x in row['source_id'].split(',')]
                        refids.extend(srcids)
                    for refid in refids:       
                        if refid not in reflist:
                            reflist.append(refid)
                    records.append(record)
                for refid in reflist:
                    if refid in list(ATrefs.entries.keys()):
                        refrecords.append({'ref_code': aust.extract_reflabel(ATrefs,refid),
                                 'alt_code': refid,
                                 'ref_cite': aust.extract_refinfo(ATrefs, refid)})
                batch_upsert(dbparams, 
                         table='litrev.ref_list',
                         records=refrecords, 
                         keycol=['ref_code',], 
                         idx=None,
                         execute = True, 
                         useconn=db_conn)
                batch_upsert(dbparams, 
                         table="litrev."+trait,
                         records=records, 
                         keycol=['ref_code',], 
                         idx=None,
                         execute = True, 
                         useconn=db_conn)    
        db_conn.close()
        print('Database connection closed.')

Connecting to the PostgreSQL database to update trait germ1 from seedbank_location
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
164 rows updated
Database connection closed.
Connecting to the PostgreSQL database to update trait repr2 from post_fire_flowering
0 rows updated
431 rows updated
Database connection closed.
Connecting to the PostgreSQL database to update trait surv1 from resprouting_capacity
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500 rows updated
0 rows updated
500
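
The handling of `dataset_id` and `source_id` inside the loop above can be isolated as a small function, which makes it easier to test on its own. This is a sketch that mirrors the logic of the loop; the function name `collect_ref_ids` and the example identifiers are made up.

```python
def collect_ref_ids(dataset_id, source_id, seen=None):
    """Return reference ids for one row, in order, skipping ids already in `seen`."""
    seen = [] if seen is None else seen
    ids = [dataset_id]
    # Missing source ids were replaced by the string "nan" earlier in the notebook
    if source_id != "nan":
        ids.extend(part.strip() for part in source_id.split(','))
    for ref_id in ids:
        if ref_id not in seen:
            seen.append(ref_id)
    return seen


reflist = collect_ref_ids("Dataset_A", "Source_1, Source_2")
reflist = collect_ref_ids("Dataset_A", "nan", seen=reflist)
reflist = collect_ref_ids("Dataset_B", "Source_2", seen=reflist)
```

Keeping the first-seen order means each reference is prepared for upload only once per chunk.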

Let's check the number of references and trait records in the database:

In [16]:
dbquery("select count(*) from litrev.ref_list", dbparams)

[[347]]

In [17]:
for trait in fireveg_defs.keys():
    qrystr="""SELECT count(*) 
    FROM litrev.{} 
    WHERE main_source = 'austraits-6.0.0';""".format(trait)
    res = dbquery(qrystr, dbparams)
    nrecords=list(res[0])[0]
    print("Table litrev.{} with {} records".format(trait,nrecords))


Table litrev.germ1 with 4164 records
Table litrev.repr2 with 431 records
Table litrev.surv1 with 29896 records
Table litrev.disp1 with 34591 records
Table litrev.germ8 with 4171 records


## That is it for now!

✅ Job done! 😎👌🔥

You can:
- go [back home](../Instructions-and-workflow.ipynb),
- continue navigating the repo on [GitHub](https://github.com/ces-unsw-edu-au/fireveg-db-exports),
- continue exploring the repo on [OSF](https://osf.io/h96q2/),
- visit the database at <http://fireecologyplants.net>.

## References

> Falster, Gallagher et al. (2021) AusTraits, a curated plant trait database for the Australian flora. Scientific Data 8: 254, <https://doi.org/10.1038/s41597-021-01006-6>