# Fireveg DB imports -- categorical data from NSWFFRD 2014 (v2.1)

Author: [José R. Ferrer-Paris](https://github.com/jrfep)

Date: July 2024, updated 19 August 2024

This Jupyter Notebook includes [Python](https://www.python.org) code to populate one of the fire ecology traits for plants in the Fireveg database. 

This code shows how to read the spreadsheet from the **NSW Flora Fire Response Database**, extract information for several traits, translate the original values into standard values, and insert records into the Fireveg database.


**Please note:**
<div class="alert alert-warning">
    This repository contains code that is intended for internal project management and is documented for the sake of reproducibility.<br/>
    🛂 Only users contributing directly to the project have access to the credentials for data download/upload. 
</div>

## Set-up 

### Load libraries


In [1]:
import openpyxl
from pathlib import Path
import os, sys
import re
import copy
import psycopg2

# Pyprojroot for easier handling of working directory
import pyprojroot

### Define paths for input and output

Define project directory using the `pyprojroot` functions, and add this to the execution path.

In [2]:
repodir = pyprojroot.find_root(pyprojroot.has_dir(".git"))
sys.path.append(str(repodir))

Path to the folder with the downloaded data:

In [3]:
inputdir = repodir / "data" 

### Load own functions
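Load functions from the `lib` folder: one to read database credentials (`read_dbparams`), two to run queries and batch upserts against the database (`dbquery`, `batch_upsert`), and a module of helper functions (`nswff`) for extracting data from the spreadsheet rows and the reference description strings.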

In [4]:
from lib.parseparams import read_dbparams
from lib.firevegdb import dbquery, batch_upsert
import lib.nswfireflora_util as nswff

### Database credentials

🤫 We use a folder named "secrets" to keep the credentials for connections to different services (database credentials, API keys, etc.). We list this folder in our `.gitignore` so that its contents are not tracked by git and are not exposed. Future users need to copy the contents of this folder manually.

We read database credentials stored in a `database.ini` file using our own `read_dbparams` function.

In [5]:
dbparams = read_dbparams(repodir / 'secrets' / 'database.ini', 
                         section='fireveg-db-v1.1')

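The `read_dbparams` function is defined in `lib/parseparams.py` and is not shown in this notebook. As a rough sketch of what such a helper might look like, assuming `database.ini` is a standard INI file with one section per connection, the standard-library `configparser` module can turn a section into a dictionary. The name `read_dbparams_sketch` is hypothetical and the real function may behave differently.

```python
from configparser import ConfigParser


def read_dbparams_sketch(filename, section):
    """Return the key/value pairs of one INI section as a dict.

    Hypothetical stand-in for lib.parseparams.read_dbparams.
    """
    parser = ConfigParser()
    # ConfigParser.read() silently skips missing files and returns the
    # list of files it could read, so an empty list means failure.
    if not parser.read(str(filename)):
        raise FileNotFoundError(f"Could not read {filename}")
    if not parser.has_section(section):
        raise KeyError(f"Section {section!r} not found in {filename}")
    return dict(parser.items(section))
```

A dictionary built this way can be passed to `psycopg2.connect(**params)`, provided the keys in the file match the connection parameter names.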
## Open the workbook and read spreadsheets
We will use the _openpyxl_ library to read the spreadsheet document.
Here we will load the workbook (_wb_):

In [6]:
wb = openpyxl.load_workbook(inputdir / "NSWFFRDv2.1.xlsx")

We will access the sheets by name. We need the sheets 'SpeciesData' and 'References', and we will also check the column notes in the sheet 'Notes':

In [7]:
species_data = wb['SpeciesData']
references = wb['References']
column_notes = wb['Notes'] 

In [8]:
sp_col='A'
spcode_col='B'


### Create list(s) of references 
We need to prepare lists of references from the spreadsheet 'References'.

There are three sets of references:
- the "normal" references in columns C and D (pink)
- the "Recovery Plan / Regional Forest Agreement Report" references in columns N, O, and P (blue)
- the "NFRR" references in columns S and T (lilac)

Normal and NFRR references are identified by a simple two-digit or two-letter code and a reference description. We will use a function to create a more descriptive reference code based on the list of authors and the date.

For Recovery plans and Regional Forest Agreement Reports, we will use the species or region as reference code.

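The function `nswff.create_ref_code` is not shown in this notebook. The idea of deriving a descriptive code from the authors and date can be sketched as follows, assuming citations of the form "Surname, Initials (Year) Title...". The function name `make_ref_code`, the regular expressions and the fallback to the first 50 characters are illustrative assumptions, and the citation in the usage lines is made up.

```python
import re


def make_ref_code(cite_text):
    """Build a short 'Author Year' code from a citation string (sketch only)."""
    # First four-digit year (1800-2099), optionally with a letter suffix (e.g. 1985b)
    year = re.search(r"\b(?:1[89]|20)\d{2}[a-z]?\b", cite_text)
    if year is None:
        # No year found: fall back to the start of the description
        return cite_text[:50].strip()
    # Capitalised words before the year are treated as surnames;
    # single-letter initials such as "A.B." do not match this pattern.
    surnames = re.findall(r"\b[A-Z][a-z]+\b", cite_text[:year.start()])
    if not surnames:
        return cite_text[:50].strip()
    if len(surnames) == 1:
        authors = surnames[0]
    elif len(surnames) == 2:
        authors = " & ".join(surnames)
    else:
        authors = surnames[0] + " et al."
    return f"{authors} {year.group(0)}"


# Made-up citation, for illustration only
print(make_ref_code("Smith, A.B. & Jones, C. (1999) An example title."))
# Smith & Jones 1999
```

This simple pattern would miscount multi-word surnames and spelled-out given names, so a real implementation needs more care than this sketch.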

In [9]:
NFRR_refs=list()
for row in range(1,66):
    cite_text = references['T'][row].value.replace("(1) ","")
    cite_code = nswff.create_ref_code(cite_text) 
    record={"refcode": references['S'][row].value.replace("1","I"),
            "refstring": cite_code,#re.sub(r", [A-Z\.]+"," ",cite_code),
            "refinfo": cite_text
    }
    NFRR_refs.append(record)

In [10]:
other_refs=list()
for row in range(1,139):
    cite_text = references['D'][row].value
    cite_code = nswff.create_ref_code(cite_text) 
    if cite_code == "Benson 1985":
        cite_code = "Benson 1985b"
    record={"refcode": references['C'][row].value,
            "refstring": cite_code,
            "refinfo": cite_text
    }
    other_refs.append(record)

In [11]:
rp_refs=list()
for row in range(1,46):
    cite_code = nswff.create_ref_code_RP(references['O'][row].value) 
    cite_text = "%s. %s" % (cite_code, references['P'][row].value)
    record={"refcode": references['N'][row].value,
            "refstring": cite_code,
            "refinfo": cite_text
    }
    rp_refs.append(record)

## Format records for input in database

We will use custom functions to take each species (row) from the spreadsheet and add records for the trait tables in the database. 


### Add list of references

First upload a record for the main source, the NSWFFRDv2.1 reference:

In [12]:
main_ref = [{'ref_code': "NSWFFRDv2.1",
             'ref_cite': "NSW Flora Fire Response Database. Version 2.1. February 2010 (last update May 2014)"
            },]

batch_upsert(dbparams, 
             table='litrev.ref_list',
             records=main_ref, 
             keycol=['ref_code',], 
             idx=None,
            execute = True)

Connecting to the PostgreSQL database...
0 rows updated
Database connection closed.

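The `batch_upsert` function comes from `lib/firevegdb.py` and its implementation is not shown here. One common way to implement an upsert in PostgreSQL is `INSERT ... ON CONFLICT`, and the sketch below only builds such a statement as a string, assuming the key columns carry a unique constraint. The helper name `build_upsert_sql` is hypothetical, and this is not necessarily how `batch_upsert` works.

```python
def build_upsert_sql(table, columns, keycol):
    """Return an INSERT ... ON CONFLICT statement with named placeholders.

    Illustrative only: identifiers are interpolated as plain text, so
    table and column names must come from trusted code, never user input.
    """
    cols = ", ".join(columns)
    # psycopg2-style named placeholders, filled from a dict per record
    placeholders = ", ".join(f"%({c})s" for c in columns)
    conflict = ", ".join(keycol)
    updates = [c for c in columns if c not in keycol]
    if updates:
        action = "DO UPDATE SET " + ", ".join(
            f"{c} = EXCLUDED.{c}" for c in updates)
    else:
        action = "DO NOTHING"
    return (f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
            f"ON CONFLICT ({conflict}) {action}")


print(build_upsert_sql("litrev.ref_list", ["ref_code", "ref_cite"], ["ref_code"]))
```

A statement like this could be executed once per record dictionary with a psycopg2 cursor, passing the record as the parameter mapping.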

Now we will add the references from the lists we read before (`NFRR_refs`, `other_refs` and `rp_refs`). We will use the reference code created above (stored under `refstring`) as `ref_code` (we will be able to update that later to something more meaningful in the database), and create an `alt_code` to identify the origin of the reference.

In [13]:
reflist = list()
for item in NFRR_refs:
    ref={
        'ref_code': item['refstring'],
        'ref_cite': item['refinfo'],
        'alt_code':'NSWFFRD-NFRR-ref-%s' % item['refcode']}
    reflist.append(ref)
for item in other_refs:
    ref={
        'ref_code': item['refstring'],
        'ref_cite': item['refinfo'],
        'alt_code':'NSWFFRD-other-ref-%s' % item['refcode']}
    reflist.append(ref)
for item in rp_refs:
    ref={
        'ref_code': item['refstring'],
        'ref_cite': item['refinfo'],
        'alt_code':'NSWFFRD-RP-ref-%s' % item['refcode']}
    reflist.append(ref)

In [14]:
len(reflist)

248

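Because `ref_code` is used as the key column in the upsert, two references that share the same code would target the same database row. The manual rename to "Benson 1985b" earlier suggests such collisions can occur. A quick check before uploading could look like the sketch below, where the helper name `find_duplicate_keys` and the toy records are hypothetical.

```python
from collections import Counter


def find_duplicate_keys(records, key="ref_code"):
    """Return the sorted key values that occur more than once in records."""
    counts = Counter(rec[key] for rec in records)
    return sorted(k for k, n in counts.items() if n > 1)


# Toy records, for illustration only
toy = [{"ref_code": "A 2000"}, {"ref_code": "B 2001"}, {"ref_code": "A 2000"}]
print(find_duplicate_keys(toy))
# ['A 2000']
```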
In [15]:
batch_upsert(dbparams, 
             table='litrev.ref_list',
             records=reflist, 
             keycol=['ref_code',], 
             idx=None,
            execute = True)

Connecting to the PostgreSQL database...
0 rows updated
Database connection closed.


### Importing categorical traits from NSWFFRDv2.1

We will create one record per species, using "NSWFFRDv2.1" as _main reference_, adding the reported references in the _original sources_ column.

We will use the functions imported above to read row values and hyperlinks and create one or multiple records from each entry.

In [16]:
switcher={
    "repr2":{
        "facultative": "Facultative",
        "yes": "Facultative",
        "yes?": "Facultative",
        "most profuse after fire": "Facultative",
        "exclusive": "Exclusive",
        "exclusive?": "Exclusive",
        "negligible": "Negligible"
    },
    "rect2":{
        "I":"Intolerant",
        "T":"Tolerant",
        "R":"Requiring",
        "T R":"Tolerant-Requiring",
        "I T":"Intolerant-Tolerant",
        "T I":"Intolerant-Tolerant"
    },
    "germ1":{
        'canopy': 'Canopy',
        
        'persistent soil': 'Soil-persistent', 
        'persistent': 'Soil-persistent', 
        'peristent': 'Soil-persistent', 
        'soil': 'Soil-persistent', 
        
        'transient': 'Transient', 
        'none':'Transient', 
        'shed at maturity': 'Transient', 
        'viviparous':'Transient', 
        'canopy / released at maturity':'Transient', 
        'canopy / regularly without fire':'Transient', 
        'canopy - transient':'Transient', 
        
        'serotinous canopy': 'Canopy',
        'non-canopy': 'Non-canopy',
        'not canopy': 'Non-canopy',
        
        'other': 'Other'
    },
     "surv4":{
        'epicormic': 'Epicormic', 
        'stem buds': 'Epicormic', 
        'apical': 'Apical', 
        'lignotuber': 'Lignotuber',
        'root stock': 'Lignotuber',
        'rootstock': 'Lignotuber',
        'basal': 'Basal',
        'basal buds': 'Basal',
        'coppice': 'Basal',
        'tuber': 'Tuber',
        'taproot': 'Tuber',
        'tap root': 'Tuber',
        'tussock': 'Tussock',
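        # NOTE: 'rhizome' appears twice in this dictionary; Python keeps the
        # last value, so 'rhizome' ends up mapped to 'Short rhizome'.
        'rhizome': 'Long rhizome or root sucker',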
        'rootucker': 'Long rhizome or root sucker',
        'rootuckers': 'Long rhizome or root sucker',
        'rootsuckers': 'Long rhizome or root sucker',
        'root buds': 'Long rhizome or root sucker',
        'root sucker': 'Long rhizome or root sucker',
        'root suckers': 'Long rhizome or root sucker',
        'rhizome': 'Short rhizome',
        'stolon': 'Stolon',
        'stolons': 'Stolon'
    }
}

Now we will read through the spreadsheet and prepare the records.

In [17]:
row_min = 2
row_max = species_data.max_row
## row_max = 10

target_cols={'germ1':'M', 'repr2':'X', 'rect2':'W', 'surv4':'L'}

for trait in target_cols.keys():
    if trait in ('surv4','germ1'):
        mysplitstring="&|;|,| or | and "
    else:
        mysplitstring="DO NOT SPLIT SENTENCE"
    
    print('Connecting to the PostgreSQL database to update values for %s' % trait)
    db_conn = psycopg2.connect(**dbparams)
    records=list()
    for row in range(row_min,row_max):
        rr = nswff.create_record(species_data,
                                 target_cols[trait],
                                 row,
                                 switcher[trait],
                                 references, other_refs, rp_refs, NFRR_refs,
                                 splitstring=mysplitstring)
        if rr is not None :
            records.extend(rr)
        if (((row-row_min) % 250) == 0 and len(records)>10) or (row==(row_max-1)):
            print("total of %s records prepared" % len(records)) 
            
            batch_upsert(dbparams, 
                 table='litrev.'+trait,
                 records=records, 
                 keycol=['ref_code',], 
                 idx=None,
                execute = True, 
                useconn=db_conn)
            
            records.clear()
    if db_conn is not None:
        db_conn.close()
        print('Database connection closed.') 

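The loop above passes a split pattern and a translation dictionary to `nswff.create_record`, whose code is not shown here. The translation step alone can be sketched as follows, assuming the cell value is split with `re.split` and each piece is looked up in the dictionary. The function name `translate_cell` and the handling of unmatched values (silently dropped) are assumptions, not the behaviour of `create_record`.

```python
import re


def translate_cell(cell_value, lookup, splitstring):
    """Split a raw cell value and map each piece to a standard value (sketch)."""
    if cell_value is None:
        return []
    translated = []
    for piece in re.split(splitstring, str(cell_value)):
        key = piece.strip()
        # Try the exact key first (e.g. "T R"), then a lower-case version
        value = lookup.get(key)
        if value is None:
            value = lookup.get(key.lower())
        if value is not None and value not in translated:
            translated.append(value)
    return translated


lookup = {"epicormic": "Epicormic", "lignotuber": "Lignotuber", "basal": "Basal"}
print(translate_cell("epicormic, lignotuber & basal", lookup, "&|;|,| or | and "))
# ['Epicormic', 'Lignotuber', 'Basal']
print(translate_cell("T R", {"T R": "Tolerant-Requiring"}, "DO NOT SPLIT SENTENCE"))
# ['Tolerant-Requiring']
```

Using a pattern that never matches, such as "DO NOT SPLIT SENTENCE", keeps the whole cell value as a single piece, which is what the single-valued traits need.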

Connecting to the PostgreSQL database to update values for germ1
total of 193 records prepared
193 rows updated
total of 132 records prepared
132 rows updated
total of 104 records prepared
104 rows updated
total of 134 records prepared
134 rows updated
total of 165 records prepared
165 rows updated
total of 132 records prepared
132 rows updated
total of 132 records prepared
132 rows updated
total of 141 records prepared
141 rows updated
total of 124 records prepared
124 rows updated
total of 108 records prepared
108 rows updated
total of 109 records prepared
109 rows updated
total of 118 records prepared
118 rows updated
total of 43 records prepared
43 rows updated
Database connection closed.
Connecting to the PostgreSQL database to update values for repr2
total of 15 records prepared
15 rows updated
total of 13 records prepared
13 rows updated
total of 13 records prepared
13 rows updated
total of 17 records prepared
17 rows updated
total of 25 records prepared
25 rows updated
total of

In [18]:
print('Connecting to the PostgreSQL database...')
db_conn = psycopg2.connect(**dbparams)

Connecting to the PostgreSQL database...


This is somewhat slow, but it works, and all the records are in the database.

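The loops in this notebook accumulate records and send them to the database in batches rather than row by row. The same idea can be written as a reusable generator, sketched below under simplifying assumptions: a batch is flushed every `batch_size` rows whenever it is non-empty, which is not identical to the flush condition used above, so batch boundaries would differ. The names `batched_records` and `make_records` are hypothetical.

```python
def batched_records(rows, make_records, batch_size=250):
    """Yield lists of records, flushing every `batch_size` input rows.

    `make_records` is any callable that returns a list of records
    (or None / an empty list) for one row.
    """
    batch = []
    for count, row in enumerate(rows, start=1):
        new_records = make_records(row)
        if new_records:
            batch.extend(new_records)
        if count % batch_size == 0 and batch:
            yield batch
            batch = []
    # Flush whatever is left after the last full batch
    if batch:
        yield batch


# Toy example: only even rows produce a record
batches = list(batched_records(range(2, 12),
                               lambda r: [{"row": r}] if r % 2 == 0 else None,
                               batch_size=4))
print([len(b) for b in batches])
# [2, 2, 1]
```

Each yielded batch could then be passed to an upload function while reusing one open database connection.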

#### Special case for `Fire response`

For `surv1` (Resprouting - full canopy scorch), we use a custom function that adds more details to the records.

In [19]:
va_groups = wb['VA Groups']
reg_cats=list()
for row in range(3,13):
    record={"NFRRcode":va_groups['A'][row].value,
    "othercode":va_groups['B'][row].value,
     "category":va_groups['C'][row].value
    }
    reg_cats.append(record)

In [20]:
row_min = 2
row_max = species_data.max_row
##row_max = 20

print('Connecting to the PostgreSQL database to update values for surv1' )
db_conn = psycopg2.connect(**dbparams)

records=list()
for row in range(row_min,row_max):
    rr = nswff.read_rows_resprouting(species_data,row, reg_cats, NFRR_refs, other_refs)
    if rr is not None :
        records.extend(rr)
    if (((row-row_min) % 250) == 0 and len(records)>10) or (row==(row_max-1)):
        print("total of %s records prepared" % len(records))            
        batch_upsert(dbparams, 
             table='litrev.surv1',
             records=records, 
             keycol=['ref_code',], 
             idx=None,
            execute = True, 
            useconn=db_conn)
        
        records.clear()
if db_conn is not None:
    db_conn.close()
    print('Database connection closed.') 



Connecting to the PostgreSQL database to update values for surv1
total of 1034 records prepared
1034 rows updated
total of 936 records prepared
936 rows updated
total of 890 records prepared
890 rows updated
total of 951 records prepared
951 rows updated
total of 890 records prepared
890 rows updated
total of 885 records prepared
885 rows updated
total of 1071 records prepared
1071 rows updated
total of 1156 records prepared
1156 rows updated
total of 915 records prepared
915 rows updated
total of 845 records prepared
845 rows updated
total of 806 records prepared
806 rows updated
total of 860 records prepared
860 rows updated
empty row
total of 324 records prepared
324 rows updated
Database connection closed.


### Importing numeric traits
Read the spreadsheet from the NSW Flora Fire Response Database and extract information for the time to first flowering after fire (primary and secondary juvenile periods, for recruits and resprouters respectively).

In the case of time to first flowering we need to read data from columns _Z_ ('Primary juvenile period'), and _AA_ ('Secondary juvenile period').

We can use square brackets to refer to a column and then use Python indices (starting with _0_ for the top row) to slice it. We use the property _value_ to show the stored content.

In [21]:
sp_col='A'
spcode_col='B'
target_cols = {'repr3':'Z', 
               'repr3a':'AA', 
               'grow1':'AD', 
               'repr4':None, # Data entry by Renee Woodward
               'surv5':'AE', 
               'surv6':None, # Data entry by Renee Woodward
               'surv7':'AF'}

print("%s (%s) / %s / %s / %s / %s " %
(species_data[sp_col][1].value,
 species_data[spcode_col][1].value,
species_data[target_cols['repr3a']][1].value,
species_data[target_cols['grow1']][1].value,
species_data[target_cols['surv5']][1].value,
species_data[target_cols['surv7']][1].value))

Current Scientific Name (Species Code) / Secondary juvenile period / Fire tolerance / Life span / Seed-bank longevity 


In [22]:
row_min = 2
row_max = species_data.max_row
##row_max = 20

traits = target_cols.keys()
for trait in traits:
    if target_cols[trait] is None:
        continue
    print(trait)
    varname=species_data[target_cols[trait]][1].value
    print('Connecting to the PostgreSQL database to update values for %s' % trait)
    db_conn = psycopg2.connect(**dbparams)
    
    records=list()
    for row in range(row_min,row_max):
        rr = nswff.create_numeric_record(species_data,target_cols[trait],row,
                                  references, other_refs, rp_refs, NFRR_refs)
        if len(rr) > 0 :
            records.extend(rr)
        if (((row-row_min) % 250) == 0 and len(records)>10) or (row==(row_max-1)):
            print("total of %s records prepared" % len(records)) 
            batch_upsert(dbparams, 
                 table='litrev.'+trait,
                 records=records, 
                 keycol=['ref_code',], 
                 idx=None,
                execute = True, 
                useconn=db_conn)
            records.clear()
    if db_conn is not None:
        db_conn.close()
        print('Database connection closed.')


repr3
Connecting to the PostgreSQL database to update values for repr3
total of 76 records prepared
76 rows updated
total of 64 records prepared
64 rows updated
total of 74 records prepared
74 rows updated
total of 69 records prepared
69 rows updated
total of 50 records prepared
50 rows updated
total of 56 records prepared
56 rows updated
total of 86 records prepared
86 rows updated
total of 61 records prepared
61 rows updated
total of 87 records prepared
87 rows updated
total of 66 records prepared
66 rows updated
total of 66 records prepared
66 rows updated
total of 56 records prepared
56 rows updated
total of 27 records prepared
27 rows updated
Database connection closed.
repr3a
Connecting to the PostgreSQL database to update values for repr3a
total of 34 records prepared
34 rows updated
total of 66 records prepared
66 rows updated
total of 62 records prepared
62 rows updated
total of 54 records prepared
54 rows updated
total of 46 records prepared
46 rows updated
total of 43 record

## That is it for now!

✅ Job done! 😎👌🔥

You can:
- go [back home](../Instructions-and-workflow.ipynb),
- continue navigating the repo on [GitHub](https://github.com/ces-unsw-edu-au/fireveg-db-exports),
- continue exploring the repo on [OSF](https://osf.io/h96q2/),
- visit the database at <http://fireecologyplants.net>.