# Fireveg DB imports -- import entries from manual curation

Author: [José R. Ferrer-Paris](https://github.com/jrfep) and Renee Woodward

Date: July 2024, updated 19 August 2024

This Jupyter Notebook includes [Python](https://www.python.org) code to populate fire ecology traits for plants in the Fireveg database.

This code shows how to read a spreadsheet with records created or edited manually by one of our contributors.

**Please note:**
<div class="alert alert-warning">
    This repository contains code that is intended for internal project management and is documented for the sake of reproducibility.<br/>
    🛂 Only users contributing directly to the project have access to the credentials for data download/upload. 
</div>

## Set-up
### Load modules

In [5]:
import openpyxl
from pathlib import Path
import os, sys
from datetime import datetime
from configparser import ConfigParser
import psycopg2
from psycopg2.extras import DictCursor
from psycopg2.extensions import AsIs
import pandas as pd
import numpy as np

# Pyprojroot for easier handling of working directory
import pyprojroot

### Define paths for input and output

Find the project root directory with `pyprojroot` and add it to Python's module search path (`sys.path`), so that modules in the repository can be imported.

In [6]:
repodir = pyprojroot.find_root(pyprojroot.has_dir(".git"))
sys.path.append(str(repodir))
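
To illustrate the idea behind locating the project root, here is a minimal sketch that uses only the standard library. It is not how `pyprojroot` is implemented; the function name `find_repo_root` is hypothetical, and the sketch simply walks up the parent directories until it finds one containing a `.git` folder.

```python
from pathlib import Path
import tempfile


def find_repo_root(start: Path, marker: str = ".git") -> Path:
    """Walk upwards from `start` until a directory containing `marker` is found.

    Hypothetical helper for illustration only.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise RuntimeError(f"No directory containing {marker!r} above {start}")


# Demonstration on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".git").mkdir()
    nested = root / "notebooks" / "imports"
    nested.mkdir(parents=True)
    found = find_repo_root(nested)
    root_matches = found == root
```

Anchoring paths to the repository root in this way means the notebook does not depend on the directory from which Jupyter was launched.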

Path to the folder with the form for input of records:

In [7]:
inputdir = repodir / "data" / "input-form"

In [10]:
os.listdir(inputdir)

['Species traits_Blue table_RFW_ 20220505.xlsx',
 'fireveg-trait-input-model.xlsx']

### Load own functions
Load functions from the `lib` folder. We will use one function to read the database credentials and others to run queries and insert/update records in the database.

In [16]:
from lib.parseparams import read_dbparams
from lib.firevegdb import dbquery, batch_upsert


### Database credentials

🤫 We use a folder named "secrets" to keep the credentials for connecting to different services (database credentials, API keys, etc.). We list this folder in our `.gitignore` so that its contents are not tracked by git and not exposed. Future users need to copy the contents of this folder manually.

We read database credentials stored in a `database.ini` file using our own `read_dbparams` function.

In [9]:
dbparams = read_dbparams(repodir / 'secrets' / 'database.ini', 
                         section='fireveg-db-v1.1')
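
The source of `read_dbparams` is not shown in this notebook. As a rough sketch of how a function of this kind could work, the example below reads one section of an INI file with `ConfigParser` and returns it as a dictionary. The function name `read_ini_section`, the section name `example-db` and all values are assumptions made for illustration.

```python
from configparser import ConfigParser
from pathlib import Path
import tempfile


def read_ini_section(filename, section):
    """Return the key/value pairs of one section of an INI file as a dict.

    Hypothetical stand-in, not the project's `read_dbparams`.
    """
    parser = ConfigParser()
    # ConfigParser.read() silently skips missing files, so check its result
    if not parser.read(filename):
        raise FileNotFoundError(f"Could not read {filename}")
    if not parser.has_section(section):
        raise KeyError(f"Section {section!r} not found in {filename}")
    return dict(parser.items(section))


# Made-up credentials, written to a temporary file for the demonstration
example_ini = """\
[example-db]
host = localhost
database = exampledb
user = example_user
password = not-a-real-password
"""

with tempfile.TemporaryDirectory() as tmp:
    ini_path = Path(tmp) / "database.ini"
    ini_path.write_text(example_ini)
    params = read_ini_section(ini_path, section="example-db")
```

A dictionary like `params` can usually be unpacked into a connection call, for example `psycopg2.connect(**params)`, provided the keys match the connection parameter names.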

## Read data entry form with pandas

In [11]:
newdata = pd.read_excel(inputdir / 'Species traits_Blue table_RFW_ 20220505.xlsx', sheet_name='Data entry')
contributor = pd.read_excel(inputdir / 'Species traits_Blue table_RFW_ 20220505.xlsx', sheet_name='Contributor')

In [12]:
contributor.shape[0]

3

In [13]:
newdata.head()

| | Main source | Original sources | Original species name | Species code | Species name | Trait code | Trait name | Trait type | Raw value | Norm value | Best | Lower | Upper | Method of estimation | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NSWFFRDv2.1 | Auld 1987 | Acacia suaveolens | 3881 | Acacia suaveolens | surv6 | Seedbank half-life | numerical | hl 10.7 y | | 10.7 | | | | No vocabularies for MoE |
| 1 | NSWFFRDv2.1 | Auld Keith Bradstock 2000 | Conospermum taxifolium | 5352 | Conospermum taxifolium | surv6 | Seedbank half-life | numerical | hl 2 | | 2.0 | | | | No vocabularies for MoE |
| 2 | NSWFFRDv2.1 | Auld Scott 1997 | Darwinia biflora | 4024 | Darwinia biflora | surv6 | Seedbank half-life | numerical | hl 0.9 | | 0.9 | | | | No vocabularies for MoE |
| 3 | NSWFFRDv2.1 | Auld Scott 1997 | Grevillea caleyi | 5365 | Grevillea caleyi | surv6 | Seedbank half-life | numerical | hl 7.6 | | 7.6 | | | | No vocabularies for MoE |
| 4 | NSWFFRDv2.1 | Auld Keith Bradstock 2000 | Grevillea linearifolia | 5381 | Grevillea linearifolia | surv6 | Seedbank half-life | numerical | hl 9-10 | | | 9.0 | 10.0 | | No vocabularies for MoE |


Read information from each row and create records for importing into the database:

In [14]:
records=dict()
for row in newdata.to_dict(orient='records'):
    trait = row['Trait code']
    if trait not in records.keys():
        records[trait]=list()
    ttype = row['Trait type']
    record=dict()
    notes=list()
    if contributor.shape[0]>0:
        contribdata = [x for x in contributor['Your response'].values.tolist() if not pd.isnull(x)]
        notes.append('Data entry by')
        notes.extend(contribdata)
    
    record['species']=row['Species name']
    if row['Species name']!=row['Original species name']:
        notes.append('Original species name')
        # Read the name from `row`: `record` has no 'Original species name' key
        notes.append(row['Original species name'])
    if not pd.isnull(row['Notes']):
        notes.append(row['Notes'])
    for k in ('Main source','Species code',):
        if not pd.isnull(row[k]):
            record[k.lower().replace(' ','_')] = row[k]
    for k in ('Original sources','Raw value'):
        if not pd.isnull(row[k]):
            record[k.lower().replace(' ','_')] = [row[k]]
    if ttype == 'numerical':
        for k in ('Best','Lower', 'Upper'):
            if not pd.isnull(row[k]):
                record[k.lower()] = row[k]
    elif ttype == 'categorical':
        for k in ('Norm value',):
            if not pd.isnull(row[k]):
                record[k.lower().replace(' ','_')] = row[k]
    if len(notes)>0:
        record['original_notes']=notes
    records[trait].append(record)
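
To make the mapping from spreadsheet row to database record easier to follow, the sketch below applies the same rules to a single row using plain dictionaries. The helper names `is_missing`, `to_key` and `row_to_record` are hypothetical, the handling of notes is left out for brevity, and the example values are those of the row with index 4 in the preview above.

```python
import math


def is_missing(value):
    """True for None or float NaN (the way pandas marks empty cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def to_key(column_name):
    """Turn a spreadsheet header into a column key: 'Main source' -> 'main_source'."""
    return column_name.lower().replace(" ", "_")


def row_to_record(row):
    """Build one record from one spreadsheet row, skipping empty cells."""
    record = {"species": row["Species name"]}
    # Values stored as they are
    for col in ("Main source", "Species code"):
        if not is_missing(row[col]):
            record[to_key(col)] = row[col]
    # Values wrapped in a list
    for col in ("Original sources", "Raw value"):
        if not is_missing(row[col]):
            record[to_key(col)] = [row[col]]
    # The trait type decides which value columns are used
    if row["Trait type"] == "numerical":
        for col in ("Best", "Lower", "Upper"):
            if not is_missing(row[col]):
                record[to_key(col)] = row[col]
    elif row["Trait type"] == "categorical":
        if not is_missing(row["Norm value"]):
            record["norm_value"] = row["Norm value"]
    return record


nan = float("nan")
row = {
    "Main source": "NSWFFRDv2.1",
    "Original sources": "Auld Keith Bradstock 2000",
    "Species code": 5381,
    "Species name": "Grevillea linearifolia",
    "Trait type": "numerical",
    "Raw value": "hl 9-10",
    "Norm value": nan,
    "Best": nan,
    "Lower": 9.0,
    "Upper": 10.0,
}
record = row_to_record(row)
```

Because the raw value `hl 9-10` was entered as a range, the resulting record has `lower` and `upper` keys but no `best` key.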


In [15]:
print(records.keys())

records['repr3a']

dict_keys(['surv6', 'repr3', 'repr4', 'surv1', 'surv5', 'repr3a'])


[{'species': 'Acacia melanoxylon',
  'main_source': 'NSWFFRDv2.1',
  'species_code': 3824,
  'original_sources': ['Wark 1997'],
  'raw_value': ['Secondary juvenile period ->3<10'],
  'lower': 3.0,
  'upper': 10.0,
  'original_notes': ['Data entry by', 'Renee Woodward']},
 {'species': 'Lambertia formosa',
  'main_source': 'NSWFFRDv2.1',
  'species_code': 5440,
  'original_sources': ['Pyke 1983'],
  'raw_value': ['Secondary juvenile period -peak flowering at 2-3 y post-fire'],
  'best': 2.0,
  'upper': 3.0,
  'original_notes': ['Data entry by', 'Renee Woodward']}]

In [18]:
for traitname in records.keys():
    qrystr="""SELECT count(*) 
    FROM litrev.{};""".format(traitname)
    res = dbquery(qrystr, dbparams)
    nrecords=list(res[0])[0]
    print("Table litrev.{} with {} records".format(traitname,nrecords))

Table litrev.surv6 with 0 records
Table litrev.repr3 with 838 records
Table litrev.repr4 with 0 records
Table litrev.surv1 with 41459 records
Table litrev.surv5 with 1262 records
Table litrev.repr3a with 662 records
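
The count query above inserts the table name into the SQL string with `str.format`. The trait codes here come from a curated spreadsheet, but as an optional safeguard one could check each name before building the query. The following is a minimal sketch under that assumption; `count_query` and the pattern `VALID_NAME` are hypothetical and not part of the project's `lib` folder.

```python
import re

# Assumed naming rule: lowercase letters, digits and underscores only
VALID_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def count_query(table, schema="litrev"):
    """Build a row-count query after checking that both names look like plain identifiers."""
    for name in (schema, table):
        if not VALID_NAME.fullmatch(name):
            raise ValueError(f"Unexpected identifier: {name!r}")
    return f"SELECT count(*) FROM {schema}.{table};"


queries = [count_query(t) for t in ("surv6", "repr3a")]
```

With this check, a trait code containing spaces, semicolons or quotes raises a `ValueError` instead of reaching the database.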


In [19]:
for traitname in records.keys():
    batch_upsert(dbparams, 
                 table='litrev.'+traitname,
                 records=records[traitname], 
                 keycol=['ref_code',],
                 idx=None,
                 execute=True)

Connecting to the PostgreSQL database...
7 rows updated
Database connection closed.
Connecting to the PostgreSQL database...
98 rows updated
Database connection closed.
Connecting to the PostgreSQL database...
23 rows updated
Database connection closed.
Connecting to the PostgreSQL database...
2 rows updated
Database connection closed.
Connecting to the PostgreSQL database...
1 rows updated
Database connection closed.
Connecting to the PostgreSQL database...
2 rows updated
Database connection closed.


In [20]:
for traitname in records.keys():
    qrystr="""SELECT count(*) 
    FROM litrev.{};""".format(traitname)
    res = dbquery(qrystr, dbparams)
    nrecords=list(res[0])[0]
    print("Table litrev.{} with {} records".format(traitname,nrecords))

Table litrev.surv6 with 7 records
Table litrev.repr3 with 936 records
Table litrev.repr4 with 23 records
Table litrev.surv1 with 41461 records
Table litrev.surv5 with 1263 records
Table litrev.repr3a with 664 records
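
As a quick consistency check, the counts printed before and after the import can be compared with the number of rows reported by `batch_upsert`. The sketch below copies the numbers from the outputs above and assumes that the "rows updated" messages appear in the same order as the keys of `records`.

```python
# Counts printed before and after the import (copied from the outputs above)
before = {"surv6": 0, "repr3": 838, "repr4": 0,
          "surv1": 41459, "surv5": 1262, "repr3a": 662}
after = {"surv6": 7, "repr3": 936, "repr4": 23,
         "surv1": 41461, "surv5": 1263, "repr3a": 664}

# "rows updated" messages, assumed to follow the order of records.keys()
reported = {"surv6": 7, "repr3": 98, "repr4": 23,
            "surv1": 2, "surv5": 1, "repr3a": 2}

# Increase in the number of rows for each table
added = {table: after[table] - before[table] for table in before}

# Tables where the increase differs from the reported number of rows
mismatches = {table: (added[table], reported[table])
              for table in before if added[table] != reported[table]}

for table, n in added.items():
    print(f"litrev.{table}: {n} new records")
```

If the assumption about the order holds, `mismatches` is empty, which suggests that every reported row was a new insert and none replaced an existing record.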


## That is it for now!

✅ Job done! 😎👌🔥

You can:
- go [back home](../Instructions-and-workflow.ipynb),
- continue navigating the repo on [GitHub](https://github.com/ces-unsw-edu-au/fireveg-db-exports),
- continue exploring the repo on [OSF](https://osf.io/h96q2/),
- visit the database at <http://fireecologyplants.net>.