# Fireveg DB imports -- define vocabularies

Author: [José R. Ferrer-Paris](https://github.com/jrfep)

Date: December 2024, updated 4 February 2025

This Jupyter Notebook includes [Python](https://www.python.org) code to create/refine a vocabulary file for the Fireveg database. 

**Please note:**
<div class="alert alert-warning">
    This repository contains code that is intended for internal project management and is documented for the sake of reproducibility.<br/>
    🛂 Only users contributing directly to the project have access to the credentials for data download/upload. 
</div>

## Set-up
### Load modules

We are using Python for this. Start your session and load the packages.

In [48]:
# work with paths in operating system
from pathlib import Path
import os, sys

#import json
#import urllib
#from zipfile import ZipFile
import pandas as pd
#import numpy as np
#from pybtex.database.input import bibtex
import yaml
import psycopg2

# Pyprojroot for easier handling of working directory
import pyprojroot

### Define paths for input and output

Define the project directory using the `pyprojroot` functions, and add it to the Python module search path (`sys.path`) so that the modules in the `lib` folder can be imported.

In [2]:
repodir = pyprojroot.find_root(pyprojroot.has_dir(".git"))
sys.path.append(str(repodir))
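
As a rough illustration of what the root lookup does, the sketch below walks up from a starting directory until it finds one that contains a `.git` folder. It uses only the standard library. `find_git_root` is a hypothetical helper written for this example; it is not part of `pyprojroot`, whose implementation may differ.

```python
from pathlib import Path

def find_git_root(start: Path) -> Path:
    """Return `start` or its closest ancestor that contains a .git directory."""
    start = Path(start).resolve()
    # Check the starting directory first, then each parent in turn
    for candidate in [start, *start.parents]:
        if (candidate / ".git").is_dir():
            return candidate
    raise FileNotFoundError(f"No .git directory found at or above {start}")
```

Calling `find_git_root(Path.cwd())` from anywhere inside a cloned repository should return the top-level folder of that clone.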

Path to the folder with the downloaded data:

In [3]:
inputdir = repodir / "data" 

### Load own functions
Load functions from the `lib` folder. We will use a function to read database credentials, one for executing database queries, and three functions for extracting data from the reference description string.

In [4]:
from lib.parseparams import read_dbparams
from lib.firevegdb import dbquery, batch_upsert
import lib.austraits_util as aust

### Database credentials

🤫 We use a folder named "secrets" to keep the credentials for connecting to different services (database credentials, API keys, etc.). We listed this folder in our `.gitignore` so that its contents are not tracked by git and not exposed. Future users need to copy the contents of this folder manually.

We read database credentials stored in a `database.ini` file using our own `read_dbparams` function.

In [5]:
dbparams = read_dbparams(repodir / 'secrets' / 'database.ini', 
                         section='fireveg-db-v1.1')
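
The `read_dbparams` helper is imported from `lib.parseparams` and its code is not shown in this notebook. A minimal sketch of how such a reader could work with the standard `configparser` module follows. The function name `read_ini_section` and the idea that the file holds plain `key = value` pairs under one section per database are assumptions for illustration, not the project's actual code.

```python
from configparser import ConfigParser

def read_ini_section(filename, section):
    """Return the key/value pairs of one section of an .ini file as a dict."""
    parser = ConfigParser()
    # ConfigParser.read() silently skips missing files and returns the
    # list of files it could read, so an empty result means failure
    if not parser.read(filename):
        raise FileNotFoundError(f"Could not read {filename}")
    if not parser.has_section(section):
        raise KeyError(f"Section {section!r} not found in {filename}")
    return dict(parser.items(section))
```

A dictionary of this kind can be passed as keyword arguments to a database connection call, provided its keys match the parameters that the connection function expects.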

## Read data from database

In [6]:
qry = """
SELECT code, name, description, value_type, life_stage,
    life_history_process, priority, category_vocabulary, method_vocabulary 
FROM litrev.trait_info 
ORDER BY code
"""
trait_info = dbquery(qry, dbparams)

In [8]:
# Lookup tables keyed by trait code
traitnames = dict()
traitvocabs = dict()
methodvocabs = dict()

# Vocabularies are stored in the database as JSON comments.
# For categorical traits and methods: the comment attached to a database type
qrycat = "SELECT pg_catalog.obj_description(t.oid, 'pg_type')::json from pg_type t where typname = '%s';"
# For other traits: the comment attached to the `best` column of the
# table named after the trait code
qrynum = """
SELECT (
    SELECT pg_catalog.col_description(c.oid, cols.ordinal_position::int)
    FROM pg_catalog.pg_class c
    WHERE c.oid = (
        SELECT CONCAT(cols.table_schema,'.',cols.table_name)::regclass::oid)
    AND c.relname = cols.table_name)::json AS column_comment
FROM information_schema.columns cols
WHERE cols.table_catalog = 'dbfireveg'
AND cols.table_schema = 'litrev'
AND cols.table_name = '%s'
AND cols.column_name = 'best';
"""

for k in trait_info:
    code = k['code']
    traitnames[code] = {
        'name': k['name'],
        'type': k['value_type'],
        'method': k['method_vocabulary'] is not None,
    }
    # Pick the query that matches the trait type
    if k['value_type'] == 'categorical':
        qrystr = qrycat % (k['category_vocabulary'],)
    else:
        qrystr = qrynum % (code,)
    res = dbquery(qrystr, dbparams)
    # Keep the vocabulary only when exactly one row is returned
    if len(res) == 1:
        traitvocabs[code] = res[0]
    # Method vocabularies are optional
    if k['method_vocabulary'] is not None:
        qrystr = qrycat % (k['method_vocabulary'],)
        methodvocabs[code] = dbquery(qrystr, dbparams)

## Create a dictionary of definitions

In [40]:
trait_definitions = dict()

for trait in trait_info:
    code = trait['code']
    trait_definitions[code] = { 
        'label': trait['name'], 
        'description': trait['description'], 
        'life_stage': trait['life_stage'], 
        'life_history_process': trait['life_history_process'], 
        'type': trait['value_type'], 
        }
    if code in traitvocabs.keys():
        trait_definitions[code]['allowed_values_levels'] = traitvocabs[code][0]
    if code in ['surv1', 'surv6', 'surv5', 'grow1', 'repr3', 'repr3a', 'repr4']:
        trait_definitions[code]['units'] = 'years'
        trait_definitions[code]['allowed_values_min'] = 0.1
    if code in methodvocabs.keys():
        trait_definitions[code]['allowed_methods_levels'] = methodvocabs[code][0][0]


In [41]:
trait_definitions['repr4']


{'label': 'Maturation age',
 'description': 'The time taken for 50% of individuals in a cohort [even aged recruits] to produce their first viable seed',
 'life_stage': 'Standing plant',
 'life_history_process': 'Reproduction',
 'type': 'numerical',
 'units': 'years',
 'allowed_values_min': 0.1,
 'allowed_methods_levels': {'Direct field observation or measure (unknown sample size)': 'Estimates based on data observed or measured in the field (unknown number of individuals observed). Time series data for seedbank accumulation.',
  'Direct observation (small sample)': 'Estimates based on data observed or measured in the field based on fewer than 10 individuals. Time series measurements of flowering in relation to cohort age.',
  'Direct observation (large sample)': 'Estimates based on data observed or measured in the field based on 10 or more individuals observed). Time series measurements of flowering in relation to cohort age.',
  'Estimated by extrapolation from observed values': 'Estim

Write this dictionary into a `yaml` file:

In [42]:
with open(repodir / 'trait-definitions.yml', 'w') as yaml_file:
    yaml.dump(trait_definitions, yaml_file, default_flow_style=False, sort_keys=False)
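
To see what the two options passed to `yaml.dump` do, here is a small self-contained sketch with a made-up entry (the `example1` trait is hypothetical, not a record from the database). With `sort_keys=False` the keys are written in insertion order rather than alphabetically, and `default_flow_style=False` writes nested mappings in block style, one key per line.

```python
import yaml

# Hypothetical entry, for illustration only
example = {
    'example1': {
        'label': 'Example trait',
        'type': 'numerical',
        'units': 'years',
    }
}

# Dump to a string instead of a file so the result can be inspected
text = yaml.dump(example, default_flow_style=False, sort_keys=False)
print(text)

# Loading the text back returns an equal dictionary with the same key order
restored = yaml.safe_load(text)
```

Keeping the insertion order means the YAML file lists `label` and `description` first for every trait, which makes the file easier to read and to compare between versions.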

## Read in definitions from YAML
The file `fieldvisits-definitions.yml` was created manually. Let's parse it and check that it is well formed:


In [43]:
with open(repodir / 'fieldvisits-definitions.yml', 'r') as file:
    field_visits_definitions = yaml.safe_load(file)

In [45]:
field_visits_definitions.keys()

dict_keys(['observerID', 'field_site', 'surveys', 'field_visit', 'fire_history', 'field_samples', 'quadrat_samples', 'field_visit_veg_description', 'field_visit_vegetation_estimates', 'field_visit_vegetation_raw_values'])
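
Beyond confirming that the file parses, a simple structural check can flag entries with missing fields. The sketch below assumes that every entry should have `label`, `description` and `attributes` keys, as the `fire_history` entry in this notebook does. This expectation and the function `find_incomplete_entries` are assumptions for illustration, so adjust the list of required keys to the actual file.

```python
def find_incomplete_entries(definitions, required=('label', 'description', 'attributes')):
    """Map each entry name to the list of required keys it lacks."""
    problems = {}
    for name, entry in definitions.items():
        # An entry that is not a mapping lacks all required keys
        if not isinstance(entry, dict):
            problems[name] = list(required)
            continue
        missing = [key for key in required if key not in entry]
        if missing:
            problems[name] = missing
    return problems

# Toy input for illustration only (not the real definitions file)
toy = {
    'complete_table': {'label': 'A', 'description': 'B', 'attributes': {'x': 'y'}},
    'incomplete_table': {'label': 'C'},
}
print(find_incomplete_entries(toy))
```

With the parsed file, the call would be `find_incomplete_entries(field_visits_definitions)`; an empty dictionary means that every entry has the expected keys.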

In [46]:
field_visits_definitions['fire_history']

{'label': 'Fire history table',
 'description': 'Table with one record for each recorded fire events. Each fire event is recorded per field site. One field site can have multiple fire events. Large fires affecting multiple sites are recorded here multiple times, once for each field site, but can be grouped by fire name if needed.',
 'attributes': {'site_label': 'ID for a field site, references the Field site table',
  'fire_name': 'Name given to the fire, specially for large fire events affecting multiple sites',
  'fire_date': 'Approximate description of the date of the fire in a text format, can be a year, a range of dates or a full date string.',
  'earliest_date': 'Earliest possible date inferred from fire_date, in a formatted date string.',
  'latest_date': 'Latest possible date inferred from fire_date, in a formatted date string.',
  'how_inferred': 'How fire date was inferred or approximated, either from existing records or from evidence in situ.',
  'cause_of_ignition': 'Probab

In [52]:

df = pd.DataFrame(field_visits_definitions)
df = df.fillna(' ').T
df.to_html(repodir / 'fieldvisits-definitions.html')

## That is it for now!

✅ Job done! 😎👌🔥

You can:
- go [back home](../Instructions-and-workflow.ipynb),
- continue navigating the repo on [GitHub](https://github.com/ces-unsw-edu-au/fireveg-db-exports),
- continue exploring the repo on [OSF](https://osf.io/h96q2/),
- visit the database at <http://fireecologyplants.net>.