# Summary of database import



## Set up

In [1]:
from pathlib import Path
import os
from configparser import ConfigParser
import psycopg2
from psycopg2.extras import DictCursor


repodir = Path("../../") 
inputdir = repodir / "data/"
filename = repodir / 'secrets' / 'database.ini'
section = 'aws-lght-sl'


def read_dbparams(filename, section="postgresql"):
    """Read the connection parameters of one INI section into a dict."""
    # create a parser and read the config file
    parser = ConfigParser()
    parser.read(filename)

    # get section, default to postgresql
    if not parser.has_section(section):
        raise Exception(f"Section {section} not found in the {filename} file")

    # each (key, value) pair becomes a keyword argument for psycopg2.connect
    return dict(parser.items(section))

def dbquery(query, dbparams):
    """Run a query and return all rows as DictRow objects."""
    conn = psycopg2.connect(**dbparams)
    try:
        # DictCursor rows can be indexed by position or by column name
        with conn.cursor(cursor_factory=DictCursor) as cur:
            cur.execute(query)
            return cur.fetchall()
    finally:
        # close the connection even if the query raises an error
        conn.close()
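
`read_dbparams` expects an INI file with one section per database connection, whose keys are passed straight to `psycopg2.connect`. A minimal sketch of that layout, assuming the section holds standard libpq keywords (`host`, `dbname`, `user`, `password`); the host name and credentials are hypothetical placeholders, and `read_dbparams_from_string` is an illustrative variant that parses text instead of a file:

```python
from configparser import ConfigParser

# Hypothetical contents of secrets/database.ini (placeholder host and credentials)
SAMPLE_INI = """
[aws-lght-sl]
host = db.example.org
dbname = dbfireveg
user = reader
password = change-me
"""


def read_dbparams_from_string(text, section="postgresql"):
    """Parse INI text and return the keys of one section as a dict."""
    parser = ConfigParser()
    parser.read_string(text)
    if not parser.has_section(section):
        raise KeyError(f"Section {section} not found")
    # items() yields (key, value) pairs; all values are strings
    return dict(parser.items(section))


dbparams_example = read_dbparams_from_string(SAMPLE_INI, section="aws-lght-sl")
print(dbparams_example["dbname"])  # dbfireveg
```

The resulting dict can be unpacked as `psycopg2.connect(**dbparams_example)`, which is what `dbquery` does with `dbparams`.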



In [2]:
filename = repodir / 'secrets' / 'database.ini'
dbparams=read_dbparams(filename,section='aws-lght-sl')

In [3]:
qry = "select * from litrev.trait_info ;"
trait_info = dbquery(qry,dbparams)

## First tranche

### Trait code surv1

Metadata | Structure | Vocabularies | Records NSWFFRD | Records austraits | Others| Data curation
---|---|---|---|---|---|---
<font color='darkgreen'>Done</font>|<font color='darkgreen'>Done</font>|<font color='darkgreen'>Done</font>|<font color='darkgreen'>Done</font>|TO DO|None|None 

#### Trait information

Check trait information

In [4]:
for elem in filter(lambda x: x['code'] == 'surv1', trait_info):
    for k in elem.keys():
        print("%s: %s" % (k,elem[k]))

code: surv1
name: Resprouting - full canopy scorch
description: Ordinal categories of survival and resprouting proportions for plants subjected to 100% canopy scorch
value_type: categorical
life_stage: Standing plant
life_history_process: Survival
priority: 3rd tranche
category_vocabulary: resprouting_vocabulary
method_vocabulary: method_surv1_vocabulary


#### Vocabularies

This trait uses one vocabulary for the categories and another for the methods:

In [5]:
#For the category
qry_vocabulary="SELECT pg_catalog.obj_description(t.oid, 'pg_type')::json from pg_type t where typname = '%s';" 
dbquery(qry_vocabulary % elem['category_vocabulary'],dbparams)

[[{'None': '< 5 % individuals in a population resprout after 100% canopy scorch',
   'Few': '> 5 and < 30 % individuals in a population resprout after 100% canopy scorch',
   'Half': '> 30 and < 70 % individuals in a population resprout after 100% canopy scorch',
   'Most': '> 70 and < 90 % individuals in a population resprout after 100% canopy scorch',
   'All': '> 90 % individuals in a population resprout after 100% canopy scorch',
   'Unknown': 'No data'}]]
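
The trait description calls these *ordinal* categories, so later analyses may need them as ranks. A minimal sketch, with the order copied from the output above; treating 'Unknown' as a missing value is an assumption, and `resprouting_rank` is a hypothetical helper. Note that the category 'None' is a string label (< 5 % resprouting) and differs from a SQL NULL, which psycopg2 returns as the Python object `None`:

```python
# Ordinal order of resprouting_vocabulary, copied from the query output above
RESPROUTING_ORDER = ["None", "Few", "Half", "Most", "All"]
# Assumption: 'Unknown' ("No data") is a missing value, not a rank
MISSING_CATEGORIES = {"Unknown"}


def resprouting_rank(category):
    """Return 0-4 for an ordinal category, or None for missing values.

    The string 'None' is a real category; a SQL NULL arrives as the
    Python object None and is treated as missing.
    """
    if category is None or category in MISSING_CATEGORIES:
        return None
    # list.index raises ValueError for labels outside the vocabulary
    return RESPROUTING_ORDER.index(category)


print([resprouting_rank(c) for c in ["None", "Half", "All", "Unknown", None]])
# [0, 2, 4, None, None]
```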

In [6]:
#For the methods
dbquery(qry_vocabulary % elem['method_vocabulary'],dbparams)

[[{'Direct field observation or measure (unknown sample size)': 'Estimates based on data observed or measured in the field (unknown number of individuals observed). Survival estimates based on counts of individuals tagged pre-fire or counts of resprouters and dead remains postfire',
   'Direct observation (small sample)': 'Estimates based on data observed or measured in the field based on fewer than 10 individuals. Survival estimates based on counts of individuals tagged pre-fire or counts of resprouters and dead remains postfire',
   'Direct observation (large sample)': 'Estimates based on data observed or measured in the field based on 10 or more individuals observed). Survival estimates based on counts of individuals tagged pre-fire or counts of resprouters and dead remains postfire',
   'Estimated by extrapolation from observed values': 'Estimated by extrapolation from (time series) of observed values. High mortality from 100% scorch may be inferred if mortality is high from partia

#### Importing records

I ran the script in notebook '[Read resprouting data from NSWFFRD](python/Literature-review/Read-resprouting-data-from-NSWFFRD.ipynb)'. It reported 11563 imported records; let's check them against the database.

#### Check records in database

In [7]:
qry_values = ' select norm_value,count(*),count(distinct species),count(distinct species_code) from litrev.%s group by norm_value;'

dbquery(qry_values % 'surv1',dbparams)

[['None', 4439, 1601, 1589],
 ['Few', 254, 181, 181],
 ['Half', 237, 179, 178],
 ['Most', 244, 163, 163],
 ['All', 6385, 1935, 1907],
 ['Unknown', 4, 4, 4]]
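
The `count(*)` column should add up to the number of records the import notebook reported. A small sketch of that check, using the rows copied from the output above (in the notebook you could pass the result of `dbquery(qry_values % 'surv1', dbparams)` directly); `check_total` is a hypothetical helper:

```python
# Rows copied from the output above: [norm_value, records, species, species codes]
surv1_counts = [
    ["None", 4439, 1601, 1589],
    ["Few", 254, 181, 181],
    ["Half", 237, 179, 178],
    ["Most", 244, 163, 163],
    ["All", 6385, 1935, 1907],
    ["Unknown", 4, 4, 4],
]


def check_total(rows, expected):
    """Sum the record counts (second column) and compare with the expected total."""
    total = sum(row[1] for row in rows)
    return total, total == expected


print(check_total(surv1_counts, 11563))  # (11563, True)
```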

Are any records missing a normalized value?

In [8]:
qry = ' select raw_value,count(*),count(distinct species),count(distinct species_code) from litrev.surv1 where norm_value is NULL group by raw_value;'

dbquery(qry,dbparams)

[]
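
`qry_values` inserts the table name with Python `%` formatting, which works for the fixed trait codes used here but would pass any string through to SQL. A minimal sketch of a guard, assuming trait tables are always plain lowercase identifiers in schema `litrev`; `is_valid_table_name` and `values_query` are hypothetical helpers. psycopg2's own `psycopg2.sql.Identifier` is the library mechanism for composing identifiers; this sketch avoids it only to stay independent of a database connection:

```python
import re

# Assumption: trait table names look like 'surv1', 'repr2', 'germ1'
_TABLE_NAME = re.compile(r"[a-z_][a-z0-9_]*")

QRY_VALUES_TEMPLATE = (
    "select norm_value, count(*), count(distinct species), "
    "count(distinct species_code) from litrev.{table} group by norm_value;"
)


def is_valid_table_name(name):
    """True only if the whole name is a plain lowercase identifier."""
    return _TABLE_NAME.fullmatch(name) is not None


def values_query(table):
    """Build the per-category count query, rejecting unexpected table names."""
    if not is_valid_table_name(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return QRY_VALUES_TEMPLATE.format(table=table)


print(values_query("surv1"))
```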

### Trait code surv4

Metadata | Structure | Vocabularies | Records NSWFFRD | Records austraits | Others | Data curation
---|---|---|---|---|---|---
<font color='darkgreen'>Done</font>|<font color='darkgreen'>Done</font>|<font color='darkgreen'>Done</font>| <font color='darkgreen'>Done</font> | TO DO | None | TO DO: ca. 20 records 

#### Trait information

Check trait information

In [9]:
for elem in filter(lambda x: x['code'] == 'surv4', trait_info):
    for k in elem.keys():
        print("%s: %s" % (k,elem[k]))

code: surv4
name: Regenerative Organ
description: None
value_type: categorical
life_stage: Standing plant
life_history_process: Survival
priority: 3rd tranche
category_vocabulary: resprout_organ_vocabulary
method_vocabulary: None


#### Vocabularies

This trait uses one vocabulary for the categories but does not include a methods vocabulary:

In [10]:
dbquery(qry_vocabulary % elem['category_vocabulary'],dbparams)

[[{'Epicormic': 'Resprouting from epicormic meristematic tissues or buds beneath bark on woody aerial stems',
   'Apical': 'Resprouting from active shoots protected by crowded leaf bases on woody stems',
   'Lignotuber': 'Resprouting from meristematic tissues or buds on lignotubers (swollen woody organ) at or just below the soil surface',
   'Basal': 'Resprouting from meristematic tissues or buds at the base of woody stems at or just below the soil surface',
   'Tuber': 'Resprouting from non-woody nodular subsoil organs (bulbs, corms, tubers, taproots) with active shoots or dormant buds',
   'Tussock': 'Shoots protected and resprouting within tightly clustered tillers and resprouting  without significant lateral spread',
   'Long rhizome or root sucker': 'Resprouting from buried woody or non-woody horizontal organs capable of lateral spread, typically >0.5m. Includes root suckers.',
   'Short rhizome': 'Resprouting from buried woody or non-woody horizontal organs, but not capable of si

#### Importing records

I ran the script in notebook '[Read categorical variables from NSWFFRD](python/Literature-review/Read-categorical-variables-from-NSWFFRD.ipynb)' for several variables. For _surv4_ it imported 1411 records; only 18 of them had errors and could not be assigned to one of the predefined categories.


#### Check records in database

In [11]:
dbquery(qry_values % 'surv4',dbparams)

[['Epicormic', 147, 146, 146],
 ['Apical', 14, 14, 14],
 ['Lignotuber', 157, 157, 157],
 ['Basal', 630, 630, 613],
 ['Tuber', 104, 104, 104],
 ['Short rhizome', 166, 165, 165],
 ['Long rhizome or root sucker', 172, 171, 171],
 ['Stolon', 3, 3, 3],
 [None, 18, 17, 17]]

Are any records missing a normalized value?

In [12]:
qry = ' select raw_value,count(*),count(distinct species),count(distinct species_code) from litrev.surv4 where norm_value is NULL group by raw_value;'

res=dbquery(qry,dbparams)

In [13]:
len(res)

13

In [14]:
res

[[['Resprout location', 'apical buds'], 4, 4, 4],
 [['Resprout location', 'basal, possibly rhizome', '->', 'possibly rhizome'],
  1,
  1,
  1],
 [['Resprout location', 'basal rhizome'], 1, 1, 1],
 [['Resprout location', 'basal & some epicormic', '->', 'some epicormic'],
  1,
  1,
  1],
 [['Resprout location',
   'basal; stem buds; swamp form basal buds',
   '->',
   'swamp form basal buds'],
  1,
  1,
  1],
 [['Resprout location', 'basal: tuber (10; 48)', '->', 'basal: tuber'],
  1,
  1,
  1],
 [['Resprout location', 'has rhizome'], 2, 2, 2],
 [['Resprout location', 'or rhizome'], 1, 1, 1],
 [['Resprout location', 'rhizome?'], 1, 1, 1],
 [['Resprout location', 'roots'], 2, 2, 2],
 [['Resprout location', 'rootsuckers &', '->', ''], 1, 1, 1],
 [['Resprout location', 'sometimes suckers'], 1, 1, 1],
 [['Resprout location', 'stem bases'], 1, 1, 1]]
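
Almost all of the 18 unassigned `surv4` values contain a word related to one of the vocabulary categories. One possible first step for the pending data curation is to flag candidate categories by keyword and review them by hand. The sketch below is hypothetical: the keyword prefixes are assumptions, and ambiguous values such as 'basal rhizome' deliberately return several candidates rather than a decision:

```python
import re

# Hypothetical keyword prefixes for each resprout_organ_vocabulary category
ORGAN_KEYWORDS = {
    "Epicormic": ("epicormic",),
    "Apical": ("apical",),
    "Lignotuber": ("lignotuber",),
    "Basal": ("basal",),
    "Tuber": ("tuber",),
    "Short rhizome": ("rhizome",),
    "Long rhizome or root sucker": ("rhizome", "root", "sucker"),
    "Stolon": ("stolon",),
}


def candidate_organs(raw_text):
    """Return the categories whose keyword prefixes start any word in raw_text."""
    words = re.findall(r"[a-z]+", raw_text.lower())
    return [
        category
        for category, prefixes in ORGAN_KEYWORDS.items()
        if any(word.startswith(prefix) for word in words for prefix in prefixes)
    ]


for raw in ["apical buds", "basal rhizome", "rootsuckers &", "stem bases"]:
    print(raw, "->", candidate_organs(raw))
```

Matching on word prefixes rather than substrings keeps 'lignotuber' from also matching 'Tuber'.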

### repr3
Metadata | Structure | Vocabularies | Records NSWFFRD | Records austraits | Others
---|---|---|---|---|---
<font color='red'>TO DO</font>|<font color='red'>TO DO</font>|<font color='red'>TO DO</font>| <font color='red'>TO DO</font>| <font color='red'>TO DO</font> | None 

Check the comment on the `best` column of `litrev.repr3`:

In [52]:
qry_comment = """
SELECT
    (
        SELECT
            pg_catalog.col_description(c.oid, cols.ordinal_position::int)
        FROM pg_catalog.pg_class c
        WHERE
            c.oid     = (SELECT CONCAT(cols.table_schema,'.',cols.table_name)::regclass::oid) AND
            c.relname = cols.table_name
    ) as column_comment
 
FROM information_schema.columns cols
WHERE
    cols.table_catalog = 'dbfireveg' AND
    cols.table_schema  = 'litrev' AND
    cols.table_name    = '%s' AND
    cols.column_name    = 'best';   
""" 

res=dbquery(qry_comment % 'repr3',dbparams)
res

[['{"years": "The time taken for first individual in a recruitment cohort to produce their first reproductive organs (e.g. flowers, sporophylls)"}']]
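
The column comment comes back as a plain string holding a JSON object, apparently keyed by the unit of the `best` column. A sketch of decoding it with the standard `json` module; the string is copied from the output above, and reading the key as the unit is an interpretation rather than something the schema states:

```python
import json

# Comment string copied from the output above (res[0][0] in the notebook)
comment = '{"years": "The time taken for first individual in a recruitment cohort to produce their first reproductive organs (e.g. flowers, sporophylls)"}'

decoded = json.loads(comment)
# Assumption: the single key names the unit and the value describes the trait
unit, meaning = next(iter(decoded.items()))
print(unit)  # years
```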

### repr4
Metadata | Structure | Vocabularies | Records NSWFFRD | Records austraits | Others
---|---|---|---|---|---
<font color='red'>TO DO</font>|<font color='red'>TO DO</font>|<font color='red'>TO DO</font>| <font color='red'>TO DO</font>| <font color='red'>TO DO</font>  | None 

### germ8
Metadata | Structure | Vocabularies | Records NSWFFRD | Records austraits | Others
---|---|---|---|---|---
<font color='red'>TO DO</font>|<font color='red'>TO DO</font>|<font color='red'>TO DO</font>| <font color='red'>TO DO</font>| <font color='red'>TO DO</font>  | None 

## Second tranche

### repr2
Metadata | Structure | Vocabularies | Records NSWFFRD | Records austraits | Others | Data curation
---|---|---|---|---|---|---
<font color='red'>TO DO: update description</font>|<font color='darkgreen'>Done</font>|<font color='darkgreen'>Done</font>| <font color='darkgreen'>Done</font>|  TO DO | None | <font color='red'>TO DO: ca. 20 records</font>


#### Trait information

In [20]:
for elem in filter(lambda x: x['code'] == 'repr2', trait_info):
    for k in elem.keys():
        print("%s: %s" % (k,elem[k]))

code: repr2
name: Post-fire flowering response
description: None
value_type: categorical
life_stage: Standing plant
life_history_process: Reproduction
priority: 2nd tranche
category_vocabulary: postfire_response
method_vocabulary: None


#### Vocabularies

This trait uses one vocabulary for the categories but does not include a methods vocabulary:

In [21]:
dbquery(qry_vocabulary % elem['category_vocabulary'],dbparams)

[[{'Exclusive': 'Flowering occurs exclusively in the first 5 years after fire (excluding outliers, e.g roadsides)',
   'Facultative': 'Flowering occurs most prolifically in the first 5 years after fire',
   'Negligible': 'Flowering similar or variable throughout the fire cycle',
   'Unknown': 'No data'}]]

#### Importing records

I ran the script in notebook '[Read categorical variables from NSWFFRD](python/Literature-review/Read-categorical-variables-from-NSWFFRD.ipynb)' for several variables. For _repr2_ it imported 144 records (ca. 120 species) in two categories; 23 records had errors and could not be assigned to one of the predefined categories. There are no records for the classes 'Negligible' or 'Unknown'.


#### Check records in database

In [23]:
dbquery(qry_values % 'repr2',dbparams)

[['Exclusive', 30, 30, 30], ['Facultative', 91, 91, 91], [None, 23, 21, 21]]
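
The import summary notes that no records fell into 'Negligible' or 'Unknown'. A small sketch that derives this from the vocabulary keys and the query result (both copied from the outputs above; in the notebook the keys would come from the first row of the vocabulary query) instead of reading it off by eye. The `None` row holds the unassigned records and is skipped:

```python
# Vocabulary keys of postfire_response, copied from the output above
repr2_vocabulary = ["Exclusive", "Facultative", "Negligible", "Unknown"]
# Rows copied from the output above: [norm_value, records, species, species codes]
repr2_counts = [["Exclusive", 30, 30, 30], ["Facultative", 91, 91, 91], [None, 23, 21, 21]]

# Categories that received at least one record (None = not assigned to any category)
observed = {row[0] for row in repr2_counts if row[0] is not None}
empty_categories = [c for c in repr2_vocabulary if c not in observed]
print(empty_categories)  # ['Negligible', 'Unknown']
```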

### grow1
Metadata | Structure | Vocabularies | Records NSWFFRD | Records austraits | Others
---|---|---|---|---|---
TO DO|TO DO|TO DO| TO DO| TO DO | None 


### rect2
Metadata | Structure | Vocabularies | Records NSWFFRD | Records austraits | Others | Data curation
---|---|---|---|---|---|---
<font color='darkgreen'>Done</font>|<font color='darkgreen'>Done</font>|<font color='darkgreen'>Done</font>| <font color='darkgreen'>Done</font>| TO DO | None | <font color='red'>TO DO: >90 records</font>


In [24]:
for elem in filter(lambda x: x['code'] == 'rect2', trait_info):
    for k in elem.keys():
        print("%s: %s" % (k,elem[k]))

code: rect2
name: Establishment pattern
description: The temporal pattern of seedling or clonal establishment through the fire cycle
value_type: categorical
life_stage: Seedling
life_history_process: Recruitment
priority: 3rd tranche
category_vocabulary: establishment_vocabulary
method_vocabulary: None


In [25]:
dbquery(qry_vocabulary % elem['category_vocabulary'],dbparams)

[[{'Intolerant': 'I- intolerant of conditions in established communities (i.e. unburnt for some years) and largely restricted to early post-fire years (open conditions, notionally within 5 yrs of previous fire). See Noble & Slatyer (1980, Vegetatio)',
   'Intolerant-Tolerant': 'I/T- intolerant or tolerant of conditions in established communities (i.e. unburnt for some years) , variable or uncertain. See Noble & Slatyer (1980, Vegetatio)',
   'Tolerant': 'T- tolerant of conditions through the fire cycle (i.e. new recruits observed in both burnt and unburnt sites). See Noble & Slatyer (1980, Vegetatio)',
   'Tolerant-Requiring': 'T/R- tolerant or requiring of conditions in established communities, uncertain or variable. See Noble & Slatyer (1980, Vegetatio)',
   'Requiring': 'R- requiring of of conditions in established communities (i.e. new recruits only observed at sites unburnt for some years). See Noble & Slatyer (1980, Vegetatio)',
   'Unknown': 'No data'}]]

#### Importing records

I ran the script in notebook '[Read categorical variables from NSWFFRD](python/Literature-review/Read-categorical-variables-from-NSWFFRD.ipynb)' for several variables. For _rect2_ it imported 1088 records (>900 species) in four categories; 96 records had errors and could not be assigned to one of the predefined categories.


#### Check records in database

In [40]:
dbquery(qry_values % 'rect2',dbparams)

[['Intolerant', 636, 634, 634],
 ['Intolerant-Tolerant', 3, 3, 3],
 ['Tolerant', 318, 318, 318],
 ['Requiring', 35, 34, 34],
 [None, 96, 92, 92]]

### germ1
Metadata | Structure | Vocabularies | Records NSWFFRD | Records austraits | Others | Data curation
---|---|---|---|---|---|--- 
<font color='red'>TO DO: add description</font>|<font color='darkgreen'>Done</font>|<font color='darkgreen'>Done</font>| <font color='darkgreen'>Done: 1635 imported</font>| TO DO | None | <font color='red'>TO DO: >80 records</font>



In [36]:
for elem in filter(lambda x: x['code'] == 'germ1', trait_info):
    for k in elem.keys():
        print("%s: %s" % (k,elem[k]))

code: germ1
name: Seedbank Type
description: None
value_type: categorical
life_stage: Seed
life_history_process: Germination
priority: 2nd tranche
category_vocabulary: seedbank_vocabulary
method_vocabulary: None


In [37]:
dbquery(qry_vocabulary % elem['category_vocabulary'],dbparams)

[[{'Canopy': 'Seeds withheld for >1yr within woody fruits in the plant canopy that release seeds when scorched',
   'Soil-persistent': 'Seeds released at maturity with a fraction (min 10%) remaining viable in the soil for >1yr',
   'Transient': 'Seeds released from woody or fleshy fruits at maturity with a negligible fraction (max 1%) remaining viable in the soil for >1yr',
   'Non-canopy': 'Seedbanks uncertain but not canopy type (either Soil-persistent or Transient seedbank type)'}]]

#### Importing records

I ran the script in notebook '[Read categorical variables from NSWFFRD](python/Literature-review/Read-categorical-variables-from-NSWFFRD.ipynb)' for several variables. For _germ1_ it imported 1635 records (>1300 species) in four categories; 83 records had errors and could not be assigned to one of the predefined categories.


#### Check records in database

In [41]:
dbquery(qry_values % 'germ1',dbparams)

[['Soil-persistent', 962, 958, 957],
 ['Transient', 267, 267, 266],
 ['Canopy', 228, 226, 220],
 ['Non-canopy', 95, 95, 90],
 [None, 83, 82, 82]]
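
The share of records that could not be assigned to a category differs a lot between the categorical traits imported so far. A sketch that computes it from the `None` rows and the import totals reported in this notebook (numbers copied from the outputs above):

```python
# (records with norm_value NULL, total records imported), copied from this notebook
unassigned = {
    "surv4": (18, 1411),
    "repr2": (23, 144),
    "rect2": (96, 1088),
    "germ1": (83, 1635),
}

# Percentage of unassigned records per trait, rounded to one decimal
shares = {trait: round(100 * missing / total, 1) for trait, (missing, total) in unassigned.items()}

for trait, share in shares.items():
    missing, total = unassigned[trait]
    print(f"{trait}: {missing}/{total} records unassigned ({share} %)")
```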

### repr3a
Metadata | Structure | Vocabularies | Records NSWFFRD | Records austraits | Others
---|---|---|---|---|---
TO DO|TO DO|TO DO| TO DO| TO DO | None 


## Third tranche

### surv5
Metadata | Structure | Vocabularies | Records NSWFFRD | Records austraits | Others | Data curation
---|---|---|---|---|---|---
<font color='darkgreen'>Done</font>|TO DO|TO DO|TO DO| TO DO| TO DO | None 


In [42]:
for elem in filter(lambda x: x['code'] == 'surv5', trait_info):
    for k in elem.keys():
        print("%s: %s" % (k,elem[k]))

code: surv5
name: Standing plant longevity (Max)
description: Age at which 50% of individuals in a cohort (excluding outliers) have died from senescence
value_type: numerical
life_stage: Standing plant
life_history_process: Survival
priority: 3rd tranche
category_vocabulary: None
method_vocabulary: None


### surv6
Metadata | Structure | Vocabularies | Records NSWFFRD | Records austraits | Others | Data curation
---|---|---|---|---|---|---
<font color='darkgreen'>Done</font>|TO DO|TO DO|TO DO| TO DO| TO DO | None 


In [43]:
for elem in filter(lambda x: x['code'] == 'surv6', trait_info):
    for k in elem.keys():
        print("%s: %s" % (k,elem[k]))

code: surv6
name: Seedbank half-life
description: Age at which 50% of a seed cohort in an in situ seedbank have decayed or become inviable
value_type: numerical
life_stage: Seed
life_history_process: Survival
priority: 3rd tranche
category_vocabulary: None
method_vocabulary: None


### surv7
Metadata | Structure | Vocabularies | Records NSWFFRD | Records austraits | Others | Data curation
---|---|---|---|---|---|---
<font color='darkgreen'>Done</font>|TO DO|TO DO|TO DO| TO DO| TO DO | None 


In [44]:
for elem in filter(lambda x: x['code'] == 'surv7', trait_info):
    for k in elem.keys():
        print("%s: %s" % (k,elem[k]))

code: surv7
name: Seed longevity
description: Age at which all seeds in a cohort (excluding outliers, e.g. 95th percentile) have decayed or become inviable
value_type: numerical
life_stage: Seed
life_history_process: Survival
priority: 3rd tranche
category_vocabulary: None
method_vocabulary: None


In [48]:
import pandas as pd

res = dbquery('SELECT code, name, life_stage, life_history_process, priority, value_type FROM litrev.trait_info', dbparams)
# name the columns directly instead of renaming positional columns afterwards
data = pd.DataFrame(res, columns=["Trait code", "Trait name", "Life stage",
                                  "Life history process", "priority", "value_type"])

# traits with an assigned priority, sorted by tranche
first = data.loc[data.priority.notna()].sort_values(by='priority').fillna(0)

# traits without a priority, indexed by trait code
data.set_index(['Trait code'], inplace=True)
data.index.name = None
fourth = data.loc[data.priority.isna()][['Trait name', 'Life stage', 'Life history process']]

In [50]:
fourth

Trait code | Trait name | Life stage | Life history process
---|---|---|---
rect3 | Seedling emergence phenology | Seedling | Recruitment
disp1 | Propagule dispersal mode | Seed | Dispersal
germ10 | Postfire residual seedbank | Seed | Germination
grow2a | Maximum growth stage | Standing plant | Growth
grow3 | Maximum bark thickness | Standing plant | Growth
grow4 | Maximum plant height | Standing plant | Growth
repr1 | Flowering time | Standing plant | Reproduction
repr5 | Age at maximum seed production | Standing plant | Reproduction
repr6 | Age of maximum seed bank size | Standing plant | Reproduction
repr7a | Resprout flowering response to summer fire | Standing plant | Reproduction
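
The pandas cell separates traits with and without a `priority`. An equivalent grouping in plain Python, sketched with a few (code, priority) pairs copied from the trait information printed in this notebook; in the notebook the pairs would come from `litrev.trait_info`, and the label 'unassigned' for a NULL priority is an assumption:

```python
from collections import defaultdict

# (trait code, priority) pairs copied from the trait information above;
# None stands for a NULL priority, as for the traits listed in `fourth`
traits = [
    ("surv1", "3rd tranche"),
    ("surv4", "3rd tranche"),
    ("repr2", "2nd tranche"),
    ("rect2", "3rd tranche"),
    ("germ1", "2nd tranche"),
    ("rect3", None),
]

by_priority = defaultdict(list)
for code, priority in traits:
    # Assumption: traits without a priority are grouped under 'unassigned'
    by_priority[priority or "unassigned"].append(code)

for priority in sorted(by_priority):
    print(priority, by_priority[priority])
```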
