# Read fire response from AusTraits data

We will download data from [AusTraits](https://austraits.org/) ([pre-print](https://www.biorxiv.org/content/10.1101/2021.01.04.425314v1)) and add entries to the database for resprouting time for each species.

Let's start by loading the libraries:

In [1]:
from pathlib import Path
import os
import json
import urllib.request
from zipfile import ZipFile
import pandas as pd
import numpy as np

## Read _austraits_ data 
We will download the file from the [Zenodo repository](https://zenodo.org/record/5112001) using the API URL and save it under the data folder.

In [8]:
repodir = Path("../../") 
dataset = "https://zenodo.org/api/records/3568417"
outputdir = repodir / "data/austraits/"

if not os.path.isdir(outputdir):
    os.makedirs(outputdir)

We use `urllib` to open the URL and read the data (if the connection succeeds):

In [9]:
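def getResponse(url):
    # Open the URL; urlopen itself raises an error for most failed requests
    operUrl = urllib.request.urlopen(url)
    if operUrl.getcode() != 200:
        # Fail loudly instead of returning an undefined variable
        raise RuntimeError("Error receiving data: HTTP %s" % operUrl.getcode())
    return operUrl.read()
zrecord = getResponse(dataset)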

The response is in JSON format; we parse it and list the available files:

In [10]:
jsonData = json.loads(zrecord)
#jsonData
for files in jsonData['files']:
    print(files['key'])

austraits-3.0.2.rds
austraits-3.0.2.zip
dictionary.html
NEWS.md
readme.txt


We want to download the zip file that contains the CSV files:

In [11]:
outputfile = outputdir / jsonData['files'][1]['key']

if os.path.isfile(outputfile):
    print('File exists')
else:
    resp = getResponse(jsonData['files'][1]['links']['self'])
    # Write the downloaded bytes; the context manager closes the file
    with open(outputfile,'wb') as output:
        output.write(resp)

File exists
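
The download cell above picks the zip file by its position in the file list (`[1]`). As a hedged alternative, the entry could be selected by file name, so the code keeps working if the order of files in the record changes. The `record` dictionary and the `find_file` helper below are hypothetical stand-ins for illustration (the URLs are placeholders), not part of the Zenodo API:

```python
# Hypothetical stand-in for the parsed Zenodo record (only the keys used here)
record = {
    "files": [
        {"key": "austraits-3.0.2.rds", "links": {"self": "https://example.org/rds"}},
        {"key": "austraits-3.0.2.zip", "links": {"self": "https://example.org/zip"}},
        {"key": "readme.txt", "links": {"self": "https://example.org/readme"}},
    ]
}

def find_file(record, suffix):
    """Return the first file entry whose key ends with `suffix`."""
    for entry in record["files"]:
        if entry["key"].endswith(suffix):
            return entry
    raise KeyError("no file ending in %r" % suffix)

zip_entry = find_file(record, ".zip")
print(zip_entry["key"])
```

With the real record, `find_file(jsonData, ".zip")` would play the role of `jsonData['files'][1]`.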


We will read the data we need directly from the zip file:

In [12]:
zfobj = ZipFile(outputfile)
zfobj.namelist()

['austraits-3.0.2/',
 'austraits-3.0.2/taxa.csv',
 'austraits-3.0.2/methods.csv',
 'austraits-3.0.2/definitions.yml',
 'austraits-3.0.2/build_info.md',
 'austraits-3.0.2/contributors.csv',
 'austraits-3.0.2/contexts.csv',
 'austraits-3.0.2/excluded_data.csv',
 'austraits-3.0.2/traits.csv',
 'austraits-3.0.2/taxonomic_updates.csv',
 'austraits-3.0.2/sites.csv',
 'austraits-3.0.2/sources.bib']

### Read files
We need to read the definitions (in _yaml_ format), the sources or references (in _bibtex_ format), and the traits and taxonomic data (in _csv_ format).

In [13]:
import yaml

with zfobj.open('austraits-3.0.2/definitions.yml') as file:
    try:
        ATdefinitions = yaml.safe_load(file)   
        print(ATdefinitions.keys())
    except yaml.YAMLError as exc:
        print(exc)

dict_keys(['traits', 'value_type', 'austraits', 'metadata'])


In [14]:
from pybtex.database.input import bibtex
parser = bibtex.Parser()

ATrefs = parser.parse_bytes(zfobj.open('austraits-3.0.2/sources.bib').read())


In [15]:
ATtraits = pd.read_csv(zfobj.open('austraits-3.0.2/traits.csv'),low_memory=False)

In [16]:
ATtaxa = pd.read_csv(zfobj.open('austraits-3.0.2/taxa.csv'))
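
The read cells above pass the handle returned by `ZipFile.open` straight to a parser, so nothing is extracted to disk. A minimal sketch of the same pattern using only the standard library is shown here; the archive, the member name `demo/traits.csv` and its single row are made up for the example:

```python
import csv
import io
from zipfile import ZipFile

# Build a tiny in-memory archive; the member name and row are made up
buffer = io.BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr("demo/traits.csv",
                "taxon_name,trait_name,value\nA b,fire_response,resprouts\n")

with ZipFile(buffer) as zf:
    # ZipFile.open returns a binary file object; wrap it to decode text
    with zf.open("demo/traits.csv") as member:
        text = io.TextIOWrapper(member, encoding="utf-8", newline="")
        rows = list(csv.DictReader(text))

print(rows[0]["value"])
```

`pd.read_csv` accepts the same kind of file object, which is why the cells above work without unpacking the archive.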

We will also read the updated species data from BioNET:

In [17]:
inputdir = repodir / "data/"
BioNET = pd.read_excel(inputdir / 'vis-survey-datasheet-6000.PowerQuery.20210708.xlsx')

## Read fire response data

In [18]:
ss = (ATtraits['trait_name']=='fire_response' )
ATtraits[ss]

Unnamed: 0,dataset_id,taxon_name,site_name,context_name,observation_id,trait_name,value,unit,date,value_type,replicates,original_name
15332,Baker_2019,Commersonia bartramia,Bogangar -28.3303611_and_153.5519444,,Baker_2019_01,fire_response,resprouts,,,expert_mean,,Commersonia bartramia
15334,Baker_2019,Denhamia celastroides,Bogangar -28.3303611_and_153.5519444,,Baker_2019_02,fire_response,resprouts,,,expert_mean,,Denhamia celastroides
15336,Baker_2019,Blechnum neohollandicum,Bogangar -28.3303611_and_153.5519444,,Baker_2019_03,fire_response,resprouts,,,expert_mean,,Doodia aspera
15338,Baker_2019,Pittosporum undulatum,Bogangar -28.3303611_and_153.5519444,,Baker_2019_04,fire_response,resprouts,,,expert_mean,,Pittosporum undulatum
15340,Baker_2019,Polyscias sambucifolia,Main Arm -28.4936667_and_153.3923611,,Baker_2019_05,fire_response,resprouts,,,expert_mean,,Polyscias sambucifolia
...,...,...,...,...,...,...,...,...,...,...,...,...
959253,White_2020,Roepera ovata,,,White_2020_7923,fire_response,fire_killed,,,expert_mean,,Zygophyllum ovatum
959272,White_2020,Roepera prismatotheca,,,White_2020_7924,fire_response,fire_killed,,,expert_mean,,Zygophyllum prismatothecum
959291,White_2020,Roepera similis,,,White_2020_7925,fire_response,fire_killed,,,expert_mean,,Zygophyllum simile
959310,White_2020,Roepera ammophila,,,White_2020_7926,fire_response,fire_killed,,,expert_mean,,Zygophyllum sp. aff. ammophilum


In [19]:
def extract_reflabel(refid):
    authors=list()
    year=ATrefs.entries[refid].fields['year']
    for person in ATrefs.entries[refid].persons['author']:
        authors.extend(person.last_names)
    reflabel = "%s %s" % (" ".join(authors),year)
    if len(reflabel)>50:
        reflabel=reflabel[0:47]+"..."
    return(reflabel)

def extract_refinfo(refid):
    year=ATrefs.entries[refid].fields['year']
    title=ATrefs.entries[refid].fields['title']
    persons = ATrefs.entries[refid].persons['author']
    if len(persons)==1:
        refcitation = "%s (%s) %s" % (persons[0],year, title)
    else:
        authors=list()
        for person in persons:
            authors.append(person.__str__())
        refcitation = "%s (%s) %s" % ("; ".join(authors),year, title)
    for f in ('journal','volume','doi'):
        if f in ATrefs.entries[refid].fields.keys():
            refcitation = refcitation + " " + ATrefs.entries[refid].fields[f]
    return refcitation 

def match_spcode(row):
    spname=row['taxon_name']
    altname=row['original_name']
    result={'species':spname}
    if altname!=spname:
        result['original_notes']=['original_name:',altname]
    spp_info = BioNET[BioNET['scientificName'] == spname] 
    spcode=None
    # A Series is never None, so test the matched code for missing values instead
    if len(spp_info)==1 and spp_info.speciesCode_Synonym.notna().all():
        spcode=spp_info.speciesCode_Synonym.values[0]
        result['species_code']=spcode
    elif spname != altname:
        spp_info = BioNET[BioNET['scientificName'] == altname]
        if len(spp_info)==1 and spp_info.speciesCode_Synonym.notna().all():
            spcode=spp_info.speciesCode_Synonym.values[0]
            result['species_code']=spcode
            result['original_notes'].append('original name used to match with BioNET names')
 
    return result


In [20]:
print(extract_refinfo('NSWFRD_2014'))
print(extract_reflabel('NSWFRD_2014'))

Kenny, Belinda; Orscheg, Corinna; Tasker, Elizabeth; Gill, Malcolm A.; Bradstock, Ross (2014) {NSW Flora Fire Response Database, v2.1}
Kenny Orscheg Tasker Gill Bradstock 2014
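
The label logic in `extract_reflabel` can be tried out without the BibTeX file. The sketch below reproduces only the joining and truncation steps; `make_label` and its `max_len` parameter are hypothetical names introduced for illustration, and the second call uses invented author names:

```python
def make_label(last_names, year, max_len=50):
    """Join last names and year; shorten long labels to max_len characters."""
    label = "%s %s" % (" ".join(last_names), year)
    if len(label) > max_len:
        # Keep room for the three-character ellipsis
        label = label[: max_len - 3] + "..."
    return label

short = make_label(["Kenny", "Orscheg", "Tasker", "Gill", "Bradstock"], "2014")
long = make_label(["Author%d" % i for i in range(10)], "2020")
print(short)
print(len(long))
```

Labels of 50 characters or fewer pass through unchanged; longer ones are cut to 47 characters plus `...`. Note that truncation drops the year, so two long author lists from different years could end up with the same label.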


In [21]:
def create_record(row):
    refid=row['dataset_id']
    reflabel = extract_reflabel(refid)
    transvalue=switcher.get(row['value'], None)
    
    record={'main_source': 'austraits-3.0.2',
            'additional_notes': ['Values reclassified by JRFP',
                                'Automatic extraction with python script'],
            'raw_value': [row['trait_name'],row['value'],row['value_type']],
            'original_notes': list(),
           'original_sources':[reflabel]}
    spinfo=match_spcode(row)
    for key in spinfo.keys():
        record[key]=spinfo[key]
    # Compare the dataset id, not the formatted label, which never equals 'NSWFRD_2014'
    if refid=='NSWFRD_2014':
        record['weight'] = 0
        record['weight_notes'] = ["python-script import","default of 0 for redundant records"]
    else:
        record['weight'] = 1
        record['weight_notes'] = ["python-script import","default of 1"]
    if transvalue is not None:   
        record["norm_value"]=transvalue
    if row['site_name'] != "nan":
        record['original_notes'].append('site name:')
        record['original_notes'].append(row['site_name'])
    return(record)

In [22]:
target = ATtraits[ss].head()

switcher={
        "fire_killed": "None",
        'not_fire_killed_does_not_resprout': "None",
        'fire_not_relevant': 'Unknown', 
        'fire_killed resprouts': 'Half',
        'unknown': 'Unknown',
        "resprouts": "All"
    }
    
target = target.fillna("nan")  # avoid modifying a slice of ATtraits in place
reflist=list()
records=list()
for idx, row in target.iterrows():
    record=create_record(row)
    refid=row['dataset_id']
    extract_reflabel(refid)
    if refid not in reflist:
        reflist.append(refid)
    records.append(record)
records

[{'main_source': 'austraits-3.0.2',
  'additional_notes': ['Values reclassified by JRFP',
   'Automatic extraction with python script'],
  'raw_value': ['fire_response', 'resprouts', 'expert_mean'],
  'original_notes': ['site name:', 'Bogangar -28.3303611_and_153.5519444'],
  'original_sources': ['Baker 2019'],
  'species': 'Commersonia bartramia',
  'species_code': '6129',
  'weight': 1,
  'weight_notes': ['python-script import', 'default of 1'],
  'norm_value': 'All'},
 {'main_source': 'austraits-3.0.2',
  'additional_notes': ['Values reclassified by JRFP',
   'Automatic extraction with python script'],
  'raw_value': ['fire_response', 'resprouts', 'expert_mean'],
  'original_notes': ['site name:', 'Bogangar -28.3303611_and_153.5519444'],
  'original_sources': ['Baker 2019'],
  'species': 'Denhamia celastroides',
  'species_code': '8387',
  'weight': 1,
  'weight_notes': ['python-script import', 'default of 1'],
  'norm_value': 'All'},
 {'main_source': 'austraits-3.0.2',
  'additiona

In [23]:
target[0:]

Unnamed: 0,dataset_id,taxon_name,site_name,context_name,observation_id,trait_name,value,unit,date,value_type,replicates,original_name
15332,Baker_2019,Commersonia bartramia,Bogangar -28.3303611_and_153.5519444,,Baker_2019_01,fire_response,resprouts,,,expert_mean,,Commersonia bartramia
15334,Baker_2019,Denhamia celastroides,Bogangar -28.3303611_and_153.5519444,,Baker_2019_02,fire_response,resprouts,,,expert_mean,,Denhamia celastroides
15336,Baker_2019,Blechnum neohollandicum,Bogangar -28.3303611_and_153.5519444,,Baker_2019_03,fire_response,resprouts,,,expert_mean,,Doodia aspera
15338,Baker_2019,Pittosporum undulatum,Bogangar -28.3303611_and_153.5519444,,Baker_2019_04,fire_response,resprouts,,,expert_mean,,Pittosporum undulatum
15340,Baker_2019,Polyscias sambucifolia,Main Arm -28.4936667_and_153.3923611,,Baker_2019_05,fire_response,resprouts,,,expert_mean,,Polyscias sambucifolia


In [24]:
from configparser import ConfigParser
import psycopg2
from psycopg2.extensions import AsIs

filename = repodir / 'secrets' / 'database.ini'
section = 'aws-lght-sl'

parser = ConfigParser()
parser.read(filename)

dbparams = {}
if parser.has_section(section):
    params = parser.items(section)
    for param in params:
        dbparams[param[0]] = param[1]
else:
    raise Exception('Section {0} not found in the {1} file'.format(section, filename))
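
The loop above copies each key/value pair of the section into `dbparams`. A shorter equivalent, sketched here with made-up file contents (the real `database.ini` is not shown in this notebook, so the keys and values below are assumptions), builds the dictionary directly from `parser.items`:

```python
from configparser import ConfigParser

# Made-up file contents; the real file lives under secrets/ and is not shown
example_ini = """
[aws-lght-sl]
host = db.example.org
dbname = exampledb
user = reader
"""

parser = ConfigParser()
parser.read_string(example_ini)

section = "aws-lght-sl"
if not parser.has_section(section):
    raise KeyError("Section %s not found" % section)

# items() yields (key, value) pairs, so dict() builds the same mapping as the loop
dbparams = dict(parser.items(section))
print(sorted(dbparams))
```

The resulting dictionary can be unpacked into the connection call in the same way as before.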

In [25]:
#split your dataframe into smaller dataframes contained in a list.

ATtraits.fillna("nan",inplace=True)
ss = (ATtraits['trait_name']=='fire_response' )
df = ATtraits[ss]
n = 500  #chunk row size
list_df = [df[i:i+n] for i in range(0,df.shape[0],n)]

len(list_df)

37
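
The chunking step uses `range(0, n_rows, n)` to generate the start of each slice. With `n = 500`, a result of 37 chunks implies between 18,001 and 18,500 fire response rows. The sketch below shows the same arithmetic on plain integers; `chunk_bounds` is a hypothetical helper and 1203 is an arbitrary example size:

```python
import math

def chunk_bounds(n_rows, chunk_size):
    """Return (start, stop) pairs that cover n_rows in steps of chunk_size."""
    return [(i, min(i + chunk_size, n_rows)) for i in range(0, n_rows, chunk_size)]

bounds = chunk_bounds(1203, 500)   # 1203 is an arbitrary example size
print(bounds)
print(len(bounds) == math.ceil(1203 / 500))
```

Only the last chunk can be shorter than `chunk_size`, and the number of chunks is always the row count divided by the chunk size, rounded up.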

In [26]:
print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(**dbparams)
cur = conn.cursor()
affected_rows=0
switcher={
        "fire_killed": "None",
        'not_fire_killed_does_not_resprout': "None",
        'fire_not_relevant': 'Unknown', 
        'fire_killed resprouts': 'Half',
        'unknown': 'Unknown',
        "resprouts": "All"
    }

for target in list_df:    
    reflist=list()
    records=list()
    for idx, row in target.iterrows():
        record=create_record(row)
        refid=row['dataset_id']
        extract_reflabel(refid)
        if refid not in reflist:
            reflist.append(refid)
        records.append(record)
        
    for refid in reflist:
        cur.execute("INSERT INTO litrev.ref_list(ref_code,alt_code,ref_cite) values(%s,%s,%s) ON CONFLICT DO NOTHING",
                    (extract_reflabel(refid), refid, extract_refinfo(refid)))
        affected_rows = affected_rows+cur.rowcount
    conn.commit()
    print("total number of lines updated: %s" % affected_rows)

    insert_statement = 'insert into litrev.surv1 (%s) values %s ON CONFLICT DO NOTHING'
    print("total of %s records prepared" % len(records)) 
    for record in records: 
        cur.execute(insert_statement, (AsIs(','.join(record.keys())), tuple(record.values())))
        affected_rows = affected_rows+cur.rowcount
    records.clear()
    conn.commit()
    print("total number of lines updated: %s" % affected_rows)

cur.close()
if conn is not None:
    conn.close()
    print('Database connection closed.')     


Connecting to the PostgreSQL database...
total number of lines updated: 0
total of 500 records prepared
total number of lines updated: 500
total number of lines updated: 500
total of 500 records prepared
total number of lines updated: 1000
total number of lines updated: 1000
total of 500 records prepared
total number of lines updated: 1500
total number of lines updated: 1500
total of 500 records prepared
total number of lines updated: 2000
total number of lines updated: 2000
total of 500 records prepared
total number of lines updated: 2500
total number of lines updated: 2500
total of 500 records prepared
total number of lines updated: 3000
total number of lines updated: 3000
total of 500 records prepared
total number of lines updated: 3500
total number of lines updated: 3500
total of 500 records prepared
total number of lines updated: 4000
total number of lines updated: 4000
total of 500 records prepared
total number of lines updated: 4500
total number of lines updated: 4500
total of 5
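
Both insert statements above end with `ON CONFLICT DO NOTHING`, so rows that violate a uniqueness constraint are skipped instead of raising an error, and the import can be re-run safely. The sketch below illustrates this behaviour with an in-memory SQLite database rather than PostgreSQL (SQLite 3.24 or later supports the same clause). The table layout and the primary key on `ref_code` are assumptions made for the example, not the actual `litrev.ref_list` schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical table; the unique key on ref_code is an assumption
cur.execute(
    "CREATE TABLE ref_list (ref_code TEXT PRIMARY KEY, alt_code TEXT, ref_cite TEXT)"
)

row = ("Baker 2019", "Baker_2019", "placeholder citation")
stmt = (
    "INSERT INTO ref_list (ref_code, alt_code, ref_cite) "
    "VALUES (?, ?, ?) ON CONFLICT DO NOTHING"
)
cur.execute(stmt, row)
cur.execute(stmt, row)   # second attempt conflicts and is skipped
conn.commit()

n_rows = cur.execute("SELECT COUNT(*) FROM ref_list").fetchone()[0]
print(n_rows)
conn.close()
```

A first log line of `total number of lines updated: 0` would be consistent with the references already being present from an earlier run, although the notebook does not confirm this.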

In [16]:
print(ATdefinitions['traits']['elements']['fire_response'].keys())

dict_keys(['description', 'type', 'label', 'values'])


In [17]:
print(ATdefinitions['traits']['elements']['fire_response']['values'])

{'fire_killed': 'Plants killed by hot fires', 'resprouts': "Plants resprout from underground storage organ following fire. (For studies that don't differentiate between respouting strength)", 'not_fire_killed_does_not_resprout': 'Plants that are rarely killed by a moderate-intensity fire, but do not resprout', 'fire_not_relevant': 'Plant never affected by fire (for aquatic taxon)', 'unknown': 'Fire status assessed, but unknown'}
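
The definitions above can be used to check that the `switcher` mapping covers every defined category. A minimal sketch, with both sets of names copied by hand from this notebook:

```python
# Categories copied from the definitions output above
defined = {"fire_killed", "resprouts", "not_fire_killed_does_not_resprout",
           "fire_not_relevant", "unknown"}

# Mapping copied from the import cells
switcher = {
    "fire_killed": "None",
    "not_fire_killed_does_not_resprout": "None",
    "fire_not_relevant": "Unknown",
    "fire_killed resprouts": "Half",
    "unknown": "Unknown",
    "resprouts": "All",
}

unmapped = defined - set(switcher)   # defined categories without a translation
extra = set(switcher) - defined      # mapping keys that are not single categories
print(sorted(unmapped), sorted(extra))
```

Every defined category has a translation. The one extra key, `fire_killed resprouts`, appears to be a combination of two categories recorded for the same observation; any other combined value would map to `None` in `create_record` and the record would be stored without a `norm_value`.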
