# Read Resprouting data from NSWFFRD 2014
Read the spreadsheet from the NSW Flora Fire Response Database and extract the resprouting values and their references.
We will use the _openpyxl_ library in ***python***.

In [1]:
import openpyxl
from pathlib import Path
import os
import re
import copy

We need to define a path to locate the documents relative to the current repository directory

In [2]:
repodir = Path("../../") 
inputdir = repodir / "data/"

## Open the workbook and read main spreadsheet
Here we will load the workbook (_wb_):

In [3]:
wb = openpyxl.load_workbook(inputdir / "NSWFFRDv2.1.xlsx")

We will read the sheets by name. We need access to the sheets 'SpeciesData', 'VA Groups' and 'References':

In [4]:
species_data = wb['SpeciesData']
va_groups = wb['VA Groups']
references = wb['References']

For the resprouting data we need to decipher the information in columns _J_ ('Fireresponse'), _BN_ ('NFRR data') and _BO_ ('Additional fire response data').

We can use square brackets to refer to a column and then use python indices (starting with _0_ for the top row) to slice it. We use the property _value_ to show their stored content. 

In [5]:
print(species_data['BN'][1].value)
print(species_data['BN'][2].value)

NFRR data
8CN


Alternatively, we can use the function _cell_ to retrieve individual cells. Indices here follow the spreadsheet convention and start with _1_ for the top row. The header is in the second row, the first value is in the third row:

In [6]:
print(species_data.cell(row=2,column=10).value)
print(species_data.cell(row=3,column=10).value)

Fireresponse
S


### Example for one species:
Let's start checking the columns we need:

In [7]:
sp_col='A'
fireresponse_col='J'
comment_col='K'
NFRR_col='BN'
oref_col='BO'

print("%s / %s / %s / %s / %s" %
(species_data[sp_col][1].value,
species_data[fireresponse_col][1].value,
species_data[comment_col][1].value,
species_data[NFRR_col][1].value,
species_data[oref_col][1].value))

Current Scientific Name / Fireresponse / Comments on regeneration / NFRR data / Additional fire response data


Descriptions of these columns are found in the spreadsheet:
> Fire Response: S=seeder, R=resprouter. r=usually killed but sometimes resprouts, s=usually resprouts but sometimes killed; these may indicate a variable response seen by one observer, or a conflict between different observers (see comments column). When an equal number of references list the species as seeder or resprouter, this column reads as 'S/R' and details are given in the comments cloumn. Ideally fire response should be defined by mortality >70%=seeder, mortality <30%=resprouter [Gill & Bradstock, 1992].

> Comments on regeneration Notes on conflicting or variable regeneration information. May indicate response to various fire intensity levels; level of mortality seen; a variable fire response seen by one observer within the same species. Some species with distinct recordings for both resprouter and seeder have been listed seperately as 'fire sensitive/tolerant variety'.

> NFRR data: Fire response data from CSIRO National Fire Response Register, given in original format (see VA sheet for regeneration codes, Reference sheet for reference codes). Corrections made to these after checking original references in brown text: all brown (eg 5W) =species missed from reference; regen code brown (eg 5W) =corrected regen code; strikethrough (eg 5W) =species not in reference

> Additional Fire Response data: Fire response data from other references. See VA Groups sheet for regeneration codes; see References sheet for reference codes

Now select one record:

In [8]:
row_index=3

print("%s / %s / %s / %s / %s" %
(species_data[sp_col][row_index].value,
species_data[fireresponse_col][row_index].value,
species_data[comment_col][row_index].value,
species_data[NFRR_col][row_index].value,
species_data[oref_col][row_index].value))


Abutilon oxycarpum / S / None / 8BD / II-95b


The code '8BD' combines 'Regeneration category' 8 with the reference code 'BD'. Similarly, 'II-95b' can be decomposed into regeneration category II and reference 95; the trailing 'b' is a qualifier explained in the entry for that reference. We will need to create python dictionaries to link this information across the spreadsheets.
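
As a minimal sketch of this decomposition, the two layouts can be split with regular expressions. The helper names below are hypothetical, and the patterns only assume the layouts seen in the two examples above (digits followed by capital letters, or Roman numerals, a hyphen, digits and an optional trailing letter); other layouts in the spreadsheet may need extra handling.

```python
import re

# Hypothetical patterns, based only on the two example codes shown above
NFRR_PATTERN = re.compile(r"(\d+)([A-Z]+)")            # e.g. '8BD'
OTHER_PATTERN = re.compile(r"([IVX]+)-(\d+)([a-z]?)")  # e.g. 'II-95b'

def split_nfrr_code(code):
    """Return (regeneration category, reference code), or None if the layout differs."""
    match = NFRR_PATTERN.fullmatch(code)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)

def split_other_code(code):
    """Return (regeneration category, reference number, trailing letter), or None."""
    match = OTHER_PATTERN.fullmatch(code)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3)

print(split_nfrr_code("8BD"))      # (8, 'BD')
print(split_other_code("II-95b"))  # ('II', 95, 'b')
```

Returning `None` for anything that does not match keeps unexpected values visible instead of silently splitting them the wrong way.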

### Look-up table for VA groups

For the VA groups we need columns A, B and C:

In [9]:
reg_cats=list()
for row in range(3,13):
    record={"NFRRcode":va_groups['A'][row].value,
    "othercode":va_groups['B'][row].value,
     "category":va_groups['C'][row].value
    }
    reg_cats.append(record)

In [10]:
reg_cats[0]

{'NFRRcode': 1,
 'othercode': 'I',
 'category': 'Killed by 100% scorch; seed storage on plant'}

Now if we need to lookup one value we can use a filter:

In [11]:
qry=1
for elem in filter(lambda x: x['NFRRcode'] == qry, reg_cats):
    print("NFRR code of %s refers to '%s'" % (qry, elem['category']))

NFRR code of 1 refers to 'Killed by 100% scorch; seed storage on plant'
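
An alternative to filtering the list on every query is to index the records once in a dictionary. The sketch below is only an illustration: it uses three sample rows copied from outputs shown in this notebook rather than the full sheet, and the names `reg_cats_sample`, `by_nfrr` and `by_other` are hypothetical.

```python
# Sample rows copied from outputs in this notebook (not the full 'VA Groups' sheet)
reg_cats_sample = [
    {"NFRRcode": 1, "othercode": "I",
     "category": "Killed by 100% scorch; seed storage on plant"},
    {"NFRRcode": 2, "othercode": "II",
     "category": "Killed by 100% scorch; seed storage in soil"},
    {"NFRRcode": 8, "othercode": "VIII",
     "category": "Killed by 100% scorch; seed storage unknown"},
]

# One index per code system, so each lookup is a single dictionary access
by_nfrr = {rec["NFRRcode"]: rec for rec in reg_cats_sample}
by_other = {rec["othercode"]: rec for rec in reg_cats_sample}

print(by_nfrr[1]["category"])   # Killed by 100% scorch; seed storage on plant
print(by_other.get("XX"))       # None: unknown codes do not raise an error
```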


### List of references 
We do the same for the list of references in spreadsheet 'References'.

We will need a function to create code for the references based on the list of authors and date:


In [12]:
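# Words that start with a capital letter followed by lowercase letters (author names)
author_word = re.compile("[A-Z][a-z]+")
def create_ref_code(x):
    # Build a short "Authors year" code from a full citation string
    if "personal communication" in x:
        y = x[0:x.find(" personal")].replace(",","")
        year = "pers. comm."
    elif "unpublished" in x:
        y = x[0:x.find("unpublished")].replace(",","")
        year = "unpub."
    else:
        # keep the text up to the first closing parenthesis, which holds the year
        y = x[0:x.find(")")].replace(",","")
        year = ''.join(re.findall(r"\d+", y))
    z = list(filter(author_word.match, y.split()))
    author = ' '.join(z)
    final_code =  "%s %s" % (author, year)
    # truncate long codes to 50 characters
    if (len(final_code)>50):
        final_code=final_code[0:50]
    return final_code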

val=references['T'][26].value.replace("(1) ","")
print(val)
create_ref_code(val)

Neal Enright and Pembe Ata, Grampians and Little Desert N.P.'s, Vic. (unpublished)


'Neal Enright Pembe Ata Grampians Little Desert Vic'

Now we read the NFRR references. Notice that we substitute the number _1_ with a capital _I_ in the refcode to avoid problems with one reference (see below):

In [13]:
NFRR_refs=list()
for row in range(1,66):
    cite_text = references['T'][row].value.replace("(1) ","")
    cite_code = create_ref_code(cite_text) 
    record={"refcode": references['S'][row].value.replace("1","I"),
            "refstring": cite_code,#re.sub(r", [A-Z\.]+"," ",cite_code),
            "refinfo": cite_text
    }
    NFRR_refs.append(record)

In [14]:
NFRR_refs[56]

{'refcode': 'SA',
 'refstring': 'Carolyn Sandercoe Qld. unpub.',
 'refinfo': 'Carolyn Sandercoe, Qld. (unpublished)'}

In [15]:
NFRR_refs[6]["refcode"]

'BF'

In [16]:
qry="FOI"
for elem in filter(lambda x: x['refcode'] == qry, NFRR_refs):
    print("NFRR reference %s refers to '%s'" % (qry, elem['refinfo']))

NFRR reference FOI refers to 'Fox, J.E.D. (1985). Fire in Mulga: Studies at the margins. In: Fire ecology and management of Western Australian ecosystems. (ed: J.R. Ford). Western Australian Institute of Technology, report no. 14.'


We do the same for the additional references column:

In [17]:
other_refs=list()
for row in range(1,139):
    cite_text = references['D'][row].value
    cite_code = create_ref_code(cite_text) 
    if cite_code == "Benson 1985":
        cite_code = "Benson 1985b"
    record={"refcode": references['C'][row].value,
            "refstring": cite_code,
            "refinfo": cite_text
    }
    other_refs.append(record)

In [18]:
other_refs[9]

{'refcode': 10,
 'refstring': 'Wark White Robertson Marriott 1987',
 'refinfo': 'Wark, M.C., White, M.D., Robertson, D.J. and Marriott, P.F. (1987). Regeneration of heath and heath woodland in the north-eastern Otway Ranges following the wildfire of February 1983. Proc.Roy.Soc.Vic. 99, 51-88.'}

Check if there are duplicated references:

In [19]:
# use 'ref' as loop variable so we do not overwrite the compiled regex 'r'
l1 = [ref["refstring"] for ref in NFRR_refs]
l2 = [ref["refstring"] for ref in other_refs]

for i in l1:
    if i in l2:
        print(i)


Benwell 1998
Molnar Fletcher Parsons 1989
Wark White Robertson Marriott 1987
Wark 1997
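
The same check can be sketched with a set for the membership test, which keeps the order of the first list. The two short lists below are sample values taken from this notebook, not the full reference lists, and the variable names are hypothetical.

```python
# Sample reference strings taken from outputs in this notebook
nfrr_codes = ["Benwell 1998", "Fox 1985", "Wark 1997"]
other_codes = ["Wark 1997", "Benson 1985b", "Benwell 1998"]

other_set = set(other_codes)  # set membership avoids rescanning the second list
shared = [code for code in nfrr_codes if code in other_set]
print(shared)  # ['Benwell 1998', 'Wark 1997']
```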


In [20]:
qry="Benwell 1998"
for elem in filter(lambda x: x['refstring'] == qry, NFRR_refs):
    print("Reference %s refers to '%s'" % (qry, elem['refinfo']))
for elem in filter(lambda x: x['refstring'] == qry, other_refs):
    print("Reference %s refers to '%s'" % (qry, elem['refinfo']))
    

Reference Benwell 1998 refers to 'Benwell A.S. (1998). Post-fire seedling recruitment in coastal heathland in relation to regeneration strategy and habitat. Aust. J. Bot. 46, 75-101.'
Reference Benwell 1998 refers to 'Benwell, A.S. (1998) Post-fire seedling recruitment in coastal heathland in relation to regeneration strategy and habitat. Aust. J. Bot. 46:75-101.  Data compiled by D.Keith (Keith, D.A., McCaw, W.L. & Whelan, R.J. (2002) pp. 199-237 in "Flammable Australia: The fire regimes and biodiversity of a continent" Ed. R.A. Bradstock, J.E. Williams & M.A. Gill. Cambridge University Press, Cambridge)'


### Colored and modified fonts

Some records include additional information coded in font color or strikethrough of values:

>  Corrections made to these after checking original references in brown text: all brown (eg 5W) =species missed from reference; regen code brown (eg 5W) =corrected regen code; strikethrough (eg 5W) =species not in reference

With Python we can query the font color and strikethrough properties of a cell to verify whether its content has been annotated, but not with enough detail to distinguish which part of the value is annotated and which is not. For example:

In [21]:
for row in [22,23,66,67,70,72]:
    font = species_data['BN'][row].font
    if font.color is None:
        print("Cell %s has no colored font" % (row+1))
    else:
        print("Cell %s has colored font" % (row+1))
    # strike is None or False when the text is not struck through
    if font.strike:
        print("Cell %s has strikethrough" % (row+1))

Cell 23 has colored font
Cell 24 has no colored font
Cell 67 has colored font
Cell 68 has no colored font
Cell 71 has no colored font
Cell 73 has colored font
Cell 73 has strikethrough


### Back to single species example



In [22]:
row_index=3

spname=species_data[sp_col][row_index].value
NFRRval=species_data[NFRR_col][row_index].value
otherval=species_data[oref_col][row_index].value

We can now use regular expressions to separate the different pieces of information, for NFRR:

In [23]:
print(re.findall(r"\d+", NFRRval))
print(re.findall(r"[A-Z]+", NFRRval))

['8']
['BD']


Look up the group and references:

In [24]:
for qry in re.findall("\d+", NFRRval):
    for elem in filter(lambda x: x['NFRRcode'] == int(qry), reg_cats):
        print("NFRR code of %s refers to '%s'" % (qry, elem['category']))

for qry in re.findall("[A-Z]+", NFRRval):
    for elem in filter(lambda x: x['refcode'] == qry, NFRR_refs):
        print("Reference %s refers to '%s'" % (qry, elem['refinfo']))


NFRR code of 8 refers to 'Killed by 100% scorch; seed storage unknown'
Reference BD refers to 'Benson, D. and McDougall, L. (1997). Ecology of Sydney plant species part 5: Dicotyledon families Flacourtiaceae to Myrsinaceae. Cunninghamia 5, 330-544.'


Now we can do the same for the additional references:

In [25]:
for qry in re.findall("[IVX]+", otherval):
    for elem in filter(lambda x: x['othercode'] == qry, reg_cats):
        print("NFRR code of %s refers to '%s'" % (qry, elem['category']))

for qry in re.findall("\d+", otherval):
    for elem in filter(lambda x: x['refcode'] == int(qry), other_refs):
        print("Reference %s refers to '%s'" % (qry, elem['refinfo']))

NFRR code of II refers to 'Killed by 100% scorch; seed storage in soil'
Reference 95 refers to 'Hunter, J.T. (1998) Vegetation and floristics of the Washpool National Park Western Additions / Hunter, J.T. (2000) Vegetation and floristics of Mt Conobolas State Recreation Area / Hunter, J.T. (2000) Vegetation and floristics of Burnt Down Scrub Nature Reserve. Reports to NSW NPWS. a=personal observation b=personal communication c=referenced source'


### Example with multiple references per species
Now let's pick other examples: 

In [26]:
row_index=9
spname=species_data[sp_col][row_index].value
NFRRval=species_data[NFRR_col][row_index].value

For this species the values in the NFRR column are separated by blank spaces, each representing a different observation. However, the '(1)' in parentheses is part of the reference code, and the regular expression picks it up incorrectly as an additional regeneration category:

In [27]:
print(spname)
print(NFRRval)
print('separated as')
print(re.findall("\d+", NFRRval))
print(re.findall("[A-Z]+", NFRRval))

Acacia aneura
8HG 8FO 8FO(1) 9FO(1) 8PL
separated as
['8', '8', '8', '1', '9', '1', '8']
['HG', 'FO', 'FO', 'FO', 'PL']


Let's first replace "FO(1)" with "FOI", and then run the code again:

In [28]:
NFRRval=species_data[NFRR_col][row_index].value.replace('FO(1)','FOI')
print(re.findall("\d+", NFRRval))
print(re.findall("[A-Z]+", NFRRval))

['8', '8', '8', '9', '8']
['HG', 'FO', 'FOI', 'FOI', 'PL']
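
A possible alternative to the string replacement is a single pattern that captures the category, the reference code and the optional number in parentheses together. This is only a sketch: the function name is hypothetical, and turning '(1)' into a trailing 'I' simply mirrors the substitution used for the reference codes above.

```python
import re

# category digits, reference letters, and an optional '(n)' suffix
ENTRY = re.compile(r"(\d+)([A-Z]+)(?:\((\d+)\))?")

def parse_nfrr_entries(value):
    """Return a list of (category, reference code) pairs from an NFRR cell value."""
    pairs = []
    for category, ref, suffix in ENTRY.findall(value):
        if suffix:
            # 'FO(1)' becomes 'FOI', matching the refcode substitution used above
            ref = ref + suffix.replace("1", "I")
        pairs.append((int(category), ref))
    return pairs

print(parse_nfrr_entries("8HG 8FO 8FO(1) 9FO(1) 8PL"))
# [(8, 'HG'), (8, 'FO'), (8, 'FOI'), (9, 'FOI'), (8, 'PL')]
```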


Now, if we want a list of unique references and VA groups, we can use the function _set_.

In [29]:
for qry in set(re.findall("\d+", NFRRval)):
    for elem in filter(lambda x: x['NFRRcode'] == int(qry), reg_cats):
        print("NFRR code of %s refers to '%s'" % (qry, elem['category']))

for qry in set(re.findall("[A-Z]+", NFRRval)):
    for elem in filter(lambda x: x['refcode'] == qry, NFRR_refs):
        print("Reference %s refers to '%s'" % (qry, elem['refstring']))


NFRR code of 9 refers to 'Survives 100% scorch; resprout location unknown'
NFRR code of 8 refers to 'Killed by 100% scorch; seed storage unknown'
Reference FO refers to 'Fox 1980'
Reference PL refers to 'Latz 1995'
Reference HG refers to 'Hodgkinson Griffin 1982'
Reference FOI refers to 'Fox 1985'


Alternatively, we can process each combination of reference and value separately:

In [30]:
for item in NFRRval.split(" "):
    qry = re.findall("\d+", item)[0]
    group = list(filter(lambda x: x['NFRRcode'] == int(qry), reg_cats))[0]
    qry = re.findall("[A-Z]+", item)[0]
    ref = list(filter(lambda x: x['refcode'] == qry, NFRR_refs))[0]
    print("Reference %s... reports '%s'" % (ref['refinfo'][:30], group['category']))


Reference Hodgkinson, K.C. and Griffin, ... reports 'Killed by 100% scorch; seed storage unknown'
Reference Fox, J.E.D. (1980). Effects of... reports 'Killed by 100% scorch; seed storage unknown'
Reference Fox, J.E.D. (1985). Fire in Mu... reports 'Killed by 100% scorch; seed storage unknown'
Reference Fox, J.E.D. (1985). Fire in Mu... reports 'Survives 100% scorch; resprout location unknown'
Reference Latz, P.K. (1995) Bushfires an... reports 'Killed by 100% scorch; seed storage unknown'


Here we pick a different species with multiple values in the 'additional fire response data' column:

In [31]:
row_index=25
spname=species_data[sp_col][row_index].value
otherval=species_data[oref_col][row_index].value

The values with multiple references are separated by semicolons and blanks, so they are picked up just fine. 

In [32]:
if otherval is None:
    print('No other reference')
else:
    print(otherval)
    print(re.findall("\d+", otherval))
    print(re.findall("[A-Z]+", otherval))

II-69; II-100; II-134
['69', '100', '134']
['II', 'II', 'II']
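
To keep each category paired with its reference, the two parts can be captured in one pattern. A minimal sketch, assuming every entry follows the 'category-reference' layout seen above, with an optional trailing letter:

```python
import re

otherval_example = "II-69; II-100; II-134"

# each match yields (category, reference number, optional trailing letter)
entries = re.findall(r"([IVX]+)-(\d+)([a-z]?)", otherval_example)
pairs = [(category, int(ref)) for category, ref, _letter in entries]
print(pairs)  # [('II', 69), ('II', 100), ('II', 134)]
```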


## Format records for input in database

Using the code above it is possible to:
- create records for each original reference and add them to a central "reference list" table
- take each species (row) from the spreadsheet and add records for a "resprouting" table:
    - create one record based on the "Fire response" value citing NSWFFRDv2.1 as the main source and other references as original source
    - create one or more records for each original reference using the "Regeneration category" as input value
    
But first we need to connect to the database from python.

### Connect to database from Python

We use the library _psycopg2_ to connect to the database. We first read the database credentials from a file with restricted read access:

In [33]:
from configparser import ConfigParser
import psycopg2
from psycopg2.extensions import AsIs

filename = repodir / 'secrets' / 'database.ini'
section = 'aws-lght-sl'

parser = ConfigParser()
parser.read(filename)

dbparams = {}
if parser.has_section(section):
    params = parser.items(section)
    for param in params:
        dbparams[param[0]] = param[1]
else:
    raise Exception('Section {0} not found in the {1} file'.format(section, filename))
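
The same parsing step can be tried without the secrets file by feeding the parser a string. The section content below is entirely hypothetical (the real keys and values live in the restricted file), and `read_section` is an illustrative helper, not part of the notebook.

```python
from configparser import ConfigParser

# Hypothetical content, only to illustrate the layout of an ini section
example_ini = """
[aws-lght-sl]
host = db.example.org
dbname = exampledb
user = reader
"""

def read_section(parser, section):
    """Return the options of one section as a plain dictionary."""
    if not parser.has_section(section):
        raise KeyError("Section %s not found" % section)
    return dict(parser.items(section))

example_parser = ConfigParser()
example_parser.read_string(example_ini)
dbparams_example = read_section(example_parser, "aws-lght-sl")
print(sorted(dbparams_example))  # ['dbname', 'host', 'user']
```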

Typically we will connect to the database, run a query and then disconnect:
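
A minimal sketch of that pattern follows. To make it runnable without a database server it uses the standard-library _sqlite3_ module as a stand-in, since it shares the same connect / cursor / execute / close interface as _psycopg2_. The helper name `run_query` is hypothetical.

```python
import sqlite3
from contextlib import closing

def run_query(connect, query, args=()):
    """Connect, run one query, fetch the rows and always disconnect."""
    conn = connect()
    try:
        with closing(conn.cursor()) as cur:
            cur.execute(query, args)
            rows = cur.fetchall()
        conn.commit()
        return rows
    finally:
        # the connection is closed even if the query fails
        conn.close()

# Stand-in call with an in-memory sqlite3 database; with PostgreSQL the first
# argument would be something like: lambda: psycopg2.connect(**dbparams)
print(run_query(lambda: sqlite3.connect(":memory:"), "SELECT 1 + 1"))  # [(2,)]
```

Note that the placeholder style for query arguments differs between drivers (`?` in _sqlite3_, `%s` in _psycopg2_).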

### Add list of references

We can create a table for the list of references using this SQL code in our database (for example in _psql_ client):

We can then insert values into the database by substituting the corresponding values for the query:

For example, for the NSWFFRDv2.1 reference we can use:

In [34]:
insert_statement = "INSERT INTO litrev.ref_list(%s) values %s ON CONFLICT DO NOTHING"
record = { "ref_code" : "NSWFFRDv2.1",
          "ref_cite" : 
          "NSW Flora Fire Response Database. Version 2.1. February 2010 (last update May 2014)"}

print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(**dbparams)

cur = conn.cursor()
cur.execute(insert_statement, (AsIs(','.join(record.keys())), tuple(record.values())))
conn.commit()
print("total number of lines updated: %s" % cur.rowcount)
cur.close()

if conn is not None:
    conn.close()
    print('Database connection closed.')

Connecting to the PostgreSQL database...
total number of lines updated: 0
Database connection closed.


Now we will add the references from the list we read before (_NFRR_refs_). We will use the short author-year string built by _create_ref_code_ (at most 50 characters) as a _ref_code_ (we will be able to update that later to something more meaningful in the database), and create an _alt_code_ to identify the origin of the reference.

In [35]:
print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(**dbparams)

cur = conn.cursor()
insert_statement = "INSERT INTO litrev.ref_list(ref_code,ref_cite,alt_code) values (%s,%s,%s) ON CONFLICT DO NOTHING"
affected_rows=0

for item in NFRR_refs:
    cur.execute(insert_statement,
                (item['refstring'],
                item['refinfo'],
                'NSWFFRD-NFRR-ref-%s' % item['refcode']))
    affected_rows = affected_rows+cur.rowcount

conn.commit()
print("total number of lines updated: %s" % affected_rows)
cur.close()


if conn is not None:
    conn.close()
    print('Database connection closed.')

Connecting to the PostgreSQL database...
total number of lines updated: 0
Database connection closed.


We can then add the references from the _other_refs_ list:

In [36]:
print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(**dbparams)

cur = conn.cursor()
insert_statement = "INSERT INTO litrev.ref_list(ref_code,ref_cite,alt_code) values (%s,%s,%s) ON CONFLICT DO NOTHING"
affected_rows=0

for item in other_refs:
    cur.execute(insert_statement,
                (item['refstring'],
                item['refinfo'],
                'NSWFFRD-other-ref-%s' % item['refcode']))
    affected_rows = affected_rows+cur.rowcount

conn.commit()
print("total number of lines updated: %s" % affected_rows)
cur.close()


if conn is not None:
    conn.close()
    print('Database connection closed.')

Connecting to the PostgreSQL database...
total number of lines updated: 0
Database connection closed.


### Resprouting table in database

The structure of the _resprouting_ table still needs to be refined, here is a proposed structure: 

We will test how to insert some records into this table from the spreadsheet.

### Inserting the fire response values of NSWFFRDv2.1

We will create one record per species, using "NSWFFRDv2.1" as _main reference_, adding the reported references in the _original sources_ column, using a dictionary to translate the original value into the range of accepted values for the column, and including comments in the record.

First we will define the columns we need:

In [37]:
sp_col='A'
code_col='B'
fireresponse_col='J'
comment_col='K'
NFRR_col='BN'
oref_col='BO'

print("%s / %s / %s / %s/ %s / %s" %
(species_data[sp_col][1].value,
 species_data[code_col][1].value,
species_data[fireresponse_col][1].value,
species_data[comment_col][1].value,
species_data[NFRR_col][1].value,
species_data[oref_col][1].value))

Current Scientific Name / Species Code / Fireresponse / Comments on regeneration/ NFRR data / Additional fire response data


The Fire response variable can be translated using a simple dictionary of terms:

In [38]:
switcher={
    "S": "None",
    "Sr": "Few",
    "S/R": "Half",
    "Rs": "Most",
    "R": "All"
}
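
As a small usage sketch, the dictionary method `get` returns a fallback for values that are not in the table. The function name and the unmapped value 'R?' below are made up for illustration.

```python
switcher_example = {"S": "None", "Sr": "Few", "S/R": "Half", "Rs": "Most", "R": "All"}

def translate_fire_response(value, default="Unknown"):
    """Translate a raw fire response value, falling back to a default label."""
    return switcher_example.get(value, default)

for raw in ["S", "S/R", "R?"]:  # 'R?' is a made-up value with no translation
    print(raw, "->", translate_fire_response(raw))
```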

We will define a function to read one row and create a record:


In [39]:
def read_row_resprouting(sheet,row):
    sp_col='A'
    code_col='B'
    fireresponse_col='J'
    comment_col='K'
    NFRR_col='BN'
    oref_col='BO'

    switcher={
        "S": "None",
        "Sr": "Few",
        "S/R": "Half",
        "Rs": "Most",
        "R": "All"
    }
    
    varname=sheet[fireresponse_col][1].value
    
    spname=sheet[sp_col][row].value
    spcode=sheet[code_col][row].value
    varvalue=sheet[fireresponse_col][row].value
    origcomment=sheet[comment_col][row].value
    NFRRraw=sheet[NFRR_col][row].value
    otherraw=sheet[oref_col][row].value
    
    record={"main_source":"NSWFFRDv2.1", 
            "additional_notes":["Values reclassified following rules proposed by D. Keith et al.",
                                "Automatic extraction with python script"],
            "raw_value":[varname], 
            "original_sources":list(), 
            "original_notes":list()}

    if varvalue is not None:
        record["raw_value"].append(varvalue)
        transvalue=switcher.get(varvalue, "unknown")
        record["norm_value"]=transvalue
        if origcomment is not None:
            record["original_notes"].append(origcomment)
        if spcode is not None:
            record["species_code"]=spcode
        if spname is not None:
            record["species"]=spname
        if NFRRraw is not None:
            NFRRval=NFRRraw.replace('FO(1)','FOI')
            for qry in set(re.findall("\d+", NFRRval)):
                for elem in filter(lambda x: x['NFRRcode'] == int(qry), reg_cats):
                    record["raw_value"].append("NFRR VA group (%s): %s" % (qry, elem['category']))
            for qry in set(re.findall("[A-Z]+", NFRRval)):
                for elem in filter(lambda x: x['refcode'] == qry, NFRR_refs):
                    record["original_sources"].append(elem['refstring'])
            if sheet[NFRR_col][row].font.color is not None:
                record["original_notes"].append("NFRR record(s) might have been amended")
            # strike is None or False when the text is not struck through
            if sheet[NFRR_col][row].font.strike:
                record["original_notes"].append("NFRR record(s) might have been discarded")
        if otherraw is not None:
            otherval=otherraw
            for qry in set(re.findall("[IVX]+", otherval)):
                for elem in filter(lambda x: x['othercode'] == qry, reg_cats):
                    record["raw_value"].append("Other VA group (%s): %s" % (qry, elem['category']))
            for qry in set(re.findall("\d+", otherval)):
                #record["original_sources"].append('NSWFFRD-other-ref-%s' % qry)
                for elem in filter(lambda x: x['refcode'] == qry, other_refs):
                    record["original_sources"].append(elem['refstring'])
        if len(record["original_sources"])==0:
            record.pop("original_sources")
        if len(record["original_notes"])==0:
            record.pop("original_notes")
        return(record)
    else:
        print("empty row")
        return(None)

We can combine all information into records using the code discussed above:

In [40]:
read_row_resprouting(species_data,14)

{'main_source': 'NSWFFRDv2.1',
 'additional_notes': ['Values reclassified following rules proposed by D. Keith et al.',
  'Automatic extraction with python script'],
 'raw_value': ['Fireresponse',
  'S',
  'NFRR VA group (2): Killed by 100% scorch; seed storage in soil',
  'NFRR VA group (8): Killed by 100% scorch; seed storage unknown',
  'Other VA group (VIII): Killed by 100% scorch; seed storage unknown'],
 'original_sources': ['Benwell 1998', 'Peter Byrne Beerwah Qld. unpub.'],
 'norm_value': 'None',
 'species_code': '7060',
 'species': 'Acacia baueri subsp. baueri'}

This is an improved version of the function to split the data into a summary value and single entries for each reference:

In [44]:


def read_row_resprouting(sheet,row):
    sp_col='A'
    code_col='B'
    fireresponse_col='J'
    comment_col='K'
    NFRR_col='BN'
    oref_col='BO'

    switcher={
        "S": "None",
        "Sr": "Few",
        "S/R": "Half",
        "Rs": "Most",
        "R": "All"
    }
    
    varname=sheet[fireresponse_col][1].value
    
    spname=sheet[sp_col][row].value
    spcode=sheet[code_col][row].value
    varvalue=sheet[fireresponse_col][row].value
    origcomment=sheet[comment_col][row].value
    NFRRraw=sheet[NFRR_col][row].value
    otherraw=sheet[oref_col][row].value

    records=list()
    
    
    record={"raw_value":[varname, varvalue], 
            "original_sources":list(), 
            "main_source":"NSWFFRDv2.1", 
            "additional_notes":["Values reclassified following rules proposed by D. Keith et al.",
                                "Automatic extraction with python script"],
           "weight_notes":["python-script import","default of 1"],
           "weight":1, 
            "original_notes":list()}

    if varvalue is not None:
        # translate the summary value into the normalised categories
        transvalue=switcher.get(varvalue, "Unknown")
        record["norm_value"]=transvalue
        if origcomment is not None:
            record["original_notes"].append(origcomment)
            record["additional_notes"].append("See comments in NSWFFRDv2.1 entry")
        if spcode is not None:
            record["species_code"]=spcode
        if spname is not None:
            record["species"]=spname
        newrecord=copy.deepcopy(record)
        if len(newrecord["original_sources"])==0:
            newrecord.pop("original_sources")
        if len(newrecord["original_notes"])==0:
            newrecord.pop("original_notes")
        newrecord["weight"]=10
        newrecord["weight_notes"][1]="default of 10 for summary value"
        records.append(newrecord)
        if NFRRraw is not None:
            NFRRval=NFRRraw.replace('FO(1)','FOI')
            NFRRval=NFRRval.strip(" ")
            for item in NFRRval.split(" "):
                newrecord=copy.deepcopy(record)
                newrecord["additional_notes"].append("Raw values extracted from notes/comments in NSWFFRDBv2.1")
                newrecord["raw_value"].append("Overall value of fireresponse column is %s" % varvalue)
                qry = re.findall("\d+", item)
                if len(qry)==1:
                    group = list(filter(lambda x: x['NFRRcode'] == int(qry[0]), reg_cats))
                    if len(group)==1:
                        newrecord["raw_value"][0]=("VA Group %s" % qry[0])
                        newrecord["raw_value"][1]=group[0]['category']
                    if qry[0] in ('1','2','3','8'):
                        newrecord["norm_value"] = 'None'
                    elif qry[0] in ('4','5','6','7','9','11'):
                        newrecord["norm_value"] = 'All'
                    else:
                        newrecord["norm_value"] = 'Unknown'
                qry = re.findall("[A-Z]+", item)
                if len(qry)==1:
                    ref = list(filter(lambda x: x['refcode'] == qry[0], NFRR_refs))
                    if len(ref)==1:
                        newrecord["original_sources"].append(ref[0]['refstring'])
                # use the 'sheet' argument rather than the global 'species_data'
                if sheet[NFRR_col][row].font.color is not None:
                    newrecord["additional_notes"].append("NFRR record(s) might have been amended in NSWFFRDv2.1")
                if sheet[NFRR_col][row].font.strike:
                    newrecord["additional_notes"].append("NFRR record(s) might have been discarded in NSWFFRDv2.1")
                if len(newrecord["original_sources"])==0:
                    newrecord.pop("original_sources")
                if len(newrecord["original_notes"])==0:
                    newrecord.pop("original_notes")
                records.append(newrecord)
        if otherraw is not None:
            otherval=otherraw
            for item in otherval.split(" "):
                newrecord=copy.deepcopy(record)
                newrecord["additional_notes"].append("Raw values extracted from notes/comments in NSWFFRDBv2.1")
                newrecord["raw_value"].append("Overall value of fireresponse column is %s" % varvalue)
                qry = re.findall("[IVX]+", item)
                if len(qry)==1:
                    group = list(filter(lambda x: x['othercode'] == qry[0], reg_cats))[0]
                    newrecord["raw_value"][0]=("VA Group %s" % qry[0])
                    newrecord["raw_value"][1]=group['category']
                    if qry[0] in ('I','II','III','VIII'):
                        newrecord["norm_value"] = 'None'
                    elif qry[0] in ('IV','V','VI','VII','IX','XI'):
                        newrecord["norm_value"] = 'All'
                    else:
                        newrecord["norm_value"] = 'Unknown'
                qry = re.findall("\d+", item)
                if len(qry)==1:
                    ref = list(filter(lambda x: x['refcode'] == int(qry[0]), other_refs))[0]
                    newrecord["original_sources"].append(ref['refstring'])
                if len(newrecord["original_sources"])==0:
                    newrecord.pop("original_sources")
                if len(newrecord["original_notes"])==0:
                    newrecord.pop("original_notes")
                records.append(newrecord)

        return(records)
    else:
        print("empty row")
        return(None)
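
The two if/elif chains in the function above encode one rule: which regeneration categories count as killed ('None' resprouting) and which as surviving ('All'). A sketch of the same rule as a standalone helper is shown below; the grouping is copied from the function, while the names are hypothetical.

```python
# Grouping copied from the if/elif chain in read_row_resprouting
KILLED_GROUPS = {1, 2, 3, 8}
SURVIVING_GROUPS = {4, 5, 6, 7, 9, 11}

def norm_value_for_group(code):
    """Map a numeric NFRR regeneration category to the normalised value."""
    if code in KILLED_GROUPS:
        return "None"
    if code in SURVIVING_GROUPS:
        return "All"
    return "Unknown"

for code in (2, 9, 10):
    print(code, "->", norm_value_for_group(code))
```

Keeping the grouping in one place would make it easier to review or change the reclassification rule without touching the parsing code.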

In [45]:
read_row_resprouting(species_data,14)

[{'raw_value': ['Fireresponse', 'S'],
  'main_source': 'NSWFFRDv2.1',
  'additional_notes': ['Values reclassified following rules proposed by D. Keith et al.',
   'Automatic extraction with python script'],
  'weight_notes': ['python-script import', 'default of 10 for summary value'],
  'weight': 10,
  'norm_value': 'None',
  'species_code': '7060',
  'species': 'Acacia baueri subsp. baueri'},
 {'raw_value': ['VA Group 2',
   'Killed by 100% scorch; seed storage in soil',
   'Overall value of fireresponse column is S'],
  'original_sources': ['Benwell 1998'],
  'main_source': 'NSWFFRDv2.1',
  'additional_notes': ['Values reclassified following rules proposed by D. Keith et al.',
   'Automatic extraction with python script',
   'Raw values extracted from notes/comments in NSWFFRDBv2.1'],
  'weight_notes': ['python-script import', 'default of 1'],
  'weight': 1,
  'norm_value': 'None',
  'species_code': '7060',
  'species': 'Acacia baueri subsp. baueri'},
 {'raw_value': ['VA Group 8',
  

In [49]:
species_data[fireresponse_col][3].value

'S'

In [54]:
row_min = 2
row_max = species_data.max_row
varname=species_data[fireresponse_col][1].value

print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(**dbparams)
cur = conn.cursor()
affected_rows=0
insert_statement = 'insert into litrev.surv1 (%s) values %s ON CONFLICT DO NOTHING'

records=list()
for row in range(row_min,row_max):
    rr = read_row_resprouting(species_data,row)
    if rr is not None:
        records.extend(rr)
    if (((row-row_min) % 250) == 0 and len(records)>10) or (row==(row_max-1)):
        print("total of %s records prepared" % len(records)) 
        for record in records: 
            cur.execute(insert_statement, (AsIs(','.join(record.keys())), tuple(record.values())))
            affected_rows = affected_rows+cur.rowcount
        records.clear()
        conn.commit()
        print("total number of lines updated: %s" % affected_rows)

cur.close()
if conn is not None:
    conn.close()
    print('Database connection closed.')     


Connecting to the PostgreSQL database...
total of 1034 records prepared
total number of lines updated: 1034
total of 936 records prepared
total number of lines updated: 1970
total of 890 records prepared
total number of lines updated: 2860
total of 951 records prepared
total number of lines updated: 3811
total of 890 records prepared
total number of lines updated: 4701
total of 885 records prepared
total number of lines updated: 5586
total of 1071 records prepared
total number of lines updated: 6657
total of 1156 records prepared
total number of lines updated: 7813
total of 915 records prepared
total number of lines updated: 8728
total of 845 records prepared
total number of lines updated: 9573
total of 806 records prepared
total number of lines updated: 10379
total of 860 records prepared
total number of lines updated: 11239
empty row
total of 324 records prepared
total number of lines updated: 11563
Database connection closed.


This is somewhat slow, but it works, and all the records are now in the database.
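
The "commit every 250 rows" logic in the loop above could also be isolated in a small generator, which makes the batching easier to test on its own. This is only a sketch with a hypothetical helper name, and it does not by itself make the inserts faster.

```python
def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        # last, possibly shorter, batch
        yield batch

for batch in batched(range(2, 9), 3):
    print(batch)
# [2, 3, 4]
# [5, 6, 7]
# [8]
```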