# Read juvenile periods data from NSWFFRD 2014
Read the spreadsheet from the NSW Flora Fire Response Database and extract the time to first flowering after fire (primary juvenile period for recruits and secondary juvenile period for resprouters).
We will use the _openpyxl_ library in ***python***.

In [1]:
import openpyxl
from pathlib import Path
import os
import re
import copy

We need to define a path to locate the documents relative to the current repository directory

In [2]:
repodir = Path("../..") 
inputdir = repodir / "data/"

## Open the workbook and read main spreadsheet
Here we will load the workbook (_wb_):

In [3]:
wb = openpyxl.load_workbook(inputdir / "NSWFFRDv2.1.xlsx")

We will use the sheet names to read them. We need access to the sheets 'SpeciesData', 'VA Groups', 'References' and 'Notes':

In [4]:
species_data = wb['SpeciesData']
va_groups = wb['VA Groups']
references = wb['References']
column_notes = wb['Notes'] 

In the case of time to first flowering we need to read data from columns _Z_ ('Primary juvenile period'), and _AA_ ('Secondary juvenile period').

We can use square brackets to refer to a column and then use python indices (starting with _0_ for the top row) to slice it. We use the property _value_ to show their stored content. 

In [5]:
print(species_data['Z'][1].value)
print(species_data['Z'][5].value)

Primary juvenile period
3


Alternatively, we can use the function _cell_ to retrieve individual cells. Indices here follow the spreadsheet convention and start with _1_ for the top row. The header is in the second row, the first value is in the third row:

In [6]:
print(species_data.cell(row=2,column=27).value)
print(species_data.cell(row=14,column=27).value)

Secondary juvenile period
3
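
The two access styles use different conventions: indexing a column tuple is 0-based, while _cell_ takes 1-based row and column numbers. As a minimal sketch of the column conversion, the helper below (a hypothetical _column_number_, written in plain Python for illustration) turns a column label into its number. openpyxl provides `openpyxl.utils.column_index_from_string` for the same purpose.

```python
def column_number(letters):
    """Convert a spreadsheet column label ('A', 'Z', 'AA', ...) to its 1-based number."""
    number = 0
    for char in letters.upper():
        # Column labels behave like a base-26 number with digits A=1 ... Z=26
        number = number * 26 + (ord(char) - ord("A") + 1)
    return number

# Column labels used in this notebook
for label in ("Z", "AA", "AD", "AE", "AF"):
    print(label, column_number(label))
```

With this convention, tuple index _k_ corresponds to spreadsheet row _k+1_, so `species_data['Z'][1]` and `species_data.cell(row=2, column=26)` address the same header cell.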


### Example for one species:
Let's start by checking the columns we need:

In [7]:
sp_col='A'
spcode_col='B'
target_cols = {'repr3':'Z', 'repr3a':'AA', 'grow1':'AD', 'repr4':None, 'surv5':'AE', 'surv6':None, 'surv7':'AF'}

print("%s (%s) / %s / %s / %s / %s " %
(species_data[sp_col][1].value,
 species_data[spcode_col][1].value,
species_data[target_cols['repr3a']][1].value,
species_data[target_cols['grow1']][1].value,
species_data[target_cols['surv5']][1].value,
species_data[target_cols['surv7']][1].value))

Current Scientific Name (Species Code) / Secondary juvenile period / Fire tolerance / Life span / Seed-bank longevity 


Descriptions of these columns are found in the spreadsheet:

In [8]:
# Rows of the 'Notes' sheet that describe the columns of interest
for k in (26,27,31,32):
    print(" - %s ) *%s*" % (k,column_notes.cell(row=k,column=2).value))
    print("\t%s" % column_notes.cell(row=k,column=3).value)

 - 26 ) *Primary Juvenile Period*
	Plant age at first flowering. May be a single figure or range of ages, may also be an indication (e.g. >5 implies that at 5 years post-fire flowering had still not been observed). May give the percentage of the population observed to flower at a particular time post-fire
 - 27 ) *Secondary juvenile period*
	Post-fire age at which flowering first occurs from resprouting material
 - 31 ) *Life span*
	Plant age at which senescence is expected to occur. In many cases this is a fairly broad range, based on plant life form & structure. May be post-fire age at which a species is no longer found in a community where it was known or assumed to occur in.
 - 32 ) *Seedbank Longeiveity*
	Number of years that a stored seed-bank is expected to stay viable.hl= half life value reported


Now select one record:

In [9]:
row_index=98

print("%s (%s) / %s / %s / %s / %s " %
(species_data[sp_col][row_index].value,
 species_data[spcode_col][row_index].value,
species_data[target_cols['repr3a']][row_index].value,
species_data[target_cols['grow1']][row_index].value,
species_data[target_cols['surv5']][row_index].value,
species_data[target_cols['surv7']][row_index].value))

Acacia mucronata subsp. longifolia (10058) / 1->4 / None / None / None 


#### Dealing with hyperlinks

This cell has a hyperlink:

In [10]:
type(species_data[target_cols['repr3']][row_index].hyperlink)
# target_cols['repr3'] is column 'Z', so this is the same as
# type(species_data['Z'][row_index].hyperlink)

openpyxl.worksheet.hyperlink.Hyperlink

If the cell is a hyperlink it will have a value to "display" and will point to a "location" within the workbook: 

In [11]:
species_data[target_cols['repr3']][row_index].hyperlink.display

'References!C49'

In [12]:
# This will fail if there is no hyperlink 
print(species_data[target_cols['repr3']][row_index].hyperlink.location)

References!C49


Let's see the value of this reference:

In [13]:
hlink = species_data[target_cols['repr3']][row_index].hyperlink.location
hlink = hlink.split("!")

This gives the name of the target sheet and the corresponding cell. We need to read the cell to its right side (add one to the column number) to get the information we need.

In [14]:
ref = wb[hlink[0]]
print("Cell value is :: " + str(ref[hlink[1]].value))
nlink = ref.cell(row=ref[hlink[1]].row,column=ref[hlink[1]].col_idx + 1)

print("Reference data is :: " + nlink.value) 


Cell value is :: 48
Reference data is :: Wark, M.C. (1997) Regeneration of some forest and gully communities in the Angahook-Lorne State Park (north-eastern Otway Ranges) 1-10 years after the wildfire of February 1983. Proc.Roy.Soc.Vic. 109, 7-36.
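
The location string can also be taken apart without touching the workbook. The sketch below is a plain-Python illustration (the helper _neighbour_to_the_right_ is hypothetical) that splits a location such as 'References!C49' into sheet name, row number and the label of the column one step to the right. It assumes a simple sheet name without quotes or embedded '!' characters.

```python
import re

def neighbour_to_the_right(location):
    """Split 'Sheet!A1' into sheet name, row number and the column label one step right."""
    sheet, cell = location.split("!")
    # Drop '$' markers in case the link uses absolute references such as $C$49
    match = re.fullmatch(r"([A-Z]+)(\d+)", cell.replace("$", "").upper())
    if match is None:
        raise ValueError("Unexpected cell reference: %s" % cell)
    letters, row = match.group(1), int(match.group(2))
    # Letters -> 1-based column number, then add one for the cell to the right
    number = 0
    for char in letters:
        number = number * 26 + (ord(char) - ord("A") + 1)
    number += 1
    # 1-based column number -> letters
    label = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        label = chr(ord("A") + remainder) + label
    return sheet, row, label

print(neighbour_to_the_right("References!C49"))
```

For the link shown above this points at column D of row 49 in 'References', which is the cell holding the full citation.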


If a cell has no hyperlink, its _hyperlink_ attribute is _None_:

In [15]:
type(species_data[target_cols['repr3']][row_index-1].hyperlink)

NoneType

The secondary juvenile period may point to a different set of references:

In [16]:
hlink = species_data[target_cols['repr3a']][row_index].hyperlink.location
hlink = hlink.split("!")
ref = wb[hlink[0]]
print("Cell value is :: " + str(ref[hlink[1]].value))
nlink = ref.cell(row=ref[hlink[1]].row,column=ref[hlink[1]].col_idx + 1)

print("Reference data is :: " + nlink.value) 

Cell value is :: 67
Reference data is :: Tolhurst, KG & Oswin, DA (1992) Effects of spring and autumn low intensity fire on understorey vegetation in open eucalypt forest in west-central Victoria. In Tolhurts & Flinn (Eds) Ecological inpacts of fuel reduction burning in dry sclerophyll forest. First progress report. Research report 349. Forest Research, Dept Conservation & Environment, Victoria


### List of references 
We need to prepare a list of references from the 'References' sheet.

There are three sets of references:
- the "normal" references in columns C and D (pink)
- the "Recovery Plan / Regional Forest Agreement Report" references in columns N, O, and P (blue)
- the "NFRR" references in columns S and T (lilac)

Normal and NFRR references are identified by a short numeric or letter code plus a reference description. We will use a function to create a more descriptive reference code from the list of authors and the date.

For Recovery Plans and Regional Forest Agreement Reports, we will use the species or region as the reference code.


In [17]:
r = re.compile("[A-Z][a-z]+")
def create_ref_code(x):
    
    # Build a short "Author(s) Year" code from a full citation string
    if "personal communication" in x:
        y = x[0:x.find(" personal")].replace(",","")
        year = "pers. comm."
    elif "unpublished" in x:
        y = x[0:x.find("unpublished")].replace(",","")
        year = "unpub."
    else:
        # text up to the first closing parenthesis holds the authors and the year
        y = x[0:x.find(")")].replace(",","")
        year = ''.join(re.findall(r"\d+", y))
    z = list(filter(r.match, y.split()))
    author = ' '.join(z)
    final_code =  "%s %s" % (author, year)
    if (len(final_code)>50):
        final_code=final_code[0:50]
    return(final_code)

def create_ref_code_RP(x):
    # "^RFA" was a literal substring test and never matched; check the prefix instead
    if x.startswith("RFA"):
        final_code = x
    else:
        final_code = "RP %s" % x
    if (len(final_code)>50):
        final_code=final_code[0:50]
    return(final_code)


val=references['O'][26].value.replace("(1) ","")
print(val)
create_ref_code_RP(val)

Asterolasia elegans


'RP Asterolasia elegans'

Now we check the NFRR references. Note that we substitute the number _1_ with a capital _I_ in the refcode to avoid problems with one reference (see below):

In [18]:
NFRR_refs=list()
for row in range(1,66):
    cite_text = references['T'][row].value.replace("(1) ","")
    cite_code = create_ref_code(cite_text) 
    record={"refcode": references['S'][row].value.replace("1","I"),
            "refstring": cite_code,#re.sub(r", [A-Z\.]+"," ",cite_code),
            "refinfo": cite_text
    }
    NFRR_refs.append(record)

In [19]:
NFRR_refs[64]

{'refcode': 'WO',
 'refstring': 'Mike Wouters Horsham Vic. unpub.',
 'refinfo': 'Mike Wouters, Horsham, Vic. (unpublished)'}

In [20]:
NFRR_refs[6]["refcode"]

'BF'

In [21]:
qry="FOI"
for elem in filter(lambda x: x['refcode'] == qry, NFRR_refs):
    print("NFRR reference %s refers to '%s'" % (qry, elem['refinfo']))

NFRR reference FOI refers to 'Fox, J.E.D. (1985). Fire in Mulga: Studies at the margins. In: Fire ecology and management of Western Australian ecosystems. (ed: J.R. Ford). Western Australian Institute of Technology, report no. 14.'


We do the same for the "normal" references column:

In [22]:
other_refs=list()
for row in range(1,139):
    cite_text = references['D'][row].value
    cite_code = create_ref_code(cite_text) 
    if cite_code == "Benson 1985":
        cite_code = "Benson 1985b"
    record={"refcode": references['C'][row].value,
            "refstring": cite_code,
            "refinfo": cite_text
    }
    other_refs.append(record)

In [23]:
other_refs[137]

{'refcode': 138,
 'refstring': 'Kubiak 2009',
 'refinfo': 'Kubiak, P.J. (2009). Fire responses of bushland plants after the January 1994 wildfires in northern Sydney'}

Now the recovery plan references:

In [24]:
rp_refs=list()
for row in range(1,46):
    cite_code = create_ref_code_RP(references['O'][row].value) 
    cite_text = "%s. %s" % (cite_code, references['P'][row].value)
    record={"refcode": references['N'][row].value,
            "refstring": cite_code,
            "refinfo": cite_text
    }
    rp_refs.append(record)

Check if there are duplicated references:

In [25]:
# Use a loop name other than `r`, which holds the compiled pattern used by create_ref_code
l1 = [ref["refstring"] for ref in NFRR_refs]
l2 = [ref["refstring"] for ref in other_refs]

for i in l1:
    if i in l2:
        print(i)


Benwell 1998
Molnar Fletcher Parsons 1989
Wark White Robertson Marriott 1987
Wark 1997
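
The loop above compares the two lists with each other. A variation that also catches codes repeated within a single list (the situation that the "Benson 1985b" special case appears to address) can be sketched with `collections.Counter`. The function name and the toy records below are illustrative only and are not taken from the spreadsheet.

```python
from collections import Counter

def repeated_codes(*ref_lists):
    """Return every refstring that occurs more than once across the given lists."""
    counts = Counter(rec["refstring"] for refs in ref_lists for rec in refs)
    return sorted(code for code, n in counts.items() if n > 1)

# Toy records shaped like the dictionaries built above (illustrative values only)
nfrr_demo = [{"refstring": "Wark 1997"}, {"refstring": "Fox 1985"}]
other_demo = [{"refstring": "Wark 1997"},
              {"refstring": "Benson 1985"},
              {"refstring": "Benson 1985"}]
print(repeated_codes(nfrr_demo, other_demo))
```

Repeated codes matter if the database enforces uniqueness on the reference code (an assumption here), because an insert with `ON CONFLICT DO NOTHING` would then silently skip the second occurrence.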


In [26]:
qry="Benwell 1998"
for elem in filter(lambda x: x['refstring'] == qry, NFRR_refs):
    print("Reference %s refers to '%s'" % (qry, elem['refinfo']))
for elem in filter(lambda x: x['refstring'] == qry, other_refs):
    print("Reference %s refers to '%s'" % (qry, elem['refinfo']))
    

Reference Benwell 1998 refers to 'Benwell A.S. (1998). Post-fire seedling recruitment in coastal heathland in relation to regeneration strategy and habitat. Aust. J. Bot. 46, 75-101.'
Reference Benwell 1998 refers to 'Benwell, A.S. (1998) Post-fire seedling recruitment in coastal heathland in relation to regeneration strategy and habitat. Aust. J. Bot. 46:75-101.  Data compiled by D.Keith (Keith, D.A., McCaw, W.L. & Whelan, R.J. (2002) pp. 199-237 in "Flammable Australia: The fire regimes and biodiversity of a continent" Ed. R.A. Bradstock, J.E. Williams & M.A. Gill. Cambridge University Press, Cambridge)'


#### Matching references from hyperlinks
We will create a function to translate hyperlinks to a reference:

In [27]:
def extract_link(target):
    p=re.compile(r'[,;\s]+')
    assert (target.hyperlink is not None),"Only works when cell has a hyperlink!"
    hlink = target.hyperlink.location
    hlink = hlink.split("!")
    if (hlink[0] != "References"): #"Expecting hyperlink to 'References' sheet"
        return None
    else:
        column=hlink[1][0:1]
        cell=hlink[1].strip('\\')
        refcodes=references[cell].value
        refinfo=list()
        if refcodes is not None:
            if isinstance(refcodes,int):
                for elem in filter(lambda x: x['refcode'] == refcodes, other_refs):
                    refinfo.append(elem['refstring'])
            else:
                for refcode in p.split(refcodes):
                    refcode=refcode.strip(" ")
                    refcode=re.sub("[abc]$","",refcode)
                    if refcode.isnumeric():
                        for elem in filter(lambda x: x['refcode'] == int(refcode), other_refs):
                            refinfo.append(elem['refstring'])
                    else:
                        for elem in filter(lambda x: x['refcode'] == refcode, rp_refs):
                            refinfo.append(elem['refstring'])
                        for elem in filter(lambda x: x['refcode'] == refcode, NFRR_refs):
                            refinfo.append(elem['refstring'])
            return (refcodes,refinfo)
        else:
            return None

            

We can test this function for several rows:

In [28]:
for row_index in (98,99,100,128,206, 1422,1421):
    spname=species_data[sp_col][row_index].value
    # 'repr3a' is the secondary juvenile period (column AA)
    sjp=species_data[target_cols['repr3a']][row_index]

    raw=sjp.value
    if (sjp.hyperlink is not None):
        ref=extract_link(sjp)
        if ref is not None:
            print("%s :: [%s] // %s" % (row_index,raw,ref[1]))
        else:
            print("%s :: [%s] " % (row_index,raw))            
    else:
        print("%s :: [%s] " % (row_index,raw))

98 :: [1->4] // ['Tolhurst Oswin 1992']
99 :: [None] 
100 :: [None] 
128 :: [None] 
206 :: [None] 
1422 :: [0.5] // ['Benson McDougall Ecology Sydney Plant Species Cunn']
1421 :: [None] 


#### Processing strings with and without references
The value of the cell might contain one or multiple values, and sometimes references are given in parentheses.
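
To make the format concrete before the full function, here is a minimal sketch that only splits such a string into values and reference codes. The helper _split_entries_ is hypothetical, and the input string is an assumed reconstruction of a multi-value cell, pieced together from the separate raw values printed further down for _Acacia myrtifolia_.

```python
import re

def split_entries(cell_text):
    """Split a cell string into (value, [reference codes]) pairs, one per '/'-separated entry."""
    entries = []
    for part in cell_text.split("/"):
        part = part.strip()
        # Reference codes, if any, sit inside one pair of parentheses
        found = re.search(r"\(([^)]*)\)", part)
        codes = re.split(r"[,;\s]+", found.group(1).strip()) if found else []
        value = part[:found.start()].strip() if found else part
        entries.append((value, [c for c in codes if c]))
    return entries

print(split_entries("2 (10) / 2.5 (1b) / 3 (9, 48)"))
```

The function in the next cell follows the same idea and additionally resolves each code against the reference lists, interprets ranges and the `>` and `<` qualifiers, and records cell formatting as notes.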

In [29]:
def extract_value(target,varname):
    assert (target.value is not None),"Only works with non-empty cells"
    p=re.compile(r'[,;\s]+')
    val = target.value
    note = list()
    if target.font.color is not None:
        note.append('Cell color index %s' % target.font.color.indexed)
    # strike can be None or False when the text has no strikethrough
    if target.font.strike:
        note.append('Cell text has strikethrough')
  
    rslts = list()
    if isinstance(val,int) or isinstance(val,float):
        record={"raw_value":[varname,str(val)],"best":val,"main_source":"NSWFFRDv2.1"}
        if len(note)>0:
            record["original_notes"]=note 
        rslts.append(record)
    else:
        for w in val.split('/'):
            newnote=copy.deepcopy(note)
            w=w.strip(" ")
            record={"raw_value":[varname,w],"main_source":"NSWFFRDv2.1"}
            if w.find("?")>0:
                newnote.append("uncertain")
                w=w.replace("?","")
            end=len(w)
            if w.find("(")>0:
                record["original_sources"]=list()
                for refs in re.findall(r"\(([\w\d, ]+)\)",w):
                    for ref in p.split(refs):
                        ref=ref.strip(" ")
                        ref=re.sub("[abc]$","",ref)
                        if ref.isnumeric():
                            for elem in filter(lambda x: x['refcode'] == int(ref), other_refs):
                                record["original_sources"].append(elem['refstring'])
                        else:
                            for elem in filter(lambda x: x['refcode'] == ref, rp_refs):
                                record["original_sources"].append(elem['refstring'])
                            for elem in filter(lambda x: x['refcode'] == ref, NFRR_refs):
                                record["original_sources"].append(elem['refstring'])
                end=w.index("(")
            sw=w[0:end].strip(" ")
            if sw.isnumeric():
                record["best"]=sw
            elif sw.find("-")>0:
                val = sw.split("-")
                if val[0].isnumeric():
                    record["lower"]=val[0]
                if val[1].isnumeric():
                    record["upper"]=val[1]
            elif sw.find(">")==0:
                val=sw[1:]
                if val.isnumeric():
                    record["lower"]=val
            elif sw.find("<")==0:
                val=sw[1:]
                if val.isnumeric():
                    record["upper"]=val
            else:
                val=sw    
            
            if len(newnote)>0:
                record["original_notes"]=newnote  
            rslts.append(record)
    return(rslts)


In [30]:
varname=species_data[target_cols['repr3']][1].value
for row_index in (98,99,100,206, 1422,1421):
    target=species_data[target_cols['repr3']][row_index]
    if (target.hyperlink is not None):
        ref=extract_link(target)
    else:
        ref=None
    if (target.value is not None):
        spname=species_data[sp_col][row_index].value
        spcode=species_data[spcode_col][row_index].value
        rec=extract_value(target,varname)
        for record in rec:
            record["main_source"]="NSWFFRDv2.1"
            record["species"]=spname
            record["species_code"]=spcode
            if 'original_sources' not in record and ref is not None:
                record['original_sources'] = ref[1]
            print("%s ::  %s" % (row_index,record))
           
    else:
        print("%s is empty " % (row_index))

98 ::  {'raw_value': ['Primary juvenile period', '3'], 'best': 3, 'main_source': 'NSWFFRDv2.1', 'original_notes': ['Cell color index 12'], 'species': 'Acacia mucronata subsp. longifolia', 'species_code': '10058', 'original_sources': ['Wark 1997']}
99 ::  {'raw_value': ['Primary juvenile period', 'c. 3'], 'main_source': 'NSWFFRDv2.1', 'original_notes': ['Cell color index 12'], 'species': 'Acacia murrayana', 'species_code': '3832', 'original_sources': ['Hodgkinson Griffin 1982']}
100 ::  {'raw_value': ['Primary juvenile period', '2 (10)'], 'main_source': 'NSWFFRDv2.1', 'original_sources': ['Wark White Robertson Marriott 1987'], 'best': '2', 'species': 'Acacia myrtifolia', 'species_code': '3834'}
100 ::  {'raw_value': ['Primary juvenile period', '2.5 (1b)'], 'main_source': 'NSWFFRDv2.1', 'original_sources': ['Benson McDougall Ecology Sydney Plant Species Cunn'], 'species': 'Acacia myrtifolia', 'species_code': '3834'}
100 ::  {'raw_value': ['Primary juvenile period', '3 (9, 48)'], 'main_so

## Format records for input in database

Using the code above it is possible to:
- create records for each original reference and add them to a central "reference list" table
- take each species (row) from the spreadsheet and add records for a "resprouting" table:
    - create one record based on the "Fire response" value citing NSWFFRDv2.1 as the main source and other references as original source
    - create one or more records for each original reference using the "Regeneration category" as input value
    
But first we need to connect to the database from python.

### Connect to database from Python

We use the library _psycopg2_ to connect to the database. We first read the database credentials from a file with restricted read access:

In [31]:
from configparser import ConfigParser
import psycopg2
from psycopg2.extensions import AsIs

filename = repodir / 'secrets' / 'database.ini'
section = 'aws-lght-sl'

parser = ConfigParser()
parser.read(filename)

dbparams = {}
if parser.has_section(section):
    params = parser.items(section)
    for param in params:
        dbparams[param[0]] = param[1]
else:
    raise Exception('Section {0} not found in the {1} file'.format(section, filename))
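
The credentials file itself is not shown. As a sketch of the layout the code above expects, the example below parses an in-memory string instead of a file. All values are placeholders, and the option names are an assumption based on the connection keywords that `psycopg2.connect` accepts (`host`, `port`, `dbname`, `user`, `password`).

```python
from configparser import ConfigParser

# Hypothetical contents of database.ini; every value is a placeholder
example_ini = """
[aws-lght-sl]
host = db.example.org
port = 5432
dbname = exampledb
user = example_user
password = not-a-real-password
"""

parser = ConfigParser()
parser.read_string(example_ini)

# Same result as the loop above: option names become keys, values stay strings
dbparams_demo = dict(parser.items("aws-lght-sl"))
print(sorted(dbparams_demo))
```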

Typically we will connect to the database, run a query and then disconnect:
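
A minimal sketch of that connect, query, disconnect pattern is given below. To keep it runnable without a PostgreSQL server, it uses the standard-library `sqlite3` module as a stand-in. The helper _run_query_ is hypothetical, and in this notebook the `connect` argument would be something like `lambda: psycopg2.connect(**dbparams)`.

```python
import sqlite3
from contextlib import closing

def run_query(connect, query, params=()):
    """Open a connection, run one query, commit, and always close the connection."""
    conn = connect()
    try:
        with closing(conn.cursor()) as cur:
            cur.execute(query, params)
            # description is None for statements that return no rows
            rows = cur.fetchall() if cur.description else []
        conn.commit()
        return rows
    finally:
        # Runs even if the query raised an exception
        conn.close()

# Stand-in connection: an in-memory SQLite database
rows = run_query(lambda: sqlite3.connect(":memory:"), "SELECT 1 + 1")
print(rows)
```

The `try`/`finally` guarantees that the connection is closed even when the query fails, which the cells below handle by closing the connection explicitly at the end.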

### Add list of references

We can create a table for the list of references using this SQL code in our database (for example in _psql_ client):

We can then insert values into the database by substituting the corresponding values for the query:

For example, for the NSWFFRDv2.1 reference we can use:

In [32]:
insert_statement = "INSERT INTO litrev.ref_list(%s) values %s ON CONFLICT DO NOTHING"
record = { "ref_code" : "NSWFFRDv2.1",
          "ref_cite" : 
          "NSW Flora Fire Response Database. Version 2.1. February 2010 (last update May 2014)"}

print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(**dbparams)

cur = conn.cursor()
cur.execute(insert_statement, (AsIs(','.join(record.keys())), tuple(record.values())))
conn.commit()
print("total number of lines updated: %s" % cur.rowcount)
cur.close()

if conn is not None:
    conn.close()
    print('Database connection closed.')

Connecting to the PostgreSQL database...
total number of lines updated: 0
Database connection closed.
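
A row count of 0 is what `ON CONFLICT DO NOTHING` reports when the row is already present, so the output above presumably comes from re-running the cell. The sketch below reproduces that behaviour and shows how the column list and placeholders are derived from the dictionary. It uses `sqlite3` as a stand-in for PostgreSQL (with `?` placeholders instead of `%s`) and assumes an SQLite version that supports `ON CONFLICT`. The table layout is hypothetical, and only the column names are taken from this notebook.

```python
import sqlite3

def insert_record(cur, table, record):
    """Build the column list and placeholders from a dict and run one INSERT."""
    columns = ",".join(record.keys())
    # sqlite3 uses '?' placeholders; psycopg2 uses '%s'
    placeholders = ",".join("?" for _ in record)
    # Table and column names are interpolated directly, so they must come from trusted code
    cur.execute(
        "INSERT INTO %s (%s) VALUES (%s) ON CONFLICT DO NOTHING" % (table, columns, placeholders),
        tuple(record.values()),
    )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical table layout with a unique reference code
cur.execute("CREATE TABLE ref_list (ref_code TEXT PRIMARY KEY, ref_cite TEXT, alt_code TEXT)")
record = {"ref_code": "NSWFFRDv2.1",
          "ref_cite": "NSW Flora Fire Response Database. Version 2.1. February 2010 (last update May 2014)"}
first = insert_record(cur, "ref_list", record)
second = insert_record(cur, "ref_list", record)  # same key again: skipped
conn.commit()
conn.close()
print(first, second)
```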


Now we will add the references from the list we read before (_NFRR_refs_). We use the author-year code built by _create_ref_code_ (at most 50 characters) as the _ref_code_ (it can be updated later to something more meaningful in the database), and create an _alt_code_ to identify the origin of the reference.

In [33]:
print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(**dbparams)

cur = conn.cursor()
insert_statement = "INSERT INTO litrev.ref_list(ref_code,ref_cite,alt_code) values (%s,%s,%s) ON CONFLICT DO NOTHING"
affected_rows=0

for item in NFRR_refs:
    cur.execute(insert_statement,
                (item['refstring'],
                item['refinfo'],
                'NSWFFRD-NFRR-ref-%s' % item['refcode']))
    affected_rows = affected_rows+cur.rowcount

conn.commit()
print("total number of lines updated: %s" % affected_rows)
cur.close()


if conn is not None:
    conn.close()
    print('Database connection closed.')

Connecting to the PostgreSQL database...
total number of lines updated: 0
Database connection closed.


We can then add the references from the _other_refs_ list:

In [34]:
print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(**dbparams)

cur = conn.cursor()
insert_statement = "INSERT INTO litrev.ref_list(ref_code,ref_cite,alt_code) values (%s,%s,%s) ON CONFLICT DO NOTHING"
affected_rows=0

for item in other_refs:
    cur.execute(insert_statement,
                (item['refstring'],
                item['refinfo'],
                'NSWFFRD-other-ref-%s' % item['refcode']))
    affected_rows = affected_rows+cur.rowcount

conn.commit()
print("total number of lines updated: %s" % affected_rows)
cur.close()


if conn is not None:
    conn.close()
    print('Database connection closed.')

Connecting to the PostgreSQL database...
total number of lines updated: 0
Database connection closed.


And the references from the Recovery Plans...

In [35]:
print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(**dbparams)

cur = conn.cursor()
insert_statement = "INSERT INTO litrev.ref_list(ref_code,ref_cite,alt_code) values (%s,%s,%s) ON CONFLICT DO NOTHING"
affected_rows=0

for item in rp_refs:
    cur.execute(insert_statement,
                (item['refstring'],
                item['refinfo'],
                'NSWFFRD-RP-ref-%s' % item['refcode']))
    affected_rows = affected_rows+cur.rowcount

conn.commit()
print("total number of lines updated: %s" % affected_rows)
cur.close()


if conn is not None:
    conn.close()
    print('Database connection closed.')

Connecting to the PostgreSQL database...
total number of lines updated: 0
Database connection closed.


### 'First flowering' table in database

The structure of the _firstflower_ table still needs to be refined, here is a proposed structure: 

We will test how to insert some records into this table from the spreadsheet.

### Inserting the 'primary juvenile period' values of NSWFFRDv2.1

We will create multiple records per species, using "NSWFFRDv2.1" as _main reference_, adding the reported references in the _original sources_ column.

First we will define the columns we need:

In [36]:
sp_col='A'
spcode_col='B'
target_cols = {'repr3':'Z', 'repr3a':'AA', 'grow1':'AD', 'repr4':None, 'surv5':'AE', 'surv6':None, 'surv7':'AF'}

print("%s (%s) / %s / %s / %s / %s " %
(species_data[sp_col][1].value,
 species_data[spcode_col][1].value,
species_data[target_cols['repr3a']][1].value,
species_data[target_cols['grow1']][1].value,
species_data[target_cols['surv5']][1].value,
species_data[target_cols['surv7']][1].value))

Current Scientific Name (Species Code) / Secondary juvenile period / Fire tolerance / Life span / Seed-bank longevity 


We will use two functions to read row values and hyperlinks to create one or multiple records from each entry.

We wrap all this into a single function call for each row:


In [37]:
def create_record(spreadsheet,target_col,row_index):
    records = list()
    target=spreadsheet[target_col][row_index]
    if (target.hyperlink is not None):
        ref=extract_link(target)
    else:
        ref=None
    if (target.value is not None):
        spname=spreadsheet[sp_col][row_index].value
        spcode=spreadsheet[spcode_col][row_index].value
        varname=spreadsheet[target_col][1].value
        rec=extract_value(target,varname)
        for record in rec:
            record["main_source"]="NSWFFRDv2.1"
            record["species"]=spname
            record["species_code"]=spcode
            record["weight"]=1
            record["weight_notes"]=['automatic assignment of weight by python script','default value of 1']
            if 'original_sources' not in record and ref is not None:
                record['original_sources'] = ref[1]
            records.append(record)
    return(records)

In [38]:
x=create_record(species_data,target_cols['repr3'],100)
len(x)
x

[{'raw_value': ['Primary juvenile period', '2 (10)'],
  'main_source': 'NSWFFRDv2.1',
  'original_sources': ['Wark White Robertson Marriott 1987'],
  'best': '2',
  'species': 'Acacia myrtifolia',
  'species_code': '3834',
  'weight': 1,
  'weight_notes': ['automatic assignment of weight by python script',
   'default value of 1']},
 {'raw_value': ['Primary juvenile period', '2.5 (1b)'],
  'main_source': 'NSWFFRDv2.1',
  'original_sources': ['Benson McDougall Ecology Sydney Plant Species Cunn'],
  'species': 'Acacia myrtifolia',
  'species_code': '3834',
  'weight': 1,
  'weight_notes': ['automatic assignment of weight by python script',
   'default value of 1']},
 {'raw_value': ['Primary juvenile period', '3 (9, 48)'],
  'main_source': 'NSWFFRDv2.1',
  'original_sources': ['Keith David pers. comm.', 'Wark 1997'],
  'best': '3',
  'species': 'Acacia myrtifolia',
  'species_code': '3834',
  'weight': 1,
  'weight_notes': ['automatic assignment of weight by python script',
   'default va

Now we will read through the spreadsheet and prepare records

In [43]:
row_min = 2
row_max = species_data.max_row

print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(**dbparams)
cur = conn.cursor()
affected_rows=0

traits = target_cols.keys()
to_do = ('surv7',)
completed_traits = ( 'repr3', 'repr3a', 'grow1', 'repr4',  'surv6','surv5')
for trait in traits:
    if target_cols[trait] is None:
        continue
    if trait in completed_traits:
        continue
    print(trait)
    varname=species_data[target_cols[trait]][1].value
    insert_statement = 'insert into litrev.{trait} (%s) values %s ON CONFLICT DO NOTHING'.format(trait=trait)

    records=list()
    for row in range(row_min,row_max):
        rr = create_record(species_data,target_cols[trait],row)
        if len(rr) > 0 :
            records.extend(rr)
        if (((row-row_min) % 250) == 0 and len(records)>10) or (row==(row_max-1)):
            print("total of %s records prepared" % len(records)) 
            for record in records: 
                #print(cur.mogrify(insert_statement, (AsIs(','.join(record.keys())), tuple(record.values()))))
                cur.execute(insert_statement, (AsIs(','.join(record.keys())), tuple(record.values())))
                affected_rows = affected_rows+cur.rowcount
            records.clear()
            conn.commit()
            print("total number of lines updated: %s" % affected_rows)

cur.close()
if conn is not None:
    conn.close()
    print('Database connection closed.')     


Connecting to the PostgreSQL database...
surv7
total of 12 records prepared
total number of lines updated: 12
total of 14 records prepared
total number of lines updated: 26
total of 12 records prepared
total number of lines updated: 38
total of 11 records prepared
total number of lines updated: 49
total of 18 records prepared
total number of lines updated: 67
total of 11 records prepared
total number of lines updated: 78
total of 9 records prepared
total number of lines updated: 87
Database connection closed.


In [40]:
str(record['raw_value'][1])

'5'

This is somewhat slow, but it works, and all the records are now in the database.
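
One possible way to reduce the run time, untested here, is to send records to the database in batches instead of issuing one `execute` call per record. `psycopg2.extras.execute_values` is designed for that, but it needs every record in a batch to share the same columns, which the records built above do not (some have `best`, others `lower` or `upper`), so they would first have to be grouped by column set. The sketch below shows only the batching step in plain Python, and the helper _chunked_ is hypothetical.

```python
def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Last, possibly shorter, batch
        yield batch

# Toy input standing in for the prepared records
sizes = [len(batch) for batch in chunked(range(620), 250)]
print(sizes)
```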