# Fireveg DB - download and explore NSWFFRD 2014 data

Author: [José R. Ferrer-Paris](https://github.com/jrfep) 

Date: July 2024

This Jupyter Notebook includes [Python](https://www.python.org) code to download and explore the spreadsheet from NSW Flora Fire response database and extract hyperlinks that point to references. We will use the `openpyxl` library to work with the workbook.

## Set-up

### Load libraries

In [1]:
import openpyxl
from pathlib import Path
import os,sys

# Pyprojroot for easier handling of working directory
import pyprojroot

### Define paths for input and output

Define project directory using the `pyprojroot` functions, and add this to the execution path.

In [2]:
repodir = pyprojroot.find_root(pyprojroot.has_dir(".git"))
sys.path.append(str(repodir))

In [3]:
inputdir = repodir / "data/"

### Load own functions
Load functions from `lib` folder, we will use a function to read db credentials, one for executing database queries and three functions for extracting data from the reference description string

In [4]:
import lib.nswfireflora_util as nswff

## Open the workbook and read main spreadsheet
Here we will load the workbook:

In [5]:
wb = openpyxl.load_workbook(inputdir / "NSWFFRDv2.1.xlsx")

List all worksheets:

In [6]:
for k in wb.worksheets:
    print(k)

<Worksheet "Read Me">
<Worksheet "SpeciesData">
<Worksheet "References">
<Worksheet "Notes">
<Worksheet "VA Groups">


Use the sheet name to read data

### Read Me

In [7]:
readme = wb['Read Me']

In [8]:
for cell in ['A1','A2','A3','A6','A7',]:
    print(readme[cell].value)

NSW Flora Fire Response Database
Version 2.1
February 2010 (last update May 2014)
Please read NSW Flora Fire Response Database.doc
Refer to 'Notes' sheet for explanations of data type and format used in 'SpeciesData' sheet.


### Species data

In [9]:
species_data = wb['SpeciesData']

Let's look at all the values in the second column (Species code)

In [10]:
species_data['B2'].value

'Species Code'

In [76]:
species_data['Z2'].value

'Primary juvenile period'

We want to count the number of unique values:

In [11]:
row_count = species_data.max_row
column_count = species_data.max_column
j=2
unique_list = list()
unique_items = 0
for i in range(1, row_count + 1):
    item = species_data.cell(row=i, column=j).value
    if item not in unique_list and item is not None:
        unique_list.append(item)
        unique_items += 1
print(unique_items)

3000


Which columns include information on traits? this will print out the names in the second row...:

In [12]:
for j in range(1, column_count + 1):
    print(species_data.cell(row=2, column=j).value)


Current Scientific Name
Species Code
Legal Status
Exotic
2010 Update
Notes on Name / Synonym as used in source reference
Family
Group
Life form
Fireresponse
Comments on regeneration
Resprout location
Seed storage
Seed dispersal mechanism
Seed dispersal distance
Seed weight / size
Seed viability
Dormancy
Germination cue
Fecundity
Seed predation
Post-fire recruitment
Establishment
Post-fire flowering
Flowering time
Primary juvenile period
Secondary juvenile period
Seed set
Seed-bank developed
Fire tolerance
Life span
Seed-bank longevity
"Maturity" (from source)
"Extinction" (from source)
"Rec. min fire interval" (from source)
"Rec. max fire interval" (from source)
NC
CC
SC
NT
CT
ST
NWS
CWS
SWS
NWP
SWP
NFWP
SFWP
Distribution: extra NSW
Vegetation
Rainforest
Wet Sclerophyll Forest (Shrubby)
Wet Sclerophyll Forest (Grassy)
Grassy Woodland
Grassland
Dry Sclerophyll Forest (Shrub/Grass)
Dry Sclerophyll Forest (Shrubby)
Heathland
Alpine Complex
Freshwater Wetland
Forested Wetlands
Saline Wetla

In [13]:
species_data.cell(row=2, column=31).value

'Life span'

#### Read cell values
We can use square brackets to refer to a column and then use python indices (starting with _0_ for the top row) to slice it. We use the property _value_ to show their stored content. 

In [14]:
print(species_data['X'][1].value)
print(species_data['X'][157].value)

Post-fire flowering
flowers well after fire


Descriptions of these columns are found in the column_notes sheet:

In [15]:
column_notes = wb['Notes'] 
for k in (23,24):
    print(" - *%s*" %column_notes.cell(row=k,column=2).value)
    print("\t%s" % column_notes.cell(row=k,column=3).value)

 - *Establishment*
	Seedling establishment groups of Noble & Slatyer (1980); See VA sheet for details: I=Intolerant, T=Tolerant, R=Requiring
 - *Post-fire flowering*
	exclusive or facultative post-fire flowering observed


In [16]:
#26 27 31 32
#for k in range(10,50):
for k in (26,27,31,32):
    print(" - %s ) *%s*" % (k,column_notes.cell(row=k,column=2).value))
    print("\t%s" % column_notes.cell(row=k,column=3).value)

 - 26 ) *Primary Juvenile Period*
	Plant age at first flowering. May be a single figure or range of ages, may also be an indication (e.g. >5 implies that at 5 years post-fire flowering had still not been observed). May give the percentage of the population observed to flower at a particular time post-fire
 - 27 ) *Secondary juvenile period*
	Post-fire age at which flowering first occurs from resprouting material
 - 31 ) *Life span*
	Plant age at which senescence is expected to occur. In many cases this is a fairly broad range, based on plant life form & structure. May be post-fire age at which a species is no longer found in a community where it was known or assumed to occur in.
 - 32 ) *Seedbank Longeiveity*
	Number of years that a stored seed-bank is expected to stay viable.hl= half life value reported


We can use this approach to read several columns from one row, let's start checking the columns names in row 1:

In [17]:
sp_col='A'
spcode_col='B'
target_cols={'repr2':'X', 'rect2':'W'}

target_cols.values()
print("%s (%s) / %s / %s  " %
(species_data[sp_col][1].value,
 species_data[spcode_col][1].value,
species_data[target_cols['repr2']][1].value,
species_data[target_cols['rect2']][1].value))

Current Scientific Name (Species Code) / Post-fire flowering / Establishment  


Now select one record:

In [18]:
row_index=157

print("%s (%s)  ~ %s  / %s " %
(species_data[sp_col][row_index].value,
 species_data[spcode_col][row_index].value,
 species_data[target_cols['repr2']][row_index].value,
species_data[target_cols['rect2']][row_index].value))

Acianthus caudatus (4351)  ~ flowers well after fire  / None 


## Dealing with hyperlinks

The cell Q6 has a hyperlink. We can use cell rows and columns or cell name:

In [19]:
type(species_data.cell(row=6, column=17).hyperlink)
# same as 
type(species_data['Q6'].hyperlink)

openpyxl.worksheet.hyperlink.Hyperlink

If the cell is a hyperlink it will have a value to "display" and will point to a "location" within the workbook: 

In [20]:
species_data.cell(row=6, column=17).hyperlink.display

'viability average-very good'

In [21]:
# This will fail if there is no hyperlink 
print(species_data.cell(row=6, column=17).hyperlink.location)

References!C94


Let's see the value of this reference:

In [22]:
hlink = species_data.cell(row=6, column=17).hyperlink.location
hlink = hlink.split("!")

This gives the name of the target sheet and the corresponding cell. We need to read the cell to its right side (add one to the column number) to get the information we need.

In [23]:
ref = wb[hlink[0]]
print("Cell value is :: " + str(ref[hlink[1]].value))
nlink = ref.cell(row=ref[hlink[1]].row,column=ref[hlink[1]].col_idx + 1)

print("Reference data is :: " + nlink.value) 


Cell value is :: 93
Reference data is :: Mortlock, W. & Lloyd, MV (Eds) (2001) Floradata - A guide to collection, storage and propogation of Australian native plant seed. AUsttralian Centre for Mining Environmental Research, Brisbane; Australian National Botanic Gardens, CSIRO Forestry and Forest Products and Greening Australia Limited, Canberra. Searchable Database February 2001. a=survey data, b=test data


If there is no hyperlink, it will result in NoneType

In [24]:
type(species_data.cell(row=5, column=17).hyperlink)

NoneType

In [25]:
type(species_data.cell(row=5, column=17))


openpyxl.cell.cell.Cell

In [26]:
species_data.cell(row=5, column=17)


<Cell 'SpeciesData'.Q5>

## Read data from a column
For a selected variable (column), we can query data for the list of species.


In [27]:
species_data['Q6']

<Cell 'SpeciesData'.Q6>

In [28]:
species_data.cell(row=2, column=31).value

'Life span'

In [29]:
species_data.cell(row=16, column=31).value

Example loop for querying values from one variable for all species in a range of cells:

In [30]:
i=3
j=31
varname=species_data.cell(row=2, column=j).value
print(varname)
for i in range(13,40):
    spname=species_data.cell(row=i, column=1).value
    spcode=species_data.cell(row=i, column=2).value
    varvalue=species_data.cell(row=i, column=j).value
    varref=species_data.cell(row=i, column=j).hyperlink
    if varvalue is not None:
        if varref is not None:
            print("%s: %s / %s / %s" % (spcode,spname,varvalue, varref.location))
        else:
            print("%s: %s / %s " % (spcode,spname,varvalue))


Life span
3710: Acacia baileyana / 20-30 / References!C2
3716: Acacia binervata / 20 
3717: Acacia binervia / 50-100 / References!C2
8601: Acacia bulgaensis / c. 20-40 
3747: Acacia constablei / short? / References!N16


## Read data for a row
We can now do the same for a single species (row) and query values of each variable in a range. For example:

In [31]:
# for j in 17
i=18
spname=species_data.cell(row=i, column=1).value
spcode=species_data.cell(row=i, column=2).value
print("%s: %s" %(spcode,spname))
    
for j in range(3,30):
    varname=species_data.cell(row=2, column=j).value
    varvalue=species_data.cell(row=i, column=j).value
    varref=species_data.cell(row=i, column=j).hyperlink
    if varvalue is not None:
        if varref is not None:
            print("%s: %s / %s / %s" % (j,varname,varvalue, varref.location))
        else:
            print("%s: %s / %s " % (j,varname,varvalue))


3716: Acacia binervata
7: Family / Fabaceae: Mimosoideae 
8: Group / D 
9: Life form / T 
10: Fireresponse / Sr 
11: Comments on regeneration / Resprouting form in northern tablelands (134) 
12: Resprout location / basal buds 
13: Seed storage / persistent soil 
14: Seed dispersal mechanism / a-ant / References!C56
18: Dormancy / hard seed coat 
19: Germination cue / heat / References!C94
22: Post-fire recruitment / prolific / References!C106
23: Establishment / I / References!A94
25: Flowering time / Aug-Nov 
26: Primary juvenile period / 5 / References!C36
27: Secondary juvenile period / References!C36 / References!C36
28: Seed set / 6 / References!C36
29: Seed-bank developed / 10 / References!C36


## Search for a species code
Here we try to locate the species code and then return the values for that row:

In [32]:
for cell in species_data['A']:
    if(cell.value is not None): #We need to check that the cell is not empty.
        if 'Eryngium vesiculosum' in cell.value: #Check if the value of the cell contains the text 'Table'
            print('Found header with name: {} at row: {} and column: {}. In cell {}'.format(cell.value,cell.row,cell.column,cell))

Found header with name: Eryngium vesiculosum at row: 1123 and column: 1. In cell <Cell 'SpeciesData'.A1123>


In [33]:
# for j in 17
i=1123
spname=species_data.cell(row=i, column=1).value
spcode=species_data.cell(row=i, column=2).value
print("%s: %s" %(spcode,spname))
    
for j in range(3,50):
    varname=species_data.cell(row=2, column=j).value
    varvalue=species_data.cell(row=i, column=j).value
    varref=species_data.cell(row=i, column=j).hyperlink
    if varvalue is not None:
        if varref is not None:
            print("%s: %s / %s / %s" % (j,varname,varvalue, varref.location))
        else:
            print("%s: %s / %s " % (j,varname,varvalue))


1117: Eryngium vesiculosum
7: Family / Apiaceae 
8: Group / D 
9: Life form / H 
10: Fireresponse / R 
25: Flowering time / Dec-Mar 
27: Secondary juvenile period / <1 / References!C80
31: Life span / short / References!C2
37: NC / - 
38: CC / - 
39: SC / - 
40: NT / 1 
41: CT / 1 
42: ST / 1 
43: NWS / 1 
44: CWS / - 
45: SWS / 1 
46: NWP / - 
47: SWP / - 
48: NFWP / - 
49: SFWP / - 


### Create list(s) of references 
We need to prepare list of references from spreadsheet 'References'.

There are three sets of references:
- the  "normal" references in columns C and D (pink)
- the  "Recovery Plan / Regional Forest Agreement Report" references in columns N, O, and P (blue)
- the  "NFRR" references in columns S and T (lila)

Normal and NFRR references are identified by a simple two-cipher or -letter code and reference description, we will use a function to create a more descriptive reference code for the references based on the list of authors and date.

For Recovery plans and Regional Forest Agreement Reports, we will use the species or region as reference code.


In [34]:
references = wb['References']

val=references['O'][26].value.replace("(1) ","")
print(val)
nswff.create_ref_code_RP(val)

Asterolasia elegans


'RP Asterolasia elegans'

Now we check references of NFRR (notice that we will substitute number _1_ with capital _I_ in _refcode_ to avoid problems with one reference (see below):

In [35]:
NFRR_refs=list()
for row in range(1,66):
    cite_text = references['T'][row].value.replace("(1) ","")
    cite_code = nswff.create_ref_code(cite_text) 
    record={"refcode": references['S'][row].value.replace("1","I"),
            "refstring": cite_code,#re.sub(r", [A-Z\.]+"," ",cite_code),
            "refinfo": cite_text
    }
    NFRR_refs.append(record)

In [36]:
NFRR_refs[56]

{'refcode': 'SA',
 'refstring': 'Carolyn Sandercoe Qld. unpub.',
 'refinfo': 'Carolyn Sandercoe, Qld. (unpublished)'}

In [37]:
NFRR_refs[6]["refcode"]

'BF'

In [38]:
qry="FOI"
for elem in filter(lambda x: x['refcode'] == qry, NFRR_refs):
    print("NFRR reference %s refers to '%s'" % (qry, elem['refinfo']))

NFRR reference FOI refers to 'Fox, J.E.D. (1985). Fire in Mulga: Studies at the margins. In: Fire ecology and management of Western Australian ecosystems. (ed: J.R. Ford). Western Australian Institute of Technology, report no. 14.'


We do the same for the "normal" references column:

In [39]:
other_refs=list()
for row in range(1,139):
    cite_text = references['D'][row].value
    cite_code = nswff.create_ref_code(cite_text) 
    if cite_code == "Benson 1985":
        cite_code = "Benson 1985b"
    record={"refcode": references['C'][row].value,
            "refstring": cite_code,
            "refinfo": cite_text
    }
    other_refs.append(record)

In [40]:
other_refs[9]

{'refcode': 10,
 'refstring': 'Wark White Robertson Marriott 1987',
 'refinfo': 'Wark, M.C., White, M.D., Robertson, D.J. and Marriott, P.F. (1987). Regeneration of heath and heath woodland in the north-eastern Otway Ranges following the wildfire of February 1983. Proc.Roy.Soc.Vic. 99, 51-88.'}

Now the recovery plan references:

In [41]:
rp_refs=list()
for row in range(1,46):
    cite_code = nswff.create_ref_code_RP(references['O'][row].value) 
    cite_text = "%s. %s" % (cite_code, references['P'][row].value)
    record={"refcode": references['N'][row].value,
            "refstring": cite_code,
            "refinfo": cite_text
    }
    rp_refs.append(record)

Check if there are duplicated references:

In [42]:
l1 = list()
for r in NFRR_refs: 
    l1.append(r["refstring"])
l2 = list()
for r in other_refs: 
    l2.append(r["refstring"])

for i in l1:
    if i in l2:
        print(i)


Benwell 1998
Molnar Fletcher Parsons 1989
Wark White Robertson Marriott 1987
Wark 1997


In [43]:
qry="Benwell 1998"
for elem in filter(lambda x: x['refstring'] == qry, NFRR_refs):
    print("Reference %s refers to '%s'" % (qry, elem['refinfo']))
for elem in filter(lambda x: x['refstring'] == qry, other_refs):
    print("Reference %s refers to '%s'" % (qry, elem['refinfo']))
    

Reference Benwell 1998 refers to 'Benwell A.S. (1998). Post-fire seedling recruitment in coastal heathland in relation to regeneration strategy and habitat. Aust. J. Bot. 46, 75-101.'
Reference Benwell 1998 refers to 'Benwell, A.S. (1998) Post-fire seedling recruitment in coastal heathland in relation to regeneration strategy and habitat. Aust. J. Bot. 46:75-101.  Data compiled by D.Keith (Keith, D.A., McCaw, W.L. & Whelan, R.J. (2002) pp. 199-237 in "Flammable Australia: The fire regimes and biodiversity of a continent" Ed. R.A. Bradstock, J.E. Williams & M.A. Gill. Cambridge University Press, Cambridge)'


### Matching references from hyperlinks
We created a function to translate hyperlinks to a reference.
We can test this function for several rows:

In [44]:
for row_index in (157,162,233):
    spname=species_data[sp_col][row_index].value
    pjp=species_data[target_cols['repr2']][row_index]
 
    raw=pjp.value
    if (pjp.hyperlink is not None):
        ref=nswff.extract_link(pjp,references,other_refs, rp_refs, NFRR_refs)
        if ref is not None:
            print("%s :: [%s] // %s" % (row_index,raw,ref[1]))
        else:
            print("%s :: [%s] " % (row_index,raw))            
    else:
        print("%s :: [%s] " % (row_index,raw))

157 :: [flowers well after fire] // ['Bishop 1996']
162 :: [flowering 1 year post-fire] // ['Knox Clarke 2004']
233 :: [facultative] // ['Keith David pers. comm.']


Note: 
We had to modify the function to deal with one malformed hyperlink referring to use this to fix error with one hyperlink `C84\\` at `SpeciesData.AE2080`

### Colored and modified fonts

Some records include additional information coded in font color or strikethrough of values. With Python we can query cell colors and strikethrough properties of the font to verify if information has been annotated, but not with enough detail to distinguish with part of the value is annotated and which is not. For example:

In [45]:
for row in [22,23,66,67,70,72]:
    if species_data['BN'][row].font.color == None:
        print("Cell %s has no colored font" % (row+1))
    else:
        print("Cell %s has colored font" % (row+1))
        print(species_data['BN'][row].font.color.indexed)
    if species_data['BN'][row].font.strike != None:
        print("Cell %s has strikethrough" % (row+1))

Cell 23 has colored font
60
Cell 24 has no colored font
Cell 67 has colored font
60
Cell 68 has no colored font
Cell 71 has no colored font
Cell 73 has colored font
60
Cell 73 has strikethrough


### Processing strings with and without references
Cell values in the target columns might includes values in mixed formats, sometimes numbers and sometimes text, sometimes different observations are recorded for each species using delimiters and citing references in text, e.g.: 
> value1 (ref a) / value2 (ref b)
 
In such cases we want to split the values into different records and keep the values as 'raw value' and document the references cited. If the value in the cell matches our predefined values (e.g. Exclusive, Facultative, Negligible for post-fire flowering), we will fill a 'norm_value' with the corresponding category, if no match is found we will keep it empty for later processing.

In exceptional cases a reference is given in the text: "(12)" refers to reference 12.

We will define a _switcher_ function to transform raw values into normalised values:

In [46]:
switcher={
    "repr2":{
        "facultative": "Facultative",
        "yes": "Facultative",
        "yes?": "Facultative",
        "most profuse after fire": "Facultative",
        "exclusive": "Exclusive",
        "exclusive?": "Exclusive",
        "negligible": "Negligible"
    },
    "rect2":{
        "I":"Intolerant",
        "T":"Tolerant",
        "R":"Requiring",
        "T R":"Tolerant-Requiring",
        "I T":"Intolerant-Tolerant",
        "T I":"Intolerant-Tolerant"
    },
    "germ1":{
        'canopy': 'Canopy',
        
        'persistent soil': 'Soil-persistent', 
        'persistent': 'Soil-persistent', 
        'peristent': 'Soil-persistent', 
        'soil': 'Soil-persistent', 
        
        'transient': 'Transient', 
        'none':'Transient', 
        'shed at maturity': 'Transient', 
        'viviparous':'Transient', 
        'canopy / released at maturity':'Transient', 
        'canopy / regularly without fire':'Transient', 
        'canopy - transient':'Transient', 
        'transient': 'Transient', 
        
        'serotinous canopy': 'Canopy',
        'non-canopy': 'Non-canopy',
        'not canopy': 'Non-canopy',
        
        'other': 'Other'
    },
     "surv4":{
        'epicormic': 'Epicormic', 
        'stem buds': 'Epicormic', 
        'apical': 'Apical', 
        'lignotuber': 'Lignotuber',
        'root stock': 'Lignotuber',
        'rootstock': 'Lignotuber',
        'basal': 'Basal',
        'basal buds': 'Basal',
        'coppice': 'Basal',
        'tuber': 'Tuber',
        'taproot': 'Tuber',
        'tap root': 'Tuber',
        'tussock': 'Tussock',
        'rhizome': 'Long rhizome or root sucker',
        'rootucker': 'Long rhizome or root sucker',
        'rootuckers': 'Long rhizome or root sucker',
        'rootsuckers': 'Long rhizome or root sucker',
        'root buds': 'Long rhizome or root sucker',
        'root sucker': 'Long rhizome or root sucker',
        'root suckers': 'Long rhizome or root sucker',
        'rhizome': 'Short rhizome',
        'stolon': 'Stolon',
        'stolons': 'Stolon'
    }
}
isinstance(switcher["germ1"],dict)

True

And we defined a function to extract values from a target cell:

In [47]:
target_col=target_cols["repr2"]

varname=species_data[target_col][1].value

for row_index in (157,162,233):
    pjp=species_data[target_col][row_index]
    if (pjp.hyperlink is not None):
        ref=nswff.extract_link(pjp,references,other_refs,rp_refs,NFRR_refs)
    else:
        ref=None
    if (pjp.value is not None):
        spname=species_data[sp_col][row_index].value
        spcode=species_data[spcode_col][row_index].value
        rec=nswff.extract_value(pjp,switcher["repr2"],varname,
                               references,other_refs,rp_refs,NFRR_refs)
        for record in rec:
            record["species"]=spname
            record["species_code"]=spcode
            if 'original_sources' not in record and ref is not None:
                record['original_sources'] = ref[1]
            print("%s ::  %s" % (row_index,record))
           
    else:
        print("%s is empty " % (row_index))

157 ::  {'raw_value': ['Post-fire flowering', 'flowers well after fire'], 'main_source': 'NSWFFRDv2.1', 'original_notes': ['Cell color index 12'], 'species': 'Acianthus caudatus', 'species_code': '4351', 'original_sources': ['Bishop 1996']}
162 ::  {'raw_value': ['Post-fire flowering', 'flowering 1 year post-fire'], 'main_source': 'NSWFFRDv2.1', 'species': 'Aciphylla simplicifolia', 'species_code': '1091', 'original_sources': ['Knox Clarke 2004']}
233 ::  {'raw_value': ['Post-fire flowering', 'facultative'], 'main_source': 'NSWFFRDv2.1', 'norm_value': 'Facultative', 'original_notes': ['Cell color index 12'], 'species': 'Amperea xiphoclada var. xiphoclada', 'species_code': '9713', 'original_sources': ['Keith David pers. comm.']}


We wrap this in one single function call so that we can get one or many records per cell with a simple function call:

In [48]:
target_col=target_cols["rect2"]
for row_index in (36,122,167):
    rr = nswff.create_record(species_data,
                             target_col,
                             row_index,
                             switcher["rect2"],
                             references,
                             other_refs,
                             rp_refs,
                             NFRR_refs
                            )
    print(rr)

[{'raw_value': ['Establishment', 'I (R35)', '->', 'I'], 'main_source': 'NSWFFRDv2.1', 'norm_value': 'Intolerant', 'original_sources': ['RP RFA NSW - Eden'], 'original_notes': ['original record split into multiple entries, prob. different sources'], 'species': 'Acacia constablei', 'species_code': '3747'}, {'raw_value': ['Establishment', 'even aged stands indicate post fire recruitment; though some recruitment in absence of fire (R15)', '->', 'even aged stands indicate post fire recruitment; though some recruitment in absence of fire', '->', 'even aged stands indicate post fire recruitment'], 'main_source': 'NSWFFRDv2.1', 'original_sources': ['RP Threatened Flora of Rocky Outcrops in South Eas'], 'original_notes': ['original record split into multiple entries, prob. different sources', 'original record split into multiple entries separated by and/or'], 'species': 'Acacia constablei', 'species_code': '3747'}, {'raw_value': ['Establishment', 'even aged stands indicate post fire recruitment

### Numeric traits
Read the spreadsheet from NSW Flora Fire response database and extract information for the time to first flowering after fire (primary and secondary juvenile periods for recruits and resprouters respectively)

In [49]:
sp_col='A'
spcode_col='B'
target_cols = {'repr3':'Z', 
               'repr3a':'AA', 
               'grow1':'AD', 
               'repr4':None, 
               'surv5':'AE', 
               'surv6':None, 
               'surv7':'AF'}

print("%s (%s) / %s / %s / %s / %s " %
(species_data[sp_col][1].value,
 species_data[spcode_col][1].value,
species_data[target_cols['repr3a']][1].value,
species_data[target_cols['grow1']][1].value,
species_data[target_cols['surv5']][1].value,
species_data[target_cols['surv7']][1].value))

Current Scientific Name (Species Code) / Secondary juvenile period / Fire tolerance / Life span / Seed-bank longevity 


In [50]:
x=nswff.create_numeric_record(species_data,target_cols['repr3'],100,
                             references, other_refs, rp_refs, NFRR_refs)
len(x)
x

[{'raw_value': ['Primary juvenile period', '2 (10)'],
  'main_source': 'NSWFFRDv2.1',
  'original_sources': ['Wark White Robertson Marriott 1987'],
  'best': '2',
  'species': 'Acacia myrtifolia',
  'species_code': '3834',
  'weight': 1,
  'weight_notes': ['automatic assignment of weight by python script',
   'default value of 1']},
 {'raw_value': ['Primary juvenile period', '2.5 (1b)'],
  'main_source': 'NSWFFRDv2.1',
  'original_sources': ['Benson McDougall Ecology Sydney Plant Species Cunn'],
  'species': 'Acacia myrtifolia',
  'species_code': '3834',
  'weight': 1,
  'weight_notes': ['automatic assignment of weight by python script',
   'default value of 1']},
 {'raw_value': ['Primary juvenile period', '3 (9, 48)'],
  'main_source': 'NSWFFRDv2.1',
  'original_sources': ['Keith David pers. comm.', 'Wark 1997'],
  'best': '3',
  'species': 'Acacia myrtifolia',
  'species_code': '3834',
  'weight': 1,
  'weight_notes': ['automatic assignment of weight by python script',
   'default va

In [51]:
varname=species_data[target_cols['repr3']][1].value
for row_index in (98,99,100,206, 1422,1421):
    target=species_data[target_cols['repr3']][row_index]
    if (target.hyperlink is not None):
        ref=nswff.extract_link(target,references,other_refs,rp_refs,NFRR_refs)
    else:
        ref=None
    if (target.value is not None):
        spname=species_data[sp_col][row_index].value
        spcode=species_data[spcode_col][row_index].value
        rec=nswff.extract_numeric_value(target,varname,
                                  references,other_refs,rp_refs,NFRR_refs)
        for record in rec:
            record["main_source"]="NSWFFRDv2.1"
            record["species"]=spname
            record["species_code"]=spcode
            if 'original_sources' not in record and ref is not None:
                record['original_sources'] = ref[1]
            print("%s ::  %s" % (row_index,record))
           
    else:
        print("%s is empty " % (row_index))

98 ::  {'raw_value': ['Primary juvenile period', '3'], 'best': 3, 'main_source': 'NSWFFRDv2.1', 'original_notes': ['Cell color index 12'], 'species': 'Acacia mucronata subsp. longifolia', 'species_code': '10058', 'original_sources': ['Wark 1997']}
99 ::  {'raw_value': ['Primary juvenile period', 'c. 3'], 'main_source': 'NSWFFRDv2.1', 'original_notes': ['Cell color index 12'], 'species': 'Acacia murrayana', 'species_code': '3832', 'original_sources': ['Hodgkinson Griffin 1982']}
100 ::  {'raw_value': ['Primary juvenile period', '2 (10)'], 'main_source': 'NSWFFRDv2.1', 'original_sources': ['Wark White Robertson Marriott 1987'], 'best': '2', 'species': 'Acacia myrtifolia', 'species_code': '3834'}
100 ::  {'raw_value': ['Primary juvenile period', '2.5 (1b)'], 'main_source': 'NSWFFRDv2.1', 'original_sources': ['Benson McDougall Ecology Sydney Plant Species Cunn'], 'species': 'Acacia myrtifolia', 'species_code': '3834'}
100 ::  {'raw_value': ['Primary juvenile period', '3 (9, 48)'], 'main_so

## Resprouting
This column has more complexity than other columns in the spreadsheet.
In the case of the resprouting we need to decifer the information in columns _J_ ('Fireresponse'), _BN_ ('NFRR data') and _BO_ ('Additional fire response data').

We can use square brackets to refer to a column and then use python indices (starting with _0_ for the top row) to slice it. We use the property _value_ to show their stored content. 

In [52]:
print(species_data['BN'][1].value)
print(species_data['BN'][2].value)

NFRR data
8CN


Alternatively, we can use the function _cell_ to retrieve individual cells. Indices here follow the spreadsheet convention and start with _1_ for the top row. The header is in the second row, the first value is in the third row:

In [53]:
print(species_data.cell(row=2,column=10).value)
print(species_data.cell(row=3,column=10).value)

Fireresponse
S


### Example for one species:
Let's start checking the columns we need:

In [54]:
sp_col='A'
fireresponse_col='J'
comment_col='K'
NFRR_col='BN'
oref_col='BO'

print("%s / %s / %s / %s / %s" %
(species_data[sp_col][1].value,
species_data[fireresponse_col][1].value,
species_data[comment_col][1].value,
species_data[NFRR_col][1].value,
species_data[oref_col][1].value))

Current Scientific Name / Fireresponse / Comments on regeneration / NFRR data / Additional fire response data


Descriptions of these columns are found in the spreadsheet:
> Fire Response: S=seeder, R=resprouter. r=usually killed but sometimes resprouts, s=usually resprouts but sometimes killed; these may indicate a variable response seen by one observer, or a conflict between different observers (see comments column). When an equal number of references list the species as seeder or resprouter, this column reads as 'S/R' and details are given in the comments cloumn. Ideally fire response should be defined by mortality >70%=seeder, mortality <30%=resprouter [Gill & Bradstock, 1992].

> Comments on regeneration Notes on conflicting or variable regeneration information. May indicate response to various fire intensity levels; level of mortality seen; a variable fire response seen by one observer within the same species. Some species with distinct recordings for both resprouter and seeder have been listed seperately as 'fire sensitive/tolerant variety'.

> NFRR data: Fire response data from CSIRO National Fire Response Register, given in original format (see VA sheet for regeneration codes, Reference sheet for reference codes). Corrections made to these after checking original references in brown text: all brown (eg 5W) =species missed from reference; regen code brown (eg 5W) =corrected regen code; strikethrough (eg 5W) =species not in reference

> Additional Fire Response data: Fire response data from other references. See VA Groups sheet for regeneration codes; see References sheet for reference codes

Now select one record

In [55]:
row_index=3

print("%s / %s / %s / %s / %s" %
(species_data[sp_col][row_index].value,
species_data[fireresponse_col][row_index].value,
species_data[comment_col][row_index].value,
species_data[NFRR_col][row_index].value,
species_data[oref_col][row_index].value))

Abutilon oxycarpum / S / None / 8BD / II-95b


The code '8BD' refers to 'Regeneration category' 8 and the reference code 'BD'. Similarly, 'II-95b' can be decomposed into regeneration categories IIb and reference 95. We will need to create python dictionaries to link this information across the spreadsheets.

### Look-up table for VA groups

For the VA groups we need colums A, B and C:

In [56]:
va_groups = wb['VA Groups']
reg_cats=list()
for row in range(3,13):
    record={"NFRRcode":va_groups['A'][row].value,
    "othercode":va_groups['B'][row].value,
     "category":va_groups['C'][row].value
    }
    reg_cats.append(record)

In [57]:
reg_cats[0]

{'NFRRcode': 1,
 'othercode': 'I',
 'category': 'Killed by 100% scorch; seed storage on plant'}

Now if we need to lookup one value we can use a filter:

In [58]:
qry=1
for elem in filter(lambda x: x['NFRRcode'] == qry, reg_cats):
    print("NFRR code of %s refers to '%s'" % (qry, elem['category']))

NFRR code of 1 refers to 'Killed by 100% scorch; seed storage on plant'


In [59]:
row_index=3

spname=species_data[sp_col][row_index].value
NFRRval=species_data[NFRR_col][row_index].value
otherval=species_data[oref_col][row_index].value

We can now use regular expressions to separate the different pieces of information, for NFRR:


In [60]:
import re
print(re.findall("\d+", NFRRval))
print(re.findall("[A-Z]+", NFRRval))

['8']
['BD']


Look up the group and references:

In [61]:
for qry in re.findall("\d+", NFRRval):
    for elem in filter(lambda x: x['NFRRcode'] == int(qry), reg_cats):
        print("NFRR code of %s refers to '%s'" % (qry, elem['category']))

for qry in re.findall("[A-Z]+", NFRRval):
    for elem in filter(lambda x: x['refcode'] == qry, NFRR_refs):
        print("Reference %s refers to '%s'" % (qry, elem['refinfo']))

NFRR code of 8 refers to 'Killed by 100% scorch; seed storage unknown'
Reference BD refers to 'Benson, D. and McDougall, L. (1997). Ecology of Sydney plant species part 5: Dicotyledon families Flacourtiaceae to Myrsinaceae. Cunninghamia 5, 330-544.'


Now we can do the same for the additional references:

In [62]:
for qry in re.findall("[IVX]+", otherval):
    for elem in filter(lambda x: x['othercode'] == qry, reg_cats):
        print("NFRR code of %s refers to '%s'" % (qry, elem['category']))

for qry in re.findall("\d+", otherval):
    for elem in filter(lambda x: x['refcode'] == int(qry), other_refs):
        print("Reference %s refers to '%s'" % (qry, elem['refinfo']))

NFRR code of II refers to 'Killed by 100% scorch; seed storage in soil'
Reference 95 refers to 'Hunter, J.T. (1998) Vegetation and floristics of the Washpool National Park Western Additions / Hunter, J.T. (2000) Vegetation and floristics of Mt Conobolas State Recreation Area / Hunter, J.T. (2000) Vegetation and floristics of Burnt Down Scrub Nature Reserve. Reports to NSW NPWS. a=personal observation b=personal communication c=referenced source'


### Example with multiple references per species
Now let's pick other examples: 

In [63]:
row_index=9
spname=species_data[sp_col][row_index].value
NFRRval=species_data[NFRR_col][row_index].value

For this species the values in the NFRR column are separated by blank spaces, representing different observations, but the ones in the parenthesis are picked up incorrectly as additional group references instead of references:

In [64]:
print(spname)
print(NFRRval)
print('separated as')
print(re.findall("\d+", NFRRval))
print(re.findall("[A-Z]+", NFRRval))

Acacia aneura
8HG 8FO 8FO(1) 9FO(1) 8PL
separated as
['8', '8', '8', '1', '9', '1', '8']
['HG', 'FO', 'FO', 'FO', 'PL']


Let's first replace "FO(1)" with "FOI", and then run the code again:

In [65]:
NFRRval=species_data[NFRR_col][row_index].value.replace('FO(1)','FOI')
print(re.findall("\d+", NFRRval))
print(re.findall("[A-Z]+", NFRRval))

['8', '8', '8', '9', '8']
['HG', 'FO', 'FOI', 'FOI', 'PL']


Now, if we want a list of unique references and VA groups, we can use the function _set_.

In [66]:
for qry in set(re.findall("\d+", NFRRval)):
    for elem in filter(lambda x: x['NFRRcode'] == int(qry), reg_cats):
        print("NFRR code of %s refers to '%s'" % (qry, elem['category']))

for qry in set(re.findall("[A-Z]+", NFRRval)):
    for elem in filter(lambda x: x['refcode'] == qry, NFRR_refs):
        print("Reference %s refers to '%s'" % (qry, elem['refstring']))

NFRR code of 9 refers to 'Survives 100% scorch; resprout location unknown'
NFRR code of 8 refers to 'Killed by 100% scorch; seed storage unknown'
Reference FOI refers to 'Fox 1985'
Reference PL refers to 'Latz 1995'
Reference HG refers to 'Hodgkinson Griffin 1982'
Reference FO refers to 'Fox 1980'


Alternative processing each combination of reference and value:

In [67]:
for item in NFRRval.split(" "):
    qry = re.findall("\d+", item)[0]
    group = list(filter(lambda x: x['NFRRcode'] == int(qry), reg_cats))[0]
    qry = re.findall("[A-Z]+", item)[0]
    ref = list(filter(lambda x: x['refcode'] == qry, NFRR_refs))[0]
    print("Reference %s... reports '%s'" % (ref['refinfo'][:30], group['category']))

Reference Hodgkinson, K.C. and Griffin, ... reports 'Killed by 100% scorch; seed storage unknown'
Reference Fox, J.E.D. (1980). Effects of... reports 'Killed by 100% scorch; seed storage unknown'
Reference Fox, J.E.D. (1985). Fire in Mu... reports 'Killed by 100% scorch; seed storage unknown'
Reference Fox, J.E.D. (1985). Fire in Mu... reports 'Survives 100% scorch; resprout location unknown'
Reference Latz, P.K. (1995) Bushfires an... reports 'Killed by 100% scorch; seed storage unknown'


Here we pick a different species with multiple values in the 'additional fire response data' column:

In [68]:
row_index=25
spname=species_data[sp_col][row_index].value
otherval=species_data[oref_col][row_index].value

The values with multiple references are separated by semicolons and blanks, so they are picked up just fine. 

In [69]:
if otherval is None:
    print('No other reference')
else:
    print(otherval)
    print(re.findall("\d+", otherval))
    print(re.findall("[A-Z]+", otherval))

II-69; II-100; II-134
['69', '100', '134']
['II', 'II', 'II']


## Format records for input in database

Using the code above it is possible to:
- create records for each original reference and add them to a central "reference list" table
- take each species (row) from the spreadsheet and add records for a "resprouting" table:
    - create one record based on the "Fire response" value citing NSWFFRDv2.1 as the main source and other references as original source
    - create one or more records for each original reference using the "Regeneration category" as input value
    
We will create one record per species, using "NSWFFRDv2.1" as _main reference_, adding the reported references in the _original sources_ column, using a dictionary to translate the original value into the range of accepted values for the column, and including comments in the record.

First we will define the columns we need:

In [70]:
sp_col='A'
code_col='B'
fireresponse_col='J'
comment_col='K'
NFRR_col='BN'
oref_col='BO'

print("%s / %s / %s / %s/ %s / %s" %
(species_data[sp_col][1].value,
 species_data[code_col][1].value,
species_data[fireresponse_col][1].value,
species_data[comment_col][1].value,
species_data[NFRR_col][1].value,
species_data[oref_col][1].value))

Current Scientific Name / Species Code / Fireresponse / Comments on regeneration/ NFRR data / Additional fire response data


We will define a function to read one row and create a record:


In [71]:
def read_row_resprouting(sheet,row):
    sp_col='A'
    code_col='B'
    fireresponse_col='J'
    comment_col='K'
    NFRR_col='BN'
    oref_col='BO'

    switcher={
        "S": "None",
        "Sr": "Few",
        "S/R": "Half",
        "Rs": "Most",
        "R": "All"
    }
    
    varname=sheet[fireresponse_col][1].value
    
    spname=sheet[sp_col][row].value
    spcode=sheet[code_col][row].value
    varvalue=sheet[fireresponse_col][row].value
    origcomment=sheet[comment_col][row].value
    NFRRraw=sheet[NFRR_col][row].value
    otherraw=sheet[oref_col][row].value
    
    record={"main_source":"NSWFFRDv2.1", 
            "additional_notes":["Values reclassified following rules proposed by D. Keith et al.",
                                "Automatic extraction with python script"],
            "raw_value":[varname], 
            "original_sources":list(), 
            "original_notes":list()}

    if varvalue is not None:
        record["raw_value"].append(varvalue)
        transvalue=switcher.get(varvalue, "unknown")
        record["norm_value"]=transvalue
        if origcomment is not None:
            record["original_notes"].append(origcomment)
        if spcode is not None:
            record["species_code"]=spcode
        if spname is not None:
            record["species"]=spname
        if NFRRraw is not None:
            NFRRval=NFRRraw.replace('FO(1)','FOI')
            for qry in set(re.findall("\d+", NFRRval)):
                for elem in filter(lambda x: x['NFRRcode'] == int(qry), reg_cats):
                    record["raw_value"].append("NFRR VA group (%s): %s" % (qry, elem['category']))
            for qry in set(re.findall("[A-Z]+", NFRRval)):
                for elem in filter(lambda x: x['refcode'] == qry, NFRR_refs):
                    record["original_sources"].append(elem['refstring'])
            if sheet[NFRR_col][row].font.color is not None:
                record["original_notes"].append("NFRR record(s) might have been ammended")
            if sheet[NFRR_col][row].font.strike is not None:
                record["original_notes"].append("NFRR record(s) might have been discarded")
        if otherraw is not None:
            otherval=otherraw
            for qry in set(re.findall("[IVX]+", otherval)):
                for elem in filter(lambda x: x['othercode'] == qry, reg_cats):
                    record["raw_value"].append("Other VA group (%s): %s" % (qry, elem['category']))
            for qry in set(re.findall("\d+", otherval)):
                #record["original_sources"].append('NSWFFRD-other-ref-%s' % qry)
                for elem in filter(lambda x: x['refcode'] == qry, other_refs):
                    record["original_sources"].append(elem['refstring'])
        if len(record["original_sources"])==0:
            record.pop("original_sources")
        if len(record["original_notes"])==0:
            record.pop("original_notes")
        return(record)
    else:
        print("empty row")
        return(None)

This is an improved version of the function to split the data into a summary value and single entries for each reference:

We can combine all information into records using the code discussed above:

In [72]:
read_row_resprouting(species_data,14)

{'main_source': 'NSWFFRDv2.1',
 'additional_notes': ['Values reclassified following rules proposed by D. Keith et al.',
  'Automatic extraction with python script'],
 'raw_value': ['Fireresponse',
  'S',
  'NFRR VA group (2): Killed by 100% scorch; seed storage in soil',
  'NFRR VA group (8): Killed by 100% scorch; seed storage unknown',
  'Other VA group (VIII): Killed by 100% scorch; seed storage unknown'],
 'original_sources': ['Peter Byrne Beerwah Qld. unpub.', 'Benwell 1998'],
 'norm_value': 'None',
 'species_code': '7060',
 'species': 'Acacia baueri subsp. baueri'}

In [73]:
nswff.read_rows_resprouting(species_data,14, reg_cats, NFRR_refs, other_refs)

[{'raw_value': ['Fireresponse', 'S'],
  'main_source': 'NSWFFRDv2.1',
  'additional_notes': ['Values reclassified following rules proposed by D. Keith et al.',
   'Automatic extraction with python script'],
  'weight_notes': ['python-script import', 'default of 10 for summary value'],
  'weight': 10,
  'norm_value': 'None',
  'species_code': '7060',
  'species': 'Acacia baueri subsp. baueri'},
 {'raw_value': ['VA Group 2',
   'Killed by 100% scorch; seed storage in soil',
   'Overall value of fireresponse column is S'],
  'original_sources': ['Benwell 1998'],
  'main_source': 'NSWFFRDv2.1',
  'additional_notes': ['Values reclassified following rules proposed by D. Keith et al.',
   'Automatic extraction with python script',
   'Raw values extracted from notes/comments in NSWFFRDBv2.1'],
  'weight_notes': ['python-script import', 'default of 1'],
  'weight': 1,
  'norm_value': 'None',
  'species_code': '7060',
  'species': 'Acacia baueri subsp. baueri'},
 {'raw_value': ['VA Group 8',
  

In [74]:
species_data[fireresponse_col][3].value

'S'

## That is it for now!

✅ Job done! 😎👌🔥

You can:
- go [back home](../Instructions-and-workflow.ipynb),
- continue navigating the repo on [GitHub](https://github.com/ces-unsw-edu-au/fireveg-db-exports)
- continue exploring the repo on [OSF](https://osf.io/h96q2/).
- visit the database at <http://fireecologyplants.net>