# Read NSWFFRD 2014
Read the spreadsheet from the NSW Flora Fire Response Database and extract the hyperlinks that point to references.
We will use the openpyxl library.

In [1]:
import openpyxl
from pathlib import Path
import os

In [2]:
repodir = Path("../") 
inputdir = repodir / "data/"

## Open the workbook and read main spreadsheet
Here we will load the workbook:

In [3]:
wb = openpyxl.load_workbook(inputdir / "NSWFFRDv2.1.xlsx")

Use the sheet name to select the worksheet that holds the species data:

In [4]:
ws = wb['SpeciesData']

Let's look at the second column (Species code). Its header is in row 2:

In [17]:
ws['B2'].value

'Species Code'

We want to count the number of unique values in this column. The loop starts at row 1, so the header label in row 2 is included in the count:

In [18]:
row_count = ws.max_row
column_count = ws.max_column
j=2
unique_list = list()
unique_items = 0
for i in range(1, row_count + 1):
    item = ws.cell(row=i, column=j).value
    if item not in unique_list and item is not None:
        unique_list.append(item)
        unique_items += 1
print(unique_items)

3000
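
The same count can be obtained with a set, which avoids the repeated list lookups. The following is only a sketch: the helper name `unique_column_values`, the parameter `first_data_row` and the small in-memory workbook with made-up values are assumptions for illustration, not part of the database.

```python
from openpyxl import Workbook

def unique_column_values(ws, column, first_data_row=1):
    """Return the set of distinct non-empty values found in one column."""
    values = set()
    # values_only=True yields plain values instead of Cell objects
    for (value,) in ws.iter_rows(min_row=first_data_row, min_col=column,
                                 max_col=column, values_only=True):
        if value is not None:
            values.add(value)
    return values

# Hypothetical stand-in for the 'SpeciesData' sheet (made-up values)
demo_wb = Workbook()
demo_ws = demo_wb.active
demo_rows = [
    ("demo title", None),                       # row 1: title line
    ("Current Scientific Name", "Species Code"),  # row 2: headers
    ("Species one", 101),
    ("Species two", 102),
    ("Species two", 102),                       # repeated code
]
for row in demo_rows:
    demo_ws.append(row)

# Start at row 3 so that the header label is not counted
print(len(unique_column_values(demo_ws, column=2, first_data_row=3)))
```

Starting at `first_data_row=3` skips the header; starting at row 1 reproduces the behaviour of the loop above, header included.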


Which columns include information on traits? This prints the column names found in the second row:

In [24]:
for j in range(1, column_count + 1):
    print(ws.cell(row=2, column=j).value)


Current Scientific Name
Species Code
Legal Status
Exotic
2010 Update
Notes on Name / Synonym as used in source reference
Family
Group
Life form
Fireresponse
Comments on regeneration
Resprout location
Seed storage
Seed dispersal mechanism
Seed dispersal distance
Seed weight / size
Seed viability
Dormancy
Germination cue
Fecundity
Seed predation
Post-fire recruitment
Establishment
Post-fire flowering
Flowering time
Primary juvenile period
Secondary juvenile period
Seed set
Seed-bank developed
Fire tolerance
Life span
Seed-bank longevity
"Maturity" (from source)
"Extinction" (from source)
"Rec. min fire interval" (from source)
"Rec. max fire interval" (from source)
NC
CC
SC
NT
CT
ST
NWS
CWS
SWS
NWP
SWP
NFWP
SFWP
Distribution: extra NSW
Vegetation
Rainforest
Wet Sclerophyll Forest (Shrubby)
Wet Sclerophyll Forest (Grassy)
Grassy Woodland
Grassland
Dry Sclerophyll Forest (Shrub/Grass)
Dry Sclerophyll Forest (Shrubby)
Heathland
Alpine Complex
Freshwater Wetland
Forested Wetlands
Saline Wetla
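
Since the examples in this notebook refer to columns by number (for example 17 for 'Seed viability'), it may help to build a lookup from header label to column number. This is a minimal sketch, assuming the headers sit in row 2 as shown above; `header_index` and the demo workbook are hypothetical names introduced here.

```python
from openpyxl import Workbook

def header_index(ws, header_row=2):
    """Map each non-empty header label to its 1-based column number."""
    index = {}
    row = next(ws.iter_rows(min_row=header_row, max_row=header_row))
    for col, cell in enumerate(row, start=1):
        if cell.value is not None:
            # keep the first column if a label appears more than once
            index.setdefault(str(cell.value).strip(), col)
    return index

# Hypothetical stand-in sheet with a title row and three headers
demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws.append(("demo title",))
demo_ws.append(("Current Scientific Name", "Species Code", "Legal Status"))

columns = header_index(demo_ws)
print(columns["Species Code"])
```

With the real worksheet, `header_index(ws)["Seed viability"]` would replace the hard-coded column number.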

## Dealing with hyperlinks

The cell Q6 has a hyperlink. We can use cell rows and columns or cell name:

In [42]:
type(ws.cell(row=6, column=17).hyperlink)
# same as 
type(ws['Q6'].hyperlink)

openpyxl.worksheet.hyperlink.Hyperlink

If the cell has a hyperlink, the hyperlink object holds a "display" text and a "location" that points to a cell within the workbook:

In [26]:
ws.cell(row=6, column=17).hyperlink.display

'viability average-very good'

In [27]:
# This raises AttributeError if the cell has no hyperlink (hyperlink is None)
print(ws.cell(row=6, column=17).hyperlink.location)

References!C94


Let's see the value of this reference:

In [28]:
hlink = ws.cell(row=6, column=17).hyperlink.location
hlink = hlink.split("!")

Splitting on "!" gives the name of the target sheet and the coordinate of the target cell. The reference text is in the cell to the right of the target, so we add one to the column number.

In [29]:
ref = wb[hlink[0]]
target = ref[hlink[1]]
print("Cell value is :: " + str(target.value))
# the reference text sits one column to the right of the linked cell
nlink = ref.cell(row=target.row, column=target.col_idx + 1)
print("Reference data is :: " + str(nlink.value))


Cell value is :: 93
Reference data is :: Mortlock, W. & Lloyd, MV (Eds) (2001) Floradata - A guide to collection, storage and propogation of Australian native plant seed. AUsttralian Centre for Mining Environmental Research, Brisbane; Australian National Botanic Gardens, CSIRO Forestry and Forest Products and Greening Australia Limited, Canberra. Searchable Database February 2001. a=survey data, b=test data
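
The lookup above can be wrapped in a small function. This is a sketch under the assumption that every location has the form `Sheet!Cell` and that the reference text is always one column to the right; `resolve_reference` and the demo workbook contents are hypothetical.

```python
from openpyxl import Workbook
from openpyxl.utils.cell import coordinate_from_string, column_index_from_string

def resolve_reference(wb, location):
    """Return (key, text): value of the linked cell and of the cell to its right."""
    sheet_name, _, coordinate = location.partition("!")
    sheet_name = sheet_name.strip("'")  # sheet names containing spaces are quoted
    col_letter, row = coordinate_from_string(coordinate)
    col = column_index_from_string(col_letter)
    sheet = wb[sheet_name]
    key = sheet.cell(row=row, column=col).value
    text = sheet.cell(row=row, column=col + 1).value
    return key, text

# Hypothetical stand-in for the 'References' sheet (made-up text)
demo_wb = Workbook()
demo_refs = demo_wb.create_sheet("References")
demo_refs["C94"] = 93
demo_refs["D94"] = "Example reference text"

print(resolve_reference(demo_wb, "References!C94"))
```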


If the cell has no hyperlink, its hyperlink attribute is None:

In [30]:
type(ws.cell(row=5, column=17).hyperlink)

NoneType
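
Because the attribute is None for cells without a link, accessing `.location` directly raises an error on those cells. A guarded accessor is one way to handle both cases; the helper `hyperlink_location` and the demo cells below are hypothetical and only illustrate the check.

```python
from openpyxl import Workbook
from openpyxl.worksheet.hyperlink import Hyperlink

def hyperlink_location(cell):
    """Return the hyperlink location of a cell, or None if it has no hyperlink."""
    link = cell.hyperlink
    return link.location if link is not None else None

# Hypothetical sheet: Q6 carries an internal link, Q5 does not
demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws["Q5"] = "good"
demo_ws["Q6"] = "average-very good"
demo_ws["Q6"].hyperlink = Hyperlink(ref="Q6", location="References!C94",
                                    display="average-very good")

print(hyperlink_location(demo_ws["Q6"]))
print(hyperlink_location(demo_ws["Q5"]))
```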

In [31]:
type(ws.cell(row=5, column=17))


openpyxl.cell.cell.Cell

In [40]:
ws.cell(row=6, column=17)


<Cell 'SpeciesData'.Q6>

## Read data from a column
For a selected variable (column), we can query data for the list of species.


In [41]:
ws['Q6']

<Cell 'SpeciesData'.Q6>

In [57]:
ws.cell(row=2, column=17).value

'Seed viability'

In [60]:
ws.cell(row=6, column=17).value

'average-very good'

Example loop for querying values from one variable for all species in a range of cells:

In [32]:
# Query a small range of rows; for all species use range(3, 3089)
j = 17  # column Q, 'Seed viability'
varname = ws.cell(row=2, column=j).value
print(varname)
for i in range(13, 40):
    spname = ws.cell(row=i, column=1).value
    spcode = ws.cell(row=i, column=2).value
    varvalue = ws.cell(row=i, column=j).value
    varref = ws.cell(row=i, column=j).hyperlink
    if varvalue is not None:
        if varref is not None:
            print("%s: %s / %s / %s" % (spcode, spname, varvalue, varref.location))
        else:
            print("%s: %s / %s " % (spcode, spname, varvalue))


Seed viability
3710: Acacia baileyana / 0.96 / References!C94
3723: Acacia brownii / good / References!C94
3725: Acacia burbidgeae / 0.76 / References!C94
8242: Acacia burkittii / good 
3727: Acacia buxifolia / good / References!C94
3743: Acacia colletioides / 0.837 
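
Instead of printing, the same loop can collect its results for later use. The sketch below assumes the layout used in this notebook (name in column 1, code in column 2); the function `column_records` and the demo data are hypothetical.

```python
from openpyxl import Workbook
from openpyxl.worksheet.hyperlink import Hyperlink

def column_records(ws, column, first_row, last_row):
    """Collect code, name, value and reference location for one variable."""
    records = []
    for i in range(first_row, last_row + 1):
        cell = ws.cell(row=i, column=column)
        if cell.value is None:
            continue  # no value recorded for this species
        link = cell.hyperlink
        records.append({
            "code": ws.cell(row=i, column=2).value,
            "name": ws.cell(row=i, column=1).value,
            "value": cell.value,
            "reference": link.location if link is not None else None,
        })
    return records

# Hypothetical sheet with three species rows; the variable is in column 3
demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws.append(("Species one", 101, "good"))
demo_ws.append(("Species two", 102, None))
demo_ws.append(("Species three", 103, 0.5))
demo_ws["C1"].hyperlink = Hyperlink(ref="C1", location="References!C94")

records = column_records(demo_ws, column=3, first_row=1, last_row=3)
for rec in records:
    print(rec)
```

A list of dictionaries like this can be passed on to other tools or written to a file, which is harder to do with printed text.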


## Read data for a row
We can now do the same for a single species (row) and query values of each variable in a range. For example:

In [33]:
# Query one species (row 18) across a range of columns
i = 18
spname = ws.cell(row=i, column=1).value
spcode = ws.cell(row=i, column=2).value
print("%s: %s" % (spcode, spname))

for j in range(3, 30):
    varname = ws.cell(row=2, column=j).value
    varvalue = ws.cell(row=i, column=j).value
    varref = ws.cell(row=i, column=j).hyperlink
    if varvalue is not None:
        if varref is not None:
            print("%s: %s / %s / %s" % (j, varname, varvalue, varref.location))
        else:
            print("%s: %s / %s " % (j, varname, varvalue))


3716: Acacia binervata
7: Family / Fabaceae: Mimosoideae 
8: Group / D 
9: Life form / T 
10: Fireresponse / Sr 
11: Comments on regeneration / Resprouting form in northern tablelands (134) 
12: Resprout location / basal buds 
13: Seed storage / persistent soil 
14: Seed dispersal mechanism / a-ant / References!C56
18: Dormancy / hard seed coat 
19: Germination cue / heat / References!C94
22: Post-fire recruitment / prolific / References!C106
23: Establishment / I / References!A94
25: Flowering time / Aug-Nov 
26: Primary juvenile period / 5 / References!C36
27: Secondary juvenile period / References!C36 / References!C36
28: Seed set / 6 / References!C36
29: Seed-bank developed / 10 / References!C36
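
The row query can likewise return a dictionary keyed by variable name. This is a sketch that assumes headers in row 2 as above; `row_traits` and the demo sheet are hypothetical names used only for illustration.

```python
from openpyxl import Workbook
from openpyxl.worksheet.hyperlink import Hyperlink

def row_traits(ws, row, first_col, last_col, header_row=2):
    """Map variable name -> (value, reference location) for one species row."""
    traits = {}
    for j in range(first_col, last_col + 1):
        cell = ws.cell(row=row, column=j)
        if cell.value is None:
            continue  # skip variables without data
        name = ws.cell(row=header_row, column=j).value
        link = cell.hyperlink
        traits[name] = (cell.value, link.location if link is not None else None)
    return traits

# Hypothetical sheet: title row, header row, one species row
demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws.append(("demo title",))
demo_ws.append(("Current Scientific Name", "Species Code", "Family", "Group", "Dormancy"))
demo_ws.append(("Species one", 101, "Family one", None, "hard seed coat"))
demo_ws["E3"].hyperlink = Hyperlink(ref="E3", location="References!C94")

traits = row_traits(demo_ws, row=3, first_col=3, last_col=5)
print(traits)
```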
