# Add XML nodes for austraits data

We will download data from [AusTraits](https://austraits.org/) ([pre-print](https://www.biorxiv.org/content/10.1101/2021.01.04.425314v1)) and add nodes for each species.

Let's start by loading the libraries:

In [1]:
from pathlib import Path
import os
import json
import urllib.request  # urlopen lives in the request submodule
from zipfile import ZipFile
import pandas as pd
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
import numpy as np
from xml.dom import minidom

## Species list from NSW

We can read the CAPS table (whatever that acronym stands for) from this URL, which may not be permanent:

In [2]:
CAPSurl = 'https://www.environment.nsw.gov.au/resources/wildlifelicences/CAPS.xls'
CAPS = pd.read_excel(CAPSurl, index_col=0)

In [3]:
CAPS

SpeciesCode,FamilyName,SortOrder,GenusName,SpeciesName,SubspeciesRank,SubspeciesName,ScientificName,PATNLabel,CommonName,NSWStatus,BioStatus,ExtentType,ConservationType,AdequacyType,TField,LatestTaxonCode,LatestTaxon,LatestTaxonPATNLabel
1000,Chenopodiaceae,445,Atriplex,cinerea,,,Atriplex cinerea,Atricine,Grey Saltbush,,A,,,,,1000,Atriplex cinerea,Atricine
10000,Scrophulariaceae,687,Linaria,dalmatica,,,Linaria dalmatica,Linadalm,,,I,,,,,10000,Linaria dalmatica,Linadalm
10001,Poaceae,640,Urochloa,fasciculata,var.,reticulata,Urochloa fasciculata var. reticulata,Urocfasc,,,I,,,,,10001,Urochloa fasciculata var. reticulata,Urocfasc
10002,Rutaceae,675,Zieria,smithii,subsp.,smithii,Zieria smithii subsp. smithii,Ziersmis,,,A,,,,,5847,Zieria smithii,Ziersmit
10003,Rutaceae,675,Zieria,smithii,subsp.,tomentosa,Zieria smithii subsp. tomentosa,Ziersmih,,,A,,,,,5847,Zieria smithii,Ziersmit
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ZNGB,Zingiberaceae,736,Zingiber,spp.,,,Zingiber spp.,ZNGBspp.,Ginger,,I,,,,,ZNGB,Zingiber spp.,ZNGBspp.
ZORN,Fabaceae (Faboideae),502,Zornia,spp.,,,Zornia spp.,Zorniass,,,A,,,,,ZORN,Zornia spp.,Zorniass
ZOST,Zosteraceae,737,Zostera,spp.,,,Zostera spp.,Zosteras,,,A,,,,,ZOST,Zostera spp.,Zosteras
ZOYS,Poaceae,640,Zoysia,spp.,,,Zoysia spp.,Zoysiasp,,,I,,,,,ZOYS,Zoysia spp.,Zoysiasp


In [4]:
target = CAPS[CAPS['ScientificName'] == "Actinotus helianthi"] 
target

SpeciesCode,FamilyName,SortOrder,GenusName,SpeciesName,SubspeciesRank,SubspeciesName,ScientificName,PATNLabel,CommonName,NSWStatus,BioStatus,ExtentType,ConservationType,AdequacyType,TField,LatestTaxonCode,LatestTaxon,LatestTaxonPATNLabel
1094,Apiaceae,385,Actinotus,helianthi,,,Actinotus helianthi,Actiheli,Flannel Flower,"P,",A,,,,,1094,Actinotus helianthi,Actiheli


## Read _austraits_ data 
We will download the file from the [Zenodo repository](https://zenodo.org/record/5112001) using the API URL and save it under the data folder.

In [5]:
repodir = Path("../") 
dataset = "https://zenodo.org/api/records/5112001"
outputdir = repodir / "data/austraits/"

if not os.path.isdir(outputdir):
    os.makedirs(outputdir)

We use urllib to open the URL and read the data (if the connection succeeds!):

In [6]:
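def getResponse(url):
    # urlopen itself raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(url) as operUrl:
        if operUrl.getcode() != 200:
            # fail loudly instead of returning an undefined variable
            raise RuntimeError(f"Error receiving data: {operUrl.getcode()}")
        return operUrl.read()

zrecord = getResponse(dataset)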

The response data is in JSON format, so we need to parse it:

In [7]:
jsonData = json.loads(zrecord)

In [8]:
jsonData

{'conceptdoi': '10.5281/zenodo.3568417',
 'conceptrecid': '3568417',
 'created': '2021-07-18T06:32:30.575319+00:00',
 'doi': '10.5281/zenodo.5112001',
 'files': [{'bucket': '9c997956-8254-4fcc-a17b-5fe1fd079022',
   'checksum': 'md5:cd7ba1c395b976a02fd4c3c772d88d78',
   'key': 'austraits-3.0.2.rds',
   'links': {'self': 'https://zenodo.org/api/files/9c997956-8254-4fcc-a17b-5fe1fd079022/austraits-3.0.2.rds'},
   'size': 12325324,
   'type': 'rds'},
  {'bucket': '9c997956-8254-4fcc-a17b-5fe1fd079022',
   'checksum': 'md5:ed44176eb71466fe9a4ca1773d6b5961',
   'key': 'austraits-3.0.2.zip',
   'links': {'self': 'https://zenodo.org/api/files/9c997956-8254-4fcc-a17b-5fe1fd079022/austraits-3.0.2.zip'},
   'size': 14738862,
   'type': 'zip'},
  {'bucket': '9c997956-8254-4fcc-a17b-5fe1fd079022',
   'checksum': 'md5:7047ae5b30b1727140000a4daa484722',
   'key': 'dictionary.html',
   'links': {'self': 'https://zenodo.org/api/files/9c997956-8254-4fcc-a17b-5fe1fd079022/dictionary.html'},
   'size': 1

The JSON data includes a list of files:

In [9]:
for files in jsonData['files']:
    print(files['key'])


austraits-3.0.2.rds
austraits-3.0.2.zip
dictionary.html
NEWS.md
readme.txt


We want to download the zip file with the CSV files:

In [10]:
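# index 1 is the zip archive in the file list printed above
zipentry = jsonData['files'][1]
outputfile = outputdir / zipentry['key']

if os.path.isfile(outputfile):
    print('File exists')
else:
    resp = getResponse(zipentry['links']['self'])
    # the context manager closes the file even if the write fails
    with open(outputfile, 'wb') as output:
        output.write(resp)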

File exists
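Picking the archive by position works for this record, but it depends on the order of the file list. As a minimal sketch (not part of the original notebook), the entry could instead be selected by its `type` field and the downloaded bytes compared against the `checksum` field that the record provides. The helper names `pick_file` and `checksum_ok` are hypothetical, and the record below is a toy stand-in for `jsonData`:

```python
import hashlib

def pick_file(record, file_type):
    """Return the first file entry of a Zenodo-style record with a matching 'type'."""
    for entry in record["files"]:
        if entry.get("type") == file_type:
            return entry
    raise KeyError(f"no file of type {file_type!r} in record")

def checksum_ok(payload, checksum):
    """Compare bytes against a checksum string of the form 'md5:<hexdigest>'."""
    algorithm, _, expected = checksum.partition(":")
    return hashlib.new(algorithm, payload).hexdigest() == expected

# Toy record that mimics the structure of jsonData (values are made up)
payload = b"example bytes"
record = {"files": [
    {"key": "data.rds", "type": "rds", "checksum": "md5:0"},
    {"key": "data.zip", "type": "zip",
     "checksum": "md5:" + hashlib.md5(payload).hexdigest()},
]}

entry = pick_file(record, "zip")
print(entry["key"], checksum_ok(payload, entry["checksum"]))
```

With the real record, `payload` would be the bytes returned by the download, checked before they are written to disk.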


We will read the data that we need directly from the zip file:

In [11]:
zfobj = ZipFile(outputfile)
zfobj.namelist()

['austraits-3.0.2/',
 'austraits-3.0.2/taxa.csv',
 'austraits-3.0.2/methods.csv',
 'austraits-3.0.2/definitions.yml',
 'austraits-3.0.2/build_info.md',
 'austraits-3.0.2/contributors.csv',
 'austraits-3.0.2/contexts.csv',
 'austraits-3.0.2/excluded_data.csv',
 'austraits-3.0.2/traits.csv',
 'austraits-3.0.2/taxonomic_updates.csv',
 'austraits-3.0.2/sites.csv',
 'austraits-3.0.2/sources.bib']

### Traits data

In [12]:
ATtraits = pd.read_csv(zfobj.open('austraits-3.0.2/traits.csv'),low_memory=False)

In [33]:
sppname='Actinotus helianthi'
traits=('fire_response_juvenile','fire_response','fire_cued_seeding')

ss = (ATtraits['taxon_name']==sppname) & (ATtraits['trait_name'].isin(traits))
ATtraits[ss]

Unnamed: 0,dataset_id,taxon_name,site_name,context_name,observation_id,trait_name,value,unit,date,value_type,replicates,original_name
217795,Falster_2005_2,Actinotus helianthi,Myall_Lakes,,Falster_2005_2_05,fire_response,fire_killed,,2002-09,expert_mean,,Actinotus helianthi
454654,NSWFRD_2014,Actinotus helianthi,,,NSWFRD_2014_0177,fire_response,fire_killed,,,expert_mean,,Actinotus helianthi


In [34]:
# ATtraits['trait_name'].unique()
ss = (ATtraits['trait_name']=='photosynthetic_pathway' )
ATtraits[ss]

Unnamed: 0,dataset_id,taxon_name,site_name,context_name,observation_id,trait_name,value,unit,date,value_type,replicates,original_name
175021,Cunningham_1999,Acacia binervata,Cunningham_Knights Hill_665m,,Cunningham_1999_01,photosynthetic_pathway,c3,,,expert_mean,,Acacia binervata
175037,Cunningham_1999,Acacia brachybotrya,Cunningham_Nombinnie_160m,,Cunningham_1999_02,photosynthetic_pathway,c3,,,expert_mean,,Acacia brachybotrya
175052,Cunningham_1999,Acacia rigens,Cunningham_Rankins-Springs_170m,,Cunningham_1999_03,photosynthetic_pathway,c3,,,expert_mean,,Acacia rigens
175068,Cunningham_1999,Acacia stricta,Cunningham_Narooma_25m,,Cunningham_1999_04,photosynthetic_pathway,c3,,,expert_mean,,Acacia stricta
175083,Cunningham_1999,Boronia ledifolia,Cunningham_KNP-Waratah_165m,,Cunningham_1999_05,photosynthetic_pathway,c3,,,expert_mean,,Boronia ledifolia
...,...,...,...,...,...,...,...,...,...,...,...,...
969325,Williams_2011,Austrostipa muelleri,,grows in Adelaide,Williams_2011_947,photosynthetic_pathway,c4,,,expert_mean,,Austrostipa muelleri
969334,Williams_2011,Deyeuxia minor,,grows in Adelaide,Williams_2011_948,photosynthetic_pathway,c4,,,expert_mean,,Deyeuxia minor
969360,Williams_2011,Aphanes australiana,,grows in Adelaide,Williams_2011_951,photosynthetic_pathway,c3,,,expert_mean,,Aphanes australiana
969371,Williams_2011,Lomandra filiformis,,grows in Adelaide,Williams_2011_952,photosynthetic_pathway,c3,,,expert_mean,,Lomandra filiformis


### Taxonomic data

We will read this into a pandas data frame:

In [35]:
df = pd.read_csv(zfobj.open('austraits-3.0.2/taxa.csv'))

Check the information for one species:

In [36]:
target = df[df['taxon_name'] == "Actinotus helianthi"] 
target

Unnamed: 0,taxon_name,source,acceptedNameUsageID,scientificNameAuthorship,taxonRank,taxonomicStatus,family,taxonDistribution,ccAttributionIRI,genus
1665,Actinotus helianthi,APC,https://id.biodiversity.org.au/node/apni/2895645,Labill.,Species,accepted,Apiaceae,"Qld, NSW, Vic (naturalised)",https://id.biodiversity.org.au/tree/51354547/5...,Actinotus


In [37]:
qry = target['acceptedNameUsageID'].values[0]+".json"
qry
#apniData=json.loads(qry) # error?

'https://id.biodiversity.org.au/node/apni/2895645.json'

In [38]:
apniData = json.loads(getResponse(qry))
apniData

{'treeElement': {'class': 'au.org.biodiversity.nsl.TreeElement',
  '_links': {'elementLink': 'https://id.biodiversity.org.au/tree/51631224/51242199',
   'taxonLink': 'https://id.biodiversity.org.au/node/apni/2895645',
   'parentElementLink': 'https://id.biodiversity.org.au/tree/51631224/51375672',
   'nameLink': 'https://id.biodiversity.org.au/name/apni/75081',
   'instanceLink': 'https://id.biodiversity.org.au/instance/apni/755469',
   'sourceElementLink': None},
  'tree': {'class': 'au.org.biodiversity.nsl.Tree',
   '_links': {'permalinks': [{'link': 'https://id.biodiversity.org.au/tree/apni/APC',
      'preferred': True,
      'resources': 1}]},
   'audit': None,
   'name': 'APC'},
  'simpleName': 'Actinotus helianthi',
  'namePath': 'Plantae/Charophyta/Equisetopsida/Magnoliidae/Asteranae/Apiales/Apiaceae/Actinotus/helianthi',
  'treePath': '/51209397/51209398/51209399/51210622/51236316/51241866/51242182/51375672/51242199',
  'displayHtml': '<data><scientific><name data-id=\'75081\'

## Dormancy type

In [46]:
#ATtraits['trait_name'].unique()
ss = (ATtraits['trait_name']=='dormancy_type' )
ATtraits[ss]

Unnamed: 0,dataset_id,taxon_name,site_name,context_name,observation_id,trait_name,value,unit,date,value_type,replicates,original_name
479894,Ooi_2007,Acacia binervata,Fredericktown,,Ooi_2007_00001,dormancy_type,physical_dormancy,,1978-11-21,expert_mean,,Acacia binervata
481466,Ooi_2007,Angophora bakeri,Agnes Banks to Castlereagh,,Ooi_2007_01580,dormancy_type,non_dormant,,1977-02-18,expert_mean,,Angophora bakeri
488008,Ooi_2007,Isopogon anemonifolius,Cordeaux Cataract Catchment,,Ooi_2007_08122,dormancy_type,physiological_dormancy,,1975-08-12,expert_mean,,Isopogon anemonifolius
489445,Ooi_2007,Bulbine bulbosa,Blacktown,,Ooi_2007_09558,dormancy_type,morphophysiological_dormancy,,1974-12-04,expert_mean,,Bulbine bulbosa
489745,Ooi_2007,Calotis cuneifolia,Ashford,,Ooi_2007_09858,dormancy_type,non_dormant physiological_dormancy,,1978-11-25,expert_mean,,Calotis cuneifolia


## Start XML file here

In [47]:
frdbCode='test'
frdbVersion='0.1'
frdbDate='2021-09-29'
sppname='Actinotus helianthi'


# write xml file
xml_dir = repodir / "xml"
if not os.path.isdir(xml_dir):
    os.makedirs(xml_dir)

file_name = xml_dir / sppname.replace(" ","_").replace(".","_").replace("/","_")
xml_file = file_name.with_suffix(".xml")

In [393]:
if os.path.isfile(xml_file):
    print('File exists')
else:
    root = ET.Element("SpeciesList")
    spp = ET.SubElement(root, "Species",code=frdbCode,version=frdbVersion,
                    update=frdbDate)
    ET.SubElement(spp, "Name").text = sppname
    ET.SubElement(spp, "Nomenclature")
    ET.SubElement(spp, "ImportedTraits")
    xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent="   ")
    with open(xml_file,"w") as f:
        f.write(xmlstr) #xmlstr.encode('utf-8')

In [394]:
tree = ET.parse(xml_file)
root = tree.getroot()

In [395]:
for spp in root.iter('Species'):
    impTraits = spp.find('ImportedTraits')

In [405]:
# collect the ids of the trait nodes already present in the file
traitIds=[]
for trait in impTraits.iter('trait'):
    traitIds.append(trait.get('id'))
set(traitIds)

{'Falster_2005_2_05', 'NSWFRD_2014_0177'}

In [397]:
traits=('fire_response_juvenile','fire_response','fire_cued_seeding')
ss = (ATtraits['taxon_name']==sppname) & (ATtraits['trait_name'].isin(traits))
tgtTraits = ATtraits[ss]
tgtTraits

Unnamed: 0,dataset_id,taxon_name,site_name,context_name,observation_id,trait_name,value,unit,date,value_type,replicates,original_name
217795,Falster_2005_2,Actinotus helianthi,Myall_Lakes,,Falster_2005_2_05,fire_response,fire_killed,,2002-09,expert_mean,,Actinotus helianthi
454654,NSWFRD_2014,Actinotus helianthi,,,NSWFRD_2014_0177,fire_response,fire_killed,,,expert_mean,,Actinotus helianthi


A function to add trait nodes based on the _austraits_ traits table:

In [398]:
def addTraitNode(node,record):
    # version is still a placeholder
    item=ET.SubElement(node,'trait',source='AusTraits',version='???',id=record['observation_id'],
                       name_used=record['original_name'])
    ET.SubElement(item,'name',type=record['value_type']).text=record['trait_name']
    # pd.notna accepts NaN (empty cell) as well as strings; np.isnan raises on strings
    if pd.notna(record['unit']):
        ET.SubElement(item,'value',unit=str(record['unit'])).text=str(record['value'])
    else:
        ET.SubElement(item,'value').text=str(record['value'])
    if pd.notna(record['replicates']):
        print('replicates')
    ET.SubElement(item,'dataset').text=record['dataset_id']
    # empty cells are read as NaN (a float), so only non-empty strings are written
    if isinstance(record['site_name'],str) and record['site_name'] != '':
        ET.SubElement(item,'site').text=record['site_name']
    if isinstance(record['date'],str) and record['date'] != '':
        ET.SubElement(item,'date').text=record['date']
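Optional fields such as `unit`, `replicates`, `site_name` and `date` need some care: pandas reads an empty cell as a float NaN, while a filled text cell is a string, so a missing-value check has to cope with both types. The sketch below uses only the standard library; `is_missing` is a hypothetical helper, not something from the notebook. It also shows why `~` is risky as a "not" on plain Python booleans, where it is a bitwise operator.

```python
import math

def is_missing(value):
    """True for None or a float NaN; strings and other values count as present."""
    return value is None or (isinstance(value, float) and math.isnan(value))

# NaN is missing, a unit string is present, and an empty string is still a string
print(is_missing(float("nan")), is_missing("mm"), is_missing(""))
# → True False False

# On plain Python bools, ~ is bitwise: both results are non-zero, hence truthy
print(~True, ~False)
# → -2 -1
```

Using `not` (or `pd.notna` on values taken from a data frame) avoids both pitfalls.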

In [406]:
for index, row in tgtTraits.iterrows():
    if row['observation_id'] in set(traitIds):
        print('skipping existing trait record')
    else:
        addTraitNode(impTraits,row)

skipping existing trait record
skipping existing trait record
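Both records are skipped here because the XML file already holds them from an earlier run. The same skip-if-present idea can be sketched without pandas, using a plain dictionary in place of a data frame row. The function name `add_trait_once` is hypothetical and the node is reduced to three fields for brevity; the record values are taken from the table above.

```python
import xml.etree.ElementTree as ET

def add_trait_once(parent, record):
    """Append a <trait> node unless one with the same id already exists."""
    existing = {t.get("id") for t in parent.iter("trait")}
    if record["observation_id"] in existing:
        return False  # already imported, nothing to do
    item = ET.SubElement(parent, "trait", source="AusTraits",
                         id=record["observation_id"])
    ET.SubElement(item, "name").text = record["trait_name"]
    ET.SubElement(item, "value").text = record["value"]
    return True

imported = ET.Element("ImportedTraits")
record = {"observation_id": "NSWFRD_2014_0177",
          "trait_name": "fire_response",
          "value": "fire_killed"}

# The second call finds the id and leaves the tree untouched
results = [add_trait_once(imported, record) for _ in range(2)]
print(results, len(imported))
# → [True, False] 1
```

Checking the ids inside the function keeps repeated runs from duplicating records, at the cost of rescanning the existing nodes on every call.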


In [407]:
list(impTraits)


[<Element 'trait' at 0x7fc420cd2040>, <Element 'trait' at 0x7fc41a60fa40>]

In [408]:
xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent="   ")

In [409]:
with open(xml_file,"w") as f:
    f.write(xmlstr) #xmlstr.encode('utf-8')
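One thing to watch when a file is parsed, modified and pretty-printed again: `minidom`'s `toprettyxml` keeps the whitespace text nodes from the previous run and indents them again, so blank lines can accumulate over repeated round trips. A possible alternative, assuming Python 3.9 or later, is `ET.indent`, which replaces whitespace-only text instead of adding to it. This is a sketch rather than the notebook's method, and `pretty_xml` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def pretty_xml(root, space="   "):
    """Indent the tree in place and serialise it to a string."""
    ET.indent(root, space=space)  # requires Python 3.9+
    return ET.tostring(root, encoding="unicode")

# Same skeleton as the species file built earlier
root = ET.Element("SpeciesList")
spp = ET.SubElement(root, "Species", code="test", version="0.1")
ET.SubElement(spp, "Name").text = "Actinotus helianthi"
ET.SubElement(spp, "ImportedTraits")

first = pretty_xml(root)
# Parse the pretty-printed text and print it again: the result should not grow
second = pretty_xml(ET.fromstring(first))
print(first == second)
```

Note that `ET.tostring` with `encoding="unicode"` does not emit an XML declaration, so one would have to be written separately if the downstream tools expect it.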