# iNaturalist sensitive lists
Taxa in iNaturalist have conservation statuses that the ALA is responsible for maintaining. The process for bulk loads is to submit the data to iNaturalist in December/January using provided templates and checklists:

https://docs.google.com/spreadsheets/d/1yTwWh4d-lHeaBGCB9m70-HKEMtvrquHsPu3Zrgz9BcE/edit#gid=1531097917

Current statuses per iNaturalist taxonID are available in the iNaturalist site export, which is accessible via an iNaturalist AU site admin and is also stored in this repository (`inaturalist-australia-9-conservation_statuses.xls`).

### Suggested approach:

To update the statuses (e.g. for Qld), we need to:
1. Find the taxon name for each iNaturalist taxonID in an Australian place, because the lists have to be matched by taxon name.
2. Sort the taxa into three groups:
    * New - those on the Qld list that are not on the iNat list (the list I uploaded before had authority: `QLD DEHP` and my user id is 708886).
    * Update - those on the Qld list that need updating (probably most, because I feel we should change the authority text and try to link out to the WildNet page for each taxonID).
    * Remove - those on the iNat list that are no longer on the Qld list; I expect there will be a few of these.
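
The three groups can be derived in one pass from an outer join using pandas' `indicator=True` option. The following is a minimal sketch, not the final method: it assumes the column names `taxonname`/`taxonstatus` (iNat extract) and `scientificName`/`status` (Qld list) used later in this notebook, and that both status columns already use the same vocabulary. The `classify_changes` helper and the toy data are hypothetical.

```python
import pandas as pd

def classify_changes(inat, qld):
    """Split an outer join of the two lists into new / update / remove sets."""
    merged = inat.merge(qld, how="outer", left_on="taxonname",
                        right_on="scientificName", indicator=True)
    new = merged[merged["_merge"] == "right_only"]    # on the Qld list only
    remove = merged[merged["_merge"] == "left_only"]  # on the iNat list only
    both = merged[merged["_merge"] == "both"]
    update = both[both["taxonstatus"] != both["status"]]  # statuses disagree
    return new, update, remove

# Toy data (hypothetical taxa, for illustration only)
inat = pd.DataFrame({"taxonname": ["A a", "B b", "C c"],
                     "taxonstatus": ["Endangered", "Vulnerable", "Endangered"]})
qld = pd.DataFrame({"scientificName": ["A a", "B b", "D d"],
                    "status": ["Endangered", "Endangered", "Vulnerable"]})
new, update, remove = classify_changes(inat, qld)
```

Here `new` holds "D d", `update` holds "B b" and `remove` holds "C c".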

In [51]:
import pandas as pd
projectdir = "/Users/oco115/PycharmProjects/authoritative-lists/source-data/inaturalist-statuses/"
listdir = "/Users/oco115/PycharmProjects/authoritative-lists/current-lists/"
inatcsv = projectdir + "inaturalist-australia-9-conservation_statuses.csv"
joincsv = projectdir + "inaturalist-qld-outer-join.csv"
usercsv = projectdir + "inaturalist-qld-user-708886.csv"
apiurlbase = "https://api.inaturalist.org/v1/taxa/"


In [None]:
## Read inaturalist conservation statuses file
df = pd.read_csv(inatcsv, encoding='UTF-8')

### Extract unique authorities for each state
 * find unique authorities
 * manually determine lists for each state

In [None]:
authlist = df['authority'].unique().tolist()
qldauths = ['QLD DEHP', 'Queensland Government', 'Queensland Nature Conservation Act 1992']
# Identified Australian place names
filterlistaus = ['Australia', 'Australia Exclusive Economic Zone', 'Australian Capital Territory, AU', 'Brisbane City, Cairns - Pt B, QL, AU', 'Christmas Island', 'New South Wales, AU', 'Norfolk Island', 'Norfolk Island (Phillip Island)', 'Northern Territory, AU', 'Rottnest Island, AU', 'South Australia, AU', 'South Australia, marine waters', 'South East Queensland, QL, AU', 'Tasmania, AU', 'Victoria, AU', 'Western Australia, AU', 'Yarrabah, QL, AU', 'Queensland, AU']
filterlistqld = ['Brisbane City, Cairns - Pt B, QL, AU', 'South East Queensland, QL, AU', 'Yarrabah, QL, AU', 'Queensland, AU']

### Retrieve all Australian records

Place names are not recorded consistently, so we need to:
1. Extract records whose `place_display_name` contains 'Australia' or 'AU'.
2. Extract records whose `place_display_name` is in the manually compiled list of other Australian place names present in the data.
3. Merge the two extracts; this produces duplicates that need to be removed.
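
These three steps can be sketched as follows. This is an illustration under stated assumptions rather than the code used below: the `australian_records` helper and the toy data are hypothetical, and the pattern assumes that the 'AU' marker appears as a trailing `, AU` suffix, which may not hold for every record in the export.

```python
import pandas as pd

def australian_records(df, extra_places):
    """Combine a pattern match and a manual list, keeping each row once."""
    names = df["place_display_name"].fillna("")
    # Step 1: names mentioning Australia or ending in the ", AU" suffix
    by_pattern = df[names.str.contains(r"Australia|, AU$", regex=True)]
    # Step 2: manually identified Australian place names
    by_list = df[names.isin(extra_places)]
    # Step 3: merge the extracts and drop rows selected by both filters
    combined = pd.concat([by_pattern, by_list])
    return combined[~combined.index.duplicated()]

# Toy data (illustrative only)
df_demo = pd.DataFrame({
    "place_display_name": ["Queensland, AU", "Norfolk Island", "Australia",
                           "Texas, US", None],
    "taxon_id": [1, 2, 3, 4, 5],
})
aus_demo = australian_records(df_demo, ["Norfolk Island", "Australia"])
```

Combining the two boolean masks with `|` before indexing would avoid the duplicates altogether; the concat-and-deduplicate form is shown because it follows the steps as listed.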

In [8]:
# Boolean indexing selects the matching rows directly; .copy() avoids
# SettingWithCopyWarning when the subsets are modified later
dfaus = df[df['place_display_name'].isin(filterlistaus)].copy()  # All Australia
dfqld = df[df['place_display_name'].isin(filterlistqld)].copy()  # Qld only

### Extract Qld records for user 708886 (Peggy)

In [None]:
# Keep only the records updated by user 708886; .copy() so the casts below act on an independent frame
checkrecs = dfqld[dfqld['user_id'] == 708886].copy()
checkrecs['taxon_id'] = checkrecs['taxon_id'].astype(int)
checkrecs['user_id'] = checkrecs['user_id'].astype(int)

### Retrieve taxon information and statuses from iNaturalist API

In [None]:
rlist = []
ct = 0
dfextract = pd.DataFrame(columns=['id','taxonid','taxonname', 'taxonstatus', 'authority', 'taxonurl', 'user_id'])
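
The retrieval loop itself is not shown in this cell. The sketch below illustrates one way it could look, using only the standard library. It assumes that `GET https://api.inaturalist.org/v1/taxa/{id}` returns a JSON object with a `results` list whose taxa carry a `conservation_statuses` array with `id`, `status`, `authority`, `url` and `user_id` fields; these field names should be checked against the live API. `fetch_taxon`, `status_rows` and the sample payload are hypothetical.

```python
import json
import urllib.request

API_URL_BASE = "https://api.inaturalist.org/v1/taxa/"

def fetch_taxon(taxon_id):
    """Fetch the JSON payload for one taxon (network call, not run here)."""
    with urllib.request.urlopen(f"{API_URL_BASE}{int(taxon_id)}") as resp:
        return json.load(resp)

def status_rows(payload, user_id=None):
    """Flatten a taxa payload into one dict per conservation status."""
    rows = []
    for taxon in payload.get("results", []):
        for cs in taxon.get("conservation_statuses") or []:
            if user_id is not None and cs.get("user_id") != user_id:
                continue  # skip statuses entered by other users
            rows.append({
                "id": cs.get("id"),
                "taxonid": taxon.get("id"),
                "taxonname": taxon.get("name"),
                "taxonstatus": cs.get("status"),
                "authority": cs.get("authority"),
                "taxonurl": cs.get("url"),
                "user_id": cs.get("user_id"),
            })
    return rows

# Hand-written payload in the assumed shape (not real API output)
sample = {"results": [{
    "id": 83578, "name": "Dendrobium kingianum",
    "conservation_statuses": [
        {"id": 152478, "status": "LC", "authority": "QLD DEHP", "user_id": 708886},
        {"id": 1, "status": "VU", "authority": "Other", "user_id": 42},
    ],
}]}
rows = status_rows(sample, user_id=708886)
```

In the notebook, `status_rows(fetch_taxon(tid), user_id=708886)` would be called for each `taxon_id` in `checkrecs`, ideally with a short pause between requests, and the collected rows loaded into `dfextract`.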

### Process Qld
* Read extract csv created in previous cell run
* Retrieve ALA Qld sensitive species list
* Extract Qld records from iNat dataframe based on Qld Locations
* Create lists of taxon name for Sensitive List and iNat data, for searching
* Create dataframes of records:
   * in Qld Sensitive list and in iNat - matchdf
   * in Qld Sensitive list but not in iNat - nomatchdf

In [47]:
dfextract = pd.read_csv(usercsv)  # iNat extract for user 708886
qldsensitive = pd.read_csv(listdir + "sensitive-lists/QLD-sensitive.csv")  # Qld sensitive list
# qldinat = dfextract[dfextract['authority'].isin(qldauths)] # not all qld authority in list???

In [39]:
# Test code that checks for matches and non-matches. No longer required, but was used earlier.
# taxsearch1 = dfextract['taxonname'].tolist()  #iNat taxon
# taxsearch2 = qldsensitive['scientificName'].tolist() # Qld sensitive List taxon
# matchdf = dfextract[dfextract['taxonname'].isin(taxsearch2)]     # in Qld sensitive list and in iNat
# nomatchdf = qldsensitive[~qldsensitive['scientificName'].isin(taxsearch1)]  # in Qld Sensitive list but not on iNat

### Merge sensitive list and iNat dataframes to include all columns from both
* Take the matched rows and compare with status in sensitive list
* Merge List and iNat data frames with matching rows based on taxon

In [49]:
# taxmatch = qldinat1.merge(qldsensitive, how = 'inner', on = ['scientificName'])
taxouter =  dfextract.merge(qldsensitive, how = 'outer', left_on = 'taxonname', right_on='scientificName')
# taxouter =  dfextract.merge(qldsensitive, how = 'outer', indicator = True, left_on = 'taxonname', right_on='scientificName')
# taxouter

Unnamed: 0,id,taxonid,taxonname,taxonstatus,authority,taxonurl,user_id,taxonID,kingdom,class,family,scientificName,vernacularName,scientificNameAuthorship,sourceStatus,Significant,status,Endemicity,EPBC Status
0,152478.0,83578.0,Dendrobium kingianum,LC,QLD DEHP,https://data.qld.gov.au/dataset/conservation-s...,708886.0,22382.0,Plantae,Equisetopsida,Orchidaceae,Dendrobium kingianum,,Bidwill ex Lindl.,SL,Y,Special least concern,Queensland Endemic,
1,152527.0,370122.0,Liparis nugentiae,LC,QLD DEHP,https://data.qld.gov.au/dataset/conservation-s...,708886.0,12772.0,Plantae,Equisetopsida,Orchidaceae,Liparis nugentiae,,F.M.Bailey,C,N,Least concern,Queensland Endemic,
2,152549.0,369261.0,Calanthe triplicata,LC,QLD DEHP,https://data.qld.gov.au/dataset/conservation-s...,708886.0,14760.0,Plantae,Equisetopsida,Orchidaceae,Calanthe triplicata,christmas orchid,(Willemet) Ames,SL,Y,Special least concern,Not Endemic to Australia,
3,152563.0,83579.0,Dendrobium aemulum,LC,QLD DEHP,https://data.qld.gov.au/dataset/conservation-s...,708886.0,13280.0,Plantae,Equisetopsida,Orchidaceae,Dendrobium aemulum,ironbark orchid,R.Br.,SL,Y,Special least concern,Intranational,
4,167753.0,898148.0,Phlegmariurus verticillatus,endangered,QLD DEHP,https://data.qld.gov.au/dataset/conservation-s...,708886.0,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
991,,,,,,,,41354.0,Plantae,Equisetopsida,Thelypteridaceae,Amblovenatum tildeniae,,(Holttum) T.E.Almeida & A.R.Field,CR,Y,Critically Endangered,Queensland Endemic,
992,,,,,,,,9553.0,Plantae,Equisetopsida,Thelypteridaceae,Chingia australis,,Holttum,E,Y,Endangered,Queensland Endemic,Endangered
993,,,,,,,,11646.0,Plantae,Equisetopsida,Thelypteridaceae,Plesioneuron tuberculatum,,(Ces.) Holttum,E,Y,Endangered,Regional Endemic,Endangered
994,,,,,,,,11699.0,Plantae,Equisetopsida,Thelypteridaceae,Pneumatopteris costata,,(Brack.) Holttum,NT,Y,Near Threatened,Regional Endemic,


In [52]:
taxouter.to_csv(joincsv, index = False,encoding='utf-8-sig')

In [None]:
# taxlistfound = taxmatch['scientificName'].tolist()  # iNat taxon
# taxlistfound = dfextract['scientificName'].tolist()  # iNat taxon from whole dataset
# taxonnew = qldsensitive[~qldsensitive['scientificName'].isin(taxlistfound)]
# taxonnew.to_csv(newtaxoncsv,index = False,encoding='utf-8-sig')



### Build iNaturalist Templates - placeholder for now
Based on templates found at: https://docs.google.com/spreadsheets/d/1yTwWh4d-lHeaBGCB9m70-HKEMtvrquHsPu3Zrgz9BcE/edit#gid=1531097917


In [28]:
newtemplate = pd.DataFrame(columns=['Taxon Name','Status','Authority','IUCN equivalent','Description',
                                    'iNaturalist Place ID','url','Taxon Geoprivacy','Username','taxon_id'])
# newtemplate['Taxon Name'] = taxonnew['scientificName']
# newtemplate['Status'] = taxonnew['scientificName']
# newtemplate['Authority'] = taxonnew['scientificName']
# newtemplate['IUCN equivalent'] = taxonnew['scientificName']
# newtemplate['Description'] = taxonnew['scientificName']
# newtemplate['iNaturalist Place ID'] = taxonnew['scientificName']
# newtemplate['url'] = taxonnew['scientificName']
# newtemplate['Taxon Geoprivacy'] = taxonnew['scientificName']
# newtemplate['Username'] = taxonnew['scientificName']
# newtemplate['taxon_id'] = taxonnew['scientificName']

Unnamed: 0,taxonID,kingdom,class,family,scientificName,vernacularName,scientificNameAuthorship,sourceStatus,Significant,status,Endemicity,EPBC Status
1,1376,Animalia,Aves,Estrildidae,Chloebia gouldiae,Gouldian finch,"(Gould, 1844)",E,Y,Endangered,Intranational,Endangered
2,1378,Animalia,Aves,Estrildidae,Erythrura trichroa,blue-faced parrot-finch,"(Kittlitz, 1835)",NT,Y,Near Threatened,Not Endemic to Australia,
3,1370,Animalia,Aves,Estrildidae,Neochmia phaeton evangelinae,crimson finch (white-bellied subspecies),"(Hombron & Jacquinot, 1841)",E,Y,Endangered,Regional Endemic,Endangered
4,1365,Animalia,Aves,Estrildidae,Poephila cincta cincta,black-throated finch (white-rumped subspecies),"Gould, 1837",E,Y,Endangered,Intranational,Endangered
5,1355,Animalia,Aves,Estrildidae,Stagonopleura guttata,diamond firetail,"(Shaw, 1796)",V,Y,Vulnerable,Intranational,
...,...,...,...,...,...,...,...,...,...,...,...,...
947,41354,Plantae,Equisetopsida,Thelypteridaceae,Amblovenatum tildeniae,,(Holttum) T.E.Almeida & A.R.Field,CR,Y,Critically Endangered,Queensland Endemic,
948,9553,Plantae,Equisetopsida,Thelypteridaceae,Chingia australis,,Holttum,E,Y,Endangered,Queensland Endemic,Endangered
949,11646,Plantae,Equisetopsida,Thelypteridaceae,Plesioneuron tuberculatum,,(Ces.) Holttum,E,Y,Endangered,Regional Endemic,Endangered
950,11699,Plantae,Equisetopsida,Thelypteridaceae,Pneumatopteris costata,,(Brack.) Holttum,NT,Y,Near Threatened,Regional Endemic,


### Records for update - needs
* Set status to standard terms
* Compare status for sensitive vs iNat
* Write Update template if update required
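
The first two steps might be sketched as below. This assumes a merged frame with the iNat `taxonstatus` and Qld `status` columns produced by the outer join above. The `STANDARD_TERMS` mapping and both helpers are hypothetical, and the real list of abbreviations would need to be confirmed against both sources.

```python
import pandas as pd

# Hypothetical mapping from abbreviations to standard terms
STANDARD_TERMS = {
    "lc": "least concern", "c": "least concern",
    "nt": "near threatened",
    "v": "vulnerable", "vu": "vulnerable",
    "e": "endangered", "en": "endangered",
    "cr": "critically endangered",
}

def standardise(status):
    """Lower-case a status and expand known abbreviations."""
    if pd.isna(status):
        return None
    key = str(status).strip().lower()
    return STANDARD_TERMS.get(key, key)

def needs_update(merged):
    """Rows present in both lists whose standardised statuses differ."""
    inat = merged["taxonstatus"].map(standardise)
    qld = merged["status"].map(standardise)
    return merged[inat.notna() & qld.notna() & (inat != qld)]

# Rows modelled on the outer-join output shown earlier
merged_demo = pd.DataFrame({
    "taxonname": ["Dendrobium kingianum", "Liparis nugentiae",
                  "Phlegmariurus verticillatus", None],
    "taxonstatus": ["LC", "LC", "endangered", None],
    "status": ["Special least concern", "Least concern", None, "Endangered"],
})
to_update = needs_update(merged_demo)
```

Only *Dendrobium kingianum* is flagged here, because "least concern" and "special least concern" differ after standardisation. Rows missing from either list are left for the New and Remove templates.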

In [None]:
updatetemplate = pd.DataFrame(columns=['action', 'taxon_name', 'taxon_id', 'status', 'iucn equivalent',
                                    'authority','url', 'geoprivacy', 'place_id', 'username'])
