# iNaturalist sensitive lists
Taxa in iNaturalist have conservation statuses that the ALA is responsible for maintaining. The process for bulk loads is to submit the data to iNaturalist in December/January using provided templates and checklists:

https://docs.google.com/spreadsheets/d/1yTwWh4d-lHeaBGCB9m70-HKEMtvrquHsPu3Zrgz9BcE/edit#gid=1531097917

Current statuses per iNaturalist taxonID are available in the iNaturalist site export, which an iNaturalist AU site admin can access and which is also stored in this repository (inaturalist-australia-9-conservation_statuses.xls).

### Suggested approach:

To update the statuses (e.g. for Qld), we need to:
1. Find the taxon name for each iNaturalist taxonID in an Australian place, because the lists have to be matched by taxon name.
2. Work out which records are:
    * New - on the Qld list but not on the iNat list (the list I uploaded before had authority `QLD DEHP`, and my user id is 708886).
    * Update - on both lists but needing an update (probably most of them, because I feel we should change the authority text and try to link out to the WildNet page for each taxonID).
    * Remove - on the iNat list but no longer on the Qld list. I expect there will be a few of these.
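
The three categories above map onto an outer merge on taxon name. The following is only a minimal sketch under stated assumptions: the helper name `classify_changes` and the two small demo frames are made up for illustration, and the column names `taxonname` and `scientificName` follow the ones used later in this notebook. Rows labelled `update` are candidates only; whether they really change still depends on comparing statuses.

```python
import pandas as pd

def classify_changes(inat: pd.DataFrame, qld: pd.DataFrame) -> pd.DataFrame:
    """Label each taxon as new, update or remove (hypothetical helper)."""
    merged = inat.merge(qld, how="outer", left_on="taxonname",
                        right_on="scientificName", indicator=True)
    # _merge tells us which side(s) each row came from
    labels = {"right_only": "new", "both": "update", "left_only": "remove"}
    merged["action"] = merged["_merge"].astype(str).map(labels)
    # Use whichever name column is populated
    merged["name"] = merged["taxonname"].fillna(merged["scientificName"])
    return merged[["name", "action"]]

# Illustrative rows only
inat_demo = pd.DataFrame({"taxonname": ["Dendrobium kingianum", "Phlegmariurus verticillatus"]})
qld_demo = pd.DataFrame({"scientificName": ["Dendrobium kingianum", "Chingia australis"]})
result = classify_changes(inat_demo, qld_demo)
print(result)
```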

In [51]:
import pandas as pd
import requests
import json
projectdir = "/Users/oco115/PycharmProjects/authoritative-lists/source-data/inaturalist-statuses/"
listdir = "/Users/oco115/PycharmProjects/authoritative-lists/current-lists/"
inatcsv = projectdir + "inaturalist-australia-9-conservation_statuses.csv"
joincsv = projectdir + "inaturalist-qld-outer-join.csv"
usercsv = projectdir + "inaturalist-qld-user-708886.csv"
apiurlbase = "https://api.inaturalist.org/v1/taxa/"

# matchtaxoncsv = projectdir + "/inaturalist-qld-match.csv"
# newtaxoncsv = projectdir + "inaturalist-qld-newtaxon.csv"

In [None]:
## Read inaturalist conservation statuses file
df = pd.read_csv(inatcsv, encoding='UTF-8')

### Extract unique authorities for each state
 * find unique authorities
 * manually determine lists for each state
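
To make the manual step easier, the unique authorities can be grouped by place name first. This is a sketch on a hand-made frame, not the real export; only the column names `authority` and `place_display_name` come from the data used in this notebook, and `auth_by_place` is a name chosen for illustration.

```python
import pandas as pd

# Illustrative rows only; the real frame is read from the site export
demo = pd.DataFrame({
    "authority": ["QLD DEHP", "Queensland Government", "QLD DEHP", "SA DEWNR", None],
    "place_display_name": ["Queensland, AU", "Queensland, AU", "Queensland, AU",
                           "South Australia, AU", "Queensland, AU"],
})

# One sorted list of distinct authorities per place, ignoring missing values
auth_by_place = {
    place: sorted(group["authority"].dropna().unique())
    for place, group in demo.groupby("place_display_name")
}
print(auth_by_place)
```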

In [None]:
authlist = df['authority'].unique().tolist()
# print(authlist)
qldauths = ['QLD DEHP', 'Queensland Government', 'Queensland Nature Conservation Act 1992']
# nswlocs = ['NSW Office of Environment & Heritage']
# actlocs = ['ACT Government']
# viclocs = ['VIC Government', 'Victoria Flora and Fauna Guarantee Act 1988', 'Victoria Flora and Fauna Guarantee Act 1988 ']
# salocs = ['SA DEWNR']
# walocs = ['WA Department of Environment and Convservation']
# ntlocs = ['NT NRETAS']

### Retrieve all Australian records

Place names are not recorded consistently, so we need to:
1. Extract records whose place_display_name contains 'Australia' or 'AU'.
2. Extract records whose place_display_name is in a manually compiled list of other Australian place names present in the data.
3. Merge the two extracts and remove the resulting duplicates.
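
The three steps above could be sketched as follows. The demo frame and the regular expression are assumptions made for illustration: matching `, AU` only at the end of the name is one way to avoid accidental hits on other strings containing "AU". The cells that follow use explicit place-name lists instead.

```python
import pandas as pd

# Illustrative rows only
demo = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "place_display_name": ["Queensland, AU", "Christmas Island", "Australia", "New Zealand"],
})
other_places = ["Christmas Island", "Norfolk Island", "Australia"]  # manually identified names

names = demo["place_display_name"].fillna("")
# Step 1: names containing 'Australia' or ending in ', AU'
by_pattern = demo[names.str.contains(r"Australia|, AU$", regex=True)]
# Step 2: names from the manually identified list
by_list = demo[names.isin(other_places)]
# Step 3: merge the two extracts and drop rows that were selected twice
aus = pd.concat([by_pattern, by_list]).drop_duplicates(subset="id").sort_index()
print(aus)
```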

In [3]:
# Identified Australian place names
filterlistaus = ['Australia', 'Australia Exclusive Economic Zone', 'Australian Capital Territory, AU', 'Brisbane City, Cairns - Pt B, QL, AU', 'Christmas Island', 'New South Wales, AU', 'Norfolk Island', 'Norfolk Island (Phillip Island)', 'Northern Territory, AU', 'Rottnest Island, AU', 'South Australia, AU', 'South Australia, marine waters', 'South East Queensland, QL, AU', 'Tasmania, AU', 'Victoria, AU', 'Western Australia, AU', 'Yarrabah, QL, AU', 'Queensland, AU']
filterlistqld = ['Brisbane City, Cairns - Pt B, QL, AU', 'South East Queensland, QL, AU', 'Yarrabah, QL, AU', 'Queensland, AU']

In [8]:
# Select rows directly with a boolean mask on place_display_name
dfaus = df[df['place_display_name'].isin(filterlistaus)]  # All Australia
dfqld = df[df['place_display_name'].isin(filterlistqld)]  # Qld only


In [5]:
dfaus

Unnamed: 0,id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,iucn,created_at,updated_at,place_name,place_display_name
248,166449,38493.0,1138587.0,7830.0,,Flora and Fauna Guarantee Act 1988,CR,,,obscured,50,2021-03-01 10:35:01.317401,2021-03-01 10:35:01.317401,Victoria,"Victoria, AU"
352,234788,918383.0,702203.0,9994.0,,Atlas of Living Australia,NT,https://bie.ala.org.au/species/https://id.biod...,,,20,2022-01-08 03:30:36.078473,2022-01-08 03:30:36.078473,Northern Territory,"Northern Territory, AU"
381,234789,918383.0,702203.0,7308.0,,Atlas of Living Australia,LC,https://bie.ala.org.au/species/https://id.biod...,,,10,2022-01-08 03:30:36.143044,2022-01-08 03:30:36.143044,Queensland,"Queensland, AU"
457,166416,1033183.0,3669610.0,6825.0,,NSW Office of Environment & Heritage,EN,https://www.environment.nsw.gov.au/threateneds...,,obscured,40,2021-02-22 07:22:28.46345,2021-02-22 07:23:11.418318,New South Wales,"New South Wales, AU"
458,180721,1247288.0,222137.0,6825.0,,NSW Threatened Species Scientific Committee,vu,https://www.environment.nsw.gov.au/topics/anim...,,obscured,30,2021-08-27 06:18:35.700055,2021-08-27 06:18:35.700055,New South Wales,"New South Wales, AU"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
253417,268967,370476.0,3669610.0,7308.0,,Queensland Nature Conservation Act 1992,VU,https://apps.des.qld.gov.au/species-search/det...,,open,30,2022-12-01 01:29:17.802611,2022-12-01 01:29:17.802611,Queensland,"Queensland, AU"
253418,268968,370476.0,3669610.0,6825.0,,New South Wales Office of Environment and Heri...,VU,https://www.environment.nsw.gov.au/threateneds...,,open,30,2022-12-01 01:30:38.793061,2022-12-01 01:30:38.793061,New South Wales,"New South Wales, AU"
253437,268871,960479.0,1138587.0,6744.0,,Environment Protection and Biodiversity Conser...,EN,http://www.environment.gov.au/cgi-bin/sprat/pu...,,obscured,40,2022-11-25 09:34:42.314303,2022-11-25 09:34:42.314303,Australia,Australia
253450,268880,1429513.0,708886.0,7308.0,16653.0,QLD DEHP,endangered,https://data.qld.gov.au/dataset/conservation-s...,,obscured,40,2022-11-27 06:12:40.930242,2022-11-27 06:12:40.930242,Queensland,"Queensland, AU"


In [9]:
dfqld

Unnamed: 0,id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,iucn,created_at,updated_at,place_name,place_display_name
381,234789,918383.0,702203.0,7308.0,,Atlas of Living Australia,LC,https://bie.ala.org.au/species/https://id.biod...,,,10,2022-01-08 03:30:36.143044,2022-01-08 03:30:36.143044,Queensland,"Queensland, AU"
510,223427,1255510.0,3669610.0,7308.0,,Queensland Nature Conservation Act 1992,VU,https://apps.des.qld.gov.au/species-search/det...,,,30,2021-10-18 22:35:58.066769,2021-10-18 22:35:58.066769,Queensland,"Queensland, AU"
776,164339,577809.0,58320.0,7308.0,,Queensland Nature Conservation Act 1992,Near threatened,https://apps.des.qld.gov.au/species-search/det...,,,20,2020-11-27 08:28:40.943012,2021-03-29 02:05:31.717312,Queensland,"Queensland, AU"
790,180872,1255393.0,702203.0,7308.0,,Atlas of Living Australia,NT,https://bie.ala.org.au/species/https://id.biod...,,,20,2021-09-10 01:07:13.003067,2021-09-10 01:07:13.003067,Queensland,"Queensland, AU"
890,169815,334758.0,58320.0,7308.0,,Queensland Nature Conservation Act 1992,Vulnerable,https://apps.des.qld.gov.au/species-search/det...,,,30,2021-07-07 22:26:12.687826,2021-07-07 22:26:12.687826,Queensland,"Queensland, AU"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
252711,152662,796558.0,708886.0,7308.0,16653.0,QLD DEHP,Endangered,https://data.qld.gov.au/dataset/conservation-s...,,obscured,40,2019-07-23 00:08:26.487819,2022-06-14 18:46:14.342413,Queensland,"Queensland, AU"
253258,264341,208164.0,3669610.0,7308.0,,,CR,https://apps.des.qld.gov.au/species-search/det...,,open,50,2022-05-22 04:03:22.228283,2022-10-25 08:01:36.194252,Queensland,"Queensland, AU"
253399,152813,321109.0,708886.0,7308.0,16653.0,QLD DEHP,vulnerable,https://data.qld.gov.au/dataset/conservation-s...,,obscured,30,2019-07-23 00:09:03.514996,2022-11-29 18:55:43.920264,Queensland,"Queensland, AU"
253417,268967,370476.0,3669610.0,7308.0,,Queensland Nature Conservation Act 1992,VU,https://apps.des.qld.gov.au/species-search/det...,,open,30,2022-12-01 01:29:17.802611,2022-12-01 01:29:17.802611,Queensland,"Queensland, AU"


### Extract records for User 708886 (Peggy)


In [10]:
# .copy() makes checkrecs an independent frame, which avoids pandas' SettingWithCopyWarning below
checkrecs = dfqld[dfqld['user_id'] == 708886].copy()  # for output of only those updated by user 708886
# checkrecs = dfana     # for output of full merge of Qld
checkrecs['taxon_id'] = checkrecs['taxon_id'].astype(int)
checkrecs['user_id'] = checkrecs['user_id'].astype(int)


In [12]:
rlist = []
ct = 0
dfextract = pd.DataFrame(columns=['id','taxonid','taxonname', 'taxonstatus', 'authority', 'taxonurl', 'user_id'])   # create empty dataframe with columns

In [None]:
# Retrieve taxon information and statuses from iNaturalist API

In [13]:

for ind in checkrecs.index:
    # print('record count is: ', ct, 'taxonid is: ', taxonid)
    print(checkrecs['taxon_id'][ind], checkrecs['authority'][ind])
    print('record count is: ', ct, 'taxonid is: ', checkrecs['taxon_id'][ind], 'authority is: ', checkrecs['authority'][ind])
    apiurl = apiurlbase + str(checkrecs['taxon_id'][ind])
    response = requests.get(apiurl, timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of failing on a missing key later
    rlist.append(response.json())
    numstatus = len(rlist[ct]['results'][0]['conservation_statuses'])
    # taxonpname = rlist[ct]['results'][0]['preferred_common_name'] # This field is not always available
    taxonid = checkrecs['taxon_id'][ind]
    inatid = checkrecs['id'][ind]
    authority = checkrecs['authority'][ind]
    userid = checkrecs['user_id'][ind]
    taxonname = rlist[ct]['results'][0]['name']
    # Loop through the conservation statuses in the JSON record.
    # Note: each species can have several status records, so select the one whose 'authority' matches the authority in the input dataset
    # and append it to the final dataframe
    for i in range(numstatus):
        if rlist[ct]['results'][0]['conservation_statuses'][i]['authority'] == checkrecs['authority'][ind]:
            taxonstatus = rlist[ct]['results'][0]['conservation_statuses'][i]['status']
            taxonurl = rlist[ct]['results'][0]['conservation_statuses'][i]['url']
            taxonlist = [inatid, taxonid, taxonname, taxonstatus, authority, taxonurl,userid]
            dfextract.loc[len(dfextract)] = taxonlist
            break

    ct += 1
# Write dataframe to csv for checking and future use
dfextract.to_csv(usercsv,index = False,encoding='utf-8-sig')

83578 QLD DEHP
record count is:  0 taxonid is:  83578 authority is:  QLD DEHP
370122 QLD DEHP
record count is:  1 taxonid is:  370122 authority is:  QLD DEHP
369261 QLD DEHP
record count is:  2 taxonid is:  369261 authority is:  QLD DEHP
83579 QLD DEHP
record count is:  3 taxonid is:  83579 authority is:  QLD DEHP
898148 QLD DEHP
record count is:  4 taxonid is:  898148 authority is:  QLD DEHP
135897 QLD DEHP
record count is:  5 taxonid is:  135897 authority is:  QLD DEHP
148240 QLD DEHP
record count is:  6 taxonid is:  148240 authority is:  QLD DEHP
140454 Queensland Government
record count is:  7 taxonid is:  140454 authority is:  Queensland Government
82321 Queensland Government
record count is:  8 taxonid is:  82321 authority is:  Queensland Government
83522 QLD DEHP
record count is:  9 taxonid is:  83522 authority is:  QLD DEHP
369235 Queensland Nature Conservation Act 1992
record count is:  10 taxonid is:  369235 authority is:  Queensland Nature Conservation Act 1992
332220 Queens
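
The lookup loop above indexes `results[0]` directly, so an empty response stops the run, and a record with no matching authority is skipped without notice. A minimal sketch of a more defensive extraction step is shown here. `pick_status` is a hypothetical helper, and the sample payload is hand-written to mimic only the fields the loop reads (`results`, `name`, `conservation_statuses`, `authority`, `status`, `url`); its values are placeholders, not API output.

```python
def pick_status(payload: dict, authority: str):
    """Return (name, status, url) for the first conservation status whose
    authority matches, or None when nothing matches (hypothetical helper)."""
    results = payload.get("results") or []
    if not results:
        return None
    taxon = results[0]
    for cs in taxon.get("conservation_statuses") or []:
        if cs.get("authority") == authority:
            return taxon.get("name"), cs.get("status"), cs.get("url")
    return None

# Hand-written payload with placeholder values
sample = {"results": [{"name": "Dendrobium kingianum", "conservation_statuses": [
    {"authority": "Some other authority", "status": "lc", "url": None},
    {"authority": "QLD DEHP", "status": "LC", "url": "https://example.org/status"},
]}]}
print(pick_status(sample, "QLD DEHP"))
print(pick_status({"results": []}, "QLD DEHP"))
```

Returning `None` lets the caller log unmatched records instead of dropping them.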

In [46]:
# For Qld and user 708886
dfextract = pd.read_csv(usercsv)  # iNat extract for user 708886 written above

### Process Qld
* Retrieve ALA Qld sensitive species list
* Extract Qld records from iNat dataframe based on Qld Locations
* Create lists of taxon name for Sensitive List and iNat data, for searching
* Create dataframes of records:
   * in Qld Sensitive list and in iNat - matchdf
   * in Qld Sensitive list but not in iNat - nomatchdf
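
Exact string matching on taxon names is sensitive to stray whitespace and capitalisation. The sketch below normalises both sides before comparing. The `norm` helper and its rule (collapse whitespace, casefold) are assumptions, not something the source lists are known to need, and the demo frames are illustrative.

```python
import pandas as pd

def norm(names: pd.Series) -> pd.Series:
    """Collapse whitespace and casefold names before comparing (assumed rule)."""
    return names.fillna("").str.split().str.join(" ").str.casefold()

# Illustrative rows only, with deliberate spacing and case differences
inat_demo = pd.DataFrame({"taxonname": ["Dendrobium  kingianum", "Phlegmariurus verticillatus"]})
list_demo = pd.DataFrame({"scientificName": ["dendrobium kingianum ", "Chingia australis"]})

inat_keys = norm(inat_demo["taxonname"])
list_keys = norm(list_demo["scientificName"])

matchdf_demo = inat_demo[inat_keys.isin(list_keys)]      # in both lists
nomatchdf_demo = list_demo[~list_keys.isin(inat_keys)]   # in sensitive list only
print(matchdf_demo)
print(nomatchdf_demo)
```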

In [47]:
qldsensitive = pd.read_csv(listdir + "sensitive-lists/QLD-sensitive.csv")  # Qld sensitive list
# qldinat = dfextract[dfextract['authority'].isin(qldauths)] # not all qld authority in list???

In [39]:
taxsearch1 = dfextract['taxonname'].tolist()  #iNat taxon
taxsearch2 = qldsensitive['scientificName'].tolist() # Qld sensitive List taxon
matchdf = dfextract[dfextract['taxonname'].isin(taxsearch2)]     # in Qld sensitive list and in iNat
nomatchdf = qldsensitive[~qldsensitive['scientificName'].isin(taxsearch1)]  # in Qld Sensitive list but not on iNat

### Merge sensitive list and iNat dataframes to include all columns from both
* Take the matched rows and compare with status in sensitive list
* Merge List and iNat data frames with matching rows based on taxon

In [49]:
# taxmatch = qldinat1.merge(qldsensitive, how = 'inner', on = ['scientificName'])
taxouter =  dfextract.merge(qldsensitive, how = 'outer', left_on = 'taxonname', right_on='scientificName')
# taxouter =  dfextract.merge(qldsensitive, how = 'outer', indicator = True, left_on = 'taxonname', right_on='scientificName')
taxouter

Unnamed: 0,id,taxonid,taxonname,taxonstatus,authority,taxonurl,user_id,taxonID,kingdom,class,family,scientificName,vernacularName,scientificNameAuthorship,sourceStatus,Significant,status,Endemicity,EPBC Status
0,152478.0,83578.0,Dendrobium kingianum,LC,QLD DEHP,https://data.qld.gov.au/dataset/conservation-s...,708886.0,22382.0,Plantae,Equisetopsida,Orchidaceae,Dendrobium kingianum,,Bidwill ex Lindl.,SL,Y,Special least concern,Queensland Endemic,
1,152527.0,370122.0,Liparis nugentiae,LC,QLD DEHP,https://data.qld.gov.au/dataset/conservation-s...,708886.0,12772.0,Plantae,Equisetopsida,Orchidaceae,Liparis nugentiae,,F.M.Bailey,C,N,Least concern,Queensland Endemic,
2,152549.0,369261.0,Calanthe triplicata,LC,QLD DEHP,https://data.qld.gov.au/dataset/conservation-s...,708886.0,14760.0,Plantae,Equisetopsida,Orchidaceae,Calanthe triplicata,christmas orchid,(Willemet) Ames,SL,Y,Special least concern,Not Endemic to Australia,
3,152563.0,83579.0,Dendrobium aemulum,LC,QLD DEHP,https://data.qld.gov.au/dataset/conservation-s...,708886.0,13280.0,Plantae,Equisetopsida,Orchidaceae,Dendrobium aemulum,ironbark orchid,R.Br.,SL,Y,Special least concern,Intranational,
4,167753.0,898148.0,Phlegmariurus verticillatus,endangered,QLD DEHP,https://data.qld.gov.au/dataset/conservation-s...,708886.0,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
991,,,,,,,,41354.0,Plantae,Equisetopsida,Thelypteridaceae,Amblovenatum tildeniae,,(Holttum) T.E.Almeida & A.R.Field,CR,Y,Critically Endangered,Queensland Endemic,
992,,,,,,,,9553.0,Plantae,Equisetopsida,Thelypteridaceae,Chingia australis,,Holttum,E,Y,Endangered,Queensland Endemic,Endangered
993,,,,,,,,11646.0,Plantae,Equisetopsida,Thelypteridaceae,Plesioneuron tuberculatum,,(Ces.) Holttum,E,Y,Endangered,Regional Endemic,Endangered
994,,,,,,,,11699.0,Plantae,Equisetopsida,Thelypteridaceae,Pneumatopteris costata,,(Brack.) Holttum,NT,Y,Near Threatened,Regional Endemic,


In [52]:
taxouter.to_csv(joincsv, index = False,encoding='utf-8-sig')

### New records for iNat - taxon in Sensitive list but not in iNat

In [None]:
# taxlistfound = taxmatch['scientificName'].tolist()  # iNat taxon
# taxlistfound = dfextract['scientificName'].tolist()  # iNat taxon from whole dataset
# taxonnew = qldsensitive[~qldsensitive['scientificName'].isin(taxlistfound)]
# taxonnew.to_csv(newtaxoncsv,index = False,encoding='utf-8-sig')



## Build iNaturalist Templates - placeholder for now
Based on templates found at: https://docs.google.com/spreadsheets/d/1yTwWh4d-lHeaBGCB9m70-HKEMtvrquHsPu3Zrgz9BcE/edit#gid=1531097917


# New Records
* Write New template if update required

**Question:** How do we know the taxon_id and iNaturalist Place ID when these are new records?
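
One possible answer, offered as a sketch rather than a confirmed workflow: the place ID for an existing place can be read from the `place_id` column of the site export (the Queensland rows shown earlier carry 7308), and a taxon ID might be found by searching the iNaturalist taxa endpoint by name. The helper names below are hypothetical, the use of the `q` search parameter is an assumption to check against the API documentation, and the sample payload with its ids is made up. The standard library is used here so the sketch has no dependencies.

```python
import json
import urllib.parse
import urllib.request

TAXA_URL = "https://api.inaturalist.org/v1/taxa"  # apiurlbase without the trailing slash

def exact_taxon_id(payload: dict, name: str):
    """Return the id of the first result whose name equals `name`, else None."""
    for taxon in payload.get("results") or []:
        if (taxon.get("name") or "").casefold() == name.casefold():
            return taxon.get("id")
    return None

def lookup_taxon_id(name: str, timeout: float = 30.0):
    """Search the taxa endpoint by name (network call; not executed in this sketch)."""
    query = urllib.parse.urlencode({"q": name})
    with urllib.request.urlopen(f"{TAXA_URL}?{query}", timeout=timeout) as resp:
        return exact_taxon_id(json.load(resp), name)

# Made-up payload and ids, used only to exercise the matching logic
sample = {"results": [{"id": 111, "name": "Chingia"}, {"id": 222, "name": "Chingia australis"}]}
print(exact_taxon_id(sample, "Chingia australis"))
```

Requiring an exact name match avoids picking up a genus or a similarly named taxon from the search results.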

In [28]:
newtemplate = pd.DataFrame(columns=['Taxon Name','Status','Authority','IUCN equivalent','Description',
                                    'iNaturalist Place ID','url','Taxon Geoprivacy','Username','taxon_id'])
# newtemplate['Taxon Name'] = taxonnew['scientificName']
# newtemplate['Status'] = taxonnew['scientificName']
# newtemplate['Authority'] = taxonnew['scientificName']
# newtemplate['IUCN equivalent'] = taxonnew['scientificName']
# newtemplate['Description'] = taxonnew['scientificName']
# newtemplate['iNaturalist Place ID'] = taxonnew['scientificName']
# newtemplate['url'] = taxonnew['scientificName']
# newtemplate['Taxon Geoprivacy'] = taxonnew['scientificName']
# newtemplate['Username'] = taxonnew['scientificName']
# newtemplate['taxon_id'] = taxonnew['scientificName']

Unnamed: 0,taxonID,kingdom,class,family,scientificName,vernacularName,scientificNameAuthorship,sourceStatus,Significant,status,Endemicity,EPBC Status
1,1376,Animalia,Aves,Estrildidae,Chloebia gouldiae,Gouldian finch,"(Gould, 1844)",E,Y,Endangered,Intranational,Endangered
2,1378,Animalia,Aves,Estrildidae,Erythrura trichroa,blue-faced parrot-finch,"(Kittlitz, 1835)",NT,Y,Near Threatened,Not Endemic to Australia,
3,1370,Animalia,Aves,Estrildidae,Neochmia phaeton evangelinae,crimson finch (white-bellied subspecies),"(Hombron & Jacquinot, 1841)",E,Y,Endangered,Regional Endemic,Endangered
4,1365,Animalia,Aves,Estrildidae,Poephila cincta cincta,black-throated finch (white-rumped subspecies),"Gould, 1837",E,Y,Endangered,Intranational,Endangered
5,1355,Animalia,Aves,Estrildidae,Stagonopleura guttata,diamond firetail,"(Shaw, 1796)",V,Y,Vulnerable,Intranational,
...,...,...,...,...,...,...,...,...,...,...,...,...
947,41354,Plantae,Equisetopsida,Thelypteridaceae,Amblovenatum tildeniae,,(Holttum) T.E.Almeida & A.R.Field,CR,Y,Critically Endangered,Queensland Endemic,
948,9553,Plantae,Equisetopsida,Thelypteridaceae,Chingia australis,,Holttum,E,Y,Endangered,Queensland Endemic,Endangered
949,11646,Plantae,Equisetopsida,Thelypteridaceae,Plesioneuron tuberculatum,,(Ces.) Holttum,E,Y,Endangered,Regional Endemic,Endangered
950,11699,Plantae,Equisetopsida,Thelypteridaceae,Pneumatopteris costata,,(Brack.) Holttum,NT,Y,Near Threatened,Regional Endemic,


# Records for Update - needs
* Set status to standard terms
* Compare status for sensitive vs iNat
* Write Update template if update required
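
The first two bullets above could be sketched as below. The mapping tables are assumptions: the codes and the `iucn` numbers (LC 10, NT 20, VU 30, EN 40, CR 50) follow the pairs visible in the export rows shown earlier, while the list of accepted spellings is illustrative and incomplete. For example, "Special least concern" is deliberately left unmapped so it surfaces for a manual decision.

```python
import pandas as pd

# Assumed spellings -> standard code (extend after reviewing the real data)
STATUS_CODES = {
    "lc": "LC", "least concern": "LC",
    "nt": "NT", "near threatened": "NT",
    "vu": "VU", "v": "VU", "vulnerable": "VU",
    "en": "EN", "e": "EN", "endangered": "EN",
    "cr": "CR", "critically endangered": "CR",
}
IUCN_VALUES = {"LC": 10, "NT": 20, "VU": 30, "EN": 40, "CR": 50}

def standardise(status: pd.Series) -> pd.Series:
    """Map free-text statuses to standard codes; unknown values become NaN."""
    return status.fillna("").str.strip().str.casefold().map(STATUS_CODES)

# Illustrative rows only
demo = pd.DataFrame({
    "taxonstatus": ["LC", "endangered", "vu"],                  # iNat side
    "status": ["Least concern", "Vulnerable", "Vulnerable"],    # sensitive list side
})
demo["inat_std"] = standardise(demo["taxonstatus"])
demo["list_std"] = standardise(demo["status"])
demo["needs_update"] = demo["inat_std"] != demo["list_std"]
demo["iucn equivalent"] = demo["list_std"].map(IUCN_VALUES)
print(demo)
```

Note that an unmapped value on either side compares as different, so such rows are flagged for review rather than silently passed.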

In [None]:
updatetemplate = pd.DataFrame(columns=['action', 'taxon_name', 'taxon_id', 'status', 'iucn equivalent',
                                    'authority','url', 'geoprivacy', 'place_id', 'username'])
