# iNaturalist sensitive lists
Taxa in iNaturalist have conservation statuses that the ALA is responsible for maintaining. The process for bulk loads is to submit the data to iNaturalist in December/January using provided templates and checklists:

https://docs.google.com/spreadsheets/d/1yTwWh4d-lHeaBGCB9m70-HKEMtvrquHsPu3Zrgz9BcE/edit#gid=1531097917

Current statuses per iNaturalist taxonID are available in the iNaturalist site export, which an iNaturalist AU site admin can access and which is also in this repository (inaturalist-australia-9-conservation_statuses.xls).

### Suggested approach:

To update the statuses (e.g. for Qld), we need to:
1. Find the taxon name for each iNaturalist taxonID in an Australian place. We'll need to match the lists by taxon name.
2. We need to find:
    * New - those on the Qld list that are not on the iNat list (the list I uploaded before had authority: `QLD DEHP` and my user id is 708886).
    * Update - those on the Qld list that need updating (probably most because I feel we should change the authority text and try to link out to the wildnet page for each taxonID)
    * Remove - I expect there will be a few of these
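The three-way split described above can be sketched with an outer merge on taxon name. This is a minimal illustration on made-up rows, not the notebook's actual processing; the column name `scientificName` is borrowed from the Qld list and the two small dataframes are hypothetical stand-ins. Which of the names found on both lists really need updating is a separate decision.

```python
import pandas as pd

# Toy stand-ins for the two lists (illustrative rows only).
qld_list = pd.DataFrame({
    "scientificName": ["Chloebia gouldiae", "Stagonopleura guttata", "Macadamia jansenii"],
})
inat_list = pd.DataFrame({
    "scientificName": ["Chloebia gouldiae", "Stagonopleura guttata", "Erythrura trichroa"],
})

# An outer merge with indicator=True records which side each name came from.
merged = qld_list.merge(inat_list, on="scientificName", how="outer", indicator=True)

new = merged.loc[merged["_merge"] == "left_only", "scientificName"].tolist()      # Qld list only
remove = merged.loc[merged["_merge"] == "right_only", "scientificName"].tolist()  # iNat only
on_both = sorted(merged.loc[merged["_merge"] == "both", "scientificName"])        # update candidates

print("New:", new)
print("Remove:", remove)
print("Candidates for update:", on_both)
```

Names in `on_both` are only candidates: the authority text, URL or status still has to be compared before a row goes into the update template.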

In [1]:
import pandas as pd
import requests
import json
projectdir = "/Users/oco115/PycharmProjects/iNaturalist-lists/"
listdir = "/Users/oco115/PycharmProjects/authoritative-lists/current-lists/"
inatcsv = projectdir + "data/inaturalist-australia-9-conservation_statuses.csv"
outcsv = projectdir + "data/inaturalist-australia.csv"
usercsv = projectdir + "data/inaturalist-708886-update-statuses.csv"  # projectdir already ends with "/"
matchtaxoncsv = projectdir + "data/inaturalist-qld-match.csv"
newtaxoncsv = projectdir + "data/inaturalist-qld-newtaxon.csv"
apiurlbase = "https://api.inaturalist.org/v1/taxa/"

In [2]:
## Read inaturalist conservation statuses file
df = pd.read_csv(inatcsv, encoding='UTF-8')
pd.Series(list(df.columns))

0                     id
1               taxon_id
2                user_id
3               place_id
4              source_id
5              authority
6                 status
7                    url
8            description
9             geoprivacy
10                  iucn
11            created_at
12            updated_at
13            place_name
14    place_display_name
dtype: object

### Retrieve all Australian records

Place names and localities are not recorded consistently, so we need to:
1. Extract records whose place_display_name contains 'Australia' or ', AU'.
2. Extract records whose place_display_name is in a manually identified list of other Australian place names present in the data.
3. Merge the two extracts and remove the duplicates this produces.
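As an alternative to extracting twice and de-duplicating, the same selection can be expressed as one boolean mask. The sketch below runs on a small made-up dataframe; `extra_places` is a hypothetical subset of the manually identified list, and the approach is a suggestion rather than what the notebook does.

```python
import pandas as pd

# Made-up rows; only place_display_name matters for the filter.
places = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "place_display_name": ["Australia", "Queensland, AU", "Christmas Island", "Austria", None],
})
extra_places = ["Christmas Island", "Norfolk Island"]  # hypothetical subset of the manual list

name = places["place_display_name"]
mask = (
    name.str.contains("Australia", na=False, regex=False)   # na=False treats missing names as no match
    | name.str.contains(", AU", na=False, regex=False)
    | name.isin(extra_places)
)
australian = places[mask]  # each row is selected at most once, so no duplicates arise

print(australian["id"].tolist())  # [1, 2, 3]
```

Because every row is tested once against the combined condition, there is no need for `dropna` or `drop_duplicates` afterwards.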

In [3]:
# Identified Australian place names
filterlist = ['Australia', 'Australia Exclusive Economic Zone', 'Australian Capital Territory, AU', 'Brisbane City, Cairns - Pt B, QL, AU', 'Christmas Island', 'New South Wales, AU', 'Norfolk Island', 'Norfolk Island (Phillip Island)', 'Northern Territory, AU', 'Rottnest Island, AU', 'South Australia, AU', 'South Australia, marine waters', 'South East Queensland, QL, AU', 'Tasmania, AU', 'Victoria, AU', 'Western Australia, AU', 'Yarrabah, QL, AU', 'Queensland, AU']

In [4]:
# Each intermediate dataframe is kept for debugging/comparison purposes
dfna = df.dropna(subset=['place_display_name'])  # drop NA so that str.contains can be used as a mask
df1 = dfna[dfna['place_display_name'].str.contains('Australia')]
df2 = dfna[dfna['place_display_name'].str.contains(', AU')]
dfaus = pd.concat([df1, df2], ignore_index=True)  # DataFrame.append is deprecated and removed in pandas 2.0
dfana = df.dropna(subset=['authority'])
dfana = dfana[dfana['place_display_name'].isin(filterlist)]  # rows in the manually identified places
dfaus1 = pd.concat([dfaus, dfana], ignore_index=True)
dfaus1 = dfaus1.drop_duplicates()


In [5]:
dfaus1

Unnamed: 0,id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,iucn,created_at,updated_at,place_name,place_display_name
0,166610,1120831.0,3669610.0,6744.0,,Australian Government,VU,http://www.environment.gov.au/epbc,,,30,2021-03-14 08:53:04.517383,2021-03-14 08:53:04.517383,Australia,Australia
1,169816,334758.0,58320.0,6744.0,,Environment Protection and Biodiversity Conser...,Vulnerable,http://www.environment.gov.au/cgi-bin/sprat/pu...,,,30,2021-07-07 22:26:12.780586,2021-07-07 22:26:12.780586,Australia,Australia
2,169818,1262199.0,28.0,6827.0,,WA Department of Environment and Conservation,vulnerable,https://lists.ala.org.au/speciesListItem/list/...,,obscured,30,2021-07-07 22:57:56.655523,2021-07-07 22:57:56.655523,Western Australia,"Western Australia, AU"
3,152965,123278.0,708886.0,6827.0,16654.0,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,40,2019-07-23 00:09:39.861162,2021-03-30 16:58:59.314345,Western Australia,"Western Australia, AU"
4,264550,1377598.0,708886.0,6827.0,16654.0,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,40,2022-05-28 06:48:42.965069,2022-05-28 06:48:42.965069,Western Australia,"Western Australia, AU"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4561,268968,370476.0,3669610.0,6825.0,,New South Wales Office of Environment and Heri...,VU,https://www.environment.nsw.gov.au/threateneds...,,open,30,2022-12-01 01:30:38.793061,2022-12-01 01:30:38.793061,New South Wales,"New South Wales, AU"
4562,268880,1429513.0,708886.0,7308.0,16653.0,QLD DEHP,endangered,https://data.qld.gov.au/dataset/conservation-s...,,obscured,40,2022-11-27 06:12:40.930242,2022-11-27 06:12:40.930242,Queensland,"Queensland, AU"
5136,264584,123102.0,2366151.0,7616.0,,Environmental Protection and Biodiversity Cons...,Endangered,http://www.environment.gov.au/cgi-bin/sprat/pu...,,obscured,40,2022-06-04 22:21:34.088975,2022-06-04 22:21:34.088975,Christmas Island,Christmas Island
5282,165697,1182117.0,3669610.0,73684.0,,Australian Government,CR,http://www.environment.gov.au/biodiversity/thr...,,,50,2021-01-20 05:25:35.010337,2021-01-20 05:30:10.881286,Norfolk Island (Phillip Island),Norfolk Island (Phillip Island)


In [6]:
dfana

Unnamed: 0,id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,iucn,created_at,updated_at,place_name,place_display_name
248,166449,38493.0,1138587.0,7830.0,,Flora and Fauna Guarantee Act 1988,CR,,,obscured,50,2021-03-01 10:35:01.317401,2021-03-01 10:35:01.317401,Victoria,"Victoria, AU"
352,234788,918383.0,702203.0,9994.0,,Atlas of Living Australia,NT,https://bie.ala.org.au/species/https://id.biod...,,,20,2022-01-08 03:30:36.078473,2022-01-08 03:30:36.078473,Northern Territory,"Northern Territory, AU"
381,234789,918383.0,702203.0,7308.0,,Atlas of Living Australia,LC,https://bie.ala.org.au/species/https://id.biod...,,,10,2022-01-08 03:30:36.143044,2022-01-08 03:30:36.143044,Queensland,"Queensland, AU"
457,166416,1033183.0,3669610.0,6825.0,,NSW Office of Environment & Heritage,EN,https://www.environment.nsw.gov.au/threateneds...,,obscured,40,2021-02-22 07:22:28.46345,2021-02-22 07:23:11.418318,New South Wales,"New South Wales, AU"
458,180721,1247288.0,222137.0,6825.0,,NSW Threatened Species Scientific Committee,vu,https://www.environment.nsw.gov.au/topics/anim...,,obscured,30,2021-08-27 06:18:35.700055,2021-08-27 06:18:35.700055,New South Wales,"New South Wales, AU"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
253417,268967,370476.0,3669610.0,7308.0,,Queensland Nature Conservation Act 1992,VU,https://apps.des.qld.gov.au/species-search/det...,,open,30,2022-12-01 01:29:17.802611,2022-12-01 01:29:17.802611,Queensland,"Queensland, AU"
253418,268968,370476.0,3669610.0,6825.0,,New South Wales Office of Environment and Heri...,VU,https://www.environment.nsw.gov.au/threateneds...,,open,30,2022-12-01 01:30:38.793061,2022-12-01 01:30:38.793061,New South Wales,"New South Wales, AU"
253437,268871,960479.0,1138587.0,6744.0,,Environment Protection and Biodiversity Conser...,EN,http://www.environment.gov.au/cgi-bin/sprat/pu...,,obscured,40,2022-11-25 09:34:42.314303,2022-11-25 09:34:42.314303,Australia,Australia
253450,268880,1429513.0,708886.0,7308.0,16653.0,QLD DEHP,endangered,https://data.qld.gov.au/dataset/conservation-s...,,obscured,40,2022-11-27 06:12:40.930242,2022-11-27 06:12:40.930242,Queensland,"Queensland, AU"


### Extract records for User 708886 (Peggy)


In [7]:
checkrecs = dfaus1[dfaus1['user_id'] == 708886].copy()  # copy avoids SettingWithCopyWarning on the next line
checkrecs['taxon_id'] = checkrecs['taxon_id'].astype(int)


In [15]:
rlist = []
ct = 0
dfextract = pd.DataFrame(columns=['id','taxonid','taxonname', 'taxonstatus', 'authority', 'taxonurl'])   # create empty dataframe with columns

In [None]:
# Retrieve taxon information and statuses from iNaturalist API

In [16]:
for ind in checkrecs.index:
    # print('record count is: ', ct, 'taxonid is: ', taxonid)
    print(checkrecs['taxon_id'][ind], checkrecs['authority'][ind])
    print('record count is: ', ct, 'taxonid is: ', checkrecs['taxon_id'][ind], 'authority is: ', checkrecs['authority'][ind])
    apiurl = apiurlbase + str(checkrecs['taxon_id'][ind])
    response = requests.get(apiurl)
    response.raise_for_status()  # stop on an HTTP error rather than parsing an error body
    rlist.append(response.json())
    numstatus = len(rlist[ct]['results'][0]['conservation_statuses'])
    # taxonpname = rlist[ct]['results'][0]['preferred_common_name'] # This field is not always available
    taxonid = checkrecs['taxon_id'][ind]
    inatid = checkrecs['id'][ind]
    authority = checkrecs['authority'][ind]
    taxonname = rlist[ct]['results'][0]['name']
    # Loop through the results in the JSON record and extract conservation statuses
    # Note: there are multiple records for each species. We need to select the record that has 'authority' matching authority in the input dataset
    # Build final dataframe
    for i in range(numstatus):
        if rlist[ct]['results'][0]['conservation_statuses'][i]['authority'] == checkrecs['authority'][ind]:
            taxonstatus = rlist[ct]['results'][0]['conservation_statuses'][i]['status']
            taxonurl = rlist[ct]['results'][0]['conservation_statuses'][i]['url']
            taxonlist = [inatid, taxonid, taxonname, taxonstatus, authority, taxonurl]
            dfextract.loc[len(dfextract)] = taxonlist
            break

    ct += 1

123278 WA Department of Environment and Convservation
record count is:  0 taxonid is:  123278 authority is:  WA Department of Environment and Convservation
1377598 WA Department of Environment and Convservation
record count is:  1 taxonid is:  1377598 authority is:  WA Department of Environment and Convservation
185435 WA Department of Environment and Convservation
record count is:  2 taxonid is:  185435 authority is:  WA Department of Environment and Convservation
1330928 WA Department of Environment and Convservation
record count is:  3 taxonid is:  1330928 authority is:  WA Department of Environment and Convservation
321088 SA DEWNR
record count is:  4 taxonid is:  321088 authority is:  SA DEWNR
400260 SA DEWNR
record count is:  5 taxonid is:  400260 authority is:  SA DEWNR
369267 WA Department of Environment and Convservation
record count is:  6 taxonid is:  369267 authority is:  WA Department of Environment and Convservation
553179 WA Department of Environment and Convservation
re

In [None]:
# Write dataframe to csv for checking and future use
dfextract.to_csv(usercsv,index = False,encoding='utf-8-sig')
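The authority-matching step inside the loop above can be isolated into a small function, which makes it testable without calling the API. This is a sketch under assumptions: `find_status` is a hypothetical helper, and `sample` is a hand-written record containing only the keys the loop reads (`conservation_statuses`, `authority`, `status`, `url`), with illustrative values.

```python
from typing import Optional


def find_status(taxon_record: dict, authority: str) -> Optional[dict]:
    """Return the first conservation status whose authority matches, else None."""
    for status in taxon_record.get("conservation_statuses", []):
        if status.get("authority") == authority:
            return {"status": status.get("status"), "url": status.get("url")}
    return None


# Hand-written record in the shape the loop reads; values are illustrative only.
sample = {
    "name": "Example taxon",
    "conservation_statuses": [
        {"authority": "SA DEWNR", "status": "vulnerable", "url": None},
        {"authority": "QLD DEHP", "status": "endangered", "url": "https://example.org/status"},
    ],
}

print(find_status(sample, "QLD DEHP"))
print(find_status(sample, "NT NRETAS"))  # None: no status from that authority
```

Returning `None` when nothing matches makes it explicit which taxa were skipped, whereas the loop silently adds no row in that case.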

### Extract unique authorities for each state
 * find unique authorities
 * manually determine lists for each state

In [20]:
authlist = df['authority'].unique().tolist()
print(authlist)
qldlocs = ['QLD DEHP', 'Queensland Government', 'Queensland Nature Conservation Act 1992']
# nswlocs = ['NSW Office of Environment & Heritage']
# actlocs = ['ACT Government']
# viclocs = ['VIC Government', 'Victoria Flora and Fauna Guarantee Act 1988', 'Victoria Flora and Fauna Guarantee Act 1988 ']
# salocs = ['SA DEWNR']
# walocs = ['WA Department of Environment and Convservation']
# ntlocs = ['NT NRETAS']

['WA Department of Environment and Convservation', 'SA DEWNR', 'Environment Protection and Biodiversity Conservation Act 1999', 'ACT Government', 'QLD DEHP', 'NSW Office of Environment & Heritage', 'Queensland Government', 'Queensland Nature Conservation Act 1992', 'Victoria Flora and Fauna Guarantee Act 1988', 'Atlas of Living Australia', 'VIC Government', 'NT NRETAS', 'Victoria Flora and Fauna Guarantee Act 1988 ']
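The list above contains 'Victoria Flora and Fauna Guarantee Act 1988' twice, once with a trailing space. One possible way to collapse such variants before building the per-state lists is to strip whitespace first. The sketch uses a small made-up series rather than the real `authority` column.

```python
import pandas as pd

authorities = pd.Series([
    "Victoria Flora and Fauna Guarantee Act 1988",
    "Victoria Flora and Fauna Guarantee Act 1988 ",  # trailing space, as in the output above
    "QLD DEHP",
    None,                                            # missing authority
])

# Drop missing values, trim whitespace, then take unique values in order of appearance.
unique_authorities = authorities.dropna().str.strip().unique().tolist()

print(unique_authorities)
# ['Victoria Flora and Fauna Guarantee Act 1988', 'QLD DEHP']
```

Stripping only changes the values used for grouping. Matching against the API's `authority` field would still need the original, unstripped text.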



### Process Qld
* Retrieve ALA Qld sensitive species list
* Extract Qld records from iNat dataframe based on Qld Locations
* Create lists of taxon name for Sensitive List and iNat data, for searching
* Create dataframes of records:
   * in Qld Sensitive list and in iNat - matchdf
   * in Qld Sensitive list but not in iNat - nomatchdf

In [None]:
qldsensitive = pd.read_csv(listdir + "sensitive-lists/QLD-sensitive.csv")
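# df as read from the export has no 'taxonname' column; dfextract (built from the API above) does
qldinat = dfextract[dfextract['authority'].isin(qldlocs)] # Records for identified Qld Locations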
# print(df['taxonstatus'].unique())
# qlddf
# qldsensitive.columns
# qldsensitive

In [22]:
taxsearch1 = qldinat['taxonname'].tolist()  #iNat taxon
taxsearch2 = qldsensitive['scientificName'].tolist() # Qld sensitive List taxon
matchdf = qldinat[qldinat['taxonname'].isin(taxsearch2)]     # in Qld sensitive list and in iNat
nomatchdf = qldsensitive[~qldsensitive['scientificName'].isin(taxsearch1)]  # in Qld Sensitive list but not on iNat


Unnamed: 0,taxonID,kingdom,class,family,scientificName,vernacularName,scientificNameAuthorship,sourceStatus,Significant,status,Endemicity,EPBC Status
1,1376,Animalia,Aves,Estrildidae,Chloebia gouldiae,Gouldian finch,"(Gould, 1844)",E,Y,Endangered,Intranational,Endangered
2,1378,Animalia,Aves,Estrildidae,Erythrura trichroa,blue-faced parrot-finch,"(Kittlitz, 1835)",NT,Y,Near Threatened,Not Endemic to Australia,
3,1370,Animalia,Aves,Estrildidae,Neochmia phaeton evangelinae,crimson finch (white-bellied subspecies),"(Hombron & Jacquinot, 1841)",E,Y,Endangered,Regional Endemic,Endangered
4,1365,Animalia,Aves,Estrildidae,Poephila cincta cincta,black-throated finch (white-rumped subspecies),"Gould, 1837",E,Y,Endangered,Intranational,Endangered
5,1355,Animalia,Aves,Estrildidae,Stagonopleura guttata,diamond firetail,"(Shaw, 1796)",V,Y,Vulnerable,Intranational,
...,...,...,...,...,...,...,...,...,...,...,...,...
947,41354,Plantae,Equisetopsida,Thelypteridaceae,Amblovenatum tildeniae,,(Holttum) T.E.Almeida & A.R.Field,CR,Y,Critically Endangered,Queensland Endemic,
948,9553,Plantae,Equisetopsida,Thelypteridaceae,Chingia australis,,Holttum,E,Y,Endangered,Queensland Endemic,Endangered
949,11646,Plantae,Equisetopsida,Thelypteridaceae,Plesioneuron tuberculatum,,(Ces.) Holttum,E,Y,Endangered,Regional Endemic,Endangered
950,11699,Plantae,Equisetopsida,Thelypteridaceae,Pneumatopteris costata,,(Brack.) Holttum,NT,Y,Near Threatened,Regional Endemic,


### Merge sensitive list and iNat dataframes to include all columns from both
* Rename iNat dataframe column taxonname to scientificName to use as column in merge
* Take the matched rows and compare with status in sensitive list
* Merge List and iNat data frames with matching rows based on taxon
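A minimal sketch of the merge-and-compare step, on made-up rows. The column names `taxonstatus` and `status` follow the dataframes used in this notebook, but the comparison rule (case-insensitive equality of the status text) is an assumption about what counts as a difference.

```python
import pandas as pd

inat = pd.DataFrame({
    "scientificName": ["Chloebia gouldiae", "Stagonopleura guttata"],
    "taxonstatus": ["endangered", "near threatened"],   # status currently on iNat
})
sensitive = pd.DataFrame({
    "scientificName": ["Chloebia gouldiae", "Stagonopleura guttata", "Chingia australis"],
    "status": ["Endangered", "Vulnerable", "Endangered"],  # status on the sensitive list
})

# Inner merge keeps only taxa present in both dataframes, with columns from both.
merged = inat.merge(sensitive, how="inner", on="scientificName")

# Compare case-insensitively, since iNat rows mix 'endangered' and 'Endangered'.
merged["status_differs"] = merged["taxonstatus"].str.lower() != merged["status"].str.lower()
needs_update = merged.loc[merged["status_differs"], "scientificName"].tolist()

print(needs_update)  # ['Stagonopleura guttata']
```

Abbreviated statuses such as `VU` or `EN` would need mapping to full terms before this comparison gives meaningful results.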

### Records for taxon matches found in both Sensitive list and iNat

In [None]:
qldinat1 = qldinat.rename(columns={'taxonname': 'scientificName'})
qldinat1
taxmatch = qldinat1.merge(qldsensitive, how = 'inner', on = ['scientificName'])
taxmatch

In [24]:
taxmatch.to_csv(matchtaxoncsv,index = False,encoding='utf-8-sig')

### New records for iNat - taxon in Sensitive list but not in iNat

In [None]:
taxlistfound = taxmatch['scientificName'].tolist()  # iNat taxon
taxonnew = qldsensitive[~qldsensitive['scientificName'].isin(taxlistfound)]
taxonnew.to_csv(newtaxoncsv,index = False,encoding='utf-8-sig')
taxonnew


## Build iNaturalist Templates
Based on templates found at: https://docs.google.com/spreadsheets/d/1yTwWh4d-lHeaBGCB9m70-HKEMtvrquHsPu3Zrgz9BcE/edit#gid=1531097917


### New Records
* Write New template if update required

**Question: how do we know the taxon_id and iNaturalist Place ID when these are new records?**

In [28]:
newtemplate = pd.DataFrame(columns=['Taxon Name','Status','Authority','IUCN equivalent','Description',
                                    'iNaturalist Place ID','url','Taxon Geoprivacy','Username','taxon_id'])
taxonnew
# newtemplate['Taxon Name'] = taxonnew['scientificName']
# newtemplate['Status'] = taxonnew['scientificName']
# newtemplate['Authority'] = taxonnew['scientificName']
# newtemplate['IUCN equivalent'] = taxonnew['scientificName']
# newtemplate['Description'] = taxonnew['scientificName']
# newtemplate['iNaturalist Place ID'] = taxonnew['scientificName']
# newtemplate['url'] = taxonnew['scientificName']
# newtemplate['Taxon Geoprivacy'] = taxonnew['scientificName']
# newtemplate['Username'] = taxonnew['scientificName']
# newtemplate['taxon_id'] = taxonnew['scientificName']

Unnamed: 0,taxonID,kingdom,class,family,scientificName,vernacularName,scientificNameAuthorship,sourceStatus,Significant,status,Endemicity,EPBC Status
1,1376,Animalia,Aves,Estrildidae,Chloebia gouldiae,Gouldian finch,"(Gould, 1844)",E,Y,Endangered,Intranational,Endangered
2,1378,Animalia,Aves,Estrildidae,Erythrura trichroa,blue-faced parrot-finch,"(Kittlitz, 1835)",NT,Y,Near Threatened,Not Endemic to Australia,
3,1370,Animalia,Aves,Estrildidae,Neochmia phaeton evangelinae,crimson finch (white-bellied subspecies),"(Hombron & Jacquinot, 1841)",E,Y,Endangered,Regional Endemic,Endangered
4,1365,Animalia,Aves,Estrildidae,Poephila cincta cincta,black-throated finch (white-rumped subspecies),"Gould, 1837",E,Y,Endangered,Intranational,Endangered
5,1355,Animalia,Aves,Estrildidae,Stagonopleura guttata,diamond firetail,"(Shaw, 1796)",V,Y,Vulnerable,Intranational,
...,...,...,...,...,...,...,...,...,...,...,...,...
947,41354,Plantae,Equisetopsida,Thelypteridaceae,Amblovenatum tildeniae,,(Holttum) T.E.Almeida & A.R.Field,CR,Y,Critically Endangered,Queensland Endemic,
948,9553,Plantae,Equisetopsida,Thelypteridaceae,Chingia australis,,Holttum,E,Y,Endangered,Queensland Endemic,Endangered
949,11646,Plantae,Equisetopsida,Thelypteridaceae,Plesioneuron tuberculatum,,(Ces.) Holttum,E,Y,Endangered,Regional Endemic,Endangered
950,11699,Plantae,Equisetopsida,Thelypteridaceae,Pneumatopteris costata,,(Brack.) Holttum,NT,Y,Near Threatened,Regional Endemic,
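One possible answer to the taxon_id question is to look each name up through the iNaturalist taxa search endpoint (`https://api.inaturalist.org/v1/taxa` with a `q` parameter) and keep only exact name matches. The place ID may be recoverable from the export itself: the Queensland rows shown earlier carry `place_id` 7308. The sketch below is an untested suggestion, not part of the existing workflow; `exact_match_id` and `lookup_taxon_id` are hypothetical helpers, the standard library is used instead of `requests` so the block is self-contained, and the payload in the offline check is hand-written with made-up ids.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API_TAXA = "https://api.inaturalist.org/v1/taxa"


def exact_match_id(payload: dict, name: str) -> Optional[int]:
    """Return the id of the first result whose name equals `name` (ignoring case), else None."""
    for result in payload.get("results", []):
        if (result.get("name") or "").lower() == name.lower():
            return result.get("id")
    return None


def lookup_taxon_id(name: str) -> Optional[int]:
    """Search the taxa endpoint for `name`; this makes a network request."""
    url = API_TAXA + "?" + urllib.parse.urlencode({"q": name, "per_page": 10})
    with urllib.request.urlopen(url, timeout=30) as response:
        return exact_match_id(json.load(response), name)


# Offline check with a hand-written payload; the ids are made up.
fake = {"results": [{"id": 111, "name": "Chloebia"}, {"id": 222, "name": "Chloebia gouldiae"}]}
print(exact_match_id(fake, "Chloebia gouldiae"))  # 222
```

Names that return `None` would need manual review, for example where iNaturalist uses a synonym of the name on the Qld list.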


### Records for Update - needs
* Set status to standard terms
* Compare status for sensitive vs iNat
* Write Update template if update required

In [None]:
updatetemplate = pd.DataFrame(columns=['action', 'taxon_name', 'taxon_id', 'status', 'iucn equivalent',
                                    'authority','url', 'geoprivacy', 'place_id', 'username'])
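A sketch of how rows already flagged for update might be poured into the template columns defined above. Several things here are assumptions: the numeric codes in `IUCN_CODE` are inferred from the `iucn` column of the export rows shown earlier (NT 20, VU 30, EN 40, CR 50) and the template may expect a different form; the `"UPDATE"` action label is hypothetical; and `to_update` is a made-up dataframe with made-up taxon ids.

```python
import pandas as pd

# Assumed mapping from status text to the numeric codes seen in the export's iucn column.
IUCN_CODE = {
    "near threatened": 20,
    "vulnerable": 30,
    "endangered": 40,
    "critically endangered": 50,
}

columns = ['action', 'taxon_name', 'taxon_id', 'status', 'iucn equivalent',
           'authority', 'url', 'geoprivacy', 'place_id', 'username']

# Made-up rows standing in for taxa already identified as needing an update.
to_update = pd.DataFrame({
    "scientificName": ["Stagonopleura guttata", "Amblovenatum tildeniae"],
    "taxonid": [1, 2],                                   # made-up ids
    "status": ["Vulnerable", "Critically Endangered "],  # Qld list status, untidy on purpose
})

# Standardise the status text: trim whitespace and lower-case.
standard = to_update["status"].str.strip().str.lower()

updatetemplate = pd.DataFrame({
    "action": "UPDATE",                        # hypothetical action label
    "taxon_name": to_update["scientificName"],
    "taxon_id": to_update["taxonid"],
    "status": standard,
    "iucn equivalent": standard.map(IUCN_CODE),
}).reindex(columns=columns)                    # remaining template columns are left empty

print(updatetemplate[["taxon_name", "status", "iucn equivalent"]])
```

The remaining columns (`authority`, `url`, `geoprivacy`, `place_id`, `username`) are left empty here because their values depend on decisions not yet made in this notebook.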


## Peggy's initial code below

In [1]:
# import pandas as pd
#
# projectDir = "/Users/new330/IdeaProjects/authoritative-lists/"
# sourceDataDir = projectDir + "source-data/inaturalist-statuses/"
# processedDataDir = projectDir + "current-lists/inaturalist-statuses/"
#
# qldSensitive = pd.read_csv(processedDataDir + "sensitive-lists/QLD-sensitive.csv")
# qldSensitive

Unnamed: 0,taxonID,kingdom,class,family,scientificName,vernacularName,scientificNameAuthorship,sourceStatus,Significant,status,Endemicity,EPBC Status
0,969,Animalia,Mammalia,Rhinolophidae,Rhinolophus philippinensis,greater large-eared horseshoe bat,"Waterhouse, 1843",E,Y,Endangered,Regional Endemic,Vulnerable
1,1376,Animalia,Aves,Estrildidae,Chloebia gouldiae,Gouldian finch,"(Gould, 1844)",E,Y,Endangered,Intranational,Endangered
2,1378,Animalia,Aves,Estrildidae,Erythrura trichroa,blue-faced parrot-finch,"(Kittlitz, 1835)",NT,Y,Near Threatened,Not Endemic to Australia,
3,1370,Animalia,Aves,Estrildidae,Neochmia phaeton evangelinae,crimson finch (white-bellied subspecies),"(Hombron & Jacquinot, 1841)",E,Y,Endangered,Regional Endemic,Endangered
4,1365,Animalia,Aves,Estrildidae,Poephila cincta cincta,black-throated finch (white-rumped subspecies),"Gould, 1837",E,Y,Endangered,Intranational,Endangered
...,...,...,...,...,...,...,...,...,...,...,...,...
950,11699,Plantae,Equisetopsida,Thelypteridaceae,Pneumatopteris costata,,(Brack.) Holttum,NT,Y,Near Threatened,Regional Endemic,
951,11700,Plantae,Equisetopsida,Thelypteridaceae,Pneumatopteris pennigera,lime fern,(G.Forst.) Holttum,E,Y,Endangered,Not Endemic to Australia,
952,16042,Plantae,Equisetopsida,Thelypteridaceae,Thelypteris confluens,,(Thunb.) C.V.Morton,V,Y,Vulnerable,Not Endemic to Australia,
953,8185,Plantae,Equisetopsida,Proteaceae,Macadamia jansenii,,C.L.Gross & P.H.Weston,CR,Y,Critically Endangered,Queensland Endemic,Endangered
