# iNaturalist status updates by state

Using the file produced in the collate-status-taxa.ipynb: `inat-aust-status-taxa.csv` (statuses joined to taxa names), generate lists to update iNaturalist statuses

## Prep - common to all states
1. Read in the inaturalist statuses & filter out this state
2. Read in the state conservation and sensitive lists
3. Prep fields incl IUCN equivalent mappings and matching to iNat taxonomy  
4. Merge and compare the state and inaturalist lists
5. Create update/removals list
6. Create additions list
7. Save files

## 1. Read in the inaturalist statuses & filter out NT

In [1]:
import pandas as pd
import sys
import os
projectdir = os.path.dirname(os.getcwd()) + "/" # parent dir of cwd
sourcedir = projectdir + "data/in/"
sys.path.append(os.path.abspath(projectdir + "notebooks/includes/"))
import list_functions  as lf

# read in the statuses file
taxastatus = pd.read_csv(sourcedir + "inat-aust-status-taxa.csv", encoding='UTF-8',na_filter=False,dtype=str)

# filter out ACT entries
def filter_state_statuses(stateregex: str, urlregex: str):
    authoritydf = taxastatus['authority'].drop_duplicates().sort_values()
    authoritydf = authoritydf[pd.Series(authoritydf).str.contains(stateregex)]
    urldf = taxastatus['url'].drop_duplicates().sort_values()
    urldf = urldf[pd.Series(urldf).str.contains(urlregex)]
    placedisplaydf = taxastatus['place_display_name'].drop_duplicates().sort_values()
    placedisplaydf = placedisplaydf[pd.Series(placedisplaydf).str.contains(stateregex)]
    placedf = taxastatus['place_name'].drop_duplicates().sort_values()
    placedf = placedf[pd.Series(placedf).str.contains(stateregex)]
    # concat all and remove duplicates
    statedf = pd.concat([taxastatus.apply(lambda row: row[taxastatus['place_display_name'].isin(placedisplaydf)]),
                         taxastatus.apply(lambda row: row[taxastatus['place_name'].isin(placedf)]),
                         taxastatus.apply(lambda row: row[taxastatus['url'].isin(urldf)]),
                         taxastatus.apply(
                             lambda row: row[taxastatus['authority'].isin(authoritydf)])]).drop_duplicates()
    return statedf.sort_values(['taxon_id', 'user_id'])

inatstatuses = filter_state_statuses("Northern Territory|NT NRETAS", " ")
inatstatuses = inatstatuses.add_prefix("inat_")
inatstatuses.groupby(['inat_status']).size()

inat_status
Endangered         1
LC                 3
NT                 1
Threatened       142
VU                 1
Vulnerable         1
endangered         6
least concern      1
dtype: int64

## 2. Read in the State lists
Read in the state lists 
 * Set conservation list value `authority` = "Territory Parks and Wildlife Conservation Act"
 * Set sensitive list value: `geoprivacy` = `obscured`
  * Northern Territory Department of Environment and Natural Resources 
    * https://nt.gov.au/environment

In [2]:
# %%script echo skipping # comment this line to download dataset from lists.ala.org.au the web and save locally
# Download lists data. Retrieve binomial and trinomial names from GBIF. Save locally to CSV

sensitivelist = lf.download_ala_specieslist("https://lists.ala.org.au/ws/speciesListItems/dr492?max=10000&includeKVP=true")
sensitivelist = lf.kvp_to_columns(sensitivelist)
sensitivelist.to_csv(sourcedir + "state-lists/nt-ala-sensitive.csv", index=False)

conservationlist = lf.download_ala_specieslist("https://lists.ala.org.au/ws/speciesListItems/dr651?max=10000&includeKVP=true")
conservationlist = lf.kvp_to_columns(conservationlist)
conservationlist.to_csv(sourcedir + "state-lists/nt-ala-conservation.csv", index=False)

In [3]:
# Read state lists and merge
sensitivelist = pd.read_csv(sourcedir + "state-lists/nt-ala-sensitive.csv", dtype=str)
sensitivelist['status'] = 'Sensitive'
sensitivelist['geoprivacy'] = 'obscured'
sensitivelist['authority'] = 'Northern Territory Department of Environment'
conservationlist = pd.read_csv(sourcedir + "state-lists/nt-ala-conservation.csv", dtype=str)
conservationlist['geoprivacy'] = 'open'
conservationlist['authority'] = 'Territory Parks and Wildlife Conservation Act 1976'
statelist = conservationlist[['id','name','lsid','status','geoprivacy','authority']].merge(sensitivelist[['id','name','lsid','geoprivacy','status','authority']], how="outer",on='name',suffixes=('_conservation', '_sensitive'))
statelist['status'] = statelist['status_conservation'].fillna(statelist['status_sensitive'])
statelist['authority'] = statelist['authority_conservation'].fillna(statelist['authority_sensitive'])
statelist['geoprivacy'] = statelist['geoprivacy_sensitive'].fillna(statelist['geoprivacy_conservation'])
statelist = statelist.rename(columns = {'name':'scientificName'})
statelist = statelist.add_prefix("state_")
print("Conservation list entries:" + str(len(conservationlist)))
print("Sensitive list entries:" + str(len(sensitivelist)))
statelist

Conservation list entries:204
Sensitive list entries:11


Unnamed: 0,state_id_conservation,state_scientificName,state_lsid_conservation,state_status_conservation,state_geoprivacy_conservation,state_authority_conservation,state_id_sensitive,state_lsid_sensitive,state_geoprivacy_sensitive,state_status_sensitive,state_authority_sensitive,state_status,state_authority,state_geoprivacy
0,4410476,Abrodictyum obscurum,https://id.biodiversity.org.au/node/apni/7402565,Endangered,open,Territory Parks and Wildlife Conservation Act ...,,,,,,Endangered,Territory Parks and Wildlife Conservation Act ...,open
1,4410563,Acacia equisetifolia,https://id.biodiversity.org.au/node/apni/2890781,Critically Endangered,open,Territory Parks and Wildlife Conservation Act ...,,,,,,Critically Endangered,Territory Parks and Wildlife Conservation Act ...,open
2,4410606,Acacia latzii,https://id.biodiversity.org.au/node/apni/2906346,Vulnerable,open,Territory Parks and Wildlife Conservation Act ...,,,,,,Vulnerable,Territory Parks and Wildlife Conservation Act ...,open
3,4410487,Acacia peuce,https://id.biodiversity.org.au/node/apni/2906202,Endangered,open,Territory Parks and Wildlife Conservation Act ...,,,,,,Endangered,Territory Parks and Wildlife Conservation Act ...,open
4,4410564,Acacia praetermissa,https://id.biodiversity.org.au/node/apni/2894855,Vulnerable,open,Territory Parks and Wildlife Conservation Act ...,,,,,,Vulnerable,Territory Parks and Wildlife Conservation Act ...,open
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
204,,Macroderma gigas,,,,,4931556,https://biodiversity.org.au/afd/taxa/63bc796a-...,obscured,Sensitive,Northern Territory Department of Environment,Sensitive,Northern Territory Department of Environment,obscured
205,,Hipposideros stenotis,,,,,4931559,https://biodiversity.org.au/afd/taxa/26fe0f53-...,obscured,Sensitive,Northern Territory Department of Environment,Sensitive,Northern Territory Department of Environment,obscured
206,,Falco (Hierofalco) hypoleucos,,,,,4931564,https://biodiversity.org.au/afd/taxa/4c73a934-...,obscured,Sensitive,Northern Territory Department of Environment,Sensitive,Northern Territory Department of Environment,obscured
207,,Chromolaena odorata,,,,,4931562,https://id.biodiversity.org.au/node/apni/2910579,obscured,Sensitive,Northern Territory Department of Environment,Sensitive,Northern Territory Department of Environment,obscured


In [4]:

statelist.groupby('state_status',dropna=False).size()

state_status
Critically Endangered     20
Endangered                52
Extinct                   11
Sensitive                  5
Vulnerable               121
dtype: int64

### 3. Equivalent IUCN statuses 
* ensure the statuses from the state maps to an IUCN equivalent
* ICUN statuses = {'Not Evaluated', 'Data Deficient', 'Least Concern', 'Near Threatened', 'Vulnerable', 'Endangered', 'Critically Endangered', 'Extinct in the Wild', 'Extinct'}

In [5]:
#(50:CE, 30:V, 40:E, 20:NT, EX:70)
iucnStatusMappings = {
    'Least concern':'10',
    'Special least concern':'10',
    'Critically Endangered':'50',
    'Endangered':'40',
    'Vulnerable':'30',
    'Extinct':'70',
    'Extinct in the wild':'70',
    'Near Threatened':'20',
    'Sensitive':'30'
}

### 4. Determine best place ID to use

In [6]:
inatstatuses.groupby(['inat_place_id','inat_place_name','inat_place_display_name'])['inat_place_id'].count()

inat_place_id  inat_place_name     inat_place_display_name
6744           Australia           Australia                    1
9994           Northern Territory  Northern Territory, AU     155
Name: inat_place_id, dtype: int64

### 5. Merge iNaturalist statuses with State sensitive list on scientificName




In [7]:
# set placeid
place_id = 9994
# get the inaturalist taxonomy matches for additions 
inattaxa = pd.read_csv(sourcedir + "inaturalist-australia-9/inaturalist-australia-9-taxa.csv",dtype=str,usecols=['id','name','rank','observations_count','is_active'])
inattaxa = inattaxa[inattaxa['is_active'] == 't']
inattaxa = inattaxa.rename(columns = {'id':'taxon_id','name':'taxon_name'})
inattaxa = inattaxa.add_prefix("inat_")
statelist = statelist[['state_scientificName','state_status','state_geoprivacy', 'state_lsid_conservation','state_lsid_sensitive','state_authority']].merge(inattaxa,how="left",left_on='state_scientificName',right_on='inat_taxon_name',suffixes=(None,'_inat'))
statelist


Unnamed: 0,state_scientificName,state_status,state_geoprivacy,state_lsid_conservation,state_lsid_sensitive,state_authority,inat_taxon_id,inat_taxon_name,inat_rank,inat_observations_count,inat_is_active
0,Abrodictyum obscurum,Endangered,open,https://id.biodiversity.org.au/node/apni/7402565,,Territory Parks and Wildlife Conservation Act ...,503202,Abrodictyum obscurum,species,102,t
1,Acacia equisetifolia,Critically Endangered,open,https://id.biodiversity.org.au/node/apni/2890781,,Territory Parks and Wildlife Conservation Act ...,1253756,Acacia equisetifolia,species,1,t
2,Acacia latzii,Vulnerable,open,https://id.biodiversity.org.au/node/apni/2906346,,Territory Parks and Wildlife Conservation Act ...,1254327,Acacia latzii,species,0,t
3,Acacia peuce,Endangered,open,https://id.biodiversity.org.au/node/apni/2906202,,Territory Parks and Wildlife Conservation Act ...,465191,Acacia peuce,species,35,t
4,Acacia praetermissa,Vulnerable,open,https://id.biodiversity.org.au/node/apni/2894855,,Territory Parks and Wildlife Conservation Act ...,1254561,Acacia praetermissa,species,0,t
...,...,...,...,...,...,...,...,...,...,...,...
204,Macroderma gigas,Sensitive,obscured,,https://biodiversity.org.au/afd/taxa/63bc796a-...,Northern Territory Department of Environment,41326,Macroderma gigas,species,31,t
205,Hipposideros stenotis,Sensitive,obscured,,https://biodiversity.org.au/afd/taxa/26fe0f53-...,Northern Territory Department of Environment,,,,,
206,Falco (Hierofalco) hypoleucos,Sensitive,obscured,,https://biodiversity.org.au/afd/taxa/4c73a934-...,Northern Territory Department of Environment,,,,,
207,Chromolaena odorata,Sensitive,obscured,,https://id.biodiversity.org.au/node/apni/2910579,Northern Territory Department of Environment,199400,Chromolaena odorata,species,10205,t


In [13]:
# prepare the export fields, common to New template and Update template
mergedstatuses = statelist.merge(inatstatuses,how="outer",left_on='state_scientificName',right_on='inat_scientificName')

# add extra fields 
# add some extra fields
mergedstatuses['place_id'] = str(place_id)
mergedstatuses['username'] = 'peggydnew'
mergedstatuses['description'] = "Listed - refer to https://nt.gov.au/environment"
mergedstatuses['state_lsid_conservation'].fillna(mergedstatuses['state_lsid_sensitive'])
mergedstatuses['state_url'] = "https://bie.ala.org.au/species/" + mergedstatuses['state_lsid_conservation']
mergedstatuses['state_iucn_equivalent'] = mergedstatuses['state_status'].map(iucnStatusMappings).fillna('30') # map to dictionary, Vulnerable default
mergedstatuses['state_status'] = mergedstatuses['state_status'].fillna('Sensitive')
mergedstatuses['state_geoprivacy'] = mergedstatuses['state_geoprivacy'].fillna('open')
mergedstatuses['inat_taxon_id'] = mergedstatuses['inat_taxon_id_y'].fillna(mergedstatuses['inat_taxon_id_x'])
mergedstatuses['inat_scientificName'] = mergedstatuses['inat_scientificName'].fillna(mergedstatuses['inat_taxon_name'])

# UPDATE: inat status and state status both exist
# REMOVE: inat status exists, state status does not
# ADD: state status exists, inat status does not (matching taxon)
# NO MATCH: state status exists, inat taxa not found
mergedstatuses['action'] = 'na'
mergedstatuses.loc[mergedstatuses['inat_id'].notnull() & mergedstatuses['state_scientificName'].notnull(), 'action'] = "UPDATE"
mergedstatuses.loc[mergedstatuses['inat_id'].notnull() & mergedstatuses['state_scientificName'].isnull(), 'action'] = "REMOVE"
mergedstatuses.loc[mergedstatuses['inat_id'].isnull() & mergedstatuses['inat_taxon_id'].notnull(), 'action'] = "ADD"
mergedstatuses.loc[mergedstatuses['inat_id'].isnull() & mergedstatuses['inat_taxon_id'].isnull(), 'action'] = "NO MATCH"

# only update those with different values 
mergedstatuses['action'] = mergedstatuses.apply(lambda x: "NO CHANGE" if (x['action'] == "UPDATE") & ((x['state_status'] == x['inat_status']) & (x['state_geoprivacy'] == x['inat_geoprivacy'] ) & (x['state_geoprivacy'] == x['inat_geoprivacy']) & (x['state_iucn_equivalent'] == x['inat_iucn'])) else x['action'], axis=1)


# display
mergedstatusesprintfriendly = mergedstatuses[['action','inat_id','inat_taxon_id','state_scientificName','inat_scientificName', 'state_status','inat_status','state_geoprivacy','inat_geoprivacy','state_iucn_equivalent','inat_iucn','state_authority','inat_authority','state_url','inat_url','inat_description','inat_place_display_name','inat_current_synonymous_taxon_ids']]

mergedstatuses.groupby('action').size()

action
ADD           4
NO MATCH     62
REMOVE       13
UPDATE      143
dtype: int64

## Updates
Updates match cleanly to an iNaturalist taxon and existing status. We'll only do an update if the status, geoprivacy or iucn values are different.

In [14]:
# Headers:
# action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
updates = pd.DataFrame(mergedstatuses[mergedstatuses['action'].isin(['UPDATE','REMOVE'])])
updates = updates[['action','state_scientificName','inat_id','inat_taxon_id','state_status','state_iucn_equivalent','state_authority','state_url','state_geoprivacy','place_id','username','description']]
updates.columns = updates.columns.str.replace("state_", "", regex=True)
updates.columns = updates.columns.str.replace("inat_", "", regex=True)
updates = updates.rename(columns={'scientificName':'taxon_name'})
updates

Unnamed: 0,action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
1,UPDATE,Acacia equisetifolia,270956,1253756,Critically Endangered,50,Territory Parks and Wildlife Conservation Act ...,https://bie.ala.org.au/species/https://id.biod...,open,9994,peggydnew,Listed - refer to https://nt.gov.au/environment
2,UPDATE,Acacia latzii,270957,1254327,Vulnerable,30,Territory Parks and Wildlife Conservation Act ...,https://bie.ala.org.au/species/https://id.biod...,open,9994,peggydnew,Listed - refer to https://nt.gov.au/environment
3,UPDATE,Acacia peuce,270958,465191,Endangered,40,Territory Parks and Wildlife Conservation Act ...,https://bie.ala.org.au/species/https://id.biod...,open,9994,peggydnew,Listed - refer to https://nt.gov.au/environment
4,UPDATE,Acacia praetermissa,270959,1254561,Vulnerable,30,Territory Parks and Wildlife Conservation Act ...,https://bie.ala.org.au/species/https://id.biod...,open,9994,peggydnew,Listed - refer to https://nt.gov.au/environment
5,UPDATE,Acacia undoolyana,270960,1254884,Vulnerable,30,Territory Parks and Wildlife Conservation Act ...,https://bie.ala.org.au/species/https://id.biod...,open,9994,peggydnew,Listed - refer to https://nt.gov.au/environment
...,...,...,...,...,...,...,...,...,...,...,...,...
217,REMOVE,,271024,42951,Sensitive,30,,,open,9994,peggydnew,Listed - refer to https://nt.gov.au/environment
218,REMOVE,,139906,698942,Sensitive,30,,,open,9994,peggydnew,Listed - refer to https://nt.gov.au/environment
219,REMOVE,,271013,724093,Sensitive,30,,,open,9994,peggydnew,Listed - refer to https://nt.gov.au/environment
220,REMOVE,,234788,918383,Sensitive,30,,,open,9994,peggydnew,Listed - refer to https://nt.gov.au/environment


## Additions
Match to an inat taxon and have a new sensitive or conservation status 

In [15]:
# Headers:
# Taxon_Name,Status,Authority,IUCN_equivalent,Description,iNaturalist_Place_ID,url,Taxon_Geoprivacy,Username,taxon_id

additions = pd.DataFrame(mergedstatuses[mergedstatuses['action'] == "ADD"])
additions = additions[['action','state_scientificName','inat_id','inat_taxon_id','state_status','state_iucn_equivalent','state_authority','state_url','state_geoprivacy','place_id','username','description']]
additions = additions.rename(columns={'state_scientificName':'Taxon_Name',
                                      'state_status':'Status',
                                      'state_authority':'Authority',
                                      'state_iucn_equivalent':'IUCN_equivalent',
                                      'description':'Description',
                                      'place_id':'iNaturalst_Place_ID',
                                      'state_url':'url',
                                      'state_geoprivacy':'taxon_Geoprivacy',
                                      'inat_taxon_id':'taxon_id'})
additions

Unnamed: 0,action,Taxon_Name,inat_id,taxon_id,Status,IUCN_equivalent,Authority,url,taxon_Geoprivacy,iNaturalst_Place_ID,username,Description
0,ADD,Abrodictyum obscurum,,503202,Endangered,40,Territory Parks and Wildlife Conservation Act ...,https://bie.ala.org.au/species/https://id.biod...,open,9994,peggydnew,Listed - refer to https://nt.gov.au/environment
55,ADD,Olearia macdonnellensis,,1458331,Endangered,40,Territory Parks and Wildlife Conservation Act ...,https://bie.ala.org.au/species/https://id.biod...,open,9994,peggydnew,Listed - refer to https://nt.gov.au/environment
207,ADD,Chromolaena odorata,,199400,Sensitive,30,Northern Territory Department of Environment,,obscured,9994,peggydnew,Listed - refer to https://nt.gov.au/environment
208,ADD,Parthenium hysterophorus,,126424,Sensitive,30,Northern Territory Department of Environment,,obscured,9994,peggydnew,Listed - refer to https://nt.gov.au/environment


In [16]:
# write these to output files
mergedstatusesprintfriendly.to_csv(projectdir + "data/out/summaries/nt.csv",index=False)
updates.to_csv(projectdir + "data/out/updates-nt.csv", index=False)
additions.to_csv(projectdir + "data/out/additions-nt.csv", index=False)