# iNaturalist status updates by state

Using `inat-aust-status-taxa.csv` (iNaturalist statuses joined to taxon names), the file produced by `collate-status-taxa.ipynb`, generate the lists needed to update the iNaturalist statuses for a state.

## Prep - common to all states
1. Read in the iNaturalist statuses and select this state's entries
2. Read in the state conservation and sensitive lists
3. Prepare fields, including IUCN-equivalent mappings and matching to the iNaturalist taxonomy
4. Merge and compare the state and iNaturalist lists
5. Create the updates/removals list
6. Create additions list
7. Save files
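Since steps 1 to 7 are the same for every state, the values that change from state to state could be collected in one place. The sketch below is a minimal, hypothetical layout (`StateConfig`, `STATE_CONFIG` and `ala_list_url` are not part of `list_functions`); the ACT values are the ones used further down in this notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateConfig:
    """Per-state inputs for the status comparison (hypothetical structure)."""
    stateregex: str          # matched against authority, place_name and place_display_name
    urlregex: str            # matched against the status url
    place_id: int            # iNaturalist place id for the state
    authority: str           # authority recorded for the conservation list
    sensitive_list: str      # ALA data resource id of the sensitive list
    conservation_list: str   # ALA data resource id of the conservation list


# Hypothetical registry; only the ACT entry is filled in, using values from this notebook
STATE_CONFIG = {
    "act": StateConfig(
        stateregex="ACT Government|Australian Capital Territory| ACT, AU",
        urlregex=".act.gov",
        place_id=12986,
        authority="Nature Conservation Act 2014 (ACT)",
        sensitive_list="dr2627",
        conservation_list="dr649",
    ),
}


def ala_list_url(druid: str) -> str:
    """Build the lists.ala.org.au web service URL used to download a species list."""
    return f"https://lists.ala.org.au/ws/speciesListItems/{druid}?max=10000&includeKVP=true"


config = STATE_CONFIG["act"]
print(ala_list_url(config.conservation_list))
```

Adding another state would then mean adding one entry rather than editing each cell.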

## 1. Read in the iNaturalist statuses and select the ACT entries

In [1]:
import os
import sys

import pandas as pd

projectdir = os.path.dirname(os.getcwd()) + "/"  # parent dir of cwd
sourcedir = projectdir + "data/in/"
sys.path.append(os.path.abspath(projectdir + "notebooks/includes/"))
import list_functions as lf

# read in the statuses file
taxastatus = pd.read_csv(sourcedir + "inat-aust-status-taxa.csv", encoding='UTF-8', na_filter=False, dtype=str)

# select the statuses that belong to one state
def filter_state_statuses(stateregex: str, urlregex: str) -> pd.DataFrame:
    # a status belongs to the state if its place display name, place name or authority
    # matches stateregex, or its url matches urlregex
    mask = (taxastatus['place_display_name'].str.contains(stateregex)
            | taxastatus['place_name'].str.contains(stateregex)
            | taxastatus['url'].str.contains(urlregex)
            | taxastatus['authority'].str.contains(stateregex))
    statedf = taxastatus[mask].drop_duplicates()
    return statedf.sort_values(['taxon_id', 'user_id'])

inatstatuses = filter_state_statuses("ACT Government|Australian Capital Territory| ACT, AU", ".act.gov")
inatstatuses = inatstatuses.add_prefix("inat_")
inatstatuses.groupby(['inat_status']).size()


inat_status
Critically Endangered                 6
Endangered                           18
Regionally Conservation Dependent     1
Vulnerable                           22
rare                                  3
vulnerable                            2
dtype: int64
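A row is kept if any one of the four fields matches, so the filter is a logical OR of four `str.contains` masks. The sketch below runs that idea on a few made-up rows shaped like `inat-aust-status-taxa.csv` (only the filtering columns). Note that `.` in `.act.gov` is a regex wildcard; a pattern such as `r"\.act\.gov"` would match the literal domain only.

```python
import pandas as pd

# Hypothetical rows shaped like inat-aust-status-taxa.csv (filtering columns only)
statuses = pd.DataFrame({
    "taxon_id": ["1", "2", "3", "4"],
    "user_id": ["10", "10", "11", "12"],
    "authority": ["ACT Government", "", "NSW Government", ""],
    "url": ["", "https://www.environment.act.gov.au/x", "", ""],
    "place_name": ["", "", "New South Wales", "Victoria"],
    "place_display_name": ["", "", "New South Wales, AU", "Victoria, AU"],
})


def select_state(df: pd.DataFrame, stateregex: str, urlregex: str) -> pd.DataFrame:
    # A row belongs to the state if any one of the four fields matches (logical OR)
    mask = (df["authority"].str.contains(stateregex)
            | df["place_name"].str.contains(stateregex)
            | df["place_display_name"].str.contains(stateregex)
            | df["url"].str.contains(urlregex))
    return df[mask].drop_duplicates().sort_values(["taxon_id", "user_id"])


act = select_state(statuses, "ACT Government|Australian Capital Territory| ACT, AU", ".act.gov")
print(act["taxon_id"].tolist())  # ['1', '2']
```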

## 2. Read in the state lists
Read in the ACT sensitive and conservation species lists from lists.ala.org.au:
 * Set conservation list value `authority` = "Nature Conservation Act 2014 (ACT)"
 * Set sensitive list value: `geoprivacy` = `obscured`

In [2]:
# %%script echo skipping  # uncomment this line to skip the download and reuse the CSVs saved earlier
# Download lists data. Retrieve binomial and trinomial names from GBIF. Save locally to CSV

sensitivelist = lf.download_ala_specieslist("https://lists.ala.org.au/ws/speciesListItems/dr2627?max=10000&includeKVP=true")
sensitivelist = lf.kvp_to_columns(sensitivelist)
sensitivelist.to_csv(sourcedir + "state-lists/act-ala-sensitive.csv", index=False)

conservationlist = lf.download_ala_specieslist("https://lists.ala.org.au/ws/speciesListItems/dr649?max=10000&includeKVP=true")
conservationlist = lf.kvp_to_columns(conservationlist)
conservationlist.to_csv(sourcedir + "state-lists/act-ala-conservation.csv", index=False)

In [3]:
# Read the saved lists and merge them on name
sensitivelist = pd.read_csv(sourcedir + "state-lists/act-ala-sensitive.csv", dtype=str)
conservationlist = pd.read_csv(sourcedir + "state-lists/act-ala-conservation.csv", dtype=str)

sensitivelist['geoprivacy'] = 'obscured'
conservationlist['authority'] = "Nature Conservation Act 2014 (ACT)"
# outer merge: keep names that appear on either list
statelist = conservationlist[['name', 'lsid', 'status', 'authority']].merge(
    sensitivelist[['name', 'lsid', 'geoprivacy']],
    how="outer", on='name', suffixes=('_conservation', '_sensitive'))
statelist = statelist.rename(columns={'name': 'scientificName'})
statelist = statelist.add_prefix("state_")
print("Conservation list entries:" + str(len(conservationlist)))
print("Sensitive list entries:" + str(len(sensitivelist)))
statelist

Conservation list entries:65
Sensitive list entries:211


,state_scientificName,state_lsid_conservation,state_status,state_authority,state_lsid_sensitive,state_geoprivacy
0,Anthochaera phrygia,https://biodiversity.org.au/afd/taxa/31869a0e-...,Critically Endangered,Nature Conservation Act 2014 (ACT),https://biodiversity.org.au/afd/taxa/31869a0e-...,obscured
1,Aprasia parapulchella,https://biodiversity.org.au/afd/taxa/0d74fa05-...,Vulnerable,Nature Conservation Act 2014 (ACT),https://biodiversity.org.au/afd/taxa/0d74fa05-...,obscured
2,Bettongia gaimardi,https://biodiversity.org.au/afd/taxa/19c9bfdf-...,Conservation Dependent,Nature Conservation Act 2014 (ACT),https://biodiversity.org.au/afd/taxa/19c9bfdf-...,obscured
3,Bidyanus bidyanus,https://biodiversity.org.au/afd/taxa/05866f31-...,Endangered,Nature Conservation Act 2014 (ACT),,
4,Bossiaea grayi,https://id.biodiversity.org.au/node/apni/2910201,Endangered,Nature Conservation Act 2014 (ACT),https://id.biodiversity.org.au/node/apni/2910201,obscured
...,...,...,...,...,...,...
230,Todea barbara,,,,https://id.biodiversity.org.au/node/apni/2916843,obscured
231,Toxidia peron,,,,https://biodiversity.org.au/afd/taxa/724c2f19-...,obscured
232,Tympanocryptis lineata pinguicolla,,,,https://biodiversity.org.au/afd/taxa/5bceebc1-...,obscured
233,Varanus rosenbergi,,,,https://biodiversity.org.au/afd/taxa/a01a6bb4-...,obscured
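The outer merge on `name` keeps taxa that are on either list, which is why sensitive-only rows have no `state_status` and conservation-only rows have no `state_geoprivacy`. A small sketch with made-up rows; passing `indicator=True` (a standard `DataFrame.merge` option, not used in the cell above) adds a `_merge` column recording which list each name came from.

```python
import pandas as pd

# Hypothetical miniature versions of the two ALA lists
conservation = pd.DataFrame({"name": ["Bidyanus bidyanus", "Bossiaea grayi"],
                             "status": ["Endangered", "Endangered"]})
sensitive = pd.DataFrame({"name": ["Bossiaea grayi", "Todea barbara"]})
sensitive["geoprivacy"] = "obscured"

# Outer merge keeps names from either list; _merge is left_only, right_only or both
merged = conservation.merge(sensitive, how="outer", on="name", indicator=True)
print(merged[["name", "status", "geoprivacy", "_merge"]])
```

Counting `_merge` values would give the conservation-only, sensitive-only and shared totals in one step.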


In [4]:
# check for conservation LSIDs that appear on more than one row (conflicting entries for one taxon)
dupinformation = statelist.groupby('state_lsid_conservation').filter(lambda x: len(x) > 1)
dupinformation

,state_scientificName,state_lsid_conservation,state_status,state_authority,state_lsid_sensitive,state_geoprivacy
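The empty result means no conservation LSID appears on more than one row. Rows without a conservation LSID (the sensitive-only entries) have no key to compare, so the sketch below excludes them explicitly; `Series.duplicated` would otherwise treat all missing values as the same key. The data are hypothetical.

```python
import pandas as pd

# Hypothetical list: one LSID appears under two names, and two rows have no LSID
statelist = pd.DataFrame({
    "state_scientificName": ["Isoodon obesulus", "Isoodon obesulus obesulus",
                             "Todea barbara", "Toxidia peron"],
    "state_lsid_conservation": ["lsid-1", "lsid-1", None, None],
})

# keep=False flags every row of a repeated key, not just the second and later ones
has_lsid = statelist["state_lsid_conservation"].notna()
dups = statelist[has_lsid & statelist["state_lsid_conservation"].duplicated(keep=False)]
print(dups)
```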


In [5]:

statelist.groupby('state_status',dropna=False).size()

state_status
Conservation Dependent      1
Critically Endangered       8
Endangered                 24
Vulnerable                 32
NaN                       170
dtype: int64

## 3. Prepare other fields: IUCN equivalent, place ID, username
* ensure each state status maps to an IUCN equivalent
* IUCN statuses = {'Not Evaluated', 'Data Deficient', 'Least Concern', 'Near Threatened', 'Vulnerable', 'Endangered', 'Critically Endangered', 'Extinct in the Wild', 'Extinct'}

In [6]:
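# iNaturalist IUCN-equivalent codes: 50 = Critically Endangered, 40 = Endangered, 30 = Vulnerable
# Conservation Dependent is not in the IUCN list above, so it is treated as Vulnerable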
iucnStatusMappings = {
    'Critically Endangered':'50',
    'Endangered':'40',
    'Vulnerable':'30',
    'Conservation Dependent':'30'
}
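Taxa on the sensitive list only, and any status missing from the dictionary, fall back to `'30'` through `fillna` in step 4. A sketch, on a made-up series, of that lookup plus a check that reports statuses the dictionary does not cover, so an unexpected category is not silently mapped to Vulnerable:

```python
import pandas as pd

iucnStatusMappings = {
    "Critically Endangered": "50",
    "Endangered": "40",
    "Vulnerable": "30",
    "Conservation Dependent": "30",
}

# Hypothetical statuses: None stands for a sensitive-only taxon, "Extinct" is unmapped
status = pd.Series(["Endangered", "Conservation Dependent", None, "Extinct"])

# Values not in the dictionary become NaN, then default to Vulnerable ("30")
iucn = status.map(iucnStatusMappings).fillna("30")
print(iucn.tolist())  # ['40', '30', '30', '30']

# Report real statuses that only received the default
unmapped = set(status.dropna()) - set(iucnStatusMappings)
print(unmapped)  # {'Extinct'}
```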

In [7]:
inatstatuses.groupby(['inat_place_id','inat_place_name','inat_place_display_name'])['inat_place_id'].count()
# all ACT statuses use place_id 12986 (Australian Capital Territory, AU), so set it in the next cell


inat_place_id  inat_place_name               inat_place_display_name         
12986          Australian Capital Territory  Australian Capital Territory, AU    52
Name: inat_place_id, dtype: int64
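Rather than hard-coding the place, the id could be read from the filtered statuses and checked to be unique. A minimal sketch with a hypothetical stand-in for `inatstatuses`:

```python
import pandas as pd

# Hypothetical stand-in for inatstatuses (ids are strings because of dtype=str)
inatstatuses = pd.DataFrame({"inat_place_id": ["12986", "12986", "12986"]})

place_ids = inatstatuses["inat_place_id"].unique()
# Stop if the state's statuses are attached to more than one place
assert len(place_ids) == 1, f"expected one place id, found {list(place_ids)}"
place_id = int(place_ids[0])
print(place_id)  # 12986
```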

In [8]:
# set placeid
place_id = 12986
# match the state names to active taxa in the iNaturalist taxonomy (needed for additions)
inattaxa = pd.read_csv(sourcedir + "inaturalist-australia-9/inaturalist-australia-9-taxa.csv", dtype=str,
                       usecols=['id', 'name', 'rank', 'observations_count', 'is_active'])
inattaxa = inattaxa[inattaxa['is_active'] == 't']
inattaxa = inattaxa.rename(columns={'id': 'taxon_id', 'name': 'taxon_name'})
inattaxa = inattaxa.add_prefix("inat_")
statelist = statelist[['state_scientificName', 'state_status', 'state_geoprivacy', 'state_lsid_conservation',
                       'state_lsid_sensitive', 'state_authority']].merge(
    inattaxa, how="left", left_on='state_scientificName', right_on='inat_taxon_name', suffixes=(None, '_inat'))
statelist

,state_scientificName,state_status,state_geoprivacy,state_lsid_conservation,state_lsid_sensitive,state_authority,inat_taxon_id,inat_taxon_name,inat_rank,inat_observations_count,inat_is_active
0,Anthochaera phrygia,Critically Endangered,obscured,https://biodiversity.org.au/afd/taxa/31869a0e-...,https://biodiversity.org.au/afd/taxa/31869a0e-...,Nature Conservation Act 2014 (ACT),144707,Anthochaera phrygia,species,83,t
1,Aprasia parapulchella,Vulnerable,obscured,https://biodiversity.org.au/afd/taxa/0d74fa05-...,https://biodiversity.org.au/afd/taxa/0d74fa05-...,Nature Conservation Act 2014 (ACT),36957,Aprasia parapulchella,species,40,t
2,Bettongia gaimardi,Conservation Dependent,obscured,https://biodiversity.org.au/afd/taxa/19c9bfdf-...,https://biodiversity.org.au/afd/taxa/19c9bfdf-...,Nature Conservation Act 2014 (ACT),42996,Bettongia gaimardi,species,198,t
3,Bidyanus bidyanus,Endangered,,https://biodiversity.org.au/afd/taxa/05866f31-...,,Nature Conservation Act 2014 (ACT),95759,Bidyanus bidyanus,species,38,t
4,Bossiaea grayi,Endangered,obscured,https://id.biodiversity.org.au/node/apni/2910201,https://id.biodiversity.org.au/node/apni/2910201,Nature Conservation Act 2014 (ACT),795624,Bossiaea grayi,species,3,t
...,...,...,...,...,...,...,...,...,...,...,...
230,Todea barbara,,obscured,,https://id.biodiversity.org.au/node/apni/2916843,,323949,Todea barbara,species,1630,t
231,Toxidia peron,,obscured,,https://biodiversity.org.au/afd/taxa/724c2f19-...,,,,,,
232,Tympanocryptis lineata pinguicolla,,obscured,,https://biodiversity.org.au/afd/taxa/5bceebc1-...,,,,,,
233,Varanus rosenbergi,,obscured,,https://biodiversity.org.au/afd/taxa/a01a6bb4-...,,39441,Varanus rosenbergi,species,806,t
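The left merge leaves `inat_taxon_id` empty when a state name has no active iNaturalist taxon (for example `Toxidia peron` above), and it repeats a state row if the name matches more than one active taxon. A sketch of both checks on made-up data, using `indicator=True` on the merge; `Homonymus alpha` and its taxon ids are invented:

```python
import pandas as pd

# Hypothetical state names and active iNaturalist taxa
state = pd.DataFrame({"state_scientificName": ["Todea barbara", "Toxidia peron", "Homonymus alpha"]})
inattaxa = pd.DataFrame({
    "inat_taxon_id": ["323949", "900001", "900002"],
    "inat_taxon_name": ["Todea barbara", "Homonymus alpha", "Homonymus alpha"],
})

matched = state.merge(inattaxa, how="left", left_on="state_scientificName",
                      right_on="inat_taxon_name", indicator=True)

# Names with no active taxon: these become NO MATCH in step 4
unmatched = matched.loc[matched["_merge"] == "left_only", "state_scientificName"].tolist()

# Names matching several taxa: the state row is duplicated once per taxon
counts = matched["state_scientificName"].value_counts()
multiple = counts[counts > 1].index.tolist()
print(unmatched, multiple)  # ['Toxidia peron'] ['Homonymus alpha']
```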


## 4. Merge and compare the state and inaturalist lists


In [9]:
mergedstatuses = statelist.merge(inatstatuses,how="outer",left_on='state_scientificName',right_on='inat_scientificName')

# add extra fields 
mergedstatuses['place_id'] = str(place_id)
mergedstatuses['username'] = 'peggydnew'
mergedstatuses['description'] = "Listed - Refer to https://www.environment.act.gov.au/nature-conservation/conservation-and-ecological-communities/threatened-species-factsheets"
mergedstatuses['state_url'] = "https://bie.ala.org.au/species/" + mergedstatuses['state_lsid_conservation'].fillna(mergedstatuses['state_lsid_sensitive'])
mergedstatuses['state_iucn_equivalent'] = mergedstatuses['state_status'].map(iucnStatusMappings).fillna('30') # map to dictionary, Vulnerable default
mergedstatuses['state_status'] = mergedstatuses['state_status'].fillna('Sensitive')
mergedstatuses['state_geoprivacy'] = mergedstatuses['state_geoprivacy'].fillna('open')
mergedstatuses['inat_taxon_id'] = mergedstatuses['inat_taxon_id_y'].fillna(mergedstatuses['inat_taxon_id_x'])
mergedstatuses['inat_scientificName'] = mergedstatuses['inat_scientificName'].fillna(mergedstatuses['inat_taxon_name'])

# UPDATE: inat status and state status both exist
# REMOVE: inat status exists, state status does not
# ADD: state status exists, inat status does not (matching taxon)
# NO MATCH: state status exists, inat taxa not found
mergedstatuses['action'] = 'na'
mergedstatuses.loc[mergedstatuses['inat_id'].notnull() & mergedstatuses['state_scientificName'].notnull(), 'action'] = "UPDATE"
mergedstatuses.loc[mergedstatuses['inat_id'].notnull() & mergedstatuses['state_scientificName'].isnull(), 'action'] = "REMOVE"
mergedstatuses.loc[mergedstatuses['inat_id'].isnull() & mergedstatuses['inat_taxon_id'].notnull(), 'action'] = "ADD"
mergedstatuses.loc[mergedstatuses['inat_id'].isnull() & mergedstatuses['inat_taxon_id'].isnull(), 'action'] = "NO MATCH"

# only update those with different values: UPDATE rows whose status, geoprivacy
# and IUCN equivalent already match iNaturalist become NO CHANGE
unchanged = ((mergedstatuses['action'] == "UPDATE")
             & (mergedstatuses['state_status'] == mergedstatuses['inat_status'])
             & (mergedstatuses['state_geoprivacy'] == mergedstatuses['inat_geoprivacy'])
             & (mergedstatuses['state_iucn_equivalent'] == mergedstatuses['inat_iucn']))
mergedstatuses.loc[unchanged, 'action'] = "NO CHANGE"


# print-friendly summary columns, saved to data/out/summaries/act.csv in the last cell
mergedstatusesprintfriendly = mergedstatuses[['action', 'inat_id', 'inat_taxon_id', 'state_scientificName',
                                              'inat_scientificName', 'state_status', 'inat_status',
                                              'state_geoprivacy', 'inat_geoprivacy', 'state_iucn_equivalent',
                                              'inat_iucn', 'state_authority', 'inat_authority', 'state_url',
                                              'inat_url', 'inat_description', 'inat_place_display_name',
                                              'inat_current_synonymous_taxon_ids']]


In [10]:
mergedstatuses.groupby('action').size()

action
ADD          150
NO CHANGE     24
NO MATCH      35
REMOVE         2
UPDATE        26
dtype: int64
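The four `.loc` assignments in step 4 are mutually exclusive and together cover every row, so the `'na'` default is never left in place. The same classification can be written in one call with `numpy.select`; this is an alternative formulation sketched on placeholder rows, not the code that produced the counts above:

```python
import numpy as np
import pandas as pd

# Hypothetical merged rows, one per outcome; ids and names are placeholders
merged = pd.DataFrame({
    "inat_id": ["1", "2", None, None],             # existing iNaturalist status id
    "inat_taxon_id": ["10", "20", "30", None],     # matched iNaturalist taxon id
    "state_scientificName": ["Taxon a", None, "Taxon c", "Taxon d"],
})

has_status = merged["inat_id"].notna()
conditions = [
    has_status & merged["state_scientificName"].notna(),   # on both lists
    has_status & merged["state_scientificName"].isna(),    # iNaturalist only
    ~has_status & merged["inat_taxon_id"].notna(),         # state only, taxon matched
    ~has_status & merged["inat_taxon_id"].isna(),          # state only, no taxon found
]
choices = ["UPDATE", "REMOVE", "ADD", "NO MATCH"]
merged["action"] = np.select(conditions, choices, default="na")
print(merged["action"].tolist())  # ['UPDATE', 'REMOVE', 'ADD', 'NO MATCH']
```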

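## 5. Create updates/removals list
Updates match cleanly to an iNaturalist taxon and an existing iNaturalist status. An update is only made if the status, geoprivacy or IUCN equivalent differs; rows where all three already match are marked NO CHANGE. Removals are existing iNaturalist ACT statuses with no matching entry on either state list.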

In [11]:
# Headers:
# action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
updates = pd.DataFrame(mergedstatuses[mergedstatuses['action'].isin(['UPDATE','REMOVE'])])
updates = updates[['action','state_scientificName','inat_id','inat_taxon_id','state_status','state_iucn_equivalent','state_authority','state_url','state_geoprivacy','place_id','username','description']]
updates.columns = updates.columns.str.replace("state_", "", regex=True)
updates.columns = updates.columns.str.replace("inat_", "", regex=True)
updates = updates.rename(columns={'scientificName':'taxon_name'})
updates

,action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
0,UPDATE,Anthochaera phrygia,270185,144707,Critically Endangered,50,Nature Conservation Act 2014 (ACT),https://bie.ala.org.au/species/https://biodive...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
2,UPDATE,Bettongia gaimardi,270186,42996,Conservation Dependent,30,Nature Conservation Act 2014 (ACT),https://bie.ala.org.au/species/https://biodive...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
5,UPDATE,Botaurus poiciloptilus,270188,5032,Endangered,40,Nature Conservation Act 2014 (ACT),https://bie.ala.org.au/species/https://biodive...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
9,UPDATE,Calyptorhynchus lathami lathami,270190,720267,Vulnerable,30,Nature Conservation Act 2014 (ACT),https://bie.ala.org.au/species/https://biodive...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
11,UPDATE,Climacteris picumnus victoriae,270191,713108,Vulnerable,30,Nature Conservation Act 2014 (ACT),https://bie.ala.org.au/species/https://biodive...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
23,UPDATE,Hieraaetus morphnoides,270196,5150,Vulnerable,30,Nature Conservation Act 2014 (ACT),https://bie.ala.org.au/species/https://biodive...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
26,UPDATE,Isoodon obesulus obesulus,270198,355856,Endangered,40,Nature Conservation Act 2014 (ACT),https://bie.ala.org.au/species/https://biodive...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
27,UPDATE,Keyacris scurra,152225,761022,Endangered,40,Nature Conservation Act 2014 (ACT),https://bie.ala.org.au/species/https://biodive...,open,12986,peggydnew,Listed - Refer to https://www.environment.act....
29,UPDATE,Lathamus discolor,270200,19284,Critically Endangered,50,Nature Conservation Act 2014 (ACT),https://bie.ala.org.au/species/https://biodive...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
33,UPDATE,Litoria castanea,270201,23609,Critically Endangered,50,Nature Conservation Act 2014 (ACT),https://bie.ala.org.au/species/https://biodive...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
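The two `str.replace` calls strip the `state_` and `inat_` prefixes. With `regex=True` they match anywhere in a column name; anchoring a single pattern at the start with `^` limits the change to a leading prefix. A sketch on an empty frame with some of the same column names:

```python
import pandas as pd

# Empty frame with a subset of the updates columns
updates = pd.DataFrame(columns=["action", "state_scientificName", "inat_id", "inat_taxon_id",
                                "state_status", "place_id"])

# Remove a leading "state_" or "inat_" only; "^" anchors the match to the start
updates.columns = updates.columns.str.replace(r"^(state|inat)_", "", regex=True)
updates = updates.rename(columns={"scientificName": "taxon_name"})
print(list(updates.columns))  # ['action', 'taxon_name', 'id', 'taxon_id', 'status', 'place_id']
```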


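## 6. Create additions list
Additions are state-listed taxa that match an active iNaturalist taxon but have no existing iNaturalist status for the ACT. They take the state's conservation status or, for taxa on the sensitive list only, the status `Sensitive`.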


In [12]:
# Headers:
# Taxon_Name,Status,Authority,IUCN_equivalent,Description,iNaturalist_Place_ID,url,Taxon_Geoprivacy,Username,taxon_id

additions = mergedstatuses[mergedstatuses['action'] == "ADD"].copy()
additions = additions[['action','state_scientificName','inat_id','inat_taxon_id','state_status','state_iucn_equivalent','state_authority','state_url','state_geoprivacy','place_id','username','description']]
# rename to the header listed above
additions = additions.rename(columns={'state_scientificName': 'Taxon_Name',
                                      'state_status': 'Status',
                                      'state_authority': 'Authority',
                                      'state_iucn_equivalent': 'IUCN_equivalent',
                                      'description': 'Description',
                                      'place_id': 'iNaturalist_Place_ID',
                                      'state_url': 'url',
                                      'state_geoprivacy': 'Taxon_Geoprivacy',
                                      'username': 'Username',
                                      'inat_taxon_id': 'taxon_id'})
additions

,action,Taxon_Name,inat_id,taxon_id,Status,IUCN_equivalent,Authority,url,Taxon_Geoprivacy,iNaturalist_Place_ID,Username,Description
7,ADD,Callocephalon fimbriatum,,116842,Endangered,40,Nature Conservation Act 2014 (ACT),https://bie.ala.org.au/species/https://biodive...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
8,ADD,Calyptorhynchus lathami,,116846,Vulnerable,30,Nature Conservation Act 2014 (ACT),https://bie.ala.org.au/species/https://biodive...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
10,ADD,Climacteris picumnus,,7802,Vulnerable,30,Nature Conservation Act 2014 (ACT),https://bie.ala.org.au/species/https://biodive...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
14,ADD,Dasyurus maculatus,,40166,Vulnerable,30,Nature Conservation Act 2014 (ACT),https://bie.ala.org.au/species/https://biodive...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
21,ADD,Gentiana baeuerlenii,,1444037,Endangered,40,Nature Conservation Act 2014 (ACT),https://bie.ala.org.au/species/https://id.biod...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
...,...,...,...,...,...,...,...,...,...,...,...,...
225,ADD,Thelymitra pauciflora,,406514,Sensitive,30,,https://bie.ala.org.au/species/https://id.biod...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
226,ADD,Thelymitra peniculata,,790420,Sensitive,30,,https://bie.ala.org.au/species/https://id.biod...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
227,ADD,Thelymitra rubra,,516247,Sensitive,30,,https://bie.ala.org.au/species/https://id.biod...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
228,ADD,Thelymitra simulata,,908180,Sensitive,30,,https://bie.ala.org.au/species/https://id.biod...,obscured,12986,peggydnew,Listed - Refer to https://www.environment.act....
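The header comment at the top of the additions cell lists the expected column names. A small hypothetical helper, `missing_columns`, could compare the frame against that header before saving; here it is run on a made-up frame with two misnamed columns (`iNaturalst_Place_ID`, `taxon_Geoprivacy`):

```python
import pandas as pd

# Expected header, copied from the comment at the top of the additions cell
EXPECTED = ["Taxon_Name", "Status", "Authority", "IUCN_equivalent", "Description",
            "iNaturalist_Place_ID", "url", "Taxon_Geoprivacy", "Username", "taxon_id"]


def missing_columns(df: pd.DataFrame, expected: list[str]) -> list[str]:
    """Return expected header names absent from df (exact, case-sensitive comparison)."""
    return [col for col in expected if col not in df.columns]


# Hypothetical additions frame with two misnamed columns
additions = pd.DataFrame(columns=["Taxon_Name", "Status", "Authority", "IUCN_equivalent",
                                  "Description", "iNaturalst_Place_ID", "url",
                                  "taxon_Geoprivacy", "Username", "taxon_id"])
print(missing_columns(additions, EXPECTED))  # ['iNaturalist_Place_ID', 'Taxon_Geoprivacy']
```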


## 7. Save files

In [13]:
# write these to output files
mergedstatusesprintfriendly.to_csv(projectdir + "data/out/summaries/act.csv",index=False)
updates.to_csv(projectdir + "data/out/updates-act.csv", index=False)
additions.to_csv(projectdir + "data/out/additions-act.csv", index=False)
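`to_csv` does not create missing directories, so the cell above assumes `data/out/summaries/` already exists. A sketch of a hypothetical `save_outputs` helper that creates the directories first, exercised in a temporary directory:

```python
import os
import tempfile

import pandas as pd


def save_outputs(outdir: str, state: str, frames: dict[str, pd.DataFrame]) -> list[str]:
    """Write each frame under outdir, creating parent directories first.

    frames maps a relative path pattern containing a {state} placeholder to a DataFrame.
    """
    written = []
    for pattern, df in frames.items():
        path = os.path.join(outdir, pattern.format(state=state))
        os.makedirs(os.path.dirname(path), exist_ok=True)  # e.g. <outdir>/summaries/
        df.to_csv(path, index=False)
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = save_outputs(tmp, "act", {
        "summaries/{state}.csv": pd.DataFrame({"action": ["ADD"]}),
        "updates-{state}.csv": pd.DataFrame({"action": ["UPDATE"]}),
        "additions-{state}.csv": pd.DataFrame({"action": ["ADD"]}),
    })
    roundtrip = pd.read_csv(paths[1])  # read one file back before the directory is removed
    print([os.path.relpath(p, tmp) for p in paths])
```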
