# iNaturalist status updates by state - SA

Using `inat-aust-status-taxa.csv` (statuses joined to taxon names), the file produced by `collate-status-taxa.ipynb`, generate the lists used to update iNaturalist statuses for South Australia.

## Prep - common to all states
1. Read in the iNaturalist statuses & keep only the entries for this state
2. Read in the state conservation and sensitive lists
3. Prep fields incl IUCN equivalent mappings and matching to iNat taxonomy
4. Merge and compare the state and iNaturalist lists
5. Create update/removals list
6. Create additions list
7. Save files

## 1. Read in the iNaturalist statuses & filter for SA

In [1]:
import pandas as pd
import sys
import os
projectdir = os.path.dirname(os.getcwd()) + "/" # parent dir of cwd
sourcedir = projectdir + "data/in/"
sys.path.append(os.path.abspath(projectdir + "notebooks/includes/"))
import list_functions  as lf

# read in the statuses file
taxastatus = pd.read_csv(sourcedir + "inat-aust-status-taxa.csv", encoding='UTF-8',na_filter=False,dtype=str)

# keep only the SA entries: match on authority, url, place name or place display name
def filter_state_statuses(stateregex: str, urlregex: str):
    authoritydf = taxastatus['authority'].drop_duplicates().sort_values()
    authoritydf = authoritydf[pd.Series(authoritydf).str.contains(stateregex)]
    urldf = taxastatus['url'].drop_duplicates().sort_values()
    urldf = urldf[pd.Series(urldf).str.contains(urlregex)]
    placedisplaydf = taxastatus['place_display_name'].drop_duplicates().sort_values()
    placedisplaydf = placedisplaydf[pd.Series(placedisplaydf).str.contains(stateregex)]
    placedf = taxastatus['place_name'].drop_duplicates().sort_values()
    placedf = placedf[pd.Series(placedf).str.contains(stateregex)]
    # concat all and remove duplicates
    statedf = pd.concat([taxastatus.apply(lambda row: row[taxastatus['place_display_name'].isin(placedisplaydf)]),
                         taxastatus.apply(lambda row: row[taxastatus['place_name'].isin(placedf)]),
                         taxastatus.apply(lambda row: row[taxastatus['url'].isin(urldf)]),
                         taxastatus.apply(
                             lambda row: row[taxastatus['authority'].isin(authoritydf)])]).drop_duplicates()
    return statedf.sort_values(['taxon_id', 'user_id'])

inatstatuses = filter_state_statuses(" SA |South Australia|SOUTH AUSTRALIA","sa.gov.au")
inatstatuses = inatstatuses.add_prefix("inat_")
inatstatuses.groupby(['inat_status']).size()


inat_status
EN              2
Endangered    265
NT              2
Protected       4
Rare          532
Vulnerable    213
endangered     20
dtype: int64
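
The filter above keeps a row when any of four columns matches. A minimal sketch of the same idea with a single boolean mask follows. The toy rows and the helper name `filter_by_masks` are invented for illustration, and unlike `filter_state_statuses` this version does not drop fully duplicated rows.

```python
import pandas as pd

# Toy stand-in for taxastatus: column names follow the notebook, rows are invented.
toy = pd.DataFrame({
    "taxon_id": ["3", "1", "2", "4"],
    "user_id": ["9", "9", "9", "9"],
    "authority": ["SA DEW", "NSW OEH", "IUCN", "Qld DES"],
    "url": ["", "", "https://www.environment.sa.gov.au/x", ""],
    "place_name": ["South Australia", "New South Wales", "", "Queensland"],
    "place_display_name": ["South Australia, AU", "New South Wales, AU", "", "Queensland, AU"],
})

def filter_by_masks(df, stateregex, urlregex):
    """Hypothetical helper: keep rows where any of the four columns matches."""
    mask = (
        df["authority"].str.contains(stateregex)
        | df["url"].str.contains(urlregex)
        | df["place_name"].str.contains(stateregex)
        | df["place_display_name"].str.contains(stateregex)
    )
    return df[mask].sort_values(["taxon_id", "user_id"])

# The dot is escaped here so that it only matches a literal "."
sa_rows = filter_by_masks(toy, r" SA |South Australia|SOUTH AUSTRALIA", r"sa\.gov\.au")
print(sa_rows["taxon_id"].tolist())
```

Taxon 3 is kept through its place name and taxon 2 through its URL. The authority "SA DEW" alone would not match, because the pattern ` SA ` requires a space on both sides.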

## 2. Read in the State lists
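Download the SA sensitive list (dr884) and conservation list (dr653) from lists.ala.org.au, save local copies, then read them back in and combine them.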

In [5]:
# %%script echo skipping # keep this line commented out to download the datasets from lists.ala.org.au and save them locally

sensitivelist = lf.download_ala_specieslist("https://lists.ala.org.au/ws/speciesListItems/dr884?max=10000&includeKVP=true")
sensitivelist = lf.kvp_to_columns(sensitivelist)
sensitivelist.to_csv(sourcedir + "state-lists/sa-ala-sensitive.csv", index=False)

conservationlist = lf.download_ala_specieslist("https://lists.ala.org.au/ws/speciesListItems/dr653?max=10000&includeKVP=true")
conservationlist = lf.kvp_to_columns(conservationlist)
conservationlist.to_csv(sourcedir + "sa-ala-conservation.csv", index=False)


In [8]:
# Read sensitive list data
sensitivelist = pd.read_csv(sourcedir + "state-lists/sa-ala-sensitive.csv", dtype=str)
sensitivelist['geoprivacy'] = 'obscured'
sensitivelist['status'] = 'Sensitive'
sensitivelist['authority'] = "South Australia Department for Environment and Water" 
conservationlist = pd.read_csv(sourcedir + "sa-ala-conservation.csv", dtype=str)
conservationlist['geoprivacy'] = 'open'
conservationlist['authority'] = "South Australia Department for Environment and Water"
statelist = conservationlist[['id','name','lsid','status','geoprivacy','authority']].merge(sensitivelist[['id','name','lsid','geoprivacy','status','authority']], how="outer",on='name',suffixes=('_conservation', '_sensitive'))
statelist['status'] = statelist['status_conservation'].fillna(statelist['status_sensitive'])
statelist['authority'] = statelist['authority_conservation'].fillna(statelist['authority_sensitive'])
statelist['geoprivacy'] = statelist['geoprivacy_sensitive'].fillna(statelist['geoprivacy_conservation'])
statelist = statelist.rename(columns = {'name':'scientificName'})
statelist = statelist.add_prefix("state_")
print("Conservation list entries:" + str(len(conservationlist)))
print("Sensitive list entries:" + str(len(sensitivelist)))
statelist

Conservation list entries:1120
Sensitive list entries:238


Unnamed: 0,state_id_conservation,state_scientificName,state_lsid_conservation,state_status_conservation,state_geoprivacy_conservation,state_authority_conservation,state_id_sensitive,state_lsid_sensitive,state_geoprivacy_sensitive,state_status_sensitive,state_authority_sensitive,state_status,state_authority,state_geoprivacy
0,4378829,Ornithorhynchus anatinus,https://biodiversity.org.au/afd/taxa/ac61fd14-...,Endangered,open,South Australia Department for Environment and...,,,,,,Endangered,South Australia Department for Environment and...,open
1,4379705,Tachyglossus aculeatus multiaculeatus,https://biodiversity.org.au/afd/taxa/e9289876-...,Endangered,open,South Australia Department for Environment and...,,,,,,Endangered,South Australia Department for Environment and...,open
2,4379443,Myrmecobius fasciatus,https://biodiversity.org.au/afd/taxa/6c72d199-...,Endangered,open,South Australia Department for Environment and...,,,,,,Endangered,South Australia Department for Environment and...,open
3,4378670,Dasycercus blythi,https://biodiversity.org.au/afd/taxa/926b1c33-...,Endangered,open,South Australia Department for Environment and...,,,,,,Endangered,South Australia Department for Environment and...,open
4,4378800,Dasyuroides byrnei,https://biodiversity.org.au/afd/taxa/c342ff42-...,Endangered,open,South Australia Department for Environment and...,,,,,,Endangered,South Australia Department for Environment and...,open
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1176,,Diuris behrii subsp. multilineata,,,,,4925012,https://id.biodiversity.org.au/name/apni/9732363,obscured,Sensitive,South Australia Department for Environment and...,Sensitive,South Australia Department for Environment and...,obscured
1177,,Ixodia achillaeoides subsp. arenicola,,,,,4925013,https://id.biodiversity.org.au/node/apni/2916083,obscured,Sensitive,South Australia Department for Environment and...,Sensitive,South Australia Department for Environment and...,obscured
1178,,Montia fontana subsp. chondrosperma,,,,,4925046,https://id.biodiversity.org.au/node/apni/2903260,obscured,Sensitive,South Australia Department for Environment and...,Sensitive,South Australia Department for Environment and...,obscured
1179,,Montia fontana,,,,,4925047,https://id.biodiversity.org.au/node/apni/2891942,obscured,Sensitive,South Australia Department for Environment and...,Sensitive,South Australia Department for Environment and...,obscured


In [9]:
statelist.groupby('state_status',dropna=False).size()

state_status
Endangered    276
Rare          594
Sensitive      61
Vulnerable    250
dtype: int64
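
The combined list gives the conservation status precedence over `Sensitive`, while geoprivacy takes the sensitive-list value first. That precedence would explain why far fewer rows end up with the `Sensitive` status than there are entries on the sensitive list. A minimal sketch of the rule on invented rows (the species names are placeholders):

```python
import pandas as pd

# Invented rows: "Species b" appears on both lists.
conservation = pd.DataFrame({
    "name": ["Species a", "Species b"],
    "status": ["Endangered", "Rare"],
    "geoprivacy": ["open", "open"],
})
sensitive = pd.DataFrame({
    "name": ["Species b", "Species c"],
    "status": ["Sensitive", "Sensitive"],
    "geoprivacy": ["obscured", "obscured"],
})

combined = conservation.merge(sensitive, how="outer", on="name",
                              suffixes=("_conservation", "_sensitive"))
# Status: conservation value first, sensitive value as the fallback.
combined["status"] = combined["status_conservation"].fillna(combined["status_sensitive"])
# Geoprivacy: sensitive value first, conservation value as the fallback.
combined["geoprivacy"] = combined["geoprivacy_sensitive"].fillna(combined["geoprivacy_conservation"])

result = combined.set_index("name")[["status", "geoprivacy"]]
print(result)
```

A species on both lists keeps its conservation status but is obscured.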

### 4. Equivalent IUCN statuses

In [14]:
iucnStatusMappings = {
    'Least concern':'10',
    'Special least concern':'10',
    'Critically Endangered':'50',
    'Endangered':'40',
    'Vulnerable':'30',
    'Extinct':'70',
    'Extinct in the wild':'70',
    'Near Threatened':'20',
    'Sensitive':'30',
    'Rare':'30'
}
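
The dictionary keys are case-sensitive, so `Series.map` returns NaN for any label that is not spelled exactly as a key. A minimal sketch of a check for unmapped labels follows. The shortened mapping and the sample labels are illustrative only (the `EN` and `endangered` spellings are taken from the iNaturalist status counts in section 1).

```python
import pandas as pd

# Shortened copy of the mapping, for illustration only.
mappings = {"Endangered": "40", "Vulnerable": "30", "Rare": "30", "Sensitive": "30"}

statuses = pd.Series(["Endangered", "Rare", "endangered", "EN"])
mapped = statuses.map(mappings)

# Labels that have no key in the mapping come back as NaN.
unmapped = sorted(statuses[mapped.isna()].unique())
print(unmapped)
```

Running the same check on `state_status` before the merge would flag any new status label that still needs an IUCN equivalent.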

### 5. Determine best place ID to use

In [10]:
inatstatuses.groupby(['inat_place_id','inat_place_name','inat_place_display_name'])['inat_place_id'].count()
# 6899 (South Australia, AU) holds almost all of the statuses - use it as the place_id below


inat_place_id  inat_place_name                 inat_place_display_name       
124968         South Australia, marine waters  South Australia, marine waters       2
6899           South Australia                 South Australia, AU               1036
Name: inat_place_id, dtype: int64
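
The place ID is read off the counts by eye. A minimal sketch of picking the most frequent ID programmatically, assuming the most-used place is the right one (the toy column below is invented):

```python
import pandas as pd

# Invented sample of place IDs, stored as strings as in the notebook.
toy = pd.DataFrame({"inat_place_id": ["6899", "6899", "124968", "6899"]})

counts = toy["inat_place_id"].value_counts()
place_id = int(counts.idxmax())  # ID with the highest count
print(place_id)
```

It is still worth eyeballing the grouped output, since a marine or regional place can legitimately differ from the state-wide one.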

## Merge iNaturalist statuses with State lists on scientificName


In [11]:
# set placeid
place_id = 6899
# get the inaturalist taxonomy matches for additions 
inattaxa = pd.read_csv(sourcedir + "inaturalist-australia-9/inaturalist-australia-9-taxa.csv",dtype=str,usecols=['id','name','rank','observations_count','is_active'])
inattaxa = inattaxa[inattaxa['is_active'] == 't']
inattaxa = inattaxa.rename(columns = {'id':'taxon_id','name':'taxon_name'})
inattaxa = inattaxa.add_prefix("inat_")
statelist = statelist[['state_scientificName','state_status','state_geoprivacy', 'state_lsid_conservation','state_lsid_sensitive','state_authority']].merge(inattaxa,how="left",left_on='state_scientificName',right_on='inat_taxon_name',suffixes=(None,'_inat'))
statelist

Unnamed: 0,state_scientificName,state_status,state_geoprivacy,state_lsid_conservation,state_lsid_sensitive,state_authority,inat_taxon_id,inat_taxon_name,inat_rank,inat_observations_count,inat_is_active
0,Ornithorhynchus anatinus,Endangered,open,https://biodiversity.org.au/afd/taxa/ac61fd14-...,,South Australia Department for Environment and...,43236,Ornithorhynchus anatinus,species,1837,t
1,Tachyglossus aculeatus multiaculeatus,Endangered,open,https://biodiversity.org.au/afd/taxa/e9289876-...,,South Australia Department for Environment and...,520695,Tachyglossus aculeatus multiaculeatus,subspecies,112,t
2,Myrmecobius fasciatus,Endangered,open,https://biodiversity.org.au/afd/taxa/6c72d199-...,,South Australia Department for Environment and...,40262,Myrmecobius fasciatus,species,151,t
3,Dasycercus blythi,Endangered,open,https://biodiversity.org.au/afd/taxa/926b1c33-...,,South Australia Department for Environment and...,74270,Dasycercus blythi,species,2,t
4,Dasyuroides byrnei,Endangered,open,https://biodiversity.org.au/afd/taxa/c342ff42-...,,South Australia Department for Environment and...,40249,Dasyuroides byrnei,species,9,t
...,...,...,...,...,...,...,...,...,...,...,...
1177,Diuris behrii subsp. multilineata,Sensitive,obscured,,https://id.biodiversity.org.au/name/apni/9732363,South Australia Department for Environment and...,,,,,
1178,Ixodia achillaeoides subsp. arenicola,Sensitive,obscured,,https://id.biodiversity.org.au/node/apni/2916083,South Australia Department for Environment and...,,,,,
1179,Montia fontana subsp. chondrosperma,Sensitive,obscured,,https://id.biodiversity.org.au/node/apni/2903260,South Australia Department for Environment and...,,,,,
1180,Montia fontana,Sensitive,obscured,,https://id.biodiversity.org.au/node/apni/2891942,South Australia Department for Environment and...,58990,Montia fontana,species,1433,t
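
Several unmatched names in the output above carry a `subsp.` marker. One possible cause, stated here as an assumption rather than something this notebook confirms, is that the iNaturalist `name` field stores infraspecific names without the rank marker. A minimal sketch of normalising names before matching (the helper `strip_rank_markers` and the marker list are hypothetical):

```python
import re
import pandas as pd

# Hypothetical list of rank markers to remove; extend as needed.
RANK_MARKERS = re.compile(r"\s+(subsp\.|ssp\.|var\.|f\.)\s+")

def strip_rank_markers(name: str) -> str:
    """Replace an infraspecific rank marker with a single space."""
    return RANK_MARKERS.sub(" ", name)

names = pd.Series(["Diuris behrii subsp. multilineata", "Montia fontana"])
normalised = names.map(strip_rank_markers)
print(normalised.tolist())
```

Any match made on a normalised name should be reviewed by hand before it is used for an update.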


In [29]:
# prepare the export fields, common to New template and Update template
mergedstatuses = statelist.merge(inatstatuses,how="outer",left_on='state_scientificName',right_on='inat_scientificName')

# add some extra fields
mergedstatuses['place_id'] = str(place_id)
mergedstatuses['username'] = 'peggydnew'
mergedstatuses['description'] = "Listed - refer to https://www.environment.sa.gov.au/topics/plants-and-animals/threatened-species-and-ecological-communities/threatened-species"
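# fall back to the sensitive-list LSID when there is no conservation-list LSID (fillna returns a copy, so assign it back)
mergedstatuses['state_lsid_conservation'] = mergedstatuses['state_lsid_conservation'].fillna(mergedstatuses['state_lsid_sensitive'])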
mergedstatuses['state_url'] = "https://bie.ala.org.au/species/" + mergedstatuses['state_lsid_conservation']
mergedstatuses['state_iucn_equivalent'] = mergedstatuses['state_status'].map(iucnStatusMappings)
#mergedstatuses['state_status'] = mergedstatuses['state_status'].fillna('Sensitive')
#mergedstatuses['state_geoprivacy'] = mergedstatuses['state_geoprivacy'].fillna('open')
mergedstatuses['inat_taxon_id'] = mergedstatuses['inat_taxon_id_y'].fillna(mergedstatuses['inat_taxon_id_x'])
mergedstatuses['inat_scientificName'] = mergedstatuses['inat_scientificName'].fillna(mergedstatuses['inat_taxon_name'])

# UPDATE: inat status and state status both exist
# REMOVE: inat status exists, state status does not
# ADD: state status exists, inat status does not (matching taxon)
# NO MATCH: state status exists, inat taxa not found
mergedstatuses['action'] = 'na'
mergedstatuses.loc[mergedstatuses['inat_id'].notnull() & mergedstatuses['state_scientificName'].notnull(), 'action'] = "UPDATE"
mergedstatuses.loc[mergedstatuses['inat_id'].notnull() & mergedstatuses['state_scientificName'].isnull(), 'action'] = "REMOVE"
mergedstatuses.loc[mergedstatuses['inat_id'].isnull() & mergedstatuses['inat_taxon_id'].notnull(), 'action'] = "ADD"
mergedstatuses.loc[mergedstatuses['inat_id'].isnull() & mergedstatuses['inat_taxon_id'].isnull(), 'action'] = "NO MATCH"

# only update those with different values: rows whose status, geoprivacy and IUCN equivalent already match become NO CHANGE
mergedstatuses['action'] = mergedstatuses.apply(lambda x: "NO CHANGE" if (x['action'] == "UPDATE") & ((x['state_status'] == x['inat_status']) & (x['state_geoprivacy'] == x['inat_geoprivacy']) & (x['state_iucn_equivalent'] == x['inat_iucn'])) else x['action'], axis=1)

# display
mergedstatusesprintfriendly = mergedstatuses[['action','inat_id','inat_taxon_id','state_scientificName','inat_scientificName', 'state_status','inat_status','state_geoprivacy','inat_geoprivacy','state_iucn_equivalent','inat_iucn','state_authority','inat_authority','state_url','inat_url','inat_description','inat_place_display_name','inat_current_synonymous_taxon_ids']]
mergedstatuses.groupby('action').size()




action
ADD           37
NO CHANGE    863
NO MATCH     250
REMOVE       143
UPDATE        33
dtype: int64
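
The four actions follow from which side of the outer merge a row came from. A minimal sketch of the same classification on invented rows, using the `indicator=True` option of `DataFrame.merge` instead of null checks on `inat_id` (the column names `taxon_id` and `status_id` are placeholders):

```python
import pandas as pd

# State list after taxonomy matching; None means no iNaturalist taxon was found.
state = pd.DataFrame({
    "name": ["Species a", "Species b", "Species c"],
    "taxon_id": ["1", "2", None],
})
# Existing iNaturalist statuses for the place.
inat = pd.DataFrame({
    "name": ["Species a", "Species d"],
    "status_id": ["100", "101"],
})

merged = state.merge(inat, how="outer", on="name", indicator=True)

def classify(row):
    if row["_merge"] == "both":
        return "UPDATE"      # listed by the state and already on iNaturalist
    if row["_merge"] == "right_only":
        return "REMOVE"      # on iNaturalist but no longer listed by the state
    # left_only: listed by the state, no iNaturalist status yet
    return "ADD" if pd.notna(row["taxon_id"]) else "NO MATCH"

merged["action"] = merged.apply(classify, axis=1)
actions = merged.set_index("name")["action"].to_dict()
print(actions)
```

The `_merge` column makes the origin of each row explicit, which can be easier to audit than combinations of null checks.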

In [17]:
mergedstatuses

Unnamed: 0,state_scientificName,state_status,state_geoprivacy,state_lsid_conservation,state_lsid_sensitive,state_authority,inat_taxon_id_x,inat_taxon_name,inat_rank,inat_observations_count,...,inat_preferred_common_name,inat_is_active_y,inat_current_synonymous_taxon_ids,place_id,username,description,state_url,state_iucn_equivalent,inat_taxon_id,action
0,Ornithorhynchus anatinus,Endangered,open,https://biodiversity.org.au/afd/taxa/ac61fd14-...,,South Australia Department for Environment and...,43236,Ornithorhynchus anatinus,species,1837,...,,,,6899,peggydnew,Listed - refer to https://www.environment.sa.g...,https://bie.ala.org.au/species/https://biodive...,40,43236,UPDATE
1,Tachyglossus aculeatus multiaculeatus,Endangered,open,https://biodiversity.org.au/afd/taxa/e9289876-...,,South Australia Department for Environment and...,520695,Tachyglossus aculeatus multiaculeatus,subspecies,112,...,,,,6899,peggydnew,Listed - refer to https://www.environment.sa.g...,https://bie.ala.org.au/species/https://biodive...,40,520695,NO CHANGE
2,Myrmecobius fasciatus,Endangered,open,https://biodiversity.org.au/afd/taxa/6c72d199-...,,South Australia Department for Environment and...,40262,Myrmecobius fasciatus,species,151,...,,,,6899,peggydnew,Listed - refer to https://www.environment.sa.g...,https://bie.ala.org.au/species/https://biodive...,40,40262,NO CHANGE
3,Dasycercus blythi,Endangered,open,https://biodiversity.org.au/afd/taxa/926b1c33-...,,South Australia Department for Environment and...,74270,Dasycercus blythi,species,2,...,,,,6899,peggydnew,Listed - refer to https://www.environment.sa.g...,https://bie.ala.org.au/species/https://biodive...,40,74270,NO CHANGE
4,Dasyuroides byrnei,Endangered,open,https://biodiversity.org.au/afd/taxa/c342ff42-...,,South Australia Department for Environment and...,40249,Dasyuroides byrnei,species,9,...,,,,6899,peggydnew,Listed - refer to https://www.environment.sa.g...,https://bie.ala.org.au/species/https://biodive...,40,40249,NO CHANGE
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1321,,Sensitive,open,,,,,,,,...,,,,6899,peggydnew,Listed - refer to https://www.environment.sa.g...,,30,953460,REMOVE
1322,,Sensitive,open,,,,,,,,...,,,,6899,peggydnew,Listed - refer to https://www.environment.sa.g...,,30,966855,REMOVE
1323,,Sensitive,open,,,,,,,,...,,,,6899,peggydnew,Listed - refer to https://www.environment.sa.g...,,30,973888,REMOVE
1324,,Sensitive,open,,,,,,,,...,,,,6899,peggydnew,Listed - refer to https://www.environment.sa.g...,,30,985265,REMOVE


## Create output files

In [30]:
# UPDATES
# Headers: action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
updates = pd.DataFrame(mergedstatuses[mergedstatuses['action'].isin(['UPDATE','REMOVE'])])
updates = updates[['action','state_scientificName','inat_id','inat_taxon_id','state_status','state_iucn_equivalent','state_authority','state_url','state_geoprivacy','place_id','username','description']]
updates.columns = updates.columns.str.replace("state_", "", regex=True)
updates.columns = updates.columns.str.replace("inat_", "", regex=True)
updates = updates.rename(columns={'scientificName':'taxon_name'})

# ADDITIONS
# Headers: Taxon_Name,Status,Authority,IUCN_equivalent,Description,iNaturalist_Place_ID,url,Taxon_Geoprivacy,Username,taxon_id
additions = pd.DataFrame(mergedstatuses[mergedstatuses['action'] == "ADD"])
additions = additions[['action','state_scientificName','inat_id','inat_taxon_id','state_status','state_iucn_equivalent','state_authority','state_url','state_geoprivacy','place_id','username','description']]
additions = additions.rename(columns={'state_scientificName':'Taxon_Name',
                                      'state_status':'Status',
                                      'state_authority':'Authority',
                                      'state_iucn_equivalent':'IUCN_equivalent',
                                      'description':'Description',
                                      'place_id':'iNaturalist_Place_ID',
                                      'state_url':'url',
                                      'state_geoprivacy':'Taxon_Geoprivacy',
                                      'username':'Username',
                                      'inat_taxon_id':'taxon_id'})

# WRITE TO FILE
mergedstatusesprintfriendly.to_csv(projectdir + "data/out/summaries/sa.csv",index=False)
updates.to_csv(projectdir + "data/out/updates-sa.csv", index=False)
additions.to_csv(projectdir + "data/out/additions-sa.csv", index=False)
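
Because the templates expect exact header names, a quick check before writing can catch a misspelled column. A minimal sketch, assuming the header list in the `# ADDITIONS` comment is the authoritative template (the helper `missing_headers` and the toy row are invented):

```python
import pandas as pd

# Header names copied from the "# ADDITIONS" comment in the cell above.
EXPECTED_ADDITION_HEADERS = [
    "Taxon_Name", "Status", "Authority", "IUCN_equivalent", "Description",
    "iNaturalist_Place_ID", "url", "Taxon_Geoprivacy", "Username", "taxon_id",
]

def missing_headers(df, expected):
    """Return the expected column names that are absent from df."""
    return [name for name in expected if name not in df.columns]

# Invented one-row frame with a deliberately misspelled place column.
toy = pd.DataFrame([{
    "Taxon_Name": "Species a", "Status": "Rare", "Authority": "Example authority",
    "IUCN_equivalent": "30", "Description": "Listed", "iNaturalst_Place_ID": "6899",
    "url": "", "Taxon_Geoprivacy": "open", "Username": "someone", "taxon_id": "1",
}])

missing = missing_headers(toy, EXPECTED_ADDITION_HEADERS)
print(missing)
```

An empty list means every expected header is present. Extra columns such as `action` are not reported by this check.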