# iNaturalist status updates by state - WA

Using the file produced in the collate-status-taxa.ipynb: `inat-aust-status-taxa.csv` (statuses joined to taxa names), generate lists to update iNaturalist statuses

* Sensitive list: __[dr467](https://lists.ala.org.au/speciesListItem/list/dr467)__ (and [dr18406 in test](https://lists-test.ala.org.au/speciesListItem/list/dr18406))
* Conservation list: __[dr2201](https://lists.ala.org.au/speciesListItem/list/dr2201)__ (and [dr2201 in test](https://lists-test.ala.org.au/speciesListItem/list/dr2201))

## Prep - common to all states
1. Read in the inaturalist statuses & filter out this state
2. Read in the state conservation and sensitive lists
3. Prep fields incl IUCN equivalent mappings and matching to iNat taxonomy  
4. Merge and compare the state and inaturalist lists
5. Create update/removals list
6. Create additions list
7. Save files

## 1. Read in the inaturalist statuses & filter out WA


In [2]:
import pandas as pd
import sys
import os
projectdir = os.path.dirname(os.getcwd()) + "/" # parent dir of cwd
sourcedir = projectdir + "data/in/"
sys.path.append(os.path.abspath(projectdir + "notebooks/includes/"))
import list_functions  as lf

# read in the statuses file
taxastatus = pd.read_csv(sourcedir + "inat-aust-status-taxa.csv", encoding='UTF-8',na_filter=False,dtype=str)

# filter out ACT entries
def filter_state_statuses(stateregex: str, urlregex: str):
    authoritydf = taxastatus['authority'].drop_duplicates().sort_values()
    authoritydf = authoritydf[pd.Series(authoritydf).str.contains(stateregex)]
    urldf = taxastatus['url'].drop_duplicates().sort_values()
    urldf = urldf[pd.Series(urldf).str.contains(urlregex)]
    placedisplaydf = taxastatus['place_display_name'].drop_duplicates().sort_values()
    placedisplaydf = placedisplaydf[pd.Series(placedisplaydf).str.contains(stateregex)]
    placedf = taxastatus['place_name'].drop_duplicates().sort_values()
    placedf = placedf[pd.Series(placedf).str.contains(stateregex)]
    # concat all and remove duplicates
    statedf = pd.concat([taxastatus.apply(lambda row: row[taxastatus['place_display_name'].isin(placedisplaydf)]),
                         taxastatus.apply(lambda row: row[taxastatus['place_name'].isin(placedf)]),
                         taxastatus.apply(lambda row: row[taxastatus['url'].isin(urldf)]),
                         taxastatus.apply(
                             lambda row: row[taxastatus['authority'].isin(authoritydf)])]).drop_duplicates()
    return statedf.sort_values(['taxon_id', 'user_id'])

inatstatuses = filter_state_statuses(" WA |WEST AUST|West Aust|WESTERN AUSTRALIA|Western Australia", ".wa.gov.au")
inatstatuses = inatstatuses.add_prefix("inat_")
inatstatuses.groupby(['inat_status']).size()

inat_status
Conservation Dependent                 5
Critically Endangered                161
EN                                     1
EX                                     1
Endangered                           161
Extinct                               25
Migratory                             92
NT                                     3
Not listed                             1
Other Specially Protected              4
P1                                    10
P2                                    28
P3                                    31
P4                                    10
Priority 1                             2
Priority 1: Poorly-known species     535
Priority 2: Poorly-known species     569
Priority 3: Poorly-known species     707
Priority 4: Rare, Near Threatened    332
Priority Three                         1
Priority Two                           1
T                                      8
VU                                     1
Vulnerable                           193
crit

### 3. State lists


In [14]:
# %%script echo skipping # comment this line to download dataset from lists.ala.org.au the web and save locally
# dr18406 test, dr467 prod
sensitivelist = lf.download_ala_specieslist("https://lists-test.ala.org.au/ws/speciesListItems/dr18406?max=10000&includeKVP=true")
sensitivelist = lf.kvp_to_columns(sensitivelist)
sensitivelist.to_csv(sourcedir + "state-lists/wa-ala-sensitive.csv", index=False)

conservationlist = lf.download_ala_specieslist("https://lists-test.ala.org.au/ws/speciesListItems/dr2201?max=10000&includeKVP=true")
conservationlist = lf.kvp_to_columns(conservationlist)
conservationlist.to_csv(sourcedir + "state-lists/wa-ala-conservation.csv", index=False)


In [31]:
# Read sensitive list data
sensitivelist = pd.read_csv(sourcedir + "state-lists/wa-ala-sensitive.csv", dtype=str)
sensitivelist['geoprivacy'] = 'obscured'
sensitivelist['status'] = 'Sensitive'
sensitivelist['authority'] = "WA Deparment of Biodiversity, Conservation and Attractions"
conservationlist = pd.read_csv(sourcedir + "state-lists/wa-ala-conservation.csv", dtype=str)
conservationlist['geoprivacy'] = 'open'
conservationlist['authority'] = "WA Deparment of Biodiversity, Conservation and Attractions"
statelist = conservationlist[['id','name','lsid','status','geoprivacy','authority']].merge(sensitivelist[['id','name','lsid','geoprivacy','status','authority']], how="outer",on='name',suffixes=('_conservation', '_sensitive'))
statelist['status'] = statelist['status_conservation'].fillna(statelist['status_sensitive'])
statelist['authority'] = statelist['authority_conservation'].fillna(statelist['authority_sensitive'])
statelist['geoprivacy'] = statelist['geoprivacy_sensitive'].fillna(statelist['geoprivacy_conservation'])
statelist = statelist.rename(columns = {'name':'scientificName'})
statelist = statelist.add_prefix("state_")
print("Conservation list entries:" + str(len(conservationlist)))
print("Sensitive list entries:" + str(len(sensitivelist)))
statelist


Conservation list entries:4517
Sensitive list entries:4517


Unnamed: 0,state_id_conservation,state_scientificName,state_lsid_conservation,state_status_conservation,state_geoprivacy_conservation,state_authority_conservation,state_id_sensitive,state_lsid_sensitive,state_geoprivacy_sensitive,state_status_sensitive,state_authority_sensitive,state_status,state_authority,state_geoprivacy
0,3677781,Abildgaardia pachyptera,https://id.biodiversity.org.au/name/apni/51389644,Priority 1: Poorly-known species,open,"WA Deparment of Biodiversity, Conservation and...",3680564,https://id.biodiversity.org.au/name/apni/51389644,obscured,Sensitive,"WA Deparment of Biodiversity, Conservation and...",Priority 1: Poorly-known species,"WA Deparment of Biodiversity, Conservation and...",obscured
1,3677168,Abutilon sp. Hamelin (A.M. Ashby 2196),https://id.biodiversity.org.au/node/apni/2898729,Priority 2: Poorly-known species,open,"WA Deparment of Biodiversity, Conservation and...",3681471,https://id.biodiversity.org.au/node/apni/2898729,obscured,Sensitive,"WA Deparment of Biodiversity, Conservation and...",Priority 2: Poorly-known species,"WA Deparment of Biodiversity, Conservation and...",obscured
2,3674446,Abutilon sp. Onslow (F. Smith s.n. 10/9/61),ALA_DR2201_3101,Priority 3: Poorly-known species,open,"WA Deparment of Biodiversity, Conservation and...",3682995,ALA_DR2201_3101,obscured,Sensitive,"WA Deparment of Biodiversity, Conservation and...",Priority 3: Poorly-known species,"WA Deparment of Biodiversity, Conservation and...",obscured
3,3677614,Abutilon sp. Pritzelianum (S. van Leeuwen 5095),https://id.biodiversity.org.au/node/apni/2905152,Priority 3: Poorly-known species,open,"WA Deparment of Biodiversity, Conservation and...",3681444,https://id.biodiversity.org.au/node/apni/2905152,obscured,Sensitive,"WA Deparment of Biodiversity, Conservation and...",Priority 3: Poorly-known species,"WA Deparment of Biodiversity, Conservation and...",obscured
4,3678483,Abutilon sp. Quobba (H. Demarz 3858),https://id.biodiversity.org.au/node/apni/2920532,Priority 2: Poorly-known species,open,"WA Deparment of Biodiversity, Conservation and...",3680835,https://id.biodiversity.org.au/node/apni/2920532,obscured,Sensitive,"WA Deparment of Biodiversity, Conservation and...",Priority 2: Poorly-known species,"WA Deparment of Biodiversity, Conservation and...",obscured
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4512,3674910,Zephyrarchaea mainae,https://biodiversity.org.au/afd/taxa/61b8777b-...,Vulnerable,open,"WA Deparment of Biodiversity, Conservation and...",3681166,https://biodiversity.org.au/afd/taxa/61b8777b-...,obscured,Sensitive,"WA Deparment of Biodiversity, Conservation and...",Vulnerable,"WA Deparment of Biodiversity, Conservation and...",obscured
4513,3676065,Zephyrarchaea marki,https://biodiversity.org.au/afd/taxa/c135a409-...,Vulnerable,open,"WA Deparment of Biodiversity, Conservation and...",3680450,https://biodiversity.org.au/afd/taxa/c135a409-...,obscured,Sensitive,"WA Deparment of Biodiversity, Conservation and...",Vulnerable,"WA Deparment of Biodiversity, Conservation and...",obscured
4514,3675041,Zephyrarchaea melindae,https://biodiversity.org.au/afd/taxa/df8d4917-...,Vulnerable,open,"WA Deparment of Biodiversity, Conservation and...",3678900,https://biodiversity.org.au/afd/taxa/df8d4917-...,obscured,Sensitive,"WA Deparment of Biodiversity, Conservation and...",Vulnerable,"WA Deparment of Biodiversity, Conservation and...",obscured
4515,3676991,Zephyrarchaea robinsi,https://biodiversity.org.au/afd/taxa/038e56d8-...,Vulnerable,open,"WA Deparment of Biodiversity, Conservation and...",3679768,https://biodiversity.org.au/afd/taxa/038e56d8-...,obscured,Sensitive,"WA Deparment of Biodiversity, Conservation and...",Vulnerable,"WA Deparment of Biodiversity, Conservation and...",obscured


In [32]:
statelist.groupby('state_status',dropna=False).size()

state_status
Conservation Dependent                  7
Critically Endangered                 228
Endangered                            209
Extinct                                38
Migratory                              94
Other Specially Protected               4
Priority 1: Poorly-known species     1193
Priority 2: Poorly-known species      985
Priority 3: Poorly-known species     1085
Priority 4: Rare, Near Threatened     418
Sensitive                               1
Vulnerable                            255
dtype: int64

### 4. Equivalent IUCN statuses
The above list should have a mapping below

In [33]:
# 'migratory':'Vulnerable',priority 3: poorly-known species':'Data Deficient',
iucnStatusMappings = {
    'Data Deficient':'5',
    'Priority 1: Poorly-known species':'5',
    'Priority 2: Poorly-known species':'5',
    'Priority 3: Poorly-known species':'5',
    'Least concern':'10',
    'Special least concern':'10',
    'Near Threatened':'20',
    'Priority 4: Rare, Near Threatened':'20',
    'Sensitive':'30',
    'Rare':'30',
    'Vulnerable':'30',
    'Migratory':'30',
    'Conservation Dependent':'30',
    'Other Specially Protected':'30',
    'Endangered':'40',
    'Critically Endangered':'50',
    'Extinct':'70',
    'Extinct in the wild':'70'
}
statelist.groupby(['state_status'])['state_status'].count()

state_status
Conservation Dependent                  7
Critically Endangered                 228
Endangered                            209
Extinct                                38
Migratory                              94
Other Specially Protected               4
Priority 1: Poorly-known species     1193
Priority 2: Poorly-known species      985
Priority 3: Poorly-known species     1085
Priority 4: Rare, Near Threatened     418
Sensitive                               1
Vulnerable                            255
Name: state_status, dtype: int64

In [22]:
#statelist.groupby(['state_iucn'],dropna=False)['state_iucn'].count()

KeyError: 'state_iucn'

### 5. Determine best place ID to use

In [34]:
inatstatuses.groupby(['inat_place_id','inat_place_name','inat_place_display_name'])['inat_place_id'].count()
# looks like 6827 - note for extract


inat_place_id  inat_place_name    inat_place_display_name
                                                                4
6827           Western Australia  Western Australia, AU      2986
Name: inat_place_id, dtype: int64

## Merge iNaturalist statuses with State lists on scientificName


In [35]:
# set placeid
place_id = 6827
# get the inaturalist taxonomy matches for additions 
inattaxa = pd.read_csv(sourcedir + "inaturalist-australia-9/inaturalist-australia-9-taxa.csv",dtype=str,usecols=['id','name','rank','observations_count','is_active'])
inattaxa = inattaxa[(inattaxa['is_active'] == 't') & (inattaxa['rank'] == 'species')]
inattaxa = inattaxa.rename(columns = {'id':'taxon_id','name':'taxon_name'})
inattaxa = inattaxa.add_prefix("inat_")
statelist = statelist[['state_scientificName','state_status','state_geoprivacy', 'state_lsid_conservation','state_lsid_sensitive','state_authority']].merge(inattaxa,how="left",left_on='state_scientificName',right_on='inat_taxon_name',suffixes=(None,'_inat'))
statelist


Unnamed: 0,state_scientificName,state_status,state_geoprivacy,state_lsid_conservation,state_lsid_sensitive,state_authority,inat_taxon_id,inat_taxon_name,inat_rank,inat_observations_count,inat_is_active
0,Abildgaardia pachyptera,Priority 1: Poorly-known species,obscured,https://id.biodiversity.org.au/name/apni/51389644,https://id.biodiversity.org.au/name/apni/51389644,"WA Deparment of Biodiversity, Conservation and...",,,,,
1,Abutilon sp. Hamelin (A.M. Ashby 2196),Priority 2: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2898729,https://id.biodiversity.org.au/node/apni/2898729,"WA Deparment of Biodiversity, Conservation and...",,,,,
2,Abutilon sp. Onslow (F. Smith s.n. 10/9/61),Priority 3: Poorly-known species,obscured,ALA_DR2201_3101,ALA_DR2201_3101,"WA Deparment of Biodiversity, Conservation and...",,,,,
3,Abutilon sp. Pritzelianum (S. van Leeuwen 5095),Priority 3: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2905152,https://id.biodiversity.org.au/node/apni/2905152,"WA Deparment of Biodiversity, Conservation and...",,,,,
4,Abutilon sp. Quobba (H. Demarz 3858),Priority 2: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2920532,https://id.biodiversity.org.au/node/apni/2920532,"WA Deparment of Biodiversity, Conservation and...",,,,,
...,...,...,...,...,...,...,...,...,...,...,...
4512,Zephyrarchaea mainae,Vulnerable,obscured,https://biodiversity.org.au/afd/taxa/61b8777b-...,https://biodiversity.org.au/afd/taxa/61b8777b-...,"WA Deparment of Biodiversity, Conservation and...",828663,Zephyrarchaea mainae,species,0,t
4513,Zephyrarchaea marki,Vulnerable,obscured,https://biodiversity.org.au/afd/taxa/c135a409-...,https://biodiversity.org.au/afd/taxa/c135a409-...,"WA Deparment of Biodiversity, Conservation and...",828664,Zephyrarchaea marki,species,0,t
4514,Zephyrarchaea melindae,Vulnerable,obscured,https://biodiversity.org.au/afd/taxa/df8d4917-...,https://biodiversity.org.au/afd/taxa/df8d4917-...,"WA Deparment of Biodiversity, Conservation and...",828667,Zephyrarchaea melindae,species,0,t
4515,Zephyrarchaea robinsi,Vulnerable,obscured,https://biodiversity.org.au/afd/taxa/038e56d8-...,https://biodiversity.org.au/afd/taxa/038e56d8-...,"WA Deparment of Biodiversity, Conservation and...",828668,Zephyrarchaea robinsi,species,0,t


In [36]:
# test for duplicates
# statelist.groupby('state_scientificName').filter(lambda x: len(x) > 1)

Unnamed: 0,state_scientificName,state_status,state_geoprivacy,state_lsid_conservation,state_lsid_sensitive,state_authority,inat_taxon_id,inat_taxon_name,inat_rank,inat_observations_count,inat_is_active


In [37]:
# prepare the export fields, common to New template and Update template
mergedstatuses = statelist.merge(inatstatuses,how="outer",left_on='state_scientificName',right_on='inat_scientificName')

# add extra fields 
# add some extra fields
mergedstatuses['place_id'] = str(place_id)
mergedstatuses['username'] = 'peggydnew'
mergedstatuses['description'] = "Listed - refer to  https://www.dpaw.wa.gov.au/plants-and-animals/threatened-species-and-communities"
mergedstatuses['state_lsid_conservation'].fillna(mergedstatuses['state_lsid_sensitive'])
mergedstatuses['state_url'] = "https://bie.ala.org.au/species/" + mergedstatuses['state_lsid_conservation']
mergedstatuses['state_iucn_equivalent'] = mergedstatuses['state_status'].map(iucnStatusMappings)
#mergedstatuses['state_status'] = mergedstatuses['state_status'].fillna('Sensitive')
#mergedstatuses['state_geoprivacy'] = mergedstatuses['state_geoprivacy'].fillna('open')
mergedstatuses['inat_taxon_id'] = mergedstatuses['inat_taxon_id_y'].fillna(mergedstatuses['inat_taxon_id_x'])
mergedstatuses['inat_scientificName'] = mergedstatuses['inat_scientificName'].fillna(mergedstatuses['inat_taxon_name'])


# UPDATE: inat status and state status both exist
# REMOVE: inat status exists, state status does not
# ADD: state status exists, inat status does not (matching taxon)
# NO MATCH: state status exists, inat taxa not found
mergedstatuses['action'] = 'na'
mergedstatuses.loc[mergedstatuses['inat_id'].notnull() & mergedstatuses['state_scientificName'].notnull(), 'action'] = "UPDATE"
mergedstatuses.loc[mergedstatuses['inat_id'].notnull() & mergedstatuses['state_scientificName'].isnull(), 'action'] = "REMOVE"
mergedstatuses.loc[mergedstatuses['inat_id'].isnull() & mergedstatuses['inat_taxon_id'].notnull(), 'action'] = "ADD"
mergedstatuses.loc[mergedstatuses['inat_id'].isnull() & mergedstatuses['inat_taxon_id'].isnull(), 'action'] = "NO MATCH"

# only update those with different values 
mergedstatuses['action'] = mergedstatuses.apply(lambda x: "NO CHANGE" if (x['action'] == "UPDATE") & ((x['state_status'] == x['inat_status']) & (x['state_geoprivacy'] == x['inat_geoprivacy'] ) & (x['state_geoprivacy'] == x['inat_geoprivacy']) & (x['state_iucn_equivalent'] == x['inat_iucn'])) else x['action'], axis=1)

# display
mergedstatusesprintfriendly = mergedstatuses[['action','inat_id','inat_taxon_id','state_scientificName','inat_scientificName', 'state_status','inat_status','state_geoprivacy','inat_geoprivacy','state_iucn_equivalent','inat_iucn','state_authority','inat_authority','state_url','inat_url','inat_description','inat_place_display_name','inat_current_synonymous_taxon_ids']]
mergedstatuses.groupby('action').size()



action
ADD           175
NO CHANGE     335
NO MATCH     1860
REMOVE        429
UPDATE       2226
dtype: int64

In [39]:
mergedstatuses[mergedstatuses['state_scientificName'] == 'Acacia incongesta']

Unnamed: 0,state_scientificName,state_status,state_geoprivacy,state_lsid_conservation,state_lsid_sensitive,state_authority,inat_taxon_id_x,inat_taxon_name,inat_rank,inat_observations_count,...,inat_preferred_common_name,inat_is_active_y,inat_current_synonymous_taxon_ids,place_id,username,description,state_url,state_iucn_equivalent,inat_taxon_id,action
121,Acacia incongesta,Priority 2: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2906554,https://id.biodiversity.org.au/node/apni/2906554,"WA Deparment of Biodiversity, Conservation and...",1253857,Acacia incongesta,species,0,...,,,,6827,peggydnew,Listed - refer to https://www.dpaw.wa.gov.au/...,https://bie.ala.org.au/species/https://id.biod...,5,1253857,ADD


## Create output files

In [38]:
# UPDATES
# Headers: action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
updates = pd.DataFrame(mergedstatuses[mergedstatuses['action'].isin(['UPDATE','REMOVE'])])
updates = updates[['action','state_scientificName','inat_id','inat_taxon_id','state_status','state_iucn_equivalent','state_authority','state_url','state_geoprivacy','place_id','username','description']]
updates.columns = updates.columns.str.replace("state_", "", regex=True)
updates.columns = updates.columns.str.replace("inat_", "", regex=True)
updates = updates.rename(columns={'scientificName':'taxon_name'})

# ADDITIONS
# Headers: Taxon_Name,Status,Authority,IUCN_equivalent,Description,iNaturalist_Place_ID,url,Taxon_Geoprivacy,Username,taxon_id
additions = pd.DataFrame(mergedstatuses[mergedstatuses['action'] == "ADD"])
additions = additions[['action','state_scientificName','inat_id','inat_taxon_id','state_status','state_iucn_equivalent','state_authority','state_url','state_geoprivacy','place_id','username','description']]
additions = additions.rename(columns={'state_scientificName':'Taxon_Name',
                                      'state_status':'Status',
                                      'state_authority':'Authority',
                                      'state_iucn_equivalent':'IUCN_equivalent',
                                      'description':'Description',
                                      'place_id':'iNaturalst_Place_ID',
                                      'state_url':'url',
                                      'state_geoprivacy':'taxon_Geoprivacy',
                                      'inat_taxon_id':'taxon_id'})

# WRITE TO FILE
mergedstatusesprintfriendly.to_csv(projectdir + "data/out/summaries/wa.csv",index=False)
updates.to_csv(projectdir + "data/out/updates-wa.csv", index=False)
additions.to_csv(projectdir + "data/out/additions-wa.csv", index=False)