# iNaturalist status updates by state - WA


Using the file produced in the collate-status-taxa.ipynb: `inat-aust-status-taxa.csv`, generate lists to update iNaturalist statuses

## Prep - common to all states
1. Read in the inaturalist statuses & filter out this state
2. Read in the inaturalist [taxa list](#inaturalist-taxonomy)
3. Read in the state sensitive and conservation list, concatenate them into a single list
4. Wash the names in the state list through the gbif name parser
5. Attempt to match the state statuses to an IUCN equivalent
6. Determine the best placeID to use for this state

## Next steps:
7. Find Updates and Additions
7.1 Left join the state list with the iNaturalist statuses on scientificName
  * **Match** UPDATE the status (new details, new dept name or url)
  * **No Match** Left join the remainder (noinatstatus) to the inat taxonomy
     * Yes - ADD new status record
     * No - REPORT. Seek synonyms for the taxon, or create species in iNat for critical species

8. Find [Removals](##removals) - Left join the inaturalist statuses with the update list. Report on the remainder.


In [1]:
import pandas as pd

#projectdir = "/Users/oco115/PycharmProjects/authoritative-lists/" # basedir for this gh project
projectdir = "/Users/new330/IdeaProjects/authoritative-lists/" # basedir for this gh project
sourcedir = projectdir + "source-data/inaturalist-statuses/"
listdir = projectdir + "current-lists/"


# read in the statuses
taxastatus = pd.read_csv(sourcedir + "inat-aust-status-taxa.csv", encoding='UTF-8',na_filter=False,dtype=str) ## Read inaturalist conservation statuses file
taxastatus.head(3)

Unnamed: 0,id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,...,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references,preferred_common_name,is_active,current_synonymous_taxon_ids
0,166449,38493,1138587,7830,,Flora and Fauna Guarantee Act 1988,CR,,,obscured,...,Eulamprus,kosciuskoi,,2021-03-01T10:35:01Z,Eulamprus kosciuskoi,species,http://reptile-database.reptarium.cz/search.ph...,,,
1,234788,918383,702203,9994,,Atlas of Living Australia,NT,https://bie.ala.org.au/species/https://id.biod...,,,...,Chiloschista,phyllorhiza,,2022-01-08T03:30:36Z,Chiloschista phyllorhiza,species,http://www.catalogueoflife.org/annual-checklis...,,,
2,234789,918383,702203,7308,,Atlas of Living Australia,LC,https://bie.ala.org.au/species/https://id.biod...,,,...,Chiloschista,phyllorhiza,,2022-01-08T03:30:36Z,Chiloschista phyllorhiza,species,http://www.catalogueoflife.org/annual-checklis...,,,


In [2]:
def filter_state_statuses(stateregex: str, urlregex: str):
    authoritydf = taxastatus['authority'].drop_duplicates().sort_values()
    authoritydf = authoritydf[pd.Series(authoritydf).str.contains(stateregex)]
    urldf = taxastatus['url'].drop_duplicates().sort_values()
    urldf = urldf[pd.Series(urldf).str.contains(urlregex)]
    placedisplaydf = taxastatus['place_display_name'].drop_duplicates().sort_values()
    placedisplaydf = placedisplaydf[pd.Series(placedisplaydf).str.contains(stateregex)]
    placedf = taxastatus['place_name'].drop_duplicates().sort_values()
    placedf = placedf[pd.Series(placedf).str.contains(stateregex)]
    # concat all and remove duplicates
    statedf = pd.concat([taxastatus.apply(lambda row: row[taxastatus['place_display_name'].isin(placedisplaydf)]),
                         taxastatus.apply(lambda row: row[taxastatus['place_name'].isin(placedf)]),
                         taxastatus.apply(lambda row: row[taxastatus['url'].isin(urldf)]),
                         taxastatus.apply(
                             lambda row: row[taxastatus['authority'].isin(authoritydf)])]).drop_duplicates()
    return statedf.sort_values(['taxon_id', 'user_id'])


inatstatuses = filter_state_statuses(" WA |WEST AUST|West Aust|WESTERN AUSTRALIA|Western Australia", ".wa.gov.au")
inatstatuses.rename(columns={'id':'status_id','id_y':'taxon_id_y'},inplace=True)
inatstatuses

Unnamed: 0,status_id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,...,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references,preferred_common_name,is_active,current_synonymous_taxon_ids
1990,153386,100948,708886,6827,16654,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Fibulacamptus,bisetosus,,2021-10-28T19:35:25Z,Fibulacamptus bisetosus,species,http://www.iucnredlist.org/apps/redlist/details,,,
2158,153585,101164,708886,6827,16654,WA Department of Environment and Convservation,vulnerable,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Galaxiella,munda,,2019-11-23T07:13:44Z,Galaxiella munda,species,http://www.fishbase.org,,,
1982,153373,101165,708886,6827,16654,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Galaxiella,nigrostriata,,2019-11-23T07:13:25Z,Galaxiella nigrostriata,species,http://www.fishbase.org,,,
2346,153802,101474,708886,6827,16654,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Glacidorbis,occidentalis,,2021-10-29T17:47:10Z,Glacidorbis occidentalis,species,http://www.catalogueoflife.org/annual-checklis...,,,
2019,153420,101509,708886,6827,16654,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Glyphis,garricki,,2019-04-18T19:04:37Z,Glyphis garricki,species,http://www.fishbase.org,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2311,153762,99344,708886,6827,16654,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Diplodactylus,capensis,,2018-11-17T23:47:54Z,Diplodactylus capensis,species,http://reptile-database.reptarium.cz/search.ph...,,,
2039,153443,99634,708886,6827,16654,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Dupucharopa,millestriata,,2021-10-29T15:16:07Z,Dupucharopa millestriata,species,http://www.catalogueoflife.org/annual-checklis...,,,
2185,153613,99973,708886,6827,16654,WA Department of Environment and Convservation,critically endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,,,,,Engaewa pseudoreducta,,,,False,[]
2293,153742,99974,708886,6827,16654,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Engaewa,reducta,,2020-05-28T04:59:18Z,Engaewa reducta,species,http://www.iucnredlist.org/apps/redlist/details,,,


### 2. iNaturalist taxonomy

In [3]:
# Output files contain these fields
# Taxon_Name,Status,Authority,IUCN_equivalent,Description,iNaturalist_Place_ID,url,Taxon_Geoprivacy,Username,taxon_id
# so we need to match species from the state lists to the inat taxa to get the taxon_id

import zipfile
url = "https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip"
filename = url.split("/")[-1]

z=zipfile.ZipFile(sourcedir + filename)

with z.open('taxa.csv') as from_archive:
    inattaxa = pd.read_csv(from_archive,dtype=str)
z.close()
inattaxa.head(3)


Unnamed: 0,id,taxonID,identifier,parentNameUsageID,kingdom,phylum,class,order,family,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references
0,1,https://www.inaturalist.org/taxa/1,https://www.inaturalist.org/taxa/1,https://www.inaturalist.org/taxa/48460,Animalia,,,,,,,,2021-11-02T06:05:44Z,Animalia,kingdom,http://www.catalogueoflife.org/annual-checklis...
1,2,https://www.inaturalist.org/taxa/2,https://www.inaturalist.org/taxa/2,https://www.inaturalist.org/taxa/1,Animalia,Chordata,,,,,,,2021-11-23T00:40:18Z,Chordata,phylum,http://www.catalogueoflife.org/annual-checklis...
2,3,https://www.inaturalist.org/taxa/3,https://www.inaturalist.org/taxa/3,https://www.inaturalist.org/taxa/355675,Animalia,Chordata,Aves,,,,,,2022-12-27T07:33:16Z,Aves,class,http://www.catalogueoflife.org/annual-checklis...


### 3. State lists

Get the ALA Sensitive list: `geoprivacy` = `obscured`
Everything on the WA list is sensitive


In [4]:
%%script echo skipping # comment this line to download dataset from lists.ala.org.au the web and save locally

import sys
import os
sys.path.append(os.path.abspath(projectdir + "source-code/includes"))
import list_functions as lf
alasensitivelist = lf.download_ala_list("https://lists-test.ala.org.au/ws/speciesListItems/dr18406?max=10000&includeKVP=true")
alasensitivelist = lf.kvp_to_columns(alasensitivelist)
alasensitivelist.to_csv(sourcedir + "wa-ala.csv", index=False)

skipping # comment this line to download dataset from lists.ala.org.au the web and save locally


Use the GBIF names parser to clean up the names

In [5]:
%%script echo skipping # comment this line to run the gbif parser again the web and save a file locally

import requests

namesonly = alasensitivelist['name']
url = "https://api.gbif.org/v1/parser/name"
headers = {'content-type' : 'application/json'}
data = namesonly.to_json(orient="values")
params = {'name':data}
r = requests.post(url=url,data=data,headers=headers)
results = pd.read_json(r.text)
results.to_csv(sourcedir + "wa-gbif.csv",index=False)
results

skipping # comment this line to run the gbif parser again the web and save a file locally


In [6]:
%%script echo skipping single gbif parser test
import requests
t = requests.get(url="https://api.gbif.org/v1/parser/name?name=Calandrinia sp. Berry Springs (M.O. Parker 855) PN)")
t.text

skipping single gbif parser test


Merge the parsed names back into the dataset

In [7]:
alasensitivelist = pd.read_csv(sourcedir + "wa-ala.csv", dtype=str)
parsednames = pd.read_csv(sourcedir + "wa-gbif.csv", dtype=str)
alasensitivelist = alasensitivelist.merge(parsednames[['scientificName','canonicalName','canonicalNameComplete','type','rankMarker']],how="left",left_on="name",right_on="scientificName")

***Report - unsuccessful parsed names with important statuses*** - iNaturalist won't accept names containing punctuation

In [8]:
noncomply = alasensitivelist[alasensitivelist['type'].isin(['INFORMAL','CULTIVAR','HYBRID'])]
#[['name','scientificName_x','scientificName_y','lsid','canonicalName','canonicalNameComplete','type','rankMarker','status']]
#noncomply = alasensitivelist[(alasensitivelist['type'].isin(['INFORMAL','CULTIVAR','HYBRID']) & (alasensitivelist['sourceStatus'].isin(['CD','CR','EN','VU'])))][['name','scientificName_x','scientificName_y','lsid','canonicalName','canonicalNameComplete','type','rankMarker']]
noncomply
#cols for debugging
#alasensitivelist[['name','scientificName_x','scientificName_y','lsid','canonicalName','canonicalNameComplete','type','rankMarker']]


Unnamed: 0.1,Unnamed: 0,id,name,commonName,scientificName_x,lsid,dataResourceUid,kvpValues,taxonId,W A Status,...,status description,category,sensitivityZoneId,W A Rank,taxonRemarks,scientificName_y,canonicalName,canonicalNameComplete,type,rankMarker
1,0,2818496,Abutilon sp. Hamelin (A.M. Ashby 2196),,Abutilon sp. Hamelin (A.M.Ashby 2196),https://id.biodiversity.org.au/node/apni/2898729,dr18406,"[{'key': 'taxonId', 'value': '14112'}, {'key':...",14112,2,...,Priority 2: Poorly-known species - known from ...,P2,WA,,,Abutilon sp. Hamelin (A.M. Ashby 2196),Abutilon spec.,Abutilon spec. Hamelin,INFORMAL,sp.
2,0,2821218,Abutilon sp. Onslow (F. Smith s.n. 10/9/61),,Abutilon sp. Onslow (F. Smith s.n. 10/9/61),ALA_DR2201_3101,dr18406,"[{'key': 'taxonId', 'value': '14110'}, {'key':...",14110,3,...,Priority 3: Poorly-known species - known from ...,P3,WA,,,Abutilon sp. Onslow (F. Smith s.n. 10/9/61),Abutilon spec.,Abutilon spec. Onslow,INFORMAL,sp.
3,0,2819452,Abutilon sp. Pritzelianum (S. van Leeuwen 5095),,Abutilon sp. Pritzelianum (S.van Leeuwen 5095),https://id.biodiversity.org.au/node/apni/2905152,dr18406,"[{'key': 'taxonId', 'value': '43021'}, {'key':...",43021,3,...,Priority 3: Poorly-known species - known from ...,P3,WA,,,Abutilon sp. Pritzelianum (S. van Leeuwen 5095),Abutilon spec.,Abutilon spec. Pritzelianum,INFORMAL,sp.
4,0,2820137,Abutilon sp. Quobba (H. Demarz 3858),,Abutilon sp. Quobba (H.Demarz 3858),https://id.biodiversity.org.au/node/apni/2920532,dr18406,"[{'key': 'taxonId', 'value': '14114'}, {'key':...",14114,2,...,Priority 2: Poorly-known species - known from ...,P2,WA,,,Abutilon sp. Quobba (H. Demarz 3858),Abutilon spec.,Abutilon spec. Quobba,INFORMAL,sp.
5,0,2819481,Abutilon sp. Warburton (A.S. George 8164),,Abutilon sp. Warburton (A.S.George 8164),https://id.biodiversity.org.au/node/apni/2890423,dr18406,"[{'key': 'taxonId', 'value': '14155'}, {'key':...",14155,1,...,Priority 1: Poorly-known species - known from ...,P1,WA,,,Abutilon sp. Warburton (A.S. George 8164),Abutilon spec.,Abutilon spec. Warburton,INFORMAL,sp.
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4119,0,2819432,Ideoblothrus sp. 'Mesa A' (WAM T81374),An Ideoblothrus Pseudoscorpion (mesa A),Ideoblothrus sp. 'Mesa A' (WAM T81374),ALA_DR2201_3612,dr18406,"[{'key': 'status', 'value': 'Priority 1: Poorl...",,,...,Priority 1: Poorly-known species - known from ...,P1,WA,,,Ideoblothrus sp. 'Mesa A' (WAM T81374),Ideoblothrus spec.,Ideoblothrus spec. Mesa A,INFORMAL,sp.
4162,0,2820631,Lagorchestes hirsutus subsp. (Central Australia),Rufous Hare-wallaby (central Australia),Lagorchestes hirsutus subsp. (Central Australia),ALA_DR2201_3947,dr18406,"[{'key': 'status', 'value': 'Endangered'}, {'k...",,,...,Endangered,EN,WA,,,Lagorchestes hirsutus subsp. (Central Australia),Lagorchestes hirsutus subsp.,Lagorchestes hirsutus subsp.,INFORMAL,subsp.
4355,0,2821842,Rhytidid sp. (WAM 2295-69),Stirling Range Rhytidid Snail,Rhytidid sp. (WAM 2295-69),ALA_DR2201_4164,dr18406,"[{'key': 'status', 'value': 'Critically Endang...",,,...,Critically Endangered,CR,WA,,,Rhytidid sp. (WAM 2295-69),Rhytidid spec.,Rhytidid spec.,INFORMAL,sp.
4397,0,2820000,Teyl sp. (MYG693),,Teyl,https://biodiversity.org.au/afd/taxa/7cdbd74b-...,dr18406,"[{'key': 'taxonRemarks', 'value': '30/09/2022 ...",,,...,Critically Endangered,CR,WA,,30/09/2022 Museum specimen reference number ch...,Teyl sp. (MYG693),Teyl spec.,Teyl spec.,INFORMAL,sp.


Prepare final list for matching to inaturalist names

In [9]:
alasensitivelist = alasensitivelist[~alasensitivelist['type'].isin(['INFORMAL','CULTIVAR','HYBRID']) ] # remove 543 INFORMAL, 3 CULTIVAR, 14 HYBRID
# Identify records that won't comply with iNaturalist species names
alasensitivelist['wa_geoprivacy'] = 'obscured'
alasensitivelist['wa_taxonID'] = alasensitivelist['taxonId']#.apply(lambda x: int(float(x)))
alasensitivelist['wa_scientificName'] = alasensitivelist['canonicalName']
alasensitivelist['wa_status'] = alasensitivelist['status']
statelist = pd.DataFrame(alasensitivelist[['wa_taxonID','wa_scientificName','wa_status','wa_geoprivacy','lsid']])
numfullstatelist = len(statelist.index)
statelist

Unnamed: 0,wa_taxonID,wa_scientificName,wa_status,wa_geoprivacy,lsid
0,50593,Abildgaardia pachyptera,Priority 1: Poorly-known species,obscured,https://id.biodiversity.org.au/name/apni/51389644
6,14044,Acacia adinophylla,Priority 1: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2898130
7,44442,Acacia adjutrices,Priority 3: Poorly-known species,obscured,https://id.biodiversity.org.au/taxon/apni/5128...
8,16110,Acacia alata platyptera,"Priority 4: Rare, Near Threatened",obscured,https://id.biodiversity.org.au/node/apni/2904348
9,13074,Acacia alexandri,Priority 3: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2904701
...,...,...,...,...,...
4450,,Zephyrarchaea mainae,Vulnerable,obscured,https://biodiversity.org.au/afd/taxa/61b8777b-...
4451,,Zephyrarchaea marki,Vulnerable,obscured,https://biodiversity.org.au/afd/taxa/c135a409-...
4452,,Zephyrarchaea melindae,Vulnerable,obscured,https://biodiversity.org.au/afd/taxa/df8d4917-...
4453,,Zephyrarchaea robinsi,Vulnerable,obscured,https://biodiversity.org.au/afd/taxa/038e56d8-...


In [10]:
# check for duplicates with conflicting information
dupinformation = statelist.groupby('wa_taxonID').filter(lambda x: len(x) > 1)#.sort('size',ascending=False)
dupinformation

Unnamed: 0,wa_taxonID,wa_scientificName,wa_status,wa_geoprivacy,lsid


### 4. Equivalent IUCN statuses

In [11]:
iucn_statuses = {'Not Evaluated', 'Data Deficient', 'Least Concern', 'Near Threatened', 'Vulnerable', 'Endangered', 'Critically Endangered', 'Extinct in the Wild','Extinct'}
statelist.groupby(['wa_status'])['wa_status'].count()

wa_status
Conservation Dependent                 6
Critically Endangered                210
Endangered                           194
Extinct                               37
Migratory                             94
Other Specially Protected              4
Priority 1: Poorly-known species     885
Priority 2: Poorly-known species     827
Priority 3: Poorly-known species     981
Priority 4: Rare, Near Threatened    397
Vulnerable                           260
Name: wa_status, dtype: int64

In [12]:
# these will be used to populate the iucn_equivalent field
iucnStatusMappings = {
    'conservation dependent': 'Vulnerable',
    'critically endangered': 'Critically Endangered',
    'endangered':'Endangered',
    'extinct':'Extinct',
    'migratory':'Vulnerable',
    'other specially protencted':'Vulnerable',
    'priority 1: poorly-known species':'Data Deficient',
    'priority 2: poorly-known species':'Data Deficient',
    'priority 3: poorly-known species':'Data Deficient',
    'priority 4: rare, near threatened':'Vulnerable',
    'vulnerable':'Vulnerable',
    'not evaluated':'Not Evaluated',
    'data deficient':'Data Deficient',
    'least concern':'Least Concern',
    'special least concern':'Least Concern',
    'near threatened':'Near Threatened',
    'extinct in the wild':'Extinct in the Wild',
}

### 5. Determine best place ID to use

In [13]:
inatstatuses.groupby(['place_id','place_name','place_display_name'])['place_id'].count()
# looks like 6827 - note for extract


place_id  place_name         place_display_name   
                                                        1
6744      Australia          Australia                  1
6827      Western Australia  Western Australia, AU    982
Name: place_id, dtype: int64

## Merge iNaturalist statuses with State lists on scientificName

1. Match - updates, even if the statuses are the same we'll update the links and values anyway
2. No match - statuses to be added (additions)
   1.1 No match and no taxnomy - search for synonyms
   1.2 No match
3. Merge the other direction to see if there are deletes?


In [14]:
# join to see which lists already have a status in inaturalist based on scientificName
mergedstatuses = statelist[['wa_taxonID','wa_scientificName','wa_status','wa_geoprivacy','lsid']].merge(inatstatuses[['status_id','scientificName','taxon_id','user_id','description','iucn','authority','status','geoprivacy','place_id','place_display_name']],how="left",left_on='wa_scientificName',right_on='scientificName',suffixes=(None,'_inat')).sort_values(['scientificName'])
mergedstatuses


Unnamed: 0,wa_taxonID,wa_scientificName,wa_status,wa_geoprivacy,lsid,status_id,scientificName,taxon_id,user_id,description,iucn,authority,status,geoprivacy,place_id,place_display_name
1,14044,Acacia adinophylla,Priority 1: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2898130,152923,Acacia adinophylla,898581,708886,,40,WA Department of Environment and Convservation,endangered,obscured,6827,"Western Australia, AU"
2,44442,Acacia adjutrices,Priority 3: Poorly-known species,obscured,https://id.biodiversity.org.au/taxon/apni/5128...,153375,Acacia adjutrices,898583,708886,,40,WA Department of Environment and Convservation,endangered,obscured,6827,"Western Australia, AU"
4,13074,Acacia alexandri,Priority 3: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2904701,153652,Acacia alexandri,898592,708886,,40,WA Department of Environment and Convservation,endangered,obscured,6827,"Western Australia, AU"
5,14046,Acacia ampliata,Priority 1: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2919087,153206,Acacia ampliata,827789,708886,,40,WA Department of Environment and Convservation,endangered,obscured,6827,"Western Australia, AU"
6,14047,Acacia amyctica,Priority 2: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2902736,153042,Acacia amyctica,898602,708886,,40,WA Department of Environment and Convservation,endangered,obscured,6827,"Western Australia, AU"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3871,,Turnix varius scintillans,Endangered,obscured,https://biodiversity.org.au/afd/taxa/a5f510c9-...,,,,,,,,,,,
3872,,Tursiops aduncus,Migratory,obscured,https://biodiversity.org.au/afd/taxa/0cfe42e3-...,,,,,,,,,,,
3873,,Tyto novaehollandiae kimberli,Priority 1: Poorly-known species,obscured,https://biodiversity.org.au/afd/taxa/d1a27333-...,,,,,,,,,,,
3874,,Tyto novaehollandiae novaehollandiae,Priority 3: Poorly-known species,obscured,https://biodiversity.org.au/afd/taxa/44488be2-...,,,,,,,,,,,


In [15]:
# prepare the export fields, common to New template and Update template
# new statuses
# Taxon_Name,Status,Authority,IUCN_equivalent,Description,iNaturalist_Place_ID,url,Taxon_Geoprivacy,Username,taxon_id
# updates
# action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username
#mergedstatuses['new_description'] = "Listed as sensitive - refer to https://www.dpaw.wa.gov.au/plants-and-animals/threatened-species-and-communities"
# url is either a florabase url or bie page
florabaseurl = "https://florabase.dpaw.wa.gov.au/browse/profile/"
biesearchurl = "https://bie.ala.org.au/species/" # eg + "https://id.biodiversity.org.au/node/apni/2894366"
mergedstatuses['new_url'] = mergedstatuses.apply(lambda x: biesearchurl + x['lsid'] if pd.isna(x['wa_taxonID']) else florabaseurl + x['wa_taxonID'],axis=1)
floradescrurl = "Listed as Confidential - refer to https://www.dpaw.wa.gov.au/plants-and-animals/threatened-species-and-communities/threatened-plants"
faunadescrurl = "Listed as Confidential - refer to https://www.dpaw.wa.gov.au/plants-and-animals/threatened-species-and-communities/threatened-animals"
mergedstatuses['new_description'] = mergedstatuses.apply(lambda x: faunadescrurl if pd.isna(x['wa_taxonID']) else floradescrurl,axis=1)
mergedstatuses['new_authority'] = "WA Deparment of Biodiversity, Conservation and Attractions"
mergedstatuses.rename(columns={'wa_geoprivacy':'new_geoprivacy'},inplace=True)
mergedstatuses['new_place_id'] = '6827'  # Queensland, AU
mergedstatuses['new_username'] = 'peggydnew'
mergedstatuses['new_iucn_equivalent'] = mergedstatuses['status'].str.lower().str.strip().map(iucnStatusMappings).fillna('Vulnerable') # map to dictionary
mergedstatuses['new_status'] = mergedstatuses['wa_status'].fillna('Sensitive')
mergedstatuses

Unnamed: 0,wa_taxonID,wa_scientificName,wa_status,new_geoprivacy,lsid,status_id,scientificName,taxon_id,user_id,description,...,geoprivacy,place_id,place_display_name,new_url,new_description,new_authority,new_place_id,new_username,new_iucn_equivalent,new_status
1,14044,Acacia adinophylla,Priority 1: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2898130,152923,Acacia adinophylla,898581,708886,,...,obscured,6827,"Western Australia, AU",https://florabase.dpaw.wa.gov.au/browse/profil...,Listed as Confidential - refer to https://www....,"WA Deparment of Biodiversity, Conservation and...",6827,peggydnew,Endangered,Priority 1: Poorly-known species
2,44442,Acacia adjutrices,Priority 3: Poorly-known species,obscured,https://id.biodiversity.org.au/taxon/apni/5128...,153375,Acacia adjutrices,898583,708886,,...,obscured,6827,"Western Australia, AU",https://florabase.dpaw.wa.gov.au/browse/profil...,Listed as Confidential - refer to https://www....,"WA Deparment of Biodiversity, Conservation and...",6827,peggydnew,Endangered,Priority 3: Poorly-known species
4,13074,Acacia alexandri,Priority 3: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2904701,153652,Acacia alexandri,898592,708886,,...,obscured,6827,"Western Australia, AU",https://florabase.dpaw.wa.gov.au/browse/profil...,Listed as Confidential - refer to https://www....,"WA Deparment of Biodiversity, Conservation and...",6827,peggydnew,Endangered,Priority 3: Poorly-known species
5,14046,Acacia ampliata,Priority 1: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2919087,153206,Acacia ampliata,827789,708886,,...,obscured,6827,"Western Australia, AU",https://florabase.dpaw.wa.gov.au/browse/profil...,Listed as Confidential - refer to https://www....,"WA Deparment of Biodiversity, Conservation and...",6827,peggydnew,Endangered,Priority 1: Poorly-known species
6,14047,Acacia amyctica,Priority 2: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2902736,153042,Acacia amyctica,898602,708886,,...,obscured,6827,"Western Australia, AU",https://florabase.dpaw.wa.gov.au/browse/profil...,Listed as Confidential - refer to https://www....,"WA Deparment of Biodiversity, Conservation and...",6827,peggydnew,Endangered,Priority 2: Poorly-known species
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3871,,Turnix varius scintillans,Endangered,obscured,https://biodiversity.org.au/afd/taxa/a5f510c9-...,,,,,,...,,,,https://bie.ala.org.au/species/https://biodive...,Listed as Confidential - refer to https://www....,"WA Deparment of Biodiversity, Conservation and...",6827,peggydnew,Vulnerable,Endangered
3872,,Tursiops aduncus,Migratory,obscured,https://biodiversity.org.au/afd/taxa/0cfe42e3-...,,,,,,...,,,,https://bie.ala.org.au/species/https://biodive...,Listed as Confidential - refer to https://www....,"WA Deparment of Biodiversity, Conservation and...",6827,peggydnew,Vulnerable,Migratory
3873,,Tyto novaehollandiae kimberli,Priority 1: Poorly-known species,obscured,https://biodiversity.org.au/afd/taxa/d1a27333-...,,,,,,...,,,,https://bie.ala.org.au/species/https://biodive...,Listed as Confidential - refer to https://www....,"WA Deparment of Biodiversity, Conservation and...",6827,peggydnew,Vulnerable,Priority 1: Poorly-known species
3874,,Tyto novaehollandiae novaehollandiae,Priority 3: Poorly-known species,obscured,https://biodiversity.org.au/afd/taxa/44488be2-...,,,,,,...,,,,https://bie.ala.org.au/species/https://biodive...,Listed as Confidential - refer to https://www....,"WA Deparment of Biodiversity, Conservation and...",6827,peggydnew,Vulnerable,Priority 3: Poorly-known species


## Updates

In [16]:
# those that need to be updated - we found a status
mergedstatuses[mergedstatuses['status_id'].notnull()][['wa_scientificName','wa_status','status_id','taxon_id','status','new_geoprivacy','geoprivacy','authority','user_id']]

Unnamed: 0,wa_scientificName,wa_status,status_id,taxon_id,status,new_geoprivacy,geoprivacy,authority,user_id
1,Acacia adinophylla,Priority 1: Poorly-known species,152923,898581,endangered,obscured,obscured,WA Department of Environment and Convservation,708886
2,Acacia adjutrices,Priority 3: Poorly-known species,153375,898583,endangered,obscured,obscured,WA Department of Environment and Convservation,708886
4,Acacia alexandri,Priority 3: Poorly-known species,153652,898592,endangered,obscured,obscured,WA Department of Environment and Convservation,708886
5,Acacia ampliata,Priority 1: Poorly-known species,153206,827789,endangered,obscured,obscured,WA Department of Environment and Convservation,708886
6,Acacia amyctica,Priority 2: Poorly-known species,153042,898602,endangered,obscured,obscured,WA Department of Environment and Convservation,708886
...,...,...,...,...,...,...,...,...,...
3894,Zephyrarchaea melindae,Vulnerable,153596,828667,vulnerable,obscured,obscured,WA Department of Environment and Convservation,708886
3895,Zephyrarchaea robinsi,Vulnerable,153176,828668,vulnerable,obscured,obscured,WA Department of Environment and Convservation,708886
3314,Zeuxine oblonga,Priority 2: Poorly-known species,169907,369267,NT,obscured,,Atlas of Living Australia,702203
3315,Zeuxine oblonga,Priority 2: Poorly-known species,153758,369267,NT,obscured,obscured,WA Department of Environment and Convservation,708886


In [17]:
# updates - create the data frame
# action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
updates = pd.DataFrame(mergedstatuses[mergedstatuses['status_id'].notnull()])
updates.sort_values('scientificName')
updates['action'] = 'UPDATE'
#updates.loc[:,'action'] = 'UPDATE'
updates = updates[['action','scientificName','status_id','taxon_id','new_status','new_iucn_equivalent','new_authority','new_url','new_geoprivacy','new_place_id','new_username','new_description']]
updates.columns = updates.columns.str.replace("new_", "", regex=True)
updates = updates.rename(columns={'scientificName':'taxon_name',
                                  'status_id':'id'})
updates

Unnamed: 0,action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
1,UPDATE,Acacia adinophylla,152923,898581,Priority 1: Poorly-known species,Endangered,"WA Deparment of Biodiversity, Conservation and...",https://florabase.dpaw.wa.gov.au/browse/profil...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
2,UPDATE,Acacia adjutrices,153375,898583,Priority 3: Poorly-known species,Endangered,"WA Deparment of Biodiversity, Conservation and...",https://florabase.dpaw.wa.gov.au/browse/profil...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
4,UPDATE,Acacia alexandri,153652,898592,Priority 3: Poorly-known species,Endangered,"WA Deparment of Biodiversity, Conservation and...",https://florabase.dpaw.wa.gov.au/browse/profil...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
5,UPDATE,Acacia ampliata,153206,827789,Priority 1: Poorly-known species,Endangered,"WA Deparment of Biodiversity, Conservation and...",https://florabase.dpaw.wa.gov.au/browse/profil...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
6,UPDATE,Acacia amyctica,153042,898602,Priority 2: Poorly-known species,Endangered,"WA Deparment of Biodiversity, Conservation and...",https://florabase.dpaw.wa.gov.au/browse/profil...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
...,...,...,...,...,...,...,...,...,...,...,...,...
3894,UPDATE,Zephyrarchaea melindae,153596,828667,Vulnerable,Vulnerable,"WA Deparment of Biodiversity, Conservation and...",https://bie.ala.org.au/species/https://biodive...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
3895,UPDATE,Zephyrarchaea robinsi,153176,828668,Vulnerable,Vulnerable,"WA Deparment of Biodiversity, Conservation and...",https://bie.ala.org.au/species/https://biodive...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
3314,UPDATE,Zeuxine oblonga,169907,369267,Priority 2: Poorly-known species,Vulnerable,"WA Deparment of Biodiversity, Conservation and...",https://florabase.dpaw.wa.gov.au/browse/profil...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
3315,UPDATE,Zeuxine oblonga,153758,369267,Priority 2: Poorly-known species,Vulnerable,"WA Deparment of Biodiversity, Conservation and...",https://florabase.dpaw.wa.gov.au/browse/profil...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....


## No status in iNaturalist via straight scientificName match
The WA records that didn't match up to a status in iNaturalist

In [18]:
# to add: those that have no inaturalist status - 532!!
noinatstatus = mergedstatuses[mergedstatuses['status_id'].isnull()]
# try to match the taxon name to something in inaturalist
noinatstatus = noinatstatus.merge(inattaxa, how="left", left_on="wa_scientificName",right_on="scientificName")
noinatstatus

Unnamed: 0,wa_taxonID,wa_scientificName,wa_status,new_geoprivacy,lsid,status_id,scientificName_x,taxon_id,user_id,description,...,class,order,family,genus,specificEpithet,infraspecificEpithet,modified,scientificName_y,taxonRank,references
0,50593,Abildgaardia pachyptera,Priority 1: Poorly-known species,obscured,https://id.biodiversity.org.au/name/apni/51389644,,,,,,...,,,,,,,,,,
1,16110,Acacia alata platyptera,"Priority 4: Rare, Near Threatened",obscured,https://id.biodiversity.org.au/node/apni/2904348,,,,,,...,Magnoliopsida,Fabales,Fabaceae,Acacia,alata,platyptera,2019-02-16T06:09:01Z,Acacia alata platyptera,variety,http://www.ubio.org/browser/details.php?nameba...
2,14585,Acacia ancistrophylla lissophylla,Priority 2: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2916096,,,,,,...,Magnoliopsida,Fabales,Fabaceae,Acacia,ancistrophylla,lissophylla,2022-03-07T22:20:25Z,Acacia ancistrophylla lissophylla,variety,https://powo.science.kew.org/taxon/urn:lsid:ip...
3,14048,Acacia ancistrophylla perarcuata,Priority 3: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2910813,,,,,,...,Magnoliopsida,Fabales,Fabaceae,Acacia,ancistrophylla,perarcuata,2021-07-28T02:21:59Z,Acacia ancistrophylla perarcuata,variety,https://eol.org/pages/50482478
4,14725,Acacia ataxiphylla ataxiphylla,Priority 3: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2903075,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3019,,Turnix varius scintillans,Endangered,obscured,https://biodiversity.org.au/afd/taxa/a5f510c9-...,,,,,,...,Aves,Charadriiformes,Turnicidae,Turnix,varius,scintillans,2018-12-19T06:54:39Z,Turnix varius scintillans,subspecies,http://www.birds.cornell.edu/clementschecklist...
3020,,Tursiops aduncus,Migratory,obscured,https://biodiversity.org.au/afd/taxa/0cfe42e3-...,,,,,,...,Mammalia,Artiodactyla,Delphinidae,Tursiops,aduncus,,2019-11-23T00:16:07Z,Tursiops aduncus,species,http://www.catalogueoflife.org/annual-checklis...
3021,,Tyto novaehollandiae kimberli,Priority 1: Poorly-known species,obscured,https://biodiversity.org.au/afd/taxa/d1a27333-...,,,,,,...,Aves,Strigiformes,Tytonidae,Tyto,novaehollandiae,kimberli,2018-12-19T08:22:30Z,Tyto novaehollandiae kimberli,subspecies,
3022,,Tyto novaehollandiae novaehollandiae,Priority 3: Poorly-known species,obscured,https://biodiversity.org.au/afd/taxa/44488be2-...,,,,,,...,Aves,Strigiformes,Tytonidae,Tyto,novaehollandiae,novaehollandiae,2018-12-19T08:22:32Z,Tyto novaehollandiae novaehollandiae,subspecies,


In [19]:
noinatstatus[noinatstatus['id'].notna()] # there's no status but there is a matching inat taxon (id is the taxon id)
# note: "Dendrobium" matches to both genus and section

Unnamed: 0,wa_taxonID,wa_scientificName,wa_status,new_geoprivacy,lsid,status_id,scientificName_x,taxon_id,user_id,description,...,class,order,family,genus,specificEpithet,infraspecificEpithet,modified,scientificName_y,taxonRank,references
1,16110,Acacia alata platyptera,"Priority 4: Rare, Near Threatened",obscured,https://id.biodiversity.org.au/node/apni/2904348,,,,,,...,Magnoliopsida,Fabales,Fabaceae,Acacia,alata,platyptera,2019-02-16T06:09:01Z,Acacia alata platyptera,variety,http://www.ubio.org/browser/details.php?nameba...
2,14585,Acacia ancistrophylla lissophylla,Priority 2: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2916096,,,,,,...,Magnoliopsida,Fabales,Fabaceae,Acacia,ancistrophylla,lissophylla,2022-03-07T22:20:25Z,Acacia ancistrophylla lissophylla,variety,https://powo.science.kew.org/taxon/urn:lsid:ip...
3,14048,Acacia ancistrophylla perarcuata,Priority 3: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2910813,,,,,,...,Magnoliopsida,Fabales,Fabaceae,Acacia,ancistrophylla,perarcuata,2021-07-28T02:21:59Z,Acacia ancistrophylla perarcuata,variety,https://eol.org/pages/50482478
6,31784,Acacia barrettiorum,Priority 2: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2917300,,,,,,...,Magnoliopsida,Fabales,Fabaceae,Acacia,barrettiorum,,2022-04-07T01:35:57Z,Acacia barrettiorum,species,https://eol.org/pages/49426101
7,41461,Acacia bartlei,Priority 3: Poorly-known species,obscured,https://id.biodiversity.org.au/node/apni/2895538,,,,,,...,Magnoliopsida,Fabales,Fabaceae,Acacia,bartlei,,2022-04-07T01:36:00Z,Acacia bartlei,species,https://eol.org/pages/49426080
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3018,,Turgenitubulus christenseni,Endangered,obscured,https://biodiversity.org.au/afd/taxa/06192667-...,,,,,,...,Gastropoda,Stylommatophora,Camaenidae,Turgenitubulus,christenseni,,2021-10-29T18:32:13Z,Turgenitubulus christenseni,species,http://www.catalogueoflife.org/annual-checklis...
3019,,Turnix varius scintillans,Endangered,obscured,https://biodiversity.org.au/afd/taxa/a5f510c9-...,,,,,,...,Aves,Charadriiformes,Turnicidae,Turnix,varius,scintillans,2018-12-19T06:54:39Z,Turnix varius scintillans,subspecies,http://www.birds.cornell.edu/clementschecklist...
3020,,Tursiops aduncus,Migratory,obscured,https://biodiversity.org.au/afd/taxa/0cfe42e3-...,,,,,,...,Mammalia,Artiodactyla,Delphinidae,Tursiops,aduncus,,2019-11-23T00:16:07Z,Tursiops aduncus,species,http://www.catalogueoflife.org/annual-checklis...
3021,,Tyto novaehollandiae kimberli,Priority 1: Poorly-known species,obscured,https://biodiversity.org.au/afd/taxa/d1a27333-...,,,,,,...,Aves,Strigiformes,Tytonidae,Tyto,novaehollandiae,kimberli,2018-12-19T08:22:30Z,Tyto novaehollandiae kimberli,subspecies,


In [20]:
# there's no status but there is a matching inat taxon (id is the taxon id)
additions = pd.DataFrame(noinatstatus[noinatstatus['id'].notna()])
additions['scientificName'] = additions['wa_scientificName']
#additions['new_status'] = additions['wa_status']
additions.sort_values(['scientificName'])
additions['action'] = 'ADD'
additions = additions[['action','scientificName','status_id','id','new_status','new_iucn_equivalent','new_authority','new_url','new_geoprivacy','new_place_id','new_username','new_description']]
additions.columns = additions.columns.str.replace("new_", "", regex=True)
additions = additions.rename(columns={'scientificName':'taxon_name',
                                      'id':'taxon_id',
                                  'status_id':'id'})
additions

Unnamed: 0,action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
1,ADD,Acacia alata platyptera,,145423,"Priority 4: Rare, Near Threatened",Vulnerable,"WA Deparment of Biodiversity, Conservation and...",https://florabase.dpaw.wa.gov.au/browse/profil...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
2,ADD,Acacia ancistrophylla lissophylla,,1361077,Priority 2: Poorly-known species,Vulnerable,"WA Deparment of Biodiversity, Conservation and...",https://florabase.dpaw.wa.gov.au/browse/profil...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
3,ADD,Acacia ancistrophylla perarcuata,,1252488,Priority 3: Poorly-known species,Vulnerable,"WA Deparment of Biodiversity, Conservation and...",https://florabase.dpaw.wa.gov.au/browse/profil...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
6,ADD,Acacia barrettiorum,,1252534,Priority 2: Poorly-known species,Vulnerable,"WA Deparment of Biodiversity, Conservation and...",https://florabase.dpaw.wa.gov.au/browse/profil...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
7,ADD,Acacia bartlei,,1252535,Priority 3: Poorly-known species,Vulnerable,"WA Deparment of Biodiversity, Conservation and...",https://florabase.dpaw.wa.gov.au/browse/profil...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
...,...,...,...,...,...,...,...,...,...,...,...,...
3018,ADD,Turgenitubulus christenseni,,1242615,Endangered,Vulnerable,"WA Deparment of Biodiversity, Conservation and...",https://bie.ala.org.au/species/https://biodive...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
3019,ADD,Turnix varius scintillans,,708564,Endangered,Vulnerable,"WA Deparment of Biodiversity, Conservation and...",https://bie.ala.org.au/species/https://biodive...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
3020,ADD,Tursiops aduncus,,41481,Migratory,Vulnerable,"WA Deparment of Biodiversity, Conservation and...",https://bie.ala.org.au/species/https://biodive...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....
3021,ADD,Tyto novaehollandiae kimberli,,732085,Priority 1: Poorly-known species,Vulnerable,"WA Deparment of Biodiversity, Conservation and...",https://bie.ala.org.au/species/https://biodive...,obscured,6827,peggydnew,Listed as Confidential - refer to https://www....


In [21]:
all = pd.concat([updates,additions])
all.to_csv(sourcedir + "wa.csv", index=False )

# Reports
## Statuses with no matching taxon in iNaturalist
Things that didn't match to a taxon:
1.Those that didn't play nicely with the GBIF parser
2.Those that there was no taxa match for.

In [22]:
noncomply


Unnamed: 0.1,Unnamed: 0,id,name,commonName,scientificName_x,lsid,dataResourceUid,kvpValues,taxonId,W A Status,...,status description,category,sensitivityZoneId,W A Rank,taxonRemarks,scientificName_y,canonicalName,canonicalNameComplete,type,rankMarker
1,0,2818496,Abutilon sp. Hamelin (A.M. Ashby 2196),,Abutilon sp. Hamelin (A.M.Ashby 2196),https://id.biodiversity.org.au/node/apni/2898729,dr18406,"[{'key': 'taxonId', 'value': '14112'}, {'key':...",14112,2,...,Priority 2: Poorly-known species - known from ...,P2,WA,,,Abutilon sp. Hamelin (A.M. Ashby 2196),Abutilon spec.,Abutilon spec. Hamelin,INFORMAL,sp.
2,0,2821218,Abutilon sp. Onslow (F. Smith s.n. 10/9/61),,Abutilon sp. Onslow (F. Smith s.n. 10/9/61),ALA_DR2201_3101,dr18406,"[{'key': 'taxonId', 'value': '14110'}, {'key':...",14110,3,...,Priority 3: Poorly-known species - known from ...,P3,WA,,,Abutilon sp. Onslow (F. Smith s.n. 10/9/61),Abutilon spec.,Abutilon spec. Onslow,INFORMAL,sp.
3,0,2819452,Abutilon sp. Pritzelianum (S. van Leeuwen 5095),,Abutilon sp. Pritzelianum (S.van Leeuwen 5095),https://id.biodiversity.org.au/node/apni/2905152,dr18406,"[{'key': 'taxonId', 'value': '43021'}, {'key':...",43021,3,...,Priority 3: Poorly-known species - known from ...,P3,WA,,,Abutilon sp. Pritzelianum (S. van Leeuwen 5095),Abutilon spec.,Abutilon spec. Pritzelianum,INFORMAL,sp.
4,0,2820137,Abutilon sp. Quobba (H. Demarz 3858),,Abutilon sp. Quobba (H.Demarz 3858),https://id.biodiversity.org.au/node/apni/2920532,dr18406,"[{'key': 'taxonId', 'value': '14114'}, {'key':...",14114,2,...,Priority 2: Poorly-known species - known from ...,P2,WA,,,Abutilon sp. Quobba (H. Demarz 3858),Abutilon spec.,Abutilon spec. Quobba,INFORMAL,sp.
5,0,2819481,Abutilon sp. Warburton (A.S. George 8164),,Abutilon sp. Warburton (A.S.George 8164),https://id.biodiversity.org.au/node/apni/2890423,dr18406,"[{'key': 'taxonId', 'value': '14155'}, {'key':...",14155,1,...,Priority 1: Poorly-known species - known from ...,P1,WA,,,Abutilon sp. Warburton (A.S. George 8164),Abutilon spec.,Abutilon spec. Warburton,INFORMAL,sp.
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4119,0,2819432,Ideoblothrus sp. 'Mesa A' (WAM T81374),An Ideoblothrus Pseudoscorpion (mesa A),Ideoblothrus sp. 'Mesa A' (WAM T81374),ALA_DR2201_3612,dr18406,"[{'key': 'status', 'value': 'Priority 1: Poorl...",,,...,Priority 1: Poorly-known species - known from ...,P1,WA,,,Ideoblothrus sp. 'Mesa A' (WAM T81374),Ideoblothrus spec.,Ideoblothrus spec. Mesa A,INFORMAL,sp.
4162,0,2820631,Lagorchestes hirsutus subsp. (Central Australia),Rufous Hare-wallaby (central Australia),Lagorchestes hirsutus subsp. (Central Australia),ALA_DR2201_3947,dr18406,"[{'key': 'status', 'value': 'Endangered'}, {'k...",,,...,Endangered,EN,WA,,,Lagorchestes hirsutus subsp. (Central Australia),Lagorchestes hirsutus subsp.,Lagorchestes hirsutus subsp.,INFORMAL,subsp.
4355,0,2821842,Rhytidid sp. (WAM 2295-69),Stirling Range Rhytidid Snail,Rhytidid sp. (WAM 2295-69),ALA_DR2201_4164,dr18406,"[{'key': 'status', 'value': 'Critically Endang...",,,...,Critically Endangered,CR,WA,,,Rhytidid sp. (WAM 2295-69),Rhytidid spec.,Rhytidid spec.,INFORMAL,sp.
4397,0,2820000,Teyl sp. (MYG693),,Teyl,https://biodiversity.org.au/afd/taxa/7cdbd74b-...,dr18406,"[{'key': 'taxonRemarks', 'value': '30/09/2022 ...",,,...,Critically Endangered,CR,WA,,30/09/2022 Museum specimen reference number ch...,Teyl sp. (MYG693),Teyl spec.,Teyl spec.,INFORMAL,sp.


In [23]:
# what didnt match to a taxon?
unknownToInat = noinatstatus[noinatstatus['id'].isna()]
unknownToInat.groupby('wa_status').size()

wa_status
Conservation Dependent                 1
Critically Endangered                 51
Endangered                            37
Extinct                               11
Migratory                              4
Priority 1: Poorly-known species     354
Priority 2: Poorly-known species     260
Priority 3: Poorly-known species     274
Priority 4: Rare, Near Threatened     67
Vulnerable                            68
dtype: int64

In [24]:
# what didnt match to a taxon?
pd.concat([noncomply,unknownToInat]).to_csv(sourcedir + "wa-no-inat-taxa-match.csv",index=False)

## iNaturalist statuses unaffected by these changes
Candidates for investigation and removal.

In [25]:
# inat statuses that aren't in added or updated
notaddedupdated = inatstatuses[~inatstatuses['taxon_id'].isin(updates['taxon_id'])]
notaddedupdated = notaddedupdated[notaddedupdated['user_id'] == "708886"]
notaddedupdated.to_csv(sourcedir + "wa-outstanding-inat-statuses.csv")
notaddedupdated

Unnamed: 0,status_id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,...,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references,preferred_common_name,is_active,current_synonymous_taxon_ids
664,158009,1042669,708886,6827,16654,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Corinomala,tumida,,2021-10-29T05:06:21Z,Corinomala tumida,species,,,,
1801,153169,107044,708886,6827,16654,WA Department of Environment and Convservation,critically endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Ningbingia,australis,,2021-10-29T09:25:06Z,Ningbingia australis,species,http://www.catalogueoflife.org/annual-checklis...,,,
1947,153332,108173,708886,6827,16654,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Pachysaga,munggai,,2011-08-04T08:56:40Z,Pachysaga munggai,species,http://www.catalogueoflife.org/annual-checklis...,,,
143,170015,1264437,708886,6827,16654,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,,,,,Arctophoca forsteri,,,New Zealand Fur Seal,False,[41752]
165,170012,1264442,708886,6827,16654,WA Department of Environment and Convservation,vulnerable,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,,,,,Arctophoca tropicalis,,,Subantarctic Fur Seal,False,[41753]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1872,153220,866315,708886,6827,16654,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Acacia,deltoidea,,2022-04-07T01:42:14Z,Acacia deltoidea,species,,,,
1711,153064,878695,708886,6827,16654,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Youwanjela,wilsoni,,2021-10-29T14:01:16Z,Youwanjela wilsoni,species,https://eol.org/pages/49878898,,,
1753,153112,898346,708886,6827,16654,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Euphorbia,occidentaustralica,,2020-02-19T16:40:34Z,Euphorbia occidentaustralica,species,,,,
1690,153038,898351,708886,6827,16654,WA Department of Environment and Convservation,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Euphorbia,inappendiculata,,2020-02-19T16:40:42Z,Euphorbia inappendiculata,species,,,,


In [26]:
# Stats
numsensitive = len(alasensitivelist.index)
# numconservation = len(conservationlist.index)
numupdates  = len(updates.index)
numadditions  = len(additions.index)
numnoinatstatus = len(noinatstatus.index)
numunknownToInat = len(unknownToInat.index)
numnotaddedupdated = len(notaddedupdated.index)
numnoncomply = len(noncomply.index)
numcomply = len(statelist.index)
numdupinfo = len(dupinformation.index)
d = {'Sensitive': [numsensitive],
    # 'Conservation': [numconservation],
    'Statelist merge': [numfullstatelist],
    'Species iNat Comply' : [numcomply],
    'Species iNat non-Comply': [numnoncomply],
    'Duplicate Information': [numdupinfo],
    'Updates': [numupdates],
    'Additions': [numadditions],
    'Not added updated': [numnotaddedupdated],
    'No Inat Status': [numnoinatstatus],
    'Unknown to Inat': [numunknownToInat]}

statsdf = pd.DataFrame(data=d)
statsdf

Unnamed: 0,Sensitive,Statelist merge,Species iNat Comply,Species iNat non-Comply,Duplicate Information,Updates,Additions,Not added updated,No Inat Status,Unknown to Inat
0,3895,3895,3895,560,0,873,1897,105,3024,1127
