# iNaturalist status updates by state - VIC

Using the file produced in the collate-status-taxa.ipynb: `inat-aust-status-taxa.csv`, generate lists to update iNaturalist statuses

## Prep - common to all states
1. Read in the inaturalist statuses & filter out this state
2. Read in the inaturalist [taxa list](#inaturalist-taxonomy)
3. Read in the state sensitive and conservation list, concatenate them into a single list
4. Wash the names in the state list through the gbif name parser
5. Attempt to match the state statuses to an IUCN equivalent
6. Determine the best placeID to use for this state

## Next steps:
7. Find Updates and Additions
7.1 Left join the state list with the iNaturalist statuses on scientificName
  * **Match** UPDATE the status (new details, new dept name or url)
  * **No Match** Left join the remainder (noinatstatus) to the inat taxonomy
     * Yes - ADD new status record
     * No - REPORT. Seek synonyms for the taxon, or create species in iNat for critical species

8. Find [Removals](##removals) - Left join the inaturalist statuses with the update list. Report on the remainder.

### 1. iNaturalist statuses

In [94]:
import pandas as pd

#projectdir = "/Users/oco115/PycharmProjects/authoritative-lists/" # basedir for this gh project
projectdir = "/Users/new330/IdeaProjects/authoritative-lists/" # basedir for this gh project
sourcedir = projectdir + "source-data/inaturalist-statuses/"
listdir = projectdir + "current-lists/"

# read in the statuses
taxastatus = pd.read_csv(sourcedir + "inat-aust-status-taxa.csv", encoding='UTF-8',na_filter=False,dtype=str) ## Read inaturalist conservation statuses file
taxastatus.head(3)

Unnamed: 0,id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,...,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references,preferred_common_name,is_active,current_synonymous_taxon_ids
0,166449,38493,1138587,7830,,Flora and Fauna Guarantee Act 1988,CR,,,obscured,...,Eulamprus,kosciuskoi,,2021-03-01T10:35:01Z,Eulamprus kosciuskoi,species,http://reptile-database.reptarium.cz/search.ph...,,,
1,234788,918383,702203,9994,,Atlas of Living Australia,NT,https://bie.ala.org.au/species/https://id.biod...,,,...,Chiloschista,phyllorhiza,,2022-01-08T03:30:36Z,Chiloschista phyllorhiza,species,http://www.catalogueoflife.org/annual-checklis...,,,
2,234789,918383,702203,7308,,Atlas of Living Australia,LC,https://bie.ala.org.au/species/https://id.biod...,,,...,Chiloschista,phyllorhiza,,2022-01-08T03:30:36Z,Chiloschista phyllorhiza,species,http://www.catalogueoflife.org/annual-checklis...,,,


In [95]:
def filter_state_statuses(stateregex: str, urlregex: str):
    authoritydf = taxastatus['authority'].drop_duplicates().sort_values()
    authoritydf = authoritydf[pd.Series(authoritydf).str.contains(stateregex)]
    urldf = taxastatus['url'].drop_duplicates().sort_values()
    urldf = urldf[pd.Series(urldf).str.contains(urlregex)]
    placedisplaydf = taxastatus['place_display_name'].drop_duplicates().sort_values()
    placedisplaydf = placedisplaydf[pd.Series(placedisplaydf).str.contains(stateregex)]
    placedf = taxastatus['place_name'].drop_duplicates().sort_values()
    placedf = placedf[pd.Series(placedf).str.contains(stateregex)]
    # concat all and remove duplicates
    statedf = pd.concat([taxastatus.apply(lambda row: row[taxastatus['place_display_name'].isin(placedisplaydf)]),
                         taxastatus.apply(lambda row: row[taxastatus['place_name'].isin(placedf)]),
                         taxastatus.apply(lambda row: row[taxastatus['url'].isin(urldf)]),
                         taxastatus.apply(
                             lambda row: row[taxastatus['authority'].isin(authoritydf)])]).drop_duplicates()
    return statedf.sort_values(['taxon_id', 'user_id'])

inatstatuses = filter_state_statuses(" VIC |Victoria|VICTORIA|Vic","vic.gov.au")
inatstatuses.rename(columns={'id':'status_id','id_y':'taxon_id_y'},inplace=True)
inatstatuses

Unnamed: 0,status_id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,...,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references,preferred_common_name,is_active,current_synonymous_taxon_ids
159,264604,100611,3249428,7830,,Victoria Flora and Fauna Guarantee Act 1988,Threatened,https://www.environment.vic.gov.au/conserving-...,,open,...,Euastacus,armatus,,2022-06-06T16:36:21Z,Euastacus armatus,species,http://www.iucnredlist.org/apps/redlist/details,,,
158,264603,100616,3249428,7830,,Victoria Flora and Fauna Guarantee Act 1988,Endangered,https://www.environment.vic.gov.au/conserving-...,,obscured,...,Euastacus,bispinosus,,2022-06-06T16:26:39Z,Euastacus bispinosus,species,http://www.iucnredlist.org/apps/redlist/details,,,
2371,153834,100619,708886,7830,16656,VIC Government,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Euastacus,claytoni,,2020-05-28T05:05:59Z,Euastacus claytoni,species,http://www.iucnredlist.org/apps/redlist/details,,,
2388,153867,100620,708886,7830,16656,VIC Government,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Euastacus,crassus,,2020-05-28T05:04:27Z,Euastacus crassus,species,http://www.iucnredlist.org/apps/redlist/details,,,
3316,265501,100657,3249428,7830,,Flora and Fauna Guarantee Act 1988,Endangered,https://www.environment.vic.gov.au/conserving-...,,open,...,Euastacus,yanga,,2022-06-14T09:17:17Z,Euastacus yanga,species,http://www.iucnredlist.org/apps/redlist/details,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2865,153813,99966,708886,7830,16656,Victoria Flora and Fauna Guarantee Act 1988,Critically Endangered,https://www.environment.vic.gov.au/conserving-...,,obscured,...,Engaeus,sternalis,,2022-06-10T13:58:03Z,Engaeus sternalis,species,http://www.iucnredlist.org/apps/redlist/details,,,
2386,153863,99967,708886,7830,16656,VIC Government,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Engaeus,strictifrons,,2020-05-28T05:03:37Z,Engaeus strictifrons,species,http://www.iucnredlist.org/apps/redlist/details,,,
163,264608,99969,3249428,7830,,Victoria Flora and Fauna Guarantee Act 1988,Endangered,https://www.environment.vic.gov.au/conserving-...,,open,...,Engaeus,tuberculatus,,2022-07-02T08:00:10Z,Engaeus tuberculatus,species,http://www.iucnredlist.org/apps/redlist/details,,,
2484,153828,99970,708886,7830,16656,Victoria Flora and Fauna Guarantee Act 1988,Critically Endangered,https://www.environment.vic.gov.au/conserving-...,,obscured,...,Engaeus,urostrictus,,2022-07-18T14:12:03Z,Engaeus urostrictus,species,http://www.iucnredlist.org/apps/redlist/details,,,


### 2. iNaturalist taxonomy

In [96]:
# Output files contain these fields
# Taxon_Name,Status,Authority,IUCN_equivalent,Description,iNaturalist_Place_ID,url,Taxon_Geoprivacy,Username,taxon_id
# so we need to match species from the state lists to the inat taxa to get the taxon_id

import zipfile
url = "https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip"
filename = url.split("/")[-1]

z=zipfile.ZipFile(sourcedir + filename)

with z.open('taxa.csv') as from_archive:
    inattaxa = pd.read_csv(from_archive,dtype=str)
z.close()
inattaxa.head(3)


Unnamed: 0,id,taxonID,identifier,parentNameUsageID,kingdom,phylum,class,order,family,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references
0,1,https://www.inaturalist.org/taxa/1,https://www.inaturalist.org/taxa/1,https://www.inaturalist.org/taxa/48460,Animalia,,,,,,,,2021-11-02T06:05:44Z,Animalia,kingdom,http://www.catalogueoflife.org/annual-checklis...
1,2,https://www.inaturalist.org/taxa/2,https://www.inaturalist.org/taxa/2,https://www.inaturalist.org/taxa/1,Animalia,Chordata,,,,,,,2021-11-23T00:40:18Z,Chordata,phylum,http://www.catalogueoflife.org/annual-checklis...
2,3,https://www.inaturalist.org/taxa/3,https://www.inaturalist.org/taxa/3,https://www.inaturalist.org/taxa/355675,Animalia,Chordata,Aves,,,,,,2022-12-27T07:33:16Z,Aves,class,http://www.catalogueoflife.org/annual-checklis...


### 3. State lists

Get the ALA Conservation and Sensitive lists


In [5]:
 %%script echo skipping # comment this line to download dataset from lists.ala.org.au the web and save locally
import sys
import os
sys.path.append(os.path.abspath(projectdir + "source-code/includes"))
import list_functions as lf

sensitivelist = lf.download_ala_list("https://lists-test.ala.org.au/ws/speciesListItems/dr18669?max=10000&includeKVP=true")
sensitivelist = lf.kvp_to_columns(sensitivelist)
sensitivelist.to_csv(sourcedir + "vic-ala-sensitive.csv", index=False)

conservationlist = lf.download_ala_list("https://lists-test.ala.org.au/ws/speciesListItems/dr655?max=10000&includeKVP=true")
conservationlist = lf.kvp_to_columns(conservationlist)
conservationlist.to_csv(sourcedir + "vic-ala-conservation.csv", index=False)

In [97]:
# Read sensitive list data
sensitivelist = pd.read_csv(sourcedir + "vic-ala-sensitive.csv", dtype=str)
sensitivelist['vba_geoprivacy'] = 'obscured'
sensitivelist

Unnamed: 0,id,name,commonName,scientificName,lsid,dataResourceUid,kvpValues,taxonID,scientificNameAuthority,primaryDiscipline,speciesGroup,ffgactstatus,vicadvisorystatus,restrictedFlag,modified,extractDate,status,sourceStatus,epbcactStatus,vba_geoprivacy
0,2803999,Engaeus australis,Freshwater Crayfish Or Yabby,Engaeus australis,https://biodiversity.org.au/afd/taxa/feada41f-...,dr18669,"[{'key': 'taxonID', 'value': '1686'}, {'key': ...",1686,"Riek, 1969",Aquatic fauna,"Mussels, decapod crustacea",Critically Endangered,Vulnerable,rest,2013-12-18,2023-01-16,Critically Endangered,Critically Endangered,,obscured
1,2804013,Engaeus fultoni,Otway Burrowing Crayfish,Engaeus fultoni,https://biodiversity.org.au/afd/taxa/7994c955-...,dr18669,"[{'key': 'taxonID', 'value': '1674'}, {'key': ...",1674,"Smith & Schuster, 1913",Aquatic fauna,"Mussels, decapod crustacea",Vulnerable,Vulnerable,rest,2013-12-18,2023-01-16,Vulnerable,Vulnerable,,obscured
2,2804088,Engaeus mallacoota,Mallacoota Burrowing Crayfish,Engaeus mallacoota,https://biodiversity.org.au/afd/taxa/bf6f5d52-...,dr18669,"[{'key': 'taxonID', 'value': '1694'}, {'key': ...",1694,"Horwitz, 1990",Aquatic fauna,"Mussels, decapod crustacea",Critically Endangered,Vulnerable,rest,2013-12-17,2023-01-16,Critically Endangered,Critically Endangered,,obscured
3,2804052,Engaeus phyllocercus,Narracan Burrowing Crayfish,Engaeus phyllocercus,https://biodiversity.org.au/afd/taxa/bb2b1f80-...,dr18669,"[{'key': 'taxonID', 'value': '1695'}, {'key': ...",1695,"Smith & Schuster, 1913",Aquatic fauna,"Mussels, decapod crustacea",Endangered,Endangered,rest,2012-11-07,2023-01-16,Endangered,Endangered,,obscured
4,2803985,Engaeus rostrogaleatus,Strzelecki Burrowing Crayfish,Engaeus rostrogaleatus,https://biodiversity.org.au/afd/taxa/cd66d8b6-...,dr18669,"[{'key': 'taxonID', 'value': '1683'}, {'key': ...",1683,"Horwitz, 1990",Aquatic fauna,"Mussels, decapod crustacea",Endangered,Endangered,rest,2012-11-07,2023-01-16,Endangered,Endangered,,obscured
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
131,2804096,Synamphisopus ambiguus,Phreatoic Isopod,Synamphisopus ambiguus,https://biodiversity.org.au/afd/taxa/bc3b9067-...,dr18669,"[{'key': 'taxonID', 'value': '75168'}, {'key':...",75168,"(Sheard, 1936)",Terrestrial fauna,Invertebrates,Vulnerable,Vulnerable,rest,2010-09-16,2023-01-16,Vulnerable,Vulnerable,,obscured
132,2804078,Synamphisopus doegi,Phreatoic Isopod,Synamphisopus doegi,https://biodiversity.org.au/afd/taxa/fdb51ee6-...,dr18669,"[{'key': 'taxonID', 'value': '75169'}, {'key':...",75169,"Wilson & Keable, 2002",Terrestrial fauna,Invertebrates,Vulnerable,Vulnerable,rest,2012-11-20,2023-01-16,Vulnerable,Vulnerable,,obscured
133,2804004,Varanus rosenbergi,Heath Monitor,Varanus rosenbergi,https://biodiversity.org.au/afd/taxa/a01a6bb4-...,dr18669,"[{'key': 'taxonID', 'value': '12287'}, {'key':...",12287,,Terrestrial fauna,Reptiles,Critically Endangered,Endangered,rest,2020-04-14,2023-01-16,Critically Endangered,Critically Endangered,,obscured
134,2804077,Vermicella annulata,Bandy Bandy,Vermicella annulata,https://biodiversity.org.au/afd/taxa/4c2e7ce4-...,dr18669,"[{'key': 'taxonID', 'value': '12734'}, {'key':...",12734,,Terrestrial fauna,Reptiles,Endangered,Vulnerable,rest,2018-08-03,2023-01-16,Endangered,Endangered,,obscured


In [98]:
conservationlist = pd.read_csv(sourcedir + "vic-ala-conservation.csv", dtype=str)
conservationlist['vba_geoprivacy'] = conservationlist['restrictedFlag'].apply(lambda x: 'open' if -pd.isnull(x) else 'obscured')
conservationlist

Unnamed: 0,id,name,commonName,scientificName,lsid,dataResourceUid,kvpValues,taxonID,scientificNameAuthority,primaryDiscipline,...,ffgactstatus,vicadvisorystatus,modified,extractDate,status,sourceStatus,epbcactStatus,restrictedFlag,establishmentMeans,vba_geoprivacy
0,2803160,Ambassis agassizii,Agassiz's Glassfish,Ambassis agassizii,https://biodiversity.org.au/afd/taxa/b0ff773c-...,dr655,"[{'key': 'taxonID', 'value': '4864'}, {'key': ...",4864,"Steindachner, 1867",Aquatic fauna,...,Extinct,Regionally extinct,2013-04-04,2023-01-16,Extinct,Extinct,,,,open
1,2803307,Bidyanus bidyanus,Silver Perch,Bidyanus bidyanus,https://biodiversity.org.au/afd/taxa/05866f31-...,dr655,"[{'key': 'taxonID', 'value': '528544'}, {'key'...",528544,"(Mitchell, 1838)",Aquatic fauna,...,Endangered,Vulnerable,2016-05-24,2023-01-16,Endangered,Endangered,Critically Endangered,,,open
2,2802393,Chelodina expansa,Broad-shelled Turtle,Chelodina (Macrochelodina) expansa,https://biodiversity.org.au/afd/taxa/fc7d0724-...,dr655,"[{'key': 'taxonID', 'value': '5133'}, {'key': ...",5133,"Gray, 1857",Aquatic fauna,...,Endangered,Endangered,2014-11-20,2023-01-16,Endangered,Endangered,,,,open
3,2803025,Craterocephalus fluviatilis,Murray Hardyhead,Craterocephalus fluviatilis,https://biodiversity.org.au/afd/taxa/50568ccf-...,dr655,"[{'key': 'taxonID', 'value': '4784'}, {'key': ...",4784,"McCulloch, 1912",Aquatic fauna,...,Critically Endangered,Critically endangered,2013-04-04,2023-01-16,Critically Endangered,Critically Endangered,Endangered,,,open
4,2802906,Emydura macquarii,Southern River Turtles,Emydura macquarii,https://biodiversity.org.au/afd/taxa/39c22a1e-...,dr655,"[{'key': 'taxonID', 'value': '5135'}, {'key': ...",5135,,Aquatic fauna,...,Critically Endangered,Vulnerable,2013-07-02,2023-01-16,Critically Endangered,Critically Endangered,,,,open
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1994,2802846,Varanus varius,Lace Monitor,Varanus varius,https://biodiversity.org.au/afd/taxa/6338346a-...,dr655,"[{'key': 'taxonID', 'value': '12283'}, {'key':...",12283,,Terrestrial fauna,...,Endangered,Endangered,2013-04-29,2023-01-16,Endangered,Endangered,,,,open
1995,2803060,Vermicella annulata,Bandy Bandy,Vermicella annulata,https://biodiversity.org.au/afd/taxa/4c2e7ce4-...,dr655,"[{'key': 'taxonID', 'value': '12734'}, {'key':...",12734,,Terrestrial fauna,...,Endangered,Vulnerable,2018-08-03,2023-01-16,Endangered,Endangered,,rest,,obscured
1996,2803837,Victaphanta compacta,Otway Black Snail,Victaphanta compacta,https://biodiversity.org.au/afd/taxa/e9582432-...,dr655,"[{'key': 'taxonID', 'value': '15050'}, {'key':...",15050,"(Cox & Hedley, 1912)",Terrestrial fauna,...,Endangered,Endangered,2010-12-02,2023-01-16,Endangered,Endangered,,rest,,obscured
1997,2802531,Xenus cinereus,Terek Sandpiper,Xenus cinereus,https://biodiversity.org.au/afd/taxa/4090ad27-...,dr655,"[{'key': 'taxonID', 'value': '10160'}, {'key':...",10160,,Terrestrial fauna,...,Endangered,Endangered,2010-12-02,2023-01-16,Endangered,Endangered,,,,open


## GBIF name parser

In [99]:
# join them in a way that works for inat (eg sensitive list, geoprivacy = 'obscured'
statelist = pd.concat([sensitivelist[['taxonID', 'name', 'status', 'vba_geoprivacy', 'lsid']],
                       conservationlist[['taxonID', 'name', 'status', 'vba_geoprivacy', 'lsid']]]).drop_duplicates()
statelist

Unnamed: 0,taxonID,name,status,vba_geoprivacy,lsid
0,1686,Engaeus australis,Critically Endangered,obscured,https://biodiversity.org.au/afd/taxa/feada41f-...
1,1674,Engaeus fultoni,Vulnerable,obscured,https://biodiversity.org.au/afd/taxa/7994c955-...
2,1694,Engaeus mallacoota,Critically Endangered,obscured,https://biodiversity.org.au/afd/taxa/bf6f5d52-...
3,1695,Engaeus phyllocercus,Endangered,obscured,https://biodiversity.org.au/afd/taxa/bb2b1f80-...
4,1683,Engaeus rostrogaleatus,Endangered,obscured,https://biodiversity.org.au/afd/taxa/cd66d8b6-...
...,...,...,...,...,...
1991,13930,Uperoleia martini,Critically Endangered,open,https://biodiversity.org.au/afd/taxa/b814b4c7-...
1992,13151,Uperoleia rugosa,Endangered,open,https://biodiversity.org.au/afd/taxa/b5e6d104-...
1994,12283,Varanus varius,Endangered,open,https://biodiversity.org.au/afd/taxa/6338346a-...
1997,10160,Xenus cinereus,Endangered,open,https://biodiversity.org.au/afd/taxa/4090ad27-...


In [32]:
%%script echo skipping # comment line to run cell
parsednames = lf.gbifparse(statelist)
parsednames.to_csv(sourcedir + "vic-gbif.csv", index=False)

In [100]:
parsednames = pd.read_csv(sourcedir + "vic-gbif.csv")
parsednames

Unnamed: 0,scientificName,type,genusOrAbove,specificEpithet,parsed,parsedPartially,canonicalName,canonicalNameComplete,canonicalNameWithMarker,rankMarker,infraSpecificEpithet,infraGeneric,sensu,bracketAuthorship,remarks,notho,authorship
0,Engaeus australis,SCIENTIFIC,Engaeus,australis,True,False,Engaeus australis,Engaeus australis,Engaeus australis,sp.,,,,,,,
1,Engaeus fultoni,SCIENTIFIC,Engaeus,fultoni,True,False,Engaeus fultoni,Engaeus fultoni,Engaeus fultoni,sp.,,,,,,,
2,Engaeus mallacoota,SCIENTIFIC,Engaeus,mallacoota,True,False,Engaeus mallacoota,Engaeus mallacoota,Engaeus mallacoota,sp.,,,,,,,
3,Engaeus phyllocercus,SCIENTIFIC,Engaeus,phyllocercus,True,False,Engaeus phyllocercus,Engaeus phyllocercus,Engaeus phyllocercus,sp.,,,,,,,
4,Engaeus rostrogaleatus,SCIENTIFIC,Engaeus,rostrogaleatus,True,False,Engaeus rostrogaleatus,Engaeus rostrogaleatus,Engaeus rostrogaleatus,sp.,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2130,Varanus varius,SCIENTIFIC,Varanus,varius,True,False,Varanus varius,Varanus varius,Varanus varius,sp.,,,,,,,
2131,Vermicella annulata,SCIENTIFIC,Vermicella,annulata,True,False,Vermicella annulata,Vermicella annulata,Vermicella annulata,sp.,,,,,,,
2132,Victaphanta compacta,SCIENTIFIC,Victaphanta,compacta,True,False,Victaphanta compacta,Victaphanta compacta,Victaphanta compacta,sp.,,,,,,,
2133,Xenus cinereus,SCIENTIFIC,Xenus,cinereus,True,False,Xenus cinereus,Xenus cinereus,Xenus cinereus,sp.,,,,,,,


In [101]:
#statelist = statelist.merge(parsednames[['scientificName','canonicalName','canonicalNameComplete','type','rankMarker']],how="inner",left_on="name",right_on="scientificName")
statelist = statelist.merge(parsednames,how="left",left_on="name",right_on="scientificName").drop_duplicates()
numfullstatelist = len(statelist.index)
statelist = statelist.rename(columns={'taxonID':'vba_taxonID', 'name':'vba_name','status':'vba_status'})
statelist['vba_scientificName'] = statelist['canonicalName']
statelist

Unnamed: 0,vba_taxonID,vba_name,vba_status,vba_geoprivacy,lsid,scientificName,type,genusOrAbove,specificEpithet,parsed,...,canonicalNameWithMarker,rankMarker,infraSpecificEpithet,infraGeneric,sensu,bracketAuthorship,remarks,notho,authorship,vba_scientificName
0,1686,Engaeus australis,Critically Endangered,obscured,https://biodiversity.org.au/afd/taxa/feada41f-...,Engaeus australis,SCIENTIFIC,Engaeus,australis,True,...,Engaeus australis,sp.,,,,,,,,Engaeus australis
2,1674,Engaeus fultoni,Vulnerable,obscured,https://biodiversity.org.au/afd/taxa/7994c955-...,Engaeus fultoni,SCIENTIFIC,Engaeus,fultoni,True,...,Engaeus fultoni,sp.,,,,,,,,Engaeus fultoni
4,1694,Engaeus mallacoota,Critically Endangered,obscured,https://biodiversity.org.au/afd/taxa/bf6f5d52-...,Engaeus mallacoota,SCIENTIFIC,Engaeus,mallacoota,True,...,Engaeus mallacoota,sp.,,,,,,,,Engaeus mallacoota
6,1695,Engaeus phyllocercus,Endangered,obscured,https://biodiversity.org.au/afd/taxa/bb2b1f80-...,Engaeus phyllocercus,SCIENTIFIC,Engaeus,phyllocercus,True,...,Engaeus phyllocercus,sp.,,,,,,,,Engaeus phyllocercus
8,1683,Engaeus rostrogaleatus,Endangered,obscured,https://biodiversity.org.au/afd/taxa/cd66d8b6-...,Engaeus rostrogaleatus,SCIENTIFIC,Engaeus,rostrogaleatus,True,...,Engaeus rostrogaleatus,sp.,,,,,,,,Engaeus rostrogaleatus
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2130,13930,Uperoleia martini,Critically Endangered,open,https://biodiversity.org.au/afd/taxa/b814b4c7-...,Uperoleia martini,SCIENTIFIC,Uperoleia,martini,True,...,Uperoleia martini,sp.,,,,,,,,Uperoleia martini
2131,13151,Uperoleia rugosa,Endangered,open,https://biodiversity.org.au/afd/taxa/b5e6d104-...,Uperoleia rugosa,SCIENTIFIC,Uperoleia,rugosa,True,...,Uperoleia rugosa,sp.,,,,,,,,Uperoleia rugosa
2132,12283,Varanus varius,Endangered,open,https://biodiversity.org.au/afd/taxa/6338346a-...,Varanus varius,SCIENTIFIC,Varanus,varius,True,...,Varanus varius,sp.,,,,,,,,Varanus varius
2133,10160,Xenus cinereus,Endangered,open,https://biodiversity.org.au/afd/taxa/4090ad27-...,Xenus cinereus,SCIENTIFIC,Xenus,cinereus,True,...,Xenus cinereus,sp.,,,,,,,,Xenus cinereus


In [102]:
# Identify records that won't comply with iNaturalist species names
noncomply = statelist[statelist['type'].isin(['INFORMAL','CULTIVAR','HYBRID', 'BLACKLISTED']) ]
noncomply

Unnamed: 0,vba_taxonID,vba_name,vba_status,vba_geoprivacy,lsid,scientificName,type,genusOrAbove,specificEpithet,parsed,...,canonicalNameWithMarker,rankMarker,infraSpecificEpithet,infraGeneric,sensu,bracketAuthorship,remarks,notho,authorship,vba_scientificName
132,505589,Caladenia sp. aff. fragrantissima (Central Vic...,Critically Endangered,obscured,ALA_DR490_93,Caladenia sp. aff. fragrantissima (Central Vic...,INFORMAL,Caladenia,,True,...,Caladenia spec.,sp.,,,,,,,,Caladenia spec.
134,505431,Caladenia sp. aff. venusta (Kilsyth South),Critically Endangered,obscured,https://id.biodiversity.org.au/taxon/apni/5139...,Caladenia sp. aff. venusta (Kilsyth South),INFORMAL,Caladenia,,True,...,Caladenia spec.,sp.,,,,,,,,Caladenia spec.
278,903498,Galaxias sp. 14,Critically Endangered,open,https://biodiversity.org.au/afd/taxa/c2bcc474-...,Galaxias sp. 14,INFORMAL,Galaxias,sp.14,True,...,Galaxias sp.14,sp.,,,,,,,,Galaxias sp.14
291,903041,Nannoperca sp. 1,Vulnerable,open,ALA_DR655_1698,Nannoperca sp. 1,INFORMAL,Nannoperca,sp.1,True,...,Nannoperca sp.1,sp.,,,,,,,,Nannoperca sp.1
434,503699,Arthropodium sp. 1 (robust glaucous),Endangered,open,ALA_DR655_657,Arthropodium sp. 1 (robust glaucous),INFORMAL,Arthropodium,sp.1(robust glaucous),True,...,Arthropodium sp.1(robust-glaucous),sp.,,,,,,,,Arthropodium sp.1(robust-glaucous)
450,504122,Astrotricha asperifolia subsp. 2,Endangered,open,https://id.biodiversity.org.au/node/apni/2911958,Astrotricha asperifolia subsp. 2,INFORMAL,Astrotricha,asperifolia,True,...,Astrotricha asperifolia subsp.,subsp.,,,,,,,,Astrotricha asperifolia subsp.
452,505604,Astrotricha linearis subsp. 1,Endangered,open,https://id.biodiversity.org.au/node/apni/2895901,Astrotricha linearis subsp. 1,INFORMAL,Astrotricha,linearis,True,...,Astrotricha linearis subsp.,subsp.,,,,,,,,Astrotricha linearis subsp.
453,505605,Astrotricha linearis subsp. 2,Endangered,open,https://id.biodiversity.org.au/node/apni/2916765,Astrotricha linearis subsp. 2,INFORMAL,Astrotricha,linearis,True,...,Astrotricha linearis subsp.,subsp.,,,,,,,,Astrotricha linearis subsp.
454,505606,Astrotricha parvifolia subsp. 1,Critically Endangered,open,https://id.biodiversity.org.au/node/apni/2903101,Astrotricha parvifolia subsp. 1,INFORMAL,Astrotricha,parvifolia,True,...,Astrotricha parvifolia subsp.,subsp.,,,,,,,,Astrotricha parvifolia subsp.
455,505607,Astrotricha parvifolia subsp. 2,Endangered,open,https://id.biodiversity.org.au/node/apni/2895186,Astrotricha parvifolia subsp. 2,INFORMAL,Astrotricha,parvifolia,True,...,Astrotricha parvifolia subsp.,subsp.,,,,,,,,Astrotricha parvifolia subsp.


In [103]:
# remove records that do not comply
statelist = statelist[~statelist['type'].isin(['INFORMAL','CULTIVAR','HYBRID', 'BLACKLISTED']) ]
statelist = pd.DataFrame(statelist[['vba_taxonID','vba_scientificName','vba_status','vba_geoprivacy','lsid']]).drop_duplicates()
statelist

Unnamed: 0,vba_taxonID,vba_scientificName,vba_status,vba_geoprivacy,lsid
0,1686,Engaeus australis,Critically Endangered,obscured,https://biodiversity.org.au/afd/taxa/feada41f-...
2,1674,Engaeus fultoni,Vulnerable,obscured,https://biodiversity.org.au/afd/taxa/7994c955-...
4,1694,Engaeus mallacoota,Critically Endangered,obscured,https://biodiversity.org.au/afd/taxa/bf6f5d52-...
6,1695,Engaeus phyllocercus,Endangered,obscured,https://biodiversity.org.au/afd/taxa/bb2b1f80-...
8,1683,Engaeus rostrogaleatus,Endangered,obscured,https://biodiversity.org.au/afd/taxa/cd66d8b6-...
...,...,...,...,...,...
2130,13930,Uperoleia martini,Critically Endangered,open,https://biodiversity.org.au/afd/taxa/b814b4c7-...
2131,13151,Uperoleia rugosa,Endangered,open,https://biodiversity.org.au/afd/taxa/b5e6d104-...
2132,12283,Varanus varius,Endangered,open,https://biodiversity.org.au/afd/taxa/6338346a-...
2133,10160,Xenus cinereus,Endangered,open,https://biodiversity.org.au/afd/taxa/4090ad27-...


In [104]:
# check for duplicates with conflicting information
dupinformation = statelist.groupby('vba_taxonID').filter(lambda x: len(x) > 1)#.sort('size',ascending=False)
dupinformation

Unnamed: 0,vba_taxonID,vba_scientificName,vba_status,vba_geoprivacy,lsid


### 4. Equivalent IUCN statuses

In [105]:
iucn_statuses = {'Not Evaluated', 'Data Deficient', 'Least Concern', 'Near Threatened', 'Vulnerable', 'Endangered', 'Critically Endangered', 'Extinct in the Wild','Extinct'}
statelist.groupby(['vba_status'])['vba_status'].count()

vba_status
Conservation Dependent                 3
Critically Endangered                540
Endangered                          1057
Endangered (Extinct in Victoria)       1
Extinct                               54
Threatened                             4
Vulnerable                           299
Name: vba_status, dtype: int64

In [106]:
# these will be used to populate the iucn_equivalent field
iucnStatusMappings = {
    'conservation dependent': 'Vulnerable',
    'endangered (extinct in victoria)': 'Extinct',
    'threatened':'Vulnerable',
    'least concern':'Least Concern',
    'special least concern':'Least Concern',
    'critically endangered': 'Critically Endangered',
    'endangered': 'Endangered',
    'extinct': 'Extinct',
    'vulnerable': 'Vulnerable'
}

### 5. Determine best place ID to use

In [107]:
inatstatuses.groupby(['place_id','place_name','place_display_name'])['place_id'].count()
# looks like 7830 - note for extract


place_id  place_name    place_display_name
117993    Vic Offshore  Vic Offshore             1
6744      Australia     Australia                2
7830      Victoria      Victoria, AU          1073
Name: place_id, dtype: int64

## Merge iNaturalist statuses with State lists on scientificName

1. Match - updates, even if the statuses are the same we'll update the links and values anyway
2. No match - statuses to be added (additions)
   1.1 No match and no taxnomy - search for synonyms
   1.2 No match
3. Merge the other direction to see if there are deletes?


In [108]:
# join to see which lists already have a status in inaturalist based on scientificName
mergedstatuses = statelist[['vba_taxonID','vba_scientificName','vba_status','vba_geoprivacy','lsid']].merge(inatstatuses[['status_id','scientificName','taxon_id','user_id','description','iucn','authority','status','geoprivacy','place_id','place_display_name']],how="left",left_on='vba_scientificName',right_on='scientificName',suffixes=(None,'_inat')).sort_values(['scientificName'])
mergedstatuses


Unnamed: 0,vba_taxonID,vba_scientificName,vba_status,vba_geoprivacy,lsid,status_id,scientificName,taxon_id,user_id,description,iucn,authority,status,geoprivacy,place_id,place_display_name
189,502094,Abrodictyum caudatum,Endangered,open,https://id.biodiversity.org.au/node/apni/7402200,264614,Abrodictyum caudatum,451374,3249428,,40,Victoria Flora and Fauna Guarantee Act 1988,Endangered,open,7830,"Victoria, AU"
190,500001,Abrotanella nivigena,Critically Endangered,open,https://id.biodiversity.org.au/node/apni/2900512,170090,Abrotanella nivigena,323722,527710,,50,Flora and Fauna Guarantee Act 1988,Critically Endangered,,7830,"Victoria, AU"
192,504199,Abutilon malvifolium,Critically Endangered,open,https://id.biodiversity.org.au/node/apni/2887438,264615,Abutilon malvifolium,323737,3249428,,50,Victoria Flora and Fauna Guarantee Act 1988,Critically Endangered,open,7830,"Victoria, AU"
193,500003,Abutilon otocarpum,Endangered,open,https://id.biodiversity.org.au/node/apni/2892894,264617,Abutilon otocarpum,323731,3249428,,40,Victoria Flora and Fauna Guarantee Act 1988,Endangered,open,7830,"Victoria, AU"
195,500009,Acacia alpina,Endangered,open,https://id.biodiversity.org.au/node/apni/2907301,264618,Acacia alpina,139887,3249428,,40,Victoria Flora and Fauna Guarantee Act 1988,Endangered,open,7830,"Victoria, AU"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1954,10138,Thinornis cucullatus,Vulnerable,open,https://biodiversity.org.au/afd/taxa/1ebf8ec6-...,,,,,,,,,,,
1956,75139,Trapezites luteus luteus,Endangered,open,https://biodiversity.org.au/afd/taxa/fcc2ac7b-...,,,,,,,,,,,
1963,12922,Tympanocryptis pinguicolla,Critically Endangered,open,https://biodiversity.org.au/afd/taxa/5bceebc1-...,,,,,,,,,,,
1965,10253,Tyto tenebricosa,Endangered,open,https://biodiversity.org.au/afd/taxa/645b287c-...,,,,,,,,,,,


In [109]:
# prepare the export fields, common to New template and Update template
# new statuses
# Taxon_Name,Status,Authority,IUCN_equivalent,Description,iNaturalist_Place_ID,url,Taxon_Geoprivacy,Username,taxon_id
# updates
# action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username
# url is  a  bie page
biesearchurl = "https://bie.ala.org.au/species/" # eg + "https://id.biodiversity.org.au/node/apni/2894366"
mergedstatuses['new_url'] =  biesearchurl + mergedstatuses['lsid']
# biesearchurl = "https://bie.ala.org.au/species/" # eg + "https://id.biodiversity.org.au/node/apni/2894366"
mergedstatuses['new_description'] = "See https://discover.data.vic.gov.au/dataset/victorian-biodiversity-atlas-vba-taxa-list1"
mergedstatuses['new_authority'] = "Victorian Department of Energy, Environment and Climate Action"
mergedstatuses.rename(columns={'vba_geoprivacy':'new_geoprivacy'},inplace=True)
mergedstatuses['new_place_id'] = '7830'  # Victoria, AU
mergedstatuses['new_username'] = 'peggydnew'
mergedstatuses['new_iucn_equivalent'] = mergedstatuses['status'].str.lower().str.strip().map(iucnStatusMappings).fillna('Vulnerable') # map to dictionary
mergedstatuses['new_status'] = mergedstatuses['vba_status'].fillna('Sensitive')
mergedstatuses

Unnamed: 0,vba_taxonID,vba_scientificName,vba_status,new_geoprivacy,lsid,status_id,scientificName,taxon_id,user_id,description,...,geoprivacy,place_id,place_display_name,new_url,new_description,new_authority,new_place_id,new_username,new_iucn_equivalent,new_status
189,502094,Abrodictyum caudatum,Endangered,open,https://id.biodiversity.org.au/node/apni/7402200,264614,Abrodictyum caudatum,451374,3249428,,...,open,7830,"Victoria, AU",https://bie.ala.org.au/species/https://id.biod...,See https://discover.data.vic.gov.au/dataset/v...,"Victorian Department of Energy, Environment an...",7830,peggydnew,Endangered,Endangered
190,500001,Abrotanella nivigena,Critically Endangered,open,https://id.biodiversity.org.au/node/apni/2900512,170090,Abrotanella nivigena,323722,527710,,...,,7830,"Victoria, AU",https://bie.ala.org.au/species/https://id.biod...,See https://discover.data.vic.gov.au/dataset/v...,"Victorian Department of Energy, Environment an...",7830,peggydnew,Critically Endangered,Critically Endangered
192,504199,Abutilon malvifolium,Critically Endangered,open,https://id.biodiversity.org.au/node/apni/2887438,264615,Abutilon malvifolium,323737,3249428,,...,open,7830,"Victoria, AU",https://bie.ala.org.au/species/https://id.biod...,See https://discover.data.vic.gov.au/dataset/v...,"Victorian Department of Energy, Environment an...",7830,peggydnew,Critically Endangered,Critically Endangered
193,500003,Abutilon otocarpum,Endangered,open,https://id.biodiversity.org.au/node/apni/2892894,264617,Abutilon otocarpum,323731,3249428,,...,open,7830,"Victoria, AU",https://bie.ala.org.au/species/https://id.biod...,See https://discover.data.vic.gov.au/dataset/v...,"Victorian Department of Energy, Environment an...",7830,peggydnew,Endangered,Endangered
195,500009,Acacia alpina,Endangered,open,https://id.biodiversity.org.au/node/apni/2907301,264618,Acacia alpina,139887,3249428,,...,open,7830,"Victoria, AU",https://bie.ala.org.au/species/https://id.biod...,See https://discover.data.vic.gov.au/dataset/v...,"Victorian Department of Energy, Environment an...",7830,peggydnew,Endangered,Endangered
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1954,10138,Thinornis cucullatus,Vulnerable,open,https://biodiversity.org.au/afd/taxa/1ebf8ec6-...,,,,,,...,,,,https://bie.ala.org.au/species/https://biodive...,See https://discover.data.vic.gov.au/dataset/v...,"Victorian Department of Energy, Environment an...",7830,peggydnew,Vulnerable,Vulnerable
1956,75139,Trapezites luteus luteus,Endangered,open,https://biodiversity.org.au/afd/taxa/fcc2ac7b-...,,,,,,...,,,,https://bie.ala.org.au/species/https://biodive...,See https://discover.data.vic.gov.au/dataset/v...,"Victorian Department of Energy, Environment an...",7830,peggydnew,Vulnerable,Endangered
1963,12922,Tympanocryptis pinguicolla,Critically Endangered,open,https://biodiversity.org.au/afd/taxa/5bceebc1-...,,,,,,...,,,,https://bie.ala.org.au/species/https://biodive...,See https://discover.data.vic.gov.au/dataset/v...,"Victorian Department of Energy, Environment an...",7830,peggydnew,Vulnerable,Critically Endangered
1965,10253,Tyto tenebricosa,Endangered,open,https://biodiversity.org.au/afd/taxa/645b287c-...,,,,,,...,,,,https://bie.ala.org.au/species/https://biodive...,See https://discover.data.vic.gov.au/dataset/v...,"Victorian Department of Energy, Environment an...",7830,peggydnew,Vulnerable,Endangered


## Updates

In [110]:
# those that need to be updated - we found a status
mergedstatuses[mergedstatuses['status_id'].notnull()][['vba_scientificName','vba_status','status_id','taxon_id','status','new_geoprivacy','geoprivacy','authority','user_id']]

Unnamed: 0,vba_scientificName,vba_status,status_id,taxon_id,status,new_geoprivacy,geoprivacy,authority,user_id
189,Abrodictyum caudatum,Endangered,264614,451374,Endangered,open,open,Victoria Flora and Fauna Guarantee Act 1988,3249428
190,Abrotanella nivigena,Critically Endangered,170090,323722,Critically Endangered,open,,Flora and Fauna Guarantee Act 1988,527710
192,Abutilon malvifolium,Critically Endangered,264615,323737,Critically Endangered,open,open,Victoria Flora and Fauna Guarantee Act 1988,3249428
193,Abutilon otocarpum,Endangered,264617,323731,Endangered,open,open,Victoria Flora and Fauna Guarantee Act 1988,3249428
195,Acacia alpina,Endangered,264618,139887,Endangered,open,open,Victoria Flora and Fauna Guarantee Act 1988,3249428
...,...,...,...,...,...,...,...,...,...
1729,Zieria cytisoides,Endangered,265458,700296,Endangered,open,open,Flora and Fauna Guarantee Act 1988,3249428
1730,Zieria littoralis,Critically Endangered,264760,896657,Critically Endangered,open,open,Victoria Flora and Fauna Guarantee Act 1988,3249428
1731,Zieria oreocena,Endangered,265459,1092447,Endangered,open,open,Flora and Fauna Guarantee Act 1988,3249428
1732,Zieria robusta,Endangered,265460,973465,Endangered,open,open,Flora and Fauna Guarantee Act 1988,3249428


In [111]:
# updates - create the data frame
# action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
updates = pd.DataFrame(mergedstatuses[mergedstatuses['status_id'].notnull()])
updates.sort_values('scientificName')
updates['action'] = 'UPDATE'
#updates.loc[:,'action'] = 'UPDATE'
updates = updates[['action','scientificName','status_id','taxon_id','new_status','new_iucn_equivalent','new_authority','new_url','new_geoprivacy','new_place_id','new_username','new_description']]
updates.columns = updates.columns.str.replace("new_", "", regex=True)
updates = updates.rename(columns={'scientificName':'taxon_name',
                                  'status_id':'id'})
updates

Unnamed: 0,action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
189,UPDATE,Abrodictyum caudatum,264614,451374,Endangered,Endangered,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://id.biod...,open,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
190,UPDATE,Abrotanella nivigena,170090,323722,Critically Endangered,Critically Endangered,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://id.biod...,open,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
192,UPDATE,Abutilon malvifolium,264615,323737,Critically Endangered,Critically Endangered,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://id.biod...,open,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
193,UPDATE,Abutilon otocarpum,264617,323731,Endangered,Endangered,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://id.biod...,open,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
195,UPDATE,Acacia alpina,264618,139887,Endangered,Endangered,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://id.biod...,open,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
...,...,...,...,...,...,...,...,...,...,...,...,...
1729,UPDATE,Zieria cytisoides,265458,700296,Endangered,Endangered,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://id.biod...,open,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
1730,UPDATE,Zieria littoralis,264760,896657,Critically Endangered,Critically Endangered,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://id.biod...,open,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
1731,UPDATE,Zieria oreocena,265459,1092447,Endangered,Endangered,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://id.biod...,open,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
1732,UPDATE,Zieria robusta,265460,973465,Endangered,Endangered,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://id.biod...,open,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...


## No status in iNaturalist via straight scientificName match
The records that didn't match up to a status in iNaturalist

In [113]:
# to add: those that have no inaturalist status
noinatstatus = mergedstatuses[mergedstatuses['status_id'].isnull()]
# try to match the taxon name to something in inaturalist
noinatstatus = noinatstatus.merge(inattaxa, how="left", left_on="vba_scientificName",right_on="scientificName")
noinatstatus

Unnamed: 0,vba_taxonID,vba_scientificName,vba_status,new_geoprivacy,lsid,status_id,scientificName_x,taxon_id,user_id,description,...,class,order,family,genus,specificEpithet,infraspecificEpithet,modified,scientificName_y,taxonRank,references
0,1633,Euastacus bidawalus,Vulnerable,obscured,https://biodiversity.org.au/afd/taxa/e5f3cb27-...,,,,,,...,Malacostraca,Decapoda,Parastacidae,Euastacus,bidawalus,,2021-10-07T04:33:00Z,Euastacus bidawalus,species,https://eol.org/pages/55588400
1,1467,Austrogammarus haasei,Endangered,obscured,https://biodiversity.org.au/afd/taxa/bf06e830-...,,,,,,...,,,,,,,,,,
2,75160,Colubotelson joyneri,Critically Endangered,obscured,https://biodiversity.org.au/afd/taxa/1fb2623d-...,,,,,,...,,,,,,,,,,
3,75161,Colubotelson searli,Critically Endangered,obscured,https://biodiversity.org.au/afd/taxa/1258c296-...,,,,,,...,,,,,,,,,,
4,621,Hyridella glenelgensis,Critically Endangered,obscured,https://biodiversity.org.au/afd/taxa/316e9e64-...,,,,,,...,Bivalvia,Unionida,Hyriidae,Hyridella,glenelgensis,,2019-03-05T22:39:03Z,Hyridella glenelgensis,species,http://www.iucnredlist.org/details/58609631
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
915,10138,Thinornis cucullatus,Vulnerable,open,https://biodiversity.org.au/afd/taxa/1ebf8ec6-...,,,,,,...,Aves,Charadriiformes,Charadriidae,Thinornis,cucullatus,,2020-01-11T02:12:54Z,Thinornis cucullatus,species,http://www.birds.cornell.edu/clementschecklist...
916,75139,Trapezites luteus luteus,Endangered,open,https://biodiversity.org.au/afd/taxa/fcc2ac7b-...,,,,,,...,,,,,,,,,,
917,12922,Tympanocryptis pinguicolla,Critically Endangered,open,https://biodiversity.org.au/afd/taxa/5bceebc1-...,,,,,,...,Reptilia,Squamata,Agamidae,Tympanocryptis,pinguicolla,,2018-11-18T00:22:24Z,Tympanocryptis pinguicolla,species,http://www.iucnredlist.org/apps/redlist/detail...
918,10253,Tyto tenebricosa,Endangered,open,https://biodiversity.org.au/afd/taxa/645b287c-...,,,,,,...,Aves,Strigiformes,Tytonidae,Tyto,tenebricosa,,2019-11-23T01:14:52Z,Tyto tenebricosa,species,http://www.birdlife.org/datazone/speciesfactsh...


In [114]:
noinatstatus[noinatstatus['id'].notna()] # there's no status but there is a matching inat taxon (id is the taxon id)
# note: "Dendrobium" matches to both genus and section

Unnamed: 0,vba_taxonID,vba_scientificName,vba_status,new_geoprivacy,lsid,status_id,scientificName_x,taxon_id,user_id,description,...,class,order,family,genus,specificEpithet,infraspecificEpithet,modified,scientificName_y,taxonRank,references
0,1633,Euastacus bidawalus,Vulnerable,obscured,https://biodiversity.org.au/afd/taxa/e5f3cb27-...,,,,,,...,Malacostraca,Decapoda,Parastacidae,Euastacus,bidawalus,,2021-10-07T04:33:00Z,Euastacus bidawalus,species,https://eol.org/pages/55588400
4,621,Hyridella glenelgensis,Critically Endangered,obscured,https://biodiversity.org.au/afd/taxa/316e9e64-...,,,,,,...,Bivalvia,Unionida,Hyriidae,Hyridella,glenelgensis,,2019-03-05T22:39:03Z,Hyridella glenelgensis,species,http://www.iucnredlist.org/details/58609631
17,1845,Wundacaenis flabellum,,obscured,https://biodiversity.org.au/afd/taxa/5aa55309-...,,,,,,...,Insecta,Ephemeroptera,Caenidae,Wundacaenis,flabellum,,2022-05-04T04:48:01Z,Wundacaenis flabellum,species,https://biodiversity.org.au/afd/taxa/Wundacaen...
18,505970,Bossiaea vombata,Critically Endangered,obscured,https://id.biodiversity.org.au/node/apni/2905805,,,,,,...,Magnoliopsida,Fabales,Fabaceae,Bossiaea,vombata,,2021-07-28T02:27:43Z,Bossiaea vombata,species,https://eol.org/pages/51503805
19,503664,Caladenia audasii,Critically Endangered,obscured,https://id.biodiversity.org.au/taxon/apni/5139...,,,,,,...,Liliopsida,Asparagales,Orchidaceae,Caladenia,audasii,,2021-07-28T03:46:52Z,Caladenia audasii,species,http://www.catalogueoflife.org/annual-checklis...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
912,10091,Thalassarche cauta,Endangered,open,https://biodiversity.org.au/afd/taxa/c22e8df8-...,,,,,,...,Aves,Procellariiformes,Diomedeidae,Thalassarche,cauta,,2021-05-13T18:21:59Z,Thalassarche cauta,species,http://www.birdlife.org/datazone/speciesfactsh...
913,10090,Thalassarche chrysostoma,Endangered,open,https://biodiversity.org.au/afd/taxa/9428a314-...,,,,,,...,Aves,Procellariiformes,Diomedeidae,Thalassarche,chrysostoma,,2022-03-26T20:29:52Z,Thalassarche chrysostoma,species,http://www.birdlife.org/datazone/speciesfactsh...
915,10138,Thinornis cucullatus,Vulnerable,open,https://biodiversity.org.au/afd/taxa/1ebf8ec6-...,,,,,,...,Aves,Charadriiformes,Charadriidae,Thinornis,cucullatus,,2020-01-11T02:12:54Z,Thinornis cucullatus,species,http://www.birds.cornell.edu/clementschecklist...
917,12922,Tympanocryptis pinguicolla,Critically Endangered,open,https://biodiversity.org.au/afd/taxa/5bceebc1-...,,,,,,...,Reptilia,Squamata,Agamidae,Tympanocryptis,pinguicolla,,2018-11-18T00:22:24Z,Tympanocryptis pinguicolla,species,http://www.iucnredlist.org/apps/redlist/detail...


In [115]:
# there's no status but there is a matching inat taxon (id is the taxon id)
additions = pd.DataFrame(noinatstatus[noinatstatus['id'].notna()])
additions['scientificName'] = additions['vba_scientificName']
#additions['new_status'] = additions['wa_status']
additions.sort_values(['scientificName'])
additions['action'] = 'ADD'
additions = additions[['action','scientificName','status_id','id','new_status','new_iucn_equivalent','new_authority','new_url','new_geoprivacy','new_place_id','new_username','new_description']]
additions.columns = additions.columns.str.replace("new_", "", regex=True)
additions = additions.rename(columns={'scientificName':'taxon_name',
                                      'id':'taxon_id',
                                  'status_id':'id'})
additions

Unnamed: 0,action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
0,ADD,Euastacus bidawalus,,1257313,Vulnerable,Vulnerable,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://biodive...,obscured,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
4,ADD,Hyridella glenelgensis,,432556,Critically Endangered,Vulnerable,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://biodive...,obscured,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
17,ADD,Wundacaenis flabellum,,1389807,Sensitive,Vulnerable,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://biodive...,obscured,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
18,ADD,Bossiaea vombata,,1243845,Critically Endangered,Vulnerable,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://id.biod...,obscured,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
19,ADD,Caladenia audasii,,1247922,Critically Endangered,Vulnerable,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://id.biod...,obscured,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
...,...,...,...,...,...,...,...,...,...,...,...,...
912,ADD,Thalassarche cauta,,4088,Endangered,Vulnerable,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://biodive...,open,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
913,ADD,Thalassarche chrysostoma,,4090,Endangered,Vulnerable,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://biodive...,open,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
915,ADD,Thinornis cucullatus,,144487,Vulnerable,Vulnerable,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://biodive...,open,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...
917,ADD,Tympanocryptis pinguicolla,,73985,Critically Endangered,Vulnerable,"Victorian Department of Energy, Environment an...",https://bie.ala.org.au/species/https://biodive...,open,7830,peggydnew,See https://discover.data.vic.gov.au/dataset/v...


In [116]:
all = pd.concat([updates,additions])
all.to_csv(sourcedir + "vic.csv", index=False )

# Reports
## Statuses with no matching taxon in iNaturalist
Things that didn't match to a taxon:
1.Those that didn't play nicely with the GBIF parser
2.Those that there was no taxa match for.

In [117]:
noncomply

Unnamed: 0,vba_taxonID,vba_name,vba_status,vba_geoprivacy,lsid,scientificName,type,genusOrAbove,specificEpithet,parsed,...,canonicalNameWithMarker,rankMarker,infraSpecificEpithet,infraGeneric,sensu,bracketAuthorship,remarks,notho,authorship,vba_scientificName
132,505589,Caladenia sp. aff. fragrantissima (Central Vic...,Critically Endangered,obscured,ALA_DR490_93,Caladenia sp. aff. fragrantissima (Central Vic...,INFORMAL,Caladenia,,True,...,Caladenia spec.,sp.,,,,,,,,Caladenia spec.
134,505431,Caladenia sp. aff. venusta (Kilsyth South),Critically Endangered,obscured,https://id.biodiversity.org.au/taxon/apni/5139...,Caladenia sp. aff. venusta (Kilsyth South),INFORMAL,Caladenia,,True,...,Caladenia spec.,sp.,,,,,,,,Caladenia spec.
278,903498,Galaxias sp. 14,Critically Endangered,open,https://biodiversity.org.au/afd/taxa/c2bcc474-...,Galaxias sp. 14,INFORMAL,Galaxias,sp.14,True,...,Galaxias sp.14,sp.,,,,,,,,Galaxias sp.14
291,903041,Nannoperca sp. 1,Vulnerable,open,ALA_DR655_1698,Nannoperca sp. 1,INFORMAL,Nannoperca,sp.1,True,...,Nannoperca sp.1,sp.,,,,,,,,Nannoperca sp.1
434,503699,Arthropodium sp. 1 (robust glaucous),Endangered,open,ALA_DR655_657,Arthropodium sp. 1 (robust glaucous),INFORMAL,Arthropodium,sp.1(robust glaucous),True,...,Arthropodium sp.1(robust-glaucous),sp.,,,,,,,,Arthropodium sp.1(robust-glaucous)
450,504122,Astrotricha asperifolia subsp. 2,Endangered,open,https://id.biodiversity.org.au/node/apni/2911958,Astrotricha asperifolia subsp. 2,INFORMAL,Astrotricha,asperifolia,True,...,Astrotricha asperifolia subsp.,subsp.,,,,,,,,Astrotricha asperifolia subsp.
452,505604,Astrotricha linearis subsp. 1,Endangered,open,https://id.biodiversity.org.au/node/apni/2895901,Astrotricha linearis subsp. 1,INFORMAL,Astrotricha,linearis,True,...,Astrotricha linearis subsp.,subsp.,,,,,,,,Astrotricha linearis subsp.
453,505605,Astrotricha linearis subsp. 2,Endangered,open,https://id.biodiversity.org.au/node/apni/2916765,Astrotricha linearis subsp. 2,INFORMAL,Astrotricha,linearis,True,...,Astrotricha linearis subsp.,subsp.,,,,,,,,Astrotricha linearis subsp.
454,505606,Astrotricha parvifolia subsp. 1,Critically Endangered,open,https://id.biodiversity.org.au/node/apni/2903101,Astrotricha parvifolia subsp. 1,INFORMAL,Astrotricha,parvifolia,True,...,Astrotricha parvifolia subsp.,subsp.,,,,,,,,Astrotricha parvifolia subsp.
455,505607,Astrotricha parvifolia subsp. 2,Endangered,open,https://id.biodiversity.org.au/node/apni/2895186,Astrotricha parvifolia subsp. 2,INFORMAL,Astrotricha,parvifolia,True,...,Astrotricha parvifolia subsp.,subsp.,,,,,,,,Astrotricha parvifolia subsp.


In [118]:
# what didnt match to a taxon?
unknownToInat = noinatstatus[noinatstatus['id'].isna()]
unknownToInat

Unnamed: 0,vba_taxonID,vba_scientificName,vba_status,new_geoprivacy,lsid,status_id,scientificName_x,taxon_id,user_id,description,...,class,order,family,genus,specificEpithet,infraspecificEpithet,modified,scientificName_y,taxonRank,references
1,1467,Austrogammarus haasei,Endangered,obscured,https://biodiversity.org.au/afd/taxa/bf06e830-...,,,,,,...,,,,,,,,,,
2,75160,Colubotelson joyneri,Critically Endangered,obscured,https://biodiversity.org.au/afd/taxa/1fb2623d-...,,,,,,...,,,,,,,,,,
3,75161,Colubotelson searli,Critically Endangered,obscured,https://biodiversity.org.au/afd/taxa/1258c296-...,,,,,,...,,,,,,,,,,
5,2455,Leptoperla kallistae,Critically Endangered,obscured,https://biodiversity.org.au/afd/taxa/7129a62a-...,,,,,,...,,,,,,,,,,
6,3120,Notoperata sparsa,Vulnerable,obscured,https://biodiversity.org.au/afd/taxa/c09f4412-...,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
909,75237,Temognatha tricolorata,Vulnerable,open,https://biodiversity.org.au/afd/taxa/bb3bc0eb-...,,,,,,...,,,,,,,,,,
911,10089,Thalassarche carteri,Endangered,open,https://biodiversity.org.au/afd/taxa/8368ea93-...,,,,,,...,,,,,,,,,,
914,15028,Theclinesthes albocinctus,Endangered,open,https://biodiversity.org.au/afd/taxa/e2b6ac59-...,,,,,,...,,,,,,,,,,
916,75139,Trapezites luteus luteus,Endangered,open,https://biodiversity.org.au/afd/taxa/fcc2ac7b-...,,,,,,...,,,,,,,,,,


In [119]:
noinatstatus[noinatstatus['id'].isna()].groupby('vba_status').size()

vba_status
Critically Endangered     77
Endangered               148
Extinct                   10
Vulnerable                45
dtype: int64

In [120]:
pd.concat([noncomply,unknownToInat]).to_csv(sourcedir + "vic-no-inat-taxa-match.csv",index=False)

In [123]:
# inat statuses that aren't in added or updated
notaddedupdated = inatstatuses[~inatstatuses['taxon_id'].isin(updates['taxon_id'])]
#notaddedupdated = notaddedupdated[notaddedupdated['user_id'] == "708886"]
notaddedupdated.to_csv(sourcedir + "vic-outstanding-inat-statuses.csv")
notaddedupdated

Unnamed: 0,status_id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,...,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references,preferred_common_name,is_active,current_synonymous_taxon_ids
2837,264792,1038965,3249428,7830,,Victoria Flora and Fauna Guarantee Act 1988,Critically Endangered,https://www.environment.vic.gov.au/conserving-...,,open,...,,,,,Boronia anemonifolia variabilis,,,,False,[1426173]
705,264648,1064159,3249428,7830,,Victoria Flora and Fauna Guarantee Act 1988,Critically Endangered,https://www.environment.vic.gov.au/conserving-...,,open,...,Alsophila,leichhardtiana,,2022-06-07T16:03:11Z,Alsophila leichhardtiana,species,http://plantsoftheworldonline.org/,,,
60,168028,1084244,702203,7830,,Rare Plants of Victoria,CR,http://www.viridans.com/RAREPL/oncecommon.htm,,,...,Pimelea,spinescens,,2021-05-11T01:21:06Z,Pimelea spinescens,species,,,,
3157,265697,1115629,702203,7830,,Atlas of Living Australia,EN,https://bie.ala.org.au/species/https://id.biod...,,obscured,...,Pterostylis,×,,2022-07-03T07:38:09Z,Pterostylis × toveyana,hybrid,https://bie.ala.org.au/species/https://id.biod...,,,
868,162244,1127952,708886,7830,16656.0,VIC Government,endangered,https://lists.ala.org.au/speciesListItem/list/...,,obscured,...,Suta,spectabilis,,2020-09-11T20:34:41Z,Suta spectabilis,species,,,,
2686,264754,1170290,3249428,7830,,Victoria Flora and Fauna Guarantee Act 1988,Critically Endangered,https://www.environment.vic.gov.au/conserving-...,,open,...,Thelymitra,x,,2022-12-13T05:34:46Z,Thelymitra x merraniae,hybrid,https://vicflora.rbg.vic.gov.au/flora/taxon/a0...,,,
3306,266736,1348449,702203,7830,,FFG Threatened List,CR,https://www.environment.vic.gov.au/__data/asse...,,,...,Cranfillia,deltoides,,2022-01-07T04:56:03Z,Cranfillia deltoides,species,https://www.nzpcn.org.nz/flora/species/cranfil...,,,
2698,153818,19251,708886,7830,16656.0,Victoria Flora and Fauna Guarantee Act 1988,Vulnerable,https://www.environment.vic.gov.au/conserving-...,,obscured,...,Polytelis,anthopeplus,,2022-06-11T01:21:17Z,Polytelis anthopeplus,species,http://www.birdlife.org/datazone/speciesfactsh...,,,
3374,265543,33842,3249428,7830,,Flora and Fauna Guarantee Act 1988,Endangered,https://www.environment.vic.gov.au/conserving-...,,open,...,Rhynchoedura,ornata,,2022-06-14T11:02:11Z,Rhynchoedura ornata,species,http://reptile-database.reptarium.cz/search.ph...,,,
2660,164661,353855,702203,7830,,Victoria,NT,https://bie.ala.org.au/species/https://id.biod...,,,...,Calamagrostis,quadriseta,,2020-12-08T19:17:33Z,Calamagrostis quadriseta,species,http://www.catalogueoflife.org/annual-checklis...,,,


In [122]:
# Stats
numsensitive = len(sensitivelist.index)
numconservation = len(conservationlist.index)
numupdates  = len(updates.index)
numadditions  = len(additions.index)
numnoinatstatus = len(noinatstatus.index)
numunknownToInat = len(unknownToInat.index)
numnotaddedupdated = len(notaddedupdated.index)
numnoncomply = len(noncomply.index)
numcomply = len(statelist.index)
numdupinfo = len(dupinformation.index)
d = {'Sensitive': [numsensitive],
    'Conservation': [numconservation],
    'Statelist merge': [numfullstatelist],
    'Species iNat Comply' : [numcomply],
    'Species iNat non-Comply': [numnoncomply],
    'Duplicate Information': [numdupinfo],
    'Updates': [numupdates],
    'Additions': [numadditions],
    'Not added updated': [numnotaddedupdated],
    'No Inat Status': [numnoinatstatus],
    'Unknown to Inat': [numunknownToInat]}

statsdf = pd.DataFrame(data=d)
statsdf

Unnamed: 0,Sensitive,Conservation,Statelist merge,Species iNat Comply,Species iNat non-Comply,Duplicate Information,Updates,Additions,Not added updated,No Inat Status,Unknown to Inat
0,136,1999,2011,1970,41,0,1051,636,26,920,284
