# iNaturalist status updates by state

Using the file produced in the collate-status-taxa.ipynb: `inat-aust-status-taxa.csv`, generate lists to update iNaturalist statuses

**Next steps:**
State by state establish the changes that need to be made:
    a. new - any new species that appear in the state lists but do not have a status in inaturalist (new template)
    b. updates - any changes to information which was added by us previously (user_id = 708886) (update template, action='UPDATE')
    c. removals - any statuses which were added by us previously (user_id = 708886) list which are incorrect (update template, action='REMOVE')
    d. flags - are there any statuses by other users that need to be flagged?

## Prep - common to all states
1. Read in the inaturalist statuses & filter out this state
2. Read in the inaturalist taxa list
3. Read in the state sensitive list
4. Attempt to match the state statuses to an IUCN equivalent

### download and save locally
    with urllib.request.urlopen("https://data.bionet.nsw.gov.au/biosvcapp/odata/SpeciesNames",context=ssl.create_default_context(cafile=certifi.where())) as url:
        data = json.loads(url.read().decode())
    speciesnames = pd.json_normalize(data, record_path =['value'])
    speciesnames.to_csv(projectDir + 'source-data/nsw/nsw-species-names.csv', index=None)
    speciesnames

### 1. iNaturalist statuses

In [1]:
import pandas as pd
import sys
import os
projectdir = "/Users/new330/IdeaProjects/authoritative-lists/" # basedir for this gh project
#projectdir = "/Users/oco115/PycharmProjects/authoritative-lists/" # basedir for this gh project
sys.path.append(os.path.abspath(projectdir + "source-code/includes"))
import list_functions as lf

sourcedir = projectdir + "source-data/inaturalist-statuses/"
listdir = projectdir + "current-lists/"


# read in the statuses
taxastatus = pd.read_csv(sourcedir + "inat-aust-status-taxa.csv", encoding='UTF-8',na_filter=False,dtype=str) ## Read inaturalist conservation statuses file
# taxastatus.head(3)
taxastatus

Unnamed: 0,id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,...,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references,preferred_common_name,is_active,current_synonymous_taxon_ids
0,166449,38493,1138587,7830,,Flora and Fauna Guarantee Act 1988,CR,,,obscured,...,Eulamprus,kosciuskoi,,2021-03-01T10:35:01Z,Eulamprus kosciuskoi,species,http://reptile-database.reptarium.cz/search.ph...,,,
1,234788,918383,702203,9994,,Atlas of Living Australia,NT,https://bie.ala.org.au/species/https://id.biod...,,,...,Chiloschista,phyllorhiza,,2022-01-08T03:30:36Z,Chiloschista phyllorhiza,species,http://www.catalogueoflife.org/annual-checklis...,,,
2,234789,918383,702203,7308,,Atlas of Living Australia,LC,https://bie.ala.org.au/species/https://id.biod...,,,...,Chiloschista,phyllorhiza,,2022-01-08T03:30:36Z,Chiloschista phyllorhiza,species,http://www.catalogueoflife.org/annual-checklis...,,,
3,166416,1033183,3669610,6825,,NSW Office of Environment & Heritage,EN,https://www.environment.nsw.gov.au/threateneds...,,obscured,...,Eidothea,hardeniana,,2021-02-22T07:21:17Z,Eidothea hardeniana,species,,,,
4,180721,1247288,222137,6825,,NSW Threatened Species Scientific Committee,vu,https://www.environment.nsw.gov.au/topics/anim...,,obscured,...,Pomaderris,bodalla,,2021-08-27T06:18:35Z,Pomaderris bodalla,species,https://eol.org/pages/49432063,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3448,165697,1182117,3669610,73684,,Australian Government,CR,http://www.environment.gov.au/biodiversity/thr...,,,...,Achyranthes,margaretarum,,2021-02-12T19:39:39Z,Achyranthes margaretarum,species,http://plantsoftheworldonline.org/taxon/urn:ls...,,,
3449,130966,508985,,,,,critically endangered,http://environment.gov.au/cgi-bin/sprat/public...,,obscured,...,Lichenostomus,melanops,cassidix,2022-06-11T03:51:49Z,Lichenostomus melanops cassidix,subspecies,http://www.birds.cornell.edu/clementschecklist...,,,
3450,161226,50744,516268,,,DAWE Species Profile and Threats Database,CR,https://www.environment.gov.au/cgi-bin/sprat/p...,,obscured,...,,,,,Stiphodon allen,,,Opal Cling-Goby,False,[]
3451,162783,924263,764897,,,"Department of Biodiversity, Conservation and A...",EN,https://www.dpaw.wa.gov.au/images/documents/pl...,,obscured,...,Reedia,spathacea,,2020-09-27T12:27:42Z,Reedia spathacea,species,,,,


In [2]:
def filter_state_statuses(stateregex: str, urlregex: str):
    authoritydf = taxastatus['authority'].drop_duplicates().sort_values()
    authoritydf = authoritydf[pd.Series(authoritydf).str.contains(stateregex)]
    urldf = taxastatus['url'].drop_duplicates().sort_values()
    urldf = urldf[pd.Series(urldf).str.contains(urlregex)]
    placedisplaydf = taxastatus['place_display_name'].drop_duplicates().sort_values()
    placedisplaydf = placedisplaydf[pd.Series(placedisplaydf).str.contains(stateregex)]
    placedf = taxastatus['place_name'].drop_duplicates().sort_values()
    placedf = placedf[pd.Series(placedf).str.contains(stateregex)]
    # concat all and remove duplicates
    statedf = pd.concat([taxastatus.apply(lambda row: row[taxastatus['place_display_name'].isin(placedisplaydf)]),
                         taxastatus.apply(lambda row: row[taxastatus['place_name'].isin(placedf)]),
                         taxastatus.apply(lambda row: row[taxastatus['url'].isin(urldf)]),
                         taxastatus.apply(
                             lambda row: row[taxastatus['authority'].isin(authoritydf)])]).drop_duplicates()
    return statedf.sort_values(['taxon_id', 'user_id'])
inatstatuses = filter_state_statuses("NSW|New South Wales", ".nsw.")
inatstatuses.rename(columns={'id':'status_id','id_y':'taxon_id_y'},inplace=True)
inatstatuses

Unnamed: 0,status_id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,...,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references,preferred_common_name,is_active,current_synonymous_taxon_ids
3,166416,1033183,3669610,6825,,NSW Office of Environment & Heritage,EN,https://www.environment.nsw.gov.au/threateneds...,,obscured,...,Eidothea,hardeniana,,2021-02-22T07:21:17Z,Eidothea hardeniana,species,,,,
2656,165059,1054498,58320,6825,,NSW Office of Environment & Heritage,Vulnerable,https://www.environment.nsw.gov.au/threatenedS...,,obscured,...,Prostanthera,cryptandroides,cryptandroides,2020-12-17T03:53:45Z,Prostanthera cryptandroides cryptandroides,subspecies,,,,
477,264941,1061113,3669610,6825,,New South Wales Biodiversity Conservation Act ...,EX,https://bie.ala.org.au/species/https://id.biod...,Presumed Extinct,open,...,Leuzea,australis,,2022-06-11T11:38:00Z,Leuzea australis,species,,,,
371,180988,1070573,3669610,6825,,NSW Office of Environment & Heritage,EN,https://www.environment.nsw.gov.au/threateneds...,,,...,Rotala,tripartita,,2021-09-21T05:48:55Z,Rotala tripartita,species,https://eol.org/pages/49427430,,,
2514,160581,1076814,990532,6825,,NSW Office of Environment & Heritage,Vulnerable,https://www.environment.nsw.gov.au/threateneds...,,obscured,...,Pterostylis,nigricans,,2022-07-12T14:19:22Z,Pterostylis nigricans,species,http://www.catalogueoflife.org/annual-checklis...,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1111,152247,906670,708886,6825,16650,NSW Office of Environment & Heritage,Endangered,https://www.environment.nsw.gov.au/resources/a...,,obscured,...,Caladenia,concolor,,2022-06-10T07:36:31Z,Caladenia concolor,species,http://www.catalogueoflife.org/annual-checklis...,,,
2637,164342,913024,58320,6825,,NSW Office of Environment & Heritage,Vulnerable,https://www.environment.nsw.gov.au/threatenedS...,,,...,Symplocos,baeuerlenii,,2022-03-14T15:27:03Z,Symplocos baeuerlenii,species,,,,
7,167825,953250,702203,6825,,New South Wales Office of Environment and Heri...,VU,https://www.environment.nsw.gov.au/threatenedS...,Vulnerable,,...,Pultenaea,glabra,,2021-05-02T14:08:01Z,Pultenaea glabra,species,http://www.catalogueoflife.org/annual-checklis...,,,
836,155144,966856,708886,6825,16650,NSW Office of Environment & Heritage,endangered,https://www.environment.nsw.gov.au/resources/a...,,obscured,...,Pteris,platyzomopsis,,2019-11-20T05:25:13Z,Pteris platyzomopsis,species,https://eol.org/pages/47172990,,,


### 2. iNaturalist taxonomy

In [3]:
# Output files contain these fields
# Taxon_Name,Status,Authority,IUCN_equivalent,Description,iNaturalist_Place_ID,url,Taxon_Geoprivacy,Username,taxon_id
# so we need to match species from the state lists to the inat taxa to get the taxon_id

import zipfile
url = "https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip"
filename = url.split("/")[-1]

z=zipfile.ZipFile(sourcedir + filename)

with z.open('taxa.csv') as from_archive:
    inattaxa = pd.read_csv(from_archive,dtype=str)
z.close()
inattaxa.head(3)


Unnamed: 0,id,taxonID,identifier,parentNameUsageID,kingdom,phylum,class,order,family,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references
0,1,https://www.inaturalist.org/taxa/1,https://www.inaturalist.org/taxa/1,https://www.inaturalist.org/taxa/48460,Animalia,,,,,,,,2021-11-02T06:05:44Z,Animalia,kingdom,http://www.catalogueoflife.org/annual-checklis...
1,2,https://www.inaturalist.org/taxa/2,https://www.inaturalist.org/taxa/2,https://www.inaturalist.org/taxa/1,Animalia,Chordata,,,,,,,2021-11-23T00:40:18Z,Chordata,phylum,http://www.catalogueoflife.org/annual-checklis...
2,3,https://www.inaturalist.org/taxa/3,https://www.inaturalist.org/taxa/3,https://www.inaturalist.org/taxa/355675,Animalia,Chordata,Aves,,,,,,2022-12-27T07:33:16Z,Aves,class,http://www.catalogueoflife.org/annual-checklis...


### 3. State lists

Sensitive list: `geoprivacy` = `obscured`
overrides


In [4]:
# %%script echo skipping # comment this line to download dataset from lists.ala.org.au the web and save locally
#Download lists data. Retrieve binomial and trinomial names from GBIF. Save locally to CSV

sensitivelist = lf.download_ala_list("https://lists.ala.org.au/ws/speciesListItems/dr487?max=10000&includeKVP=true")
sensitivelist = lf.kvp_to_columns(sensitivelist)
sensitivelist.to_csv(sourcedir + "nsw-ala-sensitive.csv", index=False)

conservationlist = lf.download_ala_list("https://lists.ala.org.au/ws/speciesListItems/dr650?max=10000&includeKVP=true")
conservationlist = lf.kvp_to_columns(conservationlist)
conservationlist.to_csv(sourcedir + "nsw-ala-conservation.csv", index=False)

In [15]:
# Read sensitive list data
sensitivelist = pd.read_csv(sourcedir + "nsw-ala-sensitive.csv", dtype=str)
sensitivelist = sensitivelist.rename(columns={'T S Profile I D':'nsw_taxonID',
                                              'status':'nsw_status',
                                              'scientificName':'nsw_scientificName'})
sensitivelist

Unnamed: 0,id,name,commonName,nsw_scientificName,lsid,dataResourceUid,kvpValues,taxonRank,kingdom,class,...,sourceStatus,protectedInNSW,sensitivityClass,nsw_taxonID,countryConservation,dcterms_modified,speciesID,taxonID,generalisation,nsw_status
0,4040135,Callocephalon fimbriatum,Gang-gang Cockatoo,Callocephalon fimbriatum,https://biodiversity.org.au/afd/taxa/6c646af8-...,dr487,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Animalia,Aves,...,Vulnerable,True,Category 3,10975.0,Endangered,2011-11-11T11:23:09+11:00,6,6,1km,Vulnerable
1,4040193,Ninox strenua,Powerful Owl,Ninox (Rhabdoglaux) strenua,https://biodiversity.org.au/afd/taxa/d1c5dee0-...,dr487,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Animalia,Aves,...,Vulnerable,True,Category 3,10562.0,Not Listed,2011-11-11T11:23:06+11:00,162,162,1km,Vulnerable
2,4040029,Tyto tenebricosa,Sooty Owl,Tyto tenebricosa,https://biodiversity.org.au/afd/taxa/645b287c-...,dr487,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Animalia,Aves,...,Vulnerable,True,Category 3,10821.0,Not Listed,2012-07-23T18:04:52.683+10:00,333,333,1km,Vulnerable
3,4040138,Ninox connivens,Barking Owl,Ninox (Hieracoglaux) connivens,https://biodiversity.org.au/afd/taxa/bd332ca4-...,dr487,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Animalia,Aves,...,Vulnerable,True,Category 3,10561.0,Not Listed,2011-11-11T11:23:06+11:00,363,363,1km,Vulnerable
4,4040203,Hoplocephalus bungaroides,Broad-headed Snake,Hoplocephalus bungaroides,https://biodiversity.org.au/afd/taxa/f579a6fc-...,dr487,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Animalia,Reptilia,...,Endangered,True,Category 2,10413.0,Vulnerable,2011-11-11T11:23:04+11:00,390,390,10km,Endangered
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
192,4040083,Anthosachne kingiana subsp. kingiana,Phillip Island Wheat Grass,Anthosachne kingiana subsp. kingiana,https://id.biodiversity.org.au/taxon/apni/5129...,dr487,"[{'key': 'taxonRank', 'value': 'Subspecies'}, ...",Subspecies,Plantae,Flora,...,Critically Endangered,False,Category 3,20148.0,Critically Endangered,2020-01-23T17:14:13.01+11:00,23014,23014,1km,Critically Endangered
193,4040198,Caladenia tensa,Erect Greencomb Spider Orchid,Caladenia tensa,https://id.biodiversity.org.au/taxon/apni/5139...,dr487,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Plantae,Flora,...,Not Listed,True,Category 2,,Not Listed,2020-06-23T19:56:43.85+10:00,23737,23737,10km,Not Listed
194,4040162,Caladenia atroclavia,Black-clubbed Spider-orchid,Caladenia atroclavia,https://id.biodiversity.org.au/taxon/apni/5139...,dr487,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Plantae,Flora,...,Not Listed,True,Category 2,20365.0,Endangered,2020-06-23T19:51:08.733+10:00,23738,23738,10km,Not Listed
195,4040056,Prasophyllum sandrae,Majors Creek Leek Orchid,Prasophyllum sandrae,ALA_DR487_196,dr487,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Plantae,Flora,...,Critically Endangered,False,Category 2,10668.0,Not Listed,2021-12-31T11:53:58.467+11:00,24268,24268,10km,Critically Endangered


In [16]:
categorymapping = {
    "Category 3":"obscured",
    "Category 2":"obscured",
    "Category 1":"private"
}
sensitivelist['bionet_geoprivacy'] = sensitivelist['sensitivityClass'].str.strip().map(categorymapping).fillna('open') # map to dictionary
sensitivelist[['sensitivityClass','bionet_geoprivacy']].drop_duplicates()

Unnamed: 0,sensitivityClass,bionet_geoprivacy
0,Category 3,obscured
4,Category 2,obscured
143,Category 1,private


In [17]:
# Read conservation list data
conservationlist = pd.read_csv(sourcedir + "nsw-ala-conservation.csv", dtype=str)
conservationlist = conservationlist.rename(columns={'T S Profile I D':'nsw_taxonID',
                                              'status':'nsw_status',
                                              'scientificName':'nsw_scientificName'})
conservationlist['bionet_geoprivacy'] = conservationlist['sensitivityClass'].str.strip().map(categorymapping).fillna('open') # map to dictionary
conservationlist

Unnamed: 0,id,name,commonName,nsw_scientificName,lsid,dataResourceUid,kvpValues,taxonRank,kingdom,class,...,sourceStatus,protectedInNSW,sensitivityClass,nsw_taxonID,countryConservation,dcterms_modified,speciesID,taxonID,nsw_status,bionet_geoprivacy
0,4028582,Delma impar,Many-lined Delma,Delma impar,https://biodiversity.org.au/afd/taxa/51c700dd-...,dr650,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Animalia,Reptilia,...,Vulnerable,True,Not Sensitive,10211.0,Vulnerable,2013-12-20T15:06:07.13+11:00,1,1,Vulnerable,open
1,4028454,Callocephalon fimbriatum,Gang-gang Cockatoo,Callocephalon fimbriatum,https://biodiversity.org.au/afd/taxa/6c646af8-...,dr650,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Animalia,Aves,...,Vulnerable,True,Category 3,10975.0,Endangered,2011-11-11T11:23:09+11:00,6,6,Vulnerable,obscured
2,4028239,Cacophis harriettae,White-crowned Snake,Cacophis harriettae,https://biodiversity.org.au/afd/taxa/ace6534f-...,dr650,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Animalia,Reptilia,...,Vulnerable,True,Not Sensitive,10117.0,Not Listed,2015-05-28T10:14:29.083+10:00,31,31,Vulnerable,open
3,4028340,Litoria booroolongensis,Booroolong Frog,Litoria booroolongensis,https://biodiversity.org.au/afd/taxa/6377e038-...,dr650,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Animalia,Amphibia,...,Endangered,True,Not Sensitive,10484.0,Endangered,2011-11-11T11:23:05+11:00,33,33,Endangered,open
4,4028512,Anthochaera phrygia,Regent Honeyeater,Anthochaera (Xanthomyza) phrygia,https://biodiversity.org.au/afd/taxa/31869a0e-...,dr650,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Animalia,Aves,...,Critically Endangered,True,Not Sensitive,10841.0,Critically Endangered,2011-11-11T11:23:09+11:00,53,53,Critically Endangered,open
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1013,4028492,Muehlenbeckia sp. Mt Norman,Scrambling Lignum,Muehlenbeckia sp. Mt Norman,ALA_DR650_1014,dr650,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Plantae,Flora,...,Vulnerable,False,Not Sensitive,10547.0,Not Listed,2021-03-18T10:54:03.1+11:00,24272,24272,Vulnerable,open
1014,4028995,Lobelia claviflora,,Lobelia claviflora,ALA_DR650_1015,dr650,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Plantae,Flora,...,Critically Endangered,False,Not Sensitive,20375.0,Not Listed,2021-09-24T16:27:42.707+10:00,24282,24282,Critically Endangered,open
1015,4029167,Fontainea sp. Coffs Harbour,,Fontainea sp. Coffs Harbour,ALA_DR650_1016,dr650,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Plantae,Flora,...,Critically Endangered,False,Not Sensitive,20376.0,Not Listed,2021-10-15T10:15:24.41+11:00,24283,24283,Critically Endangered,open
1016,4029112,Melichrus gibberagee,Narrow-leaf Melichrus,Melichrus gibberagee,ALA_DR650_1017,dr650,"[{'key': 'taxonRank', 'value': 'Species'}, {'k...",Species,Plantae,Flora,...,Endangered,False,Not Sensitive,10521.0,Endangered,2021-10-22T14:59:20.563+11:00,24284,24284,Endangered,open


In [19]:
# join them in a way that works for inat (eg sensitive list, geoprivacy = 'obscured'
statelist = pd.concat([sensitivelist[['nsw_taxonID','nsw_scientificName','nsw_status','bionet_geoprivacy','kingdom','class','family']],
                    conservationlist[['nsw_taxonID','nsw_scientificName','nsw_status','bionet_geoprivacy','kingdom','class','family']]]).drop_duplicates()
statelist['nsw_taxonID'] = statelist['nsw_taxonID'].str[:-2]
statelist

Unnamed: 0,nsw_taxonID,nsw_scientificName,nsw_status,bionet_geoprivacy,kingdom,class,family
0,10975,Callocephalon fimbriatum,Vulnerable,obscured,Animalia,Aves,Cacatuidae
1,10562,Ninox (Rhabdoglaux) strenua,Vulnerable,obscured,Animalia,Aves,Strigidae
2,10821,Tyto tenebricosa,Vulnerable,obscured,Animalia,Aves,Tytonidae
3,10561,Ninox (Hieracoglaux) connivens,Vulnerable,obscured,Animalia,Aves,Strigidae
4,10413,Hoplocephalus bungaroides,Endangered,obscured,Animalia,Reptilia,Elapidae
...,...,...,...,...,...,...,...
1013,10547,Muehlenbeckia sp. Mt Norman,Vulnerable,open,Plantae,Flora,Polygonaceae
1014,20375,Lobelia claviflora,Critically Endangered,open,Plantae,Flora,Campanulaceae
1015,20376,Fontainea sp. Coffs Harbour,Critically Endangered,open,Plantae,Flora,Euphorbiaceae
1016,10521,Melichrus gibberagee,Endangered,open,Plantae,Flora,Ericaceae


In [20]:
# retrieve binomial and trinomial names from GBIF
statelist = statelist.rename(columns={'nsw_scientificName':'name'})
parsednames = lf.gbifparse(statelist)
parsednames.to_csv(sourcedir + "nsw-gbif.csv", index=False)

In [21]:
parsednames = pd.read_csv(sourcedir + "nsw-gbif.csv")
statelist = statelist.merge(parsednames[['scientificName','canonicalName','canonicalNameComplete','type','rankMarker']],how="left",left_on="name",right_on="scientificName")
numfullstatelist = len(statelist.index)
conservationlist = conservationlist.rename(columns={'name':'nsw_scientificName'})
statelist['nsw_scientificName'] = statelist['canonicalName']
statelist

Unnamed: 0,nsw_taxonID,name,nsw_status,bionet_geoprivacy,kingdom,class,family,scientificName,canonicalName,canonicalNameComplete,type,rankMarker,nsw_scientificName
0,10975,Callocephalon fimbriatum,Vulnerable,obscured,Animalia,Aves,Cacatuidae,Callocephalon fimbriatum,Callocephalon fimbriatum,Callocephalon fimbriatum,SCIENTIFIC,sp.,Callocephalon fimbriatum
1,10562,Ninox (Rhabdoglaux) strenua,Vulnerable,obscured,Animalia,Aves,Strigidae,Ninox (Rhabdoglaux) strenua,Ninox strenua,Ninox strenua,SCIENTIFIC,sp.,Ninox strenua
2,10821,Tyto tenebricosa,Vulnerable,obscured,Animalia,Aves,Tytonidae,Tyto tenebricosa,Tyto tenebricosa,Tyto tenebricosa,SCIENTIFIC,sp.,Tyto tenebricosa
3,10561,Ninox (Hieracoglaux) connivens,Vulnerable,obscured,Animalia,Aves,Strigidae,Ninox (Hieracoglaux) connivens,Ninox connivens,Ninox connivens,SCIENTIFIC,sp.,Ninox connivens
4,10413,Hoplocephalus bungaroides,Endangered,obscured,Animalia,Reptilia,Elapidae,Hoplocephalus bungaroides,Hoplocephalus bungaroides,Hoplocephalus bungaroides,SCIENTIFIC,sp.,Hoplocephalus bungaroides
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1018,10547,Muehlenbeckia sp. Mt Norman,Vulnerable,open,Plantae,Flora,Polygonaceae,Muehlenbeckia sp. Mt Norman,Muehlenbeckia sp.Mt,Muehlenbeckia sp.Mt,INFORMAL,sp.,Muehlenbeckia sp.Mt
1019,20375,Lobelia claviflora,Critically Endangered,open,Plantae,Flora,Campanulaceae,Lobelia claviflora,Lobelia claviflora,Lobelia claviflora,SCIENTIFIC,sp.,Lobelia claviflora
1020,20376,Fontainea sp. Coffs Harbour,Critically Endangered,open,Plantae,Flora,Euphorbiaceae,Fontainea sp. Coffs Harbour,Fontainea sp.Coffs,Fontainea sp.Coffs,INFORMAL,sp.,Fontainea sp.Coffs
1021,10521,Melichrus gibberagee,Endangered,open,Plantae,Flora,Ericaceae,Melichrus gibberagee,Melichrus gibberagee,Melichrus gibberagee,SCIENTIFIC,sp.,Melichrus gibberagee


In [23]:
# Identify records that won't comply with iNaturalist species names
noncomply = statelist[statelist['type'].isin(['INFORMAL','CULTIVAR','HYBRID', 'BLACKLISTED']) ]
noncomply

Unnamed: 0,nsw_taxonID,name,nsw_status,bionet_geoprivacy,kingdom,class,family,scientificName,canonicalName,canonicalNameComplete,type,rankMarker,nsw_scientificName
110,10706,Pterostylis sp. Botany Bay,Endangered,obscured,Plantae,Flora,Orchidaceae,Pterostylis sp. Botany Bay,Pterostylis sp.Botany,Pterostylis sp.Botany,INFORMAL,sp.,Pterostylis sp.Botany
130,10718,Pultenaea sp. Olinda,Endangered,obscured,Plantae,Flora,Fabaceae (Faboideae),Pultenaea sp. Olinda,Pultenaea sp.Olinda,Pultenaea sp.Olinda,INFORMAL,sp.,Pultenaea sp.Olinda
132,10818,Typhonium sp. aff. brownii,Endangered,obscured,Plantae,Flora,Araceae,Typhonium sp. aff. brownii,Typhonium spec.,Typhonium spec.,INFORMAL,sp.,Typhonium spec.
144,10242,"Diuris sp. (Oaklands, D.L. Jones 5380)",Endangered,obscured,Plantae,Flora,Orchidaceae,"Diuris sp. (Oaklands, D.L. Jones 5380)",Diuris spec.,Diuris spec.,INFORMAL,sp.,Diuris spec.
162,10717,Pultenaea sp. Genowlan Point (NSW 417813),Critically Endangered,obscured,Plantae,Flora,Fabaceae (Faboideae),Pultenaea sp. Genowlan Point (NSW 417813),Pultenaea spec.,Pultenaea spec. Genowlan Point,INFORMAL,sp.,Pultenaea spec.
171,20103,Prasophyllum sp. Moama,Critically Endangered,obscured,Plantae,Flora,Orchidaceae,Prasophyllum sp. Moama,Prasophyllum sp.Moama,Prasophyllum sp.Moama,INFORMAL,sp.,Prasophyllum sp.Moama
181,20266,Genoplesium sp. Charmhaven (NSW 896673),Critically Endangered,obscured,Plantae,Flora,Orchidaceae,Genoplesium sp. Charmhaven (NSW 896673),Genoplesium spec.,Genoplesium spec. Charmhaven,INFORMAL,sp.,Genoplesium spec.
829,10723,Samadera sp. Moonee Creek (J.King s.n. Nov. 1949),Endangered,open,Plantae,Flora,Simaroubaceae,Samadera sp. Moonee Creek (J.King s.n. Nov. 1949),Samadera sp.Moonee,Samadera sp.Moonee,INFORMAL,sp.,Samadera sp.Moonee
920,10625,Phyllodes imperialis southern subspecies,Endangered,open,Animalia,Insecta,Noctuidae,Phyllodes imperialis southern subspecies,Phyllodes imperialis subspecies,Phyllodes imperialis subspecies,BLACKLISTED,infrasubsp.,Phyllodes imperialis subspecies
936,10317,Eucalyptus sp. Cattai,Critically Endangered,open,Plantae,Flora,Myrtaceae,Eucalyptus sp. Cattai,Eucalyptus sp.Cattai,Eucalyptus sp.Cattai,INFORMAL,sp.,Eucalyptus sp.Cattai


In [24]:
# remove records that do not comply
statelist = statelist[~statelist['type'].isin(['INFORMAL','CULTIVAR','HYBRID', 'BLACKLISTED']) ]
statelist = pd.DataFrame(statelist[['nsw_taxonID','nsw_scientificName','nsw_status','bionet_geoprivacy']]).drop_duplicates()
statelist

Unnamed: 0,nsw_taxonID,nsw_scientificName,nsw_status,bionet_geoprivacy
0,10975,Callocephalon fimbriatum,Vulnerable,obscured
1,10562,Ninox strenua,Vulnerable,obscured
2,10821,Tyto tenebricosa,Vulnerable,obscured
3,10561,Ninox connivens,Vulnerable,obscured
4,10413,Hoplocephalus bungaroides,Endangered,obscured
...,...,...,...,...
1014,20369,Leionema westonii,Critically Endangered,open
1015,10803,Thinornis cucullatus cucullatus,Critically Endangered,open
1017,10111,Epacris gnidioides,Vulnerable,open
1019,20375,Lobelia claviflora,Critically Endangered,open


In [25]:
parsednames['type'].unique()

#cols for debugging

array(['SCIENTIFIC', 'INFORMAL', 'BLACKLISTED'], dtype=object)

In [26]:
# check for duplicates with conflicting information
dupinformation = statelist.groupby('nsw_taxonID').filter(lambda x: len(x) > 1)#.sort('size',ascending=False)
dupinformation
#df.groupby('hash').filter(lambda group: len(group) > 1).sort('size', ascending=False)

Unnamed: 0,nsw_taxonID,nsw_scientificName,nsw_status,bionet_geoprivacy


In [245]:
# fix duplicate

### 4. Equivalent IUCN statuses

In [27]:
# iucn_statuses = {'Not Evaluated', 'Data Deficient', 'Least Concern', 'Near Threatened', 'Vulnerable', 'Endangered', 'Critically Endangered', 'Extinct in the Wild', 'Extinct'}
statelist.groupby(['nsw_status'])['nsw_status'].count()

nsw_status
Critically Endangered    106
Endangered               414
Extinct                   72
Not Listed                 5
Vulnerable               405
Name: nsw_status, dtype: int64

In [29]:
iucnStatusMappings = {
    'critically endangered':'Critically Endangered',
    'endangered':'Endangered',
    'extinct':'Extinct',
    'not listed': 'Not Evaluated',
    'vulnerable': 'Vulnerable'
}


### 5. Determine best place ID to use

In [30]:
inatstatuses.groupby(['place_id','place_name','place_display_name'])['place_id'].count()
# looks like 6825

place_id  place_name       place_display_name 
6744      Australia        Australia                5
6825      New South Wales  New South Wales, AU    161
Name: place_id, dtype: int64

## Merge iNaturalist statuses with State sensitive list on scientificName

1. Match - updates, even if the statuses are the same we'll update the links and values anyway
2. No match - statuses to be added (additions)
   1.1 No match and no taxnomy - search for synonyms
   1.2 No match
3. Merge the other direction to see if there are deletes?


In [31]:
# join to see which lists already have a status in inaturalist based on scientificName
mergedstatuses = statelist[['nsw_scientificName', 'nsw_taxonID','nsw_status','bionet_geoprivacy']].merge(inatstatuses[['status_id','scientificName','taxon_id','user_id','description','iucn','authority','status','geoprivacy','place_id','place_display_name']],how="left",left_on='nsw_scientificName',right_on='scientificName',suffixes=(None,'_inat')).sort_values(['scientificName'])
mergedstatuses

Unnamed: 0,nsw_scientificName,nsw_taxonID,nsw_status,bionet_geoprivacy,status_id,scientificName,taxon_id,user_id,description,iucn,authority,status,geoprivacy,place_id,place_display_name
138,Acacia atrox,10003,Endangered,obscured,152295,Acacia atrox,898643,708886,,40,NSW Office of Environment & Heritage,endangered,obscured,6825,"New South Wales, AU"
617,Acacia baueri aspera,10005,Vulnerable,open,159920,Acacia baueri aspera,1111698,702203,,30,New South Wales,VU,,6825,"New South Wales, AU"
97,Acacia dangarensis,20028,Critically Endangered,obscured,152286,Acacia dangarensis,775137,708886,,40,NSW Office of Environment & Heritage,endangered,obscured,6825,"New South Wales, AU"
602,Acacia pubifolia,10024,Endangered,open,162893,Acacia pubifolia,897779,702203,,40,New South Wales,EN,,6825,"New South Wales, AU"
441,Acacia ruppii,10027,Endangered,open,162887,Acacia ruppii,578518,702203,,40,New South Wales,EN,,6825,"New South Wales, AU"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1000,Leionema westonii,20369,Critically Endangered,open,,,,,,,,,,,
1001,Thinornis cucullatus cucullatus,10803,Critically Endangered,open,,,,,,,,,,,
1002,Epacris gnidioides,10111,Vulnerable,open,,,,,,,,,,,
1003,Lobelia claviflora,20375,Critically Endangered,open,,,,,,,,,,,


In [32]:
# prepare the export fields, common to New template and Update template
# new statuses
# Taxon_Name,Status,Authority,IUCN_equivalent,Description,iNaturalist_Place_ID,url,Taxon_Geoprivacy,Username,taxon_id
# updates
# action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username
mergedstatuses['new_authority'] = "New South Wales Office of Environment and Heritage"
mergedstatuses['new_description'] = "Listed as Threatened - refer to https://www.environment.nsw.gov.au/threatenedspeciesapp"
baseurl = "https://www.environment.nsw.gov.au/threatenedspeciesapp/profile.aspx?id="
mergedstatuses['new_url'] = baseurl + mergedstatuses['nsw_taxonID']
mergedstatuses.rename(columns={'bionet_geoprivacy':'new_geoprivacy'},inplace=True)
mergedstatuses['new_place_id'] = '6825'  # NEW SOUTH WALES
mergedstatuses['new_username'] = 'peggydnew'
mergedstatuses['new_iucn_equivalent'] = mergedstatuses['status'].str.lower().str.strip().map(iucnStatusMappings).fillna('Vulnerable') # map to dictionary
mergedstatuses['new_status'] = mergedstatuses['nsw_status'].fillna('Threatened')
mergedstatuses

Unnamed: 0,nsw_scientificName,nsw_taxonID,nsw_status,new_geoprivacy,status_id,scientificName,taxon_id,user_id,description,iucn,...,geoprivacy,place_id,place_display_name,new_authority,new_description,new_url,new_place_id,new_username,new_iucn_equivalent,new_status
138,Acacia atrox,10003,Endangered,obscured,152295,Acacia atrox,898643,708886,,40,...,obscured,6825,"New South Wales, AU",New South Wales Office of Environment and Heri...,Listed as Threatened - refer to https://www.en...,https://www.environment.nsw.gov.au/threateneds...,6825,peggydnew,Endangered,Endangered
617,Acacia baueri aspera,10005,Vulnerable,open,159920,Acacia baueri aspera,1111698,702203,,30,...,,6825,"New South Wales, AU",New South Wales Office of Environment and Heri...,Listed as Threatened - refer to https://www.en...,https://www.environment.nsw.gov.au/threateneds...,6825,peggydnew,Vulnerable,Vulnerable
97,Acacia dangarensis,20028,Critically Endangered,obscured,152286,Acacia dangarensis,775137,708886,,40,...,obscured,6825,"New South Wales, AU",New South Wales Office of Environment and Heri...,Listed as Threatened - refer to https://www.en...,https://www.environment.nsw.gov.au/threateneds...,6825,peggydnew,Endangered,Critically Endangered
602,Acacia pubifolia,10024,Endangered,open,162893,Acacia pubifolia,897779,702203,,40,...,,6825,"New South Wales, AU",New South Wales Office of Environment and Heri...,Listed as Threatened - refer to https://www.en...,https://www.environment.nsw.gov.au/threateneds...,6825,peggydnew,Vulnerable,Endangered
441,Acacia ruppii,10027,Endangered,open,162887,Acacia ruppii,578518,702203,,40,...,,6825,"New South Wales, AU",New South Wales Office of Environment and Heri...,Listed as Threatened - refer to https://www.en...,https://www.environment.nsw.gov.au/threateneds...,6825,peggydnew,Vulnerable,Endangered
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1000,Leionema westonii,20369,Critically Endangered,open,,,,,,,...,,,,New South Wales Office of Environment and Heri...,Listed as Threatened - refer to https://www.en...,https://www.environment.nsw.gov.au/threateneds...,6825,peggydnew,Vulnerable,Critically Endangered
1001,Thinornis cucullatus cucullatus,10803,Critically Endangered,open,,,,,,,...,,,,New South Wales Office of Environment and Heri...,Listed as Threatened - refer to https://www.en...,https://www.environment.nsw.gov.au/threateneds...,6825,peggydnew,Vulnerable,Critically Endangered
1002,Epacris gnidioides,10111,Vulnerable,open,,,,,,,...,,,,New South Wales Office of Environment and Heri...,Listed as Threatened - refer to https://www.en...,https://www.environment.nsw.gov.au/threateneds...,6825,peggydnew,Vulnerable,Vulnerable
1003,Lobelia claviflora,20375,Critically Endangered,open,,,,,,,...,,,,New South Wales Office of Environment and Heri...,Listed as Threatened - refer to https://www.en...,https://www.environment.nsw.gov.au/threateneds...,6825,peggydnew,Vulnerable,Critically Endangered


## Updates

In [33]:
# updates
# action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
updates = pd.DataFrame(mergedstatuses[mergedstatuses['status_id'].notnull()])
updates.sort_values('scientificName')
updates['action'] = 'UPDATE'
#updates.loc[:,'action'] = 'UPDATE'
updates = updates[['action','scientificName','status_id','taxon_id','new_status','new_iucn_equivalent','new_authority','new_url','new_geoprivacy','new_place_id','new_username','new_description']]
updates.columns = updates.columns.str.replace("new_", "", regex=True)
updates = updates.rename(columns={'scientificName':'taxon_name',
                                  'status_id':'id'})
updates

Unnamed: 0,action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
138,UPDATE,Acacia atrox,152295,898643,Endangered,Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Threatened - refer to https://www.en...
617,UPDATE,Acacia baueri aspera,159920,1111698,Vulnerable,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,open,6825,peggydnew,Listed as Threatened - refer to https://www.en...
97,UPDATE,Acacia dangarensis,152286,775137,Critically Endangered,Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Threatened - refer to https://www.en...
602,UPDATE,Acacia pubifolia,162893,897779,Endangered,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,open,6825,peggydnew,Listed as Threatened - refer to https://www.en...
441,UPDATE,Acacia ruppii,162887,578518,Endangered,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,open,6825,peggydnew,Listed as Threatened - refer to https://www.en...
...,...,...,...,...,...,...,...,...,...,...,...,...
2,UPDATE,Tyto tenebricosa,152244,20422,Vulnerable,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Threatened - refer to https://www.en...
111,UPDATE,Viola cleistogamoides,152321,566603,Endangered,Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Threatened - refer to https://www.en...
141,UPDATE,Wollemia nobilis,152312,49381,Critically Endangered,Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,private,6825,peggydnew,Listed as Threatened - refer to https://www.en...
698,UPDATE,Zieria involucrata,266172,604220,Endangered,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,open,6825,peggydnew,Listed as Threatened - refer to https://www.en...


## No status in iNaturalist via straight scientificName match
The NSW records that didn't match up to a status in iNaturalist

In [34]:
# to add: those that have no inaturalist status - 532!!
noinatstatus = mergedstatuses[mergedstatuses['status_id'].isnull()]
# try to match the taxon name to something in inaturalist
noinatstatus = noinatstatus.merge(inattaxa, how="left", left_on="nsw_scientificName",right_on="scientificName")
noinatstatus

Unnamed: 0,nsw_scientificName,nsw_taxonID,nsw_status,new_geoprivacy,status_id,scientificName_x,taxon_id,user_id,description,iucn,...,class,order,family,genus,specificEpithet,infraspecificEpithet,modified,scientificName_y,taxonRank,references
0,Cyclopsitta diophthalma coxeni,10195,Critically Endangered,obscured,,,,,,,...,Aves,Psittaciformes,Psittaculidae,Cyclopsitta,diophthalma,coxeni,2021-11-24T00:57:49Z,Cyclopsitta diophthalma coxeni,subspecies,http://eol.org/pages/1279512
1,Polytelis anthopeplus monarchoides,10644,Endangered,obscured,,,,,,,...,Aves,Psittaciformes,Psittaculidae,Polytelis,anthopeplus,monarchoides,2019-09-08T01:35:11Z,Polytelis anthopeplus monarchoides,subspecies,
2,Genoplesium baueri,10875,Endangered,obscured,,,,,,,...,Liliopsida,Asparagales,Orchidaceae,Genoplesium,baueri,,2020-02-18T23:46:33Z,Genoplesium baueri,species,http://www.catalogueoflife.org/annual-checklis...
3,Lindsaea fraseri,10481,Endangered,obscured,,,,,,,...,Polypodiopsida,Polypodiales,Lindsaeaceae,Lindsaea,fraseri,,2020-11-16T20:28:38Z,Lindsaea fraseri,species,https://eol.org/pages/52187882
4,Callistemon linearifolius,10129,Vulnerable,obscured,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
851,Leionema westonii,20369,Critically Endangered,open,,,,,,,...,,,,,,,,,,
852,Thinornis cucullatus cucullatus,10803,Critically Endangered,open,,,,,,,...,,,,,,,,,,
853,Epacris gnidioides,10111,Vulnerable,open,,,,,,,...,Magnoliopsida,Ericales,Ericaceae,Epacris,gnidioides,,2020-02-09T06:28:44Z,Epacris gnidioides,species,
854,Lobelia claviflora,20375,Critically Endangered,open,,,,,,,...,,,,,,,,,,


In [35]:
additions = pd.DataFrame(noinatstatus[noinatstatus['id'].notna()])
additions

Unnamed: 0,nsw_scientificName,nsw_taxonID,nsw_status,new_geoprivacy,status_id,scientificName_x,taxon_id,user_id,description,iucn,...,class,order,family,genus,specificEpithet,infraspecificEpithet,modified,scientificName_y,taxonRank,references
0,Cyclopsitta diophthalma coxeni,10195,Critically Endangered,obscured,,,,,,,...,Aves,Psittaciformes,Psittaculidae,Cyclopsitta,diophthalma,coxeni,2021-11-24T00:57:49Z,Cyclopsitta diophthalma coxeni,subspecies,http://eol.org/pages/1279512
1,Polytelis anthopeplus monarchoides,10644,Endangered,obscured,,,,,,,...,Aves,Psittaciformes,Psittaculidae,Polytelis,anthopeplus,monarchoides,2019-09-08T01:35:11Z,Polytelis anthopeplus monarchoides,subspecies,
2,Genoplesium baueri,10875,Endangered,obscured,,,,,,,...,Liliopsida,Asparagales,Orchidaceae,Genoplesium,baueri,,2020-02-18T23:46:33Z,Genoplesium baueri,species,http://www.catalogueoflife.org/annual-checklis...
3,Lindsaea fraseri,10481,Endangered,obscured,,,,,,,...,Polypodiopsida,Polypodiales,Lindsaeaceae,Lindsaea,fraseri,,2020-11-16T20:28:38Z,Lindsaea fraseri,species,https://eol.org/pages/52187882
5,Prostanthera marifolia,20101,Critically Endangered,obscured,,,,,,,...,Magnoliopsida,Lamiales,Lamiaceae,Prostanthera,marifolia,,2020-02-19T17:43:25Z,Prostanthera marifolia,species,https://eol.org/pages/5383100
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
846,Homoranthus elusus,20362,Critically Endangered,open,,,,,,,...,Magnoliopsida,Myrtales,Myrtaceae,Homoranthus,elusus,,2021-07-28T03:28:41Z,Homoranthus elusus,species,https://eol.org/pages/52550126
848,Egernia roomi,20370,Critically Endangered,open,,,,,,,...,Reptilia,Squamata,Scincidae,Egernia,roomi,,2020-02-11T02:17:51Z,Egernia roomi,species,
849,Tympanocryptis osbornei,20380,Endangered,open,,,,,,,...,Reptilia,Squamata,Agamidae,Tympanocryptis,osbornei,,2019-11-17T06:04:05Z,Tympanocryptis osbornei,species,
850,Tympanocryptis mccartneyi,20379,Critically Endangered,open,,,,,,,...,Reptilia,Squamata,Agamidae,Tympanocryptis,mccartneyi,,2019-11-17T06:04:04Z,Tympanocryptis mccartneyi,species,


In [36]:
# there's no status but there is a matching inat taxon (id is the taxon id)
additions = pd.DataFrame(noinatstatus[noinatstatus['id'].notna()])
additions.sort_values(['nsw_scientificName'])
additions['action'] = 'ADD'
additions = additions[['action','nsw_scientificName','status_id','id','new_status','new_iucn_equivalent','new_authority','new_url','new_geoprivacy','new_place_id','new_username','new_description']]
additions.columns = additions.columns.str.replace("new_", "", regex=True)
additions = additions.rename(columns={'nsw_scientificName':'taxon_name',
                                      'id':'taxon_id',
                                  'status_id':'id'})
additions

Unnamed: 0,action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
0,ADD,Cyclopsitta diophthalma coxeni,,495917,Critically Endangered,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Threatened - refer to https://www.en...
1,ADD,Polytelis anthopeplus monarchoides,,728998,Endangered,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Threatened - refer to https://www.en...
2,ADD,Genoplesium baueri,,921754,Endangered,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Threatened - refer to https://www.en...
3,ADD,Lindsaea fraseri,,1081779,Endangered,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Threatened - refer to https://www.en...
5,ADD,Prostanthera marifolia,,953191,Critically Endangered,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Threatened - refer to https://www.en...
...,...,...,...,...,...,...,...,...,...,...,...,...
846,ADD,Homoranthus elusus,,1244359,Critically Endangered,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,open,6825,peggydnew,Listed as Threatened - refer to https://www.en...
848,ADD,Egernia roomi,,1031614,Critically Endangered,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,open,6825,peggydnew,Listed as Threatened - refer to https://www.en...
849,ADD,Tympanocryptis osbornei,,965994,Endangered,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,open,6825,peggydnew,Listed as Threatened - refer to https://www.en...
850,ADD,Tympanocryptis mccartneyi,,965993,Critically Endangered,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,open,6825,peggydnew,Listed as Threatened - refer to https://www.en...


### Write updates and additions to file

In [37]:
pd.concat([updates,additions]).to_csv(sourcedir + "nsw.csv", index=False)

In [38]:
# what didnt match to a taxon?
unknownToInat = noinatstatus[noinatstatus['id'].isna()]
unknownToInat

Unnamed: 0,nsw_scientificName,nsw_taxonID,nsw_status,new_geoprivacy,status_id,scientificName_x,taxon_id,user_id,description,iucn,...,class,order,family,genus,specificEpithet,infraspecificEpithet,modified,scientificName_y,taxonRank,references
4,Callistemon linearifolius,10129,Vulnerable,obscured,,,,,,,...,,,,,,,,,,
13,Gentiana bredboensis,10346,Critically Endangered,obscured,,,,,,,...,,,,,,,,,,
15,Gentiana wingecarribiensis,10347,Critically Endangered,obscured,,,,,,,...,,,,,,,,,,
26,Myriophyllum implicatum,20146,Critically Endangered,obscured,,,,,,,...,,,,,,,,,,
27,Lysimachia vulgaris davurica,10498,Endangered,obscured,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
847,Meridolum maryae,20363,Endangered,open,,,,,,,...,,,,,,,,,,
851,Leionema westonii,20369,Critically Endangered,open,,,,,,,...,,,,,,,,,,
852,Thinornis cucullatus cucullatus,10803,Critically Endangered,open,,,,,,,...,,,,,,,,,,
854,Lobelia claviflora,20375,Critically Endangered,open,,,,,,,...,,,,,,,,,,


### are there any that need to be removed?
nsw sensitive list count: 197
nsw inat statuses count: xx

updates to inat status: 152
additional inat status: 738
nsw statuses we can't find a taxon match for in iNaturalist: 139
total: 1395 (explainable via the various genus/section entries that we matched to in the taxonomy)

inat statuses left over: xx-152=16 that may need checking

In [257]:
# Stats
numsensitive = len(sensitivelist.index)
numconservation = len(conservationlist.index)
numupdates  = len(updates.index)
numadditions  = len(additions.index)
numnoinatstatus = len(noinatstatus.index)
numunknownToInat = len(unknownToInat.index)
numnoncomply = len(noncomply.index)
numcomply = len(statelist.index)
numdupinfo = len(dupinformation.index)
d = {'Sensitive': [numsensitive],
    'Conservation': [numconservation],
    'Statelist merge': [numfullstatelist],
    'Species iNat Comply' : [numcomply],
    'Species iNat non-Comply': [numnoncomply],
    'Duplicate Information': [numdupinfo],
    'Updates': [numupdates],
    'Additions': [numadditions],
    'No Inat Status': [numnoinatstatus],
    'Unknown to Inat': [numunknownToInat]}
statsdf = pd.DataFrame(data=d)
statsdf


Unnamed: 0,Sensitive,Conservation,Statelist merge,Species iNat Comply,Species iNat non-Comply,Duplicate Information,Updates,Additions,No Inat Status,Unknown to Inat
0,199,1025,1031,1010,21,0,149,743,864,121


In [258]:
# inat statuses that aren't in added or updated
inatstatuses[~inatstatuses['taxon_id'].isin(updates['taxon_id'])]

Unnamed: 0,status_id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,...,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references,preferred_common_name,is_active,current_synonymous_taxon_ids
477,264941,1061113,3669610,6825,,New South Wales Biodiversity Conservation Act ...,EX,https://bie.ala.org.au/species/https://id.biod...,Presumed Extinct,open,...,Leuzea,australis,,2022-06-11T11:38:00Z,Leuzea australis,species,,,,
681,164757,1098159,58320,6825,,NSW Office of Environment & Heritage,Vulnerable,https://www.inaturalist.org/taxa/1098159/edit,,,...,Melaleuca,linearifolia,,2022-09-05T10:35:34Z,Melaleuca linearifolia,species,http://www.plantsoftheworldonline.org/,,,
1147,152307,116844,708886,6825,16650.0,NSW Office of Environment & Heritage,critically endangered,https://www.environment.nsw.gov.au/resources/a...,,obscured,...,Calyptorhynchus,banksii,,2019-11-23T01:11:55Z,Calyptorhynchus banksii,species,http://www.birdlife.org/datazone/speciesfactsh...,,,
736,166035,1194478,3669610,6825,,NSW Office of Environment & Heritage,EN,https://www.environment.nsw.gov.au/threateneds...,,,...,Vincetoxicum,woollsii,,2021-02-11T14:40:16Z,Vincetoxicum woollsii,species,http://www.plantsoftheworldonline.org/taxon/ur...,,,
737,166036,1194478,3669610,6744,,Australian Government,EN,https://www.environment.nsw.gov.au/threateneds...,,,...,Vincetoxicum,woollsii,,2021-02-11T14:40:16Z,Vincetoxicum woollsii,species,http://www.plantsoftheworldonline.org/taxon/ur...,,,
51,167249,1227491,3669610,6825,,NSW Office of Environment & Heritage,EN,https://www.environment.nsw.gov.au/threateneds...,,,...,Phyllodes,imperialis,smithersi,2021-03-23T03:42:06Z,Phyllodes imperialis smithersi,subspecies,https://biodiversity.org.au/afd/taxa/Phyllodes...,,,
397,264070,1255425,3669610,6825,,NSW Office of Environment & Heritage,CR,https://www.environment.nsw.gov.au/threateneds...,,open,...,Melaleuca,megalongensis,,2022-09-07T09:45:08Z,Melaleuca megalongensis,species,http://www.catalogueoflife.org/annual-checklis...,,,
809,262163,1289564,708886,6825,16650.0,NSW Office of Environment & Heritage,vulnerable,https://www.environment.nsw.gov.au/resources/a...,,obscured,...,Parvipsitta,porphyrocephala,,2022-03-15T04:25:29Z,Parvipsitta porphyrocephala,species,https://www.birds.cornell.edu/clementschecklis...,,,
3367,152252,19111,708886,6825,16650.0,NSW Office of Environment & Heritage,vulnerable,https://www.environment.nsw.gov.au/resources/a...,,obscured,...,Pezoporus,wallicus,,2022-06-14T10:57:21Z,Pezoporus wallicus,species,http://www.birdlife.org/datazone/speciesfactsh...,,,
2653,164637,19275,84719,6825,,IUCN Red List,EN,https://www.iucnredlist.org/species/22727593/1...,range of Cyclopsitta diophthalma coxeni (split...,obscured,...,Cyclopsitta,diophthalma,,2021-09-22T17:20:20Z,Cyclopsitta diophthalma,species,http://www.birdlife.org/datazone/speciesfactsh...,,,
