# iNaturalist status updates by state

Using the file produced by `collate-status-taxa.ipynb` (`inat-aust-status-taxa.csv`), generate the lists needed to update iNaturalist statuses.

**Next steps:**
State by state, establish the changes that need to be made:

1. New - any species that appear in the state lists but do not have a status in iNaturalist (new template)
2. Updates - any changes to information that we added previously (user_id = 708886) (update template, action='UPDATE')
3. Removals - any statuses that we added previously (user_id = 708886) which are now incorrect (update template, action='REMOVE')
4. Flags - are there any statuses by other users that need to be flagged?

## Prep - common to all states
1. Read in the iNaturalist statuses and filter them down to this state
2. Read in the iNaturalist taxa list
3. Read in the state sensitive list
4. Attempt to match the state statuses to an IUCN equivalent


### 1. iNaturalist statuses

In [1]:
import pandas as pd

projectdir = "/Users/oco115/PycharmProjects/authoritative-lists/" # basedir for this gh project
# projectdir = "/Users/new330/IdeaProjects/authoritative-lists/" # basedir for this gh project
sourcedir = projectdir + "source-data/inaturalist-statuses/"
listdir = projectdir + "current-lists/"


# read in the statuses
taxastatus = pd.read_csv(sourcedir + "inat-aust-status-taxa.csv", encoding='UTF-8',na_filter=False,dtype=str) ## Read inaturalist conservation statuses file
taxastatus.head(3)

Unnamed: 0,id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,...,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references,preferred_common_name,is_active,current_synonymous_taxon_ids
0,166449,38493,1138587,7830,,Flora and Fauna Guarantee Act 1988,CR,,,obscured,...,Eulamprus,kosciuskoi,,2021-03-01T10:35:01Z,Eulamprus kosciuskoi,species,http://reptile-database.reptarium.cz/search.ph...,,,
1,234788,918383,702203,9994,,Atlas of Living Australia,NT,https://bie.ala.org.au/species/https://id.biod...,,,...,Chiloschista,phyllorhiza,,2022-01-08T03:30:36Z,Chiloschista phyllorhiza,species,http://www.catalogueoflife.org/annual-checklis...,,,
2,234789,918383,702203,7308,,Atlas of Living Australia,LC,https://bie.ala.org.au/species/https://id.biod...,,,...,Chiloschista,phyllorhiza,,2022-01-08T03:30:36Z,Chiloschista phyllorhiza,species,http://www.catalogueoflife.org/annual-checklis...,,,


In [8]:
def filter_state_statuses(stateregex: str, urlregex: str):
    """Return the statuses whose place, url or authority matches the state patterns."""
    # a row belongs to the state if any one of the four columns matches its regex
    statemask = (taxastatus['place_display_name'].str.contains(stateregex)
                 | taxastatus['place_name'].str.contains(stateregex)
                 | taxastatus['url'].str.contains(urlregex)
                 | taxastatus['authority'].str.contains(stateregex))
    statedf = taxastatus[statemask].drop_duplicates()
    return statedf.sort_values(['taxon_id', 'user_id'])
inatstatuses = filter_state_statuses("NSW|New South Wales", ".nsw.")
inatstatuses.rename(columns={'id':'status_id','id_y':'taxon_id_y'},inplace=True)
inatstatuses

Unnamed: 0,status_id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,...,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references,preferred_common_name,is_active,current_synonymous_taxon_ids
3,166416,1033183,3669610,6825,,NSW Office of Environment & Heritage,EN,https://www.environment.nsw.gov.au/threateneds...,,obscured,...,Eidothea,hardeniana,,2021-02-22T07:21:17Z,Eidothea hardeniana,species,,,,
2656,165059,1054498,58320,6825,,NSW Office of Environment & Heritage,Vulnerable,https://www.environment.nsw.gov.au/threatenedS...,,obscured,...,Prostanthera,cryptandroides,cryptandroides,2020-12-17T03:53:45Z,Prostanthera cryptandroides cryptandroides,subspecies,,,,
477,264941,1061113,3669610,6825,,New South Wales Biodiversity Conservation Act ...,EX,https://bie.ala.org.au/species/https://id.biod...,Presumed Extinct,open,...,Leuzea,australis,,2022-06-11T11:38:00Z,Leuzea australis,species,,,,
371,180988,1070573,3669610,6825,,NSW Office of Environment & Heritage,EN,https://www.environment.nsw.gov.au/threateneds...,,,...,Rotala,tripartita,,2021-09-21T05:48:55Z,Rotala tripartita,species,https://eol.org/pages/49427430,,,
2514,160581,1076814,990532,6825,,NSW Office of Environment & Heritage,Vulnerable,https://www.environment.nsw.gov.au/threateneds...,,obscured,...,Pterostylis,nigricans,,2022-07-12T14:19:22Z,Pterostylis nigricans,species,http://www.catalogueoflife.org/annual-checklis...,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1111,152247,906670,708886,6825,16650,NSW Office of Environment & Heritage,Endangered,https://www.environment.nsw.gov.au/resources/a...,,obscured,...,Caladenia,concolor,,2022-06-10T07:36:31Z,Caladenia concolor,species,http://www.catalogueoflife.org/annual-checklis...,,,
2637,164342,913024,58320,6825,,NSW Office of Environment & Heritage,Vulnerable,https://www.environment.nsw.gov.au/threatenedS...,,,...,Symplocos,baeuerlenii,,2022-03-14T15:27:03Z,Symplocos baeuerlenii,species,,,,
7,167825,953250,702203,6825,,New South Wales Office of Environment and Heri...,VU,https://www.environment.nsw.gov.au/threatenedS...,Vulnerable,,...,Pultenaea,glabra,,2021-05-02T14:08:01Z,Pultenaea glabra,species,http://www.catalogueoflife.org/annual-checklis...,,,
836,155144,966856,708886,6825,16650,NSW Office of Environment & Heritage,endangered,https://www.environment.nsw.gov.au/resources/a...,,obscured,...,Pteris,platyzomopsis,,2019-11-20T05:25:13Z,Pteris platyzomopsis,species,https://eol.org/pages/47172990,,,
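
One thing to watch in the `filter_state_statuses` call above: `str.contains` treats its pattern as a regular expression by default, so the dots in `".nsw."` match any character rather than a literal dot. A minimal sketch of the difference, using two URLs chosen purely for illustration (the second one is made up and is not from the data):

```python
import pandas as pd

# illustrative URLs only; the second is invented to show a false positive
urls = pd.Series([
    "https://www.environment.nsw.gov.au/threatenedspeciesapp",
    "https://example.org/answers/page",   # "answe" = any char + "nsw" + any char
])

loose = urls.str.contains(".nsw.")       # dots act as wildcards
strict = urls.str.contains(r"\.nsw\.")   # dots must be literal dots

print(loose.tolist())    # [True, True]
print(strict.tolist())   # [True, False]
```

Whether the stricter pattern is wanted here depends on the URLs actually present in the status file, so this is a check to consider rather than a required change.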


### 2. iNaturalist taxonomy

In [10]:
# Output files contain these fields
# Taxon_Name,Status,Authority,IUCN_equivalent,Description,iNaturalist_Place_ID,url,Taxon_Geoprivacy,Username,taxon_id
# so we need to match species from the state lists to the inat taxa to get the taxon_id

import zipfile
url = "https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip"
filename = url.split("/")[-1]

z=zipfile.ZipFile(sourcedir + filename)

with z.open('taxa.csv') as from_archive:
    inattaxa = pd.read_csv(from_archive,dtype=str)
z.close()
inattaxa.head(3)


Unnamed: 0,id,taxonID,identifier,parentNameUsageID,kingdom,phylum,class,order,family,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references
0,1,https://www.inaturalist.org/taxa/1,https://www.inaturalist.org/taxa/1,https://www.inaturalist.org/taxa/48460,Animalia,,,,,,,,2021-11-02T06:05:44Z,Animalia,kingdom,http://www.catalogueoflife.org/annual-checklis...
1,2,https://www.inaturalist.org/taxa/2,https://www.inaturalist.org/taxa/2,https://www.inaturalist.org/taxa/1,Animalia,Chordata,,,,,,,2021-11-23T00:40:18Z,Chordata,phylum,http://www.catalogueoflife.org/annual-checklis...
2,3,https://www.inaturalist.org/taxa/3,https://www.inaturalist.org/taxa/3,https://www.inaturalist.org/taxa/355675,Animalia,Chordata,Aves,,,,,,2022-12-27T07:33:16Z,Aves,class,http://www.catalogueoflife.org/annual-checklis...
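
The cell above uses `url` only to derive the file name, so it appears to assume the archive has already been saved into `sourcedir`. If that is not the case, a small helper along these lines could fetch it first. `ensure_archive` is a hypothetical name, not something defined elsewhere in this project:

```python
import os
import urllib.request

def ensure_archive(url: str, destdir: str) -> str:
    """Download url into destdir unless a file with that name is already there.

    Hypothetical helper for illustration; returns the local path either way.
    """
    path = os.path.join(destdir, url.split("/")[-1])
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path
```

It could be called as `ensure_archive(url, sourcedir)` before opening the zip file.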


### 3. State lists

In [11]:
sensitivelist = pd.read_csv(listdir + "sensitive-lists/NSW-sensitive.csv")  # NSW sensitive list
sensitivelist['scientificName'] = sensitivelist['scientificName'].str.replace('subsp. ', '', regex=False)
sensitivelist = sensitivelist.rename(columns={'taxonID':'wildnetTaxonID'})
sensitivelist

Unnamed: 0,taxonRank,kingdom,class,order,family,genus,scientificName,specificEpithet,vernacularName,establishmentMeans,sourceStatus,protectedInNSW,sensitivityClass,tsprofileID,countryConservation,dcterms_modified,speciesID,wildnetTaxonID,generalisation,status
0,Species,Animalia,Aves,Psittaciformes,Cacatuidae,Callocephalon,Callocephalon fimbriatum,fimbriatum,Gang-gang Cockatoo,"Alive in NSW, Native",Vulnerable,True,Category 3,10975.0,Endangered,2022-05-12T15:50:01.01+10:00,6,6,1km,Vulnerable
1,Species,Animalia,Aves,Strigiformes,Strigidae,Ninox,Ninox strenua,strenua,Powerful Owl,"Alive in NSW, Native",Vulnerable,True,Category 3,10562.0,Not Listed,2011-11-11T11:23:06+11:00,162,162,1km,Vulnerable
2,Species,Animalia,Aves,Strigiformes,Tytonidae,Tyto,Tyto tenebricosa,tenebricosa,Sooty Owl,"Alive in NSW, Native",Vulnerable,True,Category 3,10821.0,Not Listed,2012-07-23T18:04:52.683+10:00,333,333,1km,Vulnerable
3,Species,Animalia,Aves,Strigiformes,Strigidae,Ninox,Ninox connivens,connivens,Barking Owl,"Alive in NSW, Native",Vulnerable,True,Category 3,10561.0,Not Listed,2011-11-11T11:23:06+11:00,363,363,1km,Vulnerable
4,Species,Animalia,Reptilia,Squamata,Elapidae,Hoplocephalus,Hoplocephalus bungaroides,bungaroides,Broad-headed Snake,"Alive in NSW, Native",Endangered,True,Category 2,10413.0,Vulnerable,2011-11-11T11:23:04+11:00,390,390,10km,Endangered
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193,Subspecies,Plantae,Flora,Flora,Poaceae,Anthosachne,Anthosachne kingiana kingiana,kingiana,Philip Island Wheat Grass,"Alive in NSW, Native",Critically Endangered,False,Category 3,20148.0,Critically Endangered,2020-01-23T17:14:13.01+11:00,23014,23014,1km,Critically Endangered
194,Species,Plantae,Flora,Flora,Orchidaceae,Caladenia,Caladenia tensa,tensa,Rigid Spider-orchid,"Alive in NSW, Native",Not Listed,True,Category 2,,Not Listed,2020-06-23T19:56:43.85+10:00,23737,23737,10km,Not Listed
195,Species,Plantae,Flora,Flora,Orchidaceae,Caladenia,Caladenia atroclavia,atroclavia,Black-clubbed Spider-orchid,"Alive in NSW, Native",Not Listed,True,Category 2,20365.0,Endangered,2020-06-23T19:51:08.733+10:00,23738,23738,10km,Not Listed
196,Species,Plantae,Flora,Flora,Orchidaceae,Prasophyllum,Prasophyllum sandrae,sandrae,Majors Creek Leek Orchid,"Alive in NSW, Native",Critically Endangered,False,Category 2,10668.0,Not Listed,2021-12-31T11:53:58.467+11:00,24268,24268,10km,Critically Endangered


### 4. Equivalent IUCN statuses

In [12]:
iucn_statuses = {'Not Evaluated', 'Data Deficient', 'Least Concern', 'Near Threatened', 'Vulnerable', 'Endangered', 'Critically Endangered', 'Extinct in the Wild', 'Extinct'}
sensitivelist.groupby(['status'])['status'].count()

status
Critically Endangered    52
Endangered               93
Extinct                   3
Not Listed                5
Vulnerable               45
Name: status, dtype: int64

In [13]:
iucnStatusMappings = {
    'critically endangered': 'Critically Endangered',
    'vulnerable':'Vulnerable',
    # 'not evaluated':'Not Evaluated',
    'not evaluated':'Not Listed',
    'data deficient':'Data Deficient',
    'least concern':'Least Concern',
    'special least concern':'Least Concern',
    'near threatened':'Near Threatened',
    'endangered':'Endangered',
    'extinct in the wild':'Extinct in the Wild',
    'extinct':'Extinct',
    'confidential':'Vulnerable'
}
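
The keys in the mapping above are lower case, so a status has to be lower-cased and stripped before the lookup, and any value without a key comes back as `NaN`. A quick way to see which statuses fall through is sketched below. The dictionary here is a cut-down copy of the one above (an assumption made to keep the example short), and the status values are those shown in the `groupby` output:

```python
import pandas as pd

# cut-down copy of the mapping above, for illustration only
mapping = {
    'critically endangered': 'Critically Endangered',
    'endangered': 'Endangered',
    'vulnerable': 'Vulnerable',
    'extinct': 'Extinct',
}

# status values as they appear in the NSW sensitive list
statuses = pd.Series(['Critically Endangered', 'Endangered', 'Extinct',
                      'Not Listed', 'Vulnerable'])

mapped = statuses.str.lower().str.strip().map(mapping)
unmapped = statuses[mapped.isna()].unique().tolist()
print(unmapped)   # ['Not Listed']
```

The full dictionary has no `'not listed'` key either, so for those rows any default applied afterwards (such as a `fillna`) decides the IUCN equivalent.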

### 5. Determine best place ID to use

In [14]:
inatstatuses.groupby(['place_id','place_name','place_display_name'])['place_id'].count()
# looks like 6825

place_id  place_name       place_display_name 
6744      Australia        Australia                5
6825      New South Wales  New South Wales, AU    161
Name: place_id, dtype: int64

## Merge iNaturalist statuses with State sensitive list on scientificName

1. Match - updates; even if the statuses are the same we'll update the links and values anyway
2. No match - statuses to be added (additions)
   2.1 No match and no taxonomy match - search for synonyms
   2.2 No match, but the taxon exists in iNaturalist - add a new status
3. Merge in the other direction to see whether there are deletes
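
The match / no-match split described above can also be sketched in a single pass with the `indicator` option of `DataFrame.merge`. The frames below are toy stand-ins: the names and ids are borrowed from the tables in this notebook, but the pairing is illustrative and is not a real result.

```python
import pandas as pd

# toy stand-ins for the state list and the iNaturalist statuses
state = pd.DataFrame({'scientificName': ['Acacia atrox', 'Zieria formosa'],
                      'status': ['Endangered', 'Critically Endangered']})
inat = pd.DataFrame({'scientificName': ['Acacia atrox', 'Symplocos baeuerlenii'],
                     'status_id': ['152295', '164342']})

# an outer join keeps rows from both sides; _merge records where each row came from
both = state.merge(inat, how='outer', on='scientificName', indicator=True)

matched = both[both['_merge'] == 'both']          # candidates for UPDATE
to_add = both[both['_merge'] == 'left_only']      # on the state list, no iNaturalist status
to_check = both[both['_merge'] == 'right_only']   # iNaturalist status, not on the state list
```

The cells that follow use a left merge and test `status_id` for nulls instead, which covers the first two cases; the `right_only` rows correspond to the "other direction" check in step 3.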


In [15]:
# join to see which lists already have a status in inaturalist based on scientificName
mergedstatuses = sensitivelist[['wildnetTaxonID','scientificName','status']].merge(
    inatstatuses[['status_id','scientificName','taxon_id','user_id','description','iucn',
                  'authority','status','geoprivacy','place_id','place_display_name']],
    how="left", left_on='scientificName', right_on='scientificName',
    suffixes=(None,'_inat')).sort_values(['scientificName'])
mergedstatuses


Unnamed: 0,wildnetTaxonID,scientificName,status,status_id,taxon_id,user_id,description,iucn,authority,status_inat,geoprivacy,place_id,place_display_name
142,13734,Acacia atrox,Endangered,152295,898643,708886,,40,NSW Office of Environment & Heritage,endangered,obscured,6825,"New South Wales, AU"
97,11394,Acacia dangarensis,Critically Endangered,152286,775137,708886,,40,NSW Office of Environment & Heritage,endangered,obscured,6825,"New South Wales, AU"
42,4206,Allocasuarina portuensis,Endangered,,,,,,,,,,
192,21535,Amytornis modestus inexpectatus,Extinct,,,,,,,,,,
193,21541,Amytornis modestus obscurior,Critically Endangered,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
145,13940,Wollemia nobilis,Critically Endangered,152312,49381,708886,,40,NSW Office of Environment & Heritage,Endangered,obscured,6825,"New South Wales, AU"
74,8685,Zieria adenophora,Critically Endangered,,,,,,,,,,
113,12102,Zieria buxijugum,Critically Endangered,,,,,,,,,,
114,12106,Zieria formosa,Critically Endangered,,,,,,,,,,


In [21]:
# prepare the export fields, common to New template and Update template
# new statuses
# Taxon_Name,Status,Authority,IUCN_equivalent,Description,iNaturalist_Place_ID,url,Taxon_Geoprivacy,Username,taxon_id
# updates
# action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username
mergedstatuses['new_authority'] = "New South Wales Office of Environment and Heritage"
mergedstatuses['new_description'] = "Listed as Confidential - refer to https://www.environment.nsw.gov.au/threatenedspeciesapp"
mergedstatuses['new_url'] = "https://www.environment.nsw.gov.au/threatenedspeciesapp/profile.aspx?id=" + mergedstatuses['wildnetTaxonID'].astype(str)
mergedstatuses['new_geoprivacy'] = "obscured"
mergedstatuses['new_place_id'] = '6825'  # NEW SOUTH WALES
mergedstatuses['new_username'] = 'peggydnew'
mergedstatuses['new_iucn_equivalent'] = mergedstatuses['status'].str.lower().str.strip().map(iucnStatusMappings).fillna('Vulnerable') # map to dictionary
mergedstatuses['new_status'] = mergedstatuses['status'].fillna('Confidential')
mergedstatuses

Unnamed: 0,wildnetTaxonID,scientificName,status,status_id,taxon_id,user_id,description,iucn,authority,status_inat,...,place_id,place_display_name,new_authority,new_description,new_url,new_geoprivacy,new_place_id,new_username,new_iucn_equivalent,new_status
142,13734,Acacia atrox,Endangered,152295,898643,708886,,40,NSW Office of Environment & Heritage,endangered,...,6825,"New South Wales, AU",New South Wales Office of Environment and Heri...,Listed as Confidential - refer to https://www....,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Endangered,Endangered
97,11394,Acacia dangarensis,Critically Endangered,152286,775137,708886,,40,NSW Office of Environment & Heritage,endangered,...,6825,"New South Wales, AU",New South Wales Office of Environment and Heri...,Listed as Confidential - refer to https://www....,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Critically Endangered,Critically Endangered
42,4206,Allocasuarina portuensis,Endangered,,,,,,,,...,,,New South Wales Office of Environment and Heri...,Listed as Confidential - refer to https://www....,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Endangered,Endangered
192,21535,Amytornis modestus inexpectatus,Extinct,,,,,,,,...,,,New South Wales Office of Environment and Heri...,Listed as Confidential - refer to https://www....,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Extinct,Extinct
193,21541,Amytornis modestus obscurior,Critically Endangered,,,,,,,,...,,,New South Wales Office of Environment and Heri...,Listed as Confidential - refer to https://www....,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Critically Endangered,Critically Endangered
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
145,13940,Wollemia nobilis,Critically Endangered,152312,49381,708886,,40,NSW Office of Environment & Heritage,Endangered,...,6825,"New South Wales, AU",New South Wales Office of Environment and Heri...,Listed as Confidential - refer to https://www....,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Critically Endangered,Critically Endangered
74,8685,Zieria adenophora,Critically Endangered,,,,,,,,...,,,New South Wales Office of Environment and Heri...,Listed as Confidential - refer to https://www....,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Critically Endangered,Critically Endangered
113,12102,Zieria buxijugum,Critically Endangered,,,,,,,,...,,,New South Wales Office of Environment and Heri...,Listed as Confidential - refer to https://www....,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Critically Endangered,Critically Endangered
114,12106,Zieria formosa,Critically Endangered,,,,,,,,...,,,New South Wales Office of Environment and Heri...,Listed as Confidential - refer to https://www....,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Critically Endangered,Critically Endangered


## Updates

In [22]:
# updates
# action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
updates = pd.DataFrame(mergedstatuses[mergedstatuses['status_id'].notnull()])
updates = updates.sort_values('scientificName')  # sort_values returns a new frame, so keep the result
updates['action'] = 'UPDATE'
#updates.loc[:,'action'] = 'UPDATE'
updates = updates[['action','scientificName','status_id','taxon_id','new_status','new_iucn_equivalent','new_authority','new_url','new_geoprivacy','new_place_id','new_username','new_description']]
updates.columns = updates.columns.str.replace("new_", "", regex=True)
updates = updates.rename(columns={'scientificName':'taxon_name',
                                  'status_id':'id'})
updates

Unnamed: 0,action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
142,UPDATE,Acacia atrox,152295,898643,Endangered,Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
97,UPDATE,Acacia dangarensis,152286,775137,Critically Endangered,Critically Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
65,UPDATE,Angiopteris evecta,152280,122319,Endangered,Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
33,UPDATE,Arthropteris palisotii,152292,736268,Endangered,Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
28,UPDATE,Banksia conferta,152272,545952,Critically Endangered,Critically Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
...,...,...,...,...,...,...,...,...,...,...,...,...
151,UPDATE,Tyto longimembris,152297,73545,Vulnerable,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
12,UPDATE,Tyto novaehollandiae,152263,20425,Vulnerable,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
2,UPDATE,Tyto tenebricosa,152244,20422,Vulnerable,Vulnerable,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
112,UPDATE,Viola cleistogamoides,152321,566603,Endangered,Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....


In [110]:
# investigation - which updates are mine, and which are not from me
#updates[updates['user_id']=='708886'][['scientificName','new_status','status_inat','authority','new_authority','description','new_description','geoprivacy','new_geoprivacy']]
#updates[updates['user_id']!='708886'][['user_id','scientificName','new_status','status_inat','authority','new_authority','description','new_description','geoprivacy','new_geoprivacy']]
# those with different statuses
#updates[updates['new_status'].str.lower().str.strip() != updates['status_inat'].str.lower().str.strip()][['scientificName','new_status','status_inat','authority','new_authority','description','new_description','geoprivacy','new_geoprivacy']]
# users who've updated qld statuses who aren't me
#'https://www.inaturalist.org/users/220795','Steven Kurniawidjaja','neontetraploid','US'
#'https://www.inaturalist.org/users/3669610','Craig Robbins','craig-r','AU'
#'https://www.inaturalist.org/users/527710','James Kameron Mitchell','jameskm','US'
#'https://www.inaturalist.org/users/58320','lwnrngr','lwnrngr','NZ'
#'https://www.inaturalist.org/users/702203','Kitty Maurey','kitty12','CA'
#'https://www.inaturalist.org/users/717122','Miguel de Salas','mftasp','TAS'


## No status in iNaturalist via straight scientificName match
The NSW records that didn't match up to a status in iNaturalist

In [23]:
# to add: those that have no iNaturalist status
noinatstatus = mergedstatuses[mergedstatuses['status_id'].isnull()]
# try to match the taxon name to something in inaturalist
noinatstatus = noinatstatus.merge(inattaxa, how="left", left_on="scientificName",right_on="scientificName")
noinatstatus

Unnamed: 0,wildnetTaxonID,scientificName,status,status_id,taxon_id,user_id,description,iucn,authority,status_inat,...,phylum,class,order,family,genus,specificEpithet,infraspecificEpithet,modified,taxonRank,references
0,4206,Allocasuarina portuensis,Endangered,,,,,,,,...,Tracheophyta,Magnoliopsida,Fagales,Casuarinaceae,Allocasuarina,portuensis,,2020-12-08T01:22:50Z,species,http://www.catalogueoflife.org/annual-checklis...
1,21535,Amytornis modestus inexpectatus,Extinct,,,,,,,,...,Chordata,Aves,Passeriformes,Maluridae,Amytornis,modestus,inexpectatus,2018-12-18T22:21:40Z,subspecies,http://www.birds.cornell.edu/clementschecklist...
2,21541,Amytornis modestus obscurior,Critically Endangered,,,,,,,,...,Chordata,Aves,Passeriformes,Maluridae,Amytornis,modestus,obscurior,2021-11-24T00:49:54Z,subspecies,http://www.birds.cornell.edu/clementschecklist...
3,12355,Angophora exul,Endangered,,,,,,,,...,,,,,,,,,,
4,23014,Anthosachne kingiana kingiana,Critically Endangered,,,,,,,,...,Tracheophyta,Liliopsida,Poales,Poaceae,Anthosachne,kingiana,kingiana,2022-12-19T05:37:47Z,subspecies,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,13469,Typhonium sp. aff. brownii,Endangered,,,,,,,,...,,,,,,,,,,
99,8685,Zieria adenophora,Critically Endangered,,,,,,,,...,Tracheophyta,Magnoliopsida,Sapindales,Rutaceae,Zieria,adenophora,,2020-09-27T03:19:35Z,species,
100,12102,Zieria buxijugum,Critically Endangered,,,,,,,,...,Tracheophyta,Magnoliopsida,Sapindales,Rutaceae,Zieria,buxijugum,,2020-09-27T03:19:38Z,species,
101,12106,Zieria formosa,Critically Endangered,,,,,,,,...,Tracheophyta,Magnoliopsida,Sapindales,Rutaceae,Zieria,formosa,,2021-07-28T04:47:55Z,species,https://eol.org/pages/49431660


In [24]:
additions = pd.DataFrame(noinatstatus[noinatstatus['id'].notna()])
additions

Unnamed: 0,wildnetTaxonID,scientificName,status,status_id,taxon_id,user_id,description,iucn,authority,status_inat,...,phylum,class,order,family,genus,specificEpithet,infraspecificEpithet,modified,taxonRank,references
0,4206,Allocasuarina portuensis,Endangered,,,,,,,,...,Tracheophyta,Magnoliopsida,Fagales,Casuarinaceae,Allocasuarina,portuensis,,2020-12-08T01:22:50Z,species,http://www.catalogueoflife.org/annual-checklis...
1,21535,Amytornis modestus inexpectatus,Extinct,,,,,,,,...,Chordata,Aves,Passeriformes,Maluridae,Amytornis,modestus,inexpectatus,2018-12-18T22:21:40Z,subspecies,http://www.birds.cornell.edu/clementschecklist...
2,21541,Amytornis modestus obscurior,Critically Endangered,,,,,,,,...,Chordata,Aves,Passeriformes,Maluridae,Amytornis,modestus,obscurior,2021-11-24T00:49:54Z,subspecies,http://www.birds.cornell.edu/clementschecklist...
4,23014,Anthosachne kingiana kingiana,Critically Endangered,,,,,,,,...,Tracheophyta,Liliopsida,Poales,Poaceae,Anthosachne,kingiana,kingiana,2022-12-19T05:37:47Z,subspecies,
5,24270,Backhousia subargentea,Endangered,,,,,,,,...,Tracheophyta,Magnoliopsida,Myrtales,Myrtaceae,Backhousia,subargentea,,2020-03-16T22:13:44Z,species,https://eol.org/pages/49899176
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97,9368,Trachymene scapigera,Endangered,,,,,,,,...,Tracheophyta,Magnoliopsida,Apiales,Araliaceae,Trachymene,scapigera,,2020-02-19T07:53:19Z,species,
99,8685,Zieria adenophora,Critically Endangered,,,,,,,,...,Tracheophyta,Magnoliopsida,Sapindales,Rutaceae,Zieria,adenophora,,2020-09-27T03:19:35Z,species,
100,12102,Zieria buxijugum,Critically Endangered,,,,,,,,...,Tracheophyta,Magnoliopsida,Sapindales,Rutaceae,Zieria,buxijugum,,2020-09-27T03:19:38Z,species,
101,12106,Zieria formosa,Critically Endangered,,,,,,,,...,Tracheophyta,Magnoliopsida,Sapindales,Rutaceae,Zieria,formosa,,2021-07-28T04:47:55Z,species,https://eol.org/pages/49431660


In [25]:
# there's no status but there is a matching inat taxon (id is the taxon id)
additions = pd.DataFrame(noinatstatus[noinatstatus['id'].notna()])
additions = additions.sort_values(['scientificName'])  # sort_values returns a new frame, so keep the result
additions['action'] = 'ADD'
additions = additions[['action','scientificName','status_id','id','new_status','new_iucn_equivalent','new_authority','new_url','new_geoprivacy','new_place_id','new_username','new_description']]
additions.columns = additions.columns.str.replace("new_", "", regex=True)
additions = additions.rename(columns={'scientificName':'taxon_name',
                                      'id':'taxon_id',
                                  'status_id':'id'})
additions

Unnamed: 0,action,taxon_name,id,taxon_id,status,iucn_equivalent,authority,url,geoprivacy,place_id,username,description
0,ADD,Allocasuarina portuensis,,1152712,Endangered,Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
1,ADD,Amytornis modestus inexpectatus,,713121,Extinct,Extinct,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
2,ADD,Amytornis modestus obscurior,,713120,Critically Endangered,Critically Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
4,ADD,Anthosachne kingiana kingiana,,485776,Critically Endangered,Critically Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
5,ADD,Backhousia subargentea,,1046296,Endangered,Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
...,...,...,...,...,...,...,...,...,...,...,...,...
97,ADD,Trachymene scapigera,,1003604,Endangered,Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
99,ADD,Zieria adenophora,,1125306,Critically Endangered,Critically Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
100,ADD,Zieria buxijugum,,1125318,Critically Endangered,Critically Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....
101,ADD,Zieria formosa,,1247626,Critically Endangered,Critically Endangered,New South Wales Office of Environment and Heri...,https://www.environment.nsw.gov.au/threateneds...,obscured,6825,peggydnew,Listed as Confidential - refer to https://www....


In [26]:
# write these to the file
pd.concat([updates,additions]).to_csv(sourcedir + "nsw.csv", index=False)

In [27]:
# what didn't match to a taxon?
unknownToInat = noinatstatus[noinatstatus['id'].isna()]
unknownToInat

Unnamed: 0,wildnetTaxonID,scientificName,status,status_id,taxon_id,user_id,description,iucn,authority,status_inat,...,phylum,class,order,family,genus,specificEpithet,infraspecificEpithet,modified,taxonRank,references
3,12355,Angophora exul,Endangered,,,,,,,,...,,,,,,,,,,
6,12474,Baeckea kandos,Endangered,,,,,,,,...,,,,,,,,,,
15,3619,Callistemon linearifolius,Vulnerable,,,,,,,,...,,,,,,,,,,
21,20623,Corunastylis sp. Charmhaven (NSW896673),Critically Endangered,,,,,,,,...,,,,,,,,,,
25,11238,Dendrobium melaleucaphilum,Endangered,,,,,,,,...,,,,,,,,,,
28,13945,"Diuris sp. (Oaklands, D.L. Jones 5380)",Endangered,,,,,,,,...,,,,,,,,,,
42,4740,Gentiana bredboensis,Critically Endangered,,,,,,,,...,,,,,,,,,,
43,5147,Gentiana wingecarribiensis,Critically Endangered,,,,,,,,...,,,,,,,,,,
49,21483,Hibbertia spanantha,Critically Endangered,,,,,,,,...,,,,,,,,,,
52,10552,Leucopogon confertus,Endangered,,,,,,,,...,,,,,,,,,,
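
For names like these that have no exact match, one possible first pass before a proper synonym search is approximate string matching, which can surface near misses such as a leftover rank marker. The sketch below uses `difflib` from the standard library. The candidate list is a handful of names taken from the iNaturalist output earlier in this notebook, and the 0.8 cutoff is an arbitrary assumption. This is not a synonym lookup, only a way to shortlist names for manual review.

```python
import difflib

# a few iNaturalist names from the tables above, standing in for inattaxa['scientificName']
candidates = [
    'Prostanthera cryptandroides cryptandroides',
    'Pultenaea glabra',
    'Pterostylis nigricans',
]

# a state-list style name that still carries its rank marker
query = 'Prostanthera cryptandroides subsp. cryptandroides'

# cutoff=0.8 is an assumed threshold; lower values return looser matches
close = difflib.get_close_matches(query, candidates, n=3, cutoff=0.8)
print(close)   # ['Prostanthera cryptandroides cryptandroides']
```

Any hit still needs checking by eye, since similarly spelled names can belong to different taxa.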


### Are there any that need to be removed?
NSW sensitive list count: 198
NSW inat statuses count: 166

updates to inat status: 96
additional inat status: 81
NSW statuses we can't find a taxon match for in iNaturalist: 22
total: 96+81+22=199 (explainable via the various genus/section entries that we matched to in the taxonomy)

inat statuses left over: 166-96=70 that may need checking

In [118]:
# inat statuses that aren't in added or updated
inatstatuses[~inatstatuses['taxon_id'].isin(updates['taxon_id'])]


Unnamed: 0,status_id,taxon_id,user_id,place_id,source_id,authority,status,url,description,geoprivacy,...,genus,specificEpithet,infraspecificEpithet,modified,scientificName,taxonRank,references,preferred_common_name,is_active,current_synonymous_taxon_ids
408,223608,1015555,3669610,7308,,Queensland Nature Conservation Act 1992,NT,https://apps.des.qld.gov.au/species-search/det...,,,...,Symplocos,harroldii,,2022-11-30T22:01:59Z,Symplocos harroldii,species,https://eol.org/pages/47146631,,,
102,159922,1019990,702203,7308,,Queensland,VU,https://www.legislation.qld.gov.au/view/html/i...,,,...,Acacia,baueri,,2022-04-06T22:03:46Z,Acacia baueri,species,http://www.catalogueoflife.org/annual-checklis...,,,
667,165360,1023152,58320,7308,,Queensland Government,Near threatened,https://apps.des.qld.gov.au/species-search/det...,,,...,Bertya,pedicellata,,2021-01-05T06:59:04Z,Bertya pedicellata,species,http://www.catalogueoflife.org/annual-checklis...,,,
3327,161810,1032816,425992,7308,,Queensland Government,VU,https://apps.des.qld.gov.au/species-search/det...,,open,...,Macadamia,ternifolia,,2020-08-30T17:12:56Z,Macadamia ternifolia,species,,,,
263,180953,1033327,702203,7308,,Nature Conservation Act 1992,EN,https://bie.ala.org.au/species/https://id.biod...,,,...,Grevillea,linsmithii,,2021-09-20T02:01:57Z,Grevillea linsmithii,species,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
835,262047,977629,3669610,7308,,Queensland Nature Conservation Act 1992,NT,https://www.data.qld.gov.au/dataset/conservati...,,open,...,Melaleuca,formosa,,2022-09-05T10:47:52Z,Melaleuca formosa,species,,,,
2615,161807,993333,702203,7308,,Nature Conservation Act 1992,VU,https://apps.des.qld.gov.au/species-search/det...,,,...,Cupaniopsis,tomentella,,2020-08-30T17:12:00Z,Cupaniopsis tomentella,species,https://eol.org/pages/5629346,,,
2616,161808,993333,702203,6744,,Environment Protection and Biodiversity Conser...,VU,https://apps.des.qld.gov.au/species-search/det...,,,...,Cupaniopsis,tomentella,,2020-08-30T17:12:00Z,Cupaniopsis tomentella,species,https://eol.org/pages/5629346,,,
23,167723,993605,3669610,7308,,QLD DEHP,NT,https://www.data.qld.gov.au/dataset/conservati...,,obscured,...,Acianthus,amplexicaulis,,2021-10-05T08:48:02Z,Acianthus amplexicaulis,species,http://www.catalogueoflife.org/annual-checklis...,,,
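
The leftover check above could be narrowed to actual removal candidates: statuses created by our own account (user_id = 708886) whose taxon is no longer on the state list. Leftovers from other users would be flagged rather than removed. A sketch on toy data follows; the rows and names are placeholders, and the `REMOVE` action follows the update template described at the top of this notebook.

```python
import pandas as pd

OUR_USER_ID = '708886'

# toy stand-ins for inatstatuses and the state list (ids are strings, as in the notebook)
inat = pd.DataFrame({
    'status_id': ['1', '2', '3'],
    'taxon_id': ['10', '20', '30'],
    'user_id': ['708886', '708886', '58320'],
    'scientificName': ['Species a', 'Species b', 'Species c'],
})
state_names = pd.Series(['Species a'])

not_listed = ~inat['scientificName'].isin(state_names)
ours = inat['user_id'] == OUR_USER_ID

removals = inat[not_listed & ours].assign(action='REMOVE')   # ours, and no longer listed
flags = inat[not_listed & ~ours]                             # someone else's: flag, don't remove
```

Matching on `scientificName` here mirrors the merge used earlier, so the same caveat about synonyms and unmatched names applies before anything is removed.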
