# QLD Conservation Status and Sensitive Species Lists

This notebook downloads the Queensland lists from the [Qld Government Open Data Portal](https://data.qld.gov.au) and formats them as Darwin Core for ingestion into the ALA Lists tool.
It saves the original lists to the `source-data/QLD` directory, then processes them and saves the results to `current-lists`.


## Lists in the ALA Species List tool
* Conservation list: __[dr652](https://lists.ala.org.au/speciesListItem/list/dr652)__ (in [test](https://lists-test.ala.org.au/speciesListItem/list/dr652))
* Sensitive list: __[dr493](https://lists.ala.org.au/speciesListItem/list/dr493)__ (in [test](https://lists-test.ala.org.au/speciesListItem/list/dr18404))


## Sources
Queensland Nature Conservation Act 1992

### Conservation
* __[Metadata - Qld Species (Open Data Portal)](https://www.data.qld.gov.au/dataset/conservation-status-of-queensland-wildlife)__
* __[Data](https://apps.des.qld.gov.au/data-sets/wildlife/wildnet/species.csv)__

### Sensitive
* __[Metadata - Queensland Confidential Species (Open Data Portal)](https://www.data.qld.gov.au/dataset/queensland-confidential-species)__
* __[Data](https://apps.des.qld.gov.au/data-sets/wildlife/wildnet/qld-confidential-species.csv)__

### Codes
* __[Metadata - Qld Species codes](https://www.data.qld.gov.au/dataset/conservation-status-of-queensland-wildlife/resource/6344ea93-cadf-4e0c-9ff4-12dfb18d5f14)__
* __[Data](https://apps.des.qld.gov.au/data-sets/wildlife/wildnet/species-status-codes.csv)__



# Setup
* Import libraries
* Set Project directory
* Set URLs

In [1]:
import datetime

import pandas as pd
import requests
import io
from ftfy import fix_encoding
import urllib.request, json
import certifi
import ssl

projectDir = "/Users/new330/IdeaProjects/authoritative-lists/"
sourceDataDir = projectDir + "source-data/QLD/"
processedDataDir = projectDir + "current-lists/"

codesurl =  "https://apps.des.qld.gov.au/data-sets/wildlife/wildnet/species-status-codes.csv"
listurl = "https://apps.des.qld.gov.au/data-sets/wildlife/wildnet/species.csv"
confidentiallisturl = "https://apps.des.qld.gov.au/data-sets/wildlife/wildnet/qld-confidential-species.csv"

## Download the raw files from data.qld.gov.au
Each file is saved locally in `source-data/QLD`.

In [2]:
# %%script echo skipping # comment this line to download dataset from API

response = requests.get(codesurl)
rtext = fix_encoding(response.text)
speciescodes = pd.read_csv(io.StringIO(rtext))
speciescodes.to_csv(sourceDataDir + "species-status-codes.csv")

response = requests.get(listurl)
rtext = fix_encoding(response.text)
conservationlist = pd.read_csv(io.StringIO(rtext))
conservationlist.to_csv(sourceDataDir + "species.csv")

response = requests.get(confidentiallisturl)
rtext = fix_encoding(response.text)
confidentiallist = pd.read_csv(io.StringIO(rtext))
confidentiallist.to_csv(sourceDataDir + "qld-confidential-species.csv")
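
The three download blocks above repeat the same steps. A minimal sketch of a shared helper follows, assuming a hypothetical `fetch_csv` function that is not part of this notebook. It also checks the HTTP status before parsing, and it omits the `fix_encoding` step used above. The `get` parameter is an assumption added so the helper can be exercised without network access.

```python
import io

import pandas as pd


def fetch_csv(url, get=None, timeout=60):
    """Download a CSV and return it as a DataFrame (hypothetical helper)."""
    if get is None:
        import requests  # imported lazily; only needed for real downloads
        get = requests.get
    response = get(url, timeout=timeout)
    response.raise_for_status()  # fail early on 4xx/5xx responses
    return pd.read_csv(io.StringIO(response.text))
```

Under these assumptions, a call such as `speciescodes = fetch_csv(codesurl)` could stand in for the first block, with the `to_csv` call kept as it is.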

## Standardise Status Codes
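A few Qld Nature Conservation Act code descriptions are adjusted so that they are consistent with other states: the trailing " wildlife" is removed, and "Critically endangered" and "Near threatened" are capitalised as "Critically Endangered" and "Near Threatened".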

In [3]:
speciescodes = pd.read_csv(sourceDataDir + "species-status-codes.csv")
ncastatuscodes = speciescodes[speciescodes['Field'] == "NCA_status"][['Code', 'Code_description']]
ncastatuscodes['Code_description'] = ncastatuscodes['Code_description'].str.replace(" wildlife", "")
ncastatuscodes.loc[
    ncastatuscodes['Code_description'] == "Critically endangered", 'Code_description'] = "Critically Endangered"
ncastatuscodes.loc[ncastatuscodes['Code_description'] == "Near threatened", 'Code_description'] = "Near Threatened"
endemicitycodes = speciescodes[speciescodes['Field'] == "Endemicity"][['Code', 'Code_description']]
# epbc codes
epbccodes = speciescodes[speciescodes['Field'] == "EPBC_status"][['Code', 'Code_description']]
ncastatuscodes

Unnamed: 0,Code,Code_description
17,C,Least concern
18,CR,Critically Endangered
19,E,Endangered
20,EX,Extinct
21,I,International
22,NT,Near Threatened
23,PE,Extinct in the wild
24,SL,Special least concern
25,V,Vulnerable
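
The same standardisation can be expressed with a lookup dictionary, which may be easier to extend if more descriptions need adjusting. This is only a sketch: the `codes` frame holds illustrative sample rows rather than the downloaded file, and `capitalisation` is a hypothetical name.

```python
import pandas as pd

# Illustrative sample rows, not the real species-status-codes.csv content
codes = pd.DataFrame({
    "Code": ["C", "CR", "NT", "V"],
    "Code_description": [
        "Least concern wildlife",
        "Critically endangered wildlife",
        "Near threatened wildlife",
        "Vulnerable wildlife",
    ],
})

# Exact-match replacements applied after the " wildlife" suffix is removed
capitalisation = {
    "Critically endangered": "Critically Endangered",
    "Near threatened": "Near Threatened",
}

codes["Code_description"] = (
    codes["Code_description"]
    .str.replace(" wildlife", "", regex=False)  # literal match, not a regex
    .replace(capitalisation)                    # whole-value replacement
)
```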


## Conservation List
* Read in the Conservation list
* Join to the codes to expand the code descriptions.
* Change the field names to `sourceStatus` and `status` as required by the ALA's conservation list processing.
* Remove records with a **Least concern** status or no status
* Expand the endemicity and EPBC status codes

In [4]:
conservationlist = pd.read_csv(sourceDataDir + "species.csv")
conservationlist = pd.merge(conservationlist,ncastatuscodes,left_on=['NCA_status'],right_on=['Code'],how="left")
conservationlist.drop(['Code'],axis=1,inplace=True)
conservationlist = conservationlist.rename(columns={'NCA_status':'sourceStatus','Code_description':'status'})

# remove empty or Least Concern status records
conservationlist = conservationlist[((conservationlist['status'] != "Least concern") & (conservationlist['status'].notna()))]

# expand endemicity
endemicitycodes = speciescodes[speciescodes['Field'] == "Endemicity"][['Code', 'Code_description']]
conservationlist = pd.merge(conservationlist, endemicitycodes, left_on=['Endemicity'], right_on=['Code'], how="left")
conservationlist.drop(['Code','Endemicity'], axis=1, inplace=True)
conservationlist = conservationlist.rename(columns={'Code_description': 'Endemicity'})

# expand epbc
epbccodes = speciescodes[speciescodes['Field'] == "EPBC_status"][['Code','Code_description']]
conservationlist = pd.merge(conservationlist,epbccodes,left_on=['EPBC_status'],right_on=['Code'],how="left")
conservationlist.drop(['Code','EPBC_status'],axis=1,inplace=True)
conservationlist = conservationlist.rename(columns={'Code_description':'EPBC Status'})
#conservationlist.drop(['EPBC_status'],axis=1,inplace=True)
conservationlist.drop(['Unnamed: 0'],axis=1,inplace=True)
conservationlist

Unnamed: 0,Taxon_Id,Kingdom,Class,Family,Scientific_name,Common_name,Taxon_author,sourceStatus,Significant,Confidential,status,Endemicity,EPBC Status
0,706,animals,amphibians,Limnodynastidae,Adelotus brevis,tusked frog,"(Günther, 1863)",V,Y,N,Vulnerable,Intranational,
1,687,animals,amphibians,Limnodynastidae,Philoria kundagungan,red-and-yellow mountainfrog,"(Ingram & Corben, 1975)",E,Y,Y,Endangered,Intranational,Endangered
2,686,animals,amphibians,Myobatrachidae,Crinia tinnula,wallum froglet,"Straughan & Main, 1966",V,Y,N,Vulnerable,Intranational,
3,675,animals,amphibians,Myobatrachidae,Mixophyes fleayi,Fleay's barred frog,"Corben & Ingram, 1987",E,Y,Y,Endangered,Intranational,Endangered
4,676,animals,amphibians,Myobatrachidae,Mixophyes iteratus,giant barred frog,"Straughan, 1968",V,Y,Y,Vulnerable,Intranational,Vulnerable
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2239,6482,plants,land plants,Zamiaceae,Macrozamia viridis,,D.L.Jones & P.I.Forst.,E,Y,Y,Endangered,Intranational,
2240,8948,plants,land plants,Zingiberaceae,Alpinia hylandii,,R.M.Sm.,NT,Y,N,Near Threatened,Queensland Endemic,
2241,8949,plants,land plants,Zingiberaceae,Amomum queenslandicum,,R.M.Sm.,V,Y,N,Vulnerable,Queensland Endemic,
2242,12434,plants,land plants,Zingiberaceae,Globba marantina,,L.,V,Y,N,Vulnerable,Regional Endemic,
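
The join-and-filter step can be tried on a small frame first. The sketch below uses made-up rows (the second and third names are placeholders, not real taxa) and adds `validate="many_to_one"`, an optional pandas argument that raises an error if the code table contains duplicate codes.

```python
import pandas as pd

# Illustrative rows only; placeholder names are not real taxa
species = pd.DataFrame({
    "Scientific_name": ["Adelotus brevis", "Placeholder species A", "Placeholder species B"],
    "NCA_status": ["V", "C", None],
})
codes = pd.DataFrame({
    "Code": ["V", "C"],
    "Code_description": ["Vulnerable", "Least concern"],
})

# Left join keeps every species row; unmatched codes get a missing description
merged = species.merge(
    codes, left_on="NCA_status", right_on="Code", how="left", validate="many_to_one"
)
merged = merged.drop(columns="Code").rename(
    columns={"NCA_status": "sourceStatus", "Code_description": "status"}
)

# Keep only rows that have a status other than "Least concern"
listed = merged[merged["status"].notna() & (merged["status"] != "Least concern")]
```

Only the row with a status of Vulnerable remains in `listed`; the Least concern row and the row without a status are dropped.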


**Tidy up**
* rename fields to Darwin Core
* replace kingdom/class values with scientific names

In [5]:
conservationlist = conservationlist.rename(columns=
{
    'Taxon_Id':'taxonID',
    'Kingdom':'kingdom',
    'Class':'class',
    'Family':'family',
    'Scientific_name':'scientificName',
    'Common_name': 'vernacularName',
    'Taxon_author':'scientificNameAuthorship',
    'NCA_status':'sourceStatus'
})

# Replace kingdom and class values with scientific terms
conservationlist.loc[conservationlist["kingdom"] == "animals", "kingdom"] = "Animalia"
conservationlist.loc[conservationlist["kingdom"] == "plants", "kingdom"] = "Plantae"
conservationlist.loc[conservationlist["class"] == "land plants", "class"] = "Equisetopsida"
conservationlist.loc[conservationlist["class"] == "amphibians", "class"] = "Amphibia"
conservationlist.loc[conservationlist["class"] == "birds", "class"] = "Aves"
conservationlist.loc[conservationlist["class"] == "cartilaginous fishes", "class"] = "Chondrichthyes"
conservationlist.loc[conservationlist["class"] == "insects", "class"] = "Insecta"
conservationlist.loc[conservationlist["class"] == "malacostracans", "class"] = "Malacostraca"
conservationlist.loc[conservationlist["class"] == "mammals", "class"] = "Mammalia"
conservationlist.loc[conservationlist["class"] == "ray-finned fishes", "class"] = "Actinopterygii"
conservationlist.loc[conservationlist["class"] == "reptiles", "class"] = "Reptilia"
conservationlist.loc[conservationlist["class"] == "snails", "class"] = "Gastropoda"
conservationlist.loc[conservationlist["class"] == "arachnids", "class"] = "Arachnida"
conservationlist

Unnamed: 0,taxonID,kingdom,class,family,scientificName,vernacularName,scientificNameAuthorship,sourceStatus,Significant,Confidential,status,Endemicity,EPBC Status
0,706,Animalia,Amphibia,Limnodynastidae,Adelotus brevis,tusked frog,"(Günther, 1863)",V,Y,N,Vulnerable,Intranational,
1,687,Animalia,Amphibia,Limnodynastidae,Philoria kundagungan,red-and-yellow mountainfrog,"(Ingram & Corben, 1975)",E,Y,Y,Endangered,Intranational,Endangered
2,686,Animalia,Amphibia,Myobatrachidae,Crinia tinnula,wallum froglet,"Straughan & Main, 1966",V,Y,N,Vulnerable,Intranational,
3,675,Animalia,Amphibia,Myobatrachidae,Mixophyes fleayi,Fleay's barred frog,"Corben & Ingram, 1987",E,Y,Y,Endangered,Intranational,Endangered
4,676,Animalia,Amphibia,Myobatrachidae,Mixophyes iteratus,giant barred frog,"Straughan, 1968",V,Y,Y,Vulnerable,Intranational,Vulnerable
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2239,6482,Plantae,Equisetopsida,Zamiaceae,Macrozamia viridis,,D.L.Jones & P.I.Forst.,E,Y,Y,Endangered,Intranational,
2240,8948,Plantae,Equisetopsida,Zingiberaceae,Alpinia hylandii,,R.M.Sm.,NT,Y,N,Near Threatened,Queensland Endemic,
2241,8949,Plantae,Equisetopsida,Zingiberaceae,Amomum queenslandicum,,R.M.Sm.,V,Y,N,Vulnerable,Queensland Endemic,
2242,12434,Plantae,Equisetopsida,Zingiberaceae,Globba marantina,,L.,V,Y,N,Vulnerable,Regional Endemic,
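
Because the same kingdom and class replacements are applied again to the confidential list later in the notebook, they could be held in dictionaries and applied by one function. The following is a sketch of that idea; `KINGDOM_NAMES`, `CLASS_NAMES` and `standardise_taxa` are hypothetical names, and the mappings are copied from the cell above.

```python
import pandas as pd

KINGDOM_NAMES = {"animals": "Animalia", "plants": "Plantae"}
CLASS_NAMES = {
    "land plants": "Equisetopsida",
    "amphibians": "Amphibia",
    "birds": "Aves",
    "cartilaginous fishes": "Chondrichthyes",
    "insects": "Insecta",
    "malacostracans": "Malacostraca",
    "mammals": "Mammalia",
    "ray-finned fishes": "Actinopterygii",
    "reptiles": "Reptilia",
    "snails": "Gastropoda",
    "arachnids": "Arachnida",
}


def standardise_taxa(df):
    """Return a copy with kingdom/class values replaced by scientific names."""
    out = df.copy()
    # Values without an entry in the dictionaries are left unchanged
    out["kingdom"] = out["kingdom"].replace(KINGDOM_NAMES)
    out["class"] = out["class"].replace(CLASS_NAMES)
    return out


sample = pd.DataFrame({"kingdom": ["animals", "plants"], "class": ["birds", "land plants"]})
result = standardise_taxa(sample)
```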


In [6]:
conservationlist.groupby(["kingdom","class"]).size().sort_values(ascending=False)

kingdom   class         
Plantae   Equisetopsida     1868
Animalia  Aves               158
          Reptilia            73
          Mammalia            68
          Amphibia            41
          Actinopterygii      11
          Malacostraca        11
          Insecta              8
          Gastropoda           3
          Chondrichthyes       2
Plantae   Charophyceae         1
dtype: int64

In [7]:
len(conservationlist.index)

2244
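
Before writing the file, a few basic checks can catch problems early. This is a hedged sketch: `check_list` is a hypothetical helper, and the choice of required columns is an assumption based on the field names used in this notebook, not a documented ALA requirement.

```python
import pandas as pd

# Assumed set of columns to check; adjust to the list being processed
REQUIRED = ["taxonID", "scientificName", "sourceStatus", "status"]


def check_list(df, required=REQUIRED):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    missing = [col for col in required if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # the remaining checks need these columns
    if df["status"].isna().any():
        problems.append("rows with empty status")
    dupes = int(df["taxonID"].duplicated().sum())
    if dupes:
        problems.append(f"{dupes} duplicated taxonID values")
    return problems
```

For example, `check_list(conservationlist)` would return an empty list if all four columns are present, every row has a status and no `taxonID` is repeated.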

Write dataframe to CSV - UTF-8 encoding

In [8]:
conservationlist.to_csv(processedDataDir + "conservation-lists/QLD-conservation.csv",encoding="UTF-8",index=False)

## Sensitive - Qld Confidential list
* Read in the Confidential list
* Expand the NCA status, endemicity and EPBC status codes
* Rename fields to DwC terms
* Replace kingdom and class values with scientific terms


In [9]:
confidentiallist = pd.read_csv(sourceDataDir + "qld-confidential-species.csv")
# nca status
confidentiallist = pd.merge(confidentiallist,ncastatuscodes,left_on=['NCA status'],right_on=['Code'],how="left")
confidentiallist.drop(['Code'],axis=1,inplace=True)
confidentiallist = confidentiallist.rename(columns={'NCA status':'sourceStatus','Code_description':'status'})
# endemicity
confidentiallist = pd.merge(confidentiallist,endemicitycodes,left_on=['Endemicity'],right_on=['Code'],how="left")
confidentiallist.drop(['Code','Endemicity'],axis=1,inplace=True)
confidentiallist = confidentiallist.rename(columns={'Code_description':'Endemicity'})
# epbc
confidentiallist = pd.merge(confidentiallist,epbccodes,left_on=['EPBC status'],right_on=['Code'],how="left")
confidentiallist.drop(['Code','EPBC status','Unnamed: 0'],axis=1,inplace=True)
confidentiallist = confidentiallist.rename(columns={'Code_description':'EPBC Status'})

# rename fields
confidentiallist = confidentiallist.rename(columns=
{
    'Taxon Id':'taxonID',
    'Kingdom':'kingdom',
    'Class':'class',
    'Family':'family',
    'Scientific name':'scientificName',
    'Common name': 'vernacularName',
    'Taxon author':'scientificNameAuthorship'
})

#confidentiallist.groupby(["kingdom","class"]).size()

confidentiallist.loc[confidentiallist["kingdom"] == "animals", "kingdom"] = "Animalia"
confidentiallist.loc[confidentiallist["kingdom"] == "plants", "kingdom"] = "Plantae"
confidentiallist.loc[confidentiallist["class"] == "land plants", "class"] = "Equisetopsida"
confidentiallist.loc[confidentiallist["class"] == "amphibians", "class"] = "Amphibia"
confidentiallist.loc[confidentiallist["class"] == "birds", "class"] = "Aves"
confidentiallist.loc[confidentiallist["class"] == "cartilaginous fishes", "class"] = "Chondrichthyes"
confidentiallist.loc[confidentiallist["class"] == "insects", "class"] = "Insecta"
confidentiallist.loc[confidentiallist["class"] == "malacostracans", "class"] = "Malacostraca"
confidentiallist.loc[confidentiallist["class"] == "mammals", "class"] = "Mammalia"
confidentiallist.loc[confidentiallist["class"] == "ray-finned fishes", "class"] = "Actinopterygii"
confidentiallist.loc[confidentiallist["class"] == "reptiles", "class"] = "Reptilia"
confidentiallist.loc[confidentiallist["class"] == "snails", "class"] = "Gastropoda"
confidentiallist.loc[confidentiallist["class"] == "arachnids", "class"] = "Arachnida"
confidentiallist

Unnamed: 0,taxonID,kingdom,class,family,scientificName,vernacularName,scientificNameAuthorship,sourceStatus,Significant,status,Endemicity,EPBC Status
0,969,Animalia,Mammalia,Rhinolophidae,Rhinolophus philippinensis,greater large-eared horseshoe bat,"Waterhouse, 1843",E,Y,Endangered,Regional Endemic,Vulnerable
1,1376,Animalia,Aves,Estrildidae,Chloebia gouldiae,Gouldian finch,"(Gould, 1844)",E,Y,Endangered,Intranational,Endangered
2,1378,Animalia,Aves,Estrildidae,Erythrura trichroa,blue-faced parrot-finch,"(Kittlitz, 1835)",NT,Y,Near Threatened,Not Endemic to Australia,
3,1370,Animalia,Aves,Estrildidae,Neochmia phaeton evangelinae,crimson finch (white-bellied subspecies),"(Hombron & Jacquinot, 1841)",E,Y,Endangered,Regional Endemic,Endangered
4,1365,Animalia,Aves,Estrildidae,Poephila cincta cincta,black-throated finch (white-rumped subspecies),"Gould, 1837",E,Y,Endangered,Intranational,Endangered
...,...,...,...,...,...,...,...,...,...,...,...,...
950,11699,Plantae,Equisetopsida,Thelypteridaceae,Pneumatopteris costata,,(Brack.) Holttum,NT,Y,Near Threatened,Regional Endemic,
951,11700,Plantae,Equisetopsida,Thelypteridaceae,Pneumatopteris pennigera,lime fern,(G.Forst.) Holttum,E,Y,Endangered,Not Endemic to Australia,
952,16042,Plantae,Equisetopsida,Thelypteridaceae,Thelypteris confluens,,(Thunb.) C.V.Morton,V,Y,Vulnerable,Not Endemic to Australia,
953,8185,Plantae,Equisetopsida,Proteaceae,Macadamia jansenii,,C.L.Gross & P.H.Weston,CR,Y,Critically Endangered,Queensland Endemic,Endangered


In [10]:
confidentiallist.groupby(["kingdom","class"]).size().sort_values(ascending=False)

kingdom   class         
Plantae   Equisetopsida     851
Animalia  Reptilia           30
          Aves               24
          Amphibia           22
          Malacostraca       10
          Actinopterygii      7
          Insecta             7
          Arachnida           3
          Mammalia            1
dtype: int64

In [11]:
len(confidentiallist.index)

955

## Write to CSV

In [12]:
confidentiallist.to_csv(processedDataDir + "sensitive-lists/QLD-sensitive.csv",encoding="UTF-8",index=False)

# Manual List check

**Instructions**
1. Load the lists above into the lists-test tool
2. Check the list name matching score and the text appearance on species pages
3. Unskip the code below and run the reports to compare the test lists with production. Send the change log CSV for checking and correct any issues.
4. Save the production list into the `historical-lists` directory by uncommenting the code section below.
5. Load the lists into production

### Define functions

In [22]:
def download_ala_list(url: str):
    # use a separate name for the response so the url parameter is not shadowed
    with urllib.request.urlopen(url, context=ssl.create_default_context(cafile=certifi.where())) as response:
        data = json.loads(response.read().decode())
        data = pd.json_normalize(data)
        return data

def kvp_to_columns(df):
    d0 = pd.DataFrame()
    for i in df.index:
        kvpdf = pd.json_normalize(df.kvpValues[i])
        kvpdf = kvpdf.transpose()
        kvpdf.columns = kvpdf.loc['key'] #rename columns to the keys
        kvpdf.drop(['key'], inplace=True) #drop the keys row
        kvpdf['id'] = df.id[i]
        kvpdf = pd.merge(df,kvpdf,"inner",on="id")
        d0 = pd.concat([d0,kvpdf])
    return d0

def get_changelist(newListUrl: str, oldListUrl: str):
    oldList = download_ala_list(oldListUrl)
    oldList = kvp_to_columns(oldList)
    newList = download_ala_list(newListUrl)
    newList = kvp_to_columns(newList)
    # new names
    newVsOld = pd.merge(newList, oldList, how='left', on='name', suffixes=('_new','_old'))
    newVsOld = newVsOld[newVsOld['scientificName_old'].isna()][['name', 'commonName_new', 'scientificName_new','status_new']]
    newVsOld['listUpdate'] = 'added'
    # removed names
    oldVsNew = pd.merge(oldList, newList, how='left', on='name', suffixes=('_old','_new'))
    oldVsNew = oldVsNew[oldVsNew['scientificName_new'].isna()][['name', 'commonName_old', 'scientificName_old','status_old']]
    oldVsNew['listUpdate'] = 'removed'
    # status changes
    statusChanges = pd.merge(newList, oldList, how='left', on='name', suffixes=('_new','_old'))
    statusChanges = statusChanges[statusChanges['status_new'] != statusChanges['status_old']][['name','commonName_new','scientificName_new','status_new','status_old']]
    statusChanges['listUpdate'] = 'status change'
    # union of added and removed names; statusChanges is computed above but not included in the result
    changeList = pd.concat([newVsOld, oldVsNew])
    #changeList = changeList[['listUpdate','name','scientificName_x','commonName_x','status_x','status_y']].sort_values('name')
    return changeList
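
As an alternative sketch, added names, removed names and status changes can be derived from a single outer merge using the pandas `indicator=True` option. `summarise_changes` is a hypothetical function, not part of this notebook, and it assumes both frames have `name` and `status` columns.

```python
import pandas as pd


def summarise_changes(new_list, old_list):
    """Label each name as added, removed or status change (hypothetical helper)."""
    merged = pd.merge(
        new_list, old_list, how="outer", on="name",
        suffixes=("_new", "_old"), indicator=True,
    )
    # _merge is categorical; convert to plain strings before relabelling
    origin = merged["_merge"].astype(str)
    merged["listUpdate"] = origin.map(
        {"left_only": "added", "right_only": "removed", "both": "unchanged"}
    )
    # Compare statuses with missing values treated as equal to each other
    same_status = merged["status_new"].fillna("") == merged["status_old"].fillna("")
    merged.loc[(origin == "both") & ~same_status, "listUpdate"] = "status change"
    changes = merged[merged["listUpdate"] != "unchanged"]
    return changes.drop(columns="_merge").sort_values("name")


new = pd.DataFrame({"name": ["a", "b", "c"],
                    "status": ["Vulnerable", "Endangered", "Vulnerable"]})
old = pd.DataFrame({"name": ["b", "c", "d"],
                    "status": ["Vulnerable", "Vulnerable", "Endangered"]})
changes = summarise_changes(new, old)
```

In this made-up example, `a` is reported as added, `b` as a status change and `d` as removed, while `c` is unchanged and left out.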

### Conservation List - Download old and new and compare

In [23]:
%%script echo skipping # comment this line to run this code

import datetime
monthStr = datetime.datetime.now().strftime('%Y%m')

# conservation
filename = "QLD-conservation.csv"
prodListUrl = "https://lists.ala.org.au/ws/speciesListItems/" + "dr652" + "?max=10000&includeKVP=true"
testListUrl = "https://lists-test.ala.org.au/ws/speciesListItems/" + "dr652" + "?max=10000&includeKVP=true"
changelist = get_changelist(testListUrl, prodListUrl)
# save the lists locally
changelist.to_csv(projectDir + "analysis/change-log/" + monthStr + "-" + filename, encoding="UTF-8", index=False)
prodList = download_ala_list(prodListUrl) # save the prod list to the historical lists directory
prodList = kvp_to_columns(prodList)
prodList.to_csv(projectDir + "historical-lists/conservation/" + filename, encoding="UTF-8", index=False)
changelist

Unnamed: 0,name,commonName_new,scientificName_new,status_new,listUpdate,commonName_old,scientificName_old,status_old
147,Lophochroa leadbeateri leadbeateri,,Lophochroa leadbeateri leadbeateri,Endangered,added,,,
153,Neophema chrysostoma,Blue-winged Parrot,Neophema (Neonanodes) chrysostoma,Vulnerable,added,,,
164,Climacteris picumnus victoriae,Brown Treecreeper (eastern Subspecies),Climacteris (Climacteris) picumnus victoriae,Vulnerable,added,,,
173,Aphelocephala leucopsis,Western Whiteface,Aphelocephala leucopsis,Vulnerable,added,,,
186,Melanodryas cucullata cucullata,Hooded Robin (south-eastern Form),Melanodryas (Melanodryas) cucullata cucullata,Endangered,added,,,
195,Stagonopleura guttata,Diamond Firetail,Stagonopleura (Stagonopleura) guttata,Vulnerable,added,,,
212,Euastacus dalagarbe,Freshwater Crayfish/yabby,Euastacus dalagarbe,Critically Endangered,added,,,
240,Hemibelideus lemuroides,Lemuroid Ringtail Possum,Hemibelideus lemuroides,Critically Endangered,added,,,
288,Melanotaenia sp. nov. 'Malanda',,Melanotaenia,Critically Endangered,added,,,
289,Melanotaenia sp. nov. 'Running River',,Melanotaenia,Critically Endangered,added,,,


### Sensitive List - Download old and new and compare

In [26]:
%%script echo skipping # comment this line to run this code
filename = "QLD-sensitive.csv"
prodListUrl = "https://lists.ala.org.au/ws/speciesListItems/" + "dr493" + "?max=10000&includeKVP=true"
testListUrl = "https://lists-test.ala.org.au/ws/speciesListItems/" + "dr18404" + "?max=10000&includeKVP=true"
changelist = get_changelist(testListUrl, prodListUrl )
# save the lists locally
changelist.to_csv(projectDir + "analysis/change-log/" + monthStr + "-" + filename, encoding="UTF-8", index=False)
prodList = download_ala_list(prodListUrl) # save the prod list to the historical lists directory
prodList = kvp_to_columns(prodList)
prodList.to_csv(projectDir + "historical-lists/sensitive/" + filename, encoding="UTF-8", index=False)
changelist

Unnamed: 0,name,commonName_new,scientificName_new,status_new,listUpdate,commonName_old,scientificName_old,status_old
28,Calorodius thorntonensis,,SCINCIDAE,Least concern,added,,,
91,Cherax robustus,Freshwater Crayfish Or Yabby,Cherax robustus,Vulnerable,added,,,
93,Euastacus binzayedi,,Euastacus binzayedi,Critically Endangered,added,,,
94,Euastacus eungella,Freshwater Crayfish Or Yabby,Euastacus eungella,Endangered,added,,,
95,Euastacus hystricosus,Freshwater Crayfish Or Yabby,Euastacus hystricosus,Endangered,added,,,
99,Euastacus robertsi,Freshwater Crayfish Or Yabby,Euastacus robertsi,Endangered,added,,,
100,Tenuibranchiurus glypticus,Swamp Crayfish,Tenuibranchiurus glypticus,Endangered,added,,,
276,Cooktownia,,Cooktownia,Least concern,added,,,
299,Corybas,Helmet Orchids,Corybas,,added,,,
322,Cymbidium,Boat-lipped Orchids,Cymbidium,,added,,,
