# Fireveg DB imports -- Updated taxonomic list from BioNet

Author: [José R. Ferrer-Paris](https://github.com/jrfep)

Date: 19 August 2024

This Jupyter Notebook includes [Python](https://www.python.org) code to populate the taxonomic list of New South Wales plant species in the Fireveg database.

For **version 1.0** of the database, we received the BioNet data from Renee as an Excel file, read it in ***R*** with the `readxl` package, and wrote it to a table in the Postgres database with the `RPostgreSQL` package.

For **version 1.1**, we read the data directly from the [BioNet API](https://www.environment.nsw.gov.au/topics/animals-and-plants/biodiversity/nsw-bionet/web-services) at <https://data.bionet.nsw.gov.au/biosvcapp/odata> and use ***Python*** with the modules `json`, `pandas` and `sqlalchemy` to import it into the database.

**Please note:**
<div class="alert alert-warning">
    This repository contains code that is intended for internal project management and is documented for the sake of reproducibility.<br/>
    🛂 Only users contributing directly to the project have access to the credentials for data download/upload. 
</div>

## Set-up
### Import modules

In [1]:
import sys
import json
import urllib.request  # `import urllib` alone does not guarantee that `urllib.request` is loaded
import pandas as pd
from sqlalchemy import create_engine, text

# Pyprojroot for easier handling of working directory
import pyprojroot

### Define paths for input and output

Define the project directory using the `pyprojroot` functions, and add it to the module search path (`sys.path`) so that the project's own modules can be imported.

In [2]:
repodir = pyprojroot.find_root(pyprojroot.has_dir(".git"))
sys.path.append(str(repodir))

### Load own functions
Load functions from the `lib` folder; we will use one of them to read the database credentials.

In [3]:
from lib.parseparams import read_dbparams

### Database credentials

🤫 We use a folder named "secrets" to keep the credentials for connection to different services (database credentials, API keys, etc). We listed this folder in our `.gitignore` so that its contents are not tracked by git and not exposed. Future users need to copy the contents of this folder manually.

We read database credentials stored in a `database.ini` file using our own `read_dbparams` function.

In [4]:
dbparams = read_dbparams(repodir / 'secrets' / 'database.ini', 
                         section='fireveg-db-v1.1')
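
The `read_dbparams` helper is defined in the project's `lib.parseparams` module and is not reproduced in this notebook. As a rough sketch only, a function like it could be written with the standard `configparser` module. The name `read_dbparams_sketch`, the sample values, and the assumption that the section holds the keys `host`, `port`, `database`, `user` and `password` are ours for illustration; this is not the project's actual implementation.

```python
import configparser
import tempfile
from pathlib import Path


def read_dbparams_sketch(filename, section):
    """Return the key/value pairs of one section of an INI file as a dict."""
    parser = configparser.ConfigParser()
    # parser.read() silently skips missing files, so check what was read
    if not parser.read(filename):
        raise FileNotFoundError(f"Could not read {filename}")
    if not parser.has_section(section):
        raise KeyError(f"Section {section!r} not found in {filename}")
    return dict(parser.items(section))


# Usage with a throwaway file and made-up values (not real credentials)
with tempfile.TemporaryDirectory() as tmpdir:
    ini_path = Path(tmpdir) / "database.ini"
    ini_path.write_text(
        "[fireveg-db-v1.1]\n"
        "host = localhost\n"
        "port = 5432\n"
        "database = example\n"
        "user = example_user\n"
        "password = example_password\n"
    )
    example_params = read_dbparams_sketch(ini_path, section="fireveg-db-v1.1")

print(sorted(example_params))
```

Note that `configparser` returns every value as a string, including the port number.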

## Import Species name data from BioNet

### Load data from Open API 
This web service is open access and returns the data in JSON format.

In [5]:
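odata_url = 'https://data.bionet.nsw.gov.au/biosvcapp/odata/SpeciesNames'

def getResponse(url):
    # urlopen raises an HTTPError for 4xx/5xx responses; the explicit check
    # below covers any other non-200 status instead of returning undefined data
    with urllib.request.urlopen(url) as operUrl:
        if operUrl.getcode() != 200:
            raise RuntimeError(f"Error receiving data: HTTP status {operUrl.getcode()}")
        return operUrl.read()

odata_query = getResponse(odata_url)
BIONET_data = json.loads(odata_query)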

Where is the data? 

In [6]:
BIONET_data.keys()

dict_keys(['@odata.context', 'value'])
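
Only `@odata.context` and `value` are present, which suggests that the full list arrived in a single response. OData v4 services can split large results into pages and signal this with an `@odata.nextLink` key; we have not checked whether BioNet ever does so for this endpoint. The following is a defensive sketch that would follow such links if they appeared. The function name `fetch_all_records` and the stand-in `fake_fetch` pages are hypothetical.

```python
import json


def fetch_all_records(url, fetch):
    """Collect 'value' records across pages, following '@odata.nextLink'.

    `fetch` is any callable that takes a URL and returns the response body
    (bytes or str), for example the getResponse function defined above.
    """
    records = []
    while url:
        page = json.loads(fetch(url))
        records.extend(page.get("value", []))
        # A missing nextLink means this was the last (or only) page
        url = page.get("@odata.nextLink")
    return records


# Stand-in for the network: two made-up pages keyed by URL
fake_pages = {
    "page1": json.dumps({"value": [{"speciesID": 1}], "@odata.nextLink": "page2"}),
    "page2": json.dumps({"value": [{"speciesID": 2}]}),
}


def fake_fetch(url):
    return fake_pages[url]


all_records = fetch_all_records("page1", fake_fetch)
print(len(all_records))
```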

How many records are in the `value` component?

In [7]:
len(BIONET_data['value'])

24304

Let's inspect one record:

In [8]:
BIONET_data['value'][1]

{'dcterms_rightsHolder': 'NSW Dept of Planning, Industry and Environment',
 'dcterms_rights': 'CC-BY 4.0',
 'dcterms_language': 'en',
 'dcterms_type': 'service',
 'dcterms_modified': '2006-01-30T17:20:32+11:00',
 'dcterms_available': '1995-12-15T12:48:13+11:00',
 'speciesID': 2,
 'taxonRank': 'Species',
 'kingdomID': 138,
 'kingdom': 'Animalia',
 'classID': 35,
 'class': 'Reptilia',
 'orderID': 129,
 'order': 'Squamata',
 'familyID': 1,
 'family': 'Pygopodidae',
 'sortOrder': 697,
 'genusID': 533,
 'genus': 'Delma',
 'parentSpeciesID': 2,
 'specificEpithet': 'inornata',
 'infraspecificEpithet': None,
 'scientificNameAuthorship': 'Kluge, 1974',
 'scientificNameID': 2,
 'speciesCode_Synonym': '2160',
 'scientificName': 'Delma inornata',
 'scientificNameHTML': '<em>Delma inornata</em>',
 'vernacularName': 'Patternless Delma',
 'otherVernacularNames': 'Patternless Delma',
 'taxonID': 2,
 'currentScientificNameCode': '2160',
 'currentScientificName': 'Delma inornata',
 'currentVernacularNam

### Read plant species data into a Data Frame
Using `pandas`, reading the data into a data frame is a piece of cake 🍰.

In [9]:
df=pd.DataFrame(BIONET_data['value'])

And we can now filter the data to include only plants 🌱

In [10]:
df.kingdom.unique()

array(['Animalia', 'Plantae', 'Fungi'], dtype=object)

In [11]:
BIONET_plants = df[df.kingdom == 'Plantae']
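
Here the filtering happens locally, after every record has been downloaded. The OData standard also defines query options such as `$filter` and `$select` that ask the server to do this work; we assume, without having tested it here, that the BioNet endpoint honours them. A sketch of how such a URL could be built (the helper name `build_odata_url` is hypothetical):

```python
from urllib.parse import quote, urlencode

odata_url = 'https://data.bionet.nsw.gov.au/biosvcapp/odata/SpeciesNames'


def build_odata_url(base_url, filter_expr=None, select=None):
    """Append OData system query options to a base URL."""
    options = {}
    if filter_expr:
        options["$filter"] = filter_expr
    if select:
        options["$select"] = ",".join(select)
    if not options:
        return base_url
    # Keep '$', ',' and quotes readable; encode spaces as %20 rather than '+'
    query = urlencode(options, safe="$,'", quote_via=quote)
    return f"{base_url}?{query}"


plants_url = build_odata_url(
    odata_url,
    filter_expr="kingdom eq 'Plantae'",
    select=["speciesID", "scientificName", "family"],
)
print(plants_url)
```

The resulting URL could then be passed to `getResponse`; which options the service accepts should be confirmed against the BioNet documentation.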

In [12]:
BIONET_plants

| | dcterms_rightsHolder | dcterms_rights | dcterms_language | dcterms_type | dcterms_modified | dcterms_available | speciesID | taxonRank | kingdomID | kingdom | ... | stateConservation | protectedInNSW | sensitivityClass | TSProfileID | countryConservation | highThreatWeed | widelyCultivatedNativeSpecies | CAMBA | JAMBA | ROKAMBA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2304 | NSW Dept of Planning, Industry and Environment | CC-BY 4.0 | en | service | 2017-10-04T17:28:08.617+11:00 | 1995-12-15T13:06:39+11:00 | 2358 | Species | 139 | Plantae | ... | Not Listed | false | Not Sensitive | | Not Listed | | | false | false | false |
| 2305 | NSW Dept of Planning, Industry and Environment | CC-BY 4.0 | en | service | 2017-10-04T17:45:14.513+11:00 | 1995-12-15T13:06:40+11:00 | 2359 | Species | 139 | Plantae | ... | Not Listed | false | Not Sensitive | | Not Listed | | | false | false | false |
| 2306 | NSW Dept of Planning, Industry and Environment | CC-BY 4.0 | en | service | 2017-10-04T17:35:13.517+11:00 | 1995-12-15T13:06:41+11:00 | 2360 | Species | 139 | Plantae | ... | Not Listed | false | Not Sensitive | | Not Listed | | | false | false | false |
| 2307 | NSW Dept of Planning, Industry and Environment | CC-BY 4.0 | en | service | 2017-10-04T17:20:54.077+11:00 | 1995-12-15T13:06:41+11:00 | 2361 | Species | 139 | Plantae | ... | Not Listed | true | Not Sensitive | | Not Listed | | | false | false | false |
| 2308 | NSW Dept of Planning, Industry and Environment | CC-BY 4.0 | en | service | 2017-10-04T17:21:14.357+11:00 | 1995-12-15T13:06:41+11:00 | 2362 | Species | 139 | Plantae | ... | Not Listed | false | Not Sensitive | | Not Listed | | | false | false | false |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 24282 | NSW Dept of Planning, Industry and Environment | CC-BY 4.0 | en | service | 2024-05-09T11:17:23.96+10:00 | 2024-05-09T11:17:23.96+10:00 | 25600 | Species | 139 | Plantae | ... | Not Listed | false | Not Sensitive | | Not Listed | | | false | false | false |
| 24284 | NSW Dept of Planning, Industry and Environment | CC-BY 4.0 | en | service | 2024-05-24T14:55:33.177+10:00 | 2024-05-24T14:55:33.177+10:00 | 25603 | Species | 139 | Plantae | ... | Not Listed | false | Not Sensitive | | Not Listed | | | false | false | false |
| 24285 | NSW Dept of Planning, Industry and Environment | CC-BY 4.0 | en | service | 2024-06-17T11:17:59.3+10:00 | 2024-06-17T11:17:59.3+10:00 | 25604 | Species | 139 | Plantae | ... | Not Listed | false | Not Sensitive | | Not Listed | | | false | false | false |
| 24302 | NSW Dept of Planning, Industry and Environment | CC-BY 4.0 | en | service | 2024-07-30T11:11:56.763+10:00 | 2024-07-30T11:11:56.763+10:00 | 25632 | Species | 139 | Plantae | ... | Not Listed | false | Not Sensitive | | Not Listed | | | false | false | false |


### Import as a table into Database

Create a database connection with SQLAlchemy:

In [13]:
psql_engine='postgresql://{user}:{password}@{host}:{port}/{database}'.format(**dbparams)
engine = create_engine(psql_engine)
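
The formatted string above works as long as the credentials contain no characters with a special meaning in URLs. If a password included, say, `@` or `/`, the connection URL would be parsed incorrectly; percent-encoding the user and password with the standard library avoids this. A sketch with made-up credentials (the `example_params` values are placeholders, not the project's settings):

```python
from urllib.parse import quote_plus

# Made-up values standing in for the output of read_dbparams
example_params = {
    "user": "example_user",
    "password": "p@ss/word",
    "host": "localhost",
    "port": "5432",
    "database": "example",
}

# Escape only the parts that may contain reserved characters
safe_params = dict(example_params)
safe_params["user"] = quote_plus(example_params["user"])
safe_params["password"] = quote_plus(example_params["password"])

psql_url = "postgresql://{user}:{password}@{host}:{port}/{database}".format(**safe_params)
print(psql_url)
```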


Write to the database

In [14]:
BIONET_plants.to_sql('bionet', engine, schema='species', 
                     index=False,
                     if_exists='replace')

132

In [15]:
with engine.connect() as con:
    con.execute(text('ALTER TABLE species.bionet ADD PRIMARY KEY ("speciesID");'))
    con.execute(text('CREATE INDEX scientific_idx ON species.bionet ("scientificName");'))
    con.commit()
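
Adding the primary key only succeeds if every `speciesID` is present and unique. A quick check on the downloaded records before writing to the database could look like the sketch below; the function name `check_key` and the three sample records are made up for illustration.

```python
from collections import Counter


def check_key(records, key):
    """Return (number of records missing the key, sorted duplicated key values)."""
    values = [record.get(key) for record in records]
    n_missing = sum(value is None for value in values)
    counts = Counter(value for value in values if value is not None)
    duplicated = sorted(value for value, n in counts.items() if n > 1)
    return n_missing, duplicated


# Made-up records; in the notebook this would be BIONET_data['value']
sample_records = [
    {"speciesID": 2358, "kingdom": "Plantae"},
    {"speciesID": 2359, "kingdom": "Plantae"},
    {"speciesID": 2359, "kingdom": "Plantae"},
]

n_missing, duplicated = check_key(sample_records, "speciesID")
print(n_missing, duplicated)
```

A result other than zero missing values and an empty list of duplicates would indicate that the `ALTER TABLE ... ADD PRIMARY KEY` statement is likely to fail.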
    

And this is done!

## That is it for now!

✅ Job done! 😎👌🔥

You can:
- go [back home](../Instructions-and-workflow.ipynb),
- continue navigating the repo on [GitHub](https://github.com/ces-unsw-edu-au/fireveg-db-exports),
- continue exploring the repo on [OSF](https://osf.io/h96q2/), or
- visit the database at <http://fireecologyplants.net>.