# Fireveg DB imports -- import field work forms

Author: [José R. Ferrer-Paris](https://github.com/jrfep)

Date: February 2022, updated 19 August 2024

This Jupyter Notebook includes [Python](https://www.python.org) code to update species records from the field samples.

**Please note:**
<div class="alert alert-warning">
    This repository contains code that is intended for internal project management and is documented for the sake of reproducibility.<br/>
    🛂 Only users contributing directly to the project have access to the credentials for data download/upload.
</div>

## Set-up
### Load libraries 

In [1]:
import openpyxl
from pathlib import Path
import os,sys
from datetime import datetime
from configparser import ConfigParser
import psycopg2
from psycopg2.extensions import AsIs
import pyprojroot
import re
import pandas as pd


### Define paths for input and output

In [2]:
repodir = pyprojroot.find_root(pyprojroot.has_dir(".git"))
sys.path.append(str(repodir))

### Load own functions

Load functions from the `lib` folder. We will use one function (`read_dbparams`) to read the database credentials and one (`dbquery`) to run queries against the database:

In [3]:
from lib.parseparams import read_dbparams
from lib.firevegdb import dbquery

### Database credentials

🤫 We use a folder named "secrets" to keep the credentials for connecting to different services (database credentials, API keys, etc.). We list this folder in our `.gitignore` so that its contents are not tracked by git and not exposed. Future users need to copy the contents of this folder manually.

We read database credentials stored in a `database.ini` file using our own `read_dbparams` function.

In [4]:
dbparams = read_dbparams(repodir / 'secrets' / 'database.ini', 
                         section='fireveg-db-v1.1')
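The source of `read_dbparams` is not shown in this notebook. As a rough idea of what such a helper could look like, here is a minimal sketch built on the standard-library `ConfigParser`. The function name `read_dbparams_sketch`, the error handling and the key names in the demo file are assumptions for illustration, not the project's actual implementation.

```python
from configparser import ConfigParser
from pathlib import Path
import tempfile

def read_dbparams_sketch(filename, section):
    """Hypothetical stand-in for lib.parseparams.read_dbparams."""
    parser = ConfigParser()
    # ConfigParser.read returns the list of files it could parse
    if not parser.read(filename):
        raise FileNotFoundError(f"Could not read {filename}")
    if not parser.has_section(section):
        raise KeyError(f"Section {section!r} not found in {filename}")
    # All values come back as strings
    return dict(parser.items(section))

# Demo with throwaway values (not real credentials)
with tempfile.TemporaryDirectory() as tmpdir:
    inifile = Path(tmpdir) / "database.ini"
    inifile.write_text(
        "[fireveg-db-v1.1]\n"
        "host = localhost\n"
        "port = 5432\n"
        "dbname = example_db\n"
        "user = example_user\n"
    )
    demo_params = read_dbparams_sketch(inifile, section="fireveg-db-v1.1")
```

A dictionary of this shape can be unpacked into `psycopg2.connect(**params)`, provided its keys are valid connection parameters.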

## Query species recorded in field work
Check which species have the most records, along with the number of sites, visits and samples for each:

In [5]:
qry="""
select species,species_code,
count(distinct visit_id) as sites,
count(distinct (visit_id,visit_date)) as visits,
count(distinct (visit_id,visit_date,sample_nr)) as samples,
count(distinct record_id) as records
from form.quadrat_samples
group by species,species_code
ORDER BY records DESC;
"""

In [6]:
res=dbquery(qry,dbparams)

In [7]:
len(res)

1140

In [8]:
res[0:10]

[['Triodia scariosa', None, 58, 82, 452, 452],
 ['Austrostipa scabra', None, 60, 79, 386, 386],
 ['Empodisma minus', 5532, 21, 21, 334, 335],
 ['Beyeria opaca', None, 51, 69, 335, 335],
 ['Sclerolaena diacantha', None, 52, 69, 306, 306],
 ['Eucalyptus socialis', None, 49, 68, 300, 300],
 ['Sclerolaena parviflora', None, 44, 61, 271, 271],
 ['Halgania cyanea', None, 40, 53, 244, 245],
 ['Chenopodium desertorum subsp. desertorum', None, 40, 52, 220, 220],
 ['Dodonaea viscosa subsp. angustissima', None, 45, 63, 217, 217]]
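To make the counting logic of the query above easier to follow, here is a small pandas sketch that reproduces the same distinct counts on invented toy data. The data frame `toy` and the helper `n_distinct` are assumptions for illustration. The sketch groups by species only and assumes no missing values, so it is not an exact replacement for the SQL.

```python
import pandas as pd

# Toy records (invented for illustration, not taken from the database)
toy = pd.DataFrame({
    "record_id":  [1, 2, 3, 4, 5],
    "species":    ["Species A"] * 4 + ["Species B"],
    "visit_id":   ["S1", "S1", "S1", "S2", "S1"],
    "visit_date": ["2021-01-01", "2021-01-01", "2021-06-01",
                   "2021-01-01", "2021-01-01"],
    "sample_nr":  [1, 2, 1, 1, 1],
})

def n_distinct(group, columns):
    """Count distinct combinations of the given columns within one group."""
    return len(group[columns].drop_duplicates())

rows = []
for species, group in toy.groupby("species"):
    rows.append({
        "species": species,
        # same column combinations as in the SQL query
        "sites": n_distinct(group, ["visit_id"]),
        "visits": n_distinct(group, ["visit_id", "visit_date"]),
        "samples": n_distinct(group, ["visit_id", "visit_date", "sample_nr"]),
        "records": n_distinct(group, ["record_id"]),
    })

# Most recorded species first, as in ORDER BY records DESC
summary = (pd.DataFrame(rows)
           .sort_values("records", ascending=False)
           .reset_index(drop=True))
```

Each count uses a wider combination of columns than the previous one, so the numbers can only stay equal or grow from `sites` to `samples`.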

## Species without species code
These records need an update of the species code based on the latest version of the taxonomic information.

In [9]:
qry = """
SELECT count(distinct species) 
FROM form.quadrat_samples 
WHERE species_code is NULL; 
"""
dbquery(qry,dbparams)

[[523]]

Let's try to fix this. First, check the updated taxonomic list:

In [10]:
qry="""
SELECT "speciesID","taxonID","currentScientificNameCode","scientificName",
"speciesCode_Synonym"
FROM species.bionet
WHERE "scientificName" IN 
(SELECT species FROM form.quadrat_samples where species_code is NULL); 
"""
res = dbquery(qry,dbparams)

colnames=['speciesID','taxonID','currentScientificNameCode','scientificName','speciesCode_Synonym']
splist = pd.DataFrame(res,columns=colnames,dtype=object)

In [11]:
splist["taxonID"] = pd.Series(splist["taxonID"], dtype=int)

In [12]:
splist=splist[pd.to_numeric(splist['speciesCode_Synonym'], errors='coerce').notnull()]
splist

Unnamed: 0,speciesID,taxonID,currentScientificNameCode,scientificName,speciesCode_Synonym
0,2363,2363,3484,Cryptocarya obovata,3484
1,2368,2368,3688,Sarcopetalum harveyanum,3688
2,2399,2399,7121,Goodenia lunata,7121
3,2425,2425,6314,Lomandra spicata,6314
4,2427,2427,2698,Claoxylon australe,2698
...,...,...,...,...,...
474,19587,8800,3968,Syzygium smithii,13800
475,19594,19594,13805,Ackama paniculosa,13805
477,20250,20250,14253,Eremophila glabra subsp. murrayana,14253
478,20382,20382,14362,Citrus australis,14362
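The filter in the cell above keeps only rows whose `speciesCode_Synonym` can be read as a number, which is why the row labels in the output have gaps (for example, there is no row 476). The following sketch shows the same idiom on invented toy values. The species names and codes here are placeholders, not database content.

```python
import pandas as pd

# Toy lookup table (invented values for illustration)
toy = pd.DataFrame({
    "scientificName": ["Species one", "Species two",
                       "Species three", "Species four"],
    "speciesCode_Synonym": ["3484", None, "n/a", "13800"],
}, dtype=object)

# errors='coerce' turns anything non-numeric (including None) into NaN,
# so notnull() is True only for values that parse as numbers
is_numeric = pd.to_numeric(toy["speciesCode_Synonym"], errors="coerce").notnull()
kept = toy[is_numeric]

# Filtering keeps the original row labels (0 and 3 here);
# reset_index(drop=True) renumbers them from 0 without adding a column
renumbered = kept.reset_index(drop=True)
```

Because the labels are preserved after filtering, label-based access such as `.loc[0]` only works if row 0 survived the filter, whereas `.iloc[0]` always returns the first remaining row.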


In [13]:
if splist.shape[0]>0:
    # use positional access: after filtering, label 0 is not guaranteed to exist
    item=splist.iloc[0]
    item['taxonID'],item['scientificName']

In [14]:
splist = splist.reset_index()  # renumber rows from 0 after filtering; the old labels are kept in an 'index' column


In [15]:
updated_rows=0
if splist.shape[0]>0:
    qrystr="""UPDATE form.quadrat_samples SET species_code=%s WHERE species=%s AND species_code is NULL; """
    # connect to the PostgreSQL server
    print('Connecting to the PostgreSQL database...')
    conn = psycopg2.connect(**dbparams)
    try:
        cur = conn.cursor()
        for index, row in splist.iterrows():
            # AsIs inserts the numeric code unquoted; the species name is bound as a regular parameter
            qry = cur.mogrify(qrystr, (AsIs(row['speciesCode_Synonym']),row['scientificName']))
            cur.execute(qry)
            if cur.rowcount > 0:
                updated_rows = updated_rows + cur.rowcount
        # commit all updates in a single transaction
        conn.commit()
        cur.close()
    finally:
        # close the connection even if one of the updates fails
        conn.close()
        print('Database connection closed.')

Connecting to the PostgreSQL database...
Database connection closed.


In [16]:
print("%s rows updated" % (updated_rows))

10021 rows updated
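The update step can be tried out without access to the project database. The sketch below swaps PostgreSQL for an in-memory SQLite database from the standard library, so it only illustrates the pattern: update rows where the code is still missing, bind values as parameters, and add up `rowcount`. The table contents and the `lookup` list are invented for illustration. Note that `sqlite3` uses `?` placeholders where `psycopg2` uses `%s`.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the PostgreSQL server
conn = sqlite3.connect(":memory:")
try:
    conn.execute(
        "CREATE TABLE quadrat_samples ("
        "record_id INTEGER PRIMARY KEY, species TEXT, species_code INTEGER)"
    )
    # Toy records: two species lack a code, one already has one
    conn.executemany(
        "INSERT INTO quadrat_samples (species, species_code) VALUES (?, ?)",
        [("Species one", None), ("Species one", None),
         ("Species two", 42), ("Species three", None)],
    )

    # Hypothetical lookup of (code, name) pairs, playing the role of splist
    lookup = [(3484, "Species one"), (42, "Species two")]

    updated_rows = 0
    with conn:  # commits on success, rolls back if an exception is raised
        for code, name in lookup:
            cur = conn.execute(
                "UPDATE quadrat_samples SET species_code = ? "
                "WHERE species = ? AND species_code IS NULL",
                (code, name),
            )
            updated_rows += cur.rowcount

    # Species without a match in the lookup keep their missing code
    remaining = conn.execute(
        "SELECT count(*) FROM quadrat_samples WHERE species_code IS NULL"
    ).fetchone()[0]
finally:
    conn.close()

print("%s rows updated, %s still without code" % (updated_rows, remaining))
```

The `species_code IS NULL` condition makes the update safe to run again: rows that already have a code are left untouched and do not add to the count.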


## That is it for now!

✅ Job done! 😎👌🔥

You can:
- go [back home](../Instructions-and-workflow.ipynb),
- continue navigating the repo on [GitHub](https://github.com/ces-unsw-edu-au/fireveg-db-exports),
- continue exploring the repo on [OSF](https://osf.io/h96q2/), or
- visit the database at <http://fireecologyplants.net>.