# Fireveg DB exports -- Instructions and workflow for this repository

Author: [José R. Ferrer-Paris](https://github.com/jrfep)

Date: 22 August 2024

This repository includes [Python](https://www.python.org) and [R](https://www.r-project.org/) code to export data from the Fireveg database. This Jupyter Notebook contains instructions on how we set up our instance of Jupyter Lab and on how to navigate this repository.

**Please note:**
<div class="alert alert-warning">
    This repository contains code that is intended for internal project management and is documented for the sake of reproducibility.<br/>
    🛂 Only users contributing directly to the project have access to the credentials for data download/upload. 
</div>

## Repository structure

All the scripts are saved as Jupyter Notebooks and include documentation and comments on each step. Items with a green checkmark (✅) have been updated for version 1.1 (August 2024). Some items in this list are still a work in progress (⌛), and others are outdated and will be deleted soon (🦕).


### Shared scripts in `lib` folder

Folder [lib/](/lib/) includes several functions written as [Python](https://www.python.org/) modules. ✅

Having these functions in the `lib` folder allows us to share functions between notebooks with more consistency, and also streamlines the documentation of steps in the notebooks.

### Fireveg version information

In August 2024 we updated from version 1.0 to version 1.1 of the database. I keep a `fireveg-version.env` file in the root directory of the repository with the basic details of the current version. The entries in this file can be loaded as environment variables from R or Python code.
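As a minimal sketch of how such a file could be loaded from Python, the functions below parse simple `KEY=VALUE` lines using only the standard library. The function names are hypothetical, and the sketch assumes the file contains plain `KEY=VALUE` entries; the actual keys in `fireveg-version.env` are not documented here.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path, override=False):
    """Copy the parsed values into os.environ and return them as a dict."""
    values = read_env_file(path)
    for key, value in values.items():
        # Keep an already defined variable unless override is requested
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

From a notebook at the root of the repository, calling `load_env_file("fireveg-version.env")` would then make the entries available through `os.environ`.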

### Credentials

🤫 We use a folder named `secrets` to keep the credentials for connection to different services (database credentials, API keys, etc). We listed this folder in our `.gitignore` so that its contents are not tracked by git and not exposed. Future users need to copy the contents of this folder manually.

For R I use a `Renviron.local` file with a `<key>=<value>` format like this:

```sh
OSF_PAT=
DBHOST=
DBPORT=
DBNAME=
DBUSER=
```

For Python I use a `database.ini` file with the following format:

```sh
[section]
host=
port=
database=
user=
```
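A file in this format can be read with `configparser` from the Python standard library. The sketch below is illustrative only: the function name `read_db_params` is hypothetical, and the section name must match whatever is used in the real `database.ini`.

```python
from configparser import ConfigParser


def read_db_params(filename, section):
    """Return the key/value pairs of one section of an INI file as a dict."""
    parser = ConfigParser()
    # parser.read() returns the list of files it could actually read
    if not parser.read(filename):
        raise FileNotFoundError(f"Could not read {filename}")
    if not parser.has_section(section):
        raise KeyError(f"Section [{section}] not found in {filename}")
    return dict(parser.items(section))
```

The resulting dictionary (with keys `host`, `port`, `database` and `user`) could then be passed on to a database driver. Which driver the notebooks use, and where exactly the file lives inside `secrets`, is not stated here, so both are left as assumptions.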

### Data folder

🐘 The data folder is listed in our `.gitignore` so that its contents are not tracked by git. We use this folder locally, and then upload the files to a cloud drive for project-wide sharing.


### RDS output

Scripts in folder `RDS-output`:
- [Read tables from the database](RDS-output/Read-tables-from-database.ipynb) ⌛
- [Upload files to OSF](RDS-output/Upload-files-to-OSF.ipynb) ⌛

### SQL dump output

Scripts in folder `SQL-output`:
- [Create SQL dump file](SQL-output/Create-SQL-dumpfile.ipynb) ⌛

### Report output

Scripts in folder `Report-output`:
- [Check reference list](Report-output/Check-reference-list.ipynb) ⌛
- [Create XLSX ...](Report-output/Create-xlsx-output-curation-litrev-records.ipynb) ⌛
- [Create XLSX ...](Report-output/Create-xlsx-output-field-data.ipynb) ⌛
- [Create XLSX ...](Report-output/Create-xlsx-output-litrev-records.ipynb) ⌛
- [Create XLSX ...](Report-output/Create-xlsx-output-summary-litrev.ipynb) ⌛
- [Upload files to S3 bucket](Report-output/upload-files-to-S3-bucket.ipynb) ⌛

## How to use Jupyter Lab

I use Jupyter Lab to organise and document all the code in notebooks.  

These are the steps I followed to configure a Jupyter Lab environment. This is a simple collection of notes, not an exhaustive how-to.


### Create and activate python environment

#### with venv

This is my preferred method after some frustrations with conda.


Check python version with `python --version`

Update and install modules with:

We recently updated Jupyter Lab:

#### Alternative with Conda

Of course, you can still use Conda **instead** of venv.

I followed these steps to a) create a new environment with conda, and b) install the appropriate Python modules and R packages.

### Installing python libraries
Using the `venv` environment, it is easier to install packages with `pip`:

### Connecting to postgresql

To connect to the PostgreSQL database we need to have a client on the local computer.

For postgresql connection on mac, use: https://postgresapp.com, download and then 

Restart the terminal, then:

**Updating postgresql?**

According to [wise people](https://stackoverflow.com/questions/45025382/how-to-update-pg-dump-to-server-version-in-mac-os):

```sh
brew install postgresql@16 # install server version or higher
brew services stop postgresql@15 # stop postgres
brew uninstall postgresql@15 # uninstall old version
brew services start postgresql@16 # start newly installed version
```

### Adding the R kernel
Activate the right R kernel for Jupyter lab with:

### Loading own functions
If we want to share functions between notebooks, one option is to start the Jupyter Lab interface with the `PYTHONPATH` environment variable set. Something like this:

Alternatively, we can update the path in one of the cells of the notebook, something like this:

```python
import pyprojroot
import sys
# Find the repository root (the folder containing .git) and add it to the module search path
repodir = pyprojroot.find_root(pyprojroot.has_dir(".git"))
sys.path.append(str(repodir))
```
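If `pyprojroot` is not installed, the same idea can be sketched with the standard library alone: walk up from the current directory until a folder containing `.git` is found, then add it to `sys.path`. The function names below are hypothetical and this is only an illustrative alternative, not the approach used in the notebooks.

```python
import sys
from pathlib import Path


def find_repo_root(start=None, marker=".git"):
    """Walk up from `start` (default: current directory) to the first folder containing `marker`."""
    current = Path(start or Path.cwd()).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No parent folder of {current} contains {marker}")


def add_to_sys_path(path):
    """Append `path` to sys.path once, so repeated cell runs do not duplicate it."""
    path = str(path)
    if path not in sys.path:
        sys.path.append(path)
```

After running `add_to_sys_path(find_repo_root())` in a notebook cell, modules in the `lib` folder should be importable with a regular `import` statement.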

### Version control with Jupyter

Version control of Jupyter notebooks can be problematic when copies of the same notebook are edited concurrently in different sessions.

Look at some recommendations here:
<https://nextjournal.com/schmudde/how-to-version-control-jupyter>
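One common practice, offered here as a general suggestion and not necessarily the one described in the linked article, is to clear cell outputs and execution counts before committing, since these change on every run and tend to cause noisy diffs and conflicts. The sketch below does this with the standard library only; the function names are hypothetical, and dedicated tools such as `nbstripout` cover the same need more thoroughly.

```python
import json
from pathlib import Path


def strip_outputs(notebook):
    """Clear outputs and execution counts of all code cells in a notebook dict (in place)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_notebook_file(path):
    """Rewrite an .ipynb file with its code cell outputs cleared."""
    path = Path(path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    strip_outputs(notebook)
    text = json.dumps(notebook, indent=1, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
```

Note that `strip_notebook_file` overwrites the file, so it is safer to try it first on a copy of a notebook.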


## This is the end...

... Of this short document.

You are welcome to continue exploring the [links above](#Repository-structure), or:
- continue navigating the repo on [GitHub](https://github.com/ces-unsw-edu-au/fireveg-db-exports)
- continue exploring the repo on [OSF](https://osf.io/h96q2/)
- visit the database at <http://fireecologyplants.net>