# Fireveg DB imports -- Instructions and workflow for this repository

Author: [José R. Ferrer-Paris](https://github.com/jrfep)

Date: July 2024

This repository includes [Python](https://www.python.org) and [R](https://www.r-project.org/) code to populate and manage the Fireveg database. This Jupyter Notebook contains instructions on how we set up our instance of Jupyter Lab and on how to navigate this repository.

**Please note:**
<div class="alert alert-warning">
    This repository contains code that is intended for internal project management and is documented for the sake of reproducibility.<br/>
    🛂 Only users contributing directly to the project have access to the credentials for data download/upload. 
</div>

## Repository structure

All the scripts are saved as Jupyter Notebooks and include documentation and comments on each step.


### Shared scripts in `lib` folder

The folder [lib/](/lib/) includes several functions written as [Python](https://www.python.org/) modules.

I am in the process of moving shared functions to a module in the `lib` folder. This ensures that the functions are used consistently, makes them more general and customisable, and streamlines the documentation of steps in the notebooks.

### Credentials

🤫 We use a folder named `secrets` to keep the credentials for connecting to different services (database credentials, API keys, etc.). This folder is listed in our `.gitignore` so that its contents are not tracked by git and not exposed. Future users need to copy the contents of this folder manually.

Once the credentials are copied to the `secrets` folder, we can use them to connect to the database with `psql`:
```sh
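# Skip the matching section line and evaluate the next four lines
# of secrets/database.ini as shell variable assignments
eval $(grep -A4 aws-lght-sl secrets/database.ini | tail -n +2)
psql -h "$host" -d "$database" -U "$user"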
```
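The same credentials can also be read from Python. The following is a minimal sketch, assuming that `secrets/database.ini` is a standard INI file with a section named `aws-lght-sl` that holds keys such as `host`, `database` and `user` (the names used in the shell command above). The helper name `read_db_params` is hypothetical and is not one of the functions in `lib`.

```python
from configparser import ConfigParser
from pathlib import Path


def read_db_params(filename="secrets/database.ini", section="aws-lght-sl"):
    """Return the key/value pairs of one section of an INI file as a dict.

    Hypothetical helper: the file layout and section name are assumptions
    based on the shell example, not a documented interface.
    """
    path = Path(filename)
    if not path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path}")
    # Disable interpolation so that a '%' in a password is read literally
    parser = ConfigParser(interpolation=None)
    parser.read(path)
    if not parser.has_section(section):
        raise KeyError(f"Section [{section}] not found in {path}")
    return dict(parser.items(section))
```

The resulting dictionary could then be passed to a database driver, for example `psycopg2.connect(**params)`, provided that package is installed and the keys in the file match the parameter names the driver expects.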

### Data preparation

To run the code in this repository we first need to prepare a local copy of the required data. We use another untracked directory (`data`) to hold local copies of the input data.

Scripts for data preparation are documented in folder `data-preparation`:
- [Download files from S3 bucket](data-preparation/upload-files-to-S3-bucket.ipynb)
- [Download and explore Austraits records from Zenodo](data-preparation/Explore-Austraits-records.ipynb)
- [Explore BIONET data](data-preparation/Read-BIONET-data.ipynb)

### Import from existing sources

The folder `import-existing-sources` includes the following notebooks:

- [Import records from Austraits data](import-existing-sources/import-records-from-Austraits-data.ipynb)

- `Import-austraits-build`:
- `Import-NSWFRD`:

These scripts were created to import existing data.
`NSWFRD` refers to the New South Wales Flora Fire Response Database, which is a static spreadsheet. We are using the most recent available version ().
`austraits` refers to the Austraits project, which is an active repository. Scripts were tested with version ... from ... We expect to adapt the code as new versions become available.
`austraits-build` contains the source data and source code to build the complete Austraits database. It allows observations to be imported directly from the original data sources.


### `Field-forms`

This folder contains scripts to import field data from spreadsheets. The spreadsheets or workbooks in XLSX format were provided by Prof. David Keith, [FAA](https://www.science.org.au/profile/david-keith), but were created by different observers. The scripts were adapted to import the spreadsheets as given, with minimal editing of the original files.

[Field-forms/](/workflow/Field-forms): code for reading field-work data from Excel documents

## Jupyter Lab

I use Jupyter Lab to organise and document all the code in notebooks.  

These are the steps I followed to configure a Jupyter Lab environment.


### Create and activate Python environment

#### With venv

This is my preferred method after some frustrations with conda.


Check python version with `python --version`
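The environment can also be created from within Python using the standard-library `venv` module, which is the module that `python -m venv` runs. A minimal sketch, assuming a hypothetical environment directory named `.venv`; the function name `create_environment` is also hypothetical.

```python
import sys
import venv
from pathlib import Path


def create_environment(env_dir=".venv", with_pip=True):
    """Create a virtual environment in env_dir and return its path.

    The directory name is an assumption chosen for illustration.
    """
    env_path = Path(env_dir)
    # Same information as `python --version`
    print("Python version:", sys.version.split()[0])
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_path)
    return env_path
```

After creating the environment, activate it from the shell with `source .venv/bin/activate` (on Linux or macOS) before installing modules.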

Update and install modules with:

#### Alternative with Conda

Of course, you can still use Conda **instead** of venv.

I followed these steps to a) create a new environment with conda, and b) install the appropriate Python modules and R packages.

### Installing python libraries
Using the `venv` environment, it is easier to install packages with `pip`:

### Connecting to PostgreSQL

To connect to the PostgreSQL database we need a client installed on the local computer.
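Whether a client is already available can be checked from Python before trying to connect. A minimal sketch using only the standard library; the function name `find_client` is hypothetical.

```python
import shutil


def find_client(command="psql"):
    """Return the full path of a command-line client, or None if it is not on the PATH."""
    return shutil.which(command)


client = find_client()
if client is None:
    print("psql was not found on the PATH; install a PostgreSQL client first.")
else:
    print(f"psql found at {client}")
```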

For postgresql connection on mac, use: https://postgresapp.com, download and then 

Restart the terminal, then:

### Adding the R kernel
Activate the right R kernel for Jupyter lab with:

### Loading own functions
If we want to share functions between notebooks, one option is to start the Jupyter Lab interface with the `PYTHONPATH` environment variable set. Something like this:

Alternatively, we can update the path in one of the cells of the notebook, something like this:

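```python
import pyprojroot
import sys
# Locate the repository root (the folder containing `.git`)
# and add it to the module search path
repodir = pyprojroot.find_root(pyprojroot.has_dir(".git"))
sys.path.append(str(repodir))
```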
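If `pyprojroot` is not installed, the same idea can be sketched with the standard library only: walk up from the current directory until a folder containing `.git` is found. This is an illustrative alternative, not the approach used in the notebooks, and the function names `find_repo_root` and `add_repo_to_path` are hypothetical.

```python
import sys
from pathlib import Path


def find_repo_root(start=None, marker=".git"):
    """Return the first directory at or above `start` that contains `marker`."""
    current = Path(start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No directory containing {marker!r} found above {current}")


def add_repo_to_path(start=None):
    """Append the repository root to sys.path (only once) and return it."""
    repodir = find_repo_root(start)
    if str(repodir) not in sys.path:
        sys.path.append(str(repodir))
    return repodir
```

Calling `add_repo_to_path()` in the first cell of a notebook would then make the `lib` folder importable, assuming the notebook is run from somewhere inside the repository.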

### Version control with Jupyter

Version control of Jupyter notebooks can be problematic when copies of the same notebook are edited concurrently in different sessions.

Look at some recommendations here:
<https://nextjournal.com/schmudde/how-to-version-control-jupyter>
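One common way to reduce such conflicts, independent of the recommendations linked above, is to clear outputs and execution counts before committing, since these change on every run. Notebooks are JSON files, so this can be sketched with the standard library alone. The function name `strip_outputs` is hypothetical, and the sketch assumes notebooks in nbformat version 4; dedicated tools such as `nbstripout` handle more cases.

```python
import json
from pathlib import Path


def strip_outputs(notebook_path):
    """Clear outputs and execution counts of code cells, rewriting the file in place.

    Assumes nbformat version 4, where cells are stored in a top-level "cells" list.
    """
    path = Path(notebook_path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    path.write_text(
        json.dumps(notebook, indent=1, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return notebook
```

Because the file is rewritten in place, it is safer to try this on a copy of a notebook first.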
