# Fireveg DB

This notebook was created by José R. Ferrer-Paris and updated on 2 Aug 2024.

## Cloud storage with an AWS S3 bucket

We use AWS S3 buckets for cloud storage of input and output files. Here we set up a connection with the S3 client, list the objects in the bucket, and download data into our data folder.

See examples here:
- http://datasciencedirectory.com/how-to-connect-to-aws-s3-buckets-with-python/
- https://dev.to/aws-builders/how-to-list-contents-of-s3-bucket-using-boto3-python-47mm
- https://towardsdatascience.com/how-to-upload-and-download-files-from-aws-s3-using-python-2022-4c9b787b15f2
- https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-example-download-file.html

## Set up

First we import some external libraries:

In [12]:
from pathlib import Path
import os
import boto3
import pyprojroot

Now we import a function from our local `lib` folder:

In [13]:
from lib.parseparams import read_s3params

We find the root of the repository folder:

In [14]:
repodir = pyprojroot.find_root(pyprojroot.has_dir(".git"))
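For readers without `pyprojroot`, the same idea can be sketched with `pathlib` alone: walk up from a starting folder until one contains a `.git` directory. The function name `find_repo_root` is hypothetical and this is only an approximation of what the library call above does, not its actual implementation.

```python
from pathlib import Path


def find_repo_root(start, marker=".git"):
    """Walk upwards from `start` until a directory containing `marker` is found."""
    start = Path(start).resolve()
    # Check the starting folder first, then each of its ancestors in turn.
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_dir():
            return candidate
    raise RuntimeError(f"No directory containing {marker!r} found above {start}")
```

Calling `find_repo_root(Path.cwd())` from anywhere inside the repository should return the same folder as `repodir`, assuming `.git` is a regular directory.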

We parse connection parameters from a text file:

In [15]:
filename = repodir / 'secrets' / 's3info'
s3params = read_s3params(filename)
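Neither the layout of `secrets/s3info` nor the code of `read_s3params` is shown in this notebook. As an illustration only, the sketch below assumes a hypothetical file with one `name=value` pair per line and returns a dictionary with the three keys used later (`region`, `key`, `secret`). The function name `read_params_sketch` and the file format are assumptions, not the project's actual implementation.

```python
from pathlib import Path


def read_params_sketch(path):
    """Parse 'name=value' lines into a dict, skipping blanks and '#' comments."""
    params = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Expected 'name=value', got: {line!r}")
        params[name.strip()] = value.strip()
    # The notebook reads these three entries when creating the client.
    missing = {"region", "key", "secret"} - params.keys()
    if missing:
        raise KeyError(f"Missing parameters: {sorted(missing)}")
    return params
```

Whatever the real format is, the file should stay out of version control, since it holds the access key and secret.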

Start a connection to S3 using the connection parameters: 

In [16]:
s3 = boto3.client('s3',
                  region_name=s3params['region'],
                  aws_access_key_id=s3params['key'],
                  aws_secret_access_key=s3params['secret'])

## Step 1: download from bucket

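If the bucket already has content, the following lists the objects in it. Note that a single `list_objects_v2` call returns at most 1,000 keys, and the response has no `Contents` entry when the bucket is empty.
Our bucket is called `fireveg-db`: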

In [19]:
objects = s3.list_objects_v2(Bucket = 'fireveg-db')

for obj in objects['Contents']:
    print(obj['Key'])

input-field-form/Fire response quadrat survey Newnes Nov2020_DK_revised IDs+AllNovData.xlsm
input-field-form/PlantFireTraitData_2011-2018_Import.xlsx
input-field-form/PlantFireTraitData_2011-2018_Import_AdditionalSiteInfo.xlsx
input-field-form/RobertsonRF_data_bionet2.xlsx
input-field-form/SthnNSWRF_data_bionet2.xlsx
input-field-form/UNSWFireVegResponse_UplandBasalt_AlexThomsen+DK.xlsx
input-field-form/UNSW_VegFireResponse_AlpineBogs_reformat_Sep2021.xlsx
input-field-form/UNSW_VegFireResponse_DataEntry_Yatteyattah all +DK +Milton.xlsx
input-field-form/UNSW_VegFireResponse_DataEntry_Yatteyattah all +DK +Milton_revisedfields_Mar2022.xlsx
input-field-form/UNSW_VegFireResponse_KNP AlpAsh.xlsx
input-field-form/UNSW_VegFireResponse_KNP AlpAsh_firehistupdate.xlsx
input-field-form/UNSW_VegFireResponse_RMK_reformat_Sep2021a.xlsx
output-report/fireveg-field-report-model.xlsx
output-report/fireveg-trait-records-curation.xlsx
output-report/fireveg-trait-records-model.xlsx
output-report/fireveg-tra
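Because one `list_objects_v2` response is capped at 1,000 keys, a larger bucket needs several requests. A minimal sketch using the boto3 paginator is shown here; the helper name `iter_keys` is hypothetical, and `client` is assumed to be an S3 client like the `s3` object created earlier.

```python
def iter_keys(client, bucket, prefix=""):
    """Yield every object key in `bucket` that starts with `prefix`."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page when nothing matches.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

For example, `list(iter_keys(s3, 'fireveg-db', prefix='input-field-form/'))` would list only the keys under that folder, without filtering on the client side.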

However, we only want to download the input field forms into the data folder.

<div class="alert alert-warning">
On terra I need to pass `str(filename)`, but on Auyantepui we could pass `filename` directly as the last argument of `s3.download_file`.
</div>

In [20]:
for obj in objects['Contents']:
    # Keys look like '<folder>/<file name>'
    okey = obj['Key'].split("/")
    if okey[0] == 'input-field-form':
        filename = repodir / "data" / okey[0] / okey[1]
        # Only download files that are not already in the data folder
        if filename.is_file():
            print("file ", okey[1],"already present")
        else:
            print("download file ", okey[1])
            s3.download_file('fireveg-db', obj['Key'], str(filename))

file  Fire response quadrat survey Newnes Nov2020_DK_revised IDs+AllNovData.xlsm already present
download file  PlantFireTraitData_2011-2018_Import.xlsx
download file  PlantFireTraitData_2011-2018_Import_AdditionalSiteInfo.xlsx
file  RobertsonRF_data_bionet2.xlsx already present
file  SthnNSWRF_data_bionet2.xlsx already present
file  UNSWFireVegResponse_UplandBasalt_AlexThomsen+DK.xlsx already present
file  UNSW_VegFireResponse_AlpineBogs_reformat_Sep2021.xlsx already present
file  UNSW_VegFireResponse_DataEntry_Yatteyattah all +DK +Milton.xlsx already present
file  UNSW_VegFireResponse_DataEntry_Yatteyattah all +DK +Milton_revisedfields_Mar2022.xlsx already present
file  UNSW_VegFireResponse_KNP AlpAsh.xlsx already present
file  UNSW_VegFireResponse_KNP AlpAsh_firehistupdate.xlsx already present
file  UNSW_VegFireResponse_RMK_reformat_Sep2021a.xlsx already present
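The loop above assumes that every key has exactly two parts and that the local `data/input-field-form` folder already exists. A slightly more defensive variant could move the path logic into a small helper, sketched below. The name `local_target` and the choice to skip nested keys are assumptions for illustration.

```python
from pathlib import Path


def local_target(key, data_dir, prefix="input-field-form"):
    """Return the local path for `key`, or None if the key should be skipped."""
    parts = key.split("/")
    # Skip other folders, nested keys, and the empty 'folder placeholder' key.
    if len(parts) != 2 or parts[0] != prefix or not parts[1]:
        return None
    target = Path(data_dir) / parts[0] / parts[1]
    # Create the destination folder up front so the download has somewhere to write.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target
```

Inside the download loop, `filename = local_target(obj['Key'], repodir / "data")` would replace the manual split, and keys returning `None` would simply be skipped.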


## Step 2: upload to bucket

These are the steps to upload files from our local folder to the S3 bucket.

In [17]:
inputdir = repodir / "data" / "field-form"

for fp in inputdir.glob('*.xls[mx]'):
    targetname = fp.name
    # Skip temporary files that start with '~' (e.g. Excel lock files)
    if not targetname.startswith('~'):
        print(targetname)
        with open(fp, "rb") as f:
            s3.upload_fileobj(f, 'fireveg-db', 'input-field-form/' + targetname)




PlantFireTraitData_2011-2018_Import_AdditionalSiteInfo.xlsx
PlantFireTraitData_2011-2018_Import.xlsx
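The same upload can be wrapped in a reusable function. The sketch below uses `upload_file`, which takes a local path instead of an open file object, and returns the keys it uploaded. The function name `upload_spreadsheets` and its parameters are hypothetical; `client` is assumed to be an S3 client like `s3`.

```python
from pathlib import Path


def upload_spreadsheets(client, folder, bucket, prefix):
    """Upload .xlsx/.xlsm files from `folder` to `bucket` under `prefix`; return the keys."""
    uploaded = []
    for fp in sorted(Path(folder).glob("*.xls[mx]")):
        # Skip temporary files starting with '~' (e.g. Excel lock files).
        if fp.name.startswith("~"):
            continue
        key = f"{prefix}/{fp.name}"
        client.upload_file(str(fp), bucket, key)
        uploaded.append(key)
    return uploaded
```

With the names used in this notebook, the call would be `upload_spreadsheets(s3, repodir / "data" / "field-form", 'fireveg-db', 'input-field-form')`.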


In [9]:
inputdir = repodir / "data" 
os.listdir(inputdir)

['output-report',
 '.DS_Store',
 'input-field-form',
 'field-form',
 '.ipynb_checkpoints']

Example upload of reports to a folder in a bucket:

In [None]:
inputdir = repodir / "data" / "output-report"
# Choose one report to upload; the alternatives are kept commented out
#targetfile = 'fireveg-trait-report-model.xlsx'
#targetfile = 'fireveg-trait-records-model.xlsx'
#targetfile = 'fireveg-field-report-model.xlsx'
targetfile = 'fireveg-trait-records-curation.xlsx'

In [None]:
with open(inputdir / targetfile, "rb") as f:
    s3.upload_fileobj(f, 'fireveg-db', 'output-report/' + targetfile)

Upload all original field forms to an input folder