# Fireveg DB - Cloud storage with an AWS S3 bucket

Author: [José R. Ferrer-Paris](https://github.com/jrfep) 

Date: 2 Aug 2024.


For this repository we use AWS S3 buckets for cloud storage of input and output files. In this notebook we set up a connection with the S3 client, list the objects in the bucket, download data into our local data folder, and upload local files to the bucket. To run this code you will need the AWS credentials for the target bucket.

***Contents***

- [Set up](#Set-up)
- [Step 1: list bucket content](#Step-1:-List-bucket-content)
- [Step 2: download files](#Step-2:-Download-from-bucket)
- [Step 3: upload files](#Step-3:-upload-to-bucket)
- [Notes](#Notes:)

## Set up

First we import the modules that we need:

In [1]:
from pathlib import Path
import os
import sys
import boto3
import pyprojroot

We find the root of the repository folder and add it to the module search path (`sys.path`):

In [2]:
repodir = pyprojroot.find_root(pyprojroot.has_dir(".git"))
sys.path.append(str(repodir))

Now we import a function from our local `lib` folder:

In [3]:
from lib.parseparams import read_s3params

<div class="alert alert-info">

***Important!*** We will parse connection parameters from a text file located in our `secrets` folder and with the following format:

```sh
[sectionname]
key=...
secret=...
region=...
```

Refer to the Jupyter lab instructions file in this repository's root.
</div>

We parse connection parameters from this text file:

In [4]:
filename = repodir / 'secrets' / 's3info'
s3params = read_s3params(filename)
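
The `read_s3params` helper lives in the local `lib` folder and its code is not shown in this notebook. As a rough idea of what such a parser could look like, here is a minimal sketch built on the standard `configparser` module. The function name `read_s3params_sketch`, the optional `section` argument and the validation of the three entries are assumptions made for illustration; the actual helper may work differently.

```python
import configparser


def read_s3params_sketch(filename, section=None):
    """Hypothetical stand-in for lib.parseparams.read_s3params."""
    # interpolation=None keeps any '%' characters in the values untouched
    parser = configparser.ConfigParser(interpolation=None)
    # open() raises if the file is missing, unlike parser.read()
    with open(filename) as fh:
        parser.read_file(fh)
    if not parser.sections():
        raise ValueError(f"no section found in {filename}")
    # Assumption: use the first section when none is requested
    section = section or parser.sections()[0]
    params = dict(parser[section])
    missing = {"key", "secret", "region"} - params.keys()
    if missing:
        raise KeyError(f"missing entries in [{section}]: {sorted(missing)}")
    return params
```

The returned dictionary has the `key`, `secret` and `region` entries that are passed to `boto3.client` in the next cell.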

Start a connection to S3 using the connection parameters: 

In [5]:
s3 = boto3.client('s3',
                  region_name=s3params['region'],
                  aws_access_key_id=s3params['key'],
                  aws_secret_access_key=s3params['secret'])

## Step 1: List bucket content

If the bucket already has content, the following cell lists the objects stored in it (a single `list_objects_v2` call returns at most 1000 objects).
Our bucket is called `fireveg-db`:

In [6]:
objects = s3.list_objects_v2(Bucket = 'fireveg-db')

for obj in objects['Contents']:
    print(obj['Key'])

input-field-form/Fire response quadrat survey Newnes Nov2020_DK_revised IDs+AllNovData.xlsm
input-field-form/PlantFireTraitData_2011-2018_Import.xlsx
input-field-form/PlantFireTraitData_2011-2018_Import_AdditionalSiteInfo.xlsx
input-field-form/RobertsonRF_data_bionet2.xlsx
input-field-form/SthnNSWRF_data_bionet2.xlsx
input-field-form/UNSWFireVegResponse_UplandBasalt_AlexThomsen+DK.xlsx
input-field-form/UNSW_VegFireResponse_AlpineBogs_reformat_Sep2021.xlsx
input-field-form/UNSW_VegFireResponse_DataEntry_Yatteyattah all +DK +Milton.xlsx
input-field-form/UNSW_VegFireResponse_DataEntry_Yatteyattah all +DK +Milton_revisedfields_Mar2022.xlsx
input-field-form/UNSW_VegFireResponse_KNP AlpAsh.xlsx
input-field-form/UNSW_VegFireResponse_KNP AlpAsh_firehistupdate.xlsx
input-field-form/UNSW_VegFireResponse_RMK_reformat_Sep2021a.xlsx
output-report/fireveg-field-report-model.xlsx
output-report/fireveg-trait-records-curation.xlsx
output-report/fireveg-trait-records-model.xlsx
output-report/fireveg-tra
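
A single `list_objects_v2` call returns at most 1000 keys, and the response has no `Contents` entry when nothing matches. For larger or possibly empty buckets, a paginator is a safer way to collect the keys. The sketch below is one way to do it; the function name `list_all_keys` and its `prefix` argument are illustrative choices, not part of the original notebook.

```python
def list_all_keys(client, bucket, prefix=""):
    """Return every object key in `bucket` that starts with `prefix`."""
    paginator = client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent when a page (or the whole bucket) is empty
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

With the client created above, `list_all_keys(s3, 'fireveg-db', prefix='input-field-form/')` would return only the keys of the input field forms.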

## Step 2: Download from bucket

We can use this listing to download the input field forms into the local data folder:

<div class="alert alert-warning">

***Watch out:*** with some Python versions the last argument of `s3.download_file` must be a string, so we pass `str(filename)` instead of the `Path` object `filename` that we used originally.

</div>

In [7]:
for obj in objects['Contents']:
    okey = obj['Key'].split("/")
    if okey[0] == 'input-field-form':
        filename = repodir / "data" / okey[0] / okey[1]
        if (os.path.isfile(filename)):
            print("file ", okey[1],"already present")
        else:
            print("download file ", okey[1])
            s3.download_file('fireveg-db', obj['Key'], str(filename ))

file  Fire response quadrat survey Newnes Nov2020_DK_revised IDs+AllNovData.xlsm already present
file  PlantFireTraitData_2011-2018_Import.xlsx already present
file  PlantFireTraitData_2011-2018_Import_AdditionalSiteInfo.xlsx already present
file  RobertsonRF_data_bionet2.xlsx already present
file  SthnNSWRF_data_bionet2.xlsx already present
file  UNSWFireVegResponse_UplandBasalt_AlexThomsen+DK.xlsx already present
file  UNSW_VegFireResponse_AlpineBogs_reformat_Sep2021.xlsx already present
file  UNSW_VegFireResponse_DataEntry_Yatteyattah all +DK +Milton.xlsx already present
file  UNSW_VegFireResponse_DataEntry_Yatteyattah all +DK +Milton_revisedfields_Mar2022.xlsx already present
file  UNSW_VegFireResponse_KNP AlpAsh.xlsx already present
file  UNSW_VegFireResponse_KNP AlpAsh_firehistupdate.xlsx already present
file  UNSW_VegFireResponse_RMK_reformat_Sep2021a.xlsx already present
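
The loop above assumes that the target folder already exists and that every key has exactly one `/`. As a hedged variation, the same logic can be wrapped in a small function that creates missing folders and skips "folder" placeholder keys ending in `/`. The name `download_missing` and its arguments are hypothetical; only `download_file` is a boto3 client method.

```python
from pathlib import Path


def download_missing(client, bucket, keys, target_dir):
    """Download each key into target_dir unless the file already exists."""
    target_dir = Path(target_dir)
    downloaded = []
    for key in keys:
        if key.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        local_path = target_dir / key
        if local_path.is_file():
            continue  # already present locally
        # download_file does not create missing directories
        local_path.parent.mkdir(parents=True, exist_ok=True)
        client.download_file(bucket, key, str(local_path))
        downloaded.append(key)
    return downloaded
```

In this notebook the call would look like `download_missing(s3, 'fireveg-db', keys, repodir / "data")`, where `keys` is the list of keys under `input-field-form/`.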


## Step 3: upload to bucket

What is the content of our local data folder?

In [9]:
inputdir = repodir / "data" 
os.listdir(inputdir)

['output-report',
 '.DS_Store',
 'input-field-form',
 'field-form',
 '.ipynb_checkpoints']

We can upload files from our local folder to the S3 bucket:

In [10]:
inputdir = repodir / "data" / "field-form"

for fp in inputdir.glob('*.xls[mx]'):
    targetname=os.path.basename(fp)
    if not targetname.startswith('~'):
        print(targetname)
        with open(fp, "rb") as f:
            s3.upload_fileobj(f, 'fireveg-db', 'input-field-form/' + targetname)




PlantFireTraitData_2011-2018_Import_AdditionalSiteInfo.xlsx
PlantFireTraitData_2011-2018_Import.xlsx
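
The cell above uploads every matching spreadsheet each time it runs, overwriting objects with the same key. If re-uploading is not wanted, one option is to skip keys that are already in the bucket. The sketch below assumes a hypothetical helper `upload_spreadsheets` that receives the list of existing keys; it uses the boto3 client method `upload_file`, which takes a local file name instead of an open file object.

```python
from pathlib import Path


def upload_spreadsheets(client, bucket, source_dir, prefix, existing_keys=()):
    """Upload .xlsx/.xlsm files from source_dir, skipping keys already stored."""
    existing = set(existing_keys)
    uploaded = []
    for path in sorted(Path(source_dir).glob("*.xls[mx]")):
        if path.name.startswith("~"):
            continue  # temporary lock files created by Excel
        key = prefix + path.name
        if key in existing:
            continue  # object already in the bucket, do not overwrite
        client.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded
```

The function returns the keys it uploaded, so the result can be printed or logged in the same way as the file names above.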


----
### ***Notes:***

This code was inspired by and adapted from several examples online. Please check these links if you want to learn more about this approach:
- http://datasciencedirectory.com/how-to-connect-to-aws-s3-buckets-with-python/
- https://dev.to/aws-builders/how-to-list-contents-of-s3-bucket-using-boto3-python-47mm
- https://towardsdatascience.com/how-to-upload-and-download-files-from-aws-s3-using-python-2022-4c9b787b15f2
- https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-example-download-file.html


These are example uploads of reports to a folder in a bucket:

Upload all original field forms to an input folder