# Using EASI scratch and project buckets <img align="right" src="../resources/csiro_easi_logo.png">

EASI has a **Scratch** bucket available for all users.
- **Scratch** means temporary: all files will be deleted after 30 days.
- Use the scratch bucket to save files between processing runs or share files between projects, temporarily.
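
As a rough guide to the 30-day limit, the sketch below estimates how long an object has left before it is deleted. It assumes the 30 days are counted from the object's `LastModified` timestamp (a timezone-aware datetime in `boto3` listings), which this notebook does not confirm. The names `RETENTION` and `time_remaining` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Retention period stated above. Counting from LastModified is an assumption.
RETENTION = timedelta(days=30)

def time_remaining(last_modified, now=None):
    """Return the time left before an object is expected to be deleted.

    A negative result means the object is already past the retention period.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (last_modified + RETENTION) - now

# Example with fixed, made-up timestamps
last_modified = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 1, 21, tzinfo=timezone.utc)
print(time_remaining(last_modified, now).days)  # 10
```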

**Project** buckets are also available to selected users. A project bucket can exist in another AWS account and be cross-linked to EASI. An EASI admin assigns users to a "project", which grants them access to the bucket. Files in a project bucket are subject to the bucket owner's lifecycle rules, administration and costs.

> Cross-account **project** buckets may benefit from additional ACL settings. See [User Guide/08-cross-account-storage-usage](https://docs.csiro.easi-eo.solutions/user-guide/users-guide/08-cross-account-storage-usage/) (in your deployment).
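
One common ACL setting for cross-account uploads is the canned ACL `bucket-owner-full-control`, which `upload_file` accepts through its `ExtraArgs` parameter. The sketch below is only an assumption about what a project bucket might need; check the linked guide and ask the bucket owner before relying on it. The function name `upload_with_owner_control` is hypothetical.

```python
def upload_with_owner_control(client, filename, bucket, key):
    """Upload a local file and grant the bucket owner full control of the object.

    `client` is a boto3 S3 client, e.g. boto3.client('s3').
    Whether this ACL is required depends on the bucket owner's settings.
    """
    client.upload_file(
        filename,
        bucket,
        key,
        ExtraArgs={'ACL': 'bucket-owner-full-control'},
    )
```

With a client, a bucket name and a key in hand, a call would look like `upload_with_owner_control(client, '/home/jovyan/test-file.txt', bucket, key)`.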

Glossary:
- S3 storage items are called **objects**. Typically these are files but they could be any blob of data.
- An **object**'s name is its **key**. The **key** can be [just about any string](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html). Typically we include a `/` in the key to make it look like a directory path, which we're familiar with from regular file systems.
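
Because a key is just a string, the "directory" part is only a naming convention. A minimal sketch of splitting a key into a directory-like prefix and a file-like name, using a made-up key:

```python
import posixpath

def split_key(key):
    """Split an S3 key into a directory-like prefix and a file-like name."""
    return posixpath.split(key)

key = 'EXAMPLEUSERID/outputs/2024/result.tif'  # hypothetical key
prefix, name = split_key(key)
print(prefix)  # EXAMPLEUSERID/outputs/2024
print(name)    # result.tif
```

`posixpath` is used rather than `os.path` so that `/` is always the separator, whatever the local operating system.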

There are two AWS tools that can be used to read from and write to a **scratch** or **project** bucket. Examples for both are given in this notebook.
- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-services-s3-commands.html) - command-line program (use in a terminal)
- [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) - Python library (use in code)

We show *writing* first so that a test file exists for the *reading* section.

- [Writing](#Writing)
   - [User ID](#User-ID)
   - [Select a test file](#Select-a-test-file)
   - [Upload a file](#Upload-a-file)
- [Reading](#Reading)
   - [List objects](#List-objects)
   - [Read a file directly](#Read-a-file-directly)
   - [Copy a file to local](#Copy-a-file-to-local)

## Imports and setup

In [None]:
import sys, os
import boto3
from datetime import datetime as dt

# EASI tools
repo = f'{os.environ["HOME"]}/easi-notebooks'  # No easy way to get repo directory
if repo not in sys.path: sys.path.append(repo)
from easi_tools import EasiNotebooks

In [None]:
client = boto3.client('s3')

this = EasiNotebooks('csiro')
bucket = this.scratch

In [None]:
# Optional, for parallel uploads and downloads of large files
# Add a (..., Config=config) parameter to the relevant upload and download functions
# Sizes are in bytes. S3 multipart parts must be at least 5 MiB (except the last part).

# from boto3.s3.transfer import TransferConfig
# config = TransferConfig(
#     multipart_threshold = 25 * 1024 * 1024,  # 25 MiB
#     max_concurrency = 10,
#     multipart_chunksize = 25 * 1024 * 1024,  # 25 MiB
#     use_threads = True
# )

## Writing

### User ID

To write to the **scratch** bucket, the root of the key must be your AWS **User ID**. That is, every key you write must start with `<User ID>/`.

For a **project** bucket this restriction may not apply. Any root key conditions are managed by the bucket owner.
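
The key rule for the scratch bucket can be wrapped in a small helper. This is a sketch only: `scratch_key` is a hypothetical name, and the user ID shown is a placeholder rather than a real value.

```python
def scratch_key(userid, *parts):
    """Build a scratch-bucket key rooted at the given User ID.

    Leading and trailing slashes on each part are dropped so the key
    never contains empty path segments.
    """
    cleaned = [p.strip('/') for p in parts if p.strip('/')]
    if not cleaned:
        raise ValueError('At least one non-empty key part is required')
    return '/'.join([userid, *cleaned])

print(scratch_key('EXAMPLEUSERID', 'runs/', '/test-file.txt'))
# EXAMPLEUSERID/runs/test-file.txt
```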

In [None]:
%%bash

userid=$(aws sts get-caller-identity --query 'UserId' --output text)
echo "$userid"

In [None]:
userid = boto3.client('sts').get_caller_identity()['UserId']
print(userid)

### Select a test file

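Choose a local path for a small test file used in the rest of this notebook. The `touch` command creates it as an empty file if it does not already exist.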

In [None]:
testfile = '/home/jovyan/test-file.txt'

In [None]:
%%bash -s "$testfile"
 
testfile=$1
touch $testfile
ls -l $testfile

### Upload a file

In [None]:
%%bash -s "$bucket" "$userid" "$testfile"

bucket=$1
userid=$2
testfile=$3

aws s3 cp ${testfile} s3://${bucket}/${userid}/

In [None]:
target = testfile.split('/')[-1]
try:
    print(f'upload: {testfile} to s3://{bucket}/{userid}/{target}')
    r = client.upload_file(testfile, bucket, f'{userid}/{target}')
    print('Success.')
except Exception as e:
    print(e)
    print('Failed.')

## Reading

### List objects

The boto3 client method `list_objects_v2` returns at most 1000 keys per call. Two options are shown here.
1. Basic use of `list_objects_v2`
2. Paginated list objects, for potentially >1000 keys
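
To see what pagination involves, the sketch below follows the continuation tokens by hand. When a `list_objects_v2` response has `IsTruncated` set, it includes a `NextContinuationToken` that is passed back as `ContinuationToken` to fetch the next batch. The function name `iter_objects` is illustrative; in practice the paginator does the same job with less code.

```python
def iter_objects(client, bucket, prefix):
    """Yield every object entry under a prefix, following continuation tokens.

    `client` is a boto3 S3 client, e.g. boto3.client('s3').
    """
    kwargs = {'Bucket': bucket, 'Prefix': prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # 'Contents' is absent when no keys match the prefix
        yield from response.get('Contents', [])
        if not response.get('IsTruncated'):
            break
        kwargs['ContinuationToken'] = response['NextContinuationToken']
```

For example, `keys = [c['Key'] for c in iter_objects(client, bucket, f'{userid}/')]` collects all keys under your User ID.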

In [None]:
%%bash -s "$bucket" "$userid"

bucket=$1
userid=$2

aws s3 ls s3://${bucket}/${userid}/

In [None]:
# Basic use of list_objects_v2

response = client.list_objects_v2(Bucket=bucket, Prefix=f'{userid}/')

# from pprint import pprint
# pprint(response)

# List each key with its last modified time stamp
if 'Contents' in response:
    for c in response['Contents']:
        key = c['Key']
        lastmodified = c['LastModified'].strftime('%Y-%m-%d %H:%M:%S')  # year-month-day
        size = c['Size']
        print(f'{lastmodified}\t{size} {key}')

In [None]:
# Paginated list objects, for potentially >1000 keys

paginator = client.get_paginator('list_objects_v2')
page_iterator = paginator.paginate(Bucket=bucket, Prefix=f'{userid}/')

for response in page_iterator:
    if 'Contents' in response:
        for c in response['Contents']:
            key = c['Key']
            lastmodified = c['LastModified'].strftime('%Y-%m-%d %H:%M:%S')  # year-month-day
            size = c['Size']
            print(f'{lastmodified}\t{size} {key}')

### Read a file directly

Many data reading packages can read a file from an *s3://bucket/key* path into memory. Examples include:
- `rasterio` and `rioxarray`
- `gdal`

For packages that cannot read from an S3 path, first copy the file to your home directory or to a temporary directory (e.g., on a dask worker). Then read the file with a normal file path.
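
The copy-then-read pattern can be sketched with a temporary directory that is cleaned up automatically. The names `read_via_tempdir` and `reader` are hypothetical; `reader` stands for any function that takes a local file path and returns the loaded data.

```python
import os
import tempfile

def read_via_tempdir(client, bucket, key, reader):
    """Download an object to a temporary directory and read it from there.

    `client` is a boto3 S3 client. The temporary copy is deleted when this
    function returns, so `reader` must load the data fully into memory.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        local_path = os.path.join(tmpdir, os.path.basename(key))
        client.download_file(bucket, key, local_path)
        return reader(local_path)
```

For a text file, a call could look like `read_via_tempdir(client, bucket, f'{userid}/test-file.txt', lambda p: open(p).read())`. Readers that load data lazily would need the file to outlive the function, so this pattern does not suit them.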

### Copy a file to local

In [None]:
%%bash -s "$bucket" "$userid" "$testfile"

bucket=$1
userid=$2
testfile=$3

source=`basename $testfile`
aws s3 cp s3://${bucket}/${userid}/${source} ${testfile}
ls -l $testfile

In [None]:
source = testfile.split('/')[-1]
try:
    print(f'download: s3://{bucket}/{userid}/{source} to {testfile}')
    r = client.download_file(bucket, f'{userid}/{source}', testfile)
    print('Success.')
except Exception as e:
    print(e)
    print('Failed.')