# EASI User Scratch bucket <img align="right" src="../resources/csiro_easi_logo.png">
 
EASI has a "scratch" bucket available for all users. "Scratch" means temporary: all files __will be deleted after 30 days__. Use the scratch bucket to temporarily save files between processing runs, or to share files between projects.
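To get a rough idea of how long an object has left, the sketch below adds the 30-day retention period to the object's `LastModified` time, which `list_objects_v2` returns. It assumes deletion is based on when each object was last written. The exact timing depends on how the deletion is configured (if it is an S3 lifecycle rule, which is an assumption here, S3 rounds expiry to the next midnight UTC), so treat the result as an estimate. The function names are illustrative and are not part of EASI or boto3.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: objects are deleted roughly this many days after they were last written
SCRATCH_RETENTION_DAYS = 30

def approx_expiry(last_modified: datetime, days: int = SCRATCH_RETENTION_DAYS) -> datetime:
    """Estimate when a scratch object will be deleted (last_modified + days)."""
    return last_modified + timedelta(days=days)

def days_left(last_modified: datetime, now: Optional[datetime] = None) -> float:
    """Estimated days remaining before deletion; a negative value means it is overdue."""
    if now is None:
        # LastModified values returned by boto3 are timezone-aware (UTC)
        now = datetime.now(timezone.utc)
    return (approx_expiry(last_modified) - now).total_seconds() / 86400
```

For example, `days_left(c['LastModified'])` can be called for each entry `c` in a listing's `Contents`.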

If you need to share files and resources between projects for longer than 30 days, consider provisioning a "Project" bucket. Contact the admins for details.

Glossary:
- S3 storage items are called "**objects**". Typically these are files but they could be any blob of data.
- An object's name is its "**key**". The key can be just about any string. Typically we include a `/` in the key to make it look like a directory path, which is familiar from regular file systems.
- `boto3` is the AWS SDK for Python. We use it to interact with S3 and other AWS services.
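Because keys are plain strings, a "folder" is only a shared key prefix. The sketch below groups keys by their first `/`-separated component, which is roughly what `list_objects_v2` does server-side when you pass `Delimiter='/'` (the groups come back under `CommonPrefixes`). `group_by_top_level` is a hypothetical helper for illustration only.

```python
from collections import defaultdict

def group_by_top_level(keys):
    """Group S3 keys by their first '/'-separated component.

    Keys containing '/' are grouped under 'top/'; keys without one go under ''.
    """
    groups = defaultdict(list)
    for key in keys:
        top, sep, _rest = key.partition('/')
        groups[top + sep if sep else ''].append(key)
    return dict(groups)
```

For example, `group_by_top_level(['alice/a.txt', 'alice/sub/b.tif', 'readme.txt'])` puts both `alice/...` keys under `'alice/'`, even though no `alice` directory object exists.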

Best practice:
1. Prefix your object keys with a name that is unique to you or your project. This keeps the bucket organised and makes it easy to find and reference your files later.
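A minimal sketch of this practice, assuming you choose one prefix (for example the AWS user ID generated later in this notebook) and join the remaining parts with `/`. `make_key` is a hypothetical helper, not a boto3 function. It strips stray slashes so you don't accidentally create empty components such as `prefix//file.txt`, which S3 treats as a different key from `prefix/file.txt`.

```python
def make_key(prefix, *parts):
    """Join a prefix and path parts into an S3 key such as 'prefix/a/b.txt'."""
    # Remove leading/trailing '/' from each component, then drop empty ones
    cleaned = [p.strip('/') for p in (prefix, *parts)]
    cleaned = [p for p in cleaned if p]
    if not cleaned:
        raise ValueError('at least one non-empty key component is required')
    return '/'.join(cleaned)
```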

### Imports and setup

In [None]:
import sys, os
sys.path.append(os.path.expanduser('../scripts'))
os.environ['USE_PYGEOS'] = '0'
from easi_tools import EasiDefaults
easi = EasiDefaults()

In [None]:
import boto3
from datetime import datetime as dt

s3 = boto3.client('s3')
scratch_bucket = easi.scratch_bucket
print(f'Your temporary scratch bucket is s3://{scratch_bucket}')

# Optional for this notebook: create a test file
!touch '/home/jovyan/test-file.txt'

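# Optional, for parallel uploads and downloads of large files
# from boto3.s3.transfer import TransferConfig
# config = TransferConfig(
#     multipart_threshold = 25 * 1024 * 1024,   # use multipart transfers for files larger than 25 MB
#     max_concurrency = 10,                     # maximum number of parallel transfer threads
#     multipart_chunksize = 25 * 1024 * 1024,   # 25 MB per part
#     use_threads = True
# )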
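If you enable the `TransferConfig` above, `multipart_chunksize` determines how many parts a large file is split into. The sketch below does that arithmetic and checks it against two S3 limits: every part except the last must be at least 5 MiB, and a single upload may have at most 10,000 parts. It is only for intuition, because boto3's transfer manager may adjust the chunk size itself. `count_parts` is a hypothetical helper.

```python
MIB = 1024 * 1024
S3_MIN_PART_SIZE = 5 * MIB  # minimum size of every part except the last
S3_MAX_PARTS = 10_000       # maximum number of parts in one multipart upload

def count_parts(file_size, chunksize=25 * MIB):
    """Return how many parts an upload of `file_size` bytes needs at `chunksize` bytes per part."""
    if chunksize < S3_MIN_PART_SIZE:
        raise ValueError('chunksize is below the 5 MiB S3 minimum part size')
    # Integer ceiling division; an empty file still counts as one part
    parts = max(1, (file_size + chunksize - 1) // chunksize)
    if parts > S3_MAX_PARTS:
        raise ValueError(f'{parts} parts exceeds the S3 limit of {S3_MAX_PARTS}; use a larger chunksize')
    return parts
```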

### Generate a name unique to your work

Here we use your AWS UserID, which comes from your AWS credentials (auto-generated by EASI).

You could also use your ident (which AWS does not know about) or a project name.
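If you use an ident or project name instead, it is worth normalising it first. AWS documents letters, digits and a few punctuation marks as safe in object keys, and lists others (such as spaces and `:`) as characters that may need special handling. The hypothetical `safe_prefix` below takes a conservative approach: it keeps letters, digits, `-`, `_` and `.`, and replaces any other run of characters with a single `-`.

```python
import re

def safe_prefix(name):
    """Turn an ident or project name into a conservative S3 key prefix."""
    # Replace each run of characters outside [A-Za-z0-9._-] with one '-'
    cleaned = re.sub(r'[^A-Za-z0-9._-]+', '-', name.strip())
    cleaned = cleaned.strip('-')
    if not cleaned:
        raise ValueError(f'cannot build a prefix from {name!r}')
    return cleaned
```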

In [None]:
userid = boto3.client('sts').get_caller_identity()['UserId']
print(f'Your userid is {userid}')
print(f'Your temporary scratch folder is s3://{scratch_bucket}/{userid}')

## Upload a file

Two methods are shown here.
1. Upload a file by its path, using the `upload_file` function
1. Upload an open binary file object, using `upload_fileobj` inside a `with` block
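`upload_fileobj` accepts any binary file-like object, not only an open file, so you can upload data built in memory without writing it to disk first. The sketch below writes rows to an in-memory CSV and returns an `io.BytesIO`. `rows_to_csv_bytes` is a hypothetical helper.

```python
import csv
import io

def rows_to_csv_bytes(header, rows):
    """Write a header and rows to an in-memory CSV and return a binary file-like object."""
    text = io.StringIO()
    writer = csv.writer(text, lineterminator='\n')
    writer.writerow(header)
    writer.writerows(rows)
    # upload_fileobj reads bytes, so encode the text and wrap it in BytesIO
    buf = io.BytesIO(text.getvalue().encode('utf-8'))
    buf.seek(0)  # make sure reading starts at the beginning
    return buf
```

You could then upload it with `s3.upload_fileobj(buf, scratch_bucket, f'{userid}/table.csv')`, where `table.csv` is an example name.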

In [None]:
# 1. Upload file, using upload_file function. Includes a little more rigour in error catching etc.

import logging
import os
import boto3
from boto3.exceptions import S3UploadFailedError
from botocore.exceptions import ClientError

def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket

    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param object_name: S3 object name. If not specified then the base name of file_name is used
    :return: True if file was uploaded, else False
    """

    # If S3 object_name was not specified, use the file's base name (not its full local path)
    if object_name is None:
        object_name = os.path.basename(file_name)

    # Create a client here: inside a function, locals() cannot see the notebook's `s3`
    s3_client = boto3.client('s3')

    # Upload the file. upload_file wraps S3 errors in S3UploadFailedError.
    try:
        s3_client.upload_file(file_name, bucket, object_name)  # Config=config
    except (ClientError, S3UploadFailedError, FileNotFoundError) as e:
        logging.error(e)
        return False
    return True

jhub_file = '/home/jovyan/test-file.txt'
key = f'{userid}/test-file.txt'
res = upload_file(jhub_file, scratch_bucket, key)
if res:
    print(f'Successfully uploaded s3://{scratch_bucket}/{key}')
else:
    print('Failed.')

In [None]:
# 2. Upload a binary object, using `with` context

# boto3
if 's3' not in locals():
    s3 = boto3.client('s3')

jhub_file = '/home/jovyan/test-file.txt'
with open(jhub_file, 'rb') as f:  # upload_fileobj requires a file opened in binary mode
    try:
        s3.upload_fileobj(f, scratch_bucket, f'{userid}/test-file.txt')  # returns None; raises on failure
        print(f'Successfully uploaded s3://{scratch_bucket}/{userid}/test-file.txt')
    except Exception as e:
        logging.error(e)
        print('Failed.')

## List objects in the scratch bucket

A single call to the S3 client's `list_objects_v2` method returns at most 1000 keys.

Two options are shown here.
1. Basic use of `list_objects_v2`
2. Paginated list objects, for potentially >1000 keys
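Pagination works by following continuation tokens: each `list_objects_v2` response includes `IsTruncated`, and when it is true, its `NextContinuationToken` is passed back as `ContinuationToken` to fetch the next page. The sketch below does this manually. `iter_objects` is a hypothetical helper; in practice boto3's built-in paginator does the same job.

```python
def iter_objects(client, bucket, prefix=''):
    """Yield every object dict under `prefix`, following continuation tokens."""
    kwargs = {'Bucket': bucket, 'Prefix': prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # 'Contents' is absent when a page has no keys
        yield from response.get('Contents', [])
        if not response.get('IsTruncated'):
            break
        kwargs['ContinuationToken'] = response['NextContinuationToken']
```

Usage: `for obj in iter_objects(s3, scratch_bucket, f'{userid}/'): print(obj['Key'])`.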

In [None]:
# Basic use of list_objects_v2

# boto3
if 's3' not in locals():
    s3 = boto3.client('s3')

response = s3.list_objects_v2(
    Bucket=scratch_bucket,
    Prefix=f'{userid}/',
)

# from pprint import pprint
# pprint(response)

if 'Contents' in response:
    for c in response['Contents']:
        key = c['Key']
        lastmodified = c['LastModified'].strftime('%Y-%m-%d %H:%M:%S')
        print(f'{lastmodified}: s3://{scratch_bucket}/{key}')

In [None]:
# Paginated list objects, for potentially >1000 keys

# boto3
if 's3' not in locals():
    s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
page_iterator = paginator.paginate(Bucket=scratch_bucket, Prefix=f'{userid}/')

for response in page_iterator:
    if 'Contents' in response:
        for c in response['Contents']:
            key = c['Key']
            lastmodified = c['LastModified'].strftime('%Y-%m-%d %H:%M:%S')
            print(f'{lastmodified}: s3://{scratch_bucket}/{key}')
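Each entry in `Contents` also has a `Size` in bytes, so the listing can tell you how much you are storing in scratch. A minimal sketch; `summarise` and `human_bytes` are hypothetical helpers, and `human_bytes` uses binary units (1 KiB = 1024 bytes).

```python
def summarise(objects):
    """Return (count, total_bytes) for an iterable of list_objects_v2 'Contents' entries."""
    count = 0
    total = 0
    for obj in objects:
        count += 1
        total += obj['Size']
    return count, total

def human_bytes(n):
    """Format a byte count using binary units (B, KiB, MiB, GiB, TiB)."""
    size = float(n)
    for unit in ['B', 'KiB', 'MiB', 'GiB', 'TiB']:
        if size < 1024 or unit == 'TiB':
            return f'{size:.1f} {unit}'
        size /= 1024
```

For example: `count, total = summarise(c for page in paginator.paginate(Bucket=scratch_bucket, Prefix=f'{userid}/') for c in page.get('Contents', []))`, then `print(count, human_bytes(total))`.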

## Retrieve objects from the scratch bucket

In [None]:
# Example: download the first object returned by list_objects_v2
import logging
from pathlib import Path

# boto3
if 's3' not in locals():
    s3 = boto3.client('s3')

response = s3.list_objects_v2(Bucket=scratch_bucket, Prefix=f'{userid}/')
if 'Contents' not in response:
    print(f'No objects found under s3://{scratch_bucket}/{userid}/')
else:
    key = response['Contents'][0]['Key']  # First object (example)

    # Local (JupyterHub) file name, keeping the object's extension
    jhub_file = f'/home/jovyan/{Path(key).stem}-downloaded{Path(key).suffix}'

    try:
        s3.download_file(scratch_bucket, key, jhub_file)  # Config=config
        print(f'Successfully downloaded {jhub_file}')
    except Exception as e:
        logging.error(e)
        print('Failed.')
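Taking the first key is only an example: `list_objects_v2` returns keys in UTF-8 binary (lexicographic) key order, not by date. The sketch below instead picks the most recently modified entry and builds a local file name that keeps the object's extension. Both helpers are hypothetical; `/home/jovyan` is the home directory used throughout this notebook.

```python
from pathlib import Path, PurePosixPath

def newest_object(contents):
    """Return the entry with the latest 'LastModified' from list_objects_v2 'Contents', or None if empty."""
    return max(contents, key=lambda c: c['LastModified'], default=None)

def local_path_for(key, directory='/home/jovyan', suffix='-downloaded'):
    """Map an S3 key such as 'user/run1/out.tif' to '<directory>/out-downloaded.tif'."""
    name = PurePosixPath(key)  # S3 keys always use '/' as the separator
    return Path(directory) / f'{name.stem}{suffix}{name.suffix}'
```

Usage: `obj = newest_object(response.get('Contents', []))`, then, if `obj` is not `None`, `s3.download_file(scratch_bucket, obj['Key'], str(local_path_for(obj['Key'])))`.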