# Uploading original data to the Domino S3 bucket

This notebook is provided for reproducibility and historical reference.
There should be no need to run this notebook.

**This notebook requires credentials available only to Domino employees.**

## Setup

This notebook assumes you have just run `Download-Original-Data.ipynb`. It uploads the data downloaded by that notebook to a Domino S3 bucket.

Environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are required to grant upload permissions to the Domino bucket.
These are available in 1Password from Melanie Veale. Alternatively, new credentials can be created for the `deep-learning-trainer` user in `domino-workshop`.
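
If either variable is missing, `os.environ[...]` raises a bare `KeyError`. A minimal sketch of a check that could be run first to get a clearer message is shown here; the helper name `missing_credentials` is hypothetical and not part of the original notebook.

```python
import os

# The two variables this notebook reads when creating the boto3 client.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env=os.environ):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration only.
    """
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

This only checks that the variables are present. It does not confirm that the credentials actually grant upload permission to the bucket.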

In [1]:
import os
import boto3

client = boto3.client(
    's3',
    aws_access_key_id = os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key = os.environ['AWS_SECRET_ACCESS_KEY']
)

In [2]:
ORIGINAL_DATA_PATH = f"/domino/datasets/local/{os.environ['DOMINO_PROJECT_NAME']}/original_data"

In [3]:
s3_bucket = 'deep-learning-serengeti-4000'

## Upload metadata file

In [4]:
metadata_file = 'reduced_metadata.csv'
client.upload_file(
    os.path.join(ORIGINAL_DATA_PATH, metadata_file),
    s3_bucket,
    metadata_file
)
print("Metadata uploaded successfully")

Metadata uploaded successfully
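
The upload above could be sanity-checked by comparing the local file size with the size S3 reports for the object. This is a sketch under assumptions: the helper name `upload_matches_local` is hypothetical, and it relies on the `ContentLength` field returned by the S3 client's `head_object` call.

```python
import os


def upload_matches_local(client, bucket, key, local_path):
    """Return True if the object's size in S3 equals the local file size.

    Hypothetical helper for illustration only; `client` is expected to be
    a boto3 S3 client such as the one created earlier in this notebook.
    """
    remote_size = client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    return remote_size == os.path.getsize(local_path)
```

In this notebook it could be called as `upload_matches_local(client, s3_bucket, metadata_file, os.path.join(ORIGINAL_DATA_PATH, metadata_file))`. A size match is only a coarse check and does not compare file contents.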


## Upload images

Uploading the images is much faster with the AWS CLI than with `boto3`.

In [5]:
!aws --version

aws-cli/1.18.69 Python/3.8.10 Linux/5.4.196-108.356.amzn2.x86_64 botocore/1.16.19


In [6]:
images_folder = os.path.join(ORIGINAL_DATA_PATH, 'images')

It is a good idea to run the command below with `--dryrun` first to preview what would be uploaded.

In [9]:
!aws s3 sync $images_folder s3://$s3_bucket/images --quiet
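
If the AWS CLI is not available, the same folder could be uploaded with `boto3`, though more slowly. The following is a minimal sketch, not the method used for the original upload. The helper names `iter_upload_pairs` and `upload_folder` are hypothetical. Unlike `aws s3 sync`, it uploads every file unconditionally and does not skip files already in the bucket.

```python
import os


def iter_upload_pairs(folder, prefix="images"):
    """Yield (local_path, s3_key) for every file under `folder`.

    Keys are built with forward slashes so they match the layout that
    `aws s3 sync <folder> s3://<bucket>/<prefix>` would produce.
    """
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            relative = os.path.relpath(local_path, folder)
            key = "/".join([prefix] + relative.split(os.sep))
            yield local_path, key


def upload_folder(client, folder, bucket, prefix="images"):
    """Upload every file under `folder`; return the number of files sent."""
    count = 0
    for local_path, key in iter_upload_pairs(folder, prefix):
        client.upload_file(local_path, bucket, key)
        count += 1
    return count
```

In this notebook it could be called as `upload_folder(client, images_folder, s3_bucket)`. Listing the pairs from `iter_upload_pairs(images_folder)` without uploading gives a rough equivalent of a dry run.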