# Stimuli and session template setup example notebook
This notebook provides a minimal example of how to set up the stimuli and upload the sequence of trials to the database.

In [1]:
from upload_to_s3 import upload_stim_to_s3, get_filepaths
from experiment_config import experiment_setup
import pandas as pd
import os
import glob

Set the names for the project, dataset, iteration, and experiment.

In [2]:
PROJECT = "sketch_rgb"
DATASET = "sketchy"
ITERATION = "test1"
EXPERIMENT = "test_split"
print(EXPERIMENT)

test_split


## Provide metadata and locations of the stimuli files
For a simple data directory with all to-be-uploaded files in one directory, `data_path` is of the form `/path/to/your/data`.

For a multi-level directory structure, you will need to use glob `**` notation in `data_path` to index all the relevant files, for example:
- `/path/to/your/files/**/*` (finds all the files in your directory structure)
- `/path/to/your/files/**/another_dir/*` (finds all the files contained in all sub-directories named `another_dir`)
- `/path/to/your/files/**/another_dir/*.png` (finds all the PNGs contained in all sub-directories named `another_dir`)
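If you assemble a file list yourself with Python's `glob` module (imported above), note that `**` only descends into nested directories when `recursive=True` is passed; without it, `**` behaves like a single `*`. The sketch below builds a throwaway directory tree with hypothetical file names and keeps only regular files, since `**/*` also matches directories.

```python
import glob
import os
import tempfile

def find_files(pattern, recursive=True):
    """Return sorted paths of regular files (not directories) matching a glob pattern."""
    return sorted(p for p in glob.glob(pattern, recursive=recursive) if os.path.isfile(p))

with tempfile.TemporaryDirectory() as root:
    # hypothetical layout: two category folders containing `another_dir` sub-folders
    for rel in ["cats/another_dir/a.png", "dogs/another_dir/b.png",
                "dogs/another_dir/c.jpg", "dogs/other/d.png"]:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()  # create an empty placeholder file

    all_files = find_files(os.path.join(root, "**", "*"))
    # without recursive=True, `**/*` only matches entries exactly two levels down (all directories here)
    shallow = find_files(os.path.join(root, "**", "*"), recursive=False)
    pngs = find_files(os.path.join(root, "**", "another_dir", "*.png"))
    png_names = [os.path.relpath(p, root) for p in pngs]

print(len(all_files), len(shallow))  # 4 0
print(png_names)  # ['cats/another_dir/a.png', 'dogs/another_dir/b.png'] on POSIX
```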

`bucket`: string, name of the bucket to write to. Also specifies the name of the experiment in the input database.\
`pth_to_s3_credentials`: string, path to AWS credentials file\
`data_root`: string, root path for data to upload\
`data_path`: string, path in `data_root` to be included in upload\
`multilevel`: `True` for multilevel directory structures, `False` if all data is stored in one directory\
`fam_trial_ids`: list of strings, `stim_id`s of the familiarization stimuli\
`batch_set_size`: int, number of stimuli to be included in each batch; the overall stimulus set size should be a multiple of it
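To illustrate the `batch_set_size` constraint, here is a hypothetical helper (`make_batches` is not part of `experiment_config`) that splits a list of stimulus IDs into equal-sized batches and rejects sizes that do not divide the set evenly. How `experiment_setup` actually forms batches may differ.

```python
def make_batches(stim_ids, batch_set_size):
    """Hypothetical helper: split stim_ids into consecutive batches of batch_set_size."""
    if batch_set_size <= 0:
        raise ValueError("batch_set_size must be positive")
    if len(stim_ids) % batch_set_size != 0:
        raise ValueError(
            f"{len(stim_ids)} stimuli cannot be split evenly into batches of {batch_set_size}"
        )
    # slice the list into non-overlapping, equal-sized chunks
    return [stim_ids[i:i + batch_set_size] for i in range(0, len(stim_ids), batch_set_size)]

batches = make_batches([f"stim_{i}" for i in range(12)], batch_set_size=4)
print(len(batches), batches[0])  # 3 ['stim_0', 'stim_1', 'stim_2', 'stim_3']
```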

The example data is taken from [Physion](https://github.com/cogtoolslab/physics-benchmarking-neurips2021). Download [Physion_Dominoes](https://physics-benchmarking-neurips2021-dataset.s3.amazonaws.com/Physion_Dominoes.zip) (25 MB), extract it, and copy the folder into the `stimuli/` subfolder of the repository.
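The download and extraction step can also be scripted with the standard library. This is a minimal sketch: `download_and_extract` is a hypothetical helper, and it extracts the archive directly into the target folder rather than copying a folder by hand.

```python
import os
import tempfile
import urllib.request
import zipfile

def download_and_extract(url, dest_dir):
    """Download a zip archive from url, extract it into dest_dir, and return the archive member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "archive.zip")
        urllib.request.urlretrieve(url, archive)  # the temporary copy is deleted afterwards
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest_dir)
            return zf.namelist()

# usage (downloads about 25 MB):
# download_and_extract(
#     "https://physics-benchmarking-neurips2021-dataset.s3.amazonaws.com/Physion_Dominoes.zip",
#     "stimuli/",
# )
```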

In [3]:
keypoint_meta = pd.read_csv("./sketchy_test_keypoint_meta.csv", index_col=0)

In [4]:
bucket = (PROJECT + "_" + DATASET).replace("_", "-").lower()  # name of the AWS S3 bucket where the stimuli will be stored; `_` is not allowed in bucket names
pth_to_s3_credentials = "../../.aws/credentials.json"  # local path to your AWS credentials in JSON format. Pass None to use the shared credentials file
data_root = '/mnt/pentagon/xul076/'  # local directory that the relative paths in the metadata are resolved against
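Because an invalid bucket name only fails once AWS rejects it, it can help to check the derived name up front. The sketch below encodes a simplified subset of the S3 general-purpose bucket naming rules (3 to 63 characters; lowercase letters, digits, hyphens, and dots; starting and ending with a letter or digit) and omits rarer rules such as the ban on adjacent dots. `make_bucket_name` is a hypothetical helper that mirrors the cell above.

```python
import re

# simplified subset of the S3 bucket naming rules (not the complete rule set)
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def make_bucket_name(*parts):
    """Hypothetical helper: join parts with `_`, convert to a hyphenated lowercase name, validate it."""
    name = "_".join(parts).replace("_", "-").lower()
    if not BUCKET_NAME_RE.match(name):
        raise ValueError(f"invalid S3 bucket name: {name!r}")
    return name

print(make_bucket_name("sketch_rgb", "sketchy"))  # sketch-rgb-sketchy
```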

For reproducibility, fix the random seed.

In [5]:
import numpy as np
np.random.seed(42)
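`np.random.seed` seeds NumPy's legacy global random state only; it does not affect Python's built-in `random` module. If later steps shuffle trial orders, an alternative is to pass an explicit seed to a local `numpy.random.Generator`. A minimal sketch, with `shuffled_trials` as a hypothetical helper:

```python
import numpy as np

def shuffled_trials(stim_ids, seed):
    """Return a reproducible permutation of stim_ids using a local NumPy Generator."""
    rng = np.random.default_rng(seed)  # independent of the global np.random state
    return [stim_ids[i] for i in rng.permutation(len(stim_ids))]

ids = [f"stim_{i}" for i in range(10)]
order_a = shuffled_trials(ids, seed=42)
order_b = shuffled_trials(ids, seed=42)
print(order_a == order_b)  # True
```

Keeping the generator local makes the seed explicit at each call site instead of relying on global state set earlier in the notebook.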

## Upload stimuli to S3
We need to store the stimuli files in S3. This assumes that a bucket has already been created and the appropriate permissions have been set (the files need to be publicly available, as they are embedded by the web experiment).

Make sure that you have the appropriate credentials to upload to S3. 

Running this section will upload your stimuli files to the specified S3 bucket.

Consider logging into the AWS console to make sure that the right files have been uploaded.

In [6]:
# unique relative paths of the photos and sketches referenced in the metadata
photo_paths = list(set(keypoint_meta["photo_path"]))
sketch_paths = list(set(keypoint_meta["sketch_path"]))
# prefix each relative path with the local data root
photo_paths = [os.path.join(data_root, x) for x in photo_paths]
sketch_paths = [os.path.join(data_root, x) for x in sketch_paths]
filepaths = photo_paths + sketch_paths
print(len(photo_paths), len(sketch_paths), len(filepaths), filepaths[:5])

1250 6250 7500 ['/mnt/pentagon/xul076/sketchy/rendered_256x256/256x256/photo/tx_000100000000/starfish/n02317335_2966.jpg', '/mnt/pentagon/xul076/sketchy/rendered_256x256/256x256/photo/tx_000100000000/kangaroo/n01877134_9443.jpg', '/mnt/pentagon/xul076/sketchy/rendered_256x256/256x256/photo/tx_000100000000/flower/n11939491_43188.jpg', '/mnt/pentagon/xul076/sketchy/rendered_256x256/256x256/photo/tx_000100000000/bee/n02206856_8187.jpg', '/mnt/pentagon/xul076/sketchy/rendered_256x256/256x256/photo/tx_000100000000/seal/n02077923_2092.jpg']
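`list(set(...))` removes duplicate paths, but the order of a set of strings can change between Python sessions because string hashing is randomized (unless `PYTHONHASHSEED` is set), and `np.random.seed` does not control it. If a stable order matters, `pd.unique` keeps the order of first appearance. The sketch below also checks that every file exists locally before uploading; `build_filepaths` and `missing_files` are hypothetical helpers, and the toy metadata only mimics the `photo_path` and `sketch_path` columns used above.

```python
import os
import tempfile
import pandas as pd

def build_filepaths(meta, data_root, columns=("photo_path", "sketch_path")):
    """Deduplicate each path column in first-appearance order and prefix it with data_root."""
    filepaths = []
    for col in columns:
        # pd.unique keeps the order of first appearance, unlike list(set(...))
        filepaths.extend(os.path.join(data_root, p) for p in pd.unique(meta[col]))
    return filepaths

def missing_files(filepaths):
    """Return the paths that do not exist locally, so they can be fixed before uploading."""
    return [p for p in filepaths if not os.path.isfile(p)]

# toy metadata with hypothetical rows, standing in for sketchy_test_keypoint_meta.csv
meta = pd.DataFrame({
    "photo_path": ["photo/cat/1.jpg", "photo/cat/1.jpg", "photo/dog/2.jpg"],
    "sketch_path": ["sketch/cat/1-1.png", "sketch/cat/1-2.png", "sketch/dog/2-1.png"],
})
with tempfile.TemporaryDirectory() as root:
    paths = build_filepaths(meta, root)
    for p in paths[:-1]:  # create every file except the last one
        os.makedirs(os.path.dirname(p), exist_ok=True)
        open(p, "w").close()
    rel = [os.path.relpath(p, root) for p in paths]
    missing = [os.path.relpath(p, root) for p in missing_files(paths)]

print(rel)      # 5 unique paths: 2 photos, then 3 sketches
print(missing)  # ['sketch/dog/2-1.png'] on POSIX
```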


In [7]:
# upload the photos and sketches to AWS S3
upload_stim_to_s3(bucket, 
                  pth_to_s3_credentials, 
                  filepaths,
                  s3_keep_path_block=4,
                  overwrite=False)

Bucket exists. Skipping creation.


100%|███████████████████████| 7500/7500 [10:59<00:00, 11.36it/s]

Done





In [5]:
# upload the familiarization stimuli to AWS S3
upload_stim_to_s3(bucket, 
                  pth_to_s3_credentials, 
                  glob.glob(os.path.join(data_root, "sketchy", "familiar", "*")),
                  s3_keep_path_block=4,
                  overwrite=False)

Bucket exists. Skipping creation.


100%|████████████████████████████████| 16/16 [00:06<00:00,  2.64it/s]

Done
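Instead of (or in addition to) checking the AWS console, you can list the bucket contents programmatically. This sketch assumes `boto3` is installed and can find your credentials through its usual configuration; the helpers are hypothetical. Keys are compared by file name only, because how `upload_stim_to_s3` builds object keys from `s3_keep_path_block` is not spelled out here.

```python
def list_bucket_keys(s3_client, bucket, prefix=""):
    """Return every object key in bucket under prefix, following list_objects_v2 pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # a page with no matching objects has no "Contents" entry
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys

def report_missing(keys, expected_basenames):
    """Return expected file names that have no uploaded object with the same basename."""
    uploaded = {key.rsplit("/", 1)[-1] for key in keys}
    return sorted(set(expected_basenames) - uploaded)

# usage, assuming boto3 is installed and configured:
# import boto3
# keys = list_bucket_keys(boto3.client("s3"), bucket)
# print(len(keys), report_missing(keys, [os.path.basename(p) for p in filepaths]))
```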



