# Stimuli and session template setup notebook for OCP on Physion
This notebook provides a minimal example of how to set up the stimuli and upload the sequences of trials to the database.

In [None]:
from upload_to_s3 import upload_stim_to_s3, get_filepaths
from experiment_config import experiment_setup

Set the names for the experiment and the iteration.

In [None]:
PROJECT = "Physion_V1_5" 
DATASET = "Dominoes"
TASK = "OCP"
ITERATION = "pilot_1"
EXPERIMENT = DATASET + "_" + TASK

In [None]:
PORT = 8882 # the port used when launching `app.js` (i.e. the `--gameport` flag)
print("The experiment should be reachable under the following URL:")
print("https://cogtoolslab.org:{}/{}/index.html?projName={}&expName={}&iterName={}".format(PORT,TASK,PROJECT,EXPERIMENT,ITERATION))
print("The internal name is {}_{}_{}".format(PROJECT,EXPERIMENT,ITERATION))

## Provide metadata and locations of the stimuli files
For a simple data directory with all to-be-uploaded files in one directory, `data_path` takes the form `/path/to/your/data`.

For a multi-level directory structure, you will need to use glob `**` notation in `data_path` to index all the relevant files, for example:
- `/path/to/your/files/**/*` (this finds all the files in your directory structure)
- `/path/to/your/files/**/another_dir/*` (this finds all the files contained in all sub-directories named `another_dir`)
- `/path/to/your/files/**/another_dir/*png` (this finds all the pngs contained in all sub-directories named `another_dir`)
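
The patterns above can be tried out on a throwaway directory before pointing them at real data. The sketch below is a minimal illustration using Python's standard `glob` module with `recursive=True`. Whether `get_filepaths` resolves patterns in exactly this way is an assumption, and the helper names, directories and files are made up for the example.

```python
import glob
import os
import tempfile


def match_files(root, pattern):
    """Return sorted paths (relative to root) of the files matching a glob pattern."""
    hits = glob.glob(os.path.join(root, pattern), recursive=True)
    # Keep files only (`**/*` also matches directories) and drop any duplicates.
    return sorted({os.path.relpath(h, root).replace(os.sep, "/")
                   for h in hits if os.path.isfile(h)})


def make_demo_tree(root):
    """Create a small, hypothetical multi-level directory for illustration."""
    for rel in ["a/another_dir/x.png", "a/another_dir/y.mp4",
                "b/c/another_dir/z.png", "b/top.txt"]:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()


with tempfile.TemporaryDirectory() as demo_root:
    make_demo_tree(demo_root)
    all_files = match_files(demo_root, "**/*")                # every file at any depth
    in_another_dir = match_files(demo_root, "**/another_dir/*")  # files in any `another_dir`
    pngs = match_files(demo_root, "**/another_dir/*png")      # only the pngs in `another_dir`
    print(all_files)
    print(in_another_dir)
    print(pngs)
```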

`bucket`: string, name of bucket to write to. Also specifies the name of the experiment in the input database.\
`pth_to_s3_credentials`: string, path to AWS credentials file\
`data_root`: string, root path for data to upload\
`data_path`: string, path in data_root to be included in upload\
`multilevel`: True for multilevel directory structures, False if all data is stored in one directory\
`fam_trial_ids`: list of strings, stim_id for familiarization stimuli\
`batch_set_size`: int, number of stimuli to be included in each batch. The overall stimulus set size should be a multiple of this value.
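
To make the `batch_set_size` constraint concrete, the following minimal sketch splits a list of stimulus IDs into equally sized batches and rejects sizes that would leave a remainder. This follows the reading that the stimulus set should divide into batches evenly. The helper `split_into_batches` and the stimulus names are hypothetical, and how `experiment_setup` batches stimuli internally may differ.

```python
def split_into_batches(stim_ids, batch_set_size):
    """Split stim_ids into consecutive batches of exactly batch_set_size items."""
    if batch_set_size <= 0:
        raise ValueError("batch_set_size must be positive")
    if len(stim_ids) % batch_set_size != 0:
        raise ValueError(
            "{} stimuli cannot be split evenly into batches of {}".format(
                len(stim_ids), batch_set_size))
    return [stim_ids[i:i + batch_set_size]
            for i in range(0, len(stim_ids), batch_set_size)]


# Hypothetical stimulus IDs, for illustration only
demo_ids = ["stim_{:04d}".format(i) for i in range(6)]
batches = split_into_batches(demo_ids, 3)
print(len(batches), [len(b) for b in batches])
```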

The data used in this example is taken from [Physion](https://github.com/cogtoolslab/physics-benchmarking-neurips2021). Download [Physion_Dominoes](https://physics-benchmarking-neurips2021-dataset.s3.amazonaws.com/Physion_Dominoes.zip) (25 MB), extract it, and copy the folder into the `stimuli/` subfolder of the repository.

In [None]:
bucket = (PROJECT + "_" + DATASET + "_" + ITERATION).replace("_","-").lower() # bucket name on AWS S3 where stimuli will be stored. `_` is not allowed in bucket names
pth_to_s3_credentials = None # local path to your aws credentials in JSON format. Pass None to use shared credentials file
data_root = "/Users/felixbinder/Desktop/dominoes_all_movies 2/"
data_path = ['**/*_img.mp4'] # matches every `*_img.mp4` file in any subdirectory of data_root
hdf5_path = ['**/*.hdf5'] # matches the files containing metadata information
multilevel = True # Dominoes/ contains 2 subdirectories, so the structure is multi-level
stim_paths = ['*_map.png', '*_img.mp4', '*.hdf5'] # patterns for the stimuli to upload to s3; each should match only the relevant files
batch_set_size = 150
n_entries = 250 # how many different random orders do we want?
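
Because S3 rejects bucket names containing underscores or uppercase letters, it can help to check the derived name before uploading. The sketch below covers only the basic naming rules (3 to 63 characters; lowercase letters, digits, hyphens and dots; starting and ending with a letter or digit) and is not a full validator. The helper name `looks_like_valid_bucket_name` is hypothetical.

```python
import re

# Basic S3 naming rules only: 3-63 characters, lowercase letters, digits,
# dots and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name):
    """Return True if name passes the basic checks above (not exhaustive)."""
    return _BUCKET_RE.fullmatch(name) is not None


# Same derivation as in the cell above, with the values written out
demo_bucket = ("Physion_V1_5" + "_" + "Dominoes" + "_" + "pilot_1").replace("_", "-").lower()
print(demo_bucket, looks_like_valid_bucket_name(demo_bucket))
```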

Which stimulus IDs do we want to use for familiarization? Usually **2** stimuli are used.

In [None]:
fam_stim_ids = ['pilot_dominoes_1mid_J025R45_boxroom_0000',
'pilot_dominoes_1mid_J025R45_boxroom_0006']

For reproducibility, fix the random seed.

In [None]:
import numpy as np
np.random.seed(42)

## Upload stimuli to S3
We need to store the stimuli files in S3. This assumes that a bucket has already been created and the appropriate permissions have been set (the files need to be publicly available, as they are embedded by the web experiment).

Make sure that you have the appropriate credentials to upload to S3. 

Running this section will upload your stimuli files to the specified S3 bucket.

Consider logging into the AWS console to make sure that the right files have been uploaded.
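
Since the web experiment embeds the stimuli by URL, another quick check is to build the public URL for a few uploaded files and request them. The sketch below assumes the virtual-hosted-style address `https://<bucket>.s3.amazonaws.com/<key>` (the same form as the Physion download link above) and that object keys mirror the file names. Both are assumptions about how `upload_stim_to_s3` lays out the bucket, and the helper names are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request


def public_s3_url(bucket, key):
    """Build a virtual-hosted-style URL for an object (assumed bucket layout)."""
    return "https://{}.s3.amazonaws.com/{}".format(bucket, urllib.parse.quote(key))


def is_publicly_readable(url, timeout=10):
    """Send a HEAD request and report whether the object is served without credentials."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors such as 403/404 as well as network failures
        return False


demo_url = public_s3_url("physion-v1-5-dominoes-pilot-1",
                         "pilot_dominoes_1mid_J025R45_boxroom_0000_img.mp4")
print(demo_url)
# Uncomment to run the check (requires network access):
# print(is_publicly_readable(demo_url))
```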

In [None]:
# which files would we upload?
files = get_filepaths(data_root, data_path, multilevel)
print("Got {} paths to files".format(len(files)))
# slicing never raises, so check explicitly whether anything was found
if files:
    print(files[0:5],"\n","...","\n",files[-5:])
else:
    print("No file paths to display")

In [None]:
hdf5_files = get_filepaths(data_root, hdf5_path, multilevel)
print("Got {} paths to files".format(len(hdf5_files)))
# slicing never raises, so check explicitly whether anything was found
if hdf5_files:
    print(hdf5_files[0:2],"\n","...","\n",hdf5_files[-2:])
else:
    print("No file paths to display")

In [None]:
assert len(files) == len(hdf5_files), "Number of files and hdf5 files do not match"

In [None]:
# do we have an mp4 and a png for each hdf5 file?
import glob
for f in hdf5_files:
    for s in stim_paths:
        if not glob.glob(f.replace(".hdf5",s)):
            print("No match for {} in {}".format(s,f))

In [None]:
# upload dataset to aws s3
upload_stim_to_s3(bucket, 
                  pth_to_s3_credentials, 
                  data_root, 
                  stim_paths, 
                  multilevel,
                  overwrite=False)
   

## Create and upload session templates to the `input` database
This section will create a number of session templates, and upload them to the `input` database. 
For purposes of documentation (or the use of app.js with `--local_store`) the file is also saved to disk.

A session template is an ordered list of stimuli that will be shown to the participant. 
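
As a rough illustration of that structure, the sketch below builds several templates in which the familiarization stimuli come first and the remaining stimuli follow in a seeded random order. It is not the implementation of `experiment_setup`: the function `make_session_templates`, the dictionary layout, the exclusion of familiarization stimuli from the main trials, and the use of the standard-library `random` module (rather than NumPy) are all assumptions made to keep the example self-contained.

```python
import random


def make_session_templates(stim_ids, fam_ids, n_entries, seed=42):
    """Return n_entries templates, each with its own random order of the main trials."""
    rng = random.Random(seed)  # local generator, so results are reproducible
    main_ids = [s for s in stim_ids if s not in fam_ids]
    templates = []
    for _ in range(n_entries):
        order = list(main_ids)
        rng.shuffle(order)
        templates.append({"familiarization": list(fam_ids), "trials": order})
    return templates


# Hypothetical stimulus IDs, for illustration only
demo_stims = ["stim_{:02d}".format(i) for i in range(8)]
demo_fam = demo_stims[:2]
demo_templates = make_session_templates(demo_stims, demo_fam, n_entries=3)
for template in demo_templates:
    print(template["trials"])
```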

Make sure that you have appropriate credentials for the `input` database (see the documentation on the CAB config file). If you are not running this on the same machine as the database, you might need to create an ssh tunnel to the database server (e.g. run `ssh -fNL 27017:127.0.0.1:27017 USERNAME@cogtoolslab.org` in your terminal).
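
Before running the upload, it can be worth confirming that something is listening on the forwarded port. The sketch below uses only the standard library and assumes the tunnel from the example command (local port 27017). It checks TCP reachability only, not MongoDB authentication, and the helper name `port_is_open` is hypothetical.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


print("Database port reachable:", port_is_open("127.0.0.1", 27017))
```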

In [None]:
# batch dataset and upload to mongodb
experiment_setup(project = PROJECT,
                 experiment = EXPERIMENT,
                 iteration = ITERATION,
                 bucket = bucket,
                 s3_stim_paths = stim_paths,
                 hdf5_paths = hdf5_files,
                 fam_trial_ids = fam_stim_ids,
                 batch_set_size = batch_set_size,
                 n_entries = n_entries,
                 overwrite = True,
                 exclude_fam_stem = False)