# Runtime external files

This notebook explains how to use files from a workgroup S3 bucket in a code run.

For your code to access files while it runs on your agent, follow these 4 steps:
  1. Create an S3 bucket for the workgroup files.
  2. Upload the relevant files to the bucket.
  3. Reference the files in your code.
  4. Tell the code run which files to download from the bucket.

### 1. Creating the bucket

The bucket is created by client request as part of the onboarding process into the FCP. If you want a bucket created, please contact support@rhinohealth.com for more details.

### 2. Uploading files to the bucket

AWS offers several ways to [upload files to an S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html). We also provide a [script](https://github.com/RhinoHealth/user-resources/blob/main/rhino-utils/upload-file-to-s3.sh) in our user-resources repository that you can use.  
  Define your S3 credentials, then call the script like this:
  
  `./upload-file-to-s3.sh ./the_folder_to_upload name_of_the_bucket the_folder_in_s3_you_want_to_upload_to`
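
If you prefer to upload from Python instead of the shell script, the same folder-to-prefix upload can be sketched roughly as follows. This is a minimal sketch and does not describe how the script itself works: `plan_uploads` and `upload_folder` are hypothetical helper names, and it assumes `boto3` is installed and already configured with your S3 credentials.

```python
from pathlib import Path


def plan_uploads(local_folder, s3_prefix):
    """Map every file under local_folder to the S3 key it would be uploaded to."""
    root = Path(local_folder)
    prefix = s3_prefix.strip("/")
    plan = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Keep the folder structure relative to the uploaded folder.
            relative = path.relative_to(root).as_posix()
            plan[str(path)] = f"{prefix}/{relative}" if prefix else relative
    return plan


def upload_folder(local_folder, bucket_name, s3_prefix):
    """Upload a folder to S3 (assumes boto3 is installed and credentials are set)."""
    import boto3  # imported lazily so plan_uploads can be used without boto3

    s3 = boto3.client("s3")
    for local_path, key in plan_uploads(local_folder, s3_prefix).items():
        s3.upload_file(local_path, bucket_name, key)
```

Calling `plan_uploads` on its own first is a cheap way to review which keys would be created before anything is sent to the bucket.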

### 3. Reference the files in your code

During the run, all requested files are placed in the container under the **/external_data** folder, with the same folder structure as in the S3 bucket.
So if the file in the bucket is `/test_1_folder/model_params.txt`,
in the container it is available under `/external_data/test_1_folder/model_params.txt`.
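
The mapping from a bucket path to its location in the container can be written as a one-line helper. This is only an illustration of the rule described above; `container_path` and `EXTERNAL_DATA_ROOT` are hypothetical names, not part of the SDK.

```python
from pathlib import PurePosixPath

# Root folder in the container where bucket files appear, as described above.
EXTERNAL_DATA_ROOT = PurePosixPath("/external_data")


def container_path(bucket_key):
    """Return the in-container path for a file stored under bucket_key."""
    # Drop any leading slash so the key is joined under the root
    # instead of replacing it.
    return str(EXTERNAL_DATA_ROOT / bucket_key.lstrip("/"))


print(container_path("/test_1_folder/model_params.txt"))
# prints /external_data/test_1_folder/model_params.txt
```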

### 4. Use the files in the specific run

In [None]:
from rhino_health.lib.endpoints.code_object.code_object_dataclass import (
    CodeObjectCreateInput,
    CodeObjectRunInput,
    CodeTypes,
)
from textwrap import dedent

# Assumes `session`, `project`, and `dataset` are already defined.
# This example uses CodeTypes.PYTHON_CODE to show what the file access looks like in code,
# but you can use all CodeTypes.
new_code_object = CodeObjectCreateInput(
    name="Example code object",
    description="A code that references a file",
    code_type=CodeTypes.PYTHON_CODE,
    version=0,
    project_uid=project.uid,
    config={
        "python_version": "3.9",
        "requirements": ["numpy == 1.22.*", "pandas ~= 1.4.2"],
        "python_code": dedent(
            """
            from pathlib import Path
            text = Path('/external_data/data_files/example_file1.txt').read_text()
            """
            ),
        "code_execution_mode": "snippet",
    },
    input_data_schema_uids=[None],
    output_data_schema_uids=[None],
)
code_object = session.code_object.create_code_object(new_code_object)
run_params = CodeObjectRunInput(
    code_object_uid=code_object.uid,
    input_dataset_uids=[[dataset.uid]],
    output_dataset_names_suffix="test",
    # This is the new parameter where you reference the bucket files you want to access in the run.
    external_storage_file_paths=[
        "data_files/example_file1.txt",
        "data_files/example_file2.txt",
    ],
    timeout_seconds=600,
    sync=True,
)
run_result = session.code_object.run_code_object(run_params)
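
Inside the code that runs in the container, it can help to check that the requested files are actually present before reading them. The following is a minimal sketch of such a check; `find_missing_files` is a hypothetical helper, and the paths simply mirror the ones requested in the run above.

```python
from pathlib import Path


def find_missing_files(relative_paths, root="/external_data"):
    """Return the requested paths that are not present as files under root."""
    base = Path(root)
    return [p for p in relative_paths if not (base / p.lstrip("/")).is_file()]


# Same relative paths that were passed as external_storage_file_paths.
requested = ["data_files/example_file1.txt", "data_files/example_file2.txt"]
missing = find_missing_files(requested)
if missing:
    print(f"Missing external files: {missing}")
```

A path that is reported missing usually means it was not listed in `external_storage_file_paths` or does not match the folder structure in the bucket.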