## Transferring Files to and from an S3 Bucket

Download a file from an S3 bucket before an electron executes, and upload a file to the bucket after the electron finishes, using the boto3 library.

### Prerequisites

1. Define the read (source) file path. 
2. Create a source file to transfer.

In [1]:
import covalent as ct 

from pathlib import Path

# define the source filepath
source_filepath = Path('./my_source_file').resolve()

# create an example file
source_filepath.touch()

### Procedure

Transfer a file from an S3 bucket to a local filesystem using the boto3 library. 

In the following example, a zip file is downloaded from an S3 bucket before the electron executes. The electron processes the files, and the processed files are then uploaded back to the S3 bucket.

1. Define two Covalent `FileTransfer` objects and a Covalent `S3` strategy object:

In [2]:
import covalent as ct
import zipfile
import os

strategy = ct.fs_strategies.S3()

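# Download the input archive before the electron runs (Order.BEFORE is the default)
ft_1 = ct.fs.FileTransfer('s3://covalent-tmp/test_vids.zip', '/home/ubuntu/tmp-dir/test_vids.zip', strategy=strategy)
# Upload the result archive after the electron finishes
ft_2 = ct.fs.FileTransfer('/home/ubuntu/tmp-dir/images.zip', 's3://covalent-tmp/images.zip', strategy=strategy, order=ct.fs.Order.AFTER)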
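
Each `s3://` URI passed to `FileTransfer` names a bucket and an object key. The following is a minimal sketch of that split using only the standard library. The helper name `split_s3_uri` is hypothetical and this is not Covalent's own parsing code; it only illustrates how a URI such as `s3://covalent-tmp/test_vids.zip` maps to the bucket and key that an S3 client expects.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper, not part of Covalent."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket is the URI's network location; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://covalent-tmp/test_vids.zip")
print(bucket, key)
```

The resulting pair corresponds to the `bucket` and `from_filepath` values that appear in the S3 strategy's debug output.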


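2. Define an electron with both file transfers attached. Covalent performs the download and upload; the electron body performs the steps in between. The overall sequence is as follows:
    1. Covalent downloads the zip file from S3 (`ft_1`)
    2. The electron unzips the file
    3. The electron processes the contents (omitted here as irrelevant to the demo)
    4. The electron zips the files
    5. Covalent uploads the zip file to S3 (`ft_2`)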

In [3]:
@ct.electron(files=[ft_1, ft_2])
def unzip_zip(files=[]):
    path = "/home/ubuntu/tmp-dir"
    # Unzip downloaded data
    with zipfile.ZipFile(path + "/test_vids.zip", 'r') as zip_ref:
        zip_ref.extractall(path)

    # Perform operations on the files
    # ...

    # Zip files to upload; archive names are relative to the parent of `path`
    with zipfile.ZipFile(path + "/images.zip", 'w', zipfile.ZIP_DEFLATED) as ziph:
        # `filenames` avoids shadowing the electron's `files` parameter
        for root, dirs, filenames in os.walk(path + '/test_vids'):
            for filename in filenames:
                ziph.write(os.path.join(root, filename),
                           os.path.relpath(os.path.join(root, filename),
                                           os.path.join(path, '..')))

[2023-07-25 15:52:15,882] [DEBUG] s3_strategy.py: Line 57 in download: Is dir: False
[2023-07-25 15:52:15,883] [DEBUG] s3_strategy.py: Line 66 in download: S3 download bucket: covalent-tmp, from_filepath: test_vids.zip, to_filepath /home/ubuntu/tmp-dir/test_vids.zip.
[2023-07-25 15:52:15,884] [DEBUG] s3_strategy.py: Line 133 in upload: S3 upload bucket: covalent-tmp, from_filepath: /home/ubuntu/tmp-dir/images.zip, to_filepath images.zip.
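
The zip step in the electron can be tried locally without S3 or Covalent. The sketch below reproduces the same `os.walk` and `os.path.relpath` logic inside a temporary directory to show which names end up in the archive. The directory layout and the file `clip.txt` are assumptions made up for illustration.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for /home/ubuntu/tmp-dir with one extracted file (hypothetical content).
    path = os.path.join(tmp, "tmp-dir")
    os.makedirs(os.path.join(path, "test_vids"))
    with open(os.path.join(path, "test_vids", "clip.txt"), "w") as f:
        f.write("placeholder")

    # Same archive-building loop as in the electron.
    archive = os.path.join(path, "images.zip")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as ziph:
        for root, _dirs, filenames in os.walk(os.path.join(path, "test_vids")):
            for filename in filenames:
                full_path = os.path.join(root, filename)
                ziph.write(full_path, os.path.relpath(full_path, os.path.join(path, "..")))

    # Read the entry names back before the temporary directory is removed.
    with zipfile.ZipFile(archive) as zip_ref:
        archive_names = zip_ref.namelist()

print(archive_names)  # ['tmp-dir/test_vids/clip.txt']
```

Because the names are computed relative to the parent of `path`, each entry keeps the `tmp-dir/test_vids/` prefix. Passing `path` itself as the second argument to `os.path.relpath` would drop the `tmp-dir/` component if a flatter archive is preferred.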


3. Create and dispatch a lattice to run the electron:

In [4]:
@ct.lattice
def run_electrons():
    return unzip_zip()

dispatch_id = ct.dispatch(run_electrons)()

Notes:
- This example illustrates a typical pattern in which files are downloaded from remote storage, are processed, and the results are uploaded to the same remote storage. Other scenarios can of course be implemented with the Covalent components illustrated here (`FileTransfer`, `FileTransferStrategy`, `@electron`).
- The example puts everything in one electron. For a real-world scenario of any complexity, a better practice would be to break the task into small sub-tasks, each in its own electron.
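
One way to split the single task into smaller sub-tasks is sketched below with plain Python functions, so that it runs without Covalent or S3. The function names and the placeholder processing step are hypothetical. In a Covalent workflow, each function would presumably be decorated with `@ct.electron` and the three calls composed inside a lattice, with the file transfers attached to the first and last electrons.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(archive, target_dir):
    """Sub-task 1: unpack the downloaded archive."""
    with zipfile.ZipFile(archive) as zip_ref:
        zip_ref.extractall(target_dir)
    return target_dir


def process_files(directory):
    """Sub-task 2: hypothetical placeholder that only lists the extracted files."""
    return sorted(p for p in Path(directory).rglob("*") if p.is_file())


def create_archive(file_paths, base_dir, archive):
    """Sub-task 3: pack the processed files into the archive to upload."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as ziph:
        for file_path in file_paths:
            ziph.write(file_path, file_path.relative_to(base_dir).as_posix())
    return archive


with tempfile.TemporaryDirectory() as tmp:
    work_dir = Path(tmp)
    # Build a stand-in for the archive that the S3 download would provide.
    downloaded = work_dir / "test_vids.zip"
    with zipfile.ZipFile(downloaded, "w") as ziph:
        ziph.writestr("test_vids/clip.txt", "placeholder")

    extracted_dir = extract_archive(downloaded, work_dir / "extracted")
    processed = process_files(extracted_dir)
    result = create_archive(processed, extracted_dir, work_dir / "images.zip")

    with zipfile.ZipFile(result) as zip_ref:
        result_names = zip_ref.namelist()

print(result_names)
```

Keeping each step in its own function makes it possible to rerun or test one step in isolation.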

### See Also

[Transferring Local Files During Workflows](./file_transfers_for_workflows_local.ipynb)

[Transferring Remote Files After a Workflow](./file_transfers_for_workflows_to_remote.ipynb)