# Upload to ORNL DAAC

This notebook demonstrates transferring data from MAAP's Algorithm Development Environment (ADE) to ORNL DAAC. ORNL DAAC's submission process is detailed here: https://daac.ornl.gov/submit/ . This process is specific to ORNL DAAC; before publishing your product, first identify the correct DAAC for your data.

This workflow currently pushes data to the DAAC, which incurs egress costs; for this particular dataset the cost was about $30. In the future, we plan to explore having the DAAC pull data between AWS buckets to avoid egress charges.


## Install Rclone

On the MAAP ADE you need to have [rclone](https://rclone.org/) installed. We chose rclone because it verifies file integrity on upload, can resume uploads, and supports both S3 and FTPS.

```
# Install rclone
apt install unzip
curl https://rclone.org/install.sh | bash
```

## Set up S3 as source
```
rclone config

# Settings to pick (based on the rclone config file)
[s3]
type = s3
provider = AWS
env_auth = true
region = us-west-2
location_constraint = us-west-2
```


## Set up DAAC as destination (FTPS)
```
rclone config

# Settings to pick (based on the rclone config file)
[ornl]
type = ftp
host = daacupload.ornl.gov
# username is all lowercase, even if you signed up differently
user = <username>
explicit_tls = true
no_check_certificate = true
ask_password = true
```

You can check your rclone config (and save it for later):
```
cat /projects/.config/rclone/rclone.conf
```
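
Before starting a transfer, it can help to confirm that both remotes exist in the config. The following is a minimal sketch, assuming the rclone config is INI-style text that Python's `configparser` can read; the helper name `check_rclone_remotes` and the sample config string are illustrative, not part of rclone or MAAP.

```python
import configparser


def check_rclone_remotes(conf_text, required=("s3", "ornl")):
    """Return the required remote names that are missing from an rclone config."""
    # Disable interpolation so values containing '%' are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return [name for name in required if not parser.has_section(name)]


# Illustrative config text mirroring the settings shown above.
sample_conf = """
[s3]
type = s3
provider = AWS
env_auth = true
region = us-west-2

[ornl]
type = ftp
host = daacupload.ornl.gov
explicit_tls = true
"""

print("missing remotes:", check_rclone_remotes(sample_conf))
```

To check the real file, read `/projects/.config/rclone/rclone.conf` into a string and pass it in place of `sample_conf`.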

In [None]:
#Uncomment the next line, or copy to terminal, to run a simple test to verify permission and upload destination
#!rclone copyto -P s3:nasa-maap-data-store/file-staging/icesat2-boreal/boreal_agb_202302151676439579_1326.tif ornl:/407161fd93/

# Set up Transfer List

For this collection, we needed a subset of the complete set of files: only those in the boreal region identified by the bounds -180, 51.6, 180, 78. Ideally, we would use a STAC query to select the files necessary for transfer so external groups, like DAACs, can reliably repeat the same query. 

In [None]:
## You need pystac_client
#%pip install pystac_client

In [14]:
from pystac_client import Client
import os

In [4]:
# Search the MAAP STAC catalog for granules meeting the criteria
# https://stac.maap-project.org/collections/icesat2-boreal/items?bbox=-180,51.6,180,78
api = Client.open('https://stac.maap-project.org/')

granule_results = api.search(
    max_items=5000,
    collections=['icesat2-boreal'],
    bbox=[-180,51.6,180,78]
)
# The list of results is saved to a text file in a later cell

In [5]:
# Fetch all matching items (get_all_items() returns an ItemCollection, not a lazy iterator)
test = granule_results.get_all_items()

In [13]:
# build a list of asset urls
assets = [item.assets.get('cog_default').href.replace("s3://","") for item in granule_results.get_all_items()]

In [7]:
# check the number of assets selected
len(assets)

3556

In [15]:
# Reduce each asset URL to its basename and save the names to a text file for rclone to use.
# rclone's --files-from option then transfers only the files named in that list:
# https://rclone.org/filtering/#files-from-read-list-of-source-file-names
txt_file = 'icesat2_boreal_granules.txt'
with open(txt_file, 'w') as filehandle:
    filehandle.writelines([f"{os.path.basename(granule)}\n" for granule in assets])
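
Using only the basename works when every asset sits directly under the rclone source path. As a hedged alternative, the sketch below builds `--files-from` entries as paths relative to the source root, rejects anything outside that root, and drops duplicates. The helper name `files_from_lines` and the second sample href are hypothetical.

```python
def files_from_lines(hrefs, source_root):
    """Convert asset hrefs into de-duplicated paths relative to the rclone source root."""
    root = source_root.replace("s3://", "").strip("/") + "/"
    seen = set()
    lines = []
    for href in hrefs:
        key = href.replace("s3://", "")
        if not key.startswith(root):
            raise ValueError(f"{href} is outside {source_root}")
        rel = key[len(root):]
        if rel not in seen:  # keep first occurrence, preserve order
            seen.add(rel)
            lines.append(rel)
    return lines


# The first href comes from the test command earlier in this notebook;
# the second file name is made up for illustration.
sample_hrefs = [
    "s3://nasa-maap-data-store/file-staging/icesat2-boreal/boreal_agb_202302151676439579_1326.tif",
    "s3://nasa-maap-data-store/file-staging/icesat2-boreal/example_tile_0001.tif",
    "s3://nasa-maap-data-store/file-staging/icesat2-boreal/example_tile_0001.tif",
]
relative_paths = files_from_lines(
    sample_hrefs, "s3://nasa-maap-data-store/file-staging/icesat2-boreal"
)
print("\n".join(relative_paths))
```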

# Do the Rclone transfer
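Run this in a terminal (the password prompt may not work inside a notebook). The command below includes `--dry-run`, which only reports what would be copied; remove that flag to perform the actual transfer.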
```
rclone copy --dry-run --no-update-modtime -P --files-from icesat2_boreal_granules.txt s3:nasa-maap-data-store/file-staging/icesat2-boreal ornl:/407161fd93/
```

An updated list of tiles was eventually used because the bbox-driven STAC query missed 335 tiles that fall outside the box but are still part of the data submission.
```
rclone copy --dry-run --no-update-modtime -P --files-from /projects/shared-buckets/nathanmthomas/boreal_agb_tiles_DAAC.txt s3:nasa-maap-data-store/file-staging/icesat2-boreal ornl:/407161fd93/
```
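
To see which tiles differ between the STAC-derived list and an updated list, a set comparison is enough. This is a minimal sketch, assuming both lists hold one file name per line; the helper name `compare_tile_lists` and the sample file names are hypothetical.

```python
def compare_tile_lists(stac_names, final_names):
    """Return (names only in the final list, names only in the STAC list), both sorted."""
    stac_set = {name.strip() for name in stac_names if name.strip()}
    final_set = {name.strip() for name in final_names if name.strip()}
    return sorted(final_set - stac_set), sorted(stac_set - final_set)


# Made-up names standing in for the lines of the two text files.
stac_list = ["tile_0001.tif\n", "tile_0002.tif\n"]
final_list = ["tile_0001.tif\n", "tile_0002.tif\n", "tile_0003.tif\n"]

missed_by_query, dropped_from_final = compare_tile_lists(stac_list, final_list)
print("missed by the bbox query:", missed_by_query)
print("in the STAC list but not the final list:", dropped_from_final)
```

With real data, pass the result of `open(path).readlines()` for each of the two text files.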

Example output:
```
2023-03-17 16:31:52 ERROR : ftp://daacupload.ornl.gov:21/407161fd93: SetModTime is not supported
Transferred:       27.839 GiB / 27.839 GiB, 100%, 39.908 MiB/s, ETA 0s
Checks:              3556 / 3556, 100%
Transferred:          335 / 335, 100%
Elapsed time:     11m40.2s
```
You can ignore the SetModTime error messages.
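
If the terminal output is saved to a file, the final counts can be checked programmatically. The sketch below assumes the summary lines look like the example output above; the helper name `parse_rclone_summary` is hypothetical, and the regular expressions are only matched against that sample format.

```python
import re


def parse_rclone_summary(text):
    """Extract file counts and the number of ERROR log lines from rclone -P output."""
    summary = {"errors": len(re.findall(r" ERROR : ", text))}
    checks = re.search(r"^Checks:\s+(\d+) / (\d+)", text, re.MULTILINE)
    if checks:
        summary["checked"] = int(checks.group(1))
    # The file-count line has plain integers; the byte-count line has units
    # such as GiB and therefore does not match this pattern.
    files = re.search(r"^Transferred:\s+(\d+) / (\d+), \d+%\s*$", text, re.MULTILINE)
    if files:
        summary["transferred"] = int(files.group(1))
        summary["transfer_total"] = int(files.group(2))
    return summary


sample_output = """\
2023-03-17 16:31:52 ERROR : ftp://daacupload.ornl.gov:21/407161fd93: SetModTime is not supported
Transferred:       27.839 GiB / 27.839 GiB, 100%, 39.908 MiB/s, ETA 0s
Checks:              3556 / 3556, 100%
Transferred:          335 / 335, 100%
Elapsed time:     11m40.2s
"""

print(parse_rclone_summary(sample_output))
```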