# **Data Movement**
- Big data science using ARM data requires access to large volumes of observations.
- Searching for and downloading data can be a daunting and time-consuming task.
- ARM provides custom-developed services that allow seamless access to the ARM data archives.
- These services are designed for easy use in a programmatic environment.
- Our custom tools alleviate some of these difficulties by:
    - integrating search and download
    - handling authentication, security, etc.
    - providing command-line and Python APIs that enable fully automated workflows

# **Tools for data movement**

| ARM Live Data Web Service | `stage_arm_data` |
|--------------------|-----------------|
| Web service | Globus-based transfer |
| Works from anywhere in the world | Custom tool available only on ARM HPC clusters |
| Access to all user-accessible data | Access to **ALL** ARM data, including raw |
| Download only | Also allows data movement within ARM clusters and ADC |
| HTTP download limited by your internet speed | Fast transfers over a dedicated InfiniBand network |
| Allows search and download based on simple queries | Allows search and download based on simple queries |
| Command line, Python API | Command line, Python API |
| Enables fully portable applications | Applications are limited to the ARM cluster |

# **ARM Live Data Web Service**

**URL: https://adc.arm.gov/armlive**

Developed and maintained by: Ranjeet Devarakonda and ADC Web Tools Team

Provides: REST API, Command line (Wget, Curl), Bash scripting, Python API

## **Using REST API**
`https://adc.arm.gov/armlive/data/query?user=USER_ID:ACCESS_TOKEN&ds=DATASTREAM&start=START_DATE&end=END_DATE`
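The query URL above can also be assembled programmatically. The following is a minimal sketch that only builds the URL string from the template shown; the function name `build_query_url` is hypothetical and not part of any ARM package, and the colon in `USER_ID:ACCESS_TOKEN` is deliberately left unescaped so the result matches the template.

```python
from urllib.parse import urlencode

# Base endpoint taken from the URL template above.
QUERY_ENDPOINT = 'https://adc.arm.gov/armlive/data/query'


def build_query_url(user_id, access_token, datastream, start, end):
    """Return the REST query URL for a datastream and date range.

    Hypothetical helper for illustration; it performs no network request.
    """
    params = {
        'user': f'{user_id}:{access_token}',
        'ds': datastream,
        'start': start,
        'end': end,
    }
    # safe=':' keeps the USER_ID:ACCESS_TOKEN separator readable in the URL.
    return f'{QUERY_ENDPOINT}?{urlencode(params, safe=":")}'


if __name__ == '__main__':
    print(build_query_url('example', 'abcd1234', 'sgpmetE11.b1',
                          '2018-01-01', '2018-02-01'))
```

The resulting string can be pasted into a browser or passed to any HTTP client.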

## **Command line download using Curl**
`$ curl -o sgpmetE11.b1.20180101.000000.cdf 'https://adc.arm.gov/armlive/data/saveData?user=example:abcd1234&file=sgpmetE11.b1.20180101.000000.cdf'`

## **Command line download using Wget**
`$ wget 'https://adc.arm.gov/armlive/data/saveData?user=example:abcd1234&file=sgpmetE11.b1.20180101.000000.cdf'`

## **Using Python API**

Installation: `$ pip install git+https://code.ornl.gov/ofg/armlive_getfiles.git`

Execution: `$ getARMFiles -u example:abcd1234 -ds sgpmetE11.b1 -s 2018-01-01 -e 2018-02-01`

OR 

Installation: `$ git clone https://code.ornl.gov/ofg/armlive_getfiles.git`

Execution: `$ python armlive_getfiles/src/getFiles.py -u example:abcd1234 -ds sgpmetE11.b1 -s 2018-01-01 -e 2018-02-01`

## **Using bash scripting**
Bash script: `https://adc.arm.gov/armlive/scripts/getFiles.sh`

Execution: `$ bash getFiles.sh example:abcd1234 sgpmetE11.b1 2018-01-01 2018-02-01`

## **Downloading files from a Python function**
The ARM Live web service returns a JSON blob with download links for archive files based on the datastream, start date, and end date provided. The Python function below parses the JSON blob and downloads the matching files into the output directory.


```python
import shutil
from pathlib import Path

import requests


def download_arm_files(user, token, datastream, start, end, output_directory):
    """Query ARM Live for matching files and download them to output_directory."""
    params = {
        'user': f'{user}:{token}',
        'ds': datastream,
        'start': start,
        'end': end,
        'wt': 'json',
    }

    # Ask the web service which files match the datastream and date range.
    response = requests.get('https://adc.arm.gov/armlive/livedata/query', params=params)
    response.raise_for_status()
    response = response.json()

    downloaded_files = []
    for filename in response['files']:
        download_url = f'https://adc.arm.gov/armlive/livedata/saveData?user={user}:{token}&file={filename}'
        file_path = Path(output_directory, filename)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        # Stream each file to disk so large files are not held in memory.
        with requests.get(download_url, stream=True) as r:
            r.raise_for_status()
            with open(file_path, 'wb') as f:
                shutil.copyfileobj(r.raw, f)
        downloaded_files.append(file_path)
    return downloaded_files
```
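Once files are downloaded, it is often useful to recover the datastream and timestamp from each name. The sketch below assumes the five-part layout seen in the example file `sgpmetE11.b1.20180101.000000.cdf` (name, data level, date, time, extension); that layout is inferred from this one example rather than taken from a specification, and `parse_arm_filename` and `ArmFileName` are hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple


class ArmFileName(NamedTuple):
    """Pieces of an ARM file name (hypothetical container for illustration)."""
    datastream: str      # e.g. 'sgpmetE11.b1'
    timestamp: datetime  # parsed from the date and time fields
    extension: str       # e.g. 'cdf'


def parse_arm_filename(filename):
    """Split a file name such as 'sgpmetE11.b1.20180101.000000.cdf'.

    Assumes the layout <name>.<level>.<YYYYMMDD>.<HHMMSS>.<ext>.
    """
    # Drop any leading directory component before splitting on dots.
    base = filename.rsplit('/', 1)[-1]
    parts = base.split('.')
    if len(parts) != 5:
        raise ValueError(f'unexpected file name layout: {filename!r}')
    name, level, date, time, extension = parts
    timestamp = datetime.strptime(date + time, '%Y%m%d%H%M%S')
    return ArmFileName(f'{name}.{level}', timestamp, extension)


if __name__ == '__main__':
    print(parse_arm_filename('sgpmetE11.b1.20180101.000000.cdf'))
```

Names that do not match the assumed layout raise `ValueError` instead of being silently misparsed.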

# stage_arm_data

Developed and maintained by: Zach Price, Jitu Kumar and ARM HPC Team

- stage_arm_data uses Globus protocols to
    - query the ARM database
    - identify the files that match the search criteria
    - download them from the ARM archive
    - handle all security and authentication
- Data is always staged in the project shared area on the Lustre filesystem.
- Currently, stage_arm_data is only available from the login and compute nodes of Stratus.
- It is not available from Jupyter notebooks. The recommended workaround in the meantime is to open a web terminal in Jupyter, ssh to stratus.ornl.gov, and stage the data from there.


# stage_arm_data usage

Add to your environment: `module load stage_arm_data`

Data transfer command: `stage_arm_data --to Stratus --datastream corkasacrcfrhsrhiM1.a1 --start 2019-01-01 00:00:00 --end 2019-02-01 00:00:00`

Files will be staged at: `/lustre/or-hydra/cades-arm/proj-shared/data_transfer/cor/corkasacrcfrhsrhiM1.a1/`
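For scripts that need to locate the staged files, the path can be derived from the datastream name. This is a minimal sketch that assumes the directory layout generalizes from the single example above, with the first three characters of the datastream (`cor`) used as the site subdirectory; `staging_directory` is a hypothetical helper and not part of `stage_arm_data`.

```python
from pathlib import PurePosixPath

# Root of the staging area, taken from the example path above.
STAGING_ROOT = PurePosixPath('/lustre/or-hydra/cades-arm/proj-shared/data_transfer')


def staging_directory(datastream):
    """Return the expected staging directory for a datastream.

    Assumption: files land in <root>/<site>/<datastream>/, where <site>
    is the three-letter prefix of the datastream name.
    """
    site = datastream[:3]
    return STAGING_ROOT / site / datastream


if __name__ == '__main__':
    print(staging_directory('corkasacrcfrhsrhiM1.a1'))
```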

From within a Python script:
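```python
from stage_arm_data.core import transfer_files
from stage_arm_data.endpoints import Stratus

# Times are given as Unix epoch seconds.
constraints = {
    'start_time': 1552608000,  # 2019-03-15 00:00:00 UTC
    'end_time': 1552694400,    # 2019-03-16 00:00:00 UTC
    'datastream': 'corkasacrcfrhsrhiM1.a1'
}

# Stage the matching files on Stratus.
transfer_files(constraints, Stratus)
```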

Use the `--dry-run=True` option in constraints to list the available files without transferring them.
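The `start_time` and `end_time` values in the Python example appear to be Unix epoch seconds in UTC. A small sketch of building the constraints dictionary from readable timestamps follows, under that assumption; `to_epoch_seconds` and `build_constraints` are hypothetical helpers and not part of `stage_arm_data`.

```python
from datetime import datetime, timezone


def to_epoch_seconds(timestamp):
    """Convert 'YYYY-MM-DD HH:MM:SS' (interpreted as UTC) to epoch seconds."""
    parsed = datetime.strptime(timestamp, '%Y-%m-%d %H:%M:%S')
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def build_constraints(datastream, start, end):
    """Build a constraints dict with the same keys as the example above."""
    return {
        'start_time': to_epoch_seconds(start),
        'end_time': to_epoch_seconds(end),
        'datastream': datastream,
    }


if __name__ == '__main__':
    print(build_constraints('corkasacrcfrhsrhiM1.a1',
                            '2019-03-15 00:00:00', '2019-03-16 00:00:00'))
```

The resulting dictionary has the same shape as the `constraints` passed to `transfer_files` in the example.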