<img src='https://radiant-assets.s3-us-west-2.amazonaws.com/PrimaryRadiantMLHubLogo.png' alt='Radiant MLHub Logo' width='300'/>

How to use the Radiant MLHub API to browse and download the NASA Tropical Storm Wind Speed Competition Data
=====

This Jupyter notebook, which you may copy and adapt for any use, shows basic examples of how to use the API to download labels and source imagery for the NASA Tropical Storm Wind Speed Competition dataset. Full documentation for the API is available at [docs.mlhub.earth](http://docs.mlhub.earth).

We'll show you how to set up your authorization, retrieve the items (and the data contained within them) from the training and test collections, and load the data into dataframes.

Each item in our collection is described in JSON format that complies with the STAC label extension definition.
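
As a rough illustration of what such an item looks like, the sketch below builds a pared-down label item as a Python dictionary and pulls out the two pieces the download code in this notebook reads: the `labels` asset and the links whose `rel` is `source`. The ID, the hrefs, and the helper name `summarize_label_item` are placeholders made up for this example, and real items returned by the API carry many more fields.

```python
# Hypothetical, heavily trimmed label item; every value is a placeholder.
example_item = {
    'id': 'example_labels_abc_000',
    'assets': {
        'labels': {'href': 'https://example.com/download/labels.json'},
    },
    'links': [
        {'rel': 'self', 'href': 'https://example.com/items/example_labels_abc_000'},
        {'rel': 'source', 'href': 'https://example.com/items/example_source_abc_000'},
    ],
}

def summarize_label_item(item):
    """Return the storm ID, label href, and source item hrefs of an item."""
    # The notebook takes the second-to-last underscore-separated token as the storm ID
    storm_id = item['id'].split('_')[-2]
    label_href = item['assets']['labels']['href']
    source_hrefs = [link['href'] for link in item['links'] if link['rel'] == 'source']
    return storm_id, label_href, source_hrefs

storm_id, label_href, source_hrefs = summarize_label_item(example_item)
print(storm_id, label_href, source_hrefs)
```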

Citation
====
M. Maskey, R. Ramachandran, I. Gurung, B. Freitag, M. Ramasubramanian, J. Miller (2020) "Tropical Cyclone Wind Estimation Competition Dataset", Version 1.0, Radiant MLHub. \[Date Accessed\] [https://doi.org/10.34911/rdnt.xs53up](https://doi.org/10.34911/rdnt.xs53up)

Authentication
-----

Access to the Radiant MLHub API requires an API key. To get your API key, go to [dashboard.mlhub.earth](https://dashboard.mlhub.earth). If you have not used Radiant MLHub before, you will need to sign up and create a new account. Otherwise, sign in. In the **API Keys** tab, you can create the API key you will need. *Do not share* your API key with others: your usage may be limited, and sharing your API key is a security risk.

Copy the API key and paste it in the box below.

The collection IDs for the NASA Tropical Storm Wind Speed Competition dataset can be found on the [registry page](https://registry.mlhub.earth/10.34911/rdnt.xs53up). They are used in the API requests made by the download functions below.

Click **Run** or press SHIFT + ENTER to run this first piece of code before moving on.

In [None]:
import requests

# copy your API key from dashboard.mlhub.earth and paste it in the following
API_KEY = 'PASTE_YOUR_API_KEY_HERE'
API_BASE = 'https://api.radiant.earth/mlhub/v1'
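
Because the key should stay private, one option is to keep it out of the notebook altogether and read it from an environment variable. The sketch below assumes a variable named `MLHUB_API_KEY`, a name chosen for this example rather than anything the API requires, and falls back to the placeholder string when the variable is not set.

```python
import os

def load_api_key(env_var='MLHUB_API_KEY', default='PASTE_YOUR_API_KEY_HERE'):
    """Read the API key from the environment, falling back to a placeholder."""
    # MLHUB_API_KEY is a hypothetical variable name used only in this sketch.
    return os.environ.get(env_var, default)

API_KEY = load_api_key()
if API_KEY == 'PASTE_YOUR_API_KEY_HERE':
    print('No API key found; set MLHUB_API_KEY or paste the key into the notebook.')
```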

Downloading Items Setup
====
The code below sets up functions which we will use to download items.

In [None]:
import os
from urllib.parse import urlparse
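from multiprocessing.pool import ThreadPool
from tqdm import tqdm

def download_http(uri, path, max_attempts=3):
    """Download uri into the directory path, trying up to max_attempts times."""
    file_path = os.path.join(path, urlparse(uri).path.split('/')[-1])
    # Skip files that are already on disk before making any request
    if os.path.exists(file_path):
        return
    for attempt in range(max_attempts):
        try:
            r = requests.get(uri, stream=True)
            r.raise_for_status()
            # The with block closes the file even if writing fails
            with open(file_path, 'wb') as f:
                for chunk in r.iter_content(chunk_size=512 * 1024):
                    if chunk:
                        f.write(chunk)
            return
        except (requests.RequestException, OSError):
            # Remove any partial file so the next attempt starts clean
            if os.path.exists(file_path):
                os.remove(file_path)
    print('ERROR: Could Not Download', uri)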

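def get_download_uri(uri):
    # Request the asset without following redirects; the Location header holds the real download URI
    r = requests.get(uri, allow_redirects=False)
    # Fall back to the original URI if the server did not redirect
    return r.headers.get('Location', uri)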

def download(d):
    # d is an (href, directory) pair
    href, path = d
    download_uri = get_download_uri(href)
    parsed = urlparse(download_uri)

    if parsed.scheme in ['http', 'https']:
        download_http(download_uri, path)
    else:
        # No S3 downloader is defined in this notebook, so other schemes are reported and skipped
        print('ERROR: Unsupported download scheme for', download_uri)
        
def get_source_item_assets(args):
    # args is a (directory, source item href) pair
    path, href = args
    try:
        r = requests.get(href, params={'key': API_KEY})
        r.raise_for_status()
        assets = r.json()['assets']
    except (requests.RequestException, ValueError, KeyError):
        print('ERROR: Could Not Load', href)
        return []

    # Make sure the target directory exists
    os.makedirs(path, exist_ok=True)

    # Every asset of the source item is downloaded into the same directory
    return [(asset['href'], path) for asset in assets.values()]

def get_test_source_item_assets(item):
    asset_downloads = []
    asset_path = f'test/{item["id"].split("_")[-2]}/'
    if not os.path.exists(asset_path):
        os.makedirs(asset_path)

    for key, asset in item['assets'].items():
        asset_downloads.append([(asset['href'], asset_path)])
        
    return asset_downloads

def download_source_and_labels(item):
    labels = item.get('assets').get('labels')
    links = item.get('links')
    
    # Make the directory to download the files to
    path = f'train/{item["id"].split("_")[-2]}/'
    if not os.path.exists(path):
        os.makedirs(path)
    
    source_items = []
    
    # Download the source imagery
    for link in links:
        if link['rel'] != 'source':
            continue
        source_items.append((path, link['href']))
        
    results = list(map(get_source_item_assets, source_items))
    results.append([(labels['href'], path)])
            
    return results

def download_train_items(uri=None):
    if uri is None:
        uri = f'{API_BASE}/collections/nasa_tropical_storm_competition_train_labels/items'
    print('Loading', uri, '...')
    r = requests.get(uri, params={'key': API_KEY})
    collection = r.json()
    for feature in collection.get('features', []):
        for d in download_source_and_labels(feature):
            for args in d:
                download(args)
    
        
    # Get the next page of results, if available
    for link in collection.get('links', []):
        if link['rel'] == 'next' and link['href'] is not None:
            download_train_items(link['href'])

def download_test_items(uri=None):
    if uri is None:
        uri = f'{API_BASE}/collections/nasa_tropical_storm_competition_test_source/items'
    print('Loading', uri, '...')
    r = requests.get(uri, params={'key': API_KEY})
    collection = r.json()
    for feature in collection.get('features', []):
        for d in get_test_source_item_assets(feature):
            for args in d:
                download(args)
        
    for link in collection.get('links', []):
        if link['rel'] == 'next' and link['href'] is not None:
            download_test_items(link['href'])
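
`download_train_items` and `download_test_items` follow the `next` link by calling themselves, so every page adds a stack frame. For collections with many pages, a loop may be easier to reason about. The following is a sketch only: `iter_features` and its `fetch_page` argument are names introduced here, and `fetch_page` is assumed to take a URI and return the decoded JSON of one page.

```python
def iter_features(start_uri, fetch_page):
    """Yield every feature from a paginated collection, following 'next' links."""
    uri = start_uri
    while uri is not None:
        page = fetch_page(uri)
        for feature in page.get('features', []):
            yield feature
        # Look for a 'next' link; stop when the page has none
        uri = None
        for link in page.get('links', []):
            if link['rel'] == 'next' and link['href'] is not None:
                uri = link['href']
                break

# Stand-in for the API so the sketch runs offline; the page contents are made up.
fake_pages = {
    'page1': {'features': [{'id': 'a'}, {'id': 'b'}],
              'links': [{'rel': 'next', 'href': 'page2'}]},
    'page2': {'features': [{'id': 'c'}], 'links': []},
}
feature_ids = [feature['id'] for feature in iter_features('page1', fake_pages.__getitem__)]
print(feature_ids)
```

With the real API, `fetch_page` could be `lambda uri: requests.get(uri, params={'key': API_KEY}).json()`, mirroring the request made in the functions above.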

Downloading Training Items
=====
Run the cell below to download the training items.

In [None]:
download_train_items()

Downloading Test Items
====
Run the cell below to download the test items.

In [None]:
download_test_items()
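
The cells above fetch one file at a time. Since `ThreadPool` is already imported in the setup cell, one possible variation is to collect the `(href, path)` pairs first and hand them to a small pool of threads. This is a sketch under assumptions: `run_in_pool` is a helper named for this example, the pool size of 8 is arbitrary, and the `worker` argument would be the notebook's `download` function in practice.

```python
from multiprocessing.pool import ThreadPool

def run_in_pool(worker, tasks, processes=8):
    """Apply worker to every task using a pool of threads and return the results."""
    with ThreadPool(processes) as pool:
        # imap_unordered yields results as they finish, in no particular order
        return list(pool.imap_unordered(worker, tasks))

# Offline demonstration with a stand-in worker; the hrefs and paths are placeholders.
def fake_download(task):
    href, path = task
    return path + href.split('/')[-1]

tasks = [('https://example.com/a.jpg', 'train/abc/'),
         ('https://example.com/b.jpg', 'train/abc/')]
results = run_in_pool(fake_download, tasks)
print(sorted(results))
```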

Loading Data into a Dataframe
====
The code below loads the training and test items into two dataframes and sorts the rows of each by Image ID.

In [None]:
import glob
import json
import pandas as pd
import numpy as np

files = glob.glob('train/**/*.jpg')
pd_data = []

for fname in files:
    label_file = fname.replace('.jpg', '_label.json')
    features_file = fname.replace('.jpg', '_features.json')
    storm_id = fname.split('/')[1]
    image_id = fname.split('/')[-1].replace('.jpg', '')
    
    with open(features_file, 'r') as f:
        features_data = json.load(f)
        
    with open(label_file, 'r') as f:
        label_data = json.load(f)
        
    pd_data.append([image_id, storm_id, int(features_data['relative_time']), int(features_data['ocean']), int(label_data['wind_speed']), fname])

# Build the dataframe from the list directly so the numeric columns keep their integer dtype
train_df = pd.DataFrame(pd_data,
                   columns=['Image ID', 'Storm ID', 'Relative Time', 'Ocean', 'Wind Speed', 'Image File Path']).sort_values(by=['Image ID'])

files = glob.glob('test/**/*.jpg')
pd_data = []

for fname in files:
    features_file = fname.replace('.jpg', '_features.json')
    storm_id = fname.split('/')[1]
    image_id = fname.split('/')[-1].replace('.jpg', '')
    
    with open(features_file, 'r') as f:
        features_data = json.load(f)
        
    # Test items have no labels, so only the features file is read
    pd_data.append([image_id, storm_id, int(features_data['relative_time']), int(features_data['ocean']), fname])

# Build the dataframe from the list directly so the numeric columns keep their integer dtype
test_df = pd.DataFrame(pd_data,
                   columns=['Image ID', 'Storm ID', 'Relative Time', 'Ocean', 'Image File Path']).sort_values(by=['Image ID'])
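
The loops above recover the storm ID and image ID by splitting file paths on `/`, which assumes POSIX-style separators. A minimal alternative using `pathlib` is sketched below; `parse_image_path` is a hypothetical helper, the example path is made up, and it assumes the `<split>/<storm_id>/<image_id>.jpg` layout that the download functions create.

```python
from pathlib import Path

def parse_image_path(fname):
    """Return (storm_id, image_id, features_file, label_file) for an image path."""
    image_path = Path(fname)
    storm_id = image_path.parent.name   # directory that holds the image
    image_id = image_path.stem          # file name without the .jpg suffix
    features_file = image_path.with_name(image_id + '_features.json')
    label_file = image_path.with_name(image_id + '_label.json')
    return storm_id, image_id, features_file, label_file

storm_id, image_id, features_file, label_file = parse_image_path('train/abc/abc_000.jpg')
print(storm_id, image_id, features_file.name, label_file.name)
```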