<a target="_blank"  href="https://colab.research.google.com/github/dataJSA/radiant-mlhub/blob/master/examples/mlhub_client.ipynb">
    <img height="45px" src="https://colab.research.google.com/img/colab_favicon_256px.png"/> 
    Run in Google Colab
  </a>

  <h1>Getting Started with the Radiant MLHub API Using The MLHub Client</h1>

The Radiant MLHub is an open library for geospatial training data to advance machine learning applications on Earth Observations. The training datasets include pairs of imagery and labels for different types of ML problems including image classification, object detection, and semantic segmentation.


The Radiant MLHub API gives access to the different datasets. The full API documentation is available at [docs.mlhub.earth](https://docs.mlhub.earth), and more information can be found on the [Radiant MLHub site](https://mlhub.earth).

**<p align="center">This notebook demonstrates how to download the full [LandCoverNet](https://medium.com/radiant-earth-insights/radiant-earth-foundation-releases-the-benchmark-training-data-landcovernet-for-africa-7e8906e846a3) dataset using the Radiant MLHub API.</p>**
> LandCoverNet is an annual land cover classification training dataset with labels for the multi-spectral high-quality satellite imagery from Sentinel-2 satellites, covering Africa, Asia, Australia, Europe, North America, and South America.

## Setup

### Package Requirements

In [None]:
!pip install git+https://github.com/dataJSA/radiant-mlhub

### Setup Requirements

In [None]:
import pandas as pd

from mlhub import mlhub
from itertools import chain
from urllib.parse import urlparse
from google.colab import drive, files


from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
%matplotlib inline

In [None]:
drive.mount('/content/drive')

## MLHub Client

The MLHub Client aims to provide a reliable tool for downloading the full LandCoverNet dataset. It is still an experimental project. ***If you encounter difficulties, do not hesitate to open an issue.***

### Authentication

To get your access token, go to [dashboard.mlhub.earth](https://dashboard.mlhub.earth/). If you have not used Radiant MLHub before, you will need to sign up and create a new account. Otherwise, sign in. Under Usage, you'll see your access token, which you will need.

In [None]:
API_TOKEN = ''
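
Hardcoding the token in a shared notebook makes it easy to leak. As a minimal sketch, assuming you are free to store the token in an environment variable, the helper below reads it from there and falls back to a hidden prompt. The function name `load_api_token` and the variable name `MLHUB_API_TOKEN` are hypothetical and not part of the MLHub client.

```python
import os
from getpass import getpass


def load_api_token(env_var="MLHUB_API_TOKEN", prompt=getpass):
    """Return the token from the environment, or ask for it interactively.

    `MLHUB_API_TOKEN` is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fall back to a hidden prompt so the token is not echoed in the output
        token = prompt("Radiant MLHub API token: ").strip()
    if not token:
        raise ValueError("An API token is required to query the MLHub API")
    return token
```

With this in place, the cell above could become `API_TOKEN = load_api_token()`.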

### Usage



#### Initialize the MLHub client

The client is initialized with a default `collection_id` and an optional default `feature_id`.

In [None]:
client = mlhub.Client(api_token=API_TOKEN, 
                     collection_id='ref_landcovernet_v1_labels',
                     feature_id='ref_landcovernet_v1_labels_29NMG_12')

#### Describe the default collection

In [None]:
client.describe_collection()

#### Get an item from the collection 

In [None]:
label_item = client.get_item(collection_id='ref_landcovernet_v1_labels', 
                             item_id='ref_landcovernet_v1_labels_28QDE_02')
source_item = client.get_item(collection_id='ref_landcovernet_v1_source',
                              item_id='ref_landcovernet_v1_source_28QDE_02_20180103')

In [None]:
label_item

In [None]:
source_item

#### Get assets from the item

In [None]:
source_item_assets = client.get_item_assets(source_item,
                                            ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07',
                                             'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD', 'SCL'])
label_item_assets = client.get_item_assets(label_item,
                                          ['labels'])

In [None]:
source_item_assets

In [None]:
label_item_assets

#### Get multiple items and their respective assets

In [None]:
links = label_item.get('links')
source_items_refs = [(urlparse(link.get('href')).path.split('/')[-1], link.get('href'))
                     for link in links if link.get('rel') == 'source']
source_items_ids = [item_id for item_id, _ in source_items_refs]
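
To make the link parsing in the cell above easier to follow, here is a self-contained sketch that runs without an API token. The item dictionary and its URLs are placeholders invented for illustration. Only the `links`, `rel` and `href` fields used by the notebook are assumed.

```python
from urllib.parse import urlparse

# Hypothetical label item: only the fields used here are shown, and the
# URLs are placeholders rather than real MLHub endpoints.
example_label_item = {
    "id": "example_labels_0001",
    "links": [
        {"rel": "self",
         "href": "https://example.com/collections/labels/items/example_labels_0001"},
        {"rel": "source",
         "href": "https://example.com/collections/source/items/example_source_0001_20180103"},
        {"rel": "source",
         "href": "https://example.com/collections/source/items/example_source_0001_20180113"},
    ],
}


def source_item_ids(item):
    """Return the ids of the source items linked from a label item."""
    return [
        # The item id is the last segment of the URL path
        urlparse(link["href"]).path.split("/")[-1]
        for link in item.get("links", [])
        if link.get("rel") == "source"
    ]


print(source_item_ids(example_label_item))
# ['example_source_0001_20180103', 'example_source_0001_20180113']
```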

In [None]:
source_items_refs

In [None]:
source_items_ids

In [None]:
source_items = client.get_items(collection_id='ref_landcovernet_v1_source',
                                items_ids=source_items_ids)

In [None]:
source_items

In [None]:
source_items_assets = client.get_items_assets(items=source_items,
                                              assets_keys=['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07',
                                                           'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD', 'SCL'])

In [None]:
source_items_assets

In [None]:
source_items_assets_flat = list(chain(*source_items_assets))
source_items_assets_flat

In [None]:
source_items_assets_refs = [(f'landcovernet/{asset[0]}/', asset[2]) for asset in source_items_assets_flat]
source_items_assets_refs
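
The flattening and pairing steps can be tried on toy data. The sketch below assumes that each asset is an `(item_id, asset_key, href)` tuple, which is inferred from the `asset[0]` and `asset[2]` indexing above rather than confirmed by the client's documentation. The values are placeholders.

```python
from itertools import chain

# Hypothetical nested result: one inner list per item, one tuple per asset.
# The (item_id, asset_key, href) layout is an assumption, not a documented format.
nested_assets = [
    [("item_a", "B02", "https://example.com/item_a/B02.tif"),
     ("item_a", "B03", "https://example.com/item_a/B03.tif")],
    [("item_b", "B02", "https://example.com/item_b/B02.tif")],
]

# chain.from_iterable flattens one level of nesting
flat_assets = list(chain.from_iterable(nested_assets))

# Pair each download link with a per-item target folder
asset_refs = [(f"landcovernet/{item_id}/", href)
              for item_id, _key, href in flat_assets]

print(len(flat_assets))  # 3
print(asset_refs[0])     # ('landcovernet/item_a/', 'https://example.com/item_a/B02.tif')
```

`chain.from_iterable(nested)` is equivalent to `chain(*nested)` for a list, but it does not need to unpack the outer list into arguments first.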

#### Download the item assets

In [None]:
client.downloads(source_items_assets_refs[:10], leave=True)

## Retrieving The LandCoverNet Dataset

Radiant MLHub datasets are split into two collections: one contains items for the source imagery and the other contains items for the labels.

- **Label items** are JSON objects with properties describing the type of label, possible label values, spatial and temporal extents, and links to the label assets to download.
  
- **Source imagery items** contain all information required to determine the location and time that the imagery was taken, as well as links to download either individual bands of the imagery or the multi-band files.

![](https://miro.medium.com/max/1260/1*Ei8QLbju7wfssi7w7NBOUA.png)

For more details, see Kevin Booth's article [Accessing and Downloading Training Data on the Radiant MLHub API](https://medium.com/radiant-earth-insights/accessing-and-downloading-training-data-on-the-radiant-mlhub-api-f04dc635592f).

***

**The two collections needed for downloading the full LandCoverNet dataset are:**

- **`ref_landcovernet_v1_source`: includes the multi-temporal bands of Sentinel-2**
- **`ref_landcovernet_v1_labels`: includes the labels**

[The LandCoverNet dataset documentation](https://radiant-mlhub.s3-us-west-2.amazonaws.com/landcovernet/Documentation.pdf)

The LandCoverNet dataset consists of:

- A representative set of **66 Sentinel-2 tiles**
- For each of the 66 tiles, **30 image chips of 256 x 256 pixels** at **10 m spatial resolution** are selected
- For each of the image chips, **~73 scenes (temporal observations)** covering 2018 are selected
- For each scene, **14 bands, i.e. GeoTIFF files,** are available (including cloud cover and the scene classification layer)

The dataset contains roughly 2,100,000 source item assets **(tile × chip × scene × band)**.
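
As a quick sanity check, multiplying the counts listed above gives a number of the same order as the quoted figure. The scene count is approximate (~73 per chip), so the result is an estimate rather than an exact inventory.

```python
tiles = 66
chips_per_tile = 30
scenes_per_chip = 73   # approximate: the text gives ~73 scenes per chip
bands_per_scene = 14

image_chips = tiles * chips_per_tile
source_assets = image_chips * scenes_per_chip * bands_per_scene

print(f"{image_chips:,} image chips")      # 1,980 image chips
print(f"{source_assets:,} source assets")  # 2,023,560 source assets
```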

### Test downloading both label and source item assets from the MLHub API

In [None]:
test_item_assets = client.get_items_all_assets(uri=client.collection_items_uri,
                                   max_items=2,
                                   limits=1)

In [None]:
client.downloads(test_item_assets[:10], leave=False)

### Retrieve both label and source item download references from the MLHub API

> **Retrieving the references for the full dataset on a single-core vCPU (as is the case on Google Colab) will take 2 to 3 hours.**


In [None]:
items_assets_refs = client.get_items_all_assets(uri=client.collection_items_uri, limits=100)

### Save References

The asset references expire after 6 hours.

In [None]:
results = pd.DataFrame({'assets': items_assets_refs})
results.to_csv('landcovernet_assets_references.csv')
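
Since the references are only valid for a limited time, it can help to record when they were retrieved and check their age before starting a download. The sketch below is one possible way to do this. The helper name `references_expired` is hypothetical and the 6-hour lifetime is taken from the statement above.

```python
from datetime import datetime, timedelta, timezone

# 6 hours, as stated above; adjust if the API's behaviour differs
REFERENCE_LIFETIME = timedelta(hours=6)


def references_expired(retrieved_at, now=None, lifetime=REFERENCE_LIFETIME):
    """Return True if references fetched at `retrieved_at` are past their lifetime."""
    now = now or datetime.now(timezone.utc)
    return now - retrieved_at >= lifetime


# Example with fixed timestamps
fetched = datetime(2020, 1, 1, 8, 0, tzinfo=timezone.utc)
print(references_expired(fetched, now=fetched + timedelta(hours=5)))  # False
print(references_expired(fetched, now=fetched + timedelta(hours=7)))  # True
```

In the notebook, `retrieved_at = datetime.now(timezone.utc)` could be recorded right after the references are retrieved and stored alongside the CSV file.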

### Download the assets

> **Downloading the full dataset on a single-core CPU will take several days (downloading only 4% of the dataset took me roughly 8 hours on Google Colab's single-core vCPU).**

In [None]:
client.downloads(items_assets_refs, leave=False)
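
Given how long the full download takes, one option is to split the references into batches so that progress can be tracked and an interrupted run can be resumed. The helper below is a minimal sketch of that idea. `batched` is a hypothetical name and the references are placeholders.

```python
def batched(refs, size):
    """Yield consecutive slices of `refs` holding at most `size` entries each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(refs), size):
        yield refs[start:start + size]


# Placeholder (folder, url) pairs standing in for real asset references
example_refs = [(f"landcovernet/item_{i}/", f"https://example.com/{i}.tif")
                for i in range(5)]
batches = list(batched(example_refs, 2))

print([len(batch) for batch in batches])  # [2, 2, 1]
```

Each batch could then be passed to `client.downloads(batch, leave=False)`, noting the index of the last completed batch. Because the references expire after 6 hours, a multi-day run would presumably need to retrieve fresh references along the way.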