<a href="https://colab.research.google.com/github/jainaman588/DataScienceProjects/blob/master/notebooks/radiant-mlhub-api-know-how.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src='https://radiant-assets.s3-us-west-2.amazonaws.com/PrimaryRadiantMLHubLogo.png' alt='Radiant MLHub Logo' width='300'/>

# How to use the Radiant MLHub API

The Radiant MLHub API gives access to open Earth imagery training data for machine learning applications. You can learn more about the repository at the [Radiant MLHub site](https://mlhub.earth) and about the organization behind it at the [Radiant Earth Foundation site](https://radiant.earth).

This Jupyter notebook, which you may copy and adapt for any use, shows basic examples of how to use the API. Full documentation for the API is available at [docs.mlhub.earth](https://docs.mlhub.earth).

We'll show you how to set up your authorization, see the list of available collections and datasets, and retrieve the items (the data contained within them) from those collections. 

Each item in our collections is described in JSON that follows the [STAC](https://stacspec.org/) [label extension](https://github.com/radiantearth/stac-spec/tree/master/extensions/label) definition.

## Dependencies

This notebook uses the [`radiant-mlhub` Python client](https://pypi.org/project/radiant-mlhub/) to interact with the API. If you are running this notebook using Binder, then this dependency has already been installed. If you are running this notebook locally, you will need to install it yourself.

See the official [`radiant-mlhub` docs](https://radiant-mlhub.readthedocs.io/) for more documentation of the full functionality of that library.

## Authentication

### Create an API Key

Access to the Radiant MLHub API requires an API key. To get your API key, go to [dashboard.mlhub.earth](https://dashboard.mlhub.earth). If you have not used Radiant MLHub before, you will need to sign up and create a new account. Otherwise, sign in. In the **API Keys** tab, you'll be able to create API key(s), which you will need. *Do not share* your API key with others: your usage may be limited and sharing your API key is a security risk.

### Configure the Client

Once you have your API key, you need to configure the `radiant_mlhub` library to use that key. There are a number of ways to configure this (see the [Authentication docs](https://radiant-mlhub.readthedocs.io/en/latest/authentication.html) for details). 

For these examples, we will set the `MLHUB_API_KEY` environment variable. Run the cell below to save your API key as an environment variable that the client library will recognize.

*If you are running this notebook locally and have configured a profile as described in the [Authentication docs](https://radiant-mlhub.readthedocs.io/en/latest/authentication.html), then you do not need to execute this cell.*


In [1]:
import os

# Replace the placeholder with your own key; never commit or share a real API key
os.environ['MLHUB_API_KEY'] = 'PASTE_YOUR_API_KEY_HERE'
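
To keep the key out of the notebook file altogether, one option is to prompt for it at run time. The sketch below is an illustration only: `ensure_api_key` is a hypothetical helper and not part of `radiant_mlhub`. It relies on nothing beyond the `MLHUB_API_KEY` environment variable described above.

```python
import os
from getpass import getpass


def ensure_api_key(env=os.environ, prompt=getpass, var_name='MLHUB_API_KEY'):
    """Return the API key stored in `env`, prompting for it only when it is missing.

    Hypothetical helper for this notebook; not part of the radiant_mlhub library.
    """
    key = env.get(var_name, '').strip()
    if not key:
        # getpass hides the typed value, so the key is not echoed into the cell output
        key = prompt('Radiant MLHub API key: ').strip()
        if not key:
            raise ValueError('An API key is required')
        env[var_name] = key
    return key

# Usage in a notebook cell:
#     ensure_api_key()
```

Because the key is typed in when the cell runs, it never appears in the saved notebook.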

In [3]:
!pip install radiant_mlhub

Collecting radiant_mlhub
  Downloading https://files.pythonhosted.org/packages/d0/9d/7e226b0bd483260fa309d10bc6cb68ffbd965825d795055e3a4dd002635b/radiant_mlhub-0.2.1-py3-none-any.whl
Collecting pystac==0.5.4
  Downloading https://files.pythonhosted.org/packages/69/17/86b6e1531e1295c52d400e0a7d03b2d3a4ae75d92faec6fa00265a05ef86/pystac-0.5.4-py3-none-any.whl (133kB)
Collecting requests~=2.25.1
  Downloading https://files.pythonhosted.org/packages/29/c1/24814557f1d22c56d50280771a17307e6bf87b70727d975fd6b2ce6b014a/requests-2.25.1-py2.py3-none-any.whl (61kB)
Collecting tqdm~=4.56.0
  Downloading https://files.pythonhosted.org/packages/b3/db/dcda019790a8d989b8b0e7290f1c651a0aaef10bbe6af00032155e04858d/tqdm-4.56.2-py2.py3-none-any.whl (72kB)
ERROR: google-colab 1.0.0 has requirement requests~=2.

In [4]:
from radiant_mlhub import client, get_session

## List data collections

A **collection** in the Radiant MLHub API is a [STAC Collection](https://github.com/radiantearth/stac-spec/tree/master/collection-spec) representing a group of resources (represented as [STAC Items](https://github.com/radiantearth/stac-spec/tree/master/item-spec) and their associated assets) covering a given spatial and temporal extent. A Radiant MLHub collection may contain resources representing training labels, source imagery, or (rarely) both.

Use the `client.list_collections` function to list all available collections and view their properties. The following cell prints the ID, license (if available), and citation (if available) of each collection.

In [5]:
collections = client.list_collections()
for c in collections:
    collection_id = c['id']
    license = c.get('license', 'N/A')
    citation = c.get('sci:citation', 'N/A')

    print(f'ID:       {collection_id}\nLicense:  {license}\nCitation: {citation}\n')

ID:       ref_african_crops_kenya_01_labels
License:  CC-BY-SA-4.0
Citation: PlantVillage (2019) "PlantVillage Kenya Ground Reference Crop Type Dataset", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/RDNT.U41J87

ID:       ref_african_crops_kenya_01_source
License:  CC-BY-SA-4.0
Citation: PlantVillage (2019) "PlantVillage Kenya Ground Reference Crop Type Dataset", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/RDNT.U41J87

ID:       ref_african_crops_tanzania_01_labels
License:  CC-BY-SA-4.0
Citation: Great African Food Company (2019) "Great African Food Company Tanzania Ground Reference Crop Type Dataset", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/RDNT.5VX40R

ID:       ref_african_crops_tanzania_01_source
License:  CC-BY-SA-4.0
Citation: Great African Food Company (2019) "Great African Food Company Tanzania Ground Reference Crop Type Dataset", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/R
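
The IDs printed above come in `_labels` / `_source` pairs. As a rough sketch, the pairs can be grouped by their shared prefix. This assumes the suffix naming convention seen in this output holds for the collections you care about, which the API does not guarantee; `group_collections` is a hypothetical helper and the sample list stands in for the result of `client.list_collections()`.

```python
from collections import defaultdict


def group_collections(collections):
    """Group collection IDs by dataset prefix, keyed by 'labels' or 'source'.

    Assumes IDs end in '_labels' or '_source'; anything else is filed under 'other'.
    """
    grouped = defaultdict(dict)
    for c in collections:
        collection_id = c['id']
        prefix, _, kind = collection_id.rpartition('_')
        if prefix and kind in ('labels', 'source'):
            grouped[prefix][kind] = collection_id
        else:
            grouped[collection_id]['other'] = collection_id
    return dict(grouped)


# Stand-in for the dictionaries returned by client.list_collections()
sample = [
    {'id': 'ref_african_crops_kenya_01_labels'},
    {'id': 'ref_african_crops_kenya_01_source'},
    {'id': 'ref_african_crops_tanzania_01_labels'},
    {'id': 'ref_african_crops_tanzania_01_source'},
]

datasets = group_collections(sample)
for prefix, kinds in datasets.items():
    print(prefix, '->', sorted(kinds))
```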

Collection objects have many other properties besides the ones shown above. The cell below prints one collection object, `ref_south_africa_crops_competition_v1_train_labels`, in its entirety (the variable keeps the name `kenya_crops_labels`, which later cells reuse).

In [17]:
kenya_crops_labels = next(c for c in collections if c['id'] == 'ref_south_africa_crops_competition_v1_train_labels')
kenya_crops_labels

{'description': 'Crop Type Classification Dataset for Western Cape, South Africa',
 'extent': {'spatial': {'bbox': [[17.818514019575364,
     -34.15382764253356,
     19.765086593643762,
     -30.753864468983238]]},
  'temporal': {'interval': [['2017-08-01T00:00:00Z',
     '2017-08-01T00:00:00Z']]}},
 'id': 'ref_south_africa_crops_competition_v1_train_labels',
 'keywords': [],
 'license': 'CC-BY-4.0',
 'links': [{'href': 'https://api.radiant.earth/mlhub/v1/collections/ref_south_africa_crops_competition_v1_train_labels',
   'rel': 'self',
   'title': None,
   'type': 'application/json'},
  {'href': 'https://api.radiant.earth/mlhub/v1',
   'rel': 'root',
   'title': None,
   'type': 'application/json'}],
 'properties': {},
 'providers': [{'description': None,
   'name': 'Radiant Earth Foundation',
   'roles': ['licensor', 'host', 'processor'],
   'url': 'https://radiant.earth'}],
 'sci:citation': 'Western Cape Department of Agriculture, Radiant Earth Foundation (2021) "Crop Type Classifi
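
The `extent` property shown above holds the spatial and temporal coverage. The sketch below unpacks it from a plain dictionary shaped like that output. `parse_extent` is a hypothetical helper; it assumes a 2D bounding box in the STAC order west, south, east, north, and ISO 8601 timestamps ending in `Z`.

```python
from datetime import datetime


def parse_extent(collection):
    """Unpack the first bounding box and time interval of a STAC collection dict."""
    extent = collection['extent']
    west, south, east, north = extent['spatial']['bbox'][0]
    start, end = extent['temporal']['interval'][0]

    def to_datetime(value):
        # STAC uses None (JSON null) for an open-ended interval; 'Z' means UTC
        if value is None:
            return None
        return datetime.fromisoformat(value.replace('Z', '+00:00'))

    return {
        'bbox': (west, south, east, north),
        'width_deg': east - west,
        'height_deg': north - south,
        'start': to_datetime(start),
        'end': to_datetime(end),
    }


# Values copied from the collection output above
sample_collection = {
    'extent': {
        'spatial': {'bbox': [[17.818514019575364, -34.15382764253356,
                              19.765086593643762, -30.753864468983238]]},
        'temporal': {'interval': [['2017-08-01T00:00:00Z', '2017-08-01T00:00:00Z']]},
    }
}

info = parse_extent(sample_collection)
print(info['bbox'])
print(info['start'].isoformat())
```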

## Select an Item

Collections have items associated with them that are used to catalog assets (labels or source imagery) for that collection. Collections vary greatly in the number of items associated with them; some may contain only a handful of items, while others may contain hundreds of thousands of items.

The following cell uses `client.list_collection_items` to get the first item in the collection selected above (`ref_south_africa_crops_competition_v1_train_labels`). `client.list_collection_items` returns a Python generator that yields a dictionary for each item in the collection (you can read more about how to use Python generators [here](https://realpython.com/introduction-to-python-generators/)).

In [18]:
# NOTE: Here we are using the "id" property of the collection that we fetched above as the collection_id
#  argument to the list_collection_items function
items_iterator = client.list_collection_items(kenya_crops_labels['id'])

# Get the first item
first_item = next(items_iterator)
first_item

{'assets': {'documentation': {'description': None,
   'href': 'https://api.radiant.earth/mlhub/v1/download/gAAAAABg40TGphuWgcrIcel_tC7aRo7qFAt_vCKsUZrZjvVo4-vqcr_qSDit5XHjGRVYkY-s0zCp3gf0_EuKqN_GqFuCJJNLodvhLx-DOU8W1dYq5jsvrJEoh4WBvCOiE_f4b-egtqYBPlHqUenqttfTJgTWLJ6CPKdgFs876AzX4V5zsOWePzUwUZ_b_HP4VLFziDmRvytImiCJx_UaBH06XjmoLZ1NekoNsKO-buer3l7ZJZ-WuXdtJvTYQ5cF0dKVcDUejYwOVkLsjafCOuIiGqyVaegXHTtiUFe0SQhAHNqe6me7_2x5dj_b9Zl6eToVTHRD1GNZXz_BtRPcgZsKX9s_9HwCyTNe8Yec_uePoHN3O183894=',
   'roles': [],
   'title': 'Dataset Documentation',
   'type': 'application/pdf'},
  'field_ids': {'description': None,
   'href': 'https://api.radiant.earth/mlhub/v1/download/gAAAAABg40TG9n_DfXhNb1XmRU_W4Eu7Zg7QTUZ1Wdgo_OXmjZcz6Q5D0uZHlBLA9uTnySlKyjeR25xU_eVuI8-EvbRyi7FpybZb6fHsJYMrVUGGjOyXLpFDrxCiIMr0m2exOMAbCRKro0VGTB5zWJB63sbG_JnKgKPtxEBNmu9xlS0KuzQ6GPtIdINEAgJ-15Y2sIm7So8hGuM_nme2R24hmunnvR2oKxcnCvSTCnuYvAy7iOlerywjDbjwH1NETG57njAXT7Mw7y_wzxYmGIO9whhVT_WAEolFalX3NYPHXS4LoH90VNT3Cuv2EQ8LqLI8FOF3aWCIVjlJh

> **IMPORTANT:** Some collections may have hundreds of thousands of items (e.g. `bigearthnet_v1_source`). Looping over all of the items for these massive collections may take a very long time (perhaps on the order of hours), and is not recommended. To prevent accidentally looping over all items, the `client.list_collection_items` function limits the total number of returned items to `100` by default. You can change this limit using the `limit` argument:
> ```python
> client.list_collection_items(collection['id'], limit=150)
> ```
> If you would like to download all of the assets associated with a collection, it is far more efficient to use the `client.download_archive` function.
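
Because the items arrive through a generator, you can also stop early on the consuming side. The sketch below uses `itertools.islice` with a stand-in generator (`fake_items` is hypothetical and replaces `client.list_collection_items` so that the example runs without network access) to show that only the requested items are ever produced.

```python
from itertools import islice


def fake_items(total, produced):
    """Stand-in for an item generator; records every item it actually yields."""
    for i in range(total):
        produced.append(i)
        yield {'id': f'item_{i}'}


produced = []
# Take only the first 3 items; the generator is never advanced past them
first_three = list(islice(fake_items(1000, produced), 3))

print([item['id'] for item in first_three])  # ['item_0', 'item_1', 'item_2']
print(len(produced))                         # 3
```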

### List Available Assets

Each STAC Item has assets associated with it, representing the actual source imagery or labels described by that Item. 

The cell below summarizes the assets for the first item that we selected above by printing the key within the `assets` dictionary, the asset title and the media type.

In [19]:
for asset_key, asset in first_item['assets'].items():
    title = asset['title']
    media_type = asset['type']
    print(f'{asset_key}: {title} [{media_type}]')

documentation: Dataset Documentation [application/pdf]
field_ids: Field ID Labels [image/tiff; application=geotiff]
field_info_train: Field Label List [text/csv]
labels: Crop Labels [image/tiff; application=geotiff]
raster_values: Raster Crop Type Mapping [application/json]
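
When an item has many assets, grouping the keys by media type makes it easier to pick, for example, only the GeoTIFFs. The sketch below works on a plain dictionary shaped like the item above; `assets_by_media_type` is a hypothetical helper and the sample data is copied from the printed summary.

```python
from collections import defaultdict


def assets_by_media_type(item):
    """Map each media type to the sorted list of asset keys that use it."""
    grouped = defaultdict(list)
    for key, asset in item.get('assets', {}).items():
        grouped[asset.get('type') or 'unknown'].append(key)
    return {media_type: sorted(keys) for media_type, keys in grouped.items()}


# Asset keys, titles and types copied from the summary printed above
sample_item = {'assets': {
    'documentation': {'title': 'Dataset Documentation', 'type': 'application/pdf'},
    'field_ids': {'title': 'Field ID Labels', 'type': 'image/tiff; application=geotiff'},
    'field_info_train': {'title': 'Field Label List', 'type': 'text/csv'},
    'labels': {'title': 'Crop Labels', 'type': 'image/tiff; application=geotiff'},
    'raster_values': {'title': 'Raster Crop Type Mapping', 'type': 'application/json'},
}}

grouped = assets_by_media_type(sample_item)
geotiff_keys = grouped['image/tiff; application=geotiff']
print(geotiff_keys)  # ['field_ids', 'labels']
```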


## Download Assets

To download these assets, we will first set up a helper function to get the download link from the asset and then download the content to a local file.

> **NOTE:** If you are running these notebooks using Binder these resources will be downloaded to the remote file system that the notebooks are running on and **not to your local file system.** If you want to download the files to your machine, you will need to clone the repo and run the notebook locally.

In [20]:
import urllib.parse
from pathlib import Path
import requests


def download(item, asset_key, output_dir='.'):
    """Download one asset of a STAC item into output_dir."""
    # Try to get the given asset and return None if it does not exist
    asset = item.get('assets', {}).get(asset_key)
    if asset is None:
        print(f'Asset "{asset_key}" does not exist in this item')
        return None

    # Try to get the download URL from the asset and return None if it does not exist
    download_url = asset.get('href')
    if download_url is None:
        print(f'Asset "{asset_key}" does not have an "href" property, cannot download.')
        return None

    # Use the authenticated session and stream the response so large files are not held in memory
    session = get_session()
    r = session.get(download_url, allow_redirects=True, stream=True)
    # Fail early on HTTP errors instead of writing an error response to disk
    r.raise_for_status()

    # Name the local file after the last path segment of the final (redirected) URL
    filename = urllib.parse.urlsplit(r.url).path.split('/')[-1]
    output_path = Path(output_dir) / filename

    with output_path.open('wb') as dst:
        for chunk in r.iter_content(chunk_size=512 * 1024):
            if chunk:
                dst.write(chunk)

    print(f'Downloaded to {output_path.resolve()}')
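
The helper above names the local file after the last path segment of the final URL. The sketch below isolates that step so it can be tried without any network access. `filename_from_url` and its `default` fallback are hypothetical additions for illustration, and the `example.com` URLs are placeholders rather than real MLHub download links.

```python
import urllib.parse


def filename_from_url(url, default='download.bin'):
    """Return the last path segment of a URL, ignoring the query string."""
    path = urllib.parse.urlsplit(url).path
    # Decode percent-escapes such as %20 so the local filename is readable
    name = urllib.parse.unquote(path.rsplit('/', 1)[-1])
    # A URL that ends in '/' has no filename; fall back to a default name
    return name or default


print(filename_from_url('https://example.com/data/2371.tif?signature=abc'))  # 2371.tif
print(filename_from_url('https://example.com/'))                             # download.bin
```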
    

### Download Labels


We can inspect the `labels` asset of `first_item` and then download it with the `download` function defined above:

In [22]:
first_item.get('assets', {}).get('labels')

{'description': None,
 'href': 'https://api.radiant.earth/mlhub/v1/download/gAAAAABg40TGxyvFg5lVk-JyZFVJPNswAQzbXrKHaof8vHu2BrWOWR06exIQZoz3sRgByZQEwKlj16lSyjbVQUWTyVVRUFJ6WsKATDNLjVnn2dIatfg7Cmorw2YmLvFQW4Ys6FbjoZYg_mPmLFq_RFBoqF8ykqiCrcmDP_pkDxH2KAi5JMmlEYT7S5rmFDnbMnDKjQU0jNbUtpJXAOeOWn7fzcTqg69traFrIM_tFZnZuxFCA8oxN9UP-rWzvZYuH1_KZIfZrDnYWTA-baRqKNMYOptP1kciaMPlaOalyqb516saPrsX2BGjifa105f1_JWcWDDtUSq5akOQhnUkB6wmmVPZZfD2N-L-qwZXXtG-LRtGMxAP_ME=',
 'roles': [],
 'title': 'Crop Labels',
 'type': 'image/tiff; application=geotiff'}

In [21]:
download(first_item, 'labels')

Downloaded to /content/2371.tif


### Download Metadata

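Likewise, we can download the documentation PDF and the field label list CSV. The asset keys must match those listed for this item above; a key that the item does not have (such as `property_descriptions`) makes `download` print a message and return `None`.

In [None]:
download(first_item, 'documentation')
download(first_item, 'field_info_train')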

### Download Source Imagery

The Item that we fetched earlier represents a collection of labels. This item also contains references to all of the source imagery used to generate these labels in its `links` property. Any source imagery links will have a `rel` type of `"source"`.
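
Before filtering on a particular `rel` value, it can help to see which link types an item carries at all. The sketch below counts them on a plain dictionary; `count_links_by_rel` is a hypothetical helper and the sample links use placeholder `example.com` URLs rather than real API responses.

```python
from collections import Counter


def count_links_by_rel(item):
    """Count the links of a STAC item by their 'rel' type."""
    return Counter(link.get('rel', 'unknown') for link in item.get('links', []))


# Placeholder item; a real item comes from client.list_collection_items
sample_item = {'links': [
    {'rel': 'self', 'href': 'https://example.com/items/labels_1'},
    {'rel': 'collection', 'href': 'https://example.com/collections/labels'},
    {'rel': 'source', 'href': 'https://example.com/items/source_1'},
    {'rel': 'source', 'href': 'https://example.com/items/source_2'},
]}

rel_counts = count_links_by_rel(sample_item)
print(rel_counts['source'])  # 2
```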

In the cell below we get a list of all the source links associated with this item and fetch one of them (index 77 of the 79 links in this run).

In [56]:
source_links = [link for link in first_item['links'] if link['rel'] == 'source']
print(f'Number of Source Items: {len(source_links)}')

session = get_session()
r = session.get(source_links[77]['href'])
source_item = r.json()
print('Source Item\n-----------')
source_item

Number of Source Items: 79
Source Item
-----------


{'assets': {'B01': {'description': None,
   'eo:bands': [{'common_name': 'Coastal aerosol', 'gsd': 10, 'name': 'B01'}],
   'href': 'https://api.radiant.earth/mlhub/v1/download/gAAAAABg400tlhUJxRR5hKvSBzEKSppYuq-jQVnXlmYck6UjNXex9wF5o6DH4-PuE2rsB6bTh746qYayc93qfHveadqv2FaLnPM7KOJHnF7g1Yl_ZgpjFyJZK4-s1c-xXdA0EojRlYsYPKA89P_YVaa1phrOKKemXTmLBT3myntanyFSOxwxYwzVgtiR6Khi5Ni9_U2DgkBL57xFQRm_jw_ADWiciPmVqG8wppB8xh12dHK3Hm8f2eQgwUonxqmkH9t-yhweqNWcw_4fSmIfp9-V7KS6N8Odcbco_AEg9Mu12ujdI3bL08ppx9gZ6DUKeR9jFkC4-ihY0LkQFotFD21JPbN_ynVuzE__Xx1vb5x5-vhEKLrqG8ccpYWIvN0yyUSowLupxOSxPO2mtK1GuQGbJFEOdaZCqFYUEBdikBj89hRdsNlCqio=',
   'roles': [],
   'title': None,
   'type': 'image/tiff'},
  'B02': {'description': None,
   'eo:bands': [{'common_name': 'Blue', 'gsd': 10, 'name': 'B02'}],
   'href': 'https://api.radiant.earth/mlhub/v1/download/gAAAAABg400tOnGlCMs7F7bezwwn-K2FCswZETDifeR8_33RZ8D58-riMByMPzdeF874ru5rtDcmCYnMl9IXg-PFD_8xYeeYnSWa2Ner2A7LB7HgnCa_1Ye3sd8aahTCNn3h-gJ1vBQf5mUUAnzgIUw67RqbaiJz5g0n0p

Once we have the source item, we can use our `download` function to download assets associated with that item. 

The `download` cell in this section fetches just the first 3 bands (`B01`, `B02`, `B03`) of the source item that we just fetched (a Sentinel-2 scene).

In [54]:
source_item['a

'https://api.radiant.earth/mlhub/v1/download/gAAAAABg40v57WYtQCUWfchPNFrelC6UDBMAe7TNTET-JtBJgKUQW11FFxh2ADO-gbEbFQCOIVR7hDQ8bUGmPYHQzN1CpdWnKqWYuB3Z5ZeKGl7Lm_odWRnQSuJckyINWs62WCrfi8EmgMmKVtXCKFGxFL-A8CX4-lD9VOrqn3FLJzJAQ-nPnLFLORzsi3OsWRb8baNQhyzVvePbudcQdM9ycNaE9Vfyyw6QGPH4lURibHgYySwt_ZE-M_RXqSJMxo6u_POyxe0Xpel_s2WN6tsD84d0TYtvMEwdLR3XmwmoFK4cfRdyzX6P6Mkp4EER0EYIkHHOYvvzGs7uboo2cvB56Y-mcuEd4fW0sogl2on7hGplhClKcu5YUXYzyRrPvjREcwv__qasGIJS-xo24730WgA5flPAXzuwXLFy-L8RmVvtjRGhUwQ='

In [32]:
!pip install rasterio
import rasterio

Collecting rasterio
  Downloading https://files.pythonhosted.org/packages/a3/6e/b32a74bca3d4fca8286c6532cd5795ca8a2782125c23b383448ecd9a70b6/rasterio-1.2.6-cp37-cp37m-manylinux1_x86_64.whl (19.3MB)
Collecting click-plugins
  Downloading https://files.pythonhosted.org/packages/e9/da/824b92d9942f4e472702488857914bdd50f73021efea15b4cad9aca8ecef/click_plugins-1.1.1-py2.py3-none-any.whl
Collecting snuggs>=1.4.1
  Downloading https://files.pythonhosted.org/packages/cc/0e/d27d6e806d6c0d1a2cfdc5d1f088e42339a0a54a09c3343f7f81ec8947ea/snuggs-1.4.7-py3-none-any.whl
Collecting cligj>=0.5
  Downloading https://files.pythonhosted.org/packages/73/86/43fa9f15c5b9fb6e82620428827cd3c284aa933431405d1bcf5231ae3d3e/cligj-0.7.2-py3-none-any.whl
Collecting affine
  Downloading https://files.pythonhosted.org/packages/ac/a6/1a39a1ede71210e3ddaf623982b06ecfc5c5c03741ae659073159184cd3e/affine-2.3.0-py2.py3-none-any.whl
Installing collected packages

In [33]:
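# Read the first (and only) band of the downloaded B03 GeoTIFF into a NumPy array.
# The file must already exist locally (see the download(source_item, 'B03') call).
with rasterio.open('/content/2371_2017_11_17_B03_10m.tif') as src:
    my_file = src.read(1)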

In [34]:
my_file.shape

(256, 256)

In [27]:
download(source_item, 'B01')
download(source_item, 'B02')
download(source_item, 'B03')

Downloaded to /content/2371_2017_11_17_B01_10m.tif
Downloaded to /content/2371_2017_11_17_B02_10m.tif
Downloaded to /content/2371_2017_11_17_B03_10m.tif
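
The downloaded filenames appear to encode a tile ID, an acquisition date, a band name and a resolution. The sketch below parses that pattern. It is inferred only from the three filenames printed above and is an assumption, not a documented naming scheme; `parse_source_filename` is a hypothetical helper.

```python
import re
from datetime import datetime

# Pattern inferred from names like '2371_2017_11_17_B03_10m.tif' (assumption)
FILENAME_PATTERN = re.compile(
    r'^(?P<tile>\d+)_(?P<date>\d{4}_\d{2}_\d{2})_(?P<band>B[0-9A-Z]+)_(?P<res>\d+)m\.tif$'
)


def parse_source_filename(filename):
    """Split a source image filename into its parts, or return None if it does not match."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    return {
        'tile': match.group('tile'),
        'date': datetime.strptime(match.group('date'), '%Y_%m_%d').date(),
        'band': match.group('band'),
        'resolution_m': int(match.group('res')),
    }


parsed = parse_source_filename('2371_2017_11_17_B03_10m.tif')
print(parsed['band'], parsed['date'].isoformat())  # B03 2017-11-17
```

A label file such as `2371.tif` does not match the pattern, so the function returns `None` for it.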


### Download All Assets

Looping through all items and downloading the associated assets may be *very* time-consuming for larger datasets like BigEarthNet or LandCoverNet. Instead, MLHub provides TAR archives of all collections that can be downloaded using the `/archive/{collection_id}` endpoint. 

The following cell uses the `client.download_archive` function to download the `ref_south_africa_crops_competition_v1_train_labels` archive to the current working directory.

In [28]:
client.download_archive('ref_south_africa_crops_competition_v1_train_labels', output_dir='.')

  0%|          | 0/31.4 [00:00<?, ?M/s]

PosixPath('/content/ref_south_africa_crops_competition_v1_train_labels.tar.gz')
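
The downloaded archive is a gzipped TAR file, so it can be unpacked with Python's standard `tarfile` module. The sketch below is one way to do that, with a simple check that refuses members that would land outside the destination directory. `extract_archive` is a hypothetical helper, not part of `radiant_mlhub`, and the layout inside a real MLHub archive is not assumed here.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir='.'):
    """Extract a .tar.gz archive into dest_dir and return the sorted names of its files."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, 'r:gz') as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse entries such as '../x' that would be written outside dest
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f'Refusing to extract outside {dest}: {member.name}')
        tar.extractall(dest, members=members)
    return sorted(m.name for m in members if m.isfile())

# Usage with the archive downloaded above:
#     extract_archive('ref_south_africa_crops_competition_v1_train_labels.tar.gz', 'data')
```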

In [55]:
# Inline version of the download() helper defined earlier, applied to one hard-coded asset URL
output_dir='.'
download_url = 'https://api.radiant.earth/mlhub/v1/download/gAAAAABg40v57WYtQCUWfchPNFrelC6UDBMAe7TNTET-JtBJgKUQW11FFxh2ADO-gbEbFQCOIVR7hDQ8bUGmPYHQzN1CpdWnKqWYuB3Z5ZeKGl7Lm_odWRnQSuJckyINWs62WCrfi8EmgMmKVtXCKFGxFL-A8CX4-lD9VOrqn3FLJzJAQ-nPnLFLORzsi3OsWRb8baNQhyzVvePbudcQdM9ycNaE9Vfyyw6QGPH4lURibHgYySwt_ZE-M_RXqSJMxo6u_POyxe0Xpel_s2WN6tsD84d0TYtvMEwdLR3XmwmoFK4cfRdyzX6P6Mkp4EER0EYIkHHOYvvzGs7uboo2cvB56Y-mcuEd4fW0sogl2on7hGplhClKcu5YUXYzyRrPvjREcwv__qasGIJS-xo24730WgA5flPAXzuwXLFy-L8RmVvtjRGhUwQ='
session = get_session()
r = session.get(download_url, allow_redirects=True, stream=True)

filename = urllib.parse.urlsplit(r.url).path.split('/')[-1]
output_path = Path(output_dir) / filename


with output_path.open('wb') as dst:
    for chunk in r.iter_content(chunk_size=512 * 1024):
        if chunk:
            dst.write(chunk)

print(f'Downloaded to {output_path.resolve()}')

Downloaded to /content/2371_2017_11_27_B01_10m.tif
