<img src='https://radiant-assets.s3-us-west-2.amazonaws.com/PrimaryRadiantMLHubLogo.png' alt='Radiant MLHub Logo' width='300'/>

How to use the Radiant MLHub API
=====

The Radiant MLHub API gives access to open Earth imagery training data for machine learning applications. You can learn more about the repository at the [Radiant MLHub site](https://mlhub.earth) and about the organization behind it at the [Radiant Earth Foundation site](https://radiant.earth).

This Jupyter notebook, which you may copy and adapt for any use, shows basic examples of how to use the API. Full documentation for the API is available at [docs.mlhub.earth](http://docs.mlhub.earth).

We'll show you how to set up your authorization, see the list of available collections and datasets, and retrieve the items (the data contained within them) from those collections. 

Each item in a collection is described in JSON that follows the [STAC](https://stacspec.org/) specification and its [label extension](https://github.com/radiantearth/stac-spec/tree/master/extensions/label).
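
To make that structure more concrete, here is a minimal sketch of how the main fields of a STAC item can be read in Python. The sample item is hypothetical and heavily abbreviated: its ID, URL, and property values are placeholders, not real API output, and `summarize_item` is a helper name invented for this example. The top-level keys (`id`, `geometry`, `properties`, `assets`) come from the STAC item specification, and the `label:` prefix marks fields defined by the label extension.

```python
# Hypothetical, abbreviated STAC item; all values are placeholders.
sample_item = {
    "type": "Feature",
    "id": "example_collection_tile_001",
    "geometry": {"type": "Point", "coordinates": [34.0, 0.5]},
    "properties": {
        "datetime": "2019-01-01T00:00:00Z",
        "label:type": "vector",
        "label:properties": ["crop"],
    },
    "assets": {
        "labels": {
            "href": "https://example.com/labels.geojson",
            "type": "application/geo+json",
        },
    },
}


def summarize_item(item):
    """Return the ID, asset keys, and label-extension fields of a STAC item."""
    properties = item.get("properties", {})
    # Label-extension fields are the properties whose names start with "label:".
    label_fields = {k: v for k, v in properties.items() if k.startswith("label:")}
    return {
        "id": item.get("id"),
        "assets": sorted(item.get("assets", {})),
        "label_fields": label_fields,
    }


summary = summarize_item(sample_item)
print(summary)
```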

Authentication
-----

Access to the Radiant MLHub API requires an access token. To get your access token, go to [dashboard.mlhub.earth](https://dashboard.mlhub.earth). If you have not used Radiant MLHub before, you will need to sign up and create a new account. Otherwise, sign in. Under **Usage**, you'll see your access token, which you will need. *Do not share* your access token with others: your usage may be limited and sharing your access token is a security risk.

Copy the access token and paste it into the cell below. The resulting `headers` dictionary is reused for every API call.

Click **Run** or press `SHIFT` + `ENTER` before moving on to run this first piece of code.

In [1]:
# only the requests module is required to access the API
import requests

# copy your access token from dashboard.mlhub.earth and paste it between the quotes below
ACCESS_TOKEN = 'PASTE_YOUR_ACCESS_TOKEN_HERE'
headers = {
    'Authorization': f'Bearer {ACCESS_TOKEN}',
    'Accept':'application/json'
}
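
A token that is hard-coded in a notebook is easy to share by accident. As one possible alternative, the sketch below reads the token from an environment variable instead. The variable name `MLHUB_ACCESS_TOKEN` and the helper `build_headers` are hypothetical names chosen for this example; they are not part of the API.

```python
import os


def build_headers(token):
    """Build the request headers used for every API call."""
    if not token:
        raise ValueError("access token is empty")
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }


# MLHUB_ACCESS_TOKEN is a hypothetical variable name used only in this sketch.
token = os.environ.get("MLHUB_ACCESS_TOKEN")
if token:
    headers = build_headers(token)
else:
    print("MLHUB_ACCESS_TOKEN is not set; keeping any headers defined earlier.")
```

Set the variable in the shell that launches Jupyter so that the token never appears in the saved notebook.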

Search for data collections
-----

To find out what training data is available, list the collections exposed by the API.

A collection represents the top-most data level. Typically this means the data comes from the same source for the same geography. It might include different years or sub-geographies.

To find data with specific parameters, see the [API documentation](http://docs.mlhub.earth/?python#the-feature-collections-in-the-dataset).

To see the list, simply run the following cell. The returned list shows the collection id values, collection license, and data source citation (if available).

In [2]:
# get list of all collections
r = requests.get('https://api.radiant.earth/mlhub/v1/collections', headers=headers)
r.raise_for_status()  # stop here if the request failed (for example, an expired token)
h = r.json()
collections = h['collections']

# print the list of collections 
for c in collections:
    print(f'ID:       {c["id"]}\nLicense:  {c.get("license", "N/A")}\nCitation: {c.get("sci:citation", "N/A")}\n')

ID:       microsoft_chesapeake_nlcd
License:  CDLA-permissive-1.0
Citation: Robinson C, Hou L, Malkin K, Soobitsky R, Czawlytko J, Dilkina B, Jojic N. Large Scale High-Resolution Land Cover Mapping with Multi-Resolution Data. Proceedings of the 2019 Conference on Computer Vision and Pattern Recognition (CVPR 2019).

ID:       ref_african_crops_uganda_01
License:  CC-BY-SA-4.0
Citation: Bocquet, C., Dalberg Data Insights. (2019) Dalberg Data Insights Uganda Crop Classification, Version 1. [Indicate subset used]. Radiant ML Hub. [Date Accessed]

ID:       ref_african_crops_kenya_02_source
License:  CC-BY-SA-4.0
Citation: Radiant Earth Foundation (2020) CV4A Competition Kenya Crop Type Dataset, Version 1. [Indicate subset used]. Radiant ML Hub. [Date Accessed]

ID:       microsoft_chesapeake_landsat_leaf_on
License:  See https://landsat.usgs.gov/sites/default/files/documents/Landsat_Data_Policy.pdf
Citation: Robinson C, Hou L, Malkin K, Soobitsky R, Czawlytko J, Dilkina B, Jojic N. Large 
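
When the list is long, it can help to narrow it down by keyword. The sketch below filters collection IDs with a case-insensitive substring match. `find_collections` is a hypothetical helper, and `sample_collections` is an abbreviated stand-in for the `collections` list from the cell above, reusing IDs and licenses from the printed output.

```python
def find_collections(collections, keyword):
    """Return the IDs of collections whose ID contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [c["id"] for c in collections if keyword in c.get("id", "").lower()]


# Abbreviated stand-in for the `collections` list returned by the API.
sample_collections = [
    {"id": "microsoft_chesapeake_nlcd", "license": "CDLA-permissive-1.0"},
    {"id": "ref_african_crops_uganda_01", "license": "CC-BY-SA-4.0"},
    {"id": "ref_african_crops_kenya_02_source", "license": "CC-BY-SA-4.0"},
]

matches = find_collections(sample_collections, "african_crops")
print(matches)
```

In the notebook, pass the real `collections` list instead of `sample_collections`.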

Retrieve properties of a collection
----

Once you have found the collection that you want to access, you can get its properties from the API.

You can limit which items are returned by using these optional parameters:
* **Limit** limits how many items will be returned, with a minimum of 1 and maximum of 10000.
* **Bounding box** limits the returned items to a specific geographic area. 
* **Date time** limits the returned items to those that fall within a specific time-frame.

See the [get features](http://docs.mlhub.earth/#getfeatures) API documentation for more information.
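
As a rough illustration of how these three parameters might be combined, the sketch below builds a query dictionary and leaves out any filter that was not set. It assumes the API follows the usual STAC conventions, where `bbox` is a comma-separated `min_lon,min_lat,max_lon,max_lat` string and `datetime` is a single timestamp or a `start/end` interval; check the API documentation before relying on these formats. `build_item_params` is a hypothetical helper and the coordinates and dates are placeholders.

```python
def build_item_params(limit=100, bounding_box=None, date_time=None):
    """Build query parameters, leaving out any filter that was not set."""
    if not 1 <= limit <= 10000:
        raise ValueError("limit must be between 1 and 10000")
    params = {"limit": limit}
    if bounding_box:
        # Assumed STAC convention: min_lon, min_lat, max_lon, max_lat.
        params["bbox"] = ",".join(str(v) for v in bounding_box)
    if date_time:
        # Assumed STAC convention: one timestamp, or "start/end" for an interval.
        params["datetime"] = "/".join(date_time)
    return params


params = build_item_params(
    limit=10,
    bounding_box=[33.9, -0.1, 34.6, 0.8],  # placeholder coordinates
    date_time=["2019-01-01T00:00:00Z", "2019-12-31T23:59:59Z"],
)
print(params)
```

The resulting dictionary can be passed as the `params` argument of `requests.get`.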

Paste the collection id below for `collectionId`, and enter any desired parameters, then run the cell.

In [23]:
# paste the id of the collection you are interested in here:
collectionId = 'ref_african_crops_kenya_01'
# use these optional parameters to control what items are returned. maximum limit is 10000
limit = 10000
bounding_box = []
date_time = []

# retrieves the items and their metadata in the collection
r = requests.get(
    f'https://api.radiant.earth/mlhub/v1/collections/{collectionId}/items',
    params={'limit': limit, 'bbox': bounding_box, 'datetime': date_time},
    headers=headers,
)
collection = r.json()
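
Before downloading anything, it can be useful to check how many items came back and which assets they carry. The sketch below does that for a parsed response. `describe_items` is a hypothetical helper, and `sample_response` is a stand-in that only mimics the shape of the response (a GeoJSON FeatureCollection); its item ID is an assumption.

```python
def describe_items(feature_collection):
    """Return the number of items and the asset keys of the first item."""
    features = feature_collection.get("features", [])
    first_assets = sorted(features[0].get("assets", {})) if features else []
    return len(features), first_assets


# Stand-in with the same shape as the parsed response; the ID is an assumption.
sample_response = {
    "type": "FeatureCollection",
    "features": [
        {
            "id": "ref_african_crops_kenya_01_tile_001",
            "assets": {"labels": {}, "documentation": {}, "property_descriptions": {}},
        },
    ],
}

count, asset_keys = describe_items(sample_response)
print(f"{count} item(s); assets of the first item: {asset_keys}")
```

In the notebook, call `describe_items(collection)` on the real response.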

The items in this collection have three assets: `labels`, `documentation`, and `property_descriptions`.

Downloading Assets
---
We'll need to set up some functions to download assets first.

In [24]:
from urllib.parse import urlparse

def get_download_url(item, asset_key, headers):
    asset = item.get('assets', {}).get(asset_key, None)
    if asset is None:
        print(f'Asset "{asset_key}" does not exist in this item')
        return None
    r = requests.get(asset.get('href'), headers=headers, allow_redirects=False)
    return r.headers.get('Location')

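def download_file(url):
    # derive the local filename from the last segment of the URL path
    filename = urlparse(url).path.split('/')[-1]
    # stream the response so large files are not loaded into memory at once
    r = requests.get(url, stream=True)
    r.raise_for_status()
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=512 * 1024):
            if chunk:
                f.write(chunk)
    print(f'Downloaded {filename}')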
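
The local filename comes from the path part of the download URL, so any query string (for example on a signed URL) is ignored. The sketch below isolates that step so it can be checked without downloading anything. `filename_from_url` is a hypothetical helper and the URL is a placeholder.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring any query string."""
    if not url:
        return None  # for example, when no download URL was found
    return posixpath.basename(urlparse(url).path)


name = filename_from_url("https://example.com/data/Kenya_Documentation.pdf?signature=abc")
print(name)
```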

In [25]:
selected_item = None
assets = None
for feature in collection.get('features', []):
    selected_item = feature
    assets = list(feature.get('assets').keys())
    # download the labels of every item
    download_file(get_download_url(selected_item, 'labels', headers))

# download the documentation and property descriptions once, from the last item
if selected_item is not None:
    download_file(get_download_url(selected_item, 'documentation', headers))
    download_file(get_download_url(selected_item, 'property_descriptions', headers))


Downloaded ref_african_crops_kenya_01_tile_001.geojson
Downloaded Kenya_Documentation.pdf
Downloaded Kenya_properties.csv


# Data Wrangle

Merge all the GeoJSON files for each country into one file per country, then join the country files into a single regional file.

In [17]:
import os
import geopandas as gpd
import pandas as pd

In [34]:
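folder = "uganda/"
file = os.listdir(folder)
# keep only GeoJSON tiles, skipping the merged output written by an earlier run
path = [os.path.join(folder, i) for i in file
        if i.endswith(".geojson") and i != "ref_african_crops_uganda.geojson"]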

In [35]:
# read every tile once, then stack them into one GeoDataFrame with the CRS of the first tile
tiles = [gpd.read_file(i) for i in path]
gdf = gpd.GeoDataFrame(pd.concat(tiles, ignore_index=True), crs=tiles[0].crs)

In [36]:
gdf.to_file(f'{folder}ref_african_crops_uganda.geojson', driver="GeoJSON")
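
For a quick check without geopandas, the same per-country merge can be approximated with the standard library alone. This is only a sketch: it assumes every tile is a GeoJSON FeatureCollection in the same coordinate reference system, and it does not copy any CRS metadata. `merge_feature_collections` is a hypothetical helper, and the demo writes two tiny placeholder tiles to a temporary folder.

```python
import json
import tempfile
from pathlib import Path


def merge_feature_collections(folder, output_name):
    """Merge every .geojson file in `folder` into one FeatureCollection file."""
    folder = Path(folder)
    features = []
    for path in sorted(folder.glob("*.geojson")):
        if path.name == output_name:
            continue  # skip the merged file left by a previous run
        with open(path, encoding="utf-8") as f:
            features.extend(json.load(f).get("features", []))
    merged = {"type": "FeatureCollection", "features": features}
    with open(folder / output_name, "w", encoding="utf-8") as f:
        json.dump(merged, f)
    return len(features)


# Demo with two placeholder tiles in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(2):
        tile = {
            "type": "FeatureCollection",
            "features": [{
                "type": "Feature",
                "properties": {"id": i},
                "geometry": {"type": "Point", "coordinates": [34.0 + i, 0.5]},
            }],
        }
        (Path(tmp) / f"tile_{i}.geojson").write_text(json.dumps(tile))
    merged_count = merge_feature_collections(tmp, "merged.geojson")
    print(f"Merged {merged_count} features")
```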

## Merge into one regional file

In [46]:
# folder = 'data/training_validation/radiant_earth/'

tan = gpd.read_file('ref_african_crops_tanzania.geojson')[['Estimated Harvest Date', 'id', 'geometry']]
ken = gpd.read_file('ref_african_crops_kenya.geojson')
ken = ken[ken['Crop1']!='Fallowland']
ken = ken[['Estimated Harvest Date', 'id', 'geometry']]
# uga = gpd.read_file('ref_african_crops_uganda.geojson')#[['Estimated Harvest Date', 'id', 'geometry']]

In [47]:
crop_ref = pd.concat([tan,ken])

In [48]:
crop_ref.to_file('ref_eastafrican_crops.geojson', driver="GeoJSON")
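
The filtering and column selection used for the regional file can also be expressed on plain GeoJSON features, which makes the logic easy to test on a small sample. In this sketch, `select_features` is a hypothetical helper and the two sample features are invented placeholders; only the property names `Crop1`, `Estimated Harvest Date`, and `id` come from the cells above.

```python
def select_features(features, keep, exclude_crop=None):
    """Drop features whose Crop1 equals `exclude_crop`; keep only the named properties."""
    selected = []
    for feature in features:
        props = feature.get("properties", {})
        if exclude_crop is not None and props.get("Crop1") == exclude_crop:
            continue
        selected.append({
            "type": "Feature",
            "geometry": feature.get("geometry"),
            "properties": {k: props.get(k) for k in keep},
        })
    return selected


# Invented placeholder features; the values are not real data.
sample_features = [
    {"type": "Feature", "geometry": None,
     "properties": {"id": 1, "Crop1": "Maize", "Estimated Harvest Date": "2019-08-01"}},
    {"type": "Feature", "geometry": None,
     "properties": {"id": 2, "Crop1": "Fallowland", "Estimated Harvest Date": None}},
]

kept = select_features(sample_features, ["Estimated Harvest Date", "id"], exclude_crop="Fallowland")
print([f["properties"] for f in kept])
```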

In [33]:
tan = gpd.read_file('ref_african_crops_tanzania.geojson')
ken=gpd.read_file('ref_african_crops_kenya_01_tile_001.geojson')
ken = ken[ken['Crop1']!='Fallowland'] # remove fallows

In [None]:
crop_ref = pd.concat([tan,ken])

In [None]:
tan['Crop'].unique()