# NASA Earthdata API Client 🌍

CMR API documentation: https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html

EDL API documentation: https://urs.earthdata.nasa.gov/

NASA OpenScapes: https://nasa-openscapes.github.io/earthdata-cloud-cookbook/

## Overview

> TL;DR: **earthdata** makes use of NASA APIs to search, preview and access NASA datasets on-prem and in the cloud with 4 lines of Python.

There are many ways to access NASA datasets: we can use the Earthdata search portal, or DAAC-specific portals and tools.
We could even use data.gov! These web portals are great, but they are not designed for programmatic access and reproducible workflows.
Both are extremely important in the age of the cloud and reproducible open science.

The good news is that NASA also exposes APIs that allow us to search, transform and access data in a programmatic way.
There are already some very useful client libraries for these APIs:

* python-cmr
* eo-metadata-tools
* harmony-py
* Hyrax
* others

Each of these libraries has amazing features and some similarities, but they lack the glue to take a researcher all the way from searching to getting the data. *Harmony-py* is probably the most complete client on the list and the future for cloud-based workflows, but as of today it only serves a small number of datasets.


### CMR: How good is anything if we can't find it?


### Collections

### Granules

### UMM

### EDL

### Data Formats and Cloud Access

## A Python client for NASA's APIs


In [1]:
from earthdata import Auth, DataGranules, DataCollections, Accessor
auth = Auth()

Enter your Earthdata Login username:  betolink
Enter your Earthdata password:  ········


You're now authenticated with NASA Earthdata Login


## Querying for collections
The `DataCollections` client can query CMR for any collection using all of CMR's query parameters and has built-in accessors for the common ones.
This makes it ideal for one-liners and easier notation.

```python
auth = Auth()
collections = DataCollections(auth).short_name('MODIS').get(10)
collections
```
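
To make the mapping to CMR more concrete, here is a minimal sketch of the kind of search URL a short-name query could translate to. It only uses the public CMR collections endpoint and the standard library; `build_collection_query` is a hypothetical helper, not part of **earthdata**, and the client may build its requests differently.

```python
from urllib.parse import urlencode

# Public CMR search endpoint for collections (JSON response format).
CMR_COLLECTIONS_URL = "https://cmr.earthdata.nasa.gov/search/collections.json"


def build_collection_query(short_name: str, page_size: int = 10) -> str:
    """Return a CMR search URL for collections matching a short name.

    Hypothetical helper for illustration only; not part of earthdata.
    """
    params = {"short_name": short_name, "page_size": page_size}
    return f"{CMR_COLLECTIONS_URL}?{urlencode(params)}"


url = build_collection_query("MODIS", page_size=10)
print(url)
```

Pasting the printed URL into a browser is a quick way to compare the raw CMR response with what the client returns.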

We can filter fields; if we want the full UMM fields, we use the `*` symbol.

In [None]:
# We can now search for collections using a pythonic API client for CMR.
# Query = DataCollections(auth).keyword('fire').temporal("2016-01-01", "2020-12-12")
# Query = DataCollections(auth).keyword('GEDI').bounding_box(-134.7,58.9,-133.9,59.2)

Query = DataCollections(auth).keyword('elevation').bounding_box(-134.7,58.9,-133.9,59.2)

print(f'Collections found: {Query.hits()}')

collections = Query.fields(['ShortName','Abstract']).get(10)
# Inspect 5 results printing just the ShortName and Abstract
collections[0:5]

In [None]:
collections[0]["umm.ShortName"]

The `DataCollections` class returns Python dictionaries with some handy methods.

```python 
collection.concept_id() # returns the concept-id, used to search for data granules
collection.abstract() # returns the abstract
collection.landing_page() # returns the landing page if present in the UMM fields
collection.get_data() # returns the portal where data can be accessed.
```

The same results can be obtained using the `dict` syntax:

```python
collection["meta"]["concept-id"] # concept-id
collection["umm"]["RelatedUrls"] # URLs, with GET DATA, LANDING PAGE etc
```
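
As a rough illustration of what the `dict` route involves, the sketch below pulls URLs of a given type out of `RelatedUrls`. The record is made up and only shaped like a UMM result; the `Type` values and the `urls_of_type` helper are assumptions for this example, not the library's implementation.

```python
# Made-up record shaped like a CMR UMM-JSON collection; all values are placeholders.
collection = {
    "meta": {"concept-id": "C0000000000-EXAMPLE"},
    "umm": {
        "ShortName": "EXAMPLE",
        "RelatedUrls": [
            {"URL": "https://example.com/data", "Type": "GET DATA"},
            {"URL": "https://example.com/about", "Type": "DATA SET LANDING PAGE"},
        ],
    },
}


def urls_of_type(record: dict, url_type: str) -> list:
    """Return the URL of every RelatedUrls entry with the given Type.

    Hypothetical helper; missing keys yield an empty list instead of an error.
    """
    related = record.get("umm", {}).get("RelatedUrls", [])
    return [entry["URL"] for entry in related if entry.get("Type") == url_type]


print(urls_of_type(collection, "GET DATA"))
print(urls_of_type(collection, "DATA SET LANDING PAGE"))
```

The built-in methods save this kind of filtering, which is why they are convenient for the common cases.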


In [None]:
# We can now search for collections using a pythonic API client for CMR.
# Query = DataCollections(auth).provider('POCLOUD')
Query = DataCollections(auth).short_name("ASTGTM")

print(f'Collections found: {Query.hits()}')
collections = Query.fields(['ShortName']).get(20)
# Printing the first collection
collections[0]

In [None]:
# Printing the concept-id for the first 10 collections
[collection.concept_id() for collection in collections[0:10]]

## Querying for data granules

The `DataGranules` class provides functionality similar to the collections class. The most reliable way to query for granules is by concept-id, because it identifies exactly one collection.
You can also search for data granules using a short name, but that could (and most likely will) return different versions of the same data granules.
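
A small sketch of why the short name is ambiguous: grouping results by short name can surface several collections, one per version, while each concept-id names exactly one. The records and IDs below are made-up placeholders, not real CMR output.

```python
from collections import defaultdict

# Made-up search results: one short name, two versions. IDs are placeholders.
results = [
    {"concept_id": "C0000000001-EXAMPLE", "short_name": "ATL03", "version": "002"},
    {"concept_id": "C0000000002-EXAMPLE", "short_name": "ATL03", "version": "003"},
]

# Collect every concept-id that shares a short name.
by_short_name = defaultdict(list)
for record in results:
    by_short_name[record["short_name"]].append(record["concept_id"])

# Short names that point at more than one collection are ambiguous.
ambiguous = {name: ids for name, ids in by_short_name.items() if len(ids) > 1}
print(ambiguous)
```

Pinning the version, or using the concept-id directly, removes the ambiguity.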

In this example we're querying for 20 data granules from the ICESat-2 [ATL03](https://nsidc.org/data/ATL03/versions/) version `"003"` dataset.

In [None]:
Query = DataGranules().short_name('ATL03').version("003")
# Query = DataGranules().short_name('ASTGTM')

granules = Query.get(20)

[display(g) for g in granules[0:6]]

### Spatiotemporal queries

Our granule and collection classes accept the same spatial and temporal arguments as CMR, so we can search for granules that match spatiotemporal criteria.



In [None]:
Query = DataGranules().short_name("ATL03").temporal("2020-03-01", "2020-03-30").bounding_box(-134.7,58.9,-133.9,59.2).version("003")
print(f"Granules found: {Query.hits()}")
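
For reference, CMR expects a bounding box as `west,south,east,north` and a temporal range as two comma-separated ISO 8601 timestamps. The sketch below shows one way such arguments could be validated and formatted before being sent. Both helpers are hypothetical and written for this example; the client may handle this differently.

```python
from datetime import datetime


def format_bounding_box(west: float, south: float, east: float, north: float) -> str:
    """Validate a box and return it as 'west,south,east,north'."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    # West may exceed east when a box crosses the antimeridian, so only
    # the latitudes are checked for ordering.
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return f"{west},{south},{east},{north}"


def format_temporal(start: str, end: str) -> str:
    """Turn two 'YYYY-MM-DD' dates into an ISO 8601 'start,end' range."""
    out_format = "%Y-%m-%dT%H:%M:%SZ"
    start_dt = datetime.strptime(start, "%Y-%m-%d")
    end_dt = datetime.strptime(end, "%Y-%m-%d")
    if end_dt < start_dt:
        raise ValueError("end date precedes start date")
    # Both timestamps fall at midnight, so the end day itself is not covered.
    return f"{start_dt.strftime(out_format)},{end_dt.strftime(out_format)}"


bbox = format_bounding_box(-134.7, 58.9, -133.9, 59.2)
temporal = format_temporal("2020-03-01", "2020-03-30")
print(bbox)
print(temporal)
```

Checking the latitude order early catches the most common mistake, which is swapping the corners of the box.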

In [None]:
# Now we can print some info about these granules using the built-in methods
granules = Query.get(4)
data_links = [{'links': g.data_links(), 'size (MB):': g.size()} for g in granules]
data_links

In [None]:
[display(g) for g in granules]

In [None]:
# C1908348134-LPDAAC_ECS: GEDI L2A Elevation and Height Metrics Data Global Footprint Level V002
# C1968980609-POCLOUD: Sentinel-6A MF Jason-CS L2 P4 Altimeter Low Resolution (LR) STC Ocean Surface Topography
# C1575731655-LPDAAC_ECS: ASTER Global Digital Elevation Model NetCDF V003
# Query = DataGranules(auth).short_name('ATL03').version("003")
Query = DataGranules(auth).short_name('ATL03').version("003").bounding_box(-134.7,58.9,-133.9,59.2)
# Query = DataGranules(auth).concept_id("C1968980609-POCLOUD").bounding_box(-134.7,58.9,-133.9,59.2)
print(f"Granules found: {Query.hits()}")

In [None]:
# Not all granules have data previews. When they do, the granule class shows up to 2 preview images in Jupyter's display() function
granules = Query.get(20)
[display(g) for g in granules[0:5]]

In [None]:
# Granules are Python dictionaries, with fancy nested key/value notation and some extra built-in methods.
granules[0]["umm.TemporalExtent.RangeDateTime"]
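
The dotted notation can be pictured as walking one dictionary level per segment. Here is a minimal sketch of that idea, assuming a plain `dict` subclass. `DottedDict` and the sample values are hypothetical, and the library's own result classes may be implemented differently.

```python
class DottedDict(dict):
    """A dict that also resolves 'a.b.c' keys by walking nested dicts.

    Hypothetical class for illustration only.
    """

    def __getitem__(self, key):
        # Non-string keys, keys without dots, and exact matches behave
        # like ordinary dict lookups.
        if not isinstance(key, str) or "." not in key or key in self:
            return super().__getitem__(key)
        # Otherwise descend one level per dotted segment.
        parts = key.split(".")
        node = super().__getitem__(parts[0])
        for part in parts[1:]:
            node = node[part]
        return node


# Made-up granule metadata; the timestamps are placeholders.
granule = DottedDict({
    "umm": {
        "TemporalExtent": {
            "RangeDateTime": {
                "BeginningDateTime": "2020-03-01T00:00:00Z",
                "EndingDateTime": "2020-03-01T00:05:00Z",
            }
        }
    }
})

print(granule["umm.TemporalExtent.RangeDateTime"])
print(granule["umm"]["TemporalExtent"]["RangeDateTime"]["EndingDateTime"])
```

Both access styles reach the same nested values, so the dotted form is purely a convenience.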

In [None]:
# Size in MB
data_links = [{'links': g.data_links(), 'size (MB):': g.size()} for g in granules]
data_links

## **Accessing the data**: *How I Learned to Stop Worrying and Love the Cloud* **

 
* ** Terms and conditions may and will apply

In [None]:
# Accessing does not necessarily mean downloading, especially in the cloud.
access = Accessor(auth)

In [None]:
granules[0].cloud_hosted

In [None]:
files = access.get(granules)

In [None]:
import xarray as xr


ds = xr.open_mfdataset('./data/*.nc', concat_dim=[..., None, ...] )
print(ds)

In [None]:
ds.ASTER_GDEM_DEM.plot()

## Recap

```python
from earthdata import Auth, DataGranules, DataCollections, Accessor
auth = Auth()
access = Accessor(auth)

Query = DataGranules(auth).concept_id("C1575731655-LPDAAC_ECS").bounding_box(-134.7,58.9,-133.9,59.2)
granules = Query.get(10)
# preview the data granules
granules 
# get the files
files = access.get(granules)


```

**Wait, we said 4 lines of Python, we meant 3!**

```python

from earthdata import Auth, DataGranules, DataCollections, Accessor
auth = Auth()
files = Accessor(auth).get(DataGranules().concept_id("C1575731655-LPDAAC_ECS").bounding_box(-134.7,58.9,-133.9,59.2).get(10))

# Profit!
```

In [None]:
from earthdata import Auth, DataGranules, DataCollections, Accessor
auth = Auth()
files = Accessor(auth).get(DataGranules().concept_id("C1575731655-LPDAAC_ECS").bounding_box(-134.7,58.9,-133.9,59.2).get(10))