1. Install the library from this branch: execute the cell below, then restart the kernel

In [None]:
!rm -rf /srv/conda/envs/notebook/lib/python3.10/site-packages/earthaccess*
%pip install git+https://github.com/nsidc/earthaccess.git@auth-improvements --quiet

In [1]:
import earthaccess

auth = earthaccess.login()
print(f"Using earthaccess v{earthaccess.__version__}")

EARTHDATA_USERNAME and EARTHDATA_PASSWORD are not set in the current environment, try setting them or use a different strategy (netrc, interactive)
No .netrc found in /home/jovyan


Enter your Earthdata Login username:  betolink
Enter your Earthdata password:  ········


You're now authenticated with NASA Earthdata Login
Using token with expiration date: 05/13/2023
Using user provided credentials for EDL
Using earthaccess v0.5.1
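
The messages above mention three login strategies: environment variables, a `.netrc` file, and an interactive prompt. As a minimal sketch, assuming the `strategy` keyword of `earthaccess.login`, the helper below (the name `pick_login_strategy` is hypothetical, not part of earthaccess) chooses one explicitly based on what is available:

```python
import os
from pathlib import Path


def pick_login_strategy(env=None, home=None):
    """Return the login strategy name that matches the available credentials."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    # Both variables must be set for the "environment" strategy to work.
    if env.get("EARTHDATA_USERNAME") and env.get("EARTHDATA_PASSWORD"):
        return "environment"
    # Assumption: the netrc file lives at ~/.netrc, as in the message above.
    if (home / ".netrc").is_file():
        return "netrc"
    return "interactive"


def login():
    """Log in with the strategy selected by pick_login_strategy()."""
    import earthaccess  # imported lazily so the helper works without it

    return earthaccess.login(strategy=pick_login_strategy())
```

Calling `login()` in place of `earthaccess.login()` should avoid the fallback messages when the first strategies are not configured.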


In [20]:
from pprint import pprint
datasets = earthaccess.search_datasets(short_name="MOD07_L2",
                                       cloud_hosted=True)

for dataset in datasets:
    pprint(dataset.summary())

Datasets found: 1
{'cloud-info': {'Region': 'us-west-2', 'S3BucketAndObjectPrefixNames': ['s3://prod-lads/MOD07_L2'], 'S3CredentialsAPIEndpoint': 'https://data.laadsdaac.earthdatacloud.nasa.gov/s3credentials', 'S3CredentialsAPIDocumentationURL': 'https://data.laadsdaac.earthdatacloud.nasa.gov/s3credentialsREADME'},
 'concept-id': 'C1443541366-LAADS',
 'file-type': "[{'Format': 'HDF-EOS', 'FormatType': 'Native', 'Media': "
              "['Online (HTTPS)'], 'AverageFileSize': 6.5, "
              "'AverageFileSizeUnit': 'MB', "
              "'TotalCollectionFileSizeBeginDate': '2000-02-24T00:00:00.000Z', "
              "'Fees': 'No Fee'}]",
 'get-data': ['https://ladsweb.modaps.eosdis.nasa.gov/search/order/2/MOD07_L2--61',
              'https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD07_L2/'],
 'short-name': 'MOD07_L2',
 'version': '6.1'}



## Searching for granules (files) from a given collection (dataset)

> earthaccess has two ways of querying for data: we can build a query object or we can use the top-level API.
The query object is a bit more flexible, and no metadata is retrieved from CMR until we call its `.get()` or `.get_all()` method.

In [3]:
# bbox over Iceland ~= -22.1649, 63.3052, -11.9366, 65.5970
granules_query = (
    earthaccess.granule_query()
    .cloud_hosted(True)
    .short_name("MOD07_L2")
    .bounding_box(-22.1649, 63.3052, -11.9366, 65.5970)
    .temporal("2010-01", "2020-01")
)
granules_query.params

{'short_name': 'MOD07_L2',
 'bounding_box': '-22.1649,63.3052,-11.9366,65.597',
 'temporal': ['2010-01-01T00:00:00Z,2020-01-01T00:00:00Z']}
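
The bounding box is given as lower-left longitude, lower-left latitude, upper-right longitude, upper-right latitude, which is the order shown in the `bounding_box` parameter above. A small sketch of a sanity check before building a query (`make_bbox` is a hypothetical helper, not part of earthaccess; it deliberately allows west > east, assuming boxes that cross the antimeridian are acceptable):

```python
def make_bbox(west, south, east, north):
    """Validate and return a (west, south, east, north) box in degrees."""
    for name, lon in (("west", west), ("east", east)):
        if not -180.0 <= lon <= 180.0:
            raise ValueError(f"{name} longitude out of range: {lon}")
    if not -90.0 <= south <= north <= 90.0:
        raise ValueError(
            f"latitudes must satisfy -90 <= south <= north <= 90: {south}, {north}"
        )
    return (west, south, east, north)


iceland_bbox = make_bbox(-22.1649, 63.3052, -11.9366, 65.5970)
print(iceland_bbox)
```

The resulting tuple can be unpacked into the query, for example `.bounding_box(*iceland_bbox)`.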

In [21]:
granules_query.hits()

24482
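
Because the query object is lazy, it is cheap to check the number of matches before pulling any metadata. The sketch below relies only on the `.hits()` and `.get()` methods used in this notebook; `preview` is a hypothetical helper name:

```python
def preview(query, max_results=5):
    """Return (total_hits, sample) while fetching at most max_results records."""
    total = query.hits()  # asks only for the number of matching granules
    sample = query.get(max_results) if total else []
    return total, sample
```

With the query above, `total, sample = preview(granules_query)` would report the total count while retrieving metadata for only five granules.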

earthaccess has many methods we can use in a search. For a complete list of the supported parameters, see https://nsidc.github.io/earthaccess/user-reference/granules/granules-query/



**Example 1**
* We have enough disk space, so we can copy the granules from S3 to a locally accessible drive.
* We are going to search with the top-level API method.
* We are going to batch our downloads by year.

IMPORTANT: Some datasets require users to accept an EULA. Before starting a bulk download, it is advisable to download a single granule in the browser and check whether we get redirected to a NASA form.


<a href="https://user-images.githubusercontent.com/717735/226122072-0a8262ee-1403-4622-a8f4-4a54a5412365.png"><img src="https://user-images.githubusercontent.com/717735/226122072-0a8262ee-1403-4622-a8f4-4a54a5412365.png" width="50%" /></a>

In [22]:
granule = granules_query.get(1)[0]
granule.data_links()

['https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD07_L2/2010/001/MOD07_L2.A2010001.1045.061.2017308013648.hdf']

## Downloading granules 

We can now download the granules in batches, in this case one batch per year. Alternatively, we can pick a few granules for each season, depending on our needs.

In [24]:
# we are using a bounding box but we can also use a polygon or a point
iceland_bbox = (-22.1649, 63.3052, -11.9366, 65.5970)

for year in range(2015, 2020):
    print(f"Querying {year}")
    granules = earthaccess.search_data(
        short_name = "MOD07_L2",
        bounding_box = iceland_bbox,
        temporal = (f"{year}-01", f"{year+1}-01")
    )
    # If we really want to download the HDF files we can uncomment the next line
    # earthaccess.download(granules, f"MOD07_L2/{year}")

Querying 2015
Granules found: 2420
Querying 2016
Granules found: 2393
Querying 2017
Granules found: 2458
Querying 2018
Granules found: 2457
Querying 2019
Granules found: 2455
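
For the seasonal alternative, the temporal windows can be generated up front. This is a sketch under stated assumptions: the seasons are Northern Hemisphere meteorological seasons, `season_windows` is a hypothetical helper, and each window ends on the first day of the following season (how the end bound is expanded depends on the earthaccess version; in the query output earlier, `"2020-01"` became `2020-01-01T00:00:00Z`).

```python
from datetime import date


def season_windows(year):
    """Return {season: (start, end)} as ISO date strings for one year."""
    bounds = {
        "winter": (date(year - 1, 12, 1), date(year, 3, 1)),
        "spring": (date(year, 3, 1), date(year, 6, 1)),
        "summer": (date(year, 6, 1), date(year, 9, 1)),
        "autumn": (date(year, 9, 1), date(year, 12, 1)),
    }
    return {
        name: (start.isoformat(), end.isoformat())
        for name, (start, end) in bounds.items()
    }


for season, window in season_windows(2019).items():
    print(season, window)
```

Each tuple can be passed as the `temporal` argument of `earthaccess.search_data`, in the same way as the yearly windows in the loop above.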


## Streaming data with earthaccess

**Example 2** 
* We have enough RAM, so we can load our granules from S3 into memory.
* Our libraries can work with fsspec (xarray, h5netcdf), so this approach is better suited to L3 and L4 NetCDF datasets.
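
For datasets that are stored as NetCDF4/HDF5, streaming could look roughly like the sketch below. It assumes that `earthaccess.open` returns a list of fsspec file-like objects and that xarray can read them with the `h5netcdf` engine. The helper `open_streamed` and its two injectable arguments are hypothetical and exist only to keep the sketch testable without network access.

```python
def open_streamed(granules, open_files=None, open_dataset=None):
    """Open granules as datasets without writing them to local disk."""
    if open_files is None:
        import earthaccess  # lazy import: only needed for real granules

        open_files = earthaccess.open
    if open_dataset is None:
        import xarray as xr  # lazy import, requires the h5netcdf package

        def open_dataset(file_obj):
            return xr.open_dataset(file_obj, engine="h5netcdf")

    # One dataset per remote file-like object.
    return [open_dataset(file_obj) for file_obj in open_files(granules)]
```

This is not expected to work for HDF4-based files, only for formats that `h5netcdf` can read.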

We are going to select the granules for the same day, January 1, of each year from 2010 through 2022.


In [6]:
iceland_bbox = (-22.1649, 63.3052, -11.9366, 65.5970)
# we are going to save our granules for each year on this list
granule_list = []

for year in range(2010, 2023):
    print(f"Querying {year}")
    granules = earthaccess.search_data(
        short_name = "MOD07_L2",
        bounding_box = iceland_bbox,
        temporal = (f"{year}-01-01", f"{year}-01-02")
    )
    granule_list.extend(granules)

Querying 2010
Granules found: 9
Querying 2011
Granules found: 7
Querying 2012
Granules found: 7
Querying 2013
Granules found: 5
Querying 2014
Granules found: 8
Querying 2015
Granules found: 8
Querying 2016
Granules found: 7
Querying 2017
Granules found: 8
Querying 2018
Granules found: 6
Querying 2019
Granules found: 7
Querying 2020
Granules found: 9
Querying 2021
Granules found: 6
Querying 2022
Granules found: 7


In [7]:
print("Direct access link: ", granule_list[0].data_links(access="direct"))
print("External link: ", granule_list[0].data_links(access="external"))

Direct access link:  ['s3://prod-lads/MOD07_L2/MOD07_L2.A2010001.1045.061.2017308013648.hdf']
External link:  ['https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD07_L2/2010/001/MOD07_L2.A2010001.1045.061.2017308013648.hdf']
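
Both links end in the same file name, which follows the MODIS naming convention: product, `A` plus acquisition year and day of year, acquisition time (HHMM, UTC), collection version, production timestamp. A short sketch for recovering the acquisition time from either link (`acquisition_time` is a hypothetical helper):

```python
from datetime import datetime


def acquisition_time(url):
    """Parse the acquisition date and time from a MODIS granule link."""
    filename = url.rsplit("/", 1)[-1]
    # e.g. MOD07_L2.A2010001.1045.061.2017308013648.hdf
    _product, day_field, time_field, *_rest = filename.split(".")
    # day_field is "A" + YYYYDDD, time_field is HHMM
    return datetime.strptime(f"{day_field[1:]} {time_field}", "%Y%j %H%M")


link = "s3://prod-lads/MOD07_L2/MOD07_L2.A2010001.1045.061.2017308013648.hdf"
print(acquisition_time(link))
```

This makes it easy to sort or filter `granule_list` by acquisition time without another metadata request.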


## Working with legacy data formats

**MOD07_L2** is an HDF-EOS dataset, a very old HDF data format. Although streaming is possible, our client libraries for HDF-EOS expect local file paths, not remote file systems like the one used by xarray (fsspec). In this case we are better off accessing the NetCDF endpoint via OPeNDAP to download the granules instead of streaming them.

In [9]:
# this method should be public in upcoming versions of earthaccess
granule_list[1]._filter_related_links("USE SERVICE API")

['https://ladsweb.modaps.eosdis.nasa.gov/opendap/RemoteResources/laads/allData/61/MOD07_L2/2010/001/MOD07_L2.A2010001.1225.061.2017308013709.hdf.html']

In [10]:
netcdf_list = [g._filter_related_links("USE SERVICE API")[0].replace(".html", ".nc4") for g in granule_list]
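
`str.replace` rewrites every occurrence of `.html` in the link, not only the trailing one. A slightly stricter variant, sketched here with a hypothetical helper name, swaps only the suffix and fails loudly on links that do not look like the OPeNDAP form page shown above:

```python
def to_netcdf4_url(opendap_html_url):
    """Swap a trailing '.html' for '.nc4' in an OPeNDAP link."""
    suffix = ".html"
    if not opendap_html_url.endswith(suffix):
        raise ValueError(f"not an OPeNDAP .html link: {opendap_html_url}")
    return opendap_html_url[: -len(suffix)] + ".nc4"


form_url = (
    "https://ladsweb.modaps.eosdis.nasa.gov/opendap/RemoteResources/laads/"
    "allData/61/MOD07_L2/2010/001/MOD07_L2.A2010001.1225.061.2017308013709.hdf.html"
)
print(to_netcdf4_url(form_url))
```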

In [14]:
# This is going to be slow because we are asking OPeNDAP to convert HDF into NetCDF4, so we only process 10 granules.
# OPeNDAP is also prone to failures under concurrent connections, so this is not ideal.
file_handlers = earthaccess.download(netcdf_list[0:10], local_path="test_data", threads=4)

SUBMITTING | :   0%|          | 0/10 [00:00<?, ?it/s]

File MOD07_L2.A2010001.1225.061.2017308013709.hdf.nc4 already downloaded
File MOD07_L2.A2010001.1400.061.2017308013715.hdf.nc4 already downloaded
File MOD07_L2.A2010001.1405.061.2017308013801.hdf.nc4 already downloaded
File MOD07_L2.A2010001.2025.061.2017308013857.hdf.nc4 already downloaded
File MOD07_L2.A2010001.2030.061.2017308013827.hdf.nc4 already downloaded
File MOD07_L2.A2010001.2205.061.2017308013853.hdf.nc4 already downloaded
File MOD07_L2.A2010001.2340.061.2017308013916.hdf.nc4 already downloaded
File MOD07_L2.A2010001.2345.061.2017308013904.hdf.nc4 already downloaded
File MOD07_L2.A2011001.0005.061.2017319191638.hdf.nc4 already downloaded


PROCESSING | :   0%|          | 0/10 [00:00<?, ?it/s]

COLLECTING | :   0%|          | 0/10 [00:00<?, ?it/s]
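
Since OPeNDAP requests can fail under load, one option is to wrap the download in a retry with exponential backoff. This is a generic sketch, not an earthaccess feature: `retry` is a hypothetical helper, and it assumes the wrapped call raises an exception when it fails.

```python
import time


def retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# Possible usage with the cell above (not executed here):
# files = retry(
#     lambda: earthaccess.download(netcdf_list[0:10], local_path="test_data", threads=1)
# )
```

Lowering `threads` reduces the number of concurrent connections, at the cost of a slower download.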

In [19]:
import xarray as xr


ds = xr.open_dataset("test_data/MOD07_L2.A2010001.2345.061.2017308013904.hdf.nc4")
ds