Install the library from this branch, execute the next two cells and restart the kernel.

In [None]:
!rm -rf /srv/conda/envs/notebook/lib/python3.10/site-packages/earthaccess*
%pip install earthaccess

In [None]:
%pip install pyhdf

In [None]:
import earthaccess
print(f"Using earthaccess v{earthaccess.__version__}")

#### Log into NASA's Earthdata using the earthaccess package

There are multiple ways to provide your Earthdata credentials via [earthaccess](https://nsidc.github.io/earthaccess/). The [earthaccess authentication class](https://nsidc.github.io/earthaccess/tutorials/restricted-datasets/#auth) automatically tries three methods for getting user credentials to log in:
- with `EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD` environment variables
- through an interactive, in-notebook login (used below); passwords are not shown in plain text
- with stored credentials in a .netrc file (not recommended for security reasons)
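
Before logging in, it can help to see which of these credential sources exist on the current machine. The sketch below uses only the standard library and never contacts Earthdata. `available_credential_sources` is a hypothetical helper, not part of earthaccess, and it assumes the `.netrc` entry is stored under the Earthdata Login host `urs.earthdata.nasa.gov`. The order of the returned list is a choice made for this sketch and is not a statement about the order earthaccess itself uses.

```python
import netrc
import os
from pathlib import Path

# Assumption: host name used as the "machine" entry for Earthdata Login in .netrc
EDL_HOST = "urs.earthdata.nasa.gov"


def available_credential_sources(env=None, netrc_path=None):
    """Return the credential sources that appear usable on this machine.

    Hypothetical helper for illustration only; it does not validate the
    credentials and does not contact Earthdata.
    """
    env = os.environ if env is None else env
    netrc_path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"

    sources = []
    # Both environment variables must be set and non-empty
    if env.get("EARTHDATA_USERNAME") and env.get("EARTHDATA_PASSWORD"):
        sources.append("environment")
    # A .netrc file counts only if it has an entry for the Earthdata Login host
    if netrc_path.is_file():
        try:
            if netrc.netrc(str(netrc_path)).authenticators(EDL_HOST):
                sources.append("netrc")
        except (netrc.NetrcParseError, OSError):
            pass  # unreadable or malformed file: treat it as unavailable
    # An interactive prompt is always possible in a notebook session
    sources.append("interactive")
    return sources


print(available_credential_sources())
```

If only `"interactive"` is listed, the login cell will prompt for a username and password.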

In [None]:
auth = earthaccess.login()
# are we authenticated?
print(auth.authenticated)

#### Search for cloud-available datasets from NASA

In [None]:
# Using a keyword search

from pprint import pprint
datasets = earthaccess.search_datasets(keyword="SENTINEL",
                                       cloud_hosted=True)

for dataset in datasets[0:2]:
    pprint(dataset.summary())

In [None]:
# Using a known short name
datasets = earthaccess.search_datasets(short_name="HLSS30",
                                       cloud_hosted=True)
for dataset in datasets:
    pprint(dataset.summary())
    pprint(dataset.abstract())

### Searching for granules (files) from a given collection (dataset)
Earthaccess has two ways of querying for data: we can build a query object or we can use the top-level API. The query object is a bit more flexible, and no metadata is retrieved from CMR until we call its `.get()` or `.get_all()` method.

Let's use [bboxfinder.com](http://bboxfinder.com) to get the extent of our bounding box and enter it into our bounding box input below. You are also welcome to put in a bounding box you already have available. You can uncomment the Iceland bounding box if you don't want to find your own. Order should be: `min_lon, min_lat, max_lon, max_lat`
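
Because the order of the four values is easy to get wrong, a small check before querying can save a confusing empty result. The following is a minimal sketch; `validate_bbox` is a hypothetical helper, not an earthaccess function. It rejects boxes where a minimum exceeds a maximum, so it does not handle boxes that cross the antimeridian, and it cannot detect swapped latitude and longitude when both values happen to be valid in either position.

```python
def validate_bbox(bbox):
    """Check a (min_lon, min_lat, max_lon, max_lat) tuple and return it as floats.

    Hypothetical helper for illustration; raises ValueError on obvious mistakes.
    """
    if len(bbox) != 4:
        raise ValueError("expected four values: min_lon, min_lat, max_lon, max_lat")
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in bbox)

    if not (-180 <= min_lon <= 180 and -180 <= max_lon <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= min_lat <= 90 and -90 <= max_lat <= 90):
        raise ValueError("latitudes must be within [-90, 90]")
    if min_lon > max_lon:
        raise ValueError("min_lon is greater than max_lon")
    if min_lat > max_lat:
        raise ValueError("min_lat is greater than max_lat")
    return (min_lon, min_lat, max_lon, max_lat)


# The approximate Iceland bounding box used in this notebook
iceland_bbox = validate_bbox((-22.1649, 63.3052, -11.9366, 65.5970))
print(iceland_bbox)
```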

In [None]:
# Using a specific concept-id, which is unique to the specific product and version

# bbox for ~Iceland is -22.1649, 63.3052, -11.9366, 65.5970 
granules_query = earthaccess.granule_query().cloud_hosted(True) \
        .concept_id("C2021957295-LPCLOUD") \
        .bounding_box(-22.1649, 63.3052, -11.9366, 65.5970) \
        .temporal("2020-01-01","2023-01-01")
granules_query.params

In [None]:
granules_query.hits()

Earthaccess has many methods we can use for our search. For a complete list of the parameters we can use, go to [https://nsidc.github.io/earthaccess/user-reference/granules/granules-query/](https://nsidc.github.io/earthaccess/user-reference/granules/granules-query/)

In [None]:
granule = granules_query.get(1)[0]
granule.data_links()

### Downloading granules

##### **IMPORTANT**: Some datasets require users to accept a EULA (end user license agreement). It is advisable to try downloading a single granule in the browser first to see whether we get redirected to a NASA form.

![EULA.png](attachment:e4d77309-1e1e-4489-b1a8-91dde5ce9ed2.png)


We can now download the granules in batches. In this case we query one year at a time, but we could also pick a few granules for each season, depending on our needs.

In [None]:
# we are using a bounding box but we can also use a polygon or a point
iceland_bbox = (-22.1649, 63.3052, -11.9366, 65.5970)

for year in range(2020, 2023):
    print(f"Querying {year}")
    granules = earthaccess.search_data(
        short_name = "HLSS30",
        bounding_box = iceland_bbox,
        temporal = (f"{year}-01", f"{year+1}-01")
    )
    # If we really want to download the files we can uncomment the next line
    # earthaccess.download(granules, f"HLSS30/{year}")

### Streaming data with earthaccess
If we have enough RAM (memory), we can load our granules from an S3 bucket into memory. Earthaccess works with fsspec (xarray, h5netcdf) at the moment, so this task is better suited to Level 3 and Level 4 NetCDF datasets.

We are going to select a few granules for the same day in February (the 17th) for five years, 2018 through 2022.
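
The one-day date windows for such per-year queries can be generated rather than typed by hand. This is a minimal sketch using only the standard library; `same_day_windows` is a hypothetical helper, and each resulting tuple has the `(start, end)` shape passed as the `temporal` argument elsewhere in this notebook.

```python
from datetime import date, timedelta


def same_day_windows(month, day, years, days=1):
    """Build (start, end) ISO date strings starting on the same date in each year.

    Hypothetical helper for illustration; `days` is the window length.
    """
    windows = []
    for year in years:
        start = date(year, month, day)
        end = start + timedelta(days=days)
        windows.append((start.isoformat(), end.isoformat()))
    return windows


# February 17 to February 18 for 2018 through 2022
windows = same_day_windows(2, 17, range(2018, 2023))
print(windows[0])  # ('2018-02-17', '2018-02-18')
```

Using `date` arithmetic instead of string formatting means a window that runs past the end of a month still produces a valid end date.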

In [None]:
iceland_bbox = (-22.1649, 63.3052, -11.9366, 65.5970)
# we are going to save our granules for each year on this list
granule_list = []

for year in range(2018, 2023):
    print(f"Querying {year}")
    granules = earthaccess.search_data(
        short_name = "HLSS30",
        bounding_box = iceland_bbox,
        temporal = (f"{year}-02-17", f"{year}-02-18")
    )
    granule_list.extend(granules)

In [None]:
scene = granule_list[0]
scene

Retrieve the data link. "Direct" indicates an S3 bucket in the cloud, which you can stream from. "External" indicates an HTTPS link which is not cloud-based. 
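
The two kinds of link can be told apart by their URL scheme. The sketch below assumes that direct links use the `s3://` scheme and external links use `http://` or `https://`. `access_type` is a hypothetical helper, and the URLs are placeholders, not real granule links.

```python
from urllib.parse import urlparse


def access_type(url):
    """Label a data link as 'direct' (s3://) or 'external' (http/https).

    Hypothetical helper for illustration; anything else is 'unknown'.
    """
    scheme = urlparse(url).scheme.lower()
    if scheme == "s3":
        return "direct"
    if scheme in ("http", "https"):
        return "external"
    return "unknown"


# Placeholder URLs for illustration only
print(access_type("s3://example-bucket/path/scene.B02.tif"))    # direct
print(access_type("https://example.com/path/scene.B02.tif"))    # external
```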

In [None]:
print("Direct access links: ", scene.data_links(access="direct")[0])
print("External links: ", scene.data_links(access="external")[0])

In [None]:
import matplotlib.pyplot as plt
import rasterio as rio
from rasterio.plot import show
import xarray as xr

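# Read and plot with grid coordinates
with rio.open(earthaccess.open(granule_list[2:3])[2]) as src:
    fig, ax = plt.subplots(figsize=(9,8))

    # To plot band 1 on the axes created above
    # (rasterio expects the band as a (dataset, band_index) tuple)
    show((src, 1), ax=ax)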

    # To open data into a numpy array
    profile = src.profile
    arr = src.read(1)

In [None]:
hls_scene = xr.open_dataset(earthaccess.open(granule_list[0:1])[0], engine='rasterio')
hls_scene

### Working with legacy data formats

**MOD07_L2** is an HDF-EOS dataset providing atmospheric profiles from the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument on the Terra satellite. You could use this data to apply an atmospheric correction to imagery and obtain a surface reflectance measurement.

It is in a very old HDF data format, and although streaming is technically possible, the only libraries that can open HDF-EOS expect file paths, not remote file systems like the one used by xarray (fsspec). In this case we are better off accessing the NetCDF endpoint via OPeNDAP to download the granules instead of streaming them.
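
When a library insists on a file path, the generic workaround is to copy the remote file object to local disk first. The sketch below shows that pattern with the standard library only. `spool_to_path` is a hypothetical helper, and an in-memory `io.BytesIO` stands in for a remote file object. The copy still transfers the whole file, so it amounts to a download rather than streaming.

```python
import io
import os
import shutil
import tempfile


def spool_to_path(fileobj, suffix=".hdf"):
    """Copy a readable binary file-like object to a temporary file.

    Hypothetical helper for illustration. Returns the path of the new file;
    the caller is responsible for deleting it.
    """
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(fileobj, tmp)
        return tmp.name


# Stand-in for a remote file object; these bytes are not a real HDF file
remote_like = io.BytesIO(b"placeholder bytes, not a real HDF file")
local_path = spool_to_path(remote_like)
print(local_path)

# Clean up the temporary copy once it is no longer needed
os.remove(local_path)
```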

In [None]:
iceland_bbox = (-22.1649, 63.3052, -11.9366, 65.5970)

granules = earthaccess.search_data(
    short_name = "MOD07_L2",
    bounding_box = iceland_bbox,
    temporal = ("2020-01-01", "2020-01-02")
)

In [None]:
print("Direct access link: ", granules[0].data_links(access="direct"))
print("External link: ", granules[0].data_links(access="external"))

If we try to open these hdf files in xarray...

In [None]:
# Use the MOD07_L2 granules found above; this call is expected to fail
mod07 = xr.open_mfdataset(earthaccess.open(granules[0:3]))

The access itself is quick, but xarray is unable to recognize the legacy file format. HDF files are remarkably hard to open. You must download the files and open them with pyhdf, using code like this:
`from pyhdf.SD import SD,SDC` \
`mod07_l2 = SD(MODfile, SDC.READ)`

We do have another option: converting the files to nc4 (NetCDF-4) during download so that we can open them in xarray.
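
The conversion works by rewriting each granule's OPeNDAP service link so that it ends in `.nc4` instead of `.html`. A minimal sketch of that rewrite follows; `to_nc4_url` is a hypothetical helper and the URL is a placeholder, not a real OPeNDAP endpoint. Unlike a plain string replacement, it only touches a trailing `.html`, so an `.html` fragment elsewhere in the URL is left alone.

```python
def to_nc4_url(url):
    """Swap a trailing '.html' for '.nc4'; return other URLs unchanged.

    Hypothetical helper for illustration.
    """
    suffix = ".html"
    if url.endswith(suffix):
        return url[: -len(suffix)] + ".nc4"
    return url


# Placeholder URL for illustration only
example = "https://opendap.example.org/collections/demo/granule.hdf.html"
print(to_nc4_url(example))
```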

In [None]:
# This method should be public in upcoming versions of earthaccess
granules[0]._filter_related_links("USE SERVICE API")

In [None]:
netcdf_list = [g._filter_related_links("USE SERVICE API")[0].replace(".html", ".nc4") for g in granules]
netcdf_list[0]

In [None]:
# This is going to be slow because we are asking OPeNDAP to convert HDF into NetCDF4, so we only process 3 granules.
# OPeNDAP is also very prone to failures due to concurrent connections, which is not ideal.
file_handlers = earthaccess.download(netcdf_list[0:3], local_path="test_data", threads=4)

In [None]:
import os

# Get the file names
path = "test_data"
dir_list = os.listdir(path)
print(dir_list)

In [None]:
cd test_data

In [None]:
# Open a file into xarray for analysis
ds = xr.open_dataset(dir_list[0])
ds