## Summary

`earthaccess` is a Python library that simplifies discovery of and access to NASA Earth science data. It provides an abstraction layer over NASA’s Common Metadata Repository (CMR) Search API, so searches can be written in a simpler notation instead of low-level HTTP queries.

## Prerequisites  




## Learning Objectives

1. Use the `earthaccess` Python library to search for data using spatial and temporal filters and explore the search results
2. Perform in-region direct access of data from an Amazon Simple Storage Service (S3) bucket


***

## Import Required Packages  

In [2]:
# Suppress warnings
#import warnings
#warnings.simplefilter('ignore')
#warnings.filterwarnings('ignore')

# Direct access
import earthaccess 
from pprint import pprint
import xarray as xr
import geopandas as gpd
import geoviews as gv
gv.extension('bokeh', 'matplotlib')
#from harmony import BBox, Client, Collection, Request, LinkType
#import datetime as dt
#import s3fs
%matplotlib inline

## About `earthaccess`  

`earthaccess` is a Python library that simplifies data discovery and access to NASA Earth science data by providing an abstraction layer for NASA’s [Common Metadata Repository (CMR) Search API](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html), so that searching for data can be done using a simpler notation instead of low-level HTTP queries. `earthaccess` makes **authentication** and **search** easier while also providing a streamlined way to load search results into an `xarray` object.  

### Authentication for NASA Earthdata  

An Earthdata Login account is required to access data from NASA Earthdata. Please visit <https://urs.earthdata.nasa.gov> to register and manage your Earthdata Login account. This account is free to create and only takes a moment to set up.  

The first step is to get the correct authentication that will allow us to get cloud-hosted data from NASA. This is all done through Earthdata Login. We can use the `earthaccess` library here, where the `login` method also gets the correct AWS credentials.  

In [3]:
auth = earthaccess.login()
# are we authenticated?
if not auth.authenticated:
    # ask for credentials and persist them in a .netrc file
    auth.login(strategy="interactive", persist=True)

EARTHDATA_USERNAME and EARTHDATA_PASSWORD are not set in the current environment, try setting them or use a different strategy (netrc, interactive)
You're now authenticated with NASA Earthdata Login
Using token with expiration date: 11/27/2023
Using .netrc file for EDL
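The messages above hint at the places `earthaccess.login()` can look for credentials: environment variables, a `.netrc` file, or an interactive prompt. As a rough sketch (not part of `earthaccess` itself), the hypothetical helper `pick_login_strategy` below uses only the standard library to check which of these is available locally. The strategy names `"environment"`, `"netrc"`, and `"interactive"` are values accepted by `earthaccess.login(strategy=...)`; the helper name, its arguments, and its priority order are assumptions made for illustration.

```python
import netrc
import os
from pathlib import Path

EDL_HOST = "urs.earthdata.nasa.gov"  # Earthdata Login host named in this tutorial


def pick_login_strategy(env=None, netrc_path=None):
    """Hypothetical helper: suggest a login strategy from what is available locally."""
    env = os.environ if env is None else env
    netrc_path = Path.home() / ".netrc" if netrc_path is None else Path(netrc_path)

    # 1. This sketch prefers environment variables when both are set.
    if env.get("EARTHDATA_USERNAME") and env.get("EARTHDATA_PASSWORD"):
        return "environment"

    # 2. Otherwise look for an Earthdata Login entry in the .netrc file.
    if netrc_path.is_file():
        try:
            if netrc.netrc(str(netrc_path)).authenticators(EDL_HOST):
                return "netrc"
        except (netrc.NetrcParseError, OSError):
            pass  # unreadable or malformed file: fall through to the prompt

    # 3. Fall back to asking the user.
    return "interactive"


strategy = pick_login_strategy()
print(f"Suggested strategy: {strategy}")
# auth = earthaccess.login(strategy=strategy)  # requires earthaccess and network access
```

Keeping the check separate from the login call makes it easy to see why a given strategy was chosen before any request is sent.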


### Search for data  

Earthdata Search also uses the CMR API. Let's head back to our [Earthdata Search](https://search.earthdata.nasa.gov/search/granules?p=C2270392799-POCLOUD&pg[0][v]=f&pg[0][gsk]=-start_date&q=measures%20ssh%20anomalies&tl=1685392107!3!!) results to gather more information about our dataset of interest. The dataset "short name" can be found by clicking the Info button on the collection search result, and we can paste it into a Python variable.  

| Shortname | Collection Concept ID | DOI |
| --- | --- | --- |
| GPM_3IMERGDF | C2723754864-GES_DISC | 10.5067/GPM/IMERGDF/DAY/07 |
| MOD10C1 | C1646609808-NSIDC_ECS | 10.5067/MODIS/MOD10C1.061 |
| SPL4SMGP | C2531308461-NSIDC_ECS | 10.5067/EVKPQZ4AFC4D | 
| SPL4SMAU | C2537927247-NSIDC_ECS | 10.5067/LWJ6TF5SZRG3 |

#### Search by collection

In [4]:
collection_id = 'C2723754864-GES_DISC'

In [5]:
results = earthaccess.search_data(
    concept_id = collection_id,
    cloud_hosted = True,
    count = 10    # Restricting to 10 records returned
)

Granules found: 8400


In this example we used the `concept_id` parameter to search from our desired collection. However, there are multiple ways to specify the collection(s) we are interested in. Alternative parameters include:  

- `doi` - request collection by digital object identifier (e.g., `doi` = '10.5067/GPM/IMERGDF/DAY/07')  
- `short_name` - request collection by CMR shortname (e.g., `short_name` = 'GPM_3IMERGDF')  

We can refine our search by passing more parameters that describe the spatiotemporal domain of our use case. Here, we use the `temporal` parameter to request a date range and the `bounding_box` parameter to request granules that intersect with a bounding box.  

For our bounding box, we are going to read in a GeoJSON file containing a single feature and extract the coordinate pairs for the southwest and northeast corners (that is, the lower-left and upper-right corners) of the bounding box around the feature.  

In [7]:
inGeojson = gpd.read_file('../../2023-Cloud-Workshop-AGU/data/sf_to_sierranvmt.geojson')

In [8]:
xmin, ymin, xmax, ymax = inGeojson.total_bounds
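`total_bounds` returns the extent in the order `(xmin, ymin, xmax, ymax)`, that is (west, south, east, north). To make that concrete, here is a minimal standard-library sketch that computes the same four numbers for a single GeoJSON polygon. The feature is a made-up rectangle, not the workshop's `sf_to_sierranvmt.geojson` file, and `feature_bounds` is a hypothetical helper that only handles `Polygon` geometries.

```python
import json

# Made-up feature for illustration only (coordinates are lon, lat pairs).
demo_collection = json.loads("""
{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "properties": {},
    "geometry": {
      "type": "Polygon",
      "coordinates": [[[-122.5, 37.0], [-119.0, 37.0], [-119.0, 39.5],
                       [-122.5, 39.5], [-122.5, 37.0]]]
    }
  }]
}
""")


def feature_bounds(collection):
    """Return (xmin, ymin, xmax, ymax) over the exterior rings of all Polygon features."""
    xs, ys = [], []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            raise ValueError("this sketch only handles Polygon geometries")
        for lon, lat in geometry["coordinates"][0]:  # ring 0 is the exterior ring
            xs.append(lon)
            ys.append(lat)
    return min(xs), min(ys), max(xs), max(ys)


demo_bbox = feature_bounds(demo_collection)
print(demo_bbox)  # (-122.5, 37.0, -119.0, 39.5)
```

The first pair is the lower-left (southwest) corner and the second pair is the upper-right (northeast) corner.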

We will assign our start date and end date to a variable named `date_range`, and we'll assign the southwest and northeast corner coordinates to a variable named `bbox`, to be passed to our `earthaccess` search request.  

In [9]:
date_range = ("2022-11-19", "2023-04-06")
#bbox = (-127.0761, 31.6444, -113.9039, 42.6310)
bbox = (xmin, ymin, xmax, ymax)

In [10]:
results = earthaccess.search_data(
    concept_id = collection_id,
    #cloud_hosted = True,
    temporal = date_range,
    bounding_box = bbox,
)

Granules found: 139


- The `short_name` and `concept_id` search parameters can each be used to request one or multiple collections per request, but the `doi` parameter can only request a single collection.  
> `concept_id` = ['C2723754864-GES_DISC', 'C1646609808-NSIDC_ECS']  
- CMR concept IDs and collection DOIs are unique to each version of a data collection, but CMR shortnames are not. A shortname can be associated with multiple versions of a collection, so it is recommended to use the `short_name` and `version` parameters in conjunction.  
- Use the `cloud_hosted` search parameter to restrict the search to data assets available from NASA's Earthdata Cloud.
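
Pulling these notes together, one way to keep a search unambiguous is to assemble the keyword arguments in a dictionary and unpack them into `earthaccess.search_data()`. The sketch below is only an illustration: `build_search_params` is a hypothetical helper, and the version string `"07"` is an assumption inferred from the DOI in the table above, so confirm it on the collection's landing page before relying on it.

```python
def build_search_params(short_name=None, version=None, concept_id=None,
                        temporal=None, bounding_box=None):
    """Hypothetical helper: assemble keyword arguments for earthaccess.search_data()."""
    if (short_name is None) == (concept_id is None):
        raise ValueError("pass exactly one of short_name or concept_id")
    if short_name is not None and version is None:
        raise ValueError("short names span versions; pass version as well")

    params = {}
    if concept_id is not None:
        params["concept_id"] = concept_id
    else:
        params["short_name"] = short_name
        params["version"] = version

    if temporal is not None:
        start, end = temporal
        if start > end:  # same-format ISO 8601 date strings compare correctly as text
            raise ValueError("temporal must be (start, end) with start <= end")
        params["temporal"] = (start, end)

    if bounding_box is not None:
        west, south, east, north = bounding_box
        if south > north:
            raise ValueError("bounding_box must be (west, south, east, north)")
        params["bounding_box"] = (west, south, east, north)

    return params


params = build_search_params(
    short_name="GPM_3IMERGDF",
    version="07",  # assumption: inferred from the DOI suffix, verify before use
    temporal=("2022-11-19", "2023-04-06"),
    bounding_box=(-127.0761, 31.6444, -113.9039, 42.6310),
)
print(params)
# results = earthaccess.search_data(**params)  # requires earthaccess and network access
```

The helper deliberately does not compare west and east, because a box that crosses the antimeridian can legitimately have west greater than east.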


In [8]:
# col_ids = ['C2723754864-GES_DISC', 'C1646609808-NSIDC_ECS', 'C2531308461-NSIDC_ECS', 'C2537927247-NSIDC_ECS'] 

# results = earthaccess.search_data(
#     concept_id = col_ids,
#     #cloud_hosted = True,
#     temporal = date_range,
#     bounding_box = bbox,
# )

### Working with `earthaccess` returns  

`earthaccess` provides several convenience methods to help streamline processes that have historically been painful when done using traditional methods. Following the search for data, you'll likely take one of two pathways with those results. You may choose to **download** the assets that have been returned to you, or you may choose to continue working with the search results within the Python environment.  

#### Download `earthaccess` results

In some cases you may want to download your assets. `earthaccess` makes downloading the data from the search results very easy using the `earthaccess.download()` function.

In [10]:
downloaded_files = earthaccess.download(
    results[0:9],
    local_path='../../2023-Cloud-Workshop-AGU/data',
)

'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data


#### Explore `earthaccess` search response

In [None]:
print(f'The results variable is a {type(results)} of {type(results[0])}')

In [None]:
len(results)

In [None]:
item = results[0]
type(item)

In [None]:
item.keys()

In [None]:
item['meta']
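
Each result behaves like a nested Python dictionary, so the usual dictionary tools apply. The sketch below walks a hand-written stand-in that has the same top-level keys (`meta`, `umm`) rather than a real granule record. All values are placeholders, and the nested field names follow the UMM-G metadata layout as an assumption that you should confirm with `item.keys()` and `item['umm'].keys()` in your own session. `get_nested` is a hypothetical helper.

```python
# Hand-written stand-in, NOT a real search result; all values are placeholders.
fake_item = {
    "meta": {"concept-id": "G0000000000-EXAMPLE", "provider-id": "EXAMPLE"},
    "umm": {
        "GranuleUR": "example_granule_001",
        "TemporalExtent": {
            "RangeDateTime": {
                "BeginningDateTime": "2022-11-19T00:00:00.000Z",
                "EndingDateTime": "2022-11-19T23:59:59.999Z",
            }
        },
    },
}


def get_nested(mapping, *keys, default=None):
    """Follow a chain of keys through nested dicts, returning default if any is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


start = get_nested(fake_item, "umm", "TemporalExtent", "RangeDateTime", "BeginningDateTime")
missing = get_nested(fake_item, "umm", "CloudCover", default="not reported")
print(start)
print(missing)
```

Returning a default instead of raising `KeyError` is convenient when looping over many granules whose metadata records do not all carry the same optional fields.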

#### Get data URLs / S3 URIs

Get links to data. The `data_links()` method returns the URL(s)/data link(s) for the item. By default, the method returns the HTTPS URL used to download or access the item.

In [None]:
item.data_links()

The `data_links()` method can also be used to get the S3 URI when we want to perform direct S3 access of the data in the cloud. To get the S3 URI, pass `access = 'direct'` to the method.

In [None]:
item.data_links(access='direct')
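
An S3 URI has the form `s3://<bucket>/<key>`, whereas the default links are ordinary `https://` URLs. As a small sketch of how the two differ, the hypothetical helper `describe_link` below splits a link into its parts using the standard library. The example links are invented placeholders, not real granule locations.

```python
from urllib.parse import urlparse


def describe_link(link):
    """Hypothetical helper: classify a data link and split it into location parts."""
    parsed = urlparse(link)
    if parsed.scheme == "s3":
        # For s3:// URIs the network location is the bucket and the path is the object key.
        return {"kind": "s3", "bucket": parsed.netloc, "key": parsed.path.lstrip("/")}
    if parsed.scheme in ("http", "https"):
        return {"kind": "https", "host": parsed.netloc, "path": parsed.path}
    raise ValueError(f"unsupported link scheme: {parsed.scheme!r}")


# Placeholder links for illustration only.
example_links = [
    "s3://example-bucket/path/to/granule.nc4",
    "https://data.example.gov/path/to/granule.nc4",
]
for link in example_links:
    print(describe_link(link))
```

Splitting out the bucket and key this way is handy when a link needs to be handed to a tool that expects them as separate arguments.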

Finally, we can extract all of the data links from our search results and add them to a list.

In [None]:
data_link_list = []

for granule in results:
    for asset in granule.data_links(access='direct'):
        data_link_list.append(asset)
        

In [None]:
data_link_list[0:9]

#### Pass results to `xarray`

We pass our search results to the `earthaccess.open()` function to get a list of file-like objects that `xarray` can read.

In [None]:
fileset = earthaccess.open(results)

In [None]:
ds = xr.open_mfdataset(fileset, chunks = {})
#ds

Some really cool things just happened here! Not only were we able to seamlessly stream our `earthaccess` search results into an `xarray` `dataset` using the `open_mfdataset()` (multi-file) method, but `earthaccess` also determined that we were working from within AWS us-west-2 and accessed the data via direct S3 access! We didn't have to create a session or a filesystem to authenticate and connect to the data. `earthaccess` did this for us using the `auth` object we created at the beginning of this tutorial. If we were not working in AWS us-west-2, `earthaccess` would automagically switch to accessing the data via the HTTPS endpoints and would again handle the authentication for us.

Let's take a quick look at our `xarray` `dataset`.

In [None]:
ds

---

In [11]:
collection_id = 'C2723754864-GES_DISC'
date_range = ("2022-11-19", "2023-04-06")
bbox = (-127.0761, 31.6444, -113.9039, 42.6310)



In [12]:
results = earthaccess.search_data(
    concept_id = collection_id,
    #cloud_hosted = True,
    temporal = date_range,
    bounding_box = bbox,
)



Granules found: 139


In [13]:
ds = xr.open_mfdataset(earthaccess.open(results))

AttributeError: 'NoneType' object has no attribute 'open'

In [None]:
ds

In [None]:
search_params = {
    "concept_id": "C2408009906-LPCLOUD", # CMR concept ID for EMITL1BRAD.001
    #"day_night_flag": "day",
    "cloud_cover": (0, 10),
    "temporal": ("2022-05", "2023-08"),
    "bounding_box": (-99.65, 18.85, -98.5, 19.95)
}
results = earthaccess.search_data(**search_params)

## Additional Resources

### Tutorials

This clinic was based on several notebook tutorials, including those presented during [past workshop events](https://nasa-openscapes.github.io/earthdata-cloud-cookbook/tutorials/), along with other materials co-created by the NASA Openscapes mentors:
* [2021 Earthdata Cloud Hackathon](https://nasa-openscapes.github.io/2021-Cloud-Hackathon/)
* [2021 AGU Workshop](https://nasa-openscapes.github.io/2021-Cloud-Workshop-AGU/)
* [Accessing and working with ICESat-2 data in the cloud](https://github.com/nsidc/NSIDC-Data-Tutorials/tree/main/notebooks/ICESat-2_Cloud_Access)
* [Analyzing Sea Level Rise Using Earth Data in the Cloud](https://github.com/betolink/earthaccess-gallery/blob/main/notebooks/Sea_Level_Rise/SSL.ipynb)

### Cloud services

The examples used in the clinic provide an abbreviated and simplified workflow to explore access and subsetting options available through the Earthdata Cloud. There are several other options that can be used to interact with data in the Earthdata Cloud including: 

* [OPeNDAP](https://opendap.earthdata.nasa.gov/) 
    * Hyrax provides direct access to subsetting of NASA data using Python or your favorite analysis tool
    * Tutorial highlighting OPeNDAP usage: https://nasa-openscapes.github.io/earthdata-cloud-cookbook/how-tos/working-locally/Earthdata_Cloud__Data_Access_OPeNDAP_Example.html
* [Zarr-EOSDIS-Store](https://github.com/nasa/zarr-eosdis-store)
    * The zarr-eosdis-store library allows NASA EOSDIS Collections to be accessed efficiently by the Zarr Python library, provided they have a sidecar DMR++ metadata file generated. 
    * Tutorial highlighting this library's usage: https://nasa-openscapes.github.io/2021-Cloud-Hackathon/tutorials/09_Zarr_Access.html 

### Support

* [Earthdata Forum](https://forum.earthdata.nasa.gov/)
    * User Services and community support for all things NASA Earthdata, including Earthdata Cloud
* [Earthdata Webinar series](https://www.earthdata.nasa.gov/learn/webinars-and-tutorials)
    * Webinars from DAACs and other groups across EOSDIS including guidance on working with Earthdata Cloud
    * See the [Earthdata YouTube channel](https://www.youtube.com/@NASAEarthdata/featured) for more videos 