# Bulk Download of Megacities Data by Species

### **Summary:**

This notebook contains Python code to query and extract metadata and data from the [**TROPESS CrIS SNPP Megacities** data products](https://daac.gsfc.nasa.gov/datasets?page=3&project=TROPESS&keywords=megacity), which are publicly hosted and managed by **GES DISC**. It uses the NASA Earthdata CMR API and the [**earthaccess Python library**](https://github.com/nsidc/earthaccess) to search for a user-defined date/time range and species type (gas and/or atmospheric parameters). This notebook uses the [**Los Angeles Megacity** Standard Product](https://daac.gsfc.nasa.gov/datasets/TRPSDL2ALLCRSMGLOS_1/summary) for data files from 1 January 2020 to 31 December 2020, which is queried using the DOI (Digital Object Identifier; see cell #5 below). While the notebook can be adjusted for a user-defined start/end date/time and gas species (see cell #6 below), it is initialized with the NH3 (ammonia) gas species.

A .netrc file containing an individual user's **NASA Earthdata Login** credentials (not supplied with this notebook) can either be supplied by the user up-front (in the user's home directory) or created on-demand with an embedded prompt inside this notebook. To create the .netrc file manually, follow these steps:

  1. Create an **Earthdata Login Account** ([click here for details](https://urs.earthdata.nasa.gov/users/new)).
  2. Approve access to the **NASA GES DISC Archive** in your Earthdata Login Profile ([click here for details](https://disc.gsfc.nasa.gov/earthdata-login)).
  3. Create/modify a **.netrc** file in your home directory containing your **Earthdata Login** credentials as shown below:
     <br>`machine urs.earthdata.nasa.gov login <your username> password <your password>`
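
Step 3 above can also be scripted. The following is a minimal sketch and not part of the original notebook: the helper name `write_netrc_entry` and its parameters are hypothetical, and it writes to whatever path you pass in, so it can be tried on a scratch file before touching the real `~/.netrc`. It assumes the username and password contain no whitespace, and it restricts the file to owner read/write.

```python
import netrc
import tempfile
from pathlib import Path

URS_HOST = "urs.earthdata.nasa.gov"

def write_netrc_entry(path, username, password, host=URS_HOST):
    """Append a one-line Earthdata Login entry to `path` (hypothetical helper)."""
    path = Path(path)
    # Start on a fresh line if the file already holds text without a trailing newline.
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"{separator}machine {host} login {username} password {password}\n")
    # Keep the credentials readable and writable by the owner only.
    path.chmod(0o600)
    return path

# Demonstration on a throwaway file with placeholder credentials.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = write_netrc_entry(Path(tmp) / ".netrc", "demo_user", "demo_pass")
    login, _, password = netrc.netrc(str(demo_file)).authenticators(URS_HOST)
    print(login)
```

This sketch only appends; it does not replace an existing entry for the same host, so check the file first if you have logged in before.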


**Other important notes:**

  1. This code was designed and tested **"as-is"** to run in a generic Jupyter Lab/Hub environment.
  2. Please **"trust"** this notebook before executing; this will prevent errors when writing output to your local directory.
  3. Please create a **"data"** directory in your local path to store data that is downloaded by this notebook.
  4. For ease of managing your computing environment dependencies, an [**environment.yml** file](https://github.com/NASA-TROPESS/tutorials_notebooks/blob/main/environment.yml) is available. 
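
The **"data"** directory from note 3 can be created from Python rather than by hand. This is a small sketch under stated assumptions: `ensure_data_dir` is a hypothetical helper name, and the demonstration runs in a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path

def ensure_data_dir(base=".", name="data"):
    """Create `base/name` if it is missing and return its path (hypothetical helper)."""
    target = Path(base) / name
    # exist_ok=True makes the call safe to repeat when the directory already exists.
    target.mkdir(parents=True, exist_ok=True)
    return target

# Demonstration inside a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_data_dir(tmp)
    created_ok = created.is_dir()
    print(created.name, created_ok)
```

In this notebook, calling `ensure_data_dir()` with no arguments before the download cell would create `./data` in the current working directory.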

### Notebook Author / Affiliation

David F. Moroni (david.f.moroni@jpl.nasa.gov) / Jet Propulsion Laboratory, California Institute of Technology

### Date Authored

16 April 2024

### Acknowledgements

The research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004). Government sponsorship acknowledged.

In [1]:
import earthaccess

from earthaccess import Auth, Store, DataCollections, DataGranules
auth = Auth()

import requests

## Verify Successful Authentication
### Prompt for NASA Earthdata Login Credentials (if .netrc file doesn't already exist)
### A .netrc file will be created if it doesn't already exist.

In [2]:
auth = earthaccess.login(strategy="interactive", persist=True)

auth.login(strategy="netrc")
authvalid = auth.authenticated
print('Authentication Valid =',authvalid)

# The Store() class enables download or access to data and is instantiated with the user's auth instance.
store = Store(auth)

Authentication Valid = True


## Create a Function for CMR Catalog Requests

In [3]:
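def cmr_request(params):
    # Send a GET request with the given query parameters to the CMR endpoint
    # held in the global variable `url` (set in the next cell, before the first
    # call) and return the raw response, asking CMR for JSON.
    response = requests.get(url,
                        params=params,
                        headers={
                            'Accept': 'application/json',
                        }
                       )
    return response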

## Check that the CMR catalog can be accessed

If "200, CMR is accessible" is returned, the catalog can be accessed!

In [4]:
url = 'https://cmr.earthdata.nasa.gov/search/collections'

# Create our request for finding cloud-hosted granules, and check that we can access CMR
response = cmr_request({
                        'cloud_hosted': 'True',
                        'has_granules': 'True'
                        })

if response.status_code == 200:
    print(str(response.status_code) + ", CMR is accessible")
else:
    print(str(response.status_code) + ", CMR is not accessible, check for outages")

200, CMR is accessible


## Query CMR for the dataset shortname using the DOI (digital object identifier)

In [5]:
# CMR API base url
cmrurl='https://cmr.earthdata.nasa.gov/search/' # define the base url of NASA's CMR API as the variable `cmrurl`
doi = '10.5067/0QPQFIXDET1X'                   # TROPESS dataset DOI

doisearch = cmrurl + 'collections.json?doi=' + doi
print(doisearch)

shortname = requests.get(doisearch).json()['feed']['entry'][0]['short_name']
print('Short Name = '+shortname)

https://cmr.earthdata.nasa.gov/search/collections.json?doi=10.5067/0QPQFIXDET1X
Short Name = TRPSDL2ALLCRSMGLOS
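
The query URL above is assembled by string concatenation, which works for this DOI. As a sketch of an alternative, the standard library's `urllib.parse.urlencode` can build the query string so that parameter values are escaped consistently. The names `build_doi_search_url` and `extract_short_name` are hypothetical, and `sample_payload` is a hand-made stand-in that contains only the one field the cell above reads (`feed` → `entry` → `short_name`), not a real CMR response.

```python
from urllib.parse import urlencode

CMR_SEARCH_URL = "https://cmr.earthdata.nasa.gov/search/"

def build_doi_search_url(doi, base=CMR_SEARCH_URL):
    """Return the collections.json search URL for a DOI (hypothetical helper)."""
    # safe="/" leaves the slash inside the DOI unescaped, matching the URL printed above.
    query = urlencode({"doi": doi}, safe="/")
    return f"{base}collections.json?{query}"

def extract_short_name(payload):
    """Read the first entry's short name, failing clearly when nothing matched."""
    entries = payload.get("feed", {}).get("entry", [])
    if not entries:
        raise LookupError("No collection matched the DOI query")
    return entries[0]["short_name"]

doi_url = build_doi_search_url("10.5067/0QPQFIXDET1X")
print(doi_url)

# Illustrative stand-in for the parsed JSON response (assumption, not real output).
sample_payload = {"feed": {"entry": [{"short_name": "TRPSDL2ALLCRSMGLOS"}]}}
print("Short Name = " + extract_short_name(sample_payload))
```

Checking for an empty `entry` list gives a readable error in place of the `IndexError` that `['entry'][0]` would raise if a DOI matched no collection.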


## Specify input parameters and execute the CMR query for matching granules

In [6]:
# Define the start and end date/time (YYYY-MM-DD).
start_time = '2020-01-01'
end_time = '2020-12-31'

# Select Species Type; options include:  TATM, CH4, HDO, H2O, NH3, PAN, O3
species = 'NH3'

# CMR Query for Data Granules
query = DataGranules().granule_name('*'+species+'*').short_name(shortname).temporal(start_time, end_time)

# How many granules are located in the query?
print(f"Granule hits: {query.hits()}")

# Extract the granule metadata from the query. get() is called here without an explicit
# limit, so it returns up to the library's default number of results.
granules = query.get()

# Print sample list of granule metadata
granules[0:1]

Granule hits: 366


[Collection: {'ShortName': 'TRPSDL2ALLCRSMGLOS', 'Version': '1'}
 Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'BoundingRectangles': [{'WestBoundingCoordinate': -120.0, 'EastBoundingCoordinate': -117.0, 'NorthBoundingCoordinate': 36.0, 'SouthBoundingCoordinate': 33.0}]}}}
 Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2020-01-01T00:00:00.000Z', 'EndingDateTime': '2020-01-01T23:59:59.000Z'}}
 Size(MB): 0.32215404510498047
 Data: ['https://data.gesdisc.earthdata.nasa.gov/data/TROPESS_MegaCities_Standard/TRPSDL2ALLCRSMGLOS.1/2020/TROPESS_CrIS-SNPP_L2_Standard_NH3_20200101_MUSES_R1p12_SC_MGLOS_F0p6.nc']]
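
As a quick sanity check on the result above, the hit count can be compared with the number of days in the requested range, and the species and date can be read back from a granule file name. This is a sketch under two assumptions that are inferred only from the sample shown here: that the product provides one granule per day per species, and that file names follow the underscore-separated layout of the sample. Both function names are hypothetical.

```python
from datetime import date, datetime

def expected_daily_granules(start, end):
    """Count the days from start to end inclusive (both as YYYY-MM-DD strings)."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days + 1

def parse_granule_name(url_or_name):
    """Return (species, date) from a granule file name or URL.

    Assumed layout, taken from the sample above:
    TROPESS_<sensor>_L2_Standard_<species>_<YYYYMMDD>_...
    """
    file_name = url_or_name.rsplit("/", 1)[-1]
    parts = file_name.split("_")
    species_code = parts[4]
    day = datetime.strptime(parts[5], "%Y%m%d").date()
    return species_code, day

n_days = expected_daily_granules("2020-01-01", "2020-12-31")
print(n_days)

sample_url = (
    "https://data.gesdisc.earthdata.nasa.gov/data/TROPESS_MegaCities_Standard/"
    "TRPSDL2ALLCRSMGLOS.1/2020/"
    "TROPESS_CrIS-SNPP_L2_Standard_NH3_20200101_MUSES_R1p12_SC_MGLOS_F0p6.nc"
)
species_code, day = parse_granule_name(sample_url)
print(species_code, day.isoformat())
```

Because 2020 is a leap year, the inclusive range spans 366 days, which agrees with the 366 granule hits reported above.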

## If previous cell produced nonzero Granule hits, execute next cell to download the data files

In [7]:
try:
    files = store.get(granules[:], local_path="./data")
except Exception as e:
    print(f"Error: {e}, we are probably not using this code in the Amazon cloud. Trying external links...")
    # There is hope, even if we are not in the Amazon cloud we can still get the data
    files = store.get(granules[:], access="external", local_path="./data")

 Getting 366 granules, approx download size: 0.12 GB
Accessing cloud dataset using dataset endpoint credentials: https://data.gesdisc.earthdata.nasa.gov/s3credentials
Downloaded: data/TROPESS_CrIS-SNPP_L2_Standard_NH3_20200101_MUSES_R1p12_SC_MGLOS_F0p6.nc
Downloaded: data/TROPESS_CrIS-SNPP_L2_Standard_NH3_20200102_MUSES_R1p12_SC_MGLOS_F0p6.nc
Downloaded: data/TROPESS_CrIS-SNPP_L2_Standard_NH3_20200103_MUSES_R1p12_SC_MGLOS_F0p6.nc
Downloaded: data/TROPESS_CrIS-SNPP_L2_Standard_NH3_20200104_MUSES_R1p12_SC_MGLOS_F0p6.nc
Downloaded: data/TROPESS_CrIS-SNPP_L2_Standard_NH3_20200105_MUSES_R1p12_SC_MGLOS_F0p6.nc
Downloaded: data/TROPESS_CrIS-SNPP_L2_Standard_NH3_20200106_MUSES_R1p12_SC_MGLOS_F0p6.nc
Downloaded: data/TROPESS_CrIS-SNPP_L2_Standard_NH3_20200107_MUSES_R1p12_SC_MGLOS_F0p6.nc
Downloaded: data/TROPESS_CrIS-SNPP_L2_Standard_NH3_20200108_MUSES_R1p12_SC_MGLOS_F0p6.nc
Downloaded: data/TROPESS_CrIS-SNPP_L2_Standard_NH3_20200109_MUSES_R1p12_SC_MGLOS_F0p6.nc
Downloaded: data/TROPESS_CrIS-SN