# Obtain a List of S3 URLs for a GES DISC Collection Using the CMR API
### Authors: Chris Battisto, Alexis Hunzinger
### Date Authored: 1-31-22
### Date Updated: 2-1-23

### Timing

Exercise: 15 minutes

### Overview

This notebook demonstrates how to obtain a list of S3 URLs for desired cloud-hosted GES DISC granules using the [Common Metadata Repository (CMR) API](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html).

### Prerequisites

This notebook was written using Python 3.8 and requires the following library:
- requests

Identify your data collection of interest and acquire its shortname or concept ID.
- These can be found on the collection's Dataset Landing Page on the GES DISC website. For example, this is the Dataset Landing Page for Hourly MERRA-2 SLV: https://disc.gsfc.nasa.gov/datasets/M2T1NXSLV_5.12.4/summary  


## Import Libraries

In [1]:
import requests

## Create a Function for CMR Catalog Requests

In [2]:
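def cmr_request(params):
    """Send a GET request to the CMR endpoint stored in the global `url`.

    `params` is a dict of CMR search parameters. The response is requested
    as JSON and returned as-is, without checking the status code.
    """
    response = requests.get(url,
                        params=params,
                        headers={
                            'Accept': 'application/json',
                        }
                       )
    return response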
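
To see what a request made this way looks like, the query string can be previewed without contacting CMR. The following is a minimal sketch using only the standard library; `preview_url` is a hypothetical helper introduced here for illustration, and the exact encoding that `requests` produces may differ in small details.

```python
from urllib.parse import urlencode

def preview_url(base_url, params):
    """Return the URL that a GET request with these query parameters would target."""
    return base_url + '?' + urlencode(params)

# Example parameters: cloud-hosted collections that have granules
print(preview_url('https://cmr.earthdata.nasa.gov/search/collections',
                  {'cloud_hosted': 'True', 'has_granules': 'True'}))
```

Pasting the printed URL into a browser is a quick way to inspect a CMR response by hand.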

## Search CMR Catalogs and Obtain S3 URLs

### Check that the CMR catalog can be accessed

If "200, CMR is accessible" is returned, the catalog can be accessed!

In [3]:
url = 'https://cmr.earthdata.nasa.gov/search/collections'

# Request cloud-hosted collections that have granules, and check that we can access CMR
response = cmr_request({
                        'cloud_hosted': 'True',
                        'has_granules': 'True'
                        })

if response.status_code == 200:
    print(str(response.status_code) + ", CMR is accessible")
else:
    print(str(response.status_code) + ", CMR is not accessible, check for outages")

200, CMR is accessible


### Search CMR with your desired data collection's shortname or concept ID and desired date range

Using the collection's shortname or concept ID, we can obtain individual granules by querying https://cmr.earthdata.nasa.gov/search/granules. By requesting a JSON response for the granules that we want, we can read each granule's S3 URL from its list of links.

Here, we will parse out an S3 URL for the AQUA AIRS IR + MW Level 2 CLIMCAPS dataset. 
- **Shortname**: SNDRAQIML2CCPRET
- **Concept ID**: C1693440798-GES_DISC

Our desired date range is 00:00:00 to 06:00:00 UTC on January 1, 2016.
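
CMR receives the date range as a single `temporal` string of the form `start,end`. When the window is computed rather than typed by hand, that string can be assembled from `datetime` objects. This is a minimal sketch; `temporal_range` is a hypothetical helper name, not part of CMR or `requests`, and it assumes both timestamps are already in UTC.

```python
from datetime import datetime, timezone

def temporal_range(start, end):
    """Format two UTC datetimes as a 'start,end' string for the CMR `temporal` parameter."""
    fmt = '%Y-%m-%dT%H:%M:%SZ'
    return start.strftime(fmt) + ',' + end.strftime(fmt)

# The six-hour window used in this notebook
start = datetime(2016, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2016, 1, 1, 6, 0, 0, tzinfo=timezone.utc)
print(temporal_range(start, end))
```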

In [5]:
url = 'https://cmr.earthdata.nasa.gov/search/granules'

shortname = 'SNDRAQIML2CCPRET'
concept_id = 'C1693440798-GES_DISC'

start_time = '2016-01-01T00:00:00Z'
end_time = '2016-01-01T06:00:00Z'
# OPTION 1: Using shortname
response = cmr_request({
                        'short_name': shortname,
                        'temporal': start_time+','+end_time,
                        })

# OPTION 2: Using concept ID (overwrites the Option 1 response; run only one of the two)
response = cmr_request({
                        'concept_id': concept_id,
                        'temporal': start_time+','+end_time,
                        })

# Optionally pretty-print the links of the first granule to see which entry
# contains the S3 link (requires `import pprint`):
# pprint.pprint(response.json()['feed']['entry'][0]['links'])

# Total number of matching granules reported by CMR
print(response.headers['CMR-Hits'])

# Granule entries returned in this response
granules = response.json()['feed']['entry']
print(granules)

61
[{'producer_granule_id': 'SNDR.AQUA.AIRS_IM.20151231T2359.m06.g240.L2_CLIMCAPS_RET.std.v02_39.G.210425005357.nc', 'time_start': '2015-12-31T23:59:22.000Z', 'updated': '2021-04-25T00:59:39.000Z', 'dataset_id': 'Sounder SIPS: AQUA AIRS IR + MW Level 2 CLIMCAPS: Atmosphere, cloud and surface geophysical state V2 at GES DISC', 'data_center': 'GES_DISC', 'title': 'SNDRAQIML2CCPRET.2:SNDR.AQUA.AIRS_IM.20151231T2359.m06.g240.L2_CLIMCAPS_RET.std.v02_39.G.210425005357.nc', 'coordinate_system': 'GEODETIC', 'day_night_flag': 'DAY', 'time_end': '2016-01-01T00:05:22.000Z', 'id': 'G2040464612-GES_DISC', 'original_format': 'ECHO10', 'granule_size': '21.317625045776367', 'browse_flag': False, 'polygons': [['-28.39 -142.23 -7.27 -148.33 -9.68 -164.2 -31.15 -160.47 -28.39 -142.23']], 'collection_concept_id': 'C1693440798-GES_DISC', 'online_access_flag': True, 'links': [{'rel': 'http://esipfed.org/ns/fedsearch/1.1/data#', 'title': 'Download SNDR.AQUA.AIRS_IM.20151231T2359.m06.g240.L2_CLIMCAPS_RET.std.
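
The `CMR-Hits` header reports 61 matching granules, but CMR returns results in pages, so a single response may hold fewer entries than the hit count (to our understanding, 10 per page unless `page_size` is set). The following is a minimal sketch of preparing one parameter set per page with the `page_size` and `page_num` parameters; `paged_params` is a hypothetical helper, and the page size of 25 is chosen only for illustration.

```python
import math

def paged_params(base_params, hits, page_size=100):
    """Return one parameter dict per page needed to cover `hits` results."""
    n_pages = math.ceil(hits / page_size)
    # dict(...) copies base_params, so the original dict is left unchanged
    return [dict(base_params, page_size=page_size, page_num=page)
            for page in range(1, n_pages + 1)]

base = {
    'concept_id': 'C1693440798-GES_DISC',
    'temporal': '2016-01-01T00:00:00Z,2016-01-01T06:00:00Z',
}

# 61 hits at 25 per page need 3 requests
for params in paged_params(base, hits=61, page_size=25):
    print(params['page_num'], params['page_size'])
```

Each dict can then be passed to `cmr_request`, and the `['feed']['entry']` lists from the responses concatenated into one list of granules.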

Now, we can parse out that link, and assign it to a variable:

In [20]:
# The S3 link is the entry titled
# 'This link provides direct download access via S3 to the granule';
# its 'href' key holds the S3 URL. In the output below it is the second entry:
# climcaps_s3_link = granules[0]['links'][1]['href']
# print(climcaps_s3_link)

# To print the links of every granule instead:
# for granule in granules:
#     print(granule['links'])

# Inspect the links of the first granule
test = granules[0]['links']
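
Indexing a granule's links by position assumes the S3 entry always sits at the same index. A less brittle approach is to filter on the link's `rel` value. The sketch below assumes that S3 links carry a `rel` ending in `/s3#`, as they do in this response; `s3_urls` is a hypothetical helper, and the sample entry is abbreviated from the first granule returned above.

```python
def s3_urls(granules):
    """Collect the href of every link whose 'rel' marks it as an S3 link."""
    urls = []
    for granule in granules:
        for link in granule.get('links', []):
            if link.get('rel', '').endswith('/s3#'):
                urls.append(link['href'])
    return urls

# Abbreviated sample with the same structure as response.json()['feed']['entry']
sample_granules = [{
    'links': [
        {'rel': 'http://esipfed.org/ns/fedsearch/1.1/data#',
         'href': 'https://data.gesdisc.earthdata.nasa.gov/data/Aqua_Sounder_Level2/SNDRAQIML2CCPRET.2/2015/365/SNDR.AQUA.AIRS_IM.20151231T2359.m06.g240.L2_CLIMCAPS_RET.std.v02_39.G.210425005357.nc'},
        {'rel': 'http://esipfed.org/ns/fedsearch/1.1/s3#',
         'href': 's3://gesdisc-cumulus-prod-protected/Aqua_Sounder_Level2/SNDRAQIML2CCPRET.2/2015/365/SNDR.AQUA.AIRS_IM.20151231T2359.m06.g240.L2_CLIMCAPS_RET.std.v02_39.G.210425005357.nc'},
    ]
}]

print(s3_urls(sample_granules))
```

Calling `s3_urls(granules)` on the real response would give one S3 URL per granule returned on that page.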

In [21]:
test

[{'rel': 'http://esipfed.org/ns/fedsearch/1.1/data#',
  'title': 'Download SNDR.AQUA.AIRS_IM.20151231T2359.m06.g240.L2_CLIMCAPS_RET.std.v02_39.G.210425005357.nc',
  'hreflang': 'en-US',
  'href': 'https://data.gesdisc.earthdata.nasa.gov/data/Aqua_Sounder_Level2/SNDRAQIML2CCPRET.2/2015/365/SNDR.AQUA.AIRS_IM.20151231T2359.m06.g240.L2_CLIMCAPS_RET.std.v02_39.G.210425005357.nc'},
 {'rel': 'http://esipfed.org/ns/fedsearch/1.1/s3#',
  'title': 'This link provides direct download access via S3 to the granule',
  'hreflang': 'en-US',
  'href': 's3://gesdisc-cumulus-prod-protected/Aqua_Sounder_Level2/SNDRAQIML2CCPRET.2/2015/365/SNDR.AQUA.AIRS_IM.20151231T2359.m06.g240.L2_CLIMCAPS_RET.std.v02_39.G.210425005357.nc'},
 {'rel': 'http://esipfed.org/ns/fedsearch/1.1/service#',
  'type': 'application/x-netcdf',
  'title': 'The OPENDAP location for the granule. (GET DATA : OPENDAP DATA)',
  'hreflang': 'en-US',
  'href': 'https://sounder.gesdisc.eosdis.nasa.gov/opendap/Aqua_Sounder_Level2/SNDRAQIML2CCP