# How to Access GES DISC Data Using Python

<p></p>

<div style="background:#eeeeee; border:1px solid #000000; padding:5px 10px; color:#000000;">
    Please be judicious when working with long time series that reside on a remote data server.<br />
    Applying the approaches shown here to more than a year of hourly data at a time is very likely to place a heavy load on the remote data server. The consequences can range from very slow performance for hundreds of other users to denial of service.
</div>
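
One way to keep individual requests small is to split a long period into short windows and request them one at a time, pausing between requests. The sketch below illustrates only the splitting step. The helper name `date_chunks` and the four-day window are illustrative choices, not part of any GES DISC tooling.

```python
from datetime import date, timedelta


def date_chunks(start, end, days_per_chunk):
    """Yield (first_day, last_day) pairs that cover start..end inclusive."""
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    step = timedelta(days=days_per_chunk)
    first = start
    while first <= end:
        # The last window is shortened so it never runs past `end`.
        last = min(first + step - timedelta(days=1), end)
        yield first, last
        first = last + timedelta(days=1)


# Ten days split into windows of at most four days each.
chunks = list(date_chunks(date(1980, 1, 1), date(1980, 1, 10), 4))
for first, last in chunks:
    print(first.isoformat(), "to", last.isoformat())
```

Each pair could then serve as the temporal range of one request, with a short `time.sleep()` between requests to spread the load on the server.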

### Overview

There are multiple ways to work with GES DISC data resources using Python. For example, the data can be accessed using [techniques that rely on native Python code](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html).

Still, there are several third-party libraries that can further simplify the access. In the sections below, we demonstrate downloading and streaming granules to the notebook using these libraries.

The examples will use a sample MERRA-2 granule, from the [M2T1NXSLV.5.12.4 collection](https://disc.gsfc.nasa.gov/datasets/M2T1NXSLV_5.12.4/summary?keywords=M2T1NXSLV_5.12.4), to demonstrate data access.

### Prerequisites

***Note:*** An Earthdata Login account with the "NASA GES DISC DATA ARCHIVE" and "Hyrax in the Cloud" applications enabled is required to access GES DISC data and to store the "Earthdata prerequisite files". To create an Earthdata Login account and enable these applications, please visit [this guide](https://disc.gsfc.nasa.gov/earthdata-login).

This notebook was written using Python 3.10, and requires these libraries and files:

- `netrc` file with valid Earthdata Login credentials
   - [How to Generate Earthdata Prerequisite Files](https://disc.gsfc.nasa.gov/information/howto?title=How%20to%20Generate%20Earthdata%20Prerequisite%20Files)
- [requests](https://docs.python-requests.org/en/latest/) (version 2.22.0 or later)
- [pydap](https://github.com/pydap/pydap) (we recommend using version 3.4.0 or later)
- [xarray](https://docs.xarray.dev/en/stable/)
- [netCDF4-python](https://github.com/Unidata/netcdf4-python) (we recommend using version 1.6.2)
- [earthaccess](https://earthaccess.readthedocs.io/en/latest/quick-start/)
- ***Optional:***
   - For OPeNDAP examples, this notebook can be run using the ['opendap' YAML file](https://github.com/nasa/gesdisc-tutorials/tree/main/environments/opendap.yml) provided in the 'environments' subfolder. Please follow the instructions [here](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file) to install and activate this environment.
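
Before running the examples, it can help to confirm that the `netrc` file actually contains an Earthdata Login entry. The following is a minimal sketch that uses only the Python standard library. The helper name `has_earthdata_entry` is hypothetical, and the sketch assumes the file is named `.netrc` and lives in your home directory unless another path is passed.

```python
import netrc
import os

EDL_HOST = "urs.earthdata.nasa.gov"  # Earthdata Login host name


def has_earthdata_entry(netrc_path=None):
    """Return True if the netrc file provides credentials for Earthdata Login.

    A `default` entry in the file also counts, because the netrc module
    falls back to it for any host.
    """
    path = netrc_path or os.path.join(os.path.expanduser("~"), ".netrc")
    if not os.path.isfile(path):
        return False
    try:
        entry = netrc.netrc(path).authenticators(EDL_HOST)
    except netrc.NetrcParseError:
        # The file exists but could not be parsed.
        return False
    return entry is not None
```

Calling `has_earthdata_entry()` with no argument checks the default location. The check only confirms that an entry exists, not that the username and password are valid.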
 

### Contents
* [Download Full Granule Data](#download_full_granules)
    * [Option 1: Use `requests`](#download_requests)
    * [Option 2: Use `earthaccess`](#download_earthaccess)
* [Stream Full Granule Data](#stream_full_granules)

* [Subset and Stream Granule Data from OPeNDAP Servers](#opendap)
    * [Option 1: Use `pydap` with Earthdata Login credentials ](#opendap_pydap)
    * [Option 2: Use `xarray`](#opendap_xarray)
    * [Option 3: Use `netcdf4-python`](#opendap_netcdf4-python)
* [Subset and Stream Granule Data from THREDDS Servers](#thredds)
    * [Option 1: Use `xarray`](#thredds_xarray)

### Links Used in this Notebook

There are several example links that will be used to access data from the same granule. Each link can be found using several tools, including [Earthdata Search](https://search.earthdata.nasa.gov/search/granules/granule-details?p=C1276812863-GES_DISC&pg[0][v]=f&pg[0][qt]=1980-01-01%2C1981&pg[0][gsk]=-start_date&g=G1277898447-GES_DISC&q=M2T1NXSLV&tl=1723660883!3!!), the [dataset landing page](https://disc.gsfc.nasa.gov/datasets/M2T1NXSLV_5.12.4/summary?keywords=M2T1NXSLV_5.12.4) for the particular collection, or the [Common Metadata Repository](https://cmr.earthdata.nasa.gov/virtual-directory/collections/C1276812863-GES_DISC/temporal/1980/01/01).

Links used in this notebook:
- HTTPS: https://data.gesdisc.earthdata.nasa.gov/data/MERRA2/M2T1NXSLV.5.12.4/1980/01/MERRA2_100.tavg1_2d_slv_Nx.19800101.nc4
- OPeNDAP: 
    - OPeNDAP Subsetting Page: https://opendap.earthdata.nasa.gov/collections/C1276812863-GES_DISC/granules/M2T1NXSLV.5.12.4%3AMERRA2_100.tavg1_2d_slv_Nx.19800101.nc4.dmr.html
    - Example OPeNDAP URL (for `Xarray` and `pydap` access only): https://opendap.earthdata.nasa.gov/collections/C1276812863-GES_DISC/granules/M2T1NXSLV.5.12.4%3AMERRA2_100.tavg1_2d_slv_Nx.19800101.nc4
- THREDDS: 
    - Example THREDDS URL Subsetting Page: https://goldsmr4.gesdisc.eosdis.nasa.gov/thredds/dodsC/MERRA2_aggregation/M2T1NXSLV.5.12.4/M2T1NXSLV.5.12.4_Aggregation_1980.ncml.html
    - Example THREDDS URL (for `Xarray` access only): https://goldsmr4.gesdisc.eosdis.nasa.gov/thredds/dodsC/MERRA2_aggregation/M2T1NXSLV.5.12.4/M2T1NXSLV.5.12.4_Aggregation_1980.ncml
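
The HTTPS and OPeNDAP links above share a visible pattern, so links for other dates can be assembled programmatically. The sketch below reproduces that pattern. It is inferred only from the example links in this notebook, not from a documented naming rule, and the helper names `https_url` and `opendap_url` are hypothetical.

```python
from urllib.parse import quote

HTTPS_ROOT = "https://data.gesdisc.earthdata.nasa.gov/data"
OPENDAP_ROOT = "https://opendap.earthdata.nasa.gov/collections"


def https_url(project, short_name, version, year, month, filename):
    """Build a direct-download link following the pattern of the HTTPS example."""
    return (
        f"{HTTPS_ROOT}/{project}/{short_name}.{version}/"
        f"{year:04d}/{month:02d}/{filename}"
    )


def opendap_url(concept_id, short_name, version, filename):
    """Build an OPeNDAP link following the pattern of the OPeNDAP example."""
    # The granule part is "<short_name>.<version>:<filename>",
    # with the colon percent-encoded as %3A.
    granule = quote(f"{short_name}.{version}:{filename}", safe="")
    return f"{OPENDAP_ROOT}/{concept_id}/granules/{granule}"


name = "MERRA2_100.tavg1_2d_slv_Nx.19800101.nc4"
print(https_url("MERRA2", "M2T1NXSLV", "5.12.4", 1980, 1, name))
print(opendap_url("C1276812863-GES_DISC", "M2T1NXSLV", "5.12.4", name))
```

Generated links should always be checked against one of the search tools listed above, because file name prefixes such as `MERRA2_100` are not guaranteed to stay the same across the whole collection.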


---



### Download Full Granule Data <a class="anchor" id="download_full_granules"></a>

#### Option 1: Use `requests` <a class="anchor" id="download_requests"></a>

`requests` is a popular Python library that simplifies access to Internet-based resources. The following code demonstrates how to use `requests` to download a GES DISC granule, relying on the Earthdata Login credentials stored on the host system (the `netrc` file listed in the prerequisites).

In [1]:
import requests

URL = 'https://data.gesdisc.earthdata.nasa.gov/data/MERRA2/M2T1NXSLV.5.12.4/1980/01/MERRA2_100.tavg1_2d_slv_Nx.19800101.nc4'

# Set the FILENAME string to the data file name, the LABEL keyword value, or any customized name. 
# Remember to include the same file extension as in the URL.
FILENAME = 'MERRA2_100.tavg1_2d_slv_Nx.19800101.nc4'

result = requests.get(URL)
try:
    # Raise an HTTPError if the server returned a 4xx or 5xx status code
    result.raise_for_status()
    # Write the response body to disk; the file is closed automatically
    with open(FILENAME, 'wb') as f:
        f.write(result.content)
    print('contents of URL written to ' + FILENAME)
except requests.exceptions.HTTPError:
    print('requests.get() returned an error code ' + str(result.status_code))

contents of URL written to MERRA2_100.tavg1_2d_slv_Nx.19800101.nc4
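
The cell above holds the whole response in memory before writing it, which can be costly for a granule of several hundred megabytes. As an alternative, `requests` can stream the body in chunks. The following is a minimal sketch of that approach. The helper name `save_streamed_response` and the 1 MiB chunk size are illustrative choices.

```python
def save_streamed_response(response, filename, chunk_size=1024 * 1024):
    """Write a streamed HTTP response to disk chunk by chunk.

    `response` is expected to behave like a requests.Response opened with
    stream=True. Returns the number of bytes written.
    """
    response.raise_for_status()
    bytes_written = 0
    with open(filename, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            bytes_written += len(chunk)
    return bytes_written


# Intended use with requests (not executed here, because it needs network
# access and Earthdata Login credentials):
#
#     import requests
#     with requests.get(URL, stream=True) as response:
#         save_streamed_response(response, FILENAME)
```

Only one chunk is held in memory at a time, so the memory footprint stays roughly constant regardless of the granule size.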


#### Option 2: Use `earthaccess`  <a class="anchor" id="download_earthaccess"></a>

The `earthaccess` library can be used to search for granules and download them to your local machine. The `search_data` function searches for granules within the specified temporal and bounding box ranges and returns a list of matching granules. The `download` function then downloads their URLs, assuming you have been authenticated using your previously generated Earthdata prerequisite files.

Please note that as of August 2024, `earthaccess` does not have the ability to return OPeNDAP URLs.

In [2]:
import earthaccess

# This will work if Earthdata prerequisite files have already been generated
auth = earthaccess.login()

# To download multiple files, change the second temporal parameter
results = earthaccess.search_data(
    short_name='M2T1NXSLV',
    version='5.12.4',
    temporal=('1980-01-01', '1980-01-01'), # This will download the same 1980-01-01 granule used elsewhere in this notebook
    bounding_box=(-180, 0, 180, 90)
)

downloaded_files = earthaccess.download(
    results,
    local_path='.', # Change this string to download to a different path
)

Granules found: 1
 Getting 1 granules, approx download size: 0.39 GB


File MERRA2_100.tavg1_2d_slv_Nx.19800101.nc4 already downloaded
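
Before downloading a larger search result, it can be useful to review the links it contains. The sketch below gathers them with the `data_links()` method that this notebook also uses in the streaming section. The helper name `collect_data_links` is hypothetical.

```python
def collect_data_links(granules):
    """Return a flat list of the data links of every granule in `granules`.

    Each granule is expected to provide a data_links() method that returns
    a list of URL strings, as earthaccess search results do.
    """
    links = []
    for granule in granules:
        links.extend(granule.data_links())
    return links
```

For example, `collect_data_links(results)` could be printed and inspected before `results` is passed to `earthaccess.download`.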



### Stream Full Granule Data Using Python <a class="anchor" id="stream_full_granules"></a>

The `earthaccess` library has the ability to "stream" the full data of a granule to an Xarray dataset object, without having to download before opening in your current notebook session. Please note that this will stream the full data and every variable of the granule to the notebook, which may take extra time. To access one variable at a time or perform subsetting, please access data from an OPeNDAP server.

In [3]:
import earthaccess
import xarray as xr

# This will work if Earthdata prerequisite files have already been generated
auth = earthaccess.login()

# We recommend only streaming one granule at a time, as some collections can be quite large
results = earthaccess.search_data(
    short_name='M2T1NXSLV',
    version='5.12.4',
    cloud_hosted = False,
    temporal=('1980-01-01', '1980-01-01'), # This will stream the same 1980-01-01 granule used elsewhere in this notebook
    bounding_box=(-180, 0, 180, 90)
)

fs = earthaccess.get_fsspec_https_session()
f = fs.open(results[0].data_links()[0]) # Extracts the single URL from the results variable

ds = xr.open_dataset(f)
ds


Granules found: 1




### Subset and Stream Granule Data From OPeNDAP Servers <a class="anchor" id="opendap"></a>

Rather than downloading or streaming an entire granule, you can access data from an OPeNDAP server, which allows you to view dataset metadata and subset one or more variables before the data is streamed to the current notebook session.

Please note that when accessing data from OPeNDAP servers, you will experience errors if your `.dodsrc` prerequisite file is not generated and properly stored in addition to your `.netrc` file.
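
Subsetting on an OPeNDAP server is ultimately expressed as index ranges along each dimension. The sketch below converts a latitude/longitude box into index ranges. It assumes the regular MERRA-2 grid (latitude from -90 in 0.5-degree steps, longitude from -180 in 0.625-degree steps), and the helper name `index_slice` is hypothetical.

```python
import math

# Assumed MERRA-2 grid: latitude starts at -90 in 0.5-degree steps,
# longitude starts at -180 in 0.625-degree steps.
LAT0, DLAT, NLAT = -90.0, 0.5, 361
LON0, DLON, NLON = -180.0, 0.625, 576


def index_slice(lower, upper, origin, step, size):
    """Return a slice covering all grid points with lower <= value <= upper."""
    start = max(math.ceil((lower - origin) / step), 0)
    stop = min(math.floor((upper - origin) / step), size - 1) + 1
    return slice(start, stop)


# The same box that is used in the THREDDS example of this notebook.
lat_idx = index_slice(41, 43, LAT0, DLAT, NLAT)
lon_idx = index_slice(-89, -87, LON0, DLON, NLON)
print(lat_idx, lon_idx)
```

The resulting slices could be passed to index-based selection, for example `ds.isel(lat=lat_idx, lon=lon_idx)` in `xarray`. For other collections, the grid constants would need to be replaced with values read from the dataset's own coordinate variables.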

#### Option 1a: Use `pydap` with Earthdata Login Credentials  <a class="anchor" id="opendap_pydap"></a>

GES DISC OPeNDAP resources can also be accessed conveniently with `pydap`, a Python library that provides an interface for Python programs to read from OPeNDAP servers. (The netCDF4 Python module, covered in Option 3, instead uses the netCDF-C library to access data.)


In [4]:
from pydap.client import open_url
from pydap.cas.urs import setup_session
import getpass

dataset_url = 'https://opendap.earthdata.nasa.gov/collections/C1276812863-GES_DISC/granules/M2T1NXSLV.5.12.4%3AMERRA2_100.tavg1_2d_slv_Nx.19800101.nc4'

prompts = [
    'Enter NASA Earthdata Login Username \n(or create an account at urs.earthdata.nasa.gov): ',
    'Enter NASA Earthdata Login Password: '
]

username = input(prompts[0])
password = getpass.getpass(prompts[1])

try:
    session = setup_session(username, password, check_url=dataset_url)
    dataset = open_url(dataset_url, session=session)
    print(dataset['T2M']) # Select a variable and print its structure
except AttributeError as e:
    print('Error:', e)
    print('Please verify that the dataset URL points to an OPeNDAP server, the OPeNDAP server is accessible, or that your username and password are correct.')

<GridType with array 'T2M' and maps 'time', 'lat', 'lon'>


#### Option 1b: Use `pydap` with stored Earthdata Login Token  <a class="anchor" id="opendap_pydap"></a>

If you have stored an Earthdata Login token in a `.edl_token` file in your home directory, you can use the `pydap` library to access OPeNDAP data with that token.

In [1]:
from pydap.client import open_url
import requests
import os

token_file_path = os.path.join(os.path.expanduser("~"), ".edl_token")

# Read the token from the .edl_token file
with open(token_file_path, 'r') as token_file:
    token = token_file.read().strip()  # Ensure to strip any newlines or extra spaces

my_session = requests.Session()
my_session.headers={"Authorization": token}

dataset = open_url('https://opendap.earthdata.nasa.gov/collections/C1276812863-GES_DISC/granules/M2T1NXSLV.5.12.4%3AMERRA2_100.tavg1_2d_slv_Nx.19800101.nc4', session=my_session)
print(dataset)

<DatasetType with children 'lon', 'time', 'lat', 'TROPPB', 'T2M', 'TQL', 'T500', 'TOX', 'U2M', 'U850', 'PS', 'V850', 'OMEGA500', 'H250', 'Q250', 'T2MDEW', 'PBLTOP', 'V250', 'CLDPRS', 'V50M', 'Q500', 'DISPH', 'H1000', 'TO3', 'TS', 'T10M', 'TROPPT', 'TQI', 'SLP', 'TROPT', 'U250', 'Q850', 'ZLCL', 'TQV', 'V2M', 'T250', 'TROPQ', 'V10M', 'H850', 'T850', 'U50M', 'U10M', 'QV2M', 'CLDTMP', 'TROPPV', 'H500', 'V500', 'T2MWET', 'U500', 'QV10M'>


#### Option 2: Use `xarray` <a class="anchor" id="opendap_xarray"></a>

The `xarray` library allows for OPeNDAP URLs to be streamed directly to the notebook, as long as Earthdata Login prerequisite files are present and correct. Please note that this library works best on non-subsetted OPeNDAP URLs, and that subsetting should be done programmatically using `xarray` functions. For more information on these functions, see their documentation: https://docs.xarray.dev/en/stable/user-guide/indexing.html

If you wish to save the subsetted granule locally, please use the `to_netcdf` function, documented here: https://docs.xarray.dev/en/latest/generated/xarray.Dataset.to_netcdf.html


In [5]:
import xarray as xr

# Reading a single granule URL:
ds = xr.open_dataset('https://opendap.earthdata.nasa.gov/collections/C1276812863-GES_DISC/granules/M2T1NXSLV.5.12.4%3AMERRA2_100.tavg1_2d_slv_Nx.19800101.nc4')
ds['T2M']
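
The subsetting and saving steps described for this option can be combined as in the sketch below. It assumes that `ds` is a dataset opened with `xr.open_dataset` and that its coordinates are named `lat` and `lon`, as in this granule. The helper name `subset_and_save` and the output file name are hypothetical.

```python
def subset_and_save(ds, variable, lat_bounds, lon_bounds, out_path):
    """Select one variable within a lat/lon box and write it to a netCDF file.

    `ds` is expected to be an xarray.Dataset with `lat` and `lon` coordinates.
    `lat_bounds` and `lon_bounds` are (lower, upper) pairs in degrees.
    """
    # Double brackets keep the result a Dataset that holds a single variable.
    subset = ds[[variable]].sel(
        lat=slice(*lat_bounds),
        lon=slice(*lon_bounds),
    )
    subset.to_netcdf(out_path)
    return subset


# Intended use (not executed here, because it needs an opened dataset):
#
#     subset_and_save(ds, 'T2M', (41, 43), (-89, -87), 'T2M_subset.nc4')
```

Because the selection is applied before `to_netcdf` is called, only the requested variable and region should need to be transferred from the server.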

#### Option 3: Use `netcdf4-python` <a class="anchor" id="opendap_netcdf4-python"></a>

`netCDF4-python` is a Python library that uses the [netCDF-C](https://github.com/Unidata/netcdf-c) library to open and read netCDF4 files. It can be used to access OPeNDAP netCDF4 granules remotely, with optional URL subsetting, or to read locally downloaded netCDF4 granules.

In [6]:
import netCDF4 as nc4

nc = nc4.Dataset('https://opendap.earthdata.nasa.gov/collections/C1276812863-GES_DISC/granules/M2T1NXSLV.5.12.4%3AMERRA2_100.tavg1_2d_slv_Nx.19800101.nc4')
nc['T2M'][:]

masked_array(
  data=[[[244.07703, 244.07703, 244.07703, ..., 244.07703, 244.07703,
          244.07703],
         [244.01453, 244.02234, 244.03015, ..., 243.99109, 243.9989 ,
          244.00671],
         [244.55359, 244.5614 , 244.57703, ..., 244.52234, 244.53015,
          244.54578],
         ...,
         [253.21375, 253.22156, 253.22937, ..., 253.19226, 253.20007,
          253.20789],
         [253.93445, 253.9364 , 253.9403 , ..., 253.92468, 253.92859,
          253.93054],
         [254.40125, 254.40125, 254.40125, ..., 254.40125, 254.40125,
          254.40125]],

        [[243.79819, 243.79819, 243.79819, ..., 243.79819, 243.79819,
          243.79819],
         [243.88412, 243.89194, 243.89975, ..., 243.85287, 243.86069,
          243.87631],
         [244.56381, 244.57944, 244.58725, ..., 244.54037, 244.54819,
          244.556  ],
         ...,
         [252.97006, 252.97787, 252.9896 , ..., 252.94272, 252.95053,
          252.96225],
         [253.75717, 253.75912, 253.

### Subset and Stream Granule Data from THREDDS Servers <a class="anchor" id="thredds"></a>

Datasets that include <code>.ncml</code> aggregation, like some provided through THREDDS, may be useful for quickly subsetting multiple granules into a single data array.

This operation requires a <code>.dodsrc</code> file in both your home and working directories, and a <code>.netrc</code> file in your home directory.

**NOTE:** Please use a reasonable spatiotemporal subset when calling from THREDDS servers. Subsets that are too large will cause data access errors, or rate limiting on your IP address.
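
To judge whether a subset is reasonable, a rough size estimate can be made before requesting it. The sketch below multiplies the number of values by four bytes, assuming `float32` variables. The helper name `subset_size_bytes` is hypothetical, and the full-grid figures (361 latitudes, 576 longitudes, 47 variables) are assumptions about this collection.

```python
def subset_size_bytes(n_time, n_lat, n_lon, n_vars, bytes_per_value=4):
    """Estimate the uncompressed size of a subset in bytes."""
    return n_time * n_lat * n_lon * n_vars * bytes_per_value


# One day over a 5 x 3 grid-point box, compared with one day of the full grid.
small = subset_size_bytes(24, 5, 3, 47)
full_day = subset_size_bytes(24, 361, 576, 47)
print(f"{small / 1e3:.1f} kB vs {full_day / 1e9:.2f} GB")
```

The estimate ignores compression and protocol overhead, so it is only an order-of-magnitude guide. Scaling the full-grid figure to a whole year makes clear why large requests against an aggregation are discouraged.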

#### Option 1: Use `xarray` <a class="anchor" id="thredds_xarray"></a>

We recommend using the `xarray` library when interacting with THREDDS URLs, due to its built-in authentication and subsetting capabilities.

In [7]:
import xarray as xr

# Subsetting a .ncml file URL:
URL = 'https://goldsmr4.gesdisc.eosdis.nasa.gov/thredds/dodsC/MERRA2_aggregation/M2T1NXSLV.5.12.4/M2T1NXSLV.5.12.4_Aggregation_1980.ncml'

lat_slice = slice(41, 43)
lon_slice = slice(-89, -87)
time_slice = slice('1980-01-01', '1980-01-01') # Both bounds are set explicitly to select a single day

try:
    ds = xr.open_dataset(URL).sel(lat=lat_slice,lon=lon_slice,time=time_slice)
    print(ds)
except OSError as e:
    print('Error', e)
    print('Please check that your .dodsrc files are in their correct locations, or that your .netrc file has the correct username and password.')

<xarray.Dataset>
Dimensions:   (lon: 3, lat: 5, time: 24)
Coordinates:
  * lon       (lon) float64 -88.75 -88.12 -87.5
  * lat       (lat) float64 41.0 41.5 42.0 42.5 43.0
  * time      (time) datetime64[ns] 1980-01-01T00:30:00 ... 1980-01-01T23:30:00
Data variables: (12/47)
    CLDPRS    (time, lat, lon) float32 ...
    CLDTMP    (time, lat, lon) float32 ...
    DISPH     (time, lat, lon) float32 ...
    H1000     (time, lat, lon) float32 ...
    H250      (time, lat, lon) float32 ...
    H500      (time, lat, lon) float32 ...
    ...        ...
    V250      (time, lat, lon) float32 ...
    V2M       (time, lat, lon) float32 ...
    V500      (time, lat, lon) float32 ...
    V50M      (time, lat, lon) float32 ...
    V850      (time, lat, lon) float32 ...
    ZLCL      (time, lat, lon) float32 ...
Attributes: (12/30)
    History:                           Original file generated: Sat Jun 21 10...
    Comment:                           GMAO filename: d5124_m2_jan79.tavg1_2d...
    Fil