# M1.5 - Accessing MERRA-2 Data in the Cloud

*Part of:* **M1: Open Climate Data**

**Contents:**

1. [Using `earthaccess`](#Using-earthaccess)
2. [Getting a temperature time series](#Getting-a-temperature-time-series)

In [None]:
import earthaccess
import xarray as xr
from matplotlib import pyplot

auth = earthaccess.login()

## Using `earthaccess`

Previously, we manually downloaded a netCDF4 file from NASA's Earthdata Search. Now, let's see how we can instead use Python code to access data in the cloud.

The `earthaccess` library lets us search for, stream, and download NASA Earthdata without opening a web browser and clicking around. This makes our Python code more transparent and reproducible: anyone with an Earthdata Login account who runs it obtains the same raw data that we started with.

We use `earthaccess.search_data()` to get one or more **granules** that match a search query. A granule is the smallest unit of data in a collection, usually a single file; for a daily MERRA-2 product, each granule holds one day of data.

**Do you remember where the short name for an Earthdata product is found?** It appears in two places on the product's information (i) page:

![](./assets/M1_screenshot_Earthdata_Search_MERRA2_info.png)

**We pass the short name as the `short_name` argument to indicate which collection of granules we want to search.**

In this example, searching the daily collection `M2SDNXSLV` from May 1 through June 1, 2023 (inclusive) returns 32 granules.
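The count of 32 is simple inclusive day counting, assuming one granule per day (M2SDNXSLV holds daily statistics) and a search window that starts on 2023-05-01 and ends on 2023-06-01. A minimal check with the standard library:

```python
from datetime import date

# Assumption: the window runs from 2023-05-01 through 2023-06-01,
# with exactly one daily granule per calendar day.
start = date(2023, 5, 1)
end = date(2023, 6, 1)

# Subtracting dates gives a timedelta; add 1 because both endpoints count.
n_days = (end - start).days + 1
print(n_days)  # 32
```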

In [None]:
results = earthaccess.search_data(
    short_name='M2SDNXSLV',
    temporal=("2023-05", "2023-06"))
print(len(results))  # number of granules found
results[0]

No data have been downloaded yet; we just have a reference to the data that is stored in the cloud.

In [None]:
type(results[0])

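To access the contents of a granule, we can use `earthaccess.open()`, which returns file-like objects that read the data directly from the cloud rather than saving copies to disk. **An important thing to note about `earthaccess.open()` is that it requires a *sequence* of granules.** Therefore, even if we want to open just a single granule, that granule must be passed inside a Python list, for example `[results[0]]` or the slice `results[0:1]`.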
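The difference between indexing and slicing is what matters here: indexing a list returns one element, while slicing always returns a list, even of length 1. The sketch below uses hypothetical strings in place of real granule objects so it runs without logging in:

```python
# Stand-in for the list returned by earthaccess.search_data();
# these strings are hypothetical placeholders, not real granules.
results = ["granule_2023-05-01", "granule_2023-05-02", "granule_2023-05-03"]

single = results[0]        # indexing returns one element
single_seq = results[0:1]  # slicing returns a list of length 1

print(type(single).__name__, type(single_seq).__name__)  # str list
print(single_seq)  # ['granule_2023-05-01']
```

With real search results, either `earthaccess.open(results[0:1])` or `earthaccess.open([results[0]])` passes a single granule as a sequence.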

In [None]:
# NOTE: open() requires a sequence of file references
files = earthaccess.open(results[0:2])

In [None]:
type(files[0])

`earthaccess.open()` has not copied the full files to our computer; it returned file-like objects that fetch bytes from the cloud as they are read. To read one of these files in Python, we now use `xarray`. There is a slight lag when calling `open_dataset()` on such a file because `xarray` has to read the file's metadata over the network to work out its coordinates, dimensions, and variables.

In [None]:
ds2 = xr.open_dataset(files[0])
ds2

The resulting `xarray.Dataset` is ready for plotting!

In [None]:
ds2['T2MMIN'].plot()
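Each daily granule stores `T2MMIN` with a `time` dimension of length 1 alongside `lat` and `lon`. Selecting that single time step explicitly, e.g. `ds2['T2MMIN'].isel(time=0).plot()`, makes it unambiguous that we are plotting a 2-D map. The sketch below builds a small synthetic stand-in (hypothetical placeholder values, assuming the standard 0.5° × 0.625° MERRA-2 global grid) so the shapes can be inspected offline:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for one daily MERRA-2 granule: a single time step on
# a 0.5 deg (lat) x 0.625 deg (lon) global grid, filled with a constant value.
lat = np.linspace(-90, 90, 361)
lon = np.linspace(-180, 179.375, 576)
ds = xr.Dataset(
    {"T2MMIN": (("time", "lat", "lon"), np.full((1, lat.size, lon.size), 280.0))},
    coords={
        "time": np.array(["2023-05-01"], dtype="datetime64[ns]"),
        "lat": lat,
        "lon": lon,
    },
)

print(ds["T2MMIN"].dims)  # ('time', 'lat', 'lon')

# Selecting the only time step drops that dimension, leaving a 2-D field.
field = ds["T2MMIN"].isel(time=0)
print(field.dims)  # ('lat', 'lon')
```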

## Getting a temperature time series

This worked great for a single file, but what if we wanted to generate a time series of climate data? We know that `xarray` datasets can have a time dimension, making them capable of representing more than one instance in time. How do we create such a dataset?

In this next example, we use a `for` loop to iterate over the MERRA-2 granules, opening each one and then selecting the `T2MMIN` (daily minimum 2-meter air temperature) value at a specific location. We append this value to one list and the granule's date (its `"time"` coordinate) to another, building up a time series.
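An exact-match `.sel(lat=36.5, lon=3.125)` only succeeds if the requested point is a grid node. A sketch of the grid arithmetic, assuming the standard MERRA-2 global grid that starts at lat = -90 and lon = -180 with spacings of 0.5° and 0.625°, shows that this point lies exactly on the grid. For points between nodes, xarray's `.sel(..., method="nearest")` picks the closest one instead.

```python
# Assumption: MERRA-2 grid origin and spacing (standard global grid).
LAT0, DLAT = -90.0, 0.5
LON0, DLON = -180.0, 0.625


def grid_index(value, origin, spacing):
    """Return the fractional grid index of a coordinate value."""
    return (value - origin) / spacing


lat_idx = grid_index(36.5, LAT0, DLAT)
lon_idx = grid_index(3.125, LON0, DLON)
print(lat_idx, lon_idx)  # 253.0 293.0

# Both are whole numbers, so (36.5, 3.125) is an exact grid node.
# lon = 3.0 is not: its index is 292.8, so the nearest node is index 293.
print(grid_index(3.0, LON0, DLON))  # 292.8
```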

In [None]:
# NOTE: This example may take half a minute with a good internet connection.
results = earthaccess.search_data(
    short_name='M2SDNXSLV',
    temporal=("2023-05", "2023-06"))

time_list = []
data_list = []
file_list = earthaccess.open(results)
for f in file_list:
    # Open each daily granule, sample one grid cell, then close the dataset
    with xr.open_dataset(f) as ds:
        # method='nearest' guards against floating-point mismatches in coordinates
        data_list.append(ds['T2MMIN'].sel(lat=36.5, lon=3.125, method='nearest').values)
        time_list.append(ds['T2MMIN']['time'].values)
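As an alternative to collecting plain lists (a swapped-in technique, not what the loop above does), the daily datasets could be concatenated along their `time` dimension with `xr.concat`, which keeps the result as a labelled xarray object. The sketch uses tiny synthetic datasets with hypothetical placeholder temperatures so it runs offline:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-ins for three daily granules on a tiny 2 x 2 grid,
# each with a length-1 time dimension like the MERRA-2 daily files.
lat = [36.0, 36.5]
lon = [3.125, 3.75]
daily = []
for i, day in enumerate(pd.date_range("2023-05-01", periods=3, freq="D")):
    values = np.full((1, 2, 2), 285.0 + i)  # placeholder temperatures in K
    daily.append(xr.Dataset(
        {"T2MMIN": (("time", "lat", "lon"), values)},
        coords={"time": [day], "lat": lat, "lon": lon},
    ))

# Concatenate along time, then pull out one grid cell as a 1-D series.
combined = xr.concat(daily, dim="time")
series = combined["T2MMIN"].sel(lat=36.5, lon=3.125, method="nearest")
print(series.dims, series.values)  # ('time',) [285. 286. 287.]
```

With real global granules, selecting the point from each dataset before concatenating keeps memory use small.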

We now have a Python `list` of minimum temperature values and another `list` of dates. Below, we plot these data after using `numpy` to flatten each list into a 1-D array and to convert the temperatures from kelvins (K) to degrees Celsius (°C).
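Why is `.ravel()` needed? Because each granule has a length-1 `time` dimension, every `.values` call in the loop returns a 1-element array, so stacking the list gives a 2-D array of shape `(n, 1)`. A small sketch with hypothetical values:

```python
import numpy as np

# Hypothetical values shaped like the loop's output: each entry is a
# 1-element array, as returned by .values on a length-1 time dimension.
data_list = [np.array([290.15]), np.array([288.65]), np.array([291.15])]

stacked = np.array(data_list)  # shape (3, 1)
flat = stacked.ravel()         # shape (3,)
celsius = flat - 273.15        # K -> deg C, element-wise

print(stacked.shape, flat.shape)      # (3, 1) (3,)
print(np.round(celsius, 2).tolist())  # [17.0, 15.5, 18.0]
```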

In [None]:
import numpy as np

# Flatten to 1-D arrays and convert from K to deg C
data = np.array(data_list).ravel() - 273.15
time = np.array(time_list).ravel()

pyplot.figure(figsize=(12, 4))
pyplot.plot(time, data)
pyplot.xlabel('Date')
pyplot.ylabel('Min. Temperature (deg C)')