# M1.2 - Introduction to NASA Earthdata Search and Re-Analysis Data

*Part of:* **M1: Open Climate Data**

**Contents:**

1. [Creating an account](#Creating-an-account)
2. [How to use Earthdata Search](#How-to-use-Earthdata-Search)
3. [Searching for data](#Searching-for-data)
4. [Re-analysis datasets](#Re-analysis-datasets)
5. [MERRA-2 data in Earthdata Search](#MERRA-2-data-in-Earthdata-Search)
6. [Downloading MERRA-2 data](#Downloading-MERRA-2-data)

## Creating an account

NASA's Earthdata is a comprehensive repository of NASA's Earth science datasets. Earthdata Search is the search engine we can use to explore those datasets.

- [Register for an Earthdata account here.](https://urs.earthdata.nasa.gov/users/new)

## How to use Earthdata Search

**Let's familiarize ourselves with Earthdata Search by searching for some climate data.**

1. Once you're logged in, look at the left-hand side panel. Under **"Filter Collections,"** expand the heading that reads **"Keywords."**
![](./assets/M1_screenshot_Earthdata_Search_keywords.png)
2. Check the box that reads **"Climate Indicators."**
3. The choices in this section can be overwhelming! Let's break down the information presented on the right-hand side of the search panel:
   - Each dataset contains a certain number of **granules;** a granule is another name for a data file.
   - Each dataset spans a certain time period; for example, "2003-02-03 ongoing" means that the data have been collected continuously since February 3, 2003.
   - Some datasets have an icon that reads **"Earthdata Cloud,"** which means that the dataset is available in the cloud, without the need to download data files to your computer. We'll see how to access cloud-hosted data later in the curriculum.

### Information about each dataset

**If you hover over one of the datasets in the right half of this panel, an information (i) icon will appear.** See the screenshot below for an example. **Click this (i) button.**

![](./assets/M1_screenshot_Earthdata_Search_dataset_hovering.png)

There's a lot of important information here. First, at the very top of this information page, you'll see three boxes, as in the following example:

![](./assets/M1_screenshot_Earthdata_Search_info_page.png)

- **A short name that uniquely identifies this dataset;** in this example, it is: `GRACEADM_CLSM025GL_7D`
- **A version number;** in this example, it is: `Version 3.0`
- **A digital object identifier (DOI).** A DOI is like a website's URL, except that it is a persistent identifier for a digital object, such as a dataset or a journal article. If you use a dataset in your research, you should make sure to cite that dataset's DOI in any publication.

---

## Searching for data

In addition to browsing for datasets by thematic area (like "Climate Indicators"), we can search for datasets by name or keyword, just like when we use an internet search engine.

**Type "MERRA-2" into the search box at the top-left of this page, as in the screenshot below.**

![](./assets/M1_screenshot_Earthdata_Search_text_box_search.png)

**Why are there only "5 Matching Collections"?** 

**We forgot to un-check the "Climate Indicators" keyword from our previous search!** After un-checking that box, there should be hundreds of collections available.
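
The same kind of keyword search can also be written as a URL. The sketch below only builds such a URL and does not send any request. It assumes that the collections endpoint of NASA's Common Metadata Repository (CMR) and its `keyword` and `page_size` parameters are the right ones to use; the function name `build_collection_search_url` is our own invention.

```python
from urllib.parse import urlencode

# Assumption: CMR's collection search endpoint (not named in this lesson)
CMR_COLLECTIONS_URL = "https://cmr.earthdata.nasa.gov/search/collections.json"


def build_collection_search_url(keyword, page_size=10):
    """Return a URL that searches dataset collections by free-text keyword."""
    params = {"keyword": keyword, "page_size": page_size}
    return f"{CMR_COLLECTIONS_URL}?{urlencode(params)}"


url = build_collection_search_url("MERRA-2")
print(url)
```

Opening the printed URL in a browser should return a list of matching collections, much like typing "MERRA-2" into the search box.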

---

## Re-analysis datasets

**Re-analysis datasets** use data assimilation to combine historical observations (e.g., from ground-based weather stations) with a numerical weather prediction model. In data assimilation, the model first predicts a certain weather condition (e.g., the rainfall rate on August 31, 2020), and the predicted value is then compared to the value that was actually observed. If the predicted and observed values differ significantly, the model is adjusted so that it produces better predictions.

![](./assets/M1_fig_re-analysis.png)
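
To make the predict-compare-adjust idea concrete, here is a toy sketch in Python. It is not how MERRA-2 or any real assimilation system works; it simply nudges a single predicted number toward a single observed number. The function name `assimilate`, the `weight` parameter, and the rainfall values are all hypothetical.

```python
def assimilate(forecast, observation, weight=0.5):
    """Nudge a forecast toward an observation.

    weight=0 keeps the forecast unchanged; weight=1 replaces it
    with the observation.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    difference = observation - forecast  # how far off the prediction was
    return forecast + weight * difference


# Hypothetical rainfall rates, in millimeters per hour
adjusted = assimilate(forecast=2.0, observation=3.0, weight=0.5)
print(adjusted)  # 2.5
```

A larger `weight` means we trust the observation more than the model; a smaller one means we trust the model more.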

The [**Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2)**](https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2/) is NASA’s main re-analysis dataset. It incorporates data from multiple satellite platforms and ground-based weather stations to estimate meteorological conditions world-wide. A vast number of different weather variables are provided on a consistent, global grid with a resolution of 0.625 degrees of longitude by 0.5 degrees of latitude. As with other weather models, there are multiple vertical layers, corresponding either to relative height above the ground or to pressure levels in the atmosphere.
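
The grid resolution tells us how many grid points cover the globe. The short sketch below works out the grid size and finds the grid point nearest to a given location. It assumes that grid points start at 180 degrees West and 90 degrees South and that both poles are included; the function name `nearest_grid_index` and the example coordinates are hypothetical.

```python
LON_STEP = 0.625  # degrees of longitude between grid points
LAT_STEP = 0.5    # degrees of latitude between grid points

# Assumption: grid points start at 180°W and 90°S, and both poles are included
N_LON = round(360 / LON_STEP)
N_LAT = round(180 / LAT_STEP) + 1


def nearest_grid_index(lat, lon):
    """Return the (row, column) of the grid point closest to a location."""
    row = round((lat + 90) / LAT_STEP)
    col = round((lon + 180) / LON_STEP) % N_LON  # wrap around at the date line
    return row, col


print(N_LON, N_LAT)  # 576 361
print(nearest_grid_index(46.87, -113.99))
```

Each global 2D field therefore holds 576 x 361 values per time step, which helps explain why the files are fairly large.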

Ingesting and processing all this data takes time, so MERRA-2 is always a few weeks behind. As with other re-analysis datasets, MERRA-2 should therefore only be used to describe prevailing or historical climate conditions.

MERRA-2 produces both **instantaneous** and **time-averaged** products. Instantaneous products are snapshots of meteorological conditions at a single moment in time. For example, if the northward wind speed in a 3-hourly instantaneous product is given as 2 meters per second, that was the model-simulated wind speed at that precise moment in time, e.g., 15h00 UTC. In a 3-hourly time-averaged product, on the other hand, the wind speed at 13h30 UTC would be the average over the three hours spanning that central time, i.e., from 12h00 UTC to 15h00 UTC. There are also **daily** products that summarize each day, such as the minimum and maximum daily temperatures.
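
The relationship between a time-averaged timestamp and the period it represents can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function name `averaging_window` is hypothetical.

```python
from datetime import datetime, timedelta


def averaging_window(center, hours):
    """Return the (start, end) of the averaging period centered on a timestamp."""
    half = timedelta(hours=hours) / 2
    return center - half, center + half


# The 13h30 UTC timestamp of a 3-hourly time-averaged product
center = datetime(2020, 8, 31, 13, 30)
start, end = averaging_window(center, hours=3)
print(start.strftime("%Hh%M"), "to", end.strftime("%Hh%M"))  # 12h00 to 15h00
```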

Similarly, MERRA-2 data products can be **2-dimensional (2D)** or **3-dimensional (3D).** In the 3D case, there are multiple latitude-longitude grids for different vertical levels, usually represented as different levels of atmospheric pressure. Agricultural scientists, farmers, and land managers are probably more interested in the 2D data, which are also called **single-level ("slv")** data. These MERRA-2 products are usually labeled like this:

```
	inst1_2d_slv_*
	inst3_2d_slv_*
```

Above, `inst1` and `inst3` refer to instantaneous 1-hourly and 3-hourly data, respectively. You might instead see time-averaged data, labeled like this:

```
	tavg1_2d_slv_*
	tavg3_2d_slv_*
```

Here, `tavg1` and `tavg3` refer to *averaged* 1-hourly and 3-hourly data, respectively. In place of `slv` you might see a product labeled `rad` for solar and thermal radiation data or `aer` for aerosols; there are both 2D and 3D versions of these products.
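
As a memory aid, the naming scheme can be decoded with a small helper. This is a minimal sketch that only understands the `inst` and `tavg` labels and the `slv`, `rad`, and `aer` groups described above; the function name `describe_label` is hypothetical, and real MERRA-2 labels include other codes that it does not handle.

```python
import re

KINDS = {"inst": "instantaneous", "tavg": "time-averaged"}
GROUPS = {"slv": "single-level", "rad": "radiation", "aer": "aerosols"}

# Matches labels such as "tavg1_2d_slv_..." (only the parts covered in this lesson)
LABEL_PATTERN = re.compile(r"^(inst|tavg)(\d+)_(2d|3d)_([a-z]+)_")


def describe_label(label):
    """Split a product label into its meaningful parts."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Unrecognized label: {label}")
    kind, hours, dims, group = match.groups()
    return {
        "kind": KINDS[kind],
        "interval_hours": int(hours),
        "dimensions": dims.upper(),
        "group": GROUPS.get(group, group),  # unknown groups are passed through
    }


print(describe_label("tavg1_2d_slv_Nx"))
```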


---

## MERRA-2 data in Earthdata Search

Let's take a look at some of these MERRA-2 datasets. Suppose we wanted to identify the minimum daily temperature across our area of interest. From the MERRA-2 documentation, we learn that the dataset we want is called `statD_2d_slv_Nx`.

1. Type "statD_2d_slv_Nx" into the search box at the top-left of Earthdata Search.
2. Click on the box that shows in the results (there should only be one).
3. Take a look at the left-hand side, under "Filter Granules." Note that the only real choice here is to filter the granules by time. That's because every MERRA-2 granule covers the entire globe, so spatial filtering isn't necessary. If we put a date into the "Start" box, we'll see only granules after that date. If we put a date into the "Start" and "End" boxes, we'll see only granules between those dates.
4. If we clicked the Download button right now, we'd get a single file that ends with the file extension `*.nc4`. **Click to download any one of the granules you see; see the screenshot below for which button to click.**

![](./assets/M1_screenshot_Earthdata_Search_download.png)
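
Once several granules are on disk, the same "Start" and "End" filtering can be done in Python. The sketch below assumes that each file name contains the granule's date as an eight-digit `YYYYMMDD` stamp between two periods and that both ends of the date range are inclusive. The file names listed here are hypothetical examples, as are the function names `granule_date` and `filter_granules`.

```python
import re
from datetime import date, datetime


def granule_date(filename):
    """Extract the date from a file name containing a .YYYYMMDD. stamp."""
    match = re.search(r"\.(\d{8})\.", filename)
    if match is None:
        raise ValueError(f"No date found in file name: {filename}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def filter_granules(filenames, start=None, end=None):
    """Keep granules dated between start and end (both inclusive, both optional)."""
    selected = []
    for name in filenames:
        day = granule_date(name)
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        selected.append(name)
    return selected


# Hypothetical file names, patterned on the statD_2d_slv_Nx product
granules = [
    "MERRA2_400.statD_2d_slv_Nx.20230101.nc4",
    "MERRA2_400.statD_2d_slv_Nx.20230102.nc4",
    "MERRA2_400.statD_2d_slv_Nx.20230103.nc4",
]
subset = filter_granules(granules, start=date(2023, 1, 2))
print(subset)
```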

---

## Downloading MERRA-2 data

**Now that we're starting to download climate datasets, it's time to think critically about file management.** Our project is likely to include a lot of raw datasets like this MERRA-2 granule but also Python scripts, output figures, and research notes. How can we keep it all organized?

There are some simple guidelines that can help you stay organized. **Consider the example file tree, below.**

### Example project organization

![](./assets/M1_file_tree_MERRA2.png)

**Note the following:**

- All of our project's files are kept within a single directory: `my_project`. If any file is related to our project, we ought to be able to point to a single place on our file system where it is kept.
- **Raw data, Python scripts, and results are kept in separate folders.** This is very important, particularly to protect raw data from accidentally being changed. If we want to look up some plot we made, we can go straight to the `results` folder, instead of searching for it in one or more folders that may contain different things. We can imagine different sub-folders within `results` that we might choose to help further organize our results: `tables`, `plots_for_publication`, `summary_statistics`, etc.
- **We chose meaningful filenames:** Our current project is pretty simple, so `MERRA2_processing.ipynb` might be the best name for the Jupyter Notebook where we process the MERRA-2 data. An even better name might be `MERRA2_mean_annual_temp_calculation.ipynb`, to indicate the *purpose* of the script. Similarly, `MERRA2_mean_annual_temp_2023.png` tells us what the file (a plot) represents (mean annual temperature) and the relevant time period (the year 2023).
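
A folder layout like this can be created by hand or with a few lines of Python. The sketch below builds the layout inside a temporary directory so that it is safe to run anywhere. The folder names `data_raw` and `scripts` are hypothetical stand-ins for whatever the file tree above uses; `results`, `tables`, and `plots_for_publication` come from the notes above.

```python
import tempfile
from pathlib import Path

# Hypothetical folder names: adapt them to match your own project
SUBFOLDERS = [
    "data_raw",
    "scripts",
    "results/tables",
    "results/plots_for_publication",
]


def create_project(root):
    """Create the project folder and its sub-folders; existing folders are kept."""
    root = Path(root)
    for sub in SUBFOLDERS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


# Build the tree in a temporary directory so this example leaves nothing behind
with tempfile.TemporaryDirectory() as tmp:
    project = create_project(Path(tmp) / "my_project")
    created = sorted(
        p.relative_to(project).as_posix() for p in project.rglob("*") if p.is_dir()
    )
    print(created)
```

To set up a real project, call `create_project` with a path of your choosing instead of a temporary directory.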