# M1.5 - Earth Observation Data

*Part of:* **M1: Open Climate Data**

**Contents:**

1. [Satellite data sources](#Satellite-data-sources)
2. [Multi-sensor datasets](#Multi-sensor-datasets)
3. [Using satellite-based precipitation data](#Using-satellite-based-precipitation-data)
4. [Zooming in on a study area](#Zooming-in-on-a-study-area)
5. [Using `cartopy`](#Using-cartopy)
6. [Merging multiple time-series datasets](#Merging-multiple-time-series-datasets)

## Satellite data sources

NASA's constellation of earth-observing satellites offers many sources of information on earth's climate system. NASA's [Eyes on the Earth website](https://eyes.nasa.gov/apps/earth/) shows where many of these satellites are in real time.

Most of NASA's earth-observing satellites have **sun-synchronous, polar orbits** (see figure below). A polar orbit passes (nearly) over the earth's poles while the earth spins beneath it. Because the orbit is also sun-synchronous, the satellite crosses the equator at the same local time on every pass, each time over a different part of the earth.

![](./assets/sun-synchronous.png)

*Image from [NASA's Earth Observatory.](https://earthobservatory.nasa.gov/features/OrbitsCatalog)*

Earth-observing satellites carry sensors pointed at the earth for taking measurements of the earth's surface or atmosphere. The sensors measure some part of the electromagnetic spectrum: either visible light, near-infrared and short-wave infrared light, infrared waves, or microwaves. These sensors can be described in terms of:

- **Spectral resolution:** How many types of electromagnetic energy are detected and how narrow the spacing between wavelength bands is.
- **Spatial resolution:** The smallest target size that can be measured on the ground.
- **Temporal resolution:** How often a sensor acquires data on a specific location.

The temporal resolution is also referred to as the **revisit time:** how long it takes for the sensor to view the same location on the earth from the same viewing angle. The revisit time determines how often we can get information on agricultural systems.

## Multi-sensor datasets

Different satellite missions, whether overlapping or separated by years, sometimes measure the same thing. **Precipitation** is one important climate variable for agriculture that has been measured by multiple satellite missions in NASA's Global Precipitation Measurement (GPM) constellation.

![](./assets/GPM-constellation.jpg)

### IMERG precipitation data

The Integrated Multi-satellitE Retrievals for GPM (IMERG) algorithm combines data from these different GPM missions to estimate total precipitation across the globe in 30-minute intervals and at approximately 10-km (0.1-degree) spatial resolution. There are three different IMERG products, differentiated by how they integrate data and by their latency, or how soon they are made available:

- IMERG "Early" has the lowest latency, available within approximately 4 hours of data collection but may be the least accurate because it only projects forward in time.
- IMERG "Late" has a latency of approximately 14 hours and uses both forward and backward projection to improve estimates.
- IMERG "Final" has a latency of approximately 3.5 months but is the most accurate product as it incorporates ground-based rain gauge data.
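
As a rough illustration of how latency affects which product you can use, the sketch below lists the IMERG products that would be expected to exist a given number of hours after an observation. The function name `available_products` is hypothetical, and converting 3.5 months to hours with 30-day months is an assumption; the latencies themselves are the approximate figures quoted above.

```python
# Approximate latencies quoted above, in hours
# ASSUMPTION: 3.5 months is converted using 30-day months
IMERG_LATENCY_HOURS = {
    'Early': 4,
    'Late': 14,
    'Final': 3.5 * 30 * 24
}

def available_products(hours_since_observation):
    'Hypothetical helper: names of the IMERG products expected to exist by now'
    return [
        name for name, latency in IMERG_LATENCY_HOURS.items()
        if hours_since_observation >= latency
    ]

print(available_products(6))         # Only the "Early" product
print(available_products(24))        # "Early" and "Late"
print(available_products(24 * 120))  # All three, after about four months
```

For a near-real-time application such as flood monitoring, only "Early" or "Late" would be available; a retrospective analysis can wait for "Final."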

## Using satellite-based precipitation data

Parts of northern Algeria and Tunisia experienced flash floods in May 2023. Let's use the IMERG-Final product to quantify the total precipitation that fell across that region on one day.

In [None]:
import earthaccess
import xarray as xr
from matplotlib import pyplot

auth = earthaccess.login()

While IMERG-Final is produced in 30-minute intervals, today we'll be using [a version of the data that have been aggregated to daily time steps.](https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGDF_06/summary)

In [None]:
results = earthaccess.search_data(
    short_name = 'GPM_3IMERGDF',
    temporal = ('2023-05-25', '2023-05-30'))

results[0]
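
To make the idea of "aggregated to daily time steps" concrete, here is a minimal sketch of how 48 half-hourly estimates could be combined into one daily total for a single grid cell. The numbers are synthetic, and the sketch assumes the half-hourly values are rates in mm/hr; the actual NASA processing also has to handle missing data, which is ignored here.

```python
import numpy as np

# Synthetic half-hourly precipitation rates (mm/hr) for one grid cell, one day
# ASSUMPTION: the half-hourly product reports a rate in mm/hr
rates = np.zeros(48)
rates[20:26] = [2.0, 6.0, 10.0, 8.0, 3.0, 1.0]  # A 3-hour storm

# Each rate applies for half an hour, so multiply by 0.5 hr before summing
daily_total_mm = float((rates * 0.5).sum())
print(daily_total_mm)
```

The daily value is an accumulation (mm), not an average rate, which is why a short, intense storm can dominate the daily total.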

In [None]:
# NOTE: open() requires a sequence of file references
files = earthaccess.open(results)
files

In [None]:
ds = xr.open_dataset(files[1])
ds

In [None]:
list(ds.variables.keys())

There are a lot of different variables in this dataset, all described in [the IMERG-Final documentation.](https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGDF_06/summary) We'll use the `precipitation` variable, which is the gauge-calibrated, multi-satellite estimate that NASA recommends for general use.

In [None]:
ds['precipitation']

In [None]:
# NOTE: vmax = 100 makes it easier to see lower precipitation values
ds['precipitation'].plot(vmax = 100)

That looks weird! Any ideas about what is wrong?

We need to rotate the plot so that the rows of the image correspond to latitude bands. **Specifically, we need to tell `xarray` that longitude ("lon") should span the X-axis and latitude ("lat") should span the Y-axis.**

In [None]:
ds['precipitation'].plot(x = 'lon', y = 'lat', vmax = 100)
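
The plot looked rotated because of the order in which the array's dimensions are stored. The sketch below uses a small synthetic array, assumed to have the same `(time, lon, lat)` dimension order as the IMERG file, to show how you could check the order with `dims` and, if you prefer, reorder it with `transpose()` instead of passing `x` and `y` to every plot.

```python
import numpy as np
import xarray as xr

# Synthetic 1-degree grid stored as (time, lon, lat)
# ASSUMPTION: this mirrors the dimension order of the IMERG daily file
lon = np.arange(-179.5, 180, 1.0)
lat = np.arange(-89.5, 90, 1.0)
demo = xr.DataArray(
    np.zeros((1, lon.size, lat.size)),
    coords = {'time': [0], 'lon': lon, 'lat': lat},
    dims = ('time', 'lon', 'lat'))

print(demo.dims)

# transpose() reorders the dimensions without changing the data values
fixed = demo.transpose('time', 'lat', 'lon')
print(fixed.dims)
```

Either approach works; passing `x = 'lon', y = 'lat'` leaves the data untouched, while `transpose()` changes the layout once for everything that follows.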

This looks better. But why is most of the image dark?

The easiest way to see more detail in the image is to tell `xarray` to stretch the colorbar so that extreme values don't dominate; we do this with `robust = True`.

In [None]:
ds['precipitation'].plot(x = 'lon', y = 'lat', robust = True)
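
According to the `xarray` documentation, `robust = True` sets the color range from the 2nd and 98th percentiles of the data instead of the minimum and maximum. The sketch below computes those percentiles by hand on synthetic, skewed data (not IMERG data) to show why a single extreme value no longer controls the colorbar.

```python
import numpy as np
import xarray as xr

# Synthetic, skewed "precipitation": mostly small values plus one extreme
rng = np.random.default_rng(0)
demo = xr.DataArray(
    rng.exponential(scale = 2.0, size = (100, 100)),
    dims = ('lat', 'lon'))
demo[0, 0] = 500.0  # A single extreme outlier

# The 2nd and 98th percentiles, which robust = True uses as the color limits
vmin, vmax = demo.quantile([0.02, 0.98]).values
print(float(demo.max()), float(vmin), float(vmax))
```

Calling `demo.plot(vmin = vmin, vmax = vmax)` should give roughly the same color range as `demo.plot(robust = True)`.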

Alternatively, we could tell `xarray` what the maximum value assigned to a color should be, using the `vmax` keyword argument. There's a corresponding `vmin` argument for the minimum value.

In [None]:
ds['precipitation'].plot(x = 'lon', y = 'lat', vmax = 15)

## Zooming in on a study area

How can we use these data for local applications? We need to figure out a way to focus the map on a smaller area.

Python's built-in `slice()` function can be used with the `sel()` method of an `xarray` DataArray in order to slice a larger array into a smaller array. Here, we focus on a small, rectangular bounding box that includes Algiers.

In [None]:
# Area between 30 and 40 degrees N latitude and between 6 W and 6 E longitude
precip = ds['precipitation'].sel(lat = slice(30, 40), lon = slice(-6, 6))

precip.plot(x = 'lon', y = 'lat')
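
One detail worth knowing about `sel()` with `slice()`: the slice bounds must follow the same order as the coordinate values. The sketch below uses a synthetic 1-degree grid (not the IMERG data) to show the same bounding-box selection, and what happens when a latitude coordinate is stored from north to south, as it is in some other datasets.

```python
import numpy as np
import xarray as xr

# Synthetic global grid with cell centers at half degrees
lat = np.arange(-89.5, 90, 1.0)
lon = np.arange(-179.5, 180, 1.0)
demo = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    coords = {'lat': lat, 'lon': lon},
    dims = ('lat', 'lon'))

# Same bounding box as above: 30-40 N, 6 W to 6 E
subset = demo.sel(lat = slice(30, 40), lon = slice(-6, 6))
print(subset.sizes)

# If latitude were stored north-to-south, the slice must be reversed too
flipped = demo.sortby('lat', ascending = False)
print(flipped.sel(lat = slice(30, 40)).sizes['lat'])  # Empty selection
print(flipped.sel(lat = slice(40, 30)).sizes['lat'])  # The expected rows
```

If a selection unexpectedly comes back empty, checking whether the coordinate is ascending or descending is a good first step.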

## Using `cartopy`

It's a good idea to verify that we're mapping the right part of the world, especially since the IMERG data are rotated. `cartopy` is a Python library that provides some additional mapping tools.

Below, we change the colormap, `cmap`, so that it is easier to see the dark coastlines on top of the precipitation data.

Note that Plate Carrée (`PlateCarree` in `cartopy`) is just a fancy term for an equirectangular coordinate system, i.e., a longitude-latitude plot.

In [None]:
from cartopy import crs

proj = crs.PlateCarree()
style = {
    'projection': proj
}

plot = precip.plot(subplot_kws = style, transform = proj, cmap = 'magma_r', x = 'lon', y = 'lat')
plot.axes.coastlines()

[You can see what other color maps are available here.](https://matplotlib.org/stable/gallery/color/colormap_reference.html) Any colormap can be reversed by adding `'_r'` to the end of the colormap's name.

**It's important to choose the right colormap for your data.** People sometimes think that a rainbow color scale is better because it has "more colors." However, in the example below, you can see that a rainbow color scale emphasizes some parts of the linear scale more than others. The change in color between 40 and 60 mm seems much sharper than the change between 0 and 20 mm, even though it's the same step size (20 mm difference). This is an example of how the rainbow color scale fails to provide **perceptual linearity:** a perceived change in hue or brightness that is proportional to the change in the numeric value (e.g., precipitation).

In [None]:
plot = precip.plot(subplot_kws = style, transform = proj, cmap = 'jet', x = 'lon', y = 'lat')
plot.axes.coastlines()
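
One rough way to check a colormap numerically is to see whether its brightness changes in a single direction from one end to the other. The sketch below is an approximation under stated assumptions: it estimates relative luminance from the sRGB values and measures how much each colormap "backtracks." The helper names `luminance` and `backtrack` are hypothetical, and this is a simplified stand-in for a full perceptual color model.

```python
import numpy as np
from matplotlib import pyplot

def luminance(cmap_name, n = 256):
    'Approximate relative luminance (0-1) sampled along a colormap'
    rgb = pyplot.get_cmap(cmap_name)(np.linspace(0, 1, n))[:, :3]
    # Undo the sRGB gamma curve, then apply the Rec. 709 luminance weights
    linear = np.where(
        rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    return linear @ np.array([0.2126, 0.7152, 0.0722])

def backtrack(cmap_name):
    'Total brightness change that runs against the overall trend'
    steps = np.diff(luminance(cmap_name))
    gains = steps[steps > 0].sum()
    losses = -steps[steps < 0].sum()
    return float(min(gains, losses))

for name in ('jet', 'magma_r'):
    print(name, round(backtrack(name), 3))
```

A value near zero means the colormap gets steadily brighter or steadily darker. The rainbow colormap brightens toward the middle and then darkens again, so equal steps in precipitation do not look like equal steps in color.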

**Rainbow color scales are also problematic for color-blind viewers, as you can see in the simulation below of what some color-blind viewers would experience when looking at different color scales.**

![](assets/M1_fig_colorblind_scales.jpg)

*Image from [Light & Bartlein (2004)](https://eos.org/features/the-end-of-the-rainbow-color-schemes-for-improved-data-graphics)*

**Fortunately, most of the colormaps available in `xarray` and `matplotlib` are perceptually linear, and many of those remain consistent for color-blind viewers.** You can see some of these colormaps in more detail at [Dr. Cynthia Brewer's website.](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3)

**So, how much rain fell around Algiers on this day?**

In [None]:
precip.sel(lon = 3.059, lat = 36.754, method = 'nearest').values

That's a good amount of rain for this region in a single day, but from the map above it's clear that there are areas near Algiers that received more rain. What's the maximum rainfall total along the coast near Algiers?

In [None]:
precip.sel(lon = slice(3, 3.2), lat = slice(36.5, 36.8)).max()

Just a reminder: `xarray` DataArrays support arithmetic in the same way NumPy arrays do, so we can do math on them as if they were just numbers. A conversion from mm to meters is easy:

In [None]:
precip.sel(lon = slice(3, 3.2), lat = slice(36.5, 36.8)).max() / 1000
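
Note that the result of `max()` is still an `xarray` DataArray, not a plain number. The sketch below uses a tiny synthetic array (the values are made up) to show that arithmetic works on it directly and that `item()` extracts an ordinary Python number when you need one.

```python
import xarray as xr

# Synthetic rainfall totals (mm) on a 2 x 2 grid; values are made up
demo = xr.DataArray(
    [[12.5, 48.0], [3.0, 27.5]],
    coords = {'lat': [36.55, 36.65], 'lon': [3.05, 3.15]},
    dims = ('lat', 'lon'))

peak_mm = demo.max()     # Still a DataArray, with zero dimensions
peak_m = peak_mm / 1000  # Arithmetic works directly on the DataArray

print(type(peak_m).__name__)
print(peak_m.item())     # A plain Python float
```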

## Merging multiple time-series datasets

Each IMERG granule in this collection is a single, daily precipitation estimate.

In [None]:
files

Earlier, we developed a temperature time series by opening each MERRA-2 file in a `for` loop. We can get a precipitation time series the same way.

In [None]:
# This may take half a minute
datasets = []
for filename in files:
    ds = xr.open_dataset(filename)
    datasets.append(ds['precipitation'])

In [None]:
len(datasets)

In the MERRA-2 example, we extracted the temperature value at a specific location before appending it to a list. But `xarray` is capable of representing multiple time steps in a single dataset. Is there a way to merge adjacent time steps together?

We can do just that with the `concat()` function in `xarray`. We specify that the multiple datasets we've opened should be joined along the `'time'` dimension.

In [None]:
time_series = xr.concat(datasets, dim = 'time')
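
To see what `concat()` does without downloading anything, here is a minimal sketch with six synthetic "daily" arrays, each holding a single time step on a tiny 2 x 2 grid. The grid coordinates and values are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

days = pd.date_range('2023-05-25', '2023-05-30', freq = 'D')

# One small array per day; day i is filled with the value i
daily = [
    xr.DataArray(
        np.full((1, 2, 2), float(i)),
        coords = {'time': [day], 'lat': [36.5, 36.6], 'lon': [3.0, 3.1]},
        dims = ('time', 'lat', 'lon'))
    for i, day in enumerate(days)
]

# Join the single-day arrays end-to-end along the time dimension
combined = xr.concat(daily, dim = 'time')
print(combined.sizes)

# A point selection now returns one value per day
point = combined.sel(lon = 3.059, lat = 36.754, method = 'nearest')
print(point.values)
```

The spatial dimensions are unchanged; only the `time` dimension grows, from 1 in each input to 6 in the result.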

Now that we have all the time steps in a single dataset, label-based indexing with the `sel()` method returns a point estimate for each time step (6 days in total).

In [None]:
time_series.sel(lon = 3.059, lat = 36.754, method = 'nearest')

In [None]:
time_series.sel(lon = 3.059, lat = 36.754, method = 'nearest').plot()
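
Once you have a point time series, a few reductions along the `time` dimension answer common questions, such as the total over the period or the wettest day. The sketch below uses a synthetic six-day series; the values are made up and are not the IMERG estimates for Algiers.

```python
import pandas as pd
import xarray as xr

# Synthetic daily precipitation (mm) at one location; values are made up
series = xr.DataArray(
    [0.0, 4.5, 21.0, 9.5, 1.0, 0.0],
    coords = {'time': pd.date_range('2023-05-25', periods = 6, freq = 'D')},
    dims = 'time')

total = series.sum(dim = 'time')       # Accumulated precipitation
running = series.cumsum(dim = 'time')  # Running total, day by day
wettest = series.idxmax(dim = 'time')  # Date of the largest daily value

print(float(total))
print(running.values)
print(wettest.values)
```

The same calls should work on the point selection from `time_series` above, assuming its time dimension is named `'time'`.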