Learning Objectives

- Read MODIS data in HDF4 format into Python using open source packages (xarray).
- Extract metadata from HDF4 files.
- Plot data extracted from HDF4 files.

In this lesson, you will learn how to **open a MODIS HDF4 format file using xarray.**

In [2]:
# Import packages
import os
import warnings

import matplotlib.pyplot as plt
import numpy.ma as ma
import xarray as xr
import rioxarray as rxr
from shapely.geometry import mapping, box
import geopandas as gpd
import earthpy as et
import earthpy.plot as ep

warnings.simplefilter('ignore')

# Get the MODIS data
et.data.get_data('cold-springs-modis-h4')

# This download is for the fire boundary
et.data.get_data('cold-springs-fire')

# Set working directory
os.chdir(os.path.join(et.io.HOME,
                      'earth-analytics',
                      'data'))

Downloading from https://ndownloader.figshare.com/files/10960112
Extracted output to C:\Users\34639\earth-analytics\data\cold-springs-modis-h4\.


## Hierarchical Data Formats - HDF4 - EOS in Python
You can use rioxarray, which builds on rasterio, to open HDF4 data. Both tools wrap around GDAL and simplify the code needed to open your HDF4 data.



To begin, create a path to your HDF4 file.

In [3]:
# Create a path to the pre-fire MODIS h4 data
modis_pre_path = os.path.join("cold-springs-modis-h4",
                              "07_july_2016",
                              "MOD09GA.A2016189.h09v05.006.2016191073856.hdf")
modis_pre_path

'cold-springs-modis-h4\\07_july_2016\\MOD09GA.A2016189.h09v05.006.2016191073856.hdf'
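
The file name itself carries metadata. As a minimal sketch, assuming the file follows the usual MODIS naming pattern (product, "A" plus year and day of year, tile, collection, production timestamp), you could pull those parts out with the standard library. The helper name parse_modis_filename is made up for this example and is not part of any package used in this lesson.

```python
from datetime import datetime


def parse_modis_filename(path):
    """Split a MODIS file name into its dot-separated parts (hypothetical helper)."""
    # Accept both "/" and "\\" separators so a Windows-style path parses anywhere
    file_name = path.replace("\\", "/").split("/")[-1]
    product, acquired, tile, collection, produced, _ext = file_name.split(".")
    return {
        "product": product,
        # "A2016189" -> year 2016, day of year 189
        "acquired": datetime.strptime(acquired[1:], "%Y%j").date(),
        "tile": tile,
        "collection": collection,
        # "2016191073856" -> year, day of year, hour, minute, second
        "produced": datetime.strptime(produced, "%Y%j%H%M%S"),
    }


modis_info = parse_modis_filename(
    "cold-springs-modis-h4\\07_july_2016\\"
    "MOD09GA.A2016189.h09v05.006.2016191073856.hdf")
print(modis_info["acquired"])  # 2016-07-07
```

If a file does not follow this pattern, the unpacking step raises a ValueError, which is a reasonable signal to inspect the name by hand.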

## Open HDF4 Files Using Open Source Python and Xarray
HDF files are hierarchical and self-describing (the metadata is contained within the data). Because the data are hierarchical, you will have to loop through the main dataset and the subdatasets nested within the main dataset to access the reflectance data (the bands) and the QA layers.

Below you open the HDF4 file. Notice that rioxarray returns a list rather than a single xarray object. Within that list are two xarray objects representing the two groups in the HDF4 file.

In [4]:
# Open data with rioxarray
modis_pre = rxr.open_rasterio(modis_pre_path, masked=True)
type(modis_pre)

list

In [8]:
#modis_pre

The first object returned in the list contains all of the quality control layers. Notice that each layer is stored as a data variable.

In [5]:
# This is just a data exploration step
modis_pre_qa = modis_pre[0]
modis_pre_qa

**You can access a data variable in much the same way that you would access a column in a pandas DataFrame, using the ["variable-name-here"] syntax**.

In [10]:
modis_pre_qa["granule_pnt_1"]

The second element in the list contains the reflectance data. This is the data that you will want to use for your analysis.

In [11]:
# Reflectance data
modis_pre_bands = modis_pre[1]
modis_pre_bands
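
Picking elements by position works for this file, but it relies on the order of the list. As a rough sketch, you could instead pick the element that contains a variable you know you need. The helper name find_dataset_with is hypothetical, and plain dictionaries stand in for the two xarray Datasets so that the example runs without the data.

```python
def find_dataset_with(datasets, variable_name):
    """Return the first dataset that contains variable_name (hypothetical helper)."""
    for dataset in datasets:
        # `in` checks variable names on an xarray Dataset and keys on a dict
        if variable_name in dataset:
            return dataset
    raise KeyError(f"No dataset contains {variable_name!r}")


# Plain dicts stand in for the two xarray Datasets in the list
stand_in_list = [{"granule_pnt_1": "qa layer"},
                 {"sur_refl_b01_1": "band 1", "sur_refl_b02_1": "band 2"}]
bands = find_dataset_with(stand_in_list, "sur_refl_b01_1")
print(sorted(bands))  # ['sur_refl_b01_1', 'sur_refl_b02_1']
```

With the real data, the same call would take the list returned by rioxarray in place of stand_in_list.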

## Subset Data By Group or Variable
If you need to open the entire dataset, you can follow the steps above. Alternatively, you can open only specific groups, or even individual layers / variables, using the **group=** and **variable=** parameters.

There are a few ways to get the group names. One manual way is to use an HDF4 viewing tool (or something like Panoply) to view the groups. You could also use something like gdalinfo or rasterio to loop through groups and subgroups.

The layers with this pattern in the name:

**sur_refl_b01_1**

are the bands which contain surface reflectance data.

- sur_refl_b01_1: MODIS Band One
- sur_refl_b02_1: MODIS Band Two

etc.

Notice that there are some other layers in the file as well, including the state_1km layer, which contains the QA (cloud and quality assurance) information.

In [12]:
# Use rasterio to print all of the subdataset names in the data
# Here you can see the group names: MODIS_Grid_500m_2D & MODIS_Grid_1km_2D
import rasterio as rio
with rio.open(modis_pre_path) as groups:
    for name in groups.subdatasets:
        print(name)

HDF4_EOS:EOS_GRID:cold-springs-modis-h4\07_july_2016\MOD09GA.A2016189.h09v05.006.2016191073856.hdf:MODIS_Grid_1km_2D:num_observations_1km
HDF4_EOS:EOS_GRID:cold-springs-modis-h4\07_july_2016\MOD09GA.A2016189.h09v05.006.2016191073856.hdf:MODIS_Grid_1km_2D:granule_pnt_1
HDF4_EOS:EOS_GRID:cold-springs-modis-h4\07_july_2016\MOD09GA.A2016189.h09v05.006.2016191073856.hdf:MODIS_Grid_500m_2D:num_observations_500m
HDF4_EOS:EOS_GRID:cold-springs-modis-h4\07_july_2016\MOD09GA.A2016189.h09v05.006.2016191073856.hdf:MODIS_Grid_500m_2D:sur_refl_b01_1
HDF4_EOS:EOS_GRID:cold-springs-modis-h4\07_july_2016\MOD09GA.A2016189.h09v05.006.2016191073856.hdf:MODIS_Grid_500m_2D:sur_refl_b02_1
HDF4_EOS:EOS_GRID:cold-springs-modis-h4\07_july_2016\MOD09GA.A2016189.h09v05.006.2016191073856.hdf:MODIS_Grid_500m_2D:sur_refl_b03_1
HDF4_EOS:EOS_GRID:cold-springs-modis-h4\07_july_2016\MOD09GA.A2016189.h09v05.006.2016191073856.hdf:MODIS_Grid_500m_2D:sur_refl_b04_1
HDF4_EOS:EOS_GRID:cold-springs-modis-h4\07_july_2016\MOD09G
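
Each name printed above has the form prefix:file path:group:variable. As a minimal sketch, you could split those names to list the variables in each group. The helper name split_subdataset_name is made up for this example and is not part of rasterio or GDAL. It also assumes that the path may be wrapped in double quotes, which some GDAL output does.

```python
def split_subdataset_name(name):
    """Return (file_path, group, variable) from an HDF4-EOS subdataset name."""
    # Split from the right: a Windows path may itself contain ":" (e.g. "C:\\...")
    prefix, group, variable = name.rsplit(":", 2)
    # Drop the leading "HDF4_EOS:EOS_GRID:" prefix to recover the file path
    file_path = prefix.split(":", 2)[2].strip('"')
    return file_path, group, variable


# Two of the names printed above, written out as Python strings
example_names = [
    "HDF4_EOS:EOS_GRID:cold-springs-modis-h4\\07_july_2016\\"
    "MOD09GA.A2016189.h09v05.006.2016191073856.hdf:"
    "MODIS_Grid_1km_2D:granule_pnt_1",
    "HDF4_EOS:EOS_GRID:cold-springs-modis-h4\\07_july_2016\\"
    "MOD09GA.A2016189.h09v05.006.2016191073856.hdf:"
    "MODIS_Grid_500m_2D:sur_refl_b01_1",
]

# Collect the variable names found under each group
layers_by_group = {}
for name in example_names:
    _path, group, variable = split_subdataset_name(name)
    layers_by_group.setdefault(group, []).append(variable)

print(layers_by_group)
```

The group names collected this way are the values you can pass to the **group=** parameter.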

Below you open the data, subsetting first by

1. group and then
2. variable names

In [13]:
# Subset by group only - Notice you have all bands in the returned object
rxr.open_rasterio(modis_pre_path,
                  masked=True,
                  group="MODIS_Grid_500m_2D").squeeze()

Subset by a list of variable names.

In [15]:
# Open just the bands that you want to process
desired_bands = ["sur_refl_b01_1",
                 "sur_refl_b02_1",
                 "sur_refl_b03_1",
                 "sur_refl_b04_1",
                 "sur_refl_b07_1"]
# Notice that here, you get a single xarray object with just the bands that
# you want to work with
modis_pre_bands = rxr.open_rasterio(modis_pre_path, variable=desired_bands).squeeze()
modis_pre_bands
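
If you need many bands, typing each name invites typos. A small sketch, assuming the sur_refl_bNN_1 naming pattern shown earlier holds for every band you want, builds the same list from band numbers:

```python
# Band numbers to keep; 1-4 and 7 match the list written out by hand above
band_numbers = [1, 2, 3, 4, 7]

# Zero-pad each number to two digits to match names such as sur_refl_b01_1
desired_bands = [f"sur_refl_b{number:02d}_1" for number in band_numbers]
print(desired_bands)
```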

In [16]:
# View the nodata value
modis_pre_bands.sur_refl_b01_1.rio.nodata

-28672
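
The value above is the fill value used for pixels with no data. As a hedged sketch of what you might do with it, the example below masks that value in a small made-up array and then rescales the remaining values. The array contents are hypothetical, and the scale factor of 0.0001 is an assumption that you should check against the scale_factor attribute in your own file.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical 2 x 3 array standing in for one band of raw integer values
raw_band = np.array([[-28672, 1200, 850],
                     [430, -28672, 16000]], dtype="int16")

nodata = -28672        # fill value reported by .rio.nodata above
scale_factor = 0.0001  # assumption: confirm against your file's metadata

# Mask the fill value first so that it never enters the arithmetic
masked_band = ma.masked_equal(raw_band, nodata)
reflectance = masked_band * scale_factor

# Masked pixels are skipped when computing statistics
print(reflectance.mean())
```

Masking before scaling keeps the fill value from being treated as a very dark, but valid, reflectance.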