<center>
<table>
  <tr>
    <td><img src="http://www.nasa.gov/sites/all/themes/custom/nasatwo/images/nasa-logo.svg" width="100"/> </td>
     <td><img src="https://github.com/astg606/py_materials/blob/master/logos/ASTG_logo.png?raw=true" width="80"/> </td>
     <td> <img src="https://www.nccs.nasa.gov/sites/default/files/NCCS_Logo_0.png" width="130"/> </td>
    </tr>
</table>
</center>

        
<center>
<h1><font color= "blue" size="+3">ASTG Python Courses</font></h1>
</center>

---

<center><h1> <font color="red">Reading MODIS HDF Files Using pyhdf</font></h1></center>

## <font color="red">Primary References/Resources</font>

- [Moderate Resolution Imaging Spectrometer (MODIS)](https://modis.gsfc.nasa.gov/data/)
- [HDF-EOS Comprehensive Examples page](http://hdfeos.org/zoo/)
- [How to read a MODIS HDF4 file using python and pyhdf ?](https://moonbooks.org/Articles/How-to-read-a-MODIS-HDF-file-using-python-/)
- [SD (scientific dataset) API (pyhdf.SD)](http://fhs.github.io/pyhdf/modules/SD.html) 

### Import Statements:

In [None]:
import os
import pprint
import glob

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import hvplot.xarray
from cartopy import crs as ccrs
import cartopy.feature as cfeature
import cartopy.io.shapereader as shapereader
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter

In [None]:
import numpy as np
import xarray as xr

In [None]:
from pyhdf.SD import SD
from pyhdf.SD import SDS
from pyhdf.SD import SDC
from pyhdf.SD import SDim
from pyhdf.SD import SDAttr

In [None]:
# Toggles off alphabetical sorting
pprint.sorted = lambda x, key=None:x

## <font color="red">MODIS File Naming Conventions</font>

- [MODIS Naming Conventions](https://lpdaac.usgs.gov/data/get-started-data/collection-overview/missions/modis-overview/#:~:text=MODIS%20Filenames,A2019159.)
- [MODIS Level-2 Hierarchical Data Format (HDF)](https://modis-images.gsfc.nasa.gov/MOD07_L2/filename.html)
- [MODIS/VIIRS Land Product Subsets](https://modis.ornl.gov/documentation.html)


MODIS filenames follow a naming convention which gives useful information regarding the specific product. The filename `MOD09A1.A2006001.h08v05.005.2006012234657.hdf` indicates:

- `MOD09A1`: Product Short Name
- `A2006001`: Julian Date of Acquisition (A-YYYYDDD)
- `h08v05`: Tile Identifier (horizontalXXverticalYY)
- `005`: Collection Version
- `2006012234657`: Julian Date of Production (YYYYDDDHHMMSS)
- `hdf`: Data Format (HDF-EOS)
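
The fields listed above can be pulled apart programmatically. The snippet below is a minimal sketch, not part of any MODIS library: the helper name `parse_modis_filename` and the dictionary keys are made up for illustration, and it assumes a tiled product whose name has exactly six dot-separated fields.

```python
from datetime import datetime

def parse_modis_filename(filename):
    """Split a tiled MODIS filename into labeled parts (hypothetical helper)."""
    product, acquired, tile, version, produced, extension = filename.split(".")
    return {
        "product": product,
        # Drop the leading "A", then read the year and the day of the year
        "acquisition_date": datetime.strptime(acquired[1:], "%Y%j").date(),
        "tile": tile,
        "version": version,
        # Year, day of the year, then hours, minutes and seconds
        "production_time": datetime.strptime(produced, "%Y%j%H%M%S"),
        "format": extension,
    }

info = parse_modis_filename("MOD09A1.A2006001.h08v05.005.2006012234657.hdf")
print(info["acquisition_date"], info["tile"], info["production_time"])
```

Swath products such as the `MOD021KM` and `MOD03` files used below carry an acquisition time (HHMM) in place of the tile identifier, so the `tile` entry would hold that time instead.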

## <font color="red">Accessing Sample HDF4 Data Files</font>

Directory where the MODIS files are located:

In [None]:
#data_dir = "/Users/jkouatch/myTasks/PythonTraining/ASTG606/Materials/sat_data/MODIS_Data/"
data_dir = "/tljh-data/sat_data/MODIS_Data"

Full path to the file names:

In [None]:
file_name = os.path.join(data_dir, "MOD021KM.A2015013.1240.006.2015014140954.hdf")
geo_file_name = os.path.join(data_dir, "MOD03.A2015013.1240.006.2015013194359.hdf")
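
The data file and the geolocation file must describe the same granule, which shows up as identical acquisition date, acquisition time and collection fields in the two names. As a quick sanity check, one could compare those fields. This is only a sketch: `granule_key` is a hypothetical helper, and it assumes the dot-separated swath naming seen in the two files above.

```python
import os

def granule_key(path):
    """Return (acquisition date, acquisition time, collection) from a swath filename."""
    parts = os.path.basename(path).split(".")
    return tuple(parts[1:4])

data_name = "MOD021KM.A2015013.1240.006.2015014140954.hdf"
geo_name = "MOD03.A2015013.1240.006.2015013194359.hdf"

# The production timestamps differ, but the acquisition fields agree
same_granule = granule_key(data_name) == granule_key(geo_name)
print(granule_key(data_name), same_granule)
```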

Name of the field of interest:

In [None]:
field_name = "EV_Band26"

### <font color="blue"> Opening the Files</font>

Opening files for reading:

In [None]:
fid = SD(file_name, SDC.READ)
geo_fid = SD(geo_file_name, SDC.READ)

Basic information on the files:

- The first number indicates the number of datasets in the file (not to be confused with xarray datasets).
- The second number indicates the number of global attributes attached to the file.

In [None]:
fid.info()

In [None]:
geo_fid.info()

#### File Attributes

- We can access the file attributes which hold important global metadata.

In [None]:
file_attrs = fid.attributes()
pprint.pprint(file_attrs)

We can also access the datasets' names and basic information such as shape and dimension labels.

In [None]:
file_dts = fid.datasets()
pprint.pprint(file_dts)

In [None]:
for index, name in enumerate(file_dts.keys(), start=1):
    print(index, name)

### <font color="blue">Data Extraction as NumPy Arrays</font>

Let's assume that we want to extract data from the field `EV_Band26`.

The `select()` method from the `SD` class allows us to extract a dataset (object) given its name or index number.

In [None]:
sample_ds = fid.select(field_name)

Get basic information from the dataset:

- The `info()` function in the `SDS` class allows us to get the dataset name, rank, dimension lengths, data type, and number of attributes.

In [None]:
sample_ds.info()

List the dataset attributes:

In [None]:
attrs = sample_ds.attributes()
attrs

#### Extract the data

- We can retrieve and store the data itself as a NumPy array using the `get()` function.

In [None]:
sample_data = sample_ds.get()

Confirm that the data has been stored as a NumPy array.

In [None]:
print("Dataset Class Type: ", type(sample_data))

Just like any NumPy array, we can get the shape and dtype.

In [None]:
sample_data.shape

In [None]:
sample_data.dtype

We need to change the type from integer to float so that invalid values can later be set to NaN:

In [None]:
sample_data = sample_data.astype(np.double)
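
The conversion matters because integer arrays have no way to store NaN. The short sketch below uses made-up counts (not values read from the MODIS file) to show that NumPy rejects the assignment on an integer array and accepts it on the float copy returned by `astype()`.

```python
import numpy as np

counts = np.array([100, 200, 65535], dtype=np.uint16)  # made-up raw counts

try:
    counts[2] = np.nan           # integer arrays cannot represent NaN
    nan_fits_in_integers = True
except ValueError:
    nan_fits_in_integers = False

as_float = counts.astype(np.double)  # astype() returns a converted copy
as_float[2] = np.nan                 # works on the float array
print(nan_fits_in_integers, as_float)
```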

### Get the dataset attributes

In [None]:
attrs = sample_ds.attributes(full=1)

long_name = attrs["long_name"][0]
add_offset = attrs["radiance_offsets"][0]
_FillValue = attrs["_FillValue"][0]
scale_factor = attrs["radiance_scales"][0]       
valid_min = attrs["valid_range"][0][0]        
valid_max = attrs["valid_range"][0][1]        
units = attrs["radiance_units"][0]

### Use the attributes to restore the data

In [None]:
def restore_data(data, scale_factor, add_offset,  _FillValue, 
                 valid_min, valid_max):
    """
       Use the attributes to:
        1- Flag the values outside the valid range
        2- Flag the fill values
        3- Apply the offset and scale to the data
       Returns a masked array; the input array is left unchanged.
    """
    # Work on a float copy so that the caller's array is not modified
    data = np.array(data, dtype=np.double)
    invalid = np.logical_or(data > valid_max, data < valid_min)
    invalid = np.logical_or(invalid, data == _FillValue)
    data[invalid] = np.nan
    data = (data - add_offset) * scale_factor 
    data = np.ma.masked_array(data, np.isnan(data))
    return data
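
To see what these three steps do, here is a small worked sketch on made-up numbers (the counts, valid range, scale and offset below are chosen only to keep the arithmetic simple and are not taken from the MODIS file). It uses `np.ma.masked_where` directly instead of going through NaN as `restore_data` does; the resulting mask is the same.

```python
import numpy as np

# Made-up values, chosen only to make the arithmetic easy to follow
raw = np.array([10, 20, 65535, 40000], dtype=np.uint16)
fill_value, valid_min, valid_max = 65535, 0, 32767
scale, offset = 0.5, 2.0

values = raw.astype(np.double)

# Steps 1 and 2: flag out-of-range values and fill values
invalid = (values < valid_min) | (values > valid_max) | (values == fill_value)

# Step 3: apply the offset and the scale, then hide the flagged entries
restored = np.ma.masked_where(invalid, (values - offset) * scale)
print(restored)
```

The first two counts become (10 - 2) * 0.5 = 4.0 and (20 - 2) * 0.5 = 9.0, while the fill value and the out-of-range count are masked.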

In [None]:
data = restore_data(sample_data, scale_factor, add_offset,  _FillValue, 
                 valid_min, valid_max)

In [None]:
print(sample_data.min(), sample_data.max())

In [None]:
print(data.min(), data.max())

### Read geolocation dataset from MOD03 product.

In [None]:
lats = geo_fid.select('Latitude').get()
lats.shape

In [None]:
lons = geo_fid.select('Longitude').get()
lons.shape

### Use Cartopy to plot the data

In [None]:
map_projection = ccrs.PlateCarree()
data_transform = ccrs.PlateCarree()

# Create the figure object with the given dimensions
subplot_kw = dict(projection=map_projection)
fig, ax = plt.subplots(1, 1,
                       figsize=(15, 9),
                       subplot_kw=subplot_kw)

#ax = fig.add_subplot(1, 1, 1, projection=map_projection)

# Map features
map_res = '50m'
ax.coastlines(resolution=map_res, linewidth=1.0)
ax.add_feature(cfeature.LAND, edgecolor='black', linewidth=1.0)
ax.add_feature(cfeature.BORDERS, edgecolor='black', linewidth=1.0)
lakes_res = cfeature.NaturalEarthFeature('physical', 'lakes', map_res)
ax.add_feature(lakes_res,edgecolor='black',facecolor='None',linewidth=1.0)
ax.add_feature(cfeature.NaturalEarthFeature('cultural', 'admin_1_states_provinces_lines',
                                      map_res, edgecolor='black', facecolor='None',
                                      linewidth=1.0))

#ax.set_extent([-180, 180, -90, 90], ccrs.PlateCarree())

im = ax.pcolormesh(lons, lats, data, transform=data_transform)

cbar = fig.colorbar(im, ax=ax,  orientation="horizontal", shrink=0.75)
cbar.ax.tick_params(labelsize=15)
cbar.set_label(units, labelpad=+1)

# ---> Ticks and labels
gl = ax.gridlines(crs=map_projection, draw_labels=True,
                  linewidth=2, color='gray', alpha=0.5, linestyle='--')
# Cartopy 0.18+ names (replace the older xlabels_top / ylabels_right)
gl.top_labels = False
gl.right_labels = False

lon_formatter = LongitudeFormatter(zero_direction_label=True)
lat_formatter = LatitudeFormatter()
ax.xaxis.set_major_formatter(lon_formatter)
ax.yaxis.set_major_formatter(lat_formatter)
ax.xaxis.label.set_size(20)
ax.yaxis.label.set_size(20)


## <font color="red">Application</font>

- Take a collection of MODIS data files.
- Loop over the files:
     - Select only the files whose coverage falls within a prescribed latitude range.
     - Plot the selected field on a global map.
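
The selection step can be sketched on its own before touching any file. In the snippet below, `within_latitude_range` is a hypothetical helper and the two small latitude arrays are made up; a granule is kept only when all of its latitudes lie inside the prescribed range.

```python
import numpy as np

def within_latitude_range(lat, min_lat, max_lat):
    """True when every latitude of the granule lies inside [min_lat, max_lat]."""
    return bool(np.min(lat) >= min_lat and np.max(lat) <= max_lat)

tropical = np.array([[-10.0, -5.0], [0.0, 12.5]])  # hypothetical granule
polar = np.array([[55.0, 62.0], [70.0, 75.0]])     # hypothetical granule

keep_tropical = within_latitude_range(tropical, -55.0, 60.0)
keep_polar = within_latitude_range(polar, -55.0, 60.0)
print(keep_tropical, keep_polar)
```

Note that a granule straddling one of the limits (like the second one) is rejected as a whole; it is not clipped to the range.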


Field of interest:

In [None]:
DATAFIELD_NAME = 'LST'

Latitude range of the area of interest:

In [None]:
min_lat = -55.0
max_lat =  60.0

Map Settings:

In [None]:
map_projection = ccrs.PlateCarree()
data_transform = ccrs.PlateCarree()

# Create the figure object with the given dimensions
subplot_kw = dict(projection=map_projection)
fig, ax = plt.subplots(1, 1,
                       figsize=(15, 9),
                       subplot_kw=subplot_kw)

#ax = fig.add_subplot(1, 1, 1, projection=map_projection)

# Map features
map_res = '50m'
ax.coastlines(resolution=map_res, linewidth=1.0)
ax.add_feature(cfeature.LAND, edgecolor='black', linewidth=1.0)
ax.add_feature(cfeature.BORDERS, edgecolor='black', linewidth=1.0)
lakes_res = cfeature.NaturalEarthFeature('physical', 'lakes', map_res)
ax.add_feature(lakes_res,edgecolor='black',facecolor='None',linewidth=1.0)
ax.add_feature(cfeature.NaturalEarthFeature('cultural', 
                                            'admin_1_states_provinces_lines',
                                      map_res, edgecolor='black', facecolor='None',
                                      linewidth=1.0))

ax.set_extent([-180, 180, -90, 90], ccrs.PlateCarree())

# Get the list of possible files to process
# (os.path.join works whether or not data_dir ends with a separator):

list_files = glob.glob(os.path.join(data_dir, 'MOD11_*.hdf'))

# Loop over the files and do the plot:

for fname in list_files:
    basename = os.path.basename(fname)
    # Find the MOD03 geolocation file with the same acquisition stamp
    geo_pattern = os.path.join(data_dir, 'MOD03.'+basename[9:26]+'*hdf')
    fname_GEO = glob.glob(geo_pattern)[0]
    hdf_geo = SD(fname_GEO, SDC.READ)
    latitude = hdf_geo.select('Latitude')
    lat = latitude[:,:]

    # Only consider the data files whose coverage lies entirely
    # within the prescribed latitude range
    if np.min(lat)<min_lat or np.max(lat)>max_lat:
        print(f"Rejecting the files: \n\t {fname} \n\t {fname_GEO}")
        continue

    longitude = hdf_geo.select('Longitude')
    lon = longitude[:,:]

    print(f"Processing the files: \n\t {fname} \n\t {fname_GEO}")

    # Read dataset.
    hdf = SD(fname, SDC.READ)

    data2D = hdf.select(DATAFIELD_NAME)
    data = data2D[:,:].astype(np.double)

    # Get the dataset attributes
    attrs = data2D.attributes(full=1)
    long_name = attrs["long_name"][0]
    add_offset = attrs["add_offset"][0]
    _FillValue = attrs["_FillValue"][0]
    scale_factor = attrs["scale_factor"][0]
    valid_min = attrs["valid_range"][0][0]
    valid_max = attrs["valid_range"][0][1]
    units = attrs["units"][0]

    data = restore_data(data, scale_factor, add_offset,  _FillValue,
                        valid_min, valid_max)

    # Plot the data
    im = ax.pcolormesh(lon, lat, data, transform=data_transform)
    
# Add a colorbar
cbar = fig.colorbar(im, ax=ax,  orientation="horizontal", shrink=0.75)
cbar.ax.tick_params(labelsize=15)
cbar.set_label(units, labelpad=+1)

# ---> Ticks and labels
gl = ax.gridlines(crs=map_projection, draw_labels=True,
                  linewidth=2, color='gray', alpha=0.5, linestyle='--')
# Cartopy 0.18+ names (replace the older xlabels_top / ylabels_right)
gl.top_labels = False
gl.right_labels = False

lon_formatter = LongitudeFormatter(zero_direction_label=True)
lat_formatter = LatitudeFormatter()
ax.xaxis.set_major_formatter(lon_formatter)
ax.yaxis.set_major_formatter(lat_formatter)
ax.xaxis.label.set_size(20)
ax.yaxis.label.set_size(20)