Skip to content


Repository files navigation

|   Author: David Lindell                                                 |
|   Copyright: MERS Lab 2015                                              | 

The enhanced soil moisture processing is based on the TU-Wien algorithm. 
Further information can be found in a report I wrote, as well as a
dissertation by Scipal. 

The following scripts can be called to complete the processing. Note that 
I also wrote a few C programs to speed up some of the more intensive processing.

make_landmasks.m - Creates the landmask files used in the processing

C Program Processing

Use the following C Programs to do the bulk of the processing:

sm_gen_time_series - This C program sifts through all of the msfa 
SIR files (currently I have only done NAm region, but the program should
be able to handle any region) and makes a time series NetCDF file. You can 
specify whether to parse the A images or B images. Note that this script
must be run on a machine with >40 G RAM. I currently have it set to run with
24 threads, too, so take that into consideration. The number of threads used
can be changed in one of the #define statements. The netCDF files are somewhat large
~18 GB, so I have the paths hardcoded to save them into my TEMP directory. Other users
using this script will need to alter the hardcoded links to point to their directories.
Note that the script will also move the necessary files from the FTP server into a 
temp directory where they are unzipped and read. They are not deleted after the read.

sm_gen_c0 - This C program reads in the time series NetCDF files generated by the 
time series C program and determines the values for C0-dry and C0-wet for every
pixel. The resulting images are saved in a netCDF file. As with the previous program, you
can specify whether you are calculating C0-dry or C0-wet from the A or B image timeseries.
The resulting netCDF files are saved to a hardcoded temp directory path.

MATLAB Processing

At this point you have to switch back to MATLAB for the next step of processing.
I decided to do this part in MATLAB because MATLAB has a nice interface for reading
NetCDF files and it has to do a somewhat complicated fourier series fit, 
which would be complicated to code up in C. Run the following programs in MATLAB to
generate netCDF files with time series topsoil moisture and Soil Water Index estimates:

sm_swi_jobs.m - This MATLAB program generates jobs to run sm_gen_swi.m and submits them 
to the queue. The number of jobs depends on the region being processed and depends on the
number of rows in the final image. A job is submitted for each row, and then sm_gen_swi.m
reads in part of the time series NetCDF files (both A and B time series are required as 
well as the NetCDF C0 files for the A images). At that point it iterates over each column
entry for a row and determines the parameterizations for sigma-0_wet and sigma-0_dry. Using
these, it calculates the topsoil moisture and soil water index. Sigma-0_dry, the topsoil
moisture, and the soil water index for the entire row are saved to one NetCDF file. Note
that each of these measurements is for a time series, so the NetCDF file has 4 dimensions:
Row, Column, Year, Day. Only the Sigma-0_dry is saved to the file (rather than both 
sigma-0_dry and sigma-0_wet) because Sigma-0_wet happens to be the same as C0-wet and is
a constant across the time series for each given pixel. 

C Processing

Okay, at this point you have to go back to a C program to create the final
topsoil moisture and Soil Water Index images. 

sm_gen_img- This program reads in the NetCDF files written by the MATLAB script into 
a few huge arrays. The arrays are then rearranged so that they can be easily indexed
by day and year. Then, netCDF files containing images of sigma0_dry, topsoil moisture, 
and soil water index, are written for each day at 2-day temporal resolution.
Note that this program should be run on a big compute node. The code previously only
wrote netCDF files with the topsoil moisture and soil water index and ran with about 
40 G of RAM. But then I updated it to also write out the sigma0_dry data and it
now uses around 70 G. So if you're processing huge images, you might need to watch
the memory-- our nodes max out at 96 G memory.

This concludes the processing steps to create the data.


I wrote a few helper programs with the validation, too. I chose to initially validate the data
using EUMETSAT's real time ASCAT soil moisture product (low resolution). This made 
the most sense for the initial validation, anyway. It turns out the real-time
algorithm is quite different than the Water Retrieval Package (WARP) algorithm
I'm trying to implement at high-resolution here. So I spoke with Sebastian Hahn
(who works with WARP) and obtained their data. 

For the real-time data, EUMETSAT packages their soil moisture product in a nasty ungridded
swath-based binary format similar to the l1b files. On the bright side, Richard made
a very nice l1b file reader which gets the basic gist of reading these soil moisture files.
So I started with his file reader and then added the required code so that it can also now
read the soil moisture product files. I don't believe I broke any of the l1b reading code, 
so it should still work with l1b files in theory-- but this hasn't been tested. It should
probably be tested and merged with our current reader at some point. 

Anyway, the sm_reader program uses the altered l1b reader to read in the soil moisture files
and it outputs a netCDF file with gridded images of the topsoil moisture, sigma0-wet, 
sigma0-dry, and the soil moisture error. These are all gridded at a 12.5 km resolution on 
a lambert projection (the same we use for the SIR files). Note that this means the sm_reader
only really works right now for the NAm region. Plus, the output images look kind of funny
until they are remapped onto the same grid that the current sir files use. 

To regrid the output images from sm_reader, I wrote a matlab script that loads in a sample
SIR file header from the NAm region and for every pixel it checks the latitude and longitude,
and gets the pixel x,y coordinates for that lat lon pair in the sm_reader output images.
There was some wonkiness with the regridding and the y coordinates started at 741 for some
reason. So I introduced a subtractive correction and the program now appears to correctly
regrid the data to our NAm sir map projection. Essentially I'm just regridding the low
resolution image to our high resolution grid. The resolution of the image is the same, 
but it's easier to compare it with the high resolution sir files. The matlab script
that does this is sm_validate.m. It also outputs some nice figures that I've used in my 

For the WARP data, I have a program that will read in all the data and output 
images for the entire time series. I have to do it this way because they store all
data for all years from a given geographic region in one NetCDF file. So you have to read
multiple files to get the image for one day. Anyway, the
C program 'sm_gen_warp_images' does reads all the files and regrids the images to the 
projections that our SIR files use. Using the resulting images, it's possible to run 
comparisons with the high-resolution data.

Other figures produced from the research can be generated by sm_view_c0.m. This script
also makes a corrective factor for c0_wet that is necessary for the processing.

A time series of images can be shown using the sm_view_images.m script.


Use this command to search through the output files to find rows that segfaulted in the
time series processing:

grep -rwnaE "(Segmentation fault)" *.out | sed 's/.* \(-r\) \([0-9]*\).*/\2/' | sort -g


Soil Moisture Processing for the MERS Lab






No releases published


No packages published