Soil Moisture Processing for the MERS Lab
/-------------------------------------------------------------------------\
|             ASCAT HIGH RESOLUTION SOIL MOISTURE PROCESSING              |
|             Author: David Lindell                                       |
|             Copyright: MERS Lab 2015                                    |
\-------------------------------------------------------------------------/

The enhanced soil moisture processing is based on the TU-Wien algorithm.
Further information can be found in a report I wrote, as well as in a
dissertation by Scipal. The following scripts can be called to complete the
processing. Note that I also wrote a few C programs to speed up some of the
more intensive processing.

make_landmasks.m - Creates the landmask files used in the processing

---- C Program Processing ----

Use the following C programs to do the bulk of the processing:

sm_gen_time_series - This C program sifts through all of the msfa SIR files
(currently I have only done the NAm region, but the program should be able
to handle any region) and makes a time series NetCDF file. You can specify
whether to parse the A images or the B images. Note that this program must
be run on a machine with >40 GB of RAM. I currently have it set to run with
24 threads, too, so take that into consideration; the number of threads can
be changed in one of the #define statements. The NetCDF files are somewhat
large (~18 GB), so I have the paths hardcoded to save them into my TEMP
directory. Other users will need to alter the hardcoded paths to point to
their own directories. Note that the program will also move the necessary
files from the FTP server into a temp directory, where they are unzipped
and read. They are not deleted after the read.

sm_gen_c0 - This C program reads in the time series NetCDF files generated
by sm_gen_time_series and determines the values of C0-dry and C0-wet for
every pixel. The resulting images are saved in a NetCDF file. As with the
previous program, you can specify whether you are calculating C0-dry or
C0-wet from the A or B image time series.
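As a rough illustration of the per-pixel C0 step, here is a minimal sketch in C. It assumes C0-dry and C0-wet are taken as low and high percentiles (5th/95th here) of each pixel's sigma-0 time series, skipping NaN fill values; the percentile choice and function names are illustrative assumptions, not sm_gen_c0's actual parameters.

```c
#include <math.h>
#include <stdlib.h>

/* Compare doubles for qsort. */
static int cmp_double(const void *a, const void *b)
{
    double da = *(const double *)a, db = *(const double *)b;
    return (da > db) - (da < db);
}

/* Return the value at the given percentile (0-100) of the series,
 * ignoring NaN fill values. Returns NAN if no valid samples exist. */
double percentile(const double *series, size_t n, double pct)
{
    double *valid = malloc(n * sizeof *valid);
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (!isnan(series[i]))
            valid[m++] = series[i];
    if (m == 0) { free(valid); return NAN; }
    qsort(valid, m, sizeof *valid, cmp_double);
    size_t idx = (size_t)(pct / 100.0 * (m - 1) + 0.5);  /* nearest rank */
    double v = valid[idx];
    free(valid);
    return v;
}

/* Hypothetical C0 estimate for one pixel: dry and wet backscatter
 * references taken from the extremes of the time series. */
void estimate_c0(const double *series, size_t n, double *c0_dry, double *c0_wet)
{
    *c0_dry = percentile(series, n, 5.0);   /* driest-end sigma-0 */
    *c0_wet = percentile(series, n, 95.0);  /* wettest-end sigma-0 */
}
```

In the real program this loop runs over every pixel of the time series NetCDF file rather than a single array.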
The resulting NetCDF files are saved to a hardcoded temp directory path.

---- MATLAB Processing ----

At this point you have to switch back to MATLAB for the next step of
processing. I did this part in MATLAB because it has a nice interface for
reading NetCDF files, and this step requires a somewhat complicated Fourier
series fit that would be tedious to code up in C. Run the following
programs in MATLAB to generate NetCDF files with time series topsoil
moisture and Soil Water Index estimates:

sm_swi_jobs.m - This MATLAB program generates jobs to run sm_gen_swi.m and
submits them to the queue. The number of jobs depends on the region being
processed, specifically on the number of rows in the final image: a job is
submitted for each row. sm_gen_swi.m then reads in part of the time series
NetCDF files (both the A and B time series are required, as well as the
NetCDF C0 files for the A images). It iterates over each column entry in
the row and determines the parameterizations for sigma-0_wet and
sigma-0_dry. Using these, it calculates the topsoil moisture and Soil Water
Index. Sigma-0_dry, the topsoil moisture, and the Soil Water Index for the
entire row are saved to one NetCDF file. Note that each of these
measurements is a time series, so the NetCDF file has 4 dimensions: Row,
Column, Year, Day. Only sigma-0_dry is saved to the file (rather than both
sigma-0_dry and sigma-0_wet) because sigma-0_wet happens to equal C0-wet,
which is constant across the time series for each given pixel.

---- C Processing ----

At this point you have to go back to a C program to create the final
topsoil moisture and Soil Water Index images.

sm_gen_img - This program reads the NetCDF files written by the MATLAB
script into a few huge arrays. The arrays are then rearranged so that they
can be easily indexed by day and year.
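The rearrangement can be sketched like this: the per-row files are dimensioned (row, col, year, day), but the image writer wants to pull out one full image per (year, day), so transposing into (year, day, row, col) order makes each daily image a contiguous slab. The dimension ordering and sizes here are illustrative assumptions, not sm_gen_img's actual layout.

```c
#include <stddef.h>

/* Flat index into a 4-D array with dimension sizes nb, nc, nd
 * for the last three axes (row-major order). */
static size_t idx4(size_t a, size_t b, size_t c, size_t d,
                   size_t nb, size_t nc, size_t nd)
{
    return ((a * nb + b) * nc + c) * nd + d;
}

/* Transpose (row, col, year, day) -> (year, day, row, col) so that
 * each (year, day) image is a contiguous nrow x ncol block. */
void rearrange(const double *in, double *out,
               size_t nrow, size_t ncol, size_t nyear, size_t nday)
{
    for (size_t r = 0; r < nrow; r++)
        for (size_t c = 0; c < ncol; c++)
            for (size_t y = 0; y < nyear; y++)
                for (size_t d = 0; d < nday; d++)
                    out[idx4(y, d, r, c, nday, nrow, ncol)] =
                        in[idx4(r, c, y, d, ncol, nyear, nday)];
}
```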
Then, NetCDF files containing images of sigma0_dry, topsoil moisture, and
Soil Water Index are written for each day at 2-day temporal resolution.
Note that this program should be run on a big compute node. The code
previously wrote only the topsoil moisture and Soil Water Index NetCDF
files and ran with about 40 GB of RAM, but I updated it to also write out
the sigma0_dry data and it now uses around 70 GB. So if you're processing
huge images, watch the memory usage; our nodes max out at 96 GB of memory.

This concludes the processing steps to create the data.

---- Validation ----

I wrote a few helper programs for the validation, too. I chose to initially
validate the data using EUMETSAT's real-time ASCAT soil moisture product
(low resolution), which made the most sense for an initial validation. It
turns out the real-time algorithm is quite different from the Water
Retrieval Package (WARP) algorithm I'm trying to implement at high
resolution here, so I spoke with Sebastian Hahn (who works with WARP) and
obtained their data.

For the real-time data, EUMETSAT packages their soil moisture product in a
nasty ungridded swath-based binary format similar to the l1b files. On the
bright side, Richard made a very nice l1b file reader which handles the
basics of reading these soil moisture files. I started with his file reader
and added the code required so that it can now also read the soil moisture
product files. I don't believe I broke any of the l1b reading code, so it
should still work with l1b files in theory, but this hasn't been tested. It
should probably be tested and merged with our current reader at some point.
The sm_reader program uses the altered l1b reader to read in the soil
moisture files and outputs a NetCDF file with gridded images of the topsoil
moisture, sigma0-wet, sigma0-dry, and the soil moisture error.
These are all gridded at 12.5 km resolution on a Lambert projection (the
same one we use for the SIR files). Note that this means sm_reader
currently only works for the NAm region. The output images also look kind
of funny until they are remapped onto the same grid the current SIR files
use.

To regrid the output images from sm_reader, I wrote a MATLAB script that
loads a sample SIR file header from the NAm region and, for every pixel,
checks the latitude and longitude and gets the pixel x,y coordinates for
that lat/lon pair in the sm_reader output images. There was some wonkiness
with the regridding: the y coordinates started at 741 for some reason, so I
introduced a subtractive correction, and the program now appears to
correctly regrid the data to our NAm SIR map projection. Essentially I'm
just regridding the low resolution image onto our high resolution grid. The
resolution of the image stays the same, but this makes it easier to compare
with the high resolution SIR files. The MATLAB script that does this is
sm_validate.m. It also outputs some nice figures that I've used in my
report.

For the WARP data, I have a program that reads in all the data and outputs
images for the entire time series. It has to work this way because they
store all data for all years from a given geographic region in one NetCDF
file, so you have to read multiple files to get the image for one day. The
C program sm_gen_warp_images reads all the files and regrids the images to
the projection our SIR files use. Using the resulting images, it's possible
to run comparisons with the high-resolution data.

Other figures produced from the research can be generated by sm_view_c0.m.
This script also makes a corrective factor for c0_wet that is necessary for
the processing. A time series of images can be shown using the
sm_view_images.m script.
-----

Use this command to search the job output files for rows that segfaulted
during the time series processing (the sed step extracts the row number
that follows the -r flag):

grep -rwnaE "(Segmentation fault)" *.out | sed 's/.* \(-r\) \([0-9]*\).*/\2/' | sort -g