Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Where is the grd format described? #1

Open
hsonne opened this issue Jul 22, 2020 · 7 comments
Open

Where is the grd format described? #1

hsonne opened this issue Jul 22, 2020 · 7 comments

Comments

@hsonne
Copy link
Member

hsonne commented Jul 22, 2020

Hi @robetatis,

  • Where is the format of the binary file described? Can you provide a link?
  • We need to know the number of grid cells, is this information (maybe along with other metadata) not somehow contained in the file?
  • If not, it is important to pass the "correct" values for nx and ny to the main function, those that "match" the actual input file. It should be checked somehow if the file and the values for nx and ny fit together.
  • I did a test with one of your test files:
# Read all raw bytes from the file
con <- file("SEVP_CLI_CHN_MERGE_CMP_PRE_HOUR_GRID_0.10-2008043000.grd", "rb")
x <- readBin(con, what = "raw", n = 1e7)
close(con)

# Number of bytes in the file
length(x) # 2464000

# Expected number of bytes (number of records + 1 first record skipped)
nx = 700
ny = 440
(nx * ny + 1) * 4 # 1232004

There are 2464000 bytes in the file but we are only reading 1232004. Are nx and ny correct for the given file? If yes, what is contained in the remaining bytes of the file? The actual number of bytes is almost exactly twice the number of bytes that we actually read. Are the values maybe coded in 8 bytes instead of 4 bytes?

@robetatis
Copy link
Contributor

Hi @hsonne

  • Have a look at this site for a general description of GrADS formatting possibilities, and here for the GrADS software itself. Mind, however, that this is just a general description and we have no information about the specific formatting used by our data provider. The solution for this specific data set was trial-and-error, comparing our resulting rasters with the graphical output of the GrADS software. This way it was possible to make sure the spatial distribution and value range of the resulting precipitation fields were correct (excluding small shifts of a couple of pixels). In the next development step, I will try to export a few rasters (tif or ascii) from GrADS, read them into R and overlay them with our output to do a final check.

  • nx and ny are described in a configuration file and the documentation made available by the data provider, so we are quite sure of their values.

  • If our data provider used the default GrADS configuration (direct access binary file, i.e., no FORTRAN record headers), each record in the byte stream returned byreadBin(con, what = "raw", n = 1e7) corresponds to a pixel with an actual rainfall depth. However, if they used headers, then maybe these are the missing bytes, which would also explain why my current version with readBin(con=grdconn, what="numeric", n=grdsize, size=4) produces a correct precipitation field (as I said, excluding a small shift due to the fact that I skipped the first and last records)

  • I also think if we are going for a more general version, we should ask the user for the input parameters for readBin.

So, let's keep at it =)

Thanks for your input!!

@hsonne
Copy link
Member Author

hsonne commented Jul 22, 2020

Thanks, @robetatis.

I just googled and found this article about a package that seems to do what you are doing (reading GrADS files). Have you already read this?
https://www.r-bloggers.com/readgrads-%E2%80%93-an-r-package-to-read-and-manipulate-grads-data/

The package can be installed from a bitbucket repository as follows:

# install.packages("remotes")
remotes::install_bitbucket("paulhiemstra/readgrads@default")

The package seems to consider the metadata file that comes along with the binary data file. I propose that you try to read your data (you need the corresponding metadata files *.ctl, do you have them?) with the functions from that package and compare these functions with your function. We could then contact the author and try to contribute to his package instead of making a new package. What do you think?

@robetatis
Copy link
Contributor

Hi @hsonne

  • Yep. I know the package. Contributing to it is surely an option. But I guess we will (if we move forward with this) specialize in GrADS data sets for urban hydrology, so we'll have to discuss the best strategy.

  • Yes, I have the *.ctl file. Got it directly from the data provider. That's the configuration file I mentioned in my last comment.

  • For this first step, building our own small function for figuring out the formatting by trial-and-error seemed more practical than using the existing package's functions, given the lack of information.

@robetatis
Copy link
Contributor

Hi @hsonne

There might be an error in the way you construct the matrix in valuesToMatrix. I plotted the resulting raster and the data is upside down:
upsidedown

Also, the precipitation field seems quite different from the original data:
SEVP_CLI_CHN_MERGE_CMP_PRE_HOUR_GRID_0 10-2016112120

The output from my original (though potentially slower) version resembles much more the original data (I have yet to do an exact overlay to compute differences):
moreplausible

@robetatis
Copy link
Contributor

robetatis commented Jul 24, 2020

Hi @hsonne

I managed to read the raw binary data into GrADS and export it as a GeoTiff with the following commands (GrADS command line):

open ...\pFromGrADS\test\conf.ctl (full path to configuration file)
set gxout geotiff. Set geotiff as output option
set geotiff ...\pFromGrADS\test\out. Set fine name for geotiff output
d crain. 'Draw' data into geotiff file

Content of conf.ctl (configuration file):
DSET ...\pFromGrADS\test\gridfile.grd. Full path and name (with extension) to grid file
UNDEF -999.0. NA value in raw data
OPTIONS little_endian sequential. Endianness and type of file access....so, in the end these were sequential access files all along (the provided did use FORTRAN headers before each record)
TITLE China Hourly Merged Precipitation Analysis. Just a title for the data
XDEF 700 linear 70.05 0.10. 700 columns starting at x-coordinate 70.05 East, x-resolution = 0.10
YDEF 440 linear 15.05 0.10. 440 rows starting (bottom) at y-coordinate 15.05 North, y-resolution = 0.10
ZDEF 1 LEVELS 1 . Data is 2-dimensional, i.e., only 1 vertical level
TDEF 1 LINEAR 20Z21Nov2016 1hr. Date and time of data in the grid file. Format: HZddmmmyyyy. Hourly data
VARS 2. Start of the variable section. The raw data contains two variables
crain 1 00 CH01 combined analysis (mm/Hour). crain is the rainfall depth in mm/hour
gsamp 1 00 CH02 gauge numbers
ENDVARS. End of variable section

Here's a comparison of both rasters after loading them into R:
compare
I'm still struggling with the alignment of the rasters, so we can do an exact overlay, but visual comparison seem OK.

Best!
Roberto

@hsonne
Copy link
Member Author

hsonne commented Jul 24, 2020

Hi @robetatis,

I used the following code to read the example files that you provided to me. I extracted the control information from your comment above and provided it in a variable control_text. The code uses the functions from the readGrads package by Paul Hiemstra to

  • read the control information with readGrads::parseCTLfile(),
  • read the the binary data with readGrads::readGradsFile(). The function uses the control information to (hopefully) correctly interpret the binary data.

Does this reproduce what the software delivers?
Were you able to install the readGrads package? I needed to "downgrade" some packages but then succeeded.

Regards,
Hauke

# Content of conf.ctl (configuration file)
control_text <- c(
  "DSET just-a-dummy", # The filename is not required, I assume
  "UNDEF -999.0",
  "OPTIONS little_endian sequential", 
  "TITLE China Hourly Merged Precipitation Analysis",
  "XDEF 700 linear 70.05 0.10", 
  "YDEF 440 linear 15.05 0.10",
  "ZDEF 1 LEVELS 1",
  "TDEF 1 LINEAR 20Z21Nov2016 1hr",
  "VARS 2",
  "crain 1 00 CH01", # combined analysis (mm/Hour)
  "gsamp 1 00 CH02", # gauge numbers
  "ENDVARS"
)

# Read control file content from text
con <- textConnection(control_text)
control <- readGrads::parseCTLfile(con)
close(con)

# Get elements out of the control structure
nx <- control$xdef$noLevels # 700
ny <- control$ydef$noLevels # 440

# Provide paths to grads files
grads_files <- dir("~/../Downloads/test_pFromGrADS", full.names = TRUE)

# Read the files into matrices and convert them to raster objects
raster_objects <- lapply(grads_files, function(grads_file) {
  
  # Read the binary file into a list of data frames
  cat("Reading", basename(grads_file), "\n")  
  x <- readGrads::readGradsFile(grads_file, control)

  # Get the values of the first variable
  values <- x[[1]]$value

  # Replace undefined values with NA
  values[values == control$undef] <- NA

  # Create a raster object using the bounding box information given in the control object  
  raster::raster(
    matrix(values, byrow = TRUE, nrow = ny), 
    xmn = control$xdef$levelValues[1L], 
    xmx = control$xdef$levelValues[nx], 
    ymn = control$ydef$levelValues[1L], 
    ymx = control$ydef$levelValues[ny],
    crs = "+proj=longlat +datum=WGS84"
  )
})

# Plot the raster objects
colfunc <- colorRampPalette(c(NA, "blue"))
lapply(raster_objects, raster::plot, col = colfunc(80L))

@robetatis
Copy link
Contributor

Hello @hsonne
Thanks for all your valuable input!
As you suggested, I have created a branch to incorporate your changes ('dev'). If you agree, please address the issue in valuesToMatrix and send the pull request.
Best
Roberto

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants