# Appendix to Lesson 6 - National Water Model Channel-Only & Nudging Data Assimilation

## Time slice file generation

To assimilate observations, we first need to make them available to the model. The test case already includes all of the observations used in this lesson; this notebook is for attendees who would like to create their own time slice files. The NWM has its own Python-based workflow for pulling all available observations from NWIS and creating time slice files. These time slices can be used with other runs, including small domains. For research runs where older data may be needed, you may need to construct your own time slice files. The following code shows how to get streamflow observations from USGS NWIS and how to write them to the “time slice” format used by the model. The files are NetCDF files that follow specific conventions (for example, fixed-width gage identifiers and a quality value per observation), which are covered step by step below.

Load the necessary libraries first. 

In [None]:
# All the packages which will be used below
library(rwrfhydro)
library(dataRetrieval)
library(data.table)

If you check out the basin map for this test case, there are two USGS gages in the domain: 01447720 is the outlet location and 01447680 is the interior gage. Here we would like to assimilate the streamflow values at the interior location (gage 01447680) and evaluate the model performance at the outlet gage (01447720). There are two ways to approach the problem. The first is to leave both gages in the routelink file and provide only the interior gage's observations in the time slices. The second is to remove the outlet gage (01447720) from the routelink file, so that no data assimilation occurs at that point even if its observations exist in the time slices. Let's take a look at the first option, since that is what was used in lesson 7.

The USGS streamflow data are available from the NWIS website and can be pulled using the `dataRetrieval` package. You need to specify the gage ID, the parameter code, and the dates that you are interested in.

In [None]:
## This code is repeated from above, where we pulled in the observations.
siteNumber <- c("01447680")
parameterCd <- "00060"  # Discharge
startDate <- "2011-08-26"
endDate <- "2011-09-14"
discharge <- dataRetrieval::readNWISuv(siteNumber, parameterCd, startDate, endDate)
discharge <- as.data.table(discharge)
cfsToCms <- 1/35.31466621266132 
discharge[, `:=`(discharge.cms=X_00060_00000*cfsToCms)]
print(discharge)
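
The hard-coded conversion factor above follows from the exact definition 1 ft = 0.3048 m. The following is a minimal base R sketch of where it comes from; the variable names are illustrative and are not part of the lesson code.

```r
# One foot is exactly 0.3048 m, so one cubic foot is 0.3048^3 cubic meters.
cubicMetersPerCubicFoot <- 0.3048^3
cfsToCmsCheck <- cubicMetersPerCubicFoot

# Compare with the hard-coded factor 1/35.31466621266132 used in the lesson.
print(all.equal(cfsToCmsCheck, 1 / 35.31466621266132))

# Example: 1000 cfs is roughly 28.3 cms.
print(1000 * cfsToCmsCheck)
```

Deriving the factor this way makes the unit assumption explicit: the `X_00060_00000` column is taken to be in cubic feet per second.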

As mentioned, the Fortran code requires the gage identifiers to be 15-character-wide string variables. The following converts the trimmed strings to 15 characters wide with leading blanks.

In [None]:
# reformat the gage identifier to be 15 characters wide
discharge$site_no <- formatC(discharge$site_no, width=15)
print(discharge)
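
As a quick check of the padding step, the sketch below applies `formatC` to the two gage identifiers from this test case and confirms the resulting width. It uses only base R; `gageIds` and `padded` are illustrative names.

```r
# The two gage identifiers from this test case, as trimmed strings.
gageIds <- c("01447680", "01447720")

# Pad on the left with blanks to a fixed width of 15 characters.
padded <- formatC(gageIds, width = 15)

print(nchar(padded))   # width of each padded identifier
print(trimws(padded))  # trimws() recovers the original identifiers
```

Because the padding is only leading blanks, `trimws()` can be used to get back to the trimmed form when comparing against other sources.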

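The observations require an associated quality level, which is a value in [0,1] with 0 indicating “completely uncertain” and 1 indicating “completely certain” (see qn in the nudging equation above). Note that the function below records quality on a 0 to 100 scale, so a value of 100 here corresponds to “completely certain”. Here is the crude quality assignment we will use.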

In [None]:
# add the quality 
Qc <- function(dataDf) {
  ## Assume the worst (quality 0), then recover what fits our mental model of this chaos.
  quality <- dataDf$discharge.cms * 0
  ## Why would any streams with no flow be gaged? Then again, drought. Zero is a really
  ## dicey value, so it is treated as invalid here.
  ## JLM: also limit maximum flows to approx twice what I believe is the largest gaged flow
  ## on the Mississippi River:
  ## http://nwis.waterdata.usgs.gov/nwis/peak?site_no=07374000&agency_cd=USGS&format=html
  ## Baton Rouge 1945: 1,473,000 cfs = 41,711 cms; multiply it roughly by 2.
  isValidFlow <- dataDf$discharge.cms > 0 & dataDf$discharge.cms < 90000
  wh100 <- which(isValidFlow)
  if (length(wh100)) quality[wh100] <- 100
  ## Return the quality vector explicitly. Otherwise the function returns the value of
  ## the `if` expression (100 or NULL) rather than one quality value per row.
  quality
}
discharge$quality <- Qc(discharge)

print(discharge)
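
To see how this kind of rule behaves at the edges, here is a small vectorized sketch applied to toy values. `AssignQuality`, `maxFlowCms`, and `toyFlows` are hypothetical names introduced for illustration. Unlike the lesson function, this sketch assigns quality 0 to missing flows, which is an assumption on our part.

```r
# Hypothetical helper: quality 100 for plausible flows, 0 otherwise.
# Assumption: missing (NA) flows are given quality 0.
AssignQuality <- function(flowCms, maxFlowCms = 90000) {
  isValidFlow <- !is.na(flowCms) & flowCms > 0 & flowCms < maxFlowCms
  ifelse(isValidFlow, 100, 0)
}

# Toy flows in cms: negative, zero, typical, very large, implausible, missing.
toyFlows <- c(-1, 0, 12.5, 41711, 95000, NA)
print(AssignQuality(toyFlows))
```

Only the typical and very large (but still below the cap) values receive quality 100; the negative, zero, implausible, and missing values all receive 0.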

The file also includes a `query_time` variable. This variable indicates when the value was obtained from NWIS. It is not used internally by the model, so it is somewhat optional but can prove helpful when trying to determine updates to time slice files.

In [None]:
# add query time to the data
queryTime <- Sys.time()
attr(queryTime, "tzone") <- "UTC"
discharge$queryTime <- queryTime   # add the query time
print(discharge)

The terminology “time slice” comes from the fact that we need to discretize time into “slices”. The model is designed to let the slice resolution vary, though the option is not exposed and is currently hard-coded. Because the vast majority of gages reporting to NWIS have a 15-minute frequency, this was the natural choice for the time slice resolution for the NWM (flows also rarely vary at shorter timescales). With this resolution, no more than one observation per gage can be used during any 15-minute period; however, its actual reporting time is used during the nudging. For 15-minute resolution, the files are stamped “00”, “15”, “30”, and “45”. The data in the files fall in ±7.5 minute windows centered on these time stamps. To assign our raw data to a unique time slice file, we need to calculate which slice each observation belongs to.

In [None]:
# This function is slightly different than the one currently available in rwrfhydro.
# It rounds POSIXct times to the nearest multiple of `nearest` minutes.
RoundMinutes <- function (POSIXct, nearest = 5) {
  if ((60%%nearest) != 0)
    warning(paste0("The nearest argument (passed: ", nearest,
                   ") is meant to divide 60 with no remainder."), immediate. = TRUE)
  nearestInv <- 1./nearest
  # minutes past the hour, including the fractional minutes from the seconds
  theMin <- as.numeric(format(POSIXct, "%M")) + as.numeric(format(POSIXct, "%S"))/60
  # fraction of the way from the previous slice to the next one
  floorDiff <- (theMin - nearest * (floor(theMin/nearest))) / nearest # added by Arezoo
  whFloor <- which(floorDiff < 0.5)
  # round up by default, then round down where the time falls in the first half of the interval
  roundMin <- (ceiling(theMin * nearestInv)/nearestInv)
  roundMin[whFloor] <- (floor(theMin * nearestInv)/nearestInv)[whFloor]
  lubridate::floor_date(POSIXct, "hour") + lubridate::minutes(floor(roundMin))
}

discharge$dateTimeRound <- RoundMinutes(discharge$dateTime,nearest=15)

print(discharge)
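
The same ±7.5 minute windowing can also be sketched with base R arithmetic on seconds since the epoch. This is an illustrative alternative, not the rwrfhydro implementation: `RoundToSlice` is a hypothetical helper, and it assumes the times are in UTC so that slice boundaries line up with the top of the hour.

```r
# Hypothetical helper (not part of rwrfhydro): round POSIXct times to the
# nearest slice center by working on seconds since the epoch.
RoundToSlice <- function(times, sliceMinutes = 15) {
  sliceSeconds <- sliceMinutes * 60
  secs <- as.numeric(times)
  # Adding half a slice before flooring rounds to the nearest slice center;
  # a time exactly half way between two centers goes to the later one.
  rounded <- floor((secs + sliceSeconds / 2) / sliceSeconds) * sliceSeconds
  as.POSIXct(rounded, origin = "1970-01-01", tz = "UTC")
}

# Toy observation times around the 03:15 and 03:30 slices.
obsTimes <- as.POSIXct(c("2011-08-30 03:22:29", "2011-08-30 03:22:30",
                         "2011-08-30 03:37:29", "2011-08-30 03:53:00"),
                       tz = "UTC")
sliceTimes <- RoundToSlice(obsTimes)
print(format(sliceTimes, "%Y-%m-%d %H:%M:%S"))
```

In this toy example the first time falls in the 03:15 slice, the next two fall in the 03:30 slice, and the last rolls over to 04:00, which shows that rounding can cross the hour boundary.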

Finally, loop through the times and write the data into time slice files.

In [None]:
outPath <- "~/nwm-training/output/lesson6/nudgingTimeSliceObs"
dir.create(file.path(outPath), recursive = TRUE, showWarnings = FALSE)
# One file is written per unique time slice, latest slice first.
sliceTimes <- rev(sort(unique(discharge$dateTimeRound)))
for (i in seq_along(sliceTimes)) {
  dfByPosix <- subset(discharge, dateTimeRound == sliceTimes[i])
  # copy the quality into the discharge.quality column passed to WriteNcTimeSlice
  dfByPosix$discharge.quality <- dfByPosix$quality
  rwrfhydro::WriteNcTimeSlice(dfByPosix,
                              outPath=outPath,
                              sliceResolution = 15)
}

print(list.files(outPath))

Let's check out the contents of one time slice file.

In [None]:
rwrfhydro::GetNcdfFile("~/nwm-training/example_case/NWM/nudgingTimeSliceObs/2011-08-30_03:30:00.15min.usgsTimeSlice.ncdf")

There is only one observation in this file for gage "01447680".
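
The file name used above suggests how a slice time maps to a file name. The sketch below rebuilds that name from a time stamp. The pattern is inferred from this single example file name and the time zone is assumed to be UTC; `WriteNcTimeSlice` constructs the names itself, so this is only useful for locating a particular slice.

```r
# Assumed naming pattern, inferred from the example file name in this lesson:
# <YYYY-MM-DD>_<HH:MM:SS>.<resolution>min.usgsTimeSlice.ncdf
sliceTime <- as.POSIXct("2011-08-30 03:30:00", tz = "UTC")
sliceResolution <- 15

fileName <- paste0(format(sliceTime, "%Y-%m-%d_%H:%M:%S"), ".",
                   sliceResolution, "min.usgsTimeSlice.ncdf")
print(fileName)
```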

© UCAR 2019