# Will Dodge 2018-03-07

#### This framework converts the standard weather data that we pull from the Fourth Street location into a delimited file that meets the specifications required to drive the OZCOT model.

### Clear our environment.

In [1]:
rm(list=ls())

### Import requisite libraries.

In [2]:
library(data.table)
library(ggplot2)
library(gridExtra)
library(lubridate, quietly = TRUE)
library(IRdisplay)


Attaching package: ‘lubridate’

The following objects are masked from ‘package:data.table’:

    hour, isoweek, mday, minute, month, quarter, second, wday, week,
    yday, year

The following object is masked from ‘package:base’:

    date



### Set our working directory to that containing our raw met data.

In [3]:
setwd('/home/will/PSWC_daily_data_compiled_2018-03-07')

### Create a list of filenames 

#### This list contains the filenames only. In the next step we can iterate over this list and read each file into our environment. The filenames are taken from our working directory and filtered by the pattern argument, which here selects the .csv files.

In [4]:
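# pattern is a regular expression: escape the dot and anchor the extension to the end of the name.
met.filenames <- list.files(pattern = '\\.csv$')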
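
#### A note on the pattern argument: list.files() interprets it as a regular expression, so an unescaped dot matches any character. The sketch below uses made-up filenames (not files from our directory) and grepl(), which applies the same kind of matching, to compare a loose pattern with an escaped, anchored one.

```r
# Hypothetical filenames, for illustration only.
candidates <- c("met_2001.csv", "met_2001.csv.bak", "notes_csv.txt")

# Loose pattern: '.' matches any single character and the match may occur anywhere.
loose <- grepl(".csv", candidates)

# Stricter pattern: a literal dot, and 'csv' must end the name.
strict <- grepl("\\.csv$", candidates)

data.frame(candidates, loose, strict)
```

#### If the working directory only ever holds the raw .csv files, both patterns select the same files; the stricter one simply guards against stray backups or notes.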

### Read .csv files from our filename list.

#### This is where the files are actually read. We use the list of filenames we just created to read each file into a list of data frames that we can easily manage in R. This may take a few minutes.

In [5]:
met.files <- lapply(met.filenames, function(x) read.csv(x, header = FALSE, stringsAsFactors = FALSE,
                                                           row.names = NULL, fill = TRUE))
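
#### The same read step can be tried on throwaway data. This is a minimal sketch that assumes nothing about our real files: it writes two tiny .csv files to a temporary directory, lists them with full paths (so no setwd() is needed), and reads them into a named list. The file names and values are invented.

```r
# Build a scratch directory with two small, made-up csv files.
scratch <- file.path(tempdir(), "met_demo")
dir.create(scratch, showWarnings = FALSE)
writeLines(c("date,rain", "1/1/2001,0.5"), file.path(scratch, "a.csv"))
writeLines(c("date,rain", "1/1/2002,1.2"), file.path(scratch, "b.csv"))

# full.names = TRUE returns paths that can be read from any working directory.
demo.filenames <- list.files(scratch, pattern = "\\.csv$", full.names = TRUE)

# header = FALSE keeps the header rows as ordinary data rows, as in the notebook.
demo.files <- lapply(demo.filenames, function(x) read.csv(x, header = FALSE,
                                                          stringsAsFactors = FALSE))
names(demo.files) <- basename(demo.filenames)
str(demo.files)
```

#### Naming the list elements after their files is optional, but it makes it easier to tell which year a set came from later on.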

### Convert all files from standard data.frame to enhanced data.table

In [6]:
met.files <- lapply(met.files, function(x) data.table(x))

### View the structure of each of the 18 sets to see how the formats vary, so we can correct the variation and homogenize the sets.

In [7]:
str(met.files)

List of 18
 $ :Classes ‘data.table’ and 'data.frame':	369 obs. of  19 variables:
  ..$ V1 : chr [1:369] "" "" "date" "" ...
  ..$ V2 : chr [1:369] "standard" "rain" "gauge" "mm" ...
  ..$ V3 : chr [1:369] "heated" "rain" "gauge" "mm" ...
  ..$ V4 : chr [1:369] "2-m" "wind" "speed" "m/s" ...
  ..$ V5 : chr [1:369] "10-m" "wind" "speed" "m/s" ...
  ..$ V6 : chr [1:369] "10-m" "wind" "dir." "deg" ...
  ..$ V7 : chr [1:369] "10-m" "max" "gust" "m/s" ...
  ..$ V8 : chr [1:369] "" "min" "RH" "%" ...
  ..$ V9 : chr [1:369] "" "max" "RH" "%" ...
  ..$ V10: chr [1:369] "" "avg" "RH" "%" ...
  ..$ V11: chr [1:369] "min" "air" "temp" "C" ...
  ..$ V12: chr [1:369] "max" "air" "temp" "C" ...
  ..$ V13: chr [1:369] "avg" "air" "temp" "C" ...
  ..$ V14: chr [1:369] "4\"" "soil" "temp" "C" ...
  ..$ V15: chr [1:369] "8\"" "soil" "temp" "C" ...
  ..$ V16: chr [1:369] "max" "solar" "rad." "W/m2" ...
  ..$ V17: chr [1:369] "avg" "solar" "rad." "W/m2" ...
  ..$ V18: chr [1:369] "avg" "station" "press." "

### Now to clean up and condense the multi-row header and reassign column names.

#### Here we create an empty list to hold the variable (column) names that we extract from the data sets. To do that, we read in the first 4 rows of each data set. These first four lines contain our header strings. As we read in these multi-line headers, we use the paste() function to concatenate them into a single string. The next step is to go back and assign our collapsed single-line header to the column names of each set. In other words, we are renaming our variables with our new condensed header.

In [8]:
met.var.names <- list()
for(i in seq_along(met.files)) {
  met.var.names[[i]] <- met.files[[i]][1:4, lapply(.SD, function(x) paste(x, collapse = '.', sep = ''))]
} 
for(i in seq_along(met.files)) {
  colnames(met.files[[i]]) <- as.character(met.var.names[[i]][1])
}
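
#### To see what the header collapse does, here is a minimal sketch on a toy table with the same shape as the first two columns of our older files: four header rows followed by data. The values are invented, and setnames() is used here in place of the colnames() assignment above.

```r
library(data.table)

# Toy table mimicking one raw file: 4 header rows, then one data row.
toy <- data.table(V1 = c("", "", "date", "", "1/1/2001"),
                  V2 = c("standard", "rain", "gauge", "mm", "0.5"))

# Collapse the first four rows of every column into one dot-separated string.
hdr <- toy[1:4, lapply(.SD, function(x) paste(x, collapse = "."))]

# Use the collapsed strings as the new column names.
setnames(toy, unlist(hdr, use.names = FALSE))
names(toy)
```

#### Empty header cells still contribute a separator, which is why the date column ends up with leading and trailing dots.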

### View column names for each set. Notice that the format changes over the 17-year time span.

#### We see here how our multi-line header has been concatenated into a single string that has been assigned back to each set. The format of our data changes over time. Moving forward, we must note the changes in the format of our met data and continue the cleaning process differently for each format.

In [9]:
met.var.names

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19
..date.,standard.rain.gauge.mm,heated.rain.gauge.mm,2-m.wind.speed.m/s,10-m.wind.speed.m/s,10-m.wind.dir..deg,10-m.max.gust.m/s,.min.RH.%,.max.RH.%,.avg.RH.%,min.air.temp.C,max.air.temp.C,avg.air.temp.C,"4"".soil.temp.C","8"".soil.temp.C",max.solar.rad..W/m2,avg.solar.rad..W/m2,avg.station.press..mb,.evap..pan.cm

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19
..date.,standard.rain.gauge.mm,heated.rain.gauge.mm,2-m.wind.speed.m/s,10-m.wind.speed.m/s,10-m.wind.dir..deg,10-m.max.gust.m/s,.min.RH.%,.max.RH.%,.avg.RH.%,min.air.temp.C,max.air.temp.C,avg.air.temp.C,"4"".soil.temp.C","8"".soil.temp.C",max.solar.rad..W/m2,avg.solar.rad..W/m2,avg.station.press..mb,.evap..pan.cm

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19
..date.,standard.rain.gauge.mm,heated.rain.gauge.mm,2-m.wind.speed.m/s,10-m.wind.speed.m/s,10-m.wind.dir..deg,10-m.max.gust.m/s,.min.RH.%,.max.RH.%,.avg.RH.%,min.air.temp.C,max.air.temp.C,avg.air.temp.C,"4"".soil.temp.C","8"".soil.temp.C",max.solar.rad..W/m2,avg.solar.rad..W/m2,avg.station.press..mb,.evap..pan.cm

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19
..date.,standard.rain.gauge.mm,heated.rain.gauge.mm,2-m.wind.speed.m/s,10-m.wind.speed.m/s,10-m.wind.dir..deg,10-m.max.gust.m/s,.min.RH.%,.max.RH.%,.avg.RH.%,min.air.temp.C,max.air.temp.C,avg.air.temp.C,"4"".soil.temp.C","8"".soil.temp.C",max.solar.rad..W/m2,avg.solar.rad..W/m2,avg.station.press..mb,.evap..pan.cm

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25
..date.,standard.rain.gauge.mm,heated.rain.gauge.mm,2-m.wind.speed.m/s,10-m.wind.speed.m/s,10-m.wind.dir..deg,10-m.max.gust.m/s,.min.RH.%,.max.RH.%,.avg.RH.%,⋯,max.solar.rad..W/m2,avg.solar.rad..W/m2,avg.station.press..mb,.evap..pan.cm,NA.NA.NA.NA,NA.NA.NA.NA,NA.NA.NA.NA,NA.NA.NA.NA,NA.NA.NA.NA,NA.NA.NA.NA

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V24,V25,V26,V27,V28,V29,V30,V31,V32,V33
...Unit,...Year,...Day,...Time,..Rain.Standard,..Rain.Heated,Mean.Wind. Speed.@ 2 m,Sdev.Wind. Speed.@ 2 m,.Max.Gust.@ 2 m,Mean.Wind. Speed.@10 m,⋯,.Max.Solar.Rad.,.Avg.Solar.Rad.,..Station.Press,..Evap.Pan,...Bat.,...AC,NA.NA.NA.NA,NA.NA.NA.NA,NA.NA.NA.NA,NA.NA.NA.NA

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V24,V25,V26,V27,V28,V29,V30,V31,V32,V33
...Unit,...Year,...Day,...Time,..Rain.Standard,..Rain.Heated,Mean.Wind. Speed.@ 2 m,Sdev.Wind. Speed.@ 2 m,.Max.Gust.@ 2 m,Mean.Wind. Speed.@10 m,⋯,.Max.Solar.Rad.,.Avg.Solar.Rad.,..Station.Press,..Evap.Pan,...Bat.,...AC,NA.NA.NA.NA,NA.NA.NA.NA,NA.NA.NA.NA,NA.NA.NA.NA

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V24,V25,V26,V27,V28,V29,V30,V31,V32,V33
...Unit,...Year,...Day,...Time,..Rain.Standard,..Rain.Heated,Mean.Wind. Speed.@ 2 m,Sdev.Wind. Speed.@ 2 m,.Max.Gust.@ 2 m,Mean.Wind. Speed.@10 m,⋯,.Max.Solar.Rad.,.Avg.Solar.Rad.,..Station.Press,..Evap.Pan,...Bat.,...AC,NA.NA.NA.NA,NA.NA.NA.NA,NA.NA.NA.NA,NA.NA.NA.NA

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V24,V25,V26,V27,V28,V29,V30,V31,V32,V33
...Unit,...Year,...Day,...Time,..Rain.Standard,..Rain.Heated,Mean.Wind. Speed.@ 2 m,Sdev.Wind. Speed.@ 2 m,.Max.Gust.@ 2 m,Mean.Wind. Speed.@10 m,⋯,.Max.Solar.Rad.,.Avg.Solar.Rad.,..Station.Press,..Evap.Pan,...Bat.,...AC,NA.NA.NA.NA,NA.NA.NA.NA,NA.NA.NA.NA,NA.NA.NA.NA

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V25,V26,V27,V28,V29,V30,V31,V32,V33,V34
...Unit,...Year,...Day,...Time,..Rain.Standard,..Rain.Heated,Mean.Wind. Speed.@ 2 m,Sdev.Wind. Speed.@ 2 m,.Max.Gust.@ 2 m,Mean.Wind. Speed.@10 m,⋯,.Avg.Solar.Rad.,..Station.Press,..Evap.Pan,...Max Bat.,...Min Bat.,...AC,.Avg.CO2.@ 2 m,.Avg.H2O.@ 2 m,.Avg.CO2.@ 10 m,.Avg.H2O.@ 10 m

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V25,V26,V27,V28,V29,V30,V31,V32,V33,V34
...Unit,...Year,...Day,...Time,..Rain.Standard,..Rain.Heated,Mean.Wind. Speed.@ 2 m,Sdev.Wind. Speed.@ 2 m,.Max.Gust.@ 2 m,Mean.Wind. Speed.@10 m,⋯,.Avg.Solar.Rad.,..Station.Press,..Evap.Pan,...Max Bat.,...Min Bat.,...AC,.Avg.CO2.@ 2 m,.Avg.H2O.@ 2 m,.Avg.CO2.@ 10 m,.Avg.H2O.@ 10 m

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V25,V26,V27,V28,V29,V30,V31,V32,V33,V34
...Unit,...Year,...Day,...Time,..Rain.Standard,..Rain.Heated,Mean.Wind. Speed.@ 2 m,Sdev.Wind. Speed.@ 2 m,.Max.Gust.@ 2 m,Mean.Wind. Speed.@10 m,⋯,.Avg.Solar.Rad.,..Station.Press,..Evap.Pan,...Max Bat.,...Min Bat.,...AC,.Avg.CO2.@ 2 m,.Avg.H2O.@ 2 m,.Avg.CO2.@ 10 m,.Avg.H2O.@ 10 m

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V25,V26,V27,V28,V29,V30,V31,V32,V33,V34
...Unit,...Year,...Day,...Time,..Rain.Standard,..Rain.Heated,Mean.Wind. Speed.@ 2 m,Sdev.Wind. Speed.@ 2 m,.Max.Gust.@ 2 m,Mean.Wind. Speed.@10 m,⋯,.Avg.Solar.Rad.,..Station.Press,..Evap.Pan,...Max Bat.,...Min Bat.,...AC,.Avg.CO2.@ 2 m,.Avg.H2O.@ 2 m,.Avg.CO2.@ 10 m,.Avg.H2O.@ 10 m

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V25,V26,V27,V28,V29,V30,V31,V32,V33,V34
...Unit,...Year,...Day,...Time,..Rain.Standard,..Rain.Heated,Mean.Wind. Speed.@ 2 m,Sdev.Wind. Speed.@ 2 m,.Max.Gust.@ 2 m,Mean.Wind. Speed.@10 m,⋯,.Avg.Solar.Rad.,..Station.Press,..Evap.Pan,...Max Bat.,...Min Bat.,...AC,.Avg.CO2.@ 2 m,.Avg.H2O.@ 2 m,.Avg.CO2.@ 10 m,.Avg.H2O.@ 10 m

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V25,V26,V27,V28,V29,V30,V31,V32,V33,V34
...Unit,...Year,...Day,...Time,..Rain.Standard,..Rain.Heated,Mean.Wind. Speed.@ 2 m,Sdev.Wind. Speed.@ 2 m,.Max.Gust.@ 2 m,Mean.Wind. Speed.@10 m,⋯,.Avg.Solar.Rad.,..Station.Press,..Evap.Pan,...Max Bat.,...Min Bat.,...AC,.Avg.CO2.@ 2 m,.Avg.H2O.@ 2 m,.Avg.CO2.@ 10 m,.Avg.H2O.@ 10 m

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V20,V21,V22,V23,V24,V25,V26,V27,V28,V29
...Unit,...Year,...Day,...Time,..Rain.Standard,..Rain.Heated,Mean.Wind. Speed.@ 2 m,Sdev.Wind. Speed.@ 2 m,.Max.Gust.@ 2 m,Mean.Wind. Speed.@10 m,⋯,Max.Air.Temp.@ 2 m,Avg.Air.Temp.@ 2 m,.Soil.Temp.@ 4 in,.Soil.Temp.@ 8 in,.Max.Solar.Rad.,.Avg.Solar.Rad.,..Station.Press,..Evap.Pan,...Bat,...AC

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V20,V21,V22,V23,V24,V25,V26,V27,V28,V29
...Unit,...Year,...Day,...Time,..Rain.Standard,..Rain.Heated,Mean.Wind. Speed.@ 2 m,Sdev.Wind. Speed.@ 2 m,.Max.Gust.@ 2 m,Mean.Wind. Speed.@10 m,⋯,Max.Air.Temp.@ 2 m,Avg.Air.Temp.@ 2 m,.Soil.Temp.@ 4 in,.Soil.Temp.@ 8 in,.Max.Solar.Rad.,.Avg.Solar.Rad.,..Station.Press,..Evap.Pan,...Bat,...AC

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V20,V21,V22,V23,V24,V25,V26,V27,V28,V29
...Unit,...Year,...Day,...Time,..Rain.Standard,..Rain.Heated,Mean.Wind. Speed.@ 2 m,Sdev.Wind. Speed.@ 2 m,.Max.Gust.@ 2 m,Mean.Wind. Speed.@10 m,⋯,Max.Air.Temp.@ 2 m,Avg.Air.Temp.@ 2 m,.Soil.Temp.@ 4 in,.Soil.Temp.@ 8 in,.Max.Solar.Rad.,.Avg.Solar.Rad.,..Station.Press,..Evap.Pan,...Bat,...AC


### The format of the data changes at the 6th set, which is 2006.

#### Knowing when and how the data change allows us to correctly extract vectors of interest and homogenize our output data. If we did a blind extraction with the assumption that the sets were all uniform, we would end up with vectors that do not contain the correct values or are of the wrong data class.
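
#### Rather than spotting the change by eye, the point where the layout changes can also be located programmatically. The following is only a sketch on invented header vectors standing in for the column names of each set; it is not part of the original workflow.

```r
# Hypothetical header vectors, one per set (stand-ins for names(met.files[[i]])).
headers <- list(c("..date.", "standard.rain.gauge.mm"),
                c("..date.", "standard.rain.gauge.mm"),
                c("...Unit", "...Year", "...Day"))

# Reduce each header to a single string so that layouts can be compared.
signature <- vapply(headers, paste, character(1), collapse = "|")

# Give each distinct layout an id, then find where the id changes between neighbours.
format.id <- match(signature, unique(signature))
change.at <- which(diff(format.id) != 0) + 1
change.at
```

#### Comparing full header strings is strict: sets that differ only in trailing columns are reported as different layouts, which is the safe direction for this kind of check.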

### Let's begin to clean the data.

#### We have observed that the weather files from the USDA-ARS 4th ST location changed the layout of recorded data in 2006, so we are going to separate the files into two groups. To begin, we will create a list for the old weather files called met.olds.

In [10]:
met.olds <- list()

#### Now let's loop over the first five weather files (the older files that share the same format) and perform some cleaning operations. Several tasks are performed with each iteration of the loop. First, the date column is parsed into POSIXct (a datetime format that R recognizes). Next, all of the remaining columns are converted to numeric data class. This conversion is necessary because the data were all read in as character class: the first four lines of the data consisted of letters, our multi-line headers. Lastly, separate columns are created for the year and day.

In [11]:
for(i in seq_along(met.files[1:5])) {
  date.vector <- parse_date_time(met.files[[i]]$`..date.`[5:nrow(met.files[[i]])], 'mdy')
  met.olds[[i]] <- met.files[[i]][5:nrow(met.files[[i]]), lapply(.SD, as.numeric), .SDcols = 2:length(met.files[[i]])]
  met.olds[[i]][, year := year(date.vector)]
  met.olds[[i]][, day := yday(date.vector)]
}
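
#### The date handling can be checked in isolation. This is a small sketch with made-up date strings, assuming the month/day/year order used in the older files.

```r
library(lubridate, quietly = TRUE)

# Made-up date strings in month/day/year order.
raw.dates <- c("1/1/2001", "3/7/2001", "12/31/2004")

# parse_date_time() returns POSIXct; 'mdy' names the order of the fields.
date.vector <- parse_date_time(raw.dates, "mdy")

# year() gives the calendar year and yday() the day of the year (1 to 366).
data.frame(raw.dates, year = year(date.vector), day = yday(date.vector))
```

#### The last date falls in a leap year, so its day of year is 366 rather than 365.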

#### At this point we have all the vectors we need in the old met files. We have created some and converted the class of others. The output met file we use for OZCOT requires only a few vectors, so we are now going to populate a new list with data sets containing only the vectors we are interested in for the OZCOT model. First we make a new list.

In [12]:
met.olds.oz.vect <- list()

#### Now to loop over the met.olds list to extract our vectors of interest and assign them to the list we just created, met.olds.oz.vect.

In [14]:
for(i in seq_along(met.olds)) {
  met.olds.oz.vect[[i]] <- met.olds[[i]][, .(year = year,
                                             day = day, 
                                             radn = abs(round((`avg.solar.rad..W/m2`*86400)/(1*10^6), digits = 1)),
                                             maxt = round(max.air.temp.C, digits = 1),
                                             mint = round(min.air.temp.C, digits = 1),
                                             rain = round(standard.rain.gauge.mm, digits = 1))]
}
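
#### The radn expression converts a daily average irradiance in W/m2 (joules per second per square metre) into a daily total in MJ/m2: multiply by the 86400 seconds in a day and divide by 10^6 joules per megajoule. A minimal sketch of that arithmetic, with a hypothetical helper name and invented input values:

```r
# Convert a daily mean irradiance (W/m^2, i.e. J/s/m^2) to a daily total (MJ/m^2).
# w.to.mj is a hypothetical helper, not a function used elsewhere in this notebook.
w.to.mj <- function(avg.w.m2) {
  seconds.per.day <- 86400
  joules.per.mj <- 1e6
  round(avg.w.m2 * seconds.per.day / joules.per.mj, digits = 1)
}

w.to.mj(c(100, 200, 250))
```

#### The loop above additionally wraps the result in abs(), so any negative average readings come out as positive totals.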

#### Ok! Our vectors of interest have been extracted from the old weather files. Now to repeat the process to extract our vectors of interest from the new met.files.

#### Create a list called met.news to hold the cleaned newer files.

In [15]:
met.news <- list()

#### Now to loop over the newer met.files and convert all of our vectors to numeric data class for the same reason as mentioned above.

In [16]:
for(i in seq_along(met.files[6:length(met.files)])) {
  # Take the row count from the same file we are slicing (set i + 5), not from set i.
  met.news[[i]] <- met.files[[i + 5]][6:nrow(met.files[[i + 5]]), lapply(.SD, as.numeric)]
}
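
#### It helps to know what as.numeric() does to a character column that still holds text. A minimal sketch on an invented column:

```r
# Made-up raw column: two header fragments followed by readings, one of them blank.
raw.column <- c("Rain", "Standard", "0.5", "", "12.7")

# Strings that cannot be parsed as numbers become NA (warnings silenced for the demo).
numeric.column <- suppressWarnings(as.numeric(raw.column))
numeric.column
```

#### Any header text or blank cell that survives the row slice therefore shows up as NA, and is dealt with by the NA handling later in the notebook.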

#### Create a list to hold the sets containing our vectors of interest just like we did with the older met files.

In [17]:
met.news.oz.vect <- list()

#### Loop over the met.news list and extract the vectors of interest we need to run the OZCOT model. These vectors are extracted and assigned to the met.news.oz.vect list. They are inserted into the list as one data table for each year that we iterate over. Each is just a data table that contains only our vectors of interest.

In [18]:
for(i in seq_along(met.news)) {
  met.news.oz.vect[[i]] <- met.news[[i]][, .(year = ...Year,
                                             day = ...Day, 
                                             radn = abs(round((.Avg.Solar.Rad.*86400)/(1*10^6), digits = 1)),
                                             maxt = round(`Max.Air.Temp.@ 2 m`, digits = 1),
                                             mint = round(`Min.Air.Temp.@ 2 m`, digits = 1),
                                             rain = round(..Rain.Standard, digits = 1))]
}

#### The last step is to combine the two lists of sets with our vectors of interest so that we have a single list of homogenized data sets containing only our vectors of interest. This will be the list of data tables we use to create our OZCOT-ready met file.

In [19]:
met.files <- c(met.olds.oz.vect, met.news.oz.vect)

#### So far we have created several lists with the met.files list we created in the last step being the only one we need. We can now remove all of the objects in our environment that we do not need.

### Continued cleaning and creation of a single met data set.

#### The following command fixes an anomaly in the data where the year in the 2006 set reverts to 2005 about halfway through.

In [20]:
met.files[[6]][, year := 2006]

#### Next we bind all of the individual data sets in the met.files list into a single large data set called met.files.all.

In [22]:
met.files.all <- rbindlist(met.files)

#### By calling the unique function on met.files.all we can eliminate any duplicate rows from our data set.

In [23]:
met.files.all <- unique(met.files.all)
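
#### The bind and de-duplicate steps can be seen on two tiny, invented sets that overlap on one day. This is a sketch, not our real data.

```r
library(data.table)

# Two made-up yearly sets that overlap on one day.
set.a <- data.table(year = 2001, day = 1:2, rain = c(0.0, 1.5))
set.b <- data.table(year = 2001, day = 2:3, rain = c(1.5, 0.2))

stacked <- rbindlist(list(set.a, set.b))  # 4 rows, one of them repeated
deduped <- unique(stacked)                # the repeated row is dropped
deduped
```

#### Note that unique() only removes rows that match in every column; two records for the same year and day with different readings would both be kept.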

#### The following commands find the mean of several vectors. Because we are now working with a single, large data set, these calculated means cover the entire timespan of our original raw data sets. In our case, this is 17 years. We are creating objects with these mean values so that we can replace missing or impossible values in our data set with the 17-year average. This is a simple method to clean the data and get rid of NA values. The OZCOT model will break if any values are missing or listed as NA.

In [24]:
mean.radn <- mean(met.files.all$radn, na.rm = TRUE)
mean.maxt <- mean(met.files.all$maxt, na.rm = TRUE)
mean.mint <- mean(met.files.all$mint, na.rm = TRUE)

#### In this step we set up an ifelse statement for each of our vectors (except for year and day, which should be fine). For radn, maxt, and mint, the ifelse statement says that if the value in a row is NA or impossible (radn or maxt above 50, mint below -25), then replace that value with the 17-year average we just calculated. For rain, a value that is NA or above 300 is replaced with 0 instead.

In [25]:
met.files.all[, radn := ifelse(is.na(radn) | radn > 50, mean.radn, radn)]
met.files.all[, maxt := ifelse(is.na(maxt) | maxt > 50, mean.maxt, maxt)]
met.files.all[, mint := ifelse(is.na(mint) | mint < -25, mean.mint, mint)]
met.files.all[, rain := ifelse(is.na(rain) | rain > 300, 0, rain)]
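
#### One thing to be aware of: the means above are computed with na.rm = TRUE only, so any impossible values still in the data are included in the average that later replaces them. The sketch below shows a variation, not what this notebook does, in which the fill value comes from the in-range values only. The helper name fill.invalid and the sample values are invented.

```r
# Replace NA and out-of-range values with the mean of the in-range values.
# fill.invalid is a hypothetical helper for illustration.
fill.invalid <- function(x, lower = -Inf, upper = Inf) {
  bad <- is.na(x) | x < lower | x > upper
  x[bad] <- mean(x[!bad])
  x
}

maxt.demo <- c(20, NA, 99, 30)   # made-up temperatures; 99 is treated as impossible
fill.invalid(maxt.demo, upper = 50)
```

#### Here the two valid readings average to 25, so both the NA and the 99 are replaced by 25. Whether this matters in practice depends on how many impossible values the raw data contain.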

#### The last step in the cleaning process is to go through all the data one more time and remove any rows that are not complete cases or, in other words, remove any rows that still contain NA values.

In [26]:
met.files.all <- met.files.all[complete.cases(met.files.all)]

### Reformat layout of data to exact OZCOT specifications.

#### The OZCOT model uses Fortran, so our input met files must have values that are laid out in a highly specific manner. A specific number of digits should appear before and after the decimal place for each vector, each vector must occupy a specific width, and the vectors must be delimited by a space rather than any other character such as a comma.

In [27]:
met.files.all[, year :=  sprintf('%4.0f', year)]
met.files.all[, day  :=  sprintf('%3.0f', day)]
met.files.all[, radn := sprintf('%6.1f', radn)]
met.files.all[, maxt := sprintf('%5.1f', maxt)]
met.files.all[, mint := sprintf('%5.1f', mint)]
met.files.all[, rain := sprintf('%5.1f', rain)]
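
#### To make the format strings concrete, here is a minimal sketch that formats one invented record with the same widths. In '%6.1f' the 6 is the minimum total width (including the decimal point and any sign) and the 1 is the number of digits after the decimal point.

```r
# One made-up record, formatted with the same widths as the notebook.
fields <- c(year = sprintf("%4.0f", 2001),
            day  = sprintf("%3.0f", 7),
            radn = sprintf("%6.1f", 17.28),
            maxt = sprintf("%5.1f", 30.5),
            mint = sprintf("%5.1f", -2),
            rain = sprintf("%5.1f", 0))

nchar(fields)                    # width of each formatted field
paste(fields, collapse = " ")    # the record as one space-delimited line
```

#### The width is a minimum, not a limit: a value with more digits than the field allows is printed in full and pushes the following columns to the right, so out-of-range values should be cleaned before this step.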

### Write output file to be used in OZCOT model.

#### This command will write our output file to a file path of our designation and in a manner that is appropriate for use in the OZCOT model.

In [29]:
write.table(met.files.all, "/home/will/PSWC_daily_data_compiled_2018-03-07/output/met.files.all.txt", 
            sep = ' ', row.names = FALSE, quote = FALSE)
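
#### To preview what write.table() produces with these arguments, the sketch below writes one invented, already formatted record to a temporary file and reads the lines back. The column subset and values are made up for the demo.

```r
# A made-up, already formatted record (character columns, as after sprintf()).
demo <- data.frame(year = "2001", day = "  7", rain = "  0.0",
                   stringsAsFactors = FALSE)

# Write to a temporary file with the same arguments as the notebook.
out.path <- tempfile(fileext = ".txt")
write.table(demo, out.path, sep = " ", row.names = FALSE, quote = FALSE)

out.lines <- readLines(out.path)
out.lines
```

#### By default write.table() writes the column names as the first line. Whether OZCOT expects that header line is not stated here; if it does not, passing col.names = FALSE would leave it out.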