## Watchdog data prep and visualization
This batch of code is intended to manage the meteorological data collected by the network of Watchdog 2000 series loggers deployed across the JFSP 2015 experimental gradient. Generally, this notebook will build a programmatic way to read in overlapping or discontinuous met records from a single station, generate unique timestamp information for each record, associate each logger with metadata, perform cursory QA/QC steps, and concatenate the data into a single met record.

Ultimately, a portion of the steps developed here will be packaged into executables and run each time the data are downloaded by a field technician, ideally helping the technician perform on-site QA/QC before leaving the field.

### Load notebook dependencies
and configure notebook aesthetic preferences

In [4]:
# ------- Notebook config
%matplotlib inline
import matplotlib.colors
import matplotlib.pyplot as plt

# ------- Load dependencies
import pandas as pd
import numpy as np
import seaborn as sns
import os

# ------- Watchdog utils
from watchdogutils import *

# ------- Plot environment aesthetics
sns.set_style('ticks')
sns.set_context('notebook', font_scale=1.2)

dataDir = 'Z:/JFSP_2015/Weather Stations/Data/Exports/2_6_18/'
outDir = 'Z:/JFSP_2015/Weather Stations/Data/Vis/Diagnostics/'


### Processing steps:
#### Generate a list of files in the 'Exports' directory
Then parse the names of the exported .txt files to extract the station ID, the station locale, and, if needed down the road, the download date.
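A minimal sketch of that parsing step, assuming every export follows the Locale_LoggerID_month_day_year.txt pattern seen in the file listing below (e.g. Jemez_8_2_6_18.txt) and that the date part is month_day_two-digit-year. The function name parse_met_filename is hypothetical; the notebook itself relies on parseAndReadMetData from watchdogutils, whose internals are not shown here.

```python
import os
from datetime import datetime


def parse_met_filename(filename):
    """Split an export name such as 'Jemez_8_2_6_18.txt' into its parts.

    Hypothetical helper. Assumed pattern: <Locale>_<LoggerID>_<month>_<day>_<yy>.txt
    """
    stem = os.path.splitext(os.path.basename(filename))[0]
    # Split from the right so a locale name containing '_' stays in one piece
    locale, logger_id, month, day, year = stem.rsplit('_', 4)
    # Assumes the date is written month_day_two-digit-year (e.g. 2_6_18)
    download_date = datetime.strptime('_'.join([month, day, year]), '%m_%d_%y').date()
    return {'Locale': locale, 'LoggerID': int(logger_id), 'DownloadDate': download_date}


info = parse_met_filename('Jemez_8_2_6_18.txt')
print(info['Locale'], info['LoggerID'], info['DownloadDate'])  # Jemez 8 2018-02-06
```

Splitting from the right keeps the numeric fields fixed in position, so the sketch still works if a future locale name contains an underscore.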

#### Fix up the timestamps
This step only involves renaming and generating additional columns. Rename the initial timestamp column, and extract day of year, month, year, and hour for easy resampling and averaging later on. Having these columns also makes it much easier to adjust timestamps for incorrect logger clocks or offsets.
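A sketch of the timestamp step in pandas. The raw timestamp column name is passed in because the Watchdog export header is not shown here, so 'Date Time' in the usage line is hypothetical, as are the output names doy/month/year/hour (month matches the column used later in this notebook). add_time_columns is an illustrative stand-in, not the actual prepareTimeStamps implementation; clock_offset shows how a constant clock error could be corrected.

```python
import pandas as pd


def add_time_columns(df, ts_col, clock_offset=None):
    """Rename the raw timestamp column to 'Timestamp' and add doy/month/year/hour.

    Hypothetical helper; ts_col is whatever the logger export calls its
    timestamp column, and clock_offset is any string pd.Timedelta accepts.
    """
    out = df.rename(columns={ts_col: 'Timestamp'})  # returns a new frame
    out['Timestamp'] = pd.to_datetime(out['Timestamp'])
    if clock_offset is not None:
        # Shift every record by a constant, e.g. a logger clock set an hour off
        out['Timestamp'] = out['Timestamp'] + pd.Timedelta(clock_offset)
    out['doy'] = out['Timestamp'].dt.dayofyear
    out['month'] = out['Timestamp'].dt.month
    out['year'] = out['Timestamp'].dt.year
    out['hour'] = out['Timestamp'].dt.hour
    return out


raw = pd.DataFrame({'Date Time': ['2017-12-31 23:30', '2018-01-01 00:30'],
                    'TMP': [1.5, 1.2]})
print(add_time_columns(raw, 'Date Time', clock_offset='1h')[['Timestamp', 'doy', 'hour']])
```

Applying the offset before deriving the calendar columns matters: a one-hour shift can move a record across midnight or into a new year, as in the usage example.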

#### Create a quick panel of the variables of interest
Step through each column in the met record that contains data and plot it. This is a crude, first-pass plot.
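One way to sketch that panel, assuming a 'Timestamp' column and that the helper time columns are named doy/month/year/hour (both assumptions). columns_with_data and quick_panel are hypothetical names; rawSummaryPlots in watchdogutils may select and draw columns differently.

```python
import pandas as pd


def columns_with_data(df, exclude=('doy', 'month', 'year', 'hour')):
    """Numeric columns holding at least one non-missing value."""
    numeric = df.select_dtypes(include='number')
    return [c for c in numeric.columns if c not in exclude and numeric[c].notna().any()]


def quick_panel(df, time_col='Timestamp'):
    """Crude first-pass figure: one stacked subplot per populated column."""
    import matplotlib.pyplot as plt  # only needed when actually plotting

    cols = columns_with_data(df)
    fig, axes = plt.subplots(len(cols), 1, sharex=True, squeeze=False,
                             figsize=(10, 2 * len(cols)))
    for ax, col in zip(axes[:, 0], cols):
        ax.plot(df[time_col], df[col], lw=0.5)
        ax.set_ylabel(col)
    return fig
```

Keeping the column selection separate from the plotting makes it easy to check which sensors actually reported before drawing anything.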

#### Create a variable by variable QA/QC framework
There are two types of measurements made by the Watchdogs: core and ancillary. The core measurements are air temperature, relative humidity, anemometer (wind) measurements, rainfall, and some variables calculated from those core measurements. Ancillary measurements come from sensors plugged into the Watchdog logger. Currently, we record two soil temperature and two soil moisture measurements at each logger: one pair at 5 cm depth under shrubs and one pair at 5 cm depth in the open.
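A minimal sketch of a variable-by-variable framework, grouped into core and ancillary sets. The column names and plausible-range bounds below are placeholders chosen for illustration, not the project's QA/QC thresholds, and range_check is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Hypothetical plausible-range limits (lower, upper); None means unbounded.
# Column names and bounds are placeholders, not project thresholds.
QC_LIMITS = {
    'core': {'AirTemp': (-40, 50), 'RH': (0, 100), 'Rain': (0, None)},
    'ancillary': {'SoilTemp': (-20, 60), 'VWC': (0, 0.6)},
}


def range_check(df, limits=QC_LIMITS):
    """Return a cleaned copy plus a boolean frame marking values set to NaN."""
    cleaned = df.copy()
    flagged = pd.DataFrame(False, index=df.index, columns=df.columns)
    for group in limits.values():
        for col, (lo, hi) in group.items():
            if col not in cleaned:
                continue  # this logger lacks the sensor
            bad = pd.Series(False, index=df.index)
            if lo is not None:
                bad |= cleaned[col] < lo
            if hi is not None:
                bad |= cleaned[col] > hi
            # Missing values compare False, so they are never flagged here
            flagged[col] = bad
            cleaned.loc[bad, col] = np.nan
    return cleaned, flagged
```

Returning the flag mask alongside the cleaned data keeps a record of which points were removed, which is what the diagnostic QA/QC plots need.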

In [5]:
# Quickly list all the files in the data directory
fileList = next(os.walk(dataDir))[2]
fileList

['Jemez_8_2_6_18.txt',
 'Jemez_5_2_6_18.txt',
 'Jemez_3_2_6_18.txt',
 'Jemez_4_2_6_18.txt',
 'Jemez_1_2_6_18.txt',
 'Jemez_6_2_6_18.txt',
 'Jemez_7_2_6_18.txt',
 'Jemez_9_2_6_18.txt',
 'Jemez_10_2_6_18.txt',
 'Jemez_2_2_6_18.txt']

### The processing steps above are rolled into a loop
that iterates over the entire list of climate files. The result is a time series of raw and cleaned primary and ancillary measurements for each weather station, along with diagnostic plots that show the QA/QC steps taken to clean VWC and TMP.

In [6]:
# Diagnostic plot creation
# Usage: Step through the processing functions imported from watchdogutils in
#        a loop whose iterator is the file name in the list of met station
#        data files. The result is a set of .tif files, one for each met
#        station. The auxiliary sensors then get cleaned using a median filter,
#        and diagnostic QA/QC plots are produced showing which points are
#        replaced with NaN.

import warnings

filteredDir = 'Z:/JFSP_2015/Weather Stations/Data/Filtered/'

# Suppress warnings only inside this block; the previous warning filters
# are restored automatically when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    for metfile in fileList:
        metdf = parseAndReadMetData(dataDir, metfile)
        metdf_a = prepareTimeStamps(metdf)
        rawSummaryPlots(metdf_a, outDir)
        plotWindRose(metdf_a, outDir)

        # Clean the ancillary soil sensors
        cleanVWC(metdf_a, outDir)
        cleanTMP(metdf_a, outDir)

        # Save as <Locale>_<LoggerID>_filtered.csv; .iloc[0] takes the first
        # row by position regardless of the index type
        filteredFN = (metdf_a['Locale'].iloc[0] + '_' +
                      str(metdf_a['LoggerID'].iloc[0]) + '_filtered.csv')
        metdf_a.to_csv(filteredDir + filteredFN)

        tempSummaryPlot(metdf_a)
        precipSummaryPlot(metdf_a)
        VWCSummaryPlot(metdf_a)
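The loop comment says the auxiliary sensors are cleaned with a median filter. One common form of that idea is sketched here: compare each reading with a centred rolling median and replace points that deviate by more than a fixed amount with NaN. The window and max_dev defaults are illustrative assumptions, and median_filter_clean is a hypothetical name; cleanVWC and cleanTMP may use different rules.

```python
import pandas as pd


def median_filter_clean(series, window=5, max_dev=0.05):
    """Replace spikes with NaN using a centred rolling-median filter.

    window: number of records in the moving window (illustrative default).
    max_dev: allowed absolute deviation from the local median, in the
             variable's own units (0.05 is an assumed value for VWC).
    """
    # min_periods=1 lets the window shrink at the ends of the record
    med = series.rolling(window, center=True, min_periods=1).median()
    outliers = (series - med).abs() > max_dev
    return series.mask(outliers), outliers


vwc = pd.Series([0.20, 0.21, 0.20, 0.22, 0.90, 0.21, 0.20, 0.21, 0.20])
cleaned, outliers = median_filter_clean(vwc)
print(outliers.tolist())  # only the 0.90 spike is True
```

A median is used rather than a mean because a single spike barely moves the local median, so the spike stands out clearly against it.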

### Generating precip summary figures -- single site
for each site is the next goal. We want a quick figure that shows cumulative rainfall, plus the min, mean, max, and variance of temperature (air and soil) and of VWC, by cover. Bar and box plots make the most sense here, probably by month to start. The month column generated with the timestamps makes grouping the pandas dataframe by month easy. Start by creating cumulative precip for a single plot, by month.

In [4]:
#example
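A possible sketch for the monthly precip summary, assuming the rainfall column (hypothetically named 'Rain') holds the amount recorded in each logging interval, and that year and month columns exist. Grouping by year as well as month keeps, for example, January 2017 and January 2018 separate. monthly_precip is a hypothetical helper.

```python
import pandas as pd


def monthly_precip(df, rain_col='Rain'):
    """Monthly rainfall totals and their running (cumulative) sum.

    Assumes rain_col holds per-interval amounts (hypothetical column name).
    """
    totals = df.groupby(['year', 'month'])[rain_col].sum()
    return pd.DataFrame({'total': totals, 'cumulative': totals.cumsum()})


demo = pd.DataFrame({'year': [2017, 2017, 2017, 2018],
                     'month': [11, 11, 12, 1],
                     'Rain': [1.0, 2.0, 0.5, 4.0]})
print(monthly_precip(demo))
```

For a single station, monthly_precip(metdf_a)['total'].plot(kind='bar') would give the bar chart, with the cumulative column available for a line overlay.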

### Generating temp summary figures -- single site
We should do the same for temperature -- both air and soil temp.

In [5]:
#example
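A matching sketch for temperature, computing the min, mean, max, and variance per month for whichever temperature columns are present. The column names (air temperature plus one soil temperature per cover) are hypothetical placeholders for the actual Watchdog export names.

```python
import pandas as pd


def monthly_temp_summary(df, temp_cols=('AirTemp', 'SoilTemp_Shrub', 'SoilTemp_Open')):
    """Min, mean, max and (sample) variance per month for each temperature column.

    temp_cols are hypothetical names; missing ones are skipped.
    """
    cols = [c for c in temp_cols if c in df]
    return df.groupby('month')[cols].agg(['min', 'mean', 'max', 'var'])


demo = pd.DataFrame({'month': [1, 1, 2, 2], 'AirTemp': [0.0, 2.0, 5.0, 7.0]})
print(monthly_temp_summary(demo))
```

The result has a two-level column index (variable, statistic), so summary.loc[:, ('AirTemp', 'mean')] pulls out a single series for plotting.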

### Generating a locale-wide summary
The Jemez locale weather stations are tightly clustered along FR 287, just south of the Valles Caldera off Highway 4. Spanning just under 4 km, the weather stations are distributed across an elevation gradient of ~200 m (2392 m to 2591 m) that roughly tracks latitude.

A quick descriptive summary of the means and variances of the met variables by elevation and/or aspect will be useful in the future. To generate those plots, however, we first need to gather all the met data into a single dataframe and then subset or group it by aspect, elevation, etc.

In [6]:
filteredDataDir = 'Z:/JFSP_2015/Weather Stations/Data/Filtered/'
fileList = next(os.walk(filteredDataDir))[2]

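# Read every filtered station record and concatenate once. ignore_index=True
# gives the combined frame a fresh 0..n-1 index; without it each station's
# index labels repeat, which breaks later label-based lookups.
allMetData = pd.concat([pd.read_csv(filteredDataDir + df_f) for df_f in fileList],
                       ignore_index=True)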

In [7]:
metadataFN = 'Z:/JFSP_2015/Weather Stations/Weatherstation_Metadata.csv'
metadata = pd.read_csv(metadataFN)
# Loggers missing from the metadata keep the default 'Flat' aspect
allMetData['Aspect'] = 'Flat'
for ID in np.unique(metadata.LoggerID):
    stationAspect = metadata[metadata.LoggerID == ID].iloc[0].Aspect
    # .loc writes into allMetData itself; the chained form
    # allMetData.Aspect[mask] = ... may act on a copy (SettingWithCopyWarning)
    allMetData.loc[allMetData.LoggerID == ID, 'Aspect'] = stationAspect


In [8]:
months = ['J','F','M','A','M','J','J','A','S','O','N','D']
# pd.unique keeps the order of first appearance and does not depend on the
# index labels, so it works even if the concatenated frame repeats labels
monthsInDF = pd.unique(allMetData['month'])
monthLabels = [months[m - 1] for m in monthsInDF]

In [None]:
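monthsInDF, monthLabels = getMonthLabels(allMetData)

# Monthly TMP distributions split by station aspect, labeled with month initials
ax = sns.boxplot(x='month', y='TMP', data=allMetData, palette=['white', 'gray'],
                 hue='Aspect', order=monthsInDF)
ax.set_xticklabels(monthLabels)
ax.set_xlabel('Month')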
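A tabular companion to the boxplot: the means and variances per aspect and month that the locale-wide summary calls for. summarize_by is a hypothetical helper; grouping by elevation would work the same way, assuming an elevation column is first joined in from the metadata file.

```python
import pandas as pd


def summarize_by(df, value_col, by=('Aspect', 'month')):
    """Mean, sample variance and record count of one variable per group."""
    return (df.groupby(list(by))[value_col]
              .agg(['mean', 'var', 'count'])
              .reset_index())


demo = pd.DataFrame({'Aspect': ['N', 'N', 'S', 'S'], 'month': [1, 1, 1, 1],
                     'TMP': [1.0, 3.0, 5.0, 9.0]})
print(summarize_by(demo, 'TMP'))
```

Including the count column makes it easy to spot groups whose statistics rest on only a few records, for example months with long logger outages.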

