## Temperature Density Profiles

This notebook is an early attempt to replicate the daily minimum and maximum temperature distribution profiles provided to us by the DFU.

**Goal: plot the daily minimum and maximum PDFs for the WRF data bias-corrected to a station on the same plot as the station data used for bias correction, to compare with a similar plot that DFU shared as an Excel workbook.**


In [None]:
import numpy as np
import pandas as pd
import xarray as xr
import scipy.stats as stats
import calendar
import climakitae as ck

pd.options.plotting.backend = 'holoviews'

In [None]:
app = ck.Application()

### Step 1: Retrieve bias-corrected data for a station 

First we'll read in some **bias-corrected station data**. For ease of reproducibility, we have preset the data selections: air temperature at the Burbank-Glendale-Pasadena Airport for 1985-2010. If you would like to modify the selections, or see how the data can be selected, uncomment the `app.select()` line in the cell below to open a panel that shows all of the data options.

In [None]:
## preset location and data selections for ease here

app.location.data_type = "Station"
app.location.station=['Burbank-Glendale-Pasadena Airport']
app.selections.variable = "Air Temperature at 2m"
app.selections.unit = "degF" 
app.selections.resolution = "3 km"
app.selections.time_slice = (1985, 2010)

# app.select()

In [None]:
data = app.retrieve().squeeze() # retrieves the data, and drops any singleton dimensions (scenario, in this case)
data = app.load(data)

In [None]:
# examine the dataset for information
data

### Step 2: Calculate daily min and max temperature distributions

#### Step 2a: Calculate daily min and max temperatures
Because the bias-corrected data are hourly, we need to calculate the daily minimum and maximum values. We do this below with the built-in xarray method `resample`, which groups the data into 1-day periods; calling `.max()` or `.min()` on the result returns the maximum or minimum of each period as a daily time series.

In [None]:
t2_dailymax = data.resample(time="1D").max() # daily maximum from hourly data
t2_dailymin = data.resample(time="1D").min() # daily minimum from hourly data
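
To see what the resampling step does, here is a minimal sketch on synthetic data. The two days of hourly values below are made up for illustration and are not the notebook's station data; the variable names are likewise hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two days of synthetic hourly values (hypothetical, not notebook data):
# day 1 holds 0..23 and day 2 holds 24..47
time = pd.date_range("2000-01-01", periods=48, freq=pd.Timedelta(hours=1))
hourly = xr.DataArray(np.arange(48, dtype=float), coords={"time": time}, dims="time")

# Group into 1-day periods, then reduce each period to a single value
daily_max = hourly.resample(time="1D").max()
daily_min = hourly.resample(time="1D").min()

print(daily_max.values)  # [23. 47.]
print(daily_min.values)  # [ 0. 24.]
```

The 48 hourly values collapse to two daily values, one per calendar day.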

#### Step 2b: Calculate the probability density function for daily maximum and minimum temperature

We'll do this with the scipy `stats.norm` distribution and its `pdf` method, which evaluates the probability density function of a normal distribution. The wrapper function `data_pdf` fits a normal distribution to each available simulation, using that simulation's mean and standard deviation, and evaluates its PDF at the given bins.

In [None]:
def data_pdf(data, bins, ext):
    """Fits a normal distribution to each simulation and evaluates its pdf at `bins`.
    Returns a DataFrame with one column per simulation, named <simulation>_<ext>"""

    pdfs = {}

    # same process for every simulation
    for sim in range(len(data.simulation.values)):
        data_sim = data.isel(simulation=sim)
        data_sim_arr = data_sim.to_array() # converts to a data-array, as stats can only be calculated on a single array at a time
        data_sim_mean, data_sim_std = data_sim_arr.mean(), data_sim_arr.std() # calculates the mean, standard deviation
        data_sim_snd = stats.norm(data_sim_mean.values, data_sim_std.values) # normal distribution with that mean and std. deviation
        col_name = str(data_sim.simulation.values) + "_" + str(ext) # simulation name and max/min extension
        pdfs[col_name] = data_sim_snd.pdf(bins) # calculates the pdf

    # dataframe of pdf values, for easy plotting and export
    return pd.DataFrame(pdfs)
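
The core of `data_pdf` is a parametric fit: it summarises the data by a mean and standard deviation and evaluates the matching normal curve. The sketch below isolates that step on a synthetic sample. The sample values and variable names are assumptions for illustration, not notebook data.

```python
import numpy as np
import scipy.stats as stats

# Synthetic daily maxima in degF (hypothetical values, not notebook data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=70.0, scale=5.0, size=1000)

bins = np.arange(20, 121, 1)

# Normal distribution parameterised by the sample mean and the
# population standard deviation (ddof=0, the NumPy and xarray default)
fitted = stats.norm(sample.mean(), sample.std())
pdf_values = fitted.pdf(bins)

# The density peaks at the bin closest to the sample mean
peak_bin = bins[np.argmax(pdf_values)]
print(peak_bin, round(float(sample.mean()), 2))
```

Because the curve is always a normal distribution, any skew or bimodality in the underlying temperatures will not show up in the result; a histogram of the raw values would be needed to check for that.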

Next we set up the temperature bins at which to evaluate the PDF. We are interested in the range from 20°F to 120°F at a 1°F interval. Because `np.arange` excludes its stop value, the upper end of the range has +1 added so that the last bin is 120 (and not 119).

In [None]:
lowest_temp = 20
highest_temp = 120
bins = np.arange(lowest_temp, highest_temp+1, 1)
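
As a quick check of the end-point behaviour, this short sketch compares the two ranges (the variable names are illustrative only):

```python
import numpy as np

without_offset = np.arange(20, 120, 1)     # stop value is excluded
with_offset = np.arange(20, 120 + 1, 1)    # +1 keeps 120 in the range

print(without_offset[-1], with_offset[-1], with_offset.size)  # 119 120 101
```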

Now we calculate the PDF for a specific month. First, we need to select just the data for that month, using the `grab_months` function defined below. Pass the month as a number (Jan=1, Dec=12). We use February (month=2) as an example here, but you can change it to any month.

In [None]:
def grab_months(data, month):
    """Grabs the specific month of interest and returns DataSet of all years for that month.
    Month must be passed as a number"""
    data_months = data.groupby('time.month').groups
    month_idxs = data_months[month]
    return data.isel(time=month_idxs)
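
An alternative way to select a month, shown here only as a sketch on synthetic data, is a boolean mask built from the `.dt.month` accessor. The one-year daily series below is made up for illustration; this is not how `grab_months` is implemented, although it should select the same time steps.

```python
import numpy as np
import pandas as pd
import xarray as xr

# One non-leap year of synthetic daily values (hypothetical, not notebook data)
time = pd.date_range("2001-01-01", "2001-12-31", freq="D")
daily = xr.DataArray(np.arange(time.size), coords={"time": time}, dims="time")

# Keep only the time steps whose month number is 2 (February)
month = 2
selected = daily.sel(time=daily.time.dt.month == month)

print(selected.sizes["time"])  # 28
```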

In [None]:
month = 2 # default of February
t2_dailymax_monthly = grab_months(t2_dailymax, month=month)
t2_dailymin_monthly = grab_months(t2_dailymin, month=month)

Calculate the daily PDFs for that month below. 

In [None]:
maxtemp_pdf = data_pdf(t2_dailymax_monthly, bins=bins, ext='max')
mintemp_pdf = data_pdf(t2_dailymin_monthly, bins=bins, ext='min')

Combine the dataframes together so that they are all in a single location, and can be easily visualized and exported to a .csv file. 

In [None]:
bins_df = pd.DataFrame(data=bins, columns=['Temperature'])
df_to_plot = pd.concat([bins_df, maxtemp_pdf, mintemp_pdf], axis=1, join="inner")
df_to_plot = df_to_plot.set_index('Temperature')
df_to_plot.head()
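
One detail of the `pd.concat` call worth knowing: with `axis=1` and `join="inner"`, rows are matched on the index, and only index labels present in every frame are kept. The frames above line up because they are all built from the same `bins`. The sketch below uses small made-up frames (hypothetical column names and values) to show both the matching case and what happens when one frame is shorter.

```python
import numpy as np
import pandas as pd

# Small made-up frames sharing the default 0..4 integer index
bins_df = pd.DataFrame({"Temperature": np.arange(20, 25)})
max_df = pd.DataFrame({"simA_max": np.linspace(0.0, 0.4, 5)})
min_df = pd.DataFrame({"simA_min": np.linspace(0.4, 0.0, 5)})

combined = pd.concat([bins_df, max_df, min_df], axis=1, join="inner")
combined = combined.set_index("Temperature")
print(combined.shape)  # (5, 2)

# If one frame is shorter, the inner join silently drops the unmatched rows
truncated = pd.concat([bins_df, max_df.iloc[:4]], axis=1, join="inner")
print(truncated.shape)  # (4, 2)
```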

#### Step 2c: Visualize the results
Plot distributions of daily maximum and minimum temperature for a selected month over a set of years. Remember, here we are using data from 1985-2010 as our baseline and displaying the results for February, but you can choose any month above. Try different months to see how the PDFs vary.

In [None]:
df_to_plot.plot(xlabel="Temperature (degF)",
                grid=True, # adds gridlines for easier interpretation
                title="PDFs for " + str(app.location.station[0]) + "\n" + calendar.month_name[month],
               )

### Step 3: Export PDF values to a .csv file
Lastly, we'll export the dataframe of PDF values to a .csv file. It includes the temperature bins and the maximum and minimum PDFs for each simulation.
- **QUESTION**: The original spreadsheet has a "grand total", which appears to be the sum of each column. Do we include this?

In [None]:
filename = "temperature_pdfs_{0}.csv".format(app.location.station[0].replace(" ", "_")).lower()
df_to_plot.to_csv(filename, index=True)
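
On the "grand total" question above: if the total in the original spreadsheet is indeed a column sum (an assumption; the workbook is not reproduced here), then for PDF columns it works as a sanity check. With 1°F bins, the sum of each column times the bin width approximates the area under the curve, which should be close to 1 when the 20-120°F range covers the distribution. The sketch below uses made-up column names and parameters to illustrate this.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

bins = np.arange(20, 121, 1)
bin_width = 1  # degF

# Hypothetical PDF columns (parameters chosen for illustration only)
pdf_df = pd.DataFrame(
    {
        "simA_max": stats.norm(70.0, 5.0).pdf(bins),
        "simA_min": stats.norm(45.0, 4.0).pdf(bins),
    },
    index=pd.Index(bins, name="Temperature"),
)

# Column sum times bin width approximates the integral of each PDF
grand_total = pdf_df.sum() * bin_width
print(grand_total.round(6))
```

A total noticeably below 1 would suggest that part of the distribution falls outside the chosen bin range.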