## Wind Speed Function
We have the following functions for determining growing and killing degree days, $g(h)$ and $k(h)$ respectively:
$$
g(h) = \begin{cases}
0, &h \leq 10 \\
h-10, &10 < h \leq 28 \\
18, &h > 28
\end{cases}
\qquad
k(h) = \begin{cases}
0, &h \leq 28 \\
h-28, &h > 28
\end{cases}
$$
where $h$ is the mean temperature and the computed value is then multiplied by the length of the growing season. We will base our initial function for killing wind speeds on a similar formulation:
$$g_k(w) = \begin{cases}
0, &w \leq 15 \\
w - 15, &w > 15
\end{cases}$$
As with degree days, this value is then multiplied by $t$, the estimated duration of the event (rounded to the nearest hour).
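
The piecewise definitions above can be sketched with array operations. This is a minimal illustration rather than the notebook's own implementation: the function names, the sample readings, and the choice to weight each wind event by its duration in hours are assumptions made here for the example.

```python
import numpy as np

def growing_degree(h, lower=10.0, upper=28.0):
    """0 below `lower`, h - lower in between, capped at upper - lower."""
    return np.clip(np.asarray(h, dtype=float), lower, upper) - lower

def killing_degree(h, upper=28.0):
    """0 up to `upper`, h - upper above it."""
    return np.maximum(np.asarray(h, dtype=float) - upper, 0.0)

def killing_wind(w, threshold=15.0):
    """0 up to `threshold`, w - threshold above it (same shape as killing_degree)."""
    return np.maximum(np.asarray(w, dtype=float) - threshold, 0.0)

# Hypothetical readings, used only to exercise the functions
temps = np.array([5.0, 10.0, 20.0, 28.0, 35.0])
print(growing_degree(temps))   # values: 0, 0, 10, 18, 18
print(killing_degree(temps))   # values: 0, 0, 0, 0, 7

# Assumed aggregation: each event's excess wind speed weighted by its duration in hours
speeds = np.array([10.0, 20.0, 30.0])
durations = np.array([2.0, 3.0, 1.0])
season_total = np.sum(durations * killing_wind(speeds))
print(season_total)            # 0*2 + 5*3 + 15*1 = 30
```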
We also might want to compute a "growing wind" variable:
$$g_g(w) = \begin{cases}
0, &w \leq 2 \\
w - 2, &2 < w \leq 7 \\
12-w, &7 < w \leq 12 \\
0, &w > 12
\end{cases}$$
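
The "growing wind" variable is a tent-shaped function that rises from 2, peaks at 7, and falls back to zero at 12. A short sketch, assuming the thresholds above and a hypothetical function name `growing_wind`:

```python
import numpy as np

def growing_wind(w, low=2.0, high=12.0):
    """Tent function: 0 outside (low, high), rising then falling, peak at the midpoint."""
    w = np.asarray(w, dtype=float)
    # w - low is the rising branch and high - w the falling branch; the smaller
    # of the two is the active branch, and negative values are clipped to 0
    return np.clip(np.minimum(w - low, high - w), 0.0, None)

# Hypothetical wind speeds covering every branch
print(growing_wind([0, 2, 5, 7, 10, 12, 15]))  # values: 0, 0, 3, 5, 2, 0, 0
```

The `min` form matches the piecewise definition only because the two branches cross at the midpoint (7 for the defaults), which is where the case boundaries sit.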

Next steps:
- Run a 4th-order polynomial to determine where the drop-off is.
- Response function: the goal is to map temperature to yield. As you move along x, the log-yield response changes. Take the estimated response function learned from the data: $y = \beta_1\,\text{GDD} + \beta_2\,\text{KDD}$.
- Include fixed effects and trends. Replace temperature with wind speed and estimate a polynomial in the wind speed.
- Include average wind speed up to quadratic terms.
- Different rows: temperature response function and wind response function.
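
One way the planned regression could look, sketched on synthetic data: county fixed effects enter as dummy columns, a linear time trend is added, and average wind speed enters up to a quadratic term. Everything here (the generated data, the coefficient values, the variable names) is made up for illustration and is not taken from the yield or wind files.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic panel: 10 hypothetical counties observed over 8 years
n_counties, n_years = 10, 8
county = np.repeat(np.arange(n_counties), n_years)
year = np.tile(np.arange(2007, 2007 + n_years), n_counties)
n = county.size

gdd = rng.normal(1500.0, 100.0, size=n)
kdd = rng.normal(50.0, 15.0, size=n)
wind = rng.uniform(2.0, 8.0, size=n)   # season-average wind speed

# Made-up "true" response: quadratic in wind with a maximum at wind = 5
county_effect = rng.normal(0.0, 0.1, size=n_counties)
log_yield = (5.0 + county_effect[county] + 0.01 * (year - 2007)
             + 4e-4 * gdd - 5e-3 * kdd
             + 0.10 * wind - 0.01 * wind**2
             + rng.normal(0.0, 0.005, size=n))

# Design matrix: one dummy per county (these absorb the intercept),
# a linear trend, GDD, KDD, and wind up to the quadratic term
dummies = (county[:, None] == np.arange(n_counties)).astype(float)
trend = (year - year.min()).astype(float)
X = np.column_stack([dummies, trend, gdd, kdd, wind, wind**2])

coef, *_ = np.linalg.lstsq(X, log_yield, rcond=None)
b_wind, b_wind2 = coef[-2], coef[-1]

# For a concave quadratic, the response peaks at -b1 / (2 * b2)
peak = -b_wind / (2.0 * b_wind2)
print("wind coefficients:", b_wind, b_wind2)
print("estimated wind speed at peak response:", peak)
```

Because the county dummies sum to one in every row, no separate intercept column is included; adding one would make the design matrix rank deficient.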


In [1]:
### Import basic libraries
import numpy as np
import pandas as pd
import sklearn as sk
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures
import matplotlib.pyplot as plt
%matplotlib inline
import time
import glob

In [11]:
kill_threshold = 25
power = 3
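def ws_func(w):
    # Killing-wind contribution of one reading: excess over the threshold, raised to `power`
    if w <= kill_threshold:
        return 0.0
    else:
        return (w-kill_threshold)**power
# otypes pins the output to float; otherwise np.vectorize infers the dtype from the
# first result, and a leading integer 0 would cast every later value to int
ws_func_v = np.vectorize(ws_func, otypes=[float])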

def gws_func(w):
    # Growing-wind term is disabled for now (always 0); the piecewise version is commented out below
    return 0
#     if w <= 2:
#         return 0
#     elif w <= 7:
#         return w - 2
#     elif w <= 12:
#         return 12 - w
#     else:
#         return 0
gws_func_v = np.vectorize(gws_func)
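
`np.vectorize` is a convenience wrapper around a Python loop, and unless `otypes` is given it infers the output dtype from the first element it evaluates. A function that returns the integer `0` for a below-threshold first reading can therefore have later fractional results cast to integers. The sketch below shows an array-native alternative; the name `killing_wind_power` and the sample speeds are assumptions for illustration.

```python
import numpy as np

def killing_wind_power(w, threshold=25.0, power=3):
    """Excess wind speed over `threshold`, raised to `power`; 0 at or below it."""
    w = np.asarray(w, dtype=float)
    excess = np.maximum(w - threshold, 0.0)
    return excess ** power

# Hypothetical readings: two below or at the threshold, two above it
speeds = np.array([10.0, 25.0, 26.5, 30.0])
print(killing_wind_power(speeds))   # values: 0, 0, 3.375, 125

# For comparison: a scalar function returning int 0, wrapped without otypes.
# The output dtype is inferred from the first result, so 3.375 may be truncated.
naive = np.vectorize(lambda w: 0 if w <= 25.0 else (w - 25.0) ** 3)
print(naive(speeds), naive(speeds).dtype)
```

Operating on whole arrays also avoids one Python call per reading, which matters for high-frequency wind data.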

In [3]:
# Let's read in a lot of wind data!
county_yield = pd.read_csv("../../data/data_yield/USA_county_yield_gsw.csv")
kansas = county_yield.loc[county_yield['State'] == 'KANSAS']
filenames = glob.glob('../../direcho_compact/*.csv')

In [12]:
def compress_data(height = 'wind_10ms', filenames = filenames):
    print(height)
    data_all = []
    for i in range(len(filenames)):
        # Read and get names
        wind_i = pd.read_csv(filenames[i])
        county_name = filenames[i].split("_")[-2].upper()
        county_data = kansas.loc[kansas['County'] == county_name]
        # Ensure year has both wind and corn data -only regress on these years (2007-2014)
        c_year = county_data["Year"].unique()
        w_year = wind_i["Year"].unique()
        years = np.intersect1d(c_year, w_year)
        county_gs = county_data[county_data["Year"].isin(years)]
        wind_gs = wind_i[wind_i["Year"].isin(years)]
        wind_gs = wind_gs[(wind_gs["Month"] <= 10) & (wind_gs["Month"] >= 4)]
        for y in years:
            wind_gs_y = wind_gs.loc[wind_gs["Year"] == y]
            county_gs_y = county_gs.loc[county_gs["Year"] == y]
            w_gs_y = wind_gs_y[height]
            #Calculate KWD and GWD value for season
            growing_winds = np.sum(gws_func_v(w_gs_y))
            killing_winds = np.sum(ws_func_v(w_gs_y))
            # Year, County, log-yield, GDD, KDD
            data = np.append(county_gs_y.iloc[:,[1, 5, 12, 14, 15]].values, [growing_winds, killing_winds]).tolist()
            data_all.append(data)
    return(data_all)
def evaluate(data):
    data_arr = np.array(data)
    ln_yield = data_arr[:,2].astype(np.float64)
    covariates = data_arr[:,3:7].astype(np.float64)
    #Linear Model
    print("Linear Model")
    model = LinearRegression()
    model.fit(covariates, ln_yield)
    y_preds = model.predict(covariates)
    print("\tr-squared:\t", r2_score(ln_yield, y_preds))
    print("\tCoefficients\t", model.coef_)
    print("\tIntercept:\t", model.intercept_)
    #Hmm that's not great...let's try a polynomial transform?
    print("Polynomial Model")
    poly = PolynomialFeatures(3)
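    # Cubic expansion (including a bias column) of the last covariate, the killing-wind sum
    cov_poly = poly.fit_transform(covariates[:,-1].reshape(-1,1))
    #print(cov_poly)
    # Note: only column 0 (GDD) is kept alongside the wind polynomial; KDD is not included
    covariates_poly = np.hstack((covariates[:,0:1], cov_poly))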
    model = LinearRegression()
    model.fit(covariates_poly, ln_yield)
    y_preds = model.predict(covariates_poly)
    print("\tr-squared:\t", r2_score(ln_yield, y_preds))
    print("\tCoefficients\t", model.coef_)
    print("\tIntercept:\t", model.intercept_)

In [13]:
data_all = compress_data()
##wind coefficient is still positive at 40m, 60m, 80m, 100m (error at 120m)
evaluate(data_all)

wind_10ms
Linear Model
	r-squared: 0.606497866585681
	Coefficients [ 0.02162    -1.17882103  0.         -0.00379882]
	Intercept: 5.114316390074879
Polynomial Model
	r-squared:	 0.3740280194199118
	Coefficients	 [-3.02619131e-01  2.62147888e-15 -1.16730866e-04 -6.56190633e-04
  2.59446070e-05]
	Intercept:	 7.481937677363444


In [6]:
data_all = compress_data(height = 'wind speed at 140m (m/s)') #first?
evaluate(data_all)

wind speed at 140m (m/s)


Linear Model
	r-squared: 0.5485441782722307
	Coefficients [-1.49738040e-02 -1.01448181e+00 -2.25592413e-05 -1.47284642e-05]
	Intercept: 5.748358582070504
Polynomial Model
	r-squared: 0.3554558710266311
	Coefficients [-2.95944999e-01  4.12367140e-08  7.15642651e-04 -6.46228357e-07
  1.63367311e-10]
	Intercept: 7.215668881361662


The following regressions were performed on compressed 5-minute data.

In [7]:
filenames2 = glob.glob('C:\\Users\\david\\EPSCI_168\\data_5\\*.csv')
data_all2 = compress_data(height = 'wind_10ms', filenames = filenames2)
evaluate(data_all2)

wind_10ms
Linear Model
	r-squared: 0.669770714393117
	Coefficients [ 1.72915303e-01 -1.50997608e+00  1.98447620e-05  1.13093654e-02]
	Intercept: 3.38079886553727
Polynomial Model
	r-squared: 0.3680440909747744
	Coefficients [-3.47826815e-01 -2.83035043e-15 -4.77219744e-02  3.25273670e-03
 -6.25148315e-05]
	Intercept: 8.044889928766509


In [15]:
data_all2 = compress_data(height = 'wind_100ms', filenames = filenames)
evaluate(data_all2)

wind_100ms
Linear Model
	r-squared: 0.614533752509945
	Coefficients [ 3.15017309e-02 -1.18805784e+00  0.00000000e+00  6.86814991e-05]
	Intercept: 5.007172877609628
Polynomial Model
	r-squared:	 0.3826499262096752
	Coefficients	 [-3.04291280e-01  8.12780926e-09 -1.51800235e-04  1.82618206e-07
 -3.53610814e-11]
	Intercept:	 7.498810926716393


In [18]:
test = pd.read_csv(filenames[0]).dtypes
test

Unnamed: 0      int64
Year            int64
Month           int64
Day             int64
Hour            int64
wind_10ms     float64
wind_40ms     float64
wind_100ms    float64
temp_100      float64
dtype: object

In [None]:
print("test")