### Heuristic Model
Look at the Seattle weather data in the **data** folder. Come up with a heuristic model to predict whether it will rain tomorrow. Keep in mind that this is a time series, which means that you only know what happened historically (before a given date). One example of a heuristic model is: it will rain tomorrow if it rained more than 1 inch (PRCP > 1.0) today. Describe your heuristic model in the next cell.
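The example rule can be sketched on a handful of rows. This is only an illustration: the values in `sample` are made up, and the names `sample`, `guess_for_tomorrow` and `result` are hypothetical; only the `PRCP` and `RAIN` column names come from the data set.

```python
import pandas as pd

# Made-up sample rows; the real file has the same PRCP and RAIN columns.
sample = pd.DataFrame({
    "PRCP": [0.0, 1.2, 0.3, 0.0, 1.5],
    "RAIN": [False, True, True, False, True],
})

# Guess made on each day about the following day, using only that day's PRCP.
guess_for_tomorrow = sample["PRCP"] > 1.0

# What actually happened on the following day.
# The last row has no "tomorrow", so it is filled with False and dropped below.
rain_tomorrow = sample["RAIN"].shift(-1, fill_value=False)

result = pd.DataFrame({
    "guess": guess_for_tomorrow,
    "rain_tomorrow": rain_tomorrow,
}).iloc[:-1]
result["correct"] = result["guess"] == result["rain_tomorrow"]
print(result)
```

Because the guess for a given row only reads that row's `PRCP`, it never uses information from the day being predicted.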

In [108]:
#here is an example of how to build and populate a heuristic model

import pandas as pd

df = pd.read_csv('../data/seattle_weather_1948-2017.csv')

numrows = 25549 # can be as large as 25549

#create a placeholder dataframe to hold numrows values
heuristic_df = pd.DataFrame({'yesterday':[0.0]*numrows,
                             'today':[0.0]*numrows,
                             'tomorrow':[0.0]*numrows,
                             'guess':[False]*numrows, #logical guess
                             'rain_tomorrow':[False]*numrows, #historical observation
                             'correct':[False]*numrows}) #TRUE if your guess matches the historical observation

#sort columns for convenience
seq = ['yesterday','today','tomorrow','guess','rain_tomorrow','correct']
heuristic_df = heuristic_df.reindex(columns=seq)

In [109]:
df.head()

Unnamed: 0,DATE,PRCP,TMAX,TMIN,RAIN
0,1948-01-01,0.47,51,42,True
1,1948-01-02,0.59,45,36,True
2,1948-01-03,0.42,45,35,True
3,1948-01-04,0.31,45,34,True
4,1948-01-05,0.17,45,32,True


In [110]:
heuristic_df.head()

Unnamed: 0,yesterday,today,tomorrow,guess,rain_tomorrow,correct
0,0.0,0.0,0.0,False,False,False
1,0.0,0.0,0.0,False,False,False
2,0.0,0.0,0.0,False,False,False
3,0.0,0.0,0.0,False,False,False
4,0.0,0.0,0.0,False,False,False


Build a loop to add your heuristic model guesses as a column to this dataframe.

In [111]:
# here is a loop that populates the dataframe created earlier
# with the precipitation from yesterday, today and tomorrow
# the guess is left at its default of False; replace it with your own heuristic
for z in range(numrows):
    # start at time 2 in the data frame
    i = z + 2
    # pull values from the dataframe
    yesterday = df.iloc[(i-2),1]
    today = df.iloc[(i-1),1]
    tomorrow = df.iloc[i,1]
    rain_tomorrow = df.iloc[i,4] # RAIN column (True/False), not PRCP
    
    heuristic_df.iat[z,0] = yesterday
    heuristic_df.iat[z,1] = today
    heuristic_df.iat[z,2] = tomorrow
    heuristic_df.iat[z,3] = False # set guess default to False
    heuristic_df.iat[z,4] = rain_tomorrow
    
    if heuristic_df.iat[z,3] == heuristic_df.iat[z,4]:
        heuristic_df.iat[z,5] = True
    else:
        heuristic_df.iat[z,5] = False
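The same lagged columns can also be built without an explicit loop. The following is a minimal sketch using `Series.shift`, assuming a short made-up series `prcp` in place of `df['PRCP']`; the name `lagged` is hypothetical.

```python
import pandas as pd

# Made-up precipitation values standing in for df['PRCP'].
prcp = pd.Series([0.47, 0.59, 0.42, 0.31, 0.17])

# shift(k) moves values down by k rows, so each row sees earlier days.
lagged = pd.DataFrame({
    "yesterday": prcp.shift(2),  # two rows before "tomorrow"
    "today": prcp.shift(1),      # one row before "tomorrow"
    "tomorrow": prcp,
})

# The first two rows have no full history; drop them and renumber from 0.
lagged = lagged.dropna().reset_index(drop=True)
print(lagged)
```

On the real data, `dropna()` would also remove rows with missing precipitation, so the row count can differ from the loop version until `dropna()` is applied there as well.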

### Evaluate the performance of the Heuristic model

***the accuracy of your predictions***

In [112]:
heuristic_df['correct'].value_counts()/numrows

True     0.57333
False    0.42667
Name: correct, dtype: float64
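Since `correct` is a boolean column, its mean is the fraction of correct guesses. A small sketch on made-up values (the series `correct_demo` is hypothetical):

```python
import pandas as pd

# Made-up outcomes: three correct guesses out of four.
correct_demo = pd.Series([True, True, False, True])

# The mean of a boolean series is the share of True values, i.e. the accuracy.
accuracy = correct_demo.mean()
print(accuracy)

# value_counts(normalize=True) gives the same proportions for both outcomes.
print(correct_demo.value_counts(normalize=True))
```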

<hr>

### Linear Regression with scikit-learn
Build a linear regression model using the last two days of weather data (use today and yesterday to predict tomorrow). Use the examples from the linear regression folder and make changes to the assignment code to accomplish this task. 

Only use scikit-learn to train your model.  

In [113]:
data = heuristic_df.dropna()
data = data[['yesterday', 'today', 'tomorrow']]
data

Unnamed: 0,yesterday,today,tomorrow
0,0.47,0.59,0.42
1,0.59,0.42,0.31
2,0.42,0.31,0.17
3,0.31,0.17,0.44
4,0.17,0.44,0.41
...,...,...,...
25544,0.00,0.00,0.00
25545,0.00,0.00,0.00
25546,0.00,0.00,0.00
25547,0.00,0.00,0.00


In [114]:
X = data[['yesterday', 'today']]
y = data['tomorrow']

In [115]:
X.shape, y.shape

((25542, 2), (25542,))

<br>

In [116]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

model = LinearRegression(fit_intercept=True)

model.fit(X, y)

predictions = model.predict(X)

r2_score(y, predictions)

0.09817161023749199
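The score above is computed on the same rows the model was fitted on. Because the data is a time series, one option is to hold out the most recent rows instead. The sketch below assumes synthetic precipitation values rather than the Seattle data, and the names `demo`, `X_demo`, `y_demo` and `lin` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic daily precipitation, only to make the sketch runnable.
rng = np.random.default_rng(0)
prcp = pd.Series(rng.gamma(shape=0.5, scale=0.2, size=500))

demo = pd.DataFrame({
    "yesterday": prcp.shift(2),
    "today": prcp.shift(1),
    "tomorrow": prcp,
}).dropna()

X_demo = demo[["yesterday", "today"]]
y_demo = demo["tomorrow"]

# shuffle=False keeps date order: fit on the first 80%, test on the last 20%.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)

lin = LinearRegression().fit(X_tr, y_tr)
print("train R^2:", r2_score(y_tr, lin.predict(X_tr)))
print("test R^2: ", r2_score(y_te, lin.predict(X_te)))
```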

<hr>

### Logistic Regression Prediction Model with scikit-learn

Build a logistic regression prediction model with scikit-learn, using at least three input variables, for the Seattle weather data.

The model will predict if it is going to rain tomorrow (true or false).

In [137]:
data = heuristic_df.dropna()
data = data[['yesterday', 'today', 'tomorrow', 'rain_tomorrow']]
data.head()

Unnamed: 0,yesterday,today,tomorrow,rain_tomorrow
0,0.47,0.59,0.42,True
1,0.59,0.42,0.31,True
2,0.42,0.31,0.17,True
3,0.31,0.17,0.44,True
4,0.17,0.44,0.41,True


In [138]:
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
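Note that `X` here includes the `tomorrow` column, which is the precipitation on the very day that `rain_tomorrow` describes, so the model is shown information it would not have at prediction time. A minimal sketch of an alternative follows. It assumes made-up precipitation values, a hypothetical `two_days_ago` lag as the third input (not in the notebook), and that a day counts as rainy when its precipitation is above zero.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Made-up precipitation: roughly half the days are dry.
rng = np.random.default_rng(0)
wet = rng.random(400) < 0.5
prcp = pd.Series(np.where(wet, rng.random(400), 0.0))

# Inputs only use days strictly before the day being predicted.
features = pd.DataFrame({
    "two_days_ago": prcp.shift(3),
    "yesterday": prcp.shift(2),
    "today": prcp.shift(1),
})
# Assumed label: it rained on the predicted day if precipitation was above zero.
target = prcp > 0

# Keep only rows where all three lags are available.
valid = features.notna().all(axis=1)
X_lag = features[valid].values
y_lag = target[valid].values

clf = LogisticRegression().fit(X_lag, y_lag)
print(clf.predict(X_lag[:5]))
```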

<br>

Changing `y`'s type from `bool` to `float` will fix the error that occurs below when evaluating the model with `r2_score`. Is there a way to fix it without changing the type?

In [139]:
# y = y.astype(float)

<br>

In [140]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import r2_score

In [141]:
# split dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [142]:
# instantiate model, train it and predict test set labels 
model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

In [143]:
X_test[0].reshape(1, -1)

array([[0.  , 0.17, 0.03]])

In [144]:
import numpy as np

prcp_yesterday = 0.4
prcp_today = 0.17
prcp_tomorrow = 0.03

print(f'Amount of precipitation:\nyesterday:  {prcp_yesterday}\ntoday:      {prcp_today}\ntomorrow:   {prcp_tomorrow}')
print('\nWill it rain tomorrow?')
print(model.predict(np.array([prcp_yesterday, prcp_today, prcp_tomorrow]).reshape(1, -1)))

Amount of precipitation:
yesterday:  0.4
today:      0.17
tomorrow:   0.03

Will it rain tomorrow?
[False]


In [145]:
# validate model
r2_score(y_test, y_pred)

TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
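One way to evaluate the classifier without changing `y`'s type is to use a classification metric instead of `r2_score`, which is a regression metric. A minimal sketch with `accuracy_score` and `confusion_matrix` from `sklearn.metrics`, using made-up boolean arrays (`y_true_demo` and `y_pred_demo` are hypothetical stand-ins for `y_test` and `y_pred`):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up boolean labels standing in for y_test and y_pred.
y_true_demo = np.array([True, False, True, True, False])
y_pred_demo = np.array([True, False, False, True, True])

# Fraction of predictions that match the observed labels.
acc = accuracy_score(y_true_demo, y_pred_demo)
print(acc)

# Rows are the true classes (False, True); columns are the predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo)
print(cm)
```

With the notebook's own variables this would be `accuracy_score(y_test, y_pred)`; `model.score(X_test, y_test)` reports the same mean accuracy for a scikit-learn classifier.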