### Heuristic Models (Cost Function Extension)
Look at the Seattle weather data in the **data** folder. Come up with a heuristic model to predict whether it will rain on a given day. Keep in mind that this is a time series, so you only know what happened historically (before the date you are predicting). One example of a heuristic model is: it will rain tomorrow if it rained more than 1 inch (>1.0 PRCP) today. Describe your heuristic model in the next cell.

**your model here**  

Examples:  

If it rained yesterday, it will rain today.  
If it rained yesterday or the day before, it will rain today.
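
The two example rules above can be written as boolean expressions over lagged precipitation. The sketch below uses a small made-up series rather than the Seattle file, and assumes a `PRCP` column like the one in the dataset. `shift(1)` looks one day back, so each guess depends only on earlier days.

```python
import pandas as pd

# Toy precipitation values in inches (illustrative, not from the Seattle file)
toy = pd.DataFrame({'PRCP': [0.0, 0.3, 0.0, 0.0, 0.5, 0.2]})

rained = toy['PRCP'] > 0.0

# shift(n) moves values n rows down, so row t sees what happened on day t - n;
# the first n rows have no history and are treated as "no rain"
rained_yesterday = rained.shift(1, fill_value=False)
rained_two_days_ago = rained.shift(2, fill_value=False)

# Rule 1: if it rained yesterday, it will rain today
toy['guess_rule_1'] = rained_yesterday

# Rule 2: if it rained yesterday or the day before, it will rain today
toy['guess_rule_2'] = rained_yesterday | rained_two_days_ago

print(toy)
```

Treating the first rows as "no rain" is only one possible choice; dropping them is equally reasonable.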

In [1]:
# here is an example of how to build and populate a heuristic model

import pandas as pd

df = pd.read_csv('../data/seattle_weather_1948-2017.csv')

numrows = 25549 # can be as large as 25549

# create an empty dataframe to hold numrows values
heuristic_df = pd.DataFrame({'yesterday':[0.0]*numrows,
                             'today':[0.0]*numrows,
                             'tomorrow':[0.0]*numrows,
                             'guess':[False]*numrows,  # logical guess
                             'rain_tomorrow':[False]*numrows,  # historical observation
                             'correct':[False]*numrows,  # TRUE if your guess matches the historical observation
                             'true_positive':[False]*numrows,  # TRUE If you said it would rain and it did
                             'false_positive':[False]*numrows,  # TRUE if you said it would rain and it didn't
                             'true_negative':[False]*numrows,  # TRUE if you said it wouldn't rain and it didn't
                             'false_negative':[False]*numrows})  # TRUE if you said it wouldn't rain and it did

# sort columns for convenience
seq = ['yesterday',
       'today',
       'tomorrow',
       'guess',
       'rain_tomorrow',
       'correct',
       'true_positive',
       'false_positive',
       'true_negative',
       'false_negative']

heuristic_df = heuristic_df.reindex(columns=seq)

In [2]:
df#.head()

Unnamed: 0,DATE,PRCP,TMAX,TMIN,RAIN
0,1948-01-01,0.47,51,42,True
1,1948-01-02,0.59,45,36,True
2,1948-01-03,0.42,45,35,True
3,1948-01-04,0.31,45,34,True
4,1948-01-05,0.17,45,32,True
...,...,...,...,...,...
25546,2017-12-10,0.00,49,34,False
25547,2017-12-11,0.00,49,29,False
25548,2017-12-12,0.00,46,32,False
25549,2017-12-13,0.00,48,34,False


In [3]:
heuristic_df#.head()

Unnamed: 0,yesterday,today,tomorrow,guess,rain_tomorrow,correct,true_positive,false_positive,true_negative,false_negative
0,0.0,0.0,0.0,False,False,False,False,False,False,False
1,0.0,0.0,0.0,False,False,False,False,False,False,False
2,0.0,0.0,0.0,False,False,False,False,False,False,False
3,0.0,0.0,0.0,False,False,False,False,False,False,False
4,0.0,0.0,0.0,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...
25544,0.0,0.0,0.0,False,False,False,False,False,False,False
25545,0.0,0.0,0.0,False,False,False,False,False,False,False
25546,0.0,0.0,0.0,False,False,False,False,False,False,False
25547,0.0,0.0,0.0,False,False,False,False,False,False,False


Build a loop to add your heuristic model guesses as a column to this dataframe

In [4]:
# here is an example loop that populates the dataframe created earlier
# with the precipitation from yesterday, today, and tomorrow
# then the guess is set to True if it rained both yesterday and today

for row in range(numrows):

    # start at row 2 of df so that the two previous days exist
    i = row + 2

    # pull precipitation values from the dataframe (column 1 is PRCP)
    yesterday = df.iloc[(i - 2), 1]
    today = df.iloc[(i - 1), 1]
    tomorrow = df.iloc[i, 1]
    rain_tomorrow = bool(tomorrow > 0.0)  # observed outcome as a boolean

    heuristic_df.iat[row, 0] = yesterday
    heuristic_df.iat[row, 1] = today
    heuristic_df.iat[row, 2] = tomorrow
    heuristic_df.iat[row, 3] = False  # set guess default to False
    heuristic_df.iat[row, 4] = rain_tomorrow

    # example heuristic
    if today > 0.0 and yesterday > 0.0:
        heuristic_df.iat[row, 3] = True

    if heuristic_df.iat[row, 3] == heuristic_df.iat[row, 4]:
        heuristic_df.iat[row, 5] = True

        if heuristic_df.iat[row, 3]:
            heuristic_df.iat[row, 6] = True  # true positive
        else:
            heuristic_df.iat[row, 8] = True  # true negative

    else:
        heuristic_df.iat[row, 5] = False

        if heuristic_df.iat[row, 3]:
            heuristic_df.iat[row, 7] = True  # false positive
        else:
            heuristic_df.iat[row, 9] = True  # false negative
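
The nested `if` branches in the loop sort every row into one of four outcomes. As a minimal sketch of the same bookkeeping without a loop, the flags can be derived from the guess and the observation with element-wise boolean operators. The values below are made up for illustration; only the column names are taken from the dataframe above.

```python
import pandas as pd

# Hypothetical guesses and observations (illustrative values only)
flags = pd.DataFrame({
    'guess':         [True, True, False, False, True],
    'rain_tomorrow': [True, False, False, True, True],
})

guess = flags['guess']
observed = flags['rain_tomorrow']

flags['correct'] = guess == observed
flags['true_positive'] = guess & observed      # said rain, it rained
flags['false_positive'] = guess & ~observed    # said rain, it stayed dry
flags['true_negative'] = ~guess & ~observed    # said dry, it stayed dry
flags['false_negative'] = ~guess & observed    # said dry, it rained

# Each row falls into exactly one of the four outcome columns
outcome_cols = ['true_positive', 'false_positive', 'true_negative', 'false_negative']
print(flags[outcome_cols].sum())
```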

In [5]:
heuristic_df

Unnamed: 0,yesterday,today,tomorrow,guess,rain_tomorrow,correct,true_positive,false_positive,true_negative,false_negative
0,0.47,0.59,0.42,True,True,True,True,False,False,False
1,0.59,0.42,0.31,True,True,True,True,False,False,False
2,0.42,0.31,0.17,True,True,True,True,False,False,False
3,0.31,0.17,0.44,True,True,True,True,False,False,False
4,0.17,0.44,0.41,True,True,True,True,False,False,False
...,...,...,...,...,...,...,...,...,...,...
25544,0.00,0.00,0.00,False,False,True,False,False,True,False
25545,0.00,0.00,0.00,False,False,True,False,False,True,False
25546,0.00,0.00,0.00,False,False,True,False,False,True,False
25547,0.00,0.00,0.00,False,False,True,False,False,True,False


In [6]:
df

Unnamed: 0,DATE,PRCP,TMAX,TMIN,RAIN
0,1948-01-01,0.47,51,42,True
1,1948-01-02,0.59,45,36,True
2,1948-01-03,0.42,45,35,True
3,1948-01-04,0.31,45,34,True
4,1948-01-05,0.17,45,32,True
...,...,...,...,...,...
25546,2017-12-10,0.00,49,34,False
25547,2017-12-11,0.00,49,29,False
25548,2017-12-12,0.00,46,32,False
25549,2017-12-13,0.00,48,34,False


### Evaluate the performance of the Heuristic model

***split data into training and testing***

In [7]:
from sklearn.model_selection import train_test_split 

# split the data into training (70%) and testing (30%) subsets
X_train, X_test, y_train, y_test = train_test_split(df, df['RAIN'], test_size=0.3)

X_train

Unnamed: 0,DATE,PRCP,TMAX,TMIN,RAIN
24028,2013-10-14,0.00,60,39,False
10549,1976-11-18,0.00,45,37,False
14029,1986-05-30,0.00,83,57,False
17635,1996-04-13,0.02,59,45,True
22168,2008-09-10,0.00,73,50,False
...,...,...,...,...,...
9372,1973-08-29,0.00,72,56,False
20569,2004-04-25,0.00,70,44,False
22269,2008-12-20,0.16,26,14,True
15446,1990-04-16,0.00,77,50,False


In [8]:
X_test

Unnamed: 0,DATE,PRCP,TMAX,TMIN,RAIN
12881,1983-04-08,0.06,54,38,True
6824,1966-09-07,0.00,72,54,False
7681,1969-01-11,0.21,39,34,True
10327,1976-04-10,0.08,67,42,True
24136,2014-01-30,0.00,47,43,False
...,...,...,...,...,...
8483,1971-03-24,0.00,49,35,False
9981,1975-04-30,0.00,67,41,False
19607,2001-09-06,0.00,61,48,False
2359,1954-06-17,0.00,60,43,False
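
The split above shuffles rows, so training and testing days are interleaved in time. Because the weather data is a time series, a chronological split may be a better fit for evaluating a forecast. The sketch below shows one way to do that with the `shuffle=False` option of `train_test_split`, on a toy frame that stands in for the real data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for the weather data: ten consecutive days
toy = pd.DataFrame({
    'DATE': pd.date_range('2017-01-01', periods=10, freq='D'),
    'RAIN': [True, False, True, True, False, False, True, False, True, True],
})

# shuffle=False keeps the original row order, so the test set is the most
# recent 30% of days and every training day comes before every test day
train, test = train_test_split(toy, test_size=0.3, shuffle=False)

print(train['DATE'].max(), test['DATE'].min())
```

This assumes the rows are already sorted by date, as they are in the dataframe shown earlier.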


***the accuracy of your predictions***

In [9]:
# we used this simple approach in the first part to see what percent of the time we were correct
# calculated as (true positive + true negative)/ number of guesses
heuristic_df['correct'].value_counts()/numrows

True     0.671611
False    0.328389
Name: correct, dtype: float64

***the precision of your predictions***

In [10]:
# precision is the percent of your positive predictions that are correct
# more specifically it is calculated as (num true positive)/(num true positive + num false positive)

heuristic_df['true_positive'].sum() / (heuristic_df['true_positive'].sum() + heuristic_df['false_positive'].sum())

0.674109000138677

***the recall of your predictions***

In [11]:
# recall is the percent of the actual positives (rainy days) that you predicted correctly
# more specifically it is calculated as (num true positive)/(num true positive + num false negative)

heuristic_df['true_positive'].sum() / (heuristic_df['true_positive'].sum() + heuristic_df['false_negative'].sum())

0.44592239244106047
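
The same quantities are available from `sklearn.metrics`, which can serve as a cross-check on the hand-computed columns. This is a sketch on made-up labels (1 = rain, 0 = no rain), not on the Seattle results.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical observations and guesses (illustrative values only)
y_true = [1, 0, 0, 1, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = precision_score(y_true, y_pred)  # tp / (tp + fp)
recall = recall_score(y_true, y_pred)        # tp / (tp + fn)

print(tp, fp, fn, tn, precision, recall)
```

On the real data, the equivalent call would pass `heuristic_df['rain_tomorrow']` and `heuristic_df['guess']` as the two arguments.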

***The sum of squared error (SSE) of your predictions***

In [12]:
import numpy as np

def sse(y_true, y_pred):
    '''returns sum of squared errors (actual vs model)'''
    squared_errors = (y_true - y_pred) ** 2
    return np.sum(squared_errors)


sse(y_true=heuristic_df['rain_tomorrow'].astype('int'), y_pred=heuristic_df['guess'].astype('int'))

8390

In [13]:
from sklearn.metrics import mean_squared_error

mean_squared_error(y_true=heuristic_df['rain_tomorrow'].astype('int'), y_pred=heuristic_df['guess'].astype('int'))

0.3283885866374418
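
For 0/1 labels every squared error is either 0 or 1, so the SSE counts the wrong guesses and the MSE equals SSE divided by the number of guesses, which is 1 minus the accuracy. This agrees with the outputs above: 8390 / 25549 ≈ 0.3284, the same as the `False` share of `correct`. A small sketch on made-up labels shows the relationship.

```python
import numpy as np

# Hypothetical 0/1 observations and guesses (illustrative values only)
y_true = np.array([1, 0, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0])

n = len(y_true)
sse_value = np.sum((y_true - y_pred) ** 2)   # counts the wrong guesses
mse_value = sse_value / n                    # fraction of wrong guesses
accuracy = np.mean(y_true == y_pred)         # fraction of right guesses

print(sse_value, mse_value, accuracy)
```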