## Heuristic Models (Cost Function Extension)
Look at the Seattle weather in the data folder. Come up with a heuristic model to predict if it will rain today. Keep in mind this is a time series, which means that you only know what happened historically (before a given date).



### The model :

**It will rain tomorrow if it rained more than 0.7 inch (>0.7 PRCP) today.**

In [41]:
# here is an example of how to build and populate a hurestic model

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/daniel-dc-cd/data_science/master/module_4_ML/data/seattle_weather_1948-2017.csv')

numrows = 25549 # can be as large as 25549

#create an empty dataframe to hold 100 values
heuristic_df = pd.DataFrame({'yesterday':[0.0]*numrows,
                             'today':[0.0]*numrows,
                             'tomorrow':[0.0]*numrows,
                             'guess':[False]*numrows,  # logical guess
                             'rain_tomorrow':[False]*numrows,  # historical observation
                             'correct':[False]*numrows,  # TRUE if your guess matches the historical observation
                             'true_positive':[False]*numrows,  # TRUE If you said it would rain and it did
                             'false_positive':[False]*numrows,  # TRUE If you sait id would rain and it didn't
                             'true_negative':[False]*numrows,  # TRUE if you said it wouldn't rain and it didn't
                             'false_negative':[False]*numrows})  # TRUE if you said it wouldn't raing and it did

# sort columns for convience
seq = ['yesterday',
       'today',
       'tomorrow',
       'guess',
       'rain_tomorrow',
       'correct',
       'true_positive',
       'false_positive',
       'true_negative',
       'false_negative']

heuristic_df = heuristic_df.reindex(columns=seq)



In [42]:
df.head()

Unnamed: 0,DATE,PRCP,TMAX,TMIN,RAIN
0,1948-01-01,0.47,51,42,True
1,1948-01-02,0.59,45,36,True
2,1948-01-03,0.42,45,35,True
3,1948-01-04,0.31,45,34,True
4,1948-01-05,0.17,45,32,True


In [43]:
heuristic_df.head()

Unnamed: 0,yesterday,today,tomorrow,guess,rain_tomorrow,correct,true_positive,false_positive,true_negative,false_negative
0,0.0,0.0,0.0,False,False,False,False,False,False,False
1,0.0,0.0,0.0,False,False,False,False,False,False,False
2,0.0,0.0,0.0,False,False,False,False,False,False,False
3,0.0,0.0,0.0,False,False,False,False,False,False,False
4,0.0,0.0,0.0,False,False,False,False,False,False,False


In [77]:
# here is an example loop that populates the dataframe created earlier
# with the total percip from yesterday and today
# then the guess is set to true if rained both yesterday and today 

for _ in range(numrows):
    
    # start at time 2 in the data frame
    i = _ + 2
    
    # pull values from the dataframe
    yesterday = df.iloc[(i - 2), 1]
    today = df.iloc[(i - 1), 1]
    tomorrow = df.iloc[i, 1]
    rain_tomorrow = df.iloc[(i), 1]
    


   
    heuristic_df.iat[_, 0] = yesterday
    heuristic_df.iat[_, 1] = today
    heuristic_df.iat[_, 2] = tomorrow
    heuristic_df.iat[_, 3] = False # set guess default to False
    heuristic_df.iat[_, 4] = rain_tomorrow
    
      # example hueristic
    
    #today > 0.0 and yesterday > 0.0 
    #It will rain tomorrow if it rained more than 1 inch (>1.0 PRCP) today.
    
    if today > 0.7 : 
        heuristic_df.iat[_, 3] = True
        
    if heuristic_df.iat[_, 3] == heuristic_df.iat[_, 4]:
        heuristic_df.iat[_, 5] = True     #correct
        
        if heuristic_df.iat[_, 3] == True:
            heuristic_df.iat[_, 6] = True  # true positive
        else:
            heuristic_df.iat[_, 8] = True  # true negative
        
    else:
        heuristic_df.iat[_, 5] = False
        
        if heuristic_df.iat[_, 3] == True:
            heuristic_df.iat[_, 7] = True  # false positive
        else:
            heuristic_df.iat[_, 9] = True  # false negative

        



heuristic_df

Unnamed: 0,yesterday,today,tomorrow,guess,rain_tomorrow,correct,true_positive,false_positive,true_negative,false_negative
0,0.47,0.59,0.42,False,True,False,False,False,False,True
1,0.59,0.42,0.31,False,True,False,False,False,False,True
2,0.42,0.31,0.17,False,True,False,False,False,False,True
3,0.31,0.17,0.44,False,True,False,False,False,False,True
4,0.17,0.44,0.41,False,True,False,False,False,False,True
5,0.44,0.41,0.04,False,True,False,False,False,False,True
6,0.41,0.04,0.12,False,True,False,False,False,False,True
7,0.04,0.12,0.74,False,True,False,True,False,False,True
8,0.12,0.74,0.01,True,True,True,True,False,False,True
9,0.74,0.01,0.00,False,False,True,False,False,True,False


In [70]:
from sklearn.model_selection import train_test_split 

# enter split function here to make h_train and h_test subsets of the data
X_train, X_test, y_train, y_test = train_test_split(df, df['RAIN'], test_size=0.3)



In [71]:
#The accuracy of predicitions 

heuristic_df['correct'].value_counts()/numrows

True     0.595287
False    0.404713
Name: correct, dtype: float64

In [72]:
#The precision of predicitions
# precision is the percent of your postive prediction which are correct

heuristic_df['true_positive'].sum() / (heuristic_df['true_positive'].sum() + heuristic_df['false_positive'].sum())


0.911543287327478

In [73]:
#the recall of your predicitions

# recall the percent of the time you are correct when you predict positive

heuristic_df['true_positive'].sum() / (heuristic_df['true_positive'].sum() + heuristic_df['false_negative'].sum())

0.1176708778749595

In [78]:
heuristic_df['false_negative'].sum()

10895

In [74]:
#The sum of squared error (SSE) of your predictions

import numpy as np

def sse(y_true, y_pred):
    '''returns sum of squared errors (actual vs model)'''
    squared_errors = (y_true - y_pred) ** 2
    return np.sum(squared_errors)


sse(y_true=heuristic_df['rain_tomorrow'].astype('int'), y_pred=heuristic_df['guess'].astype('int'))

10340

In [75]:
from sklearn.metrics import mean_squared_error

mean_squared_error(y_true=heuristic_df['rain_tomorrow'].astype('int'), y_pred=heuristic_df['guess'].astype('int'))

0.40471251320991036