# Testing the Model
## How to use:
### If the test dataset is in the same format as the provided training dataset:
Please change the file path in section III.
### Else, if the dataset already matches the model's input and output, as follows:
Input:<br>
[demand on T-1248, demand on T-1247 ... demand on T, latitude, longitude] **1251** columns <br>
Output: <br>
[demand on T+1,... demand on T+5] **5** columns <br>
<br>
Please cast them to float and pass them directly to the prediction model in section IV.
### Else
Please preprocess the test set into one of the formats above. Apologies for any inconvenience.
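If your data already follows the second layout, a quick check before section IV can save a confusing TensorFlow shape error. The helper below, `to_model_arrays`, is a hypothetical sketch (not part of this notebook) that casts the features and, optionally, the targets to float and checks for 1251 input and 5 output columns. It accepts NumPy arrays, nested lists or DataFrames.

```python
import numpy as np

N_FEATURES = 1251  # 1248 lagged demands + demand at T + latitude + longitude
N_TARGETS = 5      # demand at T+1 ... T+5

def to_model_arrays(features, targets=None):
    """Hypothetical helper: cast data to float arrays and check the model's expected shapes."""
    # A single 1-D row is treated as a batch of one
    X = np.atleast_2d(np.asarray(features, dtype=float))
    if X.shape[1] != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} input columns, got {X.shape[1]}")
    if targets is None:
        return X, None
    Y = np.atleast_2d(np.asarray(targets, dtype=float))
    if Y.shape != (X.shape[0], N_TARGETS):
        raise ValueError(f"expected targets of shape {(X.shape[0], N_TARGETS)}, got {Y.shape}")
    return X, Y
```

With your own data (`my_features` and `my_targets` are placeholders here), `X, Y = to_model_arrays(my_features, my_targets)` gives arrays that can then go to `predict` in section IV.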
## Code:

## I. Import Libraries

In [65]:
### If any library is missing, uncomment the respective line below and pip install it
#!pip install tensorflow --upgrade
#!pip install h5py
#!pip install numpy --upgrade
#!pip install pandas
#!pip install dask --upgrade

## pygeohash, from https://pypi.org/project/pygeohash/
## Used instead of python-geohash by hiwi because of its better documentation

#!pip install pygeohash

In [66]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import pygeohash as pgh
from tqdm.notebook import tqdm as tqdm_notebook  # tqdm._tqdm_notebook is deprecated
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error,r2_score,mean_absolute_error
import random
import math
import pickle
tqdm_notebook.pandas()
%matplotlib inline
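To see at a glance which of the libraries above are missing before running the rest of the notebook, something like the following sketch can help. It uses only the standard library's `importlib.util.find_spec`. The module list is my assumption, taken from the imports above; note that two import names differ from their pip names (`sklearn` is installed as `scikit-learn`, and `tqdm_notebook` comes from the `tqdm` package).

```python
import importlib.util

# Top-level import names used in this notebook (assumed list)
REQUIRED = ["numpy", "pandas", "matplotlib", "tensorflow", "pygeohash", "tqdm", "sklearn"]

def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print("Missing:", missing_modules(REQUIRED) or "none")
```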

## II. Load Model

In [67]:
with open("training_parameters.pkl", "rb") as f:
    parameters, _, _, _ = pickle.load(f)
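The cell above assumes `training_parameters.pkl` holds a 4-tuple whose first element is a dict of weights `W1, b1, W2, b2, W3, b3`; the other three entries are ignored. A simple sanity check, sketched below, is to confirm that the weight matrices chain from 1251 inputs to 5 outputs. `check_parameters` is a hypothetical helper, and the dummy hidden-layer sizes (8 and 4) are made up for the example; the real sizes come from training.

```python
import numpy as np

def check_parameters(parameters, n_in=1251, n_out=5):
    """Check that W1 -> W2 -> W3 map n_in features to n_out outputs; return the hidden sizes."""
    W1, W2, W3 = (np.asarray(parameters[k]) for k in ("W1", "W2", "W3"))
    if W1.shape[0] != n_in or W3.shape[1] != n_out:
        raise ValueError(f"expected {n_in} inputs and {n_out} outputs, "
                         f"got {W1.shape[0]} and {W3.shape[1]}")
    if W1.shape[1] != W2.shape[0] or W2.shape[1] != W3.shape[0]:
        raise ValueError("hidden layer sizes of W1, W2 and W3 do not match")
    return W1.shape[1], W2.shape[1]

# Dummy weights with made-up hidden sizes 8 and 4, only to exercise the check
rng = np.random.default_rng(0)
dummy = {"W1": rng.standard_normal((1251, 8)),
         "W2": rng.standard_normal((8, 4)),
         "W3": rng.standard_normal((4, 5))}
print(check_parameters(dummy))  # prints (8, 4)
```

In the notebook this would simply be `check_parameters(parameters)` right after loading the pickle.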

## III. Load and Prepare Test Data
Here I use a 10% sample of the test set for debugging and as a sanity check.
<br><br>
### Please replace the file path below with the path to the actual test.csv
_**The test set should be in the same format as the training data, i.e. 4 columns: [geohash6, day, timestamp, demand]**_
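The preparation code in this section turns `day` and `timestamp` into one running index of 15-minute slots: a timestamp "H:M" becomes (H * 60 + M) / 15, and each earlier day adds 96 slots. A worked version of that formula is sketched below; `to_slot` is a hypothetical name, and it uses integer division on the assumption that timestamps always fall on 15-minute boundaries.

```python
def to_slot(day, timestamp):
    """Map a day number (starting at 1) and an "H:M" timestamp to a running 15-minute slot index."""
    hours, minutes = (int(part) for part in timestamp.split(":"))
    # 96 slots per day; integer division assumes minutes is a multiple of 15
    return (day - 1) * 96 + (hours * 60 + minutes) // 15

for day, ts in [(1, "0:0"), (1, "0:15"), (1, "23:45"), (2, "0:0"), (2, "20:0")]:
    print(day, ts, "->", to_slot(day, ts))  # slots 0, 1, 95, 96, 176
```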

In [4]:
df = pd.read_csv("10%test_sample.csv")

#### Data processing to match the input features

In [5]:
def string_to_time(string):
    '''Convert an "H:M" timestamp into a 15-minute slot index (0 to 95).'''
    hours, minutes = string.split(":")
    return (int(hours) * 60 + int(minutes)) / 15

print("Converting timestamp and day to a single number")
df['time_stamp'] = df['timestamp'].progress_apply(string_to_time)
# 96 slots per day, so day d starts at slot (d - 1) * 96
df['time_stamp'] = df['time_stamp'] + (df['day'] - 1) * 96

# One row per time slot, one column per geohash; missing demand becomes 0.
# Assumes every 15-minute slot occurs at least once, so that row offsets equal time offsets
df = pd.pivot_table(df, values='demand', index=['time_stamp'], columns=['geohash6'])
df = df.fillna(0)
hash_list = df.columns.values

WINDOW = 1248   # 13 days x 96 slots before T
HORIZON = 5     # targets T+1 ... T+5

# Collect rows in lists; adding DataFrame columns one at a time is very slow
X_rows, Y_rows = [], []
print("Converting each row to record 13 days of prior data, the demand from T to T+5, latitude and longitude")
for geohash in tqdm_notebook(hash_list):
    try:
        lat, lon = pgh.decode(geohash)
    except Exception:
        # Report and skip geohashes that cannot be decoded
        print("Could not decode geohash", geohash)
        continue
    series = df[geohash].values
    # Only build samples where there is demand at time T
    for i in range(WINDOW, len(series) - HORIZON):
        if series[i] > 0:
            X_rows.append(list(series[i - WINDOW:i + 1]) + [lat, lon])  # T-1248 ... T, lat, lon
            Y_rows.append(list(series[i + 1:i + 1 + HORIZON]))          # T+1 ... T+5

X = np.array(X_rows, dtype="float")
Y = np.array(Y_rows, dtype="float")

Converting timestamp and day to a single number

Converting each row to record 13 days of prior data, the demand from T to T+5, latitude and longitude
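The loop above builds one sample per time step T, from slot 1248 up to the sixth-from-last slot, wherever the demand at T is positive. For a single geohash the same windows can be cut without a Python-level loop using NumPy's `sliding_window_view` (NumPy 1.20 or later). The sketch below, `make_samples`, is a hypothetical alternative rather than the code that produced the results here; `window` and `horizon` are parameters so it can be checked on a toy series.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_samples(series, lat, lon, window=1248, horizon=5):
    """One sample per time T with window <= T < len(series) - horizon and demand at T > 0."""
    series = np.asarray(series, dtype=float)
    # Row j holds series[j : j + window + 1 + horizon], i.e. T = j + window
    frames = sliding_window_view(series, window + 1 + horizon)
    keep = frames[:, window] > 0          # demand at time T
    X = frames[keep, :window + 1]         # T - window ... T
    Y = frames[keep, window + 1:]         # T+1 ... T+horizon
    coords = np.tile([lat, lon], (X.shape[0], 1))
    return np.hstack([X, coords]), Y
```

The series must be at least `window + 1 + horizon` long, otherwise `sliding_window_view` raises a `ValueError`. Per-geohash results could then be stacked with `np.vstack`.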




## IV. Making Predictions

In [6]:
def forward_propagation(X, parameters):
    '''
    Compute the network output Z3, the predicted demand for T+1 ... T+5.
    The model is LINEAR -> RELU -> LINEAR -> RELU -> LINEAR, where each linear
    step is X @ W + b (@ being matrix multiplication) and RELU is the
    rectified linear activation.
    '''
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']

    Z1 = tf.add(tf.matmul(X, W1), b1)
    A1 = tf.nn.relu(Z1)
    Z2 = tf.add(tf.matmul(A1, W2), b2)
    A2 = tf.nn.relu(Z2)
    Z3 = tf.add(tf.matmul(A2, W3), b3)

    return Z3

def predict(X, parameters):
    '''
    Run the trained network on X (one row of 1251 features, or a 2-D array of
    such rows) and return one row of 5 predictions per input row.
    '''
    # Treat a single 1-D row as a batch of one
    X = np.atleast_2d(np.asarray(X, dtype=np.float32))

    # Build a separate TF1-style graph; tf.compat.v1 keeps placeholders and
    # sessions working on TensorFlow 2, where eager execution is the default
    with tf.Graph().as_default():
        params = {name: tf.convert_to_tensor(np.asarray(parameters[name], dtype=np.float32))
                  for name in ["W1", "b1", "W2", "b2", "W3", "b3"]}
        x = tf.compat.v1.placeholder(tf.float32, [None, X.shape[1]])
        z3 = forward_propagation(x, params)
        with tf.compat.v1.Session() as sess:
            prediction = sess.run(z3, feed_dict={x: X})

    return prediction
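Because the network is just three matrix multiplications with ReLU in between, its forward pass can also be written in plain NumPy. The sketch below (`forward_numpy`, a hypothetical name) mirrors `forward_propagation` and may be handy for cross-checking `predict` without building a TensorFlow graph. It assumes the biases broadcast across rows, as they do in the TensorFlow version.

```python
import numpy as np

def forward_numpy(X, parameters):
    """NumPy version of the LINEAR -> RELU -> LINEAR -> RELU -> LINEAR forward pass."""
    A = np.atleast_2d(np.asarray(X, dtype=np.float64))
    for layer in (1, 2):
        # Linear step followed by ReLU for the two hidden layers
        A = np.maximum(0.0, A @ parameters[f"W{layer}"] + parameters[f"b{layer}"])
    # Output layer is linear (no activation)
    return A @ parameters["W3"] + parameters["b3"]
```

If both implementations agree, `np.allclose(forward_numpy(X, parameters), predict(X, parameters), atol=1e-4)` should hold; the tolerance allows for `predict` running in float32.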

#### Using the functions above, predict the demand at T+1 to T+5 for each row<br> from 13 days (1248 time stamps) of prior demand, the demand at time T, and the latitude and longitude.

In [68]:
prediction = predict(X, parameters)
prediction[0,:]

array([0.12361688, 0.12361688, 0.12361688, 0.12361688, 0.12361688],
      dtype=float32)

In [69]:
actual = Y
actual[0,:]

array([0., 0., 0., 0., 0.])

## V. Results
#### Output the mean squared error and the R² score

In [70]:
results = mean_squared_error(actual,prediction)
r = r2_score(actual,prediction)
print("The mean squared error is ",results)
r

The mean squared error is  0.015390041585956171


-43.62715449239896
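For reference, with 2-D outputs and sklearn's default averaging, the two metrics above reduce to simple formulas: the MSE is the mean of all squared errors, and R² is computed per output column as 1 - SS_res / SS_tot and then averaged. R² drops below zero whenever the predictions are further from the actual values than the column means are, which is how a negative score like the one above can arise. The NumPy sketch below is my own re-implementation for illustration, not sklearn's code, and it assumes no column of `y_true` is constant (SS_tot would then be 0).

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of all squared errors (same as averaging the per-column MSEs)."""
    return float(np.mean((np.asarray(y_true, float) - np.asarray(y_pred, float)) ** 2))

def r2(y_true, y_pred):
    """R^2 per output column, then averaged; assumes no column of y_true is constant."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))
```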

In [19]:
results = mean_squared_error(actual,prediction)
r = r2_score(actual,prediction)
print("The mean squared error is ",results)
r

The mean squared error is  0.0010957312489462156


-2.1923454087188343

In [20]:
prediction

array([[0.00891849, 0.00766036, 0.00651336, 0.00614237, 0.00655985],
       [0.01737385, 0.01414197, 0.01108822, 0.00900204, 0.00809074],
       [0.00643334, 0.00576075, 0.00516869, 0.00528677, 0.00607155],
       ...,
       [0.00578769, 0.00509157, 0.00446879, 0.00457932, 0.00538735],
       [0.01271663, 0.01038673, 0.00818622, 0.00687822, 0.00658854],
       [0.01935568, 0.01549025, 0.01182064, 0.00920088, 0.00790395]],
      dtype=float32)

In [56]:
sum(Y)

array([32.5075363 , 36.80343662, 35.95972593, 29.78476868, 32.23377708])

In [57]:
sum(prediction)

array([956.6307 , 425.64618, 393.32312, 412.86932, 461.53748],
      dtype=float32)
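`sum(Y)` and `sum(prediction)` add up the rows with Python's built-in `sum`, giving the total demand per horizon (T+1 to T+5). The sketch below (`per_horizon_summary`, a hypothetical helper) computes the same totals with `.sum(axis=0)` and adds the MSE for each horizon, which can show whether the error grows with the forecast distance.

```python
import numpy as np

def per_horizon_summary(actual, prediction):
    """Per output column (T+1 ... T+5): total actual demand, total predicted demand and MSE."""
    actual = np.asarray(actual, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return {
        "actual_total": actual.sum(axis=0),
        "predicted_total": prediction.sum(axis=0),
        "mse": ((actual - prediction) ** 2).mean(axis=0),
    }
```

In this notebook it would be called as `per_horizon_summary(actual, prediction)`.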