## Building a Local TensorFlow Model to Train on the Data
**In this notebook I try to build a local TensorFlow model. However, training it locally on the full dataset is too slow, and the training run had to be interrupted, as you will see below.**

In [1]:
import pandas as pd
import numpy as np
import json
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [2]:
import tensorflow as tf
import tensorflow.feature_column as fc
import os, sys

In [3]:
tf.enable_eager_execution()

**Preparing the Data for the TensorFlow Model**

In [30]:
! gsutil cp gs://nyc_servicerequest/processedInput/train1.csv train1.csv

Copying gs://nyc_servicerequest/processedInput/train1.csv...
| [1 files][180.3 MiB/180.3 MiB]                                                
Operation completed over 1 objects/180.3 MiB.                                    


In [38]:
# Read the CSV that was just copied from GCS into the working directory
df = pd.read_csv('train1.csv')

In [39]:
df.head()

| Unnamed: 0.1 | Unnamed: 0 | day_period | day_of_week | zip_encode | location_encode | community_encode | agency_encode | complaint_encode | TimeTaken |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | evening | Mon-Tue | zip_bin4 | location_bin1 | community_bin2 | agency_bin6 | complaint_bin3 | 74.9 |
| 1 | 1 | afternoon | Fri-Sat-Sun | zip_bin4 | location_bin1 | community_bin2 | agency_bin6 | complaint_bin3 | 498.759 |
| 2 | 2 | morning | Mon-Tue | zip_bin4 | location_bin4 | community_bin2 | agency_bin5 | complaint_bin3 | 830.793 |
| 3 | 3 | night | Wed-Thu | zip_bin4 | location_bin4 | community_bin2 | agency_bin5 | complaint_bin3 | 26.873 |
| 4 | 4 | morning | Mon-Tue | zip_bin4 | location_bin4 | community_bin2 | agency_bin5 | complaint_bin3 | 40.107 |


In [20]:
# Keep only the seven feature columns and the label (drops the 'Unnamed' index columns)
df = df[['day_period', 'day_of_week', 'zip_encode',
         'location_encode', 'community_encode', 'agency_encode',
         'complaint_encode', 'TimeTaken']]

In [35]:
# Save the first 200 rows as a small sample file (uploaded to GCS further below)
df[:200].to_csv('demo.csv')

In [24]:
COLUMNS = ['day_period', 'day_of_week', 'zip_encode', 'location_encode',
       'community_encode', 'agency_encode', 'complaint_encode']

In [14]:
LABEL = 'TimeTaken'

In [22]:
unique_vals = dict()
unique_vals['day_period'] = ['morning', 'afternoon', 'evening', 'night']
unique_vals['day_of_week'] = ['Mon-Tue', 'Wed-Thu', 'Fri-Sat-Sun']
unique_vals['zip_encode'] = ['zip_bin1', 'zip_bin2', 'zip_bin3', 'zip_bin4']
unique_vals['location_encode'] = ['location_bin1', 'location_bin2', 'location_bin3', 'location_bin4']
unique_vals['community_encode'] = ['community_bin1', 'community_bin2', 'community_bin3']
unique_vals['agency_encode'] = ['agency_bin1', 'agency_bin2', 'agency_bin3', 'agency_bin4', 'agency_bin5', 'agency_bin6']
unique_vals['complaint_encode'] = ['complaint_bin1', 'complaint_bin2', 'complaint_bin3']
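The vocabularies above are hard-coded, so any value in the data that is not listed would, as I understand the TF 1.x defaults of `categorical_column_with_vocabulary_list` (`default_value=-1`, no out-of-vocabulary buckets), simply be ignored by the model. Below is a minimal stdlib sketch of a consistency check; `build_vocab` and `missing_from_vocab` are hypothetical helpers, and `agency_bin7` is a deliberately made-up value used only to show the check firing. With pandas, the per-column distinct values would come from `df[col].unique()`.

```python
# Hypothetical helpers (not part of the original notebook): derive each column's
# vocabulary from the data and report values the hard-coded lists would miss.
from typing import Dict, Iterable, List, Mapping


def build_vocab(rows: Iterable[Mapping[str, str]], columns: List[str]) -> Dict[str, List[str]]:
    """Return the sorted distinct values seen in each column."""
    seen = {col: set() for col in columns}
    for row in rows:
        for col in columns:
            seen[col].add(row[col])
    return {col: sorted(values) for col, values in seen.items()}


def missing_from_vocab(data_vocab: Dict[str, List[str]],
                       declared_vocab: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Values present in the data but absent from the declared vocabulary lists."""
    missing = {}
    for col, values in data_vocab.items():
        extra = [v for v in values if v not in declared_vocab.get(col, [])]
        if extra:
            missing[col] = extra
    return missing


# 'agency_bin7' is a made-up, out-of-vocabulary value for illustration only.
sample_rows = [
    {"day_period": "evening", "agency_encode": "agency_bin6"},
    {"day_period": "afternoon", "agency_encode": "agency_bin7"},
]
declared = {
    "day_period": ["morning", "afternoon", "evening", "night"],
    "agency_encode": ["agency_bin1", "agency_bin2", "agency_bin3",
                      "agency_bin4", "agency_bin5", "agency_bin6"],
}
vocab = build_vocab(sample_rows, ["day_period", "agency_encode"])
print(missing_from_vocab(vocab, declared))  # {'agency_encode': ['agency_bin7']}
```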


**Defining the Feature columns for the Model**

In [116]:
feature_columns = []

In [117]:
for each in COLUMNS:
    feature_columns.append(
        tf.feature_column.categorical_column_with_vocabulary_list(
            key = each,
            vocabulary_list = unique_vals[each]
        )
    )
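Conceptually, each vocabulary-list column maps a string to its position in the vocabulary, and a linear model then sees it as a one-hot indicator. The pure-Python sketch below (no TensorFlow) mirrors that idea under the assumption that out-of-vocabulary values get the default id `-1` and therefore activate no category; `vocab_lookup`, `one_hot` and the value `dawn` are illustrative names, not TensorFlow API.

```python
# Conceptual sketch of a vocabulary-list categorical column (pure Python, no TF).
from typing import List


def vocab_lookup(value: str, vocabulary: List[str], default_value: int = -1) -> int:
    """Index of `value` in `vocabulary`, or `default_value` if it is not listed."""
    try:
        return vocabulary.index(value)
    except ValueError:
        return default_value


def one_hot(value: str, vocabulary: List[str]) -> List[int]:
    """Indicator vector for `value`; all zeros when the value is out of vocabulary."""
    idx = vocab_lookup(value, vocabulary)
    return [1 if i == idx else 0 for i in range(len(vocabulary))]


day_period_vocab = ["morning", "afternoon", "evening", "night"]
print(vocab_lookup("evening", day_period_vocab))  # 2
print(vocab_lookup("dawn", day_period_vocab))     # -1 ('dawn' is a made-up value)
print(one_hot("night", day_period_vocab))         # [0, 0, 0, 1]
```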

In [119]:
#feature_columns

**Dividing data into Train and Test sets**

In [120]:
from sklearn.model_selection import train_test_split

In [121]:
X = df[COLUMNS]
y = df[LABEL]

In [122]:
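X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)
# Renumber rows from 0 without adding the old index as an extra column
X_train = X_train.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)
y_train = y_train.reset_index(drop=True)
y_test = y_test.reset_index(drop=True)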
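What matters for the split is that features and labels are selected with the same shuffled row positions, so each label stays paired with its row; resetting the index only renumbers rows. A minimal stdlib stand-in for `train_test_split` is sketched below. `split_positions` and `take` are hypothetical helpers; the test share is rounded up as scikit-learn does for a float `test_size`, but the shuffle order will not match scikit-learn's `random_state=5`.

```python
# Hypothetical stand-in for train_test_split, for illustration only.
import math
import random
from typing import List, Sequence, Tuple


def split_positions(n_rows: int, test_size: float = 0.2,
                    seed: int = 5) -> Tuple[List[int], List[int]]:
    """Shuffle row positions with a fixed seed; return (train, test) positions."""
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)
    n_test = math.ceil(n_rows * test_size)  # round the test share up
    return positions[n_test:], positions[:n_test]


def take(seq: Sequence, positions: List[int]) -> list:
    """Select items by position, in the given order."""
    return [seq[p] for p in positions]


# Toy data taken from df.head(): label i belongs to feature row i.
features = ["evening", "afternoon", "morning", "night", "morning"]
labels = [74.9, 498.759, 830.793, 26.873, 40.107]

train_pos, test_pos = split_positions(len(features))
X_tr, y_tr = take(features, train_pos), take(labels, train_pos)
X_te, y_te = take(features, test_pos), take(labels, test_pos)
print(len(X_tr), len(X_te))  # 4 1
```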

**Defining the Input Function**

In [123]:
def return_dict(df):
    retdict = dict()
    for each in COLUMNS:
        retdict[each] = df[each]
    return retdict

In [124]:
def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
    """Return an input function that the Estimator calls to build a tf.data pipeline."""
    def input_function():
        # One element per row: (dict of feature values, label)
        ds = tf.data.Dataset.from_tensor_slices((return_dict(data_df), label_df))
        if shuffle:
            ds = ds.shuffle(len(data_df))
        ds = ds.batch(batch_size).repeat(num_epochs)
        return ds

    return input_function
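The order of operations in `input_function` determines what the Estimator sees: `shuffle` then `batch` then `repeat`. As far as I know, `Dataset.shuffle` reshuffles on every pass by default, and because `batch` comes before `repeat`, no batch mixes rows from two epochs, so the last batch of each epoch can be short. The generator below is a pure-Python sketch of those semantics only (`input_batches` is a hypothetical name, not TensorFlow). Note also that a shuffle buffer of `len(data_df)` holds the whole training set in memory.

```python
# Pure-Python sketch of shuffle -> batch -> repeat; mirrors the semantics only.
import random
from typing import Dict, Iterator, List, Tuple


def input_batches(features: Dict[str, list], labels: list, batch_size: int = 32,
                  num_epochs: int = 10, shuffle: bool = True,
                  seed: int = 0) -> Iterator[Tuple[Dict[str, list], list]]:
    """Yield (feature_dict, label_list) batches."""
    n = len(labels)
    rng = random.Random(seed)
    for _ in range(num_epochs):                # repeat(num_epochs)
        order = list(range(n))
        if shuffle:
            rng.shuffle(order)                 # a fresh order for every epoch
        for start in range(0, n, batch_size):  # batch(batch_size); last one may be short
            idx = order[start:start + batch_size]
            yield ({name: [col[i] for i in idx] for name, col in features.items()},
                   [labels[i] for i in idx])


feats = {"day_period": ["evening", "afternoon", "morning", "night", "morning"]}
labs = [74.9, 498.759, 830.793, 26.873, 40.107]
batches = list(input_batches(feats, labs, batch_size=2, num_epochs=3))
print([len(lb) for _, lb in batches])  # [2, 2, 1, 2, 2, 1, 2, 2, 1]
```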


In [125]:
train_input_fn = make_input_fn(X_train, y_train)
eval_input_fn = make_input_fn(X_test, y_test, num_epochs=1, shuffle=False)


**Inspecting a Batch from the Input Function**

In [139]:
ds = make_input_fn(X_train, y_train, batch_size=10)()
for feature_batch, label_batch in ds.take(1):
    print('Some feature keys:', list(feature_batch.keys()))
    print()
    print('A batch of class:', feature_batch['community_encode'].numpy())
    print('A batch of class:', feature_batch['agency_encode'].numpy())
    print('A batch of class:', feature_batch['complaint_encode'].numpy())
    print('A batch of class:', feature_batch['day_of_week'].numpy())
    print('A batch of class:', feature_batch['day_period'].numpy())
    print('A batch of class:', feature_batch['zip_encode'].numpy())
    print('A batch of class:', feature_batch['location_encode'].numpy())
    print()
    print('A batch of Labels:', label_batch.numpy())

Some feature keys: ['community_encode', 'agency_encode', 'complaint_encode', 'day_of_week', 'day_period', 'zip_encode', 'location_encode']

A batch of class: [b'community_bin2' b'community_bin1' b'community_bin2' b'community_bin2'
 b'community_bin2' b'community_bin2' b'community_bin2' b'community_bin2'
 b'community_bin2' b'community_bin2']
A batch of class: [b'agency_bin4' b'agency_bin1' b'agency_bin1' b'agency_bin1'
 b'agency_bin6' b'agency_bin2' b'agency_bin1' b'agency_bin1'
 b'agency_bin4' b'agency_bin6']
A batch of class: [b'complaint_bin3' b'complaint_bin3' b'complaint_bin2' b'complaint_bin3'
 b'complaint_bin3' b'complaint_bin3' b'complaint_bin2' b'complaint_bin3'
 b'complaint_bin2' b'complaint_bin2']
A batch of class: [b'Wed-Thu' b'Wed-Thu' b'Mon-Tue' b'Mon-Tue' b'Wed-Thu' b'Fri-Sat-Sun'
 b'Mon-Tue' b'Mon-Tue' b'Wed-Thu' b'Wed-Thu']
A batch of class: [b'night' b'night' b'night' b'evening' b'morning' b'night' b'night'
 b'morning' b'morning' b'morning']
A batch of class: [b'zip_bin

## TensorFlow Local Model

In [127]:
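# No model_dir is given, so the Estimator writes checkpoints to a temporary folder (see the warning below)
linear = tf.estimator.LinearRegressor(feature_columns=feature_columns)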



W0722 02:51:31.554228 139679535597312 estimator.py:1811] Using temporary folder as model directory: /tmp/tmp6krgiehl
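With only categorical columns and no feature crosses, a linear regressor predicts the bias plus one learned weight per active category, so each feature's effect on `TimeTaken` is additive and interactions (for example between agency and complaint type) are not modelled. The sketch below illustrates that computation; `linear_predict` is a hypothetical helper and the weights are made-up numbers, not trained values.

```python
# Conceptual sketch: prediction = bias + sum of the weight of each active category.
from typing import Dict


def linear_predict(example: Dict[str, str],
                   weights: Dict[str, Dict[str, float]], bias: float) -> float:
    total = bias
    for column, value in example.items():
        # A category with no weight (e.g. out of vocabulary) contributes 0
        total += weights.get(column, {}).get(value, 0.0)
    return total


# Made-up weights for illustration only.
weights = {
    "day_period": {"morning": 10.0, "evening": -5.0},
    "agency_encode": {"agency_bin6": 20.0},
}
example = {"day_period": "evening", "agency_encode": "agency_bin6"}
print(linear_predict(example, weights, bias=100.0))  # 115.0
```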


#### Upload the Sample Data File to Google Cloud Storage

In [36]:
!gsutil cp 'demo.csv' gs://nyc_servicerequest/processedInput/

Copying file://demo.csv [Content-Type=text/csv]...
/ [1 files][ 18.9 KiB/ 18.9 KiB]                                                
Operation completed over 1 objects/18.9 KiB.                                     


## Giving up: the model can't be trained on this much data locally, so Cloud ML is needed

In [128]:
linear.train(train_input_fn)

W0722 02:51:55.698511 139679535597312 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/feature_column/feature_column_v2.py:2655: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0722 02:52:01.116675 139679535597312 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/canned/linear.py:308: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.


KeyboardInterrupt: 
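Since `linear.train(train_input_fn)` is called without a step limit, the Estimator keeps training until the input function runs out of data, i.e. for all `num_epochs=10` passes. Counting the batches gives a rough sense of the work involved. The sketch below is illustrative: `training_steps` is a hypothetical helper, the 160 rows correspond to the 80% training share of the 200-row `demo.csv`, and the one-million-row figure is a made-up size because the notebook never prints `len(df)`. For a quick local smoke test, `Estimator.train` also accepts a `steps` argument to cap the number of steps.

```python
# Rough step count for make_input_fn, where batch() comes before repeat().
import math


def training_steps(n_rows: int, batch_size: int = 32, num_epochs: int = 10) -> int:
    """Number of batches the input function yields over all epochs."""
    return num_epochs * math.ceil(n_rows / batch_size)


print(training_steps(160))      # 50, the training share of demo.csv
print(training_steps(1000000))  # 312500, hypothetical size for illustration only
```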

In [None]:
results = linear.evaluate(eval_input_fn)

In [None]:
for key, value in sorted(results.items()):
    print('%s: %s' % (key, value))
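The metrics returned by `evaluate` are easier to judge alongside error measures in the label's own units. To my understanding, the `average_loss` reported for a TF 1.x regressor is the mean squared error over the evaluation set; the root of that and the mean absolute error are both expressed in the units of `TimeTaken`. The stdlib sketch below computes all three from labels and predictions; `regression_metrics` is a hypothetical helper and the numbers in the usage line are toy values.

```python
# Hypothetical helper: MSE, RMSE and MAE from paired labels and predictions.
import math
from typing import Dict, Sequence


def regression_metrics(labels: Sequence[float],
                       predictions: Sequence[float]) -> Dict[str, float]:
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and the same length")
    errors = [p - y for y, p in zip(labels, predictions)]
    mse = sum(e * e for e in errors) / len(errors)
    return {
        "mse": mse,                                         # squared units
        "rmse": math.sqrt(mse),                             # same units as the label
        "mae": sum(abs(e) for e in errors) / len(errors),   # same units as the label
    }


print(regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 30.0]))
```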