# Canadian Credit Union Yelp and Asset Growth Project

Here we explore the correlation between Yelp! reviews and asset growth using various recurrent neural networks. From our experimentation with the data and features, we found _ yielded the most accurate predictions of future asset growth. These predictions were generated from Yelp! reviews and sentiment analysis.


## Initial Set-Up

Here we import the libraries needed for the project. We also read in the data collected from various Canadian credit unions and their corresponding Yelp! reviews.

In [2]:
# Let's first start by importing the libraries and data we'll need
# Libraries needed include numpy, keras, csv, matplotlib

import numpy as np
import keras
import csv
from matplotlib import pyplot as plt

# Need to first import the data into a numpy array so we can do some work with it

fname = 'jena_climate_2009_2016.csv'
# A with-block closes the file for us (the original f.close was never actually called)
with open(fname) as f:
    data = f.read()

# Now need to separate the column headers from the rest of the data

lines = data.split('\n')
header = lines[0].split(',')
lines = lines[1:]
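The later cells index into a `float_data` array that is never built in the cells shown here. A minimal sketch of that conversion follows, assuming the first column of every row is a non-numeric timestamp and every other column is numeric. `parse_rows` is a hypothetical helper name, and the sample rows are made up for illustration.

```python
import numpy as np

def parse_rows(lines, n_columns):
    """Convert CSV text rows to a float array, skipping the first (timestamp) column."""
    rows = [line for line in lines if line.strip()]  # drop blank trailing lines
    float_data = np.zeros((len(rows), n_columns - 1))
    for i, line in enumerate(rows):
        float_data[i, :] = [float(x) for x in line.split(',')[1:]]
    return float_data

# Tiny synthetic example standing in for the real file contents
sample_lines = ['01.01.2009 00:10:00,996.52,-8.02',
                '01.01.2009 00:20:00,996.57,-8.41',
                '']
sample = parse_rows(sample_lines, n_columns=3)
```

With the variables from the cell above, the call would be `float_data = parse_rows(lines, len(header))`.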

## Data Analytics 

Here we plot some of our data to look for obvious patterns. It is best to do this before model creation so that we can make sure those patterns are accounted for at that stage. For now we simply look at temperature versus time.

In [None]:
# Store our temperature data into a numpy array for convenience

temp = float_data[:, 1]
# Plot the numeric series, not the raw file text held in `data`
plt.plot(range(len(temp)), temp)
plt.title('Asset Growth By Year')
plt.xlabel('Year')
plt.ylabel('Asset Growth ($ CAD)')

## Model Creation

Finally, after all the hard work of normalizing our data and taking a quick look at it, we get to the fun part: model creation.
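The normalization mentioned above does not appear in the cells shown here. One common approach, sketched below as an assumption rather than as the method actually used, is to subtract each column's mean and divide by its standard deviation, with both statistics computed on the training rows only. `normalize` and `n_train` are hypothetical names, and the demo array is made up.

```python
import numpy as np

def normalize(float_data, n_train):
    """Standardize each column using statistics from the first n_train rows only."""
    mean = float_data[:n_train].mean(axis=0)
    std = float_data[:n_train].std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (float_data - mean) / std, mean, std

# Made-up data: the first two rows play the role of the training split
demo = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0], [7.0, 70.0]])
normalized, mean, std = normalize(demo, n_train=2)
```

Using only the training rows for the statistics keeps information from the validation period out of the inputs the model is trained on.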

In [1]:
# Need to import some libraries from keras to create our model
# This will involve the use of keras sequential neural network models, layers and rmsprop optimizers

from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop

# The rest is very similar to the creation of the sequential neural network we made during the Warm-Up project
# Quick refresher though: we need to define our model with a number of layers, an activation function, an optimizer function, and a loss function

# Define our model as a sequential one
model = Sequential()

# Add some layers to our model 
model.add(layers.Flatten(input_shape=(lookback // step, float_data.shape[-1])))
model.add(layers.Dense(32, activation = 'relu'))
model.add(layers.Dense(1))

# Now compile our model with optimizer and loss functions, no metric for this one though
model.compile(optimizer = RMSprop(), loss = 'mae')
history =  model.fit_generator(train_gen,
                               steps_per_epoch = 500,
                               epochs = 20,
                               validation_data = val_gen,
                               validation_steps = val_steps)

# Before we go any further, there are some important notes to make here; see the "Model Notes" block below

Using TensorFlow backend.


NameError: name 'lookback' is not defined
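The `NameError` above arises because `lookback` is used before it is defined; `step`, `train_gen`, `val_gen`, and `val_steps` are also not defined in the cells shown here. A minimal sketch of one way to supply them is given below. It assumes a sliding-window setup in which each sample is the previous `lookback` rows, sampled every `step` rows, and the target is one column `delay` rows ahead. The function name `window_generator`, the `target_col` parameter, and all numeric values are hypothetical.

```python
import numpy as np

def window_generator(data, lookback, delay, step, batch_size, target_col=1):
    """Yield (samples, targets) batches forever, in chronological order."""
    min_index = lookback                 # first row with a full window behind it
    max_index = len(data) - delay - 1    # last row with a target ahead of it
    i = min_index
    while True:
        if i > max_index:
            i = min_index                # wrap around so training can keep drawing batches
        rows = np.arange(i, min(i + batch_size, max_index + 1))
        i += len(rows)
        samples = np.zeros((len(rows), lookback // step, data.shape[-1]))
        targets = np.zeros((len(rows),))
        for j, row in enumerate(rows):
            samples[j] = data[row - lookback:row:step]   # past window, subsampled
            targets[j] = data[row + delay][target_col]   # value to predict
        yield samples, targets

# Made-up data and settings, chosen so that lookback is divisible by step
demo_data = np.arange(40.0).reshape(20, 2)
lookback, delay, step, batch_size = 6, 2, 2, 4
demo_gen = window_generator(demo_data, lookback, delay, step, batch_size)
samples, targets = next(demo_gen)
```

Under these assumptions, `train_gen` and `val_gen` would be two such generators built over separate row ranges of `float_data`, and `val_steps` would be the number of batches needed to cover the validation range once.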

## Model Notes

1. Other activation functions could have been used. 'relu' is a popular choice, but 'selu' and 'sigmoid' are alternatives.

2. The number of layers is not fixed by any rule; it is usually chosen by experimenting with the model to see what works best for the project.

3. The number of epochs is another good place to experiment. Too many training epochs can lead to overfitting, so it is worth trying different values to see how many epochs yield the best result for the model.

4. The optimizer is another area to experiment with, as RMSprop may not always be the best choice for the project at hand.

5. The loss function was another judgement call. Mean absolute error makes sense here because we have actual numbers to compare against the model's predictions, so it tells us directly how far off those predictions are. Mean squared error, or its square root, could be used for the same purpose. Binary cross-entropy, by contrast, is meant for two-class classification and would not suit a numeric target like this one.
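To make the difference between mean absolute error and a root-mean-square error concrete, here is a small sketch on made-up numbers. The helper names `mae` and `rmse` are hypothetical, and the arrays are illustrative only, not results from this project.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average size of the misses."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error: weights large misses more heavily."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Three perfect predictions and one that is off by 4
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 8.0])

mae_value = mae(y_true, y_pred)    # 1.0
rmse_value = rmse(y_true, y_pred)  # 2.0
```

The single large miss doubles the root-mean-square figure relative to the mean absolute error, which is why the choice between the two depends on how much occasional large errors should matter.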

## Model Output

We will now plot the training loss against the validation loss to see whether we're overfitting and how the model is performing overall.

In [None]:
# Will grab our losses by going into the training history and defining appropriate variables to make plotting easier

loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)

# Now plot our training and validation losses

plt.figure()
plt.plot(epochs, loss, 'b', label = 'Training Loss')
plt.plot(epochs, val_loss, 'r', label = 'Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Model Training and Validation Losses By Epoch')
plt.legend()
plt.show()
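A typical sign of overfitting in a plot like this is validation loss that starts to rise while training loss keeps falling. As a rough sketch, the epoch with the lowest validation loss can also be read off numerically. `best_epoch` is a hypothetical helper and the loss values below are made up.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch number with the lowest validation loss."""
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# Made-up validation losses that improve, then worsen
example_val_loss = [0.42, 0.35, 0.31, 0.33, 0.36]
chosen = best_epoch(example_val_loss)  # 3
```

With a real training run, the same call would be `best_epoch(history.history['val_loss'])`.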