# Implementation

Describe the intent and functionality of the interactive visualizations you implemented. Provide clear and well-referenced images showing the key design and interaction elements.

## System Architecture

The following diagram provides a high-level overview of the system architecture:

![System Architecture](SystemArchitecture.png)

The majority of the processing is performed in client-side JavaScript. The prediction calculations are currently performed in a Python-based web service hosted on Heroku, because the implementation uses scikit-learn's [RandomForestClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html). Unfortunately, the web service has significant latency, with delays of up to 10 seconds for a single prediction. This does not meet our current goal of a very interactive, game-like experience when performing predictions.

If time permits, we will try again to replace the web service with a client-side JavaScript Random Forest implementation, such as [Karpathy's](https://github.com/karpathy/forestjs). We ran some initial tests, but they were unsuccessful.

Update 4/18/2016. We did some further investigation and found that Karpathy's implementation does not appear to provide the prediction probabilities in a useful (or perhaps even accurate) fashion. The next library we will try is from [JFrazelle](https://github.com/jfrazelle/random-forest-classifier).
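
For context, the quantity we need from any replacement library is a per-class probability for each prediction. In scikit-learn, `predict_proba` on a random forest returns the mean of the class probabilities predicted by the individual trees. The sketch below illustrates that averaging only; the function name `forest_proba` and the input values are hypothetical and are not taken from our code or from either JavaScript library.

```python
def forest_proba(per_tree_probas):
    """Average per-tree class-probability vectors into one forest-level vector.

    per_tree_probas: one list per tree, each holding that tree's probability
    for every class (hypothetical input format, for illustration only).
    """
    n_trees = len(per_tree_probas)
    n_classes = len(per_tree_probas[0])
    return [
        sum(tree[c] for tree in per_tree_probas) / n_trees
        for c in range(n_classes)
    ]


# Four made-up trees voting on two classes.
example = forest_proba([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
```

A client-side library is only useful to us if it exposes an equivalent of this averaged vector, not just the winning class label.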

<a id="wsperf"></a>
## Solving the Webservice Performance Problem

April 20, 2016

Today, we made major progress in solving the web service performance problem. It turns out that, when hosted on Heroku, the web service was retraining the entire Random Forest for every single prediction request. We made three changes to the web service that **dropped the prediction time from at least 20 seconds to 2 seconds**:

* We trained the Random Forest on our development machines and persisted the result in a file (using Python pickling). We tried using the recommended scikit-learn [joblib](http://scikit-learn.org/stable/tutorial/basic/tutorial.html#model-persistence) approach, but that generated 5000 persistence files totaling 2 GB!
  * We also looked into using Redis, but that had other complications. It wasn't clear whether we could store a 300 MB string in Redis. It would also have added dependencies to the web service, and the performance improvement may not have been great. Because we only read a static Random Forest object, there was likely little advantage to using Redis.
* We reduced the number of estimators / decision trees from 1000 to 150 to save time and pickle file size. The resulting pickle size was *only* 271 MB. We do not believe there is a significant difference in prediction accuracy.
* We reduced the number of gunicorn threads from unlimited to 2, based on a [Stack Overflow suggestion](http://stackoverflow.com/questions/12079582/error-r14-memory-quota-exceeded-not-visible-in-new-relic) that worked out really well. It had the added advantage of keeping the classifier in memory more often.

We believe that the performance is adequate for now. We will turn our attention to the innovative visualization.