# Comparison of Natural Language Understanding Services and Frameworks

This document compares the leading cloud service providers and programming frameworks for Natural Language Understanding (NLU). A summary table gives an overview of the differences between the tools. Additional comparison tables cover the commercial cloud service providers in depth. The document concludes with code samples implementing the different tools.


__Executive Summary__

Open source programming frameworks compare favorably with, and often outperform, commercial cloud providers in both features and performance. Python's [spaCy](https://spacy.io/usage/facts-figures#section-benchmarks) appears to be designed for production use and offers a large number of features. Among open source options, only Stanford's CoreNLP has similar qualities, and commercial services do not appear to offer as much. Building a foundational layer with a library such as spaCy provides a strong, general starting point from which more specific solutions can be developed later. In particular, [R has over a hundred libraries](https://cran.r-project.org/web/views/NaturalLanguageProcessing.html) that provide highly specialized functionality.

Commercial services do simplify pricing because costs are charged per NLU item rather than by processing time, which must be estimated when running a framework yourself. However, using a commercial service does not remove the need for additional programming to customize general and specific solutions. In that case, the commercial service becomes an extra layer of complexity between processes in a pipeline.
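The difference between the two pricing models can be sketched with a little arithmetic. This is a minimal illustration, not a pricing reference: the function names and every number below are hypothetical placeholders, and real per-item prices, throughput, and compute rates would need to be taken from the provider's price list and from your own benchmarks.

```python
def per_item_cost(n_items: int, price_per_1000_items: float) -> float:
    """Cost of a commercial service that bills per NLU item processed."""
    return n_items / 1000 * price_per_1000_items


def processing_time_cost(n_items: int, items_per_second: float,
                         price_per_hour: float) -> float:
    """Cost of running a framework yourself, billed by compute time.

    The throughput (items_per_second) has to be measured or estimated,
    which is the extra calculation that per-item pricing avoids.
    """
    hours = n_items / items_per_second / 3600
    return hours * price_per_hour


# Placeholder inputs, for illustration only (not real prices or benchmarks)
daily_items = 50_000
cloud = per_item_cost(daily_items, price_per_1000_items=1.00)
hosted = processing_time_cost(daily_items, items_per_second=100.0,
                              price_per_hour=0.50)
print(f"per-item billing: {cloud:.2f}")
print(f"processing-time billing: {hosted:.2f}")
```

The second function needs a throughput estimate before it can produce a number, whereas the first depends only on the item count.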

__Open source references__

* [Python: spaCy](https://www.analyticsvidhya.com/blog/2017/04/natural-language-processing-made-easy-using-spacy-%E2%80%8Bin-python/)
* [R: TextMining(tm)](https://eight2late.wordpress.com/2015/05/27/a-gentle-introduction-to-text-mining-using-r/)
* [R: OpenNLP](https://rpubs.com/lmullen/nlp-chapter)

__Commercial references__

* [kontikilabs: very thorough with accompanying code](https://medium.com/kontikilabs/comparing-machine-learning-ml-services-from-various-cloud-ml-service-providers-63c8a2626cb6)
* [Google vs Watson](http://fredrikstenbeck.com/google-natural-language-vs-watson-natural-language-understanding/)
* [Watson internals](https://www.quora.com/What-do-AI-ML-and-NLP-researchers-think-of-IBM%E2%80%99s-Watson-Does-it-have-the-potential-to-make-a-huge-impact)
* [Google: categories](https://cloud.google.com/natural-language/docs/categories)
* [Watson: categories](https://console.bluemix.net/docs/services/natural-language-understanding/categories.html#categories-hierarchy)

In [1]:
# Image renders the comparison tables below; HTML is available for rich output
from IPython.display import HTML, Image

### Summary of all open frameworks and commercial services


In [2]:
Image(url= "images/Cloud_and_Open.png", width=700)

### Summary of open frameworks



In [16]:
Image(url= "./images/Open.png", width=500)

### Summary of commercial cloud services

Commercial features are fairly consistent across services, except for syntax analysis and part-of-speech (POS) tagging, which in my opinion are must-haves. Google has better syntax and POS support. Watson has nice hierarchical categories.

In [13]:
Image(url= "./images/CloudML_Features.png", width=500)

Performance may not be as important because we are using the service as a daily batch job, and we are not paying for processing time.

In [14]:
Image(url= "./images/CloudML_Performance.png", width=500)

Costs are also consistent across services.

In [15]:
Image(url= "./images/CloudML_Cost.png", width=500)
