Entre Guillemets: compare natural language processing APIs
Run a corpus of text files through multiple natural language processing (NLP) API vendors. View API results side by side so that you can get a general feel for how well each vendor works for your use case. Supported vendors: TextRazor, Google Cloud, IBM Watson and Rosette.
Entre Guillemets is inspired by Cloudy Vision which meets the same types of objectives, but for computer vision APIs.
See example results here.
Entre Guillemets works with Python 3.6. If using Anaconda, you can first:
source activate py36
Install dependencies by running
pip install -r requirements.txt
settings.json and add you API credentials.
Running and getting results is simply
This will process all text files (
.txt extension) in
input_files and store the raw results in
output. You can view the report by opening
This will also output a tabular version of the report in Excel. This report is available under
Annotating input files for some context
You can also provide context information or metadata about each input file (eg. title, reference number, tags, etc.) to display in the output report. Simply include those in JSON by creating a file with the same name as the input file, by adding
.json to its name (so the metadata for
some_file.txt would be in
Vendor specific notes
Text in truncated to 50 000 characters in order to respect Rosette's limit.
Categories and Topics extractions are not benchmarked because they are available in English only.
The Rosette entity output includes a confidence score for some entities, and not for others. The report separates entities with confidence (sorted by reverse confidence value) and without confidence information (sorted by number of occurrences).
Classification is not benchmarked because it is available in English only.
Text in truncated to 200kb in order to respect TextRazor's limit.
The Watson API returns 50 entities by default, and Entre Guillemets uses that default value.
Adding more vendors
Adding more vendors should be relatively easy if you are developer: have a look at the constant at the beginning of
lib/entre_guillemets.py, and then at examples of the vendors already implemented.
This project is licensed under the terms of the MIT license.