## Cooking with ClarityNLP - Session #7 - NLPQL Under the Hood

Today we will take a behind-the-scenes look at how ClarityNLP evaluates NLPQL expressions. We will walk through the construction of an NLPQL file and give a high-level description of how its results are generated. We will also provide an overview of our new NLPQL editor tool that makes the task of creating NLPQL files much easier. For background on installing and using ClarityNLP, please see our [documentation](https://claritynlp.readthedocs.io/en/latest/index.html).  We welcome questions via Slack or on [GitHub](https://github.com/ClarityNLP/ClarityNLP/issues).

### Extracting Measurements from Radiology Reports

To start things off, suppose that we're developing a promising new immunotherapy drug. This drug has proven to be effective on tumors of various sizes, but we have noted particular efficacy for tumors in the 1 cm to 2 cm size range. We want to recruit patients for a new clinical trial designed to test the drug on tumors of this size.  We have access to a corpus of radiology reports, and we would like to search these reports for patients with appropriately-sized tumors. How can we use ClarityNLP to find more patients?

As you've learned in previous cooking sessions, we need to create an NLPQL file that defines a documentset for the radiology reports and a termset with tumor-related terms. We will need to run the measurement finder to extract measurements, and then filter the measurements with mathematical expressions that constrain the allowable tumor sizes.

When developing a new NLPQL file, it is best to limit the number of documents processed until the NLPQL is fully debugged and working. So let's start by limiting our initial document set to 50 documents. It shouldn't take too long to process 50 documents, and if we make a mistake, we can quickly recover.

A limit on the number of documents is specified by a ``limit`` statement on the first line of the NLPQL file. So open a text editor, create a new file called ``lesion.nlpql``, and enter the following line:

<pre>limit 50;</pre>

Next we need to insert some boilerplate that identifies the phenotype and version, provides a description, and imports the ClarityNLP libraries. All of your NLPQL files will have text similar to this at the start:

<pre>phenotype "Lesions1to2Cm" version "1";
description "Find lesions of sizes ranging from 1 to 2 cm.";
include ClarityCore version "1.0" called Clarity;</pre>

Since we only want to search radiology reports, we can create a documentset specifically for this purpose. Note that the ``report_types`` field actually takes an array argument (identified by the square brackets). We will use a single-element array containing the term ``Radiology``, the label used by the MIMIC-III data set:

<pre>
documentset Docs:
    Clarity.createDocumentSet({
        "report_types":["Radiology"]
    });
</pre>

Next we need to create a list of the tumor-related terms we want ClarityNLP to search for. We ponder this for a while and eventually arrive at a termset with these words:

<pre>
termset LesionTerms: [
    "lesion", "growth", "mass", "malignancy", "tumor",
    "neoplasm", "nodule", "cyst", "focus of enhancement",
    "echodensity", "hypoechoic focus", "echogenic focus"
];
</pre>

Since we need to find and extract measurements, we must insert a command to activate ClarityNLP's measurement finder. The simplest command to do this is:

<pre>
define LesionMeasurement:
    Clarity.MeasurementFinder({
        documentset: [Docs],
        termset: [LesionTerms]
    });
</pre>

This command runs the measurement finder on each sentence of our source documents. It returns any measurements that occur in the same sentence as a term in our termset.

Our goal is to find **patients** with tumors of the specified dimensions, so we specify a ``Patient`` context:

<pre>
context Patient;
</pre>

Now we're ready to write the commands for constraining the lesion measurements to our desired size of 1-2 cm. Here we will insert three commands to do so, and we will explain the differences in results for each below:

<pre>
define xBetween10and20mm:
    where LesionMeasurement.dimension_X >= 10 AND LesionMeasurement.dimension_X <= 20;

define xyBetween10and20mm:
    where LesionMeasurement.dimension_X >= 10 AND LesionMeasurement.dimension_X <= 20 AND
          LesionMeasurement.dimension_Y >= 10 AND LesionMeasurement.dimension_Y <= 20;

define xyzBetween10and20mm:
    where LesionMeasurement.dimension_X >= 10 AND LesionMeasurement.dimension_X <= 20 AND
          LesionMeasurement.dimension_Y >= 10 AND LesionMeasurement.dimension_Y <= 20 AND
          LesionMeasurement.dimension_Z >= 10 AND LesionMeasurement.dimension_Z <= 20;
</pre>

ClarityNLP normalizes all dimensional measurements to units of **millimeters**, so our desired range of 1-2 cm becomes 10-20 mm. These three statements enforce constraints on the X, XY, and XYZ measurement components respectively.

And with that we're done. Here is the final text of our ``lesion.nlpql``:

<pre>
limit 50;
phenotype "Lesions1to2Cm" version "1";
description "Find lesions of sizes ranging from 1 to 2 cm.";
include ClarityCore version "1.0" called Clarity;

// radiology documents only in the documentset
documentset Docs:
    Clarity.createDocumentSet({
        "report_types":["Radiology"]
    });

// lesion terms
termset LesionTerms: [
    "lesion", "growth", "mass", "malignancy", "tumor",
    "neoplasm", "nodule", "cyst", "focus of enhancement",
    "echodensity", "hypoechoic focus", "echogenic focus"
];

// extract lesion measurements
define LesionMeasurement:
    Clarity.MeasurementFinder({
        documentset: [Docs],
        termset: [LesionTerms]
    });

// we want to find patients, so use 'Patient' context
context Patient;

// constraints on X, XY, and XYZ

define xBetween10and20mm:
    where LesionMeasurement.dimension_X >= 10 AND LesionMeasurement.dimension_X <= 20;

define xyBetween10and20mm:
    where LesionMeasurement.dimension_X >= 10 AND LesionMeasurement.dimension_X <= 20 AND
          LesionMeasurement.dimension_Y >= 10 AND LesionMeasurement.dimension_Y <= 20;

define xyzBetween10and20mm:
    where LesionMeasurement.dimension_X >= 10 AND LesionMeasurement.dimension_X <= 20 AND
          LesionMeasurement.dimension_Y >= 10 AND LesionMeasurement.dimension_Y <= 20 AND
          LesionMeasurement.dimension_Z >= 10 AND LesionMeasurement.dimension_Z <= 20;
</pre>

### Testing the NLPQL for Syntax Errors

Before trying to process documents with our new NLPQL file, it is a good idea to first check it for syntax errors. We can do this by submitting it to the ``nlpql_tester`` API endpoint, a useful tool for the NLPQL developer.

In previous cooking sessions we showed you how to use the [Postman](https://www.postman.com) GUI tool to submit NLPQL files to the ClarityNLP webserver. Today we will show you how to use a command-line tool called [cURL](https://curl.haxx.se/) to do the same thing.

The nlpql_tester API for a local ClarityNLP instance is typically found at ``localhost:5000/nlpql_tester``. The NLPQL file should be sent via HTTP POST using a content type of ``text/plain``.

To submit the file, install ``curl`` on your system, then open a terminal window, change directories to the location of ``lesion.nlpql``, and run the next command. If you are following along in the notebook, there is a copy of ``lesion.nlpql`` in ``notebooks/cooking/assets/``.

<pre>
curl -i -X POST http://localhost:5000/nlpql_tester -H "Content-Type: text/plain" --data-binary "@lesion.nlpql"
</pre>

Here's what the various options mean:

- ``-i``: include the HTTP response headers in the output
- ``-X``: request method (must be ``POST``)
- ``-H``: add the header ``Content-Type: text/plain`` to the HTTP request
- ``--data-binary``: POST the data exactly as specified, with no additional processing

The cURL command submits the file to the ``nlpql_tester`` API endpoint via HTTP POST. If the syntax is OK the system responds with a JSON result. Otherwise the system responds with an error message.
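If you would rather script the submission than type the cURL command, the same POST can be assembled with Python's standard library. This is only a minimal sketch: ``build_nlpql_request`` is a hypothetical helper name, and the default URL assumes the local endpoint mentioned above. The request is built but not sent; passing it to ``urllib.request.urlopen`` would actually submit it.

```python
import urllib.request


def build_nlpql_request(nlpql_text, url="http://localhost:5000/nlpql_tester"):
    """Build (but do not send) a POST request carrying NLPQL as plain text."""
    return urllib.request.Request(
        url,
        data=nlpql_text.encode("utf-8"),         # raw body, like curl --data-binary
        headers={"Content-Type": "text/plain"},  # same header as curl -H
        method="POST",                           # same as curl -X POST
    )


request = build_nlpql_request("limit 50;")
print(request.get_method(), request.full_url)
```

In practice you would read the contents of ``lesion.nlpql`` from disk and pass that text to the helper.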

You can run the NLPQL tester directly from this notebook by first running the code in the next cell:

In [None]:
# This code below is only required for running ClarityNLP in Jupyter notebooks.
# It is not required if running NLPQL via API or the ClarityNLP GUI.
import pandas as pd
import claritynlp_notebook_helpers as claritynlp

Now run the next cell to test the NLPQL file:

### Running the NLPQL File

Having verified that the NLPQL file has the proper syntax, we submit the job to the ClarityNLP server with a similar cURL command:
<pre>
curl -i -X POST http://localhost:5000/nlpql -H "Content-Type: text/plain" --data-binary "@lesion.nlpql"
</pre>

Alternatively, you can run from the next notebook cell:

The job may take several minutes to run. After it runs to completion, browse to the location of the CSV file containing the intermediate results, and open it in a spreadsheet application such as Microsoft Excel. We have saved the results of a run to ``notebooks/cooking/assets/lesion_intermediate.csv``, some of which are displayed in the next cell:

In [17]:
lesion_csv = pd.read_csv('assets/lesion_intermediate.csv', 
                         usecols=['dimension_X', 'dimension_Y', 'dimension_Z', 
                                  'nlpql_feature', 'subject'])
lesion_csv

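|   | dimension_X | dimension_Y | dimension_Z | nlpql_feature | subject |
|---|---|---|---|---|---|
| 0 | 28 | 16 | NaN | LesionMeasurement | 40463 |
| 1 | 6 | NaN | NaN | LesionMeasurement | 40463 |
| 2 | 17 | 8 | NaN | LesionMeasurement | 40463 |
| 3 | 110 | 101 | NaN | LesionMeasurement | 40463 |
| 4 | 7 | NaN | NaN | LesionMeasurement | 37766 |
| 5 | 6 | NaN | NaN | LesionMeasurement | 37766 |
| 6 | 7 | NaN | NaN | LesionMeasurement | 37766 |
| 7 | 39 | 20 | NaN | LesionMeasurement | 26259 |
| 8 | 8 | NaN | NaN | LesionMeasurement | 43634 |
| 9 | 5 | NaN | NaN | LesionMeasurement | 43634 |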


### Spreadsheet Rows are MongoDB Documents

Our run generated a CSV file containing a header row and 194 rows of data. This CSV file is a dump of the results for our particular job. These results are stored in a MongoDB collection called ``phenotype_results`` in a database called ``nlp``. It is important to understand that **each row** of data above is a separate document in the MongoDB database. For instance, here is the underlying ``MeasurementFinder`` database document for row 2 above:

In [11]:
import json
obj = { "_id" : "5bfd9c9a31ab5b2e981dca14", "sentence" : "there is a 1.7 x 0.8 cm fdg-avid soft tissue nodule in the subcutaneous tissues of the right breast.", "text" : "1.7 x 0.8 cm", "start" : 11, "value" : 17, "end" : 23, "term" : "avid soft tissue nodule", "dimension_X" : 17, "dimension_Y" : 8, "dimension_Z" : None, "units" : "MILLIMETERS", "location" : [ "subcutaneous tissues of the right breast" ], "condition" : "EQUAL", "value1" : None, "value2" : "", "temporality" : "CURRENT", "min_value" : 8, "max_value" : 17, "pipeline_type" : "MeasurementFinder", "pipeline_id" : 12573, "job_id" : 11131, "batch" : 50, "owner" : "claritynlp", "nlpql_feature" : "LesionMeasurement", "inserted_date" : "2018-11-27T14:35:54.749Z", "concept_code" : -1, "phenotype_final" : False, "report_id" : "1048492", "subject" : "40463", "report_date" : "2119-02-16T00:00:00Z", "report_type" : "Radiology", "source" : "MIMIC", "solr_id" : "1048492" }
print(json.dumps(obj, indent=4))

{
    "_id": "5bfd9c9a31ab5b2e981dca14",
    "sentence": "there is a 1.7 x 0.8 cm fdg-avid soft tissue nodule in the subcutaneous tissues of the right breast.",
    "text": "1.7 x 0.8 cm",
    "start": 11,
    "value": 17,
    "end": 23,
    "term": "avid soft tissue nodule",
    "dimension_X": 17,
    "dimension_Y": 8,
    "dimension_Z": null,
    "units": "MILLIMETERS",
    "location": [
        "subcutaneous tissues of the right breast"
    ],
    "condition": "EQUAL",
    "value1": null,
    "value2": "",
    "temporality": "CURRENT",
    "min_value": 8,
    "max_value": 17,
    "pipeline_type": "MeasurementFinder",
    "pipeline_id": 12573,
    "job_id": 11131,
    "batch": 50,
    "owner": "claritynlp",
    "nlpql_feature": "LesionMeasurement",
    "inserted_date": "2018-11-27T14:35:54.749Z",
    "concept_code": -1,
    "phenotype_final": false,
    "report_id": "1048492",
    "subject": "40463",
    "report_date": "2119-02-16T00:00:00Z",
    "report_type": "Radiology",
    "source": "MIMIC",
    "solr_id": "1048492"
}

You will need to view the results in a spreadsheet to see how the field names become column names in the intermediate CSV file. Thus the CSV file provides a 'flattened' view of the database results for a particular job.
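To make the 'flattening' idea concrete, here is a small sketch that writes a couple of result documents to CSV text with Python's ``csv`` module. It is not ClarityNLP's own export code. The helper name ``docs_to_csv`` is hypothetical, the documents are trimmed to a few fields, and writing ``None`` as an empty cell is our assumption.

```python
import csv
import io

# Two result documents, trimmed to a few fields (values taken from rows 2 and 1
# of the output shown earlier).
docs = [
    {"nlpql_feature": "LesionMeasurement", "subject": "40463",
     "dimension_X": 17, "dimension_Y": 8, "dimension_Z": None},
    {"nlpql_feature": "LesionMeasurement", "subject": "40463",
     "dimension_X": 6, "dimension_Y": None, "dimension_Z": None},
]


def docs_to_csv(docs):
    """Flatten documents into CSV text: field names become column names."""
    columns = []
    for doc in docs:
        for key in doc:
            if key not in columns:
                columns.append(key)  # union of field names, first-seen order
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for doc in docs:
        # Assumption: a null field is written as an empty cell.
        writer.writerow({k: ("" if v is None else v) for k, v in doc.items()})
    return buffer.getvalue()


csv_text = docs_to_csv(docs)
print(csv_text)
```

Each document becomes one row, and a null field shows up as an empty cell, which pandas displays as ``NaN``.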

### Interpreting the Results

Looking at the output rows above, you can see that the results are broadly grouped by the value of the ``nlpql_feature`` field. There are four such groups with values ``LesionMeasurement``, ``xBetween10and20mm``, ``xyBetween10and20mm``, and ``xyzBetween10and20mm``. Take a look at the NLPQL file above and observe that these are the name strings in each ``define`` statement.

A value of ``NaN`` (not a number) is the equivalent of a null result, meaning that no data was found for that measurement dimension.

Rows 0-144 contain the extracted measurements, all of which have their ``nlpql_feature`` field equal to ``LesionMeasurement``. These rows comprise the output of the measurement extractor. They are the **input** data for the mathematical expressions that constrain the lesion measurements. The underlying documents for these ``LesionMeasurement`` results in the MongoDB database are called *task result documents*.

Rows 145-183 have their ``nlpql_feature`` field equal to ``xBetween10and20mm``. Unlike the ``LesionMeasurement`` rows, which are directly generated by the MeasurementFinder task, these rows are generated by evaluation of a mathematical expression. This expression places a constraint on the X dimension of each measurement. Only those measurements that satisfy the constraint fill these rows of the intermediate result file.

Rows 184-192 have their ``nlpql_feature`` field equal to ``xyBetween10and20mm``. These rows are generated by evaluation of a mathematical expression that constrains both the X and Y measurement dimensions. Only those measurements that satisfy the constraints fill these rows of the intermediate result file.

Row 193 has its ``nlpql_feature`` field equal to ``xyzBetween10and20mm``.  This row is the only measurement that survives the constraint on all three dimensions.

Note that the ``xBetween10and20mm`` results contain 2D and 3D measurements, some of which have Y or Z dimensions that exceed 20 mm (such as rows 175 and 176). The constraint for these rows is only on the X dimension. The Y and Z dimensions can have any value whatsoever, even NaN (which means they don't exist).

We see a single 3D measurement in the ``xy`` result section, in row 188. Its Z dimension happens to fall within the 10-20 mm range as well, but the ``xyBetween10and20mm`` expression itself imposes no constraint on Z.
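The filtering behavior can be mimicked in a few lines of plain Python. The sketch below applies the X, XY, and XYZ constraints to the first ten ``LesionMeasurement`` rows displayed earlier. It only illustrates the logic; ClarityNLP itself evaluates these expressions with MongoDB aggregation, as described later. The helper names ``in_range`` and ``matching_rows`` are our own.

```python
# First ten rows of the intermediate results: (X, Y, Z) in millimeters.
# None stands for a missing dimension (NaN).
rows = [
    (28, 16, None), (6, None, None), (17, 8, None), (110, 101, None),
    (7, None, None), (6, None, None), (7, None, None), (39, 20, None),
    (8, None, None), (5, None, None),
]


def in_range(value, low=10, high=20):
    """A missing dimension can never satisfy the constraint."""
    return value is not None and low <= value <= high


def matching_rows(rows, num_dims):
    """Indices of rows whose first num_dims dimensions all lie in 10-20 mm."""
    return [i for i, dims in enumerate(rows)
            if all(in_range(d) for d in dims[:num_dims])]


x_hits = matching_rows(rows, 1)    # constraint on X only
xy_hits = matching_rows(rows, 2)   # constraints on X and Y
xyz_hits = matching_rows(rows, 3)  # constraints on X, Y, and Z
print(x_hits, xy_hits, xyz_hits)
```

Among these ten rows only row 2 (17 x 8 mm) passes the X constraint. It fails the XY constraint because its Y dimension is 8 mm, which shows how each added dimension shrinks the result set.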

### NLPQL Expressions

In the NLPQL example above, we expressed constraints on the measurement dimensions with NLPQL expressions. In this section we describe the different types of expression and provide an overview of how ClarityNLP evaluates them.

#### Mathematical Expressions

An NLPQL mathematical expression is found in a ``define`` statement such as:
<pre>
define hasFever:
    where Temperature.value >= 100.4;
    
define xBetween10and20mm:
    where LesionMeasurement.dimension_X >= 10 AND LesionMeasurement.dimension_X <= 20;
</pre>

The ``where`` portion of the statement is the mathematical expression. These expressions feature mathematical operations on variables of the form ``nlpql_feature.variable_name`` such as ``Temperature.value``, ``LesionMeasurement.dimension_X``, etc. They can also include numeric literals such as ``100.4``.

NLPQL mathematical expressions produce a numerical result from data contained in a **single** task result document. Since each task result document comprises a row in the intermediate results CSV file (see above), the evaluation of mathematical expressions is also called a **single-row operation**. The numerical result from the expression evaluation is written to a new MongoDB result document, as demonstrated in the lesion example above.

#### Logical Expressions

An NLPQL logical expression is also found in a ``define`` statement and involves the logical operators AND, OR, and NOT, such as:
<pre>
define hasSmallOrLargeLesion:
    where xLessThan5mm OR xGreaterThan30mm;

define hasSepsis:
    where hasFever AND hasSepsisSymptoms;
</pre>

The ``where`` portion of the statement is the logical expression. Logical expressions operate on high-level NLPQL features such as ``hasFever`` and ``hasSepsisSymptoms``, **not** on individual variables such as ``Temperature.value``. The presence of an ``nlpql_feature.variable_name`` token indicates that the expression is actually single-row, not multi-row.

NLPQL logical expressions use data from one or more task result documents and compute a new set of results. The results are written back to MongoDB as a set of new result documents. The evaluation of a logical expression is also called a **multi-row operation**, since it typically consumes and generates multiple rows in the intermediate results CSV file.

### Evaluation of Single-Row Expressions

So what does ClarityNLP have to do to evaluate a mathematical expression?

First, the NLPQL front end parses the NLPQL file and generates a string of whitespace-separated tokens for each expression. The token string is passed to the evaluator, which determines whether it is single-row, multi-row, or something else that cannot be evaluated. If it is single-row, the ``nlpql_feature`` and field list are extracted.

Consider these examples, both of which are single-row mathematical expressions:

<pre>
where Temperature.value >= 100.4
where LesionMeasurement.dimension_X < 5 AND LesionMeasurement.dimension_Y < 5
</pre>

The first expression has an ``nlpql_feature`` of ``Temperature`` and a field list containing the single entry ``value``. The second expression has an ``nlpql_feature`` of ``LesionMeasurement`` and a field list consisting of the entries ``dimension_X`` and ``dimension_Y``.
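The extraction step can be approximated with a regular expression over the whitespace-separated tokens. The following is a rough sketch rather than the evaluator's actual code. The function name ``feature_and_fields`` is hypothetical, and returning ``None`` for anything that does not reference exactly one feature is a simplification we chose for this example.

```python
import re

# Matches variables of the form nlpql_feature.variable_name
VARIABLE = re.compile(r"^([A-Za-z_]\w*)\.([A-Za-z_]\w*)$")


def feature_and_fields(expression):
    """Return (nlpql_feature, field_list) for a single-row expression, else None."""
    features, fields = [], []
    for token in expression.split():
        match = VARIABLE.match(token)
        if match:
            feature, field = match.groups()
            if feature not in features:
                features.append(feature)
            if field not in fields:
                fields.append(field)
    if len(features) != 1:
        return None  # no feature.field tokens, or more than one feature
    return features[0], fields


print(feature_and_fields("where Temperature.value >= 100.4"))
print(feature_and_fields(
    "where LesionMeasurement.dimension_X < 5 AND LesionMeasurement.dimension_Y < 5"))
```

Numeric literals such as ``100.4`` contain a dot but do not start with a letter, so the pattern does not mistake them for variables.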

### Initial MongoDB Aggregation Pipeline Stage

The next task for the evaluator is to convert the expression into a sequence of [MongoDB aggregation](https://docs.mongodb.com/manual/aggregation/) pipeline stages. The aggregation pipeline provides filtering, document transformation operations, and mathematical operations that ClarityNLP uses to evaluate expressions.

The conversion process involves the generation of an initial [``$match``](https://docs.mongodb.com/manual/reference/operator/aggregation/match/#pipe._S_match) query to filter out everything but the data for the current job. The match query also checks for the existence of all entries in the field list and that they have non-null values. **A simple existence check is not sufficient**, since a null field actually exists but has a value that cannot be used for computation. Hence checks for existence and a non-null value are both necessary.

For the two examples above, the ``$match`` query generates a pipeline filter stage that looks like this, assuming a job_id of 11116:

<pre>
// first example
{
    $match : {
        "job_id" : 11116,
        "nlpql_feature" : {$exists:true, $ne:null},
        "value"         : {$exists:true, $ne:null}
    }
}

// second example
{
    $match : {
        "job_id" : 11116,
        "nlpql_feature" : {$exists:true, $ne:null},
        "dimension_X"   : {$exists:true, $ne:null},
        "dimension_Y"   : {$exists:true, $ne:null}
    }
}
</pre>
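Building this stage from a job id and a field list is mechanical. Here is a minimal Python sketch that produces the same structure as a dictionary, in the form a driver such as pymongo would accept (Python's ``None`` is encoded as a MongoDB null). The function name ``build_match_stage`` is our own and is not taken from the ClarityNLP source.

```python
def build_match_stage(job_id, field_list):
    """Build the initial $match stage: right job, all fields present and non-null."""
    match = {
        "job_id": job_id,
        "nlpql_feature": {"$exists": True, "$ne": None},
    }
    for field in field_list:
        # Existence alone is not enough; the value must also be non-null.
        match[field] = {"$exists": True, "$ne": None}
    return {"$match": match}


print(build_match_stage(11116, ["value"]))
print(build_match_stage(11116, ["dimension_X", "dimension_Y"]))
```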

Note that this initial filter checks only the fields that appear in the expression. That is why the ``xBetween10and20mm`` results ignore the Y and Z dimensions, and why the ``xyBetween10and20mm`` results ignore the Z dimension: their respective ``define`` statements place no filters on those variables!

### Subsequent Pipeline Stages

After generating the initial filter stage, ClarityNLP transforms the mathematical expression from infix to postfix. The reason for this is to remove parentheses and to resolve operator precedence and associativity issues. NLPQL uses the same [operator precedence](https://docs.python.org/3/reference/expressions.html#operator-precedence) and associativity as the Python programming language.

After conversion to postfix the expressions become:

<pre>
'nlpql_feature', 'Temperature', '==', 'value', '100.4', '>=', 'and'
'nlpql_feature', 'LesionMeasurement', '==', 'dimension_X', '5', '<', 'dimension_Y', '5', '<', 'and', 'and'
</pre>
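A standard way to perform this kind of conversion is the shunting-yard algorithm. We are not claiming that ClarityNLP uses exactly this code; the sketch below is a simplified illustration that handles only parentheses and binary, left-associative operators. The precedence table is our own and mirrors Python's rule that comparisons bind tighter than ``and`` and ``or``.

```python
# Binary, left-associative operators only; higher numbers bind tighter.
PRECEDENCE = {
    "or": 1, "and": 2,
    "==": 3, "!=": 3, "<": 3, "<=": 3, ">": 3, ">=": 3,
}


def infix_to_postfix(tokens):
    """Convert a list of infix tokens to postfix (shunting-yard)."""
    output, operators = [], []
    for token in tokens:
        key = token.lower()
        if key in PRECEDENCE:
            # Pop operators of equal or higher precedence before pushing this one.
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[key]):
                output.append(operators.pop())
            operators.append(key)
        elif token == "(":
            operators.append(token)
        elif token == ")":
            while operators and operators[-1] != "(":
                output.append(operators.pop())
            if not operators:
                raise ValueError("unbalanced parentheses")
            operators.pop()  # discard the matching "("
        else:
            output.append(token)  # operand: field name or literal
    while operators:
        op = operators.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output


infix = "( nlpql_feature == Temperature ) AND ( value >= 100.4 )".split()
print(infix_to_postfix(infix))
```

Feeding in the fully parenthesized form of each example reproduces the two postfix token lists shown above.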

A postfix expression can be evaluated with a stack-based evaluator. The general idea is to push the postfix tokens onto a stack until an operator is encountered, at which point its operands are popped, the operator expression evaluated, and the result pushed back onto the stack.

ClarityNLP uses this method, but the evaluation process does not compute a mathematical result. Instead, it performs string processing to generate MongoDB aggregation commands for evaluating the mathematical expression. MongoDB aggregation uses a consistent syntax that makes this evaluation process possible.
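To illustrate the idea, here is a small stack-based evaluator that emits MongoDB aggregation expressions as nested Python dictionaries instead of computing a number. Treat it as a sketch under stated assumptions: ClarityNLP's evaluator works on strings, whereas this version builds dictionaries, and the names ``MONGO_OPS``, ``to_operand``, and ``postfix_to_mongo`` are hypothetical. The caller must also say which tokens are field names so they can be prefixed with ``$``.

```python
MONGO_OPS = {
    "and": "$and", "or": "$or",
    "==": "$eq", "!=": "$ne",
    "<": "$lt", "<=": "$lte", ">": "$gt", ">=": "$gte",
}


def to_operand(token, field_names):
    """Turn a token into a field reference, a number, or a string literal."""
    if token in field_names:
        return "$" + token  # MongoDB field reference
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass
    return token  # string literal such as a feature name


def postfix_to_mongo(postfix, field_names):
    """Evaluate postfix tokens with a stack, producing an aggregation expression."""
    stack = []
    for token in postfix:
        if token in MONGO_OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append({MONGO_OPS[token]: [left, right]})
        else:
            stack.append(to_operand(token, field_names))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


expr = postfix_to_mongo(
    ["nlpql_feature", "Temperature", "==", "value", "100.4", ">=", "and"],
    field_names={"nlpql_feature", "value"},
)
print(expr)
```

The resulting dictionary is what would sit under ``"value"`` in a ``$project`` stage.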

After the postfix evaluation stage the expressions become:

<pre>
// (nlpql_feature == Temperature) and (value >= 100.4)
{
   $match : {
       "job_id" : 11116,
       "nlpql_feature" : {$exists:true, $ne:null},
       "value"         : {$exists:true, $ne:null}
   }
},
{
    "$project" : {
        "value" : {
            "$and" : [
                {"$eq"  : ["$nlpql_feature", "Temperature"]},
                {"$gte" : ["$value", 100.4]}
            ]
        }
    }
}

// (nlpql_feature == LesionMeasurement) and (dimension_X < 5 and dimension_Y < 5)
{
    "$match" : {
        "job_id" : 11116,
        "nlpql_feature" : {$exists:true, $ne:null},
        "dimension_X"   : {$exists:true, $ne:null},
        "dimension_Y"   : {$exists:true, $ne:null}
    }
},
{
    "$project" : {
        "value" : {
            "$and" : [
                {
                    "$eq" : ["$nlpql_feature", "LesionMeasurement"]
                },
                {
                    "$and" : [
                        {"$lt" : ["$dimension_X", 5]},
                        {"$lt" : ["$dimension_Y", 5]}
                    ]
                }
            ]
        }
    }
}
</pre>

At this point the aggregation pipelines for both expressions are complete. Each pipeline is sent to MongoDB where it runs and generates the results seen in the spreadsheet output above.

### Evaluation of Multi-Row Expressions

## Case #1:  Sentiment Analysis
For this  Cooking session, we are going to integrate a few external APIs that perform [sentiment analysis](https://en.wikipedia.org/wiki/Sentiment_analysis) and enable their use within the ClarityNLP ecosystem.  By the end of the session, you should have a good handle on how to incorporate any REST API into your [NLPQL](https://clarity-nlp.readthedocs.io/en/latest/user_guide/intro/overview.html#example-nlpql-phenotype-walkthrough) phenotypes.

### 1.1 Identify external APIs for sentiment analysis

For this example, we want to leverage some of the brilliant minds in text analytics to help us perform Sentiment Analysis using ClarityNLP.  You may or may not be surprised to learn that there are >100 APIs out there for performing sentiment analysis.

![NLPQL_Runner.png](assets/Sentiment_APIs.png)

Our first stop will be [Microsoft Azure Text Analytics](https://westus.dev.cognitive.microsoft.com/docs/services/TextAnalytics.V2.0/operations/56f30ceeeda5650db055a3c9/console).  The Azure Sentiment API lets you pass in a simple sentence or group of sentences and get back an overall sentiment score from 0 to 1, with 0 being very negative and 1 very positive.

Here is an example from Postman:

![NLPQL_Runner.png](assets/Azure_Sentiment_Query.png)

The sentiment score for the above sentence is very low (i.e., negative).  Let's try something a little more upbeat.

![NLPQL_Runner.png](assets/Azure_Happy_Query.png)

As you can see, we have a much more positive score (0.99+).  It's pretty fun to play around with different sentences ("I am super mad at you" scores 0.14, whereas "I am not super mad at you" scores 0.03).  Cool stuff, but our goal today is to look at how we might integrate such an API into ClarityNLP.

### 1.2 Transforming APIs into Custom Tasks 

*Start with a Template*

The first thing we'll do is start with a [Custom API Task Base Template](https://github.com/ClarityNLP/ClarityNLP/blob/ceb40586257078ef4f3f7ea91739141d47e83748/nlp/custom_tasks/SampleAPITask.py). This sample task calls an API to assign a random Chuck Norris joke to every document.

```python
from tasks.task_utilities import BaseTask
from pymongo import MongoClient
import requests


class SampleAPITask(BaseTask):
    task_name = "ChuckNorrisJokeTask"

    # NLPQL

    # define sampleTask:
    # Clarity.ChuckNorrisJokeTask({
    #   documentset: [ProviderNotes]
    # });

    def run_custom_task(self, temp_file, mongo_client: MongoClient):
        for doc in self.docs:

            response = requests.post('http://api.icndb.com/jokes/random')
            if response.status_code == 200:
                json_response = response.json()
                if json_response['type'] == 'success':
                    val = json_response['value']
                    obj = {
                        'joke': val['joke']
                    }

                    # writing results
                    self.write_result_data(temp_file, mongo_client, doc, obj)

            else:
                # writing to log (optional)
                self.write_log_data("OOPS", "No jokes this time!")
```

Now there is a lot of stuff to look at in there, but the only part you really have to pay attention to is the middle part below:

```python
     
        for doc in self.docs:

            response = requests.post('http://api.icndb.com/jokes/random')
            if response.status_code == 200:
                json_response = response.json()
                if json_response['type'] == 'success':
                    val = json_response['value']
                    obj = {
                        'joke': val['joke']
                    }

                    # writing results
                    self.write_result_data(temp_file, mongo_client, doc, obj)

            else:
                # writing to log (optional)
                self.write_log_data("OOPS", "No jokes this time!")
```

What this means is that for each document in the selected documentset, the task makes an API POST request. (The parameter `documentset: [ProviderNotes]` from our NLPQL becomes `self.docs` in the Custom Task code.)  The documentset could be nursing notes containing the term "central line" or documents tagged "Echocardiogram" or any documentset you can imagine, as we discussed in a [prior Cooking class](https://github.com/ClarityNLP/ClarityNLP/blob/master/notebooks/cooking/Cooking_with_ClarityNLP_091218.ipynb).  They will always be referred to as `self.docs` in a Custom Task.

For every one of these documents, this Task is going to ring up the `http://api.icndb.com/jokes/random` joke API and pick a good joke.  It will then add the joke to an object called `obj` and store it back in our results database.  Now, let's see if we can modify this for our Azure Sentiment API.

*Change the API Call*

For our sentiment analysis, we need to change up the POST headers and body to match the Azure API specifications.  So we'll change a couple of things:

```python
headers = {'Content-Type': 'application/json', 'Ocp-Apim-Subscription-Key': 'XXXXXX'}
payload = {"documents": [{"language": "en", "id": "1", "text": doc}]}
response = requests.post('https://eastus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment', headers=headers, json=payload)
```

What we've done is added some of the headers required (like our secret API key) and made the body of the request (the "payload") match the configuration shown in the Postman image above. Then instead of calling the ChuckNorris API, we change our call to Microsoft's URL.

*Change the API Result Handling*

Each API returns results in its own way, so you've got to follow the API documentation to see what you can expect back.  As we saw earlier, this Sentiment API responds with this kind of result:

```json
{
	"documents": [{
		"score": 0.14780092239379883,
		"id": "1"
	}],
	"errors": []
}
```

So we'll build our object a little differently than we did for Chuck Norris.  It'll need to look something like this.

```python
json_response = response.json()
val = json_response['documents'][0]
obj = {
    'sentiment_score': val['score']
    }
```

If we were passing in multiple documents at a time (which we are not), we would need to loop through the response one document at a time.  But in this case, we can just take the first (and only) response, hence the [0].
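For completeness, here is roughly what that loop could look like if several documents were sent in one request. This is a sketch only: ``scores_by_id`` is a hypothetical helper, and the second entry in the sample response is made up to show the shape of a batched result.

```python
def scores_by_id(json_response):
    """Map each document id in a sentiment response to its score."""
    return {doc["id"]: doc["score"] for doc in json_response.get("documents", [])}


# Hypothetical batched response with the same shape as the example above;
# the second document and its score are invented for illustration.
sample_response = {
    "documents": [
        {"score": 0.14780092239379883, "id": "1"},
        {"score": 0.97, "id": "2"},
    ],
    "errors": [],
}

scores = scores_by_id(sample_response)
print(scores)
```

Keying the scores by ``id`` lets you match each score back to the document you submitted, rather than relying on the order of the response.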

*API Keys*

Chuck Norris was a free API.  Azure is also free for limited usage, but you need an API key.  In this version, we are going to rely on the user to supply the API key by passing a parameter in their NLPQL.  Here is an example of the NLPQL we might see:

```
define PatientFeelings:
    Clarity.AzureSentiment({
        documentset: [ProviderNotes],
        "api_key": "{your_api_key}"
    });
```

In order to "catch" this api_key and use it in our Custom Task, we've got a library that gets custom_arguments from the NLPQL.  It looks like this:

```
self.pipeline_config.custom_arguments['{parameter_name}']
```

So in this case, `self.pipeline_config.custom_arguments['api_key']` would retrieve the API key submitted by the user in the NLPQL.
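Since the key comes from the user's NLPQL, it can be worth failing early with a clear message when it is missing. The sketch below assumes that ``custom_arguments`` behaves like a Python dictionary (the subscript access above suggests so, but we have not confirmed that ``.get`` is supported). ``get_api_key`` and ``fake_config`` are hypothetical names used only for this illustration.

```python
from types import SimpleNamespace


def get_api_key(pipeline_config):
    """Fetch 'api_key' from the NLPQL custom arguments, failing early if absent."""
    # Assumption: custom_arguments is dict-like and supports .get().
    api_key = pipeline_config.custom_arguments.get("api_key")
    if not api_key:
        raise ValueError("NLPQL must supply an 'api_key' argument")
    return api_key


# Stand-in for the real pipeline_config object, for illustration only.
fake_config = SimpleNamespace(custom_arguments={"api_key": "XXXXXX"})
print(get_api_key(fake_config))
```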




So putting the whole thing together, we've got our final code:

```python
    for doc in self.docs:
        headers = {'Content-Type': 'application/json', 'Ocp-Apim-Subscription-Key': self.pipeline_config.custom_arguments['api_key']}
        payload = {"documents": [{"language": "en", "id": "1", "text": doc}]}
        response = requests.post('https://eastus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment', headers=headers, json=payload)
        json_response = response.json()
        val = json_response['documents'][0]
        obj = {
            'sentiment_score': val['score'],
        }

        # writing results
        self.write_result_data(temp_file, mongo_client, doc, obj)

```

To see the final code, with the wrapping back in place and a little bit of error handling thrown in, take a look at [AzureSentimentTask.py](https://github.com/ClarityNLP/ClarityNLP/blob/master/nlp/custom_tasks/AzureSentimentTask.py) in the repo. We made one additional tweak to be sure we only send sentences that contain one of the terms in the termset.

```python
for doc in self.docs:
    sentence_list = self.get_document_sentences(doc)
    for sentence in sentence_list:
        if any(word.lower() in sentence.lower() for word in self.pipeline_config.terms):
            ...  # only these sentences are sent to the sentiment API
```
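The same case-insensitive filter can be tried outside of a Custom Task. Here is a standalone sketch in which ``sentences_with_terms`` is a hypothetical helper and the sample sentences are invented. It uses simple substring matching, just like the snippet above.

```python
def sentences_with_terms(sentences, terms):
    """Keep only sentences that contain at least one term (case-insensitive)."""
    lowered_terms = [term.lower() for term in terms]
    return [s for s in sentences
            if any(term in s.lower() for term in lowered_terms)]


# Invented sample sentences, for illustration only.
sample_sentences = [
    "I love watching football.",
    "The weather is nice today.",
    "FOOTBALL season starts soon.",
]
kept = sentences_with_terms(sample_sentences, ["football"])
print(kept)
```

Because this is substring matching, a short term can also match inside a longer word, so choose termset entries with that in mind.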

### 1.3 Using the Sentiment API Task in a Query 

Our API can now be called using the following NLPQL:

In [34]:
nlpql ='''
limit 1;

//phenotype name
phenotype "How we feel about birds" version "1";

//include Clarity main NLP libraries
include ClarityCore version "1.0" called Clarity;

termset Birds:
  ["football"];

define BirdFeelings:
  Clarity.AzureSentiment({
    termset:[Birds],
    "api_key":"'''+azure_key+'''"
    });
'''
run_result, main_csv, intermediate_csv, luigi = claritynlp.run_nlpql(nlpql)

Job Successfully Submitted
{
    "intermediate_results_csv": "http://18.220.133.76:5000/job_results/643/phenotype_intermediate",
    "job_id": "643",
    "luigi_task_monitoring": "http://18.220.133.76:8082/static/visualiser/index.html#search__search=job=643",
    "main_results_csv": "http://18.220.133.76:5000/job_results/643/phenotype",
    "phenotype_config": "http://18.220.133.76:5000/phenotype_id/643",
    "phenotype_id": "643",
    "pipeline_configs": [
        "http://18.220.133.76:5000/pipeline_id/862"
    ],
    "pipeline_ids": [
        862
    ],
    "results_viewer": "?job=643",
    "status_endpoint": "http://18.220.133.76:5000/status/643"
}


## NLPQL Editor

We know-- NLPQL isn't too hard to read or copy/tweak, but it is pretty tough to generate *de novo*.  So we've created an editor that helps you build your NLPQL without worrying about a missed semi-colon here or bracket there.  Let's [check it out](https://nlpql-editor.herokuapp.com/demo.html).

![NLPQL_Runner.png](assets/NLPQL_editor.png)

Thank you for joining this week's Cooking with ClarityNLP!  Please send any requests or ideas for future Cooking shows to charity.hilton@gtri.gatech.edu.

Have a great week!