# Prerequisite Setup
First, ensure that TAP is running and that you have the IP address of your TAP instance.

Also ensure you have fetched the schema.

In [1]:
!pip install 'tapclipy>=0.2.2'
from tapclipy import tap_connect
import json

# Create TAP Connection
tap = tap_connect.Connect('http://tap.hi2lab.io')
tap.fetch_schema()
print(tap.url())

http://tap.hi2lab.io/graphql


## Annotations

Annotation is a query that splits the text into JSON data, separating the sentences into their own array and providing various stats on each word.
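
To make that shape concrete, here is a minimal offline sketch that walks a response: sentences sit in an `analytics` array and each sentence carries its own `tokens` array. The `sample` dict is a hand-trimmed stand-in modelled on the raw results printed in the examples further down, and `collect_field` is a hypothetical helper name, not part of tapclipy.

```python
# Hand-trimmed stand-in for a TAP annotations response (two tokens only).
sample = {
    "data": {
        "annotations": {
            "analytics": [
                {
                    "idx": 0,
                    "tokens": [
                        {"idx": 0, "term": "The", "lemma": "the", "postag": "DT"},
                        {"idx": 1, "term": "man", "lemma": "man", "postag": "NN"},
                    ],
                }
            ]
        }
    }
}


def collect_field(result, field):
    """Return one list per sentence holding `field` for every token."""
    sentences = result["data"]["annotations"]["analytics"]
    return [[token[field] for token in s["tokens"]] for s in sentences]


print(collect_field(sample, "term"))
print(collect_field(sample, "lemma"))
```

A helper along these lines could stand in for the repeated nested loops in the examples, at the cost of one pass over the response per field.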

The stats provided for each word:

- lemma = returns the base (dictionary) form of the word, with its inflection removed. You can find out more about lemmatisation [here](https://en.wikipedia.org/wiki/Lemmatisation)
- parent = returns the index of the word this word is dependent on
- pos tag = returns the part-of-speech tag for this word. Learn more [here](https://nlp.stanford.edu/software/tagger.shtml)
- children = returns the indices of the words that are dependent on this word
- dep type = returns the dependency type. Learn more [here](https://nlp.stanford.edu/software/dependencies_manual.pdf)
- ner tag = returns the named entity recognized, if any. Learn more [here](https://nlp.stanford.edu/software/CRF-NER.shtml)

This query can provide different outcomes based on the pipeline type passed.

Possible pipelines:

- clu = returns the lemma, pos tag and ner tag
- standard = returns the lemma, pos tag, parent, children and dep type
- fast = returns the lemma and pos tag
- ner = returns the lemma, pos tag, parent, children, dep type and ner tag.

See below for examples and descriptions.
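
The pipeline list above can also be written down as data. The sketch below is only an illustration: `PIPELINE_FIELDS` and `build_params` are hypothetical names, while the `pipeType` key and the field names (`postag`, `deptype`, `nertag` and so on) are taken from the examples and raw results in this notebook.

```python
import json

# Fields each pipeline fills in, as listed above.
PIPELINE_FIELDS = {
    "clu": ["lemma", "postag", "nertag"],
    "standard": ["lemma", "postag", "parent", "children", "deptype"],
    "fast": ["lemma", "postag"],
    "ner": ["lemma", "postag", "parent", "children", "deptype", "nertag"],
}


def build_params(pipe_type):
    """Build the JSON parameter string for the given pipeline name."""
    if pipe_type not in PIPELINE_FIELDS:
        raise ValueError(f"unknown pipeline: {pipe_type!r}")
    return json.dumps({"pipeType": pipe_type})


print(build_params("clu"))
```

The returned string should be usable as the `params` argument of `tap.analyse_text`, in place of the hand-written JSON literal, and a mistyped pipeline name fails before any request is sent.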

## Clu Pipeline
Returns the lemma, pos tag and ner tag for each word.
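
One thing these fields allow is pulling the named entities out of a sentence. The sketch below runs offline on terms and tags copied from the example output in this section. `named_entities` is a hypothetical helper, and treating both `"O"` and the empty string as "no entity" is an assumption based on the raw results shown in this notebook.

```python
# Terms and NER tags copied from the example output in this section.
terms = "The man Mike jumped over a log and patted his dog Max".split()
ner_tags = "O O I-PER O O O O O O O O I-PER".split()

# Rebuild minimal token dicts using the same keys as the raw result.
tokens = [{"term": t, "nertag": n} for t, n in zip(terms, ner_tags)]


def named_entities(tokens):
    """Keep tokens whose NER tag is neither empty nor "O" (assumed: no entity)."""
    return [
        (tok["term"], tok["nertag"])
        for tok in tokens
        if tok["nertag"] not in ("", "O")
    ]


print(named_entities(tokens))  # [('Mike', 'I-PER'), ('Max', 'I-PER')]
```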

### Example:

In [11]:
# Set our query type to annotations
query = tap.query('annotations')

# Set our pipeline parameter to clu, standard, fast or ner
params = '''{ "pipeType":"clu" }'''

# pass in some test data
string = "The man Mike jumped over a log and patted his dog Max"

# query the api
strResult = tap.analyse_text(query, string, params)


# build an array of all the lemma, pos and ner terms for each word
lemSentence = []
posSentence = []
nerSentence = []
for sentence in strResult['data']['annotations']['analytics']:
    for token in sentence['tokens']:
        lemSentence.append(token['lemma'])
        posSentence.append(token['postag'])
        nerSentence.append(token['nertag'])


# Print Result

print("-" * 40)
print("Annotations:")
print("-" * 40)

print("Input Text:\n\n", string)

print("\n")
print("Lemma Result:\n\n", " ".join(lemSentence))

print("\n")
print("Pos Tag Result:\n\n", " ".join(posSentence))

print("\n")
print("Ner Tag Result:\n\n", " ".join(nerSentence))

print("\n")
print("Raw Result:\n\n", json.dumps(strResult, indent=2))

----------------------------------------
Annotations:
----------------------------------------
Input Text:

 The man Mike jumped over a log and patted his dog Max


Lemma Result:

 the man mike jump over a log and pat his dog max


Pos Tag Result:

 DT NN NNP VBD IN DT NN CC VBD PRP$ NN NNP


Ner Tag Result:

 O O I-PER O O O O O O O O I-PER


Raw Result:

 {
  "data": {
    "annotations": {
      "analytics": [
        {
          "idx": 0,
          "start": -1,
          "end": -1,
          "length": 12,
          "tokens": [
            {
              "idx": 0,
              "term": "The",
              "lemma": "the",
              "postag": "DT",
              "parent": -1,
              "children": [],
              "deptype": "",
              "nertag": "O"
            },
            {
              "idx": 1,
              "term": "man",
              "lemma": "man",
              "postag": "NN",
              "parent": -1,
              "children": [],
              "deptype

## Standard Pipeline
Returns the lemma, pos tag, parent, children and dep type.
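
The parent values are token indices, so they can be resolved back to words. The sketch below does this offline with values copied from the example output in this section. `describe_edges` is a hypothetical helper, the empty dep type for "man" is inferred from the double space in the printed dep result, and reading a negative parent as "no parent" is an assumption.

```python
# Values copied from the example output in this section.
terms = "The man Mike jumped over a log and patted his dog Max".split()
parents = [1, -2, 3, 1, 3, 6, 4, 3, 3, 10, 8, 10]
deptypes = ["det", "", "nsubj", "rcmod", "prep", "det",
            "pobj", "cc", "conj", "poss", "dobj", "appos"]


def describe_edges(terms, parents, deptypes):
    """Return (word, dep type, parent word) for every token with a valid parent."""
    edges = []
    for term, parent, dep in zip(terms, parents, deptypes):
        # Assumption: a negative parent index means the token has no parent.
        if 0 <= parent < len(terms):
            edges.append((term, dep, terms[parent]))
    return edges


for word, dep, head in describe_edges(terms, parents, deptypes):
    print(f"{word} --{dep}--> {head}")
```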

### Example:

In [14]:
# Set our query type to annotations
query = tap.query('annotations')

# Set our pipeline parameter to clu, standard, fast or ner
params = '''{ "pipeType":"standard" }'''

# pass in some test data
string = "The man Mike jumped over a log and patted his dog Max"

# query the api
strResult = tap.analyse_text(query, string, params)

# build arrays of the lemma, pos, parent, children and dep type for each word
lemSentence = []
posSentence = []
parentSentence = []
childSentence = []
depSentence = []

for sentence in strResult['data']['annotations']['analytics']:
    for token in sentence['tokens']:
        lemSentence.append(token['lemma'])
        posSentence.append(token['postag'])
        parentSentence.append(str(token['parent']))
        if len(token['children']) > 0:
            childSentence.append(str(token['children']))
        else:
            childSentence.append("0")
        depSentence.append(token['deptype'])

# Print Result
print("-" * 40)
print("Annotations:")
print("-" * 40)

print("Input Text:\n\n", string)

print("\n")
print("Lemma Result:\n\n", " ".join(lemSentence))

print("\n")
print("Pos Tag Result:\n\n", " ".join(posSentence))

print("\n")
print("Parents Result:\n\n", " ".join(parentSentence))

print("\n")
print("Children Result:\n\n", " ".join(childSentence))

print("\n")
print("Dep Result:\n\n", " ".join(depSentence))

print("\n")
print("Raw Result:\n\n", json.dumps(strResult, indent=2))

----------------------------------------
Annotations:
----------------------------------------
Input Text:

 The man Mike jumped over a log and patted his dog Max


Lemma Result:

 the man mike jump over a log and pat his dog max


Pos Tag Result:

 DT NN NNP VBD IN DT NN CC VBD PRP$ NN NNP


Parents Result:

 1 -2 3 1 3 6 4 3 3 10 8 10


Children Result:

 0 [0, 3] 0 [2, 4, 7, 8] [6] 0 [5] 0 [10] 0 [9, 11] 0


Dep Result:

 det  nsubj rcmod prep det pobj cc conj poss dobj appos


Raw Result:

 {
  "data": {
    "annotations": {
      "analytics": [
        {
          "idx": 0,
          "start": 0,
          "end": 12,
          "length": 12,
          "tokens": [
            {
              "idx": 0,
              "term": "The",
              "lemma": "the",
              "postag": "DT",
              "parent": 1,
              "children": [],
              "deptype": "det",
              "nertag": ""
            },
            {
              "idx": 1,
              "term": "man",


## Fast Pipeline
Returns only the lemma and pos tag.
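
Even with only these two fields, simple statistics are possible. As a small offline illustration, the sketch below counts the part-of-speech tags copied from the example output in this section. `pos_histogram` is a hypothetical helper name.

```python
from collections import Counter

# POS tags copied from the example output in this section.
pos_tags = "DT NN NNP VBD IN DT NN CC VBD PRP$ NN NNP".split()


def pos_histogram(tags):
    """Count how often each part-of-speech tag occurs."""
    return Counter(tags)


counts = pos_histogram(pos_tags)
for tag, n in counts.most_common():
    print(f"{tag:5} {n}")
```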

### Example:

In [15]:
# Set our query type to annotations
query = tap.query('annotations')

# Set our pipeline parameter to clu, standard, fast or ner
params = '''{ "pipeType":"fast" }'''

# pass in some test data
string = "The man Matthew jumped over a log and patted his dog Max"

# query the api
strResult = tap.analyse_text(query, string, params)


# build an array of all the lemma and pos terms for each word
lemSentence = []
posSentence = []

for sentence in strResult['data']['annotations']['analytics']:
    for token in sentence['tokens']:
        lemSentence.append(token['lemma'])
        posSentence.append(token['postag'])

# Print Result
print("-" * 40)
print("Annotations:")
print("-" * 40)

print("Input Text:\n\n", string)

print("\n")
print("Lemma Result:\n\n", " ".join(lemSentence))

print("\n")
print("Pos Tag Result:\n\n", " ".join(posSentence))

print("\n")
print("Raw Result:\n\n", json.dumps(strResult, indent=2))

----------------------------------------
Annotations:
----------------------------------------
Input Text:

 The man Matthew jumped over a log and patted his dog Max


Lemma Result:

 the man matthew jump over a log and pat his dog max


Pos Tag Result:

 DT NN NNP VBD IN DT NN CC VBD PRP$ NN NNP


Raw Result:

 {
  "data": {
    "annotations": {
      "analytics": [
        {
          "idx": 0,
          "start": 0,
          "end": 12,
          "length": 12,
          "tokens": [
            {
              "idx": 0,
              "term": "The",
              "lemma": "the",
              "postag": "DT",
              "parent": -1,
              "children": [],
              "deptype": "",
              "nertag": ""
            },
            {
              "idx": 1,
              "term": "man",
              "lemma": "man",
              "postag": "NN",
              "parent": -1,
              "children": [],
              "deptype": "",
              "nertag": ""
            },

## Ner Pipeline
Returns everything: the lemma, pos tag, parent, children, dep type and ner tag.
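
Because this pipeline fills in both the dependency fields and the ner tag, the two can be combined, for example to see which word each named entity attaches to. The sketch below works offline on values copied from the example output in this section. `entity_heads` is a hypothetical helper, the empty dep type for "man" is inferred from the double space in the printed dep result, and reading a negative parent as "no parent" is an assumption.

```python
# Values copied from the example output in this section.
terms = "The man Matthew jumped over a log and patted his dog Max".split()
parents = [1, -2, 3, 1, 3, 6, 4, 3, 3, 10, 8, 10]
deptypes = ["det", "", "nsubj", "rcmod", "prep", "det",
            "pobj", "cc", "conj", "poss", "dobj", "appos"]
ner_tags = "O O PER O O O O O O O O PER".split()


def entity_heads(terms, parents, deptypes, ner_tags):
    """Return (entity, ner tag, dep type, parent word) for each tagged token."""
    rows = []
    for i, tag in enumerate(ner_tags):
        if tag in ("", "O"):  # assumed to mean "not an entity"
            continue
        parent = parents[i]
        # Assumption: a negative parent index means the token has no parent.
        head = terms[parent] if 0 <= parent < len(terms) else None
        rows.append((terms[i], tag, deptypes[i], head))
    return rows


for row in entity_heads(terms, parents, deptypes, ner_tags):
    print(row)
```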

### Example:

In [17]:
# Set our query type to annotations
query = tap.query('annotations')

# Set our pipeline parameter to clu, standard, fast or ner
params = '''{ "pipeType":"ner" }'''

# pass in some test data
string = "The man Matthew jumped over a log and patted his dog Max"

# query the api
strResult = tap.analyse_text(query, string, params)

# build an array of all the results
lemSentence = []
posSentence = []
parentSentence = []
childSentence = []
depSentence = []
nerSentence = []

for sentence in strResult['data']['annotations']['analytics']:
    for token in sentence['tokens']:
        lemSentence.append(token['lemma'])
        posSentence.append(token['postag'])
        parentSentence.append(str(token['parent']))
        if len(token['children']) > 0:
            childSentence.append(str(token['children']))
        else:
            childSentence.append("0")
        depSentence.append(token['deptype'])
        nerSentence.append(token['nertag'])

# Print Result

print("-" * 40)
print("Annotations:")
print("-" * 40)

print("Input Text:\n\n", string)

print("\n")
print("Lemma Result:\n\n", " ".join(lemSentence))

print("\n")
print("Pos Tag Result:\n\n", " ".join(posSentence))

print("\n")
print("Parents Result:\n\n", " ".join(parentSentence))

print("\n")
print("Children Result:\n\n", " ".join(childSentence))

print("\n")
print("Dep Result:\n\n", " ".join(depSentence))

print("\n")
print("Ner Result:\n\n", " ".join(nerSentence))

print("\n")
print("Raw Result:\n\n", json.dumps(strResult, indent=2))

----------------------------------------
Annotations:
----------------------------------------
Input Text:

 The man Matthew jumped over a log and patted his dog Max


Lemma Result:

 the man matthew jump over a log and pat his dog max


Pos Tag Result:

 DT NN NNP VBD IN DT NN CC VBD PRP$ NN NNP


Parents Result:

 1 -2 3 1 3 6 4 3 3 10 8 10


Children Result:

 0 [0, 3] 0 [2, 4, 7, 8] [6] 0 [5] 0 [10] 0 [9, 11] 0


Dep Result:

 det  nsubj rcmod prep det pobj cc conj poss dobj appos


Ner Result:

 O O PER O O O O O O O O PER


Raw Result:

 {
  "data": {
    "annotations": {
      "analytics": [
        {
          "idx": 0,
          "start": 0,
          "end": 12,
          "length": 12,
          "tokens": [
            {
              "idx": 0,
              "term": "The",
              "lemma": "the",
              "postag": "DT",
              "parent": 1,
              "children": [],
              "deptype": "det",
              "nertag": "O"
            },
            {
  