### TAP Client Instructions
Before you start using the TAP Python client, make sure you have a running TAP instance that you can connect to.


### Running TAP
The easiest way to get TAP running is to follow one of the [Quick Start examples](https://heta-io.github.io/tap/overview/quick_start.html#get-started-with-docker).

#### Get Started with the TAP Client
Once TAP is running, note the IP address it is running on.

### Import TAP
First, install the TAP client (`tapclipy`) using pip.

In [1]:
!pip install 'tapclipy>=0.1.8'



### Connect To TAP
Let's import TAP and get a reference to it.

You will need to enter the URL of your TAP instance, including the port number.

You can print the TAP GraphQL URL to double-check that the client is pointing at the right address.

In [2]:
from tapclipy import tap_connect

# Create TAP Connection
tap = tap_connect.Connect('http://tap.hi2lab.io')
print(tap.url())

http://tap.hi2lab.io/graphql
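
If you noted a host and port rather than a full URL, a small helper can assemble the address to pass to `tap_connect.Connect`. This is only a minimal sketch: `build_tap_url` is a hypothetical helper, not part of tapclipy, and the host and port shown are placeholders.

```python
def build_tap_url(host, port=None, scheme="http"):
    """Assemble a base URL such as 'http://host:port' (hypothetical helper)."""
    host = host.strip().rstrip("/")
    # Reuse a scheme the caller may have already included
    if "://" in host:
        scheme, host = host.split("://", 1)
    return f"{scheme}://{host}:{port}" if port else f"{scheme}://{host}"

# Placeholder values; substitute the address of your own TAP instance
print(build_tap_url("localhost", 8080))
```

The returned string can then be used in place of the hard-coded URL in the cell above.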


### Grab the schema
Now we can grab the TAP schema and print it out.

In [3]:
tap.fetch_schema()
print("----------------------------------------------") 
# Avoid naming a variable `type`, which would shadow the Python built-in
for query_name, result_type in tap.schema_query_name_types().items():
    print("{} >> {}".format(query_name, result_type))
print("----------------------------------------------") 

----------------------------------------------
clean >> StringResult
annotations >> SentencesResult
vocabulary >> VocabResult
metrics >> MetricsResult
posStats >> PosStatsResult
syllables >> SyllablesResult
spelling >> SpellingResult
expressions >> ExpressionsResult
reflectExpressions >> ReflectExpressionsResult
affectExpressions >> AffectExpressionsResult
moves >> StringListResult
batch >> BatchResult
----------------------------------------------


These are the queries that are currently available, each shown with its return type.
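
Since the schema is a mapping from query name to return type, it can be handy to check a name before building a query with it. The sketch below uses a plain dictionary copied from the listing above; with a live connection you would presumably use the value returned by `tap.schema_query_name_types()` instead. `result_type_for` is a hypothetical helper, not part of tapclipy.

```python
# Query names and return types copied from the schema listing above
schema_types = {
    "clean": "StringResult",
    "annotations": "SentencesResult",
    "vocabulary": "VocabResult",
    "metrics": "MetricsResult",
    "posStats": "PosStatsResult",
    "syllables": "SyllablesResult",
    "spelling": "SpellingResult",
    "expressions": "ExpressionsResult",
    "reflectExpressions": "ReflectExpressionsResult",
    "affectExpressions": "AffectExpressionsResult",
    "moves": "StringListResult",
    "batch": "BatchResult",
}

def result_type_for(query_name, types=schema_types):
    """Return the result type for a query name, or raise a helpful error."""
    try:
        return types[query_name]
    except KeyError:
        available = ", ".join(sorted(types))
        raise ValueError(
            f"Unknown query '{query_name}'. Available: {available}"
        ) from None

print(result_type_for("metrics"))
```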

### Run your first query

Let's run our first query.

For this demo we will use 'metrics'.

To see what the query looks like, you can call print on it.

In [4]:
query = tap.query('metrics')
print("-" * 40)
print("Query:\n", query)
print("-" * 40)

----------------------------------------
Query:
 
query Metrics($input: String,$parameters:String) { 
    metrics(text:$input,parameters:$parameters) {
        analytics {
            words
            sentences
            sentWordCounts
            averageSentWordCount 
        }
        querytime
        message
        timestamp    
    }
}    

----------------------------------------
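
The `$input` and `$parameters` placeholders in the query are GraphQL variables, so their values are supplied separately from the query text. As a rough illustration, a GraphQL request sent over HTTP is commonly a JSON body with `query` and `variables` fields. The sketch below builds such a body by hand. Whether tapclipy assembles its request in exactly this way is an assumption, `build_payload` is a hypothetical helper, and the empty-string default for `parameters` is a placeholder.

```python
import json

# Query text copied from the printed output above (whitespace condensed)
METRICS_QUERY = """
query Metrics($input: String,$parameters:String) {
    metrics(text:$input,parameters:$parameters) {
        analytics { words sentences sentWordCounts averageSentWordCount }
        querytime
        message
        timestamp
    }
}
"""

def build_payload(query, text, parameters=""):
    """Pair a GraphQL query string with its variable values (hypothetical helper)."""
    return {
        "query": query,
        "variables": {"input": text, "parameters": parameters},
    }

payload = build_payload(METRICS_QUERY, "A short test sentence.")
print(json.dumps(payload["variables"], indent=2))
```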


Now that we have our query, we need some text to analyse.

Let's just use a string for now. We can then pass our query and the string to the `analyse_text` function provided by the TAP client.

This returns JSON, so we import the `json` module to make the output easier to read.

Take a look at the result and compare it to the query above. You can see how the schema matches up.

In [5]:
import json
string = "This is a very small test of TAP. It should produce some metrics on these two sentences! I can't wait"
strResult = tap.analyse_text(query, string)
print("-" * 40)
print("Result:\n", json.dumps(strResult, indent=2))
print("-" * 40)

----------------------------------------
Result:
 {
  "data": {
    "metrics": {
      "analytics": {
        "words": 21,
        "sentences": 3,
        "sentWordCounts": [
          8,
          9,
          4
        ],
        "averageSentWordCount": 7
      },
      "querytime": 22,
      "message": "",
      "timestamp": "2019-01-02T01:56:44.559394Z"
    }
  }
}
----------------------------------------
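
Because the result is an ordinary Python dictionary, individual values can be pulled out by key. The sketch below works on a copy of the response shown above, so it runs without a TAP connection. `get_analytics` is a hypothetical helper, and the check for an `errors` entry follows the general GraphQL response convention; whether the TAP client passes such an entry through unchanged is an assumption.

```python
# Response copied from the output above
result = {
    "data": {
        "metrics": {
            "analytics": {
                "words": 21,
                "sentences": 3,
                "sentWordCounts": [8, 9, 4],
                "averageSentWordCount": 7,
            },
            "querytime": 22,
            "message": "",
            "timestamp": "2019-01-02T01:56:44.559394Z",
        }
    }
}

def get_analytics(response, query_name):
    """Return the 'analytics' section of a response, failing loudly on errors."""
    if response.get("errors"):
        raise RuntimeError(f"Query failed: {response['errors']}")
    return response["data"][query_name]["analytics"]

analytics = get_analytics(result, "metrics")
counts = analytics["sentWordCounts"]

# The per-sentence counts should add up to the total word count
print("total matches:", sum(counts) == analytics["words"])
# The average is the total divided by the number of sentences
print("average matches:", sum(counts) / len(counts) == analytics["averageSentWordCount"])
```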


### Querying with a txt file
The above example demonstrates how to query using a string. Now let's say you have a txt file you wish to analyse with TAP.

Let's take a look at how that would work.

First let's load a txt file. For this example we are just using a txt file filled with lorem ipsum dummy text.

In [6]:
with open('dummyText.txt', 'r') as myfile:
    txtData=myfile.read().replace('\n', '')
print(txtData)

What is Lorem Ipsum?Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.Why do we use it?It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their 
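
Notice in the output above that removing newlines outright glues the end of one line to the start of the next (for example `Lorem Ipsum?Lorem Ipsum`). If that matters for your analysis, one alternative is to replace each run of whitespace with a single space. This is a minimal sketch using an inline sample rather than a file; `normalise_whitespace` is a hypothetical helper.

```python
def normalise_whitespace(text):
    """Collapse newlines and repeated spaces into single spaces."""
    # str.split() with no arguments splits on any run of whitespace
    return " ".join(text.split())

sample = "What is Lorem Ipsum?\nLorem Ipsum is simply dummy text.\n\nWhy do we use it?\n"
print(normalise_whitespace(sample))
```

With a real file, the same function could be applied to the string returned by `myfile.read()`.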

Now we can run a query with it. We can reuse the query we created earlier.

In [7]:
txtResult = tap.analyse_text(query, txtData)
print("-" * 40)
print("Result:\n", json.dumps(txtResult, indent=2))
print("-" * 40)

----------------------------------------
Result:
 {
  "data": {
    "metrics": {
      "analytics": {
        "words": 518,
        "sentences": 25,
        "sentWordCounts": [
          4,
          12,
          33,
          17,
          31,
          5,
          24,
          31,
          32,
          18,
          5,
          11,
          19,
          41,
          26,
          15,
          18,
          17,
          31,
          5,
          32,
          27,
          24,
          24,
          16
        ],
        "averageSentWordCount": 20.72
      },
      "querytime": 109,
      "message": "",
      "timestamp": "2019-01-02T01:56:44.732293Z"
    }
  }
}
----------------------------------------


### Querying a docx file
First let's import some handy modules to make working with docx files easier.

In [8]:
!pip install python-docx
from docx import Document

Collecting python-docx
  Downloading https://files.pythonhosted.org/packages/00/ed/dc8d859eb32980ccf0e5a9b1ab3311415baf55de208777d85826a7fb0b65/python-docx-0.8.7.tar.gz (5.4MB)
Collecting lxml>=2.3.2 (from python-docx)
  Downloading https://files.pythonhosted.org/packages/03/a4/9eea8035fc7c7670e5eab97f34ff2ef0ddd78a491bf96df5accedb0e63f5/lxml-4.2.5-cp36-cp36m-manylinux1_x86_64.whl (5.8MB)

The process is essentially the same as for a txt file; only the way we read the file is different.

First we open the file and convert it to a document. Then we loop over the paragraphs and add their text to a string.

Finally, we print the entire document as one string.

Now we can run a query on it.

In [9]:
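# The with-statement closes the file even if reading fails
with open('dummyText.docx', 'rb') as f:
    document = Document(f)
    # Concatenate the text of every paragraph into one string
    docData = ""
    for p in document.paragraphs:
        docData += p.text

print(docData)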

What is Lorem Ipsum?Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.Why do we use it?It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their 
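
As with the txt file, concatenating paragraph text directly leaves no space between paragraphs. A possible variation is to join the paragraphs with a separator and skip empty ones. The sketch below uses simple stand-in objects with a `.text` attribute so that it runs without a docx file; with a real document you would pass `document.paragraphs`. `join_paragraphs` is a hypothetical helper.

```python
from types import SimpleNamespace

def join_paragraphs(paragraphs, separator=" "):
    """Join the text of paragraph-like objects, skipping empty paragraphs."""
    texts = (p.text.strip() for p in paragraphs)
    return separator.join(t for t in texts if t)

# Stand-ins for docx paragraphs, used here only for illustration
paragraphs = [
    SimpleNamespace(text="What is Lorem Ipsum?"),
    SimpleNamespace(text=""),
    SimpleNamespace(text="Lorem Ipsum is simply dummy text."),
]
print(join_paragraphs(paragraphs))
```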

In [10]:
docResults = tap.analyse_text(query, docData)
print("-" * 40)
print("Result:\n", json.dumps(docResults, indent=2))
print("-" * 40)

----------------------------------------
Result:
 {
  "data": {
    "metrics": {
      "analytics": {
        "words": 518,
        "sentences": 25,
        "sentWordCounts": [
          4,
          12,
          33,
          17,
          31,
          5,
          24,
          31,
          32,
          18,
          5,
          11,
          19,
          41,
          26,
          15,
          18,
          17,
          31,
          5,
          32,
          27,
          24,
          24,
          16
        ],
        "averageSentWordCount": 20.72
      },
      "querytime": 60,
      "message": "",
      "timestamp": "2019-01-02T01:56:51.054561Z"
    }
  }
}
----------------------------------------


### Querying a pdf file
Now let's see if we can analyse a pdf file.

First let's install and import a handy package to handle this for us.

In [11]:
!pip install tika
from tika import parser

Collecting tika
  Downloading https://files.pythonhosted.org/packages/10/75/b566e446ffcf292f10c8d84c15a3d91615fe3d7ca8072a17c949d4e84b66/tika-1.19.tar.gz
Building wheels for collected packages: tika
  Running setup.py bdist_wheel for tika ... done
  Stored in directory: /home/jovyan/.cache/pip/wheels/b4/db/8a/3a3f0c0725448eaa92703e3dda71e29dc13a119ff6c1036848
Successfully built tika
Installing collected packages: tika
Successfully installed tika-1.19


tika is an easy-to-use module that works with Python 3 and Windows and allows us to read pdf files.

First we parse the pdf file, then we take its raw content, remove the newlines, and print the result.

In [12]:
raw = parser.from_file('dummyText.pdf')
pdfContent = raw['content'].replace('\n', '')
print(pdfContent)

2019-01-02 01:56:54,757 [MainThread  ] [INFO ]  Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar to /tmp/tika-server.jar.
2019-01-02 01:57:05,999 [MainThread  ] [INFO ]  Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar.md5 to /tmp/tika-server.jar.md5.
2019-01-02 01:57:06,895 [MainThread  ] [WARNI]  Failed to see startup log message; retrying...


What is Lorem Ipsum? Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. Why do we use it? It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as the
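
The parsed result is a dictionary, and its `content` entry can be `None` when no text could be extracted (for instance, from a scanned PDF), in which case calling `.replace` on it would raise an error. Below is a hedged sketch of a guard, using plain dictionaries in place of a real tika result so that it runs on its own; `extract_text` is a hypothetical helper.

```python
def extract_text(parsed):
    """Return whitespace-normalised text from a tika-style result, or '' if none."""
    content = parsed.get("content")
    if content is None:
        return ""
    # Collapse newlines and repeated spaces into single spaces
    return " ".join(content.split())

# Dictionaries standing in for the value returned by parser.from_file
print(extract_text({"content": "\n\nWhat is Lorem Ipsum?\nLorem Ipsum is dummy text.\n"}))
print(repr(extract_text({"content": None})))
```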

Lastly we can run the same query again.

In [13]:
pdfResult = tap.analyse_text(query, pdfContent)
print("-" * 40)
print("Result:\n", json.dumps(pdfResult, indent=2))
print("-" * 40)

----------------------------------------
Result:
 {
  "data": {
    "metrics": {
      "analytics": {
        "words": 537,
        "sentences": 29,
        "sentWordCounts": [
          4,
          12,
          33,
          17,
          31,
          5,
          24,
          31,
          32,
          18,
          5,
          11,
          19,
          41,
          26,
          15,
          18,
          17,
          31,
          5,
          32,
          27,
          24,
          24,
          16,
          4,
          5,
          5,
          5
        ],
        "averageSentWordCount": 18.517241379310345
      },
      "querytime": 72,
      "message": "",
      "timestamp": "2019-01-02T01:57:23.530922Z"
    }
  }
}
----------------------------------------
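
The PDF result reports 537 words in 29 sentences, whereas the txt and docx results report 518 words in 25 sentences. Comparing the `sentWordCounts` lists helps locate the difference. The sketch below uses the lists copied from the outputs above; why the PDF extraction yields extra text is not something the output tells us, so the sketch only shows where the counts diverge.

```python
# sentWordCounts copied from the txt/docx result
txt_counts = [4, 12, 33, 17, 31, 5, 24, 31, 32, 18, 5, 11, 19,
              41, 26, 15, 18, 17, 31, 5, 32, 27, 24, 24, 16]
# sentWordCounts copied from the pdf result
pdf_counts = [4, 12, 33, 17, 31, 5, 24, 31, 32, 18, 5, 11, 19,
              41, 26, 15, 18, 17, 31, 5, 32, 27, 24, 24, 16,
              4, 5, 5, 5]

shared = len(txt_counts)
# True when the pdf list starts with exactly the txt list
print("same prefix:", pdf_counts[:shared] == txt_counts)

# Whatever remains are sentences that only appear in the pdf result
extra = pdf_counts[shared:]
print("extra sentences:", extra)
print("extra words:", sum(extra))
print("pdf average:", sum(pdf_counts) / len(pdf_counts))
```

The first 25 counts are identical, so the four short trailing sentences account for the 19 additional words and for the lower average sentence length.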


### Summary
Great! As you can see, you can run queries on almost any type of text you wish.

If you have any specific needs or want to see anything added, be sure to let us know!

Lastly, let's print out all the results.

In [14]:
print("-" * 40)
print("STR Result:\n", json.dumps(strResult, indent=2))
print("-" * 40)
print("TXT Result:\n", json.dumps(txtResult, indent=2))
print("-" * 40)
print("Docx Results:\n", json.dumps(docResults, indent=2))
print("-" * 40)
print("PDF Results:\n", json.dumps(pdfResult, indent=2))
print("-" * 40)

----------------------------------------
STR Result:
 {
  "data": {
    "metrics": {
      "analytics": {
        "words": 21,
        "sentences": 3,
        "sentWordCounts": [
          8,
          9,
          4
        ],
        "averageSentWordCount": 7
      },
      "querytime": 22,
      "message": "",
      "timestamp": "2019-01-02T01:56:44.559394Z"
    }
  }
}
----------------------------------------
TXT Result:
 {
  "data": {
    "metrics": {
      "analytics": {
        "words": 518,
        "sentences": 25,
        "sentWordCounts": [
          4,
          12,
          33,
          17,
          31,
          5,
          24,
          31,
          32,
          18,
          5,
          11,
          19,
          41,
          26,
          15,
          18,
          17,
          31,
          5,
          32,
          27,
          24,
          24,
          16
        ],
        "averageSentWordCount": 20.72
      },
      "querytime": 109,
      "message":