# Jupyter Demo: Parsr API Access

This demo shows how to process a document (PDF or image) through the Parsr pipeline's API and retrieve the results in several output formats.

## Module Import

In [1]:
import parsr_api

## Send document for processing

In [2]:
p = parsr_api.ParserApi('localhost:3001')

In [3]:
job = p.send_document('./sampleFile.pdf', './sampleConfig.json')
jobId = job['server_response']
job

{'file': './sampleFile.pdf',
 'config': './sampleConfig.json',
 'status_code': 202,
 'server_response': '2b95967f2b1120ca597343a0f18cdc'}
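Before using the job ID, it can help to check that the submission was accepted. The following is a minimal sketch, assuming (from the sample output above) that an accepted submission carries `status_code` 202 and the job ID in `server_response`. `extract_job_id` is a hypothetical helper written for this demo, not part of `parsr_api`.

```python
def extract_job_id(job):
    """Return the job ID from a send_document-style response dict.

    Assumption: status_code 202 means the server accepted the document,
    as in the sample output shown in this notebook.
    """
    if job.get('status_code') != 202:
        raise RuntimeError(f"Submission not accepted: {job}")
    return job['server_response']


# Sample response copied from the output above
sample_job = {
    'file': './sampleFile.pdf',
    'config': './sampleConfig.json',
    'status_code': 202,
    'server_response': '2b95967f2b1120ca597343a0f18cdc',
}
job_id = extract_job_id(sample_job)
print(job_id)
```

In the notebook, the same helper could be called on `job` instead of `sample_job`.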

## Query the queue for status

In [4]:
p.get_status(jobId)

{'request_id': '2b95967f2b1120ca597343a0f18cdc',
 'server_response': '{"id":"2b95967f2b1120ca597343a0f18cdc","json":"/api/v1/json/2b95967f2b1120ca597343a0f18cdc","csv":"/api/v1/csv/2b95967f2b1120ca597343a0f18cdc","text":"/api/v1/text/2b95967f2b1120ca597343a0f18cdc","markdown":"/api/v1/markdown/2b95967f2b1120ca597343a0f18cdc"}'}
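In the output above, `server_response` is a JSON-encoded string rather than a dictionary. A short sketch of decoding it with the standard `json` module follows; it uses the sample response shown above, and assumes the response has this shape once processing has finished (the response for a job still in the queue may look different).

```python
import json

# Status response copied from the output above
status = {
    'request_id': '2b95967f2b1120ca597343a0f18cdc',
    'server_response': (
        '{"id":"2b95967f2b1120ca597343a0f18cdc",'
        '"json":"/api/v1/json/2b95967f2b1120ca597343a0f18cdc",'
        '"csv":"/api/v1/csv/2b95967f2b1120ca597343a0f18cdc",'
        '"text":"/api/v1/text/2b95967f2b1120ca597343a0f18cdc",'
        '"markdown":"/api/v1/markdown/2b95967f2b1120ca597343a0f18cdc"}'
    ),
}

# Decode the JSON string into a dict mapping output format to endpoint path
links = json.loads(status['server_response'])

# List the available output formats (every key except the job ID itself)
formats = sorted(key for key in links if key != 'id')
print(formats)
print(links['markdown'])
```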

## Get the Raw Text output

In [5]:
txt_output = p.get_text(jobId)['server_response']

## Get the Markdown output

In [6]:
md_output = p.get_markdown(jobId)['server_response']
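The text and Markdown outputs can be written to disk for later use. This is a minimal sketch, assuming both outputs are plain strings; `save_output` is a hypothetical helper, and the file names and stand-in contents are illustrative only.

```python
import tempfile
from pathlib import Path


def save_output(content, path):
    """Write a text output to `path` as UTF-8 and return the resulting Path."""
    target = Path(path)
    # Create the parent directory if it does not exist yet
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding='utf-8')
    return target


# Stand-in strings; in the notebook these would be txt_output and md_output
out_dir = Path(tempfile.mkdtemp())
txt_path = save_output('Document Parsing\n', out_dir / 'sampleFile.txt')
md_path = save_output('# Document Parsing\n', out_dir / 'sampleFile.md')
print(txt_path, md_path)
```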

## Get the full JSON output

In [7]:
json_output = p.get_json(jobId)['server_response']

## Interpret the JSON output

In [8]:
# Use a distinct alias so the API client `p` created earlier is not overwritten
import parsr_output_interpreter as poi
pa = poi.ParsrOutputInterpreter(json_output)

### Get all the text on Page 1

In [9]:
pa.get_texts(page_number=1)

'Document Parsing\n\nA Document Parsing system\n\nOfficial Website (work in progress)\n\nhttps://axatechlab.github.io/AXA-AEL-pdfparser/\n\nAPI\n\nTo start the API server, just run:\n\nnpm run start:api\n\nThe documentation is <a href="docs/api.html">here.</a>\n\nBinary dependencies for Linux and Mac OS X\n\nWe use qpdf, mupdf-tools, imagemagick and pdf2json to do process pdf files,extract fonts and convert pdf to json structure. your machine prior to use docparser.\n\nYou must install this tools on\n\npacman -S qpdf mupdf-tools pdf2json imagemagickapt-get install \n\nOn OS X:\n\nqpdf pdf2json imagemagick\n\n# # \n\nbrew install \n\nTesseract\n\nqpdf mupdf-tools pdf2json \n\n<a href="https://github.com/tesseract-ocr/tesseract/">https://github.com/tesseract-ocr/tesseract/</a>\n\nimagemagick\n\nOnly used if you give an image to the pipeline.\n\nDuckling\n\nArch LinuxDebian based \n\nFollow this guide: <a href="https://github.com/facebook/duckling#duckling-">https://github.com/facebook/du