# Jupyter Demo: Parsr API Access

This demo shows how to process a document (PDF or image) through the Parsr pipeline's API and retrieve the various outputs it generates.

## Module Import

In [1]:
import parsr_api

## Send document for processing

In [2]:
# Connect to the Parsr API server, here reachable at localhost on port 3001
p = parsr_api.ParserApi('localhost:3001')

In [3]:
job = p.sendDocument('./sampleFile.pdf', './sampleConfig.json')
jobId = job['server_response']
job

{'file': './sampleFile.pdf',
 'config': './sampleConfig.json',
 'status_code': 202,
 'server_response': '4d9349e056e83282210aec991c655c'}
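
The response above carries both an HTTP status code and the job id, so it can be worth checking the former before using the latter. A minimal sketch, assuming that any status other than 202 (Accepted) means the upload failed; `extract_job_id` is a hypothetical helper, not part of `parsr_api`:

```python
def extract_job_id(job):
    """Return the job id from a sendDocument-style response dict.

    Assumption: any status other than 202 (Accepted) means the upload failed.
    """
    if job.get('status_code') != 202:
        raise RuntimeError(
            f"Upload not accepted: {job.get('status_code')} "
            f"{job.get('server_response')!r}"
        )
    return job['server_response']


# Response dictionary copied from the output above.
sample_job = {
    'file': './sampleFile.pdf',
    'config': './sampleConfig.json',
    'status_code': 202,
    'server_response': '4d9349e056e83282210aec991c655c',
}

job_id = extract_job_id(sample_job)
print(job_id)
```

In the notebook itself, `jobId = extract_job_id(job)` could stand in for the direct dictionary lookup.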

## Query the queue for status

In [4]:
p.getStatus(jobId)

{'request_id': '4d9349e056e83282210aec991c655c',
 'server_response': '{"id":"4d9349e056e83282210aec991c655c","json":"/api/v1/json/4d9349e056e83282210aec991c655c","csv":"/api/v1/csv/4d9349e056e83282210aec991c655c","text":"/api/v1/text/4d9349e056e83282210aec991c655c","markdown":"/api/v1/markdown/4d9349e056e83282210aec991c655c"}'}
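
In the status shown above, `server_response` is a JSON-encoded string rather than a dictionary. A small sketch of decoding it with the standard `json` module to reach the individual output paths; `parse_status` is a hypothetical helper name, and the sample value is copied from the output above:

```python
import json


def parse_status(status):
    """Decode the JSON string stored under 'server_response' into a dict."""
    return json.loads(status['server_response'])


# Status dictionary copied from the output above.
sample_status = {
    'request_id': '4d9349e056e83282210aec991c655c',
    'server_response': (
        '{"id":"4d9349e056e83282210aec991c655c",'
        '"json":"/api/v1/json/4d9349e056e83282210aec991c655c",'
        '"csv":"/api/v1/csv/4d9349e056e83282210aec991c655c",'
        '"text":"/api/v1/text/4d9349e056e83282210aec991c655c",'
        '"markdown":"/api/v1/markdown/4d9349e056e83282210aec991c655c"}'
    ),
}

links = parse_status(sample_status)
for output_format, path in links.items():
    print(output_format, path)
```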

## Get the raw text output

In [5]:
txt_output = p.getText(jobId)['server_response']

## Get the Markdown output

In [6]:
md_output = p.getMarkdown(jobId)['server_response']

## Get the full JSON output

In [7]:
json_output = p.getJson(jobId)['server_response']
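
Once the outputs are in memory, they can be written to disk for later use. The sketch below is one possible way to do that; `save_outputs` is a hypothetical helper, and it assumes each output is either a string or a JSON-serializable object (the exact type returned by `getJson` is not shown in this demo). Placeholder strings stand in for the real outputs so the block runs on its own.

```python
import json
import tempfile
from pathlib import Path


def save_outputs(outputs, directory):
    """Write each output to <directory>/<file name> and return the paths."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for file_name, content in outputs.items():
        # Non-string content (e.g. an already decoded dict) is serialized as JSON.
        if not isinstance(content, str):
            content = json.dumps(content, indent=2)
        path = target / file_name
        path.write_text(content, encoding='utf-8')
        written.append(path)
    return written


# Placeholder values; in the notebook these would be txt_output, md_output, json_output.
demo_outputs = {
    'output.txt': 'placeholder text',
    'output.md': '# placeholder markdown',
    'output.json': {'placeholder': True},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    paths = save_outputs(demo_outputs, tmp_dir)
    saved_names = sorted(p.name for p in paths)
    saved_json = json.loads((Path(tmp_dir) / 'output.json').read_text(encoding='utf-8'))

print(saved_names)
```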

## Interpret the JSON output

In [8]:
import parsr_output_interpreter as poi
pa = poi.ParsrOutputInterpreter(json_output)

### Get all the text on Page 1

In [9]:
pa.getTexts(page_number=1)

'DocumentParsing\n\nADocumentParsingsystem\n\nOfficialWebsite(workinprogress)\n\nhttps://axatechlab.github.io/AXA-AEL-pdfparser/\n\nAPI\n\nTostarttheAPIsever,justrun:npmrunstart:apiThedocumentationishere.\n\nBinarydependenciesforLinuxandMacOSX\n\nWeuseqpf,mupdf-tols,imagemagickandpdf2jsontodoprocespdffiles,extractfontsandconvertpdftojsonstructure.Youmustinstallthistoolsonyourmachinepriortousedocparser.pacman-Sqpdfmupdf-toolspdf2jsonimagemagickapt-getinstallqpdfpdf2jsonimagemagickOnOSX:brewinstallqpdfmupdf-toolspdf2jsonimagemagick\n\n#ArchLinux#Debianbasedlinuxdistro\n\nTesseract\n\nhttps://github.com/tesseract-ocr/tesseract/Onlyusedifyougiveanimagetothepipeline.\n\nDuckling\n\nFollowthisguide:https://github.com/facebook/duckling#duckling-\n\nDependencies(Windows)\n\nWerecommandusingChocolateytoinstalldependencies.Itmakesthingsmuchmoreeasiertomanage.\n\n1\n\n'
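
The string above separates text blocks with blank lines (`\n\n`). A minimal sketch of splitting such a string into a list of blocks, using a shortened excerpt of the output above; `split_blocks` is a hypothetical helper, not a method of `ParsrOutputInterpreter`:

```python
def split_blocks(page_text):
    """Split page text on blank lines and drop empty pieces."""
    return [block.strip() for block in page_text.split('\n\n') if block.strip()]


# Shortened excerpt of the page text shown above.
excerpt = 'DocumentParsing\n\nADocumentParsingsystem\n\nOfficialWebsite(workinprogress)\n\n'

blocks = split_blocks(excerpt)
for index, block in enumerate(blocks, start=1):
    print(index, block)
```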

### Get tables on the first page as pandas DataFrames (TODO)

In [10]:
# pa.getTables(page_number=1)