# PDF Parsing example

This example shows how to use the PDF parsing skill to extract text from a PDF file.

First, start the broker and the PDF parsing skill (`pd3f`) by running the following commands in a terminal:

```bash
make ENV=main build
pip install .

brokerio skills build pd3f
brokerio skills run pd3f --url "http://broker:4852" --network "nlp_api_main_default"
```

Then you can run the following code to extract text from a PDF file.

In [None]:
# Install brokerio if it is not already installed locally (alternatively, install the latest nlp-broker from GitHub)
!pip install brokerio

In [None]:
# Imports and setup
from brokerio.client import Client, ClientTimeoutException
import os

broker_url = "http://127.0.0.1:4852"

In [None]:
# initialize client
client = Client(broker_url)

# start the client (starts a new thread and connects to the broker)
client.start()

# Clear the queue and see which skills are available
client.clear()
print("Skills: {}".format(client.skills))

In [None]:
# Get config for the pdf parsing skill
client.put({
    "event": "skillGetConfig",
    "data": {
        "name": "pd3f"
    }
})
results = client.wait_for_event("skillConfig", timeout=10)
if results:
    print(results['data'])
else:
    print("Timeout!")
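
The cell above waits for a `skillConfig` event and treats an empty result as a timeout. As a rough illustration of that pattern, here is a minimal sketch built only on the standard library. It is not brokerio's implementation: the `wait_for` helper, the plain `queue.Queue` of message dictionaries, and the `"event"` key lookup are assumptions made for this example.

```python
import queue
import time


def wait_for(events, name, timeout):
    """Return the first message whose "event" equals `name`, or None on timeout.

    Hypothetical helper for illustration only; `events` is assumed to be a
    queue.Queue holding dictionaries shaped like {"event": ..., "data": ...}.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        try:
            message = events.get(timeout=remaining)
        except queue.Empty:
            return None
        # Messages for other events are skipped (and dropped) in this sketch.
        if message.get("event") == name:
            return message


# Usage: simulate two incoming messages and wait for the second one.
incoming = queue.Queue()
incoming.put({"event": "skillUpdate", "data": []})
incoming.put({"event": "skillConfig", "data": {"name": "pd3f"}})

result = wait_for(incoming, "skillConfig", timeout=1)
print(result["data"] if result else "Timeout!")
```

Using a deadline rather than a fixed per-call timeout keeps the total wait bounded even when unrelated messages arrive first.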

In [None]:
# Load the PDF, encode it as base64, and send it to the skill
import base64
pdf_path = "2023.acl-demo.28v2.pdf"
with open(pdf_path, "rb") as f:
    pdf = base64.b64encode(f.read()).decode("utf-8")

skill = "pd3f"
event = 'skillRequest'
message_id = "test"
config = {
    "return_stats": True
}
data = {"pdf": pdf}
timeout = 10

try:
    result = client.request(skill, data, message_id, config=config, timeout=timeout)
    print(result)
except ClientTimeoutException as e:
    print(e)
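
The request above sends the PDF as a base64 string. Before sending, it can help to confirm that the payload decodes back to the original bytes. The following is a minimal sketch using only the standard library; the helper names `encode_pdf`, `decode_pdf`, and `looks_like_pdf` are made up for this example and are not part of brokerio.

```python
import base64
from pathlib import Path


def encode_pdf(path):
    """Read a file and return its contents as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")


def decode_pdf(payload):
    """Inverse of encode_pdf: turn the base64 string back into raw bytes."""
    return base64.b64decode(payload.encode("utf-8"), validate=True)


def looks_like_pdf(raw):
    """Cheap sanity check: PDF files normally begin with the bytes '%PDF-'."""
    return raw.startswith(b"%PDF-")
```

For example, `looks_like_pdf(decode_pdf(encode_pdf(pdf_path)))` should be true for a well-formed PDF, which catches cases such as pointing `pdf_path` at the wrong file before the request reaches the skill.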