# PDF processing with Unstructured and querying with HuggingChat

This sample notebook sends a PDF file to [Unstructured API services](https://docs.unstructured.io/api-reference/api-services/overview) for processing. Unstructured extracts the PDF's content and returns it as a list of elements. The notebook then sends the elements that mention voting to [HuggingChat](https://huggingface.co/chat/), Hugging Face's open-source AI chatbot, along with queries about that content.

## Step 1: Install the Unstructured and HuggingChat libraries

---



In [None]:
%pip install -q unstructured-client
%pip install -q hugchat

## Step 2: Set imports

---

In [None]:
from unstructured_client import UnstructuredClient
from unstructured_client.models import operations, shared

from hugchat import hugchat
from hugchat.login import Login

from google.colab import userdata

import json, os

## Step 3: Set your Unstructured API key and API URL

---

Get a key and URL:

- Pay-as-you-go unlimited version: https://docs.unstructured.io/api-reference/api-services/saas-api-development-guide#get-started
- Limited free version: https://docs.unstructured.io/api-reference/api-services/free-api#get-an-api-key

Set the following secrets:

- `UNSTRUCTURED_API_KEY` to your Unstructured API key.
- `UNSTRUCTURED_API_URL` to your Unstructured API URL.

To set these:

1. On the left sidebar, click the **Secrets** icon.
2. Enter each name/value pair above.
3. Switch on the **Notebook access** toggle for each name/value pair.
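
The secrets above are only available inside Google Colab. As a minimal sketch, the hypothetical helper below (`get_secret` is not part of the original notebook) reads a secret from Colab when it can, and otherwise falls back to an environment variable of the same name, which is an assumption about how you might run this code outside Colab.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab's Secrets panel, else from the environment."""
    try:
        # This import only succeeds inside Google Colab.
        from google.colab import userdata
    except ImportError:
        # Assumption: outside Colab, the secret is an environment variable.
        value = os.environ.get(name)
    else:
        value = userdata.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set.")
    return value
```

With this helper, `get_secret("UNSTRUCTURED_API_KEY")` could stand in for `userdata.get("UNSTRUCTURED_API_KEY")` in the later steps.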

## Step 4: Set your Hugging Face account's email address and account password

---

Get a Hugging Face account: https://huggingface.co/join

Set the following secrets:

- `HUGGING_FACE_EMAIL` to your Hugging Face account's email address.
- `HUGGING_FACE_PASSWORD` to your Hugging Face account's password.

To set these:

1. On the left sidebar, click the **Secrets** icon.
2. Enter each name/value pair above.
3. Switch on the **Notebook access** toggle for each name/value pair.

## Step 5: Upload a PDF file for Unstructured to process

---

Upload a PDF file before continuing.

For example, you can run the following cell to download a sample PDF file containing the text of the United States Constitution, from https://constitutioncenter.org/media/files/constitution.pdf, into Google Colab session storage.

Or, you can upload a different file into Google Colab session storage:

1. On the left sidebar, click the **Files** icon.
2. Click the **Upload to session storage** icon.

Then, provide the filename of the PDF file that was uploaded.

In [None]:
!wget https://constitutioncenter.org/media/files/constitution.pdf

--2024-08-08 20:48:25--  https://constitutioncenter.org/media/files/constitution.pdf
Resolving constitutioncenter.org (constitutioncenter.org)... 104.22.23.181, 104.22.22.181, 172.67.42.106, ...
Connecting to constitutioncenter.org (constitutioncenter.org)|104.22.23.181|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 413949 (404K) [application/pdf]
Saving to: ‘constitution.pdf’


2024-08-08 20:48:25 (7.82 MB/s) - ‘constitution.pdf’ saved [413949/413949]



In [None]:
input_filepath = "constitution.pdf"
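
Before sending the file for processing, it can help to confirm that the path points at a real PDF. The following is a minimal sketch, not part of the original notebook: the hypothetical helper `looks_like_pdf` checks that the file exists and begins with the `%PDF-` signature that PDF files start with.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and begins with the PDF signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        # PDF files start with the ASCII bytes "%PDF-".
        return f.read(5) == b"%PDF-"
```

For example, `looks_like_pdf(input_filepath)` should return `True` after the download above succeeds, and `False` if the filename is mistyped or the download saved an error page instead.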

## Step 6: Provide a function to log in to your Hugging Face account

---

In [None]:
def log_in_to_hugging_face() -> hugchat.ChatBot:
    """Log in with the credentials stored in the Colab secrets and return a chatbot."""
    sign = Login(
        email=userdata.get("HUGGING_FACE_EMAIL"),
        passwd=userdata.get("HUGGING_FACE_PASSWORD")
    )
    cookies = sign.login(cookie_dir_path="./cookies/")
    return hugchat.ChatBot(cookies=cookies.get_dict())

## Step 7: Process the PDF

---

This code:

1. Sends the PDF to Unstructured for processing. Unstructured then sends the processed data back.
2. Gathers every text from the processed data that contains the case-sensitive substring "vot", which matches words such as "vote", "voted", and "voting".

In [None]:
print("Sending file to Unstructured for processing...")

client = UnstructuredClient(
    api_key_auth=userdata.get("UNSTRUCTURED_API_KEY"),
    server_url=userdata.get("UNSTRUCTURED_API_URL")
)

with open(input_filepath, "rb") as f:
    files = shared.Files(
        content=f.read(),
        file_name=input_filepath
    )

req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=files,
        strategy=shared.Strategy.HI_RES
    )
)

print("Getting processed data back from Unstructured. This might take a minute...")
res = client.general.partition(request=req)

voting_texts = ""

print("Gathering texts...")

for element in res.elements:
    if "vot" in element["text"]:
        voting_texts += " " + element["text"]

print("Done.")

Sending file to Unstructured for processing...
Getting processed data back from Unstructured. This might take a minute...
Gathering texts...
Done.
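
The substring check in this step is case-sensitive, so an element whose only match is a capitalized word such as "Vote" is skipped. As a hedged alternative, the sketch below uses a hypothetical helper, `gather_texts`, that ignores case and tolerates elements without a `text` key. The sample data is invented for illustration and only mimics the shape of the dictionaries in `res.elements`.

```python
def gather_texts(elements, stem="vot"):
    """Join the text of every element whose text contains `stem`, ignoring case."""
    matches = []
    for element in elements:
        text = element.get("text", "")
        if stem.lower() in text.lower():
            matches.append(text)
    return " ".join(matches)


# Hypothetical sample data shaped like the dictionaries in res.elements.
sample_elements = [
    {"type": "NarrativeText", "text": "Each Senator shall have one Vote."},
    {"type": "Title", "text": "Article II"},
    {"type": "NarrativeText", "text": "the Persons voting for and against the Bill"},
]
print(gather_texts(sample_elements))
```

Substring matching stays deliberately broad: it also picks up unrelated words that contain "vot", such as "devote" or "pivotal".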


## Step 8: Send a query to HuggingChat

---

This code:

1. Logs in to your Hugging Face account.
2. Sends the matching texts to HuggingChat along with some queries about the text.

In [None]:
print("Logging in to your Hugging Face account...")

chatbot = log_in_to_hugging_face()

print("Querying HuggingChat...")
print("\n-----\n")

# Use a new name so that the partition request in `req` is not overwritten.
query = f"Given the following information, what is the minimum voting age in the United States? {voting_texts}"
print(query)
print("\n-----\n")
print(chatbot.chat(text=query))

Logging in to your Hugging Face account...
Querying HuggingChat...

-----

Given the following information, what is the minimum voting age in the United States?  Every Bill which shall have passed the House of Represen- tatives and the Senate, shall, before it become a Law, be presented to the President of the United States; If he ap- prove he shall sign it, but if not he shall return it, with his Objections to that House in which it shall have originated, who shall enter the Objections at large on their Journal, and proceed to reconsider it. If after such Reconsideration two thirds of that House shall agree to pass the Bill, it shall be sent, together with the Objections, to the other House, by which it shall likewise be reconsidered, and if approved by two thirds of that House, it shall become a Law. But in all such Cases the Votes of both Houses shall be determined by Yeas and Nays, and the Names of the Persons voting for and against the Bill shall be entered on the Journal of each 
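
The query above embeds every gathered text, so it can grow long for large documents. The following sketch shows one way to cap the context before sending it. The helper `build_prompt` and its default budget of 6000 characters are hypothetical: the number is an arbitrary illustration, not a documented HuggingChat limit.

```python
def build_prompt(question: str, context: str, max_context_chars: int = 6000) -> str:
    """Combine a question with context, trimming the context to a character budget."""
    context = context.strip()
    if len(context) > max_context_chars:
        # Cut at the last space inside the budget so that no word is split.
        cut = context.rfind(" ", 0, max_context_chars)
        context = context[: cut if cut > 0 else max_context_chars]
    return f"{question}\n\n{context}"
```

A call such as `build_prompt("What is the minimum voting age in the United States?", voting_texts)` would then produce the text to pass to `chatbot.chat`.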

## Step 9: Send another related query to HuggingChat

---

This code makes another query that is related to the previous one.

In [None]:
print("Querying HuggingChat again...")
print("\n-----\n")

follow_up = "And when were women given the right to vote in the United States?"
print(follow_up)
print("\n-----\n")

print(chatbot.chat(text=follow_up))

Querying HuggingChat again...

-----

And when were women given the right to vote in the United States?

-----

According to the text, women were given the right to vote in the United States when the following provision was added:

"The right of citizens of the United States to vote shall not be denied or abridged by the United States or by any State on account of sex."

This provision, also known as the 19th Amendment to the United States Constitution, was ratified on August 18, 1920.
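
Because the follow-up goes through the same `chatbot` object, it can build on the earlier query, as the output above suggests. If you have several related questions, a small loop keeps them in order. This is a minimal sketch: `ask_in_sequence` is a hypothetical helper, and `StubChatBot` is an invented stand-in so that the example runs without a Hugging Face login. It assumes only that the chatbot exposes `chat(text=...)`, as the notebook's `chatbot` does.

```python
def ask_in_sequence(chatbot, questions):
    """Send each question in order and return (question, answer) pairs."""
    transcript = []
    for question in questions:
        # str() yields plain text whether the reply is a string or a message object.
        answer = str(chatbot.chat(text=question))
        transcript.append((question, answer))
    return transcript


class StubChatBot:
    """Hypothetical stand-in for hugchat.ChatBot, used only to run this sketch offline."""

    def __init__(self):
        self.turns = 0

    def chat(self, text):
        self.turns += 1
        return f"(reply {self.turns}) You asked: {text}"


for question, answer in ask_in_sequence(StubChatBot(), ["First question?", "Follow-up?"]):
    print(question, "->", answer)
```

In the notebook itself, you would pass the `chatbot` returned by `log_in_to_hugging_face()` instead of the stub.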
