# Comparing two PDFs with AWS Bedrock

In this post we'll extend the typical "chat with a PDF" app. Instead of a single document, the app takes in two PDFs and compares them using a custom prompt. We'll also let the user choose whether to chat with just one document or with both.

To get started, make sure you have the AWS command line interface (CLI) installed, along with streamlit, faiss-cpu (for vector stores), langchain and boto3. In AWS, you will also need to request access to a foundation model of your choice in AWS Bedrock. Keep in mind that usage might not be free, so keep an eye on how much you call the model (though this app didn't cost me anything).
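Before going further, it can help to confirm that the Python packages are importable. The snippet below is a minimal sketch using only the standard library. The helper name `missing_modules` and the `REQUIRED_MODULES` list are my own for illustration, and the list assumes that `pypdf` is also needed (PyPDFLoader relies on it) and that faiss-cpu is imported as `faiss`.

```python
import importlib.util

# Import names (not pip names) for the packages this post relies on.
# Assumption: faiss-cpu is imported as `faiss`, and PyPDFLoader needs `pypdf`.
REQUIRED_MODULES = [
    "boto3",
    "streamlit",
    "faiss",
    "langchain",
    "langchain_community",
    "pypdf",
]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

This only checks the Python side. Access to a Bedrock model and working AWS credentials still have to be set up separately.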

The first thing to do, of course, is create an app.py file (or whatever you want to name it) and add the imports. We also instantiate the Bedrock runtime client and the Bedrock embeddings.

In [None]:
import boto3
import streamlit as st

from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.llms import Bedrock
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain.prompts import PromptTemplate


## Bedrock Clients
bedrock = boto3.client(service_name="bedrock-runtime")
bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=bedrock)


Now let's go over a function to load the data. Later on, in Streamlit, you will see that we let users upload their PDFs. PyPDFLoader reads from a file path, so the function first writes the upload to disk and then loads and splits it.

In [None]:
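def data_ingestion(pdf):
    # PyPDFLoader needs a file path, so write the uploaded file to disk first
    with open("temp_pdf.pdf", "wb") as f:
        f.write(pdf.read())

    # Load the PDF (one Document per page)
    loader = PyPDFLoader("temp_pdf.pdf")
    documents = loader.load()

    # In our testing, character-based splitting worked better with this PDF data set
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000,
                                                   chunk_overlap=1000)

    # Split the pages into overlapping chunks for embedding
    docs = text_splitter.split_documents(documents)
    return docs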
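
The function above writes every upload to the same temp_pdf.pdf. That works when the two PDFs are processed one after the other, but if you would rather give each upload its own file, here is a minimal sketch using the standard library's tempfile module. The helper name `save_upload_to_temp` is hypothetical and not part of the app, and the `io.BytesIO` objects are stand-ins for uploaded files (anything with a `.read()` method should work).

```python
import io
import os
import tempfile


def save_upload_to_temp(uploaded_file, suffix=".pdf"):
    """Write a file-like upload to a uniquely named temp file and return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so a path-based loader can open it afterwards.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(uploaded_file.read())
        return tmp.name


# Stand-ins for two uploaded files
first = io.BytesIO(b"%PDF-1.4 first document")
second = io.BytesIO(b"%PDF-1.4 second document")

path_a = save_upload_to_temp(first)
path_b = save_upload_to_temp(second)

# Each upload lives in its own file, so neither overwrites the other
with open(path_a, "rb") as f:
    contents_a = f.read()
with open(path_b, "rb") as f:
    contents_b = f.read()

# Clean up once the files are no longer needed
os.remove(path_a)
os.remove(path_b)
```

In data_ingestion, the returned path could be passed to PyPDFLoader in place of "temp_pdf.pdf", with the file removed after loader.load() has run.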