<a href = "https://www.pieriantraining.com"><img src="../PT Centered Purple.png"> </a>

<em style="text-align:center">Copyrighted by Pierian Training</em>

# Data Connections Exercise

## Ask a Legal Research Assistant Bot about the US Constitution

Your function should do the following:

* Read the US_Constitution.txt file inside the some_data folder
* Split this into chunks (you choose the size)
* Write this to a ChromaDB Vector Store
* Use contextual compression to return only the portion of the document that is relevant to the question

In [9]:
# Imports for building the vector DB and the compression retriever
from langchain.vectorstores import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor 

from dotenv import load_dotenv
load_dotenv()


True

In [14]:
text_loader = TextLoader("./some_data/US_Constitution.txt")
document = text_loader.load()

In [15]:
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=250)
docs = text_splitter.split_documents(document)

Created a chunk of size 252, which is longer than the specified 250
Created a chunk of size 333, which is longer than the specified 250
Created a chunk of size 472, which is longer than the specified 250
Created a chunk of size 312, which is longer than the specified 250
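
The warnings above appear because `CharacterTextSplitter` only splits on its separator (a blank line by default), so a single paragraph longer than `chunk_size` is kept whole. The sketch below illustrates that behavior in plain Python. It is a simplified stand-in, not LangChain's implementation: `split_on_separator` is a hypothetical helper, and whitespace-separated words are assumed as a substitute for tiktoken tokens.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most
    chunk_size words; a single piece longer than chunk_size is kept whole."""
    chunks, current, current_len = [], [], 0
    for piece in text.split(separator):
        piece_len = len(piece.split())  # words stand in for tokens (assumption)
        if piece_len == 0:
            continue
        if piece_len > chunk_size:
            # There is no separator inside this piece, so it cannot be split.
            print(f"Created a chunk of size {piece_len}, "
                  f"which is longer than the specified {chunk_size}")
        # Start a new chunk when adding this piece would exceed the limit.
        if current and current_len + piece_len > chunk_size:
            chunks.append(separator.join(current))
            current, current_len = [], 0
        current.append(piece)
        current_len += piece_len
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "one two three\n\nfour five\n\nsix seven eight nine ten eleven\n\ntwelve"
chunks = split_on_separator(sample, chunk_size=5)
sizes = [len(c.split()) for c in chunks]
print(sizes)
```

If hard size limits matter, a splitter that falls back to smaller separators (such as `RecursiveCharacterTextSplitter`) is one option to consider.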


In [17]:
embedding_function = OpenAIEmbeddings()
db_conn = Chroma.from_documents(docs, embedding_function, persist_directory="./us_constitution.db")

In [22]:
def us_constitution_helper(question):
    '''
    Takes in a question about the US Constitution and returns the most relevant
    part of the Constitution. Note that it may not directly answer the question!
    
    Follow the steps below to fill out this function:
    '''
    # PART ONE:
    # Load "some_data/US_Constitution.txt" into a Document object
    text_loader = TextLoader("./some_data/US_Constitution.txt")
    document = text_loader.load()
    
    # PART TWO:
    # Split the document into chunks (you choose how and what size)
    text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=350)
    docs = text_splitter.split_documents(document)
    
    # PART THREE:
    # Embed the documents (now in chunks) into a persisted ChromaDB.
    # Note: this runs on every call, so repeated calls add the same chunks
    # to the persisted store again.
    embedding_function = OpenAIEmbeddings()
    db_conn = Chroma.from_documents(docs, embedding_function, persist_directory="./us_constitution.db")

    # PART FOUR:
    # Use ChatOpenAI and ContextualCompressionRetriever to return the most
    # relevant part of the documents.
    llm = ChatOpenAI(temperature=0)
    compressor = LLMChainExtractor.from_llm(llm)

    compression_retriever = ContextualCompressionRetriever(
        base_compressor=compressor,
        base_retriever=db_conn.as_retriever(),
    )

    compressed_docs = compression_retriever.invoke(question)

    # The compressor can drop every retrieved chunk, so guard the empty case.
    if not compressed_docs:
        return "No relevant passage found."
    return compressed_docs[0].page_content

## Example Usage:

Notice that the function returns the "compressed" extract rather than the entire large chunk that was retrieved!
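
To make the idea of compression concrete, here is a toy sketch with no LLM involved. A hypothetical `extract_relevant` function keeps only the sentences that share a keyword with the question, which roughly stands in for what `LLMChainExtractor` asks the model to do. The keyword-overlap rule, the stopword list, and the sample chunk are all assumptions for illustration, not LangChain's implementation.

```python
import re


def extract_relevant(chunk, question):
    """Keep only the sentences of chunk that share a keyword with question."""
    # Small hand-picked stopword list (assumption, for illustration only).
    stopwords = {"what", "is", "the", "a", "an", "of", "about", "does", "say", "it"}
    keywords = {w for w in re.findall(r"[a-z0-9]+", question.lower())
                if w not in stopwords}
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", chunk.strip())
    kept = [s for s in sentences
            if keywords & set(re.findall(r"[a-z0-9]+", s.lower()))]
    return " ".join(kept)


# A made-up "retrieved chunk" that mixes relevant and irrelevant sentences.
chunk = ("The Senate shall choose their other officers. "
         "Neither slavery nor involuntary servitude shall exist within the United States. "
         "Congress shall have power to enforce this article.")

compressed = extract_relevant(chunk, "What does the text say about slavery?")
print(compressed)
```

The real extractor relies on the LLM's judgment of relevance rather than keyword overlap, so it can keep passages that never repeat the question's wording.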

In [23]:
print(us_constitution_helper("What is the 13th Amendment?"))

Created a chunk of size 472, which is longer than the specified 350


13th Amendment
Section 1
Neither slavery nor involuntary servitude, except as a punishment for crime whereof the party shall have been duly convicted, shall exist within the United States, or any place subject to their jurisdiction.


In [34]:
print(us_constitution_helper("What is the 5th Amendment?"))

Created a chunk of size 472, which is longer than the specified 350


No person shall be held to answer for a capital, or otherwise infamous crime, unless on a presentment or indictment of a Grand Jury, except in cases arising in the land or naval forces, or in the Militia, when in actual service in time of War or public danger; nor shall any person be subject for the same offence to be twice put in jeopardy of life or limb; nor shall be compelled in any criminal case to be a witness against himself, nor be deprived of life, liberty, or property, without due process of law; nor shall private property be taken for public use, without just compensation.
