<a href = "https://www.pieriantraining.com"><img src="../PT Centered Purple.png"> </a>

<em style="text-align:center">Copyrighted by Pierian Training</em>

# Data Connections Exercise

## Ask a Legal Research Assistant Bot about the US Constitution

Your function should do the following:

* Read the US_Constitution.txt file inside the some_data folder
* Split this into chunks (you choose the size)
* Write this to a ChromaDB Vector Store
* Use Context Compression to return the relevant portion of the document to the question
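
The steps above can be sketched in plain Python to show what chunking and context compression do before any library is involved. This is a toy illustration under stated assumptions: `split_into_chunks` and `compress` are hypothetical helpers, the keyword-overlap filter only stands in for the LLM-based extraction that a real compressor performs, the sample sentences are abridged for illustration, and no embeddings or vector store are used.

```python
import re


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into character windows of chunk_size that overlap slightly."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def _words(text: str) -> set[str]:
    """Return the set of lowercase alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def compress(chunk: str, question: str) -> str:
    """Keep only the sentences of chunk that share a keyword with question.

    Toy stand-in for context compression: a real compressor asks an LLM
    to extract the relevant passage instead of matching keywords.
    """
    keywords = {w for w in _words(question) if len(w) > 3}
    sentences = re.split(r"(?<=[.!?])\s+", chunk.strip())
    return " ".join(s for s in sentences if keywords & _words(s))


# Abridged sample sentences, for illustration only.
chunk = (
    "Congress shall have power to lay and collect taxes. "
    "The President shall be Commander in Chief of the Army and Navy. "
    "Each State shall have two Senators."
)
print(compress(chunk, "Who is the Commander in Chief?"))
# prints: The President shall be Commander in Chief of the Army and Navy.
```

The point of the compression step is visible even in this toy version: the question pulls one sentence out of the chunk instead of returning the whole chunk.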

In [2]:
# Build a sample vectorDB

from langchain_community.document_loaders.text import TextLoader
from langchain_text_splitters.character import CharacterTextSplitter
from langchain_ollama.embeddings import OllamaEmbeddings
from langchain_chroma.vectorstores import Chroma
from langchain_ollama.chat_models import ChatOllama
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain.retrievers.document_compressors.chain_extract import LLMChainExtractor
from langchain_core.prompts.chat import HumanMessagePromptTemplate
from langchain_core.prompts.chat import ChatPromptTemplate

In [5]:
def us_constitution_helper(question):
    '''
    Takes in a question about the US Constitution, retrieves the most relevant
    part of the Constitution, and returns the model's answer based on it.
    Notice that the retrieved text may not directly answer the actual question!

    The steps below fill out this function:
    '''
    # PART ONE (first run only, commented out afterwards):
    # Load "some_data/US_Constitution.txt" into a Document object
    # loader = TextLoader("some_data/US_Constitution.txt")
    # docs = loader.load()

    # PART TWO (first run only, commented out afterwards):
    # Split the document into chunks (you choose how and what size)
    # splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=500)
    # docs = splitter.split_documents(docs)

    # PART THREE
    # Embed the Documents (now in chunks) into a persisted ChromaDB
    embeddings = OllamaEmbeddings(model="llama3.2:1b")
    # First run only: build the store from the chunks.
    # db = Chroma.from_documents(docs, embedding=embeddings, persist_directory="./USConstitutionVectors")
    # Later runs: reopen the store that was persisted to disk.
    db = Chroma(persist_directory="./USConstitutionVectors", embedding_function=embeddings)

    # PART FOUR
    # Use ChatOllama and ContextualCompressionRetriever to return the most
    # relevant part of the documents.
    chat = ChatOllama(model="llama3.2:1b", temperature=0)
    compressor = LLMChainExtractor.from_llm(chat)
    compression_retriever = ContextualCompressionRetriever(
        base_compressor=compressor, base_retriever=db.as_retriever()
    )
    results = compression_retriever.invoke(question)

    # Pass only the extracted text to the model, not the Document objects' repr.
    context = "\n\n".join(doc.page_content for doc in results)
    human_template = HumanMessagePromptTemplate.from_template(
        "Answer the question\n {question} \n\nfrom the retrieved data\n {result}"
    )
    template = ChatPromptTemplate.from_messages([human_template])
    messages = template.format_messages(question=question, result=context)
    return chat.invoke(messages).content
    

## Example Usage:

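Notice that the answer is built from the "compressed" excerpts that the retriever extracts, not from entire Documents of a large chunk size. Check the answer against the source text, because a small model such as llama3.2:1b can still get it wrong. The output below, for example, is not the text of the 13th Amendment, which abolished slavery.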

In [6]:
print(us_constitution_helper("What is the 13th Amendment?"))

The 13th Amendment to the US Constitution is:

"The right of citizens of the United States, who are eighteen years of age or older, to vote shall not be denied or abridged by the United States or by any State on account of age, sex, color, or previous condition of servitude."

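
Because the answer is generated by a model, one way to sanity-check it is to confirm that any passage it puts in quotation marks actually occurs in the retrieved excerpts. The sketch below uses a hypothetical helper, `ungrounded_quotes`, which is not part of LangChain. It assumes you keep the excerpt texts (for example the `page_content` of each retrieved Document) alongside the answer, and it only catches quotes that fail to match verbatim.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not matter."""
    return " ".join(text.lower().split())


def ungrounded_quotes(answer: str, excerpts: list[str]) -> list[str]:
    """Return the double-quoted passages in answer that appear in no excerpt."""
    sources = [_normalize(e) for e in excerpts]
    quotes = re.findall(r'"([^"]+)"', answer)
    return [q for q in quotes if not any(_normalize(q) in s for s in sources)]


# Section 1 of the 13th Amendment, used here as the retrieved excerpt.
excerpts = [
    "Neither slavery nor involuntary servitude, except as a punishment for "
    "crime whereof the party shall have been duly convicted, shall exist "
    "within the United States, or any place subject to their jurisdiction."
]
good = 'It says "Neither slavery nor involuntary servitude" shall exist.'
bad = 'It says "The right of citizens to vote shall not be denied".'

print(ungrounded_quotes(good, excerpts))  # prints: []
print(ungrounded_quotes(bad, excerpts))
# prints: ['The right of citizens to vote shall not be denied']
```

An empty list only means that every quoted passage was found in the excerpts; it says nothing about paraphrased claims, so it complements reading the source rather than replacing it.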