In [13]:
# pip install chromadb langchain langchain-openai langchain-community python-dotenv tiktoken
import chromadb
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma  # langchain.vectorstores is the deprecated import path
from langchain_community.document_loaders import TextLoader
import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

# Pipeline outline:
# 1. Load the document and split it into chunks.
# 2. Embed each chunk into a vector.
# 3. Save the vectors to ChromaDB.
# 4. Embed the query and run a similarity search against ChromaDB.
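The four steps outlined in the comments above can be sketched end to end without any external service. This is a toy illustration only, not how Chroma or the OpenAI embedding models work internally: `toy_embed` is a hypothetical stand-in that counts words, the "store" is a plain Python list, and the sample text is made up.

```python
import math
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# 1. Load the document and split it into chunks (here: one chunk per paragraph).
document = "Farm prices need a floor.\n\nWar contracts must be renegotiated.\n\nStrikes harm the war effort."
chunks = document.split("\n\n")

# 2-3. Embed each chunk and save (vector, text) pairs in the "store".
store = [(toy_embed(chunk), chunk) for chunk in chunks]

# 4. Embed the query and return the k most similar chunks.
def search(query, k=1):
    q = toy_embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

print(search("farm prices")[0])
```

A real embedding model captures meaning rather than exact word overlap, which is why the notebook's queries can match passages that do not repeat the query verbatim.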

In [14]:
# Read the file as UTF-8 so dashes and quotes are not garbled in the output
loader = TextLoader(file_path="./some_data/FDR_State_of_Union_1944.txt", encoding="utf-8")

documents = loader.load()

text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=500)

docs = text_splitter.split_documents(documents)

In [None]:
# view the resulting chunks
docs

In [15]:
embedding_function = OpenAIEmbeddings(api_key=api_key)

# Create a new persisted database from (list of documents, embedding function, target directory)
db = Chroma.from_documents(docs, embedding_function, persist_directory="./speech_new_db")

In [16]:
# Reconnect to the persisted database; pass the same embedding function so queries are embedded consistently
db_new_connection = Chroma(persist_directory="./speech_new_db", embedding_function=embedding_function)
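The reconnect step works because the chunk vectors were written to disk when the database was created, so only the query has to be embedded later. The sketch below illustrates that idea with a JSON file. It is an assumption-laden toy: `toy_embed`, the file name `toy_db.json`, and the sample chunks are all hypothetical, and Chroma's actual on-disk format is different.

```python
import json
import os
import tempfile

def toy_embed(text):
    """Hypothetical embedding: a letter-frequency vector over a-z."""
    text = text.lower()
    return [text.count(chr(ord("a") + i)) for i in range(26)]

def dot(a, b):
    """Unnormalized dot product, used here as a crude similarity score."""
    return sum(x * y for x, y in zip(a, b))

path = os.path.join(tempfile.mkdtemp(), "toy_db.json")

# First session: embed once and persist the vectors alongside the text.
chunks = ["price ceiling on food", "national service law"]
with open(path, "w", encoding="utf-8") as f:
    json.dump([{"text": c, "vector": toy_embed(c)} for c in chunks], f)

# Later session: reload the stored vectors; only the query is embedded.
with open(path, encoding="utf-8") as f:
    records = json.load(f)

query_vec = toy_embed("food price")
best = max(records, key=lambda r: dot(query_vec, r["vector"]))
print(best["text"])
```

The query must be embedded by the same function that produced the stored vectors, otherwise the two live in different vector spaces and the scores are meaningless. That is why `embedding_function` is passed again on reload.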

In [18]:
#query for a doc
new_doc = "cost of food law, FDR"

similar_docs = db_new_connection.similarity_search(new_doc)

print(similar_docs[0].page_content)

(2) A continuation of the law for the renegotiation of war contracts--which will prevent exorbitant profits and assure fair prices to the Government. For two long years I have pleaded with the Congress to take undue profits out of war.

(3) A cost of food law--which will enable the Government (a) to place a reasonable floor under the prices the farmer may expect for his production; and (b) to place a ceiling on the prices a consumer will have to pay for the food he buys. This should apply to necessities only; and will require public funds to carry out. It will cost in appropriations about one percent of the present annual cost of the war.

(4) Early reenactment of. the stabilization statute of October, 1942. This expires June 30, 1944, and if it is not extended well in advance, the country might just as well expect price chaos by summer.

(5) A national service law- which, for the duration of the war, will prevent strikes, and, with certain appropriate exceptions, will make available

In [19]:
#add a new doc to chromadb

loader = TextLoader(file_path="./some_data/Lincoln_State_of_Union_1862.txt", encoding="utf-8")
documents = loader.load()

docs = text_splitter.split_documents(documents)

Created a chunk of size 611, which is longer than the specified 500
Created a chunk of size 539, which is longer than the specified 500
Created a chunk of size 686, which is longer than the specified 500
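The warnings above are expected with `CharacterTextSplitter`: it splits on a single separator (a blank line by default) and then merges neighbouring pieces up to `chunk_size`, so a paragraph that is already longer than the limit is emitted whole. The sketch below reproduces that behaviour in simplified form. It is not LangChain's implementation: length is counted in whitespace-separated words rather than tiktoken tokens, chunk overlap is ignored, and `split_on_separator` is a hypothetical name.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Simplified sketch: split on one separator, then greedily merge pieces.

    Length is measured in words (an assumption that keeps this
    dependency-free; the notebook measures tiktoken tokens).
    """
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current, current_len = [], [], 0
    for piece in pieces:
        piece_len = len(piece.split())
        # Close the current chunk if adding this piece would overflow it.
        if current and current_len + piece_len > chunk_size:
            chunks.append(separator.join(current))
            current, current_len = [], 0
        current.append(piece)
        current_len += piece_len
    if current:
        chunks.append(separator.join(current))
    return chunks

short = "one two three"
long_paragraph = " ".join(["word"] * 12)  # 12 words with no separator inside
text = "\n\n".join([short, short, long_paragraph])

for chunk in split_on_separator(text, chunk_size=8):
    size = len(chunk.split())
    note = "  <- longer than chunk_size" if size > 8 else ""
    print(size, note)
```

If oversized chunks are a problem, a splitter that falls back to finer separators, such as LangChain's `RecursiveCharacterTextSplitter`, is one option to consider.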


In [20]:
# Same persist_directory as before, so the Lincoln chunks are added alongside the FDR chunks
db_new = Chroma.from_documents(docs, embedding_function, persist_directory="./speech_new_db")
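Because this cell writes into the directory that already holds the FDR chunks, re-running it is likely to insert the Lincoln chunks a second time, since fresh IDs are generated when none are supplied. One common way to make such an add idempotent is to derive a stable ID from the chunk itself. The sketch below shows the idea with a plain dict standing in for the collection; `content_id`, `add_chunks`, and the sample chunks are hypothetical.

```python
import hashlib

def content_id(text, source):
    """Hypothetical helper: a stable ID derived from the chunk text and its source."""
    return hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()[:16]

store = {}  # stand-in for the collection: id -> chunk text

def add_chunks(chunks, source):
    """Upsert chunks by ID, so adding the same chunk twice leaves a single entry."""
    for text in chunks:
        store[content_id(text, source)] = text

lincoln = ["first chunk", "second chunk"]
add_chunks(lincoln, "Lincoln_State_of_Union_1862.txt")
add_chunks(lincoln, "Lincoln_State_of_Union_1862.txt")  # simulated re-run of the cell
print(len(store))  # 2
```

LangChain's Chroma wrapper accepts an `ids` argument when adding documents; passing stable IDs there should have a similar effect, though the exact behaviour is worth checking against the installed version.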

In [21]:
docs = db_new.similarity_search("slavery")
print(docs[0].page_content)

As to the second article, I think it would be impracticable to return to bondage the class of persons therein contemplated. Some of them, doubtless, in the property sense belong to loyal owners, and hence provision is made in this article for compensating such. The third article relates to the future of the freed people. It does not oblige, but merely authorizes Congress to aid in colonizing such as may consent. This ought not to be regarded as objectionable on the one hand or on the other, insomuch as it comes to nothing unless by the mutual consent of the people to be deported and the American voters, through their representatives in Congress.

I can not make it better known than it already is that I strongly favor colonization; and yet I wish to say there is an objection urged against free colored persons remaining in the country which is largely imaginary, if not sometimes malicious.

It is insisted that their presence would injure and displace white labor and white laborers. If th

In [22]:
docs = db_new.similarity_search("cost of food law")
print(docs[0].page_content)

(2) A continuation of the law for the renegotiation of war contracts--which will prevent exorbitant profits and assure fair prices to the Government. For two long years I have pleaded with the Congress to take undue profits out of war.

(3) A cost of food law--which will enable the Government (a) to place a reasonable floor under the prices the farmer may expect for his production; and (b) to place a ceiling on the prices a consumer will have to pay for the food he buys. This should apply to necessities only; and will require public funds to carry out. It will cost in appropriations about one percent of the present annual cost of the war.

(4) Early reenactment of. the stabilization statute of October, 1942. This expires June 30, 1944, and if it is not extended well in advance, the country might just as well expect price chaos by summer.

(5) A national service law- which, for the duration of the war, will prevent strikes, and, with certain appropriate exceptions, will make available

In [25]:
# Expose the vector store as a retriever object via the as_retriever() method

type(db_new)

retriever = db_new.as_retriever()
results = retriever.invoke("cost of food law")

print(results)

[Document(metadata={'source': './some_data/FDR_State_of_Union_1944.txt'}, page_content='(2) A continuation of the law for the renegotiation of war contracts--which will prevent exorbitant profits and assure fair prices to the Government. For two long years I have pleaded with the Congress to take undue profits out of war.\n\n(3) A cost of food law--which will enable the Government (a) to place a reasonable floor under the prices the farmer may expect for his production; and (b) to place a ceiling on the prices a consumer will have to pay for the food he buys. This should apply to necessities only; and will require public funds to carry out. It will cost in appropriations about one percent of the present annual cost of the war.\n\n(4) Early reenactment of. the stabilization statute of October, 1942. This expires June 30, 1944, and if it is not extended well in advance, the country might just as well expect price chaos by summer.\n\n(5) A national service law- which, for the duration o
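A retriever is essentially a thin wrapper that gives the store's search a uniform `invoke(query)` interface, so downstream code does not need to know which store sits behind it. The sketch below shows that wrapper pattern with hypothetical classes (`ToyStore`, `ToyRetriever`) that rank by shared words instead of embeddings. It mirrors the shape of the calls used above, not LangChain's actual classes.

```python
class ToyStore:
    """Hypothetical stand-in for a vector store; ranks texts by shared words."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query, k=4):
        q = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda t: len(q & set(t.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def as_retriever(self, search_kwargs=None):
        return ToyRetriever(self, search_kwargs or {})


class ToyRetriever:
    """Wraps a store so callers only need invoke(query)."""

    def __init__(self, store, search_kwargs):
        self.store = store
        self.search_kwargs = search_kwargs

    def invoke(self, query):
        # Forward the query, applying the search options fixed at creation time.
        return self.store.similarity_search(query, **self.search_kwargs)


store = ToyStore(["a cost of food law", "a national service law", "the stabilization statute"])
retriever = store.as_retriever(search_kwargs={"k": 1})
print(retriever.invoke("cost of food law"))
```

In the notebook, the number of returned documents can be set the same way, for example `db_new.as_retriever(search_kwargs={"k": 2})`.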