# Vector Store

Let's save our embeddings to disk with Chroma, load them back, and explore some of the tools a vector store makes available to us.

----

#### Quick note to users: if you have installation issues, check GitHub first, since pydantic, langchain, and chroma are sometimes out of sync and need specific version numbers to play nicely together. An error that occurred in June 2023 was solved by installing a specific version of pydantic, but make sure to double-check the GitHub issues first.

For example:
* https://github.com/hwchase17/langchain/issues/5113
* https://github.com/hwchase17/langchain/issues/7548
#### Also don't forget to restart your kernel!
---

In [None]:
!pip install chromadb
!pip install openai
!pip install langchain

Collecting chromadb
  Downloading chromadb-0.4.5-py3-none-any.whl (402 kB)
Collecting pydantic<2.0,>=1.9 (from chromadb)
  Downloading pydantic-1.10.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Collecting chroma-hnswlib==0.7.2 (from chromadb)
  Downloading chroma-hnswlib-0.7.2.tar.gz (31 kB)
  Installing build dependencies ... done

In [None]:
# Here are the version numbers that worked for me.
# Be careful with pydantic: install it last, since chroma and langchain auto-install it as a dependency.
# To install a specific version number, use:
# !pip install package_name==0.3.26
import chromadb
print(chromadb.__version__)
import langchain
print(langchain.__version__)
import pydantic
print(pydantic.__version__)

0.4.5
0.0.264
1.10.12
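
When a package refuses to import because of a version clash, it can help to read the installed versions without importing anything. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the three packages this notebook depends on.
for name in ("chromadb", "langchain", "pydantic"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Because this reads package metadata rather than importing the modules, it should still work when `import pydantic` itself fails.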



In [None]:
!pip install tiktoken

Collecting tiktoken
  Downloading tiktoken-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.4.0


In [None]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader

In [None]:
# load the document
loader = TextLoader("some_data/FDR_State_of_Union_1944.txt")
documents = loader.load()


In [None]:
# split it into chunks
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=500)
docs = text_splitter.split_documents(documents)
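
To make the chunking step more concrete, here is a simplified sketch of the split-then-merge idea: break the text on a separator, then pack consecutive pieces into chunks that stay within a size budget. This is not LangChain's implementation. The helper name `merge_into_chunks` is hypothetical, and it counts whitespace-separated words as a rough stand-in for the tiktoken tokens that `from_tiktoken_encoder` measures.

```python
def merge_into_chunks(text, chunk_size, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most
    chunk_size words. A single piece longer than chunk_size is kept whole."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current, current_len = [], [], 0
    for piece in pieces:
        piece_len = len(piece.split())  # word count as a toy "token" count
        # Start a new chunk if adding this piece would exceed the budget.
        if current and current_len + piece_len > chunk_size:
            chunks.append(separator.join(current))
            current, current_len = [], 0
        current.append(piece)
        current_len += piece_len
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "one two three\n\nfour five\n\nsix seven eight nine\n\nten"
for chunk in merge_into_chunks(sample, chunk_size=5):
    print(repr(chunk))
```

The point of the sketch is that chunk boundaries fall on paragraph breaks rather than at an exact token count, which is why real chunks can come out shorter than the `chunk_size` you asked for.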

### Connect to OpenAI for Embeddings

In [None]:
# OpenAIEmbeddings reads the API key from the OPENAI_API_KEY environment variable
import os

In [None]:
embedding_function = OpenAIEmbeddings()
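
If the key is missing, the failure can surface later and be harder to read, so a quick check up front can save time. Below is a minimal sketch of such a check; the helper name `require_api_key` is hypothetical, and the only assumption about LangChain is that `OpenAIEmbeddings` picks the key up from the `OPENAI_API_KEY` environment variable.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise a clear error if it is unset."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; set it before creating embeddings.")
    return key
```

Calling `require_api_key()` in the cell before `OpenAIEmbeddings()` would stop the notebook early with a readable message instead of an authentication error further down.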

### Pass Embeddings and Docs into Chroma

In [None]:
# load it into Chroma and persist it to the given directory
db = Chroma.from_documents(docs, embedding_function, persist_directory='./speech_embedding_db')

### Save the new embeddings to disk

In [None]:
# Helpful to force a save
db.persist()

### Load Embeddings from Disk

In [None]:
db_connection = Chroma(persist_directory='./speech_embedding_db/', embedding_function=embedding_function)

In [None]:
# Watch the video to truly understand why you may not want to ask direct questions!
new_doc = "What did FDR say about the cost of food law?"

In [None]:
docs = db_connection.similarity_search(new_doc)

In [None]:
print(docs[0].page_content)

That is the way to fight and win a war—all out—and not with half-an-eye on the battlefronts abroad and the other eye-and-a-half on personal, selfish, or political interests here at home.

Therefore, in order to concentrate all our energies and resources on winning the war, and to maintain a fair and stable economy at home, I recommend that the Congress adopt:

(1) A realistic tax law—which will tax all unreasonable profits, both individual and corporate, and reduce the ultimate cost of the war to our sons and daughters. The tax bill now under consideration by the Congress does not begin to meet this test.

(2) A continuation of the law for the renegotiation of war contracts—which will prevent exorbitant profits and assure fair prices to the Government. For two long years I have pleaded with the Congress to take undue profits out of war.

(3) A cost of food law—which will enable the Government (a) to place a reasonable floor under the prices the farmer may expect for his production; and
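
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors are closest to it. The toy sketch below illustrates that ranking with hand-written three-dimensional vectors and cosine similarity. The vectors, texts, and helper names are made up for illustration; Chroma's actual distance metric and index are configurable and are not what is shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=4):
    """Return the texts of the k stored (text, vector) pairs most similar to the query."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical stored chunks with made-up embedding vectors.
toy_store = [
    ("chunk about food prices", [0.9, 0.1, 0.0]),
    ("chunk about war contracts", [0.1, 0.9, 0.1]),
    ("chunk about taxes", [0.3, 0.3, 0.8]),
]
toy_query = [1.0, 0.2, 0.0]  # pretend this is the embedded question
print(top_k(toy_query, toy_store, k=2))
```

Note that the ranking only measures closeness between vectors. A question and the passage that answers it are not guaranteed to land near each other, which is one reason a direct question may return a chunk that merely mentions the same words.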

## Add New Document

In [None]:
# load the document
loader = TextLoader("some_data/Lincoln_State_of_Union_1862.txt")
documents = loader.load()

In [None]:
# split it into chunks
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=500)
docs = text_splitter.split_documents(documents)



In [None]:
# load it into Chroma, using the same persist directory as before
db = Chroma.from_documents(docs, embedding_function, persist_directory='./speech_embedding_db')
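
One thing to watch for when adding documents to an existing directory: re-running the cell may store the same chunks a second time. A common way to guard against that is to give each chunk a stable ID derived from its content. The sketch below shows only the ID generation, using the standard library; the helper name `chunk_id` and the sample chunks are hypothetical.

```python
import hashlib


def chunk_id(source, text):
    """Derive a stable 32-character ID from the source name and the chunk text."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return digest[:32]


# Placeholder chunks standing in for the split documents.
chunks = ["first chunk of the speech", "second chunk of the speech"]
ids = [chunk_id("Lincoln_State_of_Union_1862.txt", c) for c in chunks]
print(ids)
```

Whether these can be passed to `Chroma.from_documents` through an `ids` argument, and what happens when an ID already exists, is an assumption to confirm against the installed versions, ideally on a scratch directory first.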

In [None]:
docs = db.similarity_search('slavery')

In [None]:
docs[0].page_content

'As to the second article, I think it would be impracticable to return to bondage the class of persons therein contemplated. Some of them, doubtless, in the property sense belong to loyal owners, and hence provision is made in this article for compensating such. The third article relates to the future of the freed people. It does not oblige, but merely authorizes Congress to aid in colonizing such as may consent. This ought not to be regarded as objectionable on the one hand or on the other, insomuch as it comes to nothing unless by the mutual consent of the people to be deported and the American voters, through their representatives in Congress.\n\nI can not make it better known than it already is that I strongly favor colonization; and yet I wish to say there is an objection urged against free colored persons remaining in the country which is largely imaginary, if not sometimes malicious.\n\nIt is insisted that their presence would injure and displace white labor and white laborers. 

### Collection class calls: https://docs.trychroma.com/reference/Collection

In [None]:
# help(db._collection)