## Embedding with Ollama, Nomic and pgVector

In [10]:
from dotenv import load_dotenv
from langchain.document_loaders import TextLoader
from langchain.embeddings import OllamaEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

load_dotenv()


True

In [11]:
loader = TextLoader("./../data/state_of_the_union.txt", encoding="utf-8")
documents = loader.load()


In [12]:
print(len(documents))

1


In [13]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
texts = text_splitter.split_documents(documents)
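`RecursiveCharacterTextSplitter` first tries to cut on paragraph breaks, then newlines, then spaces, so its chunks usually end at natural boundaries. The simplified sketch below ignores separators entirely and only shows what `chunk_size` and `chunk_overlap` mean when measured in characters: each window starts `chunk_size - chunk_overlap` characters after the previous one. The function name `sliding_window_chunks` is hypothetical and is not part of LangChain.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Hypothetical helper: fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end of the text
            break
        start += step
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence cut at the end of one chunk reappears at the start of the next, which helps retrieval when an answer straddles a chunk boundary.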

In [5]:
print(texts[0])

page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.' metadata={'source': './../dat

In [14]:
embeddings = OllamaEmbeddings(model="nomic-embed-text")
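`OllamaEmbeddings` delegates the actual embedding work to a locally running Ollama server. A minimal sketch of that round trip using only the standard library, assuming Ollama listens on its default address `http://localhost:11434` and the model has been pulled with `ollama pull nomic-embed-text`, might look like this. It targets Ollama's `/api/embeddings` endpoint (request body `{"model", "prompt"}`, response field `"embedding"`); the helper names `build_embedding_request` and `embed` are hypothetical, and the exact endpoint the LangChain wrapper uses may differ by version.

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # assumption: default local Ollama address


def build_embedding_request(text: str, model: str = "nomic-embed-text",
                            host: str = OLLAMA_HOST) -> urllib.request.Request:
    """Hypothetical helper: build a POST request for Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Hypothetical helper: send the request and return the vector (needs a running server)."""
    with urllib.request.urlopen(build_embedding_request(text, model)) as resp:
        return json.loads(resp.read())["embedding"]
```

With a server running, `embed("Madam Speaker")` returns a plain list of floats, the same kind of vector printed further down in this notebook.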

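Embed only the first 5 text chunks for this quick check. Ollama runs the model locally, so unlike calling OpenAI there is no per-token API cost; the limit just keeps the cell fast.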

In [15]:
doc_vectors = embeddings.embed_documents([t.page_content for t in texts[:5]])
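Each entry in `doc_vectors` is a list of floats, and two chunks are considered semantically close when their vectors point in a similar direction. A common way to measure that is cosine similarity; the small pure-Python sketch below (hypothetical helper `cosine_similarity`, not a LangChain function) shows the calculation on toy 2-D vectors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)  # zero vectors would raise ZeroDivisionError


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 2.0]))  # 0.0
```

Applied to the notebook's data, `cosine_similarity(doc_vectors[0], doc_vectors[1])` would compare the first two chunks.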

In [32]:
print(doc_vectors[0])

[-0.008386312773095336, 0.005388822271053543, -0.005175667219131078, -0.015213930596887646, 0.026671006605063118, -0.02657775141536619, 0.006191483543129288, 0.0365294208879733, 0.02701738408830606, 0.020462870083395795, 0.008386312773095336, 0.07401803694846067, 0.015200608959115257, -0.031866657677836606, -0.06543855204576272, -0.004819298973220121, -0.023926637465244788, 0.0026661014911072553, -0.05363510041698942, -0.016119839218312164, -0.0466276358266571, 0.03847445823383643, -0.02719057096728243, -0.01163692556868314, 0.03938036685526095, 0.000887590321749613, 0.02792329270973058, 0.02232797573997949, 0.042444470202777446, 0.029708464814389733, 0.03013477398691211, 0.023793415499585586, 0.0277634256058815, -0.004139867972813008, 0.041778358511836346, 0.027123960915775378, 0.034930761025353106, 0.015040742786588726, 0.018744309892888788, 0.009265577187652524, -0.029069000124283616, 0.03570344581847332, 0.01276264959525504, 0.010684389352410156, 0.038074794199503936, 0.01851783273

### Add pgVector

In [16]:
from langchain.vectorstores.pgvector import PGVector

# Connection details for a local Postgres database with the pgvector extension
CONNECTION_STRING = "postgresql://testuser:testpwd@localhost:5432/vectordb"
COLLECTION_NAME = "state_of_the_union_ollama"

# Alternatively, build the connection string from environment variables:
# import os
#
# CONNECTION_STRING = PGVector.connection_string_from_db_params(
#     driver=os.environ.get("PGVECTOR_DRIVER", "psycopg2"),
#     host=os.environ.get("PGVECTOR_HOST", "localhost"),
#     port=int(os.environ.get("PGVECTOR_PORT", "5432")),
#     database=os.environ.get("PGVECTOR_DATABASE", "postgres"),
#     user=os.environ.get("PGVECTOR_USER", "postgres"),
#     password=os.environ.get("PGVECTOR_PASSWORD", "postgres"),
# )

In [17]:
# Embed every chunk with nomic-embed-text and store the vectors in the collection
db = PGVector.from_documents(
    embedding=embeddings,
    documents=texts,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
)

In [18]:
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db.similarity_search_with_score(query)
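Conceptually, `similarity_search_with_score` embeds the query and returns the `k` stored chunks closest to it (`k` defaults to 4). Assuming LangChain's PGVector keeps its default cosine distance strategy, the printed score is a distance, not a similarity: smaller means closer, and 0 means the vectors point the same way. The brute-force sketch below (hypothetical helpers `cosine_distance` and `top_k_by_cosine_distance`) ranks toy vectors the same way; the real store does this inside Postgres.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0.0 for identical direction, up to 2.0 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def top_k_by_cosine_distance(query_vec: list[float],
                             docs: list[tuple[str, list[float]]],
                             k: int = 4) -> list[tuple[str, float]]:
    """Return the k (text, distance) pairs nearest to query_vec, closest first."""
    scored = [(text, cosine_distance(query_vec, vec)) for text, vec in docs]
    scored.sort(key=lambda pair: pair[1])  # ascending distance = most similar first
    return scored[:k]


toy_docs = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
print(top_k_by_cosine_distance([1.0, 0.0], toy_docs, k=2))
```

With the notebook's objects, the query vector would come from `embeddings.embed_query(query)`, and `docs` would pair each chunk's text with its stored embedding.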

In [19]:
for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)

--------------------------------------------------------------------------------
Score:  0.491921087179255
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. 

A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. 

And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. 

We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.  

We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.  

We’re putting in place dedicated immigration judg