# Documentation

**Source**
- https://www.youtube.com/watch?v=FDBnyJu_Ndg
- https://python.langchain.com/docs/integrations/vectorstores/pgvector
- https://github.com/pgvector/pgvector
- https://www.timescale.com/blog/how-to-build-llm-applications-with-pgvector-vector-store-in-langchain/

**Dependencies**
```
!pip install langchain
!pip install langchain-community
!pip install openai
!pip install tiktoken
!pip install python-dotenv
!pip install psycopg2-binary pgvector
!pip install langchain_openai
```


```python
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv

# Read environment variables (e.g. OPENAI_API_KEY) from a local .env file
load_dotenv()
```

True
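Later cells read `OPENAI_API_KEY`, `USERNAME` and `PASSWORD` with `os.environ[...]`, so a missing variable only shows up as a `KeyError` halfway through the notebook. A minimal sketch of a fail-fast check, using a hypothetical helper `require_env` (not part of LangChain or python-dotenv):

```python
import os


def require_env(names, env=None):
    """Hypothetical helper: return {name: value}, or fail listing every missing variable."""
    env = os.environ if env is None else env
    # Treat unset and empty variables the same way
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Demo with a plain dict instead of the real environment
print(require_env(["OPENAI_API_KEY"], {"OPENAI_API_KEY": "sk-..."}))
```

In the notebook you would call `require_env(["OPENAI_API_KEY", "USERNAME", "PASSWORD"])` right after `load_dotenv()`.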

```python
import os
from pathlib import Path

project_path = Path(os.getcwd())
text_file = str(project_path / "story.txt")
```

```python
text_file
```

'/home/ammar/LEARNING/2024/pg_vector_langchain/story.txt'

# Load text file

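```python
loader = TextLoader(text_file, encoding="utf-8")
documents = loader.load()  # one Document holding the whole file
# Split into chunks of up to 100 characters, with no overlap between chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
texts
```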

[Document(page_content='Title: The Lost Temple of Eldoria', metadata={'source': '/home/ammar/LEARNING/2024/pg_vector_langchain/story.txt'}),
 Document(page_content='In the heart of the dense jungle, where vines draped like curtains and ancient trees whispered', metadata={'source': '/home/ammar/LEARNING/2024/pg_vector_langchain/story.txt'}),
 Document(page_content='secrets, lay the Lost Temple of Eldoria. Legends spoke of its hidden treasures, guarded by traps', metadata={'source': '/home/ammar/LEARNING/2024/pg_vector_langchain/story.txt'}),
 Document(page_content='and mystical creatures. Many had ventured into the jungle, but few had returned, and those who did', metadata={'source': '/home/ammar/LEARNING/2024/pg_vector_langchain/story.txt'}),
 Document(page_content='were never the same.', metadata={'source': '/home/ammar/LEARNING/2024/pg_vector_langchain/story.txt'}),
 Document(page_content='Among those drawn to the tales was a daring explorer named Amelia Rhodes. With her trusty map i
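To see why the chunks break where they do, here is a simplified sketch of recursive character splitting. It is not LangChain's exact code: it only mimics the idea of trying the default separators `"\n\n"`, `"\n"`, `" "`, `""` in order and greedily merging pieces up to `chunk_size`. The sample text is a shortened string adapted from the opening of `story.txt`.

```python
def recursive_split(text, chunk_size=100, separators=("\n\n", "\n", " ", "")):
    """Simplified sketch of recursive character splitting (not LangChain's exact code)."""
    # Use the coarsest separator present in the text; "" (single characters) always applies
    sep = next(s for s in separators if s == "" or s in text)
    finer = separators[separators.index(sep) + 1:]
    pieces = list(text) if sep == "" else text.split(sep)

    chunks, current = [], ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Too long even on its own: flush what we have, then split it with finer separators
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
        else:
            chunks.append(current)  # chunk is full; start a new one
            current = piece
    if current:
        chunks.append(current)
    return [c.strip() for c in chunks if c.strip()]


story = (
    "Title: The Lost Temple of Eldoria\n\n"
    "In the heart of the dense jungle, where vines draped like curtains "
    "and ancient trees whispered secrets."
)
for chunk in recursive_split(story, chunk_size=100):
    print(repr(chunk))
```

The title stays alone because the blank line is the coarsest separator, and the long sentence is then split on spaces so that no chunk exceeds 100 characters.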

**Extra steps**

1. Pre-process the text in a DataFrame to add more metadata:
   - Split the text into title, text, and URL fields
   - Load the records into a DataFrame
   - Use `langchain.document_loaders.DataFrameLoader` to turn the rows into documents
   - Save the documents in a vector DB (PGVector, etc.)
   - Use the DB as a retriever
   - Pass the retrieved documents to the LLM to answer the user query
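The first two extra steps can be sketched in plain Python. The helper `to_records` and the URL below are hypothetical, and the sketch assumes the first line of the file looks like `Title: ...`, as in `story.txt`. Each record holds the chunk text plus the metadata fields you want to keep.

```python
def to_records(raw: str, url: str) -> list[dict]:
    """Hypothetical pre-processing: one record per paragraph, with title and url as metadata."""
    lines = raw.strip().split("\n")
    # Assumes the first line is "Title: <title>", as in story.txt
    title = lines[0].removeprefix("Title:").strip()
    body = "\n".join(lines[1:])
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    return [{"title": title, "text": p, "url": url} for p in paragraphs]


raw = "Title: The Lost Temple of Eldoria\n\nFirst paragraph.\n\nSecond paragraph."
records = to_records(raw, url="https://example.com/eldoria")  # placeholder URL
print(records)
```

From here, `pd.DataFrame(records)` gives the DataFrame, and `DataFrameLoader(df, page_content_column="text").load()` (from `langchain_community.document_loaders`) produces one document per row, with the remaining columns (`title`, `url`) stored as metadata.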


# Embed text chunks

```python
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
embeddings = OpenAIEmbeddings(api_key=OPENAI_API_KEY)
vector = embeddings.embed_query("Testing embedding model")
len(vector)  # embedding dimension
```

1536

```python
# Embed the first five chunks; returns one vector per chunk
doc_vector = embeddings.embed_documents([t.page_content for t in texts[:5]])
len(doc_vector)
```

5

# Set up PGVector and Postgres

1. Pull and run the pgvector image

`docker pull pgvector/pgvector:pg16`

`docker run --name pgvector-demo -e POSTGRES_PASSWORD=test -p 5432:5432 pgvector/pgvector:pg16`


2. Pull and run the pgAdmin interface

`docker pull dpage/pgadmin4`

`docker run --name my-pgadmin -p 82:80 -e 'PGADMIN_DEFAULT_EMAIL=ammar@yahoo.com' -e 'PGADMIN_DEFAULT_PASSWORD=pass123' -d dpage/pgadmin4`

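3. Log in to pgAdmin 4 at http://localhost:82/ with the email and password set above.
4. Create a server group.
5. Register a server:
     - Use the `IPAddress` of the `pgvector-demo` container (from `docker inspect pgvector-demo`, e.g. `"IPAddress": "172.17.0.3",`) as the hostname. Inside the pgAdmin container, `localhost` refers to pgAdmin itself, not to Postgres.
     - Connect to the pgvector database server (user `postgres`, password `test` from `POSTGRES_PASSWORD` above).
     - Create the `vector_db` database that the connection string below points to.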
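Finding the container IP by eye in the `docker inspect` output works, but it can also be pulled out programmatically. A minimal sketch, assuming the `docker` CLI is on your PATH; `container_ip` and `inspect_container` are hypothetical helpers, and the sample JSON is trimmed to the fields the helper reads.

```python
import json
import subprocess


def inspect_container(name: str) -> str:
    """Run `docker inspect <name>` and return its raw JSON output (needs the docker CLI)."""
    return subprocess.run(
        ["docker", "inspect", name], capture_output=True, text=True, check=True
    ).stdout


def container_ip(inspect_json: str) -> str:
    """Extract the container's IP address from `docker inspect` JSON output."""
    info = json.loads(inspect_json)[0]  # docker inspect prints a JSON array
    settings = info["NetworkSettings"]
    if settings.get("IPAddress"):  # set for containers on the default bridge network
        return settings["IPAddress"]
    # Containers on user-defined networks report their IP per network instead
    for net in settings.get("Networks", {}).values():
        if net.get("IPAddress"):
            return net["IPAddress"]
    raise ValueError("no IP address found in docker inspect output")


sample = '[{"Name": "/pgvector-demo", "NetworkSettings": {"IPAddress": "172.17.0.3"}}]'
print(container_ip(sample))
# With Docker running: container_ip(inspect_container("pgvector-demo"))
```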

**Source**
- https://www.commandprompt.com/education/how-to-run-postgresql-and-pgadmin-using-docker/
- https://bugbytes.io/posts/vector-databases-pgvector-and-langchain/

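```python
# Postgres credentials (user `postgres`, password `test` for the container above).
# Note: load_dotenv() does not override variables already set in the shell (e.g. USERNAME on Windows).
USERNAME = os.environ["USERNAME"]
PASSWORD = os.environ["PASSWORD"]
CONNECTION_STRING = f"postgresql+psycopg2://{USERNAME}:{PASSWORD}@localhost:5432/vector_db"
COLLECTION_NAME = 'state_of_union_vector'
```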
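The f-string above breaks if the password contains characters such as `@`, `:` or `/`, because they have special meaning in a URL. A minimal sketch that percent-encodes the credentials first; `pg_connection_string` is a hypothetical helper, and its defaults simply mirror the values used in this notebook.

```python
from urllib.parse import quote


def pg_connection_string(user, password, db="vector_db", host="localhost",
                         port=5432, driver="psycopg2"):
    """Build a SQLAlchemy-style Postgres URL with percent-encoded credentials."""
    # safe="" also encodes "/", so @ : / in a password cannot break the URL
    u = quote(user, safe="")
    p = quote(password, safe="")
    return f"postgresql+{driver}://{u}:{p}@{host}:{port}/{db}"


print(pg_connection_string("postgres", "p@ss:w/rd"))
```

For a plain password like `test` the result is identical to the f-string version, so it can replace `CONNECTION_STRING` without other changes.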

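```python
from langchain_community.vectorstores.pgvector import PGVector

# Embed every chunk and store it in the collection.
# Re-running this cell inserts the chunks again; pre_delete_collection=True starts from an empty collection.
db = PGVector.from_documents(
    embedding=embeddings,
    documents=texts,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
)
```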
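Conceptually, `from_documents` embeds each chunk once and stores the (text, embedding) rows, and a similarity search embeds the query and ranks the rows by distance. A toy in-memory sketch of that flow; `toy_embed` is a hypothetical word-count embedding over a five-word vocabulary standing in for `OpenAIEmbeddings`, and `ToyVectorStore` is not PGVector, only an illustration of the idea.

```python
import math
from collections import Counter

# Hypothetical 5-word vocabulary; real embeddings come from OpenAIEmbeddings
VOCAB = ["journey", "temple", "jungle", "trap", "map"]


def toy_embed(text):
    """Stand-in embedding: count how often each VOCAB word appears."""
    words = Counter(w.strip(".,!?").lower() for w in text.split())
    return [float(words[v]) for v in VOCAB]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms if norms else 1.0  # zero vector: treat as unrelated


class ToyVectorStore:
    """In-memory sketch: keep (text, embedding) rows, rank them by cosine distance."""

    def __init__(self, embed):
        self.embed = embed
        self.rows = []

    def add_texts(self, texts):
        # Embed every chunk once at insert time
        self.rows.extend((t, self.embed(t)) for t in texts)

    def similarity_search_with_score(self, query, k=2):
        q = self.embed(query)
        scored = [(t, cosine_distance(q, v)) for t, v in self.rows]
        return sorted(scored, key=lambda pair: pair[1])[:k]  # lower distance = more similar


store = ToyVectorStore(toy_embed)
store.add_texts([
    "hand and determination in her heart, she embarked on a journey to find the fabled temple.",
    "In the heart of the dense jungle, where vines draped like curtains",
    "Realization dawned upon them as they realized they had fallen into a trap",
])
for text, score in store.similarity_search_with_score("Tell me about the journey.", k=2):
    print(round(score, 3), text)
```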

# Query and Similarity Score

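```python
query = "Tell me about the journey."

# Each result is a (Document, score) tuple. With PGVector's default cosine strategy
# the score is a distance, so lower means more similar.
similarity = db.similarity_search_with_score(query, k=2)
for doc in similarity:
    print(doc)
```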

(Document(page_content='hand and determination in her heart, she embarked on a journey to find the fabled temple.', metadata={'source': '/home/ammar/LEARNING/2024/pg_vector_langchain/story.txt'}), 0.18795442222189263)
(Document(page_content='Their journey led them to a chamber bathed in golden light, where a pedestal stood at its center.', metadata={'source': '/home/ammar/LEARNING/2024/pg_vector_langchain/story.txt'}), 0.19075841848595365)


```python
print(embeddings.embed_query(query))
```

[0.029227439596707826, -0.015222624091460054, 0.008323860435199469, -0.004884195095584081, -0.010254219273526361, 0.021557826757243197, -0.009068797430159457, -0.012618583749045927, -0.020923011207215583, -0.013434776234878668, 0.0008623454570325733, 0.009787824099520557, -0.006497145693235875, -0.026467932644087457, 0.025496276342016226, -0.007267993479456042, 0.03733753449061162, -0.011698749029495054, 0.011342474369099041, 0.0027595055556364927, -0.019446093777078927, -0.020249329703128347, 0.0018267149747942392, 0.0033230665461139366, -0.004346544896366817, 0.012527896212465946, 0.022140821656073034, -0.009496326650105637, 0.024783728883869368, -0.007643700651220574, -0.009075275710051116, 0.009425072090555469, -0.0038510000239856497, 0.007902809495144946, -0.013978903317003716, -0.030600713861803765, -0.0013400768059415205, -0.013862304523502265, 0.013732750101540079, -0.009081753058620192, 0.02731003638684167, -0.008537625045172558, -0.004838851327294091, -0.00015617360046253105,

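- The `similarity_search_with_score` call above is roughly equivalent to the following SQL query (LangChain additionally restricts the search to the chosen collection). `<=>` is pgvector's cosine-distance operator, so the returned scores are distances: lower means more similar.
```sql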
SELECT document, (embedding <=> '[0.029227439596707826, -0.015222624091460054, 0.008323860435199469, -0.004884195095584081, -0.010254219273526361, 0.021557826757243197, ..., -0.02267199404105993]')
AS cosine_distance
FROM public.langchain_pg_embedding
ORDER BY cosine_distance
LIMIT 2
```
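pgvector offers three distance operators: `<->` (Euclidean/L2 distance), `<#>` (negative inner product) and `<=>` (cosine distance). The sketch below re-implements them in plain Python on toy 3-dimensional vectors to show that, for unit-length vectors, all three give the same ordering. OpenAI documents its embeddings as normalized to length 1, so for this notebook the choice of operator should not change which chunks come back, only the score values.

```python
import math


def l2_distance(a, b):
    """Euclidean distance, as computed by pgvector's `<->`."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """pgvector's `<#>` returns the inner product negated, so smaller means more similar."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity, as computed by pgvector's `<=>`."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def normalize(v):
    """Scale a vector to length 1."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


# Toy 3-dimensional vectors standing in for 1536-dimensional embeddings
query = normalize([0.9, 0.1, 0.2])
docs = {
    "a": normalize([1.0, 0.0, 0.1]),
    "b": normalize([0.2, 0.9, 0.3]),
    "c": normalize([0.5, 0.5, 0.5]),
}

rankings = {}
for op, dist in [("<->", l2_distance), ("<#>", negative_inner_product), ("<=>", cosine_distance)]:
    # ORDER BY <distance> in SQL corresponds to sorting ascending here
    rankings[op] = sorted(docs, key=lambda name: dist(query, docs[name]))
print(rankings)
```

The equivalence follows from the identity ‖a − b‖² = 2(1 − a·b) for unit vectors: all three scores are monotonic in the dot product.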

# LLM inference

```python
# Return the 3 most similar chunks for each query
retriever = db.as_retriever(search_kwargs={"k": 3})
```

```python
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
```

```python
llm = ChatOpenAI(api_key=OPENAI_API_KEY, temperature=0.7)
llm_qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # put all retrieved chunks into a single prompt
    retriever=retriever,
    return_source_documents=True,  # include the retrieved chunks in the response
    verbose=True,
)

user_query = "Tell me the plot twist of the story. Explain it like I am 5 years old"
# Calling llm_qa(user_query) directly is deprecated; use .invoke() instead
response = llm_qa.invoke({"query": user_query})
print(response, response["source_documents"])
```
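The `"stuff"` chain type simply stuffs every retrieved chunk into one prompt before calling the LLM. A minimal sketch of that idea; `stuff_prompt` and its template are hypothetical and illustrative only, and LangChain's built-in prompt wording differs.

```python
def stuff_prompt(question: str, chunks: list[str]) -> str:
    """Hypothetical sketch of the "stuff" strategy: all retrieved chunks go into one prompt."""
    # Join every retrieved chunk into a single context block, in retrieval order
    context = "\n\n".join(chunks)
    # Illustrative template only
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = [
    "In the heart of the dense jungle, where vines draped like curtains and ancient trees whispered",
    "Realization dawned upon them as they realized they had fallen into a trap, one set by the temple",
]
print(stuff_prompt("Tell me the plot twist of the story.", retrieved))
```

This is cheap (one LLM call) but only works while all `k` chunks fit in the model's context window. With `k=3` and 100-character chunks the context is small, which may also explain why the answer stays vague.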

> Entering new RetrievalQA chain...

> Finished chain.
{'query': 'Tell me the plot twist of the story. Explain it like I am 5 years old', 'result': "In the story, there are some people who are exploring a jungle. They find a special temple, but then they realize it was actually a trap! The walls of the temple start to move and change, and it's really scary. This is a plot twist because it's a surprising and unexpected turn of events that they didn't see coming.", 'source_documents': [Document(page_content='In the heart of the dense jungle, where vines draped like curtains and ancient trees whispered', metadata={'source': '/home/ammar/LEARNING/2024/pg_vector_langchain/story.txt'}), Document(page_content='Realization dawned upon them as they realized they had fallen into a trap, one set by the temple', metadata={'source': '/home/ammar/LEARNING/2024/pg_vector_langchain/story.txt'}), Document(page_content='shook beneath their feet. With horror, they watched as the walls b

```python
# Print the source file of each retrieved chunk
for doc in response["source_documents"]:
    print(doc.metadata["source"])
```

/home/ammar/LEARNING/2024/pg_vector_langchain/story.txt
/home/ammar/LEARNING/2024/pg_vector_langchain/story.txt
/home/ammar/LEARNING/2024/pg_vector_langchain/story.txt
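All three chunks come from the same file, so the source list repeats. A small sketch that keeps each source once, in first-retrieved order; `unique_sources` is a hypothetical helper, and `SimpleNamespace` objects stand in for LangChain documents since only `.metadata` is read.

```python
from types import SimpleNamespace


def unique_sources(docs):
    """Distinct metadata["source"] values, in the order they were first retrieved."""
    # dict.fromkeys keeps insertion order and drops duplicates
    return list(dict.fromkeys(doc.metadata["source"] for doc in docs))


# Stand-ins for LangChain Document objects (only .metadata is used here)
docs = [SimpleNamespace(metadata={"source": "story.txt"}) for _ in range(3)]
print(unique_sources(docs))
# In the notebook: unique_sources(response["source_documents"])
```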
