Next steps:

 - implement document management

 - [implement re-ranking](https://www.pinecone.io/learn/series/rag/rerankers/)

 - HyDE (Hypothetical Document Embeddings): works in two steps. First, an LLM writes a hypothetical answer to the user query. Second, the hypothetical answer and the query are transformed into embeddings, and the system retrieves the documents closest to those embeddings in the vector space.

 - ensemble chunking

 - https://luv-bansal.medium.com/advance-rag-improve-rag-performance-208ffad5bb6a

 - Replacing pronouns with names in split chunks can enhance semantic significance during retrieval.

 - Metadata-based vector-store filters: restrict the retrieved vectors to those the current user has access to.

 - [Chunk enrichment phase (post chunking)](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/rag/rag-enrichment-phase)
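As a rough illustration of the HyDE item above, here is a minimal sketch using only the standard library. The bag-of-words `embed` function and the `fake_llm` helper are hypothetical stand-ins for a real embedding model and a real LLM call; only the two-step flow (draft an answer, then embed and search) reflects the technique.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def fake_llm(query):
    """Hypothetical LLM call that drafts an answer to the query."""
    return "the warg smelled smoke and blood near the tree"


def hyde_retrieve(query, corpus, k=1):
    hypothetical = fake_llm(query)                 # step 1: draft an answer
    query_vec = embed(query + " " + hypothetical)  # step 2: embed answer + query
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


corpus = [
    "The warg stopped beneath a tree and sniffed the smoke and blood",
    "Tyrion drank his way across the narrow sea",
]
top = hyde_retrieve("What did the animal notice?", corpus)
print(top)
```

The query alone shares almost no vocabulary with the first passage; the drafted answer is what pulls it to the top.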

## Quickstart

### LangChain ChatModel

When a string is passed in as input, it is converted to a HumanMessage and then passed to the underlying model.

LangChain does not host any chat models itself; it relies on third-party integrations.

ChatModels accept a set of standardized parameters at construction time:

 - `model`: the name of the model
 - `temperature`: the sampling temperature
 - `timeout`: request timeout
 - `max_tokens`: max tokens to generate
 - `stop`: default stop sequences
 - `max_retries`: max number of times to retry requests
 - `api_key`: API key for the model provider
 - `base_url`: endpoint to send requests to

### Prompt templates

Prompt templates help to translate user input and parameters into instructions for a language model. This can be used to guide a model's response, helping it understand the context and generate relevant and coherent language-based output.

Prompt Templates output a PromptValue. This PromptValue can be passed to an LLM or a ChatModel, and can also be cast to a string or a list of messages. The reason this PromptValue exists is to make it easy to switch between strings and messages.
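To make the "string or list of messages" idea concrete, here is a small standard-library sketch. `ToyPromptValue` and `format_prompt` are hypothetical names invented for illustration; this is not LangChain's implementation, only the shape of the idea.

```python
from dataclasses import dataclass


@dataclass
class ToyPromptValue:
    """Illustrative only: holds formatted messages as (role, content) tuples."""

    messages: list

    def to_messages(self):
        # Chat models consume the structured list.
        return list(self.messages)

    def to_string(self):
        # Plain LLMs consume a single flattened string.
        return "\n".join(f"{role}: {content}" for role, content in self.messages)


def format_prompt(template, **params):
    """Fill the {placeholders} of each message template with user parameters."""
    return ToyPromptValue([(role, text.format(**params)) for role, text in template])


template = [
    ("system", "You are a helpful assistant"),
    ("user", "Tell me a joke about {topic}"),
]
value = format_prompt(template, topic="cats")
print(value.to_string())
```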

#### ChatPromptTemplate

These prompt templates are used to format a list of messages. Each "template" is itself a list of message templates. For example, a common way to construct and use a ChatPromptTemplate is as follows:

```python
from langchain_core.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("user", "Tell me a joke about {topic}")
])

prompt_template.invoke({"topic": "cats"})
```

#### MessagesPlaceholder


This prompt template is responsible for adding a list of messages in a particular place. In the above ChatPromptTemplate, we saw how we could format two messages, each one a string. But what if we wanted the user to pass in a list of messages that we would slot into a particular spot? This is how you use MessagesPlaceholder.

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage

prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    MessagesPlaceholder("msgs")
])

prompt_template.invoke({"msgs": [HumanMessage(content="hi!")]})
```

### Chat history

Most LLM applications have a conversational interface. An essential component of a conversation is being able to refer to information introduced earlier in the conversation. At bare minimum, a conversational system should be able to access some window of past messages directly.

The concept of `ChatHistory` refers to a class in LangChain which can be used to wrap an arbitrary chain. This `ChatHistory` will keep track of inputs and outputs of the underlying chain, and append them as messages to a message database. Future interactions will then load those messages and pass them into the chain as part of the input.
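The wrapping behaviour described above can be sketched without any library. The names `store`, `chat_with_history` and `echo_chain` are hypothetical; the sketch only shows the load, call, append cycle per session, with a fixed window of past messages.

```python
from collections import defaultdict

# session_id -> list of (role, content) messages; stands in for a message database
store = defaultdict(list)


def chat_with_history(chain, session_id, user_input, window=4):
    """Load recent messages, call the chain, then record the new exchange."""
    history = store[session_id][-window:]
    answer = chain(history + [("human", user_input)])
    store[session_id].append(("human", user_input))
    store[session_id].append(("ai", answer))
    return answer


def echo_chain(messages):
    """Hypothetical stand-in for a model: reports how many messages it saw."""
    return f"I can see {len(messages)} message(s)"


first = chat_with_history(echo_chain, "abc", "hi!")
second = chat_with_history(echo_chain, "abc", "what did I say?")
other = chat_with_history(echo_chain, "xyz", "hello")
print(first, second, other, sep="\n")
```

The second call in session `abc` sees the earlier exchange, while session `xyz` starts from an empty history.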

## Trying it out

### Install libraries

In [None]:
!pip install numpy==1.26.0 --force-reinstall

Collecting numpy==1.26.0
  Downloading numpy-1.26.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (58 kB)
Downloading numpy-1.26.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.26.4
    Uninstalling numpy-1.26.4:
      Successfully uninstalled numpy-1.26.4
Successfully installed numpy-1.26.0


In [None]:
!pip install langchain
!pip install -qU langchain-groq
!pip install langchain_community

In [None]:
!pip install -qU langchain-text-splitters

In [None]:
!pip install sentence-transformers==2.2.2 --force-reinstall

In [20]:
!pip install InstructorEmbedding

Collecting InstructorEmbedding
  Obtaining dependency information for InstructorEmbedding from https://files.pythonhosted.org/packages/6c/fc/64375441f43cc9ddc81f76a1a8f516e6d63f5b6ecb67fffdcddc0445f0d3/InstructorEmbedding-1.0.1-py2.py3-none-any.whl.metadata
  Downloading InstructorEmbedding-1.0.1-py2.py3-none-any.whl.metadata (20 kB)
Downloading InstructorEmbedding-1.0.1-py2.py3-none-any.whl (19 kB)
Installing collected packages: InstructorEmbedding
Successfully installed InstructorEmbedding-1.0.1





In [None]:
!pip install InstructorEmbedding
!pip install langchain_chroma
!pip install -qU faiss-cpu
!pip install --upgrade --quiet pymupdf

### Create the Groq LLM provider

In [1]:
import getpass
import os
from langchain_groq import ChatGroq

In [2]:
os.environ["GROQ_API_KEY"] = getpass.getpass()

 ········


In [54]:
del model

In [3]:
LLAMA_8B = "llama3-8b-8192"
LLAMA_70B = "llama3-70b-8192"
GEMMA2_9B = "gemma2-9b-it"

model = ChatGroq(model=LLAMA_8B)

### Imports

In [4]:
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.chains import LLMChain
from langchain_core.runnables import RunnableSequence, RunnableMap, RunnableParallel
from langchain_core.runnables.history import RunnableWithMessageHistory

from langchain_core.chat_history import (
    BaseChatMessageHistory,
    InMemoryChatMessageHistory,
)
from langchain_core.messages import SystemMessage, trim_messages
from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
from langchain_chroma import Chroma

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

## Basic chatbot

```python
system_msg = """
You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use three sentences maximum and keep the answer concise.
"""

prompt = ChatPromptTemplate.from_messages(
  [
    (
      "system",
      system_msg,
    ),
    ("user", "Context: {context}\n\nQuestion: {question}")
  ]
)

chain = prompt | model
```




```python
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
```

```python
with_message_history_args = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="messages",
)
```

### Refs

 - https://python.langchain.com/v0.2/docs/integrations/chat/google_generative_ai/

 - https://python.langchain.com/v0.2/docs/integrations/llms/google_vertex_ai_palm/

 - https://api.python.langchain.com/en/latest/chat_models/langchain_google_genai.chat_models.ChatGoogleGenerativeAI.html#langchain_google_genai.chat_models.ChatGoogleGenerativeAI


## RAG

$ \large \text{RAG is a technique for augmenting LLM knowledge with additional data.} $

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data up to a specific point in time that they were trained on. If you want to build AI applications that can reason about private data or data introduced after a model's cutoff date, you need to augment the knowledge of the model with the specific information it needs. The process of bringing the appropriate information and inserting it into the model prompt is known as Retrieval Augmented Generation (RAG).

A typical RAG application has two main components:

**Indexing**: a pipeline for ingesting data from a source and indexing it. This usually happens offline.

**Retrieval and generation**: the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.

### Indexing

 1. **Load**: First we need to load our data. This is done with [Document Loaders](https://python.langchain.com/v0.2/docs/concepts/#document-loaders).

 2. **Split**: [Text splitters](https://python.langchain.com/v0.2/docs/concepts/#text-splitters) break large Documents into smaller chunks. This is useful both for indexing data and for passing it in to a model, since large chunks are harder to search over and won't fit in a model's finite context window.

 At a high level, text splitters work as follows:

  - Split the text up into small, semantically meaningful chunks (often sentences).

  - Start combining these small chunks into a larger chunk until you reach a certain size (as measured by some function).

  - Once you reach that size, make that chunk its own piece of text and then start creating a new chunk of text with some overlap (to keep context between chunks).

 3. **Store**: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a [VectorStore](https://python.langchain.com/v0.2/docs/concepts/#vector-stores) and [Embeddings model](https://python.langchain.com/v0.2/docs/concepts/#embedding-models).

 The Embeddings class is a class designed for interfacing with text embedding models. There are many different embedding model providers (OpenAI, Cohere, Hugging Face, etc) and local models, and this class is designed to provide a standard interface for all of them.

 > The base Embeddings class in LangChain provides two methods: one for embedding documents and one for embedding a query. The former takes as input multiple texts, while the latter takes a single text. The reason for having these as two separate methods is that some embedding providers have different embedding methods for documents (to be searched over) vs queries (the search query itself).
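The two-method interface described above can be sketched with a toy class. `embed_documents` and `embed_query` mirror the method names of LangChain's `Embeddings` interface; the `ToyEmbeddings` class, its four-word vocabulary and the counting scheme are assumptions made purely for illustration.

```python
import math


class ToyEmbeddings:
    """Illustrative stand-in exposing the document/query method pair."""

    vocab = ["dragon", "wolf", "snow", "king"]

    def _embed(self, text):
        # Count vocabulary words, then L2-normalize the vector.
        words = text.lower().split()
        vec = [float(words.count(term)) for term in self.vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    def embed_documents(self, texts):
        """Takes multiple texts, returns one vector per text."""
        return [self._embed(t) for t in texts]

    def embed_query(self, text):
        """Takes a single text, returns a single vector."""
        return self._embed(text)


emb = ToyEmbeddings()
doc_vectors = emb.embed_documents(["the wolf in the snow", "the king and the dragon"])
query_vector = emb.embed_query("snow wolf")

# With normalized vectors, the dot product is the cosine similarity.
scores = [sum(q * d for q, d in zip(query_vector, v)) for v in doc_vectors]
print(scores)
```

In this toy both methods share one code path; a real provider may apply different instructions or prefixes to documents and queries, which is why the interface keeps them separate.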

### Retrieval and generation

 4. **Retrieve**: Given a user input, relevant splits are retrieved from storage using a [Retriever](https://python.langchain.com/v0.2/docs/concepts/#retrievers).

 5. **Generate**: A ChatModel / LLM produces an answer using a prompt that includes the question and the retrieved data.

In [5]:
from langchain_community.document_loaders import PyMuPDFLoader

### Loading document

In [6]:
def read_pdf(path):
  loader = PyMuPDFLoader(
      path
  )
  doc = loader.load()
  return doc

In [7]:
data = read_pdf("./A_Dance_With_Dragons.pdf")
len(data[0].page_content)



0

In [8]:
data[-5].page_content

'—CASPOR HILL, HUMFREY STONE, MALO JAYN, DICK COLE, WILL COLE, LORIMAS MUDD, JON \nLOTHSTON, LYMOND PEASE, SER BRENDEL BYRNE, DUNCAN STRONG, DENYS STRONG, CHAINS, YOUNG \nJOHN MUDD, serjeants of the company, \n—{SER AEGOR RIVERS, called BITTERSTEEL}, a bastard son of King Aegon IV Targaryen, founder \nof the company}, \n—{MAELYS I BLACKFYRE, called MAELYS THE MONSTROUS}, captain-general of the company, \npretender to the Iron Throne of Westeros, member of the Band of Nine, slain during the War of the \nNinepenny Kings, \nTHE WINDBLOWN, two thousand horse and foot, sworn to Yunkai, \n—THE TATTERED PRINCE, a former nobleman of the Free City of Pentos, captain and founder, \n—CAGGO, called CORPSEKILLER, his right hand, \n—DENZO D’HAN, the warrior bard, his left hand, \n—HUGH HUNGERFORD, serjeant, former company paymaster, fined three fingers for stealing, \n—SER ORSON STONE, SER LUCIFER LONG, WILL OF THE WOODS, DICK STRAW, GINJER JACK, \nWesterosi sellswords, \n—PRETTY MERIS, the company 

In [9]:
data_filtered = data[7:-2]

In [10]:
len(data_filtered)

885

In [11]:
print("First page, starting content: ")
print(data_filtered[0].page_content[:500])

First page, starting content: 
PROLOGUE 
The night was rank with the smell of man. 
The warg stopped beneath a tree and sniffed, his grey-brown fur dappled by shadow. A sigh of 
piney wind brought the man-scent to him, over fainter smells that spoke of fox and hare, seal and stag, 
even wolf. Those were man-smells too, the warg knew; the stink of old skins, dead and sour, near 
drowned beneath the stronger scents of smoke and blood and rot. Only man stripped the skins from 
other beasts and wore their hides and hair. 
Wargs h


In [12]:
print("Last page, end content: ")
print(data_filtered[-1].page_content)

Last page, end content: 
—KASPORIO, called KASPORIO THE CUNNING, a bravo, second-in-command, 
—TYBERO ISTARION, called INKPOTS, company paymaster, 
—HAMMER, a drunken blacksmith and armorer, 
—his apprentice, called NAIL, 
—SNATCH, a serjeant, one-handed, 
—KEM, a young sellsword, from Flea Bottom, 
—BOKKOKO, an axeman of formidable repute, 
—UHLAN, a serjeant of the company, 
THE STORMCROWS, five hundred horse-riders, sworn to Queen Daenerys, 
—DAAERIO NAHARIS, captain and commander, 
—THE WIDOWER, his second-in-command, 



### Splitting into chunks

In [13]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

We set `add_start_index=True` so that the character index at which each split Document starts within the initial Document is preserved as the metadata attribute `start_index`.

In [14]:
def split_text(doc):
  text_splitter = RecursiveCharacterTextSplitter(
      chunk_size=1000, chunk_overlap=200, add_start_index=True
  )
  all_splits = text_splitter.split_documents(doc)
  return all_splits


all_splits = split_text(data_filtered)
len(all_splits)

3173
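To see what `chunk_size`, `chunk_overlap` and `start_index` mean in practice, here is a simplified fixed-window splitter. It is a sketch only: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, which this hypothetical `split_with_overlap` helper does not attempt.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(
            {"start_index": start, "page_content": text[start:start + chunk_size]}
        )
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(2500))  # 2500 characters of dummy text
chunks = split_with_overlap(text)
print([c["start_index"] for c in chunks])
```

Each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next.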

### Understanding chunks

Note the `start_index`

In [14]:
from langchain_core.documents import Document

In [None]:
all_splits

In [25]:
all_splits[3].metadata

{'source': './A_Dance_With_Dragons.pdf',
 'file_path': './A_Dance_With_Dragons.pdf',
 'page': 8,
 'total_pages': 894,
 'format': 'PDF 1.5',
 'title': 'A Dance With Dragons - A Song of Ice and Fire',
 'author': 'George R R Martin',
 'subject': 'nothuman',
 'keywords': 'A Game of Thrones; A Song of Ice and Fire; pdf, nothuman',
 'creator': 'Microsoft® Office Word 2007',
 'producer': 'Microsoft® Office Word 2007',
 'creationDate': 'D:20110711230404Z',
 'modDate': "D:20190506190144+02'00'",
 'trapped': '',
 'start_index': 0}

In [20]:
all_splits[-1].metadata

{'source': './A_Dance_With_Dragons.pdf',
 'file_path': './A_Dance_With_Dragons.pdf',
 'page': 891,
 'total_pages': 894,
 'format': 'PDF 1.5',
 'title': 'A Dance With Dragons - A Song of Ice and Fire',
 'author': 'George R R Martin',
 'subject': 'nothuman',
 'keywords': 'A Game of Thrones; A Song of Ice and Fire; pdf, nothuman',
 'creator': 'Microsoft® Office Word 2007',
 'producer': 'Microsoft® Office Word 2007',
 'creationDate': 'D:20110711230404Z',
 'modDate': "D:20190506190144+02'00'",
 'trapped': '',
 'start_index': 0}

In [22]:
all_splits[-1]

Document(metadata={'source': './A_Dance_With_Dragons.pdf', 'file_path': './A_Dance_With_Dragons.pdf', 'page': 891, 'total_pages': 894, 'format': 'PDF 1.5', 'title': 'A Dance With Dragons - A Song of Ice and Fire', 'author': 'George R R Martin', 'subject': 'nothuman', 'keywords': 'A Game of Thrones; A Song of Ice and Fire; pdf, nothuman', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': 'D:20110711230404Z', 'modDate': "D:20190506190144+02'00'", 'trapped': '', 'start_index': 0}, page_content='—KASPORIO, called KASPORIO THE CUNNING, a bravo, second-in-command, \n—TYBERO ISTARION, called INKPOTS, company paymaster, \n—HAMMER, a drunken blacksmith and armorer, \n—his apprentice, called NAIL, \n—SNATCH, a serjeant, one-handed, \n—KEM, a young sellsword, from Flea Bottom, \n—BOKKOKO, an axeman of formidable repute, \n—UHLAN, a serjeant of the company, \nTHE STORMCROWS, five hundred horse-riders, sworn to Queen Daenerys, \n—DAAERIO NAHARIS, c

In [16]:
from hashlib import md5

In [35]:
assign_id(all_splits[0])

Document(id='ced5253083a2d06e52962cfe5495233f', metadata={'source': './A_Dance_With_Dragons.pdf', 'file_path': './A_Dance_With_Dragons.pdf', 'page': 7, 'total_pages': 894, 'format': 'PDF 1.5', 'title': 'A Dance With Dragons - A Song of Ice and Fire', 'author': 'George R R Martin', 'subject': 'nothuman', 'keywords': 'A Game of Thrones; A Song of Ice and Fire; pdf, nothuman', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': 'D:20110711230404Z', 'modDate': "D:20190506190144+02'00'", 'trapped': '', 'start_index': 0, 'role': 'registered_user'}, page_content='PROLOGUE \nThe night was rank with the smell of man. \nThe warg stopped beneath a tree and sniffed, his grey-brown fur dappled by shadow. A sigh of \npiney wind brought the man-scent to him, over fainter smells that spoke of fox and hare, seal and stag, \neven wolf. Those were man-smells too, the warg knew; the stink of old skins, dead and sour, near \ndrowned beneath the stronger scen

In [34]:
def assign_id(doc):
    meta = doc.metadata
    source = meta["source"]
    page = meta["page"]
    start_index = meta["start_index"]

    _id = bytes(f"{source}:{page}:{start_index}", 'utf-8')
    h = md5(_id)
    doc.id = h.hexdigest()
    doc.metadata["role"] = "registered_user"

    return doc

In [36]:
splits_with_id = list(map(assign_id , all_splits))
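Hashing `source:page:start_index` gives each chunk a deterministic ID, so re-ingesting the same file produces the same IDs. A small sketch of why that is useful, assuming a plain dictionary as a stand-in for the store and a hypothetical `./book.pdf` path:

```python
from hashlib import md5


def chunk_id(source, page, start_index):
    """Same inputs always hash to the same 32-character hex ID."""
    return md5(f"{source}:{page}:{start_index}".encode("utf-8")).hexdigest()


first_run = chunk_id("./book.pdf", 7, 0)
second_run = chunk_id("./book.pdf", 7, 0)    # same chunk, ingested again
other_chunk = chunk_id("./book.pdf", 7, 800)  # different start_index

# Keyed by ID, a repeated chunk overwrites itself instead of being duplicated.
seen = {}
for cid, text in [(first_run, "chunk A"), (second_run, "chunk A"), (other_chunk, "chunk B")]:
    seen[cid] = text
print(len(seen))
```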

In [37]:
splits_with_id[-1]

Document(id='1c91c33d2a52bd7eb1ee5cdb139190f1', metadata={'source': './A_Dance_With_Dragons.pdf', 'file_path': './A_Dance_With_Dragons.pdf', 'page': 891, 'total_pages': 894, 'format': 'PDF 1.5', 'title': 'A Dance With Dragons - A Song of Ice and Fire', 'author': 'George R R Martin', 'subject': 'nothuman', 'keywords': 'A Game of Thrones; A Song of Ice and Fire; pdf, nothuman', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': 'D:20110711230404Z', 'modDate': "D:20190506190144+02'00'", 'trapped': '', 'start_index': 0, 'role': 'registered_user'}, page_content='—KASPORIO, called KASPORIO THE CUNNING, a bravo, second-in-command, \n—TYBERO ISTARION, called INKPOTS, company paymaster, \n—HAMMER, a drunken blacksmith and armorer, \n—his apprentice, called NAIL, \n—SNATCH, a serjeant, one-handed, \n—KEM, a young sellsword, from Flea Bottom, \n—BOKKOKO, an axeman of formidable repute, \n—UHLAN, a serjeant of the company, \nTHE STORMCROWS, five hu

### Create embeddings

In [15]:
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

In [39]:
import numpy as np
np.__version__

'1.26.0'

In [None]:
model_name = "hkunlp/instructor-large"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True}

hkunlp_embeddings = HuggingFaceInstructEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

In [16]:
mpnet = "sentence-transformers/all-mpnet-base-v2"
minilm = "sentence-transformers/all-MiniLM-L6-v2"

embeddings = HuggingFaceEmbeddings(model_name=mpnet)



### Store in vector database

#### [Chroma ref](https://python.langchain.com/api_reference/chroma/vectorstores/langchain_chroma.vectorstores.Chroma.html)

In [17]:
import langchain_community
langchain_community.__version__

'0.3.5'

In [18]:
from langchain_chroma import Chroma
from langchain_community.vectorstores import FAISS

In [43]:
print(all_splits[21].page_content)

they came. Varamyr gave them his seed, took a hank of their hair to remember them by, and sent them 
back. From time to time, some village hero would come with spear in hand to slay the beastling and save 
a sister or a lover or a daughter. Those he killed, but he never harmed the women. Some he even 
blessed with children. Runts. Small, puny things, like Lump, and not one with the gift. 
Fear drove him to his feet, reeling. Holding his side to staunch the seep of blood from his 
wound, Varamyr lurched to the door and swept aside the ragged skin that covered it to face a wall of 
white. Snow. No wonder it had grown so dark and smoky inside. The falling snow had buried the hut. 
When Varamyr pushed at it, the snow crumbled and gave way, still soft and wet. Outside, the 
night was white as death; pale thin clouds danced attendance on a silver moon, while a thousand stars 
watched coldly. He could see the humped shapes of other huts buried beneath drifts of snow, and


In [44]:
print(all_splits[32].page_content)

Varamyr knew the truth of that. When he claimed the eagle that had been Orell’s, he could feel 
the other skinchanger raging at his presence. Orell had been slain by the turncloak crow Jon Snow, and 
his hate for his killer had been so strong that Varamyr found himself hating the beastling boy as well. He 
had known what Snow was the moment he saw that great white direwolf stalking silent at his side. One 
skinchanger can always sense another. Mance should have let me take the direwolf. There would be a 
second life worthy of a king. He could have done it, he did not doubt. The gift was strong in Snow, but 
the youth was untaught, still fighting his nature when he should have gloried in it. 
  
Varamyr could see the weirwood’s red eyes staring down at him from the white trunk. The gods 
are weighing me. A shiver went through him. He had done bad things, terrible things. He had stolen,


In [45]:
print(all_splits[33].page_content)

Varamyr could see the weirwood’s red eyes staring down at him from the white trunk. The gods 
are weighing me. A shiver went through him. He had done bad things, terrible things. He had stolen, 
killed, raped. He had gorged on human flesh and lapped the blood of dying men as it gushed red and hot 
from their torn throats. He had stalked foes through the woods, fallen on them as they slept, clawed 
their entrails from their bellies and scattered them across the muddy earth. How sweet their meat had 
tasted. “That was the beast, not me,” he said in a hoarse whisper. “That was the gift you gave me.”


In [46]:
print(all_splits[34].page_content)

The gods made no reply. His breath hung pale and misty in the air. He could feel ice forming in 
his beard. Varamyr Sixskins closed his eyes. 
  
He dreamt an old dream of a hovel by the sea, three dogs whimpering, a woman’s tears. 
  
Bump. She weeps for Bump, but she never wept for me. 
  
Lump had been born a month before his proper time, and he was sick so often that no one 
expected him to live. His mother waited until he was almost four to give him a proper name, and by 
then it was too late. The whole village had taken to calling him Lump, the name his sister Meha had 
given him when he was still in their mother’s belly. Meha had given Bump his name as well, but Lump’s 
little brother had been born in his proper time, big and red and robust, sucking greedily at Mother’s 
teats. She was going to name him after Father. Bump died, though. He died when he was two and I was 
six, three days before his nameday.


In [47]:
%%time
chroma_vectorstore = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="./chroma_langchain_db")

CPU times: total: 1h 47min 52s
Wall time: 32min 19s


In [19]:
%%time
faiss_vectorstore = FAISS.from_documents(documents=all_splits, embedding=embeddings)

CPU times: total: 1h 46min 37s
Wall time: 32min 28s


Retriever: An object that returns Documents given a text query
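A retriever can be pictured as "embed the query, score every stored chunk, keep the best ones". The sketch below uses a toy bag-of-words embedding and a hypothetical `ToyRetriever` class; a real vector store uses learned embeddings and an index instead of a linear scan.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyRetriever:
    """Returns up to k documents whose similarity reaches score_threshold."""

    def __init__(self, docs, k=2, score_threshold=0.0):
        self.docs = docs
        self.k = k
        self.score_threshold = score_threshold

    def invoke(self, query):
        q = embed(query)
        scored = [(cosine(q, embed(d)), d) for d in self.docs]
        scored = [(s, d) for s, d in scored if s >= self.score_threshold]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for _, d in scored[: self.k]]


docs = [
    "Orell had been slain by the turncloak crow Jon Snow",
    "Tyrion drank his way across the narrow sea",
    "The falling snow had buried the hut",
]
query = "who had slain orell"
loose = ToyRetriever(docs, k=2, score_threshold=0.1).invoke(query)
strict = ToyRetriever(docs, k=2, score_threshold=0.3).invoke(query)
print(loose, strict, sep="\n")
```

Raising the threshold drops the weakly related third passage, while `k` only caps how many results are returned.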

In [32]:
q_s = [
    "Why according to Dornish law, Myrcella has stronger claim on the Iron Throne?",
    "Who is the current King, Lord of the seven kingdoms, protector of the realm?",
    "Who is his wife?",
    "Who drank his way across the narrow sea?",
    "What did Varamyr lose at the Wall?",
    "From time to time, some village hero would come with spear in hand in order to slay the beastling, to save whom?",
    "How many times had Varamyr died?",
    "How did Oberyn die?",
    "Who had Orell been slain by?",
    "What was the name of crow who had slain Orell?",
    "What do Jon Snow and Tormund Giantsbane talk about",
]

##### Using `similarity_search_with_relevance_scores`

In [49]:
similar_docs = chroma_vectorstore.similarity_search_with_relevance_scores(q_s[0], k=4, score_threshold=0.1)

In [50]:
docs = [similar_doc[0].page_content for similar_doc in similar_docs]
joiner = """

============================================================================================================================================

"""

print(joiner.join(docs))

“Preparations should be made for Princess Myrcella.” 
  
“This is what comes of dealing with the Dornish,” Mace Tyrell said. “Surely a better match can 
be found for the girl?” 
  
Such as your own son Willas, perhaps? Her disfigured by one Dornishman, him crippled by 
another? “No doubt,” Ser Kevan said, “but we have enemies enough without offending Dorne. If Doran 
Martell were to join his strength to Connington’s in support of this feigned dragon, things could go very 
ill for all of us.” 
  
“Mayhaps we can persuade our Dornish friends to deal with Lord Connington,” Ser Harys Swyft 
said with an irritating titter. “That would save a deal of blood and trouble.”


their husbands and lovers. Better to live shamed than die proud, Ser Kevan told himself. “My niece will 
make no further mischief,” he promised Mace Tyrell. “You have my word on that, my lord.” 
  
Tyrell gave a grudging nod. “As you say. My Margaery prefers to be tried by the Faith, so the 
whole realm can bear witness to 

##### Using retriever

In [51]:
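# A score threshold only takes effect with search_type="similarity_score_threshold"
# and must be passed inside search_kwargs, so it is omitted here;
# this retriever always returns the top k=5 chunks.
retriever = chroma_vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})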

retrieved_docs = retriever.invoke(q_s[5])

len(retrieved_docs)

5

In [52]:
docs = [retrieved_doc.page_content for retrieved_doc in retrieved_docs]

print("\n=============================================================================================\n".join(docs))

they came. Varamyr gave them his seed, took a hank of their hair to remember them by, and sent them 
back. From time to time, some village hero would come with spear in hand to slay the beastling and save 
a sister or a lover or a daughter. Those he killed, but he never harmed the women. Some he even 
blessed with children. Runts. Small, puny things, like Lump, and not one with the gift. 
Fear drove him to his feet, reeling. Holding his side to staunch the seep of blood from his 
wound, Varamyr lurched to the door and swept aside the ragged skin that covered it to face a wall of 
white. Snow. No wonder it had grown so dark and smoky inside. The falling snow had buried the hut. 
When Varamyr pushed at it, the snow crumbled and gave way, still soft and wet. Outside, the 
night was white as death; pale thin clouds danced attendance on a silver moon, while a thousand stars 
watched coldly. He could see the humped shapes of other huts buried beneath drifts of snow, and
The hero leaned into 

### Reranking

In [57]:
from langchain.retrievers.document_compressors import FlashrankRerank

In [59]:
ranker = FlashrankRerank(top_n=6)

In [130]:
question = "Why did Tyrion Lannister kill his father?"

In [131]:
similar_docs = chroma_vectorstore.similarity_search_with_relevance_scores(question, k=20, score_threshold=0.05)

In [132]:
sim = [(doc[0].page_content, doc[1]) for doc in similar_docs]
print(joiner.join([s[0] for s in sim[:5]]))

When he was still a lonely child in the depths of Casterly Rock, he oft rode dragons through the 
nights, pretending he was some lost Targaryen princeling, or a Valyrian dragonlord soaring high o’er 
fields and mountains. Once, when his uncles asked him what gift he wanted for his name-day, he 
begged them for a dragon. “It wouldn’t need to be a big one. It could be little, like I am.” His uncle 
Gerion thought that was the funniest thing he had ever heard, but his uncle Tygett said, “The last dragon 
died a century ago, lad.” That had seemed so monstrously unfair that the boy had cried himself to sleep 
that night. 
  
Yet if the lord of cheese could be believed, the Mad King’s daughter had hatched three living 
dragons. Two more than even a Targaryen should require. Tyrion was almost sorry that he had killed his 
father. He would have enjoyed seeing Lord Tywin’s face when he learned that there was a Targaryen


known Tyrion Lannister, briefly. He took my hand and named me friend. It 

In [133]:
reranked_docs = ranker.compress_documents([doc[0] for doc in similar_docs], question)

In [134]:
[{"relevance_score": doc.metadata["relevance_score"], "page_content": doc.page_content} for doc in reranked_docs]

[{'relevance_score': 0.9996016,
  'page_content': 'known Tyrion Lannister, briefly. He took my hand and named me friend. It was hard to believe the little \nman had it in him to murder his own sire, but the fact of Lord Tywin’s demise seemed to be beyond \ndoubt. “The lion in King’s Landing is a cub, and the Iron Throne has been known to cut grown men to \nribbons.” \n  \n“A boy he may be, my lord, but … King Robert was well loved, and most men still accept that \nTommen is his son. The more they see of Lord Stannis the less they love him, and fewer still are fond of \nLady Melisandre with her fires and this grim red god of hers. They complain.” \n  \n“They complained about Lord Commander Mormont too. Men love to complain about their \nwives and lords, he told me once. Those without wives complain twice as much about their lords.” Jon \nSnow glanced toward the stockade. Two walls were down, a third falling fast. “I will leave you to finish \nhere, Bowen. Make certain every corpse is bu

In [66]:
from langchain.retrievers import ContextualCompressionRetriever

In [67]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=ranker, base_retriever=retriever
)
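The retrieve-then-rerank pattern used above can be sketched in two stages: a cheap first pass gathers candidates, then a more careful scorer reorders them and keeps the top few. Both scorers here are toy assumptions (shared words, then shared bigrams); FlashRank uses a trained ranking model instead.

```python
def tokens(text):
    return text.lower().split()


def first_stage(query, docs, k):
    """Cheap recall step: rank by the number of words shared with the query."""
    q = set(tokens(query))
    return sorted(docs, key=lambda d: len(q & set(tokens(d))), reverse=True)[:k]


def bigrams(words):
    return set(zip(words, words[1:]))


def rerank(query, candidates, top_n):
    """Toy reranker: scores each (query, doc) pair by shared word pairs."""
    qb = bigrams(tokens(query))
    scored = [(len(qb & bigrams(tokens(d))), d) for d in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


docs = [
    "his uncle told tyrion that father killed the last dragon",
    "tyrion was almost sorry that he had killed his father",
    "the snow had buried the hut",
]
query = "tyrion killed his father"
candidates = first_stage(query, docs, k=2)
reranked = rerank(query, candidates, top_n=1)
print(reranked)
```

Both candidates share four words with the query, so the first stage cannot tell them apart; the reranker promotes the passage where the words appear in the same order as in the question.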

### Retrieval

In [68]:
from langchain_core.output_parsers import StrOutputParser

In [69]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [22]:
prompt = ChatPromptTemplate.from_messages(
  [
    (
      "system",
      "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. Use three sentences maximum and keep the answer concise.",
    ),
    ("user", "Context: {context}\n\nQuestion: {question}")
  ]
)

 - `retriever | format_docs` passes the question through the retriever, generating Document objects, and then to `format_docs` to generate strings;

 - `RunnablePassthrough()` passes the input question through unchanged. It is a Runnable that passes inputs through unchanged or with additional keys.



The same chain can also be composed explicitly from Runnable classes:

```python
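from langchain_core.runnables import RunnableLambda

# RunnableSequence takes its steps as positional arguments, not as a list.
context_question_map = RunnableMap({
    "context": RunnableSequence(retriever, RunnableLambda(format_docs)),
    "question": RunnablePassthrough()
})

rag_chain = RunnableSequence(
    context_question_map,
    prompt,
    model,
    StrOutputParser()
)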
```
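To build intuition for what the `|` operator and the dict of branches do, here is a library-free sketch. The `Step` class and `coerce` helper are hypothetical and much simpler than LangChain's Runnables; they only show that piping feeds one step's output into the next, and that a dict runs every branch on the same input.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` chaining."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        nxt = coerce(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other):
        # Called for `dict | Step`, since dict does not know how to pipe into a Step.
        return coerce(other) | self


def coerce(obj):
    """Turn dicts and plain callables into Steps."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        branches = {key: coerce(val) for key, val in obj.items()}
        return Step(lambda value: {key: b.invoke(value) for key, b in branches.items()})
    return Step(obj)


def join_docs(docs):
    return "\n\n".join(docs)


passthrough = Step(lambda value: value)
toy_retriever = Step(lambda question: ["chunk about " + question])
toy_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")

chain = {"context": toy_retriever | join_docs, "question": passthrough} | toy_prompt
result = chain.invoke("dragons")
print(result)
```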

In [96]:
rag_chain = (
    {"context": compression_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

In [135]:
for chunk in rag_chain.stream(question):
  print(chunk, end="", flush=True)

INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"


Tyrion killed his father, Lord Tywin Lannister, while he was sitting on the privy, shooting a crossbow bolt through his bowels.

### Using create stuff document chain

In [128]:
response = rag_chain.invoke(question)
print(response)

INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 0.908081 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"


Jon Snow and Tormund Giantsbane discuss a letter sent by Ramsay Bolton, demanding that Jon hand over his wildling princess, a wildling babe, and his Reek.


In [80]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

In [136]:
qa_system_prompt = (
  "You are an assistant for question-answering tasks. "
  "Use the following pieces of retrieved context to answer "
  "the question. If you don't know the answer, say that you "
  "don't know. Use five sentences maximum and try to keep the "
  "answer concise."
  "\n\n"
  "{context}"
)

qa_prompt = ChatPromptTemplate.from_messages(
  [
    ("system", qa_system_prompt),
    ("human", "{input}"),
  ]
)

 - `create_stuff_documents_chain` specifies how retrieved context is fed into a prompt and LLM. Here we "stuff" the contents into the prompt, i.e., *we include all retrieved context without any summarization or other processing.*

 It largely implements our `rag_chain` above, with input keys `context` and `input`: it generates an answer from the retrieved context and the query.

 - `create_retrieval_chain` adds the retrieval step and propagates the retrieved context through the chain, returning it alongside the final answer. Its input key is `input`, and its output includes `input`, `context`, and `answer`.
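
To make "stuffing" concrete, here is a plain-Python sketch of the idea. It is not LangChain's implementation: `Doc`, `stuff_prompt`, and `retrieval_chain` are hypothetical stand-ins, and the documents are toy strings.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def stuff_prompt(system_template, docs, user_input):
    """Join every retrieved document verbatim and substitute it for {context}."""
    context = "\n\n".join(doc.page_content for doc in docs)
    return system_template.format(context=context) + "\n\nHuman: " + user_input


def retrieval_chain(retrieve, answer, user_input):
    """Mimic the output shape described above: input, context, and answer."""
    docs = retrieve(user_input)
    return {"input": user_input, "context": docs, "answer": answer(docs, user_input)}


docs = [Doc("first chunk"), Doc("second chunk")]
template = "Answer using only this context:\n\n{context}"
prompt_text = stuff_prompt(template, docs, "toy question")

result = retrieval_chain(
    retrieve=lambda query: docs,                      # toy retriever
    answer=lambda d, query: f"(answer from {len(d)} documents)",  # toy LLM
    user_input="toy question",
)
print(prompt_text)
print(sorted(result))
```

Nothing is summarized or filtered: every chunk lands in the prompt, so the number and size of retrieved chunks directly determine the prompt length.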

In [137]:
question_answer_chain = create_stuff_documents_chain(model, qa_prompt)
qa_rag_chain = create_retrieval_chain(compression_retriever, question_answer_chain)

In [138]:
response = qa_rag_chain.invoke({"input": question})
print(response["answer"])

INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"


According to the text, Tyrion Lannister killed his father, Lord Tywin Lannister, by shooting him with a crossbow bolt while he was sitting on a privy. Tyrion claims he did it to see if Lord Tywin's feces would turn to gold, as was a long-standing rumor about the Lannister family, but ultimately found it to be a myth.


### Returning sources

In [88]:
for document in response["context"]:
  print(document.page_content)
  print()

Kingdoms will never be more ripe for conquest than they are right now. A boy king sits the Iron Throne. 
The north is in chaos, the riverlands a devastation, a rebel holds Storm’s End and Dragonstone. When 
winter comes, the realm will starve. And who remains to deal with all of this, who rules the little king 
who rules the Seven Kingdoms? Why, my own sweet sister. There is no one else. My brother, Jaime, 
thirsts for battle, not for power. He’s run from every chance he’s had to rule. My uncle Kevan would 
make a passably good regent if someone pressed the duty on him, but he will never reach for it. The 
gods shaped him to be a follower, not a leader.” Well, the gods and my lord father. “Mace Tyrell would 
grasp the sceptre gladly, but mine own kin are not like to step aside and give it to him. And everyone 
hates Stannis. Who does that leave? Why, only Cersei. 
  
“Westeros is torn and bleeding, and I do not doubt that even now my sweet sister is binding up

only man who can restore

## Chatting with RAG

In [104]:
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

#### Vanilla way

In [107]:
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("history"),
        ("human", "Context: {context}\n\nQuestion: {question}"),
    ]
)

In [108]:
rag_chain = (
    RunnableParallel({
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
        "history": RunnablePassthrough(),
    })
    | qa_prompt
    | model
    | StrOutputParser()
)

conversational_rag_chain_simple = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
)

In [109]:
store = {}
config_simple = {"configurable": {"session_id": "abc1"}}

In [110]:
response_1 = conversational_rag_chain_simple.invoke(
    {"question": "Who was Varamyr?"},
    config=config_simple,
)

AttributeError: 'dict' object has no attribute 'replace'
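
The error most likely arises because `RunnableWithMessageHistory` invokes the chain with a dict, and `RunnableParallel` hands that whole dict to every branch, so `retriever` receives a dict where it expects a string. The snippet below reproduces the mismatch in plain Python; `payload` is a hypothetical stand-in for the chain input, and the `.replace` call stands in for whatever string method the embedding step calls.

```python
from operator import itemgetter

# Stand-in for the dict that reaches every branch of the RunnableParallel step.
payload = {"question": "Who was Varamyr?", "history": []}

# A component that expects a string fails on the dict with the same error.
try:
    payload.replace("\n", " ")
except AttributeError as exc:
    error_message = str(exc)

# Selecting the key first hands the downstream step a plain string.
question_only = itemgetter("question")(payload)

print(error_message)
print(question_only.replace("\n", " "))
```

In the chain itself, the corresponding change would be to route each branch through a key selector, e.g. `"context": itemgetter("question") | retriever | format_docs`, with `itemgetter("question")` and `itemgetter("history")` for the other two keys. This is a suggested fix and has not been run against this notebook.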

#### Using `create_retrieval_chain` and `create_history_aware_retriever`

In [45]:
from langchain.chains import create_retrieval_chain, create_history_aware_retriever

In [65]:
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    model, retriever, contextualize_q_prompt
)

question_answer_chain = create_stuff_documents_chain(model, qa_prompt)
rag_qa_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)


In [103]:
store = {}

config = {"configurable": {"session_id": "abc2"}}

In [67]:
conversational_rag_chain = RunnableWithMessageHistory(
    rag_qa_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

In [69]:
response_1 = conversational_rag_chain.invoke(
    {"input": "Who was Varamyr?"},
    config=config,
)

In [72]:
response_1["answer"]

'Varamyr was a skinchanger, also known as Varamyr Six-skins, who could shape-shift into various animals, including bears and eagles. He was originally named Lump and was taught the ways of the skinchanger by Haggon.'

In [74]:
response_2 = conversational_rag_chain.invoke(
    {"input": "How many times did he die?"},
    config=config,
)

In [75]:
response_2["answer"]

'Varamyr died nine times before the events described in the passage.'

In [76]:
response_3 = conversational_rag_chain.invoke(
    {"input": "Who was Orell?"},
    config=config,
)

In [77]:
response_3["answer"]

'Orell was a skinchanger who had an eagle as his bonded animal. He was slain by Jon Snow, also known as the turncloak crow.'

In [78]:
response_4 = conversational_rag_chain.invoke(
    {"input": "How did he die?"},
    config=config,
)

In [79]:
response_4["answer"]

'Orell was slain by Jon Snow, the turncloak crow, while he was riding his eagle.'

## Chatting with RAG v0.3

 - [StateGraph](https://langchain-ai.github.io/langgraph/reference/graphs/#langgraph.graph.state.StateGraph)

A graph whose nodes communicate by reading and writing to a shared state.

 - [MemorySaver](https://langchain-ai.github.io/langgraph/reference/checkpoints/#langgraph.checkpoint.memory.MemorySaver)

An in-memory checkpoint saver.

This checkpoint saver stores checkpoints in memory using a `defaultdict`.

### QA

In [24]:
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict
from langchain_core.documents import Document

In [28]:
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


def retrieve(state: State):
    retrieved_docs = faiss_vectorstore.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = model.invoke(messages)
    return {"answer": response.content}

In [29]:
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

In [33]:
response = graph.invoke({"question": q_s[0]})
print(response["answer"])

According to Dornish law, Myrcella has a stronger claim on the Iron Throne because she is married to Prince Trystane Martell, making her a princess of Dorne.


### Chat

In [36]:
from langgraph.prebuilt import create_react_agent
from langchain_core.tools import tool
from langchain_core.messages import SystemMessage
from langgraph.prebuilt import ToolNode
from langgraph.graph import MessagesState, StateGraph

In [62]:
@tool(response_format="content_and_artifact")
def retrieve(query: str):
    """Retrieve information related to a query."""
    retrieved_docs = faiss_vectorstore.similarity_search(query, k=6)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\n" f"Content: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs

#### Using graph builder

In [69]:
from langgraph.graph import END
from langgraph.prebuilt import ToolNode, tools_condition

In [71]:
graph_builder = StateGraph(MessagesState)

In [72]:
# Step 1: Generate an AIMessage that may include a tool-call to be sent.
def query_or_respond(state: MessagesState):
    """Generate tool call for retrieval or respond."""
    llm_with_tools = model.bind_tools([retrieve])
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}


# Step 2: Execute the retrieval.
# A node for the retriever tool that executes the retrieval step;
tools = ToolNode([retrieve])


# Step 3: Generate a response using the retrieved content.
# A node that generates the final response using the retrieved context.
def generate(state: MessagesState):
    """Generate answer."""

    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]  # retrieval tools

    # Format into prompt
    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    system_message_content = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the "
        "answer concise. Pay attention to names of characters."
        "\n\n"
        f"{docs_content}"
    )
    conversation_messages = [
        message
        for message in state["messages"]
        if message.type in ("human", "system")
        or (message.type == "ai" and not message.tool_calls)
    ]
    prompt = [SystemMessage(system_message_content)] + conversation_messages

    response = model.invoke(prompt)
    return {"messages": [response]}

In [73]:
graph_builder.add_node(query_or_respond)
graph_builder.add_node(tools)
graph_builder.add_node(generate)

graph_builder.set_entry_point("query_or_respond")
graph_builder.add_conditional_edges(
    "query_or_respond",
    tools_condition,
    {END: END, "tools": "tools"},
)
graph_builder.add_edge("tools", "generate")
graph_builder.add_edge("generate", END)

graph = graph_builder.compile()

In [75]:
input_message = q_s[1]

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()


Who is the current King, Lord of the seven kingdoms, protector of the realm?
Tool Calls:
  retrieve (call_j0vw)
 Call ID: call_j0vw
  Args:
    query: Who is the current King, Lord of the seven kingdoms, protector of the realm?
Name: retrieve

Source: {'source': './A_Dance_With_Dragons.pdf', 'file_path': './A_Dance_With_Dragons.pdf', 'page': 115, 'total_pages': 894, 'format': 'PDF 1.5', 'title': 'A Dance With Dragons - A Song of Ice and Fire', 'author': 'George R R Martin', 'subject': 'nothuman', 'keywords': 'A Game of Thrones; A Song of Ice and Fire; pdf, nothuman', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': 'D:20110711230404Z', 'modDate': "D:20190506190144+02'00'", 'trapped': '', 'start_index': 3383}
Content: only man who can restore the realm and defend it against the peril that gathers in the north. Because he 
has a magic sword that glows with the light of the sun. The words caught in his throat. None of them

Source: {'so

In [76]:
input_message = q_s[2]

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()


Who is his wife?
Tool Calls:
  retrieve (call_d8e5)
 Call ID: call_d8e5
  Args:
    query: no wife
Name: retrieve

Source: {'source': './A_Dance_With_Dragons.pdf', 'file_path': './A_Dance_With_Dragons.pdf', 'page': 286, 'total_pages': 894, 'format': 'PDF 1.5', 'title': 'A Dance With Dragons - A Song of Ice and Fire', 'author': 'George R R Martin', 'subject': 'nothuman', 'keywords': 'A Game of Thrones; A Song of Ice and Fire; pdf, nothuman', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': 'D:20110711230404Z', 'modDate': "D:20190506190144+02'00'", 'trapped': '', 'start_index': 844}
Content: My lord husband will cut your balls off and put you in a dress.” 
  
Qarl rolled off her. “If he can get out of his chair.” 
  
The room was cold. Asha rose from Galbart Glover’s bed and took off her torn clothes. The 
jerkin would need fresh laces, but her tunic was ruined. I never liked it anyway. She tossed it on the 
flames. The rest she left i

In [77]:
input_message = q_s[3]

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()


Who drank his way across the narrow sea?

Ygritte.


#### Create react agent

In [47]:
from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)

# Specify an ID for the thread
config = {"configurable": {"thread_id": "abc123"}}

In [48]:
agent_executor = create_react_agent(model, [retrieve], checkpointer=memory)

In [60]:
config = {"configurable": {"thread_id": "def234"}}

input_message = q_s[6]

for event in agent_executor.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    event["messages"][-1].pretty_print()


How many times had Varamyr died?

According to the book excerpts provided, Varamyr Sixskins had died twice.


### Free memory

In [68]:
import gc

In [70]:
del chroma_vectorstore
gc.collect()

1796

# Question answers

### Batch Norm vs Layer Norm

In [None]:
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
  """
  Performs batch normalization on a given input tensor.

  Args:
    x: Input tensor of shape (n, d).
    gamma: Scale parameter of shape (d,).
    beta: Shift parameter of shape (d,).
    eps: Small constant for numerical stability.

  Returns:
    Normalized tensor of the same shape as x.
  """
  mean = np.mean(x, axis=0)
  variance = np.var(x, axis=0)
  x_hat = (x - mean) / np.sqrt(variance + eps)
  return gamma * x_hat + beta


def layer_norm(x, gamma, beta, eps=1e-5):
  """
  Performs layer normalization on a given input tensor.

  Args:
    x: Input tensor of shape (n, d).
    gamma: Scale parameter of shape (d,).
    beta: Shift parameter of shape (d,).
    eps: Small constant for numerical stability.

  Returns:
    Normalized tensor of the same shape as x.
  """
  mean = np.mean(x, axis=1, keepdims=True)
  variance = np.var(x, axis=1, keepdims=True)
  x_hat = (x - mean) / np.sqrt(variance + eps)
  return gamma * x_hat + beta

# Example usage:
n = 10
d = 5
x = np.random.randn(n, d)
gamma = np.ones(d)
beta = np.zeros(d)

x_batch_norm = batch_norm(x, gamma, beta)
x_layer_norm = layer_norm(x, gamma, beta)

print("Batch Normalized x:")
print(x_batch_norm)
print("\nLayer Normalized x:")
print(x_layer_norm)


In [None]:
# prompt: Why do transformers use layer norm, instead of batch norm

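# qa_rag_chain (built with create_retrieval_chain) is the chain that takes "input" and returns "answer"
response = qa_rag_chain.invoke({"input": "Why do transformers use layer norm, instead of batch norm"})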
print(response["answer"])


### How to choose an embedding model and how to assess its performance.



 > INSTRUCTOR is the most flexible option thanks to its instruction tuning: it can be adapted to specific tasks through instructions.
 >
 > If you need highly adaptable embeddings for diverse or specific tasks, choose INSTRUCTOR.
 >
 > It is generally more powerful but potentially slower than the other options.
 >
 > Can handle a wider range of tasks beyond just creating embeddings.





### What is an instruction-tuned model?

## Chain of thought prompting

Chain-of-thought (CoT) prompting enhances the reasoning capabilities of large language models (LLMs) by incorporating logical steps, a "chain of thought", within the prompt.

CoT guides the model to work through intermediate reasoning steps before it states the final answer.

<img src="chain_of_thought_example.png" style="width:800px">

> few-shot prompting (left) and CoT prompting (right)
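
The difference between the two prompt styles can be sketched with plain strings. The arithmetic problems below are illustrative examples written for this note, not necessarily the ones in the figure; the only change between the two prompts is that the CoT exemplar spells out its reasoning.

```python
# One worked exemplar followed by the new question the model should answer.
exemplar_question = (
    "Q: Roger has 5 tennis balls. He buys 2 cans with 3 tennis balls each. "
    "How many tennis balls does he have now?"
)
new_question = (
    "Q: A shelf holds 4 books. Two boxes with 5 books each are added. "
    "How many books are on the shelf?"
)

# Few-shot prompting: the exemplar shows only the final answer.
few_shot_prompt = f"{exemplar_question}\nA: The answer is 11.\n\n{new_question}\nA:"

# CoT prompting: the exemplar shows the intermediate steps before the answer.
cot_prompt = (
    f"{exemplar_question}\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    f"{new_question}\nA:"
)

print(cot_prompt)
```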

## Metrics

 - [Answer relevance](https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/answer_relevance/)

 This metric assesses how pertinent the generated answer is to the given prompt. Answers that are incomplete or contain redundant information receive lower scores; higher scores indicate better relevancy. It is computed from the `user_input`, the `retrieved_contexts`, and the `response`.

 Answer relevancy is defined as the mean cosine similarity between the original `user_input` and a number of artificial questions that were generated (reverse-engineered) from the `response`.
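
 The definition above can be sketched in a few lines. This is a minimal illustration rather than the Ragas implementation: it assumes the embeddings of the `user_input` and of the generated questions are already available, and the two-dimensional vectors are toy values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def answer_relevancy(question_embedding, generated_question_embeddings):
    """Mean cosine similarity between the user question and the generated questions."""
    similarities = [
        cosine_similarity(question_embedding, generated)
        for generated in generated_question_embeddings
    ]
    return sum(similarities) / len(similarities)


# Toy embeddings: one generated question matches exactly, one is orthogonal.
user_input_vec = [1.0, 0.0]
generated_vecs = [[1.0, 0.0], [0.0, 1.0]]
score = answer_relevancy(user_input_vec, generated_vecs)
print(score)  # 0.5
```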

 - [Faithfulness](https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/factual_correctness/)

 The faithfulness metric measures the factual consistency of the generated answer against the given context. It is calculated from the answer and the retrieved context. The score is scaled to the (0, 1) range; higher is better.

 The generated answer is regarded as faithful if all the claims made in the answer can be inferred from the given context. To calculate this, a set of claims from the generated answer is first identified. Then each of these claims is cross-checked with the given context to determine if it can be inferred from the context.
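
 Once each claim has a verdict, the score reduces to a ratio of supported claims to all claims. The sketch below assumes the verdicts are already known; in practice an LLM extracts the claims and checks each one against the context, and that step is not shown here.

```python
def faithfulness(claim_supported):
    """Fraction of claims in the answer that can be inferred from the context.

    claim_supported: one boolean per claim extracted from the generated answer.
    """
    if not claim_supported:
        raise ValueError("the answer contains no claims to verify")
    return sum(claim_supported) / len(claim_supported)


# Toy verdicts: three of four claims are supported by the retrieved context.
verdicts = [True, True, False, True]
score = faithfulness(verdicts)
print(score)  # 0.75
```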

## HNSW

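HNSW (Hierarchical Navigable Small World) is a graph-based index for approximate nearest neighbor search. The search is greedy, and each vertex is a vector in n-dimensional space.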

A small-world network is a graph characterized by a high clustering coefficient and low distances.

> On an example of social network, high clustering implies the high probability that two friends of one person are friends themselves. The low distances, on the other hand, mean that there is a short chain of social connections between any two people.

> Specifically, a small-world network is defined to be a network where the typical distance L between two randomly chosen nodes (the number of steps required) grows proportionally to the logarithm of the number of nodes N in the network, that is:
>
> $ \hspace{15mm} L \propto \log N $
>
> while the global clustering coefficient is not small.

### Search in a navigable small world

 - short range links and long range links - make the graph very navigable

 - friends list of a vertex

 - one or a few, predetermined entry points

 When searching an NSW graph, we begin at a predetermined entry point and perform greedy routing: out of all the friends of the current vertex, we navigate to the one closest to our query.

 If no vertex in the friend list is nearer to the query than the current vertex, we have reached a local minimum. This is the stopping condition.

 To reduce the probability of stopping early at a local minimum, we can increase the average degree of the vertices. This also increases the complexity of the network and slows down the search, so we have to find a tradeoff between the two. Another way to avoid hitting a local minimum too early is to start the search at high-degree vertices, which have many connections.



In short: start from the entry point of a navigable graph, find the neighbor of the current point that is closest to the query, and greedily explore in the direction of the query.
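
The greedy routing described above can be sketched as follows. This is a simplified illustration that tracks a single current vertex; the graph, the vectors, and the function name `greedy_search` are toy assumptions made for this note.

```python
import math


def greedy_search(graph, vectors, entry_point, query):
    """Hop to the friend closest to the query until no friend is closer."""
    current = entry_point
    current_dist = math.dist(vectors[current], query)
    while True:
        best, best_dist = current, current_dist
        for friend in graph[current]:
            d = math.dist(vectors[friend], query)
            if d < best_dist:
                best, best_dist = friend, d
        if best == current:
            # Local minimum: no friend is nearer to the query. Stop here.
            return current, current_dist
        current, current_dist = best, best_dist


# Toy graph: four points on a line, each linked to its immediate neighbors.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

nearest, distance = greedy_search(graph, vectors, entry_point=0, query=(2.9, 0.0))
print(nearest)  # 3
```

With only short-range links the search needs one hop per vertex; long-range links are what let it cross the graph in few steps.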



### Layers in NSW

Take an NSW graph and spread it across multiple layers.

High degree vertices will tend to be spread across more layers.

When building the graph, the number of friends a vertex gets depends on the layer it is inserted at: the higher the insertion layer, the more friends it will have.

If we start at the highest layer, we start on a high-degree vertex. That makes us less likely to get stuck in a local minimum and stop early.

On each layer, we traverse edges as in a plain NSW: we greedily move to the friend of the current vertex that has the least distance to the query vector, and repeat.

Once we hit a local minimum, we do not stop. We move down to the next layer and continue from the same vertex, until we hit a local minimum at layer 0 (the bottom-most layer).
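
A minimal sketch of this layer-by-layer descent, under simplifying assumptions: each layer is a plain adjacency dict, the search tracks a single current vertex on every layer, and the names `search_layer` and `hnsw_search` are hypothetical. Real HNSW implementations keep a list of candidates on the bottom layer rather than a single vertex.

```python
import math


def search_layer(layer_graph, vectors, start, query):
    """Greedy search within one layer; returns the local minimum reached."""
    current = start
    while True:
        closest = min(
            layer_graph.get(current, []),
            key=lambda v: math.dist(vectors[v], query),
            default=current,
        )
        if math.dist(vectors[closest], query) >= math.dist(vectors[current], query):
            return current
        current = closest


def hnsw_search(layers, vectors, entry_point, query):
    """layers[0] is the bottom layer; the search starts at the top layer."""
    current = entry_point
    for layer_graph in reversed(layers):
        # The local minimum of one layer is the starting vertex of the next.
        current = search_layer(layer_graph, vectors, current, query)
    return current


# Toy index: five points on a line; higher layers keep fewer, longer links.
vectors = {i: (float(i),) for i in range(5)}
layers = [
    {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]},  # layer 0: all vertices
    {0: [2], 2: [0, 4], 4: [2]},                        # layer 1
    {0: [4], 4: [0]},                                   # layer 2: top
]

result = hnsw_search(layers, vectors, entry_point=0, query=(3.2,))
print(result)  # 3
```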




### Building HNSW graph

 - Insertion

 - Linking

Insertion is guided by a probability function that assigns the vast majority of vertices (vectors) to layer 0.
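
One such function is the exponentially decaying rule from the HNSW paper, where the top layer of a new vertex is `floor(-ln(U) * m_L)` for a uniform random `U`. The sketch below assumes `m_L = 1 / ln(M)` with `M = 16`, a common choice; the function name and the sample size are illustrative.

```python
import math
import random
from collections import Counter


def random_level(m_l, rng):
    """Draw the top layer of a new vertex: floor(-ln(U) * m_l), U uniform in (0, 1]."""
    u = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so that log(0) cannot occur
    return int(-math.log(u) * m_l)


rng = random.Random(0)   # fixed seed so the run is repeatable
m_l = 1 / math.log(16)   # assumes M = 16 links per vertex
levels = Counter(random_level(m_l, rng) for _ in range(10_000))

for level in sorted(levels):
    print(level, levels[level])
```

With these settings roughly 15 out of 16 vertices stay on layer 0 only, and each higher layer holds about 16 times fewer vertices than the one below it.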

## Stable Diffusion

Stable Diffusion is a generative model used for producing high-quality images from text descriptions. It involves three key components: CLIP, UNet, and VAE. Here's how each component works and their roles in the process:

### 1. **CLIP (Contrastive Language-Image Pretraining)**
   - **Role**: CLIP acts as the bridge between the text prompt and the image generation process.
   - **Function**: CLIP is a model that has been trained to understand the relationship between text and images. It learns to associate text descriptions with images by embedding both into a shared latent space. In Stable Diffusion, CLIP's text encoder turns the prompt into an embedding that conditions the diffusion process, so that the resulting image aligns with the text prompt.

### 2. **UNet (U-Net)**
   - **Role**: UNet is the core architecture used in the denoising process of the diffusion model.
   - **Function**: Stable Diffusion works by iteratively denoising a noisy image. The UNet architecture is designed to take a noisy image as input and predict a less noisy version of that image. This is done over many steps, gradually reducing the noise until a coherent image emerges. The UNet in Stable Diffusion has skip connections, which allow it to capture both global and local features of the image, making it effective in producing high-quality, detailed outputs.

### 3. **VAE (Variational Autoencoder)**
   - **Role**: VAE is used for compressing and decompressing images during the generation process.
   - **Function**: The VAE is used to encode images into a lower-dimensional latent space, which is more manageable for the diffusion process. This latent space captures the essential features of the image, allowing for efficient processing. Once the denoising process is complete, the VAE decoder reconstructs the final high-resolution image from this latent space. The VAE helps to ensure that the generated images are realistic and adhere to the underlying distribution of the training data.

### **How It Works Together:**
1. **Text to Latent Space**: The process begins with a text prompt, which is processed by the CLIP model to generate a text embedding.
2. **Latent Diffusion Process**: Random noise is sampled in the latent space. The UNet model then iteratively denoises this latent, conditioned on the text embedding.
3. **Decoding**: After sufficient denoising, the latent image is passed through the VAE decoder, which converts it back into a high-resolution image in pixel space.

In summary, Stable Diffusion leverages CLIP to understand and align text with image generation, UNet to iteratively refine the image, and VAE to manage the latent space for efficient image generation. The combination of these components allows Stable Diffusion to generate high-quality images that closely match the given text descriptions.

> ### It learns to associate text descriptions with images by embedding both into a shared latent space


### **1. Embedding in CLIP:**

CLIP, developed by OpenAI, is a model that can understand both text and images. It achieves this understanding by converting (or embedding) both text and images into vectors in a shared latent space—a high-dimensional mathematical space where similar items are close together, and dissimilar items are far apart.

#### **a. Text Embedding:**
- **Process**: CLIP takes a text input (like a sentence) and processes it through a text encoder, often based on a Transformer architecture. This encoder transforms the text into a fixed-size vector, capturing the semantic meaning of the text.
- **Output**: The result is a vector representation of the text, where each dimension in the vector captures some aspect of the text's meaning. This vector is located in the latent space where the model has learned to place textual concepts.

#### **b. Image Embedding:**
- **Process**: Similarly, CLIP takes an image and processes it through an image encoder, either a convolutional neural network (a ResNet) or a Vision Transformer. This encoder extracts features from the image (like colors, shapes, textures) and converts them into a vector.
- **Output**: This vector is a representation of the image, capturing its essential features. Just like the text vector, this image vector exists in the same latent space.

### **2. Shared Latent Space:**
The "shared latent space" is a conceptual space where both text and image vectors are placed after being processed by their respective encoders. The goal during training is to make sure that a text and an image that match (e.g., "a cat" and a picture of a cat) end up being close together in this space, while unrelated text and images (e.g., "a cat" and a picture of a car) are far apart.

### **3. Learning to Associate:**
CLIP is trained on a massive dataset of text-image pairs (like captions paired with images). During training:
- **Alignment**: The model learns to adjust the embeddings so that the vectors for matching text and images are closer together, and non-matching pairs are further apart.
- **Loss Function**: CLIP uses a contrastive loss function during training, which explicitly encourages this proximity for matching pairs and separation for non-matching pairs.
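
The contrastive objective can be sketched in plain Python as a symmetric cross-entropy over the text-image similarity matrix, where the matching pair sits on the diagonal. This is an illustrative sketch with toy two-dimensional embeddings, not CLIP's training code; the temperature is fixed at 0.07 here, whereas CLIP learns it during training.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits, target_index):
    """Cross-entropy of one row of logits against the index of the true match."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[target_index]


def clip_style_loss(text_embs, image_embs, temperature=0.07):
    """Average of the text-to-image and image-to-text cross-entropies."""
    texts = [normalize(t) for t in text_embs]
    images = [normalize(i) for i in image_embs]
    n = len(texts)
    # logits[r][c] = scaled cosine similarity between text r and image c
    logits = [[sum(a * b for a, b in zip(t, i)) / temperature for i in images] for t in texts]
    text_to_image = sum(cross_entropy(logits[k], k) for k in range(n)) / n
    image_to_text = sum(cross_entropy([logits[r][k] for r in range(n)], k) for k in range(n)) / n
    return (text_to_image + image_to_text) / 2


texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[1.0, 0.0], [0.0, 1.0]]   # each image matches its caption
swapped_images = [[0.0, 1.0], [1.0, 0.0]]   # each image matches the other caption

loss_aligned = clip_style_loss(texts, aligned_images)
loss_swapped = clip_style_loss(texts, swapped_images)
print(loss_aligned < loss_swapped)  # True
```

The loss is near zero when every caption is closest to its own image and large when the pairs are swapped, which is what pulls matching pairs together and pushes non-matching pairs apart.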

### **4. Using CLIP in Stable Diffusion:**
In the context of Stable Diffusion:
- When you input a text prompt, CLIP generates a vector representing that text in the latent space.
- This embedding conditions the diffusion model (which generates the image): the UNet attends to it through cross-attention at each denoising step, which steers the generated image toward the given description.

### **Why This Matters:**
By embedding both text and images into the same latent space, CLIP enables models like Stable Diffusion to understand the relationship between text and image. This shared understanding is crucial for generating images that accurately reflect the text prompts provided by users.

In summary, the shared latent space in CLIP allows for a powerful association between text and images, enabling models to generate images that are semantically aligned with textual descriptions.