# RAG Applications with Vector Databases
Retrieval-Augmented Generation (RAG) uses vector databases to extend a language model with external knowledge that is retrieved at query time. This approach is particularly useful for applications that require up-to-date information or domain-specific knowledge.

Here are some key aspects of state-of-the-art RAG methods:

1. **Chunking**: This involves breaking down large documents into smaller, manageable pieces or "chunks" that can be individually indexed and retrieved. This improves the efficiency and accuracy of the retrieval process[1](https://www.promptingguide.ai/research/rag).

2. **Embedding**: Embeddings are numerical representations of text that capture semantic meaning. By embedding both the query and the chunks, the system can effectively compare and retrieve the most relevant information[1](https://www.promptingguide.ai/research/rag).

3. **Metadata Usage**: Incorporating metadata (such as document type, date, and author) can enhance the retrieval process by providing additional context that helps in filtering and ranking the results[1](https://www.promptingguide.ai/research/rag).

4. **Multimodal RAG**: This involves using RAG techniques across different types of data, such as text and images. For example, embedding and storing images in a vector database allows for querying images with text, enabling more versatile and comprehensive search capabilities[2](https://github.com/NirDiamant/RAG_TECHNIQUES).


[1](https://www.promptingguide.ai/research/rag): [Prompt Engineering Guide](https://www.promptingguide.ai/research/rag)

[2](https://github.com/NirDiamant/RAG_TECHNIQUES): [GitHub - NirDiamant/RAG_Techniques](https://github.com/NirDiamant/RAG_TECHNIQUES)

## Chapter 1: Optimizing RAG

### 1.1 What is a Vector Database?

- A vector database is a specialized tool designed to handle unstructured data represented as vectors.
- The term "vector database" is somewhat misleading; these are not traditional databases but compute engines optimized for vector data.
- In Generative AI, vector data is often referred to as vector embeddings. For RAG, vectors and embeddings are used interchangeably.

### 1.2 Understanding Vector Embeddings

- Vector embeddings are long series of numbers, typically consisting of hundreds or thousands of values.
- They are generated by deep neural networks, typically taken from the second-to-last layer, which encapsulates what the network has learned without the final predictive layer.
- Embedding models generate these embeddings and vary widely in architecture and data types they can process.

**Key Points:**
  - **Data and Model Type Matching**: Match the embedding model to the data type (e.g., image models for images, text models for text).
  - **Vector Size Consistency**: Only compare vectors of the same size, using the same embedding model for both vectorization and retrieval.
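
To make the size-consistency point concrete, here is a minimal sketch of a cosine-similarity comparison that refuses vectors of different dimensionality. The function name and the three-value vectors are illustrative assumptions; real embeddings have hundreds or thousands of values.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors of the same dimensionality."""
    if len(a) != len(b):
        # Vectors from different embedding models usually differ in size
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings
query_vec = [1.0, 0.0, 1.0]
chunk_vec = [1.0, 0.0, 1.0]
other_vec = [0.0, 1.0, 0.0]

print(cosine_similarity(query_vec, chunk_vec))  # close to 1: same direction
print(cosine_similarity(query_vec, other_vec))  # 0: no shared direction
```

Matching sizes are necessary but not sufficient: two different models can produce vectors of the same length that still live in unrelated spaces, which is why the same model should be used for indexing and querying.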

### 1.3 The Role of Large Language Models (LLMs)

- LLMs, such as GPT-4, serve as the interface for interacting with your data in RAG.
- Based on the transformer model, LLMs predict the most likely next token given a sequence of tokens.
- Publicly available LLMs are trained on extensive datasets of publicly available information, so they have no access to your specific data.

### 1.4 How RAG Works

1. **Vectorization**: Data is vectorized using embedding models.
2. **Storage**: Vectorized data is stored in a vector database.
3. **Interaction**: An LLM interfaces with the vector database to retrieve relevant information.

**Process:**
- A question is vectorized by an embedding model.
- The vectorized query searches the vector database for similar embeddings.
- Relevant results are retrieved and provided as context to the LLM.
- The LLM generates a coherent and contextually appropriate response.
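
The process above can be sketched end to end in plain Python. Everything here is a stand-in chosen for illustration: `toy_embed` is a word-count function rather than a real embedding model, the list `store` plays the role of the vector database, the sample sentences are made up, and the final LLM call is omitted.

```python
import math
from collections import Counter

def toy_embed(text, vocabulary):
    """Stand-in for an embedding model: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical chunks; a real application would load these from documents
chunks = [
    "returns are accepted within 30 days",
    "shipping takes five business days",
    "gift cards never expire",
]
vocabulary = sorted({word for chunk in chunks for word in chunk.split()})

# Vectorization and storage: keep each vector next to its text
store = [(toy_embed(chunk, vocabulary), chunk) for chunk in chunks]

def retrieve(question, k=1):
    """Vectorize the question and return the k most similar chunks."""
    query_vec = toy_embed(question, vocabulary)
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

# Interaction: the retrieved chunk becomes context in the prompt sent to the LLM
question = "how long does shipping take"
context = retrieve(question)
prompt = f"Question: {question}\nContext: {context[0]}"
print(prompt)
```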

### 1.5 Preprocessing for RAG

- Preprocessing involves chunking, embeddings, and metadata.

**Components:**
  - **Chunking**: Breaking down large text blocks into smaller, manageable chunks.
  - **Embeddings**: Vectors generated by embedding models, representing the semantic meaning of input data.
  - **Metadata**: Additional data stored alongside embeddings in a vector database, critical for optimizing RAG applications.

### 1.6 Chunking Considerations

- Chunking splits documents into smaller, consumable chunks.

**Requirements:**
  - **Consumable**: Fit within the context window of the embedding model and the LLM.
  - **Coherent**: Make sense as standalone pieces of text.
  - **Contextual**: Contain all necessary context to answer a question.

**Considerations:**
  - Chunk size, overlap, and the use of special characters to mark chunk boundaries.
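
As a rough illustration of how chunk size and overlap interact, the sketch below cuts a string with a fixed-size sliding window. This is a simplified assumption, not how any particular library splits text: the helper name `chunk_text` is hypothetical and it ignores separators entirely.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    # Each new chunk starts (chunk_size - chunk_overlap) characters after the last
    stride = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_text(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

With a chunk size of 10 and an overlap of 4, consecutive chunks start 6 characters apart, so the last 4 characters of one chunk reappear at the start of the next.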

### 1.7 Types of Embeddings

- **Dense Embeddings**: Vectors with few zero values, typically created by machine learning models.
- **Sparse Embeddings**: Vectors with many zero values, usually produced by statistical term-weighting algorithms such as TF-IDF or BM25. Dense embeddings are generally preferred for RAG applications.
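
A small sketch can show why sparse vectors are usually stored as term/value pairs rather than full lists. The raw term counts used here are a deliberately simple stand-in for a real sparse scheme, and the five-word vocabulary is an assumption made for illustration.

```python
from collections import Counter

def sparse_term_counts(text):
    """Sparse representation: store only the terms that actually occur."""
    return dict(Counter(text.lower().split()))

def to_dense(sparse, vocabulary):
    """Expand a sparse vector to one slot per vocabulary term (mostly zeros)."""
    return [sparse.get(term, 0) for term in vocabulary]

def sparse_dot(a, b):
    """Dot product that only visits terms present in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dictionary
    return sum(value * b[term] for term, value in a.items() if term in b)

# Hypothetical vocabulary; real vocabularies have tens of thousands of terms
vocabulary = ["card", "comic", "rare", "signed", "vintage"]
doc = sparse_term_counts("rare signed comic")
query = sparse_term_counts("rare comic")

print(to_dense(doc, vocabulary))  # [0, 1, 1, 1, 0]
print(sparse_dot(doc, query))     # 2
```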

### 1.8 Metadata in RAG

- Metadata includes processing metadata (e.g., section titles, paragraph numbers) and data-related metadata (e.g., publication dates, authors).
- Helps filter searches and ensures the LLM can interpret the retrieved data correctly.

### 1.9 Introduction to Embeddings

- Before vector embeddings, comparing unstructured data was challenging.
- Embedding models, usually deep neural networks, convert various data types (text, images, videos, audio) into vectors or vector embeddings.
- Vectors enable quantitative comparison of unstructured data.

**Critical Considerations:**
  - **Embedding Model**: Choose the correct model for your data type (e.g., ResNet50 for images, Sentence Transformers for text, Whisper for audio).
  - **What to Embed**: For RAG, focus on embedding text, since the retrieved context is ultimately passed to the LLM as text.
  - **How to Compare Embeddings**: Ensure embeddings are of the same size for comparison.

**Picking the Right Model:**
  - **Embedding Size**: The dimensionality of the vector, affecting computational power needed for comparison.
  - **Model Size**: Larger models provide finer results but are more computationally expensive.
  - **Training Data**: The dataset used to train the model influences its effectiveness (e.g., language, structure, data size).

### 1.10 Embedding Examples

- **Basic Embeddings**: Directly embedding chunks of text.
- **Small to Big**: Embedding a sentence but storing the entire paragraph for increased context.
- **Big to Small**: Embedding a paragraph but storing individual sentences for post-processing.
- **Non-English Embeddings**: Using models trained on non-English data for embedding.
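
The "small to big" pattern can be sketched as an index in which each entry keeps two texts: the sentence that would be embedded and the full paragraph that would be returned. The field names (`embed_text`, `stored_text`, `paragraph_id`) and the regex-based sentence splitter are assumptions made for this illustration, and the embedding step itself is left out.

```python
import re

def build_small_to_big_index(paragraphs):
    """Create one index entry per sentence, each pointing back at its paragraph."""
    index = []
    for paragraph_id, paragraph in enumerate(paragraphs):
        # Naive sentence split: break after ., ! or ? followed by whitespace
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
        for sentence in sentences:
            index.append({
                "embed_text": sentence,    # what the embedding model would see
                "stored_text": paragraph,  # what is handed to the LLM as context
                "paragraph_id": paragraph_id,
            })
    return index

paragraphs = [
    "Orders ship in five days. Tracking is sent by email.",
    "Returns are free.",
]
for entry in build_small_to_big_index(paragraphs):
    print(entry["embed_text"], "->", entry["paragraph_id"])
```

"Big to small" reverses the roles: the paragraph is embedded and the individual sentences are stored for post-processing.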

### 1.11 Metadata

- Metadata is essential for making vector databases useful, providing context and filtering capabilities.
- **Chunking Metadata**: Information from the chunking process (e.g., sentence number, section header).
- **Non-Chunking Metadata**: Additional information not tied to chunking (e.g., author, last update).

**Storing Metadata:**
  - Link to traditional databases or store directly in the vector store for easier and faster access.
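
To illustrate how metadata stored next to the embeddings enables filtering, the sketch below narrows the candidates by metadata before ranking them by similarity. The record layout, the two-value vectors, and the `filtered_search` helper are hypothetical; real vector stores expose their own filter syntax.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def filtered_search(query_vec, records, k=4, **required):
    """Keep records whose metadata matches every required key, then rank by dot product."""
    candidates = [
        record for record in records
        if all(record["metadata"].get(key) == value for key, value in required.items())
    ]
    ranked = sorted(candidates, key=lambda record: dot(query_vec, record["vector"]), reverse=True)
    return ranked[:k]

# Hypothetical records: vector, text, and metadata stored side by side
records = [
    {"text": "returns chunk", "vector": [1.0, 0.0],
     "metadata": {"doc_title": "returns.txt", "chunk_number": 0}},
    {"text": "shipping chunk A", "vector": [0.9, 0.1],
     "metadata": {"doc_title": "shipping.txt", "chunk_number": 0}},
    {"text": "shipping chunk B", "vector": [0.2, 0.8],
     "metadata": {"doc_title": "shipping.txt", "chunk_number": 1}},
]

best_overall = filtered_search([1.0, 0.0], records, k=1)
best_shipping = filtered_search([1.0, 0.0], records, k=1, doc_title="shipping.txt")
print(best_overall[0]["text"])
print(best_shipping[0]["text"])
```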

### 1.12 Chunking

Chunking is the process of breaking large texts up into small, workable pieces.

In the first block here, three things are imported: the document object, the character text splitter object, and the OS library.

1. **Imports**:
   - **Document Object**: LangChain's native way to store objects. This is used to add metadata to the text and prepare it for the vector store.
   - **Character Text Splitter**: A LangChain object that can split strings based on some preset parameters. In this case, it is used for determining chunk size and chunk overlap.
   - **OS Library**: Used for navigating the directory structure of the operating system.

2. **Functionality**:
   - **Set Folder**: First, ensure that the right folder is being used. In this case, the Big Star Collectibles folder is used. To access the list of text files within this folder, the OS library is used to get a list of the directory.
   - **Create List**: Next, create an empty list object to hold all of the chunked up texts that will be created.
   - **Loop Through Files**: Then loop through all of the files and chunk each one. Inside the loop:
     - Start by opening up the file and reading the entire page in as a single string.
     - Next, create a `CharacterTextSplitter` object. This specific instance is set up to split strings into 128 character chunks with 32 character overlaps.
     - Then use the object's `split_text` function and pass the string containing the entire file through to get the chunks.
     - Finally, loop through the chunks and create a `Document` object from each one. Assign the chunk text to the `page_content` parameter so that the text itself is kept alongside its embedding in the vector store.

In [None]:
# Import necessary libraries
from langchain_core.documents import Document
from langchain.text_splitter import CharacterTextSplitter
import os

# Set the folder containing the text files
folder_path = 'Big Star Collectibles'

# Get a list of text files in the folder
file_list = os.listdir(folder_path)

# Initialize an empty list to hold chunked texts
chunked_texts = []

# Loop through each file in the folder
for file_name in file_list:
    file_path = os.path.join(folder_path, file_name)
    
    # Open and read the entire file as a single string
    with open(file_path, 'r') as file:
        text = file.read()
    
    # Create a CharacterTextSplitter object
    text_splitter = CharacterTextSplitter(chunk_size=128, chunk_overlap=32)
    
    # Split the text into chunks
    chunks = text_splitter.split_text(text)
    
    # Loop through each chunk and create a Document object
    for chunk in chunks:
        doc = Document(page_content=chunk)
        chunked_texts.append(doc)

# Print the first chunk to verify
print(chunked_texts[0].page_content)

### 1.13 Metadata

This section is about storing metadata with chunk data. In LangChain, metadata is stored within the document object. Previously, chunks were stored in the `page_content` parameter. To store metadata, simply add a `metadata` parameter. Metadata is stored as a dictionary, and you can define the metadata to store. Common metadata includes the title of the document and the chunk number, indicating where in the document the chunk was taken from.

To implement this, enumerate through the list instead of just looping through it. This allows access to the chunk number and identifies where in the document the chunk was taken from.

This code will read all text files in the specified folder, split each file into 128-character chunks with 32-character overlaps, and store each chunk as a `Document` object with metadata in the `chunked_texts` list. The last lines print the first chunk and its metadata to verify the process.



In [None]:
# Import necessary libraries
from langchain_core.documents import Document
from langchain.text_splitter import CharacterTextSplitter
import os

# Set the folder containing the text files
folder_path = 'Big Star Collectibles'

# Get a list of text files in the folder
file_list = os.listdir(folder_path)

# Initialize an empty list to hold chunked texts
chunked_texts = []

# Loop through each file in the folder
for file_name in file_list:
    file_path = os.path.join(folder_path, file_name)
    
    # Open and read the entire file as a single string
    with open(file_path, 'r') as file:
        text = file.read()
    
    # Create a CharacterTextSplitter object
    text_splitter = CharacterTextSplitter(chunk_size=128, chunk_overlap=32)
    
    # Split the text into chunks
    chunks = text_splitter.split_text(text)
    
    # Loop through each chunk and create a Document object with metadata
    for i, chunk in enumerate(chunks):
        metadata = {
            'doc_title': file_name,
            'chunk_number': i
        }
        doc = Document(page_content=chunk, metadata=metadata)
        chunked_texts.append(doc)

# Print the first chunk and its metadata to verify
print(chunked_texts[0].page_content)
print(chunked_texts[0].metadata)

### 1.14 Embed and Store

With the data chunked and some metadata saved in each of the chunked objects, the next step is to embed the data and store it into a vector database. This involves two new imports: FAISS and HuggingFaceEmbeddings.

1. **Imports**:
   - **FAISS**: Stands for Facebook AI Similarity Search. This library is the foundation for many popular AI-native vector databases.
   - **HuggingFaceEmbeddings**: Imported from `langchain_community`. Initially, LangChain had numerous integrations, but as it grew, many of these integrations moved to the LangChain community library, including HuggingFaceEmbeddings.

2. **Embedding and Storing**:
   - **Import Libraries**: Start by importing the FAISS library from LangChain and the HuggingFaceEmbeddings from the community module.
   - **Instantiate Embedding Function**: Create an instance of the HuggingFaceEmbeddings object. The default embedding model is `all-mpnet-base-v2`, which has 768 dimensions. Only vectors of the same dimensionality can be compared.
   - **Create Vector Store**: Use the documents created in the metadata and chunking steps. Pass the embedding function and the documents to create the vector store.

This code will read all text files in the specified folder, split each file into chunks of at most 128 characters that share up to 32 characters with the previous chunk (so consecutive chunks start roughly 96 characters apart), store each chunk as a `Document` object with metadata, and then embed and store these documents into a FAISS vector database.

In [None]:
# Import necessary libraries
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
from langchain.text_splitter import CharacterTextSplitter
import os

# Set the folder containing the text files
folder_path = 'Big Star Collectibles'

# Get a list of text files in the folder
file_list = os.listdir(folder_path)

# Initialize an empty list to hold chunked texts
chunked_texts = []

# Loop through each file in the folder
for file_name in file_list:
    file_path = os.path.join(folder_path, file_name)
    
    # Open and read the entire file as a single string
    with open(file_path, 'r') as file:
        text = file.read()
    
    # Create a CharacterTextSplitter object
    text_splitter = CharacterTextSplitter(chunk_size=128, chunk_overlap=32)
    
    # Split the text into chunks
    chunks = text_splitter.split_text(text)
    
    # Loop through each chunk and create a Document object with metadata
    for i, chunk in enumerate(chunks):
        metadata = {
            'doc_title': file_name,
            'chunk_number': i
        }
        doc = Document(page_content=chunk, metadata=metadata)
        chunked_texts.append(doc)

# Instantiate the HuggingFaceEmbeddings object
embedding_function = HuggingFaceEmbeddings()

# Create the vector store
vector_store = FAISS.from_documents(chunked_texts, embedding_function)

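# Print the number of stored vectors and the first stored embedding to verify
# (the FAISS index is not subscriptable, so use ntotal and reconstruct instead)
print(vector_store.index.ntotal)
print(vector_store.index.reconstruct(0))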

### 1.15 Querying

Querying the vector database is part of what goes on behind the scenes in a RAG application. The application queries the `vector_store` and passes the results to the LLM as context for its response. When interacting with the RAG app, this query is not visible. This section provides a peek behind the scenes to see what the LLM sees.

When querying a vector database, the top k most similar results are returned. For LangChain FAISS, the default k is 4.

1. **Prepare the Vector Store**:
   - **as_retriever Function**: The first step to perform a `vector_store` query in LangChain is to take the `vector_store` and call the `as_retriever` function on it. This prepares the `vector_store` to be queried with strings and abstracts out the necessity of turning a string into an embedding and calling a query function directly.

2. **Perform the Query**:
   - **invoke Function**: Call the `invoke` function of the retriever and pass a string. The result is the top four results in the `vector_store` according to the embedding model defined earlier.
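
What a top-k query does can be sketched without any library: score every stored vector against the query and keep the k closest. Squared L2 distance is used here as an assumption, the default of `k=4` mirrors the default mentioned above, and the brute-force scan is only an illustration of the idea, not how FAISS is implemented.

```python
import heapq

def top_k(query_vec, stored, k=4):
    """Return the k stored (vector, text) pairs closest to the query, nearest first."""
    def squared_l2(vec):
        return sum((q - v) ** 2 for q, v in zip(query_vec, vec))
    return heapq.nsmallest(k, stored, key=lambda item: squared_l2(item[0]))

# Hypothetical two-dimensional vectors standing in for real embeddings
stored = [
    ([0.0, 0.0], "a"),
    ([1.0, 0.0], "b"),
    ([2.0, 0.0], "c"),
    ([3.0, 0.0], "d"),
    ([4.0, 0.0], "e"),
]

for vector, text in top_k([0.9, 0.0], stored):
    print(text, vector)
```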

This code will read all text files in the specified folder, split each file into 128-character chunks with 32-character overlaps, store each chunk as a `Document` object with metadata, embed and store these documents into a FAISS vector database, and then query the vector store to retrieve the top 4 results based on the query string.

In [None]:
# Prepare the vector store for querying
retriever = vector_store.as_retriever()

# Perform a query
query = "example query text"
results = retriever.invoke(query)

# Print the top 4 results
for result in results:
    print(result.page_content)
    print(result.metadata)

### 1.16 Adding the LLM

The final part of creating a RAG application on top of the vector store is to add the LLM (Large Language Model). For this part, access to an LLM is needed. This can be done using an API key from providers like OctoAI, OpenAI, or others, or by running an LLM locally. 

1. **Set Up LLM Access**:
   - **Import Environment Variables**: Use Python-dotenv's `load_dotenv` method to load environment variables.
   - **Import OpenAI**: Import OpenAI from `langchain_openai` and initialize it as the LLM.

2. **Create a Prompt Template**:
   - **Prompt Creation**: Create a prompt template for the chat. Use curly braces as placeholders for the question and context, similar to using an f-string in Python.
   - **ChatPromptTemplate**: Use the `ChatPromptTemplate` object from LangChain to create the prompt template.

3. **Create the Chain**:
   - **Imports**: Import `RunnablePassthrough` and `StrOutputParser`.
     - **RunnablePassthrough**: Passes its input through unchanged. Here it forwards the query string to the prompt as the question.
     - **StrOutputParser**: Parses the output of the chain as a string.
   - **Build the Chain**:
     - Map `context` to the retriever and `question` to a `RunnablePassthrough`, so a single input string supplies both.
     - Pass these to the prompt created.
     - Pass the prompt to the LLM.
     - Pipe the output of the LLM to the `StrOutputParser`.

4. **Invoke the Chain**:
   - **Get a Response**: Invoke the chain with a query to get a response. The chain sends the question and the retrieved context to the LLM and returns its answer as a single string.
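
The four steps can be mimicked with plain functions to show what flows between the stages. This is a sketch under stated assumptions, not LangChain code: `fake_retrieve` and `fake_llm` are hypothetical stubs, and `str.format` stands in for the prompt template's curly-brace placeholders.

```python
def format_prompt(question, documents):
    """Fill the {question} and {context} placeholders, as a prompt template would."""
    template = "Question: {question}\nContext: {context}"
    return template.format(question=question, context="\n".join(documents))

def run_chain(question, retrieve, llm, parse=str.strip):
    """Retrieve context, build the prompt, call the LLM, and parse the output."""
    documents = retrieve(question)               # context comes from the retriever
    prompt = format_prompt(question, documents)  # question is passed through unchanged
    raw_output = llm(prompt)
    return parse(raw_output)                     # final step returns a clean string

# Hypothetical stand-ins so the sketch runs without an API key
def fake_retrieve(question):
    return ["Orders ship in five business days."]

def fake_llm(prompt):
    return "  ANSWER BASED ON: " + prompt + "\n"

print(run_chain("How long does shipping take?", fake_retrieve, fake_llm))
```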

This code will read all text files in the specified folder, split each file into 128-character chunks with 32-character overlaps, store each chunk as a `Document` object with metadata, embed and store these documents into a FAISS vector database, and then query the vector store and use the LLM to get a response.

In [None]:
# Import necessary libraries
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
from langchain.text_splitter import CharacterTextSplitter
from langchain_openai import OpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Set the folder containing the text files
folder_path = 'Big Star Collectibles'

# Get a list of text files in the folder
file_list = os.listdir(folder_path)

# Initialize an empty list to hold chunked texts
chunked_texts = []

# Loop through each file in the folder
for file_name in file_list:
    file_path = os.path.join(folder_path, file_name)
    
    # Open and read the entire file as a single string
    with open(file_path, 'r') as file:
        text = file.read()
    
    # Create a CharacterTextSplitter object
    text_splitter = CharacterTextSplitter(chunk_size=128, chunk_overlap=32)
    
    # Split the text into chunks
    chunks = text_splitter.split_text(text)
    
    # Loop through each chunk and create a Document object with metadata
    for i, chunk in enumerate(chunks):
        metadata = {
            'doc_title': file_name,
            'chunk_number': i
        }
        doc = Document(page_content=chunk, metadata=metadata)
        chunked_texts.append(doc)

# Instantiate the HuggingFaceEmbeddings object
embedding_function = HuggingFaceEmbeddings()

# Create the vector store
vector_store = FAISS.from_documents(chunked_texts, embedding_function)

# Prepare the vector store for querying
retriever = vector_store.as_retriever()

# Instantiate the OpenAI LLM
llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Create a prompt template
prompt_template = ChatPromptTemplate.from_template("Question: {question}\nContext: {context}")

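# Create the chain: the retriever fills {context} and the raw query
# passes through unchanged as {question}
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | StrOutputParser()
)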

# Perform a query and get a response
query = "example query text"
response = chain.invoke(query)

# Print the response
print(response)

### Question
How can you modify a Retrieval-Augmented Generation (RAG) application to cite its document sources using metadata stored in a vector database?

### Answer: Cite Your Document Sources
To modify a RAG application to cite its document sources, follow these steps:

1. **Store Document Metadata**: Ensure that the names of the documents are stored as part of the metadata in the vector store. This allows access to this information when retrieving objects from the vector store.

2. **Retrieve Metadata**: When retrieving objects from the vector store, access this metadata to get the document names.

3. **Prompt Engineering**: Modify the prompt to instruct the language model to cite its sources. This can be done by adding a simple sentence to the prompt, such as "Please cite your sources."
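
A minimal sketch of the three steps, independent of any particular vector store: label each retrieved chunk with its metadata, collect the distinct document names, and add the citation instruction to the prompt. The `(text, metadata)` pair layout and the helper name are assumptions; the metadata keys `doc_title` and `chunk_number` follow the ones used earlier in this chapter.

```python
def build_cited_prompt(question, results):
    """Build a prompt whose context lines carry their source, plus a source list."""
    context_lines = [
        f"[{metadata['doc_title']}, chunk {metadata['chunk_number']}] {text}"
        for text, metadata in results
    ]
    # dict.fromkeys drops duplicate titles while keeping first-seen order
    sources = list(dict.fromkeys(metadata["doc_title"] for _, metadata in results))
    prompt = "\n".join(
        [f"Question: {question}", "Context:", *context_lines, "Please cite your sources."]
    )
    return prompt, sources

# Hypothetical retrieval results
results = [
    ("Returns within 30 days.", {"doc_title": "returns.txt", "chunk_number": 0}),
    ("Refunds take a week.", {"doc_title": "returns.txt", "chunk_number": 1}),
    ("Ships in five days.", {"doc_title": "shipping.txt", "chunk_number": 0}),
]
prompt, sources = build_cited_prompt("What is the returns policy?", results)
print(prompt)
print(sources)
```

Putting the source label inside the context gives the model something concrete to cite; the separate `sources` list can be appended to the answer regardless of what the model writes.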

### Example Implementation
In this example:
- A LangChain vector store (any `VectorStore` subclass, such as FAISS) is used to interact with the vector database.
- The `retrieve_and_cite` function retrieves documents related to the query and extracts their names from the metadata.
- The prompt is modified to include an instruction to cite sources.
- The language model generates a response, and the sources are appended to the response.

### retrieve_and_cite function

In [None]:
from langchain_core.vectorstores import VectorStore
from langchain_openai import OpenAI

# VectorStore is an abstract base class and cannot be instantiated directly,
# so reuse the FAISS vector store built in section 1.14
store: VectorStore = vector_store
llm = OpenAI()

# Function to retrieve documents and cite sources
def retrieve_and_cite(query):
    # Retrieve documents from the vector store
    results = store.similarity_search(query)
    
    # Extract document names from metadata, dropping duplicates but keeping order
    sources = list(dict.fromkeys(result.metadata['doc_title'] for result in results))
    
    # Create a prompt with the query, the retrieved context, and an instruction to cite sources
    context = "\n\n".join(result.page_content for result in results)
    prompt = f"Question: {query}\nContext: {context}\n\nCite your sources."
    
    # Get the response from the language model
    response = llm.invoke(prompt)
    
    # Append the sources to the response
    cited_response = f"{response}\n\nSources: {', '.join(sources)}"
    
    return cited_response

# Example usage
query = "Explain the benefits of RAG applications."
response = retrieve_and_cite(query)
print(response)

### Cite Your Document Sources Code

In [None]:
### Cite Your Document Sources
# Import necessary libraries
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
from langchain.text_splitter import CharacterTextSplitter
from langchain_openai import OpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Set the folder containing the text files
folder_path = 'Big Star Collectibles'

# Get a list of text files in the folder
file_list = os.listdir(folder_path)

# Initialize an empty list to hold chunked texts
chunked_texts = []

# Loop through each file in the folder
for file_name in file_list:
    file_path = os.path.join(folder_path, file_name)
    
    # Open and read the entire file as a single string
    with open(file_path, 'r') as file:
        text = file.read()
    
    # Create a CharacterTextSplitter object
    text_splitter = CharacterTextSplitter(chunk_size=128, chunk_overlap=32)
    
    # Split the text into chunks
    chunks = text_splitter.split_text(text)
    
    # Loop through each chunk and create a Document object with metadata
    for i, chunk in enumerate(chunks):
        metadata = {
            'doc_title': file_name,
            'chunk_number': i
        }
        doc = Document(page_content=chunk, metadata=metadata)
        chunked_texts.append(doc)

# Instantiate the HuggingFaceEmbeddings object
embedding_function = HuggingFaceEmbeddings()

# Create the vector store
vector_store = FAISS.from_documents(chunked_texts, embedding_function)

# Prepare the vector store for querying
retriever = vector_store.as_retriever()

# Instantiate the OpenAI LLM
llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Create a prompt template with source citation
prompt_template = ChatPromptTemplate.from_template(
    "Question: {question}\nContext: {context}\nPlease cite your sources."
)

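# Create the chain: the retriever fills {context} and the raw query
# passes through unchanged as {question}
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | StrOutputParser()
)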

# Perform a query and get a response
query = "example query text"
response = chain.invoke(query)

# Print the response
print(response)

The retrieve_and_cite function and the "Cite Your Document Sources" code serve similar purposes but are structured differently. The retrieve_and_cite function is a simpler example to illustrate the basic concept of retrieving documents and citing sources.

On the other hand, the "Cite Your Document Sources" code is a more comprehensive implementation that includes additional steps like text splitting, embedding generation, and creating a query chain. This code is designed to handle more complex scenarios and provides a more robust solution.

If you prefer to use the retrieve_and_cite function within the comprehensive implementation, you can integrate it as follows:

In [None]:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
from langchain.text_splitter import CharacterTextSplitter
from langchain_openai import OpenAI
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Set the folder containing the text files
folder_path = 'Big Star Collectibles'

# Get a list of text files in the folder
file_list = os.listdir(folder_path)

# Initialize an empty list to hold chunked texts
chunked_texts = []

# Loop through each file in the folder
for file_name in file_list:
    file_path = os.path.join(folder_path, file_name)
    
    # Open and read the entire file as a single string
    with open(file_path, 'r') as file:
        text = file.read()
    
    # Create a CharacterTextSplitter object
    text_splitter = CharacterTextSplitter(chunk_size=128, chunk_overlap=32)
    
    # Split the text into chunks
    chunks = text_splitter.split_text(text)
    
    # Loop through each chunk and create a Document object with metadata
    for i, chunk in enumerate(chunks):
        metadata = {
            'doc_title': file_name,
            'chunk_number': i
        }
        doc = Document(page_content=chunk, metadata=metadata)
        chunked_texts.append(doc)

# Instantiate the HuggingFaceEmbeddings object
embedding_function = HuggingFaceEmbeddings()

# Create the vector store
vector_store = FAISS.from_documents(chunked_texts, embedding_function)

# Prepare the vector store for querying
retriever = vector_store.as_retriever()

# Instantiate the OpenAI LLM
llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Function to retrieve documents and cite sources
def retrieve_and_cite(query):
    # Retrieve documents through the retriever
    results = retriever.invoke(query)
    
    # Extract document names from metadata, dropping duplicates but keeping order
    sources = list(dict.fromkeys(result.metadata['doc_title'] for result in results))
    
    # Create a prompt with the query, the retrieved context, and an instruction to cite sources
    context = "\n\n".join(result.page_content for result in results)
    prompt = f"Question: {query}\nContext: {context}\n\nPlease cite your sources."
    
    # Get the response from the language model
    response = llm.invoke(prompt)
    
    # Append the sources to the response
    cited_response = f"{response}\n\nSources: {', '.join(sources)}"
    
    return cited_response

# Example usage
query = "Explain the benefits of RAG applications."
response = retrieve_and_cite(query)
print(response)

### Change the Chunk Size

When using the `CharacterTextSplitter`, there are two parameters that are automatically set: `separator` and `is_separator_regex`.
By default, the `separator` parameter is set to a double new line (`\n\n`). The splitter only breaks text at this separator, so if a stretch of text contains no double new lines, it stays in one chunk even when it exceeds the specified chunk size.

To ensure that chunks are formed around the correct chunk size, a custom separator can be defined. In this case, a single new line (`\n`) can be used as the custom separator. This will help in forming chunks of the correct size and overlap.
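
The effect of the separator can be reproduced with a simplified emulation: split on the separator, then greedily merge the pieces until the next one would push the chunk past the size limit. This is an assumption-laden sketch, not the `CharacterTextSplitter` implementation; the helper name is hypothetical and overlap is deliberately left out.

```python
def split_on_separator(text, separator, chunk_size):
    """Split on separator, then merge adjacent pieces up to chunk_size characters."""
    pieces = [piece for piece in text.split(separator) if piece]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # A single piece is kept whole even if it is longer than chunk_size
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

text = "aaaa\nbbbb\ncccc"  # single new lines only, no double new lines

# Default-style separator: nothing to split on, so one oversized chunk
print(split_on_separator(text, "\n\n", chunk_size=6))
# Custom separator: the text can be broken at every line
print(split_on_separator(text, "\n", chunk_size=6))
```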

### How to modify the code to include a custom separator and add the question to the prompt?

This code will ensure that the text is split into chunks of the correct size using a single new line as the separator. It also includes the question in the prompt to instruct the LLM to cite its sources, ensuring that the LLM provides the necessary citations.

In [None]:
# Import necessary libraries
from langchain_core.documents import Document
from langchain.text_splitter import CharacterTextSplitter
import os

# Set the folder containing the text files
folder_path = 'Big Star Collectibles'

# Get a list of text files in the folder
file_list = os.listdir(folder_path)

# Initialize an empty list to hold chunked texts
chunked_texts = []

# Loop through each file in the folder
for file_name in file_list:
    file_path = os.path.join(folder_path, file_name)
    
    # Open and read the entire file as a single string
    with open(file_path, 'r') as file:
        text = file.read()
    
    # Create a CharacterTextSplitter object with a custom separator
    text_splitter = CharacterTextSplitter(chunk_size=128, chunk_overlap=32, separator='\n')
    
    # Split the text into chunks
    chunks = text_splitter.split_text(text)
    
    # Loop through each chunk and create a Document object with metadata
    for i, chunk in enumerate(chunks):
        metadata = {
            'doc_title': file_name,
            'chunk_number': i
        }
        doc = Document(page_content=chunk, metadata=metadata)
        chunked_texts.append(doc)

# Print the first chunk to verify
print(chunked_texts[0].page_content)

# Import necessary libraries for embedding and querying
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_openai import OpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Instantiate the HuggingFaceEmbeddings object
embedding_function = HuggingFaceEmbeddings()

# Create the vector store
vector_store = FAISS.from_documents(chunked_texts, embedding_function)

# Prepare the vector store for querying
retriever = vector_store.as_retriever()

# Instantiate the OpenAI LLM
llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Create a prompt template with source citation
prompt_template = ChatPromptTemplate.from_template(
    "Question: {question}\nContext: {context}\nPlease cite your sources."
)

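# Create the chain: the retriever fills {context} and the raw query
# passes through unchanged as {question}
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | StrOutputParser()
)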

# Perform a query and get a response
query = "example query text"
response = chain.invoke(query)

# Print the response
print(response)

## Chapter 2: Vector Embeddings for Images

Images are one of the classic unstructured data types, and vector embeddings are crucial for comparing images. There are two main types of vectors used for this purpose: semantic vectors and visual vectors. These vectors describe images in fundamentally different ways.

### 2.1 Semantic Embeddings

Semantic embeddings describe the meaning of the image. They are derived from deep learning models, where image data passes from the input layer, through a series of hidden layers, to an output layer. The second-to-last layer captures all the meaning that the model has derived to make its prediction, classification, or segmentation. This layer is used as the semantic embedding. These embeddings are typically used in Retrieval-Augmented Generation (RAG) applications because they capture the essence of what the image represents.
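
To make the idea of reading the second-to-last layer concrete, here is a minimal sketch using a toy fully connected network in plain Python. The layer sizes, the weights, and the `forward` helper are all made-up assumptions for illustration; a real vision model is far larger, but the embedding is taken from the same place.

```python
# Toy classifier: every weight and bias below is a made-up illustrative value.
def relu(values):
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    # weights[j][i] connects input i to output unit j
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

HIDDEN_W = [[0.5, -0.2, 0.1, 0.0],
            [0.3, 0.8, -0.5, 0.2],
            [-0.1, 0.4, 0.9, -0.3]]
HIDDEN_B = [0.0, 0.1, -0.1]
OUTPUT_W = [[1.0, -1.0, 0.5],
            [-0.5, 0.7, 0.2]]
OUTPUT_B = [0.0, 0.0]

def forward(pixels):
    """Return (embedding, logits) for a flattened 4-pixel 'image'."""
    embedding = relu(dense(pixels, HIDDEN_W, HIDDEN_B))  # second-to-last layer
    logits = dense(embedding, OUTPUT_W, OUTPUT_B)        # output layer
    return embedding, logits

embedding, logits = forward([0.2, 0.4, 0.6, 0.8])
print(len(embedding), len(logits))  # 3-dimensional embedding, 2 class logits
```

The logits are only useful for the task the model was trained on, whereas the embedding can be stored and compared across images.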

### 2.2 Visual or Pixel Embeddings

Visual embeddings encode what the image literally looks like. In image models trained in frameworks like PyTorch, images are compressed into a vector as the input to the model. These vectors capture the visual appearance of the image. While technically vector embeddings, visual embeddings are less commonly used in RAG applications compared to semantic embeddings.

In summary, for RAG applications, the focus is primarily on semantic embeddings because they provide a meaningful representation of the image's content.

### 2.3 Vision Models 101

To understand how machines compare images, it's essential to grasp how vision models work. Vision models are a type of deep neural network trained for computer vision tasks such as image classification, segmentation, and object detection.

#### 2.3.1 History of Neural Networks

The first neural networks were simple, modeled as layers of neurons fully connected to adjacent layers. Over time, different neural network architectures were developed to better handle various types of data.

#### 2.3.2 Convolutional Neural Networks (CNNs)

CNNs are a class of deep neural networks that have proven very effective for image processing tasks.

- **Convolution**: A technique that helps in getting local context from images. It involves a filter that moves across the image, performing element-wise multiplication and summing the results to produce a new image.
- **Pooling**: Max pooling is a method to downsample the image by taking the maximum value in a region, which helps in reducing the dimensionality while retaining important features.
- **Architecture**: CNNs combine convolutional layers and pooling layers to extract and combine local contexts from different parts of the image.

#### 2.3.3 Vision Transformers (ViTs)

Vision transformers apply the attention mechanism from transformer models (originally used for language) to computer vision.

- **Patches**: Images are divided into fixed-size N by N patches, each of which is flattened and projected into an embedding.
- **Processing**: These embeddings, combined with a class token and positional encodings, are fed into the transformer encoder. The encoder output for the class token is then passed to a multilayer perceptron (MLP) head that produces logits for tasks such as image classification.

As a concrete picture of how convolution and pooling work:

- **Convolution**: Imagine a 2D image filled with numbers. A 3x3 filter moves across the image, performing element-wise multiplication with the image values and summing the results to produce a new image.
- **Pooling**: In max pooling, the maximum value in a region (e.g., 12, 20, 8, 12) is taken to represent that region (e.g., 20).
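
The two operations can be sketched in a few lines of plain Python. This is an illustrative implementation only (no padding, stride 1 for the convolution, non-overlapping windows for the pooling); the image values and the vertical-edge kernel are assumptions chosen for the example, while the pooling region reuses the numbers from the bullet above.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image (no padding, stride 1)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(out_w)]
            for r in range(out_h)]

def max_pool(image, size):
    """Keep the maximum of each non-overlapping size x size region."""
    return [[max(image[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(image[0]) - size + 1, size)]
            for r in range(0, len(image) - size + 1, size)]

image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 0, 1, 3]]
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]  # hypothetical vertical-edge filter

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[-6, 12], [-4, 7]]

pooled = max_pool([[12, 20], [8, 12]], size=2)
print(pooled)       # [[20]]
```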

Vision transformers pursue the same goal of extracting features from an image, but they replace sliding filters with patches and attention mechanisms.
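
The patch step of a vision transformer can be sketched as follows. This is a simplified illustration: the `split_into_patches` helper is a hypothetical name, the image is a tiny grid of numbers, and a real ViT would additionally project each flattened patch through a learned linear layer and add positional encodings.

```python
def split_into_patches(image, patch_size):
    """Split a 2D grid (list of rows) into flattened, non-overlapping square patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches

# A 4x4 "image" whose pixel values are 0..15
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, patch_size=2)
print(len(patches))  # 4
print(patches[0])    # [0, 1, 4, 5]
```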

### 2.4 Getting Semantic Vectors

In this section, we will obtain a semantic vector from an image using the OpenCLIP embeddings with LangChain.

1. **Import Libraries**:
   - `OpenCLIPEmbeddings` from LangChain is used to load the embedding model.
   - `glob` is used to retrieve all image file paths in the specified directory.

2. **Get Image File Paths**:
   - `glob.glob` retrieves all `.jpg` files in the specified folder.

3. **Load Embedding Model**:
   - `OpenCLIPEmbeddings` is instantiated to create the embedding model.

4. **Embed Images**:
   - The `embed_image` function is called with the list of image file paths to generate embeddings.

5. **Examine Embeddings**:
   - The first embedding is printed to see what it looks like.
   - The length of the first embedding is printed to check its dimensionality, which should be 1024.

This process will generate semantic vectors for the images, which can then be used for various tasks such as image comparison or retrieval.

In [None]:
# Import necessary libraries
from langchain_experimental.open_clip import OpenCLIPEmbeddings
import glob

# Get all image file paths
image_folder_path = 'path/to/your/images'
image_file_paths = glob.glob(f"{image_folder_path}/*.jpg")

# Load the OpenCLIP embedding model
embedding_model = OpenCLIPEmbeddings()

# Embed the images
embeddings = embedding_model.embed_image(image_file_paths)

# Examine the first embedding
print(embeddings[0])

# Check the dimensionality of the embeddings
print(len(embeddings[0]))

### 2.4. Storing Image Vectors

This section explains how to store image vectors, similar to the process used in the text RAG chapter. Here are the steps:

1. **Imports**:
   - **LangChain Imports**: `Document`, `FAISS`, and `OpenCLIPEmbeddings`.
   - **Other Imports**: `glob` for handling multiple file paths, and `base64` for converting images into Base64 strings for the LLM. Base64 encoding turns binary image data into a text string, which keeps it compatible with text-based protocols and makes it easy to include in documents and text-oriented workflows.

2. **Initialize List**:
   - Create an empty list to hold the documents.

3. **Define the Encode Image Function**:
   - This function takes a file path, opens the file, reads it as bytes, and returns the Base64 encoding of those bytes as a UTF-8 string. Base64 represents arbitrary binary data using only ASCII characters, so the result can be stored, transmitted, and processed safely as text across different systems and platforms.

4. **Process Image Paths**:
   - Loop through each image path, encode the image, and create a document containing the encoded image and metadata (including the image path).

5. **Store in FAISS Vector Database**:
   - Use `OpenCLIPEmbeddings` as the embedding function to store the documents in a FAISS vector database.
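
As a quick check of what Base64 encoding does, the sketch below encodes a handful of bytes and decodes them again. The byte values are an assumption chosen for illustration (they match the start of a typical JPEG header); any bytes would work the same way.

```python
import base64

# Six illustrative bytes, as found at the start of many JPEG files
raw_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10])

# Binary -> ASCII-only text
encoded = base64.b64encode(raw_bytes).decode("utf-8")
print(encoded)  # /9j/4AAQ

# Text -> the original binary, unchanged
decoded = base64.b64decode(encoded)
print(decoded == raw_bytes)  # True
```

The round trip is lossless, which is why the encoded string can stand in for the image wherever only text is accepted.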

### Explanation:

1. **Imports**:
   - `Document`, `FAISS`, and `OpenCLIPEmbeddings` from LangChain.
   - `glob` for retrieving image file paths.
   - `base64` for encoding images.

2. **Get Image File Paths**:
   - Use `glob.glob` to retrieve all `.jpg` files in the specified folder.

3. **Initialize List**:
   - Create an empty list `documents` to hold the encoded images and their metadata.

4. **Encode Image Function**:
   - The `encode_image` function reads the image file as bytes and encodes it to a base64 string.

5. **Create Documents**:
   - Loop through each image path, encode the image, and create a `Document` object with the encoded image and metadata.

6. **Create Vector Store**:
   - Use `OpenCLIPEmbeddings` to create the FAISS vector store from the documents.

This code stores the image vectors in a FAISS vector database, making them ready for retrieval and comparison.

In [None]:
# Import necessary libraries
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_experimental.open_clip import OpenCLIPEmbeddings
import glob
import base64

# Get all image file paths
image_folder_path = 'path/to/your/images'
image_file_paths = glob.glob(f"{image_folder_path}/*.jpg")

# Initialize an empty list to hold documents
documents = []

# Define a function to encode images
def encode_image(file_path):
    with open(file_path, 'rb') as image_file:
        image_bytes = image_file.read()
    return base64.b64encode(image_bytes).decode('utf-8')

# Loop through each image path and create a document
for file_path in image_file_paths:
    encoded_image = encode_image(file_path)
    metadata = {'image_path': file_path}
    doc = Document(page_content=encoded_image, metadata=metadata)
    documents.append(doc)

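# Load the OpenCLIP embedding model
embedding_model = OpenCLIPEmbeddings()

# Embed the image files themselves: FAISS.from_documents would embed the
# Base64 strings as text rather than as images
image_embeddings = embedding_model.embed_image(image_file_paths)

# Create the FAISS vector store from (Base64 string, image vector) pairs
vector_store = FAISS.from_embeddings(
    text_embeddings=[(doc.page_content, vector) for doc, vector in zip(documents, image_embeddings)],
    embedding=embedding_model,
    metadatas=[doc.metadata for doc in documents],
)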

# Print the first document to verify
print(documents[0].page_content)
print(documents[0].metadata)

### 2.5 Comparing Images Semantically

Now that the data is stored in a vector database, the next step is to compare images to find the most similar sets. Here's how to do it:

1. **Turn Vector Store into a Retriever**:
   - Convert the vector store into a retriever, a thin wrapper that accepts a query, embeds it, and returns the most similar stored items (e.g., images or documents).

   ```python
   # Prepare the vector store for querying
   retriever = vector_store.as_retriever()
   ```

2. **Retrieve Images**:
   - Embed the query image with the same OpenCLIP model and search the vector store with the resulting vector.
   - For example, use the first cat image as the query to retrieve the four most similar images.

   ```python
   # Embed the first image to use as a query. A Base64 string passed to
   # retriever.invoke() would be embedded as text, so embed the file directly.
   query_image_path = image_file_paths[0]
   query_vector = embedding_model.embed_image([query_image_path])[0]

   # Search the vector store by vector to find the 4 most similar images
   results = vector_store.similarity_search_by_vector(query_vector, k=4)

   # Print the top 4 results (only the start of each Base64 string)
   for result in results:
       print(result.page_content[:50])
       print(result.metadata)
   ```
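
Conceptually, the retrieval step ranks stored vectors by their similarity to the query vector. The sketch below shows that idea as a brute-force search with cosine similarity in plain Python. The file names and three-dimensional vectors are invented for illustration, and FAISS itself uses optimized indexes and its own distance metrics rather than this loop.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, stored, k=4):
    """Return the k (name, score) pairs most similar to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical image embeddings (real ones have hundreds of dimensions)
stored = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.3, 0.1],
    "dog_1.jpg": [0.1, 0.9, 0.2],
    "dog_2.jpg": [0.0, 0.8, 0.5],
}

results = top_k(stored["cat_1.jpg"], stored, k=2)
print([name for name, _ in results])  # ['cat_1.jpg', 'cat_2.jpg']
```

As in the real query, the image used as the query comes back as its own closest match.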


In [None]:
# Import necessary libraries
from langchain_community.vectorstores import FAISS
from langchain_experimental.open_clip import OpenCLIPEmbeddings
from langchain_core.documents import Document
import glob
import base64

# Get all image file paths
image_folder_path = 'path/to/your/images'
image_file_paths = glob.glob(f"{image_folder_path}/*.jpg")

# Initialize an empty list to hold documents
documents = []

# Define a function to encode images
def encode_image(file_path):
    with open(file_path, 'rb') as image_file:
        image_bytes = image_file.read()
    return base64.b64encode(image_bytes).decode('utf-8')

# Loop through each image path and create a document
for file_path in image_file_paths:
    encoded_image = encode_image(file_path)
    metadata = {'image_path': file_path}
    doc = Document(page_content=encoded_image, metadata=metadata)
    documents.append(doc)

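# Load the OpenCLIP embedding model
embedding_model = OpenCLIPEmbeddings()

# Embed the image files themselves (not their Base64 text)
image_embeddings = embedding_model.embed_image(image_file_paths)

# Create the FAISS vector store from (Base64 string, image vector) pairs
vector_store = FAISS.from_embeddings(
    text_embeddings=[(doc.page_content, vector) for doc, vector in zip(documents, image_embeddings)],
    embedding=embedding_model,
    metadatas=[doc.metadata for doc in documents],
)

# Prepare the vector store for text queries
retriever = vector_store.as_retriever()

# Embed the first image to use as a query
query_image_path = image_file_paths[0]
query_vector = embedding_model.embed_image([query_image_path])[0]

# Search by vector to find the 4 most similar images
results = vector_store.similarity_search_by_vector(query_vector, k=4)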

# Print the top 4 results
for result in results:
    print(result.page_content)
    print(result.metadata)

### 2.6 Challenge: Find the Dog Most Similar to a Cat

To find the dog that looks most similar to the cats using semantic embeddings, follow these steps:

1. **Prepare the Environment**:
   - Ensure all vectors are stored in the vector store.
   - Have the retriever ready to query the vector store.

   ```python
   # Prepare the vector store for querying
   retriever = vector_store.as_retriever()
   ```

2. **Get Dog Image Paths**:
   - Create a list of paths for the dog images.

   ```python
   # Separate dog and cat image paths
   dog_image_paths = [path for path in image_file_paths if 'dog' in path]
   cat_image_paths = [path for path in image_file_paths if 'cat' in path]
   ```

3. **Create a Dictionary for Mapping**:
   - Create a dictionary that maps each dog image path to its cat score, where cats retrieved at higher ranks contribute more (inverse rank weighting).

4. **Retrieve Top Images for Each Dog**:
   - For each dog image, embed the image and retrieve the four most similar images from the vector store.
   - Initialize a counter (`cats_retrieved`) to zero.

   ```python
   # Function to find the dog most similar to cats
   def find_most_similar_dog(dog_paths, retriever):
       cat_scores = {}
       for dog_path in dog_paths:
           # Embed the dog image and search the underlying store by vector
           dog_vector = embedding_model.embed_image([dog_path])[0]
           results = retriever.vectorstore.similarity_search_by_vector(dog_vector, k=4)
           cats_retrieved = 0
           for i, result in enumerate(results):
               if 'cat' in result.metadata['image_path']:
                   # Higher-ranked cats contribute more to the score
                   cats_retrieved += (4 - i)
           cat_scores[dog_path] = cats_retrieved
       most_similar_dog = max(cat_scores, key=cat_scores.get)
       return most_similar_dog, cat_scores[most_similar_dog]
   ```

5. **Calculate Cat Scores**:
   - Loop through the enumerated list of returned documents.
   - Check if the word "cat" is in the source returned.
   - Add `4 - i` (inverse weight based on rank) to `cats_retrieved` if the image is a cat.

6. **Determine the Most Similar Dog**:
   - Attach the cat score to each dog image.
   - Identify the dog with the highest cat score.

   ```python
   # Find the dog most similar to cats
   most_similar_dog, score = find_most_similar_dog(dog_image_paths, retriever)

   # Print the result
   print(f"The dog most similar to cats is: {most_similar_dog} with a score of {score}")
   ```
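
The rank weighting is easiest to see on a small worked example. The sketch below applies the same `4 - i` rule to invented retrieval results; the file names, the rankings, and the `cat_score` helper are assumptions for illustration, not output from the vector store.

```python
def cat_score(ranked_paths, top_n=4):
    """Sum (top_n - rank) for every retrieved path that contains 'cat'."""
    return sum(top_n - rank
               for rank, path in enumerate(ranked_paths[:top_n])
               if "cat" in path)

# Hypothetical top-4 results for two dog images (rank 0 is the best match)
retrieved = {
    "dog_1.jpg": ["dog_1.jpg", "dog_3.jpg", "cat_2.jpg", "dog_2.jpg"],
    "dog_2.jpg": ["dog_2.jpg", "cat_1.jpg", "cat_2.jpg", "dog_1.jpg"],
}

scores = {dog: cat_score(paths) for dog, paths in retrieved.items()}
best_dog = max(scores, key=scores.get)
print(scores)    # {'dog_1.jpg': 2, 'dog_2.jpg': 5}
print(best_dog)  # dog_2.jpg
```

Here `dog_2.jpg` wins because two cats appear at ranks 1 and 2 (3 + 2 points), while `dog_1.jpg` only retrieves one cat at rank 2 (2 points).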

### Explanation:

1. **Imports**:
   - `Document`, `FAISS`, and `OpenCLIPEmbeddings` from LangChain.
   - `glob` for retrieving image file paths.
   - `base64` for encoding images.

2. **Get Image File Paths**:
   - `glob.glob` retrieves all `.jpg` files in the specified folder.
   - Separate paths for dog and cat images.

3. **Initialize List**:
   - An empty list `documents` is created to hold the encoded images and their metadata.

4. **Encode Image Function**:
   - `encode_image` function reads the image file as bytes and encodes it to a base64 string.

5. **Create Documents**:
   - Loop through each image path, encode the image, and create a `Document` object with the encoded image and metadata.

6. **Create Vector Store**:
   - Use `OpenCLIPEmbeddings` to create the FAISS vector store from the documents.

7. **Prepare Retriever**:
   - Convert the vector store into a retriever.

8. **Find Most Similar Dog**:
   - For each dog image, retrieve the top four most similar images.
   - Calculate the cat score based on the presence of cat images in the results.
   - Identify the dog with the highest cat score.

In [None]:
# Import necessary libraries
from langchain_community.vectorstores import FAISS
from langchain_experimental.open_clip import OpenCLIPEmbeddings
from langchain_core.documents import Document
import glob
import base64

# Get all image file paths
image_folder_path = 'path/to/your/images'
image_file_paths = glob.glob(f"{image_folder_path}/*.jpg")

# Separate dog and cat image paths
dog_image_paths = [path for path in image_file_paths if 'dog' in path]
cat_image_paths = [path for path in image_file_paths if 'cat' in path]

# Initialize an empty list to hold documents
documents = []

# Define a function to encode images
def encode_image(file_path):
    with open(file_path, 'rb') as image_file:
        image_bytes = image_file.read()
    return base64.b64encode(image_bytes).decode('utf-8')

# Loop through each image path and create a document
for file_path in image_file_paths:
    encoded_image = encode_image(file_path)
    metadata = {'image_path': file_path}
    doc = Document(page_content=encoded_image, metadata=metadata)
    documents.append(doc)

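# Load the OpenCLIP embedding model
embedding_model = OpenCLIPEmbeddings()

# Embed the image files themselves (not their Base64 text)
image_embeddings = embedding_model.embed_image(image_file_paths)

# Create the FAISS vector store from (Base64 string, image vector) pairs
vector_store = FAISS.from_embeddings(
    text_embeddings=[(doc.page_content, vector) for doc, vector in zip(documents, image_embeddings)],
    embedding=embedding_model,
    metadatas=[doc.metadata for doc in documents],
)

# Prepare the vector store for querying
retriever = vector_store.as_retriever()

# Function to find the dog most similar to cats
def find_most_similar_dog(dog_paths, retriever):
    cat_scores = {}
    for dog_path in dog_paths:
        # Embed the dog image and search the underlying store by vector
        dog_vector = embedding_model.embed_image([dog_path])[0]
        results = retriever.vectorstore.similarity_search_by_vector(dog_vector, k=4)
        cats_retrieved = 0
        for i, result in enumerate(results):
            if 'cat' in result.metadata['image_path']:
                # Higher-ranked cats contribute more to the score
                cats_retrieved += (4 - i)
        cat_scores[dog_path] = cats_retrieved
    most_similar_dog = max(cat_scores, key=cat_scores.get)
    return most_similar_dog, cat_scores[most_similar_dog]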

# Find the dog most similar to cats
most_similar_dog, score = find_most_similar_dog(dog_image_paths, retriever)

# Print the result
print(f"The dog most similar to cats is: {most_similar_dog} with a score of {score}")

## Chapter 3: Multimodality

### What is Multimodality?

Multimodal AI applications deal with multiple types of data. The term combines "multi" (many) and "modal" (types, or modes). Multimodal AI is popular because it gives AI a more human-like way of perceiving the world: humans have a multimodal interface through the senses of sight, hearing, taste, touch, and smell. In AI, the most commonly emulated senses are sight and hearing.

What counts as "multimodal" is debated. Some examples map easily onto human senses: images and text, images and audio, and video. Others are more contested, such as PDFs and CSVs, texts and tables, and tables and graphs. For a human, all of these fall under sight, yet machines process them as different kinds of data that need different machine learning models. Images and text are likewise processed differently, even though both relate to sight.

- **Common Modalities in AI**:
  - Images and text
  - Images and audio
  - Video
  - PDFs and CSVs
  - Texts and tables
  - Tables and graphs

### 3.1. Introduction to Multimodal Embedding Models

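The following models are frequently discussed in multimodal applications. Some, such as CLIP and ALIGN, are embedding models, while others, such as GPT-4o, LLaVa, and DALL-E, are generative models: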

1. **CLIP (Contrastive Language-Image Pretraining)**:
   - Two encoders: one for images, one for text.
   - Trained on 400 million image-text pairs.
   - Contrastive learning aligns the two modalities.
   - The encoders are trained to place matching image-text pairs close together and mismatched pairs far apart.
   - Both encoders output vectors in the same embedding space with the same dimensionality.

2. **GPT-4o**:
   - Developed by OpenAI, GPT-4o ("o" for "omni") is a multilingual, multimodal generative pre-trained transformer.
   - Released in May 2024, it can process and generate text, images, and audio.
   - Achieves GPT-4 Turbo-level performance on text, reasoning, and coding intelligence, with significant improvements in multilingual, audio, and vision capabilities[1](https://openai.com/index/hello-gpt-4o/)[2](https://en.wikipedia.org/wiki/GPT-4o).
   - Supports over 50 languages and has a context length of 128k tokens[2](https://en.wikipedia.org/wiki/GPT-4o).
   - Designed for real-time interaction, it can respond to audio inputs in as little as 232 milliseconds[1](https://openai.com/index/hello-gpt-4o/).
   - Offers advanced voice-to-voice capabilities and real-time translation[2](https://en.wikipedia.org/wiki/GPT-4o).

3. **LLaVa**:
   - LLaVa (Large Language and Vision Assistant) is an open-source multimodal model.
   - Combines an image encoder with a large language model (LLM), fine-tuned on multimodal instruction-following data[3](https://huggingface.co/docs/transformers/main/model_doc/llava).
   - Developed by fine-tuning LLaMA/Vicuna on GPT-generated data, it achieves state-of-the-art performance across multiple benchmarks[3](https://huggingface.co/docs/transformers/main/model_doc/llava)[4](https://github.com/haotian-liu/LLaVA).
   - Uses a fully-connected vision-language cross-modal connector, making it powerful and data-efficient[3](https://huggingface.co/docs/transformers/main/model_doc/llava).
   - Capable of processing and generating text and images, and designed for tasks like visual question answering and image captioning[4](https://github.com/haotian-liu/LLaVA).

4. **DALL-E**:
   - Developed by OpenAI, DALL-E generates images from textual descriptions.
   - Uses a transformer model to understand and generate images based on text prompts.
   - Trained on a large dataset of text-image pairs.

5. **ALIGN (A Large-scale ImaGe and Noisy-text embedding)**:
   - Developed by Google, ALIGN uses a dual-encoder architecture similar to CLIP.
   - Trained on a large dataset of noisy image-text pairs from the web.
   - Aligns visual and textual representations in a shared embedding space.

6. **Florence**:
   - Developed by Microsoft, Florence is a unified image-text model.
   - Uses a transformer-based architecture to process and align images and text.
   - Trained on a diverse set of image-text pairs to perform various vision-language tasks.

7. **BLIP (Bootstrapping Language-Image Pre-training)**:
   - Developed by Salesforce, BLIP uses a bootstrapping approach to improve image-text alignment.
   - Trained on a large dataset of image-text pairs with a focus on improving the quality of generated captions and image retrieval.
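
The contrastive objective described for CLIP above can be sketched in plain Python. This is a simplified illustration under stated assumptions: the two-dimensional vectors are invented, the temperature value is a placeholder, and only the image-to-text direction of the loss is computed, whereas CLIP averages both directions over large batches.

```python
import math

def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def similarity_matrix(image_vecs, text_vecs):
    """Cosine similarity between every image and every text."""
    images = [normalize(v) for v in image_vecs]
    texts = [normalize(v) for v in text_vecs]
    return [[sum(a * b for a, b in zip(img, txt)) for txt in texts] for img in images]

def contrastive_loss(sim, temperature=0.07):
    """Mean cross-entropy where text i (the diagonal) is the correct match for image i."""
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(sim)

# Hypothetical embeddings: image i is paired with text i
image_vecs = [[1.0, 0.0], [0.0, 1.0]]
text_vecs = [[0.9, 0.1], [0.2, 0.8]]

aligned_loss = contrastive_loss(similarity_matrix(image_vecs, text_vecs))
misaligned_loss = contrastive_loss(similarity_matrix(image_vecs, text_vecs[::-1]))
print(aligned_loss < misaligned_loss)  # True
```

The loss is low when each image is closest to its own caption and high when the pairs are swapped, which is the signal that pulls matching pairs together during training.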

### 3.2. Ways to Do Multimodal RAG

1. **Single Multimodal Model**: Handles multiple types of data.
   - Uses one multimodal embedding model to process multiple types of data, usually images and text.
   - Advantages: One vector store, one embedding model, consistent dimensionality for embeddings.
   - Example: CLIP model with image and text options.
   - Frameworks like LangChain or LlamaIndex can handle this for you.

2. **Multiple Models**:
   - Uses multiple embedding models and multiple search modes for different types of data.
   - Requires routing each type of data to the right model for storing and searching.
   - Separate vector stores for each model.
   - Only vectors produced by the same model can be compared directly; different models generally generate embeddings of different lengths.
   - Manual process for routing data.

3. **Combination Method**:
   - Uses multiple multimodal models.
   - Rarely used in practice.
   - Reasons to use: Handling many different types of data or re-ranking vector results.
   - Routes data to different models and vector stores based on data type.
   - Uses tagging system for re-ranking.
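
The routing step that the multiple-model and combination approaches rely on can be sketched as a simple dispatch by file type. Everything here is a hypothetical illustration: the extension-to-modality mapping, the `route` helper, and the file names are assumptions, and a real pipeline would pass each group to its own embedding model and vector store.

```python
import os
from collections import defaultdict

# Hypothetical mapping from file extension to modality
MODALITY_BY_EXTENSION = {
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".txt": "text", ".md": "text",
    ".csv": "table",
}

def route(paths):
    """Group file paths by modality so each group can go to its own model and store."""
    routed = defaultdict(list)
    for path in paths:
        extension = os.path.splitext(path)[1].lower()
        routed[MODALITY_BY_EXTENSION.get(extension, "unknown")].append(path)
    return dict(routed)

routed = route(["cat_1.jpeg", "notes.txt", "prices.csv", "dog_2.JPG", "clip.mp4"])
print(routed["image"])    # ['cat_1.jpeg', 'dog_2.JPG']
print(routed["unknown"])  # ['clip.mp4']
```

With a single multimodal model this step disappears, which is one reason that approach is simpler to operate.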

### 3.3. Embedding and Storing Data

**Review**:
- Using LangChain to get OpenCLIPEmbeddings.
- Storing vectors into FAISS.

**Process**:
1. Grab images and encode them into Base64 for the LLM.

  ```python
  # Function to encode image to Base64 string
  def encode_image(path):
      with open(path, "rb") as image_file:
          return base64.b64encode(image_file.read()).decode("utf-8")
  ```

2. Create documents from images.

  ```python
  # Get all image paths
  paths = glob.glob('../images/*.jpeg', recursive=True)

  # Create a list of documents with encoded images
  lc_docs = []
  for path in paths:
      doc = Document(
          page_content=encode_image(path),
          metadata={'source': path}
      )
      lc_docs.append(doc)
  ```

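3. Embed the images with OpenCLIPEmbeddings and store the vectors in the FAISS vector database.

  ```python
  # Embed the image files themselves; FAISS.from_documents would embed the
  # Base64 strings as text rather than as images
  clip_embeddings = OpenCLIPEmbeddings()
  image_vectors = clip_embeddings.embed_image(paths)

  # Pair each Base64 string with its image vector and build the vector store
  vector_store = FAISS.from_embeddings(
      text_embeddings=[(doc.page_content, vector) for doc, vector in zip(lc_docs, image_vectors)],
      embedding=clip_embeddings,
      metadatas=[doc.metadata for doc in lc_docs],
  )
  ```

4. Create a retriever from the vector store.

  ```python
  retriever = vector_store.as_retriever()
  ```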

### 3.4. Query Images with Text

**Process**:

1. **Create a retriever object from the vector store**:
   - This step involves initializing a retriever from the vector store. The retriever is responsible for querying the vector store to find and retrieve relevant vectors (e.g., images or documents) based on similarity or specific criteria.

  ```python
  retriever = vector_store.as_retriever()
  ```

2. **Import `BytesIO` and `Image` for handling byte and image data**:
   - These imports are needed to handle image data in memory. `BytesIO` lets you work with image data as byte streams, and `PIL.Image` provides image processing capabilities.

  ```python
  from io import BytesIO
  from PIL import Image
  ```

3. **Create three functions**:
   - **Resizing Function**: This function takes a Base64 string, resizes the image, and returns it as a Base64-encoded string. This is useful for ensuring images are of a manageable size for processing and transmission.

  ```python
  # Function to resize Base64 image
  def resize_base64_image(base64_string, size=(128, 128)):
      # Decode the Base64 string
      img_data = base64.b64decode(base64_string)
      img = Image.open(BytesIO(img_data))

      # Resize the image
      resized_img = img.resize(size, Image.LANCZOS)

      # Save the resized image to a bytes buffer
      buffered = BytesIO()
      resized_img.save(buffered, format=img.format)

      # Encode the resized image to Base64
      return base64.b64encode(buffered.getvalue()).decode("utf-8")
  ```

   - **Base64 Check Function**: This function checks if a string is in Base64 format. It ensures that the data being processed is correctly encoded.

  ```python
  # Function to check if a string is Base64 encoded
  def is_base64(s):
      try:
          return base64.b64encode(base64.b64decode(s)) == s.encode()
      except Exception:
          return False
  ```

   - **Split Function**: This function splits image and text input, resizing images if necessary. It separates the data into images and texts for further processing.

  ```python
  # Function to split image and text types, resizing images if necessary
  def split_image_text_types(docs):
      images = []
      text = []
      for doc in docs:
          doc_content = doc.page_content  # Extract Document contents
          if is_base64(doc_content):
              # Resize image to avoid OAI server error
              images.append(resize_base64_image(doc_content))  # base64 encoded str
          else:
              text.append(doc_content)
      return {"images": images, "texts": text}
  ```

4. **Create a prompt for the multimodal RAG app using imports like `HumanMessage`, `RunnableLambda`, and `ChatOpenAI`**:
   - This step involves setting up the necessary imports and creating a function to format the data into prompts that the multimodal RAG app can understand and process.

  ```python
  from langchain_core.messages import HumanMessage
  from langchain_core.output_parsers import StrOutputParser
  from langchain_core.runnables import RunnableLambda, RunnablePassthrough
  from langchain_openai import ChatOpenAI
  ```

5. **Make a function to split image and text data and format them into prompts**:
   - This function formats the input data (images and text) into a structured prompt that can be used by the language model to generate responses.

  ```python
  # Function to create prompt for the multimodal RAG app
  def prompt_func(data_dict):
      # Joining the context texts into a single string
      formatted_texts = "\n".join(data_dict["context"]["texts"])
      messages = []

      # Adding image(s) to the messages if present
      if data_dict["context"]["images"]:
          image_message = {
              "type": "image_url",
              "image_url": {
                  "url": f"data:image/jpeg;base64,{data_dict['context']['images'][0]}"
              },
          }
          messages.append(image_message)

      # Adding the text message for analysis
      text_message = {
          "type": "text",
          "text": (
              "As an animal lover, your task is to analyze and interpret images of cute animals. "
              "Please use your extensive knowledge and analytical skills to provide a "
              "summary that includes:\n"
              "- A detailed description of the visual elements in the image.\n"
              f"User-provided keywords: {data_dict['question']}\n\n"
              "Text and/or tables:\n"
              f"{formatted_texts}"
          ),
      }
      messages.append(text_message)

      return [HumanMessage(content=messages)]
  ```

6. **Use the foundational model (GPT-4o mini) and create a chain**:
   - This step involves initializing the foundational model and creating a processing chain that combines the retriever, prompt function, and the language model to handle the input data and generate responses.

  ```python
  foundation = ChatOpenAI(temperature=0, model="gpt-4o-mini", max_tokens=1024)

  # RAG pipeline
  chain = (
      {
          "context": retriever | RunnableLambda(split_image_text_types),
          "question": RunnablePassthrough(),
      }
      | RunnableLambda(prompt_func)
      | foundation
      | StrOutputParser()
  )
  ```
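The dictionary at the head of the chain sends the same input to both branches: the retriever branch turns the query into context, while `RunnablePassthrough()` forwards the query unchanged. The sketch below mimics that data flow in plain Python to make the input and output shapes explicit; `fan_out`, `fake_retrieve`, and `fake_split` are hypothetical stand-ins for illustration, not LangChain APIs.

```python
def fan_out(branches, value):
    """Apply every branch to the same input and collect the results by key."""
    return {key: branch(value) for key, branch in branches.items()}


def fake_retrieve(query):
    # Stand-in for the retriever: returns "documents" as plain strings
    return [f"text about {query}"]


def fake_split(docs):
    # Stand-in for split_image_text_types: no images in this toy example
    return {"images": [], "texts": docs}


branches = {
    "context": lambda q: fake_split(fake_retrieve(q)),  # retriever | split
    "question": lambda q: q,                            # passthrough
}

prompt_input = fan_out(branches, "rottweiler")
print(prompt_input)
```

Because both branches receive the raw input, the input to the chain needs to be something the retriever accepts, which is a plain query string.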

7. **Invoke the chain to query images with text (e.g., looking for a rottweiler in images)**:
   - This step shows how to invoke the chain with a text question, and how to query the retriever directly with either text or a Base64-encoded image to get the most relevant results.

  ```python
  # Example usage: the chain expects the question as a plain string;
  # the retriever branch supplies the context automatically
  result = chain.invoke("Find images of a rottweiler")

  # Output the result
  print(result)

  # Retrieve documents based on text query "rottweiler"
  docs = retriever.invoke("rottweiler", k=4)  # dog 5
  for doc in docs:
      print(doc.metadata)

  # Retrieve documents based on encoded image of cat_1.jpeg
  docs = retriever.invoke(encode_image("../images/cat_1.jpeg"), k=4)  # cat 1
  for doc in docs:
      print(doc.metadata)

  # Retrieve documents based on text query "gray cat with long hair in a field"
  docs = retriever.invoke("gray cat with long hair in a field", k=4)  # cat 2
  for doc in docs:
      print(doc.metadata)

  # Retrieve documents based on text query "golden retriever playing with orange ball"
  docs = retriever.invoke("golden retriever playing with orange ball", k=4)  # dog 2
  for doc in docs:
      print(doc.metadata)

  # Retrieve documents based on text query "golden retriever in field with a sunny blurred background"
  docs = retriever.invoke("golden retriever in field with a sunny blurred background", k=4)  # dog 4
  for doc in docs:
      print(doc.metadata)
  ```

```python
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_experimental.open_clip import OpenCLIPEmbeddings
import glob
import base64

# Function to encode image to Base64 string
def encode_image(path):
    with open(path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Get all image paths
paths = glob.glob('../images/*.jpeg', recursive=True)

# Create a list of documents with encoded images
lc_docs = []
for path in paths:
    doc = Document(
        page_content=encode_image(path),
        metadata={
            'source': path
        }
    )
    lc_docs.append(doc)

# Create a vector store from documents using OpenCLIPEmbeddings
vector_store = FAISS.from_documents(lc_docs, embedding=OpenCLIPEmbeddings())
retriever = vector_store.as_retriever()

from io import BytesIO
from PIL import Image

# Function to resize Base64 image
def resize_base64_image(base64_string, size=(128, 128)):
    # Decode the Base64 string
    img_data = base64.b64decode(base64_string)
    img = Image.open(BytesIO(img_data))

    # Resize the image
    resized_img = img.resize(size, Image.LANCZOS)

    # Save the resized image to a bytes buffer
    buffered = BytesIO()
    resized_img.save(buffered, format=img.format)

    # Encode the resized image to Base64
    return base64.b64encode(buffered.getvalue()).decode("utf-8")

# Function to check if a string is Base64 encoded
def is_base64(s):
    try:
        return base64.b64encode(base64.b64decode(s)) == s.encode()
    except Exception:
        return False

# Function to split image and text types, resizing images if necessary
def split_image_text_types(docs):
    images = []
    text = []
    for doc in docs:
        doc_content = doc.page_content  # Extract Document contents
        if is_base64(doc_content):
            # Resize image to avoid OAI server error
            images.append(resize_base64_image(doc_content))  # base64 encoded str
        else:
            text.append(doc_content)
    return {"images": images, "texts": text}

from langchain_core.messages import HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI

# Function to create prompt for the multimodal RAG app
def prompt_func(data_dict):
    # Joining the context texts into a single string
    formatted_texts = "\n".join(data_dict["context"]["texts"])
    messages = []

    # Adding image(s) to the messages if present
    if data_dict["context"]["images"]:
        image_message = {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{data_dict['context']['images'][0]}"
            },
        }
        messages.append(image_message)

    # Adding the text message for analysis
    text_message = {
        "type": "text",
        "text": (
            "As an animal lover, your task is to analyze and interpret images of cute animals. "
            "Please use your extensive knowledge and analytical skills to provide a "
            "summary that includes:\n"
            "- A detailed description of the visual elements in the image.\n"
            f"User-provided keywords: {data_dict['question']}\n\n"
            "Text and/or tables:\n"
            f"{formatted_texts}"
        ),
    }
    messages.append(text_message)

    return [HumanMessage(content=messages)]

foundation = ChatOpenAI(temperature=0, model="gpt-4o-mini", max_tokens=1024)

# RAG pipeline
chain = (
    {
        "context": retriever | RunnableLambda(split_image_text_types),
        "question": RunnablePassthrough(),
    }
    | RunnableLambda(prompt_func)
    | foundation
    | StrOutputParser()
)

# Example usage: the chain expects the question as a plain string;
# the retriever branch supplies the context automatically
result = chain.invoke("Find images of a rottweiler")

# Output the result
print(result)

# Retrieve documents based on text query "rottweiler"
docs = retriever.invoke("rottweiler", k=4)  # dog 5
for doc in docs:
    print(doc.metadata)

# Retrieve documents based on encoded image of cat_1.jpeg
docs = retriever.invoke(encode_image("../images/cat_1.jpeg"), k=4)  # cat 1
for doc in docs:
    print(doc.metadata)

# Retrieve documents based on text query "gray cat with long hair in a field"
docs = retriever.invoke("gray cat with long hair in a field", k=4)  # cat 2
for doc in docs:
    print(doc.metadata)

# Retrieve documents based on text query "golden retriever playing with orange ball"
docs = retriever.invoke("golden retriever playing with orange ball", k=4)  # dog 2
for doc in docs:
    print(doc.metadata)

# Retrieve documents based on text query "golden retriever in field with a sunny blurred background"
docs = retriever.invoke("golden retriever in field with a sunny blurred background", k=4)  # dog 4
for doc in docs:
    print(doc.metadata)
```
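
The round-trip test in `is_base64` accepts any string that happens to be valid Base64, including short plain words such as `"data"`, so a text document could be routed to the image branch by mistake. One possible tightening is to also check the decoded bytes for an image signature. The sketch below uses only the standard library; `looks_like_base64_image` is a hypothetical helper introduced here, and it recognizes only JPEG and PNG data.

```python
import base64
import binascii


def looks_like_base64_image(s):
    """Return True only if s is valid Base64 that decodes to JPEG or PNG bytes."""
    try:
        # validate=True rejects characters outside the Base64 alphabet
        raw = base64.b64decode(s, validate=True)
    except (binascii.Error, ValueError):
        return False
    # Magic numbers: JPEG starts with FF D8 FF, PNG with 89 50 4E 47 0D 0A 1A 0A
    return raw.startswith(b"\xff\xd8\xff") or raw.startswith(b"\x89PNG\r\n\x1a\n")


# Fake JPEG header, Base64-encoded, used only to exercise the check
jpeg_like = base64.b64encode(b"\xff\xd8\xff\xe0" + b"\x00" * 8).decode("utf-8")
print(looks_like_base64_image(jpeg_like))  # True
print(looks_like_base64_image("data"))     # False: valid Base64, but not an image
```

A check like this could replace `is_base64` inside `split_image_text_types` when the store mixes text and image documents.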

### Question:
How can you use LangChain to encode images, create a vector store, and retrieve similar documents based on text queries and encoded images?

### Solution Explanation:
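1. **Imports**: Import `Document` from `langchain_core`, `FAISS` from `langchain_community`, and `OpenCLIPEmbeddings` from `langchain_experimental`; these classes are not exported from the top-level `langchain` package.
   ```python
   from langchain_core.documents import Document
   from langchain_community.vectorstores import FAISS
   from langchain_experimental.open_clip import OpenCLIPEmbeddings
   import glob
   import base64
   ```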

2. **Encoding Function**: Encodes images to Base64 strings.
   ```python
   def encode_image(path):
       with open(path, "rb") as image_file:
           return base64.b64encode(image_file.read()).decode("utf-8")
   ```

3. **Document Creation**: Creates documents with encoded images and metadata.
   ```python
   paths = glob.glob('../images/*.jpeg', recursive=True)
   lc_docs = []
   for path in paths:
       doc = Document(
           page_content=encode_image(path),
           metadata={'source': path}
       )
       lc_docs.append(doc)
   ```

4. **Vector Store**: Creates a vector store from the documents using `OpenCLIPEmbeddings`.
   ```python
   vector_store = FAISS.from_documents(lc_docs, embedding=OpenCLIPEmbeddings())
   ```

5. **Retriever**: Initializes a retriever from the vector store.
   ```python
   retriever = vector_store.as_retriever()
   ```

6. **Queries**: Retrieves documents based on various text queries and encoded images, printing the metadata of the results.
   ```python
   # Retrieve documents based on text query "rottweiler"
   docs = retriever.invoke("rottweiler", k=4)
   for doc in docs:
       print(doc.metadata)

   # Retrieve documents based on encoded image of cat_1.jpeg
   docs = retriever.invoke(encode_image("../images/cat_1.jpeg"), k=4)
   for doc in docs:
       print(doc.metadata)

   # Retrieve documents based on text query "gray cat with long hair in a field"
   docs = retriever.invoke("gray cat with long hair in a field", k=4)
   for doc in docs:
       print(doc.metadata)

   # Retrieve documents based on text query "golden retriever playing with orange ball"
   docs = retriever.invoke("golden retriever playing with orange ball", k=4)
   for doc in docs:
       print(doc.metadata)

   # Retrieve documents based on text query "golden retriever in field with a sunny blurred background"
   docs = retriever.invoke("golden retriever in field with a sunny blurred background", k=4)
   for doc in docs:
       print(doc.metadata)
   ```

This solution demonstrates how to encode images, create a vector store, and retrieve similar documents using LangChain. The code includes functions for encoding images, creating documents, initializing a vector store, and querying the store to find similar images based on text descriptions or other encoded images.
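
Text queries can match image documents because the embedding model places both in one vector space, and the vector store returns the stored vectors closest to the query vector. The sketch below illustrates that ranking step with cosine similarity over made-up 3-dimensional vectors. The vectors, file names, and the `top_k` helper are assumptions for illustration only: real embeddings come from the model, and the distance metric FAISS actually applies depends on how the index is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k sources whose vectors are most similar to query_vec."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [source for source, _ in ranked[:k]]


# Made-up 3-dimensional "embeddings" keyed by image path
toy_index = [
    ("dog_1.jpeg", [0.9, 0.1, 0.0]),
    ("dog_2.jpeg", [0.8, 0.2, 0.1]),
    ("cat_1.jpeg", [0.1, 0.9, 0.2]),
]

query = [1.0, 0.0, 0.0]  # pretend this is the embedding of a dog-related text query
print(top_k(query, toy_index, k=2))
```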

```python
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_experimental.open_clip import OpenCLIPEmbeddings
import glob
import base64

# Function to encode image to Base64 string
def encode_image(path):
    with open(path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Get all image paths
paths = glob.glob('../images/*.jpeg', recursive=True)

# Create a list of documents with encoded images
lc_docs = []
for path in paths:
    doc = Document(
        page_content=encode_image(path),
        metadata={
            'source': path
        }
    )
    lc_docs.append(doc)

# Create a vector store from documents using OpenCLIPEmbeddings
vector_store = FAISS.from_documents(lc_docs, embedding=OpenCLIPEmbeddings())
retriever = vector_store.as_retriever()

# Retrieve documents based on text query "rottweiler"
docs = retriever.invoke("rottweiler", k=4)  # dog 5
for doc in docs:
    print(doc.metadata)

# Retrieve documents based on encoded image of cat_1.jpeg
docs = retriever.invoke(encode_image("../images/cat_1.jpeg"), k=4)  # cat 1
for doc in docs:
    print(doc.metadata)

# Retrieve documents based on text query "gray cat with long hair in a field"
docs = retriever.invoke("gray cat with long hair in a field", k=4)  # cat 2
for doc in docs:
    print(doc.metadata)

# Retrieve documents based on text query "golden retriever playing with orange ball"
docs = retriever.invoke("golden retriever playing with orange ball", k=4)  # dog 2
for doc in docs:
    print(doc.metadata)

# Retrieve documents based on text query "golden retriever in field with a sunny blurred background"
docs = retriever.invoke("golden retriever in field with a sunny blurred background", k=4)  # dog 4
for doc in docs:
    print(doc.metadata)
```
