#### Retrieval-Augmented Generation (RAG)
1. Construct a vector database from document embeddings.
2. Retrieve information from the vector database and construct the prompting context for the LLM.
3. Get the LLM responses with and without RAG given the original user prompt.
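
The three steps above can be sketched without any external library. This is only a toy illustration: the word-count `embed` function stands in for a neural embedding model, and the names `embed`, `cosine`, `retrieve`, and `store` are hypothetical, not part of LlamaIndex.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, store, top_k=2):
    """Return the top_k chunk texts most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


chunks = [
    "gradient descent minimizes a loss function",
    "decision trees split data on feature thresholds",
    "free books on machine learning algorithms",
]

# Step 1: build the "vector database" as (text, embedding) pairs
store = [(c, embed(c)) for c in chunks]

# Step 2: retrieve the most similar chunk and build the prompting context
query = "books about machine learning"
top = retrieve(query, store, top_k=1)
prompt = f"Context: {' '.join(top)}\nQuestion: {query}"

# Step 3 would send `prompt` (with RAG) and `query` alone (without RAG) to the LLM
print(prompt)
```

A real vector database replaces the linear scan in `retrieve` with an index, but the ranking idea is the same.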

#### Setting Up the LLM and the Embedding Model

In [19]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor
from transformers import AutoModelForCausalLM, AutoTokenizer

In [20]:
# Step 1: Instantiate the Embedding model from the Hugging Face library
# Configure Settings for embeddings
Settings.embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
Settings.llm = None  # Disable LlamaIndex's built-in LLM (it falls back to MockLLM)
Settings.chunk_size = 512  # Maximum size of text chunks for embedding
Settings.chunk_overlap = 32  # Overlap between chunks to preserve context

LLM is explicitly disabled. Using MockLLM.


In [21]:
# Step 2: Instantiate the LLM from the Hugging Face library
model_name = "Qwen/Qwen2.5-1.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=False,  # Disable any remote execution for security
    revision="main",         # Use the latest version
    device_map="cuda:0"      # Specify GPU usage for faster inference
)

In [22]:
# Step 3: Instantiate the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

#### Construct the Vector Database

#### Step 1: Read the documents

In [25]:
def read_documents(folder_path, file_extension=(".pdf", ".docx"),
                   chunk_size=512, chunk_overlap=64):
    Settings.chunk_size = chunk_size
    Settings.chunk_overlap = chunk_overlap
    # TODO 1: invoke the SimpleDirectoryReader class to read the documents under
    # "folder_path" with the given extensions.
    documents = SimpleDirectoryReader(folder_path,
                                      required_exts=list(file_extension)).load_data()

    # TODO 2: return the documents and the number of documents read
    return documents, len(documents)

In [26]:
documents, num_documents = read_documents("sample_data")
print(f"Number of documents read: {num_documents}")

Number of documents read: 4


#### Step 2: Constructing the vector database

In this step, we write a helper function that constructs the vector database from the embedding model and the documents we read.

1. Use the `VectorStoreIndex` class to set up an in-memory vector database.
2. Convert the vector database into a retriever for user interaction.
3. Return the vector database and the retriever for later queries.

In [27]:
def build_vector_database(documents, top_k):
    # Step 1: Build the vector database
    # Use VectorStoreIndex to construct a vector database from the input documents
    index = VectorStoreIndex.from_documents(documents)

    # Step 2: Configure the retriever
    # Use VectorIndexRetriever to set up the retrieval logic, specifying the number of top_k documents to return
    retriever = VectorIndexRetriever(
        index=index,              # Pass the constructed VectorStoreIndex as 'index'
        similarity_top_k=top_k    # Number of most relevant chunks to retrieve
    )

    # Return the vector database and retriever
    return index, retriever

In [28]:
TOP_K = 2
index, retriever = build_vector_database(documents, top_k=TOP_K)

#### Query the Vector Database

#### Step 1: Querying the vector database

In this step, we write a function that passes the user prompt to the document retriever and returns the top-K relevant document chunks as context.

1. Assemble the query engine by completing the following TODOs.
2. The function accepts a user prompt and asks the query engine to retrieve the top-K most similar documents.
3. Return the text chunks of the top-K documents as additional context for later prompting.

Use the following example query: \
 Query: **"What are the documents about?"**

Do the query engine and the retriever return relevant context for this query?

In [29]:
def database_query(query, retriever, top_k):
    """
    Query the vector database and retrieve relevant document chunks as context.
    """
    # Step 1: Use RetrieverQueryEngine to construct the query engine
    query_engine = RetrieverQueryEngine(retriever=retriever)

    # Step 2: Execute the query and retrieve the response
    response = query_engine.query(query)

    # Step 3: Extract the top_k document chunks from the response's source_nodes
    # and concatenate their text, one line break after each chunk
    context = "".join(node.node.text + "\n" for node in response.source_nodes[:top_k])

    return context

In [30]:
query = "What are the documents about?"
context = database_query(query, retriever, top_k=TOP_K)
print(context)

5 Free Books on Machine Learning Algorithms You Must Read

Gain insights on machine learning algorithms through practical code examples, detailed diagrams, mathematical explanations, hands-on exercises, and real-world projects.



Image by Author



If you are a machine learning student, researcher, or practitioner, it is crucial for your career growth to have a deep understanding of how each algorithm works and the various techniques to enhance model performance. Nowadays, many individuals tend to focus solely on the code, data, and pre-trained models, often without fully comprehending the machine learning model's algorithm or architecture. They simply fine-tune the model on a new dataset and adjust hyperparameters to improve performance. However, to truly excel in building your own model and advancing AI technology to the level of systems like ChatGPT, you must start with the basics, delving into linear algebra and mastering the fundamentals using Python libraries.

In this blog, we 

-----

#### Step 2: Using the contexts as part of the LLM prompting
In this part, we use the document chunks retrieved from the vector database as additional context in the prompt.

1. Invoke the `prompt_with_context` function and print an example prompt with the user query and the retrieved context.
2. Invoke the `get_llm_response` function to get a response from the LLM on the following sample query **WITH** and **WITHOUT** the retrieved context: \
Query: **"What are the documents about?"**
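
The definition of `prompt_with_context` is not shown in this section. The sketch below is a reconstruction inferred only from the prompt layout printed in the cell outputs, so treat it as an assumption about the helper rather than its actual implementation. `get_llm_response` is not sketched here, since its generation settings are not visible.

```python
def prompt_with_context(context, user_query):
    """Build a prompt that places the retrieved context before the user query.

    Reconstructed from the printed prompts in this notebook; the real helper
    may differ in details such as whitespace.
    """
    return (
        f"\nContext: {context}\n"
        "Please respond to the following user comment. "
        "Use the context above if it is helpful.\n"
        f"User comments: {user_query}\n"
    )


# Passing an empty context yields the "without RAG" prompt
print(prompt_with_context("", "What are the documents about?"))
```

Using the same template for both cases keeps the comparison fair: the only difference between the two prompts is the retrieved context.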



In [31]:
# Testing without context
user_query = "What are the documents about?"
empty_context = ""

In [32]:
# Case 1: Generate prompt and response without context
prompt_without_context = prompt_with_context(empty_context, user_query)
print("Prompt without context:\n", prompt_without_context)
print('=====================================================')
response_without_context = get_llm_response(empty_context, user_query)
print("Response without context:\n", response_without_context)

Prompt without context:
 
Context: 
Please respond to the following user comment. Use the context above if it is helpful.
User comments: What are the documents about?

Response without context:
 I'm sorry, I didn't understand your question. Could you please provide more information or rephrase your question?


In [33]:
# Case 2: Generate prompt and response with context
context = database_query(user_query, retriever, top_k=TOP_K)

prompt_with_context_result = prompt_with_context(context, user_query)
print("Prompt with context:\n", prompt_with_context_result)
print('=====================================================')
response_with_context = get_llm_response(context, user_query)
print("Response with context:\n", response_with_context)

Prompt with context:
 
Context: 5 Free Books on Machine Learning Algorithms You Must Read

Gain insights on machine learning algorithms through practical code examples, detailed diagrams, mathematical explanations, hands-on exercises, and real-world projects.



Image by Author



If you are a machine learning student, researcher, or practitioner, it is crucial for your career growth to have a deep understanding of how each algorithm works and the various techniques to enhance model performance. Nowadays, many individuals tend to focus solely on the code, data, and pre-trained models, often without fully comprehending the machine learning model's algorithm or architecture. They simply fine-tune the model on a new dataset and adjust hyperparameters to improve performance. However, to truly excel in building your own model and advancing AI technology to the level of systems like ChatGPT, you must start with the basics, delving into linear algebra and mastering the fundamentals using Pyth

1. `chunk_size`: Controls the size of each document chunk, measured in tokens.
   * Larger values: Include more content per chunk, making it more comprehensive but potentially less focused.
   * Smaller values: Divide text into smaller chunks, which increases granularity but may lose context.
2. `chunk_overlap`: Determines the overlap between adjacent chunks.
   * Larger values: Preserve context across chunks but may introduce redundancy.
   * Smaller values: Reduce redundancy but risk losing connections between chunks.
3. `top_k`: Number of most relevant document chunks returned.
   * Larger values: Provide more context but might include irrelevant information.
   * Smaller values: Focus on the most relevant chunks but risk omitting critical context.
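
To make the interaction between `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window sketch. It is a simplification: the helper `chunk_tokens` is hypothetical, it works on a plain list of tokens, and it ignores the sentence boundaries that LlamaIndex's splitter tries to respect.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


tokens = list(range(10))  # integers stand in for real tokens
print(chunk_tokens(tokens, chunk_size=4, chunk_overlap=2))
print(chunk_tokens(tokens, chunk_size=4, chunk_overlap=0))
```

With an overlap of 2 the ten tokens produce four chunks, and with no overlap only three, which shows the redundancy cost of a larger overlap.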


In [34]:
# Define parameters for the experiment
chunk_sizes = [256, 512, 1024]
chunk_overlaps = [32, 64, 128]
top_ks = [1, 3, 5]

# Example query
user_query = "What are the documents about?"

# Iterate through each parameter and record results
results = []

for chunk_size in chunk_sizes:
    for chunk_overlap in chunk_overlaps:
        # Update the Settings for chunk size and overlap
        Settings.chunk_size = chunk_size
        Settings.chunk_overlap = chunk_overlap

        # Rebuild the index so the new chunking settings take effect;
        # an index built earlier keeps the chunks it was created with
        exp_index = VectorStoreIndex.from_documents(documents)

        for top_k in top_ks:
            # Create a retriever that returns top_k chunks from the rebuilt index
            exp_retriever = exp_index.as_retriever(similarity_top_k=top_k)

            # Retrieve context using the database_query function
            context = database_query(user_query, exp_retriever, top_k)

            # Save the result
            results.append({
                "chunk_size": chunk_size,
                "chunk_overlap": chunk_overlap,
                "top_k": top_k,
                "context": context
            })


In [35]:
for result in results:
    print(f"Chunk Size: {result['chunk_size']}, Chunk Overlap: {result['chunk_overlap']}, Top K: {result['top_k']}")
    print("Context Output:\n", result['context'])
    print("=" * 50) 

Chunk Size: 256, Chunk Overlap: 32, Top K: 1
Context Output:
 5 Free Books on Machine Learning Algorithms You Must Read

Gain insights on machine learning algorithms through practical code examples, detailed diagrams, mathematical explanations, hands-on exercises, and real-world projects.



Image by Author



If you are a machine learning student, researcher, or practitioner, it is crucial for your career growth to have a deep understanding of how each algorithm works and the various techniques to enhance model performance. Nowadays, many individuals tend to focus solely on the code, data, and pre-trained models, often without fully comprehending the machine learning model's algorithm or architecture. They simply fine-tune the model on a new dataset and adjust hyperparameters to improve performance. However, to truly excel in building your own model and advancing AI technology to the level of systems like ChatGPT, you must start with the basics, delving into linear algebra and masteri

#### Come up with TWO new prompts querying more specific information about machine learning. 

a. Prompt 1: Information included in the documents

In [36]:
# Prompt 1: Information included in the documents
query_1 = "What are some datasets used for machine learning?"
context_1 = database_query(query_1, retriever, top_k=TOP_K)

# Generate prompt with context
prompt_with_context_1 = prompt_with_context(context_1, query_1)
# print("Prompt with context for Query 1:\n", prompt_with_context_1)

# Get LLM response with and without context
response_with_context_1 = get_llm_response(context_1, query_1)
print("Response with context for Query 1:\n", response_with_context_1)
print("======================================================" )
response_without_context_1 = get_llm_response("", query_1)
print("Response without context for Query 1:\n", response_without_context_1)

Response with context for Query 1:
 Some popular datasets used for machine learning include:

- **Boston House Prices**: A classic dataset for regression tasks, useful for practicing various regression techniques.
  
- **Stroke Prediction Dataset**: Ideal for building classification models, particularly logistic regression, random forests, or neural networks.

These datasets provide a good starting point for beginners and intermediate learners alike, offering a range of challenges from simple regression problems to complex classification tasks.
Response without context for Query 1:
 To provide an accurate response, I would need more specific information about what type of dataset you're interested in or which field of machine learning you're asking about (e.g., image recognition, natural language processing). However, here's a general answer based on common applications:

Machine learning often uses various types of datasets depending on the problem being solved. Some popular datasets 

b. Prompt 2: Information NOT included in the documents

In [39]:
# Prompt 2: Information NOT included in the documents
query_2 = "What are two common optimizers used in machine learning?"
context_2 = database_query(query_2, retriever, top_k=TOP_K)

# Generate prompt with context
prompt_with_context_2 = prompt_with_context(context_2, query_2)
# print("Prompt with context for Query 2:\n", prompt_with_context_2)

# Get LLM response with and without context
response_with_context_2 = get_llm_response(context_2, query_2)
print("Response with context for Query 2:\n", response_with_context_2)
print("======================================================" )
response_without_context_2 = get_llm_response("", query_2)
print("Response without context for Query 2:\n", response_without_context_2)

Response with context for Query 2:
 Two common optimizers used in machine learning include Stochastic Gradient Descent (SGD) and Adam. SGD updates the weights based on the gradient of the loss function with respect to the current parameters, while Adam uses adaptive learning rates for different parts of the weight space. Both methods help minimize the cost function during training.
Response without context for Query 2:
 Two common optimizers used in machine learning are Stochastic Gradient Descent (SGD) and Adam. SGD updates model parameters based on the gradient of the loss function with respect to each training example, while Adam uses adaptive learning rates for each parameter. Both methods aim to minimize the loss function during training.


RAG grounds LLM responses in retrieved context, but it cannot eliminate hallucination, because both the LLM and the retrieval step have inherent limitations. LLMs generate text probabilistically, so when the provided context is insufficient or ambiguous they may fabricate plausible-sounding information to fill the gaps. When the retrieved context lacks the necessary detail, the LLM is also more likely to fall back on its pre-trained internal knowledge, which may not match the query or the context. The model can then prioritize internal associations over the retrieved content, especially when the query is complex or unclear, which contributes further to hallucination.
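
One common way to reduce (not remove) this risk is to discard weakly matching chunks before prompting, so the LLM is told when nothing relevant was found. The sketch below is a plain-Python illustration under stated assumptions: the helper names, the example scores, and the cutoff of 0.5 are all hypothetical. LlamaIndex's `SimilarityPostprocessor`, imported at the top of this notebook, applies a similar cutoff idea.

```python
def filter_by_score(scored_chunks, cutoff):
    """Keep only (text, score) pairs whose similarity score meets the cutoff."""
    return [(text, score) for text, score in scored_chunks if score >= cutoff]


def build_context(scored_chunks, cutoff=0.5):
    """Join the chunks that pass the cutoff, or return None if none qualify."""
    kept = filter_by_score(scored_chunks, cutoff)
    if not kept:
        return None  # caller can answer "not found in the documents" instead
    return "\n".join(text for text, _ in kept)


# Hypothetical retrieval results: (chunk text, similarity score)
retrieved = [("5 Free Books on Machine Learning Algorithms", 0.31),
             ("Image by Author", 0.12)]

context = build_context(retrieved, cutoff=0.5)
if context is None:
    print("No sufficiently relevant context was retrieved.")
```

A suitable cutoff depends on the embedding model and the data, so it needs to be tuned rather than copied from this example.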

In [48]:
# Prompt to cause hallucination
query_hallucination = "What does 'Hands-On Machine Learning with R' suggest as the best practice for training a neural network on Mars colony data?"
context_hallucination = database_query(query_hallucination, retriever, top_k=TOP_K)

# Generate prompt with context
prompt_hallucination = prompt_with_context(context_hallucination, query_hallucination)
print("Prompt with context for Hallucination Query:\n", prompt_hallucination)
print("======================================================" )
# Get LLM response
response_hallucination = get_llm_response(context_hallucination, query_hallucination)
print("Response with context for Hallucination Query:\n", response_hallucination)
print("======================================================" )
# Report retrieved context
print("Retrieved Context for Hallucination Query:\n", context_hallucination)

Prompt with context for Hallucination Query:
 
Context: 5 Free Books on Machine Learning Algorithms You Must Read

Gain insights on machine learning algorithms through practical code examples, detailed diagrams, mathematical explanations, hands-on exercises, and real-world projects.



Image by Author



If you are a machine learning student, researcher, or practitioner, it is crucial for your career growth to have a deep understanding of how each algorithm works and the various techniques to enhance model performance. Nowadays, many individuals tend to focus solely on the code, data, and pre-trained models, often without fully comprehending the machine learning model's algorithm or architecture. They simply fine-tune the model on a new dataset and adjust hyperparameters to improve performance. However, to truly excel in building your own model and advancing AI technology to the level of systems like ChatGPT, you must start with the basics, delving into linear algebra and mastering the