# Implementing RAG

## Loading RAG Data

In [1]:
import os
import json

def load_json_files(directory):
    """
    Load all JSON files from the specified directory.

    Args:
        directory (str): The path to the directory containing the JSON files.

    Returns:
        list: A list of dictionaries containing the data from all JSON files.
    """
    data = []
    for filename in os.listdir(directory):
        if filename.endswith(".json"):
            file_path = os.path.join(directory, filename)
            with open(file_path, 'r', encoding='utf-8') as f:
                data.append(json.load(f))
    return data

def extract_text(data):
    """
    Extract text data from the JSON structure, supporting two formats.

    Args:
        data (list): A list of dictionaries containing JSON data.

    Returns:
        list: A list of dictionaries with URLs or file paths and their corresponding text chunks.
    """
    documents = []
    for entry in data:
        for key, value in entry.items():
            if isinstance(value, list):  # First format with URL keys
                for chunk in value:
                    documents.append({"source": key, "text": chunk})
            elif isinstance(value, dict):  # Second format with file paths as keys
                if "text" in value:
                    documents.append({"source": key, "text": value["text"]})
    return documents

# Directory containing the JSON files
directory = "RAG_data"

# Load and preprocess data
json_data = load_json_files(directory)
documents = extract_text(json_data)

# Example output
print(f"Loaded {len(documents)} documents.")
print("Sample document:", documents[0])

Loaded 30011 documents.
Sample document: {'source': 'https://github.com/ray-project/llm-numbers#1-mb-gpu-memory-required-for-1-token-of-output-with-a-13b-parameter-model', 'text': 'Skip to content Navigation Menu Toggle navigation Sign in Product GitHub Copilot Write better code with AI Security Find and fix vulnerabilities Actions Automate any workflow Codespaces Instant dev environments Issues Plan and track work Code Review Manage code changes Discussions Collaborate outside of code Code Search Find more, search less Explore All features Documentation GitHub Skills Blog Solutions By company size Enterprises Small and medium teams Startups By use case DevSecOps DevOps CI/CD View all use cases By industry Healthcare Financial services Manufacturing Government View all industries View all solutions Resources Topics AI DevOps Security Software Development View all Explore Learning Pathways White papers, Ebooks, Webinars Customer Stories Partners Open Source GitHub Sponsors Fund open sou
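
For reference, the two JSON layouts that extract_text distinguishes can be sketched as follows. The sample URL, file path, and field values are made up for illustration; only the list-versus-dict distinction and the "text" key come from the function above.

```python
# Hypothetical sample entries; only their shapes mirror what extract_text expects.
url_format = {
    "https://example.com/page": ["first chunk", "second chunk"],
}
file_format = {
    "docs/lecture1.pdf": {"text": "full text of the file", "pages": 12},
}

def extract_text(data):
    """Same logic as the notebook function, repeated so this sketch runs on its own."""
    documents = []
    for entry in data:
        for key, value in entry.items():
            if isinstance(value, list):  # URL key -> list of text chunks
                documents.extend({"source": key, "text": chunk} for chunk in value)
            elif isinstance(value, dict) and "text" in value:  # file path -> dict
                documents.append({"source": key, "text": value["text"]})
    return documents

sample_documents = extract_text([url_format, file_format])
for doc in sample_documents:
    print(doc)
```

Each chunk of a URL entry becomes its own document, so several documents can share the same source value.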

## Generating Embeddings

We use the text-embedding-005 embedding model through the Vertex AI API.

In [3]:
from google.cloud import aiplatform
import vertexai
from google.auth import load_credentials_from_file
from langchain_google_vertexai import VertexAIEmbeddings

In [4]:
# Load the service-account key; the project_id it returns is not used below.
credentials, project_id = load_credentials_from_file("./GSuite Text Extraction Creds/vertex_ai_key.json")
vertexai.init(credentials=credentials, project="90458358443", location="us-central1")

In [20]:
embedding_model = VertexAIEmbeddings(model_name="text-embedding-005")

In [None]:
texts = [doc["text"] for doc in documents]
embeddings = embedding_model.embed(texts)
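
Embedding roughly 30,000 texts in one call may run into request-size or rate limits, so it can help to send them in fixed-size batches. The following is a minimal sketch, not the notebook's method: embed_in_batches, the batch size, and fake_embed are hypothetical names introduced here, and in the notebook embed_fn would be embedding_model.embed. The LangChain wrapper may already batch internally, in which case this is only needed for finer control.

```python
from typing import Callable, List, Sequence

def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 100,
) -> List[List[float]]:
    """Call embed_fn on consecutive slices of texts and concatenate the results."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        vectors.extend(embed_fn(batch))
    return vectors

def fake_embed(batch: List[str]) -> List[List[float]]:
    # Stand-in embedding function (assumption): one 2-d vector per text.
    return [[float(len(text)), 0.0] for text in batch]

demo_vectors = embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2)
print(demo_vectors)
```

Because the output order matches the input order, the resulting list can still be zipped with documents.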

## Loading Data into ChromaDB Vector Store

We first initialize the ChromaDB vector store. With no path argument, PersistentClient persists its data in the "chroma" directory under the current working directory.

In [10]:
import chromadb
from chromadb.config import Settings

In [11]:
persistent_client = chromadb.PersistentClient()
collection = persistent_client.get_or_create_collection("llm_tutor_collection")

Now we can load the documents and their embeddings into the llm_tutor_collection collection in our ChromaDB instance.

In [12]:
# Several chunks share the same source URL, so the source alone is not a unique ID:
# Chroma then logs "Add of existing embedding ID: <source>" and does not store the duplicates.
# Appending the chunk's position keeps every ID unique.
for i, (doc, embedding) in enumerate(zip(documents, embeddings)):
    collection.add(
        ids=[f"{doc['source']}-{i}"],
        documents=[doc["text"]],
        metadatas=[{"source": doc["source"]}],
        embeddings=[embedding],
    )
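
The loop above issues one collection.add call per chunk. Since collection.add accepts parallel lists, the same records could be written in batches. This is a sketch under assumptions: add_in_batches, the batch size of 500, and FakeCollection are hypothetical and exist only so the example runs without a database. With the real collection object, IDs must be unique within the collection.

```python
def add_in_batches(collection, ids, texts, metadatas, embeddings, batch_size=500):
    """Write records to a collection in slices of at most batch_size items."""
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique")
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        collection.add(
            ids=ids[start:end],
            documents=texts[start:end],
            metadatas=metadatas[start:end],
            embeddings=embeddings[start:end],
        )

class FakeCollection:
    """Stand-in (assumption) that records each add call; use the real collection in practice."""
    def __init__(self):
        self.calls = []

    def add(self, ids, documents, metadatas, embeddings):
        self.calls.append(list(ids))

fake = FakeCollection()
add_in_batches(
    fake,
    ids=["a-0", "a-1", "b-0"],
    texts=["t0", "t1", "t2"],
    metadatas=[{"source": "a"}, {"source": "a"}, {"source": "b"}],
    embeddings=[[0.0], [1.0], [2.0]],
    batch_size=2,
)
print(fake.calls)
```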

## Loading the Generative LLM

In [17]:
from vertexai.generative_models import GenerativeModel, ChatSession

In [18]:
gemini_model = GenerativeModel("gemini-1.5-pro")

In [19]:
chat_session = gemini_model.start_chat()

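def get_chat_response(chat: ChatSession, prompt: str) -> str:
    """Send a prompt to the chat session and return the streamed reply as one string."""
    text_response = []
    # stream=True yields the reply in chunks; collect them and join at the end.
    responses = chat.send_message(prompt, stream=True)
    for chunk in responses:
        text_response.append(chunk.text)
    return "".join(text_response)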

## Setting Up RAG Methods

In [33]:
def retrieve_relevant_documents(query, n_results=5):
    """
    Retrieve the most relevant documents from the ChromaDB vector store.

    Uses the module-level embedding_model and collection defined above.

    Args:
        query (str): The user's question or query.
        n_results (int): Number of results to retrieve.

    Returns:
        str: Concatenated text of the top retrieved documents.
    """
    # Generate an embedding for the query with the text-embedding-005 model
    query_embedding = embedding_model.embed([query])[0]

    # Retrieve top documents
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results
    )

    # results["documents"] holds one list per query; join all hits for our single query
    retrieved_text = " ".join(results["documents"][0])
    return retrieved_text
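
The shape of the query result matters here: collection.query returns one inner list per query embedding, so for a single query all hits sit in results["documents"][0]. The dictionary below is a hand-written stand-in with made-up values that only mimics that shape.

```python
# Hand-written stand-in (assumption) for a Chroma query result with one query
# and n_results=3; real results also carry fields such as distances.
results = {
    "ids": [["id-1", "id-2", "id-3"]],
    "documents": [["chunk one", "chunk two", "chunk three"]],
    "metadatas": [[{"source": "a"}, {"source": "a"}, {"source": "b"}]],
}

# Outer index 0 selects the hits for the first (and only) query.
top_chunks = results["documents"][0]
context = " ".join(top_chunks)
print(context)

# Taking doc[0] from each outer element keeps only the best hit per query.
only_first = " ".join(doc[0] for doc in results["documents"])
print(only_first)
```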

In [36]:
def query(question):
    """Answer a question with Gemini, using retrieved documents as context."""
    # Retrieve relevant documents
    retrieved_text = retrieve_relevant_documents(question)

    # Generate response using Gemini model
    system_prompt = "You are an LLM tutor for a Graduate Student taking a course in LLM and Deep Learning System Performance. The student asks you the following question: "
    prompt = system_prompt + question + "\n\nHere is context for answering the question: " + retrieved_text
    response = get_chat_response(chat_session, prompt)
    return response
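
One way to keep the instructions, the retrieved context, and the question from blurring together is to assemble the prompt from labeled sections. This is a sketch only: build_prompt, the section labels, and the exact instruction wording are assumptions made for illustration, not the notebook's prompt.

```python
def build_prompt(question: str, context: str) -> str:
    """Assemble a tutor prompt with clearly separated sections."""
    # Instruction wording is hypothetical; adjust it to the course and audience.
    instructions = (
        "You are an LLM tutor for a graduate student taking a course in "
        "LLM and Deep Learning System Performance. Answer the student's "
        "question directly, using the context when it is relevant."
    )
    return (
        f"{instructions}\n\n"
        f"Context:\n{context}\n\n"
        f"Student question:\n{question}"
    )

demo_prompt = build_prompt(
    "What is a batch size?",
    "Batch size is the number of samples processed per training step.",
)
print(demo_prompt)
```

The returned string could be passed to get_chat_response in place of the concatenated prompt.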

### Querying RAG Pipeline

In [38]:
question = "I want to build a deep learning model for image classification. What are some best practices for training deep learning models?"
response = query(question)
print(response)

Okay, so your student wants to dive into image classification with deep learning. That's a great area to explore! Here's a breakdown of best practices for training deep learning models, specifically tailored for their question and building on the concepts of epochs and batch size:

**1. Data is King (and Queen!):**

* **Quantity & Quality Matter:**  Deep learning models thrive on data.  For image classification, aim for a large and diverse dataset of labeled images. More data generally leads to better generalization.
* **Cleanliness is Key:**  Garbage in, garbage out, as they say. Preprocess your images to handle variations in lighting, size, orientation, and noise. This ensures the model learns relevant features and not irrelevant artifacts.
* **Split Wisely:** Divide your dataset into training, validation, and test sets. 
    * **Training Set:**  Used to directly train the model's parameters.
    * **Validation Set:**  Used during training to monitor performance, tune hyperparameters