# Creating a Simple RAG (Retrieval-Augmented Generation)

## What is RAG? 

- Retrieval-Augmented Generation (RAG) is an industry-standard pattern for building applications that use Large Language Models (LLMs) to reason over specific or proprietary data that the models were not trained on. Simply put, RAG enhances the capabilities of LLMs by combining a retrieval mechanism with a generation mechanism.

## Why create a RAG?

### Key intentions and benefits of RAG:
1. ### Enhanced knowledge context:
    - Access to external information: 
    - Context Relevance: 

2. ### Improved Response quality:
    - Accuracy and specificity: 
    - Diversity of information:

3. ### Scalability and adaptability:
    - Dynamic updates: RAG systems can incorporate new data by adding it to the retrieval index, without retraining the underlying language model. This makes them adaptable to new information and rapidly changing fields.
    - Domain specific knowledge: By indexing domain-specific documents, RAG systems can be tailored to provide expertise in particular areas, such as medical, legal, technical, financial, or other specialized domains.

4. ### Efficiency and performance:
    - Reduced model size requirements: RAG can reduce the need for extremely large LLMs, since some of the knowledge storage and processing is offloaded to the retrieval system.
    - Focused Computation: Instead of generating responses solely from the model's internal knowledge, RAG narrows down the relevant information, making the generation process more focused and efficient.

5. ### Use cases and applications
    1. Question answering: RAG is well suited to Q&A systems where specific, accurate answers are required.
    2. Customer support: Chatbots can provide precise information by retrieving relevant documentation or FAQs.
    3. Research assistance: Retrieving pertinent literature and generating summaries or insights based on the latest studies.
    4. Content creation: Supporting writers and content creators by retrieving related content and generating high-quality text based on that information.
    

## How does RAG work?

1. Query processing 
    - A user query is processed to generate an embedding or vector representation of the query
2. Document Retrieval:
    - The query embedding is used to retrieve relevant documents or passages from the vector database using similarity search techniques.
3. Contextual Generation:
    - The retrieved documents are combined with the original query to form a rich context.
    - The language model generates a response based on this context, leveraging both the retrieved information and its pre-trained knowledge.
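
The three steps above can be sketched without any external service. The example below is only an illustration: `toy_embed` is a hypothetical stand-in that counts words instead of calling a real embedding model, and `retrieve_top_k` and `build_prompt` are names made up for this sketch. The final generation call is left out; the prompt it would receive is printed instead.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in embedding: a bag-of-words count vector (not a real model)."""
    cleaned = text.lower().replace("?", "").replace(".", "")
    return Counter(cleaned.split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_top_k(query, docs, k=2):
    """Steps 1 and 2: embed the query, then rank documents by similarity."""
    query_vec = toy_embed(query)
    ranked = sorted(
        docs,
        key=lambda doc: cosine_similarity(query_vec, toy_embed(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, retrieved):
    """Step 3: combine the retrieved passages with the original query."""
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"

docs = [
    "Vector databases store embeddings for similarity search.",
    "Bread is baked in an oven.",
    "Similarity search finds the nearest embeddings to a query.",
]
query = "How does similarity search use embeddings?"
top = retrieve_top_k(query, docs, k=2)
prompt = build_prompt(query, top)
print(prompt)
```

The unrelated sentence about bread shares no words with the query, so it is ranked last and never reaches the prompt. A real system replaces `toy_embed` with a learned embedding model, which also matches passages that use different wording.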


## Example Workflow

1. Input query: "What are the latest advancements in AI for healthcare?"
2. Retrieval Step: Retrieve documents on AI advancements in healthcare from a knowledge base or database.
3. Generation step: Use the retrieved documents to generate a detailed and accurate response. 


# Creating a simple Retrieval-Augmented Generation system

Let's build a simple RAG system.

Creating a RAG system with OpenAI (or another provider) involves several steps:
- Preparing your data
- Generating embeddings
- Setting up a vector database
- Retrieving relevant documents
- Generating responses

## Step 1: Preparing your data

Collect and pre-process your data


In [7]:
documents = [
    "AI is transforming healthcare by enabling doctors to diagnose diseases more accurately.",
    "Machine learning algorithms can predict patient outcomes and suggest personalized treatments.",
    "Natural language processing is being used to analyze medical records and improve patient care.",
]
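
The three documents above are already short, single-sentence passages. For longer sources, pre-processing usually includes normalizing whitespace and splitting the text into overlapping chunks before embedding. The sketch below shows one possible approach; `normalize`, `chunk_words`, and the default sizes are illustrative assumptions, not part of this notebook's pipeline.

```python
def normalize(text):
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())

def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears whole in one of the chunks.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = normalize(text).split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

sample_chunks = chunk_words("a b c d e f g", chunk_size=3, overlap=1)
print(sample_chunks)
```

Each chunk would then be embedded and indexed as its own document, in place of the hand-written list used here.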

## Step 2: Set Up Embedding Generation

Use an OpenAI embedding model (here `text-embedding-ada-002`) or another embedding model to generate embeddings for your documents. You'll need an OpenAI API key for this.

In [None]:
!pip install --upgrade openai numpy

In [26]:
from openai import OpenAI
import numpy as np
import os

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

def get_embedding(text, model="text-embedding-ada-002"):
    response = client.embeddings.create(input=[text], model=model).data[0].embedding
    return response

# Generate embeddings for each document

embeddings = [get_embedding(doc) for doc in documents]
embeddings = np.array(embeddings)

print(embeddings)

[[-0.01953129  0.02010844  0.01069696 ... -0.00245125 -0.01942636
   0.00184131]
 [-0.01801853  0.00049462  0.02333334 ... -0.00171759 -0.01698149
   0.00535695]
 [-0.02200545  0.02710145  0.01247619 ...  0.00448152 -0.0134349
  -0.0170124 ]]


## Step 3: Setting up a vector database

For simplicity, we will use PyNNDescent, a library for approximate nearest neighbor search over dense vectors, as a stand-in for a full vector database.

In [None]:
!pip install pynndescent

In [27]:
import pynndescent

# Create a nearest neighbors index

index = pynndescent.NNDescent(embeddings, metric='cosine')

# Find the nearest neighbors of the first document

query_embedding = embeddings[0].reshape(1, -1)
indices, distances = index.query(query_embedding, k=5)

# Print the nearest neighbors

print("Indices", indices)
print("Distances", distances)


Indices [[0 1 2 2 2]]
Distances [[0.         0.11787299 0.12762608 1.         1.        ]]
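
The output above repeats index 2 with a distance of 1.0 in the last two slots, presumably because `k=5` asks for more neighbors than the three indexed documents. With so few vectors, an exact search is a useful cross-check on the approximate index. The sketch below is a plain-Python illustration that uses made-up two-dimensional vectors rather than the real embeddings; `cosine_distance` and `exact_nearest` are hypothetical helper names.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def exact_nearest(query_vec, vectors, k):
    """Score every vector and return the k closest, capped at len(vectors)."""
    k = min(k, len(vectors))
    scored = sorted((cosine_distance(query_vec, v), i) for i, v in enumerate(vectors))
    indices = [i for _, i in scored[:k]]
    distances = [d for d, _ in scored[:k]]
    return indices, distances

# Toy 2-D vectors standing in for document embeddings.
toy_vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
toy_indices, toy_distances = exact_nearest(toy_vectors[0], toy_vectors, k=5)
print(toy_indices, toy_distances)
```

Because `k` is capped at the number of vectors, the result holds three entries instead of five, with no repeated indices. A distance of 0 means the two vectors point in the same direction, which is why the first document is its own nearest neighbor.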


## Step 4: Define the Retrieval Function
Create a function to retrieve the most relevant documents based on a query.

In [28]:
def retrieve(query, k=3):
    # Embed the query text and shape it as a single-row 2-D array for the index
    query_embedding = np.array([get_embedding(query)])
    # Look up the k nearest document embeddings
    indices, distances = index.query(query_embedding, k=k)
    return [documents[i] for i in indices[0]]
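
The Step 3 output shows that a query can return the same index more than once, and that some neighbors are much farther away than others. One option is to post-process the raw result before building the context. The sketch below is an assumption-laden illustration: `select_documents` and the `max_distance` cutoff of 0.5 are hypothetical, and the sample row is copied from the Step 3 output.

```python
def select_documents(indices, distances, documents, max_distance=0.5):
    """Keep each document once, in ranked order, if it is close enough."""
    seen = set()
    selected = []
    for i, d in zip(indices, distances):
        i = int(i)  # also accepts numpy integer types
        if i in seen or d > max_distance:
            continue  # skip repeated indices and distant neighbors
        seen.add(i)
        selected.append(documents[i])
    return selected

docs = ["doc A", "doc B", "doc C"]
# Row copied from the Step 3 output: indices and cosine distances for k=5.
row_indices = [0, 1, 2, 2, 2]
row_distances = [0.0, 0.11787299, 0.12762608, 1.0, 1.0]
selected = select_documents(row_indices, row_distances, docs)
print(selected)
```

A suitable cutoff depends on the embedding model and the data, so the value used here should be treated as a placeholder to tune.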

## Step 5: Define the Generation Function
Create a function to generate responses using the retrieved documents as context.

## Step 6: Test the RAG System
The cell below defines `generate_response` and then tests it with a sample query.

In [38]:
def generate_response(query):
    retrieved_docs = retrieve(query)
    context = " ".join(retrieved_docs)
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"}
    ]
    
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        max_tokens=150,
        temperature=0.7
    )
    
    return response.choices[0].message.content.strip()

# Test the RAG system
query = "How is AI used in healthcare?"
response = generate_response(query)
print(response)

AI is utilized in healthcare in various ways, such as enabling doctors to diagnose diseases more accurately, predicting patient outcomes using machine learning algorithms, suggesting personalized treatments, and analyzing medical records to improve patient care through natural language processing. Additionally, AI is also being used for tasks like drug discovery, robotic surgeries, remote patient monitoring, and personalized medicine.
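
The last sentence of the sample response lists uses (drug discovery, robotic surgeries, remote patient monitoring) that do not appear in any of the three source documents, so the model is drawing on its pre-trained knowledge as well as the retrieved context. If answers should stay within the retrieved passages, one common approach is to say so in the system message and to number the passages. The sketch below only builds the message list; `build_messages` is a hypothetical helper, and there is no guarantee that a model follows the instruction every time.

```python
def build_messages(query, retrieved_docs):
    """Build chat messages that number each passage and restrict the answer to them."""
    context = "\n".join(
        f"[{n}] {doc}" for n, doc in enumerate(retrieved_docs, start=1)
    )
    system = (
        "Answer using only the numbered context passages. "
        "If the context does not contain the answer, say so."
    )
    user = f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

example_messages = build_messages(
    "How is AI used in healthcare?",
    ["AI is transforming healthcare by enabling doctors to diagnose diseases more accurately."],
)
print(example_messages[1]["content"])
```

The returned list has the same shape as the `messages` argument used in `generate_response`, so it could be passed to the chat completion call in its place.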


In [None]:
# Install packages

!pip install --upgrade openai numpy pynndescent

In [None]:
#!/usr/bin/env python3

from openai import OpenAI
import numpy as np
import pynndescent

# 1. Create the OpenAI client (reads the API key from the OPENAI_API_KEY environment variable)
openai = OpenAI()

# 2. Document sources
documents = [
    "AI is transforming healthcare by enabling doctors to diagnose diseases more accurately.",
    "Machine learning algorithms can predict patient outcomes and suggest personalized treatments.",
    "Natural language processing is being used to analyze medical records and improve patient care.",
]

# 3. Define a function to get embeddings
def get_embedding(text, model="text-embedding-ada-002"):
    # Use the `openai` client created in step 1 (this script defines no `client`)
    response = openai.embeddings.create(input=[text], model=model).data[0].embedding
    return response

# 4. Generate embeddings for each document
embeddings = np.array([get_embedding(doc) for doc in documents])

# 5. Create a nearest neighbors index using pynndescent
index = pynndescent.NNDescent(embeddings, metric='cosine')

# 6. Define a function to retrieve relevant documents
def retrieve(query, k=3):
    query_embedding = np.array([get_embedding(query)])
    indices, distances = index.query(query_embedding, k=k)
    return [documents[i] for i in indices[0]]

# 7. Define a function to generate a response using OpenAI's Chat API
def generate_response(query):
    retrieved_docs = retrieve(query)
    context = " ".join(retrieved_docs)
    messages = [
        {"role": "system", "content": "You are a Healthcare AI expert."},
        {"role": "user", "content": f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"}
    ]
    
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        max_tokens=150,
        temperature=0.7
    )
    
    return response.choices[0].message.content.strip()

# 8. Test the RAG system
queries = [
    "How is AI used in healthcare?",
    "What are the applications of AI in healthcare?"
]

for query in queries:
    response = generate_response(query)
    print(f"Query: {query}\nResponse: {response}\n")

### Explanation:

1. **Imports and Setup**:
   - Import necessary modules: `openai`, `numpy`, and `pynndescent`.
   - Initialize the OpenAI client, which reads the API key from the `OPENAI_API_KEY` environment variable.

2. **Document Sources**:
   - Define a list of documents that will be used as the source material for the RAG system.

3. **Embedding Generation**:
   - Define a function `get_embedding` to generate embeddings using OpenAI's embedding model.
   - The function calls OpenAI's `embeddings.create` and returns the embedding for the given text.

4. **Generate Embeddings**:
   - Generate embeddings for each document in the list using the `get_embedding` function and store them in a numpy array.

5. **Nearest Neighbors Index (a simple stand-in; replace with an actual vector database)**:
   - Create a nearest neighbors index using `pynndescent` for efficient document retrieval based on embeddings.

6. **Document Retrieval**:
   - Define a function `retrieve` to find and return the most relevant documents for a given query.
   - The function uses the `index.query` method to find the nearest neighbors of the query embedding.

7. **Response Generation**:
   - Define a function `generate_response` to generate a response using OpenAI's Chat API.
   - The function combines the retrieved documents into a context and constructs a prompt for the chat model.
   - It calls `openai.chat.completions.create` to generate a response.

8. **Testing the RAG System**:
   - Test the RAG system with a list of queries and print the responses.

# RAG (Retrieval-Augmented Generation) Architecture

```mermaid
graph TD;
    A[Set OpenAI API Key] --> B[Document Sources];
    B --> C[Define Function to Get Embeddings];
    C --> D[Generate Embeddings for Each Document];
    D --> E[Create Nearest Neighbors Index Using pynndescent];
    E --> F[Define Function to Retrieve Relevant Documents];
    F --> G[Define Function to Generate a Response Using OpenAI's Chat API];
    G --> H[Test the RAG System];

    subgraph Retrieve-Generate Cycle
        F --> G
    end

    subgraph Testing Queries
        H --> I[How is AI used in healthcare?];
        H --> J[What are the applications of AI in healthcare?];
    end

    I --> K[Generate Response for Query];
    J --> K[Generate Response for Query];

    K --> L[Print Response];
```


In [None]:
!pip install --upgrade openai numpy pynndescent portkey_ai

In [11]:
import os

from openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

# Route OpenAI-style requests through the Portkey gateway to Anthropic
gateway = OpenAI(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    base_url="http://localhost:8787/v1",  # Local gateway; use PORTKEY_GATEWAY_URL for the hosted gateway
    default_headers=createHeaders(
        provider="anthropic",
        # api_key="port_key"  # Grab from https://app.portkey.ai; not needed when running locally
    ),
)

chat_complete = gateway.chat.completions.create(
    model="claude-3-sonnet-20240229",
    messages=[{"role": "user", "content": "What is the best way to build a RAG system and streamline the process?"}],
    max_tokens=512
)

print(chat_complete.choices[0].message.content.strip())

Building a Retrieval-Augmented Generation (RAG) system involves integrating an information retrieval component with a language generation model. The retrieval component aims to find relevant information from a large corpus of text, while the generation model uses this retrieved information to generate natural language responses. Here's a general approach to building a RAG system and streamlining the process:

1. **Data Preparation**:
   - Acquire or create a large corpus of text data relevant to your domain or task.
   - Clean and preprocess the text data, including tasks like tokenization, normalization, and filtering.
   - Index the text data using an efficient information retrieval system like Apache Lucene, Elasticsearch, or a custom solution.

2. **Retrieval Component**:
   - Implement a retrieval mechanism that can efficiently query the indexed corpus and retrieve relevant passages or documents based on the input query or context.
   - Explore techniques like sparse retrieval (e.