<a href="https://colab.research.google.com/github/anshupandey/AI_Agents/blob/main/AAP_UC3_code_retrieval_augmented_generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Use Retrieval Augmented Generation (RAG) with Codey APIs

### Objective

This notebook demonstrates how to augment output from the Codey APIs with external knowledge. It walks through a code Retrieval Augmented Generation (RAG) pattern that uses [Google Cloud's Generative AI GitHub repository](https://github.com/GoogleCloudPlatform/generative-ai) as the external knowledge source. The notebook uses the [Vertex AI PaLM API for Code](https://cloud.google.com/vertex-ai/docs/generative-ai/code/code-models-overview), the [Embeddings for Text API](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings), a FAISS vector store, and [LangChain 🦜️🔗](https://python.langchain.com/en/latest/).

### Overview

Here is an overview of what we'll go over.

Index Creation:

1. Recursively list the files (.ipynb) in the GitHub repo
2. Extract code and markdown from the files
3. Chunk the code strings, generate embeddings for each chunk, and initialize the vector store

Runtime:

4. User enters a prompt or asks a question as a prompt
5. Try zero-shot prompt
6. Run the prompt using the RAG chain and compare results. We use **code-bison** to generate responses; **code-gecko** and **codechat-bison** can also be used.

### Cost

This tutorial uses billable components of Google Cloud:

- Vertex AI PaLM APIs offered by Google Cloud

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

**Note:** This example uses a local vector store (FAISS). For production use, we recommend a managed, highly scalable vector store such as [Vertex AI Matching Engine](https://cloud.google.com/vertex-ai/docs/vector-search/overview), or [AlloyDB for PostgreSQL](https://cloud.google.com/alloydb/docs/ai/work-with-embeddings) or [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres/features) with the pgvector extension.

### Install libraries

In [1]:
!pip install --upgrade --user -q google-cloud-aiplatform langchain==0.0.332 faiss-cpu==1.7.4


### Restart runtime

In [2]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, you will need to authenticate your environment. To do this, run the cell below. This step is not required if you are using Vertex AI Workbench.

In [1]:
import sys

if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Import libraries

In [2]:
from typing import List
import nbformat
import requests
import time

# LangChain
from langchain.llms import VertexAI
from langchain.embeddings import VertexAIEmbeddings

from langchain.schema.document import Document

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.text_splitter import Language
from langchain.vectorstores import FAISS

from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

# Vertex AI
from google.cloud import aiplatform
import vertexai

print(f"Vertex AI SDK version: {aiplatform.__version__}")

Vertex AI SDK version: 1.60.0


In [3]:
# Initialize project
# Define project information
PROJECT_ID = "jrproject-402905"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}

vertexai.init(project=PROJECT_ID, location=LOCATION)

# Code Generation
code_llm = VertexAI(
    model_name="code-bison@002",
    max_output_tokens=2048,
    temperature=0.1,
    verbose=False,
)

Next, create a GitHub personal access token so that you can list all files in a repository.

- Follow [this link](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) to create a GitHub token with the repo->public_repo scope, then update the `GITHUB_TOKEN` variable below.

In [4]:
# provide GitHub personal access token
GITHUB_TOKEN = "xxxxxxxxxxxxxxxxxx"  # @param {type:"string"}
GITHUB_REPO = "anshuspandey/generative-ai"  # @param {type:"string"}
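
Rather than pasting the token into a notebook cell, you may prefer to read it from the environment so it is not saved with the notebook. The helper below is a minimal sketch of that idea; `get_github_token` and the `GITHUB_TOKEN` environment variable name are assumptions for illustration, not part of the original notebook.

```python
import os


def get_github_token(env_var: str = "GITHUB_TOKEN") -> str:
    """Return the token stored in the given environment variable.

    Hypothetical helper: raises a clear error instead of sending an
    empty Authorization header to the GitHub API.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to a GitHub personal access token."
        )
    return token
```

With the variable set, `GITHUB_TOKEN = get_github_token()` could replace the hard-coded placeholder above.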

# Index Creation

We will use the Google Cloud Generative AI GitHub repository as the data source. First, list all Jupyter Notebook files in the repo and store the list in a text file.

You can skip this step (#1) if you have already run it once and generated the output text file.

### 1. Recursively list the files (.ipynb) in the GitHub repository

In [5]:
# Crawls a GitHub repository and returns a list of all .py and .ipynb files in the repository
def crawl_github_repo(url: str, is_sub_dir: bool, access_token: str = GITHUB_TOKEN):
    ignore_list = ["__init__.py"]

    if not is_sub_dir:
        api_url = f"https://api.github.com/repos/{url}/contents"

    else:
        api_url = url

    headers = {
        "Accept": "application/vnd.github.v3+json",
        "Authorization": f"Bearer {access_token}",
    }

    response = requests.get(api_url, headers=headers)
    response.raise_for_status()  # Check for any request errors

    files = []

    contents = response.json()

    for item in contents:
        if (
            item["type"] == "file"
            and item["name"] not in ignore_list
            and (item["name"].endswith(".py") or item["name"].endswith(".ipynb"))
        ):
            files.append(item["html_url"])
        elif item["type"] == "dir" and not item["name"].startswith("."):
            # Pass the token along so sub-directories use the same credentials
            sub_files = crawl_github_repo(item["url"], True, access_token)
            time.sleep(0.1)
            files.extend(sub_files)

    return files

In [6]:
code_files_urls = crawl_github_repo(GITHUB_REPO, False, GITHUB_TOKEN)

# Write list to a file so you do not have to download each time
with open("code_files_urls.txt", "w") as f:
    for item in code_files_urls:
        f.write(item + "\n")

len(code_files_urls)

206
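
To skip the crawl on later runs, the "write once, read afterwards" idea can be wrapped in a small helper. This is a sketch under assumptions: `load_or_crawl` is a hypothetical name, and the `crawl` argument stands in for a call such as the `crawl_github_repo` cell above.

```python
from pathlib import Path
from typing import Callable, List


def load_or_crawl(cache_path: str, crawl: Callable[[], List[str]]) -> List[str]:
    """Return cached URLs if the cache file exists, otherwise crawl and cache them."""
    path = Path(cache_path)
    if path.exists():
        # Reuse the list written by an earlier run
        return path.read_text().splitlines()

    urls = crawl()
    # One URL per line, matching the format written by the cell above
    path.write_text("".join(url + "\n" for url in urls))
    return urls
```

In this notebook the call might look like `load_or_crawl("code_files_urls.txt", lambda: crawl_github_repo(GITHUB_REPO, False, GITHUB_TOKEN))`.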

In [7]:
code_files_urls[0:10]

['https://github.com/GoogleCloudPlatform/generative-ai/blob/main/conversation/data-store-status-checker/data_store_checker.ipynb',
 'https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/embedding-similarity-visualization.ipynb',
 'https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/intro-textemb-vectorsearch.ipynb',
 'https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/intro_embeddings_tuning.ipynb',
 'https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/intro_multimodal_embeddings.ipynb',
 'https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/use-cases/outlier-detection/bq-vector-search-log-outlier-detection.ipynb',
 'https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/vector-search-quickstart.ipynb',
 'https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/chat-completions/intro_chat_completions_api.ipynb',
 'https://github.com/GoogleCloudPla

### 2. Extract code from the Jupyter notebooks

You could also include .py files, shell scripts, etc.

In [8]:
# Extracts the python code from an ipynb file from github
def extract_python_code_from_ipynb(github_url, cell_type="code"):
    raw_url = github_url.replace("github.com", "raw.githubusercontent.com").replace(
        "/blob/", "/"
    )

    response = requests.get(raw_url)
    response.raise_for_status()  # Check for any request errors

    notebook_content = response.text

    notebook = nbformat.reads(notebook_content, as_version=nbformat.NO_CONVERT)

    # Join the source of every matching cell; a notebook without such cells yields ""
    sources = [cell.source for cell in notebook.cells if cell.cell_type == cell_type]

    return "\n".join(sources)


def extract_python_code_from_py(github_url):
    raw_url = github_url.replace("github.com", "raw.githubusercontent.com").replace(
        "/blob/", "/"
    )

    response = requests.get(raw_url)
    response.raise_for_status()  # Check for any request errors

    python_code = response.text

    return python_code
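
The two steps used above (rewriting the GitHub page URL into a raw-content URL, then collecting the code cells) can be tried offline. The sketch below uses only the standard library and parses the notebook JSON directly instead of going through `nbformat`; the function names are hypothetical, and it assumes the nbformat v4 layout with a top-level `cells` list.

```python
import json


def to_raw_url(github_url: str) -> str:
    """Rewrite a github.com blob URL into its raw.githubusercontent.com form."""
    return github_url.replace("github.com", "raw.githubusercontent.com").replace(
        "/blob/", "/"
    )


def code_cells_from_notebook_json(notebook_json: str) -> str:
    """Concatenate the source of all code cells in a notebook JSON string."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # On disk, a cell's source may be a single string or a list of lines
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return "\n".join(sources)
```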

In [9]:
with open("code_files_urls.txt") as f:
    code_files_urls = f.read().splitlines()
len(code_files_urls)

206

In [10]:
code_strings = []

for i in range(0, len(code_files_urls)):
    if code_files_urls[i].endswith(".ipynb"):
        content = extract_python_code_from_ipynb(code_files_urls[i], "code")
        doc = Document(
            page_content=content, metadata={"url": code_files_urls[i], "file_index": i}
        )
        code_strings.append(doc)



### 3. Chunk the code strings, generate embeddings & initialize the vector store

We need to split the code into chunks that the LLM can use for code generation, so it's crucial to choose the right chunking approach and chunk size.
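
To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sliding-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter first tries language-aware separators such as class and function boundaries); the sketch only illustrates how consecutive chunks share `chunk_overlap` characters. The function name is hypothetical.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        # Stop once the current window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

Larger chunks carry more surrounding code into the prompt but leave room for fewer retrieved results; the overlap reduces the chance that a function is cut exactly at a chunk boundary.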

In [11]:
# Utility functions for Embeddings API with rate limiting
def rate_limit(max_per_minute):
    period = 60 / max_per_minute
    print("Waiting")
    while True:
        before = time.time()
        yield
        after = time.time()
        elapsed = after - before
        sleep_time = max(0, period - elapsed)
        if sleep_time > 0:
            print(".", end="")
            time.sleep(sleep_time)


class CustomVertexAIEmbeddings(VertexAIEmbeddings):
    requests_per_minute: int
    num_instances_per_batch: int

    # Overriding embed_documents method
    def embed_documents(self, texts: List[str]):
        limiter = rate_limit(self.requests_per_minute)
        results = []
        docs = list(texts)

        while docs:
            # Working in batches because the API accepts maximum 5
            # documents per request to get embeddings
            head, docs = (
                docs[: self.num_instances_per_batch],
                docs[self.num_instances_per_batch :],
            )
            chunk = self.client.get_embeddings(head)
            results.extend(chunk)
            next(limiter)

        return [r.values for r in results]
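
The batching and pacing logic in `embed_documents` can be reasoned about without calling the API. The sketch below, using hypothetical helper names, splits a list into batches and computes the minimum spacing between requests; the numbers plugged in at the end are the ones used later in this notebook (1029 chunks, batches of 5, 100 requests per minute).

```python
import math
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start : start + batch_size])


def min_seconds_between_requests(requests_per_minute: int) -> float:
    """Spacing that keeps the request rate at or below the per-minute quota."""
    return 60 / requests_per_minute


num_chunks = 1029
batch_size = 5
requests_per_minute = 100

num_requests = math.ceil(num_chunks / batch_size)
print(num_requests, min_seconds_between_requests(requests_per_minute))
```

Under these settings the indexing step issues a couple of hundred embedding requests spaced at least 0.6 seconds apart, so expect it to take roughly two minutes or more.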

In [12]:
# Chunk code strings
text_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=2000, chunk_overlap=200
)


texts = text_splitter.split_documents(code_strings)
print(len(texts))

# Initialize Embedding API
EMBEDDING_QPM = 100
EMBEDDING_NUM_BATCH = 5
embeddings = CustomVertexAIEmbeddings(
    requests_per_minute=EMBEDDING_QPM,
    num_instances_per_batch=EMBEDDING_NUM_BATCH,
    model_name="textembedding-gecko@latest",
)

# Create Index from embedded code chunks
db = FAISS.from_documents(texts, embeddings)

# Init your retriever.
retriever = db.as_retriever(
    search_type="similarity",  # Also test "mmr"
    search_kwargs={"k": 5},
)

retriever

1029
Waiting
.............................................................................................................................................................................................................

VectorStoreRetriever(tags=['FAISS', 'CustomVertexAIEmbeddings'], vectorstore=<langchain.vectorstores.faiss.FAISS object at 0x794e93687250>, search_kwargs={'k': 5})
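
Conceptually, a similarity retriever with `k=5` returns the five stored vectors closest to the query embedding. The brute-force sketch below illustrates that idea with plain Python lists; the use of Euclidean distance and the function name `top_k` are assumptions for illustration, and a real FAISS index is far more efficient.

```python
import math
from typing import List, Sequence, Tuple


def top_k(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k vectors nearest to the query."""
    scored = [(i, math.dist(query, vector)) for i, vector in enumerate(vectors)]
    # Smaller distance means more similar
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]
```

The indices returned here play the role of the chunk positions that the vector store maps back to `Document` objects.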

# Runtime
### 4. User enters a prompt or asks a question as a prompt

In [13]:
user_question = "Create a Python function that takes a prompt and predicts using langchain.llms interface with Vertex AI text-bison model"

In [14]:
# Define prompt templates


# Zero Shot prompt template
prompt_zero_shot = """
    You are a proficient python developer. Respond with the syntactically correct & concise code for the question below.

    Question:
    {question}

    Output Code :
    """

prompt_prompt_zero_shot = PromptTemplate(
    input_variables=["question"],
    template=prompt_zero_shot,
)


# RAG template
prompt_RAG = """
    You are a proficient python developer. Respond with the syntactically correct code for the question below. Make sure you follow these rules:
    1. Use the context to understand the APIs and how to use them.
    2. Do not add license information to the output code.
    3. Do not include Colab code in the output.
    4. Ensure all the requirements in the question are met.

    Question:
    {question}

    Context:
    {context}

    Helpful Response :
    """

prompt_RAG_template = PromptTemplate(
    template=prompt_RAG, input_variables=["context", "question"]
)

qa_chain = RetrievalQA.from_llm(
    llm=code_llm,
    prompt=prompt_RAG_template,
    retriever=retriever,
    return_source_documents=True,
)
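
At query time the chain retrieves the top chunks and substitutes them into the `{context}` slot of the RAG template. The sketch below shows that "stuffing" step with plain string formatting; the shortened template, the separator between chunks, and the function name are assumptions for illustration, and the exact text LangChain assembles may differ.

```python
from typing import List

# Shortened stand-in for the RAG template defined above
RAG_TEMPLATE = (
    "Question:\n{question}\n\n"
    "Context:\n{context}\n\n"
    "Helpful Response :\n"
)


def build_rag_prompt(question: str, chunks: List[str], separator: str = "\n\n") -> str:
    """Join the retrieved chunks and insert them into the prompt template."""
    context = separator.join(chunks)
    return RAG_TEMPLATE.format(question=question, context=context)


example_prompt = build_rag_prompt(
    "How do I create embeddings?",
    ["embeddings = VertexAIEmbeddings()", "vectors = embeddings.embed_documents(texts)"],
)
print(example_prompt)
```

Because every retrieved chunk ends up in one prompt, `k` and `chunk_size` together determine how much of the model's input budget the context consumes.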

### 5. Try zero-shot prompt

In [15]:
response = code_llm.predict(text=user_question, max_output_tokens=2048, temperature=0.1)
print(response)

```python
def predict_with_langchain_llms(prompt):
    """Predicts the next token using the langchain.llms interface with Vertex AI text-bison model.

    Args:
        prompt: The input text to predict the next token for.

    Returns:
        The predicted next token.
    """

    # Import the necessary libraries.
    import requests

    # Set the endpoint and API key.
    endpoint = "https://us-central1-aiplatform.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/us-central1/models/text-bison-001:predict"
    api_key = "YOUR_API_KEY"

    # Set the request body.
    body = {
        "instances": [
            {
                "text": prompt,
            }
        ]
    }

    # Make the request.
    response = requests.post(endpoint, json=body, headers={"Authorization": f"Bearer {api_key}"})

    # Get the predicted next token.
    predicted_token = response.json()["predictions"][0]["text"]

    # Return the predicted next token.
    return predicted_token
```


### 6. Run prompt using RAG Chain & compare results
We use code-bison to generate the response; code-gecko and codechat-bison can also be used.

In [16]:
results = qa_chain({"query": user_question})
print(results["result"])

```python
from langchain_google_vertexai import ChatVertexAI

def predict_with_langchain_llms(prompt):
  """Predicts using langchain.llms interface with Vertex AI text-bison model.

  Args:
    prompt: The prompt to predict on.

  Returns:
    The prediction from the model.
  """

  # Initialize the ChatVertexAI client.
  chat = ChatVertexAI(model="text-bison-001")

  # Predict the response.
  response = chat.invoke([HumanMessage(content=prompt)])

  # Return the prediction.
  return response.content


if __name__ == "__main__":
  # Get the user prompt.
  prompt = input("Enter your prompt: ")

  # Predict the response.
  prediction = predict_with_langchain_llms(prompt)

  # Print the prediction.
  print(f"Prediction: {prediction}")

```
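
Because the chain was created with `return_source_documents=True`, the result also carries the retrieved chunks under `results["source_documents"]`, each with the `url` metadata set during indexing. The hypothetical helper below counts how many retrieved chunks came from each notebook, which helps when comparing the RAG answer with the zero-shot one; it only assumes objects that expose a `metadata` dict.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def summarize_sources(source_documents: Iterable) -> List[Tuple[str, int]]:
    """Count retrieved chunks per source URL, most frequent first."""
    counts = Counter(
        doc.metadata.get("url", "<unknown>") for doc in source_documents
    )
    return counts.most_common()


# Stand-in documents for illustration; in the notebook, pass results["source_documents"]
fake_docs = [
    SimpleNamespace(metadata={"url": "notebook_a.ipynb"}),
    SimpleNamespace(metadata={"url": "notebook_b.ipynb"}),
    SimpleNamespace(metadata={"url": "notebook_a.ipynb"}),
]
print(summarize_sources(fake_docs))
```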


### Let's try another prompt

In [17]:
user_question = "Create python function that takes text input and returns embeddings using LangChain with Vertex AI textembedding-gecko model"


response = code_llm.predict(text=user_question, max_output_tokens=2048, temperature=0.1)
print(response)

```python
def get_embeddings_langchain(text):
    """Gets embeddings for a given text using LangChain with Vertex AI textembedding-gecko model.

    Args:
        text: The text to get embeddings for.

    Returns:
        A list of embeddings for the text.
    """

    # Import the necessary libraries.
    import google.cloud.aiplatform as aiplatform

    # Get the model.
    model = aiplatform.gapic.ModelServiceClient.get_model(
        name="projects/YOUR_PROJECT_ID/locations/YOUR_LOCATION/models/YOUR_MODEL_ID"
    )

    # Get the model's input and output tensor names.
    input_tensor_name = model.input_tensor_names[0]
    output_tensor_name = model.output_tensor_names[0]

    # Create the input instance.
    input_instance = {"inputs": text}

    # Get the embeddings.
    response = aiplatform.gapic.PredictionServiceClient.predict(
        name=model.name, instances=[input_instance]
    )

    # Get the embeddings from the response.
    embeddings = response.predictions[0][output

In [18]:
results = qa_chain({"query": user_question})
print(results["result"])

```python
from langchain.llms import VertexAI
from langchain.embeddings import VertexAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

# Initialize Vertex AI SDK
vertexai.init(project=PROJECT_ID, location=LOCATION)

# Load documents from GCS
loader = GCSDirectoryLoader(
    project_name=PROJECT_ID, bucket="contractunderstandingatticusdataset"
)
documents = loader.load()

# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = text_splitter.split_documents(documents)

# Create text embeddings using Vertex AI
embedding = VertexAIEmbeddings()
contracts_vector_db = Chroma.from_documents(docs, embedding)

# Create retriever
retriever = contracts_vector_db.as_retriever(
    search_type="similarity", search_kwargs={"k": 2}
)

# Create LLM
llm = VertexAI(
    model_name="text-bison-32k",
    max_output_tokens=256,
    