# RAG exploration

Instead of grounding the chat in a single piece of content, what if we want it to generate content using knowledge from multiple sources? Now that we know how to deploy our app with RAG, let's integrate VS Code release notes as referenceable data using **RAG and modified hybrid search**, so the chat can draw on a broader set of context.

Normally, we'd leverage proprietary data for RAG, but since this is a public demo, we'll use publicly available VS Code release notes to give users insights into recent updates, new features, and workflow improvements. The goal here isn't to showcase RAG itself, but to **demonstrate how RAG can be explored with different embedding models and completion models to optimize results**.

**So here is our plan:**
1. Use TF-IDF to **extract keywords** from the user query
2. Conduct a **full-text search** with the extracted keywords & get the top 10 results
3. Conduct a **semantic search** over the results of the full-text search & get the top 3 results - experimenting with embeddings models
4. **Generate an answer** based on the results - experimenting with completions models

_(If you're familiar with RAG, you already know which part of the hybrid search was modified 😉)_

![outline](./outline.png)

Let's first try answering the question, `"What are recent features for Copilot chat in notebooks?"`, using an off-the-shelf LLM.

We can navigate to GitHub Marketplace to answer the question using one of the models through the built-in playground:

![gpt-4o-mini-answer](./gpt-4o-mini-answer.png)

The content seems great but quite generic. It also doesn't have any specific links to the features mentioned. Let's see how implementing RAG would help improve this -- and compare various AI models along the way.

## 📚 Load data

Since the VS Code team uses a GitHub repo to manage the release notes, I used the GitHub API to fetch them. Each release notes "document" can be long, so I knew I had to chunk the data in a way that keeps related context together. Because the release notes are written in Markdown, I used a Markdown parser ([LangChain's `markdown_header_metadata_splitter`](https://python.langchain.com/docs/how_to/markdown_header_metadata_splitter/)) to split each document by its headers, so that each feature's section fits within the much smaller token limits of the embeddings models.
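
To make the chunking idea concrete, here is a minimal standard-library sketch of splitting on Markdown headers. It is a simplified stand-in, not LangChain's splitter: the function name `split_by_headers`, the output shape, and the sample text are all assumptions for illustration, and it ignores edge cases such as `#` lines inside fenced code blocks.

```python
import re

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")

def split_by_headers(markdown_text):
    """Split Markdown into chunks, one per header section.

    Each chunk keeps the trail of parent headers, so a small section
    still carries the context of the sections above it.
    """
    chunks = []
    trail = {}  # header level -> current title at that level
    body = []   # lines collected since the last header

    def flush():
        text = "\n".join(body).strip()
        if text:
            headers = [trail[level] for level in sorted(trail)]
            chunks.append({"headers": headers, "content": text})
        body.clear()

    for line in markdown_text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # a new header replaces its own level and drops deeper ones
            for deeper in [lvl for lvl in trail if lvl >= level]:
                del trail[deeper]
            trail[level] = match.group(2)
        else:
            body.append(line)
    flush()
    return chunks

# Made-up sample in the style of a release notes page
sample = """## GitHub Copilot
Intro text.
### Inline Chat improvements
Inline Chat now starts as a floating control.
### Notebook kernel state as context
Kernel state is included as context.
## Debugging
Launch with the keyboard.
"""

chunks = split_by_headers(sample)
for chunk in chunks:
    print(" > ".join(chunk["headers"]), "|", chunk["content"])
```

Keeping the header trail with every chunk is what lets a short feature description stay searchable under its parent topic, even after it has been cut away from the rest of the page.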

Let's load the data that we've saved.

> **Copilot (✨generate cell):** `Load release_notes.json as docs_contents`

In [2]:
import json

with open('release_notes.json', 'r') as file:
    release_notes = json.load(file)

release_notes[101]

{'content': "See what is new in the Visual Studio Code February 2017 Release (1.10)  \n### Ability to select and start a launch using keyboard  \nThe option to launch debug configurations using just the keyboard (no mouse gesture necessary) was added, as per [this request](https://github.com/microsoft/vscode/issues/16613). It works similarly to running tasks, with an ability to launch a debugging session from the **Command Palette**. The keyword `'debug '` or the command **Debug: Select and Start Debugging** from the **Command Palette** is used to select and launch a configuration from `launch.json`.  \n![launch](images/1_10/launch-keyboard.gif)",
 'url': 'https://code.visualstudio.com/updates/v1_10#_ability-to-select-and-start-a-launch-using-keyboard',
 'id': 121}

## 🔍 Hybrid search, modified for experimentation

Let's conduct a **_modified hybrid search_** that combines **full-text search** and **vector similarity search** to retrieve the most relevant documents. Where is it "modified"? If you haven't guessed already, instead of pre-generating embeddings for all of our documents and conducting the two search methods _simultaneously_, we're going to enable experimentation with various embeddings models by running the full-text and vector similarity search _consecutively_.

We'll first start with a full-text search based on keywords across all of our release notes. After obtaining the results, we'll generate embeddings for _only these results on the fly_ and then conduct a vector similarity search using those embeddings.
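
The consecutive flow can be sketched in pure Python before we wire up real tools. Everything here is a toy stand-in under stated assumptions: `keyword_stage` counts shared words instead of calling a search engine, and `toy_embed` is a bag-of-words counter standing in for an embeddings model. None of these names come from the notebook.

```python
import math
from collections import Counter

def keyword_stage(query, docs, limit):
    """Stage 1: rank documents by how many query terms they contain."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in docs]
    hits = [(score, doc) for score, doc in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in hits[:limit]]

def toy_embed(text):
    """Placeholder for an embeddings model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def two_stage_search(query, docs, embed=toy_embed, keyword_limit=10, top_k=3):
    candidates = keyword_stage(query, docs, keyword_limit)
    # Stage 2: embed only the shortlisted candidates, on the fly
    query_vec = embed(query)
    ranked = sorted(candidates, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:top_k]

docs = [
    "copilot chat can attach kernel variables in notebooks",
    "notebooks now support sticky scroll",
    "terminal shell integration improvements",
    "copilot chat in notebooks",
]
query = "copilot chat notebooks"
print(two_stage_search(query, docs, top_k=2))
```

Because only the shortlist is embedded, trying a new embeddings model costs at most `keyword_limit + 1` embedding calls per query. The trade-off is that a relevant document sharing no keywords with the query never reaches the second stage.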

But I'm not sure which tool to use for the full-text search. Let's ask Copilot.

> **Copilot (side panel chat):** `Recommend a few options for lightweight and fast full-text search engine`

I like that MeiliSearch is open source and easy to deploy and use. This is perfect for the purposes of this demo. Let's use that.

You can manage your dev environment freely in Codespaces, so let's install MeiliSearch through the terminal and use the [self-hosted option](https://www.meilisearch.com/docs/learn/self_hosted/getting_started_with_self_hosted_meilisearch).

```bash
# Install Meilisearch
curl -L https://install.meilisearch.com | sh

# Launch Meilisearch
./meilisearch &
```

Now that we have MeiliSearch running, let's install the Python package (`pip install meilisearch`) and load our documents into MeiliSearch.

In [7]:
import meilisearch

ms_client = meilisearch.Client('http://127.0.0.1:7700')
ms_client.index('release').add_documents(release_notes)

TaskInfo(task_uid=8, index_uid='latest_release', status='enqueued', type='documentAdditionOrUpdate', enqueued_at=datetime.datetime(2024, 10, 10, 0, 51, 3, 768598))

Let's conduct a test search and use `Open Cell Output in Text Editor` to view the full output.

In [3]:
ms_client.index('release').search('copilot notebooks', {'limit': 20})['hits']

[{'content': 'Learn what is new in the Visual Studio Code June 2023 Release (1.80)  \n## Contributions to extensions  \n### GitHub Copilot  \nWe have introduced preview-only slash commands in the Chat view to help you create projects and notebooks and search for text in your workspace.  \n>**Note**: To get access to the Chat view, inline chat, and slash commands (for example `/search`, `/createWorkspace`), you need to install the [GitHub Copilot Chat](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot-chat) extension.  \n#### Create workspaces  \nYou can ask Copilot to create workspaces for popular project types with the `/createWorkspace` slash command. Copilot will first generate a directory structure for your request.  \n<video src="images/1_80/create-workspace-outline.mp4" autoplay loop controls muted title="Create workspace outline"></video>  \nYou can then use the **Create Workspace** button to create and open the project directory as a new workspace.  \n![Create 

## 📥 Retrieve documents

Since we're using natural language to query our results, let's use TF-IDF (Term Frequency - Inverse Document Frequency) to score words based on their relevance to a given document or query. This will help to find the most important words for the full-text search, which is keyword based.
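
As a rough illustration of how the scoring behaves, the sketch below computes TF-IDF weights by hand on a made-up corpus. It uses the smoothed form `idf = ln((1 + n) / (1 + df)) + 1` followed by L2 normalisation, which, as far as I know, matches scikit-learn's `TfidfVectorizer` defaults. The tokenisation is simplified to whitespace splitting and stop words are left in on purpose, so treat `tfidf_weights` as an assumption-laden teaching aid rather than a replacement for the library.

```python
import math
from collections import Counter

def tfidf_weights(query, documents):
    """Score each query word: its count in the query times a smoothed
    inverse document frequency learned from the documents."""
    n_docs = len(documents)
    doc_tokens = [set(doc.lower().split()) for doc in documents]
    counts = Counter(query.lower().split())

    weights = {}
    for word, tf in counts.items():
        df = sum(1 for tokens in doc_tokens if word in tokens)
        if df == 0:
            continue  # words never seen in the corpus are ignored
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[word] = tf * idf

    # L2-normalise so the weights form a unit-length vector
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {word: w / norm for word, w in weights.items()} if norm else {}

toy_docs = [
    "copilot chat in notebooks",
    "chat view in the editor",
    "terminal in the editor",
    "debugging in the editor",
]
w = tfidf_weights("copilot chat in notebooks", toy_docs)
for word, weight in sorted(w.items(), key=lambda item: item[1], reverse=True):
    print(f"{word:10s} {weight:.3f}")
```

Words that appear in every document (here, `in`) get the lowest weight, while words confined to a few documents rise to the top. That is the property we rely on to pick search keywords out of a natural-language question.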

### TF-IDF and full-text search

> **Copilot (✨ generate cell):** `Create a function that applies TF-IDF to extract top keywords given a sentence`

> **Copilot (inline):** `Exclude words like recent, new, feature`

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_top_keywords(query, documents, top_k=10):
    vectorizer = TfidfVectorizer(stop_words='english') # remove English words that don't carry significant meaning
    vectorizer.fit(documents) # fit the vectorizer on the documents

    tfidf_matrix = vectorizer.transform([query]) # transform the query to a TF-IDF matrix
    words = vectorizer.get_feature_names_out() # get the feature names (words)
    scores = tfidf_matrix.toarray().flatten() # get the scores for each word in the query
    
    # extract top keywords based on TF-IDF scores
    keyword_scores = dict(zip(words, scores))
    sorted_keywords = sorted(keyword_scores.items(), key=lambda x: x[1], reverse=True)
    
    # define words to exclude
    exclude_words = {'recent', 'new', 'feature', 'features', 'content', 'contents', 'release', 'releases', 'notes', 'note', 'updates', 'update'}
    
    # output top keywords
    top_keywords = [word for word, score in sorted_keywords if score > 0 and word not in exclude_words][:top_k]
    return ' '.join(top_keywords) # return keywords as a string


In [36]:
# Test: Print top keywords for a sample query
q = "What are recent features for Copilot chat in notebooks?"

documents = [doc['content'] for doc in release_notes if 'content' in doc] # only search over content of the release notes
top_keywords = extract_top_keywords(q, documents)
print(top_keywords)

copilot chat notebooks


Now let's write a function to conduct full-text search for the most relevant documents based on the query.

> **Copilot (✨ generate cell & `#kernel variable`):** `Using meilisearch and the extract_top_keywords function, write a function to conduct a full text search over only the content of #release_notes.`

In [38]:
def full_text_search(query, documents=release_notes, index_name='release', top_k=10):
    documents = [doc['content'] for doc in documents if 'content' in doc] # only search over content of the release notes
    top_keywords = extract_top_keywords(query, documents)

    result = ms_client.index(index_name).search(top_keywords, {'limit': top_k})['hits']
    return result

In [41]:
full_text_retrieved_docs = full_text_search(q)
full_text_retrieved_docs

[{'content': 'Learn what is new in the Visual Studio Code June 2023 Release (1.80)  \n## Contributions to extensions  \n### GitHub Copilot  \nWe have introduced preview-only slash commands in the Chat view to help you create projects and notebooks and search for text in your workspace.  \n>**Note**: To get access to the Chat view, inline chat, and slash commands (for example `/search`, `/createWorkspace`), you need to install the [GitHub Copilot Chat](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot-chat) extension.  \n#### Create workspaces  \nYou can ask Copilot to create workspaces for popular project types with the `/createWorkspace` slash command. Copilot will first generate a directory structure for your request.  \n<video src="images/1_80/create-workspace-outline.mp4" autoplay loop controls muted title="Create workspace outline"></video>  \nYou can then use the **Create Workspace** button to create and open the project directory as a new workspace.  \n![Create 

### Vector similarity search

Now let's conduct a vector similarity search using FAISS. We'll first copy over the getting-started code for using embeddings models from [GitHub Marketplace](https://github.com/marketplace/models).

This is especially convenient with our workflow, because **the GitHub token is automatically injected into Codespaces**.

In [50]:
import os

from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"

embeddings_client = EmbeddingsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(os.environ["GITHUB_TOKEN"])
    # credential=AzureKeyCredential(os.environ["AZURE_TOKEN"])
)

def generate_embeddings(text, model="text-embedding-3-small"):
    response = embeddings_client.embed(
        input=[text],
        model=model
    )

    return response.data[0].embedding

> **Copilot (comment in code cell generates code):** `# Create a function to conduct vector similarity search using FAISS.`

In [45]:
import faiss
import numpy as np

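def faiss_search(query_embedding, doc_embeddings, top_k=3):
    # Convert document embeddings into a float32 numpy array (the dtype FAISS works with)
    embeddings_matrix = np.array(doc_embeddings, dtype=np.float32)
    
    # Build FAISS index
    dim = embeddings_matrix.shape[1]
    index = faiss.IndexFlatL2(dim)  # Using L2 (Euclidean) distance
    index.add(embeddings_matrix)

    # Never ask for more neighbours than there are documents (FAISS pads missing results with -1)
    top_k = min(top_k, len(doc_embeddings))

    # Perform the search with FAISS
    query_matrix = np.array([query_embedding], dtype=np.float32)
    _, indices = index.search(query_matrix, top_k)

    return indices.flatten()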
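
A flat index does no clustering or compression: it compares the query against every stored vector. The pure-Python sketch below shows the same kind of exhaustive search on toy 2-D vectors (the function names are illustrative, not FAISS API). It also shows one caveat worth keeping in mind when comparing embeddings models: L2 distance is sensitive to vector length, so if a model returns vectors that are not unit-length, normalising them first makes the ranking agree with cosine similarity.

```python
def l2_search(query, vectors, top_k=3):
    """Exhaustive nearest-neighbour search by squared Euclidean distance."""
    distances = [
        (sum((q - v) ** 2 for q, v in zip(query, vec)), i)
        for i, vec in enumerate(vectors)
    ]
    distances.sort()
    return [i for _, i in distances[:top_k]]

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm else list(vec)

query = [1.0, 0.0]
vectors = [
    [10.0, 0.0],  # same direction as the query, but much longer
    [0.6, 0.8],
    [0.0, 1.0],
]

raw_ranking = l2_search(query, vectors)
unit_ranking = l2_search(normalize(query), [normalize(v) for v in vectors])
print("raw:", raw_ranking)
print("normalised:", unit_ranking)
```

In the raw ranking, the vector that points in exactly the same direction as the query comes last because of its length. After normalisation it comes first.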

Let's create a function to retrieve the most relevant documents using our modified hybrid search.

<!-- > **Copilot (✨ generate):** `Create a function to retrieve relevant documents. Given a query and documents, first conduct full_text_search. Then, generate embeddings using the generate_embeddings function over the result of the full_text_search. Lastly, conduct faiss_search using those embeddings and the embeddings of the query and output a list of documents.` -->

In [43]:
def retrieve_and_embed_docs(query, documents=release_notes, embeddings_model="text-embedding-3-small", top_k=3):
    full_text_results = full_text_search(query, documents) # full text search using TF-IDF & meilisearch

    # look up each hit in our documents and generate its embedding on the fly
    relevant_texts = []
    doc_embeddings = []
    urls = []
    for hit in full_text_results:
        doc_id = hit['id']
        doc = next((item for item in documents if item.get('id') == doc_id), None)
        if doc and 'content' in doc: # skip hits that are missing from `documents`
            relevant_texts.append(doc['content'])
            content_embeddings = generate_embeddings(doc['content'], model=embeddings_model)
            doc_embeddings.append(content_embeddings)
            urls.append(doc['url'])
    
    # vector search using FAISS
    query_embedding = generate_embeddings(query, model=embeddings_model)
    faiss_indices = faiss_search(query_embedding, doc_embeddings, top_k)
    
    # combine results
    combined_results = []
    for i in faiss_indices:
        combined_results.append({
            "content": relevant_texts[i],
            "url": urls[i]
        })

    return combined_results

In [52]:
retrieved_docs = retrieve_and_embed_docs(q)
retrieved_docs

[{'content': 'Learn what is new in the Visual Studio Code September 2024 Release (1.94)  \n### Attach variables in notebook chat  \nWhen you use Copilot in a notebook, you can now attach variables from the Jupyter kernel in your requests. Adding variables gives you more precise control over the context for your chat request, so that you get more relevant responses from Copilot.  \nEither type `#`, followed by the variable name, or use the 📎 control (`kb(workbench.action.chat.attachContext)`) in Inline Chat to add a context variable.  \n<video src="images/1_94/notebook-kernel-variable.mp4" title="Attach a context variable by using `#` in a notebook chat request" autoplay loop controls muted></video>',
  'url': 'https://code.visualstudio.com/updates/v1_94#_attach-variables-in-notebook-chat'},
 {'content': 'Learn what is new in the Visual Studio Code March 2024 Release (1.88)  \n### GitHub Copilot  \n#### Inline Chat improvements  \nInline Chat now starts as a floating control, making it 

### Compare between embeddings models

Let's test with a different embeddings model to see if we get different results.

In [51]:
retrieve_and_embed_docs(q, embeddings_model="cohere-embed-v3-english")

[{'content': 'Learn what is new in the Visual Studio Code June 2023 Release (1.80)  \n### Chat audio cues  \nThere are now audio cues for the [GitHub Copilot](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot) chat experience and can be enabled via `audioCues.chatRequestSent`, `audioCues.chatResponsePending`, and `audioCues.chatResponseReceived`.',
  'url': 'https://code.visualstudio.com/updates/v1_80#_chat-audio-cues'},
 {'content': 'Learn what is new in the Visual Studio Code September 2024 Release (1.94)  \n### Accept and run generated code in notebook  \nWhen you use Copilot Inline Chat to generate code in a notebook, you can now accept and directly run the generated code from Inline Chat.  \n<video src="images/1_94/notebook-accept-run.mp4" title="Accept and run generated code directly from Inline Chat" autoplay loop controls muted></video>',
  'url': 'https://code.visualstudio.com/updates/v1_94#_accept-and-run-generated-code-in-notebook'},
 {'content': 'Learn what
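
Eyeballing two long outputs gets tedious quickly, so a small helper can summarise how two runs differ. This is a sketch under assumptions: `compare_retrievals` is a hypothetical name, it relies only on the `url` key that `retrieve_and_embed_docs` returns, and the URLs in the demo are placeholders rather than real results.

```python
def compare_retrievals(results_a, results_b):
    """Summarise how two retrieval runs differ, keyed by document URL."""
    urls_a = [doc["url"] for doc in results_a]
    urls_b = [doc["url"] for doc in results_b]
    shared = [url for url in urls_a if url in urls_b]
    union = set(urls_a) | set(urls_b)
    return {
        "shared": shared,
        "only_a": [url for url in urls_a if url not in urls_b],
        "only_b": [url for url in urls_b if url not in urls_a],
        # 1.0 means identical sets of documents, 0.0 means no overlap
        "jaccard": len(set(shared)) / len(union) if union else 1.0,
    }

# Placeholder results standing in for the output of two embeddings models
run_a = [{"url": "https://example.com/a"}, {"url": "https://example.com/b"}, {"url": "https://example.com/c"}]
run_b = [{"url": "https://example.com/d"}, {"url": "https://example.com/b"}, {"url": "https://example.com/a"}]

summary = compare_retrievals(run_a, run_b)
print(summary)
```

In the notebook, you could pass the outputs of two `retrieve_and_embed_docs` calls with different `embeddings_model` values.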

## 🧠 Generate answer

I think we're ready to try out our RAG system! Remember the answer we got to our question, `"What are recent features for Copilot chat in notebooks?"`, from an LLM straight out of the box? You can reproduce it here, just as you did in the Marketplace playground.

In [None]:
# Copy the code snippet from GH marketplace
from openai import OpenAI
import os

gpt_client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.getenv("GITHUB_TOKEN")
    # api_key=os.getenv("AZURE_TOKEN")
)    

system_message = """
You are a social assistant who writes creative content. You will politely decline any other requests from the user not related to creating content. Don't talk about a single VS Code release and don't talk about release dates at all. Instead, only talk about the relevant features. Don't include made up links, but do provide real links to the VS Code release notes for specific features. You format all your responses as Markdown unless otherwise specified. Avoid wrapping your entire response in a markdown code element.
"""

In [20]:
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": f"Create a tweet sized content to answer the following question: {q}"}
]

response = gpt_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0.3,
    max_tokens=1500, # Dynamically set max_tokens based on the combined length of the docs?
    top_p=1.0
)

print(response.choices[0].message.content)

🚀 Exciting updates for Copilot Chat in notebooks! Now you can enjoy enhanced code suggestions, improved context awareness, and seamless integration for a smoother coding experience. Check out the latest features here: [VS Code Release Notes](https://code.visualstudio.com/updates) #VSCode #CopilotChat


Now let's give it the context we got from the retrieved documents and generate the answer again.

In [22]:
def generate_llm_answer(question, context, completion_model="gpt-4o-mini"):
    # Combine the relevant documents into a single context
    context_text = " ".join([doc['content'] for doc in context if doc.get('content')])
    context_url = ", ".join([doc['url'] for doc in context if doc.get('url')])

    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": f"Create a tweet sized content on the following context: {context_text}. In your answer, always include the following URLs from the content sources: {context_url}. Question: {question}"}
    ]
    
    response = gpt_client.chat.completions.create(
        model=completion_model,
        messages=messages,
        temperature=0.3,
        max_tokens=1500, # Dynamically set max_tokens based on the combined length of the docs?
        top_p=1.0
    )

    answer = response.choices[0].message.content
    return answer
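
One variation that might be worth experimenting with (it is not what the function above does) is to keep each chunk next to its own URL instead of joining all content and all URLs separately, so the model can tell which link belongs to which feature. The helper below is a hypothetical sketch: `build_context`, its `max_chars` budget, and the sample documents are assumptions for illustration.

```python
def build_context(docs, max_chars=4000):
    """Format retrieved docs as numbered sources, each next to its URL."""
    blocks = []
    used = 0
    for doc in docs:
        content = doc.get("content", "").strip()
        if not content:
            continue  # nothing to ground on
        block = f"[{len(blocks) + 1}] {content}\nSource: {doc.get('url', 'n/a')}"
        if used + len(block) > max_chars:
            break  # crude character budget; a tokenizer would be more exact
        blocks.append(block)
        used += len(block)
    return "\n\n".join(blocks)

# Made-up documents in the same shape as the retrieval results
sample_docs = [
    {"content": "Attach variables in notebook chat.", "url": "https://example.com/a"},
    {"content": "", "url": "https://example.com/b"},
    {"content": "Inline Chat improvements.", "url": "https://example.com/c"},
]

context_block = build_context(sample_docs)
print(context_block)
```

The resulting string could replace `context_text` in the user message. Whether it actually improves link attribution is something to verify per completion model.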

In [101]:
final_answer = generate_llm_answer(q, retrieved_docs)
print(final_answer)

🚀 Exciting updates in VS Code! Now you can attach variables in notebook chats with Copilot for more precise context. Just type `#` followed by the variable name or use the 📎 control! 🎉 Check it out: [Attach Variables](https://code.visualstudio.com/updates/v1_94#_attach-variables-in-notebook-chat) #VSCode #GitHubCopilot

For more on Copilot features, explore:  
- [March 2024 Release](https://code.visualstudio.com/updates/v1_88#_github-copilot)  
- [May 2024 Release](https://code.visualstudio.com/updates/v1_90#_github-copilot)


### Compare between completions models

Let's test with different completions models to compare the responses we get for our query (we'll keep the search results constant for this comparison).

In [102]:
final_answer = generate_llm_answer(q, retrieved_docs, completion_model="Mistral-small")
print(final_answer)

🚀 New in VS Code 1.94! Attach variables from the Jupyter kernel in your Copilot chat requests for more relevant responses. Use `#` or the 📎 control. Learn more: https://code.visualstudio.com/updates/v1_94#_attach-variables-in-notebook-chat

And in VS Code 1.88, the kernel state is now automatically included as context in Inline Chat for notebooks. This lets Copilot use the current state of the notebook to provide more relevant completions. Learn more: https://code.visualstudio.com/updates/v1_88#_notebook-kernel-state-as-context


In [103]:
final_answer = generate_llm_answer(q, retrieved_docs, completion_model="meta-llama-3-8b-instruct")
print(final_answer)

Here's a tweet-sized summary of the recent features for Copilot chat in notebooks:

"New in VS Code! 🚀 Attach variables from Jupyter kernel in notebook chat with Copilot. Add context with `#` or 📎 control. Get more precise control over chat requests and relevant responses. Learn more: https://code.visualstudio.com/updates/v1_94#_attach-variables-in-notebook-chat"


## 🚀 Test combinations of AI models for the RAG system

Use [GitHub Marketplace](https://github.com/marketplace/models) to find and experiment with AI models. Replace the `embeddings_model` and `completion_model` values with model names found in the marketplace:

```python
q = "What are recent features for Copilot chat in notebooks?"
retrieved_docs = retrieve_and_embed_docs(q, embeddings_model="text-embedding-3-small")
final_answer = generate_llm_answer(q, retrieved_docs, completion_model="gpt-4o-mini")
print(final_answer)
```
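
To try several combinations in one go, a small loop is enough. The sketch below is a hypothetical helper, `run_grid`, that takes the retrieval and generation functions as arguments. The stub functions exist only so the example runs on its own without any API calls.

```python
def run_grid(question, embeddings_models, completion_models, retrieve, generate):
    """Run every (embeddings model, completion model) pair and collect answers."""
    results = []
    for emb_model in embeddings_models:
        # retrieval depends only on the embeddings model, so do it once per model
        docs = retrieve(question, embeddings_model=emb_model)
        for comp_model in completion_models:
            answer = generate(question, docs, completion_model=comp_model)
            results.append({
                "embeddings_model": emb_model,
                "completion_model": comp_model,
                "answer": answer,
            })
    return results

# Stubs standing in for retrieve_and_embed_docs and generate_llm_answer
retrieve_calls = []

def fake_retrieve(question, embeddings_model):
    retrieve_calls.append(embeddings_model)
    return [{"content": f"doc via {embeddings_model}", "url": "https://example.com"}]

def fake_generate(question, docs, completion_model):
    return f"{completion_model} answered using {docs[0]['content']}"

grid = run_grid(
    "What are recent features for Copilot chat in notebooks?",
    ["emb-model-a", "emb-model-b"],
    ["chat-model-x", "chat-model-y"],
    fake_retrieve,
    fake_generate,
)
for row in grid:
    print(row["embeddings_model"], "+", row["completion_model"], "->", row["answer"])
```

In the notebook, you would pass `retrieve_and_embed_docs` and `generate_llm_answer` in place of the stubs, along with model names from the marketplace.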