# RAG exploration

Generated content from the app is great but seems quite generic. I want to augment it with relevant features from the VS Code release notes where users can get more context.

## Get content

I know that VS Code team uses a GitHub repo to manage the release notes. I can use the GitHub API to fetch the release notes.

**`Copilot: Use the github api to recursively fetch all md files from a GH repo's directory. Use the GH_TOKEN from my .env file`**

In [84]:
import os
import requests
from dotenv import load_dotenv

load_dotenv()

REPO_OWNER = 'microsoft'
REPO_NAME = 'vscode-docs'

def get_markdown_files_from_github(path=''):
    url = f'https://api.github.com/repos/{REPO_OWNER}/{REPO_NAME}/contents/{path}'
    headers = {
        'Authorization': f'token {os.getenv("GH_TOKEN")}'
    }
    response = requests.get(url, headers=headers)
    markdown_files = []

    if response.status_code == 200:
        files = response.json()

        for file in files:
            if file['type'] == 'dir':
                markdown_files += get_markdown_files_from_github(file['path'])
            elif file['type'] == 'file' and file['name'].endswith('.md'):
                markdown_files.append(file['path'])

        return markdown_files
    else:
        raise Exception(f"Error fetching repo contents: {response.status_code}")

In [95]:
docs_files = get_markdown_files_from_github("release-notes")
print(docs_files)

['release-notes/July_2016.md', 'release-notes/June_2016.md', 'release-notes/May_2016.md', 'release-notes/v0_3_0.md', 'release-notes/v0_5_0.md', 'release-notes/v0_7_0.md', 'release-notes/v0_8_0.md', 'release-notes/v0_9_0.md', 'release-notes/v1_10.md', 'release-notes/v1_11.md', 'release-notes/v1_12.md', 'release-notes/v1_13.md', 'release-notes/v1_14.md', 'release-notes/v1_15.md', 'release-notes/v1_16.md', 'release-notes/v1_17.md', 'release-notes/v1_18.md', 'release-notes/v1_19.md', 'release-notes/v1_20.md', 'release-notes/v1_21.md', 'release-notes/v1_22.md', 'release-notes/v1_23.md', 'release-notes/v1_24.md', 'release-notes/v1_25.md', 'release-notes/v1_26.md', 'release-notes/v1_27.md', 'release-notes/v1_28.md', 'release-notes/v1_29.md', 'release-notes/v1_30.md', 'release-notes/v1_31.md', 'release-notes/v1_32.md', 'release-notes/v1_33.md', 'release-notes/v1_34.md', 'release-notes/v1_35.md', 'release-notes/v1_36.md', 'release-notes/v1_37.md', 'release-notes/v1_38.md', 'release-notes/v1_39.

Great, now I want to get the contents of the release notes documents I just fetched.

**`Copilot: Use the github api to fetch the contents of a file from a GH repo`**

In [130]:
def get_file_content(file_path):
    url = f'https://api.github.com/repos/{REPO_OWNER}/{REPO_NAME}/contents/{file_path}'
    headers = {
        'Authorization': f'token {os.getenv("GH_TOKEN")}'
    }
    response = requests.get(url, headers=headers)

    if response.status_code == 200:
        file_content = base64.b64decode(response.json()['content']).decode('utf-8')
        title = next(line.replace('# ', '') for line in file_content.split('\n') if line.startswith('#'))
        area = next((line.split(':', 1)[1].strip() for line in file_content.split('\n') if line.lower().startswith('area:')), '')
        description = next((line.split(':', 1)[1].strip() for line in file_content.split('\n') if line.lower().startswith('metadescription:')), '')
        content = file_content.split(title, 1)[1].strip()
        content = content.replace('<!-- DOWNLOAD_LINKS_PLACEHOLDER -->', '')
        full_content = f"{description}\n\n{content}"
        return {
            'content': full_content, # rate limit could be exceeded for free tier
            # 'content': description,
            'url': f'https://github.com/{REPO_OWNER}/{REPO_NAME}/blob/main/{file_path}'
        }
    else:
        raise Exception(f"Error fetching file content: {response.status_code}")
    
# Test: Print a sample file content
get_file_content("release-notes/v1_94.md")

 'url': 'https://github.com/microsoft/vscode-docs/blob/main/release-notes/v1_94.md'}

Let's iterate over each file and get the contents.

**`Copilot: iterate over each file in #docs_files and get its contents`**

In [131]:
docs_contents = []
for file_path in docs_files:
    content = get_file_content(file_path)
    docs_contents.append(content)

# Test: Print the first few contents to verify
for doc in docs_contents[:5]:
    print(doc)

{'content': 'See what is new in the Visual Studio Code July 2016 Release (1.4)\n\nDuring July, we slowed down feature work in favor of reducing our bug backlog and removing engineering debt. However, we were still able to add some improvements.\n\nHere are the highlights:\n\n* **Workbench**: Editor actions such as **Open Preview** and **Switch to Changes View** are back on the title bar. IME and Copy/Paste support in the Integrated Terminal.\n* **Editor**: Better snippet and suggestions control. New **Insert Snippet** command with dedicated UI.\n* **Debugging**: **Restart Frame** action to rerun a specific stack frame.  \'Variable paging\' feature moved into VS Code and available to all debug extensions.\n* **Extension Authoring**: New \'move\' commands to better support VIM gestures. Custom link behavior with the `DocumentLinkProvider` API. Expanded Debug Protocol.\n\nDownloads: [Windows](https://az764295.vo.msecnd.net/stable/6276dcb0ae497766056b4c09ea75be1d76a8b679/VSCodeSetup-stable

Now let's save this data to a file so that I can use it later. I don't want to hit the GitHub API every time I run this notebook.

**`Copilot: Save #docs_contents as release_notes.json`**

In [132]:
import json

with open('release_notes.json', 'w') as file:
    json.dump(docs_contents, file, indent=2)

## Chunking

Before we embed the release notes data into the RAG model, we need to chunk the data so that it can fit into the embeddings model. For our exploration, we're going to use [OpenAI's Text Embedding 3 (small) model](https://github.com/marketplace/models/azure-openai/text-embedding-3-small?tab=readme) from [GitHub Marketplace](https://github.com/marketplace/models) -- which has a limit of ~8k tokens. Creating embeddings now will help us to conduct semantic search much faster later on.

The goal of chunking is often to preserve text segments with related context. Luckily for us, the VS Code team already manages the release notes in markdown format, so we can use a markdown parser to help us with this. Let's use [LangChain's `markdown_header_metadata_splitter`](https://python.langchain.com/docs/how_to/markdown_header_metadata_splitter/).

Let's load the data that we've saved.

**`Copilot: Load release_notes.json as docs_contents`**

In [111]:
with open('release_notes.json', 'r') as file:
    docs_contents = json.load(file)

Let's split our release notes:

**`Copilot: Using langchain's markdown header text splitter, split each document's "content" from docs_contents based on header levels.`**

We want to retain release dates in the content. After `split_text` _(inline)_:

**`Copilot: Take the first line of text from the page_content of the first element in page_splits and prepend it to the page_content of the split object.`**

The title is irrelevant, and I want the links to direct folks to the VS Code website instead of the GitHub repo. Let's remove the title and replace the links _(inline)_.

**`Copilot: We don't need the title in the splits. Do include the corresponding url, but replace the GitHub links that look like "https://github.com/microsoft/vscode-docs/blob/main/release-notes/FILE.md" to VS Code website links that look like "https://code.visualstudio.com/updates/FILE"`**

In [133]:
import json
import re
from langchain_text_splitters import MarkdownHeaderTextSplitter

gh_link = r'https://github\.com/microsoft/vscode-docs/blob/main/release-notes/(.+)\.md'
docs_link = r'https://code.visualstudio.com/updates/\1'

headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3")
]

# MD splits
markdown_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on, strip_headers=False
)

# Split
splits = []
for page in docs_contents:
    markdown_document = page['content']
    page_splits = markdown_splitter.split_text(markdown_document)
    for split in page_splits:
        split.page_content = page_splits[0].page_content.split('\n', 1)[0] + '\n' + split.page_content
        url = re.sub(gh_link, docs_link, page['url'])
        split.metadata['url'] = url
    splits.extend(page_splits)
splits

[Document(metadata={'url': 'https://code.visualstudio.com/updates/July_2016'}, page_content="See what is new in the Visual Studio Code July 2016 Release (1.4)  \nSee what is new in the Visual Studio Code July 2016 Release (1.4)  \nDuring July, we slowed down feature work in favor of reducing our bug backlog and removing engineering debt. However, we were still able to add some improvements.  \nHere are the highlights:  \n* **Workbench**: Editor actions such as **Open Preview** and **Switch to Changes View** are back on the title bar. IME and Copy/Paste support in the Integrated Terminal.\n* **Editor**: Better snippet and suggestions control. New **Insert Snippet** command with dedicated UI.\n* **Debugging**: **Restart Frame** action to rerun a specific stack frame.  'Variable paging' feature moved into VS Code and available to all debug extensions.\n* **Extension Authoring**: New 'move' commands to better support VIM gestures. Custom link behavior with the `DocumentLinkProvider` API. E

In [135]:
print(len(splits))

4967


In [136]:
splits[29]

{'id': None,
 'metadata': {'Header 2': 'Workbench',
  'Header 3': 'Integrated Terminal',
  'url': 'https://code.visualstudio.com/updates/June_2016'},
 'page_content': "See what is new in the Visual Studio Code June 2016 Release (1.3)\n### Integrated Terminal  \nThe integrated terminal that was introduced in VS Code 1.2.0 has seen many improvements this release, the primary one being the ability to launch and use multiple terminals at the same time. Terminal instances can be added by hitting the plus icon on the top-right of the **TERMINAL** panel or by triggering the `kb(workbench.action.terminal.new)` command. This creates another entry in the dropdown list that can be used to switch between them.  \n![Multiple terminal instances](images/June_2016/terminal_multiple_instances.png)  \nSeveral new commands were added to aid with management of the **TERMINAL** panel and its terminal instances.  \nThey are:  \n* `workbench.action.terminal.focus`: Focus the terminal. This is like toggle but

I realized that some of the urls are in an incorrect format. We need to get rid of spaces, double quotations, and backticks.

**`Copilot: Create a function to replace all spaces, double quotes, and backticks in the urls with empty strings`**

In [137]:
import re

def url_cleaner(text):
    text = text.lower()
    text = text.replace(" ", "-")
    text = text.replace('"', '')
    text = text.replace('`', '')
    text = re.sub(r'[()]+', '', text)
    return text

# Test: Print a sample converted text
print(url_cleaner("A sample TEXT (with \"special\" `char`acters)"))

a-sample-text-with-special-characters


Let's apply this function to all the urls in the splits. While we're at it, let's remove unnecessary fields and remove any splits that don't have a 'Header 3' in the metadata since we're only interested in specific features instead of general header contents.

**`Copilot: Remove any splits that don't have a 'Header 3' in the metadata. Apply the url_cleaner function to Header 3's in the splits and add it to the end of the url as a specific section of the url. Only keep "content" and "url" fields. Don't create a new variable.`**

In [140]:
for item in splits:
    pointer = ""
    if 'Header 3' in item['metadata']:
        pointer = url_cleaner(item['metadata']['Header 3'])
        item['content'] = item['page_content']
        item['url'] = f"{item['metadata']['url']}#_{pointer}"
        item.pop('id', None)
        item.pop('metadata', None)
        item.pop('page_content', None)
        item.pop('type', None)
    else:
        splits.remove(item)
        continue

# Test: Print a few contents to verify
for item in splits[40:45]:
    print(item)

{'content': 'See what is new in the Visual Studio Code June 2016 Release (1.3)\n### OS specific launch configurations  \nInitiated by a [user request](https://github.com/microsoft/vscode/issues/1873), it is now possible to specify OS specific configurations inside a `launch.json`:  \n```json\n{\n"type": "node",\n"request": "launch",\n"runtimeExecutable": "mynode",\n"windows": {\n"runtimeExecutable": "mynode.exe"\n}\n}\n```', 'url': 'https://code.visualstudio.com/updates/June_2016#_os-specific-launch-configurations'}
{'content': 'See what is new in the Visual Studio Code June 2016 Release (1.3)\n## Node.js Debugging  \n### Attach to Process  \nNode.js debugging now supports attaching to a Node.js process that has not been started in debug mode. This can be useful if you need to debug a production server that cannot always run in debug mode for performance reasons.  \nIn order to attach to a Node.js process, you specify its _process id_ via a `processId` attribute in an `attach` launch c

Before we move on, let's save the chunked and cleaned data.

**`Copilot: Save the cleaned splits as release_notes.json`**

In [141]:
with open('release_notes.json', 'w') as file:
    json.dump(splits, file, indent=2)

## Method 1: pre-generating embeddings _(**NOT** for Universe)_

### Generate embeddings

Now let's generate embeddings for the chunks. We can use the [OpenAI Text Embedding 3 (small) model from GitHub Marketplace](https://github.com/marketplace/models/azure-openai/text-embedding-3-small). Click on `Get started`, select Language: `Python`, and copy the code snippet.

Since we're on Codespaces, we don't have to create an access token--it's already set up for us.

In [129]:
# Copied code snippet

import os

from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"
model_name = "text-embedding-3-small"
token = os.environ["AZURE_TOKEN"]

client = EmbeddingsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token)
)

response = client.embed(
    input=["first phrase", "second phrase", "third phrase"],
    # input=splits[3]['content'],
    model=model_name
)

for item in response.data:
    length = len(item.embedding)
    print(
        f"data[{item.index}]: length={length}, "
        f"[{item.embedding[0]}, {item.embedding[1]}, "
        f"..., {item.embedding[length-2]}, {item.embedding[length-1]}]"
    )
print(response.usage)

data[0]: length=1536, [-0.0072446493, 0.007473137, ..., 0.016135989, -0.0048770397]
data[1]: length=1536, [-0.0030364485, 0.009244851, ..., 0.029940795, 0.020932602]
data[2]: length=1536, [-0.013798126, 0.031926036, ..., 0.017498866, 0.022629254]
{'prompt_tokens': 6, 'total_tokens': 6}


Let's see what this looks like for one of our content chunks.

In [8]:
response = client.embed(
    input=splits[3]['content'],
    model=model_name
)

response.data[0].embedding

[0.03699878230690956,
 -0.006938727106899023,
 0.035462018102407455,
 -0.004493873566389084,
 0.004284314811229706,
 -0.013667895458638668,
 0.027731623500585556,
 1.044838336383691e-05,
 0.001133655314333737,
 0.015367650426924229,
 0.00975612923502922,
 0.04146937280893326,
 0.019989587366580963,
 -0.028383584693074226,
 0.042284321039915085,
 0.032784320414066315,
 -0.028150741010904312,
 0.009703739546239376,
 -0.01047794334590435,
 0.05062010884284973,
 0.03923407569527626,
 -0.046731628477573395,
 0.035205889493227005,
 0.08373040705919266,
 -0.0027373626362532377,
 0.006321692373603582,
 -0.05434560030698776,
 0.009954046458005905,
 -0.025659319013357162,
 0.011677085421979427,
 0.02160784788429737,
 -0.025636034086346626,
 0.013225493021309376,
 0.025053927674889565,
 -0.0011780409840866923,
 0.004520068410784006,
 0.015018385834991932,
 -0.0336691252887249,
 0.023680152371525764,
 0.01549571380019188,
 -0.0035304848570376635,
 -0.007689646445214748,
 0.026148289442062378,
 0.0

Let's iterate over the chunks and generate the embeddings.

**`Copilot: Iterate over splits to create content_embeddings using Azure AI's EmbeddingsClient. Pretend generage_azure_embedding function doesn't exist`**

In [None]:
# Don't run - takes too long for demo
for split in splits:
    split['content_embeddings'] = client.embed(
        input=split['content'],
        model=model_name
    ).data[0].embedding

# Test: Print the first few splits to verify
for split in splits[:5]:
    print(split)

In [6]:
# Using Azure OpenAI SDK (model deployed through Azure OpenAI Studio - oai.azure.com) - not for demo

import os
import openai
from dotenv import load_dotenv

load_dotenv()

def generate_azure_embedding(text):
    client = openai.AzureOpenAI(
        azure_endpoint="https://sominverse.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15",
        api_key=os.getenv("AZURE_AI_TOKEN"),
        api_version="2023-05-15"
    )

    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )

    embeddings = response.data[0].embedding
    return embeddings

# Generate embeddings
for item in splits:
    if 'content' in item:
        item['content_embeddings'] = generate_azure_embedding(item['content'])
    else:
        splits.remove(item)

# Add id to each item
for idx, item in enumerate(splits):
    item['id'] = idx

Let's make sure that it looks good for one of the chunks.

In [20]:
splits[40]

{'content': "See what is new in the Visual Studio Code June 2016 Release (1.3)\n### API tweaks  \n* The Uri-class now allows you derive a Uri from an existing one: `someUri.with({scheme: 'newScheme', path: 'newPath'})`\n* The `previewHtml` command now allows you to provide a `title`.\n* When previewing html, we expose the style of the current theme via class names of the body element. Those are `vscode-light`, `vscode-dark`, and `vscode-high-contrast`.\n* Last, there is a new command `vscode.open` to open non-textual resources like images.",
 'url': 'https://code.visualstudio.com/updates/June_2016#_api-tweaks',
 'content_embeddings': [-0.012693786062300205,
  -6.377157842507586e-05,
  0.005413017701357603,
  -0.01395048201084137,
  0.01452694833278656,
  -0.0265866219997406,
  0.058338381350040436,
  0.0008791110012680292,
  0.01887350343167782,
  -0.015622234903275967,
  0.010520508512854576,
  -0.0145384781062603,
  -0.009690397419035435,
  -0.02056831493973732,
  0.02258594706654548

I've saved the embeddings to the `release_notes.json` file because it would be too costly and time-consuming to generate the embeddings every time I run this notebook.

In [21]:
with open('release_notes.json', 'w') as file:
    json.dump(splits, file, indent=2)

### Hybrid search

We will do a hybrid of **full-text** and **vector similarity** search to retrieve the most relevant documents. This method will combine the precision of keyword matching with the contextual understanding of semantically similar content -- which would not only improve relevance and coverage of the document retrieval, it would also capture results that would be missed by either method alone.

First let's load the data we've saved with content embeddings.

**`Copilot: Load release_notes.json as release_notes`**

In [25]:
import json

with open('release_notes.json', 'r') as file:
    release_notes = json.load(file)

release_notes[101]

{'content': 'See what is new in the Visual Studio Code February 2017 Release (1.10)  \n## Workbench  \n### Configurable Explorer key bindings  \nBy popular demand, you can now configure the key bindings for most of the commands in the File Explorer and OPEN EDITORS view.  \nThe following commands could already be assigned prior to version 1.10 in the File Explorer:  \n* `explorer.newFile` - Create a new file\n* `explorer.newFolder`-  Create a new folder  \nNew commands that work in both the File Explorer and OPEN EDITORS view  \n* `explorer.openToSide` - Open to the side\n* `copyFilePath` - Copy path of file/folder\n* `revealFileInOS` - Reveal file in OS  \nNew commands that only work in the File Explorer:  \n* `filesExplorer.copy` - Copy a file from the File Explorer\n* `filesExplorer.paste` - Paste a file that was copied from the File Explorer\n* `renameFile` - Rename a file/folder in the File Explorer\n* `moveFileToTrash` - Move a file/folder to trash from the File Explorer\n* `dele

Hm, I'm not sure which tool to use for the full-text search. Let's ask Copilot _(chat)_.

**`Copilot: Recommend a few options for lightweight and fast full-text search engine`**

I like that MeiliSearch is open source and easy to deploy and use. This is perfect for the purposes of this demo. Let's use that.

You can manage your dev environment freely in Codespaces, so let's install MeiliSearch through the terminal and use the [self-hosted option](https://www.meilisearch.com/docs/learn/self_hosted/getting_started_with_self_hosted_meilisearch).

```bash
# Install Meilisearch
curl -L https://install.meilisearch.com | sh

# Launch Meilisearch
./meilisearch --master-key="aSampleMasterKey"
```

Now that we have MeiliSearch running, let's install the Python package (`pip install meilisearch`) and index the release notes.

In [21]:
import meilisearch

ms_client = meilisearch.Client('http://127.0.0.1:7700')

But, let's not be wasteful and only index the `content` and `url` fields. We don't need to index the embeddings since we already have them saved in the `release_notes.json` file. Let's also get rid of section category headers in the content -- which I know ends with `#_` in the url.

**`Copilot: Create a new variable to store only the `content` and `url` fields from the `#release_notes` variable. Don't include items with `url` that end with `#_`.`**

In [44]:
release_text_only = [{k: v for k, v in item.items() if k != 'content_embeddings'} for item in release_notes] # don't include embeddings
release_text_only = [item for item in release_text_only if 'url' in item and not item['url'].endswith('#_')] # don't include section header documents

print(f"Original length: {len(release_notes)}")
print(f"Filtered length: {len(release_text_only)}")

Original length: 4138
Filtered length: 4023


Now let's load the data into MeiliSearch and conduct a test search.

In [45]:
ms_client.index('release').add_documents(release_text_only)

TaskInfo(task_uid=7, index_uid='release', status='enqueued', type='documentAdditionOrUpdate', enqueued_at=datetime.datetime(2024, 10, 8, 16, 45, 36, 974167))

In [46]:
ms_client.index('release').search('copilot notebooks', {'limit': 50})['hits']

[{'content': 'Learn what is new in the Visual Studio Code June 2023 Release (1.80)  \n## Contributions to extensions  \n### GitHub Copilot  \nWe have introduced preview-only slash commands in the Chat view to help you create projects and notebooks and search for text in your workspace.  \n>**Note**: To get access to the Chat view, inline chat, and slash commands (for example `/search`, `/createWorkspace`), you need to install the [GitHub Copilot Chat](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot-chat) extension.  \n#### Create workspaces  \nYou can ask Copilot to create workspaces for popular project types with the `/createWorkspace` slash command. Copilot will first generate a directory structure for your request.  \n<video src="images/1_80/create-workspace-outline.mp4" autoplay loop controls muted title="Create workspace outline"></video>  \nYou can then use the **Create Workspace** button to create and open the project directory as a new workspace.  \n![Create 

Since we're using natural language to query our results, let's use TF-IDF (Term Frequency - Inverse Document Frequency) to score words based on their relevance to a given document or query. This will help to find the most important words for the search.

**`Copilot: Create a function that applies TF-IDF to extract op keywords given a sentence`**

In [19]:
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_top_keywords(query, documents, top_k=10):
    vectorizer = TfidfVectorizer(stop_words='english') # remove English words that don't carry significant meaning
    vectorizer.fit(documents) # fit the vectorizer on the documents

    tfidf_matrix = vectorizer.transform([query]) # transform the query to a TF-IDF matrix
    words = vectorizer.get_feature_names_out() # get the feature names (words)
    scores = tfidf_matrix.toarray().flatten() # get the scores for each word in the query
    
    # extract top keywords based on TF-IDF scores
    keyword_scores = dict(zip(words, scores))
    sorted_keywords = sorted(keyword_scores.items(), key=lambda x: x[1], reverse=True)
    
    # define words to exclude
    exclude_words = {'recent', 'new', 'feature', 'features', 'content', 'contents', 'release', 'releases', 'notes', 'note', 'updates', 'update'}
    
    # output top keywords
    top_keywords = [word for word, score in sorted_keywords if score > 0 and word not in exclude_words][:top_k]
    return ' '.join(top_keywords) # return keywords as a string

# Test: Print top keywords for a sample query
documents = [doc['content'] for doc in release_notes if 'content' in doc] # only search over content of the release notes
top_keywords = extract_top_keywords("What are recent features in Copilot for notebooks?", documents)
print(top_keywords)

copilot notebooks


Now let's write a function to search for the most relevant documents based on the query.

**`Copilot: Using meilisearch and the extract_top_keywords function, write a function to conduct a full text search over only the content of `#release_notes`.`**

In [16]:
def full_text_search(query, documents=release_notes, index_name='release', top_k=50):
    documents = [doc['content'] for doc in documents if 'content' in doc] # only search over content of the release notes
    top_keywords = extract_top_keywords(query, documents)

    result = ms_client.index(index_name).search(top_keywords, {'limit': top_k})['hits']
    return result

And we can use an embeddings model from the GH markeplace (making sure it's the same one we used for embedding our documents!) to create a function that embeds our query for the search.

In [51]:
from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

import os

def generate_embedding(text):
    azure_endpoint = "https://models.inference.ai.azure.com"
    model_name = "text-embedding-3-small"
    client = EmbeddingsClient(
        endpoint=azure_endpoint,
        credential=AzureKeyCredential(os.getenv("AZURE_TOKEN"))
    )

    response = client.embed(input=text, model=model_name)
    embeddings = [item.embedding for item in response.data]
    return embeddings

# Test: Generate embeddings for a sample query
test_q = "What are recent features for Copilot chat in notebooks?"
test_q_embeddings = generate_embedding(test_q)
test_q_embeddings

[[-0.02402293,
  -0.041299332,
  0.018916734,
  -0.021760859,
  0.0016237965,
  -0.026099801,
  0.0046266625,
  0.020729037,
  0.015688984,
  -0.004854854,
  -0.0065315645,
  -0.017091203,
  -0.018056883,
  0.011217755,
  -0.054157417,
  -0.03545234,
  -0.020570295,
  0.04725215,
  0.018890277,
  0.020729037,
  -0.030160947,
  -0.024274271,
  0.014128022,
  0.013420298,
  -0.03248916,
  -0.013393842,
  -0.054025132,
  0.043521717,
  0.0057940767,
  -0.026496656,
  0.012461233,
  -0.01867862,
  0.001923091,
  0.03905049,
  -0.052331887,
  -0.061274342,
  0.022051886,
  -0.014128022,
  -0.0038627177,
  -0.006402587,
  -0.006220695,
  0.010033806,
  0.026351143,
  -0.016522378,
  -0.073338725,
  -0.00969648,
  -0.011819651,
  -0.0143529065,
  -0.01052326,
  0.048601456,
  -0.03354744,
  0.0014873778,
  -0.011383112,
  -0.037039757,
  -0.034314692,
  0.0042694937,
  -0.035240684,
  0.010893658,
  0.02777982,
  -0.012871316,
  -0.013354156,
  -0.0075137797,
  -0.016456235,
  0.0050334386,
 

Now let's conduct a vector similiarity search using FAISS.

<!-- TODO: modify -->
**`Copilot: Create a function to conduct vector similarity search using FAISS.`**

In [48]:
import faiss
import numpy as np

def faiss_search(query_embedding, doc_embeddings, top_k=5):
    # Convert document embeddings into a numpy array
    embeddings_matrix = np.array(doc_embeddings)
    
    # Build FAISS index
    dim = embeddings_matrix.shape[1]
    index = faiss.IndexFlatL2(dim)  # Using L2 (Euclidean) distance
    index.add(embeddings_matrix)
    
    # # Ensure the query embedding is a 2D array
    # query_embedding = np.array(query_embedding).reshape(1, -1)
    
    # # Perform the search with FAISS
    # distances, indices = index.search(query_embedding, top_k)

    # Perform the search with FAISS
    _, indices = index.search(np.array([query_embedding]), top_k)

    return indices.flatten()

In [138]:
def retrieve_documents(query, documents=release_notes, top_k=5):
    full_text_results = full_text_search(query) # full text search using TF-IDF & meilisearch
    
    # extract relevant document embeddings from meilisearch results
    relevant_texts = []
    doc_embeddings = []
    urls = []
    for hit in full_text_results:
        doc_id = hit['id']
        doc = next((item for item in documents if item.get('id') == doc_id), None)
        if 'content' in doc and 'content_embeddings' in doc:
            relevant_texts.append(doc['content'])
            doc_embeddings.append(doc['content_embeddings'])
            urls.append(doc['url'])
            # print(doc['url'])
    
    # vector search using FAISS
    query_embedding = generate_embedding(query)
    faiss_indices = faiss_search(query_embedding[0], doc_embeddings, top_k)
    
    # combine results
    combined_results = []
    for i in faiss_indices:
        combined_results.append({
            "content": relevant_texts[i],
            "url": urls[i]
        })

    return combined_results

In [140]:
test_combined_results = retrieve_documents(test_q)
test_combined_results

[{'content': 'Learn what is new in the Visual Studio Code April 2023 Release (1.78)  \n### GitHub Copilot  \n>**Note**: These features are available in the [GitHub Copilot Chat](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot-chat) extension.  \n#### Chat editors  \nOur first iteration on GitHub Copilot Chat enabled chat sessions in the sidebar. Now, we support opening the same chat view as an editor. This lets you customize the position of your chat session to be anywhere you want within your window layout.  \nYou can open a chat editor by running the command **Interactive Session: Open Editor** and then move it between editor groups just as you would with any other editor.  \n![A chat view as an editor](images/1_78/chat-editor.png)  \n#### Additional codeblock commands  \nThere are two new commands in the codeblock toolbar, **Insert into New File** and **Run in Terminal**. These are next to the existing commands **Copy** and **Insert at Cursor**, and give you extra

### Answer generation

Now that we have the most relevant documents, let's generate answers for the user query.

In [69]:
# Copy the code snippet from GH marketplace
from openai import OpenAI

gpt_client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.getenv("AZURE_TOKEN")
)

<!-- TODO: modify -->
**`Copilot: Generate an answer with LLM...`**

In [73]:
# import tinyurl

def generate_answer(question, context, model="gpt-4o-mini", shorten=False):
    # Combine the relevant documents into a single context
    context_text = " ".join([doc['content'] for doc in context])

    messages = [
        {"role": "system", "content": "You are a social assistant who writes creative content. You will politely decline any other requests from the user not related to creating content. Don't talk about a single VS Code release and don't talk about release dates at all. Instead, only talk about the relevant features. Don't include made up links. You format all your responses as Markdown unless otherwise specified. Avoid wrapping your entire response in a markdown code element."},
        {"role": "user", "content": f"Create a short tweet based on the following context: {context_text}. Question: {question}"}
    ]

    response = gpt_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.3,
        max_tokens=1500, # Copilot: how can I dynamically set max_tokens based on the combined length of the abstracts?
        top_p=1.0
    )

    answer = response.choices[0].message.content
    
    # Append source URLs to the answer
    urls = [doc['url'] for doc in context]
    # if shorten:
    #     urls = [tinyurl.create_one(url) for url in urls]
    
    answer_with_sources = f"{answer}\n\nLearn more:\n" + "\n".join(urls)
    return answer_with_sources

In [144]:
# Test: Generate an answer for a sample question
question = "What are recent features for Copilot chat in notebooks?"

# Retrieve relevant documents based on the question
retrieved_docs = retrieve_documents(question)

# Generate answer based on the retrieved documents
final_answer = generate_answer(question, retrieved_docs)
print(final_answer)


🚀 Exciting updates in VS Code! Now with GitHub Copilot, you can attach Jupyter kernel variables to your chat requests for better context. Plus, accept and run generated code directly from Inline Chat! 🐍💻 #VSCode #GitHubCopilot #DevTools

Learn more:
https://code.visualstudio.com/updates/v1_78#_github-copilot
https://code.visualstudio.com/updates/v1_94#_attach-variables-in-notebook-chat
https://code.visualstudio.com/updates/v1_90#_github-copilot
https://code.visualstudio.com/updates/v1_80#_github-copilot
https://code.visualstudio.com/updates/v1_94#_accept-and-run-generated-code-in-notebook



### Comparing between models

If you want to develop a generative AI application, you can use GitHub Models to find and experiment with AI models for free. Once you are ready to bring your application to production, you can switch to a token from a paid Azure account. See the [Azure AI](https://ai.azure.com/github/model/docs) documentation. (From [GitHub docs](https://docs.github.com/en/github-models/prototyping-with-ai-models))

Replace `model` name found in the marketplace.

In [146]:
question = "What are recent features for Copilot chat in notebooks?"
retrieved_docs = retrieve_documents(question)
final_answer = generate_answer(question, retrieved_docs, model="Mistral-small")
print(final_answer)


Answer: Recent features for Copilot chat in notebooks include the ability to attach variables from the Jupyter kernel in your requests for more precise control over the context, and the option to accept and run generated code directly from Inline Chat.

Learn more:
https://code.visualstudio.com/updates/v1_78#_github-copilot
https://code.visualstudio.com/updates/v1_94#_attach-variables-in-notebook-chat
https://code.visualstudio.com/updates/v1_90#_github-copilot
https://code.visualstudio.com/updates/v1_80#_github-copilot
https://code.visualstudio.com/updates/v1_94#_accept-and-run-generated-code-in-notebook



In [148]:
question = "What are recent features for Copilot chat in notebooks?"
retrieved_docs = retrieve_documents(question)
final_answer = generate_answer(question, retrieved_docs, model="meta-llama-3-8b-instruct")
print(final_answer)


Here's a tweet-sized summary of the recent features for Copilot chat in notebooks:

"Boost your productivity with the latest Copilot chat features in notebooks! 🚀 Now you can attach variables from the Jupyter kernel, ask questions using Bing search and enterprise knowledge bases, and even run generated code directly from Inline Chat. #Copilot #VSCode #Notebooks"

Learn more:
https://code.visualstudio.com/updates/v1_78#_github-copilot
https://code.visualstudio.com/updates/v1_94#_attach-variables-in-notebook-chat
https://code.visualstudio.com/updates/v1_90#_github-copilot
https://code.visualstudio.com/updates/v1_80#_github-copilot
https://code.visualstudio.com/updates/v1_94#_accept-and-run-generated-code-in-notebook



## Method 2: generating embeddings _ONLY_ for relevant results

First method would be great if you know which embeddings model you're going to use. But what if you want to experiment with embeddings models?

### Load data

In [1]:
import json

with open('release_notes.json', 'r') as file:
    release_notes = json.load(file)

In [6]:
import re

# Pre-compile the regex pattern for better performance in repeated use
pattern = re.compile(r'v1_(9|8)\d+')

latest_release = []
for item in release_notes:
    # Skip items without 'url' or section headers
    if 'url' not in item or item['url'].endswith('#_'):
        continue

    # Check if the 'url' matches the pattern for latest release () and filter out 'content_embeddings'
    if pattern.search(item['url']):
        filtered_item = {k: v for k, v in item.items() if k != 'content_embeddings'}
        latest_release.append(filtered_item)

# Print the filtered items
print(latest_release[:2])

[{'content': 'Learn what is new in the Visual Studio Code June 2023 Release (1.80)  \n### Accessibility help improvements  \nA new command **Open Accessibility Help** (`kb(editor.action.accessibilityHelp)`) opens a help menu based on the current context. It currently applies to the editor, terminal, notebook, chat panel, and inline chat features.  \nDisable the accessibility help menu hint and open additional documentation, if any, from within the help menu.', 'url': 'https://code.visualstudio.com/updates/v1_80#_accessibility-help-improvements', 'id': 3375}, {'content': 'Learn what is new in the Visual Studio Code June 2023 Release (1.80)  \n### Accessibility help for notebooks  \nA new accessibility help menu was added for notebooks to provide information about the editor layout and navigating and interacting with the notebook.', 'url': 'https://code.visualstudio.com/updates/v1_80#_accessibility-help-for-notebooks', 'id': 3376}]


In [65]:
print(f"Original (`release_notes`) = {len(release_notes)}")
print(f"Latest (`latest_release`) = {len(latest_release)}")

Original (`release_notes`) = 4138
Latest (`latest_release`) = 576


### Retrieve documents

Retrieve documents first via full-text search using TF-IDF & meilisearch. Then generate embeddings for top 10 full-text search results and conduct a semantic search to come up with the top 3 most relevant results.

In [45]:
import os

from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"

embeddings_client = EmbeddingsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(os.environ["AZURE_TOKEN"])
)

def generate_embeddings(text, model="text-embedding-3-small"):
    response = embeddings_client.embed(
        input=text,
        model=model
    )

    return response.data[0].embedding

In [66]:
def retrieve_and_embed_docs(query, documents=latest_release, embeddings_model="text-embedding-3-small", top_k=3):
    full_text_results = full_text_search(query, top_k=10) # full text search using TF-IDF & meilisearch

    # extract relevant document embeddings from meilisearch results
    relevant_texts = []
    doc_embeddings = []
    urls = []
    for hit in full_text_results:
        doc_id = hit['id']
        doc = next((item for item in documents if item.get('id') == doc_id), None)
        if 'content' in doc:
            relevant_texts.append(doc['content'])
            content_embeddings = generate_embeddings(doc['content'], model=embeddings_model)
            doc_embeddings.append(content_embeddings)
            urls.append(doc['url'])
    
    # vector search using FAISS
    query_embedding = generate_embeddings(query, model=embeddings_model)
    faiss_indices = faiss_search(query_embedding, doc_embeddings, top_k)
    
    # combine results
    combined_results = []
    for i in faiss_indices:
        combined_results.append({
            "content": relevant_texts[i],
            "url": urls[i]
        })

    return combined_results

In [68]:
# Test: Generate an answer for a sample question
q = "What are recent features for Copilot chat in notebooks?"

retrieved_docs = retrieve_and_embed_docs(q)
retrieved_docs

[{'content': 'Learn what is new in the Visual Studio Code September 2024 Release (1.94)  \n### Attach variables in notebook chat  \nWhen you use Copilot in a notebook, you can now attach variables from the Jupyter kernel in your requests. Adding variables gives you more precise control over the context for your chat request, so that you get more relevant responses from Copilot.  \nEither type `#`, followed by the variable name, or use the 📎 control (`kb(workbench.action.chat.attachContext)`) in Inline Chat to add a context variable.  \n<video src="images/1_94/notebook-kernel-variable.mp4" title="Attach a context variable by using `#` in a notebook chat request" autoplay loop controls muted></video>',
  'url': 'https://code.visualstudio.com/updates/v1_94#_attach-variables-in-notebook-chat'},
 {'content': 'Learn what is new in the Visual Studio Code March 2024 Release (1.88)  \n### GitHub Copilot  \n#### Inline Chat improvements  \nInline Chat now starts as a floating control, making it 

### Answer generation

In [106]:
# import tinyurl
def generate_llm_answer(question, context, completion_model="gpt-4o-mini"):
    # Combine the relevant documents into a single context
    context_text = " ".join([doc['content'] for doc in context if doc.get('content')])
    context_url = ", ".join([doc['url'] for doc in context if doc.get('url')])

    messages = [
        {"role": "system", "content": "You are a social assistant who writes creative content. You will politely decline any other requests from the user not related to creating content. Don't talk about a single VS Code release and don't talk about release dates at all. Instead, only talk about the relevant features. Don't include made up links. You format all your responses as Markdown unless otherwise specified. Avoid wrapping your entire response in a markdown code element."},
        {"role": "user", "content": f"Create a short tweet based on the following context: {context_text}. This won't actually be a tweet, so in your answer, always include the following URLs from the content sources: {context_url}. Question: {question}"}
    ]
    
    response = gpt_client.chat.completions.create(
        model=completion_model,
        messages=messages,
        temperature=0.3,
        max_tokens=1500, # Dynamically set max_tokens based on the combined length of the docs?
        top_p=1.0
    )

    answer = response.choices[0].message.content
    return answer

In [101]:
final_answer = generate_llm_answer(q, retrieved_docs)
print(final_answer)

🚀 Exciting updates in VS Code! Now you can attach variables in notebook chats with Copilot for more precise context. Just type `#` followed by the variable name or use the 📎 control! 🎉 Check it out: [Attach Variables](https://code.visualstudio.com/updates/v1_94#_attach-variables-in-notebook-chat) #VSCode #GitHubCopilot

For more on Copilot features, explore:  
- [March 2024 Release](https://code.visualstudio.com/updates/v1_88#_github-copilot)  
- [May 2024 Release](https://code.visualstudio.com/updates/v1_90#_github-copilot)


### Comparing between models

Use [GitHub Marketplace](https://github.com/marketplace/models) to find and experiment with AI models. Replace `embedidings_model` and `completion_model` names found in the marketplace:

```python
q = "What are recent features for Copilot chat in notebooks?"
retrieved_docs = retrieve_and_embed_docs(q, embeddings_model="text-embedding-3-small")
final_answer = generate_llm_answer(q, retrieved_docs, completion_model="gpt-4o-mini")
print(final_answer)
```

In [102]:
final_answer = generate_llm_answer(q, retrieved_docs, completion_model="Mistral-small")
print(final_answer)

🚀 New in VS Code 1.94! Attach variables from the Jupyter kernel in your Copilot chat requests for more relevant responses. Use `#` or the 📎 control. Learn more: https://code.visualstudio.com/updates/v1_94#_attach-variables-in-notebook-chat

And in VS Code 1.88, the kernel state is now automatically included as context in Inline Chat for notebooks. This lets Copilot use the current state of the notebook to provide more relevant completions. Learn more: https://code.visualstudio.com/updates/v1_88#_notebook-kernel-state-as-context


In [103]:
final_answer = generate_llm_answer(q, retrieved_docs, completion_model="meta-llama-3-8b-instruct")
print(final_answer)

Here's a tweet-sized summary of the recent features for Copilot chat in notebooks:

"New in VS Code! 🚀 Attach variables from Jupyter kernel in notebook chat with Copilot. Add context with `#` or 📎 control. Get more precise control over chat requests and relevant responses. Learn more: https://code.visualstudio.com/updates/v1_94#_attach-variables-in-notebook-chat"
