# RAG exploration

![goal_post](goal_post.png)

## 🤖 Social assistant, with an **off-the-shelf model**

In [1]:
url='https://microsoftfabric.devpost.com/'

In [2]:
import requests
from bs4 import BeautifulSoup
import json

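# Fetch the page content (here, a Devpost hackathon page)
response = requests.get(url, timeout=30)
response.raise_for_status()  # Fail early on HTTP errors instead of parsing an error page
soup = BeautifulSoup(response.content, 'html.parser')

# Extract all text from the page, including navigation and footer text
blog_content = soup.get_text()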

# Convert the content to JSON format
content = json.dumps({"content": blog_content}, indent=4)

# Output the JSON content
print(content)

{
    "content": "\n\n\n\n  \n\n\n\n\n\n\n\n\n\n\nMicrosoft Fabric and AI Learning Hackathon: Building the next wave of innovative AI powered data analytics applications with Microsoft Fabric - Devpost\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n\n\n\n\n      Log in\n \n\n\n\n        Sign up\n      \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\nDevpost\n\n\nHackathons\nProjects\nHost a public hackathon\n\n\n\n\n\n\nDevpost for Teams\n\n\nTeams login\nRequest a demo\n\n\n\n\n\n    Hackathons\n\n\n\n\n    Projects\n\n\n\nBlog\n\n\n\n\n    Host a hackathon\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nProduct\n\n\n\n\n\n\n\nDevpost\nGrow your developer ecosystem and promote your platform.\n\n\n\nHackathons\nProjects\nHost a public hackathon\n\n\n\n\n\n\nDevpost for Teams\nDrive innovation, collaboration, and retention within your organization.\n\n\n\nTeams login\nRequest a demo\n\n\n\n\n\n\n\n\n\n    Hackathons\n\n\n\n\n    Proje

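The scraped text above is mostly runs of newlines and navigation labels, and all of it is sent to the model as prompt tokens. Below is a minimal sketch of one way to trim it before building the prompt, assuming a hypothetical helper named `collapse_whitespace` that is not part of the original notebook:

```python
import re

def collapse_whitespace(text: str) -> str:
    """Squeeze runs of spaces/tabs and drop lines that are empty after stripping."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)

# A small sample shaped like the scraped output above
sample = "\n\n  Log in\n \n\n        Sign up\n      \n"
print(collapse_whitespace(sample))
```

In the notebook this could be applied as `blog_content = collapse_whitespace(soup.get_text())`. Only whitespace is removed, so navigation labels would still remain in the text.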
In [4]:
# Messages for the LLM: create a short LinkedIn post from the scraped page

system_message = """
You are a social assistant who writes creative content. You will politely decline any other requests from the user not related to creating content. Don't talk about a single VS Code release and don't talk about release dates at all. Instead, only talk about the relevant features. Don't include made up links, but do provide real links to the VS Code release notes for specific features. You format all your responses as Markdown unless otherwise specified. Avoid wrapping your entire response in a markdown code element.
"""
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": f"Create a very short LinkedIn post using the following: {content}"}
]

In [6]:
import os
from openai import OpenAI

token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
model_name = "gpt-4o-mini"

client = OpenAI(
    base_url=endpoint,
    api_key=token,
)

response = client.chat.completions.create(
    messages=messages,
    temperature=1.0,
    top_p=1.0,
    max_tokens=1000,
    model=model_name
)

print(post := response.choices[0].message.content)

🚀 Exciting news! Join us for the **Microsoft Fabric and AI Learning Hackathon**! 🎉 

Dive deep into building innovative AI-powered data analytics applications using Microsoft Fabric. This is a unique opportunity to leverage Azure OpenAI services, collaborate with fellow tech enthusiasts, and showcase your skills for a chance to win from a prize pool of **$10,000**! 

🗓️ **Deadline:** November 12, 2024  
🔗 [Join the hackathon here](https://devpost.com) and explore endless possibilities in AI and data analytics. Let’s shape the future together! 💡 #Hackathon #MicrosoftFabric #AI #Innovation
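
The system message asks the model not to invent links, yet the post above links to `https://devpost.com` rather than the page that was scraped. Below is a minimal sketch of a post-generation check that flags any Markdown link whose URL was not supplied to the model. The names `extract_links` and `find_unsupported_links` are hypothetical, and exact URL matching is an assumed policy.

```python
import re

# Matches Markdown inline links of the form [label](url).
MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")

def extract_links(markdown_text):
    """Return (label, url) pairs for every inline Markdown link."""
    return MARKDOWN_LINK.findall(markdown_text)

def find_unsupported_links(markdown_text, allowed_urls):
    """Return the URLs in the text that are not in the allowed set."""
    # Ignore a trailing slash so that equivalent URLs compare equal.
    allowed = {url.rstrip("/") for url in allowed_urls}
    return [url for _, url in extract_links(markdown_text)
            if url.rstrip("/") not in allowed]

generated = "🔗 [Join the hackathon here](https://devpost.com) #Hackathon"
flagged = find_unsupported_links(generated, ["https://microsoftfabric.devpost.com/"])
print(flagged)
```

A flagged URL is not necessarily wrong, only unverified. It could be replaced with the known source URL or sent back to the model for correction.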


## 📚 Text search

#### Extract key topics & features

In [7]:
# Messages to give LLM, to extract key topics & features

topic_system_message = """
You are an expert at conducting entity extraction. Generate top topics and functionality based on provided content. Focus on identifying key concepts, themes, and relevant terms related to specific developer tooling, with a particular emphasis on VS Code features. Make sure entities you extract are directly relevant to the developer environment described. Don't mention specific dates or years. Use advanced search techniques, including Boolean operators and keyword variations, to craft precise, optimized queries that yield the most relevant results. Aim for clarity, relevance, and depth to cover all aspects of the topic efficiently. Simply list the phrases without additional explanation or details. Do not list any bullet points or numbered lists or quotation marks.
"""

topic_user_message = "Come up with a list of top 5 developer tooling topics, functionalities, and relevant terms, with a strong focus on VS Code features and integrations based on the following content: "

In [8]:
def extract_key_topics(content, model="gpt-4o-mini"):
    messages = [
        {"role": "system", "content": topic_system_message},
        {"role": "user", "content": topic_user_message+content}
    ]

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.3,
    )

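    # One topic per line; drop blank lines and the trailing spaces the model may add
    raw = response.choices[0].message.content
    key_topics = [line.strip() for line in raw.splitlines() if line.strip()]
    return key_topics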

key_topics = extract_key_topics(content)
print("\n".join([topic + "\n" for topic in key_topics]))

Microsoft Fabric integration  

AI-powered data analytics  

Real-Time Intelligence in Microsoft Fabric  

Azure OpenAI services  

Developer community engagement



#### Load & filter data

In [None]:
# load release_notes.json as a dataframe

import pandas as pd

df = pd.read_json('release_notes.json')
df.head()

| | content | url | id |
|---|---|---|---|
| 0 | See what is new in the Visual Studio Code July... | https://code.visualstudio.com/updates/July_201... | 0 |
| 1 | See what is new in the Visual Studio Code July... | https://code.visualstudio.com/updates/July_201... | 1 |
| 2 | See what is new in the Visual Studio Code July... | https://code.visualstudio.com/updates/July_201... | 2 |
| 3 | See what is new in the Visual Studio Code July... | https://code.visualstudio.com/updates/July_201... | 3 |
| 4 | See what is new in the Visual Studio Code July... | https://code.visualstudio.com/updates/July_201... | 4 |


In [15]:
"""
Cell generated by Data Wrangler.
"""
def clean_data(df):
    # Filter rows based on column: 'content'
    df = df[(df['content'].str.contains("2023", regex=False, na=False)) | (df['content'].str.contains("2024", regex=False, na=False))]
    return df

df_clean = clean_data(df.copy())
df_clean.head()

| | content | url | id |
|---|---|---|---|
| 3024 | Learn what is new in the Visual Studio Code Ja... | https://code.visualstudio.com/updates/v1_75#_d... | 3112 |
| 3025 | Learn what is new in the Visual Studio Code Ja... | https://code.visualstudio.com/updates/v1_75#_t... | 3113 |
| 3026 | Learn what is new in the Visual Studio Code Ja... | https://code.visualstudio.com/updates/v1_75#_t... | 3114 |
| 3027 | Learn what is new in the Visual Studio Code Ja... | https://code.visualstudio.com/updates/v1_75#_w... | 3115 |
| 3028 | Learn what is new in the Visual Studio Code Ja... | https://code.visualstudio.com/updates/v1_75#_i... | 3116 |


#### Perform text search

In [16]:
from rank_bm25 import BM25Okapi
import pandas as pd

def bm25_search(df, key_topics, top_n=10):
    # Preprocess the content
    tokenized_corpus = [doc.split(" ") for doc in df['content']]
    
    # Initialize BM25
    bm25 = BM25Okapi(tokenized_corpus)
    
    # Combine key topics into a single query
    query = " ".join(key_topics).split(" ")
    
    # Get BM25 scores
    scores = bm25.get_scores(query)
    
    # Get top N documents
    top_n_indices = scores.argsort()[-top_n:][::-1]
    top_n_docs = df.iloc[top_n_indices]
    
    return top_n_docs

# Perform the search
top_documents = bm25_search(df_clean, key_topics)
print(top_documents)

                                                content  \
3194  Learn what is new in the Visual Studio Code Ap...   
3914  Learn what is new in the Visual Studio Code Se...   
3073  Learn what is new in the Visual Studio Code Ja...   
3852  Learn what is new in the Visual Studio Code Au...   
3630  Learn what is new in the Visual Studio Code Ma...   
3370  Learn what is new in the Visual Studio Code Au...   
3818  Learn what is new in the Visual Studio Code Ju...   
3867  Learn what is new in the Visual Studio Code Au...   
3581  Learn what is new in the Visual Studio Code Fe...   
3207  Learn what is new in the Visual Studio Code Ma...   

                                                    url    id  
3194  https://code.visualstudio.com/updates/v1_78#_a...  3283  
3914  https://code.visualstudio.com/updates/v1_94#_m...  4011  
3073  https://code.visualstudio.com/updates/v1_75#_l...  3161  
3852  https://code.visualstudio.com/updates/v1_93#_r...  3947  
3630  https://code.visualstudi
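
`BM25Okapi` ranks documents by how often the query terms occur in them, discounted for terms that are common across the corpus and for long documents. The sketch below is a textbook BM25 variant written for illustration, not the `rank_bm25` implementation. The `tokenize` helper, the IDF form, and the toy corpus are all assumptions. It lowercases and strips punctuation, whereas the `split(" ")` tokenization above is case-sensitive and keeps punctuation attached to words.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on non-word characters (an assumed normalization)."""
    return re.findall(r"\w+", text.lower())

def bm25_scores(query_tokens, tokenized_corpus, k1=1.5, b=0.75):
    """Score every document against the query with a textbook BM25 formula."""
    n_docs = len(tokenized_corpus)
    avg_len = sum(len(doc) for doc in tokenized_corpus) / n_docs
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for doc in tokenized_corpus for term in set(doc))
    scores = []
    for doc in tokenized_corpus:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            freq = term_freq.get(term, 0)
            if freq == 0:
                continue  # Terms absent from the document contribute nothing
            # Non-negative IDF variant; rarer terms get a larger weight.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = freq + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus, made up for this example
corpus = [
    "Notebook improvements for Python in VS Code",
    "Terminal sticky scroll and shell integration",
    "Microsoft authentication now uses MSAL",
]
tokenized = [tokenize(doc) for doc in corpus]
scores = bm25_scores(tokenize("Microsoft Fabric authentication"), tokenized)
best = max(range(len(corpus)), key=scores.__getitem__)
print(corpus[best])
```

Only the third toy document shares any term with the query, so it is the only one with a non-zero score. That may also explain the results above: a query built from hackathon topics can match release notes only on incidental terms such as "Microsoft".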

## 🔢 Semantic reranking

In [17]:
# Messages to give LLM, to re-rank the documents based on semantic relevance

rerank_system_message = """
You are tasked with re-ranking a set of documents based on their relevance to given search queries. The documents have already been retrieved based on initial search criteria, but your role is to refine the ranking by considering factors such as semantic similarity to the query, context relevance, and alignment with the user's intent. Focus on documents that provide concise, high-quality information, ensuring that the top-ranked documents answer the query as accurately and completely as possible. If you can't rank them based on semantic relevance, give higher rank to documents with VS Code features that were published most recently. Make sure to return the full content and URL of each document, and format your response as a Markdown list item, with the URL in parentheses. Do not include any additional information or commentary about the documents. List a variety of documents, and give more weight to documents that mention Python and/or notebook features.
"""

rerank_user_message = f"Here are some documents: {top_documents.to_json(orient='records')}. Re-rank those documents based on these key VS Code functionalities: {key_topics}. Only return the top 3."

In [19]:
def rerank_documents(model="gpt-4o-mini"):
    messages = [
        {"role": "system", "content": rerank_system_message},
        {"role": "user", "content": rerank_user_message}
    ]

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.7,
    )

    reranked_documents = response.choices[0].message.content.split('\n')
    return reranked_documents

reranked_documents = rerank_documents()
print("\n".join([doc + "\n" for doc in reranked_documents]))


1. [Learn what is new in the Visual Studio Code September 2024 Release (1.94)](https://code.visualstudio.com/updates/v1_94#_msal-based-microsoft-authentication)  

   Content: We have been moving towards having our Microsoft Authentication stack use [MSAL (Microsoft Authentication Library)](http://github.com/AzureAD/microsoft-authentication-library-for-js). It's been a huge undertaking, but we have made great progress in this iteration. This work spans all VS Code clients, so that includes VS Code and [VS Code for the Web](https://vscode.dev).



2. [Learn what is new in the Visual Studio Code August 2024 Release (1.93)](https://code.visualstudio.com/updates/v1_93#_authentication-account-api)  

   Content: The authentication APIs now have more control when handling multiple accounts. Something that has always been missing is the ability to get all accounts and get an `AuthenticationSession` for a specific account. That is now possible with the finalization of the `getAccounts` API.
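
`rerank_user_message` above is an f-string that is evaluated once, when its cell runs. If the search is re-run without re-running that cell, the reranker is sent stale documents. Below is a minimal sketch that builds the messages from arguments instead. It assumes a hypothetical helper `build_rerank_messages`, a plain list of dicts in place of the DataFrame, and a placeholder `example.com` URL.

```python
import json

def build_rerank_messages(system_message, documents, key_topics, top_k=3):
    """Build chat messages for an LLM reranker from the current inputs."""
    user_message = (
        f"Here are some documents: {json.dumps(documents)}. "
        f"Re-rank those documents based on these key VS Code functionalities: "
        f"{', '.join(key_topics)}. Only return the top {top_k}."
    )
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]

# Placeholder document, made up for this example
docs = [{"content": "MSAL-based authentication", "url": "https://example.com/a", "id": 1}]
messages = build_rerank_messages("You re-rank documents.", docs, ["Azure OpenAI services"])
print(messages[1]["content"])
```

With the notebook's DataFrame, `documents` could be `top_documents.to_dict(orient='records')`, and the returned list could be passed as `messages=` to `client.chat.completions.create`.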




## 🧠 Social assistant, with **relevant features**

In [21]:
def generate_llm_answer(content, context, completion_model):
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content":  f"Create a very short LinkedIn post using the following content: {content}. Also, include the following established VS Code features along with their URLs in your response, so folks seeing the post can try them out: {context}."}
    ]

    response = client.chat.completions.create(
        model=completion_model,
        messages=messages,
        temperature=0.3
    )

    answer = response.choices[0].message.content
    return answer

print(generate_llm_answer(content, reranked_documents, completion_model="gpt-4o-mini"))

🚀 Exciting news! Join us for the **Microsoft Fabric and AI Learning Hackathon**! This is your chance to innovate and build the next wave of AI-powered data analytics applications using Microsoft Fabric. With $10,000 in prizes and a supportive community, it's the perfect opportunity for AI enthusiasts and cloud experts alike! 

🗓️ **Deadline:** November 12, 2024  
🔗 [Join the hackathon now!](https://devpost.com)  

While you're at it, check out some amazing features in Visual Studio Code to enhance your development experience:

1. [Learn what is new in the Visual Studio Code September 2024 Release (1.94)](https://code.visualstudio.com/updates/v1_94#_msal-based-microsoft-authentication) - Discover the progress on Microsoft Authentication using MSAL.
   
2. [Learn what is new in the Visual Studio Code August 2024 Release (1.93)](https://code.visualstudio.com/updates/v1_93#_authentication-account-api) - Explore the new `getAccounts` API for better account management.

3. [Learn what is new

#### Compare responses between chat models

In [22]:
print(generate_llm_answer(content, reranked_documents, completion_model="Mistral-small"))

📣 Exciting Opportunity Alert! 📣

Join the Microsoft Fabric and AI Learning Hackathon and build innovative AI-powered data analytics applications using Microsoft Fabric! 🚀

Deadline: Nov 12, 2024 @ 5:00pm PST
[Join the hackathon here](#)

While you're working on your project, why not enhance your coding experience with some of the latest features in Visual Studio Code?

1. [Learn what's new in the Visual Studio Code September 2024 Release (1.94)](https://code.visualstudio.com/updates/v1_94#_msal-based-microsoft-authentication)
   - Improved Microsoft Authentication using MSAL

2. [Learn what's new in the Visual Studio Code August 2024 Release (1.93)](https://code.visualstudio.com/updates/v1_93#_authentication-account-api)
   - More control over multiple accounts with the `getAccounts` API

3. [Learn what's new in the Visual Studio Code August 2024 Release (1.93)](https://code.visualstudio.com/updates/v1_93#_renamed-sql-to-ms-sql)
   - Improved JavaScript and TypeScript support on VS Cod

In [23]:
print(generate_llm_answer(content, reranked_documents, completion_model="meta-llama-3-8b-instruct"))

Here's a LinkedIn post based on the content you provided:

**Microsoft Fabric and AI Learning Hackathon: Building the Next Wave of Innovative AI-Powered Data Analytics Applications**

Calling all AI enthusiasts, cloud computing experts, and database gurus! Microsoft is excited to announce the Microsoft Fabric and AI Learning Hackathon, a unique opportunity to dive deep into the future of AI and cloud innovation.

**What to Build**

Complete the Microsoft Learn AI Skills Challenge (Microsoft Fabric) and build a new Fabric solution that leverages Azure OpenAI services and falls into one of the following categories:

* Microsoft Fabric + AI Innovation
* Real-Time Intelligence in Microsoft Fabric
* Azure Database for PostgreSQL Integration
* SQL And AI Integration
* Azure Cosmos DB + Microsoft Fabric Integration

**Prizes**

* Grand Prize Winner: $2,500 USD, social media recognition, opportunity to be featured in a Microsoft blog post, and a Microsoft participation badge
* Category Winners
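
The three cells above call the same function once per model. Below is a minimal sketch of running that comparison in a loop. It assumes a hypothetical helper `compare_models` that accepts any callable of the shape `generate(model_name) -> str` and records failures instead of stopping at the first unavailable model.

```python
def compare_models(generate, model_names):
    """Call generate(model_name) for each model and collect the text or the error."""
    results = {}
    for name in model_names:
        try:
            results[name] = generate(name)
        except Exception as exc:  # Keep going if one model fails
            results[name] = f"ERROR: {exc}"
    return results

def fake_generate(model_name):
    # Stand-in for the real API call so the sketch runs offline.
    if model_name == "unavailable-model":
        raise RuntimeError("model not found")
    return f"post from {model_name}"

results = compare_models(fake_generate, ["gpt-4o-mini", "unavailable-model"])
for name, text in results.items():
    print(f"--- {name} ---\n{text}")
```

In the notebook, `generate` could be `lambda m: generate_llm_answer(content, reranked_documents, completion_model=m)`, which keeps the prompt and context identical across models so that only the model varies.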

#### Try it out yourself! 😄

VS Code extensions used for this demo:

![extensions](vsc_extensions.png)