# RAG exploration

![goal_post](goal_post.png)

## 🤖 Social assistant, with **off the shelf model**

In [7]:
import json

# Load the content of blog.txt
with open('blog.txt', 'r') as file:
    blog_content = file.read()

# Parse the content as JSON
content = json.loads(blog_content)
# Convert the content dictionary back to a JSON string
content = json.dumps(content, indent=4)

print(content)

{
    "title": "Announcing: The Microsoft Fabric & AI Learning Hackathon | Microsoft Fabric Blog | Microsoft Fabric",
    "content": "Get ready for the Microsoft Fabric & AI Learning Hackathon!\u00a0\u00a0 We\u2019re calling all Data/AI Enthusiasts and Data/AI practitioners to join us for another exciting opportunity to upskill and build the next generation of Data + AI solutions with Microsoft Fabric!\u00a0 This event follows up on the recent Microsoft Fabric Global AI Hackathon held earlier this year where participants from all over the world upskilled their knowledge of the platform to create a variety of innovatitve project submissions.\u00a0 This time, we\u2019re getting the word out just ahead the European Microsoft Fabric Conference where you can expect exciting new features and additions to be revealed, while giving participants a most excellent reason to try them out in this Hackathon!\u00a0 The contest will be even bigger and better than ever, this time coordinating the Hacka

In [3]:
# Messages to give LLM, to create a short LinkedIn post based on a blog post

system_message = """
You are a social assistant who writes creative content. You will politely decline any other requests from the user not related to creating content. Don't talk about a single VS Code release and don't talk about release dates at all. Instead, only talk about the relevant features. Don't include made up links, but do provide real links to the VS Code release notes for specific features. You format all your responses as Markdown unless otherwise specified. Avoid wrapping your entire response in a markdown code element.
"""
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": f"Create a very short LinkedIn post using the following: {content}"}
]

In [4]:
import os
from openai import OpenAI

token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
model_name = "gpt-4o-mini"

client = OpenAI(
    base_url=endpoint,
    api_key=token,
)

response = client.chat.completions.create(
    messages=messages,
    temperature=1.0,
    top_p=1.0,
    max_tokens=1000,
    model=model_name
)

print(response.choices[0].message.content)

🚀 **Exciting News for Data/AI Enthusiasts!** 🚀

We're thrilled to announce the **Microsoft Fabric & AI Learning Hackathon!** This is your chance to upskill and develop innovative Data + AI solutions with Microsoft Fabric.

📅 **Why Participate?**
- Compete for a share of **$10,000 in prizes!**
- Engage in a **7-week submission period** and showcase your skills.
- Access **live support** from our experts - whether you're a beginner or a seasoned developer.

Join us and let’s build the future of Data + AI together! For registration details, check out the link: [Microsoft Fabric & AI Learning Hackathon](https://microsoftfabric.devpost.com).

Can’t wait to see what you create! 🎉

#MicrosoftFabric #Hackathon #DataAI #Innovation #Upskill


## 📚 Text search

#### Extract key topics & features

In [5]:
# Messages to give LLM, to extract key topics & features

topic_system_message = """
You are an expert at conducting entity extraction. Generate top topics and functionality based on provided content. Focus on identifying key concepts, themes, and relevant terms related to specific developer tooling, with a particular emphasis on VS Code features. Make sure entities you extract are directly relevant to the developer environment described. Don't mention specific dates or years. Use advanced search techniques, including Boolean operators and keyword variations, to craft precise, optimized queries that yield the most relevant results. Aim for clarity, relevance, and depth to cover all aspects of the topic efficiently. Simply list the phrases without additional explanation or details. Do not list any bullet points or numbered lists or quotation marks.
"""

topic_user_message="Come up with a list of top 5 developer tooling topics, functionalities, and relevant terms, with a strong focus on VS Code features and integrations based on the following content: "

In [8]:
def extract_key_topics(content, model="gpt-4o-mini"):
    messages = [
        {"role": "system", "content": topic_system_message},
        {"role": "user", "content": topic_user_message+content}
    ]

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.3,
    )

    key_topics = response.choices[0].message.content.split('\n')
    return key_topics

key_topics = extract_key_topics(content)
print("\n".join([topic + "\n" for topic in key_topics]))

Microsoft Fabric integration with VS Code  

Data and AI solution development in VS Code  

Live support and interaction features in VS Code  

Video demonstration and documentation for submissions  

Managed Private Endpoints for secure data streaming in VS Code
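
The topic list above comes from splitting the model's reply on newlines. Even though the system message asks for no bullets or numbering, a model may add them anyway, and blank lines or trailing spaces can slip into the list. The following is a minimal sketch of a more defensive parser; `parse_topics` and the sample reply are hypothetical and are not part of the original notebook.

```python
import re

def parse_topics(raw_reply):
    """Turn a newline-separated model reply into a clean list of topic strings."""
    topics = []
    for line in raw_reply.splitlines():
        # Remove a leading list marker such as "- ", "* ", "1. " or "1) " if present
        cleaned = re.sub(r"^\s*(?:[-*]|\d+[.)])\s+", "", line).strip()
        if cleaned:  # skip blank lines
            topics.append(cleaned)
    return topics

# Made-up reply for illustration only
sample_reply = (
    "Microsoft Fabric integration with VS Code  \n"
    "\n"
    "- Live support and interaction features in VS Code\n"
)
print(parse_topics(sample_reply))
```

Cleaning the list at this stage also keeps empty tokens out of any query that is later built by joining the topics and splitting on spaces.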



#### Load & filter VS Code release notes

In [9]:
# load release_notes.json as a dataframe

import pandas as pd

df = pd.read_json('release_notes.json')
df.head()

|   | content | url | id |
|---|---------|-----|----|
| 0 | See what is new in the Visual Studio Code July... | https://code.visualstudio.com/updates/July_201... | 0 |
| 1 | See what is new in the Visual Studio Code July... | https://code.visualstudio.com/updates/July_201... | 1 |
| 2 | See what is new in the Visual Studio Code July... | https://code.visualstudio.com/updates/July_201... | 2 |
| 3 | See what is new in the Visual Studio Code July... | https://code.visualstudio.com/updates/July_201... | 3 |
| 4 | See what is new in the Visual Studio Code July... | https://code.visualstudio.com/updates/July_201... | 4 |


In [10]:
"""
Cell generated by Data Wrangler.
"""
def clean_data(df):
    # Filter rows based on column: 'content'
    df = df[(df['content'].str.contains("2023", regex=False, na=False)) | (df['content'].str.contains("2024", regex=False, na=False))]
    return df

df_clean = clean_data(df.copy())
df_clean.head()

|      | content | url | id |
|------|---------|-----|----|
| 3024 | Learn what is new in the Visual Studio Code Ja... | https://code.visualstudio.com/updates/v1_75#_d... | 3112 |
| 3025 | Learn what is new in the Visual Studio Code Ja... | https://code.visualstudio.com/updates/v1_75#_t... | 3113 |
| 3026 | Learn what is new in the Visual Studio Code Ja... | https://code.visualstudio.com/updates/v1_75#_t... | 3114 |
| 3027 | Learn what is new in the Visual Studio Code Ja... | https://code.visualstudio.com/updates/v1_75#_w... | 3115 |
| 3028 | Learn what is new in the Visual Studio Code Ja... | https://code.visualstudio.com/updates/v1_75#_i... | 3116 |
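
The filter above keeps any row whose text contains the substring "2023" or "2024", so an older release note that merely mentions one of those years would also pass. A stricter alternative is to read the year from the release title. The sketch below assumes every entry starts with a title shaped like "... Visual Studio Code January 2023 Release ...", as the sample rows suggest. `release_year` and `is_recent` are hypothetical helpers, and the sample strings are made up.

```python
import re

# Assumed title shape: "... Visual Studio Code <Month> <Year> ..."
RELEASE_TITLE = re.compile(r"Visual Studio Code (\w+) (\d{4})\b")

def release_year(text):
    """Return the year named in the release title, or None if no title is found."""
    match = RELEASE_TITLE.search(text)
    return int(match.group(2)) if match else None

def is_recent(text, years=(2023, 2024)):
    """True only when the release title itself names one of the wanted years."""
    return release_year(text) in years

# Made-up examples: the second one would pass a plain substring check for "2023"
new_note = "Learn what is new in the Visual Studio Code January 2023 Release (1.75)"
old_note = "See what is new in the Visual Studio Code July 2016 release. Revisited in 2023."
print(is_recent(new_note), is_recent(old_note))
```

With pandas, a helper like this could be applied through `df['content'].map(is_recent)` to build the boolean mask.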


In [12]:
from rank_bm25 import BM25Okapi
import pandas as pd

def search_with_bm25(df, key_topics, top_n=10):
    # Tokenize the content of the dataframe
    tokenized_corpus = [doc.split(" ") for doc in df['content']]
    
    # Initialize BM25
    bm25 = BM25Okapi(tokenized_corpus)
    
    # Combine key topics into a single query
    query = " ".join(key_topics).split(" ")
    
    # Get BM25 scores for the query
    scores = bm25.get_scores(query)
    
    # Get the indices of the top_n scores
    top_n_indices = scores.argsort()[-top_n:][::-1]
    
    # Return the top_n documents
    top_n_docs = df.iloc[top_n_indices]
    return top_n_docs

# Perform the search and get the top 10 documents
top_documents = search_with_bm25(df_clean, key_topics)
print(top_documents)

                                                content  \
3468  Learn what is new in the Visual Studio Code Oc...   
3076  Learn what is new in the Visual Studio Code Ja...   
3111  Learn what is new in the Visual Studio Code Fe...   
3148  Learn what is new in the Visual Studio Code Ma...   
3461  Learn what is new in the Visual Studio Code Oc...   
3326  Learn what is new in the Visual Studio Code Ju...   
3360  Learn what is new in the Visual Studio Code Ju...   
3889  Learn what is new in the Visual Studio Code Se...   
3914  Learn what is new in the Visual Studio Code Se...   
3854  Learn what is new in the Visual Studio Code Au...   

                                                    url    id  
3468  https://code.visualstudio.com/updates/v1_84#_g...  3560  
3076  https://code.visualstudio.com/updates/v1_75#_g...  3164  
3111  https://code.visualstudio.com/updates/v1_76#_g...  3199  
3148  https://code.visualstudio.com/updates/v1_77#_....  3236  
3461  https://code.visualstudi
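
To make the ranking step less of a black box, here is a small standard-library sketch of Okapi BM25 scoring. It is a textbook variant, not the `rank_bm25` implementation: the IDF term uses `log(1 + ...)` so that scores never go negative, which means the numbers may differ from what `BM25Okapi` returns. It also lowercases and splits on non-alphanumeric characters, unlike the plain `split(" ")` used above. `tokenize`, `bm25_scores`, and the three sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters, dropping empty tokens."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]

def bm25_scores(corpus, query, k1=1.5, b=0.75):
    """Score every document in `corpus` against `query` with a textbook BM25 formula."""
    docs = [tokenize(doc) for doc in corpus]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs

    # Document frequency: in how many documents each term appears
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    query_terms = tokenize(query)
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalisation: long documents need more matches for the same score
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

# Made-up mini corpus for illustration
corpus = [
    "VS Code adds Gradle for Java support",
    "Terminal improvements and sticky scroll",
    "Notebook editor updates",
]
print(bm25_scores(corpus, "gradle java"))
```

Because the tokenizer lowercases, "Gradle" in a document matches "gradle" in the query. With whitespace-only splitting those would count as different terms.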

#### Perform text search based on extracted key topics

## 🔢 Semantic reranking

In [13]:
# Messages to give LLM, to re-rank the documents based on semantic relevance

rerank_system_message = """
You are tasked with re-ranking a set of documents based on their relevance to given search queries. The documents have already been retrieved based on initial search criteria, but your role is to refine the ranking by considering factors such as semantic similarity to the query, context relevance, and alignment with the user's intent. Focus on documents that provide concise, high-quality information, ensuring that the top-ranked documents answer the query as accurately and completely as possible. If you can't rank them based on semantic relevance, give higher rank to documents with VS Code features that were published most recently. Make sure to return the full content and URL of each document, and format your response as a Markdown list item, with the URL in parentheses. Do not include any additional information or commentary about the documents.
"""

def build_rerank_user_message(documents, key_topics):
    # Built at call time so the message always reflects the current search results
    return f"Here are some documents: {documents.to_json(orient='records')}. Re-rank those documents based on these key VS Code functionalities: {key_topics}. Only return the top 3."

In [14]:
def rerank_documents(documents, key_topics, model="gpt-4o-mini"):
    messages = [
        {"role": "system", "content": rerank_system_message},
        {"role": "user", "content": build_rerank_user_message(documents, key_topics)}
    ]

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.3,
    )

    # One list item per line of the model's reply
    reranked_documents = response.choices[0].message.content.split('\n')
    return reranked_documents

reranked_documents = rerank_documents(top_documents, key_topics)
print("\n".join([doc + "\n" for doc in reranked_documents]))


1. [Learn what is new in the Visual Studio Code September 2024 Release (1.94) - MSAL-based Microsoft Authentication](https://code.visualstudio.com/updates/v1_94#_msal-based-microsoft-authentication)



2. [Learn what is new in the Visual Studio Code September 2024 Release (1.94) - Automated test setup (Experimental)](https://code.visualstudio.com/updates/v1_94#_automated-test-setup-experimental)



3. [Learn what is new in the Visual Studio Code October 2023 Release (1.84) - Gradle for Java](https://code.visualstudio.com/updates/v1_84#_gradle-for-java)
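
The re-ranked result is free-form Markdown, so a later step may want it as structured data, and may want to check that every URL really came from the retrieved documents instead of being invented by the model. The sketch below parses Markdown list items of the form shown above into `(title, url)` pairs and optionally drops unknown URLs. `parse_ranked_links` and `known_urls` are hypothetical names, not part of the original notebook.

```python
import re

# Matches "1. [title](url)" as well as "- [title](url)" or "* [title](url)"
LINK_ITEM = re.compile(
    r"^\s*(?:\d+\.|[-*])\s*\[(?P<title>.+?)\]\((?P<url>https?://[^)\s]+)\)\s*$"
)

def parse_ranked_links(lines, known_urls=None):
    """Extract (title, url) pairs from Markdown list items, in their original order.

    If `known_urls` is given, links whose URL is not in that set are dropped.
    """
    results = []
    for line in lines:
        match = LINK_ITEM.match(line)
        if not match:
            continue  # blank lines and commentary are ignored
        url = match.group("url")
        if known_urls is not None and url not in known_urls:
            continue
        results.append((match.group("title"), url))
    return results

sample_lines = [
    "",
    "3. [Learn what is new in the Visual Studio Code October 2023 Release (1.84) - Gradle for Java](https://code.visualstudio.com/updates/v1_84#_gradle-for-java)",
    "",
]
print(parse_ranked_links(sample_lines))
```

In the notebook, `known_urls` could be built from the `url` column of the retrieved documents, for example `set(top_documents['url'])`.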



## 🧠 Social assistant, with **relevant features**

In [16]:
def generate_llm_answer(content, context, completion_model):
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content":  f"Create a very short LinkedIn post using the following content: {content}. Also, include the following additional information in your response, and always include the URLs: {context}."}
    ]

    response = client.chat.completions.create(
        model=completion_model,
        messages=messages,
        temperature=0.3
    )

    answer = response.choices[0].message.content
    return answer

print(generate_llm_answer(content, reranked_documents, completion_model="gpt-4o-mini"))

🚀 **Exciting News!** 🌟

We're thrilled to announce the **Microsoft Fabric & AI Learning Hackathon**! Calling all Data/AI enthusiasts and practitioners to join us in building the next generation of Data + AI solutions with Microsoft Fabric.

This event follows the successful Microsoft Fabric Global AI Hackathon and coincides with the upcoming European Microsoft Fabric Conference. With **$10,000 in prizes** and a chance to learn from experts, this is an opportunity you don’t want to miss!

👉 Register now at [Microsoft Fabric Hackathon](https://microsoftfabric.devpost.com) and get ready to innovate!

#MicrosoftFabric #AI #Hackathon #DataScience

---

For those interested in the latest features in Visual Studio Code, check out these updates:
1. [Learn what is new in the Visual Studio Code September 2024 Release (1.94) - MSAL-based Microsoft Authentication](https://code.visualstudio.com/updates/v1_94#_msal-based-microsoft-authentication)
2. [Learn what is new in the Visual Studi

#### Compare responses between chat models

In [17]:
print(generate_llm_answer(content, reranked_documents, completion_model="Mistral-small"))

📣 Exciting News! 📣

Join us for the Microsoft Fabric & AI Learning Hackathon! 🚀 Data/AI Enthusiasts and Practitioners, this is your chance to upskill and build the next generation of Data + AI solutions with Microsoft Fabric. 💡

Following the success of the Microsoft Fabric Global AI Hackathon, we're now coordinating a bigger and better event through the DevPost Platform. With a 7-week submission period and $10,000 in prizes, there's never been a better time to learn and create! 💰

Register now at [Microsoft Fabric & AI Learning Hackathon](https://microsoftfabric.devpost.com) and start building your solution with Microsoft Fabric.

While you're expanding your skills, don't forget to check out the latest features in Visual Studio Code:
1. [MSAL-based Microsoft Authentication](https://code.visualstudio.com/updates/v1_94#_msal-based-microsoft-authentication)
2. [Automated Test Setup (Experimental)](https://code.visualstudio.com/updates/v1_94#_automated-test-setup-experiment

In [18]:
print(generate_llm_answer(content, reranked_documents, completion_model="meta-llama-3-8b-instruct"))

Here's a short LinkedIn post:

**Exciting News!**

Get ready to upskill and build the next generation of Data + AI solutions with Microsoft Fabric! The Microsoft Fabric & AI Learning Hackathon is now open for registration. This 7-week hackathon offers a total of $10,000 in prizes and is open to anyone looking to expand their learning through a special Microsoft Learn Skills Challenge focused on Microsoft Fabric.

Whether you're a beginner or a seasoned maker, all are welcome to participate! To learn more and register, visit [https://microsoftfabric.devpost.com](https://microsoftfabric.devpost.com).

**What's new in Visual Studio Code?**

* [Learn what is new in the Visual Studio Code September 2024 Release (1.94) - MSAL-based Microsoft Authentication](https://code.visualstudio.com/updates/v1_94#_msal-based-microsoft-authentication)
* [Learn what is new in the Visual Studio Code September 2024 Release (1.94) - Automated test setup (Experimental)](https://code.visualstudio.com/updates/v1
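
Instead of one cell per model, the comparison can be run as a loop that keeps going when a model is unavailable. The sketch below is a hypothetical helper and does not call any API: `compare_models` takes a callable, and `fake_generate` is a stand-in so the example runs offline. In the notebook the callable could be something like `lambda name: generate_llm_answer(content, reranked_documents, completion_model=name)`.

```python
def compare_models(generate, model_names):
    """Call `generate(model_name)` for each model and collect the reply or the error text."""
    results = {}
    for name in model_names:
        try:
            results[name] = generate(name)
        except Exception as exc:  # keep comparing even if one model fails
            results[name] = f"[failed: {exc}]"
    return results

# Stand-in for a real chat completion call, for illustration only
def fake_generate(name):
    if name == "unavailable-model":
        raise RuntimeError("model not found")
    return f"post written by {name}"

for name, reply in compare_models(fake_generate, ["gpt-4o-mini", "unavailable-model"]).items():
    print(f"--- {name} ---\n{reply}\n")
```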