# RAG exploration

![goal_post](goal_post.png)

## 🤖 Social assistant, with **off-the-shelf model**

In [1]:
url='https://microsoftfabric.devpost.com/'

In [2]:
import requests
from bs4 import BeautifulSoup
import json

# Fetch the content from the URL
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Extract the content of the blog post
blog_post_content = soup.get_text()

# Convert the content to JSON
content = json.dumps({"content": blog_post_content})

print(content)

{"content": "\n\n\n\n  \n\n\n\n\n\n\n\n\n\n\nMicrosoft Fabric and AI Learning Hackathon: Building the next wave of innovative AI powered data analytics applications with Microsoft Fabric - Devpost\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n\n\n\n\n      Log in\n \n\n\n\n        Sign up\n      \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\nDevpost\n\n\nHackathons\nProjects\nHost a public hackathon\n\n\n\n\n\n\nDevpost for Teams\n\n\nTeams login\nRequest a demo\n\n\n\n\n\n    Hackathons\n\n\n\n\n    Projects\n\n\n\nBlog\n\n\n\n\n    Host a hackathon\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nProduct\n\n\n\n\n\n\n\nDevpost\nGrow your developer ecosystem and promote your platform.\n\n\n\nHackathons\nProjects\nHost a public hackathon\n\n\n\n\n\n\nDevpost for Teams\nDrive innovation, collaboration, and retention within your organization.\n\n\n\nTeams login\nRequest a demo\n\n\n\n\n\n\n\n\n\n    Hackathons\n\n\n\n\n    Projects\n

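The printed content above is dominated by runs of newlines and padding left over from the page layout, which use up prompt space without adding information. A minimal sketch of one way to tidy the text before it goes into a prompt; the helper name `collapse_whitespace` is hypothetical and not part of the original notebook:

```python
import re

def collapse_whitespace(text: str) -> str:
    """Strip each line, squeeze inner spaces and tabs, and drop empty lines."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)

# Sample shaped like the scraped output above
raw = "\n\n\n  Log in\n \n\n\n        Sign up\n      \n \n\nDevpost\n"
print(collapse_whitespace(raw))
```

Applied in the notebook, this would wrap the scraped text, for example `collapse_whitespace(soup.get_text())`. BeautifulSoup's `get_text(separator="\n", strip=True)` may give a similar result in a single call.
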
In [3]:
# Messages to give LLM, to create a short LinkedIn post based on a blog post

system_message = """
You are a social assistant who writes creative content. You will politely decline any other requests from the user not related to creating content. Don't talk about a single VS Code release and don't talk about release dates at all. Instead, only talk about the relevant features. Don't include made up links, but do provide real links to the VS Code release notes for specific features. You format all your responses as Markdown unless otherwise specified. Avoid wrapping your entire response in a markdown code element.
"""
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": f"Create a very short LinkedIn post using the following: {content}"}
]
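
The cell above only builds `messages`; the step that sends them to a model is not shown. A minimal sketch of that call, assuming `client` is an OpenAI-compatible client like the one used later in this notebook. The function name `generate_post` and the default temperature are assumptions:

```python
def generate_post(client, messages, model="gpt-4o-mini", temperature=0.3):
    """Send the prepared messages to a chat model and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )
    return response.choices[0].message.content

# Hypothetical usage, assuming `client` was created earlier in the notebook:
# print(generate_post(client, messages))
```

Taking the client as a parameter keeps the helper easy to try against different endpoints or a stub.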

## 📚 Text search

#### Extract key topics & features

In [5]:
# Messages to give LLM, to extract key topics & features

topic_system_message = """
You are an expert at conducting entity extraction. Generate top topics and functionality based on provided content. Focus on identifying key concepts, themes, and relevant terms related to specific developer tooling, with a particular emphasis on VS Code features. Make sure entities you extract are directly relevant to the developer environment described. Don't mention specific dates or years. Use advanced search techniques, including Boolean operators and keyword variations, to craft precise, optimized queries that yield the most relevant results. Aim for clarity, relevance, and depth to cover all aspects of the topic efficiently. Simply list the phrases without additional explanation or details. Do not list any bullet points or numbered lists or quotation marks.
"""

topic_user_message="Come up with a list of top 5 developer tooling topics, functionalities, and relevant terms, with a strong focus on VS Code features and integrations based on the following content: "

In [6]:
def extract_key_topics(content, model="gpt-4o-mini"):
    messages = [
        {"role": "system", "content": topic_system_message},
        {"role": "user", "content": topic_user_message+content}
    ]

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.3,
    )

    key_topics = response.choices[0].message.content.split('\n')
    return key_topics

key_topics = extract_key_topics(content)
print("\n".join([topic + "\n" for topic in key_topics]))

Microsoft Fabric integration  

Azure OpenAI services  

Real-Time Intelligence in data analytics  

AI-driven analytics and insights  

Database integrations with Azure SQL and PostgreSQL
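
The output above shows trailing spaces and blank lines, because splitting the reply on `'\n'` keeps empty entries. A small sketch of a stricter parser, assuming the model may also add bullets or numbering despite the prompt. The helper name `clean_topics` is hypothetical:

```python
import re

def clean_topics(raw: str) -> list[str]:
    """Split a model reply into topics, dropping blank lines and list markers."""
    topics = []
    for line in raw.splitlines():
        # Remove leading bullets or numbering such as "- ", "* ", "1. "
        topic = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s+", "", line).strip()
        if topic:
            topics.append(topic)
    return topics

reply = (
    "Microsoft Fabric integration  \n\n"
    "- Azure OpenAI services  \n"
    "2. Real-Time Intelligence in data analytics"
)
print(clean_topics(reply))
```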



#### Load & filter data

In [None]:
# load split_docs_contents.json as a dataframe

| Unnamed: 0 | content | url |
|---|---|---|
| 0 | See what is new in the Visual Studio Code July... | https://code.visualstudio.com/updates/July_201... |
| 1 | See what is new in the Visual Studio Code July... | https://code.visualstudio.com/updates/July_201... |
| 2 | See what is new in the Visual Studio Code July... | https://code.visualstudio.com/updates/July_201... |
| 3 | See what is new in the Visual Studio Code July... | https://code.visualstudio.com/updates/July_201... |
| 4 | See what is new in the Visual Studio Code July... | https://code.visualstudio.com/updates/July_201... |
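
The loading code itself is not shown, only its preview. A rough standard-library sketch of the load-and-filter step, assuming `split_docs_contents.json` holds a JSON array of objects with `content` and `url` keys (the file layout is an assumption). The helper name `load_docs` is hypothetical:

```python
import json

def load_docs(path: str) -> list[dict]:
    """Load records, keeping unique ones that have both content and a URL."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    seen = set()
    docs = []
    for rec in records:
        content = (rec.get("content") or "").strip()
        url = (rec.get("url") or "").strip()
        key = (content, url)
        if content and url and key not in seen:
            seen.add(key)
            docs.append({"content": content, "url": url})
    return docs

# Hypothetical usage:
# docs = load_docs("split_docs_contents.json")
```

To match the dataframe workflow of the notebook, the resulting list could be passed to `pandas.DataFrame(docs)`.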


#### Perform text search
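
The search code for this step is not included. One simple way such a step could work is keyword overlap: score each document by how many distinct words from the extracted topics appear in its content, then keep the best matches. The sketch below is an assumption, not the notebook's actual method; `tokenize` and `text_search` are hypothetical names, and documents are assumed to be dicts with `content` and `url` keys:

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def text_search(docs, key_topics, top_k=10):
    """Rank docs by how many distinct topic words appear in their content."""
    query_terms = set()
    for topic in key_topics:
        query_terms |= tokenize(topic)
    scored = []
    for doc in docs:
        score = len(query_terms & tokenize(doc["content"]))
        if score > 0:
            scored.append((score, doc))
    # Stable sort: documents with equal scores keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

sample_docs = [
    {"content": "Python notebooks in VS Code", "url": "a"},
    {"content": "Unrelated text", "url": "b"},
    {"content": "Azure OpenAI and Python", "url": "c"},
]
print(text_search(sample_docs, ["Azure OpenAI services", "Python"]))
```

The re-ranking cell below calls `top_10_docs.to_json(orient='records')`, so in the notebook the result would be held in a dataframe; for a plain list, `json.dumps(results)` produces a comparable records-style string.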

## 🔢 Semantic reranking

In [10]:
# Messages to give LLM, to re-rank the documents based on semantic relevance

rerank_system_message = """
You are tasked with re-ranking a set of documents based on their relevance to given search queries. The documents have already been retrieved based on initial search criteria, but your role is to refine the ranking by considering factors such as semantic similarity to the query, context relevance, and alignment with the user's intent. Focus on documents that provide concise, high-quality information, ensuring that the top-ranked documents answer the query as accurately and completely as possible. If you can't rank them based on semantic relevance, give higher rank to documents with VS Code features that were published most recently. Make sure to return the full name of the feature and URL of each release note document, and format your response as a Markdown list item, with the URL in parentheses. Do not include any additional information or commentary about the documents. List a variety of documents, and give more weight to documents that mention Python and or notebooks features. Only return the top 3 documents and reference them by the feature name, not the release version or date.
"""

rerank_user_message=f"Here are some documents: {top_10_docs.to_json(orient='records')}. Re-rank those documents based on these key VS Code functionalities: {key_topics}. Only return the top 3 documents."

In [11]:
def rerank_documents(model="gpt-4o-mini"):
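    # Truncate the user message by characters, a rough stand-in for a token limit.
    # Note: slicing drops the end of the message first, including the instructions.
    max_length = 7500  # Maximum characters to send; adjust to fit the model's context window
    truncated_user_message = rerank_user_message[:max_length]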

    messages = [
        {"role": "system", "content": rerank_system_message},
        {"role": "user", "content": truncated_user_message}
    ]

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.7,
    )

    reranked_documents = response.choices[0].message.content.split('\n')
    return reranked_documents

reranked_documents = rerank_documents()
print("\n".join([doc + "\n" for doc in reranked_documents]))


- **Hiding Prelaunch Task Popup** [Learn what is new in the Visual Studio Code October 2024 Release](https://github.com/microsoft/vscode/releases/tag/1.95)

- **Run Mypy in the Directory of Nearest Pyproject.toml or Mypy.ini** [Learn what is new in the Visual Studio Code October 2024 Release](https://github.com/microsoft/vscode/releases/tag/1.95)

- **Document the Hide Property** [Learn what is new in the Visual Studio Code October 2024 Release](https://github.com/microsoft/vscode/releases/tag/1.95)
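
Slicing the whole user message to a fixed length removes text from the end, and the end is where the key topics and the "top 3" instruction sit. A hedged alternative is to shorten each document's content until the full message fits, so the instructions and URLs survive. The function name `build_rerank_message` and the shrink factor are assumptions; documents are assumed to be dicts with `content` and `url` keys:

```python
import json

def build_rerank_message(docs, key_topics, max_chars=7500):
    """Build the re-rank prompt, shortening document text instead of the prompt tail."""
    prefix = "Here are some documents: "
    suffix = (
        ". Re-rank those documents based on these key VS Code functionalities: "
        f"{key_topics}. Only return the top 3 documents."
    )
    limit = max((len(d["content"]) for d in docs), default=0)
    while True:
        trimmed = [{"content": d["content"][:limit], "url": d["url"]} for d in docs]
        message = prefix + json.dumps(trimmed) + suffix
        # Stop when the message fits, or when there is no content left to cut
        if len(message) <= max_chars or limit == 0:
            return message
        limit = limit * 3 // 4  # shrink every document's content by a quarter

long_docs = [
    {"content": "x" * 5000, "url": "https://example.com/a"},
    {"content": "y" * 5000, "url": "https://example.com/b"},
]
message = build_rerank_message(long_docs, ["Python"], max_chars=2000)
print(len(message))
```

The limit here is still counted in characters rather than tokens, so it remains an approximation of the model's real context budget.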



## 🧠 Social assistant, with **relevant features**

In [None]:
def generate_llm_answer(content, context, completion_model):
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content":  f"Create a very short LinkedIn post using the following content: {content}. Also, include the following established VS Code features along with their URLs in your response, so folks seeing the post can try them out: {context}."}
    ]

    response = client.chat.completions.create(
        model=completion_model,
        messages=messages,
        temperature=0.3
    )

    answer = response.choices[0].message.content
    return answer

print(generate_llm_answer(content, reranked_documents, completion_model="gpt-4o-mini"))

🚀 Exciting news! Join the **Microsoft Fabric and AI Learning Hackathon** and be part of the next wave of innovative AI-powered data analytics applications. This is a fantastic opportunity to leverage Microsoft Fabric and Azure OpenAI services to create solutions that make a real-world impact. 

🗓️ **Deadline:** November 12, 2024  
💰 **Prizes:** $10,000 in total!

Whether you're an AI enthusiast or a cloud computing expert, this hackathon is your chance to showcase your skills and creativity. Don't miss out on the live sessions and workshops to enhance your knowledge!

👉 [Join the hackathon now!](https://devpost.com/software/microsoft-fabric-and-ai-learning-hackathon)

Also, check out these great features in Visual Studio Code to enhance your development experience:
- **Hiding Prelaunch Task Popup** [Learn what is new in the Visual Studio Code October 2024 Release](https://github.com/microsoft/vscode/releases/tag/1.95)
- **Run Mypy in the Directory of Nearest Pyproject.toml or Mypy.ini*

#### Compare responses between chat models

In [14]:
print(generate_llm_answer(content, reranked_documents, completion_model="Mistral-small"))

🎉 Join the Microsoft Fabric and AI Learning Hackathon! 🤖🔍

Build innovative AI-powered data analytics applications with Microsoft Fabric and win up to $2,500 USD! 🏆

Learn more about the hackathon and register here: [Microsoft Fabric and AI Learning Hackathon](https://devpost.com/software/microsoft-fabric-and-ai-learning-hackathon)

While you're at it, check out these new features in Visual Studio Code October 2024 Release:

- **Hiding Prelaunch Task Popup**: No more interruptions! Now you can hide the prelaunch task popup. [Learn more](https://github.com/microsoft/vscode/releases/tag/1.95)
- **Run Mypy in the Directory of Nearest Pyproject.toml or Mypy.ini**: Streamline your workflow with this new feature. [Learn more](https://github.com/microsoft/vscode/releases/tag/1.95)
- **Document the Hide Property**: Improve your code readability with this update. [Learn more](https://github.com/microsoft/vscode/releases/tag/1.95)

Happy coding! 💻


In [15]:
print(generate_llm_answer(content, reranked_documents, completion_model="meta-llama-3-8b-instruct"))

Here's a LinkedIn post based on the provided content:

**Microsoft Fabric and AI Learning Hackathon: Building the next wave of innovative AI powered data analytics applications**

Are you ready to dive deep into the future of AI and cloud innovation? The Microsoft Fabric and AI Learning Hackathon is your chance to showcase your skills and creativity while building the next wave of innovative AI powered data analytics applications.

**What to Build**

Complete the Microsoft Learn AI Skills Challenge (Microsoft Fabric) and build a new Fabric solution that leverages Azure OpenAI services and falls into one of the following hackathon categories:

* Microsoft Fabric + AI Innovation
* Real-Time Intelligence in Microsoft Fabric
* Azure Database for PostgreSQL Integration
* SQL And AI Integration
* Azure Cosmos DB + Microsoft Fabric Integration

**What to Submit**

* Provide a URL to your code repository for judging and testing
* Include a video (about 3-5 minutes) that demonstrates your submi
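
The per-model cells above can also be run as one loop, which makes side-by-side comparison easier. A small sketch, assuming a callable that takes a model name and returns the reply text; `compare_models` is a hypothetical helper:

```python
def compare_models(generate, models):
    """Call `generate(model_name)` for each model and collect replies or errors."""
    results = {}
    for name in models:
        try:
            results[name] = generate(name)
        except Exception as exc:  # keep going if one deployment is unavailable
            results[name] = f"[error] {exc}"
    return results

# Hypothetical usage with the notebook's function:
# results = compare_models(
#     lambda m: generate_llm_answer(content, reranked_documents, completion_model=m),
#     ["gpt-4o-mini", "Mistral-small", "meta-llama-3-8b-instruct"],
# )
# for name, answer in results.items():
#     print(f"--- {name} ---\n{answer}\n")
```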

#### Try it out yourself! 😄

VS Code extensions used for this demo:

![extensions](vsc_extensions.png)