### Semantic Kernel
[Semantic Kernel](https://github.com/microsoft/semantic-kernel/tree/main) is an SDK that integrates Large Language Models (LLMs) from providers such as OpenAI, Azure OpenAI, and Hugging Face with conventional programming languages like C#, Python, and Java. It does this by letting you define plugins that can be chained together in just a few lines of code.

#### Question Answering over the docs/index
Question answering in this context means answering questions over your own document data. When there are many documents, you almost always want to build an index over the data. The index is used to retrieve only the documents most relevant to a given question, so you avoid passing every document to the LLM (saving you time and money).
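
To make the idea of an index concrete, here is a minimal sketch of top-k retrieval with a relevance threshold. It is not how Azure Cognitive Search works internally. The 3-dimensional vectors, the `toy_index` data, and the helper names `cosine_similarity` and `top_k` are all assumptions made for illustration; real embeddings come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(index, query_vector, limit, min_relevance_score):
    """Return up to `limit` (score, text) pairs at or above the threshold, best first."""
    scored = [(cosine_similarity(vec, query_vector), text) for text, vec in index]
    relevant = [pair for pair in scored if pair[0] >= min_relevance_score]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return relevant[:limit]

# Hypothetical toy "embeddings", used only to show the ranking step.
toy_index = [
    ("Fabric overview", [1.0, 0.0, 0.0]),
    ("Fabric pricing", [0.8, 0.6, 0.0]),
    ("Unrelated recipe", [0.0, 0.0, 1.0]),
]
hits = top_k(toy_index, query_vector=[1.0, 0.0, 0.0], limit=2, min_relevance_score=0.3)
print([text for _, text in hits])
```

Only the documents that clear the threshold are passed on, which is what keeps the prompt small.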

#### Set Environment Variables

In [2]:
import os  
import json  
import openai
from Utilities.envVars import *

# Set the search index name; the search service name and key are used later when the memory store is created
indexName = "skindex"

# Set OpenAI API key and endpoint
openai.api_type = "azure"
openai.api_version = OpenAiVersion
openai_api_key = OpenAiKey
assert openai_api_key, "ERROR: Azure OpenAI Key is missing"
openai.api_key = openai_api_key
openAiEndPoint = f"{OpenAiEndPoint}"
assert openAiEndPoint, "ERROR: Azure OpenAI Endpoint is missing"
openai.api_base = openAiEndPoint
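
The contents of `Utilities.envVars` are not shown here, so the following is only a sketch of one way such settings could be loaded and validated in a single place. The helper `require_settings` and the environment variable names are hypothetical; they simply mirror the identifiers used in the cell above.

```python
import os

def require_settings(names, environ=None):
    """Return {name: value} for every name, raising if any is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return {name: environ[name] for name in names}

# A stand-in mapping so the sketch runs without real credentials.
demo_env = {"OpenAiKey": "<key>", "OpenAiEndPoint": "https://example.openai.azure.com/"}
settings = require_settings(["OpenAiKey", "OpenAiEndPoint"], environ=demo_env)
print(sorted(settings))
```

Collecting every missing name before raising reports all configuration problems at once instead of one assert at a time.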

#### Generate an answer to a question from the documents already indexed in the vector store

In [4]:
# Import required libraries
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    AzureTextCompletion,
    AzureTextEmbedding,
    OpenAIChatCompletion,
    OpenAITextEmbedding,
)
from semantic_kernel.connectors.memory.azure_cognitive_search import (
    AzureCognitiveSearchMemoryStore,
)
embeddingModelType = "azureopenai"


In [5]:
kernel = sk.Kernel()

# Configure AI service used by the kernel
if embeddingModelType == "azureopenai":
    #deployment, api_key, endpoint = sk.azure_openai_settings_from_dot_env()
    kernel.add_chat_service("chat_completion", AzureChatCompletion(OpenAiChat, OpenAiEndPoint, OpenAiKey))
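    # The embeddings deployment name is taken from OpenAiEmbedding (e.g. "text-embedding-ada-002"); adjust it if appropriate
    kernel.add_text_embedding_generation_service("ada", AzureTextEmbedding(deployment_name=OpenAiEmbedding,
            endpoint=OpenAiEndPoint,
            api_key=OpenAiKey))
    # NOTE: this registers the embeddings deployment as a text-completion service as well.
    # A text-completion service normally needs a completion-capable deployment, so
    # replace deployment_name below if this service is actually invoked.
    kernel.add_text_completion_service(
        "dv",
        AzureTextCompletion(
            deployment_name=OpenAiEmbedding,
            endpoint=OpenAiEndPoint,
            api_key=OpenAiKey,
        ),
    )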
else:
    #api_key, org_id = sk.openai_settings_from_dot_env()
    kernel.add_chat_service("chat-gpt", OpenAIChatCompletion("gpt-3.5-turbo", OpenAiApiKey, ""))
    kernel.add_text_embedding_generation_service("ada", OpenAITextEmbedding("text-embedding-ada-002", OpenAiApiKey, ""))

In [7]:
# kernel.register_memory_store(memory_store=sk.memory.VolatileMemoryStore())
# kernel.import_skill(sk.core_skills.TextMemorySkill())
vectorSize = 1536
connector = AzureCognitiveSearchMemoryStore(
        vector_size=vectorSize, search_endpoint=f"https://{SearchService}.search.windows.net", admin_key=SearchKey
    )
# Register the memory store with the kernel
kernel.register_memory_store(memory_store=connector)

In [28]:
# Now let's augment the LLM query with documents retrieved from the vector store, following the RAG (Retrieval Augmented Generation) pattern
# The prompt below spells out what we ask the LLM to do with the retrieved documents
async def RagQnA(kernel, query, limit, relevanceScore, promptOverride=None) -> str:
    # Step1: Retrieval: Get the documents in the index that match the query
    docs = await kernel.memory.search_async(collection=indexName, query=query, limit=limit, min_relevance_score=relevanceScore)
    # Step2: Augment: Construct the augmented prompt from the retrieved documents, separated by triple backticks so the prompt can refer to them as delimited blocks
    qdocs = "\n```\n".join(doc.text for doc in docs)
    
    if promptOverride is None:
        prompt = """{{ $qdocs}} 
        
        Question: Please query above documents delimited by triple backticks for {{ $query }} 
        and respond back with answer only from the above documents delimited by triple backticks.
        """
    else:
        prompt = promptOverride
    
    # Step3: Generation: Ask the LLM to answer using the augmented prompt
    summarize = kernel.create_semantic_function(prompt, temperature=0.0)
    context_variables = sk.ContextVariables(variables={
        "qdocs": qdocs,
        "query": query
    })
    response = summarize(variables=context_variables)
    return response
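
The augmentation step can be seen in isolation with a small sketch of how the retrieved texts and the query end up in one prompt string. This is not Semantic Kernel's template engine. `join_documents` and `render_prompt` are hypothetical helpers that only imitate the `{{ $name }}` substitution for illustration.

```python
import re

def join_documents(texts):
    """Join retrieved document texts with a triple-backtick delimiter line between them."""
    fence = "`" * 3
    return f"\n{fence}\n".join(texts)

def render_prompt(template, variables):
    """Replace {{ $name }} placeholders with values; unknown names are left untouched."""
    def substitute(match):
        return variables.get(match.group(1), match.group(0))
    return re.sub(r"\{\{\s*\$(\w+)\s*\}\}", substitute, template)

template = "{{ $qdocs}}\n\nQuestion: {{ $query }}"
qdocs = join_documents(["Doc one.", "Doc two."])
prompt = render_prompt(template, {"qdocs": qdocs, "query": "What is Microsoft Fabric"})
print(prompt)
```

Printing the rendered prompt like this is a quick way to check what the model actually receives before spending tokens on a call.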

In [29]:
# We already created our index and loaded the data, so we can skip that part. Let's try to ask a question:
# Question answering involves fetching multiple documents, and then asking a question of them. 
# The LLM response will contain the answer to your question, based on the content of the documents.

topK = 3
query = "What is Microsoft Fabric"

result = await RagQnA(kernel, query, topK, 0.3)

print(result)

Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, Real-Time Analytics, and business intelligence. It offers a comprehensive suite of services, including data lake, data engineering, and data integration, all in one place. With Fabric, you don't need to piece together different services from multiple vendors. Instead, you can enjoy a highly integrated, end-to-end, and easy-to-use product that is designed to simplify your analytics needs. The platform is built on a foundation of Software as a Service (SaaS), which takes simplicity and integration to a whole new level.


In [33]:
template = """
            Given the following extracted parts of a long document and a question, create a final answer. 
            If you don't know the answer, just say that you don't know. Don't try to make up an answer. 
            If the answer is not contained within the text below, say \"I don't know\".

            QUESTION: {{ $query }}
            =========
            {{ $qdocs}}
            =========
            """
topK = 3
query = "What is Microsoft Fabric"

result = await RagQnA(kernel, query, topK, 0.3, template)

print(result)

Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, Real-Time Analytics, and business intelligence. It offers a comprehensive suite of services, including data lake, data engineering, and data integration, all in one place. It is built on a foundation of Software as a Service (SaaS) and is designed to simplify analytics needs by providing a highly integrated and easy-to-use product.


In [34]:
# Let's generate follow-up questions
followupTemplate = """
    Generate three very brief questions that the user would likely ask next.
    Use double angle brackets to reference the questions, e.g. <What is Azure?>.
    Try not to repeat questions that have already been asked.  Don't include the context in the answer.

    Return the questions in the following format:
    <>
   
    ALWAYS return a "NEXT QUESTIONS" part in your answer.

    {{ $qdocs}}
    """

# The index is already populated, so we reuse RagQnA and only swap in the follow-up prompt.
# The LLM response will contain suggested next questions based on the retrieved documents.

topK = 3
query = "What is Microsoft Fabric"

result = await RagQnA(kernel, query, topK, 0.3, followupTemplate)

print(result)

<What are the components of Microsoft Fabric?>
<What are the advantages of using Microsoft Fabric's SaaS foundation?>
<Can you explain the PDF experience in Microsoft Fabric?>

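To use the suggested questions in an application, the bracketed text has to be pulled out of the response. The sketch below assumes the response has already been converted to a plain string (for example with `str(result)`). `extract_followups` is a hypothetical helper, and it accepts single or double angle brackets because the model may return either.

```python
import re

def extract_followups(response_text):
    """Return the questions wrapped in single or double angle brackets, in order."""
    matches = re.findall(r"<{1,2}([^<>]+)>{1,2}", response_text)
    return [m.strip() for m in matches if m.strip()]

sample = """<What are the components of Microsoft Fabric?>
<What are the advantages of using Microsoft Fabric's SaaS foundation?>
<Can you explain the PDF experience in Microsoft Fabric?>"""
for question in extract_followups(sample):
    print(question)
```

Empty brackets are ignored, so a response that echoes the format placeholder does not produce a blank question.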

#### What happens when we ask a question whose answer is not in the documents indexed in the vector store?

In [35]:
topK = 3
query = "Tell me a Joke"
result = await RagQnA(kernel, query, topK, 0.3)
print(result)

I'm sorry, but I am an AI language model and I cannot generate jokes. However, I can assist you with any questions or information you may need.


#### What if we don't want the LLM to answer questions outside the documents indexed in the vector store? The prompt can enforce that by telling it to answer only from the retrieved documents.

In [36]:
topK = 3
query = "Who is the CEO of Microsoft"

result = await RagQnA(kernel, query, topK, 0.3)
print(result)

The CEO of Microsoft is not mentioned in the provided documents.
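
Prompt instructions are one safeguard. A complementary one is to skip the model call entirely when retrieval returns nothing above the relevance threshold. The following is a sketch under that assumption and is not part of the notebook. `answer_or_fallback` and `fake_llm` are hypothetical names, and `fake_llm` stands in for the real model call so the example runs offline.

```python
FALLBACK_ANSWER = "I don't know"

def answer_or_fallback(doc_texts, query, generate):
    """Call `generate(prompt)` only when retrieval returned at least one document."""
    if not doc_texts:
        return FALLBACK_ANSWER
    fence = "`" * 3
    context = f"\n{fence}\n".join(doc_texts)
    prompt = (
        f"{context}\n\n"
        f"Question: {query}\n"
        "Answer only from the documents above. "
        f'If the answer is not there, say "{FALLBACK_ANSWER}".'
    )
    return generate(prompt)

# Stand-in for the real model call; it only reports that it was invoked.
def fake_llm(prompt):
    return f"(model called with {len(prompt)} characters)"

print(answer_or_fallback([], "Tell me a Joke", fake_llm))
print(answer_or_fallback(["Fabric is an analytics solution."], "What is Microsoft Fabric", fake_llm))
```

With this guard, an empty retrieval result always produces the same fixed answer, regardless of how the model would have responded.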
