## Setup & Configuration

### Prerequisites

**Azure Cognitive Search**

Details for creating an Azure Cognitive Search service are available in [Create an Azure Cognitive Search service in the portal](https://learn.microsoft.com/en-us/azure/search/search-create-service-portal). Once your service is created, return to this notebook to create and populate the index.


**Azure OpenAI**

You must first set up an Azure OpenAI Service resource. For instructions, see [Create and deploy an Azure OpenAI Service resource](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/create-resource?pivots=web-portal).

The document chatbot system uses two types of models, both of which must be deployed in your Azure OpenAI Service:

1. An [Embeddings](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/concepts/understand-embeddings) model, which converts snippets of text into vectors so that their similarity can be compared.
2. A [GPT-35-Turbo or GPT-4](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/chatgpt?pivots=programming-language-chat-completions) model for the conversational interface.

This example was built using the **text-embedding-ada-002** model for similarity embeddings and **gpt-35-turbo** for the conversational functionality. 

Before continuing with this notebook, confirm that deployments exist for these types of models in your Azure OpenAI service.  

### Environment Setup

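~~~python
# Packages needed for vector search (uncomment to install).
# `pip install azure-search-documents --pre` installs the latest preview, whose
# vector API may differ from the one used here; this version is confirmed to work:
#pip install azure-search-documents==11.4.0b8

# The code below uses the pre-1.0 openai API (openai.Embedding.create)
#pip install "openai<1.0"

#pip install tenacity
~~~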

~~~python
import json
import requests
import os
from tenacity import retry, wait_random_exponential, stop_after_attempt

from azure.core.credentials import AzureKeyCredential

import openai
~~~


For reference, see the [Official Azure Vector Search Documentation Sample](https://github.com/Azure/cognitive-search-vector-pr/blob/main/demo-python/code/azure-search-vector-python-sample.ipynb).

~~~python
# These class names match azure-search-documents 11.4.0b8; later versions renamed several of them.
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.models import Vector
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    SemanticConfiguration,
    PrioritizedFields,
    SemanticField,
    SemanticSettings,
    VectorSearch,
    HnswVectorSearchAlgorithmConfiguration,
)
~~~

## Parameter Configuration

Edit the config_TEMPLATE.json file with the details of the services created in the Prerequisites section, then rename it to config.json.

~~~json
{
    "OPENAI_API_BASE": "https://YOUR_AOAI_SERVICE.openai.azure.com/",
    "OPENAI_API_KEY": "YOUR_AOAI_KEY",
    "OPENAI_API_VERSION": "2023-05-15",
    "GPT_MODEL": "YOUR_GPT_MODEL_DEPLOYMENT",
    "EMBEDDING_MODEL": "YOUR_EMBEDDING_MODEL_DEPLOYMENT",
    "AZSEARCH_KEY": "YOUR_AZURE_COG_SEARCH_KEY",
    "AZSEARCH_INDEX_NAME": "YOUR_COG_SEARCH_INDEX_NAME",
    "AZSEARCH_ENDPOINT": "https://YOUR_SEARCH.search.windows.net/",
    "AZSEARCH_API_VERSION": "2023-07-01-preview"
}
~~~

**Azure OpenAI Configuration**

Use the [Azure Portal](https://portal.azure.com) to navigate to your Azure OpenAI Service.

1. Under "Resource Management" -> "Keys and Endpoints", use the "Endpoint" value for OPENAI_API_BASE.
2. On the same "Keys and Endpoints" page, use either of the "Keys" values for OPENAI_API_KEY.
3. The versions supported by the Azure OpenAI API are listed in the [Chat completions API Reference](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/reference#chat-completions). This example uses version "2023-05-15".
4. For EMBEDDING_MODEL and GPT_MODEL, use the **deployment names** (not the model names) of the models you deployed in your Azure OpenAI Service.


**Azure Cognitive Search Configuration** 

Use the [Azure Portal](https://portal.azure.com) to navigate to your Cognitive Search service.  

1. On the Overview page, the **Url** value is your "AZSEARCH_ENDPOINT".
2. Under "Settings" -> "Keys", use either the primary or secondary admin key as your "AZSEARCH_KEY".
3. "AZSEARCH_INDEX_NAME" is the name of the index this notebook will create in Cognitive Search.
4. Vector search requires the "2023-07-01-preview" value for "AZSEARCH_API_VERSION" (as of August 2023).


The next cell loads the values from config.json for use throughout the notebook.

~~~python
# Load config values
with open(r'config.json') as config_file:
    config_details = json.load(config_file)

# Azure OpenAI Resources
openai_api_base = config_details['OPENAI_API_BASE']
openai_api_key = config_details['OPENAI_API_KEY']
openai_api_version = config_details['OPENAI_API_VERSION']

chat_model_deployment = config_details['GPT_MODEL']
embeddings_model_deployment = config_details['EMBEDDING_MODEL']

# Azure Cognitive Search Resources
AZSEARCH_KEY = config_details['AZSEARCH_KEY']
AZSEARCH_ENDPOINT = config_details['AZSEARCH_ENDPOINT']
AZSEARCH_INDEX_NAME = config_details['AZSEARCH_INDEX_NAME']
AZSEARCH_API_VERSION = config_details['AZSEARCH_API_VERSION']
~~~
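Before creating any clients, it can help to catch an unedited or incomplete config.json early. The following is a minimal sketch: `find_config_problems` and `REQUIRED_KEYS` are hypothetical names introduced here, and the checks (placeholder values containing `YOUR_`, endpoints that are not `https://` URLs) are assumptions based on the template above.

```python
# Hypothetical helper, not part of the original notebook: sanity-check config.json values.
REQUIRED_KEYS = (
    "OPENAI_API_BASE", "OPENAI_API_KEY", "OPENAI_API_VERSION",
    "GPT_MODEL", "EMBEDDING_MODEL",
    "AZSEARCH_KEY", "AZSEARCH_INDEX_NAME", "AZSEARCH_ENDPOINT", "AZSEARCH_API_VERSION",
)


def find_config_problems(config):
    """Return a list of problems found in config; an empty list means it looks usable."""
    problems = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{key} is missing or empty")
        elif "YOUR_" in value:
            # Assumption: every unedited template value contains "YOUR_"
            problems.append(f"{key} still contains a template placeholder")
    for key in ("OPENAI_API_BASE", "AZSEARCH_ENDPOINT"):
        value = config.get(key)
        if isinstance(value, str) and value.strip() and not value.startswith("https://"):
            problems.append(f"{key} should be an https:// URL")
    return problems
```

In the notebook you could run `assert not find_config_problems(config_details), find_config_problems(config_details)` right after loading the file, so a leftover placeholder fails with a readable message instead of an authentication error later.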

~~~python
# Configure the pre-1.0 openai package to call Azure OpenAI
openai.api_base = openai_api_base
openai.api_key = openai_api_key
openai.api_version = openai_api_version
openai.api_type = "azure"
~~~

~~~python
# Credential used by all Azure Cognitive Search clients in this notebook
credential = AzureKeyCredential(AZSEARCH_KEY)
~~~

## Create Index

For more information on creating a search index, see [Create an index in Azure Cognitive Search](https://learn.microsoft.com/en-us/azure/search/search-how-to-create-search-index?tabs=index-other-sdks); for the vector field settings, see [Add a vector field to the fields collection](https://learn.microsoft.com/en-us/azure/search/vector-search-how-to-create-index?tabs=portal-add-field%2Cpull%2Crest-check-index#add-a-vector-field-to-the-fields-collection).

See also the [Python SDK reference](https://learn.microsoft.com/en-us/python/api/azure-search-documents/azure.search.documents.indexes.searchindexclient?view=azure-python) for `SearchIndexClient`.

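~~~python
# Delete the index if it already exists, so it can be recreated from scratch.
from azure.core.exceptions import ResourceNotFoundError
index_client = SearchIndexClient(endpoint=AZSEARCH_ENDPOINT, credential=credential)
try:
    index_client.delete_index(AZSEARCH_INDEX_NAME)
except ResourceNotFoundError:
    pass  # nothing to delete on a first run
~~~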

~~~python
# Create a search index
index_client = SearchIndexClient(
    endpoint=AZSEARCH_ENDPOINT, credential=credential)
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True),
    SearchableField(name="title", type=SearchFieldDataType.String),
    SearchableField(name="content", type=SearchFieldDataType.String),
    SearchableField(name="category", type=SearchFieldDataType.String,
                    filterable=True),
    # Vector fields: 1536 dimensions matches the output size of text-embedding-ada-002
    SearchField(name="titleVector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, vector_search_dimensions=1536, vector_search_configuration="azure-docs-vector-config"),
    SearchField(name="contentVector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, vector_search_dimensions=1536, vector_search_configuration="azure-docs-vector-config"),
]

# HNSW (approximate nearest neighbor) settings referenced by the vector fields above
vector_search = VectorSearch(
    algorithm_configurations=[
        HnswVectorSearchAlgorithmConfiguration(
            name="azure-docs-vector-config",
            kind="hnsw",
            parameters={
                "m": 4,
                "efConstruction": 400,
                "efSearch": 500,
                "metric": "cosine"
            }
        )
    ]
)

semantic_config = SemanticConfiguration(
    name="azure-docs-semantic-config",
    prioritized_fields=PrioritizedFields(
        title_field=SemanticField(field_name="title"),
        prioritized_keywords_fields=[SemanticField(field_name="category")],
        prioritized_content_fields=[SemanticField(field_name="content")]
    )
)

# Create the semantic settings with the configuration
semantic_settings = SemanticSettings(configurations=[semantic_config])

# Create the search index with the vector and semantic settings
index = SearchIndex(name=AZSEARCH_INDEX_NAME, fields=fields,
                    vector_search=vector_search, semantic_settings=semantic_settings)
result = index_client.create_or_update_index(index)
print(f' {result.name} created')
~~~

~~~
 azure-services-documentation created
~~~


## Generate Embeddings

Read your data, generate OpenAI embeddings for each document, and save the result in a format that can be uploaded to your Azure Cognitive Search index:

~~~python
# Generate document embeddings with the text-embedding-ada-002 deployment.
# The same function is used for title, content, and query embeddings.
# Failed calls are retried with randomized exponential backoff (capped at 20 s), up to 6 attempts.
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
def generate_embeddings(text):
    response = openai.Embedding.create(
        input=text, engine=embeddings_model_deployment)
    embeddings = response['data'][0]['embedding']
    return embeddings
~~~
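The `@retry` decorator above retries failed embedding calls, for example when the deployment is throttled, with randomized, exponentially growing waits. The sketch below illustrates that idea with only the standard library; `call_with_backoff` is a hypothetical helper and approximates, rather than reproduces, tenacity's `wait_random_exponential` and `stop_after_attempt`.

```python
import random
import time


def call_with_backoff(fn, *, max_attempts=6, base=1.0, cap=20.0, sleep=time.sleep):
    """Call fn(); on any exception, wait a random time and retry (hypothetical helper).

    The wait before retry n (0-based) is drawn uniformly from [0, min(cap, base * 2**n)],
    so the window doubles each time until it reaches the cap.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

The `sleep` parameter lets the function be exercised without actually waiting. One behavioral difference: by default tenacity raises `tenacity.RetryError` once the attempts are exhausted, whereas this sketch re-raises the last original exception.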


~~~python
# Read the sample documents from text-sample.json
with open('text-sample.json', 'r', encoding='utf-8') as file:
    input_data = json.load(file)

# Generate embeddings for the title and content fields of each document
for item in input_data:
    item['titleVector'] = generate_embeddings(item['title'])
    item['contentVector'] = generate_embeddings(item['content'])

# Save the documents, now including their vectors, for upload to the index
with open("text-sample-vector.json", "w", encoding="utf-8") as f:
    json.dump(input_data, f)
~~~
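The index defines `titleVector` and `contentVector` with `vector_search_dimensions=1536`, the output size of text-embedding-ada-002. If a different embedding deployment is configured, documents whose vectors have another length are expected to be rejected at upload time, so a quick local check of the generated data can save a round trip. `check_vector_fields` is a hypothetical helper, not part of the original notebook, and it assumes each item carries an `id` key, as the index schema requires.

```python
EXPECTED_DIMENSIONS = 1536  # must match vector_search_dimensions in the index definition


def check_vector_fields(items, fields=("titleVector", "contentVector"), dims=EXPECTED_DIMENSIONS):
    """Return (id, field) pairs whose vector is missing or has the wrong length."""
    bad = []
    for item in items:
        for field in fields:
            vector = item.get(field)
            if not isinstance(vector, list) or len(vector) != dims:
                bad.append((item.get("id"), field))
    return bad
```

For example, `check_vector_fields(input_data)` should return an empty list before you move on to uploading.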

## Load Documents in Index

Upload the text, metadata, and embeddings from the JSON file to the search index:

~~~python
# Upload the documents (including their vectors) to the index
with open('text-sample-vector.json', 'r', encoding='utf-8') as file:
    documents = json.load(file)
search_client = SearchClient(endpoint=AZSEARCH_ENDPOINT, index_name=AZSEARCH_INDEX_NAME, credential=credential)
result = search_client.upload_documents(documents)
# Note: this counts documents sent; each item in `result` reports whether it succeeded
print(f"Uploaded {len(documents)} documents")
~~~

~~~
Uploaded 108 documents
~~~
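This cell sends all 108 documents in a single request, which works for a small sample. For larger datasets the service limits the size of each indexing batch (documented at the time of writing as 1,000 documents or 16 MB per request; check the current service limits), so uploads are usually split into batches. `chunked` below is a hypothetical helper; the commented loop sketches how it might be used with the `search_client` and `documents` from the previous cell.

```python
def chunked(items, batch_size=1000):
    """Yield consecutive slices of at most batch_size items from a list."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Possible use in the notebook (assumes `documents` and `search_client` from above):
# for batch in chunked(documents, 500):
#     results = search_client.upload_documents(batch)
#     failed = [r.key for r in results if not r.succeeded]
#     print(f"Uploaded {len(batch) - len(failed)} of {len(batch)} documents")
```

As an alternative to manual batching, the SDK also provides `SearchIndexingBufferedSender`, which buffers and batches document actions for you.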


## Test Queries

~~~python
# Pure vector search: embed the query and return its k=3 nearest neighbors in contentVector
query = "tools for software development"

search_client = SearchClient(AZSEARCH_ENDPOINT, AZSEARCH_INDEX_NAME, credential=credential)
vector = Vector(value=generate_embeddings(query), k=3, fields="contentVector")

results = search_client.search(
    search_text=None,
    vectors=[vector],
    select=["title", "content", "category"],
)

for result in results:
    print(f"Title: {result['title']}")
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['content']}")
    print(f"Category: {result['category']}\n")
~~~

~~~
Title: Azure DevOps
Score: 0.82971567
Content: Azure DevOps is a suite of services that help you plan, build, and deploy applications. It includes Azure Boards for work item tracking, Azure Repos for source code management, Azure Pipelines for continuous integration and continuous deployment, Azure Test Plans for manual and automated testing, and Azure Artifacts for package management. DevOps supports a wide range of programming languages, frameworks, and platforms, making it easy to integrate with your existing development tools and processes. It also integrates with other Azure services, such as Azure App Service and Azure Functions.
Category: Developer Tools

Title: Azure DevTest Labs
Score: 0.81866795
Content: Azure DevTest Labs is a fully managed service that enables you to create, manage, and share development and test environments in Azure. It provides features like custom templates, cost management, and integration with Azure DevOps. DevTest Labs supports various platforms, suc
~~~
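The index answers vector queries with HNSW, an approximate nearest-neighbor algorithm, configured above with cosine as the metric. For a dataset this small you can compare it with an exact, brute-force search over the vectors in text-sample-vector.json. The sketch below ranks by raw cosine similarity; `exact_top_k` is a hypothetical helper, and the `@search.score` the service reports may be a transformed value rather than the raw cosine, so compare rankings rather than score values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_top_k(query_vector, documents, k=3, field="contentVector"):
    """Score every document exhaustively and return the k best (title, similarity) pairs."""
    scored = [(doc["title"], cosine_similarity(query_vector, doc[field])) for doc in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, `exact_top_k(generate_embeddings("tools for software development"), documents)` uses the `documents` list loaded for upload; if its top titles match the service's results, HNSW is returning the true nearest neighbors for this query.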

~~~python
# Pure vector search, multilingual: the Dutch for "tools for software development"
query = "tools voor softwareontwikkeling"

search_client = SearchClient(AZSEARCH_ENDPOINT, AZSEARCH_INDEX_NAME, credential=credential)
vector = Vector(value=generate_embeddings(query), k=3, fields="contentVector")

results = search_client.search(
    search_text=None,
    vectors=[vector],
    select=["title", "content", "category"],
)

for result in results:
    print(f"Title: {result['title']}")
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['content']}")
    print(f"Category: {result['category']}\n")
~~~

~~~
Title: Azure DevOps
Score: 0.80402035
Content: Azure DevOps is a suite of services that help you plan, build, and deploy applications. It includes Azure Boards for work item tracking, Azure Repos for source code management, Azure Pipelines for continuous integration and continuous deployment, Azure Test Plans for manual and automated testing, and Azure Artifacts for package management. DevOps supports a wide range of programming languages, frameworks, and platforms, making it easy to integrate with your existing development tools and processes. It also integrates with other Azure services, such as Azure App Service and Azure Functions.
Category: Developer Tools

Title: Azure DevTest Labs
Score: 0.79679435
Content: Azure DevTest Labs is a fully managed service that enables you to create, manage, and share development and test environments in Azure. It provides features like custom templates, cost management, and integration with Azure DevOps. DevTest Labs supports various platforms, suc
~~~

~~~python
# Cross-field vector search: query both titleVector and contentVector.
# The service merges the per-field results into one ranking, so these scores are
# fusion scores, not similarities, and are much smaller than in the previous cells.
query = "tools for software development"

search_client = SearchClient(AZSEARCH_ENDPOINT, AZSEARCH_INDEX_NAME, credential=credential)
vector = Vector(value=generate_embeddings(query), k=3, fields="titleVector, contentVector")

results = search_client.search(
    search_text=None,
    vectors=[vector],
    select=["title", "content", "category"],
)

for result in results:
    print(f"Title: {result['title']}")
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['content']}")
    print(f"Category: {result['category']}\n")
~~~

~~~
Title: Azure DevOps
Score: 0.03333333507180214
Content: Azure DevOps is a suite of services that help you plan, build, and deploy applications. It includes Azure Boards for work item tracking, Azure Repos for source code management, Azure Pipelines for continuous integration and continuous deployment, Azure Test Plans for manual and automated testing, and Azure Artifacts for package management. DevOps supports a wide range of programming languages, frameworks, and platforms, making it easy to integrate with your existing development tools and processes. It also integrates with other Azure services, such as Azure App Service and Azure Functions.
Category: Developer Tools

Title: Azure DevTest Labs
Score: 0.032786883413791656
Content: Azure DevTest Labs is a fully managed service that enables you to create, manage, and share development and test environments in Azure. It provides features like custom templates, cost management, and integration with Azure DevOps. DevTest Labs supports var
~~~
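According to the Azure documentation, results from multiple vector fields (and from hybrid keyword plus vector queries) are merged with Reciprocal Rank Fusion (RRF), which is why the scores drop from about 0.83 to about 0.033. The printed values are consistent with summing 1 / (60 + rank) over the two fields with 0-based ranks: 2/60 ≈ 0.03333 for a document ranked first for both fields, and 2/61 ≈ 0.03279 for one ranked second for both. That rank pattern is an assumption inferred from the output, not something the notebook shows. `reciprocal_rank_fusion` below is a hypothetical, minimal RRF sketch, not the service's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document ids into one list of (id, score), best first.

    Each document earns 1 / (k + rank) from every list it appears in; ranks are
    counted from 0 here because that reproduces the scores printed above.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Assumed per-field rankings: both fields rank the same two documents in the same order.
title_ranking = ["Azure DevOps", "Azure DevTest Labs"]
content_ranking = ["Azure DevOps", "Azure DevTest Labs"]
fused = reciprocal_rank_fusion([title_ranking, content_ranking])
```

Because RRF only looks at positions, a document that ranks well in either field can rise in the fused list even if its raw similarity in the other field is low.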

~~~python
# Pure vector search restricted by an OData filter on the filterable "category" field
query = "tools for software development"

search_client = SearchClient(AZSEARCH_ENDPOINT, AZSEARCH_INDEX_NAME, credential=credential)
vector = Vector(value=generate_embeddings(query), k=3, fields="contentVector")

results = search_client.search(
    search_text=None,
    vectors=[vector],
    filter="category eq 'Developer Tools'",
    select=["title", "content", "category"]
)

for result in results:
    print(f"Title: {result['title']}")
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['content']}")
    print(f"Category: {result['category']}\n")
~~~

~~~
Title: Azure DevOps
Score: 0.82971567
Content: Azure DevOps is a suite of services that help you plan, build, and deploy applications. It includes Azure Boards for work item tracking, Azure Repos for source code management, Azure Pipelines for continuous integration and continuous deployment, Azure Test Plans for manual and automated testing, and Azure Artifacts for package management. DevOps supports a wide range of programming languages, frameworks, and platforms, making it easy to integrate with your existing development tools and processes. It also integrates with other Azure services, such as Azure App Service and Azure Functions.
Category: Developer Tools

Title: Azure DevTest Labs
Score: 0.81866795
Content: Azure DevTest Labs is a fully managed service that enables you to create, manage, and share development and test environments in Azure. It provides features like custom templates, cost management, and integration with Azure DevOps. DevTest Labs supports various platforms, suc
~~~
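The `filter` argument takes an OData expression, and it only works on fields marked `filterable=True`, as `category` is in the index definition. When filter values come from user input, string literals need escaping: OData doubles any single quote inside a string. The helpers below (`odata_string`, `eq_filter`, `any_of_filter`) are hypothetical and only build expression strings; they do not call the service.

```python
def odata_string(value):
    """Quote a Python string as an OData string literal; embedded single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def eq_filter(field, value):
    """Build an equality test such as: category eq 'Developer Tools'."""
    return f"{field} eq {odata_string(value)}"


def any_of_filter(field, values):
    """OR together equality tests so that a document matching any value passes."""
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    return " or ".join(eq_filter(field, value) for value in values)
```

For example, `eq_filter("category", "Developer Tools")` produces the same string as the filter in the cell above. For matching against many values, Azure Cognitive Search also offers the `search.in` function, e.g. `search.in(category, 'Developer Tools,Other Category', ',')`, where "Other Category" is a placeholder.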

~~~python
# Semantic hybrid search: a keyword query plus a vector query, re-ranked by the semantic ranker
# (requires semantic search to be enabled on your search service)
query = "what is azure sarch?"

search_client = SearchClient(AZSEARCH_ENDPOINT, AZSEARCH_INDEX_NAME, credential=credential)
vector = Vector(value=generate_embeddings(query), k=3, fields="contentVector")

results = search_client.search(
    search_text=query,
    vectors=[vector],
    select=["title", "content", "category"],
    query_type="semantic", query_language="en-us", semantic_configuration_name='azure-docs-semantic-config', query_caption="extractive", query_answer="extractive",
    top=3
)

# Print the extractive semantic answers, preferring the highlighted version when present
semantic_answers = results.get_answers()
for answer in semantic_answers:
    if answer.highlights:
        print(f"Semantic Answer: {answer.highlights}")
    else:
        print(f"Semantic Answer: {answer.text}")
    print(f"Semantic Answer Score: {answer.score}\n")

# Print each result with its top extractive caption
for result in results:
    print(f"Title: {result['title']}")
    print(f"Content: {result['content']}")
    print(f"Category: {result['category']}")

    captions = result["@search.captions"]
    if captions:
        caption = captions[0]
        if caption.highlights:
            print(f"Caption: {caption.highlights}\n")
        else:
            print(f"Caption: {caption.text}\n")
~~~

~~~
Semantic Answer: Azure Cognitive Search is<em> a fully managed search-as-a-service that enables you to build rich search experiences for your applications.</em> It provides features like full-text search, faceted navigation, and filters. Azure Cognitive Search supports various data sources, such as Azure SQL Database, Azure Blob Storage, and Azure Cosmos DB.
Semantic Answer Score: 0.9462890625

Title: Azure Stack Edge
Content: Azure Stack Edge is a managed, edge computing appliance that enables you to run Azure services and AI workloads on-premises or at the edge. It provides features like hardware-accelerated machine learning, local caching, and integration with Azure IoT Hub. Azure Stack Edge supports various Azure services, such as Azure Functions, Azure Machine Learning, and Azure Kubernetes Service. You can use Azure Stack Edge to build edge computing applications, optimize your data processing, and ensure the security and compliance of your workloads. It also integrates with othe
~~~
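The semantic answer above marks its most relevant passage with `<em>` tags, and `caption.highlights` uses the same markup. If you display these strings outside HTML, you may want to strip the tags or convert them. The helpers below (`strip_highlights`, `highlights_to_markdown`) are hypothetical and assume highlights contain only `<em>`/`</em>` markup, which matches the output above but is an assumption about the API.

```python
import re

EM_TAGS = re.compile(r"</?em>")
# Capture whitespace just inside the tags separately so it can be moved outside the ** markers
EM_SPAN = re.compile(r"<em>(\s*)(.*?)(\s*)</em>", re.DOTALL)


def strip_highlights(text):
    """Remove <em> and </em> tags, keeping the highlighted words."""
    return EM_TAGS.sub("", text)


def highlights_to_markdown(text):
    """Render <em>...</em> spans as **bold** Markdown."""
    return EM_SPAN.sub(r"\1**\2**\3", text)
```

Moving the edge whitespace matters here because the answer above starts its highlight with a space (`is<em> a fully ...`), and Markdown does not treat `** a` as the start of bold text.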