# Creating a SharePoint Chatbot

## Overview

In previous tutorials we learned how to create chatbots by connecting to Azure services such as Azure AI Search and Azure SQL Database. In this tutorial we learn how to create a SharePoint chatbot by indexing a SharePoint site with Azure AI Search.

## Prerequisites

We assume you have access to Azure AI Studio and have already deployed an LLM. For this tutorial we used GPT-3.5 and the Python 3.10 kernel in our Azure Jupyter notebook.

## Learning objectives

In this tutorial you will learn:
- Creating a SharePoint Site (NIH users) 
- Setting up Azure AI Search with a SharePoint data source
- Connecting our chatbot to our SharePoint Index 

## Get started

### Install required packages.

In [None]:
! pip install "openai>=1.0" "requests" "python-dotenv"

### Import required libraries.

In [None]:
from openai import AzureOpenAI
import requests
import json

### Download Articles

For this tutorial we will download scientific articles related to COVID, which our model will then use as references to answer our questions.

In [None]:
articles_urls = ['https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10781091/pdf/pone.0285645.pdf', 
                 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10219649/pdf/elife-86014.pdf',
                 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10734909/pdf/pone.0285351.pdf',
                 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10684592/pdf/41598_2023_Article_47655.pdf', 
                 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10601201/pdf/12889_2023_Article_16916.pdf',
                 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516599/pdf/elife-86043.pdf',
                 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10620090/pdf/41586_2023_Article_6651.pdf',
                 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10414557/pdf/pone.0289774.pdf',
                 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10355333/pdf/aids-37-1565.pdf',
                 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10289397/pdf/pone.0286297.pdf']

In [None]:
import subprocess

# Download each article with wget, using the same browser-like user agent as before
for url in articles_urls:
    subprocess.run(["wget", "--user-agent=Chrome", url])
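If a server answers an automated download with an HTML page instead of the PDF, that page would be indexed as junk. A minimal sketch for checking the downloads, assuming the files were saved in the current directory under the last segment of each URL (wget's default naming); the helper names are hypothetical:

```python
import os
from urllib.parse import urlparse


def local_filename(url):
    """Return the file name wget uses by default: the last segment of the URL path."""
    return os.path.basename(urlparse(url).path)


def looks_like_pdf(path):
    """Return True if the file exists and starts with the PDF signature '%PDF-'."""
    try:
        with open(path, "rb") as f:
            return f.read(5) == b"%PDF-"
    except OSError:
        return False


def find_bad_downloads(urls, directory="."):
    """List the URLs whose downloaded file is missing or is not a PDF."""
    return [
        url for url in urls
        if not looks_like_pdf(os.path.join(directory, local_filename(url)))
    ]
```

For example, `print(find_bad_downloads(articles_urls))` lists any articles worth downloading again before you upload them to SharePoint.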

### Setting up a SharePoint Site

To index documents in a SharePoint site, the documents must be located in a document library or a subsite. You can create a document library in two ways: directly in a **SharePoint site** or through a **Microsoft Teams channel**.

If you would like to create a **SharePoint site** to index and you are an NIH user, follow the section below for instructions. If you are not an NIH user, follow the instructions listed [here](https://support.microsoft.com/en-us/office/create-a-site-in-sharepoint-4d1e11bf-8ddc-499d-b889-2b48d10b1ce8). Once your SharePoint site is up, upload the articles we just downloaded.

If you have a **Microsoft Teams channel**, click the three dots next to the channel to open it as a SharePoint site. If your articles are not already in a document library, create one by clicking `+ New` and selecting `Document Library`. Give your library a name, then upload the articles we just downloaded.

Otherwise, if you already have a SharePoint site with the articles uploaded, feel free to skip ahead to **Create Azure AI Search Service**.

#### Optional: Requesting a SharePoint Site (for NIH users only)

Submit a ticket through the IT Service Desk to create a new SharePoint site: go to **Request a Service**, then under **Collaboration and Conferencing** select `Microsoft 365 Services > SharePoint Request`. On the SharePoint form, under `Select request type` select **Provision new SharePoint Online Site**, then fill out the rest of the required information.

### Create Azure AI Search Service

Enter the name you would like for your AI Search service, the name of your resource group, and the Azure region (location) where the service should be created.

In [None]:
service_name='<Your Service Name>'
location = 'eastus2'
resource_group = '<Your Resource Group>'

Authenticate with the Azure CLI and follow the instructions in the output.

In [None]:
! az login

Create your Azure AI Search service. We use the free tier, which provides 50 MB of storage and allows up to 3 indexes.

In [None]:
! az search service create --name {service_name} --sku free --location {location} --resource-group {resource_group} --partition-count 1 --replica-count 1

Save the admin keys to a JSON file, then load the primary key into the `index_service_key` variable.

In [None]:
! az search admin-key show --resource-group {resource_group} --service-name {service_name} > keys.json

In [None]:
#save the key to a variable
with open('keys.json', mode='r') as f:
    data = json.load(f)
index_service_key = data["primaryKey"]

### Setting up Permissions

If you are not an NIH user, follow steps 1-3 listed [here](https://learn.microsoft.com/en-us/azure/search/search-howto-index-sharepoint-online). NIH users should follow the instructions below.

1. Sign in to the [Azure portal](https://portal.azure.com/).

2. Search for or navigate to Microsoft Entra ID, then select **App registrations**.

3. Select **+ New registration**.

4. Provide a name for your app. NIH users must name their app using the pattern `Your-NIH-IC-Name App-Name`.
5. Under supported account types, select **Single tenant**.
6. Leave the redirect URI empty; it is not required.
7. Select **Register**.
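8. On the left, select **API permissions**, then **Add a permission**, then **Microsoft Graph**.
9. Select **Delegated permissions** and add the following permissions:
    - Delegated - Files.Read.All
    - Delegated - Sites.Read.All
    - Delegated - User.Read
10. On the left, select **Authentication**, set **Allow public client flows** to **Yes**, and select **Save**. The device-code sign-in used later in this tutorial relies on this setting.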

Next, you need to obtain admin consent for these permissions. NIH users should submit a ticket through the IT Service Desk under `Enterprise Cloud Platforms > Cloud Operations Support Request`, filling in the following:
- **Cloud Service Provider:** Azure
- **Request type:** Identity and Role-based Access
- **Account name and number:** Enter your Azure subscription name and number
- **Additional information:** Enter your application (client) ID and app name

### Create Azure AI Search Data Source, Index, and Indexer

Because SharePoint indexing in Azure AI Search is still in preview, we send our requests directly to the preview REST API using Python's `requests` library.
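Every request in this section shares the same base URL, `api-version` query parameter, and headers. A small helper can keep those in one place; this is a minimal sketch, assuming the `2023-10-01-Preview` version used throughout this notebook, and the function names `build_search_url` and `search_request` are our own, not part of any Azure SDK:

```python
import requests

API_VERSION = "2023-10-01-Preview"  # preview version used throughout this notebook


def build_search_url(service_name, path, api_version=API_VERSION):
    """Build a REST URL such as https://<service>.search.windows.net/indexes?api-version=..."""
    return f"https://{service_name}.search.windows.net/{path.lstrip('/')}?api-version={api_version}"


def search_request(method, service_name, path, admin_key, payload=None):
    """Send a request to the Azure AI Search REST API and return the JSON body (None if empty)."""
    headers = {"Content-Type": "application/json", "api-key": admin_key}
    r = requests.request(method, build_search_url(service_name, path),
                         json=payload, headers=headers)
    # Return the body even on errors so the service's error message shows up in the notebook
    return r.json() if r.content else None
```

For example, the data source cell could instead call `search_request("POST", service_name, "datasources", index_service_key, payload)`.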

In [None]:
datastore_name = "<Your Data Store Name>"
index_name = "<Your Index Name>"
indexer_name = "<Your Indexer Name>"
app_id = "<Your Application ID>"
tenant_id = "<Your Account Tenant ID>" # You can find this by navigating to Microsoft Entra ID in the Azure portal
sharepoint_endpoint = "<Your SharePoint Site URL>" # This is the home site URL
library = "<Your Document Library URL>" # URL of the document library that holds the articles

First we create our **data source**, which connects our SharePoint site to Azure AI Search. Notice that in the payload the **container** key holds a query: the `useQuery` container with `includeLibrary` tells the indexer to index all documents in our **document library**. For other types of queries, see the Query section of the [Azure SharePoint indexer documentation](https://learn.microsoft.com/en-us/azure/search/search-howto-index-sharepoint-online#query).
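The `container` setting controls which libraries are indexed. Besides `useQuery`, the Azure documentation describes `defaultSiteLibrary` (the site's default document library) and `allSiteLibraries` (every library in the site). Below is a minimal sketch of a payload builder that covers only two of these cases, `useQuery` with `includeLibrary` and `defaultSiteLibrary`; the function name `build_sharepoint_datasource` is hypothetical:

```python
def build_sharepoint_datasource(name, site_url, app_id, tenant_id, library_url=None):
    """Build a SharePoint Online data source definition for the Azure AI Search REST API.

    With library_url, only that document library is indexed (container "useQuery");
    without it, the site's default document library is indexed instead.
    """
    connection_string = (
        f"SharePointOnlineEndpoint={site_url};"
        f"ApplicationId={app_id};"
        f"TenantId={tenant_id}"
    )
    if library_url:
        container = {"name": "useQuery", "query": f"includeLibrary={library_url}"}
    else:
        container = {"name": "defaultSiteLibrary", "query": None}
    return {
        "name": name,
        "type": "sharepoint",
        "credentials": {"connectionString": connection_string},
        "container": container,
    }
```

Called as `build_sharepoint_datasource(datastore_name, sharepoint_endpoint, app_id, tenant_id, library)`, it produces the same payload as the cell below.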

In [None]:
#DATASTORE
url = f"https://{service_name}.search.windows.net/datasources?api-version=2023-10-01-Preview"
payload = {"name" : f"{datastore_name}", "type" : "sharepoint", 
           "credentials" : { "connectionString" : f"SharePointOnlineEndpoint={sharepoint_endpoint};ApplicationId={app_id};TenantId={tenant_id}" }, 
           "container" : { "name" : "useQuery", "query" : f"includeLibrary={library}"}}

headers = {"Content-Type": "application/json", "api-key": f"{index_service_key}"}
r = requests.post(url, json=payload, headers=headers)
r.json()

Next we create our **index** and define the fields that will hold the data we retrieve. Notice that some fields are also filterable, sortable, and/or facetable (faceting produces summaries of field values across documents, such as a count of documents per content type). These extra attributes increase the size of the index, which can add to costs, so set them to `False` for any field that doesn't need them.
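To see what the filterable, sortable, and facetable attributes make possible, here is a minimal sketch of a request body for the Search Documents REST API (`POST /indexes/<index>/docs/search`) that uses the fields defined in the cell below. The helper name `build_search_body` and the choice of fields are our own illustration:

```python
def build_search_body(text, top=5):
    """Request body for POST /indexes/<index>/docs/search using the index fields defined below."""
    return {
        "search": text,
        # Only searchable fields are matched against the query text
        "searchFields": "metadata_spo_item_name,content",
        # Facetable field: the response also reports how many matches fall under each content type
        "facets": ["metadata_spo_item_content_type"],
        # Sortable field: newest documents first (this replaces relevance ordering)
        "orderby": "metadata_spo_item_last_modified desc",
        "select": "metadata_spo_item_name,metadata_spo_item_path",
        "top": top,
        "count": True,
    }
```

You would send it with `requests.post` to `https://{service_name}.search.windows.net/indexes/{index_name}/docs/search?api-version=2023-10-01-Preview` using the same `api-key` header as the other cells; facet counts come back under `@search.facets`. Drop `orderby` when you want the most relevant matches first.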

In [None]:
#INDEX
url = f"https://{service_name}.search.windows.net/indexes?api-version=2023-10-01-Preview"
payload = {"name" : f"{index_name}", "fields": [
    { "name": "id", "type": "Edm.String", "key": True, "searchable": False },
    { "name": "metadata_spo_item_name", "type": "Edm.String", "key": False, "searchable": True, "filterable": False, "sortable": False, "facetable": False },
    { "name": "metadata_spo_item_path", "type": "Edm.String", "key": False, "searchable": False, "filterable": False, "sortable": False, "facetable": False },
    { "name": "metadata_spo_item_content_type", "type": "Edm.String", "key": False, "searchable": False, "filterable": True, "sortable": False, "facetable": True },
    { "name": "metadata_spo_item_last_modified", "type": "Edm.DateTimeOffset", "key": False, "searchable": False, "filterable": False, "sortable": True, "facetable": False },
    { "name": "metadata_spo_item_size", "type": "Edm.Int64", "key": False, "searchable": False, "filterable": False, "sortable": False, "facetable": False },
    { "name": "content", "type": "Edm.String", "searchable": True, "filterable": False, "sortable": False, "facetable": False }]}

headers = {"Content-Type": "application/json", "api-key": f"{index_service_key}"}
r = requests.post(url, json=payload, headers=headers)
r.json()

Finally, we link our data source and index by creating an **indexer**, which syncs the data from SharePoint into our index. Notice that the indexer only indexes `.pdf` and `.docx` files and explicitly excludes image files (`.png`, `.jpg`), because this index only stores extracted text.
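In this configuration `indexedFileNameExtensions` already limits indexing to PDFs and Word documents, so the exclusion list acts as a safeguard. The sketch below is only a local approximation of how the two settings combine, not the service's own code. It assumes, as the Azure documentation describes for file-based indexers, that an extension appearing in both lists is excluded, and additionally assumes that an empty `indexed` list means "all extensions":

```python
import os


def parse_extensions(setting):
    """Split a comma-separated extension list such as '.pdf, .docx' into a lowercase set."""
    return {ext.strip().lower() for ext in setting.split(",") if ext.strip()}


def would_index(filename, indexed=".pdf, .docx", excluded=".png, .jpg"):
    """Approximate whether a file passes the indexer's extension filters."""
    ext = os.path.splitext(filename)[1].lower()
    allowed = parse_extensions(indexed)
    # A non-empty include list acts as an allow-list
    if allowed and ext not in allowed:
        return False
    # Exclusion wins when an extension appears in both lists
    return ext not in parse_extensions(excluded)
```

For instance, `would_index("figure.png")` is `False` and `would_index("paper.PDF")` is `True` with the settings used below.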

In [None]:
#INDEXER

url = f"https://{service_name}.search.windows.net/indexers?api-version=2023-10-01-Preview"
payload = { "name" : f"{indexer_name}", 
    "dataSourceName" : f"{datastore_name}", 
    "targetIndexName" : f"{index_name}", 
    "parameters": { 
    "batchSize": None, 
    "maxFailedItems": None, 
    "maxFailedItemsPerBatch": None, 
    "base64EncodeKeys": None, 
    "configuration": { 
        "indexedFileNameExtensions" : ".pdf, .docx", 
        "excludedFileNameExtensions" : ".png, .jpg", 
        "dataToExtract": "contentAndMetadata" 
      } 
    }, 
    "schedule" : { }, 
    "fieldMappings" : [ 
        { 
          "sourceFieldName" : "metadata_spo_site_library_item_id", 
          "targetFieldName" : "id", 
          "mappingFunction" : { 
            "name" : "base64Encode" 
          } 
         } 
    ] 
}
headers = {"Content-Type": "application/json", "api-key": f"{index_service_key}"}
r = requests.post(url, json=payload, headers=headers)
r.json()

The first time you create the indexer, the request above keeps running until you complete a device-code sign-in. While it is still running, **run the cell below from another notebook within 10 minutes** of starting the first one.
Read the error message in the output, open the URL it provides, and enter the code it gives you.

In [None]:
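import requests
# This cell runs in a separate notebook, so redefine the values it uses
service_name = "<Your Service Name>"
indexer_name = "<Your Indexer Name>"
index_service_key = "<Your Search Admin Key>"

#RETRIEVE PASSCODE
url = f"https://{service_name}.search.windows.net/indexers/{indexer_name}/status?api-version=2023-10-01-Preview"
headers = {"Content-Type": "application/json", "api-key": f"{index_service_key}"}
r = requests.get(url, headers=headers)
r.json()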
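Instead of re-running the status cell by hand, you could poll it until the sign-in instructions appear. A minimal sketch, assuming those instructions are reported in the `lastResult.errorMessage` field of the indexer status response; the helper names are hypothetical:

```python
import time

import requests


def device_code_message(status):
    """Return lastResult.errorMessage from an indexer status response, or None if absent."""
    last = status.get("lastResult") or {}
    return last.get("errorMessage")


def wait_for_device_code(url, headers, timeout_s=600, interval_s=15):
    """Poll the indexer status endpoint until a message appears or the timeout (default 10 min) passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        message = device_code_message(requests.get(url, headers=headers).json())
        if message:
            return message
        time.sleep(interval_s)
    return None
```

In the second notebook, `print(wait_for_device_code(url, headers))` would print the message containing the sign-in URL and code.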

### Querying the data

Enter the following information to connect your deployed model.

In [None]:
endpoint = "<AZURE_OPENAI_ENDPOINT>"
api_key = "<AZURE_OPENAI_KEY>"
deployment_id = "<YOUR DEPLOYMENT ID>" # Add your deployment ID here

Now let's set up some questions (queries) that we want our model to answer about the documents in our SharePoint index. Run the cell with the question you want to ask; the last one you run sets `query`.

In [None]:
query = "What is COVID?"

In [None]:
query = "What are treatments for COVID?"

Now run the cell below. It retrieves passages related to our query from the documents in our index and sends them to the model along with the question.

In [None]:
search_endpoint = "https://{}.search.windows.net".format(service_name)


client = AzureOpenAI(
    base_url=f"{endpoint}/openai/deployments/{deployment_id}/extensions",
    api_key=api_key,
    api_version="2023-08-01-preview",
)

completion = client.chat.completions.create(
    model=deployment_id,
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that will answer questions based on the documents given. You will also list the name of the documents."},
        {
            "role": "user",
            "content": query,
        },
    ],
    stream=True,
    extra_body={
        "dataSources": [
            {
                "type": "AzureCognitiveSearch",
                "parameters": {
                    "endpoint": search_endpoint,
                    "key": index_service_key,
                    "indexName": index_name
                }
            }
        ]
    }
)

# Print the answer as it streams in, skipping chunks that carry no text
for chunk in completion:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
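A streamed completion can only be iterated once. If you want to keep the full answer, for example to append it to a conversation history, a small helper can gather the text deltas instead of printing them. This is a minimal sketch; the name `collect_stream` is hypothetical, and the empty-`choices` check guards against chunks that carry no text:

```python
def collect_stream(chunks):
    """Concatenate the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        # Skip chunks without choices or without text content
        if chunk.choices and chunk.choices[0].delta.content is not None:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)
```

Use it in place of the printing loop, e.g. `answer = collect_stream(completion)`, on a freshly created completion.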

**Tip:** For a more interactive chatbot experience, you can run the example script [here](../example_scripts/example_azureaisearch_openaichat_zeroshot.py). First export the following environment variables in your terminal, then run the script: `python example_azureaisearch_openaichat_zeroshot.py`.

In [None]:
# Run these commands in your terminal (not in this notebook)
export AZURE_OPENAI_API_KEY='<AZURE_OPENAI_API_KEY>'
export AZURE_OPENAI_ENDPOINT='<AZURE_OPENAI_ENDPOINT>'
export AZURE_OPENAI_DEPLOYMENT_NAME='<AZURE_OPENAI_DEPLOYMENT_NAME>'
export AZURE_COGNITIVE_SEARCH_SERVICE_NAME='<AZURE_COGNITIVE_SEARCH_SERVICE_NAME>'
export AZURE_COGNITIVE_SEARCH_INDEX_NAME='<AZURE_COGNITIVE_SEARCH_INDEX_NAME>'
export AZURE_COGNITIVE_SEARCH_API_KEY='<AZURE_COGNITIVE_SEARCH_API_KEY>'
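Exported variables only reach Python through the process environment, so they must be set in the same terminal session that runs the script. As a sketch of the consuming side (not the linked script's actual code), this reads the six variables above and fails early with a clear message if any is missing; `REQUIRED_VARS` and `load_settings` are hypothetical names:

```python
import os

REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_COGNITIVE_SEARCH_SERVICE_NAME",
    "AZURE_COGNITIVE_SEARCH_INDEX_NAME",
    "AZURE_COGNITIVE_SEARCH_API_KEY",
)


def load_settings(environ=os.environ):
    """Return the required settings as a dict, raising if any variable is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```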

## Conclusion

In this notebook you learned how to create a SharePoint data source, index, and indexer, and how to connect them to your deployed model to create a SharePoint chatbot!

## Clean up

If you wish, delete your SharePoint site and shut down your notebook. The cell below deletes the Azure AI Search service.

In [None]:
# Delete the search service; this also deletes its indexes, data sources, and indexers
! az search service delete --name {service_name} --resource-group {resource_group} -y