# Set Up Azure Cognitive Search Service
This notebook sets up your Azure Cognitive Search service for the Notes/Text Analytics solution described at https://github.com/MarchingBug/DoctorNotesTextAnalytics.  Data is pulled from the database you imported beforehand.  The main indexer runs the data, in JSON format, through a skillset that reshapes it and extracts medical entities, then writes the enriched data to the search index.

First, you will need an Azure account.  If you don't already have one, you can start a free trial of Azure [here](https://azure.microsoft.com/free/).  

Secondly, if you have not done so, create a new Azure search service using the Azure portal at <https://portal.azure.com/#create/Microsoft.Search>.  Select your Azure subscription.  You may create a new resource group (you can name it something like "doctornotes-search-rg").  You will need a globally-unique URL as the name of your search service (try something like "doctornotes-search-" plus your name, organization, or numbers).  Finally, choose a nearby location to host your search service - please remember the location that you chose, as your Cognitive Services instance will need to be based in the same location.  Click "Review + create" and then (after validation) click "Create" to instantiate and deploy the service.  

After deployment is complete, click "Go to resource" to navigate to your new search service.  We will need some information about your search service to fill in the "Azure Search variables" section in the cell below.  First, on the "Overview" main page, you should see a "Url" value.  Copy that value into the "azsearch_url" variable in the cell below (you can just update the "<YourSearchServiceName>" section of the URL with the name of your Azure search service).  Then, on the Azure portal page in the left-hand pane under "Settings", click on "Keys".  Update the azsearch_key value below with one of the keys from your service on the Azure portal page. 

## Fill out variables
You need to fill out variables in this notebook and in the file Skillset.json located in this folder.

For this notebook, fill in the variable values in the cell directly below.

For the file Skillset.json, search for the keyword TODO and replace the placeholders in CAPS that start with YOUR_ with the values for the services you will need: the Azure Function URL with its authentication code, the Cognitive Services key, and the Storage Account connection string.
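
Before running the skillset cell, it can help to confirm that no placeholders remain in the file.  The following is a minimal sketch, assuming the placeholders can be recognized by the `TODO` keyword or the `YOUR_` prefix described above.  `find_placeholders` is a hypothetical helper written for this illustration, not part of the repository, and the sample text is made up.

```python
import re

def find_placeholders(text):
    """Return (line_number, line) pairs that still contain TODO or YOUR_XXX markers."""
    pattern = re.compile(r"TODO|YOUR_[A-Z0-9_]+")
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]

# Illustrative input only; the real file contents will differ.
sample = '{\n  "uri": "YOUR_FUNCTION_URL",\n  "key": "abc123"\n}'
for number, line in find_placeholders(sample):
    print("line {0}: {1}".format(number, line))
```

To check the real file, pass the text of your skillset file to `find_placeholders` and make sure the returned list is empty.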

In [2]:
# Azure Search variables
azsearch_url = "<YourSearchServiceName>.search.windows.net"  # If you copy this value from the portal, leave off the "https://" from the beginning
azsearch_key = "TODO" 

# Replace these placeholders with your own values.  Avoid saving real credentials in the notebook.
sql_server_name = "YOUR_SQL_SERVER_NAME"
database_name = "YOUR_DATABASE_NAME"
db_user_name = "YOUR_USER_NAME"
db_user_password = "YOUR_USER_PASSWORD"


# Data source which contains documents to process
# The f prefix is required so that the {placeholders} are filled in from the variables above.
sql_server_connection_string = f"Server=tcp:{sql_server_name},1433;Initial Catalog={database_name};Persist Security Info=False;User ID={db_user_name};Password={db_user_password};MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;"
container = "[dbo].[DoctorNotes]"

GetNotesHealthcareAnalytics_Url = "https://doctoc-notes-healthentityextraction.azurewebsites.net/api/GetNotesHealthcareAnalytics?"

# Prefix for elements of the Cognitive Search service
search_prefix = "azuresql"  # Note that if you change this value, you will also have to change the values in the indexer json.

print("The variables are initialized.")

The variables are initialized.
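
The `azsearch_url` value is later passed to `http.client.HTTPSConnection`, which expects a bare host name, which is why the "https://" prefix has to be left off.  As a hedged sketch, a value copied straight from the portal could be cleaned up as shown below.  `normalize_search_host` and the example service name are hypothetical and only illustrate the idea.

```python
from urllib.parse import urlparse

def normalize_search_host(value):
    """Return just the host name, whether or not the value includes a scheme."""
    value = value.strip()
    # urlparse only fills in netloc when a scheme (or a leading "//") is present.
    parsed = urlparse(value if "://" in value else "//" + value)
    return parsed.netloc

print(normalize_search_host("https://example-service.search.windows.net/"))
```

One possible use is `azsearch_url = normalize_search_host(azsearch_url)` right after the variables are set.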


We will first create a simple function to wrap REST requests to the Azure Search service.  If called with no parameters, it will get the service statistics.  

In [None]:
import json

def azsearch_rest(request_type="GET", endpoint="servicestats", body=None):
    """Send a REST request to the Azure Search service and return the parsed JSON, or None if the body is empty."""
    import http.client, urllib.parse, json

    # Request headers.
    headers = {
        'Content-Type': 'application/json',
        'api-key': azsearch_key
    }

    # Request parameters
    params = urllib.parse.urlencode({
        'api-version':'2021-04-30-Preview'
    })

    # Execute the REST API call and get the response.
    conn = http.client.HTTPSConnection(azsearch_url)
    try:
        request_path = "/{0}?{1}".format(endpoint, params)
        conn.request(request_type, request_path, body, headers)
        response = conn.getresponse()
        print(response.status)
        data = response.read().decode("UTF-8")
        # Some calls return an empty body, so only parse when there is data.
        result = None
        if len(data) > 0:
            result = json.loads(data)
        return result
    finally:
        # Always release the connection; any exception propagates to the caller.
        conn.close()
        
# Test the function
try:
    response = azsearch_rest()
    if response is not None:
        print(json.dumps(response, sort_keys=True, indent=2))
except Exception as ex:
    # Python 3 exceptions have no .message attribute, so print the exception itself.
    print(ex)
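
The wrapper builds its request path by joining the endpoint with the `api-version` query string.  The sketch below isolates that step so it can be inspected without calling the service.  `build_request_path` is a hypothetical helper, and stripping slashes from the endpoint is an added safeguard rather than something the original function does.

```python
import urllib.parse

def build_request_path(endpoint, api_version="2021-04-30-Preview"):
    """Join an endpoint and the api-version query string into a request path."""
    query = urllib.parse.urlencode({"api-version": api_version})
    # Strip slashes so "indexers/x/run" and "/indexers/x/run" give the same path.
    return "/{0}?{1}".format(endpoint.strip("/"), query)

print(build_request_path("servicestats"))
print(build_request_path("/indexers/my-indexer/run"))
```

Both calls produce a path with a single leading slash, which avoids accidentally requesting `//indexers/...` when an endpoint is passed with its own leading slash.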

Next, let's set up the data source for your search service.  In this service, we have one data source: the notes loaded into the SQL database.

In [None]:
def create_datasource(datasource_name, sql_server_connection_string, container):

    # Define the request body with details of the data source we want to create
    body = {   
        "name": datasource_name,  
        "description": "",  
        "type": "azuresql",
        "credentials": 
        { 
            "connectionString": sql_server_connection_string
        },  
        "container": { 
            "name": container, 
            "query": "" 
        }
    } 

    try:
        # Call the REST API's 'datasources' endpoint to create a data source
        result = azsearch_rest(request_type="POST", endpoint="datasources", body=json.dumps(body))
        if result is not None:
            print(json.dumps(result, sort_keys=True, indent=2))
    except Exception as ex:
        print(ex)
        

# Create the datasource
datasource_name = "doctor-notes-poc"

create_datasource(datasource_name, sql_server_connection_string, container)
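
The connection string embeds the database password, so it is worth masking it before printing or logging the string while debugging.  This is a minimal sketch, assuming the semicolon-delimited `Password=...;` layout used above.  `mask_connection_string` is a hypothetical helper and the example values are made up.

```python
import re

def mask_connection_string(connection_string):
    """Replace the Password value in a semicolon-delimited connection string with asterisks."""
    return re.sub(r"(?i)(password\s*=\s*)[^;]*", r"\1****", connection_string)

# Made-up values for illustration.
example = "Server=tcp:example.database.windows.net,1433;User ID=someone;Password=not-a-real-secret;Encrypt=True;"
print(mask_connection_string(example))
```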


Then let's set up your search index.  

In [None]:
index_name = search_prefix + "-index"

# Define the request body
with open("index.json") as datafile:
    index_json = json.load(datafile)

try:
    result = azsearch_rest(request_type="PUT", endpoint="indexes/" + index_name, body=json.dumps(index_json))
    if result is not None:
        print(json.dumps(result, sort_keys=True, indent=2))

except Exception as e:
    print('Error:')
    print(e)
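
An index definition needs exactly one field marked as the key, and the `name` in the body should match the index name used in the request URL.  The sketch below checks both before the definition is sent.  `validate_index_definition` is a hypothetical helper, and the sample definition is illustrative; it is not the contents of index.json.

```python
def validate_index_definition(index_def, expected_name):
    """Return a list of human-readable problems found in an index definition."""
    problems = []
    if index_def.get("name") != expected_name:
        problems.append(
            "name {0!r} does not match {1!r}".format(index_def.get("name"), expected_name)
        )
    key_fields = [f.get("name") for f in index_def.get("fields", []) if f.get("key")]
    if len(key_fields) != 1:
        problems.append("expected exactly one key field, found {0}".format(len(key_fields)))
    return problems

# Illustrative definition only.
sample_index = {
    "name": "azuresql-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String"},
    ],
}
print(validate_index_definition(sample_index, "azuresql-index"))
```

An empty list means neither problem was found.  With the notebook's variables, the call would be `validate_index_definition(index_json, index_name)`.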

Next, we will set up your skillset.  

In [None]:
skillset_name = search_prefix + "-skillset"

# Define the request body
with open("skillset.json") as datafile:
    skillset_json = json.load(datafile)

try:
    result = azsearch_rest(request_type="PUT", endpoint="skillsets/" + skillset_name, body=json.dumps(skillset_json))
    if result is not None:
        print(json.dumps(result, sort_keys=True, indent=2))

except Exception as e:
    print('Error:')
    print(e)
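
To see at a glance what the skillset will do, you can list each skill's type together with its input and output names.  This is a small sketch that relies only on the standard `skills`, `@odata.type`, `inputs`, and `outputs` properties of a skillset definition.  `summarize_skills` is a hypothetical helper, and the sample skillset is illustrative rather than the contents of the file in this folder.

```python
def summarize_skills(skillset_def):
    """Return (skill type, input names, output names) for each skill in a skillset definition."""
    summary = []
    for skill in skillset_def.get("skills", []):
        inputs = [i.get("name") for i in skill.get("inputs", [])]
        outputs = [o.get("name") for o in skill.get("outputs", [])]
        summary.append((skill.get("@odata.type"), inputs, outputs))
    return summary

# Illustrative skillset only.
sample_skillset = {
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "entities", "targetName": "entities"}],
        }
    ]
}
for skill_type, inputs, outputs in summarize_skills(sample_skillset):
    print(skill_type, inputs, "->", outputs)
```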

Now, we will set up your main indexer.  This indexer will take the data from the data source, run it through the skillset, and put the results in the search index.  

In [None]:
def create_indexer(indexer_name, filename):

    # Define the request body
    with open(filename) as datafile:
        indexer_json = json.load(datafile)

    try:
        result = azsearch_rest(request_type="PUT", endpoint="indexers/" + indexer_name, body=json.dumps(indexer_json))
        if result is not None:
            print(json.dumps(result, sort_keys=True, indent=2))

    except Exception as e:
        print('Error:')
        print(e)
        

# Create main indexer
indexer_name = search_prefix + "-indexer"
create_indexer(indexer_name, filename="data-indexer.json")
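
The indexer definition refers to the data source, index, and skillset by name, so those names have to agree with the ones created earlier (this is why changing `search_prefix` also means editing the indexer JSON).  The sketch below compares them using the standard `dataSourceName`, `targetIndexName`, and `skillsetName` properties.  `check_indexer_references` is a hypothetical helper and the sample definition is made up.

```python
def check_indexer_references(indexer_def, datasource_name, index_name, skillset_name):
    """Return {property: (found, expected)} for every reference that does not match."""
    expected = {
        "dataSourceName": datasource_name,
        "targetIndexName": index_name,
        "skillsetName": skillset_name,
    }
    return {
        key: (indexer_def.get(key), want)
        for key, want in expected.items()
        if indexer_def.get(key) != want
    }

# Illustrative definition only.
sample_indexer = {
    "dataSourceName": "doctor-notes-poc",
    "targetIndexName": "azuresql-index",
    "skillsetName": "azuresql-skillset",
}
print(check_indexer_references(
    sample_indexer, "doctor-notes-poc", "azuresql-index", "azuresql-skillset"))
```

An empty dictionary means all three references line up.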

If this is your first time running an indexer, you won't need to reset it.  However, if you reuse this code and rerun your indexer with changes (perhaps pointing to your own data source instead of ours), you will need to reset the indexer before making changes.  

In [None]:
def reset_indexer(indexer_name):
    # Reset the indexer.  The endpoint has no leading slash because azsearch_rest adds one.
    result = azsearch_rest(request_type="POST", endpoint="indexers/{0}/reset".format(indexer_name), body=None)
    if result is not None:
        print(json.dumps(result, sort_keys=True, indent=2))

def run_indexer(indexer_name):
    # Rerun the indexer.
    result = azsearch_rest(request_type="POST", endpoint="indexers/{0}/run".format(indexer_name), body=None)
    if result is not None:
        print(json.dumps(result, sort_keys=True, indent=2))


# Reset and rerun main indexer.  
reset_indexer(indexer_name)
run_indexer(indexer_name)

The indexer run can take a while, so let's check the status to see when it is ready.  Below we check the main indexer.  

In [None]:
import time, json

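def check_indexer_status(indexer_name):
    # lastResult reports inProgress or reset until the run finishes;
    # these are the values that mean the run has ended, successfully or not.
    finished_states = ("success", "error", "transientFailure", "persistentFailure")
    try:
        complete = False
        while not complete:
            result = azsearch_rest(request_type="GET", endpoint="indexers/{0}/status".format(indexer_name))
            state = result["status"]
            if result.get("lastResult") is not None:
                state = result["lastResult"]["status"]
            print(state)
            if state in finished_states:
                complete = True
            else:
                time.sleep(1)

    except Exception as e:
        print('Error:')
        print(e)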


# Check the main indexer
check_indexer_status(indexer_name)
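
A polling loop like this runs until the indexer reports a final state, so for unattended runs it can be useful to add a timeout.  The following is a hedged sketch, not the notebook's own method: `wait_for_indexer` is a hypothetical helper that takes any zero-argument function returning the status payload, which also lets it be tried out with canned responses instead of a live service.

```python
import time

def wait_for_indexer(fetch_status, timeout_seconds=600, poll_seconds=5, sleep=time.sleep):
    """Poll fetch_status() until the last run reaches a final state or the timeout expires."""
    final_states = ("success", "error", "transientFailure", "persistentFailure")
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = fetch_status() or {}
        last = result.get("lastResult") or {}
        # Prefer the status of the last run; fall back to the indexer's overall status.
        state = last.get("status", result.get("status"))
        if state in final_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "indexer still in state {0!r} after {1}s".format(state, timeout_seconds)
            )
        sleep(poll_seconds)

# Canned responses standing in for the status endpoint.
responses = iter([
    {"status": "running", "lastResult": None},
    {"status": "running", "lastResult": {"status": "inProgress"}},
    {"status": "running", "lastResult": {"status": "success"}},
])
print(wait_for_indexer(lambda: next(responses), sleep=lambda seconds: None))
```

Against the real service, the fetch function could be `lambda: azsearch_rest(endpoint="indexers/{0}/status".format(indexer_name))`.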

Now that the indexer has run to build the index, we can query it.  First, we will create a wrapper function for querying an Azure Search service.  

In [None]:
def azsearch_query(index, params):
    """Run a GET query against the given index; params must already be URL-encoded."""
    import http.client, json

    # Request headers.
    headers = {
        'Content-Type': 'application/json',
        'api-key': azsearch_key
    }

    # Execute the REST API call and get the response.
    conn = http.client.HTTPSConnection(azsearch_url)
    try:
        request_path = "/indexes/{0}/docs?{1}".format(index, params)
        conn.request("GET", request_path, None, headers)
        response = conn.getresponse()
        data = response.read().decode("UTF-8")
        result = json.loads(data)
        return result
    finally:
        # Always release the connection; any exception propagates to the caller.
        conn.close()

print("Ready to use the REST API for Queries")

Finally, you can query your Azure search service.  Try searching for "pain".  

In [None]:
import urllib.parse, json

search_terms = input("Search: ")

# Define the search parameters
searchParams = urllib.parse.urlencode({
    'search':'"{0}"'.format(search_terms),
    'searchMode':'All',
    'queryType':'full',
    '$count':'true',
    '$select':'*',
    'api-version':'2021-04-30-Preview'
})

try:
    result = azsearch_query(index=index_name, params=searchParams)
    print('Hits:',result['@odata.count'])
    print(json.dumps(result, indent=2))

except Exception as e:
    print('Error:')
    print(e)
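
Because the query uses `queryType` `full` and wraps the input in double quotes, characters that are special in the Lucene query syntax (a double quote in particular) can change how the search text is parsed.  One option, sketched below under that assumption, is to backslash-escape them before building the query.  `escape_lucene` is a hypothetical helper and not part of the notebook.

```python
import re

# Characters with special meaning in the full Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')

def escape_lucene(term):
    """Prefix every Lucene special character in term with a backslash."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)

print(escape_lucene('back "pain" 1/2'))
```

The escaped value can then be used in place of the raw input, for example `'search': '"{0}"'.format(escape_lucene(search_terms))`.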