# Query Search Index

This notebook contains code to pull data down from an Azure Search index into a pandas DataFrame. This is particularly useful for building and monitoring custom skills.

Set the api_key here; the url is set in section 2.0. The key should not be checked into source control.

In [None]:
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

api_key = ''

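One way to keep the key out of source control is to read it from an environment variable instead of typing it into the cell. The following is a minimal sketch, not part of the original notebook: the helper name `load_api_key` and the variable name `AZURE_SEARCH_API_KEY` are assumptions chosen for illustration.

```python
import os


def load_api_key(env_var="AZURE_SEARCH_API_KEY"):
    """Return the key stored in env_var, or raise if it is unset or empty.

    Both the function name and the default variable name are hypothetical.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this notebook."
        )
    return key
```

With the variable exported in the shell that launches Jupyter, the cell above could then read `api_key = load_api_key()`.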
In [None]:
import requests
import pandas as pd

# 1.0 Define function to pull data down from search index
#### query_search_index() takes in three parameters:

 1. **url**: The url should be in the format of "https://{search-service-name}.search.windows.net/indexes/{index-name}/docs?".
 2. **api_key**: The API key for the search service. It can be found in the Azure Portal.
 3. **all_rows**: Optional. The default value is False. If False, the function returns only the first 50 records. If True, it follows the continuation links and returns all rows. Depending on the size of the index, this can be time-consuming.
 
The query can be customized further. For more information, see the Azure Search REST API documentation at https://docs.microsoft.com/en-us/rest/api/searchservice/search-documents
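As a rough illustration of that customization, the sketch below builds the url from its two parts and assembles a query-parameter dictionary that uses the `$top` and `$select` options of the Search Documents API. The helper names `build_docs_url` and `build_query_params` are hypothetical. Note also that `query_search_index()` as defined in this notebook builds its own parameters, so it would need an extra argument before it could use a dictionary like this one.

```python
def build_docs_url(service_name, index_name):
    """Return the docs endpoint in the format described above (hypothetical helper)."""
    return f"https://{service_name}.search.windows.net/indexes/{index_name}/docs?"


def build_query_params(search="*", top=None, select=None, api_version="2017-11-11"):
    """Assemble query parameters for a GET request (hypothetical helper).

    top    -- number of documents per page; the service default is 50
    select -- list of field names to return instead of every field
    """
    params = {"api-version": api_version, "search": search}
    if top is not None:
        params["$top"] = top
    if select:
        # The API expects a comma-separated list of field names
        params["$select"] = ",".join(select)
    return params


# Placeholder service, index, and field names; replace with your own
example_url = build_docs_url("my-service", "my-index")
example_params = build_query_params(top=10, select=["id", "content"])
```

The resulting pair could be passed directly to `requests.get(example_url, params=example_params, headers=headers)`.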

In [None]:
def query_search_index(url, api_key, all_rows=False):
    """Query the search index and return the documents as a DataFrame."""
    headers = {'api-key': api_key,
               'Content-Type': 'application/json'}
    params = {'api-version': '2017-11-11',
              'search': '*'}
    r = requests.get(url, params=params, headers=headers)
    # Show the response status, then fail early on a bad key, url, or index name
    print(r)
    r.raise_for_status()
    body = r.json()

    # Strip whitespace from the column names of each page
    docs_list = [pd.DataFrame(body['value']).rename(columns=str.strip)]

    # When all rows are requested, follow the continuation links page by page
    while all_rows and '@odata.nextLink' in body:
        r = requests.get(body['@odata.nextLink'], headers=headers)
        r.raise_for_status()
        body = r.json()
        docs_list.append(pd.DataFrame(body['value']).rename(columns=str.strip))

    # Reset the index so row labels are unique across pages
    return pd.concat(docs_list, ignore_index=True)

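The paging behaviour used when all_rows is True can be tried without a live search service. The sketch below isolates the loop that follows `@odata.nextLink`; `collect_all_pages`, the `fetch` callable, and the stubbed pages are all assumptions made for illustration and return plain dictionaries rather than a DataFrame.

```python
def collect_all_pages(first_url, fetch):
    """Follow '@odata.nextLink' from page to page and return every document.

    fetch is a hypothetical callable that takes a url and returns the
    decoded JSON body of the response as a dictionary.
    """
    documents = []
    url = first_url
    while url:
        body = fetch(url)
        documents.extend(body.get('value', []))
        # The last page carries no '@odata.nextLink', which ends the loop
        url = body.get('@odata.nextLink')
    return documents


# Stubbed two-page response standing in for the search service
fake_pages = {
    'page1': {'value': [{'id': '1'}, {'id': '2'}], '@odata.nextLink': 'page2'},
    'page2': {'value': [{'id': '3'}]},
}
all_docs = collect_all_pages('page1', fake_pages.__getitem__)
print(len(all_docs))  # prints 3
```

Keeping the fetch step separate from the loop makes it possible to check the paging logic against canned responses before pointing it at a real index.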

# 2.0 Query Search Index

In [None]:
url = "https://{search-service-name}.search.windows.net/indexes/{index-name}/docs?"
df = query_search_index(url, api_key)

# 3.0 Explore Data

In [None]:
len(df)

In [None]:
df.head()
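
When the index is being used to monitor custom skills, one possible next step is to check how often each field is actually populated. The sketch below is an assumption about what such a check could look like, not part of the original notebook; `field_coverage` and the sample field names (`keyphrases`, `language`) are hypothetical.

```python
import pandas as pd


def field_coverage(df):
    """Return the fraction of non-empty values per column, lowest first."""
    def is_filled(value):
        # Treat None, NaN, empty strings, and empty lists or dicts as missing
        if value is None:
            return False
        if isinstance(value, float) and pd.isna(value):
            return False
        if isinstance(value, (str, list, dict)) and len(value) == 0:
            return False
        return True

    filled = df.apply(lambda col: col.map(is_filled))
    return filled.mean().sort_values()


# Small made-up sample standing in for the DataFrame returned by the query
sample = pd.DataFrame({
    'id': ['1', '2'],
    'keyphrases': [['azure', 'search'], []],
    'language': ['en', None],
})
coverage = field_coverage(sample)
```

Calling `field_coverage(df)` on the queried data would list the least-populated fields first, which can point to a skill that is not producing output.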