## Building a Custom Search Engine
### Step 3 - Query the Index and Retrieve Answers
- Submit a single search query
- Submit multiple queries in batch

**Note:** Command-line versions of this step are included in the Python folder of this project.
- For interactive queries: `azsearch_query.py`
- For batch queries from a file: `azsearch_queryall.py`

In [6]:
import requests
import json
import os
import csv
import pyexcel as pe
import codecs
import pandas as pd

Initialize Azure Search configuration parameters to point to the content index to be used.

In [7]:
# This is the service you've already created in Azure Portal
serviceName = 'your_azure_search_service_name'

# This is the index you've already created in Azure Portal or via the azsearch_mgmt.py script
indexName = 'your_index_name_to_use'

# Set your service API key, either via an environment variable or enter it below
#apiKey = os.getenv('SEARCH_KEY_DEV', '')
apiKey = 'your_azure_search_service_api_key'
apiVersion = '2016-09-01'
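
Hard-coding the key in a notebook makes it easy to leak. As a minimal sketch, the key could instead be read from the `SEARCH_KEY_DEV` environment variable named in the commented-out line above, failing early when it is missing. The helper name `load_api_key` is hypothetical and not part of the project scripts.

```python
import os

def load_api_key(var_name='SEARCH_KEY_DEV'):
    """Return the API key stored in the given environment variable.

    Hypothetical helper: raises instead of silently returning an empty key.
    """
    key = os.getenv(var_name, '')
    if not key:
        raise RuntimeError('Environment variable %s is not set' % var_name)
    return key

# Usage (assumes the variable was exported before starting the notebook):
# apiKey = load_api_key()
```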

Optional configuration parameters to alter the search query request.

In [9]:
# Retrieval options to alter the query results
SEARCHFIELDS = None                            # use all searchable fields for retrieval
#SEARCHFIELDS = 'Keywords, SubsectionText'     # use selected fields only for retrieval
FUZZY = False                                  # enable fuzzy search (check API for details)
NTOP  = 5                                      # number of results to return

#### Helper functions for basic REST API operations

In [10]:
def getServiceUrl():
    return 'https://' + serviceName + '.search.windows.net'

def getMethod(servicePath):
    headers = {'Content-type': 'application/json', 'api-key': apiKey}
    r = requests.get(getServiceUrl() + servicePath, headers=headers)
    #print(r, r.text)
    return r

def postMethod(servicePath, body):
    headers = {'Content-type': 'application/json', 'api-key': apiKey}
    r = requests.post(getServiceUrl() + servicePath, headers=headers, data=body)
    #print(r, r.text)
    return r
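
Because the helpers take a ready-made path, any query text ends up in the URL verbatim, and characters such as `&` or `#` in a question could change the request. A minimal sketch of building the path with the standard library's `urllib.parse.urlencode`, which percent-encodes each value; `buildSearchPath` and its parameter names are hypothetical, not part of the original scripts.

```python
from urllib.parse import quote, urlencode

def buildSearchPath(index_name, api_version, query, ntop=10, fields=None, full_syntax=False):
    """Build the '/indexes/<name>/docs?...' path with percent-encoded parameters."""
    params = [('api-version', api_version), ('search', query), ('$top', ntop)]
    if fields is not None:
        params.append(('searchFields', fields))
    if full_syntax:
        params.append(('queryType', 'full'))
    # safe='$' keeps the '$top' parameter name readable; quote_via=quote encodes spaces as %20
    query_string = urlencode(params, safe='$', quote_via=quote)
    return '/indexes/%s/docs?%s' % (quote(index_name, safe=''), query_string)

path = buildSearchPath('idx', '2016-09-01', 'R&D tax credit', ntop=5)
print(path)
# /indexes/idx/docs?api-version=2016-09-01&search=R%26D%20tax%20credit&$top=5
```

The resulting string can be passed to `getMethod` in place of a hand-formatted path.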

#### Helper functions to submit a search query interactively or in batch

In [17]:
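def submitQuery(query, fields=None, ntop=10, fuzzy=False):
    # Percent-encode the query so characters such as '&' or '#' cannot alter the URL
    servicePath = '/indexes/' + indexName + '/docs?api-version=%s&search=%s&$top=%d' % \
        (apiVersion, requests.utils.quote(query, safe=''), ntop)
    if fields is not None:
        servicePath += '&searchFields=%s' % fields
    if fuzzy:
        # Switches to the full Lucene syntax; fuzzy matching also needs '~' after each term
        servicePath += '&queryType=full'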
    
    # Submit GET request
    r = getMethod(servicePath)
    if r.status_code != 200:
        print('Failed to retrieve search results')
        print(r, r.text)
        return
    
    # Parse and report search results
    docs = json.loads(r.text)['value']
    print('Number of search results = %d\n' % len(docs))
    for i, doc in enumerate(docs):
        print('Results# %d' % (i+1))
        print('Chapter title   : %s' % doc['ChapterTitle'].encode('utf8'))
        print('Section title   : %s' % doc['SectionTitle'].encode('utf8'))
        print('Subsection title: %s' % doc['SubsectionTitle'].encode('utf8'))
        print('%s\n' % doc['SubsectionText'].encode('utf8'))
        
def submitBatchQuery(query, fields=None, ntop=10, fuzzy=False):
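    # Percent-encode the query so characters such as '&' or '#' cannot alter the URL
    servicePath = '/indexes/' + indexName + '/docs?api-version=%s&search=%s&$top=%d' % \
        (apiVersion, requests.utils.quote(query, safe=''), ntop)
    if fields is not None:
        servicePath += '&searchFields=%s' % fields
    if fuzzy:
        # Switches to the full Lucene syntax; fuzzy matching also needs '~' after each term
        servicePath += '&queryType=full'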

    # Submit GET request
    r = getMethod(servicePath)
    if r.status_code != 200:
        print('Failed to retrieve search results')
        print(query, r, r.text)
        return []    # empty result list, so callers can still use len() and iterate

    # Return search results
    docs = json.loads(r.text)['value']
    return docs
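
Note that `queryType=full` on its own only selects the full Lucene query syntax; in that syntax a term is matched fuzzily when it is followed by `~`. A minimal sketch of preparing such a query, assuming plain whitespace-separated questions without operators or quoted phrases; `toFuzzyQuery` is a hypothetical helper.

```python
def toFuzzyQuery(query):
    """Append '~' to every whitespace-separated term of a plain-text query."""
    return ' '.join(term + '~' for term in query.split())

print(toFuzzyQuery('tax bracket'))
# tax~ bracket~
```

The rewritten string would then be passed as `query` together with `fuzzy=True`.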

Let's submit a query/question and retrieve the answers.

In [12]:
query = 'what is the tax bracket for married couple filing separately'
if query:
    # Submit query to Azure Search and retrieve results
    searchFields = SEARCHFIELDS
    submitQuery(query, fields=searchFields, ntop=NTOP, fuzzy=FUZZY)

Number of search results = 5

Results# 1
Chapter title   : b'Income Taxes - NORMAL TAXES AND SURTAXES'
Section title   : b'Determination of Tax Liability - TAX ON INDIVIDUALS'
Subsection title: b'Tax imposed - Married individuals filing separate returns'
b'(d) Married individuals filing separate returns There is hereby imposed on the taxable income of every married individual (as defined in section 7703) who does not make a single return jointly with his spouse under section 6013, a tax determined in accordance with the following table: If taxable income is: The tax is: Not over $18,450 15% of taxable income. Over $18,450 but not over $44,575 $2,767.50, plus 28% of the excess over $18,450. Over $44,575 but not over $70,000 $10,082.50, plus 31% of the excess over $44,575. Over $70,000 but not over $125,000 $17,964.25, plus 36% of the excess over $70,000. Over $125,000 $37,764.25, plus 39.6% of the excess over $125,000.'

Results# 2
Chapter title   : b'Income Taxes - NORMAL TAXES AND SUR

Now let's submit a set of queries in batch and retrieve the ranked list of results for each one. This mode is useful for evaluating retrieval performance against a set of queries with ground-truth answers.

In [27]:
# Input file containing the list of queries [tab-separated .txt or .tsv, Excel .xls or .xlsx]
infile  = os.path.join(os.getcwd(), '../sample/sample_queries.txt')
outfile = os.path.join(os.getcwd(), '../sample/sample_query_answers.xlsx')

if infile.endswith('.tsv') or infile.endswith('.txt'):
    records = pd.read_csv(infile, sep='\t', header=0, encoding='utf-8')
    rows = records.iterrows()
elif infile.endswith('.xls') or infile.endswith('.xlsx'):
    records = pe.iget_records(file_name=infile)
    rows = enumerate(records)
else:
    # Stop here: without a supported file there are no rows to process
    raise ValueError('Unsupported query file extension. Options: tsv, txt, xls, xlsx')
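
The loader above reads a header row, and the batch loop accesses `row['Qid']` and `row['Query']`, so the query file needs at least those two columns. A small illustration of the tab-separated layout using only the standard library; the two questions are made-up examples, not the contents of `sample_queries.txt`.

```python
import csv
import io

# Made-up sample with the two required columns, separated by tabs
sample = "Qid\tQuery\n1\twhat is the standard deduction\n2\twho qualifies as a dependent\n"

queries = list(csv.DictReader(io.StringIO(sample), delimiter='\t'))
for row in queries:
    print(int(row['Qid']), row['Query'])
```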

In [28]:
# Collect one row per (query, ranked result); the DataFrame is built once at the end
# because DataFrame.append was removed in pandas 2.0
columns = ['Qid', 'Query', 'Rank', 'SubsectionText', 'ChapterTitle', 'SectionTitle', 'SubsectionTitle', 'Keywords']
results = []

for i, row in rows:
    qid   = int(row['Qid'])
    query = row['Query']
    # Submit query to Azure Search and retrieve results
    searchFields = SEARCHFIELDS
    docs = submitBatchQuery(query, fields=searchFields, ntop=NTOP, fuzzy=FUZZY)
    print('QID: %4d\tNumber of results: %d' % (qid, len(docs)))
    for rank, doc in enumerate(docs, start=1):
        results.append({'Qid'             : qid,
                        'Query'           : query,
                        'Rank'            : rank,
                        'SubsectionText'  : doc['SubsectionText'],
                        'ChapterTitle'    : doc['ChapterTitle'],
                        'SectionTitle'    : doc['SectionTitle'],
                        'SubsectionTitle' : doc['SubsectionTitle'],
                        'Keywords'        : doc['Keywords']})

df = pd.DataFrame(results, columns=columns)

# Save all answers
df['Qid']  = df['Qid'].astype(int)
df['Rank'] = df['Rank'].astype(int)

if outfile.endswith('.xls') or outfile.endswith('.xlsx'):
    df.to_excel(outfile, index=False)    # to_excel no longer accepts an encoding argument in pandas 2.x
else:    # default tab-separated file
    df.to_csv(outfile, sep='\t', index=False, encoding='utf-8') 
print('Search results saved in file %s' % os.path.basename(outfile))

QID:    1	Number of results: 5
QID:    2	Number of results: 5
QID:    3	Number of results: 5
QID:    4	Number of results: 5
QID:    5	Number of results: 5
QID:    6	Number of results: 5
QID:    7	Number of results: 5
Search results saved in file sample_query_answers.xlsx
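
For the performance evaluation mentioned earlier, one common summary of ranked results is the mean reciprocal rank (MRR): for each query, take 1/rank of the first correct answer (0 if none was returned) and average over all queries. A minimal sketch, assuming a hypothetical ground-truth mapping from `Qid` to the expected `SubsectionTitle`; neither the mapping nor the titles below come from the sample files.

```python
def meanReciprocalRank(results, truth):
    """results: iterable of (qid, rank, subsection_title); truth: {qid: expected title}."""
    best = {}
    for qid, rank, title in results:
        if truth.get(qid) == title:
            # Keep the best (smallest) rank at which the expected title appears
            best[qid] = min(rank, best.get(qid, rank))
    if not truth:
        return 0.0
    return sum(1.0 / best[q] for q in truth if q in best) / len(truth)

# Made-up ranked results and ground truth for three queries
results = [(1, 1, 'A'), (1, 2, 'B'), (2, 1, 'C'), (2, 2, 'D'), (3, 1, 'E')]
truth = {1: 'A', 2: 'D', 3: 'Z'}
mrr = meanReciprocalRank(results, truth)
print(mrr)  # (1 + 1/2 + 0) / 3 = 0.5
```

With the saved output, the `(qid, rank, title)` tuples could be taken from the `Qid`, `Rank`, and `SubsectionTitle` columns.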
