# Create a search index using the Azure Cognitive Search Python SDK

This Jupyter Notebook demonstrates index creation, data ingestion, and queries of an Azure Cognitive Search index by calling the Azure Cognitive Search Python SDK. This notebook is a companion document to this [Python quickstart](https://docs.microsoft.com/azure/search/search-get-started-python). 

As a first step, install the Azure Cognitive Search Python SDK if needed, then import the libraries used to create the index, upload documents, and run queries.

In [1]:
# If you have not already installed the Python SDK, execute:
# !pip install azure-search-documents --pre

# This sample uses version: 11.1.0b3
!pip show azure-search-documents

import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient 
from azure.search.documents import SearchClient
from azure.search.documents.indexes.models import (
    ComplexField,
    CorsOptions,
    SearchIndex,
    ScoringProfile,
    SearchFieldDataType,
    SimpleField,
    SearchableField
)

Name: azure-search-documents
Version: 11.1.0b3
Summary: Microsoft Azure Cognitive Search Client Library for Python
Home-page: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/search/azure-search-documents
Author: Microsoft Corporation
Author-email: ascl@microsoft.com
License: MIT License
Location: c:\users\liamca\appdata\local\continuum\anaconda3\envs\py37\lib\site-packages
Requires: typing-extensions, msrest, azure-core
Required-by: 


In the next cell, create the administrative client (SearchIndexClient) and the query client (SearchClient) used to make each request. Replace the service_name, admin_key, and query_key placeholders with valid values. If you get ConnectionError "Failed to establish a new connection", verify that service_name is correct and that the api-key is a primary or secondary admin key.

In [2]:
# Set the service name and API keys (replace the placeholder strings with your own values)

service_name = "<your-service-name>"  # do not include search.windows.net
admin_key = "<your-admin-api-key>"
query_key = "<your-query-api-key>"

index_name = "hotels-quickstart"

# Create the SDK clients: SearchIndexClient manages indexes, SearchClient works with one index
endpoint = "https://{}.search.windows.net/".format(service_name)
admin_client = SearchIndexClient(endpoint=endpoint,
                                 credential=AzureKeyCredential(admin_key))

search_client = SearchClient(endpoint=endpoint,
                             index_name=index_name,
                             credential=AzureKeyCredential(query_key))
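The service_name placeholder expects only the name of the service, and pasting the full URL is a common mistake. The sketch below shows one way to read the name from an environment variable and validate it before building the endpoint; the helper build_endpoint and the variable name AZURE_SEARCH_SERVICE_NAME are hypothetical, not part of the SDK.

```python
import os


def build_endpoint(service_name: str) -> str:
    """Build the service URL from a bare service name (hypothetical helper)."""
    name = service_name.strip()
    # Reject full URLs or host names such as "my-search.search.windows.net".
    if not name or "." in name or "/" in name or ":" in name:
        raise ValueError("Expected only the service name, for example 'my-search'")
    return "https://{}.search.windows.net/".format(name)


# Hypothetical environment variable name; falls back to a placeholder if unset.
example_name = os.environ.get("AZURE_SEARCH_SERVICE_NAME", "my-search")
print(build_endpoint(example_name))
```

You could then pass the result as the endpoint argument of SearchIndexClient and SearchClient.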


In the third cell, the index "hotels-quickstart" will be deleted if it previously existed.

In [3]:
# Delete the index if it exists
try:
    result = admin_client.delete_index(index_name)
    print('Index', index_name, 'Deleted')
except Exception as ex:
    print(ex)


Index hotels-quickstart Deleted


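Specify the index definition, including the fields that define each search document. Each field has a name, a type, and attributes that determine how you can use the field. For example, "searchable" enables full-text search on the field, "retrievable" means it can be returned in results, and "filterable" allows the field to be used in a filter expression. In the SDK, SearchableField creates a searchable field, SimpleField creates a non-searchable one, and the hidden parameter (False by default) controls whether a field is retrievable.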

In [4]:
# Specify the index schema
name = index_name
fields = [
        SimpleField(name="HotelId", type=SearchFieldDataType.String, key=True),
        SearchableField(name="HotelName", type=SearchFieldDataType.String, sortable=True),
        SearchableField(name="Description", type=SearchFieldDataType.String, analyzer_name="en.lucene"),
        SearchableField(name="Description_fr", type=SearchFieldDataType.String, analyzer_name="fr.lucene"),
        SearchableField(name="Category", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
    
        SearchableField(name="Tags", collection=True, type=SearchFieldDataType.String, facetable=True, filterable=True),

        SimpleField(name="ParkingIncluded", type=SearchFieldDataType.Boolean, facetable=True, filterable=True, sortable=True),
        SimpleField(name="LastRenovationDate", type=SearchFieldDataType.DateTimeOffset, facetable=True, filterable=True, sortable=True),
        SimpleField(name="Rating", type=SearchFieldDataType.Double, facetable=True, filterable=True, sortable=True),

        ComplexField(name="Address", fields=[
            SearchableField(name="StreetAddress", type=SearchFieldDataType.String),
            SearchableField(name="City", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
            SearchableField(name="StateProvince", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
            SearchableField(name="PostalCode", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
            SearchableField(name="Country", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
        ])
    ]
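cors_options = CorsOptions(allowed_origins=["*"], max_age_in_seconds=60)
scoring_profiles = []

# Define a suggester named "sg" for the autocomplete queries used later in this notebook
from azure.search.documents.indexes.models import SearchSuggester
suggester = [SearchSuggester(name="sg", source_fields=["Tags", "Address/City", "Address/Country"])]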
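To make the field attributes concrete, the sketch below writes out a few of these fields roughly as they would appear in a JSON index definition sent to the REST API. It is a hand-written approximation, not a payload captured from the SDK (which generates its own), and every attribute is spelled out rather than relying on the REST API's defaults. Collection fields such as Tags use the type Collection(Edm.String).

```python
import json

# Hand-written approximation of the REST JSON for four of the fields defined above.
fields_json = [
    {"name": "HotelId", "type": "Edm.String", "key": True, "searchable": False,
     "filterable": False, "sortable": False, "facetable": False},
    {"name": "HotelName", "type": "Edm.String", "searchable": True,
     "filterable": False, "sortable": True, "facetable": False},
    {"name": "Tags", "type": "Collection(Edm.String)", "searchable": True,
     "filterable": True, "sortable": False, "facetable": True},
    {"name": "Rating", "type": "Edm.Double", "searchable": False,
     "filterable": True, "sortable": True, "facetable": True},
]

# The attributes decide where each field may appear in a query.
filterable = [f["name"] for f in fields_json if f["filterable"]]
sortable = [f["name"] for f in fields_json if f["sortable"]]

print(json.dumps(fields_json[2], indent=2))
print("filterable:", filterable)  # ['Tags', 'Rating']
print("sortable:", sortable)      # ['HotelName', 'Rating']
```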


The following cell builds a SearchIndex object from the schema defined in the previous cell and calls create_index, which sends the request to the indexes collection of your search service and creates the index.

In [5]:
index = SearchIndex(
    name=name,
    fields=fields,
    scoring_profiles=scoring_profiles,
    suggesters=suggester,
    cors_options=cors_options)

try:
    result = admin_client.create_index(index)
    print('Index', result.name, 'created')
except Exception as ex:
    print(ex)

Index hotels-quickstart created


Next, provide four documents that conform to the index schema. Specify an upload action for each document.

In [6]:
documents = [
    {
    "@search.action": "upload",
    "HotelId": "1",
    "HotelName": "Secret Point Motel",
    "Description": "The hotel is ideally located on the main commercial artery of the city in the heart of New York. A few minutes away is Times Square and the historic centre of the city, as well as other places of interest that make New York one of America's most attractive and cosmopolitan cities.",
    "Description_fr": "L'hôtel est idéalement situé sur la principale artère commerciale de la ville en plein cœur de New York. A quelques minutes se trouve la place du temps et le centre historique de la ville, ainsi que d'autres lieux d'intérêt qui font de New York l'une des villes les plus attractives et cosmopolites de l'Amérique.",
    "Category": "Boutique",
    "Tags": [ "pool", "air conditioning", "concierge" ],
    "ParkingIncluded": False,
    "LastRenovationDate": "1970-01-18T00:00:00Z",
    "Rating": 3.60,
    "Address": {
        "StreetAddress": "677 5th Ave",
        "City": "New York",
        "StateProvince": "NY",
        "PostalCode": "10022",
        "Country": "USA"
        }
    },
    {
    "@search.action": "upload",
    "HotelId": "2",
    "HotelName": "Twin Dome Motel",
    "Description": "The hotel is situated in a nineteenth century plaza, which has been expanded and renovated to the highest architectural standards to create a modern, functional and first-class hotel in which art and unique historical elements coexist with the most modern comforts.",
    "Description_fr": "L'hôtel est situé dans une place du XIXe siècle, qui a été agrandie et rénovée aux plus hautes normes architecturales pour créer un hôtel moderne, fonctionnel et de première classe dans lequel l'art et les éléments historiques uniques coexistent avec le confort le plus moderne.",
    "Category": "Boutique",
    "Tags": [ "pool", "free wifi", "concierge" ],
    "ParkingIncluded": False,
    "LastRenovationDate": "1979-02-18T00:00:00Z",
    "Rating": 3.60,
    "Address": {
        "StreetAddress": "140 University Town Center Dr",
        "City": "Sarasota",
        "StateProvince": "FL",
        "PostalCode": "34243",
        "Country": "USA"
        }
    },
    {
    "@search.action": "upload",
    "HotelId": "3",
    "HotelName": "Triple Landscape Hotel",
    "Description": "The Hotel stands out for its gastronomic excellence under the management of William Dough, who advises on and oversees all of the Hotel's restaurant services.",
    "Description_fr": "L'hôtel se distingue par son excellence gastronomique sous la direction de William Dough, qui conseille et supervise tous les services de restauration de l'hôtel.",
    "Category": "Resort and Spa",
    "Tags": [ "air conditioning", "bar", "continental breakfast" ],
    "ParkingIncluded": True,
    "LastRenovationDate": "2015-09-20T00:00:00Z",
    "Rating": 4.80,
    "Address": {
        "StreetAddress": "3393 Peachtree Rd",
        "City": "Atlanta",
        "StateProvince": "GA",
        "PostalCode": "30326",
        "Country": "USA"
        }
    },
    {
    "@search.action": "upload",
    "HotelId": "4",
    "HotelName": "Sublime Cliff Hotel",
    "Description": "Sublime Cliff Hotel is located in the heart of the historic center of Sublime in an extremely vibrant and lively area within short walking distance to the sites and landmarks of the city and is surrounded by the extraordinary beauty of churches, buildings, shops and monuments. Sublime Cliff is part of a lovingly restored 1800 palace.",
    "Description_fr": "Le sublime Cliff Hotel est situé au coeur du centre historique de sublime dans un quartier extrêmement animé et vivant, à courte distance de marche des sites et monuments de la ville et est entouré par l'extraordinaire beauté des églises, des bâtiments, des commerces et Monuments. Sublime Cliff fait partie d'un Palace 1800 restauré avec amour.",
    "Category": "Boutique",
    "Tags": [ "concierge", "view", "24-hour front desk service" ],
    "ParkingIncluded": True,
    "LastRenovationDate": "1960-02-06T00:00:00Z",
    "Rating": 4.60,
    "Address": {
        "StreetAddress": "7400 San Pedro Ave",
        "City": "San Antonio",
        "StateProvince": "TX",
        "PostalCode": "78216",
        "Country": "USA"
        }
    }
]
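Each value must match its field's type in the schema: for example, ParkingIncluded is a Boolean field and Rating is an Edm.Double field. The sketch below is a hypothetical client-side check (the EXPECTED_TYPES mapping is written by hand and covers only a few fields) that flags values such as the string "false" where a Python bool is expected; Python True and False are serialized as JSON booleans.

```python
# Expected Python types for a few top-level fields of the hotels schema
# (a hand-written subset; extend it to cover the fields you care about).
EXPECTED_TYPES = {
    "HotelId": str,
    "ParkingIncluded": bool,
    "Rating": float,
    "Tags": list,
}


def find_type_mismatches(doc):
    """Return (field, value) pairs whose Python type differs from EXPECTED_TYPES."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field in doc and not isinstance(doc[field], expected):
            problems.append((field, doc[field]))
    return problems


sample = {"HotelId": "1", "ParkingIncluded": "false", "Rating": 3.6, "Tags": ["pool"]}
print(find_type_mismatches(sample))  # [('ParkingIncluded', 'false')]
```

In the notebook, calling find_type_mismatches(doc) for each entry in documents before uploading would surface such mismatches.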

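Formulate the request. This upload_documents request targets the docs collection of the hotels-quickstart index and pushes the documents provided in the previous step into the Cognitive Search index. Uploading documents requires an admin key (query keys grant read-only access), so this cell creates a SearchClient that authenticates with admin_key.

In [7]:
# Uploading documents requires an admin key; the query key only allows read operations
upload_client = SearchClient(endpoint=endpoint,
                             index_name=index_name,
                             credential=AzureKeyCredential(admin_key))
try:
    result = upload_client.upload_documents(documents=documents)
    print("Upload of new document succeeded: {}".format(result[0].succeeded))
except Exception as ex:
    print(ex)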

Upload of new document succeeded: True
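The cell above reports the status of result[0] only, that is, the first document. upload_documents returns one IndexingResult per document, so a fuller check looks at every entry. In the sketch below, FakeIndexingResult is a hypothetical stand-in that mirrors the key, succeeded, status_code, and error_message attributes of IndexingResult so that the example runs without a service; the simulated values are illustrative, not real service output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeIndexingResult:
    """Stand-in with the same attribute names as the SDK's IndexingResult."""
    key: str
    succeeded: bool
    status_code: int
    error_message: Optional[str] = None


def failed_keys(results: List[FakeIndexingResult]) -> List[str]:
    """Return the keys of documents that were not indexed."""
    return [r.key for r in results if not r.succeeded]


# Simulated results for illustration only.
simulated = [
    FakeIndexingResult("1", True, 201),
    FakeIndexingResult("2", False, 400, "(simulated error)"),
]
print("All succeeded:", all(r.succeeded for r in simulated))  # False
print("Failed keys:", failed_keys(simulated))                 # ['2']
```

With the real client, you would pass the list returned by upload_documents to failed_keys.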


You are now ready to run some queries. Since these are search queries, use search_client, which authenticates with the query key and therefore has no administrative capabilities.

The next cell executes an empty search (search_text="*"), which returns an unranked list (search score = 1.0) of arbitrary documents. By default, Azure Cognitive Search returns 50 matches at a time. As written, this query returns every field of each document, although the loop prints only HotelId and HotelName. Setting include_total_count=True also returns the count of all matching documents (4).

In [8]:
results = search_client.search(search_text="*", include_total_count=True)

print('Total Documents Matching Query:', results.get_count())
for result in results:
    print("{}: {}".format(result["HotelId"], result["HotelName"]))


Total Documents Matching Query: 4
3: Triple Landscape Hotel
2: Twin Dome Motel
4: Sublime Cliff Hotel
1: Secret Point Motel
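The SDK keyword arguments used in these queries correspond to query parameters of the REST API's Search Documents operation (for example, include_total_count corresponds to $count). The sketch below translates them into a query string; to_rest_query is a hypothetical helper, the mapping reflects our reading of the two APIs, and a real REST request would also need an api-version parameter and an api-key header.

```python
from urllib.parse import urlencode

# SDK keyword argument -> REST query parameter (assumed correspondence).
SDK_TO_REST = {
    "search_text": "search",
    "include_total_count": "$count",
    "select": "$select",
    "filter": "$filter",
    "order_by": "$orderby",
    "top": "$top",
    "skip": "$skip",
}


def to_rest_query(**kwargs):
    """Translate SDK-style arguments into a URL-encoded REST query string."""
    params = {}
    for name, value in kwargs.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # REST expects lowercase booleans
        params[SDK_TO_REST[name]] = value
    return urlencode(params)


print(to_rest_query(search_text="*", include_total_count=True))
```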


The next query adds whole terms to the search expression ("hotels" and "wifi") and uses select to return only a few fields in the results. Returning fewer fields minimizes the amount of data sent back to the client and can reduce search latency.

In [9]:
results = search_client.search(search_text="hotels wifi", include_total_count=True, select='HotelId,HotelName')

print('Total Documents Matching Query:', results.get_count())
for result in results:
    print("{}: {}".format(result["HotelId"], result["HotelName"]))


Total Documents Matching Query: 4
2: Twin Dome Motel
3: Triple Landscape Hotel
1: Secret Point Motel
4: Sublime Cliff Hotel


This query adds a filter expression, returning only those hotels with a rating greater than 4.

In [10]:
results = search_client.search(search_text="hotels wifi", select='HotelId,HotelName,Rating', filter='Rating gt 4')

for result in results:
    print("{}: {} - {} rating".format(result["HotelId"], result["HotelName"], result["Rating"]))

3: Triple Landscape Hotel - 4.8 rating
4: Sublime Cliff Hotel - 4.6 rating
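Filter expressions use OData syntax: comparison operators such as eq, ne, gt, lt, ge, and le, combined with and/or, with string literals in single quotes. The sketch below is a hypothetical helper, not part of the SDK, that builds a filter from (field, operator, value) triples and escapes embedded single quotes by doubling them. Its output could be passed as the filter argument of search_client.search.

```python
def odata_literal(value):
    """Format a Python value as an OData literal for a filter expression."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # OData string literals use single quotes; embedded quotes are doubled.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError("Unsupported type: {}".format(type(value).__name__))


def build_filter(*clauses):
    """Join (field, operator, value) triples with 'and' (hypothetical helper)."""
    return " and ".join(
        "{} {} {}".format(field, op, odata_literal(value)) for field, op, value in clauses
    )


print(build_filter(("Rating", "gt", 4), ("Address/City", "eq", "Sarasota")))
# Rating gt 4 and Address/City eq 'Sarasota'
```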


By default, the search engine returns the top 50 documents, but you can use top and skip to paginate the results and choose how many documents each page contains. This query returns two documents per page (top=2).

In [11]:
results = search_client.search(search_text="boutique", select='HotelId,HotelName', top=2)

for result in results:
    print("{}: {}".format(result["HotelId"], result["HotelName"]))

1: Secret Point Motel
2: Twin Dome Motel
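To page through results, increase skip by the page size on each request. The hypothetical helpers below compute skip and top for a 1-based page number, and the number of pages for a total count such as the one get_count() returns when include_total_count=True.

```python
import math


def page_params(page_number, page_size):
    """Return skip/top values for a 1-based page number (hypothetical helper)."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be positive")
    return {"skip": (page_number - 1) * page_size, "top": page_size}


def page_count(total, page_size):
    """Number of pages needed to show `total` results."""
    return math.ceil(total / page_size)


# With 4 matching documents and 2 per page, there are 2 pages.
for page in range(1, page_count(4, 2) + 1):
    print(page, page_params(page, 2))
```

For example, search_client.search(search_text="boutique", **page_params(2, 2)) would request the second page of two results.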


In this next example, use order_by to sort the results by city. The example also selects subfields of the Address complex field, which are referenced with a slash (for example, Address/City).

In [12]:
results = search_client.search(search_text="pool", select='HotelId,HotelName,Address/City,Address/StateProvince', order_by='Address/City')

for result in results:
    print("{}: {}, {}, {}".format(result["HotelId"], result["HotelName"], result["Address"]["City"], result["Address"]["StateProvince"]))

1: Secret Point Motel, New York, NY
2: Twin Dome Motel, Sarasota, FL
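Sort clauses default to ascending order; appending desc reverses a field, and several clauses can be combined, with later clauses breaking ties. The hypothetical helper below formats such clauses. The REST API's $orderby parameter takes them as a comma-separated list; check the SDK reference for the form that order_by accepts in your installed version.

```python
def order_clause(field, descending=False):
    """Format one sort clause: a field path followed by asc or desc."""
    return "{} {}".format(field, "desc" if descending else "asc")


# Hypothetical multi-key sort: highest rating first, ties broken by city name.
clauses = [order_clause("Rating", descending=True), order_clause("Address/City")]
print(",".join(clauses))  # Rating desc,Address/City asc
```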


In this example, look up a single document by its key field. Lookup by key is an efficient way to retrieve one document and is typically used when a user clicks a document in a search result.

In [13]:
hotel_id = "2"
result = search_client.get_document(key=hotel_id)

print("Details for hotel '{}' are:".format(hotel_id))
print("        Name: {}".format(result["HotelName"]))
print("      Rating: {}".format(result["Rating"]))
print("    Category: {}".format(result["Category"]))

Details for hotel '2' are:
        Name: Twin Dome Motel
      Rating: 3.6
    Category: Boutique
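get_document needs the exact key value, passed as a string. As we understand the documented rules, key values may contain only letters, digits, underscores, dashes, and equal signs, so IDs with other characters are often encoded first. The sketch below uses hypothetical helpers: is_valid_key checks a key, and to_safe_key applies URL-safe Base64 encoding, whose output alphabet stays within those characters.

```python
import base64
import re

# Characters allowed in a document key, per our reading of the service documentation.
KEY_PATTERN = re.compile(r"[A-Za-z0-9_\-=]+")


def is_valid_key(key):
    """Return True if every character of `key` is allowed in a document key."""
    return KEY_PATTERN.fullmatch(key) is not None


def to_safe_key(raw_id):
    """URL-safe Base64 encode an arbitrary ID (alphabet A-Z, a-z, 0-9, '-', '_', '=')."""
    return base64.urlsafe_b64encode(raw_id.encode("utf-8")).decode("ascii")


print(is_valid_key("2"))         # True
print(is_valid_key("hotels/2"))  # False
print(to_safe_key("hotels/2"))
```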


In this example, use the autocomplete function. Autocomplete is typically used in a search box to suggest completions as the user types.

When the index was created, a suggester named "sg" was defined, specifying which fields (Tags, Address/City, and Address/Country) are used to find matches for autocomplete requests. The query passes in the letters "sa" to represent what the user has typed so far, and mode='twoTerms' allows two-term completions such as "san antonio".

In [14]:
search_suggestion = 'sa'
results = search_client.autocomplete(search_text=search_suggestion, suggester_name="sg", mode='twoTerms')

print("Autocomplete for:", search_suggestion)
for result in results:
    print(result['text'])

Autocomplete for: sa
san antonio
sarasota
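To see why "sa" can yield "san antonio", the toy function below imitates two-term completion over the suggester's source values, copied by hand from the sample documents. It is only an illustration: the service builds its own prefix index and applies its own text analysis and ranking, so this is not how autocomplete is implemented.

```python
import re

# Values of the suggester's source fields (Tags, Address/City, Address/Country)
# from the four sample documents, copied by hand and de-duplicated.
SOURCE_VALUES = [
    "pool", "air conditioning", "concierge", "free wifi", "bar",
    "continental breakfast", "view", "24-hour front desk service",
    "New York", "Sarasota", "Atlanta", "San Antonio", "USA",
]


def toy_autocomplete(prefix, values, max_terms=2):
    """Toy prefix completion, not the service's algorithm: for every term that
    starts with `prefix`, return it joined with up to max_terms - 1 following terms."""
    prefix = prefix.lower()
    completions = set()
    for value in values:
        terms = re.findall(r"[\w-]+", value.lower())
        for i, term in enumerate(terms):
            if term.startswith(prefix):
                completions.add(" ".join(terms[i:i + max_terms]))
    return sorted(completions)


print(toy_autocomplete("sa", SOURCE_VALUES))  # ['san antonio', 'sarasota']
```

With this small data set the toy returns the same two completions as the service, but that agreement is specific to these sample values.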


If you are finished with this index, you can delete it by running the following cell. Deleting indexes you no longer need frees up space for stepping through more quickstarts and tutorials.

In [15]:
try:
    result = admin_client.delete_index(index_name)
    print('Index', index_name, 'Deleted')
except Exception as ex:
    print(ex)

Index hotels-quickstart Deleted


Confirm the deletion by running the following cell, which tries to retrieve the index from your search service. If the service reports that no index named hotels-quickstart was found, you've successfully deleted the index and completed this quickstart.

In [16]:
try:
    result = admin_client.get_index(index_name)
    print(result)
except Exception as ex:
    print(ex)


() No index with the name 'hotels-quickstart' was found in the service 'liamca-ignite'.
