# Retail Demo Store - Search Workshop

Welcome to the Retail Demo Store Search Workshop. In this module we'll be configuring the Retail Demo Store Search service to allow searching for product data via Elasticsearch. An Elasticsearch domain should already be provisioned for you in your AWS environment.

Recommended Time: 20 Minutes

## Setup

To get started, we need to perform a bit of setup. Walk through each of the following steps to configure your environment to interact with the Amazon Elasticsearch Service.

### Import Dependencies and Setup Boto3 Python Clients

Throughout this workshop we will need access to some common libraries and clients for connecting to AWS services.

In [None]:
# Import Dependencies

import boto3
import json
import pandas as pd
import time
import requests
import csv

from random import randint
from botocore.exceptions import ClientError

# Setup Clients

servicediscovery = boto3.client('servicediscovery')
ssm = boto3.client('ssm')
es_service = boto3.client('es')

## Create Index and Bulk Index Product Data

### Get Products Service Instance

We will create a new Elasticsearch index and index our product data so that our users can search for products. To do this, we first pull our product data from the [Products Service](https://github.com/aws-samples/retail-demo-store/tree/master/src/products) that is deployed as part of the Retail Demo Store. To connect to the Products Service, we use Service Discovery to discover an instance of the service and then connect directly to that instance to access our data.

In [None]:
response = servicediscovery.discover_instances(
    NamespaceName='retaildemostore.local',
    ServiceName='products',
    MaxResults=1,
    HealthStatus='HEALTHY'
)

products_service_instance = response['Instances'][0]['Attributes']['AWS_INSTANCE_IPV4']
print('Service Instance IP: {}'.format(products_service_instance))
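
The lookup above assumes that at least one healthy instance is registered; if none is, indexing into `response['Instances'][0]` fails with an `IndexError`. A minimal sketch of a more defensive lookup follows. The helper name `first_instance_ip` is hypothetical (not part of the workshop code), and the sample response is hand-written to mirror the shape returned by `discover_instances`.

```python
def first_instance_ip(response):
    """Return the IPv4 address of the first discovered instance.

    Hypothetical helper for illustration; expects the dictionary
    returned by servicediscovery.discover_instances().
    """
    instances = response.get('Instances', [])
    if not instances:
        raise RuntimeError('No healthy service instances were discovered.')
    return instances[0]['Attributes']['AWS_INSTANCE_IPV4']


# Hand-written response with the same shape as the real one (assumed IP)
sample_response = {
    'Instances': [{'Attributes': {'AWS_INSTANCE_IPV4': '10.0.0.12'}}]
}
print(first_instance_ip(sample_response))
```

In the notebook, the same helper could be called with the real `response` from the cell above.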

#### Download and Explore the Products Dataset

Now that we have the IP address of one of our Products Service instances, we can connect to it and download our product catalog. To more easily explore our data, we will convert the JSON response from the Products Service into a dataframe and print it as a table.

In [None]:
response = requests.get('http://{}/products/all'.format(products_service_instance))
products = response.json()
products_df = pd.DataFrame(products)
pd.set_option('display.max_rows', 5)

products_df

### Install Elasticsearch Python Library

We will use the Python Elasticsearch library to connect to our Amazon Elasticsearch cluster, create a new index, and then bulk index our product data. First, we need to install the Elasticsearch library into our environment.

In [None]:
!pip install --upgrade pip
!pip install elasticsearch

### Find Elasticsearch Domain Endpoint

Before we can configure the Elasticsearch client, we need to determine the endpoint for the Elasticsearch domain created in your AWS environment. We will accomplish this by looking for the Elasticsearch domain with tag key of `Name` and tag value of `retaildemostore`. This tag was associated with the Amazon Elasticsearch domain that was created when the project was deployed to your AWS account.

In [None]:
elasticsearch_domain_endpoint = None

domains_response = es_service.list_domain_names()

for domain_name in domains_response['DomainNames']:
    describe_response = es_service.describe_elasticsearch_domain(
        DomainName=domain_name['DomainName']
    )
    
    tags_response = es_service.list_tags(ARN=describe_response['DomainStatus']['ARN'])

    domain_match = False
    for tag in tags_response['TagList']:
        if tag['Key'] == 'Name' and tag['Value'] == 'retaildemostore':
            domain_match = True
            break
            
    if domain_match:
        elasticsearch_domain_endpoint = describe_response['DomainStatus']['Endpoints']['vpc']
        break

print('Elasticsearch domain endpoint: ' + str(elasticsearch_domain_endpoint))

if not elasticsearch_domain_endpoint:
    raise Exception('Elasticsearch domain endpoint could not be determined. Ensure Elasticsearch domain has been successfully created and has "retaildemostore" tag before continuing.')

### Configure and Create Elasticsearch Client

In [None]:
from elasticsearch import Elasticsearch

ES_HOST = {
    'host' : elasticsearch_domain_endpoint,
    'port' : 443,
    'scheme' : 'https',
}
INDEX_NAME = 'products'
TYPE_NAME = 'product'
ID_FIELD = 'id'

es = Elasticsearch(hosts = [ES_HOST])

### Prepare Product Data for Indexing

Batch products into chunks to avoid timeouts.

In [None]:
bulk_datas = [] 
bulk_data = []

bulk_datas.append(bulk_data)

max_data_len = 100

for product in products:
    data_dict = product

    op_dict = {
        "index": {
            "_index": INDEX_NAME, 
            "_type": TYPE_NAME, 
            "_id": data_dict[ID_FIELD]
        }
    }
    bulk_data.append(op_dict)
    bulk_data.append(data_dict)
    
    if len(bulk_data) >= max_data_len:
        bulk_data = []
        bulk_datas.append(bulk_data)
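
The batching logic can also be written as a generator, which avoids tracking a "current" list by hand and never yields an empty batch. This is a sketch under assumptions, not the workshop's own code: the function name `chunk_bulk_actions` and its `docs_per_batch` parameter are hypothetical, and the sample products are made up for illustration.

```python
def chunk_bulk_actions(products, index_name, type_name, id_field='id', docs_per_batch=50):
    """Yield lists of alternating action/document entries for the bulk API.

    Hypothetical helper: each batch holds at most docs_per_batch documents,
    which means at most 2 * docs_per_batch list entries.
    """
    batch = []
    for product in products:
        # Action line describing where the following document goes
        batch.append({
            'index': {
                '_index': index_name,
                '_type': type_name,
                '_id': product[id_field]
            }
        })
        # The document itself
        batch.append(product)
        if len(batch) >= docs_per_batch * 2:
            yield batch
            batch = []
    # Emit the final partial batch only if it has content
    if batch:
        yield batch


# Made-up products purely for illustration
sample_products = [{'id': str(i), 'name': 'item {}'.format(i)} for i in range(5)]
batches = list(chunk_bulk_actions(sample_products, 'products', 'product', docs_per_batch=2))
print([len(b) for b in batches])
```

With 50 documents per batch, each yielded list has up to 100 entries, matching `max_data_len = 100` in the cell above.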

### Check for and Delete Existing Indexes

If the products index already exists, we'll delete it so everything gets rebuilt from scratch.

In [None]:
if es.indices.exists(index = INDEX_NAME):
    print("Deleting '%s' index..." % (INDEX_NAME))
    res = es.indices.delete(index = INDEX_NAME)
    print(" response: '%s'" % (res))
else:
    print('Index does not exist. Nothing to do.')

### Create Index

In [None]:
request_body = {
    "settings" : {
        "number_of_shards": 1,
        "number_of_replicas": 0
    }
}
print("Creating '%s' index..." % (INDEX_NAME))
res = es.indices.create(index = INDEX_NAME, body = request_body)
print(" response: '%s'" % (res))

### Perform Bulk Indexing

In [None]:
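print("Bulk indexing...")
for bulk_data in bulk_datas:
    # The last batch can be empty when the product count is a multiple of
    # the batch size; skip it rather than send an empty bulk request
    if not bulk_data:
        continue
    res = es.bulk(index = INDEX_NAME, body = bulk_data, refresh = True)
    if res['errors']:
        print(" some items failed to index; inspect res['items']")
    
print("Done")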
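
A bulk call reports per-document failures inside its response body (the top-level `errors` flag and the `items` list) rather than failing the whole request, so it is worth inspecting the response. Below is a minimal sketch of how the failures could be collected. The helper name `failed_bulk_items` is hypothetical, and the sample response is hand-written to mirror the bulk response shape.

```python
def failed_bulk_items(bulk_response):
    """Return (document id, error) pairs for items that failed in a bulk call.

    Hypothetical helper for illustration; expects the response of es.bulk().
    """
    failures = []
    if not bulk_response['errors']:
        return failures
    for item in bulk_response['items']:
        # Each item is keyed by its operation type, e.g. 'index'
        for result in item.values():
            if 'error' in result:
                failures.append((result.get('_id'), result['error']))
    return failures


# Hand-written response: one success, one failure (values are made up)
sample_bulk_response = {
    'took': 5,
    'errors': True,
    'items': [
        {'index': {'_id': '1', 'status': 201}},
        {'index': {'_id': '2', 'status': 400,
                   'error': {'type': 'mapper_parsing_exception',
                             'reason': 'failed to parse'}}},
    ],
}
for doc_id, error in failed_bulk_items(sample_bulk_response):
    print(doc_id, error['type'])
```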

### Validate Results Through Elasticsearch

To verify that the products have been successfully indexed, let's perform a wildcard search for `brush*` directly against the Elasticsearch index.

In [None]:
res = es.search(index = INDEX_NAME, body={"query": {"wildcard": { "name": "brush*"}}})
print(json.dumps(res, indent=2))
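
The full JSON response is verbose. To compare results later, it can help to reduce it to the matching document IDs, which live under `hits.hits[]._id` in the search response. A small sketch, assuming a hypothetical helper named `hit_ids` and a hand-written sample response with made-up IDs:

```python
def hit_ids(search_response):
    """Return the document IDs of all hits in a search response.

    Hypothetical helper for illustration; expects the response of es.search().
    """
    return [hit['_id'] for hit in search_response['hits']['hits']]


# Hand-written response of the same shape (IDs and names are made up)
sample_search_response = {
    'hits': {
        'total': {'value': 2, 'relation': 'eq'},
        'hits': [
            {'_index': 'products', '_id': '11', '_source': {'name': 'Brush A'}},
            {'_index': 'products', '_id': '42', '_source': {'name': 'Brush B'}},
        ],
    }
}
print(hit_ids(sample_search_response))
```

In the notebook, `hit_ids(res)` would return the IDs for the `brush*` query above.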

## Validate Results Through Search Service

Finally, let's verify that the Retail Demo Store's [Search service](https://github.com/aws-samples/retail-demo-store/tree/master/src/search) can successfully query from the Elasticsearch index as well.

### Discover Search Service

First we need to get the address to the [Search service](https://github.com/aws-samples/retail-demo-store/tree/master/src/search).

In [None]:
response = servicediscovery.discover_instances(
    NamespaceName='retaildemostore.local',
    ServiceName='search',
    MaxResults=1,
    HealthStatus='HEALTHY'
)

search_service_instance = response['Instances'][0]['Attributes']['AWS_INSTANCE_IPV4']
print('Service Instance IP: {}'.format(search_service_instance))

### Call Search Service

Let's call the service's index page, which simply echoes the service name.

In [None]:
!curl {search_service_instance}

Finally, let's do the same `brush` search through the Search service. We should get back the same item IDs as the direct Elasticsearch query above.

In [None]:
!curl {search_service_instance}/search/products?searchTerm='brush'
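
The `curl` call works for a single word, but search terms containing spaces or special characters need to be URL-encoded first. A minimal sketch of building the request URL in Python with the standard library follows. The helper name `build_search_url` and the sample IP address are hypothetical; the path and `searchTerm` parameter are taken from the `curl` command above, and it is an assumption that the Search service decodes `+` as a space.

```python
from urllib.parse import urlencode


def build_search_url(host, search_term):
    """Build a Search service URL with a safely encoded search term.

    Hypothetical helper for illustration only.
    """
    query = urlencode({'searchTerm': search_term})
    return 'http://{}/search/products?{}'.format(host, query)


# Sample host is made up; in the notebook, pass search_service_instance
print(build_search_url('10.0.0.1', 'beard brush'))
```

The resulting URL could then be passed to `requests.get`, as was done for the Products Service earlier.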

## Workshop Complete

**Congratulations!** You have completed the first Retail Demo Store workshop where we indexed the products from the Retail Demo Store's Products microservice in an Elasticsearch domain index. This domain is used by the Retail Demo Store's Search microservice to process search queries from the Web user interface. To see this in action, open the Retail Demo Store's web UI in a new browser tab/window and enter a value in the search field at the top of the page.

### Next Step

Move on to the **[1-Personalization](../1-Personalization/1.1-Personalize.ipynb)** workshop where we will learn how to train machine learning models using Amazon Personalize to produce personalized product recommendations to users and add the ability to provide personalized reranking of products.