# Retail Demo Store - Neptune Workshop - Lab 1
Welcome to the Retail Demo Store Neptune Workshop. In this module we will be using [Amazon Rekognition](https://aws.amazon.com/rekognition/) to enrich product data and use that data to create a graph database with [Amazon Neptune](https://aws.amazon.com/neptune/). Over these progressive labs, we will show how a graph database can be integrated with [Amazon OpenSearch Service](https://aws.amazon.com/opensearch-service/) to enhance user experience.
You should complete the labs in the order specified below.

## Workshop overview

### Prerequisites
This workshop requires that you have set up OpenSearch with the Retail Demo Store. You can do so by completing the OpenSearch lab (0-StartHere), or by building the index automatically as part of deployment (only applicable when creating the CloudFormation stack).

### Labs
- **Lab 1**: Introduction and data preparation (this lab) (_40 minutes_)
- **Lab 2**: Ingest data into Neptune and construct graph (_25 minutes_)
- **Lab 3**: Integrate Neptune with OpenSearch (_20 minutes_)

### Cleanup
The cleanup lab will tear down all of the Neptune resources created by the labs in this workshop.
- **Lab 4**: Cleanup resources (_15 minutes_)

## Setup
This workshop will be using the Python programming language and the AWS SDK for Python. Even if you are not fluent in Python, the code cells should be reasonably intuitive. In practice, you can use any programming language supported by the AWS SDK to complete the same steps from this workshop in your application environment.

### Update Dependencies
To get started, we need to perform a bit of setup. First, we need to ensure that a current version of botocore is installed. The botocore library is used by boto3, the AWS SDK for Python.

The following cell will update pip and install the latest botocore library.

In [None]:
import sys
!{sys.executable} -m pip install --upgrade pip
!{sys.executable} -m pip install --upgrade --no-deps --force-reinstall botocore

### Import Dependencies
Next we need to import some dependencies and libraries needed to complete this lab.

In [None]:
import boto3
import pandas as pd
import requests
import json
import re

### Create Clients
Next we need to create the AWS service clients needed for this workshop.
- **dynamodb**: This resource is used to get all the product information from the pre-existing DynamoDB table.
- **s3**: This client is used to fetch product images from S3 in order to pass them into Rekognition.
- **rekognition**: This client is used to generate labels from product images to enrich our product data.
- **ssm**: This client is used to access application configuration details stored in the Systems Manager parameter store.
- **servicediscovery**: This client is used to look up the local IP addresses of the Retail Demo Store microservices that we'll need in the workshop.
- **sagemaker**: This client is used to obtain the Uid of the notebook.

Finally, we'll look up an identifier that was stored as a resource tag on the SageMaker notebook instance at deployment time. We need this tag to look up resources used throughout the labs.

In [None]:
dynamodb = boto3.resource('dynamodb')
s3 = boto3.client('s3')
rekognition = boto3.client('rekognition')
ssm = boto3.client('ssm')
servicediscovery = boto3.client('servicediscovery')
sagemaker = boto3.client('sagemaker')

with open('/opt/ml/metadata/resource-metadata.json') as f:
    data = json.load(f)
    resource_arn = data["ResourceArn"]

response = sagemaker.list_tags(ResourceArn=resource_arn)
# Initialize Uid so the assertion below fails cleanly if the tag is missing
Uid = None
for tag in response['Tags']:
    if tag['Key'] == 'Uid':
        Uid = tag['Value']
assert Uid is not None, 'Uid could not be determined from notebook instance tags'
print('Uid:', Uid)
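
The tag lookup above can also be written as a small helper that returns `None` when the tag is absent. This is only a sketch: the function name `find_tag_value` and the sample tag values are hypothetical, and the sample list merely imitates the shape of the `Tags` list returned by `list_tags`.

```python
from typing import Optional


def find_tag_value(tags: list, key: str) -> Optional[str]:
    """Return the value of the first tag whose Key matches, or None if absent."""
    for tag in tags:
        if tag.get('Key') == key:
            return tag.get('Value')
    return None


# Hypothetical sample shaped like the 'Tags' list in a list_tags response
sample_tags = [
    {'Key': 'Project', 'Value': 'demo'},
    {'Key': 'Uid', 'Value': 'example-uid'},
]

uid = find_tag_value(sample_tags, 'Uid')
print('Uid:', uid)
```

Returning `None` for a missing tag keeps the subsequent `assert` meaningful, because the name is always defined before it is checked.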

### Lookup DynamoDB table
When the Retail Demo Store stack was deployed in this account, a DynamoDB products table was created for you. The name of that table was stored in Systems Manager Parameter Store. Using the `ssm` boto3 client we created above, we can retrieve the table name.

In [None]:
table_name = ssm.get_parameter(
    Name='retaildemostore-stack-products-table'
)
ddb_table = dynamodb.Table(table_name['Parameter']['Value'])

## Enrich Product Metadata with Rekognition
To improve the relationships stored by Neptune, the existing product data needs to be enriched, as the current product descriptions don't give us a huge amount of information. We use Rekognition for this purpose.

Rekognition uses computer vision capabilities to extract information and insights from your images and videos. As all the products in the Retail Demo Store have an image associated with them, we can use Rekognition to detect labels which give us insight into the product.

### Get S3 Objects
Before we can generate our image labels, we need to look up the S3 bucket that stores the images and retrieve all the product information from the products service.

In [None]:
# Get the S3 bucket from ssm
bucketresponse = ssm.get_parameter(
    Name='retaildemostore-stack-web-ui-bucket'
)
bucket = bucketresponse['Parameter']['Value']

response = servicediscovery.discover_instances(
    NamespaceName='retaildemostore.local',
    ServiceName='products',
    MaxResults=1,
    HealthStatus='HEALTHY'
)

assert len(response['Instances']) > 0, 'Products service instance not found; check ECS to ensure it launched cleanly'

products_service_instance = response['Instances'][0]['Attributes']['AWS_INSTANCE_IPV4']
print('Products Service Instance IP: {}'.format(products_service_instance))

response = requests.get('http://{}/products/all'.format(products_service_instance))
products = response.json()
products_df = pd.DataFrame(products)
pd.set_option('display.max_rows', 5)

products_df

Once we have all the variables needed, we can run the `detect_labels` operation for each product image and update the DynamoDB record with the new labels. The process will take approximately 20 minutes to complete.

In [None]:
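for product in products_df.itertuples():

    # Construct the S3 object key by stripping the CloudFront URL prefix
    product_image_name = re.sub(r'^https?://.*\.cloudfront\.net/', '', product.image)

    # Fetch the object; this raises an error if it cannot be retrieved from the bucket
    obj = s3.get_object(Bucket=bucket, Key=product_image_name)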

    # Get image labels
    response = rekognition.detect_labels(Image={'S3Object': {'Bucket': bucket, 'Name': product_image_name}},
                                         MaxLabels=10)

    # Format response so we only have the keys we need
    label_list = []
    for label in response['Labels']:
        label_list.append({
            'Name': label['Name'],
            'Confidence': label['Confidence']
        })

    # Update in DynamoDB
    ddb_table.update_item(
        Key={'id': product.id},
        UpdateExpression="set image_labels = :image_labels",
        ExpressionAttributeValues={
            ':image_labels': json.dumps(label_list),
        },
    )
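
The two data-shaping steps in the loop above (deriving the S3 key from the image URL and trimming the Rekognition response) can be tried in isolation, without calling AWS. The sketch below is an illustration under assumptions: the helper names, the sample CloudFront URL, and the sample response are hypothetical, and the sample only mimics the `Labels` structure that `detect_labels` returns.

```python
import json
import re

# Matches an http(s) CloudFront domain prefix, including the trailing slash
CLOUDFRONT_PREFIX = re.compile(r'^https?://[^/]+\.cloudfront\.net/')


def s3_key_from_image_url(image_url: str) -> str:
    """Strip the CloudFront prefix so only the S3 object key remains."""
    return CLOUDFRONT_PREFIX.sub('', image_url)


def simplify_labels(response: dict) -> list:
    """Keep only the Name and Confidence of each label in a detect_labels-style response."""
    return [
        {'Name': label['Name'], 'Confidence': label['Confidence']}
        for label in response.get('Labels', [])
    ]


# Hypothetical inputs for illustration only
sample_url = 'https://d111111abcdef8.cloudfront.net/images/accessories/1.jpg'
sample_response = {
    'Labels': [
        {'Name': 'Accessories', 'Confidence': 99.9, 'Instances': [], 'Parents': []},
        {'Name': 'Glasses', 'Confidence': 99.8, 'Instances': [], 'Parents': [{'Name': 'Accessories'}]},
    ]
}

key = s3_key_from_image_url(sample_url)
labels_json = json.dumps(simplify_labels(sample_response))
print(key)
print(labels_json)
```

Serializing the label list with `json.dumps` means the labels are stored as a single string attribute, so they need to be parsed with `json.loads` when read back.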

### Test it
Once the labels have finished generating and being inserted into DynamoDB, we can view a few of the results.

Let's start by picking a product row and viewing the image.

In [None]:
from IPython.display import Image

test_product = products_df.iloc[10]

display(Image(url=test_product.image))

We can then query DynamoDB to get the image labels of our product.

In [None]:
response = ddb_table.get_item(Key={'id': test_product.id})

json.loads(response['Item']['image_labels'])

Compare each label with its confidence score and see how well it matches the image. For the product at index 10, Rekognition has 100% confidence that the image is both an accessory and a pair of glasses.

Try changing the test product and see how results change.
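
When comparing labels across products, it can help to keep only the high-confidence labels and sort them. The following is a minimal sketch, assuming the labels are stored as a JSON string of `Name`/`Confidence` pairs as in this lab. The function name `confident_labels`, the threshold of 90, and the sample values are hypothetical choices for illustration.

```python
import json


def confident_labels(image_labels_json: str, min_confidence: float = 90.0) -> list:
    """Parse a stored JSON label string and return labels at or above the threshold,
    sorted from most to least confident."""
    labels = json.loads(image_labels_json)
    kept = [label for label in labels if label['Confidence'] >= min_confidence]
    return sorted(kept, key=lambda label: label['Confidence'], reverse=True)


# Hypothetical stored value, shaped like the image_labels attribute written in this lab
stored = json.dumps([
    {'Name': 'Glasses', 'Confidence': 99.8},
    {'Name': 'Sunglasses', 'Confidence': 72.4},
    {'Name': 'Accessories', 'Confidence': 99.9},
    {'Name': 'Goggles', 'Confidence': 55.0},
])

for label in confident_labels(stored):
    print(label['Name'], label['Confidence'])
```

In the notebook, the same function could be applied to `response['Item']['image_labels']` from the `get_item` call above.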


## Lab 1 Summary
In this lab, you have used Rekognition to generate image labels for each product in the Retail Demo Store. In the next lab, you will ingest this new product data into Neptune and construct a graph from it.

### Store Variables Needed in the Next Lab
We will pass some variables initialized in this lab to the next lab by storing them in the notebook environment.

In [None]:
%store table_name
%store bucket
%store Uid


### Continue to Lab 2
Open [Lab 2](./Lab-2-Construct-Graph-with-Neptune.ipynb) to continue the workshop.