# Retail Demo Store - Neptune Workshop - Lab 1
Welcome to the Retail Demo Store Neptune Workshop. In this module, we will use [Amazon Rekognition](https://aws.amazon.com/rekognition/) to enrich product data, then use that data to build a graph database with [Amazon Neptune](https://aws.amazon.com/neptune/). Over the course of these labs, we will show how a graph database can be integrated with [Amazon Personalize](https://aws.amazon.com/personalize/) and [Amazon OpenSearch Service](https://aws.amazon.com/opensearch-service/) to enhance user recommendations.
You should complete the labs in the order specified below.

## Workshop overview

### Prerequisites
This workshop requires that you have set up Personalize with the Retail Demo Store. You can implement Personalize by completing the Personalization labs, or by building the campaigns automatically post-deployment (only applicable when creating the CloudFormation stack).

### Labs
- **Lab 1**: Introduction and data preparation (this lab) (_@todo Add time estimate_)
- **Lab 2**: Ingest data into Neptune and construct graph (_@todo Add time estimate_)
- **Lab 3**: Re-train Personalize with the data from Rekognition (_@todo Add time estimate_)
- **Lab 4**: Integrate Neptune with OpenSearch (_@todo Add time estimate_)

### Cleanup
The cleanup lab will tear down all of the Neptune resources created by the labs in this workshop.
- **Lab 6**: Cleanup resources (_@todo Add time estimate_)

## Setup
This workshop will be using the Python programming language and the AWS SDK for Python. Even if you are not fluent in Python, the code cells should be reasonably intuitive. In practice, you can use any programming language supported by the AWS SDK to complete the same steps from this workshop in your application environment.

### Update Dependencies
To get started, we need to perform a bit of setup. First, we need to ensure that a current version of botocore is installed. The botocore library is used by boto3, the AWS SDK library for Python.

The following cell will update pip and install the latest botocore library.

In [11]:
import sys
!{sys.executable} -m pip install --upgrade pip
!{sys.executable} -m pip install --upgrade --no-deps --force-reinstall botocore

Collecting botocore
  Using cached botocore-1.29.114-py3-none-any.whl (10.6 MB)
Installing collected packages: botocore
  Attempting uninstall: botocore
    Found existing installation: botocore 1.29.114
    Uninstalling botocore-1.29.114:
      Successfully uninstalled botocore-1.29.114
Successfully installed botocore-1.29.114


### Import Dependencies
Next we need to import some dependencies and libraries needed to complete this lab.

In [12]:
import boto3
import notebook_util
import pandas as pd
import requests


### Create Clients
Next we need to create the AWS service clients needed for this workshop.
- **dynamodb**: This resource is used to get all the product information from the pre-existing DynamoDB table.
- **s3**: This client is used to fetch product images from S3 in order to pass them into Rekognition.
- **rekognition**: This client is used to generate labels from product images to enrich our product data.
- **ssm**: This client is used to access application configuration details stored in the Systems Manager parameter store.
- **servicediscovery**: This client is used to look up the local IP addresses of the Retail Demo Store microservices that we'll need in the workshop.

Finally, we'll look up an identifier that was stored as a resource tag on the SageMaker instance at deployment time. We need this tag to look up resources throughout the labs.

In [None]:
dynamodb = boto3.resource('dynamodb')
s3 = boto3.client('s3')
rekognition = boto3.client('rekognition')
ssm = boto3.client('ssm')
servicediscovery = boto3.client('servicediscovery')

Uid = notebook_util.lookup_uid()
assert Uid is not None, 'Uid could not be determined from notebook instance tags'
print('Uid:', Uid)

### Lookup DynamoDB table
When the Retail Demo Store stack was deployed in this account, a DynamoDB products table was created for you. The ARN of that table was stored in Systems Manager Parameter Store. Using the `ssm` boto3 client we created above, we can retrieve the table ARN.

In [None]:
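# get_parameter returns a nested dict; the ARN is under Parameter -> Value
table_arn = ssm.get_parameter(
    Name='retaildemostore-stack-table'
)['Parameter']['Value']
# The Table resource is addressed by name, which is the last segment of the ARN
table_name = table_arn.split('/')[-1]
ddb_table = dynamodb.Table(table_name)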
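
The response from `get_parameter` is a nested dictionary, and the `Table` resource is addressed by table name. The following is a minimal sketch of unpacking both, assuming the parameter holds an ARN of the usual `table/<name>` form. The helper names, account ID, and table name are hypothetical and used only for illustration.

```python
def parameter_value(response):
    """Return the string value from an SSM get_parameter response."""
    return response['Parameter']['Value']


def table_name_from_arn(table_arn):
    """Return the table name from arn:aws:dynamodb:<region>:<account>:table/<name>."""
    # Everything after the first ':table/' is the table name.
    return table_arn.split(':table/', 1)[1]


# Illustrative response shape with a made-up account ID and table name.
sample_response = {
    'Parameter': {
        'Name': 'retaildemostore-stack-table',
        'Value': 'arn:aws:dynamodb:us-east-1:123456789012:table/example-products',
    }
}

sample_arn = parameter_value(sample_response)
sample_table_name = table_name_from_arn(sample_arn)
print(sample_table_name)
```

The sample values stand in for a live call, so the cell runs without AWS credentials.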

## Enrich Product Metadata with Rekognition
To improve the relationships stored by Neptune, the existing product data needs to be enriched, because the current product descriptions don't give us much information. We will use Rekognition for this purpose.

Rekognition uses computer vision to extract information from your images and videos. As all the products in the Retail Demo Store have an image associated with them, we can use Rekognition to detect labels that give us insight into each product.

### Get S3 Objects
Before we can generate our image labels, we need to get the S3 bucket storing the images and perform a scan operation to get all the product information.

In [None]:
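# Get the S3 bucket name from SSM (the value is nested under Parameter -> Value)
bucket = ssm.get_parameter(
    Name='retaildemostore-stack-bucket'
)['Parameter']['Value']
# Note: a single scan call returns at most 1 MB of data
products = ddb_table.scan()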
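
A single `scan` call returns at most 1 MB of data and includes a `LastEvaluatedKey` when more items remain. If the products table could exceed that limit, one option is to keep scanning until the key is absent. The sketch below shows that loop. `scan_all_items` and `FakeTable` are hypothetical names, and the fake table only stands in for a real `Table` resource so the loop can run without AWS access.

```python
def scan_all_items(table, **scan_kwargs):
    """Collect items from every page of a DynamoDB scan."""
    items = []
    while True:
        page = table.scan(**scan_kwargs)
        items.extend(page.get('Items', []))
        last_key = page.get('LastEvaluatedKey')
        if last_key is None:
            # No LastEvaluatedKey means this was the final page.
            return items
        # Resume the next scan where the previous page stopped.
        scan_kwargs['ExclusiveStartKey'] = last_key


class FakeTable:
    """Hypothetical stand-in for a Table resource that serves two pages."""

    def __init__(self):
        self._pages = {
            None: {'Items': [{'id': '1'}, {'id': '2'}],
                   'LastEvaluatedKey': {'id': '2'}},
            '2': {'Items': [{'id': '3'}]},
        }

    def scan(self, ExclusiveStartKey=None, **kwargs):
        start = ExclusiveStartKey['id'] if ExclusiveStartKey else None
        return self._pages[start]


all_items = scan_all_items(FakeTable())
print(len(all_items))
```

With a real table, the call would be `scan_all_items(ddb_table)`, which returns a plain list rather than a response dictionary.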

Once we have all the variables needed, we can run the `detect_labels` operation for each product image and update the DynamoDB record with the new labels.

In [None]:
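import json
from decimal import Decimal

for product in products['Items']:
    # Construct the S3 object key (attribute names assumed lowercase, matching the 'id' key)
    product_image_name = f"images/{product['category']}/{product['image']}"

    response = rekognition.detect_labels(
        Image={'S3Object': {'Bucket': bucket, 'Name': product_image_name}},
        MaxLabels=10
    )
    # The DynamoDB resource rejects Python floats, so convert them to Decimal
    labels = json.loads(json.dumps(response['Labels']), parse_float=Decimal)
    ddb_table.update_item(
        Key={'id': product['id']},
        UpdateExpression='SET image_labels = :labels',
        ExpressionAttributeValues={':labels': labels}
    )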
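
Each entry in the `Labels` list of a `detect_labels` response carries a `Name` and a `Confidence` score, among other fields. As an illustration of working with that shape, the sketch below keeps only the names of labels above a confidence threshold. The helper `label_names`, the 80.0 threshold, and the sample response are assumptions made for this example and are not part of the workshop code.

```python
def label_names(response, min_confidence=80.0):
    """Return the names of labels at or above min_confidence."""
    return [
        label['Name']
        for label in response.get('Labels', [])
        if label['Confidence'] >= min_confidence
    ]


# Made-up response in the shape returned by detect_labels.
sample_detect_response = {
    'Labels': [
        {'Name': 'Footwear', 'Confidence': 99.1, 'Instances': [], 'Parents': []},
        {'Name': 'Shoe', 'Confidence': 97.4, 'Instances': [],
         'Parents': [{'Name': 'Footwear'}]},
        {'Name': 'Sneaker', 'Confidence': 62.0, 'Instances': [],
         'Parents': [{'Name': 'Shoe'}]},
    ]
}

confident_labels = label_names(sample_detect_response)
print(confident_labels)
```

The same filtering can also be requested from the service with the `MinConfidence` parameter of `detect_labels`.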

## Upload Product Data for Personalize (Optional)
As we want to use these new image labels to retrain Personalize later on, we need to update the product data in S3 that is used for training.


In [None]:
bucketresponse = ssm.get_parameter(
    Name='retaildemostore-stack-bucket'
)

# We will use this bucket to store our training data:
bucket = bucketresponse['Parameter']['Value']

# We will build and upload our training data in this file:
items_filename = "items.csv"

print('Bucket: {}'.format(bucket))

As done in the initial training in the Personalization workshop, we have to load the columns we intend to use for the items dataset into a dataframe and rename the columns.

Now that the DynamoDB table has been updated, we can re-fetch all the product data from the products service. Its JSON response is easily loaded into a dataframe.

In [None]:
# Get products service instance deployed in ECS
response = servicediscovery.discover_instances(
    NamespaceName='retaildemostore.local',
    ServiceName='products',
    MaxResults=1,
    HealthStatus='HEALTHY'
)
assert len(response['Instances']) > 0, 'Products service instance not found; check ECS to ensure it launched cleanly'
products_service_instance = response['Instances'][0]['Attributes']['AWS_INSTANCE_IPV4']

# Fetch our products
response = requests.get('http://{}/products/all'.format(products_service_instance))
products = response.json()
products_df = pd.DataFrame(products)

# Load products into dataframe
products_dataset_df = products_df[['id','price','category','style','description','gender_affinity','promoted', 'image_labels']].copy()
products_dataset_df = products_dataset_df.rename(columns = {'id':'ITEM_ID',
                                                            'price':'PRICE',
                                                            'category':'CATEGORY_L1',
                                                            'style':'CATEGORY_L2',
                                                            'description':'PRODUCT_DESCRIPTION',
                                                            'gender_affinity':'GENDER',
                                                            'promoted': 'PROMOTED',
                                                            'image_labels': 'IMAGE_LABELS'})

# Data normalization
products_dataset_df['GENDER'] = products_dataset_df['GENDER'].fillna('Any')
products_dataset_df.loc[products_dataset_df['PROMOTED'] == 'true', 'PROMOTED'] = 'Y'
products_dataset_df['PROMOTED'] = products_dataset_df['PROMOTED'].fillna('N')
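
To make the normalization rules concrete, the sketch below applies the same three rules to plain dictionaries instead of a dataframe. It is an illustration only: `normalize_item` is a hypothetical helper, the sample items are made up, and `None` stands in for the missing values that pandas represents as NaN.

```python
def normalize_item(item):
    """Apply the dataframe's normalization rules to a single item dict."""
    normalized = dict(item)  # leave the caller's dict untouched
    # A missing gender affinity defaults to 'Any'.
    if normalized.get('GENDER') is None:
        normalized['GENDER'] = 'Any'
    # 'true' becomes 'Y', a missing value becomes 'N', anything else is kept.
    promoted = normalized.get('PROMOTED')
    if promoted == 'true':
        normalized['PROMOTED'] = 'Y'
    elif promoted is None:
        normalized['PROMOTED'] = 'N'
    return normalized


# Made-up items using the renamed column names.
sample_items = [
    {'ITEM_ID': 'a1', 'GENDER': 'F', 'PROMOTED': 'true'},
    {'ITEM_ID': 'b2', 'GENDER': None, 'PROMOTED': None},
]

normalized_items = [normalize_item(item) for item in sample_items]
for item in normalized_items:
    print(item['ITEM_ID'], item['GENDER'], item['PROMOTED'])
```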

### Save to CSV and upload to S3 bucket
The items dataset is now ready, so we can save the dataframe to a local CSV file before uploading it to the S3 bucket. This new dataset will be used when we retrain Personalize with the new image label data in Lab 3.

In [None]:
products_dataset_df.to_csv(items_filename, index=False)
# s3 is a client, so upload with upload_file(Filename, Bucket, Key)
s3.upload_file(items_filename, bucket, items_filename)


## Lab 1 Summary
In this lab, you have used Rekognition to generate image labels for each product in the Retail Demo Store. In the next lab, you will ingest this new product data into Neptune and construct a graph from it.

### Store Variables Needed in the Next Lab
We will pass some variables initialized in this lab by storing them in the notebook environment.

In [None]:
%store ddb_table
%store bucket
%store items_filename
%store Uid


### Continue to Lab 2
Open [Lab 2](./Lab-2-Construct-Graph-with-Neptune.ipynb) to continue the workshop.