# Lab 3 - Add Rekognition Labels to Personalize (Optional)
In this lab we will re-train Personalize with the newly generated label data from Rekognition.

## Setup
To start, we have to prepare our environment by importing dependencies and creating clients.

### Import dependencies
The following libraries are needed for this lab.

In [None]:
import boto3
import json
import uuid
import time
from botocore.exceptions import ClientError

### Create clients
We will need the following AWS service clients in this lab.

In [None]:
personalize = boto3.client('personalize')

### Load variables saved in Lab 1
At the end of Lab 1 we saved some variables that we'll need in this lab. The following cell will load those variables into this lab environment.

In [None]:
%store -r

## Configure Amazon Personalize
Amazon Personalize requires a schema for each dataset so that it can map the columns in our CSVs to fields for model training. Each schema is declared in JSON.
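
As a minimal sketch of what "mapping columns to fields" implies, the check below compares a CSV header against the field names of a schema before any import is attempted. The helper name `missing_schema_columns`, the two-field schema and the sample CSV are illustrative assumptions, not part of the workshop code.

```python
import csv
import io

def missing_schema_columns(schema, csv_text):
    """Return the schema field names that are absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = {name.strip() for name in header}
    return [field["name"] for field in schema["fields"] if field["name"] not in columns]

# Hypothetical schema and CSV content, for illustration only.
example_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "ITEM_ID", "type": "string"},
        {"name": "IMAGE_LABELS", "type": "string"},
    ],
    "version": "1.0",
}
example_csv = "ITEM_ID,PRICE\nabc123,9.99\n"

print(missing_schema_columns(example_schema, example_csv))
```

An empty list means every schema field has a matching column in the file.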

#### Get Existing Datasets
As we have already trained Personalize before, a dataset group already exists with our items, interactions and user data. The interactions and user data will remain the same in our new dataset group. In order to use the existing datasets, we have to get their ARNs. This can be achieved by finding the ARN of the dataset group, and using it to list existing datasets within that group.

In [None]:
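# list_dataset_groups returns a dict; the group summaries are under 'datasetGroups'.
# This assumes the first group listed is the one created in the Personalization workshop.
datasetGroupArn = personalize.list_dataset_groups()['datasetGroups'][0]['datasetGroupArn']

# Initialize both names so the loop can test them before they are assigned.
users_dataset_arn = None
interactions_dataset_arn = None

# list_datasets also returns a dict; the dataset summaries are under 'datasets'.
datasets = personalize.list_datasets(datasetGroupArn=datasetGroupArn)
for dataset in datasets['datasets']:
    dataset_type = dataset['datasetType'].upper()
    if dataset_type == "USERS":
        users_dataset_arn = dataset['datasetArn']
    elif dataset_type == "INTERACTIONS":
        interactions_dataset_arn = dataset['datasetArn']
    if users_dataset_arn and interactions_dataset_arn:
        break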
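
A minimal sketch of the same lookup as a reusable function, assuming the `list_datasets` response carries its summaries under a `datasets` key. The helper name `dataset_arns_by_type` and the sample ARNs are illustrative, not part of the workshop code.

```python
def dataset_arns_by_type(list_datasets_response):
    """Map each dataset type (upper-cased) to its dataset ARN."""
    return {
        summary["datasetType"].upper(): summary["datasetArn"]
        for summary in list_datasets_response.get("datasets", [])
    }

# Hypothetical response, shaped like the output of personalize.list_datasets(...).
sample_response = {
    "datasets": [
        {"name": "users", "datasetType": "Users", "datasetArn": "arn:example:users"},
        {"name": "interactions", "datasetType": "INTERACTIONS", "datasetArn": "arn:example:interactions"},
    ]
}

arns = dataset_arns_by_type(sample_response)
print(arns.get("USERS"), arns.get("INTERACTIONS"))
```

Upper-casing the type makes the comparison independent of how the service capitalizes the value.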


#### Items Dataset Schema
Schemas in Personalize are immutable, meaning we can't just make changes to our existing schema. Instead, we have to create a new schema. For this one, we were able to take the schema defined in the Personalization workshop and simply add our new field definition. We then run the boto3 commands to create the new schema.


In [None]:
items_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "PRICE",
            "type": "float"
        },
        {
            "name": "CATEGORY_L1",
            "type": "string",
            "categorical": True,
        },
        {
            "name": "CATEGORY_L2",
            "type": "string",
            "categorical": True,
        },
        {
            "name": "PRODUCT_DESCRIPTION",
            "type": "string",
            "textual": True
        },
        {
            "name": "GENDER",
            "type": "string",
            "categorical": True,
        },
        {
            "name": "PROMOTED",
            "type": "string"
        },
        { # Here we have our new field definition
            "name": "IMAGE_LABELS",
            "type": "string"
        }
    ],
    "version": "1.0"
}

try:
    create_schema_response = personalize.create_schema(
        name = "retaildemostore-products-items-with-rekognition",
        domain = 'ECOMMERCE',
        schema = json.dumps(items_schema)
    )
    items_schema_arn = create_schema_response['schemaArn']
    print(json.dumps(create_schema_response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this schema, seemingly')
    paginator = personalize.get_paginator('list_schemas')
    for paginate_result in paginator.paginate():
        for schema in paginate_result['schemas']:
            # Look up the schema by the same name used in create_schema above.
            if schema['name'] == 'retaildemostore-products-items-with-rekognition':
                items_schema_arn = schema['schemaArn']
                print(f"Using existing schema: {items_schema_arn}")
                break
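
Because an existing schema cannot be edited, one way to derive the new definition is to copy the old one and append the extra field. The sketch below does this without mutating the original. The helper name `with_extra_field` and the shortened `base_schema` are assumptions for illustration only.

```python
import copy
import json

def with_extra_field(schema, field):
    """Return a deep copy of `schema` with `field` appended to its field list."""
    existing = {f["name"] for f in schema["fields"]}
    if field["name"] in existing:
        raise ValueError(f"Field {field['name']} already exists in the schema")
    extended = copy.deepcopy(schema)
    extended["fields"].append(dict(field))
    return extended

# Shortened, hypothetical version of the original items schema.
base_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "ITEM_ID", "type": "string"},
        {"name": "PRICE", "type": "float"},
    ],
    "version": "1.0",
}

new_schema = with_extra_field(base_schema, {"name": "IMAGE_LABELS", "type": "string"})
print(json.dumps(new_schema, indent=2))
```

The resulting dictionary could then be serialized with `json.dumps` and passed to `create_schema` under a new schema name.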

### Create and Wait for Dataset Group
To re-train Personalize with our new field, we have to recreate the dataset group. To do so, we call an API to create a resource and have to wait for it to become active.

In [None]:
try:
    create_dataset_group_response = personalize.create_dataset_group(
        name = 'retaildemostore-products-with-rekognition',
        domain = 'ECOMMERCE'
    )
    dataset_group_arn = create_dataset_group_response['datasetGroupArn']
    print(json.dumps(create_dataset_group_response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this dataset group, seemingly')
    paginator = personalize.get_paginator('list_dataset_groups')
    for paginate_result in paginator.paginate():
        for dataset_group in paginate_result['datasetGroups']:
            if dataset_group['name'] == 'retaildemostore-products-with-rekognition':
                dataset_group_arn = dataset_group['datasetGroupArn']
                break

print(f'DatasetGroupArn = {dataset_group_arn}')
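
The create-then-look-up-by-name fallback recurs throughout this lab. As a hedged sketch, the scan over paginated results could be factored into one helper. The name `find_arn_by_name` and the sample pages are hypothetical; in the notebook the pages would come from a paginator such as `personalize.get_paginator('list_dataset_groups').paginate()`.

```python
def find_arn_by_name(pages, list_key, arn_key, name):
    """Scan paginated list responses and return the ARN of the named resource, or None."""
    for page in pages:
        for summary in page.get(list_key, []):
            if summary.get("name") == name:
                return summary[arn_key]
    return None

# Hypothetical pages, shaped like list_dataset_groups responses.
fake_pages = [
    {"datasetGroups": [{"name": "other-group", "datasetGroupArn": "arn:example:other"}]},
    {"datasetGroups": [{"name": "retaildemostore-products-with-rekognition",
                        "datasetGroupArn": "arn:example:target"}]},
]

found = find_arn_by_name(
    fake_pages, "datasetGroups", "datasetGroupArn",
    "retaildemostore-products-with-rekognition",
)
print(found)
```

Returning from the function stops the scan across all pages, whereas a `break` inside nested loops only leaves the inner loop.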

#### Wait for Dataset Group to Have ACTIVE Status

In [None]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = dataset_group_arn
    )
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))

    if status == "ACTIVE" or status == "CREATE FAILED":
        break

    time.sleep(15)
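
The polling loop above can also be written as a small function. This is a sketch under stated assumptions: the name `wait_for_status` and its parameters are hypothetical, and the `describe` callable stands in for a wrapper around a call such as `describe_dataset_group` that returns only the status string.

```python
import time

def wait_for_status(describe, terminal=("ACTIVE", "CREATE FAILED"),
                    timeout_s=3 * 60 * 60, interval_s=15, sleep=time.sleep):
    """Call `describe()` until it returns a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = describe()
        if status in terminal:
            return status
        sleep(interval_s)
    raise TimeoutError(f"No terminal status within {timeout_s} seconds")

# Stand-in for a real describe call; the sleep is replaced so the demo runs instantly.
statuses = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
result = wait_for_status(lambda: next(statuses), sleep=lambda seconds: None)
print(result)
```

Unlike the loop in the cell, this variant raises `TimeoutError` when the deadline passes, so a later cell cannot silently continue with a resource that never became active.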

### Re-Create Items Dataset
We will re-create the items dataset and get the ARNs for the users and interactions datasets that were created when Personalize was first set up. We do not need to change these as the only difference is in the items dataset.

In [None]:
try:
    dataset_type = "ITEMS"
    create_dataset_response = personalize.create_dataset(
        name = "retaildemostore-products-items-with-rekognition",
        datasetType = dataset_type,
        datasetGroupArn = dataset_group_arn,
        schemaArn = items_schema_arn
    )

    items_dataset_arn = create_dataset_response['datasetArn']
    print(json.dumps(create_dataset_response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this dataset, seemingly')
    paginator = personalize.get_paginator('list_datasets')
    for paginate_result in paginator.paginate(datasetGroupArn = dataset_group_arn):
        for dataset in paginate_result['datasets']:
            if dataset['name'] == 'retaildemostore-products-items-with-rekognition':
                items_dataset_arn = dataset['datasetArn']
                break

print(f'Items dataset ARN = {items_dataset_arn}')

### Create Import Job
Personalize does not have permission to read the CSV data in our S3 bucket by default. We therefore look up the IAM role created in the Personalization workshop, which grants access to the bucket, so that we can pass it to the import job.

In [None]:
iam = boto3.client("iam")

role_name = Uid+"-PersonalizeS3"

response = iam.get_role(RoleName = role_name)
role_arn = response['Role']['Arn']
print(json.dumps(response['Role'], indent=2, default = str))
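
For Personalize to use this role, the role's trust policy has to allow the Personalize service to assume it. The sketch below checks a trust policy document for that statement. It assumes the document is available as a Python dict (for example from `response['Role']['AssumeRolePolicyDocument']`); the helper name `allows_service` and the sample policy are illustrative.

```python
def allows_service(trust_policy, service="personalize.amazonaws.com"):
    """Return True if the trust policy lets `service` call sts:AssumeRole."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be wrapped in a list
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue
        services = principal.get("Service", [])
        if isinstance(services, str):
            services = [services]
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if service in services and "sts:AssumeRole" in actions:
            return True
    return False

# Hypothetical trust policy, for illustration only.
sample_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "personalize.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(allows_service(sample_trust_policy))
```

This only inspects the trust relationship; access to the bucket itself still depends on the permissions and bucket policy set up in the Personalization workshop.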

#### Import Job for Items Dataset

In [None]:
import_job_suffix = str(uuid.uuid4())[:8]

items_create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "retaildemostore-products-items-" + import_job_suffix,
    datasetArn = items_dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, items_filename)
    },
    roleArn = role_arn
)

items_dataset_import_job_arn = items_create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(items_create_dataset_import_job_response, indent=2))

#### Wait for Items Import Job to Complete
The import job will take about 10-15 minutes to complete.

In [None]:
%%time

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = items_dataset_import_job_arn
    )
    status = import_job_response["datasetImportJob"]['status']

    if status == "ACTIVE":
        print(f'Import job {items_dataset_import_job_arn} successfully completed')
        break
    elif status == "CREATE FAILED":
        print(f'Import job {items_dataset_import_job_arn} failed')
        if import_job_response["datasetImportJob"].get('failureReason'):
            print('   Reason: ' + import_job_response["datasetImportJob"]['failureReason'])
        break
    else:
        print('Import job still in progress')
        time.sleep(60)

### Create Recommenders and Solution Versions
With our updated items dataset imported into the dataset group, we can now go through and create the recommenders and solution versions. For more information on what the code below is doing, refer to Lab 3 of the Personalization workshop.

In [None]:
response = personalize.list_recipes(domain = "ECOMMERCE")

try:
    response = personalize.create_recommender(
      name = 'retaildemostore-recommended-for-you-with-rekognition',
      recipeArn = 'arn:aws:personalize:::recipe/aws-ecomm-recommended-for-you',
      datasetGroupArn = dataset_group_arn
    )
    rfy_recommender_arn = response['recommenderArn']
    print(json.dumps(response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this recommender, seemingly')
    paginator = personalize.get_paginator('list_recommenders')
    for paginate_result in paginator.paginate(datasetGroupArn = dataset_group_arn):
        for recommender in paginate_result['recommenders']:
            if recommender['name'] == 'retaildemostore-recommended-for-you-with-rekognition':
                rfy_recommender_arn = recommender['recommenderArn']
                break

print(f'Recommended For You recommender ARN = {rfy_recommender_arn}')

try:
    response = personalize.create_recommender(
      name = 'retaildemostore-popular-items-with-rekognition',
      recipeArn = 'arn:aws:personalize:::recipe/aws-ecomm-popular-items-by-views',
      datasetGroupArn = dataset_group_arn
    )
    most_viewed_recommender_arn = response['recommenderArn']
    print(json.dumps(response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this recommender, seemingly')
    paginator = personalize.get_paginator('list_recommenders')
    for paginate_result in paginator.paginate(datasetGroupArn = dataset_group_arn):
        for recommender in paginate_result['recommenders']:
            if recommender['name'] == 'retaildemostore-popular-items-with-rekognition':
                most_viewed_recommender_arn = recommender['recommenderArn']
                break

print(f'Most Viewed recommender ARN = {most_viewed_recommender_arn}')

response = personalize.list_recipes()
custom_recipes = []
for recipe in response['recipes']:
    if not recipe.get('domain'):
        custom_recipes.append(recipe)
similar_items_recipe_arn = "arn:aws:personalize:::recipe/aws-similar-items"
ranking_recipe_arn = "arn:aws:personalize:::recipe/aws-personalized-ranking"
item_attribute_affinity_recipe_arn = 'arn:aws:personalize:::recipe/aws-item-attribute-affinity'

similar_items_solution_version_arn = None

try:
    create_solution_response = personalize.create_solution(
        name = "retaildemostore-related-items-with-rekognition",
        datasetGroupArn = dataset_group_arn,
        recipeArn = similar_items_recipe_arn
    )

    similar_items_solution_arn = create_solution_response['solutionArn']
    print(json.dumps(create_solution_response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this solution, seemingly')
    paginator = personalize.get_paginator('list_solutions')
    for paginate_result in paginator.paginate(datasetGroupArn = dataset_group_arn):
        for solution in paginate_result['solutions']:
            if solution['name'] == 'retaildemostore-related-items-with-rekognition':
                similar_items_solution_arn = solution['solutionArn']
                print(f'Similar Items solution ARN = {similar_items_solution_arn}')

                response = personalize.list_solution_versions(
                    solutionArn = similar_items_solution_arn,
                    maxResults = 100
                )
                if len(response['solutionVersions']) > 0:
                    similar_items_solution_version_arn = response['solutionVersions'][-1]['solutionVersionArn']
                    print(f'Will use most recent solution version for this solution: {similar_items_solution_version_arn}')

                break

if not similar_items_solution_version_arn:
    create_solution_version_response = personalize.create_solution_version(
        solutionArn = similar_items_solution_arn
    )
    similar_items_solution_version_arn = create_solution_version_response['solutionVersionArn']
else:
    print(f'Solution version {similar_items_solution_version_arn} already exists; not creating')

ranking_solution_version_arn = None

try:
    create_solution_response = personalize.create_solution(
        name = "retaildemostore-personalized-ranking-with-rekognition",
        datasetGroupArn = dataset_group_arn,
        recipeArn = ranking_recipe_arn
    )

    ranking_solution_arn = create_solution_response['solutionArn']
    print(json.dumps(create_solution_response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this solution, seemingly')
    paginator = personalize.get_paginator('list_solutions')
    for paginate_result in paginator.paginate(datasetGroupArn = dataset_group_arn):
        for solution in paginate_result['solutions']:
            if solution['name'] == 'retaildemostore-personalized-ranking-with-rekognition':
                ranking_solution_arn = solution['solutionArn']
                print(f'Ranking solution ARN = {ranking_solution_arn}')

                response = personalize.list_solution_versions(
                    solutionArn = ranking_solution_arn,
                    maxResults = 100
                )
                if len(response['solutionVersions']) > 0:
                    ranking_solution_version_arn = response['solutionVersions'][-1]['solutionVersionArn']
                    print(f'Will use most recent solution version for this solution: {ranking_solution_version_arn}')

                break

if not ranking_solution_version_arn:
    create_solution_version_response = personalize.create_solution_version(
        solutionArn = ranking_solution_arn
    )
    ranking_solution_version_arn = create_solution_version_response['solutionVersionArn']
else:
    print(f'Solution version {ranking_solution_version_arn} already exists; not creating')

item_attribute_affinity_solution_version_arn = None

try:
    create_solution_response = personalize.create_solution(
        name = "retaildemostore-item-attribute-affinity-with-rekognition",
        datasetGroupArn = dataset_group_arn,
        recipeArn = item_attribute_affinity_recipe_arn
    )

    item_attribute_affinity_solution_arn = create_solution_response['solutionArn']
    print(json.dumps(create_solution_response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this solution, seemingly')
    paginator = personalize.get_paginator('list_solutions')
    for paginate_result in paginator.paginate(datasetGroupArn = dataset_group_arn):
        for solution in paginate_result['solutions']:
            if solution['name'] == 'retaildemostore-item-attribute-affinity-with-rekognition':
                item_attribute_affinity_solution_arn = solution['solutionArn']
                print(f'Item Attribute Affinity solution ARN = {item_attribute_affinity_solution_arn}')

                response = personalize.list_solution_versions(
                    solutionArn = item_attribute_affinity_solution_arn,
                    maxResults = 100
                )
                if len(response['solutionVersions']) > 0:
                    item_attribute_affinity_solution_version_arn = response['solutionVersions'][-1]['solutionVersionArn']
                    print(f'Will use most recent solution version for this solution: {item_attribute_affinity_solution_version_arn}')

                break

if not item_attribute_affinity_solution_version_arn:
    create_solution_version_response = personalize.create_solution_version(
        solutionArn = item_attribute_affinity_solution_arn
    )
    item_attribute_affinity_solution_version_arn = create_solution_version_response['solutionVersionArn']
else:
    print(f'Solution version {item_attribute_affinity_solution_version_arn} already exists; not creating')
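
The cells above treat the last entry returned by `list_solution_versions` as the most recent one. A more explicit alternative, sketched here, picks the newest entry by its `creationDateTime`. The helper name `latest_solution_version_arn`, the optional status filter and the sample summaries are assumptions for illustration.

```python
from datetime import datetime, timezone

def latest_solution_version_arn(summaries, status=None):
    """Return the ARN of the newest solution version, optionally limited to one status."""
    candidates = [s for s in summaries if status is None or s.get("status") == status]
    if not candidates:
        return None
    newest = max(candidates, key=lambda s: s["creationDateTime"])
    return newest["solutionVersionArn"]

# Hypothetical summaries, shaped like entries of response['solutionVersions'].
sample_versions = [
    {"solutionVersionArn": "arn:example:v2", "status": "CREATE FAILED",
     "creationDateTime": datetime(2023, 2, 1, tzinfo=timezone.utc)},
    {"solutionVersionArn": "arn:example:v1", "status": "ACTIVE",
     "creationDateTime": datetime(2023, 1, 1, tzinfo=timezone.utc)},
]

print(latest_solution_version_arn(sample_versions))
print(latest_solution_version_arn(sample_versions, status="ACTIVE"))
```

Filtering on `ACTIVE` avoids reusing a version whose training failed, at the cost of ignoring one that is still in progress.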

#### Wait for Recommenders and Solution Versions to Complete
It can take 40-60 minutes for all recommenders and solution versions to be created.

In [None]:
%%time

recommender_arns = [ rfy_recommender_arn, most_viewed_recommender_arn ]

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    for recommender_arn in reversed(recommender_arns):
        response = personalize.describe_recommender(
            recommenderArn = recommender_arn
        )
        status = response["recommender"]["status"]

        if status == "ACTIVE":
            print(f'Recommender {recommender_arn} successfully completed')
            recommender_arns.remove(recommender_arn)
        elif status == "CREATE FAILED":
            print(f'Recommender {recommender_arn} failed')
            if response["recommender"].get('failureReason'):
                print('   Reason: ' + response["recommender"]['failureReason'])
            recommender_arns.remove(recommender_arn)

    if len(recommender_arns) > 0:
        print('At least one recommender is still in progress')
        time.sleep(60)
    else:
        print("All recommenders have completed")
        break

soln_ver_arns = [
    similar_items_solution_version_arn,
    ranking_solution_version_arn,
    item_attribute_affinity_solution_version_arn
]

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    for soln_ver_arn in reversed(soln_ver_arns):
        soln_ver_response = personalize.describe_solution_version(
            solutionVersionArn = soln_ver_arn
        )
        status = soln_ver_response["solutionVersion"]["status"]

        if status == "ACTIVE":
            print(f'Solution version {soln_ver_arn} successfully completed')
            soln_ver_arns.remove(soln_ver_arn)
        elif status == "CREATE FAILED":
            print(f'Solution version {soln_ver_arn} failed')
            if soln_ver_response["solutionVersion"].get('failureReason'):
                print('   Reason: ' + soln_ver_response["solutionVersion"]['failureReason'])
            soln_ver_arns.remove(soln_ver_arn)

    if len(soln_ver_arns) > 0:
        print('At least one solution version is still in progress')
        time.sleep(60)
    else:
        print("All solution versions have completed")
        break
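
When several resources are polled together, tracking the pending ones in a dictionary avoids removing items from a list while iterating over it. The following is a sketch, not the workshop's implementation: `wait_for_all` and the callables passed to it are hypothetical, and each callable stands in for a wrapper that returns the status string from a describe call.

```python
import time

def wait_for_all(describers, terminal=("ACTIVE", "CREATE FAILED"),
                 timeout_s=3 * 60 * 60, interval_s=60, sleep=time.sleep):
    """Poll each named callable until all report a terminal status or time runs out.

    Returns a dict of name -> last status seen. Entries that are still pending
    at the timeout keep their most recent non-terminal status.
    """
    pending = dict(describers)
    last_seen = {}
    deadline = time.monotonic() + timeout_s
    while pending and time.monotonic() < deadline:
        for name in list(pending):  # copy the keys so entries can be deleted safely
            status = pending[name]()
            last_seen[name] = status
            if status in terminal:
                del pending[name]
        if pending:
            sleep(interval_s)
    return last_seen

# Stand-ins for describe_recommender wrappers; the sleep is replaced for the demo.
rfy_statuses = iter(["CREATE IN_PROGRESS", "ACTIVE"])
popular_statuses = iter(["CREATE FAILED"])
final = wait_for_all(
    {"recommended-for-you": lambda: next(rfy_statuses),
     "popular-items": lambda: next(popular_statuses)},
    sleep=lambda seconds: None,
)
print(final)
```

The returned dictionary makes it easy to see afterwards which resources failed and which were still in progress when the timeout was reached.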

### Create Campaigns for Similar Items and Personalized Ranking Solutions
Once creation of the solution versions is complete, we can create the new campaigns.

#### Create Similar Items Campaign

In [None]:
try:
    create_campaign_response = personalize.create_campaign(
        name = "retaildemostore-related-items-with-rekognition",
        solutionVersionArn = similar_items_solution_version_arn,
        minProvisionedTPS = 1
    )

    similar_items_campaign_arn = create_campaign_response['campaignArn']
    print(json.dumps(create_campaign_response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this campaign, seemingly. Will update campaign instead.')
    paginator = personalize.get_paginator('list_campaigns')
    for paginate_result in paginator.paginate(solutionArn = similar_items_solution_arn):
        for campaign in paginate_result['campaigns']:
            if campaign['name'] == 'retaildemostore-related-items-with-rekognition':
                similar_items_campaign_arn = campaign['campaignArn']
                print(f'Found existing campaign for solution: {similar_items_campaign_arn}')

                response = personalize.describe_campaign(campaignArn = similar_items_campaign_arn)
                if response['campaign']['solutionVersionArn'] == similar_items_solution_version_arn:
                    print('Campaign is already using the latest solution version')
                else:
                    print('Updating campaign with the latest solution version')
                    response = personalize.update_campaign(
                        campaignArn = similar_items_campaign_arn,
                        solutionVersionArn = similar_items_solution_version_arn,
                        minProvisionedTPS = 1
                    )
                    print(json.dumps(response, indent=2))
                break

#### Create Personalized Ranking Campaign

In [None]:
try:
    create_campaign_response = personalize.create_campaign(
        name = "retaildemostore-personalized-ranking-with-rekognition",
        solutionVersionArn = ranking_solution_version_arn,
        minProvisionedTPS = 1
    )

    ranking_campaign_arn = create_campaign_response['campaignArn']
    print(json.dumps(create_campaign_response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this campaign, seemingly. Will update campaign instead.')
    paginator = personalize.get_paginator('list_campaigns')
    for paginate_result in paginator.paginate(solutionArn = ranking_solution_arn):
        for campaign in paginate_result['campaigns']:
            if campaign['name'] == 'retaildemostore-personalized-ranking-with-rekognition':
                ranking_campaign_arn = campaign['campaignArn']
                print(f'Found existing campaign for solution: {ranking_campaign_arn}')

                response = personalize.describe_campaign(campaignArn = ranking_campaign_arn)
                if response['campaign']['solutionVersionArn'] == ranking_solution_version_arn:
                    print('Campaign is already using the latest solution version')
                else:
                    print('Updating campaign with the latest solution version')
                    response = personalize.update_campaign(
                        campaignArn = ranking_campaign_arn,
                        solutionVersionArn = ranking_solution_version_arn,
                        minProvisionedTPS = 1
                    )
                    print(json.dumps(response, indent=2))
                break

#### Wait for Campaigns to have ACTIVE Status
This can take 15-20 minutes to complete.

In [None]:
%%time

campaign_arns = [ similar_items_campaign_arn, ranking_campaign_arn ]

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    for campaign_arn in reversed(campaign_arns):
        campaign_response = personalize.describe_campaign(
            campaignArn = campaign_arn
        )
        status = campaign_response["campaign"]["status"]
        # latestCampaignUpdate is nested inside the 'campaign' object of the response.
        latest_update = campaign_response["campaign"].get('latestCampaignUpdate')
        if status == 'ACTIVE' and latest_update:
            status = latest_update['status']

        if status == "ACTIVE":
            print(f'Campaign {campaign_arn} successfully completed')
            campaign_arns.remove(campaign_arn)
        elif status == "CREATE FAILED":
            print(f'Campaign {campaign_arn} failed')
            if campaign_response["campaign"].get('failureReason'):
                print('   Reason: ' + campaign_response["campaign"]['failureReason'])
            campaign_arns.remove(campaign_arn)

    if len(campaign_arns) > 0:
        print('At least one campaign is still in progress')
        time.sleep(60)
    else:
        print("All campaigns have completed")
        break
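
A campaign that has been updated can report `ACTIVE` while its latest update is still in progress, so the status worth waiting on is that of the update when one exists. The sketch below isolates that rule. The function name `effective_campaign_status` and the sample responses are illustrative; the nesting of `latestCampaignUpdate` under `campaign` follows the `describe_campaign` response shape.

```python
def effective_campaign_status(describe_response):
    """Return the status of the latest update if there is one, else the campaign status."""
    campaign = describe_response["campaign"]
    status = campaign["status"]
    latest_update = campaign.get("latestCampaignUpdate")
    if status == "ACTIVE" and latest_update:
        return latest_update["status"]
    return status

# Hypothetical responses, shaped like personalize.describe_campaign(...) output.
freshly_created = {"campaign": {"status": "CREATE IN_PROGRESS"}}
being_updated = {
    "campaign": {
        "status": "ACTIVE",
        "latestCampaignUpdate": {"status": "CREATE IN_PROGRESS"},
    }
}

print(effective_campaign_status(freshly_created))
print(effective_campaign_status(being_updated))
```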

## Replace Previous Campaigns
Now that the new campaigns are created, we have to replace the old campaigns in the Retail Demo Store Recommendations service. The Recommendations service is called by the Retail Demo Store Web UI when a user visits a page with personalized content capabilities. The Recommendations service checks Systems Manager Parameter values to determine the Personalize recommender and campaign ARNs to use for each of our personalization use cases.

Let's replace the recommender and campaign ARNs with the ones we just created.

### Update SSM Parameters to Enable Recommenders

In [None]:
ssm = boto3.client('ssm')

In [None]:
response = ssm.put_parameter(
    Name='/retaildemostore/personalize/recommended-for-you-arn',
    Description='Retail Demo Store Recommended For You Recommender/Campaign Arn Parameter',
    Value='{}'.format(rfy_recommender_arn),
    Type='String',
    Overwrite=True
)

In [None]:
response = ssm.put_parameter(
    Name='/retaildemostore/personalize/popular-items-arn',
    Description='Retail Demo Store Most Viewed Recommender/Campaign Arn Parameter',
    Value='{}'.format(most_viewed_recommender_arn),
    Type='String',
    Overwrite=True
)

### Update SSM Parameters to Enable Campaigns

In [None]:
response = ssm.put_parameter(
    Name='/retaildemostore/personalize/related-items-arn',
    Description='Retail Demo Store Also Viewed Recommender/Campaign Arn Parameter',
    Value='{}'.format(similar_items_campaign_arn),
    Type='String',
    Overwrite=True
)

In [None]:
response = ssm.put_parameter(
    Name='/retaildemostore/personalize/personalized-ranking-arn',
    Description='Retail Demo Store Personalized Ranking Campaign Arn Parameter',
    Value='{}'.format(ranking_campaign_arn),
    Type='String',
    Overwrite=True
)
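
The four `put_parameter` calls differ only in name and value, so they could also be driven from one mapping. This is a hedged sketch: `put_arn_parameters` is a hypothetical helper, the ARNs are placeholders, and `FakeSSM` is a stand-in that records calls so the example runs without AWS access. In the notebook the real `ssm` client would be passed instead.

```python
def put_arn_parameters(ssm_client, arns_by_parameter):
    """Write each ARN to its SSM parameter, refusing to start if any ARN is missing."""
    missing = [name for name, arn in arns_by_parameter.items() if not arn]
    if missing:
        raise ValueError(f"Missing ARN for: {', '.join(missing)}")
    for name, arn in arns_by_parameter.items():
        ssm_client.put_parameter(Name=name, Value=arn, Type="String", Overwrite=True)
    return list(arns_by_parameter)

class FakeSSM:
    """Stand-in for boto3.client('ssm') that stores values in memory."""
    def __init__(self):
        self.values = {}

    def put_parameter(self, Name, Value, Type, Overwrite):
        self.values[Name] = Value

fake_ssm = FakeSSM()
written = put_arn_parameters(fake_ssm, {
    "/retaildemostore/personalize/recommended-for-you-arn": "arn:example:rfy",
    "/retaildemostore/personalize/popular-items-arn": "arn:example:popular",
    "/retaildemostore/personalize/related-items-arn": "arn:example:related",
    "/retaildemostore/personalize/personalized-ranking-arn": "arn:example:ranking",
})
print(written)
```

Checking for missing ARNs before the first write keeps the parameters from ending up half old and half new.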

## Evaluate Personalization in Retail Demo Store's Web UI
You can now go to the Retail Demo Store Web UI to see how recommendations have changed.
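
If you saved recommendation lists from before and after the change, a small comparison can make the differences easier to see than the Web UI alone. The sketch below assumes each list has the shape of the `itemList` returned by the Personalize runtime `get_recommendations` call (dicts with an `itemId` key). The function name `diff_recommendations` and the sample item IDs are hypothetical.

```python
def diff_recommendations(old_items, new_items):
    """Split item IDs into those kept, newly added and dropped, preserving list order."""
    old_ids = [item["itemId"] for item in old_items]
    new_ids = [item["itemId"] for item in new_items]
    return {
        "kept": [item_id for item_id in new_ids if item_id in old_ids],
        "added": [item_id for item_id in new_ids if item_id not in old_ids],
        "dropped": [item_id for item_id in old_ids if item_id not in new_ids],
    }

# Hypothetical recommendation lists, for illustration only.
before = [{"itemId": "a"}, {"itemId": "b"}, {"itemId": "c"}]
after = [{"itemId": "b"}, {"itemId": "d"}, {"itemId": "a"}]

print(diff_recommendations(before, after))
```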

## Delete Old Campaigns and Recommenders
To avoid incurring unnecessary costs, you can delete the old campaigns and recommenders once you're happy with the changes.
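
A minimal sketch of that cleanup, assuming you noted the ARNs of the old campaigns and recommenders before overwriting the SSM parameters. `delete_old_resources` is a hypothetical helper and `RecordingClient` is a stand-in so the example runs without AWS access; with the real client the calls would be `personalize.delete_campaign` and `personalize.delete_recommender`.

```python
def delete_old_resources(personalize_client, campaign_arns=(), recommender_arns=()):
    """Request deletion of the given campaigns and recommenders; return the ARNs requested."""
    requested = []
    for arn in campaign_arns:
        personalize_client.delete_campaign(campaignArn=arn)
        requested.append(arn)
    for arn in recommender_arns:
        personalize_client.delete_recommender(recommenderArn=arn)
        requested.append(arn)
    return requested

class RecordingClient:
    """Stand-in for boto3.client('personalize') that records delete calls."""
    def __init__(self):
        self.calls = []

    def delete_campaign(self, campaignArn):
        self.calls.append(("delete_campaign", campaignArn))

    def delete_recommender(self, recommenderArn):
        self.calls.append(("delete_recommender", recommenderArn))

# Placeholder ARNs; replace with the old resources you want to remove.
client = RecordingClient()
requested = delete_old_resources(
    client,
    campaign_arns=["arn:example:old-related-items"],
    recommender_arns=["arn:example:old-recommended-for-you"],
)
print(requested)
```

Deletion is asynchronous, so the resources may remain visible for a short time after the calls return.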

## Lab 3 Summary
In this lab, you have re-created your Personalize resources so the recommenders include the image labels generated by Rekognition. In the next lab, we will set up OpenSearch to use the Neptune graph.

### Continue to Lab 4
Open [Lab 4](./Lab-4-Integrate-Neptune-with-OpenSearch.ipynb) to continue the workshop.
