# Retail Demo Store - Personalization Workshop

Welcome to the Retail Demo Store Personalization Workshop. In this module we're going to be adding three core personalization features powered by [Amazon Personalize](https://aws.amazon.com/personalize/): related product recommendations on the product detail page, personalized recommendations on the Retail Demo Store homepage, and personalized ranking of items on the featured product page and product search results. This will allow us to give our users targeted recommendations based on their activity.
We also illustrate another use case of Amazon Personalize: selecting a product to discount from a list of products for a user, using the "contextual metadata" feature of Amazon Personalize.

Recommended Time: 2 Hours

## Setup

To get started, we need to perform a bit of setup.
Walk through each of the following steps to configure your environment to
interact with the Amazon Personalize Service.

### Import Dependencies and Setup Boto3 Python Clients

Throughout this workshop we will need access to some common libraries and clients for connecting to AWS services.
We also need to retrieve the `Uid` from a tag on the SageMaker notebook instance.

In [None]:
# Import Dependencies

import boto3
import json
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import time
import requests
import csv
import sys
import botocore
import uuid
from collections import defaultdict
import random

from packaging import version
from botocore.exceptions import ClientError
from pathlib import Path

%matplotlib inline

# Setup Clients

personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')
personalize_events = boto3.client('personalize-events')

servicediscovery = boto3.client('servicediscovery')
ssm = boto3.client('ssm')


# The Uid is a unique ID and we need it to find the role made by CloudFormation
with open('/opt/ml/metadata/resource-metadata.json') as f:
    data = json.load(f)
sagemaker = boto3.client('sagemaker')
sagemaker_response = sagemaker.list_tags(ResourceArn=data["ResourceArn"])
Uid = None  # Stays None if the notebook instance has no 'Uid' tag
for tag in sagemaker_response["Tags"]:
    if tag['Key'] == 'Uid':
        Uid = tag['Value']
        break

print('Uid:', Uid)
print('Region:', ssm.meta.region_name)

### Configure Bucket and Data Output Location

We will be configuring some variables that will store the location of our source data. When the Retail Demo Store stack was deployed in this account, an S3 bucket was created for you and the name of this bucket was stored in Systems Manager Parameter Store. Using the Boto3 client we can get the name of this bucket for use within our Notebook.

In [None]:
bucketresponse = ssm.get_parameter(
    Name='retaildemostore-stack-bucket'
)

# We will use this bucket to store our training data:
bucket = bucketresponse['Parameter']['Value']     # Do Not Change

# We will upload our training data in these files:
items_filename = "items.csv"                # Do Not Change
users_filename = "users.csv"                # Do Not Change
interactions_filename = "interactions.csv"  # Do Not Change

print('Bucket: {}'.format(bucket))

## Get, Prepare, and Upload User, Product, and Interaction Data

Amazon Personalize provides predefined recipes, based on common use cases, for training models. A recipe is a machine learning algorithm that you use with settings, or hyperparameters, and the data you provide to train an Amazon Personalize model. The data you provide to train a model is organized into separate datasets by the type of data being provided. A collection of datasets is organized into a dataset group. The three dataset types supported by Personalize are items, users, and interactions. Depending on the recipe type you choose, a different combination of dataset types is required. For all recipe types, an interactions dataset is required. Interactions represent how users interact with items, for example, viewing a product, watching a video, listening to a recording, or reading an article. For this workshop, we will be using a recipe that supports all three dataset types.

When we deployed the Retail Demo Store, it was deployed with an initial seed of fictitious User and Product data. We will use this data to train three models, or solutions, in the Amazon Personalize service, which will be used to serve product recommendations and related items, and to rerank product lists for our users. The User and Product data can be accessed from the Retail Demo Store's [Users](https://github.com/aws-samples/retail-demo-store/tree/master/src/users) and [Products](https://github.com/aws-samples/retail-demo-store/tree/master/src/products) microservices, respectively. We will access our data through the microservice data APIs, process the data, and upload it as CSVs to S3. Once our datasets are in S3, we can import them into the Amazon Personalize service.

Let's get started.

### Get Products Service Instance

We will be pulling our Product data from the [Products Service](https://github.com/aws-samples/retail-demo-store/tree/master/src/products) that was deployed in Amazon Elastic Container Service as part of the Retail Demo Store. To connect to this service we will use [AWS Cloud Map](https://aws.amazon.com/cloud-map/)'s Service Discovery to discover an instance of the Products Service running in ECS, and then connect directly to that service instance to access our data.

In [None]:
response = servicediscovery.discover_instances(
    NamespaceName='retaildemostore.local',
    ServiceName='products',
    MaxResults=1,
    HealthStatus='HEALTHY'
)

products_service_instance = response['Instances'][0]['Attributes']['AWS_INSTANCE_IPV4']
print('Products Service Instance IP: {}'.format(products_service_instance))
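
The lookup above indexes `response['Instances'][0]` directly, which raises a bare `IndexError` if no healthy instance is registered. As a minimal sketch, assuming the response shape used above, a small helper can turn that case into a clearer error. The helper name `first_instance_ipv4` and the sample response are hypothetical and only for illustration.

```python
def first_instance_ipv4(response):
    """Return the IPv4 address of the first discovered instance.

    `response` is a dict shaped like the result of
    servicediscovery.discover_instances(...).
    """
    instances = response.get("Instances", [])
    if not instances:
        raise RuntimeError("No healthy service instances were discovered")
    return instances[0]["Attributes"]["AWS_INSTANCE_IPV4"]


# Hypothetical response, for illustration only.
sample_response = {"Instances": [{"Attributes": {"AWS_INSTANCE_IPV4": "10.0.0.12"}}]}
print(first_instance_ipv4(sample_response))
```

The same helper would work for the Users service lookup later in this notebook, since both calls return the same response shape.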

#### Download and Explore the Products Dataset

In [None]:
response = requests.get('http://{}/products/all'.format(products_service_instance))
products = response.json()
products_df = pd.DataFrame(products)
pd.set_option('display.max_rows', 5)

products_df

#### Prepare and Upload Data

When training models in Amazon Personalize, we can provide metadata about our items. For this workshop we will add each product's category and style to the item dataset. The product's unique identifier is required. Then we will rename the columns in our dataset to match our schema (defined later) and those expected by Personalize. Finally, we will save our dataset as a CSV and copy it to our S3 bucket.

In [None]:
products_dataset_df = products_df[['id','category','style']]
products_dataset_df = products_dataset_df.rename(columns = {'id':'ITEM_ID','category':'CATEGORY','style':'STYLE'}) 

products_dataset_df.to_csv(items_filename, index=False)
boto3.Session().resource('s3').Bucket(bucket).Object(items_filename).upload_file(items_filename)
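
For readers who want to see the select-and-rename step without pandas, here is a rough standard-library equivalent. It is only a sketch: the function name `items_csv`, the `ITEM_COLUMNS` mapping, and the sample product records (including the extra `price` field) are made up for illustration, while the three source fields and target column names are the ones used above.

```python
import csv
import io

# Mapping from Products service field names to the column names used in the items CSV.
ITEM_COLUMNS = {"id": "ITEM_ID", "category": "CATEGORY", "style": "STYLE"}


def items_csv(products):
    """Render product dicts as CSV text, keeping only the mapped columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(ITEM_COLUMNS.values()), lineterminator="\n"
    )
    writer.writeheader()
    for product in products:
        writer.writerow({column: product[field] for field, column in ITEM_COLUMNS.items()})
    return buffer.getvalue()


# Hypothetical product records, for illustration only.
sample_products = [
    {"id": "1", "category": "footwear", "style": "sneaker", "price": 59.99},
    {"id": "2", "category": "apparel", "style": "jacket", "price": 120.0},
]
print(items_csv(sample_products))
```

Fields that are not in the mapping, such as `price` here, are simply dropped, which mirrors selecting three columns from the dataframe.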

### Get Users Service Instance

We will be pulling our User data from the [Users Service](https://github.com/aws-samples/retail-demo-store/tree/master/src/users) that is deployed as part of the Retail Demo Store. To connect to this service we will use Service Discovery to discover an instance of the User Service, and then connect directly to that service instance to access our data.

In [None]:
response = servicediscovery.discover_instances(
    NamespaceName='retaildemostore.local',
    ServiceName='users',
    MaxResults=1,
    HealthStatus='HEALTHY'
)

users_service_instance = response['Instances'][0]['Attributes']['AWS_INSTANCE_IPV4']
print('Users Service Instance IP: {}'.format(users_service_instance))

#### Download and Explore the Users Dataset

In [None]:
response = requests.get('http://{}/users/all?count=10000'.format(users_service_instance))
users = response.json()
users_df = pd.DataFrame(users)
# Remove any users that have neither a persona nor a gender (i.e. created in web UI)
users_df = users_df[(users_df['persona'].str.strip().astype(bool)) | (users_df['gender'].str.strip().astype(bool))]
pd.set_option('display.max_rows', 5)

users_df

#### Prepare and Upload Data

Similar to the items dataset we created above, we can provide metadata on our users when training models in Personalize. For this workshop we will include each user's age and gender. As before, we will name the columns to match our schema, save the data as a CSV, and upload to our S3 bucket.

In [None]:
users_dataset_df = users_df[['id','age','gender']]
users_dataset_df = users_dataset_df.rename(columns = {'id':'USER_ID','age':'AGE','gender':'GENDER'}) 

users_dataset_df.to_csv(users_filename, index=False)
boto3.Session().resource('s3').Bucket(bucket).Object(users_filename).upload_file(users_filename)

### Create User-Items Interactions Dataset

To mimic user behavior, we will be generating a new dataset that represents user interactions with items.
To make the interactions more realistic, we will use the pre-defined shopper persona for each user to generate event
types for products matching that persona. We will create events for viewing products, adding products to a cart,
checking out, and completing orders. 
The effect of discounts on interactions depends on a separate "discount persona" stored against the user.
The script also makes an effort to keep interactions balanced between and within categories and products.

For more information about how this script works, navigate to the script file at [generate_interactions_personalize.py](generate_interactions_personalize.py).

In [None]:
import generate_interactions_personalize as gi

In [None]:
# Where to put the generated data
gi.GENERATED_DATA_ROOT = "./"

# Let us keep things deterministic so we can replicate the data.
gi.RANDOM_SEED = 0

# Interactions will be generated between these dates 
# (we keep things deterministic by keeping fixed dates)
gi.FIRST_TIMESTAMP = 1591803782  # 2020-06-10, 18:43:02
gi.LAST_TIMESTAMP = 1599579782  # 2020-09-08, 18:43:02

# Minimum number of interactions to generate
gi.min_interactions = 675000

# Users are set up with 3 product categories on their personas,
# such as beauty_electronics_outdoors.
# If [0.6, 0.25, 0.15], it means 60% of the time they'll want to
# choose a product from the first category (beauty in this example),
# 25% from the 2nd, etc.
gi.CATEGORY_AFFINITY_PROBS = [0.6, 0.25, 0.15]

# Probability that a product a user is already interacting with is discounted.
# The first value applies by default; the second applies to users who like
# discounts (a feature available on their simulated profile as discount_persona).
gi.DISCOUNT_PROBABILITY = 0.2
gi.DISCOUNT_PROBABILITY_WITH_PREFERENCE = 0.5

# After interacting with a product, there are this many products within 
# the category that a user is likely to jump on next.
# The purpose of this constant is to keep recommendations focused
# if there are too many products in a category. 
gi.PRODUCT_AFFINITY_N = 4

# From 0 to 1. If 0, products in busy categories are represented less. If 1, all products are represented equally.
gi.NORMALISE_PER_PRODUCT_WEIGHT = 1.0

# Show progress every 30 seconds. The script takes some time to complete.
gi.PROGRESS_MONITOR_SECONDS_UPDATE = 30

# Percentages of each event type to generate
gi.product_added_percent = .08
gi.cart_viewed_percent = .05
gi.checkout_started_percent = .02
gi.order_completed_percent = .01
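
To make `CATEGORY_AFFINITY_PROBS` more concrete, the sketch below draws categories for one persona with those weights. This is not the code in `generate_interactions_personalize.py`; it is a simplified illustration that assumes a persona is three category names joined by underscores, and the helper name `pick_category` is hypothetical.

```python
import random
from collections import Counter


def pick_category(persona, affinity_probs, rng):
    """Pick one category from a persona such as 'beauty_electronics_outdoors'."""
    categories = persona.split("_")
    return rng.choices(categories, weights=affinity_probs, k=1)[0]


rng = random.Random(0)  # seeded so the sketch is repeatable
draws = Counter(
    pick_category("beauty_electronics_outdoors", [0.6, 0.25, 0.15], rng)
    for _ in range(10_000)
)
for category, count in draws.most_common():
    print(f"{category}: {count / 10_000:.1%}")
```

Over many draws the shares should land close to 60%, 25%, and 15%. The real script also balances interactions across products and categories, so its output will not follow these weights exactly.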

In [None]:
%%time

gi.generate_interactions(interactions_filename, users_df, products_df)

#### Open and Explore the Simulated Interactions Dataset

First let us see a few lines of the raw CSV data:

In [None]:
!head -n 5 $interactions_filename

Now let us load it as a Pandas dataframe. Note:

- The `EVENT_TYPE` column, which can be used to train different Personalize campaigns and to filter recommendations.
- The custom `DISCOUNT` column, a contextual metadata field that Personalize reranking and user recommendation campaigns can take into account when predicting the best next product.

In [None]:
interactions_df = pd.read_csv(interactions_filename)
interactions_df
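
To give a sense of how the `DISCOUNT` contextual metadata might be used at inference time, the sketch below builds the keyword arguments for a `get_personalized_ranking` call on the `personalize-runtime` client, which accepts an optional `context` dictionary. No campaign exists at this point in the workshop, so the campaign ARN, user ID, and item IDs are placeholders, and the helper name `build_ranking_request` is hypothetical.

```python
def build_ranking_request(campaign_arn, user_id, item_ids, discounted=False):
    """Build keyword arguments for personalize_runtime.get_personalized_ranking."""
    request = {
        "campaignArn": campaign_arn,
        "userId": str(user_id),
        "inputList": [str(item_id) for item_id in item_ids],
    }
    if discounted:
        # Context keys are metadata field names from the interactions schema.
        request["context"] = {"DISCOUNT": "Yes"}
    return request


# Placeholder campaign ARN and IDs, for illustration only.
request = build_ranking_request(
    "arn:aws:personalize:us-east-1:123456789012:campaign/example",
    user_id=42,
    item_ids=[7, 8, 9],
    discounted=True,
)
print(request)
# With a deployed campaign: personalize_runtime.get_personalized_ranking(**request)
```

Keeping the request construction separate from the API call makes it easy to inspect what would be sent before any campaign has been created.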

Chart the counts of each `EVENT_TYPE` generated for the interactions dataset. We're simulating a site where visitors heavily view/browse products and to a lesser degree add products to their cart and checkout.

In [None]:
categorical_attributes = interactions_df.select_dtypes(include = ['object'])

plt.figure(figsize=(16,3))
chart = sns.countplot(data = categorical_attributes, x = 'EVENT_TYPE')
plt.xticks(rotation=90, horizontalalignment='right')
plt.show()
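
If you prefer numbers to a chart, the share of each event type can also be computed with the standard library. The sketch below is one way to do it, assuming rows are dictionaries with an `EVENT_TYPE` key, as `csv.DictReader` would produce. The sample rows are hypothetical and much smaller than the generated file, and the helper name `event_type_shares` is made up for this example.

```python
import csv
import io
from collections import Counter


def event_type_shares(rows):
    """Return each EVENT_TYPE's share of all rows as a dict of fractions."""
    counts = Counter(row["EVENT_TYPE"] for row in rows)
    total = sum(counts.values())
    return {event_type: count / total for event_type, count in counts.items()}


# Hypothetical rows for illustration; the real data is in interactions.csv.
sample_csv = io.StringIO(
    "USER_ID,ITEM_ID,EVENT_TYPE\n"
    "10,1,ProductViewed\n"
    "10,2,ProductViewed\n"
    "11,1,ProductViewed\n"
    "11,1,ProductAdded\n"
)
shares = event_type_shares(csv.DictReader(sample_csv))
print(shares)
```

To run it on the generated data, open `interactions_filename` and pass `csv.DictReader(f)` instead of the sample.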


Let us plot the distribution of user persona vs. product category, so that you can see which categories get assigned to which personas. Note that because the generation script tries to balance interactions between categories and products, the proportions do not exactly match those configured in 

In [None]:
merged_df = interactions_df[['USER_ID', 'ITEM_ID']].astype({'USER_ID':str, 'ITEM_ID':str})
merged_df = merged_df.merge(users_df[['id', 'persona']], left_on='USER_ID', right_on='id').drop(columns=['id', 'USER_ID'])
merged_df = merged_df.merge(products_df[['id', 'category']], left_on='ITEM_ID', right_on='id').drop(columns=['id', 'ITEM_ID'])
merged_df
plot_df = merged_df.groupby(['persona', 'category'])['category'].count().unstack()
sns.heatmap(plot_df, annot=True, fmt="g", cmap='viridis')
plt.title('Heatmap of user persona vs product category')
plt.show()

#### Discount persona vs event type distribution

Let us see how the event distribution came out. We should see a different takeup of discounts between users with different discount personas.

In [None]:
merged_df = interactions_df.loc[interactions_df.EVENT_TYPE == 'ProductAdded'][['USER_ID', 'DISCOUNT']]
merged_df = merged_df[['USER_ID', 'DISCOUNT']].astype({'USER_ID':str}).merge(users_df, left_on='USER_ID', right_on='id')
merged_df 
plot_df = merged_df.groupby(['discount_persona', 'DISCOUNT'])[['id']].count().unstack()
plot_df = plot_df.droplevel(axis='columns', level=0)
plot_df.plot.bar()
plt.title('Event types according to discount persona')
plt.show()

#### Balance over products

Let us have a careful look at product and category distributions.
The interactions generation script ensures that there are small groups of products users tend to interact with, to maintain
strong training signals. If you look at the script you will see that although we choose
products randomly within a category, they are interacted with in small random groups in the category.


In [None]:
merged_df = interactions_df[['ITEM_ID', 'USER_ID']].astype({'ITEM_ID': str}).merge(products_df, left_on='ITEM_ID', right_on='id')
plot_df = merged_df.groupby(['USER_ID', 'category']).id.apply(set)
plot_df.apply(len).value_counts().sort_index().plot.bar()
plt.xlabel('Number of different products examined by user.')
plt.title(f'We should have reduced users to a small number of products\n'
          f'maximum size should be {gi.PRODUCT_AFFINITY_N+1}')
# The peak at 1 is the male jewellery - there is only one product
plt.show()
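
The grouping behind this chart can be sketched without pandas: collect the distinct products each user touched within each category, then count how many groups have each size. This is an illustrative equivalent, not the notebook's implementation; the function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def products_per_user_category(interactions, product_categories):
    """Map (user_id, category) to the set of distinct item IDs interacted with."""
    groups = defaultdict(set)
    for user_id, item_id in interactions:
        groups[(user_id, product_categories[item_id])].add(item_id)
    return groups


# Hypothetical (user_id, item_id) pairs and item categories, for illustration only.
sample_interactions = [("u1", "p1"), ("u1", "p2"), ("u1", "p1"), ("u2", "p3")]
sample_categories = {"p1": "footwear", "p2": "footwear", "p3": "apparel"}

groups = products_per_user_category(sample_interactions, sample_categories)
size_histogram = Counter(len(items) for items in groups.values())
print(sorted(size_histogram.items()))
```

Each pair in the printed list is (number of distinct products, number of user/category groups with that many), which corresponds to the bars in the chart above.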

#### Prepare and Upload Data

Let us send our generated interactions data to S3 to be picked up by Amazon Personalize.

In [None]:
boto3.Session().resource('s3').Bucket(bucket).Object(interactions_filename).upload_file(interactions_filename)

## Configure Amazon Personalize

Now that we've prepared our three datasets and uploaded them to S3 we'll need to configure the Amazon Personalize service to understand our data so that it can be used to train models for generating recommendations.

If Personalize was enabled when you deployed the demo, the logic below was run by a polling AWS Lambda function whose code is in the file `src/aws-lambda/personalize-pre-create-campaigns/personalize-pre-create-campaigns.py`.

### Create Schemas for Datasets

Amazon Personalize requires a schema for each dataset so it can map the columns in our CSVs to fields for model training. Each schema is declared in JSON using the [Apache Avro](https://avro.apache.org/) format.

Let's define and create schemas in Personalize for our datasets.

#### Items Dataset Schema

In [None]:
items_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "CATEGORY",
            "type": "string",
            "categorical": True,
        },
        {
            "name": "STYLE",
            "type": "string",
            "categorical": True,
        }
    ],
    "version": "1.0"
}

try:
    create_schema_response = personalize.create_schema(
        name = "retaildemostore-schema-items",
        schema = json.dumps(items_schema)
    )
    items_schema_arn = create_schema_response['schemaArn']
    print(json.dumps(create_schema_response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this schema, seemingly')
    schemas = personalize.list_schemas(maxResults=100)['schemas']
    for schema_response in schemas:
        if schema_response['name'] == "retaildemostore-schema-items":
            items_schema_arn = schema_response['schemaArn']
            print(f"Using existing schema: {items_schema_arn}")
    


#### Users Dataset Schema

In [None]:
users_schema = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "AGE",
            "type": "int"
        },
        {
            "name": "GENDER",
            "type": "string",
            "categorical": True,
        }
    ],
    "version": "1.0"
}

try:
    create_schema_response = personalize.create_schema(
        name = "retaildemostore-schema-users",
        schema = json.dumps(users_schema)
    )
    print(json.dumps(create_schema_response, indent=2))
    users_schema_arn = create_schema_response['schemaArn']
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this schema, seemingly')
    schemas = personalize.list_schemas(maxResults=100)['schemas']
    for schema_response in schemas:
        if schema_response['name'] == "retaildemostore-schema-users":
            users_schema_arn = schema_response['schemaArn']
            print(f"Using existing schema: {users_schema_arn}")
    



#### Interactions Dataset Schema

In [None]:
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "EVENT_TYPE",  # "ProductViewed", "OrderCompleted", etc.
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        },
        {
            "name": "DISCOUNT",  # This is the contextual metadata - "Yes" or null.
            "type": "string"
        },
    ],
    "version": "1.0"
}

try:
    create_schema_response = personalize.create_schema(
        name = "retaildemostore-schema-interactions",
        schema = json.dumps(interactions_schema)
    )
    print(json.dumps(create_schema_response, indent=2))
    interactions_schema_arn = create_schema_response['schemaArn']
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this schema, seemingly')
    schemas = personalize.list_schemas(maxResults=100)['schemas']
    for schema_response in schemas:
        if schema_response['name'] == "retaildemostore-schema-interactions":
            interactions_schema_arn = schema_response['schemaArn']
            print(f"Using existing schema: {interactions_schema_arn}")
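
Because each schema maps CSV columns to fields, a mismatch between a CSV header and its schema is a likely cause of a failed import. The sketch below compares the two before any import is attempted. It is a local sanity check only, not part of the Personalize API; the helper name `header_problems`, the trimmed-down schema, and the sample CSV text are hypothetical.

```python
import csv
import io


def header_problems(csv_text, schema):
    """Compare a CSV header with a schema's field names.

    Returns (missing, unexpected): schema fields absent from the header,
    and header columns that the schema does not define.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    expected = [field["name"] for field in schema["fields"]]
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    return missing, unexpected


# Trimmed-down schema and hypothetical CSV text, for illustration only.
sample_schema = {"fields": [{"name": "ITEM_ID"}, {"name": "CATEGORY"}, {"name": "STYLE"}]}
print(header_problems("ITEM_ID,CATEGORY,COLOR\n1,footwear,red\n", sample_schema))
```

The same function accepts the full schema dictionaries defined above, since it only reads the `fields` list.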

### Create and Wait for Dataset Group

Next we need to create the dataset group that will contain our three datasets.

#### Create Dataset Group

In [None]:
create_dataset_group_response = personalize.create_dataset_group(
    name = 'retaildemostore'
)
dataset_group_arn = create_dataset_group_response['datasetGroupArn']
print(json.dumps(create_dataset_group_response, indent=2))

print(f'DatasetGroupArn = {dataset_group_arn}')

#### Wait for Dataset Group to Have ACTIVE Status

In [None]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = dataset_group_arn
    )
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(15)
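
The polling loop above exits silently if the three-hour limit is reached, and the same pattern recurs later for import jobs. As a sketch of one way to factor it out, the helper below polls a status callback and raises on timeout. The name `wait_for_status` and its parameters are hypothetical, and the status sequence is simulated so the example runs without AWS.

```python
import time


def wait_for_status(get_status, done_statuses=("ACTIVE", "CREATE FAILED"),
                    timeout_seconds=3 * 60 * 60, poll_seconds=15,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal status or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in done_statuses:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Simulated status sequence so the sketch runs without AWS.
fake_statuses = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
final_status = wait_for_status(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)
```

With the real client, the callback could be `lambda: personalize.describe_dataset_group(datasetGroupArn=dataset_group_arn)["datasetGroup"]["status"]`. Callers should still check whether the returned status is `ACTIVE` or `CREATE FAILED`.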

### Create Items Dataset

Next we will create the datasets in Personalize for our three dataset types. Let's start with the items dataset.

In [None]:
dataset_type = "ITEMS"
create_dataset_response = personalize.create_dataset(
    name = "retaildemostore-dataset-items",
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = items_schema_arn
)

items_dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

### Create Users Dataset

In [None]:
dataset_type = "USERS"
create_dataset_response = personalize.create_dataset(
    name = "retaildemostore-dataset-users",
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = users_schema_arn
)

users_dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

### Create Interactions Dataset

In [None]:
dataset_type = "INTERACTIONS"
create_dataset_response = personalize.create_dataset(
    name = "retaildemostore-dataset-interactions",
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = interactions_schema_arn
)

interactions_dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

## Import Datasets to Personalize

Up to this point we have generated CSVs containing data for our users, items, and interactions and staged them in an S3 bucket. We also created schemas in Personalize that define the columns in our CSVs. Then we created a dataset group and three datasets in Personalize that will receive our data. In the following steps we will create import jobs with Personalize that will import the datasets from our S3 bucket into the service.

### Setup Permissions

By default, the Personalize service does not have permission to access the data we uploaded into the S3 bucket in our account. In order to grant the Personalize service access to read our CSVs, we need to set a Bucket Policy and create an IAM role that the Amazon Personalize service will assume.

#### Attach policy to S3 bucket

In [None]:
s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket),
                "arn:aws:s3:::{}/*".format(bucket)
            ]
        }
    ]
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy));

#### Create S3 Read Only Access Role

In [None]:
iam = boto3.client("iam")

role_name = Uid+"-PersonalizeS3"
assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "personalize.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
    ]
}

try:
    create_role_response = iam.create_role(
        RoleName = role_name,
        AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
    )
except iam.exceptions.EntityAlreadyExistsException as e:
    print('Warning: role already exists:', e)
    create_role_response = iam.get_role(
        RoleName = role_name
    )

role_arn = create_role_response["Role"]["Arn"]
print('IAM Role: {}'.format(role_arn))

attach_response = iam.attach_role_policy(
    RoleName = role_name,
    PolicyArn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
)

# Pause to allow role to be fully consistent
time.sleep(30)
print('Done.')
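
The fixed 30-second pause above is a simple way to wait for the new role to propagate. An alternative, sketched here under assumptions, is to retry the first call that uses the role with increasing delays. The helper name `retry` and its parameters are hypothetical, and this notebook does not specify which exception signals a role that is not yet usable, so the caller has to supply the exception types to retry on. The flaky operation below is simulated so the example runs without AWS.

```python
import time


def retry(operation, retry_on, attempts=5, first_delay=2.0, sleep=time.sleep):
    """Call operation(), retrying on the given exception types with doubling delays."""
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= 2


# Simulated flaky call so the sketch runs without AWS: fails twice, then succeeds.
calls = {"count": 0}


def flaky_operation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("role not visible yet")
    return "created"


delays = []  # record requested delays instead of actually sleeping
result = retry(flaky_operation, retry_on=(RuntimeError,), sleep=delays.append)
print(result, delays)
```

The fixed sleep is easier to follow in a workshop, while a retry loop avoids waiting longer than necessary and copes with slower propagation.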

### Create Import Jobs

With the permissions in place to allow Personalize to access our CSV files, let's create three import jobs to import each file into its respective dataset. Each import job can take several minutes to complete so we'll create all three and then wait for them all to complete.

#### Create Items Dataset Import Job

In [None]:
items_create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "retaildemostore-dataset-items-import",
    datasetArn = items_dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, items_filename)
    },
    roleArn = role_arn
)

items_dataset_import_job_arn = items_create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(items_create_dataset_import_job_response, indent=2))

#### Create Users Dataset Import Job

In [None]:
users_create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "retaildemostore-dataset-users-import",
    datasetArn = users_dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, users_filename)
    },
    roleArn = role_arn
)

users_dataset_import_job_arn = users_create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(users_create_dataset_import_job_response, indent=2))

#### Create Interactions Dataset Import Job

In [None]:
interactions_create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "retaildemostore-dataset-interactions-import",
    datasetArn = interactions_dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, interactions_filename)
    },
    roleArn = role_arn
)

interactions_dataset_import_job_arn = interactions_create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(interactions_create_dataset_import_job_response, indent=2))

### Wait for Import Jobs to Complete

It will take 10-15 minutes for the import jobs to complete. While you're waiting, you can learn more about datasets and schemas here: https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html

We will wait for all three jobs to finish.

#### Wait for All Import Jobs to Complete

In [None]:
%%time

import_job_arns = [ items_dataset_import_job_arn, users_dataset_import_job_arn, interactions_dataset_import_job_arn ]

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    for job_arn in reversed(import_job_arns):
        import_job_response = personalize.describe_dataset_import_job(
            datasetImportJobArn = job_arn
        )
        status = import_job_response["datasetImportJob"]['status']

        if status == "ACTIVE":
            print(f'Import job {job_arn} successfully completed')
            import_job_arns.remove(job_arn)
        elif status == "CREATE FAILED":
            print(f'Import job {job_arn} failed')
            # failureReason is nested inside the datasetImportJob object
            failure_reason = import_job_response["datasetImportJob"].get('failureReason')
            if failure_reason:
                print('   Reason: ' + failure_reason)
            import_job_arns.remove(job_arn)

    if len(import_job_arns) > 0:
        print('At least one dataset import job still in progress')
        time.sleep(60)
    else:
        print("All import jobs have ended")
        break

In [None]:
%store items_dataset_arn
%store interactions_dataset_arn
%store dataset_group_arn
%store bucket
%store role_arn
%store role_name
%store products_dataset_df
%store users_dataset_df