# Welcome to UnicornFlix!

Congratulations! You have just been hired by UnicornFlix, a new direct-to-consumer streaming service that recently launched into the crowded space of Video on Demand/Free Ad Supported TV (FAST) providers. You have joined the Search & Discovery team, which leads efforts around personalization. Currently, most of the app does not provide a personalized experience: the movies are presented in the same static order for all users. To reduce customer churn, you are looking to add personalized experiences.

You’ve been asked by the founders to:

- Increase subscriber engagement by tailoring every experience to individual users
- Help users discover newly released content
- Increase the breadth of content offered to them from the UnicornFlix catalog
- Reduce the time to value by creating valuable recommendations in a short time

Throughout this workshop you will explore your datasets, build and train several recommendation models, and implement recommendations with APIs.

NOTE: Importing the datasets and training the models takes longer than the time we have in this workshop. To let you complete the workshop within the allotted time, we have already created several resources on your behalf. However, the notebooks are designed so that all of the steps are included. If a resource has already been created, the cell returns information about it; if it has not been created, the cell creates it for you.


## In this Notebook

In this notebook, you will choose a dataset and prepare it for use with Amazon Personalize.

1. [How to Use the Notebook](#usenotebook)
1. [Introduction to Amazon Personalize Datasets](#datasets)
1. [Prepare the Item Metadata](#prepare_items)
1. [Prepare the Interactions Data](#prepare_interactions)
1. [Prepare the User Metadata](#prepare_users)
1. [Creating Amazon Personalize Resources and Importing data](#import)
1. [Configure an S3 bucket and an IAM  role](#bucket_role)
1. [Create the Dataset Group](#group_dataset)
1. [Create the Interactions Schema](#interact_schema)
1. [Create the Items(Movies) Schema](#items_schema)
1. [Create the Users Schema](#users_schema)
1. [Import the interactions data](#import_interactions)
1. [Import the Item Metadata](#import_items)
1. [Import the User Metadata](#import_users)


## How to Use the Notebook <a class="anchor" id="usenotebook"></a>

The code is broken up into cells like the one below. There's a triangular Run button at the top of this page that you can click to execute each cell and move onto the next, or you can press `Shift` + `Enter` while in the cell to execute it and move onto the next one.

While a cell is executing, the indicator next to it shows an `*`. Once all the code in the cell has finished executing, the `*` is replaced by a number indicating the order in which the cell completed.

Simply follow the instructions below and execute the cells to get started with Amazon Personalize.

Python ships with a broad collection of libraries. We need to import some of those, as well as installed libraries such as [boto3](https://aws.amazon.com/sdk-for-python/) (the AWS SDK for Python) and [Pandas](https://pandas.pydata.org/)/[NumPy](https://numpy.org/), which are core data science tools.

In [None]:
# Get the latest version of botocore to ensure we have the latest features in the SDK
import sys
!{sys.executable} -m pip install --upgrade pip
!{sys.executable} -m pip install --upgrade --no-deps --force-reinstall botocore
import time
from time import sleep
import json
from datetime import datetime
import boto3
import pandas as pd
import numpy as np
data_dir = "poc_data"
!mkdir -p $data_dir

In [None]:
# Configure the SDK to Personalize:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

If this is a workshop and the resources were created for you, we will retrieve the variables of the resources created.

In [None]:
# Load the workshop parameters; the context manager closes the file afterwards
with open('../../automation/ml_ops/domain/Media-Pretrained/params.json') as f:
    parameters = json.load(f)

In [None]:
workshop_dataset_group_name = parameters['datasetGroup']['serviceConfig']['name']

interactions_schema_name = parameters['datasets']['interactions']['schema']['serviceConfig']['name']
interactions_dataset_name = parameters['datasets']['interactions']['dataset']['serviceConfig']['name']

items_schema_name = parameters['datasets']['items']['schema']['serviceConfig']['name']
items_dataset_name = parameters['datasets']['items']['dataset']['serviceConfig']['name']

users_schema_name = parameters['datasets']['users']['schema']['serviceConfig']['name']
users_dataset_name = parameters['datasets']['users']['dataset']['serviceConfig']['name']

# The following strings are the prefixes of the dataset import job names that can be created
interactions_import_job_name = 'dataset_import_interaction'
items_import_job_name = 'dataset_import_item'
users_import_job_name = 'dataset_import_user'

for recommender in parameters['recommenders']:
    # This is currently configured assuming only one recommender of each type, if there are multiple 
    # recommenders of the same type further configuration is needed.
    if (recommender['serviceConfig']['recipeArn'] == 'arn:aws:personalize:::recipe/aws-vod-more-like-x'):
        recommender_more_like_x_name =recommender['serviceConfig']['name'] 
    if (recommender['serviceConfig']['recipeArn'] == 'arn:aws:personalize:::recipe/aws-vod-top-picks'):
        recommender_top_picks_for_you_name =recommender['serviceConfig']['name']
        
for solution in parameters['solutions']:
    # This is currently configured assuming only one solution of this type, if there are multiple 
    # solutions of the same type further configuration is needed.
    if (solution['serviceConfig']['recipeArn'] == 'arn:aws:personalize:::recipe/aws-personalized-ranking'):
        workshop_rerank_solution_name = solution['serviceConfig']['name'] 
        # This is currently configured assuming only one campaign, if there are multiple campaigns 
        # further configuration is needed.
        workshop_rerank_campaign_name = solution['campaigns'][0]['serviceConfig']['name'] 
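
The loops above assume at most one recommender per recipe. One way to make that assumption explicit is to index the entries by recipe ARN and fail loudly on duplicates. The helper name `index_by_recipe` and the sample list below are hypothetical; only the `serviceConfig` / `recipeArn` / `name` layout mirrors the `params.json` structure used above.

```python
def index_by_recipe(entries):
    """Map recipe ARN -> resource name, raising if a recipe appears twice."""
    names = {}
    for entry in entries:
        config = entry["serviceConfig"]
        arn = config["recipeArn"]
        if arn in names:
            raise ValueError("More than one resource uses recipe {}".format(arn))
        names[arn] = config["name"]
    return names


# Hypothetical stand-in for parameters['recommenders']
sample_recommenders = [
    {"serviceConfig": {"name": "more-like-x",
                       "recipeArn": "arn:aws:personalize:::recipe/aws-vod-more-like-x"}},
    {"serviceConfig": {"name": "top-picks",
                       "recipeArn": "arn:aws:personalize:::recipe/aws-vod-top-picks"}},
]

recommender_names = index_by_recipe(sample_recommenders)
print(recommender_names["arn:aws:personalize:::recipe/aws-vod-top-picks"])
```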

## Introduction to Amazon Personalize Datasets <a class="anchor" id="datasets"></a>
[Back to top](#top)

Regardless of the use case, the algorithms all share a base of learning on user-item-interaction data which is defined by 3 core attributes:

1. **UserID** - The user who interacted
1. **ItemID** - The item the user interacted with
1. **Timestamp** - The time at which the interaction occurred
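
As a minimal sketch of that shape, the toy frame below holds the three core attributes for a handful of made-up interactions. The user IDs, item IDs, and timestamps are invented for illustration only; they are not taken from the workshop data.

```python
import pandas as pd

# Hypothetical interactions: who interacted, with what, and when (Unix epoch seconds)
toy_interactions = pd.DataFrame(
    {
        "USER_ID": ["1", "1", "2"],
        "ITEM_ID": ["tt0000001", "tt0000002", "tt0000001"],
        "TIMESTAMP": [1_600_000_000, 1_600_000_600, 1_600_001_200],
    }
)

# Sorting by user and time recovers each user's sequence of interactions
toy_interactions = toy_interactions.sort_values(["USER_ID", "TIMESTAMP"])
print(toy_interactions)
```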

Generally speaking your data will not arrive in a perfect form for Personalize, and will take some modification to be structured correctly. This notebook guides you through that process.

### Items data

The item data consists of information about the content that is being interacted with. This generally comes from a Content Management System (CMS). For this workshop we will use the IMDb TT ID as the common identifier between the interactions data and the content metadata. MovieLens provides its own identifier as well as the IMDb TT ID (without the leading 'tt') in the `links.csv` file. This dataset is not mandatory, but providing good item metadata will ensure the best results from your trained models.

### Interactions data

The interaction data consists of information about the interactions the users of the fictional app have with the content. This usually comes from analytics tools or Customer Data Platforms (CDPs). The best interaction data for Amazon Personalize captures the sequential order of user behavior: what content was watched or clicked on, and the order in which it was interacted with. To simulate our interaction data, we will use data from the [MovieLens project](https://grouplens.org/datasets/movielens/). MovieLens offers multiple versions of its dataset; for this workshop we will use the reduced version (approximately 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users).

### User data

The user data is the information you have about your users. It usually comes from Customer Relationship Management (CRM) or subscriber management systems. Since there is no user data included in the MovieLens data, we will generate a small synthetic dataset to simulate this component of the workshop. This dataset is not mandatory, but providing good user metadata will ensure the best results from your trained models.

In this notebook we will import interactions, user, and item data into your environment, inspect it, and convert it to a format that allows you to use it in Amazon Personalize to train models and get personalized recommendations.

The following diagram shows the resources that we will create in this section, with the part we are building in this notebook highlighted in blue with a dashed outline.

![Workflow](images/01_Data_Layer_Resources.jpg)

## Prepare the Item Metadata <a class="anchor" id="prepare_items"></a>
[Back to top](#top)

Our fictional streaming service UnicornFlix has a massive catalog of over 9,000 titles, which were acquired from many different sources. One challenge is that the catalog metadata is not standardized across all of these titles, and it is not very detailed. To provide additional metadata for Amazon Personalize to use, and to provide a consistent experience for our users, we will leverage the IMDb Essential Metadata for Movies/TV/OTT dataset, which contains:

- 9+ million titles
- 12+ million names
- Film, TV, music and celebrities
- 1 billion ratings from the world’s largest entertainment fan community

IMDb has multiple datasets available in AWS Data Exchange:
https://aws.amazon.com/marketplace/seller-profile?id=0af153a3-339f-48c2-8b42-3b9fa26d3367

For this workshop we have already extracted the data we needed and prepared it for use with the following information from the IMDb Essential Metadata for Movies/TV/OTT (Bulk data) dataset.

- `TITLE`
- `YEAR`
- `IMDB_RATING`
- `IMDB_NUMBEROFVOTES`
- `PLOT`
- `US_MATURITY_RATING_STRING`
- `US_MATURITY_RATING`
- `GENRES`

In addition, we added two fields that will help us with our fictional use case. Note: these are not derived from the IMDb dataset.

- `CREATION_TIMESTAMP`
- `PROMOTION`


NOTE: 
Your use of IMDb data is for the sole purpose of completing the AWS workshop and/or tutorial. Any use of IMDb data outside of the AWS workshop and/or tutorial requires a data license from IMDb. To obtain a data license, please contact: imdb-licensing-support@imdb.com. You will not (and will not allow a third party to) (i) use IMDb data, or any derivative works thereof, for any purpose; (ii) copy, sublicense, rent, sell, lease or otherwise transfer or distribute IMDb data or any portion thereof to any person or entity for any purpose not permitted within the workshop and/or tutorial; (iii) decompile, disassemble, or otherwise reverse engineer or attempt to reconstruct or discover any source code or underlying ideas or algorithms of IMDb data by any means whatsoever; or (iv) knowingly remove any product identification, copyright or other notices from IMDb data.

Copy the IMDB item metadata that was added to this notebook instance during automated deployment of the workshop.

In [None]:
!mkdir -p poc_data/imdb
!cp ../../automation/ml_ops/poc_data/imdb/items.csv poc_data/imdb

Next, open the IMDb `items.csv` file and take a look at the first rows. This file has information about each movie.

In [None]:
item_data = pd.read_csv(data_dir + '/imdb/items.csv', sep=',', dtype={'PROMOTION': "string"},index_col=0)
item_data.head(5)

In [None]:
item_data.describe()

This does not really tell us much about the dataset, so we will explore a bit more and look at the raw information. We can see that genres often appear in groups. That is fine for us as Personalize supports this structure.

In [None]:
item_data.info()

Now we have the catalog of titles that our service offers. We also have movies that we would like to promote in our recommendations. Since we are in Las Vegas, let's create a promotion for movies about or set in Las Vegas. First we will find the movies in our catalog whose plot mentions Las Vegas and set the `PROMOTION` metadata field to true.

In [None]:
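# Flag titles whose plot mentions Las Vegas (case-insensitive; missing plots count as no match)
mask = item_data['PLOT'].str.contains('las vegas', case=False, na=False)
# .loc accepts a boolean mask; .at only supports a single row/column label
item_data.loc[mask, 'PROMOTION'] = 'true'
item_metadata = item_data
item_data[mask]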

Let's confirm that the changes we have made have not introduced any null values.

In [None]:
item_data.isnull().sum()

Looks good, we currently have no null values.

That's it! At this point the item data is ready to go, and we just need to save it as a CSV file.

In [None]:
items_filename = "item-meta.csv"
item_data.to_csv((data_dir+"/"+items_filename), index=True, float_format='%.0f')

## Prepare the Interactions Data <a class="anchor" id="prepare_interactions"></a>
[Back to top](#top)

First, you will download the dataset from the [MovieLens project](https://grouplens.org/datasets/movielens/) website and unzip it in a new folder using the code below.

In [None]:
!cd $data_dir && wget http://files.grouplens.org/datasets/movielens/ml-latest-small.zip
!cd $data_dir && unzip -o ml-latest-small.zip 
dataset_dir = data_dir + "/ml-latest-small/"

Take a look at the data files you have downloaded.

In [None]:
!ls $dataset_dir

We can look at the README.txt file

In [None]:
!pygmentize $data_dir/ml-latest-small/README.txt

The primary data we are interested in for a recommendation use case is the actual interactions that the users had with the titles(items). 

Open the `ratings.csv` file and take a look at some rows from throughout the dataset.

In [None]:
interaction_data = pd.read_csv(dataset_dir + '/ratings.csv', sep=',', dtype={'userId': "int64", 'movieId': "str"})
interaction_data.sample(10)

To use Amazon Personalize, you need to save timestamps in Unix Epoch format.

Let's validate that the timestamp is actually in Unix epoch format by converting it into a more easily understood date/time format.

In [None]:
arb_time_stamp = interaction_data.iloc[50]['timestamp']
print('timestamp')
print(arb_time_stamp)
print()
print('Date & Time')
print(datetime.utcfromtimestamp(arb_time_stamp).strftime('%Y-%m-%d %H:%M:%S'))
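
The check above inspects a single row. A sketch of the same idea applied to a whole column is shown below on a small made-up frame; the values are hypothetical, and `pd.to_datetime(..., unit='s')` interprets each integer as seconds since the Unix epoch.

```python
import pandas as pd

# Hypothetical sample standing in for the ratings timestamps
sample = pd.DataFrame({"timestamp": [964982703, 1260759144, 1537799250]})

# Interpret every value as seconds since 1970-01-01 UTC
as_dates = pd.to_datetime(sample["timestamp"], unit="s")
print(as_dates.min(), "to", as_dates.max())

# Plausibility check: all interactions fall after the epoch and not in the future
assert (as_dates > pd.Timestamp("1970-01-01")).all()
assert (as_dates <= pd.Timestamp.now()).all()
```

If the values were in milliseconds rather than seconds, the converted dates would land far in the future and the second check would fail.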

We will do some general summarization and inspection of the data to ensure that it will be helpful for Amazon Personalize

In [None]:
interaction_data.isnull().any()

In [None]:
interaction_data.info()

As you can see, the MovieLens dataset contains a user ID, a movie ID, the rating that the user gave the movie, and the time they made this interaction. For the purposes of our fictional service UnicornFlix, this data will stand in for our application's interaction data, which in practice would be the clickstream data of the titles that were watched, in the order they were watched.

### Convert the Interactions Data

Interaction data is generally acquired from analytics or CDP platforms that can identify individual interactions with content/items within a platform.

We need to do a few things to get this dataset ready to substitute for our service's interaction data.

First off, the movieId is a unique identifier provided by MovieLens for each title. However, as we saw above, IMDb has a much richer set of metadata about the content catalog. In order to use the IMDb data we need a common identifier between our items and our interactions datasets, which is the IMDb imdbId. To do this, MovieLens provides the `links.csv` file, which helps convert between the two identifiers.

In [None]:
links = pd.read_csv(dataset_dir + '/links.csv', sep=',', usecols=[0,1], encoding='latin-1', dtype={'movieId': "str", 'imdbId': "str", 'tmdbId': "str"})
pd.set_option('display.max_rows', 25)
links['imdbId'] = 'tt' + links['imdbId'].astype(object)
links

As you can see, this provides a way to look up the IMDb ID for every title in our interactions dataset. Now we will convert the `ratings.csv` data to use the IMDb ID.

In [None]:
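# Attach the IMDb ID to every rating, then drop the MovieLens-specific movieId.
# drop() returns a new DataFrame, so the result must be assigned back.
imdb_data = interaction_data.merge(links, on='movieId')
imdb_data = imdb_data.drop(columns='movieId')
imdb_data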

Now we have an interactions dataset that matches our item catalog dataset.

### Simulating an interaction dataset

We are going to make one more modification to make the MovieLens dataset more like the analytics data that a video streaming service would see in its interactions. MovieLens is an explicit movie rating dataset, which means users are presented with a movie and asked to give it a rating. For recommendation systems and personalization, the industry has moved toward using more implicit data. There are many reasons for this, including the low number of customers who rate titles and the way customers' tastes change over time. Among the benefits of implicit interaction data are that it reflects the actual behavior of all users and that it changes over time as their viewing behavior changes.

To convert the explicit interaction MovieLens ratings dataset into our fictional streaming service UnicornFlix's implicit dataset we are going to create a synthetic dataset using the ratings in MovieLens. 

- Implicit interactions are inherently positive interactions, so we will drop any rating of 1 star or lower
- Ratings above 1 star and up to 3 stars are neutral to slightly positive; we will use these to create synthetic "Click" events to simulate a viewer clicking on a title in the UnicornFlix app
- Ratings above 3 stars are overwhelmingly positive; we will use these to create synthetic "Watch" and "Click" events to simulate a viewer both clicking on a title and watching at least 80% of a title in the UnicornFlix app

NOTE: This will be directionally accurate, but it is not a good substitute for actual temporal interaction data: the order in which viewers rated movies on the MovieLens website is not as informative as the order of interactions in an actual Video on Demand streaming app. For more information about the importance of temporal interaction data, see
https://www.amazon.science/publications/temporal-contextual-recommendation-in-real-time

In [None]:
watched_df = imdb_data.copy()
watched_df = watched_df[watched_df['rating'] > 3]
watched_df = watched_df[['userId', 'imdbId', 'timestamp']]
watched_df['EVENT_TYPE']='Watch'
watched_df.head()

In [None]:
clicked_df = imdb_data.copy()
clicked_df = clicked_df[clicked_df['rating'] > 1]
clicked_df = clicked_df[['userId', 'imdbId', 'timestamp']]
clicked_df['EVENT_TYPE']='Click'
clicked_df.head()

In [None]:
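# Combine the Click and Watch events, then order them chronologically.
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead.
interactions_df = pd.concat([clicked_df, watched_df])
interactions_df.sort_values("timestamp", axis = 0, ascending = True, 
                 inplace = True, na_position ='last')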
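
To see what this conversion produces, the sketch below applies the same two thresholds to a few made-up ratings. The frame and its values are hypothetical; only the `> 1` and `> 3` cut-offs come from the cells above.

```python
import pandas as pd

# Hypothetical ratings from one user, in half-star increments like MovieLens
toy = pd.DataFrame(
    {
        "userId": [1, 1, 1, 1],
        "imdbId": ["tt0000001", "tt0000002", "tt0000003", "tt0000004"],
        "rating": [1.0, 2.5, 3.5, 5.0],
        "timestamp": [100, 200, 300, 400],
    }
)

cols = ["userId", "imdbId", "timestamp"]
clicks = toy.loc[toy["rating"] > 1, cols].assign(EVENT_TYPE="Click")
watches = toy.loc[toy["rating"] > 3, cols].assign(EVENT_TYPE="Watch")

# A stable sort keeps Click before Watch when both share a timestamp
toy_events = pd.concat([clicks, watches]).sort_values("timestamp", kind="stable")
print(toy_events["EVENT_TYPE"].value_counts().to_dict())
```

In this toy data the 1-star rating yields no event, the 2.5-star rating yields a Click only, and the 3.5-star and 5-star ratings each yield both a Click and a Watch.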

Let's look at the new dataset and ensure that the data reflects our fictional streaming service's analytics data.

In [None]:
interactions_df

Amazon Personalize has default column names for users, items, timestamps, and event types. For the [VIDEO_ON_DEMAND domain dataset](https://docs.aws.amazon.com/personalize/latest/dg/VIDEO-ON-DEMAND-datasets-and-schemas.html), these are `USER_ID`, `ITEM_ID`, `TIMESTAMP`, and `EVENT_TYPE`. The final modification to the dataset is to replace the existing column headers with the default headers (the `EVENT_TYPE` column already uses its default name).

In [None]:
interactions_df.rename(columns = {'userId':'USER_ID', 'imdbId':'ITEM_ID', 
                              'timestamp':'TIMESTAMP'}, inplace = True) 

We'll be using a subset of the IMDb dataset for this workshop that has been cleaned to remove movies that don't have valid values for the metadata we are using in our items dataset (the same `items.csv` we worked with in the item metadata section above). We therefore need to make sure we don't have any interactions whose IMDb movie IDs are not in our subset of the IMDb dataset.

In [None]:
movies = pd.read_csv(data_dir + '/imdb' + '/items.csv', sep=',', usecols=[0,1], encoding='latin-1', dtype={'movieId': "str", 'imdbId': "str", 'tmdbId': "str"})
pd.set_option('display.max_rows', 25)

Next, let's compare the number of unique ITEM_ID keys in the IMDb data to the number of unique ITEM_ID keys in the interactions. They should be the same.

In [None]:
# Print the interactions count so it can be compared with the item metadata count below
print("Unique ITEM_IDs in interactions:", interactions_df['ITEM_ID'].nunique())
movies.nunique(axis=0)

The number of unique ITEM_IDs is not the same in the IMDb data and the interactions data, so we'll remove from the interactions dataset the data points whose ITEM_ID has no item metadata.

In [None]:
interactions_df = interactions_df.merge(movies, on='ITEM_ID')
interactions_df.info()
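
The inner merge above keeps only interactions whose `ITEM_ID` appears in the item metadata. A small sketch with made-up IDs shows the effect, along with an `isin` filter that selects the same rows without adding any columns (the two agree as long as each `ITEM_ID` appears once in the items table). All values here are hypothetical.

```python
import pandas as pd

toy_interactions = pd.DataFrame(
    {"USER_ID": [1, 1, 2], "ITEM_ID": ["tt0000001", "tt0000009", "tt0000002"]}
)
toy_items = pd.DataFrame(
    {"ITEM_ID": ["tt0000001", "tt0000002"], "TITLE": ["First", "Second"]}
)

# Inner merge: drops the interaction with tt0000009 and adds the TITLE column
merged = toy_interactions.merge(toy_items, on="ITEM_ID")

# Row filter that leaves the columns unchanged
filtered = toy_interactions[toy_interactions["ITEM_ID"].isin(toy_items["ITEM_ID"])]

print(len(toy_interactions), "->", len(merged))
```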

We will also drop the `TITLE` column as it is not required in the interactions dataset.

In [None]:
interactions_df = interactions_df.drop(columns=['TITLE'])
interactions_df.info()

That's it! At this point the data is ready to go, and we just need to save it as a CSV file.

In [None]:
interactions_filename = "interactions.csv"
interactions_df.to_csv((data_dir+"/"+interactions_filename), index=False, float_format='%.0f')

## Prepare the User Metadata <a class="anchor" id="prepare_users"></a>
[Back to top](#top)

The dataset does not have any user metadata, so we will create a synthetic metadata field as an example of the type of user metadata UnicornFlix may have in its CRM/subscriber management system. This data will be used for training the models, and it can also be used for inference filters, which will be covered in a later notebook.

In [None]:
# get all unique user ids from the interaction dataset

user_ids = interactions_df['USER_ID'].unique()
user_data = pd.DataFrame()
user_data["USER_ID"]= user_ids
user_data

### Adding User Metadata

The current dataset does not contain additional user information. For this example, we'll randomly assign a membership level. For ad-supported models, this could indicate premium versus ad-supported tiers.

NOTE: This is a synthetic dataset and, since it is randomly assigned, it will be of little value to our model. In a real-world scenario this data would reflect the actual user data.

In [None]:
possible_membership_levels = ['silver', 'gold']
random = np.random.choice(possible_membership_levels, len(user_data.index), p=[0.5, 0.5])
user_data["MEMBERLEVEL"] = random
user_data
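
Because the assignment above is random, each run produces different membership levels. If repeatable output is useful, one option is NumPy's seeded `Generator`; the seed value and the toy user IDs below are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

toy_users = pd.DataFrame({"USER_ID": [1, 2, 3, 4, 5, 6]})
levels = ["silver", "gold"]

# A fixed seed makes the "random" assignment identical on every run
rng = np.random.default_rng(seed=42)
toy_users["MEMBERLEVEL"] = rng.choice(levels, size=len(toy_users), p=[0.5, 0.5])
print(toy_users)
```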

That's it! At this point the data is ready to go, and we just need to save it as a CSV file.

In [None]:
# Saving the data as a CSV file
users_filename = "users.csv"
user_data.to_csv((data_dir+"/"+users_filename), index=False, float_format='%.0f')

# Creating Amazon Personalize Resources and Importing data <a class="anchor" id="import"></a>

## Configure an S3 bucket and an IAM  role <a class="anchor" id="bucket_role"></a>
[Back to top](#top)

So far, we have downloaded, manipulated, and saved the data onto the Amazon EBS volume attached to the instance running this Jupyter notebook.

By default, the Amazon Personalize service does not have permission to access data in an S3 bucket in our account. To grant Amazon Personalize access to read our CSVs, we need to set a bucket policy and create an IAM role that the Amazon Personalize service will assume. Let's set all of that up.

Use the metadata stored on the instance underlying this Amazon SageMaker notebook, to determine the region it is operating in. If you are using a Jupyter notebook outside of Amazon SageMaker, simply define the region as a string below. The Amazon S3 bucket needs to be in the same region as the Amazon Personalize resources we have been creating so far.

First, let us get the current notebook region. 

In [None]:
with open('/opt/ml/metadata/resource-metadata.json') as notebook_info:
    data = json.load(notebook_info)
    resource_arn = data['ResourceArn']
    region = resource_arn.split(':')[3]
print("region:", region)
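
Outside of Amazon SageMaker the metadata file does not exist, so the cell above raises `FileNotFoundError`. A sketch of a fallback is shown below; the helper name `detect_region` and its default value are assumptions for illustration, not part of the workshop code.

```python
import json
import os


def detect_region(metadata_path="/opt/ml/metadata/resource-metadata.json",
                  default="us-east-1"):
    """Return the region from the SageMaker metadata file, or `default` if absent."""
    if not os.path.exists(metadata_path):
        return default
    with open(metadata_path) as notebook_info:
        resource_arn = json.load(notebook_info)["ResourceArn"]
    # ARN layout: arn:partition:service:region:account-id:resource
    return resource_arn.split(":")[3]


print("region:", detect_region())
```

When running elsewhere, pass the region you intend to use as `default` so that the S3 bucket and the Amazon Personalize resources end up in the same region.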

Amazon S3 bucket names are globally unique. To create a unique bucket name, the code below combines your AWS account number, the region, and the string `personalizepocvod`. It then creates a bucket with this name in the region discovered in the previous cell, unless the bucket already exists.

In [None]:
s3 = boto3.client('s3')
account_id = boto3.client('sts').get_caller_identity().get('Account')
bucket_name = account_id + "-" + region + "-" + "personalizepocvod"

#getting existing buckets in the account
response = s3.list_buckets()

if bucket_name in [x['Name'] for x in response['Buckets']]:
    print("The bucket already exists.")
else:
    if region == "us-east-1":
        bucket_responese = s3.create_bucket(Bucket=bucket_name)
    else:
        bucket_responese = s3.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={'LocationConstraint': region}
            )
print('bucket_name:', bucket_name)

### Set the S3 bucket policy

Amazon Personalize needs to be able to read the contents of your S3 bucket, so add a bucket policy that allows that.

In [None]:
policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:*Object",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket_name),
                "arn:aws:s3:::{}/*".format(bucket_name)
            ]
        }
    ]
}

from botocore.exceptions import ClientError

# Start from None so the comparison below is skipped when no policy exists
bucket_current_policy = None
try:
    bucket_current_policy = s3.get_bucket_policy(Bucket=bucket_name)['Policy']
except ClientError as e:
    # A missing policy is expected on a new bucket; re-raise anything else
    if e.response['Error']['Code'] != 'NoSuchBucketPolicy':
        raise
    print("There is no current Bucket Policy for bucket " + bucket_name)

if (bucket_current_policy and policy == json.loads(bucket_current_policy)):
    print ("The policy is already associated with the S3 Bucket.")
else:
    print ("Adding the policy to the bucket.")
    print(s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy)))

### Create an IAM role

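Amazon Personalize needs the ability to assume roles in AWS in order to have the permissions to execute certain tasks. Let's create an IAM role and attach the required policies to it so it can access data from S3. The code below creates the role, or retrieves it if it already exists, and then attaches the `AmazonS3FullAccess` managed policy.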

In [None]:
iam = boto3.client("iam")

role_name = account_id+"-PersonalizeS3-Immersion-Day"

assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "personalize.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
    ]
}

# Create or retrieve the role:
try:
    create_role_response = iam.create_role(
        RoleName = role_name,
        AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
    );
    role_arn = create_role_response["Role"]["Arn"]
    
except iam.exceptions.EntityAlreadyExistsException as e:
    print('Warning: role already exists: {}\n'.format(e))
    role_arn = iam.get_role(
        RoleName = role_name
    )["Role"]["Arn"];

print('IAM Role: {}\n'.format(role_arn))


# Attach the policy if it is not previously attached:
policy_arn = "arn:aws:iam::aws:policy/AmazonS3FullAccess"

if (policy_arn in [ x['PolicyArn'] for x in iam.list_attached_role_policies( RoleName = role_name)['AttachedPolicies']]):
    print ('The policy {} is already attached to this role.'.format(policy_arn))
else:
    print ("Attaching the role_policy")
    attach_response = iam.attach_role_policy(
        RoleName = role_name,
        PolicyArn = "arn:aws:iam::aws:policy/AmazonS3FullAccess"
    );
    print ("30s pause to allow role to be fully consistent.")
    time.sleep(30)
    print('Done.')

### Upload data to S3

Now that your Amazon S3 bucket has been created, upload the CSV files of our 3 datasets (Item, Interaction and User).

NOTE: We will cover real-time data in a future notebook.

In [None]:
interactions_file_path = data_dir + "/" + interactions_filename

try:
    s3.get_object(
        Bucket=bucket_name,
        Key=interactions_filename,
    )
    print("{} already exists in the bucket {}".format(interactions_file_path, bucket_name))
except s3.exceptions.NoSuchKey:
    boto3.Session().resource('s3').Bucket(bucket_name).Object(interactions_filename).upload_file(interactions_file_path)
    print("File {} uploaded to bucket {}".format(interactions_filename, bucket_name))

items_file_path = data_dir + "/" + items_filename
try:
    s3.get_object(
        Bucket=bucket_name,
        Key=items_filename,
    )
    print("{} already exists in the bucket {}".format(items_file_path, bucket_name))
except s3.exceptions.NoSuchKey:
    boto3.Session().resource('s3').Bucket(bucket_name).Object(items_filename).upload_file(items_file_path)
    print("File {} uploaded to bucket {}".format(items_filename, bucket_name))

users_file_path = data_dir + "/" + users_filename
try:
    s3.get_object(
        Bucket=bucket_name,
        Key=users_filename,
    )
    print("{} already exists in the bucket {}".format(users_file_path, bucket_name))
except s3.exceptions.NoSuchKey:
    boto3.Session().resource('s3').Bucket(bucket_name).Object(users_filename).upload_file(users_file_path)
    print("File {} uploaded to bucket {}".format(users_filename, bucket_name))
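
The three upload blocks above follow the same pattern, which could be factored into one helper. The sketch below is one possible shape; the function name `upload_if_missing` is hypothetical, and it uses `head_object` (which checks for a key without downloading the object) instead of `get_object`. It assumes `s3_client` is a boto3 S3 client.

```python
def upload_if_missing(s3_client, bucket, key, local_path):
    """Upload local_path to s3://bucket/key unless the key already exists.

    Returns True if the file was uploaded, False if it was already present.
    """
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        print("{} already exists in the bucket {}".format(key, bucket))
        return False
    except s3_client.exceptions.ClientError as e:
        # head_object reports a missing key as a 404 error code
        if e.response["Error"]["Code"] not in ("404", "NoSuchKey", "NotFound"):
            raise
    s3_client.upload_file(local_path, bucket, key)
    print("File {} uploaded to bucket {}".format(key, bucket))
    return True
```

With the variables defined earlier, a call might look like `upload_if_missing(s3, bucket_name, interactions_filename, interactions_file_path)`, repeated for the items and users files.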

## Create the Dataset Group <a class="anchor" id="group_dataset"></a>
[Back to top](#top)

The highest level of isolation and abstraction with Amazon Personalize is a *dataset group*. Information stored within one of these dataset groups has no impact on any other dataset group or models created from one - they are completely isolated. This allows you to run many experiments and is part of how we keep your models private and fully trained only on your data. 

Before importing the data prepared earlier, there needs to be a dataset group and a dataset added to it that handles the interactions.

Dataset groups can house the following types of information:

* User-item-interactions
* Event streams (real-time interactions)
* User metadata
* Item metadata

We need to create the dataset group that will contain our three datasets.

Your dataset group can be one of the following types:

* A Domain dataset group, where you create preconfigured resources for different business domains and use cases, such as getting recommendations for similar videos (VIDEO_ON_DEMAND domain) or best selling items (ECOMMERCE domain). You choose your business domain, import your data, and create recommenders. You use recommenders in your application to get recommendations. Use a [Domain dataset group](https://docs.aws.amazon.com/personalize/latest/dg/domain-dataset-groups.html) if you have a video on demand or e-commerce application and want Amazon Personalize to find the best configurations for your use cases. If you start with a Domain dataset group, you can also add custom resources such as solutions with solution versions trained with recipes for custom use cases.


* A [Custom dataset group](https://docs.aws.amazon.com/personalize/latest/dg/custom-dataset-groups.html), where you create configurable resources for custom use cases and batch recommendation workflows. You choose a recipe, train a solution version (model), and deploy the solution version with a campaign. You use a campaign in your application to get recommendations. Use a Custom dataset group if you don't have a video on demand or e-commerce application or want to configure and manage only custom resources, or want to get recommendations in a batch workflow. If you start with a Custom dataset group, you can't associate it with a domain later. Instead, create a new Domain dataset group.

You can create and manage Domain dataset groups and Custom dataset groups with the AWS console, the AWS Command Line Interface (AWS CLI), or programmatically with the AWS SDKs.

### Create the Dataset Group

The following cell will create a new dataset group using the name stored in `workshop_dataset_group_name`, or reuse the existing one if a dataset group with that name already exists.

In [None]:
try: 
    create_dataset_group_response = personalize.create_dataset_group(
        name = workshop_dataset_group_name,
        domain='VIDEO_ON_DEMAND'
    )

    workshop_dataset_group_arn = create_dataset_group_response['datasetGroupArn']
    print(json.dumps(create_dataset_group_response, indent=2))
    print ('\nCreating the Dataset Group with dataset_group_arn = {}'.format(workshop_dataset_group_arn))

except personalize.exceptions.ResourceAlreadyExistsException as e:
    workshop_dataset_group_arn = 'arn:aws:personalize:'+region+':'+account_id+':dataset-group/'+workshop_dataset_group_name 
    print ('\nThe Dataset Group with dataset_group_arn = {} already exists'.format(workshop_dataset_group_arn))
    print ('\nWe will be using the existing Dataset Group dataset_group_arn = {}'.format(workshop_dataset_group_arn))


#### Wait for Dataset Group to have ACTIVE Status 

Before we can use the Dataset Group to create more resources below, it must be active. This can take a minute or two. Execute the cell below and wait for it to show the ACTIVE status. It checks the status of the dataset group every 30 seconds, up to a maximum of 3 hours.

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = workshop_dataset_group_arn
    )
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(30)

Now that you have a dataset group, you can create a dataset for the interaction data.
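
The polling pattern used above for the dataset group recurs several times in this notebook. As a minimal sketch, it could be factored into a helper like the one below. The name `wait_for_terminal_status` and its parameters are assumptions made for illustration, and the status source is faked so the example runs without AWS access.

```python
import time

# Statuses at which polling stops (the same two values checked in this notebook).
TERMINAL_STATUSES = ("ACTIVE", "CREATE FAILED")


def wait_for_terminal_status(get_status, poll_seconds=30, timeout_seconds=3 * 60 * 60,
                             sleep=time.sleep, clock=time.time):
    """Poll get_status() until it returns a terminal status or the timeout passes.

    get_status is any zero-argument callable that returns a status string.
    Returns the last status seen, which may be non-terminal after a timeout.
    """
    deadline = clock() + timeout_seconds
    status = get_status()
    while status not in TERMINAL_STATUSES and clock() < deadline:
        sleep(poll_seconds)
        status = get_status()
    return status


# Stand-in for a real describe call: a fixed sequence of statuses.
_fake_statuses = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
final_status = wait_for_terminal_status(
    lambda: next(_fake_statuses),
    sleep=lambda seconds: None,  # skip real waiting in this illustration
)
print("DatasetGroup: {}".format(final_status))
```

In the notebook, `get_status` could wrap the existing call, for example `lambda: personalize.describe_dataset_group(datasetGroupArn=workshop_dataset_group_arn)["datasetGroup"]["status"]`.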

## Create the Interactions Schema <a class="anchor" id="interact_schema"></a>
[Back to top](#top)

Now that we've loaded and prepared our three datasets, we'll configure Amazon Personalize to understand our data so that it can be used to train models for generating recommendations. Amazon Personalize requires a schema for each dataset so it can map the columns in our CSVs to fields for model training. Each schema is declared in JSON using the [Apache Avro](https://avro.apache.org/) format. 

First, define a schema to tell Amazon Personalize what type of dataset you are uploading. There are several mandatory fields that are required in the schema, depending on the type of dataset. More detailed information can be found in the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html).

The interactions dataset has three required columns: `ITEM_ID`, `USER_ID`, and `TIMESTAMP`. The `TIMESTAMP` represents when the user interacted with an item and must be expressed in Unix timestamp format (seconds). For this dataset we also have an `EVENT_TYPE` column. The fields must be defined in the schema in the same order as the columns appear in the dataset.
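
Before creating the schema, it can help to check a CSV against these two rules (column order and timestamps in seconds). The sketch below is one possible check, not part of the workshop code: `check_interactions_csv`, the `MAX_PLAUSIBLE_SECONDS` threshold, and the sample rows are all assumptions made for illustration.

```python
import csv
import io

# Field order copied from the interactions schema used in this notebook.
SCHEMA_FIELDS = ["USER_ID", "ITEM_ID", "EVENT_TYPE", "TIMESTAMP"]

# Assumed threshold: Unix timestamps in seconds stay below 1e11 for roughly
# three thousand more years, while millisecond timestamps for recent dates exceed 1e12.
MAX_PLAUSIBLE_SECONDS = 10 ** 11


def check_interactions_csv(csv_text, schema_fields=SCHEMA_FIELDS):
    """Return a list of problems found in the CSV text (empty list if none)."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != schema_fields:
        problems.append("header {} does not match schema order {}".format(
            reader.fieldnames, schema_fields))
        return problems
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        value = row["TIMESTAMP"] or ""
        if not value.isdigit():
            problems.append("line {}: TIMESTAMP {!r} is not an integer".format(line_number, value))
        elif int(value) > MAX_PLAUSIBLE_SECONDS:
            problems.append("line {}: TIMESTAMP {} looks like milliseconds".format(line_number, value))
    return problems


# Made-up sample rows for illustration only.
sample = (
    "USER_ID,ITEM_ID,EVENT_TYPE,TIMESTAMP\n"
    "1,101,Watch,1700000000\n"
    "2,102,Click,1700000000000\n"
)
for problem in check_interactions_csv(sample):
    print(problem)
```

The same function could be pointed at the real file by reading it into a string first, at the cost of loading the whole file into memory.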

In [None]:
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "EVENT_TYPE", # "Watch", "Click", etc.
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        }
    ],
    "version": "1.0"
}


try:
    create_schema_response = personalize.create_schema(
        name = interactions_schema_name,
        schema = json.dumps(interactions_schema),
        domain='VIDEO_ON_DEMAND'
    )
    print(json.dumps(create_schema_response, indent=2))
    workshop_interactions_schema_arn = create_schema_response['schemaArn']
    print ('\nCreating the Interactions Schema with workshop_interactions_schema_arn = {}'.format(workshop_interactions_schema_arn))
    
except personalize.exceptions.ResourceAlreadyExistsException:
    workshop_interactions_schema_arn = 'arn:aws:personalize:'+region+':'+account_id+':schema/'+interactions_schema_name 
    print('The schema {} already exists.'.format(workshop_interactions_schema_arn))
    print ('\nWe will be using the existing Interactions Schema with workshop_interactions_schema_arn = {}'.format(workshop_interactions_schema_arn))
 

### Create the Interactions Dataset

With a schema created, you can create a dataset within the dataset group. Note that this does not load the data yet; it only creates an empty dataset whose structure is defined by the schema.
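
When a resource already exists, the cells in this notebook rebuild its ARN from `region`, `account_id`, and the resource name by string concatenation. A small helper could make that pattern less error-prone. The function name `personalize_arn` and the placeholder values below are assumptions for illustration; the ARN layout follows the strings used in this notebook.

```python
def personalize_arn(region, account_id, resource_path):
    """Build an Amazon Personalize ARN from its parts.

    resource_path is the part after the account ID, for example
    'dataset-group/<name>', 'schema/<name>' or 'dataset/<group name>/INTERACTIONS'.
    """
    return "arn:aws:personalize:{}:{}:{}".format(region, account_id, resource_path)


# Placeholder values for illustration; the notebook defines its own
# region, account_id and workshop_dataset_group_name.
example_region = "us-east-1"
example_account_id = "123456789012"
example_group_name = "example-dataset-group"

group_arn = personalize_arn(example_region, example_account_id,
                            "dataset-group/" + example_group_name)
dataset_arn = personalize_arn(example_region, example_account_id,
                              "dataset/" + example_group_name + "/INTERACTIONS")
print(group_arn)
print(dataset_arn)
```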

In [None]:
try:
    dataset_type = 'INTERACTIONS'
    create_dataset_response = personalize.create_dataset(
        name = interactions_dataset_name,
        datasetType = dataset_type,
        datasetGroupArn = workshop_dataset_group_arn,
        schemaArn = workshop_interactions_schema_arn
    )

    workshop_interactions_dataset_arn = create_dataset_response['datasetArn']
    print(json.dumps(create_dataset_response, indent=2))
    print ('\nCreating the Interactions Dataset with workshop_interactions_dataset_arn = {}'.format(workshop_interactions_dataset_arn))
    
except personalize.exceptions.ResourceAlreadyExistsException:
    workshop_interactions_dataset_arn =  'arn:aws:personalize:'+region+':'+account_id+':dataset/'+workshop_dataset_group_name+'/INTERACTIONS'
    print('The Interactions Dataset {} already exists.'.format(workshop_interactions_dataset_arn))
    print ('\nWe will be using the existing Interactions Dataset with workshop_interactions_dataset_arn = {}'.format(workshop_interactions_dataset_arn))
        

## Create the Items (Movies) Schema<a class="anchor" id="items_schema"></a>
[Back to top](#top)

First, we define a schema to tell Amazon Personalize what type of dataset we are uploading. There are several reserved and mandatory keywords required in the schema, based on the type of dataset. More detailed information can be found in the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html).

Our item metadata has the following columns: `ITEM_ID`, `TITLE`, `YEAR`, `IMDB_RATING`, `IMDB_NUMBEROFVOTES`, `PLOT`, `US_MATURITY_RATING_STRING`, `US_MATURITY_RATING`, `GENRES`, `CREATION_TIMESTAMP`, and `PROMOTION`. The fields must be defined in the schema in the same order as the columns appear in the dataset.
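
The schema cells in this notebook define each schema as a Python dictionary and serialize it with `json.dumps` before passing it to `create_schema`. The sketch below uses a shortened, illustrative version of the items schema (only three of the fields) to show what that serialization does with flags such as `"categorical": True`, and how the field order can be read back for comparison with CSV headers.

```python
import json

# Shortened, illustrative schema: only three of the item fields are shown.
example_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "ITEM_ID", "type": "string"},
        {"name": "GENRES", "type": "string", "categorical": True},
        {"name": "PLOT", "type": "string", "textual": True},
    ],
    "version": "1.0",
}

schema_json = json.dumps(example_schema)

# json.dumps writes Python's True as the JSON literal true.
has_json_true = '"categorical": true' in schema_json
print(has_json_true)

# Field names in schema order, useful for comparing against CSV column headers.
field_names = [field["name"] for field in json.loads(schema_json)["fields"]]
print(field_names)
```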

In [None]:
items_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "TITLE",
            "type": "string"
        },
        {
            "name": "YEAR",
            "type": "int"
        },
        {
            "name": "IMDB_RATING",
            "type": "int"
        },
        {
            "name": "IMDB_NUMBEROFVOTES",
            "type": "int"
        },
        {
            "name": "PLOT",
            "type": "string",
            "textual": True
        },
        {
            "name": "US_MATURITY_RATING_STRING",
            "type": "string"
        },
        {
            "name": "US_MATURITY_RATING",
            "type": "int"
        },
        {
            "name": "GENRES",
            "type": "string",
            "categorical": True
        },
        {
            "name": "CREATION_TIMESTAMP",
            "type": "long"
        },
        {
            "name": "PROMOTION",
            "type": "string"
        }
    ],
    "version": "1.0"
}

try:
    create_schema_response = personalize.create_schema(
        name = items_schema_name,
        schema = json.dumps(items_schema),
        domain='VIDEO_ON_DEMAND'
    )
    workshop_items_schema_arn = create_schema_response['schemaArn']
    print(json.dumps(create_schema_response, indent=2))

    print ('\nCreating the Items Schema with workshop_items_schema_arn = {}'.format(workshop_items_schema_arn))
    
except personalize.exceptions.ResourceAlreadyExistsException:
    workshop_items_schema_arn = 'arn:aws:personalize:'+region+':'+account_id+':schema/'+items_schema_name 
    print('The schema {} already exists.'.format(workshop_items_schema_arn))
    print ('\nWe will be using the existing Items Schema with workshop_items_schema_arn = {}'.format(workshop_items_schema_arn))
 

### Create the Items Dataset

With a schema created, you can create a dataset within the dataset group. Note that this does not load the data yet; it only creates an empty dataset whose structure is defined by the schema.

In [None]:
try:
    
    dataset_type = "ITEMS"
    create_dataset_response = personalize.create_dataset(
        name = items_dataset_name,
        datasetType = dataset_type,
        datasetGroupArn = workshop_dataset_group_arn,
        schemaArn = workshop_items_schema_arn
    )

    workshop_items_dataset_arn = create_dataset_response['datasetArn']
    print(json.dumps(create_dataset_response, indent=2))

    print ('\nCreating the Items Dataset with workshop_items_dataset_arn = {}'.format(workshop_items_dataset_arn))
    
except personalize.exceptions.ResourceAlreadyExistsException:
    workshop_items_dataset_arn =  'arn:aws:personalize:'+region+':'+account_id+':dataset/'+workshop_dataset_group_name+'/ITEMS'
    print('The Items Dataset {} already exists.'.format(workshop_items_dataset_arn))
    print ('\nWe will be using the existing Items Dataset with workshop_items_dataset_arn = {}'.format(workshop_items_dataset_arn))   

## Create the Users Schema<a class="anchor" id="users_schema"></a>
[Back to top](#top)

First, define a schema to tell Amazon Personalize what type of dataset you are uploading. There are several reserved and mandatory keywords required in the schema, based on the type of dataset. More detailed information can be found in the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html).

Here, you will create a schema for user data, which requires the `USER_ID` field and an additional metadata field, in this case `MEMBERLEVEL`. The fields must be defined in the schema in the same order as the columns appear in the dataset.

In [None]:
users_schema = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "MEMBERLEVEL",
            "type": "string",
            "categorical": True
        }
    ],
    "version": "1.0"
}
    
try:
    create_schema_response = personalize.create_schema(
        name = users_schema_name,
        schema = json.dumps(users_schema),
        domain='VIDEO_ON_DEMAND'
    )
    
    workshop_users_schema_arn = create_schema_response['schemaArn']
    print(json.dumps(create_schema_response, indent=2))

    print ('\nCreating the Users Schema with workshop_users_schema_arn = {}'.format(workshop_users_schema_arn))
    
except personalize.exceptions.ResourceAlreadyExistsException:
    workshop_users_schema_arn = 'arn:aws:personalize:'+region+':'+account_id+':schema/'+users_schema_name 
    print('The schema {} already exists.'.format(workshop_users_schema_arn))
    print ('\nWe will be using the existing Users Schema with workshop_users_schema_arn = {}'.format(workshop_users_schema_arn))
 

### Create the Users Dataset

With a schema created, you can create a dataset within the dataset group. Note that this does not load the data yet; it only creates an empty dataset whose structure is defined by the schema.

In [None]:
try:
    dataset_type = "USERS"
    create_dataset_response = personalize.create_dataset(
        name = users_dataset_name,
        datasetType = dataset_type,
        datasetGroupArn = workshop_dataset_group_arn,
        schemaArn = workshop_users_schema_arn
    )

    workshop_users_dataset_arn = create_dataset_response['datasetArn']
    print(json.dumps(create_dataset_response, indent=2))

    print ('\nCreating the Users Dataset with workshop_users_dataset_arn = {}'.format(workshop_users_dataset_arn))
    
except personalize.exceptions.ResourceAlreadyExistsException:
    workshop_users_dataset_arn =  'arn:aws:personalize:'+region+':'+account_id+':dataset/'+workshop_dataset_group_name+'/USERS'
    print('The Users Dataset {} already exists.'.format(workshop_users_dataset_arn))
    print ('\nWe will be using the existing Users Dataset with workshop_users_dataset_arn = {}'.format(workshop_users_dataset_arn))

Let's wait until all the datasets have been created.

In [None]:
%%time

max_time = time.time() + 6*60*60 # 6 hours
while time.time() < max_time:
    describe_dataset_response = personalize.describe_dataset(
        datasetArn = workshop_interactions_dataset_arn
    )
    status_interaction_dataset =  describe_dataset_response["dataset"]['status']
    print("Interactions Dataset: {}".format(status_interaction_dataset))
    
    if status_interaction_dataset == "ACTIVE":
        print("Build succeeded for {}".format(workshop_interactions_dataset_arn))
        
    elif status_interaction_dataset == "CREATE FAILED":
        print("Build failed for {}".format(workshop_interactions_dataset_arn))
        break
        
    if not status_interaction_dataset == "ACTIVE":
        print("The interaction dataset creation is still in progress")
    else:
        print("The interaction dataset  is ACTIVE")
        

    describe_dataset_response = personalize.describe_dataset(
        datasetArn = workshop_items_dataset_arn
    )
    status_item_dataset =  describe_dataset_response["dataset"]['status']
    print("Items Dataset: {}".format(status_item_dataset))
    
    if status_item_dataset == "ACTIVE":
        print("Build succeeded for {}".format(workshop_items_dataset_arn))
        
    elif status_item_dataset == "CREATE FAILED":
        print("Build failed for {}".format(workshop_items_dataset_arn))
        break
        
    if not status_item_dataset == "ACTIVE":
        print("The item dataset creation is still in progress")
    else:
        print("The item dataset  is ACTIVE")
    
    describe_dataset_response = personalize.describe_dataset(
        datasetArn = workshop_users_dataset_arn
    )
    status_user_dataset =  describe_dataset_response["dataset"]['status']
    print("Users Dataset: {}".format(status_user_dataset))
    
    if status_user_dataset == "ACTIVE":
        print("Build succeeded for {}".format(workshop_users_dataset_arn))
        
    elif status_user_dataset == "CREATE FAILED":
        print("Build failed for {}".format(workshop_users_dataset_arn))
        break
        
    if not status_user_dataset == "ACTIVE":
        print("The user dataset creation is still in progress")
    else:
        print("The user dataset  is ACTIVE")
    
    if status_interaction_dataset == "ACTIVE" and status_item_dataset == "ACTIVE" and status_user_dataset == 'ACTIVE':
        break
        
    time.sleep(30)

## Import the Item Metadata <a class="anchor" id="import_items"></a>
[Back to top](#top)

Earlier you created the dataset group and dataset to house your information. Now you will execute an import job that loads the item data from the S3 bucket into the Amazon Personalize dataset. 

In [None]:
# Checking if the import job already exists

# List the import jobs
items_dataset_import_jobs = personalize.list_dataset_import_jobs(
    datasetArn=workshop_items_dataset_arn,
    maxResults=100
)['datasetImportJobs']

job_exists = False
job_arn = None

#check if there is an existing job with the prefix
for job in items_dataset_import_jobs:
    if (items_import_job_name in job['jobName']):
        job_exists = True
        job_arn = job['datasetImportJobArn']
    
if (job_exists):
    workshop_items_dataset_import_job_arn =  job_arn
    print('The Items Import Job {} already exists.'.format(workshop_items_dataset_import_job_arn))
    print ('\nWe will be using the existing Items Import Job with workshop_items_dataset_import_job_arn = {}'.format(workshop_items_dataset_import_job_arn))
        
else:
    # If there is no import job with the prefix, create it:    
    create_dataset_import_job_response = personalize.create_dataset_import_job(
        jobName = items_import_job_name,
        datasetArn = workshop_items_dataset_arn,
        dataSource = {
            "dataLocation": "s3://{}/{}".format(bucket_name, items_filename)
        },
        roleArn = role_arn
    )

    workshop_items_dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
    print(json.dumps(create_dataset_import_job_response, indent=2))
    print ('\nImporting the Items Data with workshop_items_dataset_import_job_arn = {}'.format(workshop_items_dataset_import_job_arn))
    
    

## Import the Interactions Data <a class="anchor" id="import_interactions"></a>
[Back to top](#top)

Earlier you created the dataset group and dataset to house your information, so now you will execute an import job that will load the interactions data from the S3 bucket into the Amazon Personalize dataset. 
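
Each import cell in this notebook first looks for an existing import job whose name contains the expected job name. As a sketch, that lookup could be written as a small function over the list returned under `datasetImportJobs`. The name `find_import_job` and the sample job entries are assumptions for illustration.

```python
def find_import_job(jobs, job_name):
    """Return the ARN of the first job whose name contains job_name, else None.

    jobs is a list of dicts with 'jobName' and 'datasetImportJobArn' keys,
    shaped like the 'datasetImportJobs' entries used in this notebook.
    """
    for job in jobs:
        if job_name in job["jobName"]:
            return job["datasetImportJobArn"]
    return None


# Made-up entries for illustration only.
example_jobs = [
    {"jobName": "example-items-import",
     "datasetImportJobArn": "arn:aws:personalize:us-east-1:123456789012:dataset-import-job/example-items-import"},
    {"jobName": "example-users-import",
     "datasetImportJobArn": "arn:aws:personalize:us-east-1:123456789012:dataset-import-job/example-users-import"},
]

existing_arn = find_import_job(example_jobs, "example-users-import")
missing_arn = find_import_job(example_jobs, "example-interactions-import")
print(existing_arn)
print(missing_arn)
```

Note one difference from the notebook cells: their loops do not stop at the first match, so they keep the last matching job, while this sketch returns the first.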

In [None]:
# Check if the import job already exists

# List the import jobs
interactions_dataset_import_jobs = personalize.list_dataset_import_jobs(
    datasetArn=workshop_interactions_dataset_arn,
    maxResults=100
)['datasetImportJobs']

#check if there is an existing job with the prefix
job_exists = False  
job_arn = None

for job in interactions_dataset_import_jobs:
    if (interactions_import_job_name in job['jobName']):
        job_exists = True
        job_arn = job['datasetImportJobArn']
    
if (job_exists):
    workshop_interactions_dataset_import_job_arn = job_arn
    print('The Interactions Import Job {} already exists.'.format(workshop_interactions_dataset_import_job_arn))
    print ('\nWe will be using the existing Interactions Import Job with workshop_interactions_dataset_import_job_arn = {}'.format(workshop_interactions_dataset_import_job_arn))
        
else:
    # If there is no import job with the prefix, create it:   
    create_dataset_import_job_response = personalize.create_dataset_import_job(
        jobName = interactions_import_job_name,
        datasetArn = workshop_interactions_dataset_arn,
        dataSource = {
            "dataLocation": "s3://{}/{}".format(bucket_name, interactions_filename)
        },
        roleArn = role_arn
    )
    workshop_interactions_dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
    print(json.dumps(create_dataset_import_job_response, indent=2))
    
    print ('\nImporting the Interactions Data with workshop_interactions_dataset_import_job_arn = {}'.format(workshop_interactions_dataset_import_job_arn))


## Import the User Metadata <a class="anchor" id="import_users"></a>
[Back to top](#top)

Earlier you created the dataset group and dataset to house your information. Now you will execute an import job that loads the user data from the S3 bucket into the Amazon Personalize dataset. 

In [None]:
# Checking if the import job already exists

# List the import jobs
users_dataset_import_jobs = personalize.list_dataset_import_jobs(
    datasetArn=workshop_users_dataset_arn,
    maxResults=100
)['datasetImportJobs']

#check if there is an existing job with the prefix
job_exists = False 
job_arn = None      
for job in users_dataset_import_jobs:
    if (users_import_job_name in job['jobName']):
        job_exists = True
        job_arn = job['datasetImportJobArn']

if (job_exists):
    workshop_users_dataset_import_job_arn =  job_arn
    print('The Users Import Job {} already exists.'.format(workshop_users_dataset_import_job_arn))
    print ('\nWe will be using the existing Users Import Job with workshop_users_dataset_import_job_arn = {}'.format(workshop_users_dataset_import_job_arn))
        
else:
    # If there is no import job with the prefix, create it:  
    create_dataset_import_job_response = personalize.create_dataset_import_job(
        jobName = users_import_job_name,
        datasetArn = workshop_users_dataset_arn,
        dataSource = {
            "dataLocation": "s3://{}/{}".format(bucket_name, users_filename)
        },
        roleArn = role_arn
    )

    workshop_users_dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
    print(json.dumps(create_dataset_import_job_response, indent=2))
    
    print ('\nImporting the Users Data with workshop_users_dataset_import_job_arn = {}'.format(workshop_users_dataset_import_job_arn))
    
    

Before we can use the datasets, the import jobs must be active. Execute the cell below and wait for it to show the ACTIVE status. It checks the status of the import jobs every 30 seconds, up to a maximum of 6 hours.

Importing the data can take some time, depending on the size of the dataset. In this workshop, the data import job has already been done for you. If you are not using the workshop environment, this should take around 15 minutes. While you're waiting you can learn more about Datasets and Schemas in [the documentation](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html). 
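
The wait cells in this notebook repeat the same status check for each resource. As a sketch, the pattern can be collapsed into one loop over a dictionary of status getters. The names `wait_for_all` and `fake_getter` are assumptions for illustration, and the statuses are faked so the example runs without AWS access.

```python
import time


def wait_for_all(status_getters, poll_seconds=30, timeout_seconds=6 * 60 * 60,
                 sleep=time.sleep, clock=time.time):
    """Poll until all getters report ACTIVE, one reports CREATE FAILED, or time runs out.

    status_getters maps a label to a zero-argument callable returning a status string.
    Returns a dict with the last status seen for each label.
    """
    deadline = clock() + timeout_seconds
    while True:
        statuses = {label: get_status() for label, get_status in status_getters.items()}
        for label, status in statuses.items():
            print("{}: {}".format(label, status))
        all_active = all(status == "ACTIVE" for status in statuses.values())
        any_failed = any(status == "CREATE FAILED" for status in statuses.values())
        if all_active or any_failed or clock() >= deadline:
            return statuses
        sleep(poll_seconds)


def fake_getter(sequence):
    """Return a callable that walks through sequence, then repeats the last value."""
    remaining = iter(sequence)
    last = [None]

    def get_status():
        last[0] = next(remaining, last[0])
        return last[0]

    return get_status


result = wait_for_all(
    {
        "interactions import": fake_getter(["CREATE IN_PROGRESS", "ACTIVE"]),
        "items import": fake_getter(["ACTIVE"]),
        "users import": fake_getter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"]),
    },
    sleep=lambda seconds: None,  # skip real waiting in this illustration
)
```

In the notebook, each getter could wrap `personalize.describe_dataset_import_job(...)["datasetImportJob"]["status"]` for one of the three import job ARNs.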

We need to wait for the data imports to complete.

In [None]:
max_time = time.time() + 6*60*60 # 6 hours
while time.time() < max_time:

    # Interactions dataset import
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = workshop_interactions_dataset_import_job_arn
    )
    status_interactions_import = describe_dataset_import_job_response["datasetImportJob"]['status']
    
    if status_interactions_import == "ACTIVE":
        print("Build succeeded for {}".format(workshop_interactions_dataset_import_job_arn))
        
    elif status_interactions_import == "CREATE FAILED":
        print("Build failed for {}".format(workshop_interactions_dataset_import_job_arn))
        break
        
    if not status_interactions_import == "ACTIVE":
        print("The interactions dataset import is still in progress")
    else:
        print("The interactions dataset import is ACTIVE")

    # Items dataset import
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = workshop_items_dataset_import_job_arn
    )
    status_items_import = describe_dataset_import_job_response["datasetImportJob"]['status']
    
    if status_items_import == "ACTIVE":
        print("Build succeeded for {}".format(workshop_items_dataset_import_job_arn))
        
    elif status_items_import == "CREATE FAILED":
        print("Build failed for {}".format(workshop_items_dataset_import_job_arn))
        break
        
    if not status_items_import == "ACTIVE":
        print("The items dataset import is still in progress")
    else:
        print("The items dataset import is ACTIVE")
        
        
    # Users dataset import
    describe_users_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = workshop_users_dataset_import_job_arn
    )
    status_users_import = describe_users_dataset_import_job_response["datasetImportJob"]['status']
    
    if status_users_import == "ACTIVE":
        print("Build succeeded for {}".format(workshop_users_dataset_import_job_arn))
        
    elif status_users_import == "CREATE FAILED":
        print("Build failed for {}".format(workshop_users_dataset_import_job_arn))
        break
        
    if not status_users_import == "ACTIVE":
        print("The user dataset import is still in progress")
    else:
        print("The user dataset import is ACTIVE")
        

    if status_interactions_import == "ACTIVE" and status_items_import == 'ACTIVE' and status_users_import  == 'ACTIVE':
        break

    print()
    time.sleep(30)

Congratulations! You have now imported data from your 3 fictional environments (Content Management System, Analytics/Customer Data Platform, and Customer Relationship Management/Subscriber Management System)!

We will use this data in the next labs. The cell below stores the variables that subsequent notebooks need to access it.

In [None]:
%store dataset_dir
%store data_dir
%store interactions_filename
%store items_filename
%store users_filename
%store workshop_dataset_group_arn
%store workshop_interactions_dataset_arn
%store workshop_items_dataset_arn
%store workshop_users_dataset_arn
%store workshop_interactions_schema_arn
%store workshop_items_schema_arn
%store workshop_users_schema_arn
%store workshop_rerank_solution_name
%store workshop_rerank_campaign_name
%store recommender_more_like_x_name
%store recommender_top_picks_for_you_name

[Go to the next notebook `02_Training_Layer_Recap.ipynb`](02_Training_Layer_Recap.ipynb) to continue.