# Personalize example

Amazon Personalize is a machine learning service that makes it easy for developers to create individualized recommendations for customers using their applications.

Machine learning is increasingly used to improve customer engagement by powering personalized product and content recommendations, tailored search results, and targeted marketing promotions. However, developing the machine learning capabilities needed to produce these sophisticated recommendation systems has been beyond the reach of most organizations because of its complexity. Amazon Personalize allows developers with no prior machine learning experience to build sophisticated personalization capabilities into their applications, using machine learning technology perfected from years of use on Amazon.com.

With Amazon Personalize, you provide an activity stream from your application – clicks, page views, signups, purchases, and so forth – as well as an inventory of the items you want to recommend, such as articles, products, videos, or music. You can also choose to provide Amazon Personalize with additional demographic information about your users, such as age or geographic location. Amazon Personalize will process and examine the data, identify what is meaningful, select the right algorithms, and train and optimize a personalization model that is customized for your data. All data analyzed by Amazon Personalize is kept private and secure, and is only used for your customized recommendations. You can start serving personalized recommendations via a simple API call. You pay only for what you use, and there are no minimum fees and no upfront commitments.

Amazon Personalize is like having your own Amazon.com machine learning personalization team at your disposal, 24 hours a day.

In this example we will download the open-source MovieLens dataset, import it into Personalize, train a model, and then use that model to make personalized recommendations.


![title](Personalize-how-it-works.png)

# The Boto3 Personalize APIs

There are three Personalize APIs, and each plays a different role.

### Boto3 Personalize

This is the API where you configure your data sets, define your model, select the algorithm and kick off training.

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/personalize.html

### Boto3 Personalize Runtime   
    
With this API you can provide a user id or item id, and get back recommendations.

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/personalize-runtime.html

### Boto3 Personalize Events

With this API, you can send new user interaction events back to Personalize, which are used for further training.

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/personalize-events.html

### The steps in the example are as follows

![title](Personalize-steps-apis.png)

#### Import all the modules we will need

In [90]:
import boto3

import json
import numpy as np
import pandas as pd
import time
import datetime
from IPython.display import JSON



#### We create the Personalize clients that we will use throughout this notebook
We also start a timer so that we can track the elapsed time.

In [91]:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')
start = time.time()

In [92]:
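def createBucket(bucketname):
    """Create the S3 bucket unless this account already owns one with that name."""
    s3 = boto3.client('s3')
    response = s3.list_buckets()
    existingbuckets = [d['Name'] for d in response["Buckets"]]
    if bucketname not in existingbuckets:
        print("creating bucket " + bucketname)
        region = s3.meta.region_name
        if region in (None, "us-east-1", "aws-global"):
            # us-east-1 is the default location and rejects an explicit LocationConstraint
            s3.create_bucket(Bucket=bucketname)
        else:
            s3.create_bucket(
                Bucket=bucketname,
                CreateBucketConfiguration={"LocationConstraint": region}
            )
    else:
        print("bucket exists! " + bucketname)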


### Specify a Bucket and Data Output Location

In [93]:
accountid = boto3.client('sts').get_caller_identity().get('Account')
bucket = "aimlbootcamp" + accountid

createBucket(bucket)

#bucket = "personalize-demo"       # replace with the name of your S3 bucket
filename = "movie-lens-100k.csv"  # replace with a name that you want to save the dataset under

bucket exists! aimlbootcamp247322960887


### Download, Prepare, and Upload Training Data

#### Download and Explore the Dataset

In [94]:
!wget -N http://files.grouplens.org/datasets/movielens/ml-100k.zip
!unzip -o ml-100k.zip
data = pd.read_csv('./ml-100k/u.data', sep='\t', names=['USER_ID', 'ITEM_ID', 'RATING', 'TIMESTAMP'])
pd.set_option('display.max_rows', 5)
data

--2020-01-23 12:57:41--  http://files.grouplens.org/datasets/movielens/ml-100k.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:80... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘ml-100k.zip’ not modified on server. Omitting download.

Archive:  ml-100k.zip
  inflating: ml-100k/allbut.pl       
  inflating: ml-100k/mku.sh          
  inflating: ml-100k/README          
  inflating: ml-100k/u.data          
  inflating: ml-100k/u.genre         
  inflating: ml-100k/u.info          
  inflating: ml-100k/u.item          
  inflating: ml-100k/u.occupation    
  inflating: ml-100k/u.user          
  inflating: ml-100k/u1.base         
  inflating: ml-100k/u1.test         
  inflating: ml-100k/u2.base         
  inflating: ml-100k/u2.test         
  inflating: ml-100k/u3.base         
  inflating: ml-100k/u3.test         
  inflating: ml-100k/u4.base         
  inflat

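|       | USER_ID | ITEM_ID | RATING | TIMESTAMP |
|-------|---------|---------|--------|-----------|
| 0     | 196     | 242     | 3      | 881250949 |
| 1     | 186     | 302     | 3      | 891717742 |
| ...   | ...     | ...     | ...    | ...       |
| 99998 | 13      | 225     | 2      | 882399156 |
| 99999 | 12      | 203     | 3      | 879959583 |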


#### Prepare and Upload Data

In [95]:
data = data[data['RATING'] > 3.6]                # ratings are integers, so this keeps only ratings of 4 or 5
data = data[['USER_ID', 'ITEM_ID', 'TIMESTAMP']] # select columns that match the columns in the schema below
data.to_csv(filename, index=False)

boto3.Session().resource('s3').Bucket(bucket).Object(filename).upload_file(filename)
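
To make the effect of the rating filter concrete, here is a minimal sketch of the same preparation step using only the standard library. The helper names `prepare_interactions` and `to_csv_text` and the sample rows are illustrative assumptions, not part of the notebook, which does this work with pandas.

```python
import csv
import io

def prepare_interactions(rows, min_rating=3.6):
    """Keep rows rated strictly above min_rating and drop the rating column.

    rows is an iterable of (user_id, item_id, rating, timestamp) tuples.
    Hypothetical helper that mirrors the pandas filter in the cell above.
    """
    return [(u, i, t) for (u, i, r, t) in rows if r > min_rating]

def to_csv_text(interactions):
    """Serialize interactions under the column names used by the schema."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["USER_ID", "ITEM_ID", "TIMESTAMP"])
    writer.writerows(interactions)
    return buf.getvalue()

# Made-up sample rows, for illustration only.
sample = [
    (1, 10, 5, 100),
    (1, 11, 3, 101),
    (2, 10, 4, 102),
    (3, 12, 2, 103),
]

kept = prepare_interactions(sample)
print(to_csv_text(kept))
```

Because MovieLens ratings are whole numbers from 1 to 5, a threshold of 3.6 behaves the same as "rating of 4 or higher".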

### Next we will do the parts highlighted in red

![title](Personalize-create-dataset.png)

### Create Schema

First we need to create a schema that describes the data we are importing.


In [96]:
def createschema(schema, name):
    
    response = personalize.list_schemas(
        maxResults=100
    )

    #print("response: ", response)
    
    for item in response["schemas"]:
        if item["name"] == name:
            return item["schemaArn"]

    create_schema_response = personalize.create_schema(
        name = name,
        schema = json.dumps(schema)
    )

    schema_arn = create_schema_response['schemaArn']
    #print(json.dumps(create_schema_response, indent=2))
    return schema_arn

In [97]:
schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        }
    ],
    "version": "1.0"
}

schema_arn = createschema(schema, "aimlbootcamp-schema-personalize-20191102")
print("schema_arn: ", schema_arn)


schema_arn:  arn:aws:personalize:us-east-1:247322960887:schema/aimlbootcamp-schema-personalize-20191102
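
Before uploading, it can help to confirm that the CSV header matches the schema fields. The sketch below is one way to do that; `header_matches_schema` is a hypothetical helper, and treating column order as irrelevant is an assumption of this example rather than a documented rule.

```python
# Same interactions schema as in the cell above.
schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}

def schema_field_names(schema):
    """Return the field names declared in the schema, in declaration order."""
    return [field["name"] for field in schema["fields"]]

def header_matches_schema(header_line, schema):
    """True when the CSV header has exactly the schema's field names (order ignored)."""
    columns = [c.strip() for c in header_line.split(",")]
    return sorted(columns) == sorted(schema_field_names(schema))

print(header_matches_schema("USER_ID,ITEM_ID,TIMESTAMP", schema))
print(header_matches_schema("USER_ID,ITEM_ID,RATING,TIMESTAMP", schema))
```

The second call reports a mismatch because the `RATING` column is not declared in the schema, which is why the notebook drops it before writing the CSV.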


### Create and Wait for Dataset Group

#### Create Dataset Group

Next, we create a dataset group.

In [98]:
def createdatasetGroup(name):
    response = personalize.list_dataset_groups(
        maxResults=100
    )
    print("response: ", response)
    
    for item in response["datasetGroups"]:
        if item["name"] == name:
            return item["datasetGroupArn"]
    create_dataset_group_response = personalize.create_dataset_group(
        name = name
    )

    dataset_group_arn = create_dataset_group_response['datasetGroupArn']
    #print(json.dumps(create_dataset_group_response, indent=2))   
    return dataset_group_arn

In [99]:
datasetPrefix = "aimlbootcamp-20200122"

dataset_group_arn = createdatasetGroup(datasetPrefix)
print("dataset_group_arn: ", dataset_group_arn)


response:  {'datasetGroups': [{'name': 'aimlbootcamp-20191102', 'datasetGroupArn': 'arn:aws:personalize:us-east-1:247322960887:dataset-group/aimlbootcamp-20191102', 'status': 'ACTIVE', 'creationDateTime': datetime.datetime(2019, 12, 27, 18, 37, 18, 797000, tzinfo=tzlocal()), 'lastUpdatedDateTime': datetime.datetime(2019, 12, 27, 18, 37, 39, 933000, tzinfo=tzlocal())}, {'name': 'aimlbootcamp-20200122', 'datasetGroupArn': 'arn:aws:personalize:us-east-1:247322960887:dataset-group/aimlbootcamp-20200122', 'status': 'ACTIVE', 'creationDateTime': datetime.datetime(2020, 1, 22, 14, 44, 25, 300000, tzinfo=tzlocal()), 'lastUpdatedDateTime': datetime.datetime(2020, 1, 22, 14, 44, 27, 4000, tzinfo=tzlocal())}, {'name': 'danielc', 'datasetGroupArn': 'arn:aws:personalize:us-east-1:247322960887:dataset-group/danielc', 'status': 'ACTIVE', 'creationDateTime': datetime.datetime(2019, 9, 1, 3, 37, 13, 569000, tzinfo=tzlocal()), 'lastUpdatedDateTime': datetime.datetime(2019, 9, 1, 3, 37, 41, 721000, tzinf

#### Wait for Dataset Group to Have ACTIVE Status

In [100]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = dataset_group_arn
    )
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(5)

DatasetGroup: ACTIVE
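
The same wait-until-ACTIVE loop recurs for dataset groups, import jobs, solution versions, and campaigns. A possible way to factor it out is sketched below. `wait_for_status` is a hypothetical helper, not part of boto3, and the fake responses stand in for real `describe_*` calls so the sketch runs without AWS access.

```python
import time

def wait_for_status(describe, extract_status, timeout_s=3 * 60 * 60, poll_s=5,
                    done=("ACTIVE", "CREATE FAILED"),
                    sleep=time.sleep, clock=time.time):
    """Poll describe() until extract_status(response) is in done.

    Returns the final status, or raises TimeoutError once timeout_s has passed.
    sleep and clock are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout_s
    status = None
    while clock() < deadline:
        status = extract_status(describe())
        print("status: {}".format(status))
        if status in done:
            return status
        sleep(poll_s)
    raise TimeoutError("gave up waiting; last status was {}".format(status))

# Canned responses that imitate successive describe_dataset_group results.
fake_responses = iter([
    {"datasetGroup": {"status": "CREATE PENDING"}},
    {"datasetGroup": {"status": "CREATE IN_PROGRESS"}},
    {"datasetGroup": {"status": "ACTIVE"}},
])

final = wait_for_status(
    lambda: next(fake_responses),
    lambda r: r["datasetGroup"]["status"],
    poll_s=0,
)
print(final)
```

In the notebook, the first argument would instead be something like `lambda: personalize.describe_dataset_group(datasetGroupArn=dataset_group_arn)`. Raising on timeout, rather than silently falling out of the loop, makes a stalled job visible to the caller.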


### Create Dataset

In [101]:
def createdataset(name, dataset_type, dataset_group_arn, schema_arn):
    response = personalize.list_datasets(
        datasetGroupArn=dataset_group_arn,
        maxResults=100
    )
    
    #print("response: ", response)
    for item in response["datasets"]:
        print("inspecting: ", item)
        if item["name"] == name or item["datasetType"] == dataset_type:
            #return item["datasetArn"]
            response = personalize.delete_dataset(
                datasetArn=item["datasetArn"]
            )
            max_time = time.time() + 2*60 # 2 minutes
            while time.time() < max_time:
                try:
                    response = personalize.describe_dataset(datasetArn=item["datasetArn"])
                except Exception as e:
                    if "ResourceNotFoundException".lower() in str(e).lower():
                        print("delete completed")
                        break
                    raise  # re-raise anything other than "not found" instead of hiding it
                                    
                status = response["dataset"]["status"]
                print("DatasetGroup: {}".format(status))

                time.sleep(5)
            
            
    create_dataset_response = personalize.create_dataset(
        name = name,
        datasetType = dataset_type,
        datasetGroupArn = dataset_group_arn,
        schemaArn = schema_arn
    )
    print("dataset created....")
    dataset_arn = create_dataset_response['datasetArn']
    #print(json.dumps(create_dataset_response, indent=2))
    return dataset_arn

In [102]:
dataset_type = "INTERACTIONS"
dataset_arn = createdataset(datasetPrefix + "-dataset", dataset_type, dataset_group_arn, schema_arn)


print("datasetarn: ", dataset_arn)

inspecting:  {'name': 'aimlbootcamp-20200122-dataset', 'datasetArn': 'arn:aws:personalize:us-east-1:247322960887:dataset/aimlbootcamp-20200122/INTERACTIONS', 'datasetType': 'INTERACTIONS', 'status': 'ACTIVE', 'creationDateTime': datetime.datetime(2020, 1, 23, 2, 38, 35, 367000, tzinfo=tzlocal()), 'lastUpdatedDateTime': datetime.datetime(2020, 1, 23, 2, 38, 35, 367000, tzinfo=tzlocal())}
DatasetGroup: DELETE PENDING
DatasetGroup: DELETE PENDING
delete completed
dataset created....
datasetarn:  arn:aws:personalize:us-east-1:247322960887:dataset/aimlbootcamp-20200122/INTERACTIONS


In [103]:
end = time.time()
print("time so far: ", end - start)

time so far:  11.66714859008789


### Next we will do the steps highlighted in red

![title](Personalize-create-solution-campaign.png)

### Prepare, Create, and Wait for Dataset Import Job

#### Attach Policy to S3 Bucket

In [104]:
s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicyBootcamp",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicyAIMLBootcamp",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:PutObject",
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket),
                "arn:aws:s3:::{}/*".format(bucket)
            ]
        }
    ]
}

policycreateresponse = s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
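
If you prefer to grant Personalize only the actions a job needs, a small builder such as the one below could generate the policy. This is a sketch under the assumption that a plain dataset import only needs to read and list objects. `personalize_bucket_policy` and the `allow_write` flag are hypothetical names introduced here, and `example-bucket` is a placeholder.

```python
import json

def personalize_bucket_policy(bucket_name, allow_write=False):
    """Build a bucket policy for personalize.amazonaws.com limited to named actions.

    allow_write adds s3:PutObject for jobs that write results back to the bucket.
    """
    actions = ["s3:GetObject", "s3:ListBucket"]
    if allow_write:
        actions.append("s3:PutObject")
    return {
        "Version": "2012-10-17",
        "Id": "PersonalizeS3BucketAccessPolicyBootcamp",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicyAIMLBootcamp",
                "Effect": "Allow",
                "Principal": {"Service": "personalize.amazonaws.com"},
                "Action": actions,
                "Resource": [
                    "arn:aws:s3:::{}".format(bucket_name),
                    "arn:aws:s3:::{}/*".format(bucket_name),
                ],
            }
        ],
    }

policy_json = json.dumps(personalize_bucket_policy("example-bucket"), indent=2)
print(policy_json)
```

The resulting string can be passed to `s3.put_bucket_policy(Bucket=..., Policy=policy_json)` in the same way as the hand-written policy.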

#### Create Personalize Role

In [105]:
def createPersonalizeIAMRole(role_name):
    iam = boto3.client("iam")
    assume_role_policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "Service": "personalize.amazonaws.com"
              },
              "Action": "sts:AssumeRole"
            }
        ]
    }
    

    #print("response: ", response)
    #return
    try:
        create_role_response = iam.create_role(
            RoleName = role_name,
            AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
        )
        role_arn = create_role_response["Role"]["Arn"]
    except Exception as e:
        if "EntityAlreadyExists".lower() not in str(e).lower():
            raise  # unexpected failure: do not continue without a role ARN
        print("the role already exists!")
        # look the existing role up directly instead of scanning every role in the account
        role_arn = iam.get_role(RoleName=role_name)["Role"]["Arn"]

    # AmazonPersonalizeFullAccess provides access to any S3 bucket with a name that includes "personalize" or "Personalize".
    # If you would like to use a bucket with a different name, consider creating and attaching a new policy
    # that provides read access to your bucket, or attaching the AmazonS3ReadOnlyAccess policy to the role.
    policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess"
    iam.attach_role_policy(
        RoleName = role_name,
        PolicyArn = policy_arn
    )
    policy_arn = "arn:aws:iam::aws:policy/AmazonS3FullAccess"
    iam.attach_role_policy(
        RoleName = role_name,
        PolicyArn = policy_arn
    )   
    print("pausing execution to allow for IAM role propagation.....")
    time.sleep(30) # wait to allow IAM role policy attachment to propagate

    return role_arn   

In [106]:
role_name = "PersonalizeRoleAIMLBootcamp-imports-2"
role_arn = createPersonalizeIAMRole(role_name)
#json.dumps(create_dataset_response, indent=2)
print(role_arn)


the role already exists!
pausing execution to allow for IAM role propagation.....
arn:aws:iam::247322960887:role/PersonalizeRoleAIMLBootcamp-imports-2
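
The helper functions in this notebook detect "already exists" and "not found" conditions by searching the exception text. An alternative is to read the error code from the exception itself, since boto3 client errors carry it in `response["Error"]["Code"]`. The sketch below uses a stand-in exception class so that it runs without AWS access. `FakeClientError`, `error_code`, and `create_or_reuse` are hypothetical names, and the ARN is a placeholder.

```python
class FakeClientError(Exception):
    """Stand-in with the same .response layout as botocore.exceptions.ClientError."""
    def __init__(self, code, message=""):
        super().__init__("An error occurred ({}): {}".format(code, message))
        self.response = {"Error": {"Code": code, "Message": message}}

def error_code(exc):
    """Return the AWS error code carried by a ClientError-like exception, or None."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")

def create_or_reuse(create, lookup, already_exists_code):
    """Call create(); if it fails with already_exists_code, fall back to lookup()."""
    try:
        return create()
    except Exception as exc:
        if error_code(exc) != already_exists_code:
            raise  # unexpected failure: surface it instead of swallowing it
        return lookup()

def failing_create():
    # Imitates iam.create_role failing because the role is already there.
    raise FakeClientError("EntityAlreadyExists", "Role already exists")

arn = create_or_reuse(
    failing_create,
    lambda: "arn:aws:iam::123456789012:role/example",
    "EntityAlreadyExists",
)
print(arn)
```

Comparing the code exactly avoids accidental matches on message text, and the explicit `raise` keeps unrelated failures from being silently ignored.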


#### Create Dataset Import Job

In [107]:
create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = datasetPrefix + "-import-job-2",
    datasetArn = dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, filename)
    },
    roleArn = role_arn
)

dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
#print(json.dumps(create_dataset_import_job_response, indent=2))
JSON(create_dataset_import_job_response)

<IPython.core.display.JSON object>

#### Wait for Dataset Import Job to Have ACTIVE Status

In [108]:
end = time.time()
print("time so far: ", end - start)

time so far:  42.33345341682434


In [None]:
max_time = time.time() + 3*60*60 # 3 hours
showme = " "
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_import_job_arn
    )
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print(showme, datetime.datetime.now(), " DatasetImportJob: {}".format(status), "             ", end='\r')
    showme += "*"
    if len(showme)> 10:
        showme = " "
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(3.5719)

 **** 2020-01-23 13:03:25.411522  DatasetImportJob: CREATE IN_PROGRESS                   

![title](Personalize-datagroup-dataset-user-interaction.png)

### Select Recipe

Personalize comes with predefined recipes that you can choose from depending on your use case.

https://docs.aws.amazon.com/personalize/latest/dg/working-with-predefined-recipes.html

In [None]:
list_recipes_response = personalize.list_recipes()
recipe_arn = "arn:aws:personalize:::recipe/aws-hrnn" # aws-hrnn selected for demo purposes
list_recipes_response

### Create and Wait for Solution

#### Create Solution

In [None]:
def createSolution(name, dataset_group_arn, recipe_arn):
    response = personalize.list_solutions(
        datasetGroupArn=dataset_group_arn,
        maxResults=100
    )
    #print(response)
    for item in response["solutions"]:
        if item["name"] == name:
            print("solution with the same name already exists. Deleting the existing one.")
            try:
                response = personalize.delete_solution(solutionArn=item["solutionArn"])
            except Exception as e:
                if "ResourceInUseException".lower() not in str(e).lower():
                    raise  # only an in-use solution is expected here
                print("delete failed as resource is in use")
        
            return createSolution(name+"1", dataset_group_arn, recipe_arn) #delete old one. create new one with a new name 
            
    create_solution_response = personalize.create_solution(
        name = name,
        datasetGroupArn = dataset_group_arn,
        recipeArn = recipe_arn
    )

    solution_arn = create_solution_response['solutionArn']
    #print(json.dumps(create_solution_response, indent=2))
    return solution_arn

In [None]:
solution_arn = createSolution(datasetPrefix + "-aimlbootcampExampleSolution", dataset_group_arn, recipe_arn)
print(solution_arn)


#### Create Solution Version

In [None]:


create_solution_version_response = personalize.create_solution_version(
    solutionArn = solution_arn
)

solution_version_arn = create_solution_version_response['solutionVersionArn']
print(json.dumps(create_solution_version_response, indent=2))


### Behind the scenes the following is happening

![title](Personalize-behind-the-scenes.png)

#### Wait for Solution Version to Have ACTIVE Status

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_solution_version_response = personalize.describe_solution_version(
        solutionVersionArn = solution_version_arn
    )
    status = describe_solution_version_response["solutionVersion"]["status"]
    print(datetime.datetime.now(), " Solution Version Status: {}".format(status), "             ", end='\r')
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

In [None]:
end = time.time()
print("time so far: ", end - start)

### In the console you should see something like this

![title](Personalize-solution-version.png)

#### Get Metrics of Solution

In [None]:
get_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = solution_version_arn
)
JSON(get_solution_metrics_response)
#print(json.dumps(get_solution_metrics_response, indent=2))

### Create and Wait for Campaign

#### Create Campaign

In [None]:
def createcampaign(name, solutionVersionArn):

    # list_campaigns(solutionArn=...) expects a solution ARN, not a solution version ARN,
    # so list every campaign in the account and match on the name instead
    response = personalize.list_campaigns(
        maxResults=100
    )
    #print(response)

    for item in response["campaigns"]:
        if item["name"] == name:
            print("campaign already exists (deleting): ", name)
            response = personalize.delete_campaign(campaignArn=item["campaignArn"])
            return createcampaign(name+"1", solutionVersionArn) #recursively create/delete until no more collisions

    try:
        create_campaign_response = personalize.create_campaign(
            name = name,
            solutionVersionArn = solutionVersionArn,  # use the argument rather than the notebook global
            minProvisionedTPS = 1
        )
    except Exception as e:
        if "ResourceAlreadyExistsException".lower() not in str(e).lower():
            raise  # unexpected failure: surface it
        return createcampaign(name+"1", solutionVersionArn) #try recursively

    campaign_arn = create_campaign_response['campaignArn']
    #print(json.dumps(create_campaign_response, indent=2))
    print("created new campaign: ", name, campaign_arn)
    return campaign_arn

#### Wait for Campaign to Have ACTIVE Status

In [None]:
campaign_arn = createcampaign("aimlbootcampcampaign", solution_version_arn)
print(campaign_arn)

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_campaign_response = personalize.describe_campaign(
        campaignArn = campaign_arn
    )
    status = describe_campaign_response["campaign"]["status"]
    end = time.time()
    print("Campaign: {}".format(status), "time so far: ", end - start)
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

### Get Recommendations

#### Select a User and an Item

In [None]:
end = time.time()
print("time so far (s): ", end - start)

In [None]:
items = pd.read_csv('./ml-100k/u.item', sep='|', usecols=[0,1], encoding='latin-1')
items.columns = ['ITEM_ID', 'TITLE']

user_id, item_id, _ = data.sample().values[0]
item_title = items.loc[items['ITEM_ID'] == item_id].values[0][-1]
print("USER: {}".format(user_id))
print("ITEM: {}".format(item_title))

#items

#### Call GetRecommendations

Note that here we use the Runtime API. The specific method we call is `get_recommendations`:

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/personalize-runtime.html#PersonalizeRuntime.Client.get_recommendations



In [None]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = str(user_id),
    itemId = str(item_id)
)

item_list = get_recommendations_response['itemList']
title_list = [items.loc[items['ITEM_ID'] == int(item['itemId'])].values[0][-1] for item in item_list]  # np.int was removed from NumPy; the built-in int does the same job

print("Recommendations: {}".format(json.dumps(title_list, indent=2)))
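
Looking up each title with a DataFrame filter works, but a plain dictionary is simpler and tolerates item IDs that are missing from the catalog. The sketch below assumes the response's `itemList` entries are dictionaries with a string `itemId`, as used in the cell above. `titles_for` is a hypothetical helper, and the small title table is an illustrative stand-in for the full `items` DataFrame.

```python
def titles_for(item_list, title_by_id):
    """Map itemList entries to titles, labelling IDs that are not in the catalog."""
    titles = []
    for entry in item_list:
        item_id = int(entry["itemId"])  # the API returns item IDs as strings
        titles.append(title_by_id.get(item_id, "unknown item {}".format(item_id)))
    return titles

# Illustrative stand-in for the catalog. In the notebook this could be built with
# dict(zip(items['ITEM_ID'], items['TITLE'])).
title_by_id = {1: "Toy Story (1995)", 50: "Star Wars (1977)", 100: "Fargo (1996)"}

# Hand-written entries in the same shape as get_recommendations_response['itemList'].
example_item_list = [{"itemId": "50"}, {"itemId": "1"}, {"itemId": "999"}]

print(titles_for(example_item_list, title_by_id))
```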

In [None]:

def putevents(userid, trackingid, sessionid, eventlist):
    client = boto3.client('personalize-events')
    response = client.put_events(
        trackingId=trackingid,
        userId=userid,
        sessionId=sessionid,
        eventList=eventlist
    )
    return response

In [None]:
# TO DO - implement
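
As a starting point for the unfinished cell, the sketch below builds one entry for the `eventList` argument of `put_events`. It assumes the event shape described in the boto3 Personalize Events reference (`eventType` and `sentAt`, plus optional `eventId`, `itemId`, and a JSON-encoded `properties` string), so check that reference for the fields your boto3 version accepts. `build_event` is a hypothetical helper, and the tracking ID needed by `putevents` would come from an event tracker, which this notebook does not create.

```python
import datetime
import json
import uuid

def build_event(item_id, event_type="click", properties=None, sent_at=None):
    """Build one eventList entry for put_events.

    properties, if given, is a dict that is JSON-encoded into a string.
    sent_at defaults to the current time in UTC.
    """
    event = {
        "eventId": str(uuid.uuid4()),
        "eventType": event_type,
        "itemId": str(item_id),
        "sentAt": sent_at or datetime.datetime.now(datetime.timezone.utc),
    }
    if properties:
        event["properties"] = json.dumps(properties)
    return event

example_event = build_event(
    242,
    sent_at=datetime.datetime(2020, 1, 23, tzinfo=datetime.timezone.utc),
)
print(example_event["eventType"], example_event["itemId"])
```

With a tracking ID in hand, the call would look like `putevents(userid="196", trackingid=tracking_id, sessionid=session_id, eventlist=[build_event(242)])`, where `tracking_id` and `session_id` are values you supply.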