# Personalize Sample with Database

The goal of this notebook is to show how to extract data from a database in a batch process and to inject it into Amazon Personalize in order to generate recommendations. 

This process will use the Django ORM to query the database as well as Pandas for manipulating the query results into usable CSVs for Amazon Personalize.

### Installing Personalize Into Boto3

Given that Amazon Personalize is a service in preview, the boto3 library and AWS CLI must be updated manually in order to allow calls to be enabled.

In [1]:
import boto3

import json
import numpy as np
import pandas as pd
import time

!wget -N https://s3-us-west-2.amazonaws.com/personalize-cli-json-models/personalize.json
!wget -N https://s3-us-west-2.amazonaws.com/personalize-cli-json-models/personalize-runtime.json
!aws configure add-model --service-model file://`pwd`/personalize.json --service-name personalize
!aws configure add-model --service-model file://`pwd`/personalize-runtime.json --service-name personalize-runtime

personalize = boto3.client(service_name='personalize', endpoint_url='https://personalize.us-east-1.amazonaws.com')
personalize_runtime = boto3.client(service_name='personalize-runtime', endpoint_url='https://personalize-runtime.us-east-1.amazonaws.com')

--2019-02-01 09:51:32--  https://s3-us-west-2.amazonaws.com/personalize-cli-json-models/personalize.json
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.204.24
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.204.24|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘personalize.json’ not modified on server. Omitting download.

--2019-02-01 09:51:32--  https://s3-us-west-2.amazonaws.com/personalize-cli-json-models/personalize-runtime.json
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.204.24
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.204.24|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘personalize-runtime.json’ not modified on server. Omitting download.



### Specify a Bucket and Data Output Location

Next specify the bucket that you will be using for your own data, this is where you will initially upload CSV copies of the data extracted from your database.

The filenames should reflect their role in the Personalize process.

In [9]:
bucket = "djangopersonalizedemo"           # replace with the name of your S3 bucket


user_metadata = "user_metadata.csv"
item_metadata = "item_metadata.csv"
user_interaction = "user_interaction.csv"

### Export, Prepare, and Upload Training Data

First in this process we will export the user_metadata items and place them in a CSV.


In [2]:
# Imports for Django and Pandas

import json
import datetime
import django
django.setup()

from movielens.models import User
from movielens.models import Item
from movielens.models import UserData

  """)


#### Prepare and Upload User Data

In [56]:
user_df = pd.DataFrame(list(User.objects.all().values('user_id', 'age','gender','occupation','zip_code')))
pd.set_option('display.max_rows', 5)
user_df.columns = ['AGE', 'GENDER', 'OCCUPATION', 'USER_ID', 'ZIP_CODE']

In [57]:
user_df.to_csv(user_metadata, index=False)
boto3.Session().resource('s3').Bucket(bucket).Object(user_metadata).upload_file(user_metadata)

#### Prepare and Upload Item Data

In [82]:
item_df = pd.DataFrame(list(Item.objects.all().values('movie_id', 'movie_title', 'release_date', 'IMDB_URL',
          'unknown', 'Action', 'Adventure', 'Animation', 'Children', 'Comedy', 'Crime',
          'Documentary', 'Drama', 'Fantasy', 'Film_Noir', 'Horror', 'Musical', 'Mystery',
          'Romance', 'Sci_Fi', 'Thriller', 'War', 'Western')))
item_df.columns
new_columns = []
for item in item_df.columns:
    new_columns.append(item.upper())
item_df.columns = new_columns
item_df=item_df.rename(columns = {'MOVIE_ID':'ITEM_ID'})
item_df.head()

Unnamed: 0,ACTION,ADVENTURE,ANIMATION,CHILDREN,COMEDY,CRIME,DOCUMENTARY,DRAMA,FANTASY,FILM_NOIR,...,MYSTERY,ROMANCE,SCI_FI,THRILLER,WAR,WESTERN,ITEM_ID,MOVIE_TITLE,RELEASE_DATE,UNKNOWN
0,0,0,1,1,1,0,0,0,0,0,...,0,0,0,0,0,0,1,Toy Story (1995),01-Jan-1995,0
1,1,1,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,2,GoldenEye (1995),01-Jan-1995,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,3,Four Rooms (1995),01-Jan-1995,0
3,1,0,0,0,1,0,0,1,0,0,...,0,0,0,0,0,0,4,Get Shorty (1995),01-Jan-1995,0
4,0,0,0,0,0,1,0,1,0,0,...,0,0,0,1,0,0,5,Copycat (1995),01-Jan-1995,0


In [83]:
item_df.to_csv(item_metadata, index=False)
boto3.Session().resource('s3').Bucket(bucket).Object(item_metadata).upload_file(item_metadata)

#### Prepare and Upload User Interaction Data

In [51]:
# Note we are using the Django Queryset functionality to select ratings only >= 3.6
interaction_df = pd.DataFrame(list(UserData.objects.filter(rating__gte=3.6)
                                   .values('user_id', 'item_id', 'rating', 'timestamp')))
interaction_df.columns = ['ITEM_ID', 'RATING', 'TIMESTAMP', 'USER_ID']

interaction_df


Unnamed: 0,ITEM_ID,RATING,TIMESTAMP,USER_ID
0,1014,5,879781125,286
1,222,5,876042340,200
...,...,...,...,...
55373,538,4,892685437,676
55374,204,5,879795543,716


In [52]:
interaction_df.to_csv(user_interaction, index=False)
boto3.Session().resource('s3').Bucket(bucket).Object(user_interaction).upload_file(user_interaction)

### Create Schema

The next large step will be creating schemas for all 3 files and then placing them inside the Personalize service.

#### Creating the Interaction Schema First

This is required to make Personalize function so we will start with the last data exported.

In [53]:
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "RATING",
            "type": "int"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        },
        {
            "name": "USER_ID",
            "type": "string"
        }
    ],
    "version": "1.0"
}

create_schema_response = personalize.create_schema(
    name = "django-interactions-schemafinal",
    schema = json.dumps(interactions_schema)
)

schema_arn = create_schema_response['schemaArn']
print(json.dumps(create_schema_response, indent=2))

{
  "schemaArn": "arn:aws:personalize:us-east-1:059124553121:schema/django-interactions-schemafinal",
  "ResponseMetadata": {
    "RequestId": "f2e46480-63b8-4d26-9a2e-88172520cf2b",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 01 Feb 2019 20:34:04 GMT",
      "x-amzn-requestid": "f2e46480-63b8-4d26-9a2e-88172520cf2b",
      "content-length": "97",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


### Create and Wait for Dataset Group

#### Create Dataset Group

In [42]:
create_dataset_group_response = personalize.create_dataset_group(
    name = "django-dataset-group"
)

dataset_group_arn = create_dataset_group_response['datasetGroupArn']
print(json.dumps(create_dataset_group_response, indent=2))

{
  "datasetGroupArn": "arn:aws:personalize:us-east-1:059124553121:dataset-group/django-dataset-group",
  "ResponseMetadata": {
    "RequestId": "d5a7c6aa-734a-45b6-9078-81ec62b90e45",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 01 Feb 2019 20:03:15 GMT",
      "x-amzn-requestid": "d5a7c6aa-734a-45b6-9078-81ec62b90e45",
      "content-length": "99",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Wait for Dataset Group to Have ACTIVE Status

In [43]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = dataset_group_arn
    )
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetGroup: CREATE PENDING
DatasetGroup: ACTIVE


### Create Dataset

In [44]:
dataset_type = "INTERACTIONS"
create_dataset_response = personalize.create_dataset(
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = schema_arn
)

dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

{
  "datasetArn": "arn:aws:personalize:us-east-1:059124553121:dataset/django-dataset-group/INTERACTIONS",
  "ResponseMetadata": {
    "RequestId": "5d2f717c-a3c6-4cb4-b404-3bd982155472",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 01 Feb 2019 20:04:20 GMT",
      "x-amzn-requestid": "5d2f717c-a3c6-4cb4-b404-3bd982155472",
      "content-length": "101",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


### Prepare, Create, and Wait for Dataset Import Job

#### Attach policy to S3 bucket

In [25]:
s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket),
                "arn:aws:s3:::{}/*".format(bucket)
            ]
        }
    ]
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy));

#### Create S3 Read Only Access Role

In [31]:
iam = boto3.client("iam")

role_name = "PersonalizeS3RoleDjango"
assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "personalize.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
    ]
}

create_role_response = iam.create_role(
    RoleName = role_name,
    AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
);

iam.attach_role_policy(
    RoleName = role_name,
    PolicyArn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
);

role_arn = create_role_response["Role"]["Arn"]
print(role_arn)

arn:aws:iam::059124553121:role/PersonalizeS3RoleDjango


#### Create Dataset Import Job

In [54]:
create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "django-dataset-import-job5",
    datasetArn = dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, user_interaction)
    },
    roleArn = role_arn
)

dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))

{
  "datasetImportJobArn": "arn:aws:personalize:us-east-1:059124553121:dataset-import-job/django-dataset-import-job5",
  "ResponseMetadata": {
    "RequestId": "4f331351-db65-467a-a2a9-47d37d876d73",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 01 Feb 2019 20:42:48 GMT",
      "x-amzn-requestid": "4f331351-db65-467a-a2a9-47d37d876d73",
      "content-length": "114",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Wait for Dataset Import Job and Dataset Import Job Run to Have ACTIVE Status

In [55]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_import_job_arn
    )
    
    dataset_import_job = describe_dataset_import_job_response["datasetImportJob"]
    if "latestDatasetImportJobRun" not in dataset_import_job:
        status = dataset_import_job["status"]
        print("DatasetImportJob: {}".format(status))
    else:
        status = dataset_import_job["latestDatasetImportJobRun"]["status"]
        print("LatestDatasetImportJobRun: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetImportJob: CREATE PENDING
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: ACTIVE


## Import User Data:

In [62]:
# ['AGE', 'GENDER', 'OCCUPATION', 'USER_ID', 'ZIP_CODE']
user_schema = {
    "type": "record",
    "name": "User",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "AGE",
            "type": "int"
        },
        {
            "name": "GENDER",
            "type": "string"
        },
        {
            "name": "OCCUPATION",
            "type": "string"
        },
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ZIP_CODE",
            "type": "string"
        }
    ],
    "version": "1.0"
}

create_schema_response = personalize.create_schema(
    name = "django-users-schema",
    schema = json.dumps(user_schema)
)

schema_arn = create_schema_response['schemaArn']
print(json.dumps(create_schema_response, indent=2))

{
  "schemaArn": "arn:aws:personalize:us-east-1:059124553121:schema/django-users-schema",
  "ResponseMetadata": {
    "RequestId": "d7685786-91f7-4d38-9cff-56124fa3fe1a",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 01 Feb 2019 21:02:50 GMT",
      "x-amzn-requestid": "d7685786-91f7-4d38-9cff-56124fa3fe1a",
      "content-length": "85",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


Now given that the dataset group was already created, we will continue to use that but updating the dataset type below, then uploading our data into the system.

In [66]:
dataset_type = "USERS"
create_dataset_response = personalize.create_dataset(
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = schema_arn
)

dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

{
  "datasetArn": "arn:aws:personalize:us-east-1:059124553121:dataset/django-dataset-group/USERS",
  "ResponseMetadata": {
    "RequestId": "81733c49-a749-413b-90fe-eebcab78f5f2",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 01 Feb 2019 21:06:16 GMT",
      "x-amzn-requestid": "81733c49-a749-413b-90fe-eebcab78f5f2",
      "content-length": "94",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In [67]:
create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "django-import-users",
    datasetArn = dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, user_metadata)
    },
    roleArn = role_arn
)

dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))

{
  "datasetImportJobArn": "arn:aws:personalize:us-east-1:059124553121:dataset-import-job/django-import-users",
  "ResponseMetadata": {
    "RequestId": "8b5d613f-48b0-4bb2-8a75-5f0d4e1dbff6",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 01 Feb 2019 21:07:29 GMT",
      "x-amzn-requestid": "8b5d613f-48b0-4bb2-8a75-5f0d4e1dbff6",
      "content-length": "107",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In [68]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_import_job_arn
    )
    
    dataset_import_job = describe_dataset_import_job_response["datasetImportJob"]
    if "latestDatasetImportJobRun" not in dataset_import_job:
        status = dataset_import_job["status"]
        print("DatasetImportJob: {}".format(status))
    else:
        status = dataset_import_job["latestDatasetImportJobRun"]["status"]
        print("LatestDatasetImportJobRun: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

LatestDatasetImportJobRun: CREATE PENDING
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: ACTIVE


## Import Item Data:

In [84]:
item_schema = {
    "type": "record",
    "name": "Item",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ACTION",
            "type": "int"
        },
        {
            "name": "ADVENTURE",
            "type": "int"
        },
        {
            "name": "ANIMATION",
            "type": "int"
        },
        {
            "name": "CHILDREN",
            "type": "int"
        },
        {
            "name": "COMEDY",
            "type": "int"
        },
        {
            "name": "CRIME",
            "type": "int"
        },
        {
            "name": "DOCUMENTARY",
            "type": "int"
        },
        {
            "name": "DRAMA",
            "type": "int"
        },
        {
            "name": "FANTASY",
            "type": "int"
        },
        {
            "name": "FILM_NOIR",
            "type": "int"
        },
        {
            "name": "HORROR",
            "type": "int"
        },
        {
            "name": "IMDB_URL",
            "type": "string"
        },
        {
            "name": "MUSICAL",
            "type": "int"
        },
        {
            "name": "MYSTERY",
            "type": "int"
        },
        {
            "name": "ROMANCE",
            "type": "int"
        },
        {
            "name": "SCI_FI",
            "type": "int"
        },
        {
            "name": "THRILLER",
            "type": "int"
        },
        {
            "name": "WAR",
            "type": "int"
        },
        
        {
            "name": "WESTERN",
            "type": "int"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "MOVIE_TITLE",
            "type": "string"
        },
        {
            "name": "RELEASE_DATE",
            "type": "string"
        },
        {
            "name": "UNKNOWN",
            "type": "int"
        }
    ],
    "version": "1.0"
}

create_schema_response = personalize.create_schema(
    name = "django-items-schema-final",
    schema = json.dumps(item_schema)
)

schema_arn = create_schema_response['schemaArn']
print(json.dumps(create_schema_response, indent=2))

{
  "schemaArn": "arn:aws:personalize:us-east-1:059124553121:schema/django-items-schema-final",
  "ResponseMetadata": {
    "RequestId": "c7b4949f-a458-4ce4-9ea2-0840358980ed",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 01 Feb 2019 22:10:50 GMT",
      "x-amzn-requestid": "c7b4949f-a458-4ce4-9ea2-0840358980ed",
      "content-length": "91",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In [85]:
dataset_type = "ITEMS"
create_dataset_response = personalize.create_dataset(
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = schema_arn
)

dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

ResourceAlreadyExistsException: An error occurred (ResourceAlreadyExistsException) when calling the CreateDataset operation: Another dataset of type: ITEMS already exists in the provided dataset group

In [87]:
create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "django-import-items-final",
    datasetArn = dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, item_metadata)
    },
    roleArn = role_arn
)

dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))

{
  "datasetImportJobArn": "arn:aws:personalize:us-east-1:059124553121:dataset-import-job/django-import-items-final",
  "ResponseMetadata": {
    "RequestId": "d079eec1-4cfc-429c-ac83-cf98ba48a722",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 01 Feb 2019 22:11:38 GMT",
      "x-amzn-requestid": "d079eec1-4cfc-429c-ac83-cf98ba48a722",
      "content-length": "113",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In [88]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_import_job_arn
    )
    
    dataset_import_job = describe_dataset_import_job_response["datasetImportJob"]
    if "latestDatasetImportJobRun" not in dataset_import_job:
        status = dataset_import_job["status"]
        print("DatasetImportJob: {}".format(status))
    else:
        status = dataset_import_job["latestDatasetImportJobRun"]["status"]
        print("LatestDatasetImportJobRun: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetImportJob: CREATE PENDING
LatestDatasetImportJobRun: CREATE FAILED


### Select Recipe

In [91]:
recipe_list = [
    "arn:aws:personalize:::recipe/awspersonalizehrnnmodel",
    "arn:aws:personalize:::recipe/awspersonalizedeepfmmodel",
    "arn:aws:personalize:::recipe/awspersonalizesimsmodel",
    "arn:aws:personalize:::recipe/awspersonalizeffnnmodel",
    "arn:aws:personalize:::recipe/popularity-baseline"
]

recipe_arn = recipe_list[0]
print(recipe_arn)

arn:aws:personalize:::recipe/awspersonalizehrnnmodel


### Create and Wait for Solution

#### Create Solution

In [92]:
create_solution_response = personalize.create_solution(
    name = "Django-movielens-soln",
    datasetGroupArn = dataset_group_arn,
    recipeArn = recipe_arn,
    minProvisionedTPS = 1
)

solution_arn = create_solution_response['solutionArn']
print(json.dumps(create_solution_response, indent=2))

{
  "solutionArn": "arn:aws:personalize:us-east-1:059124553121:solution/Django-movielens-soln",
  "ResponseMetadata": {
    "RequestId": "d8476fd4-c10f-41be-92c1-835dd6c3200c",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 01 Feb 2019 22:45:14 GMT",
      "x-amzn-requestid": "d8476fd4-c10f-41be-92c1-835dd6c3200c",
      "content-length": "91",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Wait for Solution to Have ACTIVE Status

In [93]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_solution_response = personalize.describe_solution(
        solutionArn = solution_arn
    )
    status = describe_solution_response["solution"]["status"]
    print("Solution: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREA

#### Get Metrics of Solution

In [94]:
get_metrics_response = personalize.get_metrics(
    solutionArn = solution_arn
)

print(json.dumps(get_metrics_response, indent=2))

{
  "metrics": {
    "arn:aws:personalize:us-east-1:059124553121:model/awspersonalizehrnnmodel-be286b5e": {
      "_num_evaluation_users": 91.0,
      "_num_unique_items": 1448.0,
      "_user_history_length_10_pct_quantile": 12.0,
      "_user_history_length_50_pct_quantile": 34.0,
      "_user_history_length_90_pct_quantile": 140.0,
      "_user_history_length_mean": 53.18681318681319,
      "coverage": 0.27624309392265195,
      "mean_reciprocal_rank": 0.03940127734228371,
      "normalized_discounted_cumulative_gain_at_10": 0.045510255107105546,
      "normalized_discounted_cumulative_gain_at_25": 0.07627253696341692,
      "normalized_discounted_cumulative_gain_at_5": 0.02671073140739992,
      "precision_at_10": 0.008791208791208791,
      "precision_at_25": 0.008351648351648353,
      "precision_at_5": 0.006593406593406594
    }
  },
  "ResponseMetadata": {
    "RequestId": "06f291df-fa41-40e5-91b9-ba42f50d2144",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-typ

### Create and Wait for Campaign

#### Create campaign

In [95]:
create_campaign_response = personalize.create_campaign(
    name = "Django-campaign",
    solutionArn = solution_arn,
    updateMode = "MANUAL"
)

campaign_arn = create_campaign_response['campaignArn']
print(json.dumps(create_campaign_response, indent=2))

{
  "campaignArn": "arn:aws:personalize:us-east-1:059124553121:campaign/Django-campaign",
  "ResponseMetadata": {
    "RequestId": "0a0f6c44-b923-44b0-bb4f-7dbc5f48069c",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 01 Feb 2019 23:26:32 GMT",
      "x-amzn-requestid": "0a0f6c44-b923-44b0-bb4f-7dbc5f48069c",
      "content-length": "85",
      "connection": "keep-alive"
    },
    "RetryAttempts": 3
  }
}


#### Wait for Campaign to Have ACTIVE Status

In [96]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_campaign_response = personalize.describe_campaign(
        campaignArn = campaign_arn
    )
    status = describe_campaign_response["campaign"]["status"]
    print("Campaign: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

Campaign: CREATE PENDING
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: ACTIVE


### Get Recommendations

#### Select a User and an Item

In [19]:
#items = pd.read_csv('./ml-100k/u.item', sep='|', usecols=[0,1], header=None)
#items.columns = ['ITEM_ID', 'TITLE']

item_title = item_df.loc[items['ITEM_ID'] == item_id].values[0][-1]
print("USER: {}".format(user_id))
print("ITEM: {}".format(item_title))

items

USER: 711
ITEM: Silence of the Lambs, The (1991)


Unnamed: 0,ITEM_ID,TITLE
0,1,Toy Story (1995)
1,2,GoldenEye (1995)
...,...,...
1680,1681,You So Crazy (1994)
1681,1682,Scream of Stone (Schrei aus Stein) (1991)


#### Call GetRecommendations

In [113]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = str(155),
)


print("Recommendations: ")
item_list = get_recommendations_response['itemList']
for item in item_list:
    print(item_df.iloc[int(item['itemId'])]['MOVIE_TITLE'])
#print("Recommendations: {}".format(json.dumps(title_list, indent=2)))

Recommendations: 
Conspiracy Theory (1997)
Fly Away Home (1996)
Critical Care (1997)
Paradise Lost: The Child Murders at Robin Hood Hills (1996)
3 Ninjas: High Noon At Mega Mountain (1998)
Devil's Advocate, The (1997)
Mrs. Brown (Her Majesty, Mrs. Brown) (1997)
Kiss the Girls (1997)
Deconstructing Harry (1997)
In the Company of Men (1997)
Time Tracers (1995)
Donnie Brasco (1997)
L.A. Confidential (1997)
Starship Troopers (1997)
Ulee's Gold (1997)
Wag the Dog (1997)
Ice Storm, The (1997)
Midnight in the Garden of Good and Evil (1997)
FairyTale: A True Story (1997)
Dark City (1998)
U Turn (1997)
Desperate Measures (1998)
Full Monty, The (1997)
Murder at 1600 (1997)
Afterglow (1997)
