# Personalize Sample with Database

The goal of this notebook is to show how to extract data from a database in a batch process and to inject it into Amazon Personalize in order to generate recommendations. 

This process will use the Django ORM to query the database as well as Pandas for manipulating the query results into usable CSVs for Amazon Personalize.

### Installing Personalize Into Boto3

Given that Amazon Personalize is a service in preview, the boto3 library and AWS CLI must be updated manually in order to allow calls to be enabled.

In [1]:
import boto3

import json
import numpy as np
import pandas as pd
import time

!wget -N https://s3-us-west-2.amazonaws.com/personalize-cli-json-models/personalize.json
!wget -N https://s3-us-west-2.amazonaws.com/personalize-cli-json-models/personalize-runtime.json
!aws configure add-model --service-model file://`pwd`/personalize.json --service-name personalize
!aws configure add-model --service-model file://`pwd`/personalize-runtime.json --service-name personalize-runtime

personalize = boto3.client(service_name='personalize', endpoint_url='https://personalize.us-east-1.amazonaws.com')
personalize_runtime = boto3.client(service_name='personalize-runtime', endpoint_url='https://personalize-runtime.us-east-1.amazonaws.com')

--2019-02-11 20:34:37--  https://s3-us-west-2.amazonaws.com/personalize-cli-json-models/personalize.json
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.225.88
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.225.88|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘personalize.json’ not modified on server. Omitting download.

--2019-02-11 20:34:37--  https://s3-us-west-2.amazonaws.com/personalize-cli-json-models/personalize-runtime.json
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.225.88
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.225.88|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘personalize-runtime.json’ not modified on server. Omitting download.



### Specify a Bucket and Data Output Location

You will need to create an S3 bucket in order to store all of your data for Amazon Personalize.

Create a bucket by opening a new tab and visiting: https://s3.console.aws.amazon.com/s3/home?region=us-east-1#

Once there click the `Create Bucket` button. You will then be prompted for a unique name, please choose one. It is recommended to use something relevant to the lab like `first_namedjangopersonalize` then click `Next`. Click `Next` again. All of the options that are default are fine so click `Next` once more. The same goes for permissions so click `Create Bucket` and note the name of the bucket you created. You will update the cell below with that value.

Once the bucket has been created, continue on with the process below

Next specify the bucket that you will be using for your own data, this is where you will initially upload CSV copies of the data extracted from your database.

The filenames should reflect their role in the Personalize process.

In [2]:
bucket = "djangopersonalizedemo"           # replace with the name of your S3 bucket


user_metadata = "user_metadata.csv"
item_metadata = "item_metadata.csv"
user_interaction = "user_interaction.csv"

### Export, Prepare, and Upload Training Data

First in this process we will export the user_metadata items and place them in a CSV.


In [3]:
# Imports for Django and Pandas

import json
import datetime
import django
django.setup()

from movielens.models import User
from movielens.models import Item
from movielens.models import UserData

#### Prepare and Upload User Data

In [4]:
user_df = pd.DataFrame(list(User.objects.all().values('user_id', 'age','gender','occupation','zip_code')))
pd.set_option('display.max_rows', 5)
user_df.columns = ['AGE', 'GENDER', 'OCCUPATION', 'USER_ID', 'ZIP_CODE']
user_df.head()

Unnamed: 0,AGE,GENDER,OCCUPATION,USER_ID,ZIP_CODE
0,24,M,technician,1,85711
1,53,F,other,2,94043
2,23,M,writer,3,32067
3,24,M,technician,4,43537
4,33,F,other,5,15213


In [5]:
user_df.to_csv(user_metadata, index=False)
boto3.Session().resource('s3').Bucket(bucket).Object(user_metadata).upload_file(user_metadata)

#### Prepare and Upload Item Data

In [24]:
item_df = pd.DataFrame(list(Item.objects.all().values('movie_id', 'movie_title', 'release_date')))
item_df.columns
new_columns = []
for item in item_df.columns:
    new_columns.append(item.upper())
item_df.columns = new_columns
item_df=item_df.rename(columns = {'MOVIE_ID':'ITEM_ID'})
item_df.head()

Unnamed: 0,ITEM_ID,MOVIE_TITLE,RELEASE_DATE
0,1,Toy Story (1995),01-Jan-1995
1,2,GoldenEye (1995),01-Jan-1995
2,3,Four Rooms (1995),01-Jan-1995
3,4,Get Shorty (1995),01-Jan-1995
4,5,Copycat (1995),01-Jan-1995


In [25]:
item_df.to_csv(item_metadata, index=False)
boto3.Session().resource('s3').Bucket(bucket).Object(item_metadata).upload_file(item_metadata)

#### Prepare and Upload User Interaction Data

In [8]:
# Note we are using the Django Queryset functionality to select ratings only >= 3.6
interaction_df = pd.DataFrame(list(UserData.objects.filter(rating__gte=3.6)
                                   .values('user_id', 'item_id', 'rating', 'timestamp')))
interaction_df.columns = ['ITEM_ID', 'RATING', 'TIMESTAMP', 'USER_ID']

interaction_df


Unnamed: 0,ITEM_ID,RATING,TIMESTAMP,USER_ID
0,474,4,884182806,298
1,465,5,891628467,253
...,...,...,...,...
55373,538,4,892685437,676
55374,204,5,879795543,716


In [9]:
interaction_df.to_csv(user_interaction, index=False)
boto3.Session().resource('s3').Bucket(bucket).Object(user_interaction).upload_file(user_interaction)

### Create Schema

The next large step will be creating schemas for all 3 files and then placing them inside the Personalize service.

#### Creating the Interaction Schema First

This is required to make Personalize function so we will start with the last data exported. If you get an error that the resource already exists, change the name variable below.

In [10]:
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "RATING",
            "type": "int"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        },
        {
            "name": "USER_ID",
            "type": "string"
        }
    ],
    "version": "1.0"
}

create_schema_response = personalize.create_schema(
    name = "interactions-schemadjd",
    schema = json.dumps(interactions_schema)
)

schema_arn = create_schema_response['schemaArn']
print(json.dumps(create_schema_response, indent=2))

{
  "schemaArn": "arn:aws:personalize:us-east-1:059124553121:schema/interactions-schemadjd",
  "ResponseMetadata": {
    "RequestId": "de8c7937-967b-4201-b1f4-e1c80db7a9c5",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 11 Feb 2019 20:35:05 GMT",
      "x-amzn-requestid": "de8c7937-967b-4201-b1f4-e1c80db7a9c5",
      "content-length": "88",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


### Create and Wait for Dataset Group

#### Create Dataset Group

In [11]:
create_dataset_group_response = personalize.create_dataset_group(
    name = "personalens-dataset-group"
)

dataset_group_arn = create_dataset_group_response['datasetGroupArn']
print(json.dumps(create_dataset_group_response, indent=2))

{
  "datasetGroupArn": "arn:aws:personalize:us-east-1:059124553121:dataset-group/personalens-dataset-group",
  "ResponseMetadata": {
    "RequestId": "386d2038-8821-45aa-812b-d39d5b496daa",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 11 Feb 2019 20:35:10 GMT",
      "x-amzn-requestid": "386d2038-8821-45aa-812b-d39d5b496daa",
      "content-length": "104",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Wait for Dataset Group to Have ACTIVE Status

In [12]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = dataset_group_arn
    )
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetGroup: CREATE PENDING
DatasetGroup: ACTIVE


### Create Dataset

In [13]:
dataset_type = "INTERACTIONS"
create_dataset_response = personalize.create_dataset(
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = schema_arn
)

dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

{
  "datasetArn": "arn:aws:personalize:us-east-1:059124553121:dataset/personalens-dataset-group/INTERACTIONS",
  "ResponseMetadata": {
    "RequestId": "c6b1db21-26e5-4a9f-9279-caa201f5ad54",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 11 Feb 2019 20:36:21 GMT",
      "x-amzn-requestid": "c6b1db21-26e5-4a9f-9279-caa201f5ad54",
      "content-length": "106",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


### Prepare, Create, and Wait for Dataset Import Job

#### Attach policy to S3 bucket

In [14]:
s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket),
                "arn:aws:s3:::{}/*".format(bucket)
            ]
        }
    ]
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy));

#### Create S3 Read Only Access Role

Note you pay need to update the role_name if you get a notice that it already exists

In [15]:
iam = boto3.client("iam")

role_name = "PersonalizeS3RoleDjangoF"
assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "personalize.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
    ]
}

create_role_response = iam.create_role(
    RoleName = role_name,
    AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
);

iam.attach_role_policy(
    RoleName = role_name,
    PolicyArn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
);

role_arn = create_role_response["Role"]["Arn"]
print(role_arn)

arn:aws:iam::059124553121:role/PersonalizeS3RoleDjangoF


#### Create Dataset Import Job

In [16]:
create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "django-dataset-import-jobI",
    datasetArn = dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, user_interaction)
    },
    roleArn = role_arn
)

dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))

{
  "datasetImportJobArn": "arn:aws:personalize:us-east-1:059124553121:dataset-import-job/django-dataset-import-jobI",
  "ResponseMetadata": {
    "RequestId": "3b60419c-a585-4fa4-951f-6706eb4e5c9f",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 11 Feb 2019 20:37:32 GMT",
      "x-amzn-requestid": "3b60419c-a585-4fa4-951f-6706eb4e5c9f",
      "content-length": "114",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Wait for Dataset Import Job and Dataset Import Job Run to Have ACTIVE Status

In [17]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_import_job_arn
    )
    
    dataset_import_job = describe_dataset_import_job_response["datasetImportJob"]
    if "latestDatasetImportJobRun" not in dataset_import_job:
        status = dataset_import_job["status"]
        print("DatasetImportJob: {}".format(status))
    else:
        status = dataset_import_job["latestDatasetImportJobRun"]["status"]
        print("LatestDatasetImportJobRun: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetImportJob: CREATE PENDING
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: ACTIVE


## Import User Data:

In [18]:
# ['AGE', 'GENDER', 'OCCUPATION', 'USER_ID', 'ZIP_CODE']
user_schema = {
    "type": "record",
    "name": "User",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "AGE",
            "type": "int"
        },
        {
            "name": "GENDER",
            "type": "string"
        },
        {
            "name": "OCCUPATION",
            "type": "string"
        },
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ZIP_CODE",
            "type": "string"
        }
    ],
    "version": "1.0"
}

create_schema_response = personalize.create_schema(
    name = "django-users-schemaF",
    schema = json.dumps(user_schema)
)

schema_arn = create_schema_response['schemaArn']
print(json.dumps(create_schema_response, indent=2))

{
  "schemaArn": "arn:aws:personalize:us-east-1:059124553121:schema/django-users-schemaF",
  "ResponseMetadata": {
    "RequestId": "8195882c-a738-4cb9-95ba-b6c01fa690cb",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 11 Feb 2019 20:52:46 GMT",
      "x-amzn-requestid": "8195882c-a738-4cb9-95ba-b6c01fa690cb",
      "content-length": "86",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


Now given that the dataset group was already created, we will continue to use that but updating the dataset type below, then uploading our data into the system.

In [19]:
dataset_type = "USERS"
create_dataset_response = personalize.create_dataset(
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = schema_arn
)

dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

{
  "datasetArn": "arn:aws:personalize:us-east-1:059124553121:dataset/personalens-dataset-group/USERS",
  "ResponseMetadata": {
    "RequestId": "8599f41c-feb8-4ffc-bb40-595e1c926c63",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 11 Feb 2019 20:52:51 GMT",
      "x-amzn-requestid": "8599f41c-feb8-4ffc-bb40-595e1c926c63",
      "content-length": "99",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In [20]:
create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "django-import-usersF",
    datasetArn = dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, user_metadata)
    },
    roleArn = role_arn
)

dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))

{
  "datasetImportJobArn": "arn:aws:personalize:us-east-1:059124553121:dataset-import-job/django-import-usersF",
  "ResponseMetadata": {
    "RequestId": "56789edc-71a6-4f10-a904-3ef3f87b61f7",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 11 Feb 2019 20:52:57 GMT",
      "x-amzn-requestid": "56789edc-71a6-4f10-a904-3ef3f87b61f7",
      "content-length": "108",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In [21]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_import_job_arn
    )
    
    dataset_import_job = describe_dataset_import_job_response["datasetImportJob"]
    if "latestDatasetImportJobRun" not in dataset_import_job:
        status = dataset_import_job["status"]
        print("DatasetImportJob: {}".format(status))
    else:
        status = dataset_import_job["latestDatasetImportJobRun"]["status"]
        print("LatestDatasetImportJobRun: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetImportJob: CREATE PENDING
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: ACTIVE


## Import Item Data:

In [26]:
item_schema = {
    "type": "record",
    "name": "Item",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "MOVIE_TITLE",
            "type": "string"
        },
        {
            "name": "RELEASE_DATE",
            "type": "string"
        },
    ],
    "version": "1.0"
}

create_schema_response = personalize.create_schema(
    name = "django-items-schema-finalF",
    schema = json.dumps(item_schema)
)

schema_arn = create_schema_response['schemaArn']
print(json.dumps(create_schema_response, indent=2))

{
  "schemaArn": "arn:aws:personalize:us-east-1:059124553121:schema/django-items-schema-finalF",
  "ResponseMetadata": {
    "RequestId": "81a1d44d-1a80-466b-8c95-bca2a5de5e7e",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 11 Feb 2019 21:34:40 GMT",
      "x-amzn-requestid": "81a1d44d-1a80-466b-8c95-bca2a5de5e7e",
      "content-length": "92",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In [27]:
dataset_type = "ITEMS"
create_dataset_response = personalize.create_dataset(
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = schema_arn
)

dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

{
  "datasetArn": "arn:aws:personalize:us-east-1:059124553121:dataset/personalens-dataset-group/ITEMS",
  "ResponseMetadata": {
    "RequestId": "a547fba4-4f75-4af1-b450-8e0ea82ef7b5",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 11 Feb 2019 21:34:48 GMT",
      "x-amzn-requestid": "a547fba4-4f75-4af1-b450-8e0ea82ef7b5",
      "content-length": "99",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In [29]:
create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "django-import-items-finalF",
    datasetArn = dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, item_metadata)
    },
    roleArn = role_arn
)

dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))

{
  "datasetImportJobArn": "arn:aws:personalize:us-east-1:059124553121:dataset-import-job/django-import-items-finalF",
  "ResponseMetadata": {
    "RequestId": "8f3325f2-176a-47f8-ba23-abcd4a1edc63",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 11 Feb 2019 21:35:00 GMT",
      "x-amzn-requestid": "8f3325f2-176a-47f8-ba23-abcd4a1edc63",
      "content-length": "114",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In [30]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_import_job_arn
    )
    
    dataset_import_job = describe_dataset_import_job_response["datasetImportJob"]
    if "latestDatasetImportJobRun" not in dataset_import_job:
        status = dataset_import_job["status"]
        print("DatasetImportJob: {}".format(status))
    else:
        status = dataset_import_job["latestDatasetImportJobRun"]["status"]
        print("LatestDatasetImportJobRun: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetImportJob: CREATE PENDING
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: CREATE IN_PROGRESS
LatestDatasetImportJobRun: ACTIVE


### Select Recipe

In [32]:
recipe_list = [
    "arn:aws:personalize:::recipe/awspersonalizehrnnmodel",
    "arn:aws:personalize:::recipe/awspersonalizedeepfmmodel",
    "arn:aws:personalize:::recipe/awspersonalizesimsmodel",
    "arn:aws:personalize:::recipe/awspersonalizeffnnmodel",
    "arn:aws:personalize:::recipe/popularity-baseline"
]

recipe_arn = recipe_list[0]
print(recipe_arn)

arn:aws:personalize:::recipe/awspersonalizehrnnmodel


### Create and Wait for Solution

#### Create Solution

In [33]:
create_solution_response = personalize.create_solution(
    name = "Dj-movielens-soln",
    datasetGroupArn = dataset_group_arn,
    recipeArn = recipe_arn,
    minProvisionedTPS = 1
)

solution_arn = create_solution_response['solutionArn']
print(json.dumps(create_solution_response, indent=2))

{
  "solutionArn": "arn:aws:personalize:us-east-1:059124553121:solution/Dj-movielens-soln",
  "ResponseMetadata": {
    "RequestId": "93eff6ca-c227-4e8b-a6b7-a8235c488d43",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 11 Feb 2019 21:53:33 GMT",
      "x-amzn-requestid": "93eff6ca-c227-4e8b-a6b7-a8235c488d43",
      "content-length": "87",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Wait for Solution to Have ACTIVE Status

In [34]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_solution_response = personalize.describe_solution(
        solutionArn = solution_arn
    )
    status = describe_solution_response["solution"]["status"]
    print("Solution: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

Solution: CREATE PENDING
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE IN_PROGRESS
Solution: CREATE I

#### Get Metrics of Solution

In [35]:
get_metrics_response = personalize.get_metrics(
    solutionArn = solution_arn
)

print(json.dumps(get_metrics_response, indent=2))

{
  "metrics": {
    "arn:aws:personalize:us-east-1:059124553121:model/awspersonalizehrnnmodel-eecea2ab": {
      "_num_evaluation_users": 91.0,
      "_num_unique_items": 1448.0,
      "_user_history_length_10_pct_quantile": 12.0,
      "_user_history_length_50_pct_quantile": 34.0,
      "_user_history_length_90_pct_quantile": 140.0,
      "_user_history_length_mean": 53.18681318681319,
      "coverage": 0.2776243093922652,
      "mean_reciprocal_rank": 0.025642051692471864,
      "normalized_discounted_cumulative_gain_at_10": 0.045510255107105546,
      "normalized_discounted_cumulative_gain_at_25": 0.065279773604059,
      "normalized_discounted_cumulative_gain_at_5": 0.02671073140739992,
      "precision_at_10": 0.00989010989010989,
      "precision_at_25": 0.007032967032967033,
      "precision_at_5": 0.008791208791208791
    }
  },
  "ResponseMetadata": {
    "RequestId": "26a6f26a-b8f3-4bc7-9afa-69a0a769e324",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type":

### Create and Wait for Campaign

#### Create campaign

In [36]:
create_campaign_response = personalize.create_campaign(
    name = "Dj-campaign",
    solutionArn = solution_arn,
    updateMode = "MANUAL"
)

campaign_arn = create_campaign_response['campaignArn']
print(json.dumps(create_campaign_response, indent=2))

{
  "campaignArn": "arn:aws:personalize:us-east-1:059124553121:campaign/Dj-campaign",
  "ResponseMetadata": {
    "RequestId": "387c1b3a-1256-40d8-995e-8f4b5275fb0d",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 11 Feb 2019 23:07:57 GMT",
      "x-amzn-requestid": "387c1b3a-1256-40d8-995e-8f4b5275fb0d",
      "content-length": "81",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


### Capture the campaignARN

In the above response you will see a campaign arn like `arn:aws:personalize:us-east-1:059124553121:campaign/Dj-campaign` save this for use when you go back to the application side. That is how you will communicate with your model in order to generate recommendations.

#### Wait for Campaign to Have ACTIVE Status

In [37]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_campaign_response = personalize.describe_campaign(
        campaignArn = campaign_arn
    )
    status = describe_campaign_response["campaign"]["status"]
    print("Campaign: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

Campaign: CREATE PENDING
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: ACTIVE


### Get Recommendations

#### Call GetRecommendations

In [38]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = str(155),
)


print("Recommendations: ")
item_list = get_recommendations_response['itemList']
for item in item_list:
    print(item_df.iloc[int(item['itemId'])]['MOVIE_TITLE'])
#print("Recommendations: {}".format(json.dumps(title_list, indent=2)))

Recommendations: 
Marvin's Room (1996)
3 Ninjas: High Noon At Mega Mountain (1998)
Evita (1996)
Full Monty, The (1997)
Starship Troopers (1997)
George of the Jungle (1997)
Wag the Dog (1997)
Fierce Creatures (1997)
Game, The (1997)
In & Out (1997)
MatchMaker, The (1997)
Dark City (1998)
Donnie Brasco (1997)
Heat (1995)
L.A. Confidential (1997)
Wings of the Dove, The (1997)
As Good As It Gets (1997)
Soul Food (1997)
Event Horizon (1997)
Replacement Killers, The (1998)
U Turn (1997)
Fly Away Home (1996)
One Night Stand (1997)
Cop Land (1997)
Devil's Advocate, The (1997)


### Wrapping Up Here

Now that you have completed this notebook it is time to go back to the documentation, specifically the `IntegratingAmazonPersonalizeWithDjango.md` file.
It will be linked to in the page that referred you to this notebook.