# Creating Clickstream Data

Recommendations become more context-aware and accurate when they incorporate user behavior such as clicks. This notebook assumes you have fully deployed the Personalens project in your AWS account, including both the application and the Personalize service.

## Getting Started

Before running the rest of this notebook, fill out the variables below:

In [9]:
# Obtain this from the DB notebook
campaignArn = "arn:aws:personalize:us-east-1:059124553121:campaign/Django-campaign"


In [5]:
# Imports
import boto3

import json
import numpy as np
import pandas as pd
import time

!wget -N https://s3-us-west-2.amazonaws.com/personalize-cli-json-models/personalize.json
!wget -N https://s3-us-west-2.amazonaws.com/personalize-cli-json-models/personalize-runtime.json
!aws configure add-model --service-model file://`pwd`/personalize.json --service-name personalize
!aws configure add-model --service-model file://`pwd`/personalize-runtime.json --service-name personalize-runtime

personalize = boto3.client(service_name='personalize', endpoint_url='https://personalize.us-east-1.amazonaws.com')
personalize_runtime = boto3.client(service_name='personalize-runtime', endpoint_url='https://personalize-runtime.us-east-1.amazonaws.com')



--2019-02-06 17:10:22--  https://s3-us-west-2.amazonaws.com/personalize-cli-json-models/personalize.json
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.216.192
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.216.192|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘personalize.json’ not modified on server. Omitting download.

--2019-02-06 17:10:22--  https://s3-us-west-2.amazonaws.com/personalize-cli-json-models/personalize-runtime.json
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.216.192
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.216.192|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘personalize-runtime.json’ not modified on server. Omitting download.



In [6]:
# Imports for Django and Pandas

import json
import datetime
import django
django.setup()

from movielens.models import User
from movielens.models import Item
from movielens.models import UserData

## Creating an IAM Role

Follow the instructions at the link below to create a role with the required permissions, then save the role ARN in the roleArn variable below.

https://docs.aws.amazon.com/personalize/latest/dg/getting-started.html#gs-create-role-with-permissions

In [11]:
roleArn = "arn:aws:iam::059124553121:role/PersonalizeRole"

## Creating a Dataset Group for the Clickstream Data

In [7]:
personalize = boto3.client('personalize')
response = personalize.create_dataset_group(name="MovieClickGroup")
print(response)

{'datasetGroupArn': 'arn:aws:personalize:us-east-1:059124553121:dataset-group/MovieClickGroup', 'ResponseMetadata': {'RequestId': '8f2dbdc6-47f3-4d0a-a8a5-81584c7287a9', 'HTTPStatusCode': 200, 'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1', 'date': 'Wed, 06 Feb 2019 17:16:14 GMT', 'x-amzn-requestid': '8f2dbdc6-47f3-4d0a-a8a5-81584c7287a9', 'content-length': '94', 'connection': 'keep-alive'}, 'RetryAttempts': 0}}
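Creating a dataset group is asynchronous as far as I know, so the group may not be usable the moment the call returns. The following is a minimal polling sketch, not part of the original notebook: `wait_until_active` is a hypothetical helper, and the status strings (`ACTIVE`, `CREATE FAILED`) are assumed to match what `describe_dataset_group` reports.

```python
import time

def wait_until_active(describe_status, timeout_s=600, poll_s=10, sleep=time.sleep):
    """Poll describe_status() until it returns 'ACTIVE'.

    describe_status is any zero-argument callable returning a status string.
    Raises RuntimeError if the status contains 'FAILED' and TimeoutError if
    the resource is still not active after roughly timeout_s seconds.
    """
    waited = 0
    while True:
        status = describe_status()
        if status == "ACTIVE":
            return status
        if "FAILED" in status:
            raise RuntimeError("resource entered status " + status)
        if waited >= timeout_s:
            raise TimeoutError("resource still " + status + " after " + str(waited) + "s")
        sleep(poll_s)
        waited += poll_s

# Hypothetical usage against the client created above (assumes the response
# nests the status under 'datasetGroup'):
# wait_until_active(
#     lambda: personalize.describe_dataset_group(
#         datasetGroupArn=response["datasetGroupArn"]
#     )["datasetGroup"]["status"]
# )
```

Passing the status lookup as a callable keeps the helper reusable for other Personalize resources and lets it be exercised without AWS credentials.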


## Getting a Tracking ID

A tracking ID associates an event with a dataset group and authorizes you to send data to Amazon Personalize. You generate a tracking ID by calling the CreateEventTracker API, supplying the dataset group ARN and the ARN of the IAM role that you created in the "Creating an IAM Role" section above.

In [12]:
# Obtain the dataset group ARN from the previous output:
datasetGroupArn = "arn:aws:personalize:us-east-1:059124553121:dataset-group/MovieClickGroup"
response = personalize.create_event_tracker(
    name="MovieClickTracker",
    datasetGroupArn=datasetGroupArn,
    roleArn=roleArn
)
print(response['eventTrackerArn'])
print(response['trackingId'])

arn:aws:personalize:us-east-1:059124553121:event-tracker/23a51644
32693937-8dc7-4a8e-8215-8c27c3025167


In [None]:
# Save the Tracker ID and ARN:
eventTrackerArn = "arn:aws:personalize:us-east-1:059124553121:event-tracker/23a51644"
trackingId = "32693937-8dc7-4a8e-8215-8c27c3025167"
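To illustrate how the tracking ID is used once it is saved, here is a rough sketch of assembling a request for the PutEvents API. It is not from the original notebook: `build_put_events_request` is a hypothetical helper, the `personalize-events` client name is assumed to be available in your boto3 version, and the `itemId` property key is an assumption that must agree with whatever schema you register.

```python
import json
import time

def build_put_events_request(tracking_id, user_id, session_id, item_id,
                             event_type="click", sent_at=None):
    """Build keyword arguments for a PutEvents call describing one event."""
    # Assumption: the item is identified through a JSON-encoded properties string.
    properties = {"itemId": str(item_id)}
    return {
        "trackingId": tracking_id,
        "userId": str(user_id),
        "sessionId": str(session_id),
        "eventList": [{
            # Unix timestamp in seconds; defaults to the current time.
            "sentAt": int(sent_at if sent_at is not None else time.time()),
            "eventType": event_type,
            "properties": json.dumps(properties),
        }],
    }

request = build_put_events_request(
    tracking_id="32693937-8dc7-4a8e-8215-8c27c3025167",
    user_id=1, session_id="session-1", item_id=42, sent_at=1549473374,
)
print(json.dumps(request, indent=2))

# Hypothetical send step, left commented out because it needs AWS credentials:
# import boto3
# boto3.client("personalize-events").put_events(**request)
```

Building the payload separately from sending it makes it easy to inspect or log events before they reach Amazon Personalize.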

## Define the Interaction Schema

In [None]:
schema = {
    "schema": {
        "name": "click-stream-schema",
        "schemaArn": "arn:aws:personalize:us-west-2:acct-id:schema/event-interactions-schema",
        # Avro record describing one interaction event
        "schema": {
          "type": "record",
          "name": "Interactions",
          "namespace": "com.amazonaws.concierge.schema",
          "fields": [
            {
              "name": "user_id",
              "type": "string"
            },
            {
              "name": "session_id",
              "type": "string"
            },
            {
              "name": "timestamp",
              "type": "long"
            },
            {
              "name": "event_type",
              "type": "string"
            },
            {
              "name": "item_id",
              "type": "string"
            },
            {
              "name": "event_value",
              "type": "string"
            }
          ],
          "version": "1.0"
        }
    }
}
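As a possible next step, the Avro record can be serialized and registered with Personalize. The sketch below is an assumption-laden illustration rather than the notebook's own code: `build_create_schema_request` is a hypothetical helper, the field list simply mirrors the definition above, and the CreateSchema call is assumed to take the Avro definition as a JSON string.

```python
import json

# Avro record mirroring the interaction fields defined above.
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.concierge.schema",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "session_id", "type": "string"},
        {"name": "timestamp", "type": "long"},
        {"name": "event_type", "type": "string"},
        {"name": "item_id", "type": "string"},
        {"name": "event_value", "type": "string"},
    ],
    "version": "1.0",
}

def build_create_schema_request(name, avro_schema):
    """Return keyword arguments for CreateSchema, with the Avro record as JSON text."""
    # Basic sanity check: every field needs both a name and a type.
    for field in avro_schema["fields"]:
        if "name" not in field or "type" not in field:
            raise ValueError("malformed field: " + repr(field))
    return {"name": name, "schema": json.dumps(avro_schema)}

request = build_create_schema_request("click-stream-schema", interactions_schema)
print(request["schema"])

# Hypothetical registration step, left commented out because it needs AWS credentials:
# import boto3
# response = boto3.client("personalize").create_schema(**request)
# print(response["schemaArn"])
```

Round-tripping the dictionary through `json.dumps` also catches structural mistakes, such as a missing comma between fields, before anything is sent to the service.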