# Environment Setup

Amazon Personalize reads source data from Amazon S3, and can optionally export batch recommendations to it. Before we start using Personalize, we need to set up our bucket(s) and permissions.

Production environments will typically automate this setup via tools like [AWS CloudFormation](https://aws.amazon.com/cloudformation/) and the [AWS Cloud Development Kit](https://aws.amazon.com/cdk/).

Since we're just experimenting, we'll use this notebook instead, which keeps the setup easy to customize for your environment. This assumes you're running the notebook with appropriate IAM and S3 administrative permissions.


In [None]:
# Python Built-Ins:
import json
from time import sleep

# External Dependencies:
import boto3  # (AWS Python SDK)

## Skipping this Notebook

**If** you already have buckets and permissions set up, or plan to work through the following steps in the AWS Console instead of running the Python code, you'll still need to **store** your setup so the rest of the notebooks in this series can use it.

Simply un-comment the code below (select all the contents of the cell and press `Control`+`/`), replace the placeholder values with your own, and then run it:

In [None]:
# region = "ap-southeast-1"  # or "us-east-1", etc.: whichever AWS region you're working in
# %store region
# bucket_name = "DOC-EXAMPLE-BUCKET"  # Whatever you named your data bucket
# %store bucket_name
# export_bucket_name = bucket_name  # (Assuming you want to export results to the same bucket?)
# %store export_bucket_name
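# personalize_role_name = "PersonalizeRolePOC"
# %store personalize_role_name
# personalize_role_arn = "arn:aws:iam::123456789012:role/PersonalizeRolePOC"  # Placeholder: the ARN of your own role
# %store personalize_role_arn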

## Connecting to an AWS Region

Assuming you're running this notebook on [Amazon SageMaker](https://aws.amazon.com/sagemaker/), it will already be associated with a particular [AWS Region](https://aws.amazon.com/about-aws/global-infrastructure/) and be running with certain [AWS IAM Permissions](https://aws.amazon.com/iam/) (defined by the **notebook execution role**).

If you're running the notebook locally, you may need to log in explicitly (e.g. using `aws configure` from the [AWS CLI](https://aws.amazon.com/cli/)) and set the specific region you'd like to work with.

In [None]:
session = boto3.Session(region_name=None)  # To set a specific region, replace None with e.g. "us-east-1"
region = session.region_name  # We'll save the configured region to initialize later notebooks
print(region)
%store region

iam = session.client("iam")
s3 = session.resource("s3")
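
`session.region_name` is `None` when no region is configured, which can happen when running locally, and later steps would then fail in less obvious ways. A minimal sketch of an early check follows; `require_region` is a hypothetical helper name for illustration, not part of boto3:

```python
def require_region(region):
    """Return the region unchanged, or raise a clear error if none is configured."""
    if not region:
        raise ValueError(
            "No AWS region configured: pass region_name to boto3.Session(...) "
            "or set one with `aws configure`."
        )
    return region


# Example with a literal value; in the notebook you would pass session.region_name
print(require_region("us-east-1"))
```

Calling `require_region(session.region_name)` before `%store region` would stop the notebook early rather than storing an empty value for later notebooks.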

## S3 Bucket(s)

Amazon Personalize will read historical data from S3, and may export batch recommendations to S3.

By default, we'll create a single bucket for both, named after your AWS account ID and region (since S3 bucket names must be globally unique).

You can customize this setup (e.g. to use an existing bucket instead) and/or configure through the [Amazon S3 Console](https://s3.console.aws.amazon.com/s3/home).

Just be sure to `%store` a valid `bucket_name` and `export_bucket_name`, both of which exist in the same `region`. We'll use these below and in later notebooks.
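
If you choose your own bucket name, it can help to check it locally before calling S3. The sketch below covers only the main naming rules for general-purpose buckets (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit; no adjacent dots; not formatted like an IP address). It is not exhaustive, and `is_valid_bucket_name` is a hypothetical helper name:

```python
import re

# Lowercase letters, digits, dots and hyphens; must start and end with a letter or digit.
# The pattern also enforces the overall length of 3 to 63 characters.
_BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
_IP_LIKE_RE = re.compile(r"^\d+\.\d+\.\d+\.\d+$")


def is_valid_bucket_name(name):
    """Check a name against the main S3 bucket naming rules (not exhaustive)."""
    if not _BUCKET_NAME_RE.match(name):
        return False  # wrong length or disallowed characters
    if ".." in name:
        return False  # adjacent dots are not allowed
    if _IP_LIKE_RE.match(name):
        return False  # must not look like an IP address
    return True


print(is_valid_bucket_name("123456789012-us-east-1-personalizepocvod"))  # True
print(is_valid_bucket_name("DOC-EXAMPLE-BUCKET"))  # False: uppercase placeholder
```

Note that the `DOC-EXAMPLE-BUCKET` placeholder fails this check because it is uppercase, so it must be replaced with a real name.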

In [None]:
# Choose a source data bucket name:
bucket_name = "{}-{}-personalizepocvod".format(
    session.client("sts").get_caller_identity().get("Account"),
    region,
)
print(bucket_name)
%store bucket_name

# Create the bucket (assuming it's new):
if region != "us-east-1":
    s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={ "LocationConstraint": region })
else:
    s3.create_bucket(Bucket=bucket_name)
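
The cell above assumes the bucket is new, so re-running it can raise an error if the bucket already exists. One possible way to make the step safe to re-run is sketched below. It assumes a boto3 S3 *client* (available from the resource used above as `s3.meta.client`), and `create_bucket_kwargs` and `ensure_bucket` are hypothetical helper names:

```python
def create_bucket_kwargs(bucket_name, region):
    """Build keyword arguments for create_bucket, handling the us-east-1 special case."""
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        # Outside us-east-1, the region is passed as an explicit LocationConstraint
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def ensure_bucket(s3_client, bucket_name, region):
    """Return True if create_bucket succeeded, False if we already own the bucket."""
    try:
        s3_client.create_bucket(**create_bucket_kwargs(bucket_name, region))
        return True
    except s3_client.exceptions.BucketAlreadyOwnedByYou:
        return False


print(create_bucket_kwargs("my-bucket", "us-east-1"))
print(create_bucket_kwargs("my-bucket", "ap-southeast-1"))
```

In the notebook this could be called as `ensure_bucket(s3.meta.client, bucket_name, region)`. A name already taken by another account would still raise, which is usually what you want.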

In [None]:
# We'll assume any exports can go in the same bucket:

export_bucket_name = bucket_name
%store export_bucket_name

## S3 Bucket Policies

To access data in these buckets, Amazon Personalize needs permissions. This means creating **both**:

- An **execution role** with appropriate permissions to grant access to individual **import jobs** running with it, and
- A **bucket policy** to allow the Amazon Personalize **service** access in the first place

Below, we'll attach bucket policies to the buckets created above:

In [None]:
for bucket in set((bucket_name, export_bucket_name)):
    policy = {
        "Version": "2012-10-17",
        "Id": "PersonalizeS3BucketAccessPolicy",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicy",
                "Effect": "Allow",
                "Principal": {
                    "Service": "personalize.amazonaws.com",
                },
                "Action": [
                    "s3:*Object",
                    "s3:ListBucket",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            },
        ],
    }
    s3.BucketPolicy(bucket).put(Policy=json.dumps(policy))
    print(f"Added policy to {bucket}")
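
The policy document in the loop above can also be built by a small function, which makes it easier to inspect or test before applying it. This is only a sketch: `build_bucket_policy` and its `partition` parameter are hypothetical additions (the partition defaults to `aws`, matching the ARNs used above), and the actions are copied unchanged from the cell above:

```python
import json


def build_bucket_policy(bucket, partition="aws"):
    """Build the bucket policy document granting Amazon Personalize access to a bucket."""
    return {
        "Version": "2012-10-17",
        "Id": "PersonalizeS3BucketAccessPolicy",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicy",
                "Effect": "Allow",
                "Principal": {"Service": "personalize.amazonaws.com"},
                "Action": ["s3:*Object", "s3:ListBucket"],
                "Resource": [
                    f"arn:{partition}:s3:::{bucket}",  # the bucket itself (for ListBucket)
                    f"arn:{partition}:s3:::{bucket}/*",  # the objects inside it
                ],
            },
        ],
    }


# Inspect the JSON that would be sent, using a placeholder bucket name
print(json.dumps(build_bucket_policy("DOC-EXAMPLE-BUCKET"), indent=2))
```

The result could then be applied exactly as before, with `s3.BucketPolicy(bucket).put(Policy=json.dumps(build_bucket_policy(bucket)))`.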

## IAM Role for Personalize

As well as the bucket policies, Amazon Personalize needs a **role** that has appropriate access to the buckets and can be assumed by the Personalize service.

By default, we'll create a new role and attach necessary permissions here. You can customize this setup and/or configure through the [AWS IAM Console](https://console.aws.amazon.com/iam/home).

Just be sure to `%store` a valid `personalize_role_arn`, which we'll use in later notebooks.

In [None]:
personalize_role_name = "PersonalizeRolePOC"

create_role_response = iam.create_role(
    RoleName=personalize_role_name,
    AssumeRolePolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "personalize.amazonaws.com",
                },
                "Action": "sts:AssumeRole",
            },
        ]
    }),
)

personalize_role_arn = create_role_response["Role"]["Arn"]
print(personalize_role_arn)
%store personalize_role_arn

# Note that AmazonPersonalizeFullAccess provides access to some specifically-named default S3 buckets as well,
# but we just want it for the Personalize permissions themselves:
iam.attach_role_policy(
    RoleName=personalize_role_name,
    PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess",
)

# By default (since we're experimenting), this code attaches over-generous S3 permissions (full access):
iam.attach_role_policy(
    RoleName=personalize_role_name,
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)
# You could instead use something like the below to give access to *only* the relevant buckets:
# inline_s3_policy = {
#     "Version": "2012-10-17",
#     "Statement": [
#         {
#             "Effect": "Allow",
#             "Action": "s3:*",
#             "Resource": [
#                 # (Assuming you're not running in a different partition e.g. aws-cn)
#                 f"arn:aws:s3:::{bucket_name}",
#                 f"arn:aws:s3:::{bucket_name}/*",
#             ]
#         },
#     ],
# }
# if bucket_name != export_bucket_name:
#     inline_s3_policy["Statement"][0]["Resource"].append(f"arn:aws:s3:::{export_bucket_name}")
#     inline_s3_policy["Statement"][0]["Resource"].append(f"arn:aws:s3:::{export_bucket_name}/*")

# iam.put_role_policy(
#     RoleName=personalize_role_name,
#     PolicyName="PersonalizePoCBucketAccess",
#     PolicyDocument=json.dumps(inline_s3_policy)
# )

# IAM policy attachments *may* take up to a minute to propagate, so just to be safe:
sleep(60) 
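
A fixed `sleep(60)` is the simplest option. An alternative, sketched below, is to retry the first call that depends on the new role, waiting longer after each failure. `retry_with_backoff` is a hypothetical helper, and the exception types worth retrying on are an assumption you would need to adapt to the call being wrapped:

```python
import time


def retry_with_backoff(fn, attempts=5, base_delay=2.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fn() until it succeeds, doubling the wait after each failed attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


# Demo with a function that fails twice, recording delays instead of really sleeping
state = {"calls": 0}
delays = []


def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("role not usable yet")
    return "ok"


print(retry_with_backoff(flaky, sleep=delays.append), delays)
```

In practice you would narrow `retry_on` to the specific error raised while the role is still propagating, so that unrelated failures are not retried.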

In [None]:
# If you get an iam.exceptions.EntityAlreadyExistsException on the role, you can instead look up the existing role:
# personalize_role_arn = iam.get_role(RoleName=personalize_role_name)["Role"]["Arn"]
# print(personalize_role_arn)
# %store personalize_role_arn
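
The create-or-look-up logic from the two cells above can be combined so the step is safe to re-run. The sketch below assumes a boto3 IAM client like the `iam` object created earlier; `get_or_create_role_arn` is a hypothetical helper name, and it does not check whether an existing role's trust policy matches:

```python
import json


def get_or_create_role_arn(iam_client, role_name, trust_policy):
    """Create the role, or look it up if it already exists, and return its ARN."""
    try:
        response = iam_client.create_role(
            RoleName=role_name,
            AssumeRolePolicyDocument=json.dumps(trust_policy),
        )
    except iam_client.exceptions.EntityAlreadyExistsException:
        # The role was created on a previous run: fetch it instead
        response = iam_client.get_role(RoleName=role_name)
    return response["Role"]["Arn"]


# Trust policy allowing the Personalize service to assume the role (same as above)
PERSONALIZE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "personalize.amazonaws.com"},
            "Action": "sts:AssumeRole",
        },
    ],
}
```

In the notebook this could be used as `personalize_role_arn = get_or_create_role_arn(iam, personalize_role_name, PERSONALIZE_TRUST_POLICY)`, followed by the usual `%store personalize_role_arn`.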

## All set!

Your environment should now be all set up with:

- S3 bucket(s) `bucket_name` and `export_bucket_name` for importing source data to Amazon Personalize, and optionally exporting batch recommendations
- An IAM role `personalize_role_arn` granting Amazon Personalize permission to interact with those buckets

You're now ready to move on to the next stage: Preparing input data for the model!