## Configure Amazon OpenSearch

### Install required packages

In [None]:
!pip install requests_auth_aws_sigv4
!pip install requests
!pip install opensearch-py

### Reload modules

In [None]:
%reload_ext autoreload
%autoreload 2

### Imports

In [None]:
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_auth_aws_sigv4 import AWSSigV4
import utils

#### Import the saved variables from the previous notebook.

In [None]:
%store -r

### Define the Amazon OpenSearch endpoints
You can find these values in the `Outputs` tab of the CloudFormation stack.

In [None]:
HOST = "REPLACE_ME_WITH_HOST"
PORT = "443"
endpoint = f"https://{HOST}/"
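
If you prefer not to copy the hostname by hand, the sketch below shows one way to pull a value out of a CloudFormation `describe_stacks` response and reduce it to a bare hostname. The output key `DomainEndpoint` and the sample hostname are assumptions made for illustration; check the `Outputs` tab for the key your stack actually uses.

```python
def find_stack_output(describe_stacks_response, output_key):
    """Return the OutputValue for output_key from a describe_stacks response."""
    for stack in describe_stacks_response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == output_key:
                return output["OutputValue"]
    raise KeyError(f"Output {output_key!r} not found in stack outputs")


def normalize_host(value):
    """Strip any scheme and trailing slash so only the hostname remains."""
    host = value.strip()
    for scheme in ("https://", "http://"):
        if host.startswith(scheme):
            host = host[len(scheme):]
    return host.rstrip("/")


# Hand-written sample response. With boto3 you would instead call
# boto3.client("cloudformation").describe_stacks(StackName="<your stack name>").
sample_response = {
    "Stacks": [{"Outputs": [
        # "DomainEndpoint" is a hypothetical output key.
        {"OutputKey": "DomainEndpoint",
         "OutputValue": "https://search-example.us-east-1.es.amazonaws.com/"},
    ]}]
}
example_host = normalize_host(find_stack_output(sample_response, "DomainEndpoint"))
print(example_host)
```

Normalizing the value matters because `endpoint` is built by adding `https://` in front of `HOST`, so a pasted URL that already includes the scheme would produce an invalid endpoint.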

### Initialize variables

In [None]:
bucket_prefix = 'personalized-opensearch-ranking'
domain_name = 'os-domain'

### Create an IAM service role for OpenSearch Service, and grant it permission to get a personalized ranking from your Amazon Personalize campaign.

In [None]:
role_suffix = "opensearch-role-for-personalize"

In [None]:
role_arn_for_personalize = utils.create_iam_role_for_personalize(role_suffix, campaign_arn)
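
The helper in `utils` hides the policy documents involved. As a rough sketch of what such a role typically needs, the code below builds a trust policy that lets OpenSearch Service assume the role and a permissions policy that allows `personalize:GetPersonalizedRanking` on one campaign. The function names and the sample ARN are hypothetical, and the documents that `utils.create_iam_role_for_personalize` actually creates may differ.

```python
import json


def build_trust_policy():
    """Trust policy allowing OpenSearch Service (es.amazonaws.com) to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "es.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def build_ranking_policy(campaign_arn):
    """Permissions policy scoped to a single Amazon Personalize campaign."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["personalize:GetPersonalizedRanking"],
            "Resource": campaign_arn,
        }],
    }


# Placeholder ARN for illustration only.
example_campaign_arn = "arn:aws:personalize:us-east-1:123456789012:campaign/example"
print(json.dumps(build_trust_policy(), indent=2))
print(json.dumps(build_ranking_policy(example_campaign_arn), indent=2))
```

Scoping `Resource` to the campaign ARN, rather than `*`, keeps the role limited to the one campaign the plugin calls.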

### Checking the Amazon OpenSearch connection

To connect to the Amazon OpenSearch domain created earlier via CloudFormation, we will use the `opensearch-py` client for Python. OpenSearch Service requires requests to be signed with Signature Version 4, using the `es` service name and the AWS Region where the domain is located. We will use the `requests_auth_aws_sigv4` package to sign the requests. When the auth object is created, it picks up the credentials of the SageMaker notebook's execution role, so the notebook can connect to the domain securely without any hard-coded keys.

In [None]:
auth = AWSSigV4('es')

client = OpenSearch(
    hosts=[{'host': HOST, 'port': PORT}],
    http_auth=auth,
    use_ssl=True,
    connection_class=RequestsHttpConnection
)

info = client.info()
opensearch_version = info["version"]["number"]
print(f"Connection succeeded with version: {opensearch_version}")

If the setup was successful, the cell prints the OpenSearch version of the domain.
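
To make the role of the `es` service name and the Region more concrete, here is a minimal sketch of the Signature Version 4 signing-key derivation using only the standard library. It illustrates the scheme and is not the internals of `requests_auth_aws_sigv4`; the secret key, date, and Region below are placeholders.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Chain HMACs over date, Region, service, and the literal 'aws4_request'."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def credential_scope(date_stamp: str, region: str, service: str) -> str:
    """Scope string that accompanies the signature in the Authorization header."""
    return f"{date_stamp}/{region}/{service}/aws4_request"


# Placeholder inputs; a real secret key must never be hard-coded.
example_key = derive_signing_key("EXAMPLE-SECRET-KEY", "20240101", "us-east-1", "es")
print(credential_scope("20240101", "us-east-1", "es"))
print(len(example_key))
```

Because the service name and Region are mixed into the key, a request signed for a different service or Region is rejected by the domain.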

### Upload the index data to the movies index
We will now upload index data to the movies index in our OpenSearch domain. The repository includes a data file called `data.jsonl`, which holds index information about movies such as the title, genres, and year.

Let's first take a look at the first few records in the file:

In [None]:
!head -10 data.jsonl

In [None]:
utils.bulk_upload("data.jsonl", endpoint, auth)
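
The body of `utils.bulk_upload` is not shown here. As a hedged illustration of what a bulk upload involves, the sketch below turns JSON Lines records into the newline-delimited body that the OpenSearch `_bulk` API expects: an action line followed by a document line, with a trailing newline. It assumes each input line holds one document; if `data.jsonl` already alternates action and document lines, no conversion is needed. The function name, the sample records, and the `id` field are hypothetical.

```python
import json


def build_bulk_body(jsonl_lines, index_name, id_field=None):
    """Build an NDJSON _bulk body with one index action per document."""
    parts = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        action = {"index": {"_index": index_name}}
        if id_field is not None and id_field in doc:
            action["index"]["_id"] = str(doc[id_field])
        parts.append(json.dumps(action))
        parts.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(parts) + "\n"


# Hypothetical records for illustration; the real file may use other fields.
sample_lines = [
    '{"id": 1, "title": "Example Movie", "genres": "Drama", "year": 1999}',
    '{"id": 2, "title": "Another Movie", "genres": "Comedy", "year": 2004}',
]
bulk_body = build_bulk_body(sample_lines, "movies", id_field="id")
print(bulk_body)
```

A body like this is sent with `POST /_bulk` and the `Content-Type: application/x-ndjson` header.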

### Associating the Plugin to the Amazon OpenSearch Domain
Here we associate the `amazon-personalized-ranking` plugin with your domain. The plugin is preinstalled, so you don't have to import it from Amazon S3. You associate it the same way that you associate any other OpenSearch Service package. **If you have already associated the package through the console, you can skip this step.**

This step can take about 20 to 30 minutes to complete.

In [None]:
package_id = utils.get_opensearch_package_id('amazon-personalized-ranking', opensearch_version)

In [None]:
print(f"Associating package {package_id} with domain {domain_name}")
utils.associate_package(package_id, domain_name)
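
Because the association runs for a while, it can help to poll until the package becomes active. The sketch below is one way to do that and makes no claim about whether `utils.associate_package` already waits. `wait_for_status` and `package_status_from_boto3` are hypothetical helpers; the second one relies on the boto3 `opensearch` client's `list_packages_for_domain` call and is defined here but not executed.

```python
import time


def wait_for_status(get_status, target="ACTIVE", failed=("ASSOCIATION_FAILED",),
                    interval_seconds=60, timeout_seconds=3600, sleep=time.sleep):
    """Call get_status() repeatedly until it returns target, fails, or times out."""
    waited = 0
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"Package association ended with status {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Still {status} after {waited} seconds")
        sleep(interval_seconds)
        waited += interval_seconds


def package_status_from_boto3(domain_name, package_id):
    """Look up the association status of one package on a domain (needs boto3)."""
    import boto3  # imported lazily so the sketch runs without AWS access
    client = boto3.client("opensearch")
    response = client.list_packages_for_domain(DomainName=domain_name)
    for details in response["DomainPackageDetailsList"]:
        if details["PackageID"] == package_id:
            return details["DomainPackageStatus"]
    return None


# Demonstration with a fake status source instead of a real AWS call.
_fake_statuses = iter(["ASSOCIATING", "ASSOCIATING", "ACTIVE"])
final_status = wait_for_status(lambda: next(_fake_statuses),
                               interval_seconds=0, sleep=lambda seconds: None)
print(final_status)
```

In the notebook you could pass `lambda: package_status_from_boto3(domain_name, package_id)` as `get_status`.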

### Configuring the plugin

After you install the Amazon Personalize Search Ranking plugin, you're ready to configure it by creating an OpenSearch search pipeline.

A search pipeline is a set of request and response processors that run sequentially in the order that you create them. When you create a search pipeline for the plugin, you specify a `personalized_search_ranking` response processor.

You can use the following method to create a search pipeline with a `personalized_search_ranking` response processor on an OpenSearch Service domain.

In [None]:
utils.update_pipeline("intelligent_ranking", "0.7", campaign_arn, role_arn_for_personalize, region, HOST, PORT)

where:

* **intelligent_ranking** = A name that you want to give the pipeline
* **0.7** = Weight. The emphasis that the response processor puts on personalization when it re-ranks results. Specify a value from 0.0 to 1.0. The closer the value is to 1.0, the more likely it is that results from Amazon Personalize rank higher. If you specify 0.0, no personalization occurs and the OpenSearch ranking takes precedence.
* **campaign_arn** = The Amazon Resource Name (ARN) of the Amazon Personalize campaign to use, to personalize results
* **role_arn_for_personalize** = The ARN of the IAM role that you created earlier to give OpenSearch Service access to your Amazon Personalize resources. The processor receives it as `iam_role_arn`.
* **region** = The AWS Region where you created your Amazon Personalize campaign
* **HOST** = The OpenSearch domain endpoint hostname 
* **PORT** = The OpenSearch domain endpoint port 
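
To show how these parameters fit together, here is a sketch of the request body that a `personalized_search_ranking` pipeline takes, based on the documented processor fields. `build_pipeline_body` is a hypothetical helper and the ARNs are placeholders; the body that `utils.update_pipeline` actually sends may differ.

```python
import json


def build_pipeline_body(weight, campaign_arn, iam_role_arn, aws_region, item_id_field=None):
    """Build a search pipeline body with one personalized_search_ranking processor."""
    weight_value = float(weight)
    if not 0.0 <= weight_value <= 1.0:
        raise ValueError(f"weight must be between 0.0 and 1.0, got {weight}")
    processor = {
        "campaign_arn": campaign_arn,
        "recipe": "aws-personalized-ranking",
        "weight": str(weight),
        "tag": "personalize-processor",
        "iam_role_arn": iam_role_arn,
        "aws_region": aws_region,
        "ignore_failure": True,
    }
    if item_id_field is not None:
        # Only needed when item IDs live in a document field other than _id.
        processor["item_id_field"] = item_id_field
    return {
        "description": "Re-rank results with Amazon Personalize",
        "response_processors": [{"personalized_search_ranking": processor}],
    }


# Placeholder ARNs and Region for illustration only.
pipeline_body = build_pipeline_body(
    "0.7",
    "arn:aws:personalize:us-east-1:123456789012:campaign/example",
    "arn:aws:iam::123456789012:role/example-role",
    "us-east-1",
)
print(json.dumps(pipeline_body, indent=2))
```

A body like this is sent with `PUT /_search/pipeline/intelligent_ranking`, for example through `client.transport.perform_request("PUT", "/_search/pipeline/intelligent_ranking", body=pipeline_body)`.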

After you create a search pipeline with a personalized_search_ranking response processor, you're ready to start applying the plugin to OpenSearch queries. You can apply it to an OpenSearch index or an individual OpenSearch query. For more information, see [Applying the plugin to OpenSearch queries](https://docs.aws.amazon.com/personalize/latest/dg/opensearch-personalizing-results.html).
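
As a preview of applying the plugin to a single query, the sketch below builds a search body that passes the user ID to Amazon Personalize through the `ext.personalize_request_parameters` field. The helper name, the search text, the user ID, and the choice of `title` and `genres` as search fields are assumptions made for illustration.

```python
import json


def build_personalized_query(search_text, user_id, fields):
    """Build a search body that carries the user ID for personalized re-ranking."""
    return {
        "query": {
            "multi_match": {"query": search_text, "fields": list(fields)},
        },
        "ext": {
            "personalize_request_parameters": {"user_id": str(user_id)},
        },
    }


# Hypothetical search text and user ID.
query_body = build_personalized_query("space adventure", 42, ["title", "genres"])
print(json.dumps(query_body, indent=2))

# To run it against the domain, name the pipeline in the request, for example:
#   client.search(index="movies", body=query_body,
#                 params={"search_pipeline": "intelligent_ranking"})
# To apply the pipeline to every query on an index instead, set the index
# setting "index.search.default_pipeline" to the pipeline name.
```

Naming the pipeline per request keeps other queries on the index unaffected, while the index setting makes personalization the default.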

We will save these variables to use later in the [3.Testing.ipynb](./3.Testing.ipynb) notebook.

In [None]:
%store HOST
%store PORT
%store endpoint
%store region
%store campaign_arn
%store role_arn_for_personalize