In [None]:
#Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
#SPDX-License-Identifier: MIT-0

# OpenSearch Serverless Collection Creation

This notebook demonstrates how to create an Amazon OpenSearch Serverless collection using the AWS SDK for Python (Boto3). OpenSearch Serverless is an on-demand, serverless option for Amazon OpenSearch Service that lets you run search and analytics workloads without managing OpenSearch clusters yourself. It simplifies deployment and management by automatically provisioning, configuring, and scaling the resources required to run OpenSearch.

### Install required libraries
The following cell installs the Python libraries listed in the `requirements.txt` file.

In [None]:
#This cell installs the required libraries specified in the 'requirements.txt' file
!pip install -r requirements.txt --quiet

### Required permissions

The role or user that runs this notebook needs several policies attached, including `AmazonBedrockFullAccess`, `AmazonOpenSearchServiceFullAccess`, and the following policy for OpenSearch Serverless. This policy grants full access to the OpenSearch Serverless service (`aoss:*` on all resources), allowing you to create, manage, and delete OpenSearch Serverless resources.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "aoss:*",
                "Resource": "*"
            }
        ]
    }

### OpenSearch Collection Creation
The next code cell imports the necessary Python libraries and defines a function `create_opensearch_collection` that creates an OpenSearch Serverless Collection. This function takes two arguments: `collection_name` (the desired name for the collection) and `open_search_access_role` (the ARN of the IAM role that should have access to the collection). It performs the following steps:

1. Initializes the Boto3 client for OpenSearch Serverless.
2. Defines a network security policy (public access to the collection and its dashboard) and an encryption security policy (AWS owned key) for the collection.
3. Creates the network security policy and encryption security policy using the OpenSearch Serverless client.
4. If an `open_search_access_role` is provided, it creates a data access policy that grants the specified role permissions to perform various operations on the collection and its indices.
5. Creates the OpenSearch Serverless Collection with the specified name and type `VECTORSEARCH`.
6. Returns the names of the created security policies and the collection response.

In [None]:
import boto3
import json
import pprint as pp


def create_opensearch_collection(collection_name, open_search_access_role):
    # Initialize boto3 clients
    opensearch_client = boto3.client('opensearchserverless')

    # Define network security policy
    network_security_policy = json.dumps(
        [
            {
                "Rules": [
                {
                    "Resource": [
                    f"collection/{collection_name}"
                    ],
                    "ResourceType": "dashboard"
                },
                {
                    "Resource": [
                    f"collection/{collection_name}"
                    ],
                    "ResourceType": "collection"
                }
                ],
                "AllowFromPublic": True
            }
            ]
    )

    
    # Define encryption security policy (encrypts the collection with an AWS owned key)
    encryption_security_policy = json.dumps(
        {
            "Rules": [
                {
                    "Resource": [
                        f"collection/{collection_name}"
                    ],
                    "ResourceType": "collection",
                }
            ],
            "AWSOwnedKey": True
        },
        indent=2
    )

    # Create network security policy
    net_policy_response = opensearch_client.create_security_policy(
        name=f"{collection_name}-network-policy",
        policy=network_security_policy,
        type='network'
    )
    network_policy_name = net_policy_response["securityPolicyDetail"]["name"]


    # Create encryption security policy
    enc_policy_response = opensearch_client.create_security_policy(
        name=f"{collection_name}-security-policy",
        policy=encryption_security_policy,
        type='encryption'
    )
    encryption_policy_name = enc_policy_response["securityPolicyDetail"]["name"]
    

    # Create data access policy if the access role is provided
    data_access_policy_name = ""

    if open_search_access_role:
        data_access_policy = json.dumps(
            [
                {
                    "Rules": [
                    {
                        "Resource": [
                        f"collection/{collection_name}"
                        ],
                        "Permission": [
                        "aoss:CreateCollectionItems",
                        "aoss:DeleteCollectionItems",
                        "aoss:UpdateCollectionItems",
                        "aoss:DescribeCollectionItems"
                        ],
                        "ResourceType": "collection"
                    },
                    {
                        "Resource": [
                        f"index/{collection_name}/*"
                        ],
                        "Permission": [
                        "aoss:CreateIndex",
                        "aoss:DeleteIndex",
                        "aoss:UpdateIndex",
                        "aoss:DescribeIndex",
                        "aoss:ReadDocument",
                        "aoss:WriteDocument"
                        ],
                        "ResourceType": "index"
                    }
                    ],
                    "Principal": [open_search_access_role],
                    "Description": "data-access-rule"
                }
            ]
        )


        data_access_policy_name = f"{collection_name}-access"
        if len(data_access_policy_name) > 32:
            raise ValueError('Policy name exceeds maximum length of 32 characters')

        cfn_access_policy_response = opensearch_client.create_access_policy(
            name=data_access_policy_name,
            description='Policy for data access',
            policy=data_access_policy,
            type='data',
        )


    # Create OpenSearch collection
    collection_response = opensearch_client.create_collection(
        name=collection_name,
        type='VECTORSEARCH'
    )

    return encryption_policy_name, network_policy_name, data_access_policy_name, collection_response
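
Every policy name in the function above is derived from `collection_name`, but only the data access policy name is length-checked, and that check runs after the two security policies have already been created. The sketch below validates all names up front, before any API call. It assumes the OpenSearch Serverless naming rule of 3 to 32 characters, starting with a lowercase letter and containing only lowercase letters, digits, and hyphens; `derive_policy_names` and `NAME_PATTERN` are hypothetical helpers, not part of Boto3.

```python
import re

# Assumed naming rule for OpenSearch Serverless collections and policies:
# 3-32 characters, a lowercase letter first, then lowercase letters, digits, or hyphens.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9-]{2,31}")


def derive_policy_names(collection_name):
    """Return the names used by create_opensearch_collection, or raise ValueError."""
    names = {
        "collection": collection_name,
        "network": f"{collection_name}-network-policy",
        "encryption": f"{collection_name}-security-policy",
        "data": f"{collection_name}-access",
    }
    for kind, name in names.items():
        # fullmatch ensures the whole name satisfies the rule, not just a prefix
        if not NAME_PATTERN.fullmatch(name):
            raise ValueError(f"Invalid {kind} name: {name!r}")
    return names
```

Calling `derive_policy_names(collection_name)` at the top of the function would fail fast on a bad name. With the suffixes used here, the longest being `-security-policy` at 16 characters, the collection name itself can be at most 16 characters long.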



This code cell first retrieves the ARN of the currently logged-in user or role, then calls the `create_opensearch_collection` function defined in the previous cell to create the OpenSearch Serverless collection.
The function returns the names of the created security policies and the collection response, which is displayed as the cell's output.

In [None]:
# Get the caller identity ARN
sts_client = boto3.client('sts')
caller_identity = sts_client.get_caller_identity()
identity_arn = caller_identity['Arn']
print(identity_arn)

# Create the collection
collection_name = 'semantic-search'
open_search_access_role = identity_arn
encryption_policy_name, network_policy_name, data_access_policy_name, collection_response = create_opensearch_collection(collection_name, open_search_access_role)
collection_response

This code cell extracts the collection ID and region from the collection response obtained in the previous cell. It then constructs the OpenSearch Serverless endpoint host name (`os_host`) by joining the collection ID, the region, and the domain suffix `aoss.amazonaws.com`. Finally, it prints the `os_host` value, which can be used to connect to the OpenSearch Serverless collection and perform various operations.

In [None]:
collection_id = collection_response['createCollectionDetail']['id']

region = collection_response['createCollectionDetail']['arn'].split(":")[3]

os_host = ".".join([collection_id, region, "aoss.amazonaws.com"])

print(os_host)
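
The response from `create_collection` describes a collection that is typically still being created, so the endpoint may not accept requests right away. As a minimal sketch, the helper below polls `batch_get_collection` until the collection status is `ACTIVE`. The function name `wait_for_collection_active` and its polling and timeout defaults are assumptions chosen for illustration.

```python
import time


def wait_for_collection_active(client, collection_name, poll_seconds=10, timeout_seconds=600):
    """Poll batch_get_collection until the collection is ACTIVE and return its details."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.batch_get_collection(names=[collection_name])
        details = response.get("collectionDetails", [])
        if details:
            status = details[0]["status"]
            if status == "ACTIVE":
                return details[0]
            if status == "FAILED":
                raise RuntimeError(f"Collection {collection_name!r} failed to create")
        # Give up once the (assumed) timeout has elapsed
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Collection {collection_name!r} not ACTIVE after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

In this notebook it could be called as `wait_for_collection_active(boto3.client('opensearchserverless'), collection_name)`. The returned details also include a `collectionEndpoint` field, which can be compared against the `os_host` value built above.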

The next code cell saves the values of `collection_id`, `collection_name`, `encryption_policy_name`, `network_policy_name`, `data_access_policy_name`, and `os_host` using IPython's `%store` magic. Subsequent notebooks in the workshop can then restore these values with `%store -r`.

In [None]:
%store collection_id
%store collection_name
%store encryption_policy_name
%store network_policy_name
%store data_access_policy_name
%store os_host
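
The stored names are also what is needed to remove these resources once the workshop is finished. The following is a sketch only, assuming the collection and policies were created exactly as above; `delete_collection_resources` is a hypothetical helper, and the deletion order shown (collection first, then policies) is one reasonable choice rather than a documented requirement.

```python
def delete_collection_resources(client, collection_id, encryption_policy_name,
                                network_policy_name, data_access_policy_name=""):
    """Delete the collection, then the policies that were created for it."""
    client.delete_collection(id=collection_id)
    # The data access policy only exists if an access role was supplied
    if data_access_policy_name:
        client.delete_access_policy(name=data_access_policy_name, type="data")
    client.delete_security_policy(name=network_policy_name, type="network")
    client.delete_security_policy(name=encryption_policy_name, type="encryption")
```

Here `client` is expected to be `boto3.client('opensearchserverless')`, and the remaining arguments are the values saved with `%store`.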