## GenAI Workshop

This workshop walks through setting up an AI-powered system using AWS services, OpenSearch, and Flowise. Part of the infrastructure, including SageMaker, is deployed by the `base-infra` CloudFormation stack. After launching the `base-infra` stack from the AWS Management Console, the remaining setup is completed in a Jupyter notebook: deploying the `knowledge-base` CloudFormation stack, configuring OpenSearch, and integrating with Flowise.


### Dependencies installation

In [1]:
!pip install requests-aws4auth

Collecting requests-aws4auth
  Downloading requests_aws4auth-1.3.1-py3-none-any.whl.metadata (18 kB)
Downloading requests_aws4auth-1.3.1-py3-none-any.whl (24 kB)
Installing collected packages: requests-aws4auth
Successfully installed requests-aws4auth-1.3.1


### AWS User Creation for Flowise Workflow

To integrate with Flowise we need AWS security credentials (`AWS_ACCESS_KEY` and `AWS_SECRET_ACCESS_KEY`), so an IAM user must be set up with the appropriate permissions. Refer to the AWS IAM documentation for creating and managing IAM users and policies:

https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html

Ensure that the user is granted the permissions the Flowise workflow needs (e.g., access to services such as Amazon Bedrock and Amazon SageMaker).
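
As an illustration of what "appropriate permissions" might look like, here is a minimal sketch of an IAM policy document built in Python. The statement ID and the selection of Bedrock actions are assumptions made for this example, not the workshop's actual policy; adjust them to the nodes your Flowise flow really uses.

```python
import json

# Hypothetical minimal policy for the Flowise IAM user. The action list is an
# assumption and should be adapted to the services the flow actually calls.
flowise_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "FlowiseBedrockAccess",  # illustrative name
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
                "bedrock:Retrieve",
                "bedrock:RetrieveAndGenerate",
            ],
            # "*" keeps the sketch short; scope this to specific ARNs in practice.
            "Resource": "*",
        }
    ],
}

# Serialize the policy so it can be pasted into the IAM console.
policy_json = json.dumps(flowise_policy, indent=2)
print(policy_json)
```

The printed JSON can be attached to the IAM user as an inline or managed policy.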

In [2]:
import boto3
import requests
import json
from requests_aws4auth import AWS4Auth

### Creating an OpenSearch Serverless Index

In this step, we create an index in our OpenSearch Serverless collection. This index will be used to store and retrieve vector-based or keyword-based search data. OpenSearch Serverless enables scalable, real-time search and analytics without managing underlying infrastructure.

In [3]:
aoss_client = boto3.client('opensearchserverless')
coll_list_response = aoss_client.list_collections()

matching_collections = [
    collection for collection in coll_list_response.get("collectionSummaries", [])
    if collection['name'] == 'genai-vectors-collection'
]

assert len(matching_collections) == 1, "Expected exactly 1 collection with specified name"

collection = matching_collections[0]
collection_name = collection["name"]
collection_id = collection["id"]  # Get the collection ID

# Now, fetch the collection details using batch_get_collection
collection_details_response = aoss_client.batch_get_collection(ids=[collection_id])

# Extract collection endpoint
collection_detail = collection_details_response.get("collectionDetails", [])[0]
collection_endpoint = collection_detail.get("collectionEndpoint")

print(f"Found Collection: {collection['name']}")
print(f"Endpoint: {collection_endpoint}")

Found Collection: genai-vectors-collection
Endpoint: https://yrkldffv58unjd5jtlre.us-east-1.aoss.amazonaws.com


In [4]:
INDEX_NAME = "genai-vectors-collection"

index_mapping = {
  "settings": {
    "index.knn": True
  },
  "mappings": {
    "properties": {
      "vector-field": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": {
          "engine": "faiss",
          "name": "hnsw"
        }
      },
      "text": {
        "type": "text"
      },
      "metadata": {
        "type": "keyword"
      }
    }
  }
}

# OpenSearch API URL for index creation
index_url = f"{collection_endpoint}/{INDEX_NAME}"

# OpenSearch Serverless expects requests signed with AWS SigV4 (IAM credentials)
headers = {
    "Content-Type": "application/json"
}

session = boto3.Session()
credentials = session.get_credentials()
aws_auth = AWS4Auth(
    credentials.access_key, credentials.secret_key,
    session.region_name, "aoss",
    session_token=credentials.token
)

# Send request to create index
response = requests.put(index_url, auth=aws_auth, headers=headers, data=json.dumps(index_mapping))

# Print response
print("Index Creation Response:", response.status_code, response.text)

Index Creation Response: 200 {"acknowledged":true,"shards_acknowledged":true,"index":"genai-vectors-collection"}
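
Once the index exists, a k-NN search against `vector-field` takes a request body like the one sketched below. `build_knn_query` is a hypothetical helper introduced only for this example, and the zero vector stands in for a real 1536-dimensional embedding; a freshly created index will return no hits until documents are ingested.

```python
import json

VECTOR_FIELD = "vector-field"  # field name from the index mapping above
VECTOR_DIMENSION = 1536        # must match the mapping's "dimension"


def build_knn_query(query_vector, k=3):
    """Build an OpenSearch k-NN search body (hypothetical helper)."""
    if len(query_vector) != VECTOR_DIMENSION:
        raise ValueError(
            f"Expected {VECTOR_DIMENSION} dimensions, got {len(query_vector)}"
        )
    return {
        "size": k,
        "_source": ["text", "metadata"],
        "query": {"knn": {VECTOR_FIELD: {"vector": list(query_vector), "k": k}}},
    }


# A zero vector is a placeholder for an embedding produced by a model.
search_body = build_knn_query([0.0] * VECTOR_DIMENSION, k=3)
print(json.dumps(search_body)[:80], "...")

# To run the search, POST the body to the index's _search endpoint with the
# same signed auth used for index creation, for example:
# requests.post(f"{index_url}/_search", auth=aws_auth, headers=headers,
#               data=json.dumps(search_body))
```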


### Deploying the Knowledge Base CloudFormation Stack
We can deploy the `knowledge-base` CloudFormation stack manually or from this notebook. This stack creates the additional AWS resources that complement our AI system: the vector store and the S3 data source.


In [5]:
cf_client = boto3.client("cloudformation")

KB_STACK_NAME = "wshop-kb"
KB_YAML_FILE_PATH = "../gen-ai-cloudformation/knowledge-base.yaml"

with open(KB_YAML_FILE_PATH, "r") as file:
    template_body = file.read()

response = cf_client.create_stack(
    StackName=KB_STACK_NAME,
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],  # Add if IAM roles are defined in the template
)

print("Stack creation started:", response["StackId"])
response

Stack creation started: arn:aws:cloudformation:us-east-1:329599618802:stack/wshop-kb/d4997050-fd19-11ef-b8c4-0e67ecf200f5


{'StackId': 'arn:aws:cloudformation:us-east-1:329599618802:stack/wshop-kb/d4997050-fd19-11ef-b8c4-0e67ecf200f5',
 'ResponseMetadata': {'RequestId': '6e525c14-ac51-4fcc-9cbf-4cb75811e9d3',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '6e525c14-ac51-4fcc-9cbf-4cb75811e9d3',
   'date': 'Sun, 09 Mar 2025 19:08:05 GMT',
   'content-type': 'text/xml',
   'content-length': '378',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

#### Waiting for operation to complete

In [6]:
import time

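max_time = 30  # seconds; increase if the stack takes longer to create


def wait_for_stack(stack_name):
    """Poll the stack every 5 seconds until it reaches a final or failing status."""
    timeout = time.time() + max_time

    while time.time() < timeout:
        response = cf_client.describe_stacks(StackName=stack_name)
        stack_status = response["Stacks"][0]["StackStatus"]
        print(f"Stack Status: {stack_status}")

        if stack_status in ["CREATE_COMPLETE", "ROLLBACK_IN_PROGRESS", "ROLLBACK_COMPLETE",
                            "ROLLBACK_FAILED", "CREATE_FAILED"]:
            return stack_status

        time.sleep(5)

    # Fail loudly instead of returning silently while creation is still running
    raise TimeoutError(f"Stack '{stack_name}' did not finish within {max_time} seconds")


# Monitor the stack creation process
final_status = wait_for_stack(KB_STACK_NAME)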

Stack Status: CREATE_IN_PROGRESS
Stack Status: CREATE_COMPLETE


Ensure the final status is "CREATE_COMPLETE" before proceeding.
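
As an alternative to hand-written polling, boto3 ships a CloudFormation waiter named `stack_create_complete`. The sketch below wraps it in a hypothetical helper, `wait_for_stack_creation`; the delay and attempt counts are illustrative defaults, not values from the workshop.

```python
def wait_for_stack_creation(cf_client, stack_name, delay=5, max_attempts=60):
    """Block until the stack reaches CREATE_COMPLETE using the boto3 waiter.

    The waiter raises botocore.exceptions.WaiterError if creation fails or
    the attempts run out. delay and max_attempts are illustrative defaults.
    """
    waiter = cf_client.get_waiter("stack_create_complete")
    waiter.wait(
        StackName=stack_name,
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )
    # Read the status back so the caller can log or assert on it.
    stacks = cf_client.describe_stacks(StackName=stack_name)["Stacks"]
    return stacks[0]["StackStatus"]


# Usage in this notebook (assumes cf_client and KB_STACK_NAME from above):
# print(wait_for_stack_creation(cf_client, KB_STACK_NAME))
```

Because the waiter raises on failure, a successful return means the stack is ready for the next steps.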

### Uploading an example document to S3

To test our knowledge base setup, we upload an example document to Amazon S3. This document will be indexed and retrieved later.

In [7]:
s3_client = boto3.client("s3")
sts_client = boto3.client("sts")

account_id = sts_client.get_caller_identity()["Account"]

example_file_name = "wellarchitected-framework-pages-6-11.pdf"
BUCKET_NAME = "genai-workshop-docs-" + account_id
LOCAL_FILE_PATH = "../" + example_file_name
S3_OBJECT_KEY = "docs/" + example_file_name

s3_client.upload_file(LOCAL_FILE_PATH, BUCKET_NAME, S3_OBJECT_KEY)
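
`upload_file` returns nothing on success, so it can be reassuring to confirm that the object landed where expected. The following is a minimal sketch using `head_object`; `confirm_upload` is a hypothetical helper name, and the empty-file check is an added assumption about what counts as a bad upload.

```python
def confirm_upload(s3_client, bucket, key):
    """Return the uploaded object's size in bytes, or raise if it is empty.

    head_object raises botocore.exceptions.ClientError when the key is absent.
    """
    head = s3_client.head_object(Bucket=bucket, Key=key)
    size = head["ContentLength"]
    if size == 0:
        raise ValueError(f"s3://{bucket}/{key} exists but is empty")
    return size


# Usage in this notebook (assumes the variables defined in the cell above):
# print(confirm_upload(s3_client, BUCKET_NAME, S3_OBJECT_KEY), "bytes uploaded")
```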

### Starting a data source synchronization

Once the document is uploaded, we trigger a synchronization (an ingestion job) so that the new document is indexed and becomes available in OpenSearch.

IMPORTANT: Take note of the Knowledge Base ID printed during this step. This ID is required to update the Flowise Agent Flow so that it retrieves documents from the correct knowledge base.

In [8]:
bedrock_client = boto3.client("bedrock-agent")

KB_NAME = "gen-ai-workshop-kb"
DATA_SOURCE_NAME = "genai-workshop-kb-datasource"

kb_list = bedrock_client.list_knowledge_bases()
kb_id = None
for kb in kb_list["knowledgeBaseSummaries"]:
    if kb["name"] == KB_NAME:
        kb_id = kb["knowledgeBaseId"]
        break

if not kb_id:
    raise ValueError(f"Knowledge Base '{KB_NAME}' not found.")
else:
    print(f"Knowledge Base '{KB_NAME}' found with ID: {kb_id}")

Knowledge Base 'gen-ai-workshop-kb' found with ID: WB6LZQI1HJ
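
`list_knowledge_bases` returns results in pages, so in an account with many knowledge bases the one we want may not be on the first page. A minimal sketch of the same lookup using the boto3 paginator follows; `find_knowledge_base_id` is a hypothetical helper name.

```python
def find_knowledge_base_id(bedrock_agent_client, kb_name):
    """Return the ID of the knowledge base named kb_name, or None if absent."""
    paginator = bedrock_agent_client.get_paginator("list_knowledge_bases")
    for page in paginator.paginate():
        for kb in page.get("knowledgeBaseSummaries", []):
            if kb["name"] == kb_name:
                return kb["knowledgeBaseId"]
    return None


# Usage in this notebook (assumes bedrock_client and KB_NAME from above):
# kb_id = find_knowledge_base_id(bedrock_client, KB_NAME)
```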


In [9]:
ds_list = bedrock_client.list_data_sources(knowledgeBaseId=kb_id)
ds_id = None
for ds in ds_list["dataSourceSummaries"]:
    if ds["name"] == DATA_SOURCE_NAME:
        ds_id = ds["dataSourceId"]
        break

if not ds_id:
    raise ValueError(f"Data Source '{DATA_SOURCE_NAME}' not found in KB '{KB_NAME}'.")
else:
    print(f"Data Source '{DATA_SOURCE_NAME}' found")

sync_response = bedrock_client.start_ingestion_job(
    knowledgeBaseId=kb_id,
    dataSourceId=ds_id
)

print("Sync job started:", sync_response["ingestionJob"]["ingestionJobId"])

Data Source 'genai-workshop-kb-datasource' found
Sync job started: AOH1IYLJI9
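
`start_ingestion_job` only starts the synchronization; the job then runs asynchronously. A minimal sketch for waiting until it finishes is shown below. It polls `get_ingestion_job`; the helper name, poll interval, and timeout are assumptions chosen for this example.

```python
import time

# Statuses after which the job will not change any further.
TERMINAL_JOB_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def wait_for_ingestion(client, kb_id, ds_id, job_id,
                       poll_seconds=10, timeout_seconds=600, sleep=time.sleep):
    """Poll the ingestion job until it reaches a terminal status.

    Returns the final ingestionJob dict; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = client.get_ingestion_job(
            knowledgeBaseId=kb_id, dataSourceId=ds_id, ingestionJobId=job_id
        )["ingestionJob"]
        print(f"Ingestion status: {job['status']}")
        if job["status"] in TERMINAL_JOB_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Ingestion job {job_id} did not finish in time")
        sleep(poll_seconds)


# Usage in this notebook (assumes the variables from the cells above):
# job_id = sync_response["ingestionJob"]["ingestionJobId"]
# final_job = wait_for_ingestion(bedrock_client, kb_id, ds_id, job_id)
```

A final status of `COMPLETE` indicates the document has been processed; `FAILED` is worth investigating before moving on to Flowise.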


### Updating the Flowise Agent Flow with the New Knowledge Base ID
Finally, we update the Flowise Agent Flow to use the correct Knowledge Base ID.

Once saved, the Flowise pipeline is ready for use! 🎉
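
If the flow returns no documents, one way to check the knowledge base independently of Flowise is to query it directly through the `bedrock-agent-runtime` client. The sketch below is illustrative only: `preview_retrieval` is a hypothetical helper, and the sample question is an assumption based on the example document's file name.

```python
def preview_retrieval(runtime_client, kb_id, question, top_k=3):
    """Return (score, snippet) pairs for the top matches in the knowledge base."""
    response = runtime_client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": question},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    return [
        # Truncate each passage so the preview stays readable.
        (result.get("score"), result.get("content", {}).get("text", "")[:200])
        for result in response.get("retrievalResults", [])
    ]


# Usage (assumes boto3 is imported and kb_id comes from the earlier cell):
# runtime_client = boto3.client("bedrock-agent-runtime")
# question = "What are the pillars of the Well-Architected Framework?"
# for score, snippet in preview_retrieval(runtime_client, kb_id, question):
#     print(score, snippet)
```

An empty list usually means the ingestion job has not completed yet or the ID belongs to a different knowledge base.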