# TwelveLabs Multimodal Embedding Search using Amazon Bedrock and Amazon S3 Vectors
Work with the TwelveLabs Marengo Embed 2.7 model and Amazon S3 Vectors
![TwelveLabs Embedding](./images/12labs-embed-s3vectors.png)

The TwelveLabs Marengo Embed 2.7 model generates embeddings from video, text, audio, or image inputs. These embeddings can be used for similarity search, clustering, and other machine learning tasks. The model supports asynchronous inference through the StartAsyncInvoke API.

[Amazon S3 Vectors](https://aws.amazon.com/s3/features/vectors/) is the first cloud object store with native support to store and query vectors, delivering purpose-built, cost-optimized vector storage for AI agents, AI inference, and semantic search of your content stored in Amazon S3.

This sample demonstrates how to use the TwelveLabs Marengo Embed 2.7 model, available through Amazon Bedrock, to generate embeddings from a sample video and run similarity search with Amazon S3 Vectors as the vector store.

In [None]:
!pip install --upgrade pip setuptools wheel
!pip install faiss-cpu==1.7.4
!pip install boto3 --upgrade

Restart the kernel so the upgraded packages are loaded.

In [None]:
import os, sys
os.execv(sys.executable, ['python'] + sys.argv)

In [None]:
import boto3
import json

bedrock = boto3.client('bedrock-runtime')
s3 = boto3.client('s3')
s3vectors = boto3.client('s3vectors') 

In [None]:
model_id = 'twelvelabs.marengo-embed-2-7-v1:0'

s3_bucket = '<YOUR_S3_BUCKET>'
s3_prefix = '<YOUR_S3_PREFIX>' # For example: 'twelvelabs-test'
aws_account_id = '<YOUR_AWS_ACCOUNT_ID>'

s3vector_bucket = "<S3_VECTOR_BUCKET_NAME_TO_CREATE>"
s3vector_index = "<S3_VECTOR_INDEX_NAME_TO_CREATE>"

## Download a Sample Video and Upload to S3 as Input
We'll use the TwelveLabs Marengo model to generate embeddings from this video and perform content-based search.

![Meridian](./images/sample-video-meridian.png)
We will use an open-source sample video, [Meridian](https://en.wikipedia.org/wiki/Meridian_(film)), as input to generate embeddings.

In [None]:
# Download a sample video to local disk
sample_name = 'NetflixMeridian.mp4'
source_url = f'https://ws-assets-prod-iad-r-pdx-f3b3f9f1a7d6a3d0.s3.us-west-2.amazonaws.com/335119c4-e170-43ad-b55c-76fa6bc33719/NetflixMeridian.mp4'
!curl {source_url} --output {sample_name}

# Upload to S3
s3_input_key = f'{s3_prefix}/video/{sample_name}'
s3.upload_file(sample_name, s3_bucket, s3_input_key)
print(f"Uploaded to s3://{s3_bucket}/{s3_input_key}")

## Generate Multimodal Embeddings Using TwelveLabs Marengo 2.7 Model
We use Bedrock's [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_StartAsyncInvoke.html) API to run the embedding task asynchronously. In this example, the video is hosted on S3, which is well suited to large video files. The API also accepts the video as a base64-encoded string within the payload. Refer to the [documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-marengo.html?trk=769a1a2b-8c19-4976-9c45-b6b1226c7d20&sc_channel=el) for more details.

In [None]:
import uuid

s3_output_prefix = f'{s3_prefix}/output/{uuid.uuid4()}'
response = bedrock.start_async_invoke(
    modelId=model_id,
    modelInput={
        "inputType": "video",
        "mediaSource": {
            "s3Location": {
                "uri": f's3://{s3_bucket}/{s3_input_key}',
                "bucketOwner": aws_account_id
            }
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": f's3://{s3_bucket}/{s3_output_prefix}'
        }
    }
)

# Print Job ID
invocation_arn = response["invocationArn"]
print("Async Job Started")
print("Invocation Arn:", invocation_arn)
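
The two ways of supplying media (an S3 location or an inline base64 string) can be wrapped in one small helper. This is a minimal sketch: `build_media_source` is a hypothetical name, and it only reuses the `s3Location` and `base64String` shapes that appear in this notebook. Whether `bucketOwner` may be omitted is an assumption, so pass it whenever you have it.

```python
import base64
from typing import Optional


def build_media_source(location: str, bucket_owner: Optional[str] = None) -> dict:
    """Return a mediaSource dict for an s3:// URI or a local file path.

    Hypothetical helper; it mirrors the payload shapes used in this notebook.
    """
    if location.startswith("s3://"):
        # Same shape as the s3Location block passed to start_async_invoke
        s3_location = {"uri": location}
        if bucket_owner:
            s3_location["bucketOwner"] = bucket_owner
        return {"s3Location": s3_location}

    # Local file: inline the raw bytes as a base64-encoded string
    with open(location, "rb") as media_file:
        encoded = base64.b64encode(media_file.read()).decode("utf-8")
    return {"base64String": encoded}
```

With this helper, the request body above could be written as `{"inputType": "video", "mediaSource": build_media_source(f"s3://{s3_bucket}/{s3_input_key}", aws_account_id)}`.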

The result will be available in S3 once the task is complete. The code snippet below waits until the task finishes, then reads the output.json file from the output path specified in your request.

In [None]:
import time
from IPython.display import clear_output
from datetime import datetime

def wait_for_output_file(s3_bucket, s3_prefix, invocation_arn):
    # Poll the async invocation until it reaches a terminal status
    terminal_statuses = ["Completed", "Failed", "Expired"]
    while True:
        response = bedrock.get_async_invoke(invocationArn=invocation_arn)
        status = response['status']
        clear_output(wait=True)
        print(f"Embedding task status: {status}")
        if status in terminal_statuses:
            break
        time.sleep(5)

    # Stop early if the task did not succeed
    if status != "Completed":
        raise RuntimeError(f"Embedding task ended with status {status}: {response.get('failureMessage', '')}")

    # List objects under the output prefix
    response = s3.list_objects_v2(Bucket=s3_bucket, Prefix=s3_prefix)

    # Read every output.json found and collect its "data" entries
    data = []
    for obj in response.get('Contents', []):
        if obj['Key'].endswith('output.json'):
            body = s3.get_object(Bucket=s3_bucket, Key=obj['Key'])['Body']
            data += json.loads(body.read().decode('utf-8')).get("data", [])

    return data
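
The polling loop above runs until the task reaches a terminal status, however long that takes. For unattended runs, a timeout can be useful. The sketch below is one way to add it; `poll_until_terminal` and its parameters are hypothetical names, and the status check is passed in as a callable so the helper does not depend on any AWS client.

```python
import time
from typing import Callable, Iterable


def poll_until_terminal(
    get_status: Callable[[], str],
    terminal: Iterable[str] = ("Completed", "Failed", "Expired"),
    interval_sec: float = 5.0,
    timeout_sec: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal value or the timeout passes."""
    terminal = set(terminal)
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout_sec:
            raise TimeoutError(f"Still '{status}' after {waited:.0f}s")
        # Wait before the next check and track the total time slept
        sleep(interval_sec)
        waited += interval_sec
```

In this notebook it could be called as `poll_until_terminal(lambda: bedrock.get_async_invoke(invocationArn=invocation_arn)["status"])`. The default timeout of 30 minutes is an arbitrary choice, not a service limit.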

In [None]:
from IPython.display import display, JSON
output = wait_for_output_file(s3_bucket, s3_output_prefix, invocation_arn)
display(JSON(output))

## Create an Amazon S3 Vector Bucket and Index
In this example, we use Amazon S3 Vectors to store the embeddings generated in the previous steps and to serve a lightweight similarity search.

In [None]:
# Create an S3 vector bucket
s3vectors.create_vector_bucket(vectorBucketName=s3vector_bucket)
print(f"Vector bucket '{s3vector_bucket}' created successfully.")

In [None]:
# Create an index in the vector store
vector_dimension = 1024
distance_metric = 'cosine' # or 'euclidean'

s3vectors.create_index(
    vectorBucketName=s3vector_bucket,
    indexName=s3vector_index,
    dataType='float32',  # Common data type for vector embeddings
    dimension=vector_dimension,
    distanceMetric=distance_metric
)
print(f"Vector index '{s3vector_index}' created successfully in bucket '{s3vector_bucket}'.")
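
To make the choice between `'cosine'` and `'euclidean'` more concrete, here is a small pure-Python sketch of both metrics using their usual textbook definitions. It is only an illustration: the exact values S3 Vectors reports may be computed or scaled differently, and the function names are our own.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Same direction, different length: cosine ignores magnitude, euclidean does not
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))     # 0.0
print(euclidean_distance([1.0, 0.0], [2.0, 0.0]))  # 1.0
# Orthogonal vectors
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))     # 1.0
```

Cosine distance compares direction only, which is why it is a common default for embedding search.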


## Store the Embeddings into the S3 vector index
You can use the Python boto3 library to write vectors to the S3 vector index and to query it.

In [None]:
import boto3
import json

# Create an S3 Vectors client in the AWS Region of your choice.
s3vectors = boto3.client("s3vectors")

embeddings = []
for o in output:
    embeddings.append({
            "key": f'{o["embeddingOption"]} {o["startSec"]} {o["endSec"]}',
            "data": {"float32": o["embedding"]},
            "metadata": {"embeddingOption": o["embeddingOption"], "startSec": o["startSec"], "endSec": o["endSec"]}
        })

# Write embeddings into vector index with metadata.
s3vectors.put_vectors(
    vectorBucketName=s3vector_bucket,   
    indexName=s3vector_index,   
    vectors=embeddings
)
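
A single sample video yields a modest number of embeddings, but a larger collection may exceed what one `put_vectors` call accepts. The sketch below writes vectors in batches. The helper names are hypothetical, and the default batch size of 500 is an assumption; check the current S3 Vectors quotas before relying on it.

```python
from typing import Iterator, List


def chunked(items: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def put_vectors_in_batches(client, bucket: str, index: str,
                           vectors: List[dict], batch_size: int = 500) -> int:
    """Write vectors with one put_vectors call per batch; return the batch count.

    batch_size=500 is an assumed value, not a confirmed service limit.
    """
    batches = 0
    for batch in chunked(vectors, batch_size):
        client.put_vectors(vectorBucketName=bucket, indexName=index, vectors=batch)
        batches += 1
    return batches
```

In this notebook the call would be `put_vectors_in_batches(s3vectors, s3vector_bucket, s3vector_index, embeddings)`.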

## Generate a search embedding from an image
We'll create an embedding from this image using the same model.

<img src="./images/meridian-car.png" width="50%">

In [None]:
# Read image
import base64
base64_string = None
with open('./images/meridian-car.png', "rb") as image_file:
    base64_string = base64.b64encode(image_file.read()).decode("utf-8")

import uuid
query_prefix = f'{s3_prefix}/input/{uuid.uuid4()}'

# Create an input embedding
response = bedrock.start_async_invoke(
    modelId=model_id,
    modelInput = {
        "inputType": "image",
        "mediaSource": {
            "base64String": base64_string
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": f's3://{s3_bucket}/{query_prefix}'
        }
    }
)

# Print Job ID
invocation_arn = response["invocationArn"]
print("Async Job Started")
print("Invocation Arn:", invocation_arn)

query = wait_for_output_file(s3_bucket, query_prefix, invocation_arn)
display(JSON(query))

## Search the Vector Store
We now perform a similarity search against the vector index using boto3.

In [None]:
embedding = query[0]["embedding"]

# Query vector index.
response = s3vectors.query_vectors(
    vectorBucketName=s3vector_bucket,
    indexName=s3vector_index,
    queryVector={"float32": embedding}, 
    topK=3, 
    returnDistance=True,
    returnMetadata=True
)
print(json.dumps(response["vectors"], indent=2))
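
Because each vector was stored with its `embeddingOption` as metadata, a search can be narrowed to one kind of embedding. The sketch below wraps the query in a hypothetical `search_index` helper. It assumes that `query_vectors` accepts a `filter` argument with a simple equality condition on a metadata key; verify this against the boto3 documentation for your SDK version.

```python
from typing import List, Optional


def search_index(client, bucket: str, index: str, embedding: List[float],
                 top_k: int = 3, embedding_option: Optional[str] = None) -> list:
    """Query the index, optionally restricted to one embeddingOption value."""
    kwargs = {
        "vectorBucketName": bucket,
        "indexName": index,
        "queryVector": {"float32": embedding},
        "topK": top_k,
        "returnDistance": True,
        "returnMetadata": True,
    }
    if embedding_option is not None:
        # Assumed filter syntax: equality match on a metadata key
        kwargs["filter"] = {"embeddingOption": embedding_option}
    return client.query_vectors(**kwargs)["vectors"]
```

For example, `search_index(s3vectors, s3vector_bucket, s3vector_index, embedding, embedding_option=output[0]["embeddingOption"])` would restrict results to the same embedding type as the first stored clip.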

## Display the Video Clips
Now, we display the video to help you visualize the clips returned from the search.

In [None]:
# Format data for display: one (start time in seconds, button label) pair per clip
start_times = []

for clip in response["vectors"]:
    start = round(float(clip["metadata"]["startSec"]), 2)
    end = round(float(clip["metadata"]["endSec"]), 2)
    # The returned value is a distance, so lower means a closer match
    label = f'{start} - {end}s (distance: {round(clip["distance"], 3)})'
    start_times.append((start, label))

In [None]:
from IPython.display import HTML
import boto3

# Generate a presigned URL for the video in S3
s3 = boto3.client('s3')
url = s3.generate_presigned_url(
    ClientMethod='get_object',
    Params={'Bucket': s3_bucket, 'Key': s3_input_key},
    ExpiresIn=3600
)

Clicking the buttons below the video will take you to the timestamp where each clip begins.

In [None]:
# Generate buttons HTML
buttons_html = ''.join([
    f'<button onclick="jumpTo({time})">{label}</button> '
    for time, label in start_times
])

html = f"""
<video id="videoPlayer" width="640" controls>
  <source src="{url}" type="video/mp4">
  Your browser does not support the video tag.
</video>

<div style="margin-top:10px;display:block;">
  {buttons_html}
</div>

<script>
  var video = document.getElementById('videoPlayer');

  function jumpTo(time) {{
    video.currentTime = time;
    video.play();
  }}
</script>
"""

display(HTML(html))

## Cleanup
Delete the video and the embedding files from S3. Then delete the S3 vector index and the vector bucket.

In [None]:
# List all objects under the prefix
response = s3.list_objects_v2(Bucket=s3_bucket, Prefix=s3_prefix)

if 'Contents' in response:
    # Create a list of object identifiers to delete
    objects_to_delete = [{'Key': obj['Key']} for obj in response['Contents']]

    # Delete the objects
    s3.delete_objects(
        Bucket=s3_bucket,
        Delete={'Objects': objects_to_delete}
    )
    print(f"Deleted {len(objects_to_delete)} objects from '{s3_prefix}' in bucket '{s3_bucket}'.")
else:
    print(f"No objects found under prefix '{s3_prefix}'.")
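
A single `list_objects_v2` call returns at most 1,000 keys, so the cell above can leave objects behind when the prefix holds more than that. A paginated variant, sketched here with a hypothetical `delete_prefix` helper, deletes one page at a time; each page fits within the 1,000-key limit of `delete_objects`.

```python
def delete_prefix(s3_client, bucket: str, prefix: str) -> int:
    """Delete every object under the prefix, page by page; return the count."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page without "Contents" means nothing was listed for it
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted
```

In this notebook it could be called as `delete_prefix(s3, s3_bucket, s3_prefix)`.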


In [None]:
# Delete vector index
response = s3vectors.delete_index(
    vectorBucketName=s3vector_bucket,
    indexName=s3vector_index
)
print(response)

response = s3vectors.delete_vector_bucket(
    vectorBucketName=s3vector_bucket
)