# Deploy Jina Models on AWS SageMaker


--------------------

## <font color='orange'>Important:</font>

Please visit the model detail page at <a href="https://aws.amazon.com/marketplace/pp/prodview-5iljbegvoi66w">https://aws.amazon.com/marketplace/pp/prodview-5iljbegvoi66w</a> to learn more. <font color='orange'>If you do not have access to the link, please contact your account admin for help.</font>

You will find details about the model, including pricing, supported regions, and the end user license agreement. To use the model, click “<font color='orange'>Continue to Subscribe</font>” on the detail page, then come back here to learn how to deploy the model and run inference.

-------------------



This notebook was created with the **Data Science 3.0 image** on an **ml.t3.medium** instance in SageMaker Studio.

[Jina Embeddings](https://jina.ai/embeddings/) and [Jina Reranker](https://jina.ai/reranker/) are now available to use with [SageMaker](https://aws.amazon.com/pm/sagemaker/) from the [AWS Marketplace](https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy). 

This notebook walks you through creating a [Retrieval-augmented generation (RAG)](https://jina.ai/news/full-stack-rag-with-jina-embeddings-v2-and-llamaindex/) application in AWS SageMaker for a collection of YouTube video transcripts. The models we will use are Jina Embeddings v2 - English, Jina Reranker v1, and the [Mistral-7B-Instruct](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) large language model, all of which are available to SageMaker users.

You will need to have an AWS account. If you are not already an AWS user, you can [sign up for an account](https://portal.aws.amazon.com/billing/signup) on the AWS website.

## Set Up

Install the `jina-sagemaker` package and additional dependencies:

In [None]:
!pip install --upgrade sagemaker jina-sagemaker setuptools

## Configure a Role

You will need an [AWS role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) with sufficient permissions to use the resources required for this tutorial. 


In [None]:
import sagemaker

sagemaker_session = sagemaker.Session()
# ARN of the IAM role attached to this notebook
role = sagemaker.get_execution_role()
# The role name is the last "/"-separated component of the ARN
role_name = role.split("/")[-1]
region = sagemaker_session.boto_region_name

print(f"The Amazon Resource Name (ARN) of the role used for this demo is: {role}")
print(f"The name of the role used for this demo is: {role_name}")
print(f"The default region is: {region}")
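Execution role ARNs can include a path (for example `service-role/`) between `role/` and the role name, which is why the cell above keeps only the last `/`-separated component. The sketch below isolates that step in a small helper with a basic format check; the function name `role_name_from_arn` and the sample ARN are illustrative assumptions, not part of the SageMaker SDK.

```python
def role_name_from_arn(role_arn: str) -> str:
    """Return the IAM role name from a role ARN (hypothetical helper)."""
    # An IAM role ARN looks like arn:aws:iam::<account-id>:role/<optional-path>/<name>
    parts = role_arn.split(":", 5)
    if len(parts) != 6 or parts[2] != "iam" or not parts[5].startswith("role/"):
        raise ValueError(f"Not an IAM role ARN: {role_arn!r}")
    # The role name is the final component after any path prefix
    return parts[5].split("/")[-1]


example_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMaker-ExecutionRole-Example"
print(role_name_from_arn(example_arn))  # AmazonSageMaker-ExecutionRole-Example
```

In the notebook, `role_name_from_arn(role)` should give the same value as `role_name`.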

To verify that the role above has the required permissions for this tutorial:

1. Go to the IAM console: https://console.aws.amazon.com/iam/home.
2. Select **Roles**.
3. Enter the role name in the search box to search for that role.
4. Select the role.
5. On the **Permissions** tab, verify that the following permissions are attached to the role:
    1. `aws-marketplace:ViewSubscriptions`
    2. `aws-marketplace:Unsubscribe`
    3. `aws-marketplace:Subscribe`
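If the role is missing these permissions, an administrator can attach a policy that grants them. The sketch below builds such an identity-based policy document in Python and prints it as JSON. The `Sid` value and the use of `"Resource": "*"` are assumptions, so check them against your organization's IAM guidelines before attaching anything.

```python
import json

# The three AWS Marketplace actions listed above
MARKETPLACE_ACTIONS = (
    "aws-marketplace:ViewSubscriptions",
    "aws-marketplace:Unsubscribe",
    "aws-marketplace:Subscribe",
)


def marketplace_policy(actions=MARKETPLACE_ACTIONS) -> dict:
    """Build a minimal IAM policy document allowing the given actions (illustrative sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "JinaMarketplaceSubscription",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",  # assumption: not scoped to specific resources
            }
        ],
    }


print(json.dumps(marketplace_policy(), indent=2))
```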

# Subscribe to Jina AI Models on AWS Marketplace

Subscribe to the [Jina Embeddings v2 base English](https://aws.amazon.com/marketplace/pp/prodview-5iljbegvoi66w) and [Jina Reranker v1](https://aws.amazon.com/marketplace/pp/prodview-avmxk2wxbygd6) models.

Once you have subscribed to both models, run the cell below to look up their model package ARNs for your AWS region and store them in the variables `embedding_package_arn` and `reranker_package_arn`, respectively. The code in this tutorial references the models by those variable names.

In [None]:

def get_arn_for_model(region, model_name):
    model_package_map = {
        "us-east-1": f"arn:aws:sagemaker:us-east-1:253352124568:model-package/{model_name}",
        "us-east-2": f"arn:aws:sagemaker:us-east-2:057799348421:model-package/{model_name}",
        "us-west-1": f"arn:aws:sagemaker:us-west-1:382657785993:model-package/{model_name}",
        "us-west-2": f"arn:aws:sagemaker:us-west-2:594846645681:model-package/{model_name}",
        "ca-central-1": f"arn:aws:sagemaker:ca-central-1:470592106596:model-package/{model_name}",
        "eu-central-1": f"arn:aws:sagemaker:eu-central-1:446921602837:model-package/{model_name}",
        "eu-west-1": f"arn:aws:sagemaker:eu-west-1:985815980388:model-package/{model_name}",
        "eu-west-2": f"arn:aws:sagemaker:eu-west-2:856760150666:model-package/{model_name}",
        "eu-west-3": f"arn:aws:sagemaker:eu-west-3:843114510376:model-package/{model_name}",
        "eu-north-1": f"arn:aws:sagemaker:eu-north-1:136758871317:model-package/{model_name}",
        "ap-southeast-1": f"arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/{model_name}",
        "ap-southeast-2": f"arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/{model_name}",
        "ap-northeast-2": f"arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/{model_name}",
        "ap-northeast-1": f"arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/{model_name}",
        "ap-south-1": f"arn:aws:sagemaker:ap-south-1:077584701553:model-package/{model_name}",
        "sa-east-1": f"arn:aws:sagemaker:sa-east-1:270155090741:model-package/{model_name}",
    }

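    if region not in model_package_map:
        raise ValueError(
            f"Jina models are not available in region {region!r}. "
            f"Supported regions: {sorted(model_package_map)}"
        )
    return model_package_map[region]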

embedding_package_arn = get_arn_for_model(region, "jina-embeddings-v2-base-en")
reranker_package_arn = get_arn_for_model(region, "jina-reranker-v1-base-en")
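Every entry in the map follows the same pattern, `arn:aws:sagemaker:<region>:<account-id>:model-package/<model-name>`, with an account ID that differs by region. A quick way to sanity-check a looked-up ARN before deploying is to split it back into those parts. `parse_model_package_arn` below is a hypothetical helper written for this sketch, not a SageMaker SDK function.

```python
from typing import NamedTuple


class ModelPackageArn(NamedTuple):
    region: str
    account_id: str
    model_name: str


def parse_model_package_arn(arn: str) -> ModelPackageArn:
    """Split a SageMaker model package ARN into region, account ID, and package name."""
    # Expected shape: arn:aws:sagemaker:<region>:<account-id>:model-package/<name>
    # (unpacking raises ValueError if there are not exactly six ":"-separated fields)
    prefix, _partition, service, region, account_id, resource = arn.split(":", 5)
    if prefix != "arn" or service != "sagemaker" or not resource.startswith("model-package/"):
        raise ValueError(f"Not a SageMaker model package ARN: {arn!r}")
    return ModelPackageArn(region, account_id, resource[len("model-package/"):])


parsed = parse_model_package_arn(
    "arn:aws:sagemaker:us-east-1:253352124568:model-package/jina-embeddings-v2-base-en"
)
print(parsed)
```

In the notebook, `parse_model_package_arn(embedding_package_arn).region == region` should hold for the region detected earlier.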

# Load the Dataset

In this tutorial, we are going to use a collection of videos provided by the YouTube channel [TU Delft Online Learning](https://www.youtube.com/@tudelftonlinelearning1226). This channel produces a variety of educational materials in STEM subjects. Its programming is [CC-BY licensed](https://creativecommons.org/licenses/by/3.0/legalcode).

We downloaded 193 videos from the channel and transcribed them with OpenAI’s open-source [Whisper speech recognition model](https://openai.com/research/whisper), using the smallest model, [`openai/whisper-tiny`](https://huggingface.co/openai/whisper-tiny) on Hugging Face.

The transcripts have been organized into a CSV file, which you can [download from here](https://github.com/jina-ai/workshops/raw/main/notebooks/embeddings/sagemaker/tu_delft.csv).

## Install Requirements

The data is in CSV format, and we will load it into a `pandas` dataframe.

In [None]:
!pip install requests pandas

## Download the Data into a Dataframe

In [None]:
import pandas

# Load the CSV file
data_url = "https://github.com/jina-ai/workshops/raw/main/notebooks/embeddings/sagemaker/tu_delft.csv"
tu_delft_dataframe = pandas.read_csv(data_url)

Run the cell below to inspect the first few rows of the dataframe.

In [None]:
tu_delft_dataframe.head()
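Before embedding the transcripts, it can help to check how long they are: Jina Embeddings v2 accepts inputs of up to 8,192 tokens, and word counts give only a rough proxy for token counts. The dataframe's column names are not listed here, so `text_column` below is a hypothetical parameter; pass the name of the column that holds the transcript text. The sketch uses a tiny synthetic dataframe so it runs on its own.

```python
import pandas


def transcript_length_stats(df: pandas.DataFrame, text_column: str) -> pandas.DataFrame:
    """Return per-row character and whitespace-delimited word counts for a text column."""
    # Treat missing transcripts as empty strings
    text = df[text_column].fillna("").astype(str)
    return pandas.DataFrame({
        "chars": text.str.len(),
        "words": text.str.split().str.len(),
    })


# Synthetic stand-in for the real dataframe (column name is hypothetical)
sample = pandas.DataFrame({"transcript": ["Welcome to this lecture on fluid dynamics.", None, "Hello world"]})
stats = transcript_length_stats(sample, "transcript")
print(stats.describe())
```

With the real data, call `transcript_length_stats(tu_delft_dataframe, ...)` with the appropriate column name.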

# Start the Jina Embeddings v2 Endpoint

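The code below launches an `ml.g5.xlarge` instance on AWS to host the embedding model. It serves the open `jinaai/jina-embeddings-v2-base-en` model from the Hugging Face Hub using the Hugging Face Text Embeddings Inference (TEI) container.

Deployment may take several minutes to finish.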

In [None]:
from sagemaker.huggingface import get_huggingface_llm_image_uri


def get_image_uri(instance_type):
    """Return the Hugging Face TEI container image URI for the given instance type."""
    # GPU instance families (ml.g*, ml.p*) use the GPU image; all others use the CPU image
    if instance_type.startswith(("ml.g", "ml.p")):
        key = "huggingface-tei"
    else:
        key = "huggingface-tei-cpu"
    return get_huggingface_llm_image_uri(key, version="1.2.3")

In [None]:
from sagemaker.huggingface import HuggingFaceModel

# SageMaker instance type for the endpoint
instance_type = "ml.g5.xlarge"

# Environment variables passed to the TEI container
config = {
    "HF_MODEL_ID": "jinaai/jina-embeddings-v2-base-en",  # model ID from hf.co/models
}

# Create a HuggingFaceModel that uses the TEI image URI
emb_model = HuggingFaceModel(
    role=role,
    image_uri=get_image_uri(instance_type),
    env=config,
)

In [None]:
# Deploy the model to a real-time endpoint
# https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.Model.deploy
emb = emb_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
)


In [None]:
data = {
    "inputs": "the mesmerizing performances of the leads keep the film grounded and keep the audience riveted .",
}

res = emb.predict(data=data)

# The response holds one embedding per input; inspect the first one
print(f"length of embeddings: {len(res[0])}")
print(f"first 10 elements of embeddings: {res[0][:10]}")
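The endpoint returns one vector per input (768 values for `jina-embeddings-v2-base-en`). In a RAG pipeline, retrieval compares such vectors, commonly with cosine similarity. The sketch below computes cosine similarity in plain Python; the short example vectors are made up for illustration, and in practice you would pass two vectors returned by `emb.predict`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]))
```

Values near 1 indicate vectors pointing in similar directions; values near 0 indicate unrelated (orthogonal) vectors.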

In [None]:
import threading
import time

# Simple load test: 10 threads each send 390 requests (3,900 in total); the total time is reported
number_of_threads = 10
number_of_requests = 3900 // number_of_threads
print(f"number of threads: {number_of_threads}")
print(f"number of requests per thread: {number_of_requests}")

# Fixed input passage sent with every request
# (its token count can be checked at https://huggingface.co/spaces/Xenova/the-tokenizer-playground)
sample_text = "Hugging Face is a company and a popular platform in the field of natural language processing (NLP) and machine learning. They are known for their contributions to the development of state-of-the-art models for various NLP tasks and for providing a platform that facilitates the sharing and usage of pre-trained models. One of the key offerings from Hugging Face is the Transformers library, which is an open-source library for working with a variety of pre-trained transformer models, including those for text generation, translation, summarization, question answering, and more. The library is widely used in the research and development of NLP applications and is supported by a large and active community. Hugging Face also provides a model hub where users can discover, share, and download pre-trained models. Additionally, they offer tools and frameworks to make it easier for developers to integrate and use these models in their own projects. The company has played a significant role in advancing the field of NLP and making cutting-edge models more accessible to the broader community. Hugging Face also provides a model hub where users can discover, share, and download pre-trained models. Additionally, they offer tools and frameworks to make it easier for developers and ma"


def send_requests():
    for _ in range(number_of_requests):
        emb.predict(data={"inputs": sample_text})


# Create multiple threads
threads = [threading.Thread(target=send_requests) for _ in range(number_of_threads)]

# Start all threads, then wait for all of them to finish
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"total time: {round(time.time() - start)} seconds")
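The load test above reports only total wall-clock time. A variant built on `concurrent.futures.ThreadPoolExecutor` can also report throughput and per-request latency. In the sketch below, `send_request` is a hypothetical stand-in that sleeps briefly instead of calling the endpoint; in the notebook you would replace its body with the `emb.predict(...)` call. The request counts are scaled down so the sketch finishes quickly.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def send_request() -> float:
    """Hypothetical stand-in for emb.predict(...); returns the request latency in seconds."""
    start = time.perf_counter()
    time.sleep(0.01)  # replace with emb.predict(data={"inputs": ...}) in the notebook
    return time.perf_counter() - start


def load_test(total_requests: int, concurrency: int) -> dict:
    """Run total_requests calls across `concurrency` threads and summarize the results."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: send_request(), range(total_requests)))
    elapsed = time.perf_counter() - start
    return {
        "requests": len(latencies),
        "total_seconds": elapsed,
        "requests_per_second": len(latencies) / elapsed,
        "median_latency_ms": statistics.median(latencies) * 1000,
        "max_latency_ms": max(latencies) * 1000,
    }


print(load_test(total_requests=100, concurrency=10))
```

A thread pool suits this workload because each request spends most of its time waiting on the network, so Python threads can overlap those waits.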
