# Using Solar Embedding Large on SageMaker JumpStart


**Solar Embedding Large** is a multilingual embedding model with robust performance across languages including English, Korean, and Japanese. It is fine-tuned specifically for retrieval tasks, which improves multilingual retrieval results. The model comes in two specialized versions: `solar-embedding-1-large-query`, optimized for embedding users' questions, and `solar-embedding-1-large-passage`, designed for embedding the documents to be searched. Using the purpose-specific version on each side of the search improves retrieval results, which in turn improves the performance of Retrieval Augmented Generation (RAG) systems.
This sample notebook shows you how to deploy [Solar Embedding Large](url) using Amazon SageMaker.
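
In practice, each request names the model version that matches the role of its text. The sketch below uses a hypothetical helper, `build_embedding_request` (not part of any SDK), that picks `solar-embedding-1-large-query` for questions and `solar-embedding-1-large-passage` for documents, assuming the `input`/`model` request format described in section 2.B.

```python
from typing import Any, Dict, List, Union

# Model names as documented in this notebook.
QUERY_MODEL = "solar-embedding-1-large-query"
PASSAGE_MODEL = "solar-embedding-1-large-passage"


def build_embedding_request(texts: Union[str, List[str]], role: str) -> Dict[str, Any]:
    """Build a request body for the endpoint.

    Hypothetical helper, not part of any SDK.
    role: "query" for user questions, "passage" for documents to be searched.
    """
    models = {"query": QUERY_MODEL, "passage": PASSAGE_MODEL}
    if role not in models:
        raise ValueError(f"role must be 'query' or 'passage', got {role!r}")
    return {"input": texts, "model": models[role]}


# Questions and documents are sent to different model versions.
print(build_embedding_request("How is the performance of Solar embeddings?", "query"))
print(build_embedding_request(["Solar embeddings are awesome."], "passage"))
```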

## Pre-requisites:
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.
1. To deploy this ML model successfully, ensure that:
    1. Your IAM role has the following three permissions, and you have authority to make AWS Marketplace subscriptions in the AWS account used:
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
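
For reference, the three Marketplace permissions above can be expressed as an IAM policy statement like the one below. This is a minimal sketch: the `Sid` is arbitrary and it assumes `"Resource": "*"` for simplicity; adapt it to your organization's policies.

```python
import json

# Minimal IAM policy granting the three AWS Marketplace actions listed above.
# Assumptions: the Sid is arbitrary and Resource "*" is used for simplicity.
marketplace_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowMarketplaceSubscription",
            "Effect": "Allow",
            "Action": [
                "aws-marketplace:ViewSubscriptions",
                "aws-marketplace:Unsubscribe",
                "aws-marketplace:Subscribe",
            ],
            "Resource": "*",
        }
    ],
}

# Print as JSON so it can be pasted into the IAM console policy editor.
print(json.dumps(marketplace_policy, indent=2))
```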

## Contents:
1. [Subscribe to the model package](#1.-Subscribe-to-the-model-package)
2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)
   1. [Create an endpoint](#A.-Create-an-endpoint)
   2. [Prepare input payload](#B.-Prepare-input-payload)
   3. [Perform real-time inference](#C.-Perform-real-time-inference)
   4. [Perform semantic search](#D.-Perform-semantic-search)
3. [Clean-up](#3.-Clean-up)
   1. [Delete the endpoint](#A.-Delete-the-endpoint)
   2. [Delete the model](#B.-Delete-the-model)
    

## Usage instructions
You can run this notebook one cell at a time (press Shift+Enter to run a cell).

## 1. Subscribe to the model package

To subscribe to the model package:
1. Open the [Solar Embedding Large](url) model package listing page.
2. On the AWS Marketplace listing, choose the **Continue to subscribe** button.
3. On the **Subscribe to this software** page, review the EULA, pricing, and support terms, and choose **Accept Offer** if you and your organization agree with them.
4. Choose **Continue to configuration** and then choose a **Region**. You will see a **Product Arn** displayed. This is the model package ARN that you need to specify when creating a deployable model using Boto3. The ARNs for the supported Regions are already listed in the `model_package_map` cell below; confirm that the entry for your Region matches the ARN shown.
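
A model package ARN has the form `arn:aws:sagemaker:<region>:<account-id>:model-package/<name>`, as in the mapping used below. The hypothetical helper below (`parse_model_package_arn`, not part of the SageMaker SDK) splits an ARN into those parts, which can help you confirm that an ARN matches your notebook's Region.

```python
from typing import NamedTuple


class ModelPackageArn(NamedTuple):
    """Components of a SageMaker model package ARN (hypothetical type)."""
    region: str
    account_id: str
    package_name: str


def parse_model_package_arn(arn: str) -> ModelPackageArn:
    """Split a SageMaker model package ARN into Region, account ID, and name."""
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"not a SageMaker ARN: {arn!r}")
    resource_type, _, name = parts[5].partition("/")
    if resource_type != "model-package" or not name:
        raise ValueError(f"not a model package ARN: {arn!r}")
    return ModelPackageArn(region=parts[3], account_id=parts[4], package_name=name)


arn = (
    "arn:aws:sagemaker:us-east-1:865070037744:model-package/"
    "solar-embedding-1-large-240510-0a30f31604033f9690bcc6cd580c42d7"
)
print(parse_model_package_arn(arn).region)  # us-east-1
```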

The following model package is available:
* Solar Embedding Large (supports `ml.g5.2xlarge`)

In [None]:
%pip install sseclient-py

In [None]:
import json

import boto3
import sagemaker
from sagemaker import ModelPackage, get_execution_role

In [None]:
role = get_execution_role()
sagemaker_session = sagemaker.Session()

sagemaker_runtime = boto3.client("sagemaker-runtime")

In [None]:
# Name of the Solar Embedding Large model package.
# The same package serves both the query and passage models.

model_package_name = "solar-embedding-1-large-240510-0a30f31604033f9690bcc6cd580c42d7"

# Mapping for Model Packages
model_package_map = {
    "us-east-1": f"arn:aws:sagemaker:us-east-1:865070037744:model-package/{model_package_name}",
    "us-east-2": f"arn:aws:sagemaker:us-east-2:057799348421:model-package/{model_package_name}",
    "us-west-1": f"arn:aws:sagemaker:us-west-1:382657785993:model-package/{model_package_name}",
    "us-west-2": f"arn:aws:sagemaker:us-west-2:594846645681:model-package/{model_package_name}",
    "ca-central-1": f"arn:aws:sagemaker:ca-central-1:470592106596:model-package/{model_package_name}",
    "eu-central-1": f"arn:aws:sagemaker:eu-central-1:446921602837:model-package/{model_package_name}",
    "eu-west-1": f"arn:aws:sagemaker:eu-west-1:985815980388:model-package/{model_package_name}",
    "eu-west-2": f"arn:aws:sagemaker:eu-west-2:856760150666:model-package/{model_package_name}",
    "eu-west-3": f"arn:aws:sagemaker:eu-west-3:843114510376:model-package/{model_package_name}",
    "eu-north-1": f"arn:aws:sagemaker:eu-north-1:136758871317:model-package/{model_package_name}",
    "ap-southeast-1": f"arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/{model_package_name}",
    "ap-southeast-2": f"arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/{model_package_name}",
    "ap-northeast-2": f"arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/{model_package_name}",
    "ap-northeast-1": f"arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/{model_package_name}",
    "ap-south-1": f"arn:aws:sagemaker:ap-south-1:077584701553:model-package/{model_package_name}",
    "sa-east-1": f"arn:aws:sagemaker:sa-east-1:270155090741:model-package/{model_package_name}",
}


region = sagemaker_session.boto_region_name
if region not in model_package_map:
    raise Exception(f"Current boto3 session region {region} is not supported.")

model_package_arn = model_package_map[region]

print(f"Model Package: '{model_package_arn}'")

## 2. Create an endpoint and perform real-time inference

To learn how real-time inference works in Amazon SageMaker, see the [SageMaker hosting documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

In [None]:
model_name = "solar-embedding-1-large"
content_type = "application/json"

# Instance type supported by this model package (a plain string, not a tuple)
real_time_inference_instance_type = "ml.g5.2xlarge"

### A. Create an endpoint

In [None]:
# create a deployable model from the model package.
model = ModelPackage(
    role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session
)

endpoint_name = sagemaker.utils.name_from_base(model_name)
print(f"endpoint name: '{endpoint_name}'")

In [None]:
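# Deploy the model to a real-time endpoint (blocks until the endpoint is in service)
model.deploy(
    initial_instance_count=1,
    instance_type=real_time_inference_instance_type,
    endpoint_name=endpoint_name,
)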

Once the endpoint has been created, you can perform real-time inference.
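
`model.deploy` waits for the endpoint by default, but if you reconnect from a new session you may want to check the endpoint status yourself. The sketch below is a hypothetical helper, `wait_for_endpoint`, that polls a boto3 SageMaker client (`boto3.client("sagemaker")`), whose `describe_endpoint` call returns an `EndpointStatus` field; the polling interval and timeout are arbitrary assumptions. boto3 also offers a built-in waiter, `client.get_waiter("endpoint_in_service")`, for the same purpose.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name: str,
                      poll_seconds: float = 30.0,
                      timeout_seconds: float = 1800.0) -> str:
    """Poll describe_endpoint until the endpoint is InService (hypothetical helper)."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} still {status} after {timeout_seconds}s")
        # Still Creating/Updating: wait before asking again.
        time.sleep(poll_seconds)


# Usage (requires AWS credentials):
# import boto3
# wait_for_endpoint(boto3.client("sagemaker"), endpoint_name)
```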

### B. Prepare input payload

The endpoint accepts request and response payloads compatible with OpenAI's Embeddings API.

Supported parameters:
- `input` (string or list of strings)\*: A single text string, or an array of texts to embed.
- `model` (string)\*: Name of the model used to compute the embeddings. Currently available models are `solar-embedding-1-large-query` and `solar-embedding-1-large-passage`.

\* required
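
Before sending a request, you can check the payload against the parameters listed above. The following is a hypothetical client-side check (`validate_payload` is not provided by the endpoint or any SDK); the endpoint performs its own validation, which may differ. Rejecting an empty list is an extra assumption made here.

```python
from typing import Any, Dict

# Model names documented above.
SUPPORTED_MODELS = {"solar-embedding-1-large-query", "solar-embedding-1-large-passage"}


def validate_payload(payload: Dict[str, Any]) -> None:
    """Raise ValueError if payload does not match the documented parameters."""
    for key in ("input", "model"):
        if key not in payload:
            raise ValueError(f"missing required parameter: {key!r}")
    texts = payload["input"]
    if isinstance(texts, str):
        texts = [texts]
    # Assumption: an empty list is treated as invalid.
    if not isinstance(texts, list) or not texts or not all(isinstance(t, str) for t in texts):
        raise ValueError("'input' must be a string or a non-empty list of strings")
    if payload["model"] not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {payload['model']!r}")


validate_payload({
    "input": "How is the performance of Solar embeddings?",
    "model": "solar-embedding-1-large-query",
})
```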

In [None]:
# Input is a single string
input = {
    "input": "How is the performance of Solar embeddings?",
    "model": "solar-embedding-1-large-query",
}

In [None]:
# Input is a list of strings
input = {
    "input": [
        "Solar embeddings are awesome.",
        "Solar mini embedding model demonstrates strong performance in multiple languages.",
    ],
    "model": "solar-embedding-1-large-passage",
}

### C. Perform real-time inference

In [None]:
# real-time inference
def invoke_endpoint(endpoint_name, payload):
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    data = response["Body"].read().decode("utf-8")

    return data
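
For large document collections, you may prefer to split passages into several smaller requests, since SageMaker real-time endpoints limit request payloads to 6 MB. The sketch below is a hypothetical helper, `embed_in_batches`, that batches texts and concatenates the returned embeddings. The batch size of 32 is an arbitrary assumption, and `embed_fn` stands in for a call such as `invoke_endpoint` followed by `json.loads`.

```python
from typing import Any, Callable, Dict, List

# A function that takes a request body and returns the parsed JSON response.
EmbedFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def embed_in_batches(texts: List[str], embed_fn: EmbedFn,
                     model: str = "solar-embedding-1-large-passage",
                     batch_size: int = 32) -> List[List[float]]:
    """Embed texts in fixed-size batches (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start : start + batch_size]
        response = embed_fn({"input": batch, "model": model})
        # Assumes one embedding per input, returned in input order.
        embeddings.extend(item["embedding"] for item in response["data"])
    return embeddings


# Usage against the deployed endpoint:
# vectors = embed_in_batches(
#     documents, lambda payload: json.loads(invoke_endpoint(endpoint_name, payload))
# )
```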

In [None]:
response = json.loads(invoke_endpoint(endpoint_name, input))

In [None]:
print(response)

In [None]:
print("embeddings: ")
print(response["data"][0]["embedding"])
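
The response follows the OpenAI embeddings layout, with one entry per input under `data`. When you send a list of texts, the hypothetical helper below, `extract_embeddings`, returns the vectors in input order. It assumes each entry may carry an `index` field, as OpenAI-style responses do, and falls back to the returned order if that field is missing; check the printed response above to confirm which fields your endpoint returns.

```python
from typing import Any, Dict, List


def extract_embeddings(response: Dict[str, Any]) -> List[List[float]]:
    """Return embeddings from an OpenAI-style response (hypothetical helper)."""
    items = response["data"]
    # Sort by "index" when every entry has one; otherwise keep the returned order.
    if all("index" in item for item in items):
        items = sorted(items, key=lambda item: item["index"])
    return [item["embedding"] for item in items]


example = {"data": [{"index": 1, "embedding": [0.3, 0.4]},
                    {"index": 0, "embedding": [0.1, 0.2]}]}
print(extract_embeddings(example))  # [[0.1, 0.2], [0.3, 0.4]]
```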

### D. Perform semantic search

In [None]:
query = "A man is eating pasta."

documents = [
    "A man is eating food.",
    "A man is walking around the park.",
    "A man is running fast.",
    "A man is eating Italian food.",
    "A Rabbit is sleeping.",
]

In [None]:
# get query embedding
query_embedding_response = json.loads(
    invoke_endpoint(
        endpoint_name, {"input": query, "model": "solar-embedding-1-large-query"}
    )
)
query_embedding = query_embedding_response["data"][0]["embedding"]

# get passage embeddings
passage_embedding_response = json.loads(
    invoke_endpoint(
        endpoint_name, {"input": documents, "model": "solar-embedding-1-large-passage"}
    )
)
passage_embeddings = [
    embeds_object["embedding"] for embeds_object in passage_embedding_response["data"]
]

In [None]:
from typing import List
import math


def cosine_similarity(vec1: List[float], vec2: List[float]) -> float:
    dot_product = sum(v1 * v2 for v1, v2 in zip(vec1, vec2))

    magnitude1 = math.sqrt(sum(v**2 for v in vec1))
    magnitude2 = math.sqrt(sum(v**2 for v in vec2))

    if magnitude1 == 0 or magnitude2 == 0:
        return 0
    else:
        return dot_product / (magnitude1 * magnitude2)
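
The function above computes the standard cosine similarity between a query embedding $\mathbf{q}$ and a passage embedding $\mathbf{p}$, both of dimension $d$:

$$
\cos(\mathbf{q}, \mathbf{p}) = \frac{\sum_{i=1}^{d} q_i p_i}{\sqrt{\sum_{i=1}^{d} q_i^2}\,\sqrt{\sum_{i=1}^{d} p_i^2}}
$$

The result lies in $[-1, 1]$, and higher values mean the two vectors point in more similar directions. The ratio is undefined when either vector has zero magnitude, which is why the function returns 0 in that case.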

In [None]:
similarities = []
for passage_embedding in passage_embeddings:
    similarities.append(cosine_similarity(query_embedding, passage_embedding))

print("Top 1 document is:")
print(documents[similarities.index(max(similarities))])
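
The cell above reports only the single best match. To inspect more results, you can rank every document by score. The sketch below is a generic top-k ranking (`top_k` is a hypothetical helper and `k=3` is arbitrary) that works with any list of scores, such as `similarities`.

```python
from typing import List, Sequence, Tuple


def top_k(items: Sequence[str], scores: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (item, score) pairs, best first (hypothetical helper)."""
    if len(items) != len(scores):
        raise ValueError("items and scores must have the same length")
    # Sort pairs by score, highest first; ties keep their original order.
    ranked = sorted(zip(items, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Usage with the variables from the cells above:
# for doc, score in top_k(documents, similarities, k=3):
#     print(f"{score:.4f}  {doc}")
```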

## 3. Clean-up

### A. Delete the endpoint

Now that you have performed real-time inference, delete the endpoint so that you stop incurring charges.

In [None]:
model.sagemaker_session.delete_endpoint(endpoint_name)
model.sagemaker_session.delete_endpoint_config(endpoint_name)

### B. Delete the model

In [None]:
model.delete_model()
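
If your kernel restarted and the `model` object is gone, you can still clean up with the low-level boto3 SageMaker client. The sketch below is a hypothetical helper, `delete_endpoint_resources`; like the cleanup cell above, it assumes the endpoint configuration has the same name as the endpoint, and it assumes you know the model name (for example, from the SageMaker console).

```python
def delete_endpoint_resources(sm_client, endpoint_name: str, model_name: str) -> None:
    """Delete an endpoint, its endpoint configuration, and its model (hypothetical helper)."""
    # Delete the endpoint first: it is the resource that bills for instances.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    # Assumes the endpoint configuration shares the endpoint's name.
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_name)
    sm_client.delete_model(ModelName=model_name)


# Usage (requires AWS credentials; the model name is a placeholder):
# import boto3
# delete_endpoint_resources(boto3.client("sagemaker"), endpoint_name, "<your-model-name>")
```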