# Create embeddings with Generative AI Hub
Like any other machine learning model, foundation models work only with numbers. In the context of generative AI, these numbers are embeddings: numerical representations of unstructured data such as text and images. For example, OpenAI's text embedding model `text-embedding-3-small` turns your input text into 1536 numbers, that is, a vector with 1536 dimensions.

👉 Select the kernel again. Make sure to select the same virtual environment as in the previous exercise so that all your packages are installed.

In [1]:
import json
import os
from ai_core_sdk.ai_core_v2_client import AICoreV2Client
# Load the service key credentials from creds.json
with open('creds.json') as f:
    credCF = json.load(f)
 
# Set environment variables
def set_environment_vars(credCF):
    env_vars = {
        'AICORE_AUTH_URL': credCF['url'] + '/oauth/token',
        'AICORE_CLIENT_ID': credCF['clientid'],
        'AICORE_CLIENT_SECRET': credCF['clientsecret'],
        'AICORE_BASE_URL': credCF["serviceurls"]["AI_API_URL"] + "/v2",
        'AICORE_RESOURCE_GROUP': "grounding"
    }
 
    for key, value in env_vars.items():
        os.environ[key] = value
        # Print only the variable name so that secrets never end up in the notebook output
        print(f"{key} is set")

# Create AI Core client instance
def create_ai_core_client(credCF):
    set_environment_vars(credCF)  # Ensure environment variables are set
    return AICoreV2Client(
        base_url=os.environ['AICORE_BASE_URL'],
        auth_url=os.environ['AICORE_AUTH_URL'],
        client_id=os.environ['AICORE_CLIENT_ID'],
        client_secret=os.environ['AICORE_CLIENT_SECRET'],
        resource_group=os.environ['AICORE_RESOURCE_GROUP']
    )

ai_core_client = create_ai_core_client(credCF)

AICORE_AUTH_URL is set
AICORE_CLIENT_ID is set
AICORE_CLIENT_SECRET is set
AICORE_BASE_URL is set
AICORE_RESOURCE_GROUP is set
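
Before creating the client, it can help to check that the service key file contains every field the code above reads. The following is a minimal sketch and not part of the SDK: `REQUIRED_KEYS` and `find_missing_keys()` are hypothetical names, and the key list simply mirrors the fields accessed in `set_environment_vars()`.

```python
# Hypothetical helper (not part of ai_core_sdk): report which fields
# read by set_environment_vars() are absent from the service key.
REQUIRED_KEYS = ("url", "clientid", "clientsecret")


def find_missing_keys(creds):
    """Return a list of required service-key fields that are missing."""
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    # AI_API_URL is nested under "serviceurls" in the service key
    if "AI_API_URL" not in creds.get("serviceurls", {}):
        missing.append("serviceurls.AI_API_URL")
    return missing


# Example with a made-up, incomplete service key
example_creds = {"url": "https://example.invalid", "clientid": "my-client"}
print(find_missing_keys(example_creds))
```

In the notebook you could call `find_missing_keys(credCF)` right after loading `creds.json` and stop early if the returned list is not empty.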


In [2]:
#import init_env

#init_env.set_environment_variables()

from gen_ai_hub.proxy.native.openai import embeddings

# TODO: assign the name of the embedding model here, e.g. "text-embedding-3-small"
EMBEDDING_MODEL_NAME = "text-embedding-ada-002"

## Create embeddings
Define the method **get_embedding()**.

In [3]:
def get_embedding(input_text):
    response = embeddings.create(
        input=input_text,
        model_name=EMBEDDING_MODEL_NAME
    )
    embedding = response.data[0].embedding
    return embedding

Get embeddings for the words **apple, orange, phone** and for the phrases **I love dogs, I love animals, I hate cats**.

In [4]:
apple_embedding = get_embedding("apple")
orange_embedding = get_embedding("orange")
phone_embedding = get_embedding("phone")
dog_embedding = get_embedding("I love dogs")
animals_embedding = get_embedding("I love animals")
cat_embedding = get_embedding("I hate cats")

print(apple_embedding)

[0.007730893790721893, -0.023138046264648438, -0.007587476167827845, -0.02780936472117901, -0.0046508293598890305, 0.013010028749704361, -0.021963387727737427, -0.008393346332013607, 0.018958445638418198, -0.029557693749666214, -0.002926402958109975, 0.020078469067811966, -0.004415214527398348, 0.009158240631222725, -0.021649233996868134, 0.0020146763417869806, 0.030732352286577225, 0.00010212104098172858, 0.0020266277715563774, -0.025460045784711838, -0.021061904728412628, -0.008195294067263603, 0.02137605845928192, -0.012552458792924881, 0.0011336823226884007, 0.005043520592153072, 0.01019631139934063, 7.816474681021646e-05, 0.016062775626778603, -0.013023687526583672, 0.020460916683077812, -0.016158387064933777, -0.018384775146842003, 0.0054430412128567696, -0.019381869584321976, -0.009171899408102036, -0.012033423408865929, -0.0087075000628829, -0.005702558439224958, -0.006166958715766668, 0.010524122975766659, 0.0076284524984657764, -0.006399158388376236, 0.0008080047555267811, ...]
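
The six calls above follow the same pattern, so the results can also be collected in a dictionary keyed by the input text. This is a minimal sketch: `embed_texts()` is a hypothetical helper, and `fake_embedding()` is a made-up stand-in so the example runs without a model deployment. In the notebook you would pass `get_embedding` instead.

```python
def embed_texts(texts, embed_fn):
    """Map each input text to the vector returned by embed_fn."""
    return {text: embed_fn(text) for text in texts}


def fake_embedding(text):
    # Stand-in for get_embedding(): a made-up 3-dimensional vector,
    # NOT a real embedding. It only keeps the example self-contained.
    return [float(len(text)), float(text.count("o")), 1.0]


texts = ["apple", "orange", "phone"]
vectors = embed_texts(texts, fake_embedding)

# Every vector produced by one model has the same number of dimensions
dimensions = {len(vector) for vector in vectors.values()}
print(dimensions)
```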

## Calculate Vector Similarities
To calculate the cosine similarity of the vectors, we also need the [SciPy](https://scipy.org/) package. SciPy contains many fundamental algorithms for scientific computing.

Cosine similarity measures how closely two vectors point in the same direction, based on the angle between them. The smaller the angle, the closer the value is to 1 and the more similar the embedded texts are.
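
For reference, the standard definition for two vectors $a$ and $b$ with $n$ dimensions is shown here. SciPy's `spatial.distance.cosine` returns the cosine *distance*, which is one minus this value.

$$
\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}
$$

The result ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction).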

👉 Import the SciPy package and define the method **get_cosine_similarity()**.

In [5]:
from scipy import spatial

# TODO: get_cosine_similarity() does not work correctly yet. Find the bug and fix it!
def get_cosine_similarity(vector_1, vector_2):
    return 1 - spatial.distance.cosine(vector_1, vector_1)

👉 Calculate similarities between the embeddings of the words and phrases from above and find the most similar vectors. You can follow the example below.

In [6]:
print("apple-orange")
print(get_cosine_similarity(apple_embedding, orange_embedding))

apple-orange
1.0
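
To find the most similar pair, every combination of embeddings can be scored and sorted. The sketch below assumes a correctly working similarity function: `cosine_similarity()` and `rank_pairs()` are hypothetical helpers written in plain Python, and the short vectors are made-up stand-ins for the real embeddings.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_pairs(labeled_vectors):
    """Return (similarity, label_1, label_2) tuples, most similar first."""
    scored = [
        (cosine_similarity(vec_1, vec_2), label_1, label_2)
        for (label_1, vec_1), (label_2, vec_2)
        in combinations(labeled_vectors.items(), 2)
    ]
    return sorted(scored, reverse=True)


# Made-up 2-dimensional vectors, NOT real embeddings
toy_vectors = {
    "apple": [1.0, 0.1],
    "orange": [0.9, 0.2],
    "phone": [0.1, 1.0],
}

ranking = rank_pairs(toy_vectors)
for similarity, label_1, label_2 in ranking:
    print(f"{label_1}-{label_2}: {similarity:.3f}")
```

With the notebook's variables, the input would be a dictionary such as `{"apple": apple_embedding, "orange": orange_embedding, ...}`.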


[Next exercise](05-store-embeddings-hana.ipynb)