# 4.3 Embeddings

OpenAI’s text embeddings transform strings into numbers, which makes it possible to measure the relatedness of text strings. Embeddings are commonly used for:

* Search (where results are ranked by relevance to a query string)
* Clustering (where text strings are grouped by similarity)
* Recommendations (where items with related text strings are recommended)
* Anomaly detection (where outliers with little relatedness are identified)
* Diversity measurement (where similarity distributions are analyzed)
* Classification (where text strings are classified by their most similar label)

An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.
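
To make the idea of distance concrete, here is a minimal sketch that uses three made-up 3-dimensional vectors in place of real embeddings (which have far more dimensions). The vector values and the `euclidean_distance` helper are assumptions for illustration only; they do not come from the API.

```python
import numpy as np

# Toy vectors invented for illustration; they are NOT real model output.
cat = np.array([0.9, 0.1, 0.0])
kitten = np.array([0.8, 0.2, 0.1])
invoice = np.array([0.0, 0.1, 0.9])

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return float(np.linalg.norm(a - b))

d_related = euclidean_distance(cat, kitten)
d_unrelated = euclidean_distance(cat, invoice)

# The related pair should be much closer together than the unrelated pair.
print(f"cat vs kitten:  {d_related:.3f}")
print(f"cat vs invoice: {d_unrelated:.3f}")
```

Euclidean distance is only one possible measure; the comparison later in this section uses cosine similarity instead.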


As the list above shows, embeddings have many uses, but this tutorial focuses only on how to make the request and on a simple method for comparing two strings.

**Models**

At the time of writing, three embedding models are available. Those with "-3" in the name are third-generation models.

* text-embedding-3-small
* text-embedding-3-large
* text-embedding-ada-002

**1 - Import the necessary libraries and start the client**

In [2]:
from openai import OpenAI
client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

**2 - Retrieve the embeddings**

*model* - the model used to generate the embeddings  
*input* - the string you want to retrieve the embeddings for

In [4]:
response1 = client.embeddings.create(
    input="We are testing to see if this string has any similarities to another one.",
    model="text-embedding-3-small"
)

embeddings1 = response1.data[0].embedding

print(f"Embeddings Example - {embeddings1[0]}, {embeddings1[1]}")

Embeddings Example - -0.00921399425715208, -0.03483395278453827
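
The `input` parameter also accepts a list of strings, so several texts can be embedded in one request. The sketch below wraps that in a helper; `embed_texts` is a hypothetical name introduced here, and `client` is assumed to be an `OpenAI()` instance like the one created in step 1.

```python
def embed_texts(client, texts, model="text-embedding-3-small"):
    """Return one embedding (a list of floats) per input string, in input order.

    Hypothetical helper: `client` is assumed to be an OpenAI() instance.
    """
    response = client.embeddings.create(input=list(texts), model=model)
    # Each item in response.data has an `index` pointing back at its input,
    # so sorting by it keeps the results aligned with `texts`.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

Calling `embed_texts(client, ["first string", "second string"])` would then return two vectors, which saves a round trip compared with one request per string.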


### Compare two strings using embeddings and cosine similarity

**1 - Import numpy and cosine_similarity**


In [5]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

**2 - Retrieve the embeddings for the second string**

In [7]:
response2 = client.embeddings.create(
    input="We're experimenting to determine if this string bears resemblance to another.",
    model="text-embedding-3-small"
)

embeddings2 = response2.data[0].embedding

**3 - Convert the embeddings to numpy arrays**

In [8]:
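# cosine_similarity expects 2-D arrays of shape (n_samples, n_features),
# so each embedding is reshaped into a single row.
embeddings1 = np.array(embeddings1).reshape(1, -1)
embeddings2 = np.array(embeddings2).reshape(1, -1)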

**4 - Calculate the similarity**

The closer the score is to 1, the more similar the two strings are.

In [10]:
similarity_score = cosine_similarity(embeddings1, embeddings2)

final_score = round(float(similarity_score[0][0]), 2)

print(f"Similarity: {final_score}")

Similarity: 0.88
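
For readers who want to see what the score measures, cosine similarity is the dot product of the two vectors divided by the product of their lengths. The sketch below computes it with NumPy alone; `cosine_sim` is a hypothetical helper, and the toy vectors are made up to show the two extremes of parallel and perpendicular directions.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity of two 1-D vectors: dot(a, b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for real embeddings (illustration only).
v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]   # same direction as v1, different length
v3 = [-3.0, 0.0, 1.0]  # perpendicular to v1

print(f"parallel:      {cosine_sim(v1, v2):.2f}")
print(f"perpendicular: {cosine_sim(v1, v3):.2f}")
```

This helper takes plain 1-D lists, so it would be applied to the raw `response.data[0].embedding` values rather than to the reshaped arrays from step 3, and it should give approximately the same score as `cosine_similarity`.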
