# Khipus.ai
## Embeddings

### Demo: Generating Embeddings
<span>© Copyright Notice 2025, Khipus.ai - All Rights Reserved.</span>

## Introduction

Embeddings are numerical representations of text that capture semantic meaning. In this notebook, we will explore how to generate embeddings using OpenAI's `text-embedding-ada-002` model (deployed on Azure OpenAI), compare embeddings using cosine similarity, and integrate them into a Retrieval Augmented Generation (RAG) pipeline.
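
As a preview of the comparison step mentioned above, here is a minimal sketch of cosine similarity in NumPy. The three short vectors are made-up stand-ins for real embeddings, and the function name `cosine_similarity` is chosen here for illustration; it is not taken from the `openai` package.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional vectors (illustrative only, not real embeddings)
v_base = [1.0, 2.0, 3.0]
v_scaled = [2.0, 4.0, 6.0]   # same direction as v_base
v_orth = [3.0, 0.0, -1.0]    # orthogonal to v_base (dot product is 0)

sim_same = cosine_similarity(v_base, v_scaled)
sim_orth = cosine_similarity(v_base, v_orth)
print(sim_same, sim_orth)
```

Vectors that point in the same direction score close to 1 regardless of their length, and orthogonal vectors score 0, which is why the measure is a common choice for comparing embeddings.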

In [20]:
# Import required libraries
# Note: this notebook uses the legacy openai<1.0 SDK interface (openai.Embedding.create)
import openai
import numpy as np
import os

In [None]:
# Set Azure OpenAI configuration
# Read the API key from the OPENAI_API_KEY environment variable; never hard-code or commit a real key
openai.api_key = os.getenv("OPENAI_API_KEY", "Replace with your OpenAI API key")
openai.api_base = "https://khipus-aoai.openai.azure.com"
openai.api_type = "azure"
openai.api_version = "2023-05-15"
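
Since the configuration depends on a secret, one option is to fail fast when the environment variable is missing instead of falling back to a placeholder string. The helper `require_env` below is hypothetical (it is not part of the `openai` package); this is only a sketch of the idea.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Possible usage in the configuration cell (commented out so this sketch runs without a key):
# openai.api_key = require_env("OPENAI_API_KEY")
```

With this approach a missing key surfaces as a clear error at configuration time rather than as an authentication failure on the first API call.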


## Generating Embeddings

The cell below defines `get_embedding`, which requests an embedding for a given text from the `text-embedding-ada-002` deployment, and then calls it on a sample string.

In [22]:
# Function to generate embeddings using the deployment of text-embedding-ada-002 on Azure OpenAI
def get_embedding(text, deployment="text-embedding-ada-002"):
    response = openai.Embedding.create(input=[text], deployment_id=deployment)
    return response['data'][0]['embedding']

# Example text
text = "Khipus"
embedding_vector = get_embedding(text)

# Print the first 5 dimensions of the embedding for brevity
print(f"Embedding for text (first 5 dimensions): {embedding_vector[:5]}...")

Embedding for text (first 5 dimensions): [0.005524002946913242, -0.00524763623252511, 0.027703257277607918, -0.038917750120162964, -0.021496662870049477]...


The `text-embedding-ada-002` model returns an embedding vector with 1536 dimensions. The next cell builds a DataFrame with one row per dimension (i.e., 1536 rows), where each row shows the dimension index and the corresponding embedding value.

In [23]:
import pandas as pd

# Create a DataFrame with the embedding_vector values
df_embedding = pd.DataFrame({
    'Index': list(range(len(embedding_vector))),
    'Value': embedding_vector
})

# Display the table
df_embedding

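|      | Index | Value     |
|------|-------|-----------|
| 0    | 0     | 0.005524  |
| 1    | 1     | -0.005248 |
| 2    | 2     | 0.027703  |
| 3    | 3     | -0.038918 |
| 4    | 4     | -0.021497 |
| ...  | ...   | ...       |
| 1531 | 1531  | 0.021403  |
| 1532 | 1532  | 0.003872  |
| 1533 | 1533  | 0.002447  |
| 1534 | 1534  | -0.003333 |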
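
Listing 1536 values is hard to read at a glance, so a quick sanity check on the vector's length and norm can be more informative. The sketch below uses a random stand-in vector (`fake_embedding`, a name invented here) so it runs without an API call; with a real result you would pass `embedding_vector` instead. OpenAI's documentation describes its embeddings as normalized to length 1, so a norm near 1.0 is the expected result, but treat that as an assumption to verify on your own deployment.

```python
import numpy as np

# Stand-in for a real embedding: a random 1536-dimensional vector scaled to unit length
rng = np.random.default_rng(0)
fake_embedding = rng.normal(size=1536)
fake_embedding = fake_embedding / np.linalg.norm(fake_embedding)

values = np.asarray(fake_embedding, dtype=float)
n_dims = values.shape[0]                  # number of dimensions
l2_norm = float(np.linalg.norm(values))   # Euclidean length of the vector

print(f"dimensions: {n_dims}")
print(f"L2 norm:    {l2_norm:.6f}")
print(f"min / max:  {values.min():.6f} / {values.max():.6f}")
```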
