## What are Embeddings?
- Concept from Natural Language Processing (NLP)
- Numerical representation of text
![Screenshot 2025-10-13 at 3.56.31 PM.png](attachment:37452e5c-5ac1-4d9f-bdda-576f520fa7a4.png)

- Text is mapped onto a *multi-dimensional* **vector space**
- The numbers output by the model are the text's coordinates in that space
- Text with similar meaning appears *closer together*
- Text with dissimilar meaning appears *farther apart*
![Screenshot 2025-10-13 at 3.58.05 PM.png](attachment:45ab7d74-c9bd-4c9c-ba7e-643c2f01d79c.png)
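
To make "closer together" concrete, here is a minimal sketch that compares vectors with cosine similarity, one common way of measuring closeness. The 2-D vectors and the words are hand-picked assumptions for illustration only; real embeddings come from a model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 2-D "embeddings", invented for this example
toy_vectors = {
    "cat": [0.9, 0.1],
    "kitten": [0.85, 0.2],
    "invoice": [0.1, 0.95],
}

sim_close = cosine_similarity(toy_vectors["cat"], toy_vectors["kitten"])
sim_far = cosine_similarity(toy_vectors["cat"], toy_vectors["invoice"])

print(f"cat vs kitten:  {sim_close:.3f}")
print(f"cat vs invoice: {sim_far:.3f}")
```

With these toy values, the related pair scores higher than the unrelated pair, which is the behavior the vector space is meant to capture.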

### Why are embeddings useful?
- Embeddings allow *semantic meaning* to be captured
- **Semantic meaning**: context and intent behind text
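- Example: these two questions share almost no keywords but mean nearly the same thing, so their embeddings should lie close together: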
  - "Which way is it to the supermarket?"
  - "Could I have directions to the shop?"

### Semantic search engines
Traditional search engines
- Use **keyword** pattern matching
- May miss the true intent
- May miss word variations (synonyms or different word forms)
![Screenshot 2025-10-13 at 4.01.01 PM.png](attachment:3f28148c-bbd2-4dfd-a4d6-d75c972f085e.png)

Semantic search engines
- Use **embeddings** to understand intent and context
![Screenshot 2025-10-13 at 4.02.07 PM.png](attachment:219822f6-e85c-46c3-b760-d55a5465668c.png)
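
The contrast between the two approaches can be sketched in a few lines. This is an illustrative toy, not how any particular search engine works: the documents, the query, and all vectors are made-up assumptions, and `keyword_match` is a hypothetical helper standing in for keyword pattern matching.

```python
import math

def keyword_match(query, document):
    """Return True if any query word appears verbatim in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return bool(query_words & doc_words)

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical documents with invented 3-D embeddings
documents = {
    "Running shoes for marathon training": [0.9, 0.1, 0.2],
    "Slow cooker recipes for winter": [0.1, 0.9, 0.1],
}

query = "jogging sneakers"
query_vector = [0.85, 0.15, 0.25]  # invented embedding for the query

# Keyword search: no word overlaps, so nothing is found
keyword_hits = [doc for doc in documents if keyword_match(query, doc)]

# Semantic search: rank documents by similarity to the query vector
best_match = max(documents, key=lambda doc: cosine_similarity(query_vector, documents[doc]))

print("Keyword hits:", keyword_hits)
print("Best semantic match:", best_match)
```

The query shares no words with either document, so the keyword approach returns nothing, while the similarity ranking still surfaces the related document.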

### Recommendation systems
Example: Job post recommendations
- Recommend jobs based on descriptions already viewed
- Mitigates variations in job title
![Screenshot 2025-10-13 at 4.03.01 PM.png](attachment:31ed7908-3798-4322-a852-943eaca13cf9.png)
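
One simple way to sketch this idea is to average the embeddings of the jobs a user has viewed and rank the remaining jobs by similarity to that average. The job titles, the 2-D vectors, and the averaging strategy are all assumptions chosen for illustration; production recommenders are usually more involved.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]

# Hypothetical embeddings of job descriptions, keyed by job title
job_vectors = {
    "Data Scientist": [0.9, 0.2],
    "Machine Learning Engineer": [0.85, 0.3],
    "Statistical Modeller": [0.8, 0.25],
    "Pastry Chef": [0.1, 0.9],
}

viewed = ["Data Scientist", "Machine Learning Engineer"]

# Summarize what the user has viewed as a single "profile" vector
profile = mean_vector([job_vectors[title] for title in viewed])

# Rank jobs the user has not seen yet, most similar first
candidates = [title for title in job_vectors if title not in viewed]
ranked = sorted(
    candidates,
    key=lambda title: cosine_similarity(profile, job_vectors[title]),
    reverse=True,
)

print(ranked)
```

Because the ranking depends on the description vectors rather than the titles, a differently named but similar role can still rank first.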

### Classification
Classification tasks:
- Classify sentiment
- Cluster observations
- Categorization
- Example: **Classifying news headlines**
![Screenshot 2025-10-13 at 4.06.03 PM.png](attachment:42730af8-2db0-4e0c-b544-b3fa175d04de.png)
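
A minimal sketch of embedding-based classification: embed a description of each label, then assign a headline to the label whose vector is most similar. The topic vectors and the headline vector below are invented for illustration, and `classify` is a hypothetical helper rather than part of any library.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def classify(vector, label_vectors):
    """Return the label whose vector is most similar to the input vector."""
    return max(label_vectors, key=lambda label: cosine_similarity(vector, label_vectors[label]))

# Hypothetical 3-D embeddings for each topic label
topic_vectors = {
    "Business": [0.9, 0.1, 0.1],
    "Science": [0.1, 0.9, 0.1],
    "Sport": [0.1, 0.1, 0.9],
}

# Invented embedding for a headline such as "New Particle Discovered at CERN"
headline_vector = [0.2, 0.8, 0.15]

predicted_topic = classify(headline_vector, topic_vectors)
print(predicted_topic)
```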

### Creating an Embeddings request
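Requests go to the Embeddings endpoint via `client.embeddings.create()`, which takes the model name and the input text.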

In [None]:
from openai import OpenAI

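# Replace the placeholder with your own API key
client = OpenAI(api_key="<OPENAI_API_KEY>")

# Request an embedding for a single piece of text
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=(
        "Embeddings are a numerical representation of text that can be used "
        "to measure the relatedness between two pieces of text."
    ),
)

# Convert the response object into a plain dictionary
response_dict = response.model_dump()
print(response_dict)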

### Embeddings response
![Screenshot 2025-10-13 at 4.07.40 PM.png](attachment:69f6e7d3-8b87-42b0-a20f-e4a534dedb0c.png)
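
For readers without the screenshot, the dictionary returned by `response.model_dump()` has roughly the shape sketched below. All numeric values here are placeholders invented for illustration, and the embedding list is cut down to three numbers; a real vector is much longer.

```python
# Illustrative shape only; the values are placeholders, not real API output
example_response_dict = {
    "object": "list",
    "data": [
        {
            "object": "embedding",
            "index": 0,
            "embedding": [0.0023, -0.0091, 0.0157],  # shortened placeholder vector
        }
    ],
    "model": "text-embedding-3-small",
    "usage": {"prompt_tokens": 24, "total_tokens": 24},
}

# The vector lives under data -> first item -> embedding
embedding = example_response_dict["data"][0]["embedding"]

# Token usage is reported separately under usage
total_tokens = example_response_dict["usage"]["total_tokens"]

print(len(embedding), total_tokens)
```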

### Extracting the embeddings

In [None]:
print(response_dict['data'][0]['embedding'])

# Output: [0.0023064255, ...., -0.0028842222]

In [None]:
# Practice 1

# Create an OpenAI client
client = OpenAI(api_key="<OPENAI_API_KEY>")

# Create a request to obtain embeddings
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=(
        "Today, I learned the concept of embeddings, its use cases, how it works, "
        "and its importance in building impactful AI systems. "
        "Next up, vector databases and RAG."
    ),
)

# Convert the response into a dictionary
response_dict = response.model_dump()
print(response_dict)

In [None]:
# Practice 2

# Extract the total_tokens from response_dict
print(response_dict['usage']['total_tokens'])

# Extract the embeddings from response_dict
print(response_dict['data'][0]['embedding'])

## Investigating the Vector Space

### Example: Embedding headlines

In [None]:
articles = [  
    {"headline": "Economic Growth Continues Amid Global Uncertainty", "topic": "Business"},  
    {"headline": "Interest rates fall to historic lows", "topic": "Business"},  
    {"headline": "Scientists Make Breakthrough Discovery in Renewable Energy", "topic": "Science"},   
    {"headline": "India Successfully Lands Near Moon's South Pole", "topic": "Science"},  
    {"headline": "New Particle Discovered at CERN", "topic": "Science"}, 
    {"headline": "Tech Company Launches Innovative Product to Improve Online Accessibility", "topic": "Tech"},    
    {"headline": "Tech Giant Buys 49% Stake In AI Startup", "topic": "Tech"},  
    {"headline": "New Social Media Platform Has Everyone Talking!", "topic": "Tech"},  
    {"headline": "The Blues get promoted on the final day of the season!", "topic": "Sport"}, 
    {"headline": "1.5 Billion Tune-in to the World Cup Final", "topic": "Sport"}
]
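
One possible way to embed all of these headlines is to send them in a single request, since the `input` parameter also accepts a list of strings, and then store each vector back on its article. The sketch below assumes an already created `client`; `embed_headlines` is a hypothetical helper name, not part of the OpenAI library.

```python
def embed_headlines(client, articles, model="text-embedding-3-small"):
    """Add an 'embedding' key to each article dict using one batched request."""
    # Collect the headline text from every article
    headlines = [article["headline"] for article in articles]

    # One request for the whole list instead of one request per headline
    response = client.embeddings.create(model=model, input=headlines)
    response_dict = response.model_dump()

    # Each returned item carries the index of the input it belongs to
    for item in response_dict["data"]:
        articles[item["index"]]["embedding"] = item["embedding"]

    return articles

# Usage, assuming `client` and `articles` are defined as in the cells above:
# articles = embed_headlines(client, articles)
# print(len(articles[0]["embedding"]))
```

Batching keeps the number of API calls down, and matching on `index` avoids relying on the order of the returned items.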