# Generating Embeddings with Gemini

This notebook sets up our connection to Google's Gemini API for creating text embeddings.

## What are Embeddings?

Embeddings are what make vector databases work. They convert text (or other data) into fixed-length lists of numbers (vectors) that capture meaning.

For example, these phrases would have similar embeddings:
- "I love dogs"
- "Puppies are my favorite"

While these would have very different embeddings:
- "I love dogs"
- "Interest rates are rising"

Embeddings capture semantic relationships: words and phrases with similar meanings have vectors that lie close together, even if they don't share any keywords.

## Why Gemini?

We're using Google's Gemini API to create high-quality embeddings:
- 768-dimensional vectors capture rich semantic information
- Optimized for finding similar content
- Strong performance on search and recommendation tasks

In [None]:
#| default_exp gen_emb

In [None]:
#| export
import os
from google import genai
from google.genai import types
import numpy as np

## Setting Up the Client

First, we import the necessary libraries and set up our client.

Make sure your Gemini API key is stored in the `GEMINI_API_KEY` environment variable. If the variable is missing, `os.getenv` returns `None` and the client is created without a usable key.
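
A missing key is easier to diagnose if it is checked before the client is created. The following is a minimal sketch of such a check; `require_api_key` and its `env` parameter are hypothetical helpers introduced here for illustration and are not part of the Gemini SDK.

```python
import os

def require_api_key(name="GEMINI_API_KEY", env=None):
    """Return the key stored under `name`, or raise a clear error if it is missing.

    `env` is a hypothetical parameter that defaults to os.environ; passing a
    plain dict makes the check easy to try without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before creating the client.")
    return key
```

Under these assumptions, the client could then be created with `genai.Client(api_key=require_api_key())`, which fails early with a readable message instead of on the first API call.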

In [None]:
#| export
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

def get_embedding(contents, model="text-embedding-004"):
    """Embed `contents` with Gemini and return the first embedding as a numpy array."""
    config = types.EmbedContentConfig(task_type="SEMANTIC_SIMILARITY")
    response = client.models.embed_content(model=model, contents=contents, config=config)
    return np.array(response.embeddings[0].values)

## The Embedding Function

Our `get_embedding` function:
1. Takes text content as input
2. Sends it to Gemini's embedding API
3. Gets back a 768-dimensional vector
4. Returns it as a numpy array

We're using the `SEMANTIC_SIMILARITY` task type, which asks the model to produce embeddings optimized for judging how similar two pieces of text are.

The resulting vectors can be compared using cosine similarity or other distance metrics to find similar content.
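
As a rough illustration of that comparison, here is a minimal cosine similarity sketch in numpy. The three-dimensional vectors are made-up stand-ins chosen by hand, not real Gemini embeddings, and `cosine_similarity` is a helper defined here rather than something the notebook exports.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional vectors (assumed values, for illustration only).
dogs = np.array([0.9, 0.1, 0.0])      # stands in for "I love dogs"
puppies = np.array([0.8, 0.2, 0.1])   # stands in for "Puppies are my favorite"
rates = np.array([0.0, 0.1, 0.9])     # stands in for "Interest rates are rising"

print(cosine_similarity(dogs, puppies))  # higher score: vectors point in similar directions
print(cosine_similarity(dogs, rates))    # lower score: vectors are nearly orthogonal
```

With real embeddings, the same function could be applied to the arrays returned by `get_embedding`.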

In [None]:
get_embedding("Dow Jumps 1,000 Points Tuesday as Markets Rebound from Recent Losses").shape

(768,)

## Testing Our Function

Let's test with a simple headline. The resulting embedding is a 768-dimensional vector.

Each dimension in this vector represents some aspect of the text's meaning, learned by the model during training. The specific meaning of each dimension isn't interpretable by humans, but the overall pattern of values captures the semantic content.

We can use these embeddings to:
- Find similar content
- Group related items
- Build recommendation systems
- Enable semantic search
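
To make the semantic search use case concrete, the sketch below ranks a few stored vectors against a query vector by cosine similarity. The headlines, the toy vectors, and the `top_k_similar` helper are all hypothetical; in practice each row would be the output of `get_embedding` for one document.

```python
import numpy as np

def top_k_similar(query_vec, doc_matrix, k=2):
    """Return (indices, scores) of the k rows of doc_matrix most similar to query_vec."""
    # Normalize to unit length so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = docs @ q
    order = np.argsort(-scores)[:k]  # sort descending, keep the best k
    return order, scores[order]

# Hypothetical headlines with hand-picked toy vectors (not real embeddings).
headlines = ["Stocks rally on earnings", "Local team wins final", "Markets rebound after losses"]
toy_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.8, 0.3, 0.1],
])
query = np.array([1.0, 0.2, 0.0])  # stands in for an embedded finance query

idx, scores = top_k_similar(query, toy_vectors, k=2)
for i, s in zip(idx, scores):
    print(f"{s:.3f}  {headlines[i]}")
```

A brute-force scan like this is fine for small collections; a vector database does the same ranking with an index built for larger ones.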

Next, see `03_headline_embeddings.ipynb` for a complete example of using these embeddings.

In [None]:
#| hide
import nbdev; nbdev.nbdev_export()