# Gemini Embeddings

## Embedding models

The [Gemini API](https://ai.google.dev/gemini-api/docs/embeddings) offers three models that generate text embeddings:

- `gemini-embedding-exp-03-07`
- `text-embedding-004`
- `embedding-001`


In [1]:
import os
import getpass

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("GEMINI_API_KEY")

In [2]:
# Uncomment if running on Google Colab
# !pip install -U -q google-genai

In [38]:
from google import genai

gemini_api_key = os.getenv("GEMINI_API_KEY")

client = genai.Client(api_key=gemini_api_key)

result = client.models.embed_content(
        model="gemini-embedding-exp-03-07",
        contents="What is the meaning of life?")

print(result.embeddings)

[ContentEmbedding(values=[-0.022372285, -0.004451784, 0.013473644, -0.053762246, -0.020569915, 0.011864573, 0.015185799, 0.006950965, 0.03180835, 0.007074574, 0.027503368, -0.00600613, -0.014889315, 0.03269886, 0.12054204, 0.019322146, 0.000517173, 0.0045754807, -0.00856155, -0.01532448, 0.015616342, -0.008661197, -0.017454486, 0.0099245, -0.015551475, 0.012284064, 0.020809751, -0.0037114064, 0.025106275, 0.008105811, 0.020252233, 0.0019548477, -0.010780675, 0.027334962, -0.017213175, -0.011735542, 0.009507163, -0.015499457, -0.013591795, 0.0138707, -0.022853972, -0.009638755, -0.0034423112, -0.018855078, 0.018475225, -0.010515843, 0.015031793, -0.042978574, -0.013993226, 0.007916359, -0.012274015, 0.011758872, -0.010251529, -0.15881006, 0.016281825, 0.0103672175, -0.006364179, -0.009997806, -0.025991082, -0.027687864, -0.008586312, -0.014933809, -0.007584574, -0.021427568, 0.008805106, -0.009369353, -0.02012641, 0.011695516, 0.0020037016, 0.012606018, -0.01629937, 0.015133599, -0.0053

[Task types](https://arc.net/l/quote/yhdgobzb) let you generate embeddings optimized for a specific task, such as semantic similarity or retrieval.

In [9]:
from google import genai
from google.genai import types

client = genai.Client(api_key=gemini_api_key)

result = client.models.embed_content(
        model="gemini-embedding-exp-03-07",
        contents="What is the meaning of life?",
        config=types.EmbedContentConfig(task_type="SEMANTIC_SIMILARITY")
)
print(result.embeddings)

[ContentEmbedding(values=[-0.029910328, -0.01064506, 0.01232048, -0.04363388, -0.013421752, 0.002329359, -0.0057898476, 0.01377629, 0.042485517, -0.0057607763, 0.016976383, -0.016454414, -0.024323622, 0.04008316, 0.13324209, 0.0067314673, -0.006783664, 0.011491284, 0.014296529, -0.006029128, 0.008233743, 0.00345852, 0.002163263, -0.012119229, -0.014921178, -0.0056867404, 0.01915917, 0.00018566486, 0.014377131, -0.006765622, -0.0035427376, 0.019510606, 0.005653181, 0.030145729, -0.009636214, -0.0028940549, 0.017069276, 0.006264033, -7.985408e-05, 0.021383405, -0.024914917, -0.00023122504, 0.0043579782, 0.0029407823, 0.032731887, -0.00259939, 0.0068594706, -0.027191132, -0.0029506884, 0.020990087, -0.015325378, 0.024030503, -0.0023436325, -0.16248043, -0.0033473268, 0.011838764, -0.006498432, -0.007640825, -0.0030167769, -0.0036708876, -0.021138562, -0.0042642243, -0.018381873, -0.04532587, 0.00734486, -0.009865271, 0.0018252932, -0.0019786523, -0.012262748, -0.0014146615, -0.015477954, 
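Embeddings requested with the `SEMANTIC_SIMILARITY` task type are usually compared with cosine similarity. The sketch below is a minimal standard-library illustration: the short vectors are made-up stand-ins for real `result.embeddings[i].values` lists, and `cosine_similarity` is a helper name introduced here, not part of the `google-genai` SDK.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors, made up for illustration;
# real embeddings have many more dimensions.
v_query = [1.0, 0.0, 1.0]
v_close = [2.0, 0.0, 2.0]  # same direction as v_query
v_far = [0.0, 1.0, 0.0]    # orthogonal to v_query

print(cosine_similarity(v_query, v_close))  # close to 1
print(cosine_similarity(v_query, v_far))    # 0 for orthogonal vectors
```

With real data, the two arguments would be the `values` lists of two embeddings returned by `embed_content`.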

# Setting up a Simple Embeddings DB

In [11]:
from IPython.display import Markdown

with open("./assets-resources/prompt-engineering-summary-report.md") as f:
    sample_1 = f.read()

Markdown(sample_1)

# Prompt Engineering Guide: Practical Summary (Page 1/3)

## 1. Introduction to Prompt Engineering

*   **Core Idea:** Prompt engineering is the iterative process of designing effective inputs (prompts) to guide Large Language Models (LLMs) toward desired outputs. It's essential because LLMs are prediction engines, and the prompt sets the context for that prediction.
*   **Accessibility:** You don't need to be a data scientist; anyone can write prompts, but crafting *effective* ones takes practice and iteration.
*   **Goal:** To create prompts that are clear, specific, and provide sufficient context, leading to accurate, relevant, and useful LLM responses. Inadequate prompts cause ambiguity and poor results.
*   **Scope:** This guide focuses on prompting models like Gemini directly (via API or tools like Vertex AI Studio) where configuration is accessible.

## 2. Essential LLM Output Configuration

*Before* focusing solely on the prompt text, configure the model's output parameters. These significantly impact the results:

*   **Output Length (Max Tokens):**
    *   Sets the maximum number of tokens the model will generate.
    *   **Practical Tip:** Be mindful of costs, latency, and energy use (more tokens = higher). Don't rely on this alone for succinctness; adjust the prompt too. Crucial for techniques like ReAct to prevent excessive output. Too short can truncate output (e.g., invalid JSON).
*   **Sampling Controls (Temperature, Top-K, Top-P):** These control the randomness and creativity of the output.
    *   **Temperature:**
        *   Controls randomness. Lower values (~0.1-0.3) = more deterministic, focused, factual. Higher values (~0.7-1.0) = more creative, diverse, potentially unexpected.
        *   **Practical Tip:** Use `0` for tasks with a single correct answer (math, strict data extraction). Start around `0.2` for factual but slightly flexible tasks, and `0.7-0.9` for creative tasks. Be wary of very high temps causing incoherence or the "repetition loop bug".
    *   **Top-K:**
        *   Considers only the `K` most likely next tokens. Lower `K` = more restricted/conservative. Higher `K` = more diverse. `K=1` is deterministic (like Temp 0).
        *   **Practical Tip:** Start around `30-40`. Lower `K` (~20) for more factual, higher `K` (~40+) for creative.
    *   **Top-P (Nucleus Sampling):**
        *   Considers the smallest set of tokens whose cumulative probability exceeds `P`. Lower `P` = more conservative. Higher `P` (~0.95-1.0) = more diverse. `P=0` (or very small) often defaults to the single most likely token. `P=1` considers all tokens.
        *   **Practical Tip:** Often used *instead* of or *with* Top-K. A common starting point is `0.95`. Lower `P` (~0.9) for factual, higher `P` (~0.99) for creative.
    *   **Putting it Together:** The model typically filters by Top-K and Top-P first, then applies Temperature to the remaining candidates. Extreme settings in one can make others irrelevant (e.g., Temp 0 ignores K/P; K=1 ignores Temp/P).
    *   **Starting Point Recommendation:** Temp `0.2`, Top-P `0.95`, Top-K `30` for balanced results. Adjust based on desired creativity/factuality.

## 3. Foundational Prompting Techniques

*   **Zero-Shot Prompting:**
    *   Provide only the task description or question without any examples.
    *   `Example: Classify the following movie review: [Review Text]`
    *   **Practical Tip:** Simplest method, good starting point. May fail for complex tasks or when specific output formats are needed.
*   **One-Shot / Few-Shot Prompting:**
    *   Provide one (one-shot) or multiple (few-shot) examples of the task and desired output.
    *   `Example (Few-Shot Sentiment):`
        `Review: "Loved it!" Sentiment: Positive`
        `Review: "Boring." Sentiment: Negative`
        `Review: "It was okay." Sentiment: Neutral`
        `Review: "[New Review Text]" Sentiment:`
    *   **Practical Tip:** Highly effective for guiding the model on structure, style, and task logic. Use 3-5 high-quality, diverse examples as a rule of thumb. Include edge cases if needed. Ensure examples are accurate, as errors confuse the model.

---

# Prompt Engineering Guide: Practical Summary (Page 2/3)

## 4. Intermediate Prompting Techniques

*   **System, Contextual, and Role Prompting:**
    *   **System Prompt:** Defines the overall task, fundamental purpose, or constraints (e.g., "Translate the following text to French.", "Only return JSON.").
    *   **Contextual Prompt:** Provides specific background information relevant to the *current* task or query (e.g., "Given the previous conversation about user preferences, suggest a suitable product.").
    *   **Role Prompt:** Assigns a persona or identity to the LLM (e.g., "Act as a pirate.", "You are a helpful travel guide specialized in budget travel.").
    *   **Practical Tip:** Use **Role Prompting** to control tone, style, and expertise (e.g., "Explain this concept like I'm five.", "Write in a formal, academic style."). Combine these types as needed (e.g., a Role prompt can include Context).
*   **Step-Back Prompting:**
    *   Ask the LLM a more general, abstract question related to the specific task *first*. Then, use the answer to that general question as context when asking the specific task prompt.
    *   **Practical Tip:** Improves reasoning by activating broader knowledge. Useful for complex problems or mitigating bias. Requires two LLM calls.
*   **Chain of Thought (CoT) Prompting:**
    *   Instruct the LLM to break down its reasoning process step-by-step before giving the final answer. Simply add phrases like "Let's think step by step."
    *   `Example: Q: [Math Problem]. Let's think step by step. A:`
    *   **Practical Tip:** Significantly improves performance on tasks requiring reasoning (math, logic puzzles). Provides interpretability. Works well combined with few-shot examples showing the reasoning steps. Use **Temperature 0** for CoT tasks. Ensure the final answer comes *after* the reasoning steps. More tokens = higher cost/latency.
*   **Self-Consistency:**
    *   An enhancement to CoT. Run the same CoT prompt multiple times with a higher temperature (to generate diverse reasoning paths). Select the most frequent final answer (majority vote).
    *   **Practical Tip:** Improves accuracy over basic CoT, especially for complex reasoning. Significantly increases cost due to multiple runs.
*   **Tree of Thoughts (ToT):**
    *   (Advanced) Explores multiple reasoning paths simultaneously, forming a tree structure. Better for complex exploration tasks. Less common in basic prompt engineering.
*   **ReAct (Reason + Act):**
    *   Enables LLMs to use external tools (like search APIs, code interpreters) by interleaving reasoning steps (`Thought:`) with actions (`Action:`, `Action Input:`) and observing results (`Observation:`).
    *   **Practical Tip:** Foundational for building agents. Requires external frameworks (e.g., LangChain) and tool setup (API keys). Needs careful management of the prompt history (context) sent back to the LLM in each step. Restrict output length to avoid runaway actions.
*   **Automatic Prompt Engineering (APE):**
    *   Use an LLM to generate variations of an initial prompt for a specific task. Evaluate these generated prompts (manually or using metrics like BLEU/ROUGE) and select the best one.
    *   **Practical Tip:** Can help discover effective prompt phrasing, especially for training data generation. Iterative process.

## 5. Code Prompting Specifics

*   LLMs like Gemini can understand and generate code.
*   **Use Cases:**
    *   **Writing Code:** Provide a description of the desired functionality. (e.g., "Write a Python script to rename files in a folder, prepending 'draft_'").
    *   **Explaining Code:** Paste code and ask for an explanation. (e.g., "Explain this Bash script line by line.").
    *   **Translating Code:** Provide code in one language and ask for another. (e.g., "Translate this Bash script to Python.").
    *   **Debugging & Reviewing Code:** Provide code and the error message, ask for debugging help, or ask for general improvements/review.
*   **Practical Tips:**
    *   **ALWAYS TEST GENERATED CODE.** LLMs can make subtle or significant errors.
    *   Be specific about the language, libraries, and desired functionality.
    *   For debugging, provide the full error message and relevant code snippet.
    *   In tools like Vertex AI Studio, use the 'Markdown' view for code output to preserve formatting (especially Python indentation).

---

# Prompt Engineering Guide: Practical Summary (Page 3/3)

## 6. Best Practices for Effective Prompting

*   **Provide Examples (Few-Shot):** (Reiteration) Often the single most effective technique. Show, don't just tell.
*   **Design with Simplicity:** Clear, concise language. Avoid jargon or unnecessary info. If it's confusing to you, it's likely confusing to the model.
    *   **Tip:** Use clear action verbs (e.g., `Summarize`, `Classify`, `Generate`, `Translate`, `Extract`, `Rewrite`).
*   **Be Specific About the Output:** Clearly define the desired format, length, style, content, and target audience. Don't be vague (e.g., "Write a 3-paragraph blog post for beginners..." vs. "Write about consoles.").
*   **Use Instructions over Constraints:** Tell the model *what to do* rather than only *what not to do*. Constraints are okay for safety guardrails or strict formatting but can be less effective or conflicting.
    *   `DO: Summarize the text in 3 bullet points.`
    *   `LESS EFFECTIVE: Do not write a long summary. Do not use paragraphs.`
*   **Control Max Token Length:** Use configuration or specify length in the prompt (e.g., "...in under 100 words," "...in a single sentence").
*   **Use Variables in Prompts:** Use placeholders (like `{city}` or `$user_input`) to make prompts reusable and dynamic. Essential for integrating prompts into applications.
*   **Experiment Iteratively:** Try different phrasing, formats (question vs. instruction), styles, examples, configurations, and even different models/versions. Prompt engineering is not a one-shot process.
*   **Mix Classes (Few-Shot Classification):** When providing examples for classification, ensure the examples cover different classes and aren't all clustered together to avoid order bias.
*   **Adapt to Model Updates:** Newer model versions may have different capabilities or respond differently. Re-test prompts with new versions.
*   **Experiment with Output Formats (JSON/XML):**
    *   For non-creative tasks (extraction, classification, structured data), explicitly ask for output in JSON or XML.
    *   **Benefits:** Consistent structure, easier parsing in applications, can enforce data types, reduces hallucination likelihood.
    *   **Tip:** Provide the desired schema or an example JSON structure in the prompt (few-shot). Be mindful of token limits, as JSON is verbose. Use tools like the `json-repair` library (Python) to fix truncated/malformed JSON output.
*   **Working with Schemas (Input):** Provide a JSON Schema definition along with the JSON input data. This helps the LLM understand the structure and focus on relevant fields, especially for complex or large inputs.
*   **Collaborate:** If possible, have multiple people attempt prompt design and compare results.
*   **DOCUMENT EVERYTHING:**
    *   **Crucial:** Keep detailed records of your prompt attempts.
    *   **Template Fields:** Prompt Name/Version, Goal, Model Used, Temperature, Top-K, Top-P, Max Tokens, Full Prompt Text, Output(s), Outcome (OK/Not OK/Sometimes OK), Feedback/Notes, Hyperlink (if saved in a tool like Vertex AI Studio).
    *   **Why:** Enables learning, debugging, re-testing on new models, and avoids re-doing work.
    *   **Tip:** Store prompts in separate files from application code for maintainability. Consider automated testing/evaluation for prompts in production.

## 7. Final Takeaway

Effective prompt engineering is an iterative cycle: **Craft -> Test -> Analyze -> Document -> Refine.** It requires understanding the LLM's configuration options, leveraging different prompting techniques (especially examples), clearly stating intent, and meticulously documenting experiments to achieve consistent, high-quality results.

---

In [12]:
with open("./assets-resources/gemini-report-summary.md") as f:
    sample_2 = f.read()

Markdown(sample_2)

# Summary Report on Gemini: A Family of Highly Capable Multimodal Models

## Overview
The report introduces Gemini, a new family of multimodal models developed at Google. The Gemini family exhibits remarkable capabilities across image, audio, video, and text understanding. It advances the state of the art in numerous benchmarks and offers three sizes of models: Ultra, Pro, and Nano. These models cater to different computational needs ranging from complex reasoning tasks to on-device applications.

## Model Variants
- **Gemini Ultra**: The most capable model designed for highly complex tasks. It achieves state-of-the-art performance in 30 out of 32 benchmarks.
- **Gemini Pro**: Designed for enhanced performance and scalability. It provides strong reasoning and multimodal capabilities.
- **Gemini Nano**: Optimized for on-device deployment with two versions, Nano-1 and Nano-2, tailored for low and high memory devices respectively.

## Multimodal Capabilities
The Gemini models are trained jointly across various types of data:
- **Image Understanding**: The models set a new standard in benchmarks related to image recognition and reasoning, significantly outperforming existing models without task-specific modifications.
- **Audio Processing**: Gemini excels in speech recognition and translation, outperforming other models in both English and multilingual settings.
- **Video Understanding**: Demonstrates advanced temporal reasoning capabilities, achieving high scores on video-related tasks.

## Training and Infrastructure
- **Model Architecture**: Based on transformer decoders with enhancements for stable large-scale training and optimized for Google's TPU infrastructure.
- **Training Infrastructure**: Utilizes TPUv5e and TPUv4, with innovations in distributing the training process across multiple datacenters to ensure efficiency and reliability.
- **Pre-training Dataset**: A rich multimodal and multilingual dataset from a vast collection of web documents, books, code, images, audio, and video. 

## Post-Training
Post-training is applied to fine-tune models for specific applications and improve clean output quality, alignment, and safety before deployment. There are two variants:
- **Gemini Apps Models**: Optimized for conversational applications like Gemini and Gemini Advanced.
- **Gemini API Models**: Designed for integration into various products accessible through Google AI Studio and Cloud Vertex AI.

## Key Evaluation Highlights
- **Performance**: State-of-the-art results on comprehensive benchmarks across language, coding, reasoning, and multimodal tasks.
- **Reasoning**: Particularly excels in tasks requiring advanced reasoning, such as exams and competitive programming.
- **Multilingual and Long-Context Tasks**: Demonstrates robust multilingual capabilities and effective long-context utilization.

## Capabilities and Real-World Applications
- **Instruction Following**: Models show high instruction adherence with detailed evaluations and enhancements.
- **Tool Use**: Enabling models to use external tools, significantly broadening their applicability and functionality.
- **Educational and Creative Applications**: Offers new possibilities in personalized learning and intelligent tutoring systems.

## Responsible Deployment
The report emphasizes the importance of responsible deployment:
- **Impact Assessment**: Thorough impact assessments at both the model and product level to anticipate societal benefits and risks.
- **Safety Policies and Mitigations**: Developed comprehensive policies to mitigate risks, with a focus on factuality, attribution, and safety in model outputs.
- **External and Internal Evaluations**: Continuous evaluation and red teaming practices to ensure model reliability and safety.

## Conclusions
Gemini models push the boundaries of multimodal machine learning. They open up a wide range of potential applications, from educational tools to innovative AI services, though they do highlight the need for ongoing research into more reliable language models with fewer hallucinations. The models support Google's mission to improve global access to AI technology, aiding in both everyday tasks and complex professional scenarios.

This comprehensive summary can be used in an educational setting to teach students about the development, capabilities, and applications of advanced multimodal AI models.


In [39]:
# inspired by the docs: https://ai.google.dev/gemini-api/docs/structured-output?lang=python
from google import genai
from pydantic import BaseModel


class Section(BaseModel):
    title: str
    content: str

def extract_sections(report_text: str) -> list[Section]:
    """
    Extract main sections and their content from a report using Gemini.
    
    Args:
        report_text: The text content of the report to analyze
        
    Returns:
        List of Section objects containing titles and content
    """
    response = client.models.generate_content(
        model='gemini-2.5-flash-preview-04-17',
        contents=f'Extract the main sections and their content from the following report:\n\n{report_text}',
        config={
            'response_mime_type': 'application/json',
            'response_schema': list[Section],
        },
    )
    
    # Return the parsed sections
    return response.parsed

# Example usage:
sections1 = extract_sections(sample_1)
sections2 = extract_sections(sample_2)
print(sections1[0].title)
print(sections2[0].title)

1. Introduction to Prompt Engineering
Overview
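`response.parsed` can come back empty if the model's JSON does not match the schema, for example when the output is cut off. The sketch below shows one possible fallback that validates the raw JSON text with pydantic's `TypeAdapter`. It assumes pydantic v2; `parse_sections` is a hypothetical helper name, and the two JSON strings are made-up inputs rather than real model output.

```python
from pydantic import BaseModel, TypeAdapter, ValidationError

class Section(BaseModel):
    title: str
    content: str

# Reusable validator for a JSON array of Section objects
_sections_adapter = TypeAdapter(list[Section])

def parse_sections(raw_json: str) -> list[Section]:
    """Validate a JSON array of {title, content} objects; return [] if it is malformed."""
    try:
        return _sections_adapter.validate_json(raw_json)
    except ValidationError:
        return []

good = '[{"title": "Overview", "content": "The report introduces Gemini."}]'
bad = '[{"title": "Overview"'  # truncated JSON, as a token limit might produce

print(parse_sections(good)[0].title)
print(parse_sections(bad))
```

Inside `extract_sections`, this could be used as `response.parsed or parse_sections(response.text or "[]")`, so a failed parse yields an empty list instead of `None`.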


In [40]:
documents = [
    {"title": section.title, "content": section.content}
    for section in sections1 + sections2
]

In [41]:
documents

[{'title': '1. Introduction to Prompt Engineering',
  'content': "*   **Core Idea:** Prompt engineering is the iterative process of designing effective inputs (prompts) to guide Large Language Models (LLMs) toward desired outputs. It's essential because LLMs are prediction engines, and the prompt sets the context for that prediction.*   **Accessibility:** You don't need to be a data scientist; anyone can write prompts, but crafting *effective* ones takes practice and iteration.*   **Goal:** To create prompts that are clear, specific, and provide sufficient context, leading to accurate, relevant, and useful LLM responses. Inadequate prompts cause ambiguity and poor results.*   **Scope:** This guide focuses on prompting models like Gemini directly (via API or tools like Vertex AI Studio) where configuration is accessible."},
 {'title': '2. Essential LLM Output Configuration',
  'content': '*Before* focusing solely on the prompt text, configure the model\'s output parameters. These signif

In [42]:
import pandas as pd

df = pd.DataFrame(documents)

df.columns = ['Title', 'Text']
df

| | Title | Text |
|---|---|---|
| 0 | 1. Introduction to Prompt Engineering | `* **Core Idea:** Prompt engineering is the i...` |
| 1 | 2. Essential LLM Output Configuration | `*Before* focusing solely on the prompt text, c...` |
| 2 | 3. Foundational Prompting Techniques | `* **Zero-Shot Prompting:** * Provide on...` |
| 3 | 4. Intermediate Prompting Techniques | `* **System, Contextual, and Role Prompting:*...` |
| 4 | 5. Code Prompting Specifics | `* LLMs like Gemini can understand and genera...` |
| 5 | 6. Best Practices for Effective Prompting | `* **Provide Examples (Few-Shot):** (Reiterat...` |
| 6 | 7. Final Takeaway | `Effective prompt engineering is an iterative c...` |
| 7 | Overview | `The report introduces Gemini, a new family of ...` |
| 8 | Model Variants | `- **Gemini Ultra**: The most capable model des...` |
| 9 | Multimodal Capabilities | `The Gemini models are trained jointly across v...` |


In [43]:
# Get the embeddings of each text and add to an embeddings column in the dataframe
def embed_fn(title, text):
    return client.models.embed_content(
        model="gemini-embedding-exp-03-07",
        contents=f"Title: {title}. Content:\n\n{text}",
        config=types.EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT")).embeddings[0].values

df['Embeddings'] = df.apply(lambda row: embed_fn(row['Title'], row['Text']), axis=1)
df

| | Title | Text | Embeddings |
|---|---|---|---|
| 0 | 1. Introduction to Prompt Engineering | `* **Core Idea:** Prompt engineering is the i...` | `[-0.003419167, 0.013491281, 0.020179015, -0.05...` |
| 1 | 2. Essential LLM Output Configuration | `*Before* focusing solely on the prompt text, c...` | `[-0.0028150796, 0.014415037, 0.020905714, -0.0...` |
| 2 | 3. Foundational Prompting Techniques | `* **Zero-Shot Prompting:** * Provide on...` | `[0.0009693245, 0.017989364, 0.028414138, -0.05...` |
| 3 | 4. Intermediate Prompting Techniques | `* **System, Contextual, and Role Prompting:*...` | `[0.00084439514, 0.014217097, 0.029570261, -0.0...` |
| 4 | 5. Code Prompting Specifics | `* LLMs like Gemini can understand and genera...` | `[0.009488756, 0.021862578, 0.006710472, -0.060...` |
| 5 | 6. Best Practices for Effective Prompting | `* **Provide Examples (Few-Shot):** (Reiterat...` | `[-0.013638757, 0.017220989, 0.026410993, -0.06...` |
| 6 | 7. Final Takeaway | `Effective prompt engineering is an iterative c...` | `[-0.00869646, 0.024266874, 0.015297142, -0.059...` |
| 7 | Overview | `The report introduces Gemini, a new family of ...` | `[-0.009086761, 0.012460813, 0.0129641555, -0.0...` |
| 8 | Model Variants | `- **Gemini Ultra**: The most capable model des...` | `[-0.0064443154, 0.019078193, 0.02580387, -0.05...` |
| 9 | Multimodal Capabilities | `The Gemini models are trained jointly across v...` | `[-0.010185126, 0.024632575, 0.014817939, -0.07...` |


The procedure to set up document search and Q&A is:

![](./assets-resources/embeddings1.png)

and then for the search and Q&A:

![](./assets-resources/embeddings2.png)

In [44]:
query = "what is the most critical prompting technique?"

embedding_model = "gemini-embedding-exp-03-07"

request = client.models.embed_content(
        model=embedding_model,
        contents=query,
        config=types.EmbedContentConfig(task_type="RETRIEVAL_QUERY")
)

request

EmbedContentResponse(embeddings=[ContentEmbedding(values=[-0.00030252986, 0.019511055, -3.6779937e-05, -0.055569477, -0.01101879, -0.007169704, -0.013960728, 0.004812663, -0.0038610373, 0.027688256, -0.011451217, -0.004213891, 0.0074306563, 0.005695112, 0.108880326, 0.0137917185, 0.0043451847, -0.012053429, 0.033898745, -0.016107643, -0.0043951706, 0.023291893, -0.023032151, -0.008114723, 0.03053042, -0.0051927012, -0.0056216745, 0.021288661, 0.02314777, 0.026453393, -0.015530175, -0.00300485, 0.014251752, 0.034462254, -0.009166135, 0.032672286, -0.0028435737, -0.027126094, 0.018554099, 0.005959482, -0.016405953, 0.0044377297, -0.022637706, -0.00240445, 0.019007672, 0.016640717, 0.016320225, -0.0149515625, -0.008745003, 0.04241021, 0.0155220805, 0.030803392, 0.0023016345, -0.15007481, -0.002417799, 0.017981552, 0.012828921, 0.009335273, 0.016184252, -0.016116893, -0.015642902, 0.0011191325, -0.0031910702, -0.00432738, 0.0013905462, 0.031978574, -0.0018036838, 0.001997899, -0.0024129099

In [45]:
import numpy as np

def find_best_passage(query, dataframe):
    """
    Embed the query, score it against every document embedding in the
    dataframe with a dot product, and return the text of the best match.
    """
    query_embedding = client.models.embed_content(
        model=embedding_model,
        contents=query,
        config=types.EmbedContentConfig(task_type="RETRIEVAL_QUERY")
    ).embeddings[0].values
    # One score per row: stacked document embeddings times the query vector
    dot_products = np.dot(np.stack(dataframe['Embeddings']), query_embedding)
    idx = np.argmax(dot_products)
    return dataframe.iloc[idx]['Text']  # Text of the row with the highest score
  
passage = find_best_passage(query, df)
passage

'*   **Zero-Shot Prompting:**    *   Provide only the task description or question without any examples.    *   `Example: Classify the following movie review: [Review Text]`    *   **Practical Tip:** Simplest method, good starting point. May fail for complex tasks or when specific output formats are needed.*   **One-Shot / Few-Shot Prompting:**    *   Provide one (one-shot) or multiple (few-shot) examples of the task and desired output.    *   `Example (Few-Shot Sentiment):`        `Review: "Loved it!" Sentiment: Positive`        `Review: "Boring." Sentiment: Negative`        `Review: "It was okay." Sentiment: Neutral`        `Review: "[New Review Text]" Sentiment:`    *   **Practical Tip:** Highly effective for guiding the model on structure, style, and task logic. Use 3-5 high-quality, diverse examples as a rule of thumb. Include edge cases if needed. Ensure examples are accurate, as errors confuse the model.'

In [46]:
query = "what are the best practices for prompting gemini models?"

embedding_model = "gemini-embedding-exp-03-07"

passage = find_best_passage(query, df)
passage

'*   LLMs like Gemini can understand and generate code.*   **Use Cases:**    *   **Writing Code:** Provide a description of the desired functionality. (e.g., "Write a Python script to rename files in a folder, prepending \'draft_\'").    *   **Explaining Code:** Paste code and ask for an explanation. (e.g., "Explain this Bash script line by line.").    *   **Translating Code:** Provide code in one language and ask for another. (e.g., "Translate this Bash script to Python.").    *   **Debugging & Reviewing Code:** Provide code and the error message, ask for debugging help, or ask for general improvements/review.*   **Practical Tips:**    *   **ALWAYS TEST GENERATED CODE.** LLMs can make subtle or significant errors.    *   Be specific about the language, libraries, and desired functionality.    *   For debugging, provide the full error message and relevant code snippet.    *   In tools like Vertex AI Studio, use the \'Markdown\' view for code output to preserve formatting (especially Py
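For this query the top hit is the code-prompting section rather than "6. Best Practices for Effective Prompting", so it can help to look at more than the single best score. The sketch below ranks every document and returns the top `k` with their scores. It uses plain Python lists and made-up two-dimensional vectors in place of real embeddings, and `top_k_passages` is a hypothetical helper, not part of the notebook's original code.

```python
def top_k_passages(query_vec, docs, k=3):
    """Return the k (score, title) pairs with the highest dot-product score.

    docs is a list of (title, embedding) pairs; all vectors share one length.
    """
    scored = [
        (sum(q * d for q, d in zip(query_vec, emb)), title)
        for title, emb in docs
    ]
    # Highest score first
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy data, made up for illustration only
toy_docs = [
    ("5. Code Prompting Specifics", [0.9, 0.1]),
    ("6. Best Practices for Effective Prompting", [0.8, 0.3]),
    ("Overview", [0.1, 0.9]),
]
toy_query = [1.0, 0.2]

for score, title in top_k_passages(toy_query, toy_docs, k=2):
    print(f"{score:.2f}  {title}")
```

With the dataframe above, `docs` could be built as `list(zip(df["Title"], df["Embeddings"]))` and `query_vec` would be the query embedding's `values` list.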

In [47]:
# this function was taken from this notebook: https://github.com/google/generative-ai-docs/blob/main/site/en/gemini-api/tutorials/document_search.ipynb
import textwrap

def make_prompt(query, relevant_passage):
    escaped = relevant_passage.replace("'", "").replace('"', "").replace("\n", " ")
    prompt = textwrap.dedent("""You are a helpful and informative bot that answers questions using text from the reference passage included below. \
                            Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. \
                            However, you are talking to a non-technical audience, so be sure to break down complicated concepts and \
                            strike a friendly and conversational tone. \
                            If the passage is irrelevant to the answer, you may ignore it.
                            QUESTION: '{query}'
                            PASSAGE: '{relevant_passage}'

    ANSWER:
  """).format(query=query, relevant_passage=escaped)
    
    return prompt

prompt = make_prompt(query, passage)
print(prompt)

You are a helpful and informative bot that answers questions using text from the reference passage included below.                             Be sure to respond in a complete sentence, being comprehensive, including all relevant background information.                             However, you are talking to a non-technical audience, so be sure to break down complicated concepts and                             strike a friendly and conversational tone.                             If the passage is irrelevant to the answer, you may ignore it.
                            QUESTION: 'what are the best practices for prompting gemini models?'
                            PASSAGE: '*   LLMs like Gemini can understand and generate code.*   **Use Cases:**    *   **Writing Code:** Provide a description of the desired functionality. (e.g., Write a Python script to rename files in a folder, prepending draft_).    *   **Explaining Code:** Paste code and ask for an explanation. (e.g., Explain this Bas
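The printed prompt keeps long runs of spaces because the backslash continuations join each line with the indentation of the next one, and `textwrap.dedent` has nothing to strip once the first line starts at column zero. The sketch below is one possible variant that joins the instruction sentences explicitly. `make_prompt_compact` is a hypothetical name, and the wording is copied from the function above.

```python
def make_prompt_compact(query: str, relevant_passage: str) -> str:
    """Build the Q&A prompt without indentation carried in by line continuations."""
    # Same escaping as make_prompt: drop quotes and flatten newlines
    escaped = relevant_passage.replace("'", "").replace('"', "").replace("\n", " ")
    instructions = " ".join([
        "You are a helpful and informative bot that answers questions using text "
        "from the reference passage included below.",
        "Be sure to respond in a complete sentence, being comprehensive, "
        "including all relevant background information.",
        "However, you are talking to a non-technical audience, so be sure to "
        "break down complicated concepts and strike a friendly and conversational tone.",
        "If the passage is irrelevant to the answer, you may ignore it.",
    ])
    return f"{instructions}\nQUESTION: '{query}'\nPASSAGE: '{escaped}'\n\nANSWER:\n"

demo = make_prompt_compact("what is few-shot prompting?", 'It\'s a "test"\npassage.')
print(demo)
```

The compact prompt carries the same instructions with fewer whitespace tokens; whether that changes answer quality is not something this notebook measures.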

In [48]:
answer = client.models.generate_content(
    model='gemini-2.5-flash-preview-04-17',
    contents=prompt
)

answer.text

"Based on the passage, when you're asking models like Gemini to help with code, it's really important to always test the code it gives you because these models can sometimes make mistakes, big or small. Also, to get the best results, you should be very clear about the specific programming language you need, any particular libraries you want to use, and exactly what you want the code to do. If you're trying to fix a problem, which is called debugging, it helps a lot to give the model the full error message you're seeing along with the bit of code that's causing the issue. Finally, if you are using tools like Vertex AI Studio, the passage suggests using the Markdown view when the model gives you code back because this helps keep the formatting tidy, especially for programming languages like Python where indentation matters."

Check out more examples in the [Gemini docs](https://ai.google.dev/gemini-api/docs/embeddings).

# Extract Tables from PDFs

In [54]:
from pydantic import BaseModel, Field
from typing import List, Optional, Dict, Any
from enum import Enum
import pathlib

class Table(BaseModel):
    table_title: str = Field(description="title of the figure or table")
    table_contents: dict = Field(description="the data within the table as a dictionary")

def extract_tables(filepath: str) -> list[Table]:
    """
    Extract the main tables and their content from a paper using Gemini.
    
    Args:
        filepath: Path to a .txt file holding the paper's raw text
        
    Returns:
        List of Table objects representing tables from the original paper
    """
    with open(filepath, "r") as f:
        paper_raw_contents = f.read()
    # No trailing comma here: it would turn the prompt into a one-element tuple
    prompt = f'Extract the main tables and their content from this paper: {paper_raw_contents}'
    response = client.models.generate_content(
        model='gemini-2.5-flash-preview-04-17',
        contents=prompt,
        config={
            'response_mime_type': 'application/json',
            'response_schema': list[Table],
        },
    )
    
    # Return the parsed tables
    return response.parsed


output_tables = extract_tables(filepath="./paper_raw.txt")
output_tables

ValidationError: 1 validation error for Schema
items.properties.table_contents.additionalProperties
  Extra inputs are not permitted [type=extra_forbidden, input_value=True, input_type=bool]
    For further information visit https://errors.pydantic.dev/2.11/v/extra_forbidden
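The error points at `table_contents.additionalProperties`: the free-form `dict` field yields a schema entry that the SDK's `Schema` model rejected in this run. One possible workaround, sketched below, is to describe the table with explicitly typed fields instead of a `dict`. `Row` and `TableRows` are hypothetical names, the example JSON is made up, and the sketch only checks the generated schema locally; whether the API accepts it has not been verified here.

```python
import json
from pydantic import BaseModel, Field

class Row(BaseModel):
    cells: list[str] = Field(description="cell values for one table row, left to right")

class TableRows(BaseModel):
    table_title: str = Field(description="title of the figure or table")
    headers: list[str] = Field(description="column names, left to right")
    rows: list[Row] = Field(description="table body, one entry per row")

# The generated JSON schema should not mention additionalProperties
schema_text = json.dumps(TableRows.model_json_schema())
print("additionalProperties" in schema_text)

# Made-up payload showing the shape the model would be asked to return
example = TableRows.model_validate_json(
    '{"table_title": "Demo", "headers": ["a", "b"], "rows": [{"cells": ["1", "2"]}]}'
)
print(example.rows[0].cells)
```

Passing `list[TableRows]` as `response_schema` in place of `list[Table]` would be the corresponding change in `extract_tables`.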

In [None]:


filepath = pathlib.Path("path/to/paper.pdf")  # placeholder path: point this at your own PDF

prompt = "Extract table 1 from this paper:"
# Send the PDF bytes together with the prompt and request JSON matching the Table schema
table_contents = client.models.generate_content(
    model='gemini-2.5-flash-preview-04-17',
    contents=[
        types.Part.from_bytes(
            data=filepath.read_bytes(),
            mime_type='application/pdf',
        ),
        prompt,
    ],
    config={
        'response_mime_type': 'application/json',
        'response_schema': list[Table],
    },
)

table_contents