# Gemini for Text Generation


In [None]:
!pip install -q -U google-genai

In [1]:
from google import genai
import os

gemini_api_key = os.getenv("GEMINI_API_KEY")

client = genai.Client(api_key=gemini_api_key)

response = client.models.generate_content(
    model="gemini-2.0-flash", contents="Explain how AI works in a few words"
)
print(response.text)

AI learns patterns from data to make predictions or decisions.
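The cell above passes whatever `os.getenv` returns straight to `genai.Client`, so a missing variable only surfaces later as an authentication failure. A minimal sketch of an earlier check follows; the helper name `require_api_key` is hypothetical and not part of the SDK.

```python
import os


def require_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    `require_api_key` is an illustrative helper, not a google-genai function.
    """
    key = os.getenv(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the client."
        )
    return key
```

The result can then be passed as `genai.Client(api_key=require_api_key())`.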



# Gemini for Image Understanding

In [3]:
from google import genai

client = genai.Client(api_key=gemini_api_key)

my_file = client.files.upload(file="./assets-resources/sample-image.png")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[my_file, "Caption this image."],
)

print(response.text)

Here are the bounding box detections:
```json
[
  {"box_2d": [338, 58, 383, 175], "label": "arrow"}
]
```
The image shows the architecture and capabilities of the Gemini model. It highlights that Gemini models build on top of Transformer decoders and are trained to handle interleaved textual, audio, and visual inputs. The visual encoding is inspired by previous work, and the models can natively output images using discrete image tokens. The figure illustrates how the model supports various input types (text, audio, image, video) and produces both text and image outputs.
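The `box_2d` values in the output above are not pixel coordinates. As far as the Gemini image understanding docs describe it, they are `[ymin, xmin, ymax, xmax]` normalized to a 0-1000 range. The sketch below converts them to pixels under that assumption; the helper name `to_pixel_boxes` and the 2000x1000 image size are hypothetical, chosen only for illustration.

```python
import json


def to_pixel_boxes(detections_json: str, width: int, height: int) -> list:
    """Convert box_2d entries to pixel coordinates.

    Assumes box_2d is [ymin, xmin, ymax, xmax] on a 0-1000 scale.
    """
    boxes = []
    for det in json.loads(detections_json):
        ymin, xmin, ymax, xmax = det["box_2d"]
        boxes.append({
            "label": det["label"],
            "left": round(xmin * width / 1000),
            "top": round(ymin * height / 1000),
            "right": round(xmax * width / 1000),
            "bottom": round(ymax * height / 1000),
        })
    return boxes


# The JSON array from the output above; the image size is an assumed example.
raw = '[{"box_2d": [338, 58, 383, 175], "label": "arrow"}]'
print(to_pixel_boxes(raw, width=2000, height=1000))
# [{'label': 'arrow', 'left': 116, 'top': 338, 'right': 350, 'bottom': 383}]
```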



## To pass image data inline

In [5]:
from google.genai import types

with open('./assets-resources/sample-image.png', 'rb') as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents=[
        types.Part.from_bytes(
            data=image_bytes,
            mime_type='image/png',
        ),
        'What is the diagram in this picture?',
    ],
)

print(response.text)

The diagram in the picture is Figure 2, which illustrates how Gemini models support interleaved sequences of text, image, audio, and video as inputs and can output responses with interleaved image and text. It shows how different input modalities (text, audio, image, video) are processed through a Transformer and then decoded into either image or text output.
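The inline example hard-codes `mime_type='image/png'`, which only holds for PNG files. A minimal sketch for deriving the MIME type from the file name with the standard library follows; `guess_image_mime_type` is a hypothetical helper name, and whether a given image type is accepted by the API is not checked here.

```python
import mimetypes


def guess_image_mime_type(path: str) -> str:
    """Guess an image MIME type from the file extension, or raise ValueError."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Could not determine an image MIME type for {path!r}")
    return mime_type


print(guess_image_mime_type("./assets-resources/sample-image.png"))  # image/png
```

The returned string can be passed as the `mime_type` argument of `types.Part.from_bytes`.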



See more examples for working with images in Gemini [here](https://ai.google.dev/gemini-api/docs/image-understanding).

# Document Understanding

In [7]:
from google import genai
from google.genai import types
import httpx

client = genai.Client(api_key=gemini_api_key)

doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"

# Download the PDF as raw bytes
doc_data = httpx.get(doc_url).content

prompt = "Summarize this document in bullet points"
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(
            data=doc_data,
            mime_type='application/pdf',
        ),
        prompt,
    ],
)
print(response.text)

Here's a summary of the document in bullet points, covering the key aspects:

**Overall Theme:**

*   The document describes AlphaFold, a new deep learning approach for protein structure prediction that significantly improves accuracy compared to previous methods.

**Key Findings & Results:**

*   AlphaFold uses a neural network to predict distances between pairs of amino acid residues, which convey more structural information than contact predictions alone.
*   The network constructs a protein-specific potential of mean force, which can be optimized via gradient descent to generate accurate 3D structures.
*   In the CASP13 competition, AlphaFold achieved high-accuracy structures (TM-scores >= 0.7) for 24 out of 43 free modeling domains, surpassing the next best method.
*   Distance predictions correlate well with true distances, with uncertainty in those predictions also captured by the network.
*   The distogram accuracy predicts the realized structure's accuracy.

**Methods:**

*   
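The download in the cell above does not verify that the bytes it receives are a PDF, so an error page or redirect body could be sent to the model labeled as `application/pdf`. A minimal sketch of a sanity check is shown here; `ensure_pdf_bytes` is a hypothetical helper, and it only inspects the `%PDF-` header that PDF files start with.

```python
def ensure_pdf_bytes(data: bytes) -> bytes:
    """Return data unchanged if it starts with the PDF header, else raise.

    This is a shallow check on the file signature, not a full PDF validation.
    """
    if not data.lstrip().startswith(b"%PDF-"):
        raise ValueError("Downloaded content does not look like a PDF.")
    return data
```

For example, `doc_data = ensure_pdf_bytes(httpx.get(doc_url).content)` would fail early instead of producing a confusing model response.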

## For locally stored PDFs

In [10]:
from google import genai
from google.genai import types
import pathlib
import httpx

client = genai.Client(api_key=gemini_api_key)

# Optional: uncomment to download a PDF and save it locally.
# Note: this URL is the AlphaFold paper from the previous example; point
# doc_url at the PDF you actually want stored under the filename below.
# doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"
# filepath = pathlib.Path('./assets-resources/prompt-eng-guide-google.pdf')
# filepath.write_bytes(httpx.get(doc_url).content)

# This example assumes the PDF is already stored locally;
# its bytes are read when the request is built below.
filepath = pathlib.Path('./assets-resources/prompt-eng-guide-google.pdf')

prompt = "Write markdown style 3 page report on this prompt engineering guide with just the practical tips and relevant information."
response = client.models.generate_content(
    model="gemini-2.5-pro-preview-03-25",
    contents=[
        types.Part.from_bytes(
            data=filepath.read_bytes(),
            mime_type='application/pdf',
        ),
        prompt,
    ],
)
print(response.text)

Okay, here is a 3-page markdown report summarizing the practical tips and relevant information from the Google Prompt Engineering guide.

---

# Prompt Engineering Guide: Practical Summary (Page 1/3)

## 1. Introduction to Prompt Engineering

*   **Core Idea:** Prompt engineering is the iterative process of designing effective inputs (prompts) to guide Large Language Models (LLMs) toward desired outputs. It's essential because LLMs are prediction engines, and the prompt sets the context for that prediction.
*   **Accessibility:** You don't need to be a data scientist; anyone can write prompts, but crafting *effective* ones takes practice and iteration.
*   **Goal:** To create prompts that are clear, specific, and provide sufficient context, leading to accurate, relevant, and useful LLM responses. Inadequate prompts cause ambiguity and poor results.
*   **Scope:** This guide focuses on prompting models like Gemini directly (via API or tools like Vertex AI Studio) where configuration is 

More examples with PDFs [here](https://ai.google.dev/gemini-api/docs/document-processing?lang=python).
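Since the report is requested as markdown, it may be more useful written to a file than printed. A minimal sketch follows, assuming `response.text` can be empty or `None` when the model returns no text; `save_report` and the output path are hypothetical.

```python
from pathlib import Path
from typing import Optional


def save_report(text: Optional[str], path: str) -> Path:
    """Write the generated markdown to disk and return the output path."""
    if not text or not text.strip():
        raise ValueError("The response contained no text to save.")
    out = Path(path)
    # Create the target folder if it does not exist yet.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text, encoding="utf-8")
    return out
```

A call such as `save_report(response.text, "./outputs/prompt-eng-report.md")` would replace the final `print`.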

# Check Gemini Docs for All Capabilities

- [Audio Understanding](https://ai.google.dev/gemini-api/docs/audio)
- [Video Understanding](https://ai.google.dev/gemini-api/docs/video-understanding)