<a href="https://colab.research.google.com/github/d33disc/Bot-Generator-Bot/blob/main/quickstarts/PDF_Files.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2024 Google LLC.

In [2]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemini API: Read a PDF

This notebook demonstrates how to upload a PDF file with the File API so that the Gemini API can read it.

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/PDF_Files.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

## Setup

In [1]:
%pip install -Uq "google-generativeai>=0.7.2"

In [None]:
import google.generativeai as genai


import pathlib
import tqdm
import os

## Configure your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.

In [4]:
import google.generativeai as genai
from google.colab import userdata

genai.configure(api_key=userdata.get("GOOGLE_API_KEY"))
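
Outside Colab, the key is often kept in an environment variable instead of a Colab Secret. The helper below is a minimal sketch of one way to support both; `resolve_api_key` and the list of variable names are assumptions made for illustration, not part of the SDK.

```python
import os


def resolve_api_key(env=None, names=("GOOGLE_API_KEY", "GEMINI_API_KEY")):
    """Return the first API key found in Colab Secrets or the environment."""
    env = os.environ if env is None else env
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None

    for name in names:
        if userdata is not None:
            try:
                value = userdata.get(name)
            except Exception:  # secret missing or notebook access not granted
                value = None
            if value:
                return value
        if env.get(name):
            return env[name]

    raise RuntimeError(f"No API key found; set one of: {', '.join(names)}")
```

The result can then be passed to `genai.configure(api_key=resolve_api_key())`.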

## Download and inspect the PDF

Install the PDF processing tools. You don't need them to use the API; they are only used here to render a screenshot of a page.

In [None]:
!apt install poppler-utils

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  poppler-utils
0 upgraded, 1 newly installed, 0 to remove and 21 not upgraded.
Need to get 186 kB of archives.
After this operation, 696 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 poppler-utils amd64 22.02.0-2ubuntu0.6 [186 kB]
Fetched 186 kB in 1s (214 kB/s)
Selecting previously unselected package poppler-utils.
(Reading database ... 124926 files and directories currently installed.)
Preparing to unpack .../poppler-utils_22.02.0-2ubuntu0.6_amd64.deb ...
Unpacking poppler-utils (22.02.0-2ubuntu0.6) ...
Setting up poppler-utils (22.02.0-2ubuntu0.6) ...
Processing triggers for man-db (2.10.2-1) ...


This PDF page is an article titled [Smoothly editing material properties of objects with text-to-image models and synthetic data](https://research.google/blog/smoothly-editing-material-properties-of-objects-with-text-to-image-models-and-synthetic-data/) available on the Google Research Blog.

In [1]:
import google.generativeai as genai
from google.colab import userdata

# Read the key from the Colab Secret used in the rest of this notebook.
genai.configure(api_key=userdata.get("GOOGLE_API_KEY"))

# Create the model
generation_config = {
  "temperature": 1,
  "top_p": 0.95,
  "top_k": 40,
  "max_output_tokens": 8192,
  "response_mime_type": "text/plain",
}

model = genai.GenerativeModel(
  model_name="gemini-1.5-flash",
  generation_config=generation_config,
)

# Start a chat session seeded with one earlier user/model exchange
chat_session = model.start_chat(
  history=[
    {
      "role": "user",
      "parts": [
        "Generate a Docker script to create a simple linux machine that has python 3.10 (prebuilt) installed with following libraries: pandas, tensorflow, numpy",
      ],
    },
    {
      "role": "model",
      "parts": [
        "Here's a Docker script to create a simple Linux machine with Python 3.10 and the requested libraries:\n\n**Dockerfile:**\n\n```dockerfile\n# Base image\nFROM python:3.10-slim\n\n# Install dependencies\nRUN apt-get update && apt-get install -y \\\n    libpq-dev \\\n    gcc\n\n# Install Python libraries\nRUN pip install pandas tensorflow numpy\n```\n\n**Explanation:**\n\n1. **Base image:** This Dockerfile uses the official `python:3.10-slim` image as a base, which provides a minimal Linux environment with Python 3.10 pre-installed.\n2. **Install dependencies:** \n    * `libpq-dev`: This package is required for building psycopg2, which might be needed by pandas for certain functionalities.\n    * `gcc`: This is needed for compiling some Python libraries.\n3. **Install Python libraries:** This line uses `pip` to install the required libraries: pandas, tensorflow, and numpy.\n\n**Building and Running the Image:**\n\n1. Save the Dockerfile as `Dockerfile` in your project directory.\n2. Build the Docker image:\n\n```bash\ndocker build -t my-python-image .\n```\n\n3. Run the image:\n\n```bash\ndocker run -it my-python-image\n```\n\nThis will create a container from the image and give you an interactive shell. You can then verify that Python and the libraries are installed correctly. \n",
      ],
    },
  ]
)

# Replace the placeholder with your own follow-up message
response = chat_session.send_message("INSERT_INPUT_HERE")

print(response.text)

In [5]:
import pathlib

# Download the sample PDF only if it is not already present.
if not pathlib.Path('test.pdf').exists():
  !curl -o test.pdf https://storage.googleapis.com/generativeai-downloads/data/Smoothly%20editing%20material%20properties%20of%20objects%20with%20text-to-image%20models%20and%20synthetic%20data.pdf
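
The same download can be done in plain Python, which avoids relying on shell commands. This is a minimal sketch using the standard library; `download_if_missing` and its `fetch` parameter are hypothetical names introduced here so the logic can be exercised without network access.

```python
import pathlib
import urllib.request


def download_if_missing(url, dest, fetch=urllib.request.urlretrieve):
    """Download url to dest unless a non-empty dest already exists; return the path."""
    dest = pathlib.Path(dest)
    if dest.exists() and dest.stat().st_size > 0:
        return dest  # reuse the existing file
    fetch(url, str(dest))
    if not dest.exists() or dest.stat().st_size == 0:
        raise RuntimeError(f"Download failed: {url}")
    return dest
```

Calling `download_if_missing(url, "test.pdf")` with the PDF URL from the cell above would fetch the file once and reuse it on later runs.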

Look at one of the pages:

In [6]:
!pdftoppm test.pdf -f 1 -l 1 page-image -jpeg
!ls


In [7]:
import PIL.Image

In [8]:
img = PIL.Image.open("page-image-1.jpg")
img.thumbnail([800, 800])
img

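
If the image cannot be opened, the usual causes are that `pdftoppm` was not installed in this session or that `test.pdf` was never downloaded. As far as I know, `pdftoppm` can also zero-pad the page number for longer documents (for example `page-image-01.jpg`), so the exact file name is worth checking. The sketch below looks the file up by pattern instead of hard-coding it; `find_page_image` is a hypothetical helper.

```python
import pathlib


def find_page_image(prefix="page-image", directory=".", suffix=".jpg"):
    """Return the first rendered page image for prefix, or None if none exists."""
    matches = sorted(pathlib.Path(directory).glob(f"{prefix}-*{suffix}"))
    return matches[0] if matches else None
```

A `None` result means the rendering step did not produce any output and should be rerun before opening the image with `PIL.Image.open`.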

## Upload the file to the API

In [9]:
file_ref = genai.upload_file('test.pdf')

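
`genai.upload_file` needs the file to exist locally, so it can help to check that first and to state the MIME type explicitly. The wrapper below is a sketch under those assumptions; `upload_pdf` and its `uploader` parameter are hypothetical names, and the keyword arguments are the ones I understand `genai.upload_file` to accept.

```python
import pathlib


def upload_pdf(path, uploader=None):
    """Validate that path exists, then upload it as a PDF and return the file reference."""
    pdf = pathlib.Path(path)
    if not pdf.is_file():
        raise FileNotFoundError(f"{pdf} not found; download it before uploading.")
    if uploader is None:
        # Imported lazily so the existence check runs before the SDK is needed.
        import google.generativeai as genai
        uploader = genai.upload_file
    return uploader(path=str(pdf), mime_type="application/pdf", display_name=pdf.name)
```

With the SDK configured, `file_ref = upload_pdf('test.pdf')` would replace the direct call.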

## Try it out

In [23]:
model = genai.GenerativeModel(model_name='gemini-2.0-flash')

The pages of the PDF file are each passed to the model as a screenshot of the page plus the text extracted by OCR.

In [24]:
model.count_tokens([file_ref, '\n\nCan you summarize this file as a bulleted list?'])

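
The token count from the cell above can be compared with a budget before sending a request. A minimal sketch, assuming the object returned by `count_tokens` exposes `total_tokens` and that `limit` is a number you choose yourself; `check_token_budget` is a hypothetical helper.

```python
def check_token_budget(model, parts, limit):
    """Return (total_tokens, fits) for the given prompt parts."""
    total = model.count_tokens(parts).total_tokens
    return total, total <= limit
```

For example, `check_token_budget(model, [file_ref, prompt], limit)` returns the count together with a flag that says whether the request stays within `limit`.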

In [25]:
response = model.generate_content(
    [file_ref, '\n\nCan you summarize this file as a bulleted list?']
)

In [26]:
from IPython.display import Markdown
Markdown(response.text)

In addition, take a look at how the Gemini model responds when you ask questions about the images within the PDF.

In [27]:
response_2 = model.generate_content(
    [file_ref, '\n\nCan you explain the images on the first page of the document?']
)

In [28]:
from IPython.display import Markdown
Markdown(response_2.text)

If you look at the header area of the article, you can see that the model's explanation captures what the images show.
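
Both prompts above reuse the same uploaded file reference, so asking several questions can be written as a loop. This is a sketch only; `ask_questions` is a hypothetical helper, and it assumes each response exposes its text through `.text`, as in the cells above.

```python
def ask_questions(model, file_ref, questions):
    """Send each question with the same file reference; return {question: answer text}."""
    answers = {}
    for question in questions:
        response = model.generate_content([file_ref, "\n\n" + question])
        answers[question] = response.text
    return answers
```

Each question triggers its own request, so the file is uploaded once but counted against the token budget every time it is sent.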

## Learning more

The File API lets you upload a variety of multimodal MIME types, including images, audio, and video formats. Uploaded files can be used as inputs to the `generateContent` and `streamGenerateContent` methods; in the Python SDK both correspond to `model.generate_content`, with `stream=True` for the streaming variant.
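
In the Python SDK used in this notebook, streaming is requested by passing `stream=True` to `model.generate_content` and iterating over the returned chunks. The helper below is a minimal sketch of that pattern; `stream_text` and `on_chunk` are hypothetical names, and it assumes every chunk carries text.

```python
def stream_text(model, parts, on_chunk=print):
    """Stream a response, pass each chunk's text to on_chunk, and return the full text."""
    pieces = []
    for chunk in model.generate_content(parts, stream=True):
        pieces.append(chunk.text)
        on_chunk(chunk.text)
    return "".join(pieces)
```

Calling `stream_text(model, [file_ref, prompt])` would print the summary as it arrives instead of waiting for the complete response.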

The File API accepts files under 2GB in size and can store up to 20GB of files per project. Files last for 2 days and cannot be downloaded from the API.
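
These limits can be checked locally before uploading. The sketch below assumes that "2GB" and "20GB" mean binary gigabytes and that files expire exactly 48 hours after upload; `fits_file_api` and `expiry_time` are hypothetical helpers, and you have to track the project's current storage use yourself.

```python
import datetime
import pathlib

MAX_FILE_BYTES = 2 * 1024**3      # per-file limit (assumed binary gigabytes)
MAX_PROJECT_BYTES = 20 * 1024**3  # per-project storage limit (same assumption)


def fits_file_api(path, already_stored_bytes=0):
    """Return True if the file is under the per-file limit and fits in the project quota."""
    size = pathlib.Path(path).stat().st_size
    return size < MAX_FILE_BYTES and already_stored_bytes + size <= MAX_PROJECT_BYTES


def expiry_time(uploaded_at):
    """Return the time at which a file uploaded at uploaded_at is expected to expire."""
    return uploaded_at + datetime.timedelta(days=2)
```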

* Learn more about prompting with [media files](https://ai.google.dev/gemini-api/docs/file-prompting-strategies) in the docs, including the supported formats and maximum length.
* Learn more about how to extract structured outputs from PDFs in the [Structured outputs on invoices and forms](https://github.com/google-gemini/cookbook/blob/main/examples/Pdf_structured_outputs_on_invoices_and_forms.ipynb) example.


In [30]:
import pandas as pd

def process_data(file_path):
  """Reads data from a CSV file, performs calculations, and returns the result.

  Args:
    file_path: The path to the CSV file.

  Returns:
    The calculated result.
  """
  try:
    data = pd.read_csv(file_path)
    # Perform calculations on the data (example)
    result = data['column1'].sum()
    return result
  except FileNotFoundError:
    print(f"Error: File not found at {file_path}")
    return None
  except KeyError:
    print("Error: 'column1' not found in the data.")
    return None

In [31]:
def calculate_area(length, width):
  """Calculates the area of a rectangle.

  Args:
    length: The length of the rectangle.
    width: The width of the rectangle.

  Returns:
    The area of the rectangle.
  """

  # Input validation
  if not isinstance(length, (int, float)) or not isinstance(width, (int, float)):
    raise TypeError("Length and width must be numbers.")
  if length <= 0 or width <= 0:
    raise ValueError("Length and width must be positive.")

  # Calculate the area
  area = length * width

  # Assertion
  assert area >= 0, "Area cannot be negative."

  return area

In [33]:
!go get github.com/google/generative-ai-go

/bin/bash: line 1: go: command not found


In [34]:
!pip install google-genai




In [35]:
!apt-get update && apt-get install -y golang-go

Hit:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:3 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Get:5 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:6 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Get:7 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Get:11 http://security.ubuntu.com/ubuntu jammy-security/multiverse amd64 Packages [47.7 kB]
Get:12 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [2,737 kB]
Get:13 https://developer.download.nvidia.com/compute/

In [36]:
import os

os.environ['GOPATH'] = '/path/to/your/go/workspace'  # Replace with your desired path
os.environ['GOBIN'] = os.path.join(os.environ['GOPATH'], 'bin')

In [37]:
!go version

go version go1.18.1 linux/amd64


In [38]:
# `go get` only works inside a module, so create one first.
# `genai-demo` is a placeholder module name.
!mkdir -p genai-demo && cd genai-demo && go mod init genai-demo && go get github.com/google/generative-ai-go
