<a href="https://colab.research.google.com/github/Bryan-Az/Foundation-Model-Showcase/blob/main/Part%20D%20-%20LCI%20to%20Text%20in%20Gemini/lci_to_text_gemini.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Large Multimodal Context Input (LCI) to Text Output in Gemini

In this notebook, I'll use Python to call the Gemini API. To demonstrate its recently added large context input functionality, I'll prompt the model with 10 unique prompts, each paired with a large text input.

In [1]:
import os
import google.generativeai as genai
from typing import Dict, Any
from google.colab import userdata

In [2]:
import html
from IPython.display import HTML, display

## Calling Gemini via its API in Python

To use the Gemini 1.5 model with its LCI capabilities, we need a function that submits requests to 'gemini-1.5-flash'.

In [3]:
def call_gemini_api(prompt: str, model: str = "gemini-1.5-flash") -> Dict[str, Any]:
    """
    Call the Gemini API with a given prompt.

    Args:
    prompt (str): The input prompt for the Gemini model.
    model (str): The Gemini model to use. Defaults to "gemini-1.5-flash".

    Returns:
    Dict[str, Any]: A dictionary containing the API response.

    Raises:
    ValueError: If the API key is not set or the API call fails.
    """
    # Check if the API key is set
    api_key = userdata.get("google_api_key")
    if not api_key:
        raise ValueError("google_api_key secret is not set")

    try:
        # Configure the Gemini API
        genai.configure(api_key=api_key)

        # Set up the model
        model = genai.GenerativeModel(model)

        # Generate content
        response = model.generate_content(prompt)

        # Return the response as a dictionary
        return {
            "generated_text": response.text,
            "safety_ratings": [
                {
                    "category": rating.category,
                    "probability": rating.probability
                } for rating in response.prompt_feedback.safety_ratings
            ],
            "model": model.model_name,
        }

    except Exception as e:
        raise ValueError(f"Error calling Gemini API: {str(e)}")

In [4]:
# Calls the Gemini API, displays the text response in an HTML <pre> block, and returns the text
def print_gemini_response(prompt: str) -> str:
  response = call_gemini_api(prompt)
  unescaped_html = html.unescape(response["generated_text"])
  formatted_html = HTML(f"<pre>{unescaped_html}</pre>")
  display(formatted_html)
  return unescaped_html

In [5]:
print_gemini_response("Quien es tizoc?")

'Tizoc fue un emperador azteca que gobernó entre 1481 y 1486.  \n\n**Algunos puntos clave sobre Tizoc:**\n\n* **Sucedió a Axayácatl como emperador.** \n* **Su reinado fue corto y turbulento.** \n* **Se enfrentó a conflictos internos y externos.** \n* **Fue asesinado por su propio hermano Ahuitzotl, quien luego tomó el trono.** \n\n**Datos interesantes:**\n\n* El nombre "Tizoc" significa "el que tiene rabia". \n* Es conocido por sus esfuerzos para expandir el imperio azteca. \n* Se le atribuye la construcción de un nuevo palacio en Tenochtitlán. \n\n**Información adicional:**\n\nLa información sobre Tizoc es limitada, ya que no se ha encontrado mucha evidencia histórica. Sin embargo, su reinado es un período importante en la historia azteca. \n\n**Si necesitas más información sobre Tizoc, no dudes en pedirla.** \n'
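
The helper above passes the response through `html.unescape` before wrapping it in `<pre>`, so any `<` or `&` in the model's text is treated as markup by the notebook. Below is a minimal sketch of an alternative, assuming the goal is to show the response literally. `to_pre_block` is a hypothetical name and is not part of the original notebook.

```python
import html


def to_pre_block(text: str) -> str:
    """Wrap text in <pre> tags, escaping HTML-special characters first."""
    # html.escape converts &, <, > (and quotes) into entities so the
    # browser shows them as literal characters instead of parsing them.
    return f"<pre>{html.escape(text)}</pre>"


# The comparison operator survives as visible text rather than a broken tag.
print(to_pre_block("a < b"))
```

In Colab, the returned string could be passed to `HTML(...)` and `display(...)` exactly as the notebook's helper does.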

## Gemini Large Context Input with Text
The current function calls the API with a prompt that accepts regular text input. Let's refactor the previous function to be able to submit PDF text files (usually large in memory) and ask Gemini if it is able to discern some insights using this new context.
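
Before sending a large document, it can help to sanity-check that its text is likely to fit in the model's context window. The sketch below uses a rough rule of thumb of about 4 characters per token, which is an assumption and only approximate, and a 1,048,576-token input limit for `gemini-1.5-flash`, which is also an assumption and may change. `estimate_tokens` and `fits_in_context` are hypothetical helper names.

```python
# Assumption: roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4
# Assumption: input token limit for gemini-1.5-flash; check the current docs.
CONTEXT_WINDOW_TOKENS = 1_048_576


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate using ceiling division on the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, limit: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the estimated token count is within the given limit."""
    return estimate_tokens(text) <= limit


sample = "x" * 100
print(estimate_tokens(sample), fits_in_context(sample))
```

For an exact count, the `google.generativeai` SDK exposes `count_tokens` on `GenerativeModel`. The heuristic here only avoids an API call.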

In [6]:
import io
!pip install PyPDF2
import PyPDF2



In [7]:
# To update the previous function we need a utility function to process pdfs into text
def process_pdf(pdf_file: io.BufferedIOBase) -> str:
    """
    Process a PDF file and extract its text content.

    Args:
    pdf_file (io.BufferedIOBase): The PDF as a binary file object, e.g. a file opened with "rb" or an io.BytesIO.

    Returns:
    str: The extracted text content from the PDF.
    """
    pdf_reader = PyPDF2.PdfReader(pdf_file)
    text_content = []
    for page in pdf_reader.pages:
        # Pages with no extractable text (e.g. image-only pages) contribute an empty string
        text_content.append(page.extract_text() or "")
    return "\n".join(text_content)
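
Text extracted from PDFs often carries ragged spacing and runs of blank lines, which add characters to the prompt without adding information. Below is a minimal sketch of a cleanup step that could be applied to the string returned by `process_pdf`. `normalize_pdf_text` is a hypothetical helper and is not part of PyPDF2 or the original notebook.

```python
import re


def normalize_pdf_text(text: str) -> str:
    """Collapse repeated spaces/tabs and consecutive blank lines."""
    # Squeeze horizontal whitespace inside each line and trim its ends.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    cleaned = []
    for line in lines:
        # Keep a blank line only if the previous kept line was not blank.
        if line or (cleaned and cleaned[-1]):
            cleaned.append(line)
    return "\n".join(cleaned).strip()


raw = "first   line \n\n\n\n second\t\tline "
print(normalize_pdf_text(raw))
```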

In [8]:
def lci_gemini_api(prompt: str, pdf_filepath: str, model: str = "gemini-1.5-flash") -> Dict[str, Any]:
    """
    Call the Gemini API using the 1.5-flash model for LCI with a given prompt that queries PDF file content.
    Both the prompt and the pdf_filepath must be provided.

    Args:
    prompt (str): The input prompt for the Gemini model.
    pdf_filepath (str): Path to the PDF file whose extracted text is appended to the prompt.
    model (str): The Gemini model to use. Defaults to "gemini-1.5-flash".

    Returns:
    Dict[str, Any]: A dictionary containing the API response.

    Raises:
    ValueError: If the API key is not set, the prompt or pdf_filepath is missing, or the API call fails.
    """
    # Check if the API key is set
    api_key = userdata.get("google_api_key")
    if not api_key:
        raise ValueError("google_api_key environment variable is not set")

    if prompt is None or pdf_filepath is None:
        raise ValueError("Both prompt and pdf_filepath must be provided")

    try:
        # Configure the Gemini API
        genai.configure(api_key=api_key)

        # Set up the model
        model = genai.GenerativeModel(model)

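        # Start with the prompt, then append the extracted PDF text as context
        content = prompt
        if pdf_filepath.endswith(".pdf"):
            with open(pdf_filepath, "rb") as pdf_file:
                # Newlines keep the prompt, the marker, and the PDF text from running together
                content += "\n\n#### PDF Context ####\n" + process_pdf(pdf_file)
        else:
            # Not a .pdf path: the prompt is sent without any PDF context
            print('Invalid pdf filepath; sending the prompt without PDF context.')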

        # Generate content
        response = model.generate_content(content)

        # Return the response as a dictionary
        return {
            "generated_text": response.text,
            "safety_ratings": [
                {
                    "category": rating.category,
                    "probability": rating.probability
                } for rating in response.prompt_feedback.safety_ratings
            ],
            "model": model.model_name,
        }

    except Exception as e:
        raise ValueError(f"Error calling Gemini API: {str(e)}")

Now that the function accepts a PDF file, extracts its text, and sends it together with a text query, we can call the Large Context Input Gemini 1.5 function with various PDFs and queries.

In [9]:
# Redefine 'print_gemini_response' to use the updated 'lci_gemini_api' function.
# It calls the Gemini API and displays the text response in an HTML <pre> block.
def print_gemini_response(prompt: str, pdf_filepath: str) -> None:
  response = lci_gemini_api(prompt, pdf_filepath)
  unescaped_html = html.unescape(response["generated_text"])
  formatted_html = HTML(f"<pre>{unescaped_html}</pre>")
  display(formatted_html)
  return None

# 10 Example Use-cases

Use-cases 1-5 use a Creative Commons PDF of a storybook titled 'Croak', written by Kavitha Punniyamurthi. Use-cases 6-10 use an open-access PDF of a research paper titled 'Studying human behavior with virtual reality: The Unity Experiment Framework' by Brookes et al.

In [10]:
storybook_gdrive_link = 'https://drive.google.com/file/d/1fRefHIz0j5Hvy1bIRxxMhs7fv4tZVxWc/view?usp=sharing'
research_gdrive_link = 'https://drive.google.com/file/d/1q-4292kbDR-Y9MmLVpc8cg-F2s6sDWGb/view?usp=sharing'

In [11]:
!pip install gdown==4.6.2

import gdown

def download_pdf_from_gdrive(gdrive_link: str, output_filename: str):
  """Downloads a PDF file from Google Drive to /content/ in Colab.

  Args:
    gdrive_link: The shareable link of the PDF file on Google Drive.
    output_filename: The name to save the downloaded file as.
  """

  file_id = gdrive_link.split('/')[-2]
  url = f'https://drive.google.com/uc?id={file_id}'
  output_path = f'/content/{output_filename}'
  gdown.download(url, output_path, quiet=False)

# Download the PDFs to Colab's /content/ directory
download_pdf_from_gdrive(storybook_gdrive_link, 'storybook.pdf')
download_pdf_from_gdrive(research_gdrive_link, 'paper.pdf')




Downloading...
From: https://drive.google.com/uc?id=1fRefHIz0j5Hvy1bIRxxMhs7fv4tZVxWc
To: /content/storybook.pdf
100%|██████████| 1.74M/1.74M [00:00<00:00, 132MB/s]
Downloading...
From: https://drive.google.com/uc?id=1q-4292kbDR-Y9MmLVpc8cg-F2s6sDWGb
To: /content/paper.pdf
100%|██████████| 992k/992k [00:00<00:00, 118MB/s]
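
The download helper above takes the file ID from the second-to-last path segment, which works for links of the form `.../file/d/<id>/view?...`. Below is a minimal sketch of a more tolerant parser, assuming the only other link shape to handle is `...?id=<id>`. `extract_drive_file_id` is a hypothetical helper and is not part of gdown.

```python
import re
from typing import Optional


def extract_drive_file_id(link: str) -> Optional[str]:
    """Return the Google Drive file ID from a share link, or None if not found."""
    # Share links: https://drive.google.com/file/d/<id>/view?usp=sharing
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", link)
    if match:
        return match.group(1)
    # Direct links: https://drive.google.com/uc?id=<id>
    match = re.search(r"[?&]id=([A-Za-z0-9_-]+)", link)
    return match.group(1) if match else None


link = "https://drive.google.com/file/d/1fRefHIz0j5Hvy1bIRxxMhs7fv4tZVxWc/view?usp=sharing"
print(extract_drive_file_id(link))
```

Returning `None` for an unrecognized link lets the caller decide whether to skip the download or raise an error.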


## Use-case 1: To cite the author and publisher of the storybook.

In [12]:
print_gemini_response('Who is the author of this storybook and what is the publisher?', '/content/storybook.pdf')

## Use-case 2: To summarize a kids' storybook.

In [14]:
print_gemini_response('What is this story about? Use 2 sentences maximum to explain the gist!', '/content/storybook.pdf')

## Use-case 3: To get a list of characters from the storybook.

In [15]:
print_gemini_response('What are the main characters in this storybook?', '/content/storybook.pdf')

## Use-case 4: Although there is no stated 'moral' takeaway, what do you think this storybook may teach a child?

In [16]:
print_gemini_response('What do you think this storybook may teach a child?', '/content/storybook.pdf')

## Use-case 5: This storybook had many images that were not present in the text extracted from the PDF. Let's ask Gemini which visuals it would recommend including alongside the book!

In [17]:
print_gemini_response('What kind of visuals would you include alongside this storybook?', '/content/storybook.pdf')

## Use-case 6: Let's have Gemini provide the citation for this paper in MLA format.

In [18]:
print_gemini_response('What is the citation for this paper in MLA format?', '/content/paper.pdf')

## Use-case 7: Let's ask if this paper provides code examples.

In [19]:
print_gemini_response('Does this paper provide code examples?', '/content/paper.pdf')

## Use-case 8: Although code examples are very sparse, Gemini told us the paper discusses the Unity Experiment Framework (UXF) and mentions a 'two-block, ten trial' experiment. Let's ask it to list any other experiments that can be run with this framework.

In [20]:
print_gemini_response('What other experiments can be run with the UXF framework?', '/content/paper.pdf')

## Use-case 9: The previous question gave us a lot of information about the paper's main topic. Let's ask whether the paper's authors experimented with this framework themselves and, if so, what the experiment's topic was.

In [21]:
print_gemini_response('Did the authors of this paper experiment with the UXF framework? If so, what kind of experiment did they run?', '/content/paper.pdf')

## Use-case 10: Now that we understand the gist of the paper, let's ask whether the authors mentioned any roadblocks or potential future research related to UXF.

In [22]:
print_gemini_response('What are the roadblocks or potential future research related to UXF?', '/content/paper.pdf')