# Intro to GCP's Gemini 

## Overview

**Gemini** is a multimodal model from Google that can **summarize, chat, and generate text from images or videos**. This tutorial covers two model versions, **Gemini Pro** and **Gemini Pro Vision**, and uses both through Python packages and through GCP's model playground, **Vertex AI Studio**.

## Learning Objectives
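+ Learn how to interact with Gemini as a chatbot from a Jupyter notebook
+ Learn how to summarize documents and generate text from images and videos with Gemini
+ Learn how to use Gemini in Vertex AI Studio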

## Prerequisites
+ You need access to Vertex AI

## Install Packages

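Install or upgrade the `google-cloud-aiplatform`, `langchain`, and `langchain-community` packages, plus `pypdf`, which the `PyPDFLoader` used later in this tutorial depends on.

In [2]:
! pip install --upgrade google-cloud-aiplatform langchain langchain-community pypdf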


Next we initialize Vertex AI by setting our project ID and location. We also import the following classes:
- **GenerativeModel:** Lets us specify and launch the Gemini model we need (e.g. Gemini Pro, Gemini Pro Vision).
- **ChatSession:** Holds a multi-turn conversation with Gemini Pro (chatbot mode).
- **Part:** Loads files from buckets.
- **Image:** Loads local image files.
- **GenerationConfig:** Lets us configure the model's temperature, top p, top k, and max output tokens.

In [None]:
from google.cloud import aiplatform
import vertexai
from vertexai.generative_models import GenerativeModel, Image, ChatSession, Part, GenerationConfig

import json
# Load env.json
try:
    with open("env.json") as f:
        config = json.load(f)
except FileNotFoundError:
    config = {}


# Assign parameters from Env.
project_id = config.get("NOTEBOOK_GCP_PROJECT_ID")
location = config.get("NOTEBOOK_GCP_LOCATION")


params = globals().get('parameters', {})
pid = params.get('NOTEBOOK_GCP_PROJECT_ID')
print(f"My PID: {pid}")

print("Checking parameters from ENV")
print(project_id)
print(location)

# TODO( FOR developer): If not defined in ENV earlier, uncomment and add it below
#project_id = "<PROJECT_ID>"
#location = "<LOCATION>" #(e.g., us-central1)

vertexai.init(project=project_id, location=location)

Checking parameters from ENV
None
None
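
The output above shows both values as `None`, so `vertexai.init` receives no explicit project or location. A minimal sketch of a stricter lookup follows. The helper names `load_config` and `resolve_setting`, and the fallback order (env.json first, then an environment variable of the same name), are assumptions for illustration and are not part of the original notebook.

```python
import json
import os


def load_config(path: str = "env.json") -> dict:
    """Return the parsed JSON config, or an empty dict if the file is missing."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def resolve_setting(name: str, config: dict) -> str:
    """Look up `name` in the config, then in os.environ; fail loudly if absent."""
    value = config.get(name) or os.environ.get(name)
    if not value:
        raise ValueError(f"{name} is not set in env.json or the environment")
    return value
```

With this in place, `project_id = resolve_setting("NOTEBOOK_GCP_PROJECT_ID", load_config())` raises a clear error instead of silently passing `None` along.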


## Get Started

### Gemini as a Chatbot

For text, code generation, and natural language tasks we can use the **gemini-pro** model. To put the model in **chatbot mode** we call the `start_chat()` function. Below we also define a function named **get_chat_response**, which sends our prompt to the model with `send_message()` and returns only the text of the chat's response.

In [None]:
model = GenerativeModel("gemini-pro")
chat = model.start_chat()

def get_chat_response(chat: ChatSession, prompt: str):
    response = chat.send_message(prompt)
    return response.text
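
Since a chat session spans several turns, it can be handy to keep your own record of what was asked and answered. The sketch below uses a hypothetical helper, `ask_and_record`, which is not part of the Vertex AI SDK. It assumes only that the chat object has a `send_message()` method whose result exposes `.text`, as `get_chat_response` above does. `FakeChat` and `FakeResponse` are stand-ins so the example runs without credentials.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FakeResponse:
    """Stand-in for a model response (assumption, not SDK code)."""
    text: str


class FakeChat:
    """Stand-in for a ChatSession so the sketch runs offline."""

    def send_message(self, prompt: str) -> FakeResponse:
        return FakeResponse(text=f"echo: {prompt}")


def ask_and_record(chat, prompt: str, transcript: List[Tuple[str, str]]) -> str:
    """Send `prompt` via chat.send_message and append (prompt, reply) to `transcript`."""
    response = chat.send_message(prompt)
    transcript.append((prompt, response.text))
    return response.text


transcript: List[Tuple[str, str]] = []
ask_and_record(FakeChat(), "Hello.", transcript)
```

In the notebook you would pass the real `chat` object in place of `FakeChat()`.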

Now that we have our function, let's ask our Gemini chatbot some questions!

In [None]:
prompt = "Hello."
print(get_chat_response(chat, prompt))

In [None]:
prompt = "List gen ai use cases that are Life Science or Health Care related. "
print(get_chat_response(chat, prompt))

We can even ask it to **generate code or debug code**!

In [None]:
prompt = "create a python code that will replace all null values to zero within a csv file"
print(get_chat_response(chat, prompt))

### Gemini as a Summarizer

We can also generate text by asking Gemini Pro to summarize articles that we provide locally (using LangChain). As of now, Gemini does not directly support loading documents other than videos and images.

First we will load a file using LangChain's PDF loader, `PyPDFLoader`. You can also use LangChain to load files from your bucket by following the instructions [here](https://python.langchain.com/docs/integrations/document_loaders/google_cloud_storage_file).

In [None]:
#download the article
!wget --user-agent "Chrome" https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10954554/pdf/41586_2024_Article_7159.pdf

In [None]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("41586_2024_Article_7159.pdf")
ex_file=loader.load()

We can tune the model's output by setting the parameters below:
- **Max_Output_Tokens:** Maximum number of tokens (not words) to generate.
- **Temperature:** Controls randomness. Higher values produce more diverse, less predictable responses; lower values make the output more deterministic. Must be a number from 0 to 1.
- **Top_p (nucleus):** The cumulative probability cutoff for token selection. Lower values mean sampling from a smaller, more top-weighted nucleus. Must be a number from 0 to 1.
- **Top_k:** Sample from the k most likely next tokens at each step. Lower values restrict the model to the most probable tokens, giving more focused output; higher values allow less likely tokens and more varied output.
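
To make these parameters concrete, here is a toy sketch of how temperature, top-k, and top-p narrow a set of candidate tokens. It is an illustration only: the function name `filter_candidates` and the example scores are made up, and the order in which a real model applies these steps may differ from the order used here.

```python
import math
from typing import Dict


def filter_candidates(
    logits: Dict[str, float], temperature: float, top_k: int, top_p: float
) -> Dict[str, float]:
    """Apply temperature, then top-k, then top-p; return renormalized probabilities.

    `temperature` must be greater than 0 in this sketch.
    """
    # Temperature rescales the scores: small values sharpen the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(score - peak) for tok, score in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept = []
    cumulative = 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}
```

Calling `filter_candidates({"a": 2.0, "b": 1.0, "c": 0.0}, temperature=1.0, top_k=2, top_p=1.0)` leaves only `a` and `b` as candidates, while lowering `top_p` to 0.5 leaves `a` alone.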


In [None]:
generation_config = GenerationConfig(
    temperature=0.9,
    top_p=1.0,
    top_k=32,
    candidate_count=1,
    max_output_tokens=8192,
)

def summarizer(file: str) -> str:
        
    # Query the model
    response = model.generate_content(
        [
            # Add an example query
            "summarize this file.",
            file
        ],
        generation_config=generation_config,
    )
    #print(response)
    return response.text

Here we pass in only the page content of the first page (`ex_file[0]`) returned by our document loader.

In [None]:
print(summarizer(ex_file[0].page_content))
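
The call above summarizes a single page. If you want to send more of the article, one option is to concatenate the text of several pages up to a size limit. The sketch below assumes each loaded page exposes a `page_content` string, as used above; the helper name `join_pages` and the default `max_chars` value are arbitrary choices for illustration, not a limit taken from the Gemini documentation.

```python
from typing import Iterable


def join_pages(pages: Iterable, max_chars: int = 20000) -> str:
    """Concatenate page_content from each page until max_chars of page text is collected."""
    parts = []
    used = 0
    for page in pages:
        remaining = max_chars - used
        if remaining <= 0:
            break
        # Truncate the last page so the total page text stays within the budget.
        piece = page.page_content[:remaining]
        parts.append(piece)
        used += len(piece)
    # Blank lines between pages are separators and are not counted in the budget.
    return "\n\n".join(parts)
```

You could then call `print(summarizer(join_pages(ex_file)))` to summarize more than the first page.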

### Gemini as an Image to Text Generator

Gemini Pro Vision can generate text from images and videos. The text can describe the image or video, or answer questions about it. You can use a local image or retrieve one from your bucket.

Images can only be in the following formats: 
- PNG - image/png
- JPEG - image/jpeg

The function below takes an image path and a prompt. It includes an if statement that decides whether to use `Image` to load a local image or `Part` to load one from a bucket.

In [None]:
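def img2text(image_path: str, img_prompt: str) -> str:
    multimodal_model = GenerativeModel("gemini-pro-vision")
    # Bucket URIs go through Part; anything else is treated as a local file.
    if image_path.startswith("gs://"):
        # Note: the MIME type is hard-coded, so bucket images are assumed to be JPEGs.
        image1 = Part.from_uri(image_path, mime_type="image/jpeg")
    else:
        image1 = Image.load_from_file(image_path)

    responses = multimodal_model.generate_content(
        [image1, img_prompt],
        generation_config={
            "max_output_tokens": 2048,
            "temperature": 0.4,
            "top_p": 1,
            "top_k": 32
        },
        stream=True,
    )
    # Collect the streamed chunks and return the full text, so that
    # print(img2text(...)) shows the response instead of None.
    return "".join(response.text for response in responses)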
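
The function above always labels bucket images as `image/jpeg`. A small lookup keyed on the file extension is one way to pick the MIME type instead. This is a sketch: the helper name `guess_mime_type` is made up, and the table covers only the image and video formats this tutorial lists (MPEGPS is left out because its file extension is not given here).

```python
import posixpath

# Extension-to-MIME-type table built from the formats listed in this tutorial.
SUPPORTED_MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".mov": "video/mov",
    ".mpeg": "video/mpeg",
    ".mp4": "video/mp4",
    ".mpg": "video/mpg",
    ".avi": "video/avi",
    ".wmv": "video/wmv",
    ".flv": "video/flv",
}


def guess_mime_type(path: str) -> str:
    """Return the MIME type for a local path or gs:// URI based on its extension."""
    ext = posixpath.splitext(path)[1].lower()
    if ext not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported file extension: {ext!r}")
    return SUPPORTED_MIME_TYPES[ext]
```

The result could be passed as the `mime_type` argument of `Part.from_uri` in place of the hard-coded string.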

Let's look at a local image first. We start by downloading an image of a COVID virus from the [CDC Public Health Image Library](https://phil.cdc.gov/details.aspx?pid=23312).

In [None]:
! wget -O example_image_covid.jpg "https://phil.cdc.gov//PHIL_Images/23312/23312_lores.jpg" 

Now run our function!

In [None]:
print(img2text("example_image_covid.jpg", "describe this image."))

Next we'll look at an image from a bucket.

In [None]:
print(img2text("gs://generativeai-downloads/images/scones.jpg", "describe this image."))

We can even ask for more details related to the items in our image!

In [None]:
img_prompt="How do you make whats in this image?"
image="gs://generativeai-downloads/images/scones.jpg"
print(img2text(image, img_prompt))

### Gemini as a Video to Text Generator

As with images, we will use the Gemini Pro Vision model. Videos can be loaded locally or from a bucket, just like images. Video files can only be in the following formats:
- MOV - video/mov
- MPEG - video/mpeg
- MP4 - video/mp4
- MPG - video/mpg
- AVI - video/avi
- WMV - video/wmv
- MPEGPS - video/mpegps
- FLV - video/flv

The function below takes the location of a video file in a public bucket and a prompt.

In [None]:
def video2text(video_path: str, video_prompt: str) -> str:
    # Query the model
    multimodal_model = GenerativeModel("gemini-pro-vision")
    response = multimodal_model.generate_content(
        [
            # Add an example video
            Part.from_uri(
                video_path, mime_type="video/mp4"
            ),
            # Add an example query
            video_prompt,
        ],
        stream=True
    )
    # Join every streamed chunk; returning inside the loop would stop after the first one.
    return "".join(chunk.text for chunk in response)


Run the function!

In [None]:
video_prompt = "What is this video about in detail?"
video = "gs://cloud-samples-data/video/Machine Learning Solving Problems Big, Small, and Prickly.mp4"
print(video2text(video, video_prompt))

## Gemini on Vertex AI Studio

You can also use Gemini Pro and Gemini Pro Vision in Vertex AI's playground, **Vertex AI Studio**. To find it, search for Vertex AI and locate Vertex AI Studio in the left-hand menu, as the image below shows. To use Gemini Pro Vision, click **Multimodal**. You can then use your own prompt or explore the preset prompts, such as extracting text from images or image question answering.

![Gemini1](../../images/Gemini_1.png)

For this tutorial we will select Open on the **Prompt Design** option. We upload the COVID image we downloaded earlier by clicking **INSERT MEDIA** and selecting our file. Then we ask a question about it; here we asked "Describe treatments for the item in this image".

![Gemini3](../../images/Gemini_3.png)

To use Gemini Pro, click **Language** in the left-side menu. You can choose between a prompt and a chat, and between a focus on text or on code.

![Gemini2](../../images/Gemini_2.png)

Here we picked the **TEXT CHAT** option and asked the bot to describe COVID and how it works.

![Gemini4](../../images/Gemini_4.png)