# PART A: AN INTRO TO GEMINI API FOR TEXT GENERATION & CHAT


RAG stands for Retrieval-Augmented Generation. It's a technique that combines large language models (LLMs) with external knowledge sources to improve the accuracy and reliability of AI-generated text.

## How Does RAG Work? Unveiling the Power of External Knowledge

Before the core RAG process can run, two foundations must be in place:

* **Building the Knowledge Base:** The system starts by transforming documents and information within the external knowledge base (like Wikipedia or a company database) into a special format called **vector representations**. These condense the meaning of each document into a series of **numbers**, capturing the essence of the content.

* **Vector Database for Speedy Retrieval**: These vector representations are then stored in a specialized database called a vector database. This database is optimized for efficiently **searching and retrieving** information based on **semantic similarity**. Imagine it as a super-powered library catalog that **understands the meaning** of documents, **not just keywords**.
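
The two foundation steps above can be sketched in a few lines of plain Python. This is only a toy illustration under loud assumptions: `embed`, `cosine_similarity`, and `retrieve` are hypothetical helpers, word counts stand in for a learned embedding model, and a Python list stands in for a vector database. Because word counts are still keyword-based, only the mechanics (turn text into numbers, then rank by similarity) carry over to a real system.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'vector representation': word counts (real systems use a learned embedding model)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

documents = [
    "an asteroid impact caused the extinction of the dinosaurs",
    "the company database stores customer orders",
]

# The "vector database": each document stored next to its vector
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query, k=1):
    """Return the k documents whose vectors are most similar to the query vector."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda pair: cosine_similarity(query_vector, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

top = retrieve("what caused the extinction of the dinosaurs")
print(top)
```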

Now, let's explore how RAG leverages this foundation:

* **User Input**: The RAG process begins with a question or **prompt** from the user. This could be anything from "What caused the extinction of the dinosaurs?" to a more open-ended request like "Write a creative story."

* **Intelligent Retrieval**: RAG doesn't rely solely on the **LLM's internal knowledge**. It employs an information retrieval component that acts like a super-powered search engine. This component scans the vast external knowledge base – like a company's internal database for specific domains – to find information **directly relevant** to the user's input. Unlike a traditional **search engine** that relies on **keywords**, RAG leverages the power of vector representations to understand the **semantic meaning** of the user's prompt and identify the most relevant documents.

* **Enriched Context Creation**: The retrieved information isn't just shown alongside the prompt. RAG cleverly **merges the user input with the relevant snippets** from the knowledge base. This creates a ***richer context*** for the LLM to understand the **user's intent** and formulate a well-informed response.

* **LLM Powered Response Generation**: Finally, the **enriched context** is fed to the Large Language Model (LLM). The LLM, along with its ability to process language patterns, now has a strong **foundation of factual** information to draw upon. This empowers it to generate a response that is both comprehensive and accurate, addressing the specific needs of the user's prompt.
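
The four steps above can be sketched as a single function. This is a minimal sketch, not a definitive implementation: `build_enriched_prompt`, `answer`, `fake_retriever`, and `fake_llm` are hypothetical names, and the retriever and LLM are stand-in functions so the flow can be run without any API key.

```python
def build_enriched_prompt(question, snippets):
    """Merge the user input with the retrieved snippets into one prompt."""
    context = "\n".join(f"- {snippet}" for snippet in snippets)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

def answer(question, retriever, llm):
    """User input -> retrieval -> enriched context -> LLM response."""
    snippets = retriever(question)
    prompt = build_enriched_prompt(question, snippets)
    return llm(prompt)

def fake_retriever(question):
    # Stand-in for a vector database lookup
    return ["An asteroid impact is widely cited as a cause of the dinosaurs' extinction."]

def fake_llm(prompt):
    # Stand-in for a real model call
    return f"(model reply based on a prompt of {len(prompt)} characters)"

enriched = build_enriched_prompt("What caused the extinction of the dinosaurs?",
                                 fake_retriever("dinosaurs"))
reply = answer("What caused the extinction of the dinosaurs?", fake_retriever, fake_llm)
print(enriched)
print(reply)
```

In a real pipeline, `fake_llm` would be replaced by a call to a model such as Gemini, and `fake_retriever` by a vector database query.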

In this part, we will learn how to connect to an LLM and generate text using the Google Gemini API.

https://ai.google.dev/gemini-api/docs

## CONTENT
* The Python SDK for the Gemini API
* Check the Google LLM Models available via the provided API
* Interact with the models using 2 Alternative Interfaces
  1. Generate text interface
  2. Interact with the models using Multi-turn conversations (chat) interface

* Understand Model & Chat objects
  * Model Object in detail
  * System Prompt in the Gemini API
  * Chat Object in detail
* Chat using system_instruction: ***A Manual RAG?***
* How Many Tokens --> How much does it cost?
* Build a simple Interface with Gradio


## Install the Python SDK

* The Python SDK for the Gemini API is contained in the [`google-generativeai`](https://pypi.org/project/google-generativeai/) package. Install the dependency using pip:

In [1]:
!pip install -q -U google-generativeai

In [2]:
#import numpy as np
#from tqdm import tqdm
#import pathlib
import os
import textwrap
import google.generativeai as genai
from IPython.display import display
from IPython.display import Markdown



* The **to_markdown** function converts plain text from the LLM model to Markdown format, adding blockquote styling and converting bullet points.

In [3]:
def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))
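
To see what `to_markdown` does to the text before it is wrapped in `Markdown`, the string handling can be run on its own. This small sketch uses only the standard library; the sample string `raw` is made up for illustration. It also shows why the `predicate=lambda _: True` argument matters: by default `textwrap.indent` skips whitespace-only lines, which would split the blockquote in two.

```python
import textwrap

raw = "First point\n\n• second point"

# Same two steps as to_markdown, without the IPython Markdown wrapper
converted = raw.replace("•", "  *")
quoted = textwrap.indent(converted, "> ", predicate=lambda _: True)

# For comparison: the default predicate leaves the blank line unprefixed
default_quoted = textwrap.indent(converted, "> ")

print(quoted)
```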

In [4]:
from dotenv import load_dotenv
import os
load_dotenv()
api_key = os.getenv("GOOGLE_API_KEY")
genai.configure(api_key=api_key)
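
`os.getenv` returns `None` when the variable is missing, and the resulting failure may only surface later, at the first request. A small guard can fail fast instead. This is an optional sketch; `require_api_key` is a hypothetical helper, not part of the SDK.

```python
import os

def require_api_key(var_name="GOOGLE_API_KEY"):
    """Return the API key from the environment, or raise a clear error if it is missing."""
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; add it to your .env file or environment."
        )
    return key
```

It could be used as `genai.configure(api_key=require_api_key())` after `load_dotenv()` has run.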

## Check the Google LLM Models available via the provided API

* You can see the names of the available models as follows:

In [5]:
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

models/gemini-1.0-pro-latest
models/gemini-1.0-pro
models/gemini-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro
models/gemini-1.5-pro-exp-0801
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash
models/gemini-1.5-flash-001-tuning


* You can see the details of the models as follows:

In [6]:
models = [m for m in genai.list_models()]
models

[Model(name='models/chat-bison-001',
       base_model_id='',
       version='001',
       display_name='PaLM 2 Chat (Legacy)',
       description='A legacy text-only model optimized for chat conversations',
       input_token_limit=4096,
       output_token_limit=1024,
       supported_generation_methods=['generateMessage', 'countMessageTokens'],
       temperature=0.25,
       max_temperature=None,
       top_p=0.95,
       top_k=40),
 Model(name='models/text-bison-001',
       base_model_id='',
       version='001',
       display_name='PaLM 2 (Legacy)',
       description='A legacy model that understands text and generates text as an output',
       input_token_limit=8196,
       output_token_limit=1024,
       supported_generation_methods=['generateText', 'countTextTokens', 'createTunedTextModel'],
       temperature=0.7,
       max_temperature=None,
       top_p=0.95,
       top_k=40),
 Model(name='models/embedding-gecko-001',
       base_model_id='',
       version='001',
      

## Interact with the models using 2 Alternative Interfaces

1. Generate text
2. Multi-turn conversations (chat)

## 1. Generate text interface

In the simplest case, you can pass a prompt string to the GenerativeModel.generate_content method:

In [7]:
model = genai.GenerativeModel('gemini-1.5-flash-latest')
response = model.generate_content("How many different ways to access a model in Gemini API?")
to_markdown(response.text)


> There is **one** primary way to access models in the Gemini API, which is through **REST API calls**.
> 
> However, within this method, there are several variations and parameters you can use to customize your requests:
> 
> * **Different Endpoints:** Depending on your desired task, you can choose from various endpoints for generating text, translating languages, writing different kinds of creative content, and more.
> * **Model Selection:** You have the option to specify the desired Gemini model (e.g., Gemini Pro, Gemini Ultra) within your API calls.
> * **Request Parameters:** You can tailor your requests with parameters like temperature, top_k, top_p, and others to control the output's creativity, randomness, and more.
> * **Fine-tuning Options:** Although not yet available for Gemini, the API might eventually offer features for fine-tuning models for specific tasks or domains.
> 
> Therefore, while there's only **one** fundamental access method (REST API), the possibilities within this method are vast and allow you to interact with the models in diverse ways. 


In [8]:
response

response:
GenerateContentResponse(
    done=True,
    iterator=None,
    result=protos.GenerateContentResponse({
      "candidates": [
        {
          "content": {
            "parts": [
              {
                "text": "There is **one** primary way to access models in the Gemini API, which is through **REST API calls**.\n\nHowever, within this method, there are several variations and parameters you can use to customize your requests:\n\n* **Different Endpoints:** Depending on your desired task, you can choose from various endpoints for generating text, translating languages, writing different kinds of creative content, and more.\n* **Model Selection:** You have the option to specify the desired Gemini model (e.g., Gemini Pro, Gemini Ultra) within your API calls.\n* **Request Parameters:** You can tailor your requests with parameters like temperature, top_k, top_p, and others to control the output's creativity, randomness, and more.\n* **Fine-tuning Options:** Although not y

In [9]:
response = model.generate_content("Which API did I ask you?")
to_markdown(response.text)

> Please provide me with the context of our conversation. I need to know what you asked me before I can tell you what API you used. For example, did you ask me to:
> 
> * Generate text?
> * Translate a language?
> * Summarize a text?
> * Answer a question? 
> 
> Once you provide me with this information, I can tell you which API I used to fulfill your request. 


## 2. Interact with the models using Multi-turn conversations (chat) interface

* This code snippet initializes a Gemini model and starts a chat session with an empty conversation history.

In [10]:
model = genai.GenerativeModel('gemini-1.5-flash-latest')
#response = model.generate_content("How many different ways to access a model in the Gemini API?")
chat = model.start_chat(history=[])
response = chat.send_message("How many different ways to access a model in the Gemini API?")
to_markdown(response.text)


> Unfortunately, there isn't a single, definitive answer to how many ways you can access a Gemini model through the API. This is because Google hasn't publicly released specific details about the Gemini API, including its structure and available methods. 
> 
> **However, we can speculate based on how Google typically handles APIs and common practices:**
> 
> * **REST API:** This is the most common approach for APIs, offering a standardized way to interact with resources using HTTP methods (GET, POST, PUT, DELETE, etc.).
> * **gRPC:** Google often uses gRPC for efficient communication between services. This would provide high-performance access to the Gemini models.
> * **Cloud Functions:** Google might offer integration with Cloud Functions, enabling you to directly trigger model requests from your own serverless code.
> * **SDKs:**  Google might provide official SDKs for various programming languages, simplifying interaction with the API.
> 
> **In summary, the number of ways to access a Gemini model through the API is likely to be multiple, leveraging different techniques for different use cases and user preferences. However, until Google officially releases the Gemini API documentation, the exact methods and their details remain unknown.**
> 
> **Keep an eye out for official announcements from Google regarding the Gemini API release, as that will provide the most accurate and up-to-date information.** 


In [11]:
#response = model.generate_content("Which API did I ask you?")
response =chat.send_message("Which API did I ask you?")
to_markdown(response.text)

> You asked me about the **Gemini API**.  
> 
> This is the API for accessing Google's Gemini models, which are a powerful family of large language models known for their advanced capabilities. 
> 
> Did you have a specific question about the Gemini API, or were you curious about how many different ways you might interact with it? 
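
The chat session could answer the follow-up because it keeps the conversation history for you. With `generate_content` you would have to keep that history yourself. The sketch below shows one way to do the bookkeeping; `add_turn` is a hypothetical helper, and the model reply is a placeholder string so the block runs without an API key. The list-of-dicts layout with `role` and `parts` follows the message format the Python SDK accepts for multi-turn input.

```python
def add_turn(history, role, text):
    """Append one conversation turn in the role/parts layout."""
    history.append({"role": role, "parts": [text]})
    return history

history = []
add_turn(history, "user", "How many different ways to access a model in the Gemini API?")
add_turn(history, "model", "(the first reply text would be stored here)")
add_turn(history, "user", "Which API did I ask you?")

# With a configured model, the whole list would be sent in a single call:
# response = model.generate_content(history)
print(history)
```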


## Understand Model & Chat objects

* Let's check the created **model** object first, and then the **chat** object:

In [12]:
model

genai.GenerativeModel(
    model_name='models/gemini-1.5-flash-latest',
    generation_config={},
    safety_settings={},
    tools=None,
    system_instruction=None,
    cached_content=None
)

* genai.GenerativeModel(...): This creates a Gemini model object for interacting with the API.

* model_name='models/gemini-1.5-flash-latest': This specifies which Gemini model version to use. Here, it's "gemini-1.5-flash-latest", an alias for the most recent release of the Gemini 1.5 Flash model.
* generation_config={}: This is a dictionary for customizing how the model generates text. The empty braces {} mean you're using default generation settings.
* safety_settings={}: This is for configuring safety features, like preventing harmful or inappropriate responses. Empty braces again mean you're using default settings.
* tools=None: This part is for integrating external tools with the model (e.g., accessing information from a database). Since it's None, no external tools are being used.
* **system_instruction=None:** This is the Gemini API's counterpart to a "system prompt" in other models: an instruction that steers the model's behavior for the whole conversation. Since it's None, no system instruction is set.
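
As a hedged sketch of how these parameters can be filled in, the block below defines a non-empty `generation_config`. The keys `temperature`, `top_p`, and `max_output_tokens` are generation settings of the Gemini API; the values are illustrative only, and `build_model` is a hypothetical helper. The SDK import sits inside the function so that the configuration itself can be inspected without the package installed.

```python
# Illustrative values, not recommendations
generation_config = {
    "temperature": 0.2,        # lower values make the output less random
    "top_p": 0.95,             # nucleus sampling threshold
    "max_output_tokens": 256,  # upper bound on the length of the reply
}

def build_model(model_name="gemini-1.5-flash-latest", system_instruction=None):
    """Create a GenerativeModel with the settings above (requires google-generativeai)."""
    import google.generativeai as genai  # imported lazily; needs the SDK and a configured API key
    return genai.GenerativeModel(
        model_name,
        generation_config=generation_config,
        system_instruction=system_instruction,
    )
```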

## System Prompt in the Gemini API:

**What is a System Prompt?**

In models like ChatGPT, a system prompt is a special instruction provided at the start of a conversation. It helps define the persona, tone, or overall purpose of the model's responses.


The Gemini API does not use the term "system prompt". Its counterpart is the **system_instruction** parameter of `GenerativeModel`, which we set and test below.


**How Gemini API Works**

Every request carries the instructions and context you provide through the API call, and the model generates its response from them. A system instruction is set once on the model object rather than repeated in each message.

**Alternatives for Defining Behavior:**

Besides system_instruction, you can shape the model's behavior through these methods:
* Prompt Engineering: Craft your API requests with clear and concise instructions, including desired tone, format, or limitations.
* Contextualization: Provide relevant information and examples within your API call to guide Gemini's responses.
* Model Variants: Gemini API offers various model sizes. Choosing a specific size might align with your desired behavior (e.g., a larger model for more comprehensive responses).

In [13]:
system_prompt= """ As an attentive and supportive academic assistant,
           your task is to provide assistance based solely on the provided
           excerpts. I will provide you the question and related text.
           Answer the following questions, ensuring your responses
           are derived exclusively from the provided partial texts.
           If the answer cannot be found within the provided excerpts,
           kindly respond with 'I don't know'.
           After answering each question, please provide a detailed
           explanation, breaking down the answer step by step and relating
           it to the provided excerpts.
           If you are ready, I will provide you the question and related text.
        """

In [14]:
model = genai.GenerativeModel('gemini-1.5-flash-latest', system_instruction=system_prompt)
chat = model.start_chat(history=[])

In [15]:
model

genai.GenerativeModel(
    model_name='models/gemini-1.5-flash-latest',
    generation_config={},
    safety_settings={},
    tools=None,
    system_instruction=" As an attentive and supportive academic assistant,\n           your task is to provide assistance based solely on the provided\n           excerpts. I will provide you the question and related text.\n           Answer the following questions, ensuring your responses\n           are derived exclusively from the provided partial texts.\n           If the answer cannot be found within the provided excerpts,\n           kindly respond with 'I don't know'.\n           After answering each question, please provide a detailed\n           explanation, breaking down the answer step by step and relating\n           it to the provided excerpts.\n           If you are ready, I will provide you the question and related text.\n        ",
    cached_content=None
)

## Does system_instruction work as system_prompt?

Let's check:
* This code snippet interacts with the Gemini chat session we initiated above:
1. Sends your question/prompt to the Gemini chat.
2. Formats Gemini's response into Markdown for cleaner display.

In [16]:
prompt="What is your task? "
response = chat.send_message(prompt)
to_markdown(response.text)

> My task is to act as an attentive and supportive academic assistant. I will use the provided text excerpts to answer your questions.  
> 
> * I will only use the information in the given text excerpts to answer your questions.
> * If the answer cannot be found in the provided excerpts, I will respond with "I don't know." 
> * I will provide a detailed explanation for each answer, showing how I arrived at the answer based on the text excerpts.
> 
> Please provide me with the question and related text. I am ready to assist you. 


## Chat Object in detail
* Let's observe the **chat** object:

In [17]:
chat

ChatSession(
    model=genai.GenerativeModel(
        model_name='models/gemini-1.5-flash-latest',
        generation_config={},
        safety_settings={},
        tools=None,
        system_instruction=" As an attentive and supportive academic assistant,\n           your task is to provide assistance based solely on the provided\n           excerpts. I will provide you the question and related text.\n           Answer the following questions, ensuring your responses\n           are derived exclusively from the provided partial texts.\n           If the answer cannot be found within the provided excerpts,\n           kindly respond with 'I don't know'.\n           After answering each question, please provide a detailed\n           explanation, breaking down the answer step by step and relating\n           it to the provided excerpts.\n           If you are ready, I will provide you the question and related text.\n        ",
        cached_content=None
    ),
    history=[protos.Conte

* We can access the chat history:

In [18]:
chat.history

[parts {
   text: "What is your task? "
 }
 role: "user",
 parts {
   text: "My task is to act as an attentive and supportive academic assistant. I will use the provided text excerpts to answer your questions.  \n\n* I will only use the information in the given text excerpts to answer your questions.\n* If the answer cannot be found in the provided excerpts, I will respond with \"I don\'t know.\" \n* I will provide a detailed explanation for each answer, showing how I arrived at the answer based on the text excerpts.\n\nPlease provide me with the question and related text. I am ready to assist you. \n"
 }
 role: "model"]

* Let's see the **chat history** in a more readable format:

In [19]:
def printChatHistory():
  for message in chat.history:
    display(to_markdown(f'**{message.role}**: {message.parts[0].text}'))
    display('_'*80)


In [20]:
printChatHistory()

> **user**: What is your task? 

'________________________________________________________________________________'

> **model**: My task is to act as an attentive and supportive academic assistant. I will use the provided text excerpts to answer your questions.  
> 
> * I will only use the information in the given text excerpts to answer your questions.
> * If the answer cannot be found in the provided excerpts, I will respond with "I don't know." 
> * I will provide a detailed explanation for each answer, showing how I arrived at the answer based on the text excerpts.
> 
> Please provide me with the question and related text. I am ready to assist you. 


'________________________________________________________________________________'

# Let's chat according to the system_instruction

* Remember the system_instruction
* Our aim is to build a **RAG** pipeline for the future tutorials
* Therefore, here, we provide some text and a question related to the text

In [21]:
%%time
question= "What is the difference between chat and generate context?"
excerpt= """ Gemini enables you to have freeform conversations across
multiple turns. The ChatSession class simplifies the process
by managing the state of the conversation, so unlike with
generate_content, you do not have to store the conversation
history as a list.
"""
prompt = question + excerpt
response = chat.send_message(prompt)

to_markdown(response.text)

CPU times: total: 0 ns
Wall time: 1.65 s


> The provided text highlights the difference between the "chat" and "generate_content" functionalities within the Gemini framework. Here's a breakdown:
> 
> * **Chat:** This functionality allows for free-flowing conversations across multiple turns. The `ChatSession` class handles the conversation's state, meaning you don't need to manually manage the history of the conversation. This makes it easier to have natural, back-and-forth exchanges. 
> * **Generate Content:** This functionality seems to refer to a more static form of content creation. The text implies that with `generate_content`, you would need to store the conversation history as a list, suggesting that it doesn't maintain a conversational state. 
> 
> **In summary:**
> 
> The key difference is in how the conversation history is managed.  `Chat` uses the `ChatSession` class to automatically track the conversation state, making it suitable for multi-turn, dynamic interactions. On the other hand, `generate_content` appears to require manual management of conversation history, indicating a more static approach to text generation. 


In [22]:
%%time
question= "What is the difference between chat and generate context?"
excerpt= """ The generate_content method can handle a wide variety
of use cases, including multi-turn chat and multimodal input,
depending on what the underlying model supports. The available
models only support text and images as input, and text as output.
In the simplest case, you can pass a prompt string to the
GenerativeModel.generate_content method:
"""
prompt = question + excerpt
response = chat.send_message(prompt)

to_markdown(response.text)

CPU times: total: 31.2 ms
Wall time: 1.61 s


> The provided text doesn't directly compare "chat" and "generate_content". It focuses on the capabilities of the `generate_content` method. Here's what we can infer:
> 
> * **`generate_content` is versatile:** It can handle various tasks, including multi-turn chat and even processing multimodal inputs (like text and images).  However, this depends on the capabilities of the underlying model.
> * **Current limitations:** The available models only support text and image inputs, and text outputs. This implies that while `generate_content` *could* handle chat, the current models might not be fully optimized for it. 
> 
> **In summary:**
> 
> We can't definitively say how "chat" and "generate_content" differ based on this text. However, it shows that `generate_content` is a flexible method for generating content, including potential chat functionality, but its current implementation is limited to text and image input/output.  We need more information to understand the specific differences between "chat" and "generate_content". 


In [23]:
%%time
question= "Summarize the chat so far:"
excerpt= ""
prompt = question + excerpt
response = chat.send_message(prompt)

to_markdown(response.text)

CPU times: total: 0 ns
Wall time: 1.1 s


> The chat so far has focused on understanding the differences between "chat" and "generate_content" in the context of the Gemini framework. We learned:
> 
> * **Chat:** This functionality enables free-flowing conversations, managing the conversation state automatically.
> * **Generate Content:** This functionality is more versatile, potentially handling multi-turn chat and multimodal inputs. However, current models only support text and image input/output, limiting its chat capabilities. 
> 
> We've been trying to pinpoint the specific differences between these two functionalities, but the provided text doesn't explicitly define them. We need more information to get a complete picture. 


In [24]:
%%time
question= "How to stream the chat?"
excerpt= ""
prompt = question + excerpt
response = chat.send_message(prompt)

to_markdown(response.text)

CPU times: total: 0 ns
Wall time: 632 ms


> I don't know.  The provided text excerpts don't mention anything about streaming chat.  We would need additional information about the Gemini framework or chat functionality to answer this question. 


## How Many Tokens --> How much does it cost?

https://ai.google.dev/pricing

In [25]:
model.count_tokens(chat.history)

total_tokens: 1034
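
Turning a token count into a cost is simple arithmetic once you have the price per million tokens from the pricing page. The sketch below uses a placeholder price, not a real quote; `estimate_cost_usd` and `HYPOTHETICAL_USD_PER_MILLION` are hypothetical names. Input and output tokens may be priced differently, so treat the result as a rough estimate.

```python
def estimate_cost_usd(token_count, usd_per_million_tokens):
    """Cost = tokens / 1,000,000 * price per million tokens."""
    return token_count / 1_000_000 * usd_per_million_tokens

# Placeholder price for illustration only; look up the current value on the pricing page
HYPOTHETICAL_USD_PER_MILLION = 0.10

cost = estimate_cost_usd(1034, HYPOTHETICAL_USD_PER_MILLION)
print(f"{cost:.7f} USD")
```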

## Build a simple Interface with Gradio

In [26]:
!pip install gradio



In [27]:
import gradio as gr

In [28]:
def build_chatBot(system_instruction):
  model = genai.GenerativeModel('gemini-1.5-flash-latest', system_instruction=system_instruction)
  chat = model.start_chat(history=[])
  return chat


In [29]:
def chat_with_gemini(prompt, context, chat):
  response = chat.send_message(" Question: "+ prompt + " Context: "+ context)
  '''
  # Format the chat history for display
  formatted_history = "\n ".join(
        f"{item.role.capitalize()}: {item.parts if hasattr(item, 'parts') else item.content} "
        for item in chat.history
  )
  formatted_history = formatted_history.replace("[text: ", "").replace("]", "")
  return formatted_history

  '''
  return response.text


In [30]:
def chat_interface(prompt, context):
    response = chat_with_gemini(prompt, context, chat)
    return response
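
Note that `chat_interface` relies on a global `chat` variable, which is only created in a later cell. One alternative is to bind the session explicitly with a closure. This is a sketch under assumptions: `make_chat_interface` is a hypothetical helper, and `EchoChat` is a stand-in for a real chat session so the block runs offline.

```python
from types import SimpleNamespace

def make_chat_interface(chat):
    """Return a two-argument function bound to one specific chat session."""
    def chat_interface(prompt, context):
        # Same message layout as chat_with_gemini
        response = chat.send_message(" Question: " + prompt + " Context: " + context)
        return response.text
    return chat_interface

class EchoChat:
    """Hypothetical stand-in for a chat session; it echoes the message back."""
    def send_message(self, message):
        return SimpleNamespace(text=f"received: {message.strip()}")

interface = make_chat_interface(EchoChat())
result = interface("What is FC?", "some context")
print(result)
```

The returned function takes the same two arguments as `chat_interface`, so it could be passed as `fn` to `gr.Interface` in the same way.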

In [31]:
system_prompt= """ You are an attentive and supportive academic assistant.
           Your task is to provide assistance based solely on the provided
           context. I will provide you the question and related text.
           Answer the following questions, ensuring your responses
           are derived exclusively from the provided partial texts.
           If the answer cannot be found within the provided context,
           kindly respond with 'I don't know'.
           After answering each question, please provide a detailed
           explanation, breaking down the answer step by step and relating
           it to the provided context.
           If you are ready, I will provide you the question and context.
        """

In [32]:
chat = build_chatBot(system_prompt)

In [33]:
prompt="What is FC?"
context= """FC lets developers create a description
of a F in their code, then pass that description to a language
model in a request. The response from the model includes the name of
a F that matches the description and the arguments to call it with.
FC lets you use F as tools in generative AI applications,
and you can define more than one F within a single request.
"""

In [34]:
response=chat_with_gemini(prompt, context,chat)
to_markdown(response)

> I don't know. 
> 
> The provided text defines what FC is and what it does but does not mention what "F" stands for. Therefore, it is impossible to determine what FC is based on the provided text. 


In [35]:
demo = gr.Interface(
    fn=chat_interface,
    inputs=[
        gr.Textbox(label="Prompt", value=prompt),  # Label the prompt input
        gr.Textbox(label="Context", value=context)  # Label the excerpt input
    ],
    outputs="markdown",  # Specify output as markdown
    title="Chat with Gemini",
    description="Type your question with the context to chat with the Gemini model."
)


In [36]:
demo.launch(share=True, debug=True)

Running on local URL:  http://127.0.0.1:7860

Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.


# END OF PART A
# A SHORT INTRO TO GEMINI API FOR TEXT GENERATION & CHAT

https://ai.google.dev/gemini-api/docs

### In this tutorial, we covered:
* The Python SDK for the Gemini API
* Check the Google LLM Models available via the provided API
* Interact with the models using 2 Alternative Interfaces
  1. Generate text interface
  2. Interact with the models using Multi-turn conversations (chat) interface

* Understand Model & Chat objects
  * Model Object in detail
  * System Prompt in the Gemini API
  * Chat Object in detail
* Chat using system_instruction: ***A Manual RAG?***
* How Many Tokens --> How much does it cost?
* Build a simple Interface with Gradio

# In the next tutorial, we will cover ChromaDB as a building block for a RAG pipeline!

* Stay tuned!