# Path 1 - Gemini API
This path guides you through setting up and using the Google Gemini API.

### 0. Get and store the API key
First of all, you need to log in with your Google account and get an API key [here](https://aistudio.google.com/app/apikey). It is **very important** that you do not share your API key with anyone and that you do not commit it to your repository.

You can keep your API key in a secure local document and access it when needed. It is common to save the key as an environment variable so that your Python script can read it.

Avoid pasting the key directly into your script, where it would sit in plain text. If you're using VS Code, you can instead add your API key to a `.env` file in your workspace root (kept out of version control) with the following line:

```sh
GEMINI_API_KEY="PASTE YOUR KEY HERE"
```

Alternatively, you can load the `.env` file from Python with the [python-dotenv library](https://github.com/theskumar/python-dotenv).
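As a minimal sketch of reading the key from Python: the snippet below loads a `.env` file when `python-dotenv` happens to be installed, then reads the variable from the environment. The helper name `get_api_key` is hypothetical and only used for illustration.

```python
import os

# Optional: load variables from a .env file if python-dotenv is installed.
try:
    from dotenv import load_dotenv
    load_dotenv()  # by default, does not override variables that are already set
except ImportError:
    pass  # fall back to whatever is already in the environment


def get_api_key(env=os.environ, name="GEMINI_API_KEY"):
    """Return the API key stored under `name`, or raise a clear error."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return key
```

Calling `get_api_key()` early in a script makes a missing key fail with a readable message instead of an authentication error later on.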

In [1]:
# You can check if the environment variable GEMINI_API_KEY has been set up properly by running this line
!if [ -z "$GEMINI_API_KEY" ]; then echo "\$GEMINI_API_KEY not found"; else echo "\$GEMINI_API_KEY found"; fi

$GEMINI_API_KEY found


### 1. First simple request
Now, you can write a simple script to see if everything is working properly.

In [2]:
from google import genai

# The client gets the API key from the environment variable `GEMINI_API_KEY`.
client = genai.Client()  # alternatively, pass the key explicitly: genai.Client(api_key=os.environ["GEMINI_API_KEY"])

default_model = "gemini-2.5-flash"

client

<google.genai.client.Client at 0x7ad0c4433fe0>

#### Exercise 1

Ask the model to generate content about a random topic and print the response in text.

Here is the [official documentation](https://ai.google.dev/gemini-api/docs/text-generation?lang=python#configure) to find the help you need.

In [3]:
response = client.models.generate_content(
    model=default_model,
    contents="What is love?"
)
print(response.text)

Love is one of the most profound, complex, and deeply human experiences, incredibly difficult to capture in a single, definitive statement. It's universally felt yet uniquely experienced by each individual.

Here's an attempt to break down its multifaceted nature:

1.  **A Deep Emotion and Connection:**
    *   At its core, love is a strong feeling of affection, care, and attachment towards someone or something.
    *   It involves a sense of deep connection, belonging, and a bond that can transcend time and space.

2.  **A Set of Behaviors and Actions:**
    *   Love is not just a feeling; it's also a **choice** and a **verb**. It manifests through actions like:
        *   **Care and Support:** Looking out for another's well-being, offering help, being present.
        *   **Empathy and Understanding:** Trying to see the world through another's eyes, listening, validating feelings.
        *   **Commitment and Loyalty:** Standing by someone, even through difficulties, and showing fai

### 2. Generation parameters

When asking the model to generate text, there are several parameters that you can tune to improve the quality of the output. [Here](https://ai.google.dev/gemini-api/docs/models/generative-models#model-parameters) is an overview of the parameters that Gemini offers. Try some of them in different contexts to understand how they affect the generated text.

#### Exercise 2

Play with the output temperature, which controls the randomness of the generated text: `temperature=0` makes the output (nearly) deterministic, while `temperature=1` makes it considerably more random (try some intermediate values too). Consider setting `max_output_tokens` to 50 so that the output is not too long; if you do, you should also set a low `thinking_budget`, otherwise the thinking tokens can use up the budget and leave you with an empty response.
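To build some intuition before calling the API, here is a toy sketch of what temperature does to a probability distribution. It assumes the usual softmax-with-temperature scheme; the logits are made up, and how Gemini applies temperature internally is not something this lab specifies.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Limit case: all probability mass goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0, 0.5, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top token, which is why low-temperature outputs repeat more often across runs.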

In [30]:
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="How does AI work?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0), # Disables thinking
        temperature=1,
        max_output_tokens=50
    ),
)
print(response.text)


That's a fantastic question, and it's something many people are curious about! The term "AI" is a broad umbrella, so let's break down the core concepts and different ways it works.

At its heart, **AI


#### Exercise 3

Try out different `top_k` values. This parameter controls how many tokens the model considers at each step: `top_k=1` means the model considers only the single most probable token, while `top_k=50` means it samples from the 50 most probable tokens.
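The filtering step itself can be sketched on a toy distribution. This is an illustration of the general top-k idea, not Gemini's implementation; the function name `top_k_filter` and the probabilities are invented for the example.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}


# Made-up next-token probabilities
toy_probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}

print(top_k_filter(toy_probs, 1))  # only the most likely token survives
print(top_k_filter(toy_probs, 3))  # the three most likely tokens, renormalized
```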

In [38]:
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="How does AI work?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0), # Disables thinking
        temperature=0,
        max_output_tokens=50,
        top_k=50
    ),
)
print(response.text)

That's a fantastic question, and the answer is both incredibly complex and surprisingly simple at its core. Let's break down how AI works, starting with the basics and moving to more advanced concepts.

## The Core Idea: Learning from Data


#### Exercise 4

Repeat the previous exercise, now with `top_p`, which restricts sampling to the most probable tokens by cumulative probability: `top_p=0.1` means the model selects from the most probable tokens that together make up 10% of the probability mass, while `top_p=0.9` widens that to 90%. `top_p` filters tokens *after* `top_k` has been applied.

Can you determine a rule of thumb for how `top_k` and `top_p` affect the output? (If you can't, try pushing the values to extremes.)
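For comparison with `top_k`, here is a toy sketch of nucleus (top-p) filtering. It shows the general technique under made-up probabilities; the helper `top_p_filter` is hypothetical and does not reflect Gemini's internals.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Made-up next-token probabilities
toy_probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}

print(top_p_filter(toy_probs, 0.1))  # the top token alone already covers 10%
print(top_p_filter(toy_probs, 0.9))  # more tokens are needed to reach 90%
```

Unlike `top_k`, the number of surviving tokens here depends on how peaked the distribution is.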

In [63]:
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="How does AI work?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0), # Disables thinking
        temperature=0.7,
        max_output_tokens=50,
        top_k=50,
        top_p=0.1
    ),
)
print(response.text)

That's a fantastic question, and the answer is both complex and surprisingly simple at its core. Let's break down how AI works, starting with the basics and moving to more advanced concepts.

## The Core Idea: Learning from Data




Limiting `top_k` to 1 is like setting a low temperature, because the model then only picks the most likely token. The same goes for a low `top_p`.

### 3. Add images to the prompt

#### Exercise 5
Besides text, Gemini also accepts images (and videos). Try prompting it with one: choose an interesting image and ask the model a question about it.

You can use the [official documentation](https://ai.google.dev/gemini-api/docs/vision?lang=python#prompting-images).

Use [PIL](https://pillow.readthedocs.io/en/stable/) to load an image. It should already be present in the Python environment.
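One possible shape for a solution is sketched below. It assumes the `google-genai` client accepts a list that mixes a PIL image and a text prompt as `contents`; the helper name `ask_about_image` and the example question are hypothetical.

```python
def ask_about_image(client, image, question, model="gemini-2.5-flash"):
    """Send an image plus a text question in a single request and return the answer text."""
    response = client.models.generate_content(
        model=model,
        contents=[image, question],  # a list mixing an image and a text prompt
    )
    return response.text


# Example usage (requires Pillow, google-genai and a valid GEMINI_API_KEY):
# from google import genai
# from PIL import Image
# client = genai.Client()
# image = Image.open("./data/engineer_fitting_prosthetic_arm.jpg")
# print(ask_about_image(client, image, "What is happening in this picture?"))
```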

In [None]:
from PIL import Image

IMAGE_PATH = "./data/engineer_fitting_prosthetic_arm.jpg"

# Your code here

### 4. Retrieval Augmented Generation (RAG)

#### Exercise 6

Depending on the application of the project, you might need to extract text from given documents and include it as additional context. This becomes especially relevant if you have many documents that cannot possibly fit into the model's context window. To implement a RAG pipeline more easily, we recommend using one of these libraries: [LangChain](https://python.langchain.com/v0.2/docs/introduction/), [LlamaIndex](https://docs.llamaindex.ai/en/stable/examples/), [Haystack](https://docs.haystack.deepset.ai/docs/intro).

For the solution of this lab we will use *LangChain*.

It can be useful to split this exercise into these steps:
1. Read one or more documents using pdfminer
2. Split the documents into small chunks
3. Get and store the embeddings for each chunk
4. Given a query, retrieve the most relevant chunk(s) and appropriately prompt your LLM

**NOTE:** If you try to embed too many documents at once, or documents that are too large, you may run into rate limits. Possible solutions:
* Reduce the number of chunks and/or their size
* Look at the HF version of this lab and use a local embedding model
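To make the split, embed, store, and retrieve steps concrete without any API calls, here is a toy sketch using only the standard library. It is deliberately not the LangChain solution: a bag-of-words counter stands in for Gemini embeddings, a plain list stands in for FAISS, and every function name is hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (assumes overlap < chunk_size)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Store each chunk next to its vector (the role a vector store plays)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(query, index, top_n=1):
    """Return the top_n chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda pair: cosine_similarity(query_vec, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_n]]


def build_prompt(query, context_chunks):
    """Combine the retrieved chunks and the user query into one prompt string."""
    context = "\n---\n".join(context_chunks)
    return f"Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
```

In the real exercise, the text would come from `extract_text(DOC_PATH)` and the resulting prompt would be sent to the model; swapping in real embeddings changes `embed` and `build_index` but leaves the overall flow the same.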

In [None]:
import os  # langchain expects gemini's api key to be in the environment variable GOOGLE_API_KEY, use os to set it
from langchain_google_genai import GoogleGenerativeAIEmbeddings  # get embeddings from Gemini
from langchain_community.vectorstores import FAISS  # "db" to store and retrieve embeddings
from langchain_core.documents import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter  # split long documents
from pdfminer.high_level import extract_text  # extract text from pdfs

DOC_PATH = "./data/chain_of_thought_prompting.pdf"

# Suppose a user query
USER_QUERY = "What is CoT?"

# Your code here

### 5. Explore on your own
Gemini offers a wider range of capabilities than those covered here; automatically handling multi-turn chats is one of them. Explore them on your own!
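As a starting point for multi-turn chats, the sketch below assumes the `google-genai` chat interface (`client.chats.create` and `send_message`), where the chat object keeps the conversation history for you. The helper `run_conversation` is a hypothetical wrapper for illustration.

```python
def run_conversation(client, messages, model="gemini-2.5-flash"):
    """Send several user messages in one chat session and collect the replies."""
    chat = client.chats.create(model=model)  # the chat object tracks past turns
    replies = []
    for message in messages:
        response = chat.send_message(message)
        replies.append(response.text)
    return replies


# Example usage (requires google-genai and a valid GEMINI_API_KEY):
# from google import genai
# client = genai.Client()
# for reply in run_conversation(client, ["My name is Ada.", "What is my name?"]):
#     print(reply)
```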

#### Exercise 7
Explore!

In [None]:
# Your code here

### 6. Create a user interface

#### Exercise 8
Since you are trying to build a complete application, you also need a nice user interface that interacts with the model. There are various libraries available for this purpose. Notably: [gradio](https://www.gradio.app/docs/gradio/interface) and [chat UI](https://huggingface.co/docs/chat-ui/index). For the solution of this lab, we will use gradio.

Gradio has pre-defined input/output blocks that are automatically inserted in the interface. You only need to provide an appropriate function that takes all the inputs and returns the relevant output. See documentation [here](https://www.gradio.app/docs/gradio/interface).

Use a ChatInterface to create a chatbot UI that lets you chat with Gemini, then add multimodal capabilities for both Gradio and Gemini.
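A text-only starting point might look like the sketch below. It assumes Gradio's `ChatInterface` calls its function with the new message and the visible history, and that the `google-genai` chat object tracks past turns itself. The factory name `make_respond` is hypothetical, and this version shares one chat session across all users rather than keeping a per-user store.

```python
def make_respond(client, model="gemini-2.5-flash"):
    """Build a ChatInterface-style callback backed by one Gemini chat session."""
    chat = client.chats.create(model=model)

    def respond(message, history):
        # Gradio passes the new message and the visible history; the chat
        # object already remembers earlier turns, so history is unused here.
        return chat.send_message(message).text

    return respond


# Example usage (requires gradio, google-genai and a valid GEMINI_API_KEY):
# import gradio as gr
# from google import genai
# demo = gr.ChatInterface(fn=make_respond(genai.Client()))
# demo.launch()
```

Supporting several users would mean creating one chat session per user, for example keyed in a dictionary, instead of the single shared session used here.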

In [None]:
import gradio as gr

# This part closes the demo server if it is already running (which
# happens easily in notebooks) and prevents you from opening multiple
# servers at the same time.
if "demo" in locals() and demo.is_running:
    demo.close()

# Edit the parameters below
chats = {}  # store the chat history for each user (suppose multiple users)

# Your code here