# Workshop: Build with Gemini (Part 1)

<a target="_blank" href="https://colab.research.google.com/github/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-1-text-prompting.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This workshop teaches you how to build with Gemini using the Gemini API and the Python SDK.

Course outline:

- **Part 1 (this notebook): Quickstart + Text prompting**
  - Text understanding
  - Streaming response
  - Chats
  - System prompts
  - Config options
  - Long context
  - Token usage
  - Final exercise: Chat with a book

- **Part 2: Multimodal understanding (image, video, audio, docs, code)**

- **Part 3: Thinking models + agentic capabilities (tool usage)**

## 0. Use Google AI Studio as a playground

Explore and play with all models in the [Google AI Studio](https://aistudio.google.com/apikey).


## 1. Setup


Get a free API key in [Google AI Studio](https://aistudio.google.com/apikey) and store it as a Colab secret named `GOOGLE_API_KEY`:

In [None]:
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')

Install the [Google Gen AI Python SDK](https://github.com/googleapis/python-genai):

In [None]:
%pip install -q -U google-genai

Configure the client:

In [None]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

Configure the model. See all available [models](https://ai.google.dev/gemini-api/docs/models).

In [None]:
MODEL = ... # TODO

## 2. Send your first prompt

In [None]:
# TODO

## 3. Text understanding

The simplest way to generate text is to provide the model with a text-only prompt. `contents` can be a single prompt, a list of prompts, or a combination of multimodal inputs.

In [None]:
# TODO
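
As a starting point, here is a minimal sketch of a text-only request. It assumes the `client` object created in the setup section; the helper name `ask` is hypothetical and only wraps the SDK's `client.models.generate_content` call.

```python
def ask(client, model, prompt):
    """Send one text prompt and return the generated text.

    `client` is assumed to be a `genai.Client` as configured in Setup,
    and `model` a model name string chosen from the models page.
    """
    # generate_content blocks until the full response has been generated.
    response = client.models.generate_content(model=model, contents=prompt)
    # .text is a convenience accessor for the text of the response.
    return response.text
```

In this notebook it could be called as `print(ask(client, MODEL, "Explain tokens in one sentence."))` once `MODEL` is set.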

#### Streaming response

By default, the model returns a response after completing the entire text generation process. You can achieve faster interactions by using streaming to return outputs as they're generated.

In [None]:
# TODO
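
One possible shape for the streaming loop is sketched below. It assumes the same `client` as above and uses the SDK's `generate_content_stream` method; the helper names `stream_text` and `print_stream` are hypothetical.

```python
def stream_text(client, model, prompt):
    """Yield pieces of generated text as soon as the API returns them."""
    stream = client.models.generate_content_stream(model=model, contents=prompt)
    for chunk in stream:
        # A chunk may carry no text, so skip empty ones.
        if chunk.text:
            yield chunk.text


def print_stream(client, model, prompt):
    """Print the response incrementally and return the full text."""
    pieces = []
    for piece in stream_text(client, model, prompt):
        print(piece, end="", flush=True)  # show partial output immediately
        pieces.append(piece)
    print()
    return "".join(pieces)
```

Calling `print_stream(client, MODEL, "Write a short poem about tokens.")` would print the answer piece by piece instead of waiting for the complete response.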

#### Chat

The SDK chat class provides an interface to keep track of conversation history. Behind the scenes it uses the same `generate_content` method.

In [None]:
# TODO

In [None]:
# TODO
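
A multi-turn conversation might be sketched as follows, assuming the `client` from the setup section. The helper name `run_chat` is hypothetical; `client.chats.create` and `send_message` are the SDK calls it relies on.

```python
def run_chat(client, model, messages):
    """Send several messages in one chat session and collect the replies."""
    # The chat object stores the conversation history between turns.
    chat = client.chats.create(model=model)
    replies = []
    for message in messages:
        # Each call sends the new message together with the stored history.
        response = chat.send_message(message)
        replies.append(response.text)
    return chat, replies
```

For example, `chat, replies = run_chat(client, MODEL, ["Hi, I'm learning Python.", "What did I just say I was learning?"])` lets the second question refer back to the first turn. The returned `chat` can be reused for follow-up messages, and `chat.get_history()` should return the accumulated turns.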

#### Parameters

Every prompt you send to the model includes parameters that control how the model generates responses. You can configure these parameters, or let the model use the default options.

In [None]:
# TODO
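
A sketch of passing generation parameters is shown here. It assumes the `config` argument of `generate_content` accepts a plain dictionary (the typed equivalent is `types.GenerateContentConfig`); the helper names and all values are illustrative, not recommendations.

```python
def example_config():
    """Return an illustrative set of generation parameters."""
    return {
        "max_output_tokens": 400,   # upper bound on the response length
        "temperature": 1.0,         # higher means more random output
        "top_p": 0.95,              # cumulative probability cutoff
        "top_k": 20,                # only consider the 20 most probable tokens
        "stop_sequences": ["END"],  # stop generating when this string appears
        "seed": 42,                 # best-effort reproducibility
    }


def generate_with_config(client, model, prompt, config=None):
    """Send a prompt with explicit generation parameters."""
    if config is None:
        config = example_config()
    response = client.models.generate_content(
        model=model, contents=prompt, config=config
    )
    return response.text
```

Any key that is left out falls back to the model's default. The parameters are described in the list that follows.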

- `max_output_tokens`: Sets the maximum number of tokens to include in a candidate.
- `temperature`: Controls the randomness of the output. Use higher values for more creative responses, and lower values for more deterministic responses. Values range from 0.0 to 2.0.
- `top_p`: Changes how the model selects tokens for output. Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value.
- `top_k`: Changes how the model selects tokens for output. A top_k of 1 means the selected token is the most probable among all the tokens in the model's vocabulary, while a top_k of 3 means that the next token is selected from among the 3 most probable using the temperature. Tokens are further filtered based on top_p with the final token selected using temperature sampling.
- `stop_sequences`: List of strings (up to 5) that tell the model to stop generating text if one of the strings is encountered in the response. If specified, the API will stop at the first appearance of a stop sequence.
- `seed`: If specified, the model makes a best effort to provide the same response for repeated requests. By default, a random number is used.
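
To make the interaction of `top_k`, `top_p`, and `temperature` more concrete, here is a toy sketch over a hand-written probability table. It follows the order described in the list above (top-k filter, then top-p filter, then temperature sampling) and is only an illustration, not the model's actual implementation. The function names and the example vocabulary are hypothetical.

```python
import random


def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most probable tokens, then cut at cumulative top_p.

    `probs` maps token -> probability; top_k is assumed to be >= 1.
    Returns the surviving tokens with renormalized probabilities.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked[:top_k]:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:  # enough probability mass collected
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


def sample_token(probs, top_k, top_p, temperature=1.0, rng=None):
    """Pick one token from the filtered candidates using temperature."""
    rng = rng or random.Random()
    candidates = filter_top_k_top_p(probs, top_k, top_p)
    if temperature == 0:
        # Temperature 0 is treated as greedy decoding: take the best token.
        return max(candidates, key=candidates.get)
    tokens = list(candidates)
    # Raising p to 1/T sharpens (T < 1) or flattens (T > 1) the distribution.
    weights = [candidates[token] ** (1.0 / temperature) for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

With `probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}`, calling `filter_top_k_top_p(probs, top_k=3, top_p=0.7)` keeps only `"the"` and `"a"`, because their combined probability already reaches 0.7, while `top_k=1` always leaves just `"the"`.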

#### System instructions

System instructions let you steer the behavior of a model based on your specific use case. When you provide system instructions, you give the model additional context to help it understand the task and generate more customized responses. The model should adhere to the system instructions over the full interaction with the user, enabling you to specify product-level behavior separate from the prompts provided by end users.

In [None]:
# TODO
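
A system instruction could be supplied through the request config, roughly as sketched below. This assumes `system_instruction` is accepted as a config key alongside the generation parameters; the helper name is hypothetical.

```python
def ask_with_system_instruction(client, model, system_instruction, prompt):
    """Send a prompt whose behavior is steered by a system instruction."""
    response = client.models.generate_content(
        model=model,
        contents=prompt,
        # The system instruction is kept separate from the user prompt.
        config={"system_instruction": system_instruction},
    )
    return response.text
```

For example, `ask_with_system_instruction(client, MODEL, "You are a helpful coding tutor. Answer briefly.", "What is a list comprehension?")` keeps the product-level behavior separate from the end-user question.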

#### Long context and token counting

Gemini 2.0 Flash and 2.5 Pro have a 1M token context window.

In practice, 1 million tokens could look like:

- 50,000 lines of code (with the standard 80 characters per line)
- All the text messages you have sent in the last 5 years
- 8 average-length English novels
- 1 hour of video data

Let's feed in an entire book and ask questions:



In [None]:
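import requests

# Download a plain-text book from Project Gutenberg.
res = requests.get("https://gutenberg.org/cache/epub/16317/pg16317.txt")
res.raise_for_status()  # fail early if the download did not succeed
book = res.text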

In [None]:
print(book[:100])

In [None]:
print(f"# characters: {len(book)}")
print(f"# words: {len(book.split())}")
print(f"# tokens: ~{int(len(book.split()) * 4/3)}")   # rule of thumb: 100 tokens ≈ 75 words

In [None]:
# TODO
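
One way to ask a question about the whole book is to pass the book text and the question together in `contents`, as in this sketch. It assumes a list of strings is accepted there; the helper name `ask_about_book` is hypothetical.

```python
def ask_about_book(client, model, book_text, question):
    """Send the full book plus a question in a single request."""
    response = client.models.generate_content(
        model=model,
        # The long document comes first, followed by the actual question.
        contents=[book_text, question],
    )
    # Return the whole response so usage metadata stays accessible.
    return response
```

With the `book` variable loaded above, `response = ask_about_book(client, MODEL, book, "Summarize the book in three sentences.")` followed by `print(response.text)` would show the answer.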

To understand the token usage, you can check `usage_metadata`:

In [None]:
# TODO
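
The token counts of a response could be read out as in the sketch below. The helper name `summarize_usage` is hypothetical; the three attributes are fields of the SDK's usage metadata object.

```python
def summarize_usage(response):
    """Collect the main token counts of a response into a dictionary."""
    usage = response.usage_metadata
    return {
        "prompt_tokens": usage.prompt_token_count,      # tokens sent as input
        "output_tokens": usage.candidates_token_count,  # tokens generated
        "total_tokens": usage.total_token_count,        # total reported by the API
    }
```

Calling `summarize_usage(response)` on the response from the book question shows how much of the context window the book occupies.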

You can also use `count_tokens` to check the size of your input prompt(s):

In [None]:
# TODO
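
Counting tokens before sending a request might look like the following sketch, which also repeats the word-based rule of thumb used earlier for comparison. The helper names are hypothetical; `client.models.count_tokens` is the SDK call being wrapped.

```python
def estimate_tokens(text):
    """Rough estimate from the rule of thumb: 100 tokens ≈ 75 words."""
    return int(len(text.split()) * 4 / 3)


def count_prompt_tokens(client, model, contents):
    """Ask the API for the exact token count of the given input."""
    result = client.models.count_tokens(model=model, contents=contents)
    return result.total_tokens
```

Comparing `estimate_tokens(book)` with `count_prompt_tokens(client, MODEL, book)` shows how close the rule of thumb comes for this particular text.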

## Exercise: Chat with a book

Task:
- Create a chat
- Use a system prompt: `"You are an expert book reviewer with a witty tone."`
- Use a temperature of `1.5`
- Ask it to summarize the book
- Ask 1 question that explores a certain topic from the book in more detail
- Ask to create a social media post based on the book
- Print the total number of tokens used during the chat

In [None]:
# TODO

## Recap & Next steps

You learned:
- Python SDK quickstart
- Text prompting
- Streaming and chats
- System prompts and config options
- Long context and token counting


More helpful resources:
- [API docs quickstart](https://ai.google.dev/gemini-api/docs/quickstart?lang=python)
- [Text generation docs](https://ai.google.dev/gemini-api/docs/text-generation)
- [Long context docs](https://ai.google.dev/gemini-api/docs/long-context)

Next steps:
- Part 2: Multimodal capabilities (image, video, audio, docs, code)