# Gemini API: General Parameters
This notebook covers:

* Temperature
* Max output length
* Token counting

## Setup



In [1]:
!pip install -q -U google-generativeai

In [2]:
import google.generativeai as genai

In [4]:
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)

## Model Temperature
Steve is indecisive about how to spend his Friday night as a freshman at Berkeley. He decides to ask Gemini, using the 1.5 Flash model:

In [5]:
model_name = 'gemini-1.5-flash'
model = genai.GenerativeModel(model_name)

prompt = "Help me choose a fun way to spend my Friday night as a CS major at Berkeley in a single one-sentence idea"
response = model.generate_content(prompt)
print(response.text)

Hack a fun project with friends, fueled by copious amounts of caffeine and questionable pizza.



Steve is very smart and knows that Gemini is non-deterministic: the same prompt can result in different outputs! To demonstrate this, the following code sends the same prompt five times:

In [6]:
outputs = []
prompt = "Help me choose a fun way to spend my Friday night as a CS major at Berkeley in a single one-sentence idea"
# Send the identical prompt five times and collect each response
for _ in range(5):
    response = model.generate_content(prompt)
    outputs.append(response.text)
for index, sentence in enumerate(outputs, start=1):
    print(f"{index}. {sentence}")

1. Hack a (legal and ethical) project with friends, fueled by copious amounts of caffeine and questionable pizza.

2. Grab some friends, conquer a challenging coding problem on HackerRank while fueled by late-night pizza and competitive spirit.

3. Grab some fellow CS majors, head to a nearby boba shop for a caffeine-fueled coding competition/hackathon fueled by late-night pizza.

4. Hack on a fun side project with friends, fueled by copious amounts of caffeine and pizza.

5. Grab some friends, conquer a challenging coding problem on HackerRank over boba and pizza, fueled by the thrill of competitive programming and the promise of weekend relaxation.



Gemini returns a different response each time. Hmm. Steve doesn't like the fact that his Friday night might be determined by random chance.

Fortunately, Gemini has a temperature parameter that controls the randomness of the output. Temperature values can range from 0.0 to 2.0. We can check the model's default temperature and its allowed maximum by printing the model's metadata:

In [7]:
for m in genai.list_models():
    if m.name == 'models/gemini-1.5-flash':
        print(m)

Model(name='models/gemini-1.5-flash',
      base_model_id='',
      version='001',
      display_name='Gemini 1.5 Flash',
      description=('Alias that points to the most recent stable version of Gemini 1.5 Flash, our '
                   'fast and versatile multimodal model for scaling across diverse tasks.'),
      input_token_limit=1000000,
      output_token_limit=8192,
      supported_generation_methods=['generateContent', 'countTokens'],
      temperature=1.0,
      max_temperature=2.0,
      top_p=0.95,
      top_k=40)
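
To build intuition for what the `temperature` value in the metadata above controls, here is a minimal sketch of the textbook way temperature rescales a probability distribution over candidate tokens. It is an illustration only, not a description of Gemini's internals; the function name `softmax_with_temperature` and the scores in `toy_logits` are made up for this example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all probability on the top score.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
toy_logits = [2.0, 1.0, 0.5]
for t in (0.0, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the top candidate's share of the probability shrinks and the alternatives become more likely to be sampled, which is why higher settings produce more varied text.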


Let's initialize a new model with a temperature of 0:

In [8]:
model_name = 'gemini-1.5-flash'
new_outputs = []
# Same model, but with temperature fixed at 0
low_temp_model = genai.GenerativeModel(model_name, generation_config={"temperature": 0})
for _ in range(5):
    response = low_temp_model.generate_content(prompt)
    new_outputs.append(response.text)
for index, sentence in enumerate(new_outputs, start=1):
    print(f"{index}. {sentence}")

1. Grab some friends, conquer a challenging coding problem on HackerRank over boba tea, and then debate the merits of various programming languages until the wee hours.

2. Grab some friends, conquer a challenging coding problem on HackerRank over boba tea, then debate the merits of various programming languages until sunrise.

3. Grab some friends, conquer a challenging coding problem on HackerRank over boba and pizza, fueled by the thrill of collaborative problem-solving and the promise of weekend relaxation.

4. Grab some friends, conquer a challenging coding problem on HackerRank over boba and pizza, fueled by the thrill of collaborative problem-solving and the promise of weekend relaxation.

5. Grab some friends, conquer a challenging coding problem on HackerRank over boba tea, then debate the merits of various programming languages until sunrise.



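The outputs are now far more consistent, though not identical: only three distinct sentences appear across the five runs, and all of them share the same opening. A temperature of 0 makes generation mostly deterministic, but a small amount of variation can remain. That is close enough for Steve to settle on a plan for his night. Conversely, setting temperature to the max value of 2.0 would have the opposite effect, yielding more unpredictable responses.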
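
Comparing several long sentences by eye is error-prone, so a small helper can count how many distinct responses a batch really contains. The sketch below is one possible approach; `summarize_consistency` is a hypothetical helper, and the strings in `sample_responses` are placeholders rather than real Gemini output.

```python
from collections import Counter

def summarize_consistency(responses):
    """Count how often each distinct response appears, ignoring surrounding whitespace."""
    counts = Counter(r.strip() for r in responses)
    # most_common() returns (text, count) pairs, most frequent first
    return counts.most_common()

# Placeholder strings standing in for model responses (not real Gemini output).
sample_responses = ["idea A\n", "idea A", "idea B", "idea A\n", "idea B"]
for text, n in summarize_consistency(sample_responses):
    print(f"{n}x {text}")
```

Passing a list such as `new_outputs` to the same helper would show how many unique sentences a given temperature setting produced.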

## Max Output Length
Let's say Steve wants his outputs to stay below a certain length. Large language models generate text in tokens. For Gemini models, a token is equivalent to about 4 characters, and 100 tokens are about 60-80 English words. He can set the `max_output_tokens` parameter as follows:

In [9]:
model_name = 'gemini-1.5-flash'
short_response_model = genai.GenerativeModel(model_name, generation_config={"max_output_tokens": 5})
prompt = "Help me choose a fun way to spend my Friday night as a CS major at Berkeley in a single one-sentence idea"
response = short_response_model.generate_content(prompt)
print(response.text)

Hack a collaborative coding project


Notice that this simply halts token generation at a fixed quantity and does not guarantee that the output is complete.
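
Because a capped response can stop mid-sentence, it may help to flag output that looks cut off. The response object also reports why generation stopped, which is the more reliable signal; the text-only check below is just a rough fallback sketch. The name `looks_truncated` and its default terminators are assumptions made for this example.

```python
def looks_truncated(text, terminators=(".", "!", "?", '"', "'")):
    """Heuristic: treat text as cut off if it does not end with closing punctuation."""
    stripped = text.strip()
    # Empty output, or output without a closing mark, is flagged as truncated.
    return not stripped or not stripped.endswith(terminators)

# The capped output from the cell above, and a complete sentence for comparison.
print(looks_truncated("Hack a collaborative coding project"))
print(looks_truncated("Hack a fun project with friends."))
```

This is only a heuristic: a complete answer that ends without punctuation, such as a list item or a code snippet, would also be flagged.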

## Token Count
Let's say that Steve has billing enabled. The price of a paid request then depends on the number of input and output tokens, so knowing how to count tokens is important. In particular, Steve might want to know the number of tokens in his prompt (input) before actually sending it to the model.
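
For a quick estimate without calling the API at all, the rule of thumb from earlier in this notebook (about 4 characters per token) can be turned into a tiny helper. This is a rough approximation under that assumption, not a real tokenizer; `estimate_tokens` and `CHARS_PER_TOKEN` are names invented for this sketch.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb only; the real tokenizer will differ

def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Roughly estimate a token count from the character length of the text."""
    return math.ceil(len(text) / chars_per_token)

friday_prompt = "Help me choose a fun way to spend my Friday night as a CS major at Berkeley in a single one-sentence idea"
print(f"Estimated tokens: {estimate_tokens(friday_prompt)}")
```

The estimate is useful for a sanity check or a budget guard, but exact figures should come from the model's own token counter.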

Let's create a new instance of Gemini 1.5 Flash and set our prompt:

In [10]:
model_name = 'gemini-1.5-flash'
token_n_model = genai.GenerativeModel(model_name, generation_config={"temperature": 0.0})
poem_prompt = "Write me a poem about Berkeley's campus"

In [11]:
response = token_n_model.generate_content(poem_prompt)
response.text

"The Campanile's chime, a brazen call,\nAcross the bay, its shadow falls.\nOn Sather Tower, the sunbeams gleam,\nA golden crown, a student's dream.\n\nThrough Strawberry Canyon, paths wind slow,\nWhere redwood giants softly grow.\nA whisper stirs in eucalyptus leaves,\nAs knowledge breathes, and spirit weaves.\n\nThe scent of incense, faintly sweet,\nOn Sproul Plaza, where students meet.\nA tapestry of voices rise,\nBeneath the ever-changing skies.\n\nFrom Bancroft's halls, to Wheeler's might,\nA vibrant pulse, both day and night.\nThe murmur soft of library lore,\nAnd protests echoing evermore.\n\nThe Golden Bear, a watchful gaze,\nThrough changing times, in sunlit haze.\nA campus steeped in history's grace,\nA vibrant heart, a timeless space.\n\nSo wander on, explore and find,\nThe beauty held within this mind,\nOf Berkeley's hills, a grand domain,\nWhere learning blooms, again, again.\n"

We can check how many tokens are in the prompt, and in the response generated above, using the model's `count_tokens` method. Counting the prompt's tokens does not require generating a response first:

In [12]:
# count_tokens returns a response object; total_tokens holds the integer count
prompt_token_count = token_n_model.count_tokens(poem_prompt).total_tokens
output_token_count = token_n_model.count_tokens(response.text).total_tokens
print(f'Tokens in prompt: {prompt_token_count}\nEstimated tokens in output: {output_token_count}')

Tokens in prompt: 9
Estimated tokens in output: 223
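
With both counts in hand, the cost of a request can be estimated with simple arithmetic. The sketch below uses the 9 input and 223 output tokens recorded above; the per-million-token prices are hypothetical placeholders, not actual Gemini pricing, and `estimate_cost` is a helper invented for this example.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimate request cost from token counts and per-million-token prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000

# Hypothetical prices in USD per million tokens; check the current price list.
cost = estimate_cost(9, 223,
                     input_price_per_million=0.10,
                     output_price_per_million=0.40)
print(f"Estimated cost: ${cost:.8f}")
```

Output tokens are priced separately from input tokens in this sketch because providers commonly charge different rates for each.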

