# Gemini API: Audio Quickstart

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Audio.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

This notebook provides an example of how to prompt Gemini 1.5 Pro using an audio file. In this case, you'll use a [sound recording](https://www.jfklibrary.org/asset-viewer/archives/jfkwha-006) of President John F. Kennedy’s 1961 State of the Union address.

In [None]:
!pip install -q -U google-generativeai


In [4]:
import google.generativeai as genai

## Configure your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.

In [5]:
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

genai.configure(api_key=GOOGLE_API_KEY)
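
The cell above only works inside Colab, because `google.colab.userdata` is not importable elsewhere. As a minimal sketch of a fallback, the helper below tries Colab Secrets first and otherwise reads an environment variable. The helper name `load_api_key` is hypothetical (it is not part of the Gemini SDK), and the use of an environment variable with the same name as the secret is an assumption.

```python
import os

def load_api_key(name="GOOGLE_API_KEY", environ=None):
    """Return the key from Colab Secrets when available, else from the environment.

    Hypothetical helper for illustration; not part of the Gemini SDK.
    """
    try:
        # google.colab can only be imported inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        return userdata.get(name)

    # Assumption: outside Colab the key is exported under the same name.
    environ = os.environ if environ is None else environ
    key = environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key
```

With this in place, the configuration call could read `genai.configure(api_key=load_api_key())` in either environment.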

## Upload an audio file with the File API

To use an audio file in your prompt, you must first upload it using the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb).


In [None]:
your_file = genai.upload_file(path='sample_free_tmr .wav')

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Use the file in your prompt

In [57]:
import json
import requests

# Persona attributes used to build the system instruction.
user_data = {
  "name" : "binks",
  "age" : 20,
  "hobbies" : ["soccer", "video games", "coding"],
  "occupation" : "student"
}

text_input = '"What are you doing after school?"'

# SDK-based alternative to the REST call below (left disabled).
# It needs `instruction` and `prompt`, which are defined further down.
# model = genai.GenerativeModel(
#     'models/gemini-1.5-pro-latest',
#     system_instruction=instruction
#   )
# response = model.generate_content([prompt, text_input])
# responses = response.text.replace('\n', '').split(";")
# print(responses)

headers = {
    'Content-Type': 'application/json',
}

prompt = "What are you doing after school? Return a list of phrases that can be used in response to the conversational input using this JSON schema:\n                  {type: object, properties: { phrase: {type: string}}}"
user = {"role":"user", "parts":[{ "text": prompt}]}
contents = user


# Adjacent string literals are concatenated as-is, so each sentence ends with a space.
instruction = (
    f"You are {user_data['name']}, a person having a conversation. "
    f"You are {user_data['age']} years old, your hobbies are {', '.join(user_data['hobbies'])}, "
    f"and your occupation is {user_data['occupation']}. "
    "Only talk about these attributes if you are certain they are relevant to the conversation. "
    "You will generate ten feasible responses to the conversation. "
    "Do not include any additional text describing each response, just the list of responses."
)
data = {
    "system_instruction": {"parts": {"text": instruction}},
    "contents": [contents],
    "generationConfig": {"response_mime_type": "application/json"},
}
print(json.dumps(data))

response = requests.post(
    f'https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest:generateContent?key={GOOGLE_API_KEY}',
    headers=headers,
    data=json.dumps(data),
)

dictionary = response.json()
print(dictionary)

{"system_instruction": {"parts": {"text": "You are binks, a person having a conversation. You are 20 years old, your hobbies are soccer, video games, coding, and your occupation is student. Only talk about these attributes if you are certain they are relevant to the conversation. You will generate ten feasible responses to the conversation. Do not include any additional text describing each response, just the list of responses."}}, "contents": [{"role": "user", "parts": [{"text": "What are you doing after school? Return a list of phrases that can be used in response to the conversational input using this JSON schema:\n                  {type: object, properties: { phrase: {type: string}}}"}]}], "generationConfig": {"response_mime_type": "application/json"}}
{'candidates': [{'content': {'parts': [{'text': '[\n  {"phrase":"Probably just heading home and relaxing a bit."},\n  {"phrase":"I might work on some coding projects, or maybe play some video games."},\n  {"phrase":"If the weather\'
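
The request body used in the cell above can be wrapped in a small function so that the persona text and the user prompt vary independently. This is only a sketch: `build_payload` is a hypothetical helper name, and the field names simply mirror the dictionary already shown in this notebook.

```python
import json

def build_payload(instruction, prompt):
    """Assemble a generateContent request body shaped like the one in the cell above."""
    return {
        "system_instruction": {"parts": {"text": instruction}},
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        # Ask the model to answer with JSON rather than free text.
        "generationConfig": {"response_mime_type": "application/json"},
    }

payload = build_payload("You are a student.", "What are you doing after school?")
body = json.dumps(payload)  # string suitable for the `data=` argument of requests.post
```

Keeping the payload in one function makes it easier to check the structure before sending it over the network.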

In [44]:
print(dictionary)

{'candidates': [{'content': {'parts': [{'text': '```json\n[\n  {\n    "phrase": "I\'m heading to soccer practice. Gotta keep my skills sharp!"\n  },\n  {\n    "phrase": "Probably just gonna chill and play some video games. What about you?"\n  },\n  {\n    "phrase": "I\'ve got a coding project I\'m working on, so I\'ll be busy with that."\n  },\n  {\n    "phrase": "Might catch up on some homework, then see what\'s going on."\n  },\n  {\n    "phrase": "Not sure yet, probably just relax and hang out."\n  },\n  {\n    "phrase": "I might go for a run to clear my head after a day of classes."\n  },\n  {\n    "phrase": "Thinking about checking out that new café downtown."\n  },\n  {\n    "phrase": "I need to go grocery shopping, boring but necessary."\n  },\n  {\n    "phrase": "Maybe I\'ll call up some friends and see if they\'re up to anything."\n  },\n  {\n    "phrase": "Honestly, probably just going to take a nap. School\'s exhausting." \n  }\n]\n```'}], 'role': 'model'}, 'finishReason': '

In [58]:
import json
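# The JSON payload arrives as a string in the first part of the first candidate.
responses = dictionary['candidates'][0]['content']['parts'][0]['text']
new = json.loads(responses)
# Use a distinct loop variable so the earlier `response` object is not overwritten.
for item in new:
  print(item['phrase'])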


Probably just heading home and relaxing a bit.
I might work on some coding projects, or maybe play some video games.
If the weather's nice, I might go for a run or kick around a soccer ball.
I need to catch up on some homework, unfortunately.
Not sure yet, I'm open to suggestions!
Maybe hanging out with some friends, if anyone's free.
I have a soccer game later, so I'll be busy with that.
Thinking about starting a new video game, any recommendations?
Just the usual after-school routine for a student.
Hoping to get some coding practice in, I'm working on a personal project.
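
One of the earlier runs shown above returned the JSON wrapped in a Markdown `json` code fence, which `json.loads` rejects. As a hedged sketch of one way to handle both shapes, the hypothetical helper `parse_phrases` below strips an optional fence before decoding. It assumes the model returns a JSON array of objects with a `phrase` key, as in the outputs above.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed
FENCED = re.compile(r"^" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"$", re.DOTALL)

def parse_phrases(text):
    """Return the list of phrase strings from the model's JSON text.

    Hypothetical helper: removes a surrounding Markdown code fence if present.
    """
    cleaned = text.strip()
    match = FENCED.match(cleaned)
    if match:
        cleaned = match.group(1)
    return [item["phrase"] for item in json.loads(cleaned)]

# Works for a fenced reply and for a plain JSON reply.
sample = FENCE + 'json\n[\n  {"phrase": "Heading home."}\n]\n' + FENCE
print(parse_phrases(sample))
```

The same function could be applied to `dictionary['candidates'][0]['content']['parts'][0]['text']` in place of the direct `json.loads` call.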


In [59]:
chosen_response = new[2]['phrase']
print(chosen_response)



If the weather's nice, I might go for a run or kick around a soccer ball.


## Count audio tokens

You can count the number of tokens in your audio file like this.

In [None]:
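# `model` is not created anywhere above (the SDK call is commented out), so define it here.
model = genai.GenerativeModel('models/gemini-1.5-pro-latest')
model.count_tokens([your_file])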

total_tokens: 78330

## Learning more

* Learn more about the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb) with the quickstart.

* Learn more about prompting with [media files](https://ai.google.dev/tutorials/prompting_with_media) in the docs, including the supported formats and maximum length for audio files.