# Gemini API: Audio Quickstart

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Audio.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

This notebook provides an example of how to prompt Gemini 1.5 Pro using an audio file. In this case, you'll use a [sound recording](https://www.jfklibrary.org/asset-viewer/archives/jfkwha-006) of President John F. Kennedy’s 1961 State of the Union address.

In [None]:
!pip install -q -U google-generativeai


In [4]:
import google.generativeai as genai

## Configure your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.

In [5]:
from google.colab import userdata

# Read the key from the Colab Secret named GOOGLE_API_KEY.
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')

genai.configure(api_key=GOOGLE_API_KEY)
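
Outside Colab, `google.colab.userdata` is not available. One common alternative is to read the key from an environment variable. The following is a minimal sketch, assuming the variable is also named `GOOGLE_API_KEY`; the helper `load_api_key` is a hypothetical name used only for illustration and is not part of any library.

```python
import os


def load_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key from an environment mapping, or raise if it is missing.

    `load_api_key` is a hypothetical helper; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook.")
    return key


# Example with an explicit mapping (a placeholder value, not a real key).
api_key = load_api_key({"GOOGLE_API_KEY": "placeholder-key"})
print(api_key)
```

The returned value can then be passed to `genai.configure(api_key=...)` in the same way as the Colab Secret.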

## Upload an audio file with the File API

To use an audio file in your prompt, you must first upload it using the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb).


In [None]:
your_file = genai.upload_file(path='sample_free_tmr .wav')

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Use the file in your prompt

In [93]:
import json

user_data = {
  "name" : "binks",
  "age" : 20,
  "hobbies" : ["soccer", "video games", "coding"],
  "occupation" : "student"
}





# instruction = (
#     f"You are {user['name']}."
# )

text_input = '"What are you doing after school?"'

# model = genai.GenerativeModel(
#     'models/gemini-1.5-pro-latest',
#     system_instruction=instruction
#   )
# response = model.generate_content([prompt, text_input])
# responses = response.text.replace('\n', '').split(";")
# print(responses)
import requests

headers = {
    'Content-Type': 'application/json',
}
json_schema = "Return a list of phrases that can be used in response to the conversational input using this JSON schema:\n                  {type: object, properties: { phrase: {type: string}}}"
# Use a name other than `input` so the Python builtin is not shadowed.
user_input = "What are you doing after school?"
prompt = f"{user_input} {json_schema}"
user = {"role":"user", "parts":[{ "text": prompt}]}
contents = [user]


# Adjacent string literals are concatenated as-is, so each sentence ends with a space.
instruction = (
    f"You are {user_data['name']} that is having a conversation. "
    f"You are {user_data['age']} years old and enjoy {user_data['hobbies']} hobbies and your occupation is {user_data['occupation']}. "
    "Only talk about these attributes if you are certain they are relevant to the conversation. "
    "You will generate ten feasible responses to the conversation. "
    "Do not include any additional text describing each response, just the list of responses."
)
data = {"system_instruction": {"parts": { "text": instruction}}, "contents": contents, "generationConfig": {"response_mime_type": "application/json",}}
print(json.dumps(data))

# data = '{ "system_instruction": {\n    "parts":\n      { "text": ' + f'"{instruction}"' + '}},\n      "contents": [{\n            "parts": [\n              {\n                "text": '+ f'{prompt}' + '\n              }\n            ]\n          }],\n          "generationConfig": {\n            "response_mime_type": "application/json",\n          }\n        }'


response = requests.post(
    f'https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest:generateContent?key={GOOGLE_API_KEY}',
    headers=headers,
    data=json.dumps(data),
)
# Fail early on HTTP errors instead of parsing an error body as a result.
response.raise_for_status()

dictionary = response.json()

print(dictionary)

{"system_instruction": {"parts": {"text": "You are binks that is having a conversation. You are 20 years old and enjoy ['soccer', 'video games', 'coding'] hobbies and your occupation is student. Only talk about these attributes if you are certain they are relevant to the conversation. You will generate ten feasible responses to the conversation. Do not include any additional text describing each response, just the list of responses."}}, "contents": [{"role": "user", "parts": [{"text": "What are you doing after school? Return a list of phrases that can be used in response to the conversational input using this JSON schema:\n                  {type: object, properties: { phrase: {type: string}}}"}]}], "generationConfig": {"response_mime_type": "application/json"}}
{'candidates': [{'content': {'parts': [{'text': '[\n  {"phrase": "I haven\'t decided yet. Do you have any suggestions?"},\n  {"phrase": "I might hang out with some friends. What about you?"},\n  {"phrase": "Probably going to work on
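
The request body in this notebook is assembled by hand each time a request is sent. As a sketch of how that step could be factored out, the hypothetical helper `build_payload` below produces the same three top-level fields used above (`system_instruction`, `contents`, `generationConfig`). The helper name and the sample instruction text are assumptions for illustration only.

```python
import json


def build_payload(instruction, contents):
    """Assemble a generateContent request body that asks for a JSON response.

    `build_payload` is a hypothetical helper mirroring the dict built in the notebook.
    """
    return {
        "system_instruction": {"parts": {"text": instruction}},
        # Copy the list so later appends to the caller's history do not alter this payload.
        "contents": list(contents),
        "generationConfig": {"response_mime_type": "application/json"},
    }


example = build_payload(
    "You are a student having a conversation.",
    [{"role": "user", "parts": [{"text": "What are you doing after school?"}]}],
)
print(json.dumps(example, indent=2))
```

The resulting dict can be serialized with `json.dumps` and sent with `requests.post` exactly as in the cell above.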

In [94]:
import json
model_responses = json.loads(dictionary['candidates'][0]['content']['parts'][0]['text'])
print(model_responses)
text = model_responses
chosen_response = model_responses[0]['phrase']

[{'phrase': "I haven't decided yet. Do you have any suggestions?"}, {'phrase': 'I might hang out with some friends. What about you?'}, {'phrase': 'Probably going to work on some coding projects.'}, {'phrase': "Maybe I'll play some video games to unwind."}, {'phrase': 'I have soccer practice later today.'}, {'phrase': 'Not sure, it depends on what homework I have.'}, {'phrase': "I'm open to ideas, what are you up to?"}, {'phrase': 'Thinking about going to the gym.'}, {'phrase': 'Might just relax and watch some TV.'}, {'phrase': 'I need to catch up on some studying.'}]
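
Indexing straight into `dictionary['candidates'][0]['content']['parts'][0]['text']` raises an exception if the response has no candidates or the text is not valid JSON. The sketch below shows one defensive way to pull out the phrases; `extract_phrases` is a hypothetical helper, and returning an empty list on failure is a design assumption rather than anything the API prescribes.

```python
import json


def extract_phrases(api_response):
    """Return the phrase strings from a generateContent response dict.

    Hypothetical helper: returns an empty list when the expected fields
    are missing or the model text is not valid JSON.
    """
    candidates = api_response.get("candidates") or []
    if not candidates:
        return []
    parts = candidates[0].get("content", {}).get("parts", [])
    if not parts:
        return []
    try:
        items = json.loads(parts[0].get("text", ""))
    except json.JSONDecodeError:
        return []
    if isinstance(items, dict):
        items = [items]  # tolerate a single object instead of a list
    if not isinstance(items, list):
        return []
    return [item["phrase"] for item in items if isinstance(item, dict) and "phrase" in item]


# A hand-written response in the same shape as the output above.
sample = {
    "candidates": [
        {"content": {"parts": [{"text": '[{"phrase": "I have soccer practice later today."}]'}]}}
    ]
}
phrases = extract_phrases(sample)
print(phrases)
```

An empty result can then be treated as a signal to retry the request or to inspect the raw response.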


In [95]:
print(chosen_response)

I haven't decided yet. Do you have any suggestions?


In [96]:
# model = {"role":"model", "parts":[{ "text": "Probably going to hang out with some friends."}]}
# contents.append(model)
# Use a name other than `input` so the Python builtin is not shadowed.
user_input = "Do you want to get ice cream?"
prompt = f"You respond with {chosen_response}. The person you are talking with then says, {user_input} {json_schema}"
user = {"role":"user", "parts":[{ "text": prompt}]}

contents.append(user)
data = {"system_instruction": {"parts": { "text": instruction}}, "contents": contents, "generationConfig": {"response_mime_type": "application/json",}}

response = requests.post(
    f'https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest:generateContent?key={GOOGLE_API_KEY}',
    headers=headers,
    data=json.dumps(data),
)
# Fail early on HTTP errors instead of parsing an error body as a result.
response.raise_for_status()

print(contents[1]['parts'])
dictionary = response.json()

print(dictionary)

[{'text': "You respond with I haven't decided yet. Do you have any suggestions?. The person you are talking with then says, Do you want to get ice cream? Return a list of phrases that can be used in response to the conversational input using this JSON schema:\n                  {type: object, properties: { phrase: {type: string}}}"}]
{'candidates': [{'content': {'parts': [{'text': '[\n  {\n    "phrase": "Ice cream sounds great!"\n  },\n  {\n    "phrase": "I\'d love to, thanks for the suggestion!"\n  },\n  {\n    "phrase": "Sure, I\'m always down for ice cream."\n  },\n  {\n    "phrase": "Yeah, let\'s go get some ice cream."\n  },\n  {\n    "phrase": "That\'s a great idea, I\'m craving something sweet."\n  },\n  {\n    "phrase": "I could definitely go for some ice cream right now."\n  },\n  {\n    "phrase": "Is there a particular place you had in mind?"\n  },\n  {\n    "phrase": "What kind of ice cream are you thinking of getting?"\n  },\n  {\n    "phrase": "Let\'s do it! I haven\'t ha

In [97]:
model_responses = json.loads(dictionary['candidates'][0]['content']['parts'][0]['text'])
print(model_responses)
chosen_response = model_responses[0]['phrase']
print(chosen_response)

[{'phrase': 'Ice cream sounds great!'}, {'phrase': "I'd love to, thanks for the suggestion!"}, {'phrase': "Sure, I'm always down for ice cream."}, {'phrase': "Yeah, let's go get some ice cream."}, {'phrase': "That's a great idea, I'm craving something sweet."}, {'phrase': 'I could definitely go for some ice cream right now.'}, {'phrase': 'Is there a particular place you had in mind?'}, {'phrase': 'What kind of ice cream are you thinking of getting?'}, {'phrase': "Let's do it! I haven't had ice cream in ages."}, {'phrase': "Awesome, I'm in the mood for something cold and refreshing."}]
Ice cream sounds great!
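
The cells above fold the chosen reply into the next user prompt. An alternative, hinted at by the commented-out `model` turn, is to record each reply as its own turn with the `model` role. The following is a minimal sketch of that bookkeeping; `make_turn` is a hypothetical helper, and whether this produces better responses than the prompt-folding approach is not something this notebook tests.

```python
def make_turn(role, text):
    """Build one conversation turn in the shape used by the `contents` list.

    Hypothetical helper; only the "user" and "model" roles are accepted here.
    """
    if role not in ("user", "model"):
        raise ValueError(f"Unsupported role: {role!r}")
    return {"role": role, "parts": [{"text": text}]}


# Rebuild the two-step exchange from this notebook as explicit turns.
history = [
    make_turn("user", "What are you doing after school?"),
    make_turn("model", "I haven't decided yet. Do you have any suggestions?"),
    make_turn("user", "Do you want to get ice cream?"),
]

for turn in history:
    print(turn["role"], "->", turn["parts"][0]["text"])
```

A list built this way can be passed as `contents` in the request body in place of the hand-built prompt string.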


## Count audio tokens

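You can count the number of tokens in your audio file with `count_tokens`.

In [None]:
# The cells above call the REST endpoint directly, so create a model object here.
model = genai.GenerativeModel('models/gemini-1.5-pro-latest')
model.count_tokens([your_file])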

total_tokens: 78330
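
As a rough sanity check, a token count can be converted back into an approximate audio duration. The sketch below assumes a rate of 32 tokens per second of audio, which is the figure given in the Gemini audio documentation at the time of writing; treat it as an assumption that may change, and note that the helper `estimated_audio_seconds` is hypothetical.

```python
# Assumption: the Gemini docs describe audio as 32 tokens per second.
TOKENS_PER_SECOND = 32


def estimated_audio_seconds(total_tokens, tokens_per_second=TOKENS_PER_SECOND):
    """Estimate audio length in seconds from a token count (hypothetical helper)."""
    return total_tokens / tokens_per_second


seconds = estimated_audio_seconds(78330)
print(f"{seconds:.1f} seconds, or about {seconds / 60:.1f} minutes")
```

Under that assumption, the count shown above corresponds to roughly 41 minutes of audio.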

## Learning more

* Learn more about the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb) with the quickstart.

* Learn more about prompting with [media files](https://ai.google.dev/tutorials/prompting_with_media) in the docs, including the supported formats and maximum length for audio files.