# Gemini API: Audio Quickstart

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Audio.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

This notebook provides an example of how to prompt Gemini 1.5 Pro using an audio file. In this case, you'll use a [sound recording](https://www.jfklibrary.org/asset-viewer/archives/jfkwha-006) of President John F. Kennedy’s 1961 State of the Union address.

In [None]:
!pip install -q -U google-generativeai


In [3]:
import google.generativeai as genai

## Configure your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.

In [4]:
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

genai.configure(api_key=GOOGLE_API_KEY)
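
Outside Colab, `google.colab.userdata` is not available. One possible alternative, sketched below, is to read the key from an environment variable. The helper name `load_api_key` is hypothetical and is not part of any Google library; it only wraps a dictionary-style lookup.

```python
import os


def load_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or blank.

    `load_api_key` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"Set {name} before calling genai.configure().")
    return key


# Demonstration with an explicit mapping, so no real key is needed.
print(load_api_key({"GOOGLE_API_KEY": "demo-key"}))
```

The returned value can then be passed to `genai.configure(api_key=...)` in place of the Colab Secret lookup.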

## Upload an audio file with the File API

To use an audio file in your prompt, you must first upload it using the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb).


In [None]:
your_file = genai.upload_file(path='sample_free_tmr .wav')

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Use the file in your prompt

In [23]:
user = {
  "name" : "binks",
  "age" : 20,
  "hobbies" : ["soccer", "video games", "coding"],
  "occupation" : "student"
}



prompt = '"What are you doing after school? Return a list of phrases that can be used in response to the conversational input using this JSON schema\n                  {type: object, properties: { phrase: {type: string}}}"'
instruction = (
    f"You are {user['name']}, and you are having a conversation. "
    f"You are {user['age']} years old, your hobbies are {', '.join(user['hobbies'])}, and your occupation is {user['occupation']}. "
    "Only talk about these attributes if you are certain they are relevant to the conversation. "
    "You will generate ten feasible responses to the conversation. "
    "Do not include any additional text describing each response, just the list of responses. "
    "Return the responses as semicolon-separated values, with no newline characters or \n."
)

# instruction = (
#     f"You are {user['name']}."
# )

text_input = '"What are you doing after school?"'

# model = genai.GenerativeModel(
#     'models/gemini-1.5-pro-latest',
#     system_instruction=instruction
#   )
# response = model.generate_content([prompt, text_input])
# responses = response.text.replace('\n', '').split(";")
# print(responses)
import requests

headers = {
    'Content-Type': 'application/json',
}
data = '{ "system_instruction": {\n    "parts":\n      { "text": ' + f'"{instruction}"' + '}},\n      "contents": [{\n            "parts": [\n              {\n                "text": '+ f'{prompt}' + '\n              }\n            ]\n          }],\n          "generationConfig": {\n            "response_mime_type": "application/json",\n          }\n        }'

# data = '{ "system_instruction": {\n    "parts":\n      { "text": instruction}},\n      "contents": [{\n            "parts": [\n              {\n                "text": "List 5 popular cookie recipes using this JSON schema:\n                  {type: object, properties: { recipe_name: {type: string}}}"\n              }\n            ]\n          }],\n          "generationConfig": {\n            "response_mime_type": "application/json",\n          }\n        }'

response = requests.post(
    f'https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest:generateContent?key={GOOGLE_API_KEY}',
    headers=headers,
    data=data,
)

dictionary = response.json()




print(dictionary)

{'candidates': [{'content': {'parts': [{'text': '[\n  {"phrase":"I\'m not sure yet, probably just relaxing or hanging out with friends."},\n  {"phrase":"I might play some video games or work on some coding projects."},\n  {"phrase":"If the weather\'s nice, I might go for a bike ride or play some soccer."},\n  {"phrase":"I have to do some homework, but after that, I\'m free."},\n  {"phrase":"I\'m going to grab a bite to eat with some friends."},\n  {"phrase":"I have a club meeting after school."},\n  {"phrase":"I\'m going to the gym to work out."},\n  {"phrase":"I\'m just going to head home and chill."},\n  {"phrase":"I have a shift at work."},\n  {"phrase":"I\'m going to catch up on some reading."}\n]\n'}], 'role': 'model'}, 'finishReason': 'STOP', 'index': 0, 'safetyRatings': [{'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE'}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE'}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGL
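
The request body above is assembled by string concatenation, which depends on the instruction and prompt containing no characters that need JSON escaping. A less fragile sketch is to build a dictionary and let the JSON encoder handle quoting. The field names below mirror the ones used in the cell above; `build_request_body` is a hypothetical helper name, and the sample instruction is made up for the demonstration.

```python
import json


def build_request_body(instruction, prompt):
    """Assemble a generateContent request body as a plain dict.

    Hypothetical helper; the keys are copied from the hand-written
    JSON string used earlier in this notebook.
    """
    return {
        "system_instruction": {"parts": {"text": instruction}},
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"response_mime_type": "application/json"},
    }


body = build_request_body(
    instruction='You are "binks".\nAnswer briefly.',  # quotes and a newline
    prompt="What are you doing after school?",
)

# json.dumps escapes the embedded quotes and newline, so the text round-trips.
encoded = json.dumps(body)
assert json.loads(encoded) == body
print(encoded)
```

With `requests`, the dictionary can be sent directly with `requests.post(url, json=body)`, which serializes it and sets the `Content-Type` header.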

In [44]:
import json
responses = dictionary['candidates'][0]['content']['parts'][0]['text']
new = json.loads(responses)
for response in new:
  print(response['phrase'])



I'm not sure yet, probably just relaxing or hanging out with friends.
I might play some video games or work on some coding projects.
If the weather's nice, I might go for a bike ride or play some soccer.
I have to do some homework, but after that, I'm free.
I'm going to grab a bite to eat with some friends.
I have a club meeting after school.
I'm going to the gym to work out.
I'm just going to head home and chill.
I have a shift at work.
I'm going to catch up on some reading.
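
The indexing in the cell above assumes the response always contains a candidate whose text is a valid JSON list. A more defensive sketch, assuming the same response shape as the output printed earlier, returns an empty list when something is missing instead of raising. `extract_phrases` is a hypothetical helper, and the sample dictionaries are hand-written for the demonstration.

```python
import json


def extract_phrases(api_response):
    """Return the list of phrases in a response dict, or [] if none are found."""
    candidates = api_response.get("candidates") or []
    if not candidates:
        return []
    parts = candidates[0].get("content", {}).get("parts") or []
    text = "".join(part.get("text", "") for part in parts)
    try:
        items = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    return [
        item["phrase"]
        for item in items
        if isinstance(item, dict) and "phrase" in item
    ]


# Hand-written sample in the same shape as the printed response.
sample = {
    "candidates": [
        {"content": {"parts": [{"text": '[{"phrase": "I have a club meeting after school."}]'}]}}
    ]
}
print(extract_phrases(sample))
print(extract_phrases({}))  # a response with no candidates
```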


## Count audio tokens

You can count the number of tokens in your audio file like this.

In [None]:
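model = genai.GenerativeModel('models/gemini-1.5-pro-latest')  # the earlier definition is commented out
model.count_tokens([your_file])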

total_tokens: 78330
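
To sanity-check a token count before uploading, you can estimate it from the length of the recording. The sketch below assumes a rate of 32 tokens per second of audio, the figure given in the Gemini documentation at the time of writing; treat it as an assumption and confirm it against the current docs. The helper names are hypothetical, and the demo writes its own short silent WAV file rather than using the notebook's audio.

```python
import math
import os
import tempfile
import wave

TOKENS_PER_SECOND = 32  # assumed rate; check the current Gemini docs


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimate_audio_tokens(seconds, tokens_per_second=TOKENS_PER_SECOND):
    """Estimate the token count for `seconds` of audio, rounded up."""
    return math.ceil(seconds * tokens_per_second)


# Write two seconds of 16-bit mono silence to a temporary file for the demo.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    with wave.open(demo_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)      # 2 bytes per sample
        wav.setframerate(16000)  # 16 kHz
        wav.writeframes(b"\x00\x00" * 16000 * 2)
    seconds = wav_duration_seconds(demo_path)

print(seconds, estimate_audio_tokens(seconds))
```

Under the same assumption, the 78330 tokens reported above would correspond to roughly 41 minutes of audio.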

## Learning more

* Learn more about the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb) with the quickstart.

* Learn more about prompting with [media files](https://ai.google.dev/tutorials/prompting_with_media) in the docs, including the supported formats and maximum length for audio files.