# Gemini API: Audio Quickstart

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Audio.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

This notebook provides an example of how to prompt Gemini 1.5 Pro using an audio file. In this case, you'll use a [sound recording](https://www.jfklibrary.org/asset-viewer/archives/jfkwha-006) of President John F. Kennedy’s 1961 State of the Union address.

In [157]:
!pip install -q -U google-generativeai

In [158]:
import google.generativeai as genai

## Configure your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.
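Colab Secrets are only available inside Colab. If you run the notebook elsewhere, one possible fallback is to read the key from an environment variable. The sketch below assumes the variable is also called `GOOGLE_API_KEY`; `load_api_key` is a hypothetical helper, not part of the SDK.

```python
import os


def load_api_key(env_var="GOOGLE_API_KEY"):
    """Return the API key stored in an environment variable (hypothetical helper)."""
    # Strip stray whitespace that often sneaks in when a key is pasted.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With this helper, the configuration step would become `genai.configure(api_key=load_api_key())`.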

In [159]:


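from google.colab import userdata

# Read the key from the Colab Secret described above.
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)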

## Upload an audio file with the File API

To use an audio file in your prompt, you must first upload it using the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb).


In [160]:
URL = "https://storage.googleapis.com/generativeai-downloads/data/State_of_the_Union_Address_30_January_1961.mp3"

In [161]:
!wget -q $URL -O sample.mp3



In [162]:
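# Upload the two local recordings used in the prompt below:
# input.mp3 holds the mantra, chant.mp3 holds the full chant.
input_file = genai.upload_file(path='input.mp3', display_name="input")
chant = genai.upload_file(path='chant.mp3', display_name="chant")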
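Before uploading, it can help to confirm that each path exists and looks like an audio file, because the upload cell refers to local files by name. The following is a minimal sketch using only the standard library; `check_audio_file` is a hypothetical helper and is not part of the File API.

```python
import mimetypes
from pathlib import Path


def check_audio_file(path):
    """Return the guessed MIME type of a local audio file, or raise if unusable."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No such file: {p}")
    # Guess the type from the file extension only; the contents are not inspected.
    mime_type, _ = mimetypes.guess_type(p.name)
    if mime_type is None or not mime_type.startswith("audio/"):
        raise ValueError(f"{p.name} does not look like an audio file")
    return mime_type
```

Calling `check_audio_file('chant.mp3')` before `genai.upload_file` surfaces a missing or misnamed file early, instead of as an upload error.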

In [163]:
import json

# Specify the path to your JSON file
file_path = 'constrain.json'

# Open the JSON file for reading
with open(file_path, 'r') as file:
    data = json.load(file)

# Now `data` is a Python dictionary containing the data from the JSON file
print(data)


{'name': 'set_output_json_format', 'parameters': {'type': 'object', 'properties': {'count': {'type': 'string', 'description': 'Represents the number of times the user has chanted correctly'}, 'mantra': {'type': 'string', 'description': 'Represents the mantra chanted'}}}}


In [164]:
def make_strict_json(response):
    "converts the function to a stru"
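The body of `make_strict_json` is cut off above, so its intended behavior is unknown. As one plausible reading only, the sketch below turns free-form model text into a dictionary that holds exactly the `count` and `mantra` string fields from the schema printed earlier. The name `to_strict_json` and its behavior are assumptions, not the author's implementation.

```python
import json

# Property names taken from the schema printed above.
SCHEMA_KEYS = ("count", "mantra")


def to_strict_json(text, keys=SCHEMA_KEYS):
    """Parse model text into a dict with exactly the expected string fields."""
    # Keep only the outermost {...} span, in case the model adds prose around it.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in the response text")
    parsed = json.loads(text[start:end + 1])
    missing = [k for k in keys if k not in parsed]
    if missing:
        raise ValueError(f"Missing keys: {missing}")
    # The schema declares both fields as strings, so coerce and drop extras.
    return {k: str(parsed[k]) for k in keys}
```

For example, `to_strict_json('Result: {"count": 14, "mantra": "om"}')` returns `{'count': '14', 'mantra': 'om'}`.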


## Use the file in your prompt

In [165]:
prompt = "Listen carefully to the following audio file. How many times did the input mantra occur in the chant? Return only the count and absolutely nothing else"
model = genai.GenerativeModel('models/gemini-1.5-pro-latest')
response = model.generate_content([prompt, input_file, chant])
print(response.text)

14 



## Count audio tokens

You can count the number of tokens in your audio files with `count_tokens`.

In [166]:
model.count_tokens([input_file, chant])

total_tokens: 1120
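A token count can be turned into a rough audio duration. The sketch below assumes a rate of 32 tokens per second of audio, which is the figure given in the Gemini documentation at the time of writing; treat the constant as an assumption and check the current docs. `estimated_audio_seconds` is a hypothetical helper.

```python
# Assumed rate: 32 tokens per second of audio (verify against the current docs).
AUDIO_TOKENS_PER_SECOND = 32


def estimated_audio_seconds(total_tokens, tokens_per_second=AUDIO_TOKENS_PER_SECOND):
    """Estimate how many seconds of audio a token count corresponds to."""
    return total_tokens / tokens_per_second


print(estimated_audio_seconds(1120))  # 35.0
```

Under that assumption, the 1120 tokens reported above correspond to about 35 seconds of audio across the two files.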

## Learning more

* Learn more about the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb) with the quickstart.

* Learn more about prompting with [media files](https://ai.google.dev/tutorials/prompting_with_media) in the docs, including the supported formats and maximum length for audio files.