# Gemini 1.5 Pro: Native Audio, File API, JSON Mode & 1M Context Window

This notebook shows how to prompt Gemini 1.5 Pro with an audio file and then perform retrieval-augmented generation (RAG) over the resulting text descriptions using a SingleStore database.

In [5]:
!pip install -q -U google-generativeai langchain langchain-google-genai langchain-openai langchain-community sqlalchemy 

In [7]:
import google.generativeai as genai

## Configure your API key

Create an API key at aistudio.google.com and set it below.

In [31]:
import os

os.environ['GOOGLE_API_KEY'] = 'api-key-here'  # placeholder: replace with your own key

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
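
As an alternative to writing the key into the notebook, the sketch below reads it from an environment variable that was set outside the notebook and fails early if it is missing. The helper name `load_api_key` is hypothetical and not part of the `google-generativeai` package.

```python
import os


def load_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Raises RuntimeError if the variable is unset or empty, so a missing
    key is reported before the first API call is made.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage in this notebook (assumes the key was exported beforehand):
# genai.configure(api_key=load_api_key())
```

This keeps the real key out of the saved notebook, which matters if the notebook is shared or committed.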

## Upload an audio file with the File API

To use an audio file in your prompt, you must first upload it using the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb).


In [28]:
URL = "https://ia803402.us.archive.org/14/items/lp_mozart-divertimento17-k-334-horn-quintet-k_wolfgang-amadeus-mozart-members-of-the-ber/disc1/01.03.%20Divertmento%20In%20D%20Major%2C%20K.%20334%20Menuetto.mp3"

In [97]:
!wget -q $URL -O sample.mp3

In [98]:
your_file = genai.upload_file(path='sample.mp3')
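
The upload above relies on the File API inferring the file type. A minimal sketch of a more defensive upload follows; it checks that the path looks like an audio file and passes the MIME type explicitly. The helper names are hypothetical, and the sketch assumes `genai.configure(...)` has already been called.

```python
import mimetypes
from pathlib import Path


def guess_audio_mime_type(path: str) -> str:
    """Guess the MIME type from the file extension and require audio/*."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("audio/"):
        raise ValueError(f"{path!r} does not look like an audio file")
    return mime_type


def upload_audio(path: str):
    """Upload a local audio file with an explicit MIME type."""
    # Imported lazily so the pure helper above works without the package.
    import google.generativeai as genai

    if not Path(path).is_file():
        raise FileNotFoundError(path)
    return genai.upload_file(path=path, mime_type=guess_audio_mime_type(path))


# Usage: your_file = upload_audio("sample.mp3")
```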

## Use the file in your prompt

In [68]:
prompt = "Listen carefully to the following audio file. Provide a one sentence summary."
model = genai.GenerativeModel('models/gemini-1.5-pro-latest')
multimodal_response = model.generate_content([prompt, your_file])
print(multimodal_response.text)

The audio file contains a classical music piece, seemingly for orchestra, featuring repeating melodic phrases and a dramatic crescendo.



## JSON Mode

You can also ask the model to return output that follows a specific JSON schema. Here the schema is described directly in the prompt.

In [81]:
prompt = """List the music genre and the instruments in the piece in JSON format.

Use this JSON schema:

MusicalDescription = {'genre': str, 'instruments': list[str]}
Return: MusicalDescription"""

response = model.generate_content(
    contents=[prompt, your_file],
)
# The reply text contains the JSON (it may be wrapped in a Markdown code fence).
print(response.text)

```json
{
  "genre": "Classical",
  "instruments": ["oboe", "bassoon", "strings", "French horn"]
}
```
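
The reply above arrives wrapped in a Markdown code fence, so it cannot be passed to `json.loads` as is. The sketch below strips an optional fence before parsing, and also shows one way to request raw JSON through the `response_mime_type` generation setting. It assumes a version of `google-generativeai` that supports that setting; the helper names `parse_json_reply` and `generate_json` are hypothetical.

```python
import json
import re

# Matches a reply that is entirely wrapped in a ``` or ```json fence.
_FENCE_RE = re.compile(r"^```(?:json)?\s*\n(.*?)\n?```\s*$", re.DOTALL)


def parse_json_reply(text: str):
    """Parse a model reply as JSON, removing a surrounding code fence if present."""
    cleaned = text.strip()
    match = _FENCE_RE.match(cleaned)
    if match:
        cleaned = match.group(1)
    return json.loads(cleaned)


def generate_json(model, contents):
    """Ask the model for a JSON reply and return the parsed result."""
    response = model.generate_content(
        contents,
        # Assumption: the installed SDK accepts this generation setting.
        generation_config={"response_mime_type": "application/json"},
    )
    return parse_json_reply(response.text)


# Usage: description = generate_json(model, [prompt, your_file])
```

Parsing through `parse_json_reply` in both cases means the calling code does not need to know whether the model added a fence.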


## RAG over audio files using SingleStoreDB

Next, we embed the text descriptions of the audio file(s) and store them in SingleStoreDB. This allows us to search for and retrieve relevant files for RAG later.

In [22]:
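import os

# SingleStoreDB lives in langchain_community in current LangChain releases
from langchain_community.vectorstores import SingleStoreDB
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

# The connection_* variables are not defined in this notebook; they are assumed
# to be provided by the SingleStore notebook environment.
os.environ["SINGLESTOREDB_URL"] = f'{connection_user}:{connection_password}@{connection_host}:{connection_port}/{connection_default_database}'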
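
If the password contains characters such as `@`, `:` or `/`, the plain f-string above produces an ambiguous URL. A minimal sketch that percent-encodes the credentials first is shown below. The function name and the sample values are hypothetical, and it assumes the driver decodes percent-encoded credentials as standard URLs do.

```python
from urllib.parse import quote_plus


def build_singlestore_url(user: str, password: str, host: str,
                          port: int, database: str) -> str:
    """Build a user:password@host:port/database string with encoded credentials."""
    return f"{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


# Example with made-up values:
url = build_singlestore_url("admin", "p@ss:word/1", "localhost", 3306, "demo")
print(url)
```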

In [85]:
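from langchain_core.documents import Document

# Text descriptions of other audio files (assumed to be the intended seed data)
sample_data = ["a rap song, with a bouncy beat", "a high-energy EDM track"]
docs = [Document(page_content=text) for text in sample_data]

# Load documents to the store
docsearch = SingleStoreDB.from_documents(
    docs,
    embeddings,
    table_name="notebook2",  # use table with a custom name
)

In [89]:
# Add the Gemini-generated description of the uploaded audio file
docsearch.add_texts([multimodal_response.text])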

In [90]:
query = "I want to get all audio files related to beethoven"
docs = docsearch.similarity_search(query)  # Find documents that correspond to the query

In [96]:
print(docs[-1].page_content)

The audio file contains a classical music piece, likely from the Baroque period, featuring strings, woodwinds, and potentially harpsichord, with a repeating melodic motif and a section that transitions to a quieter, more delicate passage before returning to the original theme, eventually concluding with a short, playful flourish.
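
To complete the RAG loop, the retrieved descriptions can be passed back to Gemini as context for the final answer. The following is a minimal sketch of that step, not the notebook author's method; the helper names and the prompt wording are assumptions.

```python
def build_rag_prompt(question: str, descriptions: list[str]) -> str:
    """Combine retrieved descriptions and the user question into one prompt."""
    context = "\n".join(f"- {d}" for d in descriptions)
    return (
        "Answer the question using only the audio descriptions below.\n\n"
        f"Descriptions:\n{context}\n\n"
        f"Question: {question}"
    )


def answer_with_context(model, question: str, docs) -> str:
    """Generate an answer grounded in the retrieved documents.

    `docs` is expected to be the list returned by similarity_search,
    whose items expose their text as `page_content`.
    """
    prompt = build_rag_prompt(question, [d.page_content for d in docs])
    return model.generate_content(prompt).text


# Usage: print(answer_with_context(model, query, docs))
```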

