In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Sheet Music Analysis with Gemini

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/document-processing/sheet_music.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fuse-cases%2Fdocument-processing%2Fsheet_music.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Run in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/document-processing/sheet_music.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/use-cases/document-processing/sheet_music.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
</table>


| | |
|-|-|
|Author(s) | [Holt Skinner](https://github.com/holtskinner) |

## Overview

[Sheet music](https://en.wikipedia.org/wiki/Sheet_music) is the primary form of music notation used by composers and performers around the world. Its pages record the lyrics, pitches, and rhythms of a piece, along with details such as the composer, text author, and composition date.

This notebook shows how to use Gemini to extract this metadata from sheet music PDFs.

These prompts and documents were demonstrated in the Google Cloud Next 2024 session "What's next with Gemini: Driving business impact with multimodal use cases".


## Getting Started


### Install Vertex AI SDK for Python

In [None]:
%pip install --upgrade --user -q google-cloud-aiplatform

### Restart current runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [None]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>



### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).


In [None]:
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information and initialize Vertex AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [1]:
# Define project information
PROJECT_ID = "YOUR_PROJECT_ID"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}

# Initialize Vertex AI
import vertexai

vertexai.init(project=PROJECT_ID, location=LOCATION)
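
If `PROJECT_ID` is left at its placeholder value, later calls to the API will fail. The sketch below is one way to catch that early. `check_project_id` and `PLACEHOLDER_PROJECT_ID` are hypothetical names introduced for illustration and are not part of the Vertex AI SDK.

```python
# Hypothetical helper, not part of the Vertex AI SDK.
PLACEHOLDER_PROJECT_ID = "YOUR_PROJECT_ID"


def check_project_id(project_id: str) -> str:
    """Return the stripped project ID, or raise if it is empty or still the placeholder."""
    cleaned = project_id.strip()
    if not cleaned or cleaned == PLACEHOLDER_PROJECT_ID:
        raise ValueError(
            "Set PROJECT_ID to your Google Cloud project ID before calling vertexai.init()."
        )
    return cleaned
```

You could then pass `project=check_project_id(PROJECT_ID)` to `vertexai.init()`.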

### Import libraries


In [7]:
from IPython.display import Markdown, display

from vertexai.preview.generative_models import (
    GenerationConfig,
    GenerativeModel,
    HarmCategory,
    HarmBlockThreshold,
    Part,
)

### Load the Gemini 1.5 Pro model

Gemini 1.5 Pro (`gemini-1.5-pro-preview-0409`) is a multimodal model: you can include text, images, PDFs, audio, and video in a prompt and get text or code responses.

In [3]:
model = GenerativeModel("gemini-1.5-pro-preview-0409")

generation_config = GenerationConfig(temperature=1.0, max_output_tokens=8192)
safety_settings = {
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_ONLY_HIGH
}

## Extract Structured Metadata from Sheet Music PDF

For this example, we will be using the popular classical music book [24 Italian Songs and Arias of the 17th and 18th Centuries](https://imslp.org/wiki/24_Italian_Songs_and_Arias_of_the_17th_and_18th_Centuries_(Various)), and extracting metadata about each song in the book.

In [4]:
sheet_music_extraction_prompt = """You are an expert in musicology and music history. I am going to give you a book of sheet music. Your task is to output structured metadata about each piece of music. Include the following details: Title, composer with lifetime, Tempo marking, composition year, and a brief description of the piece."""

In [9]:
# Extract the structured metadata from a sheet music PDF

# Load file directly from Google Cloud Storage
file_part = Part.from_uri(
    uri="gs://github-repo/use-cases/sheet-music/24ItalianSongs.pdf",
    mime_type="application/pdf",
)

# Load contents
contents = [file_part, sheet_music_extraction_prompt]

# Send to Gemini
response = model.generate_content(
    contents,
    generation_config=generation_config,
    safety_settings=safety_settings,
)

# Display results
display(Markdown(response.text))

## Twenty-Four Italian Songs and Arias: Metadata

**Please note that composition years are not provided in the book, so they are omitted from the metadata below.**

**1. Per la gloria d'adorarvi (For the love my heart doth prize)**

* Composer: Giovanni Battista Bononcini (1672-1750) 
* Tempo Marking: Andante (♩ = 80)
* Description: An aria from the opera "Griselda," expressing the pain and hopelessness of unrequited love. 

**2. Amarilli, mia bella (Amarilli, my fair one)**

* Composer: Giulio Caccini (1551-1618)
* Tempo Marking: Moderato affettuoso (♩ = 88)
* Description: A madrigal, pleading with the beloved Amarilli to believe in the singer's true and tender love.

**3. Alma del core (Fairest adored)**

* Composer: Antonio Caldara (1670-1736)
* Tempo Marking: Tempo di Minuetto
* Description: A love song in a minuet style, praising the beauty of the beloved and pledging eternal devotion.

**4. Come raggio di sol (As on the swelling wave)**

* Composer: Antonio Caldara (1670-1736)
* Tempo Marking: Sostenuto (♩ = 56)
* Description: An aria comparing the fleeting nature of youth and beauty to the movement of waves and the sun's reflection on them.

**5. Sebben, crudele (Tho' not deserving)**

* Composer: Antonio Caldara (1670-1736)
* Tempo Marking: Allegretto grazioso (♩ = 54)
* Description: A canzonetta expressing the conflicting emotions of love and resentment towards a cruel and undeserving beloved.

**6. Vittoria, mio core! (Victorious my heart is!)**

* Composer: Giacomo Carissimi (1605-1674)
* Tempo Marking: Allegro con brio (♩. = 168)
* Description: A cantata celebrating the triumph of the heart over love's challenges, with a joyous and energetic melody.

**7. Danza, danza, fanciulla gentile (Dance, O dance, maiden gay)**

* Composer: Francesco Durante (1684-1755)
* Tempo Marking: Allegro con spirito (♩ = 138)
* Description: A lively arietta inviting a young maiden to dance to the music, with light and playful imagery.

**8. Vergin, tutto amor (Virgin, fount of love)**

* Composer: Francesco Durante (1684-1755)
* Tempo Marking: Largo religioso (♩ = 140)
* Description: A prayer to the Virgin Mary, seeking solace and mercy for a sinner's lament.

**9. Caro mio ben (Thou, all my bliss)**

* Composer: Giuseppe Giordani (Giordanello) (1744-1798)
* Tempo Marking: Larghetto (♩ = 60)
* Description: An arietta expressing the pain of separation from the beloved, with a tender and melancholic melody.

**10. O del mio dolce ardor (O thou belov'd)**

* Composer: Christoph Willibald von Gluck (1714-1787)
* Tempo Marking: Moderato (♩ = 44)
* Description: An aria longing for the beloved, with a yearning melody and expressive dynamics.

**11. Che fiero costume (How void of compassion)**

* Composer: Giovanni Legrenzi (1626-1690)
* Tempo Marking: Allegretto con moto (♩ = 56)
* Description: An arietta criticizing Cupid's cruel and heartless ways, highlighting the torment of unrequited love.

**12. Pur dicesti, o bocca bella (Mouth so charmful)**

* Composer: Antonio Lotti (1667-1740)
* Tempo Marking: Allegretto grazioso (♩ = 69)
* Description: A playful arietta questioning the beloved about the captivating power of their words and sweetness.

**13. Il mio bel foco (My joyful ardor)** 

* Composer: Benedetto Marcello (1686-1739)
* Tempo Marking: (Recitative) followed by Allegretto affettuoso 
* Description: A recitative and aria expressing unwavering love and devotion, with a passionate and lyrical melody. 

**14. Non posso disperar (I do not dare despond)**

* Composer: S. De Luca (15... - 16...)
* Tempo Marking: Andante grazioso (♩ = 80) 
* Description: An arietta acknowledging the pain of love but finding solace in the hope of eventual happiness.

**15. Lasciatemi morire! (No longer let me languish)**

* Composer: Claudio Monteverdi (1567-1643)
* Tempo Marking: Lento (♩ = 58) 
* Description: A lament from the opera "Arianna," expressing the despair and anguish of a heartbroken woman. 


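Gemini returned the title, composer with lifetime, tempo marking, and a brief description for each song listed above. It omitted the composition year, noting that the book does not provide it.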
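
The response is Markdown text rather than a data structure. To work with it in code, one option is to parse it into dictionaries. The sketch below assumes the response follows the layout shown above: a bold, numbered title line followed by `* Field: value` bullets. The model is not guaranteed to use this layout on every run. `parse_song_metadata` is a hypothetical helper introduced for illustration.

```python
import re

# Matches a bold, numbered heading such as "**3. Alma del core (Fairest adored)**".
TITLE_RE = re.compile(r"^\*\*(\d+)\.\s*(.+?)\*\*\s*$")
# Matches a bullet such as "* Composer: Antonio Caldara (1670-1736)".
FIELD_RE = re.compile(r"^\*\s+([^:]+):\s*(.*)$")


def parse_song_metadata(markdown_text: str) -> list[dict[str, str]]:
    """Parse the Markdown layout shown above into one dict per song.

    Assumption: each song starts with a bold numbered title line, and each
    field is a "* Name: value" bullet. Lines in any other shape are ignored.
    """
    songs: list[dict[str, str]] = []
    for raw_line in markdown_text.splitlines():
        line = raw_line.strip()
        title_match = TITLE_RE.match(line)
        if title_match:
            # Start a new record for each numbered title.
            songs.append(
                {"number": title_match.group(1), "title": title_match.group(2).strip()}
            )
            continue
        field_match = FIELD_RE.match(line)
        if field_match and songs:
            # "Tempo Marking" becomes the key "tempo_marking".
            key = field_match.group(1).strip().lower().replace(" ", "_")
            songs[-1][key] = field_match.group(2).strip()
    return songs
```

Calling `parse_song_metadata(response.text)` would then give a list you can filter or load into a table. Fields the model leaves out, such as the composition year here, are simply absent from each dictionary.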

### Song Identification with Audio

Now, let's try something more challenging: identifying which song is being performed, using the sheet music as a reference. We have an audio clip of Holt Skinner performing one of the songs in the book, and we will ask Gemini to identify it based on the sheet music PDF.

In [10]:
song_identification_prompt = """Based on the sheet music PDF, what song is in the audio clip? Explain how you made the decision."""

In [12]:
# Load PDF file
pdf_part = Part.from_uri(
    uri="gs://github-repo/use-cases/sheet-music/24ItalianSongs.pdf",
    mime_type="application/pdf",
)

audio_part = Part.from_uri(
    uri="gs://github-repo/use-cases/sheet-music/24ItalianClip.mp3",
    mime_type="audio/mpeg",
)

# Load contents
contents = [pdf_part, audio_part, song_identification_prompt]

# Send to Gemini
response = model.generate_content(
    contents, generation_config=generation_config, safety_settings=safety_settings
)

# Display results
display(Markdown(response.text))

The song in the audio clip is "Sebben, crudele" by Antonio Caldara. 

This was determined by listening to the audio clip and comparing the melody and lyrics to the sheet music provided. The first page of the PDF document, "Contents", shows the titles of the songs and Arias in the book. "Sebben, crudele" is on page 19.  Pages 19-22 contain the sheet music for that song. The melody and lyrics in the audio clip are an exact match to the sheet music on these pages.
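
Because the answer is free text, it can help to check that the song it names is actually one of the titles in the book. The sketch below is a minimal example of such a check. It assumes the model puts the title in double quotes, as it does in the response above, and that you supply a list of known titles without their English translations. `find_identified_title` is a hypothetical helper introduced for illustration.

```python
import re
from typing import Optional


def find_identified_title(answer_text: str, known_titles: list[str]) -> Optional[str]:
    """Return the first double-quoted phrase in the answer that is a known title.

    Assumption: the model wraps the song title in straight double quotes.
    The comparison ignores case; None is returned when nothing matches.
    """
    by_casefold = {title.casefold(): title for title in known_titles}
    for quoted in re.findall(r'"([^"]+)"', answer_text):
        match = by_casefold.get(quoted.strip().casefold())
        if match is not None:
            return match
    return None
```

For example, `find_identified_title(response.text, ["Alma del core", "Sebben, crudele"])` skips quoted phrases such as "Contents" that are not in the list. A result of `None` suggests the answer should be reviewed by hand.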