
# NaviGator Toolkit Whisper (Speech to text) Demo

This notebook tests the connection to the NaviGator Toolkit API ([see here for more information](https://it.ufl.edu/ai/navigator-toolkit/)) and uses the [Whisper model](https://github.com/openai/whisper) for speech-to-text transcription. You will need a NaviGator API key, stored in a `.json` file with the following format:

    {
      "OPENAI_API_KEY" : "Put your key here in the quotes",
      "base_url" : "https://api.ai.it.ufl.edu/"
    }

We suggest keeping this file in your home directory, outside any project folder, to minimize the chance of accidentally adding and committing it to a git repo. Remember that anyone with your API key can use NaviGator as you!
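If you have not created the key file yet, the sketch below writes it with owner-only permissions so that other users on a shared machine cannot read it. The helper `write_key_file` is hypothetical (it is not part of NaviGator or the `openai` package), and the permission bits only take full effect on POSIX systems such as Linux and macOS.

```python
import json
import os
from pathlib import Path


def write_key_file(path, api_key, base_url="https://api.ai.it.ufl.edu/"):
    """Hypothetical helper: write the NaviGator key file readable by the owner only."""
    path = Path(path).expanduser()
    payload = {"OPENAI_API_KEY": api_key, "base_url": base_url}
    # Create the file with mode 0o600 (owner read/write) so it is never world-readable
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f, indent=2)
    # Tighten permissions in case the file already existed with a wider mode
    os.chmod(path, 0o600)
    return path


# Example (replace the placeholder with your real key):
# write_key_file("~/navigator_api_keys.json", "YOUR_KEY_HERE")
```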

In [1]:
import openai
import os
import json

## Load the `.json` file with your key and API endpoint URL

In [2]:
# Set the path to your JSON key file (here assumed to be in your home directory)
key_file = os.path.expanduser('~/navigator_api_keys.json')


# Load the JSON file
with open(key_file, 'r') as file:
    data = json.load(file)

# Extract the values
OPENAI_API_KEY = data.get('OPENAI_API_KEY')
base_url = data.get('base_url')

# Set the environment variable
os.environ['TOOLKIT_API_KEY'] = OPENAI_API_KEY
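Note that `data.get()` returns `None` for a missing field, which only fails later with a less obvious error (assigning `None` to `os.environ` raises a `TypeError`). A stricter loader, sketched under the assumption that both fields are required, fails early with a clear message. `load_navigator_config` is a hypothetical name, not part of any library.

```python
import json
from pathlib import Path

# Both fields are assumed to be required for the client to work
REQUIRED_FIELDS = ("OPENAI_API_KEY", "base_url")


def load_navigator_config(key_file):
    """Return (api_key, base_url) from the key file, or raise if a field is missing or empty."""
    with open(Path(key_file).expanduser(), "r") as f:
        data = json.load(f)
    missing = [name for name in REQUIRED_FIELDS if not data.get(name)]
    if missing:
        raise KeyError(f"{key_file} is missing required field(s): {', '.join(missing)}")
    return data["OPENAI_API_KEY"], data["base_url"]


# Usage in this notebook:
# OPENAI_API_KEY, base_url = load_navigator_config("~/navigator_api_keys.json")
```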


## Test connectivity and get model list

The reply should list the models available with your API key. An example of the (truncated) output is:

    SyncPage[Model](data=[Model(id='llama-3.1-70b-instruct', created=1677610602, object='model', owned_by='openai'), Model(id='sfr-embedding-mistral', created=1677610602, object='model', owned_by='openai'),...)], object='list')

In [3]:
# Check list of available models
client = openai.OpenAI(
    api_key=os.environ.get("TOOLKIT_API_KEY"),
    base_url=base_url
)

response = client.models.list()
 
print(response)

SyncPage[Model](data=[Model(id='flux.1-schnell', created=1677610602, object='model', owned_by='openai'), Model(id='granite-3.1-8b-instruct', created=1677610602, object='model', owned_by='openai'), Model(id='llama-3.1-8b-instruct', created=1677610602, object='model', owned_by='openai'), Model(id='mixtral-8x7b-instruct', created=1677610602, object='model', owned_by='openai'), Model(id='nim-mistral-7b-instruct', created=1677610602, object='model', owned_by='openai'), Model(id='llama-3.1-70b-instruct', created=1677610602, object='model', owned_by='openai'), Model(id='nomic-embed-text-v1.5', created=1677610602, object='model', owned_by='openai'), Model(id='gte-large-en-v1.5', created=1677610602, object='model', owned_by='openai'), Model(id='mistral-7b-instruct', created=1677610602, object='model', owned_by='openai'), Model(id='whisper-large-v3', created=1677610602, object='model', owned_by='openai'), Model(id='codestral-22b', created=1677610602, object='model', owned_by='openai'), Model(id=

In [4]:
# Print available models in better format
for model in response:
    print(model.id)

flux.1-schnell
granite-3.1-8b-instruct
llama-3.1-8b-instruct
mixtral-8x7b-instruct
nim-mistral-7b-instruct
llama-3.1-70b-instruct
nomic-embed-text-v1.5
gte-large-en-v1.5
mistral-7b-instruct
whisper-large-v3
codestral-22b
nim-llama-3.1-8b-instruct
flux.1-dev
llama-3.3-70b-instruct
sfr-embedding-mistral


## Test speech-to-text transcription

You may need to update the model from the list above as the available models change over time.
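Rather than editing the model name by hand whenever the list changes, you could pick it from the live list. This is a minimal sketch: `choose_model` is a hypothetical helper, and matching on a name prefix such as `"whisper"` is an assumption about how NaviGator names its models.

```python
def choose_model(available_ids, preferred, prefix):
    """Return `preferred` if it is served, else the first model id starting with `prefix`."""
    if preferred in available_ids:
        return preferred
    for model_id in available_ids:
        if model_id.startswith(prefix):
            return model_id
    raise ValueError(f"No model matching {preferred!r} or prefix {prefix!r}")


# In the notebook, the ids would come from the listing call above:
# available = [m.id for m in client.models.list()]
# model = choose_model(available, "whisper-large-v3", "whisper")
```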

Here is an example response from the recording at `data/test_recording.m4a`:
   
    Transcription(text="This is a test audio recording. I want to test an AI model's ability to accurately generate a transcript of the recording. Thank you for listening, and go Gators!", language='en', task='transcribe', duration=12.736, words=None, segments=[{'id': 1, 'avg_logprob': -0.07349917508269611, 'compression_ratio': 1.296, 'end': 11.52, 'no_speech_prob': 0.0149383544921875, 'seek': 1273, 'start': 0.0, 'temperature': 0.0, 'text': " This is a test audio recording. I want to test an AI model's ability to accurately generate a transcript of the recording. Thank you for listening, and go Gators!", 'tokens': [50365, 639, 307, 257, 1500, 6278, 6613, 13, 286, 528, 281, 1500, 364, 7318, 2316, 311, 3485, 281, 20095, 8460, 257, 24444, 295, 264, 6613, 13, 1044, 291, 337, 4764, 11, 293, 352, 460, 3391, 0, 50941], 'words': None}])
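Each entry in `segments` carries per-segment quality signals (`avg_logprob`, `compression_ratio`, `no_speech_prob`). The sketch below flags segments that may be worth a second listen. The cutoffs (-1.0, 2.4, 0.6) are borrowed from the default thresholds of the open-source `whisper` package's `transcribe` function, which uses them in its own fallback and silence logic; flagging on any single breach is a simplification made here. `flag_segments` is a hypothetical helper, and it assumes segments arrive as dictionaries, as in the output above.

```python
def flag_segments(segments, logprob_min=-1.0, compression_max=2.4, no_speech_max=0.6):
    """Return (start, end, reasons) for each segment whose quality signals look unreliable."""
    flagged = []
    for seg in segments:
        reasons = []
        if seg["avg_logprob"] < logprob_min:
            reasons.append("low avg_logprob")
        if seg["compression_ratio"] > compression_max:
            reasons.append("high compression_ratio (possible repetition)")
        if seg["no_speech_prob"] > no_speech_max:
            reasons.append("likely no speech")
        if reasons:
            flagged.append((seg["start"], seg["end"], reasons))
    return flagged
```

Assuming the proxy returns segments as shown, `flag_segments(transcription.segments or [])` returns an empty list for this recording, since its single segment has `avg_logprob` of about -0.07 and `no_speech_prob` of about 0.015.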

In [9]:
# Set model from list above
model = "whisper-large-v3"

# Open the recording in binary mode; the with-block closes the file afterwards
with open("data/test_recording.m4a", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model=model,
        file=audio_file
    )
print(transcription)

Transcription(text="This is a test audio recording. I want to test an AI model's ability to accurately generate a transcript of the recording. Thank you for listening, and go Gators!", language='en', task='transcribe', duration=12.736, words=None, segments=[{'id': 1, 'avg_logprob': -0.07349917508269611, 'compression_ratio': 1.296, 'end': 11.52, 'no_speech_prob': 0.0149383544921875, 'seek': 1273, 'start': 0.0, 'temperature': 0.0, 'text': " This is a test audio recording. I want to test an AI model's ability to accurately generate a transcript of the recording. Thank you for listening, and go Gators!", 'tokens': [50365, 639, 307, 257, 1500, 6278, 6613, 13, 286, 528, 281, 1500, 364, 7318, 2316, 311, 3485, 281, 20095, 8460, 257, 24444, 295, 264, 6613, 13, 1044, 291, 337, 4764, 11, 293, 352, 460, 3391, 0, 50941], 'words': None}])
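When a recording produces several segments, a timestamped view is easier to skim than the raw object. A minimal sketch, assuming `segments` is a list of dictionaries with `start` and `end` (in seconds) and `text` keys, as in the output above; `format_segments` is a hypothetical helper.

```python
def format_segments(segments):
    """Render each segment as '[start - end] text', with times in seconds."""
    lines = []
    for seg in segments:
        lines.append(f"[{seg['start']:.2f}s - {seg['end']:.2f}s] {seg['text'].strip()}")
    return "\n".join(lines)


# Usage in the notebook:
# print(format_segments(transcription.segments or []))
```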


In [8]:
# Print only the transcribed text
print(transcription.text)

This is a test audio recording. I want to test an AI model's ability to accurately generate a transcript of the recording. Thank you for listening, and go Gators!


## Test sending a system prompt and content

In [6]:
# Set model from list above
model = 'mistral-7b-instruct'

response = client.chat.completions.create(
    model=model,  # model to send to the proxy
    messages=[
        {
            "role": "system",
            "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."
        },
        {
            "role": "user",
            "content": "what is the largest galaxy?"
        }
    ]
)

print(response)

ChatCompletion(id='chat-4b55133175fe485488d3381415b9d62c', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=" In the grand cosmic theatre, where stars are the actors, the largest galaxy takes center stage. This celestial titan, known as IC 1101, stretches across 5.5 million light-years, a distance so vast, it's as if it's reaching out to touch the very edges of infinity.\n\nImagine a tapestry woven with countless stars, each a twinkling thread of light, intertwined with nebulas, galaxies, and cosmic dust. This tapestry, woven over eons, is IC 1101. It's a testament to the relentless dance of celestial bodies, a symphony of creation and destruction, all set against the backdrop of an unfathomable void.\n\nYet, the universe is a dynamic, ever-evolving place. Even this colossal galaxy is not static. It's a swirling, spiraling dance of stars, each with its own story to tell, each a fragment of time frozen, a moment etched in the fabric of 

In [7]:
# Print nicely formatted message

print(response.choices[0].message.content)

 In the grand cosmic theatre, where stars are the actors, the largest galaxy takes center stage. This celestial titan, known as IC 1101, stretches across 5.5 million light-years, a distance so vast, it's as if it's reaching out to touch the very edges of infinity.

Imagine a tapestry woven with countless stars, each a twinkling thread of light, intertwined with nebulas, galaxies, and cosmic dust. This tapestry, woven over eons, is IC 1101. It's a testament to the relentless dance of celestial bodies, a symphony of creation and destruction, all set against the backdrop of an unfathomable void.

Yet, the universe is a dynamic, ever-evolving place. Even this colossal galaxy is not static. It's a swirling, spiraling dance of stars, each with its own story to tell, each a fragment of time frozen, a moment etched in the fabric of the cosmos.

So, when you gaze upon the night sky, remember, you're looking at fragments of IC 1101, a glimpse into the vastness of the universe, a reminder of the 
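The raw response above reports `finish_reason='stop'`, meaning the model finished its reply on its own. If a reply is cut off at the token limit, `finish_reason` is `"length"` instead. A minimal sketch that extracts the text and warns in that case; `reply_text` is a hypothetical helper, and `max_tokens` is the standard chat-completions parameter for raising the limit.

```python
import warnings


def reply_text(response):
    """Return the first choice's text, warning if the model stopped at the token limit."""
    choice = response.choices[0]
    if choice.finish_reason == "length":
        warnings.warn("Reply hit the token limit and is truncated; consider raising max_tokens.")
    # content can be None (e.g. for tool calls), so fall back to an empty string
    return (choice.message.content or "").strip()


# Usage in the notebook:
# print(reply_text(response))
```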