# Additional End of Week Exercise - Week 2

Now use everything you've learned in Week 2 to build a full prototype of the technical question/answerer you built in the Week 1 exercise.
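Below is a minimal sketch of the core loop such a prototype might use. It assumes an OpenAI-style list of `{"role", "content"}` messages. The loop prepends a system prompt that supplies the expertise, looks up the selected model in a registry, and yields the reply cumulatively, which is how Gradio streaming callbacks refresh the display. `SYSTEM_PROMPT`, `BACKENDS`, `fake_backend`, `build_messages` and `stream_reply` are all hypothetical names. `fake_backend` stands in for a real streaming API call so the pattern runs offline.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical system prompt that gives the assistant technical expertise.
SYSTEM_PROMPT = (
    "You are a patient senior software engineer. Answer technical questions "
    "clearly, with short code examples where they help."
)

# A backend takes role/content messages and yields text chunks as they arrive.
Backend = Callable[[List[Dict[str, str]]], Iterator[str]]


def fake_backend(messages: List[Dict[str, str]]) -> Iterator[str]:
    """Hypothetical stand-in for a streaming API: echoes the last user message word by word."""
    for word in messages[-1]["content"].split():
        yield word + " "


# Hypothetical registry: in a real prototype each entry would wrap a vendor SDK
# call made with streaming enabled (Gemini, OpenAI, Claude, ...).
BACKENDS: Dict[str, Backend] = {
    "echo": fake_backend,
}


def build_messages(message: str, history: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """System prompt first, then prior turns (only role/content kept), then the new question."""
    cleaned = [{"role": m["role"], "content": m["content"]} for m in history]
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + cleaned
        + [{"role": "user", "content": message}]
    )


def stream_reply(message: str, history: List[Dict[str, str]], model_name: str) -> Iterator[str]:
    """Yield the reply accumulated so far after each chunk from the selected backend."""
    backend = BACKENDS[model_name]
    reply = ""
    for chunk in backend(build_messages(message, history)):
        reply += chunk
        yield reply
```

In Gradio this could be wired up as `gr.ChatInterface(fn=stream_reply, type="messages", additional_inputs=[gr.Dropdown(choices=list(BACKENDS), value="echo", label="Model")])`. This assumes a Gradio version whose `ChatInterface` supports `type="messages"`. The dropdown's value arrives as `model_name`, so switching models only means registering another streaming function in `BACKENDS`. `build_messages` keeps just `role` and `content` because some Gradio versions attach extra keys to history entries.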

This should include a Gradio UI, streaming, use of the system prompt to add expertise, and the ability to switch between models. Bonus points if you can demonstrate use of a tool!
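For the tool bonus, one possible shape is sketched below. It assumes an OpenAI-style chat API; this notebook itself uses Gemini, whose SDK declares tools differently. The tool is described with a JSON Schema `parameters` object. When the model requests the tool, its name and JSON-encoded arguments are routed to a plain Python function. The tool `get_package_version`, the schema constant `PACKAGE_VERSION_TOOL`, and `handle_tool_call` are hypothetical examples chosen to suit a technical question/answerer.

```python
import json
from importlib import metadata
from typing import Any, Callable, Dict


def get_package_version(package: str) -> str:
    """Hypothetical tool: installed version of a Python package, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


# Tool description in the JSON Schema "function" format used by OpenAI-style chat APIs.
PACKAGE_VERSION_TOOL = {
    "type": "function",
    "function": {
        "name": "get_package_version",
        "description": "Return the installed version of a Python package.",
        "parameters": {
            "type": "object",
            "properties": {
                "package": {"type": "string", "description": "Distribution name, e.g. 'requests'"},
            },
            "required": ["package"],
            "additionalProperties": False,
        },
    },
}

# Map each tool name the model may request to the function that implements it.
TOOL_FUNCTIONS: Dict[str, Callable[..., Any]] = {
    "get_package_version": get_package_version,
}


def handle_tool_call(name: str, arguments_json: str) -> str:
    """Run the requested tool with the model's JSON arguments; return JSON to send back."""
    func = TOOL_FUNCTIONS.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments_json)
    return json.dumps({"result": func(**args)})
```

With the OpenAI client, the schema would be passed as `tools=[PACKAGE_VERSION_TOOL]`. When a response's `finish_reason` is `"tool_calls"`, each call's `function.name` and `function.arguments` go to `handle_tool_call`. The result is then appended as a `{"role": "tool", "tool_call_id": ..., "content": ...}` message before the model is called again.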

If you feel bold, see if you can add audio input so you can talk to it, and have it respond with audio. ChatGPT or Claude can help you, or email me if you have questions.
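For the spoken reply, many text-to-speech endpoints return raw 16-bit PCM samples rather than a playable file. The sketch below wraps such bytes in a WAV container with the standard-library `wave` module, so a `gr.Audio` output component can play the returned path. The 24 kHz mono defaults are an assumption; check what your TTS provider actually emits. `pcm16_to_wav` is a hypothetical helper name.

```python
import os
import tempfile
import wave


def pcm16_to_wav(pcm: bytes, path: str, sample_rate: int = 24000, channels: int = 1) -> str:
    """Wrap raw 16-bit little-endian PCM samples in a WAV header and return the file path."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return path


# Example: one second of silence (24,000 zero-valued 16-bit samples at 24 kHz).
demo_path = pcm16_to_wav(b"\x00\x00" * 24000, os.path.join(tempfile.mkdtemp(), "reply.wav"))
```

A Gradio callback could return a path like `demo_path` to an audio output component, which then plays the reply in the browser.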

I will publish a full solution here soon - unless someone beats me to it...

There are so many commercial applications for this, from a language tutor to a company onboarding solution to a companion AI for a course (like this one!). I can't wait to see your results.

In [None]:
# Agent that listens to audio and returns an English transcription

In [None]:
import os
import gradio as gr
import google.generativeai as genai
from dotenv import load_dotenv


In [None]:
load_dotenv()

google_api_key = os.getenv('GOOGLE_API_KEY')
if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:8]}")
else:
    print("Google API Key not set")

genai.configure(api_key=google_api_key)
model = genai.GenerativeModel("gemini-2.0-flash")

In [None]:
def transcribe_translate_with_gemini(audio_file_path):
    if not audio_file_path:
        return "⚠️ No audio file received."

    prompt = (
        "You're an AI that listens to a voice message in any language and returns the English transcription. "
        "Please transcribe and translate the following audio to English. If already in English, just transcribe it."
    )

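    # Upload the recording via the Gemini Files API so the model can reference it
    uploaded_file = genai.upload_file(audio_file_path)

    try:
        # 🔁 Send the prompt plus the uploaded audio reference to Gemini as one user turn
        response = model.generate_content([prompt, uploaded_file])
    finally:
        # Delete the uploaded file instead of leaving it stored until it expires
        genai.delete_file(uploaded_file.name)

    return response.text.strip()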
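Before paying for an upload, the function could also reject empty recordings or accidental clicks. This sketch assumes the recording is an uncompressed WAV file. Gradio's microphone recordings typically are, but that is worth confirming for your version. `audio_duration_seconds`, `is_worth_uploading` and the 0.5 s `MIN_SECONDS` threshold are hypothetical. The duration is simply the frame count divided by the frame rate.

```python
import os
import tempfile
import wave

MIN_SECONDS = 0.5  # hypothetical threshold for "too short to transcribe"


def audio_duration_seconds(path: str) -> float:
    """Length of a WAV file in seconds: number of frames / frames per second."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_worth_uploading(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """True if the recording is long enough to send to Gemini."""
    try:
        return audio_duration_seconds(path) >= min_seconds
    except (wave.Error, EOFError):
        # Not a readable PCM WAV (e.g. MP3): skip the check and let the API decide.
        return True


# Example: a 0.25 s silent clip at 16 kHz (4,000 frames) falls below the threshold.
short_clip = os.path.join(tempfile.mkdtemp(), "short.wav")
with wave.open(short_clip, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 4000)
```

Inside `transcribe_translate_with_gemini`, a check such as `if not is_worth_uploading(audio_file_path): return "⚠️ Recording too short."` placed right after the existing empty-path check would skip the API call for clips like this.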

In [None]:
gr.Interface(
    fn=transcribe_translate_with_gemini,
    inputs=gr.Audio(label="Record voice", type="filepath"),
    outputs="text",
    title="🎙️ Voice-to-English Translator (Gemini Only)",
    description="Speak in any language and get the English transcription using Gemini multimodal API."
).launch()
