You're very welcome! 😊

You're doing an awesome job exploring how to build practical AI tools — keep going! Whether it's for voice, text, financial analysis, or agent design, I'm here to help with:

* ✅ Code examples
* ✅ Architecture advice
* ✅ Learning resources
* ✅ Debugging help
* ✅ AI agent ideas

Whenever you're ready to take the next step (e.g., voice → cleaned English → saved to file or sent to Google Docs), just let me know.

Happy building! 🛠️✨


You're welcome! 🙌

**No, this version does *not* use the OpenAI API.**
It is 100% **free and local**, using only open-source libraries:

---

### ✅ What It Uses:

| Feature        | Tool      | Cost | Notes                            |
| -------------- | --------- | ---- | -------------------------------- |
| Text-to-speech | `pyttsx3` | Free | Works offline on Mac, no API key |
| Web UI         | `gradio`  | Free | Local browser-based interface    |

---

### ✅ What It Doesn't Use:

* ❌ No OpenAI API (no API key or cost)
* ❌ No internet access required after installing packages
* ❌ No paid services or subscriptions

---

### 📦 How It Works:

`pyttsx3` uses your Mac's built-in **macOS voice engine** (`NSSpeechSynthesizer` under the hood), which means:

* You get the **system voices installed on your Mac** (quality varies by voice)
* You can switch to different accents/voices available on your Mac
* It works even **offline**
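
Switching voices comes down to setting the engine's `voice` property to one of the ids it reports. Here is a minimal sketch, assuming a hypothetical helper `pick_voice_id` and that a voice with "Samantha" in its name is installed (voice names vary by macOS version). The `demo_voices` list is a stand-in for illustration, not real system data:

```python
from types import SimpleNamespace


def pick_voice_id(voices, preferred_name):
    """Return the id of the first voice whose name contains preferred_name.

    Matching is case-insensitive; returns None when nothing matches.
    """
    wanted = preferred_name.lower()
    for voice in voices:
        if wanted in (voice.name or "").lower():
            return voice.id
    return None


def speak(text, preferred_name="Samantha"):
    """Speak text with the preferred voice, falling back to the default voice."""
    import pyttsx3  # imported lazily so pick_voice_id works without it

    engine = pyttsx3.init()
    voice_id = pick_voice_id(engine.getProperty("voices"), preferred_name)
    if voice_id is not None:
        engine.setProperty("voice", voice_id)
    engine.say(text)
    engine.runAndWait()


# Stand-in voice objects for illustration only (not real system voices).
demo_voices = [
    SimpleNamespace(name="Alex", id="voice.alex"),
    SimpleNamespace(name="Samantha", id="voice.samantha"),
]
print(pick_voice_id(demo_voices, "samantha"))
```

Calling `speak("Hello from your Mac")` would then use the matched voice if it exists and the default voice otherwise.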

---

If later you want to use **OpenAI for better voice generation** (like ultra-realistic voices), we *can* upgrade to use their TTS API — but that would involve cost. For now, this is perfect for practice and use cases like:

* Proofreading your own writing by listening
* Practicing English comprehension
* Reading long-form content aloud (articles, essays, etc.)

Let me know if you want a **reverse version next** (voice → text → cleaned English), or want to export to file/Google Docs.


In [1]:
import os
os.system("say 'Hello from Mac terminal'")

0

In [None]:
import pyttsx3

engine = pyttsx3.init()
voices = engine.getProperty('voices')
print(f"Available voices: {[voice.name for voice in voices]}")

engine.say("Hello from your Mac")
engine.runAndWait()



In [1]:
import gradio as gr
import pyttsx3

In [None]:
# Install the dependencies before running this script:
# pip install gradio pyttsx3

import gradio as gr
import pyttsx3

# Initialize TTS engine
engine = pyttsx3.init()

# Optional: customize voice properties (rate, volume)
engine.setProperty('rate', 150)      # speed (default ~200)
engine.setProperty('volume', 1.0)    # 0.0 to 1.0

# Get available voices (for Mac, use 'com.apple.speech.synthesis.voice.samantha' or similar)
voices = engine.getProperty('voices')
voice_names = [v.name for v in voices]
voice_id_map = {v.name: v.id for v in voices}

# Function that converts text to speech
def read_text(text, voice_choice):
    engine.setProperty('voice', voice_id_map[voice_choice])
    engine.say(text)
    engine.runAndWait()
    return f"✅ Done reading {len(text)} characters."

# Gradio interface
iface = gr.Interface(
    fn=read_text,
    inputs=[
        gr.Textbox(lines=10, label="Enter text to read aloud"),
        gr.Dropdown(choices=voice_names, label="Choose voice", value=voice_names[0]),
    ],
    outputs="text",
    title="🗣️ Text-to-Speech Agent (Offline & Free)",
    description="Paste any text and click to read it aloud using your Mac's voice engine."
)

if __name__ == "__main__":
    iface.launch()


* Running on local URL:  http://127.0.0.1:7884
* To create a public link, set `share=True` in `launch()`.



Absolutely! Here's a **complete, free, offline-friendly voice-to-text agent** you can run on your **MacBook**. It will:

---

## ✅ What This Agent Does (Voice → Text)

1. **Record your voice** using your mic
2. **Transcribe it into text** using **OpenAI Whisper (via `whisper` or `faster-whisper`)**
3. **Display it in a simple Gradio web UI**
4. **Optionally save it to a `.txt` file**
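
Step 4 is not part of the script itself, so here is one way it might look. This is a sketch only: the helper name `save_transcript`, the `transcripts` folder, and the timestamped file name are assumptions, not something the agent already does:

```python
from datetime import datetime
from pathlib import Path


def save_transcript(text, out_dir="transcripts"):
    """Write text to a timestamped .txt file and return its path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = folder / f"transcript-{stamp}.txt"
    path.write_text(text.strip() + "\n", encoding="utf-8")
    return path
```

The returned path could be shown in the UI next to the transcript so the user knows where the file went.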

---

## 🛠️ Tools Used

| Tool                               | Purpose                    | Free? | Notes                               |
| ---------------------------------- | -------------------------- | ----- | ----------------------------------- |
| `gradio`                           | UI for file upload or mic  | ✅     | Runs locally                        |
| `whisper`                          | Voice-to-text (ASR) engine | ✅     | Local transcription (no API needed) |
| `sounddevice` + `scipy.io.wavfile` | Record audio               | ✅     | Optional, if you want live mic      |

---

## 🔧 Install Dependencies (One-Time Setup)

```bash
pip install gradio openai-whisper sounddevice scipy
```

> 💡 Whisper also needs `ffmpeg` to decode audio files. On a Mac, install it with:

```bash
brew install ffmpeg
```
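
Before launching the script, it can help to confirm that everything is importable from the environment you are actually running in. The sketch below is a suggestion, not part of the agent; the helper names are hypothetical. Note that the pip package `openai-whisper` is imported as `whisper`:

```python
import importlib.util
import shutil

# pip package name -> module name to import (they differ for Whisper)
REQUIRED = {
    "gradio": "gradio",
    "openai-whisper": "whisper",
    "sounddevice": "sounddevice",
    "scipy": "scipy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose modules cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def has_ffmpeg():
    """True when an ffmpeg executable is on the PATH."""
    return shutil.which("ffmpeg") is not None


print("Missing packages:", missing_packages() or "none")
print("ffmpeg on PATH:", has_ffmpeg())
```

If `openai-whisper` shows up as missing inside a notebook, the kernel is probably using a different environment from the one where `pip install` ran.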

---

## 🧠 End-to-End Code: `voice_to_text_agent.py`

```python
import gradio as gr
import whisper
import tempfile
import os
import sounddevice as sd
from scipy.io.wavfile import write

# Load Whisper model once (base is good for most use cases; can change to 'small', 'medium', etc.)
model = whisper.load_model("base")

# Optional: live mic recording
def record_audio(duration=5, samplerate=16000):
    gr.Info(f"Recording for {duration} seconds...")
    recording = sd.rec(int(duration * samplerate), samplerate=samplerate, channels=1, dtype='int16')
    sd.wait()
    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
    write(temp_file.name, samplerate, recording)
    return temp_file.name

# Transcribe uploaded or recorded audio
def transcribe_audio(audio_path):
    result = model.transcribe(audio_path)
    return result["text"]

# Combined function for Gradio interface
def transcribe_from_input(audio_file, record_duration):
    recorded = audio_file is None
    if recorded:
        audio_path = record_audio(duration=record_duration)
    else:
        audio_path = audio_file

    transcript = transcribe_audio(audio_path)
    if recorded:
        os.remove(audio_path)  # clean up only the temp file we created
    return transcript

# Gradio Interface
iface = gr.Interface(
    fn=transcribe_from_input,
    inputs=[
        # Gradio 4+ takes `sources=[...]`; Gradio 3.x used `source="upload"`
        gr.Audio(sources=["upload"], type="filepath", label="Upload audio (optional)"),
        gr.Slider(minimum=2, maximum=10, value=5, step=1, label="Record duration (seconds)"),
    ],
    outputs="text",
    title="🎙️ Voice to Text Agent (Offline Whisper)",
    description="Record or upload audio and transcribe it using Whisper, offline and free."
)

if __name__ == "__main__":
    iface.launch()
```

---

## ▶️ How to Use It

1. **Run the script:**

   ```bash
   python voice_to_text_agent.py
   ```

2. **In the browser UI:**

   * Either upload an audio file (`.wav`, `.mp3`)
   * Or leave the upload empty and click “Submit” to record from your mic (5 seconds by default)

3. ✅ You'll see your voice converted to text!

---

## 🧩 Optional Enhancements

* Save the transcript to a `.txt` or `.docx` file
* Clean up informal grammar (integrate with GPT)
* Add language selection
* Combine with previous agent (voice → clean English → read aloud)
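
As an illustration of the language-selection idea: Whisper's `transcribe()` accepts a `language` option, so a dropdown label can be mapped to a language code. The label list and the helper names below are assumptions for this sketch:

```python
# Labels shown in the UI mapped to Whisper language codes; None means auto-detect.
LANGUAGES = {"Auto-detect": None, "English": "en", "Chinese": "zh", "Spanish": "es"}


def transcribe_options(language_label):
    """Build keyword arguments for model.transcribe() from a UI label."""
    code = LANGUAGES.get(language_label)
    return {"language": code} if code else {}


def transcribe_with_language(model, audio_path, language_label="Auto-detect"):
    """Transcribe audio_path with an already loaded Whisper model."""
    result = model.transcribe(audio_path, **transcribe_options(language_label))
    return result["text"].strip()
```

In the Gradio interface this would pair with a `gr.Dropdown(choices=list(LANGUAGES))` input passed through to the transcription function.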

---

## ❓Want the next version?

If you'd like:

* Reverse pipeline: Voice → Cleaned English → Google Doc
* A mobile-friendly version
* Whisper running faster (using `faster-whisper` with GPU)

Just say the word and I’ll build that too!
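
For reference, a `faster-whisper` variant might look roughly like the sketch below. It assumes `pip install faster-whisper` and CPU inference with `int8` weights (a reasonable guess for a Mac without a CUDA GPU, not a benchmarked choice); `join_segments` and `transcribe_fast` are hypothetical helper names:

```python
from types import SimpleNamespace


def join_segments(segments):
    """Concatenate the text of transcription segments into one string."""
    return "".join(segment.text for segment in segments).strip()


def transcribe_fast(audio_path, model_size="base"):
    """Transcribe with faster-whisper on CPU (assumed settings, not benchmarked)."""
    from faster_whisper import WhisperModel  # lazy import; optional dependency

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus an info object
    segments, _info = model.transcribe(audio_path)
    return join_segments(segments)


# Stand-in segments for illustration only.
demo = [SimpleNamespace(text=" Hello"), SimpleNamespace(text=" world.")]
print(join_segments(demo))
```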


In [2]:
import gradio as gr
import whisper
import tempfile
import os
import sounddevice as sd
from scipy.io.wavfile import write

ModuleNotFoundError: No module named 'whisper'

In [None]:
import gradio as gr
import whisper
import tempfile
import os
import sounddevice as sd
from scipy.io.wavfile import write

# Load Whisper model once (base is good for most use cases; can change to 'small', 'medium', etc.)
model = whisper.load_model("base")

# Optional: live mic recording
def record_audio(duration=5, samplerate=16000):
    gr.Info(f"Recording for {duration} seconds...")
    recording = sd.rec(int(duration * samplerate), samplerate=samplerate, channels=1, dtype='int16')
    sd.wait()
    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
    write(temp_file.name, samplerate, recording)
    return temp_file.name

# Transcribe uploaded or recorded audio
def transcribe_audio(audio_path):
    result = model.transcribe(audio_path)
    return result["text"]

# Combined function for Gradio interface
def transcribe_from_input(audio_file, record_duration):
    recorded = audio_file is None
    if recorded:
        audio_path = record_audio(duration=record_duration)
    else:
        audio_path = audio_file

    transcript = transcribe_audio(audio_path)
    if recorded:
        os.remove(audio_path)  # clean up only the temp file we created
    return transcript

# Gradio Interface
iface = gr.Interface(
    fn=transcribe_from_input,
    inputs=[
        # Gradio 4+ takes `sources=[...]`; Gradio 3.x used `source="upload"`
        gr.Audio(sources=["upload"], type="filepath", label="Upload audio (optional)"),
        gr.Slider(minimum=2, maximum=10, value=5, step=1, label="Record duration (seconds)"),
    ],
    outputs="text",
    title="🎙️ Voice to Text Agent (Offline Whisper)",
    description="Record or upload audio and transcribe it using Whisper, offline and free."
)

if __name__ == "__main__":
    iface.launch()


# Additional End of week Exercise - week 2
# call 2 tools

Now use everything you've learned from Week 2 to build a full prototype for the technical question/answerer you built in Week 1 Exercise.

This should include a Gradio UI, streaming, use of the system prompt to add expertise, and the ability to switch between models. Bonus points if you can demonstrate use of a tool!

If you feel bold, see if you can add audio input so you can talk to it, and have it respond with audio. ChatGPT or Claude can help you, or email me if you have questions.

I will publish a full solution here soon - unless someone beats me to it...

There are so many commercial applications for this, from a language tutor, to a company onboarding solution, to a companion AI to a course (like this one!). I can't wait to see your results.
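
One possible shape for the streaming and model-switching parts of the exercise is sketched below. It is not the course's solution: the function names, the second model name in the dropdown, and the default system prompt are assumptions, and it needs `OPENAI_API_KEY` in the environment to actually run a chat.

```python
def accumulate(deltas):
    """Yield the growing reply as each text fragment arrives."""
    reply = ""
    for delta in deltas:
        reply += delta or ""  # some chunks carry no text
        yield reply


def stream_chat(message, history, model, system_prompt="You are a helpful technical tutor."):
    """Generator for gr.ChatInterface: streams the reply from the chosen model."""
    from openai import OpenAI  # lazy import; needs OPENAI_API_KEY in the environment

    messages = [{"role": "system", "content": system_prompt}] + history
    messages.append({"role": "user", "content": message})
    stream = OpenAI().chat.completions.create(model=model, messages=messages, stream=True)
    yield from accumulate(
        chunk.choices[0].delta.content for chunk in stream if chunk.choices
    )


def build_ui():
    """Chat UI with a model dropdown; call build_ui().launch() to start it."""
    import gradio as gr

    # The dropdown value is passed to stream_chat as the `model` argument.
    model_picker = gr.Dropdown(
        choices=["gpt-4o-mini", "gpt-4o"], value="gpt-4o-mini", label="Model"
    )
    return gr.ChatInterface(fn=stream_chat, type="messages", additional_inputs=[model_picker])
```

Because `stream_chat` yields the cumulative reply each time, Gradio redraws the message as it grows instead of waiting for the full response.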

## My Tutor text board
* Ask gpt-4o-mini a Python math problem, then use my tool to solve it

In [1]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI  # the client class
# import ollama
import gradio as gr


In [37]:
import openai  # import module
print(openai.__version__)


1.78.1
Full Response Tree:
{
  "id": "chatcmpl-BYENZQlkQVuEOP5UFSZ8TgEOJu9IO",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Hello! How can I assist you today?",
        "refusal": null,
        "role": "assistant",
        "annotations": [],
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1747498125,
  "model": "gpt-4o-mini-2024-07-18",
  "object": "chat.completion",
  "service_tier": "default",
  "system_fingerprint": "fp_0392822090",
  "usage": {
    "completion_tokens": 10,
    "prompt_tokens": 363,
    "total_tokens": 373,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0
    }
  }
}

Message Tree:
{
  "content": "Hello! How can I 

In [2]:
# constants

MODEL = 'gpt-4o-mini'
# MODEL_LLAMA = 'llama3.2'

In [3]:
# set up environment
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
openai = OpenAI()

API key looks good so far


In [4]:

system_message = (
    "You are a helpful assistant with access to two tools: "
    "1) 'count_letter_in_word' - to count how many times a letter appears in a word or sentence. "
    "Use this when the user asks questions like 'How many R's are in Strawberry?', or " 
    "'how many E's are in this sentence?'. "
    "2) 'get_online_course_price' - to retrieve prices for online courses. "
    "Use this when the user asks about course prices like 'What is the price of AI Ethics course?'. "
    "Always pick the tool that best fits the user's question."
)

## Defined 2 tools:

In [5]:
# Let's start by making a useful function

def count_letter_in_word(word, letter):
    print(f"-> Tool: count_letter_in_word is called for counting letter in a word  => {word, letter}\n")
    counter = 0

    for c in word.lower():
        if c == letter.lower():
            counter += 1
    return counter

count_letter_in_word("how many a's in this sentance?", "A")

-> Tool: count_letter_in_word is called for counting letter in a word  => ("how many a's in this sentance?", 'A')



3

In [6]:
# There's a particular dictionary structure that's required to describe our function:
# has 2 parameters

count_function = {
    "name": "count_letter_in_word",
    "description": 
        "Count the number of a given letter in a specific word. "
        "Call this tool when the user asks questions like: "
        "'How many R's are in the word Strawberry?', "
        "'Count the letter A in banana', or "
        "'Number of T's in cat'.",
    "parameters": {
        "type": "object",
        "properties": {
            "word": {
                "type": "string",
                "description": "The word in which to count letters",
            },
            "letter": {
                "type": "string",
                "description": "The letter to count"
            }
            
        },
        "required": ["word", "letter"],
        "additionalProperties": False
    }
}

In [7]:
# Second tool: look up the price of an online course

course_price = {"ai": "$119", "python": "$49", "machine learning": "$99", "llm": "$199"}

def get_online_course_price(online_course):
    print(f"-> Tool: get_online_course_price is called for course => {online_course}\n")
    
    course = online_course.lower()
    return course_price.get(course, "Unknown")

In [8]:
get_online_course_price("LLM")

-> Tool: get_online_course_price is called for course => LLM



'$199'
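
The lookup above is an exact match on the lowercased name, so an argument such as "Python course" would come back as "Unknown". A small normalization step is one way to make it more forgiving. This is a sketch; the `FILLER` word list and the helper names are assumptions:

```python
import re

course_price = {"ai": "$119", "python": "$49", "machine learning": "$99", "llm": "$199"}

# Filler words the model might append despite the instructions (assumed list).
FILLER = {"course", "class", "training", "the", "online"}


def normalize_course(name):
    """Lowercase the name and drop filler words so 'Python course' matches 'python'."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(w for w in words if w not in FILLER)


def lookup_price(name):
    """Return the price for a course name, or 'Unknown' when there is no match."""
    return course_price.get(normalize_course(name), "Unknown")


print(lookup_price("The Python Course"))  # $49
print(lookup_price("AI Ethics"))          # Unknown
```

Names that are genuinely not in the table, such as "AI Ethics", still return "Unknown" rather than guessing.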

In [9]:
# There's a particular dictionary structure that's required to describe our function:

price_function = {
    "name": "get_online_course_price",
    "description":
        "Retrieve the price of an online course. "
        "Call this tool only when the user asks about course prices, "
        "such as 'What is the price of AI Ethics course?' or "
        "'How much does the Python course cost?'",
    "parameters": {
        "type": "object",
        "properties": {
            "online_course": {
                "type": "string",
                "description":
                    "The online course that the customer wants to take. "
                    "The exact name of the online course (e.g., 'AI', 'Python', 'LLM'). "
                    "Do NOT include extra words like 'course', 'class', or 'training'."
            },
        },
        "required": ["online_course"],
        "additionalProperties": False
    }
}

In [10]:
# List of tools the API can call
tools = [
    {"type": "function", "function": count_function}, 
    {"type": "function", "function": price_function}
]

In [60]:
from pprint import pprint
def chat(message, history):
    messages = [{"role": "system", "content": system_message}] + history + [{"role": "user", "content": message}]
    
    response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)

    # Convert to dict using the `.model_dump()` method (recommended for OpenAI SDK v1+)
    response_dict = response.model_dump()

    print("Full Response Tree:")
    print(json.dumps(response_dict, indent=2))

    print("\nMessage Tree:")
    print(json.dumps(response_dict['choices'][0]['message'], indent=2))

    
    if response.choices[0].finish_reason=="tool_calls":
        tool_messages = []
        for tool_call in response.choices[0].message.tool_calls:

            # tool_calls=[ChatCompletionMessageToolCall(id='call_mBxqr2fMPiYXppOtBnytGydo', 
            # function=Function(arguments='{"online_course":"LLM"}', 
            # name='get_online_course_price'), type='function')]))], 
            function_name = tool_call.function.name
            
            # arguments = {'online_course': 'LLM'}, a Python dictionary, from a json string
            # arguments = json.loads('{"online_course":"LLM"}')
            arguments = json.loads(tool_call.function.arguments)

            # Route to correct function
            if function_name == "count_letter_in_word":
                result = count_letter_in_word(**arguments)
                pprint(f"{function_name} result => {result}")
            elif function_name == "get_online_course_price":
                # if arguments = {'online_course': 'LLM'}, 
                # then **arguments is equivalent to calling the function with online_course="LLM".
                result = get_online_course_price(**arguments)
                pprint(f"{function_name} result => {result}")
                print(result)
                print()
            else:
                raise ValueError(f"Unknown tool: {function_name}")

            tool_response = {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            }
            print(type(tool_response))
            pprint(f"tool_response => {tool_response}")
            print()
            tool_messages.append(tool_response)
            
        # Convert to dict using the `.model_dump()` method (recommended for OpenAI SDK v1+)
        messages.append(response.choices[0].message.model_dump())

        print("Full message w/o tool_message Tree:")
        print(json.dumps(messages, indent=2))
        
        messages.extend(tool_messages)
        print("Full message w/ tool_message Tree:")
        print(type(messages))
        print(json.dumps(messages, indent=2))
        
        response = openai.chat.completions.create(model=MODEL, messages=messages)
        # Convert to dict using the `.model_dump()` method (recommended for OpenAI SDK v1+)
        response_dict = response.model_dump()
        print("Full response after call tool Tree:")
        print(json.dumps(response_dict, indent=2))
        
    
    return response.choices[0].message.content
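
As the number of tools grows, the `if`/`elif` routing inside `chat` could be replaced by a lookup table. The sketch below is an alternative, not what the cell above does: `TOOL_REGISTRY` and `run_tool` are hypothetical names, and the two tool functions are simplified copies without the debug printing.

```python
import json


def count_letter_in_word(word, letter):
    """Case-insensitive count of letter in word."""
    return word.lower().count(letter.lower())


def get_online_course_price(online_course):
    """Look up a course price by its lowercased name."""
    prices = {"ai": "$119", "python": "$49", "machine learning": "$99", "llm": "$199"}
    return prices.get(online_course.lower(), "Unknown")


# Tool name -> Python callable; adding a tool means adding one entry here.
TOOL_REGISTRY = {
    "count_letter_in_word": count_letter_in_word,
    "get_online_course_price": get_online_course_price,
}


def run_tool(name, arguments_json, tool_call_id):
    """Execute one tool call and return the 'tool' message for the follow-up request."""
    if name not in TOOL_REGISTRY:
        raise ValueError(f"Unknown tool: {name}")
    result = TOOL_REGISTRY[name](**json.loads(arguments_json))
    return {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}


print(run_tool("get_online_course_price", '{"online_course":"LLM"}', "call_demo"))
```

Inside `chat`, the loop body would then reduce to `tool_messages.append(run_tool(tool_call.function.name, tool_call.function.arguments, tool_call.id))`.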

In [61]:
gr.ChatInterface(fn=chat, type="messages").launch()

* Running on local URL:  http://127.0.0.1:7883
* To create a public link, set `share=True` in `launch()`.




Full Response Tree:
{
  "id": "chatcmpl-BYEzaNiHwFUFTUp51rzQ6fb7E4WGq",
  "choices": [
    {
      "finish_reason": "tool_calls",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": null,
        "refusal": null,
        "role": "assistant",
        "annotations": [],
        "audio": null,
        "function_call": null,
        "tool_calls": [
          {
            "id": "call_PgIO8nATST7BQw6S8XCL7VC5",
            "function": {
              "arguments": "{\"online_course\":\"LLM\"}",
              "name": "get_online_course_price"
            },
            "type": "function"
          }
        ]
      }
    }
  ],
  "created": 1747500482,
  "model": "gpt-4o-mini-2024-07-18",
  "object": "chat.completion",
  "service_tier": "default",
  "system_fingerprint": "fp_54eb4bd693",
  "usage": {
    "completion_tokens": 18,
    "prompt_tokens": 332,
    "total_tokens": 350,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_

In [16]:
# here is the question; type over this to ask something new

# question = """
# Please explain what this code does and why:
# yield from {book.get("author") for book in books if book.get("author")}
# """
question = 'how does AI tool cursor work, be super brief please?'
# question = 'how does git work, please be super brief? please translate to chinese.'

# question = input("Enter your question: ")


In [27]:
# This function looks rather simpler than the one from my video, because we're taking advantage of the latest Gradio updates

def chat(message, history):
    messages = [{"role": "system", "content": system_message}] + history + [{"role": "user", "content": message}]
    response = openai.chat.completions.create(model=MODEL, messages=messages)
    return response.choices[0].message.content

gr.ChatInterface(fn=chat, type="messages").launch()

* Running on local URL:  http://127.0.0.1:7863
* To create a public link, set `share=True` in `launch()`.




In [17]:
# prompts

system_prompt = "You are a helpful technical tutor who answers questions about python code, software engineering, data science and LLMs"
user_prompt = "Please give a detailed explanation to the following question: " + question

In [18]:
user_prompt

'Please give a detailed explanation to the following question: how does AI tool cursor work, be super brief please?'

In [19]:
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

In [20]:
# Get gpt-4o-mini to answer, without streaming
def tutor():
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
          ]
    )
    result = response.choices[0].message.content
    display(Markdown(result))
    
tutor()


The AI tool "Cursor" works primarily as an integrated development environment (IDE) enhanced by artificial intelligence capabilities. It assists developers by providing features such as code suggestions, autocomplete, error detection, and real-time documentation. 

Here's a brief breakdown of its operation:

1. **Context Awareness**: Cursor analyzes the context of the code being worked on, including the current file, libraries in use, and coding patterns.

2. **AI Models**: It employs natural language processing and machine learning models trained on extensive codebases to understand coding syntax and semantics.

3. **Code Suggestions**: As developers write code, Cursor predicts and suggests completions or corrections based on the learned patterns.

4. **Integration**: The tool integrates with popular coding platforms and languages, allowing seamless use within the developer's workflow.

5. **Learning and Adaptation**: Cursor may refine its suggestions as it learns from the user's coding style and preferences over time.

Overall, Cursor aims to enhance productivity by making coding faster and more intuitive.