# 🎓 Spanish Language Tutor

An interactive AI-powered Spanish language learning companion that combines conversational chat with natural speech synthesis for immersive language learning.

## 📋 Overview

This project builds a Spanish tutoring experience on top of OpenAI's chat and text-to-speech APIs. The tutor holds interactive conversations with English speakers and explains Spanish vocabulary and grammar in English. Each reply is shown as text and also spoken aloud through OpenAI's text-to-speech API, which approximates a conversation with a human Spanish tutor.

## ✨ Key Features

- **🤖 AI-Powered Tutoring**: Uses OpenAI's `gpt-4o-mini` for intelligent, contextual Spanish language instruction
- **🔊 Text-to-Speech Integration**: Natural voice synthesis using OpenAI's TTS API with high-quality voice options
- **💬 Interactive Chat Interface**: Gradio-powered web interface for seamless conversation flow
- **📚 Conversation History**: Maintains context across interactions for personalized learning progression
- **🎯 English-Focused Explanations**: Provides explanations in English while teaching Spanish vocabulary and grammar
- **🌐 Web-Based Interface**: Accessible through any web browser for convenient learning sessions

## 🛠️ Technology Stack

| Component | Technology | Purpose |
|-----------|------------|---------|
| **AI Model** | OpenAI GPT-4o-mini | Language instruction and conversation |
| **Text-to-Speech** | OpenAI TTS API | Natural voice synthesis |
| **Web Interface** | Gradio | Interactive chat interface |
| **Audio Processing** | PyDub, SimpleAudio | Audio file handling and playback |
| **Language** | Python | Core development |
| **Environment** | Jupyter Notebook | Development and demonstration |

## 🚀 Installation Requirements

### Python Dependencies
```bash
pip install openai gradio python-dotenv pydub simpleaudio
```

### Environment Variables
- `OPENAI_API_KEY` - Required for both chat and TTS functionality

### System Requirements
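- Audio output on the machine running the notebook (playback uses `simpleaudio` locally, not the browser)
- `ffmpeg` (or `libav`) on the system `PATH`, which `pydub` needs to decode the MP3 audio returned by the TTS API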
- Web browser for Gradio interface
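
Because `pydub` hands MP3 decoding to an external `ffmpeg` or `avconv` binary, a quick preflight check can save a confusing error later. The sketch below uses only the standard library; `find_audio_decoder` is a hypothetical helper name and not part of this project.

```python
import shutil

def find_audio_decoder():
    """Return the path of the first MP3-capable decoder found on PATH, or None."""
    # pydub looks for one of these binaries when decoding compressed audio
    for name in ("ffmpeg", "avconv"):
        path = shutil.which(name)
        if path:
            return path
    return None

decoder = find_audio_decoder()
if decoder:
    print(f"Audio decoder found: {decoder}")
else:
    print("No ffmpeg/avconv on PATH; pydub will not be able to decode MP3 audio.")
```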

## 🎯 Project Scope

- ✅ **Interactive Conversations**: Real-time Spanish tutoring through chat
- ✅ **Audio Responses**: Every response includes natural speech synthesis
- ✅ **Context Awareness**: Maintains conversation history for personalized learning
- ✅ **Beginner-Friendly**: Explanations provided in English for accessibility
- ✅ **Web Interface**: Easy-to-use browser-based chat interface
- ❌ **Visual Content**: Focuses on conversational learning, not visual aids

## 🏆 Skill Level

**Beginner to Intermediate** - Perfect for developers learning:
- OpenAI API integration (Chat & TTS)
- Interactive web applications with Gradio
- Audio processing and playback
- Conversational AI system design
- Language learning application development

## 🚀 Use Cases

- **📖 Spanish Learning**: Interactive Spanish language instruction
- **🗣️ Pronunciation Practice**: Hear correct Spanish pronunciation
- **💬 Conversation Practice**: Engage in Spanish conversations with an AI tutor
- **📝 Grammar Instruction**: Learn Spanish grammar through examples
- **🎯 Vocabulary Building**: Expand Spanish vocabulary through context
- **🏫 Educational Supplement**: Support traditional Spanish language courses

## 💡 Benefits

- **🎧 Immersive Learning**: Audio responses create a natural conversation experience
- **⏰ Available 24/7**: Practice Spanish anytime without scheduling constraints
- **🎯 Personalized Instruction**: AI adapts to individual learning pace and style
- **🔄 Immediate Feedback**: Instant responses and corrections
- **📱 Accessible**: Web-based interface works on any device with a browser
- **🎨 Engaging**: Interactive chat format makes learning enjoyable

## 🔧 Core Components

### `LLMClient` Class
Handles OpenAI API integration for both text generation and speech synthesis with support for conversation history.

### `TextToSpeech` Class  
Manages text-to-speech conversion with audio file processing, playback, and cleanup functionality.

### `SpanishTutor` Class
Main orchestration class that combines language model responses with audio output for a comprehensive tutoring experience.

## 🎪 Interactive Features

- **Chat History**: Maintains conversation context for natural dialogue flow
- **Audio Playback**: Automatic speech synthesis for every tutor response
- **Web Interface**: Clean, user-friendly Gradio chat interface
- **Real-time Responses**: Immediate feedback and instruction

## 📚 Learning Approach

The tutor provides:
- Spanish translations and explanations
- Grammar tips and corrections  
- Cultural context and usage examples
- Pronunciation guidance through audio
- Progressive difficulty based on conversation

---

*This project demonstrates the integration of conversational AI, text-to-speech technology, and web interfaces to create an engaging language learning experience.*

In [20]:
!uv pip install simpleaudio

Using Python 3.12.11 environment at: /Users/daniela_veloz/Workspace/llm_portfolio/.venv
Audited 1 package in 2ms


In [21]:
import os
import gradio as gr

## LLM Client

In [22]:
from openai import OpenAI

class LLMClient:
    """
    A client for interacting with language models through OpenAI's API.
    Supports both OpenAI's hosted models and local models via custom base URLs.
    """
    
    def __init__(self, model, base_url=None):
        """
        Initialize the LLM client.
        
        Args:
            model (str): The model name to use (e.g., 'gpt-4o-mini', 'gpt-3.5-turbo')
            base_url (str, optional): Custom base URL for local models. If provided,
                                     the model parameter is used as the API key.
        """
        self.model = model
        if base_url:
            self.openai = OpenAI(base_url=base_url, api_key=model)
        else:
            self.openai = OpenAI()

    def generate_text(self, user_prompt, system_prompt="", history=None) -> str:
        """
        Generate text response using the configured language model.
        
        Args:
            user_prompt (str): The user's input message
            system_prompt (str, optional): System instructions to guide the model's behavior
            history (List[Dict[str, str]], optional): Conversation history in OpenAI format.
                                                     Each dict should have 'role' and 'content' keys.
        
        Returns:
            str: The model's text response
        """
        if history is None:
            history = []
        
        messages = [{"role": "system", "content": system_prompt}] + history + [{"role": "user", "content": user_prompt}]
        
        response = self.openai.chat.completions.create(
            model=self.model,
            messages=messages,
        )
        return response.choices[0].message.content

    def generate_speech(self, voice, message, tts_model="tts-1"):
        """
        Generate speech audio from text using OpenAI's TTS API.
        
        Args:
            voice (str): The voice to use for speech synthesis.
                        Options: alloy, echo, fable, onyx, nova, shimmer
            message (str): The text to convert to speech
            tts_model (str, optional): The speech model to use. Defaults to "tts-1".
                        The chat model stored in self.model is not used here, because
                        chat models such as 'gpt-4o-mini' are not valid speech models.
            
        Returns:
            HttpxBinaryResponseContent: Raw audio content in MP3 format
            
        Raises:
            OpenAIError: If the API request fails
        """
        response = self.openai.audio.speech.create(
            model=tts_model,
            voice=voice,
            input=message
        )
        return response
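
To make the payload shape concrete, the following sketch reproduces the message assembly from `generate_text` as a standalone function that runs without an API call. The name `build_messages` and the sample history are illustrative assumptions, not part of the notebook.

```python
def build_messages(user_prompt, system_prompt="", history=None):
    """Assemble the chat payload: system prompt, prior turns, then the new user turn."""
    history = history or []
    return (
        [{"role": "system", "content": system_prompt}]
        + list(history)
        + [{"role": "user", "content": user_prompt}]
    )

# Hypothetical earlier exchange in OpenAI message format
history = [
    {"role": "user", "content": "How do I say 'hello'?"},
    {"role": "assistant", "content": "You say 'hola'."},
]
messages = build_messages("And 'goodbye'?", "You are a Spanish tutor.", history)
for m in messages:
    print(m["role"], "->", m["content"])
```

The system prompt always comes first and the new user message always comes last, so the model sees earlier turns as context for the latest question.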

## Load OpenAI keys

In [23]:
# Initialization
from dotenv import load_dotenv

load_dotenv(override=True)

openai_api_key = os.getenv('OPENAI_API_KEY')
if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")

MODEL = "gpt-4o-mini"
openai = OpenAI()

OpenAI API Key exists and begins sk-proj-


## TextToSpeech module

In [24]:
from pydub import AudioSegment
from io import BytesIO
import tempfile
import os
import simpleaudio as sa

class TextToSpeech:
    """
    A text-to-speech client that converts text to audio using OpenAI's TTS API
    and plays it through the system's audio device.
    """
    
    def __init__(self, model="tts-1", voice="onyx"):
        """
        Initialize the TextToSpeech client.
        
        Args:
            model (str, optional): The TTS model to use. Defaults to "tts-1"
            voice (str, optional): The voice to use for speech synthesis. 
                                  Options: alloy, echo, fable, onyx, nova, shimmer
        """
        # Create a dedicated OpenAI client for TTS (always uses official OpenAI API)
        self.openai = OpenAI()
        self.model = model
        self.voice = voice
    
    def speak(self, message):
        """
        Convert text to speech and play the audio immediately.
        
        This method:
        1. Sends text to OpenAI's TTS API
        2. Converts the MP3 response to WAV format
        3. Creates a temporary audio file
        4. Plays the audio through the system speakers
        5. Cleans up temporary files
        
        Args:
            message (str): The text to convert to speech
            
        Note:
            Errors from the API call, audio conversion, or playback are caught
            and printed rather than raised, so a failure here does not interrupt the chat.
        """
        temp_file_name = None
        try:
            response = self.openai.audio.speech.create(
                model=self.model,
                voice=self.voice,
                input=message
            )

            audio_stream = BytesIO(response.content)
            audio = AudioSegment.from_file(audio_stream, format="mp3")

            # Create the temporary WAV file in the current working directory
            with tempfile.NamedTemporaryFile(suffix=".wav", delete=False, dir=".") as temp_audio_file:
                temp_file_name = temp_audio_file.name
                audio.export(temp_file_name, format="wav")

            # Load and play audio using simpleaudio
            wave_obj = sa.WaveObject.from_wave_file(temp_file_name)
            play_obj = wave_obj.play()
            play_obj.wait_done()  # Wait for playback to finish
            
        except Exception as e:
            # Covers API, decoding, and playback failures, not only audio device problems
            print(f"Text-to-speech error: {e}")
        finally:
            # Clean up the temporary file
            if temp_file_name and os.path.exists(temp_file_name):
                os.remove(temp_file_name)

def talker(message):
    """
    Backward compatibility function for legacy code.
    
    Args:
        message (str): The text to convert to speech
    """
    tts = TextToSpeech()
    tts.speak(message)
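
The temporary-file lifecycle in `speak` (create, use, always delete) can be tried without an API key or audio device. The sketch below is a stand-in that assumes a short silent WAV written with the standard-library `wave` module in place of the exported TTS audio; `write_silent_wav` and `temp_wav_round_trip` are hypothetical names.

```python
import os
import tempfile
import wave

def write_silent_wav(path, n_frames=1600, sample_rate=16000):
    """Write a short mono 16-bit silent WAV file (stand-in for exported TTS audio)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * n_frames)

def temp_wav_round_trip():
    """Create a temp WAV, read its frame count, and always delete the file."""
    temp_file_name = None
    try:
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
            temp_file_name = tmp.name
        # The handle is closed here, so the file can be reopened by name
        write_silent_wav(temp_file_name)
        with wave.open(temp_file_name, "rb") as wav_file:
            frames = wav_file.getnframes()
        return frames, temp_file_name
    finally:
        # Runs on success and on error, mirroring the cleanup in speak()
        if temp_file_name and os.path.exists(temp_file_name):
            os.remove(temp_file_name)

frames, path = temp_wav_round_trip()
print(frames, os.path.exists(path))
```

Closing the temporary file before writing to it by name is a deliberate choice in this sketch, since reopening a still-open temporary file is not reliable on every platform.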

## Spanish Tutor Module

In [25]:
class SpanishTutor:
    """
    An AI-powered Spanish language tutor that provides interactive Spanish lessons
    through conversational chat with audio feedback.
    
    This class combines OpenAI's language model capabilities with text-to-speech
    functionality to create an immersive Spanish learning experience. The tutor
    provides explanations in English while teaching Spanish vocabulary, grammar,
    and conversation skills.
    
    Attributes:
        system_message (str): System prompt that defines the tutor's behavior and
                             instructions for providing Spanish language instruction
        llm_client (LLMClient): Client for generating text responses using language models
        tts_client (TextToSpeech): Client for converting text responses to spoken audio
    """

    system_message = """
    You are a language tutor who helps students whose main language is English learn Spanish. Because English is the students' main language, provide your answers in English and use Spanish as needed.
    """

    def __init__(self, model="gpt-4o-mini"):
        """
        Initialize the Spanish tutor with specified language model.
        
        Args:
            model (str, optional): The language model to use for generating responses.
                                  Defaults to "gpt-4o-mini"
        """
        self.llm_client = LLMClient(model=model)
        self.tts_client = TextToSpeech()
    
    def chat(self, message, history):
        """
        Process a chat message and return a tutoring response with audio feedback.
        
        This method handles the main interaction flow:
        1. Sends the user's message and conversation history to the language model
        2. Generates an appropriate Spanish tutoring response
        3. Converts the response to audio and plays it
        4. Returns the text response for display
        
        Args:
            message (str): The user's input message or question about Spanish
            history (List[Dict[str, str]]): Conversation history in OpenAI format.
                                           Each dict contains 'role' and 'content' keys
        
        Returns:
            str: The tutor's response containing Spanish instruction, translations,
                 explanations, or corrections
        """
        reply = self.llm_client.generate_text(
            user_prompt=message,
            system_prompt=self.system_message,
            history=history
        )
        
        # Play the response as audio
        self.tts_client.speak(reply)
        
        return reply
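
Depending on the Gradio version, the `history` passed to `chat` may carry extra keys beyond `role` and `content`. The sketch below assumes keys such as `metadata` and `options` for illustration and shows one way to reduce each entry to the two keys described in the docstring; `to_openai_history` is a hypothetical helper, not part of the notebook.

```python
def to_openai_history(history):
    """Keep only the 'role' and 'content' keys of each history entry."""
    return [{"role": turn["role"], "content": turn["content"]} for turn in history]

# Assumed shape of a UI-provided history with extra keys
gradio_style_history = [
    {"role": "user", "content": "Hola", "metadata": None, "options": None},
    {"role": "assistant", "content": "¡Hola! ¿Cómo estás?", "metadata": {"title": None}},
]
clean = to_openai_history(gradio_style_history)
print(clean)
```

Passing the cleaned list as `history` keeps the request payload limited to fields the chat API is documented to accept.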

## Testing the Spanish Tutor with gpt-4o-mini

In [26]:
MODEL = "gpt-4o-mini"

# Create Spanish tutor instance
spanish_tutor = SpanishTutor(MODEL)

def chat(message, history):
    """Backward compatibility function for Gradio interface"""
    return spanish_tutor.chat(message, history)

In [None]:
chat("""how could I say "I'm hungry""", [])

## Test Spanish Tutor with Gradio interface

In [None]:
gr.ChatInterface(fn=chat, type="messages").launch()