# SpeechToSQL 
- Author: [Dooil Kwak](https://github.com/back2zion)
- Design: 
- Peer Review : [Ilgyun Jeong](https://github.com/johnny9210), [Jaehun Choi](https://github.com/ash-hun) 
- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/19-Cookbook/01-SQL/02-SpeechToSQL.ipynb) [![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/19-Cookbook/01-SQL/02-SpeechToSQL.ipynb)


## Overview

The Speech to SQL system converts spoken questions into SQL queries. It records audio from a microphone, transcribes it with Whisper, and passes the text to a LangChain SQL agent, enabling hands-free database interaction.

**Key Features**:

- **Real-time Speech Processing**: 
  Captures and processes voice input in real-time, supporting various microphone configurations.

- **Accurate Speech Recognition**: 
  Uses the Whisper model (through `faster-whisper`) for speech-to-text conversion; it works best with clearly spoken English queries.

- **SQL Query Generation**: 
  Transforms natural language questions into properly formatted SQL queries.

**System Requirements**:
- Python 3.8 or higher
- Working microphone
- Recommended: CUDA-capable GPU for faster processing

### Table of Contents 

- [Overview](#overview)
- [Installation and Setup](#installation-and-setup)
- [Audio Device Configuration](#audio-device-configuration)
- [Speech Recognition Setup](#speech-recognition-setup)
- [Basic Usage](#basic-usage)
- [Example Queries](#example-queries)
- [Advanced Usage and Troubleshooting](#advanced-usage-and-troubleshooting)

### References
- [Faster Whisper Documentation > Python API Reference](https://github.com/guillaumekln/faster-whisper)
- [SoundDevice Documentation > Python API Reference](https://python-sounddevice.readthedocs.io/en/0.4.6/)
- [Wavio Documentation > Audio File Handling](https://github.com/WarrenWeckesser/wavio)
- [NumPy Documentation > Audio Processing](https://numpy.org/doc/stable/reference/routines.html#audio-processing)

## Installation and Setup

Before we begin, let's set up our environment with all necessary packages and configurations.

### Required Packages
First, we'll install the required Python packages for speech processing and SQL conversion. They work in both Jupyter Notebook and Google Colab:
- `sounddevice`: For audio capture
- `numpy`: For audio data processing
- `wavio`: For audio file handling
- `faster-whisper`: For speech recognition
- `torch`: Imported alongside the speech stack (useful for checking CUDA availability)
- `langchain`, `langchain-community`, `langchain-openai`, and `openai`: For the SQL agent and the LLM behind it
- `sqlalchemy`: For the database connection
- `python-dotenv`: For loading environment variables such as the OpenAI API key

In [1]:
# Install required packages
%pip install --quiet langchain langchain-community langchain-openai openai sqlalchemy python-dotenv sounddevice numpy wavio faster-whisper torch
print("✓ Packages installed successfully!")

Note: you may need to restart the kernel to use updated packages.
✓ Packages installed successfully!




That's it! The `--quiet` flag keeps the output clean and simple. Now let's verify that everything is ready to use:

In [36]:
try:
    import sounddevice as sd
    import numpy as np
    import torch
    from faster_whisper import WhisperModel
    print("✓ All set! Let's move on to the next step.")
except ImportError as e:
    # Show which import failed so the missing package is easy to identify
    print(f"✗ Something's missing ({e}). Please try running the installation command again.")

✓ All set! Let's move on to the next step.
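
If the check above fails, it helps to know exactly which modules are missing. The following is a minimal sketch, not part of the original notebook: `find_missing_modules` and `REQUIRED_MODULES` are hypothetical names, and the check only tests whether each module can be located, not whether it imports cleanly.

```python
import importlib.util

# Import names (not pip names) used by the cells in this tutorial.
REQUIRED_MODULES = ["sounddevice", "numpy", "wavio", "faster_whisper", "torch", "dotenv"]

def find_missing_modules(module_names):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules can be located.")
```

Any name it reports can then be installed individually with `%pip install`, keeping in mind that the pip name may differ from the import name (for example `python-dotenv` provides `dotenv`).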


In [37]:
# Import necessary libraries
import sounddevice as sd
import numpy as np
import wavio
import os
import time
from faster_whisper import WhisperModel
import torch
from dotenv import load_dotenv

# Load environment variables
load_dotenv(override=True)

True

## Audio Device Configuration

A crucial first step is selecting the correct audio input device. Let's identify and configure your system's microphone.

**Note**: You'll see a filtered list of input devices only, making it easier to choose the correct microphone.

In [38]:
def list_audio_input_devices():
    """Display only audio input devices with clear formatting."""
    print("\nAvailable Audio Input Devices:")
    print("=" * 50)
    
    input_devices = []
    for idx, device in enumerate(sd.query_devices()):
        if device['max_input_channels'] > 0:  # Only show input devices
            # Skip duplicate devices (different APIs)
            device_name = device['name'].split(',')[0]  # Remove API information
            if not any(d['name'].startswith(device_name) for d in input_devices):
                input_devices.append({
                    'index': idx,
                    'name': device_name,
                    'channels': device['max_input_channels'],
                    'sample_rate': device['default_samplerate']
                })
                
                print(f"Device {idx}: {device_name}")
                print(f"  Channels: {device['max_input_channels']}")
                print(f"  Sample Rate: {device['default_samplerate']}Hz")
                print("-" * 50)
    
    return input_devices

# List available input devices
input_devices = list_audio_input_devices()


Available Audio Input Devices:
Device 0: Microsoft 사운드 매퍼 - Input
  Channels: 2
  Sample Rate: 44100.0Hz
--------------------------------------------------
Device 1: 머리에 거는 수화기(2- Kwak’s AirPods)
  Channels: 1
  Sample Rate: 44100.0Hz
--------------------------------------------------
Device 6: 주 사운드 캡처 드라이버
  Channels: 2
  Sample Rate: 44100.0Hz
--------------------------------------------------
Device 18: 머리에 거는 수화기 (@System32\drivers\bthhfenum.sys
  Channels: 1
  Sample Rate: 8000.0Hz
--------------------------------------------------
Device 21: 헤드셋 마이크 (@System32\drivers\bthhfenum.sys
  Channels: 1
  Sample Rate: 8000.0Hz
--------------------------------------------------


In [39]:
def test_audio_device(device_index, duration=1):
    """
    Test if an audio device works properly.
    Args:
        device_index (int): The index of the device to test
        duration (float): Test duration in seconds
    Returns:
        bool: True if device works, False otherwise
    """
    try:
        print(f"Testing audio device {device_index}...")
        # Entering the context starts the stream; keep it open for `duration` seconds
        with sd.InputStream(device=device_index, channels=1, samplerate=16000):
            sd.sleep(int(duration * 1000))
            print("✓ Device initialized successfully")
            return True
    except Exception as e:
        print(f"✗ Device test failed: {e}")
        return False

### Audio Device Selection and Testing

After viewing the available devices above, you'll need to select and test your microphone. Choose a device with input channels (marked as "Channels: X" where X > 0).

**Important Tips**:
- Choose a device with a clear, specific name (avoid generic names like "Default Input")
- Prefer devices with 1 or 2 input channels
- If using a USB microphone, make sure it's properly connected
- Test the device before proceeding to actual recording
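
To follow the first tip, you could pick a device by a fragment of its name instead of taking the first entry. This is a small sketch under assumptions: `pick_input_device` is a hypothetical helper, and the sample list only imitates the dictionaries returned by `list_audio_input_devices()`.

```python
def pick_input_device(devices, keyword):
    """Return the first device whose name contains `keyword` (case-insensitive), or None."""
    keyword = keyword.lower()
    for device in devices:
        if keyword in device["name"].lower():
            return device
    return None

# Made-up entries shaped like the output of list_audio_input_devices()
sample_devices = [
    {"index": 0, "name": "Default Input", "channels": 2, "sample_rate": 44100.0},
    {"index": 3, "name": "USB Microphone", "channels": 1, "sample_rate": 48000.0},
]

chosen = pick_input_device(sample_devices, "usb")
print(chosen["index"] if chosen else "No matching device")
```

In the notebook you would pass the real `input_devices` list and then hand the chosen index to `test_audio_device`.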

In [40]:
# Let's test the first available input device as default
if input_devices:
    default_device = input_devices[0]
    print(f"\nTesting default device: {default_device['name']}")
    if test_audio_device(default_device['index']):
        # Set as default device
        os.environ['DEFAULT_DEVICE'] = str(default_device['index'])
        os.environ['SAMPLE_RATE'] = str(int(default_device['sample_rate']))
        print(f"\nDefault device set to: {default_device['name']}")
        print(f"Sample rate: {default_device['sample_rate']}Hz")
    else:
        print("\nPlease select a different device and try again.")
else:
    print("\nNo input devices found. Please check your microphone connection.")


Testing default device: Microsoft 사운드 매퍼 - Input
Testing audio device 0...
✓ Device initialized successfully

Default device set to: Microsoft 사운드 매퍼 - Input
Sample rate: 44100.0Hz
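
The cell above stores the chosen device index and sample rate in the `DEFAULT_DEVICE` and `SAMPLE_RATE` environment variables. A later cell could read them back roughly as follows. This is only a sketch: `load_audio_settings` is a hypothetical helper, and the 16000 Hz fallback is an assumption chosen to match the rate used by the recorder later in this tutorial.

```python
import os

def load_audio_settings(env=None):
    """Read the device index and sample rate saved earlier, with fallbacks."""
    env = os.environ if env is None else env
    device_raw = env.get("DEFAULT_DEVICE")
    # None lets sounddevice fall back to the system default input device
    device = int(device_raw) if device_raw else None
    # Assumed fallback: 16000 Hz, the rate the recorder in this tutorial uses
    sample_rate = int(env.get("SAMPLE_RATE") or 16000)
    return device, sample_rate

print(load_audio_settings({"DEFAULT_DEVICE": "0", "SAMPLE_RATE": "44100"}))
print(load_audio_settings({}))
```

Passing a plain dictionary, as in the two calls above, makes the helper easy to try without touching the real environment.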


## Speech Recognition Setup

Now let's set up the speech recognition component using the Whisper model. A CUDA GPU can speed up transcription, but the code below always runs the `base` model on the CPU with `int8` quantization for better compatibility.

**Note**: The first time you run this, it will download the Whisper model. This might take a few minutes depending on your internet connection.

In [41]:
def initialize_whisper():
    """Initialize the Whisper model."""
    try:
        # Always use CPU for better compatibility
        model = WhisperModel(
            model_size_or_path="base",  # Using 'base' model for faster CPU processing
            device="cpu",
            compute_type="int8"  # Optimized for CPU
        )
        print("✓ Whisper model initialized successfully")
        return model
    except Exception as e:
        print(f"✗ Error initializing Whisper model: {str(e)}")
        print("Please make sure all packages are installed correctly.")
        return None

model = initialize_whisper()

✓ Whisper model initialized successfully
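
If you do have a CUDA-capable GPU, the device and compute type could be chosen at runtime instead of being fixed to the CPU. The sketch below is an assumption-laden illustration: `choose_whisper_config` is a hypothetical helper, and a working GPU setup also depends on the CUDA libraries that `faster-whisper` needs being installed.

```python
def choose_whisper_config(cuda_available):
    """Map GPU availability to WhisperModel keyword arguments."""
    if cuda_available:
        # float16 on CUDA follows the example in the faster-whisper README
        return {"device": "cuda", "compute_type": "float16"}
    # Same CPU settings as initialize_whisper() above
    return {"device": "cpu", "compute_type": "int8"}

print(choose_whisper_config(False))

# Possible usage in the notebook (not executed here):
# import torch
# from faster_whisper import WhisperModel
# model = WhisperModel("base", **choose_whisper_config(torch.cuda.is_available()))
```

Keeping the decision in a plain function makes it easy to override, for example to force the CPU path when the GPU libraries are missing.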


## Basic Usage

Let's implement the core components for speech-to-SQL conversion. The system will:
1. Record audio from your microphone
2. Convert speech to text
3. Transform the text into SQL queries
4. Create a sample SQLite database in the `data` folder

With the model initialized, we start with recording. First, we'll create our `AudioRecorder` class:

In [48]:
# 1. Record audio from your microphone
import sounddevice as sd
import numpy as np
import wavio
import tempfile

class AudioRecorder:
    def __init__(self):
        self._samplerate = 16000
        self.audio_data = []
        self.recording = False
        self.stream = None

    def start_recording(self, device_id=0):
        """Start recording audio"""
        try:
            self.stream = sd.InputStream(
                channels=1,
                samplerate=self._samplerate,
                callback=self._audio_callback,
                device=device_id
            )
            self.audio_data = []
            self.recording = True
            self.stream.start()
            return True
        except Exception as e:
            print(f"Recording failed: {str(e)}")
            return False

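    def _audio_callback(self, indata, frames, time_info, status):
        """Called by sounddevice for each audio block; stores a copy of the data."""
        if status:
            print(f"Audio stream status: {status}")
        if self.recording:
            self.audio_data.append(indata.copy())

    def stop_and_process(self):
        """Stop recording and save the audio to a temporary WAV file; return its path."""
        if self.stream:
            self.recording = False
            self.stream.stop()
            self.stream.close()
            self.stream = None
            if len(self.audio_data) > 0:
                audio = np.concatenate(self.audio_data)
                # Reserve a file name first, then write once the handle is closed
                with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmpfile:
                    wav_path = tmpfile.name
                wavio.write(wav_path, audio, self._samplerate, sampwidth=2)
                return wav_path
        return None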
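
Before transcribing, it can be useful to confirm that the saved file actually contains sound. The following sketch uses only the standard library and assumes a 16-bit mono PCM WAV file like the one the recorder writes. `wav_stats` and `write_test_tone` are hypothetical helpers, and the 0.01 silence threshold is an arbitrary assumption.

```python
import array
import math
import os
import sys
import tempfile
import wave

def wav_stats(path):
    """Return (duration_seconds, peak) for a 16-bit mono PCM WAV; peak is in [0, 1]."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        frames = wav.readframes(wav.getnframes())
        duration = wav.getnframes() / wav.getframerate()
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    peak = max((abs(s) for s in samples), default=0) / 32768
    return duration, peak

def write_test_tone(path, rate=16000, seconds=0.5, amplitude=0.5):
    """Write a 440 Hz sine tone so the example runs without a microphone."""
    count = int(rate * seconds)
    samples = array.array(
        "h",
        (int(amplitude * 32767 * math.sin(2 * math.pi * 440 * i / rate)) for i in range(count)),
    )
    if sys.byteorder == "big":
        samples.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(samples.tobytes())

with tempfile.TemporaryDirectory() as folder:
    tone_path = os.path.join(folder, "tone.wav")
    write_test_tone(tone_path)
    duration, peak = wav_stats(tone_path)
    # Assumed threshold: treat anything below 1% of full scale as silence
    print(f"duration={duration:.2f}s, peak={peak:.3f}, silent={peak < 0.01}")
```

In the notebook, `wav_stats` could be called on the path returned by `stop_and_process()`; a near-zero peak suggests the wrong input device was selected or the microphone is muted.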

Now it's time to convert speech to text.

In [49]:
# 2. Convert speech to text

from faster_whisper import WhisperModel

def initialize_whisper():
    """Initialize the Whisper model"""
    return WhisperModel("base", device="cpu", compute_type="int8")

class AudioProcessor:
    def __init__(self, model):
        self.model = model

    def transcribe_audio(self, audio_file):
        """Transcribe audio to text using Whisper"""
        try:
            segments, _ = self.model.transcribe(audio_file)
            # Segment texts usually start with a space; strip them before joining
            return " ".join(segment.text.strip() for segment in segments)
        except Exception as e:
            print(f"Transcription failed: {str(e)}")
            return None
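
When a transcription looks wrong, printing each segment with its timestamps can show where recognition went off track. This sketch assumes segment objects with `start`, `end`, and `text` attributes, as `faster-whisper` returns; `format_segments` is a hypothetical helper, and stand-in objects are used so the example runs without a model.

```python
from types import SimpleNamespace

def format_segments(segments):
    """Render segments as '[start -> end] text' lines, skipping empty ones."""
    lines = []
    for segment in segments:
        text = segment.text.strip()
        if text:
            lines.append(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {text}")
    return lines

# Stand-in objects that mimic the attributes of transcription segments
fake_segments = [
    SimpleNamespace(start=0.0, end=1.5, text=" Show all customers"),
    SimpleNamespace(start=1.5, end=2.0, text="  "),
]
for line in format_segments(fake_segments):
    print(line)
```

With a real model, the first value returned by `model.transcribe(audio_file)` could be passed in directly.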

Next, transform the transcribed text into SQL with a LangChain SQL agent.

In [55]:
# 3. Transform the text into SQL queries

from langchain_openai import OpenAI
from langchain_community.agent_toolkits import create_sql_agent
from langchain_community.utilities import SQLDatabase
from langchain.agents.agent_types import AgentType
from sqlalchemy import create_engine

class SQLQueryConverter:
    """Handle SQL query conversion using LangChain"""
    def __init__(self, database_path):
        engine = create_engine(f'sqlite:///{database_path}')
        db = SQLDatabase(engine)
        llm = OpenAI(temperature=0)  # Reads OPENAI_API_KEY from the environment
        self.agent = create_sql_agent(
            llm=llm,
            db=db,
            verbose=True,
            agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION
        )

    def convert_to_sql(self, query_text):
        """Run the SQL agent on a natural-language question and return its answer"""
        try:
            # invoke() returns a dict; the agent's final answer is under "output"
            result = self.agent.invoke({"input": query_text})["output"]
            if isinstance(result, list):  # Fallback in case rows come back as a list
                return result  # Return SQL execution results
            else:
                # The agent normally returns its final answer as a string
                return f"SQL Query: {result}"

        except Exception as e:
            return f"Error processing query: {str(e)}"

Create database in 'data' folder.

In [60]:
# 4. Create database in 'data' folder

import sqlite3

def create_database():
    os.makedirs('data', exist_ok=True)
    db_path = os.path.join('data', 'sales_database.db')
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Create tables
    cursor.execute('''
    CREATE TABLE IF NOT EXISTS customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT,
        email TEXT
    )
    ''')

    cursor.execute('''
    CREATE TABLE IF NOT EXISTS sales (
        sale_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        product_name TEXT,
        sale_amount REAL,
        sale_date DATE,
        FOREIGN KEY(customer_id) REFERENCES customers(customer_id)
    )
    ''')

    # Insert sample data (OR IGNORE keeps repeated runs from failing on duplicate keys)
    cursor.executemany('INSERT OR IGNORE INTO customers VALUES (?, ?, ?)', [
        (1, 'John Doe', 'john@example.com'),
        (2, 'Jane Smith', 'jane@example.com')
    ])

    cursor.executemany('INSERT OR IGNORE INTO sales VALUES (?, ?, ?, ?, ?)', [
        (1, 1, 'Laptop', 1200.50, '2023-01-15'),
        (2, 1, 'Monitor', 300.75, '2023-02-20'),
        (3, 2, 'Smartphone', 800.00, '2023-03-10')
    ])

    conn.commit()
    conn.close()
    return db_path
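
To get a feel for what the agent should produce, it helps to run a query by hand against the same sample data. The sketch below rebuilds the two tables in an in-memory SQLite database (an assumption made so it leaves `data/sales_database.db` untouched) and answers a question like "total sales per customer". The SQL is one plausible formulation, not necessarily what the agent would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    product_name TEXT,
    sale_amount REAL,
    sale_date DATE,
    FOREIGN KEY(customer_id) REFERENCES customers(customer_id)
);
INSERT INTO customers VALUES
    (1, 'John Doe', 'john@example.com'),
    (2, 'Jane Smith', 'jane@example.com');
INSERT INTO sales VALUES
    (1, 1, 'Laptop', 1200.50, '2023-01-15'),
    (2, 1, 'Monitor', 300.75, '2023-02-20'),
    (3, 2, 'Smartphone', 800.00, '2023-03-10');
""")

# Total sales per customer, highest first
query = """
SELECT c.name, SUM(s.sale_amount) AS total
FROM customers AS c
JOIN sales AS s ON s.customer_id = c.customer_id
GROUP BY c.customer_id
ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

Comparing hand-written results like these with the agent's answer is a simple way to sanity-check the pipeline.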

Putting it all together.

In [61]:

def process_speech_to_sql():
    print("\n=== Starting Speech to SQL Process ===")
    print("Recording will start in:")
    for i in range(3, 0, -1):
        print(f"{i}...")
        time.sleep(1)

    recorder = AudioRecorder()
    if recorder.start_recording():
        print("\nSpeak your query now... (5 seconds)")
        time.sleep(5)
        audio_file = recorder.stop_and_process()
        print(f"Saved audio file: {audio_file}")

        if audio_file:
            model = initialize_whisper()
            processor = AudioProcessor(model)
            print("Processing audio...")
            text = processor.transcribe_audio(audio_file)
            print(f"Transcribed Text: {text}")

            db_path = create_database()
            converter = SQLQueryConverter(db_path)
            sql_query = converter.convert_to_sql(text)
            print(f"Generated SQL Query: {sql_query}")
            return sql_query

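Let's try it out! Run this command to start recording. In the sample run shown here, the transcription came back empty, so the agent received an empty question and only inspected the schema of the `customers` and `sales` tables. If you see the same, check your microphone and try again.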

In [None]:
query_text = process_speech_to_sql()


=== Starting Speech to SQL Process ===
Recording will start in:
3...
2...
1...

Speak your query now... (5 seconds)
Saved audio file: C:\Users\Public\Documents\ESTsoft\CreatorTemp\tmpumwr2th_.wav
Processing audio...
Transcribed Text: 


Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")


Action: sql_db_list_tables
Action Input: 
customers, sales
I should query the schema of the customers table to see what columns I can use.
Action: sql_db_schema
Action Input: customers

CREATE TABLE customers (
	customer_id INTEGER, 
	name TEXT, 
	email TEXT, 
	PRIMARY KEY (customer_id)
)

/*
3 rows from customers table:
customer_id	name	email
1	John Doe	john@example.com
2	Jane Smith	jane@example.com
*/
I should query the schema of the sales table to see what columns I can use.
Action: sql_db_schema
Action Input: sales

CREATE TABLE sales (
	sale_id INTEGER, 
	customer_id INTEGER, 
	product_name TEXT, 
	sale_amount REAL, 
	sale_date DATE, 
	PRIMARY KEY (sale_id), 
	FOREIGN KEY(customer_id) REFERENCES customers (customer_id)
)

/*
3 rows from sales table:
sale_id	customer_id	product_name	sale_amount	sale_date
1	1	Laptop	1200.5	2023-01-15
2	1	Monitor	300.75	2023-02-20
3	2	Smartphone	800.0	2

## Example Queries

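Here are some example queries you can try with the system. The sample database created above contains only the `customers` and `sales` tables, so queries 3 to 5 assume a richer schema (inventory, regions, employees) that you would need to provide yourself:

1. "Show sales figures for the last quarter"
2. "Find top 10 customers by revenue"
3. "List all products with inventory below 100 units"
4. "Calculate total sales by region"
5. "Get employee performance metrics for 2023"

These queries illustrate the kinds of SQL operations (filtering, ranking, and aggregation) the system is meant to handle.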

## Advanced Usage and Troubleshooting

### Common Issues and Solutions

1. **No audio device found**
   - Check if your microphone is properly connected
   - Try unplugging and reconnecting your microphone
   - Verify microphone permissions in your OS settings

2. **Poor recognition accuracy**
   - Speak clearly and at a moderate pace
   - Minimize background noise
   - Keep the microphone at an appropriate distance

3. **Device initialization errors**
   - Try selecting a different audio device
   - Restart your Python kernel
   - Check if another application is using the microphone
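
The advice to try a different audio device can be automated. The sketch below is one possible approach: `first_working_device` is a hypothetical helper that walks the device list and returns the first entry accepted by a tester function such as `test_audio_device` from earlier. A stand-in tester is used here so the example runs without a microphone.

```python
def first_working_device(devices, tester):
    """Return the first device for which tester(index) is truthy, or None."""
    for device in devices:
        if tester(device["index"]):
            return device
    return None

# Made-up device list and a stand-in tester that only accepts index 6
candidate_devices = [
    {"index": 0, "name": "Generic Mapper"},
    {"index": 6, "name": "USB Microphone"},
]

def fake_tester(index):
    return index == 6

working = first_working_device(candidate_devices, fake_tester)
print(working["name"] if working else "No working device found")
```

In the notebook, the call would look like `first_working_device(input_devices, test_audio_device)`.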

----