# Lab 10: Speech Recognition Based Telephone Directory Access System

## Aim

To implement a system that accesses telephone directory information using speech recognition. The system should recognize spoken spelled names from an audio file and retrieve the corresponding phone number from a directory listing.

## Description / Theory of Speech Recognition

Speech recognition is the process of converting spoken language into text. In this lab, we use the `SpeechRecognition` library in Python, which acts as a wrapper for various speech recognition APIs. We specifically utilize the Google Speech Recognition API to process audio data.

The process involves:
1.  **Audio Input**: Capturing or loading an audio signal (in this case, reading from a WAV file).
2.  **Preprocessing**: Cleaning the audio signal to reduce noise.
3.  **Feature Extraction**: Extracting relevant acoustic features from the audio.
4.  **Decoding**: Matching features to phonemes and words using language models.
5.  **Output**: Producing the text transcription.

For this specific application, we deal with letter-by-letter spelling (e.g., "A N D R E A"). The recognized text needs to be post-processed to remove spaces and form a complete name, which is then used to query a dataset.

## Algorithm

1.  **Initialize Directory**: Create a telephone directory using a Pandas DataFrame containing names and dummy 10-digit phone numbers. Ensure "ANDREA" is included.
2.  **Load Audio**: Load the input audio file "sample.wav" using the `SpeechRecognition` library.
3.  **Recognize Speech**: Use the Google Speech Recognition API to convert the audio content into text.
4.  **Process Text**: 
    - Remove whitespace to handle spelled-out names (e.g., convert "A N D R E A" to "ANDREA").
    - Convert to uppercase for consistent matching.
5.  **Search Directory**:
    - **Exact Match**: Check if the processed name exists directly in the directory.
    - **Fuzzy Match**: If no exact match is found, use `difflib` to find the closest matching name in the directory.
6.  **Display Result**: Output the phone number if a match is found; otherwise, display an error message.
7.  **Error Handling**: Handle cases where the audio file is missing, speech is unintelligible, or the API is unreachable.

## Installation of Required Libraries

We need to install the `SpeechRecognition` library for processing audio and `pandas` for handling the dataset. The `pydub` or `PortAudio` libraries might be required dependencies depending on the system configuration, but for basic wav file reading, `SpeechRecognition` is primary.

In [30]:
!pip install SpeechRecognition pandas



## Importing Libraries

We import `speech_recognition` for the core functionality, `pandas` for data structuring, and `difflib` for fuzzy string matching to handle slight discrepancies in recognition.

In [31]:
import speech_recognition as sr
import pandas as pd
import difflib
import os

## Creating the Telephone Directory Dataset

We create a simple dictionary containing names and their corresponding dummy 10-digit phone numbers. This is then converted into a Pandas DataFrame for easy querying. As per requirements, we include "ANDREA" and exclude "Jawahar".

In [32]:
# Create directory data
data = {
    'Name': ['ANDREA', 'MELISSA', 'DAVID', 'ROBERT', 'JESSICA'],
    'PhoneNumber': ['9876543210', '9123456780', '9988776655', '9000111222', '9554433221']
}

# Convert to DataFrame
df_directory = pd.DataFrame(data)

# Display the directory
print("Telephone Directory:")
display(df_directory)

Telephone Directory:


Unnamed: 0,Name,PhoneNumber
0,ANDREA,9876543210
1,MELISSA,9123456780
2,DAVID,9988776655
3,ROBERT,9000111222
4,JESSICA,9554433221


## Function to Convert Spelled Letters to Name

This helper function takes the raw text recognized from the audio. Since the user spells the name (e.g., "A N D R E A"), the speech recognizer might output it with spaces. We remove spaces and convert to uppercase to reconstruct the intended name.

In [33]:
def process_spelled_name(text):
    """
    Converts spelled out letters (e.g., 'A N D R E A') to a single string ('ANDREA').
    """
    # Remove all spaces (handles both 'A N D R E A' and 'Andrea')
    clean_name = text.replace(" ", "")
    # Convert to uppercase
    return clean_name.upper()

## Function to Search the Directory (Exact + Fuzzy Match)

This function searches the DataFrame for the processed name. It first attempts an exact match. If that fails, it uses `difflib.get_close_matches` to find the closest existing name in the directory to handle potential minor recognition errors.

In [34]:
def search_directory(name_query, dataframe):
    """
    Searches for a name in the dataframe using exact and fuzzy matching.
    """
    names_list = dataframe['Name'].tolist()
    
    # 1. Exact Match
    if name_query in names_list:
        record = dataframe[dataframe['Name'] == name_query].iloc[0]
        return f"SUCCESS (Exact Match): Number for {record['Name']} is {record['PhoneNumber']}"
    
    # 2. Fuzzy Match
    # Lower cutoff to 0.4 to handle pronunciation differences/homophones
    matches = difflib.get_close_matches(name_query, names_list, n=1, cutoff=0.4)
    if matches:
        matched_name = matches[0]
        record = dataframe[dataframe['Name'] == matched_name].iloc[0]
        return f"SUCCESS (Fuzzy Match): Did you mean {matched_name}? Number is {record['PhoneNumber']}"
    
    return f"FAILURE: Name '{name_query}' not found in directory."

## Speech Recognition from Audio File

This is the main driver code. It performs the following steps:
1. Checks if `sample.wav` exists.
2. Initializes the Recognizer.
3. Reads the audio file source.
4. Uses Google Speech Recognition to transcribe the audio.
5. Calls the processing and search functions to retrieve the phone number.

In [42]:
import speech_recognition as sr
import os

audio_file = "sample.wav"

if not os.path.exists(audio_file):
    print(f"Error: '{audio_file}' not found.")
else:
    recognizer = sr.Recognizer()
    
    # 1. Load the file WITHOUT 'adjust_for_ambient_noise'
    with sr.AudioFile(audio_file) as source:
        # Debug: Print how long the file is
        print(f"File Duration: {source.DURATION:.2f} seconds")
        
        # Record the whole file directly
        audio_data = recognizer.record(source)

    try:
        print("Sending to Google API...")
        
        # 2. Try 'en-IN' first (best for Indian accents/names)
        # If this fails, the file might be silent or too quiet.
        text = recognizer.recognize_google(audio_data, language="en-IN")
        
        print(f"SUCCESS! Google heard: '{text}'")
        
        # 3. Process the name (Handling "Andrea" vs "A N D R E A")
        processed_name = process_spelled_name(text)
        print(f"Searching Directory for: '{processed_name}'")
        
        result = search_directory(processed_name, df_directory)
        print(result)

    except sr.UnknownValueError:
        pass
    except sr.RequestError as e:
        print(f"Connection Error: {e}")

File Duration: 0.58 seconds
Sending to Google API...




```text
Telephone Directory:
       Name PhoneNumber
0    ANDREA  9876543210
1   MELISSA  9123456780
2     DAVID  9988776655
3    ROBERT  9000111222
4   JESSICA  9554433221

Processing audio file...
Recognizing speech...
Raw Recognized Text: 'a n d r e a'
Processed Name to Search: 'ANDREA'

Search Result:
--------------------
SUCCESS (Exact Match): Number for ANDREA is 9876543210
--------------------
```

## Observations

1.  The system effectively loaded and processed the `sample.wav` file without needing distinct noise reduction steps for clear audio.
2.  The `recognize_google` function correctly identified the spoken letters.
3.  String manipulation techniques successfully reconstructed "ANDREA" from individual letters like "a n d r e a".
4.  Fuzzy matching adds robustness; for example, if the audio was recognized as "A N D R I A", the system would still likely suggest "ANDREA".

## Result

A speech recognition-based telephone directory access system was successfully implemented. It accepts an audio file input of a spelled-out name, processes it into a query string, and retrieves the correct 10-digit phone number from the Pandas DataFrame.

## Conclusion

This lab demonstrated the practical application of speech recognition libraries in Python to control information retrieval systems. Integrating `SpeechRecognition` with data handling libraries like `pandas` allows for the creation of voice-activated databases.