<a href="https://colab.research.google.com/github/MK316/Spring2024/blob/main/DLTESOL/APP_pronunciation_checker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🐳 **Pronunciation checker (Sample)**

- Pronunciation correction in a language learning application is an advanced task. It typically uses speech recognition to transcribe the learner's spoken input and then compares the transcript with the expected text. Here's a basic approach using Python:

  + Speech Recognition: Use a library like speech_recognition to capture and transcribe spoken input.
  + Comparison with Expected Pronunciation: Compare the transcribed text with the correct text.
  + Feedback to User: Provide feedback based on the comparison.

+ SpeechRecognition captures and transcribes audio. python-Levenshtein measures how different two sequences are (in our case, the transcribed and expected text).
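
To make the comparison step concrete, here is a minimal pure-Python sketch of the kind of score `Levenshtein.ratio` reports. It assumes the ratio is the normalized insertion/deletion similarity, 2 × LCS / (len(a) + len(b)), where LCS is the length of the longest common subsequence. The helper names `lcs_length` and `similarity_ratio` are made up for illustration; the notebook itself calls the library's own implementation.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]; assumed formula: 2 * LCS / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty strings are treated as identical
    return 2 * lcs_length(a, b) / total


# "kitten" vs "sitting" share the subsequence "ittn" (length 4), so 8 / 13
print(round(similarity_ratio("kitten", "sitting"), 3))  # 0.615
```

A score of 1.0 means the strings are identical and 0.0 means they share no characters in order, which is why the notebook can use a single threshold such as 0.8.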

# [1] Installation

In [None]:
%%capture
!pip install SpeechRecognition
!pip install python-Levenshtein
!pip install gTTS pydub
from gtts import gTTS
from IPython.display import Audio, display
from pydub import AudioSegment

# Define TTS to generate sample audio

**Note:**

+ gTTS saves audio in MP3 format, which SpeechRecognition's `AudioFile` cannot read (it accepts WAV, AIFF, and FLAC).
+ The functions below therefore convert the MP3 to WAV with pydub.

In [None]:
def tts(text, lang='en'):
    # Synthesize speech with gTTS (MP3), then convert it to WAV
    speech = gTTS(text=text, lang=lang)
    speech.save("output.mp3")
    AudioSegment.from_mp3("output.mp3").export("output.wav", format="wav")
    return Audio("output.wav")

def ktts(text, lang="ko"):
    # Same as tts(), but defaults to the Korean voice and writes k-output.*
    speech = gTTS(text=text, lang=lang)
    speech.save("k-output.mp3")
    AudioSegment.from_mp3("k-output.mp3").export("k-output.wav", format="wav")
    return Audio("k-output.wav")

In [None]:
# Generate audio of the English text read with the Korean voice (Korean accent).
text = "Upload your assignment to our E-learning website."
ktts(text)

In [None]:
#@markdown Provide expected text to evaluate the speech:
import speech_recognition as sr
from Levenshtein import ratio

def transcribe_audio(file_path):
    # Initialize the recognizer
    r = sr.Recognizer()

    # Open the audio file
    with sr.AudioFile(file_path) as source:
        print("Transcribing the audio file...")
        audio = r.record(source)  # Instead of listening, we now use record to capture the whole file

    # Use Google Web Speech API to transcribe; return None on failure
    try:
        return r.recognize_google(audio)
    except sr.UnknownValueError:
        print("Could not understand audio")
    except sr.RequestError as e:
        print("Could not request results; {0}".format(e))
    return None

def pronunciation_correction(expected_text, file_path):
    user_spoken_text = transcribe_audio(file_path)
    if user_spoken_text is None:
        # Do not score an error message as if it were the learner's speech
        return "Transcription failed, please try again."
    print("Transcribed Text: " + user_spoken_text)

    # Compare the spoken text with the expected text
    similarity = ratio(expected_text.lower(), user_spoken_text.lower())
    print(f"Similarity: {similarity}")
    if similarity > 0.8:  # You can adjust this threshold
        return "Good pronunciation!"
    else:
        return "Try again, make sure to pronounce clearly."



# Example Usage
correct_text = input("Please provide the expected text: ")
audio_file_path = "/content/k-output.wav"  # Replace with the path to your audio file
feedback = pronunciation_correction(correct_text, audio_file_path)
print(feedback)


# from google.colab import files
# uploaded = files.upload()
# # files.upload() returns a dict of {filename: bytes}; take the first filename
# audio_file_path = next(iter(uploaded))
# feedback = pronunciation_correction(correct_text, audio_file_path)
# print(feedback)


The revised code below works with an uploaded file.

In [None]:
#@markdown Type the text to refer to for speech recognition (expected speech)
from google.colab import files
import speech_recognition as sr
from Levenshtein import ratio

def transcribe_audio(file_path):
    # Initialize the recognizer
    r = sr.Recognizer()

    # Open the audio file
    with sr.AudioFile(file_path) as source:
        print("Transcribing the audio file...")
        audio = r.record(source)  # Instead of listening, we now use record to capture the whole file

    # Use Google Web Speech API to transcribe; return None on failure
    try:
        return r.recognize_google(audio)
    except sr.UnknownValueError:
        print("Could not understand audio")
    except sr.RequestError as e:
        print("Could not request results; {0}".format(e))
    return None

def pronunciation_correction(expected_text, file_path):
    user_spoken_text = transcribe_audio(file_path)
    if user_spoken_text is None:
        # Do not score an error message as if it were the learner's speech
        return "Transcription failed, please try again."
    print("Transcribed Text: " + user_spoken_text)

    # Compare the spoken text with the expected text
    similarity = ratio(expected_text.lower(), user_spoken_text.lower())
    print(f"Similarity: {similarity}")
    if similarity > 0.8:  # You can adjust this threshold
        return "Good pronunciation!"
    else:
        return "Try again, make sure to pronounce clearly."

# Example Usage
correct_text = input("Please provide the expected text: ")

uploaded = files.upload()  # This will prompt you to upload a file from your computer

# Assuming you uploaded a single file, extract the filename
file_name = next(iter(uploaded))
feedback = pronunciation_correction(correct_text, file_name)
print(feedback)


Gradio link

In [None]:
!pip install gradio

The Gradio app below works well (tested 2024.02.17).

In [None]:
import gradio as gr
import speech_recognition as sr
from Levenshtein import ratio
import tempfile
import numpy as np
import soundfile as sf

def transcribe_audio(file_info):
    r = sr.Recognizer()

    # With type="numpy", Gradio passes a (sample_rate, data) tuple:
    # file_info[0] is the sample rate, file_info[1] is the NumPy array
    sample_rate, data = file_info

    # Save the NumPy array to a temporary WAV file at its actual sample rate
    with tempfile.NamedTemporaryFile(delete=True, suffix=".wav") as tmpfile:
        sf.write(file=tmpfile.name, data=data, samplerate=sample_rate, format='WAV')

        with sr.AudioFile(tmpfile.name) as source:
            audio_data = r.record(source)

    # Return None on failure so an error message is never scored as speech
    try:
        return r.recognize_google(audio_data)
    except sr.UnknownValueError:
        print("Could not understand audio")
    except sr.RequestError as e:
        print(f"Could not request results; {e}")
    return None

def pronunciation_correction(expected_text, file_info):
    if file_info is None:
        return "Please upload an audio file.", 0.0
    user_spoken_text = transcribe_audio(file_info)
    if user_spoken_text is None:
        return "Transcription failed, please try again.", 0.0
    similarity = ratio(expected_text.lower(), user_spoken_text.lower())
    if similarity > 0.8:
        return "Good pronunciation!", similarity
    else:
        return "Try again, make sure to pronounce clearly.", similarity

iface = gr.Interface(
    fn=pronunciation_correction,
    inputs=[
        gr.Textbox(label="Expected Text"),
        gr.Audio(label="Upload Audio File", type="numpy")  # Specify type="numpy" to ensure file_info[1] is a NumPy array
    ],
    outputs=["text", "number"],
    title="Pronunciation Correction Tool"
)

iface.launch(debug=True)


## How it Works
This code uses Google's Web Speech API to transcribe the audio.
The `pronunciation_correction` function takes the expected text and the user's audio (a WAV file or a Gradio audio input), transcribes the speech, and compares the transcription to the expected text.
The similarity is measured using the Levenshtein ratio. If the similarity is above a threshold (0.8 in this notebook), the pronunciation is considered good.
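
One detail the comparison is sensitive to: the expected text usually contains punctuation, while a transcript often comes back without it, so the ratio can drop for reasons unrelated to pronunciation. A possible refinement, not part of the notebook's code, is to normalize both strings the same way before scoring. `normalize_text` below is a hypothetical helper sketched under that assumption.

```python
import re


def normalize_text(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = text.lower()
    # Keep word characters, whitespace, and apostrophes (for words like "don't")
    text = re.sub(r"[^\w\s']", " ", text)
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("Upload your assignment to our E-learning website."))
# upload your assignment to our e learning website
```

Both the expected text and the transcript would be passed through the same function before calling `ratio`, so that only differences in the words themselves affect the score.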

## Considerations
+ Accuracy: Speech recognition accuracy can vary based on accent, speech clarity, and background noise.
+ Language Support: Google's speech recognition supports multiple languages, but you need to specify the language if it's not English, for example `r.recognize_google(audio, language="ko-KR")`.
+ Privacy: Inform users that their speech will be sent to Google's servers for transcription.
+ Improvement: For more advanced pronunciation analysis, you might consider phonetic comparison or integrating specialized APIs.
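
As a small step toward the more detailed feedback mentioned under Improvement, the sketch below reports which expected words were missing or heard differently, instead of a single score. It swaps in the standard library's `difflib.SequenceMatcher` for a word-level comparison; this is an illustrative alternative, not what the notebook does, and `word_feedback` is a hypothetical name. It still compares transcripts, not sounds, so it is not a phonetic analysis.

```python
from difflib import SequenceMatcher


def word_feedback(expected: str, spoken: str) -> list:
    """Describe word-level differences between expected text and transcript."""
    exp_words = expected.lower().split()
    spk_words = spoken.lower().split()
    matcher = SequenceMatcher(a=exp_words, b=spk_words, autojunk=False)

    problems = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        exp_part = " ".join(exp_words[i1:i2])
        spk_part = " ".join(spk_words[j1:j2])
        if tag == "replace":
            problems.append(f"expected '{exp_part}', heard '{spk_part}'")
        elif tag == "delete":
            problems.append(f"missing '{exp_part}'")
        elif tag == "insert":
            problems.append(f"extra '{spk_part}'")
    return problems


print(word_feedback(
    "upload your assignment to our website",
    "upload your assessment to our website",
))
# ["expected 'assignment', heard 'assessment'"]
```

An empty list means the transcript matched the expected text word for word.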