BradySteele/audio_chatbot

This simple text generation chatbot is designed to accept user voice input and provide a generated audio response.

The backbone of this code is the Microsoft DialoGPT-medium pretrained response generation model.

In addition, we use the Google Speech Recognition API for speech input, Google Text-to-Speech for audio output, and Hugging Face Transformers for natural language processing.

First, inside a speech_to_text method, the application accepts user audio input through the device microphone:

r = sr.Recognizer()  # Create a new speech recognition instance
with sr.Microphone() as mic:  # Create a new microphone instance using the device mic
    r.adjust_for_ambient_noise(mic, duration=1)  # Dynamic energy threshold adjustment
    audio = r.listen(mic)  # Record from the microphone into an AudioData instance

Next, we pass the AudioData to the Google Speech Recognition API so it can be parsed into text:

self.text = r.recognize_google(audio)
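
Put together, a minimal standalone test of this capture-and-transcribe flow looks roughly like the following (a sketch for experimentation, not the repository's exact speech_to_text method):

import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as mic:
    r.adjust_for_ambient_noise(mic, duration=1)
    audio = r.listen(mic)
print(r.recognize_google(audio))  # Prints the transcribed text; raises an exception on failure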

In a text_to_speech method, we convert the generated text response into an .mp3 file for audio playback using Google Text-to-Speech. We use only the basic capabilities of gTTS, without preprocessor functions:

speaker = gTTS(text=text, lang="en", slow=False)
speaker.save("robot.mp3")
os.system("afplay robot.mp3") # Code here is OS specific, 'afplay' for MacOS and 'start' for Windows
os.remove("robot.mp3") # Remove the audio file after output to regain storage
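
If playback needs to work on more than one operating system, one option (a sketch, not part of the repository) is to branch on the platform before issuing the command:

import os
import platform

from gtts import gTTS

def text_to_speech(text):
    speaker = gTTS(text=text, lang="en", slow=False)
    speaker.save("robot.mp3")
    system = platform.system()
    if system == "Darwin":        # macOS
        os.system("afplay robot.mp3")
    elif system == "Windows":
        os.system("start robot.mp3")
    else:                         # Linux and others; assumes mpg123 is installed
        os.system("mpg123 robot.mp3")
    os.remove("robot.mp3")        # Remove the audio file after playback to regain storage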

Beginning the conversation depends on matching the robot name variable in the initial statement, similar to saying "Hey Alexa":

return True if self.name in text.lower() else False
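
In context, that expression is typically the body of a small wake-word method on the bot class (a sketch; the wake_up method name is illustrative):

def wake_up(self, text):
    # The conversation only begins once the bot's name appears in the transcribed text
    return self.name in text.lower()

The main loop can then guard response generation with a check such as if ai.wake_up(ai.text): before any reply is produced.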

If Google Speech Recognition cannot understand the audio or reach the service, the attempt fails gracefully and the bot does not proceed:

except sr.UnknownValueError:
  print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
  print("Could not request results from Google Speech Recognition service; {0}".format(e))

Inside main we instantiate the Microsoft DialoGPT-medium model through a Transformers conversational pipeline and explicitly enable tokenizer parallelism:

ai = ChatBot(name="robot")
nlp = transformers.pipeline("conversational", model="microsoft/DialoGPT-medium")
os.environ["TOKENIZERS_PARALLELISM"] = "true"

In the full code you can see static responses used for development and testing. When none of those special cases (which can be removed) apply, the Microsoft model is used by calling our conversational pipeline with an explicit end-of-string token ID:

chat = nlp(transformers.Conversation(ai.text), pad_token_id=50256)
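
The pipeline returns the updated Conversation object, so the reply still has to be extracted before it can be spoken; with this conversational pipeline the most recent reply is the last entry in generated_responses (a sketch, assuming text_to_speech is a method on the ChatBot class):

chat = nlp(transformers.Conversation(ai.text), pad_token_id=50256)
res = chat.generated_responses[-1]  # Most recent model reply as plain text
ai.text_to_speech(res)              # Speak the reply through the text_to_speech method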

NOTE: The hard-coded token ID can be abstracted away by adding the AutoTokenizer class, encoding the user input plus the EOS token, and returning PyTorch tensor objects instead of Python integers:

tokenizer = AutoTokenizer.from_pretrained(<model>)
...
foo = tokenizer.encode(input("bar") + tokenizer.eos_token, return_tensors='pt')
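
As a fuller sketch of that route (the model name, variable names, and max_length here are illustrative assumptions), pairing the tokenizer with AutoModelForCausalLM lets you call generate directly and decode the reply instead of going through the pipeline:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

user_text = "Hello, how are you?"  # Stand-in for the transcribed user input

# Encode the user input plus the end-of-string token as a PyTorch tensor
input_ids = tokenizer.encode(user_text + tokenizer.eos_token, return_tensors="pt")

# Generate a reply, padding with the tokenizer's EOS token ID rather than a hard-coded 50256
output_ids = model.generate(input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

# Decode only the newly generated tokens back into text
reply = tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)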
