BradySteele/audio_chatbot

This simple text generation chatbot is designed to accept user voice input and provide a generated audio response.

The backbone of this code is the Microsoft DialoGPT-medium pretrained response generation model.

In addition, we use the Google Speech Recognition API for speech input, Google Text-to-Speech for audio output, and Hugging Face Transformers for natural language processing.

First, inside a speech_to_text method, the application accepts user audio input through the device microphone:

r = sr.Recognizer()  # Create a new speech recognition instance
with sr.Microphone() as mic:  # Create a new microphone instance using the device mic
    r.adjust_for_ambient_noise(mic, duration=1)  # Dynamic energy threshold adjustment
    audio = r.listen(mic)  # Record from the microphone into an AudioData instance

Next, we pass the AudioData to the Google Speech Recognition API so it can be parsed into text:

self.text = r.recognize_google(audio)
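
Put together, a minimal standalone test of this capture-and-transcribe flow looks roughly like the following (a sketch for experimentation, not the repository's exact speech_to_text method):

import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as mic:
    r.adjust_for_ambient_noise(mic, duration=1)
    audio = r.listen(mic)
print(r.recognize_google(audio))  # Prints the transcribed text; raises an exception on failure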

In a text_to_speech method, we convert the generated text response into an .mp3 file for audio playback using Google Text-to-Speech. We use only the basic capabilities of gTTS, without preprocessor functions:

speaker = gTTS(text=text, lang="en", slow=False)
speaker.save("robot.mp3")
os.system("afplay robot.mp3") # Code here is OS specific, 'afplay' for MacOS and 'start' for Windows
os.remove("robot.mp3") # Remove the audio file after output to regain storage
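
If playback needs to work on more than one operating system, one option (a sketch, not part of the repository) is to branch on the platform before issuing the command:

import os
import platform

from gtts import gTTS

def text_to_speech(text):
    speaker = gTTS(text=text, lang="en", slow=False)
    speaker.save("robot.mp3")
    system = platform.system()
    if system == "Darwin":        # macOS
        os.system("afplay robot.mp3")
    elif system == "Windows":
        os.system("start robot.mp3")
    else:                         # Linux and others; assumes mpg123 is installed
        os.system("mpg123 robot.mp3")
    os.remove("robot.mp3")        # Remove the audio file after playback to regain storage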

Beginning the conversation depends on matching the robot name variable in the initial statement, similar to saying "Hey Alexa":

return True if self.name in text.lower() else False
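
In context, that expression is typically the body of a small wake-word method on the bot class (a sketch; the wake_up method name is illustrative):

def wake_up(self, text):
    # The conversation only begins once the bot's name appears in the transcribed text
    return self.name in text.lower()

The main loop can then guard response generation with a check such as if ai.wake_up(ai.text): before any reply is produced.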

If Google Speech Recognition cannot understand the audio or reach the service, the attempt fails gracefully and the bot does not proceed:

except sr.UnknownValueError:
  print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
  print("Could not request results from Google Speech Recognition service; {0}".format(e))

Inside main we instantiate the Microsoft DialoGPT-medium model through a Transformers conversational pipeline and explicitly enable tokenizer parallelism:

ai = ChatBot(name="robot")
nlp = transformers.pipeline("conversational", model="microsoft/DialoGPT-medium")
os.environ["TOKENIZERS_PARALLELISM"] = "true"

In the full code you can see static responses used for development and testing. When none of those special cases (which can be removed) apply, the Microsoft model is used by calling our conversational pipeline with an explicit end-of-string token ID:

chat = nlp(transformers.Conversation(ai.text), pad_token_id=50256)
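
The pipeline returns the updated Conversation object, so the reply still has to be extracted before it can be spoken; with this conversational pipeline the most recent reply is the last entry in generated_responses (a sketch, assuming text_to_speech is a method on the ChatBot class):

chat = nlp(transformers.Conversation(ai.text), pad_token_id=50256)
res = chat.generated_responses[-1]  # Most recent model reply as plain text
ai.text_to_speech(res)              # Speak the reply through the text_to_speech method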

NOTE: The hard-coded token ID can be abstracted away by adding the AutoTokenizer class, encoding the user input plus the EOS token, and returning PyTorch tensor objects instead of Python integers:

tokenizer = AutoTokenizer.from_pretrained(<model>)
...
foo = tokenizer.encode(input("bar") + tokenizer.eos_token, return_tensors='pt')
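
As a fuller sketch of that route (the model name, variable names, and max_length here are illustrative assumptions), pairing the tokenizer with AutoModelForCausalLM lets you call generate directly and decode the reply instead of going through the pipeline:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

user_text = "Hello, how are you?"  # Stand-in for the transcribed user input

# Encode the user input plus the end-of-string token as a PyTorch tensor
input_ids = tokenizer.encode(user_text + tokenizer.eos_token, return_tensors="pt")

# Generate a reply, padding with the tokenizer's EOS token ID rather than a hard-coded 50256
output_ids = model.generate(input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

# Decode only the newly generated tokens back into text
reply = tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)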
