## Imports

At the start of each section, I also link to the documentation for the libraries used in it. The documentation is always worth consulting! With libraries like these, it takes some practice before you understand what they can and can't do, so looking things up will be the default way to learn anything specific to your needs.

In [1]:
# A dependency called pyaudio can be troublesome to install.
# You may have to use pip install pipwin followed by pipwin install pyaudio in administrator mode.
# Read more about pipwin before using it, to understand why it exists and what the risks are with
# using its unofficial release downloads.

# Speech Recognition
import speech_recognition as sr

# Text to Speech
from gtts import gTTS
from pydub import AudioSegment
from pydub.playback import play

# Straightforward Chatbot
from transformers import pipeline, Conversation

## Speech Recognition

speech_recognition documentation: https://pypi.org/project/SpeechRecognition/1.2.3/

In [2]:
recognizer = sr.Recognizer()
text = ""

with sr.Microphone(device_index=0) as mic:
    # Calibrate the energy threshold on background noise before recording, not after.
    recognizer.adjust_for_ambient_noise(mic)
    print("Listening...")
    audio = recognizer.listen(mic)
    
    try: 
        text = recognizer.recognize_google(audio)
    except sr.UnknownValueError: 
        text = "Speech not understood"
        
print(text)

Listening...
hello


## Text to Speech

gTTS and pydub documentation: https://gtts.readthedocs.io/en/latest/, https://github.com/jiaaro/pydub

Here we save the audio to a file and then play that file back. The chatbot_recordings folder must already exist, otherwise saving fails. If you don't want to keep the files, import os and call os.remove(filepath) after playback.

In [3]:
text_object = gTTS(text="Hello, how are you doing today?", lang="en", slow=False)
text_object.save("chatbot_recordings/example.mp3")

audio = AudioSegment.from_mp3("chatbot_recordings/example.mp3")
play(audio)
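
The save, play, and delete steps described above can be sketched with the standard library alone. This is only an illustration, not the notebook's own code: `synthesize_and_play` is a hypothetical helper, and `save` and `play_file` are stand-in callables (assumed to wrap `gTTS(...).save` and the pydub playback) that are passed in so the file handling can be read on its own.

```python
import os
import tempfile


def synthesize_and_play(save, play_file, keep=False, directory="chatbot_recordings"):
    """Write audio to a file, play it, and delete it unless keep is True.

    save and play_file are hypothetical callables that each take a file path.
    """
    # Create the folder if needed, since writing into a missing folder fails.
    os.makedirs(directory, exist_ok=True)
    # mkstemp picks a unique name, so repeated calls never overwrite each other.
    handle, path = tempfile.mkstemp(suffix=".mp3", dir=directory)
    os.close(handle)
    try:
        save(path)
        play_file(path)
    finally:
        if not keep:
            os.remove(path)
    return path
```

With the imports from the first cell, a call could look like `synthesize_and_play(gTTS(text="Hello", lang="en").save, lambda p: play(AudioSegment.from_mp3(p)))`.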

## Straightforward Chatbot

transformers documentation: https://huggingface.co/docs/transformers/index

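Source code for the pipeline used below: https://huggingface.co/transformers/v4.11.3/_modules/transformers/pipelines/conversational.html

This method simply downloads your chosen model from Hugging Face's collection of pretrained models and lets you use it directly. You can then use its output however you want. The conversational pipeline and the Conversation class have been removed from recent transformers releases, so this section needs an older version such as the v4.11.3 linked above.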

In [4]:
conversational_pipeline = pipeline("conversational", model="microsoft/DialoGPT-medium") 

conv = Conversation("Do you like the color yellow?")
print(conversational_pipeline([conv], pad_token_id=50256))

conv.add_user_input("Why do you think so?")
print(conversational_pipeline([conv], pad_token_id=50256))

All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at microsoft/DialoGPT-medium.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


Conversation id: fa3dc1ef-0212-4061-a389-6e45ba0dc1f1 
user >> Do you like the color yellow? 
bot >> I do! 

Conversation id: fa3dc1ef-0212-4061-a389-6e45ba0dc1f1 
user >> Do you like the color yellow? 
bot >> I do! 
user >> Why do you think so? 
bot >> I like it because it's a nice color. 



In [5]:
conv2 = Conversation("What do you think about the weather?")
conv.add_user_input("Do you have another favorite color?")
print(conversational_pipeline([conv, conv2], pad_token_id=50256))

conv2.add_user_input("Should I play outside then?")
unprocessed = str(conversational_pipeline([conv2], pad_token_id=50256))

[Conversation id: fa3dc1ef-0212-4061-a389-6e45ba0dc1f1 
user >> Do you like the color yellow? 
bot >> I do! 
user >> Why do you think so? 
bot >> I like it because it's a nice color. 
user >> Do you have another favorite color? 
bot >> I have a few. 
, Conversation id: 6d8d6986-c522-4a70-a57f-8e0ecf667817 
user >> What do you think about the weather? 
bot >> It's nice. 
]


In [6]:
responses = [resp for resp in unprocessed.split("\n") if resp != ""]
print(responses)

['Conversation id: 6d8d6986-c522-4a70-a57f-8e0ecf667817 ', 'user >> What do you think about the weather? ', "bot >> It's nice. ", 'user >> Should I play outside then? ', "bot >> I'm not sure. "]


In [7]:
responses[-1].split(" >> ")[-1]

"I'm not sure. "

## Application

In [8]:
recognizer = sr.Recognizer()

def listen(): 
    with sr.Microphone() as mic: 
        # Calibrate on background noise before recording, not after.
        recognizer.adjust_for_ambient_noise(mic)
        print("Listening...")
        audio = recognizer.listen(mic)

        try: 
            text = recognizer.recognize_google(audio)
        except sr.UnknownValueError: 
            text = "[BAD]"
            
    return text
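
The "[BAD]" value returned by `listen()` lends itself to a small retry wrapper. The sketch below rests on assumptions: `listen_with_retries` and its `on_retry` callback are hypothetical names, and the listening function is passed in as an argument so the logic can be tried without a microphone.

```python
def listen_with_retries(listen_fn, max_attempts=3, on_retry=None, bad="[BAD]"):
    """Call listen_fn until it returns something other than the sentinel.

    Returns the recognized text, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        text = listen_fn()
        if text != bad:
            return text
        # Only prompt again if another attempt is actually coming.
        if on_retry is not None and attempt < max_attempts:
            on_retry(attempt)  # e.g. speak("Sorry, could you repeat that?")
    return None


# Simulated microphone: two failed recognitions, then a success.
fake_results = iter(["[BAD]", "[BAD]", "hello"])
print(listen_with_retries(lambda: next(fake_results)))
```

Capping the number of attempts keeps the chatbot from asking for a repeat forever in a noisy room.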

In [9]:
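count = 0  # number of replies saved so far; gives each file a unique name

def speak(text):
    """Convert text to speech, save it as an mp3 file, and play it back."""
    global count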
    
    text_object = gTTS(text=text, lang="en", slow=False)
    text_object.save(f"chatbot_recordings/reply_{count}.mp3")

    audio = AudioSegment.from_mp3(f"chatbot_recordings/reply_{count}.mp3")
    play(audio)
    
    count +=1
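
The `count` global only exists to give each reply its own file name. One alternative, sketched here with a hypothetical name (`reply_paths` is not part of the notebook), is a generator built on `itertools.count`, which keeps the counter out of the global namespace.

```python
import itertools
import os


def reply_paths(directory="chatbot_recordings", prefix="reply"):
    """Yield reply_0.mp3, reply_1.mp3, ... inside directory, without end."""
    for index in itertools.count():
        yield os.path.join(directory, f"{prefix}_{index}.mp3")


paths = reply_paths()
print(next(paths))  # first file name
print(next(paths))  # second file name
```

Inside `speak`, each call would then take `next(paths)` as the file to save and play.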

In [10]:
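def get_response(history): 
    """Return the text after the last " >> " on the final non-empty line of the printed conversation."""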
    responses = [resp for resp in history.split("\n") if resp != ""]
    return responses[-1].split(" >> ")[-1]
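
`get_response` relies on the last line of the printed conversation being the bot's reply. A slightly more defensive variant is sketched below. It is a hypothetical helper (`last_bot_reply` is not part of transformers) and assumes the printed format shown earlier, where each turn is a line starting with `user >> ` or `bot >> `.

```python
def last_bot_reply(transcript):
    """Return the text of the most recent 'bot >> ' line, or None if there is none."""
    prefix = "bot >> "
    # Walk backwards so the newest reply is found first.
    for line in reversed(transcript.splitlines()):
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


example = (
    "Conversation id: 6d8d6986-c522-4a70-a57f-8e0ecf667817 \n"
    "user >> Should I play outside then? \n"
    "bot >> I'm not sure. \n"
)
print(last_bot_reply(example))
```

Unlike splitting on " >> ", this keeps a reply intact even if the reply itself contains " >> ", and it never returns a user line.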

In [15]:
conversational_pipeline = pipeline("conversational", model="microsoft/DialoGPT-medium") 
conversation = Conversation()
history = []

print("-----Starting up Chatbot-----")

run = True
while run: 
    
    text = listen()
    
    if text == "[BAD]": 
        speak("Sorry, could you repeat that?")
        continue
        
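    elif "talk about something else" in text: 
        # Archive the current conversation and start a fresh one.
        # The trigger phrase itself is still sent to the new conversation below.
        history.append(conversation)
        conversation = Conversation()
        
    elif any(stop in text for stop in ["bye", "exit", "close"]): 
        # Archive the conversation and leave the loop after one last reply.
        # That reply goes to a fresh conversation that is not added to history.
        history.append(conversation)
        conversation = Conversation()
        run = False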
        
    print(f"user >> {text}")
    conversation.add_user_input(text)   
    
    resp = get_response(str(conversational_pipeline([conversation], pad_token_id=50256)))
    speak(resp)
    print(f"bot >> {resp}")
        
        
print("-----Shutting down Chatbot-----")
print(history)

All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at microsoft/DialoGPT-medium.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


-----Starting up Chatbot-----
Listening...
Listening...
user >> hey how are you feeling today
bot >> I'm feeling pretty good, thanks for asking. 
Listening...
user >> let's talk about something else
bot >> I'm not sure what you're talking about. 
Listening...
user >> I just got ice cream today
bot >> I'm not sure what you're talking about. 
Listening...
user >> you like ice cream
bot >> I like ice cream 
Listening...
user >> it's creep okay bye
bot >> I'm not going anywhere. 
-----Shutting down Chatbot-----
[Conversation id: 7f67e602-ff40-4e52-8a9f-8ce278f9504d 
user >> hey how are you feeling today 
bot >> I'm feeling pretty good, thanks for asking. 
, Conversation id: 689ecf46-ca18-4e58-ab20-ba6fe8daa710 
user >> let's talk about something else 
bot >> I'm not sure what you're talking about. 
user >> I just got ice cream today 
bot >> I'm not sure what you're talking about. 
user >> you like ice cream 
bot >> I like ice cream 
]
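
The loop above decides what to do by checking whether a phrase occurs anywhere in the recognized text, so a word such as "closet" would also shut the chatbot down. A possible refinement, sketched with hypothetical names (`classify`, `STOP_WORDS`, and the return labels are not part of the notebook), matches whole words instead.

```python
import re

STOP_WORDS = {"bye", "goodbye", "exit", "close"}
RESET_PHRASE = "talk about something else"


def classify(text):
    """Return 'reset', 'stop', or 'chat' for one recognized utterance."""
    lowered = text.lower()
    if RESET_PHRASE in lowered:
        return "reset"
    # Split into whole words so "closet" does not count as "close".
    words = set(re.findall(r"[a-z']+", lowered))
    if words & STOP_WORDS:
        return "stop"
    return "chat"


for utterance in ["let's talk about something else", "okay bye", "it is in the closet"]:
    print(utterance, "->", classify(utterance))
```

Because whole-word matching no longer catches "bye" inside "goodbye", that word is listed explicitly in this sketch.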
