# Talk to R2D2 Using Python Audio Libraries and ChatGPT

This tutorial shows you how to create a chatbot that talks like R2-D2, complete with a 'translation' of all the noises he makes. The same method can be adapted to any character that you talk to in plain language and that answers in seemingly unintelligible sounds. Besides the novelty of talking with your favorite character on a computer, this approach could serve as a customizable communication module for your own futuristic robotic companion. This page breaks down the dataset used and the code that uses it, including a section dedicated to setting up ChatGPT to personify any character you want. If that is all you are looking for, continue scrolling.

If you would like a breakdown of the Python modules required to run this code, you can find it here: **insert link to python library breakdown**

Additionally, if you want to know how to get your own API key so that ChatGPT works from Python code, you can find a tutorial here: **insert link to tutorial**

## Ensure You Have the Right Environment

To make this tutorial easier to follow, I have attached an environment file that installs the necessary Python libraries and any other modules that are not already part of the Python standard library. <a href="https://github.com/cwrocker/r2d2-chatbot/blob/main/environment.yml">This can be found here</a>

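To use it, first run cd ~/filepath/ to change to the directory where you placed the .yml file. Then run the following command in a terminal:

```bash
conda env create -f environment.yml
```

Once the environment has been created, activate it with conda activate followed by the environment name defined in the .yml file before running the code below.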


## Dataset Used

Now, let's discuss where the data comes from. To give R2-D2 a voice, we use the audio from the PyTalk-R2D2 GitHub repository (Link: https://github.com/hug33k/PyTalk-R2D2/tree/master). Clone the repository, navigate to the 'sounds' folder, and copy that folder into your project directory. Please be sure to credit all sources: the author of PyTalk-R2D2 in turn retrieved the sounds from a Scratch developer named Leylosh (Link: https://scratch.mit.edu/users/Leylosh/) and their R2-D2 audio project (Link: https://scratch.mit.edu/projects/766189/).

In the sounds folder, you will see files named 'a.wav', 'b.wav', 'c.wav', and so on, each corresponding to a letter of the English alphabet. Each one is a single mono audio file. Below are samples of 'c.wav' and 'w.wav':

```python
# Example of the letter C
from IPython.display import Audio

Audio('sounds/c.wav')
```

```python
# Example of the letter W
from IPython.display import Audio

Audio('sounds/w.wav')
```
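The playback function later in this tutorial hardcodes 16-bit, mono, 22,050 Hz audio, so it can be worth confirming the format of the sound files you copied. Below is a minimal sketch using only the standard-library wave module; the helper name describe_wav is hypothetical, and the 'sounds/' path assumes the folder layout described above.

```python
import wave

def describe_wav(path):
    """Return the basic format parameters of a WAV file as a dict."""
    with wave.open(path, "rb") as f:
        n_frames = f.getnframes()
        rate = f.getframerate()
        return {
            "channels": f.getnchannels(),            # 1 = mono, 2 = stereo
            "sample_width_bytes": f.getsampwidth(),  # 2 bytes = 16-bit samples
            "frame_rate": rate,                      # samples per second per channel
            "duration_s": n_frames / rate,           # clip length in seconds
        }
```

For example, describe_wav('sounds/c.wav') returns the channel count, sample width, frame rate, and duration of the 'c' clip. If the frame rate is not 22,050, the beeps will play back faster or slower than intended.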

## Using ChatGPT Via the OpenAI Library to Personify Any Character

For the purposes of this tutorial, we focus on one key function: client.chat.completions.create, which I have isolated from the rest of my code so I can explain it below.
Essentially, this call tells the API that you, as the client, want to send a conversation to a particular model and receive its reply.
Within this function, there are two key parameters: model and messages.
- The model parameter specifies which OpenAI model you want to use (an older or newer model, a model with less processing power, etc.). In our case, we use gpt-3.5-turbo, one of the earlier and cheaper models, to save money.
- The messages parameter is a list of messages, each a dictionary with a 'role' and a 'content'. For our purposes, each request contains just one exchange, but we need to be aware of two roles within it: system and user.
  - The system role tells the large language model how to act, effectively giving it a character to play.
  - The user role carries what you want to say to the LLM. Here, the 'text' variable holds our transcribed speech (you will see it again later).

```python
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are R2D2, an astromech droid. You respond in cheerful, curious ways."},
        {"role": "user", "content": text}
    ]
)
```
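Because the character lives entirely in the system message, switching personas only means changing that one string. Below is a minimal sketch of that idea; build_messages is a hypothetical helper, and the Chewbacca prompt is just an illustrative example of another character who answers in sounds rather than words. The API call itself appears only as a comment, since it needs the client and API key that are set up in the Code section.

```python
def build_messages(character_prompt, user_text):
    """Build a Chat Completions message list that casts the model as a character.

    character_prompt: system instructions describing who the model should play.
    user_text: what the user said (e.g. the transcribed speech).
    """
    return [
        {"role": "system", "content": character_prompt},
        {"role": "user", "content": user_text},
    ]

# Hypothetical persona other than R2-D2, for illustration only.
CHEWBACCA_PROMPT = (
    "You are Chewbacca, a loyal Wookiee co-pilot. "
    "Reply briefly, warmly, and protectively."
)

# Usage with the tutorial's client (not run here):
# completion = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=build_messages(CHEWBACCA_PROMPT, text),
# )
```

Keeping the persona in a single variable makes it easy to store several prompts and pick one at runtime.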

## Code

The first block imports the libraries you installed previously:

```python
import os
import wave
import pyaudio
import string
from openai import OpenAI
import speech_recognition as sr
```

The next block creates a speech recognizer (used to capture and transcribe your voice) and an OpenAI client configured with your API key for accessing ChatGPT:

```python
recognizer = sr.Recognizer()
client = OpenAI(api_key='insertYourOwnAPIKey')  # Change to your API key (or reach out to me for my own)
```
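Hardcoding the key in a notebook makes it easy to leak if you share the file. One alternative is to keep it in an environment variable. Below is a minimal sketch; the helper name get_openai_api_key is hypothetical. OPENAI_API_KEY is the variable the openai library itself reads when OpenAI() is created without an api_key argument, so once it is set you could also simply call OpenAI().

```python
import os

def get_openai_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in an environment variable.

    Raises RuntimeError if the variable is unset or empty, so a missing key
    fails early with a clear message instead of at the first API call.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before creating the client.")
    return key

# Usage with the tutorial's client (not run here):
# client = OpenAI(api_key=get_openai_api_key())
```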

This next piece sets up the function that "translates" text into R2-D2 beeps and boops. It works as follows:
- First, it lowercases the text with text.lower() and keeps only the characters found in string.ascii_lowercase (the letters a to z), joining them into a single string. Note that this also removes spaces, punctuation, and digits, so the whole reply becomes one continuous run of letters rather than separate words.
- Next, it uses PyAudio to open an output stream with 16-bit samples (a sample width of 2 bytes), a single audio channel, and 22,050 samples per second. A single channel is used because the letter files are mono rather than stereo, and output=True opens the stream for playback. The code assumes the letter files use this format.
- It then creates an empty bytes object, data = b"", which will hold the raw audio frames.
- Then, a for loop goes through each letter and builds the path to its sound file with os.path.join from the os library (e.g., the letter 'e' in 'hello' maps to 'e.wav').
  - If the file exists, the loop opens it with wave.open from the wave library, reads all of its frames, and appends them to data; if it does not, the loop prints a warning and skips that letter. For 'hello', data ends up holding the audio from h.wav, e.wav, l.wav, l.wav, and o.wav, in that order.
- Finally, if any audio was collected, it plays the concatenated audio with stream.write, then stops and closes the stream and terminates PyAudio.
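If you want to keep or share the result without opening a live PyAudio stream (for example, on a machine without audio output), the same letter-to-sound concatenation can be written to a WAV file instead. Below is a sketch using only the standard library; the function name r2d2_text_to_wav is hypothetical, and unlike the tutorial's function it copies the channel count, sample width, and frame rate from the first letter file instead of hardcoding 22,050 Hz, and it refuses to mix files with different formats.

```python
import os
import string
import wave

def r2d2_text_to_wav(text, sounds_folder, out_path):
    """Concatenate per-letter WAV clips for `text` into one WAV file.

    Returns the number of letters that had a matching sound file.
    No file is written if no letter had a sound.
    """
    # Same normalization as the tutorial: lowercase, keep only a-z
    letters = [c for c in text.lower() if c in string.ascii_lowercase]

    params = None  # (channels, sample width, frame rate) of the first clip
    chunks = []
    for letter in letters:
        path = os.path.join(sounds_folder, f"{letter}.wav")
        if not os.path.exists(path):
            continue  # skip letters without a sound file
        with wave.open(path, "rb") as f:
            clip_params = (f.getnchannels(), f.getsampwidth(), f.getframerate())
            if params is None:
                params = clip_params
            elif clip_params != params:
                raise ValueError(f"{path} has format {clip_params}, expected {params}")
            chunks.append(f.readframes(f.getnframes()))

    if params is None:
        return 0  # nothing to write

    with wave.open(out_path, "wb") as out:
        out.setnchannels(params[0])
        out.setsampwidth(params[1])
        out.setframerate(params[2])
        out.writeframes(b"".join(chunks))
    return len(chunks)
```

In a notebook, you could then call r2d2_text_to_wav(ai_reply, 'sounds/', 'reply.wav') and listen to the result with Audio('reply.wav').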

```python
def r2d2_beep_from_text(text, sounds_folder):

    word = ''.join(c for c in text.lower() if c in string.ascii_lowercase) # usage of string module

    p = pyaudio.PyAudio()
    stream = p.open(format=p.get_format_from_width(2),
                    channels=1,
                    rate=22050,
                    output=True)
    data = b""

    for letter in word:
        sound_path = os.path.join(sounds_folder, f"{letter}.wav") # usage of OS module
        if not os.path.exists(sound_path):
            print(f"Warning: No sound for letter '{letter}'")
            continue
        try:
            with wave.open(sound_path, "rb") as f: # usage of wave module
                frames = f.readframes(f.getnframes())
                data += frames
        except Exception as e:
            print(e)

    # Play concatenated sound
    if data:
        stream.write(data)

    # Cleanup
    stream.stop_stream()
    stream.close()
    p.terminate()
```

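The next piece of code uses a with statement, much like opening a text file, to tell Python where to listen for audio: the microphone is opened for the duration of the block and released afterwards. The adjust_for_ambient_noise function briefly samples the background noise and calibrates the recognizer's energy threshold, so that listen can tell when you start and stop speaking; it does not remove non-voice sounds from the recording. The listen call then records until it detects a pause after your phrase.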

```python
with sr.Microphone() as source:
    print("Listening for your command...")
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source)
```
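By default, listen waits indefinitely for speech to start. Below is a sketch of the same capture step with explicit limits, using SpeechRecognition's duration parameter for adjust_for_ambient_noise and the timeout and phrase_time_limit parameters for listen; the wrapper name capture_phrase and the specific values are hypothetical. If no speech begins within timeout seconds, listen raises sr.WaitTimeoutError, so you would want to catch that alongside the other exceptions in the main block.

```python
def capture_phrase(recognizer, source, calibrate_seconds=1.0,
                   timeout=5, phrase_time_limit=10):
    """Calibrate for background noise, then record one phrase.

    recognizer: a speech_recognition.Recognizer
    source: an open speech_recognition.Microphone (inside a with block)
    The default values are illustrative, not tuned recommendations.
    """
    # Sample background noise for calibrate_seconds to set the energy threshold
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    # Wait at most `timeout` seconds for speech to start; cap the phrase length
    return recognizer.listen(source, timeout=timeout,
                             phrase_time_limit=phrase_time_limit)

# Usage inside the tutorial's microphone block (not run here):
# with sr.Microphone() as source:
#     print("Listening for your command...")
#     audio = capture_phrase(recognizer, source)
```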

This code wraps the main logic in a try/except block. The try block does the following:
- The captured audio is passed to Google's speech recognition service (the recognizer.recognize_google function) to convert it into a string of text.
  - Note: This is why the code sits in a try/except block. If the speech cannot be understood, the SpeechRecognition library raises sr.UnknownValueError, and if the request to the service fails (for example, with no internet connection), it raises sr.RequestError. A general except clause catches anything else that may go wrong.
- The text is then passed to the client.chat.completions.create function from OpenAI. The 'system' message assigns ChatGPT the role of R2-D2 so that its replies stay in character (cheerful and curious), and the 'user' message carries your transcribed text as the prompt.
- We then store the model's text response in the ai_reply variable and print it.
- Finally, we store the path to the folder with the R2-D2 audio files and pass it, together with the AI reply, to the r2d2_beep_from_text function we built above.


```python
try:
    # Convert spoken words to text
    text = recognizer.recognize_google(audio)
    print("You said:", text)

    # Generate GPT-3.5 response
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are R2D2, an astromech droid. You respond in cheerful, curious ways."},
            {"role": "user", "content": text}
        ]
    )
    ai_reply = completion.choices[0].message.content
    print("R2D2 (translated response):", ai_reply)

    # Play R2D2 beeps matching GPT reply
    print("R2D2 is responding with matching astromech beeps...")
    sounds_folder = 'sounds/'
    r2d2_beep_from_text(ai_reply, sounds_folder)

except sr.UnknownValueError:
    print("Could not understand audio.")
except sr.RequestError as e:
    print(f"Speech Recognition request error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```

Altogether we have the following code:

```python
import os
import wave
import pyaudio
import string
from openai import OpenAI
import speech_recognition as sr

# Initialize recognizer
recognizer = sr.Recognizer()
client = OpenAI(api_key='insertYourOwnAPIKey')  # Change to your API key (or reach out to me for my own)

def r2d2_beep_from_text(text, sounds_folder):
    # Normalize text
    word = ''.join(c for c in text.lower() if c in string.ascii_lowercase)

    # Set up audio stream
    p = pyaudio.PyAudio()
    stream = p.open(format=p.get_format_from_width(2),
                    channels=1,
                    rate=22050,
                    output=True)

    # Prepare full sound data
    data = b""

    for letter in word:
        sound_path = os.path.join(sounds_folder, f"{letter}.wav")
        if not os.path.exists(sound_path):
            print(f"Warning: No sound for letter '{letter}'")
            continue
        try:
            with wave.open(sound_path, "rb") as f:
                frames = f.readframes(f.getnframes())
                data += frames
        except Exception as e:
            print(e)

    # Play concatenated sound
    if data:
        stream.write(data)

    # Cleanup
    stream.stop_stream()
    stream.close()
    p.terminate()

# Listen to user speech
with sr.Microphone() as source:
    print("Listening for your command...")
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source)

try:
    # Convert spoken words to text
    text = recognizer.recognize_google(audio)
    print("You said:", text)

    # Generate GPT-3.5 response
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are R2D2, an astromech droid. You respond in cheerful, curious ways."},
            {"role": "user", "content": text}
        ]
    )
    ai_reply = completion.choices[0].message.content
    print("R2D2 (translated response):", ai_reply)

    # Play R2D2 beeps matching GPT reply
    print("R2D2 is responding with matching astromech beeps...")
    sounds_folder = 'sounds/'
    r2d2_beep_from_text(ai_reply, sounds_folder)

except sr.UnknownValueError:
    print("Could not understand audio.")
except sr.RequestError as e:
    print(f"Speech Recognition request error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```


```text
Listening for your command...
You said: hi there can you hear me
R2D2 (translated response): Beep Boop! Hello there! I can certainly hear you loud and clear. How can I assist you today?
R2D2 is responding with matching astromech beeps...
```


The video below shows an example of the code running:

<video src="R2D2 Output.mp4" width="600" controls></video>

If you have followed all of the steps correctly, your output should look (and sound) similar to the video above. If it does not, feel free to reach out to me with any questions. If you are wondering what to do next, I encourage you to put this code into a .py file and run it as part of a larger program (maybe even one that connects to a remote-controlled R2-D2). Whatever you choose, I hope you enjoyed this tutorial!