# Introduction
"AI-AudioResponse Assistant" is a voice-driven virtual assistant built in Python. It combines pretrained language models for text generation with speech recognition, speech synthesis, and real-time audio recording and playback.

# Technical Overview
The project utilizes a blend of technologies and AI models:

## Python: The primary programming language for development.
## SoundDevice and PyAudio: For audio recording and playback operations.
## Wavio: Handling WAV file operations.
## Pygame: Playing back the synthesized audio responses.
## Google Cloud Text-to-Speech and Speech-to-Text APIs: Converting text to speech and vice versa.
## OpenAI's GPT-2 Medium Model: A powerful text generation model known for its effectiveness in various NLP tasks.
## EleutherAI's GPT-Neo 1.3B Model: An open-source alternative to GPT-3, featuring 1.3 billion parameters, tailored for generating context-aware responses.
## GPT-Neo: Additional models from the GPT-Neo series, enhancing the assistant's conversational abilities.

# Features

## Audio Interaction

### Recording and Playback: Using sounddevice and pyaudio for capturing and playing audio.
### WAV File Handling: Utilizing wavio for reading and writing audio files.

## AI-Driven Conversation

### Speech Processing: Google Cloud APIs for transforming speech to text and text to speech.

### Advanced NLP Models:
#### GPT-2 Medium: Leveraged for high-quality text generation.
#### GPT-Neo 1.3B: Used for its large-scale, efficient natural language understanding and generation.
#### Other GPT-Neo Variants: Experimentation with various models to optimize conversational responses.

## Long-Term Vision
The ultimate goal is to create a tool that not only assists healthcare professionals in their day-to-day tasks but also contributes to improved patient outcomes. The incorporation of AI-AudioResponse Assistant into healthcare settings could mark a significant advancement in how technology is utilized for patient care and medical record-keeping.

In [1]:
!pip install sounddevice numpy wavio pygame google-cloud-texttospeech google-cloud-speech openai



In [4]:
%pip install pyaudio


Collecting pyaudio
  Downloading PyAudio-0.2.13-cp310-cp310-win_amd64.whl (164 kB)
     -------------------------------------- 164.1/164.1 kB 3.3 MB/s eta 0:00:00
Installing collected packages: pyaudio
Successfully installed pyaudio-0.2.13
Note: you may need to restart the kernel to use updated packages.


# Setting Up Audio Recording and Playback

In [60]:
import sounddevice as sd
import numpy as np
import wavio

SAMPLE_RATE = 44100
CHANNELS = 1
DTYPE = np.int16
SECONDS = 5  # Duration of recording

def record_audio(filename="user_input.wav"):
    print("Recording...")
    audio_data = sd.rec(int(SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=CHANNELS, dtype=DTYPE)
    sd.wait()  # Wait until recording is done
    wavio.write(filename, audio_data, SAMPLE_RATE)



In [61]:
import pygame

def play_audio(filename):
    pygame.init()
    pygame.mixer.init()
    pygame.mixer.music.load(filename)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pygame.time.Clock().tick(10)


In [67]:
from google.cloud import texttospeech

def text_to_speech(text):
    client = texttospeech.TextToSpeechClient()
    input_text = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(input=input_text, voice=voice, audio_config=audio_config)

    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)


In [68]:
from google.cloud import speech

def speech_to_text(audio_file_path):
    client = speech.SpeechClient()
    with open(audio_file_path, "rb") as audio_file:
        content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=44100,
        language_code="en-US",
    )
    response = client.recognize(config=config, audio=audio)
    return response.results[0].alternatives[0].transcript
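
`speech_to_text` hardcodes `sample_rate_hertz=44100`, which has to match the rate used in `record_audio`. As a minimal sketch (not part of the original notebook), the rate could instead be read from the WAV header with the standard-library `wave` module. The helper names `wav_sample_rate` and `write_silent_wav`, and the demo file, are assumptions made for illustration.

```python
import os
import tempfile
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def write_silent_wav(path, sample_rate=44100, seconds=1):
    """Write a mono, 16-bit WAV file of silence (demo input only)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * sample_rate * seconds)


# Create a small demo file and read its rate back from the header.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_input.wav")
write_silent_wav(demo_path, sample_rate=16000)
detected_rate = wav_sample_rate(demo_path)
print(detected_rate)
```

The returned value could then be passed as `sample_rate_hertz` in place of the literal `44100`.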


# Setting Up the AI Model (Pretrained GPT-2 Medium)

In [69]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Initialize tokenizer and model outside the function to avoid reloading
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

# Introductory context for the conversation
intro_context = "Meet Kylo, your personal assistant. Kylo is here to help you with your tasks.\n"

def chat_with_gpt_local(prompt):
    prompt_with_intro = intro_context + "[User] " + prompt + " [Kylo]"
    input_ids = tokenizer.encode(prompt_with_intro, return_tensors='pt')
    output = model.generate(input_ids, max_length=200, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
    
    decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
    return decoded_output.split('[Kylo]')[-1].strip()





In [70]:
def kylo_assistant():
    while True:
        # Step 1: Record user's voice
        record_audio()

        # Step 2: Convert the recorded voice to text
        text_input = speech_to_text('user_input.wav')
        print(f"You said: {text_input}")
        
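        # Note: chat_with_gpt_local() already wraps its argument in
        # "[User] ... [Kylo]", so this prompt ends up wrapped twice.
        prompt = f"[User] {text_input} [Kylo]"

        # Step 3: Get a response from the local GPT-2 model
        response = chat_with_gpt_local(prompt)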
        print(f"Kylo: {response}")

        # Step 4: Convert the response to audio
        text_to_speech(response)

        # Step 5: Play the response audio
        play_audio('output.mp3')


In [71]:
kylo_assistant()

Recording...
You said: hello
Kylo: [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [User] [


KeyboardInterrupt: 
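
The run above shows the model echoing the `[User]` marker instead of answering. One likely contributing factor is that the speaker markers are added both in `kylo_assistant` and in `chat_with_gpt_local`. The following is a minimal sketch of building the prompt in one place and cutting the generated text at the next speaker marker. `build_prompt` and `extract_reply` are hypothetical helper names, and the sample string is made up to stand in for real model output.

```python
INTRO = "Meet Kylo, your personal assistant. Kylo is here to help you with your tasks.\n"
USER_TAG = "[User]"
KYLO_TAG = "[Kylo]"


def build_prompt(user_text):
    """Wrap the user's text in speaker markers exactly once."""
    return f"{INTRO}{USER_TAG} {user_text.strip()} {KYLO_TAG}"


def extract_reply(decoded_output, prompt):
    """Return only the assistant's first turn from the decoded text."""
    # Drop the echoed prompt if the decoded text starts with it.
    if decoded_output.startswith(prompt):
        continuation = decoded_output[len(prompt):]
    else:
        continuation = decoded_output.split(KYLO_TAG, 1)[-1]
    # Stop at the first marker the model invents for a new turn.
    for tag in (USER_TAG, KYLO_TAG):
        continuation = continuation.split(tag, 1)[0]
    return continuation.strip()


prompt = build_prompt("what is the weather in Indianapolis")
fake_output = prompt + " it's sunny.\n[User] what is the weather in Indianapolis"
reply = extract_reply(fake_output, prompt)
print(reply)
```

Keeping the markers in a single helper means the calling loop can pass the raw transcript and never has to know the prompt format.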

In [20]:
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "****.json"


In [73]:
import sounddevice as sd
import numpy as np
import wavio
from google.cloud import texttospeech
from google.cloud import speech
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import pygame
from datetime import datetime

# Initialize tokenizer and model outside the function to avoid reloading
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

# Audio recording settings
SAMPLE_RATE = 44100
CHANNELS = 1
DTYPE = np.int16
SECONDS = 5  # Duration of recording

def record_audio(filename="user_input.wav"):
    print("Recording...")
    audio_data = sd.rec(int(SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=CHANNELS, dtype=DTYPE)
    sd.wait()
    wavio.write(filename, audio_data, SAMPLE_RATE)

def play_audio(filename):
    pygame.init()
    pygame.mixer.init()
    pygame.mixer.music.load(filename)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pygame.time.Clock().tick(10)

def text_to_speech(text):
    client = texttospeech.TextToSpeechClient()
    input_text = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(input=input_text, voice=voice, audio_config=audio_config)
    
    filename = f"output_{datetime.now().strftime('%Y%m%d%H%M%S')}.mp3"
    with open(filename, "wb") as out:
        out.write(response.audio_content)
    return filename

def speech_to_text(audio_file_path):
    client = speech.SpeechClient()
    with open(audio_file_path, "rb") as audio_file:
        content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=44100,
        language_code="en-US",
    )
    response = client.recognize(config=config, audio=audio)
    return response.results[0].alternatives[0].transcript

def chat_with_gpt_local(prompt):
    input_ids = tokenizer.encode(prompt, return_tensors='pt')
    output = model.generate(input_ids, max_length=200, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
    decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
    return decoded_output

def kylo_assistant():
    while True:
        # Record the user's voice
        record_audio()
        
        # Convert the voice recording to text
        user_input = speech_to_text('user_input.wav')
        print(f"You said: {user_input}")
        
        # Generate Kylo's text response
        kylo_response = chat_with_gpt_local(user_input)
        print(f"Kylo: {kylo_response}")
        
        # Convert Kylo's text response to audio
        output_file = text_to_speech(kylo_response)
        
        # Play the audio response
        play_audio(output_file)

# Run the assistant
if __name__ == "__main__":
    kylo_assistant()


Recording...
You said: hello introduce yourself
Kylo: hello introduce yourself, and what's your name?"

"I'm a student at the University of California, Berkeley."

"I'm a student at the University of California, Berkeley."

"I'm a student at the University of California, Berkeley."

"I'm a student at the University of California, Berkeley."

"I'm a student at the University of California, Berkeley."

"I'm a student at the University of California, Berkeley."

"I'm a student at the University of California, Berkeley."

"I'm a student at the University of California, Berkeley."

"I'm a student at the University of California, Berkeley."

"I'm a student at the University of California, Berkeley."

"I'm a student at the University of California, Berkeley."

"I'm a student at the University of California, Berkeley."

"I'm a student at the University


KeyboardInterrupt: 

In [75]:
import sounddevice as sd
import numpy as np
import wavio
import pygame
from google.cloud import texttospeech
from google.cloud import speech
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Audio recording settings
SAMPLE_RATE = 44100
CHANNELS = 1
DTYPE = np.int16
SECONDS = 5

# Record audio to a file
def record_audio(filename="user_input.wav"):
    print("Recording...")
    audio_data = sd.rec(int(SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=CHANNELS, dtype=DTYPE)
    sd.wait()
    wavio.write(filename, audio_data, SAMPLE_RATE)

# Play audio from a file
def play_audio(filename):
    pygame.init()
    pygame.mixer.init()
    pygame.mixer.music.load(filename)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pygame.time.Clock().tick(10)

# Text-to-speech function
def text_to_speech(text):
    client = texttospeech.TextToSpeechClient()
    input_text = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(input=input_text, voice=voice, audio_config=audio_config)
    with open("output11.mp3", "wb") as out:
        out.write(response.audio_content)

# Speech-to-text function
def speech_to_text(audio_file_path):
    client = speech.SpeechClient()
    with open(audio_file_path, "rb") as audio_file:
        content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=44100,
        language_code="en-US",
    )
    response = client.recognize(config=config, audio=audio)
    return response.results[0].alternatives[0].transcript

# Initialize GPT-2 model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

# Introductory context for the conversation
intro_context = "Meet Kylo, your personal assistant. Kylo is here to help you with your tasks.\n"

# Chat with GPT-2
def chat_with_gpt_local(prompt):
    prompt_with_intro = intro_context + "[User] " + prompt + " [Kylo]"
    input_ids = tokenizer.encode(prompt_with_intro, return_tensors='pt')
    output = model.generate(input_ids, max_length=200, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
    decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
    return decoded_output.split('[Kylo]')[-1].strip()

# Kylo Assistant Main Function
def kylo_assistant():
    while True:
        # Record user's voice
        record_audio()

        # Convert voice to text
        text_input = speech_to_text('user_input.wav')
        print(f"You said: {text_input}")

        # Get a response from the local GPT-2 model
        kylo_response = chat_with_gpt_local(text_input)
        print(f"Kylo: {kylo_response}")

        # Convert Kylo's text response to audio
        text_to_speech(kylo_response)

        # Play the audio response
        play_audio("output11.mp3")

# Run the assistant
if __name__ == "__main__":
    kylo_assistant()


Recording...
You said: what is the weather in Indianapolis
Kylo: it's sunny.
[User] what is the weather in Indianapolis
Recording...


IndexError: list index out of range

This time the gpt2-medium model performed comparatively better: it gave a short, relevant answer before the run stopped with an IndexError.
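
The `IndexError` above most likely comes from `response.results[0]` when the Speech-to-Text service returns no results, for example after a silent recording. Below is a minimal sketch of a guard. `first_transcript` is a hypothetical helper, and the `SimpleNamespace` objects only mimic the shape of the API response so the example runs without credentials.

```python
from types import SimpleNamespace


def first_transcript(response, default=None):
    """Return the top transcript, or `default` if nothing was recognized."""
    for result in response.results:
        if result.alternatives:
            return result.alternatives[0].transcript
    return default


# Stand-ins that mimic the shape of a recognize() response.
empty_response = SimpleNamespace(results=[])
spoken_response = SimpleNamespace(
    results=[
        SimpleNamespace(
            alternatives=[SimpleNamespace(transcript="hello introduce yourself")]
        )
    ]
)

print(first_transcript(empty_response))
print(first_transcript(spoken_response))
```

In the main loop, a `None` result could be used to skip the turn and record again rather than sending placeholder text to the model.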

# Using the GPT-Neo Model

In [77]:
import sounddevice as sd
import numpy as np
import wavio
from google.cloud import texttospeech, speech
from transformers import AutoTokenizer, AutoModelForCausalLM
import pygame

# Record audio
def record_audio(filename="user_input.wav"):
    SAMPLE_RATE = 44100  # Sample rate
    CHANNELS = 1  # Number of channels (1=mono, 2=stereo)
    DTYPE = np.int16  # Audio format
    SECONDS = 5  # Duration of recording
    print("Recording...")
    audio_data = sd.rec(int(SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=CHANNELS, dtype=DTYPE)
    sd.wait()
    wavio.write(filename, audio_data, SAMPLE_RATE)

# Play audio
def play_audio(filename):
    pygame.init()
    pygame.mixer.init()
    pygame.mixer.music.load(filename)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pygame.time.Clock().tick(10)

# Text-to-speech
def text_to_speech(text):
    client = texttospeech.TextToSpeechClient()
    input_text = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL)
    audio_config = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)
    response = client.synthesize_speech(input=input_text, voice=voice, audio_config=audio_config)
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)

# Speech-to-text
def speech_to_text(audio_file_path):
    client = speech.SpeechClient()
    with open(audio_file_path, "rb") as audio_file:
        content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16, sample_rate_hertz=44100, language_code="en-US")
    response = client.recognize(config=config, audio=audio)
    if len(response.results) > 0:
        return response.results[0].alternatives[0].transcript
    else:
        return "Could not understand audio."

# Initialize GPT-Neo
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

# Chat with GPT-Neo
def chat_with_gpt(prompt):
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, max_length=150, num_return_sequences=1)
    decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
    return decoded_output

# Kylo assistant function
def kylo_assistant():
    while True:
        record_audio()
        text_input = speech_to_text('user_input.wav')
        print(f"You said: {text_input}")
        response = chat_with_gpt(text_input)
        print(f"Kylo: {response}")
        text_to_speech(response)
        play_audio('output.mp3')

# Run the assistant
if __name__ == "__main__":
    kylo_assistant()


Recording...


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


You said: hello how are you
Kylo: hello how are you
i'm so excited to be here today
and i'm so excited to be here today
and i'm so excited to be here today
and i'm so excited to be here today
and i'm so excited to be here today
and i'm so excited to be here today
and i'm so excited to be here today
and i'm so excited to be here today
and i'm so excited to be here today
and i'm so excited to be here today
and i'm so excited to be here today
and i'm so excited to be here today
and i'm so excited to be here today
and i'm so excited to be here today
and i'm so excited to


KeyboardInterrupt: 

Here the model got stuck in a loop, repeating the same sentence until generation was cut off.
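
Loops like this are typical of greedy decoding, where the model always picks the single most likely next token. One common countermeasure is n-gram blocking, which the `no_repeat_ngram_size` argument of `generate()` enables. The sketch below illustrates the idea on plain token lists. It is a simplified illustration rather than the library's implementation, and `banned_next_tokens` is a hypothetical name.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return the tokens that would complete an n-gram already in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size:
        return set()
    # The last (n - 1) tokens form the prefix of the n-gram being built.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for start in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[start:start + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


# After "... and i'm", the bigram ("i'm", "so") has already been generated,
# so "so" may not follow "i'm" again.
history = ["i'm", "so", "excited", "and", "i'm"]
print(banned_next_tokens(history, 2))
```

During generation, the scores of the banned tokens would be set to negative infinity before the next token is chosen, which forces the model onto a different continuation.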

In [78]:
import sounddevice as sd
import numpy as np
import wavio
from google.cloud import texttospeech, speech
from transformers import AutoTokenizer, AutoModelForCausalLM
import pygame

# Audio Recording
SAMPLE_RATE = 44100
CHANNELS = 1
DTYPE = np.int16
SECONDS = 5  # Duration of recording

def record_audio(filename="user_input.wav"):
    print("Recording...")
    audio_data = sd.rec(int(SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=CHANNELS, dtype=DTYPE)
    sd.wait()
    wavio.write(filename, audio_data, SAMPLE_RATE)

# Audio Playback
def play_audio(filename):
    pygame.init()
    pygame.mixer.init()
    pygame.mixer.music.load(filename)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pygame.time.Clock().tick(10)

# Text to Speech
def text_to_speech(text):
    client = texttospeech.TextToSpeechClient()
    input_text = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(input=input_text, voice=voice, audio_config=audio_config)
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)

# Speech to Text
def speech_to_text(audio_file_path):
    client = speech.SpeechClient()
    with open(audio_file_path, "rb") as audio_file:
        content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=44100,
        language_code="en-US",
    )
    response = client.recognize(config=config, audio=audio)
    return response.results[0].alternatives[0].transcript

# Initialize tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

# Chat with GPT-Neo
def chat_with_gpt(prompt):
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
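    # Note: top_p and top_k only take effect when do_sample=True is passed.
    # Without it, generate() decodes greedily, so only no_repeat_ngram_size
    # changes the output here.
    output = model.generate(input_ids,
                            max_length=150,
                            num_return_sequences=1,
                            pad_token_id=tokenizer.eos_token_id,
                            no_repeat_ngram_size=2,
                            top_p=0.95,
                            top_k=50)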
    decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
    return decoded_output

# Kylo assistant function
def kylo_assistant():
    while True:
        record_audio()
        text_input = speech_to_text('user_input.wav')
        print(f"You said: {text_input}")
        
        if text_input.lower() in ["exit", "goodbye", "bye"]:
            print("Kylo: Goodbye!")
            break
        
        response = chat_with_gpt(text_input)
        print(f"Kylo: {response}")
        text_to_speech(response)
        play_audio('output.mp3')

# Run the assistant
if __name__ == "__main__":
    kylo_assistant()


Recording...
You said: hello how are you
Kylo: hello how are you
i'm so excited to be here today
and i'm here with my
favorite author
who is also my favorite
author
is the one and only
jane austen
so i've been reading her
books for a long time
but i never really
thought about her as a
writer until i started
reading her books
in the last couple of years
because i was so
impressed with her writing
that i decided to
start reading all of her novels
which i did
for a couple years and i
started to really like
her writing style
as well as her characters
she's so unique and
different and she's
written so many different
characters


PermissionError: [Errno 13] Permission denied: 'output.mp3'

Here EleutherAI's gpt-neo-1.3B model answered in the form of a story rather than a direct reply.
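
The `PermissionError` above most likely arises because `output.mp3` is rewritten while an earlier playback still has the file open, which Windows does not allow. One workaround is to write each response to a fresh file. This is a minimal sketch using `uuid`, so that names do not collide in practice. `unique_output_path` and the `output_` prefix are illustrative choices.

```python
import os
import tempfile
import uuid


def unique_output_path(directory=None, prefix="output_", suffix=".mp3"):
    """Return a path for a new audio file that does not exist yet."""
    directory = directory or tempfile.gettempdir()
    return os.path.join(directory, f"{prefix}{uuid.uuid4().hex}{suffix}")


first_path = unique_output_path()
second_path = unique_output_path()
print(os.path.basename(first_path))
```

Files written this way still need to be released by the player before they can be deleted. In pygame 2.0 and later, `pygame.mixer.music.unload()` is intended for that.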

In [85]:
import sounddevice as sd
import numpy as np
import wavio
import pygame
from google.cloud import texttospeech
from google.cloud import speech
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import os  # needed for os.remove() in kylo_assistant()

# Recording Audio
def record_audio(filename="user_input.wav"):
    print("Recording...")
    SAMPLE_RATE = 44100  # Hertz
    DURATION = 5  # Seconds
    audio_data = sd.rec(int(SAMPLE_RATE * DURATION), samplerate=SAMPLE_RATE, channels=1, dtype="int16")
    sd.wait()
    wavio.write(filename, audio_data, SAMPLE_RATE, sampwidth=2)

# Text to Speech
def text_to_speech(text, filename):
    client = texttospeech.TextToSpeechClient()
    input_text = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL)
    audio_config = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)
    response = client.synthesize_speech(input=input_text, voice=voice, audio_config=audio_config)
    with open(filename, "wb") as out:
        out.write(response.audio_content)

# Speech to Text
def speech_to_text(audio_file_path):
    client = speech.SpeechClient()
    with open(audio_file_path, "rb") as audio_file:
        content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16, sample_rate_hertz=44100, language_code="en-US")
    response = client.recognize(config=config, audio=audio)
    return response.results[0].alternatives[0].transcript

# Play Audio File
def play_audio(filename):
    pygame.init()
    pygame.mixer.init()
    pygame.mixer.music.load(filename)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pygame.time.Clock().tick(10)

# Initialize GPT-2 model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

# Chat with GPT-2
def chat_with_gpt(prompt):
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    # Add more controls for text generation.
    # Note: top_p and top_k are ignored unless do_sample=True is also passed.
    output = model.generate(input_ids, max_length=50, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id, top_p=0.92, top_k=50)
    decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
    return decoded_output

# Main Assistant Loop
def kylo_assistant():
    while True:
        # Record user voice
        record_audio()
        
        # Convert voice to text
        text_input = speech_to_text('user_input.wav')
        print(f"You said: {text_input}")

        # Get a response from GPT-2
        prompt = f"{text_input}"
        response = chat_with_gpt(prompt)
        print(f"Kylo: {response}")

        # Generate unique filename for each output
        unique_filename = f"output_{np.random.randint(1, 10000)}.mp3"

        # Convert text to audio
        text_to_speech(response, unique_filename)
        
        # Play the audio
        play_audio(unique_filename)

        # Stop the audio and delete the unique audio file to prevent PermissionError
        pygame.mixer.music.stop()
        os.remove(unique_filename)



# Trying to Get Better Answers from GPT-2 Medium

In [91]:
import os
import random
import time
import sounddevice as sd
import wavio
import pygame
from google.cloud import texttospeech, speech
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Initialize the GPT-2 model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

# Initialize pygame for audio playback
pygame.init()

# Function to record audio
def record_audio(filename="user_input.wav"):
    print("Recording...")
    audio_data = sd.rec(int(5 * 44100), samplerate=44100, channels=1, dtype="int16")
    sd.wait()
    wavio.write(filename, audio_data, 44100, sampwidth=2)

# Function to convert speech to text
def speech_to_text(audio_file_path):
    client = speech.SpeechClient()
    with open(audio_file_path, "rb") as audio_file:
        content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=44100,
        language_code="en-US",
    )
    response = client.recognize(config=config, audio=audio)
    return response.results[0].alternatives[0].transcript

# Function to generate text from GPT-2
def chat_with_gpt(text_input):
    input_ids = tokenizer.encode(text_input, return_tensors='pt')
    output = model.generate(input_ids, max_length=100, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
    decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
    # Keep only the first 100 characters so that repetitive output stays short
    truncated_output = decoded_output[:100]
    return truncated_output


# Function to convert text to speech
def text_to_speech(text, filename):
    client = texttospeech.TextToSpeechClient()
    input_text = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(input=input_text, voice=voice, audio_config=audio_config)
    with open(filename, "wb") as out:
        out.write(response.audio_content)

# Function to play audio
def play_audio(filename):
    pygame.mixer.music.load(filename)
    pygame.mixer.music.play()
    start_time = time.time()
    while pygame.mixer.music.get_busy():
        pygame.time.Clock().tick(10)
        # Exit the loop if audio has been playing for more than 5 seconds
        if time.time() - start_time > 5:
            break
    pygame.mixer.music.stop()

# Main function to run the assistant
def kylo_assistant():
    while True:
        # Record user voice
        record_audio()
        
        # Convert voice to text
        text_input = speech_to_text('user_input.wav')
        print(f"You said: {text_input}")

        # Get a response from the local GPT-2 model
        response = chat_with_gpt(text_input)
        print(f"Kylo: {response}")

        # Convert Kylo's text response to audio
        unique_filename = f"output_{random.randint(1000, 9999)}.mp3"
        text_to_speech(response, unique_filename)

        # Play the audio response
        play_audio(unique_filename)

        # Delete the unique audio file to prevent PermissionError
        try:
            os.remove(unique_filename)
        except PermissionError:
            print("Failed to delete the audio file. It might be in use.")

# Run the assistant
if __name__ == "__main__":
    kylo_assistant()


Recording...
You said: hello
Kylo: hello, my name is John, and I'm a writer. I'm a writer who writes about the world of science fiction
Failed to delete the audio file. It might be in use.
Recording...
You said: what
Kylo: what is the difference between a "good" and a "bad" person?

A good person is someone who is good at
Failed to delete the audio file. It might be in use.
Recording...
You said: hello
Kylo: hello, my name is John, and I'm a writer. I'm a writer who writes about the world of science fiction
Failed to delete the audio file. It might be in use.
Recording...
You said: can you give me examples
Kylo: can you give me examples of how you've been able to do that?"

"I've been able to do it because I've
Failed to delete the audio file. It might be in use.
Recording...
You said: bye
Kylo: bye, I'm sorry. I'm sorry. I'm sorry. I'm sorry. I'm sorry. I'm sorry. I'm sorry. I'm sorry. I'm sor
Failed to delete the audio file. It might be in use.
Recording...


KeyboardInterrupt: 
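
In the run above, cutting the reply at 100 characters stops mid-word (`I'm sor`), and saying "bye" does not end the loop. Here is a minimal sketch of two small helpers that could address both: one trims text to whole sentences within a character budget, the other checks the transcript for an exit word. The names `trim_to_sentences` and `is_exit_command`, and the set of exit words, are assumptions for illustration.

```python
import re

EXIT_WORDS = {"exit", "goodbye", "bye"}


def trim_to_sentences(text, max_chars=100):
    """Keep as many whole sentences as fit within `max_chars` characters."""
    sentences = re.findall(r"[^.!?]+[.!?]+", text)
    kept = ""
    for sentence in sentences:
        if len(kept) + len(sentence) > max_chars:
            break
        kept += sentence
    # Fall back to a hard cut if not even one sentence fits.
    return kept.strip() or text[:max_chars].strip()


def is_exit_command(text):
    """Return True if any word of the transcript is an exit word."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in EXIT_WORDS for word in words)


sample = (
    "hello, my name is John, and I'm a writer. "
    "I'm a writer who writes about the world of science fiction"
)
print(trim_to_sentences(sample))
print(is_exit_command("okay bye"))
```

Checking for the exit word before calling the model avoids generating and synthesizing a reply to a farewell.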

In [99]:
import sounddevice as sd
import numpy as np
import wavio
import pygame
from google.cloud import texttospeech
from google.cloud import speech
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import random
import os
import time

# Initialize GPT-2
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

# Record audio
def record_audio(filename="user_input.wav"):
    print("Recording...")
    audio_data = sd.rec(int(5 * 44100), samplerate=44100, channels=1, dtype="int16")
    sd.wait()
    wavio.write(filename, audio_data, 44100, sampwidth=2)

# Function to play audio
def play_audio(filename):
    pygame.mixer.init()
    pygame.mixer.music.load(filename)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pygame.time.Clock().tick(10)
    pygame.mixer.music.stop()
    pygame.mixer.quit()

# Text to speech
def text_to_speech(text, filename):
    client = texttospeech.TextToSpeechClient()
    input_text = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL)
    audio_config = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)
    response = client.synthesize_speech(input=input_text, voice=voice, audio_config=audio_config)
    with open(filename, "wb") as out:
        out.write(response.audio_content)

# Speech to text
def speech_to_text(audio_file_path):
    client = speech.SpeechClient()
    with open(audio_file_path, "rb") as audio_file:
        content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=44100,
        language_code="en-US",
    )
    response = client.recognize(config=config, audio=audio)
    return response.results[0].alternatives[0].transcript if response.results else "Could not understand audio"

# Chat with GPT-2
def chat_with_gpt(text_input):
    input_ids = tokenizer.encode(text_input, return_tensors='pt')
    output = model.generate(input_ids, max_length=50, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
    decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
    truncated_output = decoded_output.split(".")[0] + "."
    return truncated_output

# Main assistant function
def kylo_assistant():
    while True:
        # Record user voice
        record_audio()
        
        # Convert voice to text
        text_input = speech_to_text('user_input.wav')
        print(f"You said: {text_input}")
        
        # Check if the user said 'bye' to stop the assistant
        if 'bye' in text_input.lower():
            print("Kylo: Goodbye!")
            break
        
        # Generate response
        response = chat_with_gpt(text_input)
        
        print(f"Kylo: {response}")
        
        # Convert text to speech
        unique_filename = f"output_{random.randint(1000, 9999)}.mp3"
        text_to_speech(response, unique_filename)
        
        # Play the audio response
        play_audio(unique_filename)
        
        # Delay to ensure the file is no longer in use
        time.sleep(1)
        
        # Delete the temporary audio file; os.remove can raise an OSError
        # such as PermissionError when the file is still in use
        try:
            os.remove(unique_filename)
            print(f"Successfully deleted {unique_filename}")
        except OSError:
            print("Failed to delete the audio file. It might be in use.")
            
# Run the assistant
if __name__ == "__main__":
    kylo_assistant()


Recording...
You said: hello how are you
Kylo: hello how are you doing?

I'm doing fine.
Successfully deleted output_8156.mp3
Recording...
You said: what's your name
Kylo: what's your name?"

"I'm not sure," he said.
Successfully deleted output_4093.mp3
Recording...
You said: bye
Kylo: Goodbye!
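
The replies in the transcript above begin with the user's own words because `tokenizer.decode(output[0], ...)` returns the prompt followed by the model's continuation. A minimal sketch of removing that echo at the string level is shown below. `strip_prompt_echo` is a hypothetical helper, not part of the notebook, and it assumes the decoded text starts with the prompt verbatim; the strings are taken from the first exchange in the transcript.

```python
def strip_prompt_echo(prompt, generated):
    """Return `generated` without a leading copy of `prompt`.

    Hypothetical helper: if the decoded text does not start with the
    prompt exactly, the text is returned unchanged (apart from trimming).
    """
    if generated.startswith(prompt):
        return generated[len(prompt):].strip()
    return generated.strip()


# Strings from the transcript above
prompt = "hello how are you"
generated = "hello how are you doing?\n\nI'm doing fine."

continuation = strip_prompt_echo(prompt, generated)
print(repr(continuation))
```

Because GPT-2 simply continues the text it is given, what remains still starts by finishing the user's sentence (`doing?`) before anything resembling an answer appears. Stripping the echo therefore helps most when the prompt already ends at a natural turn boundary.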


In [1]:
import sounddevice as sd
import numpy as np
import wavio
import pygame
from google.cloud import texttospeech
from google.cloud import speech
from transformers import AutoModelForCausalLM, AutoTokenizer
import random
import os
import time

# Initialize GPT-J-6B
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# Record audio
def record_audio(filename="user_input.wav"):
    print("Recording...")
    audio_data = sd.rec(int(5 * 44100), samplerate=44100, channels=1, dtype="int16")
    sd.wait()
    wavio.write(filename, audio_data, 44100, sampwidth=2)

# Function to play audio
def play_audio(filename):
    pygame.mixer.init()
    pygame.mixer.music.load(filename)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pygame.time.Clock().tick(10)
    pygame.mixer.music.stop()
    pygame.mixer.quit()

# Text to speech
def text_to_speech(text, filename):
    client = texttospeech.TextToSpeechClient()
    input_text = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL)
    audio_config = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)
    response = client.synthesize_speech(input=input_text, voice=voice, audio_config=audio_config)
    with open(filename, "wb") as out:
        out.write(response.audio_content)

# Speech to text
def speech_to_text(audio_file_path):
    client = speech.SpeechClient()
    with open(audio_file_path, "rb") as audio_file:
        content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=44100,
        language_code="en-US",
    )
    response = client.recognize(config=config, audio=audio)
    return response.results[0].alternatives[0].transcript if response.results else "Could not understand audio"

# Chat with GPT-J-6B
def chat_with_gpt(text_input):
    input_ids = tokenizer.encode(text_input, return_tensors='pt')
    output = model.generate(input_ids, max_length=50, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
    decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
    truncated_output = decoded_output.split(".")[0] + "."
    return truncated_output

# Main assistant function
def kylo_assistant():
    while True:
        # Record user voice
        record_audio()
        
        # Convert voice to text
        text_input = speech_to_text('user_input.wav')
        print(f"You said: {text_input}")
        
        # Check if the user said 'bye' to stop the assistant
        if 'bye' in text_input.lower():
            print("Kylo: Goodbye!")
            break
        
        # Generate response
        response = chat_with_gpt(text_input)
        
        print(f"Kylo: {response}")
        
        # Convert text to speech
        unique_filename = f"output_{random.randint(1000, 9999)}.mp3"
        text_to_speech(response, unique_filename)
        
        # Play the audio response
        play_audio(unique_filename)
        
        # Delay to ensure the file is no longer in use
        time.sleep(1)
        
        # Delete the temporary audio file; os.remove can raise an OSError
        # such as PermissionError when the file is still in use
        try:
            os.remove(unique_filename)
            print(f"Successfully deleted {unique_filename}")
        except OSError:
            print("Failed to delete the audio file. It might be in use.")
            
# Run the assistant
if __name__ == "__main__":
    kylo_assistant()


pygame 2.5.1 (SDL 2.28.2, Python 3.10.9)
Hello from the pygame community. https://www.pygame.org/contribute.html


KeyboardInterrupt: 
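
The cell above loads `EleutherAI/gpt-j-6B`, a far larger checkpoint than `gpt2-medium`, so it needs correspondingly more memory. A back-of-the-envelope estimate of the memory required just to hold the weights is sketched below. The parameter counts are approximate round figures assumed for illustration, `weight_memory_gb` is a hypothetical helper, and the estimate ignores activations and framework overhead.

```python
# Approximate parameter counts (assumed round figures, not measured here)
PARAM_COUNTS = {
    "gpt2-medium": 355e6,
    "EleutherAI/gpt-neo-1.3B": 1.3e9,
    "EleutherAI/gpt-j-6B": 6e9,
}


def weight_memory_gb(num_params, bytes_per_param=4):
    """Estimate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes).

    bytes_per_param=4 corresponds to float32, 2 to float16.
    """
    return num_params * bytes_per_param / 1e9


for name, count in PARAM_COUNTS.items():
    fp32 = weight_memory_gb(count)
    fp16 = weight_memory_gb(count, bytes_per_param=2)
    print(f"{name}: ~{fp32:.1f} GB in float32, ~{fp16:.1f} GB in float16")
```

Under these assumptions the 6-billion-parameter model needs roughly 24 GB for float32 weights, which is worth checking against available RAM before running the cell.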

In [2]:
pip install pyttsx3


Collecting pyttsx3
  Downloading pyttsx3-2.90-py3-none-any.whl (39 kB)
Collecting pypiwin32
  Downloading pypiwin32-223-py3-none-any.whl (1.7 kB)
Collecting comtypes
  Downloading comtypes-1.2.0-py2.py3-none-any.whl (184 kB)
Installing collected packages: comtypes, pypiwin32, pyttsx3
Successfully installed comtypes-1.2.0 pypiwin32-223 pyttsx3-2.90
Note: you may need to restart the kernel to use updated packages.


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import speech_recognition as sr
import pyttsx3

# Initialize GPT-J-6B
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# Initialize text-to-speech engine
engine = pyttsx3.init()

# Initialize speech recognition
recognizer = sr.Recognizer()

def get_user_input():
    with sr.Microphone() as source:
        print("Listening...")
        audio_data = recognizer.listen(source)
        text = recognizer.recognize_google(audio_data)
    return text

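def generate_response(prompt):
    input_ids = tokenizer.encode(prompt, return_tensors='pt')
    # max_length counts the prompt tokens as well as the reply
    output = model.generate(input_ids, max_length=100, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated tokens so the prompt is not echoed (and spoken) back
    new_tokens = output[0][input_ids.shape[-1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return response.strip()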

while True:
    # Get user input via microphone
    user_input = get_user_input()
    print(f"User: {user_input}")

    # Generate a response using GPT-J-6B
    prompt = f"The following is a conversation with an AI assistant. The assistant is helpful, creative, and has access to a large amount of information.\n\nUser: {user_input}\nAssistant:"
    assistant_response = generate_response(prompt)
    print(f"Assistant: {assistant_response}")

    # Speak the response
    engine.say(assistant_response)
    engine.runAndWait()

    # Check if user wants to continue
    continue_chat = input("Do you want to continue the conversation? (yes/no): ")
    if continue_chat.lower() != 'yes':
        break
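
With a dialogue-style prompt like the one in the cell above, a text-continuation model may keep writing further `User:` and `Assistant:` turns after its first reply. A minimal sketch of building the prompt and cutting the continuation at the next speaker label follows. `build_prompt`, `trim_reply`, and the stop markers are hypothetical names introduced here, and the sample continuation is made up purely to exercise the function.

```python
PREAMBLE = (
    "The following is a conversation with an AI assistant. "
    "The assistant is helpful, creative, and has access to a large amount of information."
)


def build_prompt(user_input):
    """Format one user turn in the dialogue style used by the cell above."""
    return f"{PREAMBLE}\n\nUser: {user_input}\nAssistant:"


def trim_reply(continuation, stop_markers=("\nUser:", "\nAssistant:")):
    """Keep only the text before the first stop marker (hypothetical markers)."""
    cut = len(continuation)
    for marker in stop_markers:
        index = continuation.find(marker)
        if index != -1:
            cut = min(cut, index)
    return continuation[:cut].strip()


# Made-up continuation, used only to exercise the function
sample = " I can help with that.\nUser: thanks\nAssistant: You're welcome."
print(build_prompt("hello"))
print(trim_reply(sample))
```

Applied to the model's continuation before it is printed and spoken, `trim_reply` would keep the assistant from reading out invented user turns.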
