<a href="https://colab.research.google.com/github/ddrbcn/voicegpt/blob/main/Voice_interaction_using_Elevenlabs%2C_OpenAI's_Whisper_and_chatGPT_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🤖 🔊 **Use ChatGPT with your voice!** using OpenAI's Whisper and the OpenAI and ElevenLabs APIs. ⭐⭐⭐⭐⭐

This notebook lets you talk to ChatGPT with your voice: it records your speech, transcribes it with Whisper, sends the text to the OpenAI API, and reads the reply aloud with an ElevenLabs voice. Follow the instructions to enter your API keys and select a voice, then run the code cells below one at a time, in order.

📽️ *This notebook is based on the [original notebook](https://colab.research.google.com/drive/1qY-6J4UpKZ0tOmhNmh0Ci6MSiCo6xBTP?usp=sharing) (in Spanish) created by [DotCSV](https://www.youtube.com/channel/UCy5znSnfMsDwaLlROnZ7Qbg) and GPT-4.*

🐦 *Don't forget to follow DotCSV on [Twitter](https://twitter.com/dotCSV) to stay updated on their latest posts and projects.*


**I also used content from the notebook [Make your own recordings and transcriptions with OpenAI's Whisper!](https://colab.research.google.com/github/fastforwardlabs/whisper-openai/blob/master/WhisperDemo.ipynb#scrollTo=v5hvo8QWN-a9)**, _a fun diversion brought to you by [Melanie](https://www.linkedin.com/in/melanierbeck/), ML Research Manager at Cloudera Fast Forward Labs._



## **STEP 1:** Register for and access the OpenAI and ElevenLabs APIs


> <p>✏️ <b>OpenAI Website</b> <i>(Text Generation)</i>
<br>
<a href="https://platform.openai.com/account/api-keys">https://platform.openai.com/account/api-keys</a>

<p>🔊 <b>ElevenLabs Website</b> <i>(Text-to-Speech Synthesis)</i>
<br>
<a href="https://beta.elevenlabs.io/">https://beta.elevenlabs.io/</a>

## **STEP 2:** Configure your API access.



## Installs and imports
The commands below install the Python packages needed to call the GPT models, use ElevenLabs voices, record audio snippets, and run Whisper models for speech-to-text transcription.

In [None]:
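# The calls used below (openai.ChatCompletion, elevenlabs set_api_key/voices/play)
# belong to the pre-1.0 releases of both packages, so pin to those.
!pip install -q "openai<1.0"
!pip install -q "elevenlabs<1.0"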

import os
import openai
import tempfile
import requests
from IPython.display import Audio, clear_output
from elevenlabs import generate, play, set_api_key, voices, Models

In [None]:
! pip install git+https://github.com/openai/whisper.git
! pip install sounddevice wavio
! pip install ipywebrtc notebook

Get an API key for each service and enter both in the following form.


In [None]:
#@title
openai_api_key = "" #@param {type:"string"}
eleven_api_key = "" #@param {type:"string"}

# Configure the OpenAI and ElevenLabs API keys
openai.api_key = openai_api_key
set_api_key(eleven_api_key)

voice_list = voices()
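
Keys typed into the form stay in the saved notebook. As a hedged alternative, the sketch below prefers an environment variable and falls back to the form value. The helper `resolve_api_key` and the environment variable names are assumptions introduced for illustration; they are not part of either library.

```python
import os

def resolve_api_key(form_value: str, env_name: str) -> str:
    """Return the key from the environment if set, else the form value."""
    key = os.environ.get(env_name, "").strip() or form_value.strip()
    if not key:
        raise ValueError(f"No API key found: fill in the form or set {env_name}.")
    return key
```

With this in place, the configuration cell could call, for example, `resolve_api_key(openai_api_key, "OPENAI_API_KEY")` before assigning `openai.api_key`.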

We also need the following in order to record audio from this notebook and process the resulting files.

In [None]:
# -y answers the confirmation prompt so the cell does not stall
!apt-get install -y ffmpeg
!apt-get install -y libportaudio2

In [None]:
import os
import numpy as np

try:
    import tensorflow  # required in Colab to avoid protobuf compatibility issues
except ImportError:
    pass

import torch
import pandas as pd
import whisper
import torchaudio

from ipywebrtc import AudioRecorder, CameraStream
from IPython.display import Audio, display
import ipywidgets as widgets

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

## **STEP 3:** Select the voice to use.

Run the code in the next cell and choose the voice you want to interact with. If you have cloned your voice on the *ElevenLabs* website, you will see the voice you created included in the list.


In [None]:
#@title
import ipywidgets as widgets

voice_labels = [voice.category + " voice: " + voice.name for voice in voice_list]

voice_id_dropdown = widgets.Dropdown(
    options=voice_labels,
    value=voice_labels[0],
    description="Select a voice:",
)

display(voice_id_dropdown)
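
The dropdown stores only the label, so the voice ID has to be looked up again later. One way to do that, sketched here with a stand-in `Voice` dataclass (hypothetical, mirroring only the `category`, `name`, and `voice_id` attributes this notebook relies on), is to build a label-to-ID dictionary once. `build_voice_lookup` is likewise an illustrative name.

```python
from dataclasses import dataclass

@dataclass
class Voice:
    """Stand-in for the objects in voice_list (illustrative only)."""
    category: str
    name: str
    voice_id: str

def build_voice_lookup(voice_list):
    """Map each dropdown label to its voice ID."""
    return {f"{v.category} voice: {v.name}": v.voice_id for v in voice_list}

demo_voices = [
    Voice("premade", "Example A", "id-a"),
    Voice("cloned", "Example B", "id-b"),
]
lookup = build_voice_lookup(demo_voices)
print(lookup["cloned voice: Example B"])  # id-b
```

A dictionary avoids keeping two parallel lists in sync, at the cost of assuming that labels are unique.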

## **STEP 4:** Select language options.

**Whisper can transcribe many languages** (though it performs better on some languages than on others).

Whisper is also capable of detecting the input language. However, **to be on the safe side, we can explicitly tell Whisper which language to expect**.

In [None]:
language_options = whisper.tokenizer.TO_LANGUAGE_CODE
language_list = list(language_options.keys())
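
`TO_LANGUAGE_CODE` maps lowercase language names to short codes such as `"en"`. The sketch below uses a three-entry excerpt as a stand-in (the real dictionary is much larger) to show how a name could be normalized to a code. `to_language_code` and `LANGUAGE_CODE_EXCERPT` are hypothetical names introduced here.

```python
# Small excerpt standing in for whisper.tokenizer.TO_LANGUAGE_CODE.
LANGUAGE_CODE_EXCERPT = {"english": "en", "spanish": "es", "german": "de"}

def to_language_code(name, table=LANGUAGE_CODE_EXCERPT):
    """Return the short code for a language name, ignoring case and outer spaces."""
    key = name.strip().lower()
    if key not in table:
        raise KeyError(f"Unknown language: {name!r}")
    return table[key]

print(to_language_code(" Spanish "))  # es
```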

In [None]:
lang_dropdown = widgets.Dropdown(options=language_list, value='english')
output = widgets.Output()
display(lang_dropdown)

Whisper is also capable of several tasks, including English-only transcription, Any-to-English translation, and non-English transcription.

Below you can select either "transcribe" (which yields text in the same language as the input) or "translate" (which yields English text regardless of the input language).

In [None]:
task_dropdown = widgets.Dropdown(options=['transcribe', 'translate'], value='transcribe')
output = widgets.Output()
display(task_dropdown)

## **STEP 5:** Load Whisper model

Whisper comes in five model sizes, four of which also have an optimized English-only version. This notebook loads "base"-sized models (bigger than "tiny" but smaller than the others), which **require about 1GB of RAM**.

If you selected English above, the cell below will load the optimized English-only version. Otherwise, it will load the multilingual model.

In [None]:
if lang_dropdown.value == "english":
    model = whisper.load_model("base.en")
else:
    model = whisper.load_model("base")
print(
    f"Model is {'multilingual' if model.is_multilingual else 'English-only'} "
    f"and has {sum(np.prod(p.shape) for p in model.parameters()):,} parameters."
)
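
The selection rule above can be factored into a small function. This is a sketch under the assumption that English-only checkpoints are named by appending `.en` to the size and that the largest size has no such variant; `pick_model_name` is a hypothetical helper, not part of Whisper.

```python
# Sizes assumed to ship with an English-only ".en" variant.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}

def pick_model_name(language: str, size: str = "base") -> str:
    """Return the English-only checkpoint name when one applies."""
    if language.lower() == "english" and size in ENGLISH_ONLY_SIZES:
        return f"{size}.en"
    return size

print(pick_model_name("english"))           # base.en
print(pick_model_name("spanish"))           # base
print(pick_model_name("english", "large"))  # large
```

The returned string could then be passed to `whisper.load_model`.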

In [None]:
# fp16=False keeps decoding in 32-bit floats, which also works on CPU
options = whisper.DecodingOptions(language=lang_dropdown.value, task=task_dropdown.value,
                                  without_timestamps=True, fp16=False)
options

We need to enable some Colab widgets so that we can make an audio recording.

In [None]:
from google.colab import output
output.enable_custom_widget_manager()

## **STEP 6:** Set Up and Interact with ChatGPT

You can choose below **which version of ChatGPT you want to talk to**. Please note that the GPT-4-based version comes at a higher cost than the GPT-3.5 model. Refer to the pricing table at the following link before using it.

👉 [**ChatGPT Pricing Table**](https://openai.com/pricing)

You can also **customize the behavior of the ChatGPT model** by modifying the system message.

In [None]:
#@title ChatGPT configuration.
chatgpt_model = "gpt-3.5-turbo" #@param ["gpt-3.5-turbo", "gpt-4"]

chatgpt_system = "You are a helpful assistant in a conversation. Answers should not be too long. Be kind and nice." #@param {type:"string"}

# Find the index of the selected option
selected_voice_index = voice_labels.index(voice_id_dropdown.value)
selected_voice_id    = voice_list[selected_voice_index].voice_id

# Get a response from the selected ChatGPT model
def get_gpt4_response(prompt):
    response = openai.ChatCompletion.create(
        model=chatgpt_model,
        messages=[
            {"role": "system", "content": chatgpt_system},
            {"role": "user", "content": prompt}
        ]
    )
    return response.choices[0].message.content

# Main function: get the model's reply and synthesize it as speech with ElevenLabs
def interact_with_gpt4(prompt):
    response_text = get_gpt4_response(prompt)

    import requests

    CHUNK_SIZE = 1024
    url = "https://api.elevenlabs.io/v1/text-to-speech/" + selected_voice_id

    headers = {
      "Accept": "audio/mpeg",
      "Content-Type": "application/json",
      "xi-api-key": eleven_api_key
    }

    data = {
      "text": response_text,
      "model_id": "eleven_multilingual_v2",
      "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75,
        "style": 0,
        # "use_speaker_boost": True
      }
    }

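    # Stream the response so iter_content reads it in chunks, and fail early on HTTP errors
    response = requests.post(url, json=data, headers=headers, stream=True)
    response.raise_for_status()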

    # Save audio data to a temporary file
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as f:
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            if chunk:
                f.write(chunk)
        f.flush()
        temp_filename = f.name

    return temp_filename
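
Each call above sends only the system message and the latest prompt, so the model has no memory of earlier turns. A minimal sketch of carrying history forward follows. `build_messages` and the `history` list of `(user, assistant)` pairs are assumptions introduced here; the resulting list has the same shape as the `messages` argument used above.

```python
def build_messages(system, history, prompt):
    """Assemble a chat message list from prior turns plus the new prompt."""
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": prompt})
    return messages

history = [("Hi!", "Hello, how can I help?")]
msgs = build_messages("Be brief.", history, "What is Whisper?")
print([m["role"] for m in msgs])  # ['system', 'user', 'assistant', 'user']
```

After each reply, appending `(prompt, response_text)` to `history` would keep the conversation going. Longer histories also mean more tokens per request.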



### Time to record the prompt!

Press the circle button and start speaking. **It may not look like it, but the widget will be capturing sound**. Click the circle button again when you are finished. The widget will immediately begin to play back what it captured.

In [None]:
camera = CameraStream(constraints={'audio': True,'video':False})
recorder = AudioRecorder(stream=camera)
recorder

### Sending the prompt to GPT

In [None]:
clear_output(wait=True)

# Save the recorded audio and convert it to a mono WAV file
with open('recording.webm', 'wb') as f:
    f.write(recorder.audio.value)
!ffmpeg -i recording.webm -ac 1 -f wav my_recording.wav -y -hide_banner -loglevel panic

# Transcribe (or translate) the recording with Whisper
audio = whisper.load_audio("my_recording.wav")
audio = whisper.pad_or_trim(audio)
mel = whisper.log_mel_spectrogram(audio).to(model.device)
result = model.decode(mel, options)
prompt = result.text
print(prompt)

# Send the text to ChatGPT and play the synthesized reply
audio_file = interact_with_gpt4(prompt)
play(audio_file, notebook=True)
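
`whisper.pad_or_trim` fits the audio to a single 30-second window, so anything spoken after that point is dropped before decoding. As a rough guard, the duration of the converted file can be checked with the standard-library `wave` module. This sketch assumes `my_recording.wav` is plain PCM WAV, which the ffmpeg command above should produce; `wav_duration_seconds` and `exceeds_whisper_window` are hypothetical helpers.

```python
import wave

WHISPER_WINDOW_SECONDS = 30  # length of the window that pad_or_trim produces

def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()

def exceeds_whisper_window(path):
    """True if the recording is longer than one 30-second window."""
    return wav_duration_seconds(path) > WHISPER_WINDOW_SECONDS
```

Calling `exceeds_whisper_window("my_recording.wav")` before transcription would flag prompts that are going to be cut short.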