<a href="https://colab.research.google.com/github/ThaissaTeodoro/chatgpt_with_whisper_and_python/blob/main/chatgpt_voice_whisper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install the required libraries
!pip install gTTS
!pip install openai
!pip install git+https://github.com/openai/whisper.git -q

# References and documentation:
# https://gist.github.com/korakot/c21c3476c024ad6d56d5f48b0bca92be
# https://github.com/openai/whisper#available-models-and-languages
# https://platform.openai.com/docs/api-reference/introduction
# https://help.openai.com/en/articles/4936830
# https://platform.openai.com/account/api-keys


In [42]:
language = 'en'

In [43]:
# All imports
from IPython.display import Audio, display, Javascript
from google.colab import output
from base64 import b64decode
import whisper
import openai
from gtts import gTTS
import os

In [44]:
# Greeting played when the assistant starts
message_start = "Hello, how can ChatGPT help you?"

# gTTS converts the text to speech; it always produces MP3 audio
print("synthesizing audio and saving")
gtts_object = gTTS(text=message_start, lang=language, slow=False)
voice_start = "/content/dio/whisper/audio/start.mp3"
os.makedirs(os.path.dirname(voice_start), exist_ok=True)  # gTTS.save() fails if the folder is missing
gtts_object.save(voice_start)

print("playing audio")
display(Audio(voice_start, autoplay=True))


synthesizing audio and saving
playing audio
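gTTS always writes MP3 data, whatever extension the target path has, and the browser recorder in the next cell usually produces WebM or Ogg rather than WAV. A quick way to check what a saved file really contains is to look at its first bytes. The helpers below (`sniff_audio_format` and `sniff_file` are hypothetical names) are a minimal sketch of that idea using well-known container signatures; it is a rough heuristic, not a full format detector.

```python
def sniff_audio_format(header: bytes) -> str:
    """Guess the audio container from the first bytes of a file (rough heuristic)."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:3] == b"ID3":
        return "mp3"  # MP3 that starts with an ID3v2 tag
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "webm"  # EBML magic shared by WebM and Matroska
    if header[:4] == b"OggS":
        return "ogg"
    # MPEG audio frame sync (11 set bits); AAC ADTS streams also match this pattern
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 12 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_audio_format(f.read(12))
```

In the notebook, `sniff_file(voice_start)` would be expected to report `"mp3"` for the greeting created by gTTS.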


In [40]:

RECORD = """
const sleep  = time => new Promise(resolve => setTimeout(resolve, time))
// Convert a Blob into a base64 data URL
const b2text = blob => new Promise(resolve => {
  const reader = new FileReader()
  reader.onloadend = () => resolve(reader.result)
  reader.readAsDataURL(blob)
})
// Record `time` milliseconds from the microphone and resolve with a data URL
var record = time => new Promise(async resolve => {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  const recorder = new MediaRecorder(stream)
  const chunks = []
  recorder.ondataavailable = e => chunks.push(e.data)
  recorder.onstop = async () => {
    stream.getTracks().forEach(track => track.stop())  // release the microphone
    const blob = new Blob(chunks, { type: recorder.mimeType })
    resolve(await b2text(blob))
  }
  recorder.start()
  await sleep(time)
  recorder.stop()
})
"""
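# Record audio from the browser microphone with the JavaScript above and save it to a file
def record(sec=10, path='/content/dio/whisper/audio/audio.wav'):
    print("executing code in javascript")
    display(Javascript(RECORD))

    print("storing and saving response")
    # eval_js waits for the JavaScript promise and returns the data URL it resolves with
    result = output.eval_js('record(%d)' % (sec * 1000))
    # Strip the "data:<mime>;base64," prefix and decode the audio bytes.
    # Browsers usually record WebM/Ogg rather than WAV; Whisper decodes either via ffmpeg.
    voice_record = b64decode(result.split(',')[1])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'wb') as f:
        f.write(voice_record)
    return path

print("recording")
record_file = record()

print("playing said audio")
display(Audio(record_file, autoplay=True))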


recording
executing code in javascript
storing and saving response
playing said audio
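The `record()` function decodes the data URL returned by the JavaScript with `result.split(',')[1]`, which discards the MIME type. A slightly more defensive variant also returns the media type, which tells you what the browser actually recorded (for example `audio/webm`, or `application/octet-stream` when the Blob is created without a type). `data_url_to_bytes` is a hypothetical helper, sketched here under the assumption that the payload is always base64-encoded, as `FileReader.readAsDataURL` produces.

```python
import base64


def data_url_to_bytes(data_url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into (media type, decoded bytes)."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    # Drop the "data:" prefix and ";base64" suffix, then parameters such as ";codecs=opus"
    media_type = header[len("data:"):-len(";base64")].split(";")[0]
    # RFC 2397 treats an empty media type as text/plain
    return media_type or "text/plain", base64.b64decode(payload)
```

Inside `record()`, `mime, voice_record = data_url_to_bytes(result)` could replace the split-and-decode line, and `mime` could then be used to pick a matching file extension.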


In [45]:
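# Load the "small" Whisper model (downloaded and cached on first use)
model = whisper.load_model("small")

print("Transcribing previously recorded audio:")
# fp16=False forces FP32 inference, which is what Whisper falls back to on CPU anyway
result = model.transcribe(record_file, fp16=False, language=language)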
transcription = result["text"]
print(f'transcription{transcription}')


Transcribing previously recorded audio:
transcription What is each about doctor who?
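Besides `"text"`, the dictionary returned by `model.transcribe` contains `"language"` and a list of `"segments"`, each with `"start"` and `"end"` times in seconds and its own `"text"`. For longer recordings it can help to print the segments with timestamps. The two helpers below are hypothetical names; the sketch only assumes those segment keys.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS.mmm."""
    millis = round(seconds * 1000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"


def format_segments(result: dict) -> list[str]:
    """Turn a Whisper transcribe() result into timestamped lines."""
    lines = []
    for seg in result.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Whisper segment text usually starts with a space, so strip it
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return lines
```

With the cell above, `for line in format_segments(result): print(line)` would list each recognized phrase with its time range.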


In [46]:
# Replace "KEY" with your OpenAI API key; it is stored as an environment variable for this session only.
# Do not share or commit the notebook while a real key is written here.
os.environ['OPENAI_API_KEY'] = 'KEY'
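Writing the key directly into a cell makes it easy to leak when the notebook is shared. One alternative is to prompt for it with hidden input using the standard library's `getpass`; Colab's Secrets panel (`google.colab.userdata.get`) is another option. `get_api_key` below is a hypothetical helper sketching the `getpass` approach: it reuses the environment variable when it is already set and only prompts when the variable is missing or still holds the `"KEY"` placeholder.

```python
import os
from getpass import getpass


def get_api_key(env_var: str = "OPENAI_API_KEY", placeholder: str = "KEY") -> str:
    """Return the key in env_var, prompting with hidden input if it is unset or a placeholder."""
    key = os.environ.get(env_var, "").strip()
    if not key or key == placeholder:
        key = getpass(f"Enter {env_var}: ").strip()
        os.environ[env_var] = key  # keep it available to later cells in this session
    return key
```

Calling `openai.api_key = get_api_key()` would then replace the hard-coded assignment.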

In [48]:
openai.api_key = os.environ.get('OPENAI_API_KEY')

# Send the transcription to the Chat Completions API (openai>=1.0 interface) using GPT-3.5 Turbo
response = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": transcription}],
)

# Extract the reply generated by ChatGPT
chatgpt_response = response.choices[0].message.content
print(f'response generated by chatgpt {chatgpt_response}')

response generated by chatgpt Doctor Who is a long-running British science fiction television series that first aired in 1963. It follows the adventures of the Doctor, a Time Lord from the planet Gallifrey who travels through time and space in a spaceship called the TARDIS (Time and Relative Dimension in Space). 

The Doctor is a regenerating character, meaning that when the actor portraying him reaches the end of their tenure, the Doctor regenerates into a new form, allowing for the casting of a new actor. This concept has allowed the series to continue for several decades with different actors playing the Doctor.

Each episode typically features the Doctor and their companions encountering various alien species and facing different threats across time and space. The series blends elements of science fiction, adventure, comedy, and drama. Doctor Who has become known for its imaginative storylines, iconic villains, and the exploration of complex themes such as time travel, morality, an
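The request above sends only the current transcription, so ChatGPT has no memory of earlier questions: the Chat Completions API is stateless, and context is kept by resending previous messages with each request. The sketch below (hypothetical helpers `build_messages` and `record_exchange`) keeps a running history, caps it at the most recent exchanges to bound request size, and prepends a system prompt.

```python
def build_messages(system_prompt: str, history: list[dict], user_text: str,
                   max_turns: int = 10) -> list[dict]:
    """Return system prompt + the last max_turns user/assistant exchanges + the new user message."""
    # Each exchange is two messages; guard max_turns == 0 because history[-0:] is the whole list
    recent = history[-2 * max_turns:] if max_turns > 0 else []
    return (
        [{"role": "system", "content": system_prompt}]
        + recent
        + [{"role": "user", "content": user_text}]
    )


def record_exchange(history: list[dict], user_text: str, reply: str) -> list[dict]:
    """Append one question/answer pair to the history in place and return it."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": reply})
    return history
```

Starting from `history = []`, you would pass `build_messages(system_prompt, history, transcription)` as `messages` in the chat completion call and then call `record_exchange(history, transcription, chatgpt_response)`. A system prompt asking for short answers (a hypothetical choice) also keeps the spoken reply brief.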

In [49]:
# Convert ChatGPT's response to speech in the language set by the `language` variable
gtts_object = gTTS(text=chatgpt_response, lang=language, slow=False)

# Save the spoken response as MP3 (the format gTTS produces) under /content, Colab's working directory
response_voice = "/content/dio/whisper/audio/response_chat.mp3"
gtts_object.save(response_voice)

print("voice response generated by chatgpt:")
display(Audio(response_voice, autoplay=True))

voice response generated by chatgpt:
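The notebook runs the record, transcribe, chat, and speak steps as separate cells. To repeat the conversation in a loop, the four steps can be wrapped in one function. The sketch below is a hypothetical design (`voice_chat_turn` and its parameter names are not from the original) that takes each step as a callable, which keeps the orchestration independent of Colab, Whisper, OpenAI, and gTTS and lets it be tested with stand-in functions. It also skips the API call when nothing was transcribed.

```python
from typing import Callable, Optional


def voice_chat_turn(
    record_audio: Callable[[], str],
    transcribe_audio: Callable[[str], str],
    ask_chat: Callable[[str], str],
    synthesize: Callable[[str], str],
) -> dict:
    """Run one record -> transcribe -> chat -> speak cycle and return the intermediate results."""
    audio_in = record_audio()
    question = transcribe_audio(audio_in).strip()
    audio_out: Optional[str] = None
    answer = ""
    if question:  # skip the chat and speech steps when nothing recognizable was said
        answer = ask_chat(question)
        audio_out = synthesize(answer)
    return {"audio_in": audio_in, "question": question, "answer": answer, "audio_out": audio_out}
```

In the notebook, `record` fits `record_audio` directly, and `transcribe_audio` could be `lambda path: model.transcribe(path, fp16=False, language=language)["text"]`; `ask_chat` and `synthesize` would be small wrappers around the chat completion and gTTS cells, returning the reply text and the saved audio path respectively.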
