<a href="https://colab.research.google.com/github/Fuenfgeld/LLM-Utility-Cookbook/blob/main/VoiceToText.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Voice to Text Transcription with the OpenAI Whisper API
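This Jupyter notebook demonstrates a voice-to-text transcription workflow: it records audio from the microphone, converts the recording to .mp3 with ffmpeg, and transcribes it with OpenAI's Whisper API.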

## Installation of Necessary Libraries 🔧

The notebook starts by installing the necessary libraries: `ipywebrtc` captures the audio stream from the user's microphone, `pydub` provides audio processing, and `openai` gives access to OpenAI's transcription API.

In [None]:
!pip install ipywebrtc
!pip install pydub
# openai.Audio.transcribe (used below) is only available in openai versions before 1.0
!pip install "openai<1"

FFmpeg is a free and open-source software suite for handling multimedia data. Here it converts the recorded audio file to a format suitable for transcription.

In [None]:
!apt install -y ffmpeg

In [None]:
import openai

You need to set your OpenAI API key. The key is used to authenticate your requests to the OpenAI API.

In [None]:
openai.api_key = ''  # <- Your OpenAI API key
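
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the key is stored in an environment variable named `OPENAI_API_KEY` (the variable name and the helper `load_api_key` are illustrative and not part of the original notebook), with a hidden interactive prompt as a fallback:

```python
import os
from getpass import getpass


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, prompting if it is not set.

    `env_var` is an assumed variable name; adjust it to your setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the input so the key is not echoed into the cell output
        key = getpass("OpenAI API key: ").strip()
    return key


# Usage (hypothetical): openai.api_key = load_api_key()
```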

This cell enables custom widgets in Google Colab, which the audio recorder widget used below requires.

In [None]:
from google.colab import output
output.enable_custom_widget_manager()

## Recording the Audio 🎙️

In [None]:
from ipywebrtc import AudioRecorder, CameraStream

# Create an audio stream. Set 'constraints' to get audio without video.
audio_stream = CameraStream(constraints=
                            {'audio': True, 
                             'video': False})

# Create an audio recorder that uses the audio stream
recorder = AudioRecorder(stream=audio_stream)

# Hit the "Record" button to start recording
# "Recording... " will be displayed while recording
recorder


In [None]:
# Save the recording as a .webm file
recorder.save('recording.webm')

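Once the recording is saved, ffmpeg converts it from .webm to .mp3. In the command below, `-vn` ignores any video stream, `-ab 128k` sets the audio bitrate, `-ar 44100` sets the sample rate to 44.1 kHz, and `-y` overwrites an existing output file without asking.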

In [None]:
!ffmpeg -i recording.webm -vn -ab 128k -ar 44100 -y recording.mp3
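
The same conversion can also be run from Python, which surfaces a failed conversion as an exception instead of text in the cell output. The sketch below rebuilds the command above with the standard-library `subprocess` module. The helper names `build_ffmpeg_command` and `convert_to_mp3` are hypothetical, and the sketch assumes `ffmpeg` is available on the `PATH`.

```python
import subprocess


def build_ffmpeg_command(src, dst, bitrate="128k", sample_rate=44100):
    """Build the argument list for the conversion shown above."""
    return [
        "ffmpeg",
        "-i", src,                # input file
        "-vn",                    # ignore any video stream
        "-ab", bitrate,           # audio bitrate
        "-ar", str(sample_rate),  # audio sample rate in Hz
        "-y",                     # overwrite the output file if it exists
        dst,
    ]


def convert_to_mp3(src="recording.webm", dst="recording.mp3"):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst
```

Passing the arguments as a list avoids shell quoting problems if a file name contains spaces.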

## Transcribing the Audio 📝
Finally, the .mp3 file is transcribed into text using OpenAI's transcription service.

In [None]:
# Open the file in binary mode; the with-block closes it once the upload finishes
with open("recording.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
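
The transcription endpoint rejects uploads above a size limit (25 MB according to OpenAI's documentation at the time of writing; treat the exact figure as an assumption). A small pre-flight check, sketched here with a hypothetical helper `check_audio_file`, can catch a missing, empty, or oversized file before any request is sent:

```python
import os

# Assumed upload limit; verify against the current OpenAI documentation
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_audio_file(path, max_bytes=MAX_UPLOAD_BYTES):
    """Return the file size in bytes, or raise if the file is unusable."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Audio file not found: {path}")
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError(f"Audio file is empty: {path}")
    if size > max_bytes:
        raise ValueError(f"Audio file is {size} bytes, above the {max_bytes}-byte limit")
    return size


# Usage (hypothetical): check_audio_file("recording.mp3") before transcribing
```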

In [None]:
# Convert the response object to a plain dict; the transcription is stored under the 'text' key
dictTranscript = transcript.to_dict()

In [None]:
dictTranscript
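
With the default response format, the dictionary stores the transcription under the `text` key. The following sketch pulls that string out and writes it to a file. The helpers `extract_text` and `save_text`, the output file name, and the sample dictionary are all illustrative assumptions, not output from a real request.

```python
def extract_text(transcript_dict):
    """Return the transcribed text, or an empty string if the key is absent."""
    return str(transcript_dict.get("text", "")).strip()


def save_text(text, path="transcript.txt"):
    """Write the transcription to a UTF-8 text file and return its path."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text + "\n")
    return path


# Hypothetical response, for illustration only
sample = {"text": " Hello, this is a test recording. "}
print(extract_text(sample))
```

In the notebook, `extract_text(dictTranscript)` would be the corresponding call.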