# Real-Time Speech to Translated Speech

Translating my own voice into a different language, as I speak it!

---
## OpenAI For Translation

Uses GPT-4-Turbo to generate nuanced translations quickly. The target language and the sentence are both inputs to the chain, so either can be changed on every call.

In [1]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

translation_template = """
Translate the following sentence into {language}, return ONLY the translation, nothing else.

Sentence: {sentence}
"""

output_parser = StrOutputParser()
# temperature=0.0 keeps the translation as deterministic as possible
llm = ChatOpenAI(temperature=0.0, model="gpt-4-turbo")
translation_prompt = ChatPromptTemplate.from_template(translation_template)

# The prompt reads "language" and "sentence" straight from the input dict,
# so the chain is invoked with {"language": ..., "sentence": ...}.
translation_chain = translation_prompt | llm | output_parser

def translate(sentence, language="French"):
    data_input = {"language": language, "sentence": sentence}
    translation = translation_chain.invoke(data_input)
    return translation
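
To make the two template inputs concrete, here is a minimal sketch of the text the model ends up receiving. It uses plain `str.format` as a stand-in for `ChatPromptTemplate`, so it runs without an API key; `render_prompt` is a hypothetical helper for illustration and is not part of the notebook's chain.

```python
# Same template text as the cell above.
translation_template = """
Translate the following sentence into {language}, return ONLY the translation, nothing else.

Sentence: {sentence}
"""

def render_prompt(sentence, language="French"):
    """Hypothetical helper: fill both placeholders, roughly as the prompt step would."""
    return translation_template.format(language=language, sentence=sentence)

# Both inputs can change on every call.
prompt_text = render_prompt("Good morning, everyone.", language="Spanish")
print(prompt_text)
```

Swapping the `language` argument is all it takes to retarget the translation; the rest of the pipeline is unchanged.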

---
## ElevenLabs For Voice Cloning & Voice Synthesis

Uses a premade voice model on the [ElevenLabs Service](https://elevenlabs.io/app/voice-lab), with the Multilingual V2 model for synthesis.

**Available Languages:** *Chinese, Korean, Dutch, Turkish, Swedish, Indonesian, Filipino, Japanese, Ukrainian, Greek, Czech, Finnish, Romanian, Russian, Danish, Bulgarian, Malay, Slovak, Croatian, Classic Arabic, Tamil, English, Polish, German, Spanish, French, Italian, Hindi and Portuguese*
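
Since `translate` accepts any language name while synthesis is limited to the list above, a small guard could catch unsupported targets before any API call is made. This is only a sketch: `SUPPORTED_LANGUAGES` and `is_supported` are hypothetical names, and the set is copied by hand from the list above rather than fetched from ElevenLabs.

```python
# Languages from the list above, lowercased for case-insensitive matching.
SUPPORTED_LANGUAGES = {
    "chinese", "korean", "dutch", "turkish", "swedish", "indonesian",
    "filipino", "japanese", "ukrainian", "greek", "czech", "finnish",
    "romanian", "russian", "danish", "bulgarian", "malay", "slovak",
    "croatian", "classic arabic", "tamil", "english", "polish", "german",
    "spanish", "french", "italian", "hindi", "portuguese",
}

def is_supported(language):
    """Hypothetical helper: True if the language name appears in the list above."""
    return language.strip().lower() in SUPPORTED_LANGUAGES

for candidate in ("French", "  japanese ", "Klingon"):
    print(candidate.strip(), "->", is_supported(candidate))
```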

In [2]:
from elevenlabs.client import ElevenLabs
from elevenlabs import play, stream

client = ElevenLabs()

def gen_dub(text):
    print("Generating audio...")
    audio = client.generate(
        text=text,
        voice="", # Insert voice model here!
        model="eleven_multilingual_v2"
    )
    play(audio)

---
## AssemblyAI for Speech to Text Streaming

AssemblyAI handles streaming STT on its own platform. The translation and voice generation functions above are called from inside its transcript callback, once a final transcript arrives.

In [3]:
import assemblyai as aai

def on_open(session_opened: aai.RealtimeSessionOpened):
  "This function is called when the connection has been established."
  print("Session ID:", session_opened.session_id)

def on_data(transcript: aai.RealtimeTranscript):
  "This function is called when a new transcript has been received."
  if not transcript.text:
    return

  if isinstance(transcript, aai.RealtimeFinalTranscript):
    print(transcript.text, end="\r\n")
    print("Translating...")
    translation = translate(str(transcript.text))
    print(f"Translation: {translation}")
    gen_dub(translation)
  else:
    print(transcript.text, end="\r")
      
def on_error(error: aai.RealtimeError):
  "This function is called when an error occurs."
  print("An error occurred:", error)

def on_close():
  "This function is called when the connection has been closed."
  print("Closing Session")

transcriber = aai.RealtimeTranscriber(
  on_data=on_data,
  on_error=on_error,
  sample_rate=44_100,
  on_open=on_open, # optional
  on_close=on_close, # optional
)
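
The partial-versus-final logic in `on_data` can be tried offline. The sketch below uses stand-in classes in place of AssemblyAI's transcript types and a fake translator, so every name in it (`Transcript`, `PartialTranscript`, `FinalTranscript`, `make_handler`) is hypothetical and only mirrors the shape of the callback above.

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    """Stand-in for aai.RealtimeTranscript: only the text field is modeled."""
    text: str

class PartialTranscript(Transcript):
    """Stand-in for an interim result that may still change."""

class FinalTranscript(Transcript):
    """Stand-in for a finished, formatted utterance."""

def make_handler(translate_fn, speak_fn):
    """Build an on_data-style callback that only acts on final transcripts."""
    def on_data(transcript):
        if not transcript.text:
            return  # ignore empty results, as the real callback does
        if isinstance(transcript, FinalTranscript):
            speak_fn(translate_fn(transcript.text))
    return on_data

spoken = []
handler = make_handler(lambda s: f"[fr] {s}", spoken.append)

events = [
    PartialTranscript("so here"),
    PartialTranscript("so here's an"),
    FinalTranscript("So here's an example."),
    PartialTranscript(""),
    FinalTranscript(""),
]
for event in events:
    handler(event)

print(spoken)
```

Only the non-empty final transcript reaches the translator, which is why each sentence is translated and spoken once rather than on every interim update.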

---
## Main Script

(Remember to switch the audio input/output to the AirPods in the system settings.)

In [4]:
# Start the connection. You will likely have to restart the kernel between runs (this runs better as a full script in something like VSCode).
transcriber.connect()
microphone_stream = aai.extras.MicrophoneStream()
transcriber.stream(microphone_stream)

Session ID: a106c406-94c5-4615-b1ce-710bbc070207
So here's an example of putting it together and just having the translation into French happen.
Translating...
Translation: Voici donc un exemple de mise en œuvre et de réalisation de la traduction en français.
Perfect. So, as you can see, my voice is being transcribed, and then you'll see it. Sort of pop a little bit into a more formatted thing, which is the final object, and. Then that is what's passed to GPT four turbo and translated.
Translating...
Translation: Parfait. Donc, comme vous pouvez le voir, ma voix est transcrise, et ensuite vous la verrez. Cela ressemble un peu à un passage vers quelque chose de plus formaté, qui est l'objet final, et. Ensuite, c'est ce qui est transmis à GPT quatre turbo et traduit.


In [11]:
transcriber.close()

Closing Session
