# Seamless Communication (Meta) / Groq Whisper + LangChain 🦜️🔗

1. Seamless Communication video 📺 https://www.youtube.com/watch?v=M5xamS7jm-A&t=15s
2. seamless_communication (GitHub): https://github.com/facebookresearch/seamless_communication
3. fairseq2 (GitHub): https://github.com/facebookresearch/fairseq2
4. HuggingFace 🤗: https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t_v2
5. LangChain chatbots: https://python.langchain.com/docs/use_cases/chatbots


### Installing prerequisites

In [None]:
%%capture
!pip install -Uq langchain==0.2.10 langchain-google-genai==1.0.8 gradio groq langchain-chroma langchain_community pypdf unstructured "unstructured[pdf]" python-dotenv elevenlabs
!pip install torch torchaudio torchvision

### Importing the required libraries

In [5]:
from langchain.chains import LLMChain
from langchain.prompts import (
    ChatPromptTemplate
)
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import MessagesPlaceholder
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_history_aware_retriever
from langchain.chains import create_retrieval_chain
from langchain.memory import ChatMessageHistory
import gradio as gr
from dotenv import load_dotenv
from groq import Groq
from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs
from typing import Optional
import io

In [6]:
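import os
# Load GROQ_API_KEY, GOOGLE_API_KEY and ELEVENLABS_API_KEY from a local .env file
load_dotenv()
# The Groq client reads GROQ_API_KEY from the environment
client_groq = Groq()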

> 👍 We use Google AI Studio (Gemini) as the base LLM
>
> But you can actually use any LLM that LangChain supports: Gemini, Llama 3, Mistral, Claude, etc. ->
> https://python.langchain.com/docs/integrations/llms/

#### Selecting the Google Generative AI model as the LLM

In [7]:
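# Main model: answers the student's questions through the RAG chain
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash-exp",
    temperature=0,
    max_tokens=1000,
    timeout=None,
)

# Second model: translates the English answers into Spanish
llm_translation = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0,
    max_tokens=1000,
    timeout=None,
)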

In [8]:
result = llm.invoke("Hola, quien eres?")  # "Hi, who are you?"
print(result.content)

Soy un modelo de lenguaje grande, entrenado por Google.



#### Loading the Google Generative AI embeddings model

In [9]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")
vector = embeddings.embed_query("Transforma esto a vectores")
vector[:5]

[-0.03853387013077736,
 0.024018533527851105,
 -0.047739602625370026,
 -0.020946696400642395,
 0.023168817162513733]

We check the dimension of the embedding vectors:

In [10]:
len(vector)

768
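Each text is now a 768-dimensional vector. A vector store decides which chunks are "close" to a query by comparing such vectors with a similarity or distance measure; cosine similarity is a common choice, although the metric Chroma actually uses depends on how the collection is configured. The sketch below (the function name is illustrative, not part of any library) shows cosine similarity on toy 3-dimensional vectors in plain Python.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for the 768-dimensional embeddings above
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0 (unrelated)
```

Scores close to 1 mean "semantically similar"; this is the kind of comparison that happens behind `retriever.invoke(...)`.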

#### Creating a Chroma vector store to hold the vectors

In [11]:
from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="TOEFL-COLLECTION",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",  # Where to save data locally, remove if not necessary
)

#### Loading the documents from a directory

In [12]:
from langchain_community.document_loaders import PyPDFDirectoryLoader

loader = PyPDFDirectoryLoader("src")

In [13]:
docs = loader.load()

In [14]:
docs

[Document(metadata={'source': 'src/Master the TOEFL Writing Skills (Petersons Master the TOEFL Writing Skills).pdf', 'page': 0}, page_content='Peterson’s\nMASTER\nTOEFL\nWRITING SKILLS\n'),
 Document(metadata={'source': 'src/Master the TOEFL Writing Skills (Petersons Master the TOEFL Writing Skills).pdf', 'page': 1}, page_content='About Peterson’s, a Nelnet company\nPeterson’s (www.petersons.com) is a leading provider of education information and advice, with books and online\nresources focusing on education search, test preparation, and financial aid. Its Web site offers searchable databases and\ninteractive tools for contacting educational institutions, online practice tests and instruction, and planning tools for\nsecuring financial aid. Peterson’s serves 110 million education consumers annually.\nFor more information, contact Peterson’s, 2000 Lenox Drive, Lawrenceville, NJ 08648;\n800-338-3282; or find us on the World Wide Web at www.petersons.com/about.\n© 2007 Peterson’s, a Nelne

#### Splitting the documents into chunks

This time I don't add them to the vector store because they are already stored there; note the commented-out line.

In [15]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
# Uncomment on the first run to index the chunks in Chroma (they persist in ./chroma_langchain_db)
#vector_store.add_documents(splits)

In [16]:
splits[100]

Document(metadata={'source': 'src/Master the TOEFL Writing Skills (Petersons Master the TOEFL Writing Skills).pdf', 'page': 51}, page_content='RIGHT: He had little winter clothing when he arrived.\nWRONG: You need a little dollars to buy this book.\nRIGHT: You need a few dollars to buy this book.\nWRONG: Lloyd scored the least points in the basketball game.\nRIGHT: Lloyd scored the fewest points in the basketball game.\nWRONG: Isabelle bought less than ten items.\nRIGHT: Isabelle bought fewer than ten items.\n40 PART III: TOEFL Writing Review\n.................................................................\n............................................................................................\nwww.petersons.com')
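With `chunk_size=1000` and `chunk_overlap=200`, consecutive chunks can share up to 200 characters, so text near a cut point appears in both neighbouring chunks. `RecursiveCharacterTextSplitter` first tries to cut on paragraph breaks, then line breaks, then spaces, so real chunks follow natural boundaries and are often shorter than 1000 characters. The simplified sketch below ignores separators and only shows the window and overlap arithmetic; `split_with_overlap` is a hypothetical helper, not the LangChain implementation.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size character windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```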

#### Exposing the Chroma store as a retriever

In [17]:
retriever = vector_store.as_retriever()

In [18]:
retriever.invoke('simple past')

[Document(metadata={'page': 89, 'source': 'src/Master the TOEFL Writing Skills (Petersons Master the TOEFL Writing Skills).pdf'}, page_content='early.\nThe teacher had the class beginV\nto write a composition when the bell rang.\nUse a past participle after the causative verbs have andgetwhen the second verb is\npassive in meaning.\nShe had her passport stampedPAST PART .\nat the immigration office.\nThey got their house paintedPAST PART .\nlast summer.\n5. The following verbs of perception are followed by the simple form of the verb (V) orthe\npresent participle (V 1ing):\nfeel see\nhear smellnotice watchobserve\nI heard the baby cry.\nV\nOR I heard the baby cryingV1ING\n.\nJane observed him leaveV\n. OR Jane observed him leavingV1ING\n.78 PART III: TOEFL Writing Review\n.............................................................................................................................................................\nwww.petersons.com'),
 Document(metadata={'page': 78, 'sour
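`retriever.invoke('simple past')` embeds the query with the same embedding model and returns the stored chunks whose vectors are closest to it. As far as I know, LangChain returns 4 documents by default; the number can be changed with `vector_store.as_retriever(search_kwargs={"k": ...})`. A minimal top-k search over hypothetical 2-D embeddings, in plain Python, looks like this:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 4) -> list[tuple[str, float]]:
    """Return the k (text, score) pairs most similar to query_vec, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 2-D "embeddings" for three chunks (real ones have 768 dimensions)
store = [
    ("simple past rules", [0.9, 0.1]),
    ("articles a/an/the", [0.1, 0.9]),
    ("past participle after causatives", [0.8, 0.3]),
]

for text, score in top_k([1.0, 0.0], store, k=2):
    print(f"{score:.3f}  {text}")
```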

#### Creating a history-aware retriever

In [19]:
prompt_aware = ChatPromptTemplate.from_messages([
        MessagesPlaceholder(variable_name="chat_history"),
        ("user", "{input}"),
        ("user", "Given the above conversation, generate a search query to look up to get information relevant to the conversation")
    ])
retriever_chain_aware = create_history_aware_retriever(llm, retriever, prompt_aware)
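As I understand LangChain's implementation, `create_history_aware_retriever` branches on the history: when `chat_history` is empty, the raw `input` goes straight to the retriever; otherwise `prompt_aware` is sent to the LLM, and the query it generates is used for the search. The sketch below reproduces only that branching logic; `history_aware_query` and `fake_rewrite` are hypothetical, and the fake rewrite stands in for the LLM call.

```python
from typing import Callable

Message = tuple[str, str]  # (role, text)


def history_aware_query(
    user_input: str,
    chat_history: list[Message],
    rewrite: Callable[[list[Message], str], str],
) -> str:
    """Choose the text sent to the retriever.

    No history: use the input as-is. With history: let the (hypothetical)
    `rewrite` function, standing in for the LLM, build a standalone query.
    """
    if not chat_history:
        return user_input
    return rewrite(chat_history, user_input)


def fake_rewrite(history: list[Message], user_input: str) -> str:
    """Stand-in for the LLM: prefix the input with the last user turn."""
    last_user = [text for role, text in history if role == "user"][-1]
    return f"{last_user} -> {user_input}"


print(history_aware_query("give me an exercise", [], fake_rewrite))
print(history_aware_query("give me an exercise", [("user", "simple past"), ("ai", "...")], fake_rewrite))
```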

In [20]:
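# System prompt (in Spanish): act as an expert TOEFL English teacher, always answer in English for a Spanish speaker,
# be brief, use plain text only, and rely on the retrieved {context} to answer {input}
system_prompt = '''Actua como un experto profesor de ingles para la preparacion de TOEFL, tu trabajo es enseñar y dejar ejercicios practicos.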

 responde siempre en ingles y recuerda que le enseñas a alguien que habla español, se muy breve y puntual en tu respuesta, no respondas con markdown o emojis, solo texto plano

 Usa el siguiente contexto para tener una respuesta mas adecuada a la pregunta

 <context>\n
 {context}\n
 </context>\n

 Question: {input}

 '''

In [21]:
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
])
document_chain = create_stuff_documents_chain(llm, prompt)
retrieval_chain = create_retrieval_chain(retriever_chain_aware, document_chain)
history = ChatMessageHistory()
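`create_stuff_documents_chain` uses the "stuff" strategy: all retrieved documents are concatenated into the `{context}` variable of the prompt and the LLM is called once. `create_retrieval_chain` then feeds the output of `retriever_chain_aware` into it and returns a dict whose `answer` key holds the model's reply. A plain-Python sketch of the stuffing step follows; the `"\n\n"` separator reflects my understanding of LangChain's default, and `stuff_documents` is a hypothetical name.

```python
def stuff_documents(docs: list[str], template: str, question: str, separator: str = "\n\n") -> str:
    """'Stuff' strategy: join every retrieved chunk into one context string and fill the prompt once."""
    context = separator.join(docs)
    return template.format(context=context, input=question)


template = "<context>\n{context}\n</context>\nQuestion: {input}"
print(stuff_documents(["chunk A", "chunk B"], template, "What is the simple past?"))
```

The trade-off of stuffing is that every retrieved chunk must fit in the model's context window, which is one reason the chunks were kept around 1000 characters.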

In [22]:
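# Avoid naming the variable `input`, which would shadow Python's built-in input()
user_input = "Hola, que aprenderemos hoy?"  # "Hi, what will we learn today?"
history.add_user_message(user_input)

result = retrieval_chain.invoke({
    "chat_history": history.messages,
    "input": user_input,
})

history.add_ai_message(result['answer'])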


In [23]:
result['answer']

"Today, we will focus on TOEFL Speaking Task 1 and Writing for an Academic Discussion. We'll review a sample response for the speaking task and understand how to write a good response for the writing task.\n"

In [24]:
history.messages

[HumanMessage(content='Hola, que aprenderemos hoy?'),
 AIMessage(content="Today, we will focus on TOEFL Speaking Task 1 and Writing for an Academic Discussion. We'll review a sample response for the speaking task and understand how to write a good response for the writing task.\n")]
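Note the order used here (and later in `generate_llm_response`): the user message is added to `history` before the chain is invoked, so the chain receives the current question twice, once as the last entry of `chat_history` and once as `input`. The offline sketch below makes that visible with a hypothetical stand-in chain that only counts the messages it receives; plain tuples play the role of `ChatMessageHistory.messages`.

```python
class FakeRetrievalChain:
    """Hypothetical stand-in for retrieval_chain: reports how many history messages it received."""

    def invoke(self, inputs: dict) -> dict:
        return {"answer": f"seen {len(inputs['chat_history'])} messages"}


chain = FakeRetrievalChain()
history: list[tuple[str, str]] = []  # plays the role of ChatMessageHistory.messages


def generate_response(query: str) -> tuple[str, list[tuple[str, str]]]:
    history.append(("human", query))          # like history.add_user_message(query)
    result = chain.invoke({"chat_history": history, "input": query})
    history.append(("ai", result["answer"]))  # like history.add_ai_message(...)
    return result["answer"], history


print(generate_response("Hola, que aprenderemos hoy?")[0])  # seen 1 messages
print(generate_response("Dame un ejercicio")[0])            # seen 3 messages
```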

#### Translating the answer into Spanish

In [25]:
messages_tr = [
    (
        "system",
        "eres un experto traductor, traduce al español este texto, solo dame el texto en español directamente y en formato plano sin emojis o markdown:",
    ),
    ("human", result['answer']),
]
ai_msg = llm_translation.invoke(messages_tr)

In [26]:
ai_msg.content

'Hoy nos centraremos en la tarea oral 1 del TOEFL y en la escritura para una discusión académica. Revisaremos una respuesta de ejemplo para la tarea oral y comprenderemos cómo escribir una buena respuesta para la tarea escrita.\n'

#### Writing some helper functions to interact with the LLM

In [27]:
history = ChatMessageHistory()

In [28]:
## Send a query to the RAG chain, record both turns in the history, and return the answer plus the full history

def generate_llm_response(query):
    # `history` is the module-level ChatMessageHistory; it is mutated, not reassigned
    history.add_user_message(query)
    result = retrieval_chain.invoke({
        "chat_history": history.messages,
        "input": query
    })
    history.add_ai_message(result['answer'])
    return result['answer'], history.messages

In [29]:
def generate_llm_translation(query):
    messages_tr = [
        (
            "system",
            "eres un experto traductor, traduce al español este texto, solo dame el texto en español directamente y en formato plano sin emojis o markdown:",
        ),
        ("human", query),
    ]
    ai_msg = llm_translation.invoke(messages_tr)
    return ai_msg.content

In [30]:
response = generate_llm_response("Hola, que aprenderemos hoy?")
translation = generate_llm_translation(response[0])

In [31]:
translation

'Hoy nos centraremos en la tarea oral 1 del TOEFL y en la escritura para una discusión académica. Revisaremos una respuesta de ejemplo para la tarea oral y comprenderemos cómo escribir una buena respuesta para la tarea escrita.\n'

In [32]:
## Transcribe the recorded audio file (the path Gradio passes from the microphone input) with Whisper large-v3 on Groq
def speech_to_text(filename):
    with open(filename, "rb") as file:
        # Transcribe the audio file
        transcription = client_groq.audio.transcriptions.create(
            file=(filename, file.read()),
            model="whisper-large-v3",
            prompt="Specify context or spelling",  # Optional: placeholder text from the Groq docs example; replace with real context or remove
            response_format="json",  # Optional
            #language="en",  # Optional
            temperature=0.0  # Optional
        )
        return transcription.text
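The Groq client accepts the audio as a `(filename, bytes)` tuple, which is what `speech_to_text` builds from the path Gradio passes in. Below is a hypothetical helper that does the same and rejects unsupported formats up front; the extension list is my reading of Groq's speech-to-text documentation and should be checked against the current docs.

```python
import os
import tempfile
from pathlib import Path

# Assumption: formats listed in Groq's speech-to-text docs; verify against the current documentation
SUPPORTED_EXTENSIONS = {".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm"}


def prepare_audio_upload(filename: str) -> tuple[str, bytes]:
    """Hypothetical helper: validate the extension and build the (name, bytes) tuple for `file=`."""
    path = Path(filename)
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {path.suffix!r}")
    return path.name, path.read_bytes()


# Write a tiny dummy .wav file (not real audio) just to exercise the helper
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    tmp.write(b"RIFF....")
name, data = prepare_audio_upload(tmp.name)
print(name.endswith(".wav"), len(data))  # True 8
os.remove(tmp.name)
```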

In [33]:
from typing import Optional, Union

def text_to_speech(text: str,
                   voice_id: str = "1A3GGqmzl6rK8kT8oBVd",
                   output_path: Optional[str] = None,
                   stability: float = 0.5,
                   similarity_boost: float = 0.75,
                   style: float = 0.2) -> Union[str, bytes]:
    """Synthesize `text` with ElevenLabs.

    Returns the path of the saved MP3 when `output_path` is given,
    otherwise the raw MP3 bytes.
    """
    # Read the API key from the environment (loaded from .env by load_dotenv())
    api_key = os.getenv("ELEVENLABS_API_KEY")

    if not api_key:
        raise ValueError("API key is required. Set the ELEVENLABS_API_KEY environment variable.")

    client = ElevenLabs(api_key=api_key)

    audio_stream = client.text_to_speech.convert(
        voice_id=voice_id,
        optimize_streaming_latency="0",
        model_id='eleven_multilingual_v2',
        output_format="mp3_22050_32",
        text=text,
        voice_settings=VoiceSettings(
            stability=stability,
            similarity_boost=similarity_boost,
            style=style,
        ),
    )

    # Collect the streamed MP3 chunks into a single bytes object
    buffer = io.BytesIO()
    for chunk in audio_stream:
        buffer.write(chunk)
    audio_data = buffer.getvalue()

    # Without an output path, hand back the audio bytes directly
    if output_path is None:
        return audio_data

    # Otherwise save the audio file and return its path
    with open(output_path, "wb") as audio_file:
        audio_file.write(audio_data)
    print(f"Audio saved to {output_path}")

    return output_path
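`client.text_to_speech.convert` returns the audio as an iterator of byte chunks rather than a single bytes object, which is why `text_to_speech` collects them in an `io.BytesIO` buffer before writing the file. The pattern is shown below with a hypothetical fake stream so it runs without an API key; `b"".join(chunks)` would be an equivalent one-liner.

```python
import io
from typing import Iterable, Iterator


def collect_stream(chunks: Iterable[bytes]) -> bytes:
    """Concatenate streamed byte chunks into one bytes object, as text_to_speech does."""
    buffer = io.BytesIO()
    for chunk in chunks:
        buffer.write(chunk)
    return buffer.getvalue()


def fake_audio_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for the iterator returned by the ElevenLabs convert() call."""
    yield b"ID3"
    yield b"\x00\x01"
    yield b"\x02"


audio = collect_stream(fake_audio_stream())
print(audio)  # b'ID3\x00\x01\x02'
```

Note that a generator can only be consumed once, so the stream must be collected before it is written or returned.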

#### Now let's put everything together in a single function that uses the previous ones


In [34]:
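def all_together(audio):
    # 1. Speech -> text (Whisper on Groq)
    query_input = speech_to_text(audio)
    # 2. Text -> English answer from the RAG chain (the history is updated inside)
    last_response, history_messages = generate_llm_response(query_input)
    # 3. English answer -> Spanish translation
    translation_response = generate_llm_translation(last_response)
    # 4. Both answers -> MP3 files (ElevenLabs)
    english_output = text_to_speech(last_response, output_path="english.mp3")
    spanish_output = text_to_speech(translation_response, output_path="spanish.mp3")

    # Render the history as readable "role: text" lines for the Chat textbox
    chat_text = "\n\n".join(f"{m.type}: {m.content}" for m in history_messages)
    # The last output is labelled "Traduccion" (translation), so return the Spanish text
    return chat_text, spanish_output, english_output, translation_response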

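`all_together` hard-wires four external services (Groq Whisper, the Gemini RAG chain, Gemini translation and ElevenLabs), so it cannot be tried without API keys. One way to check the order of the steps offline is to pass each step in as a function, as in the sketch below; `run_pipeline` and the lambdas are hypothetical stand-ins, not part of the notebook.

```python
from typing import Callable


def run_pipeline(
    audio_path: str,
    stt: Callable[[str], str],
    chat: Callable[[str], str],
    translate: Callable[[str], str],
    tts: Callable[[str, str], str],
) -> dict:
    """Same step order as all_together, with every service injected so it can run offline."""
    question = stt(audio_path)
    answer_en = chat(question)
    answer_es = translate(answer_en)
    return {
        "question": question,
        "answer_en": answer_en,
        "answer_es": answer_es,
        "audio_en": tts(answer_en, "english.mp3"),
        "audio_es": tts(answer_es, "spanish.mp3"),
    }


result = run_pipeline(
    "recording.wav",
    stt=lambda path: "what is the simple past?",
    chat=lambda q: f"Answer to: {q}",
    translate=lambda text: f"[es] {text}",
    tts=lambda text, out: out,  # pretend the file was written and return its path
)
print(result["answer_es"])  # [es] Answer to: what is the simple past?
print(result["audio_es"])   # spanish.mp3
```

In the notebook itself the real functions (`speech_to_text`, `generate_llm_response`, `generate_llm_translation`, `text_to_speech`) would be passed in, with small adapters where their signatures differ.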
#### Let's bring it into a graphical interface with Gradio 🚀

In [35]:
history = ChatMessageHistory()


iface = gr.Interface(
    fn=all_together,
    inputs=gr.Audio(type="filepath"),
    outputs=[
        gr.Textbox(label="Chat"),
        gr.Audio(label="Audio en Español", autoplay=False),
        gr.Audio(label="Audio en Inglés", autoplay=False),
        gr.Markdown(label="Traducción"),
    ],
    title="Tutor con AI para practicar tu ingles",
    description="Graba tu voz usando el micrófono"
)

iface.launch(debug=True)


* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


Audio saved to english.mp3
Audio saved to spanish.mp3
Keyboard interruption in main thread... closing server.


