<a href="https://colab.research.google.com/github/alarcon7a/english_teacher/blob/main/chatbot_seamless.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Seamless Communication (Meta) + LangChain 🦜️🔗

1. Seamless Communication video 📺 https://www.youtube.com/watch?v=M5xamS7jm-A&t=15s
2. seamless_communication (GitHub) : https://github.com/facebookresearch/seamless_communication
3. fairseq2 (GitHub) : https://github.com/facebookresearch/fairseq2
4. HuggingFace 🤗 : https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t_v2
5. LangChain chatbots : https://python.langchain.com/docs/use_cases/chatbots


### Installing prerequisites

In [None]:
%%capture
!pip install fairseq2
!pip install git+https://github.com/huggingface/transformers.git sentencepiece
!pip install -U transformers
!pip install -q langchain openai gradio

### Importing the required libraries

In [None]:
from langchain.chains import LLMChain
from langchain.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
)
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
import torch
import logging
import scipy.io.wavfile  # import the submodule explicitly; it is used later to write .wav files
import gradio as gr

In this case we will use OpenAI's GPT-4 model, so enter your API key below ⬇

In [None]:
from getpass import getpass
import os

OPENAI_API_KEY = getpass('Enter the secret value: ')
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

Enter the secret value: ··········


> 👍 We set GPT-4 as the base LLM
>
> You can also use Gemini, Llama 2, Mistral, Claude, etc. ->
> https://python.langchain.com/docs/integrations/llms/

In [None]:
llm = ChatOpenAI(model='gpt-4')

Creating the system prompt

In [None]:
prompt_system = '''Actua como un profesor de ingles, tu trabajo es enseñar y dejar ejercicios practicos.

 responde siempre en ingles y recuerda que le enseñas a alguien que habla español, se muy breve y puntual en tu respuesta
 '''

Putting it all together in a LangChain chat chain with memory and our LLM

In [None]:
# Prompt
prompt = ChatPromptTemplate(
    messages=[
        SystemMessagePromptTemplate.from_template(
            prompt_system
        ),
        # The `variable_name` here is what must align with memory
        MessagesPlaceholder(variable_name="chat_history"),
        HumanMessagePromptTemplate.from_template("{question}"),

    ]
)

# Notice that we `return_messages=True` to fit into the MessagesPlaceholder
memory = ConversationBufferWindowMemory(memory_key="chat_history", return_messages=True, k=3)
conversation = LLMChain(llm=llm, prompt=prompt,  memory=memory)
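
The window memory above keeps only the last `k` exchanges, so older turns drop out of the prompt. As a rough illustration of that behaviour, here is a plain-Python sketch. It is not LangChain's implementation, and `WindowMemory` is a hypothetical name used only for this example.

```python
from collections import deque


class WindowMemory:
    """Hypothetical stand-in that keeps only the last k (user, AI) exchanges."""

    def __init__(self, k: int = 3):
        # A deque with maxlen discards the oldest exchange automatically
        self.exchanges = deque(maxlen=k)

    def save(self, user_text: str, ai_text: str) -> None:
        self.exchanges.append((user_text, ai_text))

    def history(self) -> list:
        # Flatten the stored exchanges into alternating human/AI messages
        messages = []
        for user_text, ai_text in self.exchanges:
            messages.append(("human", user_text))
            messages.append(("ai", ai_text))
        return messages

    def clear(self) -> None:
        self.exchanges.clear()


window = WindowMemory(k=3)
for turn in range(1, 6):
    window.save(f"question {turn}", f"answer {turn}")

# Only the three most recent turns are still in the window
print([text for role, text in window.history() if role == "human"])
```

With `k=3`, the tutor forgets anything said more than three exchanges ago, which keeps the prompt short at the cost of long-range context.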


Let's try our chatbot!

In [None]:
response = conversation("Que podriamos aprender hoy")

In [None]:
print(response['text'])

Today, we can learn about the different English tenses. Let's start with the Present Simple tense, which we use to talk about habits, general truths, and states. 

For example:
- I play football every Sunday. (habit)
- The sun rises in the east. (general truth)
- She likes chocolate. (state)

Now, try to make your own sentences! Write three sentences using the Present Simple tense: one about a habit, one about a general truth, and one about a state.


In [None]:
conversation.memory.clear()

Using the GPU for the language model (Seamless)

In [None]:
if torch.cuda.is_available():
    device = torch.device("cuda:0")
    dtype = torch.float16
else:
    device = torch.device("cpu")
    dtype = torch.float32


Downloading our model, facebook/seamless-m4t-v2-large

In [None]:
from transformers import AutoProcessor, SeamlessM4Tv2Model
import torchaudio

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large").to(device)



Setting some variables for audio handling

In [None]:
AUDIO_SAMPLE_RATE = model.config.sampling_rate ## 16000
MAX_INPUT_AUDIO_LENGTH = 60  # seconds

Creating some helper functions:

In [None]:
## Runs the LangChain chatbot created earlier; returns the conversation history and the chatbot's latest response
def generate_llm_response(transcribed_text, messages):
    try:
        response = conversation(transcribed_text)
        messages.extend(['User: ' + transcribed_text, 'IA: ' + response['text']])
        chat_transcription = "\n ".join(messages)
        return chat_transcription, response['text']
    except Exception as e:
        logging.error(f"Error generating LLM response: {e}")
        return "", ""

In [None]:
## Takes the path of the audio recorded from the microphone and converts it into the input format Seamless expects (audio_inputs)
def preprocess_audio(input_audio: str):
    arr, org_sr = torchaudio.load(input_audio)
    new_arr = torchaudio.functional.resample(arr, orig_freq=org_sr, new_freq=AUDIO_SAMPLE_RATE)
    max_length = int(MAX_INPUT_AUDIO_LENGTH * AUDIO_SAMPLE_RATE)
    if new_arr.shape[1] > max_length:
        new_arr = new_arr[:, :max_length]
        gr.Warning(f"Input audio is too long. Only the first {MAX_INPUT_AUDIO_LENGTH} seconds is used.")
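    # Pass the sampling rate explicitly to avoid the processor's "sampling_rate" warning
    audio_inputs = processor(audios=new_arr, sampling_rate=AUDIO_SAMPLE_RATE, return_tensors="pt").to(device)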
    return audio_inputs
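
The truncation step above is simple arithmetic: at 16,000 samples per second, 60 seconds corresponds to 960,000 samples. A minimal sketch of that logic follows, using plain Python lists in place of tensors. `truncate_samples` is a hypothetical helper, and the 16 kHz rate is assumed from the comment next to `AUDIO_SAMPLE_RATE`.

```python
SAMPLE_RATE = 16000  # assumed value of model.config.sampling_rate
MAX_SECONDS = 60     # same limit as MAX_INPUT_AUDIO_LENGTH


def truncate_samples(samples, sample_rate=SAMPLE_RATE, max_seconds=MAX_SECONDS):
    """Return (kept_samples, was_truncated) for a mono list of samples."""
    max_length = int(max_seconds * sample_rate)
    if len(samples) > max_length:
        # Keep only the first max_seconds of audio
        return samples[:max_length], True
    return samples, False


clip = [0.0] * (75 * SAMPLE_RATE)  # 75 seconds of silence
kept, was_truncated = truncate_samples(clip)
print(len(kept) / SAMPLE_RATE, was_truncated)
```

Returning a flag alongside the samples lets the caller decide how to warn the user, in the same way that `preprocess_audio` raises a `gr.Warning`.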

In [None]:
## Takes the tensor (audio_inputs) and converts it to English text, regardless of the source language
def speech_to_text(audio_inputs):
    output_tokens = model.generate(**audio_inputs, tgt_lang='eng', generate_speech=False)
    translated_text_from_audio = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
    return translated_text_from_audio

In [None]:
## Takes the LLM's text response, converts it to Spanish and English audio, and saves the two output .wav files

def text_to_speech(text_input):
    text_inputs = processor(text = text_input, src_lang=["eng"], return_tensors="pt").to(device)

    spanish_audio = model.generate(**text_inputs, tgt_lang="spa")[0].cpu().numpy().squeeze()
    english_audio = model.generate(**text_inputs, tgt_lang="eng")[0].cpu().numpy().squeeze()


    scipy.io.wavfile.write("spanish_audio.wav", rate=AUDIO_SAMPLE_RATE, data=spanish_audio)
    scipy.io.wavfile.write("english_audio.wav", rate=AUDIO_SAMPLE_RATE, data=english_audio)


    return './spanish_audio.wav', './english_audio.wav'
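
`scipy.io.wavfile.write` stores floating-point arrays as floating-point WAV data. If a player or downstream tool expects 16-bit PCM instead, the waveform can be converted before writing. The sketch below uses only the standard library and assumes mono samples in the range [-1.0, 1.0]. `float_to_pcm16` and `write_pcm16_wav` are hypothetical helpers, not part of the notebook.

```python
import io
import struct
import wave


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit signed integers, clipping out-of-range values."""
    pcm = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))
        pcm.append(int(round(clipped * 32767)))
    return pcm


def write_pcm16_wav(target, samples, sample_rate=16000):
    """Write mono 16-bit PCM audio to a path or a seekable file-like object."""
    pcm = float_to_pcm16(samples)
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Write four samples to an in-memory buffer and read the header back
buffer = io.BytesIO()
write_pcm16_wav(buffer, [0.0, 0.5, -1.0, 2.0])
buffer.seek(0)
with wave.open(buffer, "rb") as check:
    print(check.getframerate(), check.getnframes())
```

To apply this to the model output, the NumPy array would first be converted with `.tolist()`. Whether the conversion is needed depends on the player, so treat it as optional.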

In [None]:
## Takes the LLM's English response and translates it into Spanish text
def text_to_text(text_input):
    text_inputs = processor(text = text_input, src_lang=["eng"], return_tensors="pt").to(device)
    output_tokens = model.generate(**text_inputs, tgt_lang="spa", generate_speech=False)
    translated_text_from_text = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)

    return translated_text_from_text

In [None]:
### Now let's put everything together in a single function that uses the ones above
messages = []

def all_together(audio):
    global messages
    arr_audio = preprocess_audio(audio)
    query_input = speech_to_text(arr_audio)
    llm_response, last_response = generate_llm_response(query_input, messages)
    spanish_output, english_output = text_to_speech(last_response)
    # text_to_text translates the English response into Spanish
    spanish_text = text_to_text(last_response)

    return llm_response, spanish_output, english_output, spanish_text

Let's bring it into a graphical interface with Gradio 🚀

In [None]:
iface = gr.Interface(
    fn=all_together,
    inputs=
        gr.Audio(type="filepath"),
    outputs=[
        gr.Textbox(label="Chat"),
        gr.Audio(label="Audio in Spanish", autoplay=False),
        gr.Audio(label="Audio in English", autoplay=False),
        gr.Markdown(label="Translation"),
    ],
    title="AI tutor to practice your English",
    description="Record your voice using the microphone"
)

iface.launch(debug=True)


Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://779a565055e800bddf.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


It is strongly recommended to pass the `sampling_rate` argument to this function. Failing to do so can result in silent errors that might be hard to debug.


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://779a565055e800bddf.gradio.live


