<a href="https://colab.research.google.com/github/RickBarretto/llm-playground/blob/main/notebooks/tucano-2b4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tucano Gradio Chat Demo 🦜

Here is an example of how to build a chat UI (user interface) using the [Tucano](https://huggingface.co/TucanoBR) models and the [Gradio](https://www.gradio.app/) library! 🚀

In [1]:
# Before we start, we need to install `gradio`
!pip install -q gradio

import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import StoppingCriteria, StoppingCriteriaList, TextIteratorStreamer
from threading import Thread

# First, download the model and tokenizer from the Hugging Face Hub
model_id = "TucanoBR/Tucano-2b4-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="sdpa" if torch.cuda.is_available() else "eager"
)

# Use the GPU if one is available in our environment, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)


In [3]:
# Define a stopping criterion so that token generation ends when the "</s>" token
# is produced
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop_ids = [2]  # 2 is the ID of our EOS token (i.e., </s>).
        for stop_id in stop_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

# Function that generates text, streaming the reply as it is produced
def predict(message, history):
    stop = StopOnTokens()
    message = "<instruction>" + message + "</instruction>"
    model_inputs = tokenizer([message], return_tensors="pt").to(device)
    streamer = TextIteratorStreamer(tokenizer, timeout=10., skip_prompt=True, skip_special_tokens=True)
    generate_kwargs = dict(
        model_inputs,
        streamer=streamer,
        repetition_penalty=1.2,
        max_new_tokens=1024,
        do_sample=True,
        top_p=1.,
        top_k=50,
        temperature=0.1,
        num_beams=1,
        stopping_criteria=StoppingCriteriaList([stop])
    )
    t = Thread(target=model.generate, kwargs=generate_kwargs)
    t.start()  # Start generation in a separate thread
    partial_message = ""
    for new_token in streamer:
        partial_message += new_token
        if '</s>' in partial_message:  # Break out of the loop if the EOS token is produced
            break
        yield partial_message

# Create the interface for our Gradio chat app.
gr.ChatInterface(predict,
                 title="Tucano 🦜",
                 description="Faça uma pergunta para o Tucano",
                 examples=['Qual a capital do Rio Grande do Sul?', 'Invente uma história sobre um garoto chamado Pelé.']
                 ).launch()  # Done! 🚀

It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://2d01e5c4c2312ba23d.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
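
The streaming loop in `predict` can be tried without downloading the model. The sketch below is a minimal dry run using only the standard library: a background thread pushes text pieces into a queue while a generator yields the growing reply and stops at `</s>`. The names `wrap_instruction`, `fake_generate`, and `stream_reply` are hypothetical helpers introduced here for illustration. `fake_generate` is a stand-in for `model.generate`, and the queue is a stand-in for `TextIteratorStreamer`, not its actual implementation.

```python
import queue
import threading
from typing import Iterator, List

EOS = "</s>"      # end-of-sequence marker, as in the notebook
_DONE = object()  # sentinel meaning "the producer has finished"


def wrap_instruction(message: str) -> str:
    """Wrap a user message in the instruction tags used by predict()."""
    return "<instruction>" + message + "</instruction>"


def fake_generate(pieces: List[str], out: "queue.Queue") -> None:
    """Hypothetical stand-in for model.generate: emit pieces, then the sentinel."""
    for piece in pieces:
        out.put(piece)
    out.put(_DONE)


def stream_reply(pieces: List[str], timeout: float = 10.0) -> Iterator[str]:
    """Yield the growing reply, stopping as soon as the EOS marker appears."""
    out: "queue.Queue" = queue.Queue()
    worker = threading.Thread(target=fake_generate, args=(pieces, out))
    worker.start()  # the producer runs in the background
    partial = ""
    while True:
        # Raises queue.Empty if the producer stalls for longer than `timeout`
        piece = out.get(timeout=timeout)
        if piece is _DONE:
            break
        partial += piece
        if EOS in partial:  # same cut-off check as in predict()
            break
        yield partial
    worker.join()


for partial in stream_reply(["Porto", " Alegre", "</s>"]):
    print(partial)
# prints "Porto", then "Porto Alegre"
```

Because `stream_reply` is a generator that yields progressively longer strings, it has the same shape as `predict`. This is why Gradio can update the chat bubble while the model is still generating.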


