<a href="https://colab.research.google.com/github/LoniQin/lifelong-ml/blob/main/Chatbot_with_Qwen2_and_Gradio.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chatbot with Qwen2 and Gradio

## Introduction

In this notebook, we build a chatbot using Qwen2, an open-weight language model from Alibaba Cloud's Qwen team that we load through the Hugging Face Transformers library, and Gradio for the web interface. Qwen2 generates text responses conditioned on the messages it receives. Gradio wraps a Python function in an interactive web app, which makes it a convenient way to put a chat interface in front of the model.


## Setup and Installation

1. Install the required libraries: Gradio for the interface and Accelerate for device placement. Transformers and PyTorch are assumed to be available already, as they are in a standard Colab runtime.
2. Load the Qwen2 model: use the Transformers library to download the instruction-tuned Qwen2 checkpoint and its tokenizer from the Hugging Face Hub.

Install the required libraries. The cell below installs Gradio and Accelerate; the import cell after it disables two attention kernels and prints the Transformers version in use.

In [None]:
!pip install -q accelerate gradio

In [1]:
import os
import pickle

import numpy as np
import pandas as pd
import torch
import transformers
from tqdm.notebook import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer

# Turn off the memory-efficient and flash scaled-dot-product attention
# kernels so PyTorch uses one of its other attention implementations.
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

print(f"Version: {transformers.__version__}")

Version: 4.41.2
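
The cell above reports only the Transformers version. As a minimal sketch, the same check can be extended to the other packages this notebook relies on by using the standard library's `importlib.metadata`. The helper name `installed_version` and the package list are assumptions made for illustration, not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package list assumed from the install and import cells of this notebook.
for name in ["transformers", "torch", "accelerate", "gradio"]:
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Recording these versions next to the results makes it easier to reproduce the notebook later, since Gradio and Transformers both change their APIs between releases.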


Load the Qwen2 model. The cell below downloads the `Qwen/Qwen2-7B-Instruct` checkpoint and its tokenizer from the Hugging Face Hub, loads the weights in bfloat16, and lets `device_map="auto"` place them on the available hardware.

In [2]:
device = "cuda"  # device for the input tensors; device_map="auto" places the model weights

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


model.safetensors.index.json:   0%|          | 0.00/27.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/4 [00:00<?, ?it/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/3.95G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/3.86G [00:00<?, ?B/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/3.86G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/3.56G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/243 [00:00<?, ?B/s]
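
The shard sizes in the download log above give a rough idea of how much accelerator memory the weights need. The sketch below is back-of-the-envelope arithmetic only: it assumes the progress bars report decimal gigabytes and that the checkpoint stores each weight in 16 bits, and it ignores activations and the key-value cache, which need additional memory during generation.

```python
# Shard sizes in GB, copied from the download log above.
shard_sizes_gb = [3.95, 3.86, 3.86, 3.56]

# Assumption: weights are stored in a 16-bit format (2 bytes per parameter).
BYTES_PER_PARAM = 2

total_gb = sum(shard_sizes_gb)
approx_params_billion = total_gb / BYTES_PER_PARAM

print(f"Checkpoint size: {total_gb:.2f} GB")
print(f"Implied parameter count: ~{approx_params_billion:.1f}B")
```

Comparing the checkpoint size against the memory of your GPU before loading helps explain out-of-memory errors or slow loading when `device_map="auto"` has to spill weights elsewhere.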


The next cell generates a response from the model for a single prompt. It first builds the conversation as a list of messages: a system message that sets the assistant's behavior and a user message that carries the prompt. `tokenizer.apply_chat_template` renders these messages into the prompt format the model expects; `tokenize=False` returns text rather than token IDs, and `add_generation_prompt=True` appends the marker that opens the assistant's turn. The tokenizer then converts that text into PyTorch tensors, which are moved to the target device.

`model.generate` produces at most 512 new tokens. Because its output contains the prompt tokens followed by the newly generated ones, the list comprehension slices off the prompt so that only the reply remains. Finally, `tokenizer.batch_decode` converts the reply tokens back to text, dropping special tokens, and the result is printed.
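
To make the chat-template step concrete, here is a minimal sketch of the kind of text it produces. Qwen2 uses a ChatML-style layout with `<|im_start|>` and `<|im_end|>` markers; the function `render_chatml` below is a hypothetical, simplified stand-in written for illustration. The real template ships with the tokenizer and may differ in details, so use `tokenizer.apply_chat_template` in actual code.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate the ChatML-style prompt text for a list of chat messages."""
    parts = []
    for m in messages:
        # Each turn is wrapped in start/end markers and labeled with its role.
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from this point.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language model."},
]
print(render_chatml(messages))
```

The trailing assistant marker is why `add_generation_prompt=True` matters: without it, the model has no cue that it should now write the assistant's reply.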

In [6]:
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"response:{response}")

response:A large language model is a type of artificial intelligence that has been trained on massive amounts of text data to understand and generate human-like language. These models are designed to be able to perform various natural language processing tasks, such as language translation, text summarization, question answering, text generation, sentiment analysis, and more.

These models typically consist of multiple layers of interconnected nodes that process and transform input data into output predictions. The training process involves feeding the model vast amounts of text data, allowing it to learn patterns and relationships within the language.

The most prominent example of a large language model is the GPT (Generative Pre-trained Transformer, which was developed by the AI research lab OpenAI. However, there are other models like BERT, T5, and many others from different organizations and companies. 

The key advantage of these models is their ability to generalize and adapt to
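
The list comprehension in the cell above removes the prompt from the generated sequence before decoding. The sketch below shows the same slicing on plain Python lists; the token IDs are made-up values standing in for real tensors.

```python
# Toy token IDs; in the notebook these are tensors from the tokenizer and model.
input_ids_batch = [[101, 7, 8, 9]]               # prompt tokens
output_ids_batch = [[101, 7, 8, 9, 42, 43, 44]]  # prompt followed by new tokens

# Keep only what comes after the prompt in each sequence.
new_tokens = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
]
print(new_tokens)  # [[42, 43, 44]]
```

Without this step, decoding would return the rendered prompt together with the reply.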

## Building the Chatbot

Next, we implement a Python function that talks to Qwen2: it rebuilds the conversation from the chat history, appends the new user message, and generates a reply. We then pass that function to Gradio's `ChatInterface`, which provides the input box and message display and serves the chatbot in a web browser.
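
The history handling can be tested on its own, without loading the model. The sketch below converts Gradio-style history into the message list that the chat template expects, with earlier model replies under the `assistant` role. It accepts history as (user, assistant) pairs, which is what this notebook's chat function receives, and also as role/content dictionaries, which is an assumption about how newer Gradio releases can pass history. The function name `history_to_messages` and the optional `system_prompt` parameter are hypothetical.

```python
def history_to_messages(history, message, system_prompt=None):
    """Build a chat-template message list from chat history and a new message."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for item in history:
        if isinstance(item, dict):
            # Already a role/content dictionary; copy the two fields we need.
            messages.append({"role": item["role"], "content": item["content"]})
        else:
            # A (user message, assistant reply) pair.
            user_text, assistant_text = item
            messages.append({"role": "user", "content": user_text})
            if assistant_text is not None:
                messages.append({"role": "assistant", "content": assistant_text})
    # The new user message always comes last.
    messages.append({"role": "user", "content": message})
    return messages


history = [["hello", "Hi! How can I help?"]]
print(history_to_messages(history, "What is Gradio?"))
```

Keeping this conversion separate from generation makes it easy to check that roles alternate correctly before any tokens are produced.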

In [38]:
import gradio as gr

def chat(message, history):
    # Rebuild the conversation from Gradio's history, a list of
    # (user message, assistant reply) pairs.
    messages = []
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        # Earlier model replies belong to the "assistant" role, not "system".
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": message})

    # Render the conversation with the model's chat template.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)

    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=512
    )
    # Drop the prompt tokens so only the newly generated reply is decoded.
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

demo = gr.ChatInterface(fn=chat, examples=["hello"], title="ChatBot")
demo.launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://59d4b9daccbfc8ef3f.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




Close the demo server when you no longer need it.

In [33]:
demo.close()

Closing server running on port: 7862


## Conclusions
In this notebook, we built a working chatbot powered by Qwen2 and served it through a Gradio web interface. Along the way, we loaded the model with Transformers, formatted conversations with the tokenizer's chat template, and wrapped generation in a function that Gradio's `ChatInterface` can call.
