<a href="https://colab.research.google.com/github/DevilNReality/Qwen2_Powered_Chatbot/blob/main/Code%20File%20/%20Qwen2_Powered_ChatBot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **🔧 Section 1: Setup & Dependencies**

In [1]:
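# Install required libraries
# accelerate is required by device_map="auto"; gradio is used for the UI in Section 4
!pip install transformers datasets accelerate gradio --quiet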

In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Define the pre-trained model name
model_name = "Qwen/Qwen2-0.5B-Instruct"

# Load the tokenizer for the specified model
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the pre-trained causal language model
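# device_map="auto" places the model on the available devices (for example a GPU); it relies on the accelerate package
# trust_remote_code=True is only needed for models that ship custom code on the Hub;
# Qwen2 is supported natively by recent transformers releases, so it is optional here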
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    trust_remote_code=True
)

# Set the padding token to be the same as the end-of-sequence token
# This is important for batch processing in training and inference
tokenizer.pad_token = tokenizer.eos_token


# **🧩 Section 2: System Prompt & History Initialization**

In [3]:
# Initialize the chat history with a system prompt and an initial assistant message
# This sets the persona and starts the conversation
chat_history = [
    {"role": "system", "content": "You are a helpful, friendly AI assistant. Respond conversationally and clearly."},
    {"role": "assistant", "content": "Hi there! I'm your friendly AI assistant. How can I help you today?"}
]
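
Qwen2-Instruct models are trained on ChatML-style turns, where each message is wrapped in `<|im_start|>` and `<|im_end|>` markers. The sketch below shows roughly what a list of role/content messages like `chat_history` looks like once rendered. `render_chatml` and `demo_history` are hypothetical names used only for illustration; in real code the tokenizer's own `apply_chat_template` method should be preferred, since the authoritative template ships with the tokenizer.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Hypothetical helper: render role/content dicts as ChatML-style text."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # An open assistant turn tells the model to write the next reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Illustrative messages, not the notebook's actual history
demo_history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

rendered = render_chatml(demo_history)
print(rendered)
```

The trailing open assistant turn is what makes the model continue with a reply instead of inventing another user message.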

# **🧩 Section 3: Response Generation Logic**

In [4]:
def generate_response(user_input, model, tokenizer, max_new_tokens=512):
    # Add the user's input to the chat history
    chat_history.append({"role": "user", "content": user_input})

    # Build the prompt string from the chat history with the tokenizer's own chat template
    # Qwen2-Instruct expects ChatML markers (<|im_start|>role ... <|im_end|>), not <|user|>/<|assistant|>
    # add_generation_prompt=True appends the opening assistant marker so the model writes the next reply
    prompt = tokenizer.apply_chat_template(
        chat_history,
        tokenize=False,
        add_generation_prompt=True
    )

    # Tokenize the prompt and move it to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Generate a response from the model
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,  # Cap the reply length only (max_length would also count the prompt tokens)
        pad_token_id=tokenizer.eos_token_id, # Use the end-of-sequence token for padding
        do_sample=True,         # Enable sampling for more diverse responses
        top_k=50,               # Consider the top 50 most likely tokens
        top_p=0.95,             # Nucleus sampling: keep the smallest set of tokens whose probabilities sum to 95%
        temperature=0.7         # Control the randomness of the output (lower means less random)
    )

    # Keep only the newly generated tokens (everything after the prompt) and decode them into text
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    reply = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

    # Add the generated assistant reply to the chat history
    chat_history.append({"role": "assistant", "content": reply})

    # Return the assistant's reply
    return reply
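
The three sampling parameters passed to `model.generate` interact in sequence: temperature rescales the logits, top-k keeps the most likely candidates, and top-p trims that set to the smallest group covering the requested probability mass. The following is a simplified pure-Python illustration on a toy distribution, not the library's implementation; `filter_top_k_top_p` and `toy_logits` are hypothetical names.

```python
import math


def filter_top_k_top_p(logits, top_k=50, top_p=0.95, temperature=0.7):
    """Return {token_index: probability} for tokens that survive filtering."""
    # Temperature: dividing by a value below 1 sharpens the distribution
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    weights = [math.exp(value - peak) for value in scaled]

    # Top-k: keep only the k highest-weighted tokens, then renormalise
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:top_k]
    total = sum(weights[i] for i in ranked)
    probs = {i: weights[i] / total for i in ranked}

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise so the surviving probabilities sum to 1
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [4.0, 3.0, 1.0, 0.5, -2.0]
candidates = filter_top_k_top_p(toy_logits, top_k=3, top_p=0.95)
print(candidates)
```

The next token would then be drawn at random from `candidates`, weighted by the remaining probabilities.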

# **💬 Section 4: Gradio UI Integration**

In [5]:
import gradio as gr

# Create a Gradio chatbot component initialized with the chat history
chatbot = gr.Chatbot(value=chat_history, type='messages')

# Create a Gradio ChatInterface
gr.ChatInterface(
    fn=lambda user_input, history: generate_response(user_input, model, tokenizer), # The function to call when the user sends a message
    title="Chat with AI", # Title of the chat interface
    description="A conversational assistant powered by Qwen2-0.5B-Instruct", # Description of the chat interface
    chatbot=chatbot # Link the ChatInterface to the chatbot component
).launch(debug = True) # Launch the Gradio interface, debug=True provides detailed logs



# **🧩 Section 5: Instruction Tuning Dataset Prep**

In [None]:
import pandas as pd

# Define the training data as a dictionary
data = {
    "input": [
        "Summarize the concept of photosynthesis.",
        "Translate 'Good morning' to French.",
        "List three uses of artificial intelligence."
    ],
    "output": [
        "Photosynthesis is the process by which green plants convert sunlight, carbon dioxide, and water into energy, releasing oxygen as a byproduct.",
        "Bonjour",
        "1. Personalized recommendations in apps and websites\n2. Autonomous vehicles and robotics\n3. Fraud detection in banking and finance"
    ]
}

# Create a pandas DataFrame from the data
df = pd.DataFrame(data)

# Save the DataFrame to a CSV file named "train_data.csv"
# index=False prevents writing the DataFrame index as a column
df.to_csv("train_data.csv", index=False)

In [None]:
from datasets import load_dataset

# Load the dataset from the CSV file
# "csv" specifies the format, data_files points to the file, and split="train" loads the training split
dataset = load_dataset("csv", data_files="train_data.csv", split="train")

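# Define a function to format each example in the dataset
# It takes an example (a row from the dataset) and renders it as one user/assistant exchange
# using the tokenizer's chat template, so the training text carries the ChatML markers
# that Qwen2-Instruct expects (this reuses the tokenizer loaded in Section 1)
def format_example(example):
    messages = [
        {"role": "user", "content": example["input"]},
        {"role": "assistant", "content": example["output"]}
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}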

# Apply the formatting function to each example in the dataset
# .map() applies the function to each element and returns a new dataset
formatted_dataset = dataset.map(format_example)


In [None]:
from transformers import AutoTokenizer

# Load the tokenizer again for processing the fine-tuning dataset
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# Set the padding token to the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token

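# Define a function to tokenize the formatted examples
# With batched=True it receives a batch (a dict of lists) whose 'text' field holds the formatted strings
# and converts the text into token IDs that the model understands
# padding="max_length" pads every sequence to max_length tokens
# truncation=True cuts off sequences longer than max_length
# max_length=256 is set explicitly: without it, padding targets the tokenizer's model_max_length,
# which is far longer than these short examples need
# labels copy input_ids with padding positions set to -100 so the loss ignores them;
# Trainer needs this field to compute the causal language modeling loss
def tokenize(batch):
    encoded = tokenizer(batch["text"], padding="max_length", truncation=True, max_length=256)
    encoded["labels"] = [
        [token if keep else -100 for token, keep in zip(ids, mask)]
        for ids, mask in zip(encoded["input_ids"], encoded["attention_mask"])
    ]
    return encoded

# Apply the tokenization function to the formatted dataset
# batched=True processes examples in batches, which is more efficient
tokenized_dataset = formatted_dataset.map(tokenize, batched=True)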
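
Because every sequence is padded to a fixed length, most positions in a short example are padding. For causal language model fine-tuning, the training targets are usually the input IDs themselves, with padded positions replaced by `-100`, the value that PyTorch's cross-entropy loss ignores by default. A minimal sketch of that masking on made-up token IDs (`mask_padding`, `toy_ids`, and `toy_mask` are hypothetical names):

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_padding(input_ids, attention_mask):
    """Copy input_ids into labels, hiding padded positions from the loss."""
    return [
        token if keep == 1 else IGNORE_INDEX
        for token, keep in zip(input_ids, attention_mask)
    ]


# Made-up IDs: three real tokens followed by two padding positions
toy_ids = [101, 102, 103, 0, 0]
toy_mask = [1, 1, 1, 0, 0]

toy_labels = mask_padding(toy_ids, toy_mask)
print(toy_labels)
```

Masking by the attention mask rather than by token ID matters here: the padding token is the same as the end-of-sequence token, so masking by ID would also hide any real end-of-sequence token from the loss.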


# **🧩 Section 6: Fine-Tuning Loop**

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer

# Define the model name again
model_name = "Qwen/Qwen2-0.5B-Instruct"

# Load the pre-trained model for fine-tuning
model = AutoModelForCausalLM.from_pretrained(model_name)

# Load the tokenizer again
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Set the padding token
tokenizer.pad_token = tokenizer.eos_token

In [None]:
# Define the training arguments for the Trainer
training_args = TrainingArguments(
    output_dir="./qwen2-instruct-tuned",  # Directory to save model checkpoints and outputs
    per_device_train_batch_size=2, # Batch size per GPU/device during training
    num_train_epochs=3,           # Number of training epochs
    logging_steps=10,             # Log training progress every X steps
    save_steps=50,                # Save a model checkpoint every X steps
    save_total_limit=2,           # Limit the total number of saved checkpoints
    weight_decay=0.01,            # Apply weight decay to prevent overfitting
    fp16=True,                    # Use mixed precision training (requires a CUDA GPU)
    report_to="none"              # Do not report training metrics to external services
)
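
It helps to check how many optimizer steps these settings actually produce, since `logging_steps` and `save_steps` are measured in steps. The arithmetic below assumes a single device and no gradient accumulation (neither is changed in the arguments above); `count_optimizer_steps` is a hypothetical helper, not part of transformers.

```python
import math


def count_optimizer_steps(num_examples, batch_size, num_epochs):
    """Steps for one device with no gradient accumulation (an assumption)."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs


# 3 training examples, per_device_train_batch_size=2, num_train_epochs=3
total_steps = count_optimizer_steps(num_examples=3, batch_size=2, num_epochs=3)

logging_steps = 10
save_steps = 50

print(f"total optimizer steps: {total_steps}")
print(f"reaches logging_steps: {total_steps >= logging_steps}")
print(f"reaches save_steps: {total_steps >= save_steps}")
```

With only three examples the run finishes in 6 steps, so neither the step-10 log nor the step-50 checkpoint interval is reached; smaller values would be needed to see the loss during training.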

In [None]:
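# Initialize the Trainer
# The Trainer is a class that simplifies the training loop for 🤗 Transformers models
trainer = Trainer(
    model=model,               # The model to train
    args=training_args,        # The training arguments defined above
    train_dataset=tokenized_dataset, # The tokenized training dataset
    processing_class=tokenizer # The tokenizer, saved alongside the model (older transformers releases call this argument tokenizer)
)

# Start the training process
trainer.train()

# Save the final weights and tokenizer into output_dir so Section 7 can load them from there
# (trainer.train() only writes checkpoints into checkpoint-* subfolders)
trainer.save_model(training_args.output_dir)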



# **🧩 Section 7: Evaluation**

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define the path to the fine-tuned model
model_path = "./qwen2-instruct-tuned"

# Load the fine-tuned model from the specified path
model = AutoModelForCausalLM.from_pretrained(model_path)

# Load the tokenizer associated with the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Set the padding token for the loaded tokenizer
tokenizer.pad_token = tokenizer.eos_token

In [None]:
# Define a function to test the fine-tuned model with a given prompt
def test_prompt(prompt, max_new_tokens=512):
    # Format the user prompt with the tokenizer's chat template (ChatML markers for Qwen2-Instruct)
    # add_generation_prompt=True appends the opening assistant marker
    formatted = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=False,
        add_generation_prompt=True
    )

    # Tokenize the formatted prompt and move it to the model's device
    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)

    # Generate a response using the fine-tuned model
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens, # Cap the reply length only (max_length would also count the prompt tokens)
        pad_token_id=tokenizer.eos_token_id, # Use the end-of-sequence token for padding
        do_sample=True,         # Enable sampling
        top_k=50,               # Consider top_k tokens
        top_p=0.95,             # Use top_p sampling
        temperature=0.7         # Set the temperature
    )

    # Decode only the newly generated tokens (everything after the prompt) to get the assistant's reply
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    reply = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

    # Return the generated reply
    return reply

In [None]:
# Test the fine-tuned model with example prompts
print(test_prompt("Translate 'Good night' to French."))
print(test_prompt("Give me three uses of AI."))
print(test_prompt("Summarize the concept of gravity."))
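
Since generation uses sampling, the printed replies vary from run to run, so comparing them to a reference by exact string match is brittle. One lightweight option is a normalized containment check, sketched below. `normalize`, `contains_expected`, and the sample reply are all hypothetical; the reply string is made up for illustration and is not real model output.

```python
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def contains_expected(reply, expected):
    """True if the normalized expected answer appears inside the normalized reply."""
    return normalize(expected) in normalize(reply)


# Made-up reply text standing in for test_prompt(...) output
sample_reply = "Sure! 'Good night' in French is: Bonne nuit."

match = contains_expected(sample_reply, "Bonne nuit")
mismatch = contains_expected("Bonjour", "Bonne nuit")
print(match, mismatch)
```

A check like this only catches whether a key phrase is present; it says nothing about fluency or about answers that are correct but worded differently.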