<a href="https://colab.research.google.com/github/afzal/MiniTalky/blob/main/chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
from transformers import pipeline

# Load the model
model = pipeline("text-generation", model="distilgpt2")
print(model.model)  # Show the raw model inside the pipeline: the actual distilgpt2 architecture.


Device set to use cuda:0


GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-5): 6 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)


In [5]:
from transformers import pipeline
import torch

# Check if GPU is available
if torch.cuda.is_available():
    print("GPU is available! Using:", torch.cuda.get_device_name(0))
else:
    print("No GPU found, using CPU.")

# Load the model
model = pipeline("text-generation", model="distilgpt2", device=0 if torch.cuda.is_available() else -1)
print(model.model)  # Show the raw model inside the pipeline: the actual distilgpt2 architecture.


GPU is available! Using: Tesla T4


Device set to use cuda:0


GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-5): 6 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)


# **GPU**
Added import torch so the code can check for a GPU.
Added the check torch.cuda.is_available() to confirm whether a GPU is present.
Passed device=0 to pipeline to put the model on the GPU when one is available (0 is the first GPU; -1 means CPU).
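
The device choice can also be wrapped in a small helper. This is only a sketch: the name pick_device is hypothetical, and it assumes the same convention as the cell above (0 for the first GPU, -1 for CPU). It falls back to CPU when torch is not installed.

```python
def pick_device() -> int:
    """Return 0 (first GPU) if CUDA is usable, otherwise -1 (CPU)."""
    try:
        import torch  # optional dependency; fall back to CPU if it is missing
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


print("pipeline device argument:", pick_device())
```

The result could then be passed as the device argument of pipeline.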

In [6]:
!nvidia-smi

Wed Apr 16 18:24:27 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   64C    P0             30W /   70W |     792MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

Tasks gpt2 Can’t Really Do

    These appear among the pipeline task options, but gpt2 was not trained for them, so they will not work well:
        "summarization", "question-answering", "translation", "sentiment-analysis"
            Why? gpt2 is trained only to predict the next token from the text to its left. It was never fine-tuned to summarize, extract answers, translate, or classify text, unlike models built for those jobs (e.g., BERT or BART).
            Example: If you try pipeline("summarization", model="gpt2"), expect an error or nonsense output, because gpt2 is not built for that.
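
The "forward only" behavior can be pictured as a causal attention mask: each position may look at itself and at earlier positions, never at later ones. The sketch below is a toy illustration in plain Python, not the transformers implementation; causal_mask is a hypothetical helper name.

```python
def causal_mask(n):
    """mask[i][j] is True when position i is allowed to attend to position j."""
    return [[j <= i for j in range(n)] for i in range(n)]


tokens = ["The", "cat", "sat"]
for tok, row in zip(tokens, causal_mask(len(tokens))):
    visible = [t for t, ok in zip(tokens, row) if ok]
    print(f"{tok!r} can see {visible}")
```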

In [None]:
# Chat loop
while True:
    prompt = input("You: ")
    result = model(prompt, max_length=50, temperature=0.7, top_k=50)
    print("Bot:", result[0]["generated_text"])

In [None]:


# Chat loop
while True:
    prompt = input("You: ")
    result = model(prompt, max_length=50, temperature=0.7, top_k=40, truncation=True)
    # Remove only the leading copy of the prompt (str.removeprefix needs Python 3.9+)
    reply = result[0]["generated_text"].removeprefix(prompt).strip()
    print("Bot:", reply)

# **truncation=True**

**truncation=True** is a parameter you can pass when calling a transformers pipeline (as in the chatbot code above). Here it is in small, simple steps.
What is truncation=True?

    It’s an option you can add when using a pipeline in transformers.
    It tells the tokenizer how to handle text that’s too long to process.

Step-by-Step Explanation

    Models Have Limits
        Language models like gpt2 or distilgpt2 can only handle a certain number of tokens at once (a token is roughly a word or part of a word).
        For gpt2 and distilgpt2, this limit is 1024 tokens, which matches the (wpe): Embedding(1024, 768) position layer in the model printout above.
    What Happens Without Truncation?
        If your input (like a really long prompt) is more than 1024 tokens, generation typically fails with an error, because the model has no position embeddings past that point.
        Example: You type a 2000-word story as a prompt, and it won’t fit.
    What truncation=True Does
        It says: "If the input is too long, cut it down to fit the limit."
        By default the first tokens are kept and the rest are dropped. The limit is normally the model’s maximum (1024 here); when max_length is also passed, the input may be cut to that value instead, as the truncation warning in a later cell’s output suggests.
    Example
        Prompt: "The cat sat on the mat and then ran to the door and kept running all day..."
        If it’s too long, with truncation=True, it might chop it to: "The cat sat on the mat and then ran to the door"
        The rest gets ignored so the model can process it.
    Why Use It?
        Prevents errors when you accidentally give a huge input.
        Keeps things running smoothly.
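
The effect of truncation can be sketched with a toy tokenizer that splits on spaces. Real tokenizers split text into subword tokens, so the counts will differ; truncate_tokens and the limit of 12 are illustrative assumptions, not library calls.

```python
def truncate_tokens(tokens, limit):
    """Keep only the first `limit` tokens and drop the rest (right-side truncation)."""
    return tokens[:limit]


prompt = "The cat sat on the mat and then ran to the door and kept running all day"
tokens = prompt.split()  # toy tokenizer: one token per word
kept = truncate_tokens(tokens, limit=12)
print(len(tokens), "tokens before,", len(kept), "after")
print(" ".join(kept))
```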


# **top_k=40**
Step-by-Step: What **top_k=40** Does

    Model Predicts Words
        When you type "Hello," the model looks at all possible next words and gives them probabilities:
            "there" (30%)
            "friend" (20%)
            "how" (15%)
            "to" (10%)
            And tons more (like "zebra" at 0.001%).
    Sorts the List
        It ranks them from highest to lowest probability:
            "there" (30%)
            "friend" (20%)
            "how" (15%) ...and so on.
    Picks the Top 40
        With top_k=40, it keeps only the 40 most likely tokens (words or word pieces).
        So, "there," "friend," "how," etc., are in, but super rare ones like "zebra" (way down the list) are ignored.
    Chooses Randomly from Those 40
        It picks one token from those 40, in proportion to their probabilities (rescaled so the 40 add up to 100%).
        Higher probability = more likely, but it’s still a bit random.
    Builds the Reply
        It repeats this for each new token until the text reaches max_length=50 tokens (the prompt counts toward that total) or the model produces its end-of-text token.

Example with Your Chatbot

    You type: "You are"
    Model’s top probabilities might be:
        "a" (40%)
        "so" (20%)
        "very" (15%)
        ...down to the 40th word.
    With top_k=40, it picks from those 40 (not the full list of thousands).
    Possible reply: "Bot: You are a great person to chat with today!"
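
The top-k steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the transformers implementation: top_k_sample is a hypothetical helper, the probabilities are made up, and a real model works over roughly 50,000 tokens rather than five words.

```python
import random


def top_k_sample(probs, k, rng=random):
    """Sample one token from the k most probable entries of `probs`.

    `probs` maps token -> probability. `choices` accepts relative weights,
    so the kept probabilities are effectively renormalized.
    """
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    tokens = [tok for tok, _ in top]
    weights = [p for _, p in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up probabilities for the word after "Hello,"
probs = {"there": 0.30, "friend": 0.20, "how": 0.15, "to": 0.10, "zebra": 0.00001}
rng = random.Random(0)  # fixed seed so runs are repeatable
print([top_k_sample(probs, k=3, rng=rng) for _ in range(5)])
```

With k=3, only "there", "friend", and "how" can ever be chosen; "zebra" is cut before sampling happens.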

# **temperature=0.7**
The Model Picks Words

    When you type "Hello," the model looks at possible next words: "Hi" (30%), "Hey" (20%), "Greetings" (10%), etc.
    Each word has a probability (chance of being picked).

Temperature Changes Choices

    temperature rescales the model’s raw scores before they are turned into probabilities (each score is divided by the temperature):
        High temperature (e.g., 1.5): Flattens the probabilities so words become more equal, and it might pick weird ones (like "Hello banana").
        Low temperature (e.g., 0.3): Sharpens the probabilities, boosting likely words and nearly ignoring unlikely ones (sticks to "Hello there").
        temperature=0.7: A middle ground that mostly picks sensible words but allows some variety.
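
The effect of temperature can be seen with a small softmax sketch: raw scores (logits) are divided by the temperature before being converted to probabilities. The scores below are made up, and softmax_with_temperature is a hypothetical helper, not a transformers function.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate words
for t in (0.3, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring word, while higher temperatures spread it more evenly across the candidates.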

In [None]:
# Chat loop
while True:
    prompt = input("You: ")
    if not prompt.strip():  # If input is empty, skip
        print("Bot: Say something!")
        continue
    full_prompt = f"Human: {prompt} Bot:"  # Add context
    result = model(full_prompt, max_length=50, temperature=0.9, top_k=50, truncation=True)
    #print(result)
    reply = result[0]["generated_text"].removeprefix(full_prompt).strip()  # drop only the leading prompt (Python 3.9+)
    print("Bot:", reply)

In [None]:
while True:
    prompt = input("You: ")
    if prompt.lower() == "quit":
        print("Bot: Bye bye!")
        break
    result = model(prompt, max_length=50, temperature=0.7, top_k=50)
    print("Bot:", result[0]["generated_text"])

In [3]:
# Chat loop
while True:
    prompt = input("You: ")
    # Generate reply
    result = model(prompt, max_length=50, temperature=0.7, top_k=50, truncation=True, return_full_text=False)
    # Show reply without input
    print("Bot:", result[0]["generated_text"])

You: hi


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Bot: 
At the end of the game, you’ll do it. You’ll just go to the next stage and play.




The idea is that you’ll be able to play with your team without
You: how are you


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Bot:  guys?‍





‍







‍



‍



‍
‍


‍
‍
�


KeyboardInterrupt: Interrupted by user

# **return_full_text=False**
Added return_full_text=False to the model call.
This tells the pipeline to leave your input out of the output, so generated_text contains only the newly generated part.
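
In effect, this amounts to dropping the prompt from the front of the generated text. The sketch below illustrates the idea in plain Python and contrasts it with a plain str.replace, which also deletes any later repeat of the prompt. strip_prompt is a hypothetical helper, not how the pipeline is implemented internally.

```python
def strip_prompt(full_text, prompt):
    """Return only the newly generated part when `full_text` starts with `prompt`."""
    if full_text.startswith(prompt):
        return full_text[len(prompt):].strip()
    return full_text.strip()


prompt = "hi"
full_text = "hi there, I said hi to everyone"
print(strip_prompt(full_text, prompt))        # removes only the leading prompt
print(full_text.replace(prompt, "").strip())  # also deletes the later "hi"
```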

In [4]:
from transformers import pipeline

# Load the model and explicitly set pad token
model = pipeline("text-generation", model="distilgpt2")
model.tokenizer.pad_token_id = model.tokenizer.eos_token_id  # Meant to silence the padding warning (it still appears in the output below)

# Chat loop
while True:
    prompt = input("You: ")
    # Generate reply
    result = model(prompt, max_length=60, min_length=10, temperature=0.7, top_k=50, return_full_text=False)
    # Show reply without input
    print("Bot:", result[0]["generated_text"].strip())

Device set to use cpu


You: hi


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Bot: It's not the first time that you've heard of a new feature called “Spybot.” The feature has been popular for years and has been used as a way to help users track the number of times one or more people have ever been caught on the Internet.


KeyboardInterrupt: Interrupted by user

In [5]:
from transformers import pipeline

# Load the model and explicitly set pad token
model = pipeline("text-generation", model="distilgpt2")
model.tokenizer.pad_token_id = model.tokenizer.eos_token_id  # Meant to silence the padding warning (it still appears in the output below)

# Chat loop
while True:
    user_input = input("You: ")
    # Create a friendly system prompt
    prompt = f"You are a friendly chatbot. Respond to: '{user_input}'"
    # Generate reply
    result = model(prompt, max_length=40, min_length=5, temperature=0.7, top_k=50, truncation=True, return_full_text=False)
    # Show reply without input
    print("Bot:", result[0]["generated_text"].strip())

Device set to use cpu


You: Hi


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Bot: and let me know how you have made this a success.


KeyboardInterrupt: Interrupted by user

In [6]:
from transformers import pipeline

# Load the model with explicit padding fix
model = pipeline("text-generation", model="distilgpt2", pad_token_id=50256)

# Chat loop
while True:
    user_input = input("You: ")
    # Simple prompt to focus the model
    prompt = f"Respond as a friendly chatbot: {user_input}"
    # Generate reply
    result = model(prompt, max_length=40, min_length=5, temperature=0.6, top_p=0.9, truncation=True, return_full_text=False)
    # Clean and check reply
    reply = result[0]["generated_text"].strip()
    if len(reply) < 3:  # If reply is too short, use a fallback
        reply = "Hey, I'm here! What's up?"
    print("Bot:", reply)

Device set to use cpu


You: Hi
Bot: , I'm glad to see you guys are going to be doing a great job.
You: how are you
Bot: doing?


KeyboardInterrupt: Interrupted by user

In [7]:
from transformers import pipeline

# Load a better model (gpt2) with padding fix
model = pipeline("text-generation", model="gpt2", pad_token_id=50256)

# Chat loop
while True:
    user_input = input("You: ")
    # Stronger prompt to act like a chatbot
    prompt = f"You are a friendly, casual chatbot named Grok. Answer this in a short, fun way: '{user_input}'"
    # Generate reply
    result = model(prompt, max_length=50, min_length=10, temperature=0.8, top_k=50, truncation=True, return_full_text=False)
    # Clean and check reply
    reply = result[0]["generated_text"].strip()
    if len(reply) < 5 or reply.endswith("?"):  # Catch short or question-like replies
        reply = "Yo, what's good? I'm here to chat!"
    print("Bot:", reply)

Device set to use cpu


You: hi
Bot: [for each person.] and 'go home' [for each person]. The more you answer, the more fun it will be


KeyboardInterrupt: Interrupted by user

In [8]:
from transformers import pipeline

# Load gpt2 with padding fix
model = pipeline("text-generation", model="gpt2", pad_token_id=50256)

# Chat loop
while True:
    user_input = input("You: ")
    # Exit if user types 'quit'
    if user_input.lower().strip() == "quit":
        print("Bot: Bye for now!")
        break
    # Simple, dialogue-focused prompt
    prompt = f"Chatbot: Hey! You said '{user_input}'. Here's my reply: "
    # Generate reply
    result = model(prompt, max_length=50, min_length=10, temperature=0.7, top_k=40, truncation=True, return_full_text=False)
    # Clean and check reply
    reply = result[0]["generated_text"].strip()
    # Filter out bad replies (too short, contains code-like text, or irrelevant)
    if len(reply) < 8 or any(char in reply for char in "[]{}<>()") or reply.count(" ") > 20:
        reply = "Hey, what's up? I'm ready to chat!"
    print("Bot:", reply)

Device set to use cpu


You: hi
Bot: ~~Papa!~~~

RAW Paste Data

//F4M~~~ //F4M~~~ //F4M~~~ //F4
You: how are you
Bot: ಠ_ಠ༽ಠ_ಠ༽ಠ_ಠ༽ಠ_�
You: quit
Bot: Bye for now!


In [9]:
from transformers import pipeline

# Load DialoGPT-small with padding fix
model = pipeline("text-generation", model="microsoft/DialoGPT-small", pad_token_id=50256)

# Chat loop
while True:
    user_input = input("You: ")
    # Exit if user types 'quit'
    if user_input.lower().strip() == "quit":
        print("Bot: Catch ya later!")
        break
    # Simple prompt for dialogue
    prompt = f"{user_input} <|endoftext|> Bot: "
    # Generate reply
    result = model(prompt, max_length=40, min_length=5, temperature=0.7, top_k=50, truncation=True, return_full_text=False)
    # Clean and check reply
    reply = result[0]["generated_text"].strip()
    # Filter out bad replies (too short, odd characters, or code-like)
    if len(reply) < 5 or any(char in reply for char in "~|/<>{}[]()") or reply.count(" ") > 15:
        reply = "Hey! What's good?"
    print("Bot:", reply)

Device set to use cpu


You: hi




Bot: ips amp hl width Not sure if you're serious or not.
You: how are you
Bot: Hey! What's good?
You: quit
Bot: Catch ya later!


In [10]:
from transformers import pipeline

# Load DialoGPT-small with padding fix
model = pipeline("text-generation", model="microsoft/DialoGPT-small", pad_token_id=50256)

# Chat loop
while True:
    user_input = input("You: ")
    # Exit if user types 'quit'
    if user_input.lower().strip() == "quit":
        print("Bot: See ya later!")
        break
    # Clear, dialogue-focused prompt
    prompt = f"User: {user_input} <|endoftext|> Bot: A friendly, casual reply: "
    # Generate reply with sampling enabled
    result = model(prompt, max_length=40, min_length=8, temperature=0.7, top_k=50, do_sample=True, truncation=True, return_full_text=False)
    # Clean and check reply
    reply = result[0]["generated_text"].strip()
    # Filter bad replies (too short, odd characters, or web-like text)
    if len(reply) < 6 or any(char in reply for char in "~|/<>{}[]()&") or any(word in reply.lower() for word in ["amp", "hl", "width"]):
        reply = "Yo, what's up? I'm here!"
    print("Bot:", reply)

Device set to use cpu


You: hi
Bot: irl User : The bot replies to user replies Bot : A friendly bot bot
You: how re you
Bot: Yo, what's up? I'm here!
You: i am good
Bot: Yo, what's up? I'm here!
You: i am learning llm
Bot: Yo, what's up? I'm here!


KeyboardInterrupt: Interrupted by user

In [14]:
from transformers import pipeline

# Load DialoGPT-small with padding fix
model = pipeline("text-generation", model="microsoft/DialoGPT-small", pad_token_id=50256)

# Chat loop
while True:
    user_input = input("You: ")
    # Exit if user types 'quit'
    if user_input.lower().strip() == "quit":
        print("Bot: Catch you later!")
        break
    # Simple, conversational prompt
    prompt = f"{user_input} <|endoftext|> "
    # Generate reply
    result = model(prompt, max_length=40, min_length=8, temperature=0.8, top_k=50, do_sample=True, truncation=True, return_full_text=False)
    # Clean reply
    reply = result[0]["generated_text"].strip()
    # Remove special characters
    reply = "".join(c for c in reply if c.isalnum() or c in " !?.,'").strip()
    # Filter bad replies (too short, gibberish, or prompt-like)
    words = reply.split()
    if len(reply) < 8 or len(words) < 3 or any(char in reply for char in "~|/<>{}[]()&") or any(word.lower() in ["user", "bot", "endoftext"] for word in words) or any(words.count(w) > 2 for w in words):
        reply = "Hey, what's up? Ready to chat!"
    print("Bot:", reply)

Device set to use cpu


You: Hi
Bot: ikr its crazy
You: how are you
Bot: Hey, what's up? Ready to chat!
You: lets play football
Bot: ive not even seen a single ficken commercial.


KeyboardInterrupt: Interrupted by user
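
The reply filter in the last cell can be pulled into a small helper so it can be tested without loading a model. This is a sketch under stated assumptions: clean_reply and the constant names are hypothetical, and the special-character check is omitted because the cleaning step has already removed those characters.

```python
FALLBACK = "Hey, what's up? Ready to chat!"
ALLOWED_PUNCTUATION = " !?.,'"
BANNED_WORDS = {"user", "bot", "endoftext"}


def clean_reply(raw):
    """Keep letters, digits and basic punctuation, then apply the quality checks."""
    reply = "".join(c for c in raw if c.isalnum() or c in ALLOWED_PUNCTUATION).strip()
    words = reply.split()
    too_short = len(reply) < 8 or len(words) < 3
    echoes_prompt = any(w.lower() in BANNED_WORDS for w in words)
    repetitive = any(words.count(w) > 2 for w in words)
    return FALLBACK if (too_short or echoes_prompt or repetitive) else reply


print(clean_reply("ikr its crazy"))
print(clean_reply("ok"))
print(clean_reply("User : The bot replies"))
```

Inside the chat loop, the cleaning and filtering lines could then be replaced by a single call on result[0]["generated_text"].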