<a href="https://colab.research.google.com/github/afzal/MiniTalky/blob/main/chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
from transformers import pipeline

# Load the model
model = pipeline("text-generation", model="distilgpt2")
print(model.model)  # Show the raw model inside the pipeline: the actual distilgpt2 architecture.


Tasks gpt2 Can’t Really Do

    These tasks appear in the pipeline options, but gpt2 and distilgpt2 are not trained for them, so they won’t work well:
        "summarization", "question-answering", "translation", "sentiment-analysis"
            Why? gpt2 is a left-to-right text generator: each token is predicted only from the tokens before it. It is not an encoder model like BERT, which reads context in both directions, nor an encoder-decoder like BART, which is trained to map an input text to an output text.
            Example: If you try pipeline("summarization", model="gpt2"), expect an error or unusable output, because gpt2 isn’t built for that task.
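
To make "generates text forward" concrete, here is a minimal sketch of the visibility rule a left-to-right model follows. The helper name causal_mask is made up for illustration and is not part of transformers; it only shows which earlier positions each token is allowed to look at.

```python
def causal_mask(n: int) -> list[list[int]]:
    """Row i marks which positions token i may look at (1 = visible, 0 = hidden)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


# Toy word-level "tokens"; real GPT-2 tokens are subword pieces.
tokens = ["The", "cat", "sat"]
for tok, row in zip(tokens, causal_mask(len(tokens))):
    print(f"{tok:>4}", row)
```

Each row only has 1s at or before its own position, so "cat" can see "The" but never "sat". A bidirectional model like BERT would have 1s everywhere.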

In [None]:
# Chat loop
while True:
    prompt = input("You: ")
    result = model(prompt, max_length=50, temperature=0.7, top_k=50)
    print("Bot:", result[0]["generated_text"])

In [None]:


# Chat loop
while True:
    prompt = input("You: ")
    result = model(prompt, max_length=50, temperature=0.7, top_k=40, truncation=True)
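    # Remove the prompt only from the start of the reply
    # (str.replace would also delete it wherever it reappears later in the text)
    text = result[0]["generated_text"]
    reply = text[len(prompt):].strip() if text.startswith(prompt) else text.strip()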
    print("Bot:", reply)
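
A small sketch of why it matters how the prompt is removed. The generated string below is made up (it is not real model output), and strip_prompt is a hypothetical helper, not a transformers function. It compares str.replace, which deletes every occurrence of the prompt, with removing it only from the start.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Drop the prompt only when it is the leading part of the generated text."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()


prompt = "hi"
generated = "hi there, this is a reply"  # made-up example text

# str.replace removes "hi" everywhere, including inside the word "this".
print(generated.replace(prompt, "").strip())
# Prefix removal leaves the rest of the text untouched.
print(strip_prompt(generated, prompt))
```

Short prompts are the risky case, since a short string is more likely to reappear inside the reply.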

# **truncation=True**

**truncation=True** is a parameter passed to the pipeline call in our chatbot code. The steps below explain what it does.
What is truncation=True?

    It’s an option you can add when calling a pipeline in transformers.
    It tells the tokenizer how to handle input text that’s too long for the model to process.

Step-by-Step Explanation

    Models Have Limits
        Language models like gpt2 or distilgpt2 can only handle a fixed number of tokens at once (a token is roughly a word or part of a word).
        For gpt2 and distilgpt2, this limit is 1024 tokens.
    What Happens Without Truncation?
        If your input (like a really long prompt) is more than 1024 tokens, the model cannot process it and the call typically fails with an error.
        Example: You paste a 2000-word story as a prompt. That is well over 1024 tokens, so it won’t work.
    What truncation=True Does
        It says: "If the input is too long, cut it down to fit the model’s limit."
        By default the tokenizer keeps only the first 1024 tokens (or whatever the limit is) and drops the rest.
    Example
        Prompt: "The cat sat on the mat and then ran to the door and kept running all day..."
        If this were too long (pretend the limit is much smaller than 1024), truncation=True might chop it to: "The cat sat on the mat and then ran to the door"
        The rest gets dropped so the model can process what remains.
    Why Use It?
        Prevents errors when you accidentally give a huge input.
        Keeps things running smoothly.
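
The cutting step can be sketched in a few lines. This is only an illustration under two assumptions: splitting on spaces stands in for the real tokenizer (GPT-2 actually uses subword tokens), and TOY_LIMIT is a made-up small limit so the effect is visible. truncate_tokens is a hypothetical helper, not a transformers function.

```python
def truncate_tokens(tokens: list[str], limit: int) -> list[str]:
    """Keep the first `limit` tokens and drop everything after them."""
    return tokens[:limit]


prompt = "The cat sat on the mat and then ran to the door and kept running all day"
tokens = prompt.split()  # stand-in tokenizer: one token per word
TOY_LIMIT = 12           # hypothetical limit; gpt2 and distilgpt2 use 1024

kept = truncate_tokens(tokens, TOY_LIMIT)
print(len(tokens), "tokens in,", len(kept), "tokens kept")
print(" ".join(kept))
```

Everything after the twelfth word is simply discarded, which is what happens to the tail of an over-long prompt.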


# **top_k=40**
Step-by-Step: What **top_k=40** Does

    Model Predicts Words
        When you type "Hello," the model looks at all possible next words and gives them probabilities:
            "there" (30%)
            "friend" (20%)
            "how" (15%)
            "to" (10%)
            And tons more (like "zebra" at 0.001%).
    Sorts the List
        It ranks them from highest to lowest probability:
            "there" (30%)
            "friend" (20%)
            "how" (15%) ...and so on.
    Picks the Top 40
        With top_k=40, it takes only the top 40 most likely words.
        So, "there," "friend," "how," etc., are in, but super rare ones like "zebra" (way down the list) are ignored.
    Chooses Randomly from Those 40
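        It rescales the probabilities of those 40 so they add up to 100%, then picks one word at random according to them.
        Higher probability = more likely, but it’s still a bit random.
    Builds the Reply
        It repeats this for each word until the text reaches max_length=50 tokens (the prompt counts toward that total) or the model emits its end-of-text token.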
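
The steps above can be sketched in plain Python. This is a toy illustration, not how transformers implements it internally: the probabilities are the made-up numbers from the example, top_k_sample is a hypothetical helper, and k=3 is used instead of 40 so the cut-off is visible on a five-word list.

```python
import random


def top_k_sample(probs: dict[str, float], k: int, rng: random.Random) -> str:
    """Keep the k most likely words, then sample one in proportion to its probability."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:k]
    words = [word for word, _ in kept]
    weights = [p for _, p in kept]
    # random.choices treats weights as relative, so they need not sum to 1.
    return rng.choices(words, weights=weights, k=1)[0]


toy_probs = {"there": 0.30, "friend": 0.20, "how": 0.15, "to": 0.10, "zebra": 0.00001}
rng = random.Random(0)  # fixed seed so repeated runs behave the same
samples = [top_k_sample(toy_probs, k=3, rng=rng) for _ in range(1000)]
print({word: samples.count(word) for word in toy_probs})
```

With k=3, only "there", "friend", and "how" can ever be picked; "to" and "zebra" get a count of zero no matter how many samples are drawn.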

Example with Your Chatbot

    You type: "You are"
    Model’s top probabilities might be:
        "a" (40%)
        "so" (20%)
        "very" (15%)
        ...down to the 40th word.
    With top_k=40, it picks from those 40 (not the full list of thousands).
    Possible reply: "Bot: You are a great person to chat with today!"

# **temperature=0.7**
The Model Picks Words

    When you type "Hello," the model looks at possible next words: "Hi" (30%), "Hey" (20%), "Greetings" (10%), etc.
    Each word has a probability (chance of being picked).

Temperature Changes Choices

    temperature rescales these probabilities before a word is picked:
        High temperature (e.g., 1.5): Flattens the distribution so unlikely words get a better chance, and it might pick weird ones (like "Hello banana").
        Low temperature (e.g., 0.3): Sharpens the distribution so likely words dominate and unlikely ones are almost never picked (sticks to "Hello there").
        temperature=0.7: A middle ground that mostly picks sensible words but allows some variety.
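
Here is a minimal sketch of the usual way temperature is applied: the model's raw scores (logits) are divided by the temperature before being turned into probabilities with softmax. The three scores are made up for illustration, and softmax_with_temperature is a hypothetical helper rather than a transformers function.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide the logits by the temperature, then apply softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.5, 1.0]  # made-up scores for "Hi", "Hey", "Greetings"
for t in (0.3, 0.7, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

The lower the temperature, the larger the share taken by the top-scoring word; the higher the temperature, the closer the three probabilities get to each other.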

In [None]:
# Chat loop
while True:
    prompt = input("You: ")
    if not prompt.strip():  # If input is empty, skip
        print("Bot: Say something!")
        continue
    full_prompt = f"Human: {prompt} Bot:"  # Add context
    result = model(full_prompt, max_length=50, temperature=0.9, top_k=50, truncation=True)
    # print(result)  # uncomment to inspect the raw pipeline output
    reply = result[0]["generated_text"].replace(full_prompt, "").strip()
    print("Bot:", reply)

In [None]:
while True:
    prompt = input("You: ")
    if prompt.lower() == "quit":
        print("Bot: Bye bye!")
        break
    result = model(prompt, max_length=50, temperature=0.7, top_k=50)
    print("Bot:", result[0]["generated_text"])

In [None]:
from transformers import pipeline

# Load the model
model = pipeline("text-generation", model="distilgpt2")

# Chat loop
while True:
    prompt = input("You: ")
    # Generate reply
    result = model(prompt, max_length=50, temperature=0.7, top_k=50, truncation=True, return_full_text=False)
    # Show reply without input
    print("Bot:", result[0]["generated_text"])

# **return_full_text=False**
This version adds return_full_text=False to the model call.
It tells the pipeline to return only the newly generated text, without your input, so there is no need to strip the prompt by hand.