
Setting Up Your Environment

In [None]:
!pip install nltk
!pip install spacy
!pip install transformers



In [None]:
import nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')

import spacy
spacy.cli.download("en_core_web_sm")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


Building the Chatbot Step by Step
Step 1: Create a Simple Rule-Based Chatbot
A rule-based chatbot relies on predefined responses. Here’s an example:

In [None]:
def chatbot_response(user_input):
    # Map each recognized input (lowercase) to a canned reply
    responses = {
        "hello": "Hi there! How can I help you?",
        "bye": "Goodbye! Have a great day!",
        "how are you": "I’m just a bot, but I’m doing great!"
    }
    # Normalize case and surrounding whitespace before the lookup
    return responses.get(user_input.strip().lower(), "I’m sorry, I didn’t understand that.")

# Test the chatbot
while True:
    user_input = input("You: ")
    if user_input.lower() == "exit":
        print("Chatbot: Goodbye!")
        break
    print("Chatbot:", chatbot_response(user_input))

You: "hello": "Hi there! How can I help you?"
Chatbot: I’m sorry, I didn’t understand that.
You: Goodbye! Have a great day
Chatbot: I’m sorry, I didn’t understand that.
You: bye
Chatbot: Goodbye! Have a great day!
You: how are you
Chatbot: I’m just a bot, but I’m doing great!
You: exit
Chatbot: Goodbye!
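
The transcript above shows the limit of exact-match lookup: any input that is not character-for-character a dictionary key falls through to the fallback reply. Here is a minimal sketch of one way to loosen this, assuming it is acceptable to ignore punctuation and extra whitespace. The names `normalize` and `chatbot_response_normalized` are hypothetical and not part of the original notebook.

```python
import string

RESPONSES = {
    "hello": "Hi there! How can I help you?",
    "bye": "Goodbye! Have a great day!",
    "how are you": "I'm just a bot, but I'm doing great!",
}
FALLBACK = "I'm sorry, I didn't understand that."

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    cleaned = text.lower().translate(_PUNCT_TABLE)
    return " ".join(cleaned.split())

def chatbot_response_normalized(user_input):
    """Look up the normalized input in the same kind of rule table as before."""
    return RESPONSES.get(normalize(user_input), FALLBACK)

print(chatbot_response_normalized("  Hello!  "))
print(chatbot_response_normalized("How are you?"))
```

This still requires the whole input to match a key once cleaned up; it only removes the sensitivity to case, punctuation, and spacing.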


Adding NLP for Better Understanding
With NLP, your chatbot can handle more complex interactions. Use libraries like NLTK or spaCy for text preprocessing:

In [None]:
import nltk
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [None]:
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Build the stop-word set once instead of on every call
STOP_WORDS = set(stopwords.words('english'))

# Preprocess the user input
def preprocess_text(text):
    tokens = word_tokenize(text)
    # Lowercase before the stop-word check: NLTK's list is all lowercase,
    # so a capitalized "I" would otherwise slip through
    return [
        word.lower() for word in tokens
        if word.isalnum() and word.lower() not in STOP_WORDS
    ]

user_input = "Hello, how can I help you today?"
print(preprocess_text(user_input))

['hello', 'help', 'today']
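
Once input is reduced to a list of lowercase tokens, rules can match on keywords rather than on whole sentences. Below is a rough sketch that assumes a hand-written keyword table. `INTENT_KEYWORDS` and `match_intent` are hypothetical names, and the keyword sets are illustrative only. The function takes an already-tokenized list, like the one `preprocess_text` returns, so the sketch itself runs without NLTK.

```python
# Hypothetical keyword table; the entries are examples, not a vetted vocabulary.
INTENT_KEYWORDS = {
    "greeting": {"hello", "hi", "hey"},
    "goodbye": {"bye", "goodbye"},
    "help": {"help", "support", "assist"},
}

def match_intent(tokens, keyword_map=INTENT_KEYWORDS):
    """Return the intent whose keyword set overlaps the tokens most, or None."""
    token_set = set(tokens)
    best_intent, best_score = None, 0
    for intent, keywords in keyword_map.items():
        score = len(token_set & keywords)
        if score > best_score:  # strict '>' keeps the earlier intent on ties
            best_intent, best_score = intent, score
    return best_intent

print(match_intent(["hello", "help", "today"]))
print(match_intent(["weather", "tomorrow"]))
```

Ties are broken by table order here, which is a simplification; a real bot would likely need a more deliberate tie-breaking rule.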


Implementing Intent Recognition
A pre-trained zero-shot classifier can assign user input to one of a set of candidate intents without any task-specific training:

In [None]:
from transformers import pipeline

# Load a zero-shot classification pipeline, naming the model explicitly
# (facebook/bart-large-mnli is the default this pipeline falls back to)
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def detect_intent(text):
    candidate_labels = ["greeting", "goodbye", "question"]
    result = classifier(text, candidate_labels)
    # Labels come back sorted by score, highest first
    return result['labels'][0]

print(detect_intent("Hello, chatbot!"))

Device set to use cpu


greeting
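
The classifier always returns one of the candidate labels, even when none of them fits well. One way to guard against that is to compare the top score with a threshold. The sketch below assumes the result carries the `labels` and `scores` lists that the zero-shot pipeline returns, sorted from most to least likely. The `0.5` threshold, the `"unknown"` label, the name `pick_intent`, and the scores in the example dictionary are made-up illustrations, not measured values.

```python
def pick_intent(result, threshold=0.5, fallback="unknown"):
    """Return the top label if its score clears the threshold, else the fallback."""
    top_label = result["labels"][0]
    top_score = result["scores"][0]
    return top_label if top_score >= threshold else fallback

# Hand-written stand-in for a classifier result (values are illustrative).
example_result = {
    "sequence": "Hello, chatbot!",
    "labels": ["greeting", "question", "goodbye"],
    "scores": [0.90, 0.07, 0.03],
}
print(pick_intent(example_result))
print(pick_intent(example_result, threshold=0.95))
```

In the notebook, a call such as `pick_intent(classifier(text, candidate_labels))` could stand in for the direct `result['labels'][0]` lookup. A suitable threshold would have to be found by trying real inputs.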


For more dynamic responses, you can generate text with a pre-trained GPT model such as GPT-2:

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load GPT model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def generate_response(prompt):
    # Calling the tokenizer returns both input_ids and attention_mask
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_length=50,
        num_return_sequences=1,
        # GPT-2 has no pad token; reuse the end-of-sequence token
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generate_response("Tell me a joke about programming."))


Tell me a joke about programming.

I'm not sure what you're talking about.

I'm not sure what you're talking about.

I'm not sure what you're talking about.

I'm not sure what
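
The generated text above echoes the prompt and then repeats the same sentence until the length limit cuts it off. Below is a small post-processing sketch, assuming it is acceptable to drop the echoed prompt and any repeated lines before showing the reply. `clean_generation` is a hypothetical helper, and `raw` is a copy of the output above. This only tidies the text; it does not make the model's answer any better, and reducing repetition at the source would mean changing the arguments passed to `model.generate`.

```python
def clean_generation(generated, prompt):
    """Drop the echoed prompt and collapse repeated lines in generated text."""
    # Causal language models return the prompt followed by the continuation.
    if generated.startswith(prompt):
        generated = generated[len(prompt):]
    kept = []
    for line in generated.splitlines():
        line = line.strip()
        # Skip blank lines and any line that was already kept.
        if line and line not in kept:
            kept.append(line)
    return "\n".join(kept)

prompt = "Tell me a joke about programming."
raw = (
    "Tell me a joke about programming.\n\n"
    "I'm not sure what you're talking about.\n\n"
    "I'm not sure what you're talking about.\n\n"
    "I'm not sure what"
)
print(clean_generation(raw, prompt))
```

The truncated last line survives because it differs from the full sentence; trimming incomplete sentences would need an extra rule.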
