# Task
Develop a rule-based chatbot that uses a small dataset of question-answer pairs, applies basic NLP techniques like tokenization, stemming, and keyword extraction, and interacts with users through a command-line interface. Conclude by summarizing its capabilities and demonstrating its functionality with example interactions.

## Data Collection

### Subtask:
Create a small dataset of question-answer pairs relevant to a specific domain for the chatbot to use. This dataset will serve as the knowledge base for the rule-based system.


In [1]:
qa_dataset = [
    {"question": "What is the capital of France?", "answer": "The capital of France is Paris."},
    {"question": "Who painted the Mona Lisa?", "answer": "Leonardo da Vinci painted the Mona Lisa."},
    {"question": "What is the highest mountain in the world?", "answer": "Mount Everest is the highest mountain in the world."},
    {"question": "What is the chemical symbol for water?", "answer": "The chemical symbol for water is H2O."},
    {"question": "How many planets are in our solar system?", "answer": "There are eight planets in our solar system."}
]

print("Question-Answer dataset created successfully.")
print(f"Number of Q&A pairs: {len(qa_dataset)}")
print("First Q&A pair:", qa_dataset[0])

Question-Answer dataset created successfully.
Number of Q&A pairs: 5
First Q&A pair: {'question': 'What is the capital of France?', 'answer': 'The capital of France is Paris.'}
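
Because every later step assumes each entry has a non-empty `question` and `answer` string, a quick sanity check can catch mistakes when the dataset is extended by hand. The following is a minimal sketch, not part of the original notebook; the helper name `validate_qa_pairs` and the sample data are assumptions made for illustration.

```python
def validate_qa_pairs(pairs):
    """Return a list of human-readable problems found in the Q&A pairs."""
    problems = []
    seen_questions = set()
    for index, pair in enumerate(pairs):
        # Both fields must be present and contain non-whitespace text.
        for key in ("question", "answer"):
            value = pair.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"pair {index}: missing or empty '{key}'")
        # Flag questions that repeat an earlier one (case-insensitive).
        question = pair.get("question")
        if isinstance(question, str):
            normalized = question.strip().lower()
            if normalized and normalized in seen_questions:
                problems.append(f"pair {index}: duplicate question")
            seen_questions.add(normalized)
    return problems


# Hypothetical sample: the second entry has an empty answer on purpose.
sample_pairs = [
    {"question": "What is the capital of France?", "answer": "The capital of France is Paris."},
    {"question": "Who painted the Mona Lisa?", "answer": ""},
]
print(validate_qa_pairs(sample_pairs))
```

An empty list means no problems were found, so the check could be run on `qa_dataset` before the keywords are precomputed.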


## Implement NLP Techniques

### Subtask:
Develop functions to apply basic NLP techniques such as tokenization, stemming, and keyword extraction to process user input.


In [4]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords

# 2. Download NLTK datasets if not already downloaded
nltk.download('punkt', quiet=True)
nltk.download('stopwords', quiet=True)
nltk.download('punkt_tab', quiet=True) # Added to address LookupError for punkt_tab

print("NLTK 'punkt', 'stopwords', and 'punkt_tab' datasets downloaded/checked successfully.")

# 3. Define tokenization function
def tokenize_text(text):
    return word_tokenize(text)

# 4. Instantiate PorterStemmer
stemmer = PorterStemmer()
print("PorterStemmer instantiated.")

# 5. Define stemming function
def stem_tokens(tokens):
    return [stemmer.stem(token) for token in tokens]

# 6. Get English stopwords
english_stopwords = set(stopwords.words('english'))
print(f"Loaded {len(english_stopwords)} English stopwords.")

# 7. Define keyword extraction function
def extract_keywords(user_input):
    # a. Tokenize the input string
    tokens = tokenize_text(user_input)

    # b-c. Keep only alphabetic, non-stopword tokens, converted to lowercase
    filtered_tokens = [word.lower() for word in tokens if word.isalpha() and word.lower() not in english_stopwords]

    # d. Stem the remaining tokens
    stemmed_keywords = stem_tokens(filtered_tokens)

    # e. Return the list of unique stemmed keywords
    return list(set(stemmed_keywords))

print("NLP utility functions (tokenize_text, stem_tokens, extract_keywords) defined.")

# Demonstrate functionality with an example
example_input = "What is the capital of France? I want to know about it."
tokenized_example = tokenize_text(example_input)
stemmed_example = stem_tokens(tokenized_example)
keywords_example = extract_keywords(example_input)

print(f"\nOriginal input: '{example_input}'")
print(f"Tokenized output: {tokenized_example}")
print(f"Stemmed output: {stemmed_example}")
print(f"Extracted keywords: {keywords_example}")

NLTK 'punkt', 'stopwords', and 'punkt_tab' datasets downloaded/checked successfully.
PorterStemmer instantiated.
Loaded 198 English stopwords.
NLP utility functions (tokenize_text, stem_tokens, extract_keywords) defined.

Original input: 'What is the capital of France? I want to know about it.'
Tokenized output: ['What', 'is', 'the', 'capital', 'of', 'France', '?', 'I', 'want', 'to', 'know', 'about', 'it', '.']
Stemmed output: ['what', 'is', 'the', 'capit', 'of', 'franc', '?', 'i', 'want', 'to', 'know', 'about', 'it', '.']
Extracted keywords: ['capit', 'know', 'want', 'franc']
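
Note that `extract_keywords` removes duplicates with `list(set(...))`, and a set does not guarantee any particular order, so the printed keyword list may appear in a different order from one run to the next. Matching is unaffected, since it compares sets. If a stable order is wanted for display or testing, one possible approach is sketched below; the helper name `unique_in_order` is an assumption and is not part of the notebook.

```python
def unique_in_order(tokens):
    """Drop repeated tokens while keeping the first occurrence of each."""
    # dict preserves insertion order, so the keys come back in first-seen order.
    return list(dict.fromkeys(tokens))


# Hand-written stems for illustration; real values would come from stem_tokens.
stemmed = ["capit", "franc", "want", "know", "capit"]
print(unique_in_order(stemmed))  # ['capit', 'franc', 'want', 'know']
```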


## Build Rule-Based System

### Subtask:
Implement a rule-based system that uses the extracted keywords and patterns from user queries to map them to appropriate, predefined responses from the collected dataset.


In [5]:
processed_qa_dataset = []
for item in qa_dataset:
    question_keywords = extract_keywords(item['question'])
    processed_qa_dataset.append({
        "question": item['question'],
        "answer": item['answer'],
        "keywords": question_keywords
    })

print("Processed Q&A dataset created successfully with keywords.")
print(f"Number of processed Q&A pairs: {len(processed_qa_dataset)}")
print("First processed Q&A pair:", processed_qa_dataset[0])

Processed Q&A dataset created successfully with keywords.
Number of processed Q&A pairs: 5
First processed Q&A pair: {'question': 'What is the capital of France?', 'answer': 'The capital of France is Paris.', 'keywords': ['capit', 'franc']}


In [6]:
def find_answer(user_query):
    user_keywords = extract_keywords(user_query)
    best_match_score = 0
    best_answer = "I'm sorry, I don't have an answer to that question."

    if not user_keywords:
        return best_answer

    for item in processed_qa_dataset:
        question_keywords = item['keywords']
        # Calculate match score: number of common keywords
        match_score = len(set(user_keywords).intersection(set(question_keywords)))

        if match_score > best_match_score:
            best_match_score = match_score
            best_answer = item['answer']

    # best_answer is still the fallback message if no stored question shared a keyword
    return best_answer

print("The 'find_answer' function has been defined.")

# Test the find_answer function with example queries
print("\n--- Testing find_answer function ---")

# Example 1: Clear match
query1 = "What is the capital of France?"
response1 = find_answer(query1)
print(f"User: {query1}\nChatbot: {response1}")

# Example 2: Partial match / rephrased query
query2 = "Who painted Mona Lisa?"
response2 = find_answer(query2)
print(f"User: {query2}\nChatbot: {response2}")

# Example 3: Different phrasing of an existing question
query3 = "Highest peak in the world?"
response3 = find_answer(query3)
print(f"User: {query3}\nChatbot: {response3}")

# Example 4: Query with no match
query4 = "Tell me about artificial intelligence."
response4 = find_answer(query4)
print(f"User: {query4}\nChatbot: {response4}")

# Example 5: Another query with no match
query5 = "What is the weather like today?"
response5 = find_answer(query5)
print(f"User: {query5}\nChatbot: {response5}")

# Example 6: Longer query with extra words that still matches on several keywords
query6 = "How many planets are there in our solar system right now?"
response6 = find_answer(query6)
print(f"User: {query6}\nChatbot: {response6}")


The 'find_answer' function has been defined.

--- Testing find_answer function ---
User: What is the capital of France?
Chatbot: The capital of France is Paris.
User: Who painted Mona Lisa?
Chatbot: Leonardo da Vinci painted the Mona Lisa.
User: Highest peak in the world?
Chatbot: Mount Everest is the highest mountain in the world.
User: Tell me about artificial intelligence.
Chatbot: I'm sorry, I don't have an answer to that question.
User: What is the weather like today?
Chatbot: I'm sorry, I don't have an answer to that question.
User: How many planets are there in our solar system right now?
Chatbot: There are eight planets in our solar system.
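
To make the scoring rule concrete, the sketch below recomputes the overlap count used by `find_answer` on hand-written keyword lists, and compares it with a normalized variant (Jaccard similarity). The Jaccard variant is an assumption added for illustration, not something the notebook uses; it could help when a long query shares one incidental keyword with a stored question. The helper names and keyword lists are likewise illustrative.

```python
def overlap_score(query_keywords, question_keywords):
    """Count the keywords shared by the query and a stored question."""
    return len(set(query_keywords) & set(question_keywords))


def jaccard_score(query_keywords, question_keywords):
    """Shared keywords divided by all distinct keywords (0.0 to 1.0)."""
    query_set, question_set = set(query_keywords), set(question_keywords)
    union = query_set | question_set
    return len(query_set & question_set) / len(union) if union else 0.0


# Hand-written keyword lists; in the notebook these come from extract_keywords.
stored = {
    "france": ["capit", "franc"],
    "everest": ["highest", "mountain", "world"],
}
query = ["highest", "peak", "world"]

for name, keywords in stored.items():
    print(name, overlap_score(query, keywords), jaccard_score(query, keywords))
```

Here the query shares two keywords with the Everest entry and none with the France entry, which is why "Highest peak in the world?" resolves to the Mount Everest answer. When two stored questions tie, `find_answer` keeps the first one it encountered, because it only replaces the best answer on a strictly higher score.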


## Develop User Interface

### Subtask:
Create a simple command-line interface (CLI) that allows users to interact with the chatbot by typing queries and receiving responses.


In [7]:
print("\n--- Chatbot CLI Started ---")
print("Type 'quit', 'exit', or 'bye' to end the conversation.")

while True:
    user_input = input("\nYou: ")
    user_input_lower = user_input.strip().lower()  # ignore surrounding whitespace and case

    if user_input_lower in ['quit', 'exit', 'bye']:
        print("Chatbot: Goodbye! Thanks for chatting.")
        break
    else:
        response = find_answer(user_input)
        print(f"Chatbot: {response}")


--- Chatbot CLI Started ---
Type 'quit', 'exit', or 'bye' to end the conversation.

You: What is the capital of France?
Chatbot: The capital of France is Paris.

You: quiet
Chatbot: I'm sorry, I don't have an answer to that question.

You: quit
Chatbot: Goodbye! Thanks for chatting.
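
The loop above reads directly from `input`, which makes it hard to test without typing. One possible restructuring, sketched here under the assumption that the answer function and the I/O functions are passed in as parameters, also ends the session cleanly on end of input or Ctrl+C. The names `run_chat`, `EXIT_COMMANDS`, `read_line`, and `write_line` are hypothetical.

```python
EXIT_COMMANDS = {"quit", "exit", "bye"}


def run_chat(answer_fn, read_line=input, write_line=print):
    """Run the chat loop until an exit command or end of input."""
    while True:
        try:
            user_input = read_line("\nYou: ").strip()
        except (EOFError, KeyboardInterrupt):
            # Treat closed input or Ctrl+C like an explicit exit command.
            write_line("Chatbot: Goodbye! Thanks for chatting.")
            return
        if not user_input:
            continue  # skip blank lines instead of querying the matcher
        if user_input.lower() in EXIT_COMMANDS:
            write_line("Chatbot: Goodbye! Thanks for chatting.")
            return
        write_line(f"Chatbot: {answer_fn(user_input)}")


# Scripted session with a stand-in answer function (for demonstration only).
scripted = iter(["What is the capital of France?", "  QUIT  "])
transcript = []
run_chat(
    lambda text: "The capital of France is Paris.",
    read_line=lambda prompt: next(scripted),
    write_line=transcript.append,
)
print(transcript)
```

In the notebook, the interactive version would be started with `run_chat(find_answer)`.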


## Final Task

### Subtask:
Summarize the developed chatbot's capabilities, demonstrate its functionality with a few example interactions, and discuss potential next steps for improvement or extensions.
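
As one example of a possible extension, exact keyword matching fails as soon as the user misspells a word. The sketch below uses the standard library's `difflib.get_close_matches` to count approximate keyword matches; it is an illustration of a next step under assumed names (`fuzzy_overlap`, the `cutoff` value of 0.8), not something implemented in the notebook.

```python
from difflib import get_close_matches


def fuzzy_overlap(query_keywords, question_keywords, cutoff=0.8):
    """Count query keywords that exactly or approximately match a stored keyword."""
    matched = 0
    for keyword in set(query_keywords):
        # get_close_matches returns the stored keywords whose similarity
        # ratio to `keyword` is at least `cutoff`, best match first.
        if get_close_matches(keyword, question_keywords, n=1, cutoff=cutoff):
            matched += 1
    return matched


# Hand-written stems for illustration; "capitl" stands in for a misspelled query word.
stored_keywords = ["capit", "franc"]
print(fuzzy_overlap(["capitl", "franc"], stored_keywords))  # 2
print(fuzzy_overlap(["weather", "today"], stored_keywords))  # 0
```

A lower `cutoff` tolerates more typos but also raises the chance of matching unrelated words, so the value would need tuning against real queries.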
