# Introduction
This notebook implements a Wikipedia-based chatbot using Python. The chatbot can fetch information from Wikipedia, process the content, and interact with users by answering questions about a given topic. It uses Natural Language Processing (NLP) techniques for text processing and similarity matching to provide relevant answers.


#### **Technologies Used:**
*   Python libraries: requests, BeautifulSoup, nltk, scikit-learn, numpy
*   Machine learning techniques: TF-IDF and cosine similarity
*   NLP tools: Tokenization, stopword removal, lemmatization
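
To make the TF-IDF and cosine-similarity step concrete, here is a minimal pure-Python sketch of the idea. It is an illustration only, not the notebook's implementation, which relies on scikit-learn. The helper names `tfidf_vectors` and `cosine`, the whitespace tokenization, and the sample sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF dict per document (whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Raw term count times a smoothed IDF: ln((1 + n) / (1 + df)) + 1.
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


# Hypothetical sample data for the illustration.
sentences = [
    "python is a programming language",
    "the eiffel tower is in paris",
    "paris is the capital of france",
]
query = "which language is python"

# The query is vectorized together with the sentences, as the last document.
*sentence_vecs, query_vec = tfidf_vectors(sentences + [query])
scores = [cosine(query_vec, vec) for vec in sentence_vecs]
best = max(range(len(scores)), key=scores.__getitem__)
print(sentences[best])
```

The query shares three words with the first sentence and only the common word "is" with the others, so the first sentence scores highest. Words that appear in every document, such as "is", receive the lowest IDF weight, which is why they contribute little to the match.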

# Installing Required Libraries
The following commands install the libraries this project requires:

In [None]:
!pip install beautifulsoup4 -q
!pip install requests -q
!pip install nltk -q
!pip install scikit-learn -q
!pip install numpy -q

These libraries are essential for web scraping, natural language processing, and text-based machine learning tasks.
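
After installation, one optional way to confirm that everything is importable is a quick check like the sketch below. The helper `missing_modules` is a hypothetical name introduced for this example. Note that two import names differ from their pip package names (`bs4` for beautifulsoup4, `sklearn` for scikit-learn).

```python
from importlib.util import find_spec

# Import names, which differ from some pip package names (bs4, sklearn).
REQUIRED_MODULES = ["requests", "bs4", "nltk", "sklearn", "numpy"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```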

# Importing Libraries
This section imports the libraries used in the project. Each library serves the following purpose:

*   `requests`: To fetch content from Wikipedia.
*   `BeautifulSoup`: For parsing and extracting HTML content.
*   `nltk`: For natural language processing tasks such as tokenization and lemmatization.
*   `string`: Supplies the `punctuation` constant used to strip punctuation from text.
*   `scikit-learn`: For TF-IDF vectorization and similarity calculations.
*   `numpy`: For numerical operations.
*   `time`: To add delays for a better chatbot interaction experience.
*   `logging`: For debugging and tracking the chatbot's internal processes.

In [None]:
# Required libraries
import requests
from bs4 import BeautifulSoup
import nltk
from string import punctuation
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from time import sleep
import logging

# Setup and Constants
This section initializes logging, downloads required NLTK data, and defines constants used throughout the chatbot:
*   Logging: Configured at `INFO` level for tracking the chatbot's operations.
*   NLTK data: Downloads the stopword list, the WordNet data used for lemmatization, and the `punkt_tab` tokenizer models.
*   `EXIT_COMMANDS`: Words to terminate the chat.
*   `THANKS_COMMANDS`: Words for recognizing gratitude.
*   `MORE_COMMAND`: Command to request additional details about the topic.

In [None]:
# Set up logging
logging.basicConfig(level=logging.INFO)

# Download necessary NLTK data
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('punkt_tab')

# Constants for exit commands and special inputs
EXIT_COMMANDS = ['bye', 'quit', 'exit']
THANKS_COMMANDS = ['thanks', 'thank you', 'thanks for your help', 'thank you very much']
MORE_COMMAND = 'more'
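
The chat loop compares the stripped, lowercased input against these constants in a fixed order. The sketch below isolates that routing logic in a standalone function so the order is easy to see. `classify_input` and its return labels are hypothetical names for illustration and do not appear in the notebook's class.

```python
EXIT_COMMANDS = ['bye', 'quit', 'exit']
THANKS_COMMANDS = ['thanks', 'thank you', 'thanks for your help', 'thank you very much']
MORE_COMMAND = 'more'


def classify_input(raw: str, topic_set: bool) -> str:
    """Return which branch of the chat loop a raw input line would take."""
    text = raw.strip().lower()
    if text in EXIT_COMMANDS:
        return 'exit'
    if text == MORE_COMMAND:
        return 'more'
    if text in THANKS_COMMANDS:
        return 'thanks'
    # Anything else is a topic until one is set, and a question afterwards.
    return 'query' if topic_set else 'topic'


print(classify_input('  Bye ', topic_set=True))
print(classify_input('Alan Turing', topic_set=False))
print(classify_input('Thanks!', topic_set=True))
```

Matching is exact, so `'Thanks!'` with trailing punctuation is not recognized as gratitude and falls through to the query branch.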


# WikipediaChatBot Class
This class defines the chatbot's functionality. Below are its main components:

1.   `__init__`:
    *   Initializes the chatbot with necessary variables like `terminate_chat`, `topic_set`, and `sentences`.
    *   Displays a welcome message with instructions for the user.
2.   `display_welcome_message`:
    *   Prints a greeting and explains how to use the chatbot.
3.   `start_chat`:
    *   Main loop that listens to user input and responds accordingly.
    *   Handles special commands like "bye", "more", and "thanks".
4.   `fetch_wikipedia_content`:
    *   Fetches content from Wikipedia for a given topic.
    *   Parses the webpage to extract paragraphs and sentences for processing.
5.   `preprocess_text`:
    *   Preprocesses text by removing punctuation, stopwords, and applying lemmatization.
6.   `handle_user_query`:
    *   Responds to user queries by finding the most relevant content using TF-IDF and cosine similarity.
7.   `provide_more_info`:
    *   Prints the full paragraph containing the most recent answer when the user types "more".
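
As an illustration of the fetch step, the sketch below shows how a free-text topic can be turned into an article URL. The title-casing and underscore joining mirror the class; the function name `build_wikipedia_url` and the percent-encoding via `urllib.parse.quote` are additions assumed for this example.

```python
from urllib.parse import quote


def build_wikipedia_url(topic: str) -> str:
    """Turn a free-text topic into an English Wikipedia article URL."""
    # Same normalization as the class: title-case words, join with underscores.
    slug = '_'.join(topic.title().split())
    # Percent-encoding is an addition here; it keeps characters such as
    # '?' or '#' from changing the meaning of the URL.
    return f'https://en.wikipedia.org/wiki/{quote(slug)}'


print(build_wikipedia_url('machine learning'))
# https://en.wikipedia.org/wiki/Machine_Learning
```

Because `str.title()` capitalizes every word, the resulting title may not match the article's exact capitalization, so the lookup can depend on Wikipedia redirecting to the canonical page.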

In [None]:
class WikipediaChatBot:
    def __init__(self):
        """Initialize the chatbot with necessary data and flags."""
        self.terminate_chat = False
        self.topic_set = False
        self.topic_title = None
        self.paragraphs = []
        self.sentences = []
        self.current_response_index = None
        self.paragraph_indices = []
        self.punctuation_removal = str.maketrans('', '', punctuation)
        self.lemmatizer = nltk.stem.WordNetLemmatizer()
        self.stopwords = set(nltk.corpus.stopwords.words('english'))
        self.display_welcome_message()

    def display_welcome_message(self):
        """Display the initial greeting and instructions."""
        print("Initializing ChatBot...")
        sleep(2)
        print('Type "bye", "quit", or "exit" to end the chat.')
        sleep(2)
        print("\nEnter a topic of interest, and I'll fetch data from Wikipedia.")
        sleep(3)
        print('If you want more details after my response, type "more".')
        sleep(3)
        print('-' * 50)

    def start_chat(self):
        """Main chat loop for user interaction."""
        while not self.terminate_chat:
            user_input = input("User    >> ").strip().lower()
            if user_input in EXIT_COMMANDS:
                self.terminate_chat = True
                print("ChatBot >> Goodbye! Have a great day!")
                sleep(1)
            elif user_input == MORE_COMMAND:
                self.provide_more_info()
            elif user_input in THANKS_COMMANDS:
                print("ChatBot >> You're welcome!")
            elif not self.topic_set:
                self.fetch_wikipedia_content(user_input)
            else:
                self.handle_user_query(user_input)

    def fetch_wikipedia_content(self, topic: str):
        """Fetch content from Wikipedia for the given topic."""
        topic = '_'.join(topic.title().split())
        url = f'https://en.wikipedia.org/wiki/{topic}'
        try:
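            # A timeout keeps the chat from hanging on a stalled connection.
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')
            # get_text() also works when the <h1> wraps its title in nested tags.
            self.topic_title = soup.find('h1').get_text(strip=True)
            # Strip only the paragraph ends so words around inline tags stay separated,
            # and skip empty paragraphs.
            paragraph_texts = (p.get_text().strip() for p in soup.find_all('p'))
            self.paragraphs = [text for text in paragraph_texts if text]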
            self.sentences = []
            self.paragraph_indices = []

            for i, paragraph in enumerate(self.paragraphs):
                sentences = nltk.sent_tokenize(paragraph)
                self.sentences.extend(sentences)
                self.paragraph_indices.extend([i] * len(sentences))

            self.topic_set = True
            print(f"ChatBot >> Topic set to '{self.topic_title}'. Let's chat!")
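        except requests.exceptions.HTTPError as e:
            # raise_for_status() raises this for 4xx/5xx responses, e.g. a missing article.
            print(f"ChatBot >> Wikipedia returned an error: {e}. Try a different topic!")
        except requests.exceptions.RequestException as e:
            print(f"ChatBot >> Network error: {e}. Please check your connection.")
        except Exception as e:
            print(f"ChatBot >> Couldn't fetch the topic. Error: {e}. Try a different topic!")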

    def preprocess_text(self, text: str) -> list[str]:
        """Preprocess text by removing punctuation, stopwords, and lemmatizing."""
        words = nltk.word_tokenize(text.lower().translate(self.punctuation_removal))
        return [self.lemmatizer.lemmatize(word) for word in words if word not in self.stopwords]

    def handle_user_query(self, query: str):
        """Handle user queries by finding the most relevant response."""
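        # Vectorize the article sentences plus the query (last row) without
        # mutating self.sentences, so a failure cannot leave the query behind.
        # token_pattern=None silences the warning about the unused default pattern.
        vectorizer = TfidfVectorizer(tokenizer=self.preprocess_text, token_pattern=None)
        tfidf_matrix = vectorizer.fit_transform(self.sentences + [query])
        similarity_scores = cosine_similarity(tfidf_matrix[-1], tfidf_matrix[:-1])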

        best_match_index = similarity_scores.argsort()[0, -1]
        best_match_score = similarity_scores[0, best_match_index]

        if best_match_score > 0:
            self.current_response_index = best_match_index
            print(f"ChatBot >> {self.sentences[best_match_index]}")
        else:
            print("ChatBot >> I'm sorry, I couldn't find relevant information.")

    def provide_more_info(self):
        """Provide more information about the last discussed topic."""
        if self.current_response_index is not None:
            paragraph_index = self.paragraph_indices[self.current_response_index]
            print(f"ChatBot >> {self.paragraphs[paragraph_index]}")
        else:
            print("ChatBot >> Please ask a question first!")
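
The "more" command depends on the bookkeeping that maps each sentence back to its source paragraph. The standalone sketch below reproduces that mapping on made-up text. The regex-based `split_sentences` is a naive stand-in for `nltk.sent_tokenize`, used here only so the example runs without NLTK data, and the sample paragraphs are hypothetical.

```python
import re


def split_sentences(paragraph: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r'(?<=[.!?])\s+', paragraph.strip()) if s]


paragraphs = [
    "Ada Lovelace was a mathematician. She wrote about the Analytical Engine.",
    "Charles Babbage designed the engine.",
]

sentences = []
paragraph_indices = []  # paragraph_indices[i] is the paragraph holding sentences[i]
for i, paragraph in enumerate(paragraphs):
    parts = split_sentences(paragraph)
    sentences.extend(parts)
    paragraph_indices.extend([i] * len(parts))

# Suppose the best-matching sentence was index 1; "more" prints its paragraph.
best_match_index = 1
print(paragraphs[paragraph_indices[best_match_index]])
```

Keeping the two lists the same length means any sentence index returned by the similarity search can be used directly to look up the surrounding paragraph.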


# Running the Bot
The chatbot is initialized and started using the following code:

In [None]:
# Run the chatbot
if __name__ == "__main__":
    bot = WikipediaChatBot()
    bot.start_chat()

# How to Use:
1.   Enter a topic of interest to fetch information from Wikipedia.
2.   Ask questions related to the topic to receive relevant responses.
3.   Type "more" to see the full paragraph that contains the most recent answer.
4.   Exit the chat anytime using commands like "bye" or "quit".

# ***`Output`***

---

