### CREATING MY OWN CHATBOT 

In this exercise, you will build your own chatbot by adapting the example code from the course. Choose a topic, find a text file on that topic, then modify the code so that it preprocesses the text and provides a chatbot interface that answers the user's questions.


**Instructions**

1. Choose a topic: Choose a topic that you are interested in and find a text file related to that topic. You can use websites such as Project Gutenberg to find free text files.
2. Preprocess the data: Modify the preprocess() function in the code provided to preprocess the data in your text file. You may want to modify the stop words list or add additional preprocessing steps to better suit your needs.
3. Define the similarity function: Modify the get_most_relevant_sentence() function to compute the similarity between the user's query and each sentence in your text file. You may want to modify the similarity metric or add additional features to improve the performance of your chatbot.
4. Define the chatbot function: Modify the chatbot() function to return an appropriate response based on the most relevant sentence in your text file.
5. Create a Streamlit app: Use the main() function in the code provided as a template to create a web-based chatbot interface. Prompt the user for a question, call the chatbot() function to get the response, and display it on the screen.

**Note:**

- To run your code, you need to have the text file in the same directory as your Python script.
- You may want to test your chatbot with different types of questions to ensure that it is working correctly.
- You can continue to modify your chatbot to add additional features or improve its performance.
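
The steps above can be sketched end to end as follows. This is a minimal, dependency-free outline, not the course code: the regex tokenizer, the tiny `STOP_WORDS` set, the sample `TEXT`, and the fallback message are placeholders assumed for illustration. In the exercise itself, NLTK's tokenizers, stopword list, and lemmatizer take their place.

```python
import re

# Placeholder stopword list and sample text (assumptions for this sketch only)
STOP_WORDS = {"the", "a", "an", "is", "was", "of", "to", "what", "who", "did", "and", "her"}
TEXT = (
    "Alice was beginning to get very tired. "
    "The Rabbit took a watch out of its pocket. "
    "The Queen shouted at the gardeners."
)

def preprocess(sentence):
    """Step 2: lowercase, split into word tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]

def get_most_relevant_sentence(query, sentences):
    """Step 3: return (score, sentence) for the best Jaccard match."""
    query_set = set(preprocess(query))
    best_score, best_sentence = 0.0, None
    for sentence in sentences:
        sentence_set = set(preprocess(sentence))
        union = query_set | sentence_set
        score = len(query_set & sentence_set) / len(union) if union else 0.0
        if score > best_score:
            best_score, best_sentence = score, sentence
    return best_score, best_sentence

def chatbot(question, sentences):
    """Step 4: turn the best match into a reply, with a fallback."""
    score, sentence = get_most_relevant_sentence(question, sentences)
    return sentence if sentence is not None else "I couldn't find a relevant answer."

# Naive sentence split on end punctuation; NLTK's sent_tokenize is more robust
SENTENCES = re.split(r"(?<=[.!?])\s+", TEXT)
print(chatbot("What did the Rabbit take out of its pocket?", SENTENCES))
```

Keeping the original sentences alongside their preprocessed form lets the bot reply with readable text rather than a list of cleaned tokens.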

In [10]:
import nltk

In [12]:
nltk.download('punkt')  
nltk.download('stopwords')  
nltk.download('wordnet')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Dami\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt.zip.
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Dami\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Dami\AppData\Roaming\nltk_data...


True

In [30]:
# Load text file
with open('alice_in_wonderland.txt', 'r', encoding='utf-8') as f:
    text = f.read().replace('\n', ' ')
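
Project Gutenberg files usually wrap the book in a metadata header and a licence footer, and those lines end up in the corpus along with the story. One option is to keep only the text between the `*** START OF ...` and `*** END OF ...` marker lines. The sketch below is a hedged example: `strip_gutenberg_boilerplate` is a hypothetical helper, and the exact marker wording is assumed from the usual Project Gutenberg layout, so check it against your own file.

```python
import re

def strip_gutenberg_boilerplate(raw_text):
    """Return the text between the START and END marker lines, if both exist.

    Hypothetical helper: the marker wording is an assumption and may
    differ in your file.
    """
    start = re.search(r"\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*", raw_text)
    end = re.search(r"\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK", raw_text)
    if start and end and start.end() < end.start():
        return raw_text[start.end():end.start()].strip()
    return raw_text  # Markers not found: leave the text untouched

sample = (
    "Title: Alice's Adventures in Wonderland\nAuthor: Lewis Carroll\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK ALICE'S ADVENTURES IN WONDERLAND ***\n"
    "Alice was beginning to get very tired of sitting by her sister on the bank.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK ALICE'S ADVENTURES IN WONDERLAND ***\n"
    "Licence text follows."
)
print(strip_gutenberg_boilerplate(sample))
```

The helper could be applied to the result of `f.read()` before the text is split into sentences. If the markers are missing, the text is returned unchanged, so it is safe to call on non-Gutenberg files too.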

In [32]:
# Preprocessing involves tokenizing, removing stopwords, and lemmatizing (reducing words to their base form).
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import string

# Initialize stopwords and lemmatizer
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

# Preprocess each sentence
def preprocess(sentence):
    tokens = word_tokenize(sentence.lower())
    tokens = [word for word in tokens if word not in stop_words and word not in string.punctuation]
    return [lemmatizer.lemmatize(token) for token in tokens]

# Tokenize text into sentences
sentences = nltk.sent_tokenize(text)
corpus = [preprocess(sentence) for sentence in sentences]
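
One detail in `preprocess()` is worth checking. `string.punctuation` is a plain string of ASCII characters, so `word not in string.punctuation` is a substring test. Tokens such as curly quotes or two-character quote tokens are not substrings of it and pass through the filter. The sketch below compares that test with a stricter per-character check. `is_punctuation` is a hypothetical helper, and the token list is hand-written for illustration rather than real tokenizer output.

```python
import string
import unicodedata

def is_punctuation(token):
    """True if every character is ASCII punctuation or a Unicode punctuation mark."""
    return bool(token) and all(
        ch in string.punctuation or unicodedata.category(ch).startswith("P")
        for ch in token
    )

# Hand-written tokens for illustration (an assumption, not tokenizer output)
tokens = ["rabbit", "”", "engraved", "upon", "``", "''", "."]

# Substring test used in preprocess(): only the single "." is removed
ascii_filtered = [t for t in tokens if t not in string.punctuation]

# Per-character test: curly quotes and multi-character quote tokens go too
unicode_filtered = [t for t in tokens if not is_punctuation(t)]

print(ascii_filtered)
print(unicode_filtered)
```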

In [34]:
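def jaccard_similarity(query, sentence):
    # query is raw text; sentence is an already-preprocessed token list from corpus
    query_set = set(preprocess(query))
    sentence_set = set(sentence)
    union = query_set.union(sentence_set)
    # Both sets can be empty (e.g. a query made only of stopwords); avoid dividing by zero
    if not union:
        return 0.0
    return len(query_set.intersection(sentence_set)) / len(union)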

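def get_response(query):
    # Track the best-scoring sentence seen so far
    max_similarity = 0
    best_response = ""  # Stays empty if no sentence shares a word with the query
    for sentence in corpus:
        similarity = jaccard_similarity(query, sentence)
        if similarity > max_similarity:
            max_similarity = similarity
            # corpus holds preprocessed tokens, so the reply is the cleaned sentence, not the original text
            best_response = " ".join(sentence)
    return best_response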
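
To see how the score behaves, here is a small worked example on hand-written token lists (illustrative values, not taken from the corpus). For token sets A and B the score is |A ∩ B| / |A ∪ B|, so every extra word in a sentence enlarges the union and lowers the score. Very short sentences that share one word with the query therefore tend to win. The `jaccard` helper is a stand-in that takes two token lists directly.

```python
def jaccard(tokens_a, tokens_b):
    """Jaccard similarity of two token lists: |A ∩ B| / |A ∪ B|."""
    a, b = set(tokens_a), set(tokens_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

query = ["rabbit"]  # e.g. what may remain of a question once stopwords are removed
short_sentence = ["rabbit", "engraved", "upon"]
long_sentence = ["white", "rabbit", "took", "watch", "waistcoat", "pocket", "hurried"]

print(jaccard(query, short_sentence))  # 1 shared word out of 3 distinct words
print(jaccard(query, long_sentence))   # 1 shared word out of 7 distinct words
```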

In [40]:
# Example query
user_query = "What is the rabbit doing?"
response = get_response(user_query)
print(response)

rabbit ” engraved upon


In [43]:
# Example query
user_query = "Who is the author?"
response = get_response(user_query)
print(response)

title alice 's adventure wonderland author lewis carroll release date june 27 2008 ebook 11 recently updated february 4 2024 language english credit arthur dibianca david widger start project gutenberg ebook alice 's adventure wonderland illustration alice ’ adventure wonderland lewis carroll millennium fulcrum edition 3.0 content chapter
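
A long block like this one can only share a few words with a short query, so its score is small, yet it still wins when nothing else matches. One possible refinement is a minimum-score cutoff with a fallback reply. The sketch below assumes a hypothetical `MIN_SIMILARITY` of 0.1 (a value to tune, not a recommendation) and works on pre-tokenized sentences so that it stays self-contained. `best_match` and `FALLBACK` are illustrative names.

```python
MIN_SIMILARITY = 0.1  # Hypothetical cutoff; tune it on your own questions
FALLBACK = "I couldn't find a relevant answer."

def best_match(query_tokens, corpus, min_similarity=MIN_SIMILARITY):
    """Return the best-scoring sentence, or FALLBACK if no score reaches the cutoff."""
    query_set = set(query_tokens)
    best_score, best_sentence = 0.0, None
    for sentence in corpus:
        sentence_set = set(sentence)
        union = query_set | sentence_set
        score = len(query_set & sentence_set) / len(union) if union else 0.0
        if score > best_score:
            best_score, best_sentence = score, sentence
    if best_sentence is None or best_score < min_similarity:
        return FALLBACK
    return " ".join(best_sentence)

# Hand-written, already-preprocessed sentences (illustrative only)
corpus = [
    ["title", "adventure", "wonderland", "author", "lewis", "carroll", "release",
     "date", "ebook", "language", "english", "credit", "content", "chapter"],
    ["alice", "followed", "white", "rabbit"],
]
print(best_match(["author"], corpus))           # 1/14 is below the cutoff
print(best_match(["white", "rabbit"], corpus))  # 2/4 passes the cutoff
```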


In [45]:
while True:
    print("Type in 'quit' to quit")
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    response = get_response(user_input)
    print("Bot:", response)

Type in 'quit' to quit


You:  who is the author?


Bot: title alice 's adventure wonderland author lewis carroll release date june 27 2008 ebook 11 recently updated february 4 2024 language english credit arthur dibianca david widger start project gutenberg ebook alice 's adventure wonderland illustration alice ’ adventure wonderland lewis carroll millennium fulcrum edition 3.0 content chapter
Type in 'quit' to quit


You:  quit


In [47]:
# Create the file chatbot_wonderland.py in write mode
with open("chatbot_wonderland.py", "w") as file:
    # Writing the Streamlit code into the file
    file.write('''
    
##### Let's build a beginner-friendly chatbot in Streamlit #####
# This project will build a chatbot that reads a text file, processes it, and returns relevant answers based on user input.

# Importing necessary libraries

# nltk (Natural Language Toolkit) library for various text processing tasks
import nltk
import streamlit as st  # Streamlit is used for building interactive web applications
from nltk.tokenize import word_tokenize, sent_tokenize  # Tokenizers for splitting text into words and sentences
from nltk.corpus import stopwords  # List of common words (stopwords) that are usually removed from text (like "is", "the", "and")
from nltk.stem import WordNetLemmatizer  # Lemmatizer to reduce words to their base form (e.g., 'rabbits' -> 'rabbit')
import string  # Python's built-in library for handling strings and punctuation


# Uncomment to download necessary NLTK resources if not downloaded already
# nltk.download('punkt')
# nltk.download('stopwords')
# nltk.download('wordnet')

# Load stopwords and initialize lemmatizer
stop_words = set(stopwords.words('english'))  # Load a set of common English stopwords to filter out later
lemmatizer = WordNetLemmatizer()  # Initialize a lemmatizer to reduce words to their base form

# Define a function to preprocess text (tokenizing, removing stopwords and punctuation, lemmatizing)
def preprocess(sentence):
    # Tokenize the sentence into words and convert to lowercase
    words = word_tokenize(sentence.lower())
    
    # Remove stopwords and punctuation from the list of words
    words = [word for word in words if word not in stop_words and word not in string.punctuation]
    
    # Lemmatize each word to its base form (e.g., 'rabbits' -> 'rabbit'; words are treated as nouns by default)
    words = [lemmatizer.lemmatize(word) for word in words]
    
    # Return the list of processed words
    return words


# Load the text file (Alice in Wonderland)
def load_text():
    try:
        # Path to the text file (relative to the directory the app is launched from)
        file_path = 'alice_in_wonderland.txt'
        
        # Open the file, read its content, and replace newline characters with spaces
        with open(file_path, 'r', encoding='utf-8') as file:
            return file.read().replace('\\n', ' ')
    
    # Handle case where the file is not found and display an error message in Streamlit
    except FileNotFoundError:
        st.error("Text file not found.")
        return ""


# Tokenize the text into sentences and preprocess them
def prepare_corpus(text):
    # Tokenize the text into individual sentences using sent_tokenize
    sentences = sent_tokenize(text)
    
    # Preprocess each sentence (tokenizing, removing stopwords/punctuation, and lemmatizing)
    return [preprocess(sentence) for sentence in sentences]



# Calculate Jaccard similarity between two sets
def jaccard_similarity(query, sentence):
    # Convert both the query and sentence to sets (unique words)
    query_set = set(query)
    sentence_set = set(sentence)
    
    # If both sets are empty, return 0 to avoid division by zero
    if len(query_set.union(sentence_set)) == 0:
        return 0
    
    # Calculate the Jaccard similarity as the size of intersection divided by the size of union
    return len(query_set.intersection(sentence_set)) / len(query_set.union(sentence_set))


# Find the most relevant sentence using Jaccard similarity
def get_most_relevant_sentence(query, corpus, original_sentences):
    # Preprocess the user query (tokenization, stopword removal, etc.)
    query = preprocess(query)
    
    # Initialize variables to store the maximum similarity and best matching sentence
    max_similarity = 0
    best_sentence = "I couldn't find a relevant answer."  # Default response if no match is found
    
    # Iterate over the corpus of preprocessed sentences to find the best match
    for i, sentence in enumerate(corpus):
        # Calculate the Jaccard similarity between the user query and the current sentence
        similarity = jaccard_similarity(query, sentence)
        
        # If the similarity score is higher than the current maximum, update the best sentence
        if similarity > max_similarity:
            max_similarity = similarity
            best_sentence = original_sentences[i]  # Retrieve the original sentence (before preprocessing)
    
    # Return the most relevant sentence (or the default response if no match is found)
    return best_sentence

# Main function to create the chatbot interface in Streamlit
def main():
    # Title for the app
    st.title("Wonderland's Novice Chatbot")
    
    # A brief description of the chatbot's purpose
    st.write("Hello! Ask me anything related to Alice in Wonderland!")
    
    # Add a dropdown (expander) for suggested questions
    with st.expander("Click me for suggestions"):
        st.write("""
        1. Who does Alice meet first in Wonderland?
        2. What is the Cheshire Cat's famous line?
        3. How does Alice enter Wonderland?
        4. What is the Queen of Hearts known for?
        5. Why did Alice follow the White Rabbit?
        6. What was Alice's reaction to the Mad Hatter's tea party?
        7. What advice does the Caterpillar give Alice?
        8. What is the significance of the bottle labeled 'Drink Me'?
        9. How does the story of Alice in Wonderland end?
        10. What game does the Queen of Hearts play with Alice?
        """)

    # Load and prepare text corpus
    text = load_text()  # Load the text from the file (Alice in Wonderland)
    if text:
        # Preprocess the text to create a corpus of tokenized sentences
        corpus = prepare_corpus(text)  # Prepares the text into a list of preprocessed sentences
        original_sentences = sent_tokenize(text)  # Tokenizes the original text into sentences for later reference

        # Get user input from the Streamlit interface
        user_input = st.text_input("Enter your question:")  # Input field for the user's question

        # If the user clicks the submit button
        if st.button("Submit"):
            if user_input:
                # Get the most relevant sentence from the corpus based on the user's input
                response = get_most_relevant_sentence(user_input, corpus, original_sentences)
                st.write(f"Chatbot: {response}")  # Display the chatbot's response
            else:
                st.write("Please enter a question.")  # Prompt user to enter a question if the input is empty

                
# Run the Streamlit app
if __name__ == "__main__":
    main()  # Call the main function to run the Streamlit app
    ''')

print("chatbot_wonderland.py creation executed successfully!")


chatbot_wonderland.py creation executed successfully!
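
As a possible next step for the "modify the similarity metric" suggestion in the instructions, the sketch below shows cosine similarity over raw word counts. Unlike Jaccard, it takes repeated words into account. This is a plain-Python illustration under the assumption that both inputs are already-preprocessed token lists. It is not part of the course code, and `cosine_similarity` here is a hand-written helper, not a library call.

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the word-count vectors of two token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Dot product over shared words (a Counter returns 0 for missing words)
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty token list has no direction to compare
    return dot / (norm_a * norm_b)

print(cosine_similarity(["rabbit"], ["rabbit", "engraved", "upon"]))
print(cosine_similarity(["white", "rabbit"], ["white", "rabbit", "rabbit"]))
```

Because it has the same two-list interface as the Jaccard function, it could be swapped in where the similarity is computed and the answers compared on a handful of test questions.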
