<a href="https://colab.research.google.com/github/Eniola-Otukoya/Machine_Learning-Model/blob/main/Creating_my_own_chatbox.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Checkpoint Objective
In this exercise, you will learn how to create your own chatbot by modifying the code provided in the example in our course. You will need to choose a topic that you want your chatbot to be based on and find a text file related to that topic. Then, you will need to modify the code to preprocess the data in the text file and create a chatbot interface that can interact with the user.

Instructions
Choose a topic: Choose a topic that you are interested in and find a text file related to that topic. You can use websites such as Project Gutenberg to find free text files.
Preprocess the data: Modify the preprocess() function in the code provided to preprocess the data in your text file. You may want to modify the stop words list or add additional preprocessing steps to better suit your needs.
Define the similarity function: Modify the get_most_relevant_sentence() function to compute the similarity between the user's query and each sentence in your text file. You may want to modify the similarity metric or add additional features to improve the performance of your chatbot.
Define the chatbot function: Modify the chatbot() function to return an appropriate response based on the most relevant sentence in your text file.
Create a Streamlit app: Use the main() function in the code provided as a template to create a web-based chatbot interface. Prompt the user for a question, call the chatbot() function to get the response, and display it on the screen.
Note:

To run your code, you need to have the text file in the same directory as your Python script.

You may want to test your chatbot with different types of questions to ensure that it is working correctly.

You can continue to modify your chatbot to add additional features or improve its performance.


In [None]:
!pip install nltk streamlit

Collecting streamlit
  Downloading streamlit-1.29.0-py2.py3-none-any.whl (8.4 MB)
Collecting importlib-metadata<7,>=1.4 (from streamlit)
  Downloading importlib_metadata-6.11.0-py3-none-any.whl (23 kB)
Collecting validators<1,>=0.2 (from streamlit)
  Downloading validators-0.22.0-py3-none-any.whl (26 kB)
Collecting gitpython!=3.1.19,<4,>=3.0.7 (from streamlit)
  Downloading GitPython-3.1.40-py3-none-any.whl (190 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.8.1b0-py2.py3-none-any.whl (4.8 MB)
Collecting watchdog>=2.1.5 (from streamlit)
  Downloading watchdog-3.0.0-py3-none-manylinux2014_x86_64.whl 

In [None]:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import string
import streamlit as st

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.



The nltk library is used for natural language processing tasks such as tokenization, lemmatization, and stopword removal. The string module supplies the punctuation characters that are filtered out during preprocessing. The streamlit library is used to create the web-based chatbot interface.
The nltk.download() function downloads additional resources that NLTK needs. Here we download punkt, which sent_tokenize() and word_tokenize() rely on, and averaged_perceptron_tagger, which is needed only for part-of-speech tagging and is not used by the code in this notebook.
Once you have imported the necessary libraries, you can use their functions and classes to perform various NLP tasks and create your chatbot.

# Loading and Preprocessing Data:
The first step in building a chatbot is to load and preprocess the data that the chatbot will use to generate responses.
In this example, we will load a text file and preprocess each sentence in the file to create a corpus that the chatbot can use to find the most relevant response.

In [None]:
# Mount Google Drive so the dataset file can be read
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [None]:

nltk.download('stopwords')

nltk.download('wordnet')



[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...


True

In [None]:
# Load the text file and preprocess the data
with open('/content/drive/MyDrive/GOMYCODE/GOMYCODE CHECKPOINT 32 [creating my own chatbox]/health.txt', 'r', encoding='utf-8') as f:
    data = f.read().replace('\n', ' ')
# Tokenize the text into sentences
sentences = sent_tokenize(data)
# Build the stopword set and the lemmatizer once instead of on every call
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()
# Define a function to preprocess each sentence
def preprocess(sentence):
    # Tokenize the sentence into words
    words = word_tokenize(sentence)
    # Lowercase the words and remove stopwords and punctuation
    words = [word.lower() for word in words if word.lower() not in stop_words and word not in string.punctuation]
    # Lemmatize the words (lemmatize() treats each word as a noun by default)
    words = [lemmatizer.lemmatize(word) for word in words]
    return words

# Preprocess each sentence in the text
corpus = [preprocess(sentence) for sentence in sentences]



Loading and Preprocessing Data:
First, we open the text file using the open() function and read its contents using the read() method. We replace each newline character (\n) with a space so that the text becomes one continuous string and sentences that wrap across lines are not broken apart.
Next, we use the sent_tokenize() function from the nltk.tokenize module to tokenize the text into individual sentences.
We then define a function called preprocess() that takes a sentence as input and performs the following preprocessing steps:
1.	Tokenize the sentence into individual words using the word_tokenize() function from the nltk.tokenize module.
2.	Remove stopwords and punctuation from the list of words using a list comprehension. We use the stopwords.words('english') function from the nltk.corpus module to get a list of English stopwords, and the string.punctuation constant to get a string of all punctuation characters.
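3.	Lemmatize the words using the WordNetLemmatizer() class from the nltk.stem module. Lemmatization is the process of reducing a word to its base form (e.g., "cars" to "car"). Called without a part-of-speech argument, as it is here, lemmatize() treats every word as a noun, so a verb form such as "running" is left unchanged unless pos='v' is passed.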
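
To make the effect of these steps concrete, here is a minimal, dependency-free sketch. It is not the NLTK pipeline used in this notebook: the regular-expression tokenizer, the tiny STOP_WORDS set, and the function name simple_preprocess() are stand-ins assumed for illustration, and lemmatization is left out. It shows why newlines are replaced before processing and what remains once stopwords and punctuation are dropped.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list; NLTK's English list is much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "for"}

def simple_preprocess(sentence):
    """Lowercase, tokenize, and drop stopwords and punctuation (no lemmatization)."""
    # Split into runs of word characters or single non-space symbols.
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS and t not in string.punctuation]

# A sentence that wraps across two lines in the source file.
raw = "Regular exercise is good for\nthe heart."
joined = raw.replace("\n", " ")

print(joined)
print(simple_preprocess(joined))
```

Only the content-bearing words survive, which is what the similarity comparison later operates on.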

In [None]:
# Defining the Similarity Function:
# Define a function to find the most relevant sentence given a query
def get_most_relevant_sentence(query):
    # Preprocess the query and convert it to a set once
    query = set(preprocess(query))
    # Compute the similarity between the query and each sentence in the text
    max_similarity = 0
    most_relevant_sentence = ""
    for sentence in corpus:
        union = query.union(sentence)
        # Skip the pair when both sets are empty to avoid dividing by zero
        if not union:
            continue
        similarity = len(query.intersection(sentence)) / len(union)
        if similarity > max_similarity:
            max_similarity = similarity
            most_relevant_sentence = " ".join(sentence)
    return most_relevant_sentence



The get_most_relevant_sentence() function is responsible for finding the most relevant sentence in the corpus given a user query.
Here's how it works:


Defining the Similarity Function:
1.	Preprocess the user query using the preprocess() function defined earlier.
2.	Iterate over each sentence in the corpus.
3.	Compute the similarity between the preprocessed query and the current sentence using the Jaccard similarity coefficient. The Jaccard similarity coefficient is a measure of similarity between two sets and is defined as the size of the intersection divided by the size of the union of the sets. In this case, we treat the preprocessed query and each sentence in the corpus as sets of words and compute their Jaccard similarity coefficient.
4.	Update the most relevant sentence if the current sentence has a higher similarity score.
5.	Return the most relevant sentence.
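
As a worked illustration of step 3, the sketch below computes the Jaccard coefficient for a toy query against two toy sentences. The token lists and the helper name jaccard() are made up for this example, and returning 0.0 when both sets are empty is a convention assumed here rather than part of the definition.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two token collections."""
    a, b = set(a), set(b)
    union = a | b
    # Convention assumed here: two empty sets have similarity 0.0.
    if not union:
        return 0.0
    return len(a & b) / len(union)

query = ["cause", "headache"]
sentence_1 = ["dehydration", "cause", "headache"]
sentence_2 = ["exercise", "improve", "sleep"]

print(jaccard(query, sentence_1))  # 2 shared words out of 3 distinct words
print(jaccard(query, sentence_2))  # no shared words
```

Because the union grows with sentence length, a long sentence scores lower than a short one even when both contain every word of the query.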


In [None]:
# The chatbot Function:
def chatbot(question):
    # Find the most relevant sentence
    most_relevant_sentence = get_most_relevant_sentence(question)
    # Return the answer
    return most_relevant_sentence

The chatbot() function is the main function that takes a user's question as input, processes it using the get_most_relevant_sentence() function, and returns the most relevant sentence as the chatbot's response.
Here's how it works:

The chatbot Function:
1.	The chatbot() function takes a user's question as input.
2.	It calls the get_most_relevant_sentence() function to get the most relevant sentence from the corpus that matches the user's query.
3.	It returns the most relevant sentence as the chatbot's response.
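
When no sentence shares a word with the question, get_most_relevant_sentence() returns an empty string, so the user would see an empty reply. One possible refinement, sketched below, is to substitute a fallback message. The function name chatbot_with_fallback(), the fallback text, and the find_sentence parameter (a stand-in for get_most_relevant_sentence() so that the example runs on its own) are assumptions for illustration.

```python
FALLBACK = "Sorry, I could not find anything about that. Try rephrasing your question."

def chatbot_with_fallback(question, find_sentence):
    """Return the best-matching sentence, or a fallback when there is none."""
    # Treat empty or whitespace-only input as "no question asked".
    if not question.strip():
        return FALLBACK
    answer = find_sentence(question)
    return answer if answer else FALLBACK

# Toy stand-in for get_most_relevant_sentence(): looks up a keyword in a tiny table.
def toy_lookup(question):
    table = {"sleep": "adult need seven nine hour sleep"}
    for keyword, sentence in table.items():
        if keyword in question.lower():
            return sentence
    return ""

print(chatbot_with_fallback("How much sleep do I need?", toy_lookup))
print(chatbot_with_fallback("What is the capital of France?", toy_lookup))
```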

In [None]:
# Creating a Streamlit App :
# The main() function creates a Streamlit app that provides a user interface for the chatbot. Here's how it works:
# Create a Streamlit app

def main():
    st.title("Chatbot")
    st.write("Hello! I'm a chatbot designed by Clifford. Ask me any health related question")
    # Get the user's question
    question = st.text_input("User:")
    # Create a button to submit the question
    if st.button("Submit"):
        # Call the chatbot function with the question and display the response
        response = chatbot(question)
        st.write("Chatbot: " + response)
if __name__ == "__main__":
    main()






2023-12-21 08:44:45.000 
  command:

    streamlit run /usr/local/lib/python3.10/dist-packages/colab_kernel_launcher.py [ARGUMENTS]


Conclusion:
In summary, the code provided defines a simple chatbot using Python's Natural Language Toolkit (NLTK) and Streamlit. The chatbot is designed to provide answers to questions related to a specific topic, as described in a text file.
The code consists of several functions:

•	preprocess(): This function preprocesses a sentence by tokenizing it into words, removing stopwords and punctuation, and lemmatizing the words.

•	get_most_relevant_sentence(): This function finds the most relevant sentence in the text file given a user query. It does this by computing the Jaccard similarity between the preprocessed query and each preprocessed sentence, and it returns the preprocessed form of the sentence with the highest score (or an empty string when nothing matches).

•	chatbot(): This function uses the get_most_relevant_sentence() function to get the most relevant sentence for a given user question and returns it as the chatbot's response.

•	main(): This function creates a Streamlit app that provides a user interface for the chatbot. It prompts the user to enter a question and displays the chatbot's response on the screen.

Overall, the chatbot is a simple example of how NLTK and Streamlit can be used to create a conversational interface for answering questions related to a specific topic. With further development and refinement, the chatbot could be made more robust and capable of answering a wider range of questions.
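
One such refinement, offered as a sketch rather than as part of the course code: get_most_relevant_sentence() returns the preprocessed tokens joined by spaces, so replies read like keyword lists. Keeping each original sentence next to its tokens lets the bot answer in the original wording. The names best_original_sentence() and pairs, and the toy data, are assumptions for illustration.

```python
def best_original_sentence(query_tokens, pairs):
    """Return the original sentence whose tokens best match the query.

    pairs is a list of (original_sentence, tokens) tuples.
    """
    query = set(query_tokens)
    best_score, best_sentence = 0.0, ""
    for original, tokens in pairs:
        token_set = set(tokens)
        union = query | token_set
        if not union:
            continue  # both sets empty: nothing to compare
        score = len(query & token_set) / len(union)
        if score > best_score:
            best_score, best_sentence = score, original
    return best_sentence

# Toy data: original sentences paired with their already-preprocessed tokens.
pairs = [
    ("Dehydration can cause headaches.", ["dehydration", "cause", "headache"]),
    ("Exercise improves sleep.", ["exercise", "improves", "sleep"]),
]

print(best_original_sentence(["cause", "headache"], pairs))
```

In this notebook, pairs could be built with list(zip(sentences, corpus)), since the two lists are aligned one to one.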


