<a href="https://colab.research.google.com/github/Mychoyce/Gomycode-Checkpoints/blob/main/NLP_chatbot_ipy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

What You're Aiming For

* In this exercise, you will create your own chatbot by modifying the example code provided in the course.
* Choose a topic for your chatbot and find a text file related to that topic.
* Then modify the code to preprocess the data in the text file and build a chatbot interface that can interact with the user.

Instructions

* Choose a topic: Pick a topic that interests you and find a text file related to it. Websites such as Project Gutenberg offer free text files.
* Preprocess the data: Modify the preprocess() function in the provided code to preprocess the data in your text file. You may want to adjust the stop words list or add preprocessing steps to better suit your needs.
* Define the similarity function: Modify the get_most_relevant_sentence() function to compute the similarity between the user's query and each sentence in your text file. You may want to change the similarity metric or add features to improve the chatbot's performance.
* Define the chatbot function: Modify the chatbot() function to return an appropriate response based on the most relevant sentence in your text file.
* Create a Streamlit app: Use the main() function in the provided code as a template for a web-based chatbot interface. Prompt the user for a question, call the chatbot() function to get the response, and display it on the screen.

Note:

To run your code, the text file must be in the same directory as your Python script.
Test your chatbot with different types of questions to check that it works correctly.
You can keep modifying your chatbot to add features or improve its performance.


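import string

import nltk
import streamlit as st
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize, sent_tokenize
# Needed by get_most_relevant_sentence() when it runs inside the notebook
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tokenizer models, the stop word list used by preprocess(), and the POS tagger
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')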

In [1]:
!pip install streamlit

Collecting streamlit
  Downloading streamlit-1.35.0-py2.py3-none-any.whl (8.6 MB)
Collecting gitpython!=3.1.19,<4,>=3.0.7 (from streamlit)
  Downloading GitPython-3.1.43-py3-none-any.whl (207 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
Collecting watchdog>=2.1.5 (from streamlit)
  Downloading watchdog-4.0.1-py3-none-manylinux2014_x86_64.whl (83 kB)
Collecting gitdb<5,>=4.0.1 (from gitpython!=3.1.19,<4,>=3.0.7->streamlit)
  Downloading gitdb-4.0

In [2]:
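%%writefile app.py
# Note: %%writefile saves only this cell to app.py and does not run it. For
# `streamlit run app.py` to serve the chatbot, the NLTK downloads, the stop word
# list, and the preprocess(), get_most_relevant_sentence(), chatbot(), and main()
# definitions from the later cells must be placed in this cell as well.

import string

import nltk
import streamlit as st
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity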


Writing app.py


Load the text file and preprocess the data

In [5]:
# Import the nltk library
import nltk

In [6]:
# Download NLTK resources
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [8]:
# Import the stopwords submodule from nltk.corpus
from nltk.corpus import stopwords

In [9]:
# Load stop words
stop_words = set(stopwords.words('english'))
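
The instructions suggest adapting the stop word list to your topic. The following is a minimal sketch of one way to do that. The helper name `customize_stop_words` is hypothetical, and a small hard-coded set stands in for `set(stopwords.words('english'))` so the example runs without NLTK data.

```python
def customize_stop_words(base, extra=(), keep=()):
    """Return a new stop word set: `base` plus `extra`, minus `keep`.

    All arguments are iterables of strings. Words are lowercased so the
    result can be compared against lowercased tokens.
    """
    result = {word.lower() for word in base}
    result |= {word.lower() for word in extra}
    result -= {word.lower() for word in keep}
    return result


# Hypothetical stand-in for set(stopwords.words('english'))
base_stop_words = {"the", "a", "is", "not", "on"}

# Treat a filler word from the chosen text as a stop word, but keep "not"
# because negation can change the meaning of a question.
custom_stop_words = customize_stop_words(
    base_stop_words, extra=["Chapter"], keep=["not"]
)
print(sorted(custom_stop_words))
```

In the notebook, the same idea would apply to the real `stop_words` set before `preprocess()` is called.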

Define a function to preprocess each sentence

In [22]:
def preprocess(text):
    # Tokenize the text into sentences
    sentences = sent_tokenize(text)

    # Tokenize each sentence, then remove punctuation and stop words
    preprocessed_sentences = []
    for sentence in sentences:
        tokens = word_tokenize(sentence)
        # NLTK's stop words are lowercase, so compare case-insensitively
        tokens = [
            word
            for word in tokens
            if word not in string.punctuation and word.lower() not in stop_words
        ]
        # Rebuild the sentence from the remaining tokens
        preprocessed_sentences.append(" ".join(tokens))

    return preprocessed_sentences
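
To see what this kind of preprocessing does to a piece of text, here is a simplified stand-in that runs without NLTK. It is only a sketch: `simple_preprocess` and `DEMO_STOP_WORDS` are hypothetical names, the sentence and word splitting use plain regular expressions instead of `sent_tokenize` and `word_tokenize`, and the output is lowercased for readability.

```python
import re
import string

# Tiny illustrative stop word list (an assumption; the notebook uses NLTK's)
DEMO_STOP_WORDS = {"the", "a", "is", "on", "of", "and"}


def simple_preprocess(text, stop_words=DEMO_STOP_WORDS):
    """Split text into sentences, then drop punctuation and stop words."""
    # Naive sentence split: break after ., ! or ? when whitespace follows
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    cleaned = []
    for sentence in sentences:
        # Words and standalone punctuation marks become separate tokens
        tokens = re.findall(r"\w+|[^\w\s]", sentence)
        kept = [
            token.lower()
            for token in tokens
            if token not in string.punctuation and token.lower() not in stop_words
        ]
        cleaned.append(" ".join(kept))
    return cleaned


print(simple_preprocess("The Moon is bright. Is the man on the Moon?"))
# ['moon bright', 'man moon']
```

Each input sentence yields exactly one output string, so the preprocessed list stays aligned with the original sentences.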

In [23]:
def get_most_relevant_sentence(query, sentences):
    # Preprocess the query
    preprocessed_query = preprocess(query)[0]  # Only considering the first sentence of the query

    # Vectorize sentences and query
    vectorizer = TfidfVectorizer()
    sentence_vectors = vectorizer.fit_transform(sentences)
    query_vector = vectorizer.transform([preprocessed_query])

    # Compute cosine similarity between query and sentences
    similarities = cosine_similarity(query_vector, sentence_vectors)

    # Find index of most relevant sentence
    most_relevant_index = similarities.argmax()

    return sentences[most_relevant_index]
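
The ranking step can be illustrated with a hand-rolled TF-IDF and cosine similarity in plain Python. This is a sketch for intuition, not scikit-learn's implementation: the helper names are hypothetical, tokenization is a simple whitespace split, and the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` is an assumption chosen to resemble the vectorizer's default weighting.

```python
import math
from collections import Counter


def weigh(tokens, idf):
    """Map tokens to a sparse TF-IDF vector; terms without an IDF are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def fit_tfidf(documents):
    """Return (idf, vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (assumed weighting, see lead-in)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return idf, [weigh(tokens, idf) for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, documents):
    """Return the best-matching document and the score of every document."""
    idf, vectors = fit_tfidf(documents)
    query_vector = weigh(query.lower().split(), idf)
    scores = [cosine(query_vector, vector) for vector in vectors]
    # Ties resolve to the first index, as argmax does
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return documents[best_index], scores


docs = ["moon bright night", "man walked moon", "cheese tastes good"]
best, scores = most_similar("man moon", docs)
print(best)
```

The second document wins because it shares both query terms, while the third scores zero because it shares none.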


In [24]:
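def chatbot(query, text):
    # Keep the original sentences so the reply reads naturally
    original_sentences = sent_tokenize(text)

    # Preprocess the text (one preprocessed sentence per original sentence)
    preprocessed_sentences = preprocess(text)

    # Get the most relevant preprocessed sentence
    most_relevant_sentence = get_most_relevant_sentence(query, preprocessed_sentences)

    # Return the original sentence at the same position
    return original_sentences[preprocessed_sentences.index(most_relevant_sentence)]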
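
One caveat: when no sentence shares a term with the query, every similarity is zero and `argmax()` returns index 0, so the chatbot answers with the first sentence of the text. A possible way to return a more appropriate response is a score threshold with a fallback message. The sketch below assumes a hypothetical helper `choose_response`; the threshold value and fallback wording are illustrative and would need tuning for a real text.

```python
def choose_response(scores, sentences, threshold=0.1,
                    fallback="Sorry, I could not find anything relevant."):
    """Return the best-scoring sentence, or `fallback` if no score reaches `threshold`."""
    if not sentences or len(scores) != len(sentences):
        return fallback
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best_index] < threshold:
        return fallback
    return sentences[best_index]


sentences = ["The man walked on the moon.", "Cheese tastes good."]
print(choose_response([0.0, 0.0], sentences))   # fallback message
print(choose_response([0.72, 0.05], sentences))  # first sentence
```

To use something like this, the similarity function would need to expose the scores (for example a flattened list of the cosine similarities) alongside the sentences.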


In [31]:
import streamlit as st  # Import the Streamlit library and alias it as 'st'


def main():
    """Render the chatbot page: read a question and show the best-matching sentence."""
    st.title("The Man on the Moon Chatbot")
    st.write("Ask a question about The Man on the Moon and get a response!")

    query = st.text_input("Enter your question:")

    # Only answer when the button is pressed and the question is not blank;
    # a blank query has no sentences for get_most_relevant_sentence() to use
    if st.button("Ask") and query.strip():
        file_path = '/content/The man on the moon'
        with open(file_path, 'r', encoding='utf-8') as file:
            text = file.read()

        response = chatbot(query, text)
        st.write("Chatbot Response:", response)


if __name__ == "__main__":
    main()
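
Because the app depends on the text file being present at the expected path, it can help to check for the file before reading it. The following is a minimal sketch assuming a hypothetical helper `load_corpus`; it uses only the standard library.

```python
from pathlib import Path


def load_corpus(file_path, encoding="utf-8"):
    """Return the file's text, or None if the file is missing or empty."""
    path = Path(file_path)
    if not path.is_file():
        return None
    text = path.read_text(encoding=encoding).strip()
    return text or None


# The path below is the one used in main(); adjust it to your own text file
text = load_corpus("/content/The man on the moon")
if text is None:
    print("Text file not found or empty.")
```

Inside the Streamlit app, the `None` case could be reported with `st.error(...)` instead of `print`.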


In [33]:
# Writing app.py

In [34]:
!npm install localtunnel

npm WARN saveError ENOENT: no such file or directory, open '/content/package.json'
npm notice created a lockfile as package-lock.json. You should commit this file.
npm WARN enoent ENOENT: no such file or directory, open '/content/package.json'
npm WARN content No description
npm WARN content No repository field.
npm WARN content No README data
npm WARN content No license field.

+ localtunnel@2.0.2
added 22 packages from 22 contributors and audited 22 packages in 2.405s

3 packages are looking for funding
  run `npm fund` for details

found 1 moderate severity vulnerability
  run `npm audit fix` to fix them, or `npm audit` for details

In [None]:
!streamlit run app.py & npx localtunnel --port 8501 & curl -s ipv4.icanhazip.com



34.32.187.124

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://172.28.0.12:8501
  External URL: http://34.32.187.124:8501

npx: installed 22 in 2.385s
your url is: https://fair-cups-rescue.loca.lt
