# <span style="font-family:consolas; background:#99ffff; font-size: 45px; test-align: center;">SPACE BOT</span>🚀🌌

<span style="font-family:consolas; font-size: 15px; test-align: center;">A chatbot mimics human conversation. Chatbots are now used for placing orders, answering user queries, reporting issues, scheduling appointments, and similar tasks, and a good one can improve the user experience of a product.
In this notebook I build a chatbot that answers questions about one of the most amazing objects in space: <b>Neutron Stars</b>⭐. It is built with the <b>NLTK</b> library, plus scikit-learn for TF-IDF and cosine similarity.
</span>
<br><br>
<img src="./robot.jpg" style="width: 500px; height: 350px; margin:auto"/>

# <span style="font-family:consolas; background:#ffff80; font-size: 25px; test-align: center;">IMPORTING LIBRARIES AND DATA</span>

In [1]:
# Importing libraries
import numpy as np
import nltk
import pandas as pd
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.stem import WordNetLemmatizer
import re
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
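import random

# WordNetLemmatizer needs the WordNet data; download it once if it is missing
nltk.download('wordnet', quiet=True)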

In [2]:
# Opening data file
file = open('space-bot.txt', 'r', encoding='utf-8')

In [3]:
# Read data from file, then close it
corpus = file.read()
file.close()

# <span style="font-family:consolas; background:#99c2ff; font-size: 25px; test-align: center;">DATA PREPROCESSING</span>

In [4]:
# Split into paragraphs on runs of one or more newlines
tokens = re.split(r"\n+", corpus)
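A small sketch of how the split pattern behaves on a toy string (the string is made up for illustration, not taken from `space-bot.txt`). Inside square brackets `+` is a literal character, so `"[\n+]"` splits on a newline or a plus sign and leaves empty strings between consecutive newlines, whereas `r"\n+"` splits only on runs of newlines.

```python
import re

# Toy text: a blank line between paragraphs and a '+' inside one of them
text = "First paragraph.\n\nSecond paragraph: 1+1 = 2.\nThird paragraph."

# Character class: splits on a newline OR a literal '+'
by_class = re.split("[\n+]", text)

# Quantifier: splits on runs of one or more newlines only
by_runs = re.split(r"\n+", text)

print(by_class)
print(by_runs)
```

The quantifier form keeps each paragraph whole even when the text contains a `+`.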

In [5]:
lemm = WordNetLemmatizer()

def clean_data(corpus):
    result = []
    for sentence in corpus:
        # Replace parentheses, brackets and braces with spaces
        sentence = re.sub(r"[\([{})\]]"," " , sentence)
        # Convert to list of words
        sentence = sentence.split()
        # Lemmatize each word (WordNetLemmatizer treats words as nouns by default)
        sentence = [lemm.lemmatize(word) for word in sentence]
        # Form string from list and append to result
        result.append(" ".join(sentence))
    return result
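To see what the bracket-removal step does on its own, here is a minimal sketch that uses the same regular expression. `strip_brackets` is a hypothetical helper name, the example sentence is made up, and lemmatization is left out so the snippet runs without the NLTK data.

```python
import re

def strip_brackets(sentence):
    # Replace (, ), [, ], { and } with spaces
    sentence = re.sub(r"[\([{})\]]", " ", sentence)
    # Split and re-join to collapse the extra whitespace left behind
    return " ".join(sentence.split())

cleaned = strip_brackets("A pulsar (a rotating neutron star) [see text]")
print(cleaned)
```

Only the bracket characters are dropped; the words inside them stay in the sentence.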

In [6]:
clean_corpus = clean_data(tokens)

# Cleaning can leave empty strings behind (for example from blank lines), so drop them
clean_corpus = [s for s in clean_corpus if s]

In [7]:
# Let's have a glimpse of our corpus
print(f"Length of our corpus is: {len(clean_corpus)}")
print("An example of corpus:")
print(clean_corpus[5])

Length of our corpus is: 9
An example of corpus:
Another type of neutron star is called a magnetar. In a typical neutron star, the magnetic field is trillion of time that of the Earth's magnetic field; however, in a magnetar, the magnetic field is another 1000 time stronger.


# <span style="font-family:consolas; background:#ffb3cc; font-size: 25px; test-align: center;">IMPORTANT STUFF: TF-IDF AND COSINE SIMILARITY</span>
<br>
<span style="font-family:consolas; font-size: 15px; test-align: center;">
    <b style=" background:#b3e6ff;">REVEALING THE SECRET BEHIND THE CHATBOT</b>🕵️‍♂️: The basic idea is to find the paragraph in the text that is most similar to the user input.<br>
    <b>TF-IDF</b> stands for term frequency - inverse document frequency. Machines do not understand words, so we have to convert them to numbers, and here we do this with TF-IDF. It measures how important a word is to a document: the weight grows with how often the word appears in that document and shrinks with how many documents in the corpus contain it.<br>
    <b>Cosine Similarity</b> measures how similar two TF-IDF vectors are, and therefore how similar the two texts are: it is the cosine of the angle between the vectors.
    
</span>
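The idea can be sketched in plain Python without any libraries. This is an illustration under stated assumptions, not what `TfidfVectorizer` does internally line by line: it tokenizes with a simple lowercase split, uses the smoothed IDF `ln((1 + n) / (1 + df)) + 1` that scikit-learn applies by default, and scales each vector to unit length. The helper names `tfidf_vectors` and `cosine` and the toy sentences are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, list of L2-normalised TF-IDF vectors)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors

def cosine(u, v):
    # Vectors are already unit length, so the dot product is the cosine
    return sum(a * b for a, b in zip(u, v))

docs = [
    "pulsars are rotating neutron stars",
    "a magnetar has a very strong magnetic field",
    "what are pulsars",  # plays the role of the user input
]
_, vecs = tfidf_vectors(docs)
query = vecs[-1]
scores = [cosine(query, v) for v in vecs[:-1]]
best = max(range(len(scores)), key=scores.__getitem__)
print("Best match:", docs[best])
```

The first sentence wins because it shares words with the query, while the second shares none and so scores exactly zero.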

In [9]:
welcome = ['hi', 'hey']

def chatbot_response(user_chat):
    # If the user input contains a greeting word, reply with a greeting
    for w in user_chat.split():
        if w.lower() in welcome:
            return random.choice(welcome)
    # Pre-process the user input the same way as the corpus
    user_chat = clean_data([user_chat])
    # Fit TF-IDF on the corpus plus the user input (the input is the last row),
    # without modifying the global clean_corpus
    cv = TfidfVectorizer()
    X = cv.fit_transform(clean_corpus + user_chat)
    # Cosine similarity between the user input and every paragraph in the corpus
    cosine_data = cosine_similarity(X[-1], X[:-1])
    # Select the most similar paragraph as the answer
    idx = cosine_data.argmax()
    # If no paragraph shares a term with the input, say so instead of guessing
    if cosine_data[0, idx] == 0:
        return "Sorry, I don't understand."
    return clean_corpus[idx]

# <span style="font-family:consolas; background:#ffa366; font-size: 25px; test-align: center;">RESULT TIME</span>
<span style="font-family:consolas; font-size: 15px; test-align: center;">Finally, we can use our chatbot.</span>

In [11]:
print("SPACE BOT: Hi!! Type bye to exit. Ask me anything: ")
while True:
    user_chat = input()
    if user_chat.strip().lower() == "bye":
        print("Bye")
        break
    print("SPACE BOT: ", end=" ")
    print(chatbot_response(user_chat))

SPACE BOT: Hi!! Type bye to exit. Ask me anything: 
What are pulsars??
SPACE BOT:  Most neutron star are observed a pulsars. Pulsars are rotating neutron star observed to have pulse of radiation at very regular interval that typically range from millisecond to seconds. Pulsars have very strong magnetic field which funnel jet of particle out along the two magnetic poles. These accelerated particle produce very powerful beam of light. Often, the magnetic field is not aligned with the spin axis, so those beam of particle and light are swept around a the star rotates. When the beam cross our line-of-sight, we see a pulse – in other words, we see pulsar turn on and off a the beam sweep over Earth.
tell me about crust of neutron stars
SPACE BOT:  In all neutron stars, the crust of the star is locked together with the magnetic field so that any change in one affect the other. The crust is under an immense amount of strain, and a small movement of the crust can be explosive. But since the crus