## Building a Simple Chatbot from Scratch in Python (using NLTK)
 - [Source](https://medium.com/analytics-vidhya/building-a-simple-chatbot-in-python-using-nltk-7c8c8215ac6e)

In [1]:
import nltk
import numpy as np
import random
import string # to process standard python strings

## Corpus

In [None]:
# Importing the libraries
import bs4 as bs
import urllib.request
import re

# Getting the data source
from urllib.parse import quote  
q = "Mustafa_Kemal_Atatürk"
source = urllib.request.urlopen('https://en.wikipedia.org/wiki/'+ quote(q)).read()

# Parsing the data / creating a BeautifulSoup object
soup = bs.BeautifulSoup(source,'lxml')

In [3]:
# Fetching the data
text = ""
for paragraph in soup.find_all('p'):
    text += paragraph.text
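
Wikipedia paragraphs carry bracketed citation markers such as `[180]`, and these can surface in the bot's answers. The `re` module imported above can strip them before tokenizing. The following is a minimal sketch, not part of the original notebook: `strip_citations` is a hypothetical helper name, and the sample string is made up for illustration.

```python
import re

def strip_citations(text):
    """Remove bracketed numeric citation markers such as [180] and collapse whitespace."""
    text = re.sub(r"\[\d+\]", "", text)       # drop markers like [12] or [180]
    return re.sub(r"\s+", " ", text).strip()  # squeeze newlines and repeated spaces

# Illustrative sample, not taken from the fetched page
sample = "on 29 october 1923,[180] the republic\nof turkey was proclaimed."
cleaned = strip_citations(sample)
print(cleaned)
```

Under this assumption, the cleanup would be applied as `text = strip_citations(text)` before lowercasing and tokenizing.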


In [4]:
raw = text.lower()  # convert to lowercase
nltk.download('punkt')  # first-time use only
nltk.download('wordnet')  # first-time use only
# Newer NLTK releases may additionally require nltk.download('punkt_tab')

sent_tokens = nltk.sent_tokenize(raw)  # list of sentences
word_tokens = nltk.word_tokenize(raw)  # list of words

[nltk_data] Downloading package punkt to /Users/uzaycetin/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/uzaycetin/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [5]:
sent_tokens[:2]

['\ngoals\nkemalism\n\nmustafa kemal atatürk (/ˈmʊstəfə kəˈmɑːl ˈætətɜːrk/; turkish:\xa0[mustaˈfa ceˈmal aˈtaˌtyɾc]; 19 may 1881 (conventional)\xa0– 10 november 1938) was a turkish field marshal (mareşal), revolutionary statesman, author, and founder of the republic of turkey, serving as its first president from 1923 until his death in 1938. ideologically a secularist and nationalist, his policies and theories became known as kemalism.',
 'atatürk came to prominence for his role in securing the ottoman turkish victory at the battle of gallipoli (1915) during world war i.']

In [6]:
word_tokens[:5]

['goals', 'kemalism', 'mustafa', 'kemal', 'atatürk']

### Pre-processing the raw text

In [7]:
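lemmer = nltk.stem.WordNetLemmatizer()

# WordNet is a semantically oriented dictionary of English included in NLTK.
def LemTokens(tokens):
    """Lemmatize each token, e.g. map plural nouns to their singular form."""
    return [lemmer.lemmatize(token) for token in tokens]

# Translation table that deletes every ASCII punctuation character
remove_punct_dict = {ord(punct): None for punct in string.punctuation}

def LemNormalize(text):
    """Lowercase the text, strip punctuation, tokenize, and lemmatize."""
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))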
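
To see what the punctuation table does on its own, the sketch below isolates the `translate` step from the tokenizer and lemmatizer, so it runs without any NLTK data. `strip_punct` is a hypothetical helper used only for this illustration.

```python
import string

# Same table as in the notebook: every ASCII punctuation code point maps to None (delete)
remove_punct_dict = {ord(punct): None for punct in string.punctuation}

def strip_punct(text):
    """Lowercase the text and delete ASCII punctuation characters."""
    return text.lower().translate(remove_punct_dict)

print(strip_punct("Where is Turkey?"))
print(strip_punct("what's up"))
print(strip_punct("1881 – 1938"))
```

Two details are worth noting. Apostrophes are deleted, so `what's` becomes `whats`. And `string.punctuation` covers ASCII only, so characters such as the en dash in the Wikipedia text pass through unchanged.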

## Eliza's greetings

In [8]:
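GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]

def greeting(sentence):
    """Return a random greeting if the sentence contains a greeting keyword, otherwise None."""
    # Matching is word by word, so a multi-word entry such as "what's up" never matches
    for word in sentence.split():
        # Strip surrounding punctuation so that "hello!" still matches "hello"
        if word.lower().strip(string.punctuation) in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
    return None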
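
Because `greeting` compares one word at a time, a two-word entry like `"what's up"` can never match. One possible variant, sketched below, matches whole keywords or phrases inside the normalized sentence instead. `greeting_phrase_aware` is a hypothetical name, and this is an illustration rather than the notebook's method.

```python
import random
import string

GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]

def greeting_phrase_aware(sentence):
    """Return a random greeting if any keyword or phrase occurs in the sentence, else None."""
    # Lowercase and trim punctuation at word edges, so "Hello!" becomes "hello"
    words = [w.strip(string.punctuation) for w in sentence.lower().split()]
    padded = " " + " ".join(words) + " "
    for keyword in GREETING_INPUTS:
        # Space padding enforces whole-word matches ("hi" must not match "this")
        if " " + keyword + " " in padded:
            return random.choice(GREETING_RESPONSES)
    return None

print(greeting_phrase_aware("What's up?"))
print(greeting_phrase_aware("this is history"))
```

The first call returns one of the canned greetings; the second returns `None` because `hi` appears only inside other words.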

In [9]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [10]:
def response(user_response):
    """Return the corpus sentence most similar to the user's question.

    Expects the question to have been appended to sent_tokens beforehand,
    so that the last TF-IDF row represents it.
    """
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)

    # tfidf[-1] is the user's question; compare it with every sentence
    vals = cosine_similarity(tfidf[-1], tfidf).flatten()
    # The top match is normally the question itself, so take the runner-up
    idx = vals.argsort()[-2]
    req_tfidf = vals[idx]
    if req_tfidf == 0:
        return "I am sorry! I don't understand you"
    return sent_tokens[idx]
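
To make the retrieval step concrete, here is a dependency-free sketch of TF-IDF weighting followed by cosine similarity. It is a simplified illustration, not the notebook's code: it tokenizes on whitespace instead of using `LemNormalize`, ignores stop words, and keeps the query separate from the corpus. The smoothed IDF and L2 normalization are meant to mirror scikit-learn's default settings, but treat that correspondence as an assumption. All function names and the three sample sentences are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build L2-normalized TF-IDF vectors (as dicts) for a list of token lists."""
    n = len(docs)
    # Document frequency: the number of documents in which each term appears
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Term count times smoothed IDF, ln((1 + n) / (1 + df)) + 1
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors

def cosine(a, b):
    """Dot product of two unit-length sparse vectors, which equals their cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def best_match(sentences, query):
    """Return (index, score) of the sentence most similar to the query."""
    docs = [s.lower().split() for s in sentences + [query]]
    vectors = tfidf_vectors(docs)
    scores = [cosine(vectors[-1], v) for v in vectors[:-1]]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]

# Made-up corpus for illustration
sentences = [
    "the republic of turkey was proclaimed in 1923",
    "he served as the first president",
    "the battle of gallipoli took place in 1915",
]
idx, score = best_match(sentences, "who was the first president")
print(sentences[idx], round(score, 3))
```

The query shares `the`, `first`, and `president` with the second sentence, and the rarer words weigh more than `the`, so that sentence scores highest. A query with no words in common with the corpus scores 0 against every sentence, which is the case the "I don't understand you" branch handles.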

In [11]:
flag = True
print("ROBO: My name is Robo. I will answer your queries about Mustafa Kemal Atatürk. If you want to exit, type Bye!")
while flag:
    user_response = input().lower()
    if user_response != 'bye':
        if user_response in ('thanks', 'thank you'):
            flag = False
            print("ROBO: You are welcome..")
        else:
            greet = greeting(user_response)  # call once; the reply is chosen at random
            if greet is not None:
                print("ROBO: " + greet)
            else:
                # Temporarily add the question so response() can vectorize it with the corpus
                sent_tokens.append(user_response)
                print("ROBO: ", end="")
                print(response(user_response))
                sent_tokens.pop()  # drop the question again, leaving the corpus unchanged
    else:
        flag = False
        print("ROBO: Bye! take care..")

ROBO: My name is Robo. I will answer your queries about Mustafa Kemal Atatürk. If you want to exit, type Bye!
where is Turkey?
ROBO: on 29 october 1923, the republic of turkey was proclaimed.
Greece is an enemy?
ROBO: relations between the two countries were friendly, but were based on the fact that they were fighting against a common enemy: britain and the west.
Do you love Ataturk?
ROBO: latife fell in love with mustafa kemal; again the extent to which this was reciprocated is unknown, but he was certainly impressed by latife's intellect: she was a graduate of the sorbonne and was studying english in london when the war broke out.
Do you like food?
ROBO: [180] like mustafa kemal, reza shah wanted to secure iran's borders.
Iran is enemy or friend?
ROBO: relations between the two countries were friendly, but were based on the fact that they were fighting against a common enemy: britain and the west.
bye
ROBO: Bye! take care..
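
Several answers in the transcript hinge on a single shared word (for example `like` in "Do you like food?"), because any nonzero similarity is accepted. One possible refinement is to fall back to the apology whenever the best score is below a cutoff. The sketch below assumes the similarity scores have already been computed; `pick_reply`, the `min_score` value of 0.2, and the scores in the example are all hypothetical and would need tuning on real queries.

```python
FALLBACK = "I am sorry! I don't understand you"

def pick_reply(sentences, scores, min_score=0.2):
    """Return the best-scoring sentence, or FALLBACK if the best score is below min_score."""
    if not sentences:
        return FALLBACK
    best = max(range(len(scores)), key=scores.__getitem__)
    return sentences[best] if scores[best] >= min_score else FALLBACK

sentences = [
    "reza shah wanted to secure iran's borders.",
    "the republic of turkey was proclaimed.",
]
# Scores are invented for illustration only
print(pick_reply(sentences, [0.08, 0.0]))   # weak match, falls back to the apology
print(pick_reply(sentences, [0.08, 0.61]))  # strong match, returns the second sentence
```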
