# Ama Ghana Chatbot

This chatbot answers questions about my country, Ghana, in West Africa. It uses Term Frequency-Inverse Document Frequency (TF-IDF) and cosine similarity to compare the user's question with each sentence of the text available to the chatbot, and replies with the closest match. The text comes from the [Wikipedia page on Ghana](https://en.wikipedia.org/wiki/Ghana).
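To make the idea concrete, here is a minimal sketch of TF-IDF weighting and cosine similarity using only the Python standard library. It is not the code the chatbot runs: the notebook relies on scikit-learn's `TfidfVectorizer` with lemmatization and English stop words, while this sketch assumes plain whitespace tokens, a smoothed IDF, and L2-normalized vectors. The toy sentences and the helper names `tfidf_vectors` and `cosine` are made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (smoothed IDF, L2-normalized)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Weight = term count * smoothed inverse document frequency.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors

def cosine(a, b):
    """For L2-normalized sparse vectors, cosine similarity is the dot product."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

# Hypothetical mini-corpus; the user's question goes last.
docs = [
    "ghana is a country in west africa",
    "accra is the capital of ghana",
    "where is the capital",
]
vectors = tfidf_vectors(docs)
# Score the question against every other sentence and keep the best one.
scores = [cosine(vectors[-1], v) for v in vectors[:-1]]
best = max(range(len(scores)), key=scores.__getitem__)
print(docs[best])  # accra is the capital of ghana
```

Words that appear in every sentence (here "is") get the lowest weight, so the match is driven by the rarer shared words "the" and "capital".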

In [1]:
import random
import nltk
import numpy as np
import string

### Reading in the data

In [9]:
# Read the corpus and lowercase it; the with-block closes the file afterwards.
with open('./data/amaghana.txt', 'r', errors='ignore') as f:
    raw = f.read().lower()

sent_token = nltk.sent_tokenize(raw)
word_token = nltk.word_tokenize(raw)

### Preprocessing the raw text

In [10]:
lemmer = nltk.stem.WordNetLemmatizer()

def lemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]

remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def lemNormalize(text):
    return lemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
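The punctuation-stripping step can be tried on its own without NLTK. The sketch below assumes whitespace splitting in place of `nltk.word_tokenize` and skips lemmatization entirely; the function name `normalize` is hypothetical and only mirrors the first half of `lemNormalize`.

```python
import string

# Map every ASCII punctuation code point to None so str.translate deletes it.
remove_punct = {ord(ch): None for ch in string.punctuation}

def normalize(text):
    """Lowercase, strip punctuation, and split on whitespace (no lemmatization)."""
    return text.lower().translate(remove_punct).split()

print(normalize("What is Ghana's population?"))
# ['what', 'is', 'ghanas', 'population']
```

Note that the apostrophe is deleted rather than replaced by a space, so "Ghana's" becomes the single token "ghanas".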

### Greetings

In [11]:
GREETINGS_KEYWORDS = ("hello","hi","what's up","hey","sup")
GREETINGS_RESPONSES = ["hi there","what's going on?", "*nods*","hey"]

def greeting(sentence):
    # Return a random greeting if any word of the sentence is a greeting keyword.
    for word in sentence.split():
        if word.lower() in GREETINGS_KEYWORDS:
            return random.choice(GREETINGS_RESPONSES)
    return None

### Generating Responses

In [5]:
# Tfidf Vectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

# Cosine similarity
from sklearn.metrics.pairwise import cosine_similarity

In [12]:
# Return the corpus sentence most similar to the user's question.
def response(user_response):
    # Temporarily add the question to the corpus so it is vectorized with it;
    # the caller removes it again after printing the reply.
    sent_token.append(user_response)

    TfidfVec = TfidfVectorizer(tokenizer=lemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_token)

    # Similarity of the question (last row) to every sentence, itself included.
    vals = cosine_similarity(tfidf[-1], tfidf)
    # The top score is the question matching itself, so take the second highest.
    idx = vals.argsort()[0][-2]
    req_tfidf = vals[0][idx]

    if req_tfidf == 0:
        return "Sorry, I do not understand you"
    return sent_token[idx]
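The second-highest lookup is easier to follow on plain numbers. This sketch assumes the similarity scores have already been computed (the values below are invented) and uses `sorted` as a standard-library stand-in for NumPy's `argsort`; `pick_answer` is a hypothetical helper, not part of the notebook.

```python
def pick_answer(similarities, sentences):
    """similarities[i] scores sentences[i] against the query, which is sentences[-1]."""
    # Indices ordered by ascending score, like vals.argsort()[0] in the notebook.
    order = sorted(range(len(similarities)), key=similarities.__getitem__)
    idx = order[-2]  # order[-1] is the query matching itself
    if similarities[idx] == 0:
        # No sentence shares any weighted term with the query.
        return "Sorry, I do not understand you"
    return sentences[idx]

sentences = ["ghana is in west africa", "accra is the capital", "where is ghana"]
print(pick_answer([0.58, 0.0, 1.0], sentences))  # ghana is in west africa
print(pick_answer([0.0, 0.0, 1.0], sentences))   # Sorry, I do not understand you
```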

### Starting and ending the conversation

In [13]:
flag = True
print("Ama Ghana: My name is Ama Ghana, I will answer your questions about Ghana. If you want to exit, type 'bye'")

while flag:
    user_response = input("\nUser: ").lower()

    if user_response == 'bye':
        flag = False
        print("\nAma Ghana: Bye bye, take care..")
    elif user_response in ('thanks', 'thank you'):
        flag = False
        print("\nAma Ghana: You are welcome\n")
    else:
        # Call greeting() once and reuse the result.
        reply = greeting(user_response)
        if reply is not None:
            print("\nAma Ghana: " + reply, "\n")
        else:
            print("\nAma Ghana:", end="")
            print(response(user_response), "\n")
            # response() appended the question to the corpus; take it out again.
            sent_token.remove(user_response)

Ama Ghana: My name is Ama Ghana, I will answer your questions about Ghana. If you want to exit, type 'bye'

User: where is ghana located

Ama Ghana:ghana, officially the republic of ghana, is a country located along the gulf of guinea and atlantic ocean, in the subregion of west africa. 


User: what is the land mass of ghana

Ama Ghana:spanning a land mass of 238,535 km2 (92,099 sq mi), ghana is bordered by the ivory coast in the west, burkina faso in the north, togo in the east and the gulf of guinea and atlantic ocean in the south. 


User: what does ghana mean

Ama Ghana:ghana means "warrior king" in the soninke language. 


User: what is ghana's population

Ama Ghana:it became independent of the united kingdom on 6 march 1957.

ghana's population of approximately 30 million spans a variety of ethnic, linguistic and religious groups. 


User: what is the geography of ghana

Ama Ghana:its diverse geography and ecology ranges from coastal savannahs to tropical rain forests. 


User: 