# **Introduction to Chatbots**

### A. What Is a Chatbot?
Chatbots are conversational programs that automate interactions. They are artificial intelligence (AI) software designed to simulate conversation with human users, typically through text or voice.

- **Examples**:
    - A chatbot on a bank's website that helps with enquiries.
    - A chatbot on an e-commerce site that tracks orders or provides recommendations.
    - Virtual assistants like **Siri** and **Alexa**.

### B. Difference Between Chatbots and Bots

**Chatbots** are a subset of bots. They are specifically designed for conversation, meaning they are programmed to use natural language processing (NLP) to simulate human conversations.

**Bots**, on the other hand, are more general-purpose programs designed to automate tasks. They don't necessarily converse with users; instead, they perform specific functions like web scraping, sending reminders, or managing social media posts.

- **Chatbot**: Focuses on conversation (e.g., answering customer queries).
- **Bot**: Focuses on automating repetitive tasks (e.g., posting scheduled tweets).

### C. Types of Chatbots
**1. Rule-Based Chatbots**:
- They follow a specific set of instructions or rules. They work by looking for specific keywords or patterns in what you say and then picking the matching response from a list.
- The problem is that if you ask something they weren't programmed for, they might get confused or give a response that doesn't make sense.

**2. Retrieval-Based Chatbots**:
- They are a bit smarter than rule-based ones. Instead of giving a fixed reply, they search through a set of pre-written responses and try to find the best one based on what you said. It's like going through a library to find the book that best answers your question.

- **Techniques Used**:
  - **Jaccard Similarity**: Imagine you ask a question like, "What's the weather today?" The bot checks which of its stored answers has the most words in common with your question. The score is the number of shared words divided by the total number of distinct words in both texts, so the more words they share, the more likely the bot is to pick that answer.
  - **Cosine Similarity**: This is like comparing two texts using math. It turns each text into a vector of numbers (for example, word counts) and measures the angle between the two vectors. The closer the vectors point in the same direction, the more likely the answer is a good fit.
  - **Machine Learning Models like `Naive Bayes`**: This is where the bot starts to guess what you're talking about, learning from past examples. If it's trained to answer questions about sports, it'll know that when you ask about "Football", it should probably give a sports-related response.
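
To make the first two techniques concrete, here is a minimal sketch that scores the weather question against two stored answers. The helper names (`tokenize`, `jaccard`, `cosine`) and the stored answers are made up for illustration, and the tokenizer is a simple regular expression rather than NLTK, so treat it as a rough illustration and not a production implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out word-like tokens (simple regex, not NLTK)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def jaccard(a, b):
    """Shared unique words divided by all unique words in either text."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine of the angle between the two word-count vectors."""
    count_a, count_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(count_a[w] * count_b[w] for w in count_a.keys() & count_b.keys())
    norm_a = math.sqrt(sum(c * c for c in count_a.values()))
    norm_b = math.sqrt(sum(c * c for c in count_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


question = "What's the weather today?"
# Hypothetical stored answers, invented for this example
stored = ["The weather today is sunny.", "Your order has shipped."]

for answer in stored:
    print(f"{answer!r}: jaccard={jaccard(question, answer):.2f}, "
          f"cosine={cosine(question, answer):.2f}")
```

The weather answer shares three words with the question ("the", "weather", "today"), so it scores well above zero on both measures, while the shipping answer shares none and scores 0 on both.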


**3. Generative Chatbots**:
- They are the most advanced chatbots. Instead of pulling from a list of pre-written answers, they create their own responses based on what you said. It's like having a conversation with someone who thinks on the spot and makes up their answers.
- However, they need a lot of training to get good at answering questions. They use models like RNNs and LSTMs.

#### Example: Rule-Based, Retrieval-Based, and Generative Chatbots in an Order-Tracking Scenario

**Scenario:** The user asks, "Where is my order?"

**1. Rule-Based Chatbot Example:**
- In a rule-based chatbot, predefined keywords like "order" and "track" trigger specific responses.

In [14]:
# Define a function for a simple rule-based chatbot
def rule_based_chatbot(user_input):
    # Check if the user input contains the words "track" or "order"
    if "track" in user_input.lower() or "order" in user_input.lower():
        # Respond with a prompt to provide an order number
        return "Please provide your order number to track your order."

    elif "refund" in user_input.lower():
        # Respond with information about the refund policy
        return "For a refund, please visit our refund policy page."

    # If the input doesn't match any of the predefined rules 
    else:
        # Respond with a message indicating the chatbot doesn't understand the query
        return "I'm sorry, I didn't understand that. Can you try again?"


# Example user input 
user_query = input("How may I help you")

print(rule_based_chatbot(user_query))

How may I help you where is my order?


Please provide your order number to track your order.


**2. Retrieval-Based Chatbot Example (Jaccard Similarity):**

- In a retrieval-based chatbot, the bot looks for the most similar sentence in a predefined set of responses.

In [16]:
import nltk
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to C:\Users\St
[nltk_data]     Mary\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt_tab.zip.


True

In [20]:
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords  # needs the 'stopwords' corpus: nltk.download('stopwords')
import string

# A predefined set of possible responses for the chatbot stored as a list
responses = [
    "Please provide your order number to track your order.",
    "For a refund, please visit our refund policy page.",
    "Our customer service is available 24/7"

]

# Load a set of English stopwords (common words that may be removed in text preprocessing)
stop_words = set(stopwords.words('english'))

# Define a preprocess function to clean and prepare text data
def preprocess(text):
    # Tokenize the input text into individual words and convert them to lowercase
    words = word_tokenize(text.lower())

    # Remove stopwords (e.g. 'the', 'is') and punctuation
    words = [word for word in words if word not in stop_words and word not in string.punctuation]

    # Return the cleaned list of words
    return words

# Define Function to calculate Jaccard similarity between two sentences
def jaccard_similarity(query, sentence):
    # Preprocess the query and the sentence
    query_set = set(preprocess(query))
    sentence_set = set(preprocess(sentence))

    # Calculate the intersection and union of the sets and return Jaccard similarity score
    return len(query_set.intersection(sentence_set)) / len(query_set.union(sentence_set))


# Define function to find the most relevant response based on user input

def retrieval_based_chatbot(user_input): 
    best_response = "" # Placeholder for the best matching response
    highest_similarity = 0

    # Loop through each predefined response and calculate the Jaccard similarity
    for response in responses:
        similarity = jaccard_similarity(user_input, response)

        # Update the best response if the current response has a higher similarity score
        if similarity > highest_similarity:
            highest_similarity = similarity
            best_response = response

    # Return the best response, or a fallback message if nothing matched
    return best_response if best_response else "I'm sorry, I couldn't find a relevant response."

# User Input
user_query = input("How may I help you? \n")

print(retrieval_based_chatbot(user_query)) 

How may I help you? 
 where is my order


Please provide your order number to track your order.


**3. Generative Chatbot Example:**

In a generative chatbot, the response is generated dynamically by a machine learning model (like GPT). Building one involves training a deep learning model.

- How it works: The generative chatbot creates a new response based on the user input, generating an original sentence that wasn't pre-programmed or retrieved from a predefined list.
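
Training a deep learning model is beyond a short notebook cell, so the sketch below swaps in a much simpler technique: a bigram Markov chain. It is not an RNN, LSTM, or GPT. It only illustrates the core idea of building a reply one word at a time from patterns learned in training text. The training sentences and the function names `build_bigram_model` and `generate` are invented for this example.

```python
from collections import Counter, defaultdict

# Tiny made-up training text; a real generative model learns from far more data
training_text = (
    "your order is on the way . "
    "your order is being packed . "
    "your refund is on the way ."
)


def build_bigram_model(text):
    """Count, for every word, which words follow it and how often."""
    words = text.split()
    model = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def generate(model, start, max_words=8):
    """Build a sentence word by word, always taking the most frequent next word."""
    output = [start]
    while len(output) < max_words:
        followers = model.get(output[-1])
        if not followers:
            break  # the last word was never followed by anything in training
        next_word = followers.most_common(1)[0][0]
        if next_word == ".":
            break  # a full stop ends the sentence
        output.append(next_word)
    return " ".join(output)


model = build_bigram_model(training_text)
print(generate(model, "your"))
print(generate(model, "refund"))
```

Starting from "refund", the sketch produces "refund is on the way", while the training text only contains that phrase after "your". Assembling a response from learned patterns like this is what separates generative chatbots from retrieval-based ones, even though real models are far more capable.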

### D. Common Terms in Natural Language Processing (NLP)

#### 1. **Natural Language Processing (NLP)**
NLP is a way for computers to understand, interpret, and respond to human language. With NLP, computers can read, listen, and even reply like humans.

#### 2. **Tokenization**
Tokenization is breaking text down into smaller pieces called tokens, such as sentences or words, that a computer can process.
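
As a rough illustration, the sketch below tokenizes a short text with regular expressions from Python's standard library. The function names are made up for this example, and the rules are deliberately simple. NLTK's `sent_tokenize` and `word_tokenize`, used elsewhere in this notebook, handle many more cases (abbreviations, contractions, and so on).

```python
import re


def simple_sent_tokenize(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def simple_word_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


text = "Where is my order? I placed it on Monday."
print(simple_sent_tokenize(text))
print(simple_word_tokenize("Where is my order?"))
```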

#### 3. **Lemmatization**
Lemmatization is when the computer changes words to their simplest dictionary form, called the **lemma**. For example, the word "running" changes to "run".

#### 4. **Stemming**
Stemming is when the computer cuts off the ends of words to get a base form, or **stem**. For example, "Playing", "Played", and "Plays" all become "Play". This is different from lemmatization because it simply chops off word endings instead of looking up the dictionary form, so a stem is not always a real word. Stemming helps computers group related words together.
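
The difference between the two is easier to see side by side. The sketch below uses a toy suffix-stripping stemmer and a tiny hand-written lookup table as a stand-in for a real dictionary. Both are assumptions made up for illustration. They are not the Porter stemmer or NLTK's `WordNetLemmatizer`, which are far more thorough.

```python
# Hypothetical, hand-written lookup table standing in for a real dictionary
LEMMA_TABLE = {
    "playing": "play", "played": "play", "plays": "play",
    "running": "run", "ran": "run", "studies": "study",
}

SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in the table; fall back to the word itself."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for w in ["Playing", "Played", "Plays", "running", "studies", "ran"]:
    print(f"{w:8} stem={toy_stem(w):8} lemma={toy_lemmatize(w)}")
```

The stemmer turns "running" into "runn" and "studies" into "studi", which are not real words, and it leaves "ran" unchanged. The lookup-based lemmatizer maps all three to proper dictionary forms, but only because those words are in its table.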

#### 5. **Stopwords**
Stopwords are very common words, like "the", "is", "and", and "in", that computers often ignore when analyzing a sentence because they carry little meaning on their own.


#### 6. **Corpus**
A **corpus** is a large collection of written or spoken text that computers use to learn and analyze language. It's like giving the computer lots of books to read and study from.

#### 7. **Bag of Words (BOW)**
Bag of Words is a simple way for computers to represent text. It works by counting how many times each word appears in a text, without caring about the order of the words.
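
Here is a minimal sketch of the idea using only Python's standard library. The two example sentences and the `bag_of_words` helper are made up for illustration.

```python
from collections import Counter

# Made-up example documents
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Shared vocabulary: every distinct word across all documents, sorted for a stable order
vocabulary = sorted({word for doc in documents for word in doc.split()})


def bag_of_words(doc):
    """Return one count per vocabulary word, ignoring word order."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]


print(vocabulary)
for doc in documents:
    print(bag_of_words(doc))

# Shuffling the words gives exactly the same vector
print(bag_of_words("mat the on sat cat the") == bag_of_words(documents[0]))
```

Each document becomes a list of counts in vocabulary order, and the last line shows that rearranging the words does not change the representation.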

#### 8. **TF-IDF (Term Frequency-Inverse Document Frequency)**
TF-IDF is a more advanced version of **Bag of Words**. It doesn't just count how often a word appears in a text (like BOW); it also gives less weight to words that appear in many documents, so rarer, more distinctive words score higher.
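
The sketch below works through the arithmetic on three made-up sentences, using the plain textbook formula: term frequency times the logarithm of (number of documents / number of documents containing the term). The function names are chosen for this example, and libraries often use smoothed or normalized variants, so their numbers may differ.

```python
import math
from collections import Counter

# Made-up example documents
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
tokenized = [doc.split() for doc in documents]


def tf(term, tokens):
    """Term frequency: share of the document's words that are this term."""
    return Counter(tokens)[term] / len(tokens)


def idf(term):
    """Inverse document frequency: log(total documents / documents containing the term)."""
    containing = sum(1 for tokens in tokenized if term in tokens)
    return math.log(len(tokenized) / containing) if containing else 0.0


def tf_idf(term, tokens):
    return tf(term, tokens) * idf(term)


# Score three words from the first document
for term in ["the", "cat", "mat"]:
    print(term, round(tf_idf(term, tokenized[0]), 3))
```

"the" appears in every document, so its score is 0 even though it is the most frequent word. "mat" appears in only one document, so it scores higher than "cat", which appears in two.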

### E. Workflow for Building a Simple Chatbot Using NLTK (Natural Language Toolkit)

In [12]:
# Install NLTK and Download all necessary resources
# !pip install nltk
# import nltk
# nltk.download('punkt')
# nltk.download('stopwords')
# nltk.download('wordnet')
# nltk.download('punkt_tab')

# Alternatively
# nltk.download()

#### Load the Data

In [18]:
# Using the alice_in_wonderland.txt file
# 'utf-8-sig' also strips a leading byte order mark (BOM) if the file has one
with open("alice_in_wonderland.txt", 'r', encoding='utf-8-sig') as f:
    text = f.read().replace('\n', ' ')

#### Preprocess the Data

In [24]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import string



# Initialize stopwords and Lemmatizer
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

# Use a function for preprocessing each sentence
def preprocess(sentence):
    # Lowercase the sentence and split it into word tokens
    tokens = word_tokenize(sentence.lower())
    # Drop stopwords and punctuation
    tokens = [word for word in tokens if word not in stop_words and word not in string.punctuation]
    # Reduce each remaining token to its lemma
    return [lemmatizer.lemmatize(token) for token in tokens]

# Tokenize text into sentences, then preprocess each one
sentences = nltk.sent_tokenize(text)
corpus = [preprocess(sentence) for sentence in sentences]

#### Implement Jaccard Similarity for Response Matching

In [31]:
# Define function to calculate Jaccard similarity between a query and a corpus sentence
def jaccard_similarity(query, sentence):
    # Preprocess the raw query; corpus sentences are already preprocessed token lists
    query_set = set(preprocess(query))
    sentence_set = set(sentence)

    # Avoid dividing by zero when both sets are empty
    union = query_set.union(sentence_set)
    if not union:
        return 0.0

    # Jaccard similarity: size of the intersection divided by size of the union
    return len(query_set.intersection(sentence_set)) / len(union)

def get_response(query):
    max_similarity = 0
    best_response = ''

    # Compare the query with every preprocessed sentence in the corpus
    for num, sentence in enumerate(corpus):
        similarity = jaccard_similarity(query, sentence)
        if similarity > max_similarity:
            max_similarity = similarity
            best_response = sentences[num]

    # Return only after every sentence has been checked
    return best_response if best_response else "I'm sorry, I couldn't find a relevant response."

In [35]:
# Example query: "who does Alice meet first in wonderland?"
user_query = input("What can I help you with? \n")
response = get_response(user_query)

# Prints the sentence from the text with the highest Jaccard similarity to the query
print(response)
