# Comprehensive Conversational Chatbot Using Python

## Overview

Chatbots are a common application of Natural Language Processing (NLP), built to simulate human-like conversations with users. In this notebook, we will develop an end-to-end chatbot in Python that handles a conversation from start to finish without human involvement. By the end, you will have built a basic conversational agent that takes user input, classifies its intent, and returns an appropriate response. It serves as a foundational example of how NLP techniques can be integrated into practical applications to automate communication.

## Importing Libraries

In [10]:
import os
import ssl
import nltk
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
import streamlit as st

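# Setup for nltk downloads
# Work around SSL certificate errors during the download by disabling
# certificate verification (acceptable for a local demo only)
ssl._create_default_https_context = ssl._create_unverified_context
# Also look for NLTK resources in a local "nltk_data" folder
nltk.data.path.append(os.path.abspath("nltk_data"))
nltk.download('punkt')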

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\33765\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt.zip.


True

## Understanding Intents and Patterns

An intent represents the underlying goal or purpose a user is trying to achieve when interacting with a chatbot. For instance, users may express different intents, such as greeting the bot, asking for assistance, or saying goodbye. Each intent is characterized by two essential components: patterns and responses. Patterns are example sentences or phrases that the user might use to communicate their intent, allowing the chatbot to recognize various ways the same intent can be expressed. Responses, on the other hand, are predefined replies or actions that the chatbot will use to respond appropriately once the intent is identified. To manage these interactions effectively, we will use a structured format to define intents, along with their associated patterns and corresponding responses, ensuring the chatbot can respond accurately and naturally to users' needs.

In [12]:
intents = [
    {
        "tag": "greeting",
        "patterns": ["Hi", "Hello", "Hey", "How are you", "What's up"],
        "responses": ["Hi there!", "Hello!", "Hey! How can I help you today?", "I'm doing well, thank you! How about you?"]
    },
    {
        "tag": "goodbye",
        "patterns": ["Bye", "See you later", "Goodbye", "Take care"],
        "responses": ["Goodbye!", "See you soon!", "Take care!", "Goodbye, have a nice day!"]
    },
    {
        "tag": "thanks",
        "patterns": ["Thank you", "Thanks", "Thanks a lot", "I appreciate it"],
        "responses": ["You're welcome!", "No problem at all!", "Glad I could help!", "You're welcome, anytime!"]
    },
    {
        "tag": "help",
        "patterns": ["Help", "I need help", "Can you help me?", "What should I do?"],
        "responses": ["Of course! What do you need help with?", "Sure, what can I assist you with?", "I'm here to help!"]
    },
    {
        "tag": "age",
        "patterns": ["How old are you?", "What is your age?", "Can you tell me your age?"],
        "responses": ["I don't have an age. I'm just a chatbot!", "I exist in the digital realm, age doesn't apply here!"]
    },
    {
        "tag": "weather",
        "patterns": ["What's the weather like?", "How's the weather today?", "Can you tell me the weather?"],
        "responses": ["I'm sorry, I can't provide real-time weather information.", "You can check a weather app for real-time data."]
    },
    {
        "tag": "budgeting",
        "patterns": ["How do I make a budget?", "Can you help me with budgeting?", "What's a good way to budget?"],
        "responses": [
            "To make a budget, start by tracking your income and expenses.",
            "A simple budgeting method is the 50/30/20 rule: 50% for essentials, 30% for wants, and 20% for savings and debt repayment."
        ]
    },
    {
        "tag": "credit_score",
        "patterns": ["What is a credit score?", "How can I check my credit score?", "How do I improve my credit score?"],
        "responses": [
            "A credit score represents your creditworthiness. The higher your score, the better.",
            "You can check your credit score on websites like Credit Karma.",
            "To improve your credit score, try to make all payments on time and keep your credit utilization low."
        ]
    },
    {
        "tag": "jokes",
        "patterns": ["Tell me a joke", "Can you tell me a joke?", "Make me laugh"],
        "responses": [
            "Why don’t scientists trust atoms? Because they make up everything!",
            "Why don’t skeletons fight each other? They don’t have the guts!"
        ]
    },
    {
        "tag": "general_info",
        "patterns": ["Tell me something interesting", "I want to know something new", "Give me a fun fact"],
        "responses": [
            "Did you know? Honey never spoils. Archaeologists have found pots of honey in ancient Egyptian tombs that are over 3,000 years old!",
            "A fun fact: Octopuses have three hearts!"
        ]
    }
]
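Because every later step relies on each intent having a tag, patterns, and responses, a quick structural check can catch mistakes before training. The helper below is a minimal sketch; the name `validate_intents` and the sample data are hypothetical and not part of the original notebook.

```python
def validate_intents(intents):
    """Return a list of human-readable problems found in an intents list."""
    problems = []
    seen_tags = set()
    for index, intent in enumerate(intents):
        tag = intent.get("tag")
        if not tag:
            problems.append(f"intent #{index} has no tag")
        elif tag in seen_tags:
            problems.append(f"duplicate tag: {tag}")
        else:
            seen_tags.add(tag)
        # Both lists must exist and contain at least one entry
        for key in ("patterns", "responses"):
            if not intent.get(key):
                problems.append(f"intent {tag!r} has no {key}")
    return problems


# Hypothetical sample with two deliberate mistakes
sample_intents = [
    {"tag": "greeting", "patterns": ["Hi", "Hello"], "responses": ["Hello!"]},
    {"tag": "greeting", "patterns": ["Hey"], "responses": []},
]
print(validate_intents(sample_intents))
```

Calling `validate_intents(intents)` on the list defined above should return an empty list if the structure is sound.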


## Preprocessing the Data

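To train our machine learning model, we first need to convert the patterns into numerical data. TfidfVectorizer does this by turning each pattern into a TF-IDF vector: every word is weighted by how often it occurs in the pattern (term frequency) and down-weighted the more patterns it appears in (inverse document frequency).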

In [15]:
# Create the vectorizer
vectorizer = TfidfVectorizer()

# Prepare data for training
tags = []
patterns = []

# Loop through intents to populate tags and patterns
for intent in intents:
    for pattern in intent['patterns']:
        tags.append(intent['tag'])
        patterns.append(pattern)

# Vectorize patterns and extract tags
x = vectorizer.fit_transform(patterns)  # Features
y = tags  # Labels (intents)
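To see what the vectorizer produces, it can help to fit it on a handful of patterns and inspect the result. The following is a small illustrative sketch with default `TfidfVectorizer` settings; the `toy_` names are introduced only for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

toy_patterns = ["Hi", "Hello", "How are you"]
toy_vectorizer = TfidfVectorizer()
toy_matrix = toy_vectorizer.fit_transform(toy_patterns)

# One row per pattern, one column per distinct lowercased word
vocabulary = sorted(toy_vectorizer.vocabulary_)
print(vocabulary)
print(toy_matrix.shape)

# Rows are L2-normalised by default, so the three equally rare words
# of "How are you" each receive the weight 1/sqrt(3)
print(toy_matrix.toarray()[2].round(3))

# A message made only of unseen words becomes an all-zero vector
unseen = toy_vectorizer.transform(["xyzzy"])
print(unseen.nnz)
```

The last line is worth noting: the vectorizer silently ignores words it did not see during fitting, so an entirely unfamiliar message carries no information for the classifier.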


## Training the Machine Learning Model

Logistic Regression is a widely used machine learning algorithm for classification tasks. Here it identifies the intent behind a user input by classifying the TF-IDF vector of the message into one of the predefined intent tags. The model estimates a probability for each class, so it handles binary as well as multi-class problems, and it is fast to train and straightforward to interpret. These properties make it a reasonable baseline for intent classification on small, high-dimensional text datasets, although with only a few patterns per intent, as here, its predictions on unseen phrasings will be approximate.

In [16]:
clf = LogisticRegression(random_state=0, max_iter=10000)
clf.fit(x, y)
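Since Logistic Regression models a probability for each class, we can look at those probabilities instead of only the top prediction. The sketch below trains a separate toy model so that it runs on its own; the `toy_` names and the helper `intent_probabilities` are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

toy_patterns = [
    "Hello", "Hi there", "Hey",
    "Bye", "Goodbye", "See you later",
    "Thanks", "Thank you", "Thanks a lot",
]
toy_tags = ["greeting"] * 3 + ["goodbye"] * 3 + ["thanks"] * 3

toy_vectorizer = TfidfVectorizer()
toy_clf = LogisticRegression(random_state=0, max_iter=1000)
toy_clf.fit(toy_vectorizer.fit_transform(toy_patterns), toy_tags)


def intent_probabilities(message):
    """Map each intent tag to the model's estimated probability for the message."""
    features = toy_vectorizer.transform([message])
    probabilities = toy_clf.predict_proba(features)[0]
    # classes_ holds the tags in the same order as the probability columns
    return dict(zip(toy_clf.classes_, probabilities))


scores = intent_probabilities("thanks a lot")
for tag, probability in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{tag}: {probability:.2f}")
```

With so few training patterns the probabilities tend to be fairly close together, which is one reason to treat them as a rough ranking rather than a calibrated confidence.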

## Building the Chatbot

Now that we have a trained model, we can move on to creating a function that will process user inputs and generate appropriate responses based on the predicted intent. The chatbot function follows a straightforward yet effective workflow to achieve this:

- Text Processing: First, the user's input is transformed into a numerical representation using the fitted TfidfVectorizer. This converts the raw text into a feature vector that the model can work with, weighting each word by its importance relative to the training patterns.

- Intent Prediction: Once the text has been vectorized, the trained classifier predicts the intent tag that best matches the message, such as greeting, help, or goodbye.

- Response Generation: Based on the predicted intent, the chatbot selects a predefined response. The response is chosen randomly from the pool of responses associated with that intent, so the chatbot does not give the same reply every time.

By combining these steps, the chatbot function maps each user message to a response for the predicted intent. It treats every message independently and keeps no memory of earlier turns.

In [17]:
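def chatbot(input_text):
    # Vectorize the raw message with the already-fitted TF-IDF vectorizer
    features = vectorizer.transform([input_text])
    # Predict the intent tag for this single message
    tag = clf.predict(features)[0]
    # Return a random response from the matching intent
    for intent in intents:
        if intent['tag'] == tag:
            return random.choice(intent['responses'])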
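The classifier always predicts one of the known tags, even for a message that shares no words with the training patterns. One possible refinement, sketched here under our own assumptions, is to return a fallback reply when the TF-IDF vector is empty. The names `chatbot_with_fallback`, `FALLBACK`, and the toy intents are hypothetical and not part of the original notebook.

```python
import random

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

FALLBACK = "Sorry, I didn't understand that. Could you rephrase?"

toy_intents = [
    {"tag": "greeting", "patterns": ["Hello", "Hi there", "Hey"],
     "responses": ["Hello!", "Hi there!"]},
    {"tag": "goodbye", "patterns": ["Bye", "Goodbye", "See you later"],
     "responses": ["Goodbye!", "Take care!"]},
]

# Flatten the intents into parallel lists of patterns and tags
toy_patterns = [p for intent in toy_intents for p in intent["patterns"]]
toy_tags = [intent["tag"] for intent in toy_intents for _ in intent["patterns"]]
responses_by_tag = {intent["tag"]: intent["responses"] for intent in toy_intents}

toy_vectorizer = TfidfVectorizer()
toy_clf = LogisticRegression(random_state=0, max_iter=1000)
toy_clf.fit(toy_vectorizer.fit_transform(toy_patterns), toy_tags)


def chatbot_with_fallback(message):
    features = toy_vectorizer.transform([message])
    # nnz == 0 means none of the words were seen during training
    if features.nnz == 0:
        return FALLBACK
    tag = toy_clf.predict(features)[0]
    return random.choice(responses_by_tag[tag])


print(chatbot_with_fallback("Hello"))
print(chatbot_with_fallback("xyzzy"))
```

This check only catches messages with no known words at all; a message that mixes known and unknown words is still classified as usual.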


## Deploying the Chatbot

To give the chatbot a user-friendly interface, we will use Streamlit, a framework for building interactive web applications with minimal code. The interface consists of a text box where users type a message and an area that shows the chatbot's reply. Streamlit reruns the script each time the input changes, so every new message is passed to the chatbot function and its response is displayed immediately.

In [18]:
def main():
    st.title("End-to-End Chatbot")
    st.write("Type your message below to start chatting with the chatbot.")

    user_input = st.text_input("You:")

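    if user_input:
        response = chatbot(user_input)
        st.text_area("Chatbot:", value=response, height=100)
        # Stop once the bot has answered with one of the "goodbye" responses
        goodbye_responses = next(i['responses'] for i in intents if i['tag'] == 'goodbye')
        if response in goodbye_responses:
            st.write("Thank you for chatting with me!")
            st.stop()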

if __name__ == '__main__':
    main()


Note: running this cell inside the notebook only prints Streamlit warnings. A Streamlit app has to be launched from a terminal with `streamlit run` followed by the path of the script containing this code; session state does not function when the script is run without `streamlit run`.
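Because the Streamlit interface only works when launched with `streamlit run`, a plain console loop can be a convenient way to try the chatbot from a notebook or terminal. The sketch below is one way to do this; `run_console_chat`, its parameters, and the exit words are our own illustrative choices, and the demonstration uses a stand-in reply function instead of the trained model.

```python
def run_console_chat(respond, read=input, write=print,
                     exit_words=("bye", "goodbye", "quit")):
    """Chat in the console until the user types an exit word; return the number of turns."""
    turns = 0
    while True:
        message = read("You: ").strip()
        if not message:
            continue  # ignore empty input
        write(f"Chatbot: {respond(message)}")
        turns += 1
        if message.lower() in exit_words:
            write("Thank you for chatting with me!")
            return turns


# Scripted demonstration: inputs come from a list instead of the keyboard
scripted_inputs = iter(["Hello", "", "Bye"])
transcript = []
turn_count = run_console_chat(
    respond=lambda message: f"(reply to {message!r})",
    read=lambda prompt: next(scripted_inputs),
    write=transcript.append,
)
print("\n".join(transcript))
```

With the trained model in place, `run_console_chat(chatbot)` would read from the keyboard and print the chatbot's replies until one of the exit words is typed.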
