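This is a chatbot that classifies queries into intents using TF-IDF features and a scikit-learn classifier (the code below trains a random forest; logistic regression is imported but not used), with the added twist of a training function where a superuser can add answers to unknown queries.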

The chatbot is intended for use as a customer service bot to handle queries about teaching resources.

In [7]:
# modules for file handling
import pandas as pd
import numpy as np
import json

# modules for training the questions and responses
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report

# modules for running the chatbot
import nltk
from nltk.chat.util import Chat, reflections
from nltk.stem import WordNetLemmatizer

# datetime for timestamps
import datetime

# handling annoying warnings
import warnings
warnings.filterwarnings('ignore') 

<h1>Training the model</h1>

In [8]:
# Load the dataset
data = pd.read_csv('questions.csv')
data.head()

Unnamed: 0,question,intent
0,Hello!,greet_hello
1,Hi there!,greet_hello
2,Hey!,greet_hello
3,Good morning!,greet_hello
4,Good afternoon!,greet_hello


This dataset links likely questions to intents; the intents map to response options that are drawn from a JSON file.
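As a rough illustration of that mapping, the sketch below assumes `responses.json` holds a list of objects with `intent` and `responses` keys, which is the shape the lookup code later in the notebook reads. The sample entries and the `lookup` helper are made up for illustration and are not taken from the real file.

```python
import json

# Hypothetical sample in the shape the notebook's lookup code expects:
# a list of {"intent": ..., "responses": [...]} objects.
sample_json = """
[
    {"intent": "greet_hello", "responses": ["Hello!", "Hi, how can I help?"]},
    {"intent": "length_of_slides", "responses": ["Each chapter has about ____ slides."]}
]
"""

responses = json.loads(sample_json)

# Index by intent once so each lookup is a dictionary access
# rather than a scan through the list.
responses_by_intent = {item["intent"]: item["responses"] for item in responses}

def lookup(intent):
    """Return the response options for an intent, or a fallback message."""
    fallback = ["I'm sorry, I don't understand your question."]
    return responses_by_intent.get(intent, fallback)

print(lookup("greet_hello"))
print(lookup("unknown_intent"))
```

Building the dictionary up front is only a convenience; the notebook's own loop over the list gives the same result.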

In [9]:
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(data['question'], data['intent'], test_size=0.3, random_state=100)

# Vectorize the text data
vectorizer = TfidfVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)

# Train the model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_vectorized, y_train)

# Evaluate the model
y_pred = model.predict(X_test_vectorized)
print(classification_report(y_test, y_pred))

                                        precision    recall  f1-score   support

                 accessibility_options       0.00      0.00      0.00         1
additional_resources_solutions_manuals       0.00      0.00      0.00         0
       customize_difficulty_test_banks       0.00      0.00      0.00         1
                       customize_style       1.00      0.67      0.80         3
          difficulty_levels_test_banks       0.00      0.00      0.00         0
      errors_updates_solutions_manuals       1.00      1.00      1.00         2
            explanations_in_test_banks       0.50      1.00      0.67         1
                 format_lecture_slides       0.33      0.50      0.40         2
              format_solutions_manuals       1.00      0.67      0.80         3
                     format_test_banks       0.00      0.00      0.00         1
                           greet_hello       1.00      1.00      1.00         1
                  incorporate_feedback 

In [13]:
def predict_intent(user_input):
    user_input_vectorized = vectorizer.transform([user_input])
    intent = model.predict(user_input_vectorized)
    return intent[0]

# Example usage
user_input = "How many slides does a chapter have?"
predicted_intent = predict_intent(user_input)
print(user_input)
print(f"Predicted Intent: {predicted_intent}")

# Load responses from JSON file
with open('responses.json', 'r') as file:
    responses = json.load(file)

def get_response(intent):
    for item in responses:
        if item['intent'] == intent:
            return item['responses']
    return ["I'm sorry, I don't understand your question."]

# Example usage
response = get_response(predicted_intent)
print(response)

print('\n')

def chatbot_response(user_input):
    intent = predict_intent(user_input)
    responses = get_response(intent)
    return responses[0]

# Example conversation
user_input = "Can you tell me the format of the test banks?"
response = chatbot_response(user_input)
print(user_input)
print(f"Chatbot: {response}")

How many slides does a chapter have?
Predicted Intent: length_of_slides
['The typical length of lecture slides per chapter is about ____ slides.', "Each chapter's lecture slides usually contain around ____ slides."]


Can you tell me the format of the test banks?
Chatbot: Yes, we include explanations for ____ of the questions.


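The above code tests the model using the responses in the JSON file. The first question is matched to a sensible intent, but the second, which asks about the format of the test banks, gets a response about explanations, so the classifier has picked the wrong intent there.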
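One way to dig into a miss like the second example is to look at the runner-up intents rather than only the top prediction. The helper below is a sketch: `top_intents` is a hypothetical name, and the labels and probabilities are made-up stand-ins for what `model.classes_` and `model.predict_proba(...)[0]` would supply.

```python
def top_intents(classes, probabilities, k=3):
    """Return the k most probable (intent, probability) pairs, highest first."""
    ranked = sorted(zip(classes, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Made-up values standing in for model.classes_ and model.predict_proba(...)[0]
classes = ["explanations_in_test_banks", "format_test_banks", "greet_hello"]
probabilities = [0.41, 0.38, 0.21]

for intent, prob in top_intents(classes, probabilities, k=2):
    print(f"{intent}: {prob:.2f}")
```

If the correct intent shows up as a close second, that suggests the two intents need more distinct training questions rather than a different model.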

<h1>Chatbot model</h1>

In [16]:
def run_chatbot():
    # Load the dataset
    data = pd.read_csv('questions.csv')

    # Load responses from JSON file
    with open('responses.json', 'r') as file:
        responses = json.load(file)

    # Lemmatizer for preprocessing (created here but not yet applied to the questions)
    lemmatizer = WordNetLemmatizer()

    # Vectorize the text data
    vectorizer = TfidfVectorizer()
    X_vectorized = vectorizer.fit_transform(data['question'])
    model = RandomForestClassifier(n_estimators=150, random_state=42)
    model.fit(X_vectorized, data['intent'])
    print('Model initialised')

    # Function to predict the intent with a probability threshold
    def predict_intent(user_input, threshold=0.5):
        user_input_vectorized = vectorizer.transform([user_input])
        # Class probabilities for the single input row
        probabilities = model.predict_proba(user_input_vectorized)[0]
        max_prob = np.max(probabilities)
        print(max_prob)
        if max_prob >= threshold:
            # Most probable class, without running a second prediction
            return model.classes_[np.argmax(probabilities)]
        return None

    # Function to get response based on intent
    def get_response(intent):
        for item in responses:
            if item['intent'] == intent:
                return item['responses']
        return ["I'm sorry, I don't understand your question."]

    # Training mode functions
    def save_data_q(new_data, data_file):
        new_data.to_csv(data_file, mode='a', index=False, header=False)

    # Function to save data to JSON file
    def save_data_r(data, filename):
        with open(filename, 'r') as file:
            existing_data = json.load(file)

        for index, row in data.iterrows():
            intent = row['intent']
            response = row['response']
            found = False
            for entry in existing_data:
                if entry['intent'] == intent:
                    entry['responses'].append(response)
                    found = True
                    break
            if not found:
                # Add a new dictionary to the list if the intent was not found
                existing_data.append({'intent': intent, 'responses': [response]})

        with open(filename, 'w') as file:
            json.dump(existing_data, file, indent=4)

    # Chatbot function
    def chatbot_response(user_input, training_mode):
        intent = predict_intent(user_input)
        if intent is None:
            if not training_mode:
                return "I don't know the answer to that. Can you provide more details or ask another question?"
            else:
                print("I don't know the answer to that. Please provide the correct response.")
                question = user_input
                response = input("Correct response: ")
                old_data = pd.read_csv('questions.csv')
                intent_options = set(old_data['intent'])
                print(f"Current intent options are: {intent_options}")
                intent = input("Choose an intent option, or enter a new intent: ")
                new_data = pd.DataFrame({'question': [question],'intent': [intent],'response': [response]})
                save_data_q(new_data[['question', 'intent']], 'questions.csv')
                save_data_r(new_data, 'responses.json')
                return "Thanks! I'll remember that for next time."
        responses = get_response(intent)
        return responses[0]

    def save_chat_log(conversation):
        timestamp = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        filename = f"chat_log_{timestamp}.txt"
        with open(filename, 'w') as file:
            for entry in conversation:
                file.write(f"{entry['user']}: {entry['message']}\n")
                file.write(f"{entry['bot']}: {entry['response']}\n")
        print(f"Chat log saved as {filename}")

    # Establish training mode
    training_mode = False

    # Chat with the bot
    print("Chatbot is ready to talk! Type 'quit' to stop.")
    conversation = []

    while True:
        user_input = input("You: ")
        if user_input.lower() == "quit":
            break
        if user_input.lower() == "enter training mode":
            training_mode = True
            print("Training mode entered. Quit and restart to update model.")
            continue
        response = chatbot_response(user_input, training_mode)
        print(f"Chatbot: {response}")

        # Store the conversation
        conversation.append({
                'user': 'You',
                'message': user_input,
                'bot': 'Chatbot',
                'response': response
            })

    # Save the chat log at the end of the conversation
    save_chat_log(conversation)
    return

In [17]:
run_chatbot()

Model initialised
Chatbot is ready to talk! Type 'quit' to stop.


You:  I need help with writing lecture slides


0.36
Chatbot: I don't know the answer to that. Can you provide more details or ask another question?


You:  enter training mode


Training mode entered. Quit and restart to update model.


You:  I need some help with writing lecture slides


0.58
Chatbot: I would love to help you write some lecture slides! Can you tell me exactly what you want to know?


You:  How many lecture slides do I need?


0.42
I don't know the answer to that. Please provide the correct response.


Correct response:  For each chapter, you need ___ slides.


Current intent options are: {'coverage_of_topics_test_banks', 'update_frequency_test_banks', 'difficulty_levels_test_banks', 'incorporate_feedback', 'customize_difficulty_test_banks', 'errors_updates_solutions_manuals', 'help_info', 'accessibility_options', 'unique_features', 'learning_objectives_test_banks', 'customize_style', 'length_of_slides', 'format_test_banks', 'greet_hello', 'format_solutions_manuals', 'type_of_questions_test_banks', 'greet_how_are_you', 'pineapples', 'step_by_step_solutions', 'explanations_in_test_banks', 'instructor_notes', 'additional_resources_solutions_manuals', 'slides_help_general', 'visual_aids', 'format_lecture_slides'}


Choose an intent option, or enter a new intent:  length_of_slides


Chatbot: Thanks! I'll remember that for next time.


You:  I want to know how many slides I need per chapter


0.7733333333333333
Chatbot: The typical length of lecture slides per chapter is about ____ slides.


You:  quit


Chat log saved as chat_log_2024-08-08_13-10-15.txt


The above shows how the chatbot handles questions. The highest predicted class probability is printed for transparency, and any prediction below the 0.5 threshold is treated as unknown. If the chatbot can't answer an obvious question, the user can enter training mode and teach the bot the correct answer. When the chatbot is restarted, it is retrained on the updated data.
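The threshold decision can be sketched on its own, without the model. `pick_intent` is a hypothetical helper, and the probabilities are made up, loosely echoing the scores printed in the transcript; the 0.5 default matches the `threshold` argument in the notebook's `predict_intent`.

```python
def pick_intent(probabilities_by_intent, threshold=0.5):
    """Return the most probable intent, or None if it falls below the threshold."""
    intent, prob = max(probabilities_by_intent.items(), key=lambda pair: pair[1])
    return intent if prob >= threshold else None

# Made-up probabilities; the remaining mass would be spread over other intents.
low_confidence = {"slides_help_general": 0.36, "length_of_slides": 0.30}
high_confidence = {"slides_help_general": 0.58, "length_of_slides": 0.20}

print(pick_intent(low_confidence))   # below 0.5, so treated as unknown
print(pick_intent(high_confidence))  # at or above 0.5, so the intent is returned
```

Raising the threshold makes the bot fall back to "I don't know" more often, which in training mode means more prompts for the superuser.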

When the user quits, the chat log is saved as a text file for review.

In [18]:
run_chatbot()

Model initialised
Chatbot is ready to talk! Type 'quit' to stop.


You:  I need help with writing lecture slides


0.36
Chatbot: I don't know the answer to that. Can you provide more details or ask another question?


You:  I need some help with writing lecture slides


0.58
Chatbot: I would love to help you write some lecture slides! Can you tell me exactly what you want to know?


You:  How many slides do I need?


0.7133333333333334
Chatbot: The typical length of lecture slides per chapter is about ____ slides.


You:  How many lecture slides do I need?


0.84
Chatbot: The typical length of lecture slides per chapter is about ____ slides.


You:  quit


Chat log saved as chat_log_2024-08-08_13-11-14.txt


It obviously needs a lot more training data to start with. I find it useful to have one of the generative AI models generate variations of questions. Still, the training function is a cool way to make improvements on the fly, though sometimes you need to repeat yourself so the bot gets enough exposure to make a confident prediction.
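A quick way to see where more training data would help most is to count the examples per intent. The sketch below uses a made-up list of intent labels in place of `data['intent']`; `thin_intents` and the `min_examples` cut-off of 5 are illustrative choices, not part of the notebook.

```python
from collections import Counter

def thin_intents(intents, min_examples=5):
    """Return (intent, count) pairs for intents with fewer than min_examples rows."""
    counts = Counter(intents)
    return sorted((intent, n) for intent, n in counts.items() if n < min_examples)

# Made-up labels standing in for data['intent']
sample_intents = ["greet_hello"] * 5 + ["length_of_slides"] * 2 + ["visual_aids"]

for intent, n in thin_intents(sample_intents):
    print(f"{intent}: only {n} example(s)")
```

Intents that show up in this list are the ones most likely to need repeated teaching before the bot predicts them confidently.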