<a href="https://colab.research.google.com/github/SyedShehry/Text_Generator_using_LSTM/blob/main/Python_Course_Chatbot_Trained_on_Both.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python Course Chatbot
This notebook builds a retrieval-based chatbot from a formatted Q&A dataset. The TF-IDF vectorizer is fitted on the combined question and answer text, so a user query can match wording from either field.

In [None]:
# Import necessary libraries
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import pickle


## Load and Preprocess the Dataset

In [None]:
# Load the dataset
file_path = '/content/Formatted_Python_Course_QA_Dataset.csv'  # Replace with your dataset path
dataset = pd.read_csv(file_path, encoding='latin1')

# Remove prefixes like 'Q1:', 'A35:' from questions and answers
import re
def clean_text(text):
    return re.sub(r'\b[QA]\d+[:.]\s*', '', str(text)).strip()

# Handle missing values before cleaning: clean_text calls str(),
# which would otherwise turn NaN into the literal string 'nan'
dataset['question'] = dataset['question'].fillna("").apply(clean_text)
dataset['answer'] = dataset['answer'].fillna("Sorry, no answer available.").apply(clean_text)

questions = dataset['question'].tolist()
answers = dataset['answer'].tolist()
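
As a quick check of what the prefix pattern strips, the sketch below applies the same regular expression to a few made-up strings (the sample inputs are illustrative, not rows from the dataset). Because the pattern starts with `\b`, it also removes a label such as `Q2:` that appears in the middle of a sentence.

```python
import re

def clean_text(text):
    # Same pattern as above: 'Q' or 'A', digits, then ':' or '.', plus trailing spaces
    return re.sub(r'\b[QA]\d+[:.]\s*', '', str(text)).strip()

# Hypothetical sample strings, not taken from the real dataset
samples = [
    "Q1: What is Python?",
    "A35. Python is a programming language.",
    "See Q2: for details",
]
cleaned = [clean_text(s) for s in samples]

for before, after in zip(samples, cleaned):
    print(f"{before!r} -> {after!r}")
```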


## Train TF-IDF Model on Combined Text

In [None]:
# Combine questions and answers for training
combined_text = [
    f"{q} {a}" for q, a in zip(dataset['question'], dataset['answer'])
]

# Train the TF-IDF vectorizer on the combined text
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(combined_text)

# Save the TF-IDF vectorizer and dataset for deployment
with open("tfidf_vectorizer.pkl", "wb") as f:
    pickle.dump(vectorizer, f)

with open("questions_answers.pkl", "wb") as f:
    pickle.dump({"questions": questions, "answers": answers}, f)

print("Model and data saved successfully.")

Model and data saved successfully.
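
Only the fitted vectorizer and the question/answer lists are pickled above; the document-term matrix `X` is not. A deployment script would therefore likely need to rebuild that matrix after loading. The following is a minimal sketch of that round trip, assuming a tiny hypothetical dataset and a temporary directory in place of the real files.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data standing in for the course dataset
toy_questions = ["What is a loop?", "What is a variable?"]
toy_answers = ["A loop repeats a block of code.", "A variable stores a value."]
toy_combined = [f"{q} {a}" for q, a in zip(toy_questions, toy_answers)]
toy_vectorizer = TfidfVectorizer().fit(toy_combined)

with tempfile.TemporaryDirectory() as tmp:
    vec_path = os.path.join(tmp, "tfidf_vectorizer.pkl")
    qa_path = os.path.join(tmp, "questions_answers.pkl")

    # Save, mirroring the cell above
    with open(vec_path, "wb") as f:
        pickle.dump(toy_vectorizer, f)
    with open(qa_path, "wb") as f:
        pickle.dump({"questions": toy_questions, "answers": toy_answers}, f)

    # Load, as a separate deployment script would
    with open(vec_path, "rb") as f:
        loaded_vectorizer = pickle.load(f)
    with open(qa_path, "rb") as f:
        loaded_data = pickle.load(f)

# Rebuild the matrix with transform (not fit) so the vocabulary stays the same
loaded_combined = [
    f"{q} {a}" for q, a in zip(loaded_data["questions"], loaded_data["answers"])
]
loaded_matrix = loaded_vectorizer.transform(loaded_combined)

query_vec = loaded_vectorizer.transform(["how does a loop work"])
best_idx = int(cosine_similarity(query_vec, loaded_matrix).argmax())
restored_answer = loaded_data["answers"][best_idx]
print(restored_answer)
```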


## Define Chatbot Functions

In [None]:
def show_welcome_message():
    """Display a welcoming message when the chatbot starts."""
    welcome_message = (
        "🌟 Welcome to STEAM Minds! 🌟\n"
        "Your hub for learning, innovation, and discovery.\n"
        "I'm here to help you with your Python queries and more.\n"
        "Let's get started—ask me anything!"
    )
    print(welcome_message)

def get_response(user_input, threshold=0.2):
    """Fetch the most relevant answer based on user input, with a similarity threshold."""
    user_input_vec = vectorizer.transform([user_input])
    similarity = cosine_similarity(user_input_vec, X)
    best_match_idx = similarity.argmax()
    best_match_score = similarity[0, best_match_idx]

    if best_match_score >= threshold:
        return answers[best_match_idx]
    else:
        return (
            "🔍 At STEAM Minds, we strive for clarity and innovation!\n"
            "It seems your query is a bit out of my scope right now.\n"
            "Could you provide more details or ask something related to our Python courses?\n"
            "Together, we’ll discover the answers!"
        )
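
To see how the similarity threshold separates a match from the fallback, here is a small sketch on a hypothetical three-row corpus. The names `toy_corpus`, `toy_answers`, and `toy_response` are illustrative only; the function returns `None` where `get_response` would return the fallback message.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical answers and combined question+answer text
toy_answers = [
    "It draws shapes and patterns.",
    "It repeats a block of code.",
    "It stores a value.",
]
toy_corpus = [
    "What does the turtle module do? It draws shapes and patterns.",
    "What is a for loop? It repeats a block of code.",
    "What is a variable? It stores a value.",
]

toy_vectorizer = TfidfVectorizer()
toy_matrix = toy_vectorizer.fit_transform(toy_corpus)

def toy_response(query, threshold=0.2):
    # Score the query against every row and keep the best one
    scores = cosine_similarity(toy_vectorizer.transform([query]), toy_matrix).ravel()
    best = int(scores.argmax())
    # None stands in for the fallback message
    return toy_answers[best] if scores[best] >= threshold else None

in_scope = toy_response("turtle module")
# No word of this query is in the vocabulary, so every score is 0
out_of_scope = toy_response("weather tomorrow")
print(in_scope, out_of_scope)
```

A query whose words never occur in the fitted vocabulary is transformed to an all-zero vector, so it always falls below any positive threshold.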

## Test the Chatbot

In [None]:

# Example usage
if __name__ == "__main__":
    show_welcome_message()

    while True:
        user_input = input("You: ")
        if user_input.lower() in ['exit', 'quit']:
            print("Bot: Thank you for visiting STEAM Minds. Have a great day! 🚀")
            break
        response = get_response(user_input)
        print(f"Bot: {response}")

🌟 Welcome to STEAM Minds! 🌟
Your hub for learning, innovation, and discovery.
I'm here to help you with your Python queries and more.
Let's get started—ask me anything!
You: python module
Bot: Module 2 of the Intermediate Level covers: Python loops with Input/Output and conditional statements. Python built-in functions and Input/Output. Python loops, strings, string methods, and functions. Python Input/Output with loops and different conditions. Python loops with formula-based tasks. Python plotting using the Turtle module. Commands and for loops in Python. Turtle programming to create shapes and patterns.
You: steam minds
Bot: STEAM Minds serves K-12 students worldwide, aiming to build future innovators and critical thinkers through its STEAM education programs.
You: python module 2
Bot: Module 2 of the Intermediate Level covers: Python loops with Input/Output and conditional statements. Python built-in functions and Input/Output. Python loops, strings, string methods, and functions. 

## Semantic Search with Sentence Transformers

In [None]:
# Import necessary libraries
import pandas as pd
import re
from sentence_transformers import SentenceTransformer, util



In [None]:
# Load the dataset
file_path = 'Formatted_Python_Course_QA_Dataset.csv'  # Replace with your dataset path
dataset = pd.read_csv(file_path, encoding='latin1')

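# Remove prefixes like 'Q1:', 'A35:' and strip surrounding whitespace
def clean_text(text):
    if pd.isna(text):  # Keep missing cells empty instead of the literal string 'nan'
        return ""
    text = re.sub(r'\b[QA]\d+[:.]\s*', '', str(text))  # Remove prefixes
    return text.strip()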

dataset['question'] = dataset['question'].apply(clean_text)
dataset['answer'] = dataset['answer'].apply(clean_text)

# Save cleaned dataset for future use
cleaned_file_path = 'Cleaned_Python_Course_QA_Dataset.csv'
dataset.to_csv(cleaned_file_path, index=False)
print(f"Cleaned dataset saved to {cleaned_file_path}")


Cleaned dataset saved to Cleaned_Python_Course_QA_Dataset.csv


In [None]:
# Load a pre-trained Sentence Transformer model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Create embeddings for the cleaned questions
question_embeddings = model.encode(dataset['question'].tolist(), convert_to_tensor=True)



In [None]:
def get_semantic_response(user_query, threshold=0.5):
    """Fetch the most relevant answer based on semantic similarity."""
    user_embedding = model.encode(user_query, convert_to_tensor=True)
    scores = util.cos_sim(user_embedding, question_embeddings)[0]
    best_match_idx = scores.argmax().item()  # Convert tensor index to integer
    if scores[best_match_idx] > threshold:
        return dataset['answer'].iloc[best_match_idx]
    else:
        return (
            "I'm sorry, I couldn't find a relevant answer.\n"
            "Could you try rephrasing your question or providing more details?"
        )
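
`util.cos_sim` scores the query embedding against every stored embedding by cosine similarity. The sketch below reproduces that lookup in NumPy, using hand-written 3-dimensional vectors as stand-ins for real model embeddings (`all-MiniLM-L6-v2` produces 384-dimensional vectors). All names and values here are hypothetical.

```python
import numpy as np

# Hypothetical stand-ins for question embeddings and their answers
toy_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
toy_answers = ["answer about loops", "answer about variables", "answer about both"]

def cosine_scores(query_vec, matrix):
    # Cosine similarity = dot product divided by the product of the norms
    row_norms = np.linalg.norm(matrix, axis=1)
    return matrix @ query_vec / (row_norms * np.linalg.norm(query_vec))

def toy_lookup(query_vec, threshold=0.5):
    scores = cosine_scores(np.asarray(query_vec, dtype=float), toy_embeddings)
    best = int(scores.argmax())
    # None stands in for the "couldn't find a relevant answer" message
    return toy_answers[best] if scores[best] > threshold else None

close_match = toy_lookup([0.9, 0.1, 0.0])   # points almost along the first row
no_match = toy_lookup([0.0, 0.0, 1.0])      # orthogonal to every stored vector
print(close_match, no_match)
```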


In [None]:
# Example usage
if __name__ == "__main__":
    print("🌟 Welcome to STEAM Minds! 🌟")
    print("Your hub for Python programming questions.")

    while True:
        user_input = input("You: ")
        if user_input.lower() in ['exit', 'quit']:
            print("Bot: Thank you for visiting STEAM Minds. Have a great day! 🚀")
            break
        response = get_semantic_response(user_input)
        print(f"Bot: {response}")


🌟 Welcome to STEAM Minds! 🌟
Your hub for Python programming questions.
You: python
Bot: This course is designed to teach Python programming, one of the most popular and easiest high-level programming languages to learn. Students will learn fundamental programming concepts and create their own programs. The course is designed to take students from a basic to an advanced level, making them experienced programmers.
You: python module 1
Bot: Module 1 covers topics such as Python variables and their usage, declaring variables, introduction to functions and loops, and exploring function arguments and types of statements.
You: steam minds
Bot: The company is led by CEO Mehtab Anwar Khalid, who has over seven years of experience in startups and a background in MSCS.
You: what is steam minds?
Bot: STEAM Minds' mission is to transform K-12 STEAM education by delivering AI-driven, interactive, simulative, and gamified learning experiences. The goal is to create an inclusive and immersive educatio

## Hybrid Matching: Semantic Search with a Bag-of-Words Fallback

In [None]:
# Import necessary libraries
import pandas as pd
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity as sk_cosine_similarity
from sentence_transformers import SentenceTransformer, util


In [None]:
# Load the dataset
file_path = 'Formatted_Python_Course_QA_Dataset.csv'  # Replace with your dataset path
dataset = pd.read_csv(file_path, encoding='latin1')

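# Remove prefixes like 'Q1:', 'A35:' and strip surrounding whitespace
def clean_text(text):
    if pd.isna(text):  # Keep missing cells empty instead of the literal string 'nan'
        return ""
    text = re.sub(r'\b[QA]\d+[:.]\s*', '', str(text))  # Remove prefixes
    return text.strip()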

dataset['question'] = dataset['question'].apply(clean_text)
dataset['answer'] = dataset['answer'].apply(clean_text)

# Combine questions and answers into a single text field
dataset['combined_text'] = dataset['question'] + " " + dataset['answer']

# Save cleaned dataset for future use
cleaned_file_path = 'Cleaned_Python_Course_QA_Dataset.csv'
dataset.to_csv(cleaned_file_path, index=False)
print(f"Cleaned dataset saved to {cleaned_file_path}")


Cleaned dataset saved to Cleaned_Python_Course_QA_Dataset.csv


In [None]:
# Load a pre-trained Sentence Transformer model
semantic_model = SentenceTransformer('all-MiniLM-L6-v2')

# Create embeddings for the combined text
question_embeddings = semantic_model.encode(dataset['combined_text'].tolist(), convert_to_tensor=True)

# Create a Bag-of-Words model for keyword matching
bow_vectorizer = CountVectorizer()
bow_matrix = bow_vectorizer.fit_transform(dataset['combined_text'])


In [None]:
def get_hybrid_response(user_query, semantic_threshold=0.5, keyword_threshold=0.2):
    """Fetch the most relevant answer based on hybrid semantic and keyword matching."""
    # Step 1: Semantic Matching
    user_embedding = semantic_model.encode(user_query, convert_to_tensor=True)
    semantic_scores = util.cos_sim(user_embedding, question_embeddings)[0]
    best_semantic_idx = semantic_scores.argmax().item()
    if semantic_scores[best_semantic_idx] > semantic_threshold:
        return dataset['answer'].iloc[best_semantic_idx]  # Retrieve the answer

    # Step 2: Keyword Matching
    user_bow_vector = bow_vectorizer.transform([user_query])
    keyword_scores = sk_cosine_similarity(user_bow_vector, bow_matrix).flatten()
    best_keyword_idx = keyword_scores.argmax()
    if keyword_scores[best_keyword_idx] > keyword_threshold:
        return dataset['answer'].iloc[best_keyword_idx]  # Retrieve the answer

    # Step 3: Fallback Response
    return (
        "I'm sorry, I couldn't find a relevant answer.\n"
        "Could you try rephrasing your question or providing more details?"
    )
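
The function above is a cascade: each matcher is tried in order, and the first one whose best score clears its threshold supplies the answer. The sketch below isolates that control flow with hypothetical lookup tables in place of the real scorers, and also returns the name of the stage that answered, which can help when tuning thresholds.

```python
def cascade(query, stages):
    """Return (stage_name, index) from the first stage whose best score clears its threshold."""
    for name, score_fn, threshold in stages:
        idx, score = score_fn(query)
        if score > threshold:
            return name, idx
    return "fallback", None

# Hypothetical scorers: each maps a query to (best_index, best_score)
semantic_table = {"what is a loop": (0, 0.81), "turtle": (2, 0.34)}
keyword_table = {"turtle": (2, 0.45)}

def semantic_score(query):
    return semantic_table.get(query, (0, 0.0))

def keyword_score(query):
    return keyword_table.get(query, (0, 0.0))

# Same order and default thresholds as get_hybrid_response
stages = [
    ("semantic", semantic_score, 0.5),
    ("keyword", keyword_score, 0.2),
]

results = {q: cascade(q, stages) for q in ["what is a loop", "turtle", "weather"]}
for query, outcome in results.items():
    print(query, "->", outcome)
```

Here "turtle" scores 0.34 semantically, which is below 0.5, so it falls through to the keyword stage and is answered there.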


In [None]:
# Example usage
if __name__ == "__main__":
    print("🌟 Welcome to STEAM Minds! 🌟")
    print("Your hub for Python programming questions.")

    while True:
        user_input = input("You: ")
        if user_input.lower() in ['exit', 'quit']:
            print("Bot: Thank you for visiting STEAM Minds. Have a great day! 🚀")
            break
        response = get_hybrid_response(user_input)
        print(f"Bot: {response}")

🌟 Welcome to STEAM Minds! 🌟
Your hub for Python programming questions.
You: python
Bot: Module 1 of the Intermediate Level includes: Python Input/Output, operators, user input, and import. Python Input/Output with formula-based tasks. Using the Cmath module for complex mathematical operations. Python Input/Output with the Random module. Python Input/Output and different commands. Python Input/Output with different conditions using the 'if' command. Python for loops, lists, matrices, and NumPy arrays. Python for loops with conditional statements and Input/Output.
You: python course
Bot: This course is designed to teach Python programming, one of the most popular and easiest high-level programming languages to learn. Students will learn fundamental programming concepts and create their own programs. The course is designed to take students from a basic to an advanced level, making them experienced programmers.
You: steam minds
Bot: You can visit the STEAM Minds website at https://www.stea

## Hybrid Matching: Semantic Search, Bag-of-Words, and TF-IDF

In [None]:
# Import necessary libraries
import pandas as pd
import re
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer, util


In [None]:
# Load the dataset
file_path = '/content/Cleaned_Python_Course_QA_Dataset.csv'  # Replace with your dataset path
dataset = pd.read_csv(file_path, encoding='latin1')

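# Remove prefixes like 'Q1:', 'A35:' and strip surrounding whitespace
def clean_text(text):
    if pd.isna(text):  # Keep missing cells empty instead of the literal string 'nan'
        return ""
    text = re.sub(r'\b[QA]\d+[:.]\s*', '', str(text))  # Remove prefixes
    return text.strip()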

dataset['question'] = dataset['question'].apply(clean_text)
dataset['answer'] = dataset['answer'].apply(clean_text)

# Combine questions and answers into a single text field
dataset['combined_text'] = dataset['question'] + " " + dataset['answer']

# Save cleaned dataset for future use
cleaned_file_path = 'Cleaned_Python_Course_QA_Dataset.csv'
dataset.to_csv(cleaned_file_path, index=False)
print(f"Cleaned dataset saved to {cleaned_file_path}")


Cleaned dataset saved to Cleaned_Python_Course_QA_Dataset.csv


In [None]:
# Load a pre-trained Sentence Transformer model
semantic_model = SentenceTransformer('all-MiniLM-L6-v2')

# Create embeddings for the combined text
semantic_embeddings = semantic_model.encode(dataset['combined_text'].tolist(), convert_to_tensor=True)


In [None]:
# Create a Bag-of-Words model for keyword matching
bow_vectorizer = CountVectorizer()
bow_matrix = bow_vectorizer.fit_transform(dataset['combined_text'])


In [None]:
# Create a TF-IDF model for weighted matching
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(dataset['combined_text'])
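
The Bag-of-Words and TF-IDF models differ in how they weight terms: raw counts treat every word equally, while TF-IDF down-weights words that occur in every document. A minimal sketch on a hypothetical three-document corpus, where "python" appears everywhere and "loops" appears once:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical corpus: "python" is in every document, "loops" in only one
toy_corpus = ["python loops", "python turtle", "python strings"]

count_vec = CountVectorizer()
counts = count_vec.fit_transform(toy_corpus).toarray()

tfidf_vec = TfidfVectorizer()
weights = tfidf_vec.fit_transform(toy_corpus).toarray()

# Column indices of the two terms in each model's vocabulary
py_c, loops_c = count_vec.vocabulary_["python"], count_vec.vocabulary_["loops"]
py_t, loops_t = tfidf_vec.vocabulary_["python"], tfidf_vec.vocabulary_["loops"]

# First document, "python loops": equal counts, unequal TF-IDF weights
print("counts:", counts[0, py_c], counts[0, loops_c])
print("tf-idf:", round(weights[0, py_t], 3), round(weights[0, loops_t], 3))
```

In the first document both words have a count of 1, but the TF-IDF weight of "loops" is higher than that of "python" because "loops" is rarer across the corpus.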


In [42]:
def get_hybrid_response(user_query, semantic_threshold=0.5, keyword_threshold=0.2, tfidf_threshold=0.2):
    """Fetch the most relevant answer based on Semantic Search, BoW, and TF-IDF."""

    # Step 1: Semantic Matching
    user_semantic_embedding = semantic_model.encode(user_query, convert_to_tensor=True)
    semantic_scores = util.cos_sim(user_semantic_embedding, semantic_embeddings)[0]
    best_semantic_idx = semantic_scores.argmax().item()
    if semantic_scores[best_semantic_idx] > semantic_threshold:
        return dataset['answer'].iloc[best_semantic_idx]

    # Step 2: Bag-of-Words Matching
    user_bow_vector = bow_vectorizer.transform([user_query])
    bow_scores = cosine_similarity(user_bow_vector, bow_matrix).flatten()
    best_bow_idx = bow_scores.argmax()
    if bow_scores[best_bow_idx] > keyword_threshold:
        return dataset['answer'].iloc[best_bow_idx]

    # Step 3: TF-IDF Matching
    user_tfidf_vector = tfidf_vectorizer.transform([user_query])
    tfidf_scores = cosine_similarity(user_tfidf_vector, tfidf_matrix).flatten()
    best_tfidf_idx = tfidf_scores.argmax()
    if tfidf_scores[best_tfidf_idx] > tfidf_threshold:
        return dataset['answer'].iloc[best_tfidf_idx]

    # Step 4: Fallback Response
    return "I'm sorry, I couldn't find a relevant answer. Could you try rephrasing your question?"


In [None]:
if __name__ == "__main__":
    print("🌟 Welcome to STEAM Minds! 🌟")
    print("Your hub for Python programming questions.")

    while True:
        user_input = input("You: ")
        if user_input.lower() in ['exit', 'quit']:
            print("Bot: Thank you for visiting STEAM Minds. Have a great day! 🚀")
            break
        response = get_hybrid_response(user_input)
        print(f"Bot: {response}")


🌟 Welcome to STEAM Minds! 🌟
Your hub for Python programming questions.
You: python
Bot: Module 1 of the Intermediate Level includes: Python Input/Output, operators, user input, and import. Python Input/Output with formula-based tasks. Using the Cmath module for complex mathematical operations. Python Input/Output with the Random module. Python Input/Output and different commands. Python Input/Output with different conditions using the 'if' command. Python for loops, lists, matrices, and NumPy arrays. Python for loops with conditional statements and Input/Output.
You: python course
Bot: The Python Programming course teaches students how to code using Python, one of the most popular and versatile programming languages. It covers topics like variables, conditions, loops, functions, Object-Oriented Programming (OOP), modules, libraries, file I/O, GUI creation, and game/web development. The course aims to equip students with the skills to write Python programs and apply them to real-world a

In [41]:
from keras import backend as K
K.clear_session()
print("TensorFlow backend cleared.")


TensorFlow backend cleared.
