
# FAQ Chatbot: Computer Science Basics


### Project Description
This project implements an AI-powered FAQ chatbot that answers from a custom dataset
of 100 basic Computer Science questions and answers.

The chatbot uses:
- Data Cleaning & Preprocessing
- TF-IDF Vectorization
- Cosine Similarity
- Simple NLP pipeline

The goal is to return accurate answers to basic Computer Science questions.
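
As a rough illustration of the matching step listed above, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it by hand for two made-up vectors and compares the result with scikit-learn's `cosine_similarity`. The vectors and the names `query_vec` and `question_vec` are illustrative assumptions, not values from the dataset.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two made-up term-weight vectors (illustrative values only).
query_vec = np.array([1.0, 1.0, 0.0])
question_vec = np.array([1.0, 0.0, 0.0])

# cos(theta) = (a . b) / (|a| * |b|)
manual = query_vec @ question_vec / (
    np.linalg.norm(query_vec) * np.linalg.norm(question_vec)
)

# scikit-learn expects 2D inputs: one row per vector.
library = cosine_similarity([query_vec], [question_vec])[0, 0]

print(round(manual, 4), round(library, 4))
```

A score of 1 means the two vectors point in the same direction, and 0 means they share no terms.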


In [1]:
# IMPORT REQUIRED LIBRARIES

import pandas as pd
import numpy as np
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


In [4]:
# LOAD DATASET
dataset_path = "mofe.csv"
df = pd.read_csv(dataset_path)

print("Dataset Loaded Successfully!")
print("Number of Questions:", len(df))
df.head(4)  # not displayed: a notebook cell only renders its last expression
df.tail(4)


Dataset Loaded Successfully!
Number of Questions: 100


| Unnamed: 0 | question | answer | category |
|---|---|---|---|
| 96 | What is a computer? (Basic 97) | A computer is an electronic device that proces... | Basics |
| 97 | What is computer science? (Basic 98) | Computer science is the study of computation a... | Basics |
| 98 | What are input devices? (Basic 99) | Input devices are hardware used to enter data ... | Basics |
| 99 | What are output devices? (Basic 100) | Output devices display processed information t... | Basics |


In [6]:
# TEXT PREPROCESSING
df['question'] = df['question'].str.lower()

def clean_text(text):
    return text.translate(str.maketrans('', '', string.punctuation))

df['processed_question'] = df['question'].apply(clean_text)

print("Text preprocessing completed.")


Text preprocessing completed.
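
The dataset rows shown earlier end in tags such as `(Basic 97)`. After lowercasing and punctuation removal, those tags remain in the text as the words `basic` and `97`, which the vectorizer would likely treat as ordinary terms. The sketch below shows the effect on one sample row and one possible way to drop the tag first. `strip_numbered_suffix` and `SUFFIX_PATTERN` are hypothetical names that are not part of the notebook, and whether removing the tags improves matching on the full dataset is untested here.

```python
import re
import string

def clean_text(text):
    # Same approach as the notebook: drop every ASCII punctuation character.
    return text.translate(str.maketrans('', '', string.punctuation))

# Hypothetical pattern: a trailing "(basic N)" tag on an already lowercased question.
SUFFIX_PATTERN = re.compile(r"\s*\(basic \d+\)\s*$")

def strip_numbered_suffix(text):
    # Remove the trailing tag if present; other text is left unchanged.
    return SUFFIX_PATTERN.sub("", text)

sample = "What is a computer? (Basic 97)".lower()

as_is = clean_text(sample)
without_tag = clean_text(strip_numbered_suffix(sample))

print(as_is)        # what is a computer basic 97
print(without_tag)  # what is a computer
```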


In [7]:
# TF-IDF VECTORIZATION
vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = vectorizer.fit_transform(df['processed_question'])

print("TF-IDF Matrix Shape:", tfidf_matrix.shape)


TF-IDF Matrix Shape: (100, 122)


In [8]:
# CHATBOT RESPONSE
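def chatbot_response(user_input):
    """Return the answer whose question is most similar to user_input."""
    # Apply the same preprocessing used on the dataset questions.
    user_input = clean_text(user_input.lower())

    # Cosine similarity between the query and every stored question.
    user_vector = vectorizer.transform([user_input])
    similarity_scores = cosine_similarity(user_vector, tfidf_matrix)[0]

    best_match_index = np.argmax(similarity_scores)
    best_score = similarity_scores[best_match_index]

    # Below this similarity threshold, treat the query as unmatched.
    if best_score < 0.3:
        return "I'm not sure about that. Please ask a Computer Science related question."

    return df.iloc[best_match_index]['answer']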
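
When tuning the 0.3 threshold, it can help to look at the scores themselves and not only the final answer. The sketch below is one possible diagnostic, built on made-up questions so that it runs on its own. `top_matches` and the toy data are assumptions for illustration and do not appear in the notebook.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy questions, made up for illustration (not from the project's dataset).
questions = [
    "what is machine learning",
    "what is a database",
    "what is an algorithm",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(questions)

def top_matches(user_input, k=2):
    """Return up to k (question, score) pairs, best match first."""
    user_vector = vectorizer.transform([user_input.lower()])
    scores = cosine_similarity(user_vector, tfidf_matrix)[0]
    # Sort indices by descending score and keep the first k.
    order = np.argsort(scores)[::-1][:k]
    return [(questions[i], float(scores[i])) for i in order]

for question, score in top_matches("explain machine learning"):
    print(f"{score:.2f}  {question}")
```

A query made up entirely of words outside the fitted vocabulary produces an all-zero vector, so every score is 0 and the chatbot falls back to its default message.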


In [10]:

print("FAQ Chatbot is now running!")
print("Type 'exit' to stop the chatbot.")

while True:
    user_query = input("You: ")
    
    if user_query.strip().lower() == "exit":
        print("Chatbot: Goodbye!")
        break
    
    response = chatbot_response(user_query)
    print("Chatbot:", response)


FAQ Chatbot is now running!
Type 'exit' to stop the chatbot.


You:  ml


Chatbot: I'm not sure about that. Please ask a Computer Science related question.


You:  whatis artificl intelligence


Chatbot: AI is the simulation of human intelligence by machines.


You:  explain


Chatbot: I'm not sure about that. Please ask a Computer Science related question.


You:  what is a data


Chatbot: I'm not sure about that. Please ask a Computer Science related question.


You:  what is machine learning


Chatbot: Machine learning allows systems to learn from data.


You:  exit


Chatbot: Goodbye!


In [11]:

test_questions = [
    "What is machine learning?",
    "Explain database",
    "What is an operating system?",
    "Define algorithm",
    "What is Python?"
]

for question in test_questions:
    print("Question:", question)
    print("Answer:", chatbot_response(question))
    print("-" * 50)


Question: What is machine learning?
Answer: Machine learning allows systems to learn from data.
--------------------------------------------------
Question: Explain database
Answer: A database is an organized collection of data.
--------------------------------------------------
Question: What is an operating system?
Answer: I'm not sure about that. Please ask a Computer Science related question.
--------------------------------------------------
Question: Define algorithm
Answer: An algorithm is a step-by-step solution to a problem.
--------------------------------------------------
Question: What is Python?
Answer: Python is a high-level interpreted programming language.
--------------------------------------------------
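
In the run above, one of the five test questions received the fallback message. For a larger test set, that proportion can be tracked with a small helper like the sketch below. `fallback_rate` and `fake_response` are hypothetical names. `fake_response` is only a stand-in so the example runs on its own, and in the notebook `chatbot_response` would be passed in its place.

```python
FALLBACK_PREFIX = "I'm not sure about that."

def fallback_rate(respond, questions):
    """Return (fraction of fallback replies, list of questions that fell back)."""
    misses = [q for q in questions if respond(q).startswith(FALLBACK_PREFIX)]
    rate = len(misses) / len(questions) if questions else 0.0
    return rate, misses

# Stand-in responder with a single known question (illustrative only).
def fake_response(question):
    known = {
        "what is python?": "Python is a high-level interpreted programming language.",
    }
    fallback = FALLBACK_PREFIX + " Please ask a Computer Science related question."
    return known.get(question.lower(), fallback)

rate, misses = fallback_rate(
    fake_response,
    ["What is Python?", "What is an operating system?"],
)
print(rate, misses)
```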


# CONCLUSION


This project demonstrates how a simple AI FAQ chatbot can be built using:
- NLP preprocessing
- TF-IDF feature extraction
- Cosine similarity matching

The chatbot retrieves the most relevant answer from the dataset based on similarity scoring.

This system can be improved by:
- Adding deep learning models (LSTM, BERT)
- Deploying using Flask or Streamlit
- Hosting on cloud platforms