# Intelligent Chatbot Development Project

## Introduction
This project builds a Smart Career Guidance Chatbot that:
- Understands user intent (e.g., “skills for data analyst”).
- Extracts entities (job titles, tools, skills).
- Retrieves or generates helpful responses.
- Adapts tone with basic sentiment awareness.

### Goals
1. Create a clean, reproducible notebook pipeline.
2. Start with a **retrieval-first** chatbot (fast, controllable).
3. Add NER + sentiment to personalize replies.
4. Keep everything version-controlled with Git.
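
To make the retrieval-first goal concrete, here is a minimal, dependency-free sketch of the idea: store example questions with canned responses, score an incoming query against each stored question, and return the best match or a fallback. It deliberately swaps the sentence embeddings used later in this notebook for plain bag-of-words cosine similarity so that it runs with the standard library alone. The names `FAQ`, `FALLBACK`, `vectorize`, `cosine`, `retrieve`, and the `threshold` value are illustrative assumptions, not part of the project code.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base: (example question, canned response).
FAQ = [
    ("What skills do I need for data science?",
     "Data science needs Python, SQL, statistics, and machine learning."),
    ("Where should I start learning data analysis?",
     "Start with Python, then learn data visualization."),
    ("Hello", "Hello! How can I help you today?"),
]
FALLBACK = "Sorry, I don't have an answer for that yet."


def vectorize(text):
    """Lowercase the text, keep alphabetic tokens, and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, threshold=0.2):
    """Return the response of the most similar stored question.

    Falls back to a default message when the best score is below
    `threshold` (an arbitrary value chosen for this sketch).
    """
    q = vectorize(query)
    scored = [(cosine(q, vectorize(question)), response)
              for question, response in FAQ]
    best_score, best_response = max(scored, key=lambda pair: pair[0])
    return best_response if best_score >= threshold else FALLBACK


print(retrieve("skills for data science"))
print(retrieve("pizza toppings"))
```

Because every reply comes from a fixed list, the bot cannot produce anything that was not written in advance, which is what makes this approach fast and controllable. An embedding model would change how `vectorize` and `cosine` are computed, but not the surrounding retrieve-or-fallback logic.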

### Milestones
- **M1:** Data loading + text cleaning
- **M2:** Embeddings + retrieval
- **M3:** Intent classifier (baseline)
- **M4:** NER + sentiment integration
- **M5:** Simple UI (Gradio) for demo


In [1]:
%pip install --quiet numpy pandas scikit-learn nltk spacy "sentence-transformers>=3.0.0" transformers gradio

Note: you may need to restart the kernel to use updated packages.


In [2]:
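import nltk
# Download the small NLTK resources used later: tokenizer models and stopword lists.
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)  # newer NLTK releases look up this resource for tokenization
nltk.download("stopwords", quiet=True)

import spacy
try:
    spacy.load("en_core_web_sm")
except OSError:
    # Model not installed yet: download it into the current environment.
    from spacy.cli import download
    download("en_core_web_sm")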

In [3]:
%pip install tf-keras

Collecting tf-keras
  Using cached tf_keras-2.20.1-py3-none-any.whl.metadata (1.8 kB)
Collecting tensorflow<2.21,>=2.20 (from tf-keras)
  Using cached tensorflow-2.20.0-cp310-cp310-win_amd64.whl.metadata (4.6 kB)
Collecting protobuf>=5.28.0 (from tensorflow<2.21,>=2.20->tf-keras)
  Using cached protobuf-6.33.0-cp310-abi3-win_amd64.whl.metadata (593 bytes)
Collecting tensorboard~=2.20.0 (from tensorflow<2.21,>=2.20->tf-keras)
  Using cached tensorboard-2.20.0-py3-none-any.whl.metadata (1.8 kB)
Collecting keras>=3.10.0 (from tensorflow<2.21,>=2.20->tf-keras)
  Using cached keras-3.12.0-py3-none-any.whl.metadata (5.9 kB)
Collecting ml_dtypes<1.0.0,>=0.5.1 (from tensorflow<2.21,>=2.20->tf-keras)
  Using cached ml_dtypes-0.5.3-cp310-cp310-win_amd64.whl.metadata (9.2 kB)
Using cached tf_keras-2.20.1-py3-none-any.whl (1.7 MB)
Using cached tensorflow-2.20.0-cp310-cp310-win_amd64.whl (331.7 MB)
Downloading keras-3.12.0-py3-none-any.whl (1.5 MB)

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-intel 2.18.0 requires ml-dtypes<0.5.0,>=0.4.0, but you have ml-dtypes 0.5.3 which is incompatible.
tensorflow-intel 2.18.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0dev,>=3.20.3, but you have protobuf 6.33.0 which is incompatible.
tensorflow-intel 2.18.0 requires tensorboard<2.19,>=2.18, but you have tensorboard 2.20.0 which is incompatible.
tensorflow-metadata 1.16.1 requires protobuf<4.21,>=3.20.3; python_version < "3.11", but you have protobuf 6.33.0 which is incompatible.


In [4]:
import tf_keras as keras
import sys, platform
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import nltk, spacy

print("Python:", sys.version.split()[0], "| OS:", platform.system())
for name, mod in [
    ("numpy", np), ("pandas", pd),
    # The next two entries are classes, not modules. They expose no
    # __version__, so they print "OK" (meaning the import succeeded).
    ("sentence-transformers", SentenceTransformer),
    ("transformers", AutoTokenizer),
    ("spacy", spacy), ("nltk", nltk),
]:
    # Fall back to "OK" when the object has no version string.
    v = getattr(mod, "__version__", "OK")
    print(f"{name:22s} -> {v}")

# Quick sanity check: load a tiny embedding model
embedder = SentenceTransformer("all-MiniLM-L6-v2")
sample = ["hello world", "career in data science"]
emb = embedder.encode(sample, normalize_embeddings=True)
print("Embeddings shape:", emb.shape)







Python: 3.10.16 | OS: Windows
numpy                  -> 2.0.2
pandas                 -> 2.2.3
sentence-transformers  -> OK
transformers           -> OK
spacy                  -> 3.8.7
nltk                   -> 3.9.2


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


Embeddings shape: (2, 384)


## Step 2 — Data Setup (2A: toy dataset)
We’ll create a tiny in-notebook dataset to verify our flow before using a real Kaggle dataset.


In [1]:
import pandas as pd

data = {
    "intent": [
        "ask_skills_for_data_science",
        "ask_skills_for_ai",
        "ask_job_recommendation",
        "ask_learning_path",
        "greeting",
        "goodbye",
    ],
    "patterns": [
        ["What skills do I need for data science?", "How to become a data scientist?"],
        ["What do I need to study for AI?", "What are AI skills?"],
        ["What job is good for me if I like numbers?", "I enjoy problem-solving, any job ideas?"],
        ["Where should I start learning data analysis?", "Best way to learn machine learning?"],
        ["Hi", "Hello", "Hey"],
        ["Bye", "See you later", "Goodbye"],
    ],
    "responses": [
        "Data Science needs skills like Python, SQL, Statistics, and Machine Learning.",
        "AI requires knowledge of Python, Neural Networks, and Data Modeling.",
        "You might enjoy careers like Data Analyst, Statistician, or Research Scientist.",
        "Start with Python, then learn data visualization and basic machine learning.",
        "Hello! How can I help you today?",
        "Goodbye! Keep learning and stay curious!",
    ],
}

df = pd.DataFrame(data)
df


| | intent | patterns | responses |
|---|---|---|---|
| 0 | ask_skills_for_data_science | [What skills do I need for data science?, How ... | Data Science needs skills like Python, SQL, St... |
| 1 | ask_skills_for_ai | [What do I need to study for AI?, What are AI ... | AI requires knowledge of Python, Neural Networ... |
| 2 | ask_job_recommendation | [What job is good for me if I like numbers?, I... | You might enjoy careers like Data Analyst, Sta... |
| 3 | ask_learning_path | [Where should I start learning data analysis?,... | Start with Python, then learn data visualizati... |
| 4 | greeting | [Hi, Hello, Hey] | Hello! How can I help you today? |
| 5 | goodbye | [Bye, See you later, Goodbye] | Goodbye! Keep learning and stay curious! |


In [2]:
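# Flatten the frame: one row per example pattern, each carrying
# the intent label and canned response of its parent row.
rows = []
for _, row in df.iterrows():
    for pattern in row["patterns"]:
        rows.append({"text": pattern, "intent": row["intent"], "response": row["responses"]})

chatbot_df = pd.DataFrame(rows)
chatbot_df.head(10)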


| | text | intent | response |
|---|---|---|---|
| 0 | What skills do I need for data science? | ask_skills_for_data_science | Data Science needs skills like Python, SQL, St... |
| 1 | How to become a data scientist? | ask_skills_for_data_science | Data Science needs skills like Python, SQL, St... |
| 2 | What do I need to study for AI? | ask_skills_for_ai | AI requires knowledge of Python, Neural Networ... |
| 3 | What are AI skills? | ask_skills_for_ai | AI requires knowledge of Python, Neural Networ... |
| 4 | What job is good for me if I like numbers? | ask_job_recommendation | You might enjoy careers like Data Analyst, Sta... |
| 5 | I enjoy problem-solving, any job ideas? | ask_job_recommendation | You might enjoy careers like Data Analyst, Sta... |
| 6 | Where should I start learning data analysis? | ask_learning_path | Start with Python, then learn data visualizati... |
| 7 | Best way to learn machine learning? | ask_learning_path | Start with Python, then learn data visualizati... |
| 8 | Hi | greeting | Hello! How can I help you today? |
| 9 | Hello | greeting | Hello! How can I help you today? |
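
The same one-row-per-pattern reshaping can also be written without an explicit loop. The sketch below uses pandas' `DataFrame.explode` on a two-row stand-in frame (its rows are shortened placeholders, not the full toy dataset) and renames the columns to match the `text` / `intent` / `response` layout shown above. The variable name `flat` is an assumption made for this example.

```python
import pandas as pd

# Two-row stand-in for the toy frame (same column names, shortened content).
df = pd.DataFrame({
    "intent": ["greeting", "goodbye"],
    "patterns": [["Hi", "Hello"], ["Bye"]],
    "responses": ["Hello! How can I help you today?", "Goodbye!"],
})

flat = (
    # One output row per list element; other columns are repeated.
    df.explode("patterns", ignore_index=True)
      # Match the column names used by the loop-based version.
      .rename(columns={"patterns": "text", "responses": "response"})
      # Reorder columns: text first, then label, then reply.
      [["text", "intent", "response"]]
)

print(flat)
```

For a dataset this small the loop and `explode` are interchangeable; `explode` mainly saves code and tends to scale better once the patterns come from a larger file.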
