University Student Support — Intent Detection

A machine learning project that classifies student questions into predefined intent categories. The system supports two model backends for comparison:

Classic ML — TF-IDF + best classifier selected via stratified cross-validation (scikit-learn)
BERT — bert-base-uncased fine-tuned for sequence classification (Hugging Face + PyTorch)

Intent Schema

The classifier recognizes 11 university support intents:

Intent	Example question
`ask_schedule`	What time does my calculus class start?
`ask_registration`	How do I register for courses next semester?
`ask_tuition`	When is the tuition payment deadline?
`ask_deadline`	What is the last day to withdraw from a course?
`ask_location`	Where is the registrar's office located?
`ask_contact`	How can I contact my academic advisor?
`ask_gpa`	What GPA do I need to maintain my scholarship?
`ask_courses`	What courses are required for a CS degree?
`ask_accommodation`	How do I apply for on-campus housing?
`greeting`	Hi, I need some help.
`goodbye`	Thanks, that was very helpful. Goodbye.

Dataset

File: data/intent-detection-train.jsonl
Format: one JSON object per line, fields text and label
Size: 333 examples (38 for ask_contact, extended with informal/email-style queries; 30 for each of 9 original intents; 25 for ask_accommodation)
Language: English
Balance: near-uniform; number of classes is inferred dynamically at train time
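
As a minimal sketch of reading this format, the snippet below parses JSONL lines into (text, label) pairs and counts labels. The helper name load_examples and the two sample records are illustrative assumptions, not the project's own loader.

```python
import json
from collections import Counter


def load_examples(lines):
    """Parse JSONL lines into (text, label) pairs, skipping blank lines."""
    examples = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Fail early if a record lacks one of the two documented fields.
        if "text" not in record or "label" not in record:
            raise ValueError(f"line {line_number}: expected 'text' and 'label'")
        examples.append((record["text"], record["label"]))
    return examples


# Two illustrative records in the documented format.
sample_lines = [
    '{"text": "Where is the registrar\'s office located?", "label": "ask_location"}',
    '{"text": "Hi, I need some help.", "label": "greeting"}',
]
examples = load_examples(sample_lines)
label_counts = Counter(label for _, label in examples)
print(label_counts)
```

With the real file, the same function can be fed an open file handle, since iterating over a text file yields one line at a time.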

Splits are generated deterministically via stratified sampling:

Split	Default size
Train	70%
Validation	10%
Test	20%
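
A deterministic stratified split with these proportions could look roughly like the pure-Python sketch below. The function name, the seed value, and the per-label rounding are assumptions for illustration; the project's prepare-data step may work differently (for example, via scikit-learn).

```python
import random
from collections import defaultdict


def stratified_split(examples, val_frac=0.10, test_frac=0.20, seed=42):
    """Split (text, label) pairs per label so each split keeps the label mix."""
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)

    rng = random.Random(seed)  # fixed seed (assumed value) -> same split every run
    train, val, test = [], [], []
    for label in sorted(by_label):  # sorted for a stable iteration order
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        n_val = round(len(group) * val_frac)
        test.extend(group[:n_test])
        val.extend(group[n_test:n_test + n_val])
        train.extend(group[n_test + n_val:])
    return train, val, test


# Synthetic data: 30 examples for each of two labels.
data = [(f"question {i}", "ask_gpa") for i in range(30)]
data += [(f"hello {i}", "greeting") for i in range(30)]
train, val, test = stratified_split(data)
print(len(train), len(val), len(test))  # 42 6 12
```

Splitting within each label group, rather than over the whole dataset, is what keeps small classes such as ask_accommodation represented in every split.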

Project Structure

university-intent-detection/
├── analysis/
│   ├── scripts/
│   │   ├── common.py                  # Shared I/O helpers
│   │   ├── eda_report.py              # Dataset summary + intent distribution
│   │   ├── robustness_experiments.py  # Lexical overlap + baseline CV stats
│   │   ├── generate_stress_test.py    # Rule-based perturbed test generation
│   │   └── run_all.py                 # Runs all analysis scripts in sequence
│   └── outputs/                       # Generated analysis artifacts
├── apps/
│   └── app.py                         # Streamlit demo app
├── config/
│   └── settings.yaml                  # Central configuration
├── data/
│   ├── intent-detection-train.jsonl   # Main dataset (333 examples, 11 intents)
│   ├── intent-detection-test-perturbed.jsonl  # Stress-test variants
│   └── splits/                        # Deterministic train/val/test splits
├── logs/                              # Runtime logs
├── models/                            # Saved model artifacts
├── outputs/evaluations/               # Structured evaluation outputs (JSON)
├── src/
│   ├── cli/main.py                    # Unified CLI (train/predict/evaluate/evaluate-robustness/prepare-data/augment-data)
│   ├── config/                        # Config loader and typed settings
│   ├── data/                          # Dataset preparation and augmentation
│   ├── models/                        # base, classic, bert, registry
│   ├── services/                      # Training, evaluation, prediction orchestration
│   └── utils/                         # Dataset I/O, logging, run helpers
├── Dockerfile
├── Makefile
└── requirements.txt

Setup

python -m venv .venv
.venv\Scripts\python -m pip install --upgrade pip
.venv\Scripts\python -m pip install -r requirements.txt

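The commands above use the Windows virtual-environment layout; on macOS or Linux, use .venv/bin/python in place of .venv\Scripts\python.

Or with the Makefile: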

make install

Workflow

1. Prepare dataset splits

make prepare-data

Creates data/splits/train.jsonl, validation.jsonl, test.jsonl.

2. (Optional) Augment training data

make augment-data

Creates data/splits/train_expanded.jsonl with rule-based variants added to balance classes.

3. Train

make train MODEL=classic
make train MODEL=bert

Or directly:

python -m src.cli.main train --model classic --dataset data/intent-detection-train.jsonl
python -m src.cli.main train --model bert   --dataset data/intent-detection-train.jsonl

4. Predict

make predict MODEL=classic TEXT="How do I register for classes next semester?"
make predict MODEL=bert   TEXT="How do I register for classes next semester?"

Or directly:

python -m src.cli.main predict --model classic --text "How do I register for classes next semester?"
python -m src.cli.main predict --model bert   --text "How do I register for classes next semester?"

5. Evaluate

make evaluate MODEL=classic
make evaluate MODEL=bert

Structured evaluation artifacts are saved to outputs/evaluations/<model>/<timestamp>/:

summary.json: run metadata and split info
metrics.json: accuracy, macro F1, weighted F1, precision, recall
classification_report.json: per-class precision/recall/F1
confusion_matrix.json: full confusion matrix
predictions.json: per-sample true vs predicted labels
error_analysis.json: per-class error breakdown and top confused label pairs

5a. Robustness evaluation (optional)

Evaluates both models on a perturbed stress-test dataset and logs a side-by-side comparison. Requires the perturbed dataset to exist (run python analysis/scripts/generate_stress_test.py first).

python -m src.cli.main evaluate-robustness

Or with a custom dataset path:

python -m src.cli.main evaluate-robustness --dataset data/intent-detection-test-perturbed.jsonl
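
For the step 5 artifacts, macro F1 (reported in metrics.json) and the top confused label pairs (error_analysis.json) can be derived from per-sample predictions roughly as follows. This is a pure-Python sketch with hypothetical function names and made-up predictions, not the project's evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def top_confused_pairs(y_true, y_pred, k=3):
    """Most frequent (true, predicted) pairs among the misclassified samples."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(k)


# Made-up predictions for illustration only.
y_true = ["ask_deadline", "ask_deadline", "ask_tuition", "ask_tuition", "greeting"]
y_pred = ["ask_deadline", "ask_tuition", "ask_tuition", "ask_tuition", "greeting"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.822
print(top_confused_pairs(y_true, y_pred))  # [(('ask_deadline', 'ask_tuition'), 1)]
```

Because macro F1 averages classes without weighting by support, a weak small class lowers the score as much as a weak large one.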

6. Run the Streamlit demo

make app

Or:

.venv\Scripts\python -m streamlit run apps/app.py

Docker

make docker-train   MODEL=classic
make docker-predict MODEL=classic TEXT="What time is my physics class?"
make docker-evaluate MODEL=classic

make docker-train   MODEL=bert
make docker-app

Exploratory Analysis

python analysis/scripts/run_all.py

Or individually:

python analysis/scripts/eda_report.py
python analysis/scripts/robustness_experiments.py
python analysis/scripts/generate_stress_test.py

Outputs are written to analysis/outputs/.
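
To give a feel for rule-based perturbation, here is a small sketch that produces stress-test variants of one record. The three rules (lower-casing, dropping trailing punctuation, swapping adjacent letters) and all function names are assumptions; the actual rules in generate_stress_test.py are not documented here and may differ.

```python
import random


def drop_punctuation(text):
    """Remove trailing punctuation and spaces."""
    return text.rstrip("?.! ")


def swap_adjacent_chars(text, rng):
    """Simulate a typo by swapping two neighbouring letters inside one word."""
    words = text.split()
    candidates = [i for i, w in enumerate(words) if len(w) >= 4 and w.isalpha()]
    if not candidates:
        return text
    idx = rng.choice(candidates)
    word = words[idx]
    pos = rng.randrange(1, len(word) - 2)  # keep first and last letters fixed
    words[idx] = word[:pos] + word[pos + 1] + word[pos] + word[pos + 2:]
    return " ".join(words)


def perturb(record, seed=0):
    """Return perturbed copies of one {"text", "label"} record; labels unchanged."""
    rng = random.Random(seed)
    text = record["text"]
    variants = [text.lower(), drop_punctuation(text), swap_adjacent_chars(text, rng)]
    # Keep only variants that actually differ from the original text.
    return [{"text": v, "label": record["label"]} for v in variants if v != text]


record = {"text": "When is the tuition payment deadline?", "label": "ask_tuition"}
for variant in perturb(record):
    print(variant)
```

Every variant keeps the original label, so a drop in accuracy on the perturbed set reflects sensitivity to surface form rather than a change in intent.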

Models

Classic ML (TF-IDF + sklearn)

Vectorizer: TfidfVectorizer with custom tokenizer (whitespace/apostrophe split + contraction expansion)
Contraction expansion: 36-entry lookup table applied before tokenization (e.g. "don't" → "do not")
Candidates: Random Forest, Logistic Regression, SVM, Naive Bayes, Gradient Boosting, KNN
Selection: stratified k-fold cross-validation, best macro F1 wins
SVC fitted with probability=True (Platt scaling) to enable per-prediction confidence scores
Saved to: models/traditional/
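
The tokenizer described above might be sketched as below. The five lookup entries are illustrative (the project's 36-entry table is not reproduced), and the lower-casing step and function names are assumptions.

```python
import re

# A few illustrative entries; the real table has 36.
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "i'm": "i am",
    "what's": "what is",
    "it's": "it is",
}


def expand_contractions(text):
    """Lower-case the text and replace known contractions word by word."""
    words = text.lower().split()
    return " ".join(CONTRACTIONS.get(word, word) for word in words)


def tokenize(text):
    """Expand contractions, then split on whitespace and apostrophes."""
    expanded = expand_contractions(text)
    return [token for token in re.split(r"[\s']+", expanded) if token]


print(tokenize("I don't know what's required"))
# ['i', 'do', 'not', 'know', 'what', 'is', 'required']
```

A callable like tokenize can be passed to scikit-learn's TfidfVectorizer through its tokenizer parameter, which replaces the default token pattern.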

BERT (`bert-base-uncased`)

Tokenizer: BertTokenizer
Model: BertForSequenceClassification fine-tuned on training split; num_labels inferred dynamically
Optimizer: AdamW with linear warmup scheduler (warmup = 10% of total training steps)
Validation loop: macro F1 computed on data/splits/validation.jsonl after each epoch
Early stopping: patience = 2 epochs; best checkpoint (highest val macro F1) is restored before saving
Confidence scores: softmax over logits, max probability returned alongside predicted intent
Saved to: models/bert/
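
The warmup schedule and early-stopping rule can be illustrated without PyTorch. The sketch below assumes a linear ramp over the first 10% of steps followed by linear decay to zero, and reads "patience = 2" as stopping after two consecutive epochs without a new best validation macro F1; both function names are hypothetical.

```python
def warmup_linear_factor(step, total_steps, warmup_frac=0.10):
    """Learning-rate multiplier: ramps 0 -> 1 during warmup, then decays 1 -> 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def best_epoch_with_early_stopping(val_f1_per_epoch, patience=2):
    """Return (best_epoch, epochs_run) for a sequence of validation macro F1 scores."""
    best_f1, best_epoch, bad_epochs = float("-inf"), None, 0
    for epoch, f1 in enumerate(val_f1_per_epoch, start=1):
        if f1 > best_f1:
            best_f1, best_epoch, bad_epochs = f1, epoch, 0  # new best checkpoint
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch  # stop here; the best checkpoint is restored
    return best_epoch, len(val_f1_per_epoch)


print(warmup_linear_factor(5, 100), warmup_linear_factor(55, 100))  # 0.5 0.5
print(best_epoch_with_early_stopping([0.71, 0.80, 0.79, 0.78, 0.90]))  # (2, 4)
```

In the second call, training stops after epoch 4, so the later 0.90 score is never reached; that is the usual trade-off of a short patience window.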

BERT training output — UNEXPECTED and MISSING weights (normal)

When you run train --model bert for the first time, Hugging Face prints a message like this:

Some weights of BertForSequenceClassification were not initialized from the model checkpoint
at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a downstream task ...

Some weights of the model checkpoint at bert-base-uncased were not used when initializing
BertForSequenceClassification: ['cls.predictions.bias', ...]

This is normal. It is not an error.

Here is what each part means:

MISSING / "newly initialized"

classifier.weight and classifier.bias are the linear layer on top of BERT that maps the [CLS] token representation to your intent labels. bert-base-uncased was pre-trained on general English text for masked-token prediction — it has no classification layer. When the code calls BertForSequenceClassification.from_pretrained(..., num_labels=N), Hugging Face creates this layer fresh and initialises it randomly. It is supposed to be missing from the checkpoint — that is the whole point of fine-tuning. These weights are what get trained during your epochs.

UNEXPECTED / "not used"

bert-base-uncased ships with a masked language model head (cls.predictions.*) that was used during pre-training to predict masked tokens. BertForSequenceClassification has no slot for this head — it only needs the encoder. Those weights are discarded at load time. They are "unexpected" from the classification model's perspective.

The exact line that triggers both messages is in src/models/bert.py:

self.model = BertForSequenceClassification.from_pretrained(
    self.model_name, num_labels=self._num_labels  # bert.py:86-88
)

num_labels is derived dynamically two lines earlier:

self._num_labels = int(len(self.label_encoder.classes_))

The classifier head is always the right size for whatever labels are in your training split — no hardcoding.
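
As a rough pure-Python analogue of what the label encoder provides, the sketch below derives the class list and the label count from training labels. The helper name and sample labels are made up; the project itself uses scikit-learn's LabelEncoder, which also stores its classes in sorted order.

```python
def fit_label_index(labels):
    """Map each distinct label to an integer id, in sorted order."""
    classes = sorted(set(labels))
    return classes, {label: idx for idx, label in enumerate(classes)}


# Made-up training labels for illustration.
train_labels = ["greeting", "ask_gpa", "greeting", "ask_tuition"]
classes, label_to_id = fit_label_index(train_labels)
num_labels = len(classes)  # plays the role of self._num_labels above
print(num_labels, label_to_id)
# 3 {'ask_gpa': 0, 'ask_tuition': 1, 'greeting': 2}
```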

When would these messages become a real problem?

The UNEXPECTED/MISSING messages are only a problem if:

bert.encoder.* or bert.embeddings.* appear in the MISSING list — that would mean the core encoder failed to load entirely.
A saved fine-tuned checkpoint is loaded into a fresh architecture with a different num_labels — PyTorch would raise a shape mismatch error, not just a warning.
A Python traceback appears in the lines immediately following these messages.

None of these apply when loading bert-base-uncased into BertForSequenceClassification for the first time.
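
The first condition can be turned into a small check over the list of missing weight names. The function below is a hypothetical helper operating on plain strings; obtaining the list programmatically (for example via the output_loading_info option of from_pretrained) is an assumption about how it would be wired into the project.

```python
CORE_PREFIXES = ("bert.encoder.", "bert.embeddings.")


def check_missing_keys(missing_keys):
    """Return ('problem', offenders) if any core encoder weight is missing, else ('ok', [])."""
    core_missing = [key for key in missing_keys if key.startswith(CORE_PREFIXES)]
    if core_missing:
        return "problem", core_missing
    return "ok", []


# Key names as they appear in the message quoted earlier.
status, offenders = check_missing_keys(["classifier.weight", "classifier.bias"])
print(status)  # ok
```

A freshly initialised classifier head passes the check, while any bert.encoder.* or bert.embeddings.* entry is flagged.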

Summary: training is successful if it proceeds past these messages and completes its epochs (or stops early on validation macro F1) without raising a traceback. The UNEXPECTED/MISSING lines appear before epoch 1 and are not a sign of failure in anything that follows.

Notes

The number of output classes is inferred dynamically from the label encoder — no hardcoded class count.
Both models return a confidence score alongside the predicted intent; the Streamlit app displays High / Medium / Low confidence labels (≥90% / ≥70% / below 70%).
The Streamlit app uses @st.cache_resource so models are loaded once per server process, not on every button click.
Retraining is required after adding new intents; the pipeline automatically picks up new labels via LabelEncoder.fit_transform.
If Hugging Face rate limits apply, set the HF_TOKEN environment variable.
On Windows, symlink warnings from huggingface_hub are cosmetic; enable Developer Mode to suppress them.
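
The confidence score and its High / Medium / Low label can be sketched in plain Python as follows. The thresholds come from the note above; the function names, the example logits, and the label order are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_label(probability):
    """Map a max probability to the High / Medium / Low buckets (>=0.90 / >=0.70 / below)."""
    if probability >= 0.90:
        return "High"
    if probability >= 0.70:
        return "Medium"
    return "Low"


def predict_with_confidence(logits, labels):
    """Return (predicted label, its probability, confidence bucket)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best], confidence_label(probs[best])


# Made-up logits for three intents.
intent, prob, bucket = predict_with_confidence(
    [4.0, 1.0, 0.0], ["ask_tuition", "ask_deadline", "greeting"]
)
print(intent, round(prob, 3), bucket)  # ask_tuition 0.936 High
```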
