# UniAssist – Safety and Scope Control Notebook

## Purpose
This notebook adds safety mechanisms to the UniAssist retrieval system.
An answer is returned only when the similarity score of the best match
reaches a set threshold. All other queries, including out-of-scope ones,
receive a fixed fallback message.

This reduces the risk of incorrect or misleading responses, which matters
for any professional or commercial deployment.

---

## Why Safety Is Required
A semantic similarity search always returns a "closest match",
even when the query is unrelated to anything in the dataset.
Without a check on the match score, the system would answer such
queries with an unrelated stored answer.

This notebook ensures that:
- Low-confidence matches are rejected
- Out-of-scope questions are handled politely
- User trust is maintained
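
The "closest match" problem can be illustrated with a minimal sketch. The vectors below are toy values made up for illustration, not real sentence embeddings, and `cosine_scores` is a hypothetical helper rather than part of UniAssist.

```python
import numpy as np

def cosine_scores(query, corpus):
    """Cosine similarity between one query vector and each corpus row."""
    query = query / np.linalg.norm(query)
    corpus = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return corpus @ query

# Toy 3-dimensional "embeddings" (assumed values, for illustration only).
toy_corpus = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])
unrelated_query = np.array([0.1, 0.2, 1.0])

scores = cosine_scores(unrelated_query, toy_corpus)
best_index = int(scores.argmax())

# argmax still selects a row, even though its score is low.
print(best_index, float(scores[best_index]))
```

Here `argmax` picks a row even though the query points in a mostly different direction, which is why the score itself has to be checked before an answer is returned.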

---

## What This Notebook Does
✔ Introduces similarity thresholds  
✔ Defines safe fallback responses  
✔ Adds scope control logic  
✔ Improves system reliability  

---

## What This Notebook Does NOT Do
✘ Train machine learning models  
✘ Change the dataset  
✘ Generate new facts  
✘ Deploy an application  

The safety logic sits on top of the pretrained embedding model and the
existing dataset; neither is modified.


# Stage 1

In [None]:
import os
os.listdir()


['.config', 'UniAssist_training_data.csv', 'sample_data']

In [None]:
import pandas as pd
import numpy as np

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity




In [None]:
qa_frame = pd.read_csv("UniAssist_training_data.csv")

retrieval_questions = qa_frame["question"].astype(str).tolist()
retrieval_answers = qa_frame["answer"].astype(str).tolist()


In [None]:
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")



In [None]:
question_embeddings = embedding_model.encode(
    retrieval_questions,
    show_progress_bar=True
)



In [None]:
# Minimum cosine similarity the best match must reach before its answer is returned.
SIMILARITY_THRESHOLD = 0.65
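
The notebook does not show how 0.65 was chosen. One possible way to sanity-check a threshold is to sweep a few candidates over best-match scores from queries labeled in-scope or out-of-scope. The sketch below assumes such labeled scores exist; the numbers are hypothetical placeholders, not measurements from UniAssist, and `sweep_thresholds` is an illustrative helper.

```python
import numpy as np

# Hypothetical best-match scores, made up for illustration. Real values
# would come from running labeled validation queries through the retriever.
in_scope_scores = np.array([0.91, 0.84, 0.78, 0.70, 0.62])
out_of_scope_scores = np.array([0.36, 0.41, 0.48, 0.55, 0.67])

def sweep_thresholds(in_scope, out_of_scope, candidates):
    """For each candidate threshold, report the share of in-scope queries
    answered and the share of out-of-scope queries rejected."""
    rows = []
    for threshold in candidates:
        answered = float(np.mean(in_scope >= threshold))
        rejected = float(np.mean(out_of_scope < threshold))
        rows.append((float(threshold), answered, rejected))
    return rows

results = sweep_thresholds(in_scope_scores, out_of_scope_scores, [0.50, 0.65, 0.80])
for threshold, answered, rejected in results:
    print(f"threshold={threshold:.2f}  answered={answered:.0%}  rejected={rejected:.0%}")
```

A lower threshold answers more in-scope questions but lets more unrelated queries through; a higher one does the opposite. The right balance depends on the real score distributions.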


In [None]:
SAFE_FALLBACK_MESSAGE = (
    "I’m sorry, I don’t have reliable information on this topic. "
    "UniAssist currently handles academic and internship-related queries only."
)


In [None]:
def safe_retrieve_answer(user_query):
    """
    Return (answer, score) for the stored question most similar to
    user_query. If the best cosine similarity is below
    SIMILARITY_THRESHOLD, the fallback message is returned instead.
    """
    query_vector = embedding_model.encode([user_query])
    similarity_scores = cosine_similarity(query_vector, question_embeddings)[0]

    best_index = similarity_scores.argmax()
    best_score = similarity_scores[best_index]

    if best_score < SIMILARITY_THRESHOLD:
        return SAFE_FALLBACK_MESSAGE, best_score

    return retrieval_answers[best_index], best_score
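
The threshold decision can also be separated from the embedding model so that it can be checked without downloading anything. The sketch below is one possible refactoring, not the notebook's own code: `pick_answer` and the toy inputs are hypothetical names and values introduced for illustration.

```python
import numpy as np

def pick_answer(similarity_scores, answers, threshold, fallback):
    """Return (answer, score), using the fallback when the best score
    is below the threshold. A score equal to the threshold is accepted."""
    best_index = int(np.argmax(similarity_scores))
    best_score = float(similarity_scores[best_index])
    if best_score < threshold:
        return fallback, best_score
    return answers[best_index], best_score

# Toy inputs (assumed values) covering accept, reject, and boundary cases.
toy_answers = ["answer A", "answer B"]
accepted = pick_answer(np.array([0.20, 0.90]), toy_answers, 0.65, "fallback")
rejected = pick_answer(np.array([0.20, 0.40]), toy_answers, 0.65, "fallback")
boundary = pick_answer(np.array([0.65, 0.10]), toy_answers, 0.65, "fallback")

print(accepted, rejected, boundary)
```

The boundary case shows that, with a strict `<` comparison, a score exactly at the threshold is still answered.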


In [None]:
question = "What is the minimum attendance requirement?"
answer, score = safe_retrieve_answer(question)

print("Question:", question)
print("Answer:", answer)
print("Similarity Score:", score)


Question: What is the minimum attendance requirement?
Answer: Students are required to maintain a minimum of 75% overall attendance and at least 60% attendance in each subject for all programs.
Similarity Score: 1.0000001
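
The score of 1.0000001 is slightly above the mathematical maximum of 1 for cosine similarity. This is consistent with float32 rounding on an exact match rather than an error in the retrieval. If scores are shown to users or logged, they could be clipped to the valid range, as in this small sketch (the variable names are illustrative):

```python
import numpy as np

# Value copied from the output above, stored as float32.
raw_score = np.float32(1.0000001)

# Clip into the valid cosine range [-1, 1] before displaying or logging.
clipped_score = float(np.clip(raw_score, -1.0, 1.0))

print(clipped_score)
```

Clipping does not affect the threshold decision, since any score at or above 1 is already well above 0.65.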


In [None]:
question = "What is the weather today?"
answer, score = safe_retrieve_answer(question)

print("Question:", question)
print("Answer:", answer)
print("Similarity Score:", score)


Question: What is the weather today?
Answer: I’m sorry, I don’t have reliable information on this topic. UniAssist currently handles academic and internship-related queries only.
Similarity Score: 0.36113927


## Notebook Summary — Safety and Scope Control

In this notebook:
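- A similarity threshold (0.65) was introduced
- Matches scoring below the threshold are rejected and replaced with a fallback message
- Out-of-scope queries are handled gracefully, as the weather example (score of about 0.36) shows
- The retrieval system now declines to answer instead of returning an unrelated match

This step makes UniAssist's behavior more predictable when a query falls outside its dataset.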
