# 🧠 Smart Score Calculator: An Analysis of Free-Text User Responses
_Free-text answer evaluation, entity recognition, automated scoring system_

---



## 1. Setup & Installation
Install dependencies and configure the environment.

In [1]:
!pip install datasets



In [2]:
%pip install sentence-transformers

Defaulting to user installation because normal site-packages is not writeable
Collecting sentence-transformers
  Using cached sentence_transformers-4.1.0-py3-none-any.whl.metadata (13 kB)
Using cached sentence_transformers-4.1.0-py3-none-any.whl (345 kB)
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-4.1.0
Note: you may need to restart the kernel to use updated packages.
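
After restarting the kernel, it can help to confirm that the packages are visible to the interpreter. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Packages this notebook imports later on
for name in ("datasets", "sentence-transformers", "transformers", "pandas"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

A missing package shows up as `not installed` instead of raising an error, so the check is safe to run before any of the later cells.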





---

## 2. Data Loading
Load the datasets: a custom CSV of predefined questions and the SQuAD v2 dataset.

In [3]:
import pandas as pd

# Custom dataset of predefined questions and reference answers
file_path = "C:/Users/saiha/Desktop/spring 2025 semester/NLP/Grad checkpoints/quiz_score_calculator/pre_defined_questions.csv"
custom_df = pd.read_csv(file_path)

In [4]:
import pandas as pd
from datasets import load_dataset

# Load the SQuAD v2 training split; unanswerable questions get an empty answer string
squad_dataset = load_dataset("squad_v2", split="train")  # SQuAD Dataset: https://huggingface.co/datasets/rajpurkar/squad_v2
squad_df = pd.DataFrame({
    "question": squad_dataset["question"],
    "context": squad_dataset["context"],
    "squad_answer": [ans["text"][0] if ans["text"] else "" for ans in squad_dataset["answers"]]
})

# Merge datasets (outer join to include all questions)
merged_df = pd.merge(custom_df, squad_df, on="question", how="outer")

print("Columns in merged_df:", merged_df.columns.tolist())

# Prefer the custom reference_answer when both answers exist
merged_df["final_answer"] = merged_df["reference_answer"].combine_first(merged_df["squad_answer"])

Columns in merged_df: ['question', 'reference_answer', 'context', 'squad_answer']
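
To see what the outer merge and `combine_first` do, here is a small sketch on toy data. The two frames below are hypothetical stand-ins for `custom_df` and `squad_df`; the rows are invented for illustration only.

```python
import pandas as pd

# Toy stand-ins for custom_df and squad_df (hypothetical rows, not real data)
toy_custom = pd.DataFrame({
    "question": ["Q1", "Q2"],
    "reference_answer": ["custom A1", "custom A2"],
})
toy_squad = pd.DataFrame({
    "question": ["Q2", "Q3"],
    "context": ["ctx 2", "ctx 3"],
    "squad_answer": ["squad A2", "squad A3"],
})

# The outer join keeps questions that appear in either frame
toy_merged = pd.merge(toy_custom, toy_squad, on="question", how="outer")

# combine_first takes reference_answer and falls back to squad_answer where it is NaN
toy_merged["final_answer"] = toy_merged["reference_answer"].combine_first(
    toy_merged["squad_answer"]
)
print(toy_merged[["question", "final_answer"]])
```

Here `Q2` exists in both frames and keeps the custom answer, while `Q3` exists only in the SQuAD stand-in and falls back to `squad_answer`.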



## 3. Question-Answering Model
Load the sentence-similarity model and the extractive question-answering model.


In [5]:
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# Load models
similarity_model = SentenceTransformer("all-MiniLM-L6-v2")  # Sentence-BERT Model: https://www.sbert.net/docs/pretrained_models.html
qa_model = pipeline("question-answering", model="deepset/bert-base-cased-squad2") # https://huggingface.co/deepset/bert-base-cased-squad2



Some weights of the model checkpoint at deepset/bert-base-cased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cpu


---

## 4. Score Calculation
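Score each answer by sentence similarity: take the highest cosine similarity between the user's answer and the available reference answers, and count the answer as correct when that score is at least 0.4.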

In [6]:
def score_answer(user_answer, question_row):
    # Extract answers from the merged dataset
    reference_answer = question_row["reference_answer"]
    squad_answer = question_row["squad_answer"]
    context = question_row["context"]

    scores = []

    # Encode the user's answer once; both comparisons below reuse it
    emb_user = similarity_model.encode(user_answer, convert_to_tensor=True)

    # Case 1: Compare against the custom reference answer
    if pd.notna(reference_answer):
        emb_ref = similarity_model.encode(reference_answer, convert_to_tensor=True)
        similarity_ref = util.cos_sim(emb_user, emb_ref).item()
        scores.append(similarity_ref)

    # Case 2: Compare against the answer the QA model extracts from the SQuAD context
    # Note: squad_answer is "" (not NaN) for unanswerable questions, so this branch still runs for them
    if pd.notna(context) and pd.notna(squad_answer):
        squad_result = qa_model(question=question_row["question"], context=context)
        squad_pred = squad_result["answer"]
        emb_squad = similarity_model.encode(squad_pred, convert_to_tensor=True)
        similarity_squad = util.cos_sim(emb_user, emb_squad).item()
        scores.append(similarity_squad)

    # Case 3: No answers available (edge case)
    if not scores:
        return 0.0

    # Return the highest score from available comparisons
    return max(scores)
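
The scoring rule itself (best cosine similarity against the available references, then a 0.4 threshold) can be illustrated without loading any model. This is a minimal sketch on hand-made 2-D vectors; `cosine_similarity`, `best_score`, and the toy vectors are hypothetical and stand in for the real sentence embeddings.

```python
import math

THRESHOLD = 0.4  # same cut-off the notebook uses for a "correct" answer


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_score(user_vec, reference_vecs):
    """Highest similarity across all references, or 0.0 when there are none."""
    if not reference_vecs:
        return 0.0
    return max(cosine_similarity(user_vec, ref) for ref in reference_vecs)


# Toy embeddings: one orthogonal reference and one at 45 degrees to the user vector
toy_user = [1.0, 0.0]
toy_refs = [[0.0, 1.0], [1.0, 1.0]]
toy_score = best_score(toy_user, toy_refs)
print(round(toy_score, 4), toy_score >= THRESHOLD)
```

Taking the maximum means a single close reference is enough to pass, which mirrors the `max(scores)` return in `score_answer`.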

## 5. Quiz Execution Testing
Run a short interactive quiz to check the scorer.


In [7]:
# Quick interactive check of the scorer
score = 0

# Sampling from merged_df did not return questions from the custom dataset,
# likely because SQuAD rows far outnumber it, so sample each dataset separately.
# sampled_questions = merged_df.sample(n=3)

custom_samples = custom_df.sample(n=2)
squad_samples = squad_df.sample(n=2)
sampled_questions = pd.concat([custom_samples, squad_samples])

for _, row in sampled_questions.iterrows():
    print(f"\nQuestion: {row['question']}")
    answer = input("Your answer: ")

    similarity_score = score_answer(answer, row)  # Best similarity across available references
    print(f"Similarity Score: {similarity_score}")
    # print(f"Reference Answer: {row['reference_answer']}")
    # print(f"SQUAD Answer: {row['squad_answer']}")

    # An answer counts as correct at or above the 0.4 similarity threshold
    if similarity_score >= 0.4:
        print("✅ Correct!")
        score += 1
    else:
        print(f"❌ Incorrect. Expected keyword: my reference answer: {row['reference_answer']},SQUAD answer: {row['squad_answer']}")

print(f"\nYour total score: {score}/{len(sampled_questions)}")



Question: How do economic factors influence stock markets?


Your answer:  due to risen in the import and export of goods. which will rise to increase and inflation comes up


Similarity Score: 0.4946717619895935
✅ Correct!

Question: COVID-19 mostly affects which part of the body?


Your answer:  nose, tongue and heart


Similarity Score: 0.4296804666519165
✅ Correct!

Question: What do developers commonly do when creating software that can lead to failures?


Your answer:  no idea


Similarity Score: 0.18294477462768555
❌ Incorrect. Expected keyword: my reference answer: nan,SQUAD answer: lack of backward compatibility

Question: What was built in 1960?


Your answer:  France


Similarity Score: 0.058430083096027374
❌ Incorrect. Expected keyword: my reference answer: nan,SQUAD answer: 

Your total score: 2/4
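
Because the loop above relies on `input()`, it cannot be rerun unattended. One way to make the same check repeatable is to grade pre-collected answers with a function that takes the scorer as a parameter. This is a sketch under assumptions: `grade_quiz`, `stub_score`, and the toy questions are hypothetical, and the stub uses exact matching only so the example runs without the models.

```python
def grade_quiz(questions, answers, score_fn, threshold=0.4):
    """Return (number_correct, per_question_scores) for pre-collected answers."""
    if len(questions) != len(answers):
        raise ValueError("questions and answers must have the same length")
    scores = [score_fn(answer, question) for question, answer in zip(questions, answers)]
    correct = sum(1 for s in scores if s >= threshold)
    return correct, scores


def stub_score(user_answer, question_row):
    """Hypothetical scorer: 1.0 on a case-insensitive exact match, else 0.0."""
    expected = question_row["reference_answer"].strip().lower()
    return 1.0 if user_answer.strip().lower() == expected else 0.0


# Toy questions and answers (invented for illustration)
toy_questions = [
    {"question": "What is 2 + 2?", "reference_answer": "four"},
    {"question": "What is the capital of France?", "reference_answer": "Paris"},
]
toy_answers = ["Four", "London"]

correct, scores = grade_quiz(toy_questions, toy_answers, stub_score)
print(f"Your total score: {correct}/{len(toy_questions)}")
```

In the notebook itself, `score_answer` could be passed as `score_fn`, with the rows taken from `sampled_questions.iterrows()`.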
