# Other Models Tried
This notebook explores alternative models for retrieval and generation.<br>
#### ***Explored Models:***
***Embedding Model:*** <br>sentence-transformers/all-MiniLM-L6-v2.<br>
***Cross-Encoders:***<br>
nli-roberta-base for efficient re-ranking.<br>
ms-marco-MiniLM-L-12-v2 for deeper semantic understanding.<br>
***Generation:***<br>
T5 (google/flan-t5-large) used for response generation.<br>
***Insights:***<br>
While these models performed well, GPT-3.5 combined with mpnet-base-v2 embeddings and hybrid retrieval consistently outperformed them in terms of precision, coherence, and response quality.

In [17]:
# faiss-cpu and faiss-gpu both provide the `faiss` module, so install only one of them
!pip install faiss-cpu rank-bm25 sentence-transformers bert-score



In [18]:
from bert_score import score
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
import numpy as np

## **Data Loading from GitHub**

In [19]:
import requests
import json

# GitHub repository details
repo_owner = "Sritejam"
repo_name = "datahub"
folder_path = "LLM/processed_documents"

# Function to fetch JSON files from GitHub API
def fetch_files_from_github_folder(owner, repo, folder):
    base_url = f"https://api.github.com/repos/{owner}/{repo}/contents/{folder}"
    try:
        response = requests.get(base_url, timeout=30)  # fail fast instead of hanging on a stalled connection
        response.raise_for_status()
        file_urls = []
        for item in response.json():
            if item['type'] == 'file' and item['name'].endswith('.json'):
                file_urls.append(item['download_url'])
        return file_urls
    except requests.exceptions.RequestException as e:
        print(f"Error fetching folder contents from GitHub: {e}")
        return []


# Fetch list of JSON files in the folder
document_files = fetch_files_from_github_folder(repo_owner, repo_name, folder_path)

# Initialize a list to store all text chunks
documents = []

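def process_chunk(chunk):
    """Flatten a nested chunk dict into one ' | '-separated string (non-dict list items are skipped)."""
    processed_text = []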
    for key, value in chunk.items():
        if isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    section_info = "; ".join(f"{k}: {v}" for k, v in item.items())
                    processed_text.append(f"{key} - {section_info}")
        elif isinstance(value, dict):
            nested_info = "; ".join(f"{k}: {v}" for k, v in value.items())
            processed_text.append(f"{key} - {nested_info}")
        else:
            processed_text.append(f"{key}: {value}")

    return " | ".join(processed_text)


# Raw-content URL of the FAQ file
faq_url = "https://raw.githubusercontent.com/Sritejam/datahub/main/LLM/FAQ.json"

# Function to fetch and parse JSON from raw GitHub URLs
def fetch_json_from_github(raw_url):
    try:
        response = requests.get(raw_url, timeout=30)  # fail fast instead of hanging on a stalled connection
        response.raise_for_status()  # Check for HTTP errors
        return response.json()  # Parse JSON content
    except requests.exceptions.RequestException as e:
        print(f"Error fetching file from GitHub: {raw_url}")
        print(f"Error message: {e}")
        return None
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON from file: {raw_url}")
        print(f"Error message: {e}")
        return None



# Fetch and process each document file
for file_url in document_files:
    data = fetch_json_from_github(file_url)
    if data:
        chunk_text = data.get('ChunkText')
        if isinstance(chunk_text, dict):
            documents.append(process_chunk(chunk_text))
        elif isinstance(chunk_text, str):
            documents.append(chunk_text)
        else:
            print(f"Loaded URL: {file_url}")

# Check the number of documents loaded
print(f"Loaded {len(documents)} documents.")

# Load preprocessed FAQs
faqs = fetch_json_from_github(faq_url)
if faqs:
    print(f"Loaded {len(faqs)} FAQs.")
else:
    print("Failed to load FAQs.")



Loaded URL: https://raw.githubusercontent.com/Sritejam/datahub/main/LLM/processed_documents/About_OISS.docx_chunk0.json
Loaded URL: https://raw.githubusercontent.com/Sritejam/datahub/main/LLM/processed_documents/Admitted_Student_Events_and_Orientation.docx_chunk0.json
Loaded URL: https://raw.githubusercontent.com/Sritejam/datahub/main/LLM/processed_documents/Admitted_Student_Events_and_Orientation.docx_chunk1.json
Loaded URL: https://raw.githubusercontent.com/Sritejam/datahub/main/LLM/processed_documents/Applying_to_UMBC.docx_chunk0.json
Loaded URL: https://raw.githubusercontent.com/Sritejam/datahub/main/LLM/processed_documents/Applying_to_UMBC.docx_chunk1.json
Loaded URL: https://raw.githubusercontent.com/Sritejam/datahub/main/LLM/processed_documents/CPT.docx_chunk0.json
Loaded URL: https://raw.githubusercontent.com/Sritejam/datahub/main/LLM/processed_documents/CPT.docx_chunk1.json
Loaded URL: https://raw.githubusercontent.com/Sritejam/datahub/main/LLM/processed_documents/CPT.docx_chu
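
To make the flattening step concrete, here is a minimal sketch of how a nested `ChunkText` dictionary could be turned into one string. The sample chunk and the name `flatten_chunk` are hypothetical and not part of the notebook. Unlike `process_chunk` above, this variant also keeps plain (non-dict) list items, which the notebook's version skips.

```python
def flatten_chunk(chunk):
    """Flatten a nested chunk dict into a single ' | '-separated string."""
    parts = []
    for key, value in chunk.items():
        if isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    fields = "; ".join(f"{k}: {v}" for k, v in item.items())
                    parts.append(f"{key} - {fields}")
                else:
                    # Assumption: keep scalar list items instead of dropping them.
                    parts.append(f"{key}: {item}")
        elif isinstance(value, dict):
            fields = "; ".join(f"{k}: {v}" for k, v in value.items())
            parts.append(f"{key} - {fields}")
        else:
            parts.append(f"{key}: {value}")
    return " | ".join(parts)


# Hypothetical chunk, made up for illustration only.
sample_chunk = {
    "Title": "CPT",
    "Sections": [{"Heading": "Eligibility", "Body": "F-1 students"}],
    "Tags": ["work", "visa"],
}
flat_text = flatten_chunk(sample_chunk)
print(flat_text)
```

Each top-level key contributes one or more segments, so the resulting string stays searchable by both BM25 and the embedding model.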

## **Embedding model**

In [20]:
!pip install huggingface_hub transformers
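import os
from huggingface_hub import login

# Log in to Hugging Face. The token is read from the HF_TOKEN environment variable
# instead of being hardcoded, so the secret does not end up in the notebook.
login(token=os.environ["HF_TOKEN"])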




In [21]:
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder
from rank_bm25 import BM25Okapi

# Load models
embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')


cross_encoder = CrossEncoder('cross-encoder/nli-roberta-base')  # NLI cross-encoder, loaded for comparison; not called by hybrid_retrieval below
bert_reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-12-v2')  # MS MARCO re-ranker used by hybrid_retrieval below



print("Models loaded successfully!")


# Tokenizing the documents for BM25
tokenized_documents = [doc.split(" ") for doc in documents]

# Initialize BM25
bm25 = BM25Okapi(tokenized_documents)

# Generating embeddings for FAISS with FlatL2
def prepare_flat_faiss_index(documents):
    # Encode all documents in one batched call rather than one call per document
    embeddings = embedding_model.encode(documents, convert_to_numpy=True).astype("float32")
    embedding_dim = embeddings.shape[1]

    # Creating a FlatL2 index (no clustering)
    index = faiss.IndexFlatL2(embedding_dim)  # L2 distance
    index.add(embeddings)  # Add embeddings to the index
    print(f"Flat FAISS index built with {index.ntotal} vectors.")
    return index, embeddings

# Building a FAISS FlatL2 index
faiss_index, document_embeddings = prepare_flat_faiss_index(documents)

# Map each FAISS row id to its document text
document_metadata = {i: {"Text": doc} for i, doc in enumerate(documents)}

# Hybrid Retrieval
def hybrid_retrieval(query, top_k=3):
    # Sparse Retrieval with BM25
    bm25_scores = bm25.get_scores(query.split(" "))
    top_bm25_indices = np.argsort(bm25_scores)[::-1][:top_k]
    sparse_results = [{"Text": documents[idx], "BM25_Score": bm25_scores[idx]} for idx in top_bm25_indices]

    # Dense Retrieval with FAISS
    query_embedding = embedding_model.encode(query, convert_to_tensor=False).astype("float32").reshape(1, -1)
    distances, indices = faiss_index.search(query_embedding, top_k)
    dense_results = [
        {"Text": document_metadata[idx]["Text"], "FAISS_Distance": distances[0][i]}
        for i, idx in enumerate(indices[0])
    ]

    # Combine Sparse and Dense Results

    seen_texts = set()
    combined_results = []
    for result in sparse_results + dense_results:
        if result["Text"] not in seen_texts:
            combined_results.append(result)
            seen_texts.add(result["Text"])

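    # Re-rank the combined results with the ms-marco cross-encoder and record
    # the query-document cosine similarity used later as the retrieval metric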
    rerank_inputs = [(query, result["Text"]) for result in combined_results]
    rerank_scores = bert_reranker.predict(rerank_inputs)
    for i, result in enumerate(combined_results):
        result["Rerank_Score"] = rerank_scores[i]
        doc_embedding = embedding_model.encode(result["Text"], convert_to_tensor=False).astype("float32").reshape(1, -1)
        result["Cosine_Similarity"] = cosine_similarity(query_embedding, doc_embedding)[0][0]


    # Sort by Rerank_Score
    final_results = sorted(combined_results, key=lambda x: x["Rerank_Score"], reverse=True)
    return final_results



Models loaded successfully!
Flat FAISS index built with 493 vectors.
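
BM25 matches tokens exactly, so splitting on single spaces treats `OPT?` and `OPT`, or `UMBC` and `umbc`, as unrelated terms. The following is a minimal sketch of a normalizing tokenizer that could be applied to both documents and queries. The name `tokenize` and the regular expression are assumptions for illustration, not what the notebook uses.

```python
import re

# Lowercase alphanumeric runs; hyphenated terms such as "i-20" stay in one piece.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return TOKEN_PATTERN.findall(text.lower())


query = "Do I need a job offer before applying for STEM OPT?"
print(query.split(" ")[-1])   # last token keeps the question mark
print(tokenize(query)[-1])    # last token is normalized
print(tokenize("request your OPT I-20"))
```

With this in place, the index would be built as `BM25Okapi([tokenize(doc) for doc in documents])` and queried with `bm25.get_scores(tokenize(query))`, so both sides share the same vocabulary.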


In [22]:

# Example Query
query = "What orientation events does UMBC offer to new international students?"

# Get results
results = hybrid_retrieval(query, top_k=3)

# Print Top Results
print("Top Results:")
for result in results[:3]:
    print(f"Text: {result['Text']}, Rerank Score: {result['Rerank_Score']}, FAISS Distance: {result.get('FAISS_Distance', 'N/A')}, BM25 Score: {result.get('BM25_Score', 'N/A')}")

Top Results:
Text: international student orientation international student orientation introduces you to umbc from a holistic perspective. you will meet other new students, current international students, and umbc staff from around campus who will be able to help you as you begin your career at umbc. well be covering a variety of topics during orientation such as: how to work with isss ?, Rerank Score: 8.275500297546387, FAISS Distance: N/A, BM25 Score: 15.285302998752528
Text: please note in the comments section that you are currently transferring to a new school, and please wait to receive an approval on your verification request before your visit to the mva. international student orientation all f-1 students new to umbc are expected to attend the international rules and policy meeting irpm in addition to undergraduate or graduate orientation., Rerank Score: 7.449615478515625, FAISS Distance: N/A, BM25 Score: 12.978256262396958
Text: these opportunities for intercultural exchange and
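
In the printed results, a passage shows `N/A` for whichever score its retriever did not supply. Because `hybrid_retrieval` keeps only the first copy of a duplicated passage, a passage returned by both BM25 and FAISS may lose its FAISS distance. A minimal sketch of merging by text so that both scores survive follows; `merge_results` and the toy scores are hypothetical.

```python
def merge_results(sparse_results, dense_results):
    """Merge two result lists by text, keeping every score seen for a passage."""
    merged = {}
    for result in sparse_results + dense_results:
        entry = merged.setdefault(result["Text"], {"Text": result["Text"]})
        # Copy over whichever score fields this retriever supplied.
        entry.update({k: v for k, v in result.items() if k != "Text"})
    return list(merged.values())


# Toy inputs with made-up scores.
sparse = [
    {"Text": "passage a", "BM25_Score": 15.2},
    {"Text": "passage b", "BM25_Score": 12.9},
]
dense = [
    {"Text": "passage a", "FAISS_Distance": 0.59},
    {"Text": "passage c", "FAISS_Distance": 0.67},
]

combined = merge_results(sparse, dense)
for row in combined:
    print(row)
```

The merged list preserves first-seen order, so it can be passed to the re-ranker in the same way as `combined_results`.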

In [23]:

# Example Query
query = "Do I need a job offer before applying for STEM OPT?"

# Get results
results = hybrid_retrieval(query, top_k=3)

# Print Top Results
print("Top Results:")
for result in results[:3]:
    print(f"Text: {result['Text']}, Rerank Score: {result['Rerank_Score']}, FAISS Distance: {result.get('FAISS_Distance', 'N/A')}, BM25 Score: {result.get('BM25_Score', 'N/A')}")

Top Results:
Text: to dos: watch a previous OPT workshop presentation requires myumbc login and review the slides download a draft i-765 application submit a complete OPT request in the isss portal remember: you do not need a job offer before applying!, Rerank Score: 5.909634113311768, FAISS Distance: N/A, BM25 Score: 14.630419686059854
Text: stem OPT extensions are approved by u.s. citizen and immigration services uscis, but also require support from the university isss office. once approved for stem opt, students can work an additional 24 months in paid positions after their year of regular opt., Rerank Score: 4.313155651092529, FAISS Distance: 0.5993536710739136, BM25 Score: N/A
Text: this option should be used very carefully, as students interested in using the stem extension should complete their degree and have it awarded before their first year of OPT runs out, to be eligible to apply for the extension., Rerank Score: 4.131031513214111, FAISS Distance: 0.5850527882575989, BM25 S

In [24]:

# Example Query
query = "Can I take a course while on OPT?"

# Get results
results = hybrid_retrieval(query, top_k=3)

# Print Top Results
print("Top Results:")
for result in results[:3]:
    print(f"Text: {result['Text']}, Rerank Score: {result['Rerank_Score']}, FAISS Distance: {result.get('FAISS_Distance', 'N/A')}, BM25 Score: {result.get('BM25_Score', 'N/A')}")

Top Results:
Text: please also note that students on OPT cannot begin a new degree program in the US while on OPT your I-20 must be transferred to begin a new degree program. coursework outside of a degree can be pursued during opt, such as coursework for personal enjoyment, a certificate program, or preparatory courses, such as pre-requisites, for a new degree program., Rerank Score: 8.394672393798828, FAISS Distance: 0.5969669818878174, BM25 Score: N/A
Text: travel abroad and visa renewal during OPT travelling abroad during OPT or stem OPT you can absolutely travel abroad during opt, but the there are a few additional pieces to prepare than the typical documents required for current students., Rerank Score: 1.6838610172271729, FAISS Distance: 0.6722770929336548, BM25 Score: N/A
Text: opt: post-graduation work authorization students who plan on working in a paid position after completing their program must receive work authorization. OPT optional practical training is a type of work p
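
The `FAISS Distance` values above come from `IndexFlatL2`, which reports squared Euclidean distance. For unit-length vectors, squared distance and cosine similarity are linked by d^2 = 2 - 2 * cos. The sketch below checks that identity on toy vectors and converts one of the printed distances. It assumes the embedding model returns normalized vectors, which should be verified before relying on the conversion; the helper names are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_to_cosine(squared_distance):
    """Convert a squared L2 distance between unit vectors to cosine similarity."""
    return 1.0 - squared_distance / 2.0


# Arbitrary toy vectors, normalized to unit length.
u = normalize([1.0, 2.0, 3.0])
v = normalize([2.0, 1.0, 0.5])

d2 = squared_l2(u, v)
cos_uv = dot(u, v)  # equals cosine similarity because both vectors are unit length
print(abs(d2 - (2 - 2 * cos_uv)) < 1e-9)

# One of the distances printed above, converted under the normalization assumption.
print(round(l2_to_cosine(0.5969669818878174), 4))
```

Under that assumption, a smaller FAISS distance and a larger cosine similarity rank passages identically, so the two columns carry the same information.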

## **FLAN-T5-Large**

In [25]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Loading FLAN-T5 model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
generator_model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large")

def generate_faq_response(query, retrieved_texts):
    """
    Generate a detailed, FAQ-style response using FLAN-T5 based on the retrieved texts.
    The query is used only as a guide for tone and focus. The response summarizes the
    retrieved documents in a structured manner.
    """

    input_text = (
        f"Using the context provided, generate a detailed response to the query: '{query}'. "
        f"Summarize and focus only on the provided context: {retrieved_texts}"
    )

    # Tokenizing the input for FLAN-T5
    input_ids = tokenizer.encode(input_text, return_tensors="pt", max_length=1024, truncation=True)

    # Generate the response
    outputs = generator_model.generate(
        input_ids,
        max_length=512,
        num_beams=7,          # Wider beam search explores more candidate sequences (slower, not more diverse)
        early_stopping=True,
        repetition_penalty=1.2  # Penalize repetitive output
    )

    # Decoding the generated response
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.strip()
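
Because the prompt places the context last and the tokenizer truncates from the end, any overflow silently cuts the tail of the retrieved text. A minimal sketch of budgeting the context before tokenization is shown below. `build_prompt` and `max_context_words` are hypothetical names, and the word count is only a rough stand-in for the token count.

```python
def build_prompt(query, passages, max_context_words=300):
    """Build the generation prompt, adding whole passages until the word budget is hit."""
    kept, used = [], 0
    for passage in passages:  # passages are assumed to be in re-rank order
        n_words = len(passage.split())
        if used + n_words > max_context_words:
            break  # drop this passage and all lower-ranked ones
        kept.append(passage)
        used += n_words
    context = " ".join(kept)
    return (
        f"Using the context provided, generate a detailed response to the query: '{query}'. "
        f"Summarize and focus only on the provided context: {context}"
    )


# Toy passages of 3, 4, and 5 words.
passages = ["one two three", "four five six seven", "eight nine ten eleven twelve"]
prompt = build_prompt("toy question?", passages, max_context_words=8)
print(prompt)
```

Keeping whole passages in rank order means the highest-scoring context always reaches the model intact, at the cost of leaving some budget unused.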


In [27]:
# Example Query
query = "What orientation events does UMBC offer to new international students?"

# Get results
results = hybrid_retrieval(query, top_k=3)


retrieved_texts = " ".join([result["Text"] for result in results])
print(retrieved_texts)
# Generate FAQ-style response
faq_response = generate_faq_response(query, retrieved_texts)

# Print Results
print("Response:")
print(faq_response)

# Print Top Results
print("Top Results:")
for result in results[:3]:
    print(f" Text: {result['Text']}, Rerank Score: {result['Rerank_Score']}, FAISS Distance: {result.get('FAISS_Distance', 'N/A')}, BM25 Score: {result.get('BM25_Score', 'N/A')}")


international student orientation international student orientation introduces you to umbc from a holistic perspective. you will meet other new students, current international students, and umbc staff from around campus who will be able to help you as you begin your career at umbc. well be covering a variety of topics during orientation such as: how to work with isss ? please note in the comments section that you are currently transferring to a new school, and please wait to receive an approval on your verification request before your visit to the mva. international student orientation all f-1 students new to umbc are expected to attend the international rules and policy meeting irpm in addition to undergraduate or graduate orientation. these opportunities for intercultural exchange and mutual understanding enhance the umbc international student and scholar experience. student work and internships oiss works with current students and alumni to help them access the many employment and i

In [28]:
# Example Query
query = "Do I need a job offer before applying for STEM OPT?"

# Get results
results = hybrid_retrieval(query, top_k=3)


retrieved_texts = " ".join([result["Text"] for result in results])
print(retrieved_texts)
# Generate FAQ-style response
faq_response = generate_faq_response(query, retrieved_texts)

# Print Results
print("Response:")
print(faq_response)

# Print Top Results
print("Top Results:")
for result in results[:3]:
    print(f" Text: {result['Text']}, Rerank Score: {result['Rerank_Score']}, FAISS Distance: {result.get('FAISS_Distance', 'N/A')}, BM25 Score: {result.get('BM25_Score', 'N/A')}")


to dos: watch a previous OPT workshop presentation requires myumbc login and review the slides download a draft i-765 application submit a complete OPT request in the isss portal remember: you do not need a job offer before applying! stem OPT extensions are approved by u.s. citizen and immigration services uscis, but also require support from the university isss office. once approved for stem opt, students can work an additional 24 months in paid positions after their year of regular opt. this option should be used very carefully, as students interested in using the stem extension should complete their degree and have it awarded before their first year of OPT runs out, to be eligible to apply for the extension. keep excellent records keep a copy of all job offer letters, and make sure the dates match what you report via your sevp portal or your i-983 for stem OPT tip: you can upload job offer letters and i-983 training plans to the isss portal for easy access! if you arent able to acce

In [29]:
# Example Query
query = "Can I take a course/program while on OPT?"

# Get results
results = hybrid_retrieval(query, top_k=3)


retrieved_texts = " ".join([result["Text"] for result in results])
print(retrieved_texts)
# Generate FAQ-style response
faq_response = generate_faq_response(query, retrieved_texts)

# Print Results
print("Response:")
print(faq_response)

# Print Top Results
print("Top Results:")
for result in results[:3]:
    print(f" Text: {result['Text']}, Rerank Score: {result['Rerank_Score']}, FAISS Distance: {result.get('FAISS_Distance', 'N/A')}, BM25 Score: {result.get('BM25_Score', 'N/A')}")


please also note that students on OPT cannot begin a new degree program in the US while on OPT your I-20 must be transferred to begin a new degree program. coursework outside of a degree can be pursued during opt, such as coursework for personal enjoyment, a certificate program, or preparatory courses, such as pre-requisites, for a new degree program. opt: post-graduation work authorization students who plan on working in a paid position after completing their program must receive work authorization. OPT optional practical training is a type of work permission available for f-1 international students who finish their program requirements. OPT is approved by u.s. citizen and immigration services uscis, with support from the university isss office. travel abroad and visa renewal during OPT travelling abroad during OPT or stem OPT you can absolutely travel abroad during opt, but the there are a few additional pieces to prepare than the typical documents required for current students. if y

## **Evaluation**

In [30]:
from bert_score import score
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
import numpy as np

In [31]:
def evaluate_all_faqs_with_mean(faqs, top_k=3):
    results = []

    for idx, faq in enumerate(faqs):  # Iterate over all FAQs
        query = faq["question"]
        gold_answer = faq["answer"]
        print(f"\nProcessing FAQ #{idx + 1}")
        print(f"Query: {query}")

        # Step 1: Retrieve context
        retrieved_results = hybrid_retrieval(query, top_k=top_k)

        # Extract cosine similarities
        cosine_similarities = [result["Cosine_Similarity"] for result in retrieved_results]

        # Compute retrieval metric (mean similarity score)
        retrieval_score = np.mean(cosine_similarities)
        print(f"Retrieval Metric (Mean Cosine Similarity): {retrieval_score:.4f}")

        # Top retrieved context for generation
        retrieved_context = " ".join([result["Text"] for result in retrieved_results])

        # Debug: Retrieved results with cosine similarities
        print("Top Retrieved Results with Cosine Similarities:")
        for i, (text, cos_sim) in enumerate(zip([r["Text"] for r in retrieved_results], cosine_similarities)):
            print(f"Top-{i+1} Text: {text}\nCosine Similarity: {cos_sim:.4f}\n")

        # Step 2: Generate response using the retrieved context
        generated_response = generate_faq_response(query, retrieved_texts=retrieved_context)

        # Debug: Generated response
        print(f"Generated Response:\n{generated_response}")

        # Step 3: Evaluate the generated response against the gold answer
        P, R, F1 = score([generated_response], [gold_answer], lang="en", verbose=False)

        # Debug: BERTScore metrics
        print(f"BERTScore - Precision: {P.mean().item():.4f}, Recall: {R.mean().item():.4f}, F1: {F1.mean().item():.4f}")

        # Collect results
        results.append({
            "query": query,
            "gold_answer": gold_answer,
            "retrieved_context": retrieved_context,
            "generated_response": generated_response,
            "retrieval_score": retrieval_score,  # Mean cosine similarity
            "bertscore_precision": P.mean().item(),
            "bertscore_recall": R.mean().item(),
            "bertscore_f1": F1.mean().item(),
        })

    # Convert results to DataFrame for better visualization
    results_df = pd.DataFrame(results)

    # Compute mean scores for each metric
    mean_retrieval_score = results_df["retrieval_score"].mean()
    mean_precision = results_df["bertscore_precision"].mean()
    mean_recall = results_df["bertscore_recall"].mean()
    mean_f1 = results_df["bertscore_f1"].mean()

    # Print mean scores
    print(f"\nMean Scores Across All {len(results_df)} FAQs:")
    print(f"Mean Retrieval Score (Cosine Similarity): {mean_retrieval_score:.4f}")
    print(f"Mean BERTScore Precision: {mean_precision:.4f}")
    print(f"Mean BERTScore Recall: {mean_recall:.4f}")
    print(f"Mean BERTScore F1: {mean_f1:.4f}")

    return results_df, {
        "mean_retrieval_score": mean_retrieval_score,
        "mean_precision": mean_precision,
        "mean_recall": mean_recall,
        "mean_f1": mean_f1,
    }
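
For intuition about the BERTScore columns, the metric matches each token in one text to its most similar token in the other and averages those best-match similarities. The sketch below reproduces that greedy matching on hand-made 2-D vectors standing in for contextual embeddings. It omits IDF weighting and baseline rescaling, and `greedy_bertscore` is a hypothetical helper, not part of the `bert_score` package.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def greedy_bertscore(candidate_vecs, reference_vecs):
    """Return (precision, recall, f1) from greedy max-similarity matching."""
    sim = [[cosine(c, r) for r in reference_vecs] for c in candidate_vecs]
    # Precision: each candidate token is matched to its best reference token.
    precision = sum(max(row) for row in sim) / len(candidate_vecs)
    # Recall: each reference token is matched to its best candidate token.
    recall = sum(
        max(sim[i][j] for i in range(len(candidate_vecs)))
        for j in range(len(reference_vecs))
    ) / len(reference_vecs)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy "token embeddings": 2 candidate tokens, 3 reference tokens.
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0], [0.6, 0.8], [0.8, 0.6]]

precision, recall, f1 = greedy_bertscore(candidate, reference)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Precision here penalizes generated tokens with no close counterpart in the gold answer, while recall penalizes gold-answer tokens the response fails to cover.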


In [32]:
# Run the evaluation and get results
faq_results, mean_scores = evaluate_all_faqs_with_mean(faqs, top_k=3)

# Save results to CSV
faq_results.to_csv("evaluation_results.csv", index=False)

# Display mean scores
print("\nMean Scores:")
print(mean_scores)



Processing FAQ #1
Query: What services does the ISSS office provide?
Retrieval Metric (Mean Cosine Similarity): 0.4626
Top Retrieved Results with Cosine Similarities:
Top-1 Text: common examples are work done in any campus office, the library or rac gym, food services, the book store, or graduate assistantship positions including tas, ras and gas. a few exceptions exist please contact the isss office if you are not sure if a job you are considering would classify as on campus.
Cosine Similarity: 0.4893

Top-2 Text: the office offers a variety of programming and services to assist students and scholars in pursuit of their academic, personal, and professional goals.
Cosine Similarity: 0.4486

Top-3 Text: isss office closed on thursday, november 7 if you have a question or concern, please submit a help ticket below and our team will respond as soon as we can. thank you! help ticket the isss office does not respond to emails. instead, we use a help ticket system to send and receive messag

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BERTScore - Precision: 0.8956, Recall: 0.8870, F1: 0.8913

Processing FAQ #2
Query: How can I contact the ISSS office?
Retrieval Metric (Mean Cosine Similarity): 0.4383
Top Retrieved Results with Cosine Similarities:
Top-1 Text: if you have entered the address perfectly and it is not accepted, please contact us on isss website your name, birthday, or other personal details that you can see in the portal are incorrect: please contact us if you notice any errors to information in the portal that you do not have access to.
Cosine Similarity: 0.6435

Top-2 Text: isss office closed on thursday, november 7 if you have a question or concern, please submit a help ticket below and our team will respond as soon as we can. thank you! help ticket the isss office does not respond to emails. instead, we use a help ticket system to send and receive messages.
Cosine Similarity: 0.6412

Top-3 Text: complete all sections of the application as per the directions provided. ensure that you obtain electroni

BERTScore - Precision: 0.8774, Recall: 0.8311, F1: 0.8536

Processing FAQ #3
Query: How do I request an I-20?
Retrieval Metric (Mean Cosine Similarity): 0.6448
Top Retrieved Results with Cosine Similarities:
Top-1 Text: step 1: request your OPT I-20 students must first request a special OPT I-20 from the isss office. applications submitted to uscis without an OPT I-20 will be denied. students must submit a complete OPT request in the isss portal. first, students choose an OPT start date.
Cosine Similarity: 0.6287

Top-2 Text: how to access your I-20 how to access your signed electronic umbc form i-20: request your I-20 in the isss portal the umbcs isss team will contact you via email when your I-20 is ready. it will be attached to your original request under the documents tab.
Cosine Similarity: 0.7139

Top-3 Text: requesting an I-20 in the isss portal once admitted, you will receive an email from our office directing you to complete a new student request application.
Cosine Similarity

BERTScore - Precision: 0.8621, Recall: 0.8718, F1: 0.8669

Processing FAQ #4
Query: What should I do if my visa expires while in the U.S.?
Retrieval Metric (Mean Cosine Similarity): 0.4996
Top Retrieved Results with Cosine Similarities:
Top-1 Text: if your visa is expired and you are outside of the us, you must visit a US embassy to get a new f-1 visa before returning to the us. to renew your visa, you will actually apply for a new one, following the same process as the first time you applied for an f-1 visa.
Cosine Similarity: 0.6440

Top-2 Text: if you renew your passport, but your visa stamp is in the old passport, you can carry both passports with you until the visa stamp expires and you get a new visa in the new passport.
Cosine Similarity: 0.4069

Top-3 Text: visa stamp check your f-1 visa stamp in your passport to make sure it will still be valid when you are ready to return to the us. if you need to renew your visa, please read more below. i-20: your I-20 needs to be valid, and

BERTScore - Precision: 0.8714, Recall: 0.8635, F1: 0.8674

Processing FAQ #5
Query: What is SEVIS, and how do I pay the SEVIS fee?
Retrieval Metric (Mean Cosine Similarity): 0.2753
Top Retrieved Results with Cosine Similarities:
Top-1 Text: 2. after you decide which school you will attend and have an I-20 from that school, you will need to pay the i-901 sevis fee for that sevis id found a the top right of the i-20. this is a one-time fee, which is used to maintain the immigration database that manages international student and scholar information.
Cosine Similarity: 0.5324

Top-2 Text: please note you will need to take your sevis fee receipt to your visa appointment. 3. next, complete a ds-160 visa application form, the ds-160 has a US $185 application fee, which you often cannot pay until you actually schedule your visa appointment. each embassy handles this a bit differently.
Cosine Similarity: 0.5255

Top-3 Text: sevp portal help the sevp portal is a tool students use to report thei

BERTScore - Precision: 0.8600, Recall: 0.8449, F1: 0.8524

Processing FAQ #6
Query: What are the full-time enrollment requirements?
Retrieval Metric (Mean Cosine Similarity): 0.5823
Top Retrieved Results with Cosine Similarities:
Top-1 Text: enrollment requirements: must enroll in half of full-time enrollment at least 5 credits for graduate students, at least 6 credits for undergraduate students requirements to be approved: email or letter from course instructor or academic advisor, recommending that the student be allowed to drop the class for one of the following reasons: improper course placement this option is appropriate for students who were advised inappropriately and have taken classes out of sequence, or who were inappropriately allowed to take a course for which they did not have the prerequisite knowledge to be successful.
Cosine Similarity: 0.5851

Top-2 Text: possible exceptions to full-time enrollment requirements the immigration language for receiving permission to enrol

BERTScore - Precision: 0.8902, Recall: 0.9127, F1: 0.9013

Processing FAQ #7
Query: Can I take a Reduced Course Load (RCL)?
Retrieval Metric (Mean Cosine Similarity): 0.4322
Top Retrieved Results with Cosine Similarities:
Top-1 Text: possible exceptions to full-time enrollment requirements the immigration language for receiving permission to enroll less than full-time is a reduced course load rcl. any rcl must be approved in advance with isss, before the semester begins in which you plan to be enrolled part-time.
Cosine Similarity: 0.6000

Top-2 Text: 2. medical reduced course load rcl duration: approved one semester at a time, can be granted until the deadline to drop classes during a semester limitations: up to 12 months cumulative during one degree program enrollment requirements: enrollment can be part-time or not at all requirements to be approved: you must provide a note from your doctor indicating their recommendation that you do not study full-time this semester based on a medi

BERTScore - Precision: 0.8569, Recall: 0.8429, F1: 0.8498

Processing FAQ #8
Query: Can I take online classes?
Retrieval Metric (Mean Cosine Similarity): 0.6578
Top Retrieved Results with Cosine Similarities:
Top-1 Text: online courses can be taken as much as you like in summer or winter sessions, when enrollment is not required, unless you plan to complete your academic program during a summer term, in which case you must follow all full-time enrollment rules and are only permitted to take at most one fully-online course.
Cosine Similarity: 0.7331

Top-2 Text: for example, an undergraduate could take one online course for 3 credits, 3 regular courses for 9 credits, and thats the required 12. beyond the required 12, an additional online course could be taken.
Cosine Similarity: 0.6467

Top-3 Text: undergraduate students- undergrads are required to enroll in a minimum of 12 credits per semester. one class is typically 3 or 4 credits. online coursework immigration regulations allow for o

BERTScore - Precision: 0.8966, Recall: 0.8451, F1: 0.8701

Processing FAQ #9
Query: What is CPT, and how do I apply?
Retrieval Metric (Mean Cosine Similarity): 0.3409
Top Retrieved Results with Cosine Similarities:
Top-1 Text: cpt: off-campus work authorization curricular practical training cpt allows f-1 international students to work off-campus in paid positions within the us. cpt requires an application completed by the student, which can be submitted via the isss portal. once approved, isss will add cpt work authorization to the students immigration record, and issue a new form I-20 noting the work authorization.
Cosine Similarity: 0.5159

Top-2 Text: cpt approval is granted per semester if a position continues to the next semester, students will need to request cpt again for the next term. students are eligible to start working from the date they are approved by isss. isss cannot approve cpt for dates that have passed. please plan ahead and apply early!
Cosine Similarity: 0.5171



BERTScore - Precision: 0.8611, Recall: 0.8820, F1: 0.8714

Processing FAQ #10
Query: What is OPT, and when can I apply?
Retrieval Metric (Mean Cosine Similarity): 0.3850
Top Retrieved Results with Cosine Similarities:
Top-1 Text: opt: post-graduation work authorization students who plan on working in a paid position after completing their program must receive work authorization. OPT optional practical training is a type of work permission available for f-1 international students who finish their program requirements. OPT is approved by u.s. citizen and immigration services uscis, with support from the university isss office.
Cosine Similarity: 0.7345

Top-2 Text: OPT is granted only once per education level. once approved by uscis: students are approved for 1 year of work authorization. students cannot begin working before the start date on their employment authorization document ead card. students must work a combined total of at least 20 hoursweek. positions can be unpaidpaid and wit

BERTScore - Precision: 0.8830, Recall: 0.8403, F1: 0.8611

Processing FAQ #11
Query: What is STEM OPT, and how is it different?
Retrieval Metric (Mean Cosine Similarity): 0.3736
Top Retrieved Results with Cosine Similarities:
Top-1 Text: eligibility to be eligible for stem opt, students must meet following criteria: currently in active f-1 status currently participating in one year of regular OPT completed all program requirements for a stem-eligible degree pursuing work with an employer enrolled and participating in the e-verify program have not been previously granted optstem OPT at the current level of study application process the application process for stem OPT is very similar to applying for regular opt.
Cosine Similarity: 0.5924

Top-2 Text: stem OPT extensions are approved by u.s. citizen and immigration services uscis, but also require support from the university isss office. once approved for stem opt, students can work an additional 24 months in paid positions after their y

BERTScore - Precision: 0.8598, Recall: 0.8664, F1: 0.8631

Processing FAQ #12
Query: What documents do I need to travel internationally?
Retrieval Metric (Mean Cosine Similarity): 0.3857
Top Retrieved Results with Cosine Similarities:
Top-1 Text: you need these documents, in addition to those listed above, if you are returning to the US after your OPT start date: valid ead card proof of employment a job offer letter or letter of employment verification from your employer I-20 updated with your employer information on the third page, and a travel signature that will be valid on your return.
Cosine Similarity: 0.4692

Top-2 Text: travel abroad and visa renewal during OPT travelling abroad during OPT or stem OPT you can absolutely travel abroad during opt, but the there are a few additional pieces to prepare than the typical documents required for current students.
Cosine Similarity: 0.2923

Top-3 Text: in addition to the typical documents, please be sure to carry the following additional

BERTScore - Precision: 0.8574, Recall: 0.8568, F1: 0.8571

Processing FAQ #13
Query: Can I travel with an expired visa?
Retrieval Metric (Mean Cosine Similarity): 0.4431
Top Retrieved Results with Cosine Similarities:
Top-1 Text: if your visa is expired and you are outside of the us, you must visit a US embassy to get a new f-1 visa before returning to the us. to renew your visa, you will actually apply for a new one, following the same process as the first time you applied for an f-1 visa.
Cosine Similarity: 0.5707

Top-2 Text: short trips to canada, mexico or some caribbean islands with an expired f-1 visa automatic visa revalidation students who have a valid I-20 and are in good f-1 status, but whose visa has expired, are able to take short trips less than 30 days to countries neighboring the US to canada, mexico and some caribbean islands.
Cosine Similarity: 0.5128

Top-3 Text: if you renew your passport, but your visa stamp is in the old passport, you can carry both passports with

BERTScore - Precision: 0.8844, Recall: 0.8937, F1: 0.8890

Processing FAQ #14
Query: What is Automatic Visa Revalidation?
Retrieval Metric (Mean Cosine Similarity): 0.3615
Top Retrieved Results with Cosine Similarities:
Top-1 Text: please read more about automatic visa revalidation, and see the list of eligible destinations please also note this rule is only for re-entry to the us, and does not play a role in allowing you to enter the country you plan to visit.
Cosine Similarity: 0.6285

Top-2 Text: special approval is not required from isss this rcl is automatic if your I-20 shows it is your last semester.
Cosine Similarity: 0.2937

Top-3 Text: you can use the basic instructions to schedule a visa appointment you may already be familiar.
Cosine Similarity: 0.5160

Top-4 Text: changes are required to be captured within 10 days in the portal.
Cosine Similarity: 0.4405

Top-5 Text: OPT cap gap and h-1b start date of october 1 even after it is approved, h-1b status is not actually active 

BERTScore - Precision: 0.8611, Recall: 0.8371, F1: 0.8490

Processing FAQ #15
Query: What are my options after graduation?
Retrieval Metric (Mean Cosine Similarity): 0.2733
Top Retrieved Results with Cosine Similarities:
Top-1 Text: youll need to make it clear to the official interviewing you that your goals in the US are still ultimately academic in nature, and that you still intend to return home after you complete your academic goals.
Cosine Similarity: 0.3924

Top-2 Text: f-1 immigration rules work options for f-1 students billing and health insurance about US and umbc culture programs and services available on campus spring 2025 orientation international student orientation takes part in 2 sessions, both are mandatory!
Cosine Similarity: 0.1999

Top-3 Text: please be sure to work with your academic advisor to choose which courses you will take at another school, and to plan in advance to make sure they will transfer back to umbc and be applied to your degree in the way you are hop

BERTScore - Precision: 0.8305, Recall: 0.8385, F1: 0.8344

Processing FAQ #16
Query: What is the grace period after completing my program?
Retrieval Metric (Mean Cosine Similarity): 0.4221
Top Retrieved Results with Cosine Similarities:
Top-1 Text: students can request another OPT I-20 and submit a new i-765 application if their 60 day grace period has not ended. contact the isss office about any questions or concerns. approval the approval notice will list the approved OPT start and end dates. however, students cannot begin working until their physical ead card arrives in the mail.
Cosine Similarity: 0.4314

Top-2 Text: please note, you must work with us to get your new I-20 before your current I-20 end date, plus the 60 day grace period.
Cosine Similarity: 0.4006

Top-3 Text: I-20 program extension the program dates specified in the program of study section of your form I-20 reflect the typical duration for completing your degree program. however, certain circumstances may require ad

BERTScore - Precision: 0.8689, Recall: 0.8647, F1: 0.8668

Processing FAQ #17
Query: How do I access the ISSS Portal?
Retrieval Metric (Mean Cosine Similarity): 0.5114
Top Retrieved Results with Cosine Similarities:
Top-1 Text: when a student receives an OPT recommendation from our office, we send out an email with instructions on how to set up an isss portal account using a personal email address. students on OPT will need to login using their personal email address to access the isss portal.
Cosine Similarity: 0.5625

Top-2 Text: how to access your I-20 how to access your signed electronic umbc form i-20: request your I-20 in the isss portal the umbcs isss team will contact you via email when your I-20 is ready. it will be attached to your original request under the documents tab.
Cosine Similarity: 0.5057

Top-3 Text: if you have entered the address perfectly and it is not accepted, please contact us on isss website your name, birthday, or other personal details that you can see in 

BERTScore - Precision: 0.8662, Recall: 0.8626, F1: 0.8644

Processing FAQ #18
Query: What should I do if I lose my passport or I-20?
Retrieval Metric (Mean Cosine Similarity): 0.4697
Top Retrieved Results with Cosine Similarities:
Top-1 Text: you will need to print the signed i-20, sign it in ink with a blue pen, and then take that physical document to your visa appointment. you will carry this document with you through immigration as well. if i defer my admission to a new semester, do i need a new i-20?
Cosine Similarity: 0.5359

Top-2 Text: I-20 please make sure your I-20 has: your current employer information updated on the employment page a travel signature, found at the bottom of the 2nd page, that will not be more than 6 months old as of your return date to the us. request a travel signature or updated I-20 with employer information:.
Cosine Similarity: 0.5178

Top-3 Text: copy of new I-20 issued by international advisor recommending economic hardship. checkmoney order for $410 o

BERTScore - Precision: 0.8522, Recall: 0.8663, F1: 0.8592

Processing FAQ #19
Query: How do I get a Social Security Number (SSN)?
Retrieval Metric (Mean Cosine Similarity): 0.3425
Top Retrieved Results with Cosine Similarities:
Top-1 Text: ssn are issued by the united states social security administration ssa. to start the process, students should submit a ssn letter request in the isss portal. after a student has an I-20 with cpt approval and a receipt from the ssa, they present these documents to their employers human resources department.
Cosine Similarity: 0.6216

Top-2 Text: applying for a social security number ssn an ssn is required of any individual in order to get paid in the us.
Cosine Similarity: 0.6870

Top-3 Text: when you receive your first job offer, you will need to apply for a social security number ssn if you dont already have one. an ssn is required to be paid in the us. to learn more about what an ssn is and how to apply, please review the information in this websit

BERTScore - Precision: 0.8718, Recall: 0.8542, F1: 0.8629

Processing FAQ #20
Query: What should I do if my I-20 is about to expire?
Retrieval Metric (Mean Cosine Similarity): 0.5461
Top Retrieved Results with Cosine Similarities:
Top-1 Text: deadline please note that your form I-20 can only be extended up until the current end date. after your current I-20 end date passes, we can no longer extend the date into the future. it is important to plan ahead for this deadline.
Cosine Similarity: 0.6237

Top-2 Text: please note, you must work with us to get your new I-20 before your current I-20 end date, plus the 60 day grace period.
Cosine Similarity: 0.6837

Top-3 Text: you will need to print the signed i-20, sign it in ink with a blue pen, and then take that physical document to your visa appointment. you will carry this document with you through immigration as well. if i defer my admission to a new semester, do i need a new i-20?
Cosine Similarity: 0.5200

Top-4 Text: you may receive an 

BERTScore - Precision: 0.8595, Recall: 0.8558, F1: 0.8577

Mean Scores Across All 20 FAQs:
Mean Retrieval Score (Cosine Similarity): 0.4424
Mean BERTScore Precision: 0.8683
Mean BERTScore Recall: 0.8609
Mean BERTScore F1: 0.8644

Mean Scores:
{'mean_retrieval_score': 0.44238487, 'mean_precision': 0.868302509188652, 'mean_recall': 0.8608709454536438, 'mean_f1': 0.8644487887620926}
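
A summary dictionary like the one above could be produced by averaging the per-FAQ metrics. The following is a minimal sketch, not the notebook's actual code: the function name `aggregate_scores` and the per-row key names are assumptions chosen for illustration, and the three sample rows are copied from the log for FAQs #8 to #10 only, so the resulting means will differ from the 20-FAQ means reported above.

```python
from statistics import fmean

def aggregate_scores(per_faq):
    """Average per-FAQ metrics into one summary dict (hypothetical helper)."""
    keys = ("retrieval_score", "precision", "recall", "f1")
    # fmean returns a plain Python float, so every value prints at full precision
    return {f"mean_{key}": fmean(row[key] for row in per_faq) for key in keys}

# Sample rows taken from the log output for FAQs #8, #9 and #10 (not all 20 FAQs)
per_faq = [
    {"retrieval_score": 0.6578, "precision": 0.8966, "recall": 0.8451, "f1": 0.8701},
    {"retrieval_score": 0.3409, "precision": 0.8611, "recall": 0.8820, "f1": 0.8714},
    {"retrieval_score": 0.3850, "precision": 0.8830, "recall": 0.8403, "f1": 0.8611},
]

summary = aggregate_scores(per_faq)
print(summary)
```

Note that the mean F1 here is the average of the per-FAQ F1 values, not the F1 computed from the mean precision and mean recall; the two can differ slightly. The shorter `0.44238487` in the logged dictionary is consistent with a single-precision (float32) value, though that is an inference from the printed digits rather than something the log states.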
