# Answer:

Evaluating Question: How can an international student on an F-1 visa maintain their status during graduate studies?

Generated Answer: <|im_sep|>

assistant: To maintain F-1 visa status during graduate studies, an international student must adhere to several key requirements:

1. **Full-time enrollment**: Students must remain enrolled in a full-time course load as determined by their academic institution. This usually means taking the minimum number of credits required for their degree program.

2. **Academic progress**: Maintain satisfactory academic standing. This means staying within the school's standards for grade point average, and completing coursework in a timely manner towards the degree.

3. **Coursework relevance**: Ensure that the courses taken align with the student's field of study. This maintains alignment with the original visa purpose and avoids issues with maintaining status.

4. **On-campus employment**: Follow guidelines for on-campus employment, which includes:
   - Not displacing U.S. citizens or lawful permanent residents (LPRs) from their jobs.
   - Employments must be for the school or for an educationally affiliated organization. Examples include school
Evaluation Result: Question: How can an international student on an F-1 visa maintain their status during graduate studies?

Expected Answer: To maintain F-1 status during graduate studies, an international student must enroll as a full-time student, make normal progress toward a degree, and comply with all rules and regulations set by their educational institution and the U.S. government. This includes not engaging in unauthorized employment and keeping documentation up to date.

Generated Answer: <|im_sep|>

assistant: To maintain F-1 visa status during graduate studies, an international student must adhere to several key requirements:

1. **Full-time enrollment**: Students must remain enrolled in a full-time course load as determined by their academic institution. This usually means taking the minimum number of credits required for their degree program.

2. **Academic progress**: Maintain satisfactory academic standing. This means staying within the school's standards for grade point average, and completing coursework in a timely manner towards the degree.

3. **Coursework relevance**: Ensure that the courses taken align with the student's field of study. This maintains alignment with the original visa purpose and avoids issues with maintaining status.

4. **On-campus employment**: Follow guidelines for on-campus employment, which includes:
   - Not displacing U.S. citizens or lawful permanent residents (LPRs) from their jobs.
   - Employments must be for the school or for an educationally affiliated organization. Examples include school

Please evaluate the generated answer compared to the expected answer. Provide a score between 1 (poor) and 5 (excellent) along with a brief justification for your score.

Score and Justification: Excellent 5/5
The generated answer is comprehensive and answers the question accurately by listing out the requirements for maintaining F-1 visa status during graduate studies while providing additional context on each requirement.

- It clearly highlights the importance of full-time enrollment, maintaining satisfactory academic standing, ensuring coursework relevance, and following guidelines for on-campus employment.
- The answer also reinforces the need for alignment with the original visa purpose to avoid issues with maintaining status
--------------------------------------------------------------------------------

Evaluating Question: What is Curricular Practical Training (CPT) and how does it benefit F-1 students?


# Half precision
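
As a rough back-of-the-envelope sketch of what loading the evaluator with `torch_dtype=torch.float16` buys: weight memory scales with bytes per parameter, so float16 needs about half the space of float32. The parameter count used here (roughly 7.25 billion for a Mistral-7B-class model) is an approximation assumed for illustration, and `weight_memory_gib` is a hypothetical helper. Activations and the KV cache are not counted.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Estimate the memory needed for model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Assumed, approximate parameter count for a Mistral-7B-class model.
APPROX_PARAMS = 7.25e9

for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_gib(APPROX_PARAMS, dtype):.1f} GiB")
```

The estimate only covers the weights, so the real footprint on the GPU will be higher once generation starts.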

In [1]:
import os
import torch
import json
import re
import csv
from transformers import AutoModelForCausalLM, AutoTokenizer
from dotenv import load_dotenv
from RAG_With_QWEN import RAG  # Import your RAG class

# Method to get the API key from the .env file
def get_api_key(api_name):
    env_path = "../.dummy_env"  # Adjust the path as needed
    load_dotenv(env_path)  # Load the environment variables
    return os.getenv(api_name)

# -------------------------------
# Setup Evaluator Model (Mistral)
# -------------------------------
CUSTOM_CACHE_DIR = "/media/volume/vol-VisaWise/models/mistral_cache"

evaluator_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    cache_dir=CUSTOM_CACHE_DIR,
    torch_dtype=torch.float16,
    token=get_api_key('HF_GAURI')
)
device = "cuda" if torch.cuda.is_available() else "cpu"
evaluator_model.to(device)
evaluator_model.eval()  # Set model to evaluation mode

evaluator_tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    cache_dir=CUSTOM_CACHE_DIR,
    token=get_api_key('HF_GAURI')
)

# -------------------------------
# Load QA Dataset from CSV
# -------------------------------
def load_qa_dataset(csv_file):
    qa_dataset = []
    with open(csv_file, newline='', encoding="utf-8") as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            qa_dataset.append({
                "question": row["Question"],
                "expected_answer": row["Expected Answer"]
            })
    return qa_dataset[-8:]  # Keep only the last 8 rows

qa_dataset = load_qa_dataset("RAG_Evaluation_Dataset.csv")

# -------------------------------
# Define Evaluation Function
# -------------------------------
def evaluate_answer(question, generated_answer, expected_answer):
    """
    Uses the evaluator model to compare the generated answer with the expected answer.
    Extracts the full justification after "Score:" and "Justification:".
    """
    eval_prompt = (
        f"Question: {question}\n\n"
        f"Expected Answer: {expected_answer}\n\n"
        f"Generated Answer: {generated_answer}\n\n"
        "Evaluate the generated answer **strictly** by comparing it **only** to the expected answer provided above. "
        "**Do not use any external knowledge, assumptions, or personal judgment about what the answer should be.** "
        "Your evaluation must be based **solely** on whether the generated answer conveys the same meaning, tone, and intent as the expected answer, "
        "even if the wording is different.\n\n"
        "**Key Criteria:**\n"
        "- If the expected answer **refuses** to answer a question, the generated answer **must also refuse** in a similar way.\n"
        "- If the expected answer provides a redirection or context, the generated answer **should match the same intent and purpose.**\n"
        "- Minor differences in wording **should not** be penalized if the meaning remains the same.\n"
        "- Only deduct points if the generated answer changes the meaning, adds incorrect details, or omits key aspects.\n\n"
        "Assign a score between **1 (poor) and 5 (excellent)** based **only** on this comparison.\n\n"
        "**Response Format:**\n"
        "Score: [1-5]\n"
        "Justification: [Detailed reason for the score]\n\n"
        "Example:\n"
        "Score: 4\n"
        "Justification: The generated answer is mostly correct but slightly different in tone.\n\n"
        "Now, provide the evaluation:\n"
    )

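    # Tokenize with an attention mask and move the tensors to the model's device
    inputs = evaluator_tokenizer(eval_prompt, return_tensors="pt").to(device)
    prompt_length = inputs["input_ids"].shape[-1]

    # Disable gradient calculations; autocast is only enabled when running on CUDA
    with torch.no_grad():
        with torch.autocast(device_type=device, enabled=(device == "cuda")):
            generated_ids = evaluator_model.generate(
                **inputs,
                max_new_tokens=100,
                do_sample=True,
                pad_token_id=evaluator_tokenizer.eos_token_id,
            )

    # Decode only the newly generated tokens. The prompt itself contains
    # "Score: 4" and "Justification: ...", which the regexes below would
    # otherwise match instead of the evaluator's actual reply.
    new_tokens = generated_ids[0][prompt_length:]
    evaluation_text = evaluator_tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

    # Extract score
    score_match = re.search(r"Score:\s*(\d+)", evaluation_text, re.IGNORECASE)
    score = int(score_match.group(1)) if score_match else None

    # Extract justification (DOTALL so a multi-line justification is captured in full)
    justification_match = re.search(r"Justification:\s*(.*)", evaluation_text, re.IGNORECASE | re.DOTALL)
    justification = justification_match.group(1).strip() if justification_match else "?? Not found"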

    return score, justification  # Return both score and justification separately

# -------------------------------
# Main Evaluation Routine
# -------------------------------
def main():
    rag_model = RAG()
    evaluations = []  
    
    for qa in qa_dataset:
        question = qa["question"]
        expected_answer = qa["expected_answer"]
        
        # Generate answer using your RAG model
        generated_answer = rag_model.generate_answer(question)
        
        # Evaluate the generated answer
        score, justification = evaluate_answer(question, generated_answer, expected_answer)
        
        # Print output in a structured format (only once)
        print("\n" + "="*80)
        print(f"**Question:** {question}")
        print(f"**Expected Answer:** {expected_answer}")
        print(f"**Generated Answer:** {generated_answer}")
        print(f"**Score:** {score if score is not None else '?? Not found'}")
        print(f"**Justification:** {justification}")
        print("="*80)

        # Store results for JSON output
        evaluations.append({
            "question": question,
            "expected_answer": expected_answer,
            "generated_answer": generated_answer,
            "score": score,
            "justification": justification
        })
    
    # Save evaluation results to a JSON file
    with open("Evaluation_Results_Without_Finetuning.json", "w") as outfile:
        json.dump(evaluations, outfile, indent=4)
    
    print("\nEvaluation completed. Results saved to Evaluation_Results_Without_Finetuning.json.")

if __name__ == "__main__":
    main()



Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.



Device set to use cuda:0


Using device: cuda


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
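
The warnings above point at two missing arguments to `generate`: the attention mask and an explicit `pad_token_id`. A minimal sketch of one way to supply both is shown below, assuming a Hugging Face style tokenizer and model. `generate_with_mask` is a hypothetical helper name, and using the EOS token as the pad token is an assumption that mirrors what the library falls back to in the log.

```python
def generate_with_mask(model, tokenizer, prompt, device=None, max_new_tokens=100):
    """Call generate() with an explicit attention mask and pad token id."""
    # Calling the tokenizer directly (instead of .encode) returns both
    # input_ids and attention_mask.
    encoded = tokenizer(prompt, return_tensors="pt")
    if device is not None:
        # Inputs must live on the same device as the model.
        encoded = encoded.to(device)
    return model.generate(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        pad_token_id=tokenizer.eos_token_id,  # assumption: reuse EOS as padding
        max_new_tokens=max_new_tokens,
    )


# Usage with the objects defined in the cell above (not executed here):
# output_ids = generate_with_mask(evaluator_model, evaluator_tokenizer, eval_prompt, device=device)
```

The returned ids still begin with the prompt tokens, so any downstream parsing should take that into account.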



**Question:** What is the capital of Mongolia?
**Expected Answer:** Oh, that's an interesting question! While I don't have information on that right now, I'd be delighted to help you with anything related to CPT, OPT, or maintaining your F-1 visa status. Is there anything I can assist you with in those areas?
**Generated Answer:** This question is outside the scope of F-1 visa regulations, CPT, or OPT.
**Score:** 4
**Justification:** Detailed reason for the score





**Question:** Can you write a poem about a cat's adventure in space?
**Expected Answer:** That sounds like a fun creative project! Unfortunately, I'm not able to generate poems, but I can certainly provide detailed information and guidance on CPT, OPT, and F-1 visa status maintenance. Perhaps we could explore some of your questions on those topics?
**Generated Answer:** This question is outside the scope of F-1 visa regulations, CPT, or OPT.
**Score:** 4
**Justification:** Detailed reason for the score





**Question:** What is the current price of Bitcoin?
**Expected Answer:** I don't have access to real-time financial data, but I'd be happy to discuss how financial matters might impact your F-1 status, if you have any questions. Or, if you have questions regarding CPT or OPT, please ask!
**Generated Answer:** This question is outside the scope of F-1 visa regulations, CPT, or OPT.
**Score:** 4
**Justification:** Detailed reason for the score





**Question:** Tell me a recipe for chocolate chip cookies.
**Expected Answer:** That sounds delicious! While I can't provide recipes, I can definitely help you navigate the complexities of CPT, OPT, and F-1 visa status maintenance. Do you have any specific concerns or questions about those topics?
**Generated Answer:** This question is outside the scope of F-1 visa regulations, CPT, or OPT.
**Score:** 4
**Justification:** Detailed reason for the score





**Question:** Who won the 2023 World Series?
**Expected Answer:** I don't follow sports results, but I'm ready to assist you with any questions you have about CPT, OPT, and maintaining your F-1 visa status. How can I help you today?
**Generated Answer:** This question is outside the scope of F-1 visa regulations, CPT, or OPT.
**Score:** 4
**Justification:** Detailed reason for the score





**Question:** What are the symptoms of the common cold?
**Expected Answer:** I'm not a medical professional, so I can't give medical advice. However, if you have questions about how a medical situation might affect your F-1 status, or if you need help with CPT or OPT, please feel free to ask!
**Generated Answer:** This question is outside the scope of F-1 visa regulations, CPT, or OPT.
**Score:** 4
**Justification:** Detailed reason for the score





**Question:** What is the weather forecast for London tomorrow?
**Expected Answer:** I don't have access to real-time weather information, but I'm here to help with any questions you have about CPT, OPT, and F-1 visa status maintenance. What's on your mind?
**Generated Answer:** This question is outside the scope of F-1 visa regulations, CPT, or OPT.
**Score:** 4
**Justification:** Detailed reason for the score





**Question:** Explain the theory of relativity.
**Expected Answer:** That's a fascinating topic! While I don't specialize in theoretical physics, I can provide detailed explanations and guidance on CPT, OPT, and F-1 visa status maintenance. Would you like to discuss any of those areas?
**Generated Answer:** This question is outside the scope of F-1 visa regulations, CPT, or OPT.
**Score:** 4
**Justification:** Detailed reason for the score

Evaluation completed. Results saved to test_results2.json.
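
Every result above carries the score 4 and the placeholder justification from the prompt template, which is consistent with the regexes matching the echoed prompt (including its worked example) instead of the evaluator's reply. The sketch below reproduces that effect on plain strings. `parse_evaluation` is a hypothetical helper, and the `completion` text is invented purely for illustration; it is not output from this run.

```python
import re

SCORE_RE = re.compile(r"Score:\s*([1-5])\b", re.IGNORECASE)
JUSTIFICATION_RE = re.compile(r"Justification:\s*(.+)", re.IGNORECASE | re.DOTALL)


def parse_evaluation(text):
    """Return (score, justification) from evaluator text, or None where missing."""
    score_match = SCORE_RE.search(text)
    justification_match = JUSTIFICATION_RE.search(text)
    score = int(score_match.group(1)) if score_match else None
    justification = justification_match.group(1).strip() if justification_match else None
    return score, justification


# Tail of the prompt template from the cell above, including its example.
prompt_tail = (
    "Score: [1-5]\n"
    "Justification: [Detailed reason for the score]\n\n"
    "Example:\n"
    "Score: 4\n"
    "Justification: The generated answer is mostly correct but slightly different in tone.\n\n"
    "Now, provide the evaluation:\n"
)

# Hypothetical evaluator reply, made up for this illustration.
completion = "Score: 2\nJustification: The refusal is correct but omits the offer to help."

# Parsing prompt + completion picks up the template's example score.
print(parse_evaluation(prompt_tail + completion)[0])  # 4
# Parsing the completion alone yields the evaluator's own score.
print(parse_evaluation(completion)[0])  # 2
```

Parsing only the text generated after the prompt avoids the problem regardless of what the template contains.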


In [None]:
!pip install -q google-generativeai