# **📝 MCQ_Test**

When designing an MCQ exam for a math syllabus, there are several widely used standards and best practices that help ensure the questions accurately measure understanding of every concept and remain fair for all students. Here’s a structured approach:

### 1. **Blueprint/Content Outline**
   - **Define Learning Objectives**: Break down the syllabus into key topics and objectives. Ensure questions cover all key areas, ranging from foundational to advanced concepts.
   - **Weightage of Topics**: Distribute questions based on the importance of each topic. For example, if algebra covers 40% of the syllabus, 20 out of 50 questions should focus on algebra.
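
The weightage rule above can be sketched in a few lines of Python. This is only an illustration: the topic names and weights are hypothetical, and `allocate_questions` is a helper introduced here, not part of the notebook. It uses a largest-remainder rounding so the counts always add up to the exam length.

```python
from math import floor


def allocate_questions(weights, total_questions):
    """Split total_questions across topics in proportion to their weights."""
    total_weight = sum(weights.values())
    # Exact (possibly fractional) share of questions per topic.
    exact = {t: total_questions * w / total_weight for t, w in weights.items()}
    counts = {t: floor(x) for t, x in exact.items()}
    leftover = total_questions - sum(counts.values())
    # Give the remaining questions to the topics with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda t: exact[t] - counts[t], reverse=True)
    for topic in by_remainder[:leftover]:
        counts[topic] += 1
    return counts


# Hypothetical syllabus weights (percent of syllabus per topic).
weights = {"algebra": 40, "geometry": 35, "statistics": 25}
allocation = allocate_questions(weights, 50)
print(allocation)  # algebra receives 20 of the 50 questions
```

Rounding each topic independently can leave the exam one question short or long; handing out the leftover questions by remainder avoids that.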

### 2. **Cognitive Levels (Bloom’s Taxonomy)**
   Ensure that the questions assess different cognitive levels:
   - **Knowledge/Recall** (e.g., definitions, basic formulae): 20-30%
   - **Comprehension/Understanding** (e.g., interpreting graphs, understanding relationships): 20-30%
   - **Application** (e.g., solving problems using formulas or concepts): 30-40%
   - **Analysis/Evaluation** (e.g., analyzing complex problems, critical thinking): 10-15%

   This ensures that the exam measures both basic understanding and higher-order thinking skills.
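
One way to check a draft exam against these ranges is to label each question with its cognitive level and compare the shares. The sketch below assumes such labels already exist; the level names and the `check_cognitive_mix` helper are illustrative, not part of the notebook.

```python
from collections import Counter

# Target ranges from the list above, expressed as fractions of the exam.
TARGET_RANGES = {
    "recall": (0.20, 0.30),
    "comprehension": (0.20, 0.30),
    "application": (0.30, 0.40),
    "analysis": (0.10, 0.15),
}


def check_cognitive_mix(levels):
    """Return {level: (share, within_range)} for a list of per-question labels."""
    counts = Counter(levels)
    total = len(levels)
    report = {}
    for level, (low, high) in TARGET_RANGES.items():
        share = counts[level] / total
        report[level] = (share, low <= share <= high)
    return report


# Hypothetical labels for a 40-question exam.
labels = (["recall"] * 10 + ["comprehension"] * 10
          + ["application"] * 15 + ["analysis"] * 5)
for level, (share, ok) in check_cognitive_mix(labels).items():
    print(f"{level}: {share:.1%} {'ok' if ok else 'out of range'}")
```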

### 3. **Question Construction**
   - **Clear and Concise Wording**: Avoid ambiguous language or unnecessarily complex phrasing.
   - **Single Concept per Question**: Each question should focus on one key concept to avoid confusion.
   - **Avoid Tricky Questions**: The goal is to assess knowledge, not to confuse students with tricky or misleading questions.

### 4. **Plausible Distractors (Wrong Options)**
   - **Plausible Distractors**: Incorrect answers should be reasonable but clearly wrong if the student knows the concept. This challenges students without being unfair.
   - **Avoid “All of the Above” or “None of the Above”**: These options can sometimes skew the assessment if used improperly. Use them sparingly.

### 5. **Difficulty Levels**
   - **Balanced Difficulty**: Distribute questions across easy (30%), moderate (50%), and difficult (20%) levels. This helps in differentiating student performance while ensuring that most students can answer a significant portion of the exam.

### 6. **Randomization and Fairness**
   - **Avoid Patterned Answers**: Ensure that the correct answers are randomly distributed (e.g., avoid too many consecutive "B" answers).
   - **Shuffling**: If delivering the exam digitally, shuffle questions to minimize the chances of students copying answers.
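
Both checks are easy to automate. The following is a minimal sketch, assuming the answer key is a list of option letters and the questions are any list; `longest_run` and `shuffle_questions` are hypothetical helper names.

```python
import random


def longest_run(answer_key):
    """Length of the longest streak of identical consecutive answers."""
    longest = current = 0
    previous = None
    for answer in answer_key:
        current = current + 1 if answer == previous else 1
        longest = max(longest, current)
        previous = answer
    return longest


def shuffle_questions(questions, seed=None):
    """Return a shuffled copy; a fixed seed makes the order reproducible."""
    rng = random.Random(seed)
    shuffled = list(questions)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical answer key with three consecutive "B" answers at the start.
key = ["B", "B", "B", "A", "C", "D", "A", "B", "C", "D"]
print(longest_run(key))  # 3
print(shuffle_questions(range(1, 11), seed=42))
```

Using a per-student seed (for example, derived from a student ID) would give each student a different but reproducible question order.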

### 7. **Cultural and Gender Sensitivity**
   - Ensure that the content of the questions does not favor any cultural or gender group. Math questions should remain neutral and universally applicable.

### 8. **Time Management**
   - **Appropriate Timing**: Each question should be solvable within a set time frame, based on the complexity of the syllabus. Typically, allow 1-2 minutes per question depending on the difficulty, making sure students can reasonably complete the exam within the allotted time.
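
As a rough illustration of the 1-2 minute guideline, the exam duration can be estimated from the difficulty mix. The per-level minutes below are assumptions chosen inside that range, not values from the guideline itself.

```python
# Assumed minutes per question by difficulty, within the 1-2 minute guideline.
MINUTES_PER_QUESTION = {"easy": 1.0, "moderate": 1.5, "difficult": 2.0}


def estimated_minutes(difficulty_counts):
    """Total time estimate for an exam given {difficulty: number_of_questions}."""
    return sum(
        MINUTES_PER_QUESTION[level] * count
        for level, count in difficulty_counts.items()
    )


# A 50-question exam split 30% / 50% / 20% across the three levels.
counts = {"easy": 15, "moderate": 25, "difficult": 10}
print(estimated_minutes(counts))  # 72.5
```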

### 9. **Review and Validation**
   - **Peer Review**: Have other teachers or subject matter experts review the questions for clarity, relevance, and fairness.
   - **Pilot Testing**: Consider running a small pilot test or review with a group of students to identify any issues with question clarity or difficulty distribution.

### 10. **Scoring and Feedback**
   - **Equal Weighting**: Ensure each question carries equal marks unless specific questions are designed to carry more weight.
   - **Detailed Feedback**: Provide feedback on wrong answers, explaining the correct solution or directing students to relevant material to enhance learning post-exam.
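
A minimal sketch of equal-weight scoring with per-question feedback follows. The data structures (dicts keyed by question number) and the `grade` helper are assumptions made for illustration.

```python
def grade(answer_key, responses, explanations):
    """Score one student and collect feedback for every missed question.

    Each question is worth one mark; unanswered questions count as wrong.
    """
    score = 0
    feedback = {}
    for number, correct in answer_key.items():
        if responses.get(number) == correct:
            score += 1
        else:
            feedback[number] = explanations.get(
                number, "Review the related topic."
            )
    return score, feedback


# Hypothetical three-question exam.
answer_key = {1: "A", 2: "C", 3: "B"}
explanations = {2: "The derivative of sin(x) is cos(x), not -cos(x)."}
score, feedback = grade(answer_key, {1: "A", 2: "D"}, explanations)
print(score)     # 1
print(feedback)  # feedback for questions 2 and 3
```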

These standards help create a comprehensive, fair, and effective MCQ exam that assesses students' mastery of the math syllabus and ensures an equitable testing environment.

In [None]:
!pip install openai
!pip install PyPDF2
!pip install -U -q "google-generativeai>=0.7.2"

In [152]:
import torch
import PyPDF2
import re
import time
from openai import OpenAI
import html
from IPython.display import display, Markdown, HTML
import google.generativeai as genai
from google.colab import userdata

In [153]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Markdown(f"## Current device: {device}")

## Current device: cpu

In [154]:
PDF_PATH        = '/content/calculus_contents.pdf'
GPT_KEY_PATH    = '/content/key.txt'
GEMINI_KEY_PATH = '/content/key2.txt'
EXAM_PATH       = '/content/exam.txt'

In [155]:
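def extract_pdf_text(file_path):
    """Return the text of every page in the PDF at file_path, one page per line block."""
    with open(file_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        # Pages with no extractable text (e.g. scanned images) contribute an empty string.
        pages = [page.extract_text() or '' for page in pdf_reader.pages]

    # Join with newlines so words at page boundaries do not run together.
    return '\n'.join(pages)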


# Extract the text from the PDF
pdf_content = extract_pdf_text(PDF_PATH)

In [157]:
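# Read the exam text and the API keys. strip() removes the trailing newline
# that most editors add, which would otherwise become part of the key.
with open(EXAM_PATH, 'r') as file:
    exam = file.read()

with open(GPT_KEY_PATH, 'r') as file:
    GPT_API_KEY = file.read().strip()

with open(GEMINI_KEY_PATH, 'r') as file:
    GEMINI_API_KEY = file.read().strip()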

In [158]:
client = OpenAI(
    api_key = GPT_API_KEY
)

genai.configure(api_key = GEMINI_API_KEY)

In [159]:
GPT_MODEL = "gpt-4o"
GEMINI_MODEL = "gemini-1.5-pro"

MAX_TOKENS = 3000
TEMP = 0.2

In [160]:
system_instructions = """
          You are an expert AI assistant with advanced reasoning capabilities. Your task is to provide detailed, step-by-step explanations of your thought process. For each step:

          1. Provide a clear, concise title describing the current reasoning phase.
          2. Elaborate on your thought process in the content section.
          3. Decide whether to continue reasoning or provide a final answer.

          Key Instructions:
          - Employ at least 5 distinct reasoning steps.
          - Acknowledge your limitations as an AI and explicitly state what you can and cannot do.
          - Actively explore and evaluate alternative answers or approaches.
          - Critically assess your own reasoning; identify potential flaws or biases.
          - When re-examining, employ a fundamentally different approach or perspective.
          - Utilize at least 3 diverse methods to derive or verify your answer.
          - Incorporate relevant domain knowledge and best practices in your reasoning.
          - Quantify certainty levels for each step and the final conclusion when applicable.
          - Consider potential edge cases or exceptions to your reasoning.
          - Provide clear justifications for eliminating alternative hypotheses.

          Don't include the thought steps in your response.

          """

assistant_instructions = "I will think step by step following my instructions, starting at the beginning after decomposing the problem."

In [161]:
def generate_response_gpt(model, tokens, temp, prompt):
  return client.chat.completions.create(

      model = model,
      max_tokens = tokens,
      temperature = temp,

      messages=[
        {
          "role": "system",
          "content": system_instructions
        },

        {
          "role" : "user",
          "content" : prompt
        },

        {
          "role" : "assistant",
          "content" : assistant_instructions
        }
      ]
  )

In [162]:
def generate_response_gemini(tokens, temp):
  return genai.GenerativeModel(
      model_name = GEMINI_MODEL,
      generation_config = {
          "temperature": temp,
          "max_output_tokens": tokens
      },
      system_instruction = system_instructions
  )

In [163]:
syllabus_prompt = f"rewrite this text with meaningful data: {pdf_content}"

syllabus = generate_response_gpt(GPT_MODEL, MAX_TOKENS, TEMP, syllabus_prompt).choices[0].message.content

In [164]:
n_questions = 10

prompt = f"""
As an expert mathematics teacher, you will be given an MCQ exam consisting of {n_questions} questions and their choices. Your task is to evaluate its questions step by step, using advanced reasoning.
Your final evaluation will be a score out of 10 points based on these criteria:

1. Blueprint/Content Outline
    Define Learning Objectives: Break down the syllabus into key topics and objectives. Ensure questions cover all key areas, ranging from foundational to advanced concepts.
    Weightage of Topics: Distribute questions based on the importance of each topic. For example, if algebra covers 40% of the syllabus, 20 out of 50 questions should focus on algebra.

2. Cognitive Levels (Bloom’s Taxonomy)
  Ensure that the questions assess different cognitive levels:
    Knowledge/Recall (e.g., definitions, basic formulae): 20-30%
    Comprehension/Understanding (e.g., interpreting graphs, understanding relationships): 20-30%
    Application (e.g., solving problems using formulas or concepts): 30-40%
    Analysis/Evaluation (e.g., analyzing complex problems, critical thinking): 10-15%
  This ensures that the exam measures both basic understanding and higher-order thinking skills.

3. Question Construction
    Clear and Concise Wording: Avoid ambiguous language or unnecessarily complex phrasing.
    Single Concept per Question: Each question should focus on one key concept to avoid confusion.
    Avoid Tricky Questions: The goal is to assess knowledge, not to confuse students with tricky or misleading questions.

4. Plausible Distractors (Wrong Options)
    Plausible Distractors: Incorrect answers should be reasonable but clearly wrong if the student knows the concept. This challenges students without being unfair.
    Avoiding “All of the Above” or “None of the Above”: These options can sometimes skew the assessment if used improperly. Use them sparingly.

5. Difficulty Levels
    Balanced Difficulty: Distribute questions across easy (30%), moderate (50%), and difficult (20%) levels. This helps in differentiating student performance while ensuring that most students can answer a significant portion of the exam.

6. Randomization and Fairness
    Avoid Patterned Answers: Ensure that the correct answers are randomly distributed (e.g., avoid too many consecutive "B" answers).
    Shuffling: If delivering the exam digitally, shuffle questions to minimize the chances of students copying answers.

7. Cultural and Gender Sensitivity
    Ensure that the content of the questions does not favor any cultural or gender group. Math questions should remain neutral and universally applicable.

8. Time Management
    Appropriate Timing: Each question should be solvable within a set time frame, based on the complexity of the syllabus. Typically, allow 1-2 minutes per question depending on the difficulty, making sure students can reasonably complete the exam within the allotted time.

Here's the syllabus: {syllabus}

Here's the exam to be evaluated : {exam}

Add a highlighted final line showing the final score from 0 to 10.

"""

In [165]:
# Generate content using the GPT model
gpt_response = generate_response_gpt(GPT_MODEL, MAX_TOKENS, TEMP, prompt).choices[0].message.content

In [166]:
# Generate content using the Gemini model
gemini_response = generate_response_gemini(MAX_TOKENS, TEMP).generate_content(prompt).text

In [167]:
# Display the GPT response
display(Markdown("# GPT Evaluation:"))
display(Markdown(gpt_response))

# Display the Gemini response
display(Markdown("# Gemini Evaluation:"))
display(Markdown(gemini_response))

# GPT Evaluation:

**1. Blueprint/Content Outline Evaluation**

The exam consists of 10 questions, and the syllabus covers four main areas: Differentiation and Its Applications, The Calculus of Exponential and Logarithmic Functions, Behaviors of Functions and Curve Sketching, and The Definite Integral and Its Applications. Each area is represented in the exam, ensuring coverage of key topics. However, the distribution is not perfectly aligned with the syllabus weightage. For example, differentiation has a significant presence, but integration and curve sketching are underrepresented. This could lead to an imbalance in assessing students' understanding across all topics.

**2. Cognitive Levels (Bloom’s Taxonomy) Evaluation**

The exam questions cover various cognitive levels. Questions 1, 2, 3, 6, and 7 assess knowledge and comprehension, focusing on basic differentiation and integration skills. Questions 4, 5, 8, and 9 require application and analysis, involving problem-solving and critical thinking. However, there is a lack of questions that challenge students at the evaluation level, such as analyzing complex problems or making judgments based on criteria. This limits the exam's ability to assess higher-order thinking skills comprehensively.

**3. Question Construction Evaluation**

The questions are generally clear and concise, with each focusing on a single concept. For example, Question 1 asks for the derivative of a trigonometric function, and Question 5 involves related rates. However, some questions, like Question 8, could be more explicit in their wording to avoid confusion. Additionally, there are no tricky questions, which aligns with best practices in question construction.

**4. Plausible Distractors Evaluation**

The distractors in the exam are plausible and challenge students' understanding without being unfair. For instance, in Question 1, the distractors are common misconceptions about derivatives of trigonometric functions. However, the exam could benefit from avoiding options like "None of the Above" or "All of the Above," which are not present but should be noted for future exams.

**5. Difficulty Levels Evaluation**

The exam has a balanced difficulty distribution, with questions ranging from easy (e.g., Question 3) to moderate (e.g., Question 6) and difficult (e.g., Question 9). This distribution helps differentiate student performance while ensuring that most students can answer a significant portion of the exam. However, the exam could include more challenging questions to better assess students' mastery of advanced concepts.

**6. Randomization and Fairness Evaluation**

The correct answers are randomly distributed, avoiding any discernible patterns. This reduces the likelihood of students guessing answers based on patterns. Additionally, the exam is fair, with no cultural or gender biases present in the questions. The mathematical content is universally applicable, ensuring fairness for all students.

**7. Time Management Evaluation**

The exam is designed to be completed within a reasonable time frame, with each question solvable within 1-2 minutes. This allows students to complete the exam within the allotted time, assuming they have a good understanding of the material. However, the time allocation could be adjusted for more complex questions to ensure students have enough time to think critically.

**Final Score Evaluation**

Based on the evaluation criteria, the exam scores 8 out of 10 points. The exam effectively covers the syllabus and assesses various cognitive levels, but it could improve in distributing questions according to topic weightage and including more higher-order thinking questions. Additionally, some questions could be more explicit in their wording to avoid confusion. Overall, the exam is well-constructed and fair, providing a good assessment of students' understanding of calculus concepts.

**Final Score: 8/10**

# Gemini Evaluation:

## Final Response: The Calculus MCQ exam receives a score of **7/10**. 

While the exam demonstrates some strengths, there are areas for improvement to enhance its alignment with the syllabus and assessment best practices. 
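
To compare the two evaluations programmatically, the final score can be pulled out of each response with a regular expression. This is a sketch that assumes the models keep reporting scores as "N/10" or "N out of 10", as in the outputs above; `extract_score` is a hypothetical helper, and the sample strings are copied from those outputs.

```python
import re


def extract_score(response_text):
    """Return the last 'N/10' or 'N out of 10' score in the text, or None."""
    pattern = r"(\d+(?:\.\d+)?)\s*(?:/|out of)\s*10\b"
    matches = re.findall(pattern, response_text)
    # The final verdict usually comes last, so prefer the last match.
    return float(matches[-1]) if matches else None


print(extract_score("**Final Score: 8/10**"))  # 8.0
print(extract_score("The Calculus MCQ exam receives a score of **7/10**."))  # 7.0
```

In the notebook, this could be applied to `gpt_response` and `gemini_response` to average or compare the two scores.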
