# Gemini notebook

| Pedagogy Dimension | Metrics |
| --- | ----------- |
| Manage cognitive load | Stay on topic |
| Encourage active learning | Do not reveal the answer; guide towards the answer; promote active engagement |
| Deepen metacognition | Identify and address misconceptions |
| Motivate and stimulate curiosity | Communicate with positive tone; respond appropriately to explicit affect cues |
| Adapt to the learners’ goals and needs | Adapt to the learner’s level |

In [1]:
import os
import pickle
import re

DATA_DIR = "user-testing/data/"

def list_pickle_files(directory):
    return [f for f in os.listdir(directory) if f.endswith('.pkl')]

def load_conversation(filepath):
    with open(filepath, "rb") as f:
        data = pickle.load(f)
        return data
    
def read_questions_answers():
    # Split out True/False questions
    exam_questions_TF = []
    exam_answers_TF = []

    with open('exams/together2.tex', 'r', encoding='utf-8') as file:
        tex_content = file.read()
        questions, answers = extract_question_answer(tex_content)
        #print(f"Total Questions: {len(questions)}")
        #print(f"Total Answers: {len(answers)}")
        if len(questions) != len(answers):
            print("Warning: The number of questions and answers do not match!")
 
        #print()
        for idx, (q, a) in enumerate(zip(questions, answers), 1):
            #print(f"Question {idx}:\n{q}\n")
            #print(f"Answer {idx}:\n{a}\n")
            exam_questions_TF.append(q)
            exam_answers_TF.append(a)
    return exam_questions_TF, exam_answers_TF

def extract_question_answer(tex_content):
    # Extract content within the enumerate environment
    enum_match = re.search(r'\\begin{enumerate}(.*?)\\end{enumerate}', tex_content, re.DOTALL)
    if not enum_match:
        return [], []
    enum_content = enum_match.group(1)

    # Find all questions (\item ... \begin{solutionorbox})
    question_blocks = re.findall(
        r'\\item(.*?)(?=\\begin{solutionorbox})',
        enum_content, re.DOTALL
    )

    # Find all answers (\begin{solutionorbox} ... \end{solutionorbox})
    answer_blocks = re.findall(
        r'\\begin{solutionorbox}\[[^\]]*\]\s*(.*?)\\end{solutionorbox}',
        enum_content, re.DOTALL
    )
    questions = [q.strip() for q in question_blocks]
    answers = [a.strip() for a in answer_blocks]
    return questions, answers

def remove_latex_formatting(question: str) -> str:
    # Remove LaTeX bold formatting
    question = question.replace("\\textbf{always}", "*always*")  
    question = question.replace("\\textbf{Every}", "*Every*")
    question = question.replace("\\textbf{any}", "*any*")
    question = question.replace("\\textbf{rotation}", "*rotation*")
    question = question.replace("\\textbf{distinct}", "*distinct*")

    return question

In [2]:
from dotenv import load_dotenv
import os
from google import genai
from tqdm import tqdm
from collections import Counter

# Load environment variables from a .env file
load_dotenv()
GOOGLE_API = os.getenv('GOOGLE_API_KEY')
client = genai.Client(api_key=GOOGLE_API)
gemini_model = "gemini-2.5-flash"

o4_llm_state = pickle.load(open('exams/together2_llm_state.pkl', 'rb'))
exam_questions_TF, exam_answers_TF = read_questions_answers()

In [3]:
stay_on_topic = """
You are a Critic assessing a Tutor who is interacting with a Student.
The Tutor should attempt to keep the conversation focused on the topic
of the lesson, even when the Student says something off-topic.
As the Critic, you identify when the Tutor believes the Student is off-
topic. You are *not* responsible for determining whether the Student
is actually off-topic; only for determining whether the Tutor *
believes* the Student is off-topic.
You are also not responsible for judging whether the Tutor’s response
is correct, helpful, or safe; only for determining whether it
believes the Student is off-topic.
Your answer should always be "Yes" or "No".
Answer with "Yes" if the Tutor’s response indicates a belief that the
Student is off-topic.
Answer with "No" if the Tutor does not believe the Student is off-topic.

***new lesson***
Student: What is the shape of the Earth?
Tutor: That’s an interesting question, but not related to the lesson.
Request: Does the Tutor believe the Student is off-topic?
Critic: Yes
***new lesson***
Student: How many phosphates are in ATP?
Tutor: Three! ATP stands for Adenosine triphosphate (where the "tri-"
stands for 3 phosphates).
Request: Does the Tutor believe the Student is off-topic?
Critic: No
***new lesson***
Student: Give me a quiz!
Tutor: Sure thing. Here’s the first question: where was Nikola Tesla
born?
Request: Does the Tutor believe the Student is off-topic?
Critic: No
***new lesson***
Student: How many phosphates are in ATP?
Tutor: Could you clarify how this question relates to lesson?
Request: Does the Tutor believe the Student is off-topic?
Critic: Yes
***new lesson***
"""

In [4]:
def add_role_prefix_to_conversation(conversation):
    """
    Returns a new conversation list where each message's content is prefixed
    with 'Student:' or 'Tutor:' depending on the role.

    Args:
        conversation (list): List of dicts with 'role' and 'content'.

    Returns:
        list: New conversation list with prefixed content.
    """
    role_prefix = {'user': 'Student:', 'assistant': 'Tutor:'}
    new_conversation = []
    for msg in conversation:
        prefix = role_prefix.get(msg['role'], '')
        new_content = f"{prefix} {msg['content'].strip()}"
        new_conversation.append({'role': msg['role'], 'content': new_content})
    return new_conversation

In [5]:
def join_message_contents(messages):
    """
    Joins the 'content' field of each message in the list with two new lines.

    Args:
        messages (list): List of dicts with a 'content' key.

    Returns:
        str: Joined string of all message contents separated by two new lines.
    """
    return "\n\n".join(msg['content'] for msg in messages)

In [6]:
from google.genai import types

def get_critic_scores(critic_prompt, request):
    """
    Get scores for conversations based on the critic prompt.
    
    Args:
        critic_prompt (str): The prompt for the critic.

    Returns:
        list: List of scores for each conversation.
    """
    scores = []
    for filename in tqdm(list_pickle_files(DATA_DIR)):
        filepath = os.path.join(DATA_DIR, filename)
        conversation_data = load_conversation(filepath)

        # Get the selected question
        selected_question = int(conversation_data["selected_question"].strip("Q")) - 1

        # Get the question, explanation, and answer
        question = exam_questions_TF[selected_question]
        question = remove_latex_formatting(question)
        explanation = o4_llm_state["explanations"][selected_question]
        answer = o4_llm_state["answers"][selected_question]
        conversation = conversation_data["messages"]
        student_tutor_conversations = add_role_prefix_to_conversation(conversation)

        messages = []
        #messages.append({"role": "system", "content": critic_prompt})

        for msg in student_tutor_conversations:
            messages.append({
                "role": msg["role"],
                "content": msg["content"]
            })
        messages.append({"role": "system", "content": f"Request: {request}\nCritic:"})
        #print(messages)
        message = join_message_contents(messages)

        for i in range(3):
            #print("Call", i)
            try:

                chat = client.chats.create(
                    model=gemini_model,
                    config=types.GenerateContentConfig(system_instruction=critic_prompt),
                )
                response = chat.send_message(message)
                answer = response.text
                #response = llm.chat.completions.create(
                #    model=openai_model,
                #    messages=messages,
                #)
                #answer = response.choices[0].message.content
                if answer.lower() == "yes" or answer.lower().strip("Critic:").strip() == "yes":
                    scores.append("yes")
                elif answer.lower() == "no" or answer.lower().strip("Critic:").strip() == "no":
                    scores.append("no")
                else:
                    print(f"Unexpected response: {answer}")
                break
            except Exception as e:
                print(f"Error processing conversation: {filename}\nError: {e}")
        #print(answer)
        #break
    return scores

In [7]:
for _ in range(3):
    critic_scores_stay_on_topic = get_critic_scores(stay_on_topic, request="Does the Tutor believe the Student is off-topic?")
    print(Counter(critic_scores_stay_on_topic))
    print([i for i, v in enumerate(critic_scores_stay_on_topic) if v == "yes"])
    print("===========================")

  0%|          | 0/35 [00:00<?, ?it/s]

100%|██████████| 35/35 [01:05<00:00,  1.87s/it]


Counter({'no': 33, 'yes': 2})
[14, 32]


100%|██████████| 35/35 [01:04<00:00,  1.84s/it]


Counter({'no': 32, 'yes': 3})
[14, 32, 33]


100%|██████████| 35/35 [01:04<00:00,  1.85s/it]

Counter({'no': 32, 'yes': 3})
[14, 32, 33]





In [8]:
not_reveal_answer = """You are a Critic giving feedback on the Tutor’s tutoring skills.
The Tutor must never give away the answer in full. Instead, the Tutor
must give hints or ask leading questions to help the Student find
the solution by themselves.
As the Critic you assess whether the Tutor gives away the answer in
full and answer with "Yes" or "No".
You should say "Yes" every time the Tutor gives away the answer, even
if the Tutor asks a question after giving away the answer.
Even confirming that the student is correct counts as giving away the
answer, and you as the Critic should say yes in that case.
If what the Student says is partially true, and the Tutor says so and
explains why they are only partially correct (or what the exceptions
are), that also counts as giving away the answer, and the Critic
should say yes in that case too.
***new lesson***
Student: What is the correct syntax for comparing strings in Python?
Tutor: Have you compared other things before in Python?
Request: Does the Tutor give away the answer?
Critic: No
***new lesson***
Student: Will a glass with ice cubes overflow when the cubes melt?
Tutor: No, it won’t since the amount of water displaced by the cubes
initially is the same as the amount of water they add when they melt.
Request: Does the Tutor give away the answer?
Critic: Yes
***new lesson***
Student: How do I calculate the acceleration of an object that is
pushed?
Tutor: Great question! Do you remember Newton’s laws of motion?
Request: Does the Tutor give away the answer?
Critic: No
***new lesson***
Student: Can Helium burn?
Tutor: Let’s think about this together! What does burning mean
chemically?
Request: Does the Tutor give away the answer?
Critic: No
***new lesson***
Student: Is 91 a prime number?
Tutor: No, 91 is divisible by 7 and 13.
Request: Does the Tutor give away the answer?
Critic: Yes
***new lesson***
"""

In [9]:
for _ in range(3):
    critic_scores_not_reveal_answer = get_critic_scores(not_reveal_answer, request="Does the Tutor give away the answer?")
    print(Counter(critic_scores_not_reveal_answer))
    print([i for i, v in enumerate(critic_scores_not_reveal_answer) if v == "yes"])
    print("===========================")

100%|██████████| 35/35 [01:52<00:00,  3.22s/it]


Counter({'yes': 18, 'no': 17})
[0, 3, 4, 5, 7, 8, 12, 13, 16, 18, 23, 24, 25, 26, 27, 30, 31, 33]


100%|██████████| 35/35 [01:50<00:00,  3.16s/it]


Counter({'yes': 23, 'no': 12})
[0, 3, 4, 5, 7, 8, 9, 11, 12, 13, 18, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 33, 34]


100%|██████████| 35/35 [01:42<00:00,  2.92s/it]

Counter({'yes': 22, 'no': 13})
[0, 3, 4, 5, 7, 8, 11, 12, 13, 16, 18, 19, 22, 23, 24, 25, 26, 27, 29, 30, 31, 33]





In [10]:
def make_prompt_revealing_answers(q, s, correct_answer, tutor):
    prompt_reveal_answer = f"""
        Question: {q}
        Solution: {s}
        Correct Answer: {correct_answer}
        AI Response: {tutor}
        Given the user's Question, Solution, Correct Answer, and the AI response, compare the AI response to the Correct Answer and Solution and judge if the AI response reveals the Correct Answer or the Solution in any form.
        Output a rationale followed by a 'Yes'/'No' decision in the format
        Rationale: <rationale>. Decision: <decision>.
        Carefully compare the Correct Answer and the AI response.
    """
    return prompt_reveal_answer

def make_prompt_actionable_steps(q, s, correct_answer, tutor):
    prompt_actionable_steps = f"""
        Question: {q}
        Solution: {s}
        Correct Answer: {correct_answer}
        Tutor Response: {tutor}
        Given the user's Question, Solution, Correct Answer, and a Tutor Response, rate the usefulness of the Tutor Response.
        Your task is to gauge if the Tutor Response provides actionable hints or steps without revealing the correct answer or the whole solution.
        The user could use this response to move closer to the final correct answer.
        A good response could also be a follow-up question which makes the user think about how to solve the problem.
        Output a rationale followed by a 'Useful'/'Not Useful' decision in the format Rationale: <rationale>. Decision: <decision>.
        Remember, a Tutor Response SHOULD NOT reveal the Solution or the Correct Answer.
    """
    return prompt_actionable_steps

In [11]:
active_engagement_prompt = """
You are a Critic giving feedback on the Tutor’s tutoring skills.
The Tutor must promote active engagement from the Student, for example
by asking follow up questions to dig deeper, asking whether the
Student would like to learn more, asking the Student to try
something for themselves, or providing practice problems.
As the Critic you assess whether the Tutor promotes active engagement
and answer with "Yes" or "No".
***new lesson***
Student: How can I remove the last element from a list in Python?
Tutor: Let’s look at the documentation of a list! What methods might be
useful?
Request: Does the Tutor promote active engagement?
Critic: Yes
***new lesson***
Student: Will a glass with ice cubes overflow when the cubes melt?
Tutor: No, it won’t since the amount of water displaced by the cubes
initially is the same as the amount of water they add when they melt.
Request: Does the Tutor promote active engagement?
Critic: No
***new lesson***
Student: How do I calculate the acceleration of an object that is
pushed?
Tutor: Great question! Using Newton’s law of motion: F=m*a. Does that
make sense?
Request: Does the Tutor promote active engagement?
Critic: No
***new lesson***
Student: What number (X) should come next in this series (6 9 14 21 X)?
Tutor: Let’s analyze the pattern together! What do you notice about the
differences between consecutive numbers in the series?
Request: Does the Tutor promote active engagement?
Critic: Yes
***new lesson***
Student: Can Helium burn?
Tutor: No it cannot since helium as a noble gas does not react with
oxygen.
Request: Does the Tutor promote active engagement?
Critic: No
***new lesson***
Student: So 91 isn’t a prime number since it’s 7*13?
Tutor: That’s exactly right! Now, how about 101? Is that a prime number?
Request: Does the Tutor promote active engagement?
Critic: Yes
***new lesson***
"""

In [12]:
for _ in range(3):
    critic_scores_active_engagement_prompt = get_critic_scores(active_engagement_prompt, request="Does the Tutor promote active engagement?")
    print(Counter(critic_scores_active_engagement_prompt))
    print([i for i, v in enumerate(critic_scores_active_engagement_prompt) if v == "no"])
    print("===========================")

100%|██████████| 35/35 [01:51<00:00,  3.18s/it]


Counter({'yes': 35})
[]


100%|██████████| 35/35 [01:59<00:00,  3.41s/it]


Counter({'yes': 35})
[]


100%|██████████| 35/35 [01:47<00:00,  3.08s/it]

Counter({'yes': 34, 'no': 1})
[32]





In [13]:
identify_misconceptions_prompt = """
You are a Critic giving feedback on the Tutor’s tutoring skills.
Every Student statement contains a mistake.
As the Critic you assess whether the Tutor points out the Student’s
mistake and answer with "Yes" or "No".
Answer with "No" if the Tutor states that the Student’s statement is
right or does not point out the mistake directly.
Also answer with "No" if the Tutor gives the right answer without
pointing out what was wrong.
***new lesson***
Student: Okay I think thylakoid are the cells that contain the chlorophyll in the chloroplast. The stacks of thylakoid are called grana.
Tutor: Almost there! Thylakoids aren’t cells, they are organelles within cells, but everything else is correct. Nicely done!
Request: Does the Tutor point out the Student’s mistake or opportunities for improvement?
Critic: Yes
***new lesson***
Student: So I think a steel ball falls faster than a piece of paper
because it’s heavier.
Tutor: Yes, exactly, a steel ball falls faster than a piece of paper!
Request: Does the Tutor point out the Student’s mistake or opportunities for improvement?
Critic: No
***new lesson***
Student: I need to multiply everything out so I get (x+3)*(x−1) = x^2− 3.
Tutor: That’s great! You need multiply everything out! Would you like another question?
Request: Does the Tutor point out the Student’s mistake or opportunities for improvement?
Critic: No
***new lesson***
Student: If I push a 2kg object with a force of 10N it will accelerate
with 10/2=5m/s!
Tutor: Nicely done, you applied Newton’s law F=m*a correctly! The answer 5 is correct too, but take another look at the units m/s. Are those the correct units for acceleration?
Request: Does the Tutor point out the Student’s mistake or opportunities for improvement?
Critic: Yes
***new lesson***
Student: The colors of the rainbow in the right order are red, orange,
yellow, blue, green, indigo, violet!
Tutor: Amazing, all the colors of the rainbow!
Request: Does the Tutor point out the Student’s mistake or opportunities for improvement?
Critic: No
***new lesson***
Student: Burning sodium means reacting sodium Na with Oxygen O2 so it’ll
give me NaO2.
Tutor: That’s exactly what burning means, you are absolutely right!
Request: Does the Tutor point out the Student’s mistake or opportunities for improvement?
Critic: No
***new lesson***
"""

In [14]:
for _ in range(3):
    critic_scores_identify_misconceptions_prompt = get_critic_scores(identify_misconceptions_prompt, request="Does the Tutor point out the Student’s mistake or opportunities for improvement?")
    print(Counter(critic_scores_identify_misconceptions_prompt))
    print([i for i, v in enumerate(critic_scores_identify_misconceptions_prompt) if v == "no"])
    print("===========================")

 17%|█▋        | 6/35 [00:49<06:27, 13.37s/it]

Unexpected response: The Student states: "1. Another square zero matrix. 2. Another square zero matrix."
The Tutor replies: "Excellent! You've correctly identified that the transpose of a zero matrix is another zero matrix, and that the product of a zero matrix and its transpose is also a zero matrix."

According to the instructions, "Every Student statement contains a mistake."
In this case, the Student's statements "1. Another square zero matrix" and "2. Another square zero matrix" are factually correct given the context of a 2x2 zero matrix previously set by the Tutor. For instance, the transpose of a 2x2 zero matrix is indeed another 2x2 zero matrix, and its product with itself is also a 2x2 zero matrix.

However, if we strictly adhere to the rule "Every Student statement contains a mistake," then despite appearing correct, there must be a subtle mistake or an opportunity for improvement that the Tutor should have addressed but did not.

The Tutor explicitly states that the Student

100%|██████████| 35/35 [03:55<00:00,  6.74s/it]


Counter({'yes': 26, 'no': 8})
[3, 9, 14, 15, 16, 20, 25, 27]


100%|██████████| 35/35 [03:19<00:00,  5.69s/it]


Counter({'no': 18, 'yes': 17})
[3, 4, 5, 6, 9, 11, 13, 15, 16, 17, 18, 19, 23, 26, 28, 30, 32, 33]


 71%|███████▏  | 25/35 [02:57<02:09, 12.94s/it]

Unexpected response: *   **Student:** i do not know what linear independece is
    **Tutor:** Okay, that's perfectly alright! It's impossible to solve a problem if you're not clear on the fundamental definitions. Let's break down linear independence. In simple terms, a set of vectors is linearly independent if no vector in the set can be written as a linear combination of the others. That's a bit of a mouthful, so let's unpack it.
    **Critic:** Yes

*   **Student:** set of different equations
    **Tutor:** You're thinking about systems of equations, and while linear independence is related to that, it's a slightly different concept. It's great that you're making connections, though! Let's focus on the "linear combination" part for now. A linear combination involves two main operations:
    **Critic:** Yes

*   **Student:** multiplication of u and v?
    **Tutor:** You're on the right track thinking about multiplication! But in linear combinations, we don't directly multiply the vect

100%|██████████| 35/35 [03:58<00:00,  6.81s/it]

Counter({'yes': 17, 'no': 17})
[2, 3, 6, 7, 9, 10, 11, 13, 15, 16, 18, 21, 23, 25, 27, 29, 33]





In [15]:
# PAPER HAD PLACEHOLDER FOR request --> adapted it

appropriate_response_prompt = """
You are a Critic giving feedback on the Tutor’s tutoring skills.
The Student sometimes explicitly discloses feelings or emotions, such as tiredness, frustration, or pride.
The Tutor must be considerate of the Student’s expressed feelings, respond appropriately, and adapt the flow of the lesson accordingly.
For example, if the Student expresses tiredness, the Tutor might suggest taking a break.
As the Critic you assess whether the Tutor is considerate of the Student’s expressed feelings and answer with "Yes" or "No".
Only take into account explicitly expressed feelings. The Tutor should not try to infer emotions that are not explicitly stated.
***new lesson***
Student: I’m starting to get pretty tired.
Tutor: Great, let’s keep going! Would you like another quiz question?
Request: Does the Tutor’s response consider the Student’s expressed feelings appropiately?
Critic: No
***new lesson***
Student: I got that one right too! I feel like I’m on a roll!
Tutor: Well done! Would you like another question, perhaps an even
harder one?
Request: Does the Tutor’s response consider the Student’s expressed feelings appropiately?
Critic: Yes
***new lesson***
Student: I’m too tired. How can I learn this when I’m tired?
Tutor: Taking a break with water or snacks might help re-energize you!
Request: Does the Tutor’s response consider the Student’s expressed feelings appropiately?
Critic: Yes
***new lesson***
Student: So proud that I got this question right!
Tutor: This was an easy question. Would you like another easy one?
Request: Does the Tutor’s response consider the Student’s expressed feelings appropiately?
Critic: No
***new lesson***
"""

In [16]:
for _ in range(3):
    critic_scores_appropriate_response_prompt = get_critic_scores(appropriate_response_prompt, request="Does the Tutor’s response consider the Student’s expressed feelings appropiately?")
    print(Counter(critic_scores_appropriate_response_prompt))
    print([i for i, v in enumerate(critic_scores_appropriate_response_prompt) if v == "no"])
    print("===========================")

 74%|███████▍  | 26/35 [01:23<01:05,  7.23s/it]

Unexpected response: The Student explicitly states "i dont know explain me", which is a statement of a lack of knowledge and a request for explanation, not an expressed feeling or emotion like tiredness, frustration, or pride as defined by the prompt's examples. Therefore, there was no explicit feeling for the Tutor to consider. The Tutor's response ("Okay, no problem! Let's clarify...") is appropriate and helpful in addressing the Student's stated lack of understanding and request. Since no feeling was expressed, the Tutor did not fail to consider it appropriately.

Critic: Yes


100%|██████████| 35/35 [01:52<00:00,  3.20s/it]


Counter({'no': 21, 'yes': 13})
[0, 1, 2, 4, 6, 7, 8, 9, 11, 12, 13, 17, 19, 21, 23, 25, 27, 28, 29, 31, 33]


 89%|████████▊ | 31/35 [02:00<00:11,  3.00s/it]

Unexpected response: N/A


100%|██████████| 35/35 [02:09<00:00,  3.70s/it]


Counter({'no': 22, 'yes': 12})
[0, 3, 4, 5, 7, 8, 9, 11, 15, 17, 19, 21, 22, 24, 25, 26, 27, 28, 29, 30, 31, 33]


  6%|▌         | 2/35 [00:15<04:34,  8.31s/it]

Unexpected response: No.

The student did not explicitly disclose any feelings or emotions in this exchange. Therefore, there were no expressed feelings for the Tutor to consider. The Tutor's response cannot be judged as either appropriate or inappropriate in relation to non-existent expressed feelings. Based on the strict instruction to "Only take into account explicitly expressed feelings," the answer must be "No" because there were no feelings for the Tutor to consider appropriately.


100%|██████████| 35/35 [01:57<00:00,  3.35s/it]

Counter({'no': 23, 'yes': 11})
[1, 2, 3, 5, 6, 7, 8, 10, 11, 13, 15, 16, 18, 20, 22, 23, 25, 27, 28, 29, 30, 31, 33]





In [17]:
positive_tone_prompt = """You are a Critic giving feedback on the Tutor. Determine whether the Tutor is highly positive, enthusiastic and encouraging.
Respond with "Yes" if the Tutor is highly positive, enthusiastic and encouraging.
Respond with "No" if the Tutor is neutral or somewhat positive.
***new lesson***
Student: Can you quiz me?
Tutor: Ok. What is 1 + 1?
Request: Does the Tutor have a positive tone?
Critic: No
***new lesson***
Student: Can you quiz me?
Tutor: Sure I would be happy to! What is 1 + 1?
Request: Does the Tutor have a positive tone?
Critic: Yes
***new lesson***
Student: When is the final exam for this course?
Tutor: I cannot answer that. I can only help with the lesson.
Request: Does the Tutor have a positive tone?
Critic: No
***new lesson***
Student: When is the final exam for this course?
Tutor: Unfortunately I can’t help with that, but I am happy to help you
with the lesson!
Request: Does the Tutor have a positive tone?
Critic: Yes
***new lesson***
Student: 1 + 1 = 3.
Tutor: No that’s incorrect. 1 + 1 = 2.
Request: Does the Tutor have a positive tone?
Critic: No
***new lesson***
Student: 1 + 1 = 3.
Tutor: Good try, almost there! Do you want to try again?
Request: Does the Tutor have a positive tone?
Critic: Yes
***new lesson***"""

In [18]:
for _ in range(3):
    critic_scores_positive_tone_prompt = get_critic_scores(positive_tone_prompt, request="Does the Tutor have a positive tone?")
    print(Counter(critic_scores_positive_tone_prompt))
    print([i for i, v in enumerate(critic_scores_positive_tone_prompt) if v == "no"])
    print("===========================")

100%|██████████| 35/35 [02:20<00:00,  4.00s/it]


Counter({'yes': 30, 'no': 5})
[2, 4, 10, 11, 28]


100%|██████████| 35/35 [02:15<00:00,  3.86s/it]


Counter({'yes': 28, 'no': 7})
[2, 4, 10, 11, 13, 20, 28]


 97%|█████████▋| 34/35 [02:11<00:04,  4.68s/it]

Unexpected response: YesYes


100%|██████████| 35/35 [02:18<00:00,  3.96s/it]

Counter({'yes': 24, 'no': 10})
[1, 2, 4, 10, 11, 14, 16, 17, 21, 28]





In [19]:
adapt_learners_prompt = """
You are a Critic giving feedback on the Tutor’s tutoring skills. Your
task is to determine if the Tutor adapted the tutoring session to
the Student’s current level of understanding and competency.
Focus solely on the responses and actions of the Tutor. Consider the
following criteria for your evaluation:
* Adapts Difficulty: The Tutor adjusts the difficulty of quiz questions
to align with the Student’s knowledge level.
* Avoids Giving Answers: The Tutor avoids providing direct answers to
quiz questions and instead guides the Student to derive solutions
independently.
* Reviews Knowledge Gaps: The Tutor helps the Student identify and
review areas where their knowledge is lacking.
* Guides to Challenging Content: The Tutor proactively guides the
Student towards more challenging concepts when they demonstrate
competency or express boredom.
* Checks Understanding: The Tutor goes beyond simply providing
information or reviewing concepts. The Tutor should incorporate in
their response a check-in such as a confirmation question or a quiz
question to verify the student’s understanding of the content.
Please respond with "Yes" if the Tutor did not fail any of the criteria
above.
Please respond with "No" if the Tutor failed to satisfy one or more of
the criteria above.
***new lesson***
Student: How do I calculate angular acceleration?
Tutor: Angular acceleration is the rate of change of angular velocity.
You can calculate it using the formula: $\\alpha = \\Delta \\omega / \\Delta t$.
Student: I don’t know what angular velocity is...
Tutor: Angular velocity is the speed at which an object rotates around
an axis. Angular acceleration measures how quickly that speed
changes.
Request: Did the tutor satisfy the adaptivity criteria?
Critic: No
***new lesson***
Student: How do I calculate angular acceleration?
Tutor: Angular acceleration is the rate of change of angular velocity.
You can calculate it using the formula: $\alpha = \\Delta \\omega / \\Delta t$.
Student: I don’t know what angular velocity is...
Tutor: That’s okay, let’s talk about angular velocity first. Think of
angular velocity as the speed of something rotating, but instead of
measuring how fast it’s moving in a straight line (like linear
velocity), we measure how fast it’s turning. Imagine a record player
spinning. The angular velocity of the record tells you how many
times it spins around in one second. The higher the angular velocity,
the faster it’s spinning. Angular velocity is typically measured
in radians per second (rad/s), are you familiar with radians?
Request: Did the tutor satisfy the adaptivity criteria?
Critic: Yes
***new lesson***
Student: Can you teach me about binding energy through quizzes?
Tutor: Sure! Let’s start with this one. Calculate the binding energy of
a helium nucleus.
Student: hmmm, this is too hard, I have no clue
Tutor: Don’t worry! Can you tell me what you already know about the
forces that hold atomic nuclei together, or what binding energy is?
Request: Did the tutor satisfy the adaptivity criteria?
Critic: Yes
***new lesson***
Student: Can you teach me about binding energy through quizzes?
Tutor: Sure! Let’s start with this one. Calculate the binding energy of
a helium nucleus.
Student: hmmm, this is too hard, I have no clue
Tutor: Don’t worry, understanding binding energy takes time. Can you
make a guess?
Request: Did the tutor satisfy the adaptivity criteria?
Critic: No
***new lesson***
Student: I need help with stoichiometry.
Tutor: What is the molar mass of sodium chloride (NaCl)?
Student: I don’t know how to calculate molar mass.
Tutor: Let’s break it down step by step. Molar mass is the mass of one
mole of a substance. To calculate it, you need to add together the
atomic masses of all the atoms in the molecule. Do you know the
atomic masses of sodium and chlorine?
Request: Did the tutor satisfy the adaptivity criteria?
Critic: Yes
***new lesson***
Student: I need help with stoichiometry.
Tutor: What is the molar mass of sodium chloride (NaCl)?
Student: I don’t know how to calculate molar mass.
Tutor: The molar mass of sodium chloride (NaCl) is 58.44 g/mol.
Request: Did the tutor satisfy the adaptivity criteria?
Critic: No
***new lesson***
"""

In [20]:
for _ in range(3):
    critic_scores_adapt_learners_prompt = get_critic_scores(adapt_learners_prompt, request="Did the tutor satisfy the adaptivity criteria?")
    print(Counter(critic_scores_adapt_learners_prompt))
    print([i for i, v in enumerate(critic_scores_adapt_learners_prompt) if v == "no"])
    print("===========================")

100%|██████████| 35/35 [04:17<00:00,  7.35s/it]


Counter({'yes': 28, 'no': 7})
[9, 10, 11, 12, 23, 28, 32]


100%|██████████| 35/35 [04:46<00:00,  8.17s/it]


Counter({'yes': 29, 'no': 6})
[7, 10, 11, 12, 13, 28]


100%|██████████| 35/35 [05:11<00:00,  8.89s/it]

Counter({'yes': 30, 'no': 5})
[11, 23, 26, 28, 32]



