# 4 Generating the Bank for the MicroTasks

To generate the bank of microtasks I use an LLM through its API.
The output is two kinds of question for each selected core course of each programme: one broad question and six disambiguation questions.

The prompts use the columns of `df_courses_tasks` as context to generate the microtasks.

We use two prompts: broad + disambiguation (Kenneth style).

In [1]:
from pathlib import Path
import pandas as pd
#!pip install --upgrade openai
import os, re
from openai import OpenAI
import json
import numpy as np
from tqdm import tqdm  # optional progress bar, pip install tqdm


## 1. Load the data and keep at most two courses per programme

In [2]:
# load the CSV file with the courses for which we have to generate the tasks
silver = Path("../data_programmes_courses/silver")

df_courses_tasks = pd.read_csv(silver / "df_courses_tasks_silver.csv", encoding="utf-8-sig")
print("The shape of the courses tasks dataframe is:", df_courses_tasks.shape)

# keep only first two courses from each programme
df_courses_tasks = df_courses_tasks.groupby("programme_title").head(2).reset_index(drop=True)
print("After keeping only first two courses from each programme the shape is:", df_courses_tasks.shape)

The shape of the courses tasks dataframe is: (36, 21)
After keeping only first two courses from each programme the shape is: (28, 21)


In [3]:
# add a column called professor that takes randomly one of the following prof names:
# - prof. L. Morabito
# - prof. A. Kazemi
# - prof. S. Srinivasan
# - prof. K.N.M Mwandingi

# set random seed for reproducibility
np.random.seed(42) 
professors = ["prof. L. Morabito", "prof. A. Kazemi", "prof. S. Srinivasan", "prof. K.N.M Mwandingi"]
df_courses_tasks["professor"] = np.random.choice(professors, size=len(df_courses_tasks))
df_courses_tasks.head(2)

Unnamed: 0,code,course_name,programme_title,faculty,programme_url,year_num,period,ects,course_level,course_coordinator,...,course_paragraphs_json,number_programmes,course_objective,course_content,additional_information_teaching_methods,method_of_assessment,literature,additional_information_target_audience,recommended_background_knowledge,professor
0,L_AABAOHW115,Objects in Context. An Interdisciplinary Persp...,Ancient Studies,Faculty of Humanities,https://studiegids.vu.nl/en/Bachelor/2025-2026...,1.0,1.0,6,100,prof. dr. J.P. Crielaard,...,"[""Course Objective\nA distinct feature of Anci...",1,A distinct feature of Ancient Studies is the c...,In this course you will familiarize yourself w...,Lectures and seminars (3 x 2hrs p/w),Multiple choice exam (40%)\r\nSet-up for writi...,,,,prof. S. Srinivasan
1,L_AABAOHW101,The Classical Canon I: The Heritage of Antiquity,Ancient Studies,Faculty of Humanities,https://studiegids.vu.nl/en/Bachelor/2025-2026...,1.0,1.0,6,100,prof. dr. mr. R.J. Allan,...,"[""Course Objective\nThis first canon module (f...",8,This first canon module (followed up by a modu...,This course prepares for the module ‘Canon II’...,"Lectures/seminars, 2 x 2 hours\r\nThe first ha...",Oral presentation of a group project (40% of t...,A selection of articles and book chapters whic...,First year students --> Bachelor's in Ancient ...,,prof. K.N.M Mwandingi


## 2. Set up the OpenAI client

In [4]:

key_path = Path("../data_bank_microtasks") / "api_key.txt"

# Read the key and strip spaces and newlines
api_key = key_path.read_text(encoding="utf8").strip()

# Create the client using this key
client = OpenAI(api_key=api_key)

models = client.models.list()
#for m in models.data:
#    print(m.id)

model_gpt = "gpt-4.1-mini"  
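Reading the key from a text file works as long as that file stays out of version control. A minimal sketch of a loader that first checks an environment variable and only then falls back to the file; `load_api_key` is a hypothetical helper name, and `OPENAI_API_KEY` is the variable the OpenAI SDK itself looks for by default:

```python
import os
from pathlib import Path


def load_api_key(key_path, env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, else from a text file.

    Hypothetical helper, not part of the original notebook.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    path = Path(key_path)
    if path.is_file():
        # Strip spaces and newlines, as the notebook does.
        key = path.read_text(encoding="utf8").strip()
    if not key:
        raise RuntimeError(f"No API key found in ${env_var} or in {path}")
    return key
```

With this helper the client could be created as `OpenAI(api_key=load_api_key("../data_bank_microtasks/api_key.txt"))`.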


## 3. Define the Prompts
Here are the two system prompts. The broad prompt produces one six-option question per course; the disambiguation prompt produces one three-option question per course and triple code.

In [9]:
SYSTEM_PROMPT_BROAD = """
You generate one broad RIASEC question for a playful study choice tool.

The question should feel like a first step in a real course task.

INPUT
You receive one JSON object with:
- programme_title
- course_code
- course_name
- course_objective
- course_content
- teaching_methods
- assessment
- professor

TASK
Create exactly one multiple choice question with six options.

Rules
1. Use the course information to imagine a realistic first year situation.
2. Write a short question string that describes the situation and ends with a question.
3. Create tiny_learn as a list of exactly three short sentences:
   a. definition of the key concept
   b. method reminder
   c. info on the course: This is what you will learn in the course "[course_name]" taught by [professor]
4. Create six options labelled A to F.
5. Each option describes a plausible first action the student could take.
6. Each option must be tagged with one RIASEC code:
   R, I, A, S, E, or C.
7. Across the six options you must use each of the six RIASEC codes exactly once.
8. Do not mention RIASEC, personality, or profiles in the visible text.

OUTPUT
Return one JSON object with this shape:

{
  "question": string,
  "tiny_learn": [string, string, string],
  "options": {
    "A": {"text": string, "riasec": "R" | "I" | "A" | "S" | "E" | "C"},
    "B": {...},
    "C": {...},
    "D": {...},
    "E": {...},
    "F": {...}
  }
}

All strings must be single line strings. Do not insert raw newline characters in any value.
"""

SYSTEM_PROMPT_DISAMB = """
You generate one RIASEC disambiguation question for a playful study choice tool.

The question should feel like a first step in a real course task.

INPUT
You receive one JSON object with:
- programme_title
- course_code
- course_name
- course_objective
- course_content
- teaching_methods
- assessment
- professor
- triple_code   for example "RIA"

triple_code contains three distinct letters from R, I, A, S, E, C.

TASK
Create exactly one multiple choice question with three options.

Rules
1. Use the course information to imagine a realistic first year situation.
2. Write a short question string that describes the situation and ends with a question.
3. Create tiny_learn as a list of exactly three short sentences:
   a. definition of the key concept
   b. method reminder
   c. info on the course: This is what you will learn in the course "[course_name]" taught by [professor]
4. Create three options labelled A, B, C.
5. The three options together must use exactly the three letters in triple_code, one per option.
   For example triple_code "RIA" means one R, one I, one A.
6. Each option describes a plausible first action the student could take.
7. Each option must be tagged with a riasec letter that is one of the letters in triple_code.
8. Do not mention RIASEC, personality, or profiles in the visible text.

OUTPUT
Return one JSON object with this shape:

{
  "question": string,
  "tiny_learn": [string, string, string],
  "options": {
    "A": {"text": string, "riasec": one letter from triple_code},
    "B": {...},
    "C": {...}
  }
}

All strings must be single line strings. Do not insert raw newline characters in any value.
"""
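The prompts state structural rules (three `tiny_learn` sentences, fixed option labels, each required code used exactly once), but the model is not guaranteed to follow them. A minimal sketch of a check that could be run on each parsed response; `validate_question` is a hypothetical helper and not part of the notebook:

```python
def validate_question(obj, expected_codes):
    """Return a list of problems found in one generated question.

    `expected_codes` is "RIASEC" for a broad question or the triple_code
    (for example "RIA") for a disambiguation question.
    Hypothetical helper, not part of the original notebook.
    """
    problems = []
    if not isinstance(obj.get("question"), str) or not obj["question"].strip():
        problems.append("question must be a non-empty string")

    tiny = obj.get("tiny_learn")
    if not (isinstance(tiny, list) and len(tiny) == 3
            and all(isinstance(s, str) for s in tiny)):
        problems.append("tiny_learn must be a list of three strings")

    options = obj.get("options")
    if not isinstance(options, dict):
        return problems + ["options must be an object"]

    # Labels are A..F for six options and A..C for three.
    expected_labels = [chr(ord("A") + i) for i in range(len(expected_codes))]
    if sorted(options) != expected_labels:
        problems.append(f"option labels must be {expected_labels}")

    # Each expected code has to appear on exactly one option.
    codes = [opt.get("riasec") for opt in options.values() if isinstance(opt, dict)]
    if sorted(map(str, codes)) != sorted(expected_codes):
        problems.append(f"riasec codes must be exactly {sorted(expected_codes)}")
    return problems
```

An empty list means the response satisfies the structural rules; it says nothing about whether the tagged option texts actually fit their codes.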


## 4. Define the helper functions

In [10]:
import json
import re
from collections import defaultdict
from pathlib import Path
from tqdm import tqdm

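def truncate(text, max_chars=1200):
    """Return text as a string cut to max_chars; missing values become ""."""
    # pandas reads empty CSV cells as NaN, and str(NaN) would send "nan" to the model
    if text is None or (isinstance(text, float) and pd.isna(text)):
        return ""
    s = str(text)
    return s[:max_chars]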

def build_course_payload(row):
    """Context that we pass to the model for one course."""
    return {
        "programme_title": str(row.get("programme_title", "")),
        "course_code": str(row.get("code", "")),
        "course_name": str(row.get("course_name", "")),
        "course_objective": truncate(row.get("course_objective", ""), 800),
        "course_content": truncate(row.get("course_content", ""), 800),
        "teaching_methods": truncate(row.get("additional_information_teaching_methods", ""), 400),
        "assessment": truncate(row.get("method_of_assessment", ""), 400),
        "professor": str(row.get("professor", "")),
    }

def safe_parse_json(raw_text: str):
    """
    Clean and parse model output as JSON.
    Removes newlines and obvious trailing commas, then parses.
    """
    if raw_text is None:
        raise ValueError("Model returned no text")

    text = raw_text.strip()
    if not text:
        raise ValueError("Model returned empty text")

    # normalise whitespace
    text = text.replace("\r\n", " ").replace("\n", " ").replace("\t", " ")
    text = re.sub(r"\s+", " ", text)

    # remove trailing commas before closing braces or brackets
    text = re.sub(r",\s*([}\]])", r"\1", text)

    # try direct
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # try first and last brace
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1 or end <= start:
        print("Could not find JSON object, preview:")
        print(text[:400])
        raise ValueError("No JSON object found")

    candidate = text[start : end + 1]
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)

    try:
        return json.loads(candidate)
    except json.JSONDecodeError as e:
        print("Still could not parse, preview:")
        print(candidate[:400])
        raise e


def call_broad_question(row, model_name=model_gpt):
    payload = build_course_payload(row)

    response = client.responses.create(
        model=model_name,
        input=json.dumps(payload),
        instructions=SYSTEM_PROMPT_BROAD,
        max_output_tokens=500,
    )

    raw = response.output_text
    print("BROAD PREVIEW:", repr((raw or "")[:160]))
    return safe_parse_json(raw)


def call_disamb_question(row, triple_code, model_name=model_gpt):
    payload = build_course_payload(row)
    payload["triple_code"] = triple_code

    response = client.responses.create(
        model=model_name,
        input=json.dumps(payload),
        instructions=SYSTEM_PROMPT_DISAMB,
        max_output_tokens=400,
    )

    raw = response.output_text
    print(f"DISAMB {triple_code} PREVIEW:", repr((raw or "")[:160]))
    return safe_parse_json(raw)
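Both call helpers raise on the first API error or unparseable response, so a single transient failure drops a question. A minimal sketch of a retry wrapper with exponential backoff; `call_with_retries` and its parameters are hypothetical and not part of the notebook:

```python
import time


def call_with_retries(fn, *args, attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying on any exception.

    Waits base_delay, then 2 * base_delay, and so on between attempts,
    and re-raises the last error once `attempts` calls have failed.
    Hypothetical helper, not part of the original notebook.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

It could be used as `call_with_retries(call_broad_question, row)` or `call_with_retries(call_disamb_question, row, triple)`. Retrying on every exception is a deliberate simplification: it also repeats calls whose output failed to parse, which costs an extra request each time.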



In [11]:
# test with one course
test_row = df_courses_tasks.iloc[0]
broad_question = call_broad_question(test_row)  
broad_question

BROAD PREVIEW: '{\n  "question": "You have just been assigned your first research object from the Allard Pierson collection. What should be your initial step in studying this ob'


{'question': 'You have just been assigned your first research object from the Allard Pierson collection. What should be your initial step in studying this object?',
 'tiny_learn': ['Material culture refers to physical objects created and used by people in the past to understand their lives and societies.',
  'Start your research by gathering background information and reviewing existing scholarship related to the object.',
  'This is what you will learn in the course "Objects in Context. An Interdisciplinary Perspective on the Ancient World" taught by prof. S. Srinivasan.'],
 'options': {'A': {'text': 'Visit the archaeological collection to observe the object directly and note its physical characteristics.',
   'riasec': 'R'},
  'B': {'text': 'Search for academic articles and books that discuss the type of object you received.',
   'riasec': 'I'},
  'C': {'text': 'Sketch or creatively interpret the object to explore its artistic and symbolic features.',
   'riasec': 'A'},
  'D': {'text

## 5. Call the API for each course and build the question bank
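Each disambiguation question is stored under all three letters of its triple. With the six triples used in this section, every RIASEC letter occurs in exactly three triples, so each letter bucket receives three questions per course. A small sketch with placeholder entries instead of model output (the programme name and course code are made up):

```python
from collections import Counter, defaultdict

TRIPLES = ["RIA", "RIS", "REC", "IEC", "ASE", "ASC"]

# Count how many triples contain each RIASEC letter.
coverage = Counter(letter for triple in TRIPLES for letter in triple)
print(dict(coverage))

# Toy bank with the same shape as the notebook's per-programme structure.
bank = defaultdict(lambda: {key: [] for key in ["broad", *"RIASEC"]})
programme, course_code = "Example Programme", "X_000"  # made-up values

# One broad question, numbered 1, then six disambiguation questions, 2..7.
bank[programme]["broad"].append({"question_code": f"{course_code}_1"})
for n, triple in enumerate(TRIPLES, start=2):
    entry = {"question_code": f"{course_code}_{n}"}
    for letter in triple:
        # The same entry object is shared by the three letter buckets.
        bank[programme][letter].append(entry)

print({key: len(items) for key, items in bank[programme].items()})
```

Because the same dictionary is appended to three lists, a question appears three times when the bank is serialised to JSON; `question_code` can be used to deduplicate.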

In [12]:
from collections import defaultdict
from pathlib import Path
import json
from tqdm import tqdm

TRIPLES = ["RIA", "RIS", "REC", "IEC", "ASE", "ASC"]

# final result:
# {
#   "Programme": {
#       "broad": [ {question_code, question, tiny_learn, options}, ... ],
#       "R": [ {question_code, question, tiny_learn, options}, ... ],
#       "I": [...], "A": [...], "S": [...], "E": [...], "C": [...]
#   }
# }
ml_structure = defaultdict(lambda: {
    "broad": [],
    "R": [],
    "I": [],
    "A": [],
    "S": [],
    "E": [],
    "C": [],
})

# track how many questions we have emitted per course
from collections import Counter
question_counter = Counter()

max_courses = 100   # start small, then increase

for i, (_, row) in enumerate(tqdm(df_courses_tasks.iterrows(), total=len(df_courses_tasks))):
    if i >= max_courses:
        break

    programme = str(row.get("programme_title", "UNKNOWN PROGRAMME"))
    course_code = str(row.get("code", ""))

    # 1) broad question for this course
    try:
        broad_obj = call_broad_question(row)
    except Exception as e:
        print(f"Problem making broad question for course {course_code}: {e}")
        continue

    question_counter[course_code] += 1
    broad_qcode = f"{course_code}_{question_counter[course_code]}"

    broad_entry = {
        "question_code": broad_qcode,
        "question": broad_obj["question"],
        "tiny_learn": broad_obj["tiny_learn"],
        "options": broad_obj["options"],
    }

    ml_structure[programme]["broad"].append(broad_entry)

    # 2) six disambiguation questions for this course
    for triple in TRIPLES:
        try:
            disamb_obj = call_disamb_question(row, triple)
        except Exception as e:
            print(f"Problem making disamb {triple} for course {course_code}: {e}")
            continue

        question_counter[course_code] += 1
        disamb_qcode = f"{course_code}_{question_counter[course_code]}"

        disamb_entry = {
            "question_code": disamb_qcode,
            "question": disamb_obj["question"],
            "tiny_learn": disamb_obj["tiny_learn"],
            "options": disamb_obj["options"],
        }

        # attach under each RIASEC letter in the triple
        for letter in set(triple):
            if letter in "RIASEC":
                ml_structure[programme][letter].append(disamb_entry)


  0%|          | 0/28 [00:00<?, ?it/s]

BROAD PREVIEW: '{\n  "question": "You have been assigned to study a group of ancient coins from the Allard Pierson collection to understand their historical significance. What i'
DISAMB RIA PREVIEW: '{\n  "question": "You have been assigned to research an ancient coin from the Allard Pierson collection. What is your first step to understand its historical con'
DISAMB RIS PREVIEW: '{\n  "question": "You just received your assignment to analyze an ancient statue from the Allard Pierson collection. What should be your first step in working wi'
DISAMB REC PREVIEW: '{\n  "question": "You have just received a detailed description of an ancient statue from the archaeological collection. What should be your first step to start '
DISAMB IEC PREVIEW: '{\n  "question": "You have just started researching an ancient coin from the university collection. What should you do first to frame your study?",\n  "tiny_learn'
DISAMB ASE PREVIEW: '{\n  "question": "You have just received your first assignment 

  4%|▎         | 1/28 [00:30<13:37, 30.27s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You found an ancient coin in the Allard Pierson collection and want to start researching it for your first assignment. What is your best first '
BROAD PREVIEW: '{\n  "question": "You have just started the course and need to pick a classical canonical item not covered in class for your group research project. What is your'
DISAMB RIA PREVIEW: '{\n  "question": "You need to prepare your first group presentation on a canonical item not discussed in class. What is your initial step?",\n  "tiny_learn": [\n  '
DISAMB RIS PREVIEW: '{\n  "question": "You have just started \'The Classical Canon I\' course and need to prepare for your group presentation. What should you do first?",\n  "tiny_learn'
DISAMB REC PREVIEW: '{\n  "question": "You have been asked to contribute to a group presentation on a classical monument\'s historical significance. What should you do first?",\n  "tin'
DISAMB IEC PREVIEW: '{\n  "question": "You need to prepare your first group pr

  7%|▋         | 2/28 [01:00<13:06, 30.25s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You have just started studying key classical texts and monuments and need to decide how to begin engaging with the course materials. What shoul'
BROAD PREVIEW: '{\n  "question": "You are preparing for the first seminar in Introduction to Communication Studies. Which step do you take first to engage effectively with the u'
DISAMB RIA PREVIEW: '{\n  "question": "In your first seminar for Introduction to Communication Studies, you must choose one way to engage with the theories discussed. What will you d'
DISAMB RIS PREVIEW: '{\n  "question": "You are preparing for your first seminar in \'Introduction to Communication Studies\' and need to decide how to start. What is your first step?",'
DISAMB REC PREVIEW: '{\n  "question": "You have just attended a lecture on psychological and societal communication theories. What is your first step to get ready for the upcoming se'
DISAMB IEC PREVIEW: '{\n  "question": "You have just attended a lecture on differen

 11%|█         | 3/28 [01:26<11:46, 28.27s/it]

DISAMB ASC PREVIEW: '{\n  "question": "As you start your first seminar in Introduction to Communication Studies, how do you approach understanding the different communication theorie'
BROAD PREVIEW: '{\n  "question": "You have just attended your first lecture on how children acquire language sounds and words. What is your first step to deepen your understandi'
DISAMB RIA PREVIEW: '{\n  "question": "You have just received your first dataset of language sounds and need to decide what to do first. Which step should you take?",\n  "tiny_learn":'
DISAMB RIS PREVIEW: '{\n  "question": "You have just received a dataset of speech recordings from young children for analysis. What should be your first step?",\n  "tiny_learn": [\n   '
DISAMB REC PREVIEW: '{\n  "question": "You have just received a dataset of spoken sentences from a language you don\'t know. What is your first step to analyze this data?",\n  "tiny_le'
DISAMB IEC PREVIEW: '{\n  "question": "You receive a dataset of spoken words fro

 14%|█▍        | 4/28 [01:54<11:18, 28.27s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You start the first week of \\"Introduction to Linguistics\\" and need to prepare for the seminar exercise. What is your first step?",\n  "tiny_le'
BROAD PREVIEW: '{\n  "question": "You have just started the course Calculus and Analysis I and need to begin understanding the concept of limits. What is your first step in appr'
DISAMB RIA PREVIEW: '{\n  "question": "You are beginning the first tutorial of Calculus and Analysis I and need to decide how to start preparing for your upcoming midterm exam. What '
DISAMB RIS PREVIEW: '{\n  "question": "You are beginning the first assignment involving derivatives and limits. Which approach will you take to start solving the problem?",\n  "tiny_l'
DISAMB REC PREVIEW: '{\n  "question": "You encounter a challenging problem involving the behavior of multivariable functions and need to decide your first step. What do you do?",\n  "'
DISAMB IEC PREVIEW: '{\n  "question": "You encounter a challenging problem about

 18%|█▊        | 5/28 [02:26<11:17, 29.44s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You encounter a challenging problem about finding the local extreme values of a multivariable function. What would be your first step?",\n  "tin'
BROAD PREVIEW: '{\n  "question": "You just attended your first lecture in \'Introduction to Programming\' and received your first programming assignment. What should you do first '
DISAMB RIA PREVIEW: '{\n  "question": "You have just started the course and received your first programming assignment: to write a simple program that asks the user for input and pri'
DISAMB RIS PREVIEW: '{\n  "question": "You have just started the course and need to prepare for your first programming assignment. What should you do first?",\n  "tiny_learn": [\n    "'
DISAMB REC PREVIEW: '{\n  "question": "You are starting your first programming assignment where you must implement an algorithm in Python. What should you do first?",\n  "tiny_learn":'
DISAMB IEC PREVIEW: '{\n  "question": "You just started the course and need to 

 21%|██▏       | 6/28 [02:52<10:25, 28.44s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You just started the \'Introduction to Programming\' course and need to decide your first step to understand the material. What will you do first'
BROAD PREVIEW: '{\n  "question": "You have just started the course \'Academic Skills for Historians\' and need to begin your first assignment by understanding what has already bee'
DISAMB RIA PREVIEW: '{\n  "question": "You have just started the course and need to decide how to begin preparing your status quaestionis. What is your first step?",\n  "tiny_learn": '
DISAMB RIS PREVIEW: '{\n  "question": "You have just started working on your status quaestionis and want to clarify what scholars have written and where gaps remain. What should your'
DISAMB REC PREVIEW: '{\n  "question": "You have just started researching for your historiographical essay and need to understand existing debates on the topic. What should you do fir'
DISAMB IEC PREVIEW: '{\n  "question": "You are starting your first assignment wh

 25%|██▌       | 7/28 [03:19<09:48, 28.02s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You need to start your research for the historiographical essay by understanding what previous scholars have written and what questions remain '
BROAD PREVIEW: '{\n  "question": "You have just received the historiographical theme for your status quaestionis paper; what is your first step in approaching this topic?",\n  "t'
DISAMB RIA PREVIEW: '{\n  "question": "You are assigned to start a status quaestionis on a historiographical theme. What is your first step to organize your work?",\n  "tiny_learn": ['
DISAMB RIS PREVIEW: '{\n  "question": "You need to start your first task in Academic Skills for Historians: summarizing key scientific publications about your historiographical theme'
DISAMB REC PREVIEW: '{\n  "question": "You are starting your first assignment to write a status quaestionis on a historiographic theme. What is your first step to get started?",\n  "t'
DISAMB IEC PREVIEW: '{\n  "question": "You have to write a status quaestionis on a

 29%|██▊       | 8/28 [03:44<08:57, 26.86s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You need to start your status quaestionis for the historiographical theme assigned by prof. L. Morabito. What is your first step in tackling th'
BROAD PREVIEW: '{\n  "question": "You have just received the first literary text to analyze using the structuralist approach. What is your initial step to start your analysis?",'
DISAMB RIA PREVIEW: '{\n  "question": "You have just received the first literary text to analyze using the structuralist approach. What is your first step to start the analysis?",\n  '
DISAMB RIS PREVIEW: '{\n  "question": "You have just read a poem from the Dutch literary canon. What is your first step to start the analysis?",\n  "tiny_learn": [\n    "Structuralist '
DISAMB REC PREVIEW: '{\n  "question": "You have just received your first literary text for analysis using structuralist methods. What should you do first to begin your analysis?",\n  '
DISAMB IEC PREVIEW: '{\n  "question": "You have just read a complex poem from a h

 32%|███▏      | 9/28 [04:14<08:50, 27.93s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You have just read a poem from the Dutch literary canon and are asked to start your first analysis. What is your first step?",\n  "tiny_learn": '
BROAD PREVIEW: '{\n  "question": "You have just read the first chapter of James Wood\'s \'How Fiction Works\' and need to start your own short story. What is your first step?",\n  "'
DISAMB RIA PREVIEW: '{\n  "question": "You have been assigned to write a short story using specific narrative techniques. What is the first step you should take to start the assignme'
DISAMB RIS PREVIEW: '{\n  "question": "You have just read the first chapter of James Wood\'s How Fiction Works and want to start your own short story. What should you focus on first?"'
DISAMB REC PREVIEW: '{\n  "question": "You are starting your first exercise in \'Creative Writing I\' where you need to develop a story from scratch. What is your first step to begin?"'
DISAMB IEC PREVIEW: '{\n  "question": "You have just read a chapter about nar

 36%|███▌      | 10/28 [04:41<08:15, 27.50s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You have just read a passage on narrative style for your first writing exercise. What should you do next to best start your creative process?",'
BROAD PREVIEW: '{\n  "question": "You are starting your first week in Ancient Philosophy and must prepare for the upcoming lecture on the origins of Greco-Roman and Sanskrit phi'
DISAMB RIA PREVIEW: '{\n  "question": "You have just started the course Ancient Philosophy and need to prepare for the first reading assignment. Which first step will best help you e'
DISAMB RIS PREVIEW: '{\n  "question": "You are starting the course \'Ancient Philosophy\' and must prepare for your first reading assignment. What is your first step?",\n  "tiny_learn":'
DISAMB REC PREVIEW: '{\n  "question": "You have just received the first set of primary texts from the Greco-Roman and Sanskrit traditions. What is your first step to approach these m'
DISAMB IEC PREVIEW: '{\n  "question": "You have just been assigned your first prim

 39%|███▉      | 11/28 [05:11<08:01, 28.33s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You are starting your first reading assignment on the main philosophical questions from the Greco-Roman and Sanskrit traditions. What should yo'
BROAD PREVIEW: '{\n  "question": "You have just started the Epistemology course and the professor asks you to prepare for the first seminar by engaging with a key epistemologica'
DISAMB RIA PREVIEW: '{\n  "question": "You have just read a complex argument about what knowledge really means. What is your first step to deepen your understanding?",\n  "tiny_learn"'
DISAMB RIS PREVIEW: '{\n  "question": "You are asked to evaluate a philosophical argument about knowledge and belief. What should you do first?",\n  "tiny_learn": [\n    "Epistemology '
DISAMB REC PREVIEW: '{\n  "question": "You are assigned to analyze a philosophical argument in epistemology for your first seminar. Which first step would help you best initiate your'
DISAMB IEC PREVIEW: '{\n  "question": "You just started the Epistemology course an

 43%|████▎     | 12/28 [05:39<07:30, 28.18s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You have just started the \'Epistemology\' course and need to prepare for the first seminar discussion. Which approach will best kickstart your u'
BROAD PREVIEW: '{\n  "question": "You have just started the Genetics course and are given a task to understand the structure of the human genome. What is your first step?",\n  "t'
DISAMB RIA PREVIEW: '{\n  "question": "You have just started the Genetics course and need to prepare for the first assignment analyzing differences between prokaryotic and eukaryotic'
DISAMB RIS PREVIEW: '{\n  "question": "You have just learned about the molecular structure of human DNA. To start your study effectively, what should you do first?",\n  "tiny_learn": '
DISAMB REC PREVIEW: '{\n  "question": "You have just started the Genetics course and need to decide how to approach your first assignment on DNA replication. What will you do first?"'
DISAMB IEC PREVIEW: '{\n  "question": "You have just started the Genetics course 

 46%|████▋     | 13/28 [06:02<06:40, 26.72s/it]

DISAMB ASC PREVIEW: '{\n  "question": "During your first lab practical on DNA replication, you notice some unexpected chromosome structures under the microscope. What should you do f'
BROAD PREVIEW: '{\n  "question": "You have just started the course \'Introduction to Biomedical Sciences\' and need to choose a first step to formulate your own scientific researc'
DISAMB RIA PREVIEW: '{\n  "question": "You are starting your literature report and need to clarify the main research question you want to investigate. What is your first step?",\n  "t'
DISAMB RIS PREVIEW: '{\n  "question": "You have just received a research article related to biomedical sciences. What should you do first to start your analysis?",\n  "tiny_learn": [\n'
DISAMB REC PREVIEW: '{\n  "question": "You want to start your first research question in the course. What is your first step?",\n  "tiny_learn": [\n    "Research involves systematicall'
DISAMB IEC PREVIEW: '{\n  "question": "You have received a scientific article 

 50%|█████     | 14/28 [06:26<06:01, 25.84s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You have been given a research article to review for your first assignment. What should you do first to understand the study properly?",\n  "tin'
BROAD PREVIEW: '{\n  "question": "You are introduced to limits and continuity in Calculus 1. Which would you do first to understand if a function is continuous at a point?",\n  "'
DISAMB RIA PREVIEW: '{\n  "question": "You encounter a problem requiring you to find the maximum value of a function on an interval. What should you do first?",\n  "tiny_learn": [\n   '
DISAMB RIS PREVIEW: '{\n  "question": "You are starting your first assignment in Calculus 1 and need to analyze a function to find where it reaches its highest or lowest points. What'
DISAMB REC PREVIEW: '{\n  "question": "You need to prepare for the midterm exam in Calculus 1. Which approach will you choose first to best understand the material?",\n  "tiny_learn":'
DISAMB IEC PREVIEW: '{\n  "question": "You are given a function to analyze for c

 54%|█████▎    | 15/28 [06:52<05:38, 26.03s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You are starting to study Calculus 1 and want to prepare for an exercise class. What is your first step?",\n  "tiny_learn": [\n    "Calculus is t'
BROAD PREVIEW: '{\n  "question": "You have just been introduced to the full scope of the Business Analytics course and its major case study involving data-driven decision making'
DISAMB RIA PREVIEW: '{\n  "question": "You are starting the Introduction to Business Analytics course and need to prepare for the first case study. Which step do you take first?",\n  '
DISAMB RIS PREVIEW: '{\n  "question": "You have just received the first case study for the Introduction to Business Analytics course. What is the best initial step to start tackling '
DISAMB REC PREVIEW: '{\n  "question": "You just received a large dataset from a company for your first Business Analytics case study. What is your first step to start analyzing the d'
DISAMB IEC PREVIEW: '{\n  "question": "You have just been assigned the first group

 57%|█████▋    | 16/28 [07:19<05:14, 26.17s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You are starting the Introduction to Business Analytics course and need to prepare for the first case study. Which step would you choose to beg'
BROAD PREVIEW: '{\n  "question": "You have just started your first programming assignment in the course \'Computer Programming.\' What is your first step to effectively tackle the'
DISAMB RIA PREVIEW: '{\n  "question": "You just started your first programming assignment in C++. What is your first step to approach this task?",\n  "tiny_learn": [\n    "Problem solv'
DISAMB RIS PREVIEW: '{\n  "question": "You have just started the Computer Programming course and need to decide your first step to approach the programming assignments. What should y'
DISAMB REC PREVIEW: '{\n  "question": "You just got the first programming assignment: implement a simple calculator using basic C++ expressions. What do you do first?",\n  "tiny_learn'
DISAMB IEC PREVIEW: '{\n  "question": "You begin your first programming assignme

 61%|██████    | 17/28 [07:41<04:35, 25.01s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You are about to start your first programming assignment. What will you do first to approach the task effectively?",\n  "tiny_learn": [\n    "Pro'
BROAD PREVIEW: '{\n  "question": "You encounter a complex logical statement involving several propositions and want to determine if it can be simplified. What is your first step'
DISAMB RIA PREVIEW: '{\n  "question": "You encounter a challenging logical statement that you need to verify for equivalence with another in your Logic and Sets course. What is your '
DISAMB RIS PREVIEW: '{\n  "question": "You are studying a complex logical statement from the course materials. How do you start understanding it?",\n  "tiny_learn": [\n    "Logic invol'
DISAMB REC PREVIEW: '{\n  "question": "You are asked to analyze a statement using propositional logic and then verify its correctness with a truth table. What is your first step?",\n '
DISAMB IEC PREVIEW: '{\n  "question": "You need to solve a problem about express

 64%|██████▍   | 18/28 [08:08<04:16, 25.61s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You are given a complex logical formula and a task to simplify it into a standard form. What is the first step you should take?",\n  "tiny_learn'
BROAD PREVIEW: '{\n  "question": "You have just started the \\"Economic Challenges\\" course and today\'s lecture covers the emergence of different economic paradigms in historical'
DISAMB RIA PREVIEW: '{\n  "question": "You need to start your first assignment by approaching an economic challenge—what will you do first?",\n  "tiny_learn": [\n    "Economic challeng'
DISAMB RIS PREVIEW: '{\n  "question": "You have just started the \'Economic Challenges\' course and need to prepare for your first tutorial assignment. What is your first step?",\n  "ti'
DISAMB REC PREVIEW: '{\n  "question": "You are preparing for your first tutorial on economic theories: what should you do first?",\n  "tiny_learn": [\n    "Economic theories explain ho'
DISAMB IEC PREVIEW: '{\n  "question": "You are preparing for the first tut

 68%|██████▊   | 19/28 [08:34<03:51, 25.70s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You are beginning the course \'Economic Challenges\' and need to engage with the material actively. Which approach would you take first to better'
BROAD PREVIEW: '{\n  "question": "You are given a problem describing how market prices adjust to equilibrium. What is your first step to begin solving this mathematically?",\n  "'
DISAMB RIA PREVIEW: '{\n  "question": "You encounter a complex business problem that requires a clear analytical approach. Which step do you take first to begin solving it?",\n  "tiny'
DISAMB RIS PREVIEW: '{\n  "question": "You need to start working on your first assignment that involves formulating a real-world economic problem using mathematical tools. What is yo'
DISAMB REC PREVIEW: '{\n  "question": "You need to start your first assignment on market equilibrium models. What is your first step?",\n  "tiny_learn": [\n    "Quantitative methods us'
DISAMB IEC PREVIEW: '{\n  "question": "You have been given a dataset of economi

 71%|███████▏  | 20/28 [09:04<03:35, 26.91s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You are starting a course where you will learn to use mathematics to solve business problems. What is the first practical step you take to enga'
BROAD PREVIEW: '{\n  "question": "You have just received the course syllabus for \\"Academic Skills\\" and need to decide how to start preparing for your first lecture. What is yo'
DISAMB RIA PREVIEW: '{\n  "question": "You have just received your first assignment in Academic Skills and need to choose how to begin. What is your first step?",\n  "tiny_learn": [\n '
DISAMB RIS PREVIEW: '{\n  "question": "You have been asked to start a research project on an international business problem; what is your first step?",\n  "tiny_learn": [\n    "Researc'
DISAMB REC PREVIEW: '{\n  "question": "You have just been given a research topic related to international business. What will you do first to start your project?",\n  "tiny_learn": [\n'
DISAMB IEC PREVIEW: '{\n  "question": "You have been given a research topic i

 75%|███████▌  | 21/28 [09:30<03:07, 26.74s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You need to start your first research assignment on an international business topic. What is your first step to approach this task?",\n  "tiny_l'
BROAD PREVIEW: '{\n  "question": "You are in your first tutorial group discussing a complex international business case involving cultural misunderstandings. What is your first '
DISAMB RIA PREVIEW: '{\n  "question": "You have just started working on a group project about intercultural collaboration in international business. What is your first step?",\n  "tin'
DISAMB RIS PREVIEW: '{\n  "question": "You are assigned a group project analyzing a strategic business case involving intercultural collaboration. Which first step will best prepare '
DISAMB REC PREVIEW: '{\n  "question": "You are given a case study about a conflict within an international company with employees from diverse cultural backgrounds. What is your firs'
DISAMB IEC PREVIEW: '{\n  "question": "You are asked to start a research project on

 79%|███████▊  | 22/28 [09:55<02:36, 26.10s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You are asked to analyze a case about cultural challenges in an international company. What is your first step to understand the situation?",\n '
BROAD PREVIEW: '{\n  "question": "You receive your first weekly assignment involving proving a simple mathematical statement. What is your initial step to tackle this task?",\n  '
DISAMB RIA PREVIEW: '{\n  "question": "You encounter a complex statement in your first homework that you need to prove. What is your first step?",\n  "tiny_learn": [\n    "A mathematic'
DISAMB RIS PREVIEW: '{\n  "question": "You just encountered a mathematical statement in your course \'Basic Concepts in Mathematics.\' What do you do first to approach understanding an'
DISAMB REC PREVIEW: '{\n  "question": "You encounter your first weekly homework problem about proving a statement on sets. What should you do first?",\n  "tiny_learn": [\n    "A mathem'
DISAMB IEC PREVIEW: '{\n  "question": "You need to start working on your firs

 82%|████████▏ | 23/28 [10:20<02:08, 25.79s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You have just started the course and need to prepare for the first weekly assignment. What is your first step?",\n  "tiny_learn": [\n    "A mathe'
BROAD PREVIEW: '{\n  "question": "You have been assigned to analyze a function\'s behavior near a specific point. What is the first step you take to begin understanding this func'
DISAMB RIA PREVIEW: '{\n  "question": "You are given a function and asked to find its local extreme values as a first exploration. What should you do first?",\n  "tiny_learn": [\n    "'
DISAMB RIS PREVIEW: '{\n  "question": "You are asked to evaluate the limit of a function that results in an indeterminate form during your first tutorial. What should be your first s'
DISAMB REC PREVIEW: '{\n  "question": "You need to start your first assignment by understanding how to find the local minima of a function. What would you do first?",\n  "tiny_learn":'
DISAMB IEC PREVIEW: '{\n  "question": "You are starting your first homework on 

 86%|████████▌ | 24/28 [10:49<01:46, 26.73s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You encounter a complex function and need to understand its behavior before proceeding. What is the first step you take?",\n  "tiny_learn": [\n  '
BROAD PREVIEW: '{\n  "question": "You are starting your first seminar in \'Approaching Visual and Material Culture\' and need to engage with the key concepts of visual representat'
DISAMB RIA PREVIEW: '{\n  "question": "You have been asked to analyze a famous artwork\'s symbolism and the artist\'s intended message. What is your first step?",\n  "tiny_learn": [\n   '
DISAMB RIS PREVIEW: '{\n  "question": "You have just started the course \'Approaching Visual and Material Culture\' and need to decide how to begin exploring the key concept of visual '
DISAMB REC PREVIEW: '{\n  "question": "You need to analyze a cultural image and decide how to start your research. What first step do you take?",\n  "tiny_learn": [\n    "Visual cultur'
DISAMB IEC PREVIEW: '{\n  "question": "You have just started the course \

 89%|████████▉ | 25/28 [11:14<01:18, 26.33s/it]

DISAMB ASC PREVIEW: '{\n  "question": "During your first seminar in Approaching Visual and Material Culture, you are asked to analyze an artwork using a specific approach. Which init'
BROAD PREVIEW: '{\n  "question": "You have just attended your first lecture on the Renaissance\'s impact on European culture. What is your initial step to engage with the materia'
DISAMB RIA PREVIEW: '{\n  "question": "You are starting your first assignment in European Cultural History and need to decide how to approach it. Which first step do you take?",\n  "t'
DISAMB RIS PREVIEW: '{\n  "question": "You have just started the European Cultural History course and need to decide how to prepare for the first class discussion on the Enlightenmen'
DISAMB REC PREVIEW: '{\n  "question": "You are starting the course \'European Cultural History\' and want to prepare effectively for your first presentation. Which approach do you choo'
DISAMB IEC PREVIEW: '{\n  "question": "You have just started the course \'Europea

 93%|█████████▎| 26/28 [11:41<00:52, 26.48s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You are starting your first assignment on the cultural consequences of industrialization in Europe. What is your first step to get a good start'
BROAD PREVIEW: '{\n  "question": "You have just been assigned to prepare for your first seminar discussion on the ethical theories covered in Ethics (PPE). What is your first st'
DISAMB RIA PREVIEW: '{\n  "question": "You are tackling your first seminar assignment on evaluating ethical theories. Which approach do you take first to start your analysis?",\n  "ti'
DISAMB RIS PREVIEW: '{\n  "question": "You have just started the Ethics (PPE) course and need to prepare for the first seminar discussion on moral responsibility. What is your first '
DISAMB REC PREVIEW: '{\n  "question": "In your first seminar discussion about consequentialism, what should you focus on to contribute effectively?",\n  "tiny_learn": [\n    "Ethical t'
DISAMB IEC PREVIEW: '{\n  "question": "During your first seminar discussion on eth

 96%|█████████▋| 27/28 [12:08<00:26, 26.57s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You are preparing for your first seminar discussion on ethical theories. What should you do first?",\n  "tiny_learn": [\n    "Ethical theories ar'
BROAD PREVIEW: '{\n  "question": "You have just been introduced to the mathematical optimization problems in Methods of PPE I. What is your first step in tackling a problem wher'
DISAMB RIA PREVIEW: '{\n  "question": "You need to start your first assignment by choosing a method to analyze a political decision-making problem. What will you do first?",\n  "tiny_'
DISAMB RIS PREVIEW: '{\n  "question": "You are analyzing a complex decision problem involving multiple variables and want to understand the logic behind different choices. Which appr'
DISAMB REC PREVIEW: '{\n  "question": "You are facing a complex decision problem involving logical reasoning, mathematical optimization, and choice modeling. Which approach do you tr'
DISAMB IEC PREVIEW: '{\n  "question": "You are starting your first exercise in Met

100%|██████████| 28/28 [12:35<00:00, 26.99s/it]

DISAMB ASC PREVIEW: '{\n  "question": "You need to start working on your first math lab exercise involving optimization problems. What is your initial step?",\n  "tiny_learn": [\n    "'
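
The previews above suggest that each response is a JSON object with at least a `question` field and a `tiny_learn` list. A minimal sketch of parsing one such response defensively follows. It assumes the raw text may occasionally arrive wrapped in a Markdown code fence; `parse_task_json` is a hypothetical helper introduced here for illustration, not a function from this notebook.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the source readable


def parse_task_json(raw: str) -> dict:
    """Parse one model response into a dict, or raise ValueError."""
    text = raw.strip()
    # Strip an optional code-fence wrapper (assumption: some responses have one).
    fenced = re.match(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        task = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(task, dict) or "question" not in task:
        raise ValueError("response has no 'question' field")
    return task


# Toy response for illustration only, shaped like the previews above.
example = '{\n  "question": "What is your first step?",\n  "tiny_learn": ["A short fact."]\n}'
print(parse_task_json(example)["question"])
```

Raising `ValueError` rather than returning `None` makes it easy to catch a bad response inside the generation loop and retry or skip that course.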





## 6 Saving

In [13]:
# make sure we serialise a plain dict
ml_structure = dict(ml_structure)

# create the output folder if it does not exist yet
output_dir = Path("../data_bank_microtasks")
output_dir.mkdir(parents=True, exist_ok=True)

out_path = output_dir / "microtasks_RIASEC.json"

# ensure_ascii=False keeps non-ASCII characters readable in the file
with out_path.open("w", encoding="utf8") as f:
    json.dump(ml_structure, f, ensure_ascii=False, indent=2)

print("Saved ML structure to:", out_path)


Saved ML structure to: ..\data_bank_microtasks\microtasks_RIASEC.json
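
Before relying on the saved file, it can help to check that every programme carries the keys that the next cell reads (`broad` plus one list per RIASEC letter). A minimal sketch, assuming each programme maps to a dict of lists; `find_structure_problems`, `EXPECTED_KEYS` and the toy `bank` are hypothetical names introduced for illustration.

```python
EXPECTED_KEYS = ("broad", "R", "I", "A", "S", "E", "C")


def find_structure_problems(bank: dict) -> list:
    """Return human-readable problems; an empty list means the bank looks fine."""
    problems = []
    for programme, tasks in bank.items():
        if not isinstance(tasks, dict):
            problems.append(f"{programme}: expected a dict, got {type(tasks).__name__}")
            continue
        for key in EXPECTED_KEYS:
            if key not in tasks:
                problems.append(f"{programme}: missing key '{key}'")
            elif not isinstance(tasks[key], list):
                problems.append(f"{programme}: '{key}' is not a list")
    return problems


# Toy data for illustration only, not the real bank.
bank = {
    "Programme X": {key: [] for key in EXPECTED_KEYS},
    "Programme Y": {"broad": []},
}
for problem in find_structure_problems(bank):
    print(problem)
```

In the notebook this could be called as `find_structure_problems(ml_structure)` just before `json.dump`, so that a malformed entry is noticed before the file is written.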


In [14]:
from pprint import pprint

# reload the file to confirm that it round-trips
with out_path.open("r", encoding="utf8") as f:
    data = json.load(f)

print("Programmes:", list(data.keys())[:5])

# inspect the first programme as a spot check
prog_name = next(iter(data.keys()))
p = data[prog_name]

print("\nProgramme:", prog_name)
for key in ["broad", "R", "I", "A", "S", "E", "C"]:
    print(f" {key} questions:", len(p[key]))

print("\nExample broad question:")
pprint(p["broad"][0])

print("\nExample R disambiguation question:")
if p["R"]:
    pprint(p["R"][0])


Programmes: ['Ancient Studies', 'Communication and Information Studies', 'Econometrics and Operations Research', 'History', 'Literature and Society']

Programme: Ancient Studies
 broad questions: 2
 R questions: 6
 I questions: 6
 A questions: 6
 S questions: 6
 E questions: 6
 C questions: 6

Example broad question:
{'options': {'A': {'riasec': 'R',
                   'text': 'Visit the archaeological collection to closely '
                           "examine the coins' physical characteristics and "
                           'inscriptions firsthand.'},
             'B': {'riasec': 'I',
                   'text': 'Search academic databases for articles and sources '
                           'about ancient coins and their historical context.'},
             'C': {'riasec': 'A',
                   'text': 'Create a detailed drawing or artistic '
                           'representation of the coins to visualize their '
                           'design and symbols.'},
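
The cell above inspects only the first programme. A sketch of the same tally for every programme at once follows, assuming the `broad`/`R`/`I`/`A`/`S`/`E`/`C` layout shown in the output. `count_questions`, `programmes_missing` and the toy `bank` are hypothetical names used for illustration.

```python
QUESTION_KEYS = ["broad", "R", "I", "A", "S", "E", "C"]


def count_questions(bank: dict) -> dict:
    """Map each programme to its number of questions per question type."""
    return {
        programme: {key: len(tasks.get(key, [])) for key in QUESTION_KEYS}
        for programme, tasks in bank.items()
    }


def programmes_missing(counts: dict, key: str) -> list:
    """Programmes that have no question of the given type, sorted by name."""
    return sorted(name for name, per_key in counts.items() if per_key[key] == 0)


# Toy data for illustration only, not the real bank.
bank = {
    "Programme X": {"broad": ["q1", "q2"], "R": ["q3"]},
    "Programme Y": {"broad": ["q4"]},
}
counts = count_questions(bank)
for programme, per_key in counts.items():
    print(programme, per_key)
print("No R questions:", programmes_missing(counts, "R"))
```

With the real file, `count_questions(data)` would show at a glance whether any programme has fewer questions than the others.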
           

In [15]:
# number and names of the programmes for which microtasks were generated
print("\nTotal programmes with microtasks generated:", len(data))
print("Programmes:", list(data.keys()))

# load the programme vectors file
programme_vectors_path = "../data_RIASEC/df_RIASEC_programmes_vectors.csv"
programme_vectors = pd.read_csv(programme_vectors_path, encoding="utf-8-sig")

# number of programmes with vectors
print("The number of vectors: ", len(programme_vectors))
print("Programme titles with vectors:", list(programme_vectors["programme_title"].unique()))

# difference: programmes that have a vector but no microtasks
programmes_with_microtasks = set(data.keys())
programmes_with_vectors = set(programme_vectors["programme_title"].unique())
programmes_difference = programmes_with_vectors - programmes_with_microtasks
print("Programmes with vectors but no microtasks:", programmes_difference)


Total programmes with microtasks generated: 14
Programmes: ['Ancient Studies', 'Communication and Information Studies', 'Econometrics and Operations Research', 'History', 'Literature and Society', 'Philosophy', 'Biomedical Sciences', 'Business Analytics', 'Computer Science', 'Economics and Business Economics', 'International Business Administration', 'Mathematics', 'Media, Art, Design and Architecture', 'Philosophy, Politics and Economics']
The number of vectors:  17
Programme titles with vectors: ['Ancient Studies', 'Archaeology', 'Artificial Intelligence', 'Biomedical Sciences', 'Business Analytics', 'Communication and Information Studies', 'Computer Science', 'Econometrics and Data Science', 'Econometrics and Operations Research', 'Economics and Business Economics', 'History', 'International Business Administration', 'Literature and Society', 'Mathematics', 'Media, Art, Design and Architecture', 'Philosophy', 'Philosophy, Politics and Economics']
Programmes with vectors but no micr