# SAT Words in Context question generation using Strands Agentic AI framework and OpenAI model

## Overview

My first [attempt](https://github.com/PragyanR/GenAI_for_SAT_Prep) at creating Words in Context questions, using Llama 3B on an A100, was fairly involved from a coding perspective. With the advent of agentic AI, I was able to generate such questions in under 50 lines of code plus a prompt template.

The Strands Agentic AI framework made it very simple to try out the use case.

Check out my site if you want to try out Words in Context questions that I generated: [Acesat.ai](https://www.acesat.ai/).


## Setup and prerequisites

### Prerequisites
* Python 3.10+
* gpt-4.1-mini access

Install the required packages for Strands Agents.

In [19]:
# installing pre-requisites
!pip install -r requirements.txt



### Importing dependency packages

Import the dependency packages

In [20]:
import os
from strands import Agent, tool
from strands.models.litellm import LiteLLMModel
import json

### Setting up OpenAI keys

Set up the OpenAI API key.

In [21]:
os.environ["OPENAI_API_KEY"] = "<Key Goes here>"
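Hard-coding a key in a notebook makes it easy to leak by accident. As a minimal sketch, the key could instead be read from the environment, with an early failure when it is missing or still set to the placeholder. The helper name `require_api_key` is hypothetical and is not part of Strands or LiteLLM.

```python
import os


def require_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in env_var, or raise if it is unset or a placeholder."""
    key = os.environ.get(env_var, "").strip()
    # A value such as "<Key Goes here>" is treated as "not configured".
    if not key or key.startswith("<"):
        raise RuntimeError(f"Set {env_var} before running the notebook")
    return key
```

Calling `require_api_key()` once before the agent is created would surface a missing key immediately instead of on the first model call.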

### Setting up custom tools

I created four tools for the agent to call:

* `check_genre_exists`: checks whether a genre has already been used for a question.
* `store_paragraph`: stores the generated paragraph under its genre.
* `store_answer_choice`: stores one answer choice together with its explanation.
* `set_correct_answer`: records which answer choice is correct.

In [22]:
# Dictionary for storing generated SAT questions (keyed by genre)
para = {}

@tool
def check_genre_exists(genre: str) -> bool:
    '''
    Check whether this exact genre name was used already (exact-match lookup).
    Args:
        genre: SAT passage genre
    '''
    return genre in para


@tool
def store_paragraph(genre: str, paragraph: str):
    '''
    Store the generated paragraph and start a new question record for the genre
    Args:
        genre: SAT passage genre
        paragraph: passage text ending with the question stem
    '''
    para[genre] = {
        "genre": genre,
        "question": paragraph,
        "ans_choices": [],
        "ans_choices_with_explanation": {},
        "correct_answer": None
    }


@tool
def store_answer_choice(choice: str, explanation: str, genre: str):
    '''
    Store answer choices and explanations
    Args:
        choice: answer choice text
        explanation: brief justification
        genre: SAT passage genre
    '''
    if choice not in para[genre]["ans_choices"]:
        para[genre]["ans_choices"].append(choice)

    para[genre]["ans_choices_with_explanation"][choice] = explanation


@tool
def set_correct_answer(choice: str, genre: str):
    '''
    Store the correct answer choice
    Args:
        choice: correct answer choice
        genre: SAT passage genre
    '''
    para[genre]["correct_answer"] = choice
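The tools store whatever the agent sends, so a record can end up incomplete, for example with fewer than four choices or a correct answer that does not match any stored choice. The sketch below checks one record against the structure created by `store_paragraph`. The function name `validate_record` and the specific checks are assumptions for illustration, not part of the original notebook.

```python
def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one stored question record (empty if none)."""
    problems = []
    choices = record.get("ans_choices", [])
    explanations = record.get("ans_choices_with_explanation", {})

    if not record.get("question"):
        problems.append("question text is empty")
    if len(choices) != 4:
        problems.append(f"expected 4 answer choices, found {len(choices)}")
    if record.get("correct_answer") not in choices:
        problems.append("correct_answer is not one of the stored choices")

    # Every stored choice should have a matching explanation.
    missing = [c for c in choices if c not in explanations]
    if missing:
        problems.append(f"{len(missing)} choice(s) have no explanation")
    return problems
```

Running this over `para.values()` after generation would flag records worth regenerating before they are written to disk.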


### LLM model

The agent uses `gpt-4.1-mini` through LiteLLM.

In [23]:
model = "gpt-4.1-mini"
litellm_model = LiteLLMModel(
    model_id=model, params={"max_tokens": 32000, "temperature": 0.7}
)

### Generating Words in Context questions

In [24]:
# Prompt template for generating Main Idea questions
prompt = '''
You are generating authentic SAT Main Idea / Central Idea questions.
You MUST use the provided tools to store and validate content.

Rules (SAT Authenticity):
- Use an academic, neutral tone.
- The conclusion must be implied, not explicitly stated.
- There must be exactly ONE logically valid completion.
- Avoid opinions, exaggeration, or unsupported assumptions.

Process:
1) Create a unique random Genre for the text. It can range from history to science to architecture to art.
2) Use check_genre_exists(Genre).
   - If it returns True, generate a new genre and restart.
   - Otherwise, continue.
3) Write a 70–90 word paragraph about the genre. Don't simply explain the genre; discuss a niche, nuanced aspect of it. It must end with a line break and "Which choice best describes the main idea of the text?".
   - Do NOT include a sentence summarizing the main idea.
   - The whole text should be needed to understand the main idea.
4) Use store_paragraph(Genre, paragraph).

Answer Choices:
5) Generate exactly four answer choices (A–D):
   - One correct answer explaining the main idea.
   - One tempting but unsupported answer choice.
   - Two clearly incorrect answer choices.
6) For each answer choice, call:
   store_answer_choice(choice, explanation, Genre)
   - Explanation must be one concise sentence justifying correctness or incorrectness.
7) Use set_correct_answer(correct_choice, Genre).

Output Rules:
- Do NOT reveal reasoning steps.
- Do NOT explain unless explicitly asked.
- Ensure no more than one answer is defensible; otherwise, regenerate.
'''
# Function for generating one question per call
def generate_question(prompt):
    system_prompt = "You are a simple agent that can generate an SAT reading passage and its question"
    agent = Agent(
        model=litellm_model,
        system_prompt=system_prompt,
        tools=[store_paragraph, store_answer_choice, check_genre_exists, set_correct_answer],
    )
    agent(prompt)
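Because `store_paragraph` keys each record by genre, a run that reuses a genre overwrites the earlier record, and a run that raises stores nothing. A fixed number of calls can therefore yield fewer stored questions than requested. The sketch below counts stored records instead of calls. The helper `generate_until` and its `max_attempts` cap are hypothetical additions, not part of the original notebook.

```python
from typing import Callable


def generate_until(
    target: int,
    generate_fn: Callable[[], None],
    store: dict,
    max_attempts: int = 100,
) -> int:
    """Call generate_fn until store holds target entries or max_attempts is reached.

    Returns the number of attempts made.
    """
    attempts = 0
    while len(store) < target and attempts < max_attempts:
        attempts += 1
        try:
            generate_fn()
        except Exception as exc:
            # One failed agent run should not abort the whole batch.
            print(f"attempt {attempts} failed: {exc}")
    return attempts
```

With the notebook's names, this could be called as `generate_until(50, lambda: generate_question(prompt), para)`.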

In [25]:


# Specify number of questions to be generated
question_count = 50
para = {}

for i in range(question_count):
    generate_question(prompt)

# Print generated genres (one per question)
for genre in para.keys():
    print(genre)


Tool #1: check_genre_exists

Tool #2: store_paragraph

Tool #3: store_answer_choice

Tool #4: store_answer_choice

Tool #5: store_answer_choice

Tool #6: store_answer_choice

Tool #7: set_correct_answer
Historical archaeology focuses on the study of cultures and societies with written records, often combining physical artifacts with historical documents to gain a fuller understanding. Unlike prehistoric archaeology, it benefits from corroborating texts that provide context to the material finds. However, the interpretation of these findings requires careful consideration of biases in written sources and the archaeological record itself. This interplay between tangible evidence and textual information shapes our understanding of past human behaviors.
Which choice best describes the main idea of the text?

A) Historical archaeology integrates both material artifacts and written records to better understand past cultures.
B) Historical archaeology primarily relies on written records and 



Tool #7: store_answer_choice

Tool #8: store_answer_choice

Tool #9: store_answer_choice

Tool #10: store_answer_choice

Tool #11: set_correct_answer
Recent studies in neuroscience have focused on the hippocampus, a brain region critical for memory formation. Researchers have discovered that the hippocampus not only stores memories but also actively reorganizes them during sleep, which enhances long-term retention. This reorganization process involves neural plasticity, allowing the brain to strengthen important memories while discarding irrelevant information. Such findings suggest that memory is not static but dynamic, continuously evolving with experience and rest. Which choice best describes the main idea of the text?

A. Memory formation involves active reorganization in the hippocampus during sleep.
B. The hippocampus is primarily responsible for controlling emotions.
C. Memories are permanently stored in the brain without change.
D. Sleep has no effect on memory retention.Histo

In [28]:
# Print generated genres (one per question)
for genre in para.keys():
    print(genre)

Historical Archaeology
Marine Biology
Neoclassical Architecture
Renaissance Art
Environmental Science
Medieval Architecture
Quantum Physics
Ancient Pottery
Ancient Architectural Techniques
Ancient Architecture
Impressionist Art
Historical Maritime Navigation
Astrobiology
Medieval Manuscript Art
Medieval Manuscript Illumination
Astrophysics
Ancient Maritime Archaeology
Environmental Science and Urban Ecology
Astronomy
Renaissance Art Techniques
Renaissance Scientific Instruments
Ancient Civilizations
Architectural History
Renewable Energy Technology
Cognitive Psychology
Ethnomusicology
History of Maritime Navigation
Medieval Literature
Ancient Mesopotamian Architecture
Renewable Energy Technologies
Historical Cartography
Quantum Computing
History of Scientific Instruments
Historical Botany
Urban Archaeology
Bioluminescence in Deep Sea Creatures
Historical Architecture
Geological Formations of Desert Landscapes
Urban Green Spaces
Ancient Navigation Methods
Classical Music Composition Tec
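The list above contains near-duplicates such as `Renewable Energy Technology` and `Renewable Energy Technologies`, which an exact-match lookup like `genre in para` cannot catch. One possible approach is to compare genre names with `difflib.SequenceMatcher` from the standard library. This is a sketch only: the function name `find_similar_genre` and the 0.85 threshold are assumptions that would need tuning.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def find_similar_genre(
    genre: str, existing: Iterable[str], threshold: float = 0.85
) -> Optional[str]:
    """Return the first existing genre whose similarity to genre meets threshold, else None."""
    candidate = genre.strip().lower()
    for other in existing:
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
        score = SequenceMatcher(None, candidate, other.strip().lower()).ratio()
        if score >= threshold:
            return other
    return None
```

A genre-checking tool could return `find_similar_genre(genre, para) is not None` so that the agent is asked to pick a new genre in these cases.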

In [29]:
for key in para:
    print(json.dumps(para[key]))

{"genre": "Historical Archaeology", "question": "Historical archaeology focuses on the study of cultures and societies with written records, often combining physical artifacts with historical documents to gain a fuller understanding. Unlike prehistoric archaeology, it benefits from corroborating texts that provide context to the material finds. However, the interpretation of these findings requires careful consideration of biases in written sources and the archaeological record itself. This interplay between tangible evidence and textual information shapes our understanding of past human behaviors.\nWhich choice best describes the main idea of the text?", "ans_choices": ["Historical archaeology integrates both material artifacts and written records to better understand past cultures.", "Historical archaeology primarily relies on written records and disregards physical artifacts.", "Historical archaeology studies only prehistoric societies without written records.", "Historical archaeol

In [30]:
# Write the questions to a JSON Lines file (one JSON object per line)
with open("Infrence_output.jsonl", "w") as f:
    for key in para:
        print(key)
        f.write(json.dumps(para[key])+ "\n")

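A JSON Lines file holds one JSON object per line, so it can be read back line by line without loading a single large array. The following is a minimal sketch of a reader for the file written above. The function name `load_questions` is hypothetical.

```python
import json


def load_questions(path: str) -> list[dict]:
    """Read a JSON Lines file and return one dict per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at the end of the file
                records.append(json.loads(line))
    return records
```

Each returned dict has the same keys that `store_paragraph` created: `genre`, `question`, `ans_choices`, `ans_choices_with_explanation`, and `correct_answer`.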