### Pre-warming LLMs on the INCOSE Guide to Writing Requirements

The purpose of this notebook is to probe what OpenAI's models know about the INCOSE Guide to Writing Requirements. The intent is to establish a baseline for how a model performs out of the box against evaluation criteria derived from the guide's rules.

In [8]:
from typing import Union, List, Dict
from pathlib import Path
from dotenv import dotenv_values
import asyncio
from pydantic import BaseModel, SecretStr, Field
from openai import OpenAI
import src
from src.components.promptrunner import ResponseClient
from src.prj_logger import ProjectLogger, get_logs
from src import utils

In [9]:
# Load config settings
DOT_ENV = dotenv_values("../.env")
openai_api_key = str(DOT_ENV['OPENAI_API_KEY'])
openai_api_key_secret = SecretStr(openai_api_key)
config = utils.load_config("../config.yaml")

# Create a unique run-id folder to store outputs
config["FILE_LOCATIONS"]["MAIN_DATA_FOLDER"] = "../src/data"
output_directory = utils.make_output_directory(config["FILE_LOCATIONS"])

# Instantiate logger
logger = ProjectLogger(
    name=src.BASE_LOGGERNAME,
    log_file=f"{output_directory}/{src.BASE_LOGGERNAME}.log"
)
logger.config()

<src.prj_logger.ProjectLogger at 0x238a2fe0890>
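`utils.make_output_directory` is project code that is not shown here. As a rough illustration of what "a unique run-id folder" can mean, here is a minimal sketch that assumes a `runs/<timestamp>_<short uuid>` layout under the main data folder. The function name `make_run_directory` and that layout are hypothetical, not the project's actual behavior.

```python
import uuid
from datetime import datetime
from pathlib import Path


def make_run_directory(main_data_folder: str) -> Path:
    """Hypothetical sketch: create <main_data_folder>/runs/<timestamp>_<short uuid>."""
    # The timestamp keeps folders sortable; the uuid suffix avoids collisions between quick reruns
    run_id = f"{datetime.now():%Y%m%d_%H%M%S}_{uuid.uuid4().hex[:8]}"
    run_dir = Path(main_data_folder) / "runs" / run_id
    # parents=True creates intermediate folders; exist_ok=False refuses to reuse an existing run folder
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir
```

Putting the timestamp first means a plain directory listing shows runs in chronological order.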

In [10]:
# Define model type
MODEL = "gpt-5"

In [11]:
# Create OpenAI instance
client = OpenAI(
    # The key is read from ../.env above; alternatively use api_key=os.environ.get("OPENAI_API_KEY") (requires `import os`)
    api_key=openai_api_key_secret.get_secret_value()
)

#### Pre-warming

In [12]:
resp_client = ResponseClient(client, MODEL)
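`ResponseClient` comes from `src.components.promptrunner`, and its source is not included. Judging only from how it is used in the cells (a `get_response(input=..., previous_response_id=...)` method plus `previous_response_ids` and `previous_responses` lists), a minimal stand-in over the OpenAI Responses API might look like the sketch below. `MiniResponseClient` is a hypothetical name; the sketch only assumes `client.responses.create(model=..., input=..., previous_response_id=...)` returning an object with `id` and `output_text` attributes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class MiniResponseClient:
    """Hypothetical stand-in for ResponseClient; not the project's implementation."""

    client: Any  # assumed to expose client.responses.create(...), as openai.OpenAI does
    model: str
    previous_response_ids: List[str] = field(default_factory=list)
    previous_responses: List[str] = field(default_factory=list)

    def get_response(self, input: Any, previous_response_id: Optional[str] = None) -> Any:
        kwargs: Dict[str, Any] = {"model": self.model, "input": input}
        # Only chain onto an earlier turn when an id is given; the server keeps that turn's context
        if previous_response_id is not None:
            kwargs["previous_response_id"] = previous_response_id
        response = self.client.responses.create(**kwargs)
        # Remember this turn so the next call can pass previous_response_ids[-1]
        self.previous_response_ids.append(response.id)
        self.previous_responses.append(response.output_text)
        return response
```

Passing the last stored id as `previous_response_id` is what lets the follow-up questions refer to "the previous output" without resending it; the first call simply omits the parameter.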

In [13]:
response = resp_client.get_response(
    input="Can you tell me about the INCOSE Guide to Writing Requirements?",
    previous_response_id=None,
)

Printing `response.output_text`:

 The INCOSE Guide to Writing Requirements is a practical handbook from the International Council on Systems Engineering (INCOSE) that explains how to write clear, testable, and consistent requirements. It complements the ISO/IEC/IEEE 29148 standard by turning its principles into concrete rules, examples, and checklists. It’s used across industries (aerospace, defense, automotive, medical, IT) by authors, reviewers, and managers.

What the guide covers
- Purpose and scope: How to craft individual requirements and sets of requirements so they are understandable, implementable, and verifiable.
- Roles: Guidance for authors, reviewers, approvers; how to structure reviews and quality checks.
- Patterns and examples: Model sentences, good/bad examples, and rewrite patterns for common requirement types (functional, performance, interface, constraint, quality, safety).
- Checklists and rules: Language rules, formatting, attributes/metadata, and verification pl

In [14]:
resp_client.get_response(
    input=[{"role": "user", "content": "What are the rules for writing requirements as described in Section 4 of the INCOSE Guide to Writing Requirements?"}],
    previous_response_id=resp_client.previous_response_ids[-1]
)

Printing `response.output_text`:

 Short answer: Section 4 lays out a set of practical, prescriptive rules for writing clear, testable, consistent requirements. Below is a concise paraphrase of those rules (wording and numbering vary by edition).

Language and syntax
- Use “shall” to express binding requirements; use “should” for goals/recommendations and “will” for statements of fact or external events (not to impose requirements).
- Always name the subject of the requirement (e.g., “The system,” “The ECU,” “The API”); write in active voice.
- One requirement per statement; avoid chaining with and, or, and/or, commas, semicolons.
- State conditions explicitly: When/If/While <condition>, <item> shall <response> <within/by>.
- Prefer positive statements; use “shall not” only to forbid a specific behavior that would otherwise be allowed.
- Avoid optional or permissive verbs (“may,” “can,” “able to”) for requirements; if something is optional, define the configuration or mode in which it 

Response(id='resp_0df66ba03729e3a30068f05491d634819388cef289a7a23e6c', created_at=1760580754.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-5-2025-08-07', object='response', output=[ResponseReasoningItem(id='rs_0df66ba03729e3a30068f0549284b081938784b26b092c256a', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseOutputMessage(id='msg_0df66ba03729e3a30068f054c2ad888193a1f3e74e4b1b5c9c', content=[ResponseOutputText(annotations=[], text='Short answer: Section 4 lays out a set of practical, prescriptive rules for writing clear, testable, consistent requirements. Below is a concise paraphrase of those rules (wording and numbering vary by edition).\n\nLanguage and syntax\n- Use “shall” to express binding requirements; use “should” for goals/recommendations and “will” for statements of fact or external events (not to impose requirements).\n- Always name the subject of the requirement (e.g., “The system,” “The ECU,” “T

In [15]:
resp_client.get_response(
    input=[{"role": "user", "content": "Based on the previous output, suggest an approach on how to review a requirement for accuracy according to the INCOSE Guide to Writing Requirements?"}],
    previous_response_id=resp_client.previous_response_ids[-1],
)

Printing `response.output_text`:

 Here’s a practical, lightweight way to review a requirement for accuracy, aligned with the INCOSE Guide’s Section 4 rules and quality characteristics.

Preparation (what to have in hand)
- Source: stakeholder need, parent requirement, contract/standard, hazard/mission need.
- Context: operational scenarios/modes, interface definitions, architecture or allocation, performance budgets, environmental ranges.
- Verification approach: intended method (Test/Analysis/Inspection/Demonstration) and acceptance criteria.

Three-pass review method

Pass 1 — Language and structure (quick screen)
- Subject and shall: Does the statement name the item (e.g., “The system”) and use “shall” for the requirement?
- One per requirement: Is there only one enforceable statement (no and/or, chained clauses)?
- Positive, active, clear: Prefer positive statements with explicit trigger/condition (“When/If…”).
- Measurable: Are there concrete, measurable criteria with units, tole

Response(id='resp_0df66ba03729e3a30068f054d36bb481938740b4ac64bfa61b', created_at=1760580819.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-5-2025-08-07', object='response', output=[ResponseReasoningItem(id='rs_0df66ba03729e3a30068f054d43d488193b29122e1aca1f322', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseOutputMessage(id='msg_0df66ba03729e3a30068f054e7b5d48193b2b0a043f612a18a', content=[ResponseOutputText(annotations=[], text='Here’s a practical, lightweight way to review a requirement for accuracy, aligned with the INCOSE Guide’s Section 4 rules and quality characteristics.\n\nPreparation (what to have in hand)\n- Source: stakeholder need, parent requirement, contract/standard, hazard/mission need.\n- Context: operational scenarios/modes, interface definitions, architecture or allocation, performance budgets, environmental ranges.\n- Verification approach: intended method (Test/Analysis/Inspection/Demo

#### Prompt Engineering

In [15]:
# Create a formatted checklist
class Checklist(BaseModel):
    checklist: list[str] = Field(description="A list of criteria to be followed") 

messages = [
    {"role": "system", "content": "Act as a helpful assistant. The user will hand over a description of a checklist. Your task is to convert this into a checklist as defined by the provided response_format."},
    {"role": "user", "content": resp_client.previous_responses[-1]}
]
structured_response = resp_client.get_structured_response(
    messages=messages,
    response_format=Checklist,
)
resp_client.check_structured_output(structured_response)

checklist=['Is the requirement clearly stated, avoiding ambiguous or vague terms?', 'Can all readers (technical and non-technical stakeholders) understand the requirement without confusion?', 'Is it written in plain, simple language with no jargon or undefined acronyms?', "Does the requirement use active voice and a positive statement (e.g., 'The system shall…')?", "Does it avoid subjective words such as 'user-friendly', 'fast', or 'optimal'?", 'Is the requirement phrased as a single, atomic statement (not combining multiple requirements)?', 'Does the requirement address a single capability or attribute?', 'Avoid compound requirements that include more than one functionality or condition.', 'Split complex requirements into multiple focused ones if necessary.', 'Is the requirement stated in such a way that it can be verified through test, inspection, or analysis?', 'Are the acceptance criteria or measurable parameters clearly indicated or implied?', 'Could a tester or analyst objectivel
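The `response_format=Checklist` argument hands the pydantic model to the structured-output call. How `get_structured_response` uses it internally is not shown, but the validation side can be sketched with pydantic v2 alone; the JSON replies below are made up for illustration.

```python
from pydantic import BaseModel, Field, ValidationError


class Checklist(BaseModel):
    checklist: list[str] = Field(description="A list of criteria to be followed")


# The JSON schema derived from the model is what constrains the structured output
schema = Checklist.model_json_schema()
print(schema["required"])  # ['checklist']

# Parsing a (made-up) model reply validates both the key and the element types
raw = '{"checklist": ["Is the requirement clearly stated?", "Is it verifiable?"]}'
parsed = Checklist.model_validate_json(raw)
print(len(parsed.checklist))  # 2

# A reply that drifts from the schema fails loudly instead of silently
try:
    Checklist.model_validate_json('{"items": []}')
except ValidationError as err:
    print(err.error_count())  # 1
```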

In [18]:
# Create a prompt using this formatted checklist
class PromptTemplate(BaseModel):
    system: str = Field(description="A prompt system message") 
    user: str = Field(description="A user message")

messages = [
    {"role": "system", "content": "Act as a prompt engineering specialist. The user will hand over a summary checklist. Create a prompt which systematically checks each criterion in the checklist."},
    {"role": "user", "content": str(structured_response.parsed.checklist)}
]
structured_response = resp_client.get_structured_response(
    messages=messages,
    response_format=PromptTemplate,
)
resp_client.check_structured_output(structured_response)

system='You are a meticulous requirements analyst tasked with verifying the quality and clarity of a given requirement based on a detailed checklist.' user='Given the following requirement:\n\n"""\n{requirement}\n"""\n\nPlease systematically assess the requirement against each of the following criteria, providing a clear answer (Yes/No) and a concise explanation for each:\n\n1. Is the requirement clearly stated, avoiding ambiguous or vague terms?\n2. Can all readers (technical and non-technical stakeholders) understand the requirement without confusion?\n3. Is it written in plain, simple language with no jargon or undefined acronyms?\n4. Does the requirement use active voice and a positive statement (e.g., \'The system shall…\')?\n5. Does it avoid subjective words such as \'user-friendly\', \'fast\', or \'optimal\'?\n6. Is the requirement phrased as a single, atomic statement (not combining multiple requirements)?\n7. Does the requirement address a single capability or attribute?\n8. A

In [20]:
structured_response.parsed.system

'You are a meticulous requirements analyst tasked with verifying the quality and clarity of a given requirement based on a detailed checklist.'


In [21]:
structured_response.parsed.user

'Given the following requirement:\n\n"""\n{requirement}\n"""\n\nPlease systematically assess the requirement against each of the following criteria, providing a clear answer (Yes/No) and a concise explanation for each:\n\n1. Is the requirement clearly stated, avoiding ambiguous or vague terms?\n2. Can all readers (technical and non-technical stakeholders) understand the requirement without confusion?\n3. Is it written in plain, simple language with no jargon or undefined acronyms?\n4. Does the requirement use active voice and a positive statement (e.g., \'The system shall…\')?\n5. Does it avoid subjective words such as \'user-friendly\', \'fast\', or \'optimal\'?\n6. Is the requirement phrased as a single, atomic statement (not combining multiple requirements)?\n7. Does the requirement address a single capability or attribute?\n8. Avoid compound requirements that include more than one functionality or condition.\n9. Split complex requirements into multiple focused ones if necessary.\n10. Is the requirement stated in such a way that it can be verified through test, inspection, or analysis?\n11. Are the acceptance criteria or measurable parameters clearly indicated or implied?\n12. Could a tester or analyst objectively determine if the requirement is met or not?\n13. Does the requirement use terminology consistent with the rest of the documentation?\n14. Are key terms defined somewhere in a glossary or within the document?\n15. Is there any conflicting wording when compared to other requirements?\n16. Is the requirement traceable back to a stakeholder need, higher-level system requirement, or project objective?\n17. Does the requirement add value to the system, or is it redundant or unnecessary?\n18. Is the rationale for the requirement clear or documented (if applicable)?\n19. Does the requirement cover all relevant conditions and constraints (e.g., operating environment, performance parameters)?\n20. Is it realistically implementable within project constraints (time, cost, technology)?\n\nProvide your answer in a numbered list with each item corresponding to the checklist criteria.'
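The generated user message keeps a `{requirement}` placeholder. One way to fill it for a single requirement is sketched below; `build_review_messages` is a hypothetical helper, and the template text is shortened for illustration. Plain `str.replace` is used rather than `str.format`, so any other braces an LLM happens to emit in the template cannot break the substitution.

```python
from pydantic import BaseModel, Field


class PromptTemplate(BaseModel):
    system: str = Field(description="A prompt system message")
    user: str = Field(description="A user message")


def build_review_messages(template: PromptTemplate, requirement: str) -> list[dict]:
    """Hypothetical helper: turn the generated template into chat messages for one requirement."""
    # str.replace only touches the literal "{requirement}" token and leaves other braces alone
    user_content = template.user.replace("{requirement}", requirement)
    return [
        {"role": "system", "content": template.system},
        {"role": "user", "content": user_content},
    ]


# Shortened template for illustration; the generated one lists all 20 criteria
template = PromptTemplate(
    system="You are a meticulous requirements analyst.",
    user='Given the following requirement:\n\n"""\n{requirement}\n"""\n\nAssess it against each criterion.',
)
messages = build_review_messages(template, "The system shall log each pressure reading.")
print(messages[1]["content"].splitlines()[3])  # The system shall log each pressure reading.
```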

#### Evaluation

In [12]:
import src.components.prompteval as pe

In [13]:
import inspect
import re

# Get all functions in the prompteval module
functions = inspect.getmembers(pe, inspect.isfunction)

#exclude_funcs = ['eval_correct_spelling', 'eval_is_unique_expression']
exclude_funcs = []

# Collect the eval functions and print each with the rule number (e.g. R19) cited in its docstring
eval_config = {}
for name, func in functions:
    if name.startswith("eval") and name not in exclude_funcs:
        # getdoc returns None for functions without a docstring
        docstring = inspect.getdoc(func) or ""
        # Raw string so "\d" is a regex digit class, not an invalid string escape
        match = re.search(r"R\d+", docstring)
        rule_number = match.group() if match else ""
        print(name, rule_number)
        eval_config[name] = func

eval_acronym_consistency R37
eval_avoid_pronouns R24
eval_avoids_absolutes R26
eval_avoids_parentheses R21
eval_avoids_purpose_phrases R20
eval_avoids_vague_terms R7
eval_consistent_terms_and_units R36
eval_correct_punctuation R14
eval_correct_spelling R13
eval_decimal_format R40
eval_definite_articles_usage R5
eval_explicit_enumeration R22
eval_follows_style_guide R39
eval_group_related_needs_requirements R41
eval_has_appropriate_subject_verb R3
eval_has_combinators R19
eval_has_common_units_of_measure R6
eval_has_correct_grammar R12
eval_has_escape_clauses R8
eval_has_explicit_conditions R27
eval_has_explicit_conditions_for_single_action R28
eval_has_indefinite_temporal_keywords R35
eval_has_measurable_performance R34
eval_has_no_open_ended_clauses R9
eval_has_no_superfluous_infinitives R10
eval_has_range_of_values R33
eval_has_separate_clauses_for_conditions R11
eval_has_supporting_diagram_or_model_reference R23
eval_is_active_voice R2
eval_is_independent_of_heading R25
eval_is_sing
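The docstring scan above only prints the rule numbers. A sketch that returns them as an eval-to-rule mapping instead might look like this; `extract_rule_ids` is a hypothetical name, and it assumes, as the printed output suggests, that each eval docstring cites its rule as `R` followed by digits. The `\b` word boundaries are an added precaution against matching inside longer tokens.

```python
import inspect
import re
from types import ModuleType

# Word boundaries keep the pattern from matching inside longer tokens
RULE_PATTERN = re.compile(r"\bR\d+\b")


def extract_rule_ids(module: ModuleType, prefix: str = "eval") -> dict[str, str]:
    """Map each `eval*` function in `module` to the first rule id (e.g. "R19") in its docstring."""
    rule_ids = {}
    for name, func in inspect.getmembers(module, inspect.isfunction):
        if not name.startswith(prefix):
            continue
        # getdoc returns None for undocumented functions, so fall back to ""
        match = RULE_PATTERN.search(inspect.getdoc(func) or "")
        rule_ids[name] = match.group() if match else ""
    return rule_ids
```

In the notebook this would be called as `extract_rule_ids(pe)`; unlike the loop above it keeps the rule numbers for later use.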

In [14]:
original_dict_keys = set(eval_config.keys())
keys_to_keep = {'eval_acronym_consistency', 'eval_has_escape_clauses', 'eval_has_combinators'}
eval_config_filtered = {key: eval_config[key] for key in keys_to_keep if key in original_dict_keys}

In [15]:
eval_config_filtered

{'eval_acronym_consistency': <function src.components.prompteval.eval_acronym_consistency(text: str, acronym_glossary: dict = {}) -> bool>,
 'eval_has_combinators': <function src.components.prompteval.eval_has_combinators(text: str) -> bool>,
 'eval_has_escape_clauses': <function src.components.prompteval.eval_has_escape_clauses(text: str) -> bool>}

In [16]:
import pandas as pd

In [17]:
def call_evals(df: pd.DataFrame, col: str, eval_config: dict = eval_config) -> pd.DataFrame:
    """
    Run evaluations for each row in the DataFrame.
    
    Args:
        df: DataFrame containing requirements
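        col: Column containing requirement text
        eval_config: Mapping of eval function names to eval functions;
            defaults to the full eval_config collected above
        
    Returns:
        DataFrame with one added column of results per eval function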
    """
    result_df = df.copy()
    
    # Run evaluations for each row
    for idx, row in result_df.iterrows():
        for eval_name, eval_func in eval_config.items():
            eval_result = eval_func(row[col])
            result_df.loc[idx, eval_name] = pe.convert_bool_to_ohe(eval_result)
    return result_df
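`call_evals` fills the results cell by cell with `iterrows` and `.loc`. A column-wise variant is sketched below. It assumes each eval takes the requirement text and returns a bool, and it stores `True` as 1.0 and `False` as 0.0 via `astype(float)`; whether that matches `pe.convert_bool_to_ohe` is not shown here, so swap that helper in if its convention differs. `call_evals_vectorized` and the toy check `toy_no_combinators` are hypothetical.

```python
import re
from typing import Callable, Dict

import pandas as pd


def call_evals_vectorized(
    df: pd.DataFrame,
    col: str,
    eval_config: Dict[str, Callable[[str], bool]],
) -> pd.DataFrame:
    """Apply every eval to the whole requirement column, one result column per eval."""
    result_df = df.copy()
    for eval_name, eval_func in eval_config.items():
        # Series.map calls eval_func once per row; astype(float) turns True/False into 1.0/0.0
        result_df[eval_name] = result_df[col].map(eval_func).astype(float)
    return result_df


def toy_no_combinators(text: str) -> bool:
    """Hypothetical toy eval: True when the text contains no standalone 'and'/'or'."""
    return re.search(r"\b(and|or)\b", text, flags=re.IGNORECASE) is None


demo = pd.DataFrame({"requirement": ["The pump shall stop.", "The pump shall stop and vent."]})
out = call_evals_vectorized(demo, "requirement", {"toy_no_combinators": toy_no_combinators})
print(out["toy_no_combinators"].tolist())  # [1.0, 0.0]
```

Assigning a whole column per eval avoids creating columns one cell at a time through `.loc`, which is slower on larger requirement sets.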

In [18]:
df = pd.DataFrame(data={'requirement': 'The system shall not contain a check for overpressure and underpressure.'}, index=[0])

In [19]:
eval_df = call_evals(df, 'requirement', eval_config_filtered)

In [20]:
eval_df

|   | requirement | eval_acronym_consistency | eval_has_combinators | eval_has_escape_clauses |
|---|---|---|---|---|
| 0 | The system shall not contain a check for overp... | 1.0 | 0.0 | 1.0 |


In [None]:
def map_A_to_B(list_of_A: list, mapdict_AB: dict) -> list:
    """Map each element of an input list (call it A) to its value in a dictionary.

    The dictionary's keys should include the elements of A. The output list keeps
    the order of A, with each element replaced by its dictionary value; elements
    missing from the dictionary map to None.
    
    Args:
        list_of_A (list): an input list
        mapdict_AB (dict): a dictionary with keys corresponding to the input list elements

    Returns:
        list: the mapped values, in the order of list_of_A
    """
    return [*map(mapdict_AB.get, list_of_A)]

def get_failed_evals(df: pd.DataFrame) -> pd.DataFrame:
    """
    Identify failed evaluations for each requirement.
    
    Args:
        df: DataFrame with evaluation results
        
    Returns:
        DataFrame with added 'failed_evals' column
    """
    result_df = df.copy()
    eval_cols = [c for c in result_df.columns if c.startswith("eval")]
    
    result_df['failed_evals'] = result_df[eval_cols].apply(
        lambda row: [eval_cols[i] for i, val in enumerate(row) if val == 0], 
        axis=1
    )
    
    return result_df

def map_failed_evals_to_rule_ids(
    df: pd.DataFrame, 
    eval_to_rule_map: Dict[str, str]
) -> pd.DataFrame:
    """
    Map failed evaluations to rule IDs.
    
    Args:
        df: DataFrame with failed evaluations
        eval_to_rule_map: Mapping from evaluation names to rule IDs
        
    Returns:
        DataFrame with added 'failed_evals_rule_ids' column
    """
    result_df = df.copy()
    
    result_df['failed_evals_rule_ids'] = result_df['failed_evals'].apply(
        lambda eval_list: map_A_to_B(eval_list, eval_to_rule_map) if isinstance(eval_list, list) else None
    )
    
    return result_df

def check_for_failed_evals(df):
    """
    Check whether any evaluations failed and, if so, return a boolean used to
    decide whether to run the review prompt.
    """
    # Not implemented yet: currently returns None
    pass
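
`check_for_failed_evals` is still a stub. Going only by its docstring, one possible completion returns a per-requirement flag from the `failed_evals` column that `get_failed_evals` adds; `.any()` on that flag then gives a single yes/no for the whole batch. The name `check_for_failed_evals_sketch` is hypothetical, and treating missing values as "no failures" is an assumption.

```python
import pandas as pd


def check_for_failed_evals_sketch(df: pd.DataFrame) -> pd.Series:
    """Hypothetical completion of the stub: one True/False flag per requirement row."""
    # A row needs review when its failed_evals list is non-empty; non-list values (NaN/None) count as no failures
    return df["failed_evals"].apply(lambda evals: isinstance(evals, list) and len(evals) > 0)


demo = pd.DataFrame({"failed_evals": [["eval_has_combinators"], [], None]})
flags = check_for_failed_evals_sketch(demo)
print(flags.tolist())     # [True, False, False]
print(bool(flags.any()))  # True: at least one requirement should go to the review prompt
```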

In [22]:
# map eval funcs to rule group to find which rule groups need to be revised through prompts
eval_df = get_failed_evals(eval_df)

In [23]:
eval_df

|   | requirement | eval_acronym_consistency | eval_has_combinators | eval_has_escape_clauses | failed_evals |
|---|---|---|---|---|---|
| 0 | The system shall not contain a check for overp... | 1.0 | 0.0 | 1.0 | [eval_has_combinators] |
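With a single requirement the `failed_evals` column is easy to read, but across a larger set it can help to count how often each eval fails. A sketch using `DataFrame.explode`, on made-up rows:

```python
import pandas as pd

# Made-up rows for illustration; in the notebook this would be eval_df after get_failed_evals
failed = pd.DataFrame({
    "requirement": ["REQ-A", "REQ-B", "REQ-C"],
    "failed_evals": [["eval_has_combinators"], ["eval_has_combinators", "eval_has_escape_clauses"], []],
})

# explode() gives one row per (requirement, failed eval); empty lists become NaN and are dropped
per_eval = failed.explode("failed_evals").dropna(subset=["failed_evals"])
counts = per_eval["failed_evals"].value_counts()
print(counts.to_dict())  # {'eval_has_combinators': 2, 'eval_has_escape_clauses': 1}
```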


In [26]:
mapping = config["SECTION_4_RULE_GROUPS"]

In [27]:
mapping

{'eval_acronym_consistency': 'Uniformity_Of_Language',
 'eval_avoid_pronouns': 'Completeness',
 'eval_avoids_absolutes': 'Realism',
 'eval_avoids_parentheses': 'Abstraction',
 'eval_avoids_purpose_phrases': 'Abstraction',
 'eval_avoids_vague_terms': 'Accuracy',
 'eval_consistent_terms_and_units': 'Quantification',
 'eval_correct_punctuation': 'Uniformity_Of_Language',
 'eval_correct_spelling': 'Uniformity_Of_Language',
 'eval_decimal_format': 'Uniformity_Of_Language',
 'eval_definite_articles_usage': 'Conditions',
 'eval_explicit_enumeration': 'Abstraction',
 'eval_follows_style_guide': 'Uniformity_Of_Language',
 'eval_group_related_needs_requirements': 'Modularity',
 'eval_has_appropriate_subject_verb': 'Non_Ambiguity',
 'eval_has_combinators': 'Abstraction',
 'eval_has_common_units_of_measure': 'Non_Ambiguity',
 'eval_has_correct_grammar': 'Accuracy',
 'eval_has_escape_clauses': 'Conditions',
 'eval_has_explicit_conditions': 'Conditions',
 'eval_has_explicit_conditions_for_single_act
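To see which evals feed each rule group (for example, when writing one revision prompt per group), the mapping can be inverted. A minimal sketch, with `group_evals_by_rule_group` as a hypothetical name and a few entries copied from the output above:

```python
from collections import defaultdict


def group_evals_by_rule_group(eval_to_group: dict[str, str]) -> dict[str, list[str]]:
    """Invert the eval -> rule-group mapping so each group lists the evals that feed it."""
    groups = defaultdict(list)
    for eval_name, group in eval_to_group.items():
        groups[group].append(eval_name)
    # Sort groups and eval names for a stable, readable result
    return {group: sorted(names) for group, names in sorted(groups.items())}


sample = {
    "eval_acronym_consistency": "Uniformity_Of_Language",
    "eval_avoids_parentheses": "Abstraction",
    "eval_has_combinators": "Abstraction",
    "eval_has_escape_clauses": "Conditions",
}
print(group_evals_by_rule_group(sample)["Abstraction"])  # ['eval_avoids_parentheses', 'eval_has_combinators']
```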

In [30]:
eval_df = map_failed_evals_to_rule_ids(eval_df, eval_to_rule_map=mapping)

In [31]:
eval_df

|   | requirement | eval_acronym_consistency | eval_has_combinators | eval_has_escape_clauses | failed_evals | failed_evals_rule_ids |
|---|---|---|---|---|---|---|
| 0 | The system shall not contain a check for overp... | 1.0 | 0.0 | 1.0 | [eval_has_combinators] | [Abstraction] |
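Two points follow from how `map_A_to_B` works: an eval missing from `SECTION_4_RULE_GROUPS` maps to `None` (it uses `dict.get`), and several failed evals can land in the same group (for example, `eval_has_combinators` and `eval_avoids_parentheses` both map to `Abstraction`). If the next step is one revision prompt per rule group, a small hypothetical helper can deduplicate each row's groups:

```python
from typing import List, Optional


def unique_rule_groups(rule_groups: Optional[List[Optional[str]]]) -> List[str]:
    """Hypothetical helper: deduplicate one row's failed rule groups, dropping None from unmapped evals."""
    if not rule_groups:
        return []
    # dict.fromkeys keeps first-seen order while removing repeats
    return [group for group in dict.fromkeys(rule_groups) if group is not None]
```

Applied row-wise, e.g. `eval_df["failed_evals_rule_ids"].apply(unique_rule_groups)`, this would give `['Abstraction']` for the requirement above.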
