<a href="https://colab.research.google.com/github/asantos2000/master-degree-santos-anderson/blob/main/code/src/chap_6_semantic_annotation_rules_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Semantic Annotation - rules classification

Classify elements according to the taxonomy and get templates for transformation.

Chapter 6. Support tools
- Section 6.2 Implementation of the main components
  - Section 6.2.3 Semantic annotations
    - Section Algorithm "taxonomy classification and templates"

## Google Colab

In [1]:
%load_ext autoreload
%autoreload 2

import sys
import os

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
  from google.colab import drive
  drive.mount('/content/drive')
  !rm -rf cfr2sbvr configuration checkpoint
  !git clone https://github.com/asantos2000/master-degree-santos-anderson.git cfr2sbvr
  %pip install -r cfr2sbvr/code/requirements.txt
  !cp -r cfr2sbvr/code/src/configuration .
  !cp -r cfr2sbvr/code/src/checkpoint .
  !cp -r cfr2sbvr/code/config.colab.yaml config.yaml
  DEFAULT_CONFIG_FILE="config.yaml"
else:
  DEFAULT_CONFIG_FILE="../config.yaml"

## Imports

In [2]:
# Standard library imports
from collections import defaultdict
import json
from typing import List, Dict, Any
from itertools import islice

# Third-party libraries
from pydantic import BaseModel, Field

# Local application/library-specific imports
import checkpoint.main as checkpoint
from checkpoint.main import (
    restore_checkpoint,
    save_checkpoint,
    Document,
    DocumentProcessor,
)
import configuration.main as configuration
import logging_setup.main as logging_setup
import token_estimator.main as token_estimator
from token_estimator.main import estimate_tokens
import rules_taxonomy_provider.main as rules_taxonomy_provider
from rules_taxonomy_provider.main import RuleInformationProvider
import llm_query.main as llm_query
from llm_query.main import query_instruct_llm

DEV_MODE = True

if DEV_MODE:
    # Development mode
    import importlib

    importlib.reload(configuration)
    importlib.reload(logging_setup)
    importlib.reload(checkpoint)
    importlib.reload(token_estimator)
    importlib.reload(rules_taxonomy_provider)
    importlib.reload(llm_query)

## Settings

Default settings; check them before running the notebook.

### Get configuration

In [3]:
# load config
config = configuration.load_config(DEFAULT_CONFIG_FILE)

### Logging configuration

In [None]:
logger = logging_setup.setting_logging(config["DEFAULT_LOG_DIR"], config["LOG_LEVEL"])

## Checkpoints

Documents, annotated datasets, statistics, and metrics about the execution of the notebook are stored by the checkpoint module.

Checkpoints are stored in and retrieved from the path given by `DEFAULT_CHECKPOINT_FILE` in the configuration.

During execution, the checkpoint is restored at the beginning of each section and saved at the end. The notebook can be run, and the checkpoint restored, several times. If a run fails, find the closest checkpoint and restore it.
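
The restore-at-start, save-at-end pattern described above can be sketched with a plain JSON file. This is only an illustration under stated assumptions: `save_state` and `restore_state` are hypothetical stand-ins, not the functions of the `checkpoint` module, and the keys stored are made up.

```python
import json
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write the whole state to disk as JSON (hypothetical stand-in for save_checkpoint)."""
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def restore_state(path: Path) -> dict:
    """Load the state, or start empty if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint_path = Path(tmp_dir) / "documents.json"

    # Section 1: restore at the start, do some work, save at the end.
    state = restore_state(checkpoint_path)
    state["classify_P1"] = ["first result"]
    save_state(checkpoint_path, state)

    # Section 2: a later run restores the saved state and continues from there.
    resumed = restore_state(checkpoint_path)
    resumed["classify_P2"] = ["second result"]
    save_state(checkpoint_path, resumed)

    final_state = restore_state(checkpoint_path)

print(sorted(final_state))
```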

Restore the checkpoint

In [None]:
# To run after extraction
last_checkpoint = configuration.get_last_filename(config["DEFAULT_CHECKPOINT_DIR"], "documents", "json")

logger.info(f"{last_checkpoint=}")

config["DEFAULT_CHECKPOINT_FILE"] = last_checkpoint

manager = restore_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"])

## General functions and data structures

LLM response model for P1

In [6]:
class Classification(BaseModel):
    type: str = Field(..., description="Type of the rule (e.g., Party, Data, Activity)")
    confidence: float = Field(..., ge=0, le=1, description="Confidence level of the classification")
    explanation: str = Field(..., description="Explanation of why the classification was made")


class StatementClassification(BaseModel):
    doc_id: str = Field(..., description="Document ID associated with the statement")
    statement_id: str = Field(..., description="A provided string that identifies the statement. e.g., '1', 'Person'")
    statement_title: str = Field(..., description="The title of the statement")
    statement_text: str = Field(..., description="The statement to be classified")
    statement_sources: List[str] = Field(..., description="List of statement's source")
    classification: List[Classification] = Field(..., description="List of classifications with explanations")

class StatementClassifications(BaseModel):
    StatementClassifications: List[StatementClassification]

LLM response model for P2

In [7]:
class SubClassification(BaseModel):
    subtype: str = Field(..., description="Subtype of the rule. The title of the section/subsection.")
    templates_ids: List[str] = Field(..., description="List of template IDs that matched the statement.")
    confidence: float = Field(..., ge=0, le=1, description="Confidence level of the classification")
    explanation: str = Field(..., description="Explanation of why the classification was made")


class StatementSubClassification(BaseModel):
    doc_id: str = Field(..., description="Document ID associated with the statement")
    statement_id: str = Field(..., description="A provided string that identifies the statement. e.g., '1', 'Person'")
    statement_title: str = Field(..., description="The title of the statement")
    statement_text: str = Field(..., description="The statement to be classified")
    statement_sources: List[str] = Field(..., description="List of statement's source")
    classification: List[SubClassification] = Field(..., description="List of classifications with explanations")

class StatementSubClassifications(BaseModel):
    StatementSubClassifications: List[StatementSubClassification]


Prompts

In [8]:
def classify_and_add_prompts(element_type, rule_type, rules_to_classify, manager):
    user_prompt_classify = []
    system_prompt_classify = []

    # Helper to split an iterable into batches of up to max_batch_size items each
    def batch(iterable, max_batch_size):
        iterator = iter(iterable)
        while True:
            batch_list = list(islice(iterator, max_batch_size))
            if not batch_list:
                break
            yield batch_list

    # Process rules in batches of up to 5 items
    for batch_num, batch_rules in enumerate(batch(rules_to_classify, 5)):
        logger.info(f"Processing batch {batch_num + 1} with {len(batch_rules)} items")
        
        # Group by 'statement_type' within each batch
        grouped_data = defaultdict(list)
        for item in batch_rules:
            grouped_data[item["statement_type"]].append(item)

        # Process each statement type within the batch
        for statement_type, items in grouped_data.items():
            # print(10 * "cap ")
            # print(len(batch_rules), element_type, rule_type, statement_type)
            # print(10 * "cap ")
            # The user prompt for this group contains only `items`, so report their count
            system_prompt = get_system_prompt_classify_p2(len(items), element_type, rule_type, statement_type)
            system_prompt_classify.append(system_prompt)
            user_prompt = get_user_prompt_classify(items)
            user_prompt_classify.append(user_prompt)
            
            # Log token counts for the current batch and statement type
            logger.info(f"Batch {batch_num + 1} - token count system prompt {statement_type}: {estimate_tokens(system_prompt)}")
            logger.info(f"Batch {batch_num + 1} - token count user prompt {statement_type}: {estimate_tokens(user_prompt)}")

            # Add documents to the manager with batch and statement type info
            manager.add_document(
                Document(
                    id=f"prompt-classify_P2_{element_type.replace(' ', '_')}_batch{batch_num + 1}_{statement_type}",
                    type="prompt",
                    content=system_prompt
                )
            )
    
    return user_prompt_classify, system_prompt_classify
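
The batching and grouping logic above can be tried in isolation. The sketch below is a standalone illustration with made-up statements (the `statement_id` and `statement_type` values are hypothetical, and `group_by_type` is a helper introduced only for this example). It shows how `islice` yields fixed-size batches and how each batch is then grouped by `statement_type`, so that one batch may produce more than one prompt.

```python
from collections import defaultdict
from itertools import islice


def batch(iterable, max_batch_size):
    """Yield successive lists of at most max_batch_size items."""
    iterator = iter(iterable)
    while True:
        batch_list = list(islice(iterator, max_batch_size))
        if not batch_list:
            break
        yield batch_list


def group_by_type(batch_rules):
    """Group the items of one batch by their 'statement_type' key."""
    grouped = defaultdict(list)
    for item in batch_rules:
        grouped[item["statement_type"]].append(item)
    return dict(grouped)


# Hypothetical input: seven statements with alternating types.
rules = [
    {"statement_id": str(i), "statement_type": "Data" if i % 2 else "Activity"}
    for i in range(1, 8)
]

batches = list(batch(rules, 5))
batch_sizes = [len(b) for b in batches]
prompt_counts = [len(group_by_type(b)) for b in batches]
print(batch_sizes)    # sizes of the batches
print(prompt_counts)  # number of prompts (one per statement type) per batch
```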


LLM

In [9]:
def classify_and_get_templates(
    element_name, part, user_prompt_classify, system_prompt_classify, manager
):
    # Log the start of the process
    logger.info(
        f"{part}. processing {len(system_prompt_classify)} prompts..."
    )
    logger.info(f"{part}. Classifying and getting templates for rules...")

    # Initialize lists to accumulate responses, timings, and completions
    all_responses_classify = []
    all_elapsed_times = []
    all_completions = []

    # Loop through each pair of user and system prompts with a counter
    for index, (user_prompt, system_prompt) in enumerate(zip(user_prompt_classify, system_prompt_classify), start=1):
        logger.info(f"Processing classification and templates prompt {index} ...")
        logger.debug(system_prompt)
        logger.debug(user_prompt)

        # Query the language model
        response_classify, completion, elapse_time = query_instruct_llm(
            system_prompt=system_prompt,
            user_prompt=user_prompt,
            document_model=StatementSubClassifications,
            llm_model=config["LLM"]["MODEL"],
            temperature=config["LLM"]["TEMPERATURE"],
            max_tokens=config["LLM"]["MAX_TOKENS"],
        )

        logger.info(response_classify.StatementSubClassifications)
        logger.info(f"{len(response_classify.StatementSubClassifications)} rules classified and templates obtained.")

        # Accumulate the responses and the per-prompt timing and completion
        all_responses_classify.extend(response_classify.StatementSubClassifications)
        all_elapsed_times.append(elapse_time)
        all_completions.append(completion.dict())

        logger.info(f"Finished processing classification and templates prompt {index}.")

    # After the loop, create a single Document with all the accumulated responses
    if all_responses_classify:
        doc = Document(
            id=f"classify_{part}_{element_name.replace(' ', '_')}",
            type="llm_response_classification",
            content=all_responses_classify,
            elapsed_times=all_elapsed_times,
            completions=all_completions,
        )
        manager.add_document(doc)

    return all_responses_classify


Utils

In [10]:
def check_missing_keys(elements: List[Dict[str, Any]], keys: List[str], dataset_name: str) -> None:
    """Check for missing keys in a list of dictionaries and print status for a dataset."""
    missing_found = False  # Flag to track if any key is missing
    for element in elements:
        for key in keys:
            if not element.get(key):
                logger.info(f"Dataset '{dataset_name}' - Missing key: '{key}' in element: {element}")
                missing_found = True

    # Print message based on whether keys are missing or not
    if not missing_found:
        logger.info(f"Dataset '{dataset_name}' - All elements have the required keys.")
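
Note that `element.get(key)` is falsy not only when the key is absent but also when its value is empty (for example `""`, `[]`, or `None`), so such values are reported as missing too. The sketch below illustrates this with a hypothetical variant, `find_missing_keys`, that returns the findings instead of logging them; the sample elements are made up.

```python
from typing import Any, Dict, List, Tuple


def find_missing_keys(
    elements: List[Dict[str, Any]], keys: List[str]
) -> List[Tuple[int, str]]:
    """Return (element index, key) pairs whose value is absent or falsy."""
    return [
        (index, key)
        for index, element in enumerate(elements)
        for key in keys
        if not element.get(key)
    ]


# Hypothetical elements: the second has an empty list, the third lacks the key.
elements = [
    {"statement_id": "1", "sources": ["(a)"]},
    {"statement_id": "2", "sources": []},
    {"statement_id": "3"},
]

missing = find_missing_keys(elements, ["statement_id", "sources"])
print(missing)
```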


## Datasets

Datasets used in the notebook.

### True tables

True tables are annotated or "golden" datasets in which entities have been manually identified and labeled within the original source data.

True tables for sections 275.0-2, 275.0-5, and 275.0-7.

Load the true tables for P1 (taxonomy classification, top level) and P2 (taxonomy classification, sub levels).

In [11]:
with open(f"{config['DEFAULT_DATA_DIR']}/documents_true_table.json", 'r') as file:
    data = json.load(file)

    manager.add_document(
        Document.model_validate(data["classify_P1|true_table"])
    )

    manager.add_document(
        Document.model_validate(data["classify_P2_Operative_rules|true_table"])
    )

    manager.add_document(
        Document.model_validate(data["classify_P2_Definitional_terms|true_table"])
    )

    manager.add_document(
        Document.model_validate(data["classify_P2_Definitional_names|true_table"])
    )

    manager.add_document(
        Document.model_validate(data["classify_P2_Definitional_facts|true_table"])
    )

Save the checkpoint

In [None]:
save_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"], manager=manager)

## Prompt engineering

#### Taxonomy classification and templates for definitional and operative rules

Instructions for classifying rules using the top level of the Witt (2012) taxonomy.

In examining the classification of terms, names, fact types, and rules within business systems, key distinctions arise between definitional and operative rules, as well as in the structuring of term definitions. Terms, names, and fact types are foundational elements in the taxonomy of definitional rules, though they are not rules themselves. Instead, they serve as the core vocabulary for creating precise definitions and facilitating the rule-making process within an organization. Terms represent general concepts or classes, names uniquely identify specific instances or entities, and fact types capture relationships between terms. Together, these elements enable the consistent and unambiguous use of language across rule statements, models, and documentation, ensuring that definitional and operative rules function cohesively.

Definitional rules aim to provide precise structures for organizational concepts, establishing clear, logical statements that support consistent interpretation and application of business language. These rules formalize term definitions, establish categorization schemes, and delineate relationships using fact types, which are classified by their structure as unary, binary, or higher-order and allow organizations to express relationships from simple Boolean properties to complex multi-term associations. When structured through definitional rule templates, fact types enhance clarity, facilitate shared understanding, and ensure accurate application within organizational rules, while supporting a conceptual backbone for business definitions.

Names function as unique identifiers for specific instances or entities, adding precision to rule statements that require exact identification. Embedded within definitional rules, names provide specific references crucial for rules that depend on individual entities, distinguishing them from broader terms and reducing ambiguity in complex business contexts. Through structured templates within the definitional rule taxonomy, organizations can integrate names with uniformity and clarity, allowing for reliable reference to distinct entities within rule statements and models. 

Operative rules, on the other hand, govern actionable requirements, setting conditions under which actions must or must not occur. These include data, activity, and party rules, each ensuring compliance, standardization, and procedural integrity across business processes. Operative rules provide the necessary conditions for maintaining organizational consistency, detailing what actions are authorized in particular circumstances or specifying roles within defined tasks, thereby aligning processes with business objectives.

To formalize term definitions, statements must adhere to specific templates within the definitional rule taxonomy, which clearly articulate each term's scope, meaning, and responsibilities. For instance, the term "Commission" can be formally defined within Template T7, part of the definitional rule taxonomy, to clarify its procedural functions as the entity that receives and forwards legal documents. By using templates for definitional rules, organizations achieve consistent, unambiguous documentation of terms, fact types, and names, which minimizes ambiguity and ensures standardized interpretation across all business contexts. This structured approach supports precise rule governance and enhances communication within organizational processes, reinforcing the integrity and reliability of rule-based systems.
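
As a rough summary of the paragraphs above, the element types and the rule families they are classified under can be written as a small lookup table. This is only an illustrative sketch: the names `RULE_TYPES_BY_ELEMENT` and `rule_types_for` are hypothetical and are not part of the notebook's modules.

```python
# Element types and the rule types they fall under, as described in the text above.
RULE_TYPES_BY_ELEMENT = {
    "term": ["Definitional"],
    "name": ["Definitional"],
    "fact": ["Definitional"],
    "operative rule": ["Activity", "Data", "Party"],
}


def rule_types_for(element_type: str) -> list:
    """Return the rule types that apply to an element type (case-insensitive)."""
    try:
        return RULE_TYPES_BY_ELEMENT[element_type.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown element type: {element_type!r}") from None


print(rule_types_for("Operative rule"))
print(rule_types_for("term"))
```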

System prompt (instructions) for classifying operative rules using the top level of the Witt (2012) taxonomy.

In [13]:
def get_system_prompt_classify_p1():
    return """
You are an expert in SBVR (Semantics of Business Vocabulary and Business Rules).

You are working for regulatory bodies, auditors, or process managers.

You will be provided with a list of statements formatted as JSON.

Your task is to classify each statement into one or more Operative Rules types according to the given definitions.

# Steps

1. **Summarize statement**: Summarize the given statement to understand its structure and content.

2. **Classify statement**: Classify each Operative Rule statement into one or more of the provided rule types. The **Operative rules** govern actions or constraints that must or must not happen under certain conditions, such as Data Rules, Activity Rules, and Party Rules. The rule types to classify are:

- **Party rules**: A "Party rule" is a type of operative rule that establishes distinctions or constraints involving parties or the roles they perform. To identify a party rule, it is important to recognize its defining characteristics, as these rules often specify who can carry out certain activities, access particular information, or hold specific responsibilities. Party rules may include restrictions on who is permitted to perform specific roles or processes. For example, a rule might state that a person can serve as the pilot in command only if they hold a current command endorsement. Additionally, these rules may enforce role separation to prevent conflicts of interest, such as a requirement that the cabin crew member verifying an aircraft door's disarmed status cannot be the same individual who initially disarmed it. In other cases, party rules may require role binding, ensuring continuity by stipulating that the consultant who signs a quality review report must be the one who conducted the review. Party rules can also govern information access, specifying who is authorized to view, create, or modify certain data. For instance, a rule might state that an employee’s leave record can only be accessed by the employee, their supervisor, or a human resources officer. Furthermore, responsibility rules fall under this category by defining accountability for specific actions or obligations, such as requiring the receiving parties in a property transfer to pay the associated stamp duty. These rules can be identified by linking actions or processes (predicates) to subjects, like roles or data, while applying conditions that qualify or limit their application. 
- **Data rules**: A "Data rule" imposes constraints or requirements on the data used in transactions, records, or systems. Identifying a data rule involves analyzing its structure, purpose, and type, which include cardinality, content, and update rules. Data cardinality rules govern the presence and multiplicity of data items. These may include mandatory rules requiring data items, such as specifying at least one passenger name in a flight booking confirmation. They can also restrict data, such as ensuring a one-way flight booking does not include a return date, or enforce limits on the number of data instances in a transaction. Data content rules regulate the values within data items. Examples include value set rules, which require a data item to match one of a specified set of valid values, and range rules that constrain a data item's value to within a specific range. Equality rules ensure consistency between related data items, such as requiring that an origin city matches the corresponding booking request. Additionally, uniqueness constraints ensure that a data item's value does not duplicate within a dataset, and consistency rules maintain logical relationships between multiple data points. Lastly, data update rules constrain modifications to existing data. These rules may prohibit updates entirely, restrict the scope of permissible changes (such as maintaining valid state transitions), or enforce monotonic trends like numeric values that can only increase or decrease. To identify a data rule, it is essential to examine the specific constraints or requirements applied to data items, their interrelationships, and their contexts within a given system or transaction. Templates and formalized structures can aid in distinguishing these rules effectively.
- **Activity rules**: An "Activity rule" is an operative rule designed to constrain the operation of business processes or activities. Identifying an activity rule involves understanding its subcategories, which define how activities are regulated or mandated. The primary types of activity rules are activity restriction rules, activity obligation rules, and process decision rules. Activity restriction rules are used to place limitations on when or under what conditions an activity can occur. For example, time-based restrictions may stipulate that online check-in for a flight can only occur within a specific time window, such as the 24 hours before departure. Similarly, exclusion period rules prohibit activities during certain times, such as a restriction on operating machinery during nighttime hours. Activity pre-condition rules ensure that an event must occur before another activity can take place, such as requiring passengers to complete a security screening before boarding. Activity obligation rules, on the other hand, specify activities that must be performed either within a maximum time after a triggering event or as soon as practical. For instance, acknowledging an order may be mandated to occur within 24 hours of receipt. These rules enforce timely action and compliance with operational requirements. Process decision rules determine actions in response to specific situations. These rules guide devices or processes, such as ensuring that a ticket barrier retains invalid tickets to prevent misuse. To identify an activity rule, it is essential to look for statements that define constraints, obligations, or decision-making criteria for activities. Such rules are often structured using specific templates that articulate the conditions, timeframes, or triggers associated with the activity. Recognizing these elements helps in categorizing activity rules effectively.

3. Assess a **confidence level** for each classification between 0 and 1. Assign confidence scores to each class for the given statement, ensuring that no two classes receive the same score. If one class is assigned a score (e.g., 0.6), the others must have distinct values that are either higher or lower. The scores should reflect the likelihood of each class while avoiding ties.

4. **Explain classification**: Record the confidence level for each classification and provide an explanation for why the classification was made.

5. **Repeat for each statement**: Repeat the process for each statement in the list.

6. **Output format**: Your output must also be in JSON format. It should contain, for each statement:

- The `doc_id`
- The `statement_id`
- The `statement_title`
- The original `statement_text`
- The `statement_sources` of the statement
- A list of classifications (`classification`), each containing:
  - The `type` of the rule.
  - The `confidence` in your classification.
  - An `explanation` detailing why you made the classification decision.

Here is an example of the expected output:

```
[
    {
        "doc_id": "some doc id",
        "statement_id": "some id",
        "statement_title": "some title",
        "statement_text": "some text",
        "statement_sources": ["some source"],
        "classification": [
            {
                "type": "Activity rules",
                "confidence": 0.9,
                "explanation": "This statement defines ..."
            },
            {
                "type": "Party rules",
                "confidence": 0.2,
                "explanation": "There is little reference ..."
            },
            ...
        ]
    },
    {
        "doc_id": ...,
        "statement_id": ...,
        "statement_title": ...,
        "statement_text": ...,
        "statement_sources": ...,
        "classification": ...
    }
]
```

# Notes
- **Detail the Reasoning**: Make sure to provide explanations that justify why a particular rule type was chosen.
- **Confidence Values**: The confidence value should genuinely represent how strongly you believe the classification is correct, with 1 being an absolute match and 0 meaning unlikely.

Make sure that every statement is analyzed thoroughly, and the final justification for each classification is straightforward and adequately supports both the type choice and confidence level.
"""

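The prompt above asks for confidence scores between 0 and 1 with no ties among the classes of a statement. A response can be checked against those two constraints after the fact. The sketch below is one possible check, not part of the notebook: `confidence_issues` is a hypothetical helper and the sample data is made up.

```python
from typing import Dict, List


def confidence_issues(classifications: List[Dict]) -> List[str]:
    """Report scores outside [0, 1] and tied scores for one statement."""
    issues = []
    seen = {}  # confidence score -> first rule type that used it
    for entry in classifications:
        rule_type, score = entry["type"], entry["confidence"]
        if not 0 <= score <= 1:
            issues.append(f"{rule_type}: confidence {score} is outside [0, 1]")
        if score in seen:
            issues.append(f"{rule_type}: ties with {seen[score]} at {score}")
        else:
            seen[score] = rule_type
    return issues


# Hypothetical response fragment with a tie between two rule types.
sample = [
    {"type": "Activity rules", "confidence": 0.9},
    {"type": "Party rules", "confidence": 0.2},
    {"type": "Data rules", "confidence": 0.2},
]

issues = confidence_issues(sample)
print(issues)
```
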
User prompt including the list of statements.

In [14]:
def get_user_prompt_classify(rules_to_classify):
    return f"""
# Classification Task:

Analyze the following statements based on the above guidelines:

{json.dumps(rules_to_classify, indent=2)}
"""

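To see what the model actually receives, the user prompt can be rendered for a single statement. The sketch below repeats the function so that it runs on its own; the sample statement is hypothetical and only mirrors the fields built for P1.

```python
import json


def get_user_prompt_classify(rules_to_classify):
    # Same template as above: the statements are embedded as indented JSON.
    return f"""
# Classification Task:

Analyze the following statements based on the above guidelines:

{json.dumps(rules_to_classify, indent=2)}
"""


# Hypothetical statement with the fields used in P1.
sample_rules = [
    {
        "doc_id": "some doc id",
        "statement_id": "1",
        "statement_title": "some title",
        "statement_sources": ["(a)"],
        "statement_text": "some text",
    }
]

prompt = get_user_prompt_classify(sample_rules)
print(prompt)
```
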
System prompt (instructions) for classifying definitional and operative rules using the sub levels of the Witt (2012) taxonomy.

- element_type: term, name, fact, operative rule
- rule_type: Definitional, Activity, Data, Party


In [15]:
def get_system_prompt_classify_p2(element_count, element_type, rule_type, statement_type):
    rule_information_provider = RuleInformationProvider(config["DEFAULT_DATA_DIR"])

    subclassification_text = rule_information_provider.get_classification_and_templates(f"{statement_type}")

    return f"""
You are an expert in **SBVR (Semantics of Business Vocabulary and Business Rules)**, working for regulatory bodies, auditors, or process managers.

Your task is to classify {element_count} {element_type}(s) from a provided list into one or more **{rule_type} Rule subtypes**, explain each classification in detail, and assign a confidence score ranging from 0 to 1.

# Approach:
Use the **{rule_type} Rule subtype definitions**, associated templates, and guidelines to analyze each statement thoroughly, ensuring accurate classification.

---

# Steps

1. **Classify the Statement**:
   - For each `statement_text`, determine its rule subtype according to the provided **{rule_type} Rule subtypes** and their corresponding templates.
   - Use the provided templates, definitions, and examples to match the statement to the correct subtype.
   - If the statement does not align with a high-level type, analyze the sublevels.
   - The subtype to be used starts with "subtype: <subtype name>".
   - You should assign a subtype and a template ID, make your best guess, justify your choice, and lower the confidence level if necessary.
   - Templates and examples help identify subtypes.

2. **Assign Confidence Level**:
   - Assign a confidence score between **0 and 1** for each classification:
     - **1** indicates a strong and clear match.
     - Lower scores reflect weaker matches due to ambiguities or partial alignment.
   - Consider both template alignment and the clarity of the statement's intent when assigning scores.
   - Assign confidence scores to each class for the given statement, ensuring that no two classes receive the same score. If one class is assigned a score (e.g., 0.6), the others must have distinct values that are either higher or lower. The scores should reflect the likelihood of each class while avoiding ties.

3. **Provide an Explanation**:
   - Provide a concise yet detailed explanation for the assigned classification.
   - Justify the classification by referencing:
     - Template structure.
     - Terminology used in the statement.
     - Specific conditions or context highlighted by the statement.
   - Explicitly map the elements of the statement (e.g., terms, qualifying clauses, verb phrases, conditional clauses, etc.) to template components.

---

# {rule_type} Rule subtypes

{subclassification_text}

---

# Definitions
- **attribute term**: A term that signifies a non-Boolean property of an entity class (or object class).
- **role term**: A term that signifies the role played by one of the participating parties or objects in a relationship: for example, employer and employee are role terms (with respect to the relationship whereby an organization employs a person), whereas organization and person are not role terms.
- **category attribute term**: A term is usually admin-defined, with some external inputs. They have unique labels (e.g., 'Cash') and may use internal codes. Boolean attributes indicate "Yes" or "No" responses, shown as checkboxes or "Y/N" fields.
- **quantitative attribute**: An attribute on which some arithmetic can be performed (e.g., addition, subtraction) and on which comparisons other than "=" and "<>" can be performed.
- **qualifying clause**: refines a rule's scope or specificity by limiting the subject or other terms to particular subsets or conditions (e.g., “for a return journey” or “that is current”).

# Output Format:

Each analyzed statement must be provided in JSON format. The structure for each statement is as follows:

```json
{{
    "doc_id": "The Document ID from the input",
    "statement_id": "The original statement ID",
    "statement_title": "The original statement_title",
    "statement_text": "The original statement_text",
    "statement_sources": "The original statement_sources",
    "classification": [
        {{
            "subtype": "Assigned rule subtype, use the title of the section/subsection (e.g., Activity time limit rules)",
            "templates_ids": ["Template ID that matched the statement."],
            "confidence": Confidence Score (0-1),
            "explanation": "Detailed explanation of why this classification was assigned."
        }}
    ]
}}
```

---
    
## Example Output:

```
[
    {{
        "doc_id": "some doc id",
        "statement_id": "some id",
        "statement_title": "some title",
        "statement_text": "some text",
        "statement_sources": ["some source"],
        "classification": [
            {{
                "subtype": "Some Subtype Title",
                "templates_ids": ["T123", "T456"],
                "confidence": 0.9,
                "explanation": "This statement ..."
            }},
            {{
                "subtype": "Another Subtype Title",
                "templates_ids": ["T789"],
                "confidence": 0.4,
                "explanation": "There are elements ..."
            }}
        ]
    }},
    {{
        "doc_id": "another doc id",
        "statement_id": "another id",
        "statement_title": "another title",
        "statement_text": "another text",
        "statement_sources": ["another source"],
        "classification": [
            {{
                "subtype": "Subtype Title",
                "templates_ids": ["T123"],
                "confidence": 0.7,
                "explanation": "The clause dictates ..."
            }}
        ]
    }},
    ...
]
```

---

# Additional Notes:
- **Multiple Classifications**:
   - A statement can have multiple classifications if it aligns with different subtypes. Justify each with appropriate confidence levels.
- **Cross-References**:
   - When a statement refers to another section (e.g., "(a)(1)"), incorporate the referenced section if it is provided or available. If unavailable, indicate this in the explanation and lower the confidence score.

---

"""

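Since a statement may come back with several subtype candidates, a later step will typically need the most likely one and its template IDs. The sketch below shows one way to select it, assuming the response has the shape of the example output in the prompt; `best_subclassification` is a hypothetical helper and the data is made up.

```python
from typing import Dict


def best_subclassification(statement: Dict) -> Dict:
    """Return the classification entry with the highest confidence."""
    return max(statement["classification"], key=lambda entry: entry["confidence"])


# Hypothetical statement shaped like the example output in the prompt.
statement = {
    "doc_id": "some doc id",
    "statement_id": "some id",
    "classification": [
        {
            "subtype": "Some Subtype Title",
            "templates_ids": ["T123", "T456"],
            "confidence": 0.9,
        },
        {
            "subtype": "Another Subtype Title",
            "templates_ids": ["T789"],
            "confidence": 0.4,
        },
    ],
}

best = best_subclassification(statement)
print(best["subtype"], best["templates_ids"])
```
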
## Execution

Restore the checkpoint

In [None]:
manager = restore_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"])

### Operative Rules classification

#### Processing P1 - Classify Operative Rules

User prompt (Data)

In [None]:
processor = DocumentProcessor(manager)

rules_to_classify_p1 = [
    {
        "doc_id": item["doc_id"],
        "statement_id": item["statement_id"],
        "statement_title": item["statement_title"],
        "statement_sources": item["sources"],
        "statement_text": item["statement"]
    }
    for item in processor.get_rules()
]

logger.info(f"Total rules to classify: {len(rules_to_classify_p1)}")

Running classification.

In [None]:
# Part 1 - Classify Operative Rules
user_prompt = get_user_prompt_classify(rules_to_classify_p1)
system_prompt = get_system_prompt_classify_p1()

logger.info("P1. Classifying Operative Rules...")
logger.debug(system_prompt)
logger.debug(user_prompt)
#raise Exception("Stop here")

response_classify_p1, completion_1, elapse_time_1 = query_instruct_llm(
    system_prompt=system_prompt,
    user_prompt=user_prompt,
    document_model=StatementClassifications,
    llm_model=config["LLM"]["MODEL"],
    temperature=config["LLM"]["TEMPERATURE"],
    max_tokens=config["LLM"]["MAX_TOKENS"],
)

logger.debug(response_classify_p1)

doc_1 = Document(
    id="classify_P1",
    type="llm_response_classification",
    content=response_classify_p1.StatementClassifications,
    elapsed_times=[elapse_time_1],
    completions=[completion_1.dict()],
)
manager.add_document(doc_1)

logger.info("Saving checkpoint...")
save_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"], manager=manager)

logger.info("Finished processing classification.")

Average 35s.

#### Processing P2 - Classify and get templates

User prompt (Data)

Select the operative rules to subclassify, carrying over the rule type (Activity, Data, Party) assigned in P1.

In [None]:
manager = restore_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"])

processor = DocumentProcessor(manager)

rules_to_classify_p2 = [
    {
        "doc_id": item["doc_id"],
        "statement_id": item["statement_id"],
        "statement_title": item["statement_title"],
        "statement_sources": item["sources"],
        "statement_text": item["statement"],
        "statement_type": item["type"]
    }
    for item in processor.get_rules()
]

logger.info(f"Total rules to classify: {len(rules_to_classify_p2)}")

Create a prompt for each rule type (Activity, Data, Party) from P1, containing the statements of that type.

In [None]:
user_prompt_classify_p2_op_rules, system_prompt_classify_p2_op_rules = (
    classify_and_add_prompts(
        "operative rule", "Operative", rules_to_classify_p2, manager
    )
)

logger.info(f"Prompts to run {len(user_prompt_classify_p2_op_rules)}")

# Persist the state to a file
save_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"], manager=manager)
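
The grouping step described above can be sketched as follows. This is a minimal illustration, not the notebook's `classify_and_add_prompts` implementation: the helper name `group_statements_by_type` is hypothetical, and it assumes each rule carries a single P1 label under `statement_type`, as in `rules_to_classify_p2`.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_statements_by_type(
    statements: List[Dict[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Group statements by their P1 label (hypothetical helper)."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for statement in statements:
        # Assumption: "statement_type" holds a single label such as "Party".
        grouped[statement["statement_type"]].append(statement)
    return dict(grouped)


# Illustrative data only; real items come from processor.get_rules().
sample_rules = [
    {"statement_id": "1", "statement_type": "Party"},
    {"statement_id": "2", "statement_type": "Data"},
    {"statement_id": "3", "statement_type": "Party"},
]

groups = group_statements_by_type(sample_rules)
for rule_type, items in groups.items():
    print(rule_type, [item["statement_id"] for item in items])
```

Each group would then feed one prompt, so the number of prompts equals the number of distinct rule types found in P1.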

P2. User Prompt to classify Operative Rules and get templates

Processing P2 - Subclassify Operative Rules

Running classification. For each rule type, get the response from the LLM.

In [None]:
all_responses_classify_p2_op_rules = classify_and_get_templates(
    "rules",
    "P2_Operative",
    user_prompt_classify_p2_op_rules,
    system_prompt_classify_p2_op_rules,
    manager,
)

logger.info(f"Total responses: {len(all_responses_classify_p2_op_rules)}")

# Persist the state to a file
save_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"], manager=manager)

In [None]:
for i, resp in enumerate(all_responses_classify_p2_op_rules):
    print(i, resp.classification)

Average 15s per prompt.

#### Investigating misclassification

Prompt P2

In [23]:
doc_first_attempt_p1 = {
    "doc_id": "\u00a7 275.0-2",
    "statement_id": "4",
    "statement_text": "If the Secretary certifies that the Commission was served with process, pleadings, or other papers pursuant to paragraph (a)(1) of this section and forwarded these documents to a named party pursuant to paragraph (a)(2) of this section, this certification constitutes evidence of service upon that party.",
    "statement_source": "(a)(3)",
    "classification": [
        {
            "type": "Data",
            "confidence": 0.6,
            "explanation": "The statement involves certification, which is a form of data that serves as evidence of service."
        },
        {
            "type": "Activity",
            "confidence": 0.4,
            "explanation": "The statement describes the activity of certifying and forwarding documents, which are part of the process."
        }
    ]
}

In [24]:
doc_first_attempt_p2 = {
    "doc_id": "\u00a7 275.0-2",
    "statement_id": "4",
    "statement_text": "If the Secretary certifies that the Commission was served with process, pleadings, or other papers pursuant to paragraph (a)(1) of this section and forwarded these documents to a named party pursuant to paragraph (a)(2) of this section, this certification constitutes evidence of service upon that party.",
    "statement_source": "(a)(3)",
    "classification": [
        {
            "subtype": "Data content rules",
            "templates_ids": [],
            "confidence": 0.6,
            "explanation": "The statement describes a condition under which a certification constitutes evidence of service. It involves the presence of certain documents and their forwarding, which relates to data content. However, it does not fit neatly into a specific template, leading to a moderate confidence level."
        }
    ]
}

In [25]:
doc_second_attempt_p1 = {
    "doc_id": "\u00a7 275.0-2",
    "statement_id": "4",
    "statement_text": "If the Secretary certifies that the Commission was served with process, pleadings, or other papers pursuant to paragraph (a)(1) of this section and forwarded these documents to a named party pursuant to paragraph (a)(2) of this section, this certification constitutes evidence of service upon that party.",
    "statement_source": "(a)(3)",
    "classification": [
        {
            "type": "Party",
            "confidence": 0.7,
            "explanation": "The statement involves the certification of service to a named party, indicating a role for the Secretary and the parties involved. This aligns with Party rules as it specifies actions related to parties.",
        },
        {
            "type": "Activity",
            "confidence": 0.3,
            "explanation": "The statement describes the certification process as part of a business activity, which aligns with Activity rules as it governs the operation of certifying service.",
        },
    ],
}

In [26]:
doc_second_attempt_p2 = {
    "doc_id": "\u00a7 275.0-2",
    "statement_id": "4",
    "statement_text": "If the Secretary certifies that the Commission was served with process, pleadings, or other papers pursuant to paragraph (a)(1) of this section and forwarded these documents to a named party pursuant to paragraph (a)(2) of this section, this certification constitutes evidence of service upon that party.",
    "statement_source": "(a)(3)",
    "classification": [
        {
            "subtype": "Responsibility rules",
            "templates_ids": ["T56"],
            "confidence": 0.7,
            "explanation": "This statement involves the certification process by the Secretary, which serves as evidence of service. It defines a responsibility related to the certification of document forwarding, fitting the responsibility rule subtype.",
        }
    ],
}

##### Conclusion

The classification of the given statement initially had a lower confidence level because it could be interpreted from two distinct perspectives: a **party rule** or a **data consistency rule**. A **party rule** focuses on procedural governance and accountability, emphasizing the roles, responsibilities, and authority of individuals—in this case, the Secretary, whose certification serves as evidence of service. This perspective is crucial when considering formal roles, defining valid actions, and ensuring compliance, which makes it more fitting for scenarios involving regulatory oversight, accountability, and process management.

The **data consistency rule** perspective, on the other hand, considers the relationship between different data elements—here, the certification and document forwarding—and aims to ensure logical consistency between these elements. This classification fits more technical or data-centric contexts, such as ensuring the integrity of records. However, with the prompt specifying a focus on regulatory bodies, auditors, or process managers, the interpretation naturally shifts toward the **party rule** classification. Consequently, the confidence in this classification increases, as it directly aligns with the needs of those focused on governance, responsibility, and ensuring procedural correctness.

According to Opsahl-Ong et al. (2024), small changes in the prompt affect the outcome because language models are highly sensitive to the structure and context of the instructions. Minimal alterations in word choice, sentence order, or tone can lead the model to interpret the task differently, as it relies on statistical correlations and previously observed patterns. In multi-stage pipelines, this is even more critical, as a change in one module can create a cascading effect, impacting the results of subsequent stages. This sensitivity makes precise prompt adjustments essential to achieve the desired performance in specific tasks.
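
To make the comparison between the two attempts concrete, the sketch below picks the highest-confidence label from each P1 response. The helper `top_label` is hypothetical and only illustrates the comparison. The labels and confidences are copied from the responses shown above.

```python
from typing import Any, Dict, List, Tuple


def top_label(classifications: List[Dict[str, Any]], key: str) -> Tuple[str, float]:
    """Return the (label, confidence) pair with the highest confidence."""
    best = max(classifications, key=lambda c: c["confidence"])
    return best[key], best["confidence"]


# Labels and confidences copied from the P1 responses shown above.
first_attempt_p1 = [
    {"type": "Data", "confidence": 0.6},
    {"type": "Activity", "confidence": 0.4},
]
second_attempt_p1 = [
    {"type": "Party", "confidence": 0.7},
    {"type": "Activity", "confidence": 0.3},
]

first_top = top_label(first_attempt_p1, "type")
second_top = top_label(second_attempt_p1, "type")
print(first_top)   # ('Data', 0.6)
print(second_top)  # ('Party', 0.7)
```

The dominant label moves from Data to Party between the attempts, which is what allows P2 to reach the Responsibility rules subtype.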

### Definitional rules classification

#### Terms

Selecting terms to classify. Terms without a definition will be skipped.

In [None]:
manager = restore_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"])

processor = DocumentProcessor(manager, merge=True)

terms_to_classify_p2 = [
    {
        "doc_id": item["doc_id"],
        "statement_id": item["statement_id"],
        "statement_sources": item["sources"],
        "statement_text": item["definition"],
        "statement_type": "Definitional rules"
    }
    for item in processor.get_terms(definition_filter="non_null")
]

logger.info(f"Number of terms to classify: {len(terms_to_classify_p2)}")

Preparing prompts.

In [None]:
user_prompt_classify_p2_terms, system_prompt_classify_p2_terms = (
    classify_and_add_prompts("term", "Definitional", terms_to_classify_p2, manager)
)

Running classification.

In [None]:
all_responses_classify_terms = classify_and_get_templates(
    "terms",
    "P2_Definitional",
    user_prompt_classify_p2_terms,
    system_prompt_classify_p2_terms,
    manager,
)

logger.info(f"Total responses: {len(all_responses_classify_terms)}")

# Persist the state to a file
save_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"], manager=manager)

In [None]:
for i, resp in enumerate(all_responses_classify_terms):
    print(i, resp.classification)

Average 45s per prompt.

#### Names

Selecting names to classify. Names without a definition will be skipped.

In [None]:
manager = restore_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"])

processor = DocumentProcessor(manager, merge=True)

names_to_classify_p2 = [
    {
        "doc_id": item["doc_id"],
        "statement_id": item["statement_id"],
        "statement_sources": item["sources"],
        "statement_text": item["definition"],
        "statement_type": "Definitional rules"
    }
    for item in processor.get_names(definition_filter="non_null")
]

logger.info(f"Number of names to classify: {len(names_to_classify_p2)}")

In [None]:
user_prompt_classify_p2_names, system_prompt_classify_p2_names = (
    classify_and_add_prompts("name", "Definitional", names_to_classify_p2, manager)
)

logger.info(f"Prompts to run {len(user_prompt_classify_p2_names)}")

Running classification.

In [None]:
all_responses_classify_p2_names = classify_and_get_templates(
    "names",
    "P2_Definitional",
    user_prompt_classify_p2_names,
    system_prompt_classify_p2_names,
    manager,
)

logger.info(f"Total responses: {len(all_responses_classify_p2_names)}")

# Save the checkpoint after adding the combined document
save_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"], manager=manager)

Average time 40s per prompt.

#### Fact / Fact Types

Selecting facts to classify.

In [None]:
manager = restore_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"])

processor = DocumentProcessor(manager)

facts_to_classify_p2 = [
    {
        "doc_id": item["doc_id"],
        "statement_id": item["statement_id"],
        "statement_title": item["statement_title"],
        "statement_sources": item["sources"],
        "statement_text": item["statement"],
        "statement_type": "Definitional rules"
    }
    for item in processor.get_facts()
]

logger.info(f"Number of facts to classify: {len(facts_to_classify_p2)}")

Preparing prompts.

In [None]:
user_prompt_classify_p2_facts, system_prompt_classify_p2_facts = (
    classify_and_add_prompts("fact type", "Definitional", facts_to_classify_p2, manager)
)

logger.info(f"Prompts to run {len(user_prompt_classify_p2_facts)}")

Running classification.

In [None]:
all_responses_classify_p2_facts = classify_and_get_templates(
    "facts", "P2_Definitional", user_prompt_classify_p2_facts, system_prompt_classify_p2_facts, manager
)

logger.info(f"Total responses: {len(all_responses_classify_p2_facts)}")

# Save the checkpoint after adding the combined document
save_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"], manager=manager)

Average 30s per prompt.

### Check results

In [None]:
manager = restore_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"])

processor = DocumentProcessor(manager, merge=True)

pred_operative_rules = processor.get_rules()
pred_facts = processor.get_facts()
pred_terms = processor.get_terms()
pred_names = processor.get_names()
pred_terms_with_definitions = processor.get_terms(definition_filter="non_null")
pred_names_with_definitions = processor.get_names(definition_filter="non_null")

logger.debug(f"Rules: {pred_operative_rules}")
logger.debug(f"Facts: {pred_facts}")
logger.debug(f"Terms: {pred_terms}")
logger.debug(f"Names: {pred_names}")
logger.info(f"Rules to evaluate: {len(pred_operative_rules)}")
logger.info(f"Facts to evaluate: {len(pred_facts)}")
logger.info(f"Terms to evaluate: {len(pred_terms)}")
logger.info(f"Names to evaluate: {len(pred_names)}")
logger.info(f"Terms with definitions: {len(pred_terms_with_definitions)}")
logger.info(f"Names with definitions: {len(pred_names_with_definitions)}")

Verify that all elements have `templates_ids` and `subtype`.

In [None]:
# Common keys to check
keys = ["templates_ids", "subtype"]

# List of datasets to check with their names
datasets = [
    ("pred_operative_rules", pred_operative_rules),
    ("pred_facts", pred_facts),
    ("pred_terms_with_definitions", pred_terms_with_definitions),
    ("pred_names_with_definitions", pred_names_with_definitions),
]

# Call the function for each dataset
for dataset_name, dataset in datasets:
    check_missing_keys(dataset, keys, dataset_name)
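
A key can be present but empty, as with `"templates_ids": []` in the first P2 attempt above. The sketch below flags both cases. It is a hypothetical helper (`find_incomplete_items`), not the notebook's `check_missing_keys`, and it assumes each element is a flat dict holding the checked keys at the top level.

```python
from typing import Any, Dict, List


def find_incomplete_items(
    items: List[Dict[str, Any]], keys: List[str]
) -> List[Dict[str, Any]]:
    """List items where a key is absent or holds an empty value (hypothetical)."""
    report = []
    for index, item in enumerate(items):
        missing = [k for k in keys if k not in item]
        empty = [k for k in keys if k in item and item[k] in (None, "", [])]
        if missing or empty:
            report.append({"index": index, "missing": missing, "empty": empty})
    return report


# Illustrative data only.
sample = [
    {"subtype": "Responsibility rules", "templates_ids": ["T56"]},
    {"subtype": "Data content rules", "templates_ids": []},
    {"subtype": "Subtype Title"},
]
incomplete = find_incomplete_items(sample, ["templates_ids", "subtype"])
print(incomplete)
```

Reporting empty values separately from absent keys helps distinguish a statement that was never classified from one the LLM classified without finding a template.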


## Discussion

In the first prompt (prompt_classify_p1), the assigned confidence levels reflect the absence of a dominant classification. For example, when a statement primarily restricts data (data rule) but also includes specific roles (party rule), high confidence is assigned to the data rule while moderate confidence is assigned to the party rule, recognizing its secondary relevance. Similarly, statements that reference roles such as "Secretary" or "interested person" without explicit party restrictions receive moderate confidence for party classification because the interpretation is ambiguous. Procedural elements that impact data handling, such as document routing, receive high confidence for data rules; however, moderate confidence is assigned to activity rules when procedural references are indirect.

An example of this behavior was the classification of the sentence: “If the Secretary certifies that the Commission was served with process, pleadings, or other papers pursuant to paragraph (a)(1) of this section and forwarded these documents to a named party pursuant to paragraph (a)(2) of this section, this certification constitutes evidence of service upon that party.” The first classification (classify_P1) had low confidence because the sentence admits two interpretations: party rule or data consistency rule. The party rule perspective focuses on governance and accountability, highlighting the roles of individuals, such as the Secretary, whose certification serves as evidence of service. The data consistency perspective considers the relationship between data elements, such as the certification and the forwarding of documents, and aims to ensure logical consistency between them.

Initially, the prompt did not specify the focus, which led to the choice of the technical perspective (data) and caused problems in the second classification, where the LLM could not find an appropriate category (sublevel) in the taxonomy. With the addition of the sentence “You are working for regulators, auditors, or process managers.”, the interpretation shifted to the party rule classification. This increased confidence and consistency, and allowed the LLM to locate a category in the taxonomy sublevel (classify_P2_operative_rule): Responsibility rules.
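
The absence of a dominant classification can be made measurable by looking at the gap between the two highest confidences. The sketch below is an illustration under stated assumptions: `confidence_margin` and `is_ambiguous` are hypothetical helpers, and the 0.3 threshold is an arbitrary value chosen for the example, not one used in the notebook.

```python
from typing import Any, Dict, List


def confidence_margin(classifications: List[Dict[str, Any]]) -> float:
    """Gap between the two highest confidences (0.0 for an empty list)."""
    scores = sorted((c["confidence"] for c in classifications), reverse=True)
    if not scores:
        return 0.0
    runner_up = scores[1] if len(scores) > 1 else 0.0
    # Round to avoid floating-point noise such as 0.19999999999999996.
    return round(scores[0] - runner_up, 2)


def is_ambiguous(classifications: List[Dict[str, Any]], threshold: float = 0.3) -> bool:
    """Flag a statement whose top label does not clearly dominate (assumed threshold)."""
    return confidence_margin(classifications) < threshold


# Confidences from the two P1 attempts discussed above.
first_attempt = [{"type": "Data", "confidence": 0.6}, {"type": "Activity", "confidence": 0.4}]
second_attempt = [{"type": "Party", "confidence": 0.7}, {"type": "Activity", "confidence": 0.3}]

print(confidence_margin(first_attempt), is_ambiguous(first_attempt))    # 0.2 True
print(confidence_margin(second_attempt), is_ambiguous(second_attempt))  # 0.4 False
```

A check of this kind could be used to select statements for manual review before running P2.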