<a href="https://colab.research.google.com/github/asantos2000/master-degree-santos-anderson/blob/main/code/src/chap_6_semantic_annotation_rules_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Semantic Annotation - rules classification

Classify elements according to the taxonomy and get templates for transformation.

Chapter 6. Ferramentas de suporte
- Section 6.2 Implementação dos principais componentes
  - Section 6.2.3 Anotações semânticas
    - Section Algoritmo "taxonomy classification and templates"

## Google colab

In [1]:
%load_ext autoreload
%autoreload 2

import sys
import os

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
  from google.colab import drive
  drive.mount('/content/drive')
  !rm -rf cfr2sbvr configuration checkpoint
  !git clone https://github.com/asantos2000/master-degree-santos-anderson.git cfr2sbvr
  %pip install -r cfr2sbvr/code/requirements.txt
  !cp -r cfr2sbvr/code/src/configuration .
  !cp -r cfr2sbvr/code/src/checkpoint .
  !cp -r cfr2sbvr/code/config.colab.yaml config.yaml
  DEFAULT_CONFIG_FILE="config.yaml"
else:
  DEFAULT_CONFIG_FILE="../config.yaml"

## Imports

In [2]:
# Standard library imports
from collections import defaultdict
import json
import re
from pathlib import Path
from typing import List

# Third-party libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from pydantic import BaseModel, Field
from sklearn.metrics import confusion_matrix, classification_report
from typing import List, Dict, Optional, Any, Tuple, Set

# Local application/library-specific imports
import checkpoint.main as checkpoint
from checkpoint.main import (
    restore_checkpoint,
    save_checkpoint,
    Document,
    DocumentProcessor,
    get_all_checkpoints,
    get_true_table_files,
)
import configuration.main as configuration
import logging_setup.main as logging_setup
import token_estimator.main as token_estimator
from token_estimator.main import estimate_tokens
import rules_taxonomy_provider.main as rules_taxonomy_provider
from rules_taxonomy_provider.main import RuleInformationProvider, RulesTemplateProvider
import llm_query.main as llm_query
from llm_query.main import query_instruct_llm

DEV_MODE = True

if DEV_MODE:
    # Development mode
    import importlib

    importlib.reload(configuration)
    importlib.reload(logging_setup)
    importlib.reload(checkpoint)
    importlib.reload(token_estimator)
    importlib.reload(rules_taxonomy_provider)
    importlib.reload(llm_query)

## Settings

Default settings, check them before run the notebook.

### Get configuration

In [4]:
# load config
DEFAULT_CONFIG_FILE = "../config.yaml"
config = configuration.load_config(DEFAULT_CONFIG_FILE)

Generated files for analysis in this run

In [5]:
print(config["DEFAULT_CHECKPOINT_FILE"],
config["DEFAULT_EXTRACTION_REPORT_FILE"],
config["DEFAULT_EXCEL_FILE"])

../data/checkpoints/documents-2024-11-09-1.json ../outputs/extraction_report-2024-11-09-1.html ../outputs/compare_items_metrics.xlsx


### Logging configuration

In [6]:
logger = logging_setup.setting_logging(config["DEFAULT_LOG_DIR"], config["LOG_LEVEL"])

2024-11-09 13:38:00 - INFO - Logging is set up with daily rotation.


## Checkpoints

Documents, annoted datasets, statistics and metrics about the execution of the notebook are stored by checkpoint module.

Checkpoints are stored / retrieved at the directory `DEFAULT_CHECKPOINT_FILE` in the configuration file.

During the execution, it will restore the checkpoint at the beginning of the section and saved at the end. We can run and restore the checkpoint several times. If the run fails, check the closest checkpoint and restore it.

### Restore the checkpoint

In [None]:
# Restore the checkpoint

# To run after extraction
last_checkpoint = configuration.get_last_filename(config["DEFAULT_CHECKPOINT_DIR"], "documents", "json")

logger.info(f"{last_checkpoint=}")

config["DEFAULT_CHECKPOINT_FILE"] = last_checkpoint

manager = restore_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"])

2024-11-07 14:42:00 - INFO - DocumentManager restored from file: ../data/checkpoints/documents-2024-11-01-3.json
2024-11-07 14:42:00 - INFO - Checkpoint restored from ../data/checkpoints/documents-2024-11-01-3.json.


## Datasets

Datasets used in the notebook. They are divided into sections and true tables. The sections are the documents from CFR and true tables are annoted  or "golden" datasets.

### General functions and data structures

### True tables

True tables are annotated or "golden" datasets in which entities have been manually identified and labeled within the original source data.

True tables for sectiona 275.0-2, 275.0-5 and 275.0-7

Load true table for P1 - Taxonomy Classification - top level and P2 - Taxonomy Classification - sub levels

In [None]:
true_table_files = get_true_table_files(config["DEFAULT_DATA_DIR"])

In [None]:
for item in true_table_files:
    with open(item["path"], 'r') as file:
        data = json.load(file)

        logger.debug(data[item["id"]])
        logger.info(f"Adding {item['id']} true table to the manager")

        manager.add_document(
            Document.model_validate(data[item["id"]])
        )

# Persist the state to a file
save_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"], manager=manager)

2024-11-07 15:05:45 - INFO - Adding classify_P1|true_table true table to the manager
2024-11-07 15:05:45 - INFO - Adding classify_P2_Definitional_facts|true_table true table to the manager
2024-11-07 15:05:45 - INFO - Adding classify_P2_Definitional_names|true_table true table to the manager
2024-11-07 15:05:45 - INFO - Adding classify_P2_Definitional_terms|true_table true table to the manager
2024-11-07 15:05:45 - INFO - Adding classify_P2_Operative_rules|true_table true table to the manager


## extract / classify elements

### General functions and data structures

LLM response model for P2

In [10]:
class Classification(BaseModel):
    type: str = Field(..., description="Type of the rule (e.g., Party, Data, Activity)")
    confidence: float = Field(..., ge=0, le=1, description="Confidence level of the classification")
    explanation: str = Field(..., description="Explanation of why the classification was made")


class StatementClassification(BaseModel):
    doc_id: str = Field(..., description="Document ID associated with the statement")
    statement_id: str = Field(..., description="A provided string that identifies the statement. e.g., '1', 'Person'")
    statement_text: str = Field(..., description="The statement to be classified")
    statement_source: str = Field(..., description="Source of the statement")
    classification: List[Classification] = Field(..., description="List of classifications with explanations")

LLM response model for P2

In [11]:
class SubClassification(BaseModel):
    subtype: str = Field(..., description="Subtype of the rule. The title of the section/subsection.")
    templates_ids: List[str] = Field(..., description="List of template IDs that matched the statement.")
    confidence: float = Field(..., ge=0, le=1, description="Confidence level of the classification")
    explanation: str = Field(..., description="Explanation of why the classification was made")


class StatementSubClassification(BaseModel):
    doc_id: str = Field(..., description="Document ID associated with the statement")
    statement_id: str = Field(..., description="A provided string that identifies the statement. e.g., '1', 'Person'")
    statement_text: str = Field(..., description="The statement to be classified")
    statement_source: str = Field(..., description="Source of the statement")
    classification: List[SubClassification] = Field(..., description="List of classifications with explanations")


In [12]:
from itertools import islice
from collections import defaultdict

def classify_and_add_prompts(element_type, rule_type, rules_to_classify, manager):
    user_prompt_classify = []
    system_prompt_classify = []

    # Helper to split list into batches with up to 15 items each
    def batch(iterable, max_batch_size):
        iterator = iter(iterable)
        while True:
            batch_list = list(islice(iterator, max_batch_size))
            if not batch_list:
                break
            yield batch_list

    # Process rules in batches of up to 15 items
    for batch_num, batch_rules in enumerate(batch(rules_to_classify, 15)):
        logger.info(f"Processing batch {batch_num + 1} with {len(batch_rules)} items")
        
        # Group by 'statement_type' within each batch
        grouped_data = defaultdict(list)
        for item in batch_rules:
            grouped_data[item["statement_type"]].append(item)

        # Process each statement type within the batch
        for statement_type, items in grouped_data.items():
            system_prompt = get_system_prompt_classify_p2(len(batch_rules), element_type, rule_type, statement_type)
            system_prompt_classify.append(system_prompt)
            user_prompt = get_user_prompt_classify(items)
            user_prompt_classify.append(user_prompt)
            
            # Log token counts for the current batch and statement type
            logger.info(f"Batch {batch_num + 1} - token count system prompt {statement_type}: {token_estimator.estimate_tokens(system_prompt)}")
            logger.info(f"Batch {batch_num + 1} - token count user prompt {statement_type}: {token_estimator.estimate_tokens(user_prompt)}")

            # Add documents to the manager with batch and statement type info
            manager.add_document(
                Document(
                    id=f"prompt-classify_P2_{element_type.replace(' ', '_')}_batch{batch_num + 1}_{statement_type}",
                    type="prompt",
                    content=system_prompt
                )
            )
    
    return user_prompt_classify, system_prompt_classify


In [13]:
def classify_and_get_templates(
    element_name, part, user_prompt_classify, system_prompt_classify, manager
):
    # Log the start of the process
    logger.info(
        f"{part}. processing {len(system_prompt_classify)} prompts..."
    )
    logger.info(f"{part}. Classifying and getting templates for rules...")

    # Initialize an empty list to accumulate all responses
    all_responses_classify = []

    # Loop through each pair of user and system prompts with a counter
    for index, (user_prompt, system_prompt) in enumerate(zip(user_prompt_classify, system_prompt_classify), start=1):
        logger.info(f"Processing classification and templates prompt {index} ...")
        logger.debug(system_prompt)
        logger.debug(user_prompt)

        # Query the language model
        response_classify = query_instruct_llm(
            system_prompt=system_prompt,
            user_prompt=user_prompt,
            document_model=List[StatementSubClassification],
            llm_model=config["LLM"]["MODEL"],
            temperature=config["LLM"]["TEMPERATURE"],
            max_tokens=config["LLM"]["MAX_TOKENS"],
        )

        logger.debug(response_classify)

        # Accumulate the responses in the list
        all_responses_classify.extend(response_classify)

        logger.info(f"Finished processing classification and templates prompt {index}.")

    # After the loop, create a single Document with all the accumulated responses
    doc = Document(
        id=f"classify_{part}_{element_name.replace(' ', '_')}",
        type="llm_response_classification",
        content=all_responses_classify,
    )
    manager.add_document(doc)

    return all_responses_classify


### Prompt engeneering

#### Taxonomy classification and templates for definitional and operative rules

Instructions for classify rules using the top level of Witt (2012) taxonomy.

In examining the classification of terms, names, fact types, and rules within business systems, key distinctions arise between definitional and operative rules, as well as in the structuring of term definitions. Terms, names, and fact types are foundational elements in the taxonomy of definitional rules, though they are not rules themselves. Instead, they serve as the core vocabulary for creating precise definitions and facilitating the rule-making process within an organization. Terms represent general concepts or classes, names uniquely identify specific instances or entities, and fact types capture relationships between terms. Together, these elements enable the consistent and unambiguous use of language across rule statements, models, and documentation, ensuring that definitional and operative rules function cohesively.

Definitional rules aim to provide precise structures for organizational concepts, establishing clear, logical statements that support consistent interpretation and application of business language. These rules formalize term definitions, establish categorization schemes, and delineate relationships using fact types, which are classified by their structure as unary, binary, or higher-order and allow organizations to express relationships from simple Boolean properties to complex multi-term associations. When structured through definitional rule templates, fact types enhance clarity, facilitate shared understanding, and ensure accurate application within organizational rules, while supporting a conceptual backbone for business definitions.

Names function as unique identifiers for specific instances or entities, adding precision to rule statements that require exact identification. Embedded within definitional rules, names provide specific references crucial for rules that depend on individual entities, distinguishing them from broader terms and reducing ambiguity in complex business contexts. Through structured templates within the definitional rule taxonomy, organizations can integrate names with uniformity and clarity, allowing for reliable reference to distinct entities within rule statements and models. 

Operative rules, on the other hand, govern actionable requirements, setting conditions under which actions must or must not occur. These include data, activity, and party rules, each ensuring compliance, standardization, and procedural integrity across business processes. Operative rules provide the necessary conditions for maintaining organizational consistency, detailing what actions are authorized in particular circumstances or specifying roles within defined tasks, thereby aligning processes with business objectives.

To formalize term definitions, statements must adhere to specific templates within the definitional rule taxonomy, which clearly articulate each term's scope, meaning, and responsibilities. For instance, the term "Commission" can be formally defined within Template T7, part of the definitional rule taxonomy, to clarify its procedural functions as the entity that receives and forwards legal documents. By using templates for definitional rules, organizations achieve consistent, unambiguous documentation of terms, fact types, and names, which minimizes ambiguity and ensures standardized interpretation across all business contexts. This structured approach supports precise rule governance and enhances communication within organizational processes, reinforcing the integrity and reliability of rule-based systems.

System prompt (Instructions) for classify operative rules using the top level of Witt (2012) taxonomy.

In [63]:
def get_system_prompt_classify_p1():
    return """
You are an expert in SBVR (Semantics of Business Vocabulary and Business Rules).

You are working for regulatory bodies, auditors, or process managers.

You will be provided with a list of statements formatted as JSON.

Your task is to classify each statement into one or more Operative Rules types according to the given definitions.

You also need to record a confidence level for each classification and provide an explanation for why the classification was made.

# Classifications
The **Operative rules** Govern actions or constraints that must or must not happen under certain conditions, such as Data Rules, Activity Rules, and Party Rules. types to classify are:
- **Party rules**: A type of operative rule that restrict what parties can perform processes, activities, or play roles. They are operative rules.
- **Data rules**: A type of operative rule that constrain the data included in a transaction (e.g., forms or messages) or a persistent dataset.
- **Activity rules**: A type of operative rule that constrain the operation of one or more business processes or activities.

# Input JSON Format
The statements are provided in the following JSON format:

```
[
    {
        "doc_id": "some doc id",
        "statement_id": "some id",
        "statement_source": "some source",
        "statement_text": "some text",
    }
]
```

# Task Requirements

1. Classify each "statement" into one or more of the provided rule types (Party, Data, Activity).
2. Assess a **confidence level** for each classification between 0 and 1.
3. Provide a **clear explanation** for the classification decision.

# Output Format
Your output must also be in JSON format. It should contain, for each statement:
- The `doc_id`
- The `statement_id`
- The original `statement_text`
- The `statement_source` of the statement
- A list of classifications (`classification`), each containing:
  - The `type` of the rule.
  - The `confidence` in your classification.
  - An `explanation` detailing why you made the classification decision.

Here is an example of the expected output:

```
[
    {
        "doc_id": "some doc id",
        "statement_id": "some id",
        "statement_text": "some text",
        "statement_source": "some source",
        "classification": [
            {
                "type": "Data",
                "confidence": 0.9,
                "explanation": "This statement defines a constaint that mandates the presence of data."
            },
            {
                "type": "Party",
                "confidence": 0.2,
                "explanation": "There is little reference to any restriction on participants or parties, which means this may not be a valid classification."
            },
            ...
        ]
    },
    {
        "doc_id": ...,
        "statement_id": ...,
        "statement_text": ...,
        "statement_source": ...,
        "classification": ...
    }
]
```

# Notes
- **Detail the Reasoning**: Make sure to provide explanations that justify why a particular rule type was chosen.
- **Confidence Values**: The confidence value should genuinely represent how strongly you believe the classification is correct, with 1 being an absolute match and 0 meaning unlikely.

Make sure that every statement is analyzed thoroughly, and the final justification for each classification is straightforward and adequately supports both the type choice and confidence level.
"""

User prompt including the list of statements.

In [62]:
def get_user_prompt_classify(rules_to_classify):
    return f"""
# Here's the list of elements you'll need to classify

{json.dumps(rules_to_classify, indent=2)}
"""

System prompt (Instructions) for classify definitional and operative rules using the sub levels of Witt (2012) taxonomy.

- element_type: term, name, fact, operative rule
- rule_type: Definitional, Activity, Data, Party


In [61]:
def get_system_prompt_classify_p2(element_count, element_type, rule_type, classification):
    rule_information_provider = RuleInformationProvider("../data")

    subclassification_text = rule_information_provider.get_classification_and_templates(f"{classification} rules")        

    return f"""
You are an expert in SBVR (Semantics of Business Vocabulary and Business Rules).

You are working for regulatory bodies, auditors, or process managers.
    
Classify {element_count} {element_type}(s) from the list of elements into one or more {rule_type} Rule subtypes, provide an explanation for each classification, and assign a confidence score between 0 and 1.

Use the {rule_type} Rule subtype definitions, templates, and guidelines provided to perform a thorough analysis of each statement.

# Steps

1. **Identify {rule_type} Rule Subtype(s)**: For each `statement_text` provided, classify the rule type according to the given {rule_type} Rules subtypes and templates.
    - Use provided templates, definitions, and details on each subtype to determine the correct classification.
    - Cross-reference with template examples to accurately determine the appropriate rule subtype.

2. **Assign Confidence Level**: Assess the suitability of each classification by assigning a confidence level between 0 and 1. 
    - 1 indicates a very strong match, while lower numbers indicate weaker matches.

3. **Provide Explanation**: Provide a detailed, but concise, explanation justifying why a given subtype was assigned to the statement.
    - Include reasoning related to the template structure, terminology used, or specific conditions the rule mentions.

# {rule_type} Rule subtypes

{subclassification_text}

## Definitions
- **attribute term**: A term that signifies a non-Boolean property of an entity class (or object class).
- **role term**: A term that signifies the role played by one of the participating parties or objects in a relationship: for example, employer and employee are role terms (with respect to the relationship whereby an organization employs a person), whereas organization and person are not role terms.
- **category attribute term**: A term is usually admin-defined, with some external inputs. They have unique labels (e.g., 'Cash') and may use internal codes. Boolean attributes indicate "Yes" or "No" responses, shown as checkboxes or "Y/N" fields.
- **quantitative attribute**: An attribute on which some arithmetic can be performed (e.g., addition, subtraction) and on which comparisons other than "=" and "<>" can be performed.
- **qualifying clause**: refines a rule's scope or specificity by limiting the subject or other terms to particular subsets or conditions (e.g., “for a return journey” or “that is current”).

# Output Format

The output must be provided in JSON format. Each element of the statement list must contain:
- `doc_id`: The Document ID from the input.
- `statement_id`: The original statement.
- `statement_text`: The original statement_text.
- `statement_source`: The original statement_source.
- `classification`: A list that may contain multiple entries, each of which should have:
    - `subtype`: Assigned rule subtype, use the title of the section/subsection (e.g., "Activity time limit rules").
    - `templates_ids`: A list of template IDs that matched the statement.
    - `confidence`: A float indicating confidence in classification.
    - `explanation`: A textual explanation detailing why the classification was appropriate.

The output JSON should look like this:

```
[
    {{
        "doc_id": "some doc id",
        "statement_id": "some id",
        "statement_text": "some text",
        "statement_source": "some source",
        "classification": [
            {{
                "subtype": "Some Subtype Title",
                "templates_ids": ["T123", "T456"],
                "confidence": 0.9,
                "explanation": "This statement restricts the occurrence of an activity during a specified time. The use of 'must not occur' clearly indicates an Activity time limit rule."
            }},
            {{
                "subtype": "Another Subtype Title",
                "templates_ids": ["T789"],
                "confidence": 0.4,
                "explanation": "There are elements of a participation restriction, but since it isn't clearly specified, the match is weak."
            }}
        ]
    }},
    {{
        "doc_id": "another doc id",
        "statement_id": "another id",
        "statement_text": "another text",
        "statement_source": "another source",
        "classification": [
            {{
                "subtype": "Subtype Title",
                "templates_ids": ["T123"],
                "confidence": 0.7,
                "explanation": "The clause dictates who is allowed to perform the task, indicating a party restriction context."
            }}
        ]
    }},
    ...
]
```

# Notes
- **Detail the Reasoning**: Ensure the explanations refer to template structure, specific terminology used, or matching requirements for the rule subtype.
- **Confidence Assessment**: Be honest about the degree of certainty in your classification, and provide meaningful values for confidence (e.g., if templates do not match perfectly but there are similarities, confidence should be moderate to low).
- **Multiple Classifications**: In cases where one statement seems to fit multiple rule subtypes, include multiple classifications with appropriate confidence ratings and explanations for each.

This output will be used to not only understand classifications but also inform next steps regarding validation and business rule structuring.
"""

### Execution

#### Operative Rules classification

##### Processing P1 - Classify Operative Rules

User prompt (Data)

In [None]:
manager = restore_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"])

processor = DocumentProcessor(manager)

rules_to_classify_p1 = [
    {
        "doc_id": item["doc_id"],
        "statement_id": item["statement_id"],
        "statement_source": item["source"],
        "statement_text": item["statement"]
    }
    for item in processor.get_rules()
]

logger.info(f"Total rules to classify: {len(rules_to_classify_p1)}")

2024-11-07 00:07:24 - INFO - DocumentManager restored from file: ../data/checkpoints/documents-2024-11-01-3.json
2024-11-07 00:07:24 - INFO - Checkpoint restored from ../data/checkpoints/documents-2024-11-01-3.json.


Running classification.

In [66]:
# Part 1 - Classify Operative Rules
user_prompt = get_user_prompt_classify(rules_to_classify_p1)
system_prompt = get_system_prompt_classify_p1()

logger.info("P1. Classifing Operative Rules...")
logger.debug(system_prompt)
logger.debug(user_prompt)

response_classify_p1 = query_instruct_llm(
    system_prompt=system_prompt,
    user_prompt=user_prompt,
    document_model=List[StatementClassification],
    llm_model=config["LLM"]["MODEL"],
    temperature=config["LLM"]["TEMPERATURE"],
    max_tokens=config["LLM"]["MAX_TOKENS"],
)

logger.debug(response_classify_p1)

doc_1 = Document(id="classify_P1", type="llm_response_classification", content=response_classify_p1)
manager.add_document(doc_1)

logger.info("Saving checkpoint...")
save_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"], manager=manager)

logger.info("Finished processing classification.")

2024-11-07 00:07:31 - INFO - P1. Classifing Operative Rules...
2024-11-07 00:07:51 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-11-07 00:08:16 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-11-07 00:08:16 - INFO - Execution time for query_instruct_llm: 39.17 seconds
2024-11-07 00:08:16 - INFO - Saving checkpoint...
2024-11-07 00:08:16 - INFO - DocumentManager state persisted to file: ../data/checkpoints/documents-2024-11-01-3.json
2024-11-07 00:08:16 - INFO - Checkpoint saved.
2024-11-07 00:08:16 - INFO - Finished processing classification.


##### Processing P2 - Classify and get templates

User prompt (Data)

System and user prompt data

Create a prompt for each of the type of rule (Activity, Data, Party) from P1 with the statements for that type.

In [None]:
manager = restore_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"])

processor = DocumentProcessor(manager)

rules_to_classify_p2 = [
    {
        "doc_id": item["doc_id"],
        "statement_id": item["statement_id"],
        "statement_source": item["source"],
        "statement_text": item["statement"],
        "statement_type": item["type"]
    }
    for item in processor.get_rules()
]

logger.info(f"Total rules to classify: {len(rules_to_classify_p2)}")

2024-11-07 00:09:14 - INFO - DocumentManager restored from file: ../data/checkpoints/documents-2024-11-01-3.json
2024-11-07 00:09:14 - INFO - Checkpoint restored from ../data/checkpoints/documents-2024-11-01-3.json.


Create a prompt for each of the type of rule (Activity, Data, Party) from P1 with the statements for that type.

In [None]:
user_prompt_classify_p2_op_rules, system_prompt_classify_p2_op_rules = (
    classify_and_add_prompts(
        "operative rule", "Operative", rules_to_classify_p2, manager
    )
)

logger.info(f"Prompts to run {len(user_prompt_classify_p2_op_rules)}")

# Persist the state to a file
save_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"], manager=manager)

2024-11-07 00:09:48 - INFO - Processing batch 1 with 6 items
2024-11-07 00:09:49 - INFO - Batch 1 - token count system prompt Party: 3198
2024-11-07 00:09:49 - INFO - Batch 1 - token count user prompt Party: 340
2024-11-07 00:09:49 - INFO - Batch 1 - token count system prompt Activity: 5339
2024-11-07 00:09:49 - INFO - Batch 1 - token count user prompt Activity: 272
2024-11-07 00:09:49 - INFO - DocumentManager state persisted to file: ../data/checkpoints/documents-2024-11-01-3.json
2024-11-07 00:09:49 - INFO - Checkpoint saved.


P2. User Prompt to classify Operative Rules and get templates

Processing P2 - Subclassify Operative Rules

Running classification. For each type of rule get response from the LLM.

In [None]:
all_responses_classify_p2_op_rules = classify_and_get_templates(
    "rules",
    "P2_Operative",
    user_prompt_classify_p2_op_rules,
    system_prompt_classify_p2_op_rules,
    manager,
)

logger.info(f"Total responses: {len(all_responses_classify_p2_op_rules)}")

# Persist the state to a file
save_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"], manager=manager)

2024-11-07 00:10:06 - INFO - P2_Operative. processing 2 prompts...
2024-11-07 00:10:06 - INFO - P2_Operative. Classifying and getting templates for rules...
2024-11-07 00:10:06 - INFO - Processing classification and templates prompt 1 ...
2024-11-07 00:10:17 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-11-07 00:10:17 - INFO - Execution time for query_instruct_llm: 10.47 seconds
2024-11-07 00:10:17 - INFO - Finished processing classification and templates prompt 1.
2024-11-07 00:10:17 - INFO - Processing classification and templates prompt 2 ...
2024-11-07 00:10:23 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-11-07 00:10:23 - INFO - Execution time for query_instruct_llm: 6.53 seconds
2024-11-07 00:10:23 - INFO - Finished processing classification and templates prompt 2.
2024-11-07 00:10:23 - INFO - DocumentManager state persisted to file: ../data/checkpoints/documents-2024-11-01-3.json
2024

##### Investigating miss classification

Prompt P2

In [None]:
doc_first_attempt_p1 = {
    "doc_id": "\u00a7 275.0-2",
    "statement_id": "4",
    "statement_text": "If the Secretary certifies that the Commission was served with process, pleadings, or other papers pursuant to paragraph (a)(1) of this section and forwarded these documents to a named party pursuant to paragraph (a)(2) of this section, this certification constitutes evidence of service upon that party.",
    "statement_source": "(a)(3)",
    "classification": [
        {
            "type": "Data",
            "confidence": 0.6,
            "explanation": "The statement involves certification, which is a form of data that serves as evidence of service."
        },
        {
            "type": "Activity",
            "confidence": 0.5,
            "explanation": "The statement describes the activity of certifying and forwarding documents, which are part of the process."
        }
    ]
}

In [None]:
doc_first_attempt_p2 = {
    "doc_id": "\u00a7 275.0-2",
    "statement_id": "4",
    "statement_text": "If the Secretary certifies that the Commission was served with process, pleadings, or other papers pursuant to paragraph (a)(1) of this section and forwarded these documents to a named party pursuant to paragraph (a)(2) of this section, this certification constitutes evidence of service upon that party.",
    "statement_source": "(a)(3)",
    "classification": [
        {
            "subtype": "Data content rules",
            "templates_ids": [],
            "confidence": 0.6,
            "explanation": "The statement describes a condition under which a certification constitutes evidence of service. It involves the presence of certain documents and their forwarding, which relates to data content. However, it does not fit neatly into a specific template, leading to a moderate confidence level."
        }
    ]
}

In [None]:
doc_second_attempt_p1 = {
    "doc_id": "\u00a7 275.0-2",
    "statement_id": "4",
    "statement_text": "If the Secretary certifies that the Commission was served with process, pleadings, or other papers pursuant to paragraph (a)(1) of this section and forwarded these documents to a named party pursuant to paragraph (a)(2) of this section, this certification constitutes evidence of service upon that party.",
    "statement_source": "(a)(3)",
    "classification": [
        {
            "type": "Party",
            "confidence": 0.7,
            "explanation": "The statement involves the certification of service to a named party, indicating a role for the Secretary and the parties involved. This aligns with Party rules as it specifies actions related to parties.",
        },
        {
            "type": "Activity",
            "confidence": 0.6,
            "explanation": "The statement describes the certification process as part of a business activity, which aligns with Activity rules as it governs the operation of certifying service.",
        },
    ],
}

In [None]:
doc_second_attempt_p2 = {
    "doc_id": "\u00a7 275.0-2",
    "statement_id": "4",
    "statement_text": "If the Secretary certifies that the Commission was served with process, pleadings, or other papers pursuant to paragraph (a)(1) of this section and forwarded these documents to a named party pursuant to paragraph (a)(2) of this section, this certification constitutes evidence of service upon that party.",
    "statement_source": "(a)(3)",
    "classification": [
        {
            "subtype": "Responsibility rules",
            "templates_ids": ["T56"],
            "confidence": 0.7,
            "explanation": "This statement involves the certification process by the Secretary, which serves as evidence of service. It defines a responsibility related to the certification of document forwarding, fitting the responsibility rule subtype.",
        }
    ],
}

In [80]:
print(manager.retrieve_document("prompt-classify_P2_operative_rule_batch1_Party", "prompt").content)


You are an expert in SBVR (Semantics of Business Vocabulary and Business Rules).

You are working for regulatory bodies, auditors, or process managers.
    
Classify 6 operative rule(s) from the list of elements into one or more Operative Rule subtypes, provide an explanation for each classification, and assign a confidence score between 0 and 1.

Use the Operative Rule subtype definitions, templates, and guidelines provided to perform a thorough analysis of each statement.

# Steps

1. **Identify Operative Rule Subtype(s)**: For each `statement_text` provided, classify the rule type according to the given Operative Rules subtypes and templates.
    - Use provided templates, definitions, and details on each subtype to determine the correct classification.
    - Cross-reference with template examples to accurately determine the appropriate rule subtype.

2. **Assign Confidence Level**: Assess the suitability of each classification by assigning a confidence level between 0 and 1. 
    -

**Conclusion**

The classification of the given statement initially had a lower confidence level because it could be interpreted from two distinct perspectives: a **party rule** or a **data consistency rule**. A **party rule** focuses on procedural governance and accountability, emphasizing the roles, responsibilities, and authority of individuals—in this case, the Secretary, whose certification serves as evidence of service. This perspective is crucial when considering formal roles, defining valid actions, and ensuring compliance, which makes it more fitting for scenarios involving regulatory oversight, accountability, and process management.

The **data consistency rule** perspective, on the other hand, considers the relationship between different data elements—here, the certification and document forwarding—and aims to ensure logical consistency between these elements. This classification fits more technical or data-centric contexts, such as ensuring the integrity of records. However, with the prompt specifying a focus on regulatory bodies, auditors, or process managers, the interpretation naturally shifts toward the **party rule** classification. Consequently, the confidence in this classification increases, as it directly aligns with the needs of those focused on governance, responsibility, and ensuring procedural correctness.

According to Opsahl-Ong et al. (2024) small changes in the prompt affect the outcome because language models are highly sensitive to the structure and context of the instructions. Minimal alterations in word choice, sentence order, or tone can lead the model to interpret the task differently, as it relies on statistical correlations and previously observed patterns. In multi-stage pipelines, this is even more critical, as a change in one module can create a cascading effect, impacting the results of subsequent stages. This sensitivity makes precise prompt adjustments essential to achieve the desired performance in specific tasks.

#### Definitional rules classification

##### Terms

Selecting terms to classify. Terms without definition will be skipped.

In [None]:
manager = restore_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"])

processor = DocumentProcessor(manager)

terms_to_classify_p2 = [
    {
        "doc_id": item["doc_id"],
        "statement_id": item["statement_id"],
        "statement_source": item["source"],
        "statement_text": item["definition"],
        "statement_type": "Definitional"
    }
    for item in processor.get_terms(definition_filter="non_null")
]

logger.info(f"Number of terms to classify: {len(terms_to_classify_p2)}")

2024-11-06 23:13:02 - INFO - DocumentManager restored from file: ../data/checkpoints/documents-2024-11-01-3.json
2024-11-06 23:13:02 - INFO - Checkpoint restored from ../data/checkpoints/documents-2024-11-01-3.json.


Preparing prompts.

In [30]:
user_prompt_classify_p2_terms, system_prompt_classify_p2_terms = (
    classify_and_add_prompts("term", "Definitional", terms_to_classify_p2, manager)
)

2024-11-06 23:13:08 - INFO - Processing batch 1 with 15 items
2024-11-06 23:13:09 - INFO - Batch 1 - token count system prompt Definitional: 6008
2024-11-06 23:13:09 - INFO - Batch 1 - token count user prompt Definitional: 1143
2024-11-06 23:13:09 - INFO - Processing batch 2 with 15 items
2024-11-06 23:13:09 - INFO - Batch 2 - token count system prompt Definitional: 6008
2024-11-06 23:13:09 - INFO - Batch 2 - token count user prompt Definitional: 1119
2024-11-06 23:13:09 - INFO - Processing batch 3 with 15 items
2024-11-06 23:13:09 - INFO - Batch 3 - token count system prompt Definitional: 6008
2024-11-06 23:13:09 - INFO - Batch 3 - token count user prompt Definitional: 1233
2024-11-06 23:13:09 - INFO - Processing batch 4 with 15 items
2024-11-06 23:13:09 - INFO - Batch 4 - token count system prompt Definitional: 6008
2024-11-06 23:13:09 - INFO - Batch 4 - token count user prompt Definitional: 1158
2024-11-06 23:13:09 - INFO - Processing batch 5 with 15 items
2024-11-06 23:13:09 - INFO

Running classification.

In [None]:
all_responses_classify_terms = classify_and_get_templates(
    "terms",
    "P2_Definitional",
    user_prompt_classify_p2_terms,
    system_prompt_classify_p2_terms,
    manager,
)

logger.info(f"Total responses: {len(all_responses_classify_terms)}")

# Persist the state to a file
save_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"], manager=manager)

2024-11-06 23:13:50 - INFO - P2_Definitional. processing 7 prompts...
2024-11-06 23:13:50 - INFO - P2_Definitional. Classifying and getting templates for rules...
2024-11-06 23:13:50 - INFO - Processing classification and templates prompt 1 ...
2024-11-06 23:14:19 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-11-06 23:14:19 - INFO - Execution time for query_instruct_llm: 25.93 seconds
2024-11-06 23:14:19 - INFO - Finished processing classification and templates prompt 1.
2024-11-06 23:14:19 - INFO - Processing classification and templates prompt 2 ...
2024-11-06 23:14:44 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-11-06 23:14:44 - INFO - Execution time for query_instruct_llm: 21.94 seconds
2024-11-06 23:14:44 - INFO - Finished processing classification and templates prompt 2.
2024-11-06 23:14:44 - INFO - Processing classification and templates prompt 3 ...
2024-11-06 23:15:07 - INFO - HTTP

Avarage 45s per prompt.

##### Names

Selection names to classify. Names without definition will be skipped.

In [None]:
manager = restore_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"])

processor = DocumentProcessor(manager)

names_to_classify_p2 = [
    {
        "doc_id": item["doc_id"],
        "statement_id": item["statement_id"],
        "statement_source": item["source"],
        "statement_text": item["definition"],
        "statement_type": "Definitional"
    }
    for item in processor.get_names(definition_filter="non_null")
]

logger.info(f"Number of names to classify: {len(names_to_classify_p2)}")

2024-11-06 23:16:29 - INFO - DocumentManager restored from file: ../data/checkpoints/documents-2024-11-01-3.json
2024-11-06 23:16:29 - INFO - Checkpoint restored from ../data/checkpoints/documents-2024-11-01-3.json.


In [None]:
user_prompt_classify_p2_names, system_prompt_classify_p2_names = (
    classify_and_add_prompts("name", "Definitional", names_to_classify_p2, manager)
)

logger.info(f"Prompts to run {len(user_prompt_classify_p2_names)}")

2024-11-06 23:16:29 - INFO - Processing batch 1 with 15 items
2024-11-06 23:16:29 - INFO - Batch 1 - token count system prompt Definitional: 6008
2024-11-06 23:16:29 - INFO - Batch 1 - token count user prompt Definitional: 1147


Running classification.

In [None]:
all_responses_classify_p2_names = classify_and_get_templates(
    "names",
    "P2_Definitional",
    user_prompt_classify_p2_names,
    system_prompt_classify_p2_names,
    manager,
)

logger.info(f"Total responses: {len(all_responses_classify_p2_names)}")

# Save the checkpoint after adding the combined document
save_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"], manager=manager)

2024-11-06 23:16:53 - INFO - P2_Definitional. processing 1 prompts...
2024-11-06 23:16:53 - INFO - P2_Definitional. Classifying and getting templates for rules...
2024-11-06 23:16:53 - INFO - Processing classification and templates prompt 1 ...
2024-11-06 23:17:21 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-11-06 23:17:21 - INFO - Execution time for query_instruct_llm: 27.25 seconds
2024-11-06 23:17:21 - INFO - Finished processing classification and templates prompt 1.
2024-11-06 23:17:21 - INFO - DocumentManager state persisted to file: ../data/checkpoints/documents-2024-11-01-3.json
2024-11-06 23:17:21 - INFO - Checkpoint saved.


Average time 30s per prompt.

##### Fact / Fact Types

Selecting facts to classify.

In [None]:
manager = restore_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"])

processor = DocumentProcessor(manager)

facts_to_classify_p2 = [
    {
        "doc_id": item["doc_id"],
        "statement_id": item["statement_id"],
        "statement_source": item["source"],
        "statement_text": item["statement"],
        "statement_type": "Definitional"
    }
    for item in processor.get_facts()
]

logger.info(f"Number of facts to classify: {len(facts_to_classify_p2)}")

2024-11-06 23:17:21 - INFO - DocumentManager restored from file: ../data/checkpoints/documents-2024-11-01-3.json
2024-11-06 23:17:21 - INFO - Checkpoint restored from ../data/checkpoints/documents-2024-11-01-3.json.


Preparing prompts.

In [None]:
user_prompt_classify_p2_facts, system_prompt_classify_p2_facts = (
    classify_and_add_prompts("fact type", "Definitional", facts_to_classify_p2, manager)
)

logger.info(f"Prompts to run {len(user_prompt_classify_p2_facts)}")

2024-11-06 23:17:21 - INFO - Processing batch 1 with 15 items
2024-11-06 23:17:21 - INFO - Batch 1 - token count system prompt Definitional: 6009
2024-11-06 23:17:21 - INFO - Batch 1 - token count user prompt Definitional: 1354
2024-11-06 23:17:21 - INFO - Processing batch 2 with 2 items
2024-11-06 23:17:21 - INFO - Batch 2 - token count system prompt Definitional: 6009
2024-11-06 23:17:21 - INFO - Batch 2 - token count user prompt Definitional: 179


Running classification.

In [None]:
all_responses_classify_p2_facts = classify_and_get_templates(
    "facts", "P2_Definitional", user_prompt_classify_p2_facts, system_prompt_classify_p2_facts, manager
)

logger.info(f"Total responses: {len(all_responses_classify_p2_facts)}")

# Save the checkpoint after adding the combined document
save_checkpoint(filename=config["DEFAULT_CHECKPOINT_FILE"], manager=manager)

2024-11-06 23:17:21 - INFO - P2_Definitional. processing 2 prompts...
2024-11-06 23:17:21 - INFO - P2_Definitional. Classifying and getting templates for rules...
2024-11-06 23:17:21 - INFO - Processing classification and templates prompt 1 ...
2024-11-06 23:17:58 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-11-06 23:18:31 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-11-06 23:18:31 - INFO - Execution time for query_instruct_llm: 63.65 seconds
2024-11-06 23:18:31 - INFO - Finished processing classification and templates prompt 1.
2024-11-06 23:18:31 - INFO - Processing classification and templates prompt 2 ...
2024-11-06 23:18:42 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-11-06 23:18:42 - INFO - Execution time for query_instruct_llm: 8.68 seconds
2024-11-06 23:18:42 - INFO - Finished processing classification and templates prompt 2.
2024-11

Avarage 50s per prompt.

### Discussion

In the first prompt (prompt_classify_p1), the confidence levels assigned reflect the absence of a dominant classification. For example, when a statement primarily restricts data (data rule) but also includes specific roles (party rule), a high level of confidence is assigned to the data while a moderate level of confidence is applied to the party, recognizing its secondary relevance. Similarly, statements that reference roles such as "Secretary" or "interested person" without explicit party restrictions receive moderate confidence for party classification due to ambiguity in interpretation. Procedural elements that impact data handling, such as document routing, receive high confidence for data rules; however, a moderate level of confidence is assigned to activity rules when procedural references are indirect.

An example of this behavior was the classification of the sentence: “If the Secretary certifies that the Commission has received proceedings, petitions, or other documents under paragraph (a)(1) of this section and forwarded those documents to a party designated by paragraph (a)(2) of this section, such certification constitutes evidence of service to that party.” The first classification (classify_P1) had low confidence because the sentence admits two interpretations: party rule or data consistency rule. The role rule focuses on governance and accountability, highlighting the roles of individuals, such as the Secretary, whose certification serves as evidence of service. The data consistency perspective considers the relationship between data elements, evolving the logical consistency between them, such as certification and forwarding of documents.

Initially, the prompt did not specify the focus, which led to the choice of the technical perspective (data) and caused problems in the second classification, where the LLM could not find an appropriate category (sublevel) in the taxonomy. With the addition of the sentence “You are working for regulators, auditors, or process managers.” interpretation has shifted to part rule classification, increasing confidence, consistency in classification, and locating a classification in the taxonomy sublevel (classify_P2_operative_rule) - Responsibility rules.