# Test breakdown generator

In [1]:
PROMPT_JSON_EXTRACTION = """
# Prompt: Transform Legal Text into structured JSON representation

## Objective

Convert the provided legal text into structured JSON representation structure following the step by step below.

## Step by Step

1. Break the paragraph down into **individual sentences** and convert them to the **singular tense**.
    - Every subject, verb and complement must be converted into *singular tense**.

2. For each sentence do the following relevant information:
    - Subject: Entity (vehicle, person, etc.) that performs the action described in the conclusion.
    - Conclusion: The action that is present in the main sentence.
    - Deontic Modality: Optional. The importance of the action indicated in the conclusion. Either, should/should not or must/must not, can/can not.
    - Conditions: Things that have to be true, in order to the conclusion to be applicable.
    - Syntactic structure: Sub-divide both conclusion and conditions in: subject, deontic modality (for conclusion only), action (verb), and complement (place, object, etc.). If conclusion or conditions are **None**, syntactic structure must be empty.

3. Present the JSON with the extracted elements for each sentence.

The JSON must have the following structure.
- Every sentence will become a JSON object.
- Every subject and verb and complement must be converted into *singular tense**
- Each JSON object will have the "source_text" and the "syntactic_structure".
- The "source_text" is the original sentence.
- "syntactic_structure" contains the structure syntatic information for both conclusions and conditions, if any. That is, "subject", "deontic", "verb" and "complement".
- If the subject is "you" replace by "vehicle".
- If the verb relates to intent, such as "wants", "will", "intends", replace it with "intends".
- If there is no explicit deontic operator, assume "can" if applicable.
- Convert everything to **singular tense**.

Example:
```json
[
    {
        "source_text": "[Original sentence]",
        "syntactic_structure": {
            "conclusion": {
                "subject": "[Subject, if any, or null]",
                "deontic": "[Deontic operator, if any, or null]",
                "verb": "[Verb, if any, or null]",
                "object": "[Object, if any, or null]",
            },
            "conditions": [
                {
                    "subject": "[Subject, if any, or null]",
                    "verb": "[Verb, if any, or null]",
                    "object": "[Object, if any, or null]",
                }
                [More conditions if any...]
            ]
        }
    },
]
```


## How to find conclusions and conditions

### Conclusions

Follow bellow some details on how to extract the conclusions.

- *Predicate name* is the deontic modality if it exists (must, should, etc.). If not assume "can".
- Arguments are the subject, action and complements (if they exist). A complement can be a location, a time, a characteristic.

### Conditions

Follow bellow some details on how to extract the conditions.

- Find the subject, the verb, and the complement.
- Use the verb as the predicate name.
- The arguments are the subject and complements.
- USE **singular** to write subject, verb, and complement.


## Few Shot examples

### Example 1:

**Text:** You MUST stop behind the line at a junction with a 'Stop' sign and a solid white line across the road. Wait for a safe gap in the traffic before you move off.

**## Break the paragraph down into **individual sentences** and convert them to the singular tense**

1. You MUST stop behind the line at a junction with a 'Stop' sign and a solid white line across the road.
2. Wait for a safe gap in the traffic before you move off.

**Step 2: For each phrase, extract the relevant information**

1. You MUST stop behind the line at a junction with a 'Stop' sign and a solid white line across the road.

- Subject: "vehicle"
- Conclusion: "stop behind the line at a junction"
- Deontic modality: "must"
- Conditions:
    * Singular: "the junction has a 'Stop' sign"
    * Singular: "the junction has a solid white line across the road"
- Syntactic structure (Singular tense):
    Conclusion:
        * Subject: "vehicle", Deontic: "must", Verb: "stop", Complement: "behind the line at a junction"
    Conditions:
        * Subject: "junction", Verb: "has", Complement: "a 'Stop' sign"
        * Subject: "junction", Verb: "has", Complement: "a solid white line across the road"


2. Wait for a safe gap in the traffic before you move off.

- Subject: "vehicle"
- Conclusion: "move off"
- Deontic modality: None
- Conditions:
    * Singular: "there is a safe gap in the traffic"

- Syntactic structure (Singular tense):
    Conclusion:
        * Subject: "vehicle", Deontic: "can", Verb: "move off", Complement: "behind the line at a junction"
    Conditions:
        * Subject: "a safe gap", Verb: "is", Complement: "in the traffic"


**Step 3: Present the JSON**

```json
[
    {
        "source_text": "You MUST stop behind the line at a junction with a 'Stop' sign and a solid white line across the road.",
        "syntactic_structure": {
            "conclusion": {
                "subject": "vehicle",
                "deontic": "must",
                "verb": "stop",
                "complement": "behind the line at a junction",
            }
            "conditions":[
                {
                    "subject": "junction",
                    "verb": "has",
                    "complement": "a 'Stop' sign",
                },
                {
                    "subject": "junction",
                    "verb": "has",
                    "complement": "a solid white line across the road",
                },
            ]
        }
    },
    {
        "source_text": "Wait for a safe gap in the traffic before you move off.",
        "syntactic_structure": {
            "conclusion": {
                "subject": "vehicle",
                "deontic": "None",
                "verb": "move off",
                "complement": "",
            }
            "conditions":[
                {
                    "subject": "a safe gap",
                    "verb": "is",
                    "complement": "in the traffic",
                }
            ]
        }
    }
]
```

## Final remarks

- Extract the sentences and re-write them into the **singular tense**
- Replace subject "you" with "vehicle"
- Extract the JSON structure specificed.


""".strip()

PROMPT_GENERATOR = """
# Prompt: Transform Legal Text and JSON structure into Logical English (LE)

## Objective

Convert the provided legal text considering the JSON structure into Logical English (LE) following the step by step below.


- For each JSON object with conclusion and conditions, *identify the templates*. And explain using comment ("%") why you chose the templates.


**Conclusion**

- You should explicitly choose *one* of the following templates to write the conclusion.

```le
the templates are:
*an agent* should *an action*.
*an agent* must *an action*.
*an agent* can *an action*.
*an agent* must *an action* to *an agent*.
*an agent* must *an action* at *a location*.
*an agent* should *an action* to *an agent*.
*an agent* sees *an item*.
*an agent* is at *a place*.
*an agent* is in *a place*.
*an agent* _is *a place*.
*an agent* cannot *an action*.
```

- Only create a new template, if and only if, the existing templates *do not fit*.
- Each template has a syntax and an intent. You must check both before choosing a template.
- Present the conclusion together with the corresponding template. For example:
    * "you must stop behind the line at a junction": *an agent* must *an action* at *a location*.

**Conditions**

Follow bellow some details on how to extract the conditions.

- Find the subject, the verb, and the specification.
- Use the verb as the predicate name.
- The arguments are the subject and specifications.
- You should explicitly choose *one* of the following templates to write the conditions.

```le
the templates are

*a thing* has *a thing*
*a thing* is *a thing*
*a thing* exists in *a thing*
```

- Only create a new template, if and only if, the existing templates *do not fit*.
- For each condition, present it together with the corresponding template.

For example:
"The junction has stop sign": *a thing* has *a property*.

The output for this step should be like:
**Conditions:**
    - "Condition 1": `Associated template`
    - "Condition 2": `Another associated template`
    - ...
**Conclusion:**
    - "Conclusion": `Associated template`

Now let's go back to the step-by-step:

5. *Build the rules* using the identified templates.

- Connect the rules and conditions using the templates and logical connections from Logical English.
- If the subject is "you", replace the agent in the templates with the constant *vehicle*.
- The ouput will be a single `le` code space delimited by triple ticks.
""".strip()

In [2]:
user_prompt_json = """
## Sentence to analyse

"@SOURCE_TEXT"

""".strip()

user_prompt_generator = """
## Sentence to analyse

"@SOURCE_TEXT"

## JSON Object

````json
@JSON_OBJECT
```

## Previous templates

````le
@PREVIOUS_TEMPLATES
```

""".strip()

In [3]:
from pipeline.rule_generation import process_existing_le
import re
import logging
from dotenv import load_dotenv

rules_to_ignore = [
]
load_dotenv()

markdown_dataset = open("../Using the road (159 to 203).md").read().strip()

existing_logical_english = open("../highway_code.le").read().strip()
existing_templates, existing_kb = process_existing_le(existing_logical_english)

existing_le = "\n\n".join([existing_templates, existing_kb])
logging.info(f"-----\n{existing_templates}\n=====\n{existing_kb}")



2025-07-11 15:05:03,101 - INFO - Logging setup complete.
2025-07-11 15:05:03,105 - INFO - -----
the templates are:
*an agent* should *an action*.
*an agent* must *an action*.
*an agent* can *an action*.
*an agent* must *an action* to *an agent*.
*an agent* should *an action* to *an agent*.
*an agent* sees *an item*.
*an agent* is at *a place*.
*an agent* is in *a place*.
*an agent* _is *a place*.
*an agent* cannot *an action*.
=====
the knowledge base highway_code includes:

a vehicle must an action
if the vehicle must the action to an agent.
a vehicle should an action
if the vehicle should the action to an agent.

% Rule 170
% [...] give way to pedestrians crossing or waiting to cross a road into which or from which you are turning.
% If they have started to cross they have priority, so give way (see Rule H2)
a vehicle should give way to a pedestrian
if the vehicle is at a junction
and the pedestrian _is crossing
    or the pedestrian _is waiting to cross.

% Rule 171
% You MUST stop 

In [4]:
from pipeline.rule_generation import parse_markdown_rules

parsed_source_text = parse_markdown_rules(markdown_dataset)
junction_rules = parsed_source_text["Road junctions (rules 170 to 183)"]

In [5]:



from together import Together


def api_call(system_prompt=None, user_prompt="", options=None):
    if options is None:
        options = {}

    # Initialize client
    client = Together()

    # Build message list
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})

    try:
        # Send API call with dynamic options
        response = client.chat.completions.create(
            model=options.get("model", "meta-llama/Llama-3.3-70B-Instruct-Turbo-Free"),
            messages=messages,
            **{k: v for k, v in options.items() if k != "model"}  # Pass all other options
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"

import json
import re

def extract_json_from_markdown(markdown_text: str) -> dict | None:
    """
    Parses a JSON object from a Markdown code block.

    Args:
        markdown_text: A string containing the Markdown output from an LLM.

    Returns:
        A dictionary parsed from the first JSON code block found,
        or None if no valid JSON block is found.
    """
    # This regex pattern finds a JSON code block and captures its content.
    # It handles potential leading/trailing whitespaces and newlines.
    pattern = r"```json\s*\n(.*?)\n\s*```"

    # re.DOTALL makes '.' match any character, including newlines.
    match = re.search(pattern, markdown_text, re.DOTALL)

    if match:
        json_string = match.group(1).strip()
        try:
            return json.loads(json_string)
        except json.JSONDecodeError as e:
            logging.error(f"Error decoding JSON: {e}")
            return None

    return None


from typing import List, Dict, Any

def filter_content(data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """
    Filters a list of objects and cleans their 'conditions' list.

    An object is kept if its 'conclusion' is valid. Within each kept object,
    any 'condition' that does not have both a subject and a verb is removed.

    Args:
        data: A list of dictionary objects to be filtered and cleaned.

    Returns:
        A new list containing the valid objects with cleaned condition lists.
    """
    filtered_list = []

    for item in data:
        # Use deepcopy to avoid modifying the original input list
        item_copy = json.loads(json.dumps(item))
        structure = item_copy.get("syntactic_structure")

        if not structure:
            continue

        # 1. Validate the 'conclusion'
        conclusion = structure.get("conclusion")
        if not conclusion:
            continue

        is_conclusion_valid = all([
            conclusion.get("subject") == "vehicle",
            conclusion.get("deontic"),
            conclusion.get("verb"),
            conclusion.get("verb") != "watch out"
        ])

        if not is_conclusion_valid:
            continue

        # 2. Clean the 'conditions' list
        original_conditions = structure.get("conditions", [])
        if original_conditions:
            valid_conditions = []
            for condition in original_conditions:
                # A condition is valid only if both 'subject' and 'verb' are present and not empty
                if condition.get("subject") and condition.get("verb"):
                    valid_conditions.append(condition)

            # Replace the old conditions list with the cleaned one
            structure["conditions"] = valid_conditions

        # 3. If the conclusion was valid, add the cleaned item to the result
        filtered_list.append(item_copy)

    return filtered_list


def extract_json_le(source_text, previous_le):
    global user_prompt_generator, user_prompt_json, PROMPT_JSON_EXTRACTION

    filled_user_prompt_json = str(user_prompt_json)
    filled_user_prompt_json = filled_user_prompt_json.replace("@SOURCE_TEXT", source_text)
    # filled_user_prompt = filled_user_prompt.replace("@PREVIOUS_TEMPLATES", previous_le)

    json_structure = None
    while not json_structure:

        response = api_call(
            system_prompt=PROMPT_JSON_EXTRACTION,
            user_prompt=filled_user_prompt_json,
            options={
                "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
                "temperature": 0.6,
                "max_tokens": 6000
            }
        )
        logging.info(f"Response: \n\n====\n{response}\n=======")

        json_structure = extract_json_from_markdown(response)

        valid_json_content = filter_content(json_structure)

        logging.info(f"Validated output: \n\n====\n{json.dumps(valid_json_content, indent=3)}\n=======")



In [6]:
for rule, source_text in junction_rules.items():
    if len(source_text.strip()) == 0 or rule.strip() in rules_to_ignore:
        continue

    logging.info("-" * 50)
    logging.info(f"Processing: {rule}")
    logging.info(f"Source text: \n\"{source_text}\"")
    logging.info("-" * 50)

    content = extract_json_le(
        source_text=source_text,
        previous_le=existing_logical_english
    )

    #input("\n\nPress a key to continue...")

2025-07-11 15:05:03,228 - INFO - --------------------------------------------------
2025-07-11 15:05:03,229 - INFO - Processing: Rule 170
2025-07-11 15:05:03,230 - INFO - Source text: 
"Take extra care at junctions. You should

*   watch out for cyclists, motorcyclists and pedestrians including powered wheelchairs/mobility scooter users as they are not always easy to see. Be aware that they may not have seen or heard you if you are approaching from behind
*   give way to pedestrians crossing or waiting to cross a road into which or from which you are turning. If they have started to cross they have priority, so give way (see [Rule H2](/guidance/the-highway-code/introduction#ruleh2))
*   remain behind cyclists, horse riders, horse drawn vehicles and motorcyclists at junctions even if they are waiting to turn and are positioned close to the kerb
*   watch out for long vehicles which may be turning at a junction ahead; they may have to use the whole width of the road to make the turn (see

AttributeError: 'list' object has no attribute 'get'