# Rule-Based Flagging

Court rulings vary in impact. Some are routine procedural decisions. Others raise significant questions, such as whether someone’s rights were violated, whether key evidence should have been admitted, or whether a judge reached an unexpected decision.

In this notebook, we use simple, transparent rules to flag court rulings that might indicate elevated legal or policy risk, such as disputes over constitutional rights, evidentiary issues, or adverse outcomes.

## Why Rule-Based Flagging?

**Transparent and interpretable**: Rule-based labels are easy to understand, explain, and audit.

**Quick to implement**: No large training dataset is needed; rules can be written and applied immediately.

**Reliable for known patterns**: Especially effective in domains where specific phrases strongly indicate meaning (e.g., “terminated parental rights” → high risk).

**Great for bootstrapping**: Provides an initial labeled dataset that can help train or evaluate LLMs.

**Valuable in regulated settings**: Supports consistent, auditable decisions in domains like law, healthcare, or finance.


In [2]:
# --- Set working directory ---
import os
os.chdir("/Users/lidasmac/compliance-nlp/")  # Use only if necessary (e.g., running from a notebook)

# --- Third-party and standard library imports ---
import pandas as pd
import re

# --- Custom Module Imports ---
from src.rule_based_labeling import (
    label_risk,
    find_matched_phrases
)
# --- Load data ---
df = pd.read_csv("data/party_and_decision_metadata.csv")
labels_df = pd.read_csv("data/labels.csv")

## Step 1: Apply Rule-Based Labels

We will scan each decision text for known trigger phrases using our label dictionary. If a phrase is found, we assign the corresponding risk label. For now, we use a simple matching strategy to check if any phrase appears in the text.

In [3]:
# --- Normalize text columns ---
df["decision_text"] = df["decision_text"].fillna("").str.lower()

# --- Load label dictionary ---
label_dict = dict(zip(labels_df["phrase"].str.lower(), labels_df["label"].str.lower()))


# --- Apply labeling ---
df["rule_based_label"] = df["decision_text"].apply(lambda x: label_risk(x, label_dict))

# --- Preview result ---
df[["decision_text", "rule_based_label"]].head()


| | decision_text | rule_based_label |
|---|---|---|
| 0 | appeal by the defendant from a judgment of the... | high |
| 1 | in an action, in effect, to recover damages fo... | high |
| 2 | in an action, inter alia, for injunctive relie... | low |
| 3 | in related proceedings pursuant to family cour... | low |
| 4 | in an action to foreclose a mortgage, the defe... | low |
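The `label_risk` helper is imported from `src.rule_based_labeling` and its body is not shown in this notebook. As a rough illustration of the simple matching strategy described in this step, here is a minimal sketch. The function name `label_risk_sketch`, the `SEVERITY` ordering, the `"medium"` label, and the `"motion to dismiss"` phrase are all assumptions made for illustration; the real helper may behave differently. The `"low"` default mirrors the preview above, where rows without a trigger phrase are labeled `low`.

```python
# Hypothetical sketch only; the real label_risk in src/rule_based_labeling.py may differ.

# Assumed ordering of labels from least to most severe.
SEVERITY = {"low": 0, "medium": 1, "high": 2}


def label_risk_sketch(text, label_dict, default="low"):
    """Return the most severe label whose phrase occurs in `text`.

    `text` and the keys of `label_dict` are expected to be lower-cased already.
    """
    best = default
    for phrase, label in label_dict.items():
        # Plain substring check: "if any phrase appears in the text".
        if phrase in text and SEVERITY.get(label, 0) > SEVERITY.get(best, 0):
            best = label
    return best


# Illustrative dictionary; "motion to dismiss" and "medium" are made up for this example.
example_dict = {
    "ineffective assistance of counsel": "high",
    "default judgment": "high",
    "motion to dismiss": "medium",
}

print(label_risk_sketch("the defendant claimed ineffective assistance of counsel", example_dict))
print(label_risk_sketch("in an action to foreclose a mortgage", example_dict))
```

Keeping the most severe match means that a ruling containing both a medium and a high trigger is not under-reported, which is one reasonable choice when several phrases appear in the same text.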


## Step 2: Apply Rule-Based Phrase Matching

Using the dictionary of labeled phrases, we scan each decision text and record any phrases that appear. This helps us understand why a ruling might be flagged and supports later stages like prioritization, evaluation, and LLM-assisted refinement.


In [4]:
# --- Prepare phrase-label dictionary ---
# Lower-case the phrases so they can match the lower-cased decision_text.
phrase_to_label = dict(zip(labels_df["phrase"].str.lower(), labels_df["label"]))

# --- Record the matched phrases for each decision ---
df["flag_phrases"] = df["decision_text"].apply(lambda x: find_matched_phrases(x, phrase_to_label))

In [5]:
df.head()

| | doc_index | party_line | decision_text | rule_based_label | flag_phrases |
|---|---|---|---|---|---|
| 0 | 0 | The People of the State of New York, responden... | appeal by the defendant from a judgment of the... | high | [ineffective assistance of counsel] |
| 1 | 1 | Maria Airene Pantanilla, respondent, v Guiller... | in an action, in effect, to recover damages fo... | high | [default judgment] |
| 2 | 2 | Franklin Carroll, LLC, appellant, v Carroll De... | in an action, inter alia, for injunctive relie... | low | [] |
| 3 | 3 | Pamela De Phillips, etc., appellant, v Nicole ... | in related proceedings pursuant to family cour... | low | [] |
| 4 | 4 | Bank of America, N.A., respondent, v Dale Bent... | in an action to foreclose a mortgage, the defe... | low | [] |
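The `find_matched_phrases` helper also lives in `src.rule_based_labeling` and is not shown here. The sketch below illustrates one way such phrase recording could work. It assumes whole-word, case-insensitive matching with the standard `re` module; the function name `find_matched_phrases_sketch` is hypothetical, and the actual helper may use a different strategy.

```python
import re


def find_matched_phrases_sketch(text, phrase_to_label):
    """Return the phrases from `phrase_to_label` that occur in `text` as whole words."""
    matches = []
    for phrase in phrase_to_label:
        # re.escape guards against phrases containing regex metacharacters;
        # \b anchors the match at word boundaries.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            matches.append(phrase)
    return matches


example_phrases = {
    "default judgment": "high",
    "ineffective assistance of counsel": "high",
}
text = "the plaintiff moved for leave to enter a default judgment"
print(find_matched_phrases_sketch(text, example_phrases))
```

Word boundaries make the match stricter than a plain substring test: `"default judgments"` would not match `"default judgment"` here. Whether that is desirable depends on how the phrase list is curated.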


## Step 3: Save Labeled Output

Now that we’ve applied our rule-based labeling logic, we save the resulting DataFrame to the `data/clean/` folder. This allows us to reuse the labeled output in future steps like evaluation, semantic expansion, or model-assisted review.

In [8]:
# --- Save the labeled DataFrame ---
output_path = "data/clean/labeled_rule_based.csv"
df.to_csv(output_path, index=False)

print(f"Saved labeled data to {output_path}")

Saved labeled data to data/clean/labeled_rule_based.csv
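One caveat when reusing the saved file: `flag_phrases` holds Python lists, and CSV stores them as their string representation (for example `"['default judgment']"`). A later notebook that reads the file back will see strings unless it parses them. The sketch below shows the round trip on a small made-up DataFrame, using an in-memory buffer in place of `data/clean/labeled_rule_based.csv`.

```python
import ast
import io

import pandas as pd

# Small illustrative frame with the same column types as the labeled output.
demo = pd.DataFrame({
    "rule_based_label": ["high", "low"],
    "flag_phrases": [["default judgment"], []],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)

reloaded = pd.read_csv(buffer)

# After the round trip each cell is a string; literal_eval turns it back into a list.
reloaded["flag_phrases"] = reloaded["flag_phrases"].apply(ast.literal_eval)
print(reloaded["flag_phrases"].tolist())
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for parsing the column.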
