# Demo: Contract Clause Classification

This notebook walks through preparing CUAD few-shot examples and running the classifier, first on a short sample contract and then on a full contract file. Run the cells in order in Google Colab or a local Jupyter environment; the first cell installs the dependencies and the package in editable mode.

> **Offline tip:** before running any cells, upload `category_descriptions.csv` and `cuad_v1.json` into the repo-level `data/` folder (or point `CUAD_CATEGORY_CSV` / `CUAD_QA_JSON` to their paths). `category_descriptions.csv` provides the category metadata and `cuad_v1.json` provides the few-shot snippets.
>
> **GPU tip:** when a CUDA device is available (Colab GPU runtime), the setup cell below automatically sets `device_map="cuda"` so the LLM runs on the GPU; otherwise it falls back to `"auto"`. You can override this by changing `device_map` manually.
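
Before running anything, a quick preflight check can confirm that both files are reachable. The sketch below is a hypothetical helper (`resolve_cuad_files` is not part of `contract_cuad`); it assumes the library reads the `CUAD_CATEGORY_CSV` and `CUAD_QA_JSON` environment variables named in the tip and that the fallback location is `./data`. It uses only the standard library, so it can run before the install cell.

```python
import os
from pathlib import Path

# Hypothetical helper, not part of contract_cuad: maps each environment
# variable named in the offline tip to the file expected in ./data.
REQUIRED_FILES = {
    "CUAD_CATEGORY_CSV": "category_descriptions.csv",
    "CUAD_QA_JSON": "cuad_v1.json",
}


def resolve_cuad_files(data_dir: str = "data") -> dict:
    """Return {env_var: Path or None}; export every path that exists."""
    resolved = {}
    for env_var, filename in REQUIRED_FILES.items():
        # An explicit environment variable wins; otherwise look in data_dir.
        candidate = Path(os.environ.get(env_var) or Path(data_dir) / filename)
        if candidate.exists():
            os.environ[env_var] = str(candidate)  # later cells can rely on it
            resolved[env_var] = candidate
        else:
            resolved[env_var] = None
    return resolved


missing = [name for name, path in resolve_cuad_files().items() if path is None]
if missing:
    print("Missing CUAD files for:", ", ".join(missing))
```

An explicitly set variable takes precedence, which mirrors how the configuration cell treats `CUAD_CATEGORY_CSV`.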

In [9]:
# Install dependencies
!pip install -q -r requirements.txt
!pip install -q -e .

  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
  Building editable for contract-cuad (pyproject.toml) ... done


In [10]:
# Configure environment and load models
import logging
import os
from pathlib import Path

import nltk
import torch

logging.basicConfig(level=logging.INFO, format="[%(levelname)s] %(message)s", force=True)
nltk.download("punkt")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [11]:
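from contract_cuad import classifier, config, categories, cuad_fewshot, llm_client

# Make sure the repo-level data folder exists for the CUAD files.
Path("data").mkdir(exist_ok=True)

# Prefer an explicit CUAD_CATEGORY_CSV; otherwise fall back to ./data/category_descriptions.csv.
local_csv = os.environ.get("CUAD_CATEGORY_CSV")
if not local_csv:
    cached_csv = Path("data") / "category_descriptions.csv"
    if cached_csv.exists():
        os.environ["CUAD_CATEGORY_CSV"] = str(cached_csv)
    else:
        print("Set CUAD_CATEGORY_CSV or copy category_descriptions.csv into ./data to run offline.")

# Run the LLM on the GPU when CUDA is available; otherwise let device_map="auto" decide placement.
device_map = "cuda" if torch.cuda.is_available() else "auto"
print(f"Using device_map={device_map}")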

Using device_map=cuda


In [12]:
cats = categories.load_cuad_categories()
fewshot = cuad_fewshot.build_fewshot_examples(cats, max_examples_per_category=1, max_total_examples=10)
model_cfg = config.ModelConfig(
    model_name="Qwen/Qwen2-7B-Instruct",
    max_new_tokens=512,
    temperature=0.1,
    device_map=device_map,
)
client = llm_client.LLMClient(config=model_cfg)

[INFO] Loaded CUAD categories from data/category_descriptions.csv
[INFO] Parsed 41 CUAD categories
[INFO] Building few-shot examples (max 1 per category, 10 total)
[INFO] Loading CUAD QA JSON from /content/data/cuad_v1.json
[INFO] Loaded 20910 QA records from CUAD JSON
[INFO] Prepared 10 total few-shot examples across 10 categories


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

In [13]:
sample_contract = """
Either party may terminate this Agreement for convenience upon thirty (30) days’ prior written notice to the other party, without cause and without any further liability, except for payment of fees accrued up to the effective date of termination.

In the event of a Change of Control of Customer, Customer shall provide written notice to Provider within ten (10) days. Provider may, at its sole discretion, terminate this Agreement upon thirty (30) days’ prior written notice following such Change of Control.

Neither this Agreement nor any of the rights or obligations hereunder may be assigned or transferred by Customer, whether by operation of law or otherwise, without the prior written consent of Provider. Any attempted assignment in violation of this Section shall be null and void.

Except for liability arising from gross negligence, willful misconduct, or Customer’s breach of its payment obligations, each party’s aggregate liability under this Agreement shall in no event exceed the total amounts paid by Customer to Provider during the twelve (12) months immediately preceding the event giving rise to the claim.

In no event shall Provider be liable to Customer for any indirect, incidental, consequential, special, punitive, or exemplary damages, including loss of profits or business interruption, even if advised of the possibility of such damages.

During the term of this Agreement and for a period of twelve (12) months thereafter, neither party shall, directly or indirectly, solicit for employment or hire any employee of the other party who was involved in the performance of this Agreement, without the prior written consent of such other party.
"""

results = classifier.classify_contract_text(
    sample_contract,
    cats,
    client,
    fewshot_examples=fewshot,
    max_clause_tokens=300,
    max_clauses=3,
)

results[:3]

[INFO] Split contract into 3 clauses (max_clauses=3)
Classifying clauses: 100%|██████████| 3/3 [00:18<00:00,  6.26s/it]
[INFO] Completed classification for 3 clauses


[{'clause_id': 0,
  'clause_text': 'Either party may terminate this Agreement for convenience upon thirty (30) days’ prior written notice to the other party, without cause and without any further liability, except for payment of fees accrued up to the effective date of termination.',
  'categories': [{'category': 'Category: Termination for Convenience',
    'category_index': 16,
    'confidence': 0.95,
    'reason': 'The clause allows for termination of the agreement without cause, which is a characteristic of a termination for convenience clause.'},
   {'category': 'Category: Notice Period to Terminate Renewal',
    'category_index': 7,
    'confidence': 0.8,
    'reason': 'The clause specifies a 30-day notice period, which is relevant to the notice period required to terminate a renewal.'}],
  'primary_category': 'Category: Termination for Convenience',
  'primary_category_index': 16,
  'primary_confidence': 0.95,
  'raw_output': '{\n  "categories": [\n    {\n      "category": "Categ

In [14]:
for row in results:
    print("TEXT:", row["clause_text"][:120], "...")
    print("PRIMARY:", row["primary_category"], row["primary_confidence"])
    print("TOP:", [(c["category"], c["confidence"]) for c in row["categories"]])
    print("ERROR:", row["error"])
    print("-" * 60)

TEXT: Either party may terminate this Agreement for convenience upon thirty (30) days’ prior written notice to the other party ...
PRIMARY: Category: Termination for Convenience 0.95
TOP: [('Category: Termination for Convenience', 0.95), ('Category: Notice Period to Terminate Renewal', 0.8)]
ERROR: None
------------------------------------------------------------
TEXT: In the event of a Change of Control of Customer, Customer shall provide written notice to Provider within ten (10) days. ...
PRIMARY: Category: Change of Control 0.95
TOP: [('Category: Change of Control', 0.95), ('Category: Termination for Convenience', 0.85), ('Category: Anti-Assignment', 0.75)]
ERROR: None
------------------------------------------------------------
TEXT: Neither this Agreement nor any of the rights or obligations hereunder may be assigned or transferred by Customer, whethe ...
PRIMARY: Category: Anti-Assignment 0.95
TOP: [('Category: Anti-Assignment', 0.95), ('NONE', 0.0)]
ERROR: None
----------------
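
The printed records suggest that every clause result is a plain dict with the keys shown above. The sketch below writes that shape down as `TypedDict`s, inferred from the output rather than from `contract_cuad`'s source (so the `Optional` field types in particular are assumptions), and adds a hypothetical `confident_labels` helper that drops `NONE` and any label below a confidence cutoff. A cutoff is one way to handle weak secondary labels such as "Notice Period to Terminate Renewal" on the first clause, which contains no renewal term; the default of 0.9 is illustrative, not tuned.

```python
from typing import List, Optional, TypedDict


class CategoryScore(TypedDict):
    """One label as it appears in a clause's `categories` list."""
    category: str
    category_index: int
    confidence: float
    reason: str


class ClauseResult(TypedDict):
    """Per-clause record, with keys inferred from the printed output."""
    clause_id: int
    clause_text: str
    categories: List[CategoryScore]
    primary_category: Optional[str]
    primary_category_index: Optional[int]
    primary_confidence: Optional[float]
    raw_output: str
    error: Optional[str]


def confident_labels(row: ClauseResult, min_confidence: float = 0.9) -> List[str]:
    """Return category names scoring at least min_confidence, excluding NONE."""
    return [
        c["category"]
        for c in row["categories"]
        if c["category"] != "NONE" and c["confidence"] >= min_confidence
    ]
```

For example, `[confident_labels(r) for r in results]` keeps one label per sample clause here, because every secondary label printed above scores below 0.9.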

## Classify a full contract file

Upload or mount a `.txt` contract into the repo-level `data/` folder (e.g. `data/example_agreement.txt`).
The helper below runs the full pipeline on that file and returns a DataFrame with one row per clause, including
clauses the model labels as `NONE`; filter on `primary_category` to keep only clauses with an assigned CUAD
category plus the model's explanation.

In [15]:
import json
from pathlib import Path
from typing import Optional, Union

import pandas as pd


def classify_contract_file(
    contract_path: Union[str, Path],
    *,
    fewshot_examples: Optional[dict] = None,
    max_clause_tokens: int = 450,
    paragraph_overlap: int = 0,
    max_clauses: Optional[int] = None,
) -> pd.DataFrame:
    """Run the full pipeline on a text file and return a DataFrame with one row per clause."""

    text = Path(contract_path).read_text(encoding="utf-8")
    results = classifier.classify_contract_text(
        text,
        cats,  # CUAD categories loaded earlier
        client,  # LLM client created earlier
        fewshot_examples=fewshot_examples or fewshot,  # default to the notebook's few-shot set
        max_clause_tokens=max_clause_tokens,
        paragraph_overlap=paragraph_overlap,
        max_clauses=max_clauses,
        show_progress=True,
    )
    df = pd.DataFrame(results)
    # Serialise the nested label list so the frame can be exported to CSV cleanly.
    df["top_categories_json"] = df["categories"].apply(lambda labels: json.dumps(labels, ensure_ascii=False))
    return df


contract_path = Path("data/example_agreement.txt")
contract_df = classify_contract_file(contract_path)
print(f"Total clauses: {len(contract_df)}")
print("Error counts:\n", contract_df["error"].value_counts(dropna=False))

[INFO] Split contract into 98 clauses (max_clauses=None)
Classifying clauses: 100%|██████████| 98/98 [09:11<00:00,  5.62s/it]
[INFO] Completed classification for 98 clauses


Total clauses: 98
Error counts:
 error
None    98
Name: count, dtype: int64
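
`contract_df["categories"]` still holds a list of label dicts per clause; the helper only serialises it to JSON. For per-category analysis a long format with one row per (clause, label) pair is easier to filter and aggregate. A minimal sketch using pandas `DataFrame.explode` and `pd.json_normalize`; `explode_categories` is a hypothetical helper, and it assumes each label dict carries the keys shown in the sample output.

```python
import pandas as pd


def explode_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (clause, predicted category) from the nested `categories` column."""
    # Each cell of `categories` is a list of dicts; explode yields one dict per row.
    long_df = df[["clause_id", "categories"]].explode("categories", ignore_index=True)
    # Clauses with an empty label list become NaN after explode; drop them.
    long_df = long_df.dropna(subset=["categories"])
    # Expand each dict into columns (category, category_index, confidence, reason).
    labels = pd.json_normalize(long_df["categories"].tolist())
    return pd.concat([long_df[["clause_id"]].reset_index(drop=True), labels], axis=1)
```

For instance, `explode_categories(contract_df).query("category != 'NONE'").groupby("category")["clause_id"].nunique()` counts how many clauses received each label, secondary labels included.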


In [16]:
# Show only clauses from the contract file that received a CUAD category.
# Assumes unlabelled clauses carry primary_category == "NONE", as the NONE entries above suggest.
labelled_df = contract_df[contract_df["primary_category"] != "NONE"]
print(f"Clauses with a CUAD category: {len(labelled_df)} of {len(contract_df)}")
for row in labelled_df.to_dict("records"):
    print("TEXT:", row["clause_text"][:120], "...")
    print("PRIMARY:", row["primary_category"], row["primary_confidence"])
    # The model's explanation for the primary label, if present.
    print("REASON:", next((c["reason"] for c in row["categories"] if c["category"] == row["primary_category"]), None))
    print("TOP:", [(c["category"], c["confidence"]) for c in row["categories"]])
    print("ERROR:", row["error"])
    print("-" * 60)

TODO: check against the CUAD dataset which categories its annotators found in this document.
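
One way to act on this TODO, assuming `example_agreement.txt` is one of the CUAD contracts, is to read its annotated categories from `cuad_v1.json` and compare them with the predicted ones at the set level. The sketch assumes the SQuAD-style layout of `cuad_v1.json` with QA ids of the form `<contract title>__<category>` (worth confirming against the file); `gold_categories` and `compare_categories` are hypothetical helpers. Names are lower-cased before comparison because the CSV-derived labels and the JSON ids may not share the same casing.

```python
from typing import Dict, Iterable, Set

PREFIX = "Category: "  # prefix seen on the model's labels in the outputs above


def gold_categories(cuad: dict, title: str) -> Set[str]:
    """Categories annotated with at least one answer span for one contract.

    Assumes the SQuAD-style layout (data -> paragraphs -> qas) and QA ids of
    the form "<contract title>__<category>".
    """
    found = set()
    for doc in cuad["data"]:
        if doc["title"] != title:
            continue
        for para in doc["paragraphs"]:
            for qa in para["qas"]:
                if qa["answers"]:  # an empty list means the clause type is absent
                    found.add(qa["id"].rsplit("__", 1)[-1])
    return found


def _normalise(name: str) -> str:
    # Drop the "Category: " prefix and ignore case differences between sources.
    if name.startswith(PREFIX):
        name = name[len(PREFIX):]
    return name.strip().casefold()


def compare_categories(predicted: Iterable[str], gold: Iterable[str]) -> Dict[str, object]:
    """Set-level agreement between predicted and annotated categories."""
    pred = {_normalise(p) for p in predicted} - {"none"}
    ref = {_normalise(g) for g in gold}
    hits = pred & ref
    return {
        "matched": sorted(hits),
        "missed": sorted(ref - pred),
        "extra": sorted(pred - ref),
        "precision": len(hits) / len(pred) if pred else 0.0,
        "recall": len(hits) / len(ref) if ref else 0.0,
    }
```

For example, `compare_categories(contract_df["primary_category"].dropna(), gold_categories(cuad, title))`, where `cuad` is the parsed `cuad_v1.json` and `title` is the contract's CUAD title. Using only primary categories can miss clauses that carry several labels; passing every label from the `categories` column is the alternative.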