### Legal Brief Argument - Counter Argument Linking

Objective
This project analyzes legal briefs by linking argument sections in "moving briefs" to those in the
"response briefs" filed in court cases. The focus is on identifying each argument in the initial
brief and matching it with the corresponding counter-arguments in the response. Automating this
analysis can aid legal research and argument drafting.
Briefs are filed by plaintiff or defendant attorneys to argue in support of, or in opposition to, a
motion. They are filed back and forth until the judge issues an order on the motion. Briefs contain
various section types: introduction, standard of review, factual summary, argument sections,
conclusion, etc. Arguments form the crux of a brief.
You are presented with a list of argument sections extracted from a list of brief pairs. The first
brief in every pair is called the “moving brief”, and the second is called the “response brief”,
since it is filed in response to the moving brief. Your task is to link argument sections in the
moving brief to those in the response brief, thereby obtaining the set of counter-arguments (from the
response brief) for every argument in the moving brief. Each argument in the moving brief can have
multiple counter-arguments in the response brief.
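The notebook cells that follow read each pair only through the keys `moving_brief`, `response_brief`, `brief_id`, `brief_arguments`, `heading`, and `content`. As a minimal sketch, assuming those keys are everything the pipeline depends on, a check like this can flag malformed records before any API calls are made. `check_brief_pair` and the toy record are hypothetical.

```python
from typing import Any


def check_brief_pair(pair: dict[str, Any]) -> list[str]:
    """Return a list of schema problems for one brief pair (empty if none).

    Only the keys the notebook reads are checked; other fields are ignored.
    """
    problems = []
    for side in ("moving_brief", "response_brief"):
        brief = pair.get(side)
        if not isinstance(brief, dict):
            problems.append(f"missing {side}")
            continue
        if "brief_id" not in brief:
            problems.append(f"{side} has no brief_id")
        arguments = brief.get("brief_arguments")
        if not isinstance(arguments, list):
            problems.append(f"{side} has no brief_arguments list")
            continue
        for i, arg in enumerate(arguments):
            if not isinstance(arg, dict):
                problems.append(f"{side} argument {i} is not an object")
                continue
            for key in ("heading", "content"):
                if not isinstance(arg.get(key), str):
                    problems.append(f"{side} argument {i} lacks a string '{key}'")
    return problems


# Hypothetical toy record in the shape the notebook expects.
toy_pair = {
    "moving_brief": {
        "brief_id": "m-1",
        "brief_arguments": [{"heading": "I. Plaintiffs Will Suffer Irreparable Harm", "content": "..."}],
    },
    "response_brief": {
        "brief_id": "r-1",
        "brief_arguments": [{"heading": "1. There Is No Irreparable Harm", "content": "..."}],
    },
}

print(check_brief_pair(toy_pair))  # → []
```

Once the file is loaded, `[i for i, p in enumerate(brief_pairs) if check_brief_pair(p)]` would list the indices of any pairs that need attention.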

In [1]:
# import libraries

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
import textwrap
from numpy import dot
from numpy.linalg import norm

In [2]:
# constants

MODEL_GPT = 'gpt-4o-mini'
MODEL_TEXT_EMBEDDING = 'text-embedding-3-large'
BRIEF_PAIRS_FILE_PATH = "stanford_hackathon_brief_pairs.json"

In [3]:
# load OpenAI API key from .env

load_dotenv(override=True)
api_key = os.getenv("OPENAI_API_KEY")

if not api_key:
    print("No API key was found")
else:
    print("API key found")

API key found


In [4]:
# load file

with open(BRIEF_PAIRS_FILE_PATH, "r", encoding="utf-8") as f:
    brief_pairs = json.load(f)

print(f"Loaded {len(brief_pairs)} brief pairs.")

Loaded 10 brief pairs.


In [5]:
def print_argument_sections(brief_name, arguments):
    print(f"\n📄 {brief_name.upper()} BRIEF")
    print("=" * (len(brief_name) + 9))
    for i, arg in enumerate(arguments):
        print(f"\n--- Argument {i+1}: {arg['heading']} ---")
        wrapped_text = textwrap.fill(arg["content"], width=100)
        print(wrapped_text)

data = brief_pairs[0]

print_argument_sections("Moving", data["moving_brief"]["brief_arguments"])
print_argument_sections("Response", data["response_brief"]["brief_arguments"])


📄 MOVING BRIEF

--- Argument 1: BACKGROUND ---
Plaintiffs own properties located in Franklin County, Virginia in the area around Boones Mill. MVP
holds an easement to construct and maintain a natural gas pipeline across each of the properties.
Recently, MVP began construction of the Mountain Valley Pipeline on Plaintiffs' properties. The
Mountain Valley Pipeline is an approximately 300-mile-long pipeline running from Wetzel County, West
Virginia to Pittsylvania County, Virginia. Construction of the pipeline is conditioned on, among
other things, a permit from the Virginia State Water Control Board ("SWCB") pursuant to Section 401
of the Federal Clean Water Act. Under the Clean Water Act the SWCB must certify that construction
and operation of the pipeline would not violate water quality standards. Moreover, the SWCB's permit
made compliance with Virginia's erosion and sediment control statue and regulations a condition of
the permit. The first phase of construction of the pipeline req
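The first "argument" in the moving brief above is headed BACKGROUND, which is a factual summary rather than an argument, yet it is scored like any other section. A minimal sketch of a heading-based filter, assuming non-argument sections can be recognized by their headings. `NON_ARGUMENT_HEADINGS`, `normalize_heading`, and `is_argument_section` are hypothetical, and the keyword list is illustrative, drawn from the section types named in the objective.

```python
import re

# Illustrative (not exhaustive) heading prefixes for non-argument sections.
NON_ARGUMENT_HEADINGS = (
    "introduction",
    "background",
    "standard of review",
    "legal standard",
    "statement of facts",
    "factual background",
    "procedural history",
    "conclusion",
)


def normalize_heading(heading: str) -> str:
    """Lower-case a heading and strip a leading enumerator such as 'I.', '2.' or 'A)'."""
    text = heading.strip().lower()
    text = re.sub(r"^(?:[ivxlcdm]+|\d+|[a-z])[.)]\s*", "", text)
    return text.strip()


def is_argument_section(heading: str) -> bool:
    """Heuristically decide whether a section heading introduces an argument."""
    normalized = normalize_heading(heading)
    return not any(normalized.startswith(prefix) for prefix in NON_ARGUMENT_HEADINGS)


headings = [
    "BACKGROUND",
    "1. Plaintiffs Are Not Entitled to Injunctive Relief in an Inverse Condemnation Case.",
    "CONCLUSION",
]
print([h for h in headings if is_argument_section(h)])
# → ['1. Plaintiffs Are Not Entitled to Injunctive Relief in an Inverse Condemnation Case.']
```

Applied to the data, this would be `[a for a in pair["moving_brief"]["brief_arguments"] if is_argument_section(a["heading"])]`. Whether to drop such sections is a judgment call; the notebook as written keeps all of them.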

In [6]:
client = OpenAI()

In [None]:
# TEST CONNECTION TO OPENAI

response = client.chat.completions.create(
    model=MODEL_GPT,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a fun fact about space."}
    ]
)

print(response.choices[0].message.content)

A fun fact about space is that on Venus, a single day (the time it takes the planet to complete one rotation) is actually longer than a year (the time it takes to orbit the Sun). This happens because Venus rotates very slowly in the opposite direction compared to most other planets!


In [8]:
# create embeddings

def get_embedding(text, client, model=MODEL_TEXT_EMBEDDING):
    """Return the embedding vector for a single text."""
    response = client.embeddings.create(
        model=model,
        input=[text]
    )
    return response.data[0].embedding
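`get_embedding` above and `embed_text` in the next cell each send one request per text, so every argument section costs a separate round trip. The embeddings endpoint also accepts a list of strings as `input` and returns one vector per string, so a whole brief can be embedded in a few requests. A minimal sketch, assuming that behavior of the OpenAI Python client; `embed_batch` and its `batch_size` default (an arbitrary, conservative choice) are hypothetical. Results are re-sorted by each item's `index` rather than relying on response order.

```python
from typing import Any, Sequence


def embed_batch(
    client: Any,
    texts: Sequence[str],
    model: str = "text-embedding-3-large",
    batch_size: int = 64,
) -> list[list[float]]:
    """Embed many texts with one embeddings request per batch of `batch_size` texts.

    `client` is expected to be an `openai.OpenAI` instance, or anything exposing
    the same `client.embeddings.create(model=..., input=[...])` interface.
    """
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        response = client.embeddings.create(model=model, input=batch)
        # Each returned item records the position of its input text in `index`.
        ordered = sorted(response.data, key=lambda item: item.index)
        vectors.extend(item.embedding for item in ordered)
    return vectors
```

For one brief this would look like `embed_batch(client, [f"{a['heading']}\n\n{a['content']}" for a in pair["moving_brief"]["brief_arguments"]])`. An empty list of texts returns `[]` without calling the API.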


In [10]:
from tqdm import tqdm

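def embed_text(text):
    # Note: this uses text-embedding-ada-002, not MODEL_TEXT_EMBEDDING defined above;
    # all embeddings and similarity scores below were produced with this model.
    return client.embeddings.create(
        input=text,
        model="text-embedding-ada-002"
    ).data[0].embedding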

def embed_arguments(arguments):
    """Embed each argument's heading and content together, keeping the text alongside."""
    embedded = []
    for arg in arguments:
        full_text = f"{arg['heading']}\n\n{arg['content']}"
        embedded.append({
            "heading": arg["heading"],
            "content": arg["content"],
            "embedding": embed_text(full_text)
        })
    return embedded

moving_embeddings = []
response_embeddings = []

for pair in tqdm(brief_pairs, desc="Embedding brief pairs"):
    moving_embeddings.append(embed_arguments(pair["moving_brief"]["brief_arguments"]))
    response_embeddings.append(embed_arguments(pair["response_brief"]["brief_arguments"]))

Embedding brief pairs: 100%|██████████| 10/10 [01:23<00:00,  8.31s/it]


In [11]:
with open("moving_embeddings.json", "w") as f:
    json.dump(moving_embeddings, f)

with open("response_embeddings.json", "w") as f:
    json.dump(response_embeddings, f)
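Embedding the ten pairs took about a minute and a half of API calls (01:23 in the progress bar), so on later runs it can be cheaper to reload the files just written. A minimal sketch of a load-or-compute helper, assuming the JSON layout saved above; `load_or_compute` and its `compute` callback are hypothetical names.

```python
import json
import os
from typing import Any, Callable


def load_or_compute(path: str, compute: Callable[[], Any]) -> Any:
    """Return cached JSON from `path` if the file exists; otherwise compute, save, and return it."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    result = compute()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f)
    return result
```

For example, `moving_embeddings = load_or_compute("moving_embeddings.json", compute_moving)`, where `compute_moving` would be a hypothetical function wrapping the moving-brief half of the embedding loop. The cache is keyed only on the file path, so the file has to be deleted whenever the embedding model or input data changes.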

In [12]:
# cosine similarity function

def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))
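The scoring cells that follow call `cosine_similarity` once per (moving, response) pair in nested Python loops. An equivalent sketch computes the whole score matrix at once with NumPy: scale every embedding to unit length, and a single matrix product then yields every cosine similarity. `similarity_matrix` is a hypothetical helper. Entry `[i, j]` scores moving argument `i` against response argument `j`, and both lists are assumed to be non-empty.

```python
import numpy as np


def similarity_matrix(moving_vectors, response_vectors) -> np.ndarray:
    """Return an (n_moving, n_response) matrix of cosine similarities."""
    m = np.asarray(moving_vectors, dtype=float)
    r = np.asarray(response_vectors, dtype=float)
    # Scale every row to unit length; the dot product of unit vectors is their cosine.
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    return m @ r.T


scores = similarity_matrix([[1.0, 0.0], [1.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]])
print(scores.round(4))
```

For brief pair `i` this would be `similarity_matrix([a["embedding"] for a in moving_embeddings[i]], [a["embedding"] for a in response_embeddings[i]])`, and `scores.argmax(axis=1)` would give the index of the best response section for each moving argument.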


In [13]:
# Store similarity scores as (moving_heading, response_heading, score)
all_similarity_links = []

for i in range(len(brief_pairs)):
    moving_args = moving_embeddings[i]  # list of {"heading": ..., "content": ..., "embedding": ...}
    response_args = response_embeddings[i]
    brief_links = []

    for m in moving_args:
        m_heading = m["heading"]
        m_content = m["content"]
        m_embedding = m["embedding"]

        for r in response_args:
            r_heading = r["heading"]
            r_content = r["content"]
            r_embedding = r["embedding"]

            score = cosine_similarity(m_embedding, r_embedding)

            # Store full metadata for clarity
            brief_links.append({
                "moving_heading": m_heading,
                "moving_excerpt": m_content[:200],  # Optional: for inspection
                "response_heading": r_heading,
                "response_excerpt": r_content[:200],  # Optional: for inspection
                "similarity": round(float(score), 4),
            })

    all_similarity_links.append(brief_links)

In [19]:
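# Recompute scores as plain tuples (moving_heading, moving_content, response_heading, response_content, score).
# This replaces the dict-based all_similarity_links from In [13]; the grouping cell below unpacks these tuples.
all_similarity_links = []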

from tqdm import tqdm

for i in tqdm(range(len(brief_pairs)), desc="Scoring similarity"):
    moving_args = moving_embeddings[i]
    response_args = response_embeddings[i]
    brief_links = []

    for m in moving_args:
        m_heading = m["heading"]
        m_content = m["content"]
        m_embedding = m["embedding"]

        for r in response_args:
            r_heading = r["heading"]
            r_content = r["content"]
            r_embedding = r["embedding"]

            score = cosine_similarity(m_embedding, r_embedding)
            brief_links.append((m_heading, m_content, r_heading, r_content, float(score)))  # 👈 ensure float

    all_similarity_links.append(brief_links)

Scoring similarity: 100%|██████████| 10/10 [00:00<00:00, 164.48it/s]


In [14]:
print(all_similarity_links[0])

[{'moving_heading': 'BACKGROUND', 'moving_excerpt': 'Plaintiffs own properties located in Franklin County, Virginia in the area around Boones Mill. MVP holds an easement to construct and maintain a natural gas pipeline across each of the properties. Rec', 'response_heading': '1. Plaintiffs Are Not Entitled to Injunctive Relief in an Inverse Condemnation Case.', 'response_excerpt': '"The Fifth Amendment does not proscribe the taking of property; it proscribes taking without just compensation."  Williamson Cty. Reg\'l Planning Comm\'n v. Hamilton Bank, 473 U.S. 172, 194 (1985).  Und', 'similarity': 0.8024}, {'moving_heading': 'BACKGROUND', 'moving_excerpt': 'Plaintiffs own properties located in Franklin County, Virginia in the area around Boones Mill. MVP holds an easement to construct and maintain a natural gas pipeline across each of the properties. Rec', 'response_heading': '2. Plaintiffs Are Not Likely to Succeed on Their State Claims', 'response_excerpt': 'Under the modified common-

In [20]:
# For each moving argument, keep only its single highest-scoring response section
top_links_per_pair = []

for i, brief_links in enumerate(all_similarity_links):
    moving_brief_id = brief_pairs[i]["moving_brief"]["brief_id"]
    response_brief_id = brief_pairs[i]["response_brief"]["brief_id"]
    grouped = {}

    for m_heading, m_content, r_heading, r_content, score in brief_links:
        key = (m_heading, m_content)
        grouped.setdefault(key, []).append((r_heading, r_content, score))

    top_links = []
    for (m_heading, m_content), response_scores in grouped.items():
        # Grab the best scoring response
        r_heading, r_content, score = max(response_scores, key=lambda x: x[2])
        top_links.append({
            "moving_heading": m_heading,
            "moving_content": m_content,
            "response_heading": r_heading,
            "response_content": r_content,
            "score": score
        })

    top_links_per_pair.append({
        "moving_brief_id": moving_brief_id,
        "response_brief_id": response_brief_id,
        "top_links": top_links
    })
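The cell above keeps exactly one response section per moving argument, while the objective allows several counter-arguments per argument. A minimal sketch that always keeps the top response (as the cell above does) and adds up to `k - 1` more, provided each scores at least `min_score` and lies within `margin` of that argument's best score. `link_counter_arguments` and the defaults for `k`, `min_score`, and `margin` are hypothetical and would need tuning. The margin matters because the scores here appear to sit in a narrow, high band (0.8024 for the BACKGROUND section against a non-responsive argument in the output above), so an absolute threshold alone separates poorly. The function consumes the `(m_heading, m_content, r_heading, r_content, score)` tuples built by the scoring cell.

```python
from collections import defaultdict


def link_counter_arguments(brief_links, k=3, min_score=0.80, margin=0.03):
    """Return, per moving argument, a ranked list of (response_heading, score) links.

    `brief_links` holds (m_heading, m_content, r_heading, r_content, score) tuples
    for one brief pair. The best response is always kept, so every moving argument
    gets at least one link.
    """
    grouped = defaultdict(list)
    for m_heading, m_content, r_heading, _r_content, score in brief_links:
        grouped[(m_heading, m_content)].append((r_heading, score))

    linked = []
    for (m_heading, _m_content), candidates in grouped.items():
        ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
        best_score = ranked[0][1]
        extra = [
            (r_heading, score)
            for r_heading, score in ranked[1:k]
            if score >= min_score and best_score - score <= margin
        ]
        linked.append({"moving_heading": m_heading, "counter_arguments": [ranked[0]] + extra})
    return linked


example_links = [
    ("A", "a", "R1", "r1", 0.86),
    ("A", "a", "R2", "r2", 0.84),
    ("A", "a", "R3", "r3", 0.78),
    ("B", "b", "R1", "r1", 0.79),
    ("B", "b", "R2", "r2", 0.82),
]
for row in link_counter_arguments(example_links):
    print(row)
# → {'moving_heading': 'A', 'counter_arguments': [('R1', 0.86), ('R2', 0.84)]}
# → {'moving_heading': 'B', 'counter_arguments': [('R2', 0.82)]}
```

Applied to every pair, this would be `[link_counter_arguments(links) for links in all_similarity_links]`.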

In [21]:
import json

with open("top_links.json", "w") as f:
    json.dump(top_links_per_pair, f, indent=2)